
Here’s how Facebook taught its Portal A.I. to think like a Hollywood filmmaker

Facebook Portal+ review
Dan Baker/Digital Trends

When Mark Zuckerberg built the first version of Facebook in his college dorm room at Harvard, he imagined it as a window that would allow people to look in on the lives of other users. If Google was a search engine for information, then Facebook, by contrast, was a search engine for people. Fifteen years later, Facebook has taken this ambition to the next level. With Portal and Portal+, its line of screen-enhanced smart speakers launched in November 2018, the social media giant has established a far more literal window, letting Facebook users make video calls to one another.

The Portal smart speakers literalize another Facebook dream, too. Where Facebook was, in essence, a search engine for people, Portal actually does search them out, with a roving 12-megapixel camera boasting a 140-degree field of view that follows you around the room to see what you’re doing. As Digital Trends put it in our review, “if you’re busy moving about the kitchen while asking Grandma how to make her famous meatballs, you can keep busy while listening to her talk.”


What exactly is the smart technology that drives Portal? And how does Facebook think it’s cracked the challenge of making regular video chat feel as personal as sitting down for a real conversation? The answer involves some impressive artificial intelligence — and an added human touch.

Making cameras smarter

Right from the start, Facebook knew that the core of its Portal experience would be the so-called “Smart Camera” system. The idea of the Smart Camera was to move beyond the kind of static shot that services like Skype have offered for years, and to play a more creative role in the process. Just as a movie director or cinematographer knows when to employ a wide shot or when to zoom in for an intimate close-up, Facebook challenged its engineers to imitate this ability with Portal.

To give this camera the necessary human touch, Facebook worked with filmmakers to figure out the best way of distilling their wisdom into machine learnable insights. In one case, it asked them to demonstrate how they might shoot a scene in which it was impossible to capture all the relevant information from one fixed angle.

In another, Facebook engineers looked at the different photographic elements that camera operators prioritize in portrait and landscape shots. These observations formed the basis of software models which attempt to imbue Portal with some of the decision-making quirks we would normally attribute to human creativity.

“We wanted to create a hands-free video calling experience that removes feelings of physical distance and is more like hanging out together,” Eric Hwang, one of the engineers behind Portal, explained to Digital Trends.

The resulting system — which Facebook says took “under two years” to create from scratch — allows Portal to make decisions designed to improve the flow of a conversation. In a newly published blog post, the company details some examples of why this might be necessary. For instance, if you’re in a crowded room full of people interacting with one another, Portal must choose when to follow an individual out of frame and when to zoom out to accommodate new subjects.

Facebook software engineers Eric Hwang (sitting in chair initially) and Arthur Cavalcanti demonstrate the Portal's cinematic camera-like tracking and framing.

Similarly, it must learn to deal with changing lighting conditions in real time. What do you do if your subject is lying down in a dark room, half covered by a blanket, while kids running around in the background cause motion blur? Portal weighs all of this information in less than the blink of an eye and tries to determine the best outcome. (If you want to manually control who it focuses on, that’s now possible too.)

Technical challenges

From a technical perspective, a couple of things make Portal’s technology impressive. The first is that it does all of this without an actual moving camera. Early in the development process, Portal’s engineers tried prototypes that used a motorized camera, which swiveled to face subjects. The team decided against this approach, however, because it introduced lag and a potential point of mechanical failure. Instead, Portal uses an extremely wide-angle lens, with all movement and editing decisions made entirely digitally.
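Digital “movement” of this kind amounts to cropping and resampling a region of the full wide-angle frame. Here is a minimal sketch of the idea of a virtual camera; the function name, frame dimensions, and clamping logic are illustrative assumptions, not Portal’s actual implementation:

```python
# Sketch of a "virtual camera": pan and zoom by choosing a crop rectangle
# inside a fixed wide-angle frame. Names and numbers are illustrative only.

def virtual_camera_crop(frame_w, frame_h, cx, cy, zoom):
    """Return an (x, y, w, h) crop rectangle for a virtual camera
    centered near (cx, cy) with the given zoom factor (1.0 = full frame)."""
    crop_w = frame_w / zoom
    crop_h = frame_h / zoom
    # Clamp so the crop never leaves the physical sensor frame.
    x = min(max(cx - crop_w / 2, 0), frame_w - crop_w)
    y = min(max(cy - crop_h / 2, 0), frame_h - crop_h)
    return x, y, crop_w, crop_h

# "Pan" toward a subject at (3000, 1000) in a 4000x3000 frame, zoomed 2x:
print(virtual_camera_crop(4000, 3000, 3000, 1000, 2.0))
# → (2000.0, 250.0, 2000.0, 1500.0)
```

Because nothing physically moves, the “camera” can jump or glide anywhere in the frame with no motor lag, which is exactly the advantage the Portal team was after.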

Second, the team working on Portal found a way to achieve its decision-making processes without having to rely on cloud computing. According to Hwang, all the computational firepower resides in the device itself.

Evolution of the Facebook Portal
Early Portal prototypes relied on a motor to physically move the camera. Facebook Engineering

“Capturing everyone in a video frame isn’t a hard engineering problem, as many engineers can do that with today’s computer vision advancements,” he said. “The innovation is in capturing the relevant people or person in real-time, on-device, using just the small mobile chip inside Portal as processing power. Usually these types of A.I. tasks require dedicated, large servers. [We] overcame that obstacle by compressing complex computer vision models until they could fit on the chip we use for Portal and still run accurately and reliably.”
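Facebook doesn’t say exactly how it compressed its models, but one common technique for shrinking a network to fit a small mobile chip is post-training quantization: storing weights as 8-bit integers plus a scale factor instead of 32-bit floats. The sketch below illustrates that generic idea only; it is not a description of Portal’s actual pipeline:

```python
# Illustration of 8-bit weight quantization, one common way to shrink a
# model for on-device inference. Generic technique, not Portal's code.
import numpy as np

def quantize_int8(weights):
    """Map float32 weights to int8 values plus a scale factor.
    Assumes at least one nonzero weight."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 storage."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.01, 1.27], dtype=np.float32)
q, scale = quantize_int8(w)
# int8 storage is 4x smaller than float32; values round-trip approximately.
print(np.max(np.abs(dequantize(q, scale) - w)))
```

The trade-off is a small loss of precision in exchange for a model that is a quarter of the size and runs on integer hardware, which is why this family of techniques is popular for edge devices.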

To do this, Portal draws on Facebook’s long-term investment in artificial intelligence. It uses a 2D pose-detection system that runs at 30 frames per second. The intent signaled by these poses helps Portal make continuous decisions about what its subjects are doing — and when it might need to digitally pan or zoom as a result. It additionally draws on depth-camera research developed by Facebook Reality Labs as part of the social media giant’s virtual reality efforts.
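A pose detector emits 2D keypoints for each person in each frame, and a framing system can derive a crop target from them. A toy sketch of that last step follows; the keypoint format and the padding margin are assumptions for illustration, not Portal’s real logic:

```python
# Toy framing step: derive a target crop box from 2D pose keypoints.
# Keypoint format and margin are illustrative assumptions.

def framing_box(keypoints, margin=0.25):
    """Bounding box (x, y, w, h) around all detected (x, y) keypoints,
    padded by `margin` on each side so subjects aren't framed edge-to-edge."""
    xs = [x for x, y in keypoints]
    ys = [y for x, y in keypoints]
    w = max(xs) - min(xs)
    h = max(ys) - min(ys)
    return (min(xs) - margin * w, min(ys) - margin * h,
            w * (1 + 2 * margin), h * (1 + 2 * margin))

# Keypoints from two people merged into one list:
print(framing_box([(100, 100), (160, 300), (400, 110), (420, 280)]))
# → (20.0, 50.0, 480.0, 300.0)
```

Run at 30 frames per second and smoothed over time, a box like this is what a virtual camera would track to keep everyone comfortably in shot.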

A growing market

Facebook is convinced that it is onto a winner with Portal. It’s easy to see where its confidence comes from. Right now, the smart speaker market is booming. Although largely dominated by market leader Amazon, it is growing at more than 100 percent year-on-year. That’s good news for tech companies searching for the next big thing at a time of flattening smartphone sales.

While Facebook was the last of the big four tech giants (Amazon, Alphabet, Facebook, and Apple) to jump on the bandwagon, Portal is still among the first wave of smart speakers centered on the screen as a communication device.

“Portal is the only product on the market of its kind,” Hwang said. “Today, smart speakers and displays are built around information and commerce. Portal is built to make it easier to connect with the people that matter most: our closest friends and family. And Portal is focused on connecting people — part of Facebook’s mission — which is not currently served well by the home device market.”

Privacy challenges ahead?

So what’s stopping Facebook? Well, potentially privacy. Users have proven surprisingly willing to embrace “always listening” gadgets from companies like Google with a vested interest in user data. But a device that both watches and listens to you is more invasive still. Furthermore, Facebook’s reputation is still suffering after last year’s Cambridge Analytica scandal.

Adding smarts to the Portal video chat camera (Facebook)

Just days before this article was published, the Washington Post reported that Facebook is negotiating a record-breaking, multibillion-dollar settlement with the FTC for its privacy misdemeanors. With a growing backlash from many former users, it remains to be seen whether Facebook has an Amazon Echo-style hit on its hands — or an Amazon Fire Phone-style flop.

Facebook assured us that it does not listen to, view, or keep the contents of Portal video calls, which are additionally encrypted to avoid eavesdropping. The fact that Portal’s A.I. smarts run locally on the device, and not on Facebook servers, also means that this information does not leave your home. Voice commands are sent to the company only after you say “Hey Portal,” and users can delete their voice history in Facebook’s Activity Log at any time.

But there’s no getting around the fact that there is still a degree of data collection taking place. “While we don’t listen to, view, or keep the contents of your Portal video calls, or use this information to target ads, we do process some device usage information to understand how Portal is being used and to improve the product,” Facebook notes. (Portal’s privacy policy can be read here.)

Portal offers some very smart technology with massive implications for the future of video chat. There’s no doubt that the company has managed to pull off something very impressive from a technological point of view. But whether it can convince potential customers that this is a solution they need in their lives will, ultimately, prove to be the real achievement.

Luke Dormehl
Former Digital Trends Contributor