Facebook’s auto-tagging feature was already impressive, but now the social media network will recognize that the photo you just uploaded was taken at the Golden Gate Bridge, that you’re wearing a black shirt, and whether you are walking, dancing, or perhaps riding a horse.
Today, Joaquin Candela, the network’s director of applied machine learning, shared a blog post detailing how Facebook can now recognize not just the objects in your photographs, but scenes and actions, too, even on posts shared without a single word.
Lumos is the artificially intelligent program that allows the computer to “see” what’s inside the image you just shared, even if you didn’t give it any sort of text description. That machine learning system is behind a number of Facebook’s image-recognition features, from flagging nudity to fighting spam.
Today, that system is getting an upgrade: the ability to recognize actions, not just objects. The network’s automatic alt text, which describes photos for visually impaired users, will now recognize 12 different actions, from single verbs like walking and dancing to verb-and-noun phrases like riding a horse or playing an instrument.
While the program began as, and remains, an effort to give visually impaired users a better idea of what’s in their newsfeed, the same technology will now allow any Facebook user to search for an image of a particular place, action, or even a particular garment like a black shirt or red dress. Photos with a matching description may have popped up in search results previously, but the new action tagging makes search more accurate even for photos that were uploaded without any relevant text.
Lumos is a machine learning program, which means that the more photos it labels and the more details the team feeds it, the more accurate those photo labels become. Facebook’s machine learning team taught the program to recognize actions by asking humans to label 130,000 actual Facebook photos. Feeding those descriptions into the machine learning program “taught” the system how to identify actions.
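To make the idea of learning from human-labeled photos concrete, here is a minimal Python sketch. It is not Facebook’s method: Lumos’s internals are not described in the blog post, and the two-number "features" per photo, the nearest-centroid technique, and the names `train` and `predict` are all assumptions chosen for illustration. The sketch averages the features of each labeled group, then assigns a new photo the label whose average is closest.

```python
from collections import defaultdict
from math import dist


def train(labeled_examples):
    """Average the feature vectors seen for each human-provided label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in labeled_examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    # One averaged vector (centroid) per label.
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the new photo."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical toy data: each photo is reduced to two made-up numbers.
labeled = [
    ([0.9, 0.1], "riding a horse"),
    ([0.8, 0.2], "riding a horse"),
    ([0.1, 0.9], "dancing"),
    ([0.2, 0.8], "dancing"),
]

model = train(labeled)
print(predict(model, [0.85, 0.15]))  # prints "riding a horse"
```

Adding more labeled examples shifts each average toward a more representative point, which is the toy version of why more labels tend to produce better predictions.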
Searching for a particular object or action and then clicking the “photos” tab in the search results will now display relevant images from your own newsfeed, your friends, and public posts, matching on the labels generated by the machine learning system as well as any accompanying captions.
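A rough way to picture that kind of search is a lookup that checks both the caption and the machine-generated labels. The Python sketch below is purely illustrative: the `Photo` record, its fields, and the `search_photos` function are hypothetical and do not reflect how Facebook’s search is actually built.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:
    photo_id: str
    caption: str = ""
    # Labels assumed to come from an image-recognition system.
    machine_tags: set = field(default_factory=set)


def search_photos(photos, query):
    """Match the query against captions and machine-generated tags."""
    q = query.lower()
    results = []
    for photo in photos:
        in_caption = q in photo.caption.lower()
        in_tags = q in {tag.lower() for tag in photo.machine_tags}
        if in_caption or in_tags:
            results.append(photo.photo_id)
    return results


photos = [
    Photo("p1", "Great day out", {"black shirt", "walking"}),
    Photo("p2", "My new black shirt"),
    Photo("p3", "", {"riding a horse"}),
]

print(search_photos(photos, "black shirt"))     # prints ['p1', 'p2']
print(search_photos(photos, "riding a horse"))  # prints ['p3']
```

The photo with no caption at all (`p3`) still turns up, which is the practical difference the machine-generated labels make.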
Candela says the technology will eventually extend to automatically labeling videos as well. “While these new developments are noteworthy, we have a long and exciting road ahead and are just scratching the surface of what is possible with a self-serve computer vision platform,” he wrote. “With computer vision models getting pixel perfect and Facebook advancing into video and other immersive formats, Lumos will help unlock new possibilities in a reliable, fast, and scaleable way and pave the road for richer product experiences in the near future.”