Facebook’s AI research division is open-sourcing its image-recognition software, with the aim of advancing the tech so it can one day be applied to live video. The company’s DeepMask, SharpMask, and MultiPathNet software is now available to everyone on GitHub.
Facebook previously laid out its image-recognition systems in a number of research papers, which are also being made available to the public along with its demos. At present, the company’s algorithms rely on convolutional neural networks such as MultiPathNet (an AI technique in which a system is trained on huge amounts of data until it can recognize patterns in new data on its own), allowing Facebook to understand an image based on each pixel it contains.
In order to find and segment the objects in an image, Facebook couples its DeepMask segmentation framework with its SharpMask segment refinement module: DeepMask proposes rough object masks, and SharpMask sharpens their boundaries. The final stage of the pipeline uses the MultiPathNet deep learning system to classify and label each object the earlier stages have picked out.
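For readers curious how those three stages fit together, here is a minimal illustrative sketch in Python. The function names and return values are hypothetical stand-ins (the code Facebook actually released is a set of Torch-based research repositories), but the flow of propose, refine, then label mirrors the pipeline described above.

    def deepmask_propose(image):
        """Stage 1 (DeepMask, hypothetical stub): propose coarse, pixel-level
        masks for regions of the image that might contain an object."""
        # Placeholder output; a real model would return mask arrays and scores.
        return [{"mask": "coarse-mask-1", "score": 0.9},
                {"mask": "coarse-mask-2", "score": 0.7}]

    def sharpmask_refine(proposals, image):
        """Stage 2 (SharpMask, hypothetical stub): refine each coarse mask so
        its boundary follows the underlying object more closely."""
        return [{**p, "mask": p["mask"].replace("coarse", "refined")}
                for p in proposals]

    def multipathnet_label(segments, image):
        """Stage 3 (MultiPathNet, hypothetical stub): assign a category label
        to each refined segment."""
        toy_labels = ["person", "beach", "tree"]  # illustrative labels only
        return [{**s, "label": toy_labels[i % len(toy_labels)]}
                for i, s in enumerate(segments)]

    def describe(image):
        # Chain the three stages: propose masks, refine them, then label them.
        proposals = deepmask_propose(image)
        segments = sharpmask_refine(proposals, image)
        objects = multipathnet_label(segments, image)
        return [o["label"] for o in objects]

    print(describe("holiday_photo.jpg"))  # e.g. ['person', 'beach']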
According to Facebook, AI machine vision software has progressed in leaps and bounds over the past few years, enabling kinds of image classification that weren’t possible even a short while ago. Facebook claims that open sourcing the software is critical to advancing the technology further.
Deep learning techniques are springing up all over the big blue behemoth. They power Facebook’s (controversial) facial-recognition feature, curate its News Feed, and even run inside its digital assistant for Messenger.
This isn’t the first time Facebook has open sourced its AI. In fact, the company is something of a trailblazer when it comes to sharing its tech. In December, Facebook submitted its state-of-the-art AI-dedicated computer server to the Open Compute Project, a group of tech giants, including Apple and Microsoft, that share the designs of their computing infrastructure.
Facebook is already outlining future use cases for the image-recognition tech. The company says it could help build on the AI-generated image descriptions it already offers to visually impaired users.
“Currently, visually impaired users browsing photos on Facebook only hear the name of the person who shared the photo, followed by the term ‘photo,’ when they come upon an image in their News Feed,” writes Piotr Dollar, research scientist at Facebook AI Research (FAIR), in a blog post. “Instead we aim to offer richer descriptions, such as ‘Photo contains beach, trees, and three smiling people.’”
Additionally, Facebook says its next challenge is to apply its image-recognition techniques to video, “where objects are moving, interacting, and changing over time,” and even to Facebook Live broadcasts. “Real-time classification could help surface relevant and important Live videos on Facebook, while applying more refined techniques to detect scenes, objects, and actions over space and time could one day allow for real-time narration,” Dollar adds.