Earlier attempts to turn photos into recipes were limited by smaller datasets, although "small" is relative given how many possible recipes exist. One study used 65,000 recipes but covered only traditional Chinese cuisine; another managed only about 50 percent accuracy in initial testing. Because deep learning algorithms "learn" from being fed large quantities of data, the resulting programs had large gaps in the ingredients they could recognize, which hurt their accuracy.
To create a larger database, the researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) knew the software would have to be trained on a wide-ranging set of data. To solve the narrow-dataset problem, the team turned to large collections of photos and recipes that already exist: food websites. Compiling data from sites like Food.com and All Recipes, the team created Recipe1M, a dataset of over one million recipes.
Using those recipes and their associated images, the team trained the software to use object recognition to infer each dish's likely ingredients, picking up items like flour, eggs, and butter.
The program doesn't actually identify a recipe from the photo; it predicts a list of ingredients. With that list, the program can then search the one-million-recipe database and choose the recipe whose ingredients best match what it saw in the photo.
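In outline, that matching step is a nearest-neighbor search over ingredient lists. The minimal Python sketch below illustrates the idea with a toy database and a simple set-overlap (Jaccard) score; the function names, the scoring, and the tiny recipe table are hypothetical stand-ins, not the team's actual retrieval method.

```python
from typing import Dict, List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two ingredient sets: 0 = disjoint, 1 = identical."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def best_match(predicted: List[str], recipes: Dict[str, List[str]]) -> str:
    """Return the name of the recipe whose ingredients best match the prediction."""
    pred = set(predicted)
    return max(recipes, key=lambda name: jaccard(pred, set(recipes[name])))

# Toy stand-in for the one-million-recipe Recipe1M database.
recipes = {
    "pancakes": ["flour", "eggs", "butter", "milk", "sugar"],
    "omelette": ["eggs", "butter", "salt", "milk"],
    "shortbread": ["flour", "butter", "sugar"],
}

# Ingredients the vision model might have predicted from a photo.
print(best_match(["flour", "eggs", "butter"], recipes))  # -> pancakes
```

A plain set-overlap score is the simplest reasonable choice for a sketch like this; a production system would likely weight rare, distinctive ingredients more heavily than common ones like salt or water.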
“In computer vision, food is mostly neglected because we don’t have the large-scale datasets needed to make predictions,” said Yusuf Aytar, a postdoctoral associate who co-wrote the paper with MIT professor Antonio Torralba. “But seemingly useless photos on social media can actually provide valuable insight into healthy habits and dietary preferences.”
Since the computer already has that large dataset, it is also able to pick up on a number of broader patterns, such as that the average recipe has nine ingredients and that the most popular are salt, butter, sugar, olive oil, water, eggs, garlic cloves, milk, flour, and onion.
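Statistics like those fall out of a simple frequency count once the recipes are compiled. A minimal sketch, assuming each recipe is stored as a plain list of ingredient names (the three-recipe dataset here is a hypothetical stand-in for Recipe1M):

```python
from collections import Counter
from statistics import mean

# Hypothetical stand-in for Recipe1M: one ingredient list per recipe.
recipes = [
    ["salt", "butter", "flour", "eggs", "milk"],
    ["salt", "olive oil", "garlic cloves", "onion", "water"],
    ["sugar", "butter", "flour", "eggs"],
]

print(f"average ingredients per recipe: {mean(len(r) for r in recipes):.1f}")

counts = Counter(ing for r in recipes for ing in r)
print("most common:", counts.most_common(5))
```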
The software could have a number of real-world uses. A person could snap a photo at a restaurant to learn how to make the dish at home, or to track their nutrition.
The program, while built on a wider dataset than earlier attempts, still has a few gaps. The researchers said it struggles with more ambiguous dishes, like smoothies and sushi rolls. Recipes that come in many similar variations, such as lasagna, also tended to confuse the program.
The group plans to continue developing the program and hopes to give the system the ability to tell how a dish was prepared, such as distinguishing stewed from diced. Future work could also sharpen the program's ingredient recognition, determining the type of onion, for example, instead of just listing "onion."
You don't have to wait for Pic2Recipe to become a full-fledged app to try it out: an online demo lets users upload their own food photos.