How do you track air pollution in a place like China, which doesn’t readily reveal this data? That question inspired researchers at Rice University to come up with a solution. Their answer involved analyzing the words used in millions of posts on Chinese social media to find out whether microblogging can double as a proxy measurement system for airborne pollution.
“Historically, China has been very guarded about revealing air pollution data,” environmental engineer Daniel Cohan told Digital Trends. “They take very high quality real-time measurements and have monitors at around 24 sites around Beijing, but they make it very difficult for people to get hold of archival data.”
Cohan and Rice computer scientist Dan Wallach used the Chinese microblogging platform Weibo (basically China’s version of Twitter) for their study. They looked at recurring keywords such as “dust,” “cough,” “haze” and “blue sky,” which they hypothesized would correlate with daily pollution levels.
Their study produced the Air Discussion Index (ADI), which is based on the frequency of pollution-related terms in 112 million posts made between 2011 and 2013 by residents of Beijing, Shanghai, Guangzhou and Chengdu.
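To make the idea of a frequency-based index concrete, here is a minimal sketch in Python. It is not the researchers' actual method: the function name `daily_keyword_share`, the sample posts, and the choice to measure "the share of a day's posts that mention any keyword" are all assumptions for illustration. The keywords are the English renderings quoted above, whereas the real study worked with Chinese-language posts, and a real index would presumably treat a term like “blue sky” differently from “haze.”

```python
from collections import defaultdict

# Keywords quoted in the article (English renderings); the study's full
# term list is not given, so this tuple is illustrative only.
KEYWORDS = ("dust", "cough", "haze", "blue sky")


def daily_keyword_share(posts, keywords=KEYWORDS):
    """Return {day: fraction of that day's posts mentioning any keyword}.

    `posts` is an iterable of (day, text) pairs. This is a hypothetical
    stand-in for an index, not the published Air Discussion Index formula.
    """
    totals = defaultdict(int)  # posts seen per day
    hits = defaultdict(int)    # posts per day containing at least one keyword
    for day, text in posts:
        totals[day] += 1
        lowered = text.lower()
        if any(keyword in lowered for keyword in keywords):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in totals}


# Made-up sample posts, purely for demonstration.
posts = [
    ("2013-01-12", "Thick haze again, cannot stop the cough"),
    ("2013-01-12", "Dinner with friends tonight"),
    ("2013-01-13", "Finally a blue sky over the city"),
    ("2013-01-13", "Blue sky and no dust today"),
]
shares = daily_keyword_share(posts)
print(shares)
```

Normalizing by the day's total post count, rather than using raw keyword counts, keeps a busy day on the platform from looking like a polluted one.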
“What we’ve introduced with the Air Discussion Index is analogous to what Google has done in predicting flu trends,” Cohan continued. “Just as Google is able to look at search terms and figure out where there’s a flu outbreak, we’ve determined that we can look at Weibo messages and predict how bad the air pollution was on that particular day.”
They compared their model with hourly sensor readings taken by the (more open) U.S. Embassy in Beijing, and found it could predict these measurements with an accuracy of 88.2 percent. That figure dipped in other cities, where pollution wasn’t so severe and Weibo posts were less common.
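The article does not say how the 88.2 percent figure was calculated. One common way to check how closely a text-derived index tracks sensor readings is a Pearson correlation between the two daily series, sketched below in Python. The function name `pearson` and the sample numbers are assumptions for illustration, and the sketch assumes both series have already been aligned to one value per day.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Made-up values: a daily index and daily-averaged sensor readings.
index_values = [0.10, 0.20, 0.30, 0.40]
sensor_values = [50.0, 100.0, 150.0, 200.0]
r = pearson(index_values, sensor_values)
print(round(r, 3))
```

A value near 1 would mean the index rises and falls with the measured pollution; a value near 0 would mean the posts carry little information about it.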
“What makes China unique is that the air pollution levels are so high that people were actually discussing them on social media,” Cohan said. “Even when they weren’t addressing them directly, they’d often make reference to the haziness or color of the sky. You probably couldn’t do this in U.S. cities, where pollution isn’t so visible. However, in places like Beijing, where you can literally see the air pollution, it really does become something a lot of people speak about regularly.”
Going forward, he suggested the approach could be extended to other parts of the world to test how well the model holds up.
“There are large parts of the developing world where we don’t have high quality measurements related to air pollution,” he concluded. “This points to the possibility that we could use social media to infer how bad the pollution is in those locations. It won’t work everywhere, but in some of the world’s most polluted mega-cities it definitely could be part of a potential solution.”