There are patterns in the news that go deeper than the punditry we get from today’s 24-hour cycles. Some of them are so subtle, in fact, that it takes a big data analysis to identify them.
A team of artificial intelligence researchers and social scientists from the University of Bristol recently did just that. In a landmark study, they analyzed the contents of 35 million articles in 100 British local newspapers, and were able to track the emergence of new technologies, epidemics, and subtle changes in the social and political environment.
Over the past 10 years, Nello Cristianini and his colleagues have been analyzing the news, developing a number of tools to pick apart content for large-scale study. Their aim has been to determine whether societal shifts, such as moves toward women’s rights or away from political views, could be identified through statistical footprints left in the text.
Partnering with the company findmypast, Cristianini and his team used digitized texts available in the British Newspaper Archive, which included some 28.6 billion words and 35 million articles, spanning 150 years from 1800 to 1950.
“At this stage we are very much focused on checking that we can find useful information, so we look for things that we expect to find,” Cristianini told Digital Trends. “This is why we focused on epidemics, coronations, conclaves, wars, et cetera. But even here we have obtained new information, for example the trajectory followed by new technologies.”
The researchers were able to track subtle changes that spawned great transitions, such as the rise of electricity in big cities as it began to overtake steam power in the early 1900s. They were also able to pinpoint the suffragette movement from 1906 to 1918 and watch as football (or soccer for us Yanks) became more popular than cricket beginning in 1909.
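For a rough sense of what tracking a shift like steam versus electricity could look like in practice, here is a minimal Python sketch. It is not the Bristol team's method, which this article does not describe in detail. It simply assumes a trend can be approximated by counting how often a word appears each year relative to all words printed that year. The function names (`yearly_relative_frequency`, `first_crossover`) and the three toy "articles" are invented for illustration.

```python
from collections import Counter, defaultdict

def yearly_relative_frequency(articles, terms):
    """Return {term: {year: occurrences of term per word printed that year}}."""
    term_counts = defaultdict(Counter)  # year -> counts of the tracked terms
    total_words = Counter()             # year -> total number of words seen
    tracked = set(terms)
    for year, text in articles:
        words = text.lower().split()
        total_words[year] += len(words)
        for word in words:
            if word in tracked:
                term_counts[year][word] += 1
    return {
        term: {
            year: term_counts[year][term] / total_words[year]
            for year in sorted(total_words)
            if total_words[year]
        }
        for term in terms
    }

def first_crossover(freqs, rising, falling):
    """Return the first year in which `rising` is more frequent than `falling`, or None."""
    for year in sorted(freqs[rising]):
        if freqs[rising][year] > freqs[falling][year]:
            return year
    return None

# Toy, made-up articles for illustration only (not data from the study).
toy_articles = [
    (1890, "steam engines power the mill and steam trains run daily"),
    (1900, "steam power remains common while electricity arrives in town"),
    (1910, "electricity lights the streets and electricity drives the trams"),
]
freqs = yearly_relative_frequency(toy_articles, ["steam", "electricity"])
crossover_year = first_crossover(freqs, "electricity", "steam")
```

Dividing by each year's total word count matters because the number of pages in an archive varies from year to year, so raw counts alone could mistake a bigger archive for a bigger trend.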
In the future, Cristianini hopes such data-driven approaches can help inform historical and social science studies. “Enabling historians to integrate and fuse multiple sources of information will become an increasingly important activity,” he said. His team is now applying similar methods to the troves of Twitter content available online.
A paper detailing their work was recently published in the journal Proceedings of the National Academy of Sciences.