It might sound like a plot point in a Dan Brown novel, but it’s not: A computer science undergraduate really has managed to hide the complete works of William Shakespeare, one of the world’s greatest writers, in a single tiny image shared in a tweet. The feat, pulled off by David Buchanan, a third-year student at Cardiff University in the U.K., is an amazing demonstration of how computers can be used to hide messages in plain sight.
“Twitter filters most metadata from images, presumably for privacy and data usage reasons,” Buchanan told Digital Trends. “However, I found that ‘ICC profile’ metadata is left untouched. So I crafted an image file which also contains a ZIP archive inside its ICC profile. The ZIP file format is flexible enough that I was able to make the file simultaneously valid as a JPEG and ZIP file. For technical reasons, the contents of the ZIP file had to be split into 64-kilobit chunks, so I used a multipart RAR archive, which finally contained the text document.”
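Buchanan does not spell out the "technical reasons" for the chunking. One plausible factor is the JPEG container itself: each JPEG segment has a 16-bit length field, so an embedded ICC profile is carried in a series of APP2 segments, each holding at most 65,519 bytes of profile data. The sketch below shows that layout under those assumptions. The helper name `icc_segments` is hypothetical, and this is not Buchanan's own code.

```python
import struct

APP2_MARKER = b"\xff\xe2"        # JPEG APP2 marker, used for ICC profiles
ICC_TAG = b"ICC_PROFILE\x00"     # 12-byte identifier at the start of each segment
MAX_SEGMENT = 0xFFFF             # 16-bit length field (it counts its own 2 bytes)
# Room left for profile data: length field (2) + tag (12) + sequence/count (2)
MAX_CHUNK = MAX_SEGMENT - 2 - len(ICC_TAG) - 2   # 65519 bytes


def icc_segments(profile: bytes) -> list[bytes]:
    """Split ICC profile bytes into JPEG APP2 segments (illustrative helper)."""
    chunks = [profile[i:i + MAX_CHUNK] for i in range(0, len(profile), MAX_CHUNK)]
    if len(chunks) > 255:
        raise ValueError("sequence numbers are one byte, so at most 255 segments")
    segments = []
    for seq, chunk in enumerate(chunks, start=1):
        # Body: identifier, 1-based sequence number, total count, then the data
        body = ICC_TAG + bytes([seq, len(chunks)]) + chunk
        segments.append(APP2_MARKER + struct.pack(">H", len(body) + 2) + body)
    return segments


# Stand-in payload; a real one would be the archive hidden in the profile
payload = bytes(range(256)) * 600          # 153,600 bytes
segments = icc_segments(payload)
print(len(segments), [len(s) for s in segments])
```

Any payload larger than one segment ends up interrupted by segment headers, which may be why a format designed to be split into parts, such as a multipart RAR archive, was convenient here.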
Assuming this all works out, the image in this tweet is also a valid ZIP archive, containing a multipart RAR archive, containing the complete works of Shakespeare.
This technique also survives twitter’s thumbnailer :P pic.twitter.com/P0Owq9abRC
— Dаvіd Вucһаnаn (@David3141593) October 29, 2018
Got that? Okay, so it’s not something that most of us are going to worry about when sharing images on Twitter, but it’s an impressive demo of how much raw data can be squeezed into a tweet. By embedding a ZIP file of the complete Shakespeare in a portrait of The Bard himself, Buchanan carries far more text in one tweet than Twitter’s current limit of 280 characters allows. (While we don’t know exactly how many letters are in the complete works of Shakespeare, according to the Folger Library there are 884,647 words in total.)
“There are two broad terms you could use to describe this technique,” Buchanan said. “[One is] steganography, which is the art of hiding information inside some other data. Modern-day steganography typically aims to be completely undetectable, which my technique is certainly not. A more accurate description of this technique would be a polyglot file, which is used to describe a file which can be simultaneously interpreted as multiple different data formats, depending on what software reads it.”
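To make the polyglot idea concrete, here is a minimal sketch of a simpler variant than the one Buchanan describes: it appends a ZIP archive after the image bytes instead of placing it inside the ICC profile. It relies on the fact that ZIP readers locate the archive's directory from the end of the file, while image decoders read from the start. The function name and the stand-in image bytes are assumptions for illustration, and data appended this way may well not survive Twitter's image processing.

```python
import io
import zipfile


def make_polyglot(image_bytes: bytes, files: dict[str, str]) -> bytes:
    """Return bytes that begin as an image and end as a ZIP archive."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    # The image comes first, so image software sees a normal header;
    # the ZIP directory sits at the end, where ZIP software looks for it.
    return image_bytes + archive.getvalue()


# Stand-in for real JPEG data: start-of-image marker, filler, end-of-image marker
fake_jpeg = b"\xff\xd8" + b"\x00" * 64 + b"\xff\xd9"
polyglot = make_polyglot(fake_jpeg, {"sonnet18.txt": "Shall I compare thee to a summer's day?"})

# The same bytes still open as a ZIP archive
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    print(zf.namelist())
    print(zf.read("sonnet18.txt").decode())
```

Which interpretation wins depends entirely on the software reading the file, which is the property Buchanan describes.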
Buchanan said that after he discovered the technique, he submitted it to Twitter’s bug bounty program, which pays people who find Twitter vulnerabilities that hackers could exploit. Twitter turned it down on the basis that it had no potential security impact, but Buchanan decided to have some fun with it nonetheless. While other people have hidden files in social media posts, Buchanan said that, to his knowledge, he is the first person to do it on Twitter.