Deep-learning artificial intelligence is helping tackle plenty of problems in the modern world. It also has a part to play in solving some ancient ones, such as assisting in the translation of 2,500-year-old clay tablet documents from Persia’s Achaemenid Empire.
These tablets, which were discovered in modern-day Iran in 1933, have been studied by scholars for decades. However, they’ve found the translation process for the tablets — which number in the tens of thousands — to be laborious and prone to errors. A.I. technology can help.
“We have initial experiments applying machine learning to identify which cuneiform symbols are present in images of a tablet,” Sanjay Krishnan, assistant professor at the University of Chicago’s Department of Computer Science, told Digital Trends. “Machine learning works by extrapolating patterns from human-labeled examples, and this allows us to automate the annotations in the future. We envision that it is a step toward significant automation in the analysis and study of these tablets.”
In this case, the human-labeled examples are annotated tablets from the Persepolis Fortification Archive’s (PFA) Online Cultural and Historical Research Environment (OCHRE) dataset. DeepScribe, a collaboration between researchers from the University of Chicago’s Oriental Institute and its Department of Computer Science, used a training set of more than 6,000 annotated images to build a neural network able to read unanalyzed tablets in the collection.
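To make the idea of learning from human-labeled examples concrete, here is a minimal sketch in Python. It is not DeepScribe's neural network: it substitutes a far simpler nearest-centroid classifier, and the tiny four-pixel "images" and the labels `SIGN_A` and `SIGN_B` are invented placeholders rather than real tablet data.

```python
from collections import defaultdict
from math import dist


def train_centroids(examples):
    """Average the feature vectors of each label into one centroid per sign."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(centroids, features):
    """Return the label whose centroid lies closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical 2x2 "images" flattened to four pixel intensities
# (0 = smooth clay, 1 = wedge impression). Labels are placeholders,
# not real cuneiform sign names.
labeled_examples = [
    ([1.0, 0.9, 0.0, 0.1], "SIGN_A"),
    ([0.9, 1.0, 0.1, 0.0], "SIGN_A"),
    ([0.0, 0.1, 1.0, 0.9], "SIGN_B"),
    ([0.1, 0.0, 0.9, 1.0], "SIGN_B"),
]

centroids = train_centroids(labeled_examples)

# An unlabeled image that resembles the SIGN_A examples.
prediction = classify(centroids, [0.8, 0.7, 0.2, 0.1])
print(prediction)
```

The principle carries over to a real system: the labels supplied by scholars define what each sign looks like, and new images are assigned to whichever learned pattern they match best.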
When the algorithm was tested on other tablets, it was able to identify the cuneiform signs with an accuracy of around 80%. The hope is to raise this figure in the future. Even if that doesn’t happen, though, the system could be used to read large numbers of the tablets, leaving human scholars to focus their efforts on the really difficult bits.
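The two ideas in this paragraph, measuring accuracy on tablets the model has not seen and leaving the hard cases to people, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the article does not say that the system reports confidence scores, and the sample labels, tablet names, and the 0.9 threshold are invented.

```python
def accuracy(predictions, truths):
    """Fraction of predictions that match the human-assigned labels."""
    correct = sum(p == t for p, t in zip(predictions, truths))
    return correct / len(truths)


def triage(scored_predictions, threshold):
    """Split (item, label, confidence) triples into accepted and human-review lists."""
    accepted, needs_review = [], []
    for item, label, confidence in scored_predictions:
        target = accepted if confidence >= threshold else needs_review
        target.append((item, label))
    return accepted, needs_review


# Hypothetical held-out results: 4 of 5 predictions agree with the scholars.
predicted = ["A", "B", "A", "C", "B"]
labeled = ["A", "B", "A", "C", "C"]
held_out_accuracy = accuracy(predicted, labeled)

# Hypothetical confidence scores; the 0.9 cutoff is an arbitrary assumption.
scored = [
    ("tablet_1", "A", 0.97),
    ("tablet_2", "B", 0.55),
    ("tablet_3", "C", 0.92),
]
accepted, needs_review = triage(scored, threshold=0.9)
print(held_out_accuracy, accepted, needs_review)
```

In a workflow like this, raising the threshold sends more items to scholars and lowers the risk of accepting a wrong reading, while lowering it automates more of the collection.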
“Cuneiform is a script used since the third millennium BCE to write multiple languages including Sumerian, Akkadian, and Elamite,” Susanne Paulus, associate professor for Assyriology, told Digital Trends.
Cuneiform poses a series of particular challenges for machine translation. Firstly, it was written by impressing a reed stylus into wet clay. This makes cuneiform one of very few three-dimensional script systems. Secondly, cuneiform is a complex script system using hundreds of signs. Each sign has different meanings depending on its context. Thirdly, cuneiform tablets are ancient artifacts. They are often broken and hard to decipher, which means reading one tablet can take days.
“So far, we have an initial prototype that suggests that such techniques are very effective in a controlled setting,” Krishnan said. “Given a clean image of a single symbol, [we can] determine what the symbol is. Our next step is to develop more robust models that account for context and data quality.”