Exactly what constitutes hate speech is one of the most hotly contested topics of 2018. One reason it is so vigorously debated is that it is so difficult to define. And if humans find hate speech hard to define, machines find it even harder, as a new survey of seven computer systems built to identify such speech online makes clear. The study also shows just how easy these systems are to circumvent.
Researchers from Finland’s Aalto University analyzed various anti-hate speech systems, including tools built by Google’s Counter Abuse team. Their findings? Not only do the systems used to flag offensive content online fail to agree on a workable definition of hate speech, they can also be fooled by little more than a typo or a letter substitution.
“Researchers and companies have suggested various text analysis and machine learning methods for automatic hate speech detection,” Tommi Gröndahl, one of the researchers on the project, told Digital Trends. “These systems are trained with examples of hateful and non-hateful text, with the goal of generalizing beyond the training examples. We applied a system trained with one data set to other data sets. We discovered that none of them worked well on other data sets. This indicates that what is called ‘hate speech’ differs a lot between existing data sets, and cannot be treated as a clearly definable property. Given this, we should not expect A.I. to replace humans completely in this task, as human labor continues to be required to make the final decisions on what constitutes hate speech proper.”
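To get a feel for the kind of cross-data-set test Gröndahl describes, consider the following minimal sketch. It is not the researchers' code, and the tiny example data sets and labels are made up for illustration; it simply shows the general idea of training a text classifier on one labeled collection and evaluating it on a collection gathered elsewhere, where the notion of "hateful" may differ.

```python
# Sketch only: train on one hate-speech data set, evaluate on another.
# The texts and labels below are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Data set A: (text, label) pairs, label 1 = hateful, 0 = not hateful
train_texts = ["you people are vermin", "have a great day",
               "go back where you came from", "see you at lunch"]
train_labels = [1, 0, 1, 0]

# Data set B: collected elsewhere, with a different idea of what counts as hateful
test_texts = ["all of them are criminals", "lovely weather today"]
test_labels = [1, 0]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)  # words unseen in training are simply dropped

clf = LogisticRegression().fit(X_train, train_labels)
print("cross-data-set accuracy:", accuracy_score(test_labels, clf.predict(X_test)))
```

A classifier like this can score well on held-out examples from its own data set while stumbling badly on another, which is the pattern the Aalto team reported across all seven systems.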
The researchers next demonstrated how all seven systems could be easily fooled by simple automatic text transformation attacks, such as making small changes to words, introducing or removing spaces, or adding unrelated words. For example, adding the word “love” to an otherwise hate-filled message confuses detection systems. These tricks fooled both straightforward keyword filters and more complex A.I. systems based on deep-learning neural network architectures, as the sketch below illustrates.
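Here is a toy illustration of why such attacks work, not the paper's own code: a naive keyword filter with a hypothetical blocklist, defeated by an inserted space or a one-character substitution. The "add the word 'love'" trick targets learned classifiers rather than keyword matching, so it is noted only in the comments.

```python
# Sketch only: a naive keyword filter and two surface-level evasions.
# Learned classifiers are attacked differently, e.g. by appending a benign
# word such as "love", which this simple filter does not model.
BLOCKLIST = {"vermin", "scum"}  # hypothetical blocked terms


def is_flagged(text: str) -> bool:
    """Flag a message if any blocklisted word appears as a whole token."""
    return any(token in BLOCKLIST for token in text.lower().split())


print(is_flagged("those people are vermin"))   # True: exact token match
print(is_flagged("those people are verm in"))  # False: inserted space splits the token
print(is_flagged("those people are v3rmin"))   # False: one-character substitution
```

Defending against this with a bigger blocklist quickly turns into a losing game of whack-a-mole, which is part of why the researchers argue human review remains necessary.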
That today’s flagging tools are inadequate for dealing with online hate speech is no great shock. While we’ve covered some innovative cutting-edge projects in this domain, research such as this reveals just how much work there is still to do. Hopefully, projects like this one will prompt researchers to double down on the challenge rather than throw up their hands in defeat.