It’s no secret that there are many, many fake Twitter accounts out there — and we’re not just talking about people who pretend to be celebrities to cause PR headaches.
A new research project developed at Carnegie Mellon University has set out to solve this problem with an algorithm called FRAUDAR.
“On Twitter and social media, popularity is important,” Christos Faloutsos, professor of machine learning and computer science, told Digital Trends. “If I have 500 followers and you have 10,000 then you appear more important than me. As a result, there are companies on the internet which sell fake followers. Twitter, Facebook and … other companies want to suppress this kind of behavior. The goal of our work was therefore to figure out a good way to allow them to do so.”
FRAUDAR is built on graph mining, a family of techniques for finding patterns in data that describes connections between things. In this case, it searches for a “bipartite core”: a group of users who interact with members of a second group but not with each other. Such a pattern suggests the accounts may be fraudulent, existing solely to generate false interactions such as fake reviews.
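To make the idea of a bipartite core concrete, here is a minimal sketch in Python. It is not taken from the FRAUDAR code. The account names, product names, and the helper functions `block_density` and `interacts_internally` are all hypothetical, invented for illustration. It measures how completely one group interacts with a second group, and checks whether members of a group interact among themselves.

```python
from itertools import product

def block_density(edges, group_a, group_b):
    """Fraction of possible (a, b) pairs that are actual interactions."""
    possible = len(group_a) * len(group_b)
    if possible == 0:
        return 0.0
    present = sum((a, b) in edges for a, b in product(group_a, group_b))
    return present / possible

def interacts_internally(edges, group):
    """True if any two distinct members of the group interact with each other."""
    return any((x, y) in edges for x, y in product(group, group) if x != y)

# Toy data (hypothetical): three accounts each "review" the same two products,
# while an ordinary user has a more varied pattern.
edges = {
    ("bot1", "prodA"), ("bot1", "prodB"),
    ("bot2", "prodA"), ("bot2", "prodB"),
    ("bot3", "prodA"), ("bot3", "prodB"),
    ("alice", "prodA"), ("alice", "bob"),
}
bots = ["bot1", "bot2", "bot3"]
products = ["prodA", "prodB"]

print(block_density(edges, bots, products))   # every bot touches every product
print(interacts_internally(edges, bots))      # the bots never touch each other
```

A group that scores close to 1.0 on density while having no internal interactions matches the pattern described above. Real data is far noisier, so a check like this is only a starting point.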
An added challenge, however, is that these fraudulent users typically camouflage themselves — and can even go as far as using real user accounts that have been hijacked. FRAUDAR strips away this camouflage by starting with accounts it can confidently confirm as being legitimate and then working outward to find the bipartite core.
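One generic way to isolate a dense block like this is “greedy peeling”: repeatedly discard the least-connected account and remember the densest group seen along the way. The sketch below is an illustrative assumption, not the researchers' implementation. The function `densest_block`, the plain edges-per-node score, and the toy accounts are all hypothetical, and a production system would need a scoring rule that is harder for camouflaged accounts to game.

```python
from collections import defaultdict

def densest_block(edges):
    """Greedy peeling: repeatedly drop the node with the fewest remaining
    edges and remember the densest (edges per node) subgraph seen."""
    neighbors = defaultdict(set)
    for user, target in edges:
        # Tag each side so a user and a target may share a name.
        neighbors[("u", user)].add(("t", target))
        neighbors[("t", target)].add(("u", user))
    alive = set(neighbors)
    n_edges = len(set(edges))
    best_score, best_nodes = 0.0, set()
    while alive:
        score = n_edges / len(alive)
        if score > best_score:
            best_score, best_nodes = score, set(alive)
        # Remove the node with the fewest remaining edges (ties broken by name).
        victim = min(alive, key=lambda n: (len(neighbors[n]), n))
        for other in neighbors[victim]:
            neighbors[other].discard(victim)
        n_edges -= len(neighbors[victim])
        neighbors[victim] = set()
        alive.remove(victim)
    users = {name for kind, name in best_nodes if kind == "u"}
    targets = {name for kind, name in best_nodes if kind == "t"}
    return users, targets, best_score

# Toy data (hypothetical): four bots hit the same three products, and one bot
# also follows a popular account as camouflage alongside ordinary users.
bots = ["bot1", "bot2", "bot3", "bot4"]
products = ["p1", "p2", "p3"]
edges = [(b, p) for b in bots for p in products]
edges += [("alice", "celeb"), ("bob", "celeb"), ("carol", "celeb"),
          ("bot1", "celeb"), ("alice", "p1")]

users, targets, score = densest_block(edges)
print(sorted(users), sorted(targets))
```

On this toy graph the ordinary users and the popular account are peeled away first, leaving the four bots and three products as the densest block despite the camouflage link.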
It’s smart stuff, and it won the Best Paper Award at last month’s Association for Computing Machinery Conference on Knowledge Discovery and Data Mining (KDD 2016) in San Francisco. In an experiment using Twitter data covering 41.7 million users and 1.47 billion follower relationships, FRAUDAR uncovered more than 4,000 accounts that had not previously been flagged as fraudulent.
Best of all, Professor Faloutsos and his team have published the algorithm online for free as open-source code so that companies can use it as they wish.
“In academia, the more [important] thing is the number of citations and the impact we’re seen as having,” is how Faloutsos explained the decision to pass on the valuable tool for free. “We could sell 100 copies and make a bit of money, or we can give it away to companies and it’ll be a good publicity tool for the students who worked on it. It’s the tradeoff between glory and money: it’s much better for us to give our research away.”