A Plan for Eliminating Toxic Users on Twitter using Artificial Intelligence & Trustworthiness

Dare Obasanjo
Aug 22, 2018

Twitter as a service is widely known for the toxic behavior of a subset of its users. In addition, the company has done a worse job than its peers of giving people the impression that it cares about, or knows how to deal with, these toxic users.

One key flaw in Twitter’s approach to dealing with toxic users is revealed in this interview with Jack Dorsey, its CEO, excerpted below:

“Our model right now relies heavily on people reporting violations and reporting things they find suspicious, and then we can act on it,” he said. “But ultimately we want to take that reporting burden off the individual and automate a lot more of this.”

There are a number of drawbacks to this approach. Chief among them is that by putting the burden of policing toxic behavior on users, you risk that behavior going unreported. Either the victims don’t want to deal with the burden of reporting, especially given Twitter’s poor reputation for handling reports, or the behavior occurs within a close circle whose members wouldn’t report each other, such as Alex Jones telling his legion of fans to get their “battle rifles” in a live stream.

Given that its users post over 350,000 tweets per minute, it is a tall order for Twitter to review every piece of content posted to the site in a cost-effective manner, especially less accessible content like a video or live stream. However, software engineering is often about finding shortcuts (aka hacks/optimizations) that address 80% of the needs extremely well, even if you need to do more work the other 20% of the time.

It was recently revealed that Facebook now uses a trustworthiness score to gauge a user’s propensity to post fake news. This same approach actually applies quite well to abusive behavior. A user who harasses or abuses other users is a lower value user to the service since, regardless of how many ads they are clicking on or tweets they are posting, they are making someone else’s experience worse.

Once a user is determined to exhibit toxic behaviors, whether automatically detected (e.g. calling people names in replies with easily determinable insults such as c*nt or n*gger) or via reports, they get a negative score, and each additional infraction is cumulatively added to that negative score. Their usage of the service should then be steadily restricted (e.g. they can no longer DM or reply to people who don’t follow them) as their trustworthiness score drops. This isn’t a new idea, since sites like Slashdot & Reddit have a karma score which serves the same purpose.
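To make the idea concrete, here is a minimal sketch of a cumulative score with steadily tightening restrictions. Everything named in it is an assumption for illustration: the `TrustRecord` class, the penalty values, the thresholds and the restriction names are hypothetical and not anything Twitter is known to use.

```python
from dataclasses import dataclass, field

# Hypothetical penalty per kind of infraction; the values are illustrative only.
PENALTIES = {"auto_detected_insult": 5, "upheld_user_report": 3}

# Hypothetical tiers: a restriction applies once the score is at or below its threshold.
RESTRICTIONS = [
    (-5, "no_replies_to_non_followers"),
    (-10, "no_dms_to_non_followers"),
    (-20, "suspended_pending_review"),
]


@dataclass
class TrustRecord:
    """Per-user trustworthiness record; the score only moves down on infractions."""
    score: int = 0
    infractions: list = field(default_factory=list)

    def record_infraction(self, kind: str) -> None:
        # Each infraction is cumulatively added to the negative score.
        self.score -= PENALTIES[kind]
        self.infractions.append(kind)

    def restrictions(self) -> list:
        # Collect every restriction whose threshold the score has reached.
        return [name for threshold, name in RESTRICTIONS if self.score <= threshold]
```

A user with one automatically detected insult would lose the ability to reply to non-followers under these made-up thresholds, and two upheld reports on top of that would also cut off DMs to non-followers.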

However, one thing unique to social media sites like Twitter is the roving hordes of bad users who participate in organized harassment campaigns and other internet mob-like behavior. Once you have a notion of a trustworthiness score, you can also apply it to groups. For example, anyone who follows multiple toxic accounts, shares hashtags linked to abusive behavior or exhibits other characteristics that correlate with being a toxic user can be placed in a cohort with a particular trustworthiness score using machine learning techniques.
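As a rough illustration of cohort scoring, the sketch below turns two of the signals mentioned above (toxic accounts followed, abusive hashtags shared) into a risk value. It uses a hand-weighted logistic function as a stand-in for a trained model; the weights, bias, cutoff and function names are all hypothetical, and a real system would learn them from labeled data.

```python
import math

# Hypothetical weights and bias; a real system would learn these from labeled data.
WEIGHTS = {"toxic_follow_ratio": 4.0, "abusive_hashtag_ratio": 3.0}
BIAS = -3.0


def cohort_risk(following: set, hashtags: set,
                toxic_accounts: set, abusive_hashtags: set) -> float:
    """Return a value between 0 and 1; higher means more toxic-cohort signals."""
    # Share of followed accounts that are already known to be toxic.
    toxic_follow_ratio = (
        len(following & toxic_accounts) / len(following) if following else 0.0
    )
    # Share of the user's hashtags that are linked to abusive behavior.
    abusive_hashtag_ratio = (
        len(hashtags & abusive_hashtags) / len(hashtags) if hashtags else 0.0
    )
    z = (BIAS
         + WEIGHTS["toxic_follow_ratio"] * toxic_follow_ratio
         + WEIGHTS["abusive_hashtag_ratio"] * abusive_hashtag_ratio)
    return 1.0 / (1.0 + math.exp(-z))


def cohort_label(risk: float, cutoff: float = 0.5) -> str:
    # The cohort only determines how much scrutiny the user gets, not a penalty.
    return "high_scrutiny" if risk >= cutoff else "normal"
```

The output is deliberately just a label for how closely to look at a user, so that cohort membership on its own never triggers a restriction.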

This isn’t a form of pre-crime but instead a signal that the system should treat them with more scrutiny. For example, using the word “whore” or “idiot” in a reply may be abusive in certain contexts and fine in others. Reviewing every instance of these is obviously infeasible, but if you focus on users whose individual or cohort trustworthiness score indicates a propensity for abuse, you can significantly reduce how much content you have to cover.
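The filtering step described above could look something like the following sketch. It only queues replies that contain a context-dependent term and whose author has a low individual or cohort score. The `review_queue` function, the reply dictionary shape, the term list and the threshold are all assumptions made for the example.

```python
# Hypothetical list of words that are abusive in some contexts and fine in others.
AMBIGUOUS_TERMS = {"idiot"}


def review_queue(replies, individual_scores, cohort_scores, threshold=-5):
    """Return the replies worth reviewing, lowest-trust authors first.

    replies: list of dicts with "author" and "text" keys (assumed shape).
    individual_scores / cohort_scores: dicts mapping author to a score,
    where more negative means less trustworthy; missing authors count as 0.
    """
    queue = []
    for reply in replies:
        # Normalize words so that "Idiot!" still matches "idiot".
        words = {w.strip(".,!?\"'").lower() for w in reply["text"].split()}
        if not words & AMBIGUOUS_TERMS:
            continue
        author = reply["author"]
        # Use whichever of the two scores signals more risk.
        worst = min(individual_scores.get(author, 0), cohort_scores.get(author, 0))
        if worst <= threshold:
            queue.append((worst, reply))
    queue.sort(key=lambda item: item[0])
    return [reply for _, reply in queue]
```

Replies from users in good standing never enter the queue, which is where the reduction in review volume comes from.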

Eventually the system will start to take action against abusive users, either automatically or after human review, without others having to report them, which leads to happier users. 😊

Now Playing: 2 Chainz, Bigger Than You (featuring Drake & Quavo)


