Trolls and bots are widespread across social media, and they influence us in ways we are not always aware of. Trolls can be relatively harmless, just trying to entertain themselves at others’ expense, but they can also be political actors sowing mistrust or discord. While some bots offer helpful information, others can be used to manipulate vote counts and promote content that supports their agenda. We’ll show you how machine learning can help protect our communities from abuse.


In the first part of this series, we covered the problem of trolls and bots on the popular site Reddit. We described how we built a dashboard to moderate suspected trolls and bots. In this part, we’ll show you how we used machine learning to detect bots and trolls in political discussions and then mark suspicious comments on our moderator dashboard.

Background on troll and bot detection

Troll and bot detection is a relatively new field. Historically, companies have employed human moderators to detect and remove content that’s inconsistent with their terms of service. However, this manual process is expensive, plus it can be emotionally tiring for humans to review the worst content. We will quickly hit the limits of human moderator effectiveness as new technologies like OpenAI’s GPT-2 natural language generation are unleashed. As bots improve, it is important to employ counter technologies to protect the integrity of online communities.

Several studies have been done on bot detection. For example, one researcher found competing pro-Trump and anti-Trump bots on Twitter. Researchers at Indiana University provide a tool to check Twitter users called BotOrNot.

There’s also been interesting research on online trolls. Research from Stanford has shown that just 1% of accounts create 74% of conflict. Researchers at Georgia Tech used a natural language processing model to identify users who violate norms with behavior like making personal attacks, misogynistic slurs, or even mansplaining.

Screening comments for moderation

Our goal is to create a machine learning model to screen comments on the politics subreddit for moderators to review. It doesn’t need to have perfect accuracy since the comments will be reviewed by a human moderator. Instead, our measure of success is how much more efficient we can make human moderators. Rather than needing to review every comment, they will be able to review a prescreened subset. We are not trying to replace the existing moderation system that Reddit provides, which allows moderators to review comments that have been reported by users. Instead, this is an additional source of information that can complement the existing system.

As described in our part one article, we have created a dashboard allowing moderators to review the comments. The machine learning model will score each comment as being a normal user, a bot, or a troll.

Try it out for yourself at

To set your expectations, our system is designed as a proof of concept. It’s not meant to be a production system and is not 100% accurate. We’ll use it to illustrate the steps involved in building a system, with the hopes that platform providers will be able to offer official tools like these in the future.

Collecting training data

Our initial training dataset was collected from lists of known bots and trolls. We’ll use two lists of bots: 393 known bots plus 167 more from the botwatch subreddit. We’ll also use a list of 944 troll accounts from Reddit’s 2017 Transparency Report that were suspected of working for the Russian Internet Research Agency.

We are using an event-driven architecture consisting of a process that downloads data from Reddit and pushes it into a Kafka queue. We then have a Kafka consumer that writes the data into a Redshift data warehouse in batches. We wrote a Kafka producer application to download the comments from the list of bots and trolls. As a result, our data warehouse contains not only the data from the known bots and trolls, but also real-time comments from the politics subreddit.
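The producer’s core job is simply to serialize each comment and push it onto the queue. Here’s a minimal sketch of that logic, assuming the kafka-python package and a `reddit_comments` topic; the function and topic names are our own illustrations, not the exact code we ran:

```python
import json

def comment_to_message(comment):
    """Serialize a comment dict into UTF-8 JSON bytes for Kafka."""
    return json.dumps(comment, sort_keys=True).encode("utf-8")

def publish_comments(producer, comments, topic="reddit_comments"):
    """Push each comment onto the Kafka topic and wait for delivery."""
    for comment in comments:
        producer.send(topic, comment_to_message(comment))
    producer.flush()  # block until all buffered messages are sent
```

With kafka-python, `producer` would be a `KafkaProducer(bootstrap_servers=...)`; the consumer on the other side decodes the JSON and batches rows into Redshift.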

While Reddit comments aren’t exactly private, you may have data that is private. For example, you may have data that’s regulated by HIPAA or PCI, or is sensitive to your business or customers. We followed a Heroku reference architecture that was designed to protect private data. It provides a Terraform script to automatically configure a Redshift data warehouse and connect it to a Heroku Private Space. As a result, only apps running in the Private Space can access the data.

We can either train our model on a dyno directly or run a one-off dyno to download the data to CSV and train the model locally. We’ll choose the latter for simplicity, but you’d want to keep sensitive data in the Private Space.

heroku run bash -a kafka-stream-viz-jorge
export PGPASSWORD=<password>
echo "select * from reddit_comments" | psql -h -U jorge -d redshift_jorge -p 5439 -A -o reddit.csv
gzip reddit.csv
curl -F "file=@reddit.csv.gz"

If you prefer to use our training data to try it out yourself, you can download our CSV.

Now we have comments from both sets of users, for a total of 93,668. The ratios between the classes are fixed at 5% trolls, 10% bots, and 85% normal. This is useful for training, but it likely underestimates the true percentage of normal users.

Selecting features

Next, we need to select features to build our model. Reddit provides dozens of JSON fields for each user and comment. Some don’t have meaningful values. For example, banned_by was null in every case, probably because we lack moderator permissions. We picked the fields below because we thought they’d be valuable as predictors or to understand how well our model performs. We added the column recent_comments with an array of the last 20 comments made by that user.


Some fields like “score” are useful for historical comments, but not for a real-time dashboard because users won’t have had time to vote on that comment yet.

We added calculated fields that we thought would correlate well with bots and trolls. For example, we suspected that a user’s recent comment history would provide valuable insight into whether they are a bot or troll. If a user repeatedly posts controversial comments with a negative sentiment, perhaps they are a troll. Likewise, if a user repeatedly posts comments with the same text, perhaps they are a bot. We used the TextBlob package to calculate numerical values for each of these. We’ll see whether these features are useful in practice soon.
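One of these calculated fields, `recent_avg_diff_ratio`, measures how similar a user’s recent comments are to each other. The exact formula isn’t shown here, but one plausible implementation uses Python’s built-in difflib; the function below is our own sketch, not necessarily the precise calculation we used:

```python
from difflib import SequenceMatcher
from itertools import combinations

def recent_avg_diff_ratio(comments):
    """Average pairwise similarity ratio over a user's recent comments.

    A value near 1.0 means the comments are nearly identical --
    a hint that the account may be a bot reposting the same text.
    """
    if len(comments) < 2:
        return 0.0
    ratios = [SequenceMatcher(None, a, b).ratio()
              for a, b in combinations(comments, 2)]
    return sum(ratios) / len(ratios)
```

A value near 1.0 flags accounts whose last 20 comments are nearly identical, which is exactly the bot signature the model ends up keying on.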


For more information on what these fields are and how they are calculated, see the code in our Jupyter Notebooks in

Building a machine learning model

Our next step is to create a new machine learning model based on this list. We’ll use Python’s excellent scikit-learn framework to build our model. We’ll store our training data in two data frames: one with the set of features to train on and one with the desired class labels. We’ll then split our dataset into 70% training data and 30% test data.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    input_x, input_y,
    test_size=0.3, random_state=16)

Next, we’ll create a decision tree classifier to predict whether each comment is a bot, a troll, or a normal user. We’ll use a decision tree because the rules it creates are easy to understand. The accuracy would probably improve with a more robust algorithm like a random forest, but we’ll stick to a decision tree to keep our example simple.

clf = DecisionTreeClassifier(max_depth=3,
    class_weight={'normal': 1, 'bot': 2.5, 'troll': 5},
    min_samples_leaf=100)
You’ll notice a few parameters in the above code sample. We are setting the maximum depth of the tree to 3 not only to avoid overfitting, but also so that it’s easier to visualize the resulting tree. We are also setting the class weights so that bots and trolls are less likely to be missed, even at the expense of falsely labeling a normal user. Lastly, we are requiring that the leaf nodes have at least 100 samples to keep our tree simpler.

Now we’ll test the model against the 30% of data we held out as a test set. This will tell us how well our model performs at guessing whether each comment is from a bot, troll, or normal user.

y_pred = clf.predict(X_test)
matrix = pd.crosstab(y_test, y_pred, rownames=['True'], colnames=['Predicted'], margins=True)

This will create a confusion matrix showing, for each true target label, how many of the comments were predicted correctly or incorrectly. For example, we can see below that out of 1,956 total troll comments, we correctly predicted 1,451 of them.

Predicted   bot    normal  troll   All
bot         3677     585      33    4295
normal       197   20593     993   21783
troll          5     500    1451    1956
All         3879   21678    2477   28034

In other words, the recall for trolls is 74%. The precision is lower; of all comments predicted as being a troll, only 58% really are.

Recall : [0.85611176 0.94537024 0.74182004]
Precision: [0.94792472 0.94994926 0.58578926]
Accuracy: 0.917493044160662
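The recall, precision, and accuracy figures above come from scikit-learn’s metric helpers. Here’s a minimal sketch on toy labels (the toy data is ours, purely for illustration; with `average=None` the scores come back per class, ordered bot, normal, troll):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy stand-ins for y_test and the model's predictions.
y_true = ["bot", "bot", "normal", "normal", "troll", "troll"]
y_pred = ["bot", "normal", "normal", "normal", "troll", "bot"]

labels = ["bot", "normal", "troll"]
recall = recall_score(y_true, y_pred, labels=labels, average=None)
precision = precision_score(y_true, y_pred, labels=labels, average=None)
accuracy = accuracy_score(y_true, y_pred)

print("Recall   :", recall)     # per-class: bot, normal, troll
print("Precision:", precision)
print("Accuracy :", accuracy)
```

Swapping in the real `y_test` and `y_pred` reproduces the numbers shown above.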

We can calculate the overall accuracy at 91.7%. The model performed the best for normal users, with about 95% precision and recall. It performed fairly well for bots, but had a harder time distinguishing trolls from normal users. Overall, the results look fairly strong even with a fairly simple model.

What does the model tell us?

Now that we have this great machine learning model that can predict bots and trolls, how does it work and what can we learn from it? A great start is to look at which features were most important.

feature_imp = pd.Series(clf.feature_importances_, index=X_train.columns.values).sort_values(ascending=False)
recent_avg_diff_ratio 0.465169
author_comment_karma 0.329354
author_link_karma 0.099974
recent_avg_responses 0.098622
author_verified 0.006882
recent_min_sentiment_polarity 0.000000
recent_avg_no_follow 0.000000
over_18 0.000000
is_submitter 0.000000
recent_num_comments 0.000000
recent_num_last_30_days 0.000000
recent_avg_gilded 0.000000
recent_avg_sentiment_polarity 0.000000
recent_percent_neg_score 0.000000
recent_avg_score 0.000000
recent_min_score 0.000000
recent_avg_controversiality 0.000000
recent_avg_ups 0.000000
recent_max_diff_ratio 0.000000
no_follow 0.000000

Interesting! The most important feature was the average difference ratio in the text of the recent comments. This means if the text of the last 20 comments is very similar, it’s probably a bot. The next most important features were the comment karma, link karma, the number of responses to recent comments, and whether the account is verified.

Why are the rest zero? We limited the depth of our binary tree to 3 levels, so we are intentionally not including all the features. Of note is that we didn’t consider the scores or sentiment of previous comments to classify the trolls. Either these trolls were fairly polite and earned a decent number of votes, or the other features had better discriminatory power.

Let’s take a look at the actual decision tree to get more information.

from sklearn.tree import export_graphviz

export_graphviz(estimator, out_file='',
    feature_names = data.drop(['target'], axis=1).columns.values,
    class_names = np.array(['normal', 'bot', 'troll']),
    rounded = False, proportion = False,
    precision = 5, filled = True)

Now we can get an idea of how this model works! You might need to zoom in to see the details.

Let’s start at the top of the tree. When the recent comments are fairly similar to each other (the average difference ratio is high), then it’s more likely to be a bot. When they have dissimilar comments, low comment karma, and high link karma, they are more likely to be a troll. This could make sense if the trolls use posts of kittens to pump up their link karma, and then make nasty comments in the forums that either get ignored or downvoted.
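If you’d rather inspect the rules without graphviz, scikit-learn can also dump a tree as plain text. The snippet below trains a tiny tree on made-up data (the feature values are ours, for illustration only) and prints its rules:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny illustrative dataset: [recent_avg_diff_ratio, author_comment_karma]
X = [[0.9, 10], [0.95, 5], [0.1, 5000], [0.2, 8000], [0.15, 20], [0.25, 30]]
y = ["bot", "bot", "normal", "normal", "troll", "troll"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
rules = export_text(
    clf, feature_names=["recent_avg_diff_ratio", "author_comment_karma"])
print(rules)
```

The printed output is a nested series of if/else thresholds ending in class labels, mirroring the structure of the graphviz diagram.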

Hosting an API

To put our machine learning model to work, we need to make it available to our moderator dashboard. We can do that by hosting an API for the dashboard to call.

To serve our API, we used Flask, a lightweight web framework for Python. The server loads our machine learning model when it starts. When it receives a POST request containing a JSON object with the comment data, it responds with the prediction.
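The wiring is only a few lines. Here’s a minimal sketch; the stub model and route are our own placeholders standing in for the real pickled classifier and feature extraction:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

class StubModel:
    """Placeholder for the trained classifier loaded at startup."""
    def predict(self, rows):
        return ["bot"] * len(rows)

model = StubModel()  # real app: unpickle the scikit-learn model here

@app.route("/", methods=["POST"])
def predict():
    comment = request.get_json()
    # Real app: turn `comment` (body, recent_comments, ...) into the
    # same feature vector the model was trained on before predicting.
    label = model.predict([comment])[0]
    return jsonify({"prediction": f"Is a {label} user"})
```

Running `app.run()` locally and POSTing a comment JSON returns a prediction payload in the shape shown in the example that follows.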

Example for a bot user:

{
  "body": "Hey, thanks for posting at \\/r\\/SwitchHaxing! Unfortunately your comment has been removed due to rule 6; please post questions in the stickied Q&amp;A thread. If you believe this is an error, please contact us via modmail and we'll sort it out. *I am a bot",
  "recent_comments": "[…array of 20 recent comments…]"
}

The response returned is:

{ "prediction": "Is a bot user" }

We deployed our API on Heroku because it makes it very easy to run. We just create a Procfile with a single line telling Heroku which file to use for the web server.

web: python ${port}

We can then git push our code to Heroku:

git push heroku master

Heroku takes care of the hassle of downloading requirements, building the API, setting up a web server, routing, etc. We can now access our API at this URL and use Postman to send a test request:

See it working

Thanks to the great moderator dashboard we wrote in the first part of this series, we can now see the performance of our model operating on real comments. If you haven’t already, check it out here:

Dashboard at

It’s streaming live comments from the r/politics subreddit. You can see each comment and whether the model scored it as a bot, troll, or normal user.

You may see some comments labeled as bots or trolls, but it’s not obvious why after inspecting their comment history. Keep in mind that we used a simple model in order to keep our tutorial easier to follow. The precision for labeling trolls is only 58%. That’s why we designed it as a filter for human moderators to review.

If you’re interested in playing with this model yourself, check out the code on GitHub at

You can try improving the accuracy of the model by using a more sophisticated algorithm, such as a random forest. Spoiler alert: it’s possible to get 95%+ accuracy on the test data with more sophisticated models, but we’ll leave that as an exercise for you.