Naive Bayes is another very popular supervised machine learning algorithm, one that classifies data into predefined classes based on Bayes' Theorem. A classic example where you'll find Naive Bayes is text classification, such as separating spam from normal emails. Naive Bayes is also called a probabilistic classifier because it calculates the probability that a given piece of data belongs to each class.

The term Naive Bayes itself is an interesting one, and one you ought to know. "Naive" refers to the fact that the algorithm assumes all the features are independent of each other. In other words, the presence, weight, or absence of one feature does not affect any other feature, hence the term naive. "Bayes" is named after Thomas Bayes, an 18th-century English statistician whose most famous accomplishment is the theorem that bears his name, Bayes' Theorem.

Now let's dive straight into the not-so-confusing Bayes formula:

P(A | B) = ( P(B | A) × P(A) ) / P(B)

The explanation is as follows:

  • P(A | B) : the posterior probability, which is the probability of A happening given that B is true.
  • P(B | A) : the probability that B happens given that A is true (also called the likelihood).
  • P(A) : the probability of A being true, also called the prior probability of A.
  • P(B) : the probability of B being true, also called the prior probability of B.
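To see the formula in action, here is a quick numeric example (all the figures are made up purely for illustration). Suppose 20% of all emails are spam, so P(A) = 0.2; the word "money" shows up in 40% of spam emails, so P(B | A) = 0.4; and "money" shows up in 10% of all emails, so P(B) = 0.1. Plugging in:

P(A | B) = (0.4 × 0.2) / 0.1 = 0.8

So under these made-up numbers, an email containing "money" has an 80% chance of being spam.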

Let's apply this formula to a real-life example. Most of the examples online relate it to the normal/spam email scenario, and I will do so as well. I will attach the original material for it below (all sources are listed at the end of this guide).

Suppose we have a number of emails that are either normal or spam. We define normal emails as emails with an ordinary body of text, more or less the conversation you would expect, while spam is unwanted email, usually with fraudulent claims and intents. It's not that you don't know the difference, but it's good to have a base definition to start from.

Now imagine we have a dataset, also called a corpus, of emails from both classes, consisting of the text found within those emails. Armed with that dataset, we shall attempt to make predictions using Bayes' Theorem.

First we need to find the prior probability of each class; in layman's terms, the probability that any email belongs to the spam or the normal class, regardless of its content. Next, we take the body of text we are trying to classify, find the probability of each of its words occurring in each class, multiply those word probabilities together, and multiply the result by the prior probability of each class. Okay, that was a handful, wasn't it? Let's apply an example to this.

Imagine we have an email; it could be spam or normal, right? Let's start with an imaginary sentence from that imaginary email, something along the lines of "Dear Customer". First we find the prior probability of each class, then the probability of "Dear" and the probability of "Customer" within that class. After that we multiply them all together, and we get a score for the phrase "Dear Customer" in either class; whichever class scores higher is the prediction.
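Below is a minimal Python sketch of that calculation. The toy corpus (the word lists and the 8/4 email counts) is entirely made up for illustration, and the sketch adds add-one (Laplace) smoothing, a common practice not discussed above, so that a word missing from one class does not force the whole product to zero:

```python
from collections import Counter

# Hypothetical toy corpus (invented numbers, for illustration only):
# word occurrences pooled from 8 normal emails and 4 spam emails.
normal_words = ("dear friend lunch money dear dear friend lunch "
                "friend lunch money dear friend dear").split()
spam_words = "dear friend money money money dear money".split()
n_normal, n_spam = 8, 4

# Prior probability of each class: its share of the training emails.
p_normal = n_normal / (n_normal + n_spam)
p_spam = n_spam / (n_normal + n_spam)

normal_counts, spam_counts = Counter(normal_words), Counter(spam_words)
vocab_size = len(set(normal_words) | set(spam_words))

def word_prob(word, counts):
    # Probability of the word given the class, with add-one (Laplace)
    # smoothing so an unseen word does not zero out the whole product.
    return (counts[word] + 1) / (sum(counts.values()) + vocab_size)

def score(phrase, prior, counts):
    # Prior multiplied by the per-word probabilities, as described above.
    p = prior
    for word in phrase.lower().split():
        p *= word_prob(word, counts)
    return p

s_normal = score("Dear Customer", p_normal, normal_counts)
s_spam = score("Dear Customer", p_spam, spam_counts)
print(f"normal: {s_normal:.5f}  spam: {s_spam:.5f}")
print("predicted:", "normal" if s_normal > s_spam else "spam")
```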

There is another version called Gaussian Naive Bayes, named after the Gaussian distribution, also known as the normal distribution, and used when the features are continuous numbers rather than word counts. We first find the mean and the standard deviation of each feature within each class. To classify a sample, we take the likelihood of each feature value under that class's distribution; so if we have 3 features, we find the likelihood of those three values according to their fitted distributions. In practice we apply log() to the prior and to each likelihood and sum all of these log values, which is the numerically stable equivalent of multiplying the prior by the raw likelihoods. We then repeat the process for the other class, and whichever class yields the higher value is the predicted class.
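Here is a small Python sketch of that procedure. The feature values, class labels, and equal priors are all assumptions invented for the example; it fits a per-class, per-feature mean and standard deviation, then sums the log prior and the per-feature log likelihoods and picks the class with the higher total:

```python
import math

# Hypothetical training data: 3 continuous features per sample, two classes.
class_data = {
    "normal": [[2.0, 1.5, 3.1], [2.2, 1.7, 2.9], [1.9, 1.4, 3.3]],
    "spam":   [[5.1, 4.0, 0.9], [4.8, 3.8, 1.2], [5.3, 4.2, 1.0]],
}
priors = {"normal": 0.5, "spam": 0.5}  # assumed equal priors

def mean_std(values):
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)

# Per-class, per-feature mean and standard deviation.
stats = {
    label: [mean_std(col) for col in zip(*rows)]
    for label, rows in class_data.items()
}

def log_gaussian(x, m, s):
    # Log of the Gaussian (normal) density at x.
    return -math.log(s * math.sqrt(2 * math.pi)) - ((x - m) ** 2) / (2 * s ** 2)

def classify(sample):
    scores = {}
    for label in class_data:
        # Log prior plus the sum of per-feature log likelihoods.
        total = math.log(priors[label])
        for x, (m, s) in zip(sample, stats[label]):
            total += log_gaussian(x, m, s)
        scores[label] = total
    # The class with the higher log score is the prediction.
    return max(scores, key=scores.get)

print(classify([2.1, 1.6, 3.0]))  # expected: normal
```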

