Quick Steps: Naive Bayes Classifier
- Prerequisites (Probability): A sound understanding of conditional and marginal probabilities and Bayes' Theorem is desirable.
In this tutorial, we are going to learn about the Naive Bayes algorithm, including how it works and how to use it in Python. There are several approaches to classifying data, including the following: (1) nearest-neighbour matching, (2) classification rules, (3) classification/decision trees, and (4) probabilistic learning.
In this article, we will focus on probabilistic learning only. For example, when you flip a fair coin, there is an equal chance of getting either heads or tails, so you can say the probability of getting heads is 50%.
Similarly, what would be the probability of rolling a 1 with a fair six-faced die? It is 1/6 ≈ 0.167.
The probabilistic classification technique is based on Bayes' Theorem with an assumption of independence among predictors. In simple words, "naive" means that the features that go into the model are treated as independent of each other.
Naive Assumption: each feature makes an independent and equal contribution to the outcome, i.e. the attributes do not interact with each other. This is a strong assumption that rarely holds in real-world scenarios.
Bayes' Theorem: mathematically, Bayes' Theorem is stated as P(A|B) = P(B|A) · P(A) / P(B), where P(A|B) is the posterior probability of class A given predictor B, P(B|A) is the likelihood, P(A) is the prior probability of the class, and P(B) is the prior probability of the predictor.
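As a quick numerical sketch of the theorem (the spam/word numbers below are made up purely for illustration):

```python
# Hypothetical numbers: suppose 1% of emails are spam (P(spam) = 0.01),
# 90% of spam emails contain the word "offer" (P(offer|spam) = 0.9),
# and 5% of all emails contain "offer" (P(offer) = 0.05).
p_spam = 0.01               # prior P(A)
p_offer_given_spam = 0.9    # likelihood P(B|A)
p_offer = 0.05              # evidence P(B)

# Posterior P(spam|offer) = P(offer|spam) * P(spam) / P(offer)
p_spam_given_offer = p_offer_given_spam * p_spam / p_offer
print(round(p_spam_given_offer, 4))  # 0.18
```

So even though 90% of spam contains "offer", an email with "offer" is still only 18% likely to be spam, because spam itself is rare.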
Types of Naive Bayes Classifiers: there are three different types of Naive Bayes classifiers:
(1) Gaussian Naive Bayes: used when the continuous values associated with each attribute are assumed to follow a Gaussian (normal) distribution.
(2) Multinomial Naive Bayes: used for discrete counts, e.g. text classification, where we count how often a word occurs in a given document.
(3) Bernoulli Naive Bayes: there may be multiple features, but each one is assumed to be a binary-valued (Bernoulli, boolean) variable. If a feature is not binary, we can binarize its values to fit the Bernoulli Naive Bayes classifier.
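To make the last two variants concrete, here is a minimal sketch using scikit-learn's MultinomialNB and BernoulliNB on toy word-count data (the counts and class labels below are invented for illustration):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

# Each row holds the counts of 3 hypothetical words in one document.
X = np.array([[3, 0, 1], [0, 2, 4], [2, 1, 0], [0, 3, 5]])
y = np.array([0, 1, 0, 1])  # document classes

mnb = MultinomialNB().fit(X, y)            # works on the raw counts
bnb = BernoulliNB(binarize=0.5).fit(X, y)  # thresholds counts to 0/1 first

print(mnb.predict([[2, 0, 1]]))  # [0]
print(bnb.predict([[2, 0, 1]]))  # [0]
```

Note how BernoulliNB's `binarize` parameter does the binarization mentioned above for us: any count above 0.5 becomes 1, the rest become 0.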
Example of Gaussian NB using Python:
# Gaussian Naive Bayes
from sklearn.naive_bayes import GaussianNB
import numpy as np

# Data preparation for the training set
X_train = np.array([[-1, -1], [-2, -4], [-4, -6], [1, 2]])
Y_train = np.array([1, 1, 2, 1])

# Construct the Gaussian Naive Bayes classifier
GNBclf = GaussianNB()

# Model training process
GNBclf.fit(X_train, Y_train)
Here the fit method trains the Gaussian NB (Naive Bayes) classifier on X_train (which contains the data) and Y_train (which contains the training labels).
# Test the model with an unseen data point, say [-4, -7]
print(GNBclf.predict([[-4, -7]]))
output: [1]
# Test the model with another unseen data point, say [-4, -6]
print(GNBclf.predict([[-4, -6]]))
output: [2]
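Beyond hard labels, the trained classifier can also report how confident it is in each class via predict_proba. A short sketch, reusing the same training data as above:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X_train = np.array([[-1, -1], [-2, -4], [-4, -6], [1, 2]])
Y_train = np.array([1, 1, 2, 1])
GNBclf = GaussianNB().fit(X_train, Y_train)

# The column order of predict_proba follows GNBclf.classes_
print(GNBclf.classes_)                 # [1 2]
# One probability per class for the query point; the row sums to 1
print(GNBclf.predict_proba([[-4, -7]]))
```

Since predict returned class 1 for [-4, -7], the first column (class 1) will carry the larger probability.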
You can download the code from My GitHub. This article will be improved further.