What Is the Naive Bayes Classifier Algorithm in Machine Learning? – Pianalytix


Naïve Bayes: It is a classification technique based on Bayes' theorem, with an assumption of independence among the features.

The Naïve Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Even if the features depend on each other or on the existence of other features, each one is treated as contributing independently to the probability — e.g., when deciding whether a fruit is an apple, an orange, or a banana.

That’s why it’s known as “naïve”.

 

A Naïve Bayes model is easy to build and particularly useful for very large datasets.

 

According to Bayes' theorem

It states the relationship between the probability of a hypothesis before getting the evidence, P(H), and after getting the evidence, P(H|E):

P(H|E) = P(E|H) × P(H) / P(E)

where H is the ‘Hypothesis’ and E is the ‘Evidence’.

 

e.g., suppose you have a standard deck of 52 cards and a single card is drawn from it.

The probability that the card is a queen is P(Queen) = 4/52 = 1/13.

Now suppose the evidence provided is that the drawn card is a face card.

 

 

So,

P(Queen|Face) = P(Face|Queen) × P(Queen) / P(Face)

Since every queen is a face card, P(Face|Queen) = 1, and a deck has 12 face cards, so P(Face) = 12/52. Thus P(Queen|Face) = (1 × 1/13) / (12/52) = 1/3.
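Working through the card example in Python (every queen is a face card, and a standard deck has 12 face cards):

```python
from fractions import Fraction

# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
p_queen = Fraction(4, 52)         # 4 queens in a 52-card deck
p_face = Fraction(12, 52)         # 12 face cards (J, Q, K of each suit)
p_face_given_queen = Fraction(1)  # every queen is a face card

p_queen_given_face = p_face_given_queen * p_queen / p_face
print(p_queen_given_face)  # 1/3
```

Using exact fractions avoids any floating-point rounding in the intermediate steps.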

Let’s assume a data scientist working at a major bank in NYC wants to classify a new client as eligible to retire or not.

 

The customer's features are his/her age and salary.

 

Prior Probability: 

  • Points can be classified as RED or BLUE, and our task is to classify a new point as RED or BLUE.
  • Prior probability: since we have more BLUE points than RED, we can assume that our new point is twice as likely to be BLUE as RED.

 

Likelihood: 

  • For the new point, if there are more BLUE points in its vicinity, it is more likely that the new point will be classified as BLUE.

  • So, we draw a circle around the point and count how many points inside the circle belong to each class label.

Posterior Probability:

  • Let’s combine prior probability and likelihood to create a posterior probability 

  • Prior: suggests that X may be classified as BLUE because there are twice as many BLUE points.

  • Likelihood: suggests that X is RED because there are more RED points in the vicinity of X

  • Bayes’ rule combines both to form a posterior probability. 
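These three steps can be sketched numerically. The counts below (40 BLUE and 20 RED points overall, with 1 BLUE and 3 RED points inside the circle) are made up purely for illustration:

```python
# Hypothetical counts for the RED/BLUE illustration above
n_blue, n_red = 40, 20                # all points (prior favors BLUE 2:1)
blue_in_circle, red_in_circle = 1, 3  # points inside the circle around X

prior_blue = n_blue / (n_blue + n_red)
prior_red = n_red / (n_blue + n_red)

# Likelihood: fraction of each class that falls in X's vicinity
like_blue = blue_in_circle / n_blue
like_red = red_in_circle / n_red

# Posterior ∝ prior × likelihood, then normalize
post_blue = prior_blue * like_blue
post_red = prior_red * like_red
total = post_blue + post_red
print(round(post_red / total, 2))  # 0.75 — the likelihood flips the decision to RED
```

Even though the prior favors BLUE two to one, the stronger RED likelihood in the vicinity wins out in the posterior.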

| Day | Outlook  | Humidity | Wind   | Play |
|-----|----------|----------|--------|------|
| 1   | Sunny    | High     | Weak   | No   |
| 2   | Sunny    | Normal   | Strong | Yes  |
| 3   | Overcast | High     | Weak   | Yes  |
| 4   | Rain     | Normal   | Strong | No   |
| 5   | Rain     | Normal   | Weak   | No   |
| 6   | Sunny    | High     | Strong | Yes  |
| 7   | Overcast | Normal   | Strong | Yes  |
| 8   | Sunny    | High     | Weak   | Yes  |
| 9   | Overcast | High     | Strong | Yes  |
| 10  | Rain     | Normal   | Weak   | No   |
| 11  | Overcast | High     | Strong | No   |
| 12  | Rain     | High     | Weak   | No   |
| 13  | Sunny    | Normal   | Weak   | No   |
| 14  | Overcast | High     | Strong | Yes  |

Here we have weather data for a particular place, and we want to know whether there is a chance of playing a game tomorrow.

Frequency tables

For Outlook:

| Outlook  | Play = Yes | Play = No | Likelihood |
|----------|------------|-----------|------------|
| Sunny    | 3          | 2         | 5/14       |
| Overcast | 4          | 0         | 4/14       |
| Rain     | 3          | 2         | 5/14       |

For Humidity:

| Humidity | Play = Yes | Play = No | Likelihood |
|----------|------------|-----------|------------|
| Normal   | 3          | 4         | 7/14       |
| High     | 6          | 1         | 7/14       |

For Wind:

| Wind   | Play = Yes | Play = No | Likelihood |
|--------|------------|-----------|------------|
| Strong | 6          | 2         | 8/14       |
| Weak   | 3          | 3         | 6/14       |

 

For Outlook:

P(X|C) = P(Sunny|Yes) = 3/10 = 0.30
P(X) = P(Sunny) = 5/14 ≈ 0.36
P(C) = P(Yes) = 10/14 ≈ 0.71

Likelihood of ‘Yes’ given it is Sunny:

P(Yes|Sunny) = P(Sunny|Yes) × P(Yes) / P(Sunny) = (0.30 × 0.71) / 0.36 ≈ 0.59

Similarly, the likelihood of ‘No’ given it is Sunny, with P(Sunny|No) = 2/4 = 0.50 and P(No) = 4/14 ≈ 0.29:

P(No|Sunny) = P(Sunny|No) × P(No) / P(Sunny) = (0.50 × 0.29) / 0.36 ≈ 0.40
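The single-feature calculation can be checked in a couple of lines; the fractions come from the Outlook frequency table (10 ‘Yes’ days out of 14, 3 of them Sunny, and 5 Sunny days overall). Working with exact fractions gives 0.60; the slightly smaller value in the hand computation comes from rounding the intermediate figures:

```python
p_sunny_given_yes = 3 / 10  # P(Sunny|Yes), from the Outlook frequency table
p_yes = 10 / 14             # prior P(Yes)
p_sunny = 5 / 14            # evidence P(Sunny)

# Bayes' rule for a single feature
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6
```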

Now suppose we are given the data:

Outlook = Sunny
Humidity = High
Wind = Weak

Then, Play = ???

So, let’s predict the possibility of a game tomorrow using Naïve Bayes.

Likelihood of “Yes” on that day:

= P(Sunny|Yes) × P(High|Yes) × P(Weak|Yes) × P(Yes)
= 2/9 × 3/9 × 6/9 × 9/14 ≈ 0.0317

Similarly, for “No”:

= P(Sunny|No) × P(High|No) × P(Weak|No) × P(No)
= 2/5 × 4/5 × 2/5 × 5/14 ≈ 0.0457

Probability of play on that day, after normalizing:

P(Yes) = 0.0317 / (0.0317 + 0.0457) ≈ 41%
P(No) = 0.0457 / (0.0317 + 0.0457) ≈ 59%

So there is only a 41% chance of a game on that day, and the classifier predicts “No”.
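The whole frequency-table procedure can be sketched in plain Python. This is a minimal sketch that tallies the 14-day table exactly as printed above; note that those rows differ slightly from the classic play-tennis dataset that the hand-worked fractions follow, so the resulting split differs too:

```python
from collections import Counter

# The 14-day weather table above: (Outlook, Humidity, Wind, Play)
data = [
    ("Sunny", "High", "Weak", "No"),
    ("Sunny", "Normal", "Strong", "Yes"),
    ("Overcast", "High", "Weak", "Yes"),
    ("Rain", "Normal", "Strong", "No"),
    ("Rain", "Normal", "Weak", "No"),
    ("Sunny", "High", "Strong", "Yes"),
    ("Overcast", "Normal", "Strong", "Yes"),
    ("Sunny", "High", "Weak", "Yes"),
    ("Overcast", "High", "Strong", "Yes"),
    ("Rain", "Normal", "Weak", "No"),
    ("Overcast", "High", "Strong", "No"),
    ("Rain", "High", "Weak", "No"),
    ("Sunny", "Normal", "Weak", "No"),
    ("Overcast", "High", "Strong", "Yes"),
]

def predict(query):
    """Score each class by prior * product of per-feature likelihoods."""
    labels = Counter(row[-1] for row in data)
    n = len(data)
    scores = {}
    for label, count in labels.items():
        score = count / n  # prior P(C)
        for i, value in enumerate(query):
            match = sum(1 for row in data if row[i] == value and row[-1] == label)
            score *= match / count  # likelihood P(x_i | C)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}  # normalize

result = predict(("Sunny", "High", "Weak"))
print(result)  # with the table as printed, "Yes" and "No" come out evenly split
```

The loop mirrors the hand computation exactly: one prior per class, one conditional frequency per feature, then a final normalization so the posteriors sum to 1.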

 

INDUSTRIAL USES OF THE MODEL:

  1. News classification: using a Naïve Bayes model we can classify news on the basis of its content into types such as sports, politics, national, international, finance, stock market, cinema, media, education, etc.

  2. Spam mail or message filtering

  3. Object detection

  4. Medical diagnosis: it is very useful and effective in the medical domain and gives accurate observations.

  5. Weather prediction (as we did in our example)

Types of Naïve Bayes:

  1. Gaussian: used for classification; it assumes that the features follow a normal distribution.

  2. Multinomial: used for discrete counts, e.g., word counts in text classification.

  3. Bernoulli: uses binary features; instead of counting how many times a word occurs in a document, it records only whether the word occurs at all.

Based on the dataset, you can use any of these.
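As an illustration of the Gaussian variant, the per-feature likelihood is a normal density evaluated at the query value. The class means, variances, and priors below are made-up numbers for the bank/retirement example earlier, not real data:

```python
import math

def gaussian_pdf(x, mean, var):
    """Normal density used as P(feature = x | class) in Gaussian Naive Bayes."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical per-class statistics for one feature: client age
stats = {
    "eligible": {"mean": 62.0, "var": 16.0},
    "not_eligible": {"mean": 35.0, "var": 100.0},
}
priors = {"eligible": 0.3, "not_eligible": 0.7}

def posterior(age):
    scores = {
        c: priors[c] * gaussian_pdf(age, s["mean"], s["var"])
        for c, s in stats.items()
    }
    total = sum(scores.values())
    return {c: v / total for c, v in scores.items()}

result = posterior(60)
print(result)  # a 60-year-old scores as far more likely "eligible"
```

With several features, each class score would simply multiply in one Gaussian density per feature, exactly as in the discrete case.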

Article by:  Sachin Dubey

