Hypothesis Testing

Hypothesis testing is one of the basic and most important concepts in statistics. It is among the first concepts every data enthusiast meets when starting out in data science, and it is used in many other fields as well, because it gives a systematic way to check a claim or an assumption against data.

As mentioned above, hypothesis testing matters in many fields, so it helps to see how the technique is used in practice.

Let’s take an example. Most of us have been using hand sanitizers for the past 6 months because of COVID-19. A sanitizer brand claims that it kills up to 99% of viruses. How can they make this claim? Is there a technique that backs up what they advertise? Yes, they use hypothesis testing. As noted earlier, it is mainly used to check whether the data support a claim or an assumption.

What is Hypothesis testing?
Null hypothesis and alternative hypothesis
One tailed and two tailed hypothesis testing
Level of confidence
Level of significance
Type I and type II error
P-Value

What is Hypothesis testing?

Hypothesis testing gives a way of using samples to test whether statistical claims are likely to be true.

A hypothesis is an assumption we make about a population parameter. The assumption may be right or it may be wrong, and hypothesis testing is the procedure for deciding between the two based on sample data.

For example, “RCB will win the IPL 2020 title”. This is just an assumption that we make based on the number of matches won and the net run rate. We can test this statement against the IPL match dataset.

Null Hypothesis and Alternative Hypothesis

The null hypothesis is what we believe by default. It states that there is no effect, no difference, or no relationship, so nothing unusual is going on.

In simple terms, the null hypothesis is the statement we assume to be true until the data give us enough evidence against it. For example, “The coin is fair”. The null hypothesis is denoted by H0.

H0: The coin is fair

The alternative hypothesis, denoted by H1, is the opposite of the null hypothesis: it is the statement we turn to when the null hypothesis is rejected. Together, the null and alternative hypotheses frame the assumption we make about the population parameter.

For example, “The coin is biased towards heads”.

H1: The coin is biased towards heads

Let’s understand this with an example.

Consider again the sanitizer that claims to kill up to 99% of viruses. Let’s formulate this claim using the null and alternative hypotheses.

Null hypothesis H0: The sanitizer kills 99% of viruses on average

Alternative hypothesis H1: The sanitizer kills less than 99% of viruses on average

One tailed and two tailed hypothesis testing

We use the null and alternative hypotheses to formulate the statements to be tested. If the alternative hypothesis allows the parameter to differ from the value specified in the null hypothesis in both directions (less than or greater than), the test is called a two-tailed hypothesis test.

For example: H0: mean = 1000; H1: mean ≠ 1000. Here H1 covers values both less than 1000 and greater than 1000, so this is a two-tailed test.

If the alternative hypothesis allows a difference in only one direction (either less than or greater than the value specified in the null hypothesis), the test is called a one-tailed hypothesis test.

For example: H0: mean = 1000; H1: mean > 1000. Here H1 covers only values greater than 1000, so this is a one-tailed test.
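To make the difference between the tails concrete, here is a minimal sketch of a one-sample z-test around the value 1000 used above. It assumes the population standard deviation is known and uses only Python's standard library. The function name z_test_p_value and all the numbers are hypothetical, chosen for illustration. The p-value it returns is discussed later in this article.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for a one-sample z-test of H0: mean = mu0.

    sigma is the population standard deviation, assumed known.
    alternative is "two-sided" (H1: mean != mu0), "greater" (H1: mean > mu0)
    or "less" (H1: mean < mu0).
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)  # P(Z <= z) when H0 is true
    if alternative == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)  # both tails count as "extreme"
    elif alternative == "greater":
        p_value = 1 - cdf  # only the upper tail counts
    elif alternative == "less":
        p_value = cdf  # only the lower tail counts
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value


# Hypothetical sample: 36 observations, sample mean 1010, sigma 30.
for alt in ("two-sided", "greater", "less"):
    z, p = z_test_p_value(1010, 1000, 30, 36, alternative=alt)
    print(f"{alt:>9}: z = {z:.2f}, p-value = {p:.4f}")
```

With these made-up numbers the test statistic is z = 2.0 in all three cases. Only the p-value changes: roughly 0.0455 for the two-tailed test and roughly 0.0228 for the upper-tailed test, because the two-tailed test counts both directions as evidence against H0.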

Critical Region

In order to test a hypothesis, the entire sample space is partitioned into two regions:

Critical region or rejection region
Non rejection region

1. Critical region or rejection region

If the calculated value of the test statistic falls in this region of the sample space, we reject the null hypothesis. This region is called the critical region or rejection region.

2. Non rejection region

If the calculated value of the test statistic falls in this region of the sample space, we do not reject the null hypothesis. This region is called the non-rejection region.

Test statistic: a statistic computed from the sample that is used to make the decision about the null hypothesis.

Level of confidence

As the name suggests, the level of confidence expresses how confident we are in the decision we take. It is commonly set to 95%, though other values such as 90% or 99% are also used.

Level of significance

The probability level below which we reject the null hypothesis is called the level of significance.

No test can be 100% certain, so we have to accept some chance of a wrong decision whenever we reject or fail to reject a hypothesis. The level of significance is usually set to 5%.

It is denoted by α, and the formula is α = 1 - confidence level.
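The confidence level, the level of significance and the critical region fit together as sketched below for a z-test. This is an illustration under the assumption that the test statistic follows a standard normal distribution when H0 is true. The helper names rejection_region and reject_null are hypothetical.

```python
from statistics import NormalDist


def rejection_region(confidence_level=0.95, alternative="two-sided"):
    """Return (lower, upper) critical z-values.

    H0 is rejected when the test statistic z is below lower or above upper.
    """
    alpha = 1 - confidence_level  # level of significance
    std_normal = NormalDist()
    if alternative == "two-sided":
        upper = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split over both tails
        return -upper, upper
    if alternative == "greater":
        return float("-inf"), std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        return std_normal.inv_cdf(alpha), float("inf")
    raise ValueError(f"unknown alternative: {alternative!r}")


def reject_null(z, confidence_level=0.95, alternative="two-sided"):
    """True when z falls in the critical (rejection) region."""
    lower, upper = rejection_region(confidence_level, alternative)
    return z < lower or z > upper


lower, upper = rejection_region(0.95)
print(f"two-tailed critical values: {lower:.3f} and {upper:.3f}")
print("reject H0 for z = 2.0?", reject_null(2.0))
print("reject H0 for z = 1.5?", reject_null(1.5))
```

At a 95% confidence level the two-tailed critical values are about -1.96 and 1.96, so a test statistic of 2.0 falls in the rejection region while 1.5 falls in the non-rejection region.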

Type I and Type II Error

When we test the null hypothesis against the alternative hypothesis, there are four possible outcomes.

1. The null hypothesis (H0) is accepted (not rejected) when H0 is true [ correct decision ]

2. The null hypothesis (H0) is rejected when H0 is true [ type I error ]

3. The null hypothesis (H0) is accepted (not rejected) when H0 is false [ type II error ]

4. The null hypothesis (H0) is rejected when H0 is false [ correct decision ]

Type I error

We reject the null hypothesis even though it is true.

α = p( type I error )

α = p( reject H0|H0 is true )

Type II error

We accept (fail to reject) the null hypothesis even though it is false.

β = p( type II error )

β = p( accept H0|H0 is false )

Let’s understand with an example

A person is stopped by the police for not wearing a helmet, but he has all his documents, such as his license. The police have to decide whether the person is innocent or guilty.

H0 : Person is innocent

H1 : Person is guilty

A Type I error occurs if the police convict the person [reject H0] although the person is innocent and has all the documents [H0 is true].

A Type II error occurs if the police release the person [do not reject H0] although the person is guilty of not wearing a helmet [H1 is true].
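The two error probabilities can also be computed directly. The sketch below does this for an upper-tailed z-test, writing the Type I error probability as alpha and the Type II error probability as beta (the usual symbol for it). It assumes a known population standard deviation and one specific alternative mean. The function name error_rates and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def error_rates(mu0, mu1, sigma, n, alpha=0.05):
    """Return (alpha, beta) for an upper-tailed z-test.

    H0: mean = mu0 versus H1: mean > mu0, with beta evaluated at one
    specific alternative mean mu1. sigma is assumed known.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # reject H0 when z > z_crit
    shift = (mu1 - mu0) / (sigma / sqrt(n))  # expected z when the true mean is mu1
    beta = std_normal.cdf(z_crit - shift)    # P(do not reject H0 | mean = mu1)
    return alpha, beta


# Hypothetical setting: H0 mean 1000, true mean 1010, sigma 30.
for n in (36, 144):
    alpha, beta = error_rates(1000, 1010, 30, n)
    print(f"n = {n:>3}: alpha = {alpha:.2f}, beta = {beta:.3f}, power = {1 - beta:.3f}")
```

With 36 observations beta comes out at about 0.36, so the test would miss a true mean of 1010 roughly a third of the time. Increasing the sample size lowers beta while alpha stays fixed at the chosen level of significance.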


P-Value

The p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one actually observed. We use the p-value to decide whether to reject the null hypothesis, since it measures the strength of the evidence against H0. The smaller the p-value, the stronger the evidence for rejecting H0, and we reject H0 when the p-value is below the level of significance α.
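As a small worked example, the coin hypotheses from earlier (H0: the coin is fair, H1: the coin is biased towards heads) can be tested with an exact binomial calculation. This is a minimal sketch using only the standard library. The function name binomial_p_value and the flip counts are hypothetical.

```python
from math import comb


def binomial_p_value(heads, flips, p_heads=0.5):
    """One-sided p-value: P(X >= heads) when each flip lands heads with p_heads."""
    return sum(
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Hypothetical experiment: 60 heads in 100 flips.
alpha = 0.05
p_value = binomial_p_value(60, 100)
print(f"p-value = {p_value:.4f}")
print("reject H0 (coin is fair)?", p_value < alpha)
```

For 60 heads in 100 flips the p-value is about 0.028, which is below a 5% level of significance, so under these assumptions we would reject the claim that the coin is fair.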

Implementation

1) One sample and Two sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests

Find the complete notebook code for the above concepts here. If you like the kernel, please upvote it.

Happy learning!

Thank you!

written by: Krishna Heroor

reviewed by: Umamah
