In this blog we will look at SciPy, a popular open-source Python library. SciPy depends on NumPy, so a little knowledge of NumPy will help you follow along.
CONTENTS:
- What is SciPy?
- Why do we need SciPy?
- How to install SciPy?
- SciPy subpackages
What is SciPy?
SciPy is an open-source Python library used for scientific computing. The name SciPy stands for “Scientific Python”. It is built on top of NumPy and contains a collection of algorithms and functions for scientific and numerical computation.
Why do we need SciPy?
- It provides high-level commands and classes for manipulating and visualizing data.
- It offers one of the easiest ways to perform linear algebra in Python.
- Its routines are easy to call and fast, since much of the numerical work is done by compiled code.
- It provides a wide range of algorithms for statistical analysis.
How to install SciPy?
To use the functions in this library, we first need to install SciPy on our system. SciPy is not part of the standard Python distribution, so we install it manually with pip, the popular Python package manager.
We can install it with a simple “pip install” command, run from a terminal or command prompt (not from inside the Python interpreter):
    # To install SciPy
    pip install scipy
    # You can also use pip3 and install for the current user only
    pip3 install --user scipy
    # Alternative
    pip3 install scipy
Install using Anaconda
We can also install this library with Anaconda by entering the following command in the Anaconda Prompt.
    conda install -c anaconda scipy
Importing the SciPy library
After installing SciPy, we can use it in our programs with an import of the form “from scipy import (subpackage)”.
    >>> from scipy import stats
    >>> # or import a specific function or object directly
    >>> from scipy.stats import norm
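A quick way to confirm that the installation and import worked is to print the installed version and call one simple function. The snippet below is a minimal sketch; the variable names are illustrative and not part of the original post.

```python
import scipy
from scipy import stats

# The version string confirms which SciPy release is installed
version = scipy.__version__
print(version)

# norm.cdf(0) is the probability that a standard normal variable is below 0
half = stats.norm.cdf(0)
print(half)  # 0.5
```

If the import fails with ModuleNotFoundError, SciPy was probably installed into a different Python environment than the one running the script.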
SciPy subpackages
SciPy is organized into subpackages for different kinds of scientific computing, such as statistical analysis, linear algebra, and input/output operations.
The following are some of the main subpackages in SciPy:
- scipy.stats
This module is used for performing statistics.
- scipy.integrate
This module contains a number of routines for numerical integration.
- scipy.linalg
This module helps to solve systems of linear algebraic equations.
- scipy.io
This module is used for input/output operations.
- scipy.interpolate
This module is used for interpolation, i.e. estimating values between known data points.
- scipy.signal
This module is used in signal processing, for example convolving two-dimensional arrays, filter design, filtering, and waveform generation.
- scipy.constants
This package contains the physical and mathematical constants used in scientific calculations.
- scipy.fftpack
This package computes the fast Fourier transform and inverse fast Fourier transform of a time-domain signal. Newer SciPy releases offer the same functionality in scipy.fft and treat scipy.fftpack as legacy.
- scipy.cluster
This module is used for clustering unlabeled data (e.g. k-means clustering).
- scipy.ndimage
This package is used for multidimensional image processing.
- scipy.optimize
This package provides various optimization algorithms, which are also useful in machine learning applications.
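To give a feel for a few of the subpackages listed above, here is a minimal sketch that calls one function from each of linalg, integrate, optimize, interpolate, and constants. The equations, data points, and variable names are made up for illustration only.

```python
import numpy as np
from scipy import constants, integrate, interpolate, linalg, optimize

# scipy.linalg: solve the linear system 3x + y = 9, x + 2y = 8
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)
print(solution)  # approximately [2. 3.]

# scipy.integrate: integrate t**2 from 0 to 3 (the exact answer is 9)
area, abs_error = integrate.quad(lambda t: t ** 2, 0, 3)
print(area)

# scipy.optimize: find the minimum of (t - 2)**2 + 1, which lies at t = 2
result = optimize.minimize_scalar(lambda t: (t - 2) ** 2 + 1)
print(result.x)

# scipy.interpolate: linear interpolation between known data points
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([0.0, 10.0, 20.0])
line = interpolate.interp1d(xs, ys)
midpoint = float(line(1.5))
print(midpoint)  # halfway between 10 and 20

# scipy.constants: speed of light in vacuum, in metres per second
print(constants.c)
```

Each subpackage is imported by name from scipy, so a program only needs to import the parts it actually uses.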
Of these subpackages, we will look at one of the most important: stats.
scipy.stats
This module contains functions for performing statistics on a given dataset. With it we can compute descriptive statistics and run tests such as one-way ANOVA, the one-sample t-test, and the two-sample t-test. It is widely used in machine learning for data analysis. We will look at some of the important functions in the stats subpackage.
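As a rough illustration of the descriptive statistics and one-way ANOVA mentioned above, the sketch below uses stats.describe and stats.f_oneway. The numbers are small hand-made datasets, not data from this blog, and the names are hypothetical.

```python
import numpy as np
from scipy import stats

# Descriptive statistics for a small, made-up dataset
scores = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
summary = stats.describe(scores)
print(summary.nobs)      # number of observations: 8
print(summary.minmax)    # smallest and largest value
print(summary.mean)      # 5.0
print(summary.variance)  # sample variance (divides by n - 1)

# One-way ANOVA: do three made-up groups share the same mean?
group_a = [20.0, 21.0, 19.0, 22.0, 20.0]
group_b = [25.0, 27.0, 26.0, 24.0, 26.0]
group_c = [20.0, 22.0, 21.0, 19.0, 21.0]
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)

# A small p-value suggests that at least one group mean differs
print("at least one mean differs" if p_value < 0.05 else "no evidence of a difference")
```

Here group_b is clearly shifted upward relative to the other two groups, so the ANOVA p-value comes out well below 0.05.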
One sample t-test:
The one-sample t-test is a popular method for hypothesis testing. It checks whether the mean of a random sample differs significantly from a known or assumed population mean. Let us assume a population mean of 45000 and take a random sample. We set up two hypotheses: the null hypothesis H0 states that the sample mean and the population mean are equal, and the alternative hypothesis Ha states that they are not. If the test gives a p-value below the chosen significance level (commonly 0.05), we reject H0 in favor of Ha; otherwise we fail to reject H0. The one-sample t-test computes this p-value for us.
    # import NumPy and SciPy
    import numpy as np
    from scipy.stats import norm
    from scipy import stats
    # draw 50 random samples from a normal distribution (mean 50000, std 5000)
    a = norm.rvs(loc=50000, scale=5000, size=50)
    print(a.mean())
    # perform the one-sample t-test against a population mean of 45000
    print(stats.ttest_1samp(a, 45000))
OUTPUT
    49897.15297832133
    Ttest_1sampResult(statistic=7.283322467815558, pvalue=2.4174962854521968e-09)
In this result the p-value is less than 0.05, so we reject H0 and conclude that the mean of the random sample is not equal to the population mean of 45000; that is, we go with Ha. Because the samples are random, the exact numbers will differ on every run.
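To see what the one-sample t-test computes, the statistic can be worked out by hand as t = (sample mean - population mean) / (s / sqrt(n)), where s is the sample standard deviation and n is the sample size. The sketch below compares that hand calculation with stats.ttest_1samp. The fixed seed and the variable names are assumptions added for reproducibility; they are not part of the original example.

```python
import numpy as np
from scipy import stats

# Fixed seed (an assumption) so the sample is the same on every run
rng = np.random.default_rng(0)
sample = rng.normal(loc=50000, scale=5000, size=50)
popmean = 45000

# Manual calculation: t = (sample mean - population mean) / standard error
n = sample.size
standard_error = sample.std(ddof=1) / np.sqrt(n)
t_manual = (sample.mean() - popmean) / standard_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# The same test done by SciPy
t_scipy, p_scipy = stats.ttest_1samp(sample, popmean)

print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

Both lines should print matching values, which shows that ttest_1samp is a convenience wrapper around this formula.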
Two sample t test:
The two-sample t-test checks whether the means of two independent random samples are equal. Consider two groups of random samples drawn from a population. Here H0 states that the two group means are equal and Ha states that they are not. We can run the test with the code below.
    # import NumPy and SciPy
    import numpy as np
    from scipy.stats import norm
    from scipy import stats
    # create two random samples, a and b, from the same normal distribution
    a = norm.rvs(loc=50000, scale=5000, size=50)
    b = norm.rvs(loc=50000, scale=5000, size=50)
    print(a.mean())
    print(b.mean())
    # perform the two-sample t-test
    print(stats.ttest_ind(a, b))
OUTPUT
    49627.281811652916
    48551.14446711863
    Ttest_indResult(statistic=1.0930782813495403, pvalue=0.27703911880208276)
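Here the p-value is greater than 0.05, so we fail to reject the null hypothesis H0: the data give no evidence that the means of the two groups differ. This is what we would expect, since both samples were drawn from the same distribution.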
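The decision rule for the two-sample test can be written out explicitly: reject H0 only when the p-value falls below the chosen significance level. The sketch below does this with a fixed seed so the run is repeatable, and also shows Welch's variant (equal_var=False), which does not assume the two groups have equal variances. The seed, the alpha value, and the variable names are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Fixed seed (an assumption) so both samples are the same on every run
rng = np.random.default_rng(42)
group_a = rng.normal(loc=50000, scale=5000, size=50)
group_b = rng.normal(loc=50000, scale=5000, size=50)

alpha = 0.05  # significance level

# Standard two-sample t-test (assumes equal variances in both groups)
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Reject H0 only when the p-value is below alpha
if p_value < alpha:
    decision = "reject H0: the group means differ"
else:
    decision = "fail to reject H0: no evidence that the group means differ"
print(t_stat, p_value)
print(decision)

# Welch's t-test: same idea, without the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_welch, p_welch)
```

When the two groups have similar sizes and variances, as here, the standard and Welch results are usually very close.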
Conclusion:
This blog should help you understand the basics of the SciPy library and its role in performing statistics. SciPy also plays a major role in many applications beyond statistics. I hope you enjoyed the blog.
Happy Learning…!
Written By: Nikesh Joseph
Reviewed By: Vikas Bhardwaj