PyCaret | An Open Source Library For ML

Introduction to PyCaret

PyCaret is an open-source, low-code machine learning library in Python that takes you from preparing your data to deploying your model within minutes, in the notebook environment of your choice. It suits data scientists who want to be more productive on business problems, and it is equally useful for people with little or no coding background who want to include machine learning in their workflow.

PyCaret's main aim is to reduce the time it takes to get from a hypothesis to insights from the data. It is also a deployment-ready Python library.


Installation of PyCaret

Before building a model with PyCaret, you need to install the library. It is a basic step that every beginner will get used to.


Install PyCaret using the command below:

#installation for the first time

!pip install pycaret

#if you previously installed it, check the version using the code below

from pycaret.utils import version

version()
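If you are not sure whether PyCaret is installed at all, the version can also be looked up without importing the library. The snippet below is a minimal sketch using only the Python standard library (Python 3.8+ is assumed); the helper name installed_version is made up for this illustration and is not part of PyCaret.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or a hint when the package is not installed.
found = installed_version("pycaret")
print(found if found else "pycaret is not installed in this environment")
```

This is handy inside a fresh virtual environment, where an import would simply fail.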

Before installing PyCaret or any other library, creating a virtual environment is highly recommended, for example one with Python 3.6 or later.

If you have Anaconda Navigator installed, follow these steps to create a virtual environment:

#creating a virtual environment (replace envname with a name of your choice; 3.6 or a later version)

conda create --name envname python=3.6

#activating conda environment

conda activate envname

Then install the PyCaret library inside the activated environment using the pip command shown above.

Pre-processing steps | PyCaret

As mentioned above, PyCaret is a deployment-ready Python library: as you proceed with an ML experiment, all the steps are saved automatically in a pipeline, which makes deploying to production easier later. A nice thing about PyCaret is that it arranges all dependencies in the pipeline.

Some steps in pre-processing
  1. Getting the data
  2. Splitting the data and sampling the data
  3. Data preparation or cleaning 
  4. Scaling and transforming
  5. Feature engineering
  6. Feature selection
  7. Modelling 
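To make the idea of a saved pipeline concrete, here is a minimal sketch in plain Python. It is not PyCaret's implementation: the classes MeanImputer, ZScoreScaler and Pipeline are hypothetical names for this illustration. The point it shows is that each step learns its parameters on the training data once and is then replayed, in the same order, on new data.

```python
from statistics import mean, pstdev


class MeanImputer:
    """Replaces missing values (None) with the mean seen during fit."""

    def fit(self, values):
        self.fill_ = mean(v for v in values if v is not None)
        return self

    def transform(self, values):
        return [self.fill_ if v is None else v for v in values]


class ZScoreScaler:
    """Rescales values using the mean and standard deviation seen during fit."""

    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # fall back to 1.0 for constant columns
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


class Pipeline:
    """Fits steps in order on training data and replays them on new data."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, values):
        for step in self.steps:
            values = step.fit(values).transform(values)
        return self

    def transform(self, values):
        for step in self.steps:
            values = step.transform(values)
        return values


# Illustrative single-column data; None marks a missing value.
train = [10.0, None, 14.0, 12.0]
pipeline = Pipeline([MeanImputer(), ZScoreScaler()]).fit(train)
print(pipeline.transform([12.0, None]))
```

Because the fitted steps travel together, new data is cleaned and scaled exactly as the training data was, which is what makes a stored pipeline convenient at deployment time.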

Find the full details HERE

Next, we will do some hands-on work with the PyCaret library and try to solve a regression problem.

What prerequisites are needed?

  1. Python 3.x (preferably 3.6+)
  2. PyCaret, latest version
  3. An internet connection to load data with the library
  4. Basic knowledge of regression problems

What is Regression?

Regression is a supervised machine learning technique whose goal is to estimate the relationship between a dependent variable (the target) and one or more independent variables (the features), usually in order to predict a continuous value.
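The simplest case is a straight line fitted by ordinary least squares. The sketch below uses plain Python and made-up numbers (hours studied versus score); the function name fit_line is only for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Illustrative data that lies exactly on the line y = 2x + 1.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

Here the independent variable is hours and the dependent variable is score; the fitted slope and intercept describe the relation between them.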

Techniques covered in this PyCaret based regression:

  1. Normalization
  2. Transformation
  3. Target transformation 
  4. Combine rare levels
  5. Bin numeric variables
  6. Model ensembling and techniques
  7. Tuning Hyperparameters of ensembles
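Three of the techniques in this list are easy to picture with a few lines of plain Python. The sketch below is illustrative only and does not reproduce PyCaret's internals: the function names, the 10% threshold and the equal-width binning are assumptions, and a simple log transform stands in for whatever target transformation a library applies.

```python
import math
from collections import Counter


def transform_target(ys):
    """Compress a skewed, non-negative target with log(1 + y)."""
    return [math.log1p(y) for y in ys]


def inverse_transform_target(zs):
    """Map predictions back to the original scale."""
    return [math.expm1(z) for z in zs]


def combine_rare_levels(values, threshold=0.10, label="rare"):
    """Merge categories whose share of rows is below `threshold`."""
    counts = Counter(values)
    total = len(values)
    return [v if counts[v] / total >= threshold else label for v in values]


def bin_numeric(values, n_bins):
    """Assign each value to one of `n_bins` equal-width bins (0-based)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant column: everything lands in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


colours = ["red"] * 6 + ["blue"] * 5 + ["teal"]
print(combine_rare_levels(colours))       # "teal" (1 of 12 rows) is merged
print(bin_numeric([20, 35, 50, 80], 3))   # bin index per value
print(inverse_transform_target(transform_target([0.0, 9.0, 99.0])))
```

Combining rare levels keeps a categorical feature from exploding into many near-empty columns, and binning turns a numeric feature into a small number of ordered groups.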

1: Getting the data

#shape of the data
data.shape

2: Setting up the environment in pycaret

from pycaret.regression import *
exp_reg101 = setup(data = data, target = 'Gold_T+22', session_id=123)
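
One thing the setup step takes care of is splitting the data into a training part and a hold-out part, with session_id acting as the random seed so the split can be reproduced. The sketch below shows the idea in plain Python; the function name and the 70/30 ratio are assumptions for this illustration, not PyCaret code.

```python
import random


def train_holdout_split(rows, train_size=0.7, seed=123):
    """Shuffle rows with a fixed seed and cut them into train and hold-out."""
    rng = random.Random(seed)          # same seed -> same shuffle every run
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_size))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))
train, holdout = train_holdout_split(rows, seed=123)
print(len(train), len(holdout))
```

Running the split again with the same seed returns exactly the same rows, which is why fixing session_id makes an experiment repeatable.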

3: Comparing all models

compare_models()
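
Comparing models generally means training every candidate on the same data and ranking them by a cross-validated score. The sketch below is a stripped-down version of that idea in plain Python, with two toy "models" (a mean baseline and a straight line) and unshuffled folds; all names and the 5-fold setting are assumptions for this illustration.

```python
from statistics import mean


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def mean_model(xs, ys):
    """Baseline: always predict the training mean."""
    avg = mean(ys)
    return lambda x: avg


def line_model(xs, ys):
    """Straight line fitted by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return lambda x: y_bar + slope * (x - x_bar)


def cv_mae(fit, xs, ys, k=5):
    """Average mean absolute error over k folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        fold_errors.append(mean(abs(ys[i] - predict(xs[i])) for i in test_idx))
    return mean(fold_errors)


# Illustrative data on the line y = 3x + 2.
xs = [float(i) for i in range(20)]
ys = [3.0 * x + 2.0 for x in xs]
scores = {"mean baseline": cv_mae(mean_model, xs, ys),
          "straight line": cv_mae(line_model, xs, ys)}
leaderboard = sorted(scores.items(), key=lambda item: item[1])  # lowest MAE first
for name, score in leaderboard:
    print(f"{name}: MAE = {score:.4f}")
```

The sorted list plays the role of a leaderboard: the model at the top is the one with the lowest cross-validated error.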
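
4: Create a model

  1. AdaBoost regressor
ada = create_model('ada')
  2. Light gradient boosting machine
lightgbm = create_model('lightgbm')
  3. Decision tree
dt = create_model('dt')

5: Tune a model

  1. AdaBoost regressor
tuned_ada = tune_model(ada)
  2. Light gradient boosting machine
tuned_lightgbm = tune_model(lightgbm)
  3. Decision tree
tuned_dt = tune_model(dt)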
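
Tuning a model means trying different hyperparameter values and keeping the setting that scores best on data the model was not trained on. The sketch below assumes a randomized search over the number of neighbours k of a toy nearest-neighbour regressor; the data, the candidate values and all function names are made up for this illustration.

```python
import random


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def validation_mae(train, valid, k):
    """Mean absolute error on the validation set for a given k."""
    train_x, train_y = train
    valid_x, valid_y = valid
    errors = [abs(y - knn_predict(train_x, train_y, x, k))
              for x, y in zip(valid_x, valid_y)]
    return sum(errors) / len(errors)


def random_search(train, valid, candidates, n_iter, seed=123):
    """Score n_iter randomly chosen candidates and return the best one."""
    rng = random.Random(seed)
    tried = rng.sample(candidates, n_iter)
    scores = {k: validation_mae(train, valid, k) for k in tried}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Illustrative data: y = x squared, even x for training, odd x for validation.
train = ([float(x) for x in range(0, 20, 2)], [float(x * x) for x in range(0, 20, 2)])
valid = ([float(x) for x in range(1, 19, 2)], [float(x * x) for x in range(1, 19, 2)])
best_k, scores = random_search(train, valid, candidates=[1, 2, 3, 5, 8], n_iter=3)
print(best_k, scores)
```

A random search does not try every value, so the result is the best of the sampled settings rather than a guaranteed optimum.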

6: Plot a model

  1. Residual plot
plot_model(tuned_lightgbm)
  2. Prediction error plot
plot_model(tuned_lightgbm, plot = 'error')
  3. Feature importance plot
plot_model(tuned_lightgbm, plot='feature')
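
A residual is simply the actual value minus the predicted value, and a residual plot shows these differences against the predictions. The sketch below computes them in plain Python on made-up numbers, so you can see what the plot is built from.

```python
def residuals(actual, predicted):
    """Residual = actual value minus predicted value, row by row."""
    return [a - p for a, p in zip(actual, predicted)]


# Illustrative values only, not output of the model above.
actual = [1.20, 1.35, 1.10, 1.50]
predicted = [1.18, 1.40, 1.12, 1.45]

res = residuals(actual, predicted)
mean_residual = sum(res) / len(res)

# Each (prediction, residual) pair is one point on a residual plot.
for p, r in zip(predicted, res):
    print(f"predicted={p:.2f}  residual={r:+.2f}")
print(f"mean residual = {mean_residual:+.4f}")
```

Residuals scattered evenly around zero suggest the model has no systematic bias; a visible pattern suggests something is still unexplained.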

7: Predict on the hold-out sample

predict_model(tuned_lightgbm);

   Model                            MAE     MSE     RMSE    R2      RMSLE   MAPE
0  Light Gradient Boosting Machine  0.0191  0.0007  0.0258  0.6433  0.0221  -0.1316
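
The column names in this table are standard regression metrics. The sketch below computes them in plain Python using their textbook definitions on made-up numbers. With these definitions MAPE can never be negative, so the negative MAPE in the table presumably comes from a different convention in the library version used; that is an assumption, not something the output confirms.

```python
import math


def regression_metrics(actual, predicted):
    """Compute common regression metrics with their textbook definitions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
    mse = ss_res / n
    log_errors = [math.log1p(p) - math.log1p(a) for a, p in zip(actual, predicted)]
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
        "RMSLE": math.sqrt(sum(d * d for d in log_errors) / n),
        "MAPE": sum(abs(e) / abs(a) for a, e in zip(actual, errors)) / n,
    }


# Illustrative values only.
metrics = regression_metrics(actual=[2.0, 4.0, 6.0], predicted=[2.5, 3.5, 6.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MAE, MSE, RMSE, RMSLE and MAPE are errors, so lower is better; R2 is the share of variation explained, so higher is better.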

8: Finalize model for deployment

final_lightgbm = finalize_model(tuned_lightgbm)
predict_model(final_lightgbm);

   Model                            MAE     MSE     RMSE    R2      RMSLE   MAPE
0  Light Gradient Boosting Machine  0.0025  0.0     0.0034  0.9938  0.0032  -0.0803
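
The much higher R2 here should be read with care: finalize_model refits the model on the complete dataset, hold-out rows included, so the hold-out sample is no longer unseen. The toy sketch below shows the effect with a 1-nearest-neighbour model in plain Python; the data and function names are made up for this illustration.

```python
def fit_1nn(xs, ys):
    """Return a predictor that copies the target of the closest training point."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict


def mae(predict, xs, ys):
    """Mean absolute error of a predictor on the given rows."""
    return sum(abs(y - predict(x)) for x, y in zip(xs, ys)) / len(xs)


# Illustrative data only.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]
hold_x, hold_y = [2.5, 3.5], [2.6, 3.4]

# Honest estimate: the model has never seen the hold-out rows.
before = mae(fit_1nn(train_x, train_y), hold_x, hold_y)

# After "finalizing": the model was refit on train + hold-out rows.
after = mae(fit_1nn(train_x + hold_x, train_y + hold_y), hold_x, hold_y)

print(f"hold-out MAE before refit: {before:.2f}, after refit: {after:.2f}")
```

The error drops to zero only because the model has memorised the rows it is being scored on, so the score from step 7 remains the more honest estimate of performance on new data.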

9: Predict on unseen data

unseen_predictions = predict_model(final_lightgbm, data=data_unseen)
unseen_predictions.head()

10: Saving the model

save_model(final_lightgbm,'Final Lightgbm Model 08Feb2020')

11: Loading the saved model

saved_final_lightgbm = load_model('Final Lightgbm Model 08Feb2020')
new_prediction = predict_model(saved_final_lightgbm, data=data_unseen)
new_prediction.head()
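
Saving and loading a model comes down to serialising the fitted object to a file and reading it back later. The sketch below shows that round trip with Python's built-in pickle module on a toy model stored as a dictionary; the helper names and the file name are hypothetical, and this is not a description of how PyCaret stores its pipeline.

```python
import os
import pickle
import tempfile


def save_to_disk(model, path):
    """Serialise the model object to a file."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_from_disk(path):
    """Read a previously saved model object back into memory."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def predict(model, xs):
    """Apply the toy straight-line model to new inputs."""
    return [model["slope"] * x + model["intercept"] for x in xs]


# A toy "fitted model": just the parameters of a straight line.
model = {"kind": "line", "slope": 2.0, "intercept": 1.0}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "final_model.pkl")
    save_to_disk(model, path)
    restored = load_from_disk(path)

print(predict(restored, [0.0, 1.5]))
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.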

Conclusion

In this article, we have covered the main steps of a regression workflow in PyCaret, from installation and setup to saving and loading the final model. PyCaret is often described as a low-code AutoML library. Hope this article gives you insight for your learning.

Thanks for reading!

written by: Krishna Heroor

reviewed by: Umamah
