Frameworks For Machine Learning Process

Two useful frameworks for approaching the machine learning process are Yufeng Guo’s The 7 Steps of Machine Learning Process and section 4.5 of Francois Chollet’s Deep Learning with Python.

7 Steps of Machine Learning Process

I came across Guo’s article and learned some interesting facts about machine learning from it.


Guo laid out the steps as follows:

1 – Data Collection

The quantity and quality of your data determine how accurate your model can be.
The outcome of this step is generally a representation of the data (Guo simplifies this to a table) which we will use for training.
Pre-collected data can also be used, for example from Kaggle or another rich and reliable data source.
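
As a minimal sketch of the "table" outcome described above, the snippet below parses a small CSV into a list of rows. The inline data, the column names (size, price) and the helper load_table are hypothetical and chosen only for illustration; in practice the text would come from a downloaded file.

```python
import csv
import io

# Hypothetical inline CSV standing in for a downloaded dataset.
RAW_CSV = """size,price
50,150
80,240
120,360
"""

def load_table(text):
    """Parse CSV text into a list of dicts with numeric values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]

table = load_table(RAW_CSV)
print(len(table), "rows; columns:", list(table[0].keys()))
```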

2 – Data Preparation

Wrangle the data and prepare it for training.
Clean the data where required (remove duplicates, correct errors, deal with missing values, normalization, data type conversions, etc.).
Randomize the data, which erases the effects of the order in which it was collected or prepared.
Visualize the data to help detect relevant relationships between variables, or perform other exploratory analysis.
Split into training and evaluation sets.
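
The preparation steps above could be sketched roughly as follows, using only the standard library. The function name prepare, the (x, y) row layout, and the choice of mean imputation and min-max scaling are assumptions made for illustration, not something Guo prescribes.

```python
import random

def prepare(rows, eval_fraction=0.2, seed=0):
    """Clean, shuffle, and split rows of (x, y) pairs; x may be None (missing)."""
    # Remove exact duplicates while keeping the first occurrence.
    unique = list(dict.fromkeys(rows))
    # Fill missing x values with the mean of the known ones.
    known = [x for x, _ in unique if x is not None]
    mean_x = sum(known) / len(known)
    filled = [(mean_x if x is None else x, y) for x, y in unique]
    # Min-max normalize x into [0, 1].
    lo = min(x for x, _ in filled)
    hi = max(x for x, _ in filled)
    scaled = [((x - lo) / (hi - lo), y) for x, y in filled]
    # Shuffle so the original collection order has no effect.
    random.Random(seed).shuffle(scaled)
    # Hold out the last part of the shuffled rows for evaluation.
    n_eval = max(1, int(len(scaled) * eval_fraction))
    return scaled[:-n_eval], scaled[-n_eval:]

# Hypothetical raw rows: one duplicate and one missing x value.
rows = [(1.0, 2.0), (1.0, 2.0), (2.0, 4.0), (None, 5.0), (3.0, 6.0), (5.0, 10.0)]
train, evaluation = prepare(rows)
print(len(train), "training rows,", len(evaluation), "evaluation rows")
```

For simplicity the scaling statistics here are computed on the whole table; a stricter variant would compute them on the training rows only and reuse them for the evaluation rows.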

3 – Choose a Model

Choose the right model, as different algorithms are suited to different tasks.

4 – Train the Model

The goal of training is to answer a question or make a prediction correctly as often as possible.
Linear regression example: the algorithm needs to learn values for W and b in y = Wx + b (x is input, y is output).
Each iteration of the process is a training step.
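
To make the linear regression example concrete, here is a minimal sketch of learning W and b by gradient descent on mean squared error. The toy data, the function name train_linear, and the learning rate and step count are assumptions chosen for illustration.

```python
def train_linear(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = W * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):  # each pass through this loop is one training step
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (an assumption made for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = train_linear(xs, ys)
print(f"W = {w:.3f}, b = {b:.3f}")
```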

5 – Evaluate the Model

Use some metric or combination of metrics to “measure” the objective performance of the model.
Test the model against unseen data.
This unseen data is meant to be somewhat representative of how the model will perform in the real world, but it still helps tune the model (as opposed to test data, which does not).
A good train/evaluation split can be 80:20 or 70:30, depending on the domain, dataset particulars, data availability, etc.
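
As a rough illustration of measuring performance on unseen data, the sketch below computes two common regression metrics on a small held-out set. The parameters W and b, the evaluation rows, and the choice of metrics are all hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical already-trained parameters for a model y = W * x + b.
W, b = 2.0, 1.0

# Evaluation rows the model did not see during training (made-up values).
eval_x = [5.0, 6.0, 7.0]
eval_y = [11.5, 12.5, 15.0]

predictions = [W * x + b for x in eval_x]
mse = mean_squared_error(eval_y, predictions)
mae = mean_absolute_error(eval_y, predictions)
print(f"MSE = {mse:.3f}, MAE = {mae:.3f}")
```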

6 – Parameter Tuning

This step refers to hyperparameter tuning, which is an “art form” as opposed to a science.
Tune model parameters for improved performance.
Simple model hyperparameters may include the number of training steps, learning rate, initialization values and distribution, etc.
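
One simple way to tune two of the hyperparameters named above (learning rate and number of training steps) is an exhaustive grid search scored on the evaluation set. The sketch below assumes a tiny linear model and arbitrary candidate values; the helper names fit and mse are hypothetical.

```python
import itertools

def fit(xs, ys, learning_rate, steps):
    """Gradient descent for y = w * x + b on mean squared error."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        b -= learning_rate * 2.0 / n * sum(errors)
    return w, b

def mse(w, b, xs, ys):
    """Mean squared error of the line (w, b) on the given rows."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data from y = 2x + 1, split into training and evaluation rows.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
eval_x, eval_y = [4.0, 5.0], [9.0, 11.0]

# Candidate hyperparameter values (arbitrary choices for illustration).
grid = itertools.product([0.001, 0.01, 0.1], [10, 100, 1000])

results = []
for learning_rate, steps in grid:
    w, b = fit(train_x, train_y, learning_rate, steps)
    results.append((mse(w, b, eval_x, eval_y), learning_rate, steps))

# The smallest evaluation error wins.
best_error, best_lr, best_steps = min(results)
print(f"best: learning_rate={best_lr}, steps={best_steps}, eval MSE={best_error:.6f}")
```

Grid search is only one option; it is used here because it is easy to read, not because it is the most efficient strategy.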

7 – Make Predictions

Further test data, which has been withheld from the model until this point and for which the class labels are known, is used to test the model. This gives a better approximation of how the model will perform on real-world data.

Universal Workflow of Machine Learning

In section 4.5 of his book, Chollet also outlines a universal workflow of machine learning, which he describes as a blueprint for solving machine learning problems.

The blueprint ties together the concepts covered in that chapter: problem definition, evaluation, feature engineering, and fighting overfitting.

How does this compare with the above steps? Let’s have a look at the 7 steps of Chollet’s treatment (keeping in mind that, while not explicitly stated as being specifically tailored for them, his blueprint is written for a book on neural networks):

1. Defining the problem and assembling a dataset.
2. Choosing a measure of success.
3. Deciding on an evaluation protocol.
4. Preparing your data.
5. Developing a model that does better than a baseline.
6. Scaling up: developing a model that overfits.
7. Regularizing your model and tuning your parameters.
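
The "better than a baseline" step in the list above can be illustrated with a small sketch. It assumes the baseline is "always predict the mean of the training targets" and the model is a closed-form least-squares line; both choices, the data, and the helper names are assumptions for illustration rather than Chollet's own example.

```python
def fit_least_squares(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up training and evaluation data, roughly following y = 2x.
train_x, train_y = [1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1]
eval_x, eval_y = [6.0, 7.0], [12.0, 14.1]

# Baseline: ignore x and always predict the mean training target.
baseline_value = sum(train_y) / len(train_y)
baseline_error = mse(eval_y, [baseline_value] * len(eval_y))

# Model: a fitted line, scored on the same evaluation rows.
slope, intercept = fit_least_squares(train_x, train_y)
model_error = mse(eval_y, [slope * x + intercept for x in eval_x])

print(f"baseline MSE = {baseline_error:.3f}, model MSE = {model_error:.5f}")
print("model beats baseline:", model_error < baseline_error)
```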

Chollet’s workflow is higher level, and focuses more on getting your model from good to great, as opposed to Guo’s, which seems more concerned with going from zero to good. While it does not necessarily jettison any other important steps in order to do so, the blueprint places more emphasis on hyperparameter tuning and regularization in its pursuit of greatness.

Drafting A Simplified Framework

We can reasonably conclude that Guo’s framework outlines a “beginner” approach to the machine learning process, more explicitly defining early steps, while Chollet’s is a more advanced approach, emphasizing both the explicit decisions regarding model evaluation and the tweaking of machine learning models. Both approaches are equally valid, and do not prescribe anything fundamentally different from one another; you could superimpose Chollet’s on top of Guo’s and find that, while the 7 steps of the 2 models would not line up, they would end up covering the same tasks in sum.

Mapping Chollet’s to Guo’s, here is where I see the steps lining up (Guo’s are numbered, while Chollet’s are listed underneath the corresponding Guo step with their Chollet workflow step number in parentheses):

1. Data collection

Defining the problem and assembling a dataset (1)

2. Data preparation

Preparing your data (4)

3. Choose model

4. Train model

Developing a model that does better than a baseline (5)

5. Evaluate model

Choosing a measure of success (2)

Deciding on an evaluation protocol (3)

6. Parameter tuning

Scaling up: developing a model that overfits (6)

Regularizing your model and tuning your parameters (7)

7. Predict

The mapping is not perfect, but it is usable.

This presents something important: both frameworks agree on, and together place emphasis on, particular points of the process. It should be clear that model evaluation and parameter tuning are important aspects of machine learning. Additional areas of importance are the assembly/preparation of data and the original model selection/training.

Let’s use the above to put together a simplified framework for machine learning.

5 main areas of the machine learning process:

1 – Data collection and preparation: everything from choosing the right data to cleaning it so that it is ready for feature engineering

2 – Feature selection and feature engineering: this includes all changes to the data from once it has been cleaned up to when it is ingested into the machine learning model

3 – Choosing the right machine learning algorithm and training our first model: getting a “better than baseline” model that we can then improve upon

4 – Evaluating our model: this includes the selection of the measure as well as the actual evaluation; seemingly a smaller step than others, but important to our end result

5 – Model tweaking, regularization, and hyperparameter tuning: this is where we iteratively go from a “good enough” model to our best effort

You can now decide which framework is more useful for your requirements.

Written By: Nikesh Maurya

Reviewed By: Krishna Heroor
