Machine Learning is the science (and art) of programming computers so that they can learn from data.
Arthur Samuel gave a general definition in 1959:
Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.
Later, in 1997, Tom Mitchell gave a more engineering-oriented definition:
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
For example, a sentiment analysis system is an ML program that detects positive or negative opinions in text, whether in a whole document, a paragraph, a sentence, or a clause. It learns to do this from a training set.
In a training set, each example is called a training sample (or instance). A common performance measure for classification problems is accuracy: the proportion of examples that are classified correctly.
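To make the accuracy measure concrete, here is a minimal sketch in Python. The function name and the five sentiment labels are made up for illustration; they do not come from any real dataset.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical sentiment labels for five sentences.
true_labels = ["pos", "neg", "pos", "pos", "neg"]
predicted = ["pos", "neg", "neg", "pos", "neg"]

print(accuracy(true_labels, predicted))  # 0.8 (4 of 5 predictions are correct)
```

In the terms of the definition above, classifying sentences is the task T, the labeled sentences are the experience E, and this accuracy is the performance measure P.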
Machine Learning Applications
Let's look at some concrete examples of Machine Learning problems, along with the techniques typically used to tackle them:
Detecting tumors in medical imaging
This is semantic segmentation, where each pixel in the image is classified, typically using convolutional neural networks (CNNs).
Automatically classifying news articles
This is a Natural Language Processing (NLP) task, more specifically text classification.
Forecasting a company's future revenue, based on many performance metrics
This is a regression task, that is, predicting numeric values. It can be tackled with any regression model, such as Linear Regression, Polynomial Regression, a regression SVM, or a regression Random Forest.
Voice chat-based app
This is a speech recognition task, a complex problem that is typically handled with neural networks.
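As a small illustration of the regression task mentioned above, the sketch below fits a straight line to a few points using the closed-form least-squares formulas. The spend and revenue figures are invented, and a real forecast would use many more metrics and examples.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: marketing spend vs. revenue (revenue = 2 * spend + 1).
spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, revenue)
forecast = slope * 5.0 + intercept  # predicted revenue for a spend of 5.0
print(slope, intercept, forecast)   # 2.0 1.0 11.0
```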
Types of Machine Learning Systems
There are so many different types of Machine Learning systems that it is useful to classify them into broad categories:
1. Supervised, unsupervised, semi-supervised, and reinforcement learning
2. Online and batch learning
3. Instance-based or model-based learning
These categories are not exclusive; they can be combined when solving a machine learning task.
Amount of Supervision
Machine Learning systems can be classified according to the amount and type of supervision they get during training. There are four main categories: supervised, unsupervised, semi-supervised, and reinforcement learning.
1. Supervised Learning
In supervised learning, the training set fed to the algorithm includes the desired solutions, known as labels. Typical supervised tasks are classification and regression.
Some of the most important supervised learning algorithms are:
- k-Nearest Neighbors.
- Linear Regression.
- Logistic Regression.
- Support Vector Machines (SVMs).
- Decision Trees and Random Forests.
- Neural Networks.
Note: Some neural network architectures can be unsupervised, such as autoencoders and restricted Boltzmann machines. They can also be semi-supervised, such as deep belief networks.
2. Unsupervised Learning
In unsupervised learning, the training data given to the algorithm is unlabeled. Some of the most important unsupervised learning algorithms, grouped by task, are:
Clustering
- K-Means.
- DBSCAN.
- Hierarchical Cluster Analysis (HCA).
Anomaly Detection and Novelty Detection
- One-class SVM.
- Isolation Forest.
Visualization and Dimensionality Reduction
- Principal Component Analysis (PCA).
- Kernel PCA.
- Locally Linear Embedding (LLE).
- t-Distributed Stochastic Neighbor Embedding (t-SNE).
Association Rule Learning
- Apriori.
- Eclat.
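To give a feel for how a clustering algorithm such as K-Means from the list above works without labels, here is a minimal sketch. It makes simplifying assumptions: the first k points serve as the initial centroids (real implementations use better initialization), the number of iterations is fixed, and the six points are made up.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on tuples of floats; returns (centroids, labels)."""
    # Naive initialization (an assumption of this sketch): use the first k points.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Made-up 2-D points forming two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No labels were provided: the two groups emerge only from the distances between points.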
3. Semi-supervised Learning
Since labeling data is usually time-consuming and costly, you will often have plenty of unlabeled instances and only a few labeled ones. Some algorithms can deal with data that is partially labeled. This is called semi-supervised learning.
Google Photos is a good example of this kind of learning. When you upload pictures to the service, it automatically recognizes that the same person, say A, appears in photos 2, 7, and 12, while another person, B, appears in photos 2, 8, and 12. This part is unsupervised learning, more specifically clustering. All the system then needs is one label per person, and it can name that person in every photo.
Most semi-supervised learning algorithms are combinations of supervised and unsupervised algorithms. For example, deep belief networks (DBNs) are based on unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of one another. The RBMs are trained sequentially in an unsupervised manner, then the whole system is fine-tuned using supervised learning techniques.
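The photo example can be sketched as a "cluster, then label" step. This is only an illustration of the idea, not how any real photo service is implemented: the cluster ids are assumed to come from an earlier unsupervised step, and all names are hypothetical.

```python
def propagate_labels(cluster_ids, known_labels):
    """Give every item the label of a labeled item from the same cluster."""
    cluster_label = {}
    for item, label in known_labels.items():
        cluster_label[cluster_ids[item]] = label
    # Items in clusters without any labeled member stay unlabeled (None).
    return {item: cluster_label.get(cid) for item, cid in cluster_ids.items()}

# Hypothetical output of an unsupervised clustering step on detected faces.
cluster_ids = {"face_a1": 0, "face_a2": 0, "face_b1": 1, "face_b2": 1, "face_c1": 2}
# The user supplies only one label per person.
known_labels = {"face_a1": "Alice", "face_b2": "Bob"}

labels = propagate_labels(cluster_ids, known_labels)
print(labels["face_a2"], labels["face_b1"], labels["face_c1"])  # Alice Bob None
```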
4. Reinforcement Learning
Reinforcement Learning is a very different approach. The learning system, called an agent, observes the environment, selects and performs actions, and gets rewards in return (or penalties in the form of negative rewards). The agent must then learn by itself the best strategy, called a policy, to get the most reward over time.
A policy defines what action the agent should choose when it is in a given situation.
Batch and Online Learning
Another criterion used to classify Machine Learning systems is whether or not the system can learn incrementally from a stream of incoming data.
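A policy can be pictured as a lookup from situation to action. The sketch below is a deliberately simplified illustration, not a full reinforcement learning algorithm: the states, actions, and value estimates are hypothetical, and the update rule is a plain running average toward the observed reward.

```python
# Hypothetical estimated value of each action in each situation (state).
action_values = {
    "low_battery": {"recharge": 1.0, "explore": -0.5},
    "full_battery": {"recharge": 0.0, "explore": 0.8},
}

def policy(state):
    """Greedy policy: pick the action with the highest estimated value."""
    values = action_values[state]
    return max(values, key=values.get)

def learn(state, action, reward, step_size=0.5):
    """Nudge the estimate for (state, action) toward the observed reward."""
    old = action_values[state][action]
    action_values[state][action] = old + step_size * (reward - old)

before = policy("full_battery")                 # "explore"
learn("full_battery", "explore", reward=-2.0)   # a penalty (negative reward)
after = policy("full_battery")                  # "recharge"
print(before, after)
```

After the penalty, the estimated value of exploring drops below that of recharging, so the policy changes its choice for that situation.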
1. Batch Learning
In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data. This generally takes a lot of time and computing resources, which makes it impractical on devices with limited resources, such as smartphones.
So the training is typically done offline.
Once training is done, the system is launched into production and runs without learning anymore; it just applies what it has learned. This is also called offline learning.
2. Online Learning
In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives. Stock price prediction systems are a good example.
Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine's main memory. This is called out-of-core learning: the algorithm loads part of the data, runs a training step on it, and repeats until it has processed all of the data.
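The incremental training loop can be sketched as follows. This is a minimal illustration under stated assumptions: the model is a one-feature linear model updated by gradient steps, the class and method names are hypothetical, and the "stream" is simulated with noise-free data following y = 3x + 1.

```python
class OnlineLinearModel:
    """Linear model y = w * x + b that learns one mini-batch at a time."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def partial_fit(self, batch):
        """One gradient step on the mean squared error of a mini-batch."""
        n = len(batch)
        grad_w = sum(2 * (self.predict(x) - y) * x for x, y in batch) / n
        grad_b = sum(2 * (self.predict(x) - y) for x, y in batch) / n
        self.w -= self.learning_rate * grad_w
        self.b -= self.learning_rate * grad_b

def stream(n_batches, batch_size=4):
    """Simulated data stream: yields mini-batches of (x, y) with y = 3x + 1."""
    for i in range(n_batches):
        xs = [((i * batch_size + j) % 10) / 10 for j in range(batch_size)]
        yield [(x, 3 * x + 1) for x in xs]

model = OnlineLinearModel()
for batch in stream(2000):
    model.partial_fit(batch)  # each batch is used once, then discarded

print(round(model.w, 2), round(model.b, 2))
```

Only one mini-batch is held in memory at a time, which is the same property that out-of-core learning relies on.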
Instance-Based vs Model-Based Learning
We can also classify machine learning systems in terms of how they generalize. The two main approaches are:
1. Instance-Based Learning
In instance-based learning, the system learns the examples by heart, then generalizes to new cases by using a similarity measure to compare them to the learned examples.
2. Model-Based Learning
Another way to generalize from a set of examples is to build a model of these examples and then use that model to make predictions. This is called model-based learning.
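The contrast between the two approaches can be sketched on the same toy data. In this illustration (the data, function names, and the choice of classifiers are assumptions), the instance-based predictor is k-nearest neighbors, which keeps every example, while the model-based predictor summarizes each class by its centroid and no longer needs the examples afterwards.

```python
import math
from collections import Counter

# Made-up 2-D training examples: (point, label).
training_set = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((8.0, 8.0), "B"), ((9.0, 8.5), "B"), ((8.5, 9.0), "B"),
]

def knn_predict(examples, query, k=3):
    """Instance-based: compare the query to the stored examples and vote."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def fit_centroids(examples):
    """Model-based: summarize each class by the mean of its points."""
    grouped = {}
    for point, label in examples:
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for label, pts in grouped.items()
    }

def centroid_predict(centroids, query):
    """Predict the class whose centroid is closest to the query."""
    return min(centroids, key=lambda label: math.dist(centroids[label], query))

centroids = fit_centroids(training_set)  # the "model": two points, one per class
print(knn_predict(training_set, (2.0, 2.0)), centroid_predict(centroids, (2.0, 2.0)))  # A A
print(knn_predict(training_set, (7.0, 9.0)), centroid_predict(centroids, (7.0, 9.0)))  # B B
```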
Machine Learning Project Flow
1. Study the data.
2. Select a model.
3. Train the model on the training data, that is, let the learning algorithm search for the model parameter values that minimize a cost function.
4. Finally, apply the model to make predictions on new data (this is called inference), hoping that the model generalizes well.
The above is what a typical Machine Learning project looks like.
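The four steps can be sketched end to end on a toy problem. This is a minimal illustration, assuming a straight-line model, mean squared error as the cost function, and batch gradient descent as the training procedure; the data points are made up and follow y = 2x + 1.

```python
# 1. Study the data (made-up points that follow y = 2x + 1).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

# 2. Select a model: a straight line y = w * x + b.
def predict(w, b, x):
    return w * x + b

def cost(w, b, examples):
    """Cost function: mean squared error over the examples."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in examples) / len(examples)

# 3. Train: search for the w and b that minimize the cost.
def train(examples, learning_rate=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in examples) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = train(data)
print(cost(0.0, 0.0, data), cost(w, b, data))  # cost before and after training

# 4. Inference: predict for an input that was not in the training data.
new_prediction = predict(w, b, 10.0)
print(round(new_prediction, 2))
```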
Main Challenges of Machine Learning
1. Insufficient Quantity of Training Data
For a child to learn what an apple is, all it takes is for someone to point to an apple and say "apple" a few times. Machine Learning is not there yet: most algorithms need a lot of data to work properly. Even simple problems typically need thousands of examples, and complex problems such as speech recognition may need millions.
2. Non-representative Training data
In order to generalize well, it is crucial that your training data be representative of the new cases you want to generalize to.
3. Poor-Quality Data
If your data is full of errors, outliers, and noise, it will be harder for the system to detect the underlying patterns, so the system is less likely to perform well. It is often well worth the effort to spend time cleaning up your training data.
4. Irrelevant Features
A critical part of the success of a Machine Learning project is coming up with a good set of features to train on. This process is called feature engineering, and it involves the following steps:
- Feature selection (choosing the most useful of the existing features to train on).
- Feature extraction (combining existing features to produce a more useful one).
5. Overfitting the Training Data
In Machine Learning, if a model performs well on the training data but does not generalize well to new data (for example, a test set), it is overfitting the training data.
6. Underfitting the Training Data
Underfitting is the opposite of overfitting: it occurs when your model is too simple to learn the underlying structure of the data.
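Overfitting and underfitting can be seen side by side on a tiny example. The sketch below uses k-nearest neighbors as a stand-in, with k acting as the complexity knob (this choice and the data are assumptions made for illustration). The true rule is "A below 5.5, B above", and two training labels (at x = 3 and x = 8) are deliberately wrong to simulate noise.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training examples closest to x."""
    nearest = sorted(train, key=lambda example: abs(example[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, examples, k):
    hits = sum(1 for x, label in examples if knn_predict(train, x, k) == label)
    return hits / len(examples)

# Training set with two noisy labels (x = 3 and x = 8).
train_set = [(1, "A"), (2, "A"), (3, "B"), (4, "A"), (5, "A"),
             (6, "B"), (7, "B"), (8, "A"), (9, "B")]
# Clean test set that follows the true rule.
test_set = [(1.4, "A"), (2.8, "A"), (3.3, "A"), (4.6, "A"),
            (6.4, "B"), (7.2, "B"), (7.8, "B"), (8.7, "B")]

for k in (1, 3, 9):
    print(k, round(accuracy(train_set, train_set, k), 3), accuracy(train_set, test_set, k))
# k=1: 1.0 on training data but 0.625 on test data (memorizes the noise: overfitting)
# k=3: 0.778 on training data and 1.0 on test data
# k=9: 0.556 on training data and 0.5 on test data (always predicts "A": underfitting)
```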
Testing and Validation
The only way to know how well a model will generalize is to try it on new cases. To do this, we generally split our data into two sets: the training set and the test set. As the names suggest, the model is trained on the training set and evaluated on the test set. The error rate on new cases is called the generalization error, and evaluating the model on the test set gives an estimate of it. To keep the generalization error low, we use hyperparameter tuning and model selection.
Hyperparameter Tuning and Model Selection
Suppose you tune a model until it does well on the test set and launch it into production, but unfortunately it does not perform as well as expected and produces 15% errors. The problem is that you measured the generalization error multiple times on the test set and adapted the model and hyperparameters to that particular set. This means that the model is unlikely to perform as well on new data.
A common solution to this problem is called holdout validation: you simply hold out part of the training set to evaluate several candidate models and select the best one. The held-out set is called the validation set.
More specifically, you train multiple models with various hyperparameters on the reduced training set (the full training set minus the validation set) and select the model that performs best on the validation set. After this holdout validation process, you train the best model on the full training set (including the validation set), and this gives you the final model. Lastly, you evaluate this final model on the test set to get an estimate of the generalization error.
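The holdout procedure described above can be sketched as follows. As an assumption for illustration, the candidate models are k-nearest-neighbors classifiers that differ only in the hyperparameter k, and the three small datasets are made up. For k-nearest neighbors, "training" simply means storing the examples, so retraining on the full training set means storing the validation examples too.

```python
from collections import Counter

def knn_predict(examples, x, k):
    """Majority vote among the k stored examples closest to x."""
    nearest = sorted(examples, key=lambda example: abs(example[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(examples, evaluation_set, k):
    hits = sum(1 for x, label in evaluation_set if knn_predict(examples, x, k) == label)
    return hits / len(evaluation_set)

# Made-up data: the full training set is split into a reduced training set
# and a validation set; the test set is kept aside until the very end.
reduced_train = [(1, "A"), (2, "A"), (3, "B"), (4, "A"), (5, "A"),
                 (6, "B"), (7, "B"), (8, "A"), (9, "B")]
validation_set = [(2.8, "A"), (3.3, "A"), (6.4, "B"), (7.8, "B")]
test_set = [(1.4, "A"), (4.6, "A"), (7.2, "B"), (8.7, "B")]

# Evaluate each candidate hyperparameter value on the validation set only.
candidates = [1, 3, 9]
validation_scores = {k: accuracy(reduced_train, validation_set, k) for k in candidates}
best_k = max(validation_scores, key=validation_scores.get)

# Retrain the best model on the full training set, then test it once.
full_train = reduced_train + validation_set
test_accuracy = accuracy(full_train, test_set, best_k)
print(validation_scores, best_k, test_accuracy)
```

The test set is touched exactly once, after the hyperparameter has been chosen, so the final score is not biased by the selection process.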
Written By: Karan Gupta
Reviewed By: Viswanadh