Introduction to Automation System
Broadly speaking, an automation system is a technology that enables a process to be performed without human intervention. There are two types of automation systems:
- Rule-Based Systems
- Machine Learning Systems
A rule-based system represents knowledge as a set of rules that tell you what to do or what to conclude in different situations. As with Data Science, the definition of Machine Learning (ML) is up for debate. One widely accepted definition was provided by Arthur Samuel.
Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed. – Arthur Samuel (1959)
Before the advent of ML as a formal field of application and study, rule-based approaches were used to simulate intelligence. Deep Blue, the computer system that played chess and defeated a grandmaster, owed its intelligence essentially to a set of rules defined by chess grandmasters and to improved search algorithms.
So, essentially an expert defined a set of rules which helped automate some tasks and reduce human intervention. Thus, another name for these rule-based systems is Expert Systems.
Comparing Rule-Based and ML Systems | Automation System
If we think a little deeper, traditional computer programs are rule-based systems at heart. So, let’s design a rule-based system to recognize a car from a set of images. General rules for a car are:
- It has wheels
- It has headlights
- It has a steering wheel
In a computer program, you can then write the algorithm as: it is a car if it has wheels AND it has headlights AND it has a steering wheel. You code it up, and then an input arrives that breaks your existing system.
What if the car is at an angle in the picture? What if the image of the car is just a drawing? It would be really cumbersome to list all scenarios and all possible rules.
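The brittleness of hand-written rules can be sketched in a few lines of Python. This is only an illustration: the feature names (`has_wheels`, `has_headlights`, `has_steering_wheel`) are hypothetical, reading the third rule as a steering wheel is an assumption, and the sketch assumes some earlier step has already extracted these features from the image.

```python
def is_car(features):
    """Rule-based check: every hand-written rule must hold."""
    # Hypothetical feature names; a real system would need a step that
    # extracts them from the image first.
    rules = ("has_wheels", "has_headlights", "has_steering_wheel")
    # A missing feature counts as False, so a hidden part breaks the rule.
    return all(features.get(rule, False) for rule in rules)


# Front view: every feature is visible, so all rules hold.
front_view = {"has_wheels": True, "has_headlights": True, "has_steering_wheel": True}

# Side view of the same car: the headlights are not visible.
side_view = {"has_wheels": True, "has_headlights": False, "has_steering_wheel": True}

print(is_car(front_view))  # True
print(is_car(side_view))   # False, although it is still a car
```

Covering the side view would need another rule, and so would every further scenario, which is exactly the maintenance burden described above.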
On the other hand, think of a scenario where you can just pass on the different images of the car to a program, along with some data to help identify the car and then allow the program to figure out the rest of the rules by itself. This is a typical ML program – which, when given enough data will generalize and figure out the rest of the rules by itself.
So, the difference between a traditional computer program and a Machine Learning program is:
A traditional algorithm is a clear set of instructions to which we give input and get the desired output. This is a rule-based approach.
An ML algorithm is given a set of data and figures out the patterns and rules to apply to future inputs.
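To make the contrast concrete, here is a minimal sketch of the ML side. Instead of hard-coding a rule, the program searches the labelled examples for the cut-off that classifies most of them correctly. The function name `learn_threshold`, the single feature (vehicle length) and the data are all made up for illustration; real ML algorithms are far more elaborate.

```python
def learn_threshold(values, labels):
    """Pick the cut-off that classifies the most training examples correctly.

    The learned rule is: predict True when value >= threshold.
    """
    best_threshold, best_correct = None, -1
    for candidate in sorted(set(values)):
        # Count how many examples the rule "value >= candidate" gets right.
        correct = sum((v >= candidate) == label for v, label in zip(values, labels))
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


# Made-up training data: vehicle length in metres, True means "car".
lengths = [1.8, 2.0, 2.1, 3.9, 4.2, 4.5, 4.8]
is_car = [False, False, False, True, True, True, True]

threshold = learn_threshold(lengths, is_car)
print(threshold)         # 3.9
print(5.0 >= threshold)  # True: a new, unseen input is classified as a car
```

Nobody wrote "a car is at least 3.9 metres long" by hand; the rule came out of the data, and it would shift if the data changed.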
Why Rule-Based Systems Fail? | Automation System
Expert systems worked well before because the data generated was very structured in nature and the scope was limited. Thus, it was possible to define rules by hand. As the size of data keeps increasing, the rules need to be modified constantly. With the advent of semi-structured and unstructured data, rule-based systems were no longer practical.
It would be something like fixing bugs as the code keeps getting bigger. The advantage of ML is that it tends to get better as the size of data keeps increasing. In summary, the advantage of ML over expert systems is that it scales and generalizes well on data.
We have looked at the basic intuition of ML and also looked at it in contrast with expert systems. There are a lot of technical terms associated with Machine Learning, which we will introduce as and when the need arises. In the next topic, we will look at some of the misconceptions associated with ML.
Destroying the ML Misconceptions | Automation System
Before we deep dive into the various aspects of Machine Learning, it is important to do a sanity check on the popular misconceptions surrounding it.
Myth 1: Autonomous Learning
For the machines to be successful and provide the right insights on data, a lot of work needs to be done by humans on data to get it ready. True autonomous learning is not possible and humans cannot be completely removed from the equation.
Myth 2: Universal Application
Surprisingly, despite AI’s breadth of impact, the types of it being deployed are still extremely limited. Almost all of AI’s recent progress is through one type, in which some input data (A) is used to quickly generate some simple response (B). – Andrew Ng 2016 (Source: HBR)
Nothing summarizes the response to the myth better than these words of Andrew Ng, one of the pioneers in ML. Also, ML cannot be applied effectively when collecting data is very difficult, as in health domains.
On the other side of the coin, a lot of problems in the industry can be solved through simple data summarization and analysis and don’t need AI. Why use complicated statistical models to solve problems that could be solved by simple rules?
Myth 3: Transformational Insights
A popular myth is that once you feed the algorithm data, actionable insights will emerge like the pot of gold at the end of the rainbow. It is exactly that – a mirage. For any data set, defining the problem to solve is very hard, and it is a continuous, iterative process of refining data and rebuilding models.
Myth 4: Superhuman Intelligence
Machines are only as intelligent as the humans who programmed them. Despite all the advancement, machines don’t have common sense and cannot be taught it. Don’t worry – your smartphone won’t take over the world and enslave you. EVER!
Histogram and Scatter plot | Automation System
What is a histogram and why do you need it?
A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data. This allows the inspection of the data for its underlying distribution (e.g., normal distribution), outliers, skewness, etc. Note that it requires only one array or series since it displays the frequencies.
How to construct a histogram using matplotlib?
Matplotlib’s .hist() method provides an easy way of generating histograms. To construct a histogram from a continuous variable you first need to split the data into intervals, called bins. Each bin holds the number of values in the data set that fall within that interval. Below are two plots of the HP feature of the Pokemon dataset, drawn with 10 and 20 bins.
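A minimal sketch of how such a pair of plots could be produced. It assumes matplotlib is installed and uses made-up, right-skewed numbers in place of the HP column, since the dataset itself is not part of this text. The output file name is arbitrary.

```python
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

# Stand-in for the HP column: made-up, right-skewed values, NOT the real data.
random.seed(0)
hp = [random.gammavariate(4.0, 17.0) for _ in range(800)]

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

# Same data in both panels; only the number of bins differs.
counts_10, edges_10, _ = left.hist(hp, bins=10)
left.set_title("bins=10")

counts_20, edges_20, _ = right.hist(hp, bins=20)
right.set_title("bins=20")

for ax in (left, right):
    ax.set_xlabel("HP")
    ax.set_ylabel("Frequency")

fig.savefig("hp_histograms.png")
```

`hist()` returns the per-bin counts and the bin edges, so the binning can be inspected as numbers as well as read off the picture.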
Key takeaways from these plots
- First and most important: both plots show the same data.
- An important concept with histograms is binning.
- Binning is a way to group continuous values into a smaller number of bins.
- The more bins there are, the narrower each interval is, so each interval holds fewer observations, and vice versa.
- In the left image, the number of bins is small, so the values are tightly packed. The right image has a larger number of bins, so the values are more loosely packed and the plot looks more spread out.
- But always remember that both plots show the same data; the difference in their appearance is due to the different number of bins.
- Both plots have a right tail, i.e. a long tail on the right side
- Some observations have extreme values in the interval 230-260
- Most of the observations lie in the interval 40-75
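The effect of the bin count can also be checked with plain numbers. The sketch below uses a hypothetical helper, `bin_counts`, and made-up scores with a long right tail; it assumes equal-width bins and at least two distinct values.

```python
def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value belongs to the last bin, not to a bin past the end.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up scores with a long right tail.
scores = [40, 45, 45, 50, 55, 60, 60, 65, 70, 75, 80, 95, 120, 160, 250]

coarse = bin_counts(scores, 3)
fine = bin_counts(scores, 6)

print(coarse)  # [12, 2, 1]
print(fine)    # [9, 3, 1, 1, 0, 1]
```

Both lists add up to the same 15 observations. With more bins, the counts are spread over more intervals, and an empty interval shows up as a gap.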
Difference with bar charts | Automation System
Unlike a bar chart, there are no gaps between the bars (although some bars might be absent, reflecting a frequency of zero). This is because a histogram represents a continuous data set, and as such, there are no gaps in the data.
Histograms are based on the area of the bars, not the height of bars
In a histogram, it is the area of the bar that indicates the frequency of occurrences for each bin. This means that the height of the bar does not necessarily indicate how many occurrences of scores there were within each individual bin. It is the product of height multiplied by the width of the bin that indicates the frequency of occurrences within that bin.
The height of the bars is often mistaken for the frequency because many histograms have equally wide bars (bins), and under these circumstances the height of the bar does reflect the frequency.
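A small worked example of the area rule, using made-up data and bins of unequal width. The helper `density_heights` is hypothetical. It scales each bar so that height times width gives back the count.

```python
def density_heights(values, edges):
    """Return (counts, heights) where height = count / bin width.

    With this scaling the AREA of each bar (height * width) equals the count.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            # Bins are [left, right), except the last one, which is closed.
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    heights = [c / (edges[i + 1] - edges[i]) for i, c in enumerate(counts)]
    return counts, heights


# Made-up data with unequal bin widths: 10, 10 and 40.
values = [2, 5, 7, 8, 12, 15, 18, 25, 30, 35, 45, 55]
edges = [0, 10, 20, 60]

counts, heights = density_heights(values, edges)
print(counts)   # [4, 3, 5]
print(heights)  # [0.4, 0.3, 0.125]
```

The last bin holds the most observations (5) yet has the lowest bar, because it is four times as wide as the others. Reading frequency from the height alone would be misleading here.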
Visualize the distribution of `Attack` points for `Dragon` type (`Type 1`) Pokemons
In this task, you will plot the distribution of Attack points for Pokemons which have their first type (Type 1) as Dragon. You will also compare the mean values for Attack for all the Pokemons against the mean value of Attack for dragon type (by drawing a vertical line).
- Calculate the mean attack points for all the Pokemons and store it in a variable mean_attack
- Create a dataframe named dragon consisting of only Dragon type pokemon using conditional filtering (based on Type 1)
- Calculate the mean attack points for dragon and store it in a variable mean_dragon
- Use the .hist() method on dragon (pandas draws the plot with matplotlib) and pass arguments column='Attack', bins=8
- To compare mean attack points you need to draw two lines: one at mean_attack and one at mean_dragon
- Use .axvline() and pass arguments x=mean_attack, color='green' to plot a vertical line representing mean attack points for all the Pokemons
- Use .axvline() and pass arguments x=mean_dragon, color='black' to plot a vertical line representing mean attack points for Dragon Pokemons
Test Cases:
- The variables mean_attack, dragon and mean_dragon have to be declared.
- The value of mean_attack should be 79.00
- The value of mean_dragon should be 112.12
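One possible shape for a solution is sketched below. It assumes pandas and matplotlib are installed. The real dataset is not loaded here, so a tiny made-up dataframe `df` stands in for it, and the means it produces will not match the test-case values above. The column names `Type 1` and `Attack` come from the task; the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Tiny made-up stand-in for the Pokemon dataset; replace with the real data.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Type 1": ["Dragon", "Fire", "Dragon", "Water", "Grass", "Dragon"],
    "Attack": [100, 60, 120, 50, 40, 110],
})

# Mean attack over all rows.
mean_attack = df["Attack"].mean()

# Conditional filtering: keep only rows whose first type is Dragon.
dragon = df[df["Type 1"] == "Dragon"]
mean_dragon = dragon["Attack"].mean()

# Draw the histogram of Attack for Dragon types on an explicit axes object.
fig, ax = plt.subplots()
dragon.hist(column="Attack", bins=8, ax=ax)

# Vertical reference lines for the two means.
ax.axvline(x=mean_attack, color="green")
ax.axvline(x=mean_dragon, color="black")

fig.savefig("dragon_attack.png")

print(mean_attack)  # 80.0 for the made-up data
print(mean_dragon)  # 110.0 for the made-up data
```

Passing `ax=ax` keeps the histogram and both vertical lines on the same axes, which makes the two means easy to compare.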
Why use scatter plots?
A question for many data sets is whether two items are related to each other in some way, that is, are they correlated? In our case, you can ask whether Attack and Defense points are related to each other. A scatter plot helps us answer this kind of question.
What is a scatter plot?
A scatter plot is a two-dimensional data visualization that is used to represent the values obtained for two different variables – one plotted along the x-axis and the other plotted along the y-axis. For generating a scatter plot you need two numerical arrays of data.
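A minimal sketch of drawing a scatter plot from two numerical arrays, assuming matplotlib is installed. The Attack and Defense values are made up for illustration and are not taken from the Pokemon dataset; the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

# Made-up Attack and Defense values, one pair per Pokemon; not the real data.
attack = [45, 50, 60, 65, 80, 85, 100, 110]
defense = [40, 55, 50, 70, 75, 90, 95, 120]

fig, ax = plt.subplots()
points = ax.scatter(attack, defense)  # x values first, then y values
ax.set_xlabel("Attack")
ax.set_ylabel("Defense")
fig.savefig("attack_vs_defense.png")
```

The two sequences must have the same length, because each point pairs the i-th x value with the i-th y value.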
When to use a scatter plot?
A scatter plot helps us determine if two quantities are weakly or strongly correlated. Correlation means that as one variable changes, the other tends to change as well. While calculating the correlation coefficient will give us a precise number, a scatter plot helps us find outliers, gain a more intuitive sense of how spread out the data is, and compare more easily.
For example, the following scatter plot gives a clear indication of the relationship between Attack and Defense points. As you can see, there is a positive linear relationship between the Attack and Defense points.
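The "precise number" mentioned above is usually the Pearson correlation coefficient. Here is a small sketch that computes it by hand with only the standard library. The function name `pearson_r` and the data are made up for illustration, and the data is not taken from the Pokemon dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up Attack and Defense values; not the real data.
attack = [45, 50, 60, 65, 80, 85, 100, 110]
defense = [40, 55, 50, 70, 75, 90, 95, 120]

r = pearson_r(attack, defense)
print(round(r, 2))  # positive and close to 1 for this made-up data
```

A value near +1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little or no linear relationship.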
written by: Thomas Vengazhiyil Alex
reviewed by: Kothakota Viswanadh