How Does Reinforcement Learning Work?

Reinforcement learning has become more and more popular in recent years, powering everything from chess engines and self-driving cars to agents that defeat professional players in online and offline games. Before jumping directly into reinforcement learning, let's take a quick look at the broader machine learning concepts.

What is machine learning?

As we all know, machine learning is a subset of Artificial Intelligence (AI) that gives machines the ability to learn without being explicitly programmed. Data is the key ingredient throughout the ML process.

Types of Machine Learning

1. Supervised Learning: In supervised learning, we train the machine on labeled data so that it can predict outcomes for new, unseen inputs.

Suppose you have data (x, y), where x is the feature space and y is the target. The main goal is to learn a function that maps x to y (a minimal code sketch follows the examples below).

Examples:

I. Classification

II. Regression

III. Object detection

IV. Image captioning
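To make the mapping from x to y concrete, here is a minimal supervised learning sketch using scikit-learn. The tiny dataset below is made up purely for illustration.

```python
# A minimal supervised learning sketch: learn a mapping from x to y.
# The data below is invented purely for illustration.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]  # feature space x
y = [2, 4, 6, 8]          # labeled targets y

model = LinearRegression()
model.fit(X, y)           # learn the function that maps x to y

print(model.predict([[5]]))  # roughly [10.] for an unseen input
```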

2. Unsupervised learning:

In unsupervised learning, the model learns through observation and finds structure in the data. Based on relationships or other factors, it groups the data into clusters.

For example, Netflix can group users by the types of movies they watched in the past and recommend similar titles to each group.

Suppose you have data x, where x is the feature and there are no labels. The main goal is to learn some underlying hidden pattern in the data (a minimal code sketch follows the examples below).

Examples:

I. Clustering

II. Association
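Here is a minimal unsupervised learning sketch using scikit-learn's KMeans. The points are made up for illustration, and no labels are provided.

```python
# A minimal unsupervised learning sketch: find clusters in unlabeled data.
# The points below are invented for illustration; there are no labels.
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0],
     [10, 2], [10, 4], [10, 0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # assign each point to a discovered group

print(labels)  # e.g. [1 1 1 0 0 0]: two clusters found without labels
```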


3. Reinforcement Learning:

Reinforcement learning involves training algorithms using a system of rewards and punishments. A reinforcement learning algorithm, or agent, learns from the feedback it receives by interacting with its environment. It is more general than supervised and unsupervised learning.

A reinforcement learning problem involves an agent interacting with an environment. Imagine a self-driving car as our agent and a signboard as part of its environment. If the agent fails to take the right turn, there is a possibility of an accident or a crash; the agent then receives a penalty as feedback. On the next trial, it avoids making the same mistake while turning. In essence, the agent learns by trial and error.

In reinforcement learning, the agent repeatedly goes through the following loop (a minimal code sketch follows the list):

I. Observe the current state

II. Select an action using the policy

III. Perform the action

IV. Receive a reward or penalty

V. Update the policy

VI. Repeat until the policy is good enough…
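Here is a minimal sketch of this observe-act-reward-update loop. The ToyEnv environment and the random action choice are hypothetical stand-ins, written in the style of a Gym-like reset()/step() interface; a real agent would update its policy in step V instead of acting randomly.

```python
# A minimal sketch of the observe -> act -> reward -> update loop.
# ToyEnv is a hypothetical stand-in environment: the agent starts at
# state 0 and must reach state 3, losing a little reward on every step.
import random

class ToyEnv:
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):              # action: +1 (right) or -1 (left)
        self.state = max(0, self.state + action)
        done = self.state == 3
        reward = 1.0 if done else -0.1   # penalty for every extra step
        return self.state, reward, done

env = ToyEnv()
state = env.reset()                      # I.   observe the initial state
done = False
while not done:
    action = random.choice([-1, 1])      # II./III. select and perform an action
    state, reward, done = env.step(action)  # IV. receive a reward or penalty
    # V. a learning agent would update its policy here using the reward
    print(f"state={state} reward={reward}")
```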

Reinforcement learning is the science of decision making

For example: teaching a cat new tricks

Consider the scenario of teaching a cat some new tricks. The cat doesn't understand English, so we can't tell it what to do using language. Instead, we follow a different strategy: we set up a situation, and the cat tries to respond to it in different ways.

If the cat's response is the desired one, we reward it with some milk or a treat. Now, when we put the cat in the same situation on another day, it executes a similar action. In the same way, the cat learns what not to do from negative experiences.

How does reinforcement learning work in a broader sense?

a. Your cat is an "agent" that is exposed to an environment. The environment could be your house, or any other place, with you in it.

b. The situations the cat encounters are analogous to states. An example of a state could be your cat standing in your living room while you use a specific word in a certain tone.

c. Our agent reacts by performing an action to transition from one "state" to another; your cat goes from standing to sitting, for example. After the transition, the cat may receive a reward or a penalty in return: you give it milk, or say "No" as a penalty.

d. The policy is the strategy of choosing an action given a state in expectation of better outcomes.

Reinforcement learning definitions:

1. Agent: The RL model that learns by trial and error.

2. Environment: The world in which the agent moves and with which it interacts.

3. Action (A): The set of all possible moves the agent can take.

4. State (S): The current situation returned by the environment.

5. Reward (R): Immediate feedback from the environment, either positive or negative.

6. Policy (π): The strategy the agent uses to choose the next action based on the current state.

7. Value (V): The expected long-term return with discounting, as opposed to the short-term reward.

Classification of Reinforcement Learning Agents

1. Model-based reinforcement learning: The agent tries to understand the world and builds a model to represent it, which it can then use to plan its actions.

2. Model-free reinforcement learning: A model-free agent learns directly from experience: it performs actions in the real world or in simulation, collects rewards from the environment, whether positive or negative, and updates its value estimates accordingly.

Markov Decision Process

Markov Decision Processes (MDPs) are a formalization of sequential decision making, where actions influence not just the immediate reward but also subsequent situations, or states. MDPs involve delayed rewards and the need to trade off immediate reward against future reward. The next state and reward depend only on the current state and action.

In this method, the following components are used to find a solution:

1. Set of actions, A

2. Set of states, S

3. Reward, R

4. Policy, π

5. Value, V

The agent must take a series of actions to get from the start state to the end state, receiving a reward for each action it takes. The series of actions taken by the agent is called the policy, and the rewards collected along the way define the value. The main goal is to maximize the total reward by choosing the best policy.

Let's understand the Markov decision process with the shortest path problem.

In this problem, the goal is to find the best path between two points, collecting as much reward as possible along the way.

Consider a graph with four nodes, A, B, C, and D, connected by edges that each carry a reward:

1. The set of states is the set of nodes (A, B, C, D)

2. An action is a traversal from one node to another (e.g., A->B->C->D)

3. The reward is the value on each edge

4. The policy is the sequence of actions taken to get from one node to another

Suppose we want to go from A to D. We should choose the policy with the highest total reward. Following the policy A->B->D, the reward we get is 42; following A->C->D, the reward is 76, which is greater. Choosing the policy with the higher reward leads to better decisions. The short sketch below compares the two policies in code.
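Since the original figure is not reproduced here, the individual edge rewards below are hypothetical, chosen only so that the path totals match the article's 42 and 76.

```python
# Comparing policies on the A-B-C-D graph. The individual edge rewards
# are hypothetical; only the path totals (42 and 76) come from the text.
edge_reward = {
    ("A", "B"): 20, ("B", "D"): 22,   # A->B->D totals 42
    ("A", "C"): 40, ("C", "D"): 36,   # A->C->D totals 76
}

def total_reward(path):
    """Sum the rewards along consecutive edges of a path."""
    return sum(edge_reward[(a, b)] for a, b in zip(path, path[1:]))

policies = [["A", "B", "D"], ["A", "C", "D"]]
for p in policies:
    print("->".join(p), "reward =", total_reward(p))

best = max(policies, key=total_reward)
print("Best policy:", "->".join(best))  # A->C->D with reward 76
```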

Understanding Q-Learning

Using Q-learning, a reinforcement learning agent tries to learn the quality of its actions from the rewards R it receives, so that it can decide which sequence of actions A to perform to maximize reward in the long term.

Let's develop an intuition behind Q-learning.

Let's say you are playing a football game, and your current state is x(t). Your next step is to take an action a(i) to try to score a goal, where i ranges from 1 to n. You then move to a different position, the state changes to x(t+1), and you receive a reward r(i). In the end, each state has an associated value, called the Q-value. We denote the value of state x(t) under the decision process as DP[x(t)].

The decision process DP can then be written as:

DP[x(t)] = max over actions a(i) of { r(i) + DP[x(t+1)] }

This is a simplified form of the Bellman equation: Q(s, a) = r + max over a' of Q(s', a'), where s' is the state reached after taking action a in state s.

The above equation has the shape of a fixed-point equation x = f(x): comparing it with the Bellman equation, Q(s, a) plays the role of x and the right-hand side plays the role of f. The equation is solved iteratively, with an update at every iteration, so these updates are also called Bellman updates.
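As a tiny illustration of solving x = f(x) by iteration, consider the classic fixed-point example below. It is unrelated to RL; f(x) = cos(x) is just a convenient choice whose repeated application converges.

```python
# Fixed-point iteration: repeatedly replace x with f(x) until it settles.
# f(x) = cos(x) is an illustrative choice with a known fixed point.
import math

x = 0.0
for _ in range(50):
    x = math.cos(x)  # the "update": x <- f(x)

print(x)  # ~0.739085, the point where x == cos(x)
```

Bellman updates work the same way: the current Q-values are plugged into the right-hand side to produce improved Q-values, over and over.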

When we perform Bellman updates from experience to solve the decision process in this way, we call it Q-learning. A minimal sketch follows.
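Below is a minimal sketch of tabular Q-learning. The small chain environment and all the hyperparameters are hypothetical, chosen only to illustrate the Bellman update from experience.

```python
# A minimal sketch of tabular Q-learning on a hypothetical chain:
# states 0..4, the agent starts at 0 and is rewarded for reaching 4.
import random

N_STATES, ACTIONS, GOAL = 5, [-1, 1], 4
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else -0.1
    return nxt, reward, nxt == GOAL

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: usually exploit the best known action, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        # Bellman update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = nxt

# After training, the greedy action in every non-goal state should be +1 (right).
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```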

Reinforcement Learning Applications

  1. Self-Driving / Autonomous Cars
  2. Traffic light control systems
  3. Autonomous Drones
  4. Personalized Recommendations
  5. And many more…

Conclusion

This article is just an overview of RL concepts. I hope you liked it and found it useful. If you have worked on any RL projects, feel free to share your thoughts in the comments section below.

Resources

1. Awesome Reinforcement Learning GitHub repo

2. Book on Introduction to Reinforcement Learning

Thanks for reading!

Article by: Krishna Heroor

