28 Jan 2024

Deep Reinforcement Learning 01

Keypoints:

  • Introduction
    • What is Reinforcement Learning?
    • Why Deep Reinforcement Learning?

Introduction

What is Reinforcement Learning?

  • Agent: The learner and decision maker
  • Environment: Everything outside the agent
  • State: The current situation of the environment
    • Observation: The part of the state that the agent can actually perceive. The agent acts on its observation, which may be less than the full state.
  • Action: A choice the agent makes that affects the environment
  • Reward: The feedback the agent receives from the environment after taking an action
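
The terms above fit together in a loop: the agent observes, acts, and receives a reward, over and over. The following is a minimal sketch of that loop, not a definitive implementation. `CorridorEnv`, `run_episode`, and the `reset`/`step` method names are hypothetical names chosen for illustration, and the toy environment (a short corridor with a goal at the far end) is an assumption, not something from the original post.

```python
class CorridorEnv:
    """Hypothetical toy environment: a corridor with the goal in the last cell."""

    def __init__(self, length=4):
        self.length = length
        self.position = 0  # the state: where the agent currently stands

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (observation, reward, done)."""
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.position = max(0, min(self.length - 1, self.position + move))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode; policy maps an observation to an action."""
    observation = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(observation)                   # agent decides
        observation, reward, done = env.step(action)   # environment responds
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


def always_right(observation):
    """A fixed policy that ignores the observation and always moves right."""
    return 1


print(run_episode(CorridorEnv(), always_right))  # prints (1.0, 3)
```

Here the observation happens to equal the full state (the position). In many problems the agent sees only part of the state, which is why the two terms are kept separate.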

Supervised Learning vs Reinforcement Learning

Some problems don’t have a clear answer, but we can still train an agent to do well on them. Take the game of Go: nobody can label the best move for every board position, but the agent can learn from experience which moves tend to lead to good results. Supervised learning, by contrast, requires a known answer for every training example. In image classification, for instance, each image needs a label before the model can be trained.

Using Machine Learning to Evaluate the Action

Sometimes there is no clear way to evaluate an action. For example, when training an AI to imitate human conversation, we can’t know the best answer. However, we can use machine learning to do the evaluation. For example, a neural network can take the state of the environment as input and output an evaluation of the action. We can then use that evaluation to train the agent.
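
As a rough illustration of a learned evaluator, the sketch below fits a small model to a handful of human ratings and then uses it to score new candidate replies. Everything here is an assumption made for the example: a linear model stands in for the neural network to keep the code short, the two features (`on_topic`, `polite`) and the ratings are made up, and `train_evaluator` and `evaluate` are hypothetical names.

```python
def train_evaluator(examples, lr=0.1, epochs=500):
    """Fit score = w . x + b to (features, human_rating) pairs.

    Uses plain stochastic gradient descent on the squared error.
    A linear model is an assumed stand-in for a neural network.
    """
    n_features = len(examples[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, target in examples:
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - target
            # Gradient step for squared error (constant factor folded into lr).
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def evaluate(model, x):
    """Return the learned evaluation of a reply described by features x."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Made-up training data: features are [on_topic, polite], ratings in [0, 1].
rated_examples = [
    ([1, 1], 1.0),
    ([1, 0], 0.5),
    ([0, 1], 0.5),
    ([0, 0], 0.0),
]
model = train_evaluator(rated_examples)

# The evaluator now supplies the signal that the environment could not.
candidates = {
    "on-topic and polite": [1, 1],
    "on-topic but rude": [1, 0],
    "off-topic and rude": [0, 0],
}
best = max(candidates, key=lambda name: evaluate(model, candidates[name]))
print(best)
```

In an actual training setup the evaluator's output would be fed back to the agent as its reward rather than used only to pick a winner.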

Reward Delay

The reward for an action may be delayed. In chess, for example, the reward may not arrive until the end of the game, so an individual move gets no immediate feedback. Training therefore has to account for this delay and give earlier actions credit for the outcome they eventually led to.
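
One common way to account for delayed reward, which the post itself does not name, is to credit each step with the discounted sum of all rewards that come after it. The sketch below assumes that approach. `discounted_returns` and the discount factor `gamma` are illustrative names, and the reward sequence is made up to mimic a game whose only reward comes at the end.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Four moves, reward only on the last one (for example, winning the game).
print(discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.5))
# prints [0.125, 0.25, 0.5, 1.0]
```

Every move receives some credit for the final win, with moves closer to the outcome credited more. A `gamma` closer to 1 spreads the credit more evenly across the game.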

Why Deep Reinforcement Learning?

  • Deep = scalable learning from large, complex datasets
    • i.e. using a neural network to model the agent or the evaluation of the action
  • Reinforcement Learning = Optimization
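
To make "a neural network models the agent" concrete, here is a minimal sketch of a policy network: it maps an observation to a probability for each action. The layer sizes, the `tanh` activation, the random untrained weights, and the function names are all assumptions for illustration. Biases are left out for brevity, and no training step is shown.

```python
import math
import random


def make_policy_network(n_obs, n_hidden, n_actions, seed=0):
    """Create random weights for a network with one hidden layer (no biases)."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

    return {"w1": layer(n_obs, n_hidden), "w2": layer(n_hidden, n_actions)}


def action_probabilities(params, observation):
    """Forward pass: observation -> hidden layer -> softmax over actions."""
    hidden = [
        math.tanh(sum(w * o for w, o in zip(row, observation)))
        for row in params["w1"]
    ]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in params["w2"]]
    # Subtract the max logit before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


params = make_policy_network(n_obs=3, n_hidden=8, n_actions=2)
probs = action_probabilities(params, [0.1, -0.2, 0.3])
print(probs)  # two probabilities that sum to 1
```

The "Reinforcement Learning = Optimization" half of the picture would then adjust these weights so that actions leading to higher reward become more probable.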
