Reinforcement Learning

Education

03 Oct 2020 09:00 - 31 Oct 2020 12:00

Online Event

30 followers

What is Reinforcement Learning?

Reinforcement Learning is an approach to AI learning that resembles how humans learn: by trial and error, searching for the best way to handle a given problem. It has produced effective results such as AlphaGo, self-driving cars, and stock trading bots.

Reinforcement Learning differs fundamentally from Supervised Learning. Supervised Learning learns from existing data in order to predict data outside what it has seen, whereas Reinforcement Learning learns from the experience of trial and error, weighing the good and bad outcomes of each candidate solution in order to find the one that works best for that problem.
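
To make the trial-and-error idea concrete, here is a minimal sketch of an epsilon-greedy agent on a k-armed bandit, the first topic in the course. It is an illustration only, not course material: the function name `run_bandit`, the Gaussian reward model, and all parameter values are assumptions chosen for the example.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a k-armed bandit (illustrative sketch).

    true_means: hypothetical mean reward of each arm, hidden from the agent.
    Rewards are assumed Gaussian with standard deviation 1.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # estimated value of each action
    n = [0] * k     # number of times each action was tried
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)                    # explore: try a random arm
        else:
            action = max(range(k), key=lambda i: q[i])   # exploit: best estimate so far

        reward = rng.gauss(true_means[action], 1.0)      # feedback from the environment

        # Incremental sample-average update: Q <- Q + (R - Q) / N
        n[action] += 1
        q[action] += (reward - q[action]) / n[action]
        total_reward += reward

    return q, n, total_reward
```

Calling `run_bandit([0.0, 0.5, 2.0])` returns the learned estimates, the pull counts, and the total reward. No labelled dataset is involved: the agent only sees the reward of the action it just tried, which is the contrast with Supervised Learning described above.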

Examples of topics we will cover in detail:

1. k-Armed Bandit Problem

2. Markov Decision Process

3. Dynamic Programming

4. Monte Carlo

5. Temporal Difference Learning

6. Sarsa

7. Q-Learning

8. Double Q-Learning

Course concept

1. The mathematics used to build Reinforcement Learning

2. Writing code to build Reinforcement Learning

3. Examples of applying Reinforcement Learning in real life

Course highlights

1. Difficult mathematics is turned into "pictures" and explained in "language that a general audience can understand"

2. Students see the overview and the continuity of the content

3. Every step is explained in detail and rigorously (already well digested for you)

4. Worked examples calculated by hand (so students can practice and review their understanding)

5. The course covers all three components: I) theory, II) coding, III) applications

Who this course is for

1. Anyone interested in studying Reinforcement Learning

Agenda

Week

Content

Day 1

(03/10/2020)

Day 1 - Morning

Introduction to Reinforcement Learning
Examples of Reinforcement Learning in use today
Concept: Agent vs Environment
How does Reinforcement Learning become intelligent?
Understanding the problem through the k-Armed Bandit Problem
Understanding the Action-value Method and Incremental Implementation
Worked example: Incremental Implementation
The difference between Stationary and Non-stationary Problems
Example scenarios that are Stationary vs Non-stationary
Understanding UCB (a more effective form of greedy)
Worked example: UCB
Understanding Associative Search: a discussion leading to more complex problems

Day 1 - Afternoon

Understanding the Markov Decision Process
Introduction to Rewards and Returns
Policies and Value Functions
Worked example: Policies and Value Functions
Bellman Equations: Optimal Policies and Optimal Value Functions
Worked example: Optimal Policies and Optimal Value Functions
Optimality and Approximation: a discussion leading to approximating the optimal policy

Day 2

(10/10/2020)

Day 2 - Morning

Dynamic Programming (DP): the first approach to approximating the optimal policy
Policy Evaluation, Policy Improvement, and Policy Iteration
Worked example: Policy Evaluation
Worked example: Policy Improvement
Worked example: Policy Iteration
Value Iteration: an improvement on Policy Iteration
Worked example: Value Iteration

Day 2 - Afternoon

Asynchronous DP
Worked example: Asynchronous DP
Generalized Policy Iteration (GPI): the general idea behind approximating the optimal policy
Monte Carlo Prediction
Worked example: Monte Carlo Prediction
Monte Carlo Control (GPI under the Monte Carlo approach)
Worked example: Monte Carlo Control
On-policy and Off-policy Predictions
Example scenarios that are On-policy vs Off-policy

Day 3

(17/10/2020)

Day 3 - Morning

Off-policy Monte Carlo Control
Worked example: Off-policy Monte Carlo Control
Temporal-Difference Learning (TD): an approach that approximates the optimal policy faster
Worked example: TD
TD vs Monte Carlo
Example comparison of TD and Monte Carlo
Considerations when using TD and ways to improve it

Day 3 - Afternoon

Sarsa
Worked example: Sarsa
Q-Learning
Worked example: Q-Learning
Expected Sarsa
Worked example: Expected Sarsa
Double Q-Learning
Worked example: Double Q-Learning

Day 4

(24/10/2020)

Day 4 - Morning

n-step TD Methods
Worked example: n-step TD Methods
n-step Sarsa
Worked example: n-step Sarsa

Day 4 - Afternoon

Off-policy Learning without Importance Sampling
Worked example: Off-policy Learning without Importance Sampling
n-step Off-policy Learning
Worked example: n-step Off-policy Learning

Day 5

(31/10/2020)

Day 5 - Morning

Models and Planning
Dyna: Integrated Planning, Acting and Learning
Worked example: Dyna
Situations where the model has problems

Day 5 - Afternoon

Expected vs Sample Updates
Worked example: Expected and Sample Updates
Real-time Dynamic Programming
Worked example: Real-time Dynamic Programming
Monte Carlo Tree Search

* What participants need to bring

A laptop with Anaconda installed (Python version 3.7)

Instructor

Ajarn ฆฤณ ชินประสาทศักดิ์

AI researcher and AI consultant to private companies

Quantitative Researcher and founder of the companies Made by AI and Quant Metric

For more details, contact

Email: krin.c@madebyai.io

Call: 086-524-4463 (Krin)

tech, programming, ai

Organized by

TAUTOLOGY
