
Reinforcement Learning

Foundations

Markov Decision Process - states, actions, policies, rewards, returns, and the Markov decision process
Bellman Equation
Bellman Optimality Equation
Iteration - value iteration, policy iteration, and truncated policy iteration

Algorithms

Monte Carlo - Monte Carlo learning
Stochastic Approximation - stochastic gradient descent
Temporal-Difference - TD learning and Q-learning
Value Function Approximation
Policy Gradient - policy gradient algorithms
Actor-Critic - actor-critic algorithms
