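# Reinforcement Learning

Reinforcement learning is a machine learning approach in which an agent interacts with an environment. The agent adjusts its policy according to the feedback (rewards) that the environment returns, so it learns from rewards and mistakes. In the policy-gradient formulation below, the policy parameters $\theta$ are updated by gradient ascent to maximize the expected long-term reward $\bar{R}_\theta$.

$$
\begin{equation}
\begin{split}
\theta^* &= \arg\max\limits_\theta \bar{R}_\theta = \arg\max\limits_\theta \sum\limits_{\tau} R(\tau) P(\tau \mid \theta) \\
\theta_{t+1} &= \theta_t + \eta \nabla \bar{R}_\theta \\
\nabla \bar{R}_\theta &= \begin{bmatrix} \frac{\partial \bar{R}_\theta}{\partial w_1} \\ \frac{\partial \bar{R}_\theta}{\partial w_2} \\ \vdots \\ \frac{\partial \bar{R}_\theta}{\partial b_1} \\ \vdots \end{bmatrix} \\
R_t &= \sum\limits_{n=t}^{N} \gamma^{n-t} r_n
\end{split}
\end{equation}
$$

In these formulas:

- $\tau$ is a trajectory, meaning a sequence of states, actions and rewards.
- $R(\tau)$ is the total reward of the trajectory.
- $P(\tau \mid \theta)$ is the probability of generating $\tau$ under the policy with parameters $\theta$.
- $\eta$ is the learning rate.
- $w_i$ and $b_i$ are the individual weights and biases that make up $\theta$.
- $R_t$ is the discounted return from step $t$ onward, where $\gamma$ is the discount factor and $r_n$ is the reward at step $n$.

The index $t$ has two meanings here. In the update rule it counts update iterations, and in $R_t$ it counts time steps.

## Actor-Critic Model

## Inverse Reinforcement Learning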
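The following is a minimal Python sketch of the policy-gradient objective at the top of this page. It rests on assumptions that the formulas do not state.

- `discounted_returns` computes $R_t$ with the backward recursion $R_t = r_t + \gamma R_{t+1}$. This recursion is equivalent to the sum in the last formula.
- `reinforce_bandit` applies $\theta_{t+1} = \theta_t + \eta \nabla \bar{R}_\theta$ to a softmax policy on a hypothetical multi-armed bandit. In this setting, every trajectory is a single arm pull.
- The formulas do not say how $\nabla \bar{R}_\theta$ is estimated. This sketch assumes the standard likelihood-ratio (REINFORCE) estimate $R(\tau)\nabla \log P(\tau \mid \theta)$, computed from one sampled trajectory per step.
- The function names, the bandit setup, the reward noise, and the values of `eta` and `steps` are illustrative choices. They are not taken from the source.

```python
import math
import random


def discounted_returns(rewards, gamma):
    """Return [R_0, ..., R_N] where R_t = sum_{n=t}^{N} gamma^(n-t) * r_n.

    Computed backwards with R_t = r_t + gamma * R_{t+1}, which avoids the
    O(N^2) cost of evaluating the sum separately for every t.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax: action probabilities pi(a | theta)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=2000, eta=0.1, seed=0):
    """Gradient ascent on the expected reward of a softmax policy.

    Hypothetical setup: each trajectory tau is one pull of a bandit arm,
    so P(tau | theta) = pi(a | theta) and R(tau) is the single reward.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action (one trajectory) from the current policy.
        a = rng.choices(range(len(theta)), weights=probs)[0]
        # Assumed reward model: Gaussian noise around the arm's mean.
        reward = rng.gauss(arm_means[a], 0.1)
        for k in range(len(theta)):
            # d log pi(a) / d theta_k = 1[k == a] - pi(k) for a softmax policy.
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            # Stochastic gradient-ascent step: theta <- theta + eta * R(tau) * grad log P.
            theta[k] += eta * reward * grad_log
    return softmax(theta)
```

For example, `discounted_returns([1, 1, 1], 0.5)` gives `[1.75, 1.5, 1.0]`. With `reinforce_bandit([0.0, 1.0])`, the policy should end up putting most of its probability on the second, higher-reward arm. Using a single sample per update keeps the sketch short, but the gradient estimate is noisy. In practice, the estimate is usually averaged over many trajectories, and a baseline is often subtracted from $R(\tau)$ to reduce variance.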