Mean Squared Value Error (MSVE):
$$J(w) = \mathbb{E}\left[\left(v_\pi(S) - \hat{v}(S, w)\right)^2\right]$$
Mean Squared Bellman Error (MSBE):
$$J(w) = \mathbb{E}\left[\left(r + \gamma \hat{v}(S', w) - \hat{v}(S, w)\right)^2\right]$$
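As a rough illustration, both objectives can be estimated from samples as plain averages. The sketch below assumes a linear approximator, v_hat(s, w) = w · x(s). The helper names (`v_hat`, `msve`, `msbe`), the one-hot features, and the toy data are hypothetical and only serve the example.

```python
def v_hat(x, w):
    """Linear value estimate: dot product of features x and weights w (assumed form)."""
    return sum(xi * wi for xi, wi in zip(x, w))


def msve(samples, w):
    """Sample average of (v_pi(s) - v_hat(s, w))^2.

    samples: list of (features, true_value) pairs; true values are assumed known here.
    """
    return sum((v - v_hat(x, w)) ** 2 for x, v in samples) / len(samples)


def msbe(transitions, w, gamma):
    """Sample average of (r + gamma * v_hat(s', w) - v_hat(s, w))^2.

    transitions: list of (features, reward, next_features) triples.
    """
    total = 0.0
    for x, r, x_next in transitions:
        td_error = r + gamma * v_hat(x_next, w) - v_hat(x, w)
        total += td_error ** 2
    return total / len(transitions)


# Hypothetical toy data: two states with one-hot features.
w = [1.0, 2.0]
value_samples = [([1.0, 0.0], 1.5), ([0.0, 1.0], 2.0)]
transitions = [([1.0, 0.0], 0.0, [0.0, 1.0]), ([0.0, 1.0], 1.0, [1.0, 0.0])]

print(msve(value_samples, w))
print(msbe(transitions, w, gamma=0.5))
```

Note that `msve` needs the true values v_pi(s), while `msbe` needs only observed transitions.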
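Stochastic gradient descent update for the MSVE objective (the constant factor 2 is absorbed into the step size α_t):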
$$w_{t+1} = w_t + \alpha_t \left(v_\pi(s_t) - \hat{v}(s_t, w_t)\right) \nabla_w \hat{v}(s_t, w_t)$$
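A minimal sketch of this update, assuming a linear approximator so that the gradient of v_hat with respect to w is simply the feature vector x(s). The function names, the fixed step size, and the toy targets are hypothetical choices for illustration.

```python
def v_hat(x, w):
    """Linear value estimate w . x (assumed form)."""
    return sum(xi * wi for xi, wi in zip(x, w))


def sgd_update(w, x, target, alpha):
    """One step: w <- w + alpha * (target - v_hat(s, w)) * grad_w v_hat(s, w).

    For a linear approximator, grad_w v_hat(s, w) equals the feature vector x.
    """
    error = target - v_hat(x, w)
    return [wi + alpha * error * xi for wi, xi in zip(w, x)]


# Hypothetical toy problem: two states with one-hot features and known v_pi.
samples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]
w = [0.0, 0.0]
for _ in range(200):
    for x, v_pi in samples:
        w = sgd_update(w, x, v_pi, alpha=0.1)

print(w)  # weights approach the target values 1.0 and 2.0
```

In practice v_pi(s_t) is unknown, so the `target` argument would be an estimate, such as a Monte Carlo return or a TD target r + γ v_hat(s', w).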
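DQN approximates Q(s, a) with a deep neural network and trains it with experience replay, which:
- Breaks the correlations between consecutive samples
- Improves data efficiency, since each stored transition can be reused in several updates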
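The two points above can be sketched with a simple replay buffer. This is an illustrative version using only the Python standard library, not the implementation of any particular DQN codebase. The class name `ReplayBuffer`, its methods, and the transition layout are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement mixes transitions from different
        # time steps, which breaks the correlation between consecutive samples.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Hypothetical usage: integers stand in for states.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)

batch = buffer.sample(batch_size=2)
print(len(buffer), batch)
```

Because sampled transitions stay in the buffer, the same transition can appear in many minibatches before it is evicted.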