DDPG Project (Worth Bookmarking)


1. The key difference between DQN and DDPG in Q-function learning is how the target's next-state maximum Q-value is obtained: in DDPG it is estimated with the help of the actor, not by the critic alone. (In a continuous action space, the critic cannot find the maximum Q-value without running an optimization over actions, so the practical choice is to let the actor supply the best action directly.)
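To make the contrast concrete, here is a minimal sketch of the two next-state value computations. The network shapes, the single-layer models, and names such as `dqn_target` and `n_actions` are assumptions chosen for illustration and do not come from the original project.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
state_dim, action_dim, n_actions, batch_size = 3, 2, 4, 5
next_states = torch.randn(batch_size, state_dim)

# DQN (discrete actions): the target Q-network scores every action at once,
# so the next-state value is a plain max over its outputs.
dqn_target = nn.Linear(state_dim, n_actions)  # hypothetical toy Q-network
with torch.no_grad():
    dqn_next_q = dqn_target(next_states).max(dim=1, keepdim=True)[0]

# DDPG (continuous actions): the target actor proposes the action, and the
# target critic scores that single (state, action) pair. No max is taken.
actor_target = nn.Sequential(nn.Linear(state_dim, action_dim), nn.Tanh())
critic_target = nn.Linear(state_dim + action_dim, 1)  # toy critic on [s, a]
with torch.no_grad():
    next_actions = actor_target(next_states)
    ddpg_next_q = critic_target(torch.cat([next_states, next_actions], dim=1))

print(dqn_next_q.shape, ddpg_next_q.shape)
```

Both results have one value per sample, but only the DQN branch needs an explicit `max`.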

 

The code in the first screenshot is wrong (the numbers below refer to lines in that code):

71. The critic_target network already outputs the maximum Q-value, because it evaluates the action proposed by the actor_target network, so no additional max operation is needed. (DQN does need the max, because there the next maximum Q-value is estimated directly by the target Q-network itself.)

72. The critic (Q-function) in DDPG takes the action as an input and directly outputs the Q-value for it, so there is no need to gather the Q-value by action index.

74. The optimizer accumulates gradients across backward passes, so call optimizer.zero_grad() to clear them (rather than network.zero_grad()).

75. After the backward pass has computed the gradients, the optimizer must call step() to apply the update.

Also, do not forget to account for terminal states: multiply the bootstrapped next-state Q-value by (1 - dones).
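The corrections above can be put together in a short sketch of the critic update. This is not the original project's code: the `Critic` class, the `update_critic` helper, the MSE loss, the Adam optimizer, and all sizes are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Critic(nn.Module):
    """Hypothetical critic: Q(s, a) with the action as an explicit input."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, states, actions):
        return self.net(torch.cat([states, actions], dim=1))

def update_critic(critic_local, critic_target, actor_target, optimizer,
                  batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch
    with torch.no_grad():
        next_actions = actor_target(next_states)
        q_next = critic_target(next_states, next_actions)  # no extra max
        # (1 - dones) removes the bootstrap term at terminal states.
        q_targets = rewards + gamma * q_next * (1 - dones)
    q_expected = critic_local(states, actions)  # no gather by action index
    loss = F.mse_loss(q_expected, q_targets)    # MSE is an assumed choice
    optimizer.zero_grad()  # clear gradients accumulated by earlier passes
    loss.backward()        # compute gradients
    optimizer.step()       # apply the update
    return loss.item()

# Toy data, only to show the call.
torch.manual_seed(0)
state_dim, action_dim, batch_size = 3, 2, 8
critic_local = Critic(state_dim, action_dim)
critic_target = Critic(state_dim, action_dim)
actor_target = nn.Sequential(nn.Linear(state_dim, action_dim), nn.Tanh())
critic_optimizer = torch.optim.Adam(critic_local.parameters(), lr=1e-3)
batch = (torch.randn(batch_size, state_dim),          # states
         torch.rand(batch_size, action_dim) * 2 - 1,  # actions in [-1, 1]
         torch.randn(batch_size, 1),                  # rewards
         torch.randn(batch_size, state_dim),          # next_states
         torch.zeros(batch_size, 1))                  # dones
weights_before = [p.detach().clone() for p in critic_local.parameters()]
critic_loss = update_critic(critic_local, critic_target, actor_target,
                            critic_optimizer, batch)
```

Rewards and dones are kept as column tensors of shape (batch, 1) so that they broadcast cleanly against the critic output.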


 

 

79. In the actor update, the action passed to critic_local is not the sampled action from the batch but the action produced by the actor for those states (be careful with this). The resulting Q-values should also be averaged over the batch. Finally, we want to maximize performance, but the optimizer minimizes its objective, so the loss must carry a negative sign.
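A minimal sketch of that actor update follows. The toy single-layer `Critic`, the SGD optimizer, and all sizes are assumptions for illustration, not the original project's networks.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Hypothetical toy critic that scores a (state, action) pair."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.fc = nn.Linear(state_dim + action_dim, 1)

    def forward(self, states, actions):
        return self.fc(torch.cat([states, actions], dim=1))

torch.manual_seed(0)
state_dim, action_dim, batch_size = 3, 2, 8
actor_local = nn.Sequential(nn.Linear(state_dim, action_dim), nn.Tanh())
critic_local = Critic(state_dim, action_dim)
actor_optimizer = torch.optim.SGD(actor_local.parameters(), lr=0.01)

states = torch.randn(batch_size, state_dim)
actor_before = [p.detach().clone() for p in actor_local.parameters()]
critic_before = [p.detach().clone() for p in critic_local.parameters()]

# Use the actor's own actions, not the actions stored in the batch.
actions_pred = actor_local(states)
# Mean Q-value over the batch, negated because the optimizer minimizes.
actor_loss = -critic_local(states, actions_pred).mean()

actor_optimizer.zero_grad()
actor_loss.backward()
actor_optimizer.step()  # only the actor's parameters are updated here
```

The backward pass also leaves gradients on the critic's parameters, which is one more reason to clear gradients before the critic's own update.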


In soft_update, remember to copy through the .data attribute of each parameter.
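One common way to write such a soft update is sketched below. The signature `soft_update(local_model, target_model, tau)` and the constant-initialized toy layers are assumptions for illustration; the original screenshot is not available.

```python
import torch
import torch.nn as nn

def soft_update(local_model, target_model, tau):
    """Blend weights in place: target = tau * local + (1 - tau) * target."""
    for target_param, local_param in zip(target_model.parameters(),
                                         local_model.parameters()):
        # Work on .data so the copy is not tracked by autograd.
        target_param.data.copy_(
            tau * local_param.data + (1.0 - tau) * target_param.data)

# Toy check: local weights are all 1, target weights are all 0.
local_net = nn.Linear(2, 2)
target_net = nn.Linear(2, 2)
for p in local_net.parameters():
    nn.init.constant_(p, 1.0)
for p in target_net.parameters():
    nn.init.constant_(p, 0.0)

soft_update(local_net, target_net, tau=0.1)
print(target_net.weight)  # every entry moves 10% of the way toward 1
```

A small tau makes the target network trail the local network slowly, which keeps the TD targets stable between updates.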


 

 


Published by: 全栈程序员-用户IM. When reposting, please cite the source: https://javaforall.cn/148618.html. Original link: https://javaforall.cn
