DDPG Project (Recommended Bookmark)

全栈程序员-用户IM • June 28, 2022, 10:36 PM • Uncategorized

Hello everyone, nice to see you again. I am your friend 全栈君.

1. The key difference between DQN and DDPG in learning the Q function is that, in DDPG, the action used for the target's next-state maximum Q value comes from the actor, not from the critic itself. (In a continuous action space, the critic cannot find the maximum Q value without running an optimization over actions, so the practical choice is to let the actor supply the best action directly.)

The code in the first picture is wrong (the numbers below refer to line numbers in that picture):

71: The critic_target network already outputs the (approximate) maximum Q value, because it is evaluated at the action chosen by the actor_target network, so no additional max operation is needed. (DQN does need the max operation, because there the next-state maximum Q value is taken directly from the critic_target, i.e. the Q value function, itself.)

72: The critic (Q function) in DDPG takes the action as an input and directly outputs the Q value for that action, so there is no need to gather the Q value by action index.

74: Gradients accumulate across backward passes, so call optimizer.zero_grad() to clear them before each update (instead of network.zero_grad()).

75: Once the error has been backpropagated, the optimizer must call step() to apply the update.

Also, do not forget to account for terminal states by multiplying the next-state Q value by (1 - dones).
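The critic target described in the points above can be sketched as follows. This is a minimal illustration in plain Python, not the code from the pictures: scalars and lists stand in for tensors, `actor_target` and `critic_target` are passed in as ordinary callables, and the function name `ddpg_td_targets` and the parameter `gamma` are assumptions made for this example.

```python
from typing import Callable, List, Sequence


def ddpg_td_targets(
    rewards: Sequence[float],
    next_states: Sequence[float],
    dones: Sequence[float],
    actor_target: Callable[[float], float],
    critic_target: Callable[[float, float], float],
    gamma: float = 0.99,
) -> List[float]:
    """Compute y = r + gamma * Q_target(s', actor_target(s')) * (1 - done)."""
    targets = []
    for reward, next_state, done in zip(rewards, next_states, dones):
        # The target actor supplies the next action; no max over actions is taken.
        next_action = actor_target(next_state)
        # The critic takes (state, action) and returns one Q value; no gather by index.
        next_q = critic_target(next_state, next_action)
        # (1 - done) removes the bootstrap term when the episode has ended.
        targets.append(reward + gamma * next_q * (1.0 - done))
    return targets
```

With a tensor library the same expression would be evaluated on whole batches at once, but the structure of the target stays the same.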

79: In the actor learning step, the action passed to critic_local is not the sampled action but the action predicted by the actor (be careful with this). The resulting Q values should also be averaged over the batch. Finally, we want to maximize performance, but the optimizer minimizes its objective, so the loss needs a negative sign.
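The actor objective described above can be sketched like this. It is a plain-Python illustration under stated assumptions: `actor_local` and `critic_local` are ordinary callables standing in for the networks, and the function name `actor_loss` is hypothetical.

```python
from typing import Callable, Sequence


def actor_loss(
    states: Sequence[float],
    actor_local: Callable[[float], float],
    critic_local: Callable[[float, float], float],
) -> float:
    """Return -mean(Q(s, actor(s))) over a batch of states."""
    q_values = []
    for state in states:
        # Use the action predicted by the actor, not the sampled action.
        predicted_action = actor_local(state)
        q_values.append(critic_local(state, predicted_action))
    # Average over the batch, then negate: minimizing this maximizes Q.
    return -sum(q_values) / len(q_values)
```

Minimizing this value with a standard optimizer pushes the actor toward actions that the critic rates more highly.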

In soft_update, remember to copy the values through the parameters' data attribute.
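The blending rule behind a soft update can be sketched as below. This is an assumption-laden illustration: plain Python lists stand in for network parameters, and the interpolation factor `tau` is a name chosen for this example. In a PyTorch implementation the in-place write would typically go through each target parameter's `data` attribute rather than a list index.

```python
from typing import List


def soft_update(local_params: List[float], target_params: List[float], tau: float) -> None:
    """Blend in place: target = tau * local + (1 - tau) * target."""
    for i, (local_value, target_value) in enumerate(zip(local_params, target_params)):
        # Overwrite the stored target value; the local parameters are left untouched.
        target_params[i] = tau * local_value + (1.0 - tau) * target_value
```

A small `tau` makes the target network follow the local network slowly, which keeps the learning targets more stable.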

Publisher: 全栈程序员-用户IM. Please credit the source when reposting: https://javaforall.cn/148618.html Original link: https://javaforall.cn
