Note that the Roboschool reward scales are different from MuJoCo's. All results are from runs of 4 sessions with distinct random seeds.
`mean_returns_ma` is the moving average of the returns over 100 checkpoints, averaged across the sessions.
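
As a rough illustration of how such a metric could be computed, the sketch below averages per-session returns at each checkpoint and then applies a trailing moving average with a window of 100 checkpoints. The function name `mean_returns_ma`, the array layout (sessions by checkpoints), and the order of operations are assumptions made for this example; the benchmark's actual implementation may differ.

```python
import numpy as np

def mean_returns_ma(session_returns, window=100):
    """Hypothetical helper: trailing moving average of session-averaged returns.

    session_returns: array-like of shape (num_sessions, num_checkpoints).
    Returns an array of length num_checkpoints.
    """
    returns = np.asarray(session_returns, dtype=float)
    # Average across sessions (one session per random seed) at each checkpoint.
    mean_returns = returns.mean(axis=0)
    ma = np.empty_like(mean_returns)
    for i in range(len(mean_returns)):
        # Early checkpoints use however many points are available so far.
        start = max(0, i - window + 1)
        ma[i] = mean_returns[start:i + 1].mean()
    return ma

# Usage with synthetic data: 4 sessions, 300 checkpoints each.
rng = np.random.default_rng(0)
fake_returns = rng.normal(loc=2000.0, scale=50.0, size=(4, 300))
final_value = mean_returns_ma(fake_returns)[-1]  # value one might report in a table
```

Under these assumptions, the last element of the returned array plays the role of the single number reported per environment.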
| Env. \ SAC | mean_returns_ma | graph |
|---|---|---|
| RoboschoolAnt | 2451.55 | ![]() |
| RoboschoolHalfCheetah | 2004.27 | ![]() |
| RoboschoolHopper | 2090.52 | ![]() |
| RoboschoolWalker2d | 1711.92 | ![]() |

| Trial graph | Moving average |
|---|---|
| ![]() | ![]() |





