Controller: Evolution Strategies

Learn how the controller is trained with the CMA-ES evolution strategy.

Overview

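The Controller (C) in World Models is a single-layer linear model that maps the combined representation (z, h) to an action: a = W [z, h] + b, where z is the latent vector produced by the VAE and h is the hidden state of the MDN-RNN.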
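The mapping from (z, h) to an action can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the dimensions, the flat parameter layout, the function names, and the tanh squashing of the output are all assumptions made for the example.

```python
import numpy as np

# Illustrative sizes, assumed for this sketch.
Z_DIM, H_DIM, ACTION_DIM = 32, 256, 3


def unpack_params(params):
    """Split a flat parameter vector into a weight matrix W and a bias b."""
    n_w = ACTION_DIM * (Z_DIM + H_DIM)
    W = params[:n_w].reshape(ACTION_DIM, Z_DIM + H_DIM)
    b = params[n_w:]
    return W, b


def controller_action(params, z, h):
    """Linear controller a = W [z, h] + b, squashed to [-1, 1] (assumed) with tanh."""
    W, b = unpack_params(params)
    return np.tanh(W @ np.concatenate([z, h]) + b)


# Total number of parameters an optimizer would have to search over.
n_params = ACTION_DIM * (Z_DIM + H_DIM) + ACTION_DIM

rng = np.random.default_rng(0)
action = controller_action(
    rng.normal(size=n_params),  # random controller parameters
    rng.normal(size=Z_DIM),     # stand-in for a VAE latent
    rng.normal(size=H_DIM),     # stand-in for an MDN-RNN hidden state
)
```

Keeping every parameter in one flat vector is a deliberate choice here: an evolution strategy only needs to propose and score such vectors, without knowing how they are reshaped inside the controller.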

Why a Simple Controller?

The key insight is that most of the "intelligence" is captured by the World Model:

The VAE extracts relevant visual features
The MDN-RNN learns environment dynamics

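The controller only needs to map compressed representations to actions. Keeping it this small also keeps the number of parameters low, which is what makes a gradient-free search over them practical.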

Evolution Strategies

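Instead of gradient-based optimization, the controller is trained using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). CMA-ES repeatedly samples candidate parameter vectors from a multivariate Gaussian, scores each one by the cumulative reward of a rollout, and then shifts the mean and covariance of the Gaussian toward the better-scoring candidates.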

Why Evolution Strategies?

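Gradient-free: Only the cumulative reward of a rollout is needed, so neither the reward nor the environment has to be differentiable
Parallelizable: Each candidate is evaluated independently, so many can be evaluated simultaneously
Robust search: Sampling a whole population makes the search less likely to get stuck in poor local optima, although finding the global optimum is not guaranteed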
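The sample, evaluate, update loop behind these properties can be sketched as follows. Note that this is a deliberately simplified evolution strategy, not full CMA-ES: it adapts only a mean and a per-parameter standard deviation, with no full covariance matrix, evolution paths, or step-size control. The function name `simple_es`, its hyperparameters, and the toy objective are assumptions made for illustration.

```python
import numpy as np


def simple_es(fitness_fn, n_params, pop_size=32, elite_frac=0.25,
              sigma0=0.5, generations=50, seed=0):
    """Maximize fitness_fn with a simplified ES (diagonal Gaussian search)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(n_params)
    std = np.full(n_params, sigma0)
    n_elite = max(2, int(pop_size * elite_frac))
    for _ in range(generations):
        # Sample a population of candidate parameter vectors.
        population = mean + std * rng.normal(size=(pop_size, n_params))
        # Score each candidate; the calls are independent, so this loop
        # is the part that could be distributed across workers.
        scores = np.array([fitness_fn(p) for p in population])
        # Keep the best-scoring candidates.
        elite = population[np.argsort(scores)[-n_elite:]]
        # Spread is measured around the OLD mean, so the search widens
        # while it is still travelling and narrows once it has arrived.
        std = np.sqrt(np.mean((elite - mean) ** 2, axis=0)) + 1e-3
        mean = elite.mean(axis=0)
    return mean


# Toy objective whose only interface is "parameters in, score out".
target = np.array([1.0, -2.0, 0.5])
best = simple_es(lambda p: -np.sum(np.abs(p - target)), n_params=3)
```

In a World Models setting, `fitness_fn` would be the cumulative reward of a rollout driven by the controller parameters, which is why no gradient of the reward is ever required.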

Training in Dreams

A remarkable feature of World Models is dream training:

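Collect real data: Run random rollouts in the actual environment
Train V and M: Fit the VAE (V) and the MDN-RNN (M) on the collected data to learn the World Model
Train C in dreams: Use the MDN-RNN to simulate experience and optimize the controller (C) inside that simulation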

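In the original World Models experiments, a controller trained entirely inside the dream on the VizDoom Take Cover task transferred to the real environment and still solved it. Because the controller can learn to exploit flaws in an imperfect model, the dream is made harder than the learned dynamics by raising the sampling temperature of the MDN-RNN.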
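A dream rollout can be sketched as a loop in which the learned model, not the real environment, produces the next state and reward. In the sketch below, `ToyDreamModel` is a hypothetical stand-in with random dynamics and a made-up reward. It is not a trained MDN-RNN, and all names and sizes are assumptions chosen to keep the example self-contained.

```python
import numpy as np

# Small illustrative sizes, assumed for this sketch.
Z_DIM, H_DIM, ACTION_DIM = 4, 8, 2


class ToyDreamModel:
    """Hypothetical stand-in for a trained MDN-RNN (random, seeded dynamics)."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(scale=0.3, size=(H_DIM, H_DIM + Z_DIM + ACTION_DIM))
        self.B = rng.normal(scale=0.3, size=(Z_DIM, H_DIM))

    def step(self, z, h, action, rng):
        """Update the hidden state, sample the next latent, and emit a reward."""
        h_next = np.tanh(self.A @ np.concatenate([h, z, action]))
        # The next latent is sampled rather than predicted deterministically.
        z_next = self.B @ h_next + 0.1 * rng.normal(size=Z_DIM)
        reward = -float(np.sum(z_next ** 2))  # toy reward, invented for the sketch
        return z_next, h_next, reward


def dream_rollout(params, model, steps=50, seed=0):
    """Cumulative reward of a linear controller run entirely inside the model."""
    rng = np.random.default_rng(seed)
    W = params[:-ACTION_DIM].reshape(ACTION_DIM, Z_DIM + H_DIM)
    b = params[-ACTION_DIM:]
    z, h, total = rng.normal(size=Z_DIM), np.zeros(H_DIM), 0.0
    for _ in range(steps):
        action = np.tanh(W @ np.concatenate([z, h]) + b)
        z, h, reward = model.step(z, h, action, rng)
        total += reward
    return total


n_params = ACTION_DIM * (Z_DIM + H_DIM) + ACTION_DIM
score = dream_rollout(np.zeros(n_params), ToyDreamModel())
```

The value returned by `dream_rollout` plays the role of the fitness that an evolution strategy would maximize. No call to the real environment appears anywhere in the loop.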
