Controller: Evolution Strategies
Overview
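The Controller (C) in World Models is a simple linear model that maps the combined representation (z, h) to actions, where z is the VAE's latent encoding of the current frame and h is the hidden state of the MDN-RNN.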
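As a rough illustration of how small such a model is, the mapping can be written as a = W [z, h] + b. The sketch below is a minimal plain-Python version, not a reference implementation: the sizes `Z_DIM`, `H_DIM`, and `ACTION_DIM`, the flat parameter layout, and the function names are all assumptions chosen for illustration.

```python
Z_DIM, H_DIM, ACTION_DIM = 32, 256, 3  # assumed sizes, for illustration only


def num_params(z_dim=Z_DIM, h_dim=H_DIM, action_dim=ACTION_DIM):
    """Number of weights plus biases in the linear controller."""
    return (z_dim + h_dim) * action_dim + action_dim


def controller_action(params, z, h, action_dim=ACTION_DIM):
    """Compute a = W [z, h] + b from a flat parameter list.

    Assumed layout: the rows of W come first, followed by the bias vector b.
    """
    x = list(z) + list(h)  # concatenate latent vector and hidden state
    n_in = len(x)
    assert len(params) == n_in * action_dim + action_dim
    bias = params[n_in * action_dim:]
    action = []
    for i in range(action_dim):
        row = params[i * n_in:(i + 1) * n_in]  # i-th row of W
        action.append(sum(w * v for w, v in zip(row, x)) + bias[i])
    return action
```

With the assumed sizes, `num_params()` gives 867, so the whole controller is a single flat vector of a few hundred numbers. Keeping the parameters flat is a deliberate choice here: it is the form a black-box optimizer works with directly.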
Why a Simple Controller?
The key insight is that most of the "intelligence" is captured by the World Model:
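- The VAE (V) extracts relevant visual features
- The MDN-RNN (M) learns environment dynamics
The controller therefore only needs to map these compressed representations to actions, which keeps its parameter count small enough for evolution strategies to search effectively.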
Evolution Strategies
Instead of gradient-based optimization, the controller is trained using Covariance Matrix Adaptation Evolution Strategy (CMA-ES).
Why Evolution Strategies?
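- Gradient-free: Needs only the cumulative reward of each rollout, so neither the reward nor the environment has to be differentiable
- Parallelizable: Candidates are independent, so many can be evaluated simultaneously
- Global search: Sampling a population makes it less likely to get stuck in poor local optima, though it does not guarantee a global optimum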
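To make the sample, evaluate, and update cycle concrete, here is a minimal sketch of a basic evolution strategy. It is deliberately not CMA-ES: it samples from an isotropic Gaussian and shrinks the step size on a fixed schedule, whereas CMA-ES adapts a full covariance matrix and its step size. The function name `simple_es` and all hyperparameter defaults are assumptions for illustration.

```python
import random


def simple_es(fitness, n_params, pop_size=32, elite_frac=0.25,
              sigma=0.5, generations=50, seed=0):
    """Maximize `fitness` with a basic elite-averaging evolution strategy.

    Simplified stand-in for CMA-ES: isotropic Gaussian sampling and a fixed
    sigma decay, with no covariance matrix adaptation.
    """
    rng = random.Random(seed)
    mean = [0.0] * n_params
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(generations):
        # Sample candidate parameter vectors around the current mean.
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(pop_size)]
        # Score each candidate; only fitness values are used, no gradients.
        # These evaluations are independent and could run in parallel.
        scores = [fitness(p) for p in population]
        ranked = sorted(zip(scores, population),
                        key=lambda pair: pair[0], reverse=True)
        elite = [p for _, p in ranked[:n_elite]]
        # Move the mean to the average of the best candidates.
        mean = [sum(col) / n_elite for col in zip(*elite)]
        sigma *= 0.95  # fixed decay; CMA-ES adapts the step size instead
    return mean


# Toy usage: the fitness peaks when every parameter equals 1.
best = simple_es(lambda p: -sum((x - 1.0) ** 2 for x in p), n_params=3)
```

For the controller, `fitness` would be the cumulative reward of a rollout driven by the candidate parameters, which is why the reward never needs to be differentiable.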
Training in Dreams
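A remarkable feature of World Models is dream training, which proceeds in three steps:
- Collect real data: Random rollouts in the actual environment
- Train V and M: Fit the VAE and the MDN-RNN, which together form the World Model, on the collected data
- Train C in dreams: Optimize the controller on rollouts simulated by the MDN-RNN instead of the actual environment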
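In the VizDoom experiment of the original World Models paper, a controller trained entirely in dreams transferred to the actual environment and scored above the threshold required to solve the task.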
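The third step can be sketched as a rollout in which the learned model, rather than the real environment, supplies the next state and the reward. Everything below is a toy under stated assumptions: `toy_dream_step` is a hypothetical stand-in for a trained MDN-RNN using a 1-D latent, and the optimizer is a plain random-search hill climber standing in for CMA-ES.

```python
import random


def toy_dream_step(z, h, action, rng):
    """Hypothetical stand-in for the MDN-RNN: returns next z, next h, reward.

    A trained M would sample the next z from a learned mixture density; this
    toy just nudges a 1-D latent by the action plus Gaussian noise.
    """
    next_z = [z[0] + 0.1 * action[0] + 0.05 * rng.gauss(0.0, 1.0)]
    next_h = [0.9 * h[0] + 0.1 * next_z[0]]  # running summary of past latents
    reward = -abs(next_z[0])                 # toy objective: keep z near zero
    return next_z, next_h, reward


def dream_rollout(params, dream_step, max_steps=50, seed=0):
    """Cumulative reward of a linear controller run entirely inside the model."""
    rng = random.Random(seed)
    z, h = [1.0], [0.0]  # assumed initial latent and hidden state
    total = 0.0
    for _ in range(max_steps):
        # Linear controller: a = w_z * z + w_h * h + b
        action = [params[0] * z[0] + params[1] * h[0] + params[2]]
        z, h, reward = dream_step(z, h, action, rng)
        total += reward
    return total


def train_in_dream(dream_step, iterations=200, sigma=0.5, seed=1):
    """Random-search hill climbing on dream reward (stand-in for CMA-ES)."""
    rng = random.Random(seed)
    best = [0.0, 0.0, 0.0]
    best_score = dream_rollout(best, dream_step)
    for _ in range(iterations):
        candidate = [p + sigma * rng.gauss(0.0, 1.0) for p in best]
        score = dream_rollout(candidate, dream_step)
        if score > best_score:  # keep only improvements
            best, best_score = candidate, score
    return best, best_score


params, score = train_in_dream(toy_dream_step)
```

The real environment never appears in this loop, which is the point of dream training. A controller trained this way is only as good as the model it was trained in, so it still has to be evaluated in the actual environment afterwards.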