Interactive demonstrations showing how World Models perceive, predict, and act. Understand the V-M-C architecture through visual exploration.
┌─────────────────────────────────────────────────────────────────┐
│                    World Model Architecture                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐     │
│  │              │     │              │     │              │     │
│  │    Vision    │ ──▶ │    Memory    │ ──▶ │  Controller  │     │
│  │  Model (V)   │     │  Model (M)   │     │     (C)      │     │
│  │              │     │              │     │              │     │
│  │     VAE      │     │   MDN-RNN    │     │    Linear    │     │
│  │              │     │              │     │              │     │
│  └──────────────┘     └──────────────┘     └──────────────┘     │
│         │                    │                    │             │
│         ▼                    ▼                    ▼             │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐     │
│  │    Latent    │     │    Hidden    │     │    Action    │     │
│  │   Vector z   │     │   State h    │     │   Output a   │     │
│  └──────────────┘     └──────────────┘     └──────────────┘     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Vision Model (V): a VAE compresses each 64x64-pixel frame into a 32-dimensional latent vector z that captures the essential visual features.
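To make the shapes concrete, here is a minimal sketch of the encoding step: a frame goes in, and a 32-dimensional z comes out through the usual VAE reparameterization. The weights are random and untrained. The single linear layer, the 3 RGB channels, and the names `encode`, `W_mu`, and `W_logvar` are assumptions made for illustration; a trained V model would normally be a convolutional network.

```python
import numpy as np

FRAME_SHAPE = (64, 64, 3)   # 64x64 frame; the 3 RGB channels are an assumption
LATENT_DIM = 32             # size of z, as stated above

rng = np.random.default_rng(0)
n_inputs = int(np.prod(FRAME_SHAPE))

# Hypothetical, untrained encoder weights. One linear layer per output head
# keeps the sketch short; it stands in for a trained convolutional encoder.
W_mu = rng.normal(0.0, 0.01, size=(LATENT_DIM, n_inputs))
W_logvar = rng.normal(0.0, 0.01, size=(LATENT_DIM, n_inputs))

def encode(frame, rng):
    """Map one frame to (z, mu, logvar) using the reparameterization trick."""
    x = frame.astype(np.float64).reshape(-1) / 255.0   # flatten, scale to [0, 1]
    mu = W_mu @ x                                      # mean of q(z | frame)
    logvar = W_logvar @ x                              # log-variance of q(z | frame)
    eps = rng.standard_normal(LATENT_DIM)              # noise ~ N(0, I)
    z = mu + np.exp(0.5 * logvar) * eps                # z = mu + sigma * eps
    return z, mu, logvar

frame = rng.integers(0, 256, size=FRAME_SHAPE, dtype=np.uint8)  # stand-in frame
z, mu, logvar = encode(frame, rng)
print(z.shape)  # (32,)
```

Sampling z instead of using mu directly is what distinguishes a VAE from a plain autoencoder: the encoder outputs a distribution, and downstream components see a draw from it.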
Memory Model (M): an MDN-RNN predicts future latent states and maintains a 256-dimensional hidden state h for temporal context.
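The following sketch shows one step of such a memory model under simplifying assumptions: the recurrent cell updates h from the current z and action, and a mixture density head turns h into a Gaussian mixture over the next z. The plain tanh cell (an LSTM is more typical), the action size of 3, the 5 mixture components, the per-dimension factorization, and all function and weight names are assumptions. The weights are random and untrained.

```python
import numpy as np

Z_DIM, H_DIM = 32, 256   # sizes stated above
A_DIM, N_MIX = 3, 5      # action size and mixture count are assumptions

rng = np.random.default_rng(0)
# Hypothetical, untrained weights. A plain tanh RNN cell stands in for an LSTM.
W_in = rng.normal(0.0, 0.1, size=(H_DIM, Z_DIM + A_DIM))
W_h = rng.normal(0.0, 0.05, size=(H_DIM, H_DIM))
W_out = rng.normal(0.0, 0.1, size=(3 * N_MIX * Z_DIM, H_DIM))

def mdn_rnn_step(z, a, h):
    """Update h, then emit mixture parameters for each dimension of the next z."""
    h_next = np.tanh(W_in @ np.concatenate([z, a]) + W_h @ h)
    logits, mu, log_sigma = (W_out @ h_next).reshape(3, N_MIX, Z_DIM)
    logits = logits - logits.max(axis=0, keepdims=True)      # stable softmax
    pi = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    return h_next, pi, mu, np.exp(log_sigma)

def sample_next_z(pi, mu, sigma, rng):
    """Pick one mixture component per latent dimension, then sample from it."""
    ks = np.array([rng.choice(N_MIX, p=pi[:, d]) for d in range(Z_DIM)])
    dims = np.arange(Z_DIM)
    return mu[ks, dims] + sigma[ks, dims] * rng.standard_normal(Z_DIM)

z = rng.standard_normal(Z_DIM)   # stand-in latent vector from V
a = np.zeros(A_DIM)              # stand-in action
h = np.zeros(H_DIM)              # initial hidden state
h, pi, mu, sigma = mdn_rnn_step(z, a, h)
z_next = sample_next_z(pi, mu, sigma, rng)
print(h.shape, pi.shape, z_next.shape)  # (256,) (5, 32) (32,)
```

Predicting a mixture rather than a single point lets the model represent several plausible futures for the same state and action.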
Controller (C): a simple linear model maps z and h to actions and is trained with an evolution strategy (CMA-ES).
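As a rough illustration, the controller can be written as a = W [z, h] + b, with W and b packed into one flat parameter vector that a gradient-free optimizer searches over. The sketch below is not CMA-ES: it uses a much simpler (1+λ) evolution strategy with fixed step size and no covariance adaptation, purely to show the shape of the loop. The action size of 3, the `toy_fitness` function (standing in for an episode's cumulative reward), and the target action are assumptions.

```python
import numpy as np

Z_DIM, H_DIM = 32, 256   # sizes stated above
A_DIM = 3                # action size is an assumption
N_WEIGHTS = A_DIM * (Z_DIM + H_DIM)
N_PARAMS = N_WEIGHTS + A_DIM   # weights plus biases

def act(params, z, h):
    """Linear controller: a = W @ [z, h] + b, unpacked from a flat vector."""
    W = params[:N_WEIGHTS].reshape(A_DIM, Z_DIM + H_DIM)
    b = params[N_WEIGHTS:]
    return W @ np.concatenate([z, h]) + b

def toy_fitness(params, z, h, target):
    """Hypothetical stand-in for the cumulative reward of one episode."""
    return -float(np.sum((act(params, z, h) - target) ** 2))

rng = np.random.default_rng(0)
z, h = rng.standard_normal(Z_DIM), rng.standard_normal(H_DIM)
target = np.array([0.5, 1.0, 0.0])   # arbitrary target action for the toy task

best = np.zeros(N_PARAMS)
best_fit = initial_fit = toy_fitness(best, z, h, target)
for generation in range(20):
    # Sample 16 children around the current parent (fixed step size 0.05).
    candidates = best + 0.05 * rng.standard_normal((16, N_PARAMS))
    fits = [toy_fitness(c, z, h, target) for c in candidates]
    i = int(np.argmax(fits))
    if fits[i] > best_fit:   # keep the parent unless a child beats it
        best, best_fit = candidates[i], fits[i]

print(N_PARAMS, act(best, z, h).shape)  # 867 (3,)
```

With these sizes the controller has only 867 parameters, which is what makes a gradient-free search practical; CMA-ES additionally adapts the step size and the covariance of the sampling distribution.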
Dive deeper into the theory and implementation of World Models through our structured learning modules.