Reinforcement learning simulators
- Thoughts on planning with a simulation vs. executing within one. I was previously confused about this dichotomy when thinking about traffic simulation in the problib: what counts as the “simulation”? Do agents have access to a simulation beforehand, or are they only allowed to make observations in the “real environment”? Are we designing agents to be good learners once thrown into the target environment, or agents that may already have learned the target environment’s logic and are expected to behave well during an actual “run”?
- Take DeepMind’s AlphaGo: the agents of course have access to the game logic during training. They train with the ability to roll out move sequences in the game tree, knowing which actions can be taken at each point. Note, however, that the approach is not entirely specific to one game (before training, that is), since a different game can simply provide a different set of moves at each point. So, while general in that they can roll out different game trees, the agents internally have access to the game logic, like a search agent.
- So, while a simulated environment is there to evaluate an agent’s performance (whether the simulation is itself the target env or a suitable stand-in for the real world), it can also be used by the agent itself during both training and inference. DeepMind’s AlphaGo is fed actions by the game simulation during a real evaluation run (e.g. against Lee Sedol), but it also uses the simulator internally to generate rollouts and decide how to act at each point in the game.
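
To make the two roles of the simulator concrete, here is a minimal sketch on a toy game. Everything in it is an assumption for illustration: the `Nim` class, `random_policy`, `rollout_policy`, and `play` are hypothetical names, and the planner uses flat Monte Carlo rollouts with random playouts, which is much simpler than AlphaGo's actual search (Monte Carlo tree search guided by learned policy and value networks). The point is only the separation: `play` advances the "real" environment with committed moves, while `rollout_policy` plans on private copies of the same game logic.

```python
import random
from dataclasses import dataclass


@dataclass
class Nim:
    """Toy two-player game: take 1-3 stones; whoever takes the last stone wins."""
    stones: int = 10
    player: int = 0  # index of the player to move

    def legal_actions(self):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def step(self, action):
        """Apply a move in place; return the winner's index, or None if not over."""
        self.stones -= action
        if self.stones == 0:
            return self.player
        self.player = 1 - self.player
        return None

    def copy(self):
        return Nim(self.stones, self.player)


def random_policy(env, rng):
    """Uses only what the environment exposes at this step: the legal moves."""
    return rng.choice(env.legal_actions())


def rollout_policy(env, rng, n_rollouts=200):
    """Plans by playing random games to the end on private copies of the simulator."""
    me = env.player
    best_action, best_wins = None, -1
    for action in env.legal_actions():
        wins = 0
        for _ in range(n_rollouts):
            sim = env.copy()  # the real environment is never touched while planning
            winner = sim.step(action)
            while winner is None:
                winner = sim.step(rng.choice(sim.legal_actions()))
            wins += winner == me
        if wins > best_wins:
            best_action, best_wins = action, wins
    return best_action


def play(policies, rng, stones=10):
    """The 'real' run: the environment is only advanced by committed moves."""
    env = Nim(stones)
    winner = None
    while winner is None:
        winner = env.step(policies[env.player](env, rng))
    return winner
```

Calling `play([rollout_policy, random_policy], random.Random(0))` runs one evaluation game between a planning agent and a purely reactive one. Both see the same environment during the run; only the first also holds the game logic internally, which is the sense in which AlphaGo behaves like a search agent.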