Reinforcement Learning with Multi-Fidelity Simulators

created: 2020-11-05 · modified: 2021-01-07
media: paper · source: MIT · volume: 5

tags:

Presents a framework for reinforcement learning in the multi-fidelity setting. The framework has three objectives:
1. Minimize the number of suboptimal steps taken in $\Sigma_D$, the highest cost/fidelity simulator, i.e. the target task.
2. Take only a polynomial number of steps (in $|\Sigma_i|$) with simulator $\Sigma_i$ before near-optimal behavior is reached or the agent switches back to a lower-fidelity simulator.
3. Make only a polynomial number of switches (in $|\Sigma|$ and $D$) between simulators.

An optimistic multi-fidelity simulator chain is a series of simulators $\Sigma_1,\dots,\Sigma_D$, where $\Sigma_D$ is the highest-fidelity simulator, i.e. the target task. Each adjacent pair $\Sigma_i$, $\Sigma_{i+1}$ in the chain has a defined fidelity under a transfer mapping $\rho_i$ and threshold $\beta_i$.
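
To make "defined fidelity" concrete, here is a minimal sketch. It is not taken from the paper and rests on stated assumptions: optimal Q-values are available as tables; $\rho$ maps each state index of the higher-fidelity simulator to a state index of the lower-fidelity one; and "optimistic" means the lower-fidelity values never fall more than $\beta$ below the higher-fidelity ones. The names `fidelity`, `is_optimistic_chain`, `q_low`, `q_high`, `rho` and `beta` are hypothetical.

```python
import numpy as np


def fidelity(q_low, q_high, rho, beta):
    """Fidelity of a lower-fidelity simulator to a higher-fidelity one.

    Assumed inputs (hypothetical, for illustration only):
      q_low  -- array (S_low, A) of optimal Q-values in the lower-fidelity simulator
      q_high -- array (S_high, A) of optimal Q-values in the higher-fidelity simulator
      rho    -- int array (S_high,), maps higher-fidelity states to lower-fidelity states
      beta   -- non-negative threshold on how far q_low may underestimate q_high
    Returns -inf when the optimism condition is violated, otherwise the
    negated largest absolute Q-value gap (0 means a perfect match).
    """
    q_low = np.asarray(q_low, dtype=float)
    q_high = np.asarray(q_high, dtype=float)
    rho = np.asarray(rho, dtype=int)
    # Compare each higher-fidelity state with its image in the lower-fidelity simulator.
    gap = q_low[rho] - q_high
    if np.any(gap < -beta):
        return float("-inf")  # lower-fidelity simulator is too pessimistic
    return -float(np.max(np.abs(gap)))


def is_optimistic_chain(q_tables, rhos, betas):
    """True if every adjacent pair in the chain has a finite fidelity.

    q_tables is ordered from lowest to highest fidelity; rhos[i] and betas[i]
    relate q_tables[i] (lower) and q_tables[i + 1] (higher).
    """
    pairs = zip(q_tables[:-1], q_tables[1:], rhos, betas)
    return all(np.isfinite(fidelity(lo, hi, rho, beta)) for lo, hi, rho, beta in pairs)


# Toy example: two higher-fidelity states mapped onto a three-state cheaper simulator.
q_low = [[1.5, 2.0], [0.2, 1.0], [9.0, 9.0]]
q_high = [[1.0, 2.0], [0.0, 1.0]]
rho = [0, 1]
print(fidelity(q_low, q_high, rho, beta=0.1))
print(is_optimistic_chain([q_low, q_high], [rho], [0.1]))
```

Under these assumptions, a fidelity closer to zero means the cheaper simulator's values track the target more closely, while `-inf` marks a pair that cannot be part of an optimistic chain.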
Multi-fidelity bandits

Metadata