- Presents a framework for reinforcement learning in the multi-fidelity setting, with three goals:
  - Minimize the number of suboptimal steps taken in Σ_D, the highest-cost, highest-fidelity simulator, i.e. the target task.
  - Take only O(|Σ_i|) steps in simulator Σ_i before either near-optimal behavior is reached or the agent switches back to a lower-fidelity simulator.
  - Make only a polynomial number (in |Σ| and D) of switches between simulators.
- An optimistic multi-fidelity simulator chain is a series of simulators Σ_1, …, Σ_D, where Σ_D is the highest-fidelity simulator, i.e. the target task. Each adjacent pair Σ_i and Σ_{i+1} in the chain has a defined fidelity under a transfer mapping ρ_i and a threshold β_i.
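
The chain definition above can be made concrete with a small sketch. These notes do not spell out the fidelity measure, so the form used here is an assumption: fidelity is the negated largest gap between optimal Q-values of adjacent simulators, and it is −∞ when the lower-fidelity simulator underestimates the higher-fidelity one by more than β_i. The names `fidelity` and `is_optimistic_chain`, the tabular Q-value dictionaries, and the convention that ρ_i maps states of Σ_{i+1} to states of Σ_i are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable
# Tabular optimal Q-values, assumed known here purely for illustration.
QTable = Dict[Tuple[State, Action], float]


def fidelity(q_lower: QTable, q_higher: QTable,
             rho: Callable[[State], State], beta: float) -> float:
    """Assumed fidelity of a lower-fidelity simulator to a higher one.

    rho maps a state of the higher-fidelity simulator to a state of the
    lower-fidelity one (an assumption). q_higher must be non-empty.
    """
    # Signed gap: positive when the lower simulator underestimates.
    gaps = [q_hi - q_lower[(rho(s), a)] for (s, a), q_hi in q_higher.items()]
    if any(gap > beta for gap in gaps):
        return float("-inf")  # optimism violated beyond the threshold
    return -max(abs(gap) for gap in gaps)


def is_optimistic_chain(q_tables: List[QTable],
                        rhos: List[Callable[[State], State]],
                        betas: List[float]) -> bool:
    """Check that every adjacent pair in the chain has finite fidelity."""
    return all(
        fidelity(q_tables[i], q_tables[i + 1], rhos[i], betas[i]) != float("-inf")
        for i in range(len(q_tables) - 1)
    )
```

In practice the optimal Q-values are not available up front, so a check like this is only useful for reasoning about small toy chains, not as part of a learning algorithm.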
- Multi-fidelity bandits