Reinforcement Learning with Multi-Fidelity Simulators

  • Presents a framework for reinforcement learning in the multi-fidelity setting. The framework aims to:
    1. Minimize the number of suboptimal steps taken in $\Sigma_D$, the highest-cost, highest-fidelity simulator, i.e. the target task.
    2. Take only $O(|\Sigma_i|)$ steps with simulator $\Sigma_i$ before either near-optimal behavior is reached or the agent switches back to a lower-fidelity simulator.
    3. Make only a number of switches between simulators that is polynomial in $|\Sigma|$ and $D$.
  • An optimistic multi-fidelity simulator chain is a series of simulators $\Sigma_1, \dots, \Sigma_D$, where $\Sigma_D$ is the highest-fidelity simulator, i.e. the target task. Each pair of adjacent simulators $\Sigma_i$ and $\Sigma_{i+1}$ has a defined fidelity under a transfer mapping $\rho_i$ and a threshold $\beta_i$.
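The chain definition can be made concrete with a small sketch. It rests on an assumption these notes do not spell out: that "optimistic" means the optimal Q-value in the lower-fidelity simulator never underestimates the one in the next simulator up by more than $\beta_i$, i.e. $Q^*_{\Sigma_{i+1}}(s, a) - Q^*_{\Sigma_i}(\rho_i(s), a) \le \beta_i$ for all $(s, a)$. The names `is_optimistic_pair` and `is_optimistic_chain` and the tabular Q-value dictionaries are hypothetical and purely illustrative. In practice the optimal Q-values are unknown, so this check is a way to read the definition, not something an agent could run.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable
QTable = Dict[Tuple[State, Action], float]  # (state, action) -> optimal Q-value


def is_optimistic_pair(
    q_low: QTable, q_high: QTable, rho: Callable[[State], State], beta: float
) -> bool:
    """Assumed condition: Q*_{i+1}(s, a) - Q*_i(rho_i(s), a) <= beta_i for every (s, a)."""
    return all(
        q_hi - q_low[(rho(s), a)] <= beta for (s, a), q_hi in q_high.items()
    )


def is_optimistic_chain(
    q_stars: Sequence[QTable],
    rhos: Sequence[Callable[[State], State]],
    betas: Sequence[float],
) -> bool:
    """Check every adjacent pair (Sigma_i, Sigma_{i+1}) of a D-simulator chain."""
    if not (len(rhos) == len(betas) == len(q_stars) - 1):
        raise ValueError("need one (rho_i, beta_i) per adjacent pair of simulators")
    return all(
        is_optimistic_pair(q_stars[i], q_stars[i + 1], rhos[i], betas[i])
        for i in range(len(rhos))
    )


# Toy two-simulator chain over states {0, 1} and a single action "a".
q_low = {(0, "a"): 1.0, (1, "a"): 2.0}     # Sigma_1: cheap, optimistic
q_high = {(0, "a"): 0.75, (1, "a"): 2.25}  # Sigma_2: the target task


def identity(s: State) -> State:
    """Transfer mapping for the case where both simulators share a state space."""
    return s


chain_ok = is_optimistic_chain([q_low, q_high], [identity], [0.25])
```

In this toy chain the lower-fidelity simulator undervalues state 1 by 0.25, so the chain satisfies the assumed condition for $\beta_1 = 0.25$ but not for $\beta_1 = 0$.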
  • Multi-fidelity bandits