Active learning

Given observed data points $(x, y)$, how should we select the next data point to query? One option is to select the point that maximizes the expected information gain (EIG): the input $x$ whose outcome is expected to reduce our uncertainty about the parameters $\theta$ the most is the most informative.

$x^\ast = \arg\max_x \left( \mathbb{H}[\theta \mid \mathcal{D}] - \mathbb{E}_{y \sim p(y \mid x, \mathcal{D})}\big[\mathbb{H}[\theta \mid y, x, \mathcal{D}]\big] \right)$
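For a small discrete problem this objective can be evaluated exactly. The sketch below is illustrative only. It assumes a finite grid of parameter values $\theta$, binary outcomes $y$, and a hypothetical logistic likelihood $p(y = 1 \mid x, \theta) = \sigma(\theta x)$. It computes the prior entropy minus the expected posterior entropy for each candidate $x$ and picks the argmax.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def likelihood(y, x, thetas):
    """Hypothetical likelihood p(y | x, theta) for binary y: logistic model sigmoid(theta * x)."""
    p1 = 1.0 / (1.0 + np.exp(-thetas * x))
    return p1 if y == 1 else 1.0 - p1

def eig(x, thetas, prior):
    """H[theta | D] - E_{y ~ p(y | x, D)} H[theta | y, x, D]; `prior` plays the role of p(theta | D)."""
    h_before = entropy(prior)
    expected_h_after = 0.0
    for y in (0, 1):
        joint = prior * likelihood(y, x, thetas)   # p(theta, y | x, D)
        p_y = joint.sum()                          # predictive p(y | x, D)
        if p_y > 0:
            posterior = joint / p_y                # p(theta | y, x, D)
            expected_h_after += p_y * entropy(posterior)
    return h_before - expected_h_after

thetas = np.linspace(-3.0, 3.0, 61)                 # grid over the parameter of interest
prior = np.full(thetas.size, 1.0 / thetas.size)     # current belief p(theta | D), uniform here
candidates = np.linspace(-2.0, 2.0, 41)             # candidate inputs x
scores = np.array([eig(x, thetas, prior) for x in candidates])
x_next = candidates[np.argmax(scores)]
```

At $x = 0$ the outcome does not depend on $\theta$, so its EIG is zero. Every other candidate has positive EIG, bounded above by $\log 2$ because $y$ is binary.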

1. Choose an experimental configuration.
2. Perform the experiment.
3. Collect the observed data and perform (Bayesian) inference using the data and the existing prior. This produces a posterior over the parameters of interest.
4. The question is now: which experiment should we run next?
   1. Use some method to propose new experiments.
   2. Evaluate the EIG of each proposed experiment.
   3. Iterate/optimize until converging on an experiment with high EIG, then run it, using the current posterior as the new prior.
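The loop above can be sketched end to end on a toy problem. The sketch makes several illustrative assumptions:

- a grid posterior over a scalar $\theta$;
- a hypothetical logistic likelihood;
- a hidden `true_theta`, used only to simulate the "real" experiment;
- exhaustive scoring of a fixed candidate grid, standing in for the propose/evaluate/optimize inner loop.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def p_y1(x, theta):
    """Hypothetical likelihood p(y = 1 | x, theta) = sigmoid(theta * x)."""
    return 1.0 / (1.0 + np.exp(-theta * x))

def eig(x, thetas, belief):
    """Expected entropy reduction of theta from observing binary y at design x."""
    gain = entropy(belief)
    for lik in (p_y1(x, thetas), 1.0 - p_y1(x, thetas)):
        joint = belief * lik          # p(theta, y | x, D)
        p_y = joint.sum()             # p(y | x, D)
        gain -= p_y * entropy(joint / p_y)
    return gain

rng = np.random.default_rng(0)
thetas = np.linspace(-3.0, 3.0, 121)              # grid over the parameter of interest
belief = np.full(thetas.size, 1.0 / thetas.size)  # prior p(theta)
candidates = np.linspace(-2.0, 2.0, 41)           # proposed experimental configurations
true_theta = 1.3                                  # hidden; only used to simulate outcomes

for _ in range(30):
    # Choose the next experiment: score each proposal by EIG and keep the best.
    scores = [eig(c, thetas, belief) for c in candidates]
    x = candidates[int(np.argmax(scores))]
    # Perform the experiment (simulated here) and observe a binary outcome.
    y = rng.random() < p_y1(x, true_theta)
    # Bayesian update: the posterior becomes the prior for the next round.
    lik = p_y1(x, thetas) if y else 1.0 - p_y1(x, thetas)
    belief = belief * lik
    belief /= belief.sum()

posterior_mean = float(np.sum(thetas * belief))
```

In a real application, the inner `argmax` over a grid would be replaced by an optimizer over a continuous design space. The Bayesian optimization notes that follow describe one way to do this.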

Bayesian optimization

• There is an underlying true function mapping experiments to their EIG. We don't know what this function is, and evaluating it at any point is expensive: it requires running or simulating the experiment and computing the information gain of the resulting posterior under the generated data.
• At any given iteration of the learning process, the goal is to find the experiment that maximizes this function. Here we use Bayesian optimization. It relies on an acquisition function (AF) that scores candidate points and decides which one to evaluate next, so as to learn as much as possible about the underlying EIG function (in particular, where its maximum lies). The AF is designed to be inexpensive to evaluate and easy to optimize.
• So we have a set of evaluated points lying on the EIG function, plus a probabilistic surrogate model (say, a Gaussian process) fit to these points. The true EIG function is unknown to us, and this model gives our current best guess of it, with uncertainty, given the data. A cheap acquisition function (e.g. the upper confidence bound, UCB) is maximized to select a new point to evaluate. This point defines the experiment we wish to run, so we simulate the experiment and compute its EIG from the generated data. We have now labeled a new point on the EIG curve and are one step closer to finding the experiment that we believe maximizes the EIG.
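The loop in these bullets can be sketched with a small numpy Gaussian process and a UCB acquisition function. All of the following are illustrative assumptions:

- a one-dimensional design space $[0, 1]$;
- an RBF kernel with a fixed length scale and near-zero noise;
- an exploration weight `beta`;
- a cheap hypothetical `expensive_eig` function standing in for "simulate the experiment and compute its EIG".

The acquisition function is maximized over a fixed grid for simplicity.

```python
import numpy as np

def rbf(a, b, length_scale=0.15):
    """Squared-exponential kernel between two 1-D arrays of design points."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and std of a zero-mean GP with a unit-variance RBF kernel."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)

def expensive_eig(x):
    """Hypothetical stand-in for simulating experiment x and computing its EIG."""
    return np.sin(6.0 * x) * x + 0.5

def ucb(mean, std, beta=2.0):
    """Upper confidence bound: favours high predicted EIG and high uncertainty."""
    return mean + beta * std

grid = np.linspace(0.0, 1.0, 201)          # candidate experiments
x_obs = np.array([0.1, 0.5, 0.9])          # experiments already evaluated
y_obs = expensive_eig(x_obs)

for _ in range(10):
    mean, std = gp_posterior(x_obs, y_obs, grid)
    x_next = grid[np.argmax(ucb(mean, std))]          # cheap: optimize the AF
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, expensive_eig(x_next))   # expensive: evaluate EIG

best_design = x_obs[np.argmax(y_obs)]
```

The `beta` term controls the trade-off between exploiting regions the GP already predicts to have high EIG and exploring regions where it is uncertain.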

Simulation-based inference

The Bayesian inference step that we hand-waved above is not always available. So far we have assumed that once we have real experimental data and a prior, we can compute the posterior over parameters (step 3 above). But in many settings we don't know what the likelihood model is, or it is intractable; all we have is a simulator that encodes our knowledge about the system we are trying to reason about. This is where simulation-based inference comes in: it performs inference using only samples drawn from the simulator.
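The simplest simulation-based inference method is rejection approximate Bayesian computation (ABC). It draws parameters from the prior, runs the simulator, and keeps only the parameters whose simulated data land close to the observed data. The accepted samples approximate the posterior without the likelihood ever being evaluated. This is a minimal sketch under three assumptions: a hypothetical Gaussian simulator, the sample mean as the summary statistic, and an arbitrary tolerance `eps`. More modern SBI methods (e.g. neural posterior or likelihood estimation) replace the accept/reject step with learned density estimators.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulator(theta, n=50):
    """Hypothetical simulator: n noisy measurements centred on theta.
    Treated as a black box; its likelihood is never evaluated."""
    return theta + rng.normal(0.0, 1.0, size=n)

def summary(data):
    """Summary statistic used to compare simulated and observed data."""
    return data.mean()

def rejection_abc(observed, num_draws=20000, eps=0.1):
    s_obs = summary(observed)
    accepted = []
    for _ in range(num_draws):
        theta = rng.uniform(-5.0, 5.0)          # sample from a uniform prior
        s_sim = summary(simulator(theta))       # run the simulator
        if abs(s_sim - s_obs) < eps:            # keep if close to the observation
            accepted.append(theta)
    return np.array(accepted)

observed = simulator(2.0)                        # stands in for real experimental data
posterior_samples = rejection_abc(observed)
```

Shrinking `eps` makes the approximation more accurate but accepts fewer samples. This cost is one reason to choose simulator runs carefully.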

Scientific process

All in all, this process encapsulates a sort of automated scientific method. We have some knowledge about the world and want to design experiments that yield data that can be used to reason about parameters of interest.

Directions

• When we have a simulator, we want to use it as efficiently as possible, i.e. run the simulator on parameters that maximize EIG and reason cumulatively about all runs. We can also make use of probabilistic programming to p
• Can we completely close the loop? We could learn the simulator itself with deep learning, giving an entirely data-driven inference and experimental design pipeline in which not even the logic of how the data are generated comes from a manually designed program. There are several options. (1) Learn a model from a simulator. This is an interesting active learning problem in its own right: it mirrors what we have been doing on the posterior estimation side, but targets the likelihood directly. The key question is the same: how do we select the parameters fed to the simulator so that it produces maximally informative (parameter, data) pairs for training a deep likelihood estimator? (2) Use existing data from the world and train a model directly on it to learn the latent dynamics of the world. (3) Start with nothing but the ability to perform experiments, and design experiments that incrementally improve both our understanding of the underlying world dynamics AND our ability to design better experiments, each building on the last. This is a kind of meta-learning approach: how can we learn to learn fastest from scratch?
• Probabilistic programming allows a more directed approach: the program exposes specific values and variables that we can tune so that it ultimately produces data like what we observe. This leads to ideas like differentiable simulators with learnable components that can be adapted to the real world from data, rather than relying purely on the underlying theoretical framework.
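The differentiable-simulator idea can be illustrated with a toy example. If the simulator is written so that gradients flow through it, its tunable parameters can be fit to observed data by gradient descent. Here the "learnable components" are simply the simulator's own parameters. This is a minimal sketch under the following assumptions:

- a hypothetical exponential-decay simulator $y(t) = A e^{-k t}$;
- synthetic "observed" data;
- a squared-error loss;
- hand-derived gradients, which an autodiff framework would normally supply.

```python
import numpy as np

def simulate(params, t):
    """Hypothetical differentiable simulator: exponential decay A * exp(-k t)."""
    A, k = params
    return A * np.exp(-k * t)

def loss_and_grad(params, t, y_obs):
    """Mean squared error between simulation and data, with its analytic gradient."""
    A, k = params
    e = np.exp(-k * t)
    resid = simulate(params, t) - y_obs
    loss = np.mean(resid ** 2)
    d_A = np.mean(2.0 * resid * e)              # d loss / d A
    d_k = np.mean(2.0 * resid * (-A * t * e))   # d loss / d k
    return loss, np.array([d_A, d_k])

rng = np.random.default_rng(2)
t = np.linspace(0.0, 5.0, 100)
y_obs = 3.0 * np.exp(-0.7 * t) + rng.normal(0.0, 0.05, size=t.size)  # synthetic "real-world" data

params = np.array([2.0, 0.5])   # initial guess, e.g. from the theoretical model
lr = 0.02
for _ in range(10000):
    loss, grad = loss_and_grad(params, t, y_obs)
    params -= lr * grad         # plain gradient descent on the simulator parameters
```

With a richer simulator, the same pattern applies: the theoretical structure is kept, and only the components exposed as parameters are calibrated against data.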