# Simulated annealing

Common metaheuristic for global optimization

Simulated annealing is a metaheuristic for approximating global optima. Broadly speaking, the algorithm uses an energy-based, iterative optimization process to explore the search space. The exact approach varies from problem to problem, but the generic algorithm has the following components:

• Iteration: the basic step of the algorithm is a probabilistic decision between staying in the current state $s$ and moving to some neighboring state $s^\prime$. As iterations proceed, the temperature decreases, effectively reducing the exploratory capability of future iterations (i.e. by decreasing the likelihood of moving to neighbors with higher energy than the current state).
• Neighboring states: the neighboring states $\mathcal{N}_s \subseteq \mathcal{S}$ of a given state $s \in \mathcal{S}$ are those states reachable via a transformation of $s$. A stochastic transformation function $N: \mathcal{S} \rightarrow \mathcal{S}$ (with $\mathcal{N}_s \coloneqq \{s^\prime \in \mathcal{S} | s^\prime = N(s)\}$) embodies the notion of a “move” from one state to the next, and is usually defined as a simple/naive random alteration to a given state. Even simple alterations allow the algorithm to progressively explore a great variety of states in the search space.
• Acceptance probability: there are a few components needed in order to make informed decisions about transitions between states:
1. We must define some method of evaluating the relative difference between two states. An energy function $E: \mathcal{S} \rightarrow \mathbb{R}$ maps states to energy values, with lower energy implying greater quality i.e. closer to the target.
2. We need a time-varying, monotonically decreasing temperature function $T: \mathbb{R} \rightarrow \mathbb{R}^+$, mapping a timestep $t$ to a temperature value. This function (along with $P$ defined below) defines the annealing schedule of the algorithm, and effectively controls the speed at which we move from exploration to exploitation. The temperature must reach zero at the end of the allotted time budget, i.e. $T(t_e) = 0$, where timesteps satisfy $0 \le t \le t_e$.
3. We need a method of producing the probability of accepting a candidate state $s^\prime$ while in the current state $s$, under an energy function $E$ and temperature function $T$ at the current iteration $t$. This is defined by the acceptance probability function $P: \mathbb{R}^3 \rightarrow [0, 1]$, mapping energies $e = E(s)$, $e^\prime = E(s^\prime)$, and temperature $T_t = T(t)$ to the probability of making a transition as described above. Some necessary qualities of $P$ include:
• As $T_t \rightarrow 0$, $P \rightarrow 0$ if $e^\prime > e$. $P$ must tend to some positive value for $e^\prime < e$. This has the effect of narrowing the search space as we exhaust our iteration schedule, becoming increasingly strict about transitioning to better (lower energy) states.
• For $T_t > 0$, $P > 0$ for $e^\prime > e$. This ensures that, so long as we have some non-zero temperature, there is at least some chance of transition to higher energy states i.e. exploration is still possible.
Some desired (albeit not necessary) qualities of $P$ include:
• Ensuring the probability produced by $P$ is equal to one when $e^\prime < e$, regardless of temperature. This ensures we always move to a better (lower energy) state when we can. This is not, however, necessary for the algorithm to converge.
• It is typical for $P$ to reflect the difference between energies $e$ and $e^\prime$, such that the probability of a state transition increases as the difference $e^\prime - e$ decreases. This is a fairly intuitive property to have, and will likely have a positive impact on efficiency, but again is not necessary.
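
The components above can be sketched concretely. The following is a minimal, illustrative Python implementation (all function names, the toy objective, the Gaussian neighbor move, the linear cooling schedule, and the Metropolis-style acceptance rule are assumptions chosen for the example, not prescribed by the generic algorithm):

```python
# Minimal simulated annealing sketch. The functions energy, neighbor,
# temperature, and acceptance correspond to E, N, T, and P as described
# above. The specific choices here (toy 1-D objective, Gaussian moves,
# linear cooling, Metropolis acceptance) are illustrative assumptions.
import math
import random

def energy(s: float) -> float:
    """E: maps a state to an energy; lower is better. Toy objective."""
    return (s - 2.0) ** 2 + math.sin(5.0 * s)

def neighbor(s: float) -> float:
    """N: a simple stochastic alteration of the current state."""
    return s + random.gauss(0.0, 0.5)

def temperature(t: int, t_end: int, t0: float = 1.0) -> float:
    """T: monotonically decreasing schedule with T(t_end) = 0."""
    return t0 * (1.0 - t / t_end)

def acceptance(e: float, e_new: float, temp: float) -> float:
    """P: Metropolis-style rule. Improvements are always accepted;
    worse states are accepted with probability exp(-(e_new - e) / temp),
    which vanishes as temp -> 0 and is positive whenever temp > 0."""
    if e_new < e:
        return 1.0
    if temp <= 0.0:
        return 0.0
    return math.exp(-(e_new - e) / temp)

def anneal(s0: float, t_end: int = 10_000) -> float:
    s = s0
    for t in range(t_end):
        s_new = neighbor(s)
        # Probabilistic decision: stay in s or move to s_new.
        if random.random() < acceptance(energy(s), energy(s_new),
                                        temperature(t, t_end)):
            s = s_new
    return s

random.seed(0)
best = anneal(s0=-10.0)
```

Note how the acceptance rule realizes the necessary qualities listed above: for any positive temperature there is a non-zero chance of moving uphill, and that chance shrinks to zero as the temperature does.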

Intuitively, simulated annealing attempts to find an optimal solution to an optimization problem by initially allowing a large amount of global exploration of