# Simulated annealing

*Common metaheuristic for global optimization*

**Simulated annealing** is a metaheuristic for approximating global optima. Broadly speaking, the algorithm uses an energy-based iterative optimization process for exploring the search space. The exact approach varies from problem to problem, but the generic algorithm has the following components:

**Iteration**: the basic step of the algorithm is a probabilistic decision between staying in the current state $s$ and moving to some neighboring state $s^\prime$. Each iteration lowers the temperature, effectively decreasing the exploratory capability of future iterations (i.e. by decreasing the likelihood of moving to neighbors of higher energy than the current state).

**Neighboring states**: the *neighboring states* $\mathcal{N}_s \subseteq \mathcal{S}$ of a given state $s \in \mathcal{S}$ are those states reachable via a transformation of $s$. A stochastic transformation function $N: \mathcal{S} \rightarrow \mathcal{S}$ (with $\mathcal{N}_s \coloneqq \{s^\prime \in \mathcal{S} \mid s^\prime = N(s)\}$) embodies the notion of a "move" from one state to the next, and is usually defined as a simple/naive random alteration to a given state. Even simple alterations allow the algorithm to progressively explore a great variety of states in the search space.

**Acceptance probability**: a few components are needed in order to make informed decisions about transitions between states:

- We must define some method of evaluating the relative difference between two states. An energy function $E: \mathcal{S} \rightarrow \mathbb{R}$ maps states to energy values, with lower energy implying greater quality (i.e. closer to the target).
- We need a time-varying, monotonically decreasing temperature function $T: \mathbb{R} \rightarrow \mathbb{R}^+$, mapping a timestep $t$ to a temperature value. This function (along with $P$ defined below) defines the *annealing schedule* of the algorithm, and effectively controls the speed at which we move from exploration to exploitation. We must have a temperature of zero at the end of the allotted time budget, i.e. $T(t_e) = 0$ where the timestep $0 \le t \le t_e$.
- We need a method of producing the probability of accepting a candidate state $s^\prime$ while in the current state $s$, under an energy function $E$ and temperature function $T$ at the current iteration $t$. This is defined by the acceptance probability function $P: \mathbb{R}^3 \rightarrow [0, 1]$, mapping energies $e = E(s)$, $e^\prime = E(s^\prime)$, and temperature $T_t = T(t)$ to the probability of making a transition as described above. Some necessary qualities of $P$ include:
- As $T_t \rightarrow 0$, $P \rightarrow 0$ if $e^\prime > e$. $P$ must tend to some positive value for $e^\prime < e$. This has the effect of narrowing the search space as we exhaust our iteration schedule, becoming increasingly strict about transitioning to better (lower energy) states.
- For $T_t > 0$, $P > 0$ for $e^\prime > e$. This ensures that, so long as we have some non-zero temperature, there is at least *some* chance of transition to higher energy states, i.e. exploration is still possible.

- The probability produced by $P$ should be equal to one when $e^\prime < e$, regardless of temperature. This ensures we always move to a more optimal state when we can. This is not, however, necessary for the algorithm to converge^{1}.
- It is typical for $P$ to reflect differences in the energies $e$ and $e^\prime$ such that the probability of a state transition increases as the difference $e^\prime - e$ decreases. This is a fairly intuitive property to have, and will likely have positive impacts on efficiency, but again is not necessary^{2}.
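The components above can be combined into a minimal Python sketch. This uses the common Metropolis acceptance criterion ($P = \exp(-(e^\prime - e)/T_t)$ for worse candidates, $1$ otherwise); the function names, the best-so-far tracking, and the linear cooling schedule in the usage example are illustrative choices, not the only valid ones:

```python
import math
import random

def simulated_annealing(s0, energy, neighbor, temperature, t_end):
    """Generic simulated annealing sketch.

    s0: initial state; energy: E(s) -> float (lower is better);
    neighbor: stochastic transformation N(s) -> s';
    temperature: schedule T(t) -> float, decreasing toward zero;
    t_end: iteration budget.
    """
    s = s0
    e = energy(s)
    best, best_e = s, e  # track the best state seen so far
    for t in range(t_end):
        T_t = temperature(t)
        s_new = neighbor(s)
        e_new = energy(s_new)
        # Metropolis acceptance: always accept improvements; accept
        # worse candidates with probability exp(-(e' - e) / T_t),
        # which shrinks to zero as the temperature cools.
        if e_new <= e or (T_t > 0 and random.random() < math.exp(-(e_new - e) / T_t)):
            s, e = s_new, e_new
            if e < best_e:
                best, best_e = s, e
    return best, best_e

# Example: minimize f(x) = x**2 with Gaussian moves and a linear schedule.
best, best_e = simulated_annealing(
    s0=10.0,
    energy=lambda x: x * x,
    neighbor=lambda x: x + random.gauss(0, 1),
    temperature=lambda t: max(0.0, 1.0 - t / 10_000),  # T(t_e) = 0
    t_end=10_000,
)
```

Note how the two properties described above appear directly in the acceptance test: improvements (`e_new <= e`) are always taken, and any positive temperature leaves a non-zero chance of uphill moves.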

Intuitively, simulated annealing attempts to find an optimal solution to an optimization problem by initially allowing a large amount of global exploration of