Simulated annealing

Common metaheuristic for global optimization

Simulated annealing is a metaheuristic for approximating global optima. Broadly speaking, the algorithm uses an energy-based iterative optimization process to explore the search space. The exact approach varies from problem to problem, but the generic algorithm has the following components:

  • Iteration: the basic step of the algorithm is a probabilistic decision between staying in the current state $s$ and moving to some neighboring state $s'$. With each iteration the temperature decreases, effectively reducing the exploratory capability of future iterations (i.e. by decreasing the likelihood of moving to neighbors of higher energy than the current state).
  • Neighboring states: the neighboring states $\mathcal{N}_s \subseteq \mathcal{S}$ of a given state $s \in \mathcal{S}$ are those states reachable via a transformation of $s$. A stochastic transformation function $N: \mathcal{S} \rightarrow \mathcal{S}$ (with $\mathcal{N}_s \coloneqq \{s' \in \mathcal{S} \mid s' = N(s)\}$) embodies the notion of a “move” from one state to the next, and is usually defined as a simple/naive random alteration to a given state. Even simple alterations allow the algorithm to progressively explore a great variety of states in the search space.
  • Acceptance probability: there are a few components needed in order to make informed decisions about transitions between states:
    1. We must define some method of evaluating the relative difference between two states. An energy function $E: \mathcal{S} \rightarrow \mathbb{R}$ maps states to energy values, with lower energy implying greater quality, i.e. closer to the target.
    2. We need a time-varying, monotonically decreasing temperature function $T: \mathbb{R} \rightarrow \mathbb{R}^+$, mapping a timestep $t$ to a temperature value. This function (along with $P$ defined below) defines the annealing schedule of the algorithm, and effectively controls the speed at which we move from exploration to exploitation. We must have a temperature of zero at the end of the allotted time budget, i.e. $T(t_e) = 0$ where the timestep $0 \le t \le t_e$.
    3. We need a method of producing the probability of accepting a candidate state $s'$ while in the current state $s$, under an energy function $E$ and temperature function $T$ at the current iteration $t$. This is defined by the acceptance probability function $P: \mathbb{R}^3 \rightarrow [0, 1]$, mapping energies $e = E(s)$, $e' = E(s')$, and temperature $T_t = T(t)$ to the probability of making a transition as described above. Some necessary qualities of $P$ include:
      • As $T_t \rightarrow 0$, $P \rightarrow 0$ if $e' > e$, while $P$ must tend to some positive value for $e' < e$. This has the effect of narrowing the search space as we exhaust our iteration schedule, becoming increasingly strict about transitioning only to better (lower energy) states.
      • For $T_t > 0$, $P > 0$ for $e' > e$. This ensures that, so long as we have some non-zero temperature, there is at least some chance of transitioning to higher energy states, i.e. exploration is still possible.
      Some desired (albeit not necessary) qualities of $P$ include:
      • Ensuring the probability produced by $P$ is equal to one when $e' < e$, regardless of temperature. This ensures we always move to a more optimal state when we can. This is not, however, necessary for the algorithm to converge¹.
      • It is typical for $P$ to reflect the difference between the energies $e$ and $e'$ such that the probability of a state transition increases as the difference $e' - e$ decreases. This is a fairly intuitive property to have, and will likely have positive impacts on efficiency, but again is not necessary².
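As a concrete illustration, the classic Metropolis criterion satisfies all of the required qualities above (and both of the desired ones). A minimal Python sketch, where the function names are ours rather than part of any standard API:

```python
import math

def acceptance_probability(e: float, e_prime: float, temperature: float) -> float:
    """Metropolis-style acceptance probability P(e, e', T_t).

    Improving moves (e' < e) are always accepted; worsening moves are
    accepted with probability exp(-(e' - e) / T_t), which shrinks toward
    zero as the temperature cools.
    """
    if e_prime < e:
        return 1.0  # always move to a lower-energy state, regardless of T
    if temperature <= 0:
        return 0.0  # no exploration once the schedule is exhausted
    return math.exp(-(e_prime - e) / temperature)
```

Note how the probability of accepting a worse candidate grows as the gap $e' - e$ shrinks, matching the last desired property above.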

Intuitively, simulated annealing attempts to find an optimal solution to an optimization problem by initially allowing a large amount of global exploration of the search space, then gradually shifting toward local exploitation of promising regions as the temperature decreases.
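Putting the components together, the generic loop can be sketched as follows. The linear cooling schedule, the Gaussian neighbor move, and the toy objective $f(x) = x^2$ are illustrative choices, not prescribed by the algorithm:

```python
import math
import random

def simulated_annealing(initial_state, energy, neighbor, t_max=10_000, t0=1.0):
    """Generic simulated annealing loop.

    `energy` plays the role of E, `neighbor` the stochastic transformation N,
    and the temperature follows a simple linear schedule reaching T(t_e) = 0
    at the final iteration.
    """
    state = initial_state
    best = state
    for t in range(t_max):
        temperature = t0 * (1 - t / t_max)  # monotonically decreasing T(t)
        candidate = neighbor(state)
        e, e_prime = energy(state), energy(candidate)
        # Metropolis acceptance: always take improvements, sometimes take
        # worsening moves while the temperature is still positive.
        if e_prime < e or (
            temperature > 0 and random.random() < math.exp(-(e_prime - e) / temperature)
        ):
            state = candidate
        if energy(state) < energy(best):
            best = state  # track the best state seen so far
    return best

# Toy usage: minimize f(x) = x^2 with Gaussian moves (global minimum at x = 0).
random.seed(0)
result = simulated_annealing(
    initial_state=10.0,
    energy=lambda x: x * x,
    neighbor=lambda x: x + random.gauss(0, 1),
)
```

Tracking `best` separately from `state` is a common practical safeguard: the walk may wander away from the best state it has visited, but the returned answer never gets worse.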

  1. Simulated annealing (Wikipedia)

  2. Simulated annealing (Wikipedia)