Some days ago, there was some discussion in #lisp about how to program a random variable with nontrivial probability distribution. That is, given values x_{1},…,x_{n}, and probabilities p_{1},…,p_{n}, how to write a function that randomly returns one of those values with the corresponding probability each time it is called.

It turns out this problem has a beautiful, simple, and efficient solution called the alias method (R. A. Kronmal and A. V. Peterson. *On the alias method for generating random variables from a discrete distribution*. The American Statistician, 33:214–218, 1979.), that requires only one call to the random number generator. While I haven’t read any complete explanation of it (only the one in R. A. Kronmal, A. V. Peterson, *The alias and alias-rejection-mixture methods for generating random variables from probability distributions*. Proc. 1979 Winter Simulation Conference (ed. H.J. Highland et al.) 269–280, 1979, which isn’t complete), I think it is highly unlikely that it is different from the method I describe below.

An Common Lisp implementation can be found here.

**The alias method**

If you choose one real number between 0 and 1 at random with a uniform distribution, the probability that it lies between a and b, with

0≤a<b≤1, is b-a. So the obvious solution to our problem is to put intervals I_{i} in [0,1) of length p_{i} next to each other. Each time we want a value of our random variable, we choose a value v in [0,1) at random with a uniform distribution, and see in which interval it falls. If it falls in I_{k}, the value is x_{k}. The problem is that finding the interval is costly.

The fundamental idea behind the alias method is that this approach still works if each I_{i} was subdivided in nonoverlaping, not necessarily contiguous intervals J_{i}^{1}, J_{i}^{2},…,J_{i}^{ki} in [0,1) such that the sum of their lengths is p_{i}. This can be used to rearange the intervals in such a way that it is easy to know in what interval a given point lies, and to which value the interval corresponds. To do this, one divides [0,1) in n intervals L_{k} of equal length, and then packs each with at most two intervals belonging to different x_{k}. Thus, finding out in what interval a point falls amounts to finding out, first, in which L_{k} it falls, which can be obtained by division and truncation, and then finding out which one of the two possibilities that are left holds.

To get this redistribution right, pack L_{1} with an interval whose length is less or equal than 1/n, and, if necessary, take away what is needed to fill up L_{1} from one whose length is greater than 1/n. Now, filling up the rest of the L_{i} is very close to the initial problem, but with one value (and one interval) less.

#1 by Jens Axel Søgaard on April 17, 2006 - 7:42 pm

An interesting problem. Is this alias method the same as the alias method in “The Art of Computer Programming” by Knuth vol 2 section 3.2.1 ? It seems to solve the same problem, but is attributed to A.J.Walker (1977).

A Scheme implementation can be found at the bottom of

#2 by Jens Axel Søgaard on April 17, 2006 - 7:43 pm

The link disappeared:

http://cvs.sourceforge.net/viewcvs.py/schematics/schemathics/random.ss?rev=1.5&view=auto

#3 by prxq on April 17, 2006 - 10:55 pm

In the second reference I gave above, the authors say that their version is a slight modification of the alias method by Walker. Not having the Knuth at hand, I can’t really say.

#4 by Jens Axel Søgaard on April 17, 2006 - 11:31 pm

Based on the title “On the alias method…” I think you are right it is a refinement of the Walker method. I was unable to find an online copy, so I hope my local library can get it for me.