It turns out that, as pointed out by Jens Axel Søgaard, the method I wrote about a few days ago is the alias method by Walker, as found in D. Knuth’s TAOCP, but with a modification by R. A. Kronmal and A. V. Peterson. Their contribution, it seems, was the algorithm to rearrange the intervals in O(N) operations.
In any case, there is a very good and complete textbook, nowadays available freely on the net, where all these things are explained, disected, and thoroughly overengineered in every good sense of the word. The book is “Non-Uniform Random Variate Generation”, by Luc Devroye (thanks to Brad Lucier for mailing me this great link).
Below you’ll find some additional notes about the method, prompted by reading part of the book, and a link to an updated version of the CL code. (For simplicity, I will write as if in the context of my first post).
The first thing I found interesting is that, while in theory one random number is enough, in practice this doesn’t always hold, because one is using the higher bits to choose the slot, and the lower bits to compute the alternative. The more you need for the former, the less you have for the latter. This means that the simulated variable will produce the values with slightly different probabilities than those specified. However, I think it is unlikely for this deviation to be really important unless the number of possible values of the variable becomes large, but it is something to think about.
The other interesting point is that the method can be made even faster. The trick to speed up the generation of values of the random variable is to add more possible values but with probability zero. What happens then is that one has a variety of slots for whom only the alternaive needs to be considered, and thus one can avoid the second call to the random number generator. The more of these phantom values, the less calls are needed. And if these ‘dead’ slots are rearanged in the correct order, which btw is very easy, it is also possible to reduce the number of array accesses.
In old days that seems to have been very important. I doubt that an array access will make that much difference nowadays, but being so easy to implement, it doesn’t cost you much either. Economizing calls to the random number generator, in particular if it is an expensive one, seems to me a bigger advantage.
I have updated my implementation, and included the option :phantom-values to specify the number of extra values with zero probability to consider. This version goes the safe route, using a second call to RANDOM should it be necessary, but also avoids array accesses when possible. You can get it here.