Generating Normally Distributed Random Numbers Using Boost

One of my previous posts describes a very minimal program that generates a normally distributed random number using Boost. However, I've realised from the number of people visiting that this example may indeed be a bit too short and simple, and lead people to not exactly the best practises. So here is a longer post with more pointers.

Lifetime of the generator object

It may sound obvious, but in order to be sure of getting good quality random numbers, it is essential to avoid reseeding or re-initialising the random number generator. In the case of normally-distributed random number, this means preserving the state of the underlying integer random generator.

Because Boost does not use global or static variables, preserving the state of the generator requires either having long-lived (probably heap-allocated) objects with the generators or using appropriate copying of the states.

Example of poor practice

If one tried to extend the example from previous post in the most obvious way, the results would certainly not be satisfactory:

// Bojan Nikolic <bojan@bnikolic.co.uk>
// Example of bad practice, do not do this!

#include <time.h>
#include <boost/random/normal_distribution.hpp>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/variate_generator.hpp>

double gen_normal(void)
{
  boost::variate_generator<boost::mt19937, boost::normal_distribution<> >
    generator(boost::mt19937(time(0)),
           boost::normal_distribution<>());

  double r = generator();
  return r;
}

Here I only moved the content of what was previously in main to a new function gen_normal. The problem is calling gen_normal will generate terribly non-random random numbers:

int main(void)
{
  for(size_t i=0; i<10; ++i)
    std::cout<<gen_normal()
             <<std::endl;
}
// output:
// 0.121596
// 0.121596
// 0.121596
// 0.121596
// 0.121596
// 0.121596
// 0.121596
// 0.121596
// 0.121596
// 0.121596

I.e., all of the values are the same. This is simply due to repeated re-initialisation of the random number generator from the system clock.

Careful with copies

Another way to get bitten is to be careless with copies of the generator object. Here is an example:

template<class T>
double gen_normal_2(T generator)
{
  return generator();
}

int main(void)
{
  boost::variate_generator<boost::mt19937, boost::normal_distribution<> >
    generator(boost::mt19937(time(0)),
              boost::normal_distribution<>());

  for(size_t i=0; i<10; ++i)
    std::cout<<gen_normal_2(generator)
             <<std::endl;
}
// Output:
// 0.898862
// 0.898862
// 0.898862
// 0.898862
// 0.898862
// 0.898862
// 0.898862
// 0.898862
// 0.898862
// 0.898862

What happens here is that every time the function gen_normal_2 is called it makes a copy of the generator object and advances the state of this copy only, not of the parent object.

Solution by passing references

The simplest solution in this case it of course to ensure a single generator object exists and only pass references to it. In the above example, this only requires inserting a simple &:

template<class T>
double gen_normal_3(T &generator)
{
  return generator();
}

// Version that fills a vector
template<class T>
void gen_normal_3(T &generator,
              std::vector<double> &res)
{
  for(size_t i=0; i<res.size(); ++i)
    res[i]=generator();
}

int main(void)
{
  boost::variate_generator<boost::mt19937, boost::normal_distribution<> >
    generator(boost::mt19937(time(0)),
              boost::normal_distribution<>());

  for(size_t i=0; i<10; ++i)
    std::cout<<gen_normal_3(generator)
             <<std::endl;
}
// Output:
// -0.643475
// 0.144729
// 0.439714
// 0.481678
// 0.402485
// 0.416421
// -1.59029
// -0.800964
// -0.621854
// -0.150999

Solution with copies

Passing references may not be practical in some situations (for example, in multi-threaded programs when the references are non-local). In these cases it is possible to ensure the parent object is advanced by returning the new state and copying it back into the parent:

template<class T>
T gen_normal_4(T generator,
            std::vector<double> &res)
{
  for(size_t i=0; i<res.size(); ++i)
    res[i]=generator();
  // Note the generator is returned back
  return  generator;
}

int main(void)
{
  boost::variate_generator<boost::mt19937, boost::normal_distribution<> >
    generator(boost::mt19937(time(0)),
              boost::normal_distribution<>());

  std::vector<double> res(10);
  // Assigning back to the generator ensures the state is advanced
  generator=gen_normal_4(generator, res);

  for(size_t i=0; i<10; ++i)
    std::cout<<res[i]
             <<std::endl;

  generator=gen_normal_4(generator, res);

  for(size_t i=0; i<10; ++i)
    std::cout<<res[i]
             <<std::endl;
}
// Output:
// -2.01509
// -1.87061
// 1.35597
// -0.434768
// -0.0233686
// -0.0525241
// -0.521989
// -0.362958
// -0.557344
// -1.02654
// -0.506174
// 0.299165
// 0.847568
// 0.0126109
// -1.33701
// 0.892904
// 0.612492
// 1.01212
// 1.33039
// 0.487829

It can be seen that the two calls to the gen_normal_4 function return different random number sequences, so this approach works fine too.

Because it is more error prone and requires more analysis, I would however avoid this pattern unless you are absolutely sure this is what you require.

Summary

To be sure that you are getting good quality random numbers it is best to advance a single generator for the duration of the numerical computation task. But boost random number generators do not hold a global state and it is up to the user to ensure generator states are preserved. The easiest and usually sufficient way of doing this is to pass references to a single, top-level, generator object.

This is still only scratching the surface, so I guess there will be more posts on boost random numbers...