Boost Accumulators Library

The Boost Accumulators library is a sophisticated framework for incremental computations, that is, computations that can process a single sample at a time. On top of this framework, the Accumulators library additionally provides an implementation of a broad set of statistical accumulators that allow computation of moments, covariances, and other commonly used statistical estimators.

A simple example of the library in use is the following fragment, which computes the mean and second moment of a sequence of numbers read from an input stream ifs (here a is an alias for the boost::accumulators namespace):

a::accumulator_set<double,
                   a::stats<a::tag::mean,
                            a::tag::moment<2> > > acc;
double x;
// Testing the extraction itself avoids accumulating a stale value
// after the final read fails
while(ifs >> x)
{
  acc(x);
}

The key feature of the Accumulators library is that the quantities to be computed are specified at compile time through an MPL sequence. This allows the compiler to analyse the dependencies between the quantities requested by the user, determine the minimal set of accumulators that will provide all of the required quantities, and then optimise these computations. As a trivial example, requesting the sum and the mean of a sequence of numbers results in only one sum being computed, with the mean derived from that same sum and the sample count. In the terminology of the library, the mean and sum that we wish to calculate are the features we request, while the accumulators automatically selected by the library are the sum and count accumulators.

Use of the Accumulators library is recommended when:

  • Writing small applications that require statistical computation, as the advantage of using well-tested code will likely outweigh any possible concerns
  • Writing applications that require a diverse set of statistics to be computed, especially if high performance is required
  • Writing libraries for incremental computation where:
    1. A number of different quantities may be required
    2. Time to compute samples is comparable with or shorter than the time to compute the required statistics

The main drawback is that the Accumulators library makes heavy use of templates, with the usual disadvantages that this sometimes brings:

  1. Difficult to interpret error messages
  2. Longer compilation time

Both of these are being addressed by newer compiler versions, and compilation time can of course be reduced by appropriate use of distributed compilation.

Here is a complete example program that computes the mean and the second and sixth moments of a sequence of normally distributed numbers:

#include <time.h>
#include <iostream>

#include <boost/random/normal_distribution.hpp>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/variate_generator.hpp>

#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/mean.hpp>
#include <boost/accumulators/statistics/moment.hpp>

// I'm importing the accumulators namespace as "a" to make clear which
// elements belong to it
namespace a=boost::accumulators;

/// Allows calculation of arbitrary accumulated quantities from
/// i.i.d. samples of a normal distribution with zero mean and unit variance
template<class T>
void accNormal(T &acc,
               size_t n)
{
  // Create a normally distributed random number generator, seeded
  // from the system time
  boost::variate_generator<boost::mt19937, boost::normal_distribution<> >
    generator(boost::mt19937(time(0)),
              boost::normal_distribution<>());

  // Accumulate n values
  for(size_t i=0; i < n; ++i)
    acc(generator());

}

int main()
{

  a::accumulator_set<double,
                     a::stats<a::tag::mean,
                              a::tag::moment<2>,
                              a::tag::moment<6> > > acc;

  accNormal(acc,
            1000);

  std::cout<<"Mean:   "
           <<a::mean(acc)<<std::endl
           <<"Moment 2: "
           <<a::moment<2>(acc)<<std::endl
           <<"Moment 6: "
           <<a::moment<6>(acc)<<std::endl;
  return 0;
}