Simple input parsing using Boost

Inexperienced programmers often get stuck getting data into their C++ programs and getting output back out of them. Somewhat more experienced programmers often fall back on clunky routines from the Standard C library such as strtok, atof and friends. I think pretty much everybody agrees that plain C++ does not provide for elegant parsing of simple input data. Fortunately, the Boost libraries provide lots of tools to tackle this, and here is a very simple example.

The problem

Parse an input stream consisting of comma-separated floating point values, possibly with whitespace around them, such as:

16.61023,1.4835680,-1.775000, 61.43000

Alternatives

The Boost collection of course includes the feature-rich Boost.Spirit parser framework, which can be used to parse complex input streams. Here I will illustrate a simpler and more limited approach, which is nevertheless usually sufficient for getting data into small applications.

The solution

Here is a very simple function to tackle input data such as described above:

// Bojan Nikolic <bojan@bnikolic.co.uk>
//
// Simple parsing using boost

#include <string>
#include <vector>

#include <boost/tokenizer.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/algorithm/string.hpp>

/**
   Split string s into comma-separated tokens, convert each token to
   type T and append the results to vector o
 */
template<class T>
void tokenizeV(const std::string &s,
          std::vector<T> &o)
{
  typedef boost::tokenizer<boost::escaped_list_separator<char> >  tok_t;

  tok_t tok(s);
  for(tok_t::iterator j (tok.begin());
      j != tok.end();
      ++j)
  {
    std::string f(*j);
    boost::trim(f);
    o.push_back(boost::lexical_cast<T>(f));
  }
}

It is built on top of three simple but useful libraries:

Boost.Tokenizer splits strings or other character sequences into their constituent tokens. The boost::escaped_list_separator splits on the comma character by default, but other field separators can be used very easily by supplying them as parameters to its constructor.
Boost.StringAlgo is a library that provides many functions for manipulating strings. In this case I use the boost::trim function to remove any whitespace from each token after the input has been split.
The Boost.Conversion library provides the lexical_cast function that is used to convert the character strings to floating point values. This is roughly equivalent to the atof family of functions in C but, very usefully, it is templated on the type to convert to and reports errors through exceptions.

As you can see from the code shown above, the Boost.Tokenizer library provides an iterator interface for stepping through the tokens of the input string. I next use the trim function from Boost.StringAlgo to remove any whitespace around the token, which would otherwise make the conversion to the target type fail. Finally I use lexical_cast to convert to the target type and push the result onto the supplied vector.

The end result is a short but versatile function to load simple delimited data from a line of input.