Boost Tokenizer Library

The Boost Tokenizer library is a small, simple library for breaking up strings (or other sequences of characters) into sequences of tokens. The user specifies which characters delimit the tokens. This makes Tokenizer a simple and convenient choice for parsing input data in a very regular format, as opposed to more fully featured libraries such as the regular-expression library Xpressive, which can handle more complex inputs.

A typical use of this library looks like the following fragment, which splits the input string ''s'' into tokens according to the syntax of comma-separated values (CSV) files:

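#include <boost/tokenizer.hpp>

typedef boost::tokenizer<boost::escaped_list_separator<char> >  tok_t;
// s is the input std::string; the tokenizer does not modify it
tok_t tok(s);
for(tok_t::iterator j (tok.begin());
    j != tok.end();
    ++j)
{
  // *j is the current token (a std::string)
}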

Typical uses for this library are parsing input such as comma- and tab-separated data files and similar streams. Its main advantage is simplicity: it does not require writing regular expressions or grammars. The library is similar in function to the standard C function ''strtok'', but it is more convenient and safer to use: it does not modify the input string, and it keeps no hidden static state between calls.

Here is a complete example which loads comma-separated numbers from the standard input:

#include <iostream>
#include <string>
#include <vector>
#include <boost/tokenizer.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/foreach.hpp>

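/**
   Split the comma-separated string s into fields and append each
   field, converted to type T, to the vector o.
   Throws boost::bad_lexical_cast if a field is not a valid T.
*/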
template<class T>
void tokenizeV(const std::string &s,
               std::vector<T> &o)
{
  typedef boost::tokenizer<boost::escaped_list_separator<char> >  tok_t;
  tok_t tok(s);
  for(tok_t::iterator j (tok.begin());
      j != tok.end();
      ++j)
  {
    o.push_back(boost::lexical_cast<T>(*j));
  }
}

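int main()
{
  // Read a whole line; operator>> would stop at the first whitespace.
  std::string buffer;
  std::getline(std::cin, buffer);

  std::vector<double> v;
  try
  {
    tokenizeV(buffer, v);
  }
  catch(const boost::bad_lexical_cast &e)
  {
    // e.g. a non-numeric field, or one with surrounding spaces
    std::cerr << "Invalid number: " << e.what() << std::endl;
    return 1;
  }

  BOOST_FOREACH(const double &x, v)
    std::cout << x << ",";
  std::cout << std::endl;
}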