Boost.Xpressive

Boost.Xpressive is a comprehensive, flexible, and efficient library for building regular expressions and matching text in a variety of situations. A minimal usage example is testing whether a string consists of a particular combination of sub-strings:

/// Does the supplied string indicate an error?
bool Error_p(const std::string &s)
{
  using namespace boost::xpressive;
  sregex rex = *_w >> "Error:" >> *(~_n);
  return regex_match(s,rex);
}

The Boost.Xpressive library has two unique features:

  1. Regular expressions can be parsed at run time (the traditional approach, see for example Boost.RegEx) and at compile time:

    // Compile-time
    sregex rex = *_w >> "Error:" >> *(~_n);
    // Run-time
    sregex rex2 = sregex::compile( "(\\w*)Error:(.*)" );
    
  2. When constructed at compile time, a regular expression can reference other expressions, including itself:

    sregex rex = *_w >> "Error:" >> *(~_n);
    // One or more Error's in same string
    sregex rex_rec = +rex;
    

These two features lead to a number of important benefits:

  • Compile-time construction means that the syntax of regular expressions and their parameter types are checked during compilation, allowing early identification of errors
  • Compile-time construction allows compiler optimisations, improving the performance of the matching/searching engine
  • Allowing references reduces the possibility of error and increases the clarity of the code
  • References allow complex grammars to be built

These benefits make Xpressive a good choice in the majority of situations where non-trivial parsing of input strings is required. The library has the additional advantage of being header-only, which simplifies building programs that use it. This, however, also leads to one of its disadvantages compared to Boost.RegEx: compilation time is significantly longer (as usual, if this is a problem you should consider using distcc or similar tools).

The flexibility of Xpressive means that it is close in functionality to a parser generator such as Spirit, which is also part of the Boost collection. The two features which differentiate Xpressive from Spirit are:

  1. Run-time creation of regular expressions defining the textual analysis to be done
  2. Backtracking, i.e., exhaustively trying every possibility to match the given pattern

In quantitative finance, the main use of Xpressive is in parsing input data, for example market quote information or information on the order book flow, especially when the incoming formats are complex and liable to change, which is where Xpressive's flexibility pays off. A second typical usage, which makes full use of Xpressive's features, is to allow advanced user-driven filtering of data by combining compile-time and run-time created regular expressions.

Here is a complete example program:

// Bojan Nikolic <bojan@bnikolic.co.uk>

// A simple program to filter the standard input, allowing the user to
// supply a regular expression and then embellishing this regular
// expression by requiring specific characters before its occurrence


#include <iostream>
#include <string>
#include <boost/xpressive/xpressive_static.hpp>
#include <boost/xpressive/xpressive_dynamic.hpp>


using namespace boost::xpressive;

struct filter
{

  /// The user-supplied regular expression
  sregex urex;

  /// The static part of the regular expression
  sregex srex;

  /// Construct a filter which matches one or more of 'X', 'x' or ' '
  /// (space) and then a user-supplied regular expression and then a
  /// colon.
  ///
  /// For example, it could be used to find strings of the type:
  ///
  /// XXX Warning: There was a delay in accessing the market data server
  ///
  /// Note how the static expression srex refers to the run-time
  /// expression urex
  filter(const char *u):
    urex(sregex::compile(u)),
    srex(+(set='X','x',' ')>>urex>>':'>> +_)
  {}

  /// Does the string match?
  bool pred(const std::string &s)
  {
    return regex_match(s,srex);
  }

};

void dofilter(std::istream &ins,
              std::ostream &os,
              const char *urex_str)
{
  filter f(urex_str);
  std::string cline;
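  // Loop on getline itself so that a failed read is never processed
  while(std::getline(ins,cline))
  {
    if (f.pred(cline))
    {
      // getline strips the newline, so restore it on output
      os<<cline<<'\n';
    }
  }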
}

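int main(int argc, const char **argv)
{
  // The regular expression is a required argument
  if (argc < 2)
  {
    std::cerr<<"Usage: supply a regular expression as the first argument"<<std::endl;
    return 1;
  }
  dofilter(std::cin,
           std::cout,
           argv[1]);
  return 0;
}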