Lazy Code Generation (LzCG) and C++ Expression Templates

I've recently been thinking about strategies for writing fast numerical codes in C++. One of the ideas I have been pursuing is to generate the code for some of the performance-critical sections after the main body of the code has been compiled. The summary of the idea is:

  • Expression templates in C++ encode the algorithm they perform in the signatures of the functions to be called
  • These signatures can be parsed after the compilation of original code and alternative implementations substituted
  • This allows an optimised version of the function to be substituted after compilation (lazily), exploiting for example:
    • Optimisations not implemented in the compiler
    • Knowledge about the precise processor/cache/memory configuration
    • Appropriate multi-threading
    • Use of an auxiliary processing unit (e.g., GPU)
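
To make the first point concrete, here is a minimal sketch of an expression template. It is not the code from the paper: the names Expr, Vec, Add, Mul, assign and demo are all hypothetical and chosen only for illustration. It shows that the expression a + b * c has the type Add<Vec, Mul<Vec, Vec>>, so the evaluation function instantiated for it carries the whole computation in its signature.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// CRTP base that marks a type as an expression node (hypothetical name).
template <class D>
struct Expr {
    const D& self() const { return static_cast<const D&>(*this); }
};

// Leaf node: a plain vector of doubles.
struct Vec : Expr<Vec> {
    std::vector<double> data;
    Vec(std::size_t n, double v) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Interior nodes hold references to their operands, so an expression
// must be consumed within the statement that builds it.
template <class L, class R>
struct Add : Expr<Add<L, R>> {
    const L& l;
    const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

template <class L, class R>
struct Mul : Expr<Mul<L, R>> {
    const L& l;
    const R& r;
    Mul(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] * r[i]; }
    std::size_t size() const { return l.size(); }
};

// The operators do no arithmetic; they only build the expression type.
template <class L, class R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Add<L, R>(l.self(), r.self());
}

template <class L, class R>
Mul<L, R> operator*(const Expr<L>& l, const Expr<R>& r) {
    return Mul<L, R>(l.self(), r.self());
}

// Generic evaluator. Each distinct expression type E yields a distinct
// instantiation, and hence a distinct symbol in the object file.
template <class E>
void assign(Vec& out, const Expr<E>& expr) {
    const E& e = expr.self();
    assert(out.size() == e.size());
    for (std::size_t i = 0; i < out.size(); ++i) out.data[i] = e[i];
}

// The type of a + b * c spells out the whole computation.
static_assert(
    std::is_same<decltype(std::declval<Vec>() +
                          std::declval<Vec>() * std::declval<Vec>()),
                 Add<Vec, Mul<Vec, Vec>>>::value,
    "expression type encodes the algorithm");

// Small usage helper: evaluates 1 + 2 * 3 element-wise.
inline double demo() {
    Vec a(4, 1.0), b(4, 2.0), c(4, 3.0), out(4, 0.0);
    assign(out, a + b * c);  // instantiates assign<Add<Vec, Mul<Vec, Vec>>>
    return out[0];
}
```

In this sketch the compiled object file would contain a symbol for assign<Add<Vec, Mul<Vec, Vec>>>. Under the idea outlined above, a tool run after compilation could, in principle, recognise that name and supply a replacement body for it, for instance a hand-tuned fused multiply-add loop. How the symbol is actually parsed and substituted is not shown here.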

I have posted the full paper here.