Qt + Boost.Regex = QtBoostRegex

Almost everybody uses simple regular expressions from time to time, be it through grep, url_rewrite or QRegExp. Most people seem to stick to a basic set of regex features: alternation, character classes, greedy quantifiers, capturing, grouping and zero-width assertions (e.g. anchors). However, some people, myself included, expect more from a regex engine.

A few weeks before I started working at Qt Software Berlin (Trolltech at that time) I wrote a mail to qt-interest asking for QRegExp to be extended with:

  • Lookbehind
  • ^ and $ matching at every line break, not only at the start/end of the string
  • Dot-matches-newline switch
  • Callback input (matching text from sources other than "plain arrays")

I knew there were existing libraries that could do all of that, and Boost.Regex actually does a lot more. As a side project, I started working on a Qt wrapper around Boost.Regex so I could feed it QStrings directly. Boost.Regex is templated and works with wchar_t, char and more, but it cannot work with QChar directly. It does work with ushort, though, and a QChar is more or less a ushort.

The wrapper is incomplete, but I did not want it to gather any more dust. Maybe this is what a few people have been hoping for, just like I did back then. So I'm releasing it on Qt Labs now. Download the sources from here:

http://labs.trolltech.com/gitweb?p=qtboostregex;a=summary

Please note that you need bcp (Boost Copy), which on Debian-based systems can be installed with

$ sudo apt-get install bcp

Things left to do include:

  • try to move Boost.Regex "further inside"
  • upgrade to a more recent Boost.Regex (e.g. to fix a GCC 4.3.x compile error)

Interestingly, QtBoostRegex outperforms QRegExp in some cases, while in others QRegExp is the clear winner. For details, please see ./examples/test, which holds a small test suite.

Besides the extra features on the regex language level, this wrapper also allows searching non-array input. What does that mean? Usually a regex engine iterates over an array of characters or over the structure wrapping these characters. Boost.Regex offers an extra layer of abstraction that we use to wrap an arbitrary non-array structure (e.g. a list of lines holding a document inside a text editor) inside a bidirectional iterator (which I call a feeder here). The code contains an example of such an iterator wrapping a QStringList: StringListRegexFeeder. Please have a look at the ./examples/simple folder to see how it's used.
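
To make the feeder idea a bit more concrete, here is a minimal sketch in plain C++. It is not the StringListRegexFeeder from the repository: it assumes std::regex and std::vector<std::string> as stand-ins for Boost.Regex and QStringList, and the names LineFeeder and findAcrossLines are made up for illustration. The sketch also assumes that every line is followed by a virtual newline, which the real feeder may handle differently.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical feeder (not the real StringListRegexFeeder): presents a list
// of lines as one character sequence, with a virtual '\n' after each line.
class LineFeeder {
public:
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type = char;
    using difference_type = std::ptrdiff_t;
    using pointer = const char*;
    using reference = const char&;

    LineFeeder() = default;
    LineFeeder(const std::vector<std::string>* lines, std::size_t line, std::size_t column)
        : lines_(lines), line_(line), column_(column) {}

    reference operator*() const {
        static const char newline = '\n';
        const std::string& text = (*lines_)[line_];
        // One position past the last character of a line is the virtual newline.
        return column_ < text.size() ? text[column_] : newline;
    }

    LineFeeder& operator++() {
        if (column_ < (*lines_)[line_].size()) {
            ++column_;
        } else {            // step over the virtual newline into the next line
            ++line_;
            column_ = 0;
        }
        return *this;
    }
    LineFeeder operator++(int) { LineFeeder old = *this; ++*this; return old; }

    LineFeeder& operator--() {
        if (column_ > 0) {
            --column_;
        } else {            // step back onto the previous line's virtual newline
            assert(line_ > 0 && "cannot step back past the first character");
            --line_;
            column_ = (*lines_)[line_].size();
        }
        return *this;
    }
    LineFeeder operator--(int) { LineFeeder old = *this; --*this; return old; }

    friend bool operator==(const LineFeeder& a, const LineFeeder& b) {
        return a.line_ == b.line_ && a.column_ == b.column_;
    }
    friend bool operator!=(const LineFeeder& a, const LineFeeder& b) { return !(a == b); }

    std::size_t line() const { return line_; }
    std::size_t column() const { return column_; }

private:
    const std::vector<std::string>* lines_ = nullptr;
    std::size_t line_ = 0;
    std::size_t column_ = 0;
};

// Searches the whole list of lines without joining them into one string.
// Returns the matched text, or an empty string if nothing matched.
std::string findAcrossLines(const std::vector<std::string>& lines, const std::string& pattern) {
    const LineFeeder begin(&lines, 0, 0);
    const LineFeeder end(&lines, lines.size(), 0);   // one past the last virtual newline
    const std::regex re(pattern);
    std::match_results<LineFeeder> match;
    if (!std::regex_search(begin, end, match, re))
        return std::string();
    return match.str();
}
```

For the two lines "foo bar" and "baz qux", calling findAcrossLines(lines, "bar\\nbaz") finds a match that spans the line break, although no single string in the list contains it. The iterators stored in the match also keep their line and column, which is handy for an editor that wants to highlight the hit.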

To prepare the Boost.Regex sources, please see the script ./extract_boost_regex.sh. Unless that script has been run first, "qmake && make" will not succeed.

To summarize my post:

  • QtBoostRegex is a Qt wrapper around Boost.Regex
  • Boost.Regex is more powerful than QRegExp but slower in some cases
  • QtBoostRegex is not finished yet; I will keep working on it as time permits

Have a nice day, Sebastian

