Qt + Boost.Regex = QtBoostRegex
Almost everybody uses simple regular expressions from time to time, be it through grep, url_rewrite or QRegExp. Most people seem to stick to a basic set of regex features: alternation, character classes, greedy quantifiers, capturing, grouping and zero-width assertions (e.g. anchors). However, some people, myself included, expect more from a regex engine.
A few weeks before I started working at Qt Software Berlin (Trolltech at that time), I wrote to the qt-interest mailing list asking for QRegExp to be extended with:
- ^ and $ matching at every newline, not only at the start/end of the string
- A dot-matches-newline switch
- Callback input (matching text from sources other than plain arrays)
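For contrast, here is a minimal sketch of the default behavior the first two wishes target. It uses the C++ standard library's std::regex with its default ECMAScript grammar, not QRegExp or Boost.Regex: `^` and `$` anchor only at the start and end of the whole input, and `.` does not match a newline. The helper `found` is a hypothetical name used only for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches anywhere in `text`,
// using std::regex's default ECMAScript grammar and default match flags.
bool found(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// With text = "first\nsecond" (two "lines"):
//   found(text, "^second$")            -> false: ^ matches only at the start of the whole string
//   found(text, "^first$")             -> false: $ matches only at the end of the whole string
//   found(text, "first.second")        -> false: . does not match '\n'
//   found(text, "first[\\s\\S]second") -> true:  [\s\S] is the usual "any character" workaround
```

Per-line anchors and a dot that crosses line breaks therefore require either an engine option or rewriting the pattern by hand, which is what the wish list asks to avoid.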
I knew that existing libraries could already do that, and Boost.Regex actually does a lot more. As a side project, I started working on a Qt wrapper around Boost.Regex so I could feed it QStrings directly. Boost.Regex is templated and works with char, wchar_t and other character types, but it cannot work with QChar directly. It does work with ushort, though, and a QChar is essentially a ushort (a 16-bit UTF-16 code unit).
It's incomplete, but I did not want it to gather any more dust. Maybe it's what a few people have been hoping for, just as I was back then, so I'm releasing it on Qt Labs now. Download the sources from here:
Please note that you need bcp (Boost Copy). On Debian-based systems, you can install it through
$ sudo apt-get install bcp
Still to do:
- try to move Boost.Regex "further inside"
- upgrade to a more recent Boost.Regex (e.g. to fix a GCC 4.3.x compile error)
Interestingly, QtBoostRegex outperforms QRegExp in some cases, while in others QRegExp is the clear winner. For details, see the small test suite in ./examples/test.
Besides the extra features at the regex-language level, this wrapper also lets you search non-array input. What does that mean? Usually a regex engine iterates over an array of characters, or over a structure wrapping such an array. Boost.Regex adds a layer of abstraction: its matching algorithms work on any bidirectional iterator. We use that to wrap an arbitrary non-array structure (e.g. the list of lines holding a document in a text editor) in a bidirectional iterator, which I call a feeder. The code contains one such iterator wrapping a QStringList, StringListRegexFeeder. Have a look at the ./examples/simple folder to see how it's used.
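The feeder idea can be sketched with the standard library alone. The snippet below is a hypothetical analog of StringListRegexFeeder, not its actual code: it swaps in std::vector<std::string> for QStringList and std::regex for Boost.Regex (std::regex_search, like Boost's regex_search, accepts any bidirectional iterator). The names LineListIterator and searchLines are made up for this illustration. The iterator presents the lines as one character sequence with a '\n' after every line, so a pattern can match across line breaks without the caller concatenating the lines into one string.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical feeder: presents a list of lines (e.g. an editor's document)
// as one character sequence with a virtual '\n' after every line.
class LineListIterator {
public:
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type = char;
    using difference_type = std::ptrdiff_t;
    using pointer = const char*;
    using reference = const char&;

    LineListIterator() = default;

    static LineListIterator begin(const std::vector<std::string>& lines) {
        return LineListIterator(&lines, 0, 0);
    }
    // One past the '\n' that follows the last line.
    static LineListIterator end(const std::vector<std::string>& lines) {
        return LineListIterator(&lines, lines.size(), 0);
    }

    reference operator*() const {
        static const char newline = '\n';
        assert(lines_ != nullptr && line_ < lines_->size());
        const std::string& s = (*lines_)[line_];
        return col_ < s.size() ? s[col_] : newline;  // col_ == size() is the virtual '\n'
    }

    LineListIterator& operator++() {
        if (col_ < (*lines_)[line_].size()) {
            ++col_;   // next character on the same line
        } else {
            ++line_;  // step past the virtual '\n' to the next line
            col_ = 0;
        }
        return *this;
    }
    LineListIterator operator++(int) { LineListIterator old = *this; ++*this; return old; }

    LineListIterator& operator--() {
        if (col_ > 0) {
            --col_;   // previous character on the same line
        } else {
            --line_;  // back onto the previous line's virtual '\n'
            col_ = (*lines_)[line_].size();
        }
        return *this;
    }
    LineListIterator operator--(int) { LineListIterator old = *this; --*this; return old; }

    friend bool operator==(const LineListIterator& a, const LineListIterator& b) {
        return a.line_ == b.line_ && a.col_ == b.col_;
    }
    friend bool operator!=(const LineListIterator& a, const LineListIterator& b) {
        return !(a == b);
    }

    // 0-based position in the original document, for mapping matches back.
    std::size_t line() const { return line_; }
    std::size_t column() const { return col_; }

private:
    LineListIterator(const std::vector<std::string>* lines, std::size_t line, std::size_t col)
        : lines_(lines), line_(line), col_(col) {}

    const std::vector<std::string>* lines_ = nullptr;
    std::size_t line_ = 0;
    std::size_t col_ = 0;
};

// Searches the whole document; a match may span line breaks.
bool searchLines(const std::vector<std::string>& lines, const std::regex& re,
                 std::match_results<LineListIterator>& m) {
    return std::regex_search(LineListIterator::begin(lines),
                             LineListIterator::end(lines), m, re);
}
```

For example, with the lines {"int x = 1;", "", "return x;"} and the pattern `1;\n\nreturn (\w+)`, the match spans all three lines, and the captured `x` reports line 2, column 7 (0-based) through `m[1].first.line()` and `m[1].first.column()`. The iterator must also step backwards because the algorithms are specified for bidirectional iterators; `\b`, for instance, needs to look at the previous character.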
To prepare the Boost.Regex sources, run the script ./extract_boost_regex.sh; without it, "qmake && make" will not succeed.
To summarize my post:
- QtBoostRegex is a Qt wrapper around Boost.Regex
- Boost.Regex is more powerful than QRegExp but slower in some cases
- QtBoostRegex is not finished yet; I will keep working on it as time permits
Have a nice day, Sebastian