Iterating efficiently

Qt provides a whole lot of ways to iterate over a QList: You can use STL like iterators, Java like iterators, use the foreach macro or access the list elements via their index. Personally I fell in love with foreach from the first day, but it was not long that somebody remarked "foreach is slow" and that I should use STL const iterators ...

With the new benchmarking extensions of QTestLib (Morten already blogged about them), it's really easy to finally sort this out :-) . We will iterate over a QStringList with 100.000 elements (imagine you want to make a locale aware search or something alike, we will not modify the list), and compare numerous ways of iterating.

The test

Here is complete source code of the benchmark:

#include <QtCore/QStringList>
#include <assert.h>
#include <qtest.h>

class tst_iteration : public QObject
{
Q_OBJECT
public:
enum IterationType {
Foreach,
ForeachConstRef,
Java,
Stl,
StlConst,
IndexAt
};

private slots:
void qStringList_data() {
QTest::addColumn<QStringList>("list");
QTest::addColumn<IterationType>("type");

QStringList list;
for (int m = 100000; m >= 0; --m)
list < < QString::number(m);

QTest::newRow("index based") << list << IndexAt;
QTest::newRow("Java") << list << Java;
QTest::newRow("STL") << list << Stl;
QTest::newRow("STL (const iterators)") << list << StlConst;
QTest::newRow("foreach") << list << Foreach;
QTest::newRow("foreach (const references)") << list << ForeachConstRef;
}

The qStringList_data() method defines the test data for each test run: Each run will use the same list with 100.000 elements in it, but use a different type of loop.

    void qStringList() {
QFETCH(QStringList, list);
QFETCH(IterationType, type);

int dummy = 0;

switch (type) {
case IndexAt:
QBENCHMARK {
const int listSize = list.size();
for (int i = 0; i < listSize; ++i)
dummy += list.at(i).size();
}
break;
case Java:
QBENCHMARK {
QListIterator iter(list);
while (iter.hasNext())
dummy += (iter.next()).size();
}
break;
case Stl:
QBENCHMARK {
QStringList::iterator iter = list.begin();
const QStringList::iterator end = list.end();
for (; iter != end; ++iter)
dummy += (*iter).size();
}
break;
case StlConst:
QBENCHMARK {
QStringList::const_iterator iter = list.constBegin();
const QStringList::const_iterator end = list.constEnd();
for (; iter != end; ++iter) {
dummy += (*iter).size();
}
}
break;
case Foreach:
QBENCHMARK {
foreach (QString s, list) {
dummy += s.size();
}
}
break;
case ForeachConstRef:
QBENCHMARK {
foreach (const QString &s, list)
dummy += s.size();
}
break;
default:
break;
}
assert(dummy);
}

Note that I access the size() of each list element. This "side effect" prevents the compiler from being too clever and optimizing the whole loop away :)

Just for the sake of completeness here is the rest of the file:

}; // class tst_iteration

Q_DECLARE_METATYPE(tst_iteration::IterationType);

QTEST_MAIN(tst_iteration)

#include "main.moc"

Results

The benchmarking library allows you to run the same tests with different "backends": That is, you can let it measure the actual time, use the callgrind module of valgrind to measure cache misses or use the CPU ticks. For details see Morten's post. Here are the results for measuring the actual time on my developer machine (2Core Intel, Ubuntu32, gcc 4.3.2):

RESULT : tst_iteration::qStringList():"index based":
0.32 msec per iteration (total: 21, iterations: 64)
RESULT : tst_iteration::qStringList():"Java":
0.39 msec per iteration (total: 25, iterations: 64)
RESULT : tst_iteration::qStringList():"STL":
0.39 msec per iteration (total: 25, iterations: 64)
RESULT : tst_iteration::qStringList():"STL (const iterators)":
0.35 msec per iteration (total: 23, iterations: 64)
RESULT : tst_iteration::qStringList():"foreach":
3.3 msec per iteration (total: 27, iterations: 8 )
RESULT : tst_iteration::qStringList():"foreach (const references)":
0.40 msec per iteration (total: 26, iterations: 64)

Aside from the first simple foreach, all ways of iterating do not really differ performance wise. But why is the foreach without references so much slower? Well, in this case a new QString object is created inside the loop. It doesn't detach (it's still a shallow copy), but since we are not doing that much else in the loop, this makes a difference by a factor of 10!

Bottom line: If you - like me - value the readability of foreach, you should probably get used to writing foreach (const Type &, list) (that is, whenever you do not alter the value). Otherwise the different styles of iterating do not really differ performance wise. Whatever you do in the loop body will in most cases have a much larger impact :)

Update: Rerunning the benchmark on Windows 32 bit with a Visual Studio 2005 compiler (release build) actually shows more severe differences in runtime performance:

RESULT : tst_iteration::qStringList():"index based":
0.12 msec per iteration (total: 32, iterations: 256)
RESULT : tst_iteration::qStringList():"Java":
0.24 msec per iteration (total: 31, iterations: 128)
RESULT : tst_iteration::qStringList():"STL":
0.18 msec per iteration (total: 47, iterations: 256)
RESULT : tst_iteration::qStringList():"STL (const iterators)":
0.18 msec per iteration (total: 47, iterations: 256)
RESULT : tst_iteration::qStringList():"foreach":
3.8 msec per iteration (total: 31, iterations: 8)
RESULT : tst_iteration::qStringList():"foreach (const references)":
0.48 msec per iteration (total: 31, iterations: 64)

MSVC seems to be having problems optimizing the template code away ...


Blog Topics:

Comments