Qt Quick with the power of OpenCL on Embedded Linux devices

Many Qt users encounter a need to integrate GPU compute solutions into their Qt-based applications. What is more, with the advent of compute API implementations and powerful GPUs on embedded devices, using OpenCL or CUDA on an Embedded Linux device is a reality now. In a previous post we looked at NVIDIA's Jetson TK1 board and discovered how easy it is to get started with CUDA development in Qt applications using OpenGL. When it comes to OpenCL, developers are not left out in the cold either, thanks to Hardkernel's ODROID-XU3, where the ARM Mali-T628 graphics processor provides full OpenCL 1.1 support with CL-GL interop in addition to OpenGL ES 3.0.

In this post we will take a look at a simple and powerful approach to integrating Qt Quick applications and OpenCL. We will focus on use cases that involve sharing OpenGL resources like textures or buffers between OpenGL and OpenCL. The examples demonstrate three standard compute use cases and we will see them running on an actual ODROID board.


Why OpenCL and Qt?

The ability to perform complex, highly parallel computations on embedded devices while keeping as much data on the GPU as possible, and to visualize the results with Qt Quick and touch-friendly Qt Quick Controls, opens the door to easily creating embedded systems that perform advanced tasks in computer vision, robotics, image and signal processing, bioinformatics, and all sorts of heavyweight data crunching. As an example, think of gesture recognition: with high resolution webcams, Qt Multimedia, Qt Quick, Qt Quick Controls, and the little framework presented below, applications can focus on the things that matter: the algorithms (OpenCL kernels) performing the core of the work and the C++ counterpart that enqueues these kernels. The rest is taken care of by Qt.

Looking back: Qt OpenCL

OpenCL is not unknown to Qt - once upon a time, back in the Qt 4 days, there used to be a Qt OpenCL module, a research project developed in Brisbane. It used to contain a full 1:1 API wrapper for OpenCL 1.0 and 1.1, and some very helpful classes to get started with CL-GL interop.

Today, with the rapid evolution of the OpenCL API, the availability of an official C++ wrapper, and the upcoming tighter C++ integration approaches like SYCL, we believe there is little need for straightforward Qt-ish wrappers. Applications are encouraged to use the OpenCL C or C++ APIs as they see fit. However, helpers that simplify common tasks, like choosing an OpenCL platform and device so that we get interoperability with OpenGL, turn out to be really handy, especially when writing cross-platform applications. Case in point: Qt Multimedia 5.5 ships with an OpenCL-based example as presented in the video filters introduction post. The OpenCL initialization boilerplate code in that example is unexpectedly huge. This shows that the need for modern, Qt 5 based equivalents of the old Qt OpenCL classes like QCLContextGL has not gone away. In fact, with the ubiquity of OpenCL and OpenGL on all kinds of devices and platforms, they are more desirable than ever.
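
To give a feel for one small piece of that boilerplate: before a platform or device can be used for CL-GL interop, its extension string has to be checked for the sharing extension. The following is only a minimal sketch in plain C++ of that check. The helper names hasExtension and supportsGLSharing are made up for illustration and are not part of Qt or OpenCL; the input is assumed to be the space-separated string returned by clGetDeviceInfo with CL_DEVICE_EXTENSIONS.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `extensions` (a space-separated list,
// as reported via CL_DEVICE_EXTENSIONS) contains `name` as a whole token.
bool hasExtension(const std::string &extensions, const std::string &name)
{
    assert(!name.empty());
    std::istringstream stream(extensions);
    std::string token;
    while (stream >> token) {
        if (token == name)
            return true;
    }
    return false;
}

// Hypothetical helper: the Khronos extension is cl_khr_gl_sharing, while
// Apple's implementation advertises cl_APPLE_gl_sharing instead.
bool supportsGLSharing(const std::string &extensions)
{
    return hasExtension(extensions, "cl_khr_gl_sharing")
        || hasExtension(extensions, "cl_APPLE_gl_sharing");
}
```

Matching whole tokens rather than substrings avoids false positives from extension names that merely start with the same prefix.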

Qt 5.5 on the ODROID-XU3

Qt 5.5 introduces support for the board in the device makespec linux-odroid-xu3-g++. Just pass -device odroid-xu3 to configure.

For example, to build release mode binaries with a toolchain borrowed from the Raspberry Pi, assuming a sysroot at ~/odroid/sysroot:

./configure -release -prefix /usr/local -extprefix ~/odroid/sysroot/usr/local -hostprefix ~/odroid/qt5-build -device odroid-xu3 -device-option CROSS_COMPILE=~/odroid/toolchain/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64/bin/arm-linux-gnueabihf- -sysroot ~/odroid/sysroot -nomake examples -nomake tests -opengl es2

This will configure the Qt libraries and target tools like qmlscene to be deployed under /usr/local in the sysroot, while the host tools - like the x86 build of qmake that is to be used when building applications afterwards - get installed into ~/odroid/qt5-build.

When it comes to the platform plugins, both xcb and eglfs are usable, but only one at a time: the Mali graphics driver binary is different for X11 and fbdev, and has to be switched accordingly. The Ubuntu image from Hardkernel comes with X11 in place. While OpenGL is usable under X too, the usage of eglfs and the fbdev drivers is recommended, as usual.

For more information on the intricacies and a step by step guide to deploying Qt on top of the Hardkernel image, see this wiki page. If you have a Mali-based ARM Chromebook featuring a similar CPU-GPU combo, see here.

It is worth noting that thanks to Qt's Android port, running a full Android system with Qt apps on top is also feasible on this board.

Time for some action

Now to the fun part. Below are three examples running on the framebuffer in full HD resolution with the fbdev Mali driver variant, Qt 5.5 and the eglfs platform plugin. All of them use OpenCL 1.1 and CL-GL interop, and are regular Qt Quick 2 applications. They all build on the little example framework which we call Qt Quick CL for now.

OpenGL texture to OpenGL texture via OpenCL

[Figure: An OpenCL-based alternative for ShaderEffect]

First, let's take a look at a standard image processing use case: we will execute one or more OpenCL kernels on our input, which can be a Qt Quick Image element, a (potentially invisible) sub-tree of the scene, or any texture provider, and generate a new texture. With CL-GL interop the data never leaves the GPU: no pixel data is copied between the CPU and the GPU. Those familiar with Qt Quick have likely realized already that this is in fact an OpenCL-based alternative to the built-in, GLSL-based ShaderEffect items.

By using the easy-to-use base classes to automatically and transparently manage OpenCL and CL-GL initialization, and to hide the struggles and gotchas of Qt Quick's dedicated render thread and OpenGL contexts, the meat of the above application gets reduced to something like the following:

class CLRunnable : public QQuickCLImageRunnable
{
public:
    CLRunnable(QQuickCLItem *item)
        : QQuickCLImageRunnable(item)
    {
        m_clProgram = item->buildProgramFromFile(":/kernels.cl");
        m_clKernel = clCreateKernel(m_clProgram, "Emboss", 0);
    }
    ~CLRunnable() {
        // Release the OpenCL objects created in the constructor.
        clReleaseKernel(m_clKernel);
        clReleaseProgram(m_clProgram);
    }
    void runKernel(cl_mem inImage, cl_mem outImage, const QSize &size) Q_DECL_OVERRIDE {
        clSetKernelArg(m_clKernel, 0, sizeof(cl_mem), &inImage);
        clSetKernelArg(m_clKernel, 1, sizeof(cl_mem), &outImage);
        // One work item per pixel of the image.
        const size_t workSize[] = { size_t(size.width()), size_t(size.height()) };
        clEnqueueNDRangeKernel(commandQueue(), m_clKernel, 2, 0, workSize, 0, 0, 0, 0);
    }
private:
    cl_program m_clProgram;
    cl_kernel m_clKernel;
};

class CLItem : public QQuickCLItem
{
    Q_OBJECT
    Q_PROPERTY(QQuickItem *source READ source WRITE setSource)
public:
    CLItem() : m_source(0) { }
    QQuickCLRunnable *createCL() Q_DECL_OVERRIDE { return new CLRunnable(this); }
    QQuickItem *source() const { return m_source; }
    void setSource(QQuickItem *source) { m_source = source; update(); }
private:
    QQuickItem *m_source;
};

...
qmlRegisterType<CLItem>("quickcl.qt.io", 1, 0, "CLItem");
...

import quickcl.qt.io 1.0

Item {
    Item {
        id: src
        layer.enabled: true
        ...
    }
    CLItem {
        id: clItem
        source: src
        ...
    }
}
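
The Emboss kernel itself lives in kernels.cl and is not shown in this post. As a rough idea of the per-pixel work such a kernel might do, here is a CPU-side reference in plain C++. The filter formula, the grayscale input, and the names embossPixel and emboss are assumptions made for illustration; this is not the code from the example. Each call to embossPixel corresponds to what a single work item would compute, and the two loops play the role of the two-dimensional work size passed to clEnqueueNDRangeKernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for a simple emboss filter on a grayscale image.
// Computes one output pixel, mirroring what one OpenCL work item would do.
std::uint8_t embossPixel(const std::vector<std::uint8_t> &in,
                         int width, int height, int x, int y)
{
    assert(width > 0 && height > 0);
    assert(in.size() == std::size_t(width) * std::size_t(height));
    assert(x >= 0 && x < width && y >= 0 && y < height);
    // Clamp the neighbour coordinates at the image border.
    const int x0 = std::max(x - 1, 0);
    const int y0 = std::max(y - 1, 0);
    const int x1 = std::min(x + 1, width - 1);
    const int y1 = std::min(y + 1, height - 1);
    // Lower-right minus upper-left neighbour, biased to mid-gray.
    const int value = 128 + int(in[y1 * width + x1]) - int(in[y0 * width + x0]);
    return std::uint8_t(std::min(std::max(value, 0), 255));
}

// Runs embossPixel over the whole image, one iteration per work item.
std::vector<std::uint8_t> emboss(const std::vector<std::uint8_t> &in,
                                 int width, int height)
{
    std::vector<std::uint8_t> out(in.size());
    for (int y = 0; y < height; ++y)        // second work size dimension
        for (int x = 0; x < width; ++x)     // first work size dimension
            out[y * width + x] = embossPixel(in, width, height, x, y);
    return out;
}
```

A reference like this can also be handy for checking the output of the real kernel on a small test image.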

Needless to say, the application works on a wide variety of platforms. Windows, OS X, Android, and Linux are all good as long as OpenGL (ES) 2.0, OpenCL 1.1 and CL-GL interop are available. Getting started with OpenCL in Qt Quick applications won't get simpler than this.

OpenGL texture to arbitrary data via OpenCL

[Figure: Histogram in Qt Quick directly on the GPU]

And now something more complex: an image histogram. Histograms are popular with Qt, and the recent improvements in Qt Multimedia introduce the possibility of efficiently calculating live video frame histograms on the GPU.

In this example we take it to the next level: the input is an arbitrary live sub-tree of the Qt Quick scene, while the results of the calculation are visualized with a little JavaScript and regular OpenGL-based Qt Quick elements. Those 256 bars on the right are nothing but standard Rectangle elements. The input image never leaves the GPU, naturally. All this with a few lines of C++ and QML code.
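
For readers new to histograms, the computation behind the 256 bars can be sketched on the CPU in a few lines of C++. This is only an illustrative reference under stated assumptions: tightly packed RGBA8 pixels, an integer approximation of the Rec. 601 luma weights, and a made-up function name, luminanceHistogram. The actual example does this work in an OpenCL kernel, where many work items update the bins concurrently, so a GPU version would typically need atomic increments or per-work-group partial results.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference: 256-bin luminance histogram of RGBA8 pixels.
std::array<std::uint32_t, 256> luminanceHistogram(const std::vector<std::uint8_t> &rgba)
{
    assert(rgba.size() % 4 == 0);   // four bytes per pixel
    std::array<std::uint32_t, 256> bins{};
    for (std::size_t i = 0; i < rgba.size(); i += 4) {
        // Integer approximation of 0.299 R + 0.587 G + 0.114 B; alpha is ignored.
        const unsigned luma =
            (299u * rgba[i] + 587u * rgba[i + 1] + 114u * rgba[i + 2]) / 1000u;
        ++bins[luma];
    }
    return bins;
}
```

Since the weights sum to 1000, the luma value always stays within 0..255 and indexes the bin array safely.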

OpenGL vertex buffer generation with OpenCL

[Figure: VBO generation from OpenCL on the ODROID-XU3]

Last but not least, something other than GL textures and CL image objects: buffers! The positions of the vertices, which GL draws as points, are written to the vertex buffer using OpenCL. GL then uses the data as-is: no readbacks or copies are necessary, unlike with Qt Quick's own GL-based particle systems.

To make it all more exciting, the drawing happens inside a custom QQuickItem that functions similarly to QQuickFramebufferObject. This allows us to mix our CL-generated drawing with the rest of the scene, including Qt Quick Controls when necessary.
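
To illustrate the idea of generating vertex data with one work item per vertex, here is a CPU stand-in in plain C++. Everything in it is hypothetical: the Vertex layout, the sine wave animation, and the names wavePosition and fillVertexBuffer are made up for this sketch and do not come from the example. In the real application the equivalent of wavePosition would run as an OpenCL kernel that writes straight into the shared GL vertex buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout: a 2D position in normalized device coordinates.
struct Vertex {
    float x;
    float y;
};

// Hypothetical per-vertex computation: what one work item might write for
// index i. Points are spread over [-1, 1] in x and follow a sine wave in y.
Vertex wavePosition(std::size_t i, std::size_t count, float time)
{
    assert(count > 1 && i < count);
    const float t = float(i) / float(count - 1);    // 0..1 along the strip
    Vertex v;
    v.x = 2.0f * t - 1.0f;
    v.y = 0.5f * std::sin(6.2831853f * t + time);   // one full period, shifted by time
    return v;
}

// CPU stand-in for enqueueing the kernel over buffer.size() work items.
void fillVertexBuffer(std::vector<Vertex> &buffer, float time)
{
    for (std::size_t i = 0; i < buffer.size(); ++i)
        buffer[i] = wavePosition(i, buffer.size(), time);
}
```

Calling fillVertexBuffer once per frame with an increasing time value would animate the points; with CL-GL interop the same pattern avoids any copy between the compute and draw steps.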

Looking forward: Qt Quick CL

QtQuickCL is a small research and demo framework for Qt 5 that enables easily creating Qt Quick items that execute OpenCL kernels and use OpenGL resources as their input or output. The functionality is intentionally minimal but powerful. All the CL-GL interop, including the selection of the correct CL platform and device, is taken care of by the module. The QQuickCLItem - QQuickCLRunnable split in the API ensures easy and safe CL and GL resource management even when Qt Quick's threaded render loop is in use. Additional convenience is provided for the cases when the input, output or both are OpenGL textures, like for instance the first two of the three examples shown above.

The code, including the three examples shown above, is all available on Gerrit and code.qt.io as a qt-labs repository. The goal is not to provide a full-blown OpenCL framework or wrapper, but rather to serve as a useful example and reference for integrating Qt Quick and OpenCL, and to help you get started with OpenCL development. Happy hacking!
