Optimizing Real-Time 3D for Entry Level Hardware

As more and more cars are equipped with digital instrument clusters, it is very important to consider the cost of the electronics needed to achieve the desired user experience. In volume products, it matters a lot whether the cost of a hardware component is 10, 50 or 100 USD - so it really pays off to consider the most efficient approach for the software running on the SoC. Today users expect high-quality graphics and smooth animations, and in a system like a digital instrument cluster, constantly high performance is also a matter of safety. In this post, we will explain how we reached a silky smooth and constant 60 fps rendering performance on the Renesas R-Car D3 entry-level SoC using Qt Quick and Qt 3D Studio.

Requirements for the project

We selected the Renesas R-Car D3 low-end SoC as the hardware for this project. Entry-level SoCs with similar performance are available from other manufacturers as well, and we have also completed projects in the past with NXP's entry-level SoCs. This time we wanted to use the Renesas R-Car D3, which comes with the Imagination PowerVR GE8300 entry-level GPU (https://www.imgtec.com/powervr-gpu/ge8300/) and a single ARM Cortex-A53 CPU core.

Our visual designers created a rather typical modern digital instrument cluster design as the basis of this work. The cluster UI has two different gauge modes and a real-time 3D ADAS (Advanced Driver Assistance Systems) view in the middle. The target was to achieve a solid 60 fps at 1280 x 480 resolution on the selected low-end hardware. The design concept contained both real-time 3D elements (the ADAS view) and elements that can be rendered in 2D using OpenGL shaders, with the visual design providing a seamless 3D experience (without actually being computed as real-time 3D).

demolayout

The structure of the design concept

Initial setup and findings

After completing the initial prototype using Qt 3D Studio, we deployed it to the R-Car D3 development board running Qt 5.12 to analyze performance. Even though the graphics were not very complex, this non-optimized design rendered at a mere ~10 fps. Optimizing the project assets improved the situation, but at ~20 fps we were still quite far from the 60 fps target. It was clear that we would have to do further analysis and reconsider the overall architecture of the application.

We started the analysis with PVRTrace to learn more about the D3 SoC and where its pain points are. Very soon we learned that using large FBOs (Frame Buffer Objects), i.e. off-screen render target buffers in OpenGL, basically kills performance on this hardware. We measured a 10 ms overhead from using even a single fullscreen FBO - 10 ms out of the 16.6 ms frame budget available at the targeted 60 fps.

Measured architectures

We measured a variety of different application architectures, and to establish a baseline for what can be reached with the D3 hardware, we also measured a Qt Quick + plain OpenGL ADAS view. We implemented everything possible with Qt Quick and drew the ADAS view with pure OpenGL commands. With this setup we reached 60 fps, which confirmed that our target was attainable.

We then moved on to the Qt Quick + Qt 3D Studio setup. The gauges were implemented with Qt Quick and the ADAS view in the middle as a Qt 3D Studio component. This method uses one FBO; since a single layer was used for the 3D content (single-layer pass-through), no second FBO is needed. Further RAM savings can be achieved using texture compression. This setup was also able to render at a constant 60 fps.
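As a rough sketch of this setup - a minimal example rather than the project's actual code, with the asset path, sizes and module version made up - the ADAS view can be embedded as a single Studio3D element inside an otherwise pure Qt Quick scene:

```qml
import QtQuick 2.12
import QtStudio3D 2.4

Item {
    width: 1280; height: 480

    // 2D gauges on the left and right, implemented directly in Qt Quick
    // ...

    // The ADAS view in the middle: a single Qt 3D Studio layer,
    // rendered into one FBO and composited by Qt Quick
    Studio3D {
        anchors.centerIn: parent
        width: 512; height: 480
        Presentation {
            source: "qrc:/presentations/adas.uia" // hypothetical asset path
        }
    }
}
```

Keeping the 3D content in a single layer is what avoids the second FBO mentioned above; each additional Qt 3D Studio layer would cost another render target.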

The optimal architecture

After trying different approaches it was obvious that we would achieve the best possible performance by implementing the 2D elements with Qt Quick and using the Qt 3D Studio runtime for the ADAS view. By implementing a background drawing patch to the runtime, we were able to squeeze out a few more frames per second and lower the CPU consumption.

demolayout2

Gauges implemented with Qt Quick and the 3D ADAS view with Qt 3D Studio

Normally, when you want to combine a 3D scene created with Qt 3D Studio with 2D content created with Qt Quick, it is necessary to composite the two together. In this case Qt Quick is responsible for the composition, but since Qt Quick is a 2D renderer, any 3D content embedded in the scene must be flattened. By default this is done by first rendering the 3D content to an off-screen texture, but as mentioned above, the target hardware carries a huge performance penalty for rendering to off-screen surfaces. So this default route needed to be avoided. Fortunately, Qt Quick is quite flexible and allows for some workarounds.

Our workaround was to render the Qt 3D Studio scene directly to the window at the start of the frame and then render the Qt Quick user interface over it. The element in the Qt Quick UI where the 3D scene would normally go is in this case just a transparent hole, so that the underlying 3D content already rendered to the screen shows through. When doing this, it is also important to remember not to clear the window when rendering, which can be done by setting the Window color to "transparent" in QML.
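A minimal sketch of the Qt Quick side of this idea, assuming the 3D scene has already been drawn to the window at the start of the frame (element names and sizes here are illustrative, and the runtime patch that does the background drawing is not shown):

```qml
import QtQuick 2.12
import QtQuick.Window 2.12

Window {
    width: 1280; height: 480
    visible: true
    // Do not clear to an opaque color: the 3D scene rendered at the
    // start of the frame must remain visible under the Qt Quick content.
    color: "transparent"

    // A transparent "hole" where the already-rendered 3D ADAS view
    // shows through - nothing is drawn here by Qt Quick.
    Item {
        id: adasHole
        anchors.centerIn: parent
        width: 512; height: 480
    }

    // Opaque 2D gauges drawn around (and potentially over) the hole
    // ...
}
```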

This approach does come with some design limitations, though; in our case they were not a problem and the design concept could be implemented without any changes. You can choose to render the 3D content either before or after the Qt Quick UI, meaning the 3D content will always be either below or above the 2D content. Since we chose to draw it before, and thus below, the Qt Quick UI, we could draw controls on top of the 3D view (even though we don't in our case), but we would not have been able to blend any content sitting under the 3D view.

If we had needed to blend the 3D view with 2D content under it, we could have chosen to render the 3D content after the 2D content, thus above it. What is not possible with the chosen approach, though, is compositing content both below and above the 3D content at the same time. That is only possible via an off-screen texture, so keep that in mind when designing content for such targets. It is also worth remembering that blending is resource-intensive, especially on low-end hardware where resources are limited; although we did not have this constraint in this particular project, it is worth considering early in the design phase whether blending is needed.

Final design optimizations for the 3D parts

As always with real-time 3D applications, we also looked into optimizing the graphics assets. Very often the 3D models and other assets first received from the designer are not yet optimized for the target application and hardware environment. For example, the models may have a higher level of detail than the human eye can detect from the screen. When every GPU and CPU cycle counts, it is essential to optimize the graphics assets as well. With the right tricks, the system load can be significantly reduced without any visible difference for the user.

Road mesh optimizations

We created a road mesh that uses vertex colors multiplied by the road texture. This way the road can blend into the black background without rendering transparency or using another texture to create an overlay gradient.

demovertexcolors

Vertex colors are multiplied by the road texture

We created the road movement by animating the model's texture coordinates (using an animatable material).
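As an illustration only, the two road tricks - scrolling texture coordinates for movement and darkening toward the background - can be sketched with a 2D Qt Quick ShaderEffect; the real project used an animatable material on the 3D road mesh, and the uniform and asset names below are made up:

```qml
import QtQuick 2.12

ShaderEffect {
    width: 512; height: 300

    property variant roadTexture: Image { source: "road.png" } // hypothetical asset
    property real uvOffset: 0.0

    // Scroll the texture coordinates to create the illusion of movement
    NumberAnimation on uvOffset {
        from: 0.0; to: 1.0
        duration: 1000
        loops: Animation.Infinite
    }

    fragmentShader: "
        varying highp vec2 qt_TexCoord0;
        uniform sampler2D roadTexture;
        uniform highp float uvOffset;
        uniform lowp float qt_Opacity;
        void main() {
            // Offset and wrap the v coordinate: the road 'moves'
            highp vec2 uv = vec2(qt_TexCoord0.x, fract(qt_TexCoord0.y + uvOffset));
            // Fade toward black in the distance, mimicking the
            // vertex-color multiply used on the real road mesh
            lowp float fade = qt_TexCoord0.y;
            gl_FragColor = texture2D(roadTexture, uv) * fade * qt_Opacity;
        }"
}
```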

demoroad

Animated road

For a low-cost anti-aliasing effect, the edges of the road texture were painted with the background color (black).

demoantialias

Antialiasing effect

Car mesh optimizations

The first step in optimizing a high-polygon car model with multiple textures and materials was to create a low-poly version. A few low-poly models with different polygon counts were generated in ZBrush and tested in the application to find the minimum polygon count that still looks good. The final low-poly car model was created with Autodesk Maya's Quad Draw tool.

demolowpoly

From a high-polygon car model to a low-polygon one

The final car mesh had multiple materials with detailed textures, which were baked into a single texture map. To do this, half of the car and three of the wheels were deleted: the UV texture layout was created for the left side of the car and one of the wheels, and the textures were baked based on the high-poly model. After this, the geometry was duplicated and mirrored to create the right side of the car, and the wheels were also copied. This way our model had 50 % more pixels on the geometry while maintaining the same texture size.

democar

Creating the texture map

The maximum size of the car on screen is 200 x 150 pixels, so we reduced the size of the texture to 256 x 256 pixels.

demotexture

Reducing texture size. The pixel size on the car's surface in A is similar to the pixel size in the actual render B, and the final size C still preserves enough detail.

2D component optimizations

We built the 2D UI from reusable textures to reduce memory usage and startup time. In this case we loaded ~15 000 pixels into memory and used those to render ~200 000 pixels to the screen. Thanks to this reuse, we only load about 7.5 % of the pixels we actually draw.
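The reuse works because Qt Quick caches textures per image source: referencing the same file from many items uploads the pixels to the GPU only once. A tiny sketch of the pattern (asset name made up):

```qml
import QtQuick 2.12

Row {
    spacing: 4
    // Ten tick marks drawn from one small texture: the image cache
    // loads "tick.png" once and every Image item samples that texture.
    Repeater {
        model: 10
        Image { source: "tick.png" } // hypothetical asset
    }
}
```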

demotopbar

Reusing 2D elements 

Optimized Proof of Concept Application

You can download the application source code and assets from https://git.qt.io/public-demos/qt3dstudio/

And here is a video of the application running on the Renesas R-Car D3:

[embed]https://youtu.be/bLKSXb2YNoE[/embed]


Summary

Getting the desired visual concept to run at a constant 60 fps with real-time 3D content can be a challenge, but Qt provides a great framework and tools for achieving amazing performance even on low-end hardware. To achieve maximum performance, it is important to identify which components must be real-time 3D and which can be done in 2.5D (visually 3D, but not rendered spatially). This helps to achieve a computationally optimal architecture, which is a prerequisite for getting the maximum performance. All graphical assets should be optimized carefully, especially the 3D models and textures, which can easily become more complex than the human eye can detect. When using OpenGL shaders, it is important to make sure these are suitable and optimized for the target hardware. With the architecture and assets in shape, the remaining work is to use sound Qt programming techniques - and to regularly use profiling tools to identify and analyze where possible performance glitches may be coming from.

If you are planning or working on a project where high-performance graphics on lower-end hardware are requirements, we can help. Contact us for more information about our professional services.

