QtMultimedia in action: a spectrum analyser

In the previous post, we looked at the QtMultimedia audio APIs, and discussed when you might want to use them. In this post, we'll look at a demo app which uses these APIs, and discuss how it was developed, how it makes use of QAudioInput and QAudioOutput, and some of the challenges which needed to be overcome in order to get it up and running smoothly on all of our favourite platforms.

Before describing the development, let's look at what the app does - you can see it in action in the video above. Having started it, you are given the choice of three modes: "Play generated tone", "Record and play back" and "Play WAV file". First, we play a generated tone. By default, the tone generator produces a sine wave whose frequency smoothly increases. This sine wave data is written into a fixed-size buffer allocated by the application, and pressing the play button starts streaming this data to the audio device via QAudioOutput. The widget at the top of the application shows three pieces of information:

  • The amount of data currently in the buffer, shown in blue
  • The segment of data most recently analysed to compute its frequency spectrum, shown in green
  • The raw audio waveform, shown in white and scrolling from right to left

Below that is a widget showing the frequency spectrum itself, divided into 10 bars; clicking on a bar displays the range of frequencies to which it corresponds. Finally, there is a level meter showing the RMS and peak audio levels of the most recently analysed segment of data, together with a 'peak hold' marker which shows the 'high watermark' peak level over the last few seconds. During playback, the same analyses (frequency spectrum and level) are calculated again, in real time.

In order to also exercise QAudioInput, we use "record and play back" mode, in which the sine wave generator is replaced by QAudioInput as the data source. During recording, the same real-time analyses are performed.

Finally, in "play WAV file" mode, the first few seconds of audio data are read into the buffer and can be played back - again, with analysis in real time.

A settings dialog allows the user to select the audio input and output devices which are used, to set the parameters of the tone generator, and to select the window function used by the spectrum analyser (see below).

Architecture of the demo

Interfacing with QAudioInput and QAudioOutput is the responsibility of the application's Engine class. This class takes care of the following:

  • Creation of the buffer, which is simply a QByteArray
  • Populating the buffer (in the case of "play generated tone" and "play WAV file" modes)
  • Configuring the QtMultimedia audio objects. For "record and play back", this consists of finding a format which is supported by both QAudioInput and QAudioOutput, so that no format conversion needs to be done.
  • Directing data flow between the buffer and the QtMultimedia audio objects. Recording is done in push mode, so that the Engine only needs to implement a single slot (to which QIODevice::readyRead is connected), rather than needing to implement its own QIODevice. Playback is simpler - the Engine simply needs to feed its internal buffer to the QAudioOutput, which it does by using a QBuffer instance.
  • Managing the calculations - spectrum analysis and level - which are performed during recording and playback.
  • Providing status updates, by emitting signals which are consumed by the MainWidget. These signals include state changes, record / playback position updates and the results of the spectrum analysis and level calculations.

QtMultimedia does not, at present, provide a WAV file parser, so the application rolls its own. Tone generation is taken care of by a simple function which takes as its input the parameters of the tone (start and end frequencies; amplitude) and a QAudioFormat.
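To give a feel for what such a generator involves, here is a minimal swept-sine sketch in plain C++. The function name, signature and the 16-bit mono sample format are assumptions for illustration, not the demo's actual code:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Minimal swept-sine generator: the frequency glides linearly from
// startFreq to endFreq across the buffer. A phase accumulator keeps
// the waveform continuous as the frequency changes.
std::vector<int16_t> generateSweep(double startFreq, double endFreq,
                                   double amplitude, // 0.0 .. 1.0
                                   int sampleRate, int numSamples)
{
    const double pi = 3.14159265358979323846;
    std::vector<int16_t> buffer(numSamples);
    double phase = 0.0;
    for (int i = 0; i < numSamples; ++i) {
        const double t = double(i) / numSamples;
        const double freq = startFreq + (endFreq - startFreq) * t;
        phase += 2.0 * pi * freq / sampleRate;
        buffer[i] = int16_t(amplitude * 32767.0 * std::sin(phase));
    }
    return buffer;
}
```

The accumulated phase, rather than a per-sample `sin(2*pi*f*t)` evaluation, is what avoids clicks as the frequency glides.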

Spectrum analysis is performed using a Fast Fourier Transform (FFT). The FFT takes a signal in the time domain (i.e. the audio waveform, in which time is the x-axis) and transforms it into a signal in the frequency domain. Whereas the input signal consists only of real values, each element of the output series is a complex number, and can therefore be expressed as an amplitude and a phase. If we discard the phase information, we are left with just the amplitudes of each of a discrete series of frequency values - which we can then plot as a frequency spectrum.
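To make the time-to-frequency step concrete, here is a deliberately naive O(N²) DFT in plain C++ which keeps only the amplitude of each complex bin. The demo delegates this work to FFTReal; this sketch exists only to show the maths:

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Naive DFT of a real signal: each output bin is a complex number
// whose phase we discard, keeping only the amplitude.
std::vector<double> magnitudeSpectrum(const std::vector<double>& signal)
{
    const double pi = 3.14159265358979323846;
    const std::size_t n = signal.size();
    std::vector<double> magnitudes(n / 2);
    for (std::size_t k = 0; k < n / 2; ++k) {
        std::complex<double> bin(0.0, 0.0);
        for (std::size_t i = 0; i < n; ++i) {
            const double angle = -2.0 * pi * double(k * i) / double(n);
            bin += signal[i] * std::complex<double>(std::cos(angle),
                                                    std::sin(angle));
        }
        // Normalise so a unit-amplitude sinusoid gives a bin value of 0.5
        magnitudes[k] = std::abs(bin) / double(n);
    }
    return magnitudes;
}
```

An FFT computes exactly the same result in O(N log N), which is why a library like FFTReal is used for real-time work.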

There are a large number of FFT implementations available, under various licenses. Rather than reinvent the wheel, I chose to use the FFTReal v2.00 library by Laurent de Soras. This was because it is written in C++, exposes a nice simple interface, is a fairly small and simple codebase, and is available under an LGPL license. While I haven't done any benchmarks myself, those available from the FFTW homepage show that FFTReal compares well against other implementations, and for this application, it runs fast enough on all platforms I have used (including on mobile devices).

While either recording or playback is ongoing, the frequency spectrum is calculated using a window of N samples, taken from as close as possible to the current recording / playback position. Fourier analysis is based on the assumption that the input data is representative of the complete waveform - that is to say, concatenating multiple copies of the input signal would perfectly reproduce the entire waveform. Even if the original waveform is a pure sine wave, this assumption will not hold if the length of the window is not an integer multiple of the wavelength - in this case, discontinuities will occur at each 'join' between successive windows. This phenomenon is known as spectral leakage, and can lead to artefacts in the output of the FFT. To avoid this, the input window should be pre-multiplied before being fed to the FFT by a 'window function' which damps down the values near the start and end of the window. The demo app uses the Hann window function to achieve this. The windowing can be disabled in the settings dialog - leakage artefacts should be clearly visible after selecting "none" as the window function type and then doing "play generated tone".
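A minimal sketch of the windowing step in plain C++ (the helper names are illustrative; the Hann formula itself is standard):

```cpp
#include <cmath>
#include <vector>

// Hann window coefficients for an n-sample analysis window; the
// window is zero at both ends and rises smoothly to 1 in the middle,
// damping down the discontinuity at the window's edges.
std::vector<double> hannWindow(std::size_t n)
{
    const double pi = 3.14159265358979323846;
    std::vector<double> w(n);
    for (std::size_t i = 0; i < n; ++i)
        w[i] = 0.5 * (1.0 - std::cos(2.0 * pi * double(i) / double(n - 1)));
    return w;
}

// Pre-multiply the samples by the window, in place, before the FFT
void applyWindow(std::vector<double>& samples, const std::vector<double>& w)
{
    for (std::size_t i = 0; i < samples.size() && i < w.size(); ++i)
        samples[i] *= w[i];
}
```

In practice the coefficients would be computed once and reused for every analysis window, since N is fixed.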

Level calculation is computationally simple, and therefore is performed directly inside the Engine class. Because, in contrast, the FFT calculation can take a certain amount of time, it is run in a separate thread from the rest of the application. The threading, and interaction with the FFTReal library, is abstracted away by the application's SpectrumAnalyser class. This allows the Engine to provide a window of data (in the form of a QByteArray) and its QAudioFormat, and later to receive the resulting frequency spectrum via a spectrumChanged() signal. Note that the audio data itself is shared between the two threads, because the QByteArray passed to the SpectrumAnalyser is constructed using QByteArray::fromRawData(), rather than taking a copy. No synchronisation is required however, because the Engine ensures that the corresponding part of its internal buffer is not overwritten while the spectrum analysis thread holds a pointer into it.

The output of the FFT is provided in the form of a FrequencySpectrum object, which contains both amplitude and phase information. The frequency signal contains (N/2)-2 values, the ith of which corresponds to a frequency of (i/N)*F, where F is the sampling frequency of the original audio signal. It can be seen, therefore, that frequencies of greater than F/2 are not represented in the frequency spectrum. (F/2 is known as the Nyquist frequency, and the statement that no frequency above this can be reconstructed from a sampled signal is the Nyquist-Shannon sampling theorem).
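The bin-to-frequency mapping described above can be written directly (a trivial helper, not the demo's actual code):

```cpp
#include <cmath>

// Frequency in Hz represented by bin i of an N-point transform of
// audio sampled at sampleRate Hz: (i / N) * sampleRate.
double binFrequency(int i, int n, double sampleRate)
{
    return (double(i) / double(n)) * sampleRate;
}
```

For example, with a 4096-point transform of audio sampled at 8kHz, adjacent bins are just under 2Hz apart.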

The frequency spectrum is plotted by the Spectrograph widget, which receives a FrequencySpectrum provided by the SpectrumAnalyser. This simply divides the frequency spectrum into a number of equally-sized bins, finds the maximum amplitude within each bin, and plots the result as a bar chart. Mouse events are captured, allowing the frequency range of a bar to be displayed when it is clicked. The current implementation uses 10 bins, covering a range from 0 to 2kHz - therefore well below the Nyquist frequency, even when capturing at 8kHz.
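The binning step might be sketched like this in plain C++ (names are assumptions; it simply takes the maximum amplitude within each equal-sized group of bins):

```cpp
#include <algorithm>
#include <vector>

// Reduce an amplitude spectrum to bar heights: split the series into
// equal-sized groups and keep each group's maximum. Assumes the
// number of amplitudes is a multiple of numBars.
std::vector<double> barHeights(const std::vector<double>& amplitudes,
                               int numBars)
{
    std::vector<double> bars(numBars, 0.0);
    const std::size_t perBar = amplitudes.size() / numBars;
    for (int b = 0; b < numBars; ++b) {
        const auto begin = amplitudes.begin() + b * perBar;
        bars[b] = *std::max_element(begin, begin + perBar);
    }
    return bars;
}
```

Taking the maximum rather than the mean keeps narrow spectral peaks visible even when a bar spans many bins.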

Displaying the overall amplitude is the job of the LevelMeter widget. This receives peak and RMS level values from the Engine, and plots them both in the form of a single stacked bar. Directly updating the heights of these bars using the raw peak and RMS levels resulted in a very 'jittery' display, with the bars jumping up and down very quickly. The current implementation therefore applies some smoothing to the RMS level, and allows the peak height to decay slowly following a loud series of samples, rather than dropping down immediately.
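The smoothing and decay described above might look something like the following sketch; the constants and names are assumptions, not the widget's actual values:

```cpp
#include <algorithm>
#include <cmath>

// The RMS bar follows the raw level through simple exponential
// smoothing, while the peak bar jumps up immediately but decays by a
// fixed amount per update after loud input.
struct SmoothedLevel {
    double rms = 0.0;
    double peak = 0.0;

    void update(double rawRms, double rawPeak)
    {
        const double smoothing = 0.2;       // assumed smoothing factor
        rms += smoothing * (rawRms - rms);
        const double decayPerUpdate = 0.05; // assumed decay rate
        peak = std::max(rawPeak, peak - decayPerUpdate);
    }
};
```

Calling update() once per level notification gives the bar movement described: the RMS bar lags slightly behind the signal, while the peak bar falls back gradually after a loud passage.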

Finally, the raw audio signal is plotted by the Waveform widget. At any given time, this widget displays the audio waveform over a 0.5s window. In order to achieve the effect of the waveform smoothly scrolling from right to left through the widget's viewport, its content needs to be updated frequently. The naive approach of re-plotting the entire 0.5s signal in each paintEvent predictably led to very poor performance, so the widget now maintains a series of tiles - each of which owns a QPixmap - and its paintEvent simply draws each tile at the appropriate offset. As each tile slides off the left-hand side of the widget, it is shuffled around to the rightmost end of the tile queue, and the appropriate segment of audio data is plotted onto it.
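The recycling logic can be modelled in miniature, ignoring the pixmap painting itself (illustrative only; in the widget each tile owns a QPixmap):

```cpp
#include <deque>
#include <numeric>

// Tiles are identified here by plain indices. When the view scrolls
// by one tile's width, the leftmost tile moves to the right-hand end
// of the queue and is marked as needing the next segment of audio
// data plotted onto it.
struct TileQueue {
    std::deque<int> tiles;   // leftmost tile at the front
    int needsRepaint = -1;   // tile to repaint with fresh audio data

    explicit TileQueue(int n) : tiles(n)
    {
        std::iota(tiles.begin(), tiles.end(), 0);
    }

    void scrollOneTile()
    {
        needsRepaint = tiles.front();
        tiles.pop_front();
        tiles.push_back(needsRepaint);
    }
};
```

The win is that only the recycled tile is repainted per scroll step; the others are just blitted at a new offset.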

Dynamic linkage

One final aspect of the implementation of the demo app which merits some discussion is the fact that, because it is LGPL'd, the FFTReal library is built as a dynamic library to which the demo app links. Ensuring that (a) the library is deployed along with the application, and (b) the target platform can load the DLL when the app is launched, requires a few additional .pro file directives:

  • Windows
    • Both the DLL and the executable are written to the bin/ subdirectory of the project. Because Windows searches the directory containing the executable when loading DLLs, this ensures that the DLL can be found.
  • Linux
    • Both the DLL and the executable (named spectrum.bin) are written to the bin/ subdirectory of the project. A shell script, spectrum, is then copied into the bin/ directory by app.pro:
unix:!symbian {
    copy_launch_script.target = copy_launch_script
    copy_launch_script.commands = \
        install -m 0555 spectrum.sh ../bin/spectrum
    QMAKE_EXTRA_TARGETS += copy_launch_script
    POST_TARGETDEPS += copy_launch_script
}
    • Its purpose is to add the application's directory to the LD_LIBRARY_PATH environment variable before launching the application, so that the DLL can be loaded:
#!/bin/sh
bindir=`dirname "$0"`
LD_LIBRARY_PATH="${bindir}:${LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
exec "${bindir}/spectrum.bin" ${1+"$@"}
  • Mac
    • FFTReal is built as an OS X framework by adding the following to fftreal.pro:
CONFIG += lib_bundle
    • The framework is then copied into the application bundle, and the paths embedded in the two binaries are updated using the install_name_tool command. This is done by the following code, in app.pro:

macx {
    framework_dir  = ../spectrum.app/Contents/Frameworks
    framework_name = fftreal.framework/Versions/1/fftreal
    QMAKE_POST_LINK = \
        mkdir -p $${framework_dir} && \
        rm -rf $${framework_dir}/fftreal.framework && \
        cp -R ../fftreal/fftreal.framework $${framework_dir} && \
        install_name_tool -id @executable_path/../Frameworks/$${framework_name} \
            $${framework_dir}/$${framework_name} && \
        install_name_tool -change $${framework_name} \
            @executable_path/../Frameworks/$${framework_name} \
            ../spectrum.app/Contents/MacOS/spectrum
}
  • Symbian
    • All binaries, whether applications or DLLs, go into the same target directory on Symbian, so no work is required to help the OS locate the library.

    Deploying the library to the device, however, does require a bit of effort. When qmake is run over an application's .pro file, it generates a 'make sis' target, which packages up the application into a SIS file which can be installed on the device. For this application, we need to include not only the application binary and associated resource files, but also the dynamic library. At present, the qmake generator doesn't do this for us, so we need to add the following to spectrum.pro:

bin.sources     = fftreal.dll spectrum.exe
bin.path = /sys/bin
rsc.sources = $${EPOCROOT}$$HW_ZDIR$$APP_RESOURCE_DIR/spectrum.rsc
rsc.path = $$APP_RESOURCE_DIR
mif.sources = $${EPOCROOT}$$HW_ZDIR$$APP_RESOURCE_DIR/spectrum.mif
mif.path = $$APP_RESOURCE_DIR
reg_rsc.sources = $${EPOCROOT}$$HW_ZDIR$$REG_RESOURCE_IMPORT_DIR/spectrum_reg.rsc
reg_rsc.path = $$REG_RESOURCE_IMPORT_DIR
DEPLOYMENT += bin rsc mif reg_rsc

Room for improvement

There are plenty of features which could be added to make the app more appealing, but for now, I just don't have the time :(

Here are a few:

  • Add more interesting frequency spectrum visualizations
    • One route would be to plug in an existing open-source visualizer such as AVS. Alternatively, using QtOpenGL to write some particle effects would be fun.
  • Dynamic buffer allocation
    • Instead of using a single fixed-size buffer, the Engine could dynamically allocate buffers as required, allowing arbitrary-length recording. For playback of WAV files, data could be read into memory progressively rather than all up-front, which would enable playback of complete WAV files.
  • Adaptive compensation of analysis window position
    • Whenever the spectrum analyser finishes working on its current window of samples, a new window is provided to it. As described above, during recording, the position of this window is calculated such that the analysis is performed on the most recently-received samples. How far 'out of date' these samples are is dictated by the buffering latency, which is fixed once recording begins. For playback, we already have the entire clip in memory before starting, so we are free to place the window wherever we wish when starting the next analysis run. The app at present places the start of the window at the current playback position. How far 'out of date' the results of the analysis are is therefore determined by the length of time taken to calculate the frequency spectrum. We do not know how long this may be, and it may vary while the application is running due to changing CPU load. A better approach might be to measure the time taken for each frequency spectrum calculation, and determine an offset for the next analysis window based on the preceding duration. In this way, the application could attempt to ensure that the centre of the analysis window always lies on (or at least close to) the playback position at the time the spectrograph is redrawn.
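One way to realise the adaptive placement suggested above could be sketched as follows; this is entirely hypothetical and not part of the demo:

```cpp
// Convert a measured analysis duration into an offset, in samples, to
// add to the current playback position when choosing the start of the
// next analysis window.
long long offsetSamples(long long durationMicroseconds, int sampleRate)
{
    return durationMicroseconds * sampleRate / 1000000;
}
```

A caller would time each analysis run (for instance with std::chrono::steady_clock) and pass the previous duration in; at an 8kHz sample rate, a calculation that took 250ms would push the next window's start 2000 samples ahead of the playback position.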

OK, let me have a go

If you've got this far, I'll assume that you have at least a passing interest in picking up these APIs and developing that killer audio app idea which you've been nurturing. Or at least, that you may be keen to build, run and maybe even hack and improve the spectrum analyser demo...

The code for the application can be found in the demos/spectrum directory in the Git repository. For Symbian, a SIS file containing the application is available here.

Of course, whether you build the app yourself or install the SIS file, you will first need a version of Qt which supports the relevant QtMultimedia APIs. At the time of writing, the latest Qt release is version 4.6.2. This release includes implementations of QAudioInput / QAudioOutput for the following platforms:

  • Windows (WaveOut)
  • Linux (ALSA)
  • Mac (CoreAudio)

For the Symbian platform, the implementation of the QAudio* APIs didn't quite make it into the Qt 4.6.2 release - the demo app will therefore not work with this version. The implementation is available today in the 4.7.0-beta1 release, and will also be in the 4.6.3 release.

Looking forward, while most of QtMultimedia will remain outside Qt as a separate solution package, the QAudio* APIs will remain in Qt 4.7, and are accompanied by the usual compatibility promise. The application will therefore continue to work in Qt 4.7.

Meanwhile, we're working on some other multimedia demos, so watch this space ...
