One of the new features introduced in Qt 4.6 is the QtMultimedia module. The 'big picture' view of QtMultimedia has been presented in an earlier post to this blog, and has been recently updated. Here I want to take a closer look at the low-level audio APIs in particular, to discuss the types of applications for which they may be useful.
In a following post, I'll illustrate this by describing a new demo application which has been added to Qt. To whet your appetite, here's a picture of it:
Anatomy of an audio stack
One way to explain the intention of the QtMultimedia audio APIs is to take a step back and look at what happens inside an audio playback software stack. For now, let's think about an archetypal stack, rather than the software which is running on any particular platform. While the implementations vary considerably between platforms, the concepts are broadly similar, at least for the purposes of this discussion.
When the user hits the 'play' button on a music track, the following may be among the operations which take place under the covers:
The following picture tries to show how the components which perform these operations may be related in our imaginary audio stack. The exact arrangement may vary considerably between implementations. In some cases, these differences are simply the result of differing philosophies or approaches to the design of the audio stack. In others, the configuration of audio components may be dictated by hardware constraints. For example, on many embedded devices, audio processing may be performed by a dedicated coprocessor. The physical output connection of this processor may constrain what processing can happen downstream of it - for example, if the MP3 codec runs on a processor whose output is wired directly to the DAC, then no effects can be inserted into the PCM part of the audio path.
The 'high level API' deals only with control, not data. This is to say that no buffers of audio data - be it MP3, PCM or any other format - pass between the client and the stack via this interface. Instead, the client describes the audio data which it wishes to process in the form of a descriptor such as a filename or URL. The processing itself is controlled via high-level commands such as play / record, pause, stop, and seek. On top of these commands, there may be another layer which provides features such as playlist management.
Parameters of the processing may be exposed to the client: these will almost certainly include volume / gain; more advanced parameters such as balance, equalizer, and control over audio effects may also be available. In addition, the API - or perhaps a companion API at a similar level of the stack - may allow the client control over the audio routing, by providing information about which audio input / output devices are currently available, and allowing the client to select which of them is used for a given playback / recording session.
In contrast with the above, the 'low level API' deals directly with the content of the audio stream. Buffers of audio data are exchanged between the client and the lower levels of the audio stack. The data formats which can be used at this level may vary depending on the platform: most, if not all, audio stacks will allow the client to play or record PCM streams, while support for processing streams of compressed data may or may not be provided.
Because this API is dealing directly with the data stream rather than with an abstract clip descriptor, some control commands - notably seek - do not make sense at this level. Others such as pause still do have a place: although the client is providing or consuming data via the API, it is not typically directly connected to the audio hardware itself. There must usually be a level of buffering between the two in order to ensure that, should the client temporarily stop processing (for example due to its thread yielding to, or being pre-empted by, another one), the audio hardware can continue to read or write data into memory.
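To make the buffering idea a little more concrete, here is a minimal sketch of a fixed-capacity byte FIFO sitting between the client and the hardware. The AudioFifo class and its member names are purely illustrative assumptions, not part of QtMultimedia or of any platform API, and the sketch is single-threaded for brevity; a real audio stack would need proper synchronisation between the two sides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity FIFO standing in for the buffering between
// the client and the audio hardware. Illustrative only; not thread-safe.
class AudioFifo {
public:
    explicit AudioFifo(std::size_t capacity)
        : m_data(capacity), m_head(0), m_size(0)
    {
        assert(capacity > 0);
    }

    // Client side: queue up to `count` bytes. Returns how many bytes were
    // accepted, which is fewer than `count` when the buffer is nearly full.
    std::size_t write(const std::uint8_t *src, std::size_t count)
    {
        const std::size_t accepted = std::min(count, m_data.size() - m_size);
        for (std::size_t i = 0; i < accepted; ++i)
            m_data[(m_head + m_size + i) % m_data.size()] = src[i];
        m_size += accepted;
        return accepted;
    }

    // Hardware side: drain up to `count` bytes. A return value smaller than
    // `count` means the client has fallen behind (an underrun).
    std::size_t read(std::uint8_t *dst, std::size_t count)
    {
        const std::size_t delivered = std::min(count, m_size);
        for (std::size_t i = 0; i < delivered; ++i)
            dst[i] = m_data[(m_head + i) % m_data.size()];
        m_head = (m_head + delivered) % m_data.size();
        m_size -= delivered;
        return delivered;
    }

    // Bytes currently waiting between the two sides.
    std::size_t bytesQueued() const { return m_size; }

private:
    std::vector<std::uint8_t> m_data;
    std::size_t m_head;  // index of the oldest queued byte
    std::size_t m_size;  // number of queued bytes
};
```

As long as bytesQueued() stays above zero, the hardware side can keep reading even while the client is not running; the larger the capacity, the longer the client can stall, at the cost of extra latency.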
The set of audio parameters which are available to the client may be restricted in comparison with those provided by the higher-level API - volume / gain may well be the only parameter which this interface exposes. Similarly, the client may be given less control over audio routing than the higher-level API affords. This would be the case, for example, if the low-level API represented a specific physical device, while the high-level API represented the audio subsystem as a whole.
This description of the high level API should sound familiar to those who have used Qt's Phonon API (at least if we only think about audio playback - Phonon does not support recording). The functional scope of the high level API may, however, go significantly beyond that of Phonon, as discussed previously in Justin's post.
The low level API, on the other hand, corresponds to QtMultimedia audio. Before looking at the latter in more detail, it's worth emphasising one point regarding the relationship between Phonon and QtMultimedia: current Phonon backends do not use QtMultimedia. The implementations of these two APIs are currently completely separate - at least down as far as the native API level.
Looking forward, the QtMobility project is delivering a suite of high-level multimedia APIs. These provide a similar level of abstraction to Phonon, but include features which Phonon lacks, and afford additional flexibility. For a recent update on the status and availability of these APIs, see this post.
The QtMultimedia audio APIs
So, having looked at audio APIs from an abstract standpoint, let's look at the QtMultimedia audio APIs themselves. These consist of the following four classes:
The client can control some aspects of the latency* - i.e. the amount of time between audio being sampled by the hardware and the corresponding data arriving at the application - by calling setBufferSize(). Supported buffer sizes may vary from platform to platform, but most allow sub-10ms latencies for all supported formats.
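The relationship between buffer size and latency is simple arithmetic for PCM data. The following is a rough sketch, assuming uncompressed PCM; the PcmFormat struct and both helper functions are hypothetical names of my own, and the struct does not mirror QAudioFormat exactly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a PCM format (illustrative field names).
struct PcmFormat {
    int sampleRateHz;    // e.g. 44100
    int channelCount;    // e.g. 2 for stereo
    int bytesPerSample;  // e.g. 2 for 16-bit samples
};

// Number of bytes of PCM data produced or consumed per second.
inline std::int64_t bytesPerSecond(const PcmFormat &format)
{
    assert(format.sampleRateHz > 0);
    assert(format.channelCount > 0);
    assert(format.bytesPerSample > 0);
    return static_cast<std::int64_t>(format.sampleRateHz)
         * format.channelCount * format.bytesPerSample;
}

// Buffer size in bytes holding roughly `latencyMs` of audio, rounded down
// to a whole number of frames so that no frame is split across buffers.
inline std::int64_t bufferBytesForLatency(const PcmFormat &format, int latencyMs)
{
    assert(latencyMs >= 0);
    const std::int64_t frameBytes =
        static_cast<std::int64_t>(format.channelCount) * format.bytesPerSample;
    const std::int64_t rawBytes = bytesPerSecond(format) * latencyMs / 1000;
    return rawBytes - (rawBytes % frameBytes);
}
```

For 16-bit stereo at 44.1 kHz, a 10 ms target works out to 1764 bytes. A value computed this way would be a candidate argument for setBufferSize(); whether a given platform honours it is a separate question.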
The processedUSecs() function allows the client to determine how much data has been captured by the audio device. At any given time, the difference between this and the amount of data which has been received via the QIODevice indicates the amount of latency.
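As a sketch of that calculation: convert the number of bytes received so far into a duration, and subtract it from the captured duration. The CaptureFormat struct and both function names below are hypothetical, and the code assumes uncompressed PCM; in a real application, capturedUSecs would come from processedUSecs() and receivedBytes from a running total kept by the application.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a PCM capture format (illustrative names).
struct CaptureFormat {
    int sampleRateHz;
    int channelCount;
    int bytesPerSample;
};

// Duration, in microseconds, represented by `bytes` of PCM data.
inline std::int64_t durationUSecs(const CaptureFormat &format, std::int64_t bytes)
{
    const std::int64_t bytesPerSec = static_cast<std::int64_t>(format.sampleRateHz)
                                   * format.channelCount * format.bytesPerSample;
    assert(bytesPerSec > 0);
    return bytes * 1000000 / bytesPerSec;
}

// Estimated capture latency: audio the device reports as captured, minus
// audio which has already reached the application. Clamped at zero in case
// the two counters are sampled at slightly different moments.
inline std::int64_t captureLatencyUSecs(const CaptureFormat &format,
                                        std::int64_t capturedUSecs,
                                        std::int64_t receivedBytes)
{
    const std::int64_t latency = capturedUSecs - durationUSecs(format, receivedBytes);
    return latency > 0 ? latency : 0;
}
```

For example, with 16-bit mono audio at 8 kHz, if the device reports 1,020,000 microseconds captured and the application has received 16,000 bytes (one second of audio), the estimated latency is 20 ms.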
* Note however that the other source of latency - the time taken for the audio device to prepare to capture data - is outside the control of the client. This initialization happens asynchronously following a call to start(), and its completion is indicated by a stateChanged(QAudio::State) signal.
Work in progress
It's worth saying at this point that there are low-level audio use cases which QAudioInput and QAudioOutput don't cover. Or to put it another way, some of the functionality towards the bottom part of the diagram above is not currently exposed by QtMultimedia. This missing functionality includes the following:
On some platforms - particularly in the embedded space - the concept of audio pre-emption is important. For example, on a mobile phone, music playback may need to be terminated by the system when a call is received, so that the ringtone can be played. Once the call has ended (or has been rejected by the user), music playback can be resumed; whether this happens automatically, or requires the user to resume playback, depends on the device in question.
For a QAudioOutput or QAudioInput client which is pre-empted in this way, we need (a) a way to tell the client that it has been pre-empted, and (b) a way to notify the client when the audio resources which it needs have become available once again, so that it can either auto-resume, or just re-enable the 'play' button in its UI.
Plans for adding volume control to QtMultimedia are under way; discussions around the other topics are ongoing. As always, we'd value feedback on these or any other aspects of Qt - please get involved by commenting on this post or via the #qt-labs IRC channel.
When you should go low
So, given the choice between the high-level Phonon API and the low-level QtMultimedia API, what considerations can help you decide which one to use?
Well, let's start with some easy wins - there are some use cases for which Phonon is clearly the right choice:
In these cases, there is no reason to go to the extra effort which using QAudioOutput would require - the Phonon backend is already doing the heavy lifting for you.
Another use case for which Phonon is probably the way to go is when your application needs to deal with DRM data. This is because using QAudioOutput would likely require the application to handle plaintext (i.e. decrypted) data itself, and may therefore impose some limits on how the application can be deployed.
Update 18/05/10: Phonon does not currently support playback of DRM content, and there is no plan to add this.
Conversely, if the application needs to record, rather than just play audio, then Phonon clearly isn't suitable since it doesn't support audio capture.
On the other hand, if the application has any of the following characteristics, QAudioOutput may be the better choice:
Applications which may have such requirements include:
A problem which presents itself when looking at the above list is that taking the decision to use QAudioOutput leaves a lot of work to be done in the application itself. Imagine that the reason Phonon cannot be used is that, although the application needs to stream audio via a protocol which the Phonon backend supports (say, RTSP), the stream is encoded using a proprietary codec which is not available to the Phonon backend. In this case, the application needs to manage the RTSP stream itself (either by using its own streaming engine, or perhaps by using native platform streaming APIs), decode the stream using the proprietary codec, and then pass the decoded audio data to QAudioOutput.
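The shape of that work can be sketched as three stages wired together by the application. Everything here is a hypothetical placeholder: PlaybackPipeline, its three members and runPipeline() are names invented for illustration, and the blocking loop is a simplification of what would, in a real Qt application, more likely be driven by events.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical stages which the application must supply itself.
struct PlaybackPipeline {
    // Streaming engine: fills in the next encoded packet, or returns
    // false once the stream has ended.
    std::function<bool(Bytes &)> fetchPacket;
    // Codec: turns one encoded packet into PCM data.
    std::function<Bytes(const Bytes &)> decode;
    // Output stage: hands PCM onwards, e.g. towards QAudioOutput.
    std::function<void(const Bytes &)> writePcm;
};

// Pumps packets through the pipeline until the stream ends. Returns the
// total number of PCM bytes handed to the output stage.
inline std::size_t runPipeline(const PlaybackPipeline &pipeline)
{
    assert(pipeline.fetchPacket && pipeline.decode && pipeline.writePcm);
    std::size_t pcmBytes = 0;
    Bytes packet;
    while (pipeline.fetchPacket(packet)) {
        const Bytes pcm = pipeline.decode(packet);
        pipeline.writePcm(pcm);
        pcmBytes += pcm.size();
    }
    return pcmBytes;
}
```

Only the last of the three stages is something QtMultimedia helps with; the first two are where the platform-specific or proprietary code ends up.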
The root of this problem is that the abstractions offered by both Phonon and QtMultimedia correspond to a large chunk of the audio stack, rather than allowing access to individual components. In the case of Phonon, the entire stack is lumped together and abstracted by a single API. QtMultimedia breaks this down a bit, but still groups the codecs and the final output stage (routing and the DAC) together. There aren't yet any Qt abstractions for individual components such as codecs, meaning that application developers who wish to address those components directly must do so via platform-specific APIs.
A demo is worth a thousand words
... but I've already written twice that much, so it's probably time for a break. In a following post, we'll look at putting QtMultimedia to work in that demo application.