About my Qt times, and a Qt for Python voice assistant

I spent a short time at Qt, but a fruitful one. I was lucky to work with the great Qt for Python team, who made me feel very welcome, and to have a great mentor, Cristián Maureira-Fredes, who was super supportive and a pleasure to work with.

Besides helping to spread the word about Qt for Python in the open source community by going to conferences and events, during my time at Qt I added a few features to the Qt ecosystem to make Qt for Python more visible and interesting to beginners, such as new templates and the examples tab in Qt Creator. I also wrote a translation tool to convert Qt code to Qt for Python and translate our docs, plus a tutorial and an example to present at the Qt World Summit 2019: a voice assistant built with open source technologies. This blog post explains that example and shows you a thing or two about it in case you want to experiment with it.


Project overview

The idea of this project is to be a generalist voice assistant that will answer questions for you. If you understand these guidelines roughly, you'll be able to customize this example to serve whatever simple purpose you want.

We have a graphical interface that lets us capture voice with a Speech To Text (STT) tool, which transcribes the user's voice into text. We then send this text to our NLU component, which extracts meaning from it, such as intents and entities. The processed text goes into a Dialogue Manager, which decides which response in its database is the most appropriate to give back to the user. Once it has decided, it hands the string to the Text To Speech (TTS) tool, which plays the audio back to the user, and to the GUI, which shows it on the screen. Here's what's happening in this software in a more graphic way:


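Sketched as code, that round trip is a short chain of calls. Every function body below is a placeholder standing in for the real component named in its comment, not the actual API of any of the libraries:

```python
# Hypothetical glue showing one request/response cycle of the assistant.

def speech_to_text(audio):
    # DeepSpeech would transcribe the recorded audio here.
    return "what is qt for python"

def dialogue_manager(text):
    # Rasa would extract the intent and pick a response here.
    return "Qt for Python is the official set of Python bindings for Qt."

def text_to_speech(answer):
    # Mozilla TTS would synthesize the answer here; we fake a wav file name.
    return "answer.wav"

def handle_utterance(audio):
    text = speech_to_text(audio)      # user's voice -> text
    answer = dialogue_manager(text)   # text -> best response
    wav = text_to_speech(answer)      # response -> audio for playback
    return text, answer, wav          # the GUI shows the text, a player plays the wav

print(handle_utterance(b"\x00\x01"))
```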
Specifically, we're working with Mozilla DeepSpeech as our STT tool, Mozilla TTS as our TTS tool, Rasa for NLU and dialogue management, Qt for Python as the backend and general glue that makes all these technologies interact well with each other, and QML for the front-end interface. The reason I chose the Mozilla and Rasa technologies over others is simple: both projects have very active open source communities, they're agnostic about which OS you're using, and they're malleable enough to let you customize your application without a lot of effort. Besides that, all of these technologies have Python implementations, which makes them perfect to integrate with Qt for Python. QML is here to make everything look beautiful, and it also integrates very well with Qt for Python.

I strongly recommend you create a virtual environment to reproduce this project, though you're of course free not to. You'll need a reasonably recent Python version (>= 3.5).

In the following sections we'll go through each of the elements that compose the voice assistant. If you're curious to look at the code or want to run this project, head over here or access it in your Qt for Python installation: just go into your Python directory and find the PySide2 directory.

Speech to text component

DeepSpeech offers you the opportunity to process audio in chunks, allowing you to transcribe it in real time while it's being recorded, and that's the kind of implementation we're using here. You can read more about how this works under the hood in this blog post by Reuben Morais.
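The streaming idea, feeding small chunks as they arrive instead of one big recording, looks roughly like this. `StubRecognizer` here is an invented stand-in for DeepSpeech's streaming interface, whose exact method names depend on the release you install:

```python
class StubRecognizer:
    """Placeholder for DeepSpeech's streaming interface."""

    def __init__(self):
        self.buffered = []

    def feed(self, chunk):
        # DeepSpeech accepts 16-bit PCM audio chunk by chunk.
        self.buffered.append(chunk)

    def finish(self):
        # DeepSpeech would decode here; we just report what we received.
        return "decoded {} chunks".format(len(self.buffered))

def transcribe(stream, chunk_size=320):
    # Feed the recognizer piece by piece, as if audio were still being recorded.
    rec = StubRecognizer()
    for i in range(0, len(stream), chunk_size):
        rec.feed(stream[i:i + chunk_size])
    return rec.finish()

print(transcribe(b"\x00" * 960))  # 960 bytes -> three 320-byte chunks
```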

To add DeepSpeech to your project you'll have to install it:

pip install deepspeech


You'll also need a trained model to use it. Currently I'm using v0.5.1 and you can download it here (you can also get a newer version in the DeepSpeech repository).

Unzip it inside your project folder and that's all you have to do to have DeepSpeech up and running.


Natural language processing component

For the NLP part we're using a really simple bot that answers questions related to Qt for Python. You will find it in the PySide2 code base under the examples/multimedia-extra/virtual-assistant folder once it's merged, or you can download it from here. In both places you'll find a complete version of the code, not only the bot implementation.

To install Rasa, our NLP tool:

pip install rasa


Rasa doesn't come with a model, which means you have to train one yourself, but don't worry, it's going to be fast! Just get into your qt-rasa directory and train it with:

cd qt-rasa && rasa train --augmentation 0


If you're interested in more details about how the Rasa ecosystem works, have a look at Justina Petraitytė's blog posts; you can begin with this one.
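To give a taste of what the bot is made of: Rasa 1.x reads its NLU training data from Markdown files (typically data/nlu.md), where each intent lists example sentences. The intents and examples below are invented for illustration; they are not the actual training data of the qt-rasa bot:

```md
## intent:greet
- hello
- hi there
- good morning

## intent:ask_pyside
- what is qt for python
- how do I install pyside2
- is pyside2 open source
```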

Text to speech component

Now we're going to set up the TTS tool. First you have to clone the TTS repo and check out the revision we're using, in this case commit db7f3d3. This is the command you can use to get it:

git clone https://github.com/mozilla/TTS.git && cd TTS && git checkout db7f3d3


The next step is going to take a while to finish, but don't worry, that's normal. We're building the whole environment to be able to train our own models, and depending on your Python version you'll also be compiling the dependencies.

python setup.py develop


Now we need to download the model that matches the revision we're on. You can do it here. We'll only need the "config.json" and "best_model.th.tar" files. If you're inside your TTS directory, go back one level (to the main directory of your application) and create a directory called tts_model. Put the files you just downloaded inside this directory.

In the voice assistant we're using a really straightforward implementation of TTS: the initializer plus two functions, load_tts_model(), which loads the model into our code base and allows us to use it, and tts_predict(), which actually returns a sound file with an answer to the user.
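Their interplay is as simple as it sounds. In this sketch, only the two function names come from the example; the bodies are placeholders for what the real implementation does with Mozilla TTS:

```python
def load_tts_model(config_path, model_path):
    # The real function loads Mozilla TTS with the files from tts_model/;
    # here we just bundle the paths to stand in for a loaded model.
    return {"config": config_path, "model": model_path}

def tts_predict(model, text, out_file="answer.wav"):
    # The real function synthesizes `text` and writes a wav file;
    # here we only pretend to, returning the output file name.
    assert model is not None and text
    return out_file

model = load_tts_model("tts_model/config.json", "tts_model/best_model.th.tar")
print(tts_predict(model, "Hello from the assistant"))
```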

If you think this setup is too much overhead, Mozilla has recently started working on an easier way to get TTS running on your machine. It's a work in progress, but you can already test it: just go to their GitHub page, scroll to the end, and check the new Tacotron2-iter-260K version. In the near future this will be updated to run with a simple pip install, so depending on when you're reading this you may already be able to try it out!

If you're curious about how TTS works have a look at Eren Gölge's posts!

Qt for Python

Qt for Python is here to make all of these technologies work smoothly together, as well as to give us some handy Qt functions that make our job easier. We're also using it to create a database system so we have a history of our conversations with the assistant. To get PySide2 you simply have to:

pip install pyside2


In case you're confused: PySide2 is the name of the Python module we're using here, while Qt for Python refers to the complete project, including things like Shiboken - but that's another story.
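The conversation history boils down to a single table of messages. Here's a minimal sketch of that idea using Python's built-in sqlite3 module; the real example wires this up through Qt's SQL classes instead, and the table name and columns below are invented for illustration:

```python
import sqlite3

# Hypothetical schema for the assistant's conversation history.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history ("
    "author TEXT, message TEXT, ts DATETIME DEFAULT CURRENT_TIMESTAMP)"
)

def log_message(author, message):
    # Store one line of the conversation, timestamped by the database.
    conn.execute(
        "INSERT INTO history (author, message) VALUES (?, ?)", (author, message)
    )

log_message("user", "what is qt for python")
log_message("assistant", "The official Python bindings for Qt.")

rows = conn.execute("SELECT author, message FROM history").fetchall()
print(rows)
```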


For the interface you don't really have to install anything; we're using QML, which comes with the Qt for Python package. The QML implementation is simple, and I won't go into details here because you can find an explanation of how everything works on the Qt for Python tutorials page, under the "QML, SQL and PySide2 integration" tutorial.


So after all of this installation and configuration, you just have to run:

python main.py


to load our GUI, and in a different terminal window, inside the qt-rasa directory, run:

rasa run --enable-api -p 5002 -vv


to activate Rasa. Now you have a functional voice assistant system that you can configure as you like!
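Under the hood, talking to that server is a plain HTTP request. Here's a hedged sketch: the endpoint path follows Rasa's REST input channel and the port matches the command above, but treat both the URL and the `ask_rasa` helper as assumptions of this sketch rather than the example's actual code:

```python
import json
import urllib.request

# Port from `rasa run --enable-api -p 5002`; endpoint path per Rasa's REST channel.
RASA_URL = "http://localhost:5002/webhooks/rest/webhook"

def build_payload(sender, message):
    # Shape of the JSON body Rasa's REST channel expects.
    return json.dumps({"sender": sender, "message": message}).encode("utf-8")

def ask_rasa(message, sender="assistant-gui"):
    # Requires the Rasa server started above to be running.
    req = urllib.request.Request(
        RASA_URL,
        data=build_payload(sender, message),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example usage (with the server running):
#   for reply in ask_rasa("what is qt for python"):
#       print(reply.get("text"))
```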

Here's a wrap-up for Unix users in case you got lost somewhere:

  1. Create a virtual environment using your favorite tool
  2. Using Python >= 3.5, pip install PySide2 rasa deepspeech torch, or use the requirements file
  3. Download the model from DeepSpeech's repo and unzip it: wget https://github.com/mozilla/DeepSpeech/releases/download/v0.5.1/deepspeech-0.5.1-models.tar.gz && tar xvfz deepspeech-0.5.1-models.tar.gz
  4. Download our TTS tool and check out the correct revision: git clone https://github.com/mozilla/TTS.git && cd TTS && git checkout db7f3d3
  5. python setup.py develop . This step may take a while because Mozilla TTS will install everything it needs to train a model; also, depending on your Python version, you might need to compile the dependencies.
  6. Download these two files from here: "config.json" and "best_model.th.tar". They are the configuration we're going to use for TTS and the best model available at the time this tutorial was first created. TTS is constantly improving, and you can access new, better models here
  7. If you're still in the TTS directory we just created, cd .. to the main directory. Create a directory inside it called tts_model and put the files downloaded in step 6 there
  8. cd qt-rasa && rasa train
  9. In the qt-rasa directory, run rasa run --enable-api -p 5002 -vv to start the NLP server
  10. In the main directory, run python main.py to open the GUI

TIP: There is a README file both in the examples directory inside PySide2 and in the git repository, with this same succinct step-by-step guide for Windows users.


Hopefully this blog post has helped you get a feeling for how easy it is to create complicated things using Qt for Python, and an understanding of how this voice assistant works at a high level.

Now, if you want more information about how to personalize this voice assistant to create your own, head to Rasa's documentation page and read through sections 2 to 4 here. There is extensive documentation on the website that will help you go on from there.

If you've created something cool and open source using Qt for Python and want to share it with the community, reach out to the Qt for Python team in one of our channels; we'd love to see what you've created and include it in our official docs and package!
