Imagine the scenario: it’s Monday, and you’ve just arrived at the office at 7:55 AM because you have a meeting at 8 AM. What kind of person schedules a meeting at 8 AM? Everything is a pain and a hardship. You can barely see, but your smart coffee machine is your friend. It knows who you are and what your favorite drink is; it understands you, but doesn’t judge. It can’t give you a hug, but it can quickly brew a warm, fresh cup of your preferred beverage, helping you survive that meeting and return to the land of the living.
This “problem” was given to the students attending the Distributed Software Development course, run in cooperation between Politecnico di Milano (Italy), the University of Zagreb (Croatia), and Mälardalen University (Sweden). Despite limited prior experience with microcontrollers (resource-constrained devices), AI, or Qt, they crushed the challenge and created an impressive edge AI application.

The Challenge: Build a Complete Edge AI Application on Resource‑Constrained Hardware
Students were tasked with designing the UI, developing backend logic, and implementing face and voice recognition, turning a simple coffee machine into a smart coffee machine. Instead of a powerful CPU, the student team was given an ESP32‑P4‑Function‑EV microcontroller development board with a built-in camera, microphone, and capacitive touchscreen.
The requirements were ambitious:
- Camera‑based facial recognition to identify users
- Voice command support for hands‑free interaction
- Touch‑driven UI built using Qt for MCUs
- Drink personalization based on user habits
- Customizable brewing workflows with visual feedback
- Fully offline processing, with no cloud services involved
In addition, students needed to integrate image recognition and voice models into the microcontroller environment using frameworks such as ESP‑DL and ESP-SR. The project required a blend of skills modern embedded developers need: UI/UX design, C++ development, AI model training, and the creativity to make it all work on a constrained MCU platform.
Meet the Edge AI Coffee Machine
The Edge Coffee Machine (ECM) is an AI-driven embedded application built for the ESP32-P4. Developed through a collaboration between PoliMi and MDU, it explores how Edge AI can support everyday office routines. All processing runs directly on the device, enabling fast responses and preserving user privacy through fully offline operation.
UI Highlights
- Coffee selection: A Qt-based interface offers a menu of beverages such as Espresso, Cappuccino, and Mocha. The menu adapts over time, prioritizing favorites while occasionally introducing new options.
- User switching: Profiles are loaded automatically through facial recognition, but users can also switch manually, register as a new user with a generated name, or continue as a guest.
- Brewing animation: When brewing starts, the price area becomes a wave-style progress indicator showing the drink’s status, with an option to cancel.
AI Behavior
Using the ESP-DL and ESP-SR frameworks, the system supports:
- Facial recognition: Detects users and loads their preferences automatically.
- Voice commands: Enables hands-free control through wake-word detection (“Hi ESP”) and commands like “brew coffee” or “cancel,” all processed locally.
- Adaptive recommendations: Tracks user habits over time and adjusts drink ordering based on behavior patterns such as default, conservative, or exploratory preferences.
We’re especially happy with the UI we built using Qt for MCUs. Even with the limits of a microcontroller, we managed to create a smooth interface with custom animations, like the rising waveform that shows brewing progress.
We also put a lot of effort into the recommendation system. It tracks user behavior over time using simple scoring based on exponential decay, estimating things like how often someone tries new drinks or customizes orders. Based on that, the system adjusts the drink menu to better match their habits.
Another highlight was getting multiple AI features to run directly on the device. We integrated both facial recognition (ESP-DL) and speech recognition (ESP-SR) on the ESP32-P4 and managed to run them alongside the UI without slowing the system down. Making all of that work together reliably on embedded hardware was a big step for the project.
Working with Qt for MCUs was a challenging but useful experience. The framework’s clear separation between the QML presentation layer and the C++ backend made collaboration much easier for our distributed team. It let us work in parallel: one group focused on the desktop UI simulation for quick prototyping, while another handled the AI and hardware integration on the board. This modular structure made it possible to bring all the different components together in the end.
Meet the Student Team
This year’s Qt project team brought together seven students with diverse academic backgrounds and technical strengths: Emiliano Finetti, Matteo Delton, Piervito Creanza, and Maxime Trimboli from Politecnico di Milano, and Javier Asensio Castillo, Jorge Jiménez Oropesa, and Rakocevic Balsa from Mälardalen University. So let’s hear from the students!
Why Did You Choose the Project?
We were drawn to the Edge Coffee Machine because it combines physical hardware with advanced software in a meaningful way. What really interested us was the technical challenge: running AI vision and speech recognition directly on the edge. Using Qt for MCUs gave us a solid framework to build a clean, responsive interface, and we wanted to explore how much we could achieve on a microcontroller without depending on cloud services.
How Did You Split the Work Within the Team?
With a diverse group of seven students across two countries, clear organization was vital. We divided ourselves into three specialized sub-teams to ensure parallel development:
- UI/UX Team: Focused on designing a responsive, intuitive interface using Qt, ensuring the machine’s “personal touch” felt seamless.
- Backend Team: Handled the core logic of the brewing workflows and user habit tracking.
- AI Components Team: Adapted and integrated the facial recognition and voice command components from ESP-DL and ESP-SR.
What Kind of Challenges Did You Face, and How Did You Solve Them?
Early on, we had to get up to speed with QML and Qt for MCUs, which were new to most of us. We handled that by building quick prototypes and doing a lot of collaborative pair-programming sessions to learn the tools as we went.
Later, the main challenge was integration—making the AI models, UI, and backend logic work together without noticeable lag. We also ran into the hardware limits of the ESP32-P4, which meant being careful about how we used its resources. To keep the interface responsive, we spent quite a bit of time optimizing the system, including manually assigning different AI models to specific processor cores to spread the workload. It turned into a good exercise in practical embedded systems optimization.
Inspiring the Next Generation
The Edge Coffee Machine project demonstrates what’s possible when talented students gain access to modern tools, real hardware, and industry‑relevant challenges. By combining Qt for MCUs, embedded C++, and edge‑optimized AI, the team built far more than a course assignment—they created a glimpse into the future of smart, personalized devices.
Their work highlights the growing importance of embedded intelligence and the value of hands-on learning opportunities that bridge academia and industry. We’re proud to support the next generation of developers as they explore what’s possible on the edge—and we can’t wait to see where their ideas take them next.
Stay tuned for more student stories, project showcases, and opportunities from the Qt University and Talent Network.
📎 Useful Links