
The Copilot Era: How Generative AI Is Reshaping Quality Assurance Team Roles in 2025 and Beyond 

Video Version: Maximize the potential of AI in Quality Assurance

You can also watch the video version of this whitepaper

Introduction

An Opportunity for AI in Software QA

AI is no longer some futuristic concept confined to academic labs or sci-fi films. It's now a practical, widely adopted tool integrated into everyday software development and quality assurance (QA) workflows. What began as experiments with machine learning has evolved into strategic applications of generative AI across the software lifecycle.

This whitepaper explores how GenAI is transforming QA practices, not by replacing engineers, but by augmenting their capabilities. It examines the practical applications, strategic shifts, and emerging responsibilities reshaping the QA discipline.

The central theme is clear:

AI is not the pilot—it is the copilot.

Used effectively, it accelerates development, enhances quality, and improves efficiency. Used poorly or prematurely, it introduces risk. The future belongs to those who understand how to integrate AI not as a replacement, but as a partner in engineering excellence.


This whitepaper is the result of insights shared during the panel discussion Maximize the Potential of AI in Quality Assurance and the contributions of leading QA and Test Automation experts: Peter Schneider, Principal of Product Management at Qt Group; Maaret Pyhäjärvi, Director of Consulting at CGI; and Felix Kortmann, CTO at Ignite by FORVIA HELLA. Together, they offer extensive experience and unique perspectives on modern testing strategies, automation frameworks, and the evolving role of quality assurance in software development.

Meet the contributors:

Contributor

Peter Schneider

Principal, Product Management, Qt Group

Peter is a seasoned product leader at Qt who brings hands-on experimentation to the table, from generating automated Squish test scripts and converting Figma designs to QML, to fine-tuning and deploying LLMs via Hugging Face.

Watch the keynote: Using AI Assistants for Qt UI Development

Follow Peter Schneider on LinkedIn

Contributor

Maaret Pyhäjärvi

Director, Consulting, CGI

Maaret's work spans early experimentation with machine learning to practical use of tools like GitHub Copilot and ChatGPT in real test environments.

Read the blog by Maaret: Exploratory Testing with GenAI: How AI Becomes an External Imagination in Software QA

Follow Maaret Pyhäjärvi on LinkedIn

Contributor

Felix Kortmann

CTO, Ignite by FORVIA HELLA

Felix leads a dedicated software-focused spinout in the automotive space, applying GenAI to development and QA tooling with high stakes.

Read the blog by Felix: “It’s a Copilot, Not a Pilot”: How to Use GenAI Responsibly in Software Quality Engineering

Follow Felix Kortmann on LinkedIn

Updated: Oct 8, 2025

GenAI in Development: From Suggestion to Support

AI is reshaping how software gets built, and one of the clearest changes is its handling of repetitive coding tasks. Tools like GitHub Copilot have turned the blank screen into a starting point, offering smart suggestions that help developers move faster and focus on higher-value work. 

That momentum is now reaching quality assurance.

Take unit test fixtures as an example. These are the bare-bones structures developers create before writing the actual tests. They usually take hours to set up, but with AI they can be generated in minutes. That saved time allows teams to begin testing earlier in the cycle (shift-left testing), where gaps surface quickly and can be addressed before they become expensive defects.
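As a simple illustration of the kind of scaffolding an assistant can draft, here is a minimal pytest fixture setup; the OrderService class and its fake database are invented stand-ins, and the assertion still needs human review against the real product behavior.

```python
import pytest


class OrderService:
    """Minimal stand-in for the code under test (purely illustrative)."""

    def __init__(self, db):
        self.db = db

    def cancel(self, order_id):
        order = self.db.get(order_id)
        return order is not None and order["status"] == "open"


@pytest.fixture
def fake_db():
    """In-memory substitute for the real database dependency."""
    return {42: {"id": 42, "status": "open"}}


@pytest.fixture
def order_service(fake_db):
    """Wire the service to the fake database so tests need no real infrastructure."""
    return OrderService(db=fake_db)


def test_open_order_can_be_cancelled(order_service):
    # A human still decides whether this is the right behavior to assert.
    assert order_service.cancel(42) is True
```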

Reviews are changing too. Traditionally, a code review might involve wading through dozens of changes line by line. AI can now produce summaries of code changes, spotlighting the parts most worth human attention. The result: reviewers spend less energy on scanning and more on the critical reasoning that actually improves quality.

But here’s the catch: speed is not the same as quality. AI predicts what looks right based on patterns; it doesn’t understand what is right for your product, your users, or your business rules. It cannot anticipate the subtle edge cases that make or break user experience. Those responsibilities still belong to engineers and QA leaders.

For quality leaders, this demands a mindset shift: output must always be measured against context, intent, and risk. The real question is not just “does it work?” but “does it work for the right reasons, in the right way?”

As one principle reminds us:

“We must emphasize reviewing AI-generated code holistically. Copilot isn’t a pilot,” shares Felix Kortmann, CTO at Ignite by FORVIA HELLA.

The organizations that succeed won’t be the ones that hand over the reins to AI. They’ll be the ones that know when to use it as leverage, and when to trust human judgment. The future of QA belongs to teams that stay responsible, stay curious, and keep people, not predictions, at the heart of software quality.

Where AI Supports QA Today

Generative AI is already adding practical value in QA environments. From regulated healthcare to fast-paced SaaS, these tools are quietly reshaping how testing and validation are approached. The most useful AI applications are meant to enhance core QA activities:

1. Testing Support

AI can generate test scaffolds, surface edge cases, and create simple validation logic directly from source code, requirements, or user stories. This speeds up test development and ensures broader coverage from the start.

In regulated environments, this foundational layer is a huge time-saver. While human-designed tests are still required for rigor and risk-based analysis, AI provides a reliable launchpad for test creation and iteration.
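To make that launchpad concrete, here is a minimal sketch that asks an LLM to draft pytest scaffolds from a user story. It assumes the OpenAI Python SDK with an API key in the environment; the model name, user story, and prompt wording are illustrative, and the output is only a draft for human review.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK and OPENAI_API_KEY are configured

client = OpenAI()

user_story = (
    "As a clinician, I want to export a patient report as PDF "
    "so that I can attach it to the regulatory submission."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model; the name is illustrative
    messages=[
        {"role": "system",
         "content": "You write pytest test scaffolds: fixtures, test names, and "
                    "TODO comments, but no assertions you cannot justify."},
        {"role": "user", "content": f"Draft test scaffolds for this story:\n{user_story}"},
    ],
)

# The output is a starting point: engineers still add risk-based cases and review every line.
print(response.choices[0].message.content)
```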

2. Requirements Engineering

AI is especially effective at dealing with large sets of requirements where complexity often hides defects. It can:

  • Flag contradictions or overlaps in requirements.

  • Translate natural-language requirements into structured, testable conditions.

  • Detect ambiguous or vague statements before development begins.

For products with thousands of functional and regulatory requirements, this early insight is invaluable. It enables teams to eliminate confusion and reduce downstream change requests that can derail timelines.
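To illustrate what a structured, testable condition can look like, here is a hedged sketch that restates a vague requirement (“the system should respond quickly”) as an explicit, parameterized pytest check. The endpoint URL and latency threshold are invented for illustration; in practice the threshold must come from the requirement owner, not the AI.

```python
import time

import pytest
import requests

# Original requirement: "The system should respond quickly."
# Testable restatement:  "GET /api/orders returns within 500 ms for typical payloads."
ENDPOINT = "https://staging.example.com/api/orders"  # hypothetical URL
MAX_LATENCY_SECONDS = 0.5  # threshold must be agreed with the requirement owner


@pytest.mark.parametrize("page_size", [1, 50, 200])
def test_order_listing_latency(page_size):
    start = time.monotonic()
    response = requests.get(ENDPOINT, params={"page_size": page_size}, timeout=5)
    elapsed = time.monotonic() - start

    assert response.status_code == 200
    assert elapsed <= MAX_LATENCY_SECONDS
```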

3. Code Reviews

Code reviews are essential for maintaining quality, but they remain a well-known bottleneck in modern, fast-paced development teams. GenAI can help ease this pressure by summarizing pull requests, pointing out potentially risky changes, and suggesting areas that might need extra testing. Still, AI can’t make the final call: its suggestions always require human judgment to separate the helpful from the misleading.

Some challenges, however, go far beyond what generative AI can solve. Hidden structural issues, architectural drift, and functional safety risks demand deeper inspection. This is where specialized tools make a difference. With advanced static analysis and architecture validation features offered by specialized tools (like Axivion), you can identify code smells, enforce architectural rules, and expose risks that could compromise long-term maintainability or safety.

The most effective path forward is not to choose one over the other but to combine them: AI for rapid, surface-level insights; specialized tools for deep structural analysis; and human expertise to guide decisions. Together, they enable a safer, more reliable review process.
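As a rough illustration of the “summarize and flag” step, the sketch below feeds a branch diff to an LLM for a reviewer-oriented summary. It assumes a local git checkout and the OpenAI Python SDK; the branch and model names are placeholders, and the output guides attention rather than replacing the review.

```python
import subprocess

from openai import OpenAI  # assumes OPENAI_API_KEY is configured

client = OpenAI()

# Collect the diff for the branch under review (branch names are illustrative).
diff = subprocess.run(
    ["git", "diff", "main...feature/checkout-refactor"],
    capture_output=True, text=True, check=True,
).stdout

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system",
         "content": "Summarize this diff for a code reviewer. List the riskiest "
                    "changes first and suggest areas that may need extra tests."},
        {"role": "user", "content": diff[:50_000]},  # naive truncation to respect context limits
    ],
)

# The summary directs attention; risky areas still get a line-by-line human review.
print(response.choices[0].message.content)
```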

4. Exploratory and Visual Testing

Exploratory testing often depends on experience and intuition. AI now adds a new dimension by:

  • Analyzing screenshots or session recordings.

  • Identifying broken UI flows, visual inconsistencies, or unexpected behaviors.

  • Prompting testers to investigate paths they may not have considered.

This turns AI into a creative companion, helping testers explore the unknown and catch the unexpected.
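For illustration, here is a minimal sketch of the screenshot-analysis idea, assuming a vision-capable model accessed through the OpenAI Python SDK. The file name and prompt are invented, and the output should be treated as test ideas to explore rather than confirmed defects.

```python
import base64

from openai import OpenAI  # assumes a vision-capable model and OPENAI_API_KEY

client = OpenAI()

with open("checkout_screen.png", "rb") as f:  # screenshot path is illustrative
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable chat model; the name is illustrative
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "You are assisting an exploratory tester. What looks broken, "
                     "inconsistent, or worth investigating in this screenshot?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

# Treat the answer as prompts for further investigation, not as verified findings.
print(response.choices[0].message.content)
```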

(Read more about the Modern Approach to Exploratory Testing with GenAI here.)

5. Documentation Review

In highly regulated fields, documentation is a core deliverable. Every release must be accompanied by extensive evidence of compliance, traceability, and quality assurance. Traditionally, QA teams have spent significant time compiling, formatting, and cross-checking documents for submission. This overhead often slows down release cycles and shifts focus away from higher-value activities like improving test coverage and analyzing product risks. Some teams now turn to Retrieval-Augmented Generation (RAG) to lighten this load before submission.

By combining a vector database with large language models, these systems can process hundreds of documents at once and assist teams in preparing compliance-ready submissions.

For example, RAG-based tools can:

  • Surface missing or incomplete test evidence.

  • Detect inconsistencies across requirements, test cases, and execution results.

  • Verify that formatting and citation standards meet regulatory requirements.

  • Summarize traceability links between requirements, design artifacts, and validation tests.

Instead of spending cycles on manual formatting and cross-referencing, QA engineers can focus on ensuring the correctness and completeness of their test evidence.
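To make the mechanics concrete, here is a deliberately simplified RAG sketch: document chunks are embedded, the most relevant ones are retrieved by cosine similarity (standing in for a real vector database), and an LLM answers only from that context. It assumes the OpenAI Python SDK; the model names, document snippets, and question are illustrative.

```python
import numpy as np
from openai import OpenAI  # assumes OPENAI_API_KEY; model names are illustrative

client = OpenAI()

# In practice these would be chunks pulled from a vector database, not a hard-coded list.
documents = [
    "REQ-101: The device shall log every dose adjustment with a timestamp.",
    "TC-045 covers REQ-101 and passed in release 4.2 verification.",
    "REQ-118: Exported reports shall include the software version.",
]


def embed(texts):
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in result.data])


doc_vectors = embed(documents)
question = "Which requirements lack linked test evidence?"
q_vector = embed([question])[0]

# Cosine similarity stands in for the vector database's nearest-neighbour search.
scores = doc_vectors @ q_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vector)
)
top_chunks = [documents[i] for i in scores.argsort()[::-1][:2]]
context = "\n".join(top_chunks)

answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer only from the provided context; flag gaps explicitly."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```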

(Read more about RAG-QA and how to build a Retrieval-Augmented Generation QA (RAG-QA) system here.)

Early-Stage Use Cases Worth Watching

AI is beginning to stretch beyond assistive tools and into more integrated roles within QA. Some of the promising experiments include:

  • Generating end-to-end test cases from structured inputs like Gherkin scenarios.

  • Creating object maps for UI tests directly from screenshots.

  • Mapping software architecture diagrams against known compliance patterns.

These are still exploratory. But they point to a future where QA might work by conversing with AI systems:

“Here’s what changed. What should we test? What are the likely risks?”
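As an example of the first of these experiments, the sketch below pairs an illustrative Gherkin scenario with the kind of test an assistant might derive from it. The Cart class is a tiny stand-in for a real application; genuine end-to-end tests would drive the UI or API instead.

```python
# Gherkin input (illustrative scenario, not from a real feature file):
#   Scenario: Remove last item empties the cart
#     Given a cart containing 1 item
#     When the user removes that item
#     Then the cart is empty and the total is 0

class Cart:
    """Tiny stand-in for the application under test (purely illustrative)."""

    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    def remove(self, name):
        self.items = [(n, p) for n, p in self.items if n != name]

    @property
    def total(self):
        return sum(p for _, p in self.items)


def test_remove_last_item_empties_the_cart():
    # Given a cart containing 1 item
    cart = Cart()
    cart.add("USB cable", 9.99)
    # When the user removes that item
    cart.remove("USB cable")
    # Then the cart is empty and the total is 0
    assert cart.items == []
    assert cart.total == 0
```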

The Missing Piece: Judgment, Feedback, and Trust

For all its power, GenAI still lacks one crucial element: trust metrics. Unlike traditional AI models in vision or classification, most LLMs don’t offer confidence scores. That means:

  • QA engineers currently have no reliable way to determine whether a suggestion is accurate or completely off-base.
  • There’s no system of record for accepted vs. rejected suggestions, which means no learning loop.
  • Prompting remains more of an art than a science.

This emphasizes the need for better feedback loops, validation tools, and human-in-the-loop patterns that balance speed with caution.
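One lightweight way to start closing that gap is to record every AI suggestion alongside the human verdict, building the missing system of record. Here is a minimal sketch, with the file name and fields as placeholders:

```python
import csv
import datetime
from pathlib import Path

LOG_FILE = Path("ai_suggestion_log.csv")  # placeholder location


def record_verdict(prompt, suggestion, accepted, reviewer, notes=""):
    """Append one human decision about an AI suggestion to a simple CSV log."""
    is_new = not LOG_FILE.exists()
    with LOG_FILE.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "reviewer", "accepted", "prompt", "suggestion", "notes"])
        writer.writerow([
            datetime.datetime.now(datetime.timezone.utc).isoformat(),
            reviewer,
            accepted,
            prompt,
            suggestion,
            notes,
        ])


# Example: a rejected test-case suggestion, with the reason captured for later analysis.
record_verdict(
    prompt="Generate edge cases for the export-to-PDF feature",
    suggestion="Test exporting a report with 0 pages",
    accepted=False,
    reviewer="maria",
    notes="Feature prevents empty exports at the UI layer; case not reachable.",
)
```

Even a log this simple makes it possible to review, over time, which kinds of prompts produce useful suggestions and where human correction is most often needed.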

The Future QA Role: Strategic, Cross-Functional, and AI-Literate

As these tools mature, QA professionals are evolving too. Their roles are expanding in three directions:

  • Prompt engineers, skilled at writing inputs that produce accurate, useful AI responses.

  • Test strategists, focused on risk modeling, prioritization, and system-wide insight rather than manual execution.

  • Educators and stewards, guiding their teams in how to use AI responsibly, audit outcomes, and interpret results correctly.

By automating routine tasks, AI gives QA professionals space to lead with insight and strategy.

12 Functional Testing Workflows: Which to Automate, Which to AI-Augment, and Which to Keep with a QA Engineer (Infographic)


Critical Limitations of AI in QA Workflows

AI can support many aspects of QA, but it also introduces critical risks that demand careful attention. These limitations go beyond technical issues and have direct consequences for product reliability, user trust, and team responsibility. To use AI safely and effectively in quality workflows, teams must understand where its capabilities fall short.

Key limitations include:

  • Hallucinations: AI may produce output that seems plausible but is actually incorrect or entirely fabricated, such as test cases based on false assumptions.

  • Lack of Compliance Awareness: AI cannot interpret legal, regulatory, or domain-specific requirements, which can result in gaps in coverage or risk exposure.

  • Shallow Understanding: It can follow patterns and syntax but often misses the underlying logic or business context behind code and test cases.

  • No Accountability: When failures occur in production, responsibility remains with the human team; the AI cannot be held accountable for outcomes or consequences.

How to Begin with AI in QA

You don’t need to overhaul your QA stack or seek executive approval to get started. Begin with a small, meaningful experiment:

  1. Look at your current workflow. Which tasks are repetitive or error-prone?

  2. Try a GenAI tool like ChatGPT or Copilot to assist with one of them.

  3. Reflect on the outcome. Did it save time? Improve understanding? Reduce review friction?

  4. If it worked, expand gradually from summaries to test generation to CI/CD pipeline support.

“Even summarizing a release note or writing a good test case can be made better with AI. Start there,” shares Maaret Pyhäjärvi, Director of Consulting, CGI.

Be mindful: large prompts and API usage may carry cost implications. Use these tools intentionally.

You might also check whether the QA tools you already use have started offering AI features. Many vendors are adding copilots, smart test recommendations, or auto-documentation to their platforms. If not, it may be worth considering specialized QA solutions that are built with AI at the core, providing tighter integration with test management, compliance, and release workflows within a unified platform, or a single IDE.

Final Thoughts: Adopt the Copilot Mindset

AI is not here to replace QA professionals. It’s here to support them, help them move faster, think more clearly, and focus on the work that matters most.

The real opportunity is not automation. It’s augmentation.

When we treat AI as a partner, one that suggests, but doesn’t decide, we preserve what makes QA truly valuable: human insight, ethical judgment, and the responsibility to safeguard quality.

So stop asking:
“When will AI do my job?”
Start asking:
“How can AI help me do my job better?”

That shift in mindset is what will define the next era of software quality. And the QA teams who embrace it will be the ones driving that future forward.

Strengthen Every Step of Your QA Process

From code analysis to test execution and reporting, these tools work together to help QA teams improve coverage, detect issues early, and maintain long-term software quality.