As the use of Graphics Processing Units (GPUs) becomes increasingly prevalent in domains such as artificial intelligence, machine learning, and high-performance computing, it's essential to ensure that software applications running on these devices are reliable, efficient, and free from errors. CUDA, a popular programming model for GPUs developed by NVIDIA, allows developers to harness the power of parallel processing to accelerate their applications. However, analyzing CUDA code can be challenging due to its unique characteristics, such as inherent parallelism, hardware variants (compute capabilities), and a heavy reliance on floating-point values. In this blog post, we'll explore some of the key challenges that static analysis systems face when trying to analyze CUDA code.
We have accepted these challenges and, starting with Axivion Suite 7.10, we fully support analyzing applications utilizing CUDA.
The usual benefits of using C++ templates, e.g. compile-time evaluation including type checking and constant folding, apply to CUDA code as well. Furthermore, the nature of CUDA lends itself particularly well to some of these benefits.
In consequence, typical CUDA applications make heavy use of C++ templates. This is also true for the CUDA standard library and for several supporting libraries provided by NVIDIA such as cuBLAS (for linear algebra) or cuDNN (for neural networks).
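As an illustration, consider a small kernel templated over its element type. The kernel name and setup are our own sketch, not taken from any NVIDIA library:

```cuda
#include <cuda_runtime.h>

// A kernel templated over the element type: the compiler generates one
// device function per instantiation, with all type checking and constant
// folding done at compile time. (Hypothetical sketch for illustration.)
template <typename T>
__global__ void scale(T *data, T factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main() {
    const int n = 1024;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    // This call instantiates scale<float>; a call with double* would
    // instantiate scale<double> as a separate kernel. A static analyzer
    // must resolve and analyze every instantiation actually used.
    scale<float><<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```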
However, templates are heavy on the compiler and, by extension, on static analysis systems as well. Typically, compiling templates involves analyzing dependencies between them, resolving overloads and finding appropriate instantiations. Static analyzers must take the same steps to appropriately analyze CUDA code, as the source code as written does not convey enough information to infer the desired properties.
Many advanced analysis techniques ultimately rely on propagating information through the control or data flow of an application. While this can already be complicated for sequential but branching control flow, analyzing the different possible thread interleavings is certainly even more complex and consumes more computational resources. Yet, parallelism is what CUDA is all about: offloading computation to GPUs really benefits from their high number of cores and their capabilities to parallelize.
This blows up the search space of potential program behaviors an analysis tool must traverse to uncover potentially problematic behaviors. Furthermore, new kinds of issues can arise in these systems and should be detected by static analysis systems: deadlocks, data races or other incorrect usage of shared memory.
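A minimal, hypothetical example of such a data race in shared memory might look as follows (the kernel and all names are our own sketch):

```cuda
#include <cuda_runtime.h>

// Racy block-wise sum: every thread reads and writes the same
// shared-memory slot without synchronization, so the result depends on
// the thread interleaving. This is exactly the kind of issue a static
// analyzer for CUDA should flag.
__global__ void racy_sum(const int *in, int *out, int n) {
    __shared__ int acc;
    if (threadIdx.x == 0) {
        acc = 0;
    }
    // Missing __syncthreads() here: other threads may add to acc before
    // thread 0 has initialized it.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        acc += in[i];  // Data race: unsynchronized read-modify-write.
    }
    __syncthreads();
    if (threadIdx.x == 0) {
        out[blockIdx.x] = acc;
    }
}

// A race-free variant would initialize acc, call __syncthreads() before
// the accumulation, and use atomicAdd(&acc, in[i]) for the update.
```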
Compute capabilities refer to the architecture of NVIDIA GPUs and their ability to execute CUDA kernels. Each compute capability represents a specific generation of NVIDIA GPUs, with newer generations typically offering improved performance, power efficiency, and support for more advanced features.
To support both old and new hardware at the same time, CUDA applications have to take the compute capabilities into account. This can either be done at compile time using conditional compilation and the __CUDA_ARCH__ macro, or dynamically at runtime by querying the device properties via cudaGetDeviceProperties(), as shown in the following examples:
void perform_operation(…) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 300
    // code using features available in the newer architectures
#else
    // code for older architectures' capabilities
#endif
}
void perform_operation(…) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    if (prop.major * 10 + prop.minor >= 30) {
        // code using features available in the newer architectures
    } else {
        // code for older architectures' capabilities
    }
}
Both approaches have consequences for static analysis systems. In the static case, where the compute capability is determined at compile time, analyzers must decide which branch to take when checking the code in question (or cover both branches in two separate analyses).
This is comparable but not identical to the increased complexity when considering software product lines. However, when considering product lines, it is usually sufficient to consider each individual realization separately as they never interact directly. This is not the case for CUDA, where it is perfectly possible to compile different versions for different devices and use them in the same application. Therefore, analyzers have to compile different versions of the application under analysis and analyze them simultaneously.
By contrast, the dynamic case does not increase the number of variants that have to be considered, but it does increase the complexity of the control flow that has to be analyzed.
With their roots in graphics processing, GPUs, and by extension the AI-tailored chips by NVIDIA, work on floating-point or even double-precision floating-point data first and foremost. While this is appropriate for their primary use cases in scientific computing or machine learning, it is unusual for the domains in which static analysis systems are usually applied. In fact, the well-known MISRA ruleset for safety-critical systems discourages the use of floating-point numbers and arithmetic on them to a certain degree. Thus, tools are not well tuned to the analysis of floating-point values. Furthermore, floats are harder to analyze due to their special behavior when it comes to rounding, handling of precision, their representation in memory, and various other aspects.
However, the new focus on floating-point numbers due to CUDA and various AI libraries and systems will also help improve results for other rulesets such as MISRA or CERT. At the same time, examples in these rulesets can be extended to CUDA to get a first impression of what properties can and should be reasonably checked.
As an example, CERT C (which is to some extent applicable to C++ as well) includes, among others, several rules regarding floating-point values that CUDA applications should adhere to as well, such as FLP30-C, which forbids using floating-point variables as loop counters.
In conclusion, analyzing CUDA code presents several unique challenges for static analysis. By understanding the complexities of CUDA programming and adapting our analysis techniques accordingly, we have ensured that Axivion can help develop safety-critical applications with the highest level of quality and maintainability in mind, even and especially if they contain CUDA code. As the use of GPUs continues to grow in various safety-critical domains, the importance of developing robust and reliable CUDA applications will only increase. By addressing the challenges posed by CUDA, Axivion stays ahead of these developments and provides the static analysis tools needed to ensure that software applications meet the high standards required.
Learn more about Axivion for CUDA or request a demo from one of our experts.