CUDA.jl is special in that developers may want to depend on the GPU toolchain even though users might not have a GPU. In this section, we describe two different usage scenarios and how to implement them. Key to remember is that CUDA.jl will always load, which means you need to manually check if the package is functional.
Because CUDA.jl always loads, even if the user doesn't have a GPU or CUDA, you should just depend on it like any other package (and not use, e.g., Requires.jl). This ensures that breaking changes to the GPU stack will be taken into account by the package resolver when installing your package.
If you unconditionally use the functionality from CUDA.jl, you will get a run-time error in the case the package failed to initialize. For example, on a system without CUDA:
julia> using CUDA
ERROR: UndefVarError: libcuda not defined
To avoid this, you should call
CUDA.functional() to inspect whether the package is functional and condition your use of GPU functionality on that. Let's illustrate with two scenarios, one where having a GPU is required, and one where it's optional.
If your application requires a GPU, and its functionality is not designed to work without CUDA, you should just import the necessary packages and inspect if they are functional:
true as an argument makes CUDA.jl display why initialization might have failed.
If you are developing a package, you should take care only to perform this check at run time. This ensures that your module can always be precompiled, even on a system without a GPU:
__init__() = @assert CUDA.functional(true)
This of course also implies that you should avoid any calls to the GPU stack from global scope, since the package might not be functional.
If your application does not require a GPU, and can work without the CUDA packages, there is a tradeoff. As an example, let's define a function that uploads an array to the GPU if available:
to_gpu_or_not_to_gpu(x::AbstractArray) = CuArray(x)
to_gpu_or_not_to_gpu(x::AbstractArray) = x
This works, but cannot be simply adapted to a scenario with precompilation on a system without CUDA. One option is to evaluate code at run time:
@eval to_gpu_or_not_to_gpu(x::AbstractArray) = CuArray(x)
@eval to_gpu_or_not_to_gpu(x::AbstractArray) = x
However, this causes compilation at run-time, and might negate much of the advantages that precompilation has to offer. Instead, you can use a global flag:
const use_gpu = Ref(false)
to_gpu_or_not_to_gpu(x::AbstractArray) = use_gpu ? CuArray(x) : x
use_gpu = CUDA.functional()
The disadvantage of this approach is the introduction of a type instability.