2. Prerequisites and environment setup

Before installing this software, make sure you have the prerequisites for your build environment installed. The CUDA Toolkit and a suitable GNAT Ada compiler are required to build the CUDA bindings.

Note

In a cross-compilation workflow, the version of the CUDA Toolkit on the development host must be the same as the version on the target.
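
One way to compare the two versions is to run nvcc on both machines; the target hostname and the /usr/local/cuda install path below are placeholders for your own setup:

nvcc --version
ssh <my-aarch64-linux-target> /usr/local/cuda/bin/nvcc --version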

2.1. CUDA Toolkit installation on a workstation with NVIDIA GPU

If the development machine has a CUDA-capable NVIDIA GPU, the toolkit can be installed by following NVIDIA's standard setup instructions. Start by downloading the CUDA Toolkit for your development host from https://developer.nvidia.com/cuda-downloads.

The CUDA Toolkit, and in particular ptxas, must be on your PATH. You can check this by running:

which ptxas

If this does not return anything, CUDA is either not properly installed or not on your PATH. In the latter case, add it, e.g.:

export PATH=/usr/local/cuda/bin:$PATH
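
After updating PATH, you can confirm that ptxas is found and reports the expected toolkit release:

which ptxas
ptxas --version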

2.2. CUDA Toolkit installation on a workstation without a suitable GPU

If the development host does not have a CUDA-capable GPU, if the available GPU does not match the one on the target, or if the development environment needs to be installed without root permissions, the toolkit can be installed without the video card drivers.

Download the CUDA Toolkit in runfile format for your development host from https://developer.nvidia.com/cuda-downloads.

Decide where the toolkit should be installed and expose its location through the CUDA_ROOT environment variable.

Warning

CUDA_ROOT must not point to a folder that contains a gcc or gnat installation in any of its subdirectories. By default, gcc is installed under /usr; avoid installing a custom CUDA Toolkit in the same folder.

mkdir cuda-toolkit
export CUDA_ROOT=`pwd`/cuda-toolkit

Install the toolkit with the runfile downloaded from the NVIDIA website, using the options listed below:

sh cuda_<cuda version>_linux.run --silent --toolkit --toolkitpath=$CUDA_ROOT --override --defaultroot=$CUDA_ROOT/root

Expose the CUDA libraries to the linker and the CUDA binaries to the setup script:

export LD_LIBRARY_PATH=$CUDA_ROOT/targets/<architecture>/lib:$LD_LIBRARY_PATH
export PATH=$CUDA_ROOT/bin:$PATH

Here, <architecture> is the name of the target platform's architecture, as it appears under $CUDA_ROOT/targets.
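
For example, assuming an x86_64 Linux development host, the corresponding targets subdirectory is typically named x86_64-linux:

export LD_LIBRARY_PATH=$CUDA_ROOT/targets/x86_64-linux/lib:$LD_LIBRARY_PATH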

2.3. Native compiler on x86_64 Linux

If both the development host and the target are running x86_64 Linux, the following tools are required:

  • An x86_64 Linux environment with CUDA drivers (see above)

  • An installation of GNAT Pro, version 24.0w (20230413) or later.
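
As a quick sanity check, assuming the GNAT Pro tools and the CUDA driver utilities are on your PATH, the following commands should all report version information:

gnatls -v
gprbuild --version
nvidia-smi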

2.4. Cross compilation for aarch64 Linux

If the development host is running x86_64 Linux and the target is running aarch64 Linux, the following tools are required:

  • An aarch64 Linux environment with CUDA drivers on the target.

  • An installation of GNAT Pro cross toolchain for aarch64-linux, version 24.0w (20230413) or later, on the development host.

Obtain a copy of the system libraries according to the instructions in the cross toolchain documentation and place them in a directory of your choice. Note: if you copy the folders from the target to the development host, make sure that all of the required libraries are installed on the target first.

As an example, the files can be copied from the target board as follows:

$ mkdir ./sysroot
$ mkdir ./sysroot/usr
$ scp -rp <my-aarch64-linux-target>:/usr/include ./sysroot/usr/
$ scp -rp <my-aarch64-linux-target>:/usr/lib ./sysroot/usr/
$ scp -rp <my-aarch64-linux-target>:/usr/lib64 ./sysroot/usr/
$ scp -rp <my-aarch64-linux-target>:/lib ./sysroot/
$ scp -rp <my-aarch64-linux-target>:/lib64 ./sysroot/

Obtain a copy of the CUDA libraries from the target board and place them in the targets folder of your CUDA setup:

$ scp -rp <my-aarch64-linux-target>:/usr/local/cuda/targets/aarch64-linux ./
$ sudo mv aarch64-linux <CUDA_TOOLBOX_ROOT>/targets

Here, <CUDA_TOOLBOX_ROOT> is the location of the CUDA Toolkit installation on the development host.
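
To confirm that the copy succeeded, you can list the targets folder; it should now contain an aarch64-linux entry alongside the host architecture:

$ ls <CUDA_TOOLBOX_ROOT>/targets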

Make the sysroot location visible to GNAT via the ENV_PREFIX environment variable:

$ export ENV_PREFIX=`pwd`/sysroot

Let the toolchain know that the intended compilation target is aarch64-linux:

$ export CUDA_HOST=aarch64-linux

3. GNAT-CUDA setup

After setting up the environment, you can extract the gnat-cuda package:

tar -xzf gnat-cuda-[version]-x86_64-linux-bin.tar.gz

Next, you need to know which GPU architecture you are targeting. This is typically an sm_ prefix followed by a number; for example, sm_89 is the Ada Lovelace architecture. You can find details in the GPU architecture mapping article. This parameter is passed to the setup script below.
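
If you are unsure of the compute capability of your GPU, one way to query it on a machine with a sufficiently recent driver is nvidia-smi (the compute_cap query field is only available in newer driver releases); a result such as 8.9 corresponds to sm_89:

nvidia-smi --query-gpu=compute_cap --format=csv,noheader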

In the extracted directory, generate the tool suite setup for your current installation:

cd gnat-cuda-[version]-x86_64-linux-bin/cuda
./setup.sh [-mcpu sm_<GPU architecture>] [-clean]

If the -mcpu argument is not provided, the setup script attempts to determine the compute capability automatically using the utilities in the CUDA Toolkit.
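
For example, to target an Ada Lovelace GPU explicitly:

./setup.sh -mcpu sm_89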

The optional -clean argument removes the temporary object files when the environment changes and the change cannot be detected automatically by the binding generation process. This can happen, for instance, when the compiler is upgraded, or when the same gnat-cuda source tree is used for multiple targets (e.g. a native x86_64-linux build and an aarch64-linux cross compilation) and you switch from one target to another by changing the value of the $CUDA_HOST variable.
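
For example, when switching from a native build to an aarch64-linux cross build, the setup might be rerun as follows, with whatever sm_ value matches your target GPU:

export CUDA_HOST=aarch64-linux
./setup.sh -mcpu sm_<GPU architecture> -clean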

In the same directory, execute:

source ./env.sh

You need to perform the above step every time you want to compile a CUDA application.

To check that everything is correctly installed, you can build and run one of the examples:

cd cuda/examples/0_Introduction/vectorAdd
make
./main

Note

In a cross-compilation workflow, you have to copy main to the target before executing it.
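
For example, using the same target placeholder as in the earlier steps:

scp -p ./main <my-aarch64-linux-target>:
ssh <my-aarch64-linux-target> ./main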

Note

If you switch between targets by changing the $CUDA_HOST variable, or if you upgrade the compiler, the old object files can be removed by running make clean before a new build.
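
For example:

make clean
make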

After executing the code, you should see:

CUDA kernel launch with  16 blocks of  256  threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done