PETSc execution on GPUs

Experimental html version of downloadable textbook, see
38.1 : Installation with GPUs
38.2 : Setup for GPU
38.3 : Distributed objects
38.4 : Other

38 PETSc execution on GPUs

38.1 Installation with GPUs

PETSc can be configured for CUDA with the options

--with-cuda --with-cudac=nvcc

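You can test for the presence of CUDA at compile time through the macro PETSC_HAVE_CUDA, which the PETSc headers define if the installation was configured with CUDA:

// cudainstalled.c
#ifndef PETSC_HAVE_CUDA
#error "CUDA is not installed in this version of PETSC"
#endif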
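
The check above stops compilation. Where a program should build either way and report the situation at run time, the same macro can be tested inside a function. The following is a minimal sketch, not taken from the example codes: the helper names petsc_built_with_cuda and report_cuda_support are hypothetical, and the __has_include guard (assumed to be supported by the compiler, as it is in recent GCC and Clang) is only there so that the file also compiles where no PETSc headers are on the include path.

```c
#include <stdio.h>

/* Pull in the PETSc configuration macros when the headers are available.
   __has_include is assumed to be supported by the compiler. */
#if defined(__has_include)
#  if __has_include(<petscsys.h>)
#    include <petscsys.h>
#  endif
#endif

/* Hypothetical helper: 1 if the PETSc headers report a CUDA-enabled
   installation, 0 otherwise (also when no PETSc headers were found). */
static int petsc_built_with_cuda(void) {
#if defined(PETSC_HAVE_CUDA)
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: print a one-line report and return the same flag. */
static int report_cuda_support(FILE *out) {
  int has_cuda = petsc_built_with_cuda();
  fprintf(out, "PETSc CUDA support: %s\n", has_cuda ? "yes" : "no");
  return has_cuda;
}
```

A program could call report_cuda_support(stdout) at startup and fall back to CPU object types if it returns zero.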

Some GPUs can accommodate MPI by being directly connected to the network through GPUDirect RDMA, so that MPI can operate on GPU buffers. If your MPI installation is not GPU-aware, use this runtime option:

-use_gpu_aware_mpi 0

More conveniently, add this option to your .petscrc file; see section 39.3.3.

38.2 Setup for GPU

GPUs need to be initialized. This can be done implicitly when a GPU object is created, or explicitly through PetscCUDAInitialize.

// cudamatself.c
ierr = PetscCUDAInitialize(comm,PETSC_DECIDE); CHKERRQ(ierr);
ierr = PetscCUDAInitializeCheck(); CHKERRQ(ierr);

38.3 Distributed objects

Dense matrices are created with MatCreateDenseCUDA and MatCreateSeqDenseCUDA, giving types MATMPIDENSECUDA and MATSEQDENSECUDA, or generically MATDENSECUDA. Sparse matrices on the GPU have type MATAIJCUSPARSE.


Various `array' operations, such as MatDenseCUDAGetArray and VecCUDAGetArray, give access to the data of an object on the GPU.

Make PetscMalloc allocate CUDA host memory, that is, page-locked host memory obtained through the CUDA runtime, with PetscMallocSetCUDAHost, and switch back with PetscMallocResetCUDAHost.
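
To see how a set/reset pair can redirect an allocation routine, consider the following plain C sketch. It is an illustration of the idea only, not PETSc's implementation: all names (my_malloc, allocator_set_pinned, allocator_reset, and so on) are hypothetical, and the stand-in pinned_alloc simply calls malloc where a CUDA code would call something like cudaMallocHost, so that the sketch runs without a GPU.

```c
#include <stdlib.h>

/* Allocation is dispatched through function pointers,
   so a set/reset pair can swap the underlying routines. */
typedef void *(*alloc_fn)(size_t);
typedef void (*free_fn)(void *);

static int pinned_calls = 0; /* counts calls to the stand-in allocator */

/* Stand-in for a page-locked allocator; plain malloc is used here
   so that the sketch runs without CUDA (an assumption of this example). */
static void *pinned_alloc(size_t n) { pinned_calls++; return malloc(n); }
static void pinned_free(void *p) { free(p); }

static void *default_alloc(size_t n) { return malloc(n); }
static void default_free(void *p) { free(p); }

static alloc_fn current_alloc = default_alloc;
static free_fn current_free = default_free;

/* Switch to the stand-in pinned allocator, and back to the default. */
static void allocator_set_pinned(void) {
  current_alloc = pinned_alloc;
  current_free = pinned_free;
}
static void allocator_reset(void) {
  current_alloc = default_alloc;
  current_free = default_free;
}

/* User-facing routines: they always go through the current pointers. */
static void *my_malloc(size_t n) { return current_alloc(n); }
static void my_free(void *p) { current_free(p); }
```

In this sketch a buffer should be released while the allocator that produced it is still active, since my_free dispatches to whatever routine is current at the time of the call.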

38.4 Other

The memories of a CPU and GPU are not coherent. This means that memory from routines such as PetscMalloc1 can not immediately be used for GPU objects. Use the routines PetscMallocSetCUDAHost and PetscMallocResetCUDAHost to make PetscMalloc allocate CUDA host memory (page-locked host memory obtained through the CUDA runtime) and to switch back to the regular allocator.

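Mat cuda_matrix;
PetscScalar *matdata;
// allocate the matrix data through the CUDA host allocator, then switch back
ierr = PetscMallocSetCUDAHost(); CHKERRQ(ierr);
ierr = PetscMalloc1(global_size*global_size,&matdata); CHKERRQ(ierr);
ierr = PetscMallocResetCUDAHost(); CHKERRQ(ierr);
// the argument list was lost in this copy; the arguments up to matdata are an
// assumption based on the MatCreateDenseCUDA signature, for a single process
ierr = MatCreateDenseCUDA
  (comm,
   global_size,global_size,global_size,global_size,
   matdata,
   &cuda_matrix); CHKERRQ(ierr);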
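
PETSc stores dense matrices by columns, so if an array such as matdata is filled by hand, its entries have to follow that layout. The following is a minimal sketch of the index computation; the helper names dense_index and fill_test_matrix are hypothetical, and double and size_t stand in for PetscScalar and PetscInt so that the code compiles without PETSc.

```c
#include <stddef.h>

/* Offset of entry (i,j) in a column-major array with leading dimension lda. */
static size_t dense_index(size_t i, size_t j, size_t lda) {
  return i + j * lda;
}

/* Fill an n-by-n column-major array so that entry (i,j) holds 10*i + j,
   which makes the row and column of every stored value easy to read off. */
static void fill_test_matrix(double *a, size_t n) {
  for (size_t j = 0; j < n; j++)
    for (size_t i = 0; i < n; i++)
      a[dense_index(i, j, n)] = 10.0 * (double)i + (double)j;
}
```

With n equal to 3, entry (1,2) lands at offset 7 of the array, whereas a row-major layout would place it at offset 5.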
