PETSc execution on GPUs

Experimental html version of downloadable textbook, see http://www.tacc.utexas.edu/~eijkhout/istc/istc.html
37.1 : Installation with GPUs
37.2 : Setup for GPU
37.3 : Distributed objects
37.4 : Other

37 PETSc execution on GPUs

37.1 Installation with GPUs

To enable CUDA support, configure PETSc with the options

--with-cuda --with-cudac=nvcc

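You can test at compile time whether your PETSc installation has CUDA support; the macro PETSC_HAVE_CUDA is defined by the PETSc headers:

// cudainstalled.c
#include <petsc.h>
#ifndef PETSC_HAVE_CUDA
#error "CUDA is not installed in this version of PETSc"
#endif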
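
The compile-time test above stops the build when CUDA is missing. As a variation, the following sketch turns the same macro into a runtime report, so that a program can fall back to a CPU code path. The helper names petsc_built_with_cuda and report_cuda_support are hypothetical, and the use of __has_include (a C23 feature, also offered as an extension by GCC and Clang) is an assumption that lets the file compile even where PETSc is not on the include path.

```c
#include <stdio.h>

/* Pull in PETSc's configuration macros only if the header can be found.
   Assumption: the compiler supports __has_include. */
#if defined(__has_include)
#  if __has_include(<petscconf.h>)
#    include <petscconf.h>
#  endif
#endif

/* Hypothetical helper: 1 if this translation unit saw PETSC_HAVE_CUDA, else 0. */
int petsc_built_with_cuda(void) {
#if defined(PETSC_HAVE_CUDA)
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: print a one-line report and return the same flag. */
int report_cuda_support(FILE *out) {
  int has_cuda = petsc_built_with_cuda();
  fprintf(out, "PETSc CUDA support: %s\n", has_cuda ? "yes" : "no");
  return has_cuda;
}
```

A program could call report_cuda_support(stderr) at startup and choose GPU or CPU object types based on the result.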

Some GPUs can accommodate MPI by being directly connected to the network through GPUDirect RDMA. If your installation does not support this, turn off GPU-aware MPI with the runtime option:

-use_gpu_aware_mpi 0

37.2 Setup for GPU

GPUs need to be initialized. This can be done implicitly when a GPU object is created, or explicitly through PetscCUDAInitialize. The second argument selects the device; with PETSC_DECIDE the choice is left to PETSc.

// cudamatself.c
ierr = PetscCUDAInitialize(comm,PETSC_DECIDE); CHKERRQ(ierr);
ierr = PetscCUDAInitializeCheck(); CHKERRQ(ierr);

37.3 Distributed objects

Dense matrices are created with MatCreateDenseCUDA and MatCreateSeqDenseCUDA, giving types MATDENSECUDA and MATMPIDENSECUDA. The sparse GPU matrix type is MATAIJCUSPARSE.

Vectors are created with VecCreateSeqCUDA or VecCreateMPICUDAWithArray; the corresponding types are VECCUDA, VECSEQCUDA, and VECMPICUDA.

There are various `array' operations that give access to the device data, such as MatDenseCUDAGetArray and VecCUDAGetArray.

To make PetscMalloc allocate CUDA host memory, call PetscMallocSetCUDAHost; switch back with PetscMallocResetCUDAHost.

37.4 Other

The memories of a CPU and a GPU are not coherent. This means that memory obtained from routines such as PetscMalloc1 cannot immediately be used for GPU data. Use the routines PetscMallocSetCUDAHost and PetscMallocResetCUDAHost to switch the allocator to CUDA host memory (allocated with cudaHostAlloc) and back.

Mat cuda_matrix;
PetscScalar *matdata;
// switch PetscMalloc to CUDA host memory, allocate, and switch back
ierr = PetscMallocSetCUDAHost(); CHKERRQ(ierr);
ierr = PetscMalloc1(global_size*global_size,&matdata); CHKERRQ(ierr);
ierr = PetscMallocResetCUDAHost(); CHKERRQ(ierr);
// create the dense matrix on the user-supplied storage
ierr = MatCreateDenseCUDA
  (comm,
   global_size,global_size,global_size,global_size,
   matdata,
   &cuda_matrix); CHKERRQ(ierr);
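
The set / allocate / reset pattern above can be pictured as a global allocator that is temporarily swapped out. The following is a toy model in plain C, not PETSc's implementation: all names (toy_malloc, toy_malloc_set_pinned, and so on) are hypothetical, and the `pinned' allocator is only a stand-in that calls malloc, so that the sketch runs without CUDA.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator signature; not part of the PETSc API. */
typedef void *(*alloc_fn)(size_t nbytes);

/* Counts how many allocations were served in "pinned" mode. */
static int pinned_calls = 0;

/* Default mode: ordinary host memory. */
static void *default_alloc(size_t nbytes) { return malloc(nbytes); }

/* Stand-in for a pinned-memory allocator. A real one would call
   cudaHostAlloc; malloc is used here only to keep the sketch portable. */
static void *pinned_alloc(size_t nbytes) {
  pinned_calls++;
  return malloc(nbytes);
}

/* The allocator that toy_malloc currently dispatches to. */
static alloc_fn current_alloc = default_alloc;

/* Switch the global allocator to pinned mode and back. */
void toy_malloc_set_pinned(void)   { current_alloc = pinned_alloc; }
void toy_malloc_reset_pinned(void) { current_alloc = default_alloc; }

/* All allocations go through the currently selected allocator. */
void *toy_malloc(size_t nbytes) { return current_alloc(nbytes); }

int toy_pinned_call_count(void) { return pinned_calls; }
```

Only allocations made between the set and reset calls are served by the alternate allocator, which is why the reset call in the listing above directly follows the PetscMalloc1 call. In this toy model both modes use malloc, so plain free releases either kind of block.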
