PETSc objects

Experimental html version of downloadable textbook, see
\[ \newcommand\inv{^{-1}}\newcommand\invt{^{-t}} \newcommand\bbP{\mathbb{P}} \newcommand\bbR{\mathbb{R}} \newcommand\defined{ \mathrel{\lower 5pt \hbox{${\equiv\atop\mathrm{\scriptstyle D}}$}}} \] 32.1 : Distributed objects
32.1.1 : Support for distributions
32.2 : Scalars
32.2.1 : Integers
32.2.2 : Complex
32.2.3 : MPI Scalars
32.2.4 : Booleans
32.3 : Vec: Vectors
32.3.1 : Vector construction
32.3.2 : Vector layout
32.3.3 : Vector operations : Split collectives
32.3.4 : Vector elements : Explicit element access
32.3.5 : File I/O
32.4 : Mat: Matrices
32.4.1 : Matrix creation
32.4.2 : Nonzero structure
32.4.3 : Matrix elements : Element access
32.4.4 : Matrix operations : Matrix-vector operations : Matrix-matrix operations
32.4.5 : Submatrices
32.4.6 : Shell matrices : Shell operations : Shell context
32.4.7 : Multi-component matrices
32.4.8 : Fourier transform
32.5 : Index sets and Vector Scatters
32.5.1 : IS: index sets
32.5.2 : VecScatter: all-to-all operations : More VecScatter modes
32.6 : AO: Application Orderings
32.7 : Partitionings
Back to Table of Contents

32 PETSc objects

32.1 Distributed objects

crumb trail: > petsc-objects > Distributed objects

PETSc is based on the SPMD model, and all its objects act like they exist in parallel, spread out over all the processes. Therefore, prior to discussing specific objects in detail, we briefly discuss how PETSc treats distributed objects.

For a matrix or vector you need to specify the size. This can be done two ways:

  • you specify the global size and PETSc distributes the object over the processes, or
  • you specify on each process the local size

If you specify both the global size and the local sizes, PETSc will check for consistency.

For example, if you have a vector of $N$ components, or a matrix of $N$ rows, and you have $P$ processes, each process will receive $N/P$ components or rows if $P$ divides evenly in~$N$. If $P$ does not divide evenly, the excess is spread over the processes.

The way the distribution is done is by contiguous blocks: with 10 processes and 1000 components in a vector, process 0 gets the range $0\cdots99$, process 1 gets $1\cdots199$, et cetera. This simple scheme suffices for many cases, but PETSc has facilities for more sophisticated load balancing.

32.1.1 Support for distributions

crumb trail: > petsc-objects > Distributed objects > Support for distributions

Once an object has been created and distributed, you do not need to remember the size or the distribution yourself: you can query these with calls such as \clstinline{VecGetSize}, \clstinline{VecGetLocalSize}.

The corresponding matrix routines \clstinline{MatGetSize}, \clstinline{MatGetLocalSize} give both information for the distributions in $i$ and~$j$ direction, which can be independent. Since a matrix is distributed by rows, \clstinline{MatGetOwnershipRange} only gives a row range.

// split.c
N = 100; n = PETSC_DECIDE;
PetscPrintf(comm,"Global %d, local %d\n",N,n);

N = PETSC_DECIDE; n = 10;
PetscPrintf(comm,"Global %d, local %d\n",N,n);

While PETSc objects are implemented using local memory on each process, conceptually they act like global objects, with a global indexing scheme. Thus, each process can query which elements out of the global object are stored locally. For vectors, the relevant routine is \clstinline{VecGetOwnershipRange}, which returns two parameters, \clstinline{low} and~\clstinline{high}, respectively the first element index stored, and one-more-than-the-last index stored.

This gives the idiom:

for (int myidx=low; myidx<high; myidx++)
  // do something at index myidx

These conversions between local and global size can also be done explicitly, using the


#include "petscsys.h"
PetscErrorCode  PetscSplitOwnership
   (MPI_Comm comm,PetscInt *n,PetscInt *N)

Collective (if n or N is PETSC_DECIDE)

Input Parameters
comm - MPI communicator that shares the object being divided
n - local length (or PETSC_DECIDE to have it set)
N - global length (or PETSC_DECIDE)
PetscSplitOwnership routine. This routine takes two parameter, for the local and global size, and whichever one is initialized to PETSC_DECIDE gets computed from the other.

32.2 Scalars

crumb trail: > petsc-objects > Scalars

Unlike programming languages that explicitly distinguish between single and double precision numbers, PETSc has only a single scalar type: PetscScalar . The precision of this is determined at installation time. In fact, a \clstinline{PetscScalar} can even be a complex number if the installation specified that the scalar type is complex.

Even in applications that use complex numbers there can be quantities that are real: for instance, the norm of a complex vector is a real number. For that reason, PETSc also has the type PetscReal . There is also an explicit PetscComplex .

Furthermore, there is

#define PETSC_BINARY_INT_SIZE    (32/8)
#define PETSC_BINARY_CHAR_SIZE   (8/8)
#define PETSC_BINARY_SCALAR_SIZE sizeof(PetscScalar)

32.2.1 Integers

crumb trail: > petsc-objects > Scalars > Integers

Integers in PETSc are likewise of a size determined at installation time: PetscInt can be 32 or 64 bits. Furthermore, there is a PetscErrorCode type for catching the return code of PETSc routines; see section  38.1.2 .

For compatibility with other packages there are two more integer types:

  • PetscBLASInt is the integer type used by the BLAS  / LAPACK library. This is 32-bits if the -download-blas-lapack option is used, but it can be 64-bit if MKL is used. The routine PetscBLASIntCast casts a PetscInt to PetscBLASInt , or returns PETSC_ERR_ARG_OUTOFRANGE if it is too large.
  • PetscMPIInt is the integer type of the MPI library, which is always 32-bits. The routine PetscMPIIntCast casts a PetscInt to PetscMPIInt , or returns PETSC_ERR_ARG_OUTOFRANGE if it is too large.

Many external packages do not support 64-bit integers.

32.2.2 Complex

crumb trail: > petsc-objects > Scalars > Complex

Numbers of type PetscComplex have a precision matching PetscReal .

Form a complex number using PETSC_i :

PetscComplex x = 1.0 + 2.0 * PETSC_i;

The real and imaginary part can be extract with the functions PetscRealPart and PetscImaginaryPart which return a PetscReal .

There are also routines VecRealPart and VecImaginaryPart that replace a vector with its real or imaginary part respectively. Likewise MatRealPart and MatImaginaryPart .

32.2.3 MPI Scalars

crumb trail: > petsc-objects > Scalars > MPI Scalars

For MPI calls, MPIU_REAL is the MPI type corresponding to the current PetscReal .

For MPI calls, MPIU_SCALAR is the MPI type corresponding to the current PetscScalar .

For MPI calls, MPIU_COMPLEX is the MPI type corresponding to the current PetscComplex .

32.2.4 Booleans

crumb trail: > petsc-objects > Scalars > Booleans

There is a PetscBool datatype with values PETSC_TRUE and PETSC_FALSE .

32.3 Vec: Vectors

crumb trail: > petsc-objects > Vec: Vectors

Vectors are objects with a linear index. The elements of a vector are floating point numbers or complex numbers (see section  32.2 ), but not integers: for that see section  32.5.1 .

32.3.1 Vector construction

crumb trail: > petsc-objects > Vec: Vectors > Vector construction

Constructing a vector takes a number of steps. First of all, the vector object needs to be created on a communicator with

PetscErrorCode VecCreate(MPI_Comm comm,Vec *v);

VecCreate( comm,v,ierr )
MPI_Comm :: comm
Vec      :: v
PetscErrorCode :: ierr

vec = PETSc.Vec()
# or:
vec = PETSc.Vec().create()


Python note

In python, \plstinline{PETSc.Vec()} creates an object with null handle, so a subsequent \plstinline{create()} call is needed. In C and Fortran, the vector type is a keyword; in Python it is a member of PETSc.Vec.Type .

The corresponding routine

#include "petscvec.h"
PetscErrorCode  VecDestroy(Vec *v)

Collective on Vec

Input Parameters:
v -the vector
VecDestroy deallocates data and zeros the pointer.

The vector type needs to be set with

#include "petscvec.h"
PetscErrorCode VecSetType(Vec vec, VecType method)

Collective on Vec

Input Parameters:
vec- The vector object
method- The name of the vector type

Options Database Key
-vec_type  -Sets the vector type; use -help for a list of available types
VecSetType .

The most common vector types are:

  • VECSEQ for sequential vectors, that is, living on a single process; This is typically created on the MPI_COMM_SELF or PETSC_COMM_SELF communicator.
  • VECMPI for a vector distributed over the communicator. This is typically created on the MPI_COMM_WORLD or PETSC_COMM_WORLD communicator, or one derived from it.

Once you have created one vector, you can make more like it by VecDuplicate ,

VecDuplicate(Vec old,Vec *new);

or VecDuplicateVecs

VecDuplicateVecs(Vec old,PetscInt n,Vec **new);

for multiple vectors. For the latter, there is a joint destroy call VecDestroyVecs :

VecDestroyVecs(PetscInt n,Vec **vecs);

(which is different in Fortran).

32.3.2 Vector layout

crumb trail: > petsc-objects > Vec: Vectors > Vector layout

Next in the creation process the vector size is set with

#include "petscvec.h"
PetscErrorCode  VecSetSizes(Vec v, PetscInt n, PetscInt N)
Collective on Vec

Input Parameters
v :the vector
n : the local size (or PETSC_DECIDE to have it set)
N : the global size (or PETSC_DECIDE)

PETSc.Vec.setSizes(self, size, bsize=None)
size is a tuple of local/global
VecSetSizes . Since a vector is typically distributed, this involves the global size and the sizes on the processors. Setting both is redundant, so it is possible to specify one and let the other be computed by the library. This is indicated by setting it to PETSC_DECIDE (\plstinline{PETSc.DECIDE} in python).

The size is queried with

VecGetSize / VecGetLocalSize

#include "petscvec.h"
PetscErrorCode  VecGetSize(Vec x,PetscInt *gsize)
PetscErrorCode  VecGetLocalSize(Vec x,PetscInt *lsize)

Input Parameter
x -the vector

Output Parameters
gsize - the global length of the vector
lsize - the local length of the vector

VecGetSize for the global size and

Each processor gets a contiguous part of the vector. Use

#include "petscvec.h"
PetscErrorCode  VecGetOwnershipRange(Vec x,PetscInt *low,PetscInt *high)

Input parameter:
x - the vector

Output parameters:
low  - the first local element, pass in NULL if not interested
high - one more than the last local element, pass in NULL if not interested

Fortran note:
VecGetOwnershipRange to query the first index on this process, and the first one of the next process.

In general it is best to let PETSc take care of memory management of matrix and vector objects, including allocating and freeing the memory. However, in cases where PETSc interfaces to other applications it maybe desirable to create a \clstinline{Vec} object from an already allocated array: VecCreateSeqWithArray and VecCreateMPIWithArray .

   (MPI_Comm comm,PetscInt bs,
    PetscInt n,PetscScalar *array,Vec *V);
   (MPI_Comm comm,PetscInt bs,
    PetscInt n,PetscInt N,PetscScalar *array,Vec *vv);

As you will see in section  32.4.1 , you can also create vectors based on the layout of a matrix, using MatCreateVecs .

32.3.3 Vector operations

crumb trail: > petsc-objects > Vec: Vectors > Vector operations

There are many routines operating on vectors that you need to write scientific applications. Examples are: norms, vector addition (including BLAS -type `AXPY' routines), pointwise scaling, inner products. A large number of such operations are available in PETSc through single function calls to {VecXYZ} routines.

For debugging purpoases, the

#include "petscvec.h"
PetscErrorCode  VecView(Vec vec,PetscViewer viewer)

for ascii output use:

PETSc.Vec.view(self, Viewer viewer=None)

ascii output is default or use:
PETSc.Viewer.STDOUT(type cls, comm=None)
VecView routine can be used to display vectors on screen as ascii output,
// fftsine.c
ierr = VecView(signal,PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);
ierr = MatMult(transform,signal,frequencies); CHKERRQ(ierr);
ierr = VecView(frequencies,PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);
but the routine call also use more general \clstinline{PetscViewer} objects, for instance to dump a vector to file.

Here are a couple of representative vector routines:

PetscReal lambda;
ierr = VecNorm(y,NORM_2,&lambda); CHKERRQ(ierr);
ierr = VecScale(y,1./lambda); CHKERRQ(ierr);


Create a vector where the values are a single sine wave. using VecGetSize , VecGetLocalSize , VecGetOwnershipRange . Quick visual inspection:

ibrun vec -n 12 -vec_view



Use the routines

#include "petscvec.h"
PetscErrorCode VecDot(Vec x,Vec y,PetscScalar *val)

Collective on Vec

Input Parameters:
x, y - the vectors

Output Parameter:
val - the dot product
VecDot ,
#include "petscvec.h"
PetscErrorCode VecScale(Vec x, PetscScalar alpha)

Not collective on Vec

Input Parameters:
x     - the vector
alpha - the scalar

Output Parameter:
x - the scaled vector
VecScale and
#include "petscvec.h"
PetscErrorCode  VecNorm(Vec x,NormType type,PetscReal *val)
where type is

PETSc.Vec.norm(self, norm_type=None)

where norm is variable in PETSc.NormType:
    N1, N2, FRB, INF
VecNorm to compute the inner product of vectors x,y , scale the vector  x , and check its norm: \[ \begin{array}{l} p \leftarrow x^ty\\ x \leftarrow x/p\\ n \leftarrow \|x\|_2\\ \end{array} \] Split collectives

crumb trail: > petsc-objects > Vec: Vectors > Vector operations > Split collectives

MPI is capable (in principle) of `overlapping computation and communication', or latency hiding . PETSc supports this by splitting norms and inner products into two phases.

  • Start inner product / norm with VecDotBegin  / VecNormBegin ;
  • Conclude inner product / norm with VecDotEnd  / VecNormEnd ;

Even if you achieve no overlap, it is possible to use these calls to combine a number of `collectives': do the Begin calls of one inner product and one norm; then do (in the same sequence) the End calls. This means that only a single reduction is performed on a two-word package, rather than two separate reductions on a single word.

32.3.4 Vector elements

crumb trail: > petsc-objects > Vec: Vectors > Vector elements

Setting elements of a traditional array is simple. Setting elements of a distributed array is harder. First of all, VecSet sets the vector to a constant value:

ierr = VecSet(x,1.); CHKERRQ(ierr);

In the general case, setting elements in a PETSc vector is done through a function

PetscErrorCode VecSetValue
   (Vec v,PetscInt row,PetscScalar value,InsertMode mode);

Not Collective

Input Parameters
v- the vector
row- the row location of the entry
value- the value to insert
VecSetValue for setting elements that uses global numbering; any process can set any elements in the vector. There is also a routine
#include "petscvec.h"
PetscErrorCode  VecSetValues
   (Vec x,PetscInt ni,const PetscInt
    ix[],const PetscScalar y[],InsertMode iora)

Not Collective

Input Parameters:
x - vector to insert in
ni - number of elements to add
ix - indices where to add
y - array of values
iora - either INSERT_VALUES or ADD_VALUES, where
    ADD_VALUES adds values to any existing entries, and
    INSERT_VALUES replaces existing entries with new values
VecSetValues for setting multiple elements. This is mostly useful for setting dense subblocks of a block matrix.

We illustrate both routines by setting a single element with VecSetValue , and two elements with VecSetValues . In the latter case we need an array of length two for both the indices and values. The indices need not be successive.

i = 1; v = 3.14;
ii[0] = 1; ii[1] = 2; vv[0] = 2.7; vv[1] = 3.1;

call VecSetValue(x,i,v,INSERT_VALUES,ierr)
ii(1) = 1; ii(2) = 2; vv(1) = 2.7; vv(2) = 3.1
call VecSetValues(x,2,ii,vv,INSERT_VALUES,ierr)

Using \clstinline{VecSetValue} for specifying a local vector element corresponds to simple insertion in the local array. However, an element that belongs to another process needs to be transferred. This done in two calls:

#include "petscvec.h"
PetscErrorCode  VecAssemblyBegin(Vec vec)
PetscErrorCode  VecAssemblyEnd(Vec vec)

Collective on Vec

Input Parameter
vec -the vector

VecAssemblyBegin and VecAssemblyEnd .

if (myrank==0) then
   do vecidx=0,globalsize-1
      vecelt = vecidx
      call VecSetValue(vector,vecidx,vecelt,INSERT_VALUES,ierr)
   end do
end if
call VecAssemblyBegin(vector,ierr)
call VecAssemblyEnd(vector,ierr)

(If you know the MPI library, you'll recognize that the first call corresponds to posting nonblocking send and receive calls; the second then contains the wait calls. Thus, the existence of these separate calls make latency hiding possible.)

// do work that does not need the vector myvec

Elements can either be inserted with INSERT_VALUES , or added with ADD_VALUES in the VecSetValue  / VecSetValues call. You can not immediately mix these modes; to do so you need to call VecAssemblyBegin  / VecAssemblyEnd in between add/insert phases. Explicit element access

crumb trail: > petsc-objects > Vec: Vectors > Vector elements > Explicit element access

Since the vector routines cover a large repertoire of operations, you hardly ever need to access the actual elements. Should you still need those elements, you can use

#include "petscvec.h"
PetscErrorCode VecGetArray(Vec x,PetscScalar **a)
PetscErrorCode VecGetArrayRead(Vec x,const PetscScalar **a)

Input Parameter
x : the vector

Output Parameter
a : location to put pointer to the array

PetscErrorCode VecRestoreArray(Vec x,PetscScalar **a)
PetscErrorCode VecRestoreArrayRead(Vec x,const PetscScalar **a)

Input Parameters
x : the vector
a : location of pointer to array obtained from VecGetArray()

use petscvec
VecGetArrayF90(Vec x,{Scalar, pointer :: xx_v(:)},integer ierr)
(there is a Fortran77 version)
VecRestoreArrayF90(Vec x,{Scalar, pointer :: xx_v(:)},integer ierr)

PETSc.Vec.getArray(self, readonly=False)
?? PETSc.Vec.resetArray(self, force=False)
VecGetArray for general access or

PETSc insists that you properly release this pointer again with

#include "petscvec.h"
PetscErrorCode VecRestoreArray(Vec x,PetscScalar **a)

Logically Collective on Vec

Input Parameters:
x- the vector
a- location of pointer to array obtained from VecGetArray()

use petscvec
VecRestoreArrayF90(Vec x,{Scalar, pointer :: xx_v(:)},integer ierr)

Input Parameters:
x- vector
xx_v- the Fortran90 pointer to the array
VecRestoreArray or

Note that in a distributed running context you can only get the array of local elements. Accessing the elements from another process requires explicit communication; see section  32.5.2 .

PetscScalar *in_array,*out_array;
for (int i=0; i<localsize; i++)
  out_array[i] = 2*in_array[i];

There are some variants to the \clstinline{VecGetArray} operation:

  • \clstinline{Vec} object, and replaces it with a different array. That latter array needs to be allocated with PetscMalloc .
  • Replace the storage of a vector by another array
    #include "petscvec.h"
    PetscErrorCode  VecPlaceArray(Vec vec,const PetscScalar array[])
    PetscErrorCode  VecReplaceArray(Vec vec,const PetscScalar array[])
    Input Parameters
    vec   - the vector
    array - the array
    VecPlaceArray also installs a new array in the vector, but it keeps the original array; this can be restored with VecResetArray .

Putting the array of one vector into another has a common application, where you have a distributed vector, but want to apply PETSc operations to its local section as if it were a sequential vector. In that case you would create a sequential vector, and VecPlaceArray the contents of the distributed vector into it.

Fortran note

There are routines such as VecGetArrayF90 (with corresponding VecRestoreArrayF90 ) that return a (Fortran) pointer to a one-dimensional array.

// vecset.F90
Vec            :: vector
PetscScalar,dimension(:),pointer :: elements
call VecGetArrayF90(vector,elements,ierr)
write (msg,10) myrank,elements(1)
10 format("First element on process",i3,":",f7.4,"\n")
call PetscSynchronizedPrintf(comm,msg,ierr)
call PetscSynchronizedFlush(comm,PETSC_STDOUT,ierr)
call VecRestoreArrayF90(vector,elements,ierr)

32.3.5 File I/O

crumb trail: > petsc-objects > Vec: Vectors > File I/O

As mentioned above, VecView can be used for displaying a vector on the terminal screen. However, viewers are actually much more general. As explained in section  38.2.2 , they can also be used to export vector data, for instance to file.

The converse operation, to load a vector that was exported in this manner, is VecLoad .

Since these operations are each other's inverses, usually you don't need to know the file format. But just in case:

PetscInt    number of rows
PetscScalar *values of all entries

That is, the file starts with a magic number, then the number of vector elements, and subsequently all scalar values.

32.4 Mat: Matrices

crumb trail: > petsc-objects > Mat: Matrices

PETSc matrices come in a number of types, sparse and dense being the most important ones. Another possibility is to have the matrix in operation form, where only the action $y\leftarrow Ax$ is defined.

32.4.1 Matrix creation

crumb trail: > petsc-objects > Mat: Matrices > Matrix creation

Creating a matrix also starts by specifying a communicator on which the matrix lives collectively:

PetscErrorCode MatCreate(MPI_Comm comm,Mat *v);

mat = PETSc.Mat()
# or:
mat = PETSc.Mat().create()


Set the matrix type with

#include "petscmat.h"
PetscErrorCode  MatSetType(Mat mat, MatType matype)

Collective on Mat

Input Parameters:
mat- the matrix object
matype- matrix type

Options Database Key
-mat_type  -Sets the type; use -help for a list of available methods (for instance, seqaij)
MatSetType . The main choices are between sequential versus distributed and dense versus sparse, giving types: MATMPIDENSE , MATMPIAIJ , MATSEQDENSE , MATSEQAIJ .

Distributed matrices are partitioned by block rows: each process stores a block row , that is, a contiguous set of matrix rows. It stores all elements in that block row.

FIGURE 32.1: Matrix partitioning by block rows

In order for a matrix-vector product to be executable, both the input and output vector need to be partitioned conforming to the matrix.

While for dense matrices the block row scheme is not scalable, for matrices from PDEs it makes sense. There, a subdivision by matrix blocks would lead to many empty blocks.

Just as with vectors, there is a local and global size; except that that now applies to rows and columns. Set sizes with

#include "petscmat.h"
PetscErrorCode MatSetSizes(Mat A,
    PetscInt m, PetscInt n, PetscInt M, PetscInt N)

Input Parameters
A : the matrix
m : number of local rows (or PETSC_DECIDE)
n : number of local columns (or PETSC_DECIDE)
M : number of global rows (or PETSC_DETERMINE)
N : number of global columns (or PETSC_DETERMINE)

PETSc.Mat.setSizes(self, size, bsize=None)
where 'size' is a tuple of 2 global sizes
or a tuple of 2 local/global pairs
MatSetSizes and subsequently query them with
#include "petscmat.h"
PetscErrorCode MatGetSize(Mat mat,PetscInt *m,PetscInt *n)
PetscErrorCode MatGetLocalSize(Mat mat,PetscInt *m,PetscInt *n)

PETSc.Mat.getSize(self) # tuple of global sizes
PETSc.Mat.getLocalSize(self) # tuple of local sizes
PETSc.Mat.getSizes(self) # tuple of local/global size tuples
MatSizes . The concept of local column size is tricky: since a process stores a full block row you may expect the local column size to be the full matrix size, but that is not true. The exact definition will be discussed later, but for square matrices it is a safe strategy to let the local row and column size to be equal.

Instead of querying a matrix size and creating vectors accordingly, the routine

Get vector(s) compatible with the matrix, i.e. with the same parallel layout

#include "petscmat.h"
PetscErrorCode MatCreateVecs(Mat mat,Vec *right,Vec *left)

Collective on Mat

Input Parameter
mat - the matrix

Output Parameter;
right 	- (optional) vector that the matrix can be multiplied against
left 	- (optional) vector that the matrix vector product can be stored in
MatCreateVecs can be used. (Sometimes this is even required; see section  32.4.8 .)

32.4.2 Nonzero structure

crumb trail: > petsc-objects > Mat: Matrices > Nonzero structure

In case of a dense matrix, once you have specified the size and the number of MPI processes, it is simple to determine how much space PETSc needs to allocate for the matrix. For a sparse matrix this is more complicated, since the matrix can be anywhere between completely empty and completely filled in. It would be possible to have a dynamic approach where, as elements are specified, the space grows; however, repeated allocations and re-allocations are inefficient. For this reason PETSc puts a small burden on the programmer: you need to specify a bound on how many elements the matrix will contain.

We explain this by looking at some cases. First we consider a matrix that only lives on a single process. You would then use

#include "petscmat.h"
PetscErrorCode  MatSeqAIJSetPreallocation
  (Mat B,PetscInt nz,const PetscInt nnz[])
PetscErrorCode  MatMPIAIJSetPreallocation
  (Mat B,PetscInt d_nz,const PetscInt d_nnz[],
   PetscInt o_nz,const PetscInt o_nnz[])

Input Parameters

B - the matrix
nz/d_nz/o_nz - number of nonzeros per row in matrix or
    diagonal/off-diagonal portion of local submatrix
nnz/d_nnz/o_nnz - array containing the number of nonzeros in the various rows of
    the sequential matrix / diagonal / offdiagonal part of the local submatrix
    or NULL (PETSC_NULL_INTEGER in Fortran) if nz/d_nz/o_nz is used.

PETSc.Mat.setPreallocationNNZ(self, [nnz_d,nnz_o] )
PETSc.Mat.setPreallocationCSR(self, csr)
PETSc.Mat.setPreallocationDense(self, array)
MatSeqAIJSetPreallocation . In the case of a tridiagonal matrix you would specify that each row has three elements:

MatSeqAIJSetPreallocation(A,3, NULL);

If the matrix is less regular you can use the third argument to give an array of explicit row lengths:

int *rowlengths;
// allocate, and then:
for (int row=0; row<nrows; row++)
  rowlengths[row] = // calculation of row length

In case of a distributed matrix you need to specify this bound with respect to the block structure of the matrix. As illustrated in figure  32.2 , a matrix has a diagonal part and an off-diagonal part.

FIGURE 32.2: The diagonal and off-diagonal parts of a matrix

The diagonal part describes the matrix elements that couple elements of the input and output vector that live on this process. The off-diagonal part contains the matrix elements that are multiplied with elements not on this process, in order to compute elements that do live on this process.

The preallocation specification now has separate parameters for these diagonal and off-diagonal parts: with you specify for both either a global upper bound on the number of nonzeros, or a detailed listing of row lengths. For the matrix of the Laplace equation , this specification would seem to be:

MatMPIAIJSetPreallocation(A, 3, NULL, 2, NULL);

However, this is only correct if the block structure from the parallel division equals that from the lines in the domain. In general it may be necessary to use values that are an overestimate. It is then possible to contract the storage by copying the matrix.

Specifying bounds on the number of nonzeros is often enough, and not too wasteful. However, if many rows have fewer nonzeros than these bounds, a lot of space is wasted. In that case you can replace the NULL arguments by an array that lists for each row the number of nonzeros in that row.

32.4.3 Matrix elements

crumb trail: > petsc-objects > Mat: Matrices > Matrix elements

You can set a single matrix element with

PetscErrorCode MatSetValue(
    Mat m,PetscInt row,PetscInt col,PetscScalar value,InsertMode mode)

Input Parameters
m : the matrix
row : the row location of the entry
col : the column location of the entry
value : the value to insert

PETSc.Mat.setValue(self, row, col, value, addv=None)
also supported:
A[row,col] = value
MatSetValue or a block of them, where you supply a set of $i$ and $j$ indices, using MatSetValues .

After setting matrix elements, the matrix needs to be assembled. This is where PETSc moves matrix elements to the right processor, if they were specified elsewhere. As with vectors this takes two calls:

#include "petscmat.h"
PetscErrorCode MatAssemblyBegin(Mat mat,MatAssemblyType type)
PetscErrorCode MatAssemblyEnd(Mat mat,MatAssemblyType type)

Input Parameters
mat- the matrix
type- type of assembly, either MAT_FLUSH_ASSEMBLY

assemble(self, assembly=None)
assemblyBegin(self, assembly=None)
assemblyEnd(self, assembly=None)

there is a class PETSc.Mat.AssemblyType:
MatAssemblyBegin and which can be used to achieve latency hiding .

Elements can either be inserted ( INSERT_VALUES ) or added ( ADD_VALUES ). You can not immediately mix these modes; to do so you need to call MatAssemblyBegin  / MatAssemblyEnd with a value of MAT_FLUSH_ASSEMBLY .

PETSc sparse matrices are very flexible: you can create them empty and then start adding elements. However, this is very inefficient in execution since the OS needs to reallocate the matrix every time it grows a little. Therefore, PETSc has calls for the user to indicate how many elements the matrix will ultimately contain.


crumb trail: > petsc-objects > Mat: Matrices > Matrix elements > Element access

If you absolutely need access to the matrix elements, there are routines such as


#include "petscmat.h"
PetscErrorCode MatGetRow
   (Mat mat,PetscInt row,
    PetscInt *ncols,const PetscInt *cols[],const PetscScalar *vals[])
PetscErrorCode MatRestoreRow
   (Mat mat,PetscInt row,
    PetscInt *ncols,const PetscInt *cols[],const PetscScalar *vals[])

Input Parameters:
mat - the matrix
row - the row to get

Output Parameters
ncols - if not NULL, the number of nonzeros in the row
cols - if not NULL, the column numbers
vals - if not NULL, the values
MatGetRow . With this, any process can request, using global row numbering, the contents of a row that it owns. (Requesting elements that are not local requires the different mechanism of taking submatrices; section  32.4.5 .)

Since PETSc is geared towards sparse matrices , this returns not only the element values, but also the column numbers, as well as the mere number of stored columns. If any of these three return values are not needed, they can be unrequested by setting the parameter passed to \clstinline{NULL}.

PETSc insists that you properly release the row again with

It is also possible to retrieve the full CRS contents of the local matrix with MatDenseGetArray , MatDenseRestoreArray , MatSeqAIJGetArray , MatSeqAIJRestoreArray . (Routines are deprecated.)

32.4.4 Matrix operations

crumb trail: > petsc-objects > Mat: Matrices > Matrix operations Matrix-vector operations

crumb trail: > petsc-objects > Mat: Matrices > Matrix operations > Matrix-vector operations

In the typical application of PETSc, solving large sparse linear systems of equations with iterative methods, matrix-vector operations are most important. Foremost there is the matrix-vector product

#include "petscmat.h"
PetscErrorCode MatMult(Mat mat,Vec x,Vec y)
PetscErrorCode MatMultTranspose(Mat mat,Vec x,Vec y)

Neighbor-wise Collective on Mat

Input Parameters
mat - the matrix
x - the vector to be multiplied

Output Parameters
y - the result
MatMult and the transpose product (In the complex case, the transpose product is not the Hermitian matrix product; for that use MatMultHermitianTranspose .)

For the BLAS gemv semantics $y\leftarrow \alpha Ax + \beta y$,

#include "petscmat.h"
PetscErrorCode MatMultAdd(Mat mat,Vec x,Vec y,Vec z)

Neighbor-wise Collective on Mat

Input Parameters
mat 	- the matrix
x, y 	- the vectors

Output Parameters
z -the result

The vectors x and z cannot be the same.
MatMultAdd computes $z\leftarrow Ax +y $. Matrix-matrix operations

crumb trail: > petsc-objects > Mat: Matrices > Matrix operations > Matrix-matrix operations

There is a number of matrix-matrix routines such as MatMatMult .

32.4.5 Submatrices

crumb trail: > petsc-objects > Mat: Matrices > Submatrices

Given a parallel matrix, there are two routines for extracting submatrices:

  • MatCreateSubMatrix creates a single parallel submatrix.
  • MatCreateSubMatrices creates a sequential submatrix on each process.

32.4.6 Shell matrices

crumb trail: > petsc-objects > Mat: Matrices > Shell matrices

In many scientific applications, a matrix stands for some operator, and we are not intrinsically interested in the matrix elements, but only in the action of the matrix on a vector. In fact, under certain circumstances it is more convenient to implement a routine that computes the matrix action than to construct the matrix explicitly.

Maybe surprisingly, solving a linear system of equations can be handled this way. The reason is that PETSc's iterative solvers (section  35.1 ) only need the matrix-times-vector (and perhaps the matrix-transpose-times-vector) product.

PETSc supports this mode of working. The routine

#include "petscmat.h"
PetscErrorCode  MatCreateShell
   (MPI_Comm comm,
    PetscInt m,PetscInt n,PetscInt M,PetscInt N,
    void *ctx,Mat *A)


Input Parameters:
comm- MPI communicator
m- number of local rows (must be given)
n- number of local columns (must be given)
M- number of global rows (may be PETSC_DETERMINE)
N- number of global columns (may be PETSC_DETERMINE)
ctx- pointer to data needed by the shell matrix routines

Output Parameter:
A -the matrix
MatCreateShell declares the argument to be a matrix given in operator form. Shell operations

crumb trail: > petsc-objects > Mat: Matrices > Shell matrices > Shell operations

The next step is then to add the custom multiplication routine, which will be invoked by MatMult :

#include "petscmat.h"
PetscErrorCode MatShellSetOperation
   (Mat mat,MatOperation op,void (*g)(void))

Logically Collective on Mat

Input Parameters:
mat- the shell matrix
op- the name of the operation
g- the function that provides the operation.

The routine that implements the actual product should have the same prototype as MatMult , accepting a matrix and two vectors. The key to realizing your own product routine lies in the `context' argument to the create routine. With

#include "petscmat.h"
PetscErrorCode  MatShellSetContext(Mat mat,void *ctx)

Input Parameters
mat - the shell matrix
ctx - the context
MatShellSetContext you pass a pointer to some structure that contains all contextual information you need. In your multiplication routine you then retrieve this with
#include "petscmat.h"
PetscErrorCode  MatShellGetContext(Mat mat,void *ctx)

Not Collective

Input Parameter:
mat -the matrix, should have been created with MatCreateShell()

Output Parameter:
ctx -the user provided context
MatShellGetContext .

What operation is specified is determined by a keyword MATOP_<OP>

where OP is the name of the matrix routine, minus the Mat part, in all caps.


(The call to MatSetSizes needs to come before MatSetType .) Shell context

crumb trail: > petsc-objects > Mat: Matrices > Shell matrices > Shell context

Setting the context means passing a pointer (really: an address) to some allocated structure

struct matrix_data mystruct;
MatShellSetContext( A, &mystruct );

The routine prototype has this argument as a \clstinline{void*} but it's not necessary to cast it to that. Getting the context means that a pointer to your structure needs to be set

struct matrix_data *mystruct;
MatShellGetContext( A, &mystruct );

Somewhat confusingly, the Get routine also has a \clstinline{void*} argument, even though it's really a pointer variable.

32.4.7 Multi-component matrices

crumb trail: > petsc-objects > Mat: Matrices > Multi-component matrices

For multi-component physics problems there are essentially two ways of storing the linear system

  1. Grouping the physics equations together, or
  2. grouping the domain nodes together.

In both cases this corresponds to a block matrix, but for a problem of $N$ nodes and $3$ equations, the respective structures are:

  1. $3\times 3 $ blocks of size $N$, versus
  2. $N\times N$ blocks of size $3$.

The first case can be pictured as \[ \begin{pmatrix} A_{00}&A_{01}&A_{02}\\ A_{10}&A_{11}&A_{12}\\ A_{20}&A_{21}&A_{22}\\ \end{pmatrix} \] and while it looks natural, there is a computational problem with it. Preconditioners for such problems often look like \[ \begin{pmatrix} A_{00}&\\ &A_{11}&\\ &&A_{22}\\ \end{pmatrix} \quad\hbox{or}\quad \begin{pmatrix} A_{00}&&\\ A_{10}&A_{11}&\\ A_{20}&A_{21}&A_{22}\\ \end{pmatrix} \] With the block-row partitioning of PETSc's matrices, this means at most a 50\% efficiency for the preconditioner solve.

It is better to use the second scheme, which requires the MATMPIBIJ format, and use so-called field-split preconditioner s; see section .

32.4.8 Fourier transform

crumb trail: > petsc-objects > Mat: Matrices > Fourier transform

The FFT can be considered a matrix-vector multiplication. PETSc supports this by letting you create a matrix with MatCreateFFT . This requires that you add an FFT library, such as fftw , at configuration time; see section  31.3.4 .

FFT libraries may use padding, so vectors should be created with not with an independent VecSetSizes .

32.5 Index sets and Vector Scatters

crumb trail: > petsc-objects > Index sets and Vector Scatters

In the PDE type of applications that PETSc was originally intended for, vector data can only be real or complex: there are no vector of integers. On the other hand, integers are used for indexing into vector, for instance for gathering boundary elements into a halo region , or for doing the data transpose of an FFT operation.

To support this, PETSc has the following object types:

  • An \clstinline{IS} object describes a set of integer indices;
  • a \clstinline{VecScatter} object describes the correspondence between a group of indices in an input vector and a group of indices in an output vector.

32.5.1 IS: index sets

crumb trail: > petsc-objects > Index sets and Vector Scatters > IS: index sets

An \clstinline{IS} object contains a set of PetscInt values. It can be created with

  • ISCreate for creating an empty set;
  • ISCreateStride for a strided set;
  • ISCreateBlock for a set of contiguous blocks, placed at an explicitly given list of starting indices.
  • ISCreateGeneral for an explicitly given list of indices.

For example, to describe odd and even indices (on two processes):

// oddeven.c
IS oddeven;
if (procid==0) {
  ierr = ISCreateStride(comm,Nglobal/2,0,2,&oddeven); CHKERRQ(ierr);
} else {
  ierr = ISCreateStride(comm,Nglobal/2,1,2,&oddeven); CHKERRQ(ierr);

After this, there are various query and set operations on index sets.

You can read out the indices of a set by ISGetIndices and ISRestoreIndices .

32.5.2 VecScatter: all-to-all operations

crumb trail: > petsc-objects > Index sets and Vector Scatters > VecScatter: all-to-all operations

A VecScatter object is a generalization of an all-to-all operation. However, unlike MPI MPI_Alltoall , which formulates everything in terms of local buffers, a \clstinline{VecScatter} is more implicit in only describing indices in the input and output vectors.


Creates a vector scatter context. Collective on Vec

#include "petscvec.h"
PetscErrorCode VecScatterCreate(Vec xin,IS ix,Vec yin,IS iy,VecScatter *newctx)

Input Parameters:
xin : a vector that defines the layout of vectors from which we scatter
yin : a vector that defines the layout of vectors to which we scatter
ix : the indices of xin to scatter (if NULL scatters all values)
iy : the indices of yin to hold results (if NULL fills entire vector yin)

Output Parameter
newctx : location to store the new scatter context
VecScatterCreate call has as arguments:

  • An input vector. From this, the parallel layout is used; any vector being scattered from should have this same layout.
  • An \clstinline{IS} object describing what indices are being scattered; if the whole vector is rearranged, \clstinline{NULL} (Fortran:
  • An output vector. From this, the parallel layout is used; any vector being scattered into should have this same layout.
  • An \clstinline{IS} object describing what indices are being scattered into; if the whole vector is a target, \clstinline{NULL} can be given.

As a simple example, the odd/even sets defined above can be used to move all components with even index to process zero, and the ones with odd index to process one:

VecScatter separate;
ierr = VecScatterCreate
  (in,oddeven,out,NULL,&separate); CHKERRQ(ierr);
ierr = VecScatterBegin
ierr = VecScatterEnd

Note that the index set is applied to the input vector, since it describes the components to be moved. The output vector uses \clstinline{NULL} since these components are placed in sequence.


Modify this example so that the components are still separated odd/even, but now placed in descending order on each process.


Can you extend this example so that process $p$ receives all indices that are multiples of $p$? Is your solution correct if \clstinline{Nglobal} is not a multiple of \clstinline{nprocs}? More VecScatter modes

crumb trail: > petsc-objects > Index sets and Vector Scatters > VecScatter: all-to-all operations > More VecScatter modes

There is an added complication, in that a \clstinline{VecScatter} can have both sequential and parallel input or output vectors. Scattering onto process zero is also a popular option.

32.6 AO: Application Orderings

crumb trail: > petsc-objects > AO: Application Orderings

PETSc's decision to partition a matrix by contiguous block rows may be a limitation in the sense an application can have a natural ordering that is different. For such cases the AO type can translate between the two schemes.

32.7 Partitionings

crumb trail: > petsc-objects > Partitionings

By default, PETSc uses partitioning of matrices and vectors based on consecutive blocks of variables. In regular cases that is not a bad strategy. However, for some matrices a permutation and re-division can be advantageous. For instance, one could look at the adjacency graph , and minimize the number of edge cuts or the sum of the edge weight s.

This functionality is not built into PETSc, but can be provided by graph partitioning packages such as ParMetis or Zoltan . The basic object is the MatPartitioning , with routines for

  • Create and destroy: MatPartitioningCreate , MatPartitioningDestroy ;
  • Setting the type MatPartitioningSetType to an explicit partitioner, or something generated as the dual or a refinement of the current matrix;
  • Apply with MatPartitioningApply , giving a distribued IS object, which can then be used in MatCreateSubMatrix to repartition.

Illustrative example:

MatPartitioning part;
/* get new global number of each old global number */

Other scenario:

Back to Table of Contents