# Graph theory

$%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% %%%% This text file is part of the source of %%%% Introduction to High-Performance Scientific Computing' %%%% by Victor Eijkhout, copyright 2012-2020 %%%% %%%% mathjax.tex : macros to facility mathjax use in html version %%%% %%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \newcommand\inv{^{-1}}\newcommand\invt{^{-t}} \newcommand\bbP{\mathbb{P}} \newcommand\bbR{\mathbb{R}} \newcommand\defined{ \mathrel{\lower 5pt \hbox{{\equiv\atop\mathrm{D}}}}} \newcommand\macro{\langle#1\rangle} \newcommand\dtdxx{\frac{\alpha\Delta t}{\Delta x^2}}$ 18.1 : Definitions
18.2 : Common types of graphs
18.2.1 : Directed Acyclic Graphs
18.2.2 : Trees
18.2.3 : Bipartite graphs
18.3 : Graph colouring and independent sets
18.4 : Graph algorithms
18.5 : Graphs and matrices
18.5.1 : Permutation
18.5.2 : Irreducibility
18.5.3 : Graph closure
18.5.4 : Graph operations in linear algebra
18.5.4.1 : Markov chains
18.5.4.2 : General matrix-vector product
18.6 : Spectral graph theory
18.6.1 : The graph Laplacian
18.6.2 : Domain decomposition through Laplacian matrices
18.6.3 : Cheeger's inequality

# 18 Graph theory

Graph theory is the branch of mathematics that studies pairwise relations between objects. Graphs both appear as tools for analyzing issues in HPC , and as objects of study themselves. This appendix introduces the basic concepts and some relevant theory.

## 18.1 Definitions

crumb trail: > graph > Definitions

A graph consists of a set of objects, and set of relations between them. The objects, called the nodes or vertices of the graph, usually form a finite set, so we usually identify them with consecutive integers $1\ldots n$ or $0\ldots n-1$. The relation that holds between nodes is described by the edges of the graph: if $i$ and~$j$ are related, we say that $( i,j)$ is an edge of the graph. This relation does not need to be symmetric, take for instance the less than' relation.

Formally, then, a graph is a tuple $G=\langle V,E\rangle$ where $V=\{1,\ldots n\}$ for some~$n$, and $E\subset\{(i,j)\colon 1\leq i,j\leq n,\,i\not=j\}$.

A graph is called an undirected graph if $(i,j)\in E\Leftrightarrow (j,i)\in E$. The alternative is a directed graph , where we indicate an edge $(i,j)$ with an arrow from $i$ to~$j$.

Two concepts that often appear in graph theory are the degree and the diameter of a graph.

Definition

The degree denotes the maximum number of nodes that are connected to any node: $d(G)\equiv \max_i \left|\{j\colon j\not=i\wedge (i,j)\in E\}\right|.$

Definition

The diameter of a graph is the length of the longest shortest path in the graph, where a path is defined as a set of vertices $v_1,\ldots, v_{k+1}$ such that $v_i\not=v_j$ for all $i\not=j$ and $\forall_{1\leq i\leq k}\colon (v_i,v_{i+1})\in E.$ The length of this path is~$k$.

The concept of diameter is illustrated in figure~ .

A~path where all nodes are disjoint except for $v_1=v_{k+1}$ is called a cycle .

Sometimes we are only interested in the mere existence of an edge $(i,j)$, at other times we attach a value or weight' $w_{ij}$ to that edge. A~graph with weighted edges is called a weighted graph . Such a graph can be represented as a tuple $G=\langle V,E,W\rangle$ where $E$ and $W$ have the same cardinality. Conversely, an unweighted graph has all weights the same value, in which case we omit mention of weights.

## 18.2 Common types of graphs

crumb trail: > graph > Common types of graphs

### 18.2.1 Directed Acyclic Graphs

crumb trail: > graph > Common types of graphs > Directed Acyclic Graphs

A graph that does not have cycles is called acyclic . A special case of this type of graph is the DAG . This type of graph can for instance be used to model dependencies between tasks: if there is an edge between $i,j$, it means that task $i$ has to be done before task~$j$.

### 18.2.2 Trees

crumb trail: > graph > Common types of graphs > Trees

One special case of DAGs is the tree graph : here any node can have multiple incoming edges, but only one outgoing edge. Nodes with no incoming edges are leaf nodes ; a node with no outgoing edges is called a root (can there be more than one root?), and all other nodes are called interior nodes .

### 18.2.3 Bipartite graphs

crumb trail: > graph > Common types of graphs > Bipartite graphs

If a graph can be partitioned into two sets of nodes, where edges only run from the one set to the other, but not inside a set, we call this a bipartite graph .

## 18.3 Graph colouring and independent sets

crumb trail: > graph > Graph colouring and independent sets

We can assign labels to the nodes of a graph, which is equivalent to partitioning the set of nodes into disjoint subsets. One type of labeling that is of interest is graph colouring : here the labels (or colours') are chosen so that, if nodes $i$ and~$j$ have the same colour, there is no edge connecting them: $(i,j)\not\in E$.

There is a trivial colouring of a graph, where each node has its own colour. More interestingly, the minimum number of colours with which you can colour a graph is called the colour number of the graph.

Exercise

Show that, if a graph has degree~$d$, the colour number is at most~$d+1$.

A~famous graph colouring problem is the four colour theorem': if a graph depicts countries on a two-dimensional map (a so-called planar' graph), then the colour number is at most four. In general, finding the colour number is hard (in fact, NP-hard).

The colour sets of a graph colouring are also called independent sets , since within each colour no node is connected to a node of the same colour.

There is a trivial way of finding independent sets: declare each node to have its own unique colour. On the other hand, finding the best' division in independent sets, for instance through finding the colour number of the graph, is difficult. However, often it is enough to find a reasonable partitioning of the nodes into independent sets, for instance in constructing paralell ILU preconditioners; section~ . The following algorithm does that~ [jopl94,Luby:parallel] :

• Give each node a unique random number.
• Now find the set of nodes that have a higher number than all of their neighbours; call this the first independent set.
• Remove this set from the graph, and find again the nodes with a higher number than all their neighbours; this will be the second set.
• Repeat this procedure until all nodes are in an independent set.
Exercise

Convince yourself that the sets found this way are indeed independent.

## 18.4 Graph algorithms

crumb trail: > graph > Graph algorithms

In this section we only briefly touch on graph algorithms; a full discussion can be found in section  .

• Distance algorithms. In applications such as traffic routing it is of interest to know the shortest distance from a given node to all others: the single source shortest path problem; or the shortest distance for any pair: the all-pairs shortest path problem.
• Connectivity algorithms. In social networks it is of interest to know if two people are connected at all, if the graph can be split up into unconnected subgraphs, or whether some person is the crucial bridge between two groups.

## 18.5 Graphs and matrices

crumb trail: > graph > Graphs and matrices

A graph can be rendered in a number of ways. You could of course just list nodes and edges, but little insight can be derived that way. Simple graphs can be visualized by drawing vertices and edges, but for large graphs this becomes unwieldy. Another option is to construct the adjacency matrix of the graph. For a graph $G=\langle V,E\rangle$, the adjacency matrix $M$ (with a size $n$ equal to the number of vertices $|V|$) is defined by $M_{ij}= \begin{cases} 1&(i,j)\in E\\ 0&\mbox{otherwise} \end{cases}$ Conversely, if you have a matrix, especially a sparse matrix , you can construct its adjacency graph . This is illustrated in figure  for

both a dense and a sparse matrix. In this example, the matrices are structurally symmetric, so we use lines instead of arrows in the graphs. There is an edge on each vertex corresponding to the diagonal element; this edge will often be left out of illustrations.

For graphs with edge weights, we set the elements of the adjacency matrix to the weights: $M_{ij}= \begin{cases} w_{ij}&(i,j)\in E\\ 0&\mbox{otherwise} \end{cases}$

If a matrix has no zero elements, its adjacency graph has an edge between each pair of vertices. Such a graph is called a clique . If the graph is undirected, the adjacency matrix is symmetric, and conversely, if a matrix is structurally symmetric , its adjacency graph is undirected.

### 18.5.1 Permutation

crumb trail: > graph > Graphs and matrices > Permutation

Graphs are often used to indicate relations between objects in the real world. One example would be friend-of' relations in Facebook. In such cases, the nodes in a graph do not have a natural numbering: they are identified by a name and any numbering is artificial. Thus, we could wonder which graph properties remain invariant, and which ones change, if we apply a different numbering.

Renumbering a set of objects can be modeled algebraically by applying a permutation matrix .

Definition

A permutation matrix is a square matrix where each row and column has exactly one element equal to one; all other elements are zero.

Exercise

Let a set of $N$ objects $x_1,…,x_N$ be given. What is the permutation matrix that orders them as $x_1,x_3,…,x_2,x_4,…$? That is, find the matrix $P$ such that $\begin{pmatrix} x_1\\x_3\\\vdots\\x_2\\x_4\\\vdots \end{pmatrix} = P \begin{pmatrix} x_1\\\vdots\\x_N \end{pmatrix}$

Exercise

Show that the eigenvalues of a matrix are invariant under permutation.

### 18.5.2 Irreducibility

crumb trail: > graph > Graphs and matrices > Irreducibility

As an example of graph concepts that has an easy interpretation in the adjacency matrix, consider reducibility.

Definition

A graph is called irreducible if for every pair $i,j$ of nodes there is a path from $i$ to $j$ and from $j$ to $i$. A graph is reducible if it is not irreducible.

Exercise

Let $A$ be a matrix $A= \begin{pmatrix} B&C\\ \emptyset&D \end{pmatrix}$ where $B$ and $D$ are square matrices. Prove the reducibility of the graph of which this is the adjacency matrix.

If we permute graph, its reducibility or irreducibility is not changed. However, it may now no longer be apparent from looking at the adjacency matrix.

### 18.5.3 Graph closure

crumb trail: > graph > Graphs and matrices > Graph closure

Here is another example of how adjacency matrices can simplify reasoning about graphs.

Exercise

Let $G=\langle V,E\rangle$ be an undirected graph, and let $G'=\langle V,E'\rangle$ be the graph with the same vertices, but with vertices defined by $(i,j)\in E'\Leftrightarrow \exists_k\colon (i,k)\in E\wedge (k,j)\in E.$ If $M$ is the adjacency matrix of $G$, show that $M^2$ is the adjacency matrix of $G'$, where we use boolean multiplication on the elements: $1\cdot1=1$, $1+1=1$.

### 18.5.4 Graph operations in linear algebra

crumb trail: > graph > Graphs and matrices > Graph operations in linear algebra

In most of the above, the adjacency matrix was nothing more than a table. We will now show that we can actually do linear algebra with it, making it deserving of the name `matrix'. As a simple example of using linear algebra on an adjacency matrix $G$, let $e$ the vector of all $1$s, then $Ge$ is the vector that lists the degrees of the nodes.

We can do many operations this way. Consider a weighted graph $G$. Finding for each node $i$ the largest weight $g_{ij}$ can be described as $y=G\otimes e\qquad\hbox{where}\qquad y_i=\max_j g_{ij}\cdot 1$ This looks like the regular matrix-vector product $Ge$, but with the sum replaced by a maximum calculation.

In many cases we actually need the left product, that is, multiplying the adjacency matrix from the left by a row vector. Let for example $e_i$ be the vector with a $1$ in location $i$ and zero elsewhere. Then $e_i^tG$ has a one in every $j$ that is a neighbour of $i$ in the adjacency graph of $G$.

#### 18.5.4.1 Markov chains

crumb trail: > graph > Graphs and matrices > Graph operations in linear algebra > Markov chains

This left matrix-vector product has a neat application: Markov chains . Suppose we have a system (see for instance  ) that can be in any of $n$ states. We can then use the adjacency matrix to model state transitions.

Let $G$ by an adjacency matrix with the property that all elements are non-negative, and the elements in each row sum to 1. We now interpret $g_{ij}$ as the probability of going from state $i$ to state $j$. Let $x$ be a probability vector, that is, $x_i$ is the nonegative probability of being in state $i$, then $y^t=x^tG$ describes these probabilities, after one state transition.

Exercise

For a vector $x$ to be a proper probability vector, its elements need to sum to 1. Show that the elements of $y$ again sum to 1.

#### 18.5.4.2 General matrix-vector product

The examples above showed that sometimes we perform an operation on an adjacency matrix that has the structure of a matrix-vector product, but may not necessarily use addition and multiplication. (Many more examples of this can be found in section  .)

Motivated by this, we define a general product $y = G \mathop{\oplus\cdot_\otimes} x$ where $\otimes$ is a binary operator, and $\oplus$ a reduction operator, as $y_i = \bigoplus_j (g_{ij} \otimes x_j ).$ In this notation, finding the largest weight would be $w= G\mathop{\max\cdot_\times} e.$

This will be used in several algorithms in chapter  ; for other applications see  [Kung:pegasus2009] .

## 18.6 Spectral graph theory

crumb trail: > graph > Spectral graph theory

With a graph $G$\footnote{This section owes much to Dan Spielman's course on spectral graph theory  http://www.cs.yale.edu/homes/spielman/561/ .} and its adjacency matrix $A_G$, we can define a stochastic matrix or Markov matrix by scaling $A_G$ to have unit row sums: $W_G = D_G\inv A_G\qquad \hbox{where (D_G)_{ii}=\deg(i)}.$ To see how we interpret this, let's look at a simple example. Let's take an unweighted graph with an adjacency matrix $A_G = \begin{pmatrix} 1& &1&1\\ & &1&1\\ 1& &1&1\\ &1&1& \\ \end{pmatrix}$ and look at the second row, which says that there are edges $(2,3)$ and $(2,4)$. This means that if you are on node 2, you can go to nodes 3 and 4. Scaling this matrix we get $W_G = \begin{pmatrix} 1/3& &1/3&1/3\\ & &1/2&1/2\\ 1/3& &1/3&1/3\\ &1/2&1/2& \\ \end{pmatrix}$ and now the second row says that from node 2 you can get with equal probability to nodes 3 and 4. You can also derive this statement mathematically: $\begin{pmatrix} 0&1&0&0 \end{pmatrix} W_G = \begin{pmatrix} 0& 0&1/2&1/2\\ \end{pmatrix}$ It is simple to extrapolate that: if $p$ is a vector where the $i$-th component gives the probability of being in node $i$, then $(p^tW_G)_i$ is the probability of being in node $i$ if you take one more step along a graph edge.

Exercise

Prove that $p^tW_G$ is indeed a vector of probabilities. Hint: you can express that $p$ is a probability vector as $p^te=e$, where $e$ is the vector of all ones.

### 18.6.1 The graph Laplacian

crumb trail: > graph > Spectral graph theory > The graph Laplacian

Another matrix to associate with a graph is the graph Laplacian $L_G = D_G-A_G.$ This matrix has zero rowsums and positive diagonal entries, so by the Gershgorin theorem (section  all its eigenvalues are in the complex right half plane.

Exercise

Show that the vector of all ones is an eigenvector with eigenvalue 1.

This Laplacian matrix gives us a quadratic form: $x^tL_Gx = \sum_{(i,j)\in E} (x_i-x_j)^2.$

### 18.6.2 Domain decomposition through Laplacian matrices

There are various interesting theorems connected with the graph adjacency and Laplacian matrix. These have a very practical application to domain decomposition .

We get our inspiration of elliptic PDEs .

Connected with the Laplace equation $-\Delta u=f$ is an operator ${\cal L}u=-\Delta u$. On the unit interval $[0,1]$ the eigenfunctions of this operator, that is, the functions for which ${\cal L}u=\lambda u$, are $u_n(x)=\sin n\pi x$ for $n>0$. These have the property that $u_n(x)$ has $n-1$ zeros in the interior of the interval, and they divide the interval in $n$ connected regions where the function is positive of negative. Thus, if you wanted to divide a domain $\Omega$ over $p$ processors, you could consider the $p$-th eigenfunction of the Laplacian on $\Omega$, and find the connected regions where it is positive or negative.

This statement about PDE has a graph equivalent in two versions of Fiedler's theorem . (We will not give any proofs in this section; see  [Spielman:spectral-graph-theory] .)

Theorem

Let $G$ be a weighted path graph on $n$ vertices, let $L_P$ have eigenvalues $0 = \lambda_1 < \lambda_2\leq…\leq\lambda_n$, and let $v_k$ be an eigenvector of $\lambda_k$. Then $v_k$ changes sign $k-1$ times.

The second theorem is more useful  [Fiedler:75-property] :

Theorem

Let $G = (V,E,w)$ be a weighted connected graph, and let $L_G$ be its Laplacian matrix. Let $0 = \lambda_1 < \lambda_2 \leq \cdots \leq \lambda_n$ be the eigenvalues of $L_G$ and let $v_1,…,v_n$ be the corresponding eigenvectors. For any $k \geq 2$, let $W_k =\{i\in V\colon v_k(i)\geq0\}$. Then, the graph induced by $G$ on $W_k$ has at most $k-1$ connected components.

The important consequence of this is that the eigenvector to the first nontrivial eigenvalue can be used to partition the graph in two connected piecesone of nodes where the eigenvector is positive, and one where the eigenvector is negative. This eigenvector is known as the Fiedler vector . The adjacency matrix is nonnegative, and there is an extensive theory for this type of matrix  [BePl:book] ; see the Perron-Frobenius theorem in section  .

In general there are no guarantees for how good a decomposition this is, measured by the ratio of the numbers of edges, but in practice it can be shown that the behaviour is pretty good  [Spielman96spectralpartitioning] .

\begin{lemma} $\alpha_1\leq d_{\max}.$ \end{lemma}

### 18.6.3 Cheeger's inequality

crumb trail: > graph > Spectral graph theory > Cheeger's inequality

Above we remarked that the first non-trivial eigenvalue of the graph Laplacian has a relation to partitioning a graph in two parts. The Cheeger's constant and Cheeger's inequality relate this eigenvalue to a certain quality measure of partitionings.

Let $V$ be the set of vertices and $S\subset V$, then Cheeger's constant of a graph is defined as $C=\min_S \frac{e(S,V-S)} {\min{\mathord{\mathrm{vol}}(S),\mathord{\mathrm{vol}}(V-S)}}$ where $e(S,V-S)$ denotes the number of edges connecting $S$ to $V-S$, and the volume of a set of nodes is defined as $\mathord{\mathrm{vol}}(S) = \sum_{e\in S}d(e).$

Cheeger's inequality then states $2C \geq \lambda \geq \frac{C^2}2$ where $\lambda$ is the first nontrivial eigenvalue of the graph Laplacian.