\[
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%
%%%% This text file is part of the source of
%%%% `Introduction to High-Performance Scientific Computing'
%%%% by Victor Eijkhout, copyright 2012-2020
%%%%
%%%% mathjax.tex : macros to facility mathjax use in html version
%%%%
%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand\inv{^{-1}}\newcommand\invt{^{-t}}
\newcommand\bbP{\mathbb{P}}
\newcommand\bbR{\mathbb{R}}
\newcommand\defined{
\mathrel{\lower 5pt \hbox{${\equiv\atop\mathrm{\scriptstyle D}}$}}}
\newcommand\macro[1]{$\langle$#1$\rangle$}
\newcommand\dtdxx{\frac{\alpha\Delta t}{\Delta x^2}}
\]
13.1 : Norms

13.1.1 : Vector norms

13.1.2 : Matrix norms

13.2 : Gram-Schmidt orthogonalization

13.3 : The power method

13.4 : Nonnegative matrices; Perron vectors

13.5 : The Gershgorin theorem

13.6 : Householder reflectors

Back to Table of Contents# 13 Linear algebra

## 13.1 Norms

### 13.1.1 Vector norms

### 13.1.2 Matrix norms

## 13.2 Gram-Schmidt orthogonalization

**Exercise**
## 13.3 The power method

**Exercise**
## 13.4 Nonnegative matrices; Perron vectors

**Theorem**
## 13.5 The Gershgorin theorem

**Theorem**
## 13.6 Householder reflectors

Back to Table of Contents
13.1.1 : Vector norms

13.1.2 : Matrix norms

13.2 : Gram-Schmidt orthogonalization

13.3 : The power method

13.4 : Nonnegative matrices; Perron vectors

13.5 : The Gershgorin theorem

13.6 : Householder reflectors

Back to Table of Contents

In this course it is assumed that you know what a matrix and a vector are, simple algorithms such as how to multiply them, and some properties such as invertibility of a matrix. This appendix introduces some concepts and theorems that are not typically part of a first course in linear algebra.

A
*norm*
is a way to generalize the concept of absolute
value to multi-dimensional objects such as vectors and matrices. There
are many ways of defining a norm, and there is theory of how different
norms relate. Here we only give the basic definitions; for more detail
see any linear algebra textbook, for instance~
[golo83]
.

crumb trail: > norms > Norms > Vector norms

A norm is any function $n(\cdot)$ on a vector space $V$ with the following properties:

- $n(x)\geq0$ for all $x\in V$ and $n(x)=0$ only for $x=0$,
- $n(\lambda x)=|\lambda|n(x)$ for all $x\in V$ and $\lambda\in\bbR$.
- $n(x+y)\leq n(x)+n(y)$

For any $p\geq1$, the following defines a vector norm: \[ |x|_p = \sqrt{\sum_i|x_i|^p}. \] Common norms are $\|\cdot\|_1$ (`sum of absolute values') and $\|\cdot\|_2$ (`square root of sum of squares'); the $\|\cdot\|_\infty$ norm is defined as $\lim_{p\rightarrow\infty}\|\cdot\|_p$, and it is not hard to see that this equals \[ \|x\|_\infty=\max_i |x_i|. \]

crumb trail: > norms > Norms > Matrix norms

By considering a matrix of size~$n$ as a vector of length $n^2$, we
can define the Frobenius matrix norm:
\[
\|A\|_F=\sqrt{\sum_{i,j}|a_{ij}|^2}.
\]
However, we will mostly look at
*associated matrix norms*
:
\[
\|A\|_p=\sup_{\|x\|_p=1}\|Ax\|_p=
\sup_x\frac{\|Ax\|_p}{\|x\|_p}.
\]
From their definition, it then follows that
\[
\|Ax\|\leq\|A\|\|x\|
\]
for associated norms.

The following are easy to derive:

- $\|A\|_1=\max_j\sum_i|a_{ij}|$,
- $\|A\|_\infty=\max_i\sum_j|a_{ij}|$.

By observing that $\|A\|_2=\sup_{\|x\|_2=1} x^tA^tAx$, it is not hard to derive that $\|A\|_2$ is the maximal singular value of~$A$, which is the root of the maximal eigenvalue of~$A^tA$.

The matrix
*condition number*
is defined as
\[
\kappa(A)=\|A\|\,\|A\inv\|.
\]
In the symmetric case, and using the 2-norm, this is the ratio between the largest and
smallest eigenvalue.

crumb trail: > norms > Gram-Schmidt orthogonalization

The
*GS*
algorithm takes a series of vectors and inductively
orthogonalizes them. This can be used to turn an arbitrary basis of a
vector space into an orthogonal basis; it can also be viewed as
transforming a matrix $A$ into one with orthogonal columns. If $Q$ has
orthogonal columns, $Q^tQ$ is diagonal, which is often a convenient
property to have.

The basic principle of the
*GS*
algorithm
can be demonstrated with two
vectors~$u,v$. Suppose we want a vector $v'$ so that $u,v$ and $u,v'$
span the same space, but $v'\perp u$. For this we let
\[
v'\leftarrow v-\frac{u^tv}{u^tu}u.
\]
It is easy to see that this satisfies the requirements.

Suppose we have an set of vectors $u_1,\ldots,u_n$ that we want to orthogonalize. We do this by successive applications of the above transformation:

For $i=1,\ldots,n$:

For $j=1,\ldots i-1$:

let $c_{ji}=u_j^tu_i/u_j^tu_j$

For $i=1,\ldots,n$:

update $u_i\leftarrow u_i-u_jc_{ji}$

Often the vector $v$ in the algorithm above is normalized; this adds a line

$u_i\leftarrow u_i/\|u_i\|$

to the algorithm.
*GS*
orthogonalization with this normalization, applied to a
matrix, is also known as the
*QR factorization*
.

Suppose that we apply the
*GS*
algorithm to the columns of a
rectangular matrix~$A$, giving a matrix~$Q$. Prove that there is an
upper triangular matrix~$R$ such that $A=QR$. (Hint: look at the
$c_{ji}$ coefficients above.) If we normalize the
orthogonal vector in the algorithm above, $Q$~has the additional property
that $Q^tQ=I$. Prove this too.

The
*GS*
algorithm as given above computes the desired result,
but only in exact arithmetic. A~computer implementation can be quite
inaccurate if the angle between $v$ and one of the $u_i$ is small. In
that case, the
*MGS*
algorithm will
perform better:

For $i=1,\ldots,n$:

For $j=1,\ldots i-1$:

let $c_{ji}=u_j^tu_i/u_j^tu_j$

update $u_i\leftarrow u_i-u_jc_{ji}$

To contrast it with
*MGS*
, the original
*GS*
algorithm is also
known as
*CGS*
.

As an illustration of the difference between the two methods, consider
the matrix
\[
A=
\begin{pmatrix}
1&1&1\\ \epsilon&0&0\\ 0&\epsilon&0\\ 0&0&\epsilon
\end{pmatrix}
\]
where $\epsilon$ is of the order of the machine precision, so that
$1+\epsilon^2=\nobreak1$ in machine arithmetic.
The
*CGS*
method proceeds as follows:

- The first column is of length~1 in machine arithmetic, so \[ q_1 = a_1 = \begin{pmatrix} 1\\ \epsilon\\ 0\\ 0 \end{pmatrix} . \]
- The second column gets orthogonalized as $v\leftarrow a_2-1\cdot q_1$, giving \[ v= \begin{pmatrix} 0\\ -\epsilon\\ \epsilon\\ 0 \end{pmatrix} , \quad\hbox{normalized:}\quad q_2 = \begin{pmatrix} 0\\ -\frac{\sqrt 2}2\\ \frac{\sqrt2}2\\ 0 \end{pmatrix} \]
- The third column gets orthogonalized as $v\leftarrow a_3-c_1q_1-c_2q_2$, where \[ \begin{cases} c_1 = q_1^ta_3 = 1\\ c_2 = q_2^ta_3=0 \end{cases} \Rightarrow v= \begin{pmatrix} 0\\ -\epsilon\\ 0\\ \epsilon \end{pmatrix} ;\quad \hbox{normalized:}\quad q_3= \begin{pmatrix} 0\\ \frac{\sqrt2}2\\ 0\\ \frac{\sqrt2}2 \end{pmatrix} \]

It is easy to see that $q_2$ and $q_3$ are not orthogonal at all.
By contrast, the
*MGS*
method differs in the last step:

- As before, $q_1^ta_3=1$, so \[ v\leftarrow a_3-q_1 = \begin{pmatrix} 0\\ -\epsilon\\ \epsilon\\ 0 \end{pmatrix} . \] Then, $q_2^tv=\frac{\sqrt2}2\epsilon$ (note that $q_2^ta_3=0$ before), so the second update gives \[ v\leftarrow v- \frac{\sqrt2}2\epsilon q_2= \begin{pmatrix} 0\\ \frac\epsilon2\\ -\frac\epsilon2\\ \epsilon \end{pmatrix} ,\quad\hbox{normalized:}\quad \begin{pmatrix} 0\\ \frac{\sqrt6}6\\ -\frac{\sqrt6}6\\ 2\frac{\sqrt6}6 \end{pmatrix} \] Now all $q_i^tq_j$ are on the order of $\epsilon$ for~$i\not=\nobreak j$.

crumb trail: > norms > The power method

The vector sequence \[ x_i = Ax_{i-1}, \] where $x_0$ is some starting vector, is called the method} since it computes the product of subsequent matrix powers times a vector: \[ x_i = A^ix_0. \] There are cases where the relation between the $x_i$ vectors is simple. For instance, if $x_0$ is an eigenvector of~$A$, we have for some scalar~$\lambda$ \[ Ax_0=\lambda x_0 \qquad\hbox{and}\qquad x_i=\lambda^i x_0. \] However, for an arbitrary vector $x_0$, the sequence $\{x_i\}_i$ is likely to consist of independent vectors. Up to a point.

Let $A$ and $x$ be the $n\times n$ matrix and dimension~$n$ vector
\[
A =
\begin{pmatrix}
1&1\\ &1&1\\ &&\ddots&\ddots\\ &&&1&1\\&&&&1
\end{pmatrix}
,\qquad
x = (0,\ldots,0,1)^t.
\]
Show that the sequence $[x,Ax,\ldots,A^ix]$ is an independent set
for $i

Now consider the matrix $B$:
\[
B=\left(
\begin{array}{cccc|cccc}
1&1&&\\ &\ddots&\ddots&\\ &&1&1\\ &&&1\\ \hline
&&&&1&1\\ &&&&&\ddots&\ddots\\ &&&&&&1&1\\ &&&&&&&1
\end{array}
\right),\qquad
y = (0,\ldots,0,1)^t
\]
Show that the set $[y,By,\ldots,B^iy]$ is an independent set for
$i

While in general the vectors $x,Ax,A^2x,\ldots$ can be expected to be independent, in computer arithmetic this story is no longer so clear.

Suppose the matrix has eigenvalues $\lambda_0>\lambda_1\geq\cdots
\lambda_{n-1}$ and corresponding eigenvectors~$u_i$ so that
\[
Au_i=\lambda_i u_i.
\]
Let the vector~$x$ be written as
\[
x=c_0u_0+\cdots +c_{n-1}u_{n-1},
\]
then
\[
A^ix = c_0\lambda_0^iu_i+\cdots +c_{n-1}\lambda_{n-1}^iu_{n-1}.
\]
If we write this as
\[
A^ix = \lambda_0^i\left[
c_0u_i+c_1\left(\frac{\lambda_1}{\lambda_0}\right)^i+
\cdots +c_{n-1}\left(\frac{\lambda_{n-1}}{\lambda_0}\right)^i
\right],
\]
we see that, numerically, $A^ix$ will get progressively closer
to a multiple of~$u_0$, the
*dominant eigenvector*
. Hence,
any calculation that uses independence of the $A^ix$ vectors is likely
to be inaccurate.

crumb trail: > norms > Nonnegative matrices; Perron vectors

If $A$ is a nonnegative matrix, the maximal eigenvalue has the
property that its eigenvector is nonnegative: this is the the
*Perron-Frobenius theorem*
.

If a nonnegative matrix $A$ is irreducible, its eigenvalues satisfy

- The eigenvalue $\alpha_1$ that is largest in magnitude is real and simple: \[ \alpha_1> |\alpha_2|\geq\cdots. \]
- The corresponding eigenvector is positive.

crumb trail: > norms > The Gershgorin theorem

Finding the eigenvalues of a matrix is usually complicated. However, there are some tools to estimate eigenvalues. In this section you will see a theorem that, in some circumstances, can give useful information on eigenvalues.

Let $A$ be a square matrix, and $x,\lambda$ an eigenpair: $Ax=\lambda x$. Looking at one component, we have \[ a_{ii}x_i+\sum_{j\not=i} a_{ij}x_j=\lambda x_i. \] Taking norms: \[ (a_{ii}-\lambda) \leq \sum_{j\not=i} |a_{ij}| \left|\frac{x_j}{x_i}\right| \] Taking the value of~$i$ for which $|x_i|$ is maximal, we find \[ (a_{ii}-\lambda) \leq \sum_{j\not=i} |a_{ij}|. \] This statement can be interpreted as follows:

The eigenvalue $\lambda$ is located in the circle around~$a_{ii}$ with radius $\sum_{j\not=i}|a_{ij}|$.

Since we do not know for which value of~$i$ $|x_i|$ is maximal, we can
only say that there is
*some*
value of~$i$ such that $\lambda$
lies in such a circle. This is the Gershgorin theorem.

Let $A$ be a square matrix, and let $D_i$ be the circle with center $a_{ii}$ and radius $\sum_{j\not=i}|a_{ij}|$, then the eigenvalues are contained in the union of circles~$\cup_i D_i$.

We can conclude that the eigenvalues are in the interior of these discs, if the constant vector is not an eigenvector.

crumb trail: > norms > Householder reflectors

In some contexts the question comes up how to transform one subspace into another.
*Householder reflectors*
a unit vector~$u$, and let
\[
H= I-2uu^t.
\]
For this matrix we have $Hu=-u$, and if $u\perp v$, then $Hv=v$.
In other words, the subspace of multiples of~$u$ is
flipped, and the orthogonal subspace stays invariant.

Now for the original problem of mapping one space into another. Let the original space be spanned by a vector~$x$ and the resulting by~$y$, then note that \[ \begin{cases} x = (x+y)/2 + (x-y)/2\\ y = (x+y)/2 - (x-y)/2 \end{cases} \]

In other words, we can map $x$ into~$y$ with the reflector based on $u=(x-y)/2$.

We can generalize Householder reflectors to a form \[ H=I-2uv^t. \] The matrices $L_i$ used in LU factorization (see section~ ) can then be seen to be of the form $L_i = I-\ell_ie_i^t$ where $e_i$~has a single one in the $i$-th location, and $\ell_i$~only has nonzero below that location. That form also makes it easy to see that $L_i\inv = I+\ell_ie_i^t$: \[ (I-uv^t)(I+uv^t) = I-uv^tuv^t = 0 \] if $v^tu=0$.