Notes on Linear Algebra


Several of my classes, especially Recommender Systems, make use of linear algebra concepts and notation. This document collects basic notes on linear algebra concepts and what they mean, to serve as a quick crash course or refresher, along with pointers for further study.

This document is relatively terse; for a more in-depth treatment, see the resources at the end of the page, particularly Linear Algebra Done Right.

Core Terms

The core structures in linear algebra are vectors and matrices.

Vectors

A vector is an ordered sequence of values, typically real numbers. In programming terms, we would represent it as an array (i.e., a 1-“dimensional” NumPy ndarray). We denote vectors either with the vector symbol $\vec{x}$ or as bold-face lowercase letters $\mathbf{x}$; to write out the content of a vector, we usually use angle brackets, e.g. $\mathbf{x} = \langle 1, 5, 0 \rangle$ or $\mathbf{x} = \langle x_1, x_2, \dots, x_n \rangle$. For a vector $\mathbf{x}$, we denote individual elements of the vector with a subscripted variable $x_i$, where $i$ is the position in the vector (starting from 1).

The number of values $n$ is called the dimension of the vector. To say that a vector is a real-valued vector with dimension $n$, we write $\mathbf{x} \in \mathbb{R}^n$.
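As a concrete sketch in NumPy (which this document already uses for its programming analogies), the example vector above and its dimension look like this:

```python
import numpy as np

# The example vector x = <1, 5, 0>
x = np.array([1.0, 5.0, 0.0])

# NumPy indexes from 0, so the mathematical x_1 is x[0] in code
first = x[0]   # 1.0
n = len(x)     # the vector's dimension, n = 3
```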

Matrices

A matrix is a grid of numbers (a 2D array, in programming terms). For example, the following is a $4 \times 4$ matrix $M \in \mathbb{R}^{4 \times 4}$:

$$M = \begin{bmatrix} 2 & 3 & 0 & 5 \\ 1 & 9 & -2 & 0 \\ 3 & 2 & -5 & 7 \\ -2 & 6 & 1 & 4 \end{bmatrix}$$

We conventionally use uppercase letters to denote matrices (and sets). Individual elements of the matrix are denoted with subscripted lowercase, with the row number coming first. For the matrix above, $m_{1,2} = 3$. When we are using variables as subscripts, we may omit the comma, as in $m_{ij}$.

The rows and columns of the matrix form vectors, which we also reference with subscripts while specifying whether we are referring to a row or column vector. For $M$, the column vector $\mathbf{m}_3 = \langle 0, -2, -5, 1 \rangle$, while the row vector $\mathbf{m}_1 = \langle 2, 3, 0, 5 \rangle$.

As with a vector, we denote the dimension of a matrix by using a superscript on the domain of the matrix values: a matrix $A$ with $m$ rows and $n$ columns of real numbers is denoted $A \in \mathbb{R}^{m \times n}$.
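In NumPy, the example matrix above, its elements, and its row and column vectors can be sketched as follows (remembering that code indices start at 0 while the math starts at 1):

```python
import numpy as np

# The 4x4 example matrix M
M = np.array([
    [ 2, 3,  0, 5],
    [ 1, 9, -2, 0],
    [ 3, 2, -5, 7],
    [-2, 6,  1, 4],
])

# Math subscripts are 1-based; NumPy is 0-based, so m_{1,2} is M[0, 1]
elem = M[0, 1]     # 3

row_1 = M[0, :]    # row vector m_1 = <2, 3, 0, 5>
col_3 = M[:, 2]    # column vector m_3 = <0, -2, -5, 1>
```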

There are a few special kinds of matrices. A diagonal matrix is a matrix that is zero everywhere except along the diagonal starting at the top left. That is, $i \ne j \Rightarrow m_{ij} = 0$; for example:

$$M = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 9 & 0 & 0 \\ 0 & 0 & -5 & 0 \\ 0 & 0 & 0 & 4 \end{bmatrix}$$

Diagonal matrices are usually square.

An identity matrix, usually denoted $I \in \mathbb{R}^{n \times n}$ or $\mathbb{1} \in \mathbb{R}^{n \times n}$, is a square diagonal matrix whose diagonal is all 1s.
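NumPy has constructors for both of these special matrices; as a quick sketch:

```python
import numpy as np

# A diagonal matrix built from its diagonal entries
D = np.diag([2, 9, -5, 4])

# The 4x4 identity matrix
I = np.eye(4)

# Multiplying by the identity leaves a matrix unchanged
same = I @ D
```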

Other Domains

Above, we have used the real numbers $\mathbb{R}$ as the domain, or the set of valid values for a matrix or vector. These are not the only valid values, though; in principle, any set can be used.

If we have a vector or matrix of binary values 0 and 1, we may write $\mathbf{x} \in \{0,1\}^n$. If our values are known to be in the range 0–1, we may write $\mathbf{x} \in [0,1]^n$ ($[a,b]$ is the standard notation for the closed interval from $a$ to $b$: $\{x \in \mathbb{R}: a \le x \le b\}$).

Term Cross-Reference

The terminological overlap between mathematical dimension (number of elements in a vector) and programming dimension (whether you have a 1D array, 2D matrix, 3D, etc.) is confusing. Vectors also have a mathematical length, which is different from the mathematical dimension (see Vector Norms). When programming languages talk about the length of an array or list, e.g. with the Python len(array) function, that is the dimension in linear algebra.

Dimension
  • In linear algebra, the number of elements in a vector or matrix.
  • In programming (including NumPy), the number of dimensions along which we can index an array, such as 1D for a vector, 2D for a matrix, and 3D for a tensor.
Length
  • In linear algebra, the $L_2$ norm of a vector (see Vector Norms).
  • In programming, the number of elements in an array, along a specific dimension (equivalent to the linear algebra dimension).

To reduce confusion, I rarely use the term “length” to refer to mathematical length, preferring “$L_2$ norm” or “Euclidean length”.
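The terminology split above shows up directly in NumPy; this sketch puts all three notions side by side:

```python
import numpy as np

x = np.array([1.0, 5.0, 0.0])

n = len(x)                   # 3: the linear-algebra "dimension"
nd = x.ndim                  # 1: the programming "dimension" (a 1D array)
length = np.linalg.norm(x)   # the mathematical "length" (L2 norm)
```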

Beyond Matrices

A tensor is a generalization of a matrix to more than 2 dimensions.

A scalar is a single value (not a vector or matrix).

Vector Operations

Vectors define several operations:

Addition

Two vectors can be added, denoted by $\mathbf{z} = \mathbf{x} + \mathbf{y}$. Both vectors must have the same dimension, and vector addition adds the corresponding elements of each vector to produce a new vector ($z_i = x_i + y_i$).

Vectors can also be added to a scalar, as in $\mathbf{z} = \mathbf{x} + y$, in which case $y$ is added to each element of $\mathbf{x}$.

Scalar multiplication

A vector $\mathbf{x}$ can be multiplied by a scalar $\alpha \in \mathbb{R}$, denoted $\mathbf{y} = \alpha\mathbf{x}$. This is defined by multiplying each element of the vector by the scalar, so $y_i = \alpha x_i$.

Addition and scalar multiplication combine to support subtraction ($\mathbf{x} - \mathbf{y}$).
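All of these operations map directly onto NumPy's elementwise arithmetic; a quick sketch with small example vectors:

```python
import numpy as np

x = np.array([1.0, 5.0, 0.0])
y = np.array([2.0, -1.0, 3.0])

z_add  = x + y     # elementwise addition: <3, 4, 3>
z_scal = x + 2.0   # scalar added to each element: <3, 7, 2>
z_mult = 3.0 * x   # scalar multiplication: <3, 15, 0>
z_sub  = x - y     # subtraction, i.e. x + (-1) * y: <-1, 6, -3>
```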

Inner product

The inner product or dot product of two vectors, denoted $\mathbf{x} \cdot \mathbf{y}$, is the sum of the products of the corresponding pairs of elements:

$$\mathbf{x} \cdot \mathbf{y} = \sum_i x_i y_i$$

Note that “multiplication” is not a supported operation for vectors. Array-oriented programming environments such as NumPy, R, and Matlab implement array multiplication arr1 * arr2 that multiplies the elements pairwise to produce a new vector of the same length, analogous to vector addition. This pairwise multiplication is a very useful operation in many contexts, but it is not a defined operator in linear algebra.
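This distinction between the inner product and NumPy's pairwise array multiplication can be seen directly; note that the dot product is just the sum of the pairwise products:

```python
import numpy as np

x = np.array([1.0, 5.0, 0.0])
y = np.array([2.0, -1.0, 3.0])

inner = np.dot(x, y)   # 1*2 + 5*(-1) + 0*3 = -3 (x @ y is equivalent for 1D arrays)

pairwise = x * y       # elementwise product <2, -5, 0> -- NOT the dot product
assert inner == pairwise.sum()
```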

Vector Norms

The norm of a vector is a measure of its length, in geometrical terms. The typical length is the L₂ norm, also called the Euclidean norm or Euclidean length (or sometimes Euclidean distance, which is just the length of a vector obtained by subtracting two other vectors), and is a multidimensional generalization of the Pythagorean theorem:

$$\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}$$

If you see a reference to the “norm” of a vector without further qualification, or the length operator $\|\mathbf{x}\|$, it is usually referring to the L₂ norm. In my writing and teaching, I try to explicitly name the norm to avoid confusion.

We will often see the square of the L₂ norm, denoted $\|\mathbf{x}\|_2^2$: this is just the sum of the squares of the vector’s elements, as the square has the effect of removing the square root. Further, $\|\mathbf{x}\|_2^2 = \mathbf{x} \cdot \mathbf{x}$.

A unit vector is a vector whose L₂ norm is 1 ($\|\mathbf{x}\|_2 = 1$). We can normalize a vector to be a unit vector by dividing it by the norm: $\hat{\mathbf{x}} = \frac{1}{\|\mathbf{x}\|_2} \mathbf{x}$.

Another common norm is the L₁ norm, also called the Manhattan distance. This is denoted $\|\mathbf{x}\|_1$, and is the sum of the absolute values: $\|\mathbf{x}\|_1 = \sum_i |x_i|$.

You may sometimes see reference to the $L_\infty$ norm, which is the maximum absolute value of the vector’s elements: $\|\mathbf{x}\|_\infty = \max_i |x_i|$.
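All of these norms, along with unit-vector normalization, are available through `np.linalg.norm`; a small sketch:

```python
import numpy as np

x = np.array([3.0, -4.0])

l2 = np.linalg.norm(x)            # sqrt(9 + 16) = 5 (the default is the L2 norm)
l2_sq = np.dot(x, x)              # 25, the squared L2 norm: x . x
l1 = np.linalg.norm(x, 1)         # |3| + |-4| = 7
linf = np.linalg.norm(x, np.inf)  # max(|3|, |-4|) = 4

x_hat = x / l2                    # unit vector in the direction of x
```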

Matrix Operations

Matrices also support addition and scalar multiplication, identically to vectors. In addition, they support the following operations:

Transpose
$A^\operatorname{T}$ denotes the transpose of matrix $A$, which just switches the rows and columns of $A$. That is, if $A \in \mathbb{R}^{m \times n}$ and $B = A^\operatorname{T}$, then $B \in \mathbb{R}^{n \times m}$ and $b_{ij} = a_{ji}$.
Multiplication

Unlike vectors, two matrices can be multiplied, provided their dimensions align. If $A \in \mathbb{R}^{m \times k}$ and $B \in \mathbb{R}^{k \times n}$ (so $A$ has the same number of columns as $B$ has rows), then the multiplication $C = AB$ yields a matrix $C \in \mathbb{R}^{m \times n}$ such that:

$$c_{ij} = \sum_{e=1}^{k} a_{ie} b_{ej}$$

This is the same as the inner product: $c_{ij}$ is the inner product of row $i$ of $A$ and column $j$ of $B$.

Note that unlike scalar multiplication, matrix multiplication is not commutative ($AB$ is not necessarily the same as $BA$, even if the matrices are square so the multiplication is defined). Multiplication also interacts with transpose, by reversing the matrices: $(AB)^\operatorname{T} = B^\operatorname{T} A^\operatorname{T}$.
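NumPy's `@` operator implements this matrix product; the sketch below also checks the transpose-reversal identity on a small example:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])       # 3x2
B = np.array([[1, 0, 2],
              [0, 1, 1]])    # 2x3

C = A @ B                    # 3x3 result; C[i,j] = row i of A . column j of B

# B @ A would be a 2x2 matrix -- a completely different shape and result,
# illustrating that matrix multiplication is not commutative.

# (AB)^T = B^T A^T
assert np.array_equal((A @ B).T, B.T @ A.T)
```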

Matrix-vector multiplication

You can also multiply a matrix by a vector. If $A \in \mathbb{R}^{m \times n}$ and $\mathbf{x} \in \mathbb{R}^n$, then the multiplication $\mathbf{y} = A\mathbf{x}$ is defined, and the result $\mathbf{y} \in \mathbb{R}^m$ is the same as multiplying $A$ by a matrix with a single column $\mathbf{x}$ ($y_i = \sum_j a_{ij} x_j$).

If $A \in \mathbb{R}^{m \times n}$ and $\mathbf{x} \in \mathbb{R}^m$ (the vector matches the number of rows), then instead you can pre-multiply the matrix by the vector, as in $\mathbf{y} = \mathbf{x} A$; in this case $\mathbf{y} \in \mathbb{R}^n$ and $y_j = \sum_i x_i a_{ij}$.
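Both products can be written with `@` in NumPy, which treats a 1D array as a column or row vector as needed:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2x3
x = np.array([1, 0, 2])      # in R^3, matching A's columns

y = A @ x                    # y in R^2: <1+0+6, 4+0+12> = <7, 16>

v = np.array([1, 1])         # in R^2, matching A's rows
w = v @ A                    # w in R^3: <5, 7, 9>
```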

Matrix inversion

The matrix inverse $A^{-1}$ is a matrix such that $A^{-1}A = \mathbb{1}$. Not all matrices have inverses; if the inverse of a matrix $A$ exists, then $A$ is called invertible, and if no inverse exists then it is singular.

While linear algebra packages include routines to compute matrix inverses, we very rarely use them. The primary use for a matrix inverse is to solve a system of equations, and there are usually better ways to solve the system (see Linear Systems).
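A small sketch of this advice in NumPy: `np.linalg.solve` solves $A\mathbf{x} = \mathbf{b}$ directly, which is generally faster and more numerically accurate than computing the inverse explicitly:

```python
import numpy as np

# The system 3x + y = 9, x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Preferred: solve the system directly
x = np.linalg.solve(A, b)

# Mathematically equivalent, but slower and less accurate in general
x_inv = np.linalg.inv(A) @ b

assert np.allclose(x, x_inv)
assert np.allclose(A @ x, b)
```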

Frobenius norm
The Frobenius norm of a matrix, denoted $\|A\|_F$, is the matrix version of the L₂ norm: the square root of the sum of the squares. Its square is $\|A\|_F^2 = \sum_i \sum_j a_{ij}^2$.
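In NumPy, `np.linalg.norm` computes the Frobenius norm by default when given a 2D array; a quick sketch:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

fro = np.linalg.norm(A)    # Frobenius norm (the default for matrices)
fro_sq = (A ** 2).sum()    # 1 + 4 + 9 + 16 = 30, the squared Frobenius norm

assert np.isclose(fro ** 2, fro_sq)
```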

Linear Systems

TBD.

Further Reading