Notes on Linear Algebra


Several of my classes, especially Recommender Systems, make use of linear algebra concepts and notation. This document collects basic notes on linear algebra concepts and what they mean, to serve as a quick crash course or refresher, along with pointers for further study.

This document is relatively terse; for a more in-depth treatment, see the resources at the end of the page, particularly Linear Algebra Done Right.

Core Terms

The core structures in linear algebra are vectors and matrices.

Vectors

A vector is an ordered sequence of values, typically real numbers. In programming terms, we would represent it as an array (i.e., a 1-“dimensional” NumPy ndarray). We denote vectors either with the vector symbol $\vec{x}$ or as bold-face lowercase letters $\mathbf{x}$; to write out the content of a vector, we usually use angle brackets, e.g. $\mathbf{x} = \langle 1, 5, 0 \rangle$ or $\mathbf{x} = \langle x_1, x_2, \dots, x_n \rangle$. For a vector $\mathbf{x}$, we denote individual elements of the vector with a subscripted variable $x_i$, where $i$ is the position in the vector (starting from 1).

The number of values $n$ is called the dimension of the vector. To say that a vector is a real-valued vector with dimension $n$, we write $\mathbf{x} \in \mathbb{R}^n$.
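As a concrete sketch in NumPy (which this document already uses for its programming analogies), the example vector above and its dimension look like this:

```python
import numpy as np

# The example vector x = <1, 5, 0>
x = np.array([1.0, 5.0, 0.0])

# NumPy indexes from 0, so the mathematical x_1 is x[0] in code
first = x[0]   # 1.0
n = len(x)     # the vector's dimension, n = 3
```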

Matrices

A matrix is a grid of numbers (a 2D array, in programming terms). For example, the following is a $4 \times 4$ matrix $M \in \mathbb{R}^{4 \times 4}$:

$$M = \begin{bmatrix} 2 & 3 & 0 & 5 \\ 1 & 9 & -2 & 0 \\ 3 & 2 & -5 & 7 \\ -2 & 6 & 1 & 4 \end{bmatrix}$$

We conventionally use uppercase letters to denote matrices (and sets). Individual elements of the matrix are denoted with subscripted lowercase, with the row number coming first. For the matrix above, $m_{1,2} = 3$. When we are using variables as subscripts, we may omit the comma, as in $m_{ij}$.

The rows and columns of the matrix form vectors, which we also reference with subscripts while specifying whether we are referring to a row or column vector. For $M$, the column vector $\mathbf{m}_3 = \langle 0, -2, -5, 1 \rangle$, while the row vector $\mathbf{m}_1 = \langle 2, 3, 0, 5 \rangle$.

As with a vector, we denote the dimension of a matrix by using a superscript on the domain of the matrix values: a matrix $A$ with $m$ rows and $n$ columns of real numbers is denoted $A \in \mathbb{R}^{m \times n}$.
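In NumPy, the example matrix above, its elements, and its row and column vectors can be sketched as follows (remembering that code indices start at 0 while the math starts at 1):

```python
import numpy as np

# The 4x4 example matrix M
M = np.array([
    [ 2, 3,  0, 5],
    [ 1, 9, -2, 0],
    [ 3, 2, -5, 7],
    [-2, 6,  1, 4],
])

# Math subscripts are 1-based; NumPy is 0-based, so m_{1,2} is M[0, 1]
elem = M[0, 1]     # 3

row_1 = M[0, :]    # row vector m_1 = <2, 3, 0, 5>
col_3 = M[:, 2]    # column vector m_3 = <0, -2, -5, 1>
```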

There are a few special kinds of matrices. A diagonal matrix is a matrix that is zero everywhere except along the diagonal starting at the top left. That is, $i \ne j \Rightarrow m_{ij} = 0$; for example:

$$M = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 9 & 0 & 0 \\ 0 & 0 & -5 & 0 \\ 0 & 0 & 0 & 4 \end{bmatrix}$$

Diagonal matrices are usually square.

An identity matrix, usually denoted $I \in \mathbb{R}^{n \times n}$ or $\mathbb{1} \in \mathbb{R}^{n \times n}$, is a square diagonal matrix whose diagonal is all 1s.
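NumPy has constructors for both of these special matrices; as a quick sketch:

```python
import numpy as np

# A diagonal matrix built from its diagonal entries
D = np.diag([2, 9, -5, 4])

# The 4x4 identity matrix
I = np.eye(4)

# Multiplying by the identity leaves a matrix unchanged
same = I @ D
```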

Other Domains

Above, we have used the real numbers $\mathbb{R}$ as the domain, or the set of valid values for a matrix or vector. These are not the only valid values, though; in principle, any set can be used.

If we have a vector or matrix of binary values 0 and 1, we may write $\mathbf{x} \in \{0,1\}^n$. If our values are known to be in the range 0–1, we may write $\mathbf{x} \in [0,1]^n$ ($[a,b]$ is the standard notation for the closed interval from $a$ to $b$: $\{x \in \mathbb{R}: a \le x \le b\}$).

Term Cross-Reference

The terminological overlap between mathematical dimension (number of elements in a vector) and programming dimension (whether you have a 1D array, 2D matrix, 3D, etc.) is confusing. Vectors also have a mathematical length, which is different from the mathematical dimension (see Vector Norms). When programming languages talk about the length of an array or list, e.g. with the Python len(array) function, that is the dimension in linear algebra.

Dimension
  • In linear algebra, the number of elements in a vector or matrix.
  • In programming (including NumPy), the number of dimensions along which we can index an array, such as 1D for a vector, 2D for a matrix, and 3D for a tensor.
Length
  • In linear algebra, the $L_2$ norm of a vector (see Vector Norms).
  • In programming, the number of elements in an array, along a specific dimension (equivalent to the linear algebra dimension).

To reduce confusion, I rarely use the term “length” to refer to mathematical length, preferring “$L_2$ norm” or “Euclidean length”.
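The terminology split above shows up directly in NumPy; this sketch puts all three notions side by side:

```python
import numpy as np

x = np.array([1.0, 5.0, 0.0])

n = len(x)                   # 3: the linear-algebra "dimension"
nd = x.ndim                  # 1: the programming "dimension" (a 1D array)
length = np.linalg.norm(x)   # the mathematical "length" (L2 norm)
```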

Beyond Matrices

A tensor is a generalization of a matrix to more than 2 dimensions.

A scalar is a single value (not a vector or matrix).

Vector Operations

Vectors define several operations:

Addition

Two vectors can be added, denoted by $\mathbf{z} = \mathbf{x} + \mathbf{y}$. Both vectors must have the same dimension, and vector addition adds the corresponding elements of each vector to produce a new vector ($z_i = x_i + y_i$).

Vectors can also be added to a scalar, as in $\mathbf{z} = \mathbf{x} + y$, in which case $y$ is added to each element of $\mathbf{x}$.

Scalar multiplication

A vector $\mathbf{x}$ can be multiplied by a scalar $\alpha \in \mathbb{R}$, denoted $\mathbf{y} = \alpha\mathbf{x}$. This is defined by multiplying each element of the vector by the scalar, so $y_i = \alpha x_i$.

Addition and scalar multiplication combine to support subtraction ($\mathbf{x} - \mathbf{y}$).
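All of these operations map directly onto NumPy's elementwise arithmetic; a quick sketch with small example vectors:

```python
import numpy as np

x = np.array([1.0, 5.0, 0.0])
y = np.array([2.0, -1.0, 3.0])

z_add  = x + y     # elementwise addition: <3, 4, 3>
z_scal = x + 2.0   # scalar added to each element: <3, 7, 2>
z_mult = 3.0 * x   # scalar multiplication: <3, 15, 0>
z_sub  = x - y     # subtraction, i.e. x + (-1) * y: <-1, 6, -3>
```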

Inner product

The inner product or dot product of two vectors, denoted $\mathbf{x} \cdot \mathbf{y}$, is the sum of the products of the corresponding pairs of elements:

$$\mathbf{x} \cdot \mathbf{y} = \sum_i x_i y_i$$

Note that “multiplication” is not a supported operation for vectors. Array-oriented programming environments such as NumPy, R, and Matlab implement array multiplication arr1 * arr2 that multiplies the elements pairwise to produce a new vector of the same length, analogous to vector addition. This pairwise multiplication is a very useful operation in many contexts, but it is not a defined operator in linear algebra.
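This distinction between the inner product and NumPy's pairwise array multiplication can be seen directly; note that the dot product is just the sum of the pairwise products:

```python
import numpy as np

x = np.array([1.0, 5.0, 0.0])
y = np.array([2.0, -1.0, 3.0])

inner = np.dot(x, y)   # 1*2 + 5*(-1) + 0*3 = -3 (x @ y is equivalent for 1D arrays)

pairwise = x * y       # elementwise product <2, -5, 0> -- NOT the dot product
assert inner == pairwise.sum()
```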

Vector Norms

The norm of a vector is a measure of its length, in geometrical terms. The typical length is the L₂ norm, also called the Euclidean norm or Euclidean length (or sometimes Euclidean distance, which is just the length of a vector obtained by subtracting two other vectors), and is a multidimensional generalization of the Pythagorean theorem:

$$\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}$$

If you see a reference to the “norm” of a vector without further qualification, or the length operator $\|\mathbf{x}\|$, it is usually referring to the L₂ norm. In my writing and teaching, I try to explicitly name the norm to avoid confusion.

We will often see the square of the L₂ norm, denoted $\|\mathbf{x}\|_2^2$: this is just the sum of the squares of the vector’s elements, as the square has the effect of removing the square root. Further, $\|\mathbf{x}\|_2^2 = \mathbf{x} \cdot \mathbf{x}$.

A unit vector is a vector whose L₂ norm is 1 ($\|\mathbf{x}\|_2 = 1$). We can normalize a vector to be a unit vector by dividing it by the norm: $\hat{\mathbf{x}} = \frac{1}{\|\mathbf{x}\|_2} \mathbf{x}$.

Another common norm is the L₁ norm, also called the Manhattan distance. This is denoted $\|\mathbf{x}\|_1$, and is the sum of the absolute values: $\|\mathbf{x}\|_1 = \sum_i |x_i|$.

You may sometimes see reference to the $L_\infty$ norm, which is the maximum absolute value of the vector’s elements: $\|\mathbf{x}\|_\infty = \max_i |x_i|$.
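All of these norms, along with unit-vector normalization, are available through `np.linalg.norm`; a small sketch:

```python
import numpy as np

x = np.array([3.0, -4.0])

l2 = np.linalg.norm(x)            # sqrt(9 + 16) = 5 (the default is the L2 norm)
l2_sq = np.dot(x, x)              # 25, the squared L2 norm: x . x
l1 = np.linalg.norm(x, 1)         # |3| + |-4| = 7
linf = np.linalg.norm(x, np.inf)  # max(|3|, |-4|) = 4

x_hat = x / l2                    # unit vector in the direction of x
```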

Matrix Operations

Matrices also support addition and scalar multiplication, identically to vectors. In addition, they support the following operations:

Transpose
$A^\operatorname{T}$ denotes the transpose of matrix $A$, which just switches the rows and columns of $A$. That is, if $A \in \mathbb{R}^{m \times n}$ and $B = A^\operatorname{T}$, then $B \in \mathbb{R}^{n \times m}$ and $b_{ij} = a_{ji}$.
Multiplication

Unlike vectors, two matrices can be multiplied, provided their dimensions align. If $A \in \mathbb{R}^{m \times k}$ and $B \in \mathbb{R}^{k \times n}$ (so $A$ has the same number of columns as $B$ has rows), then the multiplication $C = AB$ yields a matrix $C \in \mathbb{R}^{m \times n}$ such that:

$$c_{ij} = \sum_{e=1}^{k} a_{ie} b_{ej}$$

This is the same as the inner product: $c_{ij}$ is the inner product of row $i$ of $A$ and column $j$ of $B$.

Note that unlike scalar multiplication, matrix multiplication is not commutative ($AB$ is not necessarily the same as $BA$, even if the matrices are square so the multiplication is defined). Multiplication also interacts with transpose, by reversing the matrices: $(AB)^\operatorname{T} = B^\operatorname{T} A^\operatorname{T}$.
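NumPy's `@` operator implements this matrix product; the sketch below also checks the transpose-reversal identity on a small example:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])       # 3x2
B = np.array([[1, 0, 2],
              [0, 1, 1]])    # 2x3

C = A @ B                    # 3x3 result; C[i,j] = row i of A . column j of B

# B @ A would be a 2x2 matrix -- a completely different shape and result,
# illustrating that matrix multiplication is not commutative.

# (AB)^T = B^T A^T
assert np.array_equal((A @ B).T, B.T @ A.T)
```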

Matrix-vector multiplication

You can also multiply a matrix by a vector. If $A \in \mathbb{R}^{m \times n}$ and $\mathbf{x} \in \mathbb{R}^n$, then the multiplication $\mathbf{y} = A\mathbf{x}$ is defined, and the result $\mathbf{y} \in \mathbb{R}^m$ is the same as multiplying $A$ by a matrix with a single column $\mathbf{x}$ ($y_i = \sum_j a_{ij} x_j$).

If $A \in \mathbb{R}^{m \times n}$ and $\mathbf{x} \in \mathbb{R}^m$ (the vector matches the number of rows), then instead you can pre-multiply the matrix by the vector, as in $\mathbf{y} = \mathbf{x} A$; in this case $\mathbf{y} \in \mathbb{R}^n$ and $y_j = \sum_i x_i a_{ij}$.
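Both products can be written with `@` in NumPy, which treats a 1D array as a column or row vector as needed:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2x3
x = np.array([1, 0, 2])      # in R^3, matching A's columns

y = A @ x                    # y in R^2: <1+0+6, 4+0+12> = <7, 16>

v = np.array([1, 1])         # in R^2, matching A's rows
w = v @ A                    # w in R^3: <5, 7, 9>
```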

Matrix inversion

The matrix inverse $A^{-1}$ is a matrix such that $A^{-1}A = \mathbb{1}$. Not all matrices have inverses; if the inverse of a matrix $A$ exists, then $A$ is called invertible, and if no inverse exists then it is singular.

While linear algebra packages include routines to compute matrix inverses, we very rarely use them. The primary use for a matrix inverse is to solve a system of equations, and there are usually better ways to solve the system (see Linear Systems).
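A small sketch of this advice in NumPy: `np.linalg.solve` solves $A\mathbf{x} = \mathbf{b}$ directly, which is generally faster and more numerically accurate than computing the inverse explicitly:

```python
import numpy as np

# The system 3x + y = 9, x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Preferred: solve the system directly
x = np.linalg.solve(A, b)

# Mathematically equivalent, but slower and less accurate in general
x_inv = np.linalg.inv(A) @ b

assert np.allclose(x, x_inv)
assert np.allclose(A @ x, b)
```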

Frobenius norm
The Frobenius norm of a matrix, denoted $\|A\|_F$, is the matrix version of the L₂ norm: the square root of the sum of the squares. Its square is $\|A\|_F^2 = \sum_i \sum_j a_{ij}^2$.
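In NumPy, `np.linalg.norm` computes the Frobenius norm by default when given a 2D array; a quick sketch:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

fro = np.linalg.norm(A)    # Frobenius norm (the default for matrices)
fro_sq = (A ** 2).sum()    # 1 + 4 + 9 + 16 = 30, the squared Frobenius norm

assert np.isclose(fro ** 2, fro_sq)
```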

Linear Systems

TBD.

Further Reading