Matrix calculus

This seems kind of esoteric but it’s the basis for really simplifying some complicated mathematics, like the general theory of least squares. It’s well worth working through if you’re going to be doing the linear algebra version of least squares or multivariable optimization.

The two main results proved as theorems below:

\nabla \bf{a^Tx} = \bf{a} \\ \nabla \bf{x^TQx} = \bf{Qx}

Definitions

In the expressions below, \mathbf{x} is an (n x 1) column vector of variables,

\bf{x} =  \begin{pmatrix}x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}

\mathbf{a} is an (n x 1) column vector of coefficients,

\bf{a} =  \begin{pmatrix}a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}

\mathbf{Q} is a symmetric (n x n) matrix of coefficients

\bf{Q} =  \begin{pmatrix}q_{11} & q_{12} & \cdots & q_{1n} \\  q_{21} & q_{22} & \cdots & q_{2n} \\  \vdots & \vdots & \cdots & \vdots \\  q_{n1} & q_{n2} & \cdots & q_{nn} \\  \end{pmatrix}

with q_{ij} = q_{ji} for all i, j and the gradient \nabla is with respect to \mathbf{x}

\nabla f(\bf{x}) =  \begin{pmatrix}\partial{f} / \partial{x_1} \\  \partial{f} / \partial{x_2} \\ \vdots \\  \partial{f} / \partial{x_n} \end{pmatrix}

Gradient of linear function

Theorem: \nabla \bf{a^Tx} = \bf{a}.

Proof: To see this, just work out the expression in terms of the individual components.

\bf{a^Tx} = \sum_{i=1}^n a_i x_i is a general linear function of \bf{x}. The partial derivative with respect to x_i is obviously a_i since the i-th term is the only term in which x_i appears.

Thus

\nabla \bf{a^Tx} =  \begin{pmatrix}a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \bf{a}

General quadratic function

Before examining its gradient, let’s examine some properties of the expression \nabla \bf{x^TQx} and why it represents the most general quadratic function.

The most general quadratic function of the \{x_i\} is a function containing all products x_i x_j for i < j in addition to all x_i^2. Every product x_i x_j with i > j is a duplicate of a product with i < j so we do not need those terms. In other words

f(x) = \sum_{i=1}^n a_i x_i^2 + \sum_{i < j} b_{ij} x_i x_j

Now let’s examine the expression \nabla \bf{x^TAx} for an arbitrary (not necessarily symmetric) A. We show that we can always replace it with a symmetric one.

First, note that \bf{Ax} is a column vector so \bf{x^T(Ax)} is a scalar. Thus it is equal to its own transpose, \bf{x^TAx} = \bf{x^TA^Tx}. So

\bf{x^TAx} = \frac{1}{2} \left ( \bf{x^TAx} + \bf{x^TA^Tx} \right ) \\[6 pt] = \frac{1}{2} \bf{x^T} (\bf{Ax} + \bf{A^Tx}) \\[6 pt] = \frac{1}{2} \bf{x^T} (\bf{A} + \bf{A^T}) \bf{x} \\[6 pt] = \bf{x^T} \left [ \frac{1}{2}(\bf{A} + \bf{A^T}) \right ] \bf{x}

The matrix \bf{Q} = \frac{1}{2}(\bf{A} + \bf{A^T}) is symmetric, \bf{Q^T} = \frac{1}{2}(\bf{A^T} + \bf{A}) = \bf{Q}. Thus if you have an expression \bf{x^TAx} with non-symmetric square matrix \bf{A} you can always replace \bf{A} with the symmetric \bf{Q}.

Finally, let’s expand \bf{x^TQx}.

The i-th element of \bf{Qx} is

\bf(Qx)_i = \sum_{j=1}^n q_{ij} x_j

So

\bf{x^TQx} = \sum_{i=1}^n x_i (\bf{Qx})_i \\[6 pt] = \sum_{i=1}^n \sum_{j=1}^n x_i q_{ij} x_j

which we can break into diagonal and off-diagonal terms

\bf{x^TQx} = \sum_{i=1}^n q_{ii} x_i x_i +  \sum_{i, j, i \neq j} q_{ij} x_i x_j

Now note in the second sum that every combination of distinct i and j occurs twice, once with q_{ij} x_i x_j and once with q_{ji} x_j x_i. For instance we have a term q_{23} x_2 x_3 and q_{32} x_3 x_2, and because \bf{Q} is symmetric, these are equal. So we can write

\bf{x^TQx} = \sum_{i=1}^n q_{ii} x_i^2 +  \sum_{i, j, i < j} 2 q_{ij} x_i x_j

Comparing to the general quadratic function at the top of this section, we can see that any such expression can be written in the form \bf{x^T Q x} with a symmetric \bf{Q}, with q_{ii} = a_i and q_{ij} = q_{ji} = (1/2)b_{ij}

Gradient of a general quadratic

And now we can attack our final gradient calculation.

Theorem: \nabla \bf{x^TQx} = \bf{2Qx}.

Proof:

Note that \sum_{i, j, i < j} 2 q_{ij} x_i x_j has exactly one term for each distinct combination of i and j, i.e. that each x_i occurs in exactly one term with each x_j with i != j. Thus the partial of this sum with respect to x_i is \sum_{j \neq i} 2q_{ij} x_j.

So the i-th element of the gradient vector is

\frac{\partial}{\partial {x_i}} \bf{x^TQx} =  \frac{\partial}{\partial {x_i}}\sum_{i=1}^n q_{ii} x_i^2 +   \frac{\partial}{\partial {x_i}}\sum_{i, j, i < j} 2 q_{ij} x_i x_j \\[6 pt] = 2 q_{ii} x_i + \sum_{j \neq i} 2 q_{ij} x_j \\[6 pt] = 2 \sum_{j=1}^n q_{ij} x_j \\[6 pt] = 2 (\bf{Qx})_i

and it follows that \nabla \bf{x^TQx} = \bf{2Qx}

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: