This seems kind of esoteric but it’s the basis for really simplifying some complicated mathematics, like the general theory of least squares. It’s well worth working through if you’re going to be doing the linear algebra version of least squares or multivariable optimization.
The two main results proved as theorems below:

$$\nabla(a^T x) = a$$

$$\nabla(x^T A x) = 2Ax$$

In the expressions below, $x$ is an $(n \times 1)$ column vector of variables, $a$ is an $(n \times 1)$ column vector of coefficients, and $A$ is a symmetric $(n \times n)$ matrix of coefficients with $a_{ij} = a_{ji}$ for all $i, j$. The gradient $\nabla$ is with respect to $x$.
Gradient of linear function

$$\nabla(a^T x) = a$$

Proof: To see this, just work out the expression in terms of the individual components.

$$a^T x = \sum_{i=1}^n a_i x_i$$

is a general linear function of $x$. The partial derivative with respect to $x_i$ is obviously $a_i$, since the $i$-th term is the only term in which $x_i$ appears. So the $i$-th element of the gradient is $a_i$, and the gradient as a whole is $a$.
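As a quick sanity check (not part of the original derivation), we can compare the identity $\nabla(a^T x) = a$ against a central finite-difference approximation of the gradient. The names `n`, `a`, `x0`, and `h` are illustrative choices:

```python
import numpy as np

# Numerically verify grad(a^T x) = a via central finite differences.
rng = np.random.default_rng(0)
n = 4
a = rng.standard_normal(n)
x0 = rng.standard_normal(n)

f = lambda x: a @ x  # the linear function a^T x

h = 1e-6
# i-th component: (f(x0 + h*e_i) - f(x0 - h*e_i)) / (2h)
grad = np.array([(f(x0 + h * e) - f(x0 - h * e)) / (2 * h) for e in np.eye(n)])

print(np.allclose(grad, a, atol=1e-6))  # True
```

Because $f$ is linear, the finite-difference quotient recovers $a_i$ exactly up to floating-point rounding.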
General quadratic function

$$x^T A x$$

Before examining its gradient, let's examine some properties of the expression $x^T A x$ and why it represents the most general quadratic function.

The most general quadratic function of the $x_i$ is a function containing all products $x_i x_j$ for $i < j$, in addition to all $x_i^2$. Every product with $i > j$ is a duplicate of a product with $i < j$, so we do not need those terms. In other words

$$f(x) = \sum_{i} b_{ii} x_i^2 + \sum_{i < j} b_{ij} x_i x_j.$$
Now let's examine the expression $x^T A x$ for an arbitrary (not necessarily symmetric) $A$. We show that we can always replace $A$ with a symmetric one.

First, note that $Ax$ is a column vector, so $x^T A x$ is a scalar. Thus it is equal to its own transpose, $(x^T A x)^T = x^T A^T x$. So

$$x^T A x = \tfrac{1}{2}\left(x^T A x + x^T A^T x\right) = x^T \left(\frac{A + A^T}{2}\right) x.$$

The matrix $A_s = \frac{1}{2}(A + A^T)$ is symmetric, since $A_s^T = \frac{1}{2}(A^T + A) = A_s$. Thus if you have an expression $x^T A x$ with a non-symmetric square matrix $A$, you can always replace $A$ with the symmetric $\frac{1}{2}(A + A^T)$.
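The symmetrization step can be demonstrated numerically; this sketch (my own check, not from the post) draws a random non-symmetric matrix and confirms the quadratic form is unchanged:

```python
import numpy as np

# Check that x^T A x = x^T A_s x where A_s = (A + A^T)/2.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))  # arbitrary, generally non-symmetric
x = rng.standard_normal(4)

A_s = (A + A.T) / 2

print(np.isclose(x @ A @ x, x @ A_s @ x))  # True: same quadratic form
print(np.allclose(A_s, A_s.T))             # True: A_s is symmetric
```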
Finally, let's expand $x^T A x$.

The $i$-th element of $Ax$ is $\sum_j a_{ij} x_j$, so

$$x^T A x = \sum_i x_i \sum_j a_{ij} x_j = \sum_i \sum_j a_{ij} x_i x_j,$$

which we can break into diagonal and off-diagonal terms

$$x^T A x = \sum_i a_{ii} x_i^2 + \sum_{i \ne j} a_{ij} x_i x_j.$$

Now note in the second sum that every combination of distinct $i$ and $j$ occurs twice, once with $i < j$ and once with $i > j$. For instance, we have a term $a_{12} x_1 x_2$ and a term $a_{21} x_2 x_1$, and because $A$ is symmetric, these are equal. So we can write

$$x^T A x = \sum_i a_{ii} x_i^2 + 2 \sum_{i < j} a_{ij} x_i x_j.$$
Comparing to the general quadratic function at the top of this section, we can see that any such expression can be written in the form $x^T A x$ with a symmetric $A$, with $a_{ii} = b_{ii}$ and $a_{ij} = a_{ji} = b_{ij}/2$ for $i < j$.
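The expansion above can also be checked numerically. This sketch (an illustration I've added, with `n`, `B`, `A`, `x` as arbitrary names) verifies that for a symmetric $A$, the matrix form $x^T A x$ matches the diagonal-plus-doubled-off-diagonal sum:

```python
import numpy as np

# For symmetric A, verify:
#   x^T A x = sum_i a_ii x_i^2 + 2 * sum_{i<j} a_ij x_i x_j
rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
A = (B + B.T) / 2  # build a symmetric test matrix
x = rng.standard_normal(n)

diag = sum(A[i, i] * x[i] ** 2 for i in range(n))
off = 2 * sum(A[i, j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))

print(np.isclose(x @ A @ x, diag + off))  # True
```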
Gradient of a general quadratic

$$\nabla(x^T A x) = 2Ax$$

And now we can attack our final gradient calculation.

Note that $\sum_i a_{ii} x_i^2 + 2\sum_{i<j} a_{ij} x_i x_j$ has exactly one term for each distinct combination of $i$ and $j$, i.e. each $x_i$ occurs in exactly one term with each $x_j$ with $j \ne i$. Thus the partial of this sum with respect to $x_i$ is

$$2 a_{ii} x_i + 2 \sum_{j \ne i} a_{ij} x_j = 2 \sum_j a_{ij} x_j.$$

So the $i$-th element of the gradient vector is

$$\frac{\partial}{\partial x_i}\left(x^T A x\right) = 2 \sum_j a_{ij} x_j,$$

and it follows that

$$\nabla(x^T A x) = 2Ax.$$
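To close the loop, here is a numerical check of the final result (again my own addition, not part of the original proof), comparing $2Ax$ against a central finite-difference gradient of $x^T A x$:

```python
import numpy as np

# Numerically verify grad(x^T A x) = 2 A x for symmetric A.
rng = np.random.default_rng(3)
n = 4
B = rng.standard_normal((n, n))
A = (B + B.T) / 2  # symmetric test matrix
x0 = rng.standard_normal(n)

f = lambda x: x @ A @ x

h = 1e-6
grad = np.array([(f(x0 + h * e) - f(x0 - h * e)) / (2 * h) for e in np.eye(n)])

print(np.allclose(grad, 2 * A @ x0, atol=1e-5))  # True
```

For a quadratic, the central difference is exact up to rounding, so the agreement is tight.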