This seems kind of esoteric but it’s the basis for really simplifying some complicated mathematics, like the general theory of least squares. It’s well worth working through if you’re going to be doing the linear algebra version of least squares or multivariable optimization.
The two main results, proved as theorems below, are $\nabla\left(a^T x\right) = a$ and $\nabla\left(x^T A x\right) = 2Ax$ (for symmetric $A$).
Definitions
In the expressions below, $x$ is an $(n \times 1)$ column vector of variables, $a$ is an $(n \times 1)$ column vector of coefficients, and $A$ is a symmetric $(n \times n)$ matrix of coefficients, with $A_{ij} = A_{ji}$ for all $i, j$. The gradient

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)^T$$

is with respect to $x$.
Gradient of linear function
Theorem: $\nabla\left(a^T x\right) = a$.
Proof: To see this, just work out the expression in terms of the individual components.
$a^T x = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$ is a general linear function of $x$. The partial derivative with respect to $x_i$ is obviously $a_i$, since the $i$-th term is the only term in which $x_i$ appears. Thus

$$\nabla\left(a^T x\right) = (a_1, a_2, \ldots, a_n)^T = a.$$
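If you’d like a numerical sanity check of this result, here’s a minimal sketch in NumPy: it compares a central finite-difference gradient of $f(x) = a^T x$ against $a$. The dimension $n = 5$ and the random vectors are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
a = rng.standard_normal(n)
x = rng.standard_normal(n)

def f(x):
    return a @ x  # a^T x

# Central finite differences: (f(x + h e_i) - f(x - h e_i)) / (2h)
h = 1e-6
grad = np.zeros(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = h
    grad[i] = (f(x + e) - f(x - e)) / (2 * h)

print(np.allclose(grad, a))  # True: the gradient of a^T x is a
```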
General quadratic function
Before examining its gradient, let’s examine some properties of the expression $x^T A x$ and why it represents the most general quadratic function.
The most general quadratic function of the $x_i$ is a function containing all products $x_i x_j$ for $i < j$, in addition to all squares $x_i^2$. Every product $x_i x_j$ with $i > j$ is a duplicate of a product with $i < j$, so we do not need those terms. In other words,

$$f(x) = \sum_{i=1}^{n} c_{ii} x_i^2 + \sum_{i < j} c_{ij} x_i x_j.$$
Now let’s examine the expression $x^T A x$ for an arbitrary (not necessarily symmetric) $A$. We show that we can always replace $A$ with a symmetric one.
First, note that $Ax$ is a column vector, so $x^T A x$ is a scalar. Thus it is equal to its own transpose, $\left(x^T A x\right)^T = x^T A^T x$. So

$$x^T A x = \frac{1}{2}\left(x^T A x + x^T A^T x\right) = x^T \left(\frac{A + A^T}{2}\right) x.$$

The matrix $\frac{1}{2}\left(A + A^T\right)$ is symmetric, $\left(\frac{1}{2}\left(A + A^T\right)\right)^T = \frac{1}{2}\left(A^T + A\right)$. Thus if you have an expression $x^T A x$ with non-symmetric square matrix $A$, you can always replace $A$ with the symmetric $\frac{1}{2}\left(A + A^T\right)$.
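Here’s a quick numerical illustration of the symmetrization step, as a sketch: for a random non-symmetric $A$, the quadratic form is unchanged when $A$ is replaced by $\frac{1}{2}\left(A + A^T\right)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))  # generally not symmetric
S = (A + A.T) / 2                # its symmetric part

x = rng.standard_normal(n)
print(np.isclose(x @ A @ x, x @ S @ x))  # True: same quadratic form
print(np.allclose(S, S.T))               # True: S is symmetric
```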
Finally, let’s expand $x^T A x$. The $i$-th element of $Ax$ is $\sum_{j=1}^{n} A_{ij} x_j$. So

$$x^T A x = \sum_{i=1}^{n} x_i \sum_{j=1}^{n} A_{ij} x_j = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j,$$

which we can break into diagonal and off-diagonal terms:

$$x^T A x = \sum_{i=1}^{n} A_{ii} x_i^2 + \sum_{i \neq j} A_{ij} x_i x_j.$$

Now note in the second sum that every combination of distinct $i$ and $j$ occurs twice, once with $i < j$ and once with $i > j$. For instance we have a term $A_{12} x_1 x_2$ and a term $A_{21} x_2 x_1$, and because $A$ is symmetric, these are equal. So we can write

$$x^T A x = \sum_{i=1}^{n} A_{ii} x_i^2 + 2 \sum_{i < j} A_{ij} x_i x_j.$$
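A short sketch confirming this expansion numerically, for a random symmetric $A$ (the dimension and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
A = (B + B.T) / 2  # a random symmetric matrix
x = rng.standard_normal(n)

# Diagonal terms and off-diagonal terms with i < j
diag = sum(A[i, i] * x[i] ** 2 for i in range(n))
offdiag = sum(A[i, j] * x[i] * x[j]
              for i in range(n) for j in range(i + 1, n))

print(np.isclose(x @ A @ x, diag + 2 * offdiag))  # True
```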
Comparing to the general quadratic function at the top of this section, we can see that any such expression can be written in the form $x^T A x$ with a symmetric $A$, with $A_{ii} = c_{ii}$ and $A_{ij} = A_{ji} = c_{ij}/2$ for $i \neq j$.
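To make the coefficient mapping concrete, here’s a sketch (the array `c` of coefficients is an illustrative name): build the symmetric $A$ from the $c_{ij}$ as above and check that $x^T A x$ reproduces the general quadratic.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
c = np.triu(rng.standard_normal((n, n)))  # coefficients c_ij for i <= j only

# A_ii = c_ii on the diagonal; A_ij = A_ji = c_ij / 2 off the diagonal
A = np.diag(np.diag(c)) + (np.triu(c, 1) + np.triu(c, 1).T) / 2

x = rng.standard_normal(n)
f = sum(c[i, i] * x[i] ** 2 for i in range(n)) + \
    sum(c[i, j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))

print(np.isclose(f, x @ A @ x))  # True: x^T A x is the general quadratic
```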
Gradient of a general quadratic
And now we can attack our final gradient calculation.
Theorem: $\nabla\left(x^T A x\right) = 2Ax$.
Proof:
Note that $\sum_{i} A_{ii} x_i^2 + 2 \sum_{i < j} A_{ij} x_i x_j$ has exactly one term for each distinct combination of $i$ and $j$, i.e. that each $x_i$ occurs in exactly one term with each $x_j$ with $j \neq i$. Thus the partial of this sum with respect to $x_i$ is

$$\frac{\partial}{\partial x_i}\left(x^T A x\right) = 2 A_{ii} x_i + 2 \sum_{j \neq i} A_{ij} x_j = 2 \sum_{j=1}^{n} A_{ij} x_j.$$
So the $i$-th element of the gradient vector is $2 \sum_{j=1}^{n} A_{ij} x_j$, which is twice the $i$-th element of $Ax$, and it follows that $\nabla\left(x^T A x\right) = 2Ax$.
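And a final numerical sanity check, in the same sketch style as before: the finite-difference gradient of $f(x) = x^T A x$ matches $2Ax$ for a random symmetric $A$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2  # symmetric, as the theorem requires
x = rng.standard_normal(n)

def f(x):
    return x @ A @ x

# Central finite differences along each standard basis vector e
h = 1e-6
grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                 for e in np.eye(n)])

print(np.allclose(grad, 2 * A @ x))  # True: gradient of x^T A x is 2Ax
```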