

Covariant Differentiation

With a notion of the gradient and a notion of movement that respects the constraints of the manifold, one can already begin to optimize: a steepest descent method could be implemented from these two ingredients alone. To carry out more sophisticated optimizations, however, one usually wants second-derivative information about the function.

In particular, one might wish to know by how much the gradient changes if one moves from $Y$ to $Y+\epsilon H$. On a manifold this can be a surprisingly difficult question to answer. Technically, the gradient at $Y$ is a member of $T_Y(\mbox{Stief}(n,k))$, while the gradient at $Y+\epsilon H$ is a member of $T_{Y+\epsilon H}(\mbox{Stief}(n,k))$. Taking their difference would work fine in a flat space (see Figure 9.8), but on a curved space it could produce a vector which is not a member of the tangent space at either point (see Figure 9.9).

Figure 9.8: In a flat space, comparing vectors at nearby points is not problematic since all vectors lie in the same tangent space.

Figure 9.9: In a curved manifold, comparing vectors at nearby points can result in vectors which do not lie in the tangent space.

A more sophisticated means of taking this difference is to first translate the gradient at $Y+\epsilon H$ back to $Y$ in a parallel fashion, and then compare the two gradients within the same tangent space. One can check that for $V \in T_{Y+\epsilon H}(\mbox{Stief}(n,k))$ the rule

\begin{displaymath}V \rightarrow V + \epsilon \Gamma(V,H),\end{displaymath}

where $\Gamma(V,H)$ is the Levi-Civita connection, takes $V$ to an element of $T_{Y}(\mbox{Stief}(n,k))$ and preserves inner product information (to first order in $\epsilon$). This is the standard rule for parallel transport, which can be found in the usual literature ([80,459,273,222], and others).
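
To make the transport rule concrete, here is a minimal numerical sketch, assuming the embedded (Euclidean) metric on $\mbox{Stief}(n,k)$: for a submanifold with the induced metric, the Gauss formula implies that orthogonal projection onto $T_Y$ agrees with parallel transport to first order in $\epsilon$. The helper names sym and stiefel_proj are illustrative, not part of the accompanying software.

\begin{verbatim}
# Numerical check of the transport rule V -> V + eps*Gamma(V,H), assuming
# the embedded (Euclidean) metric on Stief(n,k).  Projection onto T_Y
# agrees with parallel transport to first order in eps.
import numpy as np

def sym(A):
    """Symmetric part of a square matrix."""
    return (A + A.T) / 2

def stiefel_proj(Y, Z):
    """Project an ambient n-by-k matrix Z onto T_Y(Stief(n,k)); tangent
    vectors D are characterized by sym(Y^T D) = 0."""
    return Z - Y @ sym(Y.T @ Z)

rng = np.random.default_rng(0)
n, k, eps = 6, 2, 1e-4
Y, _ = np.linalg.qr(rng.standard_normal((n, k)))    # point on Stief(n,k)
H = stiefel_proj(Y, rng.standard_normal((n, k)))    # direction H in T_Y
Yp, _ = np.linalg.qr(Y + eps * H)                   # nearby point Y + eps*H
V = stiefel_proj(Yp, rng.standard_normal((n, k)))   # V in tangent space at Yp
Vt = stiefel_proj(Y, V)                             # transported vector at Y
print(np.linalg.norm(sym(Y.T @ Vt)))                # ~1e-16: Vt lies in T_Y
\end{verbatim}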

Using this transport to compare nearby vectors, one obtains the following rule for differentiating vector fields:

\begin{displaymath}D_H G = \frac{d}{ds} G(Y+sH) \vert _{s=0} + \Gamma(G,H),\end{displaymath}

where $G$ is any vector field (though here we are interested only in derivatives of the gradient field). This is the operation implemented by dgrad in the software.
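
The sketch below mimics this rule by finite differences. It is an illustrative stand-in, not the dgrad routine itself: again assuming the embedded metric, the Gauss formula lets the $\Gamma(G,H)$ term be absorbed by a tangent projection at $Y$. It continues from the previous sketch, reusing sym, stiefel_proj, and the variables defined there.

\begin{verbatim}
# Finite-difference stand-in for the covariant derivative D_H G (NOT the
# package routine dgrad).  Under the embedded-metric assumption, projecting
# the ambient directional derivative onto T_Y supplies the Gamma(G,H) term.
def dgrad_fd(G, Y, H, s=1e-6):
    """Approximate D_H G at Y: difference the vector field G along H in the
    ambient space, then project back onto T_Y(Stief(n,k))."""
    dG = (G(Y + s * H) - G(Y - s * H)) / (2 * s)  # ambient d/ds G(Y+sH)|_{s=0}
    return stiefel_proj(Y, dG)                    # absorbs the Gamma(G,H) term

# Example: the gradient field of f(Y) = (1/2) tr(Y^T A Y) under the embedded
# metric is the tangent projection of the ambient gradient A Y.
A = sym(rng.standard_normal((n, n)))
G = lambda Y: stiefel_proj(Y, A @ Y)
DHG = dgrad_fd(G, Y, H)
print(np.linalg.norm(sym(Y.T @ DHG)))             # ~1e-16: D_H G lies in T_Y
\end{verbatim}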

In an unconstrained minimization, the derivative of the gradient $g = \nabla f$ along a vector $\vec{h}$ is the Hessian $[\frac{\partial^2 f}{\partial x_i \partial x_j}]$ applied to $\vec{h}$. Covariantly, we then have the analogy,

\begin{displaymath}
\left[\frac{\partial^2 f}{\partial x_i \partial x_j} \right] \vec{h} =
(\vec{h} \cdot \nabla) g
\sim D_H G.\end{displaymath}
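
As a sanity check of this analogy, in flat space the connection term vanishes and $D_H G$ reduces to the ordinary Hessian-vector product: for a quadratic $f(x) = \frac{1}{2}\vec{x}^T A \vec{x}$ one has $g(x) = A\vec{x}$, so differencing $g$ along $\vec{h}$ recovers $A\vec{h}$. A minimal check:

\begin{verbatim}
# Flat-space sanity check: with Gamma = 0, D_h g is the Hessian-vector
# product.  For f(x) = (1/2) x^T A x with A symmetric, g(x) = A x, so the
# directional derivative of g along h is exactly A h.
import numpy as np
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 2   # symmetric Hessian
g = lambda x: A @ x                                  # gradient field of f
x, h, s = rng.standard_normal(5), rng.standard_normal(5), 1e-6
Dhg = (g(x + s * h) - g(x - s * h)) / (2 * s)        # flat D_h g
print(np.linalg.norm(Dhg - A @ h))                   # ~0
\end{verbatim}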

