Introduction to Optimization on Manifolds/Lie-groups

Why write this article?

There are many good resources about optimization on Manifolds/Lie-groups. They provide good technical details. However, people (including me) can easily get lost in those details.

I’d like to give a simple example of doing least squares on Manifolds to demonstrate

  1. How to do it,
  2. Why it is better.

What’s in this article?

  • A discussion of 3D rotation (SO3) parameterization.
  • An optimization problem with SO3 parameters.
    • \(\text{cost}(R) = ||Log(R_{target}^T R)||^2\)
    • The Lie-group/Manifold parameterization of this problem.
    • Prove the residual is linear, so the problem can be solved in 1 Gauss-Newton step.
  • Implement the solver in Python

Why do optimization on Manifolds/Lie-groups?

Because it is a better parameterization of some optimization problems. For example, optimization problems with rotations can be solved faster and more cleanly with a Lie-group parameterization.

(Loosely speaking, better means fewer optimization steps and shorter optimization time.)

Optimization with Rotation Parameter

Consider an optimization problem where rotation is the parameter to optimize,

\(\text{cost}(R), R \in SO(3)\)

SO(3) is the collection of all rotation matrices in 3d.

\(R \in SO(3) \; \text{iff} \; R^T R = \text{Identity} \; \text{and} \; \text{det}(R) = 1\)

How to minimize the cost while keeping the parameter as a rotation matrix?

Solution 1, 3×3 matrix + cost

Solution 1 is to treat R as a 3 by 3 matrix, and add a cost \(||R^T R - I||^2\) to make R close to SO(3).

\(\text{cost1}(R) = \text{cost}(R) + ||R^T R - I||^2\)

where \(R\) is not constrained.
It works, but it is inefficient, and we need extra steps (e.g. projecting back to the nearest rotation matrix) to make \(R \in SO(3)\).
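
A minimal sketch of this penalized cost in Python (assuming a generic, problem-specific cost(R) callable; the penalty weight is an arbitrary choice here):

import numpy as np

def cost1(R, cost, penalty_weight=1.0):
    # Original cost plus a penalty pushing R toward an orthogonal matrix.
    # Note: this does not enforce det(R) = 1, and R can still drift off SO(3)
    # during optimization, so a projection step is usually needed afterwards.
    orthogonality_error = R.T @ R - np.identity(3)
    return cost(R) + penalty_weight * np.sum(orthogonality_error ** 2)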

This paper used this method:
Simultaneous Robot-World and Hand-Eye Calibration
Fadi Dornaika, Radu Horaud
https://hal.inria.fr/inria-00590087/document

Solution 2, yaw-pitch-roll

Solution 2 is to parameterize \(R\) using (yaw, pitch, roll) in an Euler angle order.

So \(R\) is a function of (yaw, pitch, roll),

\(R(yaw, pitch, roll) = \text{euler-angle-to-rotation-mat}(yaw, pitch, roll)\).

For a detailed definition of (yaw, pitch, roll), check this: http://planning.cs.uiuc.edu/node102.html
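
As a concrete sketch, one way to implement euler-angle-to-rotation-mat with SciPy is shown below (the 'ZYX' intrinsic order is my assumption; adjust it to match the convention in the link above):

from scipy.spatial.transform import Rotation

def euler_angle_to_rotation_mat(yaw, pitch, roll):
    # Intrinsic z-y'-x'' rotations (yaw, then pitch, then roll).
    return Rotation.from_euler('ZYX', [yaw, pitch, roll]).as_matrix()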

Using this parameterization, we basically apply the chain rule to the optimization problem.

The original problem is,

\(\text{cost}(R), R \in SO(3)\)

Using the yaw, pitch, roll parameterization of the rotation matrix, the above problem becomes,

\(\text{cost2}(yaw, pitch, roll) = \text{cost}(R(yaw, pitch, roll))\)

Then, we change the cost by changing \(R\), which in turn is changed by changing (yaw, pitch, roll). Doing this, we can make sure \(R \in SO(3)\).

But it is inefficient because

  1. (yaw, pitch, roll) is not an even partition of the 3D sphere. e.g. near gimbal lock, yaw and roll rotate about the same axis, so a degree of freedom is effectively lost.
  2. Since (yaw, pitch, roll) is not a good partition, the Jacobian w.r.t. (yaw, pitch, roll) is not a good local linearization of the cost function, so optimization is slower. e.g. the Jacobian w.r.t. (yaw, pitch, roll) is very nonlinear, so Gauss-Newton or gradient descent converges more slowly.
  3. It is hard to work with. Deriving the Jacobian by hand is impractical for (yaw, pitch, roll), and the symbolic Jacobian is a monster (the symbolic expression can be a few hundred lines for a moderate 3D problem).

Solution 3, Axis-angle representation

First, let me do a simple review of the axis-angle representation for rotation.

Define the axis-angle representation of a 3D rotation to be a vector w,

\(w \in R^3\)

The idea of the axis-angle representation is: rotate the current coordinate system around the unit axis \(w / ||w||\) by \(||w||\) radians.

Every rotation matrix can be represented by an axis-angle.

To rotate a vector, we use Rodrigues’ rotation formula to convert the axis-angle w to a rotation matrix, then rotate by multiplying the rotation matrix with the vector.

Let’s call Rodrigues’ rotation formula the Exponential map,

\(R = Exp(w)\)

It maps an axis-angle rotation \(w \in R^3\) into a rotation matrix \(R \in SO(3)\).

The formal definition of Exp takes the skew-symmetric (cross-product matrix) form of w, but it is fine to gloss over that in our application.
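
For reference, a common way to write this map explicitly, using the skew-symmetric matrix \([w]_\times\) (and assuming \(w \neq 0\)), is,

\(Exp(w) = I + \frac{\sin||w||}{||w||} [w]_\times + \frac{1 - \cos||w||}{||w||^2} [w]_\times^2\)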

There is an inverse for Exp(w). It is called the Log map,

\(w = Log(R)\)

It maps a rotation matrix to an axis-angle rotation.

For 3D rotations, the space of axis-angle vectors \(w\) is called \(so(3)\). The space of rotation matrices is called \(SO(3)\).

The Exponential function maps \(so(3)\) to \(SO(3)\).

The Log function maps \(SO(3)\) to \(so(3)\).


Solution 3 uses the axis-angle parameterization,

\(\text{cost3}(w) = \text{cost}(R(w))\)

where \(R(w) = Exp(w)\), \(R(w) \in SO(3)\)

People nowadays (after 2015) like to work with it because,

  1. The axis-angle representation is an even partition of the 3D sphere (this needs a proof). Optimization is faster because the Jacobian/gradient w.r.t. so(3) is a better local approximation of the residual/cost function.
  2. It is easy to work with. The Jacobian derivation is manageable, and it is even simpler if we use a local parameterization of the rotation.
  3. Looks fancy. You can call it: “xxx on the manifold”, “xxx Lie Groups”. With good story-telling, you can easily publish a paper.

Solution 4, Quaternions

Quaternions have a “linear” local parameterization too. But personally, I find them hard to work with algebraically and hard to understand intuitively.

Read this paper for quaternion parameterization: VINS-Mono https://arxiv.org/pdf/1708.03852.pdf
For a formal discussion of rotation parameterization, read chapter 3 of https://www.amazon.com/Harmonic-Analysis-Engineers-Applied-Scientists/dp/0486795640

I will go with Solution 3, the axis-angle representation.

The Optimization Problem on Manifold

Consider this problem,

\(\text{cost}(R) = ||Log(R_{target}^T R)||^2\)

where the Log function maps a rotation matrix (SO3) to its axis-angle representation (so3), and \(R_{target}\) is fixed.

Intuitively, we want \(R\) to be close to \(R_{target}\).

Break it down,

\(R_{target}^T R = \text{diff-SO3}\) is the “difference” between \(R_{target}\) and \(R\). e.g. if they are the same, the result is the identity.

\(Log(\text{diff-SO3}) = \text{diff-so3}\) maps the rotation-matrix “difference” into an axis-angle, i.e. how much we need to rotate \(R\) so that it matches \(R_{target}\).

\(||\text{diff-so3}||^2\) maps the axis-angle difference into a scalar, which is the cost value.

We will derive the so(3) parameterization of this problem and prove the residual function is linear.

so(3) Parameterization

Let’s parameterize R by its axis-angle representation so(3),

\(\text{cost}(w) = ||Log(R_{target}^T Exp(w))||^2\)

Following the definition of a least-squares problem, the residual function (the expression inside the L2 norm) is,

\(r(w) = Log(R_{target}^T Exp(w)) \; \; \; (1)\)

For the least-squares formulation (L2 norm of the residual function), we ONLY need to compute the Jacobian of the residual function and solve the normal equation.
For more info: https://wang-yimu.com/least-squares-refresher/
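
Concretely, with the Jacobian \(J = \partial r / \partial w\), each Gauss-Newton update solves the normal equation,

\(J^T J \, \delta w = -J^T r(w)\)

and the rotation is updated as \(R \leftarrow R \, Exp(\delta w)\), which is exactly what the implementation later in this post does.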

There are some surprising facts,

  1. The residual \(r(w) = Log(R_{target}^T Exp(w))\) is a linear function of w,
  2. Since the residual is linear, Gauss-Newton can solve the optimization in 1 step.

The Linear Residual

Lie-group theory is a little bit complicated. But to prove the residual is linear, we only need one formula.

\(Log(Exp(w1)Exp(w2)) = w1 + \text{complex-function} (w1) w2 \; \; \; (2)\)

where \(w1, w2 \in R^3\) (or equivalently \(w1, w2 \in so(3)\)), and \(\text{complex-function}\) is a function mapping \(R^3 \mapsto R^{3×3}\).

In English, we can break the Log of the product of two elements of SO(3) into a linear function of the two corresponding elements of so(3).

The complete formula is here for reference,
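
If I read Section II of the paper below correctly, the complex-function is the inverse of the right Jacobian \(J_r\) of SO(3), so the formula can be written as,

\(Log(Exp(w1) Exp(w2)) = w1 + J_r^{-1}(w1) \, w2\)

(the paper states this for small \(w2\)).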

Check Section II of this paper for details:
IMU Preintegration on Manifold
Christian Forster, Luca Carlone, Frank Dellaert, Davide Scaramuzza
http://www.roboticsproceedings.org/rss11/p06.pdf

Going back to the residual function (1),

\(r(w) = Log(R_{target}^T Exp(w))\)

let \(R1 = R_{target}^T\)

\(r(w) = Log(R1 Exp(w))\)

Since \(R1 \in SO(3)\), there exists a w1 such that,

\(r(w) = Log(Exp(w1)Exp(w))\)

Apply the formula (2),

\(r(w) = w1 + \text{complex-function}(w1) w\)

Since \(w1 = Log(R1)\) is fixed in the residual function, \(r(w)\) is a linear function of w.

The theory behind this is Lie-group theory. Some papers give a good summary of it:

Toward Invariant Visual-Inertial State Estimation using Information Sparsification: https://www.ri.cmu.edu/publications/toward-invariant-visual-inertial-state-estimation-using-information-sparsification/
There are lots of helpful tutorials hidden in GTSAM’s repo: https://github.com/borglab/gtsam/tree/develop/doc

The Single Gauss-Newton Step

For a least-squares problem, if the residual is a linear function of the parameters, then the cost is a quadratic function of the parameters. We can minimize the cost in a single step by solving the normal equation.

This means the Lie-group parameterization is a better parameterization for rotation than Euler angles.

Check this post for a proof: https://wang-yimu.com/least-squares-refresher/
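
As a quick sanity check of this claim: for a linear residual \(r(w) = r_0 + J w\), the cost is,

\(\text{cost}(w) = ||r_0 + J w||^2\)

which is quadratic in \(w\), and setting its gradient to zero gives the exact minimizer in a single step,

\(w^* = -(J^T J)^{-1} J^T r_0\)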

Implementation

Theory and math need support from facts.

Let’s implement it.

The code: https://github.com/yimuw/yimu-blog/tree/master/least_squares/lie_linear_residual

In reality, I prefer to do some crazy programming first, then write down some math. In the end, I claim I am a genius.
Watch this course for more info: https://see.stanford.edu/Course/EE364A

The Exp/Log map

Exp/Log maps are tricky to implement. For example, if w is 0, a naive implementation of Rodrigues’ rotation formula doesn’t work (it divides by ||w||). But the good news is that SciPy implements the general (but slow) matrix Exp/Log, so I can just call them.

For a practical implementation of Exp/Log, check the source code for GTSAM: https://github.com/borglab/gtsam/blob/develop/gtsam/geometry/SO3.cpp

import numpy as np


def skew_symmetric(w):
    """Map a 3-vector w to its skew-symmetric (cross-product) matrix."""
    w1, w2, w3 = w
    return np.array([
        [0, -w3, w2],
        [w3, 0, -w1],
        [-w2, w1, 0]
    ])


def unskew_symmetric(W):
    """Map a skew-symmetric matrix back to its 3-vector."""
    assert np.allclose(W, -W.T)
    return np.array([W[2, 1], W[0, 2], W[1, 0]])


def skew_SO3_exp(w):
    """so(3) -> SO(3): matrix exponential of the skew-symmetric form of w."""
    from scipy.linalg import expm
    R = expm(skew_symmetric(w))
    assert np.allclose(R.T @ R, np.identity(3))
    return R


def SO3_log_unskew(R):
    """SO(3) -> so(3): matrix logarithm, then unskew to a 3-vector."""
    from scipy.linalg import logm
    assert np.allclose(R.T @ R, np.identity(3))
    if np.allclose(R, R.T):
        # break symmetry
        # e.g. turning from 0 to pi,
        # you can go clock-wise or go counter-clock-wise.
        return SO3_log_unskew(R @ skew_SO3_exp([1e-6, 0, 0.]))
    return unskew_symmetric(logm(R))
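
A quick round-trip sanity check (not part of the original snippet): Exp followed by Log should recover a small axis-angle vector,

w = np.array([0.1, -0.2, 0.3])
R = skew_SO3_exp(w)
assert np.allclose(SO3_log_unskew(R), w)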

I ignored the skew operation in the discussion above. Please read the supporting materials for details.

For the Log function, there is a singularity. e.g. in the 2D case, if we want to turn back 180 degrees, there are two options: going clockwise or going counter-clockwise. We break the symmetry with a tiny perturbation and call the Log function recursively.

The Cost Function

The cost and residual functions are straightforward to implement.

residual function:

\(r(R) = Log(R_{target}^T R)\)

or in Lie-group parameterization,

\(r(w) = Log(R_{target}^T Exp(w))\)
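
The snippets below also use a small helper random_angle() from the linked repo; a minimal stand-in could be,

def random_angle():
    # Hypothetical stand-in for the repo's helper: a uniform angle in [-pi, pi).
    return np.random.uniform(-np.pi, np.pi)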

class Problem:
    def __init__(self):
        self.R_target = skew_SO3_exp(
            [random_angle(), random_angle(),
             random_angle()])

    def residual_function(self, R):
        """
        A residual function r(R) = log(R_target.T @ R)
        """
        residual = SO3_log_unskew(self.R_target.T @ R)
        return residual

    def cost(self, R):
        """
        cost(R) = ||residual(R)||^2
        """
        r = self.residual_function(R)
        return r.T @ r

The Quadratic Cost

Let’s plot the cost and the square root of the cost.

import matplotlib.pyplot as plt
from math import sqrt


def plot_cost_sqrt():
    p = Problem()
    # A random unit direction in so(3).
    direction = np.random.rand(3)
    direction = direction / np.linalg.norm(direction)

    deltas = np.linspace(-np.pi, 2 * np.pi, 300)
    costs = [p.cost(p.R_target @ skew_SO3_exp(d * direction)) for d in deltas]

    plt.subplot(2, 1, 1)
    plt.plot(deltas, costs)
    plt.title('change-along-a-direction vs cost')

    plt.subplot(2, 1, 2)
    plt.plot(deltas, [sqrt(cost) for cost in costs])
    plt.title('change-along-a-direction vs sqrt(cost)')
    plt.show()

We randomly generate a so(3) direction, then generate SO(3) rotations along the so(3) direction, and plot all costs and the square root of the costs.

Upper: the cost vs the so(3) distance from 0. The cost is “locally” quadratic.
Lower: the sqrt(cost) vs the so(3) distance from 0. The sqrt(cost) is “locally” linear.
The jumps in the curves are due to the circular nature of rotation.

The cost is quadratic, the sqrt(cost) is linear.

The Single Newton Step

For least squares problems, if the residual function is linear, the cost function is quadratic. We can solve the problem in one step.

def gaussian_newton():
    R_variable = skew_SO3_exp([random_angle(), random_angle(), random_angle()])
    p = Problem()
    print("R_init:", R_variable)
    print('cost:', p.cost(R_variable))

    print('gaussian_newton...')
    residual = p.residual_function(R_variable)
    jacobian = p.numerical_jacobian(R_variable)
    dw = np.linalg.solve(jacobian.T @ jacobian, -jacobian.T @ residual)
    R_single_iteration = R_variable @ skew_SO3_exp(dw)

    print("R_single_iteration:", R_single_iteration)
    print("R_target:", p.R_target)
    assert np.allclose(R_single_iteration, p.R_target, 1e-4, 1e-6)

    print('single iter cost:', p.cost(R_single_iteration))

We generate a random rotation and a random Problem (which has a random \(R_{target}\)). Then we compute the Jacobian and the residual for the least-squares problem. In the end, we solve the normal equation and apply the update \(R \leftarrow R \, Exp(\delta w)\).
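
The numerical_jacobian method is not shown in the snippet; a minimal sketch (assuming it is a method of Problem, computing the Jacobian of \(r(R \, Exp(\delta w))\) w.r.t. \(\delta w\) at \(\delta w = 0\) by central differences) could look like this,

    def numerical_jacobian(self, R, eps=1e-6):
        """
        Jacobian of r(R @ Exp(dw)) w.r.t. dw at dw = 0 (right perturbation),
        approximated by central differences.
        """
        jacobian = np.zeros((3, 3))
        for i in range(3):
            dw = np.zeros(3)
            dw[i] = eps
            r_plus = self.residual_function(R @ skew_SO3_exp(dw))
            r_minus = self.residual_function(R @ skew_SO3_exp(-dw))
            jacobian[:, i] = (r_plus - r_minus) / (2 * eps)
        return jacobian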

The output of gaussian_newton(),

yimu@yimu-mate:~/yimu-blog$ python3 /least_squares/lie_linear_residual/lie_linear_residaul.py
R_init: [[ 0.22538252 -0.56304922 -0.79509641]
 [ 0.50277012 -0.63181679  0.58994047]
 [-0.83452078 -0.53271299  0.14068387]]
cost: 7.526584507456086

gaussian_newton...

R_single_iteration: [[-0.68289327 -0.69234893 -0.2330445 ]
 [-0.68260839  0.71839255 -0.1340072 ]
 [ 0.26019717  0.06756551 -0.96318863]]
R_target: [[-0.68289328 -0.69234896 -0.2330444 ]
 [-0.68260838  0.71839254 -0.1340073 ]
 [ 0.26019717  0.06756538 -0.96318864]]
single iter cost: 1.8398480398370354e-14

The initial cost is 7.526. After 1 Gauss-Newton iteration, the cost is 1.839e-14.

We solve the problem in one step.

Summary

The Manifold/Lie-group parameterization is a good choice for optimization problems with rotations.

Discussion

The problem should be solved in one step

The cost function was,

\(\text{cost}(w) = ||Log(R_{target}^T Exp(w))||^2\)

The argmin of this problem should be,

\(\text{argmin} \; \text{cost}(w) = Log(R_{target})\)

You can prove it by plugging \(Exp(w) = Exp(Log(R_{target})) = R_{target}\) into the cost definition, which makes the cost zero.

It was a simple, motivating problem.

But it is the fundamental cost of the SLAM problem.

Simultaneous Localization and Mapping

If you are familiar with Simultaneous localization and mapping(SLAM), the cost function should be very familiar.

In brutally honest English, SLAM solves this problem:
There are errors for whatever reason. We spread the error evenly so that other people/modules can’t notice anything.

The odometry constraint and the loop-closure constraint between 2 poses in the SLAM problem are,

\(r(T_1, T_2) = Log((T_1^{-1} T_2)^{-1} T_{measured})\)

In English, we want the “distance” between pose \(T_1\) and pose \(T_2\) to be close to \(T_{measured}\).
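
As a rough sketch (assuming 4×4 homogeneous matrices for the poses; a real SLAM system would convert the resulting se(3) matrix back to a 6-vector and handle singular cases properly):

import numpy as np
from scipy.linalg import logm

def pose_residual(T1, T2, T_measured):
    # "Difference" between the measured relative pose and the relative pose
    # implied by T1 and T2, expressed as an se(3) matrix.
    T_diff = np.linalg.inv(np.linalg.inv(T1) @ T2) @ T_measured
    return logm(T_diff)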

Paper Unlock

A Tutorial on Graph-Based SLAM
Giorgio Grisetti, Rainer Kümmerle, Cyrill Stachniss, and Wolfram Burgard

This paper gives a tutorial on graph-based SLAM. Graph-based SLAM is essentially a nonlinear least-squares problem.

The first section gives a tutorial on nonlinear least-squares problems.

For a simpler tutorial, we can read this post: least squares refresher.

Then, the authors introduce Least Squares on Manifolds.

Please read that section for more discussion about Least Squares on Manifolds.

In the end, the authors present experimental results showing the computational advantages of Least Squares on Manifolds.

The manifold formulation converges much faster.
(The Euler formulation is suboptimal here, but it shouldn’t be this bad…)

Learning Spherical Convolution for Fast Features from 360° Imagery
Yu-Chuan Su, Kristen Grauman

The (local) so(3) can be thought of as a cutting plane of the sphere. Some operations are not well defined on the sphere, but we can project things on the sphere into the cutting plane.

In this paper, the author tried to do a convolutional neural net on a spherical image. However, the convolution is not well defined for a sphere.

There are 2 common solutions.

Top: the “world map” projection. It has large distortion. Bottom: the tangent plane projection. Less distortion. Basically, it’s the same idea as the UTM projection.

What’s next?

With a basic understanding of the problem, we can discuss other interesting applications of optimization on Manifolds/Lie-groups.
