How to Land a Rocket? Part 1, Problem Definition

Why Write This Article?

I’ve discussed some simple least squares applications on my blog. It’s time to do something super cool.

For example, landing a rocket.

(Image: SpaceX)

What’s in this article?

We will first formulate the rocket landing problem as a mathematical optimization. Then, we will discuss how to solve the optimization efficiently. Specifically, I will

  1. Formulate the rocket landing problem as a least squares optimization problem.
  2. Discuss some common least squares techniques:
    1. Regularization.
    2. Separable objective function.
    3. Sparse linear system solver.
  3. Implement the solver in C++.
  4. Discuss CHOMP, a classic paper about motion planning.

Prerequisite

You need to know Linear Algebra and Multivariable Calculus. We are going to formulate the rocket landing problem as a least squares problem, which allows us to solve the optimization very fast.

This article refreshes most of the math concepts used: Least squares refresher

Reading a simpler application of least squares first is highly recommended, since I will skip some basic least squares concepts: Lucas-Kanade visual tracker.

Weighted Nonlinear Least Squares Refresher

For a residual function \(r(x)\),

\(r(x) : R^n \to R^m\).

Define the cost function to be,

\(\text{minimize} \; cost(x) = r(x)^T W r(x)\).

To minimize it, we linearize the residual function at the current variable \( x_{0} \) as,

\(r(\Delta{x}) \approx J(x_0) \Delta{x} + r(x_0)\).

Where \(\Delta{x} = x - x_{0}\).

To simplify notation, let \(J = J(x_0)\) and \(b = r(x_0)\).

Plugging the linearization into the cost function,

\(cost(x) \approx \Delta{x}^T J^TWJ \Delta{x} + 2 b^T W J \Delta{x} + b^T W b\)

It is a quadratic, so we can minimize the linearized cost in one step by solving the normal equation,

\(J^TWJ \Delta{x} = -J^TWb\)

and we update x by,

\(x = x_{0} + \Delta{x}\)

We repeat this iteration until a termination condition is met, e.g. \(\Delta{x}\) is small.

Nonlinear least squares summary:

  1. Linearize the residual function \(r(x)\) at the current variable \(x_0\): \(r(\Delta x) \approx J \Delta x + b\), where \(\Delta{x} = x - x_{0}\). The cost function \(\text{cost}(x) = r(x)^TWr(x)\) is then approximated by a quadratic form.
  2. Minimize the quadratic form in one step by solving the normal equation \(J^TWJ \Delta{x} = -J^TWb\).
  3. Because of the change of variable \(\Delta{x} = x - x_{0}\), update the variable by \(x = x_{0} + \Delta{x}\).
  4. If not converged (e.g. \(\Delta{x}\) is not yet small), go back to 1.
Check this post for details. https://wang-yimu.com/least-squares-refresher/
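To make the procedure concrete, here is a minimal sketch of the loop in C++ with Eigen. This is an illustration only, not the solver from the repo: gauss_newton, compute_residual, and compute_jacobian are hypothetical names for the problem-specific pieces we will build in the rest of this article.

#include <functional>
#include <Eigen/Dense>

// Minimal Gauss-Newton loop (illustrative sketch).
// compute_residual and compute_jacobian are hypothetical problem-specific callbacks.
Eigen::VectorXd gauss_newton(
    Eigen::VectorXd x,
    const std::function<Eigen::VectorXd(const Eigen::VectorXd &)> &compute_residual,
    const std::function<Eigen::MatrixXd(const Eigen::VectorXd &)> &compute_jacobian,
    const Eigen::MatrixXd &W,
    const int max_iterations = 20,
    const double tolerance = 1e-8)
{
    for (int iteration = 0; iteration < max_iterations; ++iteration)
    {
        const Eigen::VectorXd b = compute_residual(x);  // b = r(x0)
        const Eigen::MatrixXd J = compute_jacobian(x);  // J = dr/dx at x0

        // Steps 1 & 2: linearize and solve the normal equation (J^T W J) dx = -J^T W b
        const Eigen::VectorXd dx =
            (J.transpose() * W * J).ldlt().solve(-J.transpose() * W * b);

        // Step 3: update the variable
        x += dx;

        // Step 4: terminate when the update is small
        if (dx.norm() < tolerance)
        {
            break;
        }
    }
    return x;
}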

We will follow this procedure to solve the rocket landing problem.

1. How to Land a Rocket Computationally?

We will model the rocket landing problem as a least squares optimization problem.

In English, the goals of the rocket landing problem are:

  1. Given an initial state \(s_{initial}\) and a final state \(s_{end}\) for a rocket,
  2. We want to find a rocket trajectory from \(s_{initial}\) to \(s_{end}\).
  3. The trajectory follows physics.

In pseudo-math, we want to find a trajectory which minimizes this cost function,

\(cost(\text{trajectory}) = \\ \; \; \text{first-state-of-the-trajectory-close-to-s-init}(\text{trajectory}) \\ \; \; + \text{last-state-of-the-trajectory-close-to-s-end}(\text{trajectory}) \\ \; \; + \text{trajectory-follow-the-physics}(\text{trajectory})\)

We will translate it into concrete math in a later section.

Firstly, let me define the state for the rocket.

1.1 Rocket State & Trajectory

Definition: A rocket state \(s\) is a vector, \(\text{s} \in R^8\),

\(\text{state} = \begin{bmatrix}x\\ y\\ vx\\ vy\\ \theta\\ \text{thrust}\\ \dot\theta \\ dt \end{bmatrix} = \begin{bmatrix}\textbf{p} \\ \textbf{v} \\ \theta \\ \text{thrust}\\ \dot\theta \\ dt \end{bmatrix}\)

Where \(\textbf{p} = \begin{bmatrix} x \\ y \end{bmatrix} \) is the position of the rocket.

\( \textbf{v} = \begin{bmatrix} vx \\ vy \end{bmatrix} \) is the velocity of the rocket.

\(\theta\) is the heading of the rocket.

\(\text{thrust}, \dot\theta \) are the controls.

\( \text{dt} \) is the time difference to next state. It will be clear later.

Definition: a trajectory is a collection of rocket states,

\(\text{trajectory} = \{ s_0, s_1, s_2, …, s_n \}\)

\(s_i\) is the rocket state at step i.
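For reference, here is a sketch of what a RocketState could look like: a thin wrapper around an 8-vector with named accessors. The index order below is an assumption for illustration only; the actual layout in the repo may differ.

#include <vector>
#include <Eigen/Dense>

using Vector8d = Eigen::Matrix<double, 8, 1>;

// Sketch of a rocket state: an 8-vector with named accessors.
// The index order is an assumption; const overloads are omitted for brevity.
struct RocketState
{
    enum { i_dt = 0, i_position_x, i_position_y, i_velocity_x, i_velocity_y,
           i_heading, i_turning_rate, i_acceleration, STATE_SIZE };

    Vector8d variables = Vector8d::Zero();

    double &delta_time()   { return variables[i_dt]; }
    double &heading()      { return variables[i_heading]; }
    double &turning_rate() { return variables[i_turning_rate]; }
    double &acceleration() { return variables[i_acceleration]; }

    Eigen::Map<Eigen::Vector2d> position() { return Eigen::Map<Eigen::Vector2d>(variables.data() + i_position_x); }
    Eigen::Map<Eigen::Vector2d> velocity() { return Eigen::Map<Eigen::Vector2d>(variables.data() + i_velocity_x); }

    int variable_size() const { return STATE_SIZE; }
};

// A trajectory is simply a collection of rocket states.
struct Trajectory
{
    std::vector<RocketState> states;
};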

Given the rocket state, we can define how the rocket moves over time.

1.2 Motion model

The motion model defines how the rocket moves. It maps the previous state to the current state.

Definition: The state motion model is a function, \( \text{motion} : R^8 \to R^8\):

\(s_{i+1} = \text{motion}(s_i)\)

In vector form, let \(s^+ = s_{i+1} , \; s = s_{i} \)

\(s^+ = \begin{bmatrix}\textbf{p}^+ \\ \textbf{v}^+ \\ \theta^+ \\ \text{thrust}^+ \\ \dot\theta^+ \\ dt^+\end{bmatrix} = \begin{bmatrix} \textbf{p} + dt \textbf{v} \\ \textbf{v} + dt (\begin{bmatrix} \cos(\theta) \text{thrust} \\ \sin(\theta) \text{thrust} \end{bmatrix} - \text{gravity}) \\ \theta + dt \dot\theta \\ \text{thrust} \\ \dot\theta \\ dt \end{bmatrix} = \text{motion}(s) \)

The position \(\textbf{p}\) is updated by dt times velocity.

The velocity \(\textbf{v}\) is updated by dt times (1) thrust in the direction of the rocket, (2) gravity in -y direction.

The heading \(\theta\) is updated by dt times the control variable \(\dot \theta\) (which may be an oversimplification of a real rocket model).

The controls are assumed to change slowly: the motion model keeps them constant, and how strictly this holds is controlled by the weight of the corresponding residual term. It will be clear later.

The model is underactuated (if we are only allowed to change the controls).
An awesome course about planning/control.
Underactuated Robotics
Russ Tedrake
http://underactuated.csail.mit.edu/underactuated.html
https://www.edx.org/course/underactuated-robotics-2

Motion Model Implementation

struct RocketMotionModel
{
    RocketState motion(const RocketState &state_in) const
    {
        auto state = state_in;
        RocketState next_state = state;

        const double dt = state.delta_time();
        const double theta = state.heading();
        next_state.velocity() = state.velocity() + dt * Vector2d(std::cos(theta) * state.acceleration(), 
                                                                 std::sin(theta) * state.acceleration() - GRAVITY);
        next_state.heading() = state.heading() + dt * state.turning_rate();

        next_state.position() = state.position() + dt * state.velocity(); 

        return next_state;
    }

    MatrixXd jocobian_wrt_state(const RocketState &state_in) const
    {
        using S = RocketState;
        const int residual_size = RocketState::STATE_SIZE;
        MatrixXd jacobi_wrt_s = MatrixXd::Identity(residual_size, residual_size);
        
        const double h = state_in.heading();
        const double sin_h = std::sin(h);
        const double cos_h = std::cos(h);
        const double dt = state_in.delta_time();
        const double vx = state_in.velocity().x();
        const double vy = state_in.velocity().y();
        const double acc = state_in.acceleration();

        jacobi_wrt_s(S::i_position_x, S::i_velocity_x) = dt;
        jacobi_wrt_s(S::i_position_x, S::i_dt) = vx;

        jacobi_wrt_s(S::i_position_y, S::i_velocity_y) = dt;
        jacobi_wrt_s(S::i_position_y, S::i_dt) = vy;

        jacobi_wrt_s(S::i_velocity_x, S::i_acceleration) = dt * cos_h;
        jacobi_wrt_s(S::i_velocity_x, S::i_heading) = - dt * sin_h * acc;
        jacobi_wrt_s(S::i_velocity_x, S::i_dt) = cos_h * acc;

        jacobi_wrt_s(S::i_velocity_y, S::i_acceleration) = dt * sin_h;
        jacobi_wrt_s(S::i_velocity_y, S::i_heading) = dt * cos_h * acc;
        jacobi_wrt_s(S::i_velocity_y, S::i_dt) = sin_h * acc - GRAVITY;

        jacobi_wrt_s(S::i_heading, S::i_turning_rate) = dt;
        jacobi_wrt_s(S::i_heading, S::i_dt) = state_in.turning_rate();
        
        return jacobi_wrt_s;
    }

    ...
};

The first function RocketState motion(const RocketState &state_in) const implements \(s_{i+1} = \text{motion}(s_i)\).

The second function MatrixXd jocobian_wrt_state(const RocketState &state_in) const implements the Jacobian of the motion function w.r.t. the input state, \(\frac{\partial \text{motion}(s_i)}{\partial s_i}\).


With the state and motion model defined, we can define the cost function for the least squares problem.

1.3 The Cost Function

Recall the goals of the rocket landing problem are,

  1. Given an initial state \(s_{initial}\) and a final state \(s_{end}\) for a rocket,
  2. We want to find a rocket trajectory from \(s_{initial}\) to \(s_{end}\).
  3. The trajectory follows physics.

Translating it into math, we can define the rocket landing optimization problem.

Definition: the least squares problem for rocket landing.

\(\text{minimize} \; cost(\{s_0, s_1, …, s_n\}) = ||s_0 - s_{initial}||_{W_0}^2 + ||s_n - s_{end}||_{W_n}^2 + \sum_i ||s_{i+1} - \text{motion}(s_i)||_{W_m}^2 \)

Note: \(||v||_W^2 = v^TWv\) is the weighted L2 norm.

Break it down,

  1. \(||s_0 - s_{initial}||_{W_0}^2 \) says we want the first state \(s_0\) close to the initial state \(s_{initial}\).
  2. \( ||s_n - s_{end}||_{W_n}^2 \) says we want the final state \(s_n\) close to the end state \(s_{end}\).
  3. \( ||s_{i+1} - \text{motion}(s_i)||_{W_m}^2 \) says we want two consecutive states to follow the motion model.

1.4 The Residual Function

To solve a least squares problem, we need to linearize the residual function. Before that, let’s convert the rocket landing cost into the standard least squares form.

Recall that the cost of a weighted least squares problem is,

\(cost(x) = r(x)^T W r(x)\)


Start with the rocket landing cost,

\(\text{cost}(\{s_0, s_1, …, s_n\}) = ||s_0 - s_{initial}||_{W_0}^2 + ||s_n - s_{end}||_{W_n}^2 + \sum_i ||s_{i+1} - \text{motion}(s_i)||_{W_m}^2 \)

By the definition of the weighted norm, \(||v||_A^2 = v^T A v\), we can see,

\(\text{cost}(\{s_0, s_1, …, s_n\}) = \begin{bmatrix} s_0 - s_{initial}\\ s_{1} - \text{motion}(s_0)\\s_{2} - \text{motion}(s_1)\\…\\s_{n} - \text{motion}(s_{n-1})\\s_n - s_{end}\\\end{bmatrix}^T \begin{bmatrix} W_0 & & & & & 0 \\ & W_m & & & & \\ & & W_m & & & \\ & & & … & & \\ & & & & W_m & \\ 0 & & & & & W_n\end{bmatrix} \begin{bmatrix} s_0 - s_{initial}\\s_{1} - \text{motion}(s_0)\\s_{2} - \text{motion}(s_1)\\…\\s_{n} - \text{motion}(s_{n-1})\\s_n - s_{end}\\\end{bmatrix}\)

Match the rocket landing cost with the definition of the least squares cost, \(r(x)^T W r(x)\).

We can see the residual function \(r(\{s_0, s_1, …, s_n\})\) is,

\(r(\{s_0, s_1, …, s_n\}) = \begin{bmatrix} s_0 - s_{initial}\\ s_{1} - \text{motion}(s_0)\\ s_{2} - \text{motion}(s_1)\\…\\s_{n} - \text{motion}(s_{n-1})\\s_n - s_{end}\\\end{bmatrix}\)

The weight \(W\) is,

\(W = \begin{bmatrix} W_0 & & & & & 0 \\ & W_m & & & & \\ & & W_m & & & \\ & & & … & & \\ & & & & W_m & \\ 0 & & & & & W_n\end{bmatrix}\)

Now the rocket landing problem is formulated as a least squares problem. We can solve it by,

  1. Linearize the residual function \(r(x)\) at the current variable \(x_0\): \(r(\Delta x) \approx J \Delta x + b\), where \(\Delta{x} = x - x_{0}\). The cost function \(\text{cost}(x) = r(x)^T W r(x)\) is then approximated by a quadratic form.
  2. Minimize the quadratic form in one step by solving the normal equation \(J^TWJ \Delta{x} = -J^TWb\).
  3. Because of the change of variable \(\Delta{x} = x - x_{0}\), update the variable by \(x = x_{0} + \Delta{x}\).
  4. If not converged (e.g. \(\Delta{x}\) is not yet small), go back to 1.

1.5 The Jacobian of the Residual function

The next step of the least squares optimization is to linearize the residual function.

\(r(\{s_0, s_1, …, s_n\}) = \begin{bmatrix} s_0 - s_{initial}\\ s_{1} - \text{motion}(s_0)\\ s_{2} - \text{motion}(s_1)\\…\\s_{n} - \text{motion}(s_{n-1})\\s_n - s_{end}\\\end{bmatrix} \approx J \Delta S + r(S)\)

Where \(S = \{s_0, s_1, …, s_n\} \).

I will skip the algebraic details since it’s straightforward to compute.
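To see why it is straightforward: each motion residual only depends on two consecutive states, so its block row of \(J\) has exactly two non-zero \(8 \times 8\) blocks,

\(\frac{\partial (s_{i+1} - \text{motion}(s_i))}{\partial (s_i, s_{i+1})} = \begin{bmatrix} -\frac{\partial \text{motion}(s_i)}{\partial s_i} & I \end{bmatrix}\)

and the first and last residuals contribute a single identity block with respect to \(s_0\) and \(s_n\). The \(\frac{\partial \text{motion}(s_i)}{\partial s_i}\) block is exactly what jocobian_wrt_state above computes.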

Now the residual function is linearized as \(r(\Delta x) \approx J \Delta x + b\), where \(\Delta x = x - x_{0}\) and \(x_0\) is the current estimate of the trajectory.

Residual Functions Implementation

code link

struct Residual
{
    virtual ~Residual(){};

    virtual MatrixXd jacobian() const = 0;

    virtual MatrixXd weight() const = 0;

    virtual bool is_diagnal_weight() const = 0;

    virtual VectorXd residual() const = 0;

    virtual int residual_size() const = 0;

    virtual int variable_size() const = 0;

    // assume the variables of this residual form one contiguous block starting at this index
    virtual int variable_start_index() const = 0;

    double cost() const
    {
        const VectorXd r = residual();
        return r.transpose() * weight() * r;
    }
};

The residual function can be abstracted as a C++ virtual class with these interfaces.

struct MotionResidual : public Residual
{
    // support container
    MotionResidual() = default;

    MotionResidual(const RocketState &s1, 
                   const int s1_idx, 
                   const RocketState &s2,
                   const int s2_idx,
                   const double w_scale)
        : weight_scale(w_scale), state1_index(s1_idx), state2_index(s2_idx), 
          state1(s1), state2(s2) 
    {}

    MatrixXd jacobian() const override
    {
        const int residual_size = state1.variable_size();
        const int variable_size = 2 * state1.variable_size();

        MatrixXd jacobi = MatrixXd::Zero(residual_size, variable_size);

        const MatrixXd jacobi_wrt_s1 = RocketMotionModel().jocobian_wrt_state(state1);
        const MatrixXd jacobi_wrt_s2 = - MatrixXd::Identity(residual_size, residual_size);

        jacobi.block(0,0,residual_size, residual_size) = jacobi_wrt_s1;
        jacobi.block(0,residual_size,residual_size, residual_size) = jacobi_wrt_s2;

        return jacobi;
    }

    // r(x1, x2) = m(x1) - x2
    VectorXd residual() const override
    {
        RocketState state2_pred = RocketMotionModel().motion(state1);
        Vector8d r = state2_pred.variables - state2.variables;

        return r;
    }

    ...

    MatrixXd weight() const override
    {
        const int residual_size = state1.variable_size();
        Vector8d weight_diag;

        //  dt, x, y, vel_x, vel_y, heading, turn_rate, accl
        weight_diag << 1e5, 1e3, 1e3, 1e4, 1e4, 1e5, 1e2, 1e1;

        Matrix8d weight = MatrixXd::Identity(residual_size, residual_size);
        weight.diagonal() << weight_scale * weight_diag;

        return weight;
    }

    ...
};

The motion residual class has 3 important functions.

VectorXd residual() computes the motion residual. Note the code implements it as \(r(s_1, s_2) = \text{motion}(s_1) - s_2\), which has the opposite sign of the formula above; the cost and the normal equation are unchanged as long as the Jacobian uses the same convention.

MatrixXd jacobian() computes the Jacobian of the residual w.r.t. its inputs, \(\frac{\partial r(s_1, s_2)}{\partial (s_1, s_2)}\).

MatrixXd weight() returns the weight \(W_m\) for this residual term.

There is some code to put all the residual terms together. It is straightforward, so I will skip it.
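The cost also has the two prior terms \(s_0 - s_{initial}\) and \(s_n - s_{end}\). For completeness, here is a sketch of how such a prior residual could look under the same Residual interface; this is an illustration, not necessarily the class in the repo.

// Sketch of a prior residual r(s) = s - s_prior, covering the
// ||s_0 - s_initial|| and ||s_n - s_end|| terms of the cost.
// Illustrative only; the actual class in the repo may differ.
struct PriorResidual : public Residual
{
    PriorResidual(const RocketState &s, const int s_idx,
                  const RocketState &prior, const MatrixXd &w)
        : state(s), state_index(s_idx), prior_state(prior), prior_weight(w)
    {}

    // r(s) = s - s_prior
    VectorXd residual() const override
    {
        return state.variables - prior_state.variables;
    }

    // dr/ds = Identity
    MatrixXd jacobian() const override
    {
        return MatrixXd::Identity(RocketState::STATE_SIZE, RocketState::STATE_SIZE);
    }

    MatrixXd weight() const override { return prior_weight; }
    bool is_diagnal_weight() const override { return false; }
    int residual_size() const override { return RocketState::STATE_SIZE; }
    int variable_size() const override { return RocketState::STATE_SIZE; }
    int variable_start_index() const override { return state_index * RocketState::STATE_SIZE; }

    RocketState state;
    int state_index = 0;
    RocketState prior_state;
    MatrixXd prior_weight;
};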

1.6 Solve the Normal Equation

Now that the residual function is linearized, the least squares cost becomes a quadratic function, and we can minimize it by solving the normal equation,

\(J^T W J \Delta{x} = -J^T W b\)

Let’s solve it!

The matrix solver complains: \(J^T W J\) is not invertible.

Usually, strange outputs are because of bugs.
Read this book for software engineering best practices:
Code Complete: A Practical Handbook of Software Construction

2 Least Squares Techniques

2.1 Regularization

Why no solutions?

Actually, there are too many solutions. The linear system of equations \(J^T W J \Delta{x} = -J^T W b\) is under-determined.

Intuitively, there are too many possible trajectories from the start state to the end state. Regularization solves the problem.

Regularization adds soft constraints to the problem.


Let’s do an example.

The original cost is:

\(\text{minimize} \; cost(\{s_0, s_1, …, s_n\}) = ||s_0 - s_{initial}||_{W_0}^2 + ||s_n - s_{end}||_{W_n}^2 + \sum_i ||s_{i+1} - \text{motion}(s_i)||_{W_m}^2 \)

In English:

we want to find a rocket trajectory from \(s_{initial}\) to \(s_{end}\).

But there are too many possible trajectories. For example, you can first go left then go right, or first go right then go left.

Let’s add some constraints,

we want to find a trajectory from \(s_{initial}\) to \(s_{end}\) with low fuel consumption that arrives quickly.

Translate to math,

\(\text{minimize} \; \text{cost-regularized}(\{s_0, s_1, …, s_n\}) = ||s_0 - s_{initial}||_{W_0}^2 + ||s_n - s_{end}||_{W_n}^2 + \sum_i ||s_{i+1} - \text{motion}(s_i)||_{W_m}^2 \\ \; \; \; + k1 \sum_i (s_i.\text{acceleration})^2 + k2 \sum_i (s_i.\text{dt})^2\)

The \(k1 \sum_i (s_i.\text{acceleration})^2 + k2 \sum_i (s_i.\text{dt})^2\) are the regularizations.

\(s_i.\text{acceleration}^2\) penalizes large acceleration, which is essentially fuel consumption.

\(s_i.\text{dt}^2\) penalizes long travel time.


Now the normal equation is invertible (after extensive debugging and hair loss). We get the trajectory!

Regularization Implementation

code link

void apply_regularization_to_hessian(
                    const RocketLandingResiduals &residual,
                    NormalEqution &normal_equation)
{
    if(verbose_)  std::cout << "computing regularization..." << std::endl;

    const Trajectory trajectory = residual.get_variables();

    auto set_regularization_func = [&residual, &trajectory, &normal_equation]
                               (const double regularization, const int reg_idx)
    {
        for(int state_i = 0; state_i < residual.num_rocket_states; ++state_i)
        {
            const int var_idx = state_i * RocketState::STATE_SIZE + reg_idx;
            normal_equation.lhs(var_idx, var_idx) += 2 * regularization;

            normal_equation.rhs(var_idx) += - 2 * regularization 
                * trajectory.states[state_i].variables[reg_idx];
        }
    };

    if(residual.time_regularization > 0)
    {
        if(verbose_)  std::cout << "residual.time_regularization: " << residual.time_regularization << std::endl;
        set_regularization_func(residual.time_regularization, RocketState::i_dt);
    }

    if(residual.acceleration_regularization > 0)
    {
        if(verbose_)  std::cout << "residual.acceleration_regularization: " << residual.acceleration_regularization << std::endl;
        set_regularization_func(residual.acceleration_regularization, RocketState::i_acceleration);
    }

    if(residual.turning_rate_regularization > 0)
    {
        if(verbose_)  std::cout << "residual.turning_rate_regularization: " << residual.turning_rate_regularization << std::endl;
        set_regularization_func(residual.turning_rate_regularization, RocketState::i_turning_rate);
    }
}

We could add the regularization as a residual term, but it is easier to update the normal equation directly. The derivation is short; a quick sketch follows.
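For one regularization term, write \(x_j = x_{j,0} + \Delta x_j\) around the current value and expand,

\(k (x_{j,0} + \Delta x_j)^2 = k \, \Delta x_j^2 + 2 k \, x_{j,0} \, \Delta x_j + \text{const}\)

Taking the derivative with respect to \(\Delta x_j\) and setting it to zero adds \(2k\) to the diagonal entry \((j, j)\) of the LHS and \(-2 k \, x_{j,0}\) to entry \(j\) of the RHS, which is exactly what set_regularization_func does (any constant scale factor is absorbed into the hand-tuned regularization weight \(k\)).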

The lambda function set_regularization_func updates the normal equation for one regularization term. Remember to update both the RHS and the LHS.


Dude, it is slow

We can get a solution now. However, it is slow.

For a system of 50 rocket states, each iteration (computing the Jacobian and solving the normal equation) takes about 30 milliseconds. The whole optimization takes more than 1 second.

Let’s make it faster.


The first bottleneck is the Hessian computation \(J^T J\) (actually we are computing \(J^T W J\)). It takes 25.466 milliseconds on my computer.

Let’s do a run-time analysis for it.

The Jacobian J is a (num_residuals, num_variables) matrix.

\(J \in R^{(\text{num-residuals}, \text{num-variables})}\).

Assume the number of states is 50. We know the size of a state is 8.

num_residuals = num_init_residuals + num_end_residuals + num_motion_residuals = 8 * (1 + 1 + 49) = 408

num_variables = num_states * state_size = 50 * 8 = 400

Computing \(J^T J\) takes roughly num_residuals * num_variables * num_variables ≈ 400 * 400 * 400 = 64,000,000 operations.

Can we make the \(J^T J\) computation faster?

2.2 Separable Objective Function

A separable objective function basically means the Hessian operator is linear: we can break the cost function into small parts, compute each part's \(J_i^T J_i\) independently, and add them up, which is much faster than computing the full \(J^T J\).


Let’s do an example. Consider a cost function,

\(cost(x1, x2) = cost_a(x1) + cost_b(x2)\)

Consider the Hessian of cost(x1, x2)

\(\text{Hessian}(cost(x1, x2)) = \text{Hessian}( cost_a(x1) + cost_b(x2) )\)

Since Hessian is a linear functional operator,

\(\text{Hessian}(cost(x1, x2)) = \text{Hessian}(cost_a(x1)) + \text{Hessian}(cost_b(x2))\)

For least squares problems, \(cost(x) = ||r(x)||^2\),

\(\text{Hessian}(||r(x1, x2)||^2) = \text{Hessian}(||r_a(x1)||^2) + \text{Hessian}(||r_b(x2)||^2) \)

Assuming \(J_{r_a}\) and \(J_{r_b}\) are the Jacobians of \(r_a\) and \(r_b\) (each padded with zeros to the full variable size),

\(J_r^T J_r = J_{r_a}^T J_{r_a} + J_{r_b}^T J_{r_b}\)

It means for each residual term in the cost function, we can compute the \(J_i^T J_i\), and add them together to get \(J^T J\).

Going back to the original problem (ignoring regularization), let's analyze the time complexity after applying this technique.

\(\text{minimize} \; cost(\{s_0, s_1, …, s_n\}) = ||s_0 - s_{initial}||_{W_0}^2 + ||s_n - s_{end}||_{W_n}^2 + \sum_i ||s_{i+1} - \text{motion}(s_i)||_{W_m}^2\)

The Jacobian of a motion residual is an (8, 16) matrix: 8 residual rows and 16 variable columns (two states). We have 49 motion residual terms.

So, the total operation to compute the jacobian for the cost is at most,

operations = number_residual * operations_to_compute_small_jtj = (49 + 1 + 1) * (16 * 16 * 16) = 208,896

Big saving. 208,896/64,000,000 = 0.00326.

Now computing the normal equation takes less than 1% of the original time. On my computer, the Hessian computation takes 0.182 milliseconds.

Separable Objective Function Implementation

code link

void add_residual_direct_update(const Residual &residual,
                                int &residual_idx,
                                NormalEqution &equ)
{
    const MatrixXd jacobi = residual.jacobian();
    const MatrixXd weight = residual.weight();
    MatrixXd jtw = jacobi.transpose();
    if(residual.is_diagnal_weight())
    {
        for(int c = 0; c < jtw.cols(); ++c)
        {
            jtw.col(c) *= weight(c, c);
        }
    }
    else
    {
        jtw *= weight;
    }
    
    const int v_start_idx = residual.variable_start_index();
    const int v_size =  residual.variable_size();
    assert(v_start_idx + v_size <= equ.lhs.rows());
    assert(equ.lhs.rows() == equ.lhs.cols());
    
    equ.lhs.block(v_start_idx, v_start_idx, v_size, v_size) += jtw * jacobi;
    equ.rhs.segment(v_start_idx, v_size) += - jtw * residual.residual();

    ...

    residual_idx += residual.residual_size();
}

The code computes \(J_i^T W_i J_i\) for each residual block and adds it to the normal equation. The key is to correctly match variable and residual indices; for example, since the states are stacked in order, the motion residual between \(s_i\) and \(s_{i+1}\) starts at variable index i * RocketState::STATE_SIZE.


Dude, it is still slow

Now computing the normal equation \(J^T J\) is fast, but we have another bottleneck.

Solving the linear system \(J^T J \Delta{x} = -J^T b\) for \(\Delta{x}\) is slow.

Let me first review how to solve the least squares linear system \(J^T J \Delta{x} = -J^T b\).

How to solve a linear system?

For a least squares system \( J^T J \Delta{x} = -J^T b \), the LHS \(J^T J\) is guaranteed to be a positive semi-definite (PSD) matrix. Proof:

\(x^T(J^T J)x = ||Jx||^2 \geq 0\)

For a PSD matrix, we can factor it into the product of triangular matrices, \(J^T J = L L^T\). This is called the Cholesky factorization. (Strictly speaking, the plain Cholesky factorization needs a positive definite matrix; the regularization helps here.)

After the Cholesky factorization, the system \( J^T J \Delta{x} = -J^T b \) becomes,

\(L L^T \Delta{x} = -J^T b\)

For a triangular system \(L x = b\), solving for x can be done by forward or back substitution, which is super fast.

Let \(y = L^T \Delta x\). By forward substitution on \(L y = -J^T b\), we get y. Then by back substitution on \(L^T \Delta{x} = y\), we get \(\Delta{x}\).

The tricky part is the Cholesky factorization itself. It takes \(O(n^3)\) time, where n is the number of rows (or columns) of \(J^T J\).
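As a small dense illustration (not the sparse solver used later in this article), the whole factor-and-back-substitute step with Eigen could look like this; solve_normal_equation_dense is a hypothetical helper name.

#include <Eigen/Dense>

// Solve (J^T W J) dx = -(J^T W b) with a dense Cholesky factorization.
// Minimal illustration; assumes the LHS is positive definite (regularization helps).
Eigen::VectorXd solve_normal_equation_dense(const Eigen::MatrixXd &lhs,   // J^T W J
                                            const Eigen::VectorXd &rhs)   // -J^T W b
{
    const Eigen::LLT<Eigen::MatrixXd> llt(lhs);  // factorization: lhs = L * L^T
    return llt.solve(rhs);                       // forward + back substitution
}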

2.3 Sparse Linear System

The \(J^T J\) is sparse for the rocket landing cost function,

\(\text{minimize} \; cost(\{s_0, s_1, …, s_n\}) = ||s_0 - s_{initial}||_{W_0}^2 + ||s_n - s_{end}||_{W_n}^2 + \sum_i ||s_{i+1} - \text{motion}(s_i)||_{W_m}^2\)

Intuitively, a quadratic cost of \((s_1, s_2)\) only affects the sub-block of the Hessian that involves \((s_1, s_2)\). Since each residual couples at most two consecutive states, most of the Hessian is zero.

(Figure: the non-zero elements of the Hessian of the rocket landing system.)
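Concretely, the motion residual between \(s_i\) and \(s_{i+1}\) only touches the four \(8 \times 8\) blocks \((i, i)\), \((i, i+1)\), \((i+1, i)\), \((i+1, i+1)\) of \(J^T W J\), so the Hessian has a banded, block tri-diagonal pattern,

\(J^T W J = \begin{bmatrix} A_0 & B_0 & & \\ B_0^T & A_1 & B_1 & \\ & B_1^T & A_2 & \ddots \\ & & \ddots & \ddots \end{bmatrix}\)

where each \(A_i\) and \(B_i\) is an \(8 \times 8\) block.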

For a sparse system, using the sparse Cholesky to solve it is much faster.

In the worst case, the sparse Cholesky is also an \(O(n^3)\) operation. However, the constant factor can be much smaller for sparse systems.

Why are sparse systems easier to solve?

Let’s consider an extreme case, \(A x = b\), where A is diagonal. We can solve it in \(O(n)\) steps by looping over the diagonal elements.

A general sparse solver will first permute the matrix so that it is close to block diagonal. Solving a block-diagonal system takes about \(O(a^2 n)\) steps, where \(a\) is the size of the blocks and \(n\) the number of variables.

When the system is very sparse, \(a \ll n\), so solving the sparse system takes roughly \(O(n)\) steps.

Please read Convex Optimization, page 616, for Stephen Boyd's explanation of sparse problems and separable objectives.

Sparse Solver Implementation

code link

virtual VectorXd solve_normal_eqution(NormalEqution &normal_equ) override
{
    ScopeProfiler p("solve_normal_equation");
#ifdef CHOLMOD_SUPPORT
    // Cholesky for A.T * A system
    return solve_normal_eqution_sparse<Eigen::CholmodSupernodalLLT<Eigen::SparseMatrix<double>>>(normal_equ);
#else
    return solve_normal_eqution_sparse<Eigen::SparseLU<Eigen::SparseMatrix<double>>>(normal_equ);
#endif
}

template<typename SOLVE_TYPE>
VectorXd solve_normal_eqution_sparse(NormalEqution &normal_equ)
{
    if(verbose_) std::cout << "Creating SparseView..." << std::endl;
    
    // TODO: Can't find good doc for sparseView()
    const double reference = 1e-4;
    const double epsilon = 1e-6;
    Eigen::SparseMatrix<double> sparse_lhs;
    {
        ScopeProfiler p("sparseView");
        sparse_lhs = normal_equ.lhs.sparseView(reference, epsilon);
    }
    VectorXd &rhs = normal_equ.rhs;

    SOLVE_TYPE solver;

    if(verbose_) std::cout << "Decomposing..." << std::endl;
    {
        ScopeProfiler p("matrix-decompose");
        solver.compute(sparse_lhs);
    }

    if(solver.info()!=Eigen::Success) 
    {
        assert(false && "Decompose failed");
    }

    if(verbose_) std::cout << "Back sub..." << std::endl;
    Eigen::VectorXd delta;
    {
        ScopeProfiler p("backsub");
        delta = solver.solve(rhs);
    }
    
    if(solver.info()!=Eigen::Success) 
    {
        assert(false && "Back sub failed");
    }

    return delta;
}

I used the Eigen-wrapped CHOLMOD solver. CHOLMOD is one of the best implementations of the sparse Cholesky decomposition.

There is some supporting code, but the core logic is,

  1. Convert the dense matrix to a sparse matrix with normal_equ.lhs.sparseView (this can be optimized by constructing the sparse matrix directly).
  2. Matrix factorization solver.compute(sparse_lhs).
  3. Back-substitution delta = solver.solve(rhs).

Since CHOLMOD is an extra dependency, I added #ifdef CHOLMOD_SUPPORT to fall back to Eigen's SparseLU solver when CHOLMOD is not available.

Timing

test_solvers =========================================
1.DenseSolver
Profiler: linear_function_to_normal_equation msec:21.63
solving Ax=b ...
Profiler: solve_normal_eqution eigen msec:4.413
Profiler: DenseSolver:solver_rocket_landing_least_squares msec:27.482
2.DenseSolverUpdateInPlace
Profiler: residual_function_to_normal_equation msec:0.283
solving Ax=b ...
Profiler: solve_normal_eqution eigen msec:4.495
Profiler: DenseSolverUpdateInPlace:solver_rocket_landing_least_squares msec:5.031
test_matrix_almost_equation pass
3.SparseSolver
Profiler: residual_function_to_normal_equation msec:0.142
Profiler: sparseView msec:0.271
Profiler: matrix-decompose msec:1.157
Profiler: backsub msec:0.042
Profiler: solve_normal_equation msec:1.51
Profiler: SparseSolver:solver_rocket_landing_least_squares msec:2.178
test_matrix_almost_equation pass

The comparison,

  1. Computing \(J^T J\) took 21.63 ms. With separable objectives, it took 0.283 ms.
  2. Solving the dense normal equation took 4.495 ms. With the sparse solver, it took 1.51 ms.

With the Separable Objectives functions and the Sparse Solver, the iteration time decreased from 27.482 ms to 2.178 ms.

A well-formulated Newton optimization with OK initial values normally converges in fewer than 10 iterations, so the total time to solve the rocket landing problem is about 25 ms.

Wait, do you have constraints?

In the rocket landing cost function,

\(\text{minimize} \; cost(\{s_0, s_1, …, s_n\}) = ||s_0 - s_{initial}||_{W_0}^2 + ||s_n - s_{end}||_{W_n}^2 + \sum_i ||s_{i+1} - \text{motion}(s_i)||_{W_m}^2 \\ \; \; \; \; + k1 \sum_i (s_i.\text{acceleration})^2 + k2 \sum_i (s_i.\text{dt})^2\)

There are no constraints on the acceleration and the turning rate, i.e. we can generate whatever force we want. And y can be smaller than 0, which means the rocket can go below the ground.

Man, It doesn’t work.

Simulation Result

Unconstrained rocket landing, free style.
The rocket goes into the earth.

What’s next?

In this section, we formulated the rocket landing problem as a least-squares problem and applied some tricks to make the solver faster.

Next, we will formulate the rocket landing problem as a constrained optimization problem,

\(\text{minimize} \; cost(\{s_0, s_1, …, s_n\}) = ||s_0 - s_{initial}||_{W_0}^2 + ||s_n - s_{end}||_{W_n}^2 + \sum_i ||s_{i+1} - \text{motion}(s_i)||_{W_m}^2 \\ \; \; \; \; + k1 \sum_i (s_i.\text{acceleration})^2 + k2 \sum_i (s_i.\text{dt})^2\)

\(\text{subject to:} \\ \; s_i.y > 0, \; \forall \; i \\ \; s_i.\text{thrust} > 0, \; \forall \; i\)

Where we want the y coordinate and the thrust of every state larger than 0.

We will use the Primal-Dual Interior Point Methods to solve it.

Paper Unlock

With the knowledge of this post, we can discuss some interesting papers about motion planning.

CHOMP: Gradient Optimization Techniques for Efficient Motion Planning
Nathan Ratliff, Matt Zucker, J. Andrew Bagnell, Siddhartha Srinivasa

The goal of this paper was to
(1) find a smooth trajectory which follows physics from the start state to the end state,
(2) and avoid obstacles.
The authors formulate the two objectives as cost functions and optimize them together using the least squares method.

Smooth Trajectory

The authors tried to find a trajectory from \(q_{init}\) to \(q_{goal}\). The trajectory needed to satisfy the model constraint.

The trajectory optimization problem was formulated as a least squares problem,

Equation (1) in the paper is simply a least squares cost. The authors make it look fancy by grouping all the residuals together.

To optimize the problem, we can linearize the residual function, solve the Normal Equation, and update variables.

Obstacles Avoidance

Obstacles were modeled as a Gaussian-like cost in the paper.

This cost is second order smooth, which means the optimization will be more stable.

Intuitively, we want the distance from the trajectory to obstacles to be large. E.g. if the trajectory is close to obstacles, the \(c(\cdot)\) term is large.

\(||\cdot||\) is the arc length. It acts as a weight for the obstacle cost \(c(\cdot)\).

In the end, the authors jointly optimized the trajectory cost and the obstacle cost.



Least Squares Robotics

This article is part of the Least Squares Robotics series.
