How to Land a Rocket? Part 3, Exploit the Structure


What’s in this post?

In previous parts of the rocket landing series, I formulated the rocket landing problem as a constrained least-squares problem and solved it with the primal-dual interior-point method, using a sparse LU decomposition for the linearized KKT system.

In this section, we will exploit the structure of the rocket landing problem to solve the KKT condition in linear time.

The Primal-Dual Interior-Point Method Refresher

The primal-dual interior-point method is an algorithm to solve constrained optimization problems.

Without loss of generality, the constrained least squares problem is defined as,

\(\text{minimize}_x ||Ax - b||^2\)

\(\text{subject to}: h_i(x) \leq 0, \; \forall i \)

The optimal variable \(x^*\) is computed by solving the (relaxed) KKT conditions.

\(A^TAx^* - A^Tb + \sum_i \lambda_i^* \nabla h_i(x^*) = 0 \\ h_i(x^*) \leq 0 \; \; ,\forall i\\ \lambda_i^* \geq 0 \; \; ,\forall i \\ -\lambda_i^* h_i(x^*) = 1/t \; \; ,\forall i\)

The relaxed KKT conditions consist of two parts.

Relaxed KKT system of equations.

\(A^TAx^* - A^Tb + \sum_i \lambda_i^* \nabla h_i(x^*) = 0 \\ -\lambda_i^* h_i(x^*) = 1/t \; \forall i\)

We can linearize it at \((x, \lambda)\) as,

\(\begin{bmatrix} A^T A + \sum_i \lambda_i \nabla^2 h_i & \nabla h(x)^T \\ -\text{diag}(\lambda) \nabla h(x) & -\text{diag}(h(x)) \end{bmatrix} \begin{bmatrix} \Delta x\\ \Delta \lambda \end{bmatrix} = - \begin{bmatrix} A^TAx - A^Tb + \sum_i \lambda_i \nabla h_i(x)\\ -\text{diag}( \lambda ) h(x) - 1/t \; \end{bmatrix} \; \; \; (1)\)

KKT inequality constraints.

\(h_i(x^*) \leq 0 \; \; \forall i \\ \lambda_i^* \geq 0 \; \; \forall i \)

The primal-dual interior point method solves the KKT conditions by,

  1. Solve the linearized relaxed KKT system of equations to get an update direction \((\Delta x, \Delta \lambda)\).
  2. Do a backtracking line search to keep the inequality constraints satisfied.
Please read this article for details.
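To make the two steps concrete, here is a minimal, self-contained sketch of one primal-dual iteration. It is not the post's implementation: it uses the simple box constraint \(x_i \leq 1\) (so \(h(x) = x - 1\) is linear and the \(\sum_i \lambda_i \nabla^2 h_i\) term vanishes) and assumes the current \((x, \lambda)\) is strictly feasible.

#include <Eigen/Dense>

using Eigen::MatrixXd;
using Eigen::VectorXd;

// One primal-dual Newton step for: minimize ||Ax - b||^2  subject to  x_i <= 1 for all i.
void primal_dual_step(const MatrixXd &A, const VectorXd &b,
                      VectorXd &x, VectorXd &lambda, double t)
{
    const int n = static_cast<int>(x.size());      // one constraint per variable
    const VectorXd h = x - VectorXd::Ones(n);      // h(x) = x - 1
    const MatrixXd dh = MatrixXd::Identity(n, n);  // Jacobian of h

    // Assemble the linearized relaxed KKT system (1).
    MatrixXd kkt = MatrixXd::Zero(2 * n, 2 * n);
    kkt.topLeftCorner(n, n) = A.transpose() * A;
    kkt.topRightCorner(n, n) = dh.transpose();
    kkt.bottomLeftCorner(n, n) = -(lambda.asDiagonal() * dh);
    kkt.bottomRightCorner(n, n).diagonal() = -h;

    VectorXd rhs(2 * n);
    rhs.head(n) = -(A.transpose() * (A * x - b) + dh.transpose() * lambda);
    rhs.tail(n) = lambda.cwiseProduct(h) + VectorXd::Constant(n, 1.0 / t);

    const VectorXd delta = kkt.lu().solve(rhs);

    // Backtracking line search: keep the iterate strictly feasible (h(x) < 0, lambda > 0).
    double step = 1.0;
    while ((((x + step * delta.head(n)).array() - 1.0) >= 0.0).any() ||
           ((lambda + step * delta.tail(n)).array() <= 0.0).any())
    {
        step *= 0.5;
    }

    x += step * delta.head(n);
    lambda += step * delta.tail(n);
}

In a real solver, t is increased over the outer iterations; the sketch only shows the inner Newton step.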

In this section, we will exploit the structure of (1) to solve the linear system of equations in \(O(n)\) time, whereas a general dense linear solve takes \(O(n^3)\) time.

The System of Equations

The linearized relaxed KKT system of equations is,

\(\begin{bmatrix} A^T A + \sum_i \lambda_i \nabla^2 h_i & \nabla g(x)^T \\ -\text{diag}(\lambda) \nabla g(x) & -\text{diag}(g(x)) \end{bmatrix} \begin{bmatrix} \Delta x\\ \Delta \lambda \end{bmatrix} = - \begin{bmatrix} A^TAx - A^Tb + \sum_i \lambda_i \nabla h_i(x)\\ -\text{diag}( \lambda ) g(x) - 1/t \; \end{bmatrix}\)

In the previous section, we used the sparse LU decomposition to solve it. Can we solve it faster?

Yes, we can solve it faster by exploiting the structure of the equations.

To illustrate the point, let me start with the unconstrained case.

The Block Diagonal Matrix

Given a least squares problem,

\(\text{minimize}_x ||Ax - b||^2\)

The normal equation of the unconstrained least-squares problem is \(A^T A x = A^T b\).

For the unconstrained rocket landing problem, the cost is chain-like.

\(\text{minimize} \; cost(\{s_0, s_1, …, s_n\}) = ||s_0 - s_{initial}|| _{W_0} ^2 + ||s_n - s_{end}||_ {W_n} ^2 +\sum_i ||s_{i+1} - \text{motion}(s_i)||_{W_m}^2 \)

Only nearby states share a cost term \(cost_i(x_i, x_{i+1})\). As a result, \(A^T A\) is block-diagonal.

This article defined the unconstrained rocket landing cost function in detail.

In simple math,

\(\frac{\partial^2 cost}{\partial x_i \partial x_j} \neq 0 \; \; \text{if the cost has a term that is a function of } (x_i, x_j) \\
\frac{\partial^2 cost}{\partial x_i \partial x_j} = 0 \; \; \text{otherwise}\).

We can plot the non-zeros of the \(A^T A\).

The non-zero elements of the Hessian matrix \(A^T A\)
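A toy sketch reproducing this banded pattern, under my own simplifying assumptions (1-D states and motion(s) = s, so each residual is simply \(s_{i+1} - s_i\)):

#include <Eigen/Dense>
#include <iostream>

using Eigen::MatrixXd;
using Eigen::RowVectorXd;

int main()
{
    const int n = 6;  // number of (scalar) states
    MatrixXd AtA = MatrixXd::Zero(n, n);

    for (int i = 0; i + 1 < n; ++i)
    {
        // Residual r_i = s_{i+1} - s_i; its Jacobian has +1 at column i+1 and -1 at column i.
        RowVectorXd J = RowVectorXd::Zero(n);
        J(i) = -1.0;
        J(i + 1) = 1.0;
        AtA += J.transpose() * J;  // accumulate J_i^T J_i
    }

    // Only the tridiagonal band is non-zero.
    std::cout << AtA << std::endl;
}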

As we discussed in this post, for a block-diagonal system, eliminating variables in index order takes \(O(n)\) time. I will prove it later.

Firstly, let’s briefly refresh the Schur Complement.

In the later sections, I will use the term block-diagonal for both block-diagonal and block-tridiagonal matrices, for simplicity.

The Schur Complement Refresher

In this article, I discussed variable elimination and the Schur complement.

Consider an example. Given a 2 by 2 block linear system,

\(\begin{bmatrix} A & B\\ C & D \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} E\\ F \end{bmatrix}\)

We can zero out block C by subtracting \(C A^{-1}\) times the first row from the second row. This doesn’t change the solution x of the linear system.

\((\begin{bmatrix} A & B\\ C & D \end{bmatrix} - \begin{bmatrix} 0 & 0\\ C A^{-1}A & C A^{-1}B \end{bmatrix}) \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} E\\ F \end{bmatrix} - \begin{bmatrix} 0\\ C A^{-1}E \end{bmatrix} \)

\(\begin{bmatrix} A & B\\ 0 & D - CA^{-1}B \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} E\\ F - C A^{-1}E \end{bmatrix}\)

To solve for \(x_2\), we can solve the second row,

\((D - CA^{-1}B) x_2 = F - C A^{-1}E \; \; \;(2)\)

Given \(x_2\), we can solve for \(x_1\) using the first row,

\(A x_1 = E – B x_2\)

\((D - CA^{-1}B)\) is the Schur complement of block \(A\) in \(\begin{bmatrix} A & B\\ C & D \end{bmatrix} \). But keep in mind, equation (2) is the complete eliminated system: the Schur complement together with its reduced right-hand side.
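As a sanity check, here is a small sketch (not from the post's code base) that solves a random 2-by-2 block system directly and via the elimination above, and compares the two answers:

#include <Eigen/Dense>
#include <iostream>

using Eigen::MatrixXd;
using Eigen::VectorXd;

int main()
{
    const int k = 3;  // block size
    const MatrixXd A = MatrixXd::Random(k, k) + 5.0 * MatrixXd::Identity(k, k);
    const MatrixXd B = MatrixXd::Random(k, k);
    const MatrixXd C = MatrixXd::Random(k, k);
    const MatrixXd D = MatrixXd::Random(k, k) + 5.0 * MatrixXd::Identity(k, k);
    const VectorXd E = VectorXd::Random(k);
    const VectorXd F = VectorXd::Random(k);

    // Direct solve of [A B; C D] [x1; x2] = [E; F].
    MatrixXd M(2 * k, 2 * k);
    M << A, B,
         C, D;
    VectorXd rhs(2 * k);
    rhs << E, F;
    const VectorXd x_direct = M.lu().solve(rhs);

    // Schur complement: (D - C A^{-1} B) x2 = F - C A^{-1} E, then A x1 = E - B x2.
    const MatrixXd A_inv = A.inverse();
    const VectorXd x2 = (D - C * A_inv * B).lu().solve(F - C * A_inv * E);
    const VectorXd x1 = A.lu().solve(E - B * x2);

    std::cout << "difference: "
              << (x_direct.head(k) - x1).norm() + (x_direct.tail(k) - x2).norm() << std::endl;
}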

Solve a Block-diagonal System in O(n)

Applying the Schur complement to solve a linear system of equations is basically Gaussian elimination. We will apply it to a block-diagonal system and prove that for a block-diagonal matrix, we can solve it in \(O(n)\).

In the example above, we eliminated \(x_1\) and got a smaller equation with only \(x_2\). We can solve for \(x_2\) and then use the first row to compute \(x_1\).

In Gaussian elimination, we eliminate all variables one by one except the last one, \(x_n\), and get a small equation with only \(x_n\). In the end, we use the eliminated rows to do back-substitution and recover \(x_{n-1}, x_{n-2}, …, x_2, x_1\).

Claim: Gaussian elimination takes O(n) time for a block-diagonal system.

Let’s prove it here.

Assume there is a block-diagonal linear system with 4 variables (to make my life easier).

\(H1 x = b\)

\(\begin{bmatrix}
a1 & a2 & 0 & 0 \\
a3 & a4 & a5 & 0 \\
0 & a6 & a7 & a8 \\
0 & 0 & a9 & a10
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}\)

For a running-time analysis, we only need to consider the left-hand side of the system of equations. (Keep in mind, for implementation, we have to consider the full equation)

\(H1 = \begin{bmatrix}
a1 & a2 & 0 & 0 \\
a3 & a4 & a5 & 0 \\
0 & a6 & a7 & a8 \\
0 & 0 & a9 & a10
\end{bmatrix}\)

Eliminating the first state by the Schur complement,

\(H2 = \begin{bmatrix}
a4 & a5 & 0 \\
a6 & a7 & a8 \\
0 & a9 & a10
\end{bmatrix}
-
\begin{bmatrix}
a3 \\
0\\
0
\end{bmatrix}
a1^{-1}
\begin{bmatrix}
a2 & 0 & 0
\end{bmatrix} \)

Consider the second term,

\(\begin{bmatrix}
a3 \\
0\\
0
\end{bmatrix}
a1^{-1}
\begin{bmatrix}
a2 & 0 & 0
\end{bmatrix} =
\begin{bmatrix}
a3 a1^{-1} \\
0\\
0
\end{bmatrix}
\begin{bmatrix}
a2 & 0 & 0
\end{bmatrix}
=
\begin{bmatrix}
a3 a1^{-1} a2 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix} \).

Due to the structure of the system, only one block in the above computation is non-zero.

So,

\(H2 = \begin{bmatrix}
a4 & a5 & 0 \\
a6 & a7 & a8 \\
0 & a9 & a10
\end{bmatrix}
-
\begin{bmatrix}
a3 a1^{-1} a2 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
=
\begin{bmatrix}
a4 - a3 a1^{-1} a2& a5 & 0 \\
a6 & a7 & a8 \\
0 & a9 & a10
\end{bmatrix}\)

It means that for a block-diagonal system, if we eliminate variables in index order, the only new computation is

\(a3 a1^{-1} a2\).

and the result is still block-diagonal.

Since the size of each state is fixed, the time complexity to eliminate a state is \(O(1)\), i.e. computing \(a3 a1^{-1} a2\) takes constant time.

We follow this procedure to eliminate all variables except for the last one. Each elimination takes constant time since
(1) after an elimination, the new matrix is still block-diagonal;
(2) for a block-diagonal matrix, eliminating one variable takes constant time.

Now consider a \((n, n)\) block-diagonal matrix. By induction, eliminating \((x_1, x_2, x_3, … x_{n-1})\) takes \(O(n)\) time.

Then, we solve for the last variable, and back-substitute for eliminated states in reverse order \((x_{n-1}, … x_2, x_1)\).

The time complexity to eliminate all variables is \(O(n)\), where n is the number of states. The back-substitution obviously takes \(O(n)\).
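In generic notation (my own block labels, not the post's): write the diagonal blocks as \(D_i\), the sub-diagonal blocks as \(L_i\), and the super-diagonal blocks as \(U_i\). The whole procedure, essentially the block Thomas algorithm, is a forward elimination recurrence followed by back-substitution,

\(\bar{D}_1 = D_1, \; \bar{b}_1 = b_1, \;\;\; \bar{D}_{i+1} = D_{i+1} - L_i \bar{D}_i^{-1} U_i, \;\;\; \bar{b}_{i+1} = b_{i+1} - L_i \bar{D}_i^{-1} \bar{b}_i \\ x_n = \bar{D}_n^{-1} \bar{b}_n, \;\;\; x_i = \bar{D}_i^{-1} (\bar{b}_i - U_i x_{i+1})\)

Each step touches a constant number of fixed-size blocks, which is the \(O(n)\) argument above written as a formula.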

In sum, for a block-diagonal system, we can solve it in \(O(n)\).

It’s a huge difference compared to the general case of solving a linear system, which takes \(O(n^3)\) time. And it’s definitely better than a general sparse solver, which has overhead for estimating the graph structure of the system.

Linearized KKT System of Equations

Go back to the constrained rocket landing problem.

To solve the problem, we used the primal-dual interior-point method. The bottleneck was solving the linearized KKT system of equations.

\(\begin{bmatrix} A^T A + \sum_i \lambda_i \nabla^2 h_i & \nabla g(x)^T \\ -\text{diag}(\lambda) \nabla g(x) & -\text{diag}(g(x)) \end{bmatrix} \begin{bmatrix} \Delta x\\ \Delta \lambda \end{bmatrix} = - \begin{bmatrix} A^TAx - A^Tb + \sum_i \lambda_i \nabla h_i(x)\\ -\text{diag}( \lambda ) g(x) - 1/t \; \end{bmatrix} \; \; \; (1)\)

The left-hand-side of (1) is not block-diagonal.

Non-zero elements of \(\begin{bmatrix} A^T A + \sum_i \lambda_i \nabla^2 h_i & \nabla g(x)^T \\ -\text{diag}(\lambda) \nabla g(x) & -\text{diag}(g(x)) \end{bmatrix} \)

Can we make this matrix block-diagonal so that we can solve it in \(O(n)\)?

Local Constraints, Augmented States

Due to the rocket landing problem formulation, we can permute the linearized KKT system of equations to make it block-diagonal.

Consider this problem

\(\text{minimize} \; \text{cost}(s_0, s_1, …, s_n) = \sum_i ||s_{i+1} - \text{motion}(s_{i})||^2\)

\(h_i(s_i) \leq 0, \; \; \forall \; i\)

Each cost term \(||s_{i+1} - \text{motion}(s_{i})||^2\) connects two nearby states. A constraint is a function of a single state. It’s a simplified version of the Rocket Landing problem.

The cost structure (factor-graph) with constraints

Each constraint \(h_i(s_i)\) is a function of a single state. I call it the local constraint.

Now, go back to the LHS of the linearized KKT system.

\(\begin{bmatrix} A^T A + \sum_i \lambda_i \nabla^2 h_i & \nabla g(x)^T \\ -\text{diag}(\lambda) \nabla g(x) & -\text{diag}(g(x)) \end{bmatrix} \; \; (3)\)

Note \(g(x)\) is just the stacked constraints \(h_i(x)\).

\(g(x) = \begin{bmatrix}h_1(x) \\ h_2(x) \\ …\\ h_n(x) \end{bmatrix}\)

In order to make (3) block-diagonal, let’s draw the non-zero elements of (3).

To make my life easier, I assume there are 3 states.

The non-zero pattern of (3).

The sub-block of \((x_1, x_2, x_3)\) is the sum of \(A^T A\), which is block-diagonal because of the cost structure, and \(\sum_i \lambda_i \nabla^2 h_i \), which is block-diagonal because each \(h_i\) depends on a single state.

The sub-block of \((\lambda_1, \lambda_2, \lambda_3) \) is diagonal by definition. i.e. \( -\text{diag}(g(x))\)

How about the off-diagonal blocks \((\lambda_1, \lambda_2, \lambda_3) \times (x_1, x_2, x_3) \) and \( (x_1, x_2, x_3) \times (\lambda_1, \lambda_2, \lambda_3)\) ?

Let’s do the math. Since the \(h_i\) are local constraints, \(\nabla h_i\) is non-zero only in the \(\frac{\partial h_i}{\partial x_i}\) block.

The non-zeros of (3).

Is there a pattern in the non-zeros? It’s hard to see…

Now, let’s permute the system so that \(x_i\) is followed by \(\lambda_i\).

After the permutation of (3), the non-zeros become block-diagonal.

To capture this idea, we can define the augmented state,

\(y_i = (x_i, \lambda_i)\)

We simply put the primal and dual variables that belong to the same state together.

The KKT system with the augmented state.
The number of states is 6.

We can see the linearized KKT system with augmented states is block-diagonal.

For a block-diagonal linear system of equations, we can solve it in \(O(n)\) time.

This example is a simplified version of the rocket landing problem. But the problem structure is exactly the same.

The rocket landing problem is defined as,

\(\text{minimize} \; cost(\{s_0, s_1, …, s_n\}) = ||s_0 - s_{initial}|| _{W_0} ^2 + ||s_n - s_{end}||_ {W_n} ^2 +\sum_i ||s_{i+1} - \text{motion}(s_i)||_{W_m}^2 \\ \; \; \; \; + k1 \sum_i (s_i.\text{acceleration})^2 + k2 \sum_i (s_i.\text{dt})^2\)

\(\text{subject to:} \; s_i.y > 0, \;s_i.\text{thrust} > 0, \;s_i.\text{dt} > 0 \; \forall \; i\)

The cost is chain-like. Each constraint only depends on a single state.

We can define the augmented state \(y_i = (s_i, \lambda_i)\) to get a block-diagonal linear system and solve it in \(O(n)\) time.

Implementation

Permutation, Augmented States

The implementation is in C++.

The only problem with C++ is the lack of a visualization library. I used the matplotlib-cpp library, which calls Python matplotlib from C++ to plot data.

Note: I modified the matplotlib-cpp to support the spy function. Make sure to use the matplotlib-cpp code in the source tree of this project.

The LHS of the linearized KKT system of equations of the rocket landing system is,

\(\begin{bmatrix} A^T A + \sum_i \lambda_i \nabla^2 h_i & \nabla g(x)^T \\ -\text{diag}(\lambda) \nabla g(x) & -\text{diag}(g(x)) \end{bmatrix} \; \; \; (3)\)

Using the Spy function to show the non-zero elements of the matrix.

\(\begin{bmatrix} A^T A + \sum_i \lambda_i \nabla^2 h_i &  \nabla g(x)^T \\ -\text{diag}(\lambda) \nabla g(x) &  -\text{diag}(g(x)) \end{bmatrix}\)
Non-zeros of (3). You can see the 4 sub-matrix blocks.

We can clearly see the 4 matrix blocks of (3).

Now let’s do the permutation for (3) (or, to be fancy, build the linearized KKT system of the augmented states).

void construct_primal_dual_problem_lhs(const MatrixXd &cost_hessian,
                                       const VectorXd &dual_variables, 
                                       const Constrains &constrains,
                                       MatrixXd &augmented_normal_equ_lhs)
{
    augmented_normal_equ_lhs.setZero();
    
    const int state_size = RocketState::STATE_SIZE;
    // WARNING: hard coded!
    const int constrains_size_per_size = 3;
    assert(static_cast<int>(constrains.linear_constrains.size()) == num_states_ * 3);
    const int augmented_state_size = state_size + constrains_size_per_size;

    // Copy the block-tridiagonal cost Hessian blocks into the augmented (interleaved x_i, lambda_i) layout.
    for(int state_idx = 0; state_idx < num_states_; ++ state_idx)
    {
        const int state_start_idx = state_idx * state_size;
        const int aug_state_start_idx = state_idx * augmented_state_size;
        augmented_normal_equ_lhs.block<state_size, state_size>(aug_state_start_idx, aug_state_start_idx)
         = cost_hessian.block<state_size, state_size>(state_start_idx, state_start_idx);
        
        if(state_idx != num_states_ - 1)
        {
            augmented_normal_equ_lhs.block<state_size, state_size>(aug_state_start_idx + augmented_state_size, aug_state_start_idx)
            = cost_hessian.block<state_size, state_size>(state_start_idx + state_size, state_start_idx);

            augmented_normal_equ_lhs.block<state_size, state_size>(aug_state_start_idx, aug_state_start_idx + augmented_state_size)
            = cost_hessian.block<state_size, state_size>(state_start_idx, state_start_idx + state_size);
        }
    }

    // Not general, simplified for 1d linear case.
    for(size_t constrain_idx = 0; constrain_idx < constrains.linear_constrains.size(); ++constrain_idx)
    {
        const auto &constrain_linear = constrains.linear_constrains[constrain_idx];
        const int correspond_primal_index = constrain_linear.state_index * augmented_state_size
            + constrain_linear.type_index;
        const int dual_var_index = constrain_linear.state_index * augmented_state_size + state_size
            + constrain_linear.dual_index;
        // upper left
        // hessian for linear constrain is 0. Don't need to update it

        // upper right block
        augmented_normal_equ_lhs(correspond_primal_index, dual_var_index) = constrain_linear.jacobian();
        // lower left block
        augmented_normal_equ_lhs(dual_var_index, correspond_primal_index) 
            = - dual_variables[constrain_idx] * constrain_linear.jacobian();
        // lower right block
        augmented_normal_equ_lhs(dual_var_index, dual_var_index) = - constrain_linear.h();
    }
}

Instead of permuting the KKT matrix, the implementation builds the system directly in the augmented-state order, because that is easier to implement.

Note: since we permuted the rows and columns of the linear system of equations, to keep its solution unchanged we should also permute the RHS and the solution vector x accordingly. I skipped those steps above for simplicity; a sketch of the explicit permutation route follows.
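For completeness, here is a sketch of that explicit permutation route (not part of the post's code): build the interleaving permutation with Eigen, permute the LHS, RHS, and solution consistently, and map the answer back. The helper name and the dense solve standing in for the block-diagonal solver are my own simplifications.

#include <Eigen/Dense>

using Eigen::Dynamic;
using Eigen::MatrixXd;
using Eigen::PermutationMatrix;
using Eigen::VectorXd;

// Solve lhs * x = rhs where the original ordering is [x_0 ... x_{n-1} | lambda_0 ... lambda_{n-1}],
// by permuting to the interleaved ordering [x_0 lambda_0 | x_1 lambda_1 | ...].
VectorXd solve_with_permutation(const MatrixXd &lhs, const VectorXd &rhs,
                                int num_states, int state_size, int dual_size)
{
    PermutationMatrix<Dynamic, Dynamic> P(lhs.rows());
    int new_idx = 0;
    for (int i = 0; i < num_states; ++i)
    {
        for (int k = 0; k < state_size; ++k)
            P.indices()[i * state_size + k] = new_idx++;                           // x_i block
        for (int k = 0; k < dual_size; ++k)
            P.indices()[num_states * state_size + i * dual_size + k] = new_idx++;  // lambda_i block
    }

    const MatrixXd permuted_lhs = P * lhs * P.transpose();  // permute rows and columns
    const VectorXd permuted_rhs = P * rhs;

    // Here one would call the block-diagonal solver; a dense solve keeps the sketch short.
    const VectorXd y = permuted_lhs.lu().solve(permuted_rhs);
    return P.transpose() * y;  // undo the permutation on the solution
}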

After the permutation, let’s Spy the new LHS of the linear system of equations,

Block-diagonal

Block-Diagonal Solver

To solve a block-diagonal system, we just need to apply the Schur complement iteratively and then do the back-substitution.

// Block Diagonal
// e.g:
// | A1 A2          |
// | B1 B2 B3       |
// |    C1 C2 C3    |
// |       D1 D2 D3 |
// |          E1 E2 |
class BlockDiagonalSolver
{
public:
    // | A11 A12 | x1 = |b1|
    // | A21 A22 | x2 = |b2|
    struct EliminationBlock
    {
        EliminationBlock(const MatrixXd &iA11,
                         const MatrixXd &iA12,
                         const MatrixXd &iA11_inv,
                         const VectorXd &ib1)
            : A11(iA11), A12(iA12), A11_inv(iA11_inv), b1(ib1)
        {
        }

        MatrixXd A11;
        MatrixXd A12;
        MatrixXd A11_inv;
        VectorXd b1;
    };

// TODO: the const correctness of Eigen::block was messed up.
#define GET_MAT_STATE_BLOCK(lhs, num_state_rows, num_state_cols) \
lhs.block( (num_state_rows) * state_size_, (num_state_cols) * state_size_, state_size_, state_size_)

// TODO: the const correctness of Eigen::block was messed up.
#define GET_VECTOR_STATE_BLOCK(rhs, num_state_rows) \
rhs.block( (num_state_rows) * state_size_, 0, state_size_, 1)

    VectorXd solve(const NormalEqution &normal_equ, 
                    const int state_size,
                    const int num_states)
    {
        ScopeProfiler p("BlockDiagonalSolver_solve");

        state_size_ = state_size;
        num_states_ = num_states;

        assert(state_size_ * num_states_ == normal_equ.lhs.rows());
        assert(state_size_ * num_states_ == normal_equ.lhs.cols());
        assert(state_size_ * num_states_ == normal_equ.rhs.rows());

        if(false) PythonmatplotVisualizer().spy_matrix(normal_equ.lhs);

        const MatrixXd &lhs = normal_equ.lhs;
        const VectorXd &rhs = normal_equ.rhs;

        MatrixXd marginal_left = GET_MAT_STATE_BLOCK(lhs, 0, 0);
        VectorXd marginal_right = GET_VECTOR_STATE_BLOCK(rhs, 0);

        std::vector<EliminationBlock> elimination_chain;
        elimination_chain.reserve(num_states_ - 1);

        // eliminate by block
        for(int state_idx = 0; state_idx < num_states_ - 1; ++ state_idx)
        {
            const MatrixXd A11 = marginal_left;
            const MatrixXd A12 = GET_MAT_STATE_BLOCK(lhs, state_idx, state_idx + 1);
            const MatrixXd A21 = GET_MAT_STATE_BLOCK(lhs, state_idx + 1, state_idx);
            const MatrixXd A22 = GET_MAT_STATE_BLOCK(lhs, state_idx + 1, state_idx + 1);
            const VectorXd b1 = marginal_right;
            const VectorXd b2 = GET_VECTOR_STATE_BLOCK(rhs, state_idx + 1);
            const MatrixXd A11_inv = A11.inverse();

            marginal_left = (A22 - A21 * A11_inv * A12);
            marginal_right = b2 - A21 * A11_inv * b1;
            
            elimination_chain.emplace_back(A11, A12, A11_inv, b1);
        }

        assert(marginal_right.size() == state_size_);

        VectorXd x_result = VectorXd::Zero(state_size_ * num_states_);
        VectorXd x_last = marginal_left.inverse() * marginal_right;
        GET_VECTOR_STATE_BLOCK(x_result, num_states_ - 1) = x_last;

        // back substitution
        for(int x_idx = num_states_ - 2; x_idx >= 0; --x_idx)
        {
            EliminationBlock &eblock = elimination_chain.at(x_idx);
            x_last =  eblock.A11_inv * (eblock.b1 - eblock.A12 * x_last);
            GET_VECTOR_STATE_BLOCK(x_result, x_idx) = x_last;
        }

        return x_result;
    }
private:
    int state_size_;
    int num_states_;
};

The elimination loop does the Schur complement for all blocks.
The statement marginal_left.inverse() * marginal_right solves for the last variable \(x_n\).
The back-substitution loop computes the remaining variables \(x_{n-1}, …, x_1, x_0\).

(I don’t remember why I used the macro. From the comment, it looks like it relates to the const-correctness of Eigen::block.)

Test

The key to software development is testing and verification, and making the software easy to test and verify.

Since we already had a baseline implementation in the previous section, we can compare the block-diagonal solver with the LU decomposition baseline. The code is in test.cpp.

How fast?

primal_dual_profile =========================================
<Eigen sparse LU solver>
...
Profiler: solve_iter msec:3.498
Profiler: residual_function_to_normal_equation msec:0.241
Profiler: construct_primal_dual_problem msec:0.365
Profiler: sparseView msec:0.414
Profiler: matrix-decompose msec:2.347
Profiler: backsub msec:0.048
Profiler: solve_normal_equation msec:2.834
search_success :1
final_line_search_step :1
Profiler: solve_iter msec:3.552
Profiler: residual_function_to_normal_equation msec:0.265
Profiler: construct_primal_dual_problem msec:0.391
Profiler: sparseView msec:0.422
Profiler: matrix-decompose msec:2.066
Profiler: backsub msec:0.066
Profiler: solve_normal_equation msec:2.573
search_success :1
final_line_search_step :0.5
stop at iteration:23
Profiler: solve_iter msec:3.354
Profiler: primal_dual_sparse_solver msec:78.862

<block-diagonal solver>
...
Profiler: solve_iter msec:1.062
Profiler: residual_function_to_normal_equation msec:0.264
Profiler: construct_primal_dual_problem_strutural msec:0.237
Profiler: BlockDiagonalSolver_solve msec:0.43
search_success :1
final_line_search_step :1
Profiler: solve_iter msec:1.011
Profiler: residual_function_to_normal_equation msec:0.213
Profiler: construct_primal_dual_problem_strutural msec:0.164
Profiler: BlockDiagonalSolver_solve msec:0.424
search_success :1
final_line_search_step :1
Profiler: solve_iter msec:0.891
Profiler: residual_function_to_normal_equation msec:0.195
Profiler: construct_primal_dual_problem_strutural msec:0.142
Profiler: BlockDiagonalSolver_solve msec:0.381
search_success :1
final_line_search_step :0.5
stop at iteration:23
Profiler: solve_iter msec:0.814
Profiler: primal_dual_structural_solver msec:23.097

In the previous implementation, we used the Eigen sparse LU solver to solve the linearized KKT condition. The time complexity of sparse LU on this system is also \(O(n)\), but the constant factor in the big O is much larger.

Profiler: solve_normal_equation msec:2.573

On average, the sparse LU solver took about 2.5 ms to solve the system.

In the block-diagonal implementation, we solved the permuted KKT system by applying the Schur Complement iteratively.

Profiler: BlockDiagonalSolver_solve msec:0.424

On average, the simple block-diagonal solver took about 0.4 ms to solve the system.

There were other operations in the solver, e.g. constructing the linear system, but those operations were not the bottleneck and can easily be optimized.

For a well-formulated least-squares problem with an okay initial guess, the Newton method usually converges within 10 iterations. With the block-diagonal solver, we can solve the rocket landing optimization problem in about 5 ms.

Paper Unlock

Fast Model Predictive Control Using Online Optimization
Yang Wang and Stephen Boyd

The authors tried to solve the Model Predictive Control (MPC) problem efficiently by exploiting the structure of the linearized KKT system of equations in a different way.

The problem definition

The MPC problem, as defined in the paper, is a quadratic programming (QP) problem (least squares is a subset of quadratic programming): the cost function is a quadratic form, and there are equality and inequality constraints.

The Inequality Constraints

The inequality constraints are handled by the primal barrier method.

The barrier method converts the hard inequality constraints into a differentiable cost term, so we can minimize it together with the original cost using the Newton method.
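Roughly, using only the symbols that appear in the system (8) below (the linear cost term \(q\) and the equality right-hand side \(b\) are my own placeholder names), the barrier subproblem looks like,

\(\text{minimize}_z \; z^T H z + q^T z + \kappa \sum_i -\log(h_i - p_i^T z) \\ \text{subject to:} \; Cz = b\)

where \(p_i^T\) is the i-th row of \(P\). Differentiating the barrier term twice gives the \(\kappa P^T \text{diag}(d)^2 P\) block in (8), with \(d_i = 1/(h_i - p_i^T z)\).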

The Optimality Conditions & Update

To solve a quadratic programming problem, we just need to solve its KKT conditions.

(8) is the linearized KKT system of equations. To make sure the updated variable stays feasible, \(Pz < h\), the authors did a backtracking line search.

Exploit Sparsity

Here is the interesting part.

The authors wanted to convert (8) into a block-diagonal linear system of equations so that they can solve (8) faster.

The discussion was very quick. Let me break it down.

Given (8),

\(\begin{bmatrix} 2H + \kappa P^T \text{diag}(d)^2 P & C^T \\ C & 0\end{bmatrix} \begin{bmatrix} \Delta z \\ \Delta v \end{bmatrix} = - \begin{bmatrix} r_d \\ r_p \end{bmatrix}\)

Let \(\Phi = 2H + \kappa P^T \text{diag}(d)^2 P \),

\(\begin{bmatrix} \Phi & C^T \\ C & 0\end{bmatrix} \begin{bmatrix} \Delta z \\ \Delta v \end{bmatrix} = - \begin{bmatrix} r_d \\ r_p \end{bmatrix}\)

Doing the Schur complement to eliminate \(\Delta z\), we get,

\((C \Phi^{-1} C^T) \Delta v = r_p - C \Phi ^{-1} r_d\)

We can solve this for \(\Delta v\), and then use the first row of (8) to compute \(\Delta z\).

The key observation is that \(Y = C \Phi^{-1} C^T\) is block-diagonal, so we can solve \((C \Phi^{-1} C^T) \Delta v = r_p - C \Phi ^{-1} r_d\) efficiently.

Note: block-tridiagonal here is the same structure I called block-diagonal above.

In general, the idea is that products of block-banded matrices stay block-banded (with a possibly wider band); e.g. the product of identity matrices is still the identity.

Because \(\Phi\) is strictly block-diagonal (one block per time step, so \(\Phi^{-1}\) is block-diagonal too) and \(C\), \(C^T\) are block-banded (remember the plot for equation (3)), \(Y = C \Phi^{-1} C^T\) is block-tridiagonal.
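A quick numerical check of that structure claim, under the assumption that \(\Phi\) is strictly block-diagonal (one block per time step) and \(C\) is block-bidiagonal, as in the paper:

#include <Eigen/Dense>
#include <iostream>

using Eigen::MatrixXd;

int main()
{
    const int k = 2;  // state size
    const int T = 4;  // number of time steps
    const int n = k * T;

    MatrixXd Phi = MatrixXd::Zero(n, n);
    MatrixXd C = MatrixXd::Zero(n, n);
    for (int i = 0; i < T; ++i)
    {
        Phi.block(i * k, i * k, k, k) = MatrixXd::Random(k, k) + 3.0 * MatrixXd::Identity(k, k);
        C.block(i * k, i * k, k, k) = MatrixXd::Random(k, k);
        if (i > 0)
            C.block(i * k, (i - 1) * k, k, k) = MatrixXd::Random(k, k);
    }

    const MatrixXd Y = C * Phi.inverse() * C.transpose();

    // Print the non-zero pattern of Y: only the block-tridiagonal band is non-zero.
    std::cout << (Y.array().abs() > 1e-9).cast<int>() << std::endl;
}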

“Step 2” in the paper is basically the block-diagonal solver we discussed above, but the block-diagonal solver works for non-PSD matrices too. Note: the running time reported in the paper is \(O(Tn^3)\), where n is the size of the state and T is the number of states.

Alternative

We solved the inequality-constrained problem with the permutation/augmented-state method.

The permutation/augmented-state method works for equality constraints too. It’s easy to see that each equality constraint links two nearby states. So if we put the primal variables, inequality-dual variables, and equality-dual variables together, the linearized KKT system should still be block-diagonal (with a larger band).

In other words, we can define the augmented state \(y_i = (x_i, \lambda_i, v_i)\), where \(x_i\) is the primal variable at index i, \(\lambda_i\) is the inequality dual variable for \(x_i\), and \(v_i\) is the equality dual variable for \(x_i\).

Then, we permute the KKT system accordingly. The permuted system is block-diagonal. And we can solve the system by the Schur complement efficiently.

On the contrary, the \(Y = C \Phi^{-1} C^T\) method can’t handle the inequality and equality constraints together. And the inequality constraints make the system not positive semi-definite, so we can’t apply a Cholesky factorization to \(Y\).

The big-O time complexity of the two methods is the same. In terms of implementation, I think the augmented-state/permutation method is better (with obvious bias) because it’s easier to implement, which is important in practice.

I feel like I am doing something new. Let me know if there is a paper about permuting the KKT system. If not, I will apply to be Stephen Boyd’s PhD student. I am a big fan!

What’s Next?

In this post, we modeled the motion constraints as soft costs.

\(\text{minimize} \; cost\{s_0, s_1, …, s_n\} = ||s_0 - s_{initial}|| _{W_0} ^2 + ||s_n - s_{end}||_ {W_n} ^2 +\sum_i ||s_{i+1} - \text{motion}(s_i)||_{W_m}^2 \\ \; \; \; \; + k1 \sum_i (s_i.\text{acceleration})^2 + k2 \sum_i (s_i.\text{dt})^2\)

\(\text{subject to:} \; s_i.y > 0, \;s_i.\text{thrust} > 0, \;s_i.\text{dt} > 0 \; \forall \; i\)

As we saw in the paper, the motion constraints were modeled as equality constraints.

In the next section, we are going to formulate the rocket landing problem as an inequality- and equality-constrained problem,

\(\text{minimize} \; cost\{s_0, s_1, …, s_n\} = ||s_0 - s_{initial}|| _{W_0} ^2 + ||s_n - s_{end}||_ {W_n} ^2 \\ \; \; \; \; + k1 \sum_i (s_i.\text{acceleration})^2 + k2 \sum_i (s_i.\text{dt})^2\)

\(\text{subject to:} \\
\; s_i.y > 0, \;s_i.\text{thrust} > 0, \;s_i.\text{dt} > 0 \; \forall \; i \\
\text{motion}(s_i) = s_{i+1}\)

One obvious method to solve it is applying the augmented-state method to the KKT conditions.

But there is a more efficient way to solve it: Differential Dynamic Programming.


Least Squares Robotics

This article is part of the Least Squares Robotics series.
