Time Calibration

Why write this post?

I’ve discussed the Lucas-Kanade tracking algorithm before. We formulated it as a least-squares problem and derived the update rules for it. Since images are discrete functions, we had to do (1) numerical differentiation and (2) interpolation.

Time calibration is surprisingly similar to Lucas-Kanade.

The International Conference on Intelligent Robots and Systems (IROS) 2018 best student paper was about time calibration. I’d like to discuss the basics of the time calibration problem and go through the highlights of the paper.

What’s in this post?

Time synchronization is critical for real-time systems. But the synchronization error is usually not modeled in optimization applications.

In this post, I will

  • Formulate time calibration as a least-squares problem.
  • Implement it in Python.
  • Briefly discuss visual odometry.
  • Discuss the IROS 2018 best student paper: time calibration for visual-inertial odometry.

Problem definition

Given a function \(f(t)\) and a time-shifted version of it, \(g(t) = f(t + dt_{\text{true}})\), we want to find the \(dt^*\) such that \(f(t)\) and \(g(t + dt^*)\) are aligned.

Translated to math, we want,

\(dt^* = \text{argmin}_{dt} \; \text{cost}(dt), \quad \text{cost}(dt) = \sum_{t}||f(t) - g(t + dt)||^2\)

It is a 1d optimization, so a simple for-loop search over \(dt\) (see the sketch below) is perfectly fine. But least squares is more flexible, because we can jointly optimize the time calibration cost with whatever other costs we have.
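For intuition, here is a minimal brute-force sketch with made-up signals (note the sign convention: since \(g\) is \(f\) shifted by \(+0.3\)s, the recovered offset is \(-0.3\)s):

import numpy as np

def brute_force_search(t, f, g, dt_candidates):
    """Grid-search the dt that minimizes sum_t ||f(t) - g(t + dt)||^2."""
    costs = []
    for dt in dt_candidates:
        # Evaluate g at the shifted (off-grid) times by linear interpolation.
        g_shifted = np.interp(t + dt, t, g)
        costs.append(np.sum((f - g_shifted) ** 2))
    return dt_candidates[np.argmin(costs)]

t = np.linspace(0.0, 10.0, 1000)
f = np.sin(t)
g = np.sin(t + 0.3)  # g is f shifted by 0.3s
print(brute_force_search(t, f, g, np.linspace(-1.0, 1.0, 201)))  # ~ -0.3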

To make sure we can follow the derivation, let me do a quick nonlinear least squares refresher.

Least Squares Refresher

For a residual function \(r(x)\), define the cost function to be,

\(\text{minimize} \; cost(x) = r(x)^T r(x)\).

We linearize the residual function at the current variable \( x_{0} \) as,

\(r(\Delta{x}) \approx J(x_0) \Delta{x} + r(x_0)\).

where \(\Delta{x} = x - x_{0}\).

To simplify notation, let \(J = J(x_0)\) and \(b = r(x_0)\).

Plugging into the cost function, \(\text{cost}(x) \approx (J \Delta{x} + b)^T (J \Delta{x} + b)\), and expanding,

\(\text{cost}(x) \approx \Delta{x}^T J^T J \Delta{x} + 2 b^T J \Delta{x} + b^T b\)

This is a quadratic in \(\Delta{x}\). Setting its gradient \(2 J^T J \Delta{x} + 2 J^T b\) to zero, we can minimize the linearized cost in one step by solving the normal equation,

\(J^TJ \Delta{x} = -J^Tb\)

and we update x by,

\(x = x_{0} + \Delta{x}\)

We iterate until a termination condition is met, e.g. \(\Delta{x}\) is small.

Nonlinear least squares summary:

  1. Linearize the residual function \(r(x)\) at the current variable \(x_0\): \(r(\Delta x) \approx J \Delta x + b\), where \(\Delta{x} = x - x_{0}\). The cost function \(\text{cost}(x) = r(x)^T r(x)\) is then approximated as a quadratic form.
  2. Minimize the quadratic form in one step by solving the normal equation \(J^TJ \Delta{x} = -J^Tb\).
  3. Because of the change of variable \(\Delta{x} = x - x_{0}\), update the variable by \(x = x_{0} + \Delta{x}\).
  4. If not converged, e.g. \(\Delta{x}\) is still large, go back to 1.
Check this post for details: https://wang-yimu.com/least-squares-refresher/
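As a concrete reference, here is a minimal Gauss-Newton sketch of these four steps (my own sketch, not code from the repo below; residual_func and jacobian_func are placeholders you supply):

import numpy as np

def gauss_newton(residual_func, jacobian_func, x0, max_iteration=50, tol=1e-10):
    """Minimize cost(x) = r(x)^T r(x) by repeated linearization.

    residual_func(x) -> r, shape (num_residuals,)
    jacobian_func(x) -> J, shape (num_residuals, num_variables)
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iteration):
        J = jacobian_func(x)                      # 1. linearize at current x
        b = residual_func(x)
        dx = np.linalg.solve(J.T @ J, -J.T @ b)   # 2. solve the normal equation
        x = x + dx                                # 3. update the variable
        if np.linalg.norm(dx) < tol:              # 4. converged?
            break
    return x

For a 1d variable, \(J^T J\) is a scalar and a plain division replaces np.linalg.solve; this will come up again in the implementation section.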

The Time Calibration Problem Formulation

Going back to the cost function for time calibration, we want to minimize,

\( \text{minimize} \; \text{cost}(dt) = \sum_{t} ||f(t) - g(t + dt) ||^2\)

Let \(r_t(dt) = f(t) - g(t + dt)\).

\( \text{minimize} \; cost(dt) = \sum_{t} || r_t(dt) ||^2\)

In English, we want to search (optimize) for a \(dt\) such that the difference between the two functions is minimized.

The gray area is basically the cost value.

Next, we will follow the least squares procedure to solve the problem.

  1. Linearize the residual function \(r(x)\) at the current variable \(x_0\): \(r(\Delta x) \approx J \Delta x + b\), where \(\Delta{x} = x - x_{0}\). The cost function \(\text{cost}(x) = r(x)^T r(x)\) is then approximated as a quadratic form.
  2. Minimize the quadratic form in one step by solving the normal equation \(J^TJ \Delta{x} = -J^Tb\).
  3. Because of the change of variable \(\Delta{x} = x - x_{0}\), update the variable by \(x = x_{0} + \Delta{x}\).
  4. If not converged, e.g. \(\Delta{x}\) is still large, go back to 1.

Linearize the Residual function

To solve the least squares system, we need to linearize the residual function at the current estimate \(dt_0\),

\(r(\Delta dt) \approx J(dt_0) \Delta dt + r(dt_0)\).

Without loss of generality, let’s consider one of the residual terms.

\(r_t(dt) = f(t) - g(t + dt) \)

It can be linearized at \(dt_0\),

\(r_t(\Delta dt) \approx \frac{\partial (f(t) - g(t + dt))}{\partial dt}\Big|_{dt=dt_0} \Delta dt + r_t(dt_0) \)

Since \(f(t)\) does not depend on \(dt\),

\(r_t(\Delta dt) \approx -\frac{\partial g(t + dt)}{\partial dt}\Big|_{dt=dt_0} \Delta dt + r_t(dt_0) \)

Let \(J_t(dt_0) = -\frac{\partial g(t + dt)}{\partial dt}\Big|_{dt=dt_0}\)

Residual functions are indexed by \(t\). Stacking all the residuals together, we get,

\(\begin{bmatrix}r_0(\Delta dt )\\ \vdots\\ r_i( \Delta dt ) \\ \vdots\\ r_n( \Delta dt )\end{bmatrix} = \begin{bmatrix} J_0(dt_0) \\ \vdots \\ J_i(dt_0) \\ \vdots \\ J_n(dt_0) \end{bmatrix} \begin{bmatrix}\Delta dt \end{bmatrix} + \begin{bmatrix}r_0(dt_0)\\ \vdots\\ r_i(dt_0) \\ \vdots\\ r_n(dt_0)\end{bmatrix} \; \; (1) \)

It is in the form of,

\(r(\Delta dt) = J \Delta dt + b\).


Now, the residual function is linearized.

We can minimize the linearized cost by solving the normal equation,

\(J^T J \Delta dt = -J^T b\)

and update by,

\(dt = dt_{0} + \Delta dt\)

With the big picture in mind, we can discuss a technical detail: how do we compute \(J_t(dt_0) = -\frac{\partial g(t + dt)}{\partial dt}\big|_{dt=dt_0}\)?

Derivative? Non-Integer Index?

In practice \(g(t)\) is usually a discrete function. How can we compute the derivative?

We can do it numerically, i.e. compute the slope of \(g(t)\).

By the definition of the derivative (dropping the limit \(\delta \to 0\)),

\( \frac{\partial g}{\partial dt} (t) \approx \frac{(g(t + \delta) – g(t – \delta))} { 2\delta}\)

How about non-integer indices?

We can do interpolation.
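Putting the two tricks together, here is a small numpy sketch (I use np.gradient for the central difference here; the repo code shown later uses an equivalent np.convolve trick):

import numpy as np

t = np.arange(0.0, 10.0, 0.1)  # uniform sample times of the discrete g(t)
g = np.sin(t)

# (1) Numerical derivative: central difference (g(t+d) - g(t-d)) / (2d).
#     np.gradient uses central differences inside, one-sided at the ends.
dg_dt = np.gradient(g, t[1] - t[0])

# (2) Interpolation: evaluate g and its slope at non-integer (off-grid) times.
query_times = t + 0.03
g_interp = np.interp(query_times, t, g)
slope_interp = np.interp(query_times, t, dg_dt)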

Time Calibration Summary

Time calibration is a least squares problem. It can be summarized as,

  1. Linearize the residual function. Because the residual functions are built from discrete samples, we need to do (a) numerical differentiation and (b) interpolation.
  2. Then, the cost is approximated as a quadratic form. We can minimize the quadratic form by solving the normal equation.
  3. Update variables.
  4. Go back to 1 if not converged.

Looks familiar?

The time calibration problem is exactly Lucas-Kanade in 1d.

I highly recommend reading the original Lucas-Kanade paper. Classical papers are fun!

Implementation

Code link: https://github.com/yimuw/yimu-blog/tree/master/least_squares/time_calibration

Let’s go through some key functions.

Nonlinear Least Squares

def least_squares_calibrate(self, func1, func2):
    """
        The least squares loop
    """
    for iteration in range(self.max_iteration):
        cost = self.compute_cost(func1, func2)
        # (cost printing omitted here)
        jacobian = self.compute_jacobian(func1)
        b = self.compute_b(func1, func2)
        delta = self.solve_normal_equation(jacobian, b)
        self.variable_time += delta

The classical least squares update.

Compute Jacobian

def compute_jacobian(self, func1):
    """
        Compute the derivative of residual w.r.t dt
    """
    # compute derivative
    dt = func1.time[1] - func1.time[0]
    # np.convolve 'same' has boundary effect, so pad the edges and use 'valid'.
    value_padded = np.concatenate(
        [[func1.values[0]], func1.values, [func1.values[-1]]])
    # np.convolve flips the kernel, so this computes the central difference
    # (values[i+1] - values[i-1]) / (2 * dt).
    dfunc1_dt = Function(func1.time, np.convolve(
        value_padded, [0.5 / dt, 0, - 0.5 / dt], 'valid'))

    jacobian = np.interp(func1.time + self.variable_time,
                         dfunc1_dt.time, dfunc1_dt.values)

    return jacobian

I used np.convolve to compute the numerical derivative. The interpolation is done by np.interp.

Solve the Normal Equation

def solve_normal_equation(self, jacobian, b):
    """
        solve the normal equation
    """
    lhs = jacobian.T @ jacobian
    rhs = -jacobian.T @ b

    # NOTE: np.linalg.solve(lhs, rhs) doesn't work for the 1d (scalar) case
    delta = rhs / lhs
    return delta

Interestingly, np.linalg.solve doesn’t work for the 1d (scalar) case, so we divide directly.

Run the code to see what the output looks like.
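If you don’t want to clone the repo, here is a self-contained toy version of the whole pipeline, my own sketch rather than the repo code, using the same recipe of numerical derivative, interpolation, and a scalar normal equation:

import numpy as np

def calibrate(t, f, g, dt0=0.0, max_iteration=20):
    """1d Gauss-Newton for r_t(dt) = f(t) - g(t + dt)."""
    dg_dt = np.gradient(g, t[1] - t[0])     # numerical derivative of g
    dt = dt0
    for _ in range(max_iteration):
        b = f - np.interp(t + dt, t, g)     # residual r(dt_0)
        J = -np.interp(t + dt, t, dg_dt)    # J_t = -dg/dt, interpolated at t + dt
        delta = -(J @ b) / (J @ J)          # scalar normal equation
        dt += delta
        if abs(delta) < 1e-10:              # converged?
            break
    return dt

t = np.linspace(0.0, 10.0, 1000)
f = np.sin(t)
g = np.sin(t + 0.3)        # g is f shifted by 0.3s
print(calibrate(t, f, g))  # converges to ~ -0.3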

The IROS 2018 best student paper discussed time calibration. The idea is the same as Lucas-Kanade.

The IROS 2018 Best Student Paper

IROS 2018 Best Student Paper:


Online Temporal Calibration for Monocular Visual-Inertial Systems,
Tong Qin and Shaojie Shen

Time calibration is critical for real-time systems, yet it is usually not modeled.

In this paper, Tong jointly optimized the time-calibration cost with the visual-inertial cost.

To understand time calibration for visual odometry, let me give an (oversimplified) introduction to visual odometry and the reprojection cost.

Visual Odometry, Projection Error

There is a 3d trajectory (position, rotation, …). It is a function of time,

\(\text{traj}(t)\)

In English, \(\text{traj}(t)\) tells us the “state” (position, rotation, …) of the camera at time t.

For a 3d feature point \(P\), the projection of the feature point onto the image plane at time t is given by,

\(\pi (P, \text{traj}(t)) \)

where \(\pi\) is the image projection function.

In English, the projection function \(\pi\) tells us where \(P\) (a point in the world frame) is projected to in the image frame.

\(p_{\text{prediction-in-image}} = \pi (P, \text{traj}(t)) \)

In visual odometry, the measurement residual for a visual feature is,

\(\text{projection-residual}(P, \text{traj}(t)) = p - p_{\text{prediction-in-image}}\)

\(\text{projection-residual}(P, \text{traj}(t)) = p - \pi (P, \text{traj}(t))\)

where \(p\) is a tracked (or associated) image feature in the image frame. It is a constant in the cost.

In words,

  • We want
    • the projection of the 3d feature to be close to the tracked 2d feature,
  • by
    • tweaking (optimizing) the trajectory and the 3d feature.

The projection error is the most fundamental cost in visual odometry.
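To make \(\pi\) concrete, here is a minimal pinhole sketch (the intrinsics fx, fy, cx, cy are made-up numbers, and (R, tr) is assumed to be the world-to-camera transform standing in for \(\text{traj}(t)\)):

import numpy as np

def project(P_world, R, tr, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """pi(P, traj(t)): project a 3d world point to a pixel location."""
    P_cam = R @ P_world + tr            # world frame -> camera frame
    u = fx * P_cam[0] / P_cam[2] + cx   # pinhole model: perspective division
    v = fy * P_cam[1] / P_cam[2] + cy
    return np.array([u, v])

def projection_residual(p_tracked, P_world, R, tr):
    """p - pi(P, traj(t)): tracked 2d feature minus its predicted projection."""
    return p_tracked - project(P_world, R, tr)

# A point 5m in front of an identity-pose camera projects to the image center.
print(project(np.array([0.0, 0.0, 5.0]), np.eye(3), np.zeros(3)))  # [320. 240.]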

Time calibration for VO

Assuming there is a time mismatch \(dt\) between the trajectory and the images (the camera clock vs. the IMU clock), the projection error becomes,

\(\text{projection-residual}(P, \text{traj}, dt) = p - \pi (P, \text{traj}(t + dt))\)

To find \(dt^*\) in the least-squares framework, we take the derivative w.r.t. \(dt\) and solve the normal equation.

Taking the derivative w.r.t. \(dt\),

\(\frac{\partial (\text{projection-residual}(P, \text{traj}, dt))}{\partial dt} = \frac{\partial (p - \pi (P, \text{traj}(t + dt)))}{\partial dt}\)

Dropping the term that doesn’t change with \(dt\) (the tracked feature \(p\) is a constant),

\(\frac{\partial (\text{projection-residual}(P, \text{traj}, dt))} {\partial dt} = -\frac{\partial (\pi (P, \text{traj}(t + dt)))}{\partial dt}\)

Let’s take a closer look at the right-hand side.

The \(\pi\) function tells us where the feature is in the image. So define,

\(\text{feature-in-image}(t + dt) = \pi (P, \text{traj}(t + dt))\)

Plugging this into the above equation, we can compute the derivative numerically!

\(\frac{\partial (\text{projection-residual}(P, \text{traj}, dt))} {\partial dt} \\ = -\frac{\partial \, \text{feature-in-image}(t + dt)}{\partial dt} \approx -\frac{\text{feature-in-image}(t + dt + \delta) - \text{feature-in-image}(t + dt - \delta)}{2 \delta}\)

where \(\delta\) is the camera frame interval.

For this term,

\(\frac{\text{feature-in-image}(t + dt + \delta) - \text{feature-in-image}(t + dt - \delta)}{2 \delta}\)

we can compute it numerically by taking the difference of the tracked feature positions in two frames and dividing by the time between them.

With \(\frac{\partial (\text{projection-residual}(P, \text{traj}, dt))} {\partial dt}\), we can optimize for the time offset \(dt\) in the least-squares framework.

This is the key idea of the paper.


Let me match a couple of notations with the paper.

The author defines \(V\) as the speed of features in the image frame. In math,

\(V := \frac{\text{feature-in-image}(t + dt + \delta) - \text{feature-in-image}(t + dt - \delta)}{2 \delta}\)

To make this cost look like the one in the paper, linearize the projection residual w.r.t. \(dt\) (note the sign: from the derivation above, \(J_t = -V\)),

\(\text{projection-residual}(P, \text{traj}, \Delta dt) \approx p - \pi (P, \text{traj}(t + dt_0)) - V \Delta dt\)

Now we can jointly optimize \(dt\) and all the fancy visual-odometry costs.
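Here is a sketch of that bookkeeping (variable names are mine, not the paper’s; the sign follows the \(J_t = -V\) convention derived above):

import numpy as np

def feature_velocity(p_prev, p_curr, frame_interval):
    """V: speed of a tracked feature on the image plane, in pixels per second."""
    return (p_curr - p_prev) / frame_interval

def linearized_projection_residual(p_tracked, p_predicted, V, delta_dt):
    """Projection residual linearized around dt_0: r(dt_0) - V * delta_dt."""
    return p_tracked - p_predicted - V * delta_dt

# A feature moved 3px right and 1px up between two frames 50ms apart.
V = feature_velocity(np.array([100.0, 50.0]), np.array([103.0, 51.0]), 0.05)
print(V)  # [60. 20.] pixels/s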

It is a good paper!

The code for this paper is at https://github.com/HKUST-Aerial-Robotics/VINS-Mono. It is a straightforward implementation with the intuition still visible, and very good for beginners. I highly recommend reading it if you want to learn visual odometry with an IMU.

Other papers have adapted the Lucas-Kanade idea too, for example Unified Temporal and Spatial Calibration for Multi-Sensor Systems. I guess you can apply it to your own system and publish a paper too.

Maybe this is better?

Can we make this time calibration better?

Let’s go back to the computation of

\(\frac{\partial \, \text{feature-in-image}(t + dt)}{\partial dt} \approx \frac{\text{feature-in-image}(t + dt + \Delta) - \text{feature-in-image}(t + dt - \Delta)}{2 \Delta} \)

This computation is coarse because \(\Delta\) is the camera frame interval, which can be 50 ms.

Instead, go back to the derivative of the reprojection residual w.r.t. \(dt\),

\(\frac{\partial (\text{projection-residual}(P, \text{traj}, dt))}{\partial dt} = \frac{\partial (p - \pi (P, \text{traj}(t + dt)))}{\partial dt}\)

Apply the chain rule,

\(\frac{\partial (\text{projection-residual}(P, \text{traj}, dt))}{\partial dt} = \frac{\partial (p - \pi (P, \text{traj}(t + dt)))}{\partial (\text{traj}(t + dt))} \frac{\partial (\text{traj}(t + dt))}{\partial dt}\)

The first term on the RHS, \(\frac{\partial (p - \pi (P, \text{traj}(t + dt)))}{\partial (\text{traj}(t + dt))}\), can be computed exactly (it is the standard projection Jacobian).

The second term on the RHS, \(\frac{\partial (\text{traj}(t + dt))}{\partial dt} \), still needs to be computed numerically,

\(\frac{\partial (\text{traj}(t + dt))}{\partial dt} \approx \frac{\text{traj}(t + dt + \Delta) - \text{traj}(t + dt - \Delta)}{2 \Delta}\)

However, in this case \(\Delta\) is the sampling interval of the IMU, which is usually less than 5 ms.


In summary, applying the chain rule to the derivative of the projection error w.r.t. \(dt\),

\(\frac{\partial (\text{projection-residual}(P, \text{traj}, dt))}{\partial dt} = \frac{\partial (p - \pi (P, \text{traj}(t + dt)))}{\partial (\text{traj}(t + dt))} \frac{\partial (\text{traj}(t + dt))}{\partial dt}\)

and computing \(\frac{\partial (p - \pi (P, \text{traj}(t + dt)))}{\partial (\text{traj}(t + dt))}\) exactly while computing \( \frac{\partial (\text{traj}(t + dt))}{\partial dt}\) numerically can be a better approximation of the true derivative.
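Here is a sketch of what the chain-rule version could look like (my guess, not from the paper; for brevity only the position part of the pose is handled, and dpi_dpose stands for the analytic projection Jacobian w.r.t. that position):

import numpy as np

def traj_derivative(traj_times, traj_positions, t_query):
    """d(traj)/d(dt) by central difference over IMU-rate samples (~5 ms apart)."""
    delta = traj_times[1] - traj_times[0]  # IMU sampling interval

    def position_at(tq):
        # Interpolate each coordinate of the trajectory at time tq.
        return np.array([np.interp(tq, traj_times, traj_positions[:, i])
                         for i in range(traj_positions.shape[1])])

    return (position_at(t_query + delta) - position_at(t_query - delta)) / (2.0 * delta)

def residual_derivative_wrt_dt(dpi_dpose, dtraj_ddt):
    """Chain rule: d(p - pi)/d(dt) = -(dpi/dpose) @ (dpose/ddt)."""
    return -dpi_dpose @ dtraj_ddt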

I guess doing this would make the optimization a tiny bit faster and more accurate. For a time-critical system, maybe it can make a difference.

If we formulate the state on a Lie group, the math for the above problem seems elegant! “The speed” will be a twist, an element of \(\mathfrak{se}(3)\):

\(\frac{\partial (\text{traj}(t + dt))}{\partial dt} = \begin{bmatrix} v \\ \omega \end{bmatrix}\)

I didn’t implement it; it’s just a random idea without experimental support. But if Prof. Shen gets a new PhD student, please let that dude try it. And remember to cite my blog, I want to be rich and famous (half true, half sarcasm) 🙂

The End

I hope you enjoyed the journey from freshman math to an IROS best paper. 🙂


This article is part of the Least Squares Robotics series.
