Lucas–Kanade Tracker

Why write this post?

I’d like to give a concrete and non-trivial example of a least-squares problem.

Lucas-Kanade is a good choice because many people don't realize that LK is a least-squares problem.

What’s in this post?

  • Simple & Solid derivation of Lucas-Kanade.
  • C++ implementation of Lucas-Kanade.

What does Lucas–Kanade method do?

It tracks image patches across frames.
In math, it optimizes a translation \((dx, dy)\) to minimize the difference between two image patches.

Lucas–Kanade uses the least-squares formulation to achieve real-time performance.

The C++ implementation runs at 30 Hz on my laptop.

To make sure we can follow the derivation later, let me first review the nonlinear least-squares problem.

Least Squares Refresher

For a residual function \(r(x)\), define the cost function to be,

\(\text{minimize} \; cost(x) = r(x)^T r(x)\).

We linearize the residual function at the current variable \( x_{0} \) as,

\(r(\Delta{x}) \approx J(x_0) \Delta{x} + r(x_0)\).

where \(\Delta{x} = x - x_{0}\).

To simplify notation, let \(J = J(x_0)\) and \(b = r(x_0)\).

Plugging this into the cost function gives,

\(cost(x) \approx \Delta{x}^T J^TJ \Delta{x} + 2 b^T J \Delta{x} + b^T b\)

This is a quadratic in \(\Delta{x}\), so we can minimize the linearized cost in one step by solving the normal equation,

\(J^TJ \Delta{x} = -J^Tb\)
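Where does the normal equation come from? Set the gradient of the quadratic with respect to \(\Delta{x}\) to zero (the first-order optimality condition),

\(\nabla_{\Delta{x}} \, cost = 2 J^TJ \Delta{x} + 2 J^T b = 0 \;\; \Rightarrow \;\; J^TJ \Delta{x} = -J^Tb\)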

After solving for \(\Delta{x}\), we update x by,

\(x = x_{0} + \Delta{x}\)

We repeat this iteration until a termination condition is met, e.g. \(\Delta{x}\) is small.

Nonlinear least squares summary:

  1. Linearize the residual function \(r(x)\) at the current variable \(x_0\): \(r(\Delta x) \approx J \Delta x + b\), where \(\Delta{x} = x - x_{0}\) and \(b = r(x_0)\). The cost function \(\text{cost}(x) = r(x)^Tr(x)\) is then approximated by a quadratic form.
  2. We can minimize the quadratic form in one step by solving the normal equation \(J^TJ \Delta{x} = -J^Tb\).
  3. Because of the change of variable \(\Delta{x} = x - x_{0}\), we update the variable x by \(x = x_{0} + \Delta{x}\).
  4. If not converged (e.g. \(\Delta{x}\) is not yet small), we go back to 1.
Check this post for details. https://wang-yimu.com/least-squares-refresher/
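To make the recipe concrete, here is a minimal Gauss-Newton loop written with OpenCV types. It is only a sketch: compute_jacobian_fn and compute_residual_fn are placeholder callbacks, not functions from the repository.

#include <functional>
#include <opencv2/core.hpp>

// A minimal Gauss-Newton sketch. The two callbacks are placeholders.
// Assumes x0, J and b share the same floating-point type.
cv::Mat gauss_newton(const cv::Mat& x0,
                     const std::function<cv::Mat(const cv::Mat&)>& compute_jacobian_fn,
                     const std::function<cv::Mat(const cv::Mat&)>& compute_residual_fn,
                     const int max_iters = 20,
                     const double eps = 1e-6)
{
    cv::Mat x = x0.clone();
    for (int iter = 0; iter < max_iters; ++iter)
    {
        const cv::Mat J = compute_jacobian_fn(x);  // J = J(x0) for this iteration
        const cv::Mat b = compute_residual_fn(x);  // b = r(x0) for this iteration

        cv::Mat delta;
        // Solve the normal equation J^T J * delta = -J^T b.
        cv::solve(J, -b, delta, cv::DECOMP_NORMAL | cv::DECOMP_CHOLESKY);

        x += delta;                  // x = x0 + delta
        if (cv::norm(delta) < eps)   // terminate when the update is small
        {
            break;
        }
    }
    return x;
}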

Lucas-Kanade Problem Formulation

Given two images A & B, we want to find a translation \((dx, dy)\) such that the difference between A and the translated B is minimized.

In math, we want,

\(\text{minimize} \; \text{cost}(dx, dy) = \sum_{x,y}||A(x,y) - B(x+dx, y + dy) ||^2\)

The residual function is,

\(r(dx, dy) = \begin{bmatrix} r_{x=0, y = 0}( dx, dy ) \\ …\\ r_{x=i, y=j}( dx, dy ) \\ … \\ r_{x=n,y=n}( dx, dy ) \end{bmatrix} = \begin{bmatrix}A(0,0) - B(0+dx, 0 + dy) \\ … \\ A(i,j) - B(i+dx, j + dy) \\ … \\ A(n,n) - B(n+dx, n + dy) \end{bmatrix} \)

Note:
1. \(x, y\) are constants in the context of this problem; \((dx, dy)\) are the variables we want to optimize.
2. Each row of the residual function for the Lucas-Kanade problem is the difference of a single pixel.
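For example, with a \((2h+1) \times (2h+1)\) patch (the window used in the implementation below), the residual \(r\) has \((2h+1)^2\) rows, and since there are only two variables \((dx, dy)\), the Jacobian will be a \((2h+1)^2 \times 2\) matrix and \(J^TJ\) only \(2 \times 2\). This is why solving the normal equation is cheap.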

For a video stream, we let B be the image patch from the first frame, and A be the image patch from the second frame. Assuming the image patch intensity doesn’t change, doing the Lucas-Kanade optimization gives the movement of the image patch across frames.

Next, we are going to follow the nonlinear least squares steps to solve the Lucas-Kanade least squares problem.

  1. Linearize the residual function \(r(x)\) at the current variable \(x_0\): \(r(\Delta x) \approx J \Delta x + b\), where \(\Delta{x} = x - x_{0}\) and \(b = r(x_0)\). The cost function \(\text{cost}(x) = r(x)^Tr(x)\) is then approximated by a quadratic form.
  2. We can minimize the quadratic form in one step by solving the normal equation \(J^TJ \Delta{x} = -J^Tb\).
  3. Because of the change of variable \(\Delta{x} = x - x_{0}\), we update the variable x by \(x = x_{0} + \Delta{x}\).
  4. If not converged (e.g. \(\Delta{x}\) is not yet small), we go back to 1.

Linearize the Residual Function

Recall from the least-squares refresher: we need to linearize the residual function and solve the normal equation.

Consider a row in the residual function \(r_i(dx, dy) = A(x_i,y_i) - B(x_i+dx, y_i + dy) \)

Let’s linearize the function at the current estimate \((dx_{cur}, dy_{cur})\). To shorten the notation, let \(cur = (dx_{cur}, dy_{cur})\).

By the definition of linearization (the first-order Taylor expansion), in vector form,

\(r_i(\Delta dx, \Delta dy) = \begin{bmatrix}\frac{\partial{r_i} }{\partial{dx} }(cur) & \frac{\partial{r_i} }{\partial{dy}}(cur) \end{bmatrix} \begin{bmatrix}\Delta dx\\ \Delta dy \end{bmatrix} + r_i(cur) \)

We have linearized one row of the residual function.

Do the same operation for all rows in the residual function. The linearized residual function is,

\(\begin{bmatrix}r_0( \Delta dx, \Delta dy )\\ …\\ r_i( \Delta dx, \Delta dy ) \\ …\\ r_n( \Delta dx, \Delta dy )\end{bmatrix} = \begin{bmatrix}\frac{\partial{r_0} }{\partial{dx} } (cur) & \frac{\partial{r_0} }{\partial{dy}}(cur) \\ … & …\\ \frac{\partial{r_i} }{\partial{dx} } (cur) & \frac{\partial{r_i} }{\partial{dy}}(cur) \\ … & …\\ \frac{\partial{r_n} }{\partial{dx} } (cur) & \frac{\partial{r_n} }{\partial{dy}}(cur)\end{bmatrix} \begin{bmatrix}\Delta dx\\ \Delta dy \end{bmatrix} + \begin{bmatrix}r_0(cur)\\ …\\ r_i(cur) \\ …\\ r_n(cur)\end{bmatrix} \; \; (1) \)

where the Jacobian is,

\(J = \begin{bmatrix}\frac{\partial{r_0} }{\partial{dx} } (cur) & \frac{\partial{r_0} }{\partial{dy}}(cur) \\ … & …\\ \frac{\partial{r_i} }{\partial{dx} } (cur) & \frac{\partial{r_i} }{\partial{dy}}(cur) \\ … & …\\ \frac{\partial{r_n} }{\partial{dx} } (cur) & \frac{\partial{r_n} }{\partial{dy}}(cur)\end{bmatrix} \)

The bias term is,

\(b= \begin{bmatrix}r_0(cur)\\ …\\ r_i(cur) \\ …\\ r_n(cur)\end{bmatrix} \)

The linearized residual has the form \(J \Delta + b\). It is linear in \(\Delta\).

Some people (e.g. Stephen Boyd) call it an affine function.

Now we can solve for \(\Delta\) by solving the normal equation \(J^TJ\Delta = -J^Tb\), and then update the current variable.

WAIT! What is the Jacobian? We haven't computed it yet.

Actually, we haven't done anything significant so far. We just did a Taylor expansion of the residual function; we still need to compute the Jacobian of the residual function.

How to compute the image gradient?

Let’s consider one element in the jacobian matrix.

\(\frac{\partial{r(dx, dy)} }{\partial{dx} }(a, b) = \frac{\partial{A(x,y) - B(x+dx, y + dy)}}{{\partial{dx}}}|_{dx=a, dy=b}\)

I am using the full math notation for clarity.

To compute the derivative, just do the math and drop the constant terms,

\(\frac{\partial{r(dx, dy)} }{\partial{dx} }(a, b) = \frac{\partial{- B(x+dx, y + dy)}}{{\partial{dx}}}|_{dx = a, dy = b} \)

We have a problem here: B is an image, so it is not a continuous function, and the analytical derivative doesn't exist.

But we can compute the derivative numerically.

By the definition of the derivative (dropping the limit \(\Delta \to 0\) and using a finite \(\Delta\)),

\(\frac{\partial{- B(x+dx, y + dy)}}{{\partial{dx}}}|_{dx = a, dy = b} = - \frac{B(x + a + \Delta, y + b) - B(x + a - \Delta, y + b)} { 2 \Delta}\)

In words, the derivative at \((dx, dy) = (a, b)\) is the slope of the function there, and we approximate that slope as rise over run using a finite step \(\Delta\).

We could stop here, but there are better ways to do it. In computer vision, the numerical image gradient is usually computed with the Sobel operator.

The Sobel operator is essentially the image gradient plus denoising. That's why its convolution kernel is 3x3 instead of 1x3.
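For example, the horizontal Sobel kernel factors into a vertical \([1, 2, 1]^T\) smoothing and a horizontal \([-1, 0, 1]\) central difference:

\( \text{Sobel}_x = \begin{bmatrix}1\\ 2\\ 1\end{bmatrix} \begin{bmatrix}-1 & 0 & 1\end{bmatrix} = \begin{bmatrix}-1 & 0 & 1\\ -2 & 0 & 2\\ -1 & 0 & 1\end{bmatrix} \)

The column vector blurs in y (denoise), and the row vector is the central difference in x (up to a scale factor).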

Define the image gradient to be,

\(\text{image-gradient-wrt-x}(B)|_{c, d} = \frac{B(c+1, d) – B(c-1, d)}{2}\)

Assuming \(\Delta = 1\) and matching terms, we get,

\(\frac{\partial{r} }{\partial{dx} }(a, b)= \frac{\partial{- B(x+dx, y + dy)}}{{\partial{dx}}}|_{dx = a, dy = b} = - \text{image-gradient-wrt-x}(B)|_{c = x + a, d = y + b} \)

Read this OpenCV tutorial for image gradients: https://docs.opencv.org/master/d5/d0f/tutorial_py_gradients.html
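As a sketch, precomputing the gradient images with OpenCV could look like the snippet below. The 1/8 scale is my normalization so that the Sobel response matches the central-difference definition above; the repository may handle this differently.

#include <opencv2/imgproc.hpp>

// Sketch: precompute x/y gradient images for one frame (hypothetical helper).
// The 1/8 scale normalizes the 3x3 Sobel kernel so that its response on a
// unit-slope ramp equals 1, matching (B(c+1,d) - B(c-1,d)) / 2.
void compute_image_gradients(const cv::Mat& im, cv::Mat& grad_x, cv::Mat& grad_y)
{
    cv::Mat im_float;
    im.convertTo(im_float, CV_32F);
    cv::Sobel(im_float, grad_x, CV_32F, 1, 0, 3, 1.0 / 8.0);
    cv::Sobel(im_float, grad_y, CV_32F, 0, 1, 3, 1.0 / 8.0);
}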

However, \(\text{image-gradient}(B)\) is also a discrete function. How can we evaluate it when \(x + a\) is not an integer?

Sub-pixel?

We can simply do interpolation. Take linear interpolation for example.

In plain English, we connect the samples of the discrete function with straight lines. The function then becomes continuous, and we can evaluate it at any point.
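For an image we interpolate in x and then in y, which is bilinear interpolation. With \(x_0 = \lfloor x \rfloor\), \(y_0 = \lfloor y \rfloor\), \(\alpha = x - x_0\) and \(\beta = y - y_0\),

\( B(x, y) \approx \big( (1-\alpha) B(x_0, y_0) + \alpha B(x_0+1, y_0) \big)(1-\beta) + \big( (1-\alpha) B(x_0, y_0+1) + \alpha B(x_0+1, y_0+1) \big)\beta \)

This is exactly what the bilinear_interp function in the implementation section computes.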

Please read the code in the implementation section for more details.
The fancy way to say this is “our algorithm achieves sub-pixel accuracy”.

Lucas-Kanade summary

As we discussed earlier, Lucas-Kanade is simply a least-squares problem. The procedure to solve the Lucas-Kanade least-squares problem can be summarized as,

  1. Linearize the Lucas-Kanade residual function. The residual function has discrete terms, so we have to
    1. compute the derivatives numerically, and
    2. interpolate the discrete functions.
  2. Solve the normal equation \(J^TJ\Delta = -J^Tb\).
  3. Update the variables by \([dx,dy] = [dx, dy] + \Delta \).
  4. If not converged, go back to 1.

Please compare this derivation with Wikipedia or whatever else you find on Google.
The least-squares / optimization framework is much easier to work with!

Implementation

GitHub link: https://github.com/yimuw/yimu-blog/tree/master/least_squares/lucas_kanade

Let’s go through a few key functions.

Practical considerations

For a real-time system, images arrive in sequence. We choose the previous image as image B, so that we can compute its gradient while waiting for the next image.
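Below is a sketch of that per-frame flow. The structure and names are my illustration of the idea (with an assumed Sobel-based gradient), not the repository's actual class.

#include <opencv2/imgproc.hpp>

// Sketch: keep the previous frame as image B and precompute its gradients
// while waiting for the next frame. Names here are hypothetical.
struct TrackerFrameState
{
    cv::Mat last_im_, last_im_grad_x_, last_im_grad_y_;

    void on_new_frame(const cv::Mat& cur_im_gray)
    {
        if (!last_im_.empty())
        {
            // Track patches from last_im_ (image B) into the current image
            // (image A), using the gradients precomputed on the previous call.
            // ... run the Lucas-Kanade iterations here ...
        }

        // The current image becomes image B for the next frame; compute its
        // gradients now, before the next frame arrives.
        cur_im_gray.convertTo(last_im_, CV_32F);
        cv::Sobel(last_im_, last_im_grad_x_, CV_32F, 1, 0, 3, 1.0 / 8.0);
        cv::Sobel(last_im_, last_im_grad_y_, CV_32F, 0, 1, 3, 1.0 / 8.0);
    }
};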

The core least squares

const cv::Mat J = compute_jacobian(optimization_vars, feature_loc_cur_frame, 
                last_im_grad_x_, last_im_grad_y_);
                
const cv::Mat b = compute_b(optimization_vars, feature_loc_cur_frame, 
                last_im_, cur_im);

cv::Point2f delta = solve_normal_equation(J, b);

optimization_vars += delta;

We compute the Jacobian and b, solve the normal equation, and update the optimization variables.
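In the full tracker, those lines sit inside a small iteration loop, roughly like the sketch below. The iteration count and convergence threshold are placeholders, not values from the repository.

// Gauss-Newton iterations for one tracked feature (sketch).
const int max_iters = 20;  // placeholder
for (int iter = 0; iter < max_iters; ++iter)
{
    const cv::Mat J = compute_jacobian(optimization_vars, feature_loc_cur_frame,
                                       last_im_grad_x_, last_im_grad_y_);
    const cv::Mat b = compute_b(optimization_vars, feature_loc_cur_frame,
                                last_im_, cur_im);

    const cv::Point2f delta = solve_normal_equation(J, b);
    optimization_vars += delta;

    // Stop once the update is small, i.e. sub-pixel convergence.
    if (delta.x * delta.x + delta.y * delta.y < 1e-6f)
    {
        break;
    }
}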

Compute the Jacobian

cv::Mat compute_jacobian(const cv::Point2f &velocity, 
                        const cv::Point2f &location, 
                        const cv::Mat &last_im_grad_x_, 
                        const cv::Mat &last_im_grad_y_)
{
    const int patch_size = (half_window_size_ * 2 + 1) * (half_window_size_ * 2 + 1);
    cv::Mat jacobian(patch_size, 2, CV_32F);

    size_t count = 0;
    // The 2 for loops can be generalized by a kernel function.
    for(float y ...)
    {
        for(float x ...)
        {
            const float last_x = x + velocity.x;
            const float last_y = y + velocity.y;
            // dx
            jacobian.at<float>(count, 0) = 
               -bilinear_interp(last_im_grad_x_, {last_x, last_y});
            // dy
            jacobian.at<float>(count, 1) = 
               -bilinear_interp(last_im_grad_y_, {last_x, last_y});
            ++count;
        }        
    }

    assert(count == patch_size);
    return jacobian;
}

We are filling in the Jacobian matrix of equation (1).

\(\begin{bmatrix}\frac{\partial{r_0} }{\partial{dx} } (cur) & \frac{\partial{r_0} }{\partial{dy}}(cur) \\ … & …\\ \frac{\partial{r_i} }{\partial{dx} } (cur) & \frac{\partial{r_i} }{\partial{dy}}(cur) \\ … & …\\ \frac{\partial{r_n} }{\partial{dx} } (cur) & \frac{\partial{r_n} }{\partial{dy}}(cur)\end{bmatrix} \begin{bmatrix}\Delta dx\\ \Delta dy \end{bmatrix} + \begin{bmatrix}r_0(cur)\\ …\\ r_i(cur) \\ …\\ r_n(cur)\end{bmatrix} \; \; (1) \)


Each pixel is a residual term in the least-squares problem. We loop through the pixels in the patch and compute the Jacobian of each pixel difference w.r.t. the translation.
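Concretely, row \(i\) of the Jacobian that this loop writes is (with \(\text{image-gradient-wrt-y}\) defined analogously to the x version above):

\( \begin{bmatrix}\frac{\partial{r_i} }{\partial{dx} }(cur) & \frac{\partial{r_i} }{\partial{dy} }(cur)\end{bmatrix} = -\begin{bmatrix}\text{image-gradient-wrt-x}(B)|_{x_i + dx_{cur},\, y_i + dy_{cur}} & \text{image-gradient-wrt-y}(B)|_{x_i + dx_{cur},\, y_i + dy_{cur}}\end{bmatrix} \)

This is why the code samples the gradient images of the last frame at (last_x, last_y) and negates the result.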

Interpolation

// https://stackoverflow.com/questions/13299409/how-to-get-the-image-pixel-at-real-locations-in-opencv
inline float bilinear_interp(const cv::Mat& img, cv::Point2f pt)
{
    .....

    const float v1 = img.at<float>(y0, x0);
    const float v2 = img.at<float>(y0, x1);
    const float v3 = img.at<float>(y1, x0);
    const float v4 = img.at<float>(y1, x1);

    const float val = (v1 * (1. - dx) + v2 * dx) * (1 - dy)
                + (v3 * (1. - dx) + v4 * dx) * dy;

    return val;
}

Bilinear interpolation is easy to implement.

Compute the b term

cv::Mat compute_b(const cv::Point2f &velocity, 
                  const cv::Point2f &location, 
                  const cv::Mat &last_im, 
                  const cv::Mat &cur_im)
{
    const int patch_size = (half_window_size_ * 2 + 1) * (half_window_size_ * 2 + 1);
    cv::Mat b(patch_size, 1, CV_32F);

    // Row index into b; one row per pixel in the patch.
    size_t count = 0;
    for(float y...)
    {
        for(float x...)
        {
            const float last_x = x + velocity.x;
            const float last_y = y + velocity.y;

            b.at<float>(count, 0) = bilinear_interp(cur_im, {x, y}) 
                    - bilinear_interp(last_im, {last_x, last_y});
            ++count;
        }
    }
    return b; 
}

We are filling in the bias term \(b\) of equation (1).

\(\begin{bmatrix}\frac{\partial{r_0} }{\partial{dx} } (cur) & \frac{\partial{r_0} }{\partial{dy}}(cur) \\ … & …\\ \frac{\partial{r_i} }{\partial{dx} } (cur) & \frac{\partial{r_i} }{\partial{dy}}(cur) \\ … & …\\ \frac{\partial{r_n} }{\partial{dx} } (cur) & \frac{\partial{r_n} }{\partial{dy}}(cur)\end{bmatrix} \begin{bmatrix}\Delta dx\\ \Delta dy \end{bmatrix} + \begin{bmatrix}r_0(cur)\\ …\\ r_i(cur) \\ …\\ r_n(cur)\end{bmatrix} \; \; (1) \)

The b term is just the residual function evaluated at the current variables.

Solve the normal equation

cv::solve(jacobian, -b, delta, cv::DECOMP_CHOLESKY | cv::DECOMP_NORMAL);

cv::DECOMP_NORMAL tells cv::solve to solve the normal equation \(A^TA x = A^Tb\) instead of \(Ax = b\).
cv::DECOMP_CHOLESKY uses the Cholesky factorization, which is a good fit for the symmetric positive-definite \(A^TA\) system.
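For completeness, the solve_normal_equation helper from the earlier snippet can be as small as this. It is my reconstruction around the cv::solve call above, not necessarily the repository's exact code.

// Solve J^T J * delta = -J^T b for the 2-vector delta (sketch).
cv::Point2f solve_normal_equation(const cv::Mat& jacobian, const cv::Mat& b)
{
    cv::Mat delta;
    cv::solve(jacobian, -b, delta, cv::DECOMP_CHOLESKY | cv::DECOMP_NORMAL);
    // delta is a 2x1 CV_32F matrix: [delta_dx, delta_dy]^T.
    return {delta.at<float>(0, 0), delta.at<float>(1, 0)};
}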

Other stuff…

  • What is the initial position for each track?
  • How do we handle multiple tracks?
  • What happens when a track goes out of bounds?
  • Which features should we track?
  • Image IO?

Please read the code for details.

Papers Unlock

Lucas-Kanade 20 Years On: A Unifying Framework
Simon Baker and Iain Matthews

This paper gives an overview of the Lucas-Kanade algorithm and its variations.

It is a general and complete treatment, but that generality and completeness also make it hard for beginners to follow. Overall, it is a good paper to read.

Further Reading

SVO: Fast Semi-Direct Monocular Visual Odometry
Christian Forster, Matia Pizzoli, Davide Scaramuzza

For the Lucas-Kanade method we discussed above, we optimized the translation between two frames, \((dx, dy)\). We can also optimize the pose of the camera, \((x, y, z, \text{rotation})\). This is the direct method for visual odometry.

This paper does exactly that. However, the math details may be confusing at this point, because the authors use a Lie group parameterization for the rotation.


ElasticFusion: Dense SLAM Without A Pose Graph
Thomas Whelan*, Stefan Leutenegger*, Renato F. Salas-Moreno† , Ben Glocker† and Andrew J. Davison

The Kinect sensor provides point clouds and images, so we can do Iterative Closest Point (ICP) and Lucas-Kanade together. That is exactly what the authors did.

Besides, the Kinect point clouds suffer from deformation. To handle it, the authors broke the point clouds into small pieces and ran SLAM over all the pieces.

What’s next?

Least Squares Robotics

This article is part of the Least Squares Robotics series.
