Integer Programming & Linear Relaxation

What’s in this article?

I read an excellent book about Integer programming. I’d like to explain some very cool ideas in the book and implement some simple programs to demonstrate the integer programming problem.

Lots of the material in this article comes from the book Integer Programming. I will add some supporting material to make the explanation easier to understand.

The main topics in this article are,

  1. An introduction to integer programming.
  2. An explanation of the linear relaxation of integer programming problems.
  3. An implementation of the linear relaxation for a matching problem.
  4. A proof that the linear relaxation of the previous problem is tight.
  5. Some readings/discussions, including the IROS 2019 Best Student Paper Finalist and Best Application Paper Finalist.

The content will be a little bit advanced and there will be a few proofs. I tried my best to make things enjoyable and intuitive.

1. What is Integer Programming?

1.1 Definition of Integer Programming

(All screenshots come from this book.)

In English, the optimization variable is \(x\). We want to maximize a linear function \(cx\) over a region defined by the intersection of half-spaces \(S = \{x | Ax \leq b, x \geq 0 \}\). The optimization variable \(x\) is integral: \(x \in {\Bbb Z}\).
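In symbols, the integer program is,

\(\text{maximize}_x \;\; cx \\ \text{subject to: } Ax \leq b, \; x \geq 0, \; x \in {\Bbb Z}\)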

The set (region) \(S = \{x | Ax \leq b, x \geq 0 \}\) (the collection of all x that satisfy \(Ax \leq b, x \geq 0\)) defined by the constraints is a convex polyhedron.

Note that if we drop \(x \in {\Bbb Z}\), the problem becomes a Linear Program.

Let’s do an example.

1. The polyhedron is the feasible region for the linear program.
2. The dots are the feasible region for the integer program.
3. \(c\) is the weight vector for the linear cost.
4. The optimal value for the linear program is achieved at a vertex of the polyhedron.
5. The optimal value for the integer program is achieved at the dot shown in the figure.

We can see the difference between an Integer Program and a Linear Program. For a Linear Program, the optimal value is always achievable at a vertex of the constraint polyhedron.

For the Integer Program, the optimal value is achieved at some integral x inside or on the boundary of the polyhedron.

1.2 How to solve an Integer Program?

This section is optional. Feel free to skip.

A simple method to solve Integer Programming exactly is the Branch-and-Bound method. An outline of the method is given below.

The Branch-and-Bound algorithm solves the integer programming problem by searching (depth-first search). At each search node, we solve a linear program, and we branch the search depending on the solution of that linear program.

Here is an example given in Chapter 1 of Integer Programming.

The Branch

We first solve the problem as a linear program. If the solution has a non-integer variable, we branch on the non-integer variable. For example, the problem in section 1.1 can be branched as,

An explanation of the graph: the linear programming solution for this problem is \((x_1 = 1.3, x_2 = 3.3)\), which is not integral. So we branch the problem. There are 2 branches. For branch 1, we add \(x_1 \leq 1\) to the problem and solve it as a linear program. For branch 2, we add \(x_1 \geq 2\) to the problem and solve it as a linear program.

After branching, we have “removed” the non-integer region \(1 < x_1 < 2\). We branch recursively. Eventually, we either get an all-integer solution or an infeasible program (empty polyhedron).

Why do we stop when the linear program's solution is integral?

Because the optimal value of a linear program is always an upper bound for the same program with the integer constraint added. So, when the solution of the linear program is integral, it is also the optimal solution of the integer program with the same linear constraints.

The above process is a depth-first search.

A search node is a leaf node (at which the search returns) when either the solution of the linear program for the node is integral or the linear program is infeasible.

The Bound

For a search, we can prune the tree by comparing the current best cost value with the cost value of the current tree node.

Since the optimal value of a linear program is an upper bound for the same program with more constraints (corresponding to programs after branching), when the optimal cost of the current linear program is smaller than the current best cost, we can return from the current node.

The Branch-and-Bound algorithm is a search algorithm. The search tree grows exponentially. In fact, the Integer programming problem is NP-complete.
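To make the outline concrete, here is a minimal branch-and-bound sketch (my own illustration, not the book's code), using scipy.optimize.linprog to solve each LP relaxation of a maximization IP \(\max cx\) subject to \(Ax \leq b, x \geq 0, x \in {\Bbb Z}\):

import math
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A, b, best=(-np.inf, None)):
    # Solve the LP relaxation (linprog minimizes, so negate c to maximize).
    res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * len(c))
    if not res.success:              # infeasible node: prune
        return best
    upper_bound = -res.fun           # LP optimum bounds the IP from above
    if upper_bound <= best[0]:       # bound step: prune this subtree
        return best
    x = res.x
    frac = [i for i in range(len(x)) if abs(x[i] - round(x[i])) > 1e-6]
    if not frac:                     # integral solution: a new best leaf
        return (upper_bound, x)
    i = frac[0]                      # branch on a fractional variable
    row = np.zeros(len(c)); row[i] = 1.0
    # Branch 1: add x_i <= floor(x_i).
    best = branch_and_bound(c, np.vstack([A, row]),
                            np.append(b, math.floor(x[i])), best)
    # Branch 2: add x_i >= ceil(x_i), written as -x_i <= -ceil(x_i).
    best = branch_and_bound(c, np.vstack([A, -row]),
                            np.append(b, -math.ceil(x[i])), best)
    return best

# My own toy instance: max 5*x1 + 4*x2
# s.t. 6*x1 + 4*x2 <= 24, x1 + 2*x2 <= 6, x >= 0, x integral.
# The root LP solution is fractional (x = (3, 1.5)); branching finds (4, 0).
print(branch_and_bound(np.array([5., 4.]),
                       np.array([[6., 4.], [1., 2.]]),
                       np.array([24., 6.])))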

In practice, we may also simply treat an integer program as a linear program. This is called the linear relaxation of the integer program.

2. The Linear Relaxation of Integer Programming

A natural way to solve an Integer Program approximately is to treat it as a Linear Program.

An interesting question is: when is the Linear Relaxation equivalent to the Integer Program?

The answer is surprisingly simple: when the vertices of the constraint polyhedron are all integral.

2.1 The Integral Polyhedron

A simple picture tells the story.

Top: The vertices of the polyhedron are not all integral. The linear relaxation is not equivalent to the original Integer Program.
Bottom: The vertices of the polyhedron are all integral. The linear relaxation is equivalent to the original Integer Program.

We can see from the picture that when the vertices of the \(Ax \leq b\) (including \(x \geq 0\)) polyhedron are integral, the linear relaxation equals the integer program in terms of both the solution x and the cost value.

A natural question is: when are the vertices of the polyhedron \(Ax \leq b\) all integral?

I will call a polyhedron with all integral vertices an integral polyhedron in later sections.

2.2 Unimodular Matrix

(Warning: I derive the unimodular condition in a different way from the Integer Programming book. The derivation should be correct.)

We want to answer this question:

Under what conditions are the vertices of the \(Ax \leq b\) polyhedron all integral?

Note that we merged \(x \geq 0\) (written as \(-x \leq 0\)) into the \(Ax \leq b\) system.

Let’s do a simple thought experiment.

Assume the size of x is N. A vertex of the polyhedron is the solution of N of the constraints from the \(Ax \leq b\) system taken as equalities. For example, in 2D, a vertex is the intersection of 2 lines; in 3D, a vertex is the intersection of 3 planes; in N dimensions, a vertex is the intersection of N hyperplanes.

To compute a vertex, we select N equations from \(Ax \leq b\). Let’s call the system of equations \(Bx = c\).

Assuming \(B\) is invertible, the problem becomes: under what condition is the solution integral, \(x = B^{-1}c \in {\Bbb Z}\)?

Before that, let me review Cramer's rule for matrix inversion.

2.2.1 Cramer’s Rule for Matrix Inversion

(Pictures come from this article.)

To invert a matrix \(A\), we can apply Cramer's rule, \(A^{-1} = \frac{1}{\det(A)} \text{adj}(A)\), where entry \((i, j)\) of the adjugate \(\text{adj}(A)\) is the cofactor \(A_{ji}\) (note the transposed indices).

Here \(A_{ij}\) is a cofactor. It's a scalar, defined as \((-1)^{i+j}\) times the determinant of matrix A with row i and column j removed. Note that when A is integral, since the formula for the determinant only involves addition and multiplication, \(A_{ij} \in {\Bbb Z}\).

Here is an example of cofactors,

2.2.2 The condition for integral vertices \(x = B^{-1}c \in {\Bbb Z}\)

Let's go back to the question: what is the condition for integral vertices, i.e., for \(x = B^{-1}c \in {\Bbb Z}\)?

We can see that \(x \in {\Bbb Z}\) for every \(c \in {\Bbb Z}\) if and only if \(B^{-1}\) is integral (for the “only if” direction, take \(c\) to be the unit vectors \(e_i\): each column of \(B^{-1}\) must then be integral).

The question becomes: when is \(B^{-1}\) integral?

Let's apply Cramer's rule to compute \(B^{-1}\).

\(B^{-1} = \frac{1}{\det (B)}
\begin{bmatrix}
B_{11} & B_{21} & \dots & B_{n1} \\
B_{12} & B_{22} & \dots & B_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
B_{1n} & B_{2n} & \dots & B_{nn}
\end{bmatrix} \)

When is \(B^{-1}\) integral? Since B is integral by definition, the cofactors \(B_{ij}\) are integral. The only remaining requirement is that \(\frac{1}{\det(B)}\) is integral. In other words, \(\det(B) = \pm 1\).
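A quick sanity check with my own 2×2 examples: \(B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\) has \(\det(B) = 1\) and indeed \(B^{-1} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}\) is integral, while \(B = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}\) has \(\det(B) = 2\) and \(B^{-1}\) contains the fraction \(1/2\).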

2.2.3 The Unimodular Matrix

From the previous discussion, we can see that for a polyhedron defined by \(Ax \leq b\), the vertices of the polyhedron are integral when the determinant of every N by N submatrix \(B\) is \(\pm 1\) (or 0, which means the set of equations doesn't define a vertex).

To capture this idea concisely (and make the discovery fancy), let me define the unimodular matrix.

The definition for unimodular matrix from Integer Programming p43.

The definition is a little bit different from what we want for \(A\). The above definition is specifically for \((m \times n)\) matrices with rank m (fat matrices). But if we make a similar definition for an \((m \times n)\) matrix with rank n, we get what we want. Let me shamelessly redefine the unimodular matrix for \((m \times n)\) matrices with rank n (tall matrices):

Definition of the Unimodular Matrix:

An \(m \times n\) matrix \(A\) is a unimodular rectangular matrix if it has rank n, it is integral, and \(\det(B) = 0, \pm 1\) for every \(n \times n\) submatrix B of A. In particular, a square matrix U is unimodular if it is integral and \(\det(U) = \pm 1\).

With the definition of the unimodular matrix, we can restate the condition for integral polyhedron.

Claim: Given an \(m \times n\) matrix A, the polyhedron defined by \(Ax \leq b\) with arbitrary integral b has only integral vertices if and only if A is a unimodular rectangular matrix.

We did the “if” part of the proof (A is unimodular \(\rightarrow\) for every b, the polyhedron \(Ax \leq b\) is integral) in section (2.2.2).

For the “only if” part of the proof (A is not unimodular \(\rightarrow\) the polyhedron is not integral for some b): assume an invertible submatrix \(B\) of \(A\) is not unimodular; then a vertex is given by \(x = B^{-1}c\). Since \(B\) is not unimodular, there exist non-integral elements in \(B^{-1}\). We can always select an integral \(c\) to make the vertex \(x = B^{-1}c\) non-integral. (This proof is sketchy; please read the next section for how to find such an integral \(c\).)

In sum, we showed that,

\(\text{Linear Relaxation = Integer Program}
\\ \leftrightarrow \text{polyhedron defined by A is integral}
\\ \leftrightarrow \text{A is unimodular}\)

2.2.4 Totally Unimodular Matrix

In 2.2.2, we showed: for a polyhedron defined by \(Ax \leq b\) with arbitrary integral \(b\), the vertices of the polyhedron are integral if and only if \(A\) is unimodular.

Instead, if we define the polyhedron by \(\{ Ax \leq b, x \geq 0 \}\), the condition for the polyhedron to be integral is that \(A\) is totally unimodular.

This is what the authors of Integer Programming did. Let me go through their proof.

The definition of totally unimodular: every square submatrix of A has determinant 0 or \(\pm 1\).
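Since the definition quantifies over every square submatrix, a brute-force check is easy to write. This is my own sketch (exponential in the matrix size, so only for tiny matrices), not code from the book:

import itertools
import numpy as np

def is_totally_unimodular(A, tol=1e-9):
    # Check that every k x k submatrix has determinant 0 or +-1.
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for rows in itertools.combinations(range(m), k):
            for cols in itertools.combinations(range(n), k):
                d = np.linalg.det(A[np.ix_(rows, cols)])
                if abs(d) > tol and abs(abs(d) - 1.0) > tol:
                    return False
    return True

# Incidence matrix of a bipartite path graph a-b-c: totally unimodular.
print(is_totally_unimodular(np.array([[1., 0.], [1., 1.], [0., 1.]])))  # True
# Incidence matrix of a triangle (odd cycle): not TU (the 3x3 det is -2).
print(is_totally_unimodular(np.array([[1., 1., 0.],
                                      [1., 0., 1.],
                                      [0., 1., 1.]])))  # False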

The claim is,

To prove it, the if direction is,

Basically, since there are 2 sets of inequalities, \(Ax \leq b\) and \(x \geq 0\), to compute a vertex we have to take some equations from \(Ax = b\) and some from \(x = 0\). That's why we have (4.1), where \(C\) corresponds to equations from \(Ax = b\) and \(I\) corresponds to equations from \(x = 0\). Then we just need to apply Cramer's rule and the definition of totally unimodular to show \(x = B^{-1}d \in {\Bbb Z}\).

The only if direction is,

The authors wanted to show: when A is not totally unimodular, at least one vertex of the polyhedron is not integral.

Assume C is an invertible submatrix of A whose determinant is not \(\pm 1\). The idea is to construct a vertex \(x\) from C. By Cramer's rule, \(C^{-1}\) has fractional entries. We can construct an integral \(\bar b\) such that the vertex \(\bar x\) defined by \(C \bar x = \bar b \leftrightarrow \bar x = C^{-1} \bar b\) is not integral.

The authors define \(\gamma\) to be a fractional column of \(C^{-1}\). They select the point \(x = \text{ceil}(\gamma) - \gamma\), where \(\text{ceil}(\cdot)\) rounds each entry up to the closest larger-or-equal integer. Note that \(x\) is fractional.

Then, the authors show there exists an integral \(b\) such that \(C x = b\). Start with

\(C x = C\, \text{ceil}(\gamma) - C \gamma\).

The first term is integral because \(C\) and \(\text{ceil} (\gamma)\) are integral.

The second term is integral because,

\(C \gamma = C \, [\text{a column of } C^{-1}] = [\text{a vector with } n-1 \text{ zeros and a single } 1] \in {\Bbb Z}\).

As a result, there exist an integral \(b\) and a fractional \(x\) such that \(C x = b\). In other words, there is a non-integral vertex \(x\) for integral \(C\) and integral \(b\).

This proves that when A is not totally unimodular, at least one vertex of the polyhedron is not integral.

Please note that I ignored the \(x \geq 0\) part in this proof, so my version is actually for the statement in the previous section. Handling \(x \geq 0\) is straightforward; please read the picture above for details.

The proof looks easy, but I spent a long time understanding it. My feeling is that doing the proof by merging \(Ax \leq b\) and \(x \geq 0\) together is simpler to understand. That's why I redid the proof at the beginning of this section.

PS: Learning math by reading a graduate textbook is an impossible job. I really miss the time when a professor wrote down a proof step by step.

3. The Optimal Matching Problem

Lots of math! It’s time for applications.

3.1 Bipartite Graph Matching

The Bipartite Graph matching problem is a classical combinatorial optimization problem.

Consider an example: given a set of people and a set of jobs, there is a score for each person-job pair. The score indicates how well a person can do the job. The goal is to find a matching such that the sum of scores over matched pairs is maximized.

Let me draw a picture.

The matching problem can be formulated as an Integer program.
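A standard formulation (my transcription, in case the screenshot doesn't render): let \(c_{ij}\) be the score of assigning person \(i\) to job \(j\), and let \(x_{ij} = 1\) mean person \(i\) takes job \(j\),

\(\text{maximize}_x \;\; \sum_{i,j} c_{ij} x_{ij} \\ \text{subject to: } \sum_j x_{ij} \leq 1 \;\, \forall i, \quad \sum_i x_{ij} \leq 1 \;\, \forall j, \quad x_{ij} \in \{0, 1\}\)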

A well-known fact about the Bipartite matching program is that the linear relaxation (replace \(x_{ij} \in \{0, 1\}\) by \(0 \leq x_{ij} \leq 1\)) of the Bipartite matching Integer program is tight.

In fact, if we write the constraints in the \(Ax \leq b\) form, we can show the constraint matrix \(A\) is totally unimodular. Recall from the previous section: if the constraint matrix is totally unimodular, the linear relaxation of the integer program is tight.

We can solve the bipartite matching problem by linear programming. Another common method for solving Bipartite matching is the Hungarian Method. Please read page 149 of Integer Programming for details.

I will implement the Bipartite Matching in a later section.

3.2 General Graph Matching

In this section, I will review a paper on general graph matching. I will also provide a simplified version of the problem for clarity.

3.2.1 The multiple object tracking problem.

A Linear Programming Approach for Multiple Object Tracking
Hao Jiang, Sidney Fels and James J. Little

Assume a detector has found bounding boxes (bboxes) in a video. The detector is noisy. A natural idea is to associate bboxes belonging to the same object across frames and smooth out the noise. That's what the authors did in this paper.

The authors modeled the association problem using a graph.

This can be confusing. If I understand it correctly, the assumption is that an object always exists in the video. So the authors only added the s node at the beginning and the t node at the end.

Let's ignore the occlusion handling part. The association graph models:

  1. For every pair of observations (round nodes) in nearby frames, a weighted edge models the similarity of the pair. The weight is a similarity measurement for the 2 observations. Please read the paper for a detailed discussion of how the similarity is defined.
  2. Each observation connects to a start node and an end node. The start & end nodes model the possibility that the observation is the start or end of a track. (Note: the paper did something else.)

In the end, the association problem is modeled as a Mixed Integer Program.

The authors wanted to:

  1. The first cost term: maximize the sum of similarities between matched observations (strictly, it's minimizing a cost, but maximizing similarities is easier to understand).
  2. The second cost term: given the matches, make the position of each object smooth across frames.
  3. The constraints: each observation matches at most one other observation.

In the end, the authors relaxed the integer program to a linear program and solved it. Their observation was: “the linear program has a high probability of directly giving the globally optimal solution (of the integer program)”. The solution of the linear relaxation of this problem is not guaranteed to be integral.

Integer program != Linear relaxation for this problem

3.2.2 A Simpler Association Problem

To make things easier to understand and implement, let me introduce a simpler version of the matching problem.

An easier version of the association graph can be formulated as follows.

The intuition for the constraints is: an observation matches at most once.

The graph is weighted. The weighted edges between measurements model the similarity between measurements. Each measurement connects to 2 special nodes: the start node and the end node. The weighted edges which connect to the start/end node model the possibility that the observation is the start or end of a track. All weights are positive.

The optimization variables are binary, i.e. \(x \in \{ 0, 1\}\); each indicates whether the nodes connected by an edge are matched. For example, \(x_{ab}^{cd}\) indicates whether measurement b in layer a matches measurement d in layer c.

The constraints for the problem are:

  1. forward matching constraints: a measurement can only match the start node, or a measurement from the previous layer, or nothing.
  2. backward matching constraints: a measurement can only match the end node, or a measurement from the next layer, or nothing. (Both constraints are written out in symbols below.)
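In the notation above, writing \(s_{iv}\) and \(t_{iv}\) for the start-edge and end-edge variables of measurement \(v\) in layer \(i\) (my notation; the figure may differ), the two constraints read,

\(\text{forward: } \sum_u x_{(i-1)u}^{\,iv} + s_{iv} \leq 1, \qquad \text{backward: } \sum_w x_{iv}^{\,(i+1)w} + t_{iv} \leq 1\)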

If weights are positive, we can see that a track always starts at the start node and ends at the end node. In fact, we can also use flow constraints to model this problem. But I prefer the “match at most once” constraints because they're more intuitive.

This problem is simple enough. I can implement it without too much hair loss.

4. Implementation

I implemented the general graph matching problem in section (3.2.2). We will construct the graph semi-randomly and solve its linear relaxation. In the end, we can check whether the linear relaxation of the integer program is tight (the solution of the linear relaxation is the same as the integer program).

4.1 Initialization

import random
import numpy as np

class AssociationGraph:
    def __init__(self):
        self.layer_sizes = None
        # random experiment
        RANDOM_SIZE = True
        if RANDOM_SIZE:
            size = random.randint(2, 10)
            self.layer_sizes = [random.randint(2, 30) for i in range(size)]
        else:
            self.layer_sizes = np.array([10, 10, 9])

        self.len_layers = len(self.layer_sizes)

        # association reward
        self.weights = []
        # map edges in the graph to a linear array
        self.weights_linear_map = [0]
        for i in range(self.len_layers - 1):
            # noise for association reward
            w = np.random.uniform(
                0.0, 0.4, (self.layer_sizes[i], self.layer_sizes[i+1]))
            # Associated nodes.
            # Without loss of generality, put associated cost on diag.
            min_len = min(self.layer_sizes[i], self.layer_sizes[i+1])
            diag_w = np.random.uniform(0.3, 1., min_len)
            # TODO: function? diag?
            for j in range(min_len):
                w[j][j] = diag_w[j]

            self.weights.append(w)
            self.weights_linear_map.append(w.size)

        self.start_reward = np.random.uniform(0.1, 0.5)
        self.end_reward = np.random.uniform(0.1, 0.5)

        self.weights_linear_map = np.cumsum(self.weights_linear_map)

At the top of the constructor, I set the number of layers and the number of nodes in each layer. There is an option (RANDOM_SIZE) to generate the number of layers and the size of each layer randomly.

In the loop over layers, I generated the association matrix for each pair of neighboring layers. Without loss of generality (up to permutation), I assumed the diagonal pairs correspond to matches. The similarities for matches, which correspond to the diagonal elements, were drawn from a distribution with a high mean. The off-diagonal elements correspond to non-matches and were drawn from a distribution with a low mean.

At the end of the constructor, I generated the weights for edges connected to the start/end nodes.

4.2 Index map

The variables for the integer program (solved as a linear relaxation) are booleans, one for every edge. Because most linear program solvers expect a vector variable, I need to define a map from edges in the graph to positions in a vector.

# map the edge from node u_idx in `layer` to node v_idx in `layer + 1`
# to a linear index
def edge_map(self, layer, u_idx, v_idx):
    offset = self.weights_linear_map[layer]
    adj_mat_size = self.weights[layer].shape
    rows, cols = adj_mat_size

    return offset + u_idx * cols + v_idx

# map the start edge for a node in a layer to a linear idx
def start_edge_map(self, layer, u_idx):
    start_idx = self.weights_linear_map[-1] + sum(self.layer_sizes[:layer])
    result = start_idx + u_idx
    return result

# map the end edge for a node in a layer to a linear idx
def end_edge_map(self, layer, u_idx):
    start_idx = self.weights_linear_map[-1] + \
        sum(self.layer_sizes) + sum(self.layer_sizes[:layer])
    result = start_idx + u_idx
    return result

To convert edges into a vector, I flattened the edges between each pair of layers and concatenated them together. Then I concatenated the edges from the start node to every observation node, and finally the edges from every observation node to the end node.
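As a sanity check of the index arithmetic (my own numbers): with layer_sizes = [3, 4], the 12 layer-to-layer edges map to indices 0..11 (edge_map(0, u, v) = 4*u + v), the 7 start edges map to 12..18, and the 7 end edges map to 19..25, so num_vars = 12 + 2 * 7 = 26.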

4.3 Construct the Linear Relaxation

def constraint_lp(self):
    num_vars = 0
    # a variable for a association
    for i in range(self.len_layers - 1):
        num_vars += self.layer_sizes[i] * self.layer_sizes[i+1]
    # a variable for the creation & the ending for each node
    num_vars += 2 * sum(self.layer_sizes)

    # num of constraint, 2 for each graph node
    A = np.zeros((2 * sum(self.layer_sizes), num_vars))

    print('num layers:', self.len_layers, ' num nodes:', sum(self.layer_sizes), 'num edges: ',
          num_vars, ' num graph constraint:', A.shape[0])

    constraint_idx = 0

    c = np.zeros(num_vars)

    for u_layer in range(len(self.layer_sizes) - 1):
        w = self.weights[u_layer]
        rows, cols = w.shape

        # update c for each edge
        for u in range(rows):
            for v in range(cols):
                idx = self.edge_map(u_layer, u, v)
                c[idx] = w[u][v]

        # constraints for out going edges
        for u in range(rows):
            for v in range(cols):
                idx = self.edge_map(u_layer, u, v)
                A[constraint_idx][idx] = 1.

            end_edge_idx = self.end_edge_map(u_layer, u)
            A[constraint_idx][end_edge_idx] = 1.
            constraint_idx += 1

        # constraints for in going edges
        for v in range(cols):
            for u in range(rows):
                idx = self.edge_map(u_layer, u, v)
                A[constraint_idx][idx] = 1.

            start_edge_idx = self.start_edge_map(u_layer + 1, v)
            A[constraint_idx][start_edge_idx] = 1.
            constraint_idx += 1

    # update c for start & end
    for layer in range(self.len_layers):
        layer_size = self.layer_sizes[layer]
        for u in range(layer_size):
            start_edge_idx = self.start_edge_map(layer, u)
            end_edge_idx = self.end_edge_map(layer, u)
            c[start_edge_idx] = self.start_reward
            c[end_edge_idx] = self.end_reward

    # start edges for the first layer
    for u in range(self.layer_sizes[0]):
        start_edge_idx = self.start_edge_map(0, u)
        A[constraint_idx][start_edge_idx] = 1.
        constraint_idx += 1

    # end edges for the last layer
    for u in range(self.layer_sizes[-1]):
        end_edge_idx = self.end_edge_map(len(self.layer_sizes) - 1, u)
        A[constraint_idx][end_edge_idx] = 1.
        constraint_idx += 1

    assert(constraint_idx == A.shape[0])
    b = np.ones(constraint_idx)

    if False:
        print('A:\n', A)

    # TODO: check if every row of A is not perpendicular to c?
    return A, b, c, num_vars

To construct the LP, I followed the definition of the problem in (3.2.2) and the definition of the index map.

The main loop over layers constructs the constraints: (1) the sum of outgoing edges of each node is at most 1, and (2) the sum of incoming edges of each node is at most 1.

The remaining blocks handle the start edges of the first layer and the end edges of the last layer.

4.4 Solve the Linear Relaxation

With the linear relaxation constructed, we can solve the linear program by the CVXPY solver.

# assumes `import cvxpy as cp` and `import numpy as np` at module scope
def lp(self):
    A, b, c, num_vars = self.constraint_lp()

    # Define and solve the CVXPY problem.
    x = cp.Variable(num_vars)
    prob = cp.Problem(cp.Minimize(- c.T@x),
                      [A @ x <= b, x >= 0, x <= 1])
    prob.solve(solver=cp.OSQP, verbose=False, eps_abs=1e-8, eps_rel=1e-8)

    ...

    # TODO: what's the precision for cvxpy?
    epsilon = 1e-7
    integral_mask = (np.abs(x.value-0) <
                     epsilon) | (np.abs(x.value-1) < epsilon)
    num_integral = sum(integral_mask)
    print("Num integral:", num_integral)
    relaxation_success = False
    if num_integral == num_vars:
        print("Integer Programming == Linear Programming !!")
        relaxation_success = True
    else:
        print("Linear relaxation != Integer Programming")
        relaxation_success = False
    ...

    return x.value, relaxation_success

The first block solves the LP. Please note the linear relaxation of \(x \in \{0, 1\}\) is \(0 \leq x \leq 1\).

The second block checks whether the linear relaxation is tight, i.e. whether every variable is (numerically) 0 or 1.

4.5 Reconstruct the Matching Path

The LP outputs the optimal assignment for each matching pair. However, we are interested in tracks across time. To get the current tracks (tracks ending at the current time) from the LP, we can build the weighted directed graph in reverse and do a greedy traversal from the end node back to the start node.

# assumes `from collections import defaultdict` at module scope
def build_graph(self, lp_result_x):
    # adjacency list: maps a node name to [(predecessor name, edge value), ...]
    direct_graph = defaultdict(list)

    # build the layer to layer graph
    for u_layer in range(len(self.layer_sizes) - 1):
        w = self.weights[u_layer]
        rows, cols = w.shape

        for v in range(cols):
            v_name = 'layer:{} node:{}'.format(u_layer + 1, v)
            for u in range(rows):
                u_name = 'layer:{} node:{}'.format(u_layer, u)
                x_idx = self.edge_map(u_layer, u, v)
                direct_graph[v_name].append((u_name, lp_result_x[x_idx]))

    # connect to start
    start_name = 'start'
    for layer in range(self.len_layers):
        layer_size = self.layer_sizes[layer]
        for u in range(layer_size):
            u_name = 'layer:{} node:{}'.format(layer, u)
            x_idx = self.start_edge_map(layer, u)
            direct_graph[u_name].append((start_name, lp_result_x[x_idx]))
    direct_graph['start'] = [(None, 0.)]

    return direct_graph

def find_association_greedy(self, lp_result_x):
    graph = self.build_graph(lp_result_x)

    result = []
    # just need to consider current node
    for end_node in range(self.layer_sizes[-1]):
        node = 'layer:{} node:{}'.format(self.len_layers - 1, end_node)
        path = []
        while(node is not None):
            path.append(node)
            # [(node, weight), .....]
            next_nodes = graph[node]
            next_with_largest_weight = sorted(
                next_nodes, key=lambda x: x[1], reverse=True)[0]
            node = next_with_largest_weight[0]

        path.reverse()
        result.append(path)

    return result

build_graph: use the result from the LP to build a weighted directed graph.

find_association_greedy: compute the tracks ending at the current time by greedily traversing the graph.

4.6 Bipartite Result

Let’s run the algorithm on a bipartite graph. The size for the 2 layers is (10, 10).

...
self.layer_sizes = np.array([10, 10])
...

def run_single():
    association = AssociationGraph()
    x, relaxation_success = association.lp()
    result = association.find_association_greedy(x)
    for p in result:
        print('path: ', p)
        
if __name__ == "__main__":
    run_single()

yimu@yimu-mate:~/Desktop/blog/yimu-blog/random/lp_ip$ python3 lp_ip.py 
num layers: 2  num nodes: 20 num edges:  140  num graph constraint: 40
Num integral: 140
Integer Programming == Linear Programming !!
path:  ['start', 'layer:0 node:0', 'layer:1 node:0', 'end']
path:  ['start', 'layer:0 node:1', 'layer:1 node:1', 'end']
path:  ['start', 'layer:0 node:2', 'layer:1 node:2', 'end']
path:  ['start', 'layer:0 node:3', 'layer:1 node:3', 'end']
path:  ['start', 'layer:0 node:4', 'layer:1 node:4', 'end']
path:  ['start', 'layer:0 node:5', 'layer:1 node:5', 'end']
path:  ['start', 'layer:0 node:6', 'layer:1 node:6', 'end']
path:  ['start', 'layer:0 node:7', 'layer:1 node:7', 'end']
path:  ['start', 'layer:0 node:8', 'layer:1 node:8', 'end']
path:  ['start', 'layer:0 node:9', 'layer:1 node:9', 'end']

We have 2 conclusions,

  1. Observations match diagonally, as defined in (4.1), i.e. node i in layer 0 matches node i in layer 1.
  2. The solution of the linear relaxation is integral, i.e. the linear relaxation is tight for bipartite matching.

4.7 The general matching

Let's do the multi-layer matching problem defined in (3.2.2).

...
self.layer_sizes = np.array([3,4,3,4])
...

def run_single():
    association = AssociationGraph()
    x, relaxation_success = association.lp()
    result = association.find_association_greedy(x)
    for p in result:
        print('path: ', p)
        
if __name__ == "__main__":
    run_single()

yimu@yimu-mate:~/Desktop/blog/yimu-blog/random/lp_ip$ python3 lp_ip.py 
num layers: 4  num nodes: 14 num edges:  64  num graph constraint: 28
Num integral: 64
Integer Programming == Linear Programming !!
path:  ['start', 'layer:0 node:0', 'layer:1 node:0', 'layer:2 node:0', 'layer:3 node:0', 'end']
path:  ['start', 'layer:0 node:1', 'layer:1 node:1', 'layer:2 node:1', 'layer:3 node:1', 'end']
path:  ['start', 'layer:0 node:2', 'layer:1 node:2', 'layer:2 node:2', 'layer:3 node:2', 'end']
path:  ['start', 'layer:3 node:3', 'end']

There are 4 association paths (tracks) ending at layer 3. Three of them start at layer 0; one starts at layer 3. This is consistent with how the weights were created.

A very interesting observation: for a multi-layer association problem, we still get Linear relaxation == Integer program.

What’s going on?

4.8 Simulation for checking Linear relaxation == Integer program.

Let's try 10,000 runs of the linear relaxation and check the percentage of runs where the linear relaxation == integer program.

def relaxation_experiment():
    total_try = 10000
    
    success_count = 0
    solver_failure = 0

    for i in range(total_try):
        try:
            print('iter:', i)
            association = AssociationGraph()
            x, relaxation_success = association.lp()

            if relaxation_success:
                success_count += 1
        except Exception:  # don't crash the experiment; count solver failures
            solver_failure += 1

    print('total try:', total_try, 
          'relaxation success count:', success_count, 
          'solver failures:', solver_failure)
          
if __name__ == "__main__":
    relaxation_experiment()

....
num layers: 4  num nodes: 53 num edges:  348  num graph constraint: 106
Num integral: 348
Integer Programming == Linear Programming !!
iter: 9996
num layers: 9  num nodes: 165 num edges:  3029  num graph constraint: 330
Num integral: 3029
Integer Programming == Linear Programming !!
iter: 9997
num layers: 4  num nodes: 74 num edges:  1363  num graph constraint: 148
Num integral: 1363
Integer Programming == Linear Programming !!
iter: 9998
num layers: 9  num nodes: 143 num edges:  2345  num graph constraint: 286
Num integral: 2345
Integer Programming == Linear Programming !!
iter: 9999
num layers: 8  num nodes: 133 num edges:  2212  num graph constraint: 266
Num integral: 2212
Integer Programming == Linear Programming !!
total try: 10000 relaxation success count: 10000 solver failures: 0

For all 10,000 experiments, we got Linear relaxation == Integer program.

Can we prove the solution of the linear relaxation of (3.2.2) is guaranteed to be integral?

5. The Linear Relaxation for the Multi-layer Matching Problem is Tight

Warning: the material in this section is original (or I didn't read enough papers and someone did it 50 years ago). I am pretty sure it's correct. But if you find something strange, please point it out. Update: I found a paper! Someone proved it 10 years ago!

The multi-layer matching problem proposed in section (3.2.2) is,

In the previous section, I implemented the linear relaxation for the problem. Interestingly, for randomly generated layers and weights, the linear relaxation was tight in all runs.

Can we prove that the linear relaxation of the problem is tight?

Yes we can.

5.1 The multi-layer graph without the start & end nodes is bipartite

First, let me define what a bipartite graph is.

Definition of the bipartite graph from Wikipedia,

A bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint and independent sets \(U\) and \(V\) such that every edge connects a vertex in \(U\) to one in \(V\).

An example (also from Wikipedia) of a bipartite graph is,

Now let’s consider the multi-layer matching graph without the start and end nodes.

For the graph in (3.2.2) with the start and end nodes removed, grouping nodes in odd layers as \(U\) and nodes in even layers as \(V\) shows the graph is bipartite.

5.2 The Bipartite Matching problem has Tight Linear Relaxation.

(Most of the material comes from Chapter 4 of Integer Programming.)

To show that bipartite matching has a tight linear relaxation, we need to go through a chain of proofs. The high-level idea is to show that, for the bipartite matching problem, the constraint matrix \(A\) in \(Ax \leq b, x \geq 0\) is totally unimodular. As we discussed in section 2, if \(A\) is totally unimodular, the linear relaxation is tight.

To make things easier, I will not prove every statement. If you are interested in the proofs, please read Chapter 4 of Integer Programming.

First, we define the Equitable Bicoloring: roughly, a partition of the rows of a matrix into blue rows and red rows such that, for every column, the sum of the blue entries minus the sum of the red entries is 0 or \(\pm 1\).

Then, we can derive a useful statement.

Also, we need to define the incidence matrix of a graph (not the more common adjacency matrix). The incidence matrix captures the matching constraints, which say a node can match at most one other node (in both the forward and backward directions).

The incidence matrix defines the “match at most once” constraints.

For example, consider the incidence matrix of this graph.

Empty entries in the matrix mean 0.
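A small example of my own, in case the figure doesn't render: for the path graph \(a - b - c\) with edges \(e_1 = (a, b)\) and \(e_2 = (b, c)\), the incidence matrix has one row per node and one column per edge,

\(A_G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}\)

and the constraint \(A_G x \leq \mathbf{1}\) for the middle row says node \(b\) can be matched by at most one of \(e_1, e_2\).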

Here comes the key observation. For a bipartite graph, due to the definition of the incidence matrix, we can apply Corollary 4.8 to show the incidence matrix of a bipartite graph is totally unimodular.

For a bipartite graph, an edge connects a node from set \(U\) and a node from set \(V\). So each column of the \(A_G \) matrix (a column shows how an edge connects to nodes) contains exactly 2 ones.

We can color the rows for nodes in \(V\) red, and the rows for nodes in \(U\) blue. Again, because an edge only connects a node from \(V\) and a node from \(U\), the sum of the blue rows minus the sum of the red rows equals the 0 vector. So there exists an equitable bicoloring for the incidence matrix of a bipartite graph. Directly applying Corollary 4.8 shows the incidence matrix of a bipartite graph is totally unimodular. Thus the linear relaxation of the bipartite matching problem is tight.

The equitable bicoloring for the example is,

We conclude the discovery with this statement,

Here is a counter example.

If the graph is not bipartite, we have,
(1) there doesn’t exist an equitable bicoloring;
(2) the incidence matrix is not totally unimodular; and
(3) the linear relaxation is not tight.

1. We can't color the third row of \(A_G\).
2. So \(A_G\) is not totally unimodular.
3. There exists a non-integral vertex.
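Concretely (my own example), the smallest counter example is a triangle, i.e. an odd cycle. Its incidence matrix

\(A_G = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}, \quad \det(A_G) = -2\)

is not totally unimodular, and \(x = (\frac{1}{2}, \frac{1}{2}, \frac{1}{2})\) is a non-integral vertex of \(\{x \,|\, A_G x \leq \mathbf{1}, x \geq 0\}\): all three matching constraints are tight there.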

5.3 The Linear Relaxation for the Multi-layer Matching is Tight

When you see the title, you may have a question: “hey dude, you just proved the relaxation is tight if and only if the matching graph is bipartite. Are you joking?”. Don’t be mad, keep reading.

The Linear Relaxation for the multi-layer matching defined in (3.2.2) is tight. And the graph for (3.2.2) is not bipartite.

Why doesn't this conflict with the previous section? Because the constraint matrix \(A\) for problem (3.2.2) is not exactly the incidence matrix.

For problem (3.2.2), the constraints on edges between observation nodes are equivalent to the constraints defined by the incidence matrix. However, we don't have matching constraints for the start and end nodes.

With this in mind, we can make the following argument.

To make things easier, let me define \(r\) to be any edge between an observation node and the start/end node.

In section (5.1), we showed the multi-layer matching graph without the start and end nodes is bipartite, so there exists an equitable bicoloring for its incidence matrix. Now let's add the start and end nodes back. Because we don't have matching constraints for the start and end nodes, each \(r\) appears in only one constraint equation. Equivalently: only one row of the constraint matrix has a one in the column for \(r\). So no matter how we color the row which contains \(r\), the blue-minus-red sum in that column is always \(\pm 1\). In other words, the \(r\) edges don't participate in the equitable bicoloring game. So we can color the rows the same way as for the bipartite graph. By Corollary 4.8, the constraint matrix is totally unimodular. Thus the linear relaxation is tight.

The proof by argument can be confusing. Here is an example.

We can model edges which connect to the start/end nodes as single-directional.
e1 is an edge which connects observation nodes.
e2, e3, e4, e5 are edges that connect an observation node to the start/end node; these are the \(r\) edges defined above.
e2, e3, e4, e5 don't participate in the equitable bicoloring game.

The conclusion: the constraint matrix of the multi-layer matching problem is totally unimodular, so the linear relaxation for the multi-layer matching problem is tight.

5.4 Why the conclusion is different from the paper

In section (3.2.1), the conclusion of the paper was: The solution of the linear relaxation of the problem is not guaranteed to be integral.

Why did the problem defined in (3.2.2) show the opposite?

Because occlusions in (3.2.1) are modeled as skip connections, e.g. an edge connecting a node in layer i to a node in layer i+2, and the flow constraints are active for these edges.

Skip connections make the matching graph non-bipartite even without the start/end nodes. By Theorem 4.18, the incidence matrix has no equitable bicoloring, and the linear relaxation is not guaranteed to be tight.

6 Discussion

6.1 How to solve a linear program?

Linear programming is the building block for solving integer programs. In the branch-and-bound method, an IP is solved exactly by splitting it into a sequence of LPs. We can even simply treat an IP as an LP and hope for an integral solution.

How do we solve an LP?

There are 2 mainstream methods for solving linear programs: the Simplex Method and Interior Point Methods.

6.1.1 The Simplex Method

As we discussed previously, if a linear program is feasible and bounded, the optimal value is always achieved at a vertex defined by the constraints.

To optimize a linear program, we can simply jump from the current vertex to a “nearby” vertex which has a higher objective value.

This idea is implemented by the Simplex Method.

Simplex Method from Wikipedia

This idea sounds simple. But keep in mind that computationally, all we have are \(c^T x\) and \(A x \leq b\). The Simplex Method has to reason about a “better” vertex using only this information. Given all this, the Simplex Method is actually not that simple. Please read this great tutorial for the Simplex Method.

6.1.2 Interior Point Methods

Linear programs are a subset of convex optimization problems.

A convex optimization problem can be expressed as,

\(\text{minimize}_x \; \; cost(x) \\ \text{subject to: } h(x) \leq 0\)

where \(cost(x)\) and \(h(x)\) are convex functions.

A general method for solving a convex optimization problem is the Primal-Dual Interior Point Method, where we solve the KKT conditions of the problem.

Please read this article for a simple tutorial.

6.2 The Unimodular Condition is Strict

In section 2, we proved that,

For an IP with constraints \(Ax \leq b\), the linear relaxation is tight for every integral \(b\) when A is unimodular.

This statement is very strict. The culprit is the condition “for every \(b\)”. In fact, the example I gave in section 2 is not unimodular.

The constraint matrix is not unimodular, but for this particular \(b\), the \(Ax \leq b\) polyhedron is integral.

Given an \(Ax = b\) system for a particular A and a particular b, I guess we can use Cramer's rule to show that when \(\det A = \text{common-divisor}(b)\), x is integral. But this seems like only a sufficient condition.

When the solution of \(A x = b\) is integral, is there any theory about the condition for A and b? Please let me know if you have any references.

6.3 Famous Integer Programming Problems

Many graph algorithms and search algorithms can be formulated as Integer Programs.

Just read Chapters 2 and 4 of Integer Programming.

Knapsack Problem, Sudoku, Traveling Salesman, Shortest path, Spanning tree …

You can use an Integer programming library to do graph interview questions! (But the feedback could be “not a cultural fit”.) In fact, an Integer program is solved by searching (as discussed in section 1). So I don't think it is a big surprise that Sudoku can be solved by Integer programming.

But there are caveats. Building the constraints \(A x \leq b\) for an IP can be painful, and the performance of IP can be worse than a search with good heuristics.

6.4 Multiple Robots Planning

The discrete multi-robot planning problem can be modeled as an integer program.

Let’s read a paper.

Planning Optimal Paths for Multiple Robots on Graphs
Jingjin Yu Steven M. LaValle

The authors used Integer programming to solve the multi-robot planning problem.

The precise definition of the problem is,

To be honest, I read this paragraph a few times but I couldn't fully understand it. In the next section, there is a follow-up paper that gives a better definition of the problem.

Anyway, the planning problem is modeled as a graph search. An example of a graph search problem is Dijkstra's algorithm, which finds the minimum-weight path from node A to node B.

Multi-robot planning is modeled as a multi-path graph search. Given a directed graph, we want to find a path for each robot such that no two paths collide. Specifically, no two robots occupy a vertex at the same time, and no two robots move oppositely along an edge at the same time.

An example is the 9-puzzle problem.

The optimal moves from (a) to (b) are,

From this example, I hope you can see the graph search doesn’t happen in the graph defined in Fig 2. In fact, we need to expand the graph in Fig 2 across time to capture the location of a robot at a time.

The key contribution of the paper is a method to construct a time expanded graph for the planning problem.

For the graph building, the authors gave an example.

Fig 4 (a) defines a multi-robot planning problem. Robot 1 wants to move from \(s_1^+\) to \(s_1^-\), and robot 2 wants to move from \(s_2^+\) to \(s_2^-\). The constraint is that the robots' paths don't collide.

I will call Fig 4 (a) the map of the problem. It is not the graph for the Integer program.

Then, the authors built the graph for the planning problem. To model the notion of time, they concatenated many copies of the map together. Each copy represents where each robot is at a given time.

To model collision constraints, the authors used a trick called the gadget (Fig 4 b). For 2 neighboring nodes at consecutive times, say node u at times t and t+1 and node v at times t and t+1, the authors connect them in the time dimension as in Fig 4 (b). All edges have weight (capacity) 1. The authors apply flow constraints to the gadget graph, i.e. the sum of in-flows equals the sum of out-flows.

Putting everything together: at time t, the active edge can only be one of \(u(t)\) or \(v(t)\). At time t+1, the active edge can only be one of \(u(t+1)\) or \(v(t+1)\). In other words, the valid paths are \(\emptyset\), \(u(t) => u(t+1)\), \(v(t) => v(t+1)\), \(u(t) => v(t+1)\) and \(v(t) => u(t+1)\). No paths collide under the gadget configuration with flow constraints.

The description from the paper is here for reference.

By adding a time dimension and using the gadget to constrain paths across time, the author generated a planning graph like this,

Interestingly, the authors didn't label two nodes in Fig 4 (a). As far as I can tell, the 2nd row of Fig 5 corresponds to the node between \(s_1^-\) and \(s_2^-\) in Fig 4 (a), since it forms gadgets with \(s_1^-\) and \(s_2^-\). The 5th row of Fig 5 corresponds to the node between \(s_1^+\) and \(s_2^+\) in Fig 4 (a).

The red edges are auxiliary edges that make the constraints uniform, i.e. the flow constraint equation also applies to \(s_i^-\) and \(s_i^+\). I guess (1) creating a start node and connecting it to \(s_i^+\), (2) creating an end node and connecting it to \(s_i^-\), and (3) not applying flow constraints to the start/end nodes would also do the job.

Given the time-expanded graph, the authors formulated an Integer program on the graph.

(8) means at most one robot occupies an edge in the time-expanded graph. Together with the gadget, this models the two collision constraints. (9) makes the path of each robot continuous (the flow constraint). (10) is the objective function, which minimizes the total length of all paths.

There is a caveat: the time-expanded graph is constructed for a given horizon T. For small T, the algorithm may fail to find a solution. The authors start with a reasonable T; if it doesn't work, they increase T by 1.

The algorithm works well in practice. For a (20 × 15) grid with obstacles, the running time of the algorithm is,

For each entry, the first number is the runtime in seconds; the second number is the average path length.

Spending 15 seconds to compute optimal paths for 30 robots on a (20 × 15) grid is impressive.

The traditional method for solving this problem is search (DFS, BFS, A*). Please read 8 puzzle Problem using Branch And Bound for more details. The search space grows exponentially. I am not confident that I can solve the problem in 15 seconds by DFS. Interestingly, the Integer programming solver internally uses searching too (the algorithm is also called Branch and Bound, as we discussed in section 1).

For Branch and Bound on Integer programs, a nice property is that the bound given by the linear programs comes for free, whereas for a DFS search, the bounding (pruning) method is problem-dependent. We would have to look for a bounding method in the LeetCode discussion forum (it's a joke, I hope you get it).

6.5 Best Paper of IROS 2019

Six years later, the authors wrote a paper discussing a general integer programming framework for graph-based planning problems. The paper was a Best Student Paper Finalist and Best Application Paper Finalist at IROS 2019.

Integer Programming as a General Solution Methodology for Path-Based
Optimization in Robotics: Principles, Best Practices, and Applications
Shuai D. Han Jingjin Yu

The authors discussed multi-robot path planning (MPP), minimum constraint removal (MCR), and reward collection problems (RCPs). I will only focus on MPP, which is the same problem as in the last section.

The key finding of the paper: there are 2 steps to convert a path-based optimization problem into an integer program, (1) the path-encoding step and (2) the additional-constraints step. All 3 planning problems above can be handled by these 2 steps.

In other words, the first step is to construct a graph (for example, expanding the original graph in time); the second step is adding constraints on the graph. After these two steps, an integer program can be formulated on the graph.

Let's go through the MPP problem again in this paper.

The problem definition for MPP is much better in this paper.

The definition of graph and paths on the graph.
The MPP problem puts additional constraints on the Path-Based Optimization problem.

To construct the Integer program for an MPP, the first step is the Time-Expanded-Graph encoding. The procedure is almost the same as in the previous section. The difference is that the authors directly connected neighboring nodes in consecutive time layers, instead of using the “gadget”.

Then, the authors added general path constraints. (9) means there is a path going through every start node and every end node. (10) is the path continuity constraint (the flow constraints).

The second step is adding MPP-specific constraints, i.e. collision-free constraints.

(11) prevents robots from occupying the same vertex at the same time. Because the authors didn't use the “gadget”, they used specific constraints (12) to avoid head-to-head collisions on an edge, whereas using the “gadget” converts the collision constraints into path continuity constraints. My guess is the new formulation introduces fewer variables, so it's faster.

Given the graph (which defines the IP variables) and the constraints, the authors used an Integer programming solver to solve the problem.

The 2-step procedure also works for the minimum constraint removal (MCR) and the reward collection problems (RCPs). Please read the paper for details.

6.6 Compute the Similarity

In section 3, we discussed the matching problem. But we didn't discuss how to get the similarities (the weights of the directed edges) in the matching graph.

There are many methods to compute the similarity. In the paper in (3.2.1), the authors used color histograms to compute the similarity.

We can also learn it.

In this paper, the authors learned the similarity by end-to-end deep learning.

The Detection Net does not detect bboxes from lidar points (that is done by the proposal net in this paper); it outputs a binary label per bbox indicating whether the bbox is a true positive or a false positive. I think we can ignore it, since creating a length-1 track for a false positive is acceptable.
The Matching Net outputs weights for the edges in the matching linear program. It's the key network.

The inputs to the network are detection bboxes & cropped lidar points for each bbox. The outputs from the network are association weights. Using these weights, the authors solve an LP to compute an optimal assignment \(y\).

The authors defined the cost function as the difference between the optimal assignment \(\hat y\) from the matching LP and the ground-truth assignment \(y\).

However, there is a problem: the natural cost function for binary vectors \(y\) is the Hamming loss, which counts the fraction of wrong labels among all labels. The Hamming loss is not differentiable.

To handle this problem, the authors used the structured hinge loss.

The structured hinge loss replaces the discontinuous loss \(\Delta(y, \hat y)\) by a continuous upper-bound of it.

\(\Delta\) is a bad cost.
Replace \(\Delta\) by a continuous upper bound.
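In its standard form (my transcription; the paper's notation may differ), for a score function \(f(x, y; w)\) the structured hinge loss is,

\(\ell(x, y; w) = \max_{\hat y} \left[ \Delta(y, \hat y) + f(x, \hat y; w) \right] - f(x, y; w)\)

Because the max is at least the value at the prediction \(y^* = \arg\max_{\hat y} f(x, \hat y; w)\), and \(f(x, y^*; w) \geq f(x, y; w)\), we get \(\ell \geq \Delta(y, y^*)\): minimizing \(\ell\) minimizes a continuous upper bound of the Hamming loss.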

Note the new cost is continuous but not differentiable everywhere. A method to optimize it is subgradient descent, which is essentially gradient descent for a piecewise-differentiable function.


There are other methods to compute the similarity.

The face recognition community has a proven solution.

We can use a deep net to extract features from face images. Let's call the face feature \(f(im)\). We can find a vector \(W_i\) for each person, such that the normalized product (cosine) of \(W_i\) and the face features belonging to person i is maximized,

Make \(W_i f(im)\) large when im belongs to person i.

And the normalized product of \(W_i\) and face features not belonging to person i is minimized,

Make \(W_i f(im)\) small when im doesn't belong to person i.

As a result, \(W_i f(im)\) gives the similarity between the image and the class i.

The idea can be implemented as a cost function,
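One common variant (a CosFace-style large-margin cosine loss; the exact form differs between papers), with \(\theta_j\) the angle between \(f(im)\) and \(W_j\), scale \(s\), and margin \(m\), is,

\(L = -\log \frac{e^{s(\cos\theta_{y_i} - m)}}{e^{s(\cos\theta_{y_i} - m)} + \sum_{j \neq y_i} e^{s \cos\theta_j}}\)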

It can be visualized beautifully on a sphere.

The margin is important for the recognition problem.

If you are interested in this topic, please read papers from this dude and this dude. They are excellent people and they are graduating. Hire them; otherwise, it's a big loss for you. Seriously.

7 The End

Oh, it’s a long article! There are so many interesting topics related to integer programming and linear programming.

In section 1, I started by discussing the Integer programming problem and pointed out that the linear relaxation is tight when the polyhedron is integral.

In section 2, I proved that an integral polyhedron is equivalent to a unimodular constraint matrix.

In section 3, I introduced the optimal matching problem.

In section 4, I implemented the optimal matching problem and found the surprising fact that the solution was always integral.

In section 5, I proved the constraint matrix of the optimal matching problem is totally unimodular.

In section 6, I discussed some random topics, including methods to solve a linear program, some papers on multi-robot planning, and how to compute similarities for a matching problem.

Writing this article is fun for me. I hope you had fun too 🙂

If you want to learn more about Integer programming, please read Chapter 1 of Integer Programming. There are many brilliant ideas in that chapter.

Big thanks to Chaosheng Dong for giving me great references for the topic!
