ORB: An efficient alternative to SIFT or SURF
Unlike PTAM, ORB positions itself directly against SURF and SIFT, which the authors consider too slow for real-time use. They aim to propose a method that is both reliable (comparable to SIFT) and fast.
The paper breaks the task of feature matching into two steps
- Feature detection - What makes corners interesting? A point on a flat white wall is useless; it looks the same as any other. A point on a corner, however, is distinctive and can be found again reliably.
- Feature description & matching - Building a unique fingerprint for every patch of pixels, so the same point can be recognized in another image.
The proposed ORB = Oriented FAST + Rotated BRIEF. I've covered FAST in my previous blog on PTAM. BRIEF is something new - Binary Robust Independent Elementary Features.
- FAST is a very simple and fast way to find corners. It looks at a pixel and checks whether a contiguous arc of pixels on a circle around it is all significantly brighter or all significantly darker than the center. If so, it's a corner. This is just a series of simple comparisons.
- Problem? - It's just a detector. It doesn't tell you the orientation of the corner.
- BRIEF takes a small patch of pixels around a feature point and compares the brightness of random (but fixed) pairs of pixels within that patch. For example: is pixel A brighter than pixel B? If yes, write down a 1; if no, write down a 0. It does this 256 times to create a 256-bit binary string.
- Problem? - It is not rotation invariant. If you rotate the image, the intensities under the fixed pairs of test points change, and the resulting binary fingerprint will be completely different.
This is the core of their paper: adding orientation to FAST and making BRIEF rotation-invariant.
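As a quick aside on the matching half: binary descriptors like this are compared with the Hamming distance, which is just an XOR followed by a bit count. Below is a minimal NumPy sketch; the helper names are my own, and a real pipeline would use a proper matcher such as OpenCV's BFMatcher with NORM_HAMMING.

    import numpy as np

    def hamming_distance(desc_a, desc_b):
        # Descriptors stored as uint8 arrays (32 bytes = 256 bits); differing bits = XOR + popcount.
        return int(np.unpackbits(np.bitwise_xor(desc_a, desc_b)).sum())

    def match_nearest(descs1, descs2):
        # Brute-force nearest neighbour: for each descriptor in descs1, find the closest in descs2.
        return [(i, int(np.argmin([hamming_distance(d1, d2) for d2 in descs2])))
                for i, d1 in enumerate(descs1)]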
Section 1 - oFAST: FAST Keypoint Orientation
The goal is to solve two weaknesses of FAST -
- Often finds low quality points
- No orientation
FAST Detector
This is like a quality-control step.
- Get FAST points
- Run FAST with a low threshold, collecting way more points than necessary.
- Harris corner
- As the selection quality is not very good, the proposed solution is to use a Harris corner measure to give each point a confidence score.
- Pick the top N
- After sorting by Harris score, the top N points are picked. This keeps only the highest-quality corners.
- Repeat for image-pyramid
- Ensures they find both big and small corners, achieving scale invariance.
The method is pretty easy. Let us do it step by step; I have taken a test image from Pexels.
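Here is a minimal sketch of the pipeline above using OpenCV; the FAST threshold, Harris parameters, and file name are my own choices for illustration, not values from the paper.

    import cv2 as cv
    import numpy as np

    def ofast_keypoints(gray, n_points, fast_threshold=10):
        # 1. FAST with a low threshold -> far more candidates than needed.
        fast = cv.FastFeatureDetector_create(threshold=fast_threshold, nonmaxSuppression=True)
        kps = list(fast.detect(gray, None))

        # 2. Harris response map, sampled at each FAST keypoint as a confidence score.
        harris = cv.cornerHarris(np.float32(gray), blockSize=7, ksize=3, k=0.04)
        for kp in kps:
            x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
            kp.response = float(harris[y, x])

        # 3. Keep only the top n_points by Harris score.
        kps.sort(key=lambda kp: kp.response, reverse=True)
        return kps[:n_points]

    # 4. Repeat over an image pyramid for scale invariance.
    img = cv.imread("test.jpg", cv.IMREAD_GRAYSCALE)
    for level, budget in enumerate([2500, 1000, 400, 100]):
        print(f"level {level}: {len(ofast_keypoints(img, budget))} keypoints kept")
        img = cv.pyrDown(img)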

I ran FAST with a low threshold, which detected 7574 points. After applying the Harris corner measure and keeping the top 2500 points, below are the results:

And the same applied over 4 pyramid levels -
- Level 0 - top 2500
- Level 1 - top 1000
- Level 2 - top 400
- Level 3 - top 100

Orientation by Intensity Centroid
The intensity centroid is simply the center of mass of a patch of pixels. The paper defines the corner's orientation as the vector pointing from the corner's center to this intensity centroid.
$$m_{pq}=\sum_{x,y}x^py^qI(x,y)$$This is the standard formula for image moments, just computed on intensities: $I(x,y)$ is the intensity of the pixel at coordinate $(x,y)$, and $x^p y^q$ is the weighting factor based on its location.
- If $p=0, q=0$ -> $m_{00}$ is the total mass (sum of intensities) of the patch.
- If $p=1, q=0$ -> $m_{10}$ is the intensity-weighted sum of the $x$ coordinates.
- If $p=0, q=1$ -> $m_{01}$ is the intensity-weighted sum of the $y$ coordinates.
From these moments we can then calculate the centroid $$C=\left(\frac{m_{10}}{m_{00}},\frac{m_{01}}{m_{00}}\right)$$
And now we can form the vector from the corner's center $O$ to its centroid $C$, i.e. $\vec{OC}$. The orientation of the patch is then simply
$$\theta = \text{atan2}(m_{01},m_{10})$$In my implementation I've defined this as follows -

    import cv2 as cv

    def compute_orientation(image, keypoint, patch_size=31):
        # Center of the patch = keypoint location (assumed far enough from the image border).
        center_x, center_y = int(round(keypoint.pt[0])), int(round(keypoint.pt[1]))
        radius = patch_size // 2
        x_min, x_max = center_x - radius, center_x + radius + 1
        y_min, y_max = center_y - radius, center_y + radius + 1
        patch = image[y_min:y_max, x_min:x_max]

        # Image moments m00, m10, m01 with coordinates taken relative to the patch center.
        m00 = 0
        m10 = 0
        m01 = 0
        for y in range(patch.shape[0]):
            for x in range(patch.shape[1]):
                intensity = int(patch[y, x])
                m00 += intensity
                m10 += (x - radius) * intensity
                m01 += (y - radius) * intensity

        if m00 == 0:
            return 0

        # Angle of the vector from the patch center to the intensity centroid, in degrees.
        angle = cv.fastAtan2(float(m01), float(m10))
        return angle
Results -

Section 2 - rBRIEF: Rotation-Aware BRIEF
Let us take a quick look at the BRIEF descriptor.
First you detect keypoints, usually with FAST; BRIEF only describes these points. For each keypoint, extract an image patch, here 31x31.
BRIEF defines a set of $n$ pairs of points within the patch. ($n$ = 256)
$$ \tau(\mathbf{p}; \mathbf{x}, \mathbf{y}) := \begin{cases} 1 & :\ \mathbf{p}(\mathbf{x}) < \mathbf{p}(\mathbf{y}) \\ 0 & :\ \mathbf{p}(\mathbf{x}) \geq \mathbf{p}(\mathbf{y}) \end{cases}$$For each pair, compare the pixel intensities in the patch; the above equation describes exactly this test. The bits are then concatenated using
$$f_{n}(\mathbf{p}):=\sum_{1\le i\le n}2^{i-1}\tau(\mathbf{p};\, \mathbf{x}_i, \mathbf{y}_i)$$This is the BRIEF descriptor: what you get is a 256-bit binary string.
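To make this concrete, here is a rough sketch of the descriptor computation. The pattern below just uses uniformly random offsets for simplicity, whereas the original BRIEF paper samples the pairs from a Gaussian distribution and smooths the image first; the names here are my own.

    import numpy as np

    PATCH, N_TESTS = 31, 256
    rng = np.random.default_rng(0)
    # A fixed, pre-generated test pattern: N_TESTS pairs of (x1, y1, x2, y2) offsets in the patch.
    PAIRS = rng.integers(-(PATCH // 2), PATCH // 2 + 1, size=(N_TESTS, 4))

    def brief_descriptor(smoothed, cx, cy):
        # Assumes the image has been smoothed and the keypoint is at least 15 px from any border.
        bits = np.zeros(N_TESTS, dtype=np.uint8)
        for i, (x1, y1, x2, y2) in enumerate(PAIRS):
            bits[i] = 1 if smoothed[cy + y1, cx + x1] < smoothed[cy + y2, cx + x2] else 0
        return bits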

Here is an example on our test image.
Now, when we apply the original BRIEF to a rotated image, the results are pretty bad -

Steered BRIEF
The first modification the paper proposes is to rotate the test pattern according to the keypoint's orientation before applying the descriptor.
We already know the keypoint's orientation $\theta$. We have the predefined set of 256 pairs of points that we use for BRIEF. Before testing, we just rotate all these points by $\theta$ using the rotation matrix $R_{\theta}$.
The new set of steered points is $$S_\theta = R_\theta S$$ where $S$ is the matrix holding the coordinates of the test points.
Now, apply BRIEF using these new, rotated points $S_\theta$. For efficiency, the authors discretize the angle into increments of $2\pi/30$ (12°) and precompute a lookup table of rotated BRIEF patterns, so no trigonometry is needed per keypoint.
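A sketch of the steering step, reusing the hypothetical PAIRS pattern from the previous snippet and an angle (in degrees) such as the one returned by compute_orientation; the 12° lookup-table step follows the paper's description.

    import numpy as np

    def steer_pairs(pairs, theta_deg):
        # Rotate every (x, y) test offset by the keypoint orientation.
        theta = np.deg2rad(theta_deg)
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s], [s, c]])
        pts = pairs.reshape(-1, 2) @ rot.T
        return np.rint(pts).astype(int).reshape(-1, 4)

    # Precompute rotated patterns in 12-degree steps; at run time one table lookup replaces the trig.
    LUT = [steer_pairs(PAIRS, angle) for angle in range(0, 360, 12)]
    steered_pairs = LUT[int(round(kp_angle / 12)) % 30]  # kp_angle from the oFAST step, in degrees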
All good? Rotation-invariant BRIEF achieved! Well, no. While this seems like the ideal fix, steered BRIEF actually performs worse than the original BRIEF.
Analysis
Before moving on, we first have to tackle a question: what makes a good descriptor?
- High Variance
The bits in the descriptor should flip between 0 and 1 often when looking at different keypoints. If a bit is always 0, it provides no information. A bit that is 0 half the time and 1 half the time (a mean of 0.5) has the highest variance and is most informative.
- Uncorrelated tests
Each of the 256 bits should provide new information. If two or more tests are highly correlated, the extra ones add nothing.
Solution?
Now the goal is to find 256 binary tests which, when applied to the steered patches, produce -
- A mean close to 0.5 (high variance).
- Low correlation with each other.
To achieve this, the proposed method is a greedy search over all possible tests within the 31x31 patch, i.e. about 205,590 candidate tests. Each test is run on a large training set of keypoint patches to get its mean, and the list is ordered so that tests with a mean closest to 0.5 come first. The final set of 256 tests, called $R$, is then built one by one: walk down the ordered list and add a test to $R$ only if its correlation with every test already in $R$ is below a threshold (raising the threshold and retrying if fewer than 256 tests survive). This new set of 256 learned tests is what we call rBRIEF. A rough sketch of the selection is shown below.
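Here the training matrix, correlation threshold, and function name are assumptions for illustration, not the paper's actual code; the structure of the greedy loop follows the description above.

    import numpy as np

    def select_rbrief_tests(bit_matrix, n_wanted=256, corr_threshold=0.2):
        # bit_matrix: (n_training_keypoints, n_candidate_tests) array of 0/1 outcomes of every
        # candidate test run on a large set of steered keypoint patches.
        means = bit_matrix.mean(axis=0)
        order = np.argsort(np.abs(means - 0.5))   # tests with mean closest to 0.5 first

        chosen = [order[0]]
        for t in order[1:]:
            if len(chosen) == n_wanted:
                break
            # Keep the test only if it is weakly correlated with everything already chosen.
            corrs = [abs(np.corrcoef(bit_matrix[:, t], bit_matrix[:, c])[0, 1]) for c in chosen]
            if max(corrs) < corr_threshold:
                chosen.append(t)
        # The paper raises the threshold and re-runs if fewer than 256 tests survive.
        return chosen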
And the results are good -

So this was the concept behind ORB, a fun little feature detector and descriptor. Thank you!