1989.Fwork—A Computational Framework and an Algorithm for the Measurement of Visual Motion

Link to paper

A quite comprehensive reworking of methods for optical flow. They use the typical combination of elements (coarse-to-fine, photometric, and smoothness), but use two main strategies not present in other papers:

Explicit search for matching pixels. That is, they search over a narrow (3x3) grid around the current estimate at each level, in order to find the pixel with the most similar 5x5 neighborhood. The result is then smoothed with the smoothing loss.
A second-order confidence measure to weight different points and in particular different directions for the photometric loss (not the smoothness loss). They combine these with the three components mentioned above and then use an iterative method (much like Gauss-Seidel, etc.) in order to solve for $u$ and $v$ .

Pyramid search advancements

Bandpass filter. While constructing mid-levels, they first apply a low-pass filter (e.g. Gaussian blur) and then subtract frequencies below, leading to a band-pass filter. At each level, only a (narrow) range of frequencies are present in the image. This leads to less interference from other frequencies.

Overlapped pyramid. In a coarse-to-fine scheme, any mistakes at higher levels are propagated downwards:

The displacement at a pixel at a level $l - 1$ is projected to the four pixels below at level $l$ . However, in this scheme, if the displacement computed for a coarse-level pixel is incorrect, the search areas of none of its descendants at any of the subsequent levels will contain its correct match. Hence, a single match error made at a coarse level causes a large block of pixels at the image resolution to have incorrect displacements.

In order to fix this, they propose the following scheme: each pixel at level $l - 1$ sends its prediction to a $4 \times 4$ (versus $2 \times 2$ ) grid below it. This gives each pixel at level $l$ 4 different match candidates to begin with:

Matching based on context. Matches are not only done based on the pixel itself, but a 5x5 window around the pixel. I think this is related to gradient matching methods, as gradients are usually calculated with nearby pixels.

Some really cool visualizations

I thought these figures were really interesting, to give an idea of what the loss landscape might look like: ![[Asset 5@2x.png]] The occlusion part always really confuses me though. Occluded pixels and disoccluded pixels should both have really high SSDs right? I guess that’s what the last figure shows, but it’s still a bit confusing.

Here’s another one, showing the nature of the aperture problem. The first one is with smoothing; the second one is without. It settles on the correct edge but not the correct location within that edge as it’s not well-specified. ![[Asset 1@2x.png]]

Adding smoothness to explicit search

Adding in a smoothness condition makes a big difference in the final results: ![[Smoothing@2x.png]]

The way they do this is by using the pixel-matching SSD loss as a “ground-truth” optical flow, and the photometric loss measures the difference between the matched flow and the smoothed flow.

The “base” energy to be minimized is:

E ({u}) = E_{sm} ({u}) + E_{ap} ({u}) = \iint trace (\nabla u^{T})^{T} (\nabla u^{T}) d x d y + x, y \sum [(u_{x} - d_{x})^{2} + (u_{y} - d_{y})^{2}] .

The first term is equivalent to the smoothness loss from H&S. The second would measure deviation from the match predictions $d_{x}$ and $d_{y}$ , basically the same loss one would use for a GT optical flow in a supervised setting.

Confidence measure

They propose an additional refinement, weighting the photometric loss by not only overall confidence but confidence in a particular direction. They use the second derivative (curvature) as a measure of confidence, letting $\hat{e}_{ma x}$ and $\hat{e}_{min}$ be the unit vectors in the directions of the greatest and least curvature, respectively, (I assume $\hat{e}_{ma x} ⊥ \hat{e}_{min}$ ) and then the above $E_{ap} ({u})$ is replaced with

x, y \sum c_{max} (u \cdot \hat{e}_{max} - d \cdot \hat{e}_{max})^{2} + c_{min} (u \cdot \hat{e}_{min} - d \cdot \hat{e}_{min})^{2} c_{i} = \frac{curvature _{i}}{k _{1} + k _{2} \cdot min ( SSD vals )} .

I’m not sure why they choose curvature (I guess it’s deviations from a linear local estimate?) as opposed to just the derivative itself. As opposed to N&E, the weightings are applied to the photometric loss, not the smoothness loss.

Iterative solution

They use something called the finite element method to solve the minimization problem of $E$ above. I’m not really sure exactly how it’s involved here (in particular, what they describe seems pretty different from what I found online about it), but they end up with iterative update rule

u^{n + 1} = \overset{ˉ}{u}^{n} + \frac{c _{max}}{c _{max} + 1} ((d - \overset{ˉ}{u}^{n}) \cdot \hat{e}_{max}) \hat{e}_{max} + \frac{c _{min}}{c _{min} + 1} ((d - \overset{ˉ}{u}^{n}) \cdot \hat{e}_{min}) \hat{e}_{min},

which converges towards the minimizer. They have a really cool diagram (I love the diagrams in this paper) intuitively showing what happens with each update of $u$ :

![[Screen Shot 2023-09-05 at 22.45.34.png|center|300]] ![[Screen Shot 2023-09-05 at 22.47.05.png|center|300]]

Cited

Cited By

2004.BBPW—High Accuracy Optical Flow Estimation Based on a Theory for Warping (Brox… 2004)

Return: index

Research

Table of Contents

1989.Fwork—A Computational Framework and an Algorithm for the Measurement of Visual Motion

Pyramid search advancements

Some really cool visualizations

Adding smoothness to explicit search

Confidence measure

Iterative solution

Tags

Cited

Cited By

Graph View

Backlinks

Research

Table of Contents

1989.Fwork—A Computational Framework and an Algorithm for the Measurement of Visual Motion

Pyramid search advancements §

Some really cool visualizations §

Adding smoothness to explicit search §

Confidence measure §

Iterative solution §

Tags §

Cited §

Cited By §

Graph View

Backlinks

Pyramid search advancements

Some really cool visualizations

Adding smoothness to explicit search

Confidence measure

Iterative solution

Tags

Cited

Cited By