Motion perception

Motion perception is the process of inferring the speed and direction of motion in a visual scene given some visual input. While this process appears straightforward to most observers, it has proven to be a hard problem from a computational perspective.

The observer's visual input is generally insufficient to uniquely determine the 'true' velocity in a visual scene. In monocular vision, for example, the visual input is a 2D projection of a 3D scene. The motion cues present in the 2D projection are by default insufficient to reconstruct the motion present in the 3D scene. Put differently, many 3D scenes are compatible with a single 2D projection. The same problem arises in binocular vision when we consider occlusion or motion perception at relatively large distances, where binocular disparity is a poor cue to depth.
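The ambiguity of a 2D projection can be illustrated with a minimal sketch. It assumes an idealized pinhole camera with focal length 1, which the text does not specify; the function names `project` and `image_velocity` and the example coordinates are hypothetical and chosen only for illustration.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def image_velocity(point, velocity, focal_length=1.0):
    """Instantaneous 2D image velocity of a 3D point moving with a 3D velocity.

    This is the time derivative of the pinhole projection above.
    """
    x, y, z = point
    vx, vy, vz = velocity
    u = focal_length * (vx * z - x * vz) / (z * z)
    v = focal_length * (vy * z - y * vz) / (z * z)
    return (u, v)


# A near, slow object and a far, fast object:
# twice the distance from the eye, twice the 3D speed.
near_point, near_velocity = (1.0, 0.5, 2.0), (0.2, 0.0, 0.0)
far_point, far_velocity = (2.0, 1.0, 4.0), (0.4, 0.0, 0.0)

near_image = (project(near_point), image_velocity(near_point, near_velocity))
far_image = (project(far_point), image_velocity(far_point, far_velocity))
print(near_image)
print(far_image)
```

Both objects produce the same image position and the same image velocity, so the projection alone cannot tell the two 3D motions apart.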

These issues become more apparent when we look at visual illusions involving motion. A well-known example is the barberpole illusion. When a diagonally striped pole is rotated around its longer axis, the stripes physically move in the direction of the pole's shorter axis, yet they appear to move in the direction of its longer axis.





In addition to the problems of motion perception mentioned above, a number of issues arise due to the physiology of the brain. Each neuron in the visual system is sensitive to visual input in a small part of the visual field, as if each neuron were looking at the visual input through a small aperture. At the resolution of this aperture, visual cues can often be approximated by straight lines. The motion direction of a straight line is fundamentally ambiguous, because the motion component parallel to the line cannot be inferred from the visual input.
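The ambiguity of a moving straight line can be made concrete with a small sketch. It assumes an idealized, infinitely long line seen through the aperture, so that only the velocity component perpendicular to the line is observable; the function name `observed_motion` and the example velocities are hypothetical.

```python
import math


def observed_motion(true_velocity, line_angle):
    """Return the component of a 2D velocity that is normal to a straight line.

    line_angle is the orientation of the line in radians, measured from the
    x-axis. The component parallel to the line is discarded, because sliding
    a featureless line along itself changes nothing inside the aperture.
    """
    nx, ny = -math.sin(line_angle), math.cos(line_angle)  # unit normal to the line
    normal_speed = true_velocity[0] * nx + true_velocity[1] * ny
    return (normal_speed * nx, normal_speed * ny)


stripe_angle = math.radians(45)  # a diagonal stripe
rightward = (1.0, 0.0)           # stripe moving to the right
downward = (0.0, -1.0)           # stripe moving down (y-axis points up)

print(observed_motion(rightward, stripe_angle))
print(observed_motion(downward, stripe_angle))
```

The two true velocities differ, yet both yield the same observed motion, directed toward the bottom-right and perpendicular to the stripe.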

In cases where motion cannot be determined based on visual input alone, the visual system is thought to rely on prior assumptions. In the second figure the visual input and prior assumptions together make it appear the stripes are moving to the bottom-right.

Individual neurons initially estimate motion locally within their receptive field. Because each neuron suffers from the aperture problem, the estimates from many neurons are then integrated into a global motion estimate. This integration appears to occur in area MT/V5 of the human visual cortex.
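One simple way to see how ambiguous local estimates can be combined is a least-squares "intersection of constraints". The sketch below is an illustrative computation, not a model of what area MT/V5 actually does. It assumes that each local unit reports only a contour orientation and the speed normal to that contour, and that all units view the same rigid motion; the names `local_measurement` and `integrate_local_estimates` are hypothetical.

```python
import math


def local_measurement(true_velocity, line_angle):
    """Simulate one aperture-limited unit: unit normal of its contour and normal speed."""
    nx, ny = -math.sin(line_angle), math.cos(line_angle)
    normal_speed = true_velocity[0] * nx + true_velocity[1] * ny
    return (nx, ny), normal_speed


def integrate_local_estimates(measurements):
    """Least-squares velocity v minimizing sum((n_i . v - s_i)^2) over all units."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (nx, ny), speed in measurements:
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += speed * nx
        b2 += speed * ny
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        # Every contour has the same orientation: the aperture problem remains.
        raise ValueError("velocity is ambiguous: all contours share one orientation")
    # Solve the 2x2 normal equations directly.
    vx = (a22 * b1 - a12 * b2) / det
    vy = (a11 * b2 - a12 * b1) / det
    return (vx, vy)


true_velocity = (1.0, -0.5)
angles = [math.radians(a) for a in (0, 60, 120)]
measurements = [local_measurement(true_velocity, a) for a in angles]
print(integrate_local_estimates(measurements))
```

With contours of at least two different orientations the global velocity is recovered; with a single orientation the function raises an error, mirroring the ambiguity faced by one neuron alone.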

Motion estimation has connections to both psychology (in particular the study of visual perception) and computer science.

Second-order motion perception

Motion stimuli are classified into first-order stimuli, in which the moving contour is defined by luminance, and second-order stimuli, in which the moving contour is defined by contrast, texture, flicker or some other quality that does not result in an increase in motion energy in the Fourier spectrum of the stimulus (Chubb & Sperling, 1988; Cavanagh & Mather, 1989). There is much evidence to suggest that early processing of first- and second-order motion is carried out by separate pathways (Nishida et al., 1997). Second-order mechanisms have poorer temporal resolution and are low-pass in terms of the range of spatial frequencies to which they respond. Second-order motion produces a weaker motion aftereffect unless tested with dynamically flickering stimuli (Ledgeway & Smith, 1994).
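The distinction between luminance-defined and contrast-defined contours can be illustrated with a toy one-dimensional stimulus. This is a simplified sketch and not one of the stimuli used in the cited studies. It assumes a bar that steps one block per frame, a carrier that alternates between light and dark at the pixel scale, and arbitrary luminance units between 0 and 1; all names and values are hypothetical.

```python
MEAN = 0.5  # mean luminance, arbitrary units


def first_order_frame(t, n_blocks=8, block=4):
    """Luminance-defined bar: one block is brighter than the background."""
    bar = t % n_blocks
    return [MEAN + (0.25 if i // block == bar else 0.0)
            for i in range(n_blocks * block)]


def second_order_frame(t, n_blocks=8, block=4):
    """Contrast-defined bar: one block carries a higher-contrast carrier."""
    bar = t % n_blocks
    frame = []
    for i in range(n_blocks * block):
        carrier = 1.0 if i % 2 == 0 else -1.0  # alternating light/dark pixels
        contrast = 0.375 if i // block == bar else 0.125
        frame.append(MEAN + contrast * carrier)
    return frame


def block_means(frame, block=4):
    """Mean luminance of each block."""
    return [sum(frame[i:i + block]) / block for i in range(0, len(frame), block)]


def block_contrasts(frame, block=4):
    """Luminance range (max minus min) within each block."""
    return [max(frame[i:i + block]) - min(frame[i:i + block])
            for i in range(0, len(frame), block)]


for t in range(3):
    print(t, block_means(first_order_frame(t)))
    print(t, block_means(second_order_frame(t)), block_contrasts(second_order_frame(t)))
```

In the first-order frames the bar is visible in the block-averaged luminance. In the second-order frames the block-averaged luminance is uniform at every time step, and only the local contrast reveals where the bar is.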