Empirical theory of perception

An empirical theory of perception is a kind of explanation for how percepts arise. These theories hold that sensory systems incorporate information about the statistical properties of the natural world into their design and relate incoming stimuli to this information, rather than analyzing sensory stimulation into its components or features.

Empirical accounts of vision
Visual perception is initiated when objects in the world reflect light rays towards the eye. Most empirical theories of visual perception begin with the observation that stimulation of the retina is fundamentally ambiguous. In empirical accounts, the most commonly proposed mechanism for circumventing this ambiguity is "unconscious inference," a term that dates back to Helmholtz.

According to Hatfield, Alhazen was the first to propose that higher-level cognitive processes ("judgments") could supplement sense perception to yield veridical perception of distance, suggesting that these "judgments" are formally equivalent to syllogisms. Descartes extended and refined this account. Berkeley departed from this tradition, putting forth the new idea that sensory systems, rather than performing logical operations on stimuli to reach veridical conclusions (e.g., these light rays arrive with certain orientations relative to each other, therefore their source is at a certain distance), make associations, so that, for instance, if certain co-occurring sensory attributes are usually present when an object is at a given distance, we see an object with those attributes as being at that distance. For Helmholtz, Berkeleyan associations formed the premises for inductive "judgments," in Alhazen's sense of the term. Helmholtz was also among the first thinkers on the subject to augment his reasoning with detailed knowledge of the anatomy of sensory mechanisms.

In current work Helmholtz's use of the term is construed as referring to some mechanism that augments sense impressions with acquired knowledge or through application of heuristics. In general, contemporary empirical theories of perception seek to describe and/or explain the physiological underpinnings of this "unconscious inference," particularly in terms of how sensory systems acquire information about general statistical features of their environments (see natural scene statistics) and apply this information to sensory data in order to shape perception. A recurring theme in these theories is that stimulus ambiguity is rectified by a priori knowledge about the natural world.

Wholly empirical approach to visual perception
The wholly empirical approach to perception, developed by Dale Purves and his colleagues, holds that percepts are determined solely by evolutionary and individual experience with sensory impressions and the objects from which they derive. The success or failure of behavior in response to these sensory impressions tends to increase the prevalence of neural structures that support some ways of interpreting sensory input, while decreasing the prevalence of neural structures that support other ways of interpreting sensory input.

On the wholly empirical account, this strategy determines qualities of perception in all visual domains and sensory modalities. Accumulating evidence suggests that the perception of color, contrast, distance, size, length, line orientation and angles, and motion, as well as pitch and consonance in music, may be determined by empirically derived associations between the sensory patterns humans have always experienced and the relative success of behavior in response to those patterns. "Much to the advantage of the observer, percepts co-vary with the efficacy of past actions in response to visual stimuli, and thus only coincidentally with the measured properties of the stimulus or the underlying objects."

- Dale Purves

The wholly empirical strategy
The wholly empirical theory of perception departs from many other empirical theories by recognizing the seriousness of the optical inverse problem. To illustrate this problem, imagine that three hoses are used to fill a bucket with water. If the amount of water contributed by each hose is known, it is straightforward to calculate how much water is in the bucket. Problems of this kind are known as “forward” problems, and they are generally easy to solve. But if all that is known is the amount of water in the bucket, it is impossible to figure out, on this basis alone, how much water came from each hose: there is no way to work “backwards” from the bucket to the hoses. This is a simple example of an inverse problem. Exact solutions to such problems are rarely possible, although they can sometimes be approximated by imposing assumption-based constraints on the “solution space”.
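
The asymmetry between the two versions of the bucket problem can be shown in a few lines (the hose amounts are hypothetical):

```python
# Forward problem: the contribution of each hose is known, so the
# total in the bucket follows by simple addition -- one unique answer.
contributions = [2.0, 3.5, 1.5]  # liters from each hose (hypothetical)
total = sum(contributions)
print(total)  # 7.0

# Inverse problem: only the total is known.  Infinitely many
# contribution patterns are consistent with it, so no calculation on
# the total alone can recover the individual hoses.
observed_total = 7.0
candidates = [(2.0, 3.5, 1.5), (7.0, 0.0, 0.0), (1.0, 1.0, 5.0)]
assert all(abs(sum(c) - observed_total) < 1e-9 for c in candidates)
```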

Navigating the world on the basis of sensory stimulation alone represents an inverse problem in the realm of biology. Consider, for example, the case of distance and line length (Fig. 1). When light reflected from a linear object falls on the retina, the three-dimensional object is projected as a two-dimensional line. Note, however, that a distant line can form the same image on the retina as a shorter but closer line. All the eyes receive is an image, which is analogous to the bucket of water. It is impossible to work backwards to recover the real distance, length, and orientation of the source of the projected line, just as it is impossible to recover the amounts of water that came from each hose. Despite this fact, percipients usually manage to behave effectively in response to sensory stimulation.

The inverse optics problem presents a quandary for traditional approaches to perception. For example, advocates of feature detection or, in more current terms, neural filtering, propose that the visual system performs logical computations on retinal inputs to determine higher-level aspects of a perceptual scene such as contrast, contour, shape and color percepts. However, given the inverse problem, it is hard to imagine how these computations, if they were actually performed, would be useful, since they would have little or nothing to do with properties of the real world. Empirical approaches to perception take a different tack, arguing that the only way for organisms to overcome the inverse problem is to exploit their long and varied past experience with the real world. The wholly empirical approach holds that this experience is the sole determinant of perceptual qualities. The reason percipients see an object as dark or light, the argument goes, is that in both our own past and the past of the species it paid off to see it that way. Returning to the bucket analogy, imagine that each of the three hoses pumps out water of a different color: one black, one gray, and one clear. All one sees is the water in the bucket, which can be clear, gray, black, or any shade in between. As before, no calculation on the color of the water in the bucket can reveal how much water came from each hose. Now imagine that it is your job to bet on how much water came out of the gray hose. The output ratios of the hoses are not random, but co-vary in complicated ways with the time of day, how long it takes to fill the bucket, and so on. At first your bets based on the color of the water might not be very accurate, but over time they would gradually improve as shades and successful responses became associated by trial and error.
The key is that in order to improve you have to know whether or not your behaviors worked by interacting with the world.
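
This learning-by-feedback loop can be simulated in miniature. The rule linking hose outputs to time of day below is hypothetical; the point is only that the bets improve through outcome feedback, not through any computation on the mixed color itself:

```python
import random

random.seed(0)

def fill_bucket(time_of_day):
    # Hypothetical world: the hoses' outputs co-vary with time of day.
    black = 1.0 + 0.5 * time_of_day
    gray = 2.0 - 0.5 * time_of_day + random.gauss(0, 0.05)
    clear = 1.0
    total = black + gray + clear
    shade = (black + 0.5 * gray) / total  # all the observer sees
    return shade, gray

# Associate coarse shade bins with the outcome revealed after each bet.
estimates = {}  # shade bin -> (running mean of gray amount, count)
errors = []
for trial in range(2000):
    shade, gray = fill_bucket(random.random())
    b = round(shade, 2)
    mean, n = estimates.get(b, (0.0, 0))  # naive initial guess
    errors.append(abs(mean - gray))       # how far off the bet was
    estimates[b] = (mean + (gray - mean) / (n + 1), n + 1)

early = sum(errors[:200]) / 200
late = sum(errors[-200:]) / 200
print(early > late)  # True: bets improve only via outcome feedback
```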

On the wholly empirical view, the retinal image is like the bucket, and what you see is determined by the past behaviors that have succeeded. Although this example is simplistic, it illustrates the general strategy that the visual system uses to work around the inverse problem. Over millions of years, individuals whose visual systems more successfully linked sensory stimulation with successful behavior won out. In this view, while we do not actually “solve” the inverse problem—which would be analogous to computing the outputs of all three hoses—we get close enough to behave appropriately in response to stimuli.

Color
Color vision depends on the activation of three cone cell types in the human retina, each primarily responsive to a different band of light frequencies. While these retinal mechanisms enable subsequent color processing, their properties alone cannot account for the full range of color perception phenomena. In part this is because illuminance (the amount of light shining on an object), reflectance (the proportion of light an object is disposed to reflect), and transmittance (the extent to which the intervening medium alters the light in transit) are conflated in the retinal image. This is problematic because, if color vision is to be useful, it must somehow guide behavior in line with these properties. Yet the visual system has access only to the retinal input, which does not distinguish the relative contributions of each of these factors to the final light spectra that stimulate the retina.
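
The conflation can be illustrated with a standard multiplicative simplification (not a model from the literature described here), in which the light reaching the eye is the product of the three factors:

```python
# Retinal intensity ~ illuminance x reflectance x transmittance
# (a standard simplification).  Distinct physical situations can
# produce identical retinal input.
def retinal_intensity(illuminance, reflectance, transmittance):
    return illuminance * reflectance * transmittance

bright_light_dark_surface = retinal_intensity(100.0, 0.2, 1.0)
dim_light_light_surface = retinal_intensity(25.0, 0.8, 1.0)
print(bright_light_dark_surface, dim_light_light_surface)  # 20.0 20.0

# From the value 20.0 alone, the visual system cannot recover which
# combination produced it -- the inverse problem for color vision.
assert abs(bright_light_dark_surface - dim_light_light_surface) < 1e-9
```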

According to the empirical framework, the visual system solves this problem by drawing on species and individual experience with retinal images that have signified different combinations of illuminance, reflectance, and transmittance in the past. Only those associations that led to appropriate behavior were retained through evolution and development, leading to a repertoire of neural associations and predispositions that ground color perception in the world.

One way to test this idea is to see whether the frequency of co-occurrence of light spectra predicts simultaneous color contrast effects (see side image, Fig. 1). By sampling thousands of natural images, Long and Purves showed that the statistical associations between target colors and the colors of their surrounds could explain perceptual effects like those seen on the right. Rather than treating the diverging color percepts as unfortunate byproducts of a normally veridical color perception mechanism, this work suggests that the different colors we see are simply the byproducts of our species' and individual exposure to the distribution of color spectra in the world.
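
The form of such a co-occurrence analysis can be sketched with synthetic data in place of natural images (the values and the correlation rule below are hypothetical; the point is only the tallying of targets by their surrounds):

```python
import random
from collections import defaultdict

random.seed(3)

# For each sampled patch, record the target value together with its
# surround, then ask what targets have accompanied a given surround.
cooccurrence = defaultdict(list)  # surround value -> target values seen
for _ in range(10000):
    surround = random.choice([0.2, 0.8])  # dark or light surround
    # In this toy world, targets correlate with their surrounds.
    target = max(0.0, min(1.0, surround + random.gauss(0, 0.1)))
    cooccurrence[surround].append(target)

# Identical targets embedded in different surrounds fall into different
# empirical distributions, and so are treated differently.
for surround, targets in sorted(cooccurrence.items()):
    print(surround, round(sum(targets) / len(targets), 2))
```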

Brightness
Brightness refers to the subjective sense that an object is emitting light. Whereas the physical correlate of color is the spectral composition of light, the physical correlate of brightness is luminance, the intensity of light reaching the eye from an object. While it may seem obvious that the sensation of brightness is straightforwardly related to the amount of light coming to the eyes, perception researchers have long known that brightness is not determined solely by the luminance incident on the retina. A common example is simultaneous brightness contrast (shown to the right), in which two identical target diamonds appear differently bright.

In the empirical account, the same general framework used to rationalize simultaneous color contrast applies to simultaneous brightness contrast. Because the three factors that determine the luminance reaching the eye—transmittance, reflectance, and illuminance—are conflated in the retinal image of the object, operations on luminance values as such cannot in principle yield percepts that are reliable guides to behavior. The visual system solves this problem by associating luminance values in their given contexts with the success or failure of ensuing behavior, leading to percepts that often (but only incidentally) reflect properties of objects rather than of their associated images.

The image on the right (Fig. 2) strongly supports this view of how brightness perception works. Whereas other frameworks either offer no explanation for this effect or offer explanations inconsistent with their accounts of similar effects, the empirical framework holds that the perceived brightness differences arise from empirical associations between the targets and their respective contexts. In this case, because the “lighter” targets would typically have been shadowed, we perceive them in a way that is consistent with their having a higher reflectance despite their presumably low illuminance. Note that this approach differs considerably from computational “context”-driven approaches, since here the target/context relationships are contingent and world-based, and therefore cannot be meaningfully generalized to other cases.

Line length
Perception of line length is confounded by another optical inverse problem: the farther away a line is in the world, the smaller its projection on the retina. Different orientations of a line relative to the observer can obscure true line length as well. It is well known that straight lines are erroneously reported as longer or shorter as a function of their angular orientation, as is evident in Fig. 3. While no generally accepted explanation of this phenomenon had been offered previously, the empirical approach has had some success in explaining the effect as a function of the distribution of lines in natural scenes. Howe and Purves (2002) analyzed natural scene photographs to find projected lines that corresponded to straight line sources. They found that the ratios of the actual lengths of the lines to their projections on the retina, when classified by their orientations on the retina, almost perfectly matched subjective estimates of line length as a function of angle relative to the observer. For example, horizontal lines in the retinal image would typically have issued from relatively short physical sources, while lines at about 60 degrees relative to the observer would typically have signified longer physical sources, which explains why we tend to see the 60° line in Fig. 3 as longer than the 0° (horizontal) line. While there is no way for the visual system to know this a priori, the fact that it seems to take this knowledge for granted in constructing length percepts strongly supports the wholly empirical view of perception.
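
The form of this analysis can be sketched with synthetic geometry in place of natural-scene photographs. Note that the toy world below is isotropic, so it will not reproduce the anisotropic statistics of natural scenes that drive the actual effect; it only illustrates how length ratios are tallied by projected orientation (all parameters are hypothetical):

```python
import math
import random

random.seed(1)

def project(point, focal=1.0):
    # perspective projection onto an image plane at the given focal length
    x, y, z = point
    return (focal * x / z, focal * y / z)

bins = {}  # projected-orientation bin (degrees) -> physical/projected ratios
for _ in range(50000):
    # random unit-length segment in front of the viewer
    cx = random.uniform(-2.0, 2.0)
    cy = random.uniform(-2.0, 2.0)
    cz = random.uniform(2.0, 10.0)
    theta = random.uniform(0.0, math.pi)        # (approximately) random
    phi = random.uniform(0.0, 2.0 * math.pi)    # 3-D direction
    d = (math.sin(theta) * math.cos(phi),
         math.sin(theta) * math.sin(phi),
         math.cos(theta))
    p1 = (cx - 0.5 * d[0], cy - 0.5 * d[1], cz - 0.5 * d[2])
    p2 = (cx + 0.5 * d[0], cy + 0.5 * d[1], cz + 0.5 * d[2])
    (u1, v1), (u2, v2) = project(p1), project(p2)
    proj_len = math.hypot(u2 - u1, v2 - v1)
    if proj_len < 1e-6:
        continue  # segment seen nearly end-on
    orient = math.degrees(math.atan2(abs(v2 - v1), abs(u2 - u1)))
    # physical length is 1.0, so the ratio is 1.0 / projected length
    bins.setdefault(10 * int(orient // 10), []).append(1.0 / proj_len)

for orient in sorted(bins):
    ratios = bins[orient]
    print(orient, round(sum(ratios) / len(ratios), 2))
```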

Motion
Perception of motion is also confounded by an inverse problem: movement in three-dimensional space does not map perfectly onto movement on the retinal plane. A distant object moving at a given speed will translate more slowly across the retina than a nearby object moving at the same speed, and as mentioned previously, size, distance and orientation are also ambiguous given only the retinal image. As with other aspects of perception, empirical theorists propose that this problem is solved by trial-and-error experience with moving stimuli, their associated retinal images, and the consequences of behavior. One way to test this hypothesis is to see whether it can explain the flash-lag illusion, a visual effect in which a flash superimposed on a moving bar is perceived to lag behind the bar. The task for empirical theorists is to explain why we perceive the flash in this way and, further, why the perceived lag increases with the speed of the moving bar. To investigate this question, Wojtach et al. (2008) simulated a three-dimensional environment full of moving virtual particles. They modeled the transformation from three dimensions to the two-dimensional image plane and tallied the frequency of occurrence of particle speeds, particle distances, image speeds, and image distances (image meaning the path projected across the computer-modeled “retina”). The probability distributions obtained in this way predicted the magnitude of the bar-flash disparity quite well. The authors concluded that the flash-lag effect is a signature of the way brains evolve and develop to behave appropriately in response to moving retinal images.
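
The logic of such a simulation can be sketched as follows, using a small-angle projection and hypothetical parameter ranges rather than the authors' actual model:

```python
import random
from collections import defaultdict

random.seed(2)

# Particle motion in a 3-D volume is projected onto an image plane,
# and the physical speeds behind each image speed are tallied.
def image_speed(transverse_speed, depth, focal=1.0):
    # small-angle perspective projection of speed
    return focal * transverse_speed / depth

sources = defaultdict(list)  # image-speed bin -> physical speeds seen
for _ in range(100000):
    speed = random.uniform(0.1, 10.0)  # physical speed (arbitrary units)
    depth = random.uniform(1.0, 20.0)  # distance from the eye
    sources[round(image_speed(speed, depth), 1)].append(speed)

# Any single image speed is consistent with a wide range of physical
# speeds; only the frequency distribution over that range can guide a
# useful perceptual response.
spread = max(sources[0.5]) - min(sources[0.5])
print(spread > 5.0)  # True: the ambiguity behind one image speed is large
```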