Categorical perception

Categorical perception is the perception of different sensory phenomena as being qualitatively, or categorically, different. It is opposed to continuous perception, the perception of different sensory phenomena as being located on a smooth continuum.

Categorical perception (CP) can be inborn or can be induced by learning. Formerly thought to be peculiar to speech and color perception, CP turns out to be far more general, and may be related to how the neural networks in our brains detect the features that allow us to sort the things in the world into their proper categories, "warping" perceived similarities and differences so as to compress some things into the same category and separate others into different categories.

Categorization
A category, or kind, is a set of things. Membership in the category may be (1) all-or-none, as with "bird": Something either is a bird or it isn't a bird; a penguin is 100% bird, a platypus is 100% not-bird. In this case we would call the category "categorical." Or membership might be (2) a matter of degree, as with "big": Some things are more big and some things are less big. In this case the category is "continuous" (or rather, degree of membership corresponds to some point along a continuum). There are range or context effects as well: elephants are relatively big in the context of animals, relatively small in the context of bodies in general, if we include planets.
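The contrast between all-or-none and graded, context-dependent membership can be made concrete in a toy sketch. The bird set, the rough sizes (in metres) and the rank-based `bigness` score are illustrative assumptions, not a model from the categorization literature. `bigness` returns the fraction of the other items in a context that are smaller, so the same elephant scores differently depending on what it is compared with.

```python
# Toy contrast between all-or-none and graded category membership.
# The bird set and the rough sizes (in metres) are illustrative assumptions.

BIRDS = {"penguin", "robin", "ostrich"}

SIZES_M = {
    "mouse": 0.08, "cat": 0.5, "human": 1.7, "elephant": 6.0,
    "moon": 3.47e6, "earth": 1.27e7, "jupiter": 1.43e8,
}


def bird_membership(thing: str) -> float:
    """All-or-none: a thing is 100% bird or 100% not-bird."""
    return 1.0 if thing in BIRDS else 0.0


def bigness(thing: str, context: list[str]) -> float:
    """Graded and context-dependent: the fraction of the other items in the
    context that are smaller than `thing` (0.0 = smallest, 1.0 = largest)."""
    others = [x for x in context if x != thing]
    if not others:
        raise ValueError("context must contain at least one other item")
    return sum(SIZES_M[x] < SIZES_M[thing] for x in others) / len(others)


animals = ["mouse", "cat", "human", "elephant"]
bodies = ["mouse", "elephant", "moon", "earth", "jupiter"]

print(bird_membership("penguin"), bird_membership("platypus"))  # 1.0 0.0
print(bigness("elephant", animals), bigness("elephant", bodies))  # 1.0 0.25
```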

Many categories, however, particularly concrete sensorimotor categories (things we can see and touch), are a mixture of the two: categorical at an everyday level of magnification, but continuous at a more microscopic level. Color categories are good examples: Central reds are clearly reds, and not shades of yellow. But in the orange region of the spectral continuum, red/yellow is a matter of degree; context and contrast effects can also shift these regions somewhat. Perhaps even with "bird," an artist or genetic engineer could design intermediate cases whose "birdness" was only a matter of degree.

Resolving the "blooming, buzzing confusion." Categories are important because they determine how we see and act upon the world. As William James noted, we do not see a continuum of "blooming, buzzing confusion" but an orderly world of discrete objects. Some of these categories are "prepared" in advance by evolution: The frog's brain is born already able to detect "flies"; it needs only normal exposure rather than any special learning in order to recognize and catch them. Humans have such innate category-detectors too: The human face itself is probably an example. So too are our basic color categories, although according to the "Whorf Hypothesis" (Whorf 1956; also called the "linguistic relativity" hypothesis), color categories are determined by how our culture and language happen to subdivide the spectrum (we will return to this).

But if one opens a dictionary at random and picks out a content word, chances are that it names a category we have learned to detect, rather than one that evolution prepared our brains in advance to detect. The generic human face may be an innate category for us, perhaps even the various basic emotions it can express, but surely all the specific people we know and can name are not. "Red" and "yellow" may be inborn, but "scarlet" and "crimson" are not.

The motor theory of speech perception
And what about the very building blocks of the language we use to name categories: Are our speech-sounds, such as ba, da and ga, innate or learned? The first question we must answer about them is whether they are categorical at all, or merely arbitrary points along a continuum. It turns out that if one analyzes the sound spectrograms of ba and pa, for example, both are found to lie along an acoustic continuum called "voice-onset-time." With a technique similar to the one used in "morphing" visual images continuously into one another, it is possible to "morph" a ba gradually into a pa and beyond by gradually increasing the voicing parameter.
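Morphing along a single acoustic parameter amounts to stepping that parameter in equal increments. The sketch below assumes voice-onset-time (VOT) endpoints of roughly 0 ms for ba and 60 ms for pa; these are illustrative values only, and real synthetic continua usually adjust further acoustic cues that this sketch ignores.

```python
def vot_continuum(start_ms: float = 0.0, end_ms: float = 60.0, steps: int = 7) -> list[float]:
    """Evenly spaced voice-onset-time values from start_ms to end_ms inclusive.

    The default endpoints (about 0 ms for ba, 60 ms for pa) are assumptions
    chosen for illustration.
    """
    if steps < 2:
        raise ValueError("need at least two steps")
    step = (end_ms - start_ms) / (steps - 1)
    return [start_ms + i * step for i in range(steps)]


print(vot_continuum())                       # [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
print(vot_continuum(end_ms=90.0, steps=10))  # continues "beyond" pa
```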

Liberman et al. (1957) reported that when people listen to sounds that vary along the voicing continuum, they hear only ba's and pa's, nothing in between. Liberman dubbed this effect, in which a perceived quality jumps abruptly from one category to another at a certain point along a continuum instead of changing gradually, "categorical perception" (CP). He suggested that CP was unique to speech, that CP made speech special, and, in what came to be called "the motor theory of speech perception," that CP's explanation lay in the anatomy of speech production:

According to the (now abandoned) motor theory, the reason we perceive an abrupt change between ba and pa is that the way we hear speech sounds is influenced by the way we produce them when we speak. What is varying along this continuum is voice-onset-time: the "b" in ba is voiced and the "p" in pa is not. But unlike the synthetic "morphing" apparatus, our natural vocal apparatus is not capable of producing anything in between ba and pa. So when I hear a sound from the voicing continuum, my brain perceives it by trying to match it with what it would have had to do to produce it. Since the only thing I can produce is ba or pa, I will perceive any of the synthetic stimuli along the continuum as either ba or pa, whichever it is closer to. A similar CP effect is found with ba/da; these too lie along a continuum acoustically, but vocally, ba is formed with the two lips and da with the tip of the tongue against the ridge just behind the upper teeth, and our anatomy does not allow any intermediates.
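Read computationally, the motor theory's "whichever it is closer to" step is nearest-prototype matching against the articulatory targets a listener can produce. The sketch below reduces that to one dimension with assumed VOT targets of 0 ms for ba and 60 ms for pa; it illustrates only the matching step, not any actual model of articulation.

```python
# Producible articulatory targets along the VOT dimension (assumed values, ms).
PRODUCIBLE_VOT_MS = {"ba": 0.0, "pa": 60.0}


def perceive_by_production(vot_ms: float) -> str:
    """Label a stimulus with the producible syllable whose VOT is closest.

    Ties go to whichever syllable is listed first in PRODUCIBLE_VOT_MS,
    because min() returns the first minimal element it encounters.
    """
    return min(PRODUCIBLE_VOT_MS, key=lambda syl: abs(vot_ms - PRODUCIBLE_VOT_MS[syl]))


for vot in [0, 10, 20, 40, 50, 60]:
    print(vot, perceive_by_production(vot))
```

Every intermediate stimulus is forced onto one of the two producible categories, which is the abrupt ba/pa switch the theory set out to explain.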

The motor theory of speech perception explained how speech was special and why speech-sounds are perceived categorically: sensory perception is mediated by motor production. Wherever production is categorical, perception will be categorical; where production is continuous, perception will be continuous. And indeed vowel categories like a/u were found to be much less categorical than ba/pa or ba/da. (Less categorical, but not altogether continuous either: we will return to this.)

Acquired distinctiveness. If motor production mediates sensory perception, then one would expect this CP effect to be a result of learning to produce speech. Eimas et al. (1971), however, found that infants already have speech CP before they begin to speak. Perhaps, then, it is an innate effect, evolved to "prepare" us to learn to speak. But Kuhl (1987) found that chinchillas also have "speech CP" even though they never learn to speak, and presumably did not evolve to do so. Lane (1965) went on to show that CP effects can be induced by learning alone, with a purely sensory (visual) continuum in which there is no motor production discontinuity to mediate the perceptual discontinuity. He concluded that speech CP is not special after all, but merely a special case of Lawrence's classic demonstration that stimuli to which you learn to make a different response become more distinctive and stimuli to which you learn to make the same response become more similar.

It also became clear that CP was not quite the all-or-none effect Liberman had originally thought it was: It is not that all pa's are indistinguishable and all ba's are indistinguishable: We can hear the differences, just as we can see the differences between different shades of red. It is just that the within-category differences (pa1/pa2 or red1/red2) sound/look much smaller than the between-category differences (pa2/ba1 or red2/yellow1), even when the size of the underlying physical differences (voicing, wavelength) is actually the same.
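A toy model shows how within-category differences can stay audible yet smaller than between-category differences of the same physical size. Everything in it is an assumption for illustration: a logistic identification function with its boundary at 30 ms VOT and a 3 ms slope, and a perceived difference equal to a small physical term (0.01 per ms) plus the difference in labelling probability.

```python
import math

BOUNDARY_MS = 30.0  # assumed ba/pa category boundary
SLOPE_MS = 3.0      # assumed steepness of the identification function
W_PHYSICAL = 0.01   # assumed residual sensitivity per ms of physical difference


def p_pa(vot_ms: float) -> float:
    """Probability of labelling a stimulus 'pa' (logistic identification function)."""
    return 1.0 / (1.0 + math.exp(-(vot_ms - BOUNDARY_MS) / SLOPE_MS))


def perceived_difference(a_ms: float, b_ms: float) -> float:
    """Toy perceived difference: a small term proportional to the physical
    difference plus the difference in how the two stimuli are labelled."""
    physical = W_PHYSICAL * abs(a_ms - b_ms)
    categorical = abs(p_pa(a_ms) - p_pa(b_ms))
    return physical + categorical


within = perceived_difference(0.0, 10.0)    # ba1 vs ba2: a 10 ms step inside a category
between = perceived_difference(25.0, 35.0)  # ba vs pa: the same 10 ms step across the boundary
print(f"within={within:.3f} between={between:.3f}")
```

The within-category value stays above zero (the pair is still distinguishable), but the equal-sized step across the boundary is perceived as several times larger.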

The modern definition of categorical perception
This evolved into the contemporary definition of CP, which is no longer peculiar to speech or dependent on the motor theory: CP occurs whenever perceived within-category differences are compressed and/or between-category differences are separated, relative to some baseline of comparison. The baseline might be the actual size of the physical differences involved, or, in the case of learned CP, it might be the perceived similarity or discriminability within and between categories before the categories were learned, compared to after.
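The operational definition reduces to two quantities measured against a baseline: how much within-category distances shrink (compression) and how much between-category distances grow (separation). The sketch below computes both from hypothetical pairwise distances; the baseline could be physical distances or pre-learning perceptual distances. The stimulus names and numbers are made up, and the significance test a real study would require is omitted.

```python
from statistics import mean

Pair = tuple[str, str]


def cp_effect(baseline: dict[Pair, float], after: dict[Pair, float],
              same_category: dict[Pair, bool]) -> tuple[float, float]:
    """Return (compression, separation) relative to a baseline.

    compression > 0: mean within-category distance shrank.
    separation  > 0: mean between-category distance grew.
    """
    within = [p for p in baseline if same_category[p]]
    between = [p for p in baseline if not same_category[p]]
    compression = mean(baseline[p] for p in within) - mean(after[p] for p in within)
    separation = mean(after[p] for p in between) - mean(baseline[p] for p in between)
    return compression, separation


# Hypothetical distances for two categories, a = {a1, a2} and b = {b1, b2}.
same = {("a1", "a2"): True, ("b1", "b2"): True, ("a2", "b1"): False}
before = {("a1", "a2"): 1.0, ("b1", "b2"): 1.0, ("a2", "b1"): 1.0}
after = {("a1", "a2"): 0.6, ("b1", "b2"): 0.8, ("a2", "b1"): 1.5}

compression, separation = cp_effect(before, after, same)
print(compression, separation)  # roughly 0.3 and 0.5
```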

A typical learned-CP experiment runs as follows: A set of stimuli is tested (usually in pairs) for similarity or discriminability. In the case of similarity, multidimensional scaling might be used to scale the rated pairwise similarity of the set of stimuli. In the case of discriminability, same/different judgments and signal detection analysis might be used to estimate the pairwise discriminability of the stimuli. Then the same subjects, or a different set, are trained, using trial and error and corrective feedback, to sort the stimuli into two or more categories. After the categorization has been learned, similarity or discriminability is tested again and compared against the pre-training data. If there is significant within-category compression and/or between-category separation, this is operationally defined as CP (Harnad 1987).
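For the discriminability route, a common signal detection summary is d′. The sketch below applies the simple yes/no formula d′ = z(H) − z(F) to same/different responses, which is a simplification: dedicated same-different models give different values for the same data. It also assumes the log-linear correction (add 0.5 to each count and 1 to each total) to keep rates away from 0 and 1. The trial counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Yes/no sensitivity index d' = z(H) - z(F).

    'Hit' = answering 'different' to a physically different pair;
    'false alarm' = answering 'different' to an identical pair.
    The log-linear correction keeps both rates strictly between 0 and 1,
    where the inverse normal CDF would otherwise be infinite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one within-category pair, before and after training.
before = d_prime(hits=35, misses=15, false_alarms=10, correct_rejections=40)
after = d_prime(hits=28, misses=22, false_alarms=10, correct_rejections=40)
print(f"within-category d': {before:.2f} before, {after:.2f} after training")
```

A drop in d′ for within-category pairs after training would count as compression; a rise for between-category pairs would count as separation.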

The Whorf Hypothesis. We can now return both to the "Whorf Hypothesis" and the "weaker" CP for vowels: According to the Whorf Hypothesis (of which Lawrence's acquired similarity/distinctiveness effects would simply be a special case), colors are perceived categorically only because they happen to be named categorically: Our subdivisions of the spectrum are arbitrary, learned, and vary across cultures and languages. But Berlin & Kay (1969) showed that this was not so: Not only do most cultures and languages subdivide and name the color spectrum the same way, but even for those who don't, the regions of compression and separation are the same. We all see blues as more alike and greens as more alike, with a fuzzy boundary in between, whether or not we have named the difference. So there is no Whorfian learning effect with colors: Or is there?

Evolved CP. First, back to vowels. The signature of CP is within-category compression and/or between-category separation. The size of the CP effect is merely a scaling factor; it is this compression/separation "accordion effect" that is CP's distinctive feature. In this respect, the "weaker" CP effect for vowels, whose motor production is continuous rather than categorical but whose perception is by this criterion categorical, is every bit as much a CP effect as the ba/pa and ba/da effects. But, as with colors, it looks as if the effect is an innate one: Our sensory category detectors for both color and speech sounds are born already "biased" by evolution, so our perceived color and speech-sound spectrum is already "warped" with these compression/separations.

Learned CP. Is that all there is to it? Apparently not. There are still the Lane/Lawrence demonstrations, lately replicated and extended by Goldstone (1994), that CP can be induced by learning alone. And there are also the countless categories catalogued in our dictionaries that could not possibly be inborn (though nativist theorists such as Fodor [1983] have sometimes seemed to suggest that all of our categories are inborn). There are even recent demonstrations that although the primary color and speech categories are probably inborn, their boundaries can be modified or even lost as a result of learning, and weaker secondary boundaries can be generated by learning alone (Roberson et al. 2000).

Perhaps CP performs some useful function in categorization? In the case of innate CP, our categorically biased sensory detectors pick out their prepared color and speech-sound categories far more readily and reliably than if our perception had been continuous. Could something similar be the case for our repertoire of learned categories too?

Computational and neural models
Computational modeling (Tijsseling & Harnad 1997; Damper & Harnad 2000) has shown that many types of category-learning mechanisms (e.g. both back-propagation and competitive networks) display CP-like effects. In back-propagation nets, the hidden-unit activation patterns that "represent" an input build up within-category compression and between-category separation as they learn; other kinds of nets display similar effects. CP seems to be a means to an end: Inputs that differ among themselves are "compressed" onto similar internal representations if they must all generate the same output; and they become more separate if they must generate different outputs. The network's "bias" is what filters inputs onto their correct output category. The nets accomplish this by selectively detecting (after much trial and error, guided by error-correcting feedback) the invariant features that are shared by the members of the same category and that reliably distinguish them from members of different categories; the nets learn to ignore all other variation as irrelevant to the categorization.
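The compression/separation effect can be reproduced in a deliberately minimal setting. As a simplification, the sketch below trains a single sigmoid unit by gradient descent on cross-entropy (back-propagation with no hidden layer, i.e. logistic regression) and treats that unit's activation as the internal representation. The cited simulations instead inspect the hidden layers of multilayer back-propagation and competitive networks. The stimuli, learning rate and epoch count are arbitrary choices.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(xs, labels, w=0.1, b=0.0, lr=2.0, epochs=5000):
    """Full-batch gradient descent on mean cross-entropy for one sigmoid unit."""
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, labels):
            err = sigmoid(w * x + b) - t  # dLoss/dz for sigmoid output + cross-entropy
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def between_within_ratio(xs, labels, w, b):
    """Mean activation change across the one-step pairs that straddle the
    category boundary, divided by the mean change for one-step pairs that
    stay within a category."""
    acts = [sigmoid(w * x + b) for x in xs]
    within, between = [], []
    for i in range(len(xs) - 1):
        step = abs(acts[i + 1] - acts[i])
        if labels[i] == labels[i + 1]:
            within.append(step)
        else:
            between.append(step)
    return (sum(between) / len(between)) / (sum(within) / len(within))


# Eight equally spaced stimuli on a 0-1 continuum; the lower four are category 0.
xs = [i / 7 for i in range(8)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w0, b0 = 0.1, 0.0  # small initial weights: nearly linear response
w1, b1 = train(xs, labels, w=w0, b=b0)

print(f"ratio before training: {between_within_ratio(xs, labels, w0, b0):.2f}")
print(f"ratio after training:  {between_within_ratio(xs, labels, w1, b1):.2f}")
```

Before training, equal physical steps give nearly equal activation changes (a ratio near 1). After training, the unit's response steepens at the boundary and flattens inside each category, so the cross-boundary step dominates. The unit is only ever told the category labels, so this warping is a by-product of reducing categorization error.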

Very little is yet known about the brain mechanisms of category perception and learning. The computational models are really causal hypotheses about what the brain might be doing. Neural data provide correlates of CP and of learning (Sharma & Dorman 1999). Differences between event-related potentials recorded from the brain have been found to be correlated with differences in the perceived category of the stimulus presented to the subject. Neural imaging studies have shown that these effects are localized and even lateralized to certain brain regions in subjects who have successfully learned the category, and are absent in subjects who have not (Seger et al. 2000).

Language-induced categorical perception
Both innate and learned CP are sensorimotor effects: The compression/separation biases are sensorimotor biases, and presumably had sensorimotor origins, whether during the sensorimotor life-history of the organism, in the case of learned CP, or the sensorimotor life-history of the species, in the case of innate CP. The neural net I/O models are also compatible with this fact: Their I/O biases derive from their I/O history. But when we look at our repertoire of categories in a dictionary, it is highly unlikely that many of them had a direct sensorimotor history during our lifetimes, and even less likely in our ancestors' lifetimes. How many of us have seen a unicorn in real life? We have seen pictures of them, but what had those who first drew those pictures seen? And what about categories I cannot draw or see (or taste or touch): What about the most abstract categories, such as goodness and truth?

Some of our categories must originate from a source other than direct sensorimotor experience, and here we return to language and the Whorf Hypothesis: Can categories, and their accompanying CP, be acquired through language alone? Again, there are some neural net simulation results suggesting that once a set of category names has been "grounded" through direct sensorimotor experience, the names can be combined into Boolean combinations (man = male & human) and into still higher-order combinations (bachelor = unmarried & man) that not only pick out the more abstract, higher-order categories much the way the direct sensorimotor detectors do, but also inherit their CP effects and generate some of their own. Bachelor inherits the compression/separation of unmarried and man, and adds a layer of separation/compression of its own (Cangelosi et al. 2000; Cangelosi & Harnad 2001).
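The compositional step can be sketched with plain Boolean functions. The feature names and toy detectors below are hypothetical stand-ins for grounded detectors. In the cited simulations those detectors are trained neural nets operating on sensory input, not dictionary lookups. Treating "unmarried" as the negation of a "married" detector is also an assumption. The sketch models only the logic of combination, not the inherited compression/separation.

```python
from typing import Callable, Mapping

# A detector maps a thing's detected features to a yes/no category decision.
Features = Mapping[str, bool]
Detector = Callable[[Features], bool]


def all_of(*detectors: Detector) -> Detector:
    """Boolean AND of detectors."""
    return lambda thing: all(d(thing) for d in detectors)


def negate(detector: Detector) -> Detector:
    """Boolean NOT of a detector."""
    return lambda thing: not detector(thing)


# Hypothetical "grounded" detectors (stand-ins for learned sensorimotor detectors).
def male(thing: Features) -> bool:
    return thing["male"]


def human(thing: Features) -> bool:
    return thing["human"]


def married(thing: Features) -> bool:
    return thing["married"]


man = all_of(male, human)           # man = male & human
unmarried = negate(married)         # assumed: unmarried = not married
bachelor = all_of(unmarried, man)   # bachelor = unmarried & man

print(bachelor({"male": True, "human": True, "married": False}))  # True
print(bachelor({"male": True, "human": True, "married": True}))   # False
```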

These language-induced CP effects remain to be directly demonstrated in human subjects; so far only learned and innate sensorimotor CP have been demonstrated (Pevtzow & Harnad 1997; Livingston et al. 1998). The learned CP results already show the Whorfian power of naming and categorization to warp our perception of the world. That is enough to rehabilitate the Whorf Hypothesis from its apparent failure on color terms (and perhaps also from its apparent failure on Eskimo snow terms; Pullum 1989). But to show that it is a full-blown language effect, and not merely a vocabulary effect, it will have to be shown that our perception of the world can also be warped not just by how things are named but by what we are told about them.


 * This article is based on material from the article Categorical Perception in the Encyclopedia of Cognitive Science, used here with permission of the author, S. Harnad.