Speech perception

Speech perception refers to the processes by which humans interpret and understand the sounds used in language. The study of speech perception is closely linked to the fields of phonetics and phonology in linguistics, and to cognitive psychology and perception in psychology.

Theories
Some of the earliest work in the study of how humans perceive speech sounds was done by Alvin Liberman and his colleagues at Haskins Laboratories (1957). Using a speech synthesizer, they constructed speech sounds that varied in place of articulation along a continuum from /ba/ to /da/ to /ga/. Listeners were asked to identify which sound they heard and to discriminate between two different sounds. The results of the experiment showed that listeners grouped sounds into discrete categories, even though the sounds they were hearing were continuously varying. Based on these results, they proposed the notion of categorical perception as a mechanism by which humans are able to identify speech sounds.
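The identification and discrimination pattern described above can be illustrated with a small sketch. This is not a model of the Haskins data; the boundary location, slope, and continuum steps are arbitrary illustrative values. It shows the two hallmarks of categorical perception: a steep identification curve, and better discrimination for pairs that straddle the category boundary than for pairs within a category.

```python
import math

def identify(stimulus, boundary=4.0, slope=3.0):
    """Probability of labeling a continuum step as /da/ rather than /ba/
    (logistic identification function; parameters are illustrative)."""
    return 1.0 / (1.0 + math.exp(-slope * (stimulus - boundary)))

def discriminate(a, b):
    """Predicted discrimination accuracy for a pair of stimuli:
    chance (0.5) plus half the probability they receive different labels."""
    p_a, p_b = identify(a), identify(b)
    p_diff = p_a * (1 - p_b) + p_b * (1 - p_a)
    return 0.5 + 0.5 * p_diff

# Continuum steps from /ba/-like (1) to /da/-like (7).
for step in range(1, 8):
    print(step, round(identify(step), 2))

# A within-category pair (1 vs 2) is predicted to be near chance,
# while the cross-boundary pair (4 vs 5) is discriminated best.
print(round(discriminate(1, 2), 2), round(discriminate(4, 5), 2))
```

Under this sketch, listeners hear continuous acoustic change but report (and discriminate) only the category labels, which is the core claim of categorical perception.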

More recent research using different tasks and methodologies suggests that listeners are actually sensitive to acoustic differences within a single phonetic category.

Basics of speech perception
The process of perceiving speech begins at the level of the sound signal and the process of audition. (For a complete description of the process of audition see Hearing.) After processing the initial auditory signal, speech sounds are further processed to extract acoustic cues and phonetic information.

The sound signal contains a number of acoustic cues that are used in speech perception. These cues differentiate speech sounds belonging to different phonetic categories. For example, one of the most studied cues in speech is voice onset time (VOT). VOT is a primary cue signaling the difference between voiced and voiceless stop consonants, such as "b" and "p". Other cues differentiate sounds that are produced at different places of articulation or with different manners of articulation. The speech system must also combine these cues to determine the category of a specific speech sound. This is often thought of in terms of abstract representations of phonemes. These representations can then be combined for use in word recognition and other language processes.
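The use of VOT as a cue can be sketched as a simple threshold decision. The 25 ms boundary and the token values below are illustrative, not measured data; English voiced stops typically have short voicing lags and voiceless stops longer ones, so a single boundary along the VOT dimension can separate the two categories.

```python
def classify_stop(vot_ms, boundary_ms=25.0):
    """Assign a bilabial stop token to a voicing category from its VOT.
    The boundary value is an illustrative assumption, not a measured one."""
    return "b" if vot_ms < boundary_ms else "p"

# Hypothetical tokens with VOTs in milliseconds.
tokens = [5, 15, 40, 70]
print([classify_stop(v) for v in tokens])  # ['b', 'b', 'p', 'p']
```

Real perception combines VOT with other cues (such as the fundamental frequency at vowel onset) rather than relying on one threshold, but the sketch captures the basic idea of mapping a continuous cue onto a discrete category.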

The process of speech perception is not necessarily uni-directional. That is, higher-level language processes may interact with basic speech perception processes to aid in recognition of speech sounds.

Research topics
One of the basic problems in the study of speech is how to deal with the noise in the speech signal. This is shown by the difficulty that computer speech recognition systems have with recognizing human speech. These programs can do well at recognizing speech when they have been trained on a specific speaker's voice and under quiet conditions. However, these systems often do poorly in more realistic listening situations where humans are able to understand speech without difficulty.

Development
One of the basic questions in speech perception is how infants learn speech sound categories. Different languages use different sets of speech sounds. For example, English distinguishes two voicing categories of sounds, whereas Hindi has three categories. Infants must learn which sounds their native language uses and which ones it does not. It remains unclear how they are able to do this. Some researchers have suggested that certain sound categories are innate, that is, genetically specified. Others have suggested that infants may be able to learn the sound categories of their native language through passive listening, using a process called statistical learning.
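One way to make the statistical-learning idea concrete is distributional learning: if learners track how often different cue values occur, a bimodal distribution of VOTs supports two categories. The sketch below uses a toy one-dimensional k-means clusterer on hypothetical VOT tokens; it is an illustration of the general idea, not a model of any specific infant study.

```python
def kmeans_1d(values, iterations=20):
    """Cluster scalar cue values into two categories (toy 1-D k-means).
    Initialization at the extremes is a simplification for the 2-cluster case."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Hypothetical VOT tokens (ms) with a short-lag mode and a long-lag mode.
vots = [5, 8, 10, 12, 15, 55, 60, 65, 70, 75]
print(kmeans_1d(vots))  # two category centers emerge near the two modes
```

The point of the sketch is that no labels are needed: the two categories fall out of the shape of the input distribution alone, which is what "passive listening" amounts to in distributional-learning accounts.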

Studies of infant speech perception have shown that, in general, infants are able to distinguish more categories of speech sounds than adults. Newborns can distinguish between many of the sounds of human languages, but by about 12 months of age, they are able to distinguish only those sounds used in their native language.