Speech repetition

Speech repetition occurs when one individual speaks the vocalizations they have heard another individual make. This requires the person making the copy to map the sensory input they hear from the other person's pronunciation onto a similar motor output with their own vocal tract.

Such input-output imitation of speech often occurs independently of speech comprehension, as in speech shadowing, when a person automatically says words heard in earphones, and in the pathological condition of echolalia, in which people reflexively repeat overheard words. This suggests that the repetition of words is handled separately in the brain from speech perception: speech repetition occurs in the dorsal speech processing stream, while speech perception occurs in the ventral speech processing stream. Repetitions are often incorporated unawares by this route into spontaneous novel sentences, either immediately or after a delay following storage in phonological memory.

In humans, the ability to map heard input vocalizations into motor output is highly developed because this copying ability plays a critical role in a child's rapid expansion of their spoken vocabulary. In older children and adults it remains important, as it enables the continued learning of novel words, names and additional languages. Such repetition is also necessary for the propagation of language from generation to generation. It has also been suggested that the phonetic units out of which speech is made have been selected by the processes of vocabulary expansion and vocabulary transmission, because children preferentially copy words in terms of more easily imitated elementary units.

Automatic
Vocal imitation happens quickly: words can be repeated within 250-300 milliseconds, both by typical speakers (during speech shadowing) and during echolalia by individuals with intellectual disability. The imitation of speech syllables possibly happens even more quickly: people begin imitating the second phone in the syllable [ao] earlier than they can identify it (out of the set [ao], [aæ] and [ai]). Indeed, "...simply executing a shift to [o] upon detection of a second vowel in [ao] takes very little longer than does interpreting and executing it as a shadowed response". Neurobiologically this suggests "...that the early phases of speech analysis yield information which is directly convertible to information required for speech production". Vocal repetition can be done immediately, as in speech shadowing and echolalia. It can also be done after the pattern of pronunciation is stored in short-term memory or long-term memory. It automatically uses both auditory and, where available, visual information about how a word is produced.

The automatic nature of speech repetition was noted by Carl Wernicke, the late-nineteenth-century neurologist, who observed that "The primary speech movements, enacted before the development of consciousness, are reflexive and mimicking in nature...".

Independent of speech
Vocal imitation arises in development before speech comprehension and also before babbling: 18-week-old infants spontaneously copy vocal expressions, provided the accompanying voice matches. Imitation of vowels has been found as young as 12 weeks. It is independent of native language, language skills, word comprehension and a speaker's intelligence. Many autistic people and some people with intellectual disability engage in the echolalia of overheard words (often their only vocal interaction with others) without understanding what they echo. Reflexive, uncontrolled echoing of others' words and sentences occurs in roughly half of those with Gilles de la Tourette syndrome. The ability to repeat words and nonwords without comprehension also occurs in mixed transcortical aphasia, where it is linked to the sparing of the short-term phonological store.

The ability to repeat and imitate speech sounds occurs separately from that of normal speech. Speech shadowing provides evidence of a 'privileged' input/output speech loop that is distinct from the other components of the speech system. Neurocognitive research likewise finds evidence of a direct (nonlexical) link between phonological analysis input and motor programming output.

Effector independent
Speech sounds can be imitatively mapped into vocal articulations in spite of differences in the size and shape of the vocal tract due to gender, age and individual anatomical variability. Such variability is extensive, making the input-output mapping of speech more complex than a simple mapping of vocal tract movements. The shape of the mouth varies widely: dentists recognize three basic shapes of palate (trapezoid, ovoid, and triangular), six types of malocclusion between the two jaws, nine ways in which teeth relate to the dental arch, and a wide range of maxillary and mandibular deformities. Vocal sound can also vary due to dental injury and dental caries. Other factors that do not impede the sensory-motor mapping needed for vocal imitation are gross oral deformations such as cleft lips, cleft palates or amputations of the tongue tip, as well as pipe smoking, pencil biting and teeth clenching (as in ventriloquism). Paranasal sinuses vary between individuals 20-fold in volume, and differ in the presence and degree of their asymmetry.

Diverse linguistic vocalizations
Vocal imitation can potentially occur for a diverse range of phonetic units and types of vocalization. The world's languages use consonantal phones that differ in thirteen imitable vocal tract places of articulation (from the lips to the glottis). These phones can potentially be pronounced with eleven types of imitable manners of articulation (from nasal stops to lateral clicks). Speech can be copied in regard to its social accent, intonation, pitch and individuality (as with entertainment impersonators). Speech can be articulated in ways which diverge considerably in speed, timbre, pitch, loudness and emotion. Speech further exists in different forms such as song, verse, scream and whisper. Intelligible speech can be produced with pragmatic intonation and in regional dialects and foreign accents. These aspects are readily copied: people asked to repeat speech-like words accurately imitate not only phones but also other aspects of pronunciation such as fundamental frequency, schwa-syllable expression, voice spectra and lip kinematics, voice onset times, and regional accent.

Vocabulary expansion
In 1874 Carl Wernicke proposed that the ability to imitate speech plays a key role in language acquisition. This is now a widely researched issue in child development. A study of 17,000 one- and two-word utterances made by six children between 18 and 25 months of age found that, depending upon the particular infant, between 5% and 45% of their words might be mimicked. These figures are minima, since they concern only words that had just been heard. Many words that may seem spontaneous are in fact delayed imitations of words heard days or weeks previously. At 13 months, children who imitate new words (but not ones they already know) show a greater increase in noun vocabulary four months later and in non-noun vocabulary eight months later. A major predictor of vocabulary increase in 20-month-olds, 24-month-olds, and older children between 4 and 8 years is their skill in repeating nonword phone sequences (a measure of mimicry and storage). This is also the case for children with Down syndrome. The effect is larger even than that of age: in a study of 222 two-year-old children whose spoken vocabularies ranged from 3 to 601 words, the ability to repeat nonwords accounted for 24% of the variance, compared to 15% for age and 6% for gender (girls better than boys).

Nonvocabulary expansion uses of imitation
Imitation provides the basis for making longer sentences than children could otherwise spontaneously make on their own. Children analyze the linguistic rules, pronunciation patterns, and conversational pragmatics of speech by making monologues (often in crib talk) in which they repeat, and manipulate in word play, phrases and sentences previously overheard. Many proto-conversations involve children (and parents) repeating what each other has said in order to sustain social and linguistic interaction. It has been suggested that the conversion of speech sound into motor responses aids the vocal "alignment of interactions" by "coordinating the rhythm and melody of their speech". Repetition enables immigrant monolingual children to learn a second language by allowing them to take part in 'conversations'. Imitation-related processes aid the storage of overheard words by putting them into speech-based short- and long-term memory.

Language learning
The ability to repeat nonwords predicts the ability to learn second-language vocabulary. A study found that adult polyglots performed better than nonpolyglots in short-term memory tasks such as repeating nonword vocalizations, though the two groups were otherwise similar in general intelligence, visuo-spatial short-term memory and paired-associate learning ability. Language delay, in contrast, is linked to impairments in vocal imitation.

Speech repetition and phones
Electrical brain stimulation research on the human brain finds that 81% of areas that show disruption of phone identification are also those in which the imitating of oral movements is disrupted, and vice versa. Brain injuries in the speech areas show a 0.9 correlation between those causing impairments to the copying of oral movements and those impairing phone production and perception.

Mechanism
Spoken words are sequences of motor movements organized around vocal tract gesture motor targets. Because of this, vocalization is copied in terms of the motor goals that organize it rather than the exact movements with which it is produced. These vocal motor goals are auditory. According to James Abbs, 'For speech motor actions, the individual articulatory movements would not appear to be controlled with regard to three-dimensional spatial targets, but rather with regard to their contribution to complex vocal tract goals such as resonance properties (e.g., shape, degree of constriction) and/or aerodynamically significant variables'. Speech sounds also have duplicable higher-order characteristics, such as the rates and shapes of modulations and of frequency shifts. Such complex auditory goals (which often, though not always, link to internal vocal gestures) are detectable from the speech sound which they create.

Dorsal speech processing stream function
Two cortical processing streams exist: a ventral one that maps sound onto meaning, and a dorsal one that maps sound onto motor representations. The dorsal stream projects from the posterior Sylvian fissure at the temporoparietal junction onto frontal motor areas, and is not normally involved in speech perception. Carl Wernicke identified the left posterior superior temporal sulcus (a cerebral cortex region sometimes called Wernicke's area) as a centre of the sound "images" of speech and its syllables, connected through the arcuate fasciculus with the part of the inferior frontal gyrus (sometimes called Broca's area) responsible for their articulation. This pathway is now broadly identified as the dorsal speech pathway, one of the two pathways (together with the ventral pathway) that process speech. The posterior superior temporal gyrus is specialized for the transient representation of the phonetic sequences used for vocal repetition. Part of the auditory cortex can also represent aspects of speech such as its consonantal features.

Mirror neurons
Mirror neurons have been identified that process both the perception and the production of motor movements. This is done not in terms of the exact motor performance but through an inference of the intended motor goals around which the movement is organized. Mirror neurons that both perceive and produce the motor movements of speech have been identified. According to the motor theory of speech imitation, such speech mirror neurons in infants have selected for motor goals with vocal tract gestures that are easy to imitate, and this has shaped the nature of the phonetic units out of which spoken words are constructed. Speech is mirrored constantly into its articulations, since speakers cannot know in advance that a word is unfamiliar and in need of repetition; this is learnt only after the opportunity to map it into articulations has gone. Thus speakers, if they are to incorporate unfamiliar words into their spoken vocabulary, must by default map all spoken input. The motor theory of speech imitation, unlike the motor theory of speech perception, does not link mirror neurons with speech perception.

Evolution and language
Human language is a vocabulary-based form of communication that, unlike that of other animals, employs tens of thousands of lexical items and names. This requires that young humans new to language have the ability to quickly learn both the pronunciation and the use of many thousands of words. If children could not repeat speech without problems, human language could not exist. This makes the evolution of the capacity for speech repetition a critical innovation needed for the origin of speech. The motor theory of speech imitation argues, moreover, that this need for speech to be imitable, rather than speech perception or speech production, underlies the evolved nature of the vowel and consonant units of phonetics.

Sign language
Words in sign languages, unlike those in spoken ones, are made not of sequential units but of spatial configurations of subword unit arrangements, the spatial analogue of the sonic-chronological morphemes of spoken language. These words, like spoken ones, are learnt by imitation. Indeed, rare cases of compulsive sign-language echolalia exist in otherwise language-deficient deaf autistic individuals born into signing families. At least some cortical areas neurobiologically active during both sign and vocal speech, such as the auditory cortex, are associated with the act of imitation.

Birds
Birds learn their songs from those made by other birds. In several examples, birds show highly developed repetition abilities: the Sri Lankan Greater Racket-tailed Drongo (Dicrurus paradiseus) copies the calls of predators and the alarm signals of other birds, and Albert's Lyrebird (Menura alberti) can accurately imitate the Satin Bowerbird (Ptilonorhynchus violaceus).

Research on avian vocal motor neurons finds that birds perceive their song as a series of articulatory gestures, as humans do. Birds that can imitate humans, such as the Indian hill myna (Gracula religiosa), imitate human speech by mimicking the various speech formants, which humans create by changing the shape of the vocal tract, with different vibration frequencies of the bird's internal tympaniform membrane. Indian hill mynas also imitate such phonetic characteristics as voicing, fundamental frequencies, formant transitions, nasalization, and timing, though their vocal movements are made in a different way from those of the human vocal apparatus.

Nonhuman mammals

 * Bottlenose dolphins can show spontaneous vocal mimicry of computer-generated whistles.
 * Killer whales can mimic the barks of California sea lions.
 * Harbor seals can mimic in a speech-like manner one or more English words and phrases.
 * Elephants can imitate trunk sounds.
 * Lesser spear-nosed bats can learn their call structure from artificial playback.
 * An orangutan has spontaneously copied the whistles of humans.

Apes
Apes taught language show an ability to imitate language signs, as with chimpanzees such as Washoe, who was able to learn with her arms a vocabulary of 250 American Sign Language gestures. However, such human-trained apes show no ability to imitate human speech vocalizations.