Historical linguistics

Historical linguistics (also diachronic linguistics or comparative linguistics) is primarily the study of the ways in which languages change over time. It contrasts with synchronic linguistics, which studies the state of a language at a single point in time.

The main tools of historical linguistics are the analysis of historical records and the comparison of internal features (vocabulary, word formation, and syntax) of current and extinct languages. The goal is to trace the development and genetic affiliations of the world's languages and to understand the process of language evolution. A classification of all languages into family trees is both a major result and a necessary tool of this effort.

Modern historical linguistics grew out of the earlier discipline of philology, the study of ancient texts and documents. In its early years, historical linguistics focused on the well-known Indo-European languages; but since then, significant comparative linguistic work has been done on the Uralic languages, Austronesian languages and various families of Native American languages, among many others.

Language evolution and the comparative method
Languages change over time. What were once dialects of the same language may eventually diverge enough that they are no longer mutually intelligible and can be considered separate languages.

One way to illustrate the relationship between such divergent yet related languages is to construct family trees, an idea pioneered by the 19th-century historical linguist August Schleicher. The basis for the trees is the comparative method. Languages presumed to be related are compared with one another, and linguists look for regular sound correspondences, guided by what is generally known about how languages can change. They then use those correspondences to reconstruct the best hypothesis about the common ancestor language from which the attested languages are descended.
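The search for regular sound correspondences can be sketched in a few lines of code. The example below is only a toy illustration: the Latin and English word pairs are well-known cognates chosen for this sketch (they are not drawn from the text above), the helper names are hypothetical, and only the first sound of each word is compared, whereas real comparative work aligns whole words and many languages at once.

```python
from collections import Counter

# Illustrative Latin/English cognate pairs (an assumption of this sketch,
# not data from the article). Latin "c" is pronounced /k/.
COGNATES = [
    ("pater", "father"),
    ("pes", "foot"),
    ("piscis", "fish"),
    ("tres", "three"),
    ("tu", "thou"),
    ("cornu", "horn"),
    ("centum", "hundred"),
]

# Spellings that stand for a single sound in this toy example.
DIGRAPHS = ("th",)


def initial_sound(word):
    """Return the first sound of a word, treating known digraphs as one unit."""
    for digraph in DIGRAPHS:
        if word.startswith(digraph):
            return digraph
    return word[0]


def initial_correspondences(pairs):
    """Count how often each pair of word-initial sounds lines up."""
    return Counter((initial_sound(a), initial_sound(b)) for a, b in pairs)


counts = initial_correspondences(COGNATES)
for (latin, english), n in counts.most_common():
    print(f"Latin {latin}- : English {english}- ({n} pairs)")
```

A correspondence that recurs across many unrelated meanings, such as Latin p- against English f-, is the kind of regularity the method relies on; a single matching pair would carry little weight.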

Use of the comparative method is validated by its application to languages whose common ancestor is known. Thus, when the method is applied to the Romance languages (which include French, Spanish, Portuguese, Italian, and Romanian), the reconstructed common ancestor language comes out rather similar to Latin — not the classical Latin of Horace and Cicero, but Vulgar Latin, the colloquial Latin spoken in various dialects in the late Roman Empire.

The comparative method can be used to reconstruct languages for which no written records exist, either because none have been preserved or because the speakers were illiterate. Thus, the Germanic languages (which include German, Dutch, English, Norwegian, Swedish, Danish, Faroese, Icelandic, Yiddish, and the extinct Gothic) can be compared to reconstruct Proto-Germanic, a language that was probably contemporaneous with Latin and for which no records are preserved.

Germanic and Latin (more precisely, Proto-Italic, the ancestor of Latin and a few of its neighbors) are themselves related, being both descended from Proto-Indo-European, spoken perhaps 5000 years ago. Scholars have reconstructed a Proto-Indo-European language on the basis of data from its nine surviving daughter branches (Germanic, Italic, Celtic, Greek, Baltic, Slavic, Albanian, Armenian, and Indo-Iranian) and from the two dead branches, Tocharian and Anatolian.

The comparative method aims to distinguish so-called genetic linguistic descent — that is, the passing of a language from parents to children, down through the generations — from resemblances that are due to cultural contact between contemporary languages.

For example, about 30% of the vocabulary of Persian is taken from Arabic, as a result of the Arab conquest of Iran in the 7th century and much subsequent cultural contact. Yet Persian is considered a member of the Indo-European language family because its core vocabulary generally has Indo-European cognates (as in mâdar = "mother") and because its grammar has many characteristically Indo-European features (as in bûd = "was", formed from a root related to English "be" and a suffix related to the English past tense ending "-ed").

Once the various changes in the daughter branches have been worked out, and a fair amount of the core vocabulary and grammar of the protolanguage is understood, scholars will generally agree that a genetic relationship has been demonstrated.

Non-comparative method theories
Much more controversial are hypotheses about relatedness which are not supported by application of the comparative method. Scholars who attempt to probe deeper than the comparative method supports (for example, by tabulating similarities found by mass lexical comparison without setting up sound correspondences) are often accused of scholarly wishful thinking. The problem is that any two languages have a huge number of opportunities to resemble one another just by accident, so merely pointing out isolated resemblances has little evidentiary value. A famous example is the Persian word for "bad", which is pronounced (more or less) just like English "bad". It can be shown that the resemblance between these two words is completely accidental, and has nothing to do with the (rather remote) genetic connection between English and Persian. For further examples, see False cognate.

Proponents of mass lexical comparison argue that this linguistic "noise" can be reduced by comparing large numbers of words. However, because it ignores known historical changes in the languages, mass lexical comparison lets in a source of randomness that could otherwise be removed, and its conclusions are therefore inaccurate to an extent that is impossible to assess.
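The point about accidental resemblance can be made concrete with a small probability sketch. The numbers used here are purely hypothetical (a 1% chance that any one pair of words looks alike, and 1,000 pairs compared); neither figure comes from the text or from any study, and the function names are invented for the illustration. The sketch also assumes that the comparisons are independent of one another.

```python
def prob_at_least_one_match(p_single, n_comparisons):
    """Probability of at least one chance look-alike in n independent comparisons."""
    return 1.0 - (1.0 - p_single) ** n_comparisons


def expected_matches(p_single, n_comparisons):
    """Expected number of chance look-alikes in n independent comparisons."""
    return p_single * n_comparisons


# Hypothetical inputs, chosen only to show the shape of the argument.
P_SINGLE = 0.01
N_COMPARISONS = 1000

print(prob_at_least_one_match(P_SINGLE, N_COMPARISONS))
print(expected_matches(P_SINGLE, N_COMPARISONS))
```

Under these assumed figures, finding some look-alikes is close to certain even for unrelated languages, which is why an isolated pair such as Persian and English "bad" proves nothing by itself.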

Since supporting distant genetic relationships is so difficult, and the methods for finding and proving such relationships are not well established (in the way that the comparative method is), the search for remote relationships is riven with scholarly controversy. Nevertheless, the pursuit of remote relationships remains a powerful lure to many scholars; after all, Proto-Indo-European must have seemed a rather wild hypothesis to many when it was first proposed.

This uncertainty also relates to estimates of how long it would take for languages to diverge completely. One commonly cited opinion is that if a group of people were sent to a distant galaxy, after 10,000 years they would be speaking a language that would be no more similar to their native language than any other language selected at random. This figure is based on glottochronology, using a simplified assumption of a constant 14% loss rate each millennium and a chance similarity rate of 5%. However, other work by Isidore Dyen and Sergei Starostin indicates that in fact words have wildly differing expected life spans; thus, for instance, a specialized word like "goshawk" might on average last a mere millennium or two, whereas extremely common words like "I" and "you" often last so long that it is not possible to even estimate their life span without reconstructions going further back in time than those that are universally accepted.
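The arithmetic behind the 10,000-year figure can be sketched as follows. The 14% loss rate and the 5% chance-similarity rate are the ones quoted above; the function names are hypothetical, and the sketch adds one assumption of its own, namely that the two lineages being compared each lose vocabulary independently at the same constant rate, so that the shared fraction is the square of the fraction retained by one lineage.

```python
import math

LOSS_PER_MILLENNIUM = 0.14  # constant loss rate quoted in the text
CHANCE_SIMILARITY = 0.05    # chance similarity rate quoted in the text


def retained(millennia, loss=LOSS_PER_MILLENNIUM):
    """Fraction of the original word list one lineage still keeps."""
    return (1.0 - loss) ** millennia


def shared(millennia, loss=LOSS_PER_MILLENNIUM):
    """Fraction still shared by two lineages that split `millennia` ago.

    Assumption of this sketch: both lineages lose words independently
    at the same rate, so the shared fraction is retained() squared.
    """
    return retained(millennia, loss) ** 2


def millennia_until_chance(loss=LOSS_PER_MILLENNIUM, chance=CHANCE_SIMILARITY):
    """Time at which the shared fraction falls to the chance level."""
    return math.log(chance) / (2.0 * math.log(1.0 - loss))


print(retained(10))
print(shared(10))
print(millennia_until_chance())
```

With these inputs the shared fraction drops just below the 5% chance level after roughly ten millennia. The result is only as good as the constant-rate assumption, which is exactly what the work on differing word life spans calls into question.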

The ultimate in remote reconstruction is the recovery of a Proto-World language. Not all scholars believe that such a language necessarily existed, since some models of human evolution allow for the independent appearance of human speech in several parts of the world, resulting in several linguistic families with no common ancestral language. Nevertheless, Joseph Greenberg suggested that Proto-World was the language of people coming out of northeast Africa around 50,000 BC. On the other hand, according to current archaeological evidence, the native languages of South America must have been isolated from those of the Old World for 10,000 years or more, and those of Australia may have split off from the putative world language tree even earlier than that. Therefore, if one accepts the estimate that no relationships would be recognizable after 10,000 years, there is little chance of demonstrating a common origin for all the world's languages unless, against current expectations, they all had a common ancestor within that time span.

Dené-Caucasian has also been postulated to include Na-Dené (North America), Sino-Tibetan, Ket (Siberia), Burushaski (Pakistan), North-East Caucasian (Chechen and the Dagestan languages), and Basque. This language family is extremely hypothetical.

The Nostratic hypothesis was proposed by the Danish linguist Holger Pedersen in 1903. It claims that the Nostratic grouping includes such widely ranging language families as Indo-European, Afro-Asiatic, Uralic, Altaic, Sumerian, Elamo-Dravidian, and Kartvelian; other versions of the hypothesis include different sets of families. Some have speculated that the speakers of Nostratic were refugees from a Black Sea flood of around 5600 BC, and some think this flood is the origin of the story of Noah's Flood in the Bible. However, linguists have reached no firm conclusion about the validity of the Nostratic hypothesis. Its proponents, unlike Greenberg, use the traditional comparative method. Their comparisons are nonetheless often criticized as far-fetched or as involving too many semantic shifts, and some critics accuse them of simply grouping together the language families most familiar to them while neglecting to compare each of those families with others further afield.