Comparative method


 * For the constant comparative method by Barney Glaser and Anselm Strauss, see Grounded theory.

The comparative method (in comparative linguistics) is a method used to detect genetic relationships between languages and to establish a consistent relationship hypothesis by reconstructing:
 * the common ancestor of the languages in question,
 * a plausible sequence of regular changes by which the historically known languages can be derived from that common ancestor.

The comparative method is the "gold standard" by which mainstream linguists judge whether two languages are related; relation is deemed certain only if a reconstruction of the common ancestor (or at least a partial reconstruction) is feasible. Other proposed approaches, such as "mass lexical comparison", are considered unreliable by most linguists.

Genetically related languages
In the present context, "related" has a specific meaning: two languages are said to be genetically related if they are descended from the same ancestor language. Thus, for example, Spanish and French are both descended from Latin. "Descent", in turn, is defined in terms of transmission across the generations: children learn a language from the parents' generation and are then influenced by their peers; they then transmit it to the next generation, and so on (how and why changes are introduced is a complicated, unresolved issue). A continuous chain of speakers across the centuries links Vulgar Latin to all of its modern descendants.

This definition of relatedness implies that even if two languages are quite similar in their vocabularies, they are not necessarily closely related. Modern Persian in fact takes more of its words from Arabic than from its direct ancestor, Proto-Indo-Iranian. This is because of heavy borrowing over the years from Arabic into Persian. But under the definition just given, Persian is considered to be descended from Proto-Indo-Iranian, and not from Arabic.

The comparative method is a method for proving relatedness in the sense just given, as well as for reconstructing the proto-phonemes of a language family and uncovering the phonological changes its member languages have undergone.

How the comparative method works
Although there is no concrete set of steps to be followed in the application of the comparative method, linguists generally agree on the basic steps, which are as follows:

1. Assemble cognate lists
Relationship between two (or more) languages can be suspected if they show a number of regular correspondences in lexicon, which means that there is a regularly recurring match between the phonetic structure of words with similar meanings. Thus, this step simply involves making lists of words which are likely cognates among the languages being compared. For example, looking at the Polynesian family, we might come up with the following list (although in practice a real list would be much longer):

However, caution needs to be exercised to avoid including borrowings or false cognates in the list, which could skew or obscure the correct data. This problem can usually be overcome by using basic vocabulary (such as kinship terms, numbers, body parts, pronouns, and other basic terms). Nonetheless, basic vocabulary can be borrowed (Finnish, for example, borrowed the word for "mother," äiti, from Gothic, while Pirahã, a Muran language of South America, borrowed all its pronouns from Nhengatu; likewise, English borrowed the pronouns they, them, and their(s) from Norse).

2. Establish correspondence sets
Once cognate lists are established, the next step is to determine the regular sound correspondences they exhibit. The notion of regular correspondence is very important here: mere phonetic similarity, as between English day and Latin dies (both with the same meaning), has no probative value. English initial d- does not regularly match Latin d-, and whatever sporadic matches can be observed are due either to chance (as in the above example) or to borrowing (e.g. Latin diabolus and English devil, both ultimately of Greek origin). The Neogrammarians (German Junggrammatiker), a group of radical young German linguists, mostly from the University of Leipzig, first emphasized this point in the late 1800s. Their motto, "sound laws have no exceptions," has remained a fundamental idea in historical linguistics to this day.

For example, although the correspondence d- : d- (where the notation "A : B" means "A corresponds to B") in English and Latin day and dies above is not regular, English and Latin do exhibit a very regular correspondence of t- : d-. For example (in Latin, the letter c represents /k/):

Since a really systematic correspondence can hardly be accidental, if we can rule out alternative possibilities like massive borrowing, the correspondence can be attributed to common descent. If there are many regular correspondence sets of this kind (the more the better), and if they add up to a sensible pattern (one that could have been produced by known types of sound change), and if some of the correspondences are non-trivial (t : t is trivial but  : b is not), then common origin becomes a virtual certainty.
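As a rough illustration of how correspondences can be tallied, the following Python sketch counts how often each pairing of word-initial segments recurs in a small word list. The word pairs, the use of spelling as a stand-in for sound, and the function name are assumptions made for this example; a real analysis would work from phonemic transcriptions of a much longer list.

```python
from collections import Counter

# Illustrative English/Latin pairs with similar meanings. Spelling stands in
# for sound here, which is a simplification.
PAIRS = [
    ("ten", "decem"),
    ("two", "duo"),
    ("tooth", "dentem"),
    ("day", "dies"),  # the look-alike discussed in the text, not a true cognate
]

def initial_correspondences(pairs):
    """Count how often each pairing of word-initial segments occurs."""
    return Counter((a[0], b[0]) for a, b in pairs)

counts = initial_correspondences(PAIRS)
for (eng, lat), n in counts.most_common():
    print(f"{eng}- : {lat}-  occurs {n} time(s)")
```

A handful of words proves nothing by itself; the point of the sketch is only that t- : d- recurs across the list while d- : d- is isolated.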

3. Discover which sets are in complementary distribution
During the time the comparative method was being developed (late 18th to late 19th century), two major developments occurred which improved the method's effectiveness.

First, it was found that many sound changes are conditioned by a particular context. Thus for example, in both Greek and Sanskrit, an aspirated stop evolved into an unaspirated one, but only if a second aspirate occurred later on in the same word; this is the so-called "Grassmann's law", known to the ancient Indian grammarians and promulgated as a historical discovery by Hermann Grassmann.

Second, it was found that sometimes sound changes occurred in contexts that were later lost. For instance, in Sanskrit velars (k-like sounds) were replaced by palatals (ch-like sounds) whenever the following vowel was *i or *e (the asterisk means that the sound is inferred rather than historically documented). Subsequent to this change, all instances of *e were replaced by a (or, more accurately, earlier *e, *o, and *a merged as a). The situation would probably have been unreconstructable, had not the original distribution of e and a been recoverable from the evidence of other Indo-European languages. Thus, for instance, Latin que, "and," preserves the original e vowel that caused the consonant shift in Sanskrit:

Ca is the attested Sanskrit form for "and." This finding was made independently by several scholars during the 1870s.
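The effect of a conditioning context that is later lost can be imitated with two ordered rewrite rules. This is a toy sketch in Python, not a model of Sanskrit phonology: the ASCII letter c stands for the palatal, the starting form ke is a simplified stand-in for the reconstructed word, and the function names are hypothetical.

```python
import re

def palatalize(form):
    """Velar k becomes palatal c before i or e."""
    return re.sub(r"k(?=[ie])", "c", form)

def merge_vowels(form):
    """Earlier e and o merge with a."""
    return re.sub(r"[eo]", "a", form)

def apply_changes(form, changes):
    """Apply sound changes one after another, in the order given."""
    for change in changes:
        form = change(form)
    return form

# Historical order: palatalization first, then the vowel merger.
historical = apply_changes("ke", [palatalize, merge_vowels])
# Reversed order: the merger removes the context before palatalization applies.
reversed_order = apply_changes("ke", [merge_vowels, palatalize])
print(historical, reversed_order)
```

Once the vowels have merged, the output no longer shows why the consonant is palatal; only a related language that kept e, such as Latin with que, reveals the original environment.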

In the Dravidian languages Telugu, Tamil and Malayalam, Proto-Dravidian velar plosives have been replaced by the corresponding palatal if the velar plosive is followed by, , or. However, this change is absent in Kannada and a few other languages in the family. For example, Proto-Dravidian *kedi becomes Tamil chedi, but Kannada gida.

Verner's Law, discovered by Karl Verner in about 1875, is a similar case: the voicing of consonants in Germanic languages underwent a change that was determined by the position of the old Indo-European accent. Following the change, the accent shifted across the board to initial position. Verner solved the puzzle by comparing the Germanic voicing pattern with data from Greek and Sanskrit accent. For full discussion, see Verner's Law.

This stage of the comparative method, therefore, involves examining the correspondence sets discovered in step 2 and seeing which of them apply only in certain contexts. If two (or more) sets involve identical or similar sounds, and apply in complementary distribution, then the sets can be assumed to reflect a single original phoneme. This is because "some sound changes, particularly conditioned sound changes, can result in a proto-sound being associated with more than one correspondence set" (Campbell 2004:136).

To take another example, when we examine the Romance languages, descended from Latin, we find two different correspondence sets which both involve k:

What we do in this situation is try to see whether the two sets occur in complementary distribution (in which case they reflect a single proto-phoneme) or whether both occur in identical environments (in which case they must reflect separate proto-phonemes). In this case, we discover that French ʃ only occurs before a in the other languages (which becomes ɛ in French), while French k occurs elsewhere. Both sets (1) and (2) can therefore be assumed to reflect a single proto-phoneme (in this case *k, spelled c in Latin).
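The check for complementary distribution can be sketched in Python as a comparison of environments. The word list below is a small illustrative sample, the transcription is simplified (ʃ is the sound spelled ch in French, and the environment is reduced to the segment that follows k in Italian), and the function names are hypothetical.

```python
from collections import defaultdict

# (Italian word, French word, French initial sound, segment after k in Italian)
COGNATES = [
    ("caro",  "cher",  "ʃ", "a"),
    ("campo", "champ", "ʃ", "a"),
    ("corpo", "corps", "k", "o"),
    ("crudo", "cru",   "k", "r"),
]

def environments(cognates):
    """Map each French reflex to the set of environments in which it occurs."""
    envs = defaultdict(set)
    for _italian, _french, reflex, following in cognates:
        envs[reflex].add(following)
    return dict(envs)

def in_complementary_distribution(envs, a, b):
    """True if reflexes a and b never occur in the same environment."""
    return envs[a].isdisjoint(envs[b])

envs = environments(COGNATES)
print(envs)
print(in_complementary_distribution(envs, "ʃ", "k"))
```

If the two reflexes shared even one environment, the function would return False, and the two sets would have to be traced back to separate proto-phonemes.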

A more complex case involves consonant clusters in Proto-Algonquian, which have been notoriously difficult to reconstruct. The Algonquianist Leonard Bloomfield, however, looked at the reflexes of the clusters in four of the daughter languages of Proto-Algonquian, and came up with the following correspondence sets (although the clusters are shown here ending in -k, this also generally applies to clusters ending in any of the plosives; <š> and <č> are Americanist symbols for [ʃ] and [tʃ]):

Although all five correspondence sets overlap with one another in various places, they are not in complementary distribution, and so Bloomfield recognized that a different cluster must be reconstructed for each set. His reconstructions were, respectively, *hk, *xk, *čk, *šk, and *çk (the modern reconstructions for these clusters are *hk, *tk, *čk, *šk, and *rk, respectively, and two further clusters are now recognized).

4. Reconstruct proto-phonemes
This step tends to be much more subjective than the previous ones. A linguist here has to rely mostly on their general intuitions about what types of sound changes are likely and which are unlikely. For example, the voicing of voiceless plosives between vowels is an extremely common sound change, occurring in languages all over the world, whilst the devoicing of voiced plosives between vowels is extremely uncommon. Therefore, if a linguist were comparing two languages with a correspondence of -t- : -d- between vowels, they would reconstruct the proto-phoneme as being *-t-, and assume that it became voiced to -d- in the second language (unless they had a very good reason not to).

It is important to keep in mind, however, that there are sometimes changes that are extremely unexpected. The Proto-Indo-European word for "two," for example, is reconstructed as *duwō, which is reflected in Classical Armenian as erku. Several other cognates demonstrate that the change *d- → erk- in the history of Armenian was a regular one. Similarly, in Bearlake, a dialect of the Athabaskan language of Slavey, there has been a sound change of Proto-Athabaskan *ts → Bearlake . Obviously, *d- did not change directly into erk- and *ts did not change directly into , but they instead must have gone through several intermediate steps to arrive at the later forms. The lesson here is that with enough sound changes, a given sound can change into just about any other sound. This is why it is not phonetic similarity which matters when utilizing the comparative method, but regular sound correspondences.

Another assumption used in determining a proto-phoneme is that our reconstruction should ideally involve as few sound changes as possible to arrive at the modern reflexes in the daughter languages. In other words, unless there is persuasive evidence to the contrary, we should reconstruct for a proto-phoneme whatever value is the most common reflex in the daughter languages. For example, in the Algonquian languages, we find the following correspondence set:

Obviously, we should reconstruct either *m or *b for this set. Both *m → b and *b → m (where "*A → B" means "*A becomes B") are conceivable sound changes, so the principle of preferring "likely" changes over "unlikely" ones is not useful here. Instead, linguists note that the reflex of this proto-phoneme is m in five of the languages compared here, and b in one of them. If we reconstruct *b, we need to assume five separate changes of *b → m, whereas if we reconstruct *m, we only need to assume a single change of *m → b in one language in the family. Since we are working on the assumption that our reconstructions should require the fewest changes possible to arrive at the modern reflexes, we would reconstruct *m here.
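The counting argument can be written out as a short Python sketch. The five m-languages are given placeholder labels, Arapaho is the language the text later names as having b, and the function names are hypothetical. The count assumes one independent change per language, which is the same simplification the argument itself makes.

```python
# Reflexes of one proto-phoneme: m in five languages, b in one.
# All labels except Arapaho are placeholders.
REFLEXES = {
    "language_1": "m",
    "language_2": "m",
    "language_3": "m",
    "language_4": "m",
    "language_5": "m",
    "Arapaho": "b",
}

def changes_required(candidate, reflexes):
    """Number of languages whose reflex differs from the candidate proto-sound."""
    return sum(1 for sound in reflexes.values() if sound != candidate)

def most_economical(reflexes):
    """Candidate (drawn from the attested reflexes) needing the fewest changes."""
    candidates = sorted(set(reflexes.values()))
    return min(candidates, key=lambda c: changes_required(c, reflexes))

for candidate in ("m", "b"):
    print(candidate, changes_required(candidate, REFLEXES))
print(most_economical(REFLEXES))
```

Counting one change per language ignores subgrouping: if the five m-languages formed a single branch, one shared change could account for all of them, so the tally is a heuristic rather than a proof.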

5. Examine the reconstructed system typologically
In the final step, the linguist takes all the proto-phonemes they have reconstructed using steps 1-4, and checks to see how the system fits with what is currently known about typological constraints. For example, if the reconstructed phonemes fit together in the following system, the linguist would be suspicious, because languages generally (though not always) tend to maintain symmetry in their phonemic inventories:

In this reconstructed system, there is only one voiced plosive, *b, and although there are apical and velar nasals, *n and *ŋ, there is no corresponding labial nasal. We would therefore have to return to step 4 and reevaluate our earlier conclusions, looking for evidence that what we earlier reconstructed as *b is actually *m, or that what we earlier reconstructed as *n and *ŋ are actually *d and *g.
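A symmetry check of this kind can be sketched in Python by laying the inventory out as a grid of place against manner and listing the empty cells. The inventory below is an assumption: it contains the *b, *n and *ŋ described above plus a hypothetical voiceless series *p, *t, *k, and the function name is invented for the example.

```python
# Hypothetical reconstructed inventory: phoneme -> (place, manner).
INVENTORY = {
    "p": ("labial", "voiceless plosive"),
    "t": ("apical", "voiceless plosive"),
    "k": ("velar", "voiceless plosive"),
    "b": ("labial", "voiced plosive"),
    "n": ("apical", "nasal"),
    "ŋ": ("velar", "nasal"),
}

def gaps(inventory):
    """List the (place, manner) cells left empty in the place-by-manner grid."""
    places = sorted({place for place, _ in inventory.values()})
    manners = sorted({manner for _, manner in inventory.values()})
    filled = set(inventory.values())
    return [(p, m) for m in manners for p in places if (p, m) not in filled]

for place, manner in gaps(INVENTORY):
    print(f"missing: {place} {manner}")
```

The gaps only flag where to look again; as the text notes, languages tend toward symmetrical inventories but do not always maintain them.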

Even a symmetrical system can be typologically suspicious. For example, the Proto-Indo-European plosive inventory, as traditionally reconstructed, is as follows:

Lately, however, a number of linguists have argued that this system is, at best, very suspicious typologically. It is extremely unlikely, or maybe even impossible, they say, for a language to have a voiced aspirated (breathy voice) series without a corresponding voiceless aspirated series. These linguists therefore argue, on typological grounds, that we need to reevaluate the traditional reconstruction of Proto-Indo-European. A potential solution was provided by Thomas Gamkrelidze and Vyacheslav V. Ivanov, who argued that the series traditionally reconstructed as plain voiced should in fact be reconstructed as glottalized — either implosive or ejective. The plain voiceless and voiced aspirated series would thus be seen as just voiceless and voiced, with aspiration being a non-distinctive quality of both. This has become known as the Glottalic Theory, and although it has not yet become accepted, it does have a large number of proponents, and is an excellent example of the application of linguistic typology to linguistic reconstruction.

The reconstruction of proto-sounds and their historical transformations enables us to proceed further: we can compare grammatical morphemes (word-forming affixes and inflectional endings), patterns of declension and conjugation, and so on. The full reconstruction of an unrecorded protolanguage can never be complete (for example, proto-syntax is far more elusive than phonology or morphology, and all elements of linguistic structure undergo inevitable erosion and gradual loss or replacement over time), but a consistent partial reconstruction can and must be attempted as proof of genetic relationship.

Weaknesses of the comparative method
While most historical linguists continue to use the comparative method, many of them now also recognize quite a few serious weaknesses in the method. In recent years, alternatives to the comparative method have been proposed (see Mass lexical comparison), in part due to perceived problems inherent to the method.

The Neogrammarian Hypothesis
The first weakness of the comparative method is the Neogrammarians' fundamental assumption that "sound laws have no exceptions". This assumption is problematic even on theoretical grounds: the very fact that different languages evolved according to different sound-change laws seems to indicate a degree of arbitrariness in language evolution. Moreover, once one accepts that sound changes may be conditioned by context according to rather complicated rules, one opens the door for "laws" that may affect only a few words, or even a single word, which is logically equivalent to admitting exceptions to the broader laws. This problem has led some critics to a radically opposite position, summarized by the maxim "each word has its own history."

Borrowings and random mutations
Even the Neogrammarians recognized that, apart from the general sound change laws, languages are also subject to borrowings from other languages and other sporadic changes (such as irregular inflections, compounding, and abbreviation) that affect one word at a time, or small subsets of words.

While borrowed words should be excluded from the analysis, on the grounds that they are not genetic by definition, they do add noise to the data, and thus may hide systematic laws or distort their analysis. Moreover, there is the danger of circular reasoning: namely, of assuming that a word has been borrowed solely because it does not fit the current assumptions about the regular sound laws.

The other exceptions to the sound laws are a more serious problem, because they occur in genetic language transmission. One example of such a sporadic change, with no apparent logical reason, is the Spanish word for "word", palabra. By regular sound changes from the Latin parabŏla, it should have become parabla, but the r and l changed places by sporadic metathesis.

In principle, as those sporadic changes accumulate, they will increasingly obscure the systematic sound laws, and eventually prevent the recognition of the genetic relationship between languages, or lead to incorrect reconstructions of proto-languages and incorrect family trees.

Analogy
A source of sporadic changes that was recognized by the Neogrammarians themselves was analogy, in which a word is sporadically changed to be closer to another word in the lexicon which is perceived as being somehow related to it. For example, the Russian word for "nine," by regular sound changes from Proto-Slavic, should have been nevyat', but is actually devyat'. It is believed that the initial n- changed to d- due to the influence of the Russian word for "ten," desyat'.

Gradual application
More recently, William Labov and other linguists who have studied contemporary language changes in detail have discovered that even a systematic sound change is at first applied in an unsystematic fashion, with the percentage of its occurrence in a person's speech dependent on various social factors. Often the sound change begins to affect some words in a language, and then gradually spreads to others. These observations considerably weaken the Neogrammarians' axiom that "sound laws have no exceptions."

Problems with the tree model
Another weakness of the comparative method is its reliance on the so-called "Tree Model" (German Stammbaum). In this model, daughter languages are seen as branching out from the proto-language, gradually growing more and more distant from the proto-language through accumulated phonological, morpho-syntactical, and lexical changes, and possibly splitting into further daughter languages. This model is usually represented by upside-down tree-like diagrams. For example, here is a diagram of the Uto-Aztecan family of languages, spoken throughout the southern and western United States and Mexico:



(Families are in bold, individual languages in italics. Not all of the branches and languages are shown, for lack of space.)

The Wave Model
Unfortunately, the tree model does not reflect the reality of how languages change. Since languages change gradually, there are long periods in which different dialects of a language, as they evolve into separate languages, remain in contact with one another and influence each other. Even once they are completely separated, languages which are near to one another will continue to influence each other, often sharing grammatical, phonological, and lexical innovations. A change in one language of a family will often spread to neighboring languages; and multiple waves of change may partially overlap like waves on the surface of a pond, across language and dialect boundaries, each with its own randomly delimited range (Fox 1995:129). The following diagram illustrates this conception of language change, called the Wave Model:



This is a serious challenge to the comparative method, which is entirely based on the assumption that each language has a single "genetic" parent, and hence that the genetic relationship between two languages is due to their descent from a common ancestor.

Non-uniformity of the proto-language
Another assumption implicit in the methodology of the comparative method is that the proto-language is uniform. However, even in extremely small language communities there are always dialect differences, whether based on area, gender, class, or other factors (the Pirahã language of Brazil is spoken by only several hundred people, but has at least two different dialects, one spoken by men and one by women, for example). Therefore, the single proto-language reconstructed by the comparative method is, in all likelihood, a language which never existed.

Creoles
Another potential problem for the comparative method is the phenomenon of creole language formation, where essentially a new language is formed from a complicated combination of two languages that are not closely related. The Papiamentu language, spoken in the Caribbean, is a notable example. In these events, the new language may end up with a lexicon and phonology that are derived from both parent languages, in varying proportions, while the grammar (morphology and syntax) is partly inherited and partly the result of local innovation. Often function words from one of the parent languages are inherited, but used for a completely different function in the creole.

Creole formation seems to be a fairly common phenomenon. Dozens of such events have been documented in the last 500 years, in the wake of European colonial expansion, and many more must have happened along the fringes of past empires. While the comparative method may be able to detect the existence of a genetic relation between the creole and the parent languages (or between two creoles with shared parents), the reconstructed "proto-language" is likely to be a thoroughly artificial construct.

Subjectivity of the reconstruction
While the identification of systematic sound correspondences between known languages is fairly objective, the reconstruction of their common ancestral language is inherently subjective. In the Proto-Algonquian example above, the choice of m as the parent phoneme is only likely, not certain. It is quite possible that a Proto-Algonquian language with b in those positions split into two branches, one which preserved b and one with m instead, and that while the first gave rise only to Arapaho, the second spread more widely and gave rise to all the other Algonquian languages. (Such dramatic asymmetries in the growth of different branches of the same tree are actually common; contrast for example the Romance and Celtic branches of Indo-European.) It is also possible that the nearest common ancestor of the Algonquian languages used some other sound instead, such as p, which eventually changed to b in one branch and to m in the other.

Since the reconstruction of a proto-language involves many of these choices, the probability of making a wrong choice is very high. That is, any reconstructed proto-language is almost certainly incorrect; it is an artificial construct that is accepted by convention, not by rigorous proof. These hidden errors take their toll when two reconstructed proto-languages are compared in order to build large family trees.
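The way uncertainty compounds can be made concrete with a small calculation, sketched here in Python. Two assumptions are made purely for illustration and are not claims about real reconstructions: the choices are treated as independent, and each one is given a hypothetical 95% chance of being right.

```python
def probability_all_correct(p_each, n_choices):
    """Chance that n independent choices, each right with probability p_each,
    are all right at once."""
    return p_each ** n_choices

# Hypothetical per-choice reliability of 0.95 over growing numbers of choices.
for n in (10, 50, 100):
    print(n, round(probability_all_correct(0.95, n), 3))
```

Even with individually reliable choices, the chance that every one of them is correct falls quickly as their number grows, which is the intuition behind the claim above.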

Assessment
In view of these weaknesses, we must be wary of the reconstructions and trees obtained with the comparative method. Most linguists, however, continue to use it, although they now recognize its flaws. Fox (1995:141-2), for example, concludes:

"'The Comparative Method as such is not, in fact, historical; it provides evidence of linguistic relationships to which we may give a historical interpretation. ...The interpretative processes must therefore weight up the evidence provided by the Comparative Method in conscious knowledge of these weaknesses, and in the light of other relevant considerations, if they are to give historical validity to the results. ...Our interpretation of the findings of the method have doubtless changed as more has been learnt of the historical processes involved, and this has probably made historical linguists less prone to equate the idealizations required by the method with historical reality. ...Provided we keep [the interpretation of the results and the method itself] apart, the Comparative Method can continue to be used in the reconstruction of earlier stages of languages.'"