Indo-European languages

The Indo-European languages comprise a family of several hundred related languages and dialects, including most of the major languages of Europe, as well as many spoken in the Indian subcontinent (South Asia), the Iranian plateau (Southwest Asia), and Central Asia. Present-day languages in this family with more than 100 million native speakers each include English, Urdu, Hindi, Spanish, Portuguese, Bengali, Punjabi, Russian, Persian and German. Numerous national or minority languages with fewer than 100 million native speakers also exist. Indo-European has the largest number of speakers of any recognised language family in the world today, with approximately 3 billion native speakers. The Indo-Iranian languages form the largest sub-branch of Indo-European.

Classification
The various subgroups of the Indo-European language family include (in historical order of their first attestation):


 * Anatolian languages, earliest attested branch, from the 18th century BC; extinct, most notably including the language of the Hittites.
 * Indo-Iranian languages, descending from a common ancestor, Proto-Indo-Iranian
   * Indo-Aryan languages, including Sanskrit, attested from the mid 2nd millennium BC.
   * Iranian languages, attested from roughly 1000 BC in the form of Avestan, and from 520 BC in the form of Old Persian
   * Dardic languages
   * Nuristani languages
 * Greek language, fragmentary records in Mycenaean from the 14th century BC; Homeric traditions date to the 8th century BC. (See Proto-Greek language, History of the Greek language.)
 * Italic languages, including Latin and its descendants (the Romance languages), attested from the 7th century BC.
 * Celtic languages, Gaulish inscriptions date as early as the 6th century BC; Old Irish texts from the 6th century AD, see Proto-Celtic language.
 * Germanic languages (including Old English and English), earliest testimonies in runic inscriptions from around the 2nd century, earliest coherent texts in Gothic, 4th century, see Proto-Germanic language.
 * Armenian language, attested from the 5th century.
 * Tocharian languages, extinct tongues of the Tocharians, attested in two dialects from roughly the 6th century.
 * Balto-Slavic languages, believed by many Indo-Europeanists to derive from a common proto-language later than Proto-Indo-European, while skeptical Indo-Europeanists regard Baltic and Slavic as no more closely related than any other two branches of Indo-European.
   * Slavic languages, attested from the 9th century, earliest texts in Old Church Slavonic.
   * Baltic languages, attested from the 14th century; for languages attested that late, they retain an unusually large number of archaic features attributed to Proto-Indo-European.
 * Albanian language, attested from the 15th century; relations with Illyrian, Dacian, or Thracian proposed.

In addition to the classical ten branches listed above, several extinct and little-known languages have existed:


 * Illyrian languages — possibly related to Messapian or Venetic; relation to Albanian also proposed.
 * Venetic language — close to Italic.
 * Liburnian language — apparently grouped with Venetic.
 * Messapian language — not conclusively deciphered.
 * Phrygian language — language of ancient Phrygia, possibly close to Greek, Thracian, or Armenian.
 * Paionian language — extinct language once spoken north of Macedon.
 * Thracian language — possibly close to Dacian.
 * Dacian language — possibly close to Thracian or to Albanian – or both.
 * Ancient Macedonian language — probably related to Greek; some propose relationships to Illyrian, Thracian or Phrygian.
 * Ligurian language — possibly not Indo-European; possibly close to or part of Celtic.
 * Lusitanian language — possibly related to (or part of) Celtic, or Ligurian, or Italic.

No doubt other Indo-European languages once existed which have now vanished without leaving a trace.

A large majority of auxiliary languages can be considered Indo-European, at least in content. Examples include:


 * Interlingua
 * Occidental
 * Latino sine Flexione

Membership of languages in the same language family is determined by the presence of shared retentions, i.e., features of the proto-language (or reflexes of such features) that cannot be better explained by chance or borrowing (convergence). Membership in a branch, group, or subgroup within a language family is determined by shared innovations which are presumed to have taken place in a common ancestor. Thus what makes Germanic languages "Germanic" is that large parts of the structures of all the languages so designated can be stated just once for all of them. In other words, they can be treated as an innovation that took place in Proto-Germanic, the source of all the Germanic languages.

A problem, though, is that shared innovations can be acquired by borrowing or other means. It has been asserted, for example, that many of the more striking features shared by Italic languages (Latin, Oscan, Umbrian, etc.) might well be "areal" features. More certainly, very similar-looking alterations in the systems of long vowels in the West Germanic languages greatly postdate any possible notion of a proto-language innovation (and cannot readily be regarded as "areal" either, since English and continental West Germanic were not a linguistic area). In a similar vein, there are many similar innovations in Germanic and Baltic/Slavic that are far more likely to be areal features than traceable to a common proto-language, such as the uniform development of a high vowel (*u in the case of Germanic, *i in the case of Baltic and Slavic) before the PIE syllabic resonants *ṛ, *ḷ, *ṃ, *ṇ, a development unique to these two groups among IE languages.

Legitimate uncertainty about whether shared innovations are areal features, coincidence, or inheritance from a common ancestor leads to disagreement over the proper subdivisions of any large language family. Thus specialists have postulated the existence of such subfamilies (subgroups) as Germanic with Slavic, Italo-Celtic and Graeco-Aryan. The vogue for such subgroups waxes and wanes: Italo-Celtic, for example, used to be an absolutely standard feature of the Indo-European landscape; nowadays it is little honored, in part because much of the striking evidence on the basis of which it was postulated has turned out to have been misinterpreted.
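
The first step of the subgrouping reasoning described above, finding which languages share which innovations, can be illustrated with a minimal Python sketch. The language names and innovation labels are hypothetical placeholders, not real data, and the function name `candidate_subgroups` is an assumption made for this illustration. The function only reports which languages share an innovation; it cannot tell inheritance from areal spread or coincidence, which is exactly the uncertainty the text describes.

```python
from collections import defaultdict

# Hypothetical toy data: each language maps to the innovations it shows.
INNOVATIONS = {
    "Language A": {"innovation 1", "innovation 2"},
    "Language B": {"innovation 1", "innovation 2"},
    "Language C": {"innovation 2"},
    "Language D": set(),
}

def candidate_subgroups(data):
    """Map each innovation to the set of languages sharing it.

    Only innovations found in two or more languages are kept, since an
    innovation confined to one language says nothing about grouping.
    """
    sharing = defaultdict(set)
    for language, features in data.items():
        for feature in features:
            sharing[feature].add(language)
    return {f: langs for f, langs in sharing.items() if len(langs) >= 2}

groups = candidate_subgroups(INNOVATIONS)
for feature in sorted(groups):
    print(feature, "->", sorted(groups[feature]))
```

Each resulting set is only a candidate subgroup: the sketch has no way of excluding borrowing, and a real analysis would weigh the kind of evidence discussed above.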

Indo-Hittite refers to the theory that Indo-European (sensu stricto, i.e. the proto-language of the Indo-European languages known before the discovery of Hittite) and Proto-Anatolian split from a common proto-language, called Proto-Indo-Hittite by the theory's first proponent, Edgar Sturtevant. Validation of such a theory would consist of identifying formal-functional structures that can be coherently reconstructed for both branches but which can only be traced to a formal-functional structure that either (a) is different from both or (b) shows evidence of a very early, group-wide innovation. As an example of (a), it is obvious that the Indo-European perfect subsystem in the verbs is formally superimposable on the Hittite ḫi-verb subsystem, but there is no functional match-up, so that (as has been held) the functional source must have been unlike both Hittite and Indo-European. As an example of (b), the solidly reconstructable Indo-European deictic pronoun paradigm whose nominatives singular are *so, *sā (*seH₂), *tod has been compared to a collection of clause-marking particles in Hittite, the argument being that the coalescence of these particles into the familiar Indo-European paradigm was an innovation of that branch of Proto-Indo-Hittite.

Satem and Centum languages


Many scholars classify the Indo-European sub-branches into a Satem group and a Centum group. This terminology comes from the different treatment of the three original velar rows. Satem languages lost the distinction between labiovelar and pure velar sounds, and at the same time assibilated the palatal velars. The Centum languages, on the other hand, lost the distinction between palatal velars and pure velars. Geographically, the "eastern" languages belong in the Satem group: Indo-Iranian and Balto-Slavic (but not Tocharian or Anatolian); and the "western" languages represent the Centum group: Germanic, Italic, and Celtic. The Satem-Centum isogloss runs right between the Greek (Centum) and Armenian (Satem) languages (which a number of scholars regard as closely related), with Greek exhibiting some marginal Satem features. Some scholars think that some languages classify neither as Satem nor as Centum (Anatolian, Tocharian, and possibly Albanian).

Note that the grouping does not imply a claim of monophyly: we do not need to postulate the existence of a "proto-Centum" or of a "proto-Satem". Areal contact among already distinct post-PIE languages (say, during the 3rd millennium BC) may have spread the sound changes involved. In any case, present-day specialists are rather less galvanized by the division than 19th-century scholars were, partly because of the recognition that it is, after all, just one isogloss among the multitudes that criss-cross Indo-European linguistic geography, and partly because of the recognition that the Centum languages are not a subgroup: as mentioned above, subgroups are defined by shared innovations, which the Satem languages definitely have, whereas the only thing that the "Centum languages" have in common is staying put.
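
The two treatments of the velar rows can be sketched schematically in Python. This is a deliberately simplified illustration that covers only the voiceless stops; the function names are assumptions, and the sibilant "s" is a stand-in, since the actual outcome of the assibilated palatals differs from branch to branch.

```python
# The three voiceless velar rows reconstructed for PIE (schematic labels).
PALATAL, PLAIN, LABIOVELAR = "ḱ", "k", "kʷ"
ROWS = (PALATAL, PLAIN, LABIOVELAR)

def centum(stop):
    """Centum treatment: palatal merges with plain velar; labiovelar is kept."""
    return PLAIN if stop == PALATAL else stop

def satem(stop):
    """Satem treatment: labiovelar merges with plain velar; palatal assibilates."""
    if stop == LABIOVELAR:
        return PLAIN
    if stop == PALATAL:
        return "s"  # placeholder sibilant; real reflexes vary by branch
    return stop

def outcomes(treatment):
    """Apply one treatment to all three rows and return the mapping."""
    return {stop: treatment(stop) for stop in ROWS}

centum_result = outcomes(centum)
satem_result = outcomes(satem)
print("centum:", centum_result)
print("satem: ", satem_result)
```

In both treatments three original rows are reduced to two distinct outcomes, but by different mergers. The group names themselves reflect this: the word for "hundred", PIE *ḱm̥tóm, appears with a velar stop in Latin centum and with a sibilant in Avestan satəm.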

Suggested superfamilies
Some linguists propose that Indo-European languages form part of a hypothetical Nostratic language superfamily, and attempt to relate Indo-European to other language families, such as South Caucasian languages, Altaic languages, Uralic languages, Dravidian languages, and Afro-Asiatic languages. This theory remains controversial, like the similar Eurasiatic theory of Joseph Greenberg, and the Proto-Pontic postulation of John Colarusso. There are no possible theoretical objections to the existence of such superfamilies; the difficulty comes in finding concrete evidence that transcends chance resemblance and wishful thinking. The main problem for all of them is that in historical linguistics the noise-to-signal ratio steadily worsens over time, and at great enough time-depths it becomes open to reasonable doubt whether it is even possible to tell what is signal and what is noise.

History of the idea of Indo-European
The first proposal of the possibility of common origin for some of these languages came from the Dutch linguist and scholar Marcus Zuerius van Boxhorn in 1647. He discovered the similarity among Indo-European languages, and supposed the existence of a primitive common language which he called "Scythian". He included in his hypothesis Dutch, Greek, Latin, Persian, and German, later adding Slavic, Celtic and Baltic languages. He excluded languages such as Hebrew from his hypothesis. However, the suggestions of van Boxhorn did not become widely known and did not stimulate further research.

The hypothesis re-appeared in 1786 when Sir William Jones first lectured on similarities between four of the oldest languages known in his time: Latin, Greek, Sanskrit, and Persian. Systematic comparison of these and other old languages conducted by Franz Bopp supported this theory, and Bopp's Comparative Grammar, appearing between 1833 and 1852, counts as the starting-point of Indo-European studies as an academic discipline.

Sound changes
As the Proto-Indo-European language broke up, its sound system diverged as well, changing according to various sound laws evidenced in the daughter-languages. Notable cases of such sound laws include Grimm's law in Proto-Germanic, loss of prevocalic *p- in Proto-Celtic, loss of prevocalic *s- in Proto-Greek, Brugmann's law in Proto-Indo-Iranian, as well as satemization (discussed above). Grassmann's law and Bartholomae's law may or may not have operated at the common Indo-European stage.
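
A sound law such as Grimm's law can be modelled as a mapping applied to every segment of a form at once. The following is a minimal Python sketch under stated assumptions: it encodes only the textbook core of the shift (voiceless stops to fricatives, voiced stops to voiceless stops, voiced aspirates to plain voiced stops), ignores vowels, exceptions and later changes, and takes forms as pre-segmented lists. The inputs are schematic consonant skeletons, not full reconstructions, and the name `apply_grimm` is an assumption made for this illustration.

```python
# Textbook core of Grimm's law, segment by segment.
GRIMM = {
    # voiceless stops become fricatives
    "p": "f", "t": "þ", "k": "h",
    # voiced stops become voiceless stops
    "b": "p", "d": "t", "g": "k",
    # voiced aspirates become plain voiced stops
    "bh": "b", "dh": "d", "gh": "g",
}

def apply_grimm(segments):
    """Shift each segment once; segments not covered by the law pass through."""
    return [GRIMM.get(segment, segment) for segment in segments]

# Schematic skeletons: compare the initial consonants of English
# "ten" and "bear" with the unshifted d- and bh-.
ten = apply_grimm(["d", "e", "k", "m"])
bear = apply_grimm(["bh", "e", "r"])
print(ten, bear)
```

Each segment is looked up exactly once, so the output of one part of the shift is never fed into another: an original *bh becomes b and stops there, rather than continuing on to p. Applying the three changes one after another to the same string would get this wrong.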

Indo-European expansion
The earliest attestations of Indo-European languages date to the early 2nd millennium BC. At that time, the languages were already diversified and widely distributed, so that "loss of contact" between the individual dialects is accepted to have taken place before 2500 BC. Competing scenarios for the early history of Indo-European are thus largely compatible for times after 2500 BC, even if they are incommensurable for the 4th millennium BC and earlier. The following timeline inserts the scenario suggested by the mainstream Kurgan hypothesis for the mid 5th to mid 3rd millennia (see below for competing hypotheses).


 * 4500 - 4000: Early PIE. Sredny Stog, Dnieper-Donets and Samara cultures, domestication of the horse.
 * 4000 - 3500: The Yamna culture (prototypical kurgan-building) emerges in the steppe, and the Maykop culture in the northern Caucasus. Indo-Hittite models postulate the separation of Proto-Anatolian before this time.
 * 3500 - 3000: Middle PIE. The Yamna culture reaches its peak: it represents the classical reconstructed Proto-Indo-European society, with stone idols and early two-wheeled proto-chariots, predominantly practising animal husbandry but also with permanent settlements and hillforts along rivers, subsisting on agriculture and fishing. Contact of the Yamna culture with late Neolithic European cultures results in the "kurganized" Globular Amphora and Baden cultures. The Maykop culture shows the earliest evidence of the early Bronze Age, and bronze weapons and artifacts enter Yamna territory. Probable early Satemization.
 * 3000 - 2500: Late PIE. The Yamna culture extends over the entire Pontic steppe. The Corded Ware culture extends from the Rhine to the Volga, corresponding to the latest phase of Indo-European unity, the vast "kurganized" area disintegrating into various independent languages and cultures, but still in loose contact and thus enabling the spread of technology and early loans between the groups (except for the Anatolian and Tocharian branches, already isolated from these processes). The Centum-Satem division has probably run its course, but the phonetic trends of Satemization remain active.
 * 2500 - 2000: The breakup into the proto-languages of the attested dialects has done its work. Speakers of Proto-Greek live in the Balkans, speakers of Proto-Indo-Iranian north of the Caspian in the Sintashta-Petrovka culture. The Bronze Age reaches Central Europe with the Beaker culture, whose people probably use various Centum dialects. Proto-Balto-Slavic speakers (or alternatively, Proto-Slavic and Proto-Baltic communities in close contact) emerge in north-eastern Europe. The Tarim mummies possibly correspond to proto-Tocharians.
 * 2000 - 1500: Invention of the chariot, which leads to the split and rapid spread of Iranian and Indo-Aryan from the Andronovo culture and the Bactria-Margiana Archaeological Complex over much of Central Asia, Northern India, Iran and Eastern Anatolia. Proto-Anatolian splits into Hittite and Luwian. The pre-Proto-Celtic Unetice culture has an active metal industry (Nebra skydisk).
 * 1500 - 1000: The Nordic Bronze Age develops (pre-)Proto-Germanic, and the (pre-)Proto-Celtic Urnfield and Hallstatt cultures emerge in Central Europe, introducing the Iron Age. Proto-Italic migration into the Italian peninsula. Redaction of the Rigveda and rise of the Vedic civilization in the Punjab. Flourishing and decline of the Hittite Empire. The Mycenaean civilization gives way to the Greek Dark Ages.
 * 1000 BC - 500 BC: The Celtic languages spread over Central and Western Europe. Northern Europe enters the Pre-Roman Iron Age, the formative phase of Proto-Germanic. Homer initiates Greek literature and early Classical Antiquity. The Vedic civilization gives way to the Mahajanapadas. Zoroaster composes the Gathas; rise of the Achaemenid Empire, replacing the Elamites and Babylonia. The Scythians supplant the Cimmerians (Srubna culture) in the Pontic steppe. Armenians succeed the Urartu culture. Separation of Proto-Italic into Osco-Umbrian and Latin-Faliscan, and foundation of Rome. Genesis of the Greek and Old Italic alphabets. A variety of Paleo-Balkan languages have speakers in Southern Europe. The Anatolian languages suffer extinction.

Location hypotheses
Scholars have dubbed the common ancestral (reconstructed) language Proto-Indo-European (PIE). They disagree as to the original geographic location (the so-called "Urheimat" or "original homeland") in which it was spoken. Mainstream opinion locates PIE in the Pontic-Caspian steppe in the Chalcolithic (from ca. 4000 BC; see Kurgan hypothesis). The main competitor is the Anatolian hypothesis (advanced by Colin Renfrew), which dates PIE to several millennia earlier and associates the spread of Indo-European languages with the Neolithic spread of farming (see Indo-Hittite).

Kurgan hypothesis
The Kurgan hypothesis was introduced by Marija Gimbutas in 1956 in order to combine archaeology with linguistics in locating the origins of the Proto-Indo-European (PIE) speaking peoples. She tentatively named the set of cultures in question "Kurgan" after the Russian term for their distinctive burial mounds and traced their diffusion into Europe.

This hypothesis has had a significant impact on Indo-European research. Those scholars who follow Gimbutas identify a Kurgan or Pit Grave culture as reflecting an early Proto-Indo-European ethnicity which existed in the Pontic steppe and southeastern Europe from the fifth to third millennia BC.

Anatolian hypothesis
In 1987 Colin Renfrew suggested an association between the spread of Indo-European and the Neolithic revolution, with the languages spreading peacefully into Europe from Asia Minor (Anatolia) from around 7000 BC with the advance of farming (wave of advance). Accordingly, all the inhabitants of Neolithic Europe would have spoken Indo-European tongues, and the Kurgan migrations would at best have replaced Indo-European dialects with other Indo-European dialects.

According to Renfrew, the spread of Indo-European proceeded from "Pre-Proto-Indo-European" in 6500 BC to Archaic PIE in 5000 BC, with the historical Indo-European families developing from 3000 BC from "Balkan PIE".

The main strength of the farming hypothesis lies in its linking of the spread of Indo-European languages with an archeologically known event that likely involved major population shifts: the spread of farming (though the validity of basing a linguistic theory on archeological evidence remains disputed).

While the Anatolian theory enjoyed brief support when first proposed, the linguistic community in general now rejects it. While the spread of farming undisputedly constituted an important event, most see no case to connect it with Indo-Europeans in particular, since terms for animal husbandry tend to have much better reconstructions than terms related to agriculture.

Other hypotheses
The Armenian hypothesis of Tamaz Gamkrelidze and Vyacheslav V. Ivanov in 1984 placed the Indo-European homeland on Lake Urmia, suggesting that Armenian stayed in the Indo-European cradle while other Indo-European languages left the homeland and migrated on a route that led them along the eastern coast of the Caspian Sea to the steppe north of the Black Sea. Gamkrelidze and Ivanov also originated the Glottalic theory.

An Out of India theory is sometimes advanced, mostly by Indian authors, who see the Indus Valley Civilization as the location of either Proto-Indo-European or of Proto-Indo-Iranian.

Various nationalistic European groups in the 19th and early 20th centuries espoused other theories, typically locating Proto-Indo-European in the respective authors' own countries. For example, a suggested location of the proto-language in Northern Europe became involved in justifying the view of the German people as "Aryan". For a modern version of the hypothesis of a European origin of PIE, see the Paleolithic Continuity Theory (proposed by Italian theorists), which derives Indo-European from the European Paleolithic cultures.

Some people have pointed to the Black Sea deluge theory, dating the genesis of the Sea of Azov to ca. 5600 BC, as a direct cause of Indo-European expansion. This event occurred in still clearly Neolithic times and happened rather too early to fit with Kurgan archaeology. One can still imagine it as an event in the remote past of the Sredny Stog culture, with the people living on the land now beneath the Sea of Azov as possible pre-Proto-Indo-Europeans.

Databases

 * The Indo-European Database
 * IE language family overview (SIL)
 * Indo-European at the LLOW-database
 * Indo-European Documentation Center at the University of Texas at Austin

Lexicon

 * Indo-European Roots, from the American Heritage Dictionary.
 * Indo-European Root/lemmas (by Andi Zeneli)