Morphology (language)

Morphology is a subdiscipline of linguistics that studies word structure. While words are generally accepted as being the smallest units of syntax, it is clear that in most (if not all) languages, words can be related to other words by rules. For example, any English speaker can see that the words dog, dogs and dog-catcher are closely related. English speakers can also recognize that these relations can be formulated as rules that can apply to many, many other pairs of words. Dog is to dogs just as cat is to cats, or encyclopædia is to encyclopædias; dog is to dog-catcher as dish is to dishwasher. The rule in the first case is plural formation; in the second case, a transitive verb and a noun playing the role of its object can form a word. Morphology is the branch of linguistics that studies such rules across and within languages.

The term was coined by August Schleicher in 1859: Für die Lehre von der Wortform wähle ich das Wort "Morphologie" ("for the science of word formation, I choose the term 'morphology'", Mémoires Acad. Impériale 7/1/7, 35).

Lexemes and word forms
The word "word" is ambiguous in common usage. To take up again the example of dog vs. dogs, there is one sense in which these two are the same "word" (they are both nouns that refer to the same kind of animal, differing only in number), and another sense in which they are different words (they can't generally be used in the same sentences without altering other words to fit; for example, the verbs is and are in The dog is happy and The dogs are happy).

The distinction between these two senses of "word" is probably the most important one in morphology. The first sense of "word," the one in which dog and dogs are "the same word," is called a lexeme. The second sense is called a word form. We thus say that dog and dogs are different forms of the same lexeme. Dog and dog-catcher, on the other hand, are different lexemes; for example, they refer to two different kinds of entities. The word form that is conventionally chosen to represent a lexeme is called its lemma or citation form.
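The relationship between lexemes, word forms and lemmas can be sketched as a small data structure. The following Python is only an illustration; the class name Lexeme, the helper same_lexeme and the two-entry lexicon are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    """A lexeme: one lemma (citation form) plus the set of its word forms."""
    lemma: str
    forms: frozenset


# Hypothetical mini-lexicon built from the examples in the text.
DOG = Lexeme("dog", frozenset({"dog", "dogs"}))
DOG_CATCHER = Lexeme("dog-catcher", frozenset({"dog-catcher", "dog-catchers"}))
LEXICON = [DOG, DOG_CATCHER]


def same_lexeme(a, b, lexicon):
    """Return True if word forms a and b belong to one lexeme in the lexicon."""
    return any(a in lx.forms and b in lx.forms for lx in lexicon)


print(same_lexeme("dog", "dogs", LEXICON))         # forms of one lexeme
print(same_lexeme("dog", "dog-catcher", LEXICON))  # two different lexemes
```

In this sketch, dog and dogs are found inside the same Lexeme object, while dog and dog-catcher are not.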

Inflection vs. word-formation
Given the notion of a lexeme, it is possible to distinguish two kinds of morphological rules. Some morphological rules relate different forms of the same lexeme, while other rules relate two different lexemes. Rules of the first kind are called inflectional rules, while those of the second kind are called word-formation rules. The English plural, as illustrated by dog and dogs, is an inflectional rule; compounds like dog-catcher or dishwasher are examples of a word-formation rule. Informally, word-formation forms "new words" (that is, new lexemes), while inflection yields more forms of the "same" word (lexeme).

There is a further distinction between two kinds of word-formation: derivation and compounding. Compounding is a kind of word-formation which involves combining complete word forms into a compound; dog-catcher is a compound, because both dog and catcher are words. Derivation involves suffixes or prefixes that are not independent words; the word independent is derived from the word dependent by prefixing it with the derivational prefix in-, and dependent itself is derived from the verb depend.

The distinction between inflection and word-formation is not at all clear-cut. There are many examples where linguists fail to agree whether a given rule is inflection or word-formation. However, the next section will clarify this distinction further.

Paradigms and morphosyntax
The notion of a paradigm is closely related to that of inflection. The paradigm of a lexeme is the set of all of its word forms, organized by their grammatical categories. The familiar examples of paradigms are the conjugations of verbs, and the declensions of nouns. The word forms of a lexeme can usually be arranged into tables, by classifying them by shared features such as tense, aspect, mood, number, gender or case. For example, the personal pronouns in English can be organized into tables, using the categories of person, number, gender and case.
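As a rough illustration of such a table, the English personal pronouns can be stored in a mapping keyed by their grammatical categories. This is a sketch under simplifying assumptions: only subject ("nom") and object ("acc") forms are listed, gender is recorded only in the third person singular, and the names PRONOUNS and pronoun are invented for the example.

```python
# Keys are (person, number, gender, case); gender is None where English
# does not distinguish it.
PRONOUNS = {
    (1, "sg", None, "nom"): "I",    (1, "sg", None, "acc"): "me",
    (2, "sg", None, "nom"): "you",  (2, "sg", None, "acc"): "you",
    (3, "sg", "m", "nom"): "he",    (3, "sg", "m", "acc"): "him",
    (3, "sg", "f", "nom"): "she",   (3, "sg", "f", "acc"): "her",
    (3, "sg", "n", "nom"): "it",    (3, "sg", "n", "acc"): "it",
    (1, "pl", None, "nom"): "we",   (1, "pl", None, "acc"): "us",
    (2, "pl", None, "nom"): "you",  (2, "pl", None, "acc"): "you",
    (3, "pl", None, "nom"): "they", (3, "pl", None, "acc"): "them",
}


def pronoun(person, number, case, gender=None):
    """Look up one cell of the paradigm."""
    return PRONOUNS[(person, number, gender, case)]


print(pronoun(3, "sg", "acc", gender="f"))
print(pronoun(1, "pl", "nom"))
```

Each key picks out exactly one cell of the paradigm, which is what arranging word forms "into tables" amounts to.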

The categories used to group word forms into paradigms cannot be chosen arbitrarily; they must be categories that are relevant to stating the syntactic rules of the language. For example, person and number are categories that can be used to define paradigms in English, because English has grammatical agreement rules that require the verb in a sentence to appear in an inflectional form that matches the person and number of the subject. In other words, the syntactic rules of English care about the difference between dog and dogs, because it determines which form of the verb must be used; but in contrast, no syntactic rule of English cares about the difference between dog and dog-catcher, or dependent and independent. The first two are just nouns, and the second two just adjectives, and they generally behave like any other noun or adjective behaves.

The major difference between inflection and word-formation is that inflectional forms of lexemes are organized into paradigms, which are defined by the requirements of syntactic rules. The part of morphology that covers the relationship between syntax and morphology is called morphosyntax, and it concerns itself with inflection and paradigms, but not with word-formation (derivation and compounding).

Allomorphy and morphophonology
In the exposition above, morphological rules are described as analogies between word forms: dog is to dogs as cat is to cats, and as dish is to dishes. In this case, the analogy applies both to the meaning of the words and to their forms: in each pair, the word on the left always means "one of X" and the one on the right "many of X", and the distinction is always signaled by the plural form having an -s at the end, which the singular does not have.

One of the largest sources of complexity in morphology is that this sort of one-to-one correspondence between meaning and form hardly ever holds. In English, we have word form pairs like ox/oxen, goose/geese, and sheep/sheep, where the difference between the singular and the plural is signaled in a different way from the regular pattern, or not signaled at all. Even the case we consider "regular", with the final -s, is not quite that simple; the -s in dogs is not pronounced the same way as the -s in cats, and in a plural like dishes, we have an "extra" vowel before the -s. These cases, where the same distinction is effected by different changes of form for different lexemes, are called allomorphy.

There are several kinds of allomorphy. One is pure allomorphy, where the allomorphs are just arbitrary. The most extreme cases here are called suppletion, where two forms related by a morphological rule are just arbitrarily different: for example, the past of go is went, which is a suppletive form.

On the other hand, other kinds of allomorphy are due to interaction between morphology and phonology. Phonological rules constrain which sounds can appear next to each other in a language, and morphological rules, when applied blindly, would often violate phonological rules, by resulting in impossible sound sequences. For example, if we were to try to form the plural of dish by just putting a -s at the end, we'd get *dishs, which is not permitted by the phonology; to "rescue" the word, we put a vowel sound in between, and get dishes. Similar rules apply to the pronunciation of the -s in dogs and cats: it depends on the quality (voiced vs. unvoiced) of the preceding phoneme.
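The conditioning of the regular English plural can be sketched as a small function that chooses an allomorph from the final phoneme of the stem. This is a simplified illustration: the phoneme sets below are the usual textbook classes written in IPA, the input is assumed to be a single phoneme symbol supplied by the caller, and the names SIBILANTS, VOICELESS and plural_allomorph are invented for the example.

```python
# Sibilants trigger the "rescue" vowel; other voiceless consonants take /s/;
# everything else (voiced consonants and vowels) takes /z/.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}


def plural_allomorph(final_phoneme):
    """Return the pronunciation of plural -s after the given final phoneme."""
    if final_phoneme in SIBILANTS:
        return "ɪz"   # dish -> dishes
    if final_phoneme in VOICELESS:
        return "s"    # cat -> cats
    return "z"        # dog -> dogs


for word, last in [("dog", "g"), ("cat", "t"), ("dish", "ʃ")]:
    print(word, "->", plural_allomorph(last))
```

The order of the checks matters: sibilants such as /s/ are themselves voiceless, so they must be tested before the voiceless class.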

The study of allomorphy that results from the interaction of morphology and phonology is called morphophonology. Many morphophonological rules fall under the category of sandhi.

Lexical morphology
Lexical morphology is the branch of morphology that deals with the lexicon, which, morphologically conceived, is the collection of lexemes in a language. As such, it concerns itself primarily with word-formation: derivation and compounding.

Models of morphology
There are three major families of approaches to morphology, which try to capture the distinctions above in different ways. The association between the two concepts paired in each item of the following list is very strong, but it is not absolute. The three families are:
 * Morpheme-based morphology, which makes use of an Item-and-Arrangement approach.
 * Lexeme-based morphology, which normally makes use of an Item-and-Process approach.
 * Word-based morphology, which normally makes use of a Word-and-Paradigm approach.

Morpheme-based morphology
In morpheme-based morphology, word forms are analyzed as sequences of morphemes. A morpheme is defined as the minimal meaningful unit of a language. In a word like independently, we say that the morphemes are in-, depend, -ent, and -ly; depend is the root and the other morphemes are, in this case, derivational affixes. In a word like dogs, we say that dog is the root, and that -s is an inflectional morpheme. This way of analyzing word forms, as if they were made of morphemes put one after another like beads on a string, is called Item-and-Arrangement.
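The "beads on a string" picture can be made concrete with a short sketch in which a word form is simply a list of labeled morphemes. The names Morpheme and spell, and the three kind labels, are assumptions chosen for this illustration.

```python
from typing import NamedTuple


class Morpheme(NamedTuple):
    form: str   # the phonological/orthographic material
    kind: str   # "root", "derivational" or "inflectional"


# The two analyses given in the text.
INDEPENDENTLY = [
    Morpheme("in", "derivational"),
    Morpheme("depend", "root"),
    Morpheme("ent", "derivational"),
    Morpheme("ly", "derivational"),
]
DOGS = [Morpheme("dog", "root"), Morpheme("s", "inflectional")]


def spell(morphemes):
    """Item-and-Arrangement: a word form is its morphemes in sequence."""
    return "".join(m.form for m in morphemes)


print(spell(INDEPENDENTLY))
print(spell(DOGS))
```

Nothing in this representation can alter the inside of a morpheme; the only operation is concatenation.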

The morpheme-based approach is the first one that beginners to morphology usually think of, and which laymen tend to find the most obvious. This is so to such an extent that very often beginners think that morphemes are an inevitable, fundamental notion of morphology; and many five-minute explanations of morphology are, in fact, five-minute explanations of morpheme-based morphology. This is, however, not so; the fundamental idea of morphology is that the words of a language are related to each other by different kinds of rules. Analyzing words as sequences of morphemes is a way of describing these relations, but is not the only way. In actual academic linguistics, morpheme-based morphology certainly has many adherents, but is by no means absolutely dominant.

Applying a morpheme-based model strictly quickly leads to complications when one tries to analyze many forms of allomorphy. For example, it's easy to think that in dogs, we have the root dog, followed by the plural morpheme -s; the same sort of analysis is also straightforward for oxen, with the root ox, and an irregular plural morpheme -en. But then, how do we "split up" the word geese into root + plural morpheme? How do we do so for sheep?

Theorists who wish to maintain a strict morpheme-based approach often preserve the idea in cases like these by saying that geese is goose followed by a null morpheme (a morpheme that has no phonological content), and that the vowel change in the stem is a morphophonological rule. It is also common for morpheme-based analyses to posit null morphemes even in the absence of any allomorphy. For example, if the plural noun dogs is analyzed as a root dog followed by a plural morpheme -s, then one might analyze the singular dog as the root dog followed by a null morpheme for the singular.

Lexeme-based morphology
Lexeme-based morphology is (usually) an Item-and-Process approach. Instead of analyzing a word form as a set of morphemes arranged in sequence, we think of a word form as the result of applying rules that alter a word form or stem to produce a new one. An inflectional rule takes a stem, makes some changes to it, and outputs a word form; a derivational rule takes a stem, and outputs a derived stem; a compounding rule takes word forms, and outputs a compound stem.

The Item-and-Process approach bypasses the difficulty described above for Item-and-Arrangement approaches. Faced with a plural like geese, we don't have to assume there is a null morpheme; all we say is that while the plural of dog is formed by adding an -s to the end, the plural of goose is formed by changing the vowel in the stem.
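In an Item-and-Process view, each rule can be thought of as a function from a stem to a word form. The sketch below works on spelling rather than sound and ignores orthographic adjustments (it would produce "dishs" for dish); the function names and the per-lexeme table PLURAL_RULE are assumptions made for the illustration.

```python
def add_s(stem):
    """Regular plural: add -s to the end of the stem."""
    return stem + "s"


def change_vowel(stem):
    """Plural by stem-vowel change, as in goose -> geese."""
    return stem.replace("oo", "ee", 1)


def no_change(stem):
    """Plural identical to the singular, as in sheep."""
    return stem


# Lexemes that do not follow the regular process name their own rule.
PLURAL_RULE = {"goose": change_vowel, "sheep": no_change}


def pluralize(stem):
    """Apply the lexeme's plural process, defaulting to the regular one."""
    return PLURAL_RULE.get(stem, add_s)(stem)


for noun in ["dog", "goose", "sheep"]:
    print(noun, "->", pluralize(noun))
```

No null morpheme is needed here: geese and sheep are the outputs of processes, not sequences of pieces.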

Word-based morphology
Word-based morphology is (usually) a Word-and-Paradigm approach. This kind of theory takes paradigms as a central notion. Instead of stating rules to combine morphemes into word forms, or to generate word forms from stems, word-based morphology states generalizations that hold between the forms of inflectional paradigms. The major point behind this approach is that many such generalizations are hard to state with either of the other approaches. The examples are usually drawn from fusional languages, where a given "piece" of a word, which a morpheme-based theory would call an inflectional morpheme, corresponds to a combination of grammatical categories, for example, "third person plural." Morpheme-based theories usually have no problems with this situation, since one just says that a given morpheme has two categories. Item-and-Process theories, on the other hand, often break down in cases like these, because they all too often assume that there will be two separate rules here, one for third person, and the other for plural, but the distinction between them turns out to be artificial. Word-and-Paradigm approaches treat these as whole words that are related to each other by analogical rules. Words can be categorized based on the pattern that they fit into. This applies both to existing words and to new ones. Application of a different pattern than the one that was used historically can give rise to a new word, such as older replacing elder (where older follows the normal pattern of adjectival comparatives) and cows replacing kine (where cows fits the regular pattern of plural formation). While a Word-and-Paradigm approach can explain this easily, other approaches have difficulty with phenomena such as this.
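The analogical reasoning behind forms like cows and older can be approximated by solving a proportion of the form a : b :: c : ?. The sketch below is a deliberately crude, spelling-based approximation that only handles changes at the end of a word; the function name solve_analogy is an assumption made for this example.

```python
def solve_analogy(a, b, c):
    """Solve a : b :: c : ? by transferring the word-final change from a to b.

    Returns None when c does not end the way a does, i.e. when the
    pattern relating a and b does not apply to c.
    """
    # Length of the prefix shared by a and b.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    old_ending, new_ending = a[i:], b[i:]
    if not c.endswith(old_ending):
        return None
    return c[:len(c) - len(old_ending)] + new_ending


print(solve_analogy("dog", "dogs", "cow"))      # regular plural pattern
print(solve_analogy("cold", "colder", "old"))   # regular comparative pattern
print(solve_analogy("goose", "geese", "cat"))   # pattern does not apply
```

The whole words dog and dogs serve as the model here; no stem or morpheme is ever isolated.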

Morphological typology
See the main article, morphological typology

In the 19th century, philologists devised a now classic classification of languages in terms of their morphology. According to this typology, some languages are isolating, and have little or no morphology; others are agglutinative, and their words tend to have lots of easily-separable morphemes; while yet others are fusional, because their inflectional morphemes are said to be "fused" together. The classic example of an isolating language is Chinese; the classic example of an agglutinative language is Turkish; both Latin and Greek are classic examples of fusional languages.

When one considers the variability of the world's languages, it becomes clear that this classification is not at all clear-cut, and many languages don't neatly fit any one of these types. Moreover, examined in the light of the three general models of morphology described above, it is also clear that the classification is very much biased towards a morpheme-based conception of morphology. It makes direct use of the notion of morpheme in the definition of agglutinative and fusional languages. It describes the latter as having separate morphemes "fused" together (which often does correspond to the history of the language, but not to its synchronic reality).

The three models of morphology stem from attempts to analyze languages that more or less match different categories in this typology. The Item-and-Arrangement approach fits very naturally with agglutinative languages; while the Item-and-Process and Word-and-Paradigm approaches usually address fusional languages.

The reader should also note that the classical typology mostly applies to inflectional morphology. There is very little fusion going on with word-formation. Languages may be classified as synthetic or analytic in their word-formation, depending on the preferred way of expressing notions that are not inflectional: either by using word-formation (synthetic), or by using syntactic phrases (analytic).