Anaphora (linguistics)

In linguistics, anaphora is an instance of an expression referring to another. Usually, an anaphoric expression is represented by a pro-form or some other kind of deictic, for instance a pronoun referring to its antecedent. The term anaphor, an English singular variant, is sometimes used to designate an individual use: "an anaphor is a linguistic entity which indicates a referential tie to some other linguistic entity in the same text."

Anaphora is an important concept for several reasons and on several levels. First, anaphora indicates "how discourse is constructed and maintained". Second, on the level of the sentence, anaphora binds different syntactic elements together. Third, in computational linguistics anaphora presents a challenge to natural language processing, since identifying the referent can be difficult. Fourth, anaphora "tells us some things about how language is understood, and processed", which is relevant to fields of linguistics interested in cognitive psychology.

Nomenclature and definition
The term anaphora is used in two ways. In a strict sense, it is reserved for references to preceding utterances (backward reference); the antecedent can be an expression of any kind, such as a noun (see examples below). In this sense, anaphora is contrasted with cataphora, in which the reference points forward (the cataphoric expression refers to a succeeding rather than a preceding utterance). Both effects together are called endophora. In a more generic sense, the term anaphora includes all of these referential effects, a use generally accepted since the "ground-breaking" work of M. Halliday and R. Hasan in Cohesion in English (Longman, 1976).

Thus, in endophora, reference is made to something inside of the text in which the reference is found.
 * In anaphora, as opposed to cataphora, reference is made to something within a text that has been previously identified. For example, in "Susan dropped the plate. It shattered loudly", the word "it" refers to the phrase "the plate".
 * In cataphora, reference is made to something within a text that has not yet been identified. For example, in "Because he was very cold, David put on his coat", the identity of "he" is unknown until the individual is also referred to as "David".

Two other, related, kinds of reference are noteworthy:
 * An exophoric reference refers to something outside of the text in which the reference is found, such as an object or situation in the speaker's surroundings.
 * A homophoric reference is a generic phrase that obtains a specific meaning through knowledge of its context. For example, the referent of the phrase "the Queen" must be determined by the context of the utterance, which identifies the queen in question.
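
The distinction between anaphora, cataphora and exophora depends only on where the referent sits relative to the referring expression. The following minimal sketch makes that explicit. It assumes the referent has already been identified and is given as a token position; the function name classify_reference and the position-based representation are illustrative assumptions, not an established API.

```python
from typing import Optional


def classify_reference(expr_pos: int, referent_pos: Optional[int]) -> str:
    """Classify a referring expression by where its referent sits.

    expr_pos and referent_pos are hypothetical token indices in the text;
    referent_pos is None when the referent is not mentioned in the text.
    """
    if referent_pos is None:
        return "exophora"       # referent lies outside the text
    if referent_pos == expr_pos:
        raise ValueError("an expression cannot be its own referent")
    if referent_pos < expr_pos:
        return "anaphora"       # strict sense: backward reference
    return "cataphora"          # forward reference


# "Susan dropped the plate. It shattered loudly"
tokens = "Susan dropped the plate . It shattered loudly".split()
print(classify_reference(tokens.index("It"), tokens.index("plate")))

# "Because he was very cold, David put on his coat"
tokens = "Because he was very cold , David put on his coat".split()
print(classify_reference(tokens.index("he"), tokens.index("David")))
```

The sketch deliberately leaves out the hard part, which is finding the referent in the first place; it only shows how the three labels follow once the referent is known.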

Examples

 * The monkey took the banana and ate it. "It" is anaphoric under the strict definition (it refers to the banana).
 * Pam went home because she felt sick. "She" is anaphoric (it refers to Pam).
 * What is this? "This" can be considered exophoric (it refers to some object or situation near the speaker).
 * The dog ate the bird and it died. "It" is anaphoric and ambiguous (did the dog or bird die?).

Anaphora in generative grammar
In generative grammar, the term anaphor is used to refer to the reflexive and reciprocal pronouns of English, and to analogous forms in other languages. Anaphors in this sense must have strictly local antecedents, because they receive their reference via the local syntactic operation (or rule of interpretation) known as binding.

Reflexive anaphors must obey binding condition A, which states that "a reflexive pronoun must be bound within the smallest category containing it, its selecting head and a subject (=its governing category, or GC)". In the following sentence: *John thought that she saw himself, the GC of the reflexive 'himself' is the embedded clause, since it contains the anaphor itself, its selecting head (saw) and a subject (she). The only noun phrase in the GC that could bind 'himself' is 'she', but this is ruled out because of the gender mismatch. 'John' agrees in gender but lies outside the GC. The anaphor is therefore left unbound, which violates condition A and explains the sentence's ungrammaticality.
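
The reasoning above can be sketched as a small agreement check. This is a heavily simplified illustration, not a full implementation of binding theory: it assumes the governing category has already been computed and is passed in as a list of candidate noun phrases, and it ignores structural conditions such as c-command. The NP class and the find_binder function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NP:
    form: str
    gender: str  # "m" or "f"
    number: str  # "sg" or "pl"


def find_binder(reflexive: NP, gc_nps: List[NP]) -> Optional[NP]:
    """Return the first NP inside the governing category that agrees
    with the reflexive in gender and number, or None if it is unbound."""
    for np in gc_nps:
        if (np.gender, np.number) == (reflexive.gender, reflexive.number):
            return np
    return None


she = NP("she", "f", "sg")
himself = NP("himself", "m", "sg")
herself = NP("herself", "f", "sg")

# *John thought that she saw himself: the GC contains only "she".
print(find_binder(himself, [she]))  # None: unbound, condition A is violated

# John thought that she saw herself: "she" agrees and binds the reflexive.
print(find_binder(herself, [she]))
```

Note that 'John' is never offered as a candidate in the first call, because it lies outside the governing category; that locality restriction, rather than the agreement check, is what condition A contributes.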

Anaphor resolution
The resolution of an anaphor means finding what it is referring to. Often, the relation between an anaphor and its antecedent is found via inference:


 * We found a house to rent, but the kitchen was very small.

The inference made is that the kitchen is a part of the house discussed in the first clause.

Resolution can be difficult when sentences are taken out of context:


 * The Prime Minister of New Zealand visited us yesterday. The visit was the first time she had come to New York since 1998.

If the second sentence is quoted by itself, it is necessary to resolve the anaphor:


 * The visit was the first time the Prime Minister of New Zealand had come to New York since 1998.

Even this substitution can mislead. "The Prime Minister of New Zealand" names an office of state, whereas "she" refers to the person currently occupying that office. It could easily be that a previous Prime Minister of New Zealand had visited New York since 1998, while the present incumbent had not.

However, even when sentences are taken in context, anaphor resolution can be complex. Consider the following three examples:


 * We gave the bananas to the monkeys because they were hungry.
 * We gave the bananas to the monkeys because they were ripe.
 * We gave the bananas to the monkeys because they were here.

In the first sentence, "they" refers to "monkeys", whereas in the second sentence, "they" refers to "bananas". Resolving this ambiguity requires the semantic understanding that monkeys get hungry, while bananas become ripe. Because this kind of understanding is difficult to implement in software, automated anaphora resolution is an area of active research within natural language processing. The third sentence cannot easily be resolved either way, since both the monkeys and the bananas could be "here".
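
The role of semantic knowledge in the three sentences can be illustrated with a toy resolver. This is only a sketch under strong assumptions: the PLAUSIBLE table is a hand-written, hypothetical lexicon that records which predicates can sensibly describe which nouns, standing in for the world knowledge a real system would need to acquire. The function name resolve_they is likewise an assumption for this example.

```python
# Hypothetical hand-written lexicon: which nouns each predicate can describe.
PLAUSIBLE = {
    "hungry": {"monkeys"},
    "ripe": {"bananas"},
    "here": {"monkeys", "bananas"},
}


def resolve_they(candidates, predicate):
    """Return the candidate antecedents that the predicate can plausibly
    describe. More than one result means the pronoun stays ambiguous."""
    # Unknown predicates rule nothing out, so every candidate survives.
    allowed = PLAUSIBLE.get(predicate, set(candidates))
    return [c for c in candidates if c in allowed]


candidates = ["bananas", "monkeys"]
for predicate in ("hungry", "ripe", "here"):
    print(predicate, "->", resolve_they(candidates, predicate))
# hungry -> ['monkeys']
# ripe -> ['bananas']
# here -> ['bananas', 'monkeys']
```

The first two predicates each leave a single antecedent, while "here" leaves both, mirroring the unresolved third sentence. The difficulty in practice lies in obtaining something like the PLAUSIBLE table at scale, not in the filtering step.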

Complement anaphora
In some special cases, an anaphor may refer not to its usual antecedent, but to its complement set. This phenomenon was first extensively studied in a series of psycholinguistic experiments in the early 1990s.

In (1), the anaphoric pronoun 'they' refers to the children who ate their ice-cream. By contrast, in (2), 'they' seems to refer to the children who did not eat their ice-cream.
 * (1) Only a few of the children ate their ice-cream. They ate the strawberry flavour first.
 * (2) Only a few of the children ate their ice-cream. They threw it around the room instead.

The fact that sentences like (2) exist in the language seems odd at first: by definition, an anaphoric pronoun must refer to some noun that has already been introduced into the discourse. In complement anaphora cases, since the referent of the pronoun has not been previously introduced, it is difficult to explain how something can refer to it. In the first sentence of (2), the set of ice-cream-eating-children is introduced into the discourse; but then the pronoun 'they' refers to the set of non-ice-cream-eating-children, a set which has not been previously mentioned. One resolution of this problem is that 'they' refers to all the children, but the second sentence semantically excludes the children who ate ice cream, since children who ate their ice cream cannot throw it around the room.

Several accounts of this phenomenon are found in the literature, based on both semantic and pragmatic considerations. The most important point of debate is whether the pronoun in (2) refers to the complement set (i.e. only to the set of non-ice-cream-eating-children), or to the maximal set (i.e. to all the children, while discounting the minority group). The answer may have theoretical consequences regarding the kind of information that the brain is able to access or calculate, and also practical consequences regarding the way a theory of anaphora resolution should be devised.
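
The two competing analyses can be made concrete with ordinary set operations. The sketch below is a toy model only: the children's names and the split between those who ate and those who did not are invented for illustration, and the threw_ice_cream predicate simply encodes the assumption stated above, that a child who ate their ice-cream cannot throw it around the room.

```python
# Hypothetical toy model of example (2).
children = {"Ann", "Ben", "Cal", "Dee", "Eve"}  # maximal set: all the children
ate = {"Ann", "Ben"}                            # reference set: the few who ate

reference_set = ate
complement_set = children - ate                 # those who did not eat
maximal_set = children


def threw_ice_cream(child):
    # Assumption: a child who ate their ice-cream cannot throw it around.
    return child not in ate


# Complement-set analysis: 'they' picks out the complement set directly.
complement_reading = complement_set

# Maximal-set analysis: 'they' picks out all the children, and the
# predicate of the second sentence filters out those who ate.
maximal_reading = {c for c in maximal_set if threw_ice_cream(c)}

print(sorted(complement_reading))
print(sorted(maximal_reading))
print(complement_reading == maximal_reading)
```

In this toy model both analyses end up with the same individuals, so the model by itself cannot tell the two accounts apart; they differ in what the pronoun is taken to denote, not in who is described as throwing ice-cream.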