Tree-adjoining grammar

Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi which is often used in computational linguistics and natural language processing. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees (see tree (graph theory) and tree data structure).

The rules in a TAG (known as auxiliary trees) are trees with a special leaf node known as the foot node. The root (top) node and foot node must be labeled with the same symbol.

A TAG derivation starts with an initial tree (analogous to the start symbol in a context-free grammar). Rewriting is accomplished via adjunction, where an auxiliary tree is adjoined at a node (typically a non-leaf node) in the current tree. The root/foot label of the auxiliary tree must match the label of the node at which it adjoins. This operation splits the node which is the target of adjunction into a top half and bottom half; the top half is attached to the root of the adjoining tree whilst the bottom half is attached at the foot node of the adjoining tree.

This is the most basic variant of TAG; the most common variant adds another rewriting operation called substitution, and other variants allow multi-component trees, trees with multiple foot nodes, and other extensions.

Tree-adjoining grammars are often described as "mildly context-sensitive", meaning that they possess certain properties that make them more powerful (in terms of weak generative capacity) than context-free grammars, but less powerful than context-sensitive grammars as defined in the Chomsky hierarchy. Mildly context-sensitive grammars are (it is sometimes conjectured) powerful enough to model the grammars of natural languages while remaining efficiently parseable in the general case.