Zellig Harris

Zellig Sabbettai Harris (October 23, 1909 – May 22, 1992) was a renowned American linguist, mathematical syntactician, and methodologist of science. Originally a Semiticist, he is best known for his work in structural linguistics and discourse analysis and for the discovery of transformational structure in language. These developments from the first 10 years of his career were published within the first 25. His contributions in the subsequent 35 years of his career include transfer grammar, string analysis (adjunction grammar), elementary sentence-differences (and decomposition lattices), algebraic structures in language, operator grammar, sublanguage grammar, and a theory of linguistic information.

Biography
Harris was born on October 23, 1909, in Balta, Russian Empire. In 1913, when he was four years old, his family immigrated to Philadelphia, Pennsylvania. At age 13, at his request, he was sent to live in Palestine, where he worked to support himself, and throughout his life he returned frequently to live on a socialist kibbutz in Israel. His brother was Dr. Tzvi N. Harris, who with his wife Shoshana played a pivotal role in the understanding of the immune system and the development of modern immunology. His sister, Anna H. Live, was Director of the English Institute (for ESL students) at the University of Pennsylvania (now named the English Language Program). In 1941, he married the physicist Bruria Kaufman, who was Einstein's assistant in the 1950s at Princeton. In the 1960s the couple established residence in Israel, lived in kibbutz Mishmar Ha'Emek, and adopted their daughter, Tamar. From 1949 until his death, Harris maintained a close relationship with Naomi Sager, director of the Linguistic String Project at New York University. Their daughter, Eva Harris, is a professor of Infectious Diseases at the University of California, Berkeley, and the President of the non-profit organization Sustainable Sciences Institute. Harris died in his sleep after a routine working day at the age of 82 on May 22, 1992, in New York.

Early career
Harris received his bachelor's (1930), master's (1932), and doctoral (1934) degrees in the Oriental Studies department of the University of Pennsylvania. His first direction was as a Semiticist, with publications on Ugaritic, Phoenician, and Canaanite, and on the origins of the alphabet; and later on Hebrew, both classical and modern. He began teaching linguistic analysis at Penn in 1931, developing an increasingly comprehensive approach which saw practical application as part of the war effort in the 1940s, and formally established there in 1946-1947 the first modern linguistics department in the United States.

Harris's early publications brought him to the attention of Edward Sapir, who strongly influenced him and who came to regard him as his intellectual heir. Harris also greatly admired Leonard Bloomfield for his work and as a person. He did not formally study with either.

Relation to "Bloomfieldian" structuralism
It is widely believed that Harris carried Bloomfieldian ideas of linguistic description to their extreme development: the investigation of discovery procedures for phonemes and morphemes, based on the distributional properties of these units and of antecedent phonetic elements. His Methods in Structural Linguistics (1951) is the definitive formulation of descriptive structural work as he had developed it up to about 1945. This book made him famous, but generativists have sometimes interpreted it as a synthesis of a "neo-Bloomfieldian school" of structuralism.

In contrast, Harris viewed his work as articulating methods for verifying that results, however reached, are validly derived from the data of language. This was in line with virtually all serious views of science at the time; Harris's methods corresponded to what Hans Reichenbach called "the context of justification," as distinct from "the context of discovery." He had no sympathy for the view that to be scientific a linguistic analyst must progress by stepwise discovery from phonetics, to phonemics, to morphology, and so on, without "mixing levels."

Fundamental to this approach, and, indeed, making it possible, is Harris's recognition that phonemic contrast cannot be derived from distributional analysis of phonetic notations but rather that the fundamental data of linguistics are speaker judgments of phonemic contrast. He developed and clarified methods of controlled experiment by substitution tests in which informants distinguish repetition from contrast, the most careful formulation of which he called the pair test (Harris 1951:32). It is probably accurate to say that phonetic data are regarded as fundamental in all other approaches to linguistics. For example, Chomsky (1964:78) "assume[s] that each utterance of any language can be uniquely represented as a sequence of phones, each of which can be regarded as an abbreviation for a set of features". Recognizing the primacy of speaker perceptions of contrast enabled remarkable flexibility and creativity in Harris's linguistic analyses, which others, without that improved foundation, labelled "game playing" and "hocus-pocus."

Henry Hoenigswald tells us that in the late 1940s and the 1950s Harris was viewed by his colleagues as a person exploring the consequences of pushing methodological principles right to the edge. As a close co-worker put it: "Zellig Harris's work in linguistics placed great emphasis on methods of analysis. His theoretical results were the product of prodigious amounts of work on the data of language, in which the economy of description was a major criterion. He kept the introduction of constructs to the minimum necessary to bring together the elements of description into a system. His own role, he said, was simply to be the agent in bringing data in relation to data. … But it was not false modesty that made Harris downplay his particular role in bringing about results, so much as a fundamental belief in the objectivity of the methods employed. Language could only be described in terms of the placings of words next to words. There was nothing else, no external metalanguage. The question was how these placings worked themselves into a vehicle for carrying the 'semantic burden' of language. … His commitment to methods was such that it would be fair to say that the methods were the leader and he the follower. His genius was to see at various crucial points where the methods were leading and to do the analytic work that was necessary to bring them to a new result."

This, then, is an extension and refinement of the distributional methodology pioneered by Bloomfield and Sapir, the analysis of which elements of language can co-occur and which cannot. Given a representation in which contrasting utterances (non-repetitions) are written differently, stochastic procedures amenable to statistical learning theory identify the boundaries of words and morphemes. Given words and morphemes, the general method is by substitution of one element, the others in its context being constant, and experimental testing for the occurrence of the new combination in a corpus and for its acceptability by users of the language.

This experimental distributional methodology is thus grounded at two points in the subjective judgments of language users: judgments as to repetition vs. imitation, yielding the fundamental data of phonemic contrast, and judgments as to acceptability, yielding those "departures from randomness" that enable language to carry information. This is in contrast to the commonly held view that Harris, like Bloomfield, rejected mentalism and espoused behaviorism.
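
The substitution step described above can be illustrated with a minimal sketch. It assumes a toy corpus of utterances already segmented into words; the corpus, the function names `substitution_attested` and `substitutes_for`, and the example words are hypothetical and are not Harris's own notation. The sketch covers only the corpus-occurrence half of the test; judgments of acceptability by users of the language are not modelled.

```python
def substitution_attested(corpus, sentence, position, substitute):
    """Replace the word at `position`, keeping the rest of the context constant,
    and test whether the resulting combination occurs in the corpus."""
    candidate = list(sentence)
    candidate[position] = substitute
    return tuple(candidate) in corpus


def substitutes_for(corpus, sentence, position):
    """Collect every word attested in the slot at `position` when the
    surrounding context is held constant."""
    left = tuple(sentence[:position])
    right = tuple(sentence[position + 1:])
    return {
        s[position]
        for s in corpus
        if len(s) == len(sentence)
        and s[:position] == left
        and s[position + 1:] == right
    }


# Toy corpus (hypothetical): each utterance is a tuple of words.
corpus = {
    ("the", "dog", "sleeps"),
    ("the", "cat", "sleeps"),
    ("the", "dog", "barks"),
    ("a", "dog", "sleeps"),
}

# "cat" substitutes for "dog" in the frame "the _ sleeps".
print(substitution_attested(corpus, ("the", "dog", "sleeps"), 1, "cat"))
# "the cat barks" is not attested in this corpus.
print(substitution_attested(corpus, ("the", "cat", "sleeps"), 2, "barks"))
# All attested fillers of the frame "the _ sleeps".
print(sorted(substitutes_for(corpus, ("the", "dog", "sleeps"), 1)))
```

In this sketch, words that fill the same slot in the same frames fall into a common substitution class. An unattested combination is left undecided, since absence from a small corpus does not show that speakers would reject it.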

Major contributions in the 1940s
Harris's contributions to linguistics as of about 1945 are summarized in Methods in Structural Linguistics (Harris 1951). They include componential analysis of long components in phonology, componential analysis of morphology, discontinuous morphemes, and a substitution-grammar of phrase expansions that is related to immediate-constituent analysis, but without its limitations. With its manuscript date of 1946, the book has been recognized as including the first formulation of the notion of a generative grammar.

The overriding aim of the book, and the import of the word "methods" in its original title, is a detailed specification of validation criteria for linguistic analysis. These criteria lend themselves to differing forms of presentation that have sometimes been taken as competing. Harris showed how they are complementary. (An analogy may be drawn to intersecting parameters in optimality theory.) "It is not that grammar is one or another of these analyses, but that sentences exhibit simultaneously all of these properties." Harris's treatment of these as tools of analysis rather than theories of language, and his way of using them to work toward an optimal presentation for this purpose or that, contributed to the perception that he was engaged in "hocus-pocus" with no expectation that there was any truth to the matter.

Harris's central methodological concern beginning with his earliest publications was to avoid obscuring the essential characteristics of language behind unacknowledged presuppositions, such as are inherent in conventions of notation or presentation. In this vein, among his most illuminating works in the 1940s are restatements of analyses by other linguists, done with the intention of displaying properties of the linguistic phenomena which are invariant across diverse representations. This anticipates later work on linguistic universals.

Metalanguage and notational systems
The basis of this concern was that such hidden presuppositions are dependent upon prior knowledge of and use of language. Since the object of investigation is language itself, properties of language cannot be presupposed without question-begging. "We ca[n]not describe the structure of natural language in some other kind of system, for any system in which we could identify the elements and meanings of a given language would have to have already the same essential structure of words and sentences as the language to be described." "[W]e cannot in general impose our own categories of information upon language. … We cannot determine in an a priori way the 'logical form' of all sentences.…" etc. (Harris 1991:346)

Natural language demonstrably contains its own metalanguage, in which we talk about language itself. Any other means for talking about language, such as logical notations, depends upon our prior shared 'common parlance' for our learning and interpreting it. To describe language, or to write a grammar, we cannot rely upon metalinguistic resources outside of that intrinsic metalanguage, "for any system in which we could identify the elements and meanings of a given language would have to have already the same essential structure of words and sentences as the language to be described." "There is no way to define or describe the language and its occurrences except in such statements said in that same language or in another natural language. Even if the grammar of a language is stated largely in symbols, those symbols will have to be defined ultimately in a natural language." (Harris 1991:274)

From this observation there followed Harris's conclusion that a science that aims to determine the nature of language is limited to investigation of the relationships of elements to one another (their distribution). Indeed, beginning with the fundamental data of linguistics, the phonemic contrasts, all the elements are defined relative to one another. Any metalinguistic notions, representations, or notational conventions that are not stateable in metalanguage assertions of the language itself import complexity that is not intrinsic to language, obscuring its true character. Because of this, Harris strove for a 'least grammar'. "The reason for this demand is that every entity and rule, and every complexity and restriction of domains of a rule, states a departure from randomness in the language being described. Since what we have to describe is the restriction on combinations in the language, the description should not add restrictions of its own."

The hypothesis of Universal Grammar (UG) is a contrary proposal that (some) metalinguistic resources for language are in fact a priori, prior to and external to language, as part of the genetic inheritance of humans. Insofar as the only evidence for properties of UG are in language itself, Harris's view was that such properties cannot be presupposed, but they may be sought once a principled theory of language is established on a purely linguistic basis.

Linguistics as applied mathematics
Deriving from this insight, Harris's aim was to constitute linguistics as a product of mathematical analysis of the data of language. "[The] problem of the foundations of mathematics was more topical than ever just at the time when Harris took charge of the 'homologous' enterprise of establishing linguistics on a clear basis." "We see here then nearly fifty years during which, to realize the program that he established very early, Zellig Harris searched and found in mathematics some of his supports. This merits closer attention, and it is doubtless advisable to consider it without shutting it into the reductive box of 'possible applications of mathematics to linguistics.' Is not the question rather 'how could a little mathematics transmute itself into linguistics?'" He contrasted this with attempts by others to project the properties of language from formal language-like systems. "The interest … is not in investigating a mathematically definable system which has some relation to language, as being a generalization or a subset of it, but in formulating as a mathematical system all the properties and relations necessary and sufficient for the whole of natural language."

Transformational structure in language
As early as 1939, Harris began teaching his students about linguistic transformations and the regularizing of texts in discourse analysis. This aspect of his extensive work in diverse languages such as Kota, Hidatsa, and Cherokee, and of course Hebrew (ancient and modern), as well as English, did not begin to see publication until his "Culture and Style" and "Discourse Analysis" papers in 1952. A later series of papers beginning with "Co-occurrence and Transformations in Linguistic Structure" (1957) developed a more general theory of syntax.

Harris argued, following Sapir and Bloomfield, that semantics is included in grammar, not separate from it, form and information being two faces of the same coin. A particular application of the concern about presuppositions and metalanguage, noted above, is that any specification of semantics other than that which is immanent in language can only be stated in a metalanguage external to language.

Prior to Harris's discovery of transformations, grammar as so far developed could not yet treat of individual word combinations, but only of word classes. A sequence or n-tuple of word classes (plus invariant morphemes, termed constants) specifies a subset of sentences that are formally alike. Harris investigated mappings from one such subset to another in the set of sentences. In linear algebra, a mapping that preserves a specified property is called a transformation, and that is the sense in which Harris introduced the term into linguistics. Harris's transformational analysis refined the word classes found in the 1946 "From Morpheme to Utterance" grammar of expansions. By recursively defining semantically more and more specific subclasses according to the combinatorial privileges of words, one may progressively approximate a grammar of individual word combinations. This relation of progressive refinement was subsequently shown in a more direct and straightforward way in a grammar of substring combinability resulting from string analysis (Harris 1962).

Noam Chomsky was Harris's student, beginning as an undergraduate in 1946. The two scholars developed their concepts of transformation on different premises. Rather than taking transformations in the algebraic sense of mappings from subset to subset, preserving inter-word restrictions, Chomsky adapted the notion of rules of transformation vs. rules of formation from Rudolf Carnap. When he was introduced to the production systems of Emil Post and their capacity to generate language-like formal systems, he employed them as a notation for presentation of immediate-constituent analysis. He called this phrase structure grammar (PSG), which he then adapted for presentation of Harris's transformations, restated as operations mapping one phrase-structure tree to another. In his conception, PSG provided the rules of formation which were 'enriched' by his rules of transformation. This led later to his redefinition of transformations as operations mapping an abstract deep structure into a surface structure. This notion of transformation adds layers of complexity that Harris regarded as unnecessary and undesirable. In Harris's transformational analysis, inter-word dependencies suffice to determine mappings in the set of sentences, and many generalizations that seem of importance in the various theories employing abstract syntax trees, such as island phenomena, fall out naturally with no special explanation needed. Harris did not require the complex hierarchy of abstract structure posited by Chomsky.

Early work on transformations used paraphrase as a heuristic, but Harris recognized that this was inadequate (e.g. in Harris 1954) and, in keeping with the methodological principles noted above in the section on metalanguage issues and earlier, sought a formal criterion for transformational analysis. In the 1957 "Co-Occurrence and Transformation" paper his criterion for a transformation between two sentence-forms was that inter-word co-occurrence restrictions should be preserved under the mapping; that is, if two sentence-forms are transforms, then acceptable word choices for one also obtain for the other. Even while the 1957 publication was in press it was clear that preservation of word co-occurrence could not resolve certain problems, and in the 1965 "Transformational Theory" the criterion for transformation was the preservation of relative acceptability of the satisfiers of each sentence-form so paired; that is, if two sentence-forms are transforms, then the relative acceptabilities of any pair of word choices satisfying one sentence-form are not reversed for the corresponding word choices satisfying the other (though in some contexts, e.g. under "I imagine" or "I dreamt", acceptabilities may be collapsed). These acceptability gradings may also be expressed as ranges of contexts in which the word choices are fully acceptable, a formulation which leads naturally to sublanguage grammar (below).
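
The 1965 criterion can be read as a condition on orderings, which the following minimal sketch illustrates. It assumes that each word choice has been given a numeric acceptability score in each of two sentence-forms; the scores, the example words, and the function name `preserves_relative_acceptability` are hypothetical and are not Harris's data or notation. Two forms pass the check if no pair of word choices has its relative acceptability reversed. Ties are allowed, so acceptabilities that collapse in one form do not count as a reversal.

```python
from itertools import combinations


def preserves_relative_acceptability(form_a, form_b):
    """Return True if no pair of word choices has its acceptability order reversed.

    form_a, form_b: dicts mapping the same word-choice tuples to hypothetical
    acceptability scores (higher means more acceptable).
    """
    if form_a.keys() != form_b.keys():
        raise ValueError("both forms must be scored on the same word choices")
    for x, y in combinations(form_a, 2):
        diff_a = form_a[x] - form_a[y]
        diff_b = form_b[x] - form_b[y]
        if diff_a * diff_b < 0:  # strictly opposite signs: the order is reversed
            return False
    return True


# Hypothetical scores for an active form "N1 V N2" and a candidate transform.
active = {
    ("dog", "bite", "man"): 1.0,
    ("man", "bite", "dog"): 0.9,
    ("idea", "bite", "dog"): 0.2,
}
passive = {
    ("dog", "bite", "man"): 0.95,
    ("man", "bite", "dog"): 0.8,
    ("idea", "bite", "dog"): 0.1,
}
# A form in which "idea bite dog" outranks "man bite dog": the order is reversed.
unrelated = {
    ("dog", "bite", "man"): 0.95,
    ("man", "bite", "dog"): 0.1,
    ("idea", "bite", "dog"): 0.8,
}

print(preserves_relative_acceptability(active, passive))
print(preserves_relative_acceptability(active, unrelated))
```

Only the ordering matters in this sketch, not the absolute scores, which mirrors the shift from preservation of co-occurrence to preservation of relative acceptability.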

Operator grammar
Harris factored the set of transformations into elementary sentence-differences, which he then showed to be transitions in a derivational sequence. This led to a partition of the set of sentences into two sublanguages: an informationally complete sublanguage with neither ambiguity nor paraphrase, vs. the set of its more conventional and usable paraphrases ("The Two Systems of Grammar: Report and Paraphrase" 1969). In the paraphrastic set, morphemes may be present in reduced form, even reduced to zero; their fully explicit forms are recoverable by undoing deformations and reductions of phonemic shape that Harris termed "extended morphophonemics".

Thence, in parallel with the generalization of linear algebra to operator theory in mathematics, he developed Operator Grammar. Here at last is a grammar of the entry of individual words into the construction of a sentence. When the entry of an operator word on its argument word or words brings about the string conditions that a reduction requires, it may be carried out (most reductions being optional). Operator Grammar resembles predicate calculus, and has affinities with Categorial Grammar, but these are findings after the fact which did not guide its development or the research that led to it. Recent work by Stephen Johnson on formalization of operator grammar adapts the "lexicon grammar" of Maurice Gross for the complex detail of the reductions.

Sublanguage and linguistic information
In his work on sublanguage analysis, Harris showed how the sublanguage for a restricted domain can have a pre-existent external metalanguage, expressed in sentences in the language but outside of the sublanguage, something that is not available to language as a whole. In the language as a whole, restrictions on operator-argument combinability can only be specified in terms of relative acceptability, and it is difficult to rule out any satisfier of an attested sentence-form as nonsense, but in technical domains, especially in sublanguages of science, metalanguage definitions of terms and relations restrict word combinability, and the correlation of form with meaning becomes quite sharp. It is perhaps of interest that the test and exemplification of this in The Form of Information in Science (1989) vindicates in some degree the Sapir–Whorf hypothesis. It also expresses Harris's lifelong interest in the further evolution or refinement of language in context of problems of social amelioration (e.g., "A Language for International Cooperation" [1962], "Scientific Sublanguages and the Prospects for a Global Language of Science" [1988]), and in possible future developments of language beyond its present capacities.

Harris's linguistic work culminated in the companion books A Grammar of English on Mathematical Principles (1982) and A Theory of Language and Information (1991). Mathematical information theory concerns only quantity of information; here for the first time is a theory of information content. In the latter work, also, Harris ventured to propose at last what might be the "truth of the matter" about the nature of language, what is required to learn it, its origin, and its possible future development. His discoveries vindicate Sapir's recognition, long disregarded, that language is pre-eminently a social artifact, the users of which collectively create and re-create it in the course of using it.

Legacy
The influence of Harris's work is pervasive in linguistics, often invisibly. Diverse lines of research that Harris opened continue to be developed by others, as indicated by the contributions to Nevin (2002a, 2002b). The Medical Language Processor developed by Naomi Sager and others in the Linguistic String Project in the Courant Institute of Mathematical Sciences (NYU) has been made available on SourceForge. Richard Kittredge and his colleagues have developed systems for automatic generation of text from data, which are used for weather radio broadcasts and for production of reportage of stock market activity, sports results, and the like. Work on information retrieval has been influential in development of the Lexis-Nexis systems and elsewhere.

Recent work on statistical semantics, in particular distributional semantics, is based on the distributional hypothesis and is explicitly influenced by Harris's work on distributional structure.
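
The distributional hypothesis behind this line of work can be sketched in a few lines. The example below is an illustration under stated assumptions, not a description of any particular system: the toy sentences, the window size, and the function names `cooccurrence_vectors` and `cosine` are hypothetical. Each word is represented by counts of the words that occur near it, and two words are compared by the cosine of the angle between their count vectors.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=1):
    """Map each word to a Counter of the words found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for words in sentences:
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus (hypothetical).
sentences = [
    ["the", "dog", "sleeps"],
    ["the", "cat", "sleeps"],
    ["the", "dog", "barks"],
    ["a", "cat", "purrs"],
]
vecs = cooccurrence_vectors(sentences)

# "dog" and "cat" share contexts ("the", "sleeps"); "dog" and "the" share none.
print(cosine(vecs["dog"], vecs["cat"]))
print(cosine(vecs["dog"], vecs["the"]))
```

On this toy data "dog" and "cat" come out as more similar to each other than "dog" is to "the", because they occur in overlapping contexts. Practical systems use far larger corpora and weighted or learned vectors.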

Harris's students in linguistics include, among many others, Joseph Applegate, Ernest Bender, Noam Chomsky, William Evan, Lila R. Gleitman, Michael Gottfried, Maurice Gross, James Higginbotham, Stephen B. Johnson, Aravind Joshi, Michael Kac, Edward Keenan, Daythal Kendall, Richard Kittredge, James A. Loriot/Lauriault, Leigh Lisker, Fred Lukoff, Paul Mattick, James Munz, Bruce E. Nevin, Jean-Pierre Paillet, Ellen Prince, John R. Ross, Naomi Sager, Morris Salkoff, Thomas A. Ryckman, and William C. Watt.

Politics
Harris was also influential with many students and colleagues, though in a less public way, in work on the amelioration of social and political arrangements. His last book, The Transformation of Capitalist Society, which summarizes his findings, was published posthumously. In it he proposes ways to identify and foster the seed-points of a more humane successor to capitalism, which he saw would arise in niche areas in which capitalism cannot function well, much as capitalism arose in the midst of feudalism. Some of his unpublished writings on politics are in a collection at the Van Pelt Library of the University of Pennsylvania.

He was committed all his life to radical transformation of society, but from the ground up rather than by revolution directed from the top down. From his undergraduate days he was active in a student left-Zionist organization called Avukah (Hebrew "Torch"). He resigned as its national President in 1936, the year he obtained the Ph.D., but continued in a leadership advisory role until, like many other student organizations in the war years, it fell apart in 1943. From the early 1940s he and an informal group of fellow scientists in diverse fields collaborated on an extensive project called A Frame of Reference for Social Change. This develops a new vocabulary and concepts on grounds that the existing ones of economics and sociology presuppose and thereby covertly perpetuate capitalist constructs, and that it is necessary to ‘unfool’ oneself before proceeding. This was submitted to Victor Gollancz, a notoriously interventionist editor, who demanded a complete rewrite in more familiar terms. A manuscript titled Directing Social Change, found among Harris's papers at his death, was brought to publication in 1997 by Seymour Melman, Murray Eden, and Bill Evan. The publisher changed the title to The Transformation of Capitalist Society.

Selected writings
A complete bibliography of Harris's writings is available. A selection of Harris's works follows:


 * 1936. A Grammar of the Phoenician Language. Ph.D. dissertation. American Oriental Series, 8.
 * 1939. Development of the Canaanite Dialects: An Investigation in Linguistic History. American Oriental Series, 16.
 * 1946. "From Morpheme to Utterance". Language 22:3.161–183.
 * 1951. Methods in Structural Linguistics
 * 1962. String Analysis of Sentence Structure
 * 1968. Mathematical Structures of Language
 * 1970. Papers in Structural and Transformational Linguistics
 * 1976. Notes du Cours de Syntaxe (in French)
 * 1981. Papers on Syntax
 * 1982. A Grammar of English on Mathematical Principles (ISBN 0-471-02958-0)
 * 1988. Language and Information (ISBN 0-231-06662-7) (French translation: Ibrahim A.H. and Martinot Cl., La langue et l’information, Paris, CRL.)
 * 1989. The Form of Information in Science: Analysis of an immunology sublanguage (ISBN 90-277-2516-0)
 * 1991. A Theory of Language and Information: A Mathematical Approach (ISBN 0-19-824224-7)
 * 1997. The Transformation of Capitalist Society (ISBN 0-8476-8412-1)
 * 2002. "The background of transformational and metalanguage analysis." Introduction to The Legacy of Zellig Harris: Language and Information into the 21st Century: Vol. 1: Philosophy of science, syntax, and semantics, John Benjamins Publishing Company (CILT 228).

Literature treating of Harris
[See also the comprehensive bibliography by Konrad Koerner in Nevin (2002a:305–316, 2002b:293–304), and the revision of it at ]

Corcoran, John. 1972. "Harris on the Structure of Language". In [Trömel-]Plötz 1972. 275–292.