Semiotic information theory

Semiotic information theory considers the information content of signs and expressions as it is conceived within the semiotic or sign-relational framework developed by Charles Sanders Peirce.

What's it good for?
The good of information is its use in reducing our uncertainty about some issue that comes before us. Generally speaking, uncertainty comes in several flavors, and so the information that serves to reduce uncertainty can be applied in several different ways. The situations of uncertainty that human agents commonly find themselves facing have been investigated under many headings, literally for ages, and the classifications that subtle thinkers arrived at long before the dawn of modern information theory still have their uses in setting the stage of an introduction.

Picking an example of a subtle thinker almost at random, the philosopher-scientist Immanuel Kant divided the principal questions of human existence into three parts:


 * What's true?
 * What's to do?
 * What's to hope?

The third question is a bit too subtle for the present frame of discussion, but the first and second are easily recognizable as staking out the two main axes of information theory, namely, the dual dimensions of information and control. Roughly the same space of concerns is elsewhere spanned by the dual axes of competence and performance, specification and optimization, or just plain knowledge and skill.

A question of what's true is a descriptive question, and there exist what are called descriptive sciences devoted to answering descriptive questions about any domain of phenomena that one might care to name.

A question of what's to do, in other words, what must be done by way of achieving a given aim, is a normative question, and there exist what are called normative sciences devoted to answering normative questions about any domain of problems that one might care to address.

Since information plays its role on a stage set by uncertainty, a big part of saying what information is will necessarily involve saying what uncertainty is. There is little chance that the vagueness of a word like 'uncertainty', given the nuances of its ordinary, poetic, and technical uses, can be corralled by a single pen, but there do exist established models and formal theories that address definable aspects of uncertainty, and these have enough uses to make them worth looking into.

What is information that a sign may bear it?
Three more questions arise at this juncture:
 * 1) How is a sign empowered to contain information?
 * 2) What is the practical context of communication?
 * 3) Why do we care about these bits of information?

A very rough answer to these questions might begin as follows:

Human beings are initially concerned solely with their own lives, but then a world obtrudes on their subjective existence, and so they find themselves forced to take an interest in the objective realities of its nature.

In pragmatic terms our initial aim, concern, interest, object, or 'pragma' is expressed by the verbal infinitive 'to live', but the infinitive is soon reified into the derivative substantial forms of 'nature', 'reality', 'the world', and so on. Against this backdrop we find ourselves cast as the protagonists on a 'scene of uncertainty'. The situation may be pictured as a juncture from which a manifold of options fan out before us. It may be an issue of truth, duty, or hope, the last codifying a special type of uncertainty as to what regulative principle has any chance of success, but the chief uncertainty is that we are called on to make a choice and find that we all too often have almost no clue as to which of the options is most fit to pick.

Just to make up a discrete example, let us suppose that the cardinality of this choice is a finite n, and just to make it fully concrete let us say that n = 5. Figure 1 affords a rough picture of the situation.

o-o | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` `?` ` `?` ` `?` ` `?` ` `?` ` ` ` ` ` | | ` ` ` ` ` `o` ` `o` ` `o` ` `o` ` `o` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` `o` ` o ` `o` ` o ` `o` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` `o` `o` `o` `o` `o` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` `o` o `o` o `o` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` `o o o o o` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` `ooooo` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` `O` ` ` ` ` ` ` ` `n = 5` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | o-o Figure 1. Juncture of Degree 5

This pictures a juncture, represented by "O", where there are n options for the outcome of a conduct, and we have no clue as to which it must be. In a sense, the degree of this node, in this case n = 5, measures the uncertainty that we have at this point.

This is the minimal sort of setting in which a sign can make any sense at all. A sign has significance for an agent, interpreter, or observer because its actualization, its being given or its being present, serves to reduce the uncertainty of a decision that the agent has to make, whether it concerns the actions that the agent ought to take in order to achieve some objective of interest, or whether it concerns the predicates that the agent ought to treat as being true of some object in the world.

The way that signs enter the scene is shown in Figure 2.

o-------------------------------------------------o
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` k_1 = 3 ` ` ` `k_2 = 2` ` ` ` ` ` |
| ` ` ` ` ` `o-----o-----o` ` `o-----o` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` "A" ` ` ` ` ` "B" ` ` ` ` ` ` ` |
| ` ` ` ` ` ` `o----o----o` ` o----o` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` `o---o---o` `o---o` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` `o--o--o` o--o` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` `o-o-o o-o` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` `ooooo` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` `O` ` ` ` ` ` ` ` `n = 5` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
o-------------------------------------------------o
Figure 2. Partition of Degrees 3 and 2

This illustrates a situation of uncertainty that has been augmented by a classification.

In the particular pattern of classification that is shown here, the first three outcomes fall under the sign "A", and the next two outcomes fall under the sign "B". If the outcomes make up a set of things that might be true about an object, then the signs could be read as nomens (terms) or notions (concepts) of a relevant empirical, ontological, taxonomical, or theoretical scheme, that is, as predicates and predictions of the outcomes. If the outcomes make up a set of things that might be good to do in order to achieve an objective, then the signs could be read as bits of advice or other sorts of indicators that tell us what to do in the situation, relative to our active goals.

This is the basic framework for talking about information and signs in regard to communication, decision, and the uncertainties thereof.

Just to unpack some of the many things that may be getting glossed over in this little word 'sign', it encompasses all of the 'data of the senses' (DOTS) that we take as informing us about inner and outer worlds, plus all of the concepts and terms that we use to argue, to communicate, to inquire, or even to speculate, both about our ontologies for beings in the worlds and about our policies for action in the world.

Here is one of the places where it is tempting to try to collapse the 3-adic sign relation into a 2-adic relation. For if these DOTS are so closely identified with objects that we can scarcely imagine how they might be discrepant, then it will appear to us that one role of beings can be eliminated from our picture of the world. In this event, the only things that we are required to inform ourselves about, via the inspection of these DOTS, are yet more DOTS, whether past, or present, or prospective, just more DOTS. This is the special form to which we frequently find the idea of an information channel being reduced, namely, to a 'source' that has nothing more to tell us about than its own conceivable conducts or its own potential issues.

As a matter of fact, at least in this discrete type of case, it would be possible to use the degree of the node as a measure of uncertainty, but it would operate as a multiplicative measure rather than the sort of additive measure that we would normally prefer. To illustrate how this would work out, let us consider an easier example, one where the degree of the choice point is 4.

o-o | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` `?` ` `?` ` ` ` ` `?` ` `?` ` ` ` ` ` | | ` ` ` ` ` `o` ` `o` ` ` ` ` `o` ` `o` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` `o` ` o ` ` ` ` o ` `o` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` `o` `o` ` ` `o` `o` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` `o` o ` ` o `o` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` `o o` `o o` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` `oo oo` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | | ` ` ` ` ` ` ` ` ` ` ` `O` ` ` ` ` ` ` ` `n = 4` | | ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` | o-o Figure 3. Juncture of Degree 4

Suppose that we contemplate making another decision after the present issue has been decided, one that has a degree of 2 in every case. The compound situation is depicted in Figure 4.

o-------------------------------------------------o
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` `o` `o o` `o` ` ` `o` `o o` `o` ` ` ` ` |
| ` ` ` ` ` \ / ` \ / ` ` ` ` \ / ` \ / ` ` ` ` ` |
| ` ` ` ` ` `o` ` `o` ` ` ` ` `o` ` `o` `n_2 = 2` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` `o` ` o ` ` ` ` o ` `o` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` `o` `o` ` ` `o` `o` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` `o` o ` ` o `o` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` `o o` `o o` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` `oo oo` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` `O` ` ` ` ` ` ` `n_1 = 4` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
o-------------------------------------------------o
Figure 4. Compound Junctures of Degrees 4 and 2

This illustrates the fact that the compound uncertainty, 8, is the product of the two component uncertainties, 4 times 2. To convert this to an additive measure, one simply takes the logarithms to a convenient base, say 2, and thus arrives at the not too astounding fact that the uncertainty of the first choice is 2 bits, the uncertainty of the next choice is 1 bit, and the compound uncertainty is 2 + 1 = 3 bits.
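The passage from the multiplicative measure to the additive one can be checked numerically. The following is a minimal sketch, assuming equally open options at each juncture and logarithms to base 2; the name `uncertainty_bits` is purely illustrative and does not come from the text.

```python
import math

def uncertainty_bits(degree: int) -> float:
    """Additive uncertainty, in bits, of a choice among `degree` options."""
    return math.log2(degree)

n1, n2 = 4, 2                  # degrees of the first and second junctures
compound_degree = n1 * n2      # multiplicative measure: 8 compound outcomes

first = uncertainty_bits(n1)
second = uncertainty_bits(n2)
compound = uncertainty_bits(compound_degree)

# The logarithm turns the product 4 * 2 into the sum 2 + 1.
print(first, second, compound)  # 2.0 1.0 3.0
```

Any other base would serve equally well; base 2 is used here only because it yields the unit of bits mentioned above.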

In many ways, the provision of information, a process that reduces uncertainty, is the inverse process to the kind of uncertainty augmentation that occurs in compound decisions. By way of illustrating this relationship, let us return to our initial example.

A set of signs enters on a setup like this as a system of middle terms, a collection of signs that one may regard, aptly enough, as constellating a medium.

o-------------------------------------------------o
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` k_1 = 3 ` ` ` `k_2 = 2` ` ` ` ` ` |
| ` ` ` ` ` `o-----o-----o` ` `o-----o` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` "A" ` ` ` ` ` "B" ` ` ` ` ` ` ` |
| ` ` ` ` ` ` `o----o----o` ` o----o` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` `o---o---o` `o---o` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` `o--o--o` o--o` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` `o-o-o o-o` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` `ooooo` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
| ` ` ` ` ` ` ` ` ` ` ` `O` ` ` ` ` ` ` ` `n = 5` |
| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` |
o-------------------------------------------------o
Figure 5. Partition of Degrees 3 and 2

The language or medium here is the set of signs {"A", "B"}. On the assumption that the initial 5 outcomes are equally likely, one may associate a frequency distribution $$(k_1, k_2) = (3, 2)$$ and thus a probability distribution $$(p_1, p_2) = (3/5, 2/5) = (0.6, 0.4)$$ with this language, and thus define a communication channel.
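The step from frequencies to probabilities can be sketched as follows. This is only an illustration under the stated assumption of equally likely outcomes; the function name `sign_distribution` and the dictionary layout are hypothetical conveniences, not part of the original account.

```python
from fractions import Fraction
from typing import Dict

def sign_distribution(counts: Dict[str, int]) -> Dict[str, Fraction]:
    """Turn the number of outcomes under each sign into a probability,
    assuming every underlying outcome is equally likely."""
    total = sum(counts.values())
    return {sign: Fraction(k, total) for sign, k in counts.items()}

counts = {"A": 3, "B": 2}          # (k_1, k_2) = (3, 2), so n = 5
probs = sign_distribution(counts)

for sign, p in probs.items():
    print(sign, p, float(p))
```

Exact fractions are used so that the probabilities 3/5 and 2/5 sum to exactly 1 before any conversion to decimals.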

The most important thing here is really just to get a handle on the 'conditions for the possibility of signs making sense', but once we have this much of a setup we find that we can begin to construct some rough and ready bits of information-theoretic furniture, like measures of uncertainty, channel capacity, and the amount of information that can be associated with the reception or the recognition of a single sign. Still, before we get into all of this, it needs to be emphasized that, even when these measures are too ad hoc and insufficient to be of much use per se, the significance of the setup that it takes to support them is not at all diminished.

Consider the classification-augmented or sign-enhanced situation of uncertainty that was depicted above. What happens if one or the other of the two signs, "A" or "B", is observed or received, on the constant assumption that its significance is recognized on receipt?


 * If we receive "A" our uncertainty is reduced from $$\log 5$$ to $$\log 3.$$


 * If we receive "B" our uncertainty is reduced from $$\log 5$$ to $$\log 2.$$

It is from these characteristics that the information capacity of a communication channel can be defined, specifically, as the 'average uncertainty reduction on receiving a sign', a formula with the splendid mnemonic 'AURORAS'.


 * On receiving the message "A", the additive measure of uncertainty is reduced from $$\log 5$$ to $$\log 3$$, so the net reduction is $$(\log 5 - \log 3).$$


 * On receiving the message "B", the additive measure of uncertainty is reduced from $$\log 5$$ to $$\log 2$$, so the net reduction is $$(\log 5 - \log 2).$$

The 'average uncertainty reduction' per sign of the language is computed by taking a weighted average of the reductions that occur in the channel, where the weight of each reduction is the number of options or outcomes that fall under the associated sign.


 * The uncertainty reduction of $$(\log 5 - \log 3)$$ gets a weight of 3.


 * The uncertainty reduction of $$(\log 5 - \log 2)$$ gets a weight of 2.

Finally, the weighted average of these two reductions is:


 * $${1 \over {2 + 3}}(3(\log 5 - \log 3) + 2(\log 5 - \log 2))$$
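The weighted average above can be evaluated directly. The sketch below assumes logarithms to base 2, so that the result is in bits; `average_uncertainty_reduction` is an illustrative name chosen for this example.

```python
import math

def average_uncertainty_reduction(counts):
    """Weighted average of (log n - log k) over the signs, where each sign
    covers k of the n outcomes and is weighted by k. Logs are base 2."""
    n = sum(counts)
    weighted = sum(k * (math.log2(n) - math.log2(k)) for k in counts)
    return weighted / n

aur = average_uncertainty_reduction([3, 2])   # signs "A" and "B"
print(round(aur, 3))  # 0.971
```

An even two-way split, such as `[1, 1]`, gives exactly 1 bit under the same calculation, and a single sign covering every outcome gives 0 bits, since receiving it reduces no uncertainty.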

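Extracting the general pattern of this calculation yields the following worksheet for computing the capacity of a 2-symbol channel with frequencies that partition as $$n = k_1 + k_2$$, where $$p_i = k_i / n$$ and logarithms are taken to base 2:

 * $${1 \over n}(k_1(\log n - \log k_1) + k_2(\log n - \log k_2))$$

 * $$= -(p_1 \log p_1 + p_2 \log p_2)$$

 * $$= -(0.6 \log 0.6 + 0.4 \log 0.4) \approx 0.971$$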

In other words, the capacity of this channel, with logarithms taken to base 2, comes to about 0.971 bits, slightly under 1 bit. This makes intuitive sense, since 3 against 2 is a near-even split of 5, and the measure of the channel capacity or the entropy is supposed to attain its maximum of 1 bit whenever a two-way partition is 50-50, that is to say, when it's as uniform a distribution as it can be.
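The comparison with the 50-50 case can be sketched by evaluating the entropy of a two-way partition in its probability form. The helper `entropy_bits` is an assumed name for this illustration, and base-2 logarithms are assumed throughout.

```python
import math

def entropy_bits(probs):
    """Entropy -sum(p * log2 p) of a probability distribution, in bits.
    Zero-probability terms are skipped, following the usual convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

h_60_40 = entropy_bits([0.6, 0.4])   # the partition of 5 into 3 and 2
h_50_50 = entropy_bits([0.5, 0.5])   # the uniform two-way partition

print(round(h_60_40, 3), h_50_50)

# Scan two-way partitions p : (1 - p) to see where the measure peaks.
best_p = max((i / 100 for i in range(1, 100)),
             key=lambda p: entropy_bits([p, 1 - p]))
print(best_p)
```

On this coarse grid the maximum falls at the even split, in keeping with the remark that a uniform distribution yields the greatest uncertainty.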