Numerals

A numeral is a symbol or group of symbols that represents a number. Numerals differ from numbers just as words differ from the things they refer to. The symbols "11", "eleven" and "XI" are different numerals, all representing the same number. This article attempts to explain the various systems of numerals. See also number names. A numeral system (or system of numeration) is a framework where a set of numbers is represented by numerals in a consistent manner. It can be seen as the context that allows the numeral "11" to be interpreted as the binary numeral for three, the decimal numeral for eleven, or other numbers in different bases.

Ideally, a numeral system will:


 * Represent a useful set of numbers (e.g. all whole numbers, integers, or real numbers)
 * Give every number represented a unique representation (or at least a standard representation)
 * Reflect the algebraic and arithmetic structure of the numbers.

For example, the usual decimal representation of whole numbers gives every whole number a unique representation as a finite sequence of digits, with the operations of arithmetic (addition, subtraction, multiplication and division) being present as the standard algorithms of arithmetic. However, when decimal representation is used for the rational or real numbers, the representation is no longer unique: many rational numbers have two numerals, a standard one that terminates, such as 2.31, and another that recurs, such as 2.30999999... . Numerals which terminate have no non-zero digits after a given position. For example, numerals like 2.31 and 2.310 are taken to be the same, except in the natural sciences where differing precision is denoted.

Numeral systems are sometimes called number systems, but that name is misleading: different systems of numbers, such as the system of real numbers, the system of complex numbers, the system of p-adic numbers, etc., are not the topic of this article.

Types of numeral systems
The simplest numeral system is the unary numeral system, in which every natural number is represented by a corresponding number of symbols. If the symbol ′ is chosen, for example, then the number seven would be represented by ′′′′′′′. The unary system is normally only useful for small numbers. It has some uses in theoretical computer science. Elias gamma coding is commonly used in data compression; it includes a unary part and a binary part.
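As a minimal sketch of how a unary part and a binary part can be combined, the following Python code builds a unary numeral and an Elias gamma code word. The function names `unary` and `elias_gamma` are chosen here for illustration, and the code assumes the common convention in which the unary part is a run of zeros whose length is one less than the length of the binary part.

```python
PRIME = "\u2032"  # the prime mark used as the unary symbol in the text


def unary(n, symbol=PRIME):
    """Represent a natural number as n copies of a single symbol."""
    if n < 0:
        raise ValueError("unary numerals represent natural numbers only")
    return symbol * n


def elias_gamma(n):
    """Elias gamma code word of a positive integer, as a string of bits.

    Assumed convention: (length - 1) zeros as the unary part, followed by
    the binary numeral of n, which always starts with a 1.
    """
    if n < 1:
        raise ValueError("Elias gamma coding is defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


print(len(unary(7)))   # 7
print(elias_gamma(7))  # 00111
```

The leading zeros tell a decoder how many further bits belong to the number, so code words can be concatenated without separators.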

The unary notation can be abbreviated by introducing different symbols for certain new values. Very commonly, these values are powers of 10; so for instance, if ′ stands for one, - for ten and + for 100, then the number 304 can be compactly represented as +++ ′′′′ and number 123 as + -- ′′′. The ancient Egyptian system is of this type, and the Roman system is a modification of this idea.
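The abbreviation scheme just described can be sketched in a few lines of Python. The name `additive_numeral` and the restriction to numbers below 1000 are assumptions made for this illustration; the symbols are the three used in the example above.

```python
PRIME = "\u2032"  # the prime mark standing for one

# Symbol values in descending order: + for 100, - for ten, prime for one.
SYMBOLS = [(100, "+"), (10, "-"), (1, PRIME)]


def additive_numeral(n):
    """Write n (1..999) by repeating each symbol as often as needed."""
    if not 0 < n < 1000:
        raise ValueError("this sketch only covers 1..999")
    groups = []
    for value, symbol in SYMBOLS:
        count, n = divmod(n, value)
        if count:  # a power of ten that does not occur is simply left out
            groups.append(symbol * count)
    return " ".join(groups)


print(additive_numeral(304) == "+++ " + PRIME * 4)  # True
print(additive_numeral(123) == "+ -- " + PRIME * 3)  # True
```

Note that no zero symbol is needed: the missing tens in 304 are expressed by the absence of the ten symbol.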

More useful still are systems which employ special abbreviations for repetitions of symbols; for example, using the first nine letters of our alphabet for these abbreviations, with A standing for "one occurrence", B "two occurrences", and so on, we could then write C+ D′ for the number 304. The numeral system of English is of this type ("three hundred [and] four"), as are those of virtually all other spoken languages, regardless of what written systems they have adopted.
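A small Python sketch of this letter-abbreviated scheme follows. The function name `multiplicative_numeral` and the limit of 999 are assumptions for illustration; the letters A to I and the three value symbols are those of the example above.

```python
PRIME = "\u2032"  # the prime mark standing for one

SYMBOLS = [(100, "+"), (10, "-"), (1, PRIME)]
LETTERS = "ABCDEFGHI"  # A = one occurrence, B = two, ..., I = nine


def multiplicative_numeral(n):
    """Write n (1..999) as letter-plus-symbol groups, e.g. C+ for 300."""
    if not 0 < n < 1000:
        raise ValueError("this sketch only covers 1..999")
    groups = []
    for value, symbol in SYMBOLS:
        count, n = divmod(n, value)
        if count:
            groups.append(LETTERS[count - 1] + symbol)
    return " ".join(groups)


print(multiplicative_numeral(304) == "C+ D" + PRIME)  # True
```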

More elegant is a positional system: again working in base 10, we use ten different digits 0, ..., 9 and use the position of a digit to signify the power of ten that the digit is to be multiplied with, as in 304 = 3×100 + 0×10 + 4. Note that zero, which is not needed in the other systems, is of crucial importance here, in order to be able to "skip" a power. The Hindu-Arabic numeral system, borrowed from India, is a positional base 10 system; it is used today throughout the world.

Arithmetic is much easier in positional systems than in the earlier additive ones. Furthermore, additive systems need a potentially infinite number of different symbols for the different powers of 10, whereas a positional system needs only as many symbols as its base (ten symbols in base 10).

In certain areas of computer science, a modified base-k positional system is used, called bijective numeration, with digits 1, 2, ..., k (k ≥ 1), and zero being represented by the empty string. This establishes a bijection between the set of all such digit-strings and the set of non-negative integers, avoiding the non-uniqueness caused by leading zeros. Bijective base-k numeration is also called k-adic notation, not to be confused with p-adic numbers. Bijective base-1 is the same as unary.
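The bijection can be made concrete with a short Python sketch. The names `to_bijective` and `from_bijective` are hypothetical, and digits are returned as a list of integers (most significant first) rather than as characters.

```python
def to_bijective(n, k):
    """Digits of n in bijective base k, most significant first.

    Zero is represented by the empty list; every digit lies in 1..k.
    """
    if n < 0 or k < 1:
        raise ValueError("need n >= 0 and k >= 1")
    digits = []
    while n > 0:
        # Shift by one so the remainder 0..k-1 maps onto the digits 1..k.
        n, r = divmod(n - 1, k)
        digits.append(r + 1)
    return digits[::-1]


def from_bijective(digits, k):
    """Value of a bijective base-k digit list (inverse of to_bijective)."""
    value = 0
    for d in digits:
        if not 1 <= d <= k:
            raise ValueError("digits must lie in 1..k")
        value = value * k + d
    return value


print(to_bijective(0, 2))   # []
print(to_bijective(3, 2))   # [1, 1]
print(to_bijective(10, 10)) # [10]
print(to_bijective(3, 1))   # [1, 1, 1]
```

The last line shows bijective base-1 coinciding with unary: three is written as three copies of the single digit.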

History

 * See also History of natural numbers and the status of zero.

Tallies carved from wood and stone have been used since prehistoric times. Stone age cultures, including ancient American Indian groups, used tallies for gambling with horses, slaves, personal services and trade-goods.

The earliest known written tallies appear in the ruins of the Sumerian empire, using clay tablets impressed with a sharp stick and baked. The Sumerians had quite an exotic system based on counts to 60, used in astronomical and other calculations. This system was imported and used by every Mediterranean nation that used astronomy, including the Greeks, Romans and Egyptians. We still use it to count time (minutes per hour), and angle (degrees).

In China, armies and provisions were counted using modular tallies of prime numbers. Unique numbers of troops and measures of rice appear as unique combinations of these tallies. A great convenience of modular arithmetic is that it is easy to multiply, though quite difficult to add. This makes use of modular arithmetic for provisions especially attractive. Conventional tallies are quite difficult to multiply and divide. In modern times modular arithmetic is sometimes used in digital signal processing.

The Roman empire used tallies written on wax, papyrus and stone, and roughly followed the Greek custom of assigning letters to various numbers. The Roman system remained in common use in Europe until positional notation came into common use in the 1500s.

The Maya of Central America used a base 20/base 18 system, possibly inherited from the Olmec, including advanced features such as positional notation and a zero. They used this to do advanced astronomical calculations, including highly accurate calculations of the length of the solar year and the orbit of Venus.

The Incan Empire ran a large command economy using quipu, tallies made by knotting colored fibers. Knowledge of the encodings of the knots and colors was suppressed by the Spanish conquistadors in the 16th century, and has not survived although simple quipu-like recording devices are still used in the Andean region.

Some authorities believe that positional arithmetic began with the wide use of the abacus in China. The earliest written positional records seem to be tallies of abacus results in China around 400. In particular, zero was correctly described by Chinese mathematicians around 932, and seems to have originated as a circle of a place empty of beads.

Recognizably modern positional numeral systems from India passed to the Arabs, probably along with the astronomical tables; they were brought to Baghdad by an Indian ambassador around 773. For greater discussion of numeral systems from India, see Hindu-Arabic numerals and Indian numerals.

From India, the thriving trade between Islamic sultans and Africa carried the concept to Cairo. Arabic mathematicians extended the system to decimal fractions, and al-Khwarizmi wrote an important work about it in the 9th century. The system was introduced to Europe with the translation of this work in the 12th century in Spain and Leonardo of Pisa's Liber Abaci of 1202. In Europe, the complete Indian system with the zero was derived from the Arabs in the 12th century.

The binary system (base 2), propagated in the 17th century by Gottfried Leibniz, who had heard about it from China, came into common use in the 20th century because of computer applications.

Bases used
The base-10 system is the one most commonly used today. It is assumed to have originated because humans have ten fingers.

The Maya civilization and other civilizations of Pre-Columbian Mesoamerica used base 20, possibly originating from the number of a person's fingers and toes.

A base-eight system was devised by the Yuki of Northern California, who used the spaces between the fingers to count. There is also linguistic evidence which suggests that the Bronze Age Proto-Indo-Europeans (from whom most European and Indic languages descend) might have replaced a base 8 system (or a system which could only count up to 8) with a base 10 system. The evidence is that the word for 9, newm, is suggested by some to derive from the word for 'new', newo-, suggesting that the number 9 had been recently invented and called the 'new number' (Mallory & Adams 1997).

Base-12 systems were popular because multiplication and division are easier than in base-10, with addition just as easy. 12 is a useful base because it has many factors. It is the smallest number divisible by each of one through four. We still have a special word for "dozen" and use 12 hours for every night and day.

Base 60 was used by the Sumerians and their successors in Mesopotamia and survives today in our system of time (hence the division of an hour into 60 minutes and a minute into 60 seconds) and in our system of angular measure (a degree is divided into 60 minutes and a minute is divided into 60 seconds). 60 also has a large number of factors, including the first six counting numbers. It is the smallest number divisible by each of one through six.

The Nenets language once used a base 9 system, but has since shifted to decimal under the influence of Russian. The word yúq originally meant 9, but took the value 10 on account of Russian influence; so in current Nenets the word for 9 is xasu-yúq, lit. 'Nenets yúq ', whereas 10 is simply yúq, but in Eastern dialects also lúca-yúq, lit. 'Russian yúq '.

Switches, mimicked in their electronic successors built of vacuum tubes, have only two possible states: "open" and "closed". Substituting open=1 and closed=0 (or the other way around) yields the entire set of binary digits. This binary system is the basis for digital computers. It is used to perform integer arithmetic in almost all digital computers, with exceptions in the exotic base-3 and base-10 designs that were discarded early in the history of computing hardware. Modern computers use transistors that have binary state as either high or low voltages. A computer does not treat all of its data as numerical. For instance, some of it may be treated as program instructions or data such as text. However, arithmetic and Boolean logic constitute a great part of operation. Real numbers, allowing fractional values, are usually approximated as floating point numbers which have different methods of arithmetic from integers.

Positional systems in detail
Also see Positional notation.

In a positional base-b numeral system (with b a natural number greater than 1 known as the radix), b basic symbols (or digits) corresponding to the first b natural numbers including zero are used. To generate the rest of the numerals, the position of the symbol in the figure is used. The symbol in the last position has its own value, and as it moves to the left its value is multiplied by b.

For example, in the decimal system (base 10), the numeral 4327 means $(4\times10^3) + (3\times10^2) + (2\times10^1) + (7\times10^0)$, noting that $10^0 = 1$.

In general, if b is the base, we write a number in the numeral system of base b by expressing it in the form $a_1 b^k + a_2 b^{k-1} + a_3 b^{k-2} + \cdots + a_{k+1} b^0$ and writing the digits $a_1 a_2 a_3 \ldots a_{k+1}$ in order. The digits are natural numbers between 0 and b-1, inclusive.
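The general form can be evaluated digit by digit, multiplying the running total by b before adding the next digit (Horner's scheme). The Python sketch below assumes digits are given as a list of integers, most significant first; the name `digits_to_value` is chosen here for illustration.

```python
def digits_to_value(digits, b):
    """Value of a base-b digit list, most significant digit first."""
    if b < 2:
        raise ValueError("the radix must be at least 2")
    value = 0
    for d in digits:
        if not 0 <= d < b:
            raise ValueError("digits must lie between 0 and b-1")
        # Every earlier digit moves one place left, i.e. is multiplied by b.
        value = value * b + d
    return value


print(digits_to_value([4, 3, 2, 7], 10))  # 4327
print(digits_to_value([1, 1], 2))         # 3
print(digits_to_value([1, 1], 10))        # 11
```

The last two lines show the same digit sequence "11" read as the binary numeral for three and as the decimal numeral for eleven.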

If a text (such as this one) discusses multiple bases, and if ambiguity exists, the base (itself represented in base 10) is added in subscript to the right of the number, like this: $\mathrm{number}_{\mathrm{base}}$. Unless specified by context, numbers without subscript are considered to be decimal.

By using a dot to divide the digits into two groups, one can also write fractions in the positional system. For example, the base-2 numeral 10.11 denotes $1\times2^1 + 0\times2^0 + 1\times2^{-1} + 1\times2^{-2} = 2.75$.

In general, numbers in the base b system are of the form:



$$ (a_n a_{n-1} \ldots a_1 a_0 . c_1 c_2 c_3 \ldots)_b = \sum_{k=0}^n a_k b^k + \sum_{k=1}^\infty c_k b^{-k} $$

The numbers $b^k$ and $b^{-k}$ are the weights of the corresponding digits.
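For a numeral with finitely many digits after the dot, the weighted sum can be computed exactly with rational arithmetic. The following Python sketch uses the standard `fractions` module; the function name `positional_value` and the representation of digits as two integer lists are assumptions for illustration.

```python
from fractions import Fraction


def positional_value(int_digits, frac_digits, b):
    """Exact value of a base-b numeral with a finite fractional part.

    int_digits are the digits before the dot (most significant first),
    frac_digits the digits after it.
    """
    total = Fraction(0)
    n = len(int_digits) - 1
    for i, a in enumerate(int_digits):
        total += a * Fraction(b) ** (n - i)   # weight b^k
    for k, c in enumerate(frac_digits, start=1):
        total += c * Fraction(b) ** (-k)      # weight b^-k
    return total


value = positional_value([1, 0], [1, 1], 2)   # the base-2 numeral 10.11
print(value)         # 11/4
print(float(value))  # 2.75
```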

Note that a number has a terminating or repeating expansion if and only if it is rational; this does not depend on the base. A number that terminates in one base may repeat in another (thus $0.3_{10} = 0.0100110011001\ldots_2$). An irrational number stays aperiodic (infinitely many non-repeating digits) in all integral bases. Thus, for example in base 2, $\pi = 3.1415926\ldots_{10}$ can be written down as the aperiodic $11.001001000011111\ldots_2$.
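The claim about rational numbers can be checked mechanically: during the digit-by-digit expansion only finitely many remainders are possible, so the expansion either reaches zero or revisits a remainder and repeats from there. The Python sketch below, with the hypothetical name `expand_fraction`, splits the expansion of a fraction between 0 and 1 into its non-repeating and repeating parts.

```python
from fractions import Fraction


def expand_fraction(x, b):
    """Base-b expansion of a rational 0 <= x < 1.

    Returns (prefix, cycle): the digits before the repetition starts and
    the repeating block. An empty cycle means the expansion terminates.
    """
    if not 0 <= x < 1:
        raise ValueError("x must satisfy 0 <= x < 1")
    seen = {}    # remainder -> index of the digit it produces
    digits = []
    while x != 0 and x not in seen:
        seen[x] = len(digits)
        x *= b
        d = x.numerator // x.denominator  # integer part is the next digit
        digits.append(d)
        x -= d
    if x == 0:
        return digits, []
    start = seen[x]
    return digits[:start], digits[start:]


print(expand_fraction(Fraction(3, 10), 10))  # ([3], [])
print(expand_fraction(Fraction(3, 10), 2))   # ([0], [1, 0, 0, 1])
```

The second result reads as 0.0 followed by the block 1001 repeated forever, which is the binary expansion of three tenths given above.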

If b=p is a prime number, one can define base-p numerals whose expansion to the left never stops; these are called the p-adic numbers.

Change of radix
A simple algorithm for converting integers between positive-integer radices is repeated division by the target radix; the remainders give the "digits" starting at the least significant. E.g., 1020304 base 10 into base 7:

    1020304 / 7 = 145757 r 5
     145757 / 7 =  20822 r 3
      20822 / 7 =   2974 r 4
       2974 / 7 =    424 r 6
        424 / 7 =     60 r 4
         60 / 7 =      8 r 4
          8 / 7 =      1 r 1
          1 / 7 =      0 r 1
    => 11446435

E.g., 10110111 base 2 into base 5:

    10110111 / 101 = 100100 r 11 (3)
      100100 / 101 =    111 r  1 (1)
         111 / 101 =      1 r 10 (2)
           1 / 101 =      0 r  1 (1)
    => 1213
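The repeated-division procedure used in both examples can be sketched in Python as follows. The name `to_base` is hypothetical, and the result is returned as a list of integer digits, most significant first.

```python
def to_base(n, b):
    """Digits of the non-negative integer n in base b, most significant first."""
    if n < 0 or b < 2:
        raise ValueError("need n >= 0 and b >= 2")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, r = divmod(n, b)  # quotient carries on, remainder is a digit
        digits.append(r)     # remainders arrive least significant first
    return digits[::-1]


print(to_base(1020304, 7))           # [1, 1, 4, 4, 6, 4, 3, 5]
print(to_base(int("10110111", 2), 5))  # [1, 2, 1, 3]
```

Python's built-in `int` with a base argument is used only to read the binary numeral in the second example; the conversion itself is the division loop.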

To convert a "decimal" fraction, do repeated multiplication, taking the protruding integer parts as the "digits". Unfortunately a terminating fraction in one base may not terminate in another. E.g., 0.1A4C base 16 into base 9:

    0.1A4C × 9 = 0.ECAC
    0.ECAC × 9 = 8.520C
    0.520C × 9 = 2.E26C
    0.E26C × 9 = 7.F5CC
    0.F5CC × 9 = 8.A42C
    0.A42C × 9 = 5.C58C
    => 0.082785...
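The repeated-multiplication procedure can be sketched in Python with exact rational arithmetic, so that no rounding disturbs the digits. The name `fraction_digits` and the explicit digit count are assumptions for illustration; because the expansion may never terminate, the caller decides how many digits to produce.

```python
from fractions import Fraction


def fraction_digits(x, b, count):
    """First `count` base-b digits of the fractional number 0 <= x < 1."""
    if not 0 <= x < 1:
        raise ValueError("x must satisfy 0 <= x < 1")
    digits = []
    for _ in range(count):
        x *= b
        d = x.numerator // x.denominator  # the protruding integer part
        digits.append(d)
        x -= d
    return digits


# 0.1A4C in base 16 is the fraction 0x1A4C / 16^4.
x = Fraction(0x1A4C, 16 ** 4)
print(fraction_digits(x, 9, 6))  # [0, 8, 2, 7, 8, 5]
```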

Generalized variable-length integers
More general is using a notation (here written little-endian) like $a_0 a_1 a_2$ for $a_0 + a_1 b_1 + a_2 b_1 b_2$, etc.

This is used in punycode, one aspect of which is the representation of a sequence of non-negative integers of arbitrary size in the form of a sequence without delimiters, of "digits" from a collection of 36: a-z and 0-9, representing 0-25 and 26-35 respectively. A digit lower than a threshold value marks that it is the most-significant digit, hence the end of the number. The threshold value depends on the position in the number. For example, if the threshold value for the first digit is b (1) then a (0) marks the end of the number (it has just one digit), so in numbers of more than one digit the range is only b-9 (1-35), therefore the weight $b_1$ is 35 instead of 36. Suppose the threshold values for the second and third digit are c (2), then the third digit has a weight 34 × 35 = 1190 and we have the following sequence:

a (0), ba (1), ca (2), .., 9a (35), bb (36), cb (37), .., 9b (70), bca (71), .., 99a (1260), bcb (1261), etc.

Note that unlike a regular base-35 numeral system, we have numbers like 9b where 9 and b each represent 35; yet the representation is unique because ac and aca are not allowed.
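The example can be reproduced with a small decoder. The Python sketch below is an illustration of the scheme as described here, not an implementation of punycode itself; the names `decode_stream` and `example_threshold` are hypothetical, and the threshold function encodes the example values (1 for the first digit, 2 for later digits).

```python
import string

# a-z stand for 0-25 and 0-9 for 26-35, as described in the text.
ALPHABET = string.ascii_lowercase + string.digits
BASE = len(ALPHABET)  # 36


def example_threshold(position):
    """Thresholds of the worked example: b (1) first, then c (2)."""
    return 1 if position == 0 else 2


def decode_stream(text, threshold=example_threshold):
    """Decode a delimiter-free string of little-endian numbers."""
    values = []
    pos = 0
    while pos < len(text):
        value, weight, i = 0, 1, 0
        while True:
            if pos >= len(text):
                raise ValueError("input ends in the middle of a number")
            d = ALPHABET.index(text[pos])
            pos += 1
            value += d * weight
            t = threshold(i)
            if d < t:            # a digit below the threshold ends the number
                break
            weight *= BASE - t   # later digits only use the range t..35
            i += 1
        values.append(value)
    return values


print(decode_stream("a"))         # [0]
print(decode_stream("9b"))        # [70]
print(decode_stream("bcb"))       # [1261]
print(decode_stream("aba9bbca"))  # [0, 1, 70, 71]
```

The last call shows several numbers packed into one string without separators; the thresholds alone determine where each number ends.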

The flexibility in choosing threshold values allows optimization depending on the frequency of occurrence of numbers of various sizes.

The case with all threshold values equal to 1 corresponds to bijective numeration, where the zeros correspond to separators of numbers with digits which are nonzero.

References

 * Georges Ifrah. The Universal History of Numbers : From Prehistory to the Invention of the Computer, Wiley, 1999. ISBN 0471375683
 * D. Knuth. The Art of Computer Programming. Volume 2, 3rd Ed. Addison-Wesley. pp.194–213, "Positional Number Systems".
 * J.P. Mallory and D.Q. Adams, Encyclopedia of Indo-European Culture, Fitzroy Dearborn Publishers, London and Chicago, 1997.