Law of large numbers

The law of large numbers is a fundamental concept in statistics and probability that describes how the average of a randomly selected large sample from a population is likely to be close to the average of the whole population. The term "law of large numbers" was introduced by S.D. Poisson in 1835 as he discussed a 1713 version of it put forth by James Bernoulli.

In formal language: "If an event of probability p is observed repeatedly during independent repetitions, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of repetitions becomes arbitrarily large."
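The statement above can be illustrated with a small simulation. The following is a minimal sketch, not a proof: the function name `observed_frequency`, the choice p = 0.3, and the fixed seed are assumptions made only for this illustration.

```python
import random

def observed_frequency(p, n, seed=0):
    """Fraction of n independent trials in which an event of probability p occurs."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n

if __name__ == "__main__":
    # The observed frequency should drift towards p = 0.3 as n grows.
    for n in (10, 1_000, 100_000):
        print(n, observed_frequency(0.3, n))
```

For small n the ratio can be far from p; only for large n does it tend to settle near p.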

In statistics, this means that when a large number of units of something is measured, the sample's average will be close to the true average of all of the units, including those that were not measured. (The term "average" means the arithmetic mean.)

There are two versions of the law of large numbers, one called the "weak" law and the other the "strong" law. This article describes both versions in technical detail, but in essence they are not different laws; they refer to different modes of convergence of the sample mean to the population mean. The weak law states that, for any fixed margin of error, however small, the probability that the sample mean differs from the population mean by more than that margin approaches zero as the sample size grows. The strong law states that, with probability 1, the sequence of sample means converges to the population mean as the sample size grows.

The phrase "law of large numbers" is also sometimes used in a less technical way to refer to the principle that the probability of any possible event (even an unlikely one) occurring at least once in a series increases with the number of events in the series. For example, the odds that you will win the lottery are very low; however, the odds that someone will win the lottery are quite good, provided that a large enough number of people purchased lottery tickets.
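This informal principle can be made concrete under a simplifying assumption that the text does not state: that the n events are independent and each has the same probability p. Then the probability of at least one occurrence is 1 − (1 − p)^n. The sketch below uses a hypothetical per-ticket win probability of one in a million.

```python
def prob_at_least_once(p, n):
    """Probability that an event of per-trial probability p occurs
    at least once in n independent trials (independence is assumed)."""
    return 1.0 - (1.0 - p) ** n

if __name__ == "__main__":
    p = 1e-6  # hypothetical chance that a single ticket wins
    for n in (1, 1_000_000, 10_000_000):
        print(n, prob_at_least_once(p, n))
```

The probability for a single ticket stays tiny, while the probability that at least one of many tickets wins climbs towards 1 as n grows.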

One misperception of the LLN is that if an event has not occurred in many trials, the probability of it occurring in a subsequent trial is increased. For example, the probability of a fair die turning up a 3 is 1 in 6. The LLN says that over a large number of throws, the observed frequency of 3s will be close to 1 in 6. This does not mean, however, that if the first 5 throws of the die do not turn up a 3, the sixth throw is any more likely to turn up a 3, let alone certain to (probability 1). The probability of the sixth throw turning up a 3 remains 1 in 6. In a sequence of independent observations, however long, the value of any one observation cannot be predicted from past observations. The belief that it can is known as the gambler's fallacy.
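The die example can be checked empirically. The sketch below simulates a fair die and looks only at throws that come right after at least five consecutive non-3 throws. The function name, the seed, and the number of throws are assumptions for illustration.

```python
import random

def freq_three_after_drought(n_throws, drought=5, seed=1):
    """Among throws preceded by at least `drought` consecutive non-3 throws,
    return the fraction that show a 3."""
    rng = random.Random(seed)
    run = 0        # current number of consecutive non-3 throws
    eligible = 0   # throws that follow a long enough run without a 3
    threes = 0     # how many of those eligible throws were a 3
    for _ in range(n_throws):
        face = rng.randint(1, 6)
        if run >= drought:
            eligible += 1
            if face == 3:
                threes += 1
        run = 0 if face == 3 else run + 1
    return threes / eligible if eligible else float("nan")

if __name__ == "__main__":
    print(freq_three_after_drought(600_000))
```

If a 3 were "due" after a drought, this fraction would sit clearly above 1/6; for independent throws it should stay close to 1/6.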

The law of large numbers and the central limit theorem
The central limit theorem (CLT) gives the approximate distribution of sums of independent, identically distributed random variables, regardless of the shape of their common distribution (as long as it has finite variance), when the number of random variables added is large. The CLT thus applies to the sample mean of a large sample, since the mean is a sum divided by the sample size. Because the variance of the sample mean given by the CLT shrinks to zero as the sample size grows, the sample mean converges to a single number, which the CLT identifies as the population mean. This is the LLN. So, when the variance is finite, the LLN can be obtained from the CLT.
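The shrinking variance of the sample mean can be observed numerically. The following sketch draws repeated samples from the uniform distribution on [0, 1] (variance 1/12), an arbitrary choice made for this illustration, and estimates the variance of the sample mean, which should be close to (1/12)/n.

```python
import random
import statistics

def variance_of_sample_mean(n, reps=2000, seed=2):
    """Empirical variance of the mean of n uniform(0, 1) draws,
    estimated from `reps` independent repetitions."""
    rng = random.Random(seed)
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]
    return statistics.pvariance(means)

if __name__ == "__main__":
    for n in (4, 25, 100):
        # Compare the estimate with the theoretical value sigma^2 / n.
        print(n, variance_of_sample_mean(n), (1 / 12) / n)
```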

The CLT enables statisticians to evaluate the reliability of their results, because they can make assumptions about a sample and extrapolate their results or conclusions, with a certain degree of confidence, to the population from which the sample was drawn. See Statistical hypothesis testing.

The remainder of this article will assume the reader has a familiarity with mathematical concepts and notation.

The weak law
The weak law of large numbers states that if $$X_1, X_2, X_3, \ldots$$ is an infinite sequence of random variables, where all the random variables have the same expected value μ and the same finite variance σ², and are uncorrelated (i.e., the correlation between any two of them is zero), then the sample average


 * $$\overline{X}_n=(X_1+\cdots+X_n)/n$$

converges in probability to μ. Somewhat less tersely: For any positive number ε, no matter how small, we have


 * $$\lim_{n\rightarrow\infty}\operatorname{P}\left(\left|\overline{X}_n-\mu\right|<\varepsilon\right)=1.$$

Proof
Chebyshev's inequality is used to prove this result. Finite variance $$ \operatorname{Var} (X_i)=\sigma^2 $$ (for all $$i$$) and no correlation yield that

 * $$\operatorname{Var}(\overline{X}_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$

The common mean μ of the sequence is the mean of the sample average:



 * $$\operatorname{E}(\overline{X}_n) = \mu.$$

Using Chebyshev's inequality on $$\overline{X}_n $$ results in



 * $$\operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \leq \frac{\sigma^2}{\varepsilon^2 n}.$$

This may be used to obtain the following:



 * $$\operatorname{P}( \left| \overline{X}_n-\mu \right| < \varepsilon) = 1 - \operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \geq 1 - \frac{\sigma^2}{\varepsilon^2 n}.$$

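As n approaches infinity, the lower bound on the right approaches 1. Since a probability cannot exceed 1, the probability on the left approaches 1 as well.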

Proof ends here
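The Chebyshev bound in the proof also gives a crude sample-size estimate: requiring the bound 1 − σ²/(ε²n) to be at least 1 − δ gives n ≥ σ²/(ε²δ). The sketch below evaluates this for a fair die (variance 35/12). The function names and the values ε = 0.1 and δ = 0.05 are assumptions chosen for illustration.

```python
import math

def chebyshev_lower_bound(sigma2, eps, n):
    """Lower bound 1 - sigma^2 / (eps^2 n) on P(|mean - mu| < eps),
    clamped at 0 because the raw bound is negative for small n."""
    return max(0.0, 1.0 - sigma2 / (eps ** 2 * n))

def chebyshev_sample_size(sigma2, eps, delta):
    """Smallest n for which the Chebyshev bound guarantees
    P(|mean - mu| < eps) >= 1 - delta."""
    return math.ceil(sigma2 / (eps ** 2 * delta))

if __name__ == "__main__":
    sigma2 = 35 / 12  # variance of a single throw of a fair die
    n = chebyshev_sample_size(sigma2, eps=0.1, delta=0.05)
    print(n, chebyshev_lower_bound(sigma2, 0.1, n))
```

The bound is conservative: it holds for every distribution with the given variance, so the sample size it suggests is usually much larger than strictly necessary.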

The result holds also for the 'infinite variance' case, provided the $$ X_i $$ are mutually independent and identically distributed and their (finite) mean μ exists.

A consequence of the weak law of large numbers is the asymptotic equipartition property.

The strong law
The strong law of large numbers states that if $$X_1, X_2, X_3, \ldots$$ is an infinite sequence of random variables that are pairwise independent and identically distributed with $$\operatorname{E}(|X_i|) < \infty$$ (and where the common expected value is μ), then


 * $$\operatorname{P}\left(\lim_{n\rightarrow\infty}\overline{X}_n=\mu\right)=1,$$

i.e., the sample average converges almost surely to μ.

If we replace the finite expectation condition with a finite second moment condition, $$\operatorname{E}(X_i^2) < \infty$$ (which is the same as assuming that $$X_i$$ has finite variance), then we obtain both almost sure convergence and convergence in mean square. In either case, these conditions also imply the corresponding weak law of large numbers, since almost sure convergence implies convergence in probability (as, indeed, does convergence in mean square).

This law justifies the intuitive interpretation of the expected value of a random variable as the "long-term average when sampling repeatedly".
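The "long-term average" reading can be illustrated by following one single sequence of throws of a fair die (expected value 3.5) and tracking its running average. A finite simulation can only suggest almost sure convergence, not verify it. The function names and the seed below are assumptions for this sketch.

```python
import random

def running_means(n, seed=3):
    """Running averages of one sequence of n fair-die throws."""
    rng = random.Random(seed)
    total = 0
    means = []
    for k in range(1, n + 1):
        total += rng.randint(1, 6)
        means.append(total / k)  # average of the first k throws
    return means

def max_deviation_after(means, start, mu=3.5):
    """Largest |mean_k - mu| over all k > start along this one path."""
    return max(abs(m - mu) for m in means[start:])

if __name__ == "__main__":
    path = running_means(200_000)
    for start in (10, 1_000, 100_000):
        print(start, max_deviation_after(path, start))
```

The strong law concerns the whole tail of one path, which is why the sketch reports the largest deviation after a given index rather than the deviation at a single index.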

A weaker law and proof
Proofs of the strong law, and of the weak law without the finite-variance assumption, are rather involved. The conclusion of the slightly weaker form below is implied by the weak law above (since convergence in distribution is implied by convergence in probability), but has a simpler proof.

Theorem. Let $$X_1, X_2, X_3, \ldots$$ be a sequence of independent and identically distributed random variables with finite common mean μ, and define the partial sum $$S_n := X_1 + X_2 + \cdots + X_n$$. Then $$S_n/n$$ converges in distribution to μ.

Proof. (See [1], p. 174) By Taylor's theorem for complex functions, the characteristic function of any random variable, X, with finite mean μ, can be written as


 * $$\varphi(t) = 1 + it\mu + o(t), \quad t \rightarrow 0.$$

Then, since the characteristic function of the sum of independent random variables is the product of their characteristic functions, the characteristic function of $$S_n/n$$ is


 * $$\left[\varphi\left({t \over n}\right)\right]^n = \left[1 + i\mu{t \over n} + o\left({t \over n}\right)\right]^n \, \rightarrow \, e^{it\mu}, \quad \textrm{as} \quad n \rightarrow \infty.$$

The limit $$e^{it\mu}$$ is the characteristic function of the constant random variable μ, and hence by the Lévy continuity theorem, $$S_n/n$$ converges in distribution to μ. Note that the proof of the central limit theorem, which tells us more about the convergence of the average to μ (when the variance $$\sigma^2$$ is finite), follows a very similar approach.
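The limit used in the proof can be checked numerically for a concrete distribution. The sketch below assumes, purely for illustration, that the terms follow the exponential distribution with rate 1, whose characteristic function is 1/(1 − it) and whose mean is μ = 1. It compares the characteristic function of the average of n such terms with the limit e^{itμ}.

```python
import cmath

MU = 1.0  # mean of the Exponential(1) distribution assumed in this sketch

def phi_exponential(t):
    """Characteristic function of the Exponential(1) distribution."""
    return 1 / (1 - 1j * t)

def phi_of_average(t, n):
    """Characteristic function of S_n / n for n i.i.d. Exponential(1) terms."""
    return phi_exponential(t / n) ** n

def gap_to_limit(t, n):
    """Distance between the characteristic function of S_n / n and e^{i t mu}."""
    return abs(phi_of_average(t, n) - cmath.exp(1j * t * MU))

if __name__ == "__main__":
    for n in (10, 1_000, 1_000_000):
        print(n, gap_to_limit(2.0, n))
```

The gap should shrink as n grows for any fixed t, which is the pointwise convergence that the Lévy continuity theorem requires.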