Probability axioms

The probability of an event E, denoted $$P(E)$$, is defined with respect to a "universe", or sample space $$\Omega$$, of all possible elementary events, in such a way that P satisfies the Kolmogorov axioms.

Alternatively, a probability can be interpreted as a measure on a σ-algebra of subsets of the sample space, those subsets being the events, such that the measure of the whole set equals 1. This normalization is important, since it gives rise to the natural concept of conditional probability. Every set $$A$$ with non-zero probability (that is, $$P(A) > 0$$) defines another probability


 * $$P(B \vert A) = {P(B \cap A) \over P(A)}$$

on the space. This is usually read as "probability of B given A". If the conditional probability of B given A is the same as the probability of B, then A and B are said to be independent.
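As an illustration, the following minimal sketch computes conditional probabilities by counting outcomes. It assumes a single roll of a fair six-sided die; the names `omega`, `prob` and `cond_prob` are hypothetical helpers introduced only for this example.

```python
from fractions import Fraction

# Assumed example: one roll of a fair die, every outcome equally likely.
omega = frozenset(range(1, 7))

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(b, a):
    """P(B | A) = P(B ∩ A) / P(A); only defined when P(A) > 0."""
    p_a = prob(a)
    if p_a == 0:
        raise ValueError("P(A) must be non-zero")
    return prob(b & a) / p_a

even = frozenset({2, 4, 6})
at_most_four = frozenset({1, 2, 3, 4})
at_least_four = frozenset({4, 5, 6})

# Conditioning on "at most four" leaves P(even) unchanged: independent events.
p_even_given_low = cond_prob(even, at_most_four)

# Conditioning on "at least four" changes P(even): not independent.
p_even_given_high = cond_prob(even, at_least_four)

# The conditional measure is itself normalized: P(omega | A) = 1.
p_omega_given_low = cond_prob(omega, at_most_four)
```

In this sketch `p_even_given_low` equals `prob(even)`, so the two events are independent, while `p_even_given_high` differs from it.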

In the case that the sample space is finite or countably infinite, a probability function can also be defined by its values on the elementary events $$\{e_1\}, \{e_2\}, \dots$$ where $$\Omega = \{\,e_1, e_2, \dots\,\}.\,$$
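A small sketch of this construction for a finite sample space follows. The loaded die and its weights are made up for illustration; any non-negative weights that sum to 1 would serve equally well.

```python
from fractions import Fraction

# Hypothetical loaded die: the weight of each elementary event is an assumption.
weights = {
    1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
    4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2),
}

# The weights must be non-negative and sum to 1 to define a probability.
assert all(w >= 0 for w in weights.values())
assert sum(weights.values()) == 1

def prob(event):
    """P(event) is the sum of the weights of its elementary events."""
    return sum((weights[e] for e in event), Fraction(0))

p_even = prob({2, 4, 6})
p_everything = prob(set(weights))
p_nothing = prob(set())
```

The probability of every event is determined by the weights alone, which is why specifying them suffices in the finite or countable case.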

Kolmogorov axioms
The following three axioms are known as the Kolmogorov axioms, after Andrey Kolmogorov who developed them. We have an underlying set Ω, a σ-algebra F of subsets of Ω, and a function P assigning real numbers to members of F. The members of F are those subsets of Ω that are called "events".

First axiom

 * For any set $$E\in F$$, that is, for any event $$E$$, we have $$P(E)\geq 0$$.

That is, the probability of an event is a non-negative real number.

Second axiom

 * $$P(\Omega) = 1.\,$$

That is, the probability that some elementary event in the entire sample space will occur is 1. More specifically, there are no elementary events outside the sample space.

This is often overlooked in mistaken probability calculations: if the whole sample space cannot be precisely defined, then the probability of any subset cannot be defined either.

Third axiom

 * Any countable sequence of pairwise disjoint events $$E_1, E_2, \dots$$ satisfies $$P(E_1 \cup E_2 \cup \cdots) = \sum_i P(E_i)$$.

That is, the probability of an event that is the union of countably many pairwise disjoint events is the sum of the probabilities of those events. This is called σ-additivity. If the events overlap, this relation does not hold in general. Some authors consider merely finitely additive probability spaces, in which case one needs only an algebra of sets rather than a σ-algebra.
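The three axioms can be checked mechanically on a small finite example. The sketch below assumes a three-point sample space with made-up weights and takes F to be the full power set; on a finite space, additivity for every pair of disjoint events is enough, since countable additivity reduces to finite additivity there.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical three-point sample space; labels and weights are assumptions.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
omega = frozenset(weights)

def prob(event):
    """P(event) as the sum of the weights of its elementary events."""
    return sum((weights[e] for e in event), Fraction(0))

def power_set(s):
    """All subsets of s; this plays the role of the sigma-algebra F."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(c) for c in subsets]

events = power_set(omega)

# First axiom: non-negativity.
axiom1 = all(prob(e) >= 0 for e in events)

# Second axiom: the whole sample space has probability 1.
axiom2 = prob(omega) == 1

# Third axiom, restricted to pairs of disjoint events.
axiom3 = all(
    prob(e1 | e2) == prob(e1) + prob(e2)
    for e1 in events for e2 in events
    if not (e1 & e2)
)

# With overlapping events the plain sum overcounts the intersection.
e1, e2 = frozenset({"a", "b"}), frozenset({"b", "c"})
p_union = prob(e1 | e2)
p_naive_sum = prob(e1) + prob(e2)
```

Here `p_naive_sum` exceeds 1 while `p_union` does not, which shows why disjointness is required.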

For an algebraic alternative to Kolmogorov's approach, see algebra of random variables.

Lemmas in probability
From the Kolmogorov axioms one can deduce other useful rules for calculating probabilities:


 * $$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

This is called the addition law of probability, or the sum rule. That is, the probability that A or B will happen is the sum of the probabilities that A will happen and that B will happen, minus the probability that A and B will happen. This can be extended to the inclusion-exclusion principle.
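The addition law can be verified by direct counting. The following sketch assumes one roll of a fair die; the two events were chosen so that they overlap.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event), len(omega))

a = frozenset({2, 4, 6})   # the roll is even
b = frozenset({4, 5, 6})   # the roll is at least four

# Left-hand side: probability of the union, counted directly.
lhs = prob(a | b)

# Right-hand side: the sum, corrected for the doubly counted intersection.
rhs = prob(a) + prob(b) - prob(a & b)
```

Both sides come out equal; dropping the correction term `prob(a & b)` would count the outcomes 4 and 6 twice.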


 * $$P(\Omega\setminus E) = 1 - P(E)$$

That is, the probability that any event will not happen is 1 minus the probability that it will.
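The complement rule is often the easiest route to an "at least one" probability. As a sketch, assuming two rolls of a fair die, the probability of at least one six can be obtained from the probability of no six at all:

```python
from fractions import Fraction
from itertools import product

# Assumed example: two rolls of a fair die, 36 equally likely ordered pairs.
omega = frozenset(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event), len(omega))

no_six = frozenset(o for o in omega if 6 not in o)
at_least_one_six = omega - no_six   # the complement of no_six

p_no_six = prob(no_six)
p_direct = prob(at_least_one_six)
p_via_complement = 1 - p_no_six
```

Counting the complement directly and subtracting from 1 give the same value.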

Using conditional probability as defined above, it also follows immediately that


 * $$P(A \cap B) = P(A) \cdot P(B \vert A)$$

That is, the probability that A and B will happen is the probability that A will happen, times the probability that B will happen given that A happened. Writing the same intersection as $$P(B) \cdot P(A \vert B)$$ and equating the two expressions gives Bayes' theorem. It then follows that


 * A and B are independent if and only if $$P(A \cap B) = P(A) \cdot P(B)$$.
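The product rule, Bayes' theorem and the independence criterion can all be checked on one small example. The sketch below assumes two rolls of a fair die; the events and helper names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two rolls of a fair die, 36 equally likely ordered pairs.
omega = frozenset(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event), len(omega))

def cond_prob(b, a):
    """P(B | A) = P(B ∩ A) / P(A), assuming P(A) > 0."""
    return prob(b & a) / prob(a)

def independent(a, b):
    """True when P(A ∩ B) = P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

first_even = frozenset(o for o in omega if o[0] % 2 == 0)
sum_is_7 = frozenset(o for o in omega if sum(o) == 7)
sum_is_8 = frozenset(o for o in omega if sum(o) == 8)

# Product rule: P(A ∩ B) = P(A) * P(B | A).
p_joint = prob(first_even & sum_is_8)
p_product_rule = prob(first_even) * cond_prob(sum_is_8, first_even)

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).
bayes_direct = cond_prob(first_even, sum_is_8)
bayes_formula = (
    cond_prob(sum_is_8, first_even) * prob(first_even) / prob(sum_is_8)
)

# Independence: holds for a sum of 7, fails for a sum of 8.
indep_with_7 = independent(first_even, sum_is_7)
indep_with_8 = independent(first_even, sum_is_8)
```

Knowing that the first roll is even says nothing about whether the sum is 7, but it does make a sum of 8 more likely, which is what the two independence checks reflect.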