Yule–Simon distribution

In probability and statistics, the Yule–Simon distribution is a discrete probability distribution named after Udny Yule and Herbert Simon. Simon originally called it the Yule distribution.

The probability mass function of the Yule–Simon($$\rho$$) distribution is


 * $$f(k;\rho) = \rho\,\mathrm{B}(k, \rho+1), \,$$

for integer $$k \geq 1$$ and real $$\rho > 0$$, where $$\mathrm{B}$$ is the beta function. Equivalently, the pmf can be written in terms of the falling factorial as



 * $$f(k;\rho) = \frac{\rho\,\Gamma(\rho+1)}{(k+\rho)^{\underline{\rho+1}}} , \,$$

where $$\Gamma$$ is the gamma function. Thus, if $$\rho$$ is an integer,



 * $$f(k;\rho) = \frac{\rho\,\rho!\,(k-1)!}{(k+\rho)!} . \,$$
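
As a hedged illustration, the beta-function form and the factorial form can be compared numerically. The sketch below uses only Python's standard library; the function names `yule_simon_pmf` and `yule_simon_pmf_integer_rho` are chosen here for illustration and are not taken from any particular package.

```python
import math


def yule_simon_pmf(k, rho):
    """Evaluate rho * B(k, rho + 1) for integer k >= 1 and real rho > 0."""
    if k < 1 or rho <= 0:
        raise ValueError("need k >= 1 and rho > 0")
    # log B(k, rho + 1) = lgamma(k) + lgamma(rho + 1) - lgamma(k + rho + 1);
    # working in log space avoids overflow of the gamma function for large k.
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)


def yule_simon_pmf_integer_rho(k, rho):
    """Factorial form of the pmf, valid only when rho is a positive integer."""
    return rho * math.factorial(rho) * math.factorial(k - 1) / math.factorial(k + rho)
```

For $$\rho = 2$$ and $$k = 1$$ both functions should return $$2/3$$ up to rounding, since $$2 \cdot 2! \cdot 0! / 3! = 2/3$$.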

The parameter $$\rho$$ can be estimated using a fixed-point algorithm.
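
One possible fixed-point iteration can be sketched as follows. It is derived here from the likelihood equation and is not necessarily the specific algorithm the text refers to. Setting the derivative of the log-likelihood of observations $$k_1,\dots,k_n$$ to zero gives $$\rho = n / \sum_i \left[\psi(k_i+\rho+1) - \psi(\rho+1)\right]$$, where $$\psi$$ is the digamma function, and for integer $$k_i$$ each bracket equals $$\sum_{j=1}^{k_i} 1/(\rho+j)$$. The names `harmonic_shift` and `estimate_rho`, the starting value and the tolerance are assumptions made for this example.

```python
def harmonic_shift(k, rho):
    """Sum of 1 / (rho + j) for j = 1..k, i.e. psi(k + rho + 1) - psi(rho + 1)."""
    return sum(1.0 / (rho + j) for j in range(1, k + 1))


def estimate_rho(sample, rho0=1.0, tol=1e-10, max_iter=10_000):
    """Iterate rho <- n / sum_i harmonic_shift(k_i, rho) until it stabilises."""
    n = len(sample)
    if n == 0 or any(k < 1 for k in sample):
        raise ValueError("sample must be non-empty with all values >= 1")
    if all(k == 1 for k in sample):
        # f(1; rho) = rho / (rho + 1) increases with rho, so the likelihood
        # has no finite maximiser when every observation equals 1.
        raise ValueError("no finite estimate when every observation is 1")
    rho = rho0
    for _ in range(max_iter):
        new_rho = n / sum(harmonic_shift(k, rho) for k in sample)
        if abs(new_rho - rho) < tol:
            return new_rho
        rho = new_rho
    raise RuntimeError("fixed-point iteration did not converge")
```

A call such as `estimate_rho([1, 1, 1, 2, 1, 3, 1, 1, 5, 2])` returns a value at which the derivative of the log-likelihood is numerically zero.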

The probability mass function $$f$$ has the property that for sufficiently large $$k$$ we have



 * $$f(k;\rho) \approx \frac{\rho\,\Gamma(\rho+1)}{k^{\rho+1}} \propto \frac{1}{k^{\rho+1}} . \,$$

This means that the tail of the Yule–Simon distribution is a realization of Zipf's law: $$f(k;\rho)$$ can be used to model, for example, the relative frequency of the $$k$$th most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of $$k$$.
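
The quality of the power-law approximation can be checked numerically. The following minimal sketch computes the ratio of the exact pmf to $$\rho\,\Gamma(\rho+1)/k^{\rho+1}$$; the function name `tail_ratio` is an assumption of this example.

```python
import math


def tail_ratio(k, rho):
    """Exact pmf divided by the approximation rho * Gamma(rho + 1) / k**(rho + 1)."""
    # The common factor rho * Gamma(rho + 1) cancels, leaving
    # Gamma(k) * k**(rho + 1) / Gamma(k + rho + 1), evaluated in log space.
    log_ratio = math.lgamma(k) + (rho + 1) * math.log(k) - math.lgamma(k + rho + 1)
    return math.exp(log_ratio)
```

Evaluating `tail_ratio(k, 1.5)` for $$k = 10, 100, 1000, 10000$$ should give values that approach 1 from below, consistent with the approximation becoming exact in the limit of large $$k$$.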

Occurrence
The Yule–Simon distribution arose originally as the limiting distribution of a particular stochastic process studied by Yule as a model for the distribution of biological taxa and subtaxa. Simon dubbed this process the "Yule process" but it is more commonly known today as a preferential attachment process. The preferential attachment process is an urn process in which balls are added to a growing number of urns, each ball being allocated to an urn with probability linear in the number the urn already contains.
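
The urn process can be simulated with a short sketch. The text above does not say how new urns are created; the code assumes, as in Simon's formulation, that each new ball starts a new urn with a fixed probability `p_new` and otherwise joins an existing urn chosen with probability proportional to its size. Under that assumption the urn sizes are expected to approach a Yule–Simon distribution with $$\rho = 1/(1 - p_{\text{new}})$$. The names `simulate_urns`, `size_frequencies` and `p_new` are hypothetical.

```python
import random
from collections import Counter


def simulate_urns(steps, p_new, seed=0):
    """Add `steps` balls one at a time and return the list of urn sizes."""
    rng = random.Random(seed)
    urn_sizes = [1]    # start with a single urn holding one ball
    ball_owner = [0]   # ball_owner[b] is the index of the urn holding ball b
    for _ in range(steps):
        if rng.random() < p_new:
            # Assumed rule: the ball founds a new urn.
            urn_sizes.append(1)
            ball_owner.append(len(urn_sizes) - 1)
        else:
            # Choosing an existing ball uniformly at random selects its urn
            # with probability proportional to the urn's current size.
            urn = ball_owner[rng.randrange(len(ball_owner))]
            urn_sizes[urn] += 1
            ball_owner.append(urn)
    return urn_sizes


def size_frequencies(urn_sizes):
    """Relative frequency of each urn size."""
    counts = Counter(urn_sizes)
    total = len(urn_sizes)
    return {size: count / total for size, count in sorted(counts.items())}
```

With `p_new = 0.5` the assumed limit has $$\rho = 2$$, so roughly $$f(1;2) = 2/3$$ of the urns should contain a single ball after a long run.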

The distribution also arises as a continuous mixture of geometric distributions. Specifically, assume that $$W$$ follows an exponential distribution with scale $$1/\rho$$ or rate $$\rho$$:


 * $$W \sim \mathrm{Exponential}(\rho)\,$$
 * $$h(w;\rho) = \rho \, \exp(-\rho\,w)\,$$

Then, conditional on $$W$$, a Yule–Simon distributed variable $$K$$ has the following geometric distribution:


 * $$K \sim \mathrm{Geometric}(\exp(-W))\,$$

The pmf of a geometric distribution is


 * $$g(k; p) = p \, (1-p)^{k-1}\,$$

for $$k\in\{1,2,\dots\}$$. The Yule–Simon pmf is then the following exponential-geometric mixture distribution:


 * $$f(k;\rho) = \int_0^{\infty} g(k;\exp(-w))\,h(w;\rho)\,dw . \,$$
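
The mixture representation can be illustrated in two ways: by evaluating the integral numerically and by drawing samples in two stages. In the sketch below, the substitution $$u = \exp(-w)$$ turns the integral into $$\int_0^1 \rho\,u^{\rho}(1-u)^{k-1}\,du = \rho\,\mathrm{B}(k,\rho+1)$$, which is approximated with a midpoint rule. The function names and the number of quadrature points are assumptions, and the sampler is intended for moderate values of $$\rho$$ only.

```python
import math
import random


def mixture_pmf(k, rho, n=20_000):
    """Midpoint-rule value of the mixture integral after substituting u = exp(-w)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        # g(k; u) * h(w; rho) * |dw/du| = rho * u**rho * (1 - u)**(k - 1)
        total += rho * u**rho * (1.0 - u) ** (k - 1)
    return total * h


def sample_yule_simon(rho, rng):
    """Draw W ~ Exponential(rate rho), then K ~ Geometric(exp(-W)) on {1, 2, ...}."""
    w = rng.expovariate(rho)
    p = math.exp(-w)
    if p >= 1.0:
        return 1
    if p == 0.0:
        # exp(-w) underflowed; the draw would be too large to represent.
        raise OverflowError("sampled value too large to represent")
    u = 1.0 - rng.random()  # uniform on (0, 1]
    # Inverse-transform sampling: P(K > k) = (1 - p)**k.
    return int(math.log(u) / math.log1p(-p)) + 1
```

For example, `mixture_pmf(3, 2.0)` should agree closely with $$\rho\,\mathrm{B}(3, 3) = 1/15$$, and about two thirds of the draws from `sample_yule_simon(2.0, random.Random(0))` should equal 1.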

Generalizations
The two-parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function. The probability mass function of the generalized Yule–Simon($$\rho$$, $$\alpha$$) distribution is defined as



 * $$f(k;\rho,\alpha) = \frac{\rho}{1-\alpha^{\rho}} \; \mathrm{B}_{1-\alpha}(k, \rho+1) , \,$$

with $$0 \leq \alpha < 1$$. For $$\alpha = 0$$ the ordinary Yule–Simon($$\rho$$) distribution is obtained as a special case. The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail.
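
A minimal sketch of the generalized pmf follows. It assumes the incomplete beta function is $$\mathrm{B}_x(a,b) = \int_0^x t^{a-1}(1-t)^{b-1}\,dt$$ and approximates it with a simple midpoint rule rather than a library routine; the function names and the number of quadrature points are choices made for this example.

```python
def incomplete_beta(x, a, b, n=4000):
    """Midpoint-rule value of the integral of t**(a-1) * (1-t)**(b-1) over [0, x]."""
    h = x / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t ** (a - 1) * (1.0 - t) ** (b - 1)
    return total * h


def generalized_yule_simon_pmf(k, rho, alpha, n=4000):
    """Evaluate rho / (1 - alpha**rho) * B_{1-alpha}(k, rho + 1)."""
    if k < 1 or rho <= 0 or not 0 <= alpha < 1:
        raise ValueError("need k >= 1, rho > 0 and 0 <= alpha < 1")
    return rho / (1.0 - alpha**rho) * incomplete_beta(1.0 - alpha, k, rho + 1, n)
```

With `alpha = 0` the function should reproduce the ordinary pmf, for instance $$1/15$$ at $$k = 3$$, $$\rho = 2$$. For `alpha > 0` the values should still sum to 1 over $$k$$, while the ratio of successive terms approaches $$1 - \alpha$$ for large $$k$$, which is the exponential cutoff.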