Zipf–Mandelbrot law

In probability theory and statistics, the Zipf–Mandelbrot law is a discrete probability distribution. Also known as the Pareto-Zipf law, it is a power-law distribution on ranked data, named after the linguist George Kingsley Zipf who suggested a simpler distribution called Zipf's law, and the mathematician Benoît Mandelbrot, who subsequently generalized it.

The probability mass function is given by:


 * $$f(k;N,q,s)=\frac{1/(k+q)^s}{H_{N,q,s}}$$

where $$H_{N,q,s}$$ is given by:


 * $$H_{N,q,s}=\sum_{i=1}^N \frac{1}{(i+q)^s}$$

which may be thought of as a generalization of a harmonic number. In the formula, k is the rank of the data, and q and s are parameters of the distribution. In the limit as $$N$$ approaches infinity, this becomes the Hurwitz zeta function $$\zeta(q,s)$$. For finite $$N$$ and $$q=0$$ the Zipf–Mandelbrot law becomes Zipf's law. For infinite $$N$$ and $$q=0$$ it becomes a Zeta distribution.

Applications
The distribution of words ranked by their frequency in a random text corpus is generally a power-law distribution, known as Zipf's law.

If one plots the frequency rank of words contained in a large corpus of text data versus the number of occurrences or actual frequencies, one obtains a power-law distribution, with exponent close to one (but see Gelbukh & Sidorov, 2001).

In ecological field studies, the relative abundance distribution (i.e. the graph of the number of species observed as a function of their abundance) is often found to conform to a Zipf–Mandelbrot law.

Within music, many metrics of measuring "pleasing" music conform to Zipf–Mandlebrot distributions.