Linkage disequilibrium

Linkage disequilibrium (LD) is a term used in the study of population genetics for the non-random association of alleles at two or more loci, not necessarily on the same chromosome. It is not the same as linkage, which describes the association of two or more loci on a chromosome with limited recombination between them. LD describes a situation in which some combinations of alleles or genetic markers occur more or less frequently in a population than would be expected from a random formation of haplotypes from alleles based on their frequencies.

Linkage disequilibrium is caused by fitness interactions between genes or by such non-adaptive processes as population structure, inbreeding, and stochastic effects. In population genetics, linkage disequilibrium is said to characterize the haplotype distribution at two or more loci. Formally, if we define pairwise LD, we consider indicator variables on alleles at two loci, say $$I_1, I_2$$. We define the LD parameter $$\delta$$ as:


 * $$\delta := \operatorname{cov}(I_1, I_2) = h_{12} - p_1 p_2.$$

Here $$p_1, p_2$$ denote the marginal allele frequencies at the two loci and $$h_{12}$$ denotes the haplotype frequency in the joint distribution of both alleles. Various derivatives of this parameter have been developed. In the genetic literature the wording "two alleles are in LD" usually implies $$\delta \ne 0$$. Conversely, linkage equilibrium denotes the case $$\delta = 0$$.
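As a minimal sketch of this definition, the following Python snippet estimates $$\delta$$ from a sample of haplotypes coded as pairs of indicator values, taking the covariance in its usual form $$E[I_1 I_2] - E[I_1]E[I_2]$$. The function name `pairwise_ld` and the sample data are illustrative assumptions, not part of any established library.

```python
def pairwise_ld(haplotypes):
    """Estimate delta = cov(I1, I2) from sampled haplotypes.

    `haplotypes` is a sequence of (i1, i2) pairs, where each entry is 1 if
    the allele of interest is present at that locus and 0 otherwise.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p1 = sum(i1 for i1, _ in haplotypes) / n          # marginal frequency, locus 1
    p2 = sum(i2 for _, i2 in haplotypes) / n          # marginal frequency, locus 2
    h12 = sum(i1 * i2 for i1, i2 in haplotypes) / n   # joint haplotype frequency
    return h12 - p1 * p2                              # population covariance


# Hypothetical sample of 10 haplotypes in which the two alleles tend to co-occur.
sample = [(1, 1)] * 4 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 4
print(f"delta = {pairwise_ld(sample):.3f}")
```

A sample in which all four combinations are equally common gives $$\delta = 0$$, the linkage equilibrium case.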

The International HapMap Project enables the study of LD in human populations online. The Ensembl project integrates HapMap data, and dbSNP data more generally, with other genetic information.

Linkage disequilibrium measure, D
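Consider two loci, A and B, with two alleles each (a two-locus, two-allele model). The frequencies of the four possible haplotypes are denoted as follows:

 * $$A_1B_1$$: $$x_{11}$$
 * $$A_1B_2$$: $$x_{12}$$
 * $$A_2B_1$$: $$x_{21}$$
 * $$A_2B_2$$: $$x_{22}$$

From these one can determine the frequency of each allele:

 * $$A_1$$: $$p_1 = x_{11} + x_{12}$$
 * $$A_2$$: $$p_2 = x_{21} + x_{22}$$
 * $$B_1$$: $$q_1 = x_{11} + x_{21}$$
 * $$B_2$$: $$q_2 = x_{12} + x_{22}$$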

If the two loci and their alleles are independent of each other, then the observation $$A_1B_1$$ can be expressed as "$$A_1$$ is found and $$B_1$$ is found". With $$p_1$$ denoting the frequency of $$A_1$$ and $$q_1$$ that of $$B_1$$, the frequency $$x_{11}$$ of $$A_1B_1$$ is, by the multiplication rule for independent events, $$x_{11} = p_1 q_1$$.

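The deviation of the observed frequency from the expected one is the linkage disequilibrium parameter of Lewontin and Kojima (1960), commonly denoted by a capital D and defined by $$D = x_{11} - p_1q_1$$. Writing $$p_2 = 1 - p_1$$ and $$q_2 = 1 - q_1$$ for the frequencies of $$A_2$$ and $$B_2$$, each haplotype frequency can be expressed in terms of the allele frequencies and $$D$$:

 * $$x_{11} = p_1q_1 + D$$
 * $$x_{12} = p_1q_2 - D$$
 * $$x_{21} = p_2q_1 - D$$
 * $$x_{22} = p_2q_2 + D$$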
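The definition of $$D$$ can be sketched in a few lines of Python. Here `x12`, `x21` and `x22` are assumed to denote the frequencies of the haplotypes $$A_1B_2$$, $$A_2B_1$$ and $$A_2B_2$$, so that $$p_1 = x_{11} + x_{12}$$ and $$q_1 = x_{11} + x_{21}$$. The function name `ld_d` and the example frequencies are illustrative only.

```python
def ld_d(x11, x12, x21, x22):
    """Return D = x11 - p1*q1 for a two-locus, two-allele model.

    Arguments are the haplotype frequencies of A1B1, A1B2, A2B1 and A2B2;
    they are expected to sum to 1.
    """
    total = x11 + x12 + x21 + x22
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p1 = x11 + x12   # frequency of allele A1
    q1 = x11 + x21   # frequency of allele B1
    return x11 - p1 * q1


# Hypothetical haplotype frequencies with an excess of A1B1 and A2B2.
d = ld_d(0.4, 0.1, 0.1, 0.4)
print(f"D = {d:.3f}")
```

For frequencies that sum to 1, the same value is obtained from the cross-product form $$D = x_{11}x_{22} - x_{12}x_{21}$$, which can serve as a quick consistency check.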

When these formulae are extended to diploid cells, rather than investigating the gametes/haplotypes directly, the same principle holds, but the recombination rate between the two loci $$A$$ and $$B$$, commonly denoted by the letter $$c$$, must be taken into account.
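To illustrate how the recombination rate enters, the sketch below uses the standard textbook result that, under random mating and with no other forces acting, $$D$$ shrinks by a factor $$(1 - c)$$ each generation. This relation is an assumption brought in for illustration and is not derived in the text above; the function name `ld_after_generations` is likewise hypothetical.

```python
def ld_after_generations(d0, c, generations):
    """Return D after a number of generations of random mating.

    Assumes the textbook recursion D_(t+1) = (1 - c) * D_t, where c is the
    recombination rate between the two loci (0 <= c <= 0.5).
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination rate c must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - c) ** generations


# Unlinked loci (c = 0.5) lose half of their disequilibrium per generation.
for t in range(4):
    print(t, ld_after_generations(0.2, 0.5, t))
```

With $$c = 0$$ (complete linkage) the value of $$D$$ does not change under this model.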

$$D$$ is easy to calculate with but has the disadvantage of depending on the frequencies of the alleles inspected. Since allele frequencies lie between 0 and 1, $$D$$ is necessarily zero when an allele frequency at either locus is 0 or 1, and its possible range is widest when the frequencies are 0.5. Lewontin (1964) suggested normalising $$D$$ by dividing it by the theoretical maximum for the observed allele frequencies: $$D'=\frac{D}{D_\max}$$.
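The normalisation can be sketched as follows. The text does not spell out $$D_\max$$, so this example assumes one common convention: for $$D \ge 0$$ the bound is $$\min(p_1q_2, p_2q_1)$$, and for $$D < 0$$ it is $$\min(p_1q_1, p_2q_2)$$, so that $$D'$$ keeps the sign of $$D$$ and lies between -1 and 1. Other sources report only the absolute value. The function name `d_prime` is illustrative.

```python
def d_prime(d, p1, q1):
    """Return D' = D / D_max for allele frequencies p1 (A1) and q1 (B1).

    Assumed convention: D_max is the largest magnitude D can take, given
    its sign and the allele frequencies, so D' carries the sign of D.
    """
    p2, q2 = 1.0 - p1, 1.0 - q1
    if d >= 0:
        d_max = min(p1 * q2, p2 * q1)
    else:
        d_max = min(p1 * q1, p2 * q2)
    if d_max == 0:
        # A monomorphic locus leaves no room for disequilibrium.
        raise ValueError("D' is undefined when an allele frequency is 0 or 1")
    return d / d_max


print(d_prime(0.15, 0.5, 0.5))    # positive association
print(d_prime(-0.05, 0.5, 0.5))   # negative association
```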

Another measure is based on the correlation coefficient between the two loci, which builds on the covariance laid out in the initial paragraphs of this page. Its square is denoted $$r^2=\frac{D^2}{p_1p_2q_1q_2}$$. This measure, however, is not adjusted for the loci having different allele frequencies. If it were, $$r$$ (the square root of $$r^2$$, given the sign of $$D$$) would be equivalent to $$D'$$.
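A short sketch of the $$r^2$$ calculation is given below, with $$p_2 = 1 - p_1$$ and $$q_2 = 1 - q_1$$. The function names `r_squared` and `signed_r` are illustrative assumptions.

```python
import math


def r_squared(d, p1, q1):
    """Return r^2 = D^2 / (p1 * p2 * q1 * q2)."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    denom = p1 * p2 * q1 * q2
    if denom == 0:
        raise ValueError("r^2 is undefined when an allele frequency is 0 or 1")
    return d * d / denom


def signed_r(d, p1, q1):
    """Return r, the square root of r^2 carrying the sign of D."""
    return math.copysign(math.sqrt(r_squared(d, p1, q1)), d)


# Equal allele frequencies at both loci.
print(r_squared(0.15, 0.5, 0.5), signed_r(0.15, 0.5, 0.5))
# Unequal allele frequencies, with D at its largest positive value.
print(r_squared(0.1, 0.2, 0.5))
```

In the second call, $$p_1 = 0.2$$ and $$q_1 = 0.5$$, and $$D = 0.1$$ is the largest positive value these frequencies allow (since $$\min(p_1q_2, p_2q_1) = 0.1$$). Under that bound $$D'$$ would equal 1, whereas $$r^2$$ stays well below 1, which illustrates the dependence on differing allele frequencies.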