Neyman–Pearson lemma

In statistics, the Neyman–Pearson lemma states that, when performing a hypothesis test between two point hypotheses $$H_0 : \theta = \theta_0$$ and $$H_1 : \theta = \theta_1$$, the likelihood-ratio test which rejects $$H_0$$ in favour of $$H_1$$ when


 * $$\Lambda(x)=\frac{L(\theta_0 \mid x)}{L(\theta_1 \mid x)} \leq \eta \text{ where } P(\Lambda(X)\leq \eta \mid H_0)=\alpha$$

is the most powerful test of size $$\alpha$$ for a threshold $$\eta$$. If the test is most powerful for all $$\theta_1 \in \Theta_1$$, it is said to be uniformly most powerful (UMP) for alternatives in the set $$\Theta_1$$.

It is named for Jerzy Neyman and Egon Pearson.

In practice, the likelihood ratio is often used directly to construct tests (see Likelihood-ratio test). However, it can also be used to suggest particular test statistics that might be of interest, or to suggest simplified tests; for this, one considers algebraic manipulation of the ratio to see whether there is a key statistic in it related to the size of the ratio (i.e. whether a large statistic corresponds to a small ratio or to a large one).
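As a concrete illustration (a minimal sketch with assumed numbers, not taken from the article), the following Python fragment computes such a ratio for i.i.d. $$\mathcal{N}(\theta,1)$$ data with $$H_0:\theta=0$$ against $$H_1:\theta=1$$. Here the algebra reduces the ratio to a threshold on the sample mean, which is exactly the kind of key statistic described above.

```python
# Sketch: likelihood-ratio test between two point hypotheses for i.i.d.
# N(theta, 1) data, H0: theta = 0 vs H1: theta = 1. The sample size, seed,
# and the threshold eta below are illustrative assumptions.
import numpy as np
from scipy.stats import norm

def likelihood_ratio(x, theta0=0.0, theta1=1.0):
    """Lambda(x) = L(theta0 | x) / L(theta1 | x) for unit-variance normals."""
    log_l0 = norm.logpdf(x, loc=theta0).sum()
    log_l1 = norm.logpdf(x, loc=theta1).sum()
    return np.exp(log_l0 - log_l1)

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=20)   # data drawn under H0 here

eta = 1.0   # in practice eta is chosen so that P(Lambda <= eta | H0) = alpha
reject = likelihood_ratio(x) <= eta

# Algebra: log Lambda = -n*xbar + n/2, so Lambda <= eta is equivalent to
# xbar >= 1/2 - log(eta)/n; the sample mean is the key statistic here.
print(reject, x.mean())
```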

Proof
Define the rejection region of the null hypothesis for the NP test as
 * $$R_{NP}=\left\{ X: \frac{L(\theta_0 \mid X)}{L(\theta_1 \mid X)} \leq \eta\right\}.$$

Any other test will have a different rejection region, which we define as $$R_A$$. Furthermore, define the following function of a region and a parameter:


 * $$P(R,\theta)=\int_R L(\theta|x)\, dx, $$

where this is the probability of the data falling in region R, given parameter $$\theta$$.
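For intuition (an illustrative sketch, not part of the proof), $$P(R,\theta)$$ can be approximated by Monte Carlo: draw data under parameter $$\theta$$ and count how often it falls in $$R$$. The region and all specific numbers below are assumptions chosen for the example.

```python
# Sketch: Monte Carlo estimate of P(R, theta) for X ~ N(theta, 1) and the
# assumed region R = {x : x > 1.645}.
import numpy as np

def estimate_P(theta, n_sim=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=theta, scale=1.0, size=n_sim)
    return np.mean(x > 1.645)   # fraction of draws that land in R

print(estimate_P(0.0))   # ~0.05: the size alpha if theta_0 = 0
print(estimate_P(1.0))   # larger: the power if theta_1 = 1
```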

For both tests to have significance level $$\alpha$$, it must be true that


 * $$\alpha= P(R_{NP}, \theta_0)=P(R_A, \theta_0) \,.$$

However, it is useful to break these down into integrals over distinct regions, given by


 * $$P(R_{NP} \cap R_A, \theta) + P(R_{NP} \cap R_A^c, \theta) = P(R_{NP},\theta),$$

and


 * $$ P(R_{NP} \cap R_A, \theta) + P(R_{NP}^c \cap R_A, \theta) = P(R_A,\theta).$$

Setting $$\theta=\theta_0$$, equating the two expressions above (both equal $$\alpha$$), and cancelling the common term $$P(R_{NP} \cap R_A, \theta_0)$$ yields


 * $$P(R_{NP} \cap R_A^c, \theta_0) = P(R_{NP}^c \cap R_A, \theta_0).$$

Comparing the powers of the two tests, $$P(R_{NP},\theta_1)$$ and $$P(R_A,\theta_1)$$, and using the same decompositions with $$\theta=\theta_1$$ to cancel the common term $$P(R_{NP} \cap R_A, \theta_1)$$, one can see that


 * $$P(R_{NP},\theta_1) \geq P(R_A,\theta_1) \text{ if, and only if, } P(R_{NP} \cap R_A^c, \theta_1) \geq P(R_{NP}^c \cap R_A, \theta_1). $$

Now, by the definition of $$R_{NP}$$, we have $$L(\theta_1 \mid x) \geq \frac{1}{\eta}L(\theta_0 \mid x)$$ for $$x \in R_{NP}$$ and the reverse inequality for $$x \in R_{NP}^c$$. Hence


 * $$ P(R_{NP} \cap R_A^c, \theta_1)= \int_{R_{NP}\cap R_A^c} L(\theta_{1}|x)\,dx \geq \frac{1}{\eta} \int_{R_{NP}\cap R_A^c} L(\theta_0|x)\,dx = \frac{1}{\eta}P(R_{NP} \cap R_A^c, \theta_0)$$
 * $$ = \frac{1}{\eta}P(R_{NP}^c \cap R_A, \theta_0) = \frac{1}{\eta}\int_{R_{NP}^c \cap R_A} L(\theta_{0}|x)\,dx \geq \int_{R_{NP}^c\cap R_A} L(\theta_{1}|x)\,dx = P(R_{NP}^c \cap R_A, \theta_1).$$

Hence the inequality holds.
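
The conclusion can also be checked by simulation. The sketch below (illustrative; the competing test and all numbers are assumptions) compares the NP test for $$\mathcal{N}(\theta,1)$$ data with $$H_0:\theta=0$$ against $$H_1:\theta=1$$ to a two-sided test of the same size, and finds the NP test more powerful, as the lemma guarantees.

```python
# Sketch: for X ~ N(theta, 1), H0: theta = 0 vs H1: theta = 1, the NP test
# rejects for large x; a two-sided test of the same size is less powerful.
import numpy as np
from scipy.stats import norm

alpha = 0.05
c_np  = norm.ppf(1 - alpha)        # one-sided NP cutoff, ~1.645
c_alt = norm.ppf(1 - alpha / 2)    # two-sided cutoff, ~1.960 (same size under H0)

rng = np.random.default_rng(1)
x1 = rng.normal(loc=1.0, scale=1.0, size=200_000)   # draws under H1

power_np  = np.mean(x1 > c_np)             # ~0.26
power_alt = np.mean(np.abs(x1) > c_alt)    # ~0.17
print(power_np >= power_alt)               # True, as the lemma guarantees
```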

Example
Let $$X_1,\dots,X_n$$ be a random sample from the $$\mathcal{N}(\mu,\sigma^2)$$ distribution, where the mean $$\mu$$ is known, and suppose that we wish to test $$H_0:\sigma^2=\sigma_0^2$$ against $$H_1:\sigma^2=\sigma_1^2$$. The likelihood for this set of normally distributed data is


 * $$L\left(\sigma^2;\mathbf{x}\right)\propto \left(\sigma^2\right)^{-n/2} \exp\left\{-\frac{\sum_{i=1}^n \left(x_i-\mu\right)^2}{2\sigma^2}\right\}.$$

We can compute the likelihood ratio to find the key statistic in this test and its effect on the test's outcome:


 * $$\Lambda(\mathbf{x}) = \frac{L\left(\sigma_1^2;\mathbf{x}\right)}{L\left(\sigma_0^2;\mathbf{x}\right)} = \left(\frac{\sigma_1^2}{\sigma_0^2}\right)^{-n/2}\exp\left\{-\frac{1}{2}(\sigma_1^{-2}-\sigma_0^{-2})\sum_{i=1}^n \left(x_i-\mu\right)^2\right\}.$$

This ratio depends on the data only through $$\sum_{i=1}^n \left(x_i-\mu\right)^2$$. Therefore, by the Neyman–Pearson lemma, the most powerful test of this pair of point hypotheses will depend only on $$\sum_{i=1}^n \left(x_i-\mu\right)^2$$. Also, by inspection, if $$\sigma_1^2>\sigma_0^2$$, then $$\Lambda(\mathbf{x})$$ is an increasing function of $$\sum_{i=1}^n \left(x_i-\mu\right)^2$$ (here the ratio is written with the alternative's likelihood in the numerator, so large values of $$\Lambda$$ favour $$H_1$$). So we should reject $$H_0$$ if $$\sum_{i=1}^n \left(x_i-\mu\right)^2$$ is sufficiently large; the rejection threshold depends on the size of the test.
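
As a hedged sketch of the resulting procedure (the function name and all numbers below are illustrative assumptions): under $$H_0$$ the statistic $$\sum_{i=1}^n (x_i-\mu)^2/\sigma_0^2$$ follows a chi-squared distribution with $$n$$ degrees of freedom, which fixes the rejection threshold for a given size $$\alpha$$.

```python
# Sketch of the test derived above, assuming sigma_1^2 > sigma_0^2 so that
# we reject for large values of sum((x - mu)^2).
import numpy as np
from scipy.stats import chi2

def np_variance_test(x, mu, sigma0_sq, alpha=0.05):
    n = len(x)
    stat = np.sum((x - mu) ** 2)
    # Under H0, stat / sigma0_sq is chi-squared with n degrees of freedom.
    threshold = sigma0_sq * chi2.ppf(1 - alpha, df=n)
    return stat > threshold   # True means reject H0: sigma^2 = sigma0^2

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, scale=2.0, size=30)   # true variance 4
print(np_variance_test(x, mu=0.0, sigma0_sq=1.0))   # likely rejects H0
```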