Type II errors

See also: Type I and type II errors for more context

In statistics, a Type II error, also called a false negative or miss, occurs when a test incorrectly reports that a result was not detected when it was really present.

Detection algorithms of all kinds often produce misses. For example, if a radar does not detect an enemy airplane that is present within the area it scans, that is a false negative.

False negative rate

The false negative rate is the proportion of positive instances that were erroneously reported as negative. It is equal to 1 minus the sensitivity of the test.

$\mathrm{false\ negative\ rate} = \frac{\mathrm{number\ of\ false\ negatives}}{\mathrm{number\ of\ positives}}$
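
As a minimal sketch of this definition, the rate can be computed from raw counts. The function names and the radar counts below are hypothetical and chosen only for illustration; the number of positives is taken to be the false negatives plus the true positives.

```python
def false_negative_rate(false_negatives: int, true_positives: int) -> float:
    """Proportion of actual positives that the test reported as negative."""
    positives = false_negatives + true_positives
    if positives == 0:
        raise ValueError("no positive instances: the rate is undefined")
    return false_negatives / positives


def sensitivity(false_negatives: int, true_positives: int) -> float:
    """Proportion of actual positives that the test correctly detected."""
    return 1.0 - false_negative_rate(false_negatives, true_positives)


# Hypothetical counts: 100 aircraft were present and the radar missed 8 of them.
fnr = false_negative_rate(false_negatives=8, true_positives=92)
print(fnr)  # 0.08
```

With these assumed counts the sensitivity is 1 minus 0.08, that is, 0.92.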

In statistical hypothesis testing, this rate is given the symbol β, and $1-\beta$ is defined as the power of the test. Making the test more sensitive lowers the probability of Type II errors but, other things being equal, raises the probability of Type I errors (false positives, which reject the null hypothesis when it is true).
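
The trade-off between the two error types can be illustrated with a one-sided z-test, used here only as a convenient example. The sketch assumes a null hypothesis of zero mean, a known standard deviation `sigma`, a true mean `effect` greater than zero, and a sample of size `n`; all of these names and values are assumptions made for illustration.

```python
from statistics import NormalDist


def type_ii_error_rate(alpha: float, effect: float, sigma: float, n: int) -> float:
    """Beta for a one-sided z-test of H0: mean = 0 when the true mean is `effect`."""
    std_normal = NormalDist()
    # H0 is rejected when the z statistic exceeds this critical value.
    critical_z = std_normal.inv_cdf(1.0 - alpha)
    # Under the alternative, the z statistic is centred on this shift.
    shift = effect * n ** 0.5 / sigma
    # Probability of failing to reject H0 although the effect is real.
    return std_normal.cdf(critical_z - shift)


# Hypothetical study: true effect 0.5, sigma 1, 25 observations.
beta_loose = type_ii_error_rate(alpha=0.05, effect=0.5, sigma=1.0, n=25)
beta_strict = type_ii_error_rate(alpha=0.01, effect=0.5, sigma=1.0, n=25)
print(f"alpha=0.05: beta={beta_loose:.3f}, power={1 - beta_loose:.3f}")
print(f"alpha=0.01: beta={beta_strict:.3f}, power={1 - beta_strict:.3f}")
```

Under these assumptions, tightening the Type I error rate from 0.05 to 0.01 raises β, so the test misses the real effect more often.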

When developing detection algorithms or tests, a balance must be chosen between the risks of false negatives and false positives. Usually there is a threshold for how closely an input must match a given sample before the algorithm reports a match. The higher this threshold, the more false negatives and the fewer false positives.
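
The effect of the threshold can be sketched with a toy matcher that reports a match whenever a similarity score reaches the threshold. The scores below are invented for illustration, and `count_errors` is a hypothetical helper rather than part of any real detection system.

```python
def count_errors(scores_same, scores_different, threshold):
    """Report a match when score >= threshold; count both kinds of error."""
    # A true match scoring below the threshold is missed (false negative).
    false_negatives = sum(1 for s in scores_same if s < threshold)
    # A non-match scoring at or above the threshold is accepted (false positive).
    false_positives = sum(1 for s in scores_different if s >= threshold)
    return false_negatives, false_positives


# Hypothetical similarity scores in the range 0 to 1.
genuine = [0.95, 0.90, 0.82, 0.74, 0.61]   # inputs that truly match
impostor = [0.70, 0.55, 0.40, 0.33, 0.10]  # inputs that truly do not match

for threshold in (0.5, 0.65, 0.8):
    fn, fp = count_errors(genuine, impostor, threshold)
    print(threshold, fn, fp)
# 0.5 0 2
# 0.65 1 1
# 0.8 2 0
```

With these made-up scores, each increase in the threshold trades one false positive for one false negative.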

Medical testing

False negatives are a significant issue in medical testing. In some cases, two or more tests could be used, one of which is simpler and less expensive, but less accurate, than the others. For example, the simplest tests for HIV and hepatitis in blood have a significant rate of false positives. These tests are used to screen out possible blood donors, while more expensive and more precise tests are used in medical practice to determine whether a person is actually infected with these diseases.

False negatives in medical testing falsely reassure both patients and physicians that a patient is free of a disease that is actually present. As a result, patients may not receive the advice and treatment they need. A common example is relying on cardiac stress tests to detect coronary atherosclerosis, even though such tests are known to detect only limitations of coronary artery blood flow due to advanced stenosis.

False negatives produce serious and counterintuitive problems, especially when the condition being searched for is common. If a test with a false negative rate of only 10% is used on a population with a true occurrence rate of 70%, many of the negative results reported by the test will be false. (See Bayes' theorem below.)

Biometrics

False negatives are also a problem in biometric scans, such as retina scans or facial recognition, when the scanner incorrectly identifies someone as not matching a known person when, in actuality, it is the same person whose scan is in the system.

Bayes' theorem

The probability that an observed negative result is a false negative rather than a true negative may be calculated (and the problem of false negatives demonstrated) using Bayes' theorem. The key concept is that the proportion of a test's results that are false positives or false negatives is not a function of the accuracy of the test alone, but also of the actual rate of the condition within the population. Often, the actual rate of the condition within the sample being tested is the more influential factor.
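
A minimal sketch of this calculation follows, using the 10% false negative rate and 70% occurrence rate from the medical testing example. The specificity of 90% is an assumption added for illustration, since the text does not give one, and the function name is hypothetical.

```python
def prob_false_negative_given_negative(
    prevalence: float, false_negative_rate: float, specificity: float
) -> float:
    """P(condition present | test negative), by Bayes' theorem."""
    # Negative result although the condition is present (false negative).
    p_negative_and_present = prevalence * false_negative_rate
    # Negative result and the condition is absent (true negative).
    p_negative_and_absent = (1.0 - prevalence) * specificity
    return p_negative_and_present / (p_negative_and_present + p_negative_and_absent)


# Common condition: 70% occurrence rate, 10% false negative rate,
# and an ASSUMED specificity of 90%.
common = prob_false_negative_given_negative(0.70, 0.10, 0.90)
# Same test applied to a population where the condition is rare (1%).
rare = prob_false_negative_given_negative(0.01, 0.10, 0.90)
print(f"common condition: {common:.3f}")  # roughly 0.206
print(f"rare condition:   {rare:.4f}")    # roughly 0.0011
```

Under these assumptions about one in five negative results is wrong when the condition is common, against roughly one in nine hundred when it is rare, although the test itself is unchanged.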

False negatives and anti-spam

The term false negative is also used when a spam email is not detected as such and is instead classified as legitimate email. A low number of false negatives is one indicator of the effectiveness of a spam filtering method.

This page uses Creative Commons licensed content from Wikipedia.