Type I errors

A false positive, also called a Type I error, occurs when a test incorrectly reports that it has found a result where none really exists.

Detection algorithms of all kinds are prone to false positives. For example, optical character recognition (OCR) software may detect an 'a' where there are only some dots that the algorithm being used interprets as an 'a'.

False positive rate
The false positive rate is the proportion of negative instances that were erroneously reported as positive. It is equal to 1 minus the specificity of the test.


 * $${\rm false\ positive\ rate} = \frac{\rm number\ of\ false\ positives}{\rm number\ of\ negatives}$$

In statistical hypothesis testing, this fraction is sometimes described as the size of the test, and is given the symbol α.
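The definition above can be sketched in a few lines of code. This is an illustrative example with invented counts, not part of the article:

```python
# Minimal sketch: false positive rate and specificity from raw counts.
def false_positive_rate(false_positives, true_negatives):
    """FPR = FP / (FP + TN), i.e. false positives divided by all negatives."""
    negatives = false_positives + true_negatives
    return false_positives / negatives

fp, tn = 8, 92  # hypothetical counts: 100 actual negative instances
fpr = false_positive_rate(fp, tn)
specificity = 1 - fpr  # specificity is the complement of the FPR
print(fpr)          # 0.08
print(specificity)  # 0.92
```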

False positives vs. false negatives
When developing detection algorithms (that is, tests), there is a tradeoff between false positives and false negatives (in which an actual match is not detected). A threshold value can be varied to make the algorithm more restrictive or more sensitive: restrictive algorithms risk rejecting true positives, while more sensitive algorithms risk accepting false positives.
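The tradeoff can be demonstrated with a toy detector that labels a sample a match when its score crosses a threshold. The scores below are invented for illustration:

```python
# Hypothetical detection scores: raising the threshold reduces false
# positives but increases false negatives, and vice versa.
positives = [0.9, 0.7, 0.55, 0.4]   # scores for samples that ARE matches
negatives = [0.6, 0.35, 0.2, 0.1]   # scores for samples that are NOT matches

for threshold in (0.3, 0.5, 0.8):
    false_neg = sum(s < threshold for s in positives)   # missed matches
    false_pos = sum(s >= threshold for s in negatives)  # spurious matches
    print(f"threshold={threshold}: {false_pos} false positives, "
          f"{false_neg} false negatives")
```

With these numbers, the permissive threshold of 0.3 yields two false positives and no false negatives, while the restrictive threshold of 0.8 yields no false positives but three false negatives.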

False positives in medicine
False positives are a significant issue in medical testing. In some cases, there are two or more tests that can be used, one of which is simpler and less expensive, but less accurate, than the other. For example, the simplest tests for HIV and hepatitis in blood have a significant rate of false positives. These tests are used to screen out possible blood donors, but more expensive and more precise tests are used in medical practice, to determine whether a person is actually infected with these viruses.

Perhaps the most widely discussed false positives in medicine come from screening mammography, a test to detect breast cancer. The US rate of false positive mammograms is up to 15%, the highest in the world. The lowest rate in the world is in Holland, at 1%. The lowest rates are generally in Northern Europe, where mammography films are read twice and a high threshold for additional testing is set. One consequence of the US's high false positive rate is that, in a ten-year period, half of American women receive a false positive mammogram. False positive mammograms are costly, with over $100 million spent annually in the US on unnecessary follow-up testing and treatment. They also cause women unneeded anxiety. Research has shown that the anxiety associated with receiving a false positive can be reduced if the time between the abnormal result and the all-clear is reduced.
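A back-of-envelope calculation shows how a modest per-screen false positive rate compounds over repeated screening. The 7% per-screen rate and the independence assumption below are illustrative assumptions, not figures from the article:

```python
# Assumed: each annual screen has a 7% false positive chance, independent
# across years. The chance of at least one false positive in ten years is
# the complement of getting ten clean results in a row.
per_screen_fp = 0.07
years = 10
p_at_least_one = 1 - (1 - per_screen_fp) ** years
print(round(p_at_least_one, 2))  # 0.52 — roughly half, as the article notes
```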

False positives are also problematic in biometric scans, such as retina scans or facial recognition, when the scanner incorrectly identifies someone as matching a known person, either a person who is entitled to enter the system, or a suspected criminal.

False positives can produce serious and counterintuitive problems when the condition being searched for is rare. If a test has a false positive rate of one in ten thousand, but only one in a million samples (or people) is a true positive, most of the "positives" detected by the test will be false. The probability that an observed positive result is a false positive may be calculated, and the problem of false positives demonstrated, using Bayes' theorem.
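The Bayes' theorem calculation described above can be carried out with the article's illustrative numbers. The perfect sensitivity is an added simplifying assumption:

```python
# Bayes' theorem with the numbers from the text: FPR = 1/10,000 and a
# prevalence of 1/1,000,000. Sensitivity of 1.0 is an assumption made
# here so that every true positive is detected.
fpr = 1 / 10_000           # P(positive result | actually negative)
prevalence = 1 / 1_000_000  # P(actually positive)
sensitivity = 1.0           # assumed: P(positive result | actually positive)

p_positive = sensitivity * prevalence + fpr * (1 - prevalence)
p_true_given_positive = sensitivity * prevalence / p_positive
print(p_true_given_positive)  # ~0.0099: about 99% of positives are false
```

Even with a seemingly excellent test, roughly 99 out of every 100 positive results are false, because actual positives are so rare.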

False positives in computer database searching
In computer database searching, false positives are documents that are retrieved by a search despite their irrelevance to the search question. False positives are common in full text searching, in which the search algorithm examines all of the text in all of the stored documents in an attempt to match one or more search terms supplied by the user.

Most false positives can be attributed to the deficiencies of natural language, which is often ambiguous: the term "home," for example, may mean "a person's dwelling" or "the main or top-level page in a Web site." The false positive rate can be reduced by using a controlled vocabulary, but this solution is expensive because the vocabulary must be developed by an expert and applied to documents by trained indexers.
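The ambiguity problem can be illustrated with a naive full-text match over two invented documents. A query intended in the "dwelling" sense retrieves both, making the second result a false positive:

```python
# Toy full-text search: both documents contain the string "home", so a
# simple substring match cannot distinguish the two senses of the word.
docs = {
    "doc1": "She finally bought a home near the lake.",
    "doc2": "Click the logo to return to the site's home page.",
}
query = "home"
hits = [name for name, text in docs.items() if query in text.lower()]
print(hits)  # ['doc1', 'doc2'] — doc2 is a false positive for the dwelling sense
```

A controlled vocabulary would resolve this by having an indexer tag doc1 with a concept such as "dwelling" and doc2 with "web navigation", at the cost of the expert labor the text describes.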

False positives and spam
The term "false positive" is also used when spam filtering or spam blocking techniques wrongly classify a legitimate email message as spam and, as a result, interfere with its delivery.

The opposite, a false negative, occurs when filtering allows a spam email to be delivered to a user's inbox.

While most anti-spam tactics can block or filter a high percentage of unwanted emails, doing so without creating significant false-positive results is a much more demanding task.

A commonly referenced sub-category is the "critical false positive." This term distinguishes the accidental blocking of mass emails, which may not be spam but are not generally regarded as critical communications, from the blocking of user-to-user messages and automated transaction notifications, where timely delivery is much more important.

False positives and malware
The term "false positive" is also used when antivirus software wrongly classifies a file as a virus. The incorrect detection may result from heuristics or from an incorrect virus signature in a database. Similar problems can occur with antitrojan or antispyware software.

False positives and ghost investigation
The term "false positive" has also been adopted by paranormal or ghost investigation groups to describe a photograph, recording, or other evidence that incorrectly appears to have a paranormal origin. In other words, a false positive in this context is a disproven piece of media (image, movie, audio recording, etc.) that has a normal explanation. Several sites provide examples of false positives, including The Atlantic Paranormal Society (TAPS) and Moorestown Ghost Research.