
The clustering illusion refers to the natural human tendency to "see patterns where actually none exist." According to a branch of mathematics known as Ramsey theory, however, complete mathematical disorder is impossible in any sufficiently large system. It may therefore be more accurate to say that the clustering illusion is the tendency to attach meaning to certain types of patterns that must inevitably appear in any large enough data set.

For instance, most people say that the sequence "OXXXOXXXOXXOOOXOOXXOO" (Gilovich, 1993) is non-random. In fact, it has several qualities one would expect of a "random" stream: the two outcomes occur almost equally often (11 X's and 10 O's), and exactly half of its 20 adjacent pairs are repeats of the same outcome while the other half are alternations. In sequences like this, people seem to expect more alternations than one would predict statistically. Over a small number of trials, variability and non-random-looking "streaks" are in fact quite probable.
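The properties of such a sequence can be checked mechanically. The following is a minimal sketch, assuming a fair and independent two-outcome process; the helper names `summarize` and `prob_run_at_least` are illustrative and do not come from Gilovich's work. It counts outcomes, alternations, and the longest run, and computes the exact probability that a sequence of a given length contains a streak of at least `k` identical outcomes.

```python
from itertools import groupby


def summarize(seq):
    """Return (symbol counts, alternations, adjacent pairs, longest run)."""
    counts = {s: seq.count(s) for s in sorted(set(seq))}
    pairs = list(zip(seq, seq[1:]))
    alternations = sum(a != b for a, b in pairs)
    longest_run = max(len(list(group)) for _, group in groupby(seq))
    return counts, alternations, len(pairs), longest_run


def prob_run_at_least(n, k):
    """Exact probability that n fair flips contain a run of >= k equal outcomes.

    Assumes n >= 1 and k >= 2. ways[j] counts the streak-free sequences of the
    current length whose final run has length j (1 <= j < k).
    """
    ways = [0] * k
    ways[1] = 2  # the two sequences of length 1
    for _ in range(n - 1):
        new = [0] * k
        new[1] = sum(ways)          # next flip differs: the run restarts at 1
        for j in range(1, k - 1):
            new[j + 1] = ways[j]    # next flip repeats: the run grows by 1
        ways = new
    return 1 - sum(ways) / 2 ** n


sequence = "OXXXOXXXOXXOOOXOOXXOO"
print(summarize(sequence))
print(prob_run_at_least(len(sequence), 4))
```

Under these assumptions, a streak of four or more identical outcomes somewhere in 21 fair flips is more likely than not (the probability is close to 0.79), which illustrates why a sequence without such streaks would itself be somewhat unusual.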

As another example, the answers to the SAT, an important multiple-choice standardized test in the United States, are specifically chosen not to contain any long runs of the same choice, because experience has shown test designers that students believe such runs are unlikely to occur. Without this precaution, a student might feel pressured into choosing a wrong answer just to break a run.

Whether or not patterns exist in a data set can often be decided by means of statistical analysis, or even by methods of computational cryptanalysis. Consider the sequence "XXOXOXOOOXOXOOOXOX"; is it random? The answer is no: if the positions in the string are numbered consecutively beginning with 2, the X's fall exactly on the prime numbers and the O's on the composites. Data compression algorithms are designed to, in a sense, "look for patterns" in data and to create a shorter representation from which the original data can be reconstructed. Large data sets which contain "clusters" of a non-random nature can in general be expected to compress well, given the right encoding algorithm. On the other hand, if there is no real clustering, or pattern, in a particular data set, then one would expect it to compress poorly, if at all.
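Both points can be illustrated with a short sketch. It rebuilds the sequence above from a simple primality test and then uses Python's standard `zlib` module as a stand-in for "a data compression algorithm" (an assumption made for illustration; any general-purpose compressor would do). The function names, the repetition count of 500, and the random seed are arbitrary choices, and the "patterned" input is simply the 18-symbol sequence repeated, so the test exercises periodic structure rather than the prime structure itself.

```python
import random
import zlib


def is_prime(n):
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def prime_pattern(length, start=2):
    """'X' where the number is prime, 'O' where it is not, from `start` upward."""
    return "".join("X" if is_prime(n) else "O"
                   for n in range(start, start + length))


def compression_ratio(text):
    """Compressed size divided by original size (smaller means more pattern)."""
    raw = text.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


article_sequence = "XXOXOXOOOXOXOOOXOX"
print(prime_pattern(len(article_sequence)) == article_sequence)

rng = random.Random(0)                       # fixed seed for repeatability
patterned = article_sequence * 500           # strongly structured input
coin_flips = "".join(rng.choice("XO") for _ in range(len(patterned)))

print(compression_ratio(patterned))
print(compression_ratio(coin_flips))
```

Even the coin-flip string shrinks considerably, because each one-bit outcome is stored as an eight-bit character; the informative comparison is between the two ratios, where the structured input should compress far better than the unstructured one.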

The clustering illusion was central to a widely reported study by Thomas Gilovich, Robert Vallone and Amos Tversky. They concluded that the "hot hand" in basketball, the idea that players shoot successfully in "streaks", is indistinguishable from chance. Famous coaches including Bobby Knight reportedly scoffed at this conclusion.

Applying this cognitive bias to causal reasoning may result in the Texas sharpshooter fallacy. It may also underlie the gambler's fallacy. See also the representativeness heuristic.