Visual short-term memory

In the study of vision, visual short-term memory (VSTM) is one of three broad memory systems, alongside iconic memory and long-term memory. VSTM is a type of short-term memory, but one limited to information within the visual domain.

The term VSTM refers in a theory-neutral manner to the non-permanent storage of visual information over an extended period of time. The Visuospatial Sketchpad is a VSTM subcomponent within the theoretical model of working memory proposed by Alan Baddeley.

Whereas iconic memories are fragile, decay rapidly, and are unable to be actively maintained, visual short-term memories are robust to subsequent stimuli and last over many seconds.

Overview
The introduction of stimuli which were hard to verbalize, and unlikely to be held in long-term memory, revolutionized the study of visual short-term memory (VSTM) in the early 1970s (Cermak, 1971; Phillips, 1974; Phillips & Baddeley, 1971). The basic experimental technique required observers to indicate whether two matrices (Phillips, 1974; Phillips & Baddeley, 1971), or figures (Cermak, 1971), separated by a short temporal interval, were the same. The finding that observers were able to report that a change had occurred, at levels significantly above chance, indicated that they were able to encode some aspect of the first stimulus in a purely visual store, at least for the period until the presentation of the second stimulus. However, as the stimuli used were complex, and the nature of the change relatively uncontrolled, these experiments left open various questions, such as: (1) whether only a subset of the perceptual dimensions comprising a visual stimulus are stored (e.g., spatial frequency, luminance, or contrast); (2) whether some perceptual dimensions are maintained in VSTM with greater fidelity than others; and (3) the manner in which these dimensions are encoded (i.e., are perceptual dimensions encoded within separate, parallel channels, or are all perceptual dimensions stored as a single bound entity within VSTM?).

Set-size effects in VSTM
In a typical VSTM experiment, observers are presented with two arrays, composed of a number of stimuli. The two arrays are separated by a short temporal interval, and the task of observers is to decide if the first and second arrays are composed of identical stimuli, or whether one item differs across the two displays (e.g., Luck & Vogel, 1997). Increasing the number of stimuli present within the two arrays leads to a monotonic decrease in the sensitivity of observers to differences in stimuli across the two arrays (Luck & Vogel, 1997; Pashler, 1988). This capacity limit has been linked to the posterior parietal cortex, the activity of which increases with the number of stimuli in the arrays, but only up to the capacity limit of about four stimuli (Todd & Marois, 2004). There are a number of frameworks that attempt to explain the effect of increasing set-size on performance in VSTM. These can be broadly grouped under three categories: (1) psychophysical frameworks (e.g., Magnussen & Greenlee, 1997); (2) sample size models (e.g., Palmer, 1990); and (3) urn models (e.g., Pashler, 1988).

Problems with psychophysical explanations
Psychophysical experiments suggest that information is encoded in VSTM across multiple parallel channels, each channel associated with a particular perceptual attribute (Magnussen, 2000). Within this framework, a decrease in an observer's ability to detect a change with increasing set-size can be attributed to two different processes: (1) if decisions are made across different channels, decreases in performance are typically small, and consistent with decreases expected when making multiple independent decisions (Greenlee & Thomas, 1993; Vincent & Regan, 1995); (2) if multiple decisions are made within the same channel, the decrease in performance is much greater than expected on the basis of increased decision-noise alone, and is attributed to interference caused by multiple decisions within the same perceptual channel (Magnussen & Greenlee, 1997).

However, the Greenlee-Thomas model (Greenlee & Thomas, 1993) suffers from two failings as a model for the effects of set-size in VSTM. First, it has only been empirically tested with displays composed of one or two elements. It has been shown repeatedly in various experimental paradigms that set-size effects differ for displays composed of a relatively small number of elements (i.e., approximately ≤ 4 items), and those associated with larger displays (i.e., approximately > 4 items). The Greenlee-Thomas (1993) model offers no explanation for why this might be so. Second, while Magnussen, Greenlee, and Thomas (1997) are able to use this model to predict that greater interference will be found when dual decisions are made within the same perceptual dimension, rather than across different perceptual dimensions, this prediction lacks quantitative rigor, and is unable to accurately anticipate the size of the threshold increase, or give a detailed explanation of its underlying causes.

In addition to the Greenlee-Thomas model (Greenlee & Thomas, 1993), there are two other prominent approaches for describing set-size effects in VSTM. These two approaches can be referred to as sample size models (Palmer, 1990) and urn models (e.g., Pashler, 1988). They differ from the Greenlee-Thomas (1993) model by: (1) ascribing the root cause of set-size effects to a stage prior to decision making; and (2) making no theoretical distinction between decisions made in the same, or across different, perceptual dimensions.

Models of capacity limits in VSTM
If observers are asked to report on the quality (e.g., color) of an item stored in memory, performance may be perfect when only a few items are encoded (the number of items that can be perfectly encoded varies depending on the attribute being encoded, but is usually fewer than five). Beyond this point, performance invariably declines in a monotonic fashion as more items are added. Different theoretical models have been put forward to explain this decline in performance.

Slot models
A prominent class of model proposes that observers are limited by the total number of items which can be encoded, either because the capacity of VSTM itself is limited (e.g., Cowan, 2001; Luck & Vogel, 1997; Pashler, 1988), or because of a bottleneck in the number of items which can be attended to prior to encoding. This type of model has obvious similarities to urn models used in probability theory (see, for example, Mendenhall, 1967). In essence, an urn model assumes that VSTM is restricted in storage capacity to only a few items, k (often estimated to lie in the range of three to five). The probability that a suprathreshold change will be detected is simply the probability that the changed element is encoded in VSTM (i.e., k/N for a display of N items, where N exceeds k). Although urn models are commonly used to describe performance limitations in VSTM (e.g., Luck & Vogel, 1997; Pashler, 1988; Sperling, 1960), it is only recently that the actual structure of the items stored has been considered. Luck and colleagues have reported a series of experiments designed specifically to elucidate the structure of information held in VSTM (Luck & Vogel, 1997). This work provides evidence that items stored in VSTM are coherent objects, and not the more elementary features of which those objects are composed.
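The k/N relationship described above can be illustrated with a minimal sketch. The function name, the default capacity of four items, and the cap at 1 for displays no larger than the capacity are assumptions made for illustration. The sketch also ignores guessing and false alarms, which a full model of change detection would need to include.

```python
def detection_probability(capacity: int, set_size: int) -> float:
    """Probability that the changed item is among the stored items.

    Illustrative urn-model sketch: `capacity` is k, `set_size` is N.
    Assumes every item is equally likely to be stored and that a
    stored item's change is always detected (suprathreshold change).
    """
    if capacity < 0 or set_size <= 0:
        raise ValueError("capacity must be >= 0 and set_size must be > 0")
    # When the display has no more items than the store can hold, every
    # item is stored, so the probability is capped at 1 (an assumption
    # added here; the text gives only k/N).
    return min(1.0, capacity / set_size)


if __name__ == "__main__":
    k = 4  # hypothetical capacity, within the three-to-five range
    for n in (2, 4, 6, 8, 12):
        print(f"N={n:2d}  P(detect)={detection_probability(k, n):.3f}")
```

Under these assumptions, performance is flat up to the capacity limit and then falls off as k/N, which is the qualitative pattern that slot models predict.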

Noise models
A much more controversial framework has more recently been put forward by Wilken and Ma (2004) who suggest that apparent capacity limitations in VSTM are caused by a monotonic decline in the quality of the internal representations stored (i.e., monotonic increase in noise) as a function of set size. In this conception capacity limitations in memory are not caused by a limit on the number of things that can be encoded, but by a decline in the quality of the representation of each thing as more things are added to memory.

In their 2004 experiments, Wilken and Ma varied the color, spatial frequency, and orientation of objects stored in VSTM, using a signal detection theory (SDT) approach. The participants were asked to report differences between visual stimuli presented to them consecutively. The investigators found that different stimuli were encoded independently and in parallel, and that the major factor limiting discrimination performance was neuronal noise (which is a function of visual set size).

Sample size models
Sample size models (Palmer, 1990) propose that the monotonic decrease in performance with increasing set-size in VSTM experiments is a direct outcome of a limit in the amount of information observers can extract from a visual display.

In the sample size model, each perceptual attribute of a stimulus is associated with an internal, unidimensional percept, formed by the collection of a finite number of discrete samples. It is assumed that the total number of samples that can be collected across the entire visual scene is fixed. Assuming that equal attention is paid to each stimulus, it follows that the number of samples taken from each element in an array will be inversely proportional to the number of stimuli present, N. The central limit theorem implies that the mean of the samples taken, and therefore the internal percept, will have a variance inversely proportional to the number of samples per element, and hence directly proportional to N. Signal detection theory defines sensitivity (i.e., d′) as being inversely proportional to the standard deviation of the underlying representation to be discriminated (Macmillan & Creelman, 1991). Therefore, according to the sample size model, an observer's sensitivity to a stimulus change in a VSTM experiment, d′, will be inversely proportional to the square root of N.
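The scaling argument above can be checked numerically with a short sketch. All parameter names and default values (total sample budget, per-sample standard deviation, size of the change) are hypothetical and chosen only for illustration; the sketch assumes samples are split evenly across items and that sensitivity is the change size divided by the standard deviation of the sample mean.

```python
import math


def predicted_dprime(
    set_size: int,
    total_samples: float = 1000.0,  # hypothetical fixed sample budget
    sample_sd: float = 1.0,         # hypothetical noise of a single sample
    change_size: float = 0.1,       # hypothetical size of the stimulus change
) -> float:
    """Sensitivity predicted by a simple sample size model (illustrative)."""
    if set_size <= 0:
        raise ValueError("set_size must be positive")
    # Equal attention: the fixed budget is divided evenly among N items.
    samples_per_item = total_samples / set_size
    # Standard deviation of the mean of n independent samples is sd / sqrt(n),
    # so the variance of the percept grows in proportion to N.
    sd_of_percept = sample_sd / math.sqrt(samples_per_item)
    # d' is taken as inversely proportional to that standard deviation.
    return change_size / sd_of_percept


if __name__ == "__main__":
    baseline = predicted_dprime(1)
    for n in (1, 2, 4, 8):
        d = predicted_dprime(n)
        print(f"N={n}  d'={d:.3f}  d'(1)/d'(N)={baseline / d:.3f}")
```

With these assumptions, the ratio of sensitivity at set size 1 to sensitivity at set size N equals the square root of N, regardless of the particular budget or noise values chosen.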

Unfortunately, few studies have directly tested this prediction of the sample size model. Some evidence has been provided by Palmer (1990), who performed a VSTM experiment using arrays composed of lines of varying length, and set-sizes of one, two, or four. The task of observers was to determine whether there had been a change in the length of one of the lines. It was found that observers' thresholds increased in proportion to the square root of N, in accordance with the predictions of the sample size model.