Implicit memory testing

Implicit memory tests indirectly measure the retention of information that has not been consciously attended to. Participants are given tasks designed to elicit knowledge that was acquired unconsciously; such knowledge is evident when performance favors items that were initially presented over new items. Performance on implicit tests is a result of priming, a preference for responding to previously experienced stimuli over novel stimuli. Types of implicit memory tests include the Implicit Association Test, the Lexical Decision Task, the Word Stem Completion task, Artificial Grammar Learning, and Word Fragment Completion.

Implicit Association Test (IAT)
The Implicit Association Test is a testing method designed by Anthony Greenwald, Debbie McGhee and Jordan Schwartz, and was first introduced in 1998. The IAT measures the associative strength between categories (e.g. Bug, Flower) and attributes (e.g. Bad, Good) by having participants rapidly classify stimuli that represent the categories and attributes of interest on a computer. During four of the seven trial blocks in an IAT, categories and attributes share a response key (e.g. Bug or Bad, Flower or Good), with the underlying assumption being that participant response times will be quicker when the category and attribute are more closely associated.

Method/Procedure
The first two trial blocks have participants match stimuli only to categories or only to attributes, to allow participants to practice grouping the stimuli (Bug, Flower). The third and fourth trial blocks are the first in which a category and an attribute share a response key, and during these blocks, categories and attributes are grouped in a congruently associative manner (e.g. Bug with Bad, Flower with Good). In the fifth trial block the category labels switch sides, which gives participants a chance to practice grouping stimuli from the categories with the new orientation of the labels (e.g. Flower, Bug). Finally, the sixth and seventh trial blocks have categories and attributes sharing a response key again, but because the category labels have switched sides, they are now paired in an incongruently associative manner (Flower with Bad, Bug with Good).

Originally, a participant's performance during an IAT was scored in milliseconds, depending on how much time they took to respond each trial, but since then, an improved scoring algorithm has been created. The resulting "D measure" was found to be superior in a variety of ways, such as creating larger correlations with explicit measures, and reducing the effects of prior IAT experience. Interpreting the D measure is also fairly straightforward, with high positive scores indicating a congruent implicit preference, high negative scores indicating an incongruent implicit preference, and scores around zero indicating a relatively neutral implicit preference.
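
As a rough illustration of how a standardized latency score of this kind can be computed, the sketch below divides the difference between the mean response times of the incongruent and congruent blocks by the standard deviation of all trials pooled together. It is a simplified stand-in, not the published scoring algorithm, which involves additional steps (such as trial exclusion and error handling) that are omitted here. The function name and the latencies are hypothetical.

```python
from statistics import mean, stdev

def simplified_d_score(congruent_ms, incongruent_ms):
    """Standardized latency difference (assumed simplification of the D measure).

    Positive values mean responses were faster in the congruent blocks.
    """
    # Standard deviation computed over the trials of both block types together.
    pooled_sd = stdev(list(congruent_ms) + list(incongruent_ms))
    return (mean(incongruent_ms) - mean(congruent_ms)) / pooled_sd

# Hypothetical latencies in milliseconds, made up for illustration only.
congruent = [620, 650, 700, 610, 680]
incongruent = [820, 790, 860, 900, 810]

d = simplified_d_score(congruent, incongruent)
print(round(d, 2))
```

With these made-up latencies the score is positive, which under the interpretation above would correspond to a congruent implicit preference; swapping the two lists flips the sign.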

Reliability & Validity Information
Implicit measures, especially latency-based ones, typically struggle to achieve a satisfactory level of internal consistency and test-retest reliability. However, the IAT possesses acceptable levels of both, with one review finding that internal consistency values of IATs typically ranged from .7 to .9. In terms of test-retest reliability, the IAT has shown itself to be a relatively stable measure; however, little research has examined the test-retest reliability of the IAT with a gap of more than a month between administrations.

The IAT has also established itself as an acceptably valid measure, and has demonstrated this through its convergent validity, discriminant validity, and predictive validity. The IAT's convergent and discriminant validity have been established through comparison with explicit measures, whereby IATs were found to relate to explicit measures targeting the same traits, and not to explicit measures targeting unrelated traits. Additionally, multitrait-multimethod studies have demonstrated that although IATs and explicit measures may be related, they appear to be measuring different constructs. Overall, the IAT has been found to be an effective predictor of behavior, and is generally superior to self-report measures when dealing with topics of discrimination and stereotyping, especially when examining patterns of ingroup liking (e.g. preferring Canadians over Americans if one is Canadian, and vice versa if one is American).

Current IAT Research
The IAT is a procedure applied to a variety of research topics, including examinations of self-esteem, consumer studies, and human sexuality. Oftentimes, it is the IAT's ability to skirt socially desirable response biases that makes it an attractive method, and it is often used in lieu of, or alongside explicit self-report measures.

Implicit Self-Esteem
Implicit self-esteem IATs use "self" and "other" as categories, and "positive" and "negative" as attributes. Participants who group "self" stimuli more quickly when they share a response key with "positive" stimuli show positive implicit self-esteem. On the other hand, participants who group "self" stimuli more quickly when they share a response key with "negative" stimuli show low implicit self-esteem. In one implicit self-esteem IAT study, it was demonstrated that North American and Asian university students all have relatively high levels of implicit self-esteem. This contrasts with explicit measures of self-esteem, on which North American participants tended to score much higher than their Asian counterparts, and it highlights implicit self-esteem as a possibly universal phenomenon. Separate research examining the relationship between implicit and explicit self-esteem has determined that the two are separate but weakly related constructs.

Marketing and Consumer Studies
The IAT has also been used effectively in the realm of marketing and consumer studies. In one such study, participants' attitudes towards Apple Macintosh and Microsoft Windows computers were compared using both an explicit measure and an IAT. The IAT used targets of "Windows" and "Mac", which were paired with attributes of "positive" and "negative". The researchers found that while correlations between explicit brand preference and implicit brand preference were high, Mac users had stronger implicit preferences for their brand than Windows users. Other research has also demonstrated that the IAT can reliably predict consumer behavior, including purchase intention, brand preference, and perceived brand superiority.

Human Sexuality
Human sexuality research has been one area where the IAT has been slow to catch on as a procedure of choice: implicit sexual attitudes have not been investigated in earnest, and most of the research has focused on attitudes towards condom use and attitudes towards gay and lesbian people. In one study, researchers found that while explicit attitudes towards gays and lesbians were generally positive, implicit attitudes towards gay men were negative, as were men's implicit attitudes towards lesbians. Additionally, the IAT has been found to be extremely effective at predicting the sexual orientation of gay and heterosexual men. Finally, research comparing heterosexual men and women found that heterosexual women harbor more negative explicit and implicit attitudes towards sex than heterosexual men.

Criticisms of the IAT
Previous criticisms of the IAT typically centered around the notion that IAT effects were a product of familiarity with stimulus items, rather than actual implicit attitudes. However, additional research seems to have addressed this concern as several studies have shown that IAT effects are not on account of familiarity. One such study found that implicit attitudes towards White Americans were much more positive compared to Black Americans, even when equally unfamiliar stimuli were used to represent these categories.

Currently, most of the criticisms of the IAT center around the accuracy of its communicated purpose, as some have interpreted the IAT as a kind of lie detector that gets at attitudes that are "more true". However, the creators of the IAT assert that participants' implicit attitudes may differ from self-report for a number of reasons: participants may be unaware of these implicit biases, may be aware of them but reject them as incongruent with their beliefs, or may be aware of them and simply attempt to hide them. Only in the third case does the IAT fit the description of detecting hidden beliefs. In conclusion, the authors state that the difference between the IAT and self-report measures of attitudes is that self-report measures require introspection, while the IAT does not: they simply measure different things, and one is not more associated with truth than the other.

Lexical Decision Task (LDT)
The first experimenters to use the Lexical Decision Task (LDT) were Meyer and Schvaneveldt in 1971. They measured semantic decisions and showed that people are faster to respond to a word when they have already been shown a semantically related prime; for example, people are faster to confirm "nurse" as a word when it is preceded by "doctor" than when it is preceded by "butter".

Method/Procedure
The Lexical Decision Task is an implicit memory task in which participants are given a stimulus (a string of letters) and asked to decide whether this string is a word or a nonword. Nonwords are made by replacing at least one letter in a word with another letter (e.g. mark becomes marb). Vowels are used to replace vowels and consonants are used to replace consonants. Response times are the main measure in these tasks, and they are measured as a function of the string's meaning, familiarity, and word frequency. Response times are also examined to see whether they reflect what has occurred previously, such as whether the participant has recently been exposed to these words or whether the words relate to ideas that the participant has recently been thinking about. It has been found that people respond faster to words they have recently been exposed to, as well as to words that relate to ideas they have recently been thinking about.

The original task consisted of a stimulus that involved either a pair of words, a word and a nonword, or a pair of nonwords. The participants were asked to respond "yes" if both strings were words, and "no" in the other two conditions (if there was a word and a nonword, or if there were two nonwords). Another variation of this answering scheme was for participants to respond "same" if the strings were either both words or both nonwords, and "different" if one of the strings was a word and the other was a nonword. "The stimuli were generated on a Stromberg Carlson SC4060 graphics system, photographed on 16-mm movie film and presented on a rear-projection screen by a Perceptual Development Laboratories’ Mark III Perceptoscope." The participants were told to look at a fixation box which appeared on the screen for 1 second, after which the stimulus was displayed. The participants used a panel with finger keys for their right and left hands to respond. The right index finger pressed the "yes" (or "same") button and the left index finger pressed the "no" (or "different") button. By counting the cycles of a 1000 Hz oscillator, the participants' reaction times were measured to the nearest millisecond; the response times were measured from the time the stimulus was presented until the response was made by the participant.
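
The nonword construction rule and the "yes"/"no" response rule of the pair task can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names are hypothetical, words are assumed to be lowercase and alphabetic, "y" is treated as a consonant, and a real study would also check each candidate against a dictionary, because a single substitution can produce another real word (e.g. mark to mask).

```python
import random

VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"  # assumption: "y" counts as a consonant

def make_nonword_candidate(word, rng):
    """Replace one letter with a different letter of the same class."""
    i = rng.randrange(len(word))
    pool = VOWELS if word[i] in VOWELS else CONSONANTS
    replacement = rng.choice([c for c in pool if c != word[i]])
    return word[:i] + replacement + word[i + 1:]

def pair_response(first_is_word, second_is_word):
    """Pair version of the task: 'yes' only when both strings are words."""
    return "yes" if first_is_word and second_is_word else "no"

rng = random.Random(0)  # fixed seed so the sketch is repeatable
candidate = make_nonword_candidate("mark", rng)
print(candidate)
print(pair_response(True, False))
```

The candidate always keeps the vowel and consonant pattern of the source word, which is the property the substitution rule is meant to preserve.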

The On-Line LDT
A more recent version of the Lexical Decision Task is the on-line LDT. In these experiments the stimuli are usually words from a passage of text that are presented to participants, visually or auditorily, one word at a time. Partway through the presentation of the text, a string of letters is presented visually to the participant, whose task is to decide whether it is a word or a nonword. Participants respond by pressing the corresponding key on their keyboard, commonly the "?" and "/" key for "yes" and the "z" key for "no". This technique measures reaction time and accuracy, and has been used to examine our understanding of word meanings and syntactic structures.

Current LDT Research
Current LDT research has increased the knowledge of inter-hemispheric communication in people with and without reading disabilities. The left hemisphere of the brain uses a phonological, non-lexical strategy that changes graphemes into phonemes to sound out strings of letters. People with reading disabilities, specifically phonological dyslexia, and people without reading disabilities participated in an LDT, and it was found that experience in a task improves hemispheric asymmetry in the brain. Moreover, there is a transformation from no asymmetry in nonword conditions to a clear left hemisphere advantage in word conditions. It was also shown that the left hemisphere is enhanced by experience in familiar word conditions, which results in the suppression of the right hemisphere in these conditions for people both with and without reading disabilities. This shows that hemispheric asymmetry for lexical processing is not altered by having a reading disability. Finally, responses in the pseudoword condition were slower when people with phonological dyslexia were using only their left hemisphere, which suggests that there is more reliance on lexical processing by the right hemisphere than on non-lexical processing by the left hemisphere. This research has furthered the knowledge of inter-hemispheric communication in people with reading disabilities so that now inter-hemispheric communication for the processing of unfamiliar words and pseudowords is all that is needed to help people with phonological dyslexia develop a non-lexical strategy. The identification of a critical time period in which an intervention should take place is also needed.

Alternate Theory and Criticisms of LDT
In the standard LDT participants have to read the string of letters in front of them, decide whether it is a word or not, and then make their response by pressing a key. This version of the LDT has been criticized on the grounds that participants make more errors and have longer response times because, after deciding whether the string is a word, they have to remember which key to press (the yes key if it is a word and the no key if it is not). The go/no-go task is an alternate task that has been proposed to see whether response selection is actually producing slower response times and more errors. A study was conducted to look at this, using both the yes/no LDT and the go/no-go LDT. In the go/no-go task the participants are asked to press the mouse with their dominant hand if the string of letters presented to them on the screen is a word, and to do nothing if it is not a word. In comparing the two tasks it was found that responses in the go/no-go task were faster and more accurate than those in the yes/no task. This result was also present in an associative priming experiment, where the priming effect was found to be greater for the go/no-go task than for the yes/no task. In both experiments there was also a dramatic decrease in the number of errors made by people in the go/no-go condition, implying that the go/no-go task has an advantage over the yes/no task because there is no response selection to be made, which decreases response times and errors.

Word Stem Completion (WSC) Task
One of the first uses of the word stem completion (WSC) task was by Elizabeth K. Warrington and L. Weiskrantz in 1970. These researchers used the WSC task to examine the memory of verbal material in amnesic patients. They asked amnesic participants to read a list of words three times and then tested them on recall, recognition, fragmented words, or a WSC task (in which the first few letters were presented). They found that the amnesic participants performed worse than controls on recall and recognition, but equally to controls on the fragmented words and WSC tasks. This suggested that long-term memory can be demonstrated in amnesic patients using the WSC task.

Method/Procedure
The WSC task is a verbal test of perceptual implicit memory. In this task a participant is presented with the first few letters of a word and asked to complete the word stem with the first word that comes to mind. The participants are usually unaware that they are expected to complete these stems using words that they have previously seen. An example of a WSC task would be presenting the word "lettuce" in such a way that a participant is not aware that this word will be useful later. After a given time the participant is given the word stem "LET____" and asked to complete it with the first word that comes to mind. Participants are using their implicit memory if they complete the word stem with the previously presented word, in this case lettuce. To construct a WSC task for a study, researchers will usually use a thesaurus to generate a large pool of words, including both general words and the words to be primed. With the use of a dictionary, this pool of words is gradually reduced to a smaller number. Using information from pilot testing, the smaller pool is then reduced to the desired number of word stems for the test.
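
Scoring a single stem completion can be sketched as follows, using the "lettuce" example above. The function names are hypothetical, and the rule that a response counts as primed only if it matches a studied word exactly is an assumption made for illustration.

```python
def completes_stem(stem, response):
    """True if the response is a longer word that begins with the stem letters."""
    stem = stem.rstrip("_").lower()
    response = response.lower()
    return response.startswith(stem) and len(response) > len(stem)

def is_primed_completion(stem, response, studied_words):
    """True if the stem was completed with a previously presented word."""
    studied = {w.lower() for w in studied_words}
    return completes_stem(stem, response) and response.lower() in studied

studied = ["lettuce"]
print(is_primed_completion("LET____", "lettuce", studied))  # True
print(is_primed_completion("LET____", "letter", studied))   # False
```

Counting primed completions across all stems, and comparing that count with the rate at which unprimed participants produce the same words, would give an estimate of the priming effect.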

WSC Task Theory
Some researchers predict that when an individual is primed with a word, a schema is activated in the brain, producing further activation of the components of that schema. This activation strengthens the internal organization of the schema, making the word more accessible because it will come to mind more readily when only some of its components are presented. This is exactly what happens in the WSC task: the first few letters are shown, which activates the components of the schema. The processing of a word increases its accessibility and the probability that this word will be produced even when only some of its components are presented (i.e. the first few letters of the word). Since the WSC task is measuring implicit memory, all of this happens without the participant being aware of it.

Current WSC Task Research
The WSC task has been used in recent years to measure whether learning can occur when a patient is under anesthesia. In one study, 14 words were played through headphones, either before surgery or during surgery, to patients anesthetized with propofol. Once patients had recovered, their memory was assessed using an auditory WSC test. This follows the same procedure as a WSC task with visually presented words, except that the first part of the word is heard instead of seen during testing. The patients were also tested for explicit memory of the words using a recall test. The researchers found that none of the patients had explicit recall for the words heard while under anesthesia. Furthermore, patients under anesthesia who listened to the words before surgery did not show any implicit learning on the WSC task. However, patients under anesthesia who listened to the words during surgery showed implicit memory on the WSC task. It is important to note, however, that the amount of learning was quite small and the results of this study are weak.

Researchers have also been using WSC tasks to investigate the implicit impact of exposure to appearance and weight related images in the media. In one study two groups of participants, a control group who watched a nonappearance related video and an experimental group who watched an appearance related video, were asked to complete twenty word stems with the first word that came to mind. The word stems were created with the possibility of being completed with an appearance related word or a nonappearance related word. For example, SLE___ could be completed with slender, an appearance related word or with sleep, a nonappearance related word. For both females and males, results showed that watching an appearance related video before completing a WSC task significantly increased the number of appearance related responses. This study shows that the WSC task can be successfully used to explore the implicit influences of the media.

WSC Task Limitations
Researchers have compared the WSC task to the word identification test, the word fragment completion test, and the anagram solution test. They used four different types of presentation during the study phase to test and compare these implicit memory tasks: visual in the same font as the test, visual in a different font from the test, auditory, and picture. The study concluded that the WSC task produces better results when participants are primed visually and worse results when participants are primed in the auditory and picture conditions. Furthermore, research has shown that priming effects for the WSC task usually disappear within two hours.

Artificial Grammar Learning (AGL)
Method/Procedure
Artificial Grammar Learning (AGL) is a task designed to test the process of implicit learning, which is the unconscious acquisition of knowledge and the use of this knowledge without consciously activating it. It involves the use of a “finite state language”, which is a potentially infinite set of items made up of symbols following a finite set of rules, which constitutes a grammar. The AGL task was first introduced in 1967 by Arthur S. Reber.

In the standard AGL paradigm based on Reber’s work, a “language” that consists of a vocabulary of letters (for example, Z, K, F, G and B) and grammatical rules for putting these letters into sentences is constructed. The grammar consists of a number of states, where the addition of a letter causes the transition from one state to another, until the end state is achieved. In the learning phase of the AGL task, the experimental group is given a number of sentences created using the artificial grammar. The control group is given a number of random strings made up of the same letters, but not following the rules of the artificial grammar. Both groups are told that they are doing a memory task, and must memorize the letter strings and then reproduce them. In the test phase, both groups are told that each letter string was actually a sentence created using a complex set of grammatical rules. They are each given a number of new sentences, some grammatically correct and some not, and are asked to judge the grammaticality of each. The results show that most participants can consistently make accurate grammatical/non-grammatical assessments of the new sentences, even though few can correctly articulate the rules that they are using to make those assessments.
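
A finite state grammar of the kind described above can be represented as a table of states and letter transitions. The sketch below uses a small made-up grammar over the example letters Z, K, F, G and B; the transition table is hypothetical and is not Reber's grammar. It generates grammatical "sentences" by walking from the start state to the end state, and checks the grammaticality of an arbitrary string.

```python
import random

# Hypothetical grammar: state -> {letter: next state}. State 4 is the end state.
TRANSITIONS = {
    0: {"Z": 1, "K": 2},
    1: {"F": 1, "G": 3},
    2: {"B": 2, "F": 3},
    3: {"K": 3, "B": 4},
}
START, END = 0, 4

def generate_sentence(rng):
    """Walk the grammar from the start state, adding letters until the end state."""
    state, letters = START, []
    while state != END:
        letter, state = rng.choice(sorted(TRANSITIONS[state].items()))
        letters.append(letter)
    return "".join(letters)

def is_grammatical(string):
    """True if the letters trace a path from the start state to the end state."""
    state = START
    for letter in string:
        next_state = TRANSITIONS.get(state, {}).get(letter)
        if next_state is None:
            return False
        state = next_state
    return state == END

rng = random.Random(1)  # fixed seed so the sketch is repeatable
print([generate_sentence(rng) for _ in range(3)])
print(is_grammatical("ZGB"), is_grammatical("ZKB"))
```

Strings for a control group could be produced by shuffling the same letters without consulting the table, and the test phase corresponds to asking participants to make the judgment that is_grammatical computes.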

Current AGL Research
AGL is used in many studies as a measure of implicit learning or memory, along with a separate test of explicit learning or memory, in response to a certain variable. One study investigated the relationship between age and learning style, i.e. explicit versus implicit. An AGL task was used because of its ability to measure both implicit and explicit learning. One group of participants was given strings made from a complex grammar, with no mention in the instructions of the underlying rules. This was thought to increase the amount of implicit learning, as more complex rules are harder to perceive and participants were not attempting to find them. Another group was given strings made from a simple grammar, with instructions to try to figure out the rules. This was thought to increase the amount of explicit learning, as participants were consciously attempting to find rules that were easy to perceive. The results showed that older adults performed poorly compared to young adults on the task that emphasized explicit learning; however, both groups performed similarly on the task that emphasized implicit learning. This suggests that the aging effects seen with explicit memory do not extend to implicit memory.

A 2002 study investigated the neural correlates of AGL. Data from amnesic patients with medial temporal damage, whose performance on AGL tasks is no different from that of controls, indicate that this area is not implicated in AGL. In the study, the learning phase was conducted as usual, and the test phase was conducted with the participants inside an fMRI scanner. The results showed greater activity in the left superior occipital cortex and right fusiform gyrus for grammatical stimuli, and greater activity in the left angular gyrus during grammaticality judgments, as compared to a matched recognition control task.

Alternate Theories and Criticisms of AGL
Reber’s original AGL theory is rule-based; participants learn and apply the formal rules of the artificial grammar through viewing grammatical strings. However, there are many alternate theories to describe the knowledge that is obtained through learning an artificial grammar.

Microrules
This theory states that participants do not acquire the abstract rules exactly as stated by the artificial grammar. Instead, participants develop their own rules based on small sections of each letter string. For example, they may notice that an F always comes after an M. The existing AGL paradigm is criticized for having only two responses: grammatical or non-grammatical. In one study, participants were asked to indicate why they felt a certain sentence was grammatical or non-grammatical. In the test phase, the participants were told either to cross out the part of each string that made it non-grammatical, or to underline the part that made it grammatical. This indicated the microrules that each participant was consciously applying. The results showed that participants acquired a large number of imperfect and limited rules, which nevertheless led to consistently correct judgments of grammaticality and non-grammaticality.

Similarity
The specific similarity theory states that learning occurs by encoding each letter string in the learning phase as a whole. Grammaticality judgments in the test phase are made by comparing each novel letter string to the ones already in memory. The more similar a string is to the remembered strings, the more grammatical it is reported to be. A variant of this theory suggests that the representation of each letter string is pooled into a larger representation of multiple strings, and grammaticality is assessed by comparing the similarity of novel items to this pooled representation. Another similarity model suggests that smaller surface features of each string are stored as well as the string as a whole. Each novel letter string is compared to the collection of features in memory, and their similarity is used to determine grammaticality. Similarity is also called familiarity in some theories.

Chunking
In the competitive chunking hypothesis, knowledge of a letter string develops along a hierarchy of “chunks”, beginning with bigrams (two letters), leading to trigrams, four-grams, and so on. “Chunk strength” refers to the frequency of occurrence of any given chunk during the learning phase. The higher the chunk strength of an item, the more likely it is to be determined grammatical.
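
Chunk strength as defined above is a frequency count, so it can be illustrated directly. The sketch below counts bigrams and trigrams across a set of learning-phase strings and then scores a test string by the average strength of its chunks. The training strings are hypothetical, and averaging over a test string's chunks is an assumption made here for illustration rather than a rule stated by the hypothesis.

```python
from collections import Counter

def chunks(string, sizes=(2, 3)):
    """All contiguous chunks of the given sizes (bigrams and trigrams by default)."""
    return [string[i:i + n] for n in sizes for i in range(len(string) - n + 1)]

def chunk_counts(training_strings, sizes=(2, 3)):
    """Chunk strength: how often each chunk occurred in the learning phase."""
    counts = Counter()
    for s in training_strings:
        counts.update(chunks(s, sizes))
    return counts

def mean_chunk_strength(string, counts, sizes=(2, 3)):
    """Average strength of a test string's chunks (an assumed summary score)."""
    parts = chunks(string, sizes)
    return sum(counts[c] for c in parts) / len(parts) if parts else 0.0

training = ["ZGB", "ZFGB", "KFB", "KBFKB"]  # hypothetical learning-phase strings
counts = chunk_counts(training)
print(counts["GB"], counts["KB"])
print(mean_chunk_strength("ZFGKB", counts))
print(mean_chunk_strength("BZKG", counts))
```

A test string built from frequently seen chunks scores higher than one built from chunks that never appeared in training, which is the pattern the hypothesis uses to predict which items are judged grammatical.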

Hybrid Theory
Some researchers do not believe that AGL can be explained using only one of the theories mentioned above. A hybrid theory claims that knowledge of the abstract grammar rules, as well as of the surface features of the letter strings, is obtained while learning the artificial grammar, and that both are used to determine the grammaticality of novel letter strings. A study investigating a hybrid theory showed that not only did participants use both of these types of knowledge in their grammaticality judgments, but amnesic patients who had lost the use of their explicit memory were also able to make grammaticality judgments using both types of knowledge. This shows that both the abstract grammar rules and the surface features of the strings are implicitly learned and implemented.

Word Fragment Completion (WFC)
The Word Fragment Completion test (WFC) is a test designed to measure memory of words presented to participants. Words that were previously shown to participants are presented again in a fragmented form (i.e. with letters missing), and the task is to retrieve the missing letters from memory to complete each word. This task calls on implicit memory because at the time of word presentation, participants have not consciously stored the items in memory; they have merely been exposed to them. To prevent participants from consciously trying to retain the items presented, which would result in a test of explicit memory, they are often misled about the purpose of the study through irrelevant tasks that require their conscious attention. Implicit memory can then be observed when participants perform better on the WFC test for words that have been presented than for words that have not. This effect is known as priming and is the key demonstration of this test.

Method/Procedure
Since the main objective of this implicit test is to assess priming effects, the WFC assessment is typically administered after a presentation period of the to-be-tested words. The items are typically read aloud from a list by a test administrator or read by the participants themselves. To ensure that implicit memory is being measured rather than explicit memory, participants can be given irrelevant tasks in this stage to distract them from attempting to memorize the to-be-tested words (e.g. sorting various squares by size). Participants in research studies often try to determine the experimenters’ goals and respond in ways which would support their hypotheses, which makes distracter tasks crucial to the validity of studies. Another step to ensure participants are not relying on explicit memory is to place a time delay between the learning phase and the test phase. This interferes with primacy and recency effects because it interrupts the active rehearsal of the listed items.

After being exposed to the items (learning phase), the participants enter the test phase. They are presented with fragments of the words that were shown in the learning phase, in addition to new words that serve as a baseline of performance (i.e. performance on non-primed words). Participants are then instructed to complete the fragments with the first word that comes to mind. Priming effects are evident when performance on the originally presented words exceeds performance on the new words. The types of words that are presented are typically ones that are used infrequently in everyday language. Words that are lower in frequency are more likely to be identified correctly in the WFC test because they are more distinct, which makes them easier to recall. Presented words also tend to be longer (7 or 8 characters) than words presented in other implicit memory tests and the fragments are presented in such a way that only 1 or 2 possibilities for completion exist.

An example of a WFC test is as follows:

Participants are presented with a list of words including ASSASSIN, EMISSARY, MYSTERY, PENDULUM, and THEOREM, among others. A distracter task is utilized to redirect the participant’s attention; they are asked to sort paint chips into their respective colour categories (red, blue, green, etc.). Participants are then presented with a fragment of a previously exposed word, A_ _A_ _IN, along with other fragments of primed words and new words.
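
Checking whether a word fits a fragment, and summarizing priming as the gap between completion rates for studied and new words, can be sketched as below. The function names are hypothetical, and expressing the priming effect as a simple difference of proportions is an assumption made for illustration.

```python
def matches_fragment(fragment, word):
    """True if the word fits the fragment, where '_' stands for a missing letter."""
    pattern = fragment.replace(" ", "").upper()
    word = word.upper()
    return len(pattern) == len(word) and all(
        p == "_" or p == w for p, w in zip(pattern, word)
    )

def priming_effect(primed_correct, primed_total, new_correct, new_total):
    """Completion rate for studied words minus the baseline rate for new words."""
    return primed_correct / primed_total - new_correct / new_total

studied = ["ASSASSIN", "EMISSARY", "MYSTERY", "PENDULUM", "THEOREM"]
print([w for w in studied if matches_fragment("A_ _A_ _IN", w)])  # ['ASSASSIN']
```

A positive value from priming_effect would indicate that fragments of studied words were completed more often than fragments of new words.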

Current Research in WFC
One of the findings of this test is the distinction between performance on high and low frequency words. It is already understood that a distinction exists for word frequency with respect to recall and recognition memory, but this test in particular helped build evidence for this distinction in implicit memory rather than explicit memory alone. For both direct and indirect tests (explicit and implicit, respectively), performance is better for the free recall of high frequency words and better for the recognition of low frequency words. Low frequency words are more distinct and stand out, so when one is presented, it is easier to determine whether it has been seen before (i.e. whether the item is recognized) because of its distinctiveness in memory. Recall and recognition tests have different performance rates because they involve different levels of processing (LOP). Recall tests require one to generate the information in its entirety, a deeper LOP, while recognition tests require one to determine whether a stimulus has been previously presented, a shallower LOP. Research on LOP has further supported the finding that priming effects last longer for WFC than for other implicit memory tests. WFC performance remains high for words presented in the learning phase of an experiment for up to a week before dropping to baseline levels, while performance on other tests, such as Artificial Grammar Learning, drops after only a few hours.

An interesting finding through the use of this test is that the first letter of a word is particularly important in participants’ ability to correctly determine its identity. One study presented fragments of words with the first letter deleted (e.g. _urse) and found that performance rates were significantly lower than words that had the first letter intact (e.g. p_rse). This may be because the first letter is the first cue for what the word to follow may be.

WFC is a test of the unconscious retention of information and so the majority of the new research associated with this test is geared towards implicit memory. One application of tests such as this one is with patients who have amnesia. When the distinction between explicit and implicit memory was first determined, it was hypothesized that amnesiacs may not have lost all of their memory after all. In fact, when tests that measure implicit memory are administered to people who suffer from amnesia, they show tendencies of responding to stimuli in ways which correlate with information previously presented but not explicitly remembered.

Perceptual Tests

 * Word Identification Task
 * Degraded Word Naming
 * Anagram Solution

Non-verbal Tests

 * Picture Fragment Naming
 * Object Decision Task
 * Possible/Impossible Object Decision

Conceptual Tests

 * Word Association Test
 * Category Instance Generation
 * General Knowledge Questions