Rescorla-Wagner learning model

The Rescorla-Wagner model is a model of classical conditioning in which the animal is theorized to learn from the discrepancy between what it predicted would happen and what actually happened. It is a trial-level model in which each stimulus either is or is not present at some point during the trial. The prediction of the unconditioned stimulus for a trial is represented as the sum of the associative strengths of all conditioned stimuli present during the trial. This feature represented a major advance over previous models and allowed a straightforward explanation of important experimental phenomena such as blocking. For this reason, the Rescorla-Wagner model has become one of the most influential models of learning, though it has been frequently criticized since its publication. It has attracted considerable attention in recent years, as many studies have shown that the phasic activity of dopamine neurons in the midbrain encodes a prediction error of the type used in the model.

The Rescorla-Wagner model was created by Robert A. Rescorla of the University of Pennsylvania and Allan R. Wagner of Yale University.
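
For reference, the error-driven learning rule described above is usually written in the following form. The symbols follow the convention common in the literature rather than notation defined elsewhere in this article: V_X is the associative strength of stimulus X, alpha_X is the salience of that CS, beta is a rate parameter associated with the US, lambda is the maximum associative strength the US supports (zero when no US occurs), and V_tot is the summed associative strength of all stimuli present on the trial.

```latex
\Delta V_X = \alpha_X \, \beta \, \left( \lambda - V_{\mathrm{tot}} \right),
\qquad
V_{\mathrm{tot}} = \sum_{i \,\in\, \text{cues present}} V_i
```

The term (lambda - V_tot) is the prediction error: it is large when the US is surprising and shrinks toward zero as the summed prediction approaches lambda.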

Basic assumptions of the model
1. The associative strength of a stimulus is expressed directly in the behaviour it elicits or inhibits. The model has no way to represent learning about a stimulus that is not shown in the organism's reactions.

2. Excitation and inhibition are opposite features. A stimulus can have either a positive associative strength (being a conditioned excitor) or a negative associative strength (being a conditioned inhibitor); it cannot have both.

3. The amount of surprise an organism experiences when encountering an Unconditioned Stimulus (US) is assumed to depend on the summed associative value of all cues present during that trial. This assumption differs from previous models, which considered only the associative value of a particular Conditioned Stimulus (CS) to be the determining aspect of surprise.

4. The salience of a CS is a constant. The salience of a CS (alpha) is assumed not to change during training and can thus be represented by a constant.

5. The history of a cue does not have any effect on its current state. Only the current associative value of a cue determines the amount of learning. It does not matter whether the CS has undergone several conditioning-extinction sessions or the like.
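
The assumptions above can be illustrated with a minimal Python sketch of a single trial update, applied to the standard blocking design (A-US trials followed by AB-US trials). This is an illustration under stated assumptions, not a reference implementation: the function names, the salience of 0.3, the asymptote of 1.0, and the number of trials are arbitrary choices made for the example.

```python
def rw_trial(V, present, lam, alpha, beta=1.0):
    """Apply one Rescorla-Wagner trial in place and return the prediction error.

    V       -- dict mapping cue name to current associative strength
    present -- cues physically present on this trial
    lam     -- asymptote supported by the outcome (0.0 when no US occurs)
    alpha   -- dict of constant cue saliences (assumption 4)
    beta    -- rate parameter associated with the US
    """
    # Assumption 3: surprise depends on the summed strength of all present cues.
    error = lam - sum(V[c] for c in present)
    for c in present:
        # Assumption 5: only the current value enters the update, not the history.
        V[c] += alpha[c] * beta * error
    return error


def run_blocking(n_trials=50):
    """Phase 1: A-US. Phase 2: AB-US. Parameter values are illustrative."""
    alpha = {"A": 0.3, "B": 0.3}
    V = {"A": 0.0, "B": 0.0}
    for _ in range(n_trials):
        rw_trial(V, ["A"], 1.0, alpha)
    for _ in range(n_trials):
        rw_trial(V, ["A", "B"], 1.0, alpha)
    return V


def run_control(n_trials=50):
    """Control group: AB-US trials only."""
    alpha = {"A": 0.3, "B": 0.3}
    V = {"A": 0.0, "B": 0.0}
    for _ in range(n_trials):
        rw_trial(V, ["A", "B"], 1.0, alpha)
    return V


if __name__ == "__main__":
    print("blocking:", run_blocking())
    print("control: ", run_control())
```

In this sketch, B gains almost no strength in the blocking group because A already predicts the US when B is introduced, so the summed error is close to zero. In the control group, A and B share the available associative strength.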

The Revised RW model by Van Hamme and Wasserman (1994)
Van Hamme and Wasserman extended the original Rescorla-Wagner (RW) model in 1994 by introducing a new factor. They suggested that changes in associative strength are not limited to conditioned stimuli physically present on a given trial: the associative value of an absent CS can also be altered through a within-compound association with a CS that is present on that trial. A within-compound association is established if two CSs are presented together during training (compound stimulus). If one of the two component CSs is subsequently presented alone, it is assumed to activate a representation of the other (previously paired) CS as well. Van Hamme and Wasserman propose that stimuli indirectly activated through within-compound associations have a negative learning parameter, so that phenomena of retrospective revaluation can be explained.

Consider the following example, an experimental paradigm called "backward blocking", which is indicative of retrospective revaluation. Here AB denotes the compound of stimuli A and B:

Phase 1:     AB-US

Phase 2:      A-US

Test trials: When B is tested alone, Group 1, which received both Phase 1 and Phase 2 trials, shows a weaker Conditioned Response (CR) than the control group, which received only Phase 1 trials.

The original RW model cannot account for this effect, because B is absent in Phase 2 and its associative strength therefore cannot change. The revised model can: in Phase 2, stimulus B is indirectly activated through its within-compound association with A. Instead of the positive learning parameter (usually called alpha) that it has when physically present, B has a negative learning parameter during Phase 2. Thus B's associative strength declines during the second phase, whereas A's value increases because of its positive learning parameter.

Thus, the revised RW model can explain why the CR elicited by B after backward blocking training is weaker compared with AB-only conditioning.
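
The backward blocking account above can be sketched in Python as follows. This is a simplified illustration, not the authors' formulation: it assumes that the prediction error is computed from the physically present cues only, that an absent-but-expected cue is updated with the same error scaled by a negative salience, and that the values 0.3 and -0.15 and the function names are arbitrary choices for the example.

```python
def revised_rw_trial(V, present, absent_expected, lam,
                     alpha_present=0.3, alpha_absent=-0.15, beta=1.0):
    """One trial of a simplified revised RW update (illustrative assumptions).

    present         -- cues physically present on the trial
    absent_expected -- cues that are absent but activated through a
                       within-compound association with a present cue
    """
    # Assumption: the error term uses the summed strength of present cues only.
    error = lam - sum(V[c] for c in present)
    for c in present:
        V[c] += alpha_present * beta * error      # positive learning parameter
    for c in absent_expected:
        V[c] += alpha_absent * beta * error       # negative learning parameter
    return error


def backward_blocking(alpha_absent, n_trials=50):
    """Return B's strength after Phase 1 (control) and after Phase 2."""
    V = {"A": 0.0, "B": 0.0}
    for _ in range(n_trials):                     # Phase 1: AB-US
        revised_rw_trial(V, ["A", "B"], [], 1.0)
    v_b_control = V["B"]
    for _ in range(n_trials):                     # Phase 2: A-US, B expected
        revised_rw_trial(V, ["A"], ["B"], 1.0, alpha_absent=alpha_absent)
    return v_b_control, V["B"]


if __name__ == "__main__":
    # alpha_absent = 0.0 reproduces the original RW model: B does not change.
    print("original RW:", backward_blocking(0.0))
    print("revised RW: ", backward_blocking(-0.15))
```

Setting the absent-cue parameter to zero recovers the original model, in which B keeps its Phase 1 value. With a negative parameter, B loses strength during Phase 2 while A continues to gain.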

Some failures of the RW Model
 a) spontaneous recovery from extinction and recovery from extinction caused by reminder treatments (reinstatement)
 * It is well established that a rest interval after the completion of extinction results in partial recovery from extinction, i.e. the previously extinguished response reoccurs, but usually at a lower level than before extinction training. Reinstatement refers to the phenomenon that exposure to the US alone after the completion of extinction results in partial recovery from extinction. The RW model cannot account for these phenomena.

 b) extinction of a previously conditioned inhibitor
 * The RW model predicts that repeated presentation of a conditioned inhibitor alone (a CS with negative associative strength) results in extinction of this stimulus (a decline of its negative associative value toward zero). This prediction is false. On the contrary, experiments show that repeated presentation of a conditioned inhibitor alone can even increase its inhibitory potential.

 c) facilitated reacquisition after extinction
 * One of the assumptions of the model is that the conditioning history of a CS does not have any influence on its present status; only its current associative value is important. Contrary to this assumption, many experiments show that stimuli that were first conditioned and then extinguished are more easily reconditioned (i.e. fewer trials are necessary for conditioning).

 d) the exclusiveness of excitation and inhibition
 * The RW model also assumes that excitation and inhibition are opponent features. A stimulus can have either excitatory potential (a positive associative strength) or inhibitory potential (a negative associative strength). By contrast, it is sometimes observed that stimuli can have both qualities. One example is backward excitatory conditioning, in which a CS is backwardly paired with the US (US-CS instead of CS-US). This usually makes the CS become a conditioned excitor. Interestingly, the stimulus also has inhibitory features, which can be demonstrated by the retardation of acquisition test. This test is used to assess the inhibitory potential of a stimulus, since it is observed that excitatory conditioning with a previously conditioned inhibitor is retarded. The backwardly conditioned stimulus passes this test and thus seems to have both excitatory and inhibitory features.

 e) pairing a novel stimulus with a conditioned inhibitor
 * A conditioned inhibitor is assumed to have a negative associative value. When an inhibitor is presented together with a novel stimulus (whose associative strength is zero) and no US, the model predicts that the novel cue should become a conditioned excitor. This prediction stems from the model's basic error term (lambda-V): since the summed associative strength of all stimuli present on the trial (V) is negative (zero plus the inhibitory potential) and lambda is zero (no US present), the resulting change in associative strength is positive, making the novel cue a conditioned excitor. This is not what is observed in experimental situations.

 f) CS-preexposure effect
 * The CS-preexposure effect is the well established observation that conditioning is retarded after prior exposure to the stimulus that is later used as the CS. The RW model does not predict any effect of presenting a novel stimulus without a US, because both the stimulus's associative strength and lambda are zero, so no change occurs.

 g) higher-order conditioning
 * In higher-order conditioning, a previously conditioned CS is paired with a novel cue (i.e. first CS1-US, then CS2-CS1). This usually causes the novel cue CS2 to elicit reactions similar to those elicited by CS1. The model cannot account for this phenomenon, since no US is present during CS2-CS1 trials. However, by allowing CS1 to act like a US, one can reconcile the model with this effect.

 h) sensory preconditioning
 * Sensory preconditioning refers to first pairing two novel cues (CS1-CS2) and then pairing one of them with a US (CS2-US). This turns both CS1 and CS2 into conditioned excitors. The RW model cannot explain this, since during the CS1-CS2 phase both stimuli have an associative value of zero and lambda is also zero (no US present), which results in no change in the associative strength of either stimulus.
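
Several of the model predictions listed above, namely items b), e), f) and h), can be reproduced with a short Python sketch of the basic update. The starting value of -0.5 for the inhibitor, the combined learning rate of 0.3, the number of trials, and all function names are assumptions chosen for illustration only. The sketch shows what the model predicts, not what is observed experimentally.

```python
def rw_trial(V, present, lam, alpha=0.3, beta=1.0):
    """Basic RW update: every present cue changes by alpha * beta * (lam - sum V)."""
    error = lam - sum(V[c] for c in present)
    for c in present:
        V[c] += alpha * beta * error


def train(V, present, lam, n_trials=50):
    """Repeat the same trial type n_trials times and return the updated strengths."""
    for _ in range(n_trials):
        rw_trial(V, present, lam)
    return V


def inhibitor_alone():
    # b) Inhibitor X presented without the US: the model drives X toward zero.
    return train({"X": -0.5}, ["X"], lam=0.0)


def inhibitor_with_novel_cue():
    # e) Inhibitor X plus novel cue N, no US: the error (0 - V) is positive,
    #    so N gains excitatory strength while X loses inhibitory strength.
    return train({"X": -0.5, "N": 0.0}, ["X", "N"], lam=0.0)


def cs_preexposure(n_conditioning=5):
    # f) CS-alone trials leave C at zero, so later conditioning is unaffected.
    preexposed = train({"C": 0.0}, ["C"], lam=0.0)
    train(preexposed, ["C"], lam=1.0, n_trials=n_conditioning)
    naive = train({"C": 0.0}, ["C"], lam=1.0, n_trials=n_conditioning)
    return preexposed["C"], naive["C"]


def sensory_preconditioning():
    # h) CS1-CS2 pairings without a US change nothing; only CS2 is then conditioned.
    V = train({"CS1": 0.0, "CS2": 0.0}, ["CS1", "CS2"], lam=0.0)
    return train(V, ["CS2"], lam=1.0)


if __name__ == "__main__":
    print("b)", inhibitor_alone())
    print("e)", inhibitor_with_novel_cue())
    print("f)", cs_preexposure())
    print("h)", sensory_preconditioning())
```

In each case the simulated outcome follows directly from the error term: where both the summed associative strength and lambda are zero, nothing is learned, and where the summed strength is negative with lambda at zero, every present cue moves in the excitatory direction.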