Reinforcement schedules

In operant conditioning, a schedule of reinforcement is any rule determining which responses should be followed by reinforcement under conditions where not every response is necessarily reinforced.

In his early experiments using the Skinner box, B. F. Skinner made a surprising discovery. If he did not deliver food reward after every leverpress that his rat subjects made, the rate of leverpressing did not decrease and become irregular, as Skinner had expected, but (after a few hours of experience of the situation) increased and became more regular.

Omitting reinforcement after some responses is called intermittent reinforcement or partial reinforcement. In classical conditioning (as studied by Pavlov), partial reinforcement (omitting the unconditioned stimulus on some trials) is well known to weaken the conditioned response, which develops more slowly and less intensely than it does when the conditioned stimulus is followed by the unconditioned stimulus on every trial (the usual arrangement, referred to as continuous reinforcement). The effect Skinner had found was therefore paradoxical. It turned out, however, to be both robust and important.

Given that not every response is reinforced, the question arises of which responses should be reinforced. Any rule for deciding this is called a schedule of reinforcement. There are many possible schedules. Many of the simpler possibilities, and some of the more complex ones, were investigated at great length by Ferster and Skinner (1957), using pigeons, but new schedules have continued to be defined and investigated.

Theoretical importance of schedules of reinforcement
Schedules were important theoretically because it turned out that different kinds of schedule induced different patterns of behaviour, and these patterns were highly orderly; furthermore, they were found regardless of the species being investigated. Even humans sometimes show the same patterns as other species, though not under all conditions. Not only are the patterns of behaviour under different schedules consistently different; the quantitative properties of behaviour under a given schedule also depend in an orderly way on the parameters of the schedule, and sometimes on other, non-schedule factors. The orderliness and predictability of behaviour under schedules of reinforcement were powerful evidence for Skinner's claim that using operant conditioning he could obtain "control over behaviour", in a way that rendered the theoretical disputes of contemporary comparative psychology obsolete. It was the success of schedule control that led to the development of the idea that a radical behaviourist experimental analysis of behavior could be the foundation for a science of behaviour that need make no mention of any mental or cognitive processes, and also to the proposal that an effective behavioural technology could be developed using applied behavior analysis.

Simple schedules
Simple schedules are those involving a single rule to determine the delivery of a single type of reinforcement for making a single type of response.

The simplest schedules of all barely deserve the name: they are continuous reinforcement (the reinforcement of every response) and extinction (the cessation of all reinforcement). Apart from these, the four schedules that are best known to those outside the operant conditioning community are:
 * Fixed ratio (FR) - reinforcement after a fixed number of responses have been made (maintaining a fixed ratio between response rate and reinforcement rate), e.g. FR 10
 * Variable ratio (VR) - reinforcement after a variable number of responses have been made; the schedule is characterised by the mean number of responses occurring between reinforcements, e.g. VR 30.
 * Fixed interval (FI) - reinforcement of the first response after a fixed time interval has passed since the previous reinforcement (maintaining a fixed interval between reinforcers), e.g. FI 60 seconds
 * Variable interval (VI) - reinforcement of the first response after a variable time interval has passed since the previous reinforcement; the schedule is characterised by the mean interval occurring between reinforcements, e.g. VI 30 seconds.
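The contingencies these four schedules define can be made concrete in a short simulation. The sketch below is illustrative rather than drawn from Ferster and Skinner: the class names, the choice of a uniform distribution for VR and an exponential one for VI, and the convention that the first interval is timed from the session start are all assumptions.

```python
import random

class FixedRatio:
    """FR n: reinforce every n-th response."""
    def __init__(self, n):
        self.n = n
        self.count = 0

    def response(self, t=None):
        """Record one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """VR n: reinforce after a variable number of responses with mean n
    (here drawn uniformly from 1..2n-1, one of several possible choices)."""
    def __init__(self, n, rng=random):
        self.n = n
        self.rng = rng
        self.remaining = self.rng.randint(1, 2 * n - 1)

    def response(self, t=None):
        self.remaining -= 1
        if self.remaining <= 0:
            self.remaining = self.rng.randint(1, 2 * self.n - 1)
            return True
        return False

class FixedInterval:
    """FI T: reinforce the first response at least T after the previous
    reinforcement (the first interval is timed from the session start)."""
    def __init__(self, T):
        self.T = T
        self.last_reinforcement = 0.0

    def response(self, t):
        if t - self.last_reinforcement >= self.T:
            self.last_reinforcement = t
            return True
        return False

class VariableInterval:
    """VI T: like FI, but each interval is drawn afresh with mean T
    (exponential draws give a roughly constant momentary probability)."""
    def __init__(self, T, rng=random):
        self.mean = T
        self.rng = rng
        self.last_reinforcement = 0.0
        self.current = self.rng.expovariate(1.0 / T)

    def response(self, t):
        if t - self.last_reinforcement >= self.current:
            self.last_reinforcement = t
            self.current = self.rng.expovariate(1.0 / self.mean)
            return True
        return False
```

One property the sketch makes visible: under a ratio schedule the rate of reinforcement tracks the rate of responding, whereas under an interval schedule reinforcement availability depends mainly on elapsed time, so faster responding gains little. This difference is part of why ratio and interval schedules generate such different response patterns.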

However there are numerous other simple schedules. Prominent examples include:
 * Differential reinforcement of low rate (DRL) - reinforcement for the first response after a minimum interval since the last response
 * Differential reinforcement of high rate (DRH) - reinforcement for the first response within a maximum interval since the last response
 * Differential reinforcement of other behaviour (DRO) - reinforcement after a minimum interval with no responding
 * Fixed time (FT) - reinforcement at a fixed time since the last reinforcement, regardless of whether the subject has responded or not
 * Variable time (VT) - reinforcement at a variable time since the last reinforcement, regardless of whether the subject has responded or not; the schedule is characterised by the mean interval occurring between reinforcements, e.g. VT 30 seconds.
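The differential-reinforcement and response-independent schedules turn on timing relations rather than response counts. A minimal sketch of two of them, DRL and FT (the class names and the treatment of the first response are assumptions, not standard definitions):

```python
class DRL:
    """DRL T: reinforce a response only if at least T has elapsed since the
    previous response (the session start serves as the reference point for
    the first response, a simplifying assumption)."""
    def __init__(self, T):
        self.T = T
        self.last_response = 0.0

    def response(self, t):
        reinforced = (t - self.last_response) >= self.T
        self.last_response = t  # every response resets the clock
        return reinforced

class FixedTime:
    """FT T: deliver reinforcement every T units of time, regardless of
    whether the subject responds."""
    def __init__(self, T):
        self.T = T
        self.next_delivery = T

    def tick(self, t):
        """Return True when a response-independent reinforcer is due at time t."""
        if t >= self.next_delivery:
            self.next_delivery += self.T
            return True
        return False
```

Note that in the DRL sketch every response, reinforced or not, resets the clock: responding too quickly therefore postpones reinforcement, which is the defining property of the schedule.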

Compound schedules
Compound schedules combine two or more different simple schedules in some way. There are many possibilities; among those most often used are:
 * Alternating schedules (alt) - successive reinforcements are delivered according to the requirements of two different schedules, in alternation.
 * Mixed schedules (mix) - either of two schedules may occur, perhaps in blocks of time, with no stimulus indicating which is in force.
 * Multiple schedules (mult) - either of two schedules may occur, perhaps in blocks of time, with a stimulus indicating which is in force.
 * Concurrent schedules (conc) - two schedules are simultaneously in force, typically though not necessarily on two different response devices
 * Tandem schedules (tand) - reinforcement is given when two successive schedule requirements have been completed, with no stimulus indicating when one has been completed and the next has started. E.g. in a tand FR 10 FI 60 secs schedule, the subject would have to make ten responses; a 60-sec interval would then start, and after its completion the next response would be reinforced.
 * Chained schedules (chain) - reinforcement is given when two successive schedule requirements have been completed, with a stimulus indicating when one has been completed and the next has started. E.g. in a chain FR 10 FI 60 secs schedule, the subject would have to make ten responses, whereupon a stimulus change would occur (e.g. for a pigeon, the colour of the pecking key might change); a 60-sec interval would then start, and after its completion the next response would be reinforced.
 * Higher-order schedules - completion of one schedule is reinforced according to a second schedule; e.g. in FR 2 (FI 10 secs), two successive fixed interval schedules would have to be completed before a response is reinforced.
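The sequencing logic of tandem schedules can also be sketched directly. The component classes and their start/complete interface below are hypothetical; a chained schedule would differ only in presenting a stimulus change at each transition between components.

```python
class FRComponent:
    """Fixed-ratio component: completes on the n-th response since it started."""
    def __init__(self, n):
        self.n = n
        self.count = 0

    def start(self, t):
        self.count = 0

    def complete(self, t):
        self.count += 1
        return self.count >= self.n

class FIComponent:
    """Fixed-interval component: completes on the first response at least
    T after the component started."""
    def __init__(self, T):
        self.T = T
        self.origin = 0.0

    def start(self, t):
        self.origin = t

    def complete(self, t):
        return t - self.origin >= self.T

class Tandem:
    """tand A B ...: the components must be completed in sequence; only
    completion of the last one is reinforced, and nothing signals the
    transitions (a chained schedule would change the stimulus here)."""
    def __init__(self, *components):
        self.components = components
        self.index = 0

    def response(self, t):
        if not self.components[self.index].complete(t):
            return False
        self.index += 1
        if self.index < len(self.components):
            self.components[self.index].start(t)  # silent transition
            return False
        self.index = 0
        self.components[0].start(t)  # reinforcement delivered; cycle restarts
        return True
```

On this sketch, a tand FR 10 FI 60 secs schedule would be Tandem(FRComponent(10), FIComponent(60)): the first response made at least 60 seconds after the tenth response is reinforced.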

In compound schedules, the different simple schedules of which the compound schedule is made up are referred to as components, so one might write, "In the mult VR 30 VI 45 secs schedule, responding under the VR 30 component was generally higher than under the VI 45 secs component". The terms listed above are often combined; for example, an alternating schedule can be referred to as either alt mix or alt mult, depending on whether there is a stimulus that signals which component is in effect, and an FR 2 (FI 10 secs) schedule could be either tandem or chained, depending on whether there is a stimulus change at the completion of each FI 10 secs component.