Reinforcement

In operant conditioning, reinforcement is any change in an organism's surroundings that:


 * occurs regularly when the organism behaves in a given way (that is, is contingent on a specific response), and
 * is associated with an increase in the probability that the response will be made or in another measure of its strength.

For example: you give your dog food every time it sits when you tell it to. If the dog becomes more likely to sit when told to, sitting is considered to have been reinforced by the administration of food contingent on it.

Note that it is the behavior that is reinforced, not the dog. The food serves as a reinforcer, reinforcing or strengthening that behavior, only to the extent that sitting subsequently occurs more often or more quickly because of it.

The study of reinforcement has produced an enormous body of reproducible experimental results. Reinforcement is the central concept and procedure in the experimental analysis of behavior.

Schedules of reinforcement
When enough of the variations in an animal's surroundings are reduced or "controlled," its behavior patterns after reinforcement are remarkably predictable. When rates of reinforcement are adjusted in particular ways, even very complex behavior patterns can be predicted. A schedule of reinforcement is the protocol for determining which responses (i.e., which individual occurrences of a given behavior) will be reinforced. The two extremes are continuous reinforcement, in which every response results in reinforcement, and extinction, in which no response is reinforced.

Other schedules include:
 * Fixed ratio (FR), in which every nth response is reinforced.
 * Fixed interval (FI), in which the first response made after a specified length of time has passed (measured from the beginning of training or from the last reinforcement) is reinforced.
 * Variable ratio (VR), in which the number of responses required between reinforcements varies, but on average equals a predetermined number.
 * Variable interval (VI), in which the first response made after a length of time that varies around an average has passed is reinforced.
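The four schedules listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not a standard implementation: the function names are hypothetical, each schedule is modeled as a function that is called once per response and returns whether that response is reinforced, and the uniform distributions used for the variable schedules are an assumption (the text only requires that the requirement vary around an average).

```python
import random

def fixed_ratio(n):
    """FR-n: every nth response is reinforced."""
    count = 0
    def respond(t=None):  # t is unused; kept so all schedules share one interface
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return respond

def variable_ratio(mean_n, rng):
    """VR: the required number of responses varies but averages mean_n.
    Assumption: the requirement is drawn uniformly from 1..(2*mean_n - 1)."""
    count = 0
    required = rng.randint(1, 2 * mean_n - 1)
    def respond(t=None):
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond

def fixed_interval(seconds):
    """FI: the first response made after `seconds` have elapsed since the
    start of training or the last reinforcement is reinforced."""
    last = 0.0
    def respond(t):
        nonlocal last
        if t - last >= seconds:
            last = t
            return True
        return False
    return respond

def variable_interval(mean_seconds, rng):
    """VI: like FI, but the waiting time varies around mean_seconds.
    Assumption: the wait is drawn uniformly from 0..(2*mean_seconds)."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_seconds)
    def respond(t):
        nonlocal last, wait
        if t - last >= wait:
            last = t
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False
    return respond
```

For instance, with `fr5 = fixed_ratio(5)`, calling `fr5()` ten times reinforces only the 5th and 10th responses, and with `fi = fixed_interval(10)`, responses at times 3, 10, 12 and 25 are reinforced only at 10 and 25. Under this model, continuous reinforcement is simply `fixed_ratio(1)`.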

Ratio schedules produce higher rates of responding than interval schedules, and variable schedules produce higher rates than fixed schedules. The variable ratio schedule produces both the highest rate of responding and the greatest resistance to extinction (that is, resistance to "petering out"); gambling is a notable example of behavior maintained on such a schedule.

On a fixed ratio schedule, responding pauses after each reinforcer is delivered. This is called a post-reinforcement pause. Fixed interval schedules also produce post-reinforcement pauses, but responding then resumes gradually and accelerates toward the end of the interval, giving the record of responding a scalloped shape. Because responses made before the interval has elapsed are not reinforced, the subject learns to respond at a gradually increasing rate. If an organism is on a fixed ratio schedule and the number of responses required to obtain a reinforcer suddenly increases (say, from FR50 to FR250), the organism is observed to pause periodically before the delivery of the reinforcer. This phenomenon is called ratio strain, and it contrasts with the usual FR pattern of post-reinforcement pause, ratio run, and reinforcement.

Concerning extinction, partial reinforcement schedules are more resistant than continuous reinforcement schedules. This phenomenon is called the partial reinforcement extinction effect (PREE). Ratio schedules tend to be more resistant than interval schedules, and variable schedules more resistant than fixed ones.

Positive vs. negative
Positive reinforcement changes the animal's surroundings by adding a stimulus: a physical object (like a food pellet or paycheck) or energy (like light from a lamp).

Negative reinforcement changes the surroundings by removing an aversive stimulus, for example by turning off a painful electric current or taking away a hated ex-spouse's picture. Speaking colloquially, an aversive stimulus is something the animal finds "bad"; its removal is thus a "good" thing from the animal's point of view.

Distinguishing "positive" from "negative" in these cases is largely a matter of emphasis. For example, in a very warm room, a current of external air serving as reinforcement may be positive because it is relatively cool but negative because it removes the uncomfortably hot air. Furthermore, the distinction seems to have no real use in research or applied psychology, although one may some day be found. Until then, many behavioral psychologists simply refer to reinforcement or punishment—without polarity—to cover all consequent environmental changes.

Punishment
Punishment is any change in an animal's surroundings that occurs after a given behavior and seems to reduce the frequency of that behavior. As with reinforcement, it is the behavior, not the animal, that is punished. Whether a change is or is not punishing is only known by its effect on the rate of the behavior, not by any "hostile" features of the change. In positive punishment or type I punishment, an experimenter punishes a response by adding an aversive stimulus into the animal's surroundings (a brief electric shock, for example). In negative punishment or type II punishment, a positive reinforcer is removed (as in the removal of a feeding dish). As with reinforcement, it is not usually necessary to speak of positive and negative in regard to punishment.
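The four cases defined so far (positive and negative reinforcement, positive and negative punishment) form a two-by-two classification: whether a stimulus is added or removed, and whether the behavior it follows becomes more or less frequent. A minimal sketch, with a hypothetical function name and string labels chosen only for illustration:

```python
def classify(stimulus_change, behavior_change):
    """Name the procedure from its two defining features.

    stimulus_change: "added" or "removed" (what happens to the stimulus)
    behavior_change: "increases" or "decreases" (observed effect on the behavior)
    """
    table = {
        ("added", "increases"): "positive reinforcement",
        ("removed", "increases"): "negative reinforcement",
        ("added", "decreases"): "positive punishment (type I)",
        ("removed", "decreases"): "negative punishment (type II)",
    }
    return table[(stimulus_change, behavior_change)]

# Food delivered after sitting, and sitting becomes more frequent:
print(classify("added", "increases"))    # positive reinforcement
# A feeding dish is taken away after a response, and the response declines:
print(classify("removed", "decreases"))  # negative punishment (type II)
```

Note that the second argument is the observed effect on behavior, not a property of the stimulus itself, which matches the point that a change counts as reinforcing or punishing only by its effect.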

Punishment is not a mirror effect of reinforcement. In experiments with laboratory animals and studies with children, punishment decreases the frequency of a previously reinforced response only temporarily, and it can produce other "emotional" behavior (wing-flapping in pigeons, for example) and physiological changes (increased heart rate, for example) that have no clear equivalents in reinforcement.

Punishment is considered by some behavioral psychologists to be a "primary process" – a completely independent phenomenon of learning, distinct from reinforcement. Others see it as a category of negative reinforcement, creating a situation in which any punishment-avoiding behavior (even standing still) is reinforced.

Aversive stimulus, punisher, and punishing stimulus are synonyms. Punishment may be used for (a) an aversive stimulus or (b) the occurrence of any punishing change or (c) the part of an experiment in which a particular response is punished.

Other reinforcement terms

 * An unconditioned reinforcer, sometimes called a primary reinforcer, is a stimulus or situation considered to be inherently reinforcing (such as affection, food, or opportunity for sleep).
 * A conditioned reinforcer, sometimes called a secondary reinforcer, is a stimulus or situation that has acquired reinforcing power after being paired in the animal's environment with an unconditioned reinforcer or an earlier conditioned reinforcer (such as praise).
 * A generalized reinforcer is a conditioned reinforcer that has been paired with many other reinforcers (such as money).
 * Differential reinforcement of incompatible behavior (DRI) reduces an already frequent behavior without punishing it. Instead, a specific incompatible response is reinforced (like leaving a room, so that fighting with someone in it is not possible).
 * In differential reinforcement of other behavior (DRO), any behavior other than some undesired behavior is reinforced.
 * In differential reinforcement of low response rate (DRL), a behavior is reinforced only if it occurs infrequently. "If you ask me for a potato chip no more than once every 10 minutes, I will give it to you. If you ask more often, I will give you none."
 * In differential reinforcement of alternative behavior (DRA), the reinforcers for the undesirable behavior are used instead for a more desirable behavior. For example, a teacher will pay attention to students who sit rather than to those who walk or talk in class.
 * In reinforcer sampling, a potentially reinforcing but unfamiliar stimulus is presented to an animal without regard to any prior behavior. The stimulus may then later be used more effectively in reinforcement.
 * Social reinforcement involves various sorts of access to and interaction with others.
 * Satiation occurs when a stimulus that had reinforced some behavior no longer seems to do so.
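The potato chip rule given for DRL in the list above can be written as a short sketch. The function names are hypothetical, and two details are assumptions not stated in the text: the very first request is reinforced, and every request, reinforced or not, restarts the 10-minute clock.

```python
def make_drl(min_gap_minutes):
    """Return a DRL rule: a request is reinforced only if at least
    min_gap_minutes have passed since the previous request."""
    last_request = None
    def request(t):
        nonlocal last_request
        # Assumption: the first request is always reinforced.
        reinforced = last_request is None or (t - last_request) >= min_gap_minutes
        # Assumption: every request restarts the clock, reinforced or not.
        last_request = t
        return reinforced
    return request

drl = make_drl(10)
# Requests at minutes 0, 4, 15, 25 and 30:
print([drl(t) for t in (0, 4, 15, 25, 30)])
# [True, False, True, True, False]
```

The requests at minutes 4 and 30 go unreinforced because each comes less than 10 minutes after the previous request.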

Shaping & chaining
Shaping involves reinforcing successive, increasingly accurate approximations of a response desired by a trainer. In training a rat to press a lever, for example, simply turning toward the lever will be reinforced at first. Then, only turning and stepping toward it will be reinforced. As training progresses, the response reinforced becomes progressively more like the desired behavior. Chaining is similar but involves reinforcing various simple behaviors separately and then linking them together in a more complex series.
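Shaping can be sketched as a reinforcement criterion that tightens as training progresses. The sketch below is only an illustration under stated assumptions: each response is reduced to a single number (its distance from the desired behavior, such as the rat's distance from the lever), and the tolerance is halved after every reinforced response. The function name, the tolerance values and the halving rule are all hypothetical.

```python
def shape(distances, start_tolerance, shrink=0.5):
    """Reinforce responses that fall within the current tolerance of the
    desired behavior, tightening the tolerance after each reinforcement.

    distances: how far each successive response is from the desired behavior
    Returns the list of reinforcement decisions and the final tolerance.
    """
    tolerance = start_tolerance
    reinforced_log = []
    for d in distances:
        reinforced = d <= tolerance
        reinforced_log.append(reinforced)
        if reinforced:
            # Assumption: the criterion tightens by a fixed factor each time.
            tolerance *= shrink
    return reinforced_log, tolerance

log, final_tolerance = shape([8, 5, 6, 2, 1.5, 0.5], start_tolerance=8)
print(log)              # [True, False, False, True, True, True]
print(final_tolerance)  # 0.5
```

Early on, a rough approximation (distance 8) is reinforced; later, the same response would no longer meet the criterion, so only closer approximations are reinforced.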

Controversies
The standard idea of behavioral reinforcement has been criticized as circular, since it appears to argue that response strength is increased by reinforcement while defining reinforcement as something which increases response strength. Other definitions have been proposed, such as F. D. Sheffield's "consummatory behavior contingent on a response," but these are not broadly used in psychology.

History of the terms
In the 1920s, Russian physiologist Ivan Pavlov may have been the first to use the word reinforcement with respect to behavior, but (according to Dinsmoor) he used its approximate Russian cognate sparingly, and even then it referred to strengthening an already-learned but weakening response. He did not use it, as it is used today, for selecting and strengthening new behavior. Pavlov's introduction of the word extinction (in Russian) approximates today's psychological use.

In popular use, positive reinforcement is often used as a synonym for reward, with people (not behavior) thus being "reinforced," but this is contrary to the term's consistent technical usage. Negative reinforcement is often used by laypeople and even social scientists outside psychology as a synonym for punishment. This is contrary to modern technical use, but it was B. F. Skinner who first used it this way in his 1938 book. By 1953, however, he followed others in using the word punishment for that sense, and he redefined negative reinforcement as the removal of aversive stimuli.