Introduction to operant conditioning

Operant conditioning, so named by psychologist B. F. Skinner, is the modification of behavior over time by the consequences of that behavior. Operant conditioning differs from Pavlovian (classical) conditioning: operant conditioning deals with voluntary behavior explained by its consequences, whereas Pavlovian conditioning deals with involuntary behavior triggered by its antecedents.

Operant conditioning, sometimes called instrumental conditioning or instrumental learning, was first extensively studied by Edward L. Thorndike (1874-1949), who observed the behavior of cats trying to escape from homemade puzzle boxes. When first confined in the boxes, the cats took a long time to escape. With experience, ineffective responses occurred less frequently and successful responses occurred more frequently, enabling the cats to escape in less time over successive trials. In his Law of Effect, Thorndike theorized that successful responses, those producing satisfying consequences, were "stamped in" by the experience and thus occurred more frequently. Unsuccessful responses, those producing annoying consequences, were "stamped out" and subsequently occurred less frequently. In short, some consequences strengthened behavior and some consequences weakened behavior. B. F. Skinner (1904-1990) built upon Thorndike's ideas to construct a more detailed theory of operant conditioning based on reinforcement and punishment.
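Thorndike's Law of Effect can be illustrated with a toy simulation. The Python sketch below is a hypothetical model, not Thorndike's own formulation: it assumes each possible response in the puzzle box has a numeric strength, that the cat emits responses with probability proportional to strength, and that exactly one response (index `correct`) opens the box. The function name `puzzle_box`, the `step` size, and the strength `floor` are illustrative choices.

```python
import random


def puzzle_box(n_trials=30, n_responses=5, correct=0, step=0.5, floor=0.05, seed=1):
    """Toy Law-of-Effect model (hypothetical, for illustration only).

    Returns the number of responses made on each trial before escaping,
    plus the final strength of every response.
    """
    rng = random.Random(seed)
    strength = [1.0] * n_responses          # all responses start equally likely
    attempts_per_trial = []
    for _ in range(n_trials):
        attempts = 0
        while True:
            attempts += 1
            # Emit a response with probability proportional to its strength.
            r = rng.choices(range(n_responses), weights=strength)[0]
            if r == correct:
                strength[r] += step         # satisfying consequence: "stamped in"
                break                       # the box opens and the trial ends
            # Annoying consequence (still trapped): "stamped out", but never to zero.
            strength[r] = max(floor, strength[r] - step / 5)
        attempts_per_trial.append(attempts)
    return attempts_per_trial, strength


attempts, strength = puzzle_box()
p_correct = strength[0] / sum(strength)
print(attempts)
print(f"final probability of the successful response: {p_correct:.2f}")
```

Because the successful response is strengthened on every trial while the others only weaken, it comes to dominate, so later trials tend to need fewer attempts, in the spirit of the shorter escape times Thorndike observed.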

Reinforcement and punishment
Reinforcement and punishment, the core ideas of operant conditioning, can each be either positive (adding a stimulus to an organism's environment) or negative (removing a stimulus from an organism's environment). This yields four basic consequences, plus the possibility of no consequence at all (i.e., nothing happens). Note that organisms are not reinforced or punished; behavior is reinforced or punished.


 * Reinforcement is a consequence that causes a behavior to occur with greater frequency.
 * Punishment is a consequence that causes a behavior to occur with less frequency. According to Skinner's theory of operant conditioning, there are two ways to decrease a behavior or response: punishment and extinction.

Four contexts of operant conditioning: here the terms "positive" and "negative" are not used in their popular sense; rather, "positive" refers to addition and "negative" refers to subtraction. What is added or subtracted is a stimulus, and whether the procedure counts as reinforcement or punishment depends on its effect on behavior. Hence the term positive punishment can be confusing, as it denotes the addition of an aversive stimulus (such as spanking or an electric shock), a situation that seems very negative in the lay sense. The four situations are:

 * 1) Positive reinforcement occurs when a behavior (response) is followed by a pleasant stimulus that rewards it. In the Skinner box experiment, positive reinforcement is provided by rewarding the rat with food or sugar solution for pressing the lever.
 * 2) Negative reinforcement occurs when a behavior (response) is followed by the removal of an unpleasant stimulus. In the Skinner box experiment, a loud noise sounds continuously inside the rat's cage until the rat presses the lever, at which point the noise stops.
 * 3) Positive punishment occurs when a behavior (response) is followed by an aversive stimulus, such as a shock or a loud noise, which makes the behavior less likely to recur.
 * 4) Negative punishment occurs when a behavior (response) is followed by the removal of a pleasant stimulus, such as taking away a child's toy. It should not be confused with extinction, which occurs when a behavior (response) that had previously been followed by a pleasant stimulus is followed by no stimulus at all. In the Skinner box experiment, extinction is the rat pressing the lever and being rewarded with a food pellet several times, then pressing the lever again and never receiving a food pellet. Eventually the rat learns that no food will come and stops pressing the lever.
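Because the four contexts are defined by two independent questions (was a stimulus added or removed, and did the behavior become more or less frequent?), they can be expressed as a small lookup. The following Python sketch is illustrative only; the function name `classify_procedure` and its string arguments are hypothetical.

```python
def classify_procedure(stimulus: str, behavior: str) -> str:
    """Name the operant context (hypothetical helper for illustration).

    stimulus: "added" (positive) or "removed" (negative)
    behavior: "increases" (reinforcement) or "decreases" (punishment)
    """
    sign = {"added": "positive", "removed": "negative"}[stimulus]
    effect = {"increases": "reinforcement", "decreases": "punishment"}[behavior]
    return f"{sign} {effect}"


# Examples in the spirit of the list above.
examples = {
    "food pellet follows lever press; pressing increases": ("added", "increases"),
    "noise stops after lever press; pressing increases": ("removed", "increases"),
    "shock follows lever press; pressing decreases": ("added", "decreases"),
    "toy taken away after misbehavior; misbehavior decreases": ("removed", "decreases"),
}
for description, (stimulus, behavior) in examples.items():
    print(f"{description} -> {classify_procedure(stimulus, behavior)}")
```

The classification depends on the observed change in behavior, not on whether the stimulus seems pleasant, which matches the definitions of reinforcement and punishment in terms of frequency.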

Also:
 * A type of learning in which an organism behaves so as to prevent an aversive stimulus, for example by refraining from a certain (usually undesirable) behavior in order not to be punished, is termed avoidance learning.

Avoidance learning
Avoidance training is a form of negative reinforcement: performing the instrumental response terminates or prevents an aversive stimulus. Two experimental procedures are commonly used: discriminated and free-operant avoidance learning.

 Discriminated avoidance learning 
 * In discriminated avoidance learning, a novel stimulus such as a light or a tone is followed by an aversive stimulus such as a shock (a CS-US pairing, similar to classical conditioning). Whenever the animal performs the instrumental response, the response terminates the CS (and the US, if it has already begun). During the first trials (called escape trials) the animal usually experiences both the CS and the US, performing the instrumental response to terminate the aversive US. Over time, the animal learns to perform the response during the presentation of the CS, thereby preventing the aversive US from occurring. Such trials are called avoidance trials.

 Free-operant avoidance learning 
 * In this procedure, no discrete warning stimulus is used: the aversive stimuli (usually shocks) are presented without any signal that they are about to occur.
 * Two crucial time intervals determine the rate of avoidance learning. The first is the S-S interval (shock-shock interval): the time between successive shocks when the instrumental response is not performed. The second is the R-S interval (response-shock interval): the period following an instrumental response during which no shocks are delivered. Note that each time the organism performs the instrumental response, the R-S interval restarts.
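The interaction of the two intervals can be made concrete with a short simulation. This Python sketch computes when shocks would be delivered, given a list of response times. It assumes, as one possible convention, that the S-S clock starts at the beginning of the session; the function name `shock_times` and all time values are hypothetical.

```python
def shock_times(response_times, ss_interval, rs_interval, session_length):
    """Return shock delivery times on a free-operant avoidance schedule."""
    responses = sorted(response_times)
    shocks = []
    next_shock = ss_interval  # assumption: the S-S clock runs from session start
    i = 0
    while True:
        if i < len(responses) and responses[i] < next_shock:
            # A response before the scheduled shock restarts the R-S interval.
            next_shock = responses[i] + rs_interval
            i += 1
        elif next_shock <= session_length:
            # No response in time: deliver the shock; the S-S interval then applies.
            shocks.append(next_shock)
            next_shock += ss_interval
        else:
            break
    return shocks


# No responding: a shock every S-S interval.
print(shock_times([], ss_interval=5, rs_interval=20, session_length=30))
# [5, 10, 15, 20, 25, 30]

# Responding more often than every R-S interval postpones every shock.
print(shock_times([3, 18, 33], ss_interval=5, rs_interval=20, session_length=40))
# []
```

A single early response (at time 3) delays the first shock to time 23, after which shocks resume every 5 time units; this is why, on such a schedule, responding at least once per R-S interval avoids all shocks.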

Two-Process Theory of Avoidance
This theory was originally proposed to explain discriminated avoidance learning. It assumes that two processes take place.

a) Classical conditioning of fear
During the first trials of training, the organism experiences both the CS and the aversive US (escape trials). The theory assumes that classical conditioning takes place during these trials as the CS is paired with the US. Because the US is aversive, the CS comes to elicit a conditioned emotional reaction (CER): fear. In classical conditioning, presenting a CS that has been paired with an aversive US disrupts the organism's ongoing behavior.

b) Reinforcement of the instrumental response by fear reduction
Because the CS signalling the aversive US has itself become aversive during the first process, by eliciting fear in the organism, reducing this unpleasant emotional reaction reinforces the instrumental response. The organism learns to make the response during the CS, thereby terminating the aversive internal reaction elicited by the CS. An important aspect of this theory is that the term "avoidance" does not really describe what the organism is doing. It does not "avoid" the aversive US in the sense of anticipating it; rather, it escapes an aversive internal state caused by the CS.
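The two processes can be sketched as a toy trial-by-trial model. This is a hypothetical Python illustration, not a formal statement of two-process theory: the fear update uses a simple error-correcting rule in the style of the Rescorla-Wagner model (an assumption, since the theory as described here does not specify one), and the mapping from response strength to response latency, along with every parameter name and value, is invented for the example.

```python
def two_process(n_trials=20, alpha_fear=0.3, alpha_resp=0.4,
                cs_us_interval=5.0, max_latency=10.0):
    """Toy two-process model of discriminated avoidance (hypothetical).

    fear:  strength of the CS -> fear association (process a)
    habit: strength of the instrumental response (process b)
    Returns one (trial type, fear, habit) tuple per trial.
    """
    fear, habit = 0.0, 0.0
    log = []
    for _ in range(n_trials):
        # Assumed mapping: a stronger habit means a faster response after CS onset.
        latency = max_latency * (1.0 - habit)
        avoided = latency < cs_us_interval   # responded before the US was due
        elicited = fear                      # fear the CS evokes at onset
        # (a) Classical conditioning of fear: the CS is paired with the US
        #     only on escape trials; on avoidance trials the US never arrives.
        us = 0.0 if avoided else 1.0
        fear += alpha_fear * (us - fear)
        # (b) The response ends the CS, removing the fear it elicited;
        #     that fear reduction reinforces the response.
        habit += alpha_resp * elicited * (1.0 - habit)
        log.append(("avoidance" if avoided else "escape",
                    round(fear, 3), round(habit, 3)))
    return log


log = two_process()
for trial, (kind, fear, habit) in enumerate(log, start=1):
    print(trial, kind, fear, habit)
```

With these values the first trials are escape trials; as fear of the CS builds, fear reduction strengthens the response until it occurs before the US is due, and trials become avoidance trials. Because the CS is no longer followed by the US on avoidance trials, fear declines in this toy model, while the response strength already acquired is kept.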


 * One practical application of operant conditioning in animal training is shaping (reinforcing successive approximations of the target behavior while no longer reinforcing earlier approximations), as well as chaining.
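Shaping can be illustrated with a toy simulation in Python. The model is hypothetical: it assumes responses vary randomly (normally distributed) around a "typical" value, that a reinforced response pulls the typical value halfway toward itself, and that each reinforcement raises the criterion by a fixed step, so approximations that were reinforced earlier no longer qualify. The function name, parameters, and numbers are invented for the example, and the sketch covers shaping only, not chaining.

```python
import random


def shape(target=10.0, spread=1.0, step=0.5, n_trials=1000, seed=0):
    """Toy shaping by successive approximations (hypothetical model)."""
    rng = random.Random(seed)
    typical = 0.0      # where the animal's responses currently tend to fall
    criterion = 0.0    # minimum response that currently earns reinforcement
    reinforced = 0
    for _ in range(n_trials):
        response = rng.gauss(typical, spread)   # natural variability in behavior
        if response >= criterion:
            reinforced += 1
            # Reinforcement makes responses like this one more typical.
            typical += 0.5 * (response - typical)
            # Raise the bar: the previous approximation is no longer reinforced.
            criterion = min(target, criterion + step)
    return typical, criterion, reinforced


typical, criterion, reinforced = shape()
print(f"typical response {typical:.2f}, criterion {criterion}, reinforcements {reinforced}")
```

Raising the criterion in small steps keeps each new requirement within the range of the animal's current variability, so reinforcement stays available; with these values, requiring the target of 10 from the start would essentially never be met by responses centered at 0.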