This page was on the web but the url is no longer active. This is a scanned version of previously available material.

 

 

TRAINING THEORY

Copyright 1998 Ron Lawrence. See notice below.

 

 

INTRODUCTION. It is necessary to understand some of the theory behind training before we can discuss and understand the different training methods used to train our dogs for obedience trials. Many of the terms used in training and behaviour theory have become buzz words which are often confused and abused. Abusing these terms is one thing; misusing the training methods can result in behaviour other than that which was intended by the trainer.

 

 

MOTIVATION. Sane animals do not do anything unless there is a motive for it (the prospect of a reward). The animal does something because it satisfies a need. Motivation is the desire (sometimes compelling, irresistible and obsessive) to satisfy a need. Need motivates action. The need may be as basic as satisfying the physiological needs (hunger, thirst, sleep, sex) or as obscure as satisfying self-actualisation needs (attaining one’s full potential). Generally speaking, satisfying the basic survival needs (Physiological and Security needs) will always take preference over satisfying the higher needs (social esteem, self-actualisation). The more ‘comfortable’ the animal is (Physiological and Security needs are effortlessly met), the more sophisticated will be its needs.

 

 

DRIVES. One of the best and most enduring articles ever written about drives and their relationship to training was written by Wendy Volhard. This is an important work because not all dogs are driven by the same instincts to the same degree. Read Wendy’s article, ‘Drives: A New Look at an Old Concept’; if you do so you will understand your dog a whole lot better.

 

 

POSITIVE. In the context of modifying behaviour ‘positive’ means ‘giving’ something, e.g., giving praise, giving a treat, giving a spanking or giving a reprimand.

 

 

NEGATIVE. In the context of modifying behaviour ‘negative’ means ‘taking away or removing’ something, e.g., ceasing praise, taking away pain, taking away privileges.

 

 

STRENGTHENING. Strengthening a response simply means making it more likely to re-occur. It is reinforcing (positively or negatively) that behaviour.

 

 

WEAKENING. Weakening a response simply means: making it less likely to re-occur. It involves punishing (positively or negatively) the behaviour.

 

 

POSITIVE REINFORCEMENT. The giving of a pleasant event contingent on a behaviour with the goal of increasing the likelihood of the behaviour in the future (1). Example: You ask your dog to sit, the dog sits, you give the dog praise and immediately follow it up with a treat for doing what you asked.






 

 

 

 

Important Note:

 

 

Praise. The use of the term ‘praise’ in this homepage means giving the dog a vocal word or phrase that the dog has been conditioned to associate with approval. Praise is given with the voice so the tone of voice can indicate just how pleased the handler is with the dog and will positively reinforce the activity with which the praise is linked. Praise may be a secondary reinforcer or a primary reinforcer. See Event Marker below. The timing of praise is crucial to its effectiveness in training. When, in this homepage, ‘praise’ is linked with ‘click’ (as in click/praise), it refers to the situation where ‘click’ and ‘praise’ are being used in the same way, ie as a secondary reinforcer or an event marker.

 

 

NEGATIVE REINFORCEMENT. The removal of an aversive event contingent on a behaviour with the goal of increasing the likelihood of the behaviour in the future (1). Example:

The Koehler technique for teaching the retrieve involves releasing an ear pinch or terminating a shock at the moment the dog clasps the dumbbell in its mouth. If the dog does what is required the pain is removed. Equally, a dog may have learned that escaping from an enclosure relieved the restriction he was feeling and, having been negatively reinforced, will escape to relieve that uncomfortable feeling again. Negative reinforcement can help us teach the dog, e.g., the dog is released from confinement only when he is silent. As another example, when an uncomfortable physical force is used to guide the dog into a ‘sit’, ‘down’, etc, the force is removed when the dog complies. The use of the hands/feet in this situation in a way which doesn’t involve discomfort to the dog (indeed is pleasant to the dog) is not negative reinforcement because it is not aversive to the dog.

 

 

PUNISHMENT. A punishment is any stimulus that decreases the probability of the response that it follows. Punishment only seeks to stop undesirable behaviour - it does not teach a new desired behaviour. The decrease in the undesirable response may last only a short time or may occur only when the ‘punisher’ is present. Think of the traffic ticket for speeding (punishment): the driver may slow down while the memory of the fine is still fresh or whenever the driver sees a traffic cop but, after a time, the driver will return to his same old bad habits.

 

 

POSITIVE PUNISHMENT. The giving of an aversive event contingent on a behaviour with the goal of decreasing the likelihood of the behaviour in the future (1). Example: Dog gets up on the ‘Down Stay’ in an obedience class; the handler immediately storms towards the dog, glaring at it, gives the dog a harsh scruff shake, screams ‘No!’ and physically forces the dog back into the ‘Down’.

 

 

Important Notes:

 

 

1. The Correction Command or Non Reward Marker (NRM). Because the ‘correction command’ and NRMs are an essential part of dog obedience training, clarification is important at this point. Corrections, Punishment and Negative Reinforcement are the most misunderstood and misused terms in dog training schools, in training manuals and in books on dog training. Punishment and Correction are emotional terms for some people. There are very important differences between physical punishment, harsh reprimands and the correction command or NRMs. A properly given correction command or NRM, as I use them, is not aversive in the true sense of the word, but in OC (Operant Conditioning) terms they come under the definition of Positive Punishment. Any dog that routinely experiences physical punishment or harsh reprimands in training would be justified in fearing training but, if the handler is training correctly, a dog should never fear the correction command or an NRM. The correction command means: ‘Ahhhh, not like that, try again!’ A correction command, as I use it, can be as benign as a sigh of disappointment like that which rises from a hushed crowd immediately after a golfer misses a putt - ‘Ahhhhhhh’. It is not abject disapproval, ie positive punishment via a reprimand.

 

Note: Withholding a reward as a deterrent is Negative Punishment under the definitions of Operant Conditioning.

 

2. This subject is covered in more depth in Training Methods and Training Basics, but as a simple example to whet the appetite, consider the example given above with the dog in a Down Stay. The dog has two basic choices, ie he can Stay where he is or break the Stay. If the dog is punished for every mistake he makes, he would have cause to fear and hate this exercise or exhibit learned helplessness; he may choose to fight, flee or freeze, and only the ‘freeze’ reflex would please the handler (but not the dog). However, if the dog is routinely corrected for wrong choices (the proofing technique is tempting the dog to make the common mistakes in an exercise), he will happily try one choice after another knowing that, if he is wrong, there will be no unpleasant consequences (see Notes 4 and 5); he will merely hear the correction command ‘Ahhhhhhhhh!’ meaning: not that way or not like that, try again. If he has any intelligence at all (and most dogs do), the dog will eventually learn what is required and will be rewarded (positively reinforced).

 

3. Timing. Timing is absolutely critical to corrections. If the timing of the correction command is poor (too late), the dog will already have broken the Stay (referring to the example above) and the choices available to him will be multiplying by the second; however, if the timing of the correction is good (the instant the dog is thinking about breaking), the number of choices is reduced to two, ie continue to break the Stay or Stay where he is. See more about timing in Training Methods.

 

4. A so-called ‘correction command’ which has a threat of physical punishment implied or is given as a harsh reprimand is not a true correction command; it is a complete waste of time in obedience training (Classical Conditioning: Discovery and Investigations, Read Lectures 5 to 14, just change the number in the URL address). I do not consider the positive punishment (aversives)/harsh corrections given above the threshold of comfort for the dog, which are typically given to correct the behaviour of aggressive or dominant dogs, to be obedience training. This is behaviour modification and a quite separate and special discipline in itself.

 

 

NEGATIVE PUNISHMENT. The removal of a pleasant event contingent on a behaviour with the goal of decreasing the likelihood of the behaviour in the future (1). Example: This is probably more commonly used by humans with other humans, eg the removal of privileges. In dog training, during the early stages of teaching a dog to heel, we give constant praise and a treat when the desired behaviour occurs but, as the dog progresses, we withhold the praise and treats if the dog’s heeling does not live up to the dog’s best efforts, ie we negatively punish the unwanted behaviours and poor performance and positively reinforce ‘personal best performances’. Negative Punishment is sometimes referred to in terms of ‘Response Cost’. See the note below:

 

Note:

 

1. Response Cost. If positive reinforcement strengthens a response by adding a positive stimulus, then response cost has to weaken a behaviour by subtracting (withholding) a positive stimulus. After the response the positive reinforcer is removed, which reduces the frequency of that response. Trainers say we reward behaviours we want and ignore what we don’t want. What this really means is we praise/treat those behaviours we want and withhold praise/treats when we get behaviours we don’t want.
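The four terms defined above combine two distinctions: whether something is given or taken away (positive/negative), and whether the response is strengthened or weakened (reinforcement/punishment). As a rough illustration only, the small Python sketch below spells out that combination. The names `Action`, `Effect` and `quadrant` are invented for this sketch and do not come from any training literature; the examples are the ones used in the text.

```python
from enum import Enum


class Action(Enum):
    GIVE = "positive"      # something is given
    REMOVE = "negative"    # something is taken away or withheld


class Effect(Enum):
    STRENGTHEN = "reinforcement"  # the response becomes more likely
    WEAKEN = "punishment"         # the response becomes less likely


def quadrant(action: Action, effect: Effect) -> str:
    """Name the operant conditioning term for an action/effect pair."""
    return f"{action.value} {effect.value}"


# Examples taken from the definitions in the text (hypothetical labels).
examples = [
    ("treat after the dog sits", Action.GIVE, Effect.STRENGTHEN),
    ("ear pinch released when the dog takes the dumbbell", Action.REMOVE, Effect.STRENGTHEN),
    ("scruff shake when the dog breaks the Down Stay", Action.GIVE, Effect.WEAKEN),
    ("praise and treats withheld for poor heeling", Action.REMOVE, Effect.WEAKEN),
]

for description, action, effect in examples:
    print(f"{description}: {quadrant(action, effect)}")
```

Note that the label depends on the effect on the behaviour, not on whether the event seems pleasant to the handler.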





 

 

 

 

 

 

 

 

 

 

CLASSICAL CONDITIONING. Classical conditioning involves simple stimulus-response reactions, ie Pavlovian conditioning takes place. Pavlov’s dogs came to associate metronome clicks with food and began to salivate (drool) on hearing the clicks in expectation of being fed. The dogs didn’t have to do a thing for the food. The only behaviour that was positively reinforced was the expectation of being fed having received the stimulus. The stimulus didn’t reinforce the drooling - that was a natural reaction to the expectation of being fed - it was a sign that the stimulus was working. Conditioning is the learned association or connection of one thing with another.

 

 

OPERANT. ‘Operant’ means a behaviour that operates on the environment. So when you see the word ‘operant’, substitute ‘behaviour’ to make the meaning clearer.

 

 

OPERANT CONDITIONING. B F Skinner, who spent a whole career documenting the contingencies of reinforcement, outlined the principles of ‘operant conditioning’. Operant conditioning means using the concepts of positive and negative reinforcement and punishment or correction by association. It is an extension of Pavlovian conditioning, but in operant conditioning the dog has to actually do something before the stimulus, the secondary reinforcer (‘click’, verbal praise, etc), is given, which in turn is associated, through conditioning, with the primary reinforcer, eg a treat.

 

 

PRIMARY REINFORCERS. Primary reinforcers are those things which directly positively or negatively reinforce behaviour, eg a treat. A primary reinforcer satisfies a biological need (eg, food, water, shelter, warmth).

 

SECONDARY REINFORCERS. Secondary reinforcers are those things which, through operant conditioning, are associated with the primary reinforcers, eg ‘click’, verbal praise, feedback, etc. For humans, money is a secondary reinforcer because money is associated with the primary reinforcers, food, shelter, security, sex, status, etc. Money and Praise are unusual reinforcers. Secondary reinforcers usually have no value in them per se: a ‘click’ is just a momentary audible sound and money is just paper or metal, yet the money’s value as a reinforcer depends on what’s printed or stamped on it. Praise depends on who gives it and how it is given (voice tone, pitch, etc). Some argue that money and praise have become primary reinforcers as well as secondary reinforcers. Well-timed Praise is a very powerful training tool.

 

Important Note

 

Event Markers. Usually event markers are secondary reinforcers but they can be primary reinforcers too. An event marker may be a ‘click’ which is, by convention, always associated with primary reinforcers, or it can be a vocal event marker which can be a secondary reinforcer or primary reinforcer. A vocal event marker may vary from positive reinforcement (praise) right through to positive punishment (reprimand) depending on the circumstances in which it is used, ie a vocal event marker may indicate an imminent reward but may also be praise (a reward in itself), a correction or a reprimand. Vocal event markers are extremely flexible while clicker event markers are restrictive and ‘wooden’. Event markers are sometimes referred to as conditioned reinforcers.






 

 

 

 

Conditioned Reinforcers. “A conditioned reinforcer is a sound, word or phrase that has been associated with a reward which will signal a real reward is coming. It is spoken just as the dog does what you want him to thus letting him know what action of his pleased you and earned him the reward”. “Off Lead”, August 1986. A ‘clicker’ is also a conditioned reinforcer.

 

Clickers. There has been a great deal of nonsense put about by clicker devotees. One of those who promote the use of a clicker is the refreshingly level-headed Gary Wilkes. See Clicker Training: What it isn’t. My only contention with Gary’s introduction to these articles is that the timing of the secondary reinforcer (conditioned reinforcers/event markers) should be within half a second of the desired response and not within one tenth of a second as Gary claims in his article. This is why the well-timed human voice when used as a secondary reinforcer (usually given as praise) is just as effective as the clicker and, in most cases, more flexible than the clicker in training. This said, Gary’s articles are excellent and should be compulsory reading for all novice trainers. Praise and click are interchangeable in training; they achieve exactly the same thing and that is why in this page they are coupled together in the text.

 

 

REINFORCEMENT INTENSITY. In general, the more intense (larger or more appealing) a reinforcer, the more effective the conditioning, ie the response is learned faster and is emitted more frequently. Intensity is relative to the dog concerned, ie what turns him on. What may be a very intense reinforcer to one dog, eg play with a ball, may be a very weak reinforcer to a dog that is only turned on by treats. A reinforcer is not reinforcing unless it reinforces, ie if the reinforcer is not intense enough to increase the probability of the response it follows, then it is not a reinforcement.

 

RESPONSE/REINFORCEMENT CONTINGENCY. In general, the interval between the response and the reinforcement should be as short as possible. This is why precise timing of praise (or clicker) and reward in dog training is so critical to its success.

 

CONTINUOUS REINFORCEMENT. Continuous reinforcement is when a reinforcement follows each response (1:1 ratio). Continuous reinforcement tends to lead to faster conditioning, higher rates of responding, faster extinction, and is usually used early in the conditioning process.

 

 

PARTIAL OR RANDOM REINFORCEMENT. Partial or random reinforcement is when a reinforcement does not follow each response (1:2 or a greater or random ratio). Partial or random reinforcement tends to lead to slower conditioning, lower rates of responding, slower extinction, and is usually used late in the conditioning process.
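The difference between the two schedules can be written out as a rule for deciding, response by response, whether a reinforcer is given. The Python sketch below is only an illustration of the ratios described above; the function names and the use of a probability for the random case are assumptions made for this example.

```python
import random


def reinforce_continuous(response_number: int) -> bool:
    """Continuous schedule: every response is reinforced (1:1 ratio)."""
    return True


def reinforce_fixed_ratio(response_number: int, ratio: int) -> bool:
    """Partial schedule: reinforce every `ratio`-th response (1:ratio)."""
    return response_number % ratio == 0


def reinforce_random(rng: random.Random, probability: float) -> bool:
    """Random schedule: reinforce each response with the given probability."""
    return rng.random() < probability


rng = random.Random(0)  # seeded so that a run can be repeated
for n in range(1, 7):
    print(
        n,
        reinforce_continuous(n),
        reinforce_fixed_ratio(n, ratio=2),
        reinforce_random(rng, probability=0.5),
    )
```

In practice a handler would start with the continuous rule and move to one of the partial rules as the behaviour becomes established, as the definitions above describe.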

 

 

SHAPING (SUCCESSIVE APPROXIMATION). Shaping is often called the Method of Successive Approximation because it involves reinforcing responses that are closer and closer approximations of the final desired response. The final desired response is termed the Terminal Behaviour. Shaping is a step-by-step process that begins by reinforcing a response that in some way approximates the desired Terminal Behaviour. Once the Terminal Behaviour is reached, only the Terminal Behaviour is reinforced.
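The step-by-step nature of shaping can be sketched as a criterion that is raised each time the dog meets it. The Python below is a simplified model only: it assumes each response can be scored with a number (higher meaning closer to the Terminal Behaviour), which is an assumption made for illustration, and the function name `shape` and its parameters are hypothetical.

```python
def shape(attempts, terminal, start_criterion, step):
    """Record which attempts are reinforced while the criterion is raised.

    attempts: scores for successive responses (higher = closer to the
    Terminal Behaviour). Returns (score, criterion, reinforced) tuples.
    """
    criterion = start_criterion
    log = []
    for score in attempts:
        reinforced = score >= criterion
        log.append((score, criterion, reinforced))
        # Raise the bar only after a success, and never past the terminal level.
        if reinforced and criterion < terminal:
            criterion = min(terminal, criterion + step)
    return log


for score, criterion, reinforced in shape([1, 2, 2, 3, 4, 3, 4], terminal=4,
                                          start_criterion=1, step=1):
    print(f"score={score} criterion={criterion} reinforced={reinforced}")
```

Once the criterion reaches the terminal level it stays there, so a response that would have earned a reward earlier in training (a score of 3 in this run) no longer does.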

 

 

MODELLING. Modelling forces/manipulates/channels the dog into a position with the use of the feet, hands or other training equipment. Modelling may be extremely gentle or harsh. Harsh modelling carries with it a form of punishment while gentle modelling is akin to petting and may be used as a form of positive reinforcement.

 

 

CHAINING. Chaining puts a series of simple exercises into one complete exercise or, to look at it another way, the exercise is broken down into simple parts which are trained separately and then put together to make the entire exercise. Firstly, the dog learns each simple exercise separately and then they are ‘linked’ together like the links in a ‘chain’.

 

 

BEHAVIOUR MODIFICATION. Behaviour modification is a structured attempt to alter the dog’s environment and its reinforcement and punishment contingencies in order to change a behaviour. Behaviour modification is based on the assumption that undesired behaviours are the result of inappropriate contingencies (being rewarded for the undesired behaviour or not being rewarded for a desired behaviour).

 

 

EXTINCTION. Extinction occurs when a response is repeatedly not followed by a reinforcement. The response - reinforcement association or causal inference is broken and the rate of response decreases, often to the original free operant level. Extinction is unpredictable. Sometimes it will work well and other times it may not.

 

 

DIFFERENTIAL REINFORCEMENT OF OTHER BEHAVIOUR (DRO). Differential Reinforcement of Other Behaviour involves administering a reinforcement after a period of time when the undesired response does not occur. That is, the dog is reinforced for doing anything except the undesired response during this time period. When using a DRO, the time period involved is usually short to begin with, and increased over time (a form of shaping).
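One way to picture a DRO is as a timer. The Python sketch below assumes a ‘resetting’ version, in which any occurrence of the undesired response restarts the time period; the text above does not specify this detail, so treat it as an assumption, along with the hypothetical function name and the use of seconds as the unit.

```python
def dro_reinforcement_times(undesired_times, session_length, interval):
    """Times at which a reinforcer is due under a resetting DRO timer.

    undesired_times: times (in seconds) at which the undesired response occurred.
    A reinforcer is due whenever `interval` seconds pass without that response.
    """
    reinforcers = []
    timer_start = 0
    pending = sorted(undesired_times)
    i = 0
    while timer_start + interval <= session_length:
        due = timer_start + interval
        if i < len(pending) and pending[i] < due:
            # Undesired response before the period elapsed: restart the timer.
            timer_start = pending[i]
            i += 1
        else:
            # The period passed without the undesired response: reinforce.
            reinforcers.append(due)
            timer_start = due
    return reinforcers


# Undesired response at 3 s and 12 s in a 30 s session with a 5 s period.
print(dro_reinforcement_times([3, 12], session_length=30, interval=5))
```

Lengthening `interval` from one session to the next corresponds to increasing the time period over time, as described above.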

 

      

LEARNED HELPLESSNESS. Learned helplessness develops when a dog perceives that no matter what he does he is repeatedly subject to punishments that are not warranted and that cannot be avoided, escaped, or controlled. As a result, the dog comes to understand that he has no control over his environment and so gives up and passively accepts whatever the environment offers.

 

Motivational Effect of Learned Helplessness: The dog becomes slow to exhibit behaviours that result in reinforcement, even when this type of reinforcement control is possible. Also, the dog becomes slow to avoid avoidable punishment and is often lethargic.

 

Cognitive Effect of Learned Helplessness: The dog has difficulty learning in situations where the dog actually does have some control over punishment and reinforcement. The dog is slow to learn new contingencies.

 

Emotional Effect of Learned Helplessness: The dog tends to be passive, withdrawn, fearful, and depressed.

 

Some other terms you may come across in dog training are:

 

 

LURING/TARGETING. Holding a treat (or other primary reinforcer) in such a way as to induce the dog to make the desired motion in pursuit of the treat (hold the treat above the sitting dog’s head to get him to Stand or beg). It is a little like the proverbial ‘carrot and the donkey’.






 

 

 

 

 

CAPTURING AND MARKING. Marking (with a ‘click’ or ‘praise’) the exact moment that the dog performs the desired behaviour, generally without direction from the handler. This is often referred to as ‘Marking’. See event markers above.

 

 

The Premack Principle. The Premack Principle, which is often called ‘grandma’s rule’, states that a high frequency activity can be used to reinforce low frequency behaviour. Access to the preferred activity is contingent on completing the low-frequency behaviour. Determine what the dog likes to do but make doing that activity contingent upon doing what he wouldn’t otherwise do. Work first then we can play.

 

Finally, before I close this section on theory, the way we see rewards and punishment depends on our outlook on life; don’t assume your dog sees it the same way:

 

“Losers visualize the penalties of failure. Winners visualize the rewards of success.” and “If at first you don’t succeed, try, try again.” This is very easy for the success-oriented, it is hard for the person trying to avoid failing.

 

These are some of the main terms you will see used in dog training and in this Homepage. Now, let’s get on with the training.

 

 

 

This article is provided as a service to all those interested in promoting the sport of Dog Obedience Trialling. The author hereby grants permission for individuals and non-profit organisations to reproduce and distribute this article under the following conditions: Full credit is given to the author on each and every copy, with the notation ‘Copyright 1998 Ron Lawrence’. All copies distributed must be provided free of charge. If reproduced in a newsletter or magazine, full credit must be given.

 

 

 

 

 

 

 


Disclaimer: Some of the following links may promote the use of certain training devices. The fact that I have included the links here does not indicate that I agree with the training methods recommended or the devices promoted by the author.

 

 

A Brief Introduction to Operant Conditioning (1)

 

Play Training

 

Training Philosophy and Background

 

Positive Reinforcement

 

Training Tips For New HandLers

 

Reinforcement

 

Operant Conditioning

 

Operant Conditioning (2)

 

Punishment: Problems & Principles for Effective Use

 

Romancing the Cookie

 

Punishment: How not to do it.

 

OPERANT (INSTRUMENTAL) CONDITIONING





 

 

INTRODUCTION TO LEARNING 5

Clicker Training: What it isn’t

 

Classical Conditioning: Discovery and Investigations (Read Lectures 5-14, change the URL number)

Self-Quiz on Conditioning
