TRAINING THEORY
Copyright 1998 Ron Lawrence. See notice below.
INTRODUCTION. It is necessary
to understand some of the theory behind training before we can discuss and
understand the different training methods used to train our dogs for obedience
trials. Many of the terms used in training and behaviour theory have become
buzz words which are often confused and abused. Abusing these terms is one
thing; misusing the training methods is another, and can result in behaviour
other than that which the trainer intended.
MOTIVATION. Sane animals do
not do anything unless there is a motive for it (the prospect of a reward). The
animal does something because it satisfies a need. Motivation is the desire
(sometimes compelling, irresistible and obsessive) to satisfy a need. Need
motivates action. The need may be as basic as satisfying the physiological
needs (hunger, thirst, sleep, sex) or as obscure as satisfying
self-actualisation needs (attaining one’s full potential). Generally speaking,
satisfying the basic survival needs (Physiological and Security needs) will
always take precedence over satisfying the higher needs (social esteem,
self-actualisation). The more ‘comfortable’ the animal is (Physiological and
Security needs are effortlessly met), the more sophisticated will be its needs.
DRIVES. One of the best and
most enduring articles ever written about drives and their relationship to
training was written by Wendy Volhard. This is an important work because not
all dogs are driven by the same instincts to the same degree. Read Wendy’s
article, ‘Drives: A New Look at an Old Concept’; if you do so you will
understand your dog a whole lot better.
POSITIVE. In the context of
modifying behaviour ‘positive’ means: ‘giving’ something e.g.,
giving praise, giving a treat, giving a spanking or giving a reprimand.
NEGATIVE. In the context of modifying behaviour
‘negative’ means: ‘taking away or removing’ something e.g., ceasing
praise, taking away pain, taking away privileges.
STRENGTHENING. Strengthening
a response simply means: making it more likely to re-occur. It is reinforcing
(positively or negatively) that behaviour.
WEAKENING. Weakening a response
simply means: making it less likely to re-occur. It involves punishing
(positively or negatively) the behaviour.
POSITIVE REINFORCEMENT. The giving of a pleasant event contingent on a
behaviour with the goal of increasing the likelihood of the behaviour in the
future (1). Example: You ask your dog to sit, the dog sits, you give the dog
praise and immediately follow it up with a treat for doing what you asked.
Important Note:
Praise. The
use of the term ‘praise’ in this homepage means giving the dog a vocal word or
phrase that the dog has been conditioned to associate with approval. Praise is
given with the voice so the tone of voice can indicate just how pleased the
handler is with the dog and will positively reinforce the activity with which
the praise is linked. Praise may be a secondary reinforcer or a primary
reinforcer. See Event Marker below. The timing of praise is crucial to its
effectiveness in training. When, in this homepage, ‘praise’ is linked with
‘click’ (as in click/praise), it refers to the situation where ‘click’ and
‘praise’ are being used in the same way, ie as a secondary reinforcer or an
event marker.
NEGATIVE REINFORCEMENT. The removal of an aversive event contingent on a
behaviour with the goal of increasing the likelihood of the behaviour in the
future (1). Example: The Koehler
technique for teaching the retrieve involves releasing an ear pinch or
terminating a shock at the moment the dog clasps the dumbbell in its mouth. If
the dog does what is required the pain is removed. Equally, a dog may have
learned that escaping from an enclosure relieved the restriction he was feeling
and, having been negatively reinforced, will escape to relieve that
uncomfortable feeling again. Negative reinforcement can help us teach the dog,
e.g., the dog is released from confinement only when he is silent. As another
example, an uncomfortable physical force is used to guide the dog into a
‘sit’, ‘down’, etc; when the dog complies, the force is removed. Using the
hands/feet in this situation in a way that doesn’t involve discomfort to the
dog (indeed is pleasant to the dog) is not negative reinforcement because it
is not aversive to the dog.
PUNISHMENT. A punishment is
any stimulus that decreases the probability of the response that it follows.
Punishment only seeks to stop undesirable behaviour - it does not teach a new
desired behaviour. The decrease in the undesirable response may last only a
short time or may occur only when the ‘punisher’ is present. Think of the
traffic ticket for speeding (punishment): the driver may slow down while the
memory of the fine is still fresh or whenever he sees a traffic cop but, after
a time, the driver will return to his same old bad habits.
POSITIVE PUNISHMENT. The giving of an aversive event contingent on a
behaviour with the goal of decreasing the likelihood of the behaviour in the
future (1). Example: Dog gets up on the ‘Down Stay’ in an obedience class, the
handler immediately storms towards the dog, glaring at it, gives the dog a
harsh scruff shake and screams ‘No!’ and physically forces the dog back into
the ‘Down’.
Important Notes:
1. The
Correction Command or Non Reward Marker (NRM). Because the ‘correction
command’ or NRM’s are an essential part of dog obedience training,
clarification is important at this point. Corrections, Punishment, Negative
Reinforcement are the most misunderstood and misused terms used in dog training
schools, in training manuals and books on dog training. Punishment and
Correction are emotional terms for some people. There are very important
differences between physical punishment, harsh reprimands and the correction
command or NRM’s. A properly given correction command or NRM, as I use them, is
not aversive in the true sense of the word; but in Operant Conditioning (OC)
terms they come under the definition of Positive Punishment. Any dog that
routinely experiences physical punishment or harsh reprimands in training
would be justified in fearing training but, if the
handler is training correctly, a dog should never fear the correction command
or an NRM. The correction command means: ‘Ahhhh, not like that, try again!’ A
correction command, as I use it, can be as benign as a sigh of disappointment
like that which occurs from a hushed crowd immediately after a golfer misses a
putt - ‘Ahhhhhhh’.
It is not abject disapproval, ie positive punishment via a reprimand.
Note: Withholding a
reward as a deterrent is Negative Punishment under the definitions of Operant
Conditioning.
2. This subject is covered
in more depth in Training Methods and Training Basics, but as a simple
example to whet the appetite, consider the example given above with the dog in
a Down Stay, the dog has two basic choices, ie he can Stay where he is or break
the Stay. If the dog is punished for every mistake he makes, he would have
cause to fear and hate this exercise or exhibit learned helplessness; he may
choose to fight, flee or freeze; only the ‘freeze’ reflex would please the handler
(but not the dog). However, if the dog is routinely corrected for wrong choices
(the proofing technique is tempting the dog to make the common mistakes in an
exercise), he will happily try one choice after another knowing that, if he is
wrong, there will be no unpleasant consequences (see Notes 4 and 5); he will merely hear the
correction command ‘Ahhhhhhhhh!’ meaning: not that way or not like that, try
again. If he has any intelligence at all (and most dogs do), the dog will
eventually learn what is required and will be rewarded (positively reinforced).
3. Timing. Timing is
absolutely critical to corrections. If the timing of the correction command is
poor (too late), the dog will already have broken the Stay (referring to the
exercise example above) and the choices available to him will be multiplying by
the second; however, if the timing of the correction is good (the instant the
dog is thinking about breaking), the number of choices is reduced to two, ie
continue to break the Stay or Stay where he is. See more about timing in
Training Methods.
4. A so called ‘correction
command’ which has a threat of physical punishment implied or is given as a
harsh reprimand is not a true correction command; it is a complete waste of
time in obedience training (Classical Conditioning: Discovery and Investigations,
Read Lectures 5 to 14, just change the number in the URL address). I do not
consider positive punishment (aversives) or harsh corrections given above the
dog’s threshold of comfort, which are typically used to correct the
behaviour of aggressive or dominant dogs, to be obedience training. This is
behaviour modification and a quite separate and special discipline in itself.
NEGATIVE PUNISHMENT. The removal of a pleasant event contingent on a behaviour with the
goal of decreasing the likelihood of the behaviour in the future (1). Example:
This is probably more commonly used by humans with other humans, eg the removal
of privileges. In dog training, during the early stages of teaching a dog to
heel, we give constant praise and a treat when the desired behaviour occurs but
as the dog progresses we withhold the praise and treats if the dog’s heeling
does not live up to the dog’s best efforts, ie we negatively punish the
unwanted behaviours and poor performance and positively reinforce ‘personal best
performances’. Negative Punishment is sometimes referred to in terms of
‘Response Cost’. See the note below:
Note:
1. Response Cost. If
positive reinforcement strengthens a response by adding a positive stimulus,
then response cost has to weaken a behaviour by subtracting (withholding) a
positive stimulus. After the response the positive reinforcer is removed which
weakens the frequency of that response. Trainers say we reward behaviours we
want and ignore what we don’t want. What this really means is we praise/treat
those behaviours we want and withhold praise/treats when we get behaviours we
don’t want.
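The four terms defined above are simply the combinations of giving or removing something with strengthening or weakening a response. The short Python sketch below spells that table out. The function name and the labels "give", "remove", "strengthen" and "weaken" are my own shorthand for illustration, not standard terminology.

```python
def classify_consequence(operation: str, effect: str) -> str:
    """Name the operant-conditioning term for a consequence.

    operation: "give" (positive) or "remove" (negative)
    effect:    "strengthen" (reinforcement) or "weaken" (punishment)
    These label strings are illustrative assumptions, not standard terms.
    """
    prefixes = {"give": "positive", "remove": "negative"}
    kinds = {"strengthen": "reinforcement", "weaken": "punishment"}
    if operation not in prefixes or effect not in kinds:
        raise ValueError("unknown operation or effect")
    return f"{prefixes[operation]} {kinds[effect]}"


# Examples taken from the definitions above.
print(classify_consequence("give", "strengthen"))    # treat given after a sit
print(classify_consequence("remove", "strengthen"))  # ear pinch released
print(classify_consequence("give", "weaken"))        # harsh reprimand given
print(classify_consequence("remove", "weaken"))      # praise/treat withheld
```

Note that the effect is judged by what actually happens to the behaviour, not by what the handler intended.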
CLASSICAL CONDITIONING. Classical conditioning involves simple
stimulus-response reactions, ie Pavlovian conditioning takes place. Pavlov’s dogs came to
associate metronome clicks with food and began to salivate (drool) on hearing
the clicks in expectation of being fed. The dogs didn’t have to do a thing for
the food. The only behaviour that was positively reinforced was the expectation
of being fed having received the stimulus. The stimulus didn’t reinforce the
drooling - that was a natural reaction to the expectation of being fed - it was a sign that the
stimulus was working. Conditioning is the learned association or connection of
one thing with another.
OPERANT.
‘Operant’ means a behaviour that operates on the environment. So when you see
the word ‘operant’, substitute ‘behaviour’ to make the meaning more clear.
OPERANT CONDITIONING. B F Skinner, who spent a whole career documenting the contingencies
of reinforcement, outlined the principles of ‘operant conditioning’. Operant
conditioning means using the concepts of positive and negative reinforcement
and punishment or correction by association. It is an extension of
Pavlovian conditioning, but in operant conditioning the dog has to actually do
something before the stimulus, the secondary reinforcer (‘click’, verbal
praise, etc), is given, which in turn is associated, through conditioning, with
the primary reinforcer, eg a treat.
PRIMARY REINFORCERS. Primary reinforcers are those things which directly
positively or negatively reinforce behaviour, eg a treat. A primary
reinforcer satisfies a biological need (eg, food, water, shelter, warmth).
SECONDARY REINFORCERS. Secondary reinforcers are those things which, through operant
conditioning, are associated with the primary reinforcers, eg ‘click’, verbal
praise, feedback, etc. For humans, money is a secondary reinforcer because
money is associated with the primary reinforcers, food, shelter, security, sex,
status, etc. Money and Praise are unusual reinforcers. Secondary reinforcers
usually have no value in them per se; a ‘click’ is just a momentary audible
sound, money is just paper or metal. Yet the money’s value as a reinforcer
depends on what’s printed or stamped on it. Praise depends on who gives it and
how it is given (voice tone, pitch, etc). Some argue that money and praise have
become primary reinforcers as well as secondary reinforcers. Well timed Praise
is a very powerful training tool.
Important Note:
Event Markers. Usually event markers are secondary reinforcers but they can be
primary reinforcers too. An event marker may be a ‘click’ which is, by
convention, always associated with primary reinforcers or it can be a vocal
event marker which can be a secondary reinforcer or primary reinforcer. A vocal
event marker may vary from positive reinforcement (praise) right through to
positive punishment (reprimand) depending on the circumstances in which it is
used, ie a vocal event marker may indicate an imminent reward but may also be
praise (a reward in itself), a correction or a reprimand. Vocal event markers
are extremely flexible while clicker event markers are restrictive and
‘wooden’. Event markers are sometimes referred to as conditioned reinforcers.
Conditioned Reinforcers. “A conditioned reinforcer is a sound, word or phrase that has been
associated with a reward which will signal a real reward is coming. It is spoken just as
the dog does what you want him to thus letting him know what action of his
pleased you and earned him the reward”. “Off Lead”, August 1986. A ‘clicker’ is
also a conditioned reinforcer.
Clickers.
There has been a great deal of nonsense put about by clicker devotees. One of
those who promote the use of a clicker is the refreshingly level headed Gary
Wilkes. See Clicker Training: What it isn’t. My only contention with Gary’s
introduction to these articles is that the timing of the secondary reinforcer
(conditioned reinforcers/event markers) should be within half a second of the
desired response and not within one tenth of a second as Gary claims in his
article. This is why the well-timed human voice when used as a secondary
reinforcer (usually given as praise) is just as effective as the clicker and,
in most cases, more flexible than the clicker in training. This said, Gary’s
articles are excellent and should be compulsory reading for all novice
trainers. Praise and click are interchangeable in training; they achieve exactly
the same thing and that is why in this page they are coupled together in the
text.
REINFORCEMENT INTENSITY. In
general, the more intense (larger or more appealing) a reinforcer, the more
effective the conditioning ie, the response is learned faster and is emitted
more frequently. Intensity is relative to the dog concerned, ie what turns him
on. What may be a very intense reinforcer to one dog eg, say play with a ball,
may be a very weak reinforcer to a dog that is only turned on by treats. A
reinforcer is not reinforcing unless it reinforces, ie if the
reinforcer is not intense enough to increase the probability of the response it
follows, then it is not a reinforcement.
RESPONSE/REINFORCEMENT CONTINGENCY. In general, the interval between the response and the
reinforcement should be as short as possible. This is why precise timing of
praise (or clicker) and reward in dog training is so critical to its success.
CONTINUOUS REINFORCEMENT.
Continuous reinforcement is when a reinforcement follows each response (1:1
ratio). Continuous reinforcement tends to lead to faster conditioning, higher
rates of responding, faster extinction, and is usually used early in the
conditioning process.
PARTIAL OR RANDOM REINFORCEMENT. Partial or random reinforcement is when a reinforcement does not
follow each response (1:2 or a greater or random ratio). Partial or random
reinforcement tends to lead to slower conditioning, lower rates of responding,
slower extinction, and is usually used late in the conditioning process.
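To make the difference between the schedules concrete, here is a minimal Python sketch that lists which responses in a run would be reinforced. The function names are hypothetical. Treating a ‘random ratio’ as an independent 1-in-N chance on each response is my own assumption for illustration; a trainer may vary rewards in other ways.

```python
import random


def continuous_schedule(num_responses):
    """Reinforce every response (1:1 ratio)."""
    return [True] * num_responses


def fixed_ratio_schedule(num_responses, ratio):
    """Reinforce every `ratio`-th response (a 1:ratio schedule)."""
    return [(i + 1) % ratio == 0 for i in range(num_responses)]


def random_ratio_schedule(num_responses, ratio, rng):
    """Reinforce each response with probability 1/ratio (assumed model)."""
    return [rng.random() < 1.0 / ratio for _ in range(num_responses)]


# Six sits in a row under each schedule.
print(continuous_schedule(6))
print(fixed_ratio_schedule(6, 2))
print(random_ratio_schedule(6, 2, random.Random(0)))  # seeded so it repeats
```

With the random schedule the dog cannot predict which response will pay off, which fits the slower extinction described above.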
SHAPING (SUCCESSIVE APPROXIMATION). Shaping is often called the Method of
Successive Approximation because it involves reinforcing responses that are
closer and closer approximations of the final desired response. The final
desired response is termed the Terminal Behaviour. Shaping is a step-by-step
process that begins by reinforcing a response that in some way approximates
the desired Terminal Behaviour. Once the Terminal Behaviour is reached, only
the Terminal Behaviour is reinforced.
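The step-by-step nature of shaping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each attempt is given a made-up ‘closeness’ score between 0 and 1 (1 being the Terminal Behaviour), and the criterion rises by a fixed step after every reinforced attempt. Real shaping is judged by the handler's eye, not a number.

```python
def shape(attempts, terminal=1.0, step=0.25):
    """Return, for each attempt, whether it would be reinforced.

    attempts: hypothetical closeness scores (0 = nothing like the goal,
              `terminal` = the Terminal Behaviour itself).
    step:     how far the criterion is raised after each reinforced attempt
              (an assumed, fixed increment).
    """
    criterion = step  # start by accepting a rough approximation
    reinforced = []
    for score in attempts:
        earned = score >= criterion
        reinforced.append(earned)
        if earned:
            # Ask for a closer approximation next time, never beyond the goal.
            criterion = min(terminal, criterion + step)
    return reinforced


# Six attempts at an exercise, scored by how close they came.
print(shape([0.3, 0.3, 0.6, 0.5, 1.0, 0.8]))
```

In this run the second 0.3 earns nothing because the criterion has already moved on, and the final 0.8 earns nothing because only the Terminal Behaviour is reinforced once it has been reached.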
MODELLING. Modelling forces/manipulates/channels the dog into a position with
the use of the feet, hands or other training equipment. Modelling may be
extremely gentle or harsh. Harsh modelling carries with it a form of
punishment while gentle modelling is akin to petting and may be used as a form
of positive reinforcement.
CHAINING. Chaining puts a
series of simple exercises into one complete exercise or to look at it another
way the exercise is broken down into simple parts which are trained separately
and then put together to make the entire exercise. Firstly, the dog learns each
simple exercise separately and then they are ‘linked’ together like the links
in a ‘chain’.
BEHAVIOUR MODIFICATION.
Behaviour modification is a structured attempt to alter the dog’s environment
and its reinforcement and punishment contingencies in order to change a
behaviour. Behaviour modification is based on the assumption that undesired
behaviours are the result of inappropriate contingencies (being rewarded for
the undesired behaviour or not being rewarded for a desired behaviour).
EXTINCTION.
Extinction occurs when a response is repeatedly not followed by a
reinforcement. The response - reinforcement association or causal inference is broken and the rate of
response decreases, often to the original free operant level. Extinction is
unpredictable. Sometimes it will work well and other times it may not.
DIFFERENTIAL REINFORCEMENT OF OTHER BEHAVIOUR (DRO). A Differential
Reinforcement of Other Behaviours involves administering a reinforcement after
a period of time when the undesired response does not occur. That is, the dog
is reinforced for doing anything except the undesired response during this time
period. When using a DRO, the time period involved is usually short to begin
with, and increased over time (a form of shaping).
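A DRO can be thought of as a clock that restarts whenever the undesired response occurs. The Python sketch below works out when reinforcers would fall due in a session. The function name, the use of seconds, and the rule of lengthening the interval by a fixed amount after each reinforcer are all assumptions made for illustration.

```python
def dro_reinforcement_times(unwanted_times, session_length,
                            first_interval, growth):
    """Times (in seconds) at which a reinforcer falls due under a DRO.

    A reinforcer is given once `interval` seconds pass with no undesired
    response. Any undesired response restarts the clock. After each
    reinforcer the interval is lengthened by `growth` seconds (assumed rule).
    """
    rewards = []
    interval = first_interval
    clock_start = 0
    pending = sorted(unwanted_times)
    while clock_start + interval <= session_length:
        due = clock_start + interval
        # Undesired responses inside the current window restart the clock.
        hits = [t for t in pending if clock_start <= t < due]
        if hits:
            clock_start = hits[0]
            pending = [t for t in pending if t > hits[0]]
        else:
            rewards.append(due)
            clock_start = due
            interval += growth
    return rewards


# A 60 second session; the dog barks once, 12 seconds in.
print(dro_reinforcement_times([12], 60, first_interval=10, growth=5))
```

Here the first reinforcer comes at 10 seconds, the bark at 12 seconds restarts the (now 15 second) clock, and later reinforcers are spaced further apart as the dog succeeds.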
LEARNED HELPLESSNESS. Learned helplessness develops when a dog perceives that no matter
what he does he is repeatedly subject to punishments that are not warranted and
that cannot be avoided, escaped, or controlled. As a result, the dog comes to
understand that he has no control over his environment and so gives up and
passively accepts whatever the environment offers.
Motivational Effect of Learned Helplessness: The dog becomes slow to exhibit
behaviours that result in reinforcement, even when this type of reinforcement
control is possible. Also, the dog becomes slow to avoid avoidable punishment
and is often lethargic.
Cognitive Effect of Learned Helplessness: The dog has difficulty learning in
situations where the dog actually does have some control over punishment and
reinforcement. The dog is slow to learn new contingencies.
Emotional Effect of Learned Helplessness: The dog tends to be passive,
withdrawn, fearful, and depressed.
Some other terms you may
come across in dog training are:
LURING/TARGETING. Holding a
treat (or other primary reinforcer) in such a way as to induce the dog to make
the desired motion in pursuit of the treat (hold the treat above the sitting
dog’s head to get him to Stand or beg). It is a little like the proverbial
‘carrot and the donkey’.
CAPTURING AND MARKING.
Marking (with a ‘click’ or ‘praise’) the exact moment that the dog performs the
desired behaviour, generally without direction from the handler. This is often
referred to as ‘Marking’. See event markers above.
The Premack Principle. The Premack Principle, which is often called ‘grandma’s
rule’, states that a high frequency activity can be used to reinforce low
frequency behaviour. Access to the preferred activity is contingent on
completing the low-frequency behaviour. Determine what the dog likes to do but
make doing that activity contingent upon doing what he wouldn’t otherwise do.
Work first then we can play.
Finally, before I close this
section on theory: the way we see rewards and punishment depends on our outlook
on life; don’t assume your dog sees it the same way:
“Losers visualize the
penalties of failure. Winners visualize the rewards of success.” and “If at
first you don’t succeed, try, try again.” This is very easy
for the success-oriented, it is hard for the person trying to avoid failing.
These are some of the main
terms you will see used in dog training and in this Homepage. Now, let’s get on
with the training.
This article is provided as a service to all those
interested in promoting the sport of Dog Obedience Trialling. The author hereby
grants permission for individuals and non-profit organisations to reproduce and
distribute this article under the following conditions: Full credit is given
to the author on each and every copy, with the notation ‘Copyright 1998 Ron
Lawrence’. All copies distributed must be provided free of charge. If
reproduced in a newsletter or magazine, full credit must be given.
Disclaimer: Some of the following
links may promote the use of certain training devices. The fact that I have
included the links here does not indicate that I agree with the training
methods recommended or the devices promoted by the author.
A Brief Introduction to Operant Conditioning (1)
Play Training
Training Philosophy and Background
Positive Reinforcement
Training Tips For New Handlers
Reinforcement
Operant Conditioning
Operant Conditioning (2)
Punishment: Problems & Principles for Effective Use
Romancing the Cookie
Punishment: How not to do it.
OPERANT (INSTRUMENTAL) CONDITIONING
INTRODUCTION TO LEARNING 5
Clicker Training: What it isn’t
Classical Conditioning: Discovery and Investigations (Read Lectures 5-14, change the URL number)
Self-Quiz on Conditioning