| name | reinforcement-strategies |
| description | Use when designing, evaluating, or troubleshooting reinforcement-based interventions — covers positive and negative reinforcement, schedules, token economies, conditioned reinforcement, preference assessment, and behavioral momentum. |
Reinforcement Strategies
Core Definitions
Reinforcement is a process in which a consequence following a behavior increases the future probability of that behavior under similar conditions. The defining feature is the effect on behavior, not the subjective experience of the individual.
- Positive reinforcement: A stimulus is added contingent on behavior, and the behavior increases. The added stimulus is the positive reinforcer.
- Negative reinforcement: A stimulus is removed contingent on behavior, and the behavior increases. The removed stimulus is the aversive condition.
- Escape (a form of negative reinforcement): Behavior terminates an ongoing aversive stimulus (e.g., a child completes the demand and the teacher removes the task).
- Avoidance (a form of negative reinforcement): Behavior prevents or postpones an aversive stimulus before it occurs (e.g., a child complies with a pre-instruction cue, preventing a more demanding prompt sequence).
Both positive and negative reinforcement increase behavior. The distinction concerns whether a stimulus is added or removed, not whether the procedure is "good" or "bad."
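The two-way distinction above can be written as a small decision rule. This is only an illustrative sketch: the function name `classify_consequence` and the string labels are hypothetical, and the rule encodes nothing beyond the definitions in this section (it does not label consequences that fail to increase behavior).

```python
def classify_consequence(stimulus_change: str, behavior_increased: bool) -> str:
    """Label a consequence by its effect on behavior, per the definitions above.

    stimulus_change: "added" or "removed" (hypothetical labels for this sketch).
    behavior_increased: whether the behavior became more probable afterwards.
    """
    if stimulus_change not in ("added", "removed"):
        raise ValueError("stimulus_change must be 'added' or 'removed'")
    # The defining feature is the effect on behavior, so check it first.
    if not behavior_increased:
        return "not reinforcement"
    if stimulus_change == "added":
        return "positive reinforcement"
    return "negative reinforcement"


# A reprimand that is followed by more of the behavior still counts as
# positive reinforcement, because a stimulus was added and behavior increased.
print(classify_consequence("added", True))
```

Checking the behavioral effect before the stimulus change mirrors the point that reinforcement is defined by what happens to the behavior, not by how the consequence looks.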
Unconditioned vs Conditioned Reinforcers
- Unconditioned reinforcers (URs): Function as reinforcers without prior learning — food, water, warmth, sexual stimulation, escape from pain. Effectiveness depends on relevant motivating operations (deprivation/satiation).
- Conditioned reinforcers (CRs): Acquire reinforcing properties through pairing with established reinforcers. Tokens, praise, points, and money are conditioned reinforcers. A generalized conditioned reinforcer is paired with multiple backup reinforcers, making it resistant to satiation (e.g., tokens exchangeable for a menu of items).
Establishing Conditioned Reinforcers
- Present the neutral stimulus immediately before or simultaneously with the established reinforcer.
- Pair repeatedly across multiple trials and contexts.
- Gradually thin direct access to backup reinforcers while maintaining the conditioned reinforcer's value.
- Monitor — if the conditioned reinforcer loses its effect, re-pair with backup reinforcers.
Schedules of Reinforcement
Continuous Reinforcement (CRF)
Every instance of the target behavior is reinforced. Use during acquisition to establish new behavior rapidly. CRF produces rapid learning but also rapid extinction when reinforcement is withdrawn.
Intermittent Schedules
Transition from CRF to intermittent schedules to build resistance to extinction and maintain behavior efficiently.
| Schedule | Definition | Response Pattern |
|---|---|---|
| Fixed Ratio (FR) | Reinforcement after a fixed number of responses | High, steady rate with post-reinforcement pause |
| Variable Ratio (VR) | Reinforcement after an average number of responses, varying across instances | High, steady rate with minimal pausing — most resistant to extinction |
| Fixed Interval (FI) | Reinforcement for the first response after a fixed time period | Scalloped pattern — low rate after reinforcement, accelerating as interval end approaches |
| Variable Interval (VI) | Reinforcement for the first response after an average time, varying | Moderate, steady rate — resistant to extinction |
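The four schedule definitions in the table can be sketched as small decision objects that answer one question per response: is this response reinforced? This is a minimal illustration, not a data-collection tool. The class names, the uniform distributions used for the variable schedules, and the convention that timing starts at 0 seconds are all assumptions made for this sketch.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FixedRatio:
    """FR-n: reinforce every n-th response."""
    n: int
    count: int = 0

    def respond(self) -> bool:
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


@dataclass
class VariableRatio:
    """VR-n: reinforce after a varying number of responses that averages n."""
    mean: int
    rng: random.Random = field(default_factory=random.Random)
    count: int = 0
    target: int = 0

    def __post_init__(self) -> None:
        self.target = self._draw()

    def _draw(self) -> int:
        # Assumption: uniform on 1..(2*mean - 1), whose expected value is `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self) -> bool:
        self.count += 1
        if self.count >= self.target:
            self.count = 0
            self.target = self._draw()
            return True
        return False


@dataclass
class FixedInterval:
    """FI-t: reinforce the first response at least t seconds after the last reinforcer."""
    seconds: float
    last: float = 0.0

    def respond(self, now: float) -> bool:
        if now - self.last >= self.seconds:
            self.last = now
            return True
        return False


@dataclass
class VariableInterval:
    """VI-t: like FI, but the required wait varies around an average of t seconds."""
    mean_seconds: float
    rng: random.Random = field(default_factory=random.Random)
    last: float = 0.0
    wait: float = 0.0

    def __post_init__(self) -> None:
        self.wait = self._draw()

    def _draw(self) -> float:
        # Assumption: uniform on 0..2*mean, whose expected value is `mean_seconds`.
        return self.rng.uniform(0, 2 * self.mean_seconds)

    def respond(self, now: float) -> bool:
        if now - self.last >= self.wait:
            self.last = now
            self.wait = self._draw()
            return True
        return False


fr3 = FixedRatio(3)
print([fr3.respond() for _ in range(6)])  # every third response is reinforced
```

Ratio schedules count responses and ignore time, while interval schedules check elapsed time and reinforce only the first response after it has passed. That difference is why the two families take different arguments in `respond`.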
Schedule Thinning
Abrupt removal of reinforcement risks extinction. Thin schedules gradually:
- Increase ratio or interval requirements in small increments (e.g., FR-1 → FR-2 → FR-3 → FR-5).
- Monitor behavior at each step — if responding deteriorates, return to the previous schedule temporarily.
- Use a thinning criterion (e.g., 80% correct across two sessions before advancing).
- For token economies, increase the response requirement per token, or raise the token price of backup reinforcers so that each token buys less.
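The thinning steps above amount to a simple advance, hold, or step-back rule. The sketch below is one way to write it down, under stated assumptions: the function name `next_schedule_step` is hypothetical, the 80% criterion and two-session window come from the example above, and "responding deteriorates" is operationalized here as every session in the window falling below criterion. A clinical team would define deterioration for its own program.

```python
from typing import List


def next_schedule_step(
    steps: List[int],
    current_index: int,
    recent_accuracy: List[float],
    criterion: float = 0.80,
    sessions_required: int = 2,
) -> int:
    """Return the index of the schedule step to run next.

    steps: ordered ratio requirements, e.g. [1, 2, 3, 5] for FR-1 through FR-5.
    recent_accuracy: proportion correct per session at the current step, oldest first.
    """
    window = recent_accuracy[-sessions_required:]
    if len(window) < sessions_required:
        return current_index  # not enough sessions yet, so hold
    if all(score >= criterion for score in window):
        # Criterion met: advance one step, but never past the last step.
        return min(current_index + 1, len(steps) - 1)
    if all(score < criterion for score in window):
        # Assumed definition of deterioration: step back, but never below 0.
        return max(current_index - 1, 0)
    return current_index  # mixed results, so hold


steps = [1, 2, 3, 5]
idx = next_schedule_step(steps, 0, [0.90, 0.85])
print(f"Next schedule: FR-{steps[idx]}")
```

Moving only one step at a time in either direction keeps increments small and makes the return to the previous schedule temporary rather than a full reset.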
Token Economies
Design
- Define target behaviors operationally.
- Select tokens (chips, points, stickers, digital counters) appropriate to the learner's developmental level.
- Establish a token-to-backup-reinforcer exchange rate.
- Create a reinforcer menu with multiple backup options (mitigates satiation).
- Specify exchange schedule (e.g., end of session, end of day).
Implementation
- Deliver tokens immediately following the target behavior with a brief descriptive praise statement.
- Teach the exchange process explicitly before the system begins.
- Pair token delivery with social reinforcement to condition social praise as a reinforcer.
- Use a visual token board for learners who benefit from concrete representations.
Troubleshooting
- Token hoarding: Set maximum accumulation limits or schedule regular exchange opportunities.
- Loss of motivation: Refresh the reinforcer menu, conduct new preference assessments, check for satiation.
- Stealing tokens: Use individualized, non-transferable token systems.
- Failure to respond: Verify the backup reinforcers are functional — conduct a paired-stimulus or multiple-stimulus preference assessment.
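The design elements (exchange rate, reinforcer menu) and the accumulation limit suggested for token hoarding can be sketched as a small data structure. This is a hypothetical illustration only: the class name `TokenBoard`, the menu items, the prices, and the default cap are invented for the example and are not recommendations.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TokenBoard:
    """One learner's token board (individual boards keep tokens non-transferable)."""
    menu: Dict[str, int]   # backup reinforcer -> price in tokens
    max_tokens: int = 10   # accumulation cap (hypothetical default)
    balance: int = 0

    def earn(self, tokens: int = 1) -> int:
        """Add tokens after a target behavior, capped at max_tokens."""
        self.balance = min(self.balance + tokens, self.max_tokens)
        return self.balance

    def affordable(self) -> List[str]:
        """Menu items the learner can currently exchange for, sorted by name."""
        return sorted(item for item, price in self.menu.items() if price <= self.balance)

    def exchange(self, item: str) -> int:
        """Trade tokens for a backup reinforcer and return the remaining balance."""
        price = self.menu[item]
        if price > self.balance:
            raise ValueError(f"{item} costs {price} tokens; balance is {self.balance}")
        self.balance -= price
        return self.balance


board = TokenBoard(menu={"sticker": 2, "ipad_5min": 5}, max_tokens=6)
for _ in range(3):
    board.earn()
print(board.affordable())
```

Raising the prices in `menu` over time is one way to express the thinning option of making each token buy less.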
Premack Principle
A high-probability behavior can reinforce a low-probability behavior when access to the high-probability behavior is contingent on performing the low-probability behavior. Clinically: "First [work task], then [preferred activity]." This is sometimes called "Grandma's rule." Useful when tangible reinforcers are unavailable or when building natural contingency awareness.
Matching Law
Herrnstein's matching law states that the relative rate of responding to alternatives matches the relative rate of reinforcement obtained from those alternatives. Clinical implication: if problem behavior produces richer, more immediate, or more consistent reinforcement than appropriate behavior, the matching law predicts the individual will allocate responding toward problem behavior. Intervention requires making reinforcement for appropriate behavior more favorable than for problem behavior across all dimensions: rate, immediacy, quality, magnitude, and schedule.
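In its strict two-alternative form, the matching law is usually written as B1 / (B1 + B2) = R1 / (R1 + R2), where B is the rate of responding and R is the rate of obtained reinforcement. The sketch below applies that proportion to any number of alternatives. The function name and the example rates are hypothetical, and the calculation considers reinforcement rate only, not immediacy, quality, or magnitude.

```python
from typing import Dict


def predicted_allocation(reinforcement_rates: Dict[str, float]) -> Dict[str, float]:
    """Predicted proportion of responding per alternative under strict matching.

    reinforcement_rates: obtained reinforcers per unit time for each alternative.
    """
    if any(rate < 0 for rate in reinforcement_rates.values()):
        raise ValueError("reinforcement rates cannot be negative")
    total = sum(reinforcement_rates.values())
    if total <= 0:
        raise ValueError("at least one alternative must produce reinforcement")
    return {name: rate / total for name, rate in reinforcement_rates.items()}


# Hypothetical rates: problem behavior earns attention 30 times per hour,
# appropriate requests earn it 10 times per hour.
before = predicted_allocation({"problem": 30, "appropriate": 10})
# After intervention the contingencies are reversed.
after = predicted_allocation({"problem": 10, "appropriate": 30})
print(before, after)
```

With these invented numbers, strict matching predicts 75% of responding goes to problem behavior before the change and 25% after it, which illustrates why shifting the relative reinforcement rate is central to intervention.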
Satiation and Deprivation
- Deprivation: A period without access to a reinforcer, which increases the reinforcer's value (an establishing operation).
- Satiation: Recent, abundant access to a reinforcer, which decreases its value (an abolishing operation).
Clinically, ensure a state of relative deprivation for the target reinforcer before sessions. Avoid inadvertently providing free access to reinforcers used in programming (e.g., if iPad time is the reinforcer, don't allow unlimited iPad access before sessions).
Selecting Reinforcers
Preference Assessments
- Free operant observation: Observe what the individual interacts with when given unrestricted access.
- Single-stimulus (successive choice): Present items one at a time, record approach/avoidance.
- Paired-stimulus (Fisher et al., 1992): Present two items, record selection across all pairs — produces a rank-ordered hierarchy.
- Multiple-stimulus without replacement (MSWO; DeLeon & Iwata, 1996): Present an array, record selection, remove chosen item, re-present remaining — efficient rank ordering.
- Multiple-stimulus with replacement (MSW): Present an array and record the selection, then return the chosen item to the array before the next trial so it can be selected again.
Reassess preference regularly. Preference is not static: what functions as a reinforcer this week may not do so next week.
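As an illustration of how a paired-stimulus assessment yields a rank-ordered hierarchy, the sketch below scores each item by the percentage of its presentations on which it was selected, then sorts. The function name, the trial format, and the alphabetical tie-break are assumptions made for this example. The item names are placeholders.

```python
from collections import Counter
from itertools import combinations
from typing import List, Optional, Tuple

Trial = Tuple[Tuple[str, str], Optional[str]]  # ((item_a, item_b), chosen item or None)


def rank_paired_stimulus(trials: List[Trial]) -> List[Tuple[str, float]]:
    """Rank items by percentage of presentations on which they were selected."""
    presented: Counter = Counter()
    chosen: Counter = Counter()
    for (a, b), pick in trials:
        presented[a] += 1
        presented[b] += 1
        if pick is not None:  # None records a trial with no selection
            if pick not in (a, b):
                raise ValueError(f"{pick!r} was not one of the presented items")
            chosen[pick] += 1
    percentages = {item: 100.0 * chosen[item] / presented[item] for item in presented}
    # Highest percentage first; ties broken alphabetically (an arbitrary choice).
    return sorted(percentages.items(), key=lambda kv: (-kv[1], kv[0]))


items = ["bubbles", "crackers", "music"]
pairs = list(combinations(items, 2))  # every item paired with every other item once
selections = ["bubbles", "bubbles", "music"]  # hypothetical session data
print(rank_paired_stimulus(list(zip(pairs, selections))))
```

Re-running the same scoring on data from a later session is a simple way to see whether the hierarchy has shifted.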
Reinforcer Variation
Rotate reinforcers across sessions and within sessions to reduce satiation. Use a reinforcer menu and allow choice.
Behavioral Momentum and High-Probability Sequences
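Behavioral momentum theory (Nevin, 1992): Behavior in a given context has both a rate of responding and a resistance to change, by analogy with the velocity and mass of a moving object. A dense reinforcement history in that context builds momentum, making the behavior more persistent when conditions change.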
High-p sequence: Deliver 3–5 rapid, high-probability requests (requests the individual typically complies with) immediately before a low-probability request. The momentum of compliance carries into the low-p demand. Effective for increasing initial compliance, particularly with escape-maintained noncompliance.
Procedural requirements:
- High-p requests must have a known compliance rate above 80%.
- Deliver high-p requests rapidly (inter-trial interval of ~3–5 seconds).
- Reinforce each high-p compliance briefly (praise).
- Deliver the low-p request within 5 seconds of the last high-p compliance.
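The procedural requirements above can be turned into a quick pre-session check of a planned sequence. The sketch below is hypothetical: the `Request` type, the function name, and the message wording are invented for illustration. The thresholds (3 to 5 requests, compliance above 80%, gaps of at most 5 seconds) are taken from this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    text: str
    compliance_rate: float  # historical proportion of compliance, 0.0 to 1.0


def check_high_p_sequence(
    high_p: List[Request],
    iti_seconds: List[float],
    low_p_delay_seconds: float,
) -> List[str]:
    """Return a list of procedural problems; an empty list means none were found.

    iti_seconds: planned gaps between consecutive high-p requests.
    low_p_delay_seconds: planned gap between the last high-p compliance and the low-p request.
    """
    problems = []
    if not 3 <= len(high_p) <= 5:
        problems.append("use 3 to 5 high-p requests")
    for request in high_p:
        if request.compliance_rate <= 0.80:
            problems.append(f"'{request.text}' does not have compliance above 80%")
    if any(gap > 5 for gap in iti_seconds):
        problems.append("keep inter-trial intervals at about 3 to 5 seconds")
    if low_p_delay_seconds > 5:
        problems.append("deliver the low-p request within 5 seconds")
    return problems


plan = [
    Request("touch your nose", 0.95),
    Request("clap your hands", 0.90),
    Request("give me five", 0.85),
]
print(check_high_p_sequence(plan, iti_seconds=[3, 4], low_p_delay_seconds=4))
```

The check covers timing and request selection only. Whether each high-p compliance was reinforced still has to be confirmed from session data.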
Key References
- Cooper, J. O., Heron, T. E., & Heward, W. L. (2020). Applied Behavior Analysis (3rd ed.). Pearson.
- Ferster, C. B., & Skinner, B. F. (1957). Schedules of Reinforcement. Appleton-Century-Crofts.
- Fisher, W., Piazza, C. C., Bowman, L. G., Hagopian, L. P., Owens, J. C., & Slevin, I. (1992). A comparison of two approaches for identifying reinforcers for persons with severe and profound disabilities. Journal of Applied Behavior Analysis, 25(2), 491–498.
- DeLeon, I. G., & Iwata, B. A. (1996). Evaluation of a multiple-stimulus presentation format for assessing reinforcer preferences. Journal of Applied Behavior Analysis, 29(4), 519–533.
- Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4(3), 267–272.
- Nevin, J. A. (1992). An integrative model for the study of behavioral momentum. Journal of the Experimental Analysis of Behavior, 57(3), 301–316.