
reinforcement-strategies

Use when designing, evaluating, or troubleshooting reinforcement-based interventions — covers positive and negative reinforcement, schedules, token economies, conditioned reinforcement, preference assessment, and behavioral momentum.


Overview

Install command

npx skills add https://github.com/ccashwell/agentic-behavior-analysis --skill reinforcement-strategies

Copy and paste this command into Claude Code to install the skill

Source

ccashwell/agentic-behavior-analysis

Stars: 0

Forks: 0

Updated: April 11, 2026 at 21:24

SKILL.md


name	reinforcement-strategies
description	Use when designing, evaluating, or troubleshooting reinforcement-based interventions — covers positive and negative reinforcement, schedules, token economies, conditioned reinforcement, preference assessment, and behavioral momentum.

Reinforcement Strategies

Core Definitions

Reinforcement is a process in which a consequence following a behavior increases the future probability of that behavior under similar conditions. The defining feature is the effect on behavior, not the subjective experience of the individual.

Positive reinforcement: A stimulus is added contingent on behavior, and the behavior increases. The added stimulus is the positive reinforcer.
Negative reinforcement: A stimulus is removed contingent on behavior, and the behavior increases. The removed stimulus is the aversive condition.
- Escape: Behavior terminates an ongoing aversive (e.g., child completes demand and teacher removes task).
- Avoidance: Behavior prevents or postpones an aversive before it occurs (e.g., child complies with pre-instruction cue, preventing a more demanding prompt sequence).

Both positive and negative reinforcement increase behavior. The distinction concerns whether a stimulus is added or removed, not whether the procedure is "good" or "bad."

Unconditioned vs Conditioned Reinforcers

Unconditioned reinforcers (URs): Function as reinforcers without prior learning — food, water, warmth, sexual stimulation, escape from pain. Effectiveness depends on relevant motivating operations (deprivation/satiation).
Conditioned reinforcers (CRs): Acquire reinforcing properties through pairing with established reinforcers. Tokens, praise, points, and money are conditioned reinforcers. A generalized conditioned reinforcer is paired with multiple backup reinforcers, which makes it more resistant to satiation (e.g., tokens exchangeable for a menu of items).

Establishing Conditioned Reinforcers

Present the neutral stimulus immediately before or simultaneously with the established reinforcer.
Pair repeatedly across multiple trials and contexts.
Gradually thin direct access to backup reinforcers while maintaining the conditioned reinforcer's value.
Monitor — if the conditioned reinforcer loses its effect, re-pair with backup reinforcers.

Schedules of Reinforcement

Continuous Reinforcement (CRF)

Every instance of the target behavior is reinforced. Use during acquisition to establish new behavior rapidly. CRF produces rapid learning but also rapid extinction when reinforcement is withdrawn.

Intermittent Schedules

Transition from CRF to intermittent schedules to build resistance to extinction and maintain behavior efficiently.

Schedule	Definition	Response Pattern
Fixed Ratio (FR)	Reinforcement after a fixed number of responses	High, steady rate with post-reinforcement pause
Variable Ratio (VR)	Reinforcement after an average number of responses, varying across instances	High, steady rate with minimal pausing — most resistant to extinction
Fixed Interval (FI)	Reinforcement for the first response after a fixed time period	Scalloped pattern — low rate after reinforcement, accelerating as interval end approaches
Variable Interval (VI)	Reinforcement for the first response after an average time, varying	Moderate, steady rate — resistant to extinction
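
The four schedule definitions in the table can be sketched as small decision functions that answer "does this response earn reinforcement?". This is an illustrative sketch, not part of the skill itself: the function names are hypothetical, and drawing the variable requirements from a uniform distribution is an assumption (any distribution with the stated average would fit the definition).

```python
import random


def fixed_ratio(n):
    """FR-n: reinforce every n-th response."""
    count = 0

    def respond():
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean, rng):
    """VR-mean: reinforce after a varying number of responses.

    Assumption: each requirement is drawn uniformly from 1..(2*mean - 1),
    which averages to `mean`.
    """
    required = rng.randint(1, 2 * mean - 1)
    count = 0

    def respond():
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean - 1)
            return True
        return False

    return respond


def fixed_interval(seconds):
    """FI: reinforce the first response made at least `seconds` after the last reinforcer."""
    last_reinforced = 0.0

    def respond(now):
        nonlocal last_reinforced
        if now - last_reinforced >= seconds:
            last_reinforced = now
            return True
        return False

    return respond


def variable_interval(mean_seconds, rng):
    """VI: like FI, but the required wait varies around `mean_seconds`.

    Assumption: each wait is drawn uniformly from 0..(2*mean_seconds).
    """
    last_reinforced = 0.0
    wait = rng.uniform(0, 2 * mean_seconds)

    def respond(now):
        nonlocal last_reinforced, wait
        if now - last_reinforced >= wait:
            last_reinforced = now
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False

    return respond
```

Ratio schedules count responses, so `respond()` takes no arguments. Interval schedules depend on elapsed time, so `respond(now)` takes a timestamp in seconds. Responses made before the interval elapses are simply not reinforced.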

Schedule Thinning

Abruptly thinning reinforcement risks ratio strain or extinction of the target behavior. Thin schedules gradually:

Increase ratio or interval requirements in small increments (e.g., FR-1 → FR-2 → FR-3 → FR-5).
Monitor behavior at each step — if responding deteriorates, return to the previous schedule temporarily.
Use a thinning criterion (e.g., 80% correct across two sessions before advancing).
For token economies, increase the response requirement per token or raise the token price of backup reinforcers.
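
The advance-or-step-back logic above can be written as a small decision rule. This is a sketch under stated assumptions: the step list mirrors the example progression, the 80% criterion across two sessions comes from the text, and treating "responding deteriorates" as two consecutive sessions below criterion is a hypothetical choice that a clinician would set case by case.

```python
# FR values from the example progression FR-1, FR-2, FR-3, FR-5.
THINNING_STEPS = [1, 2, 3, 5]


def next_step(step_index, session_scores, criterion=0.80, sessions_required=2,
              steps=THINNING_STEPS):
    """Return the index of the schedule step to run in the next session.

    session_scores: proportion correct (0.0-1.0) for sessions run at the
    current step, oldest first.
    """
    window = session_scores[-sessions_required:]
    if len(window) < sessions_required:
        return step_index  # not enough data yet; stay put
    if all(score >= criterion for score in window):
        return min(step_index + 1, len(steps) - 1)  # criterion met; thin further
    if all(score < criterion for score in window):
        # Assumption: consecutive sub-criterion sessions count as deterioration.
        return max(step_index - 1, 0)
    return step_index  # mixed results; keep collecting data
```

For example, `next_step(0, [0.9, 0.85])` returns `1`, which moves from FR-1 to FR-2, while `next_step(2, [0.4, 0.5])` returns `1`, which falls back from FR-3 to FR-2.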

Token Economies

Design

Define target behaviors operationally.
Select tokens (chips, points, stickers, digital counters) appropriate to the learner's developmental level.
Establish a token-to-backup-reinforcer exchange rate.
Create a reinforcer menu with multiple backup options (mitigates satiation).
Specify exchange schedule (e.g., end of session, end of day).
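
The design elements above (tokens, exchange rate, reinforcer menu) map onto a small data structure. The following is a minimal sketch only: the class name `TokenBoard`, its fields, and the accumulation cap are hypothetical illustrations, and the menu items and prices in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass
class TokenBoard:
    menu: dict            # backup reinforcer name -> price in tokens
    max_tokens: int = 10  # hypothetical cap on how many tokens can be held
    balance: int = 0

    def earn(self, n=1):
        """Add tokens after a target behavior, up to the cap."""
        self.balance = min(self.balance + n, self.max_tokens)
        return self.balance

    def affordable(self):
        """List the menu items the learner can exchange for right now."""
        return sorted(item for item, price in self.menu.items()
                      if price <= self.balance)

    def exchange(self, item):
        """Spend tokens on a backup reinforcer and return the new balance."""
        price = self.menu[item]
        if price > self.balance:
            raise ValueError(f"not enough tokens for {item!r}")
        self.balance -= price
        return self.balance
```

With `TokenBoard(menu={"sticker": 2, "tablet time": 5})`, three calls to `earn()` leave `affordable()` returning `["sticker"]`. Raising a price in `menu` is one way to represent thinning the exchange rate.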

Implementation

Deliver tokens immediately following the target behavior with a brief descriptive praise statement.
Teach the exchange process explicitly before the system begins.
Pair token delivery with praise so that praise itself is established as a conditioned reinforcer.
Use a visual token board for learners who benefit from concrete representations.

Troubleshooting

Token hoarding: Set maximum accumulation limits or schedule regular exchange opportunities.
Loss of motivation: Refresh the reinforcer menu, conduct new preference assessments, check for satiation.
Stealing tokens: Use individualized, non-transferable token systems.
Failure to respond: Verify the backup reinforcers are functional — conduct a paired-stimulus or multiple-stimulus preference assessment.

Premack Principle

A high-probability behavior can reinforce a low-probability behavior when access to the high-probability behavior is contingent on performing the low-probability behavior. Clinically: "First [work task], then [preferred activity]." This is sometimes called "Grandma's rule." Useful when tangible reinforcers are unavailable or when building natural contingency awareness.

Matching Law

Herrnstein's matching law states that the relative rate of responding to alternatives matches the relative rate of reinforcement obtained from those alternatives. Clinical implication: if problem behavior produces richer, more immediate, or more consistent reinforcement than appropriate behavior, the matching law predicts the individual will allocate responding toward problem behavior. Intervention requires making reinforcement for appropriate behavior more favorable than for problem behavior across all dimensions: rate, immediacy, quality, magnitude, and schedule.
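
In its simplest two-alternative form, the matching law reads B1 / (B1 + B2) = R1 / (R1 + R2), where B is the rate of responding and R is the obtained rate of reinforcement for each alternative. The sketch below computes the predicted share of responding from reinforcement rates. It assumes strict matching on rate alone and ignores immediacy, quality, and magnitude. The function name and the example numbers are illustrative.

```python
def predicted_allocation(reinforcement_rates):
    """Predict the proportion of responding allocated to each alternative.

    reinforcement_rates: mapping of alternative -> obtained reinforcers per hour.
    Assumes strict matching: response share equals reinforcement share.
    """
    total = sum(reinforcement_rates.values())
    if total <= 0:
        raise ValueError("at least one alternative must be reinforced")
    return {name: rate / total for name, rate in reinforcement_rates.items()}


# Hypothetical rates: problem behavior earns attention 30 times per hour,
# appropriate requests earn it 10 times per hour.
shares = predicted_allocation({"problem": 30, "appropriate": 10})
print(shares)  # {'problem': 0.75, 'appropriate': 0.25}
```

Swapping the rates so that appropriate behavior earns 30 reinforcers per hour swaps the predicted shares, which is the quantitative version of the clinical implication above.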

Satiation and Deprivation

Deprivation: Period without access to a reinforcer — establishes the reinforcer's value (establishing operation).
Satiation: Recent, abundant access to a reinforcer — decreases its value (abolishing operation).

Clinically, ensure a state of relative deprivation for the target reinforcer before sessions. Avoid inadvertently providing free access to reinforcers used in programming (e.g., if iPad time is the reinforcer, don't allow unlimited iPad access before sessions).

Selecting Reinforcers

Preference Assessments

Free operant observation: Observe what the individual interacts with when given unrestricted access.
Single-stimulus (successive choice): Present items one at a time, record approach/avoidance.
Paired-stimulus (Fisher et al., 1992): Present two items, record selection across all pairs — produces a rank-ordered hierarchy.
Multiple-stimulus without replacement (MSWO; DeLeon & Iwata, 1996): Present an array, record selection, remove chosen item, re-present remaining — efficient rank ordering.
Multiple-stimulus with replacement (MS): Item is returned to the array after selection.

Reassess preference regularly. Preference is not static — what functions as a reinforcer this week may not next week.
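
The paired-stimulus procedure can be summarized as "present every pair, tally selections, rank by percentage selected". The sketch below shows that tally. It assumes each pair is presented exactly once and that a selection is always made. The function names and the sample items are hypothetical.

```python
from collections import Counter
from itertools import combinations


def all_pairs(items):
    """Every unordered pair of items, each to be presented once."""
    return list(combinations(items, 2))


def rank_by_selection(items, selections):
    """Rank items by the proportion of their presentations on which they were chosen.

    selections: the chosen item for each presented pair.
    Ties keep the original order of `items`.
    """
    if len(items) < 2:
        raise ValueError("need at least two items to form pairs")
    counts = Counter({item: 0 for item in items})
    counts.update(selections)
    presentations = len(items) - 1  # each item appears once with every other item
    ranked = sorted(items, key=lambda item: (-counts[item], items.index(item)))
    return [(item, counts[item] / presentations) for item in ranked]


items = ["bubbles", "blocks", "music"]
# One selection per pair from all_pairs(items):
# (bubbles, blocks) -> bubbles, (bubbles, music) -> music, (blocks, music) -> music
print(rank_by_selection(items, ["bubbles", "music", "music"]))
# [('music', 1.0), ('bubbles', 0.5), ('blocks', 0.0)]
```

Rerunning the same tally on a later date and comparing the rankings is one simple way to track the preference shifts described above.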

Reinforcer Variation

Rotate reinforcers across sessions and within sessions to reduce satiation. Use a reinforcer menu and allow choice.

Behavioral Momentum and High-Probability Sequences

Behavioral momentum theory (Nevin, 1992): Behavior in a given context has both rate and resistance to change. Dense reinforcement history builds momentum.

High-p sequence: Deliver 3–5 rapid, high-probability requests (requests the individual typically complies with) immediately before a low-probability request. The momentum of compliance carries into the low-p demand. Effective for increasing initial compliance, particularly with escape-maintained noncompliance.

Procedural requirements:

High-p requests must have a known compliance rate above 80%.
Deliver high-p requests rapidly (inter-trial interval of ~3–5 seconds).
Reinforce each high-p compliance briefly (praise).
Deliver the low-p request within 5 seconds of the last high-p compliance.
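
The procedural requirements above lend themselves to a simple fidelity check on a planned or recorded sequence. This is a sketch only: the function name, argument names, and message strings are hypothetical, and the thresholds are taken directly from the list (3 to 5 requests, compliance above 80%, gaps of no more than 5 seconds).

```python
def check_high_p_sequence(compliance_rates, gaps_seconds, low_p_delay_seconds):
    """Return a list of procedural problems; an empty list means none were found.

    compliance_rates: historical compliance (0.0-1.0) for each high-p request
    gaps_seconds: seconds between consecutive high-p requests
    low_p_delay_seconds: seconds from the last high-p compliance to the low-p request
    """
    problems = []
    if not 3 <= len(compliance_rates) <= 5:
        problems.append("use 3 to 5 high-p requests")
    for request, rate in compliance_rates.items():
        if rate <= 0.80:
            problems.append(f"'{request}' compliance is not above 80%")
    if any(gap > 5 for gap in gaps_seconds):
        problems.append("high-p requests spaced more than 5 seconds apart")
    if low_p_delay_seconds > 5:
        problems.append("low-p request delayed more than 5 seconds")
    return problems
```

For example, a sequence of three requests with compliance rates of 0.95, 0.90, and 0.85, gaps of 3 and 4 seconds, and a 4-second delay before the low-p request returns an empty list.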

Key References

Cooper, J. O., Heron, T. E., & Heward, W. L. (2020). Applied Behavior Analysis (3rd ed.). Pearson.
Ferster, C. B., & Skinner, B. F. (1957). Schedules of Reinforcement. Appleton-Century-Crofts.
Fisher, W., Piazza, C. C., Bowman, L. G., Hagopian, L. P., Owens, J. C., & Slevin, I. (1992). A comparison of two approaches for identifying reinforcers for persons with severe and profound disabilities. Journal of Applied Behavior Analysis, 25(2), 491–498.
DeLeon, I. G., & Iwata, B. A. (1996). Evaluation of a multiple-stimulus presentation format for assessing reinforcer preferences. Journal of Applied Behavior Analysis, 29(4), 519–533.
Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4(3), 267–272.
Nevin, J. A. (1992). An integrative model for the study of behavioral momentum. Journal of the Experimental Analysis of Behavior, 57(3), 301–316.

