---
name: context-clip
description: Create context-limited versions of evaluations. Use when you need to simulate lower context windows or compare model performance at different context limits. Keywords - context, clip, truncate, window, limit, simulate, 8k, 12k, 16k.
metadata: {"author":"the-crypt-keeper"}
---
# Context Clip Skill

You are helping the user create context-limited versions of existing evaluations using the `cohort.py` tool.
## Workflow

When the user invokes `/context-clip <cohort-or-pattern> <eval-pattern> <contexts>`, follow these steps:

**Important:** Use the AskUserQuestion tool for all user confirmations to maintain smooth flow without breaking execution.
### 1. Parse User Input
Extract from the command:
- Cohort pattern: Cohort path or pattern (e.g., `Qwen3-Next-80B`, `data/m12x/Qwen*`, or empty for all)
- Eval pattern: Description of which eval to process (e.g., "thinking 16k", "fp16", "awq")
- Target contexts: One or more context limits (e.g., "12k", "8k", "12k and 8k", "8192", "12288")
Input flexibility:
- Cohort can be just the model name (prefix `data/m12x/` if needed)
- Cohort can be a glob pattern for multiple cohorts
- Eval pattern can be any substring or keywords
- Contexts can be in K notation (8k, 12k) or raw tokens (8192, 12288)
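The argument handling above can be sketched in Python. This is an illustrative parser, not part of `cohort.py`; the `data/m12x/` prefixing rule follows the flexibility notes, and `shlex` keeps quoted eval patterns intact:

```python
import shlex

def parse_invocation(command: str):
    """Split a /context-clip invocation into its three arguments.

    Hypothetical helper: assumes all three arguments are present;
    the empty-cohort ("all cohorts") case is not handled here.
    """
    parts = shlex.split(command)
    assert parts[0] == "/context-clip"
    cohort, eval_pattern, contexts = parts[1], parts[2], parts[3]
    # Prefix the default data directory when a bare model name is given.
    if "/" not in cohort:
        cohort = f"data/m12x/{cohort}"
    return cohort, eval_pattern, contexts

print(parse_invocation('/context-clip Qwen3-Next-80B "thinking 16k fp16" 12k'))
# ('data/m12x/Qwen3-Next-80B', 'thinking 16k fp16', '12k')
```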
### 2. List Available Evals

Discover what evals exist using `cohort.py list`:

```shell
python cohort.py list data/m12x/<CohortName>
python cohort.py list 'data/m12x/<Pattern>*/'
python cohort.py list
```
Parse the markdown table output to extract cohort, eval_id, label, and groups for each eval.
### 3. Match Eval Pattern
Filter the evals list using the user's pattern:
- Case-insensitive substring match on: label, cohort, or group values
- Show all matches to the user
Use AskUserQuestion:
- If 1 match: "Confirm eval?" with the matched eval as default option
- If 2-4 matches: Present all as options, multiSelect=false
- If 5+ matches: Show list and ask user to be more specific, then re-run
Question format:
Which eval should be context-clipped?
Options:
[cohort] [eval_id] Label
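The matching rule above can be sketched as follows. The record fields mirror the columns parsed from `cohort.py list`, but the sample data and the keyword-AND semantics are illustrative assumptions:

```python
def match_evals(evals, pattern):
    """Return evals whose label, cohort, or group values contain every
    whitespace-separated keyword of `pattern` (case-insensitive)."""
    keywords = pattern.lower().split()
    matches = []
    for ev in evals:
        haystack = " ".join(
            [ev["label"], ev["cohort"], *ev.get("groups", [])]
        ).lower()
        if all(kw in haystack for kw in keywords):
            matches.append(ev)
    return matches

evals = [  # illustrative records, not real cohort data
    {"cohort": "Qwen3-Next-80B", "eval_id": "bc8eef",
     "label": "Qwen3-Next-80B Thinking (FP16, 16k)", "groups": ["thinking"]},
    {"cohort": "Qwen3-Next-80B", "eval_id": "5a2b3c",
     "label": "Qwen3-Next-80B Instruct (FP16, 16k)", "groups": ["instruct"]},
]
print([e["eval_id"] for e in match_evals(evals, "thinking 16k")])  # ['bc8eef']
```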
### 4. Parse Target Contexts

Convert user's context specifications to token counts:
- "8k" → 8192
- "12k" → 12288
- "16k" → 16384
- "32k" → 32768
- Already numeric → use as-is
If user specified multiple contexts (e.g., "12k and 8k"), process all of them.
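The k-notation conversion is mechanical; a minimal sketch, folding in the 1k-128k bounds check listed under Edge Cases:

```python
def parse_context(spec: str) -> int:
    """Convert '8k' / '12K' / '8192' to a token count and validate it."""
    spec = spec.strip().lower()
    tokens = int(spec[:-1]) * 1024 if spec.endswith("k") else int(spec)
    if not 1024 <= tokens <= 131072:
        raise ValueError(f"context {tokens} outside supported range 1k-128k")
    return tokens

print([parse_context(s) for s in ["8k", "12k", "16384"]])  # [8192, 12288, 16384]
```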
Use AskUserQuestion to confirm:
Create context-limited versions at: 12288, 8192 tokens?
Options:
- Yes, proceed
- Modify context values
If "Modify context values", ask again with editable input.
### 5. Check for Existing Context Variants

Before processing, check if context-limited versions already exist:

```shell
python cohort.py list <cohort-path> | grep "ctx"
```
If found, use AskUserQuestion:
Existing context-limited versions found:
- [eval_id] Label (8k ctx)
- [eval_id] Label (12k ctx)
Options:
- Skip existing, create missing only
- Recreate all (note: the tool still skips folders that already exist)
- Cancel
### 6. Execute Context Clipping

For each target context, run:

```shell
source venv/bin/activate
python cohort.py context <cohort-path> \
  --eval-id <eval_id> \
  --context <context_tokens>
```
Show progress for each context:
```
Processing 12288 token limit...
✓ Created 3 result folders, clipped X samples
✓ Updated evals.json
Processing 8192 token limit...
✓ Created 3 result folders, clipped Y samples
✓ Updated evals.json
```
### 7. Verify Results

After processing, list the cohort again to show the new evals:

```shell
python cohort.py list <cohort-path>
```
Show the user:
- Original eval
- New context-limited eval(s)
- Their eval_ids for future reference
### 8. Suggest Next Steps
Inform the user:
Context-limited evaluations created successfully!
To analyze, you'll need to:
1. Add the cohort to a dataset config (data/*.json)
2. Rebuild the dataset database
3. Run analysis commands
Example workflow:

```shell
# If cohort already in a dataset, just rebuild:
python evaluate.py --dataset data/<dataset>.json

# Then compare performance across contexts:
python analyze.py cluster data/<dataset>.json \
  --filters '{"eval_id": ["<original>", "<ctx12k>", "<ctx8k>"]}' \
  --stack sampler

# Or examine truncation patterns:
python analyze.py surface data/<dataset>.json \
  --filters '{"eval_id": ["<ctx8k>"], "base_task": "arithmetic"}' \
  --output-dir research/<project>/
```
## Edge Cases
- No matches found: Ask user to check pattern or list all evals
- Multiple target contexts: Process sequentially, report progress
- Multiple cohorts match pattern: Ask user to specify which cohort
- Already processed: Skip with warning, or confirm recreation
- Source eval has a lower context than the target: error; a source cannot be clipped to a context larger than its own
- Invalid context value: Must be between 1024 and 131072 (1k-128k)
- Cohort not in any dataset: Inform user they'll need to add to a dataset config for analysis
## Examples

### Example 1: Simple Case

User: `/context-clip Qwen3-Next-80B "thinking 16k fp16" 12k`
1. Cohort: data/m12x/Qwen3-Next-80B
2. List evals: python cohort.py list data/m12x/Qwen3-Next-80B
3. Found 1 match:
[bc8eef] Qwen3-Next-80B Thinking (FP16, 16k)
4. Confirm: Yes
5. Target contexts: 12288 tokens
6. Execute:
   `python cohort.py context data/m12x/Qwen3-Next-80B --eval-id bc8eef --context 12288`
7. ✓ Created 3 result folders, clipped 335+835+1423 samples
8. ✓ New eval_id: 221ec5
### Example 2: Multiple Contexts

User: `/context-clip Qwen3-Next-80B fp16 "12k and 8k"`
1. Cohort: data/m12x/Qwen3-Next-80B
2. List evals: python cohort.py list data/m12x/Qwen3-Next-80B
3. Found 2 matches:
[bc8eef] Qwen3-Next-80B Thinking (FP16, 16k)
[5a2b3c] Qwen3-Next-80B Instruct (FP16, 16k)
4. Ask user to pick → [bc8eef]
5. Target contexts: 12288, 8192 tokens
6. Confirm: Yes
7. Execute:
   12k: ✓ Created [221ec5] (335+835+1423 samples clipped)
   8k: ✓ Created [8fc62e] (1314+3717+5998 samples clipped)
8. Show both new eval_ids and suggest next steps
### Example 3: Pattern Matching Multiple Cohorts

User: `/context-clip "Qwen*" fp16 12k`
1. List all: python cohort.py list 'data/m12x/Qwen*/'
2. Found evals in multiple cohorts:
- Qwen3-32B
- Qwen3-Next-80B
- Qwen3-14B
3. Ask user which cohort → Qwen3-32B
4. List evals in that cohort... found 3 fp16 variants
5. Ask user to be more specific
6. User refines: "qwen3-32b fp16 thinking"
7. Found 1 match → confirm → execute
## Context Value Guidelines

Common context limits and use cases:

| Context | Tokens | Use Case |
|---|---|---|
| 2k | 2048 | Extreme resource constraints |
| 4k | 4096 | Mobile/edge deployment |
| 8k | 8192 | Standard constrained deployment |
| 12k | 12288 | Moderate context tasks |
| 16k | 16384 | Extended reasoning tasks |
Guidelines:
- Source eval should have higher context than target
- Common pattern: 16k → 12k, 8k for comparative analysis
- Avoid creating too many variants (stick to 2-3 key points)
## Notes
- Always work from the ReasonScape root directory
- Activate venv before running cohort.py: `source venv/bin/activate`
- Context clipping is non-destructive (creates new folders)
- Each context limit gets a unique eval_id
- Original result folders are never modified
- The tool updates evals.json automatically
- New result folders include postprocess metadata in experiment_config.yaml
- Cohorts can be used across multiple datasets
- After clipping, you may need to add the cohort to a dataset config for analysis
- Be explicit about what you're doing at each step
- If uncertain about anything, ask the user before proceeding