blog-calibrate

Name: Blog Calibrate
Author: jeremylongshore

// Monthly calibration report for the blog content tier classification system. Analyzes decisions.jsonl and feedback.jsonl to detect tier inflation, assess calibration accuracy, and surface emergent patterns. Trigger with /blog-calibrate.

updated:May 16, 2026 at 20:08

package.json

"author": "jeremylongshore"

"repository": "jeremylongshore/startaitools.com"

name	blog-calibrate
description	Monthly calibration report for the blog content tier classification system. Analyzes decisions.jsonl and feedback.jsonl to detect tier inflation, assess calibration accuracy, and surface emergent patterns. Trigger with /blog-calibrate.
version	1.0.0
author	Jeremy Longshore <jeremy@intentsolutions.io>
license	Proprietary
tags	["content","blog","methodology","calibration","analytics"]
allowed-tools	Read,Write,Edit,Glob,Grep,Bash(command:*),Agent
argument-hint	[YYYY-MM]
model	opus
compatibility	Designed for Claude Code

blog-calibrate

Overview

Generates a monthly calibration report from the blog-backfill methodology tracking system. Analyzes classification decisions, feedback assessments, and accumulated patterns to detect drift, validate the classification algorithm, and surface actionable improvements.

Grounded in Annie Duke's decision quality framework (Thinking in Bets): good decisions can have bad outcomes and vice versa. The calibration report separates decision quality from outcome quality.

Prerequisites

/home/jeremy/000-projects/blog/startaitools/.claude/skills/blog-backfill/methodology/decisions.jsonl — at least 10 entries
/home/jeremy/000-projects/blog/startaitools/.claude/skills/blog-backfill/methodology/feedback.jsonl — optional but improves accuracy
/home/jeremy/000-projects/blog/startaitools/.claude/skills/blog-backfill/methodology/patterns.jsonl — optional, accumulated rules

Arguments

$ARGUMENTS = YYYY-MM (month to calibrate). If omitted, uses previous month.
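The argument rule above could be resolved along the following lines. This is a minimal sketch, not the skill's actual implementation: the function name `resolve_target_month` is hypothetical, and passing `today` explicitly is an assumption made so the fallback is easy to test.

```python
import re
from datetime import date
from typing import Optional, Tuple


def resolve_target_month(argument: Optional[str], today: date) -> Tuple[int, int]:
    """Return (year, month) for a YYYY-MM argument, or the month before `today` if omitted."""
    if argument and argument.strip():
        match = re.fullmatch(r"(\d{4})-(0[1-9]|1[0-2])", argument.strip())
        if not match:
            raise ValueError(f"expected YYYY-MM, got {argument!r}")
        return int(match.group(1)), int(match.group(2))
    # No argument: fall back to the previous month, rolling the year back in January.
    if today.month == 1:
        return today.year - 1, 12
    return today.year, today.month - 1
```

For example, `resolve_target_month(None, date(2026, 1, 3))` resolves to December 2025.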

Instructions

Step 1: Load Data

Read all three JSONL files. Filter decisions to the target month. Load all feedback (may reference earlier decisions). Load all patterns.

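# Count records for the month. YEAR and MONTH come from the YYYY-MM argument;
# the pattern tolerates an optional space after the colon.
grep -cE "\"date\": ?\"$YEAR-$MONTH" /home/jeremy/000-projects/blog/startaitools/.claude/skills/blog-backfill/methodology/decisions.jsonl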

If fewer than 5 decisions exist for the month, warn that sample size is too small for reliable calibration.
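The loading and filtering in Step 1 can also be done by parsing each line as JSON, which does not depend on how the file is spaced. The sketch below assumes only what the grep command implies, namely that each decision carries a `date` field beginning with `YYYY-MM`. The helper names are hypothetical.

```python
import json
from pathlib import Path

MIN_MONTHLY_DECISIONS = 5  # sample-size threshold stated in Step 1


def load_jsonl(path):
    """Read one JSON object per non-empty line; a missing (optional) file yields []."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def decisions_for_month(decisions, year, month):
    """Keep decisions whose `date` field (assumed ISO-like) starts with YYYY-MM."""
    prefix = f"{year:04d}-{month:02d}"
    return [d for d in decisions if str(d.get("date", "")).startswith(prefix)]


def sample_size_warning(monthly):
    """Return a warning string when the month has too few decisions, else None."""
    if len(monthly) < MIN_MONTHLY_DECISIONS:
        return (f"Only {len(monthly)} decisions this month; "
                "sample size is too small for reliable calibration.")
    return None
```

Feedback and patterns would be loaded with the same `load_jsonl` call but left unfiltered, since feedback may reference earlier decisions.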

Step 2: Tier Distribution Analysis

Calculate:

Tier 1/2/3 counts and percentages for the month
Compare against expected distribution (60-70% T1, 25-35% T2, 5-10% T3)
Compare against previous month (if data exists)
Trend direction: inflating, stable, or deflating

Inflation detection rules:

Tier 1 < 55% → WARNING: probable inflation
Tier 3 > 15% → WARNING: Tier 3 overuse
Tier 2 increasing month-over-month for 3+ months → WARNING: grade creep
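The distribution check and the three inflation rules can be sketched as follows. Two things here are assumptions rather than part of the skill: tiers are passed as plain integers 1 to 3, and "increasing month-over-month for 3+ months" is read as three consecutive rises in the Tier 2 share, which needs four monthly values.

```python
from collections import Counter


def tier_percentages(tiers):
    """Map each tier (1, 2, 3) to its share of the month's decisions, in percent."""
    counts = Counter(tiers)
    total = len(tiers)
    return {t: (100.0 * counts.get(t, 0) / total if total else 0.0) for t in (1, 2, 3)}


def inflation_warnings(pct, tier2_history=()):
    """Apply the Step 2 rules.

    pct: this month's percentages from tier_percentages().
    tier2_history: Tier 2 percentages for earlier months, oldest first (assumed input).
    """
    warnings = []
    if pct[1] < 55:
        warnings.append("WARNING: probable inflation")
    if pct[3] > 15:
        warnings.append("WARNING: Tier 3 overuse")
    # Three consecutive rises need the current month plus the three before it.
    recent = (list(tier2_history) + [pct[2]])[-4:]
    if len(recent) == 4 and all(a < b for a, b in zip(recent, recent[1:])):
        warnings.append("WARNING: grade creep")
    return warnings
```

A month with 5 Tier 1, 3 Tier 2, and 2 Tier 3 posts gives 50% / 30% / 20%, which trips both the inflation and the Tier 3 overuse warnings.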

Step 3: Confidence Calibration (Brier Score)

For decisions that have feedback records:

Calculate Brier score: avg((confidence - was_correct)^2), where was_correct is 1 for a correct classification and 0 otherwise
Perfect calibration = 0.0, worst = 1.0
Target: < 0.15

Interpretation:

< 0.10: Well calibrated
0.10-0.20: Acceptable, minor adjustments needed
0.20-0.30: Miscalibrated, review dimension scoring anchors
> 0.30: Seriously miscalibrated, retrain on calibration examples
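As a worked illustration of the Brier score, the sketch below takes (confidence, was_correct) pairs from joined decision and feedback records. How the two files are joined is not specified here, so the pairs are an assumed input. The interpretation bands share their edges (0.20 appears in two of them), so this sketch assigns each edge to the worse band except 0.30 itself, which is an assumption.

```python
def brier_score(pairs):
    """Mean squared gap between stated confidence and the 0/1 outcome.

    pairs: iterable of (confidence in [0, 1], was_correct as bool).
    Returns None when there is no feedback to score.
    """
    pairs = list(pairs)
    if not pairs:
        return None
    return sum((conf - (1.0 if ok else 0.0)) ** 2 for conf, ok in pairs) / len(pairs)


def interpret_brier(score):
    """Map a Brier score onto the Step 3 interpretation bands."""
    if score < 0.10:
        return "Well calibrated"
    if score < 0.20:
        return "Acceptable, minor adjustments needed"
    if score <= 0.30:
        return "Miscalibrated, review dimension scoring anchors"
    return "Seriously miscalibrated, retrain on calibration examples"
```

For four feedback records `[(0.9, True), (0.8, True), (0.7, False), (0.6, True)]` the squared errors are 0.01, 0.04, 0.49, and 0.16, so the score is 0.70 / 4 = 0.175: acceptable, but above the 0.15 target.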

Step 4: Decision Quality Matrix (Annie Duke 2x2)

Cross-tabulate confidence vs correctness:

|  | Good Outcome (correct) | Bad Outcome (incorrect) |
|---|---|---|
| High Confidence (>0.7) | Good Decision + Good Outcome | Good Decision + Bad Outcome (learn why) |
| Low Confidence (<0.7) | Lucky (examine why unsure) | Bad Decision + Bad Outcome (fix process) |

Focus attention on:

"Good Decision + Bad Outcome" quadrant → these reveal blind spots in the dimension scoring
"Lucky" quadrant → these reveal under-confidence, possibly over-applying anti-inflation rules
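The cross-tabulation can be sketched as a small classifier plus a counter. The matrix leaves a confidence of exactly 0.7 unassigned, so treating it as low confidence is an assumption of this sketch, as are the function names.

```python
from collections import Counter

HIGH_CONFIDENCE = 0.7  # threshold from the matrix; exactly 0.7 counts as low here (assumption)


def quadrant(confidence, was_correct):
    """Place one decision into the 2x2 decision quality matrix."""
    high = confidence > HIGH_CONFIDENCE
    if high and was_correct:
        return "Good Decision + Good Outcome"
    if high:
        return "Good Decision + Bad Outcome"
    if was_correct:
        return "Lucky"
    return "Bad Decision + Bad Outcome"


def quality_matrix(pairs):
    """Count decisions per quadrant from (confidence, was_correct) pairs."""
    return Counter(quadrant(conf, ok) for conf, ok in pairs)
```

The two counts to read first are `"Good Decision + Bad Outcome"` and `"Lucky"`, matching the focus list above.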

Step 5: Dimension Analysis

For each dimension (NOV, ARC, NAR, TCH, SCP, RPR):

Average score across all decisions
Score distribution (histogram)
Most common score level
Correlation with tier assignment

Anti-pattern detection:

If one dimension consistently drives tier upgrades (e.g., SCP always high), check if the anti-inflation rule "high scope alone doesn't escalate" is being enforced
If NAR is consistently 0-1 but posts are being classified Tier 2+, the algorithm may be too lenient on the Tier 2 gate
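The per-dimension statistics can be sketched as below. The record layout is an assumption: each decision is taken to hold a numeric `tier` and a `scores` mapping keyed by the six dimension codes. Pearson correlation is used as one reasonable reading of "correlation with tier assignment", and the function expects at least one decision.

```python
from collections import Counter
from statistics import mean

DIMENSIONS = ("NOV", "ARC", "NAR", "TCH", "SCP", "RPR")


def pearson(xs, ys):
    """Pearson correlation; None when either series has no variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5


def dimension_stats(decisions):
    """Average, histogram, most common score, and tier correlation per dimension."""
    tiers = [d["tier"] for d in decisions]
    stats = {}
    for dim in DIMENSIONS:
        scores = [d["scores"][dim] for d in decisions]  # hypothetical field layout
        histogram = Counter(scores)
        stats[dim] = {
            "average": mean(scores),
            "histogram": dict(sorted(histogram.items())),
            "most_common": histogram.most_common(1)[0][0],
            "tier_correlation": pearson(scores, tiers),
        }
    return stats
```

A dimension whose correlation with tier sits near 1.0 while the others stay flat is a candidate for the "one dimension drives upgrades" anti-pattern.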

Step 6: Anti-Inflation Flag Analysis

Count how often each flag was triggered
Check for flags that should have triggered but didn't
Identify posts where anti-inflation overrides changed the tier
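Counting flag triggers and listing overridden posts could look like the sketch below. The decision schema is not shown in this document, so every field name used here (`anti_inflation_flags`, `initial_tier`, `tier`, `slug`) is hypothetical and would need to be mapped onto the real records. Detecting flags that should have fired but did not depends on the flag definitions themselves and is left out.

```python
from collections import Counter


def flag_summary(decisions):
    """Return (trigger counts per flag, slugs whose tier was changed by an override).

    All field names are illustrative assumptions, not the skill's actual schema.
    """
    counts = Counter(
        flag for d in decisions for flag in d.get("anti_inflation_flags", [])
    )
    overridden = [
        d["slug"]
        for d in decisions
        if d.get("initial_tier") is not None and d["initial_tier"] != d["tier"]
    ]
    return counts, overridden
```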

Step 7: Pattern Learning

If 30+ decisions exist in the dataset:

Look for emergent classification rules not in the original algorithm
Check for day-of-week effects (e.g., Mondays consistently Tier 1)
Check for project-specific patterns (e.g., repo X work always scores high on ARC)
Surface any patterns worth adding to patterns.jsonl
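The day-of-week check is one pattern that is easy to sketch. This version assumes each decision has an ISO `date` string and a `tier`, and it reports a weekday only when every decision on that day landed in the same tier. The default `min_count` of 10 borrows the "10+ decisions" consistency bar from Step 9, and the 30-decision gate is left to the caller.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def weekday_patterns(decisions, min_count=10):
    """Map weekday name -> tier for weekdays whose decisions all share one tier."""
    by_day = defaultdict(list)
    for d in decisions:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        weekday = date.fromisoformat(d["date"][:10]).weekday()
        by_day[DAY_NAMES[weekday]].append(d["tier"])
    return {
        day: tiers[0]
        for day, tiers in by_day.items()
        if len(tiers) >= min_count and len(set(tiers)) == 1
    }
```

Project-specific patterns could be found the same way by grouping on a repository field instead of the weekday.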

Step 8: Generate Report

Write the calibration report to /home/jeremy/000-projects/blog/startaitools/.claude/skills/blog-backfill/methodology/calibration-$YEAR-$MONTH.md.

Output Format

# Calibration Report: {Month YYYY}

## Summary
- Decisions analyzed: {N}
- Feedback records: {N}
- Overall health: HEALTHY / NEEDS ATTENTION / MISCALIBRATED

## Tier Distribution
| Tier | Count | Actual % | Expected % | Status |
|------|-------|----------|------------|--------|
| 1    | {N}   | {%}      | 60-70%     | OK/WARN |
| 2    | {N}   | {%}      | 25-35%     | OK/WARN |
| 3    | {N}   | {%}      | 5-10%      | OK/WARN |

Trend: {inflating / stable / deflating} (vs previous month)

## Calibration Accuracy
- Brier score: {N} ({interpretation})
- Sample size: {N} feedback records
- Confidence histogram: ...

## Decision Quality Matrix
{2x2 table with counts and key takeaways}

## Dimension Analysis
{Per-dimension stats and anti-pattern flags}

## Anti-Inflation Effectiveness
{Flag trigger counts and gaps}

## Emergent Patterns
{New rules discovered, if any — candidates for patterns.jsonl}

## Recommendations
1. {Specific actionable recommendation}
2. {Specific actionable recommendation}
3. {Specific actionable recommendation}

Step 9: Update Patterns (if warranted)

If Step 7 discovered reliable patterns (consistent across 10+ decisions), append each one to methodology/patterns.jsonl as a single-line JSON object with these fields (pretty-printed here for readability):

{
  "pattern_id": "auto-YYYY-MM-NNN",
  "name": "Pattern name",
  "description": "What the pattern is",
  "direction": "upgrade|downgrade|neutral",
  "conditions": "When it applies",
  "evidence": "Data supporting it",
  "discovered_date": "YYYY-MM-DD",
  "times_applied": 0,
  "active": true
}
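Because a JSONL file holds one object per line, the record above has to be serialized without line breaks when it is appended. A minimal sketch follows. The helper names are hypothetical, and reading `NNN` in `pattern_id` as a zero-padded sequence number is an assumption.

```python
import json


def next_pattern_id(year, month, existing_count):
    """Build an id of the form auto-YYYY-MM-NNN (NNN assumed to be a running number)."""
    return f"auto-{year:04d}-{month:02d}-{existing_count + 1:03d}"


def append_pattern(path, pattern):
    """Append one pattern to a JSONL file as a single line."""
    with open(path, "a", encoding="utf-8") as handle:
        # json.dumps emits no newlines by default, so each record stays on one line.
        handle.write(json.dumps(pattern, ensure_ascii=False) + "\n")
```

Opening the file in append mode leaves existing patterns untouched, which matters because the JSONL files are the source of truth.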

Step 10: Rebuild SQLite Index

cd /home/jeremy/000-projects/blog/startaitools/.claude/skills/blog-backfill
sqlite3 methodology/index.db < methodology/rebuild-index.sql

Error Handling

No decisions for month: Print "No classification data for {month}. Run /blog-backfill first."
No feedback records: Run report without calibration accuracy. Note: "Brier score unavailable — run /blog-feedback on past posts to populate."
SQLite rebuild fails: Non-fatal. JSONL files are source of truth.

Examples

/blog-calibrate 2026-04

Generates calibration report for April 2026. Analyzes tier distribution, Brier score, dimension patterns, and surfaces recommendations.

/blog-calibrate

No argument — calibrates previous month automatically.
