| name | data-source-watcher |
| description | Proactive monitor for credible government data sources (IRS, BLS, SSA, DOL, state revenue departments, Census, O*NET). Detects upstream releases, opens draft refresh PRs for affected tools and articles, and maintains the user-visible "new data coming soon" banner. Operationalizes the update-velocity moat — being among the first sites to ship authoritative numbers when they change. Used by Stage 1 Discovery (surfaces freshness opportunities), Stage 4 Creation (auto-drafts refresh PRs), and the tool-cluster strategy. |
| metadata | {"family":"ops","owner":"seo","last_reviewed":"2026-05-01T00:00:00.000Z","version":"1.0.0","related_skills":["tool-fit","tool-fit-validation","content-refresh","seo-foundations","opportunity-discovery"],"kpis":["Time-to-ship after authoritative source release ≤72h for IRS / SSA; ≤14d for BLS monthly; ≤30d for ACS annual","Zero published tools showing data older than the most recent authoritative release without a 'new data coming soon' banner","≥1 'we updated within X hours' linkable-asset PR opportunity per quarter (Tier 3 backlink velocity)"],"marketing_pillar":4,"seo_standard":"B_C_E","kpi_tier":13,"funnel_stage":"all","content_class":"transactional","maturity_stage":"predictive","used_by_stages":[1,4]} |
Data Source Watcher
Proactive monitor for credible government data sources. Operationalizes the update-velocity moat — pillar 3 of the six-pillar moat per seo/tool-fit/reference/moat-scoring.md.
Why This Exists
A take-home pay calculator showing 2024 tax tables in 2026 is worse than no calculator: it damages credibility, suppresses Tier 3 brand search, and raises manual-action risk on YMYL pages. Reactive freshness routing (via ops/content-refresh) catches pages after they have gone stale; the watcher catches the upstream release before the page goes stale.
Two outcomes:
- Speed-to-ship moat: be among the first sites to publish authoritative numbers when they update. For example, when the IRS releases 2027 tables in November 2026 and SERP competitors take 2-4 weeks to update, we ship inside 72 hours and own the search interest during the gap. The speed itself becomes a Tier 3 brand-search and link-bait asset (a "we updated within 48h" methodology page is itself linkable).
- Honest freshness signal: when a source has updated but our integration hasn't shipped yet, the tool surfaces a user-visible banner such as "BLS 2026 OEWS published; Career Hub updates within 30 days" (30 days being the OEWS SLA). Honesty earns trust that static SERP competitors can't match.
When to Use
- Continuous (scheduled): monitor authoritative-source release calendars + RSS feeds. No human input required for the watch loop.
- Stage 1 Discovery (weekly): emits "freshness opportunities", meaning tools that need a refresh because an upstream source has released. Discovery surfaces these as 90_10_class: refresh candidates.
- Stage 4 Creation (event-driven): when a release fires, auto-drafts a refresh PR for affected tools / articles, including data-table updates and "Last verified" date bumps.
Do Not Use When
- Reactive routing of already-stale content. That is the job of ops/content-refresh. The watcher is proactive: it knows in advance when sources are going to release.
- Editorial content refresh (article guides). Articles use ops/content-refresh SLAs, not source-release events.
- Owned platform data (Indeed Flex internal datasets) — different release cadence; not in scope here.
Authoritative Sources Watched
Career Hub clusters (per seo-foundations/reference/tool-cluster-map.md) drive what's monitored. Initial source list:
Pay and Tax cluster
| Source | What it publishes | Cadence | Watch method | SLA after release |
|---|---|---|---|---|
| IRS Pub 15-T | Federal withholding tables | Annual ~Nov | RSS feed + IRS press release | 72h |
| IRS Pub 15 (Circular E) | Employer tax guide | Annual | Pub 15-T release co-occurrence | 72h |
| SSA | FICA wage base, COLA | Annual ~Oct | SSA press release | 72h |
| All 50 state revenue departments | State withholding tables | Annual + ad-hoc | State portal scrape (cached) | 7 days |
| DOL Wage and Hour Division | FLSA + minimum wage rules | Ad-hoc | DOL announcements | 7 days |
| BLS OEWS | Occupational wages | Annual ~Mar/May | BLS release calendar | 30 days |
Cost and Trade-offs cluster
| Source | What it publishes | Cadence | Watch method | SLA after release |
|---|---|---|---|---|
| BLS CPI | Consumer Price Index | Monthly | BLS schedule | 14 days |
| Census ACS 5-Year | Commute, housing, demographics | Annual ~Dec | Census release calendar | 30 days |
| DOL UI rules | State unemployment rules | Ad-hoc | State labor department portals | 7 days |
Career and Skills cluster
| Source | What it publishes | Cadence | Watch method | SLA after release |
|---|---|---|---|---|
| BLS Occupational Outlook Handbook | Role projections | Biennial | BLS release | 30 days |
| O*NET-OnLine | Skills, tasks | Quarterly | O*NET update notice | 30 days |
Shift and Schedule cluster
| Source | What it publishes | Cadence | Watch method | SLA after release |
|---|---|---|---|---|
| DOL FLSA rules | Overtime, breaks | Ad-hoc | DOL announcements | 30 days |
| State paid-leave laws | Paid sick, family leave | Ad-hoc | State labor portals | 7 days |
The watcher does not monitor commercial competitors, paid data vendors, or non-authoritative sources. Government and quasi-government only — the credibility is the moat.
Output Artifacts
1. Release detection log
nextjs-app/docs/research/data-source-watcher/releases/<YYYY-MM-DD>-<source>.md — one record per detected release:
source: irs-pub-15-t
release_date: 2026-11-15
release_url: <official-source-url>
detected_at: 2026-11-15T14:23:00Z
affected_tools: [paycheck-calculator, take-home-pay, tax-calculator, salary-converter, hourly-to-salary]
affected_articles: []
sla_target: 2026-11-18T14:23:00Z
draft_pr_opened: <github-pr-url>
banner_displayed: true
banner_text: "IRS 2027 withholding tables published; Career Hub updates within 72h"
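In the sample record, sla_target is exactly 72 hours after detected_at. A minimal sketch of that arithmetic follows; the function name computeSlaTarget is hypothetical, and starting the SLA clock at detection time (rather than at the official release time) is an assumption read off the sample values.

```typescript
// Hypothetical helper: derives sla_target from detected_at plus the source's SLA.
// Assumption: the SLA clock starts at detection time, as the sample record suggests.
function computeSlaTarget(detectedAt: string, slaHours: number): string {
  const startMs = Date.parse(detectedAt);
  if (Number.isNaN(startMs)) {
    throw new Error(`Invalid detected_at timestamp: ${detectedAt}`);
  }
  const target = new Date(startMs + slaHours * 60 * 60 * 1000);
  // toISOString() always emits milliseconds; drop them to match the record format.
  return target.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Reproduces the sample record: prints "2026-11-18T14:23:00Z"
console.log(computeSlaTarget("2026-11-15T14:23:00Z", 72));
```

For sources with day-based SLAs, the same helper would be called with 7 * 24 or 30 * 24 hours.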
2. Auto-drafted refresh PR
For each detected release, the watcher opens a draft PR:
- Resolves each affected tool through nextjs-app/src/features/tools/shared/data/tool-registry/data.ts before editing.
- Updates the tool's actual owning data/config/module, such as:
  - shared registry fields under nextjs-app/src/features/tools/shared/data/tool-registry/
  - calculator-owned data under nextjs-app/src/features/tools/<tool>/
  - shared tax/source tables under the existing state-pages or calculator data modules
  - page metadata, methodology, or source records owned by the route family
- Bumps dateModified or equivalent freshness metadata in the owning registry/data object.
- Updates the methodology page's "Last verified" stamp when the affected tool has one.
- Includes a checklist for the human reviewer: confirm new tables match the source PDF, run pnpm typecheck + pnpm test:unit for the calculator, and smoke-test the affected tool route.
Do not assume every tool has a direct per-tool data.ts; many current tools are registry-backed or use feature-specific modules. If ownership is ambiguous, hand off to code/tool-page-builder or code/data-placement before drafting the PR.
Reviewer (CPA / EA for YMYL) signs off; PR merges. If reviewer can't ship within SLA, the user-visible banner remains until ship.
3. User-visible freshness banner
A shared component (proposed: src/features/tools/shared/content/FreshnessBanner.tsx — coordinate with code/tool-page-builder if implementing) renders on tool pages when:
- An upstream source has released, AND
- The corresponding refresh PR has not yet merged
Banner copy template: "<source> <year> data published <date>; Career Hub updates within <SLA>."
The banner is part of the moat. A static SERP competitor can't honestly say this. The transparency itself converts.
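The two render conditions and the copy template can be sketched as one pure function. This is an illustration only: BannerInput and its field names are hypothetical, not the real registry schema or the proposed FreshnessBanner props.

```typescript
// Hypothetical input shape; field names are illustrative assumptions.
interface BannerInput {
  sourceLabel: string; // e.g. "IRS"
  dataYear: number; // e.g. 2027
  publishedOn: string; // e.g. "2026-11-15"
  slaLabel: string; // e.g. "72h"
  sourceReleased: boolean; // an upstream release has been detected
  refreshPrMerged: boolean; // the corresponding refresh PR has merged
}

// Returns the banner copy, or null when no banner should render.
function freshnessBannerText(input: BannerInput): string | null {
  // Both conditions must hold: released upstream AND not yet shipped here.
  const show = input.sourceReleased && !input.refreshPrMerged;
  if (!show) return null;
  return (
    `${input.sourceLabel} ${input.dataYear} data published ${input.publishedOn}; ` +
    `Career Hub updates within ${input.slaLabel}.`
  );
}
```

Keeping the decision in a pure function would let the component stay a thin renderer and make the show/hide rule unit-testable.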
4. Discovery opportunity emission
The watcher emits structured opportunity records that Stage 1 Discovery surfaces as 90_10_class: refresh candidates:
opportunity_id: refresh-paycheck-calc-irs-2027-2026-11-15
ninety_ten_class: refresh
target_tools: [paycheck-calculator, take-home-pay, ...]
trigger: irs-pub-15-t-release-2026-11-15
effort_hours: 4
deadline: 2026-11-18T14:23:00Z
priority: high
These records skip the full Discovery scoring — they're SLA-driven, not lift-driven. Discovery surfaces them in a separate "freshness queue" section of the weekly Slack gate.
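As a rough sketch, an opportunity record could be derived from a release record as below. The field names mirror the sample record; buildRefreshOpportunity and its argument names are hypothetical. The opportunity_id is passed in rather than generated, because its naming scheme is not specified here.

```typescript
// Shape mirrors the sample opportunity record above.
interface RefreshOpportunity {
  opportunity_id: string;
  ninety_ten_class: "refresh";
  target_tools: string[];
  trigger: string;
  effort_hours: number;
  deadline: string;
  priority: string;
}

// Hypothetical builder. Assumptions: the deadline equals the release record's
// sla_target, and the trigger is "<source>-release-<release_date>", both of
// which match the sample values.
function buildRefreshOpportunity(args: {
  opportunityId: string;
  source: string;
  releaseDate: string;
  affectedTools: string[];
  slaTarget: string;
  effortHours: number;
  priority: string;
}): RefreshOpportunity {
  return {
    opportunity_id: args.opportunityId,
    ninety_ten_class: "refresh",
    target_tools: [...args.affectedTools], // copy so callers can't mutate the record
    trigger: `${args.source}-release-${args.releaseDate}`,
    effort_hours: args.effortHours,
    deadline: args.slaTarget,
    priority: args.priority,
  };
}
```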
Tools Allowed
- File system + web fetch (RSS feeds, source press releases, state portal HTML — cache aggressively per 040-mcp-usage.mdc)
- GitHub API (open draft PRs, read PR status)
- plugin-slack-slack (post to #career-hub-management when SLA-at-risk)
- pnpm scripts (typecheck, test:unit on draft PRs)
NOT allowed: paid data vendors (Semrush, etc.) — credibility moat depends on government / quasi-government sources only.
Workflow
Continuous loop (scheduled hourly during release windows, daily otherwise)
- Poll authoritative sources per the source list above (RSS feeds, release calendars, state portals).
- Detect new releases via content-hash + date comparison against last-seen state in data-source-watcher/state.json.
- Identify affected tools / articles via the cluster map's authoritative_sources field.
- Emit release record to releases/<date>-<source>.md.
- Open draft refresh PR for affected tools (one PR per cluster batched, not one PR per tool — reduces review load).
- Display freshness banner on affected tool pages until PR merges.
- Emit Discovery opportunity record for the next weekly Discovery batch.
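The detection step in the loop above (content hash plus date comparison against last-seen state) can be sketched as follows. The state shape is an assumption, since the layout of state.json is not specified here, and the rule "hash changed and release date not older than last seen" is one plausible reading of "content-hash + date comparison".

```typescript
// Hypothetical last-seen entry per source, as it might be stored in state.json.
interface SeenState {
  contentHash: string;
  releaseDate: string; // YYYY-MM-DD, so string comparison orders correctly
}

type WatcherState = Record<string, SeenState>;

interface Observation {
  source: string;
  contentHash: string;
  releaseDate: string; // YYYY-MM-DD
}

// Assumed rule: a release is new when the source has never been seen, or when
// the content hash changed and the release date is not older than the last one.
function isNewRelease(state: WatcherState, obs: Observation): boolean {
  const last = state[obs.source];
  if (last === undefined) return true;
  const hashChanged = last.contentHash !== obs.contentHash;
  const notOlder = obs.releaseDate >= last.releaseDate;
  return hashChanged && notOlder;
}

// Returns a new state object; the previous state is left untouched.
function recordObservation(state: WatcherState, obs: Observation): WatcherState {
  return {
    ...state,
    [obs.source]: { contentHash: obs.contentHash, releaseDate: obs.releaseDate },
  };
}
```

Returning a fresh state object rather than mutating in place makes it straightforward to write state.json only after the release record and draft PR have been created successfully.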
SLA-watch loop (continuous after a release)
- Track time-since-release per affected tool.
- At 50% of SLA elapsed → Slack reminder to reviewer.
- At 100% of SLA elapsed → escalate via the Slack approval gate and switch the banner copy from "Career Hub updates within <SLA>" (for example, 72h) to "Career Hub update in progress; data refresh underway".
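The 50% and 100% thresholds can be expressed as a small pure function. This is a sketch: the name slaStatus and the status labels are hypothetical, and the clock is assumed to start at the same timestamp used to compute the SLA target.

```typescript
type SlaStatus = "on_track" | "remind_reviewer" | "escalate";

// Hypothetical helper. `startedAt` and `now` are ISO-8601 timestamps.
function slaStatus(startedAt: string, slaHours: number, now: string): SlaStatus {
  const elapsedMs = Date.parse(now) - Date.parse(startedAt);
  const fraction = elapsedMs / (slaHours * 60 * 60 * 1000);
  if (fraction >= 1) return "escalate"; // 100% of SLA elapsed
  if (fraction >= 0.5) return "remind_reviewer"; // 50% of SLA elapsed
  return "on_track";
}

// For a 72h SLA starting 2026-11-15T14:23:00Z, the reminder fires at the
// 36-hour mark. Prints "remind_reviewer"
console.log(slaStatus("2026-11-15T14:23:00Z", 72, "2026-11-17T02:23:00Z"));
```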
Integration with Existing Skills
| Skill | Integration |
|---|---|
| seo/tool-fit | Tools committing to the update-velocity moat (pillar 3 of moat scorecard) must register with the watcher |
| ops/tool-fit-validation | K6 of the kill list requires watcher subscription before approving a tool that depends on changing data |
| ops/content-refresh | Reactive counterpart — handles already-stale content; the watcher prevents staleness in the first place |
| seo/seo-foundations | Standard B (E-E-A-T freshness) + Standard E (operational reliability with SLO/SLI) |
| code/tool-page-builder | freshnessSla field in registry + FreshnessBanner component placement |
Memory Layer Update
Append to nextjs-app/docs/reports/_index.json per release event:
{
"agent": "ops-data-source-watcher",
"stage": "satellite",
"run_at": "<ISO-8601>",
"inputs_hash": "<source-content-hash>",
"outputs_path": "nextjs-app/docs/research/data-source-watcher/releases/<date>-<source>.md",
"confidence": 1.0,
"human_decision_if_gated": null,
"source": "<source-id>",
"affected_tools": ["<slug>"],
"draft_pr": "<github-url>",
"sla_target": "<ISO-8601>",
"actual_ship": "<ISO-8601 | null>",
"sla_met": true
}
monthly-attribution-review reads these to compute time-to-ship distributions per source — the metric that proves (or disproves) the update-velocity moat.
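A per-source time-to-ship summary over these entries might look like the sketch below. It assumes run_at approximates the detection time and uses the median as the summary statistic; neither choice is specified here, and the IndexEntry type covers only the fields the calculation needs.

```typescript
// Subset of the _index.json entry fields used for the calculation.
interface IndexEntry {
  source: string;
  run_at: string; // assumed to approximate detection time
  sla_target: string;
  actual_ship: string | null;
}

// Hours between run_at and actual_ship, or null while unshipped.
function hoursToShip(e: IndexEntry): number | null {
  if (e.actual_ship === null) return null;
  return (Date.parse(e.actual_ship) - Date.parse(e.run_at)) / (60 * 60 * 1000);
}

// True/false once shipped; null while the refresh is still pending.
function slaMet(e: IndexEntry): boolean | null {
  if (e.actual_ship === null) return null;
  return Date.parse(e.actual_ship) <= Date.parse(e.sla_target);
}

// Median hours-to-ship per source, ignoring unshipped entries.
function medianHoursBySource(entries: IndexEntry[]): Map<string, number> {
  const bySource = new Map<string, number[]>();
  for (const e of entries) {
    const h = hoursToShip(e);
    if (h === null) continue;
    const list = bySource.get(e.source) ?? [];
    list.push(h);
    bySource.set(e.source, list);
  }
  const result = new Map<string, number>();
  bySource.forEach((hours, source) => {
    const sorted = [...hours].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    const median =
      sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    result.set(source, median);
  });
  return result;
}
```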
Failure Modes
- Source URL changes / RSS feed breaks: log + Slack alert; manual re-discovery of the new feed URL. State portal scrapes are particularly fragile — keep at least one alternative path (e.g., direct PDF download URL pattern).
- False-positive release detection: when a source republishes a fixed PDF without changing data. Compare structural hash of relevant tables, not document hash.
- SLA breach: banner remains visible until ship. Log the breach to monthly-attribution-review for trend analysis.
- Affected-tool list incomplete: cluster map is the single source of truth for source → tool mapping. If a new tool isn't mapped to its source, it won't get refresh PRs. Stage 4 Creation must register every new tool's authoritative sources at build time.
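For the false-positive case above, a structural hash could be computed over the extracted table cells rather than over the raw document bytes. The sketch below assumes the relevant table has already been parsed into rows of strings (PDF extraction is out of scope), and the normalization rules, which collapse whitespace and strip "$" and thousands separators, are illustrative assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input: a table already extracted from the source as rows of cells.
type TableRows = string[][];

// Hashes normalized cell values only, so re-typeset or re-exported documents
// with identical numbers produce the same hash.
function structuralHash(rows: TableRows): string {
  const normalized = rows.map((row) =>
    row.map((cell) =>
      cell
        .trim()
        .replace(/\s+/g, " ") // collapse runs of whitespace
        .replace(/[$,]/g, "") // drop currency symbols and thousands separators
    )
  );
  return createHash("sha256").update(JSON.stringify(normalized)).digest("hex");
}
```

With this approach a republished PDF whose bytes differ but whose table values are unchanged would not register as a release, while any changed value would.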
Anti-Patterns
- ❌ Polling sources every minute → wasted compute and IP-block risk. Daily for most sources; hourly only during known release windows.
- ❌ Auto-merging refresh PRs without YMYL review. Speed is the moat, but accuracy is the floor; never trade accuracy for speed on Pillar 3 tools.
- ❌ Hiding the freshness banner when ship is delayed — the banner is the moat. Hiding it loses the trust play.
- ❌ Watching commercial / private vendor sources. Credibility moat requires government / quasi-government sources only.
- ❌ One PR per tool. Cluster-batched PRs reduce review load and reflect the cluster strategy.
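The polling cadence (hourly during known release windows, daily otherwise) can be sketched as below. ReleaseWindow and pollIntervalHours are hypothetical names, and how windows would be derived from each source's release calendar is left open.

```typescript
// Hypothetical release window with inclusive ISO-8601 bounds.
interface ReleaseWindow {
  start: string;
  end: string;
}

// Returns 1 (hourly) inside any known release window, otherwise 24 (daily).
function pollIntervalHours(now: Date, windows: ReleaseWindow[]): number {
  const t = now.getTime();
  const inWindow = windows.some(
    (w) => t >= Date.parse(w.start) && t <= Date.parse(w.end)
  );
  return inWindow ? 1 : 24;
}
```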
References