| name | autoresearch-info-index |
| description | Improve repeatable news and message indexing workflows so time-sensitive claims become easier to verify, compare, and reuse. Use when the goal is to get better at collecting fast-moving public information, separating confirmed facts from interpretation, and producing decision-useful summaries with explicit uncertainty. |
# Autoresearch Info Index
Use this skill for repeated information-indexing workflows.
This is the task-specific layer for autoresearch-loop when the recurring job
is to turn fast-moving messages, headlines, or market-moving statements into a
clean, source-traceable index record.
## Dual-Track Hints

- treat requests containing `latest`, `today`, `current`, or `still` as Complex unless proven otherwise
- if the task expands to multiple stocks or multiple industry links, keep it Complex
- use `financial-services-docs/docs/runtime/codex-dual-track/context-pack-template.md` before long research runs
- finish with the delivery contract and include the exact cutoff date
Routing note for Reddit:

- prefer `agent-reach:reddit` when Reddit should be fetched as part of the broader live discovery surface and then compete with other channels inside `hot_topic_discovery`
- prefer `reddit-bridge` when you already have a saved Reddit payload, `posts.csv`, or a scraper export root such as `data/r_<subreddit>/posts.csv` or `data/u_<user>/posts.csv` and only need to normalize it into `news-index`
- when extending Reddit clustering or bridge metadata, update the bounded config under `references/reddit-cluster-aliases.json` and `references/reddit-community-profiles.json` before hardcoding more runtime branches
- if a Reddit export directory also includes `comments.csv`, prefer preserving that top-comment layer instead of dropping it; the bridge can now fold it into `top_comment_summary` and related metadata automatically
- if the Reddit payload already nests `comments` under each post item, preserve that structure instead of flattening it by hand before running the bridge
- if the same Reddit comment appears multiple times across export fragments, let the bridge deduplicate it conservatively and preserve the duplicate count as metadata instead of inflating `top_comment_count`
- if comments are only near-duplicates, keep them in the imported sample and surface `comment_near_duplicate_count` as an operator-review caution instead of merging them into the exact-duplicate path
- when near-duplicate comments are present, preserve the split between `comment_near_duplicate_same_author_count` and `comment_near_duplicate_cross_author_count`; cross-author repetition is a stronger caution signal than one author rephrasing the same point
- preserve a few bounded `comment_near_duplicate_examples` when operator review needs to inspect which comment pairs triggered the caution
- when Reddit comment imports are partial, keep the mismatch metadata (`comment_declared_count`, `comment_sample_coverage_ratio`, `comment_count_mismatch`) visible for operator review instead of pretending the sampled comments represent the whole thread
- prefer consuming the consolidated `comment_operator_review` block in `raw_metadata`, `source_items`, and clustered topic output when downstream tooling needs one bounded operator-review object instead of many scattered Reddit comment fields
- when triage order matters, prefer the emitted `operator_review_priority` object and result-level `operator_review_queue` instead of inventing a second manual ranking pass outside the runtime
- when the indexing result enters article, macro note, publish, reuse, or queue workflows, preserve the emitted `manual_review` / `publication_readiness` state instead of collapsing it back into a free-form note; downstream consumers should keep Reddit as a publication gate signal, not upgrade it into claim confirmation
- when you expose publish-side operator tooling, keep the same gate visible in human-readable outputs too: readiness reports, regression reports, reuse reports, push summaries, and queue reports should all surface the carried `manual_review` / `publication_readiness` state
- if comment freshness matters, prefer setting `comment_sort_strategy=hybrid` or `recency_then_score` explicitly instead of hardcoding a new ranking branch
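The conservative comment dedup described above can be sketched as follows. Exact duplicates collapse into one entry that carries a duplicate count, while near-duplicates stay in the sample and only raise the caution counters. The normalization step and the `difflib` similarity threshold are illustrative assumptions, not the bridge's actual implementation.

```python
from difflib import SequenceMatcher

def fold_comments(comments):
    """Fold exact duplicates, count near-duplicates; both paths stay visible."""
    seen, kept = {}, []
    near_same_author = near_cross_author = 0
    for c in comments:
        # Normalize whitespace and case before comparing (assumed normalization).
        key = " ".join(c["body"].lower().split())
        if key in seen:
            seen[key]["duplicate_count"] += 1  # exact-duplicate path: fold, keep count
            continue
        for prior in kept:
            prior_key = " ".join(prior["body"].lower().split())
            ratio = SequenceMatcher(None, key, prior_key).ratio()
            if 0.9 <= ratio < 1.0:  # near-duplicate: keep it, but raise a counter
                if c["author"] == prior["author"]:
                    near_same_author += 1
                else:
                    near_cross_author += 1
        entry = {**c, "duplicate_count": 1}
        seen[key] = entry
        kept.append(entry)
    return kept, {
        "comment_near_duplicate_same_author_count": near_same_author,
        "comment_near_duplicate_cross_author_count": near_cross_author,
    }
```

Note how cross-author and same-author repetition are tracked separately, matching the split the skill asks downstream consumers to preserve.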
## Use This When
- the same type of news or statement analysis happens repeatedly
- the output needs confirmed facts, explicit uncertainty, and reusable source links
- the goal is to improve the indexing process, not only to summarize one item once
Do not use this skill when:
- the task is mainly long-form company valuation
- the event cannot be tied to timestamped public sources
- there is no stable output shape to compare across runs
## Required Inputs

Before starting, collect:

- a `retrieval_request` with:
  - `topic`
  - `analysis_time`
  - `questions[]`
  - `use_case`
  - `source_preferences[]`
  - `mode=generic|crisis`
  - `windows=[10m,1h,6h,24h]`
- source candidates with publication timestamps, source type, and claim links
- key claims that need confirmation
- current draft or baseline write-up if the run will enter phase 1 scoring
- rollback rule for unsupported or stale claims

If these are missing, the run is not ready.
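A minimal sketch of assembling the required inputs and enforcing the readiness rule. The field names follow the list above; the dict layout and the readiness check itself are illustrative assumptions, not a real API from this repo.

```python
REQUIRED_KEYS = {"topic", "analysis_time", "questions", "use_case",
                 "source_preferences", "mode", "windows"}

def build_retrieval_request(**fields):
    """Return a retrieval_request dict, or raise if the run is not ready."""
    missing = REQUIRED_KEYS - fields.keys()
    if missing:
        raise ValueError(f"run is not ready, missing: {sorted(missing)}")
    if fields["mode"] not in ("generic", "crisis"):
        raise ValueError("mode must be generic or crisis")
    return dict(fields)

# Hypothetical example values:
request = build_retrieval_request(
    topic="refinery outage headlines",
    analysis_time="2025-01-15T09:30:00Z",
    questions=["Is the outage confirmed by the operator?"],
    use_case="morning market note",
    source_preferences=["wire", "operator statement"],
    mode="generic",
    windows=["10m", "1h", "6h", "24h"],
)
```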
## Core Rule
Never keep an indexing workflow that:
- uses relative dates without anchoring them
- mixes confirmed facts with interpretation
- omits the source support behind key claims
- hides contradictory signals from different sources
## Hard Checks
Every candidate should pass all of these:
- all time-sensitive references are anchored to absolute dates
- key claims are traceable to specific sources
- fact and inference are clearly separated
- contradictory or missing confirmations are disclosed
- source recency is checked before the conclusion is written
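One way to make these checks mechanical is a boolean gate over the candidate record. The record fields and the relative-date pattern below are assumptions for illustration; only the five check names come from this skill.

```python
import re

# Crude stand-in for "anchored to absolute dates": flag common relative terms.
RELATIVE_DATES = re.compile(r"\b(today|yesterday|this week|recently|latest)\b", re.I)

def hard_checks(candidate: dict) -> dict:
    """Run each hard check as a named boolean; fail closed on missing fields."""
    checks = {
        "absolute_dates": not RELATIVE_DATES.search(candidate.get("summary", "")),
        "claims_traceable": all(c.get("sources") for c in candidate.get("claims", [])),
        "fact_inference_split": "inference_only" in candidate,
        "contradictions_disclosed": "conflict_matrix" in candidate,
        "recency_checked": candidate.get("recency_checked", False),
    }
    checks["all_pass"] = all(checks.values())
    return checks
```

A candidate that fails any single gate fails the whole pass, which matches the "pass all of these" requirement.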
## Suggested Score Dimensions
- source coverage
- claim traceability
- recency discipline
- contradiction handling
- signal extraction
- retrieval efficiency
## Dual-Track Output

Every retrieval result should separate:

- `core_verdict`
- `live_tape`
- `confirmed`
- `not_confirmed`
- `inference_only`
Promotion rule:

- one stronger-source confirmation can move a shadow signal into core evidence
- or two independent same-tier confirmations can do the same

Demotion rule:

- evidence older than 24 hours becomes background unless a fresh confirming or contradicting signal reactivates it
Shadow signals may change monitoring priority, but they do not raise the main
confidence level by themselves.
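The promotion and demotion rules above can be sketched directly. The tier labels, signal structure, and the `tier1` stronger-source label are assumptions; the two promotion paths and the 24-hour demotion window come from this skill.

```python
from datetime import datetime, timedelta, timezone

STRONGER = {"tier1"}  # hypothetical label for the stronger source tier

def promote(signal: dict) -> str:
    """Apply the two promotion paths; otherwise the signal stays in shadow."""
    confirmations = signal.get("confirmations", [])
    if any(c["tier"] in STRONGER for c in confirmations):
        return "core"  # one stronger-source confirmation is enough
    same_tier = {c["source_id"] for c in confirmations
                 if c["tier"] == signal["tier"]}
    if len(same_tier) >= 2:
        return "core"  # two independent same-tier confirmations
    return "shadow"

def demote(entry: dict, now: datetime) -> str:
    """Evidence older than 24h becomes background unless reactivated."""
    age = now - entry["last_updated_at"]
    if age > timedelta(hours=24) and not entry.get("reactivated", False):
        return "background"
    return entry["promotion_state"]
```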
## Structured Interfaces

Main data shapes for the recency-first front half:

- `source_observation`
  - `source_id`
  - `source_name`
  - `source_type`
  - `source_tier`
  - `channel=core|shadow|background`
  - `published_at`
  - `observed_at`
  - `url`
  - `claim_ids[]`
  - `entity_ids[]`
  - `vessel_ids[]`
  - `text_excerpt`
  - `position_hint`
  - `geo_hint`
  - `access_mode=public|browser_session|blocked`
  - `rank_score`
- `claim_ledger_entry`
  - `claim_id`
  - `claim_text`
  - `status=confirmed|denied|unclear|inferred`
  - `supporting_sources[]`
  - `contradicting_sources[]`
  - `last_updated_at`
  - `promotion_state=shadow|core|background`
- `verdict_output`
  - `core_verdict`
  - `live_tape`
  - `confidence_interval`
  - `confidence_gate`
  - `latest_signals`
  - `confirmed`
  - `not_confirmed`
  - `inference_only`
  - `conflict_matrix`
  - `missing_confirmations`
  - `market_relevance`
  - `next_watch_items`
  - `freshness_panel`
  - `source_layer_summary`
  - `background_only`
  - crisis mode also adds `negotiation_status_timeline`, `vessel_movement_table`, and `escalation_scenarios`
- `retrieval_run_report`
  - `fetch_order`
  - `sources_attempted`
  - `sources_blocked`
  - `top_recent_hits`
  - `shadow_to_core_promotions`
  - `missed_expected_source_families`

The runnable top-level result currently includes:

- `request`
- `observations`
- `claim_ledger`
- `verdict_output`
- `retrieval_run_report`
- `retrieval_quality`
- `report_markdown`
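As a lightweight way to lint field access downstream, two of these shapes can be written as typing stubs. The field names mirror the listing above; the concrete Python types are assumptions, since the runtime may pass plain dicts.

```python
from typing import List, TypedDict

class SourceObservation(TypedDict, total=False):
    source_id: str
    source_name: str
    source_type: str
    source_tier: str
    channel: str            # core | shadow | background
    published_at: str
    observed_at: str
    url: str
    claim_ids: List[str]
    entity_ids: List[str]
    vessel_ids: List[str]
    text_excerpt: str
    position_hint: str
    geo_hint: str
    access_mode: str        # public | browser_session | blocked
    rank_score: float

class ClaimLedgerEntry(TypedDict, total=False):
    claim_id: str
    claim_text: str
    status: str             # confirmed | denied | unclear | inferred
    supporting_sources: List[str]
    contradicting_sources: List[str]
    last_updated_at: str
    promotion_state: str    # shadow | core | background

# TypedDicts are plain dicts at runtime, so partial records are fine:
obs: SourceObservation = {"source_id": "s1", "channel": "shadow", "rank_score": 0.42}
```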
`news-index` also supports `preset=energy-war`, which forces the crisis path and backfills an energy benchmark watchlist plus preset watch items when the user has not supplied custom ones.
## Credibility Metrics

Every evaluated result should also expose a credibility snapshot that is easy to aggregate later.

Prefer these metrics:

- `source_strength_score`
- `claim_confirmation_score`
- `timeliness_score`
- `agreement_score`
- `confidence_score`
- `confidence_interval`

The confidence interval is not meant to be academic precision. It is a practical way to show when the conclusion depends on weak, sparse, or conflicting sources.

Also capture retrieval-quality metrics:

- `freshness_capture_score`
- `shadow_signal_discipline_score`
- `source_promotion_discipline_score`
- `blocked_source_handling_score`
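One sketch of folding the component metrics into the snapshot, assuming all scores live in [0, 1]. The weights and the interval-width heuristic are assumptions; only the metric names come from this skill. The point is the shape of the output, not the specific numbers: the interval widens when sources are weak or in disagreement.

```python
def confidence_snapshot(m: dict) -> dict:
    """Aggregate component metrics into confidence_score + confidence_interval."""
    # Hypothetical weights; a real loop would tune or learn these.
    score = (0.35 * m["source_strength_score"]
             + 0.30 * m["claim_confirmation_score"]
             + 0.20 * m["timeliness_score"]
             + 0.15 * m["agreement_score"])
    # Wider interval when sources are weak, sparse, or conflicting.
    width = 0.05 + 0.4 * (1 - min(m["source_strength_score"], m["agreement_score"]))
    return {
        "confidence_score": round(score, 3),
        "confidence_interval": (round(max(0.0, score - width), 3),
                                round(min(1.0, score + width), 3)),
    }
```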
## Recommended Workflow

- define the topic and build a `retrieval_request`
- discover, normalize, and rank candidate sources
- merge duplicates and build the `claim_ledger`
- emit the dual-track verdict
- if this is a benchmarked loop run, bridge the result into a run record
- evaluate the candidate against the baseline
- keep only if hard checks pass and the score improves enough
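The final step of the workflow can be sketched as a single keep-or-reject gate: a candidate replaces the baseline only when every hard check passes and its score beats the baseline by a minimum margin. The margin value is a hypothetical placeholder, not a tuned threshold.

```python
MIN_IMPROVEMENT = 0.02  # hypothetical acceptance margin

def keep_candidate(candidate_score: float, baseline_score: float,
                   hard_checks_pass: bool) -> bool:
    """Keep only if hard checks pass AND the score improves enough."""
    if not hard_checks_pass:
        return False  # hard checks are a gate, not a score component
    return candidate_score >= baseline_score + MIN_IMPROVEMENT
```

Treating the hard checks as a gate rather than a weighted term keeps a high score from papering over an unanchored date or an untraceable claim.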
## Crisis Mode

`mode=crisis` uses the same generic core but ships with source families and report sections tuned for:

- negotiations and mediation claims
- public AIS or ship-tracker pages
- military movement notes using `last public indication`
- escalation scenarios with explicit triggers

Do not present exact military truth. Use wording such as "last public location" or "last public indication".
## Casebook References

Read the matching case file when the request clearly fits one of these repeated patterns:
## Local Helper Scripts

Keep the local batch evaluator flow parallel to `autoresearch-code-fix`: validated sample pool, batch run-record generation, batch evaluation, then one markdown report.
## Article Workflow Tuning Inputs

When the task has already produced a stable news-index / x-index result and you want to tune the writing layer instead of recollecting evidence, prefer an article-workflow request that starts from:

- `source_result`
- or `source_result_path`

Useful style / packaging controls that are now part of the supported workflow surface:

- `feedback_profile_dir`: applies persisted request defaults and style memory before drafting
- `headline_hook_mode`: lets you force the title hook strategy instead of relying on auto mode
- `human_signal_ratio`: adjusts how much the draft should sound like a human operator rather than a neutral scaffold
- `personal_phrase_bank`: injects explicit preferred transitions or signature phrasing
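Putting the controls together, a tuning request might look like the sketch below. The control names come from this skill; every value, path, and the overall request schema are hypothetical examples.

```python
# Hypothetical article-workflow request reusing a stored news-index result.
article_request = {
    "source_result_path": "runs/2025-01-15/news_index_result.json",  # hypothetical path
    "feedback_profile_dir": "profiles/operator_a",                   # hypothetical dir
    "headline_hook_mode": "contrarian_question",                     # hypothetical value
    "human_signal_ratio": 0.6,   # lean toward an operator voice, not a scaffold
    "personal_phrase_bank": [
        "here's the part that matters",
        "zoom out for a second",
    ],
}
```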
For deterministic regression coverage of this surface, use:
For deterministic publish-layer acceptance coverage, use:
## References