| name | news-search |
| description | Systematic web search for news, media, policy, and industry coverage of FORTIS Lab publications, tools, and research. Use when the user asks for a news audit, media coverage check, broader impact evidence, or visibility search for their work. |
News & Media Coverage Search
Systematically search for external coverage of Yue Zhao / FORTIS Lab work across all outlet types: tech press, security press, business press, government/policy, industry analysts, science press, AI newsletters, and universities.
When to Use
- Periodic audit of media coverage (quarterly recommended)
- Before tenure/promotion materials
- Before grant applications requiring "broader impact" evidence
- After major paper acceptances or tool releases
- When updating the website news section
Inputs
Read these files before starting:
- data/publications.json: all papers (titles, venues, years, links)
- data/open-source.json: all tools/libraries (names, stars, URLs)
- news-coverage-audit.md: previous audit results (skip known items, update stale entries)
Execution Model
If parallel workers and web search are available, run dimensions in parallel; otherwise process dimensions sequentially. Batch queries conservatively to stay within tool rate limits. Read these reference files before starting:
- references/search-queries.md: query bank (not exhaustive; see triage rules below)
- references/outlet-registry.md: outlet classification and site: domain lists
- references/search-strategy.md: techniques for finding indirect coverage, when to persist vs. stop, name disambiguation, Cloudflare/SSR-shell fetch tactics, and the Phase B "snippet alone is not verified" rule
- references/candidate-schema.md: Phase A and Phase B candidate record contract
- references/disclaimer-patterns.md: AI-generated, aggregator, translation, templated-database, and blocked-page detection
- references/domain-registry.md: seed domains by source class and outlet_class values for Phase A
- references/disambiguation-registry.md: cumulative tool-name and person-name collision rules + verified-negative leads from prior rounds (consult before counting any borderline match)
- scripts/pdf_term_scan.py: PyMuPDF-based FORTIS-term scanner with built-in false-positive filters; canonical Phase B PDF-deep-search tool. Run as python skills/news-search/scripts/pdf_term_scan.py <pdf_path>.
Run the two-phase pipeline described below. Phase A gathers candidates into news-search-candidates.jsonl and does not edit news-coverage-audit.md. Phase B verifies each candidate, classifies the survivors, records dropped candidates in the candidate file, and writes only kept rows to news-coverage-audit.md using the tier structure in the Output section. If news-coverage-audit.md does not yet exist, Phase B creates it with the full tier structure and negative-results table as a fresh audit.
Query Bank Triage Rules
The query bank in references/search-queries.md is curated, not exhaustive. It covers high-adoption tools and papers with known media hooks.
Full audit mode: search EVERY paper and tool. Read data/publications.json and data/open-source.json and generate at least one smart-keyword search (Dimension 5) for every single entry. Do not skip any paper or tool. Use distinctive claims or method names, not exact titles. This is critical because coverage can appear for any paper, not just high-profile ones (e.g., COPOD has a dedicated book chapter, GLIP-OOD has a tech blog feature, the computing resources paper influenced CVPR policy).
For items not in the query bank, generate queries at runtime:
- Tools with 500+ GitHub stars: add dedicated Dimension 2 queries
- All papers at top venues from the current or prior year: add Dimension 5 smart-keyword entries
- Older papers and low-star tools: still search with at least one Dimension 5 smart-keyword query each
- Preprints: search with distinctive claim keywords
Pipeline: Two-Phase Output
Each audit runs in two phases. Phase A discovers candidates without classifying them. Phase B reads each candidate, runs verification checks, and assigns a tier. Splitting the two lets the candidate list be reviewed before any classification work commits to a row in the audit ledger.
Phase A: Candidate Gathering
Run news-search dimensions D1-D8 plus D10 and emit candidates as JSON-Lines records to news-search-candidates.jsonl at the project root. Handle citation-affiliation evidence through the freshness-gated [[citation-audit]] hook (see "Cross-skill: citation-audit integration" below), not as a Phase A candidate dimension. Do not assign tiers yet. Do not write to news-coverage-audit.md yet.
Each candidate carries the schema in references/candidate-schema.md: URL, title, snippet, surfacing query, outlet class, fetch timestamp, plus empty placeholders for the Phase B fields (flags, direct-mention, tier, notes).
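As a rough illustration of the record shape described above, the sketch below builds one Phase A candidate and appends it as a JSON-Lines row. The authoritative field names live in references/candidate-schema.md; the keys used here (surfacing_query, fetched_at, dimension) and both helper names are assumptions for illustration only.

```python
import json
from datetime import datetime, timezone


def make_candidate(url, title, snippet, query, outlet_class, dimension):
    """Build one Phase A record. Key names are illustrative assumptions;
    the real contract is references/candidate-schema.md."""
    return {
        "url": url,
        "title": title,
        "snippet": snippet,
        "surfacing_query": query,
        "outlet_class": outlet_class,
        "dimension": dimension,  # assumed extra field, e.g. "D3"
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        # Phase B placeholders: left empty until verification runs.
        "flags": [],
        "direct_mention": None,
        "tier": None,
        "notes": "",
    }


def append_candidate(path, record):
    """Append one record as a single JSON line (one object per line)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Appending one line per record keeps Phase A re-runnable: a reviewer can delete or annotate individual rows without reparsing the whole file.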
When Phase A completes, present the candidate count grouped by dimension and outlet class to the user. The user (or a reviewer such as Codex) can scan the candidate list and flag wrong query routing or wrong outlet-class tagging before Phase B starts. This is the cheap, parallelizable stage; treat it as re-runnable.
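The Phase A summary can be produced with a small grouping pass. This is a minimal sketch that assumes each candidate dict carries dimension and outlet_class keys; the sample rows and class names are made up for illustration.

```python
from collections import Counter


def summarize(candidates):
    """Count candidates per (dimension, outlet_class) pair, sorted for display."""
    counts = Counter((c["dimension"], c["outlet_class"]) for c in candidates)
    return sorted(counts.items())


# Hypothetical sample rows, not real audit output.
sample = [
    {"dimension": "D3", "outlet_class": "security-press"},
    {"dimension": "D8", "outlet_class": "gov-policy-pdf"},
    {"dimension": "D3", "outlet_class": "security-press"},
]

for (dimension, outlet_class), n in summarize(sample):
    print(f"{dimension:>3}  {outlet_class:<20} {n}")
```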
Add news-search-candidates.jsonl to .git/info/exclude (local, untracked) before the first run so git add -A does not stage scratch output.
Phase B: Verify and Classify
For each candidate in news-search-candidates.jsonl, fetch the page and apply five checks in order:
- Pre-tier filter: first-party / already-tracked / disambiguation drops. Before running the citation rule, drop (or, where noted, demote) the candidate if it matches any of these patterns (each one came up during the 2026-05-07 round):
  - First-party hosting on the PI's current or prior institution (e.g., the PI's CMU PhD-era profile, an NSF PAR record of the PI's own grant output, a journal mirror of the PI's own paper).
  - Already-tracked award URL: the canonical landing page for an award already recorded in Ledger 5.
  - Coauthor-institution publication listing: a bare research-listing page on a coauthor institution's site (Microsoft Research, Adobe Research, etc.) that is not editorial; demote to Ledger 3.
  - Name-collision drop: the match is on a different person ("Yue Zhao" → Yuchen / Siyan / Qingyue / W. / D. Zhao) or a different project ("Aegis" → Forrester AEGIS / NVIDIA Aegis / RedHat aegis-ai; "TrustLLM" → trustllm.eu; "TDC" → TDCJ / J&J Therapeutics Discovery). Consult references/disambiguation-registry.md.
- Direct-mention / topic-validation routing (the citation verification rule in the Output section). If the page names the work, person, lab, co-author, institution, or direct URL per one of clauses 1 to 6, fill direct_mention and continue as coverage. If it does not pass direct mention but clearly covers the same topic area, set tier: "topic-validation" and keep it for the Topic Validation appendix, not a coverage ledger. If it is neither direct coverage nor topic validation, set tier: "dropped" and record the drop reason in notes.
Snippet alone is not verified evidence for Tier 0 / Tier 1 candidates. WebSearch summaries can synthesize content that does not appear in the source (the 2026-05-07 round caught this with GAO-26-108695: snippet claimed TrustLLM citation; manual PDF extraction confirmed the PDF says nothing of the sort). Tier 0 / Tier 1 promotion requires direct fetch of the source — pdf_term_scan.py for PDFs, real-UA HTTP for web pages. If the source is gated and cannot be re-fetched, set tier_guess: phase_b_priority and leave as a candidate; do not count.
- Disclaimer / aggregator detection (references/disclaimer-patterns.md). Run the regex sweep on fetched content. Set entries in the candidate's flags[] field. Hard caps:
  - ai_generated and aggregator are capped at Tier 3 regardless of outlet domain.
  - machine_translated is capped at Tier 3 unless editorial_translation is also set.
  - paywall_or_blocked is held for manual verification, not classified from snippet alone.
- Tier assignment per the tier structure in the Output section. Assign coverage tiers (Tier 0 through Tier 5) only to candidates that pass direct mention; topic-only candidates keep tier: topic-validation from the direct-mention routing step (step 2).
- Registry harvest status. For each kept coverage row, set registry_status to existing or new after checking the page's registered domain against references/domain-registry.md. Leave registry_status empty on dropped and topic-validation rows.
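The hard caps in the disclaimer step can be sketched as a small function. It assumes coverage tiers are stored as integers 0 to 5 (lower is stronger), so "capped at Tier 3" means the result can never be numerically below 3; the function name and the use of None for "held" are illustrative choices, not part of the schema.

```python
def apply_flag_caps(proposed_tier, flags):
    """Return the tier after applying disclaimer caps.

    proposed_tier: int 0-5 (assumed representation; lower = stronger).
    Returns None when the row must be held for manual verification.
    """
    flags = set(flags)
    if "paywall_or_blocked" in flags:
        return None  # held: never classified from snippet alone
    capped = proposed_tier
    if flags & {"ai_generated", "aggregator"}:
        capped = max(capped, 3)  # cannot rank above Tier 3
    if "machine_translated" in flags and "editorial_translation" not in flags:
        capped = max(capped, 3)
    return capped
```

Using max keeps weaker tiers untouched: a Tier 4 row flagged as an aggregator stays at Tier 4 instead of being lifted to Tier 3.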
Phase B writes direct-coverage rows (Tier 0 through Tier 5) to the coverage ledgers, topic-only rows (tier == 'topic-validation') to the Topic Validation appendix, and keeps dropped rows (tier == 'dropped') in news-search-candidates.jsonl for auditability. The full candidates file stays at the project root through the audit so a reviewer can audit drop decisions, not only the kept rows.
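A minimal sketch of that three-way routing follows. The string values "topic-validation" and "dropped" come from the text above; representing coverage tiers as integers 0 to 5 is an assumption, as is the function name.

```python
def route_phase_b(candidates):
    """Split classified candidates into (ledger_rows, appendix_rows, dropped_rows)."""
    ledger_rows, appendix_rows, dropped_rows = [], [], []
    for cand in candidates:
        tier = cand.get("tier")
        if tier == "topic-validation":
            appendix_rows.append(cand)
        elif tier == "dropped":
            dropped_rows.append(cand)  # stays in the candidates file only
        elif isinstance(tier, int) and 0 <= tier <= 5:
            ledger_rows.append(cand)
        else:
            # An unclassified row reaching the write step is a pipeline bug.
            raise ValueError(f"unclassified candidate: {cand.get('url')}")
    return ledger_rows, appendix_rows, dropped_rows
```

Raising on unclassified rows, rather than silently skipping them, keeps every candidate accounted for in one of the three outcomes.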
Domain Registry and Post-Round Harvest
references/domain-registry.md lists known high-value source classes (gov / policy PDFs, EU research projects, patents, China tech media, security research blogs, AI-newsletter aggregators, and others) with seed domains. Phase A queries fan out to seeded domains in addition to the open dragnet, never instead of it. The registry is a recall floor, not a filter.
After each audit, harvest the domains of every confirmed Phase B hit and append new ones to the registry under the appropriate class. If no class fits, create one (lowercase-hyphenated name). This is the only way the registry stays current as new outlet types appear; without it, the registry freezes and re-discovery cost recurs.
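One way to sketch the harvest step, under stated assumptions: the registry is modelled as a dict mapping class name to a set of domains, each confirmed hit carries url and outlet_class keys, and the hostname (minus a leading "www.") stands in for the registered domain. A real implementation would likely need public-suffix handling, which this sketch omits.

```python
from urllib.parse import urlparse


def host_of(url):
    """Lowercased hostname without a leading 'www.' (simplified domain key)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def harvest_new_domains(confirmed_hits, registry):
    """Return {class_name: sorted new domains} not yet present in the registry."""
    known = set().union(*registry.values())
    new_by_class = {}
    for hit in confirmed_hits:
        domain = host_of(hit["url"])
        if domain and domain not in known:
            new_by_class.setdefault(hit["outlet_class"], set()).add(domain)
    return {cls: sorted(domains) for cls, domains in new_by_class.items()}
```

An empty result is still worth reporting, since the harvest summary is expected in every round.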
Once a quarter, run a registry-disabled pass (open dragnet only) to surface new outlet classes the registry has not seen yet. This is what catches the next surprise category.
Dimension 1: Person & Lab
Find coverage that names the PI or lab, regardless of which paper or tool.
Search for: name + university + research area + various contexts (news, interview, podcast, keynote, expert quote, award, fellowship, grant). Also search for lab name and industry partner names (Amazon, NVIDIA, Google, Meta, Anthropic, NSF).
See references/search-queries.md § Dimension 1 for the full query list.
Dimension 2: Tools in Non-Academic Contexts
Major tools (PyOD, TrustLLM, agent-audit, Aegis, ADBench) may appear in industry deployments, government reports, textbooks, or enterprise case studies without naming the PI.
Search for: each tool name + context keywords (enterprise, deployment, production, fraud detection, cybersecurity, government, NIST, federal, textbook, course, patent, Walmart, NASA, Tesla).
See references/search-queries.md § Dimension 2 for the full query list.
Dimension 3: Outlet Sweep
Systematically check each outlet category using site: filters. This is the most important dimension for finding coverage the other dimensions miss.
Categories: security press, business press, top tech press, AI newsletters, science press, government/policy, industry analysts, university/institutional press, developer community.
Generation rule: references/search-queries.md § Dimension 3 provides base queries for the highest-priority outlets. For any outlet domain listed in references/outlet-registry.md that does not have an explicit query in the query bank, generate one at runtime using this template: site:{domain} "anomaly detection" OR "AI auditing" OR "AI agent security" OR PyOD OR TrustLLM. This ensures every registered outlet is checked without requiring the query bank to enumerate all 70+ domains.
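The generation rule amounts to filling the template for every registered domain that the query bank does not already cover. A minimal sketch, assuming both inputs are plain lists of domain strings (the parsing of the two reference files is not shown, and the domains in the usage example are placeholders):

```python
TEMPLATE = (
    'site:{domain} "anomaly detection" OR "AI auditing" '
    'OR "AI agent security" OR PyOD OR TrustLLM'
)


def fallback_queries(registry_domains, bank_domains):
    """Generate one template query per registered domain missing from the bank."""
    covered = set(bank_domains)
    return [
        TEMPLATE.format(domain=domain)
        for domain in registry_domains
        if domain not in covered
    ]


# Placeholder domains for illustration only.
for query in fallback_queries(["example.com", "example.org"], ["example.com"]):
    print(query)
```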
Dimension 4: Topic Proximity
Search for the broader trending topic and check if the work appears within coverage. This catches indirect coverage where the paper is relevant but not cited by name.
Examples: ChatGPT geolocation trend (connects to DoxBench), OpenClaw security crisis (connects to agent-audit), OWASP agentic AI landscape (connects to agent-audit/Aegis), anomaly detection open-source landscape (connects to PyOD).
See references/search-queries.md § Dimension 4 for the full query list.
Dimension 5: Smart Paper Search
For papers where exact title search fails (most papers), use distinctive result keywords or striking claims from the paper instead.
Examples: "surpassed human performance 95.33% CLADDER" for the causal reasoning paper, "defense training breaks LLM agents 47-77% benign task failure" for The Autonomy Tax.
See references/search-queries.md § Dimension 5 for the full mapping table.
Dimension 6: Citation & Downstream Impact
Track high-level citation metrics, appearances in high-impact journals (Nature, Science), enterprise adoption evidence, and downstream tools built on the work.
See references/search-queries.md § Dimension 6 for the full query list.
Dimension 7: Education, Ecosystem & Global
Search the surfaces where widely-adopted tools spread beyond academic papers and news: education platforms, code ecosystems, non-English press, and developer communities.
Education & courses: Kaggle notebooks, Google Colab examples, Coursera/edX/Udemy course materials, university syllabi, YouTube/Bilibili tutorials, O'Reilly/Manning learning paths
Code ecosystem: GitHub code dependents (repos that import PyOD), PyPI/conda-forge download stats pages, Papers with Code tool listings, Hugging Face Spaces built on your tools
Dissertations & theses: ProQuest, Google Scholar thesis search, university repository searches
Non-English coverage: Chinese tech press (InfoQ CN, CSDN, Zhihu, WeChat public accounts), Japanese (Qiita, Zenn), Korean (Tistory, Velog), European (Heise, Le Monde Informatique, etc.)
See references/search-queries.md § Dimension 7 Education/Ecosystem for queries.
Dimension 8: PDF Deep Search (Government, Think Tank, Industry Reports)
This is critical and cannot be skipped. Web search does not index the text inside PDFs from government reports, congressional testimony, think tank whitepapers, and industry reports. Citations of your work inside these documents are the highest-impact coverage (Tier 0) and are routinely missed by Dimensions 1-7.
Strategy
- Identify candidate PDFs — search for government/think tank reports on topics your work addresses (AI agent security, anomaly detection, LLM trustworthiness, AI auditing). Collect the PDF URLs.
- Fetch and search inside each PDF — download or fetch the PDF, extract text (via PyMuPDF, pdftotext, or the WebFetch tool), and search for: tool names (PyOD, TrustLLM, Aegis, agent-audit, etc.), paper titles, author names ("Yue Zhao", "Zhao et al."), arXiv IDs, and repo URLs.
- Verify and classify — if found, note the exact page, footnote number, and surrounding context.
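The fetch-and-search step can be sketched as follows. This is not scripts/pdf_term_scan.py, which remains the canonical tool and includes false-positive filters this sketch lacks; the term list is a short illustrative subset, and the function names are made up. Text extraction uses PyMuPDF's page.get_text(); the matching logic works on plain page strings so it can be checked without a PDF.

```python
import re

# Illustrative subset; the real term list belongs to pdf_term_scan.py.
TERMS = ["PyOD", "TrustLLM", "agent-audit", "Yue Zhao", "Zhao et al."]


def scan_pages(pages, terms=TERMS, context=60):
    """Find case-sensitive whole-term matches; report page number and context."""
    hits = []
    for page_no, text in enumerate(pages, start=1):
        for term in terms:
            # Lookarounds avoid matching inside longer words (e.g. "PyODx").
            pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")
            for match in pattern.finditer(text):
                start = max(0, match.start() - context)
                end = min(len(text), match.end() + context)
                hits.append({
                    "page": page_no,
                    "term": term,
                    "context": " ".join(text[start:end].split()),
                })
    return hits


def extract_pages(pdf_path):
    """Extract per-page text with PyMuPDF (requires `pip install pymupdf`)."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]
```

Typical use would be scan_pages(extract_pages("report.pdf")), then recording the page, footnote number, and context for each hit by hand.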
High-priority PDF sources to search
U.S. Government (highest priority):
- U.S. Senate committee reports — HSGAC, Commerce, Judiciary, Armed Services (AI-related)
- U.S. House committee reports — Science, Homeland Security, Financial Services
- NIST special publications — AI RMF updates, AI agent security, AI 100-series
- GAO reports — AI technology assessments, Science & Tech Spotlight series
- CRS reports — Congressional Research Service AI analyses
- Federal agency AI strategies — DOD (JAIC/CDAO), DOE national labs, HHS, Treasury/OCC, SEC, CFTC, Federal Reserve
- White House — AI executive orders, OMB memoranda, OSTP reports, CEA reports
- NSF — program solicitations, dear colleague letters mentioning anomaly detection or AI safety
International Government (high priority):
- EU — AI Act impact assessments, ENISA reports, EU AI Office publications
- UK — AI Safety Institute reports, DSIT AI regulation papers, Alan Turing Institute policy briefs
- Canada — ISED AI strategy, Canadian Centre for Cyber Security
- Australia — Department of Industry AI reports, eSafety Commissioner
- Singapore — IMDA Model AI Governance Framework
- OECD — AI Policy Observatory reports, OECD AI Principles implementation documents
- UN — UNESCO AI ethics recommendations, ITU AI reports
- G7/G20 — Hiroshima AI Process documents, AI governance communiques
Think tanks & policy institutes:
- Brookings, RAND, CSET Georgetown, Stanford HAI, FLI, CAIS, Partnership on AI
- Center for Data Innovation, Information Technology and Innovation Foundation (ITIF)
- Centre for International Governance Innovation (CIGI)
Foundation model companies (Tier 0 if they cite your work):
- OpenAI — system cards (GPT-4, GPT-5, o1, o3), safety reports, preparedness framework documents, red teaming reports
- Anthropic — model cards, responsible scaling policy documents, safety research reports
- Google DeepMind — technical reports, Gemini system cards, safety evaluations
- Meta AI — Llama model cards, system cards, responsible use guides
- Mistral — model documentation, technical reports
- xAI — Grok system cards and technical reports
- Cohere — model cards, safety documentation
- Microsoft — Phi model cards, responsible AI reports, Azure AI safety documentation
- Amazon — Titan model documentation, AWS AI safety reports
These companies publish system cards and safety evaluations as PDFs or long-form web pages that reference academic benchmarks (TrustLLM, HELM, etc.) and tools. They also publish blog posts with embedded citations. Search both the HTML pages and any linked PDFs.
Standards bodies:
- ISO/IEC (AI standards series), IEEE SA, OWASP (agentic AI PDFs)
- MITRE ATLAS documentation
Industry whitepapers & analyst reports (Tier 0 if they cite your work by name):
- McKinsey, Deloitte, PwC, Accenture, EY, KPMG, BCG, Bain
- Gartner, Forrester, IDC research reports
Known citations found via this dimension
- U.S. Senate HSGAC — "Hedge Fund Use of Artificial Intelligence" (Jun 2024), footnote 119 cites TrustLLM on page 25
- FLI AI Safety Index — Winter 2025 PDF uses TrustLLM as an official benchmark
Why web search misses these
Government PDFs are hosted as static files (e.g., .senate.gov/wp-content/uploads/...pdf). Web search engines index the hosting page but not the text inside the PDF. A search for site:senate.gov TrustLLM returns nothing because the word "TrustLLM" only appears inside the PDF, not on any HTML page. The only way to find these is to identify candidate documents by topic, then search inside the PDFs directly.
Output
Write all results to news-coverage-audit.md at the project root.
Citation Verification Rule
An item only counts as coverage if the article names or cites at least one of:
- A specific paper title or tool name (PyOD, TrustLLM, Aegis, agent-audit, etc.)
- The PI by name ("Yue Zhao")
- The lab ("FORTIS")
- A co-author by name in the context of the specific paper/tool
- An institutional attribution ("researchers from USC", "a USC team") in the context of the specific paper/tool
- A direct link to the project URL, repo, or arXiv paper
Read the article itself to verify before including it; per the Phase B rule, a search snippet alone is not verified evidence for Tier 0 / Tier 1 candidates. Items attributed to co-authors or institutions should note this in the entry (e.g., "names: first author X, USC affiliation; Yue Zhao is co-author").
Items that only cover the same topic your work addresses (e.g., "AI agent security is important" without naming your tools) are not coverage. These may be useful as context for grant narratives but must be placed in a separate "Topic Validation (Not Direct Coverage)" appendix, clearly marked as not naming your work.
Tier Structure
All tiers below require the citation verification rule above. If a result does not name or cite your work, it does not belong in any tier.
| Tier | Definition | Examples |
|---|---|---|
| Tier 0 | (a) Government reports (U.S. or international: legislative, executive, federal/national agency), international body reports (OECD, UN, EU), or official standards documents; (b) Technical reports, system cards, safety reports, or model cards from major foundation model companies (OpenAI, Anthropic, Google DeepMind, Meta AI, Mistral, xAI, Cohere, etc.); (c) Major consulting/analyst firm reports (McKinsey, Gartner, Forrester, Deloitte, etc.) — all that cite your work by name | U.S. Senate report citing TrustLLM, OpenAI system card citing TrustLLM, Anthropic safety report citing anomaly detection benchmark, Gartner report citing PyOD |
| Tier 1 | Mainstream tech/business/security press, major research institutions (national labs, Hoover, Microsoft Research), or high-impact policy reports that name your work | FLI AI Safety Index using TrustLLM, LLNL article naming TrustLLM, Nature Biotechnology citing DrugAgent |
| Tier 2 | Industry press, institutional PR, or dedicated features that name your work | "DrugAgent" in MarkTechPost, "Yue Zhao" in USC Viterbi News, Databricks blog naming PyOD |
| Tier 3 | Dedicated blog posts, tutorials, or platform integrations naming your tool | KDnuggets PyOD tutorial, Databricks Kakapo built on PyOD, DEV Community Aegis post |
| Tier 4 | Awards, recognitions, encyclopedia entries | Amazon RA, NVIDIA Grant, Grokipedia entry |
| Tier 5 | Academic community only (Hugging Face, alphaXiv, etc.) | Paper pages, GitHub stars, Moonlight reviews |
Separate appendix (not a tier):
- Topic Validation — articles covering the same topic area without naming your work. Useful for grant narratives ("our research addresses concerns raised in McKinsey's 2026 report on agentic AI security") but not website news items.
Tier 0(b) extension: foundation-model-company careers pages
A first-party foundation-model-company job posting that names a FORTIS tool as expected operational tooling (e.g., the OpenAI "Technical Intelligence Analyst" Qualifications block naming PyOD as anomaly-detection tooling) qualifies as Tier 0(b)-equivalent only when all of the following hold:
- First-party host. The canonical URL is the company's own careers domain (openai.com/careers/..., anthropic.com/jobs/..., deepmind.google/careers/..., ai.meta.com/careers/..., etc.), not a Greenhouse / Lever / Ashby / DFJ Growth / Glassdoor / LinkedIn / Indeed mirror. ATS mirrors are kept under mirrors[] in the candidate record but never count as the load-bearing citation.
- Tool named as operational tooling, not background literature. The mention sits in Qualifications, Responsibilities, or Tech Stack as a tool the hire is expected to use, not in a "see also" or "related work" footnote.
- Durable snapshot exists. A Wayback Machine archive URL OR a committed local sidecar pair (HTML + PDF in news-snapshots/<slug>-<YYYY-MM-DD>.{html,pdf}) is in the repo, with a Markdown index file documenting the live URL, capture date, verification method, and verbatim quote. Sidecars must be captured from a logged-in browser session when the live URL is behind Cloudflare; PDF must be re-verified with python skills/news-search/scripts/pdf_term_scan.py <pdf_path>.
When all three hold, the candidate goes into Ledger 1 (Government/Policy citations) under Tier 0(b) with a Source URLs row that includes the live URL, mirror URLs, and the snapshot index path. The #8g precedent (news-snapshots/openai-careers-technical-intelligence-analyst-2026-05-07.md) is the reference shape; new entries follow that index format.
If the live URL is reachable but no snapshot exists yet, set tier_guess: phase_b_priority and status: paywall_or_blocked (or candidate with a snapshot-pending note in notes). Do not promote to Ledger 1 from a snippet alone — careers pages go stale within weeks of the role being filled, so an unsnapshotted Tier 0(b) claim becomes unverifiable as soon as OpenAI / Anthropic rotates the URL.
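The snapshot naming convention and the promote-or-hold decision can be sketched like this. The path pattern follows the news-snapshots/<slug>-<YYYY-MM-DD> shape given above; the function names, argument names, and the "hold" / "promote" / "not-tier-0b" return values are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def snapshot_paths(slug, captured):
    """Return the sidecar pair and index path for one careers-page snapshot."""
    stem = f"{slug}-{captured.isoformat()}"
    base = PurePosixPath("news-snapshots")
    return {
        "html": str(base / f"{stem}.html"),
        "pdf": str(base / f"{stem}.pdf"),
        "index": str(base / f"{stem}.md"),
    }


def careers_page_decision(first_party, operational_mention, wayback_url, sidecar_committed):
    """Apply the three conditions; any missing snapshot means hold, not promote."""
    if not (first_party and operational_mention):
        return "not-tier-0b"
    if wayback_url or sidecar_committed:
        return "promote"  # eligible for Ledger 1 under Tier 0(b)
    return "hold"  # keep as candidate with a snapshot-pending note
```

For example, snapshot_paths("openai-careers-technical-intelligence-analyst", date(2026, 5, 7)) yields the index path cited as the reference shape.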
Non-FM-co careers pages (Wells Fargo, Capital One, JPMC, Pfizer, Goldman, etc.) follow the same snapshot-or-hold rule but classify under Ledger 3 (ecosystem adoption — enterprise operational adoption evidence), not Ledger 1 / Tier 0(b). Only foundation-model companies get the Tier 0(b) lift; the rationale is that FM-co operational tooling decisions are themselves treated as authoritative signal in the way GAO / NIST PDFs are. A non-FM enterprise JD naming a tool is operational adoption evidence comparable to a code import or vendor whitepaper, which is Ledger 3 territory; it is not third-party media (Ledger 2) and it is not a government / FM-co citation (Ledger 1).
Required Sections in Output File
- Coverage ledgers (separate counts for each):
  - Government/Policy citations (Tier 0): government reports, foundation model system cards, standards documents, analyst reports that cite your work by name
  - External media (Tier 1-2): third-party press, institutional features, dedicated blog posts by external authors
  - Ecosystem adoption (Tier 3): books, podcasts, enterprise integrations, patents, tutorials, platform integrations by external parties
  - First-party/community: self-authored blog posts, GitHub discussions, dataset hosting (not external coverage)
  - Awards & recognitions (Tier 4): awards, fellowships, encyclopedia entries
- Topic Validation appendix — articles that cover the same topic but do not name your work
- Negative Results table — outlet types searched with no results (prevents re-searching)
- Upcoming Opportunities — imminent conferences, journalist contacts from prior coverage
- Summary Statistics — separate counts per ledger, not a single aggregate. Report: government/policy total, external media total, ecosystem total, first-party/community total, and awards total.
- Coverage matrix — per-item appendix or CSV with one row per paper/tool, dimensions searched (D1-D8 plus D10; D9 is now handled by the standalone [[citation-audit]] skill), Phase A candidate count, and Phase B outcome (kept/topic-only/dropped/none). This makes the audit auditable.
- Registry harvest summary: list of new domains added to references/domain-registry.md from this round's confirmed Phase B hits, grouped by class. Empty list is fine and should still be reported, so the harvest step stays visible across rounds.
Incremental Updates
When running a targeted search (not full audit), append new findings to the existing file. Do not overwrite previous results. Mark the date of each search pass.
Run Modes
| Mode | When | Dimensions to run |
|---|---|---|
| Full audit | Once per semester, before portfolio updates | All 9 news-coverage dimensions (D1-D8 + D10) plus a freshness-gated hook into the standalone [[citation-audit]] skill (see "Cross-skill: citation-audit integration" below) |
| Targeted | After a specific paper acceptance or tool release | Dims 1, 3, 4, 5 scoped to that item, plus D8 when policy or PDF evidence is plausible |
| Quick check | Before grant submissions | Dims 1, 6, 8, 10 (person & lab, citations & downstream impact, government PDFs, external deep research) |
| Topic monitor | When a trending topic connects to your work | Dim 4 only, focused on that topic |
| Ecosystem check | Before broader-impact statements | Dim 7 (education, code ecosystem, global) |
| PDF deep search | Before tenure materials or when a specific gov report is suspected | Dim 8 only, with candidate PDF list |
| Affiliation audit | Before tenure / promotion, after major citation milestones | Invoke the standalone /citation-audit skill (was D9 here before; split out as its own skill at skills/citation-audit/SKILL.md) |
| External deep research | After automated audit, as a complement pass | Dim 10 (external LLM deep research) |
Dimension 10: External LLM Deep Research
Purpose: Use external deep research tools (ChatGPT Deep Research, Gemini Deep Research, Claude on claude.ai, or similar) as a complement to the automated Dimensions 1-8. These tools have browsing capabilities, PDF reading, and search strategies that differ from Claude Code's WebSearch, and consistently find items the automated audit misses.
Why this matters: In practice, external deep research tools found the NIST AI 100-2e2025 citation, a third FLI AI Safety Index edition, 5 additional patents, and non-English coverage in Korean/German/Spanish that the automated audit missed entirely. These tools are not a substitute for the structured audit (they lack the systematic coverage and verification discipline), but they are a strong complement.
How to run
- After completing Dimensions 1-8 and reading the freshness state of the standalone citation-audit hook, generate a self-contained prompt for external deep research tools. The prompt should include the full tool/paper inventory, what to search for, and the citation verification rule. See references/search-queries.md for the base query bank, but the prompt should be open-ended ("search broadly and creatively; I do not know where the coverage is").
- Run the prompt in 1-3 external tools. Different tools have different search indices and browsing capabilities; running multiple increases coverage.
- Save the raw output to external-research/ in the project root. Name each file by source and date: {source}-{YYYY-MM}.md (e.g., chatgpt-deep-research-2026-04.md, gemini-2026-04.md, claude-2026-04.md). Date the files so future runs know what was already searched and when. Running once per quarter is sufficient; monthly if a major release or conference just happened.
- Diff the external findings against the existing audit. Use an agent to extract only genuinely new items (not already in news-coverage-audit.md).
- For any Tier 0 claims (government, NIST, foundation model system cards), manually verify by opening the source PDF and searching for the tool name. External deep research tools hallucinate citations at a non-trivial rate.
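The diff step above can be approximated mechanically before an agent reviews the result. The sketch below extracts URLs from the external tool's output and keeps only those whose normalized form is absent from the audit file. It is a rough first pass under simplifying assumptions: normalization ignores scheme, "www.", trailing slashes, and query strings, so two distinct pages that differ only by query string would be treated as one.

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s)>\]\"']+")


def normalize(url):
    """Reduce a URL to host + path for loose duplicate detection."""
    parts = urlsplit(url.rstrip(".,;"))
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def new_urls(external_text, audit_text):
    """Return URLs from the external output that the audit does not list yet."""
    seen = {normalize(u) for u in URL_RE.findall(audit_text)}
    fresh = []
    for url in URL_RE.findall(external_text):
        key = normalize(url)
        if key not in seen:
            seen.add(key)  # also de-duplicates within the external output
            fresh.append(url.rstrip(".,;"))
    return fresh
```

Anything this returns is still only a lead: each new URL goes through the same Phase B verification as any other candidate.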
What external tools find that automated search misses
- Government PDFs: NIST publications, congressional reports, agency toolkits. These tools can browse and read PDFs that WebSearch cannot index.
- Patents: Google Patents searches with natural language are more effective through browsing tools than through API queries.
- Non-English coverage: Deep research tools handle multilingual searches better and find content on platforms (Bilibili, Tistory, ichi.pro) that site:-scoped web search misses.
- Older coverage: Blog posts and tutorials from 2018-2020 that have fallen out of search engine rankings but are still live.
What they get wrong
- Hallucinated citations: A deep research tool may claim a PDF contains your tool name when it does not. Always verify Tier 0 claims manually.
- Name collisions: "Aegis" matches many unrelated projects. "BOND" matches biology papers. Verification is mandatory.
- Stale or broken links: Some URLs returned may be dead. Check before adding to the audit.
Cross-skill: citation-audit integration
Bibliometric citation-affiliation audit (formerly Dimension 9 inside this skill) now lives in its own skill at skills/citation-audit/SKILL.md (slash command: /citation-audit). The two skills divide the external-impact landscape:
- news-search (this skill): editorial coverage. Press, blogs, government PDFs, ecosystem, deep-research-tool output. Output: news-coverage-audit.md.
- [[citation-audit]]: bibliometric coverage. Citing-paper author affiliations via OpenAlex and Dimensions Analytics. Output: citation-affiliation-audit.md.
The "Full audit" mode of this skill hooks citation-audit results into the editorial report so the final news-coverage-audit.md captures both sides of impact evidence. The hook is freshness-gated, not always-on, because a full citation audit takes 30-80 minutes and may not be appropriate every time news-search runs.
Hook procedure
When running the "Full audit" Run Mode, perform the following before writing news-coverage-audit.md:
- Check whether citation-affiliation-audit.md exists at the project root.
- If missing: tell the user "Citation affiliation audit has never run on this project; recommend running /citation-audit --source both (or --source openalex if no Dimensions credentials) before the news-search full audit." Do not auto-invoke the long citation audit without confirmation.
- If fresh (mtime within the last 30 days): copy citation-affiliation-audit.md's Tier 0 and Tier 1 tables verbatim into news-coverage-audit.md under a ## Citation Affiliation Evidence (integrated from citation-audit) section. Specifically:
  - Reproduce the full Tier 0 table (all rows), with the same Category | Institution | Country | Your Work Cited | Citing Paper | Year | Source columns.
  - Reproduce the full Tier 1 table (all rows), same columns.
  - Reproduce the Summary by Institution subsection.
  - Reproduce the per-source Coverage subsections (so the freshness gate and source coverage stay visible in the merged report).
  - Add a one-line freshness stamp at the top of the section citing the audit's generation date.
  - Add a link back to citation-affiliation-audit.md for the canonical separate copy.
- If stale (mtime older than 30 days): integrate the same content, prefix the section header with (stale, regenerate via /citation-audit), and surface the staleness to the user. Do not silently truncate.
The hook never re-runs the citation audit silently; that decision belongs to the user. The hook only reads the existing file and integrates its full content into the unified report.
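The freshness gate reduces to a three-way check on the file's modification time. A minimal sketch, assuming the 30-day window is inclusive and that the caller passes the current time explicitly (which keeps the check easy to test); the function names and the returned state strings are illustrative.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

FRESH_WINDOW = timedelta(days=30)


def file_mtime(path):
    """Return the file's mtime as an aware UTC datetime, or None if missing."""
    p = Path(path)
    if not p.exists():
        return None
    return datetime.fromtimestamp(p.stat().st_mtime, tz=timezone.utc)


def hook_state(mtime, now):
    """Classify the citation audit file as 'missing', 'fresh', or 'stale'."""
    if mtime is None:
        return "missing"  # recommend /citation-audit; do not auto-invoke
    if now - mtime <= FRESH_WINDOW:
        return "fresh"  # integrate tables verbatim
    return "stale"  # integrate, mark the section stale, tell the user
```

A caller might use hook_state(file_mtime("citation-affiliation-audit.md"), datetime.now(timezone.utc)). In all three states the function only reports; it never triggers the citation audit itself.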
Why full integration, not summary
news-coverage-audit.md is the single artifact a tenure / promotion reader or grant reviewer scans for external-impact evidence. Forcing them to follow a link into a separate citation-affiliation-audit.md (which they may not realize exists) creates a gap in the impact story. Truncating to "top 10 hits" loses the long-tail Tier 1 evidence that often matters most for broader-impact narratives (e.g., a niche citation from Capital One or Mayo Clinic adds new domain breadth that a top-10 list might drop). Embedding the full tables verbatim keeps the integrated report self-contained while the standalone citation-affiliation-audit.md remains canonical for incremental re-runs.
Why the hook is one-way
The hook is news-search reading citation-audit's output, not citation-audit calling into news-search. This keeps the two skills independent: the citation-audit pipeline (OpenAlex / Dimensions DSL, Tier 0 / Tier 1 regex patterns, per-source coverage merging) does not need to know anything about editorial-coverage discovery. The single integration surface is the citation-affiliation-audit.md file, which both skills agree on as the contract.