$pwd:

news-search

Name: News Search
Author: yzhao062

// Systematic web search for news, media, policy, and industry coverage of FORTIS Lab publications, tools, and research. Use when the user asks for a news audit, media coverage check, broader impact evidence, or visibility search for their work.


$ git log --oneline --stat

stars:8

forks:26

updated: 28 May 2026, 21:34



package.json

"author": "yzhao062"

"repository": "yzhao062/yzhao062.github.io"


name	news-search
description	Systematic web search for news, media, policy, and industry coverage of FORTIS Lab publications, tools, and research. Use when the user asks for a news audit, media coverage check, broader impact evidence, or visibility search for their work.

News & Media Coverage Search

Systematically search for external coverage of Yue Zhao / FORTIS Lab work across all outlet types: tech press, security press, business press, government/policy, industry analysts, science press, AI newsletters, and universities.

When to Use

Periodic audit of media coverage (quarterly recommended)
Before tenure/promotion materials
Before grant applications requiring "broader impact" evidence
After major paper acceptances or tool releases
When updating the website news section

Inputs

Read these files before starting:

data/publications.json — all papers (titles, venues, years, links)
data/open-source.json — all tools/libraries (names, stars, URLs)
news-coverage-audit.md — previous audit results (skip known items, update stale entries)

Execution Model

If parallel workers and web search are available, run dimensions in parallel; otherwise process dimensions sequentially. Batch queries conservatively to stay within tool rate limits. Read these reference files before starting:

references/search-queries.md: query bank (not exhaustive; see triage rules below)
references/outlet-registry.md: outlet classification and site: domain lists
references/search-strategy.md: techniques for finding indirect coverage, when to persist vs. stop, name disambiguation, Cloudflare/SSR-shell fetch tactics, and the Phase B "snippet alone is not verified" rule
references/candidate-schema.md: Phase A and Phase B candidate record contract
references/disclaimer-patterns.md: AI-generated, aggregator, translation, templated-database, and blocked-page detection
references/domain-registry.md: seed domains by source class and outlet_class values for Phase A
references/disambiguation-registry.md: cumulative tool-name and person-name collision rules + verified-negative leads from prior rounds (consult before counting any borderline match)
scripts/pdf_term_scan.py: PyMuPDF-based FORTIS-term scanner with built-in false-positive filters; canonical Phase B PDF-deep-search tool. Run as python skills/news-search/scripts/pdf_term_scan.py <pdf_path>.

Run the two-phase pipeline described below. Phase A gathers candidates into news-search-candidates.jsonl and does not edit news-coverage-audit.md. Phase B verifies each candidate, classifies the survivors, records dropped candidates in the candidate file, and writes only kept rows to news-coverage-audit.md using the tier structure in the Output section. If news-coverage-audit.md does not yet exist, Phase B creates it with the full tier structure and negative-results table as a fresh audit.

Query Bank Triage Rules

The query bank in references/search-queries.md is curated, not exhaustive. It covers high-adoption tools and papers with known media hooks.

Full audit mode: search EVERY paper and tool. Read data/publications.json and data/open-source.json and generate at least one smart-keyword search (Dimension 5) for every single entry. Do not skip any paper or tool. Use distinctive claims or method names, not exact titles. This is critical because coverage can appear for any paper, not just high-profile ones (e.g., COPOD has a dedicated book chapter, GLIP-OOD has a tech blog feature, the computing resources paper influenced CVPR policy).

For items not in the query bank, generate queries at runtime:

Tools with 500+ GitHub stars: add dedicated Dimension 2 queries
All papers at top venues from the current or prior year: add Dimension 5 smart-keyword entries
Older papers and low-star tools: still search with at least one Dimension 5 smart-keyword query each
Preprints: search with distinctive claim keywords
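
The triage rules above can be sketched as a small planning function. This is a minimal illustration, not part of the skill: the item fields `kind` and `stars` and the example entries are hypothetical, since the actual layout of data/publications.json and data/open-source.json is not shown here.

```python
from typing import Dict, List

STAR_THRESHOLD = 500  # from the rule "Tools with 500+ GitHub stars"


def plan_dimensions(item: Dict) -> List[str]:
    """Return the dimensions that should get runtime queries for one item.

    `item` is a hypothetical record with a `kind` ("tool" or "paper") and,
    for tools, a `stars` count.
    """
    dims = ["D5"]  # every paper and tool gets at least one smart-keyword query
    if item.get("kind") == "tool" and item.get("stars", 0) >= STAR_THRESHOLD:
        dims.insert(0, "D2")  # high-adoption tools also get dedicated D2 queries
    return dims


# Illustrative inventory only; names and star counts are made up.
items = [
    {"name": "example-tool-a", "kind": "tool", "stars": 1200},
    {"name": "example-tool-b", "kind": "tool", "stars": 40},
    {"name": "example-preprint", "kind": "paper"},
]
plan = {it["name"]: plan_dimensions(it) for it in items}
```

Because every item ends up with at least a D5 entry, the plan doubles as a checklist for the "do not skip any paper or tool" requirement of full audit mode.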

Pipeline: Two-Phase Output

Each audit runs in two phases. Phase A discovers candidates without classifying them. Phase B reads each candidate, runs verification checks, and assigns a tier. Splitting the two lets the candidate list be reviewed before any classification work commits to a row in the audit ledger.

Phase A: Candidate Gathering

Run news-search dimensions D1-D8 plus D10 and emit candidates as JSON-Lines records to news-search-candidates.jsonl at the project root. Handle citation-affiliation evidence through the freshness-gated [[citation-audit]] hook (see "Cross-skill: citation-audit integration" below), not as a Phase A candidate dimension. Do not assign tiers yet. Do not write to news-coverage-audit.md yet.

Each candidate carries the schema in references/candidate-schema.md: URL, title, snippet, surfacing query, outlet class, fetch timestamp, plus empty placeholders for the Phase B fields (flags, direct-mention, tier, notes).
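
A Phase A record might look like the sketch below. The authoritative contract is references/candidate-schema.md, which is not reproduced here, so the key names `url`, `title`, `snippet`, `query`, and `fetched_at` are assumptions; `outlet_class`, `flags`, `direct_mention`, `tier`, and `notes` follow the names used elsewhere in this document.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Candidate:
    # Phase A fields (key names are assumptions, see lead-in)
    url: str
    title: str
    snippet: str
    query: str          # the query that surfaced this candidate
    outlet_class: str
    fetched_at: str     # ISO 8601 timestamp
    # Phase B placeholders, left empty during Phase A
    flags: List[str] = field(default_factory=list)
    direct_mention: Optional[str] = None
    tier: Optional[str] = None
    notes: str = ""


def to_jsonl_line(candidate: Candidate) -> str:
    """Serialize one candidate as a single JSON-Lines record (no newline)."""
    return json.dumps(asdict(candidate), ensure_ascii=False)


# Illustrative record; the URL and values are placeholders.
example = Candidate(
    url="https://example.org/article",
    title="Example article",
    snippet="...",
    query='site:example.org "anomaly detection"',
    outlet_class="tech-press",
    fetched_at=datetime(2026, 5, 7, tzinfo=timezone.utc).isoformat(),
)
line = to_jsonl_line(example)
```

Appending `line + "\n"` to news-search-candidates.jsonl keeps the file one record per line, so Phase A can be re-run and the output re-read incrementally.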

When Phase A completes, present the candidate count grouped by dimension and outlet class to the user. The user (or a reviewer such as Codex) can scan the candidate list and flag wrong query routing or wrong outlet-class tagging before Phase B starts. This is the cheap, parallelizable stage; treat it as re-runnable.
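
One way to produce that grouped count is sketched here. It assumes each JSON-Lines record carries a `dimension` key next to `outlet_class`; the `dimension` key is hypothetical and should be aligned with references/candidate-schema.md.

```python
import json
from collections import Counter
from typing import Iterable


def count_candidates(lines: Iterable[str]) -> Counter:
    """Count candidates per (dimension, outlet_class) pair from JSONL lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the scratch file
        rec = json.loads(line)
        key = (rec.get("dimension", "unknown"), rec.get("outlet_class", "unknown"))
        counts[key] += 1
    return counts


# Illustrative input; in practice, pass the open candidates file instead.
sample = [
    '{"dimension": "D3", "outlet_class": "security-press"}',
    '{"dimension": "D3", "outlet_class": "security-press"}',
    '{"dimension": "D8", "outlet_class": "gov-pdf"}',
    "",
]
summary = count_candidates(sample)
```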

Add news-search-candidates.jsonl to .git/info/exclude (local, untracked) before the first run so git add -A does not stage scratch output.
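
A minimal, idempotent way to add that entry is sketched below. The helper name `add_to_git_exclude` is hypothetical; it only appends a line to .git/info/exclude and assumes `.git` is a regular directory (not a worktree or submodule pointer file).

```python
from pathlib import Path


def add_to_git_exclude(repo_root: Path, pattern: str) -> bool:
    """Append `pattern` to .git/info/exclude unless it is already listed.

    Returns True if the file was changed, False if the pattern was present.
    """
    git_dir = repo_root / ".git"
    if not git_dir.is_dir():
        raise FileNotFoundError(f"{git_dir} is not a directory")
    exclude = git_dir / "info" / "exclude"
    exclude.parent.mkdir(exist_ok=True)
    text = exclude.read_text(encoding="utf-8") if exclude.exists() else ""
    if pattern in text.splitlines():
        return False
    # Keep existing content intact and make sure entries stay one per line.
    separator = "" if (not text or text.endswith("\n")) else "\n"
    exclude.write_text(text + separator + pattern + "\n", encoding="utf-8")
    return True
```

Calling `add_to_git_exclude(Path("."), "news-search-candidates.jsonl")` from the project root before the first run keeps the scratch file local and untracked.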

Phase B: Verify and Classify

For each candidate in news-search-candidates.jsonl, fetch the page and apply five checks in order:

Pre-tier filter: first-party / already-tracked / disambiguation drops. Before running the citation rule, drop the candidate if it matches any of these patterns (each one was encountered during the 2026-05-07 round):
- First-party hosting on the PI's current or prior institution (e.g., the PI's CMU PhD-era profile, an NSF PAR record of the PI's own grant output, a journal mirror of the PI's own paper).
- Already-tracked award URL — the canonical landing page for an award already recorded in Ledger 5.
- Coauthor-institution publication listing — a bare research-listing page on a coauthor institution's site (Microsoft Research, Adobe Research, etc.) that is not editorial; demote to Ledger 3.
- Name-collision drop — the match is on a different person ("Yue Zhao" → Yuchen / Siyan / Qingyue / W. / D. Zhao) or a different project ("Aegis" → Forrester AEGIS / NVIDIA Aegis / RedHat aegis-ai; "TrustLLM" → trustllm.eu; "TDC" → TDCJ / J&J Therapeutics Discovery). Consult references/disambiguation-registry.md.
Direct-mention / topic-validation routing (the citation verification rule in the Output section). If the page names the work, person, lab, co-author, institution, or direct URL per one of clauses 1 to 6, fill direct_mention and continue as coverage. If it does not pass direct mention but clearly covers the same topic area, set tier: "topic-validation" and keep it for the Topic Validation appendix, not a coverage ledger. If it is neither direct coverage nor topic validation, set tier: "dropped" and record the drop reason in notes.

Snippet alone is not verified evidence for Tier 0 / Tier 1 candidates. WebSearch summaries can synthesize content that does not appear in the source (the 2026-05-07 round caught this with GAO-26-108695: snippet claimed TrustLLM citation; manual PDF extraction confirmed the PDF says nothing of the sort). Tier 0 / Tier 1 promotion requires direct fetch of the source — pdf_term_scan.py for PDFs, real-UA HTTP for web pages. If the source is gated and cannot be re-fetched, set tier_guess: phase_b_priority and leave as a candidate; do not count.
Disclaimer / aggregator detection (references/disclaimer-patterns.md). Run the regex sweep on fetched content. Set entries in the candidate's flags[] field. Hard caps:
- ai_generated and aggregator are capped at Tier 3 regardless of outlet domain.
- machine_translated is capped at Tier 3 unless editorial_translation is also set.
- paywall_or_blocked is held for manual verification, not classified from snippet alone.
Tier assignment per the tier structure in the Output section. Assign coverage tiers (Tier 0 through Tier 5) only to candidates that pass direct mention; topic-only candidates keep the tier: topic-validation value set during the direct-mention / topic-validation routing check.
Registry harvest status. For each kept coverage row, set registry_status to existing or new after checking the page's registered domain against references/domain-registry.md. Leave registry_status empty on dropped and topic-validation rows.
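
The hard caps in the disclaimer / aggregator check can be expressed as a small function. This is a sketch under one assumption: tiers are held as integers 0 to 5 where a lower number is a higher tier, so "capped at Tier 3" means the result is never numerically below 3. The function name is hypothetical.

```python
from typing import Iterable, Optional


def apply_flag_caps(proposed_tier: int, flags: Iterable[str]) -> Optional[int]:
    """Apply the flag-based caps to a proposed coverage tier.

    Returns None when the candidate must be held for manual verification.
    """
    flag_set = set(flags)
    if "paywall_or_blocked" in flag_set:
        return None  # held, not classified from the snippet alone
    capped = "ai_generated" in flag_set or "aggregator" in flag_set
    if "machine_translated" in flag_set and "editorial_translation" not in flag_set:
        capped = True
    # A cap only ever demotes: Tier 4 or 5 candidates keep their own tier.
    return max(proposed_tier, 3) if capped else proposed_tier
```

For example, an aggregator page on a mainstream outlet domain that would otherwise be Tier 1 comes back as Tier 3, while an editorially translated article keeps its proposed tier.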

Phase B writes direct-coverage rows (Tier 0 through Tier 5) to the coverage ledgers, topic-only rows (tier == 'topic-validation') to the Topic Validation appendix, and keeps dropped rows (tier == 'dropped') in news-search-candidates.jsonl for auditability. The full candidates file stays at the project root through the audit so a reviewer can audit drop decisions, not only the kept rows.

Domain Registry and Post-Round Harvest

references/domain-registry.md lists known high-value source classes (gov / policy PDFs, EU research projects, patents, China tech media, security research blogs, AI-newsletter aggregators, and others) with seed domains. Phase A queries fan out to seeded domains in addition to the open dragnet, never instead of it. The registry is a recall floor, not a filter.

After each audit, harvest the domains of every confirmed Phase B hit and append new ones to the registry under the appropriate class. If no class fits, create one (lowercase-hyphenated name). This is the only way the registry stays current as new outlet types appear; without it, the registry freezes and re-discovery cost recurs.
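
The harvest step could be sketched as follows. As a simplification, this compares hostnames with a leading `www.` stripped rather than true registered domains (which would need a public-suffix list), so subdomains show up as separate entries; the function names are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Lowercased hostname without a leading 'www.' (approximation only)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def harvest_new_domains(confirmed_urls: Iterable[str],
                        known_domains: Iterable[str]) -> List[str]:
    """Return hosts of confirmed hits that are not yet in the registry."""
    known = {d.lower() for d in known_domains}
    new: List[str] = []
    for url in confirmed_urls:
        host = host_of(url)
        if host and host not in known and host not in new:
            new.append(host)  # preserve first-seen order for the report
    return new
```

The returned list is what would be appended to references/domain-registry.md under the matching class; an empty list is still worth reporting so the harvest step stays visible.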

Once a quarter, run a registry-disabled pass (open dragnet only) to surface new outlet classes the registry has not seen yet. This is what catches the next surprise category.

Dimension 1: Person & Lab

Find coverage that names the PI or lab, regardless of which paper or tool.

Search for: name + university + research area + various contexts (news, interview, podcast, keynote, expert quote, award, fellowship, grant). Also search for lab name and industry partner names (Amazon, NVIDIA, Google, Meta, Anthropic, NSF).

See references/search-queries.md § Dimension 1 for the full query list.

Dimension 2: Tools in Non-Academic Contexts

Major tools (PyOD, TrustLLM, agent-audit, Aegis, ADBench) may appear in industry deployments, government reports, textbooks, or enterprise case studies without naming the PI.

Search for: each tool name + context keywords (enterprise, deployment, production, fraud detection, cybersecurity, government, NIST, federal, textbook, course, patent, Walmart, NASA, Tesla).

See references/search-queries.md § Dimension 2 for the full query list.

Dimension 3: Outlet Sweep

Systematically check each outlet category using site: filters. This is the most important dimension for finding coverage the other dimensions miss.

Categories: security press, business press, top tech press, AI newsletters, science press, government/policy, industry analysts, university/institutional press, developer community.

Generation rule: references/search-queries.md § Dimension 3 provides base queries for the highest-priority outlets. For any outlet domain listed in references/outlet-registry.md that does not have an explicit query in the query bank, generate one at runtime using this template: site:{domain} "anomaly detection" OR "AI auditing" OR "AI agent security" OR PyOD OR TrustLLM. This ensures every registered outlet is checked without requiring the query bank to enumerate all 70+ domains.
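
The generation rule amounts to filling the template for every registered domain that lacks an explicit query. A minimal sketch, assuming the two domain lists have already been parsed out of references/outlet-registry.md and references/search-queries.md (the parsing itself is not shown and the function name is hypothetical):

```python
from typing import Iterable, List

# Template copied from the generation rule above.
TEMPLATE = ('site:{domain} "anomaly detection" OR "AI auditing" OR '
            '"AI agent security" OR PyOD OR TrustLLM')


def fallback_queries(registry_domains: Iterable[str],
                     domains_with_explicit_query: Iterable[str]) -> List[str]:
    """Build one template query per registered domain not in the query bank."""
    covered = set(domains_with_explicit_query)
    return [TEMPLATE.format(domain=d) for d in registry_domains if d not in covered]


# Illustrative domains only.
queries = fallback_queries(["example.com", "example.org"], ["example.com"])
```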

Dimension 4: Topic Proximity

Search for the broader trending topic and check if the work appears within coverage. This catches indirect coverage where the paper is relevant but not cited by name.

Examples: ChatGPT geolocation trend (connects to DoxBench), OpenClaw security crisis (connects to agent-audit), OWASP agentic AI landscape (connects to agent-audit/Aegis), anomaly detection open-source landscape (connects to PyOD).

See references/search-queries.md § Dimension 4 for the full query list.

Dimension 5: Smart Paper Search

For papers where exact title search fails (most papers), use distinctive result keywords or striking claims from the paper instead.

Examples: "surpassed human performance 95.33% CLADDER" for the causal reasoning paper, "defense training breaks LLM agents 47-77% benign task failure" for The Autonomy Tax.

See references/search-queries.md § Dimension 5 for the full mapping table.

Dimension 6: Citation & Downstream Impact

Track high-level citation metrics, appearances in high-impact journals (Nature, Science), enterprise adoption evidence, and downstream tools built on the work.

See references/search-queries.md § Dimension 6 for the full query list.

Dimension 7: Education, Ecosystem & Global

Search the surfaces where widely-adopted tools spread beyond academic papers and news: education platforms, code ecosystems, non-English press, and developer communities.

Education & courses: Kaggle notebooks, Google Colab examples, Coursera/edX/Udemy course materials, university syllabi, YouTube/Bilibili tutorials, O'Reilly/Manning learning paths
Code ecosystem: GitHub code dependents (repos that import PyOD), PyPI/conda-forge download stats pages, Papers with Code tool listings, Hugging Face Spaces built on your tools
Dissertations & theses: ProQuest, Google Scholar thesis search, university repository searches
Non-English coverage: Chinese tech press (InfoQ CN, CSDN, Zhihu, WeChat public accounts), Japanese (Qiita, Zenn), Korean (Tistory, Velog), European (Heise, Le Monde Informatique, etc.)

See references/search-queries.md § Dimension 7 Education/Ecosystem for queries.

Dimension 8: PDF Deep Search (Government, Think Tank, Industry Reports)

This is critical and cannot be skipped. Web search does not index the text inside PDFs from government reports, congressional testimony, think tank whitepapers, and industry reports. Citations of your work inside these documents are the highest-impact coverage (Tier 0) and are routinely missed by Dimensions 1-6.

Strategy

Identify candidate PDFs — search for government/think tank reports on topics your work addresses (AI agent security, anomaly detection, LLM trustworthiness, AI auditing). Collect the PDF URLs.
Fetch and search inside each PDF — download or fetch the PDF, extract text (via PyMuPDF, pdftotext, or the WebFetch tool), and search for: tool names (PyOD, TrustLLM, Aegis, agent-audit, etc.), paper titles, author names ("Yue Zhao", "Zhao et al."), arXiv IDs, and repo URLs.
Verify and classify — if found, note the exact page, footnote number, and surrounding context.
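
The fetch-and-search step can be sketched as below. This is not the repository's scripts/pdf_term_scan.py (whose filters are not shown here); it is a minimal stand-in that scans already-extracted page text for whole-term matches and records the page number and surrounding context. `pdf_pages` assumes PyMuPDF is installed and is only needed when reading a real PDF.

```python
import re
from typing import Dict, Iterable, List


def scan_pages(pages: Iterable[str], terms: Iterable[str],
               context: int = 60) -> List[Dict[str, object]]:
    """Find whole-term, case-insensitive matches and report page + context."""
    patterns = [
        (t, re.compile(r"(?<!\w)" + re.escape(t) + r"(?!\w)", re.IGNORECASE))
        for t in terms
    ]
    hits: List[Dict[str, object]] = []
    for page_no, text in enumerate(pages, start=1):  # 1-based page numbers
        for term, pattern in patterns:
            for m in pattern.finditer(text):
                start = max(0, m.start() - context)
                end = min(len(text), m.end() + context)
                snippet = " ".join(text[start:end].split())  # collapse whitespace
                hits.append({"page": page_no, "term": term, "context": snippet})
    return hits


def pdf_pages(path: str) -> List[str]:
    """Extract per-page text with PyMuPDF (imported lazily)."""
    import fitz  # PyMuPDF
    with fitz.open(path) as doc:
        return [page.get_text() for page in doc]
```

Usage would be `scan_pages(pdf_pages("report.pdf"), ["TrustLLM", "PyOD", "Yue Zhao"])`, where `report.pdf` is a placeholder path. A hit still needs a human read of the context, since names such as "Aegis" collide with unrelated projects.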

High-priority PDF sources to search

U.S. Government (highest priority):

U.S. Senate committee reports — HSGAC, Commerce, Judiciary, Armed Services (AI-related)
U.S. House committee reports — Science, Homeland Security, Financial Services
NIST special publications — AI RMF updates, AI agent security, AI 100-series
GAO reports — AI technology assessments, Science & Tech Spotlight series
CRS reports — Congressional Research Service AI analyses
Federal agency AI strategies — DOD (JAIC/CDAO), DOE national labs, HHS, Treasury/OCC, SEC, CFTC, Federal Reserve
White House — AI executive orders, OMB memoranda, OSTP reports, CEA reports
NSF — program solicitations, dear colleague letters mentioning anomaly detection or AI safety

International Government (high priority):

EU — AI Act impact assessments, ENISA reports, EU AI Office publications
UK — AI Safety Institute reports, DSIT AI regulation papers, Alan Turing Institute policy briefs
Canada — ISED AI strategy, Canadian Centre for Cyber Security
Australia — Department of Industry AI reports, eSafety Commissioner
Singapore — IMDA Model AI Governance Framework
OECD — AI Policy Observatory reports, OECD AI Principles implementation documents
UN — UNESCO AI ethics recommendations, ITU AI reports
G7/G20 — Hiroshima AI Process documents, AI governance communiques

Think tanks & policy institutes:

Brookings, RAND, CSET Georgetown, Stanford HAI, FLI, CAIS, Partnership on AI
Center for Data Innovation, Information Technology and Innovation Foundation (ITIF)
Centre for International Governance Innovation (CIGI)

Foundation model companies (Tier 0 if they cite your work):

OpenAI — system cards (GPT-4, GPT-5, o1, o3), safety reports, preparedness framework documents, red teaming reports
Anthropic — model cards, responsible scaling policy documents, safety research reports
Google DeepMind — technical reports, Gemini system cards, safety evaluations
Meta AI — Llama model cards, system cards, responsible use guides
Mistral — model documentation, technical reports
xAI — Grok system cards and technical reports
Cohere — model cards, safety documentation
Microsoft — Phi model cards, responsible AI reports, Azure AI safety documentation
Amazon — Titan model documentation, AWS AI safety reports

These companies publish system cards and safety evaluations as PDFs or long-form web pages that reference academic benchmarks (TrustLLM, HELM, etc.) and tools. They also publish blog posts with embedded citations. Search both the HTML pages and any linked PDFs.

Standards bodies:

ISO/IEC (AI standards series), IEEE SA, OWASP (agentic AI PDFs)
MITRE ATLAS documentation

Industry whitepapers & analyst reports (Tier 0 if they cite your work by name):

McKinsey, Deloitte, PwC, Accenture, EY, KPMG, BCG, Bain
Gartner, Forrester, IDC research reports

Known citations found via this dimension

U.S. Senate HSGAC — "Hedge Fund Use of Artificial Intelligence" (Jun 2024), footnote 119 cites TrustLLM on page 25
FLI AI Safety Index — Winter 2025 PDF uses TrustLLM as an official benchmark

Why web search misses these

Government PDFs are hosted as static files (e.g., .senate.gov/wp-content/uploads/...pdf). Web search engines often index the hosting page but not the text inside the PDF. A search for site:senate.gov TrustLLM can return nothing when the word "TrustLLM" appears only inside the PDF, not on any HTML page. The reliable way to find these is to identify candidate documents by topic, then search inside the PDFs directly.

Output

Write all results to news-coverage-audit.md at the project root.

Citation Verification Rule

An item only counts as coverage if the article names or cites at least one of:

A specific paper title or tool name (PyOD, TrustLLM, Aegis, agent-audit, etc.)
The PI by name ("Yue Zhao")
The lab ("FORTIS")
A co-author by name in the context of the specific paper/tool
An institutional attribution ("researchers from USC", "a USC team") in the context of the specific paper/tool
A direct link to the project URL, repo, or arXiv paper

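Read the article or its snippet to verify before including it; for Tier 0 / Tier 1 candidates a snippet alone is not sufficient, and the source itself must be fetched and checked. Items attributed to co-authors or institutions should note this in the entry (e.g., "names: first author X, USC affiliation; Yue Zhao is co-author").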

Items that only cover the same topic your work addresses (e.g., "AI agent security is important" without naming your tools) are not coverage. These may be useful as context for grant narratives but must be placed in a separate "Topic Validation (Not Direct Coverage)" appendix, clearly marked as not naming your work.
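
A first-pass check for the name-based clauses of the rule could look like this sketch. It only covers clauses that reduce to a literal term (tool or paper names, the PI, the lab, direct URLs); co-author and institutional attributions depend on context and are left to manual review. The watchlist contents and the function name are illustrative assumptions.

```python
import re
from typing import Dict, List, Optional


def _mentions(term: str, text: str) -> bool:
    """Whole-term, case-insensitive match (so 'FORTIS' skips 'fortissimo')."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, re.IGNORECASE) is not None


def direct_mention(text: str, watchlist: Dict[str, List[str]]) -> Optional[str]:
    """Return 'clause: term' for the first term found, or None if none match."""
    for clause, terms in watchlist.items():
        for term in terms:
            if _mentions(term, text):
                return f"{clause}: {term}"
    return None


# Illustrative watchlist; extend it from the publication and tool inventories.
WATCHLIST = {
    "tool_or_paper": ["PyOD", "TrustLLM", "agent-audit"],
    "person": ["Yue Zhao"],
    "lab": ["FORTIS"],
}
```

A `None` result does not mean the item is dropped: it may still belong in the Topic Validation appendix, and a positive result still needs the name-collision checks before it counts.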

Tier Structure

All tiers below require the citation verification rule above. If a result does not name or cite your work, it does not belong in any tier.

Tier	Definition	Examples
Tier 0	(a) Government reports (U.S. or international: legislative, executive, federal/national agency), international body reports (OECD, UN, EU), or official standards documents; (b) Technical reports, system cards, safety reports, or model cards from major foundation model companies (OpenAI, Anthropic, Google DeepMind, Meta AI, Mistral, xAI, Cohere, etc.); (c) Major consulting/analyst firm reports (McKinsey, Gartner, Forrester, Deloitte, etc.) — all that cite your work by name	U.S. Senate report citing TrustLLM, OpenAI system card citing TrustLLM, Anthropic safety report citing anomaly detection benchmark, Gartner report citing PyOD
Tier 1	Mainstream tech/business/security press, major research institutions (national labs, Hoover, Microsoft Research), or high-impact policy reports that name your work	FLI AI Safety Index using TrustLLM, LLNL article naming TrustLLM, Nature Biotechnology citing DrugAgent
Tier 2	Industry press, institutional PR, or dedicated features that name your work	"DrugAgent" in MarkTechPost, "Yue Zhao" in USC Viterbi News, Databricks blog naming PyOD
Tier 3	Dedicated blog posts, tutorials, or platform integrations naming your tool	KDnuggets PyOD tutorial, Databricks Kakapo built on PyOD, DEV Community Aegis post
Tier 4	Awards, recognitions, encyclopedia entries	Amazon RA, NVIDIA Grant, Grokipedia entry
Tier 5	Academic community only (Hugging Face, alphaXiv, etc.)	Paper pages, GitHub stars, Moonlight reviews

Separate appendix (not a tier):

Topic Validation — articles covering the same topic area without naming your work. Useful for grant narratives ("our research addresses concerns raised in McKinsey's 2026 report on agentic AI security") but not website news items.

Tier 0(b) extension: foundation-model-company careers pages

A first-party foundation-model-company job posting that names a FORTIS tool as expected operational tooling (e.g., the OpenAI "Technical Intelligence Analyst" Qualifications block naming PyOD as anomaly-detection tooling) qualifies as Tier 0(b)-equivalent only when all of the following hold:

First-party host. The canonical URL is the company's own careers domain (openai.com/careers/..., anthropic.com/jobs/..., deepmind.google/careers/..., ai.meta.com/careers/..., etc.), not a Greenhouse / Lever / Ashby / DFJ Growth / Glassdoor / LinkedIn / Indeed mirror. ATS mirrors are kept under mirrors[] in the candidate record but never count as the load-bearing citation.
Tool named as operational tooling, not background literature. The mention sits in Qualifications, Responsibilities, or Tech Stack as a tool the hire is expected to use, not in a "see also" or "related work" footnote.
Durable snapshot exists. A Wayback Machine archive URL OR a committed local sidecar pair (HTML + PDF in news-snapshots/<slug>-<YYYY-MM-DD>.{html,pdf}) is in the repo, with a Markdown index file documenting the live URL, capture date, verification method, and verbatim quote. Sidecars must be captured from a logged-in browser session when the live URL is behind Cloudflare; PDF must be re-verified with python skills/news-search/scripts/pdf_term_scan.py <pdf_path>.

When all three hold, the candidate goes into Ledger 1 (Government/Policy citations) under Tier 0(b) with a Source URLs row that includes the live URL, mirror URLs, and the snapshot index path. The #8g precedent (news-snapshots/openai-careers-technical-intelligence-analyst-2026-05-07.md) is the reference shape; new entries follow that index format.
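
The snapshot naming convention and the durable-snapshot condition can be sketched as follows. This reflects one reading of the rule, namely that the Markdown index file is required in both the Wayback and the local-sidecar case; the function names and the `repo_files` argument (a collection of repo-relative paths) are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, Optional


def snapshot_paths(slug: str, capture_date: str) -> Dict[str, str]:
    """Build news-snapshots/<slug>-<YYYY-MM-DD>.{html,pdf,md} paths."""
    date.fromisoformat(capture_date)  # raises ValueError on a malformed date
    stem = f"news-snapshots/{slug}-{capture_date}"
    return {"html": stem + ".html", "pdf": stem + ".pdf", "index": stem + ".md"}


def has_durable_snapshot(slug: str, capture_date: str,
                         repo_files: Iterable[str],
                         wayback_url: Optional[str] = None) -> bool:
    """True if the index exists and either an archive URL or both sidecars do."""
    paths = snapshot_paths(slug, capture_date)
    files = set(repo_files)
    if paths["index"] not in files:
        return False
    sidecar_pair = paths["html"] in files and paths["pdf"] in files
    return bool(wayback_url) or sidecar_pair
```

When this returns False for a reachable careers page, the candidate stays on hold rather than being promoted to Ledger 1.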

If the live URL is reachable but no snapshot exists yet, set tier_guess: phase_b_priority and status: paywall_or_blocked (or candidate with a snapshot-pending note in notes). Do not promote to Ledger 1 from a snippet alone — careers pages go stale within weeks of the role being filled, so an unsnapshotted Tier 0(b) claim becomes unverifiable as soon as OpenAI / Anthropic rotates the URL.

Non-FM-co careers pages (Wells Fargo, Capital One, JPMC, Pfizer, Goldman, etc.) follow the same snapshot-or-hold rule but classify under Ledger 3 (ecosystem adoption — enterprise operational adoption evidence), not Ledger 1 / Tier 0(b). Only foundation-model companies get the Tier 0(b) lift; the rationale is that FM-co operational tooling decisions are themselves treated as authoritative signal in the way GAO / NIST PDFs are. A non-FM enterprise JD naming a tool is operational adoption evidence comparable to a code import or vendor whitepaper, which is Ledger 3 territory; it is not third-party media (Ledger 2) and it is not a government / FM-co citation (Ledger 1).

Required Sections in Output File

Coverage ledgers (separate counts for each):
- Government/Policy citations — Tier 0: government reports, foundation model system cards, standards documents, analyst reports that cite your work by name
- External media — Tier 1-2: third-party press, institutional features, dedicated blog posts by external authors
- Ecosystem adoption — Tier 3: books, podcasts, enterprise integrations, patents, tutorials, platform integrations by external parties
- First-party/community — self-authored blog posts, GitHub discussions, dataset hosting (not external coverage)
- Awards & recognitions — Tier 4: awards, fellowships, encyclopedia entries
Topic Validation appendix — articles that cover the same topic but do not name your work
Negative Results table — outlet types searched with no results (prevents re-searching)
Upcoming Opportunities — imminent conferences, journalist contacts from prior coverage
Summary Statistics — separate counts per ledger, not a single aggregate. Report: government/policy total, external media total, ecosystem total, first-party/community total, and awards total.
Coverage matrix — per-item appendix or CSV with one row per paper/tool, dimensions searched (D1-D8 plus D10; D9 is now handled by the standalone [[citation-audit]] skill), Phase A candidate count, and Phase B outcome (kept/topic-only/dropped/none). This makes the audit auditable.
Registry harvest summary — list of new domains added to references/domain-registry.md from this round's confirmed Phase B hits, grouped by class. Empty list is fine and should still be reported, so the harvest step stays visible across rounds.
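
The per-ledger Summary Statistics can be computed with a simple tier-to-ledger mapping, sketched below. Tiers 0 to 4 follow the ledger list above; mapping Tier 5 to first-party/community is an assumption, since the list does not assign that ledger a tier. The careers-page exceptions (for example non-FM-co postings going to Ledger 3) are not modeled.

```python
from collections import Counter
from typing import Dict, Iterable

LEDGERS = [
    "government/policy",
    "external media",
    "ecosystem adoption",
    "first-party/community",
    "awards & recognitions",
]

LEDGER_BY_TIER = {
    0: "government/policy",
    1: "external media",
    2: "external media",
    3: "ecosystem adoption",
    4: "awards & recognitions",
    5: "first-party/community",  # assumption, see lead-in
}


def ledger_totals(tiers: Iterable[int]) -> Dict[str, int]:
    """Count kept coverage rows per ledger; never returns a single aggregate."""
    counts = Counter(LEDGER_BY_TIER[t] for t in tiers)
    return {name: counts.get(name, 0) for name in LEDGERS}


# Illustrative tiers for six kept rows.
totals = ledger_totals([0, 1, 2, 2, 3, 4])
```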

Incremental Updates

When running a targeted search (not full audit), append new findings to the existing file. Do not overwrite previous results. Mark the date of each search pass.

Run Modes

Mode	When	Dimensions to run
Full audit	Once per semester, before portfolio updates	All 9 news-coverage dimensions (D1-D8 + D10) plus a freshness-gated hook into the standalone [[citation-audit]] skill (see "Cross-skill: citation-audit integration" below)
Targeted	After a specific paper acceptance or tool release	Dims 1, 3, 4, 5 scoped to that item, plus D8 when policy or PDF evidence is plausible
Quick check	Before grant submissions	Dims 1, 6, 8, 10 (person & lab, citations & impact, government PDFs, external deep research)
Topic monitor	When a trending topic connects to your work	Dim 4 only, focused on that topic
Ecosystem check	Before broader-impact statements	Dim 7 (education, code ecosystem, global)
PDF deep search	Before tenure materials or when a specific gov report is suspected	Dim 8 only, with candidate PDF list
Affiliation audit	Before tenure / promotion, after major citation milestones	Invoke the standalone `/citation-audit` skill (was D9 here before; split out as its own skill at `skills/citation-audit/SKILL.md`)
External deep research	After automated audit, as a complement pass	Dim 10 (external LLM deep research)
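
The table above can be encoded as a lookup so a run always starts from the right dimension list. A minimal sketch; the dictionary and function names are hypothetical, and the affiliation audit maps to an empty list because it is delegated to the standalone citation-audit skill.

```python
from typing import Dict, List

RUN_MODES: Dict[str, List[str]] = {
    "full audit": ["D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8", "D10"],
    "targeted": ["D1", "D3", "D4", "D5"],  # D8 added conditionally below
    "quick check": ["D1", "D6", "D8", "D10"],
    "topic monitor": ["D4"],
    "ecosystem check": ["D7"],
    "pdf deep search": ["D8"],
    "affiliation audit": [],  # handled by /citation-audit, not this skill
    "external deep research": ["D10"],
}


def dimensions_for(mode: str, pdf_evidence_plausible: bool = False) -> List[str]:
    """Return the dimensions to run for a mode from the Run Modes table."""
    key = mode.lower()
    dims = list(RUN_MODES[key])  # copy so callers cannot mutate the table
    if key == "targeted" and pdf_evidence_plausible:
        dims.append("D8")
    return dims
```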

Dimension 10: External LLM Deep Research

Purpose: Use external deep research tools (ChatGPT Deep Research, Gemini Deep Research, Claude on claude.ai, or similar) as a complement to the automated Dimensions 1-8. These tools have browsing capabilities, PDF reading, and search strategies that differ from Claude Code's WebSearch, and consistently find items the automated audit misses.

Why this matters: In practice, external deep research tools found the NIST AI 100-2e2025 citation, a third FLI AI Safety Index edition, 5 additional patents, and non-English coverage in Korean/German/Spanish that the automated audit missed entirely. These tools are not a substitute for the structured audit (they lack the systematic coverage and verification discipline), but they are a strong complement.

How to run

After completing Dimensions 1-8 and reading the freshness state of the standalone citation-audit hook, generate a self-contained prompt for external deep research tools. The prompt should include the full tool/paper inventory, what to search for, and the citation verification rule. See references/search-queries.md for the base query bank, but the prompt should be open-ended ("search broadly and creatively — I do not know where the coverage is").
Run the prompt in 1-3 external tools. Different tools have different search indices and browsing capabilities; running multiple increases coverage.
Save the raw output to external-research/ in the project root. Name each file by source and date: {source}-{YYYY-MM}.md (e.g., chatgpt-deep-research-2026-04.md, gemini-2026-04.md, claude-2026-04.md). Date the files so future runs know what was already searched and when. Running once per quarter is sufficient; monthly if a major release or conference just happened.
Diff the external findings against the existing audit. Use an agent to extract only genuinely new items (not already in news-coverage-audit.md).
For any Tier 0 claims (government, NIST, foundation model system cards), manually verify by opening the source PDF and searching for the tool name. External deep research tools hallucinate citations at a non-trivial rate.
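
The file-naming and diff steps above can be sketched as below. The diff here is a plain URL comparison, a simplification of "use an agent to extract only genuinely new items": it will miss items that are already in the audit under a different URL. Function names are hypothetical.

```python
import re
from typing import List

URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def output_name(source: str, year: int, month: int) -> str:
    """Build the {source}-{YYYY-MM}.md file name for external-research/."""
    return f"{source}-{year:04d}-{month:02d}.md"


def _urls(text: str) -> List[str]:
    # Strip trailing sentence punctuation that the regex may have captured.
    return [u.rstrip(".,;") for u in URL_RE.findall(text)]


def new_urls(external_text: str, audit_text: str) -> List[str]:
    """URLs in the external output that do not appear in the existing audit."""
    known = set(_urls(audit_text))
    fresh: List[str] = []
    for url in _urls(external_text):
        if url not in known and url not in fresh:
            fresh.append(url)
    return fresh
```

Every URL returned this way is only a lead: Tier 0 claims in particular still need the manual PDF check described in the last step.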

What external tools find that automated search misses

Government PDFs: NIST publications, congressional reports, agency toolkits. These tools can browse and read PDFs that WebSearch cannot index.
Patents: Google Patents searches with natural language are more effective through browsing tools than through API queries.
Non-English coverage: Deep research tools handle multilingual searches better and find content on platforms (Bilibili, Tistory, ichi.pro) that site:-scoped web search misses.
Older coverage: Blog posts and tutorials from 2018-2020 that have fallen out of search engine rankings but are still live.

What they get wrong

Hallucinated citations: A deep research tool may claim a PDF contains your tool name when it does not. Always verify Tier 0 claims manually.
Name collisions: "Aegis" matches many unrelated projects. "BOND" matches biology papers. Verification is mandatory.
Stale or broken links: Some URLs returned may be dead. Check before adding to the audit.

Cross-skill: citation-audit integration

The bibliometric citation-affiliation audit (formerly Dimension 9 inside this skill) now lives in its own skill at skills/citation-audit/SKILL.md (slash command: /citation-audit). The two skills divide the external-impact landscape:

news-search (this skill): editorial coverage. Press, blogs, government PDFs, ecosystem, deep-research-tool output. Output: news-coverage-audit.md.
[[citation-audit]]: bibliometric coverage. Citing-paper author affiliations via OpenAlex and Dimensions Analytics. Output: citation-affiliation-audit.md.

The "Full audit" mode of this skill hooks citation-audit results into the editorial report so the final news-coverage-audit.md captures both sides of impact evidence. The hook is freshness-gated, not always-on, because a full citation audit takes 30-80 minutes and may not be appropriate every time news-search runs.

Hook procedure

When running the "Full audit" Run Mode, perform the following before writing news-coverage-audit.md:

Check whether citation-affiliation-audit.md exists at the project root.
If missing: tell the user "Citation affiliation audit has never run on this project; recommend running /citation-audit --source both (or --source openalex if no Dimensions credentials) before the news-search full audit." Do not auto-invoke the long citation audit without confirmation.
If fresh (mtime within the last 30 days): copy citation-affiliation-audit.md's Tier 0 and Tier 1 tables verbatim into news-coverage-audit.md under a ## Citation Affiliation Evidence (integrated from citation-audit) section. Specifically:
- Reproduce the full Tier 0 table (all rows), with the same Category | Institution | Country | Your Work Cited | Citing Paper | Year | Source columns.
- Reproduce the full Tier 1 table (all rows), same columns.
- Reproduce the Summary by Institution subsection.
- Reproduce the per-source Coverage subsections (so the freshness gate and source coverage stay visible in the merged report).
- Add a one-line freshness stamp at the top of the section citing the audit's generation date.
- Add a link back to citation-affiliation-audit.md for the canonical separate copy.
If stale (mtime older than 30 days): integrate the same content, prefix the section header with (stale, regenerate via /citation-audit), and surface the staleness to the user. Do not silently truncate.

The hook never re-runs the citation audit silently; that decision belongs to the user. The hook only reads the existing file and integrates its full content into the unified report.
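The freshness gate in the hook procedure reduces to a three-way classification on the file's modification time. The sketch below assumes the 30-day window stated above; the function names and return strings are illustrative, not taken from the skill.

```python
import time
from pathlib import Path
from typing import Optional

FRESH_DAYS = 30  # freshness window stated in the hook procedure
SECONDS_PER_DAY = 86_400


def classify_age(mtime: Optional[float], now: float) -> str:
    """Return 'missing', 'fresh', or 'stale' for a file modification time."""
    if mtime is None:
        return "missing"
    age_days = (now - mtime) / SECONDS_PER_DAY
    return "fresh" if age_days <= FRESH_DAYS else "stale"


def citation_audit_state(path: Path = Path("citation-affiliation-audit.md")) -> str:
    """Classify the citation audit file without ever re-running the audit."""
    mtime = path.stat().st_mtime if path.exists() else None
    return classify_age(mtime, time.time())
```

Keeping the classification separate from the file read makes the gate easy to test, and the function only reports state: the decision to regenerate stays with the user.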

Why full integration, not summary

news-coverage-audit.md is the single artifact a tenure / promotion reader or grant reviewer scans for external-impact evidence. Forcing them to follow a link into a separate citation-affiliation-audit.md (which they may not realize exists) creates a gap in the impact story. Truncating to "top 10 hits" loses the long-tail Tier 1 evidence that often matters most for broader-impact narratives (e.g., a niche citation from Capital One or Mayo Clinic adds new domain breadth that a top-10 list might drop). Embedding the full tables verbatim keeps the integrated report self-contained while the standalone citation-affiliation-audit.md remains canonical for incremental re-runs.

Why the hook is one-way

The hook is news-search reading citation-audit's output, not citation-audit calling into news-search. This keeps the two skills independent: the citation-audit pipeline (OpenAlex / Dimensions DSL, Tier 0 / Tier 1 regex patterns, per-source coverage merging) does not need to know anything about editorial-coverage discovery. The single integration surface is the citation-affiliation-audit.md file, which both skills agree on as the contract.

name	news-search
description	Systematic web search for news, media, policy, and industry coverage of FORTIS Lab publications, tools, and research. Use when the user asks for a news audit, media coverage check, broader impact evidence, or visibility search for their work.

News & Media Coverage Search

When to Use

Periodic audit of media coverage (quarterly recommended)
Before tenure/promotion materials
Before grant applications requiring "broader impact" evidence
After major paper acceptances or tool releases
When updating the website news section

Inputs

Read these files before starting:

data/publications.json — all papers (titles, venues, years, links)
data/open-source.json — all tools/libraries (names, stars, URLs)
news-coverage-audit.md — previous audit results (skip known items, update stale entries)

Execution Model

references/search-queries.md: query bank (not exhaustive; see triage rules below)
references/outlet-registry.md: outlet classification and site: domain lists
references/search-strategy.md: techniques for finding indirect coverage, when to persist vs. stop, name disambiguation, Cloudflare/SSR-shell fetch tactics, and the Phase B "snippet alone is not verified" rule
references/candidate-schema.md: Phase A and Phase B candidate record contract
references/disclaimer-patterns.md: AI-generated, aggregator, translation, templated-database, and blocked-page detection
references/domain-registry.md: seed domains by source class and outlet_class values for Phase A
references/disambiguation-registry.md: cumulative tool-name and person-name collision rules + verified-negative leads from prior rounds (consult before counting any borderline match)
scripts/pdf_term_scan.py: PyMuPDF-based FORTIS-term scanner with built-in false-positive filters; canonical Phase B PDF-deep-search tool. Run as python skills/news-search/scripts/pdf_term_scan.py <pdf_path>.

Query Bank Triage Rules

The query bank in references/search-queries.md is curated, not exhaustive. It covers high-adoption tools and papers with known media hooks.

For items not in the query bank, generate queries at runtime:

Tools with 500+ GitHub stars: add dedicated Dimension 2 queries
All papers at top venues from the current or prior year: add Dimension 5 smart-keyword entries
Older papers and low-star tools: still search with at least one Dimension 5 smart-keyword query each
Preprints: search with distinctive claim keywords
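As a rough sketch, the triage rules above can be written as a single routing function. The record fields (`kind`, `stars`, `top_venue`, `year`) and the returned labels are hypothetical; the actual schemas of data/publications.json and data/open-source.json are not shown here.

```python
def triage(item: dict, current_year: int) -> str:
    """Map an inventory item to the query strategy named in the triage rules."""
    kind = item["kind"]  # hypothetical field: "tool", "paper", or "preprint"
    if kind == "tool":
        # Tools with 500+ GitHub stars get dedicated Dimension 2 queries.
        if item.get("stars", 0) >= 500:
            return "dimension-2-dedicated"
        return "dimension-5-single"
    if kind == "preprint":
        # Preprints are searched by distinctive claim keywords.
        return "claim-keywords"
    # Papers at top venues from the current or prior year get Dimension 5 entries.
    if item.get("top_venue") and item.get("year", 0) >= current_year - 1:
        return "dimension-5-entries"
    # Older papers still get at least one Dimension 5 smart-keyword query.
    return "dimension-5-single"
```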

Pipeline: Two-Phase Output

Phase A: Candidate Gathering

Add news-search-candidates.jsonl to .git/info/exclude (local, untracked) before the first run so git add -A does not stage scratch output.
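One way to make that exclude step repeatable is a small idempotent helper. This is a sketch under the assumption that the project root is a regular Git checkout with a `.git` directory; the function name is invented for the example.

```python
from pathlib import Path


def ensure_excluded(repo_root: Path, pattern: str = "news-search-candidates.jsonl") -> bool:
    """Append pattern to .git/info/exclude once; return True if it was added."""
    git_dir = repo_root / ".git"
    if not git_dir.is_dir():
        raise FileNotFoundError(f"{repo_root} does not contain a .git directory")
    info_dir = git_dir / "info"
    info_dir.mkdir(exist_ok=True)
    exclude = info_dir / "exclude"
    text = exclude.read_text(encoding="utf-8") if exclude.exists() else ""
    if pattern in (line.strip() for line in text.splitlines()):
        return False  # already excluded, nothing to do
    # Keep existing content and make sure the new entry starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    exclude.write_text(text + separator + pattern + "\n", encoding="utf-8")
    return True
```

Because `.git/info/exclude` is local to the clone, the entry never reaches other contributors, which matches the "local, untracked" intent.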

Phase B: Verify and Classify

For each candidate in news-search-candidates.jsonl, fetch the page and apply five checks in order:

Pre-tier filter: first-party / already-tracked / disambiguation drops. Before running the citation rule, drop the candidate if it falls into any of these patterns (each was encountered during the 2026-05-07 round):
- First-party hosting on the PI's current or prior institution (e.g., the PI's CMU PhD-era profile, an NSF PAR record of the PI's own grant output, a journal mirror of the PI's own paper).
- Already-tracked award URL — the canonical landing page for an award already recorded in Ledger 5.
- Coauthor-institution publication listing — a bare research-listing page on a coauthor institution's site (Microsoft Research, Adobe Research, etc.) that is not editorial; demote to Ledger 3.
- Name-collision drop — the match is on a different person ("Yue Zhao" → Yuchen / Siyan / Qingyue / W. / D. Zhao) or a different project ("Aegis" → Forrester AEGIS / NVIDIA Aegis / RedHat aegis-ai; "TrustLLM" → trustllm.eu; "TDC" → TDCJ / J&J Therapeutics Discovery). Consult references/disambiguation-registry.md.
Direct-mention / topic-validation routing (the citation verification rule in the Output section). If the page names the work, person, lab, co-author, institution, or direct URL per one of clauses 1 to 6, fill direct_mention and continue as coverage. If it does not pass direct mention but clearly covers the same topic area, set tier: "topic-validation" and keep it for the Topic Validation appendix, not a coverage ledger. If it is neither direct coverage nor topic validation, set tier: "dropped" and record the drop reason in notes.

Snippet alone is not verified evidence for Tier 0 / Tier 1 candidates. WebSearch summaries can synthesize content that does not appear in the source (the 2026-05-07 round caught this with GAO-26-108695: the snippet claimed a TrustLLM citation, but manual PDF extraction confirmed the PDF contains no such citation). Tier 0 / Tier 1 promotion requires a direct fetch of the source: pdf_term_scan.py for PDFs, an HTTP request with a real browser User-Agent for web pages. If the source is gated and cannot be re-fetched, set tier_guess: phase_b_priority and leave it as a candidate; do not count it.
Disclaimer / aggregator detection (references/disclaimer-patterns.md). Run the regex sweep on fetched content. Set entries in the candidate's flags[] field. Hard caps:
- ai_generated and aggregator are capped at Tier 3 regardless of outlet domain.
- machine_translated is capped at Tier 3 unless editorial_translation is also set.
- paywall_or_blocked is held for manual verification, not classified from snippet alone.
Tier assignment per the tier structure in the Output section. Assign coverage tiers (Tier 0 through Tier 5) only to candidates that pass direct mention; topic-only candidates keep the tier: topic-validation value set during direct-mention routing.
Registry harvest status. For each kept coverage row, set registry_status to existing or new after checking the page's registered domain against references/domain-registry.md. Leave registry_status empty on dropped and topic-validation rows.
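The hard caps in the disclaimer check can be expressed as a small pure function. The sketch below assumes tiers are integers where a lower number is a stronger tier, so "capped at Tier 3" means a candidate can be no better than Tier 3; the function name and status strings are illustrative.

```python
from typing import Optional, Set, Tuple


def apply_flag_caps(tier: int, flags: Set[str]) -> Tuple[Optional[int], str]:
    """Apply the disclaimer hard caps to a proposed coverage tier."""
    if "paywall_or_blocked" in flags:
        # Blocked pages are never classified from the snippet alone.
        return None, "hold_for_manual_verification"
    capped = tier
    if flags & {"ai_generated", "aggregator"}:
        capped = max(capped, 3)  # no better than Tier 3, whatever the outlet
    if "machine_translated" in flags and "editorial_translation" not in flags:
        capped = max(capped, 3)
    return capped, ("capped" if capped != tier else "unchanged")
```

For example, `apply_flag_caps(1, {"ai_generated"})` yields Tier 3, while a Tier 4 aggregator entry stays at Tier 4 because the cap only limits how strong a tier can be.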

Domain Registry and Post-Round Harvest

Once a quarter, run a registry-disabled pass (open dragnet only) to surface new outlet classes the registry has not seen yet. This is what catches the next surprise category.

Dimension 1: Person & Lab

Find coverage that names the PI or lab, regardless of which paper or tool.

See references/search-queries.md § Dimension 1 for the full query list.

Dimension 2: Tools in Non-Academic Contexts

Major tools (PyOD, TrustLLM, agent-audit, Aegis, ADBench) may appear in industry deployments, government reports, textbooks, or enterprise case studies without naming the PI.

Search for: each tool name + context keywords (enterprise, deployment, production, fraud detection, cybersecurity, government, NIST, federal, textbook, course, patent, Walmart, NASA, Tesla).

See references/search-queries.md § Dimension 2 for the full query list.
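For tools that are missing from the query bank, the tool-name and context-keyword pairing described above can be generated mechanically. This is a minimal sketch: the query format (quoted tool name followed by a keyword) is an assumption, and the curated list in references/search-queries.md remains the primary source.

```python
from itertools import product

TOOLS = ["PyOD", "TrustLLM", "agent-audit", "Aegis", "ADBench"]
CONTEXT_KEYWORDS = [
    "enterprise", "deployment", "production", "fraud detection",
    "cybersecurity", "government", "NIST", "federal", "textbook",
    "course", "patent", "Walmart", "NASA", "Tesla",
]


def dimension2_queries(tools, keywords):
    """Pair every tool name with every context keyword as a search query."""
    queries = []
    for tool, keyword in product(tools, keywords):
        # Quote multi-word keywords so they are searched as a phrase.
        term = f'"{keyword}"' if " " in keyword else keyword
        queries.append(f'"{tool}" {term}')
    return queries
```

With five tools and fourteen keywords this produces 70 queries, so in practice the list would likely be trimmed per tool before running.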

Dimension 3: Outlet Sweep

Systematically check each outlet category using site: filters. This is the most important dimension for finding coverage the other dimensions miss.

Categories: security press, business press, top tech press, AI newsletters, science press, government/policy, industry analysts, university/institutional press, developer community.

Dimension 4: Topic Proximity

Search for the broader trending topic and check if the work appears within coverage. This catches indirect coverage where the paper is relevant but not cited by name.

See references/search-queries.md § Dimension 4 for the full query list.

Dimension 5: Smart Paper Search

For papers where exact title search fails (most papers), use distinctive result keywords or striking claims from the paper instead.

Examples: "surpassed human performance 95.33% CLADDER" for the causal reasoning paper, "defense training breaks LLM agents 47-77% benign task failure" for The Autonomy Tax.

See references/search-queries.md § Dimension 5 for the full mapping table.

Dimension 6: Citation & Downstream Impact

Track high-level citation metrics, appearances in high-impact journals (Nature, Science), enterprise adoption evidence, and downstream tools built on the work.

See references/search-queries.md § Dimension 6 for the full query list.

Dimension 7: Education, Ecosystem & Global

Search the surfaces where widely-adopted tools spread beyond academic papers and news: education platforms, code ecosystems, non-English press, and developer communities.

See references/search-queries.md § Dimension 7 Education/Ecosystem for queries.

Dimension 8: PDF Deep Search (Government, Think Tank, Industry Reports)

Strategy

Identify candidate PDFs — search for government/think tank reports on topics your work addresses (AI agent security, anomaly detection, LLM trustworthiness, AI auditing). Collect the PDF URLs.
Fetch and search inside each PDF — download or fetch the PDF, extract text (via PyMuPDF, pdftotext, or the WebFetch tool), and search for: tool names (PyOD, TrustLLM, Aegis, agent-audit, etc.), paper titles, author names ("Yue Zhao", "Zhao et al."), arXiv IDs, and repo URLs.
Verify and classify — if found, note the exact page, footnote number, and surrounding context.
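The search-inside-the-PDF step can be illustrated on already-extracted page text. The sketch below is not the skill's scripts/pdf_term_scan.py; it assumes text extraction (via PyMuPDF, pdftotext, or similar) has already produced one string per page, and the function name and record fields are invented for the example.

```python
import re


def find_terms(pages, terms, context=60):
    """Locate whole-word term matches and report page number plus context."""
    hits = []
    for page_no, text in enumerate(pages, start=1):
        for term in terms:
            # Whole-word, case-insensitive match so "trustllms" does not count.
            pattern = re.compile(rf"(?<!\w){re.escape(term)}(?!\w)", re.IGNORECASE)
            for match in pattern.finditer(text):
                start = max(0, match.start() - context)
                end = min(len(text), match.end() + context)
                hits.append({
                    "page": page_no,
                    "term": term,
                    # Collapse whitespace so the snippet reads as one line.
                    "context": " ".join(text[start:end].split()),
                })
    return hits
```

The page number and surrounding context map directly onto the "exact page, footnote number, and surrounding context" that the verify step asks for; name collisions still need the disambiguation registry.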

High-priority PDF sources to search

U.S. Government (highest priority):

U.S. Senate committee reports — HSGAC, Commerce, Judiciary, Armed Services (AI-related)
U.S. House committee reports — Science, Homeland Security, Financial Services
NIST special publications — AI RMF updates, AI agent security, AI 100-series
GAO reports — AI technology assessments, Science & Tech Spotlight series
CRS reports — Congressional Research Service AI analyses
Federal agency AI strategies — DOD (JAIC/CDAO), DOE national labs, HHS, Treasury/OCC, SEC, CFTC, Federal Reserve
White House — AI executive orders, OMB memoranda, OSTP reports, CEA reports
NSF — program solicitations, dear colleague letters mentioning anomaly detection or AI safety

International Government (high priority):

EU — AI Act impact assessments, ENISA reports, EU AI Office publications
UK — AI Safety Institute reports, DSIT AI regulation papers, Alan Turing Institute policy briefs
Canada — ISED AI strategy, Canadian Centre for Cyber Security
Australia — Department of Industry AI reports, eSafety Commissioner
Singapore — IMDA Model AI Governance Framework
OECD — AI Policy Observatory reports, OECD AI Principles implementation documents
UN — UNESCO AI ethics recommendations, ITU AI reports
G7/G20 — Hiroshima AI Process documents, AI governance communiques

Think tanks & policy institutes:

Brookings, RAND, CSET Georgetown, Stanford HAI, FLI, CAIS, Partnership on AI
Center for Data Innovation, Information Technology and Innovation Foundation (ITIF)
Centre for International Governance Innovation (CIGI)

Foundation model companies (Tier 0 if they cite your work):

OpenAI — system cards (GPT-4, GPT-5, o1, o3), safety reports, preparedness framework documents, red teaming reports
Anthropic — model cards, responsible scaling policy documents, safety research reports
Google DeepMind — technical reports, Gemini system cards, safety evaluations
Meta AI — Llama model cards, system cards, responsible use guides
Mistral — model documentation, technical reports
xAI — Grok system cards and technical reports
Cohere — model cards, safety documentation
Microsoft — Phi model cards, responsible AI reports, Azure AI safety documentation
Amazon — Titan model documentation, AWS AI safety reports

Standards bodies:

ISO/IEC (AI standards series), IEEE SA, OWASP (agentic AI PDFs)
MITRE ATLAS documentation

Industry whitepapers & analyst reports (Tier 0 if they cite your work by name):

McKinsey, Deloitte, PwC, Accenture, EY, KPMG, BCG, Bain
Gartner, Forrester, IDC research reports

Known citations found via this dimension

U.S. Senate HSGAC — "Hedge Fund Use of Artificial Intelligence" (Jun 2024), footnote 119 cites TrustLLM on page 25
FLI AI Safety Index — Winter 2025 PDF uses TrustLLM as an official benchmark

Why web search misses these

Output

Write all results to news-coverage-audit.md at the project root.

Citation Verification Rule

An item only counts as coverage if the article names or cites at least one of:

A specific paper title or tool name (PyOD, TrustLLM, Aegis, agent-audit, etc.)
The PI by name ("Yue Zhao")
The lab ("FORTIS")
A co-author by name in the context of the specific paper/tool
An institutional attribution ("researchers from USC", "a USC team") in the context of the specific paper/tool
A direct link to the project URL, repo, or arXiv paper
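A first-pass check for the pattern-matchable clauses might look like the sketch below. It covers only clauses 1, 2, 3, and 6; clauses 4 and 5 depend on context and are left to manual review. The pattern table is an assumption built from names that appear in this document, and a match is only a candidate for direct_mention, since names such as "Aegis" collide with unrelated projects.

```python
import re
from typing import Optional

# Assumed patterns, keyed by clause number of the citation verification rule.
CLAUSE_PATTERNS = {
    1: [r"\bPyOD\b", r"\bTrustLLM\b", r"\bAegis\b", r"\bagent-audit\b"],
    2: [r"\bYue Zhao\b"],
    3: [r"\bFORTIS\b"],
    6: [r"github\.com/yzhao062"],
}


def direct_mention_clause(text: str, patterns=CLAUSE_PATTERNS) -> Optional[int]:
    """Return the lowest-numbered clause the text satisfies, or None."""
    for clause in sorted(patterns):
        if any(re.search(pattern, text) for pattern in patterns[clause]):
            return clause
    return None
```

A `None` result does not mean the page is dropped: it may still satisfy clause 4 or 5, or qualify for the Topic Validation appendix.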

Tier Structure

All tiers below require the citation verification rule above. If a result does not name or cite your work, it does not belong in any tier.

| Tier | Definition | Examples |
| --- | --- | --- |
| Tier 0 | (a) Government reports (U.S. or international: legislative, executive, federal/national agency), international body reports (OECD, UN, EU), or official standards documents; (b) Technical reports, system cards, safety reports, or model cards from major foundation model companies (OpenAI, Anthropic, Google DeepMind, Meta AI, Mistral, xAI, Cohere, etc.); (c) Major consulting/analyst firm reports (McKinsey, Gartner, Forrester, Deloitte, etc.). All three categories must cite your work by name | U.S. Senate report citing TrustLLM, OpenAI system card citing TrustLLM, Anthropic safety report citing anomaly detection benchmark, Gartner report citing PyOD |
| Tier 1 | Mainstream tech/business/security press, major research institutions (national labs, Hoover, Microsoft Research), or high-impact policy reports that name your work | FLI AI Safety Index using TrustLLM, LLNL article naming TrustLLM, Nature Biotechnology citing DrugAgent |
| Tier 2 | Industry press, institutional PR, or dedicated features that name your work | "DrugAgent" in MarkTechPost, "Yue Zhao" in USC Viterbi News, Databricks blog naming PyOD |
| Tier 3 | Dedicated blog posts, tutorials, or platform integrations naming your tool | KDnuggets PyOD tutorial, Databricks Kakapo built on PyOD, DEV Community Aegis post |
| Tier 4 | Awards, recognitions, encyclopedia entries | Amazon RA, NVIDIA Grant, Grokipedia entry |
| Tier 5 | Academic community only (Hugging Face, alphaXiv, etc.) | Paper pages, GitHub stars, Moonlight reviews |

Separate appendix (not a tier):

Topic Validation — articles covering the same topic area without naming your work. Useful for grant narratives ("our research addresses concerns raised in McKinsey's 2026 report on agentic AI security") but not website news items.

Tier 0(b) extension: foundation-model-company careers pages

First-party host. The canonical URL is the company's own careers domain (openai.com/careers/..., anthropic.com/jobs/..., deepmind.google/careers/..., ai.meta.com/careers/..., etc.), not a Greenhouse / Lever / Ashby / DFJ Growth / Glassdoor / LinkedIn / Indeed mirror. ATS mirrors are kept under mirrors[] in the candidate record but never count as the load-bearing citation.
Tool named as operational tooling, not background literature. The mention sits in Qualifications, Responsibilities, or Tech Stack as a tool the hire is expected to use, not in a "see also" or "related work" footnote.
Durable snapshot exists. A Wayback Machine archive URL OR a committed local sidecar pair (HTML + PDF in news-snapshots/<slug>-<YYYY-MM-DD>.{html,pdf}) is in the repo, with a Markdown index file documenting the live URL, capture date, verification method, and verbatim quote. Sidecars must be captured from a logged-in browser session when the live URL is behind Cloudflare; PDF must be re-verified with python skills/news-search/scripts/pdf_term_scan.py <pdf_path>.

Required Sections in Output File

Coverage ledgers (separate counts for each):
- Government/Policy citations — Tier 0: government reports, foundation model system cards, standards documents, analyst reports that cite your work by name
- External media — Tier 1-2: third-party press, institutional features, dedicated blog posts by external authors
- Ecosystem adoption — Tier 3: books, podcasts, enterprise integrations, patents, tutorials, platform integrations by external parties
- First-party/community — self-authored blog posts, GitHub discussions, dataset hosting (not external coverage)
- Awards & recognitions — Tier 4: awards, fellowships, encyclopedia entries
Topic Validation appendix — articles that cover the same topic but do not name your work
Negative Results table — outlet types searched with no results (prevents re-searching)
Upcoming Opportunities — imminent conferences, journalist contacts from prior coverage
Summary Statistics — separate counts per ledger, not a single aggregate. Report: government/policy total, external media total, ecosystem total, first-party/community total, and awards total.
Coverage matrix — per-item appendix or CSV with one row per paper/tool, dimensions searched (D1-D8 plus D10; D9 is now handled by the standalone [[citation-audit]] skill), Phase A candidate count, and Phase B outcome (kept/topic-only/dropped/none). This makes the audit auditable.
Registry harvest summary — list of new domains added to references/domain-registry.md from this round's confirmed Phase B hits, grouped by class. Empty list is fine and should still be reported, so the harvest step stays visible across rounds.
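The per-ledger Summary Statistics can be derived directly from the kept rows. The sketch below assumes each row carries a `ledger` field with one of five hypothetical keys; the real row format of news-coverage-audit.md is not specified here.

```python
from collections import Counter

# Hypothetical ledger keys, in the order the ledgers are listed above.
LEDGERS = [
    "government_policy",
    "external_media",
    "ecosystem",
    "first_party_community",
    "awards",
]


def ledger_counts(rows):
    """Count kept rows per ledger, reporting zero for empty ledgers."""
    counts = Counter(row["ledger"] for row in rows)
    return {ledger: counts.get(ledger, 0) for ledger in LEDGERS}
```

Returning one count per ledger, including zeros, keeps the report from collapsing into a single aggregate number.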

Incremental Updates

When running a targeted search (not full audit), append new findings to the existing file. Do not overwrite previous results. Mark the date of each search pass.

Run Modes

| Mode | When | Dimensions to run |
| --- | --- | --- |
| Full audit | Once per semester, before portfolio updates | All 9 news-coverage dimensions (D1-D8 + D10) plus a freshness-gated hook into the standalone [[citation-audit]] skill (see "Cross-skill: citation-audit integration") |
| Targeted | After a specific paper acceptance or tool release | Dims 1, 3, 4, 5 scoped to that item, plus D8 when policy or PDF evidence is plausible |
| Quick check | Before grant submissions | Dims 1, 6, 8, 10 (person and lab, citation and downstream impact, government PDFs, external deep research) |
| Topic monitor | When a trending topic connects to your work | Dim 4 only, focused on that topic |
| Ecosystem check | Before broader-impact statements | Dim 7 (education, code ecosystem, global) |
| PDF deep search | Before tenure materials or when a specific gov report is suspected | Dim 8 only, with candidate PDF list |
| Affiliation audit | Before tenure / promotion, after major citation milestones | Invoke the standalone `/citation-audit` skill (formerly D9 of this skill; now its own skill at `skills/citation-audit/SKILL.md`) |
| External deep research | After automated audit, as a complement pass | Dim 10 (external LLM deep research) |

