| name | autopilot-research |
| description | Research survey pipeline — multi-mode investigation (academic / technology / market). Mode-specific search sources and report templates. Field intelligence only; no PPT/paper drafts. Hand off to autopilot-doc (writing/slides) or autopilot-code (build) for actual document/code creation. |
| argument-hint | <query> [--mode academic|technology|market] [--depth shallow|medium|deep] [--qa quick|light|standard|thorough] [--no-clarify] [--from search|analyze|report] |
Artifact folder convention: SKILL_OUTPUT_CONVENTION.md (3-tier: T1 root / T2 named subdir / T3 _internal/). All of this skill's raw metadata (search_results.json, phase_a_*.json, chaining_results.md, code_search.md, etc.) plus reviews are isolated under _internal/. T1/T2 chapter files and cards/ stay at root.
Language Rule
- When explaining something to the user, write in Korean.
Argument Parsing
Parse $ARGUMENTS for optional flags:
- query: research topic, paper title, arXiv ID, or PDF path (remaining text after flags)
- --mode: academic (default) | technology | market — investigation type (see Modes below)
- --depth: shallow | medium (default) | deep
- (no --refs flag — local reference materials should be pre-processed via /analyze-project --mode paper first → output goes to .claude_reports/analysis_project/paper/, which autopilot-research auto-detects)
- --qa: quick | light | standard (default) | thorough — override QA intensity for the report QA loop. Standard+ runs a parallel fact-checker (sonnet) alongside the quality reviewer(s) for verbatim comparison against cards (citation/venue/year/metric verification). quick is the fastest path: it force-ends the review loop after 1 round — a single 품질관리팀 (sonnet) pass; if 🔴 findings remain, it records unresolved.md and exits without re-invocation. Fact-checker disabled; refine-style re-invoke disabled.
- --from: search | analyze | report — resume the pipeline at a specific stage (see Resume below)
- --no-clarify: skip Step 1.5 Scope Clarification (force-run with the current query as-is)
Modes
The mode determines (a) search sources used in Step 2, (b) Phase A/B/C activation in Step 3, and (c) report templates in Step 4. The pipeline structure (search → analyze → report) is the same across modes.
--mode academic (default)
Use when: academic-paper-centric surveys (deep learning method surveys, algorithm comparisons, field trends).
- Search sources: arXiv, Semantic Scholar, OpenAlex, Hugging Face paper_search, Google Scholar
- Phases: A (skimming) + B (reference chaining) + C (code & model search) — all active
- Reports: 9 files (briefing → landscape → core_papers → baselines → technical_deep_dive → datasets → implementation → resources → reading_guide) — existing template set
--mode technology
Use when: industry standards and technology ecosystem surveys (codecs/protocols, standards documents, vendor solution comparisons, deployment considerations).
- Search sources: WebSearch (industry blogs, technical whitepapers, vendor docs), WebFetch (standards orgs: 3GPP / ITU-T / IEEE / W3C), arXiv (supplementary), Hugging Face (related models)
- Phases: A (full skim of standards + whitepapers) — active. B (reference chaining) — deactivated (the academic citation graph carries little signal here). C (code search) — active (open-source implementations).
- Reports (7 files):
00_briefing.md — Executive briefing
01_landscape.md — Technology landscape (categories, players, lineage)
02_standards.md — Standards & specs (3GPP/ITU-T/IEEE/RFC numbers, key sections)
03_vendor_comparison.md — Vendor / solution comparison (Qualcomm vs Samsung vs Apple vs ...)
04_technical_deep_dive.md — Algorithm·protocol details
05_deployment.md — Deployment considerations (latency, cost, integration paths)
06_implementation.md — Goal-adaptive roadmap (existing template; build/adopt prioritized)
07_resources.md — Open-source code, model weights, evaluation tools
--mode market
Use when: market trend, competitor, and analyst-report surveys (product/service market size, key players, adoption rates).
- Search sources: WebSearch (analyst content, news, earnings reports, press releases), WebFetch (company sites, investor pages)
- Phases: A (skim of market reports + news) — active. B / C — inactive (no academic search, no code search).
- Reports (5 files):
00_briefing.md — Executive briefing
01_market_overview.md — Market sizing, segmentation, growth rate
02_key_players.md — Competitor profiles, market share, positioning
03_trends.md — Trends, drivers, inhibitors, disruptors
04_opportunities.md — Opportunity assessment + actionable recommendations
When --mode is not given, infer it from query keywords — "논문/algorithm/method/SOTA" → academic, "표준/codec/protocol/3GPP/ITU/chip/MCU" → technology, "market/시장/competitor/analyst" → market (see the sketch below).
Fallback: if no keyword matches → academic (one-line notice: "키워드 매칭 실패 → academic으로 진행. 다른 모드는 --mode 명시").
Multi-match (>=2 modes matched simultaneously): ask the user to confirm in Step 1.5 Scope Clarification.
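A minimal sketch of this inference rule, with the keyword lists copied from above (the helper name and case-folding are illustrative assumptions):

```python
# Keyword-based mode inference (sketch). Lists mirror the spec above;
# extend them per domain. Returns None on multi-match so Step 1.5 can ask.
MODE_KEYWORDS = {
    "academic":   ["논문", "algorithm", "method", "sota"],
    "technology": ["표준", "codec", "protocol", "3gpp", "itu", "chip", "mcu"],
    "market":     ["market", "시장", "competitor", "analyst"],
}

def infer_mode(query: str) -> str | None:
    q = query.lower()
    hits = [mode for mode, kws in MODE_KEYWORDS.items()
            if any(k in q for k in kws)]
    if len(hits) >= 2:
        return None            # multi-match: confirm in Step 1.5 Scope Clarification
    if len(hits) == 1:
        return hits[0]
    return "academic"          # fallback: notify the user in one line
```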
Decision Defaults (no autonomy gating)
The pipeline auto-proceeds with sane defaults. There is no autonomy-level dial. Pause points are limited to:
| Decision Point | Default Behavior |
|---|---|
| Search results review | Auto-proceed. |
| Query expansion rounds | Auto-proceed. |
| Phase B loopback | Auto-proceed up to the depth-gated limit. |
| External material discovery | If analysis_project/paper/ exists in current dir, auto-include as supplementary input. If user expects external materials but none found → suggest /analyze-project --mode paper first. |
| Search returned 0 papers | Auto-stop with pipeline_summary(failed) (no useful continuation possible). |
| Report generation | Auto-proceed. |
Resume (--from)
--from <stage> re-enters an existing artifact directory and runs from that stage onward. Stages:
search — Step 2 (Paper Search)
analyze — Step 3 (Phase A skimming + B chaining + C code search + analysis_summary)
report — Step 4 (Report Generation + QA loop)
When --from is used, the positional argument should be either the artifact directory path or a fuzzy-matchable topic name. The orchestrator resolves it via `ls -d .claude_reports/research/*$ARG* 2>/dev/null` (see the sketch below). Read pipeline_state.yaml to recover query, mode, depth, qa_level, clarified_intent. CLI flags override stored values. Step 1.5 Scope Clarification is always skipped on resume (already captured in the first run).
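A minimal sketch of this resolution, assuming glob-based fuzzy matching equivalent to the ls command (the ambiguity handling shown is an illustrative choice, not spec):

```python
# Resolve the --from positional argument to an artifact directory (sketch).
import glob
import pathlib

def resolve_artifact_dir(arg: str) -> pathlib.Path:
    if pathlib.Path(arg).is_dir():
        return pathlib.Path(arg)                      # explicit path wins
    matches = glob.glob(f".claude_reports/research/*{arg}*")
    if len(matches) != 1:
        raise ValueError(f"ambiguous or missing artifact dir for {arg!r}: {matches}")
    return pathlib.Path(matches[0])                   # fuzzy topic-name match
```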
pipeline_state.yaml
Written/updated at {artifact_dir}/pipeline_state.yaml after each completed stage:
pipeline: autopilot-research
query: <original query>
mode: academic
depth: medium
qa_level: standard
clarified_intent: <string or null>
last_completed_stage: analyze
artifact_dir: <abs path>
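A minimal sketch of the per-stage write, assuming PyYAML is available (the helper name is illustrative):

```python
# Update pipeline_state.yaml after each completed stage (sketch).
import pathlib
import yaml  # assumes PyYAML

def update_state(artifact_dir: str, **fields) -> None:
    path = pathlib.Path(artifact_dir) / "pipeline_state.yaml"
    state = yaml.safe_load(path.read_text()) if path.exists() else {}
    state.update(fields)                              # CLI flags override on resume
    path.write_text(yaml.safe_dump(state, allow_unicode=True, sort_keys=False))

# e.g. update_state(artifact_dir, last_completed_stage="analyze")
```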
Pipeline
Step 1: Input Parsing & Validation
- Detect query type: keyword, paper title, arXiv ID, PDF path, folder path
- Resolve --mode: explicit flag value, or infer from query keywords (academic / technology / market — see Modes section). Notify the user of the inferred mode in one line. Multi-match → defer resolution to Step 1.5 Scope Clarification.
- Auto-detect supplementary input: if .claude_reports/analysis_project/paper/ exists in the current dir, include it as supplementary input for chaining. If the user explicitly requested "use my local PDFs" but no analysis_project/paper/ exists → suggest running /analyze-project --mode paper first.
- Construct topic name (sanitize: lowercase, hyphens, max 30 chars; see the sketch after this list)
- Set artifact_dir: .claude_reports/research/{topic}/
mkdir -p {artifact_dir} (only AFTER validation)
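A minimal sketch of the sanitization and directory setup, assuming an ASCII query (transliteration of Korean queries is left open by the spec):

```python
# Topic sanitization: lowercase, hyphens, max 30 chars (sketch).
import pathlib
import re

def make_topic(query: str) -> str:
    topic = re.sub(r"[^a-z0-9]+", "-", query.lower()).strip("-")
    return topic[:30].rstrip("-")

def make_artifact_dir(topic: str) -> pathlib.Path:
    artifact_dir = pathlib.Path(".claude_reports/research") / topic
    artifact_dir.mkdir(parents=True, exist_ok=True)   # only AFTER validation
    return artifact_dir
```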
Step 1.5: Scope Clarification (pre-alignment) — skipped if --no-clarify or --from
Purpose: an ambiguous query misdirects mode selection and search breadth, rendering the 9/7/5-report output useless. On detecting ambiguity, ask the user 2-4 sharp questions.
Trigger conditions (run if any one matches):
- Mode multi-match (≥2 modes matched simultaneously)
- Query length < 50 Korean chars or < 12 English words AND no specific constraint (e.g., time range, specific platform, target metric)
- Query contains only meta keywords like "조사/분석/survey" with no concrete deliverable or scope
Mode-specific question seeds:
academic: survey depth (explicit --depth intent?), must-read cutoff (citation > N or year ≥ Y), field boundary (e.g., speech only? audio in general?)
technology: target standards group/year, deployment environment (production/research), vendor scope, comparison axes (performance/cost/license priorities)
market: region/time scope, whether specific competitors are named, decision purpose (investment call? market-entry decision? competitive intel?)
Skip conditions:
- --no-clarify explicitly given
- --from <stage> resume (already captured)
- Query length ≥ 50 Korean chars or ≥ 12 English words AND mode unambiguous
Output: pass the refined query (incorporating the user's answers) to Step 2 + record a one-line summary in the clarified_intent field of pipeline_state.yaml.
Step 2: Source Search (direct Agent call) — mode-aware
Search source selection per mode:
academic: arXiv + Semantic Scholar + OpenAlex + Hugging Face paper_search + Google Scholar (current set)
technology: WebSearch (industry blogs, vendor whitepapers) + WebFetch (3GPP/ITU-T/IEEE/W3C standards pages) + arXiv (supplementary) + Hugging Face (related models)
market: WebSearch (analyst content, news, press releases) + WebFetch (company sites, investor pages). arXiv, Semantic Scholar, and OpenAlex disabled.
Step 2a: Initial Query Expansion (LLM-knowledge-based)
The orchestrator generates 2-3 synonyms/alternate phrasings from the user query.
Purpose: include research that goes by a different name in the same field from the very first search.
(예: "user-defined keyword spotting" → + "query-by-example KWS", "personalized wake word detection")
queries = [original_query, variant_1, variant_2]
Distinct from the paper-based expansion in Step 2e: 2a generates synonyms from LLM prior knowledge, while 2e extracts new keywords from the papers actually found.
Step 2b: HF MCP Pre-Fetch
Before invoking the agent, attempt HF paper_search for all queries:
- For each query in queries: call paper_search and collect results
- If successful: store the combined results as hf_results_json
- If MCP unavailable or fails: hf_results_json = null; note in pipeline log
Step 2c: Invoke Agent
Agent(subagent_type="연구팀"):
"Research survey mode: Paper search.
Queries: {queries_list}
Original query: {original_query}
Query type: {detected_type}
Output directory: {artifact_dir}
**Routing**: All raw metadata files (search_results.json, phase_a_*.json, access_classification.json, browser_extracts/) → write to `{artifact_dir}/_internal/`. T1/T2 deliverables (cards/, chapter .md files, analysis_summary.md) → root `{artifact_dir}/`. mkdir -p `_internal` before first write if absent.
Max results per source per query: 10
{If analysis_project/paper/ available: 'Supplementary local paper analysis: {artifact_dir}/../analysis_project/paper/'}
{If hf_results_json: 'HF paper_search results (pre-fetched): {hf_results_json}'}
Timeout rule: If any single source takes >3 minutes, skip it and proceed to the next.
## search_results.json Schema
{
"query": "string", "date": "YYYY-MM-DD", "sources_used": ["string"],
"total_papers": int,
"papers": [{"title": "string (required)", "authors": ["string"],
"year": int|null, "citation_count": int|null,
"discovery_count": int (required, >=1), "sources": ["string"],
"arxiv_id": string|null, "oa_url": string|null,
"openalex_id": string|null, "referenced_works": ["string"]|null,
"venue": string|null, "venue_tier": int|null (1-4), "raw_type": string|null,
"url": string|null (landing page URL from any source — used by 탐색팀 for paywall access)}]
}
## Google Scholar HTML Parsing Patterns
- Split blocks: <div class='gs_r gs_or gs_scl'>
- Title: strip tags from <h3> content
- Year: , (\d{4})\s*[-–] pattern (leading comma required)
- Citation: >Cited by (\d+)< pattern
Follow your Role 2a procedure. Return file paths + 3-5 line Korean summary."
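The Scholar parsing patterns quoted in the prompt translate to roughly the following sketch (the block marker and regexes mirror the prompt verbatim; the helper and its output shape are illustrative, and Scholar markup may change):

```python
# Google Scholar HTML parsing (sketch of the patterns above).
import re

def parse_scholar_html(html: str) -> list[dict]:
    papers = []
    # Split result blocks on the gs_r container div
    for block in html.split("<div class='gs_r gs_or gs_scl'>")[1:]:
        title_m = re.search(r"<h3[^>]*>(.*?)</h3>", block, re.S)
        if not title_m:
            continue
        title = re.sub(r"<[^>]+>", "", title_m.group(1)).strip()
        year_m = re.search(r", (\d{4})\s*[-–]", block)   # leading comma required
        cite_m = re.search(r">Cited by (\d+)<", block)
        papers.append({
            "title": title,
            "year": int(year_m.group(1)) if year_m else None,
            "citation_count": int(cite_m.group(1)) if cite_m else None,
        })
    return papers
```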
Step 2d: Post-Search Validation
- Read {artifact_dir}/_internal/search_results.json
- Verify valid JSON — if parse fails, re-invoke Agent once: "Your search_results.json was invalid. Fix and rewrite."
- Verify the papers array is non-empty and each paper has a title
- If still fails after retry: pipeline_summary(failed) → STOP
- If total_papers == 0: pipeline_summary(failed, "검색 결과 0건") → STOP
Error handling: If Agent call fails or returns no output → pipeline_summary(failed) → STOP.
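A minimal sketch of this validation against the search_results.json schema from Step 2c (exceptions stand in for the re-invoke / pipeline_summary(failed) / STOP branches):

```python
# Post-search validation (sketch).
import json
import pathlib

def validate_search_results(artifact_dir: str) -> dict:
    path = pathlib.Path(artifact_dir) / "_internal" / "search_results.json"
    try:
        results = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError) as e:
        raise ValueError(f"invalid search_results.json: {e}")   # re-invoke Agent once
    papers = results.get("papers", [])
    if not papers or any(not p.get("title") for p in papers):
        raise ValueError("papers array empty or missing titles")
    if results.get("total_papers", 0) == 0:
        raise ValueError("검색 결과 0건")    # pipeline_summary(failed) → STOP
    return results
```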
Step 2e: Query Expansion Rounds (depth-gated)
Extract new search terms from the titles/keywords of discovered papers and run additional search rounds.
Round control (depth parameter):
shallow: no additional rounds (Round 1 only)
medium: up to 1 additional round (Round 1 → keyword extraction → Round 2)
deep: up to 2 additional rounds (Round 1 → Round 2 → Round 3)
Per-round procedure:
- The orchestrator reads the paper titles in search_results.json and extracts frequent keywords / newly surfaced terms
(e.g., "query-by-example", "metric learning", "prototypical network" recur in Round 1)
- Generate 2-3 additional queries from new keywords absent from the existing queries
- Re-invoke 연구팀 with the new queries only (do not re-search existing queries):
Agent(subagent_type="연구팀"):
"Research survey mode: Paper search.
Queries: {new_queries_only}
Original query: {original_query} (for context, do NOT re-search)
Output directory: {artifact_dir}
**Routing**: raw metadata → `{artifact_dir}/_internal/` (search_results.json, etc.).
Max results per source per query: 10
MERGE mode: append to existing _internal/search_results.json — update discovery_count for duplicates, add new papers.
..."
- Re-run Post-Search Validation after the merge
- If a round yields fewer than 3 new papers → end rounds (converged)
Convergence conditions (early exit):
- An additional round yields fewer than 3 new papers → stop expanding
- No new keywords can be extracted (identical to existing queries) → stop
Auto-proceed after expansion rounds (no user gate).
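A minimal sketch of MERGE-mode dedup plus the convergence check, following the normalization rule from Safety Rules (lowercase + punctuation removal + a/an/the removal) and the monotonic discovery_count invariant:

```python
# Merge a new search round into the existing paper list (sketch).
import string

STOPWORDS = {"a", "an", "the"}

def normalize_title(title: str) -> str:
    t = title.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in t.split() if w not in STOPWORDS)

def merge_round(existing: list[dict], new: list[dict]) -> tuple[list[dict], int]:
    by_key = {normalize_title(p["title"]): p for p in existing}
    added = 0
    for p in new:
        key = normalize_title(p["title"])
        if key in by_key:
            dup = by_key[key]
            dup["discovery_count"] += p.get("discovery_count", 1)  # monotonic: never decreases
            dup["sources"] = sorted(set(dup.get("sources", [])) | set(p.get("sources", [])))
        else:
            by_key[key] = p
            added += 1
    return list(by_key.values()), added

# Convergence: if added < 3 after a round, stop expanding.
```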
Step 3: Source Analysis (direct Agent calls) — mode-aware
Phase activation per mode:
academic: Phase A (skim) + B (reference chaining) + C (code/model search) — all active
technology: Phase A (full skim of standards + whitepapers) — active. Phase B (reference chaining) — inactive (the academic citation graph carries little signal here). Phase C — active (open-source implementation search)
market: Phase A (skim of market reports + news) — active. Phase B / C — inactive
Step 3a: Playwright Pre-Check + 탐색팀 Pre-Fetch
Bash: python3 -c "from playwright.async_api import async_playwright; print('OK')"
Bash: ls ~/.cache/ms-playwright/chromium_headless_shell-*/ > /dev/null 2>&1 && echo 'BROWSER_OK'
Set playwright_available = true/false.
If playwright_available == true:
Read _internal/search_results.json and identify paywall papers (no arXiv ID AND no oa_url → likely paywall).
If paywall papers exist, invoke 탐색팀 to pre-fetch their content:
Agent(subagent_type="탐색팀"):
"Mode: fetch_papers
URLs: {paywall_url_list}
Output directory: {artifact_dir}
Extract full text from each URL. Write to `_internal/browser_extracts/{filename}.txt` (T3 raw metadata).
Return summary of successes and failures."
The extracted texts will be available for 연구팀 to Read during Phase A skimming.
If 탐색팀 fails or playwright is unavailable: proceed without it — 연구팀 will fall back to abstract-only.
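A minimal sketch of the paywall triage that feeds the 탐색팀 pre-fetch (field names follow the search_results.json schema; restricting to papers with a landing url is an illustrative assumption, since only those can be fetched):

```python
# Identify likely-paywalled papers: no arXiv ID AND no OA URL (sketch).
import json
import pathlib

def find_paywall_papers(artifact_dir: str) -> list[dict]:
    path = pathlib.Path(artifact_dir) / "_internal" / "search_results.json"
    papers = json.loads(path.read_text())["papers"]
    return [p for p in papers
            if not p.get("arxiv_id") and not p.get("oa_url") and p.get("url")]
```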
Step 3b: Phase A — Parallel Skimming Batches
Read _internal/search_results.json. Classify each paper's access type FIRST:
- accessible: has arxiv_id OR oa_url OR a matching file in _internal/browser_extracts/
- paywall-only: no arxiv_id, no oa_url, no browser extract → abstract/metadata only
Construct batches (accessible papers only get full-read treatment):
- Full-read accessible (citations > 10 AND not null AND accessible): 1 paper per Agent call
- Abstract-only (citations <= 10 OR null OR paywall-only): up to 10 per Agent call
- Exception: discovery_count >= 3 AND accessible → upgrade to full-read (1 per call)
- Paywall-only papers: always go in abstract-only batches regardless of citation count
(attempting WebFetch on paywall sites causes timeout/hang — never do this)
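A minimal sketch of these batching rules (the has_extract flag, standing in for a browser_extracts/ filename match, is an illustrative assumption):

```python
# Phase A batch construction (sketch).
def build_batches(papers: list[dict]) -> tuple[list[list[dict]], list[list[dict]]]:
    full_read, abstract_only = [], []
    for p in papers:
        accessible = bool(p.get("arxiv_id") or p.get("oa_url") or p.get("has_extract"))
        cc = p.get("citation_count")
        upgrade = p.get("discovery_count", 1) >= 3 and accessible
        if accessible and ((cc is not None and cc > 10) or upgrade):
            full_read.append(p)           # 1 paper per Agent call
        else:
            abstract_only.append(p)       # paywall-only always lands here
    full_batches = [[p] for p in full_read]
    abs_batches = [abstract_only[i:i + 10]
                   for i in range(0, len(abstract_only), 10)]  # up to 10 per call
    return full_batches, abs_batches
```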
For each batch:
Agent(subagent_type="연구팀"):
"Research survey mode: Paper analysis.
Papers: {batch_json}
Output directory: {artifact_dir}
Supplementary inputs (if any): `{artifact_dir}/../analysis_project/paper/` (use if exists, otherwise none)
Browser extracts: {artifact_dir}/_internal/browser_extracts/ (pre-fetched by 탐색팀, if available)
## CRITICAL RULES
- Per-paper timeout: if the full text cannot be obtained within 60 seconds, move on to the next paper immediately. Never spend more than 60 seconds on one paper.
- Paywall fast-detect: a paper with no arxiv_id, no oa_url, and nothing in browser_extracts → do not attempt WebFetch; use the OpenAlex abstract only.
- If WebFetch returns a 3xx redirect loop or an empty response, skip immediately.
- Keep total processing time per batch under 10 minutes.
## Paywall Access
If browser_extracts/{filename}.txt exists for a paper: Read the pre-extracted text.
If not: skip to metadata fallback (OpenAlex Abstract). Do NOT attempt browser access directly.
Playwright execution is handled exclusively by 탐색팀 (browser-team) — 연구팀 must never run Playwright directly.
Follow your Role 2b procedure. Return file paths + Korean summary."
Launch batches in parallel. Error handling: Individual batch failure → log and continue. Total failure (0 batches succeed) → pipeline_summary(failed) → STOP.
Step 3c: Phase B — Reference Chaining (depth-gated)
If depth == shallow: SKIP Phase B entirely.
Agent(subagent_type="연구팀"):
"Research survey mode: Reference chaining.
Paper cards: {artifact_dir}/cards/
Search results: {artifact_dir}/_internal/search_results.json
Depth: {depth}
Output: {artifact_dir}/_internal/chaining_results.md
Follow your Role 2b reference chaining procedure. Return file paths + Korean summary."
Loopback control (orchestrator responsibility):
- Parse chaining_results.md → extract papers with reference_frequency >= 2
- If new papers exist AND loopback_count < limit (medium: 1, deep: 2):
- Construct Phase A batches for new papers only (top 10)
- Invoke additional skimming Agent calls
- Increment loopback_count
- Re-invoke Phase B for further chaining
- When limit reached or no new papers → proceed to Phase C
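A minimal control-flow sketch of this loopback, with the Agent invocations abstracted as callables (parse_chaining is assumed to return the papers with reference_frequency >= 2):

```python
# Phase B loopback control (sketch). Limits follow the depth gating.
LOOPBACK_LIMIT = {"shallow": 0, "medium": 1, "deep": 2}

def phase_b_loop(depth: str, run_chaining, run_skim_batches, parse_chaining) -> None:
    if depth == "shallow":
        return                                  # Phase B skipped entirely
    loopback_count = 0
    run_chaining()
    while True:
        new_papers = parse_chaining()           # reference_frequency >= 2
        if not new_papers or loopback_count >= LOOPBACK_LIMIT[depth]:
            break                               # proceed to Phase C
        run_skim_batches(new_papers[:10])       # Phase A batches, top 10 only
        loopback_count += 1
        run_chaining()                          # re-invoke for further chaining
```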
Step 3d: Phase C — Code & Model Search
Agent(subagent_type="연구팀"):
"Research survey mode: Code and model search.
Paper cards: {artifact_dir}/cards/
Output: {artifact_dir}/code_resources/
Aggregate: {artifact_dir}/_internal/code_search.md
Follow your Role 2c procedure. Return file paths + Korean summary."
Step 3e: Compile analysis_summary.md
Agent(subagent_type="연구팀"):
"Research survey mode: Compile analysis summary.
Compile from: cards/, _internal/chaining_results.md (if exists), _internal/code_search.md (if exists).
Set phase flags: chaining_available, code_search_available.
Output: {artifact_dir}/analysis_summary.md
Return file path + Korean summary."
Step 3 Status Check
Read {artifact_dir}/analysis_summary.md.
- Not exists or 0 papers → pipeline_summary(failed) → STOP
- Depth-aware: shallow + chaining_available == false + code_search_available == true → done (intentional skip)
- Otherwise partial flags → partial, warn user, proceed
Step 4: Report Generation (direct Agent call + QA loop)
Step 4a: Generate Reports
Agent(subagent_type="연구팀"):
"Research survey mode: Report generation.
Analysis directory: {artifact_dir}
Topic: {topic}
Output directory: {artifact_dir}
**Routing**: T1/T2 chapter files (00_briefing.md ~ NN_*.md, analysis_summary.md) → root `{artifact_dir}/`. Reviews/raw metadata are written elsewhere by other steps — do not touch _internal/ here.
Date: {YYYY-MM-DD}
## Source Files to Read
- analysis_summary.md (MUST READ — taxonomy, core papers, themes, evolution, gaps)
- _internal/chaining_results.md (foundational dependencies, if exists)
- _internal/code_search.md (code/model resources)
- _internal/search_results.json (paper metadata)
- Read key card files from cards/ (at least top 15-20 by discovery_count)
## Report Structure (mode-specific)
The report set differs per mode. Common rules across all modes:
- All Korean prose, technical terms English-parenthesized
- Every comparison table ends with bold **Takeaway** line
- Numbers/claims sourced only from analysis_summary / cards — NO fabrication
- Cross-references via `[text](filename.md)`
### Mode `academic` (default) — 9 files
### 00_briefing.md — Executive Briefing
- **Level 0** (1 line): one-sentence summary
- **Level 1** (3-5 lines): key findings summary
- **Level 2** (1 page):
- Mermaid paper relationship diagram (`graph TD`, styled key nodes, 4 subgraphs: Backbone/QbE/QbT/On-Device)
- Research axes table: axis | description | key papers | paper count
- Key findings (numbered, 5-7 items)
- Recommended architecture stack (ASCII pipeline: input → feature → encoder → matching → output)
- Model size spectrum (ASCII: MCU→Edge→GPU→Server with params and best metric per tier)
- **Level 3**: guide table for the full report set (file | content | key question answered)
### 01_landscape.md — Research Landscape
- Problem definition (formal: few-shot, zero-shot, open-set variants)
- 3D taxonomy: enrollment method (audio/text/multi-modal) × learning paradigm (metric/contrastive/classification/KD/meta) × architecture (CNN/Conformer/Hybrid/MLP)
- Temporal evolution table: period | key transition | representative papers
- Research axes detailed breakdown with paper counts
- Enrollment method comparison (QbE vs QbT vs Multi-modal, with paper lists per category)
### 02_core_papers.md — Core Paper Analysis
- Grade classification: **필독** (must-read: DC>=5 or CC>100), **정독** (close-read: DC>=3 or CC>30), **참조** (reference: rest)
- Paper lineage diagrams (ASCII: metric learning lineage, phoneme matching lineage, multi-modal lineage)
- Per-paper detailed cards for 필독+정독:
authors | venue/year | DC/CC | code link | core insight | architecture (diagram if possible) | key results table | limitations | connections
- 참조 grade: compact table only (title | year | contribution | params)
### 03_baselines.md — Benchmark Comparison Tables
Tables (each ending with bold **Takeaway** line):
1. GSC closed-set (12-class): model/year/acc-v1/acc-v2/params/MACs/latency/code
2. LibriPhrase text-enrollment: model/year/EER-Easy/EER-Hard/AUC-Easy/AUC-Hard/params/code
3. splitGSC few-shot open-set: model/backbone/params/5-shot-acc/AUROC/code
4. Zero-shot audio enrollment: model/size-quant/AUC/EER/training-data/code
5. Continuous speech KWS: model/keywords/recall@2FA-clean/other/speed
6. Multilingual UD-KWS: model/params/languages/metric/score/code
7. On-device deployment: model/year/platform/params/power/accuracy/method
8. Model size spectrum ASCII (MCU→Server with params and best metric at each tier)
- Only include numbers directly from card files — NO fabrication
### 04_technical_deep_dive.md — Technical Deep Dive
- 5-8 technology themes, each with: problem definition → approach comparison table → key insight
Expected themes: phoneme-level supervision, audio-text modality gap, metric/contrastive losses, KD for lightweight, open-set rejection, streaming detection, data augmentation/synthesis
- Loss function comparison table (MANDATORY): loss | papers | mechanism | pros | cons | best-for
- Closing section: **미해결 과제와 연구 기회** (open problems & research opportunities) — 5-8 gaps with difficulty/impact ratings + solution directions
### 05_datasets.md — Dataset Specifications
- Primary benchmarks (detailed field/value tables): GSC v1/v2 (+ splitGSC split details), LibriPhrase (+ key eval numbers), Qualcomm KWSD, Hey-Snips
- Training datasets: MSWC, LibriSpeech, VoxCeleb, WenetPhrase, Common Voice
- Each dataset: year/size/speakers/keywords/language/access URL/license/usage count
- Noise/augmentation datasets table
- Dataset usage map (ASCII diagram: training datasets → evaluation benchmarks)
- Recommended benchmark combination table: scenario → datasets → metrics
### 06_implementation.md — Goal-Adaptive Action Roadmap
First **infer the user's primary goal** from the original query and select the matching template. Always state the inferred goal at the top of the file (`> Inferred goal: {goal} — {one-line rationale}`). If ambiguous, default to **build** but log the assumption.
**Goal detection cues** (non-exhaustive, infer from `original_query`):
- **build** — "구현", "implement", "develop", "build a system", "재현", "프로젝트" → code/system implementation
- **seminar** — "세미나", "발표", "lecture", "presentation", "slides", "talk" → talk/slide preparation
- **write** — "논문 작성", "survey 쓰기", "review writing", "thesis" → paper/survey writing
- **research** — "연구 방향", "research direction", "open problem", "hypothesis", "what's next" → research direction scoping
- **adopt** — "기술 도입", "선택", "어떤 모델 써야", "production 적용" → technology selection / adoption decision
**Template by goal** (always end with a Cross-References section + 5-7 line Korean summary):
#### Goal: build — Implementation Roadmap
- Architecture decision matrix (5-8 decisions): each with Option A/B/C + Recommendation + reasoning. Decision keys depend on domain (e.g., backbone, loss, training paradigm, deployment target, data pipeline).
- Phased implementation plan (typically 6-12 weeks): Phase 0 (Infrastructure: dataset pipeline, eval metrics, reference code) → Phase 1-N (incremental capability buildup, ending with optimization/deployment).
- Key technical decisions with runnable Python code snippets (feature extraction, evaluation protocol, etc.)
- Paper-to-code mapping table: technique → source paper → reference repo → status
- Risk assessment table: risk | probability | impact | mitigation
#### Goal: seminar — Seminar Preparation Roadmap
- Slide structure outline organized by chapter (target audience-aware slide count, e.g., 30-50 for 60-min)
- Per-chapter cheat sheet (key papers, takeaways, transitions, time budget)
- Deep-dive slide candidates for expert audiences (5-10 backup slides)
- Demo candidates with reproducible inference setup (link to repos)
- Q&A anticipation table (5-10 likely questions with brief answers + supporting paper)
#### Goal: write — Writing Roadmap
- Section-by-section outline (Abstract → Intro → Related Work → Methods → Experiments → Conclusion, or domain-appropriate variant)
- Argument scaffolding: thesis → supporting evidence per claim → counter-considerations / limitations
- Figure/table candidates with caption drafts and source paper references
- Citation map: which papers to cite where (with rationale linking to claim)
- Writing-stage timeline (literature consolidation → outline → draft → revision → submission)
#### Goal: research — Research Direction Roadmap
- Open-problem identification: 5-8 gaps with severity (impact × tractability) ratings
- Hypothesis candidates: testable hypotheses with expected outcomes
- Experimental setup proposals: minimal viable experiment per hypothesis (data, baseline, metric, resource estimate)
- Decision matrix: which direction first (impact × feasibility × novelty)
- Risk register: scientific risks (negative results, scooping) + mitigation
#### Goal: adopt — Technology Adoption Roadmap
- Selection criteria matrix (cost, latency, accuracy, license, maintenance) weighted to user constraints
- Candidate shortlist (3-5 options) with pros/cons aligned to criteria
- Pilot evaluation plan: which to try first, measurement protocol, decision threshold
- Integration considerations: data pipeline, monitoring, rollback path
- Risk assessment: technical + organizational
**Schema flexibility**: section names above are guides, not hard requirements. Adapt headings, decision keys, phase counts to the actual domain (e.g., "MCU optimization" only relevant if on-device is in scope). Numbers/examples in cards must drive the template, not the other way around.
**CRITICAL — Output scope strictly limited to the 9 markdown reports** (00_briefing through 08_reading_guide). Specifically for goal=seminar:
- Produce `06_implementation.md` with chapter outline + cheat sheet + Q&A + deep-dive candidates ONLY.
- Do **NOT** produce `seminar_slides.md`, `notion_seminar_slides.md`, slide-by-slide markdown, PPTX, or any other slide-rendering artifact.
- Slide-by-slide draft generation belongs to autopilot-doc presentation mode. Never overstep.
Same restriction applies to other goals: do NOT generate paper drafts, code, PPTX, or any final-form document — only the markdown analysis reports defined for the active mode.
**MANDATORY closing section — `## Next Pipeline`** (always include at end of `06_implementation.md`, regardless of goal):
This file is a **high-level outline / sketch** based on field analysis. For the actual document creation or implementation, hand off to a downstream pipeline. Pick the recommendation by detected goal:
| Inferred Goal | Recommended next command | Hand-off rationale |
|---|---|---|
| build | `/autopilot-code --mode dev "<task>"` | Code implementation needs init-plan → execute-plan → run-test loop. autopilot-code reads `analysis_project/{code,paper}/` + `research/{topic}/` implicitly. |
| seminar | `/autopilot-doc "<task>" --mode presentation` | Slide-by-slide markdown draft (PPTX export is NOT supported — the user converts to PPT manually with their lab template). The research artifact is picked up implicitly. |
| write | `/autopilot-doc "<task>" --mode write` | Full paper draft (Abstract → Conclusion) generation. |
| research | `/autopilot-doc "<task>" --mode proposal` (or stay in research-only mode) | Proposal mode covers hypothesis + experiment design framing. |
| adopt | `/autopilot-doc "<task>" --mode report` (or `--mode proposal` for go/no-go decision) | Tech adoption is a structured report/proposal. |
| review | `/autopilot-doc "<task>" --mode review --format-ref <path>` (REQUIRED; path-only, no built-in presets — venues differ year-to-year) | Reviewer report draft following the venue's review form. |
Include the recommended next command verbatim in this section so the user can copy-paste it. autopilot-doc auto-detects `research/{topic}/` artifacts via fuzzy keyword matching on the prompt, so no separate path argument is needed.
**Boundary disclaimer** (also include): "이 06_implementation.md는 분야 분석에서 도출된 high-level 계획입니다. 본격적인 문서 작성·코드 구현은 autopilot-doc / autopilot-code로 인계됩니다."
### 07_resources.md — Code, Data & Model Resources
- Tier-based repos: Tier 1 (directly usable for UD-KWS) / Tier 2 (backbone/infra) / Tier 3 (supplementary)
Columns: repo | paper | stars | language | last-update | reproducibility | notes
- Code-not-available high-impact papers (institution/reason)
- Pre-trained models table: model | architecture | params | framework | checkpoint | URL
- Reproducibility assessment matrix: paper | code | data | checkpoint | overall rating
### 08_reading_guide.md — Recommended Reading Paths
- 4-5 purpose-based tracks:
Track A: UD-KWS 입문자 (what is this field)
Track B: 경량 모델 설계 (small model, good performance)
Track C: 실전 구현 (I want to build a system)
Track D: 연구자 (where are the open problems)
Track E (optional): On-device 배포 전문가
- Each track: target audience, goal, ordered paper list (5-7), reading point per paper, estimated time
- Per-paper markers: 필수/권장/선택 for each track
### Mode `technology` — 7 files
### 00_briefing.md — Executive Briefing
- 1-line summary, 3-5 line key findings, 1-page overview
- Mermaid: technology landscape (categories, vendors, standards) — `graph TD`
- Top-3 actionable insights (e.g., "production 환경엔 X 코덱이 사실상 표준", "오픈소스 대안 Y가 부상")
### 01_landscape.md — Technology Landscape
- Category taxonomy (codecs / protocols / processing / hardware, etc.)
- Key technologies × categories matrix
- Lineage diagram (which technology derives from which)
- Adoption stage per technology (emerging / mainstream / legacy)
### 02_standards.md — Standards & Specs
- Standards inventory: org (3GPP / ITU-T / IEEE / W3C / IETF) | spec ID | scope | year | status
- Per-standard detail: key sections, mandatory vs optional features, profile/level
- Cross-references between specs (e.g., VoLTE combines 3GPP 26.171 + IETF SDP + ITU-T G.722.2)
- **Takeaway**: which standard to follow (production / research separately)
### 03_vendor_comparison.md — Vendor / Solution Comparison
- Vendor matrix: vendor | product/SDK | licensing | platform | strengths | weaknesses
- Capability checklist: feature × vendor (Yes/No/Partial)
- Cost and license model comparison (proprietary / open-source / royalty)
- **Takeaway**: recommended solution per usage scenario
### 04_technical_deep_dive.md — Algorithm·Protocol Details
- 3-5 core technology themes, each: problem definition → algorithm comparison → key insight
- Critical equations / pseudocode / state machines (as needed)
- Performance trade-off analysis (latency / quality / complexity)
### 05_deployment.md — Deployment Considerations
- Reference architectures (network topology / signal flow)
- Latency budget breakdown
- Integration paths (migration from existing systems to the new tech)
- Failure modes + mitigation
- Cost model (CapEx / OpEx / per-call cost, where applicable)
### 06_implementation.md — Goal-Adaptive Action Roadmap (same template as academic mode; build / adopt prioritized)
### 07_resources.md — Open-source Code, Models, Tools
- Tier-based resources: Tier 1 (directly usable) / Tier 2 (reference) / Tier 3 (experimental)
- Pre-trained checkpoints (if any) | platform support | license
- Evaluation tools, test datasets, benchmarking suites
### Mode `market` — 5 files
### 00_briefing.md — Executive Briefing
- 1-line summary, 3-5 line key findings, 1-page overview
- Top-3 strategic implications
### 01_market_overview.md — Market Sizing & Segmentation
- Total Addressable Market (TAM) / Serviceable (SAM) / Obtainable (SOM)
- Segment breakdown: by region / customer type / use case
- Growth rate (CAGR) + 3-5 year projection
- Source attribution table (source / publication date / reliability)
- **Takeaway**: market size + where the growth drivers come from
### 02_key_players.md — Competitor Profiles
- Top 5-10 players: name | revenue / market share | products | strategy | recent moves
- Positioning map (2D, e.g., price vs feature)
- Recent M&A / partnership / funding activity
- **Takeaway**: one-line summary of the competitive landscape
### 03_trends.md — Market Trends & Drivers
- Driver factors (technology / regulation / customer need)
- Inhibitor factors (cost / risk / inertia)
- Disruptor candidates (new technologies or players that could threaten incumbents)
- Timeline (short- / mid- / long-term trends separated)
### 04_opportunities.md — Opportunity Assessment
- Whitespace identification (unmet needs)
- Entry strategy options (organic / partnership / acquisition)
- Risk register
- **Recommended actions** (prioritized)
## Quality Directives
- Cross-reference other reports: [text](filename.md)
- Every comparison table MUST end with bold **Takeaway** line
- Mermaid: use graph TD with style directives for key nodes
- Code snippets in 06_implementation.md must be runnable Python
- Numbers only from card files / analysis_summary — NO fabrication
- Do NOT return report content in response — write files only
Return file paths + 3-5 line Korean summary."
Step 4b: QA Loop (max 2 rounds; quick = 1 round)
QA level: --qa flag if provided, else auto-detect (<=10 papers: light, 11-25: standard, >25 or deep: thorough).
Two reviewer roles run in parallel at standard+:
- Quality reviewer(s): coverage / no-fabrication / progressive disclosure / actionable roadmap
- Fact-checker (NEW): verbatim comparison against cards/ — narrowly verifies that the venue/year/metric/lineage cited in the reports match the source cards
| Level | Quality reviewer | Fact-checker (parallel) | Max rounds |
|---|---|---|---|
| quick | 1× 품질관리팀 (sonnet), spot-checks only | skip | 1 (no re-invoke even on 🔴) |
| light | 1× 품질관리팀 (sonnet) | skip (quality reviewer covers basic spot-checks) | 2 |
| standard | 1× 품질관리팀 (opus) | 1× 품질관리팀 fact-check (sonnet) | 2 |
| thorough | 2× 품질관리팀 parallel (opus; completeness + accuracy) | 1× 품질관리팀 fact-check (sonnet) | 2 |
Why Sonnet for the fact-checker: verbatim comparison against cards is _simple matching work_, not _creative judgment_, so Sonnet suffices and is cost-efficient.
round = 0, review_dir = {artifact_dir}/_internal/reviews/
Loop:
round += 1
# Parallel reviewer invocation (single message with multiple Agent calls per QA Scaling)
Quality reviewer prompt (opus or sonnet per level):
"Review research survey report — _coverage / no-fabrication / disclosure / roadmap_ focus.
Topic: {topic}. Reports dir: {artifact_dir}.
Verify: coverage, no fabrication, progressive disclosure, actionable roadmap.
Do NOT individually verify each citation (model venue/year/metric) — that's the fact-checker's role at standard+.
Write to: {review_dir}/round_{round}_quality.md (or round_{round}.md at light level).
Return ONLY path + one-line verdict."
Fact-checker prompt (sonnet, parallel — standard/thorough only):
"You are a fact-check focused reviewer — NOT report quality.
Topic: {topic}. Reports dir: {artifact_dir}. Cards: {artifact_dir}/cards/.
For every domain claim in the reports (model name / venue / year / metric / dataset /
lineage / classification mentioned in 00_briefing through last report), open the
corresponding card and verbatim compare:
- Single source of truth: {artifact_dir}/cards/*.md
- If a report claim has no matching card → flag as 🔴 (fabrication risk)
Do NOT comment on coverage, narrative, or roadmap quality — that's the quality reviewer's job.
Cost-aware mode (sonnet): table-only output. Limit to ~30 most material claims (prioritize Tier 1 papers + key models named in the user prompt).
Output table:
| Report | Section | Claim | Source card (file:line) | Match (✅/❌) | Severity (🔴/🟡) |
Write to: {review_dir}/round_{round}_factcheck.md.
Return ONLY path + one-line verdict."
No 🔴 from any reviewer → exit.
qa_level == quick → after round 1, write unresolved.md if any 🔴 remain (tag fact-check residuals as [FACT-RESIDUAL]), exit. NEVER re-invoke 연구팀.
🔴 from quality + round < 2 → re-invoke 연구팀 with quality findings.
🔴 from fact-checker + round < 2 → re-invoke 연구팀 with mandatory ref-grounding (re-read named cards).
🔴 from both + round < 2 → re-invoke 연구팀 with combined findings.
round >= 2 + 🔴 remain → write unresolved.md (tag fact-check residuals as [FACT-RESIDUAL]), exit.
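A minimal control-flow sketch of the loop above, with reviewer invocation and re-invocation abstracted as callables (run_reviewers is assumed to return which reviewers raised a 🔴):

```python
# Step 4b QA loop control flow (sketch).
def qa_loop(qa_level: str, run_reviewers, reinvoke_research_team, write_unresolved) -> str:
    max_rounds = 1 if qa_level == "quick" else 2
    rnd = 0
    while rnd < max_rounds:
        rnd += 1
        red_flags = run_reviewers(rnd)          # subset of {"quality", "factcheck"}
        if not red_flags:
            return "clean"                      # no 🔴 from any reviewer
        if qa_level == "quick" or rnd >= max_rounds:
            write_unresolved(red_flags)         # tag fact-check residuals [FACT-RESIDUAL]
            return "unresolved"                 # NEVER re-invoke at quick level
        reinvoke_research_team(red_flags)       # quality / factcheck / combined findings
    return "unresolved"
```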
Step 4c: Status Check
Verify {artifact_dir}/00_briefing.md exists. Not exists → pipeline_summary(failed) → STOP.
Step 5: Pipeline Summary
Write {artifact_dir}/pipeline_summary.md BEFORE reporting:
# Research Survey Pipeline Summary: {topic}
- **Date**: {YYYY-MM-DD}
- **Query**: {query}
- **Depth**: {depth}
- **Status**: done / partial / failed
- **From-Stage**: {stage if resumed via --from, else "N/A"}
## Process Log
| Step | Action | Result | Notes |
|---|---|---|---|
| 1 | Input parsing | {type} | topic: {topic} |
| 2a | Query Expansion | {N} queries | original + {N-1} variants |
| 2b-c | Paper Search (Agent) | {N} papers | sources: {list} |
| 2e | Query Expansion Rounds | {N} rounds | new papers per round: {list} |
| 3 | Paper Analysis (Agent x N) | {N} analyzed | depth: {depth}, loopbacks: {N} |
| 4 | Report Generation (Agent + QA) | {N} files (mode={mode}: academic=9 / technology=7 / market=5) | QA: {level}, rounds: {N} |
## Artifacts
- Search: {artifact_dir}/_internal/search_results.json
- Analysis: {artifact_dir}/analysis_summary.md
- Reports: {artifact_dir}/00_briefing.md ~ {last_report.md} (mode-aware: academic→08_reading_guide / technology→07_resources / market→04_opportunities)
## Decision Points
| Step | Decision | Response | Action |
|---|---|---|---|
| (from in-memory log) |
Step 6: Briefing
Read 00_briefing.md and 06_implementation.md (for the inferred goal + Next Pipeline) and present:
- Level 0 summary (one line)
- Level 1 overview (3-5 lines)
- Key stats: total papers, core papers, code availability
- File paths for all reports (mode-aware: academic→00~08 / technology→00~07 / market→00~04)
- Next pipeline recommendation: read the ## Next Pipeline section from 06_implementation.md and present the inferred goal + recommended next command verbatim. Make it copy-paste-ready.
- "질문이 있으시면 물어보세요. 보고서를 기반으로 답변드리겠습니다."
Pipeline completion: Step 5 determines formal status. Step 6 is optional interaction.
Scope boundary: autopilot-research produces field intelligence (markdown analysis only). It does NOT produce final documents (papers/slides/PPTX/code). For document/slide creation, hand off to autopilot-doc; for code implementation, hand off to autopilot-code. The 06_implementation.md outline is the bridge artifact between these pipelines.
Decision Logging
Record after each gate: {step | decision | response | action}. Populate pipeline_summary Decision Points table.
Safety Rules
- Do NOT fabricate citations, URLs, or metrics
- Source failure → continue with remaining sources
- (no --refs flag — supplementary local materials are read from analysis_project/paper/ if it exists; otherwise the user is not asked)
- Rate limits: arXiv ~3s, OpenAlex 10 req/s, S2 1 req/s, Google Scholar 3s + 50/day
- Context protection: each Agent returns ONLY file paths + 3-5 line summary
- Context budget: in deep mode the orchestrator context accumulates (query expansion rounds + skimming batches + loopbacks). Always persist Agent results to files and keep only summaries in context. Do not load the full search_results.json into context; reference only the paper count + top-5.
- MERGE mode integrity: title fuzzy matching normalizes via lowercase + punctuation removal + a/an/the removal. A paper's discovery_count may only increase monotonically (never decrease).
- Playwright orphan processes: run pkill -f chromium_headless_shell before and after 탐색팀 invocations
Task
$ARGUMENTS