research-intake

Source-governed research link intake workflow. Use when asked to find papers, repos, datasets, docs, standards, benchmarks, source surfaces, or trusted links; collect research links; search literature; update a research base; run research intake; produce links.md; or prepare auditable JSONL evidence. The skill asks for a topic ID when missing, creates or reviews topic configs with user approval, discovers trusted source roots, writes URL-bearing query files, runs fetch/dedupe/check/finalize, and produces links.md plus run-local accepted.jsonl evidence. It does not write reports or syntheses.

Install command

npx skills add https://github.com/mzqef/research-intake --skill research-intake

Copy and paste this command into Claude Code to install the skill.

Source

mzqef/research-intake

Updated: 16 May 2026, 21:35

name	research-intake
version	0.2.0
description	Source-governed research link intake workflow. Use when asked to find papers, repos, datasets, docs, standards, benchmarks, source surfaces, or trusted links; collect research links; search literature; update a research base; run research intake; produce links.md; or prepare auditable JSONL evidence. The skill asks for a topic ID when missing, creates or reviews topic configs with user approval, discovers trusted source roots, writes URL-bearing query files, runs fetch/dedupe/check/finalize, and produces links.md plus run-local accepted.jsonl evidence. It does not write reports or syntheses.
allowed-tools	["Bash","Read","Write","WebSearch","WebFetch","AskUserQuestion"]
triggers	["find papers for","search literature","collect links","research intake","intake research","update research base"]

Research Intake Skill

Research Intake is not a report writer. It is the intake layer that creates a small, auditable, topic-scoped set of accepted links. The CLI is the deterministic I/O layer. The model is responsible for topic judgment, source trust, query planning, semantic filtering, and concise user gates.

The final user-visible deliverable is normally:

topics/<topic-id>/captures/links.md

The structured evidence for a run is:

runs/<run-id>/accepted.jsonl

Runtime Bootstrap

Use this command:

RI_CMD="__RESEARCH_INTAKE_COMMAND__"
RI_ROOT="${RESEARCH_INTAKE_ROOT:-.research-intake}"

All CLI calls should include --root "$RI_ROOT".

If the command is not installed and the current working directory is the project checkout, use:

RI_CMD="uv run research-intake"

Respect an explicit user-provided root. If the user gives no root, use RESEARCH_INTAKE_ROOT when set, otherwise .research-intake in the current working directory.
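
The precedence above (explicit root, then RESEARCH_INTAKE_ROOT, then .research-intake) can be sketched in Python. This is only an illustration of the rule; the function name resolve_root and its parameters are hypothetical and are not part of the research-intake CLI. An empty environment value is treated as unset, an assumption that mirrors the shell default expansion shown earlier.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_root(explicit_root: Optional[str] = None,
                 env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the intake root: explicit value, then RESEARCH_INTAKE_ROOT, then the default."""
    env = os.environ if env is None else env
    if explicit_root:
        # A root given by the user always wins.
        return Path(explicit_root)
    env_root = env.get("RESEARCH_INTAKE_ROOT")
    if env_root:
        return Path(env_root)
    # Relative path, so it resolves against the current working directory.
    return Path(".research-intake")
```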

Before a non-trivial run, check the command and root shape:

$RI_CMD doctor --root "$RI_ROOT"

If doctor fails because the root does not exist, initialize only when the user has asked to start or update an intake base:

$RI_CMD init --root "$RI_ROOT"

Artifact Contract

Root layout:

<root>/
  initial_sources.jsonl
  topics/<topic-id>/
    config.yaml
    sources.jsonl
    progress.md
    captures/links.md
    tasks/
  runs/<run-id>/
    expanded_queries.jsonl
    raw_results.jsonl
    candidates.jsonl
    rejected.jsonl          optional, model-written
    categories.json         optional, model-written
    keep.json               model-written before finalize
    accepted.jsonl          written by finalize

The following schema blocks are illustrative only. They show required shape and field meaning, not fixed values. Always generate topic IDs, labels, ideas, keywords, source IDs, URLs, resource titles, and dates from the user's actual topic and verified sources.

Topic config schema example:

topic_id: <chosen-topic-id>
label: <Human Readable Topic Label>
description: <one-sentence topic scope>
include:
  - idea: <included concept or source-governed research need>
    keywords:
      - <keyword>
      - <related keyword>
    description: <why this idea belongs in scope>
    objects:
      - <target artifact, method, dataset, benchmark, repo, paper, or docs type>
exclude:
  - idea: <excluded concept>
    description: <what to reject>
    reason: <why it is outside the intake scope>
budget:
  max_queries_per_source: 5
  max_results_per_query: 10

Source record schema example, one JSON object per line:

{"source_id":"<source_slug>","url":"https://<trusted-root-or-collection>/"}

Optional inactive/candidate source example:

{"source_id":"<source_slug>","url":"https://<trusted-root-or-collection>/","status":"candidate"}

Rules for sources:

source_id is a short stable slug, preferably lowercase with underscores.
url is a trusted root, collection, docs home, org page, index, or source surface.
A specific result page is not a source. It belongs in expanded_queries.jsonl as a URL to fetch.
Sources with missing status are active. Sources with status other than active are not fetched.
Keep sources.jsonl compact: one current row per source_id. Replace a row when updating it, keeping duplicate source events out of the file.
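
A minimal Python sketch of the compaction and status rules above. The helper names (compact_sources, is_active, to_jsonl) are hypothetical, and the sketch assumes that a later row in the file is the newer one for a given source_id.

```python
import json
from typing import Dict, Iterable, List


def compact_sources(lines: Iterable[str]) -> List[dict]:
    """Keep one row per source_id; a later row replaces an earlier one."""
    latest: Dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        # Reassigning an existing key keeps its original position in the dict.
        latest[row["source_id"]] = row
    return list(latest.values())


def is_active(row: dict) -> bool:
    """A missing status counts as active; any other status is not fetched."""
    return row.get("status", "active") == "active"


def to_jsonl(rows: Iterable[dict]) -> str:
    """Serialize rows as compact JSONL, one object per line."""
    return "".join(
        json.dumps(row, separators=(",", ":"), ensure_ascii=False) + "\n"
        for row in rows
    )
```

Only rows for which is_active returns True would be passed on to query planning; candidate rows stay in the file as remembered sources.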

Resource record schema used in raw_results.jsonl, candidates.jsonl, and accepted.jsonl:

{
  "resource_id": "url:<hash16>",
  "source": "<source_slug>",
  "title": "<verified resource title>",
  "summary": "Short metadata description if available.",
  "link": "https://<verified-result-url>",
  "domain": "<verified-domain>",
  "year": 0,
  "query": "https://<verified-result-url>",
  "fetched_at": "<UTC timestamp>"
}
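
As a rough aid for keeping records to the documented fields, the following Python sketch flags unknown keys. Treating resource_id and link as required is an assumption made for illustration; the schema above does not state which fields are mandatory, and check_resource_record is a hypothetical helper, not a CLI function.

```python
from typing import List

# Field names taken from the resource record schema above.
RESOURCE_FIELDS = {
    "resource_id", "source", "title", "summary", "link",
    "domain", "year", "query", "fetched_at",
}
# Assumption: these two are treated as required in this sketch only.
ASSUMED_REQUIRED = ("resource_id", "link")


def check_resource_record(record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record looks well-shaped."""
    problems = [f"missing field: {name}" for name in ASSUMED_REQUIRED if name not in record]
    problems += [f"unexpected field: {name}" for name in sorted(set(record) - RESOURCE_FIELDS)]
    return problems
```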

Decision Gates

Ask the user only at real gates. Batch choices into one question.

Mandatory gates:

Topic ID when missing.
Scope for a new topic.
Approval before changing config.yaml.
Approval before adding or changing source records.
Hard-cap recovery when more than 256 candidates remain.
Candidate group keep/reject/inspect decision.

If an AskUserQuestion tool is unavailable, ask one concise chat question and wait. Use group-level questions unless the user explicitly requests candidate-by-candidate review.

After a gate is answered, continue the workflow. Source discovery and config review are not final deliverables.

Phase 0 - Understand The Request

Classify the user's request:

New intake: collect links for a topic that is not yet configured.
Existing intake: update or rerun a known topic.
Source maintenance: add, remove, or clean source roots.
Review-only: inspect existing candidates or links without running fetch.

If the user asks for a literature review, synthesis, report, or summary, explain that this skill only performs link intake. Offer to produce links.md first.

Choose a run ID when one is not provided:

YYYY-MM-DD-<topic-id>-v1

If that directory exists, increment the suffix: v2, v3, and so on. Preserve existing run directories.
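
The run ID rule can be sketched as follows. The function name next_run_id is hypothetical; the sketch only inspects <root>/runs/ and never modifies or removes an existing run directory.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def next_run_id(root: Path, topic_id: str, today: Optional[date] = None) -> str:
    """Return YYYY-MM-DD-<topic-id>-vN using the first N whose run directory is free."""
    day = (today or date.today()).isoformat()  # ISO format is YYYY-MM-DD
    version = 1
    while (root / "runs" / f"{day}-{topic_id}-v{version}").exists():
        version += 1
    return f"{day}-{topic_id}-v{version}"
```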

Phase 1 - Topic Gate

If no topic ID is explicit, stop and ask:

Which topic ID should I use? Use a short stable slug, for example `<topic-id>`.

Ask for the topic ID instead of inferring it from nearby files or previous context.

Check whether the topic exists:

test -f "$RI_ROOT/topics/<topic-id>/config.yaml"

New Topic

If the topic does not exist:

Ask for the scope if the user has not supplied it.
Create the skeleton:

$RI_CMD new-topic <topic-id> --root "$RI_ROOT" --description "<approved scope>"

Read the generated config.yaml.
Draft a richer config update with label, description, include, exclude, and budget.
Show the proposed config or a compact before/after diff.
Ask one approval question:

Apply this topic config? Options: apply, edit first, leave default.

Write config.yaml only after approval.
Continue to source discovery.

Existing Topic

For an existing topic, read:

topics/<topic-id>/config.yaml
topics/<topic-id>/sources.jsonl
topics/<topic-id>/progress.md if present
topics/<topic-id>/captures/links.md if present

Run:

$RI_CMD plan --root "$RI_ROOT" --topic <topic-id>

Review the config. The authority order is:

description > include idea > include keywords > include objects > exclude keywords

If the config is vague, contradictory, too broad, or missing exclusions, propose specific edits and wait for approval before writing them.

Good config review output is concrete:

I would change:
1. Add include idea "<missing in-scope concept>" because the description explicitly mentions it.
2. Add exclude idea "<out-of-scope result type>" because those pages are not reusable source-governed evidence.
3. Reduce max_results_per_query from <old value> to <new value> to keep review under the cap.

Then ask one approval question for the whole set.

Phase 2 - Source Discovery And Maintenance

This phase is mandatory before fetch unless the user explicitly says to use the existing sources only.

Goal: identify trusted source roots and concrete result URLs. Keep those two concepts separate.

Examples:

Good source root: https://<trusted-domain>/<collection-root>/
Good result URL:  https://<trusted-domain>/<specific-result-path>
Bad source root:  https://<trusted-domain>/<specific-result-path>

Run web searches derived from the topic config. Build 3-8 probes from:

Topic description.
Include ideas.
Include keywords.
Known official organizations, libraries, standards, datasets, or benchmarks.
Exclude ideas, to avoid ambiguous wording.

Probe patterns:

<core idea> official docs
<core idea> <code or artifact host>
<core idea> dataset
<core idea> benchmark
<core idea> standard
<core idea> <topic-specific artifact type>
site:<known-source-domain> <core idea>
site:<candidate-domain> <core idea>

For each promising result, record:

Source root URL.
Concrete result URLs worth fetching.
Why the source is trusted.
Whether it should be active now or only remembered as candidate.

Trust levels:

high: official project/org docs, canonical repository/org, standards body, maintained dataset/benchmark index, conference/journal/index page.
medium: respected lab, project page, package index, curated list with clear maintenance.
reject: mirrors, SEO pages, scraped copies, generic blog spam, unrelated result pages.

Before editing sources.jsonl, show a compact table:

| # | source_id | root URL | status | trust | concrete URLs found |
| 1 | <source_slug> | https://<trusted-domain>/<collection-root>/ | active | high | <count> |

Ask one question:

Which source updates should I apply? Options: add all high-trust, add selected numbers, inspect selected numbers, skip source updates.

When approved, update topics/<topic-id>/sources.jsonl as compact JSONL:

{"source_id":"<source_slug>","url":"https://<trusted-domain>/<collection-root>/"}
{"source_id":"<another_source_slug>","url":"https://<another-trusted-domain>/<collection-root>/"}
{"source_id":"<candidate_source_slug>","url":"https://<candidate-domain>/<collection-root>/","status":"candidate"}

Keep one row per source ID. Preserve unrelated source rows. If changing a URL or status, replace that source's row.

If no useful new sources are found, say so in one sentence and continue with the existing active sources.

Phase 3 - Query Planning

The fetcher is a generic URL metadata fetcher. It does not perform general web search. Therefore query lines should usually contain verified concrete URLs on the source domain.

For each active source:

Use the discovered result URLs from Phase 2.
Use links already present in links.md only to avoid duplicates, not to refetch them as new candidates.
Use WebSearch/WebFetch to find additional concrete URLs if the source root is too broad.
Respect budget.max_queries_per_source.
Set budget_max_results to budget.max_results_per_query.

Write runs/<run-id>/expanded_queries.jsonl with one JSON object per line:

{"source_id":"<source_slug>","query":"https://<trusted-domain>/<specific-result-path>","budget_max_results":<max_results_per_query>,"topic_id":"<topic-id>"}
{"source_id":"<another_source_slug>","query":"https://<another-trusted-domain>/<specific-result-path>","budget_max_results":<max_results_per_query>,"topic_id":"<topic-id>"}

Multiple URLs may be placed in a query string when they belong to the same source domain, but keep lines readable. Prefer one URL per line when review clarity matters.

Avoid empty or purely conceptual query lines such as:

{"source_id":"<source_slug>","query":"<purely conceptual search phrase>"}

With the generic fetcher, that line does not search the web or the source site. It will not produce useful results unless the query includes concrete verified URLs on the source domain.
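
A hedged Python sketch of building query lines for one source under these rules. The helper build_query_lines is hypothetical, and requiring an exact host match between the result URL and the source root is an assumption; the CLI may apply a looser notion of "same source domain".

```python
import json
from typing import Iterable, List
from urllib.parse import urlparse


def build_query_lines(source_id: str, source_url: str, result_urls: Iterable[str],
                      topic_id: str, max_queries_per_source: int,
                      max_results_per_query: int) -> List[str]:
    """Return JSONL lines for concrete URLs on the source host, capped by the budget."""
    source_host = urlparse(source_url).netloc.lower()
    lines: List[str] = []
    for url in result_urls:
        if len(lines) >= max_queries_per_source:
            break  # respect budget.max_queries_per_source
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or parsed.netloc.lower() != source_host:
            continue  # skip conceptual phrases and off-domain URLs
        lines.append(json.dumps({
            "source_id": source_id,
            "query": url,
            "budget_max_results": max_results_per_query,
            "topic_id": topic_id,
        }, separators=(",", ":")))
    return lines
```

The sketch emits one URL per line, which matches the preference stated above when review clarity matters.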

Phase 4 - Fetch, Dedupe, Check

Run fetch:

$RI_CMD fetch \
  --root "$RI_ROOT" \
  --topic <topic-id> \
  --run <run-id> \
  --queries "$RI_ROOT/runs/<run-id>/expanded_queries.jsonl"

Expected output file:

runs/<run-id>/raw_results.jsonl

Run dedupe:

$RI_CMD dedupe --root "$RI_ROOT" --run <run-id>

Expected output file:

runs/<run-id>/candidates.jsonl

Dedupe compares against earlier runs/*/accepted.jsonl files and against the current run. It does not use reading-log.jsonl.

Run the cap check:

$RI_CMD check --root "$RI_ROOT" --run <run-id>

If the check exits with code 3, more than 256 candidates remain. Stop and ask:

More than 256 candidates remain. How should we narrow this? Options: add exclusions, lower budget, split topic, restrict years/sources.

Then apply the approved change and rerun fetch/dedupe/check or manually narrow candidates.jsonl, depending on the user's choice.
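
For a quick local look before or after the CLI check, the cap condition can be mirrored in Python. This sketch does not replace `check`; count_candidates and over_hard_cap are hypothetical helpers, and the only fact taken from the workflow is the cap of 256.

```python
from pathlib import Path

HARD_CAP = 256  # cap stated in this workflow


def count_candidates(candidates_path: Path) -> int:
    """Count non-empty JSONL rows in candidates.jsonl."""
    with candidates_path.open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def over_hard_cap(count: int, cap: int = HARD_CAP) -> bool:
    """True when more than `cap` candidates remain, the condition behind the gate."""
    return count > cap
```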

If raw_results.jsonl is empty:

Inspect expanded_queries.jsonl.
Confirm each source_id is active in sources.jsonl.
Confirm query URLs share the source domain.
If needed, repair queries or sources and rerun fetch.
Finalize an empty run only when the user explicitly asks.
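
The first three checks in this list can be sketched as a small diagnostic. The function diagnose_queries is hypothetical; it assumes that multiple URLs in one query string are whitespace-separated and that "same source domain" means an exact host match, neither of which is confirmed by the CLI.

```python
from typing import List
from urllib.parse import urlparse


def diagnose_queries(sources: List[dict], queries: List[dict]) -> List[str]:
    """Report query lines that are unlikely to fetch anything."""
    by_id = {s["source_id"]: s for s in sources}
    problems: List[str] = []
    for number, query in enumerate(queries, start=1):
        source = by_id.get(query.get("source_id"))
        if source is None:
            problems.append(f"line {number}: unknown source_id {query.get('source_id')!r}")
            continue
        if source.get("status", "active") != "active":
            problems.append(f"line {number}: source {source['source_id']} is not active")
        host = urlparse(source["url"]).netloc.lower()
        urls = [token for token in str(query.get("query", "")).split()
                if token.startswith(("http://", "https://"))]
        if not urls:
            problems.append(f"line {number}: query has no concrete URL")
        for url in urls:
            if urlparse(url).netloc.lower() != host:
                problems.append(f"line {number}: {url} is not on {host}")
    return problems
```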

Phase 5 - Semantic Filtering

Read every candidate's title, summary, link, domain, source, and year. Apply the topic config semantically.

Reject a candidate when:

It matches an exclude idea.
It is a source root with no useful result-level content and the user wants only result links.
It is off-topic despite keyword overlap.
It is a duplicate not caught by URL/title dedupe.
The title is generic and the URL does not reveal a useful resource.

Keep a candidate when:

It directly supports an include idea.
It is a high-trust source result for the topic.
It is a useful dataset, benchmark, docs page, repo, paper page, standard, or artifact index relevant to the topic.

If you write rejected.jsonl, use this shape:

{"resource_id":"url:<hash16>","title":"<title>","link":"https://<verified-result-url>","reject_reason":"Excluded by <exclude idea>: <specific reason>."}

Keep raw_results.jsonl intact. If narrowing candidates before final review, write the reduced list to candidates.jsonl.

Phase 6 - Candidate Group Review

Before asking the user, group candidates by meaningful review units. Prefer:

Source plus domain.
Semantic theme.
Trust surface.
Result purpose: docs, dataset, benchmark, repository, paper page, standard.

Group candidates from the available schema fields; kind is not part of the schema.

Use a compact numbered digest in AskUserQuestion messages. Some clients flatten Markdown tables into unreadable text.

Candidate groups:
1. <source or theme group> (<count> links)
  Why keep: <short rationale>
  Examples: <title>; <title>

2. <source or theme group> (<count> links)
  Why keep: <short rationale>
  Examples: <title>; <title>
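
A minimal sketch of producing such a digest by grouping on source plus domain, the first preference listed above. The function build_digest is hypothetical, and the "Why keep" rationale is deliberately left out because it requires model judgment rather than code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_digest(candidates: List[dict], max_examples: int = 2) -> str:
    """Render a numbered plain-text digest grouped by (source, domain)."""
    groups: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for candidate in candidates:
        key = (candidate.get("source", "?"), candidate.get("domain", "?"))
        groups[key].append(candidate)
    lines = ["Candidate groups:"]
    for number, ((source, domain), items) in enumerate(groups.items(), start=1):
        examples = "; ".join(item.get("title", "") for item in items[:max_examples])
        lines.append(f"{number}. {source} / {domain} ({len(items)} links)")
        lines.append(f"  Examples: {examples}")
    return "\n".join(lines)
```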

Ask exactly one group-level question:

Which groups should I accept for finalization?

Options:
- Accept all listed groups
- Accept only the group numbers I enter
- Reject the group numbers I enter
- Show details for group numbers I enter

Use those exact option labels when AskUserQuestion supports selectable options. For the three number-based options, tell the user to enter group numbers in the freeform field, for example 1, 3, 5.

If the user chooses Accept all listed groups, proceed to keep.json without follow-up.

If the user chooses Show details for group numbers I enter, show only those groups. For each inspected group, include:

Resource number.
Title.
Year or -.
Source.
Link.
One-line reason to keep or reject.

Then ask one follow-up for the inspected set. Keep review at group level unless the user requests candidate-by-candidate review.

Phase 7 - Write keep.json

Write final accepted IDs to:

runs/<run-id>/keep.json

Shape:

{"keep":["url:<hash16>","title:<hash16>"]}

Use resource_id values from candidates.jsonl. URL strings are accepted by the CLI as keep keys, but resource_id is preferred.

If all candidates are accepted, still write keep.json; it documents the review decision and makes finalization explicit.
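
Writing keep.json from the accepted candidates can be sketched as below. The helper write_keep_json is hypothetical; it uses resource_id values, as preferred above, and drops repeated IDs while preserving order.

```python
import json
from pathlib import Path
from typing import Iterable, List


def write_keep_json(run_dir: Path, accepted: Iterable[dict]) -> List[str]:
    """Write runs/<run-id>/keep.json with the resource_id of each accepted candidate."""
    keep: List[str] = []
    for candidate in accepted:
        resource_id = candidate["resource_id"]
        if resource_id not in keep:
            keep.append(resource_id)
    (run_dir / "keep.json").write_text(json.dumps({"keep": keep}) + "\n", encoding="utf-8")
    return keep
```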

Phase 8 - Finalize

Run:

$RI_CMD finalize --root "$RI_ROOT" --topic <topic-id> --run <run-id>

Expected outputs:

runs/<run-id>/accepted.jsonl
topics/<topic-id>/captures/links.md
topics/<topic-id>/progress.md

Finalization also removes stale topics/<topic-id>/captures/accepted.jsonl if one exists.

After finalization:

Read topics/<topic-id>/captures/links.md.
Confirm the table has only Year, Title, Source, ID, and Link data plus the row number.
Confirm there is no topic-level captures/accepted.jsonl.
Report the number of accepted links and the path to links.md.
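
Part of this checklist can be sketched in Python: counting accepted rows, confirming links.md exists, and confirming no topic-level accepted.jsonl remains. The function post_finalize_report is hypothetical, and the sketch does not inspect the links.md table columns because the file's exact layout is not specified here.

```python
from pathlib import Path


def post_finalize_report(root: Path, topic_id: str, run_id: str) -> dict:
    """Collect the facts needed for the final report after finalize has run."""
    accepted = root / "runs" / run_id / "accepted.jsonl"
    captures = root / "topics" / topic_id / "captures"
    links = captures / "links.md"
    with accepted.open(encoding="utf-8") as handle:
        count = sum(1 for line in handle if line.strip())
    return {
        "accepted": count,
        "links_md_exists": links.is_file(),
        # Should be False: accepted evidence is run-scoped.
        "stale_topic_accepted": (captures / "accepted.jsonl").exists(),
        "links_path": links.relative_to(root).as_posix(),
    }
```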

Summarize the research content only when the user asks. The output is the link set, not a literature review.

Phase 9 - Optional Dashboard

Only start the dashboard when the user asks to view or edit the intake in a browser:

$RI_CMD serve --root "$RI_ROOT" --host 127.0.0.1 --port 8412 --no-open

If the port is occupied, choose another port. The dashboard reads links.md for the main table and sources.jsonl for source names/URLs.

Source URL Rules

Use these examples when deciding whether a URL belongs in sources.jsonl or expanded_queries.jsonl.

Good source roots:

https://<trusted-domain>/
https://<trusted-domain>/<collection-root>/
https://<trusted-domain>/<docs-root>/
https://<trusted-domain>/<dataset-or-benchmark-index>/

Good result URLs:

https://<trusted-domain>/<specific-paper-or-record>
https://<trusted-domain>/<specific-repository-or-artifact>
https://<trusted-domain>/<specific-doc-page>

Put result URLs in expanded_queries.jsonl so the fetcher can capture metadata; reserve sources.jsonl for source roots.

File Writing Rules

When writing JSONL:

One JSON object per line.
No trailing commas.
Preserve existing unrelated rows.
Keep source files compact.
Prefer ASCII unless the source title already contains non-ASCII text.

When writing YAML:

Preserve topic_id.
Keep include and exclude as lists.
Keep budget.max_queries_per_source and budget.max_results_per_query as positive integers.

When editing generated run files:

Preserve run directories.
Keep raw_results.jsonl as the fetch-produced evidence.
It is acceptable to rewrite candidates.jsonl, rejected.jsonl, categories.json, and keep.json as part of review.

Failure Recovery

Command not found:

Use `uv run research-intake` when inside the checkout, or ask the user to install the CLI.

Topic not found:

Ask whether to create it. Continue with the requested topic unless the user chooses another topic.

Malformed config:

Show the parse error, propose a minimal YAML fix, ask before writing.

Duplicate sources:

Compact to one row per source_id, keeping the newest approved URL/status.

Specific page added as source:

Move the page URL to expanded_queries.jsonl and replace the source URL with the collection root.

Too many candidates:

Stop at the hard cap gate. Narrow before review.

No useful candidates:

Inspect source roots and query URLs. Use web search to find concrete URLs, then rerun.

Final Response Format

When the run completes, keep the final answer short:

Done. Final link page: topics/<topic-id>/captures/links.md
Accepted: <N> links
Evidence: runs/<run-id>/accepted.jsonl

Mention any unresolved issue only if it affects the link set.

Guardrails

Ask for a topic ID when it is missing.
Use only verified titles, URLs, resource IDs, summaries, and source endpoints.
Keep source roots and result pages separate.
Leave reading-log.jsonl out of this workflow.
Keep accepted evidence run-scoped; finalization writes runs/<run-id>/accepted.jsonl, not topic-level captures/accepted.jsonl.
Keep resource records to the documented schema fields.
Edit config.yaml only after approval.
Add or update sources only after approval.
After source discovery, continue to query planning unless blocked.
Narrow candidate sets above 256 before review.
Write reports or syntheses only as explicitly requested downstream artifacts.
Preserve run evidence and existing run directories.
Make links.md the primary artifact.