一键在 Manus 中运行任何 Skill

storage-classify

Decide where a produced artifact must be persisted before writing it, then route it the prescribed way — semantic_models / metrics / reference_sql via the matching task() subagent, knowledge via extract-knowledge (lite), memory via add_memory, skills via create-skill, and AGENTS.md edited directly. Load before persisting any business fact, validated SQL, metric/model definition, session preference, project convention, or reusable workflow.

在 Manus 中运行

星标1,285

分支204

更新时间2026年6月16日 05:42

来源

Datus-ai

Datus-ai/Datus-agent

打开 GitHub 仓库查看创作者相关仓库

安装命令

下载

在 Manus 中运行

SKILL.md

readonly

同仓库更多 Skills

同仓库

build-kb

Datus-ai/Datus-agent

Build the project's vector-indexed knowledge base from files plus database metadata — optionally scoped to specific files / tables / datasources / domains. Scan the in-scope material, classify it into business domains, explore each domain in parallel with explore subagents, then (after the user confirms a generation manifest) route every artifact to its store via storage-classify, generating semantic_models / metrics / reference_sql (and mining any extra knowledge), and refresh AGENTS.md's KB index. The lightweight /init handles the AGENTS.md inventory plus file-based knowledge/memory; this skill owns the heavy vector-store generation.

2026-06-161.3k

init

Datus-ai/Datus-agent

Lightweight project initialization — infer the project goal and in-scope datasources, scan the file tree and database metadata (db/table/desc/sample), classify into business domains, then write an AGENTS.md inventory skeleton plus the cheap file-based stores (atomic facts to ./knowledge/*.md via lite extract-knowledge, durable preferences to memory). Stops short of the expensive vector-indexed stores (semantic_models / metrics / reference_sql). Single confirmation-free pass, low token cost.

2026-06-161.3k

extract-knowledge

Datus-ai/Datus-agent

Mine shortest atomic facts from (question + gold_sql) pairs into ./knowledge/*.md; either by simulating SQL drafting (lite) or by driving the gen_sql subagent in blind iteration (deep)

2026-06-111.3k

memory-organization

Datus-ai/Datus-agent

Audit and reorganize every persistent store — semantic_models, metrics, reference_sql, knowledge, memory, AGENTS.md, skills — verifying each item sits in the correct store per storage-classify, and surfacing duplicates, misclassifications, conflicts, and stale/erroneous entries. Produce a Remediation Plan, STOP for confirmation, then execute. Use ask_user only for genuine decisions during analysis. If nothing needs fixing, report it and stop.

2026-06-111.3k

session-summarize

Datus-ai/Datus-agent

Review the current chat session and persist its valuable takeaways — business facts/rules, validated SQL, metric/model definitions, durable preferences, project conventions, reusable workflows — by classifying each via storage-classify and routing it to the right store. Present a Summary Manifest and STOP for confirmation before any heavy generation. Use at the end of a working session or when the user asks to capture what was learned.

2026-06-111.3k

gen-metrics

Datus-ai/Datus-agent

Generate MetricFlow metrics from natural language business descriptions

2026-06-081.3k

name	storage-classify
description	Decide where a produced artifact must be persisted before writing it, then route it the prescribed way — semantic_models / metrics / reference_sql via the matching task() subagent, knowledge via extract-knowledge (lite), memory via add_memory, skills via create-skill, and AGENTS.md edited directly. Load before persisting any business fact, validated SQL, metric/model definition, session preference, project convention, or reusable workflow.
tags	["storage","classification","routing","persistence"]
version	1.3.0
user_invocable	false
disable_model_invocation	false

Classify and Route Persistent Storage

You receive one or more pieces of content that might be worth persisting (a business fact, a validated SQL, a metric or model definition, a session preference, a project convention, a reusable workflow). For each piece you must:

Classify it into exactly one of the seven persistent stores — or decide it should NOT be persisted.
Route it the prescribed way for that store (delegate to a task subagent / another skill / a memory tool, or edit AGENTS.md directly).

You are a router and curator, not a generator. The goal is that every artifact lands in the one store that fits its shape, lifetime, and reuse pattern — and is written with the mechanism that owns that store.

Role & Boundary

semantic_models / metrics / reference_sql MUST go through a task subagent. Never hand-write their YAML files or vector-store rows yourself — the subagents own schema, validation, and indexing.
knowledge goes through extract-knowledge in lite mode — summarize the atomic fact directly against the source; do NOT trigger the deep blind-SQL-iteration flow.
memory goes through add_memory / edit_memory — the only writers; they enforce the 2000-byte cap.
skills goes through create-skill.
AGENTS.md is the only file you edit yourself (write_file / edit_file).
When content does not clearly belong to any store, or routing would overwrite something destructively, call ask_user. If ask_user is unavailable, report the gap and do not write by default.
This skill only routes — it does not own persistence mechanics. Whether a store is a flat file, a vector index, or both, and the durability of that write, is the responsibility of the target subagent/skill (e.g. gen_sql_summary must persist and index its reference_sql). Do not add file-vs-index verification logic here.

Context Handoff to Subagents

Every task() subagent and extract-knowledge run in an isolated context. They do NOT inherit the current session/conversation, the project goal, the active datasource, the inventory you built, or the facts you just gathered. A bare table name or a SQL string with no surrounding context yields a generic, poorly-indexed artifact (wrong dimensions, empty search_text, a summary that ignores a mandatory filter). So whoever dispatches the task MUST inline everything the generator needs. At minimum pass:

Datasource name (and dialect) the artifact targets — so paths, identifiers, and YAML data_source resolve to the right place.
Business intent / the question the artifact answers — what it is for, in one or two lines.
Rules and encodings you already discovered that the generator must honor (a mandatory constant filter like rtype='D', a column encoding, a join-key trap), so it does not rediscover, ignore, or contradict them.
For reference_sql / metrics: the knowledge atoms mined from the same source, so the summary / search_text / measure expression stay consistent with the rule that explains the SQL.

The dispatching skill owns this handoff — the prompt seeds in the decision tree below (<table names + intent>, <complete SQL + business context>, …) are minimums, not the whole prompt.

Decision Tree

Walk top to bottom; the first match wins. Each item routes to exactly one store.

Reusable atomic business fact or rule — field encoding / enum / status code, mandatory constant filter, join trap (must go through a mapping table), boundary trap (strict vs non-strict inequality), business-term-to-field mapping, same-name-field divergence. → knowledge — delegate to extract-knowledge (lite mode).
Structured semantic definition of a table — its identifiers, measures, dimensions (the structure, not a single metric). → semantic_models — task(type="gen_semantic_model", prompt=<table name(s) + intent>).
A reusable metric built on a measure — a named business metric, its aggregation expr, base measures, available dimensions. → metrics — task(type="gen_metrics", prompt=<metric description or SQL>).
A complete, validated SQL worth indexing for semantic search / future reuse (plus a human summary). → reference_sql — task(type="gen_sql_summary", prompt=<the complete SQL + business context>). One SQL = one call = one entry. gen_sql_summary produces exactly one reference_sql row per invocation, so dispatch a separate task(gen_sql_summary) call for every distinct query. NEVER bundle multiple SQLs into one prompt — that collapses them into a single useless entry whose sql/search_text mixes unrelated queries and breaks few-shot retrieval. When the query came with an original natural-language question, pass it along and instruct the generator to use it verbatim as search_text — a future user question matches a stored question far better than it matches SQL keywords.
A high-level project overview — architecture, directory map, services, data assets, or the knowledge index. → AGENTS.md — edit the file directly (see AGENTS.md Section Ownership).
A lightweight, cross-session preference or context bound to one agent (≤ 2000 bytes) — a user/team habit, a default the agent should remember next session. → memory — add_memory / edit_memory.
A reusable, multi-step workflow or operating procedure that will be invoked repeatedly. → skills — delegate to create-skill.
None of the above — one-shot, inferable from INFORMATION_SCHEMA / table comments / column names, or not reusable. → Do NOT persist. Say so and stop.

Per-Store Reference

Store	Path	Format	Route (mechanism)
semantic_models	`./subject/semantic_models/{datasource}/{name}.yml` (anchored to project root)	YAML — `data_source` with `identifiers` / `measures` / `dimensions`; also mirrored into LanceDB `semantic_model`	`task(type="gen_semantic_model", …)` — prompt MUST name the table(s)
metrics	LanceDB `metrics` table (`~/.datus/data/{project}/datus_db/`)	Vector rows — `measure_expr`, `metric_type`, `base_measures`, dimensions	`task(type="gen_metrics", …)` — built on a semantic model's measures
reference_sql	`./subject/sql_summaries/{id}.yaml` + LanceDB `reference_sql`	YAML — `id` / `name` / `sql` / `summary` / `search_text` / `tags`	`task(type="gen_sql_summary", …)` — prompt MUST carry one complete SQL; one call per query (never batch multiple SQLs into a single entry)
knowledge	`./knowledge/<domain-slug>.md`, indexed under `AGENTS.md ## Knowledge`	Markdown atomic facts (no longer a vector store)	`extract-knowledge` (lite mode)
memory	`{workspace_root}/.datus/memory/{node}/MEMORY.md` (only `chat` and custom subagents)	Markdown — hard 2000-byte cap	`add_memory` / `edit_memory` (the only writers)
AGENTS.md	`./AGENTS.md` (project root; first ~200 lines injected into `<project_context>`)	Markdown — Architecture / Directory / Services / Knowledge …	edit directly (`write_file` / `edit_file`)
skills	`./.datus/skills/` (project) > `~/.datus/skills/` (user) > `datus/resources/skills/` (builtin); first-wins	`SKILL.md` + YAML frontmatter	`create-skill`

When NOT to pick a store:

Not semantic_models — a single metric (that's metrics); a one-table fact already in the column comment.
Not metrics — structural table/column/measure definitions (that's semantic_models).
Not reference_sql — a one-shot query never to be reused; or the reason a query is written a certain way (that atomic rule is knowledge).
Not knowledge — anything inferable from INFORMATION_SCHEMA / comments, generic SQL knowledge, or a fact that mechanically composes from existing facts.
Not memory — a team-level long-lived business fact (that's knowledge); anything that would exceed 2000 bytes.
Not AGENTS.md — fine-grained atomic facts (those are knowledge; AGENTS.md only holds their index).
Not skills — a declarative "what is true" fact (that's knowledge); skills capture executable "how to do" procedures.

AGENTS.md Section Ownership

Because you edit AGENTS.md yourself, keep its sections consistent and minimal. Sections and their intent:

Section	Purpose	Edit constraint
`# <project name>`	One-line project description	Keep one line
`## Architecture`	Data flow / stack / how services connect	Brief; ASCII diagram only if complex
`## Directory Map`	Directory / Purpose / Key Entry Point / Consumer	Table; main dirs only
`## Services`	Service / Type / Connection / Description	Configured DBs + user-mentioned services
`## Data Assets`	Per-database high-level summary	Never enumerate every table — categorize + count
`## Recommended Tools`	Runtime tools per service type	Only configured/mentioned services
`## SQL Conventions`	Project's induced SQL output discipline (answer-shape rules)	Induced from the project's validated-SQL corpus by `/init`; schema-free bullets; project-level presentation guidance, NOT knowledge
`## Semantic Models`	KB index: how many models, which tables	Count + `search_semantic_model`; do NOT inline definitions
`## Metrics`	KB index: how many metrics, key names	Count + `search_metrics`; do NOT inline definitions
`## Reference SQL`	KB index: how many validated queries	Count + `search_reference_sql`; retrieved for few-shot, not read as files
`## Knowledge`	Index of `./knowledge/*.md` (mapped to files)	Maintained by `extract-knowledge` — do not overwrite its entries

AGENTS.md is the KB entry point. It is injected into every node's <project_context>, so it is the one reliable place that tells a downstream agent what KB exists and how to reach it. Index retrieval-backed stores (semantic_models/metrics/reference_sql) by count + which search tool; index knowledge by file links. Never inline the stores' contents.

Write only sections that have real content. Omit any section you have nothing concrete to put in — never write empty "none yet" / "not built yet" placeholder lines. A section is added later (per the handling order below) when content for it actually exists.

Handling order:

AGENTS.md missing → write_file a minimal skeleton (# <project dir name> plus the relevant section); leave the inventory + knowledge/memory sections for /init to fill in and the vector-index sections (## Semantic Models / ## Metrics / ## Reference SQL) for /build-kb. Do not block on the user running /init first.
Exists but missing the target section → insert it in the canonical order above (Knowledge goes last).
Exists with the section → prefer a minimal edit_file. Never rewrite the whole file when a scoped edit suffices, and never touch ## Knowledge entries owned by extract-knowledge.

Disambiguation (overlapping boundaries)

knowledge vs memory — a team-level, long-lived business fact → knowledge; a session-level, agent-bound preference or working context → memory.
reference_sql vs knowledge — not exclusive, they feed each other. The complete reusable SQL → reference_sql (an example that teaches answer shape, retrieved later for few-shot); the atomic business rule mined from why that SQL is written that way → knowledge (teaches why). A single (question, gold_sql) pair routes to BOTH: store the example, and run extract-knowledge to mine the rule.
output / answer conventions are NOT knowledge. "Return only the ratio column", "match the exact column projection", "don't add helper columns", "count questions return a single scalar" — these are presentation conventions, not facts about the data. They live in two project-level places: as reference_sql examples (few-shot, the strongest teacher) and, when a recurring pattern can be induced from the corpus, as schema-free bullets in AGENTS.md ## SQL Conventions (rides in <project_context>). They are NOT knowledge atoms and NOT a global prompt rule — answer shape is project/context-specific (strict-eval wants minimal projection; interactive BI often wants extra context columns). Only durable data-model facts (field encodings, term→field mappings, mandatory filters) are knowledge. Never encode a benchmark/eval quirk as knowledge.
semantic_models vs metrics — the structure (table/column/measure/dimension definitions) → semantic_models; a concrete metric built on a measure → metrics.
AGENTS.md vs knowledge — a high-level overview or index → AGENTS.md; a fine-grained atomic fact → knowledge (AGENTS.md only indexes it).
skills vs knowledge — an executable, repeatable procedure ("how to do") → skills; a declarative statement ("what is true") → knowledge.

Output / Behavior

For each piece of input:

State the verdict: { target store, the decision-tree branch that matched and why }.
Execute the routing for that store — call the prescribed task subagent / skill / memory tool, or directly edit AGENTS.md.
If the content does not clearly fit any store, or routing would destructively overwrite existing content, call ask_user with numbered options before acting.

Process multiple pieces one at a time. When ask_user is unavailable and a verdict is ambiguous, report what is missing and route nothing by default.

End with a short human-readable summary: how many pieces were classified, where each was routed (or why it was not persisted), and any items still awaiting user confirmation. Do not return a JSON envelope.

Forbidden

Do not hand-write semantic_models / metrics / reference_sql YAML or vector-store rows — always delegate to the matching task subagent.
Do not bundle multiple SQLs into one gen_sql_summary call — one query per call, one entry per query.
Do not run extract-knowledge in deep mode — use lite.
Do not exceed the 2000-byte memory cap, and do not write memory with any tool other than add_memory / edit_memory.
Do not overwrite ## Knowledge entries in AGENTS.md — those are owned by extract-knowledge.
Do not persist one-shot content, generic SQL knowledge, or anything inferable from INFORMATION_SCHEMA / table comments / column names.
Do not generate the artifact's content yourself (e.g. write the SQL, invent the metric) — your job is classification and routing, not authoring.
Do not resolve a destructive overwrite silently — go through ask_user.