ds-analysis-campaign
// Use when a quest needs one or more follow-up runs such as ablations, robustness checks, error analysis, or failure analysis after a main experiment.
| name | ds-analysis-campaign |
| description | Use when a quest needs one or more follow-up runs such as ablations, robustness checks, error analysis, or failure analysis after a main experiment. |
| skill_role | stage |
| license | MIT |
| metadata | {"author":"ResearAI/DeepScientist","version":"1.0.0"} |
Use this skill when one or more follow-up runs are needed and the quest needs a coordinated evidence campaign.
This is the shared DeepScientist protocol for supplementary experiments after a durable result. Use the same route for:
For paper-facing work, treat “analysis campaign” broadly:
Do not assume a writing-facing campaign means “analysis only”.
Do not invent a separate experiment system for those cases.
- bash_exec; do not use any other terminal path for slice execution, smoke tests, Git, Python, package-manager, or file-inspection commands.
- bash_exec for campaign slice commands so each run has a durable session id, quest-local log folder, and later read/list/kill control.
- artifact.interact(kind='milestone', reply_mode='threaded', ...) report.
- sage-clay: #E7E1D6, #B7A99A, #7F8F84 for the default aggregated campaign summary
- mist-stone: #F3EEE8, #D8D1C7, #8A9199 for conservative or uncertainty-heavy summaries
- dust-rose: #F2E9E6, #D8C3BC, #B88C8C only as a secondary accent when an extra comparison is necessary
- figure-polish/SKILL.md and complete its render-inspect-revise pass before treating the figure as final.

The analysis-campaign stage exists to test the strength, boundaries, and failure modes of a result. It preserves the core old DeepScientist analysis-experimenter discipline:
The campaign should behave like a disciplined evidence program, not an unstructured pile of extra runs.
For campaign prioritization and writing-facing slice design, read references/campaign-design.md.
When the campaign is paper-facing and the mapping fields are not obvious, also read references/writing-facing-slice-examples.md.
Treat this as the compressed campaign map. The authoritative slice protocol and aggregation rules remain in Workflow.
- paper/paper_experiment_matrix.* before freezing the slice frontier.
- PLAN.md and CHECKLIST.md.
- PLAN.md as the durable charter and CHECKLIST.md as the living execution surface while launching, monitoring, recording, and aggregating slices.
- 1-2 sentence summary that says whether the claim gained stable support, partial support, contradiction, or unresolved ambiguity, what the matrix frontier now looks like, and what happens next.
- research_question and experimental_design from that outline.
- paper/paper_experiment_matrix.md when it exists.
- exp_id, not just a free-form note.
- section_id, item_id, claim_links, and paper_role; otherwise the slice is not paper-ready.

Before launching a campaign, confirm:
- active_baseline_metric_contract_json, read that JSON file before defining slice success criteria or comparison tables
- active_baseline_metric_contract_json as the default baseline comparison contract unless a slice is explicitly testing a different evaluation contract

If the question list is fuzzy, sharpen it before running anything.
Treat quest files, attached user assets, checkpoints, configs, extracted texts, baselines, and existing code paths as the first-choice asset pool.
Do not design slices around hypothetical resources that the current system cannot actually access or run.
If a slice cannot be executed with the current system, redesign it around available assets or explicitly report that the task cannot currently be completed.
If infeasibility appears mid-run, attempt bounded recovery first; if still blocked, record the slice with a non-success status and explain why.
If ids, active refs, or current quest state are unclear after restart, call artifact.get_quest_state(detail='summary') and artifact.resolve_runtime_refs(...) before launching or recording slices.
If the exact quest brief / plan / status wording matters for campaign scope, call artifact.read_quest_documents(...).
If earlier user instructions materially affect campaign scope or ordering, call artifact.get_conversation_context(...) before changing the slice set.
For concrete paper-facing cases:
- main_required / main_text
- appendix
- reference_only with a reason
- result_table row is updated
- paper/evidence_ledger.json reflects the new mapping

Do not leave a slice "completed" while the paper contract still looks stale.
Before launching any real campaign slice, create a quest-visible PLAN.md and CHECKLIST.md.
- references/campaign-plan-template.md as the canonical structure for PLAN.md.
- references/campaign-checklist-template.md as the canonical structure for CHECKLIST.md.
- PLAN.md is the durable campaign charter and should cover the claim under test, slice table, comparability boundary, available assets, required comparators, smoke and main-run strategy, monitoring and sleep rules, reporting expectations, and a revision log.
- CHECKLIST.md is the living campaign execution list; update it during launch, asset preparation, slice execution, aggregation, and route changes.
- PLAN.md before continuing.
- PLAN.md and CHECKLIST.md should be the canonical campaign-control surface during execution.

Use:
- active_baseline_metric_contract_json when available
- bash_exec session ids and managed shell logs for campaign runs

Do not summarize a campaign from impressions alone.
A campaign should usually leave behind:
- paper/paper_experiment_matrix.md
- paper/paper_experiment_matrix.json
- baselines/local/<baseline_id>/ or attached under baselines/imported/<baseline_id>/
- artifacts/baselines/analysis_inventory.json

In the current runtime, represent that with existing artifact actions only:
- decision artifact with action='launch_analysis_campaign'
- report
- run artifact per slice
- progress artifacts during execution
- report
- decision

Before launching any slice, record the campaign start through artifacts:
- decision artifact with:
  - action='launch_analysis_campaign'
  - campaign_id
  - parent_run_id or parent_idea_id
- report with the planned slice list
- plan.md if the campaign materially changes the quest path

Do not start a multi-slice campaign from chat-only intent.
Do not start it from chat-only intent plus vague notes either: write PLAN.md and CHECKLIST.md first, using references/campaign-plan-template.md and references/campaign-checklist-template.md as the default structures.
After the charter and launch decision are durably recorded, send one threaded artifact.interact(kind='milestone', ...) update naming:
If the campaign exists to support a paper or paper-like report:
- write or decision first so the outline can be created and selected durably
- paper/paper_experiment_matrix.md when it is missing or stale
- todo_items alone
- main_required
- main_optional
- artifact.create_analysis_campaign(...) with:
  - selected_outline_ref
  - research_questions
  - experimental_designs
  - todo_items
- exp_id
- todo_id
- slice_id
- title
- research_question
- experimental_design
- tier
- paper_placement
- completion_condition

For writing-facing campaigns, every slice should also carry paper-contract identity, not just free-form text:
- section_id
- item_id
- claim_links
- paper_role

Do not treat a completed analysis slice as paper-ready until those fields exist and the slice is mappable back into the selected outline or paper experiment matrix.
Use references/writing-facing-slice-examples.md when the correct field values are not obvious.
This keeps the analysis campaign aligned with the paper plan instead of becoming a free-floating batch of slices.
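The paper-contract rule above can be illustrated with a small local check. This is a minimal sketch, not part of the DeepScientist runtime: the four field names come from the text, while `missing_paper_fields`, `is_paper_ready`, and the example slice values are hypothetical.

```python
from typing import Any, List, Mapping

# Field names are taken from the text; everything else here is illustrative.
PAPER_CONTRACT_FIELDS = ("section_id", "item_id", "claim_links", "paper_role")


def missing_paper_fields(slice_record: Mapping[str, Any]) -> List[str]:
    """Return the paper-contract fields that are absent or empty."""
    return [name for name in PAPER_CONTRACT_FIELDS if not slice_record.get(name)]


def is_paper_ready(slice_record: Mapping[str, Any]) -> bool:
    """A slice counts as paper-ready only when all four fields are filled."""
    return not missing_paper_fields(slice_record)


# Hypothetical slice record; the ids and values are placeholders.
example_slice = {
    "slice_id": "ablation-no-augmentation",
    "section_id": "results",
    "item_id": "table-2",
    "claim_links": ["C1"],
    "paper_role": "main_text",
}

print(is_paper_ready(example_slice))
print(missing_paper_fields({"slice_id": "robustness-noise"}))
```

Treating an empty `claim_links` list as missing is a design choice of this sketch; the source only says the fields must exist.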
State:
The charter should also include:
Prefer to keep this charter in PLAN.md first and mirror the execution frontier in CHECKLIST.md.
For each analysis question, also state:
If there are many possible slices, order them by decision value:
Do not spend half the campaign budget on secondary slices before the claim-critical ones run.
When the parent line is still below solid evidence quality, use the campaign first to move it from minimum to solid before chasing broader polish.
Each analysis run should correspond to one need, such as:
Avoid changing many factors at once unless the campaign is explicitly exploratory.
For each slice, define at minimum:
- required_baselines when the slice depends on an extra comparator that is not yet available in the quest

Recommended extra per-slice fields:
- exp_id
- slice_id
- run_kind
- slice_class, such as auxiliary, claim-carrying, or supporting
- tier, such as main_required, main_optional, appendix, or optional
- paper_placement
- highlight_ids
- required_baselines, where each item records at least baseline_id plus the reason, benchmark, and split when known

If a slice needs an extra comparator baseline:
- baselines/local/<baseline_id>/ unless it is attached under baselines/imported/<baseline_id>/
- analysis_plan.md, setup.md, execution.md, and verification.md
- record_analysis_slice(..., comparison_baselines=[...]) with its baseline_id, path, benchmark/split, and metrics summary
- parent_run_id

Recommended run_kind naming in the current runtime:
- analysis.ablation
- analysis.robustness
- analysis.sensitivity
- analysis.error
- analysis.efficiency
- analysis.environment

Create the campaign with artifact.create_analysis_campaign(...) before starting any slice.
Even one extra experiment should still be represented as a one-slice campaign so Git and Canvas show a real child node.
Branch that campaign from the current workspace/result node rather than mutating the completed parent node in place.
That tool should receive the full slice list, and each returned slice worktree becomes the required execution location for that slice.
Only create the campaign after you have verified that the listed slices are actually executable with the current quest assets and runtime.
When the campaign is writing-facing, the same call should also carry selected_outline_ref, research_questions, experimental_designs, and todo_items.
If ids or refs are unclear, recover them first with artifact.resolve_runtime_refs(...), artifact.get_analysis_campaign(...), or artifact.list_paper_outlines(...) instead of guessing.
Treat campaign_id as system-owned, and treat slice_id / todo_id as agent-authored semantic ids.
Do not replace the normal campaign flow with repeated manual artifact.prepare_branch(...) calls.
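Before passing the full slice list to artifact.create_analysis_campaign(...), it can help to sanity-check the list locally. The sketch below is an assumption-laden illustration: the run_kind and tier values are the ones recommended in this document, but `slice_list_problems` and the example slices are hypothetical, and the runtime may accept values this check flags.

```python
from typing import Any, Dict, List, Sequence

# Recommended run_kind names and example tier values from this document.
RUN_KINDS = {
    "analysis.ablation", "analysis.robustness", "analysis.sensitivity",
    "analysis.error", "analysis.efficiency", "analysis.environment",
}
TIERS = {"main_required", "main_optional", "appendix", "optional"}


def slice_list_problems(slices: Sequence[Dict[str, Any]]) -> List[str]:
    """Collect human-readable problems instead of raising on the first one."""
    problems: List[str] = []
    seen_ids = set()
    for index, item in enumerate(slices):
        slice_id = item.get("slice_id")
        label = slice_id or f"slice #{index}"
        if not slice_id:
            problems.append(f"{label}: missing slice_id")
        elif slice_id in seen_ids:
            problems.append(f"{label}: duplicate slice_id")
        seen_ids.add(slice_id)
        if item.get("run_kind") not in RUN_KINDS:
            problems.append(f"{label}: run_kind {item.get('run_kind')!r} is not a recommended name")
        tier = item.get("tier")
        if tier is not None and tier not in TIERS:
            problems.append(f"{label}: unrecognized tier {tier!r}")
    return problems


# Hypothetical slice lists for illustration only.
good_slices = [{"slice_id": "abl-1", "run_kind": "analysis.ablation", "tier": "main_required"}]
bad_slices = [
    {"slice_id": "abl-1", "run_kind": "ablation"},
    {"slice_id": "abl-1", "run_kind": "analysis.error", "tier": "bonus"},
]

print(slice_list_problems(good_slices))
for problem in slice_list_problems(bad_slices):
    print(problem)
```

Only slice_id is checked for uniqueness here because the text treats it as an agent-authored semantic id; campaign_id is left to the system.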
After each slice finishes, call artifact.record_analysis_slice(...) immediately so the result is mirrored back to the parent branch and the next slice can be activated.
If a slice fails or becomes infeasible, still call artifact.record_analysis_slice(...) with an honest non-success status plus the real blocker and next recommendation; do not leave the campaign state ambiguous.
After every completed, excluded, or blocked writing-facing slice:
- paper/paper_experiment_matrix.md

Do not keep launching writing-facing slices from stale memory when the matrix has changed.
For slice recording, deviations and evidence_paths are optional context fields, not mandatory ceremony; include them only when they materially help explanation or auditability.
Each artifact.record_analysis_slice(...) call should also include an evaluation_summary with exactly these six fields:
- takeaway
- claim_update
- baseline_relation
- comparability
- failure_mode
- next_action

Use those six fields to keep each slice readable at a glance from Canvas, stage tabs, review, and rebuttal. The longer prose still matters, but the six-field summary is the stable routing summary.
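The "exactly these six fields" requirement can be checked before calling artifact.record_analysis_slice(...). The following is a minimal sketch under stated assumptions: the field names are from the text, while `check_evaluation_summary` and the example summary content are hypothetical.

```python
from typing import Any, List, Mapping, Tuple

# The six field names required by the text.
EVALUATION_SUMMARY_FIELDS = frozenset({
    "takeaway", "claim_update", "baseline_relation",
    "comparability", "failure_mode", "next_action",
})


def check_evaluation_summary(summary: Mapping[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) field names, each sorted alphabetically."""
    keys = set(summary)
    missing = sorted(EVALUATION_SUMMARY_FIELDS - keys)
    unexpected = sorted(keys - EVALUATION_SUMMARY_FIELDS)
    return missing, unexpected


# Hypothetical summary; the wording of each value is a placeholder.
example_summary = {
    "takeaway": "Removing the auxiliary loss lowers accuracy.",
    "claim_update": "Main claim keeps support.",
    "baseline_relation": "Below the active baseline.",
    "comparability": "Same split and metric contract.",
    "failure_mode": "None observed.",
    "next_action": "Run the robustness slice.",
}

print(check_evaluation_summary(example_summary))
```

Returning both lists, instead of a single boolean, makes it easy to report exactly which field is absent or extra.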
For writing-facing campaigns, prefer running claim-carrying slices before supporting slices unless an auxiliary check is required to make the main slice interpretable.
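One way to express that ordering preference is a stable sort over slice_class. This is only a sketch: the class names come from this document, but the `required_for_main` flag, the rank values, and the placement of other auxiliary slices last are assumptions made for illustration.

```python
from typing import Any, Dict, List


def ordering_rank(slice_record: Dict[str, Any]) -> int:
    """Lower rank runs earlier. The rank values are an illustrative assumption."""
    slice_class = slice_record.get("slice_class")
    # Hypothetical flag: an auxiliary check the main slice needs to be interpretable.
    if slice_class == "auxiliary" and slice_record.get("required_for_main"):
        return 0
    return {"claim-carrying": 1, "supporting": 2}.get(slice_class, 3)


def order_slices(slices: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """sorted() is stable, so slices with equal rank keep their planned order."""
    return sorted(slices, key=ordering_rank)


# Hypothetical slices for illustration only.
planned = [
    {"slice_id": "s-support", "slice_class": "supporting"},
    {"slice_id": "s-claim", "slice_class": "claim-carrying"},
    {"slice_id": "s-aux", "slice_class": "auxiliary", "required_for_main": True},
    {"slice_id": "s-aux-extra", "slice_class": "auxiliary"},
]

print([item["slice_id"] for item in order_slices(planned)])
```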
For slices that run longer than a quick smoke check:
- bash_exec(mode='detach', ...) and normally leave timeout_seconds unset for that long run
- bash_exec(mode='read', id=...) returns the full rendered log when it is 2000 lines or fewer; for longer logs it returns the first 500 lines plus the last 1500 lines and a hint to inspect omitted sections with start and tail
- bash_exec(mode='read', id=..., start=..., tail=...)
- bash_exec(mode='list') and bash_exec(mode='read', id=..., tail_limit=..., order='desc')
- bash_exec(mode='read', id=..., after_seq=last_seen_seq, tail_limit=..., order='asc') for incremental monitoring
- bash_exec(mode='history')
- comment such as {stage, goal, action, expected_signal, next_check}
- silent_seconds, progress_age_seconds, signal_age_seconds, and watchdog_overdue from bash_exec(mode='list'|'read', ...) as the default stall checks
- 60s, 120s, 300s, 600s, 1800s, then every 1800s while still running
- bash_exec(command='sleep 60', mode='await', timeout_seconds=70) or bash_exec(mode='await', id=..., timeout_seconds=...) between checks
- bash_exec(command='sleep N', mode='await', timeout_seconds=N+buffer, ...)
- Do not set timeout_seconds exactly equal to N
- bash_exec(mode='await', id=..., timeout_seconds=...) instead of starting a new sleep command
- artifact.interact(kind='progress', ...) so the user sees the newest real state
- artifact.interact(kind='progress', ...) update if the user-visible state materially changed
- bash_exec(mode='kill', id=..., wait=true, timeout_seconds=...) if the slice is invalid, wedged, or superseded; add force=true when immediate termination is required
- tqdm progress reporter and, when feasible, pair it with concise __DS_PROGRESS__ lines carrying phase and ETA

Comparability rules:
- If active_baseline_metric_contract_json exists, keep slice comparisons aligned with it unless the slice explicitly records why it differs

For code-modifying slices, the default durable layout should stay interpretable:
- .ds/worktrees/<slice_id>/ when isolated worktrees are used
- experiments/analysis/<campaign_id>/<slice_id>/
- artifacts/runs/<artifact_id>.json
- artifacts/reports/<artifact_id>.json

If the variation itself changes the evaluation setup, record that explicitly and do not present the run as a direct apples-to-apples comparison.
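The durable layout for code-modifying slices can be written down as a small path builder. This is a sketch only: the path patterns are copied from the layout described here, while `slice_layout`, its parameter names, and the example ids are hypothetical, and the quest root is assumed to be the current directory.

```python
from pathlib import PurePosixPath
from typing import Dict


def slice_layout(
    campaign_id: str,
    slice_id: str,
    run_artifact_id: str,
    report_artifact_id: str,
) -> Dict[str, PurePosixPath]:
    """Build quest-relative paths that follow the documented layout."""
    return {
        # Only relevant when isolated worktrees are used.
        "worktree": PurePosixPath(".ds/worktrees") / slice_id,
        "experiment_dir": PurePosixPath("experiments/analysis") / campaign_id / slice_id,
        "run_artifact": PurePosixPath("artifacts/runs") / f"{run_artifact_id}.json",
        "report_artifact": PurePosixPath("artifacts/reports") / f"{report_artifact_id}.json",
    }


# Placeholder ids for illustration.
layout = slice_layout("campaign-001", "abl-1", "run-0007", "report-0003")
for name, path in layout.items():
    print(f"{name}: {path}")
```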
Before a long slice starts, emit a progress artifact or artifact.interact(kind='progress', ...) update so the quest shows that the slice is active.
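The check cadence, the sleep-timeout rule, and the log-read windowing described for long-running slices can be sketched in plain Python. Nothing below calls bash_exec: `check_intervals`, `sleep_await_args`, and `rendered_log_view` are hypothetical helpers that only model the documented numbers, and the 10-second default buffer is inferred from the sleep 60 / timeout_seconds=70 example.

```python
from itertools import chain, islice, repeat
from typing import Dict, Iterator, List, Union

# Documented ramp: 60s, 120s, 300s, 600s, 1800s, then every 1800s.
CHECK_RAMP_SECONDS = (60, 120, 300, 600, 1800)


def check_intervals() -> Iterator[int]:
    """Yield the wait before each check: the ramp, then 1800s indefinitely."""
    return chain(CHECK_RAMP_SECONDS, repeat(CHECK_RAMP_SECONDS[-1]))


def sleep_await_args(seconds: int, buffer_seconds: int = 10) -> Dict[str, Union[str, int]]:
    """Arguments for a sleep-style await whose timeout exceeds the sleep itself."""
    if buffer_seconds <= 0:
        raise ValueError("timeout_seconds must be strictly longer than the sleep")
    return {
        "command": f"sleep {seconds}",
        "mode": "await",
        "timeout_seconds": seconds + buffer_seconds,
    }


def rendered_log_view(lines: List[str]) -> List[str]:
    """Model the documented read window: full log up to 2000 lines,
    otherwise the first 500 plus the last 1500 lines (the hint is not modeled)."""
    if len(lines) <= 2000:
        return list(lines)
    return lines[:500] + lines[-1500:]


print(list(islice(check_intervals(), 7)))
print(sleep_await_args(60))
```

Keeping the cadence as an infinite iterator lets a monitoring loop call `next(...)` without tracking which step of the ramp it is on.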
For each run, record:
Preferred per-slice summary shape:
Each completed slice should also leave a run artifact containing at least:
- campaign_id
- slice_id
- run_kind
- parent_run_id
- analysis_question
- fixed_conditions
- changed_factors
- metrics_summary
- metric_deltas
- success_criteria
- abandonment_criteria
- verdict
- reason
- paths

If a slice fails before producing evidence, still record it as a failed or partial run artifact rather than silently skipping it.
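A failed slice can still carry every required run-artifact field. The sketch below assumes a plain dictionary representation: the field names are the ones listed here, but `missing_run_fields`, the value types, and all example values are hypothetical placeholders, not real results or the runtime's schema.

```python
from typing import Any, List, Mapping

# Minimum run-artifact fields named in this document.
RUN_ARTIFACT_FIELDS = (
    "campaign_id", "slice_id", "run_kind", "parent_run_id", "analysis_question",
    "fixed_conditions", "changed_factors", "metrics_summary", "metric_deltas",
    "success_criteria", "abandonment_criteria", "verdict", "reason", "paths",
)


def missing_run_fields(artifact: Mapping[str, Any]) -> List[str]:
    """Return required fields that are absent; empty values are allowed,
    because a failed run may legitimately have no metrics."""
    return [name for name in RUN_ARTIFACT_FIELDS if name not in artifact]


# Placeholder record for a slice that failed before producing evidence.
failed_slice = {
    "campaign_id": "campaign-001",
    "slice_id": "robustness-label-noise",
    "run_kind": "analysis.robustness",
    "parent_run_id": "run-main-001",
    "analysis_question": "Does the result hold under label noise?",
    "fixed_conditions": {"seed": 0},
    "changed_factors": {"label_noise": "enabled"},
    "metrics_summary": {},
    "metric_deltas": {},
    "success_criteria": "placeholder",
    "abandonment_criteria": "placeholder",
    "verdict": "failed",
    "reason": "Run stopped before evaluation; no metrics were produced.",
    "paths": [],
}

print(missing_run_fields(failed_slice))
print(missing_run_fields({"campaign_id": "campaign-001", "verdict": "failed"}))
```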
When a slice materially changes the recommended route or weakens the main claim, do not wait until the final synthesis to mention it.
Send a threaded artifact.interact(kind='milestone', ...) update at that point with the new boundary or risk.
The campaign report should explain:
Campaign reporting rules:
When there are many slices, summarize the top 3-5 most important ones first, then point to the full evidence paths.
The aggregated report should also answer:
When the aggregated campaign report is complete, send a richer threaded artifact.interact(kind='milestone', ...) update.
Lead that milestone with a concise 1-2 sentence campaign outcome summary before expanding into slice-level detail.
If QQ milestone media is enabled and the aggregated report materially changes the claim boundary, you may attach one campaign summary PNG to that closing milestone update. That update should explicitly classify the campaign outcome in the same language as the report:
A campaign should end with an explicit next move:
- experiment
- write

Record the post-campaign route as a decision artifact.
When helpful, include a reflection block with:
- what_worked
- what_failed
- learned_constraints

and a next_direction block that states:
This makes the next stage executable without guesswork.
Good campaign behavior:
Strong campaign ordering usually looks like:
The exact order can vary, but the most claim-relevant evidence should appear first.
Weak campaign behavior:
Stage-start requirement:
- memory.list_recent(scope='quest', limit=5)
- memory.search(...) before launching or resuming slices
- campaign_id, parent_run_id, idea_id, or branch instead of mixing unrelated slice memory

Write to memory only when the campaign yields reusable lessons, such as:
Stage-end requirement:
- memory.write(...) before leaving the stage

The campaign’s main record belongs in run artifacts and the aggregated report.
When synthesizing the campaign, read the per-slice evaluation_summary fields first, then expand into longer evidence only where the short summaries are still ambiguous.
Typical artifact sequence:
Record blocked or failed campaign states explicitly, such as:
A blocked campaign should still name the next best action.
Exit the analysis-campaign stage once one of the following is durably true:
experiment or idea