multi-project-batch-isolation
Multi-project signal isolation with cascading recipe resolution
| Field | Value |
|---|---|
| name | multi-project-batch-isolation |
| description | Multi-project signal isolation with cascading recipe resolution |
| author | smith6jt |
| date | "2026-02-22T00:00:00.000Z" |

| Item | Details |
|---|---|
| Date | 2026-02-22 |
| Goal | Run AF subtraction across ~20 projects with intelligent per-marker parameter sourcing |
| Environment | HiPerGator, Python 3.11, KINTSUGI conda env |
| Status | Success — 17 projects discovered, dry-run validated |
Only 1 of ~20 registered projects (CX_19-001_SP_CC2-A28) had completed signal isolation. That project had 26 hand-tuned legacy recipes and its outcomes were recorded to the parameter learning DB. The remaining projects needed AF subtraction but had no project-specific recipes. A single-project kintsugi workflow isolate run command existed but couldn't orchestrate multiple projects or source parameters from across datasets.
Key challenge: not every marker has a recipe in every project, so the batch needs a cascading fallback that picks the best available parameter source for each marker.
# Preview all eligible projects and recipe sources
kintsugi workflow isolate batch /path/to/KINTSUGI_Projects --dry-run
# Process single project (test before full batch)
kintsugi workflow isolate batch . -d CX_19-003_spleen_CC1-A
# Full batch with learning enabled
kintsugi workflow isolate batch /blue/maigan/smith6jt/KINTSUGI_Projects --learn
# Force reprocess + custom template
kintsugi workflow isolate batch . -f --template-recipe-dir /path/to/recipes
| Tier | Source | Condition |
|---|---|---|
| 1 | Own recipes | {project}/data/processed/Processing_parameters/*_param.txt exists |
| 2 | Learned DB | ParameterLearningEngine.recommend_parameters() confidence >= 0.6 |
| 3 | Template | Default: CX_19-001's 26 recipes, match by marker name |
| 4 | Auto | select_method() picks global vs weighted per marker |
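The tier order in the table can be read as a first-match cascade. The following is a minimal sketch of that idea, not the project's actual code: `resolve_marker_source`, its arguments, and the dictionary shapes are hypothetical stand-ins. Only the tier order, the provenance labels, and the 0.6 threshold are taken from this page.

```python
from typing import Optional, Tuple

MIN_CONFIDENCE = 0.6  # learned-DB threshold from the tier table


def resolve_marker_source(
    marker: str,
    own_recipes: dict,
    learned: dict,
    template_recipes: dict,
) -> Tuple[str, Optional[dict]]:
    """Return (source_label, recipe_or_None) for one marker.

    Hypothetical helper: the dict arguments stand in for whatever the real
    loaders return. `learned` maps marker -> {"confidence": float, ...}.
    """
    # Tier 1: the project's own recipe wins outright.
    if marker in own_recipes:
        return "own_recipe", own_recipes[marker]
    # Tier 2: a learned recommendation, only if it is confident enough.
    rec = learned.get(marker)
    if rec is not None and rec.get("confidence", 0.0) >= MIN_CONFIDENCE:
        return "learned", rec
    # Tier 3: a template recipe matched by marker name.
    if marker in template_recipes:
        return "template", template_recipes[marker]
    # Tier 4: no recipe found; defer to automatic method selection.
    return "auto", None
```

Returning the label alongside the recipe keeps provenance attached to each marker, which is what a field such as `recipe_source` would need downstream.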
_TISSUE_PATTERNS = [
    (r"(?:_SP_|spleen|splenic)", "spleen"),
    (r"(?:_LN_|lymph.?node|lymph)", "lymph_node"),
    (r"(?:_TH_|thymus|thymic)", "thymus"),
    (r"(?:pancrea)", "pancreas"),
]
Fallback: experiment.json name field, then "unknown".
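One way the pattern table and its fallback could be applied is sketched below. `parse_tissue` is a hypothetical name, and two behaviours are assumptions rather than facts from this page: matching is case-insensitive, and the fallback runs the same patterns against the `experiment.json` name field.

```python
import re

_TISSUE_PATTERNS = [
    (r"(?:_SP_|spleen|splenic)", "spleen"),
    (r"(?:_LN_|lymph.?node|lymph)", "lymph_node"),
    (r"(?:_TH_|thymus|thymic)", "thymus"),
    (r"(?:pancrea)", "pancreas"),
]


def parse_tissue(project_name: str, experiment_name: str = "") -> str:
    """Hypothetical helper: the first matching pattern wins."""
    # Try the project directory name first, then the experiment.json name
    # (assumed fallback order), and give up with "unknown".
    for text in (project_name, experiment_name):
        for pattern, tissue in _TISSUE_PATTERNS:
            if re.search(pattern, text, flags=re.IGNORECASE):
                return tissue
    return "unknown"
```

For example, `parse_tissue("CX_19-001_SP_CC2-A28")` matches the `_SP_` alternative and yields `"spleen"`.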
Markers with recipes (tiers 1-3) use _write_recipes_as_param_files() to serialize resolved recipes into temp param files, then call process_batch(recipe_dir=tmpdir). Markers without recipes (tier 4) call process_batch() without recipe_dir for auto-analysis. Results are merged into a single signal_isolation_manifest.json.
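The two-call split described above can be sketched as follows. This is an illustration under stated assumptions: `run_split_batches` is a hypothetical name, and `process_batch` is passed in as a stand-in callable assumed to accept a `markers` argument and an optional `recipe_dir`, and to return a mapping of marker to result. The real `process_batch()` signature may differ.

```python
from typing import Callable, Optional


def run_split_batches(
    markers: list,
    resolved: dict,
    process_batch: Callable[..., dict],
    recipe_dir: Optional[str] = None,
) -> dict:
    """Hypothetical orchestration helper.

    `resolved` maps marker -> recipe for markers found in tiers 1-3.
    `recipe_dir` is the temp directory the recipes were serialized into.
    """
    with_recipe = [m for m in markers if m in resolved]
    without_recipe = [m for m in markers if m not in resolved]

    merged = {}
    # Call 1: recipe-driven markers, pointed at the temp param files.
    if with_recipe:
        merged.update(process_batch(markers=with_recipe, recipe_dir=recipe_dir))
    # Call 2: remaining markers, left to automatic analysis.
    if without_recipe:
        merged.update(process_batch(markers=without_recipe))
    return merged
```

Keeping the split outside `process_batch()` leaves the single-project code path unchanged, at the cost of one extra call per project.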
| Attempt | Why it Failed | Lesson Learned |
|---|---|---|
| Passing resolved recipes directly to process_channel() | process_batch() handles all the plumbing (location_map, blank resolution, output dir, learning DB) | Reuse existing process_batch(): serialize recipes to temp param files instead of bypassing it |
| Patching kintsugi.signal.batch_multi.ParameterLearningEngine | Import is inside resolve_recipes_for_project(), not at module level | Patch kintsugi.claude.parameter_learning.ParameterLearningEngine |
| Testing template tier without mocking learned DB | Real learning DB has data from CX_19-001, so learned tier wins | Mock learning DB to return {"confidence": 0.0} in template tier tests |
| Single process_batch() call with mixed recipe/auto markers | process_batch either uses recipes for all or none per call | Split into two calls: recipe markers + auto markers, then merge results |
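The patching lesson in the table follows from how `unittest.mock.patch` works: it replaces an attribute on a module, and a function-local import never creates that attribute on the consuming module. The sketch below reproduces the situation with a throwaway module. `fake_learning`, `Engine`, and `resolve` are hypothetical stand-ins, not the KINTSUGI classes.

```python
import sys
import types
from unittest import mock

# Stand-in for the module that defines the engine (hypothetical names).
learning = types.ModuleType("fake_learning")


class Engine:
    def recommend_parameters(self):
        return {"confidence": 0.9}


learning.Engine = Engine
sys.modules["fake_learning"] = learning


def resolve():
    # Function-local import: the name is fetched from the defining module
    # on every call, so the consumer module has no attribute to patch.
    from fake_learning import Engine
    return Engine().recommend_parameters()


def resolve_with_mocked_db():
    # Patch the defining module and force zero confidence, so that a
    # lower tier would be exercised in a template-tier test.
    with mock.patch("fake_learning.Engine") as fake_engine:
        fake_engine.return_value.recommend_parameters.return_value = {
            "confidence": 0.0
        }
        return resolve()
```

Outside the `with` block the original class is restored, so other tests still see the unpatched engine.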
# Discovery
config_path = project / "workflow" / "config.yaml" # Must exist
registered_dir = project / "data" / "processed" / "registered" # Must have .tif files
manifest_path = ".../signal_isolated/signal_isolation_manifest.json" # Skip if exists
# Recipe search dirs (in order)
_RECIPE_SEARCH_DIRS = [
    "data/processed/Processing_parameters",
    "Processing_parameters",
    "notebooks/Processing_parameters",
]
# Default template project
DEFAULT_TEMPLATE_PROJECT = "CX_19-001_SP_CC2-A28"
# Learning DB confidence threshold
min_confidence = 0.6
# Processing defaults
tile_smooth_sigma = 0.0 # No blank smoothing (recipes don't need it)
method = "auto" # Per-marker: global vs weighted
learn = True # Record outcomes for cross-dataset transfer
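The discovery rules and recipe search order listed above could be combined roughly as follows. This is a sketch, not the code in `batch_multi.py`: `is_eligible` and `find_recipe_dir` are hypothetical names, the manifest location is supplied by the caller because its full path is not spelled out here, and treating `force` as "ignore an existing manifest" is an assumption based on the `-f` flag.

```python
from pathlib import Path
from typing import Optional

_RECIPE_SEARCH_DIRS = [
    "data/processed/Processing_parameters",
    "Processing_parameters",
    "notebooks/Processing_parameters",
]


def is_eligible(project: Path, manifest_path: Path, force: bool = False) -> bool:
    """Hypothetical check mirroring the discovery rules listed above."""
    # The workflow config must exist.
    if not (project / "workflow" / "config.yaml").is_file():
        return False
    # The registered directory must contain at least one .tif file.
    registered = project / "data" / "processed" / "registered"
    if not registered.is_dir() or not any(registered.glob("*.tif")):
        return False
    # Skip already-isolated projects unless reprocessing is forced.
    return force or not manifest_path.exists()


def find_recipe_dir(project: Path) -> Optional[Path]:
    """Return the first search dir holding *_param.txt files, else None."""
    for rel in _RECIPE_SEARCH_DIRS:
        candidate = project / rel
        if candidate.is_dir() and any(candidate.glob("*_param.txt")):
            return candidate
    return None
```

A project for which `find_recipe_dir` returns `None` would fall through to the learned, template, or auto tiers.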
Notes:

- _write_recipes_as_param_files() roundtrips cleanly: load_legacy_recipes() can read the generated files
- recipe_source field on ChannelResult tracks provenance ("own_recipe", "learned", "template", "auto")
- batch_isolation_report_{timestamp}.json saved to projects_dir

Files:

- src/kintsugi/signal/batch_multi.py: Core (discovery, tissue parsing, recipe resolution, orchestration)
- src/kintsugi/signal/batch.py: Single-project processing, ChannelResult.recipe_source field
- src/kintsugi/cli.py: @isolate_group.command("batch") CLI
- tests/test_batch_multi_project.py: 34 tests (tissue parsing, discovery, resolution, roundtrip)
- src/kintsugi/signal/CLAUDE.md: Documentation

Related skills and patterns:

- batch-signal-isolation skill (single-project recipe-driven processing)
- batch-orchestration-script pattern (sequential, error isolation, per-dataset filtering)
- claude-md-context-management for documentation placement