| name | complexa-sweep |
| description | Use this skill whenever the user wants to run a parameter sweep over a Proteina-Complexa design pipeline — cartesian-product hyperparameter scans, Pareto search over generation/reward/evaluation knobs, or any "compare configurations" workflow. Trigger phrases include "sweep beam width", "sweep nsteps", "hyperparameter sweep", "parameter scan", "scan beam_width and temperature", "compare configurations", "find the best generation params", "what's the optimal nsteps", "Pareto search for binder quality vs wall-clock", "complexa sweep", "tune Complexa", "ablate the reward weights", "configs/sweeps", "--sweeper", "run beam_width.yaml". This is the only skill that owns sweeper YAML authoring, cartesian-product expansion, and per-config result ranking. |
| allowed-tools | Bash, Read, Write, AskUserQuestion |
complexa-sweep
Run cartesian-product parameter sweeps over Proteina-Complexa design pipelines. Pick or author a sweeper YAML in configs/sweeps/, expand it to N inference configs with script_utils/generate_inference_configs.py, loop complexa design over those configs, then aggregate per-config success metrics into a ranked summary CSV plus a manifest.
Important: The complexa design CLI does NOT accept --sweeper directly. Sweeps are driven by script_utils/generate_inference_configs.py, which writes one configs/inference_configs/inf_{idx}_{run_name}.yaml per sweep combination. You then loop complexa design over those generated configs.
What this skill enables
- Pick an existing sweeper YAML from
configs/sweeps/ (beam_width, bb_ca_temperature, search_replicas, example).
- Author a new sweeper YAML with arbitrary dot-notation axes (cartesian product).
- Generate N inference + evaluation config pairs with
script_utils/generate_inference_configs.py.
- Loop
complexa design over the generated configs (one at a time on a single GPU, or in parallel on a multi-GPU host by sharding the config list across CUDA_VISIBLE_DEVICES).
- Walk per-config output directories and parse the analyze-step CSV from each.
- Emit
sweep_summary.csv (one row per config: axis values + success rate + mean iPAE + diversity) and sweep_manifest.json.
- Identify the best config by success rate and the Pareto frontier (wall-clock vs success).
Step 1: Pre-flight
bash .claude/skills/_shared/scripts/preflight.sh
Read ./complexa_setup/preflight.json. A sweep multiplies GPU time by the number of configs. Before launching, confirm the cost with the user:
"This sweep produces N configs × ~M minutes per config ≈ TOTAL GPU-hours. OK to proceed? (y / reduce / cancel)"
If gpu.available=false, stop — sweeps are not feasible on CPU.
Step 2: Pick the pipeline + target
Use the same dialogue as complexa-design — do not duplicate it here. See .claude/skills/complexa-design/SKILL.md Step 2 ("Pick the pipeline") and Step 3 ("Gather parameters"). Capture:
pipeline_config_name — e.g. search_binder_local_pipeline (default), search_ligand_binder_local_pipeline, search_ame_local_pipeline.
task_name — e.g. 02_PDL1, 22_DerF21, 39_7V11_LIGAND. Passed as --override generation.task_name=<task>.
run_name — short tag for output dir naming.
Step 3: Pick or author the sweeper YAML
Sweeper YAMLs live in configs/sweeps/. Each key is a dot-notation Hydra path; each value is a list. The cartesian product becomes N configs.
Canned sweepers
| File | Axis | Values | Configs |
|---|
configs/sweeps/beam_width.yaml | generation.search.beam_search.beam_width | 1, 2, 4, 8 | 4 |
configs/sweeps/bb_ca_temperature.yaml | generation.model.bb_ca.simulation_step_params.sc_scale_noise | 0.1, 0.4 | 2 |
configs/sweeps/search_replicas.yaml | generation.search.best_of_n.replicas | 1, 4, 16, 64 | 4 |
configs/sweeps/example.yaml | beam_width × nsteps | (2,4) × (200,400) | 4 |
If one matches the user's intent, use it as-is. Otherwise author a new file.
Authoring a new sweeper
Minimal multi-axis example (saved to configs/sweeps/my_sweep.yaml):
generation.search.beam_search.beam_width:
- 2
- 4
- 8
generation.args.nsteps:
- 200
- 400
Rules (from script_utils/generate_inference_configs.py:load_sweeper_file):
- Top-level mapping only. Keys are dot-notation Hydra paths.
- Values must be lists. A scalar is auto-wrapped into a single-element list (which pins a value without adding a dimension).
- Cartesian product: total configs = product of list lengths. Two 4-value axes = 16 configs; budget accordingly.
- If a key appears in both the sweeper file and an
--override, the override wins and that axis collapses.
See reference/sweep_axes.md for the full catalogue of swept keys (typical ranges, cost multipliers, what improves/regresses).
Dry-run preview before generating
Always confirm the config count first:
python script_utils/generate_inference_configs.py \
--config_name search_binder_local_pipeline \
--sweeper configs/sweeps/my_sweep.yaml \
--override generation.task_name=22_DerF21 \
--run_name my_sweep \
--dryrun
The output lists every axis + value list and prints DRY RUN — would generate N config pair(s).
Step 4: Generate configs + loop complexa design
Once the dry-run looks right, drop --dryrun to materialize inf_{idx}_{run_name}.yaml + eval_{idx}_{run_name}.yaml pairs under configs/inference_configs/ and configs/eval_configs/. Then loop complexa design over the inference configs.
python script_utils/generate_inference_configs.py \
--config_name search_binder_local_pipeline \
--sweeper configs/sweeps/my_sweep.yaml \
--override generation.task_name=22_DerF21 \
--run_name my_sweep
for cfg in configs/inference_configs/inf_*_my_sweep.yaml; do
complexa design "$cfg" || echo "FAILED: $cfg"
done
This serialises the sweep on a single GPU — be honest with the user that wall-clock = N × per-run.
Multi-GPU host (optional speed-up)
On a host with K GPUs, shard the config list K ways and launch one loop per GPU in parallel. Each complexa design call uses exactly one GPU at default gen_njobs=1 / eval_njobs=1, so pinning via CUDA_VISIBLE_DEVICES keeps them from colliding:
CONFIGS=(configs/inference_configs/inf_*_my_sweep.yaml)
N=${#CONFIGS[@]}; K=4
for gpu in $(seq 0 $((K-1))); do
(
for i in $(seq $gpu $K $((N-1))); do
CUDA_VISIBLE_DEVICES=$gpu complexa design "${CONFIGS[$i]}" \
|| echo "FAILED on GPU $gpu: ${CONFIGS[$i]}"
done
) &
done
wait
Drop gen_njobs / eval_njobs overrides into the loop only if you intentionally want each complexa design invocation to consume multiple GPUs (and you have a way to keep them out of each other's way).
Step 5: Collect results
Each swept config writes to its own ./inference/inf_{idx}_{run_name}/ directory; the analyze step writes ./evaluation_results/eval_{idx}_{run_name}/results_*.csv. After the sweep finishes:
ls -d ./inference/inf_*_my_sweep/
ls ./evaluation_results/eval_*_my_sweep/results_*.csv
For each config, parse the analyze CSV (one row per generated binder). Standard columns used for ranking: i_pae, i_plddt, sc_rmsd, binder_seq, passes_filter (bool). If passes_filter is missing, derive a success flag with the user's chosen thresholds (defaults: i_pae < 10, i_plddt > 0.7, sc_rmsd < 2.0 — confirm with user).
Step 6: Rank configs
Emit sweep_summary.csv to the run directory. One row per config:
| Column | How to compute |
|---|
config_id | The {idx} from inf_{idx}_{run_name} |
<axis_1>, <axis_2>, ... | The swept value at this combination (read from the per-config inf_*.yaml) |
n_samples | Row count in the analyze CSV |
success_rate | passes_filter.mean() |
mean_i_pae | i_pae.mean() (lower = better) |
mean_i_plddt | i_plddt.mean() (higher = better) |
diversity_score | Unique sequence count / n_samples (or use TM-score clustering if available) |
wall_clock_min | From the per-config log timestamps |
Then report:
- Best config = argmax of
success_rate. Tie-break on mean_i_pae (lower).
- Pareto frontier over (
wall_clock_min, success_rate): a config is on the frontier iff no other config is both faster AND has higher success rate.
Print the best config + the frontier to the terminal. Save the full table to sweep_summary.csv.
Step 7: Emit manifest
Capture the resolved invocation + outputs for replay. The shared helper takes a single --output-dir (it walks for CSVs and pulls the Hydra config from the run's .hydra/), so call it against the best-config's evaluation directory and pass the parent complexa design command for that run:
BEST_ID=4
python3 .claude/skills/_shared/scripts/write_manifest.py \
--output-dir ./evaluation_results/eval_${BEST_ID}_my_sweep \
--command "python script_utils/generate_inference_configs.py --config_name search_binder_local_pipeline --sweeper configs/sweeps/my_sweep.yaml --override generation.task_name=22_DerF21 --run_name my_sweep && for cfg in configs/inference_configs/inf_*_my_sweep.yaml; do complexa design \"\$cfg\"; done" \
--skill complexa-sweep \
--out ./sweep_runs/my_sweep/sweep_manifest.json
Alongside the manifest, save the ranked sweep_summary.csv from Step 6 to ./sweep_runs/my_sweep/sweep_summary.csv and surface both paths to the user.
Recommended sweep recipes
| Symptom | Sweep this | Why |
|---|
| Quality not good enough | beam_width × nsteps (use example.yaml as a starting point) | Both raise compute → quality; find the cheapest combination that lands. |
| Too slow / want speed-up | nsteps downward (e.g. [100, 200, 400]) with fixed beam_width=4 | Find the smallest nsteps that retains success rate. |
| Mode collapse / low diversity | generation.model.bb_ca.simulation_step_params.sc_scale_noise (use bb_ca_temperature.yaml) | Higher noise → more diverse backbones. |
| Reward over-fitting (high reward, bad metrics) | generation.reward_model.reward_models.af2folding.reward_weights.{i_pae, plddt} ratio | Re-balance composite reward. |
| Want statistical robustness on one config | search_replicas.yaml (best_of_n.replicas) | Same config, more samples → tighter success-rate estimate. |
| Algorithm shoot-out | generation.search.algorithm over [single-pass, best-of-n, beam-search, fk-steering] | Compare search regimes; pin everything else. |
Hardware
Total GPU-time for a sweep = N_configs × per_run_GPU_time. A single search_binder_local_pipeline run on one A100 is roughly 30–90 min (binder length + nsteps dependent). A 4-axis × 4-value sweep = 256 configs × ~60 min = ~256 GPU-hours — at that scale, plan on a multi-GPU host with the Step-4 sharding pattern.
Refer to .claude/skills/_shared/reference/hardware.md for the per-run baseline + VRAM minima.
Troubleshooting
| Symptom | Cause | Fix |
|---|
Sweeper file not found from generate_inference_configs.py | Path resolved from wrong CWD | Use a path relative to the repo root; or pass an absolute path. |
Sweeper YAML must be a mapping | Top-level YAML is a list or scalar | Rewrite as key: [v1, v2] mapping. |
No configs were generated | One of the value lists is empty [] | Sweeper file has a key: [] line — add at least one value. |
complexa design rejects --sweeper | Confusion between entrypoints | The CLI does not accept --sweeper. Use generate_inference_configs.py first, then loop complexa design over the generated configs. |
| Override silently collapses a sweep axis | --override key=v shadowed a sweep key | Drop either the override OR the matching key from the sweeper file. |
| One config in the loop fails, sweep keeps going | The || echo "FAILED…" in Step 4 swallows the error | Re-run the failed inf_*.yaml standalone; check the per-config log under ./logs/. Skip failed config_id when ranking. |
For per-axis reference (typical ranges, cost, what gets better/worse), see reference/sweep_axes.md.
For the user-facing sweep system overview (config generation, output layout), see docs/SWEEP.md.