wild-v2-gpu-discovery-parallel-scheduling
// Protocol for GPU discovery and parallel run scheduling across local GPU and Slurm clusters
| name | wild_v2_gpu_discovery_parallel_scheduling |
| description | Protocol for GPU discovery and parallel run scheduling across local GPU and Slurm clusters |
| category | protocol |
| variables | [] |
Use this protocol before launching experiment grids.
curl -X POST "$SERVER_URL/cluster/detect" -H "X-Auth-Token: $AUTH_TOKEN"
curl -X GET "$SERVER_URL/cluster" -H "X-Auth-Token: $AUTH_TOKEN"
curl -X GET "$SERVER_URL/wild/v2/system-health" -H "X-Auth-Token: $AUTH_TOKEN"
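The same three calls can be issued from Python. The following is a minimal sketch using only the standard library, assuming `SERVER_URL` and `AUTH_TOKEN` carry the same values as in the curl commands. The helper names (`build_request`, `discovery_requests`, `send`) are hypothetical, and the assumption that the server replies with JSON is not confirmed by this protocol.

```python
import json
import urllib.request
from typing import Any, List


def build_request(server_url: str, auth_token: str, method: str, path: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for one endpoint."""
    return urllib.request.Request(
        url=server_url.rstrip("/") + path,
        method=method,
        headers={"X-Auth-Token": auth_token},
    )


def discovery_requests(server_url: str, auth_token: str) -> List[urllib.request.Request]:
    """Return the three discovery calls in the order the protocol lists them."""
    return [
        build_request(server_url, auth_token, "POST", "/cluster/detect"),
        build_request(server_url, auth_token, "GET", "/cluster"),
        build_request(server_url, auth_token, "GET", "/wild/v2/system-health"),
    ]


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send one request and decode the body.

    Assumption: the response body is JSON; the schema is not specified here.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would typically run `[send(r) for r in discovery_requests(url, token)]` and read the fields listed next from the decoded responses.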
Use:
- cluster.type (local_gpu, slurm, cpu_only, ...)
- cluster.gpu_count
- run_summary / system-health running+queued counts

If gpu_count = N, target at most N concurrently running GPU jobs, unless the user explicitly wants oversubscription.

- Local GPU: pin each run to its own device (CUDA_VISIBLE_DEVICES=0 ..., CUDA_VISIBLE_DEVICES=1 ...).
- Slurm: request GPU resources explicitly (--gres=gpu:1, partition/account/qos as available).

For grid search, create one run per configuration using the API. Launch multiple configurations in the same iteration when capacity allows.
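The capacity rule and the per-backend launch settings can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's own logic: the function names are hypothetical, queued jobs are counted against capacity as a conservative choice, and the set of busy device indices is assumed to be known to the caller because the protocol does not say how to obtain it.

```python
from typing import Dict, Iterable, List, Optional


def free_gpu_slots(gpu_count: int, running: int, queued: int = 0) -> int:
    """Number of additional GPU jobs that fit without oversubscription.

    Assumption: queued jobs will soon occupy a GPU, so they count as used.
    """
    return max(0, gpu_count - running - queued)


def pin_local_runs(gpu_count: int, busy_devices: Iterable[int], n_runs: int) -> List[Dict[str, str]]:
    """Assign each new local run to one free device via CUDA_VISIBLE_DEVICES.

    Returns one environment overlay per run; fewer than n_runs if devices run out.
    """
    busy = set(busy_devices)
    free = [index for index in range(gpu_count) if index not in busy]
    return [{"CUDA_VISIBLE_DEVICES": str(index)} for index in free[:n_runs]]


def slurm_gpu_args(
    partition: Optional[str] = None,
    account: Optional[str] = None,
    qos: Optional[str] = None,
) -> List[str]:
    """Build sbatch/srun arguments requesting one GPU per job.

    Partition, account, and QOS are added only when the caller knows them.
    """
    args = ["--gres=gpu:1"]
    if partition:
        args.append(f"--partition={partition}")
    if account:
        args.append(f"--account={account}")
    if qos:
        args.append(f"--qos={qos}")
    return args
```

For example, with `gpu_count=4`, one running job, and one queued job, `free_gpu_slots` leaves two slots, and `pin_local_runs(4, [1], 2)` would place the new runs on devices 0 and 2.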
Do not serialize the entire grid one task at a time when idle capacity is available.
- Create runs with POST /runs (with sweep_id).
- Set auto_start=true when safe parallel capacity exists.
- Otherwise, set auto_start=false and start selected runs via POST /runs/{id}/start.
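One way to turn a grid into run-creation payloads is sketched below. Only `sweep_id` and `auto_start` come from this protocol; the `config` field name and both function names are hypothetical, and the real `POST /runs` body may require other fields.

```python
import itertools
from typing import Any, Dict, List, Sequence


def expand_grid(grid: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Expand a parameter grid into one configuration per combination."""
    keys = list(grid)
    combos = itertools.product(*(grid[key] for key in keys))
    return [dict(zip(keys, values)) for values in combos]


def plan_run_payloads(
    sweep_id: str,
    configs: Sequence[Dict[str, Any]],
    free_slots: int,
) -> List[Dict[str, Any]]:
    """Build one POST /runs payload per configuration.

    The first free_slots runs get auto_start=True so they launch in parallel;
    the rest are created with auto_start=False and started later.
    """
    return [
        {
            "sweep_id": sweep_id,
            "config": config,  # hypothetical field name
            "auto_start": index < free_slots,
        }
        for index, config in enumerate(configs)
    ]
```

Runs created with `auto_start` set to false would then be started through `POST /runs/{id}/start` as GPUs free up, which keeps the grid from being serialized while still respecting the concurrency target.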