| name | remote-execution |
| description | Run ado operations on remote Ray clusters using --remote execution context files. Use when the user wants to create an operation, asks about remote clusters, wants to ship local plugins or data files to a cluster, or asks about execution context YAML files. Also applies proactively when creating an operation if execution context files are present in the workspace. |
Remote Execution with ado
Execution context files
An execution context YAML configures a specific cluster and environment.
Multiple files can exist in the same repo for different clusters or
environments:
morrigan_execution.yaml # Morrigan cluster, standard env
vela_execution.yaml # Vela cluster
morrigan_vllm_dev_execution.yaml # Morrigan + vllm_performance from source
The file names are user-defined conventions. Check the repo root for any
*_execution.yaml files to discover what contexts are available.
See website/docs/getting-started/remote_run.md for full schema reference.
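As a minimal sketch of the discovery step, the helper below lists `*_execution.yaml` files directly under a repo root. The function name `find_execution_contexts` is hypothetical and not part of ado; it only illustrates the naming convention described above.

```python
from pathlib import Path


def find_execution_contexts(repo_root: str = ".") -> list[str]:
    """Return the sorted names of *_execution.yaml files directly under repo_root."""
    root = Path(repo_root)
    # Only the repo root is searched, matching the convention described above.
    return sorted(p.name for p in root.glob("*_execution.yaml") if p.is_file())


if __name__ == "__main__":
    for name in find_execution_contexts():
        print(name)
```

An empty result means there is no execution context to offer, so local execution is the only option.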
Proactive prompt when creating operations
When the user asks to create an operation, check the repo root for
*_execution.yaml files. If any exist, ask which (if any) they want to use
before proceeding. Do not assume remote by default — local execution is still
common.
Do not dispatch other ado commands (get, show, create space, etc.)
remotely unless the user explicitly requests it.
Prerequisites
Before dispatching to a cluster with port-forward, verify cluster login:
oc whoami
kubectl get nodes
If either command fails, ask the user to log in first; otherwise the
port-forward will fail with a credentials error.
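The login check can be scripted. The sketch below runs the two commands shown above and treats a missing CLI, a timeout, or a non-zero exit status as "not logged in". The helper name `cluster_login_ok` and the 30-second timeout are assumptions made for illustration.

```python
import subprocess

# The two checks named above; passed as argv lists so callers can swap them.
LOGIN_CHECKS = [["oc", "whoami"], ["kubectl", "get", "nodes"]]


def cluster_login_ok(checks=LOGIN_CHECKS, timeout=30):
    """Return True only if every check command exits with status 0."""
    for argv in checks:
        try:
            result = subprocess.run(argv, capture_output=True, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired):
            # CLI not installed or not executable, or the cluster did not answer in time.
            return False
        if result.returncode != 0:
            return False
    return True


if __name__ == "__main__":
    print("logged in" if cluster_login_ok() else "ask the user to log in first")
```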
Project context
The active local project context is automatically forwarded to the remote job.
To work on the same project locally and remotely, use the same active context
for both — do not add a separate -c flag unless explicitly switching context:
uv run ado create space -f space.yaml
uv run ado --remote morrigan_execution.yaml create operation \
-f operation.yaml --use-latest space
Only supply -c context.yaml when you need to target a different project than
the one currently active.
Operation creation command patterns
One step — create space and operation together remotely:
uv run ado --remote execution_context.yaml create operation \
-f operation.yaml \
--with space=space.yaml
Two steps — create space locally, run operation remotely:
uv run ado create space -f space.yaml
uv run ado --remote execution_context.yaml create operation \
-f operation.yaml --use-latest space
Prefer the two-step pattern when you want the space registered in the local
metastore (e.g. for local querying or validation) before submitting.
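The two patterns differ only in how the space is supplied. A small sketch that assembles either command line, using only the flags shown above (the function name `create_operation_argv` is hypothetical):

```python
import shlex


def create_operation_argv(operation_file, execution_context=None, space_file=None):
    """Build the argv for 'ado create operation' using only flags shown in this document."""
    argv = ["uv", "run", "ado"]
    if execution_context:
        # Remote dispatch is opt-in; without it the command runs locally.
        argv += ["--remote", execution_context]
    argv += ["create", "operation", "-f", operation_file]
    if space_file:
        # One step: create the space together with the operation.
        argv += ["--with", f"space={space_file}"]
    else:
        # Two steps: reuse the most recently created space.
        argv += ["--use-latest", "space"]
    return argv


if __name__ == "__main__":
    print(shlex.join(create_operation_argv("operation.yaml", "execution_context.yaml")))
```

Passing `space_file` selects the one-step pattern; omitting it assumes the space was already created with `ado create space`.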
Tips
- Check whether every entry under additionalFiles in the execution context
YAML is needed for the current submission. Comment out or remove any that
are not, to avoid uploading unnecessary data.
- Check whether the value of the wait field suits the command being run.
In general, do not wait for create operation, which can run for hours.
Waiting is reasonable for get or show commands, which typically take
seconds to minutes.
Common Issues
File paths in YAML not valid on the remote cluster
Any file path appearing in a space, operation, or actuator configuration YAML
(e.g. mps_file, a model checkpoint, a dataset path) must satisfy both
conditions for the remote job to succeed:
- The file is present on the cluster: add the local path to additionalFiles
in the execution context YAML.
- The path is valid on the cluster: use a bare filename in the YAML. ado
symlinks additionalFiles entries into the Ray working directory, so
my-file.gz resolves but /Users/me/data/my-file.gz does not.
Failing either condition produces a file-not-found error at experiment runtime,
not at submission time, so the job starts successfully but measurements fail.
Pattern to follow
To avoid this error, whenever the experiment references a file, use a bare
filename (no path) in the space/operation YAML and add the absolute local path
to additionalFiles. ado symlinks the file into the Ray working directory, so
the bare filename resolves on the cluster.
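# space YAML: reference the file by bare filename
entitySpace:
  - identifier: mps_file
    propertyDomain:
      variableType: OPEN_CATEGORICAL_VARIABLE_TYPE
      values:
        - pigeon-10.mps.gz

# execution context YAML: ship the file from its absolute local path
additionalFiles:
  - /absolute/local/path/to/pigeon-10.mps.gz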
The same applies to actuator configuration files that reference local paths
(e.g. model weights, config files). Audit all -f files for local path
references before dispatching remotely.
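The audit can be partly automated. The sketch below takes the file names referenced in the space/operation YAML and the additionalFiles entries, and reports any reference that is not a bare filename or has no matching shipped file. It assumes the symlink in the Ray working directory carries the basename of the additionalFiles entry, as the example above suggests. The helper name `audit_file_references` is hypothetical.

```python
from pathlib import PurePosixPath


def audit_file_references(referenced, additional_files):
    """Return a list of problems; an empty list means both conditions hold."""
    # Assumption: each shipped file is exposed under its basename on the cluster.
    shipped = {PurePosixPath(p).name for p in additional_files}
    problems = []
    for ref in referenced:
        name = PurePosixPath(ref).name
        if name != ref:
            problems.append(f"{ref}: use the bare filename {name!r} instead of a path")
        if name not in shipped:
            problems.append(f"{ref}: no additionalFiles entry ends in {name!r}")
    return problems


if __name__ == "__main__":
    issues = audit_file_references(
        ["pigeon-10.mps.gz"],
        ["/absolute/local/path/to/pigeon-10.mps.gz"],
    )
    print(issues or "no problems found")
```

Extracting the referenced names from the YAML is left to the caller, since which fields hold file paths depends on the experiment.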
Ray version mismatch
If you see the error "Changing the ray version is not allowed", pin the Ray
version in fromPyPI to match the version running on the cluster:
fromPyPI:
- ado-core
- ray==2.52.1
- ado-ray-tune
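A quick pre-submission check is to compare the pin in fromPyPI against the cluster's Ray version. In this sketch the cluster version is passed in by the caller, because how it is obtained is not covered here. Both helper names are hypothetical.

```python
def ray_pin(from_pypi):
    """Return the version pinned as 'ray==X' in a fromPyPI list, or None if unpinned."""
    for requirement in from_pypi:
        name, sep, version = requirement.replace(" ", "").partition("==")
        # Ignore extras such as ray[default] when matching the package name.
        if name.split("[")[0].lower() == "ray" and sep:
            return version
    return None


def ray_pin_matches(from_pypi, cluster_version):
    """True when fromPyPI pins Ray to exactly the cluster's version."""
    return ray_pin(from_pypi) == cluster_version


if __name__ == "__main__":
    print(ray_pin_matches(["ado-core", "ray==2.52.1", "ado-ray-tune"], "2.52.1"))
```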
fromSource plugin changes not reflected in remote run
Local edits to a plugin included via fromSource may not be reflected in
the remote run. Symptoms include: a fixed import error still occurring, an
added log line not appearing, or a new parameter not being present.
The most likely cause is that the wheel built for the plugin has the same
version as a wheel already cached by Ray. The default setuptools_scm local
scheme appends only the date for dirty (uncommitted) changes, so multiple dirty
builds on the same day share the same version string. Ray sees the version as
already installed and skips reinstallation.
The solution is to add local_scheme = "node-and-timestamp" to
[tool.setuptools_scm] in the plugin's pyproject.toml. This appends the git
node and a timestamp to the version, making each dirty build uniquely versioned.
See the plugin development documentation for details and examples.
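To see why same-day dirty builds collide, the sketch below imitates the two local-version schemes. The suffix formats are an approximation written for illustration, not setuptools_scm's own code, and the commit hash and build times are made up.

```python
from datetime import datetime


def local_version(node: str, built_at: datetime, scheme: str) -> str:
    """Approximate the local version suffix of a dirty build under the named scheme."""
    if scheme == "node-and-date":
        # Date only: every dirty build on the same day gets the same suffix.
        return f"+g{node}.d{built_at:%Y%m%d}"
    if scheme == "node-and-timestamp":
        # Date and time: each build gets its own suffix.
        return f"+g{node}.d{built_at:%Y%m%d%H%M%S}"
    raise ValueError(f"unknown scheme: {scheme}")


node = "1a2b3c4"  # hypothetical commit hash
morning = datetime(2025, 1, 15, 9, 30, 0)
afternoon = datetime(2025, 1, 15, 16, 45, 12)

same_day_collides = (
    local_version(node, morning, "node-and-date")
    == local_version(node, afternoon, "node-and-date")
)
timestamps_differ = (
    local_version(node, morning, "node-and-timestamp")
    != local_version(node, afternoon, "node-and-timestamp")
)

if __name__ == "__main__":
    print(same_day_collides, timestamps_differ)
```

With the date-only suffix, the morning and afternoon builds share one version string, which is the condition under which the cached wheel is reused.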