name	work-with-assets
description	ALWAYS use this skill when working with file assets in DerivaML — discovering, downloading, uploading, inspecting, or managing images, model weights, CSVs, or any file-based catalog records. Triggers on: 'download asset', 'upload files', 'asset table', 'find images', 'model weights', 'what created this file', 'asset provenance', 'asset types', 'create asset table'.
disable-model-invocation	true

Working with Assets in DerivaML

An asset is a file-based record in a Deriva catalog — it combines a file (stored in Deriva's object store) with catalog metadata like filename, size, MD5 checksum, and description. Assets live in asset tables, which have standard file-tracking columns plus optional custom metadata. Every asset has a unique RID for stable referencing across the system.

For background on asset tables, types, RIDs, object storage, caching, and provenance, see references/concepts.md.

Every tool below takes hostname= and catalog_id= arguments explicitly. Substitute your catalog's hostname (e.g., "data.example.org") and catalog ID (e.g., "1") wherever the examples show them. In the deriva:// resource URIs, {h} and {c} stand for the same hostname and catalog ID.

Critical Rules

Use RIDs to reference assets — not filenames or URLs. RIDs are immutable and unique.
Upload within an execution — assets must be registered with Python API exe.asset_file_path() and uploaded with Python API exe.upload_execution_outputs() inside an active execution for provenance tracking.
Download records provenance automatically — calling Python API ml.download_asset(rid) within an execution links the asset as an "Input" to that execution.
Create the asset table before uploading — the table must exist before you can register files for upload to it.

Workflow Summary

Discovering and inspecting assets

Start with rag_search to discover asset tables and types by concept:

rag_search("image assets", doc_type="catalog-schema")
rag_search("model weights files", doc_type="catalog-schema")

Browse asset tables in one schema — read deriva://catalog/{h}/{c}/ml/assets/{schema} for a schema-scoped list of asset tables (e.g., ml/assets/deriva-ml, ml/assets/myproject). To survey across schemas, list the schemas with list_schemas (deriva-mcp-core) and read this resource per schema; there is no single all-schemas asset-table tool.
Snapshot the contents of one asset table — read deriva://catalog/{h}/{c}/ml/assets/{schema}/{asset_table} for a bounded snapshot of assets in one table (capped at the resource limit; for paginated/filtered access, use the tool below).
deriva_ml_list_assets(hostname, catalog_id, ...) — paginated, filterable browse across asset tables (use this when the snapshot resource hits its cap or when you need filters).
deriva_ml_lookup_asset(hostname, catalog_id, asset_rid) — inspect a specific asset (metadata, types, producer execution).
deriva_ml_get_execution(hostname, catalog_id, execution_rid) — find auto-generated metadata files and output assets for an execution.
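The resource URIs above follow a fixed pattern, so they can be assembled with plain string formatting. The sketch below is illustrative only: the helper names are hypothetical and not part of DerivaML, {h} and {c} are read as the hostname and catalog ID, and the asset table name "Image" is a made-up example.

```python
def asset_tables_uri(hostname: str, catalog_id: str, schema: str) -> str:
    """URI for the schema-scoped list of asset tables."""
    return f"deriva://catalog/{hostname}/{catalog_id}/ml/assets/{schema}"


def asset_table_snapshot_uri(
    hostname: str, catalog_id: str, schema: str, asset_table: str
) -> str:
    """URI for the bounded snapshot of one asset table."""
    return f"{asset_tables_uri(hostname, catalog_id, schema)}/{asset_table}"


# Survey several schemas by building one list URI per schema.
# The schema names come from the examples in the text above.
schemas = ["deriva-ml", "myproject"]
list_uris = [asset_tables_uri("data.example.org", "1", s) for s in schemas]

# "Image" is a hypothetical asset table name used only for illustration.
snapshot_uri = asset_table_snapshot_uri("data.example.org", "1", "myproject", "Image")

for uri in list_uris + [snapshot_uri]:
    print(uri)
```

Each printed URI would then be read as a resource; when a snapshot hits its cap, switch to deriva_ml_list_assets instead.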

Downloading assets

Python API ml.download_asset(rid) — download a single asset by RID
Python API dataset.download_dataset_bag(version) — download a dataset as a BDBag with all asset files (no execution required)
Python API exe.download_dataset_bag() — same as above but within an active execution (records the dataset as an input for provenance)
Python API bag.restructure_assets() — organize downloaded assets into ML-ready directory layouts

Creating asset tables

Use create_table (from the deriva plugin) with the standard hatrac column shape and an Asset_Type FK. See references/concepts.md for the recipe.

Uploading assets (within an execution)

deriva_ml_create_execution(hostname, catalog_id, ...) + deriva_ml_start_execution(hostname, catalog_id, execution_rid) — start provenance tracking. Capture the returned execution_rid.
Python API exe.asset_file_path() — register each output file for upload (returns a path to write to).
Python API exe.upload_execution_outputs() — upload all registered files to the object store and catalog.
deriva_ml_commit_execution(hostname, catalog_id, execution_rid) — finalize on success (use deriva_ml_abort_execution on failure).
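The commit-on-success, abort-on-failure ordering in the steps above can be sketched as a small context manager. This is a pattern sketch under stated assumptions, not DerivaML code: start, commit, and abort are caller-supplied callables (hypothetical wrappers around the create/start, commit, and abort tool calls), and "EXE-1" is a made-up execution RID.

```python
from contextlib import contextmanager


@contextmanager
def tracked_execution(start, commit, abort):
    """Run a block between start and commit; abort if the block raises."""
    execution_rid = start()
    try:
        yield execution_rid
    except Exception:
        abort(execution_rid)  # failure path: abort, then re-raise
        raise
    else:
        commit(execution_rid)  # success path: finalize the execution


# Stand-in callables that only record the order of calls.
calls = []


def start():
    calls.append("start")
    return "EXE-1"  # hypothetical execution RID


def commit(rid):
    calls.append(f"commit:{rid}")


def abort(rid):
    calls.append(f"abort:{rid}")


# Success: the body (where registration and upload would happen) completes.
with tracked_execution(start, commit, abort) as rid:
    calls.append(f"upload:{rid}")

# Failure: the body raises, so the execution is aborted instead of committed.
try:
    with tracked_execution(start, commit, abort) as rid:
        raise RuntimeError("upload failed")
except RuntimeError:
    pass

print(calls)
```

Wrapping the sequence this way means an execution is never left open after an error.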

Managing asset types

Create a new term in the Asset_Type vocabulary: add_term(hostname, catalog_id, schema="deriva-ml", table="Asset_Type", name=..., description=...).
Tag or untag an asset with a type: call update_entities(hostname, catalog_id, schema, table, entities=[{"RID": asset_rid, "Asset_Type": <term>}]) to set the value, or pass null as the Asset_Type value to clear it. See references/workflow.md for the full recipe.
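As a minimal sketch of the entities argument described above, the helper below builds the payload for tagging and for clearing a type. The helper name, the RID "1-ABCD", and the term "Model_Weights" are hypothetical; it also assumes that clearing means sending null as the Asset_Type value, which Python expresses as None.

```python
import json


def asset_type_update(asset_rid, term=None):
    """Build the `entities` list for update_entities.

    A term sets the asset's type; None (JSON null) is assumed to clear it.
    """
    return [{"RID": asset_rid, "Asset_Type": term}]


tag_payload = asset_type_update("1-ABCD", "Model_Weights")  # hypothetical values
untag_payload = asset_type_update("1-ABCD")

# Serializing shows how None is sent on the wire as null.
print(json.dumps(tag_payload))
print(json.dumps(untag_payload))
```

Either payload would be passed as entities= alongside the hostname, catalog ID, schema, and table of the asset.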

For the full step-by-step guide with MCP tool parameters and Python API examples, see references/workflow.md.

Reference Resources

references/concepts.md — What assets are, asset tables, RIDs, types, object storage, caching, provenance, execution metadata vs execution assets, notebook output assets, and the manual recipe for creating an asset table.
references/workflow.md — Step-by-step MCP and Python API workflows, finding assets by type and execution.
references/restructure-guide.md — Restructuring assets for ML: group_by options, value selectors, file transformers, ML framework patterns, upload tuning.
rag_search("file assets in DerivaML", doc_type="user-guide") — Search the user guide for file asset documentation.
Use deriva_ml_list_assets(hostname, catalog_id, ...) and deriva_ml_lookup_asset(hostname, catalog_id, asset_rid) for the typed asset reads.
deriva://catalog/{h}/{c}/ml/assets/{schema} — schema-scoped list of asset tables.
deriva://catalog/{h}/{c}/ml/assets/{schema}/{asset_table} — snapshot of assets in one asset table (bounded; complements the paginated deriva_ml_list_assets tool).
deriva://catalog/{h}/{c}/ml/asset/{rid} — one asset by RID.
deriva_ml_get_execution(hostname, catalog_id, execution_rid) returns metadata files and output assets for an execution.

Related Skills

execution-lifecycle — Full execution lifecycle including asset upload patterns
ml-data-engineering — Downloading and restructuring assets for ML training
dataset-lifecycle — Datasets organize assets into versioned collections for reproducibility
/deriva:create-table (deriva-skills) — Generic table creation via create_table, used to build new asset tables
/deriva:load-data (deriva-skills) — Row-side loading once the asset table exists; covers both ad-hoc inserts and the production deriva-upload-cli path with upload specs (asset_mappings)
/deriva:manage-vocabulary (deriva-skills) — Generic vocabulary CRUD via add_term/delete_term, used to manage Asset_Type terms
