datachain-jobs
Use when asked about Studio job analytics: compute hours, user spend, failure rates, cost estimation, cluster usage. Generates and maintains dc-knowledge/jobs/index.md.
| Field | Value |
|-------|-------|
| name | datachain-jobs |
| description | Use when asked about Studio job analytics: compute hours, user spend, failure rates, cost estimation, cluster usage. Generates and maintains dc-knowledge/jobs/index.md. |
| triggers | ["how many hours","compute time","who ran jobs","failed jobs","job cost","cluster usage","studio jobs","job analytics","job history","how much did we spend"] |
You are now loaded with the datachain-jobs skill. Maintain a jobs analytics file at dc-knowledge/jobs/index.md. Follow the 3-step flow below exactly.
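As a rough picture of that flow, the sketch below maps the plan output to the next action and assembles the fetch flags. It assumes the plan command prints a JSON object with the `studio_available` and `up_to_date` keys described below; the `error` key and both function names are hypothetical, used only for illustration.

```python
import json


def next_step(plan_json: str) -> str:
    """Map the --plan output to the next action in the 3-step flow."""
    plan = json.loads(plan_json)
    if plan.get("studio_available") is False:
        # Studio is unreachable: report the error message and stop.
        # The "error" key is an assumed name for that message.
        return "stop: " + str(plan.get("error", "Studio unavailable"))
    if plan.get("up_to_date") is True:
        # The index is current: go straight to answering (Step 3).
        return "step3"
    # Otherwise the index must be fetched and rewritten (Step 2).
    return "step2"


def fetch_args(days=None, limit=None, enrich=False):
    """Build the argument list for the --fetch call; --days defaults to 30."""
    args = ["--fetch", "--days", str(days if days is not None else 30)]
    if limit is not None:
        args += ["--limit", str(limit)]
    if enrich:
        # One API call per terminal job, so only add this when it is needed.
        args.append("--enrich")
    return args
```

For a request such as "last 7 days", `fetch_args(days=7)` yields the flags to append to the `jobs.py` invocation.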
Step 1. Run the plan command and read its output:
python3 {skill_dir}/scripts/jobs.py --plan
- "studio_available": false → report the error message and stop.
- "up_to_date": true → skip to Step 3.
- "up_to_date": false → continue to Step 2.
Step 2. Fetch the jobs and write the index:
python3 {skill_dir}/scripts/jobs.py --fetch [--days N] [--limit N] [--enrich]
- Use --days N from the user's request if stated (e.g. "last 7 days" → --days 7). Default: --days 30.
- Add --enrich only when the question requires duration, workers, or cluster data AND an existing index has enriched: false. Tell the user it makes one API call per terminal job.
Write dc-knowledge/jobs/index.md using EXACTLY this format:
---
generated: <generated from script output>
days_covered: <days_covered>
total_jobs: <filtered_count>
failed_count: <failed_count>
complete_count: <complete_count>
running_count: <running_count>
other_count: <other_count>
enriched: <true|false>
duration_note: "Wall-clock duration (submit→finish). Null when enriched=false or job still running."
truncated: <true|false>
---
## Clusters
| Name | Cloud | Max Workers | Default |
|------|-------|-------------|---------|
| <name> | <cloud_provider> | <max_workers> | <yes if is_default else no> |
## Jobs
| Date | ID | Name | Status | User | Workers | Duration | Cluster | Python |
|------|----|------|--------|------|---------|----------|---------|--------|
| <created_display> | <id> | <name> | <status> | <created_by> | <workers> | <duration_str or —> | <cluster_name or —> | <python_version or —> |
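To illustrate how one fetched job record might map onto a row of the Jobs table, here is a minimal sketch. It assumes each job is a dict keyed by the placeholder names in the template (`created_display`, `id`, `name`, `status`, `created_by`, `workers`, `duration_str`, `cluster_name`, `python_version`); the dict shape and the `job_row` helper are assumptions, not part of the script's documented output.

```python
DASH = "\u2014"  # placeholder the template uses for null values


def job_row(job: dict) -> str:
    """Render one job dict as a markdown row of the Jobs table."""
    cells = [
        job["created_display"],              # YYYY-MM-DD HH:MM UTC
        job["id"],
        job["name"],
        job["status"],
        job["created_by"],
        job.get("workers") or 1,             # workers field, defaults to 1
        job.get("duration_str") or DASH,     # e.g. "9000s" when known
        job.get("cluster_name") or DASH,
        job.get("python_version") or DASH,
    ]
    return "| " + " | ".join(str(c) for c in cells) + " |"
```

Null handling follows the section rules listed next: missing duration, cluster, and Python values fall back to the placeholder, and a missing worker count falls back to 1.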
Section rules:
- Omit ## Clusters if the clusters array is empty.
- Duration: use the duration_str value (e.g. "9000s") when known, — when null.
- Workers: use the job's worker count (workers field, defaults to 1).
- Cluster and Python: — when null.
- Date: use created_display (YYYY-MM-DD HH:MM UTC).
- If truncated: true, add after the table: _(Results truncated at <limit> jobs. Use --limit N for more.)_
Step 3. Read dc-knowledge/jobs/index.md and answer the user's question.
Duration cells contain plain seconds strings like "9000s". Parse the integer before the s, sum the values, then convert to hours by dividing by 3600.
- If Duration is — (enriched: false) → say: "Duration data requires enrichment. Re-fetch with: python3 {skill_dir}/scripts/jobs.py --fetch --enrich" and offer to do so.
- Failure rate: failed_count / total_jobs * 100 from frontmatter.
- Failed jobs: list the rows with status failed in the table.
When the user asks for cost:
- Ask the user for the hourly rate if it was not given (e.g. 3.20 for $3.20/hr).
- If Workers is — → ask: "How many workers per job?" or compute single-worker cost and note it.
- Cost per job: duration_seconds / 3600 × rate × workers. Group by user/day/cluster as requested.
For cluster or user usage, filter the Jobs table by the Cluster or User column. Aggregate (Ns) Duration values for totals.
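The arithmetic above can be sketched as follows. This is a minimal illustration that assumes the Jobs table has already been read into a list of dicts keyed by column name (`User`, `Workers`, `Duration`); the helper names are hypothetical, and rows without a known duration are skipped rather than estimated.

```python
import re
from collections import defaultdict


def parse_seconds(cell: str):
    """Return the integer seconds from a cell like "9000s", else None."""
    match = re.fullmatch(r"(\d+)s", cell.strip())
    return int(match.group(1)) if match else None


def total_hours(rows) -> float:
    """Sum all known durations and convert seconds to hours."""
    seconds = [parse_seconds(r["Duration"]) for r in rows]
    return sum(s for s in seconds if s is not None) / 3600


def failure_rate(failed_count: int, total_jobs: int) -> float:
    """failed_count / total_jobs * 100, using the frontmatter counts."""
    return failed_count / total_jobs * 100 if total_jobs else 0.0


def cost_by_user(rows, rate: float) -> dict:
    """Per-user cost: duration_seconds / 3600 * rate * workers."""
    totals = defaultdict(float)
    for row in rows:
        seconds = parse_seconds(row["Duration"])
        if seconds is None:
            continue  # duration unknown (not enriched or still running)
        # Fall back to a single worker when the count is not a number.
        workers = int(row["Workers"]) if row["Workers"].isdigit() else 1
        totals[row["User"]] += seconds / 3600 * rate * workers
    return dict(totals)
```

Grouping by day or cluster works the same way with a different key column. When the single-worker fallback is used, the answer should say so.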