Name: Databricks Jobs
Author: databricks

Develop and deploy Lakeflow Jobs on Databricks via DABs, Python SDK, or the CLI. Use when creating data engineering jobs with notebooks, Python wheels, SQL, dbt, or pipelines. Invoke BEFORE starting implementation.


Updated: May 28, 2026, 19:56

Repository: databricks/devhub


name	databricks-jobs
description	Develop and deploy Lakeflow Jobs on Databricks via DABs, Python SDK, or the CLI. Use when creating data engineering jobs with notebooks, Python wheels, SQL, dbt, or pipelines. Invoke BEFORE starting implementation.
compatibility	Requires databricks CLI (>= v0.292.0)
metadata	{"version":"0.2.0"}
parent	databricks-core

Lakeflow Jobs Development

FIRST: Use the parent databricks-core skill for CLI basics, authentication, profile selection, and data exploration commands.

Lakeflow Jobs orchestrate data workflows with multi-task DAGs, flexible triggers, and comprehensive monitoring. Jobs support diverse task types and can be managed via Asset Bundles (DABs), Python SDK, or CLI.

Reference Files

Use Case	Reference File
Configure task types (notebook, Python, SQL, dbt, pipeline, JAR, run_job, for_each)	task-types.md
Set up triggers and schedules (cron, periodic, file arrival, table update, continuous)	triggers-schedules.md
Configure notifications, health rules, retries, timeouts, queues	notifications-monitoring.md
Complete worked examples (ETL, warehouse refresh, event-driven, ML training, multi-env, streaming, cross-job)	examples.md

Scaffolding a New Job Project

Use databricks bundle init with a config file to scaffold non-interactively. This creates a project in the <project_name>/ directory:

databricks bundle init default-python --config-file <(echo '{"project_name": "my_job", "include_job": "yes", "include_pipeline": "no", "include_python": "yes", "serverless": "yes"}') --profile <PROFILE> < /dev/null

project_name: letters, numbers, underscores only
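
The config passed to `databricks bundle init` above is plain JSON, so it can be generated and checked before the CLI is invoked. The following is a minimal sketch, assuming the naming rule stated above (letters, numbers, underscores only); the helper names `is_valid_project_name` and `build_init_config` are hypothetical and not part of the CLI.

```python
import json
import re

# Assumption: encodes the rule stated above (letters, numbers, underscores only).
_PROJECT_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_project_name(name: str) -> bool:
    """Return True if the name uses only letters, numbers, and underscores."""
    return _PROJECT_NAME.fullmatch(name) is not None


def build_init_config(project_name: str) -> str:
    """Return the JSON text to hand to `databricks bundle init --config-file`."""
    if not is_valid_project_name(project_name):
        raise ValueError(f"invalid project_name: {project_name!r}")
    config = {
        "project_name": project_name,
        "include_job": "yes",
        "include_pipeline": "no",
        "include_python": "yes",
        "serverless": "yes",
    }
    return json.dumps(config)


print(build_init_config("my_job"))
```

Writing this text to a file and passing its path to `--config-file` should be equivalent to the process substitution used in the command above.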

After scaffolding, create CLAUDE.md and AGENTS.md in the project directory. These files tell agents how to work with the project. Use this content:

# Declarative Automation Bundles Project

This project uses Declarative Automation Bundles (formerly Databricks Asset Bundles) for deployment.

## Prerequisites

Install the Databricks CLI (>= v0.288.0) if not already installed:
- macOS: `brew tap databricks/tap && brew install databricks`
- Linux: `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`
- Windows: `winget install Databricks.DatabricksCLI`

Verify: `databricks -v`

## For AI Agents

Read the `databricks-core` skill for CLI basics, authentication, and deployment workflow.
Read the `databricks-jobs` skill for job-specific guidance.

If skills are not available, install them: `databricks experimental aitools install`

Project Structure

my_job/
├── databricks.yml              # Bundle configuration
├── resources/
│   └── my_job.job.yml          # Job definition
├── src/
│   ├── my_notebook.ipynb       # Notebook tasks
│   └── my_module/              # Python wheel package
│       ├── __init__.py
│       └── main.py
├── tests/
│   └── test_main.py
└── pyproject.toml              # Python project config (if using wheels)

Quick Start

Asset Bundles (DABs) — recommended

# resources/jobs.yml
resources:
  jobs:
    my_etl_job:
      name: "[${bundle.target}] My ETL Job"
      tasks:
        - task_key: extract
          notebook_task:
            notebook_path: ../src/notebooks/extract.py

Python SDK

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import Task, NotebookTask, Source

w = WorkspaceClient()

job = w.jobs.create(
    name="my-etl-job",
    tasks=[
        Task(
            task_key="extract",
            notebook_task=NotebookTask(
                notebook_path="/Workspace/Shared/etl/extract",
                source=Source.WORKSPACE,
            ),
        ),
    ],
)
print(f"Created job: {job.job_id}")

CLI

databricks jobs create --json '{
  "name": "my-etl-job",
  "tasks": [{
    "task_key": "extract",
    "notebook_task": {
      "notebook_path": "/Workspace/Shared/etl/extract",
      "source": "WORKSPACE"
    }
  }]
}'

Core Concepts

Multi-Task Workflows

Jobs support DAG-based task dependencies:

tasks:
  - task_key: extract
    notebook_task:
      notebook_path: ../src/extract.py

  - task_key: transform
    depends_on:
      - task_key: extract
    notebook_task:
      notebook_path: ../src/transform.py

  - task_key: load
    depends_on:
      - task_key: transform
    run_if: ALL_SUCCESS  # Only run if all dependencies succeed
    notebook_task:
      notebook_path: ../src/load.py
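
To see how `depends_on` shapes the run order, the task list above can be checked locally before deploying. This is a rough sketch using the standard-library `graphlib` module; it assumes tasks are represented as plain dictionaries mirroring the YAML, and `execution_order` is a hypothetical helper, not a Databricks API.

```python
from graphlib import TopologicalSorter


def execution_order(tasks: list[dict]) -> list[str]:
    """Validate depends_on references and return one valid run order."""
    keys = {task["task_key"] for task in tasks}
    graph = {}
    for task in tasks:
        deps = [dep["task_key"] for dep in task.get("depends_on", [])]
        unknown = [dep for dep in deps if dep not in keys]
        if unknown:
            raise KeyError(f"{task['task_key']} depends on unknown task(s): {unknown}")
        graph[task["task_key"]] = deps
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


tasks = [
    {"task_key": "extract"},
    {"task_key": "transform", "depends_on": [{"task_key": "extract"}]},
    {"task_key": "load", "depends_on": [{"task_key": "transform"}]},
]
print(execution_order(tasks))  # ['extract', 'transform', 'load']
```

A misspelled `task_key` in `depends_on` surfaces here as a `KeyError` instead of as a failed deployment.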

run_if conditions:

ALL_SUCCESS (default) — run when all dependencies succeed
ALL_DONE — run when all dependencies complete (success or failure)
AT_LEAST_ONE_SUCCESS — run when at least one dependency succeeds
NONE_FAILED — run when no dependencies failed
ALL_FAILED — run when all dependencies failed
AT_LEAST_ONE_FAILED — run when at least one dependency failed
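
The conditions listed above can be read as predicates over the outcomes of a task's dependencies. The sketch below is a simplified local model, assuming each dependency ends as either success or failure and that the task has at least one dependency. Any other run states are not modeled, so `ALL_SUCCESS` and `NONE_FAILED` coincide here. `should_run` is a hypothetical helper.

```python
def should_run(run_if: str, dependency_succeeded: list[bool]) -> bool:
    """Evaluate a run_if condition against dependency outcomes (True = success)."""
    succeeded = sum(dependency_succeeded)
    failed = len(dependency_succeeded) - succeeded
    rules = {
        "ALL_SUCCESS": failed == 0,
        "ALL_DONE": True,  # every dependency has finished, whatever the outcome
        "AT_LEAST_ONE_SUCCESS": succeeded >= 1,
        "NONE_FAILED": failed == 0,
        "ALL_FAILED": succeeded == 0,
        "AT_LEAST_ONE_FAILED": failed >= 1,
    }
    return rules[run_if]


# A cleanup task that should run even when an upstream task failed:
print(should_run("ALL_DONE", [True, False]))     # True
print(should_run("ALL_SUCCESS", [True, False]))  # False
```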

Task Types Summary

Task Type	Use Case	Reference
`notebook_task`	Run notebooks	task-types.md#notebook-task
`spark_python_task`	Run Python scripts	task-types.md#spark-python-task
`python_wheel_task`	Run Python wheels	task-types.md#python-wheel-task
`sql_task`	Run SQL queries/files/dashboards/alerts	task-types.md#sql-task
`dbt_task`	Run dbt projects	task-types.md#dbt-task
`pipeline_task`	Trigger SDP (formerly DLT) pipelines	task-types.md#pipeline-task
`spark_jar_task`	Run Spark JARs	task-types.md#spark-jar-task
`run_job_task`	Trigger other jobs	task-types.md#run-job-task
`for_each_task`	Loop over inputs	task-types.md#for-each-task

Trigger Types Summary

Trigger Type	Use Case	Reference
`schedule`	Cron-based scheduling	triggers-schedules.md#cron-schedule
`trigger.periodic`	Interval-based	triggers-schedules.md#periodic-trigger
`trigger.file_arrival`	File arrival events	triggers-schedules.md#file-arrival-trigger
`trigger.table_update`	Unity Catalog table change events	triggers-schedules.md#table-update-trigger
`continuous`	Always-running jobs	triggers-schedules.md#continuous-jobs

Compute Configuration

Job Clusters (recommended)

Define reusable cluster configurations shared across tasks:

job_clusters:
  - job_cluster_key: shared_cluster
    new_cluster:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "i3.xlarge"
      num_workers: 2
      spark_conf:
        spark.speculation: "true"

tasks:
  - task_key: my_task
    job_cluster_key: shared_cluster
    notebook_task:
      notebook_path: ../src/notebook.py
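
Because tasks refer to a shared cluster only by `job_cluster_key`, a typo in that key is easy to miss. The following is a small sketch of a local check, assuming the job definition has been loaded into a dictionary with the same field names as the YAML above; `undefined_cluster_keys` is a hypothetical helper.

```python
def undefined_cluster_keys(job: dict) -> set[str]:
    """Return job_cluster_key values used by tasks but never defined."""
    defined = {c["job_cluster_key"] for c in job.get("job_clusters", [])}
    used = {
        t["job_cluster_key"] for t in job.get("tasks", []) if "job_cluster_key" in t
    }
    return used - defined


job = {
    "job_clusters": [{"job_cluster_key": "shared_cluster", "new_cluster": {}}],
    "tasks": [
        {"task_key": "my_task", "job_cluster_key": "shared_cluster"},
        {"task_key": "other_task", "job_cluster_key": "shared_clusters"},  # typo
    ],
}
print(undefined_cluster_keys(job))  # {'shared_clusters'}
```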

Autoscaling Clusters

new_cluster:
  spark_version: "15.4.x-scala2.12"
  node_type_id: "i3.xlarge"
  autoscale:
    min_workers: 2
    max_workers: 8

Existing Cluster

tasks:
  - task_key: my_task
    existing_cluster_id: "0123-456789-abcdef12"
    notebook_task:
      notebook_path: ../src/notebook.py

Serverless Compute

For notebook and Python tasks, omit cluster configuration to use serverless:

tasks:
  - task_key: serverless_task
    notebook_task:
      notebook_path: ../src/notebook.py
    # No cluster config = serverless
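
The compute options in this section differ only in which cluster field a task carries. As a rough illustration, the sketch below classifies a task dictionary by those fields, following the rule stated above that a task with no cluster configuration runs on serverless. The function name and the returned labels are hypothetical, and the rule is only meant for the task types that support serverless.

```python
def compute_kind(task: dict) -> str:
    """Classify a task by the cluster field it carries (labels are illustrative)."""
    if "job_cluster_key" in task:
        return "job_cluster"
    if "existing_cluster_id" in task:
        return "existing_cluster"
    if "new_cluster" in task:
        return "new_cluster"
    # No cluster configuration at all: treated as serverless, per the text above.
    return "serverless"


tasks = [
    {"task_key": "a", "job_cluster_key": "shared_cluster"},
    {"task_key": "b", "existing_cluster_id": "0123-456789-abcdef12"},
    {"task_key": "c", "notebook_task": {"notebook_path": "../src/notebook.py"}},
]
for task in tasks:
    print(task["task_key"], compute_kind(task))
```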

Job Parameters

Parameters defined at job level are passed to ALL tasks (no need to repeat per task):

parameters:
  - name: env
    default: "dev"
  - name: date
    default: "{{job.start_time.iso_date}}"  # Dynamic value reference

Access in notebooks:

env = dbutils.widgets.get("env")
load_date = dbutils.widgets.get("date")

Pass to specific tasks:

tasks:
  - task_key: my_task
    notebook_task:
      notebook_path: ../src/notebook.py
      base_parameters:
        env: "{{job.parameters.env}}"
        custom_param: "value"
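
The interplay between job-level defaults, per-run overrides, and `{{job.parameters.<name>}}` references can be simulated locally. This is only an illustrative model, assuming that values supplied at run time take precedence over job-level defaults. The helper names are hypothetical and the real substitution happens on the platform, not in user code.

```python
import re

# Matches references of the form {{job.parameters.<name>}}.
_REF = re.compile(r"\{\{job\.parameters\.(\w+)\}\}")


def effective_parameters(defaults: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    """Merge job-level defaults with run-time values (run-time values win)."""
    return {**defaults, **overrides}


def resolve(value: str, job_params: dict[str, str]) -> str:
    """Replace each {{job.parameters.<name>}} reference with its value."""
    return _REF.sub(lambda match: job_params[match.group(1)], value)


params = effective_parameters({"env": "dev"}, {"env": "prod"})
base_parameters = {"env": "{{job.parameters.env}}", "custom_param": "value"}
resolved = {key: resolve(val, params) for key, val in base_parameters.items()}
print(resolved)  # {'env': 'prod', 'custom_param': 'value'}
```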

Common Operations

Python SDK

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# List jobs
jobs = w.jobs.list()

# Get job details
job = w.jobs.get(job_id=12345)

# Run job now
run = w.jobs.run_now(job_id=12345)

# Run with parameters
run = w.jobs.run_now(
    job_id=12345,
    job_parameters={"env": "prod", "date": "2024-01-15"},
)

# Cancel run
w.jobs.cancel_run(run_id=run.run_id)

# Delete job
w.jobs.delete(job_id=12345)

CLI

# List jobs
databricks jobs list

# Get job details
databricks jobs get 12345

# Run job
databricks jobs run-now 12345

# Run with parameters (must use --json with job_id inside)
databricks jobs run-now --json '{"job_id": 12345, "job_parameters": {"env": "prod"}}'

# Cancel run
databricks jobs cancel-run 67890

# Delete job
databricks jobs delete 12345
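
Hand-writing the `--json` payload for `run-now` is prone to quoting mistakes. The sketch below builds the same argument list as the parameterized `run-now` call above using `json.dumps`; it only constructs the command and does not execute it. `run_now_command` is a hypothetical helper.

```python
import json
import shlex


def run_now_command(job_id: int, job_parameters: dict[str, str]) -> list[str]:
    """Build the argv for `databricks jobs run-now` with a JSON payload."""
    payload = json.dumps({"job_id": job_id, "job_parameters": job_parameters})
    return ["databricks", "jobs", "run-now", "--json", payload]


cmd = run_now_command(12345, {"env": "prod"})
# shlex.join renders the list as a single shell-safe command line.
print(shlex.join(cmd))
```

Passing the list directly to `subprocess.run(cmd, check=True)` avoids shell quoting altogether.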

Asset Bundle Operations

# Validate configuration
databricks bundle validate --profile <profile>

# Deploy to a target
databricks bundle deploy -t dev --profile <profile>

# Run a job
databricks bundle run <job_name> -t dev --profile <profile>

# Check run status
databricks jobs get-run <run_id> --profile <profile>

# Destroy resources
databricks bundle destroy --auto-approve

Permissions (DABs)

resources:
  jobs:
    my_job:
      name: "My Job"
      permissions:
        - level: CAN_VIEW
          group_name: "data-analysts"
        - level: CAN_MANAGE_RUN
          group_name: "data-engineers"
        - level: CAN_MANAGE
          user_name: "admin@example.com"

Permission levels:

CAN_VIEW — view job and run history
CAN_MANAGE_RUN — view, trigger, and cancel runs
CAN_MANAGE — full control including edit and delete
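
The three levels above are cumulative, which a small lookup table makes explicit. This is an illustrative sketch only: the action names (`view`, `trigger`, `cancel`, `edit`, `delete`) are labels taken from the descriptions above, not identifiers from the Databricks API, and `can` is a hypothetical helper.

```python
# Actions each level allows, per the descriptions above (labels are illustrative).
ALLOWED_ACTIONS = {
    "CAN_VIEW": {"view"},
    "CAN_MANAGE_RUN": {"view", "trigger", "cancel"},
    "CAN_MANAGE": {"view", "trigger", "cancel", "edit", "delete"},
}


def can(level: str, action: str) -> bool:
    """Return True if the permission level allows the action."""
    return action in ALLOWED_ACTIONS[level]


print(can("CAN_MANAGE_RUN", "trigger"))  # True
print(can("CAN_MANAGE_RUN", "delete"))   # False
```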

Unit Testing

Run unit tests locally:

uv run pytest

Development Workflow

Validate: databricks bundle validate --profile <profile>
Deploy: databricks bundle deploy -t dev --profile <profile>
Run: databricks bundle run <job_name> -t dev --profile <profile>
Check run status: databricks jobs get-run <run_id> --profile <profile>
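
The validate, deploy, and run steps above can be chained so that a failure stops the sequence. The following is a minimal sketch, assuming the `databricks` CLI is on the PATH; the function names are hypothetical, and the `runner` parameter exists only so the sequence can be exercised without invoking the CLI.

```python
import subprocess


def workflow_commands(job_name: str, target: str, profile: str) -> list[list[str]]:
    """Return the validate, deploy, and run commands in order."""
    common = ["--profile", profile]
    return [
        ["databricks", "bundle", "validate", *common],
        ["databricks", "bundle", "deploy", "-t", target, *common],
        ["databricks", "bundle", "run", job_name, "-t", target, *common],
    ]


def run_workflow(commands: list[list[str]], runner=subprocess.run) -> None:
    """Run each command in turn; check=True aborts on the first failure."""
    for cmd in commands:
        runner(cmd, check=True)


# Dry run: print the commands instead of executing them.
run_workflow(
    workflow_commands("my_job", "dev", "DEFAULT"),
    runner=lambda cmd, check: print(" ".join(cmd)),
)
```

With the default `runner`, a non-zero exit code raises `subprocess.CalledProcessError`, so a failed validation prevents the deploy step from running.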

Common Issues

Issue	Solution
Job cluster startup slow	Use job clusters with `job_cluster_key` for reuse across tasks
Task dependencies not working	Verify `task_key` references match exactly in `depends_on`
Schedule not triggering	Check that `pause_status` is `UNPAUSED` and that the timezone ID is valid
File arrival not detecting	Ensure path has proper permissions and uses cloud storage URL
Table update trigger missing events	Verify Unity Catalog table and proper grants
Parameter not accessible	Use `dbutils.widgets.get()` in notebooks
`admins` group error	Permissions for the `admins` group cannot be modified on jobs
Serverless task fails	Ensure task type supports serverless (notebook, Python)

Related Skills

databricks-dabs — DABs configuration patterns shared by jobs and pipelines
databricks-pipelines — SDP (formerly DLT) pipelines triggered by pipeline_task
