pyspark-databricks

Stars: 9

Forks: 1

Updated: February 12, 2026 at 11:59

Build and optimize PySpark pipelines on Databricks.

Source

Awish021

Awish021/opencode

SKILL.md

name: pyspark-databricks
description: Build and optimize PySpark pipelines on Databricks.

PySpark Databricks

What I do

Build PySpark ETL pipelines on Databricks
Optimize Spark jobs for performance and cost
Apply Delta Lake patterns for reliability
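
One Delta Lake pattern that often helps reliability is an upsert with MERGE, so that rerunning the same batch does not duplicate rows. The sketch below is an illustration, not part of the original skill: `build_merge_sql`, `upsert`, the temp view name `updates_batch`, and the table and key names are hypothetical, and `spark` is assumed to be the session a Databricks notebook provides.

```python
def build_merge_sql(target: str, source_view: str, key: str) -> str:
    """Return a Delta Lake MERGE statement that upserts source rows into target.

    The names are interpolated as SQL identifiers, so pass only trusted values.
    """
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source_view} AS s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def upsert(spark, updates_df, target: str, key: str) -> None:
    """Upsert updates_df into the Delta table `target`, matching rows on `key`.

    Assumes `key` is unique within updates_df and that the batch has the
    same columns as the target table.
    """
    # Expose the incoming batch to SQL under a temporary (hypothetical) name.
    updates_df.createOrReplaceTempView("updates_batch")
    spark.sql(build_merge_sql(target, "updates_batch", key))
```

In a notebook this might be called as `upsert(spark, result, "analytics.event_counts", "country")`, assuming the target Delta table already exists.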

When to use me

Use me when you need help authoring or tuning PySpark on Databricks. Before proposing changes, ask clarifying questions about data volume, schema, and SLAs.

Quick checklist

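Avoid collect() on large DataFrames; it pulls every row to the driver
Partition data sources and filter on partition columns so Spark can prune files
Cache only DataFrames that several actions reuse, and unpersist them afterwards
Store tables in Delta Lake format
Prefer Spark SQL and the DataFrame API over RDD code and Python UDFs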
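
Several checklist items can be combined in one small function. This is a minimal sketch under stated assumptions: `daily_country_report` is a hypothetical helper, and `events` is assumed to be a Spark DataFrame with columns `event_date`, `country` and `user_id`, stored partitioned by `event_date`.

```python
def daily_country_report(events, day, preview_rows=20):
    """Count one day's events per country and return a bounded preview.

    Returns (distinct_user_count, preview), where preview is a list of at
    most `preview_rows` rows ordered by event count, largest first.
    """
    daily = (
        events
        # Filtering on the (assumed) partition column lets Spark skip
        # the files that belong to other days.
        .where(events["event_date"] == day)
        # Keep only the columns the report needs.
        .select("country", "user_id")
        # Two separate actions reuse this DataFrame below.
        .cache()
    )
    try:
        per_country = daily.groupBy("country").count()
        active_users = daily.select("user_id").distinct().count()
        # Bring back a bounded number of rows instead of collecting everything.
        preview = (
            per_country.orderBy("count", ascending=False)
            .limit(preview_rows)
            .collect()
        )
    finally:
        # Release the cached data once both actions have run.
        daily.unpersist()
    return active_users, preview
```

Caching is only worthwhile here because two actions read `daily`; with a single action the `cache()` and `unpersist()` calls could be dropped.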

Minimal examples

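from pyspark.sql import functions as F

# `spark` is the SparkSession that Databricks notebooks provide.
events = spark.read.format("parquet").load("/mnt/raw/events/")
# Without an explicit schema, every CSV column is read as a string.
users = spark.read.option("header", "true").csv("/mnt/raw/users.csv")

# Count events per country (country is assumed to come from users).
result = (
    events.join(users, "user_id")
    .groupBy("country")
    .agg(F.count("*").alias("events"))
)

# Overwrite the Delta table, with one partition directory per country.
(result.write.format("delta")
 .mode("overwrite")
 .partitionBy("country")
 .save("/mnt/delta/event_counts"))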
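
The minimal example above could be tuned along the lines of the checklist. The variant below is a sketch, not the author's method: `build_event_counts` is a hypothetical helper, the CSV schema `user_id STRING, country STRING` is an assumption about the data, and the broadcast hint only makes sense if the users file is small enough to fit in executor memory.

```python
def build_event_counts(spark, events_path, users_path):
    """Return a DataFrame of event counts per country.

    Assumes user_id is a string in both sources and that the users
    file is small (hypothetical schema, adjust to the real data).
    """
    # Read only the join key from the Parquet events (column pruning).
    events = spark.read.format("parquet").load(events_path).select("user_id")

    # An explicit schema avoids all-string guessing and a schema inference pass.
    users = (
        spark.read
        .option("header", "true")
        .schema("user_id STRING, country STRING")
        .csv(users_path)
    )

    return (
        # Hint Spark to ship the small users table to every executor
        # instead of shuffling the large events table.
        events.join(users.hint("broadcast"), "user_id")
        .groupBy("country")
        .count()
        .withColumnRenamed("count", "events")
    )
```

The result has the same `country` and `events` columns as `result` in the original example, so it can be written to Delta with the same `write` call.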

Output format

## PySpark Pipeline Update

### Summary
- [What changed]
- [Optimization or reliability gain]

### Code
```python
[PySpark code]
```

### Notes
- [Assumptions]
- [Next steps]
