pyspark-databricks
Build and optimize PySpark pipelines on Databricks.
| name | pyspark-databricks |
| description | Build and optimize PySpark pipelines on Databricks. |
Use when you need help authoring or tuning PySpark on Databricks. Ask clarifying questions about data volume, schema, and SLAs.
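collect on large data

```python
from pyspark.sql import functions as F

# `spark` is the SparkSession that Databricks notebooks provide.
# Load raw events (Parquet) and the user table (CSV with a header row).
events = spark.read.format("parquet").load("/mnt/raw/events/")
users = spark.read.option("header", "true").csv("/mnt/raw/users.csv")

# Join events to users on user_id, then count events per country.
result = (
    events.join(users, "user_id")
    .groupBy("country")
    .agg(F.count("*").alias("events"))
)

# Overwrite the Delta output, partitioned by country.
(result.write.format("delta")
    .mode("overwrite")
    .partitionBy("country")
    .save("/mnt/delta/event_counts"))
```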
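The note about `collect` on large data can be illustrated with a small helper. This is a minimal sketch, not part of the original skill: `preview_rows` and its `max_rows` parameter are hypothetical names, and it assumes the goal is only to inspect a few rows on the driver. It relies on `DataFrame.limit` and `DataFrame.collect`, so the number of rows returned to the driver is bounded.

```python
def preview_rows(df, max_rows=20):
    """Return at most `max_rows` rows from a Spark DataFrame as a list.

    Hypothetical helper: calling df.collect() directly pulls every row to
    the driver, which can exhaust driver memory on large data. Applying
    limit() first caps how many rows are returned.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be a positive integer")
    # limit() is a lazy transformation; collect() then returns a bounded list.
    return df.limit(max_rows).collect()
```

In a notebook this might be called as `preview_rows(result, 10)` to spot-check the aggregated counts; the full result would still be written out with `result.write` rather than collected.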
## PySpark Pipeline Update
### Summary
- [What changed]
- [Optimization or reliability gain]
### Code
```python
[PySpark code]
```