원클릭으로 Manus에서 모든 스킬 실행

$pwd:

spark-optimization

Name: Spark Optimization
Author: wshobson

// Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

Manus에서 실행

$ git log --oneline --stat

stars:36,165

forks:3,920

updated:2026년 5월 22일 12:18

파일 탐색기

2 개 파일

SKILL.md

readonly

related-skills.json

같은 저장소

wcag-audit-patterns.md

from "wshobson/agents"

Conduct WCAG 2.2 accessibility audits with automated testing, manual verification, and remediation guidance. Use when auditing websites for accessibility, fixing WCAG violations, or implementing accessible design patterns.

2026-05-2236.2k

fastapi-templates.md

from "wshobson/agents"

Create production-ready FastAPI projects with async patterns, dependency injection, and comprehensive error handling. Use when building new FastAPI applications or setting up backend API projects.

2026-05-2236.2k

api-design-principles.md

from "wshobson/agents"

Master REST and GraphQL API design principles to build intuitive, scalable, and maintainable APIs that delight developers. Use when designing new APIs, reviewing API specifications, or establishing API design standards.

2026-05-2236.2k

architecture-patterns.md

from "wshobson/agents"

Implement proven backend architecture patterns including Clean Architecture, Hexagonal Architecture, and Domain-Driven Design. Use this skill when designing clean architecture for a new microservice, when refactoring a monolith to use bounded contexts, when implementing hexagonal or onion architecture patterns, or when debugging dependency cycles between application layers.

2026-05-2236.2k

cqrs-implementation.md

from "wshobson/agents"

Implement Command Query Responsibility Segregation for scalable architectures. Use when separating read and write models, optimizing query performance, or building event-sourced systems.

2026-05-2236.2k

event-store-design.md

from "wshobson/agents"

Design and implement event stores for event-sourced systems. Use when building event sourcing infrastructure, choosing event store technologies, or implementing event persistence patterns.

2026-05-2236.2k

package.json

"author": "wshobson"

"repository": "wshobson/agents"

GitHub 저장소 열기 Creator 저장소 보기

$ install --global

$ download --local

Manus에서 실행

$ useful --forSOC

데이터 과학자컴퓨터 및 수학직15-2051L4

name	spark-optimization
description	Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

Apache Spark Optimization

Production patterns for optimizing Apache Spark jobs including partitioning strategies, memory management, shuffle optimization, and performance tuning.

When to Use This Skill

Optimizing slow Spark jobs
Tuning memory and executor configuration
Implementing efficient partitioning strategies
Debugging Spark performance issues
Scaling Spark pipelines for large datasets
Reducing shuffle and data skew

Core Concepts

1. Spark Execution Model

Driver Program
    ↓
Job (triggered by action)
    ↓
Stages (separated by shuffles)
    ↓
Tasks (one per partition)

2. Key Performance Factors

Factor	Impact	Solution
Shuffle	Network I/O, disk I/O	Minimize wide transformations
Data Skew	Uneven task duration	Salting, broadcast joins
Serialization	CPU overhead	Use Kryo, columnar formats
Memory	GC pressure, spills	Tune executor memory
Partitions	Parallelism	Right-size partitions

Quick Start

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Create optimized Spark session
spark = (SparkSession.builder
    .appName("OptimizedJob")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate())

# Read with optimized settings
df = (spark.read
    .format("parquet")
    .option("mergeSchema", "false")
    .load("s3://bucket/data/"))

# Efficient transformations
result = (df
    .filter(F.col("date") >= "2024-01-01")
    .select("id", "amount", "category")
    .groupBy("category")
    .agg(F.sum("amount").alias("total")))

result.write.mode("overwrite").parquet("s3://bucket/output/")

Detailed patterns and worked examples

Detailed pattern documentation lives in references/details.md. Read that file when the navigation tier above is insufficient.

Best Practices

Do's

Enable AQE - Adaptive query execution handles many issues
Use Parquet/Delta - Columnar formats with compression
Broadcast small tables - Avoid shuffle for small joins
Monitor Spark UI - Check for skew, spills, GC
Right-size partitions - 128MB - 256MB per partition

Don'ts

Don't collect large data - Keep data distributed
Don't use UDFs unnecessarily - Use built-in functions
Don't over-cache - Memory is limited
Don't ignore data skew - It dominates job time
Don't use .count() for existence - Use .take(1) or .isEmpty()

spark-optimization

이 저장소의 다른 Skills

이 저장소의 다른 Skills

Apache Spark Optimization

When to Use This Skill

Core Concepts

1. Spark Execution Model

2. Key Performance Factors

Quick Start

Detailed patterns and worked examples

Best Practices

Do's

Don'ts

Apache Spark Optimization

When to Use This Skill

Core Concepts

1. Spark Execution Model

2. Key Performance Factors

Quick Start

Detailed patterns and worked examples

Best Practices

Do's

Don'ts