pandera

Updated: February 25, 2026, 18:18

This skill should be used when the user asks to "validate a DataFrame with pandera", "write a pandera schema", "use pandera DataFrameModel", "add data validation to a pipeline", or needs guidance on pandera best practices for data quality.

Source: the-perfect-developer/the-perfect-opencode

Pandera: DataFrame Validation

Pandera is an open-source framework for validating DataFrame-like objects at runtime. Define schemas once and reuse them across pandas, polars, Dask, Modin, PySpark, and Ibis backends.

Import Convention

Since pandera v0.24.0, use the backend-specific module. Using the top-level pandera module produces a FutureWarning and will be deprecated in v0.29.0.

import pandera.pandas as pa          # pandas backend
import pandera.polars as pa          # polars backend (import one backend as `pa`, not both)
from pandera.typing.pandas import DataFrame, Series, Index

Two Schema Styles

Object-based API (`DataFrameSchema`)

Suitable when a schema must be built dynamically, for example from configuration or other runtime data.

import pandas as pd
import pandera.pandas as pa

schema = pa.DataFrameSchema({
    "user_id": pa.Column(int, pa.Check.gt(0)),
    "email": pa.Column(str, pa.Check.str_matches(r"^[^@]+@[^@]+\.[^@]+$")),
    "score": pa.Column(float, [pa.Check.ge(0.0), pa.Check.le(1.0)]),
    "status": pa.Column(str, pa.Check.isin(["active", "inactive", "banned"])),
})

validated = schema.validate(df)

Class-based API (`DataFrameModel`) — preferred

Pydantic-style syntax with type annotations. Produces cleaner, reusable schemas that integrate with @pa.check_types.

import pandera.pandas as pa
from pandera.typing.pandas import DataFrame, Series

class UserSchema(pa.DataFrameModel):
    user_id: int = pa.Field(gt=0)
    email: str = pa.Field(str_matches=r"^[^@]+@[^@]+\.[^@]+$")
    score: float = pa.Field(ge=0.0, le=1.0)
    status: str = pa.Field(isin=["active", "inactive", "banned"])

    class Config:
        strict = True       # reject extra columns
        coerce = False      # do not silently cast types

# Validate directly
UserSchema.validate(df)

# Or via typing annotation + decorator
@pa.check_types
def process(df: DataFrame[UserSchema]) -> DataFrame[UserSchema]:
    return df

Checks

Built-in Checks (prefer these over lambdas)

pa.Check.gt(0)               # greater than
pa.Check.ge(0)               # greater than or equal
pa.Check.lt(100)             # less than
pa.Check.le(100)             # less than or equal
pa.Check.eq("value")         # equal to
pa.Check.ne("value")         # not equal to
pa.Check.isin(["a", "b"])    # membership
pa.Check.notin(["x"])        # exclusion
pa.Check.str_matches(r"^\d+$")  # regex match
pa.Check.in_range(0, 100)    # closed interval
pa.Check.str_startswith("prefix")
pa.Check.str_endswith("suffix")
pa.Check.str_length(1, 255)  # min/max string length

Custom Checks

# Vectorized (default, faster — operates on the whole Series)
pa.Check(lambda s: s.str.len() <= 255)

# Element-wise (scalar input, use only when vectorized is impractical)
pa.Check(lambda x: x > 0, element_wise=True)

# Always add an error message
pa.Check(lambda s: s > 0, error="values must be positive")

DataFrame-level Checks

schema = pa.DataFrameSchema(
    columns={...},
    checks=pa.Check(lambda df: df["end_date"] >= df["start_date"]),
)

In DataFrameModel, use @pa.dataframe_check:

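import pandas as pd
import pandera.pandas as pa

class Schema(pa.DataFrameModel):
    start_date: int
    end_date: int

    @pa.dataframe_check
    @classmethod
    def end_after_start(cls, df: pd.DataFrame) -> pd.Series:
        # Row-wise boolean mask: any row with end_date < start_date fails
        return df["end_date"] >= df["start_date"]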

Nullable and Optional Columns

# Object API: allow nulls in a column
pa.Column(float, nullable=True)

# DataFrameModel: make a column optional (may be absent)
from typing import Optional

import pandera.pandas as pa
from pandera.typing.pandas import Series

class Schema(pa.DataFrameModel):
    required_col: Series[int]
    optional_col: Optional[Series[float]]

Coercion

Enable coercion to cast data to the declared type before validation. Use deliberately — coercion can hide upstream data issues.

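# Per-column (object API)
pa.Column(int, coerce=True)

# Per-field (DataFrameModel)
class YearSchema(pa.DataFrameModel):
    year: int = pa.Field(gt=2000, coerce=True)

# Schema-wide via Config: every column is coerced
class CoercedSchema(pa.DataFrameModel):
    year: int = pa.Field(gt=2000)

    class Config:
        coerce = True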

Lazy Validation — Collect All Errors

By default pandera raises on the first error. Pass lazy=True to collect all failures before raising, which is useful for batch reporting.

try:
    schema.validate(df, lazy=True)
except pa.errors.SchemaErrors as exc:
    print(exc.failure_cases)   # DataFrame of all failures

Decorator Integration

Integrate validation transparently into pipelines using decorators.

# DataFrameModel + check_types (recommended)
@pa.check_types
def transform(df: DataFrame[InputSchema]) -> DataFrame[OutputSchema]:
    return df.assign(revenue=df["units"] * df["price"])

# Object API: check_input / check_output
@pa.check_input(input_schema)
@pa.check_output(output_schema)
def pipeline_step(df):
    return df

# check_io: validate inputs (by argument name) and the output (out=) in one decorator
@pa.check_io(raw=input_schema, out=output_schema)
def pipeline_step_io(raw):
    return raw

Decorators work on sync/async functions, methods, class methods, and static methods.

Schema Inheritance

Build specialized schemas from a base to avoid repetition.

class BaseEvent(pa.DataFrameModel):
    event_id: str
    timestamp: int = pa.Field(gt=0)

class ClickEvent(BaseEvent):
    url: str
    user_agent: str

    class Config:
        strict = True

Schema Persistence (YAML / Script)

Serialize and reload schemas to keep validation reproducible.

import pandera.io

# Save
pandera.io.to_yaml(schema, "./schema.yaml")

# Load
schema = pandera.io.from_yaml("./schema.yaml")

# Generate Python script
pandera.io.to_script(schema, "./schema_definition.py")

Schema Inference (Prototyping Only)

Infer a schema from existing data to bootstrap development. Always review and tighten the generated schema before using in production.

import pandera.pandas as pa

inferred = pa.infer_schema(df)
print(inferred.to_script())   # inspect then copy-edit

Dropping Invalid Rows

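Use drop_invalid_rows=True on DataFrameSchema to filter out failing rows instead of raising an error. It requires lazy validation, so call validate(..., lazy=True). Supported on pandas and polars.

import pandas as pd
import pandera.pandas as pa

df_with_bad_rows = pd.DataFrame({"score": [0.5, -1.0, 2.0]})
schema = pa.DataFrameSchema(
    {"score": pa.Column(float, pa.Check.ge(0))},
    drop_invalid_rows=True,
)
cleaned = schema.validate(df_with_bad_rows, lazy=True)  # keeps the 0.5 and 2.0 rows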

Error Handling

from pandera.errors import SchemaError, SchemaErrors

# Single error (eager validation)
try:
    schema.validate(df)
except SchemaError as exc:
    print(exc.failure_cases)   # Series/DataFrame of failures

# Multiple errors (lazy validation)
try:
    schema.validate(df, lazy=True)
except SchemaErrors as exc:
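    # str(exc) is a JSON-style report grouped under SCHEMA and DATA keys
    print(exc.error_counts)    # counts of errors per error type
    print(exc.failure_cases)   # DataFrame with one row per failure case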

Key Configuration Options (`Config`)

Option	Type	Effect
`strict`	`bool` or `"filter"`	Raise if extra columns are present; `"filter"` drops them instead
`coerce`	`bool`	Cast columns to their declared dtypes
`ordered`	`bool`	Require columns in the declared order
`name`	`str`	Schema name shown in error messages
`add_missing_columns`	`bool`	Insert missing columns, filled with the field default (or nulls if nullable)

Best Practices

Use DataFrameModel over DataFrameSchema for new code — cleaner syntax, inheritance, and type-annotation integration.
Prefer strict=True to catch unexpected extra columns early.
Use built-in checks (Check.gt, Check.isin, etc.) over custom lambdas where possible — they produce better error messages.
Write vectorized checks (element_wise=False, the default) for performance; only use element_wise=True when the logic is truly scalar.
Always add error= messages to custom Check objects to improve debuggability.
Use lazy validation in pipelines that process large batches so all failures surface in one pass.
Never rely on inferred schemas in production — always explicitly define constraints.
Use coerce=True deliberately — set at the column level to limit scope; avoid schema-wide coercion unless certain.
Reserve raise_warning=True for non-critical, informational checks (e.g., normality tests); never use it for data integrity constraints.

Additional Resources

references/checks-and-validation.md — Built-in check catalog, groupby checks, wide checks, hypothesis testing
references/dataframe-models.md — Field spec, schema inheritance, MultiIndex, aliases, parsers, Polars usage
