data-validation
Implement comprehensive data validation, quality checks, and testing frameworks. Ensure data integrity and reliability across pipelines.
| name | data-validation |
| description | Implement comprehensive data validation, quality checks, and testing frameworks. Ensure data integrity and reliability across pipelines. |
| triggers | ["/data validation","/data quality"] |
This skill provides comprehensive approaches to validating data quality, implementing testing frameworks, and ensuring data reliability throughout your pipelines.
Use this skill when you need to:
Schema Validation
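```python
import pandera as pa
from pandera import Column, Check

schema = pa.DataFrameSchema({
    "id": Column(int, nullable=False),
    "email": Column(str, Check.str_matches(r"^[^@]+@[^@]+$"), nullable=False),
    # Multiple checks must be passed as a list through `checks`.
    # Note: a plain int column cannot hold nulls in pandas; use the nullable
    # "Int64" dtype if missing ages are expected.
    "age": Column(int, checks=[Check.greater_than(0), Check.less_than(150)], nullable=True),
    "created_at": Column(pa.DateTime, nullable=False),
})

# `df` is the pandas DataFrame to validate; a SchemaError is raised on failure.
validated_df = schema.validate(df)
```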
Content Validation
Statistical Validation
Great Expectations
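```python
import great_expectations as gx

# Uses the Validator API (context.get_validator); assumes a Great Expectations
# release that provides it.
context = gx.get_context()
suite = context.add_expectation_suite("my_suite")

# `batch_request` must already be built from a configured data asset (not shown).
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite=suite
)

# Add expectations
validator.expect_column_values_to_not_be_null("id")
validator.expect_column_values_to_be_between("age", 0, 150)
validator.expect_column_values_to_match_regex("email", r"^[^@]+@[^@]+$")
validator.expect_column_mean_to_be_between("salary", 50000, 150000)

# Persist the expectations added above to the suite.
validator.save_expectation_suite()
```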
Soda Core
```yaml
# checks.yml
checks for dataset:
  - row_count > 0
  - duplicate_count(id) = 0
  - missing_count(email) < 5
  - invalid_percent(email) < 1%:
      valid format: email
  - avg(age) between 18 and 100
```
Pre-Load Validation
Post-Transform Validation
Continuous Monitoring
Completeness
Accuracy
Consistency
Timeliness
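The four dimensions listed above can each be reduced to a number or a pass/fail flag. The sketch below is one possible reading, not a definition taken from this skill: the function names, the row-as-dict layout, and the `updated_at` column are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
import re

# Same email pattern as the schema examples in this document.
EMAIL_RE = re.compile(r"^[^@]+@[^@]+$")


def completeness(rows, column):
    """Share of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(column) is not None)
    return filled / len(rows)


def accuracy(rows, column, is_valid):
    """Share of non-null values in `column` accepted by `is_valid`."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


def consistency(rows, rule):
    """Share of rows satisfying a cross-field rule (hypothetical helper)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if rule(r)) / len(rows)


def timeliness(rows, column, now, max_age):
    """True when the newest timestamp in `column` is no older than `max_age`."""
    stamps = [r[column] for r in rows if r.get(column) is not None]
    return bool(stamps) and now - max(stamps) <= max_age


# Illustrative data only; `updated_at` is an assumed column.
SAMPLE_ROWS = [
    {"id": 1, "email": "a@example.com",
     "created_at": datetime(2024, 1, 1), "updated_at": datetime(2024, 1, 2)},
    {"id": 2, "email": None,
     "created_at": datetime(2024, 1, 3), "updated_at": datetime(2024, 1, 3)},
    {"id": 3, "email": "bad",
     "created_at": datetime(2024, 1, 5), "updated_at": datetime(2024, 1, 4)},
    {"id": 4, "email": "d@example.com",
     "created_at": datetime(2024, 1, 4), "updated_at": datetime(2024, 1, 4)},
]
```

Calling `completeness(SAMPLE_ROWS, "email")` measures how many rows carry an email at all, while `accuracy(SAMPLE_ROWS, "email", lambda v: EMAIL_RE.match(v) is not None)` measures how many of the present values are well formed. Keeping the two separate avoids counting a missing value as an invalid one.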
Validation Layers
```
Source → Schema Validation → Business Rules → Statistical Checks → Target
               ↓                   ↓                  ↓
        Reject Bad Data      Flag Warnings     Alert Anomalies
```
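The layered flow in the diagram can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the skill's implementation: `PipelineReport`, `passes_schema`, `business_warnings`, `statistical_alerts`, and `run_pipeline` are hypothetical names, rows are plain dicts, and the thresholds are borrowed from the examples earlier in this document.

```python
from dataclasses import dataclass, field
import re
import statistics

# Same email pattern as the schema examples in this document.
EMAIL_RE = re.compile(r"^[^@]+@[^@]+$")


@dataclass
class PipelineReport:
    accepted: list = field(default_factory=list)  # rows forwarded to the target
    rejected: list = field(default_factory=list)  # rows that failed schema validation
    warnings: list = field(default_factory=list)  # business-rule flags (rows still load)
    alerts: list = field(default_factory=list)    # batch-level statistical anomalies


def passes_schema(row):
    """Layer 1: required fields exist and have the expected types."""
    return (
        isinstance(row.get("id"), int)
        and isinstance(row.get("email"), str)
        and EMAIL_RE.match(row["email"]) is not None
        and (row.get("age") is None or isinstance(row["age"], int))
    )


def business_warnings(row):
    """Layer 2: domain rules that flag a row without rejecting it."""
    age = row.get("age")
    if age is not None and not 0 < age < 150:
        return [f"id={row['id']}: age {age} outside (0, 150)"]
    return []


def statistical_alerts(rows, mean_age_range=(18, 100)):
    """Layer 3: batch-level check on the mean age of accepted rows."""
    ages = [r["age"] for r in rows if r.get("age") is not None]
    if not ages:
        return ["no non-null ages in batch"]
    mean_age = statistics.mean(ages)
    low, high = mean_age_range
    if not low <= mean_age <= high:
        return [f"mean age {mean_age:.1f} outside [{low}, {high}]"]
    return []


def run_pipeline(rows):
    """Run the three layers in order and collect the outcome of each."""
    report = PipelineReport()
    for row in rows:
        if not passes_schema(row):
            report.rejected.append(row)  # reject bad data
            continue
        report.warnings.extend(business_warnings(row))  # flag warnings
        report.accepted.append(row)
    report.alerts.extend(statistical_alerts(report.accepted))  # alert anomalies
    return report
```

The design choice here is that only the schema layer stops a row from loading; business rules and statistical checks annotate the run so that a single odd value does not block an otherwise healthy batch.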
Error Handling
Testing
Metrics to Track
Alert Conditions
See the examples/ directory for:
- great-expectations-suite.py - Complete GE suite setup
- pandera-schema.py - Runtime schema validation
- custom-validator.py - Building custom validation framework
- quality-dashboard.py - Quality metrics visualization