data-profiling
Data profiling and schema analysis workflow. Use when user wants to understand data structure, quality, distributions. Triggers: profiling, schema, 資料品質, data quality, describe, 看資料, overview, 概況.
| name | description |
|---|---|
| data-profiling | Data profiling and schema analysis workflow. Use when user wants to understand data structure, quality, distributions. Triggers: profiling, schema, 資料品質, data quality, describe, 看資料, overview, 概況. |
The Phase 2 data-profiling workflow covers type inference, quality assessment, and checks of statistical assumptions.
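- `load_dataset(filepath)` → automatically infers variable types and identifies PII.
- `build_schema()` → `schema.json`: variable names, types, and basic statistics.
- `profile_dataset(dataset_id)` → tries ydata-profiling first; if it is unavailable, automatically falls back to the basic-fallback engine.
- `assess_quality(dataset_id)` → `quality_report.json`: quality issues, each with a severity.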
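The four steps above can be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the skill's actual implementation. The function names mirror the workflow, but the signatures differ: they take an in-memory dataset instead of a `dataset_id`. The type rules, the name-based PII regex, and the severity thresholds are hypothetical. The sketch only probes whether the real `ydata_profiling` package is importable to choose an engine; it does not call that package.

```python
import csv
import re
from typing import Any

# Hypothetical PII heuristic: flag columns whose *names* look like personal data.
PII_NAME = re.compile(r"name|email|phone|address|ssn", re.IGNORECASE)


def infer_type(values: list[str]) -> str:
    """Coarse type inference over non-empty cells (assumed rules)."""
    present = [v for v in values if v != ""]
    if not present:
        return "empty"
    try:
        [float(v) for v in present]
    except ValueError:
        return "categorical"
    return "continuous"


def load_dataset(lines) -> dict[str, Any]:
    """Step 1: read CSV lines, infer types, flag suspected PII columns."""
    reader = csv.DictReader(lines)
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    # Short rows yield None for missing fields; treat them as empty cells.
    data = {c: [(r[c] or "") for r in rows] for c in columns}
    return {
        "data": data,
        "types": {c: infer_type(v) for c, v in data.items()},
        "pii": [c for c in columns if PII_NAME.search(c)],
    }


def build_schema(ds: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Step 2: per-variable type, missing rate, and basic statistics."""
    schema = {}
    for col, values in ds["data"].items():
        present = [v for v in values if v != ""]
        entry: dict[str, Any] = {
            "type": ds["types"][col],
            "missing_rate": (len(values) - len(present)) / len(values) if values else 0.0,
        }
        if entry["type"] == "continuous":
            nums = [float(v) for v in present]
            entry.update(min=min(nums), max=max(nums), mean=sum(nums) / len(nums))
        schema[col] = entry
    return schema


def select_engine() -> str:
    """Prefer ydata-profiling when importable; otherwise degrade gracefully."""
    try:
        import ydata_profiling  # noqa: F401  (only probed, never called here)
    except ImportError:
        return "basic-fallback"
    return "ydata-profiling"


def profile_dataset(ds: dict[str, Any]) -> dict[str, Any]:
    """Step 3: record the chosen engine; the basic profile is the schema stats."""
    return {"engine": select_engine(), "schema": build_schema(ds)}


def assess_quality(ds: dict[str, Any]) -> list[dict[str, str]]:
    """Step 4: list quality issues with a severity (thresholds are assumptions)."""
    issues = []
    for col, entry in build_schema(ds).items():
        rate = entry["missing_rate"]
        if rate > 0:
            severity = "high" if rate > 0.2 else "low"
            issues.append({"variable": col, "issue": "missing values", "severity": severity})
    for col in ds["pii"]:
        issues.append({"variable": col, "issue": "possible PII (H-004)", "severity": "high"})
    return issues
```

In practice `load_dataset` would receive an open file, e.g. `open(filepath, newline="")`. A real implementation would also serialize the schema and the issue list to `schema.json` and `quality_report.json`.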
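| Constraint | What is checked | Action |
|---|---|---|
| S-001 | Normality of each numeric variable | Shapiro-Wilk / K-S |
| S-004 | Skewed distributions | Suggest a log/sqrt transform |
| S-005 | Missingness pattern | Classify as MCAR/MAR/MNAR |
| S-006 | Extreme values | Assess via skewness/kurtosis |
| S-007 | Multicollinearity | Compute VIF |
| H-004 | PII detection | Automatically flag suspect variables |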
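Some of the moment-based checks in the table can be illustrated with the standard library alone. The sketch below computes population skewness and excess kurtosis for S-004/S-006. For S-007 it computes the two-predictor VIF, where the auxiliary regression's R² equals the squared Pearson correlation, so VIF = 1 / (1 − r²). The thresholds (`skew_limit=1.0`, `kurt_limit=3.0`) are assumptions, not the skill's documented values. Shapiro-Wilk/K-S (S-001) and MCAR/MAR/MNAR classification (S-005) are not covered; a real pipeline would more likely use `scipy.stats.shapiro` and a full multi-predictor VIF.

```python
import math


def moments(xs: list[float]) -> tuple[float, float]:
    """Return (skewness g1, excess kurtosis g2) from population moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("constant variable: skewness/kurtosis undefined")
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def pearson_r(xs: list[float], ys: list[float]) -> float:
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("constant variable: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(xs: list[float], ys: list[float]) -> float:
    """S-007 for exactly two predictors: VIF = 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return math.inf if abs(r) >= 1 else 1.0 / (1.0 - r * r)


def check_variable(xs: list[float], skew_limit: float = 1.0,
                   kurt_limit: float = 3.0) -> list[str]:
    """Emit S-004 / S-006 notes when the (assumed) limits are exceeded."""
    g1, g2 = moments(xs)
    notes = []
    if abs(g1) > skew_limit:
        notes.append(f"[S-004] skewness {g1:.2f} -> consider a log/sqrt transform")
    if abs(g2) > kurt_limit:
        notes.append(f"[S-006] excess kurtosis {g2:.2f} -> review extreme values")
    return notes
```

For example, `[1, 1, 1, 1, 10]` has skewness 1.5 and excess kurtosis 0.25, so only the S-004 note fires. With more than two predictors, each VIF needs a full regression of one predictor on all the others.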
Each analysis section is followed by Agent recommendations:
📊 Variable `age`
- Type: continuous
- Missing: 3.2%
- Normality: Shapiro-Wilk p=0.034 (non-normal)
💡 [S-001] Consider nonparametric tests
💡 [S-004] Skewness 1.23 → consider a log transform
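A recommendation block like the example above could be rendered from precomputed statistics. The sketch below is a hypothetical formatter, not the skill's code. It assumes the Shapiro-Wilk p-value is computed elsewhere, for instance as `scipy.stats.shapiro(values).pvalue`. It also assumes `alpha=0.05` for S-001 and a right-skew limit of 1.0 for S-004; the log transform is only suggested for positive skew.

```python
def render_advice(name: str, kind: str, missing_rate: float, shapiro_p: float,
                  skewness: float, alpha: float = 0.05, skew_limit: float = 1.0) -> str:
    """Format one variable's summary plus constraint-tagged suggestions."""
    normal = shapiro_p >= alpha  # fail to reject normality at the assumed alpha
    lines = [
        f"📊 Variable `{name}`",
        f"- Type: {kind}",
        f"- Missing: {missing_rate:.1%}",
        f"- Normality: Shapiro-Wilk p={shapiro_p:.3g} ({'normal' if normal else 'non-normal'})",
    ]
    if not normal:
        lines.append("💡 [S-001] Consider nonparametric tests")
    if skewness > skew_limit:  # log transforms target right (positive) skew
        lines.append(f"💡 [S-004] Skewness {skewness:.2f} → consider a log transform")
    return "\n".join(lines)


print(render_advice("age", "continuous", 0.032, 0.034, 1.23))
```

With the inputs from the example (`missing_rate=0.032`, `p=0.034`, skewness 1.23), this reproduces the six lines shown above.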