data-quality-auditor

Updated: June 17, 2026 at 15:34

Audits data quality at source and transformation layers. Identifies missing values, duplicates, outliers, referential integrity issues, and freshness gaps. Returns data quality scorecards and remediation steps.

Source

UitbreidenOS

UitbreidenOS/Claudient

SKILL.md

# Data Quality Scorecard

**Table:** [Table name]
**Source System:** [Source]
**Warehouse:** [DW name]
**Assessment Date:** [date]
**Owner:** [Name]

---

## Quality Score

**Overall Quality Score: [X]/100**

| Dimension | Score | Status | Trend |
|-----------|-------|--------|-------|
| Completeness | 95 | ✓ Acceptable | ↑ Improving |
| Uniqueness | 100 | ✓ Acceptable | → Stable |
| Validity | 92 | ⚠ Warning | ↓ Declining |
| Consistency | 87 | ⚠ Warning | ↓ Declining |
| Accuracy | 98 | ✓ Acceptable | → Stable |
| Timeliness | 100 | ✓ Acceptable | → Stable |
| Conformity | 94 | ✓ Acceptable | → Stable |

**Overall Status:** ⚠ PASS WITH WARNINGS

---

## Completeness Assessment

| Column | Nulls | % Null | Expected | Status |
|--------|-------|--------|----------|--------|
| customer_id | 0 | 0% | 0% | ✓ Pass |
| email | 150 | 0.3% | <1% | ✓ Pass |
| phone | 5,000 | 10% | <5% | ⚠ Warning |
| address | 3,200 | 6.4% | <5% | ⚠ Warning |
| created_date | 0 | 0% | 0% | ✓ Pass |

**Issues:** Phone and address have elevated NULL rates (likely optional fields; verify business requirements).

---

## Uniqueness Assessment

| Column(s) | Duplicates | % Duplicate | Expected | Status |
|-----------|------------|-------------|----------|--------|
| customer_id (PK) | 0 | 0% | 0% | ✓ Pass |
| email | 42 | 0.08% | 0% | ⚠ Warning |
| phone + email | 120 | 0.24% | <0.1% | ⚠ Warning |

**Issues:** 42 duplicate emails detected (e.g., john@example.com appears 2x). 120 rows with duplicate phone+email pairs.

**Root Cause:** Email validation was disabled in the source system for 2 weeks (June 1-14), which allowed duplicate signups.

**Remediation:** Contact source system team; implement email uniqueness constraint; deduplicate in staging layer.

---

## Validity Assessment

| Column | Data Type | Sample Values | Invalid | % Invalid | Status |
|--------|-----------|---------------|---------|-----------|--------|
| customer_id | INT | [1, 2, 3] | 0 | 0% | ✓ Pass |
| email | VARCHAR | [user@example.com, ...] | 156 | 0.31% | ⚠ Warning |
| phone | VARCHAR | [555-1234, +1.555.1234, ...] | 8,200 | 16.4% | ✗ Fail |
| age | INT | [25, 32, 41, ...] | 450 | 0.9% | ⚠ Warning |

**Issues:**
- Email: 156 invalid formats (missing @, spaces, etc.)
- Phone: 16.4% invalid formats (inconsistent formatting: some +1.555.1234, others 555-1234, some blank)
- Age: 450 values outside valid range (0-120)

**Root Cause:** Phone numbers are imported from multiple source systems with different formats; no normalization is applied.

**Remediation:** Standardize phone format in staging layer; implement regex validation; audit age values >100.

---

## Consistency Assessment

### Referential Integrity

| Foreign Key | Records | Orphaned | % Orphaned | Status |
|-------------|---------|----------|------------|--------|
| customer_id → dim_customers | 50M | 45K | 0.09% | ⚠ Warning |
| order_id → fct_orders | 50M | 0 | 0% | ✓ Pass |
| product_id → dim_products | 50M | 2,100 | 0.004% | ✓ Pass |

**Issues:** 45K orphaned customer_ids in fact table (customer deleted or not yet loaded in dim).

**Root Cause:** Data arrives out of order; customer records sometimes arrive 12h after their orders.

**Remediation:** Add late-arriving dimension handling in dbt; keep inactive customers in dimension.

### Aggregation Consistency

```sql
-- Verify: SUM(transaction_amount) in warehouse = SUM(amount) in source
SELECT
  SUM(transaction_amount) AS warehouse_total,
  SUM(amount) AS source_total,
  ABS(SUM(transaction_amount) - SUM(amount)) AS delta
FROM fact_transactions
LEFT JOIN source_raw_transactions USING (transaction_id)
WHERE DATE(transaction_date) = CURRENT_DATE - INTERVAL 1 DAY;
```
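One caveat with the reconciliation query above: because it joins from the fact table, source rows that never reached the warehouse drop out before the sums are taken. A minimal Python sketch of the same check, with each total computed independently, might look like the following. The function name `reconcile_totals`, the `tolerance` parameter, and the sample amounts are illustrative assumptions, not part of the skill.

```python
from decimal import Decimal


def reconcile_totals(warehouse_amounts, source_amounts, tolerance=Decimal("0.00")):
    """Sum each side independently and report the absolute difference."""
    warehouse_total = sum(warehouse_amounts, Decimal("0"))
    source_total = sum(source_amounts, Decimal("0"))
    delta = abs(warehouse_total - source_total)
    return {
        "warehouse_total": warehouse_total,
        "source_total": source_total,
        "delta": delta,
        "within_tolerance": delta <= tolerance,
    }


# Hypothetical sample: one source transaction (7.50) was never loaded.
warehouse = [Decimal("19.99"), Decimal("5.00"), Decimal("100.00")]
source = [Decimal("19.99"), Decimal("5.00"), Decimal("100.00"), Decimal("7.50")]

result = reconcile_totals(warehouse, source)
print(result["delta"], result["within_tolerance"])
```

`Decimal` is used here rather than `float` so that currency sums compare exactly; a non-zero `tolerance` can absorb known rounding differences between systems.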

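## Accuracy Assessment

### Sample Validation

| Field | Matches | Mismatches | % Match | Status |
|-------|---------|------------|---------|--------|
| order_id | 100 | | 100% | ✓ Pass |
| customer_id | | | 98% | ⚠ Warning |
| amount | 100 | | 100% | ✓ Pass |
| currency | 100 | | 100% | ✓ Pass |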
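The per-field match rates in a sample validation like the one above can be computed by joining sampled source records to their warehouse counterparts on a key. The sketch below is one way to do that, assuming records are plain dictionaries; `field_match_rates` and the two sample lists are hypothetical names and data introduced for illustration.

```python
def field_match_rates(source_rows, warehouse_rows, key, fields):
    """Compare sampled records field by field, joined on `key`."""
    warehouse_by_key = {row[key]: row for row in warehouse_rows}
    report = {}
    for field in fields:
        matches = 0
        mismatches = 0
        for src in source_rows:
            wh = warehouse_by_key.get(src[key])
            # A record missing from the warehouse counts as a mismatch.
            if wh is not None and wh.get(field) == src.get(field):
                matches += 1
            else:
                mismatches += 1
        total = matches + mismatches
        pct = 100.0 * matches / total if total else 0.0
        report[field] = {
            "matches": matches,
            "mismatches": mismatches,
            "pct_match": pct,
        }
    return report


# Hypothetical two-record sample; order 2 has a different customer_id downstream.
source_sample = [
    {"order_id": 1, "customer_id": 10, "amount": "25.00", "currency": "USD"},
    {"order_id": 2, "customer_id": 11, "amount": "40.00", "currency": "USD"},
]
warehouse_sample = [
    {"order_id": 1, "customer_id": 10, "amount": "25.00", "currency": "USD"},
    {"order_id": 2, "customer_id": 99, "amount": "40.00", "currency": "USD"},
]

report = field_match_rates(
    source_sample, warehouse_sample, "order_id", ["customer_id", "amount", "currency"]
)
print(report["customer_id"])
```

Treating a missing warehouse record as a mismatch is a design choice; an audit may prefer to report missing records separately.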

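## Timeliness Assessment

| Table | Last Refresh | Expected Refresh | Latency | SLA | Status |
|-------|--------------|------------------|---------|-----|--------|
| raw_transactions | 6:15 AM (today) | 6:00 AM | 15 min | | ✓ Pass |
| stg_customers | 6:20 AM (today) | 6:00 AM | 20 min | 30 min | ⚠ Warning |
| fct_orders | 6:45 AM (today) | 6:00 AM | 45 min | | ✓ Pass |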
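Latency in a freshness check like this is the gap between the expected and the actual refresh time. The scorecard does not spell out how latency maps to Pass or Warning, so the sketch below only computes the latency and, when an SLA is supplied, whether it was exceeded. The function name `refresh_latency` and the placeholder date are assumptions.

```python
from datetime import datetime


def refresh_latency(last_refresh, expected_refresh, sla_minutes=None):
    """Return (latency in minutes, SLA breached?) for one table refresh.

    The second value is None when no SLA is given.
    """
    minutes = (last_refresh - expected_refresh).total_seconds() / 60
    latency_minutes = max(minutes, 0.0)  # an early refresh has zero latency
    if sla_minutes is None:
        return latency_minutes, None
    return latency_minutes, latency_minutes > sla_minutes


# Placeholder date; only the times of day come from the scorecard rows.
expected = datetime(2026, 1, 1, 6, 0)

stg_customers = refresh_latency(datetime(2026, 1, 1, 6, 20), expected, sla_minutes=30)
raw_transactions = refresh_latency(datetime(2026, 1, 1, 6, 15), expected)

print(stg_customers, raw_transactions)
```

A team that wants a Warning before the SLA is actually breached could add a second, lower threshold; that rule would need to be agreed with the table owner.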

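## Conformity Assessment

| Rule | Compliant | Non-Compliant | % Compliant | Status |
|------|-----------|---------------|-------------|--------|
| Column naming: snake_case | Yes | | 100% | ✓ Pass |
| Allowed statuses | 49.9M | 100K | 99.8% | ⚠ Warning |
| Date format: YYYY-MM-DD | Yes | | 100% | ✓ Pass |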
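The three conformity rules above lend themselves to simple predicate checks. The following is a rough sketch, assuming string inputs; the helper names, the snake_case pattern, and the `ALLOWED_STATUSES` set are illustrative assumptions, since the skill does not list the permitted status values.

```python
import re
from datetime import datetime

# Assumed snake_case definition: lowercase words joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")
ISO_DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")

# Hypothetical status list for illustration only.
ALLOWED_STATUSES = {"pending", "shipped", "delivered", "cancelled"}


def is_snake_case(name):
    return SNAKE_CASE.fullmatch(name) is not None


def is_iso_date(value):
    """True only for zero-padded YYYY-MM-DD strings that are real calendar dates."""
    if not isinstance(value, str) or ISO_DATE_SHAPE.fullmatch(value) is None:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def compliance_rate(values, predicate):
    """Return (compliant, non_compliant, percent compliant)."""
    compliant = sum(1 for v in values if predicate(v))
    non_compliant = len(values) - compliant
    pct = 100.0 * compliant / len(values) if values else 100.0
    return compliant, non_compliant, pct


statuses = ["pending", "shipped", "SHIPPED", "delivered"]
status_result = compliance_rate(statuses, lambda s: s in ALLOWED_STATUSES)
print(status_result)
```

The shape check before `strptime` matters because `strptime` on its own accepts non-padded values such as `2026-6-1`.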

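## Anomaly Detection

### Statistical Analysis

| Statistic | Value | Expected | Status |
|-----------|-------|----------|--------|
| Mean | $1,240 | $1,200 | → |
| Median | $950 | $1,000 | → |
| Std Dev | $2,450 | $2,400 | → |
| Min | $0.01 | | ⚠ Below threshold |
| Max | $125K | $50K | ⚠ Above threshold |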
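A profile like the one above can be produced with Python's standard `statistics` module, flagging values that fall outside expected bounds. In this sketch the upper bound of 50,000 mirrors the $50K expectation in the table, while the lower bound of 1.00 is a placeholder assumption because the scorecard does not state the minimum threshold. `profile_amounts` is a hypothetical helper name.

```python
import statistics


def profile_amounts(amounts, min_expected, max_expected):
    """Summarize a numeric column and list values outside the expected range."""
    summary = {
        "mean": statistics.mean(amounts),
        "median": statistics.median(amounts),
        "std_dev": statistics.stdev(amounts),  # sample std dev; needs >= 2 values
        "min": min(amounts),
        "max": max(amounts),
    }
    outliers = [a for a in amounts if a < min_expected or a > max_expected]
    return summary, outliers


# Small illustrative sample, not real transaction data.
amounts = [0.01, 950.0, 1000.0, 1200.0, 125000.0]
summary, outliers = profile_amounts(amounts, min_expected=1.00, max_expected=50000.0)

print(summary["median"], outliers)
```

Fixed bounds are the simplest rule; for skewed transaction data, percentile-based or interquartile-range limits may flag fewer legitimate large orders.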

## Remediation Plan

| Issue | Priority | Owner | ETA | Status |
|-------|----------|-------|-----|--------|
| Duplicate emails | High | Data team | June 15 | In Progress |
| Phone format standardization | High | Data team | June 18 | Pending |
| Orphaned customer_ids | Medium | Analytics | June 20 | Pending |
| stg_customers performance | Medium | Analytics | June 22 | Pending |
| Transaction outlier investigation | Low | Finance | June 25 | Not started |

name	data-quality-auditor
description	Audits data quality at source and transformation layers. Identifies missing values, duplicates, outliers, referential integrity issues, and freshness gaps. Returns data quality scorecards and remediation steps.
allowed-tools	Read, Write, WebFetch
effort	high

data-quality-auditor

When to activate

When NOT to use

Data Quality Audit Checklist

Quality Dimensions

Data Quality Scorecard Template

Accuracy Assessment

Sample Validation

Timeliness Assessment

Conformity Assessment

Anomaly Detection

Statistical Analysis

Remediation Plan

Approval & Sign-Off
