dct-profile

Stars: 0

Forks: 0

Updated: February 9, 2026 at 02:47

Use this skill when the user wants to analyze data quality, profile data files, check value distributions, perform character analysis on text fields, identify data quality issues, or get statistics about dataset contents. Triggers include "profile this data", "analyze data quality", "check for nulls", "value distribution", "character frequency", "data statistics", "column profiling", or when doing exploratory data analysis or quality assessment.

Source

andrew-a-hale

andrew-a-hale/dct

Related occupations (SOC)

Based on the SOC occupational classification

Data Scientists · Computer and Mathematical Occupations · SOC 15-2051

SKILL.md

name

dct-profile

description

DCT Profile - Data Quality Analysis

Analyze data files for value distributions, unique counts, and character frequencies.

When to Use

Use this skill when you need to:

Assess data quality before processing
Identify anomalies or outliers
Check for null/missing values
Analyze text field character distributions
Understand value cardinality
Validate data format compliance

Installation

which dct || { go build -o dct && chmod +x ./dct; }

Usage

dct prof <file> [flags]

Arguments

file: Data file to profile (CSV, JSON, NDJSON, or Parquet)

Flags

-o, --output <file>: Output to file instead of stdout

Examples

Profile a CSV file:

dct prof data.csv

Profile a Parquet file:

dct prof large.parquet

Save profile report:

dct prof messy.csv -o data_quality_report.txt

Profile JSON data:

dct prof data.json

Output Sections

The profile report includes detailed analysis for each column:

1. Count Statistics

Basic cardinality information:

-- Field: `email` --
Count: 1000
Unique Count: 995
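
To make the two numbers concrete, here is a minimal Python sketch of how a count and a unique count can be derived for one column. It is not dct's implementation: the `count_stats` helper, the in-memory sample data, and the choice to skip empty values are all assumptions made for illustration.

```python
import csv
import io


def count_stats(rows, field):
    """Return (count, unique_count) for one column of dict-like rows.

    Assumption: empty or missing values are excluded from both numbers.
    """
    values = [row[field] for row in rows if row.get(field) not in (None, "")]
    return len(values), len(set(values))


# Tiny in-memory CSV standing in for a real file (hypothetical data).
sample = io.StringIO("email\na@example.com\nb@example.com\na@example.com\n")
rows = list(csv.DictReader(sample))

count, unique = count_stats(rows, "email")
print(f"Count: {count}")          # Count: 3
print(f"Unique Count: {unique}")  # Unique Count: 2
```

A gap between the two numbers, as here, means at least one value repeats.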

2. Value Occurrences

Most common values with their frequencies:

Value Occurrence
row: value -> count
0: user@example.com -> 1
1: admin@example.com -> 1
...
MOSTLY UNIQUE VALUES SHOWING SAMPLE...

For high-cardinality fields, shows a sample of unique values.
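
The occurrence table can be approximated with a frequency counter. The sketch below is an illustration only: `value_occurrences`, `mostly_unique`, and the 0.9 threshold are hypothetical names and values, and the rule dct actually uses to decide when to show only a sample is not documented here.

```python
from collections import Counter


def value_occurrences(values, top=10):
    """Return up to `top` (value, count) pairs, most frequent first."""
    return Counter(values).most_common(top)


def mostly_unique(values, threshold=0.9):
    """True when the share of distinct values reaches the threshold.

    The threshold is a hypothetical choice, not taken from dct.
    """
    values = list(values)
    return bool(values) and len(set(values)) / len(values) >= threshold


emails = ["user@example.com", "admin@example.com", "user@example.com"]
for row, (value, count) in enumerate(value_occurrences(emails)):
    print(f"{row}: {value} -> {count}")
```

For a column where `mostly_unique` returns true, listing every value adds little, which is why a sample is enough.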

3. String Length Statistics

For text fields, provides length metrics:

Value Summary - String Lengths
Min: 10
Mean: 22.500000
Max: 45
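
As a rough illustration of these metrics, the following Python sketch computes the minimum, mean, and maximum length of a list of strings. It assumes lengths are measured in Unicode code points (what Python's `len` returns); whether dct counts bytes or runes is not stated above. The helper name `string_length_summary` is hypothetical.

```python
from statistics import mean


def string_length_summary(values):
    """Return (min, mean, max) of string lengths, measured in code points."""
    lengths = [len(v) for v in values]
    if not lengths:
        raise ValueError("no values to summarize")
    return min(lengths), mean(lengths), max(lengths)


lo, avg, hi = string_length_summary(["a@b.io", "someone@example.com"])
print(f"Min: {lo}")         # Min: 6
print(f"Mean: {avg:.6f}")   # Mean: 12.500000
print(f"Max: {hi}")         # Max: 19
```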

4. Character Frequency Analysis

Detailed character-level statistics:

Char Occurrence
row: rune -> count
00: '@' (hex: U+0040) (dec: 64) -> 1000
01: '.' (hex: U+002E) (dec: 46) -> 1000
02: 'e' (hex: U+0065) (dec: 101) -> 2500

Shows:

Character symbol
Hexadecimal code (U+XXXX)
Decimal code
Total occurrences
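
The four items above map directly onto a per-character counter. Below is a small Python sketch that counts code points and renders a row in the style of the sample output; `char_occurrences` and `format_char_row` are hypothetical helpers, and the ordering dct uses for its rows is an assumption.

```python
from collections import Counter


def char_occurrences(values):
    """Count every code point across all values, most frequent first."""
    counts = Counter()
    for value in values:
        counts.update(value)
    return counts.most_common()


def format_char_row(row, char, count):
    """Render one row: symbol, hex code point, decimal code point, count."""
    code = ord(char)
    return f"{row:02d}: {char!r} (hex: U+{code:04X}) (dec: {code}) -> {count}"


for row, (char, count) in enumerate(char_occurrences(["a@b.io", "c@d.io"])):
    print(format_char_row(row, char, count))
```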

Data Quality Indicators

Look for these patterns in the output:

Missing/Null Values

Low count vs expected row count
<nil> values in occurrence list

Duplicates

Count significantly higher than unique count
Same value appearing multiple times

Encoding Issues

Unexpected characters in char occurrence
Non-ASCII characters (hex > U+007F)
Null bytes (U+0000)

Format Inconsistencies

Wide range in string lengths
Mixed formats in same column
Special characters in unexpected places
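
The encoding checks in particular are easy to automate. The Python sketch below flags code points that often point to encoding trouble: the null character and anything outside ASCII. The helper name `suspicious_chars` is hypothetical, and non-ASCII characters are of course legitimate in many datasets, so treat the result as a list of things to review rather than a list of errors.

```python
def suspicious_chars(text):
    """Return sorted 'U+XXXX' labels for NUL and non-ASCII code points."""
    flagged = set()
    for ch in text:
        code = ord(ch)
        # U+0000 is a null byte; anything above U+007F is non-ASCII.
        if code == 0x0000 or code > 0x007F:
            flagged.add(code)
    return [f"U+{code:04X}" for code in sorted(flagged)]


print(suspicious_chars("caf\u00e9\x00"))  # ['U+0000', 'U+00E9']
print(suspicious_chars("plain ascii"))    # []
```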

Best Practices

Profile first: Always profile new data sources before processing
Check all columns: Review each field's statistics
Look for outliers: Extreme min/max values may indicate errors
Character analysis: Check for encoding issues, especially in text fields
Save reports: Use -o to save profiles for documentation

Example Workflow

# 1. Profile the data
dct prof incoming_data.csv -o profile.txt

# 2. Review the output for issues:
#    - Check count matches expectations
#    - Look for nulls in value occurrences
#    - Review character frequencies for encoding issues

# 3. Fix issues if found
#    - Handle nulls
#    - Fix encoding
#    - Remove duplicates

# 4. Re-profile after fixes
dct prof cleaned_data.csv
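
If the workflow is driven from a script, steps 1 and 4 can be wrapped as shown in this Python sketch. It uses only the `prof` subcommand and the `-o` flag documented above; the function names `prof_command` and `run_profile` are hypothetical, and the sketch assumes `dct` is on `PATH`.

```python
import shutil
import subprocess


def prof_command(path, output=None):
    """Build the argv for `dct prof`, optionally writing the report to a file."""
    argv = ["dct", "prof", path]
    if output is not None:
        argv += ["-o", output]
    return argv


def run_profile(path, output=None):
    """Run the profile if dct is installed; return None when it is not."""
    if shutil.which("dct") is None:
        return None
    return subprocess.run(prof_command(path, output), check=True)


print(prof_command("incoming_data.csv", "profile.txt"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.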

Interpreting Results

Good Data Quality Signs

Count matches expected row count
Unique counts appropriate for field type
Character distributions match expected language/encoding
String lengths within reasonable bounds

Warning Signs

High null counts
Extreme string length variations
Unexpected special characters
Count/unique count ratio indicates duplicates
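
The count-to-unique comparison can be scripted against a saved report. The Python sketch below parses the `-- Field: ... --`, `Count:`, and `Unique Count:` lines exactly as they appear in the sample output earlier in this document; the real report layout may differ, so the regular expression and the helpers `parse_counts` and `duplicate_share` are assumptions.

```python
import re

# Matches the field header line as shown in the sample output.
FIELD_RE = re.compile(r"^-- Field: `(?P<name>.+)` --$")


def parse_counts(report_text):
    """Map field name -> {'count': int, 'unique': int} from report text."""
    fields, current = {}, None
    for line in report_text.splitlines():
        line = line.strip()
        match = FIELD_RE.match(line)
        if match:
            current = fields.setdefault(match["name"], {})
        elif current is not None and line.startswith("Count:"):
            current["count"] = int(line.split(":", 1)[1])
        elif current is not None and line.startswith("Unique Count:"):
            current["unique"] = int(line.split(":", 1)[1])
    return fields


def duplicate_share(stats):
    """Fraction of values that repeat another value (0.0 when count is 0)."""
    count = stats.get("count", 0)
    return 0.0 if count == 0 else 1 - stats.get("unique", 0) / count


report = "-- Field: `email` --\nCount: 1000\nUnique Count: 995\n"
for name, stats in parse_counts(report).items():
    print(f"{name}: {duplicate_share(stats):.1%} duplicated")
```

What share counts as a warning depends on the field: an ID column should be at zero, while a country column will naturally be close to one.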

Related Skills

dct-peek: Quick preview before detailed profiling
dct-infer: Generate schema after quality check
dct-diff: Compare profiles of two file versions

Performance Notes

Profiles entire file by default
May be slow on very large files (>1GB)
Consider sampling large files with dct peek first
Character analysis can be memory-intensive on wide text columns
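
Where a quick look at the head of a large CSV is enough, a smaller file can also be cut before profiling. The Python sketch below is a stand-in technique, not `dct peek`: the hypothetical `head_sample` helper copies the header plus the first N lines. It assumes one record per line (no quoted fields containing newlines), and a head sample is not representative if the file is sorted.

```python
from itertools import islice


def head_sample(src_path, dst_path, rows=1000):
    """Copy the header line plus the first `rows` data lines to dst_path.

    Returns the number of data lines written.
    """
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        written = 0
        # islice stops early, so the rest of the file is never read.
        for line in islice(src, rows + 1):
            dst.write(line)
            written += 1
    return max(written - 1, 0)
```

The resulting file can then be profiled as usual with `dct prof`.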

More from this repository

Same repository

dct-chart

andrew-a-hale/dct

Use this skill when the user wants to visualize data distributions, create ASCII histograms, generate simple charts from CSV/JSON data, plot column values, or see value frequencies in terminal-friendly format. Triggers include "chart this data", "visualize distribution", "histogram of values", "plot the data", "ascii chart", "terminal visualization", or when needing quick visual analysis without external plotting tools.

2026-02-09 · 0

dct-diff

andrew-a-hale/dct

Use this skill when the user wants to compare two data files, find differences between datasets, validate data consistency, check if files have matching records, or reconcile data between sources. Triggers include "compare these files", "diff the datasets", "are these the same", "find differences", "validate data matches", "reconcile", "data comparison", or when doing data quality validation between two files.

2026-02-09 · 0

dct-flattify

andrew-a-hale/dct

Use this skill when the user wants to flatten nested JSON structures, convert nested objects to flat format, generate SQL queries from nested JSON, unnest hierarchical data, or work with nested API responses that need to be tabular. Triggers include "flatten this json", "make json flat", "nested to flat", "unnest json", "json to sql", "flatten nested", or when dealing with deeply nested JSON from APIs or document stores.

2026-02-09 · 0

dct-generate

andrew-a-hale/dct

Use this skill when the user wants to create synthetic test data, generate fake datasets, create mock data for testing, produce realistic data with specific patterns, or need sample data with custom schemas. Triggers include "generate test data", "create fake data", "mock dataset", "synthetic data", "generate sample records", "create test data", "fake users", "mock data", or when needing test data with specific fields and relationships.

2026-02-09 · 0

dct-infer

andrew-a-hale/dct

Use this skill when the user wants to generate SQL CREATE TABLE statements from data files, infer schema from CSV/JSON/Parquet, create database schemas from existing data, or get column types from a file. Triggers include "generate schema", "create table from csv", "infer types", "what's the schema", "get column types", "sql ddl", or when preparing data for SQL databases like DuckDB, PostgreSQL, or similar.

2026-02-09 · 0

dct-js2sql

andrew-a-hale/dct

Use this skill when the user wants to convert JSON Schema to SQL CREATE TABLE statements, transform schema definitions to database DDL, create SQL tables from JSON Schema files, or generate database schemas from API specifications. Triggers include "json schema to sql", "convert schema to sql", "create table from json schema", "json schema ddl", "schema conversion", or when working with OpenAPI, JSON Schema, or API specifications that need database tables.

2026-02-09 · 0
