# connector-doc-review

Review and fix OpenMetadata connector documentation against JSON schema and source code. Validates availableFeatures, permissions, yaml.mdx configuration, and overall completeness. Automatically fixes gaps.
| Field | Value |
|---|---|
| name | `connector-doc-review` |
| description | Review and fix OpenMetadata connector documentation against JSON schema and source code. Validates availableFeatures, permissions, yaml.mdx configuration, and overall completeness. Automatically fixes gaps. |
| user-invocable | `true` |
| argument-hint | `<connector-name> [--service-type=database\|pipeline\|dashboard\|messaging\|storage\|search\|mlmodel] [--version=v1.12.x\|v1.11.x\|v1.13.x-SNAPSHOT\|all] [--dry-run]` |
| allowed-tools | `["Bash","Read","Glob","Grep","Edit","Write","Agent"]` |
## When to Use

Use this skill when a user asks to review, validate, audit, or fix connector documentation by checking it against the actual JSON schema and ingestion source code.
## Arguments

- `<connector-name>`: the connector to review (e.g., redshift, dynamodb, bigquery, airflow, looker, kafka).
- `--service-type`: one of database, pipeline, dashboard, messaging, storage, search, mlmodel. Default: auto-detect from the connector name.
- `--version`: v1.12.x, v1.11.x, v1.13.x-SNAPSHOT, or all. `all` reviews v1.11.x, v1.12.x, and v1.13.x-SNAPSHOT.

## Paths

```
DOCS_ROOT   = .                # The docs-om repo (current working directory)
OM_ROOT     = ../OpenMetadata  # Sibling directory to docs-om
SCHEMA_ROOT = ${OM_ROOT}/openmetadata-spec/src/main/resources/json/schema/entity/services
SOURCE_ROOT = ${OM_ROOT}/ingestion/src/metadata/ingestion/source
```
## Phase 1: Establish Ground Truth

### Step 1: Read the JSON Schema

Read the connector's JSON schema to extract the canonical list of capabilities and configuration fields.

Schema file: `${SCHEMA_ROOT}/connections/${service_type}/${connectorName}Connection.json`
Extract:
- `supports*` boolean flags: these define what the connector can do
- Filter patterns: `schemaFilterPattern`, `tableFilterPattern`, `storedProcedureFilterPattern`, etc.
- `sampleDataStorageConfig`: its presence indicates Sample Data support
- Authentication: `$ref` entries pointing to auth schemas (`basicAuth`, `iamAuthConfig`, `awsCredentials`, `gcpCredentials`, etc.). Build a list of supported authentication types (e.g., "Basic Auth", "IAM Auth", "OAuth2", "API Key", "GCP Credentials") from the `oneOf`/`anyOf` references of the `authType` property.
- SSL settings: `sslMode`, `sslConfig`, `verifySSL`
- Required fields: the `required` array

Also read the parent service schema to verify the connector is registered:

Service schema: `${SCHEMA_ROOT}/${serviceType}Service.json`
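The extraction list above can be sketched in Python. This is a minimal, stdlib-only sketch that assumes the schema file has already been parsed into a dict. The helper names `extract_schema_signals` and `load_schema` are hypothetical. Treating a `$ref`-only `supports*` property as enabled (because it carries no inline `default`) is an assumption about how the shared definitions resolve.

```python
import json
from pathlib import PurePosixPath

SSL_FIELDS = ("sslMode", "sslConfig", "verifySSL")


def load_schema(path: str) -> dict:
    """Parse a *Connection.json file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def extract_schema_signals(schema: dict) -> dict:
    """Collect the Phase 1 signals from a parsed connection schema (hypothetical helper)."""
    props = schema.get("properties", {})

    # supports* flags: a present property counts as enabled unless it explicitly
    # declares "default": false. $ref-only entries have no inline default, so
    # they count as enabled here (assumption).
    flags = {
        name: prop.get("default", True) is not False
        for name, prop in props.items()
        if name.startswith("supports")
    }

    # Auth types: file names of the $ref targets under authType.oneOf/anyOf,
    # e.g. "./common/basicAuth.json" -> "basicAuth.json".
    auth = props.get("authType", {})
    auth_refs = [
        PurePosixPath(option["$ref"]).name
        for option in auth.get("oneOf", []) + auth.get("anyOf", [])
        if "$ref" in option
    ]

    return {
        "flags": flags,
        "filter_patterns": sorted(n for n in props if n.endswith("FilterPattern")),
        "sample_data": "sampleDataStorageConfig" in props,
        "auth_refs": auth_refs,
        "ssl_fields": sorted(n for n in SSL_FIELDS if n in props),
        "required": list(schema.get("required", [])),
    }
```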
### Step 2: Read the Ingestion Source Code

Read these files for the connector:

- `${SOURCE_ROOT}/${service_type}/${connector_name}/metadata.py`: Source class, capabilities
- `${SOURCE_ROOT}/${service_type}/${connector_name}/connection.py`: Connection, test steps, permissions
- `${SOURCE_ROOT}/${service_type}/${connector_name}/service_spec.py`: Service spec (lineage, usage, profiler classes)
Extract:
- Test connection steps: the keys of the `test_fn` dictionary in `test_connection()`. Each key represents a permission or capability the connector validates.
- Base classes and mixins (e.g., `LifeCycleQueryMixin`, `MultiDBSource`, `CommonNoSQLSource`)
- `yield_tag` or owner-related methods in the source, which determine whether Owners/Tags are supported

Also check for:

- `${SOURCE_ROOT}/${service_type}/${connector_name}/queries.py`: SQL queries (for permission requirements)
- `${SOURCE_ROOT}/${service_type}/${connector_name}/client.py`: API client (for REST connectors)
### Step 3: Build the Feature Truth Table

Using the schema and code, build a definitive feature map:
For Database Connectors:
| Schema Signal | Available Feature String | Unavailable Feature String |
|---|---|---|
| `supportsMetadataExtraction: true` | "Metadata" | — |
| `supportsUsageExtraction: true` | "Query Usage" | "Query Usage" if false |
| `supportsLineageExtraction: true` | "View Lineage" or "Lineage" | "Lineage" if false |
| `supportsViewLineageExtraction: true` | "View Column-level Lineage" or "Column-level Lineage" | "Column-level Lineage" if false |
| `supportsProfiler: true` | "Data Profiler" | "Data Profiler" if false |
| `supportsDBTExtraction: true` | "dbt" | "dbt" if false |
| `supportsDataDiff: true` | "Data Quality" | "Data Quality" if false |
| `storedProcedureFilterPattern` present | "Stored Procedures" | "Stored Procedures" if absent |
| `sampleDataStorageConfig` present | "Sample Data" | "Sample Data" if absent |
| `supportsProfiler: true` (implicit) | "Auto-Classification" | "Auto-Classification" if profiler false |
| Check source code for owner extraction | "Owners" if supported | "Owners" if not |
| Check source code for tag extraction | "Tags" if supported | "Tags" if not |
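The database table above translates into a small lookup. This is a sketch only. `database_truth_table` is a hypothetical helper. The Stored Procedures, Sample Data, Owners, and Tags inputs are booleans you derive from the schema and code checks. Where the table offers two spellings, the sketch picks "Lineage" and "Column-level Lineage"; match whichever spelling the existing page uses.

```python
# (flag, feature string) pairs copied from the database table above.
FLAG_FEATURES = [
    ("supportsMetadataExtraction", "Metadata"),
    ("supportsUsageExtraction", "Query Usage"),
    ("supportsLineageExtraction", "Lineage"),
    ("supportsViewLineageExtraction", "Column-level Lineage"),
    ("supportsProfiler", "Data Profiler"),
    ("supportsDBTExtraction", "dbt"),
    ("supportsDataDiff", "Data Quality"),
]


def database_truth_table(
    flags: dict[str, bool],
    has_stored_procedures: bool,
    has_sample_data: bool,
    has_owners: bool,
    has_tags: bool,
) -> tuple[list[str], list[str]]:
    """Split database feature strings into (available, unavailable)."""
    available: list[str] = []
    unavailable: list[str] = []

    def place(feature: str, supported: bool) -> None:
        (available if supported else unavailable).append(feature)

    for flag, feature in FLAG_FEATURES:
        supported = flags.get(flag, False)
        # The table lists no unavailable string for "Metadata".
        if feature == "Metadata" and not supported:
            continue
        place(feature, supported)
    place("Stored Procedures", has_stored_procedures)
    place("Sample Data", has_sample_data)
    # Auto-Classification is implied by profiler support.
    place("Auto-Classification", flags.get("supportsProfiler", False))
    place("Owners", has_owners)
    place("Tags", has_tags)
    return available, unavailable
```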
For Pipeline Connectors:
| Schema Signal | Available Feature String |
|---|---|
| Always | "Pipelines" |
| Check code for status extraction | "Pipeline Status" |
| `supportsLineageExtraction: true` or lineage source in spec | "Lineage" |
| Check code for owner extraction | "Owners" |
| Check code for usage tracking | "Usage" |
| Check code for tag extraction | "Tags" |
For Dashboard Connectors:
| Schema Signal | Available Feature String |
|---|---|
| Always | "Dashboards", "Charts" |
| Check code for datamodel extraction | "Datamodels" |
| Check code for project support | "Projects" |
| `supportsLineageExtraction: true` or lineage source in spec | "Lineage" |
| Check code for column lineage | "Column Lineage" |
| Check code for owner extraction | "Owners" |
| Check code for usage tracking | "Usage" |
| Check code for tag extraction | "Tags" |
For Messaging Connectors:
| Schema Signal | Available Feature String |
|---|---|
| Always | "Topics" |
| Check for sample data support | "Sample Data" |
For Storage Connectors:
| Schema Signal | Available Feature String |
|---|---|
| Always | "Metadata" |
| Check code for structured containers | "Structured Containers" |
| Check code for unstructured containers | "Unstructured Containers" |
For Search Connectors:
| Schema Signal | Available Feature String |
|---|---|
| Always | "Search Indexes" |
| Check for sample data support | "Sample Data" |
For ML Model Connectors:
| Schema Signal | Available Feature String |
|---|---|
| Always | "ML Features" |
| Check for hyperparameters | "Hyperparameters" |
| Check for ML store | "ML Store" |
## Phase 2: Read the Existing Documentation

For each version being reviewed (v1.11.x, v1.12.x, v1.13.x-SNAPSHOT), read the main connector page:

`${DOCS_ROOT}/${version}/connectors/${service_type}/${connector_name}.mdx`

Extract:

- The `availableFeatures` array from `<ConnectorDetailsHeader>`
- The `unavailableFeatures` array from `<ConnectorDetailsHeader>`
- The `stage` value (`PROD` or `BETA`)
- Whether an `<Info>` callout listing supported authentication types exists near the top of the page (after the intro text, before the table of contents)

Then locate the YAML page:

`${DOCS_ROOT}/${version}/connectors/${service_type}/${connector_name}/yaml.mdx`
If it exists, extract:
`${DOCS_ROOT}/docs.json`
Verify:
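## Phase 3: Validate

Run these validation checks and categorize findings:

### Check 1: Feature Accuracy

Compare `availableFeatures` in the docs against the truth table from Phase 1. Flag:

- Features supported per the schema/code but missing from `availableFeatures`
- Features listed in `availableFeatures` that should NOT be (per schema/code)
- Features listed in `unavailableFeatures` that should be in `availableFeatures`, or vice versa

Severity: WARNING for each mismatch.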
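Check 1 can be approximated with regular expressions over the MDX source. This sketch assumes the header writes each array as `availableFeatures={["...", "..."]}`. The function names are hypothetical. Findings come back as `(severity, message)` tuples.

```python
import re


def parse_feature_array(mdx: str, attr: str) -> list[str]:
    """Return the quoted strings in attr={[...]} on the page, or [] if absent."""
    # \b keeps "availableFeatures" from matching inside "unavailableFeatures".
    match = re.search(rf"\b{attr}=\{{\[(.*?)\]\}}", mdx, re.DOTALL)
    return re.findall(r'"([^"]+)"', match.group(1)) if match else []


def check_feature_accuracy(mdx: str, truth_available: list[str]) -> list[tuple[str, str]]:
    """Compare the header arrays against the Phase 1 truth table."""
    documented = parse_feature_array(mdx, "availableFeatures")
    documented_unavailable = parse_feature_array(mdx, "unavailableFeatures")
    findings = []
    # Supported per schema/code, but not documented as available.
    for feature in truth_available:
        if feature not in documented:
            findings.append(("WARNING", f'Missing "{feature}" in availableFeatures'))
    # Documented as available, but not supported.
    for feature in documented:
        if feature not in truth_available:
            findings.append(("WARNING", f'"{feature}" is in availableFeatures but is not supported'))
    # Supported, yet documented as unavailable.
    for feature in documented_unavailable:
        if feature in truth_available:
            findings.append(("WARNING", f'"{feature}" is in unavailableFeatures but is supported'))
    return findings
```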
### Check 2: unavailableFeatures Completeness

Verify that features NOT supported are listed in `unavailableFeatures`. A feature that is neither available nor unavailable is confusing to users.
Severity: SUGGESTION for missing entries in unavailableFeatures.
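Check 2 reduces to a set difference against the full vocabulary of feature strings for the service type. Here is a minimal sketch with a hypothetical function name:

```python
def check_unavailable_completeness(
    all_features: list[str],
    documented_available: list[str],
    documented_unavailable: list[str],
) -> list[tuple[str, str]]:
    """Flag features that appear in neither header array (SUGGESTION severity)."""
    listed = set(documented_available) | set(documented_unavailable)
    return [
        ("SUGGESTION", f'"{feature}" is in neither availableFeatures nor unavailableFeatures')
        for feature in all_features
        if feature not in listed
    ]
```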
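### Check 3: Permissions

Compare documented permissions against:

- Test connection steps (`connection.py`)
- SQL queries executed (`queries.py`)
- API calls made (`client.py` or source code)

Check for: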
For database connectors, verify documented SQL grants match the queries. For cloud connectors (AWS/GCP/Azure), verify IAM policy actions match API calls.
Severity: WARNING for missing permissions, SUGGESTION for unclear ones.
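For cloud connectors, the IAM side of Check 3 can be sketched as a set comparison. The sketch assumes you have already collected the actions the source calls, by reading `client.py` or the source. It does not expand wildcards such as `dynamodb:*`. The helper names are hypothetical.

```python
import json


def documented_iam_actions(policy_json: str) -> set[str]:
    """Collect Allow actions from an IAM policy document (Action may be a string or a list)."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be written as an object
        statements = [statements]
    actions: set[str] = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        action = stmt.get("Action", [])
        actions.update([action] if isinstance(action, str) else action)
    return actions


def check_iam_permissions(required_actions: set[str], policy_json: str) -> list[tuple[str, str]]:
    """Return a WARNING for every action the code calls that the documented policy omits."""
    documented = documented_iam_actions(policy_json)
    return [("WARNING", f"Missing {action}") for action in sorted(required_actions - documented)]
```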
### Check 4: YAML Configuration

Compare the YAML example in `yaml.mdx` against the JSON schema properties:
Severity: WARNING for missing required fields, SUGGESTION for optional ones.
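Here is a minimal sketch of Check 4. It assumes the YAML example has already been parsed (for instance with PyYAML, not shown) and that `config` is the `serviceConnection.config` mapping. Nested objects such as `authType` are not checked. The function name is hypothetical.

```python
def check_yaml_config(schema: dict, config: dict) -> list[tuple[str, str]]:
    """Report schema properties missing from the YAML example, graded by the `required` array."""
    required = set(schema.get("required", []))
    findings = []
    for name in schema.get("properties", {}):
        if name in config:
            continue
        # Missing required fields are warnings; missing optional ones are suggestions.
        severity = "WARNING" if name in required else "SUGGESTION"
        findings.append((severity, f"YAML example is missing {name}"))
    return findings
```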
### Check 5: Section Completeness

Based on the feature truth table, verify the documentation has the right sections:

- `supportsUsageExtraction: true` → Query Usage section should exist
- `supportsLineageExtraction: true` → Lineage section should exist
- `supportsProfiler: true` → Data Profiler section should exist
- `supportsDBTExtraction: true` → dbt section or link should exist
- `supportsDataDiff: true` → Data Quality section should exist

Severity: WARNING for missing sections.
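Check 5 can be approximated by matching section titles against the page's Markdown headings. This assumes each section appears as a `#`-style heading. A dbt link or an imported snippet without a heading would not be detected, so treat misses as candidates for manual review. The names are hypothetical.

```python
import re

# Flag -> section title expected on the page, taken from the list above.
SECTION_FOR_FLAG = {
    "supportsUsageExtraction": "Query Usage",
    "supportsLineageExtraction": "Lineage",
    "supportsProfiler": "Data Profiler",
    "supportsDBTExtraction": "dbt",
    "supportsDataDiff": "Data Quality",
}


def check_sections(flags: dict[str, bool], mdx: str) -> list[tuple[str, str]]:
    """Return a WARNING for each enabled flag whose title appears in no Markdown heading."""
    headings = [h.lower() for h in re.findall(r"^#{1,6}\s+(.+)$", mdx, re.MULTILINE)]
    findings = []
    for flag, title in SECTION_FOR_FLAG.items():
        if flags.get(flag) and not any(title.lower() in h for h in headings):
            findings.append(("WARNING", f"Missing {title} section"))
    return findings
```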
### Check 6: Cross-Version Consistency

If reviewing all versions, check that features and permissions are consistent across versions (unless a known version difference exists).
Severity: SUGGESTION for inconsistencies.
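Given the feature arrays parsed from each version's page, the consistency check is a per-feature membership test. This is a sketch with a hypothetical helper. You still need to filter out known version differences by hand.

```python
def check_cross_version(features_by_version: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return a SUGGESTION for each feature listed in some reviewed versions but not others."""
    all_features = set().union(*features_by_version.values())
    findings = []
    for feature in sorted(all_features):
        missing_in = sorted(v for v, feats in features_by_version.items() if feature not in feats)
        if missing_in:
            findings.append(("SUGGESTION", f'"{feature}" not listed in {", ".join(missing_in)}'))
    return findings
```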
### Check 7: Authentication Types Callout

Verify that the supported authentication types are highlighted at the top of the page in an `<Info>` callout. The callout should appear after the intro text and before the table of contents, listing each authentication method the connector supports (derived from the `authType` property in the JSON schema).

Expected format:

```markdown
<Info>
**Supported Authentication Types:**
- **Basic Auth** — Username and password authentication
- **IAM Auth** — AWS IAM-based authentication with automatic temporary credential retrieval (supports both Provisioned Clusters and Serverless Workgroups)
</Info>
```

Common authentication type labels by schema reference:

- `basicAuth.json` → **Basic Auth**: Username and password authentication
- `iamAuthConfig.json` → **IAM Auth**: AWS IAM-based authentication with automatic temporary credential retrieval
- `awsCredentials.json` → **AWS Credentials**: AWS access key, secret key, and optional session token
- `gcpCredentials.json` → **GCP Credentials**: Google Cloud service account authentication
- `azureCredentials.json` → **Azure Credentials**: Azure service principal or managed identity authentication

Check for:

- An `<Info>` callout with authentication types exists at the top

Severity: WARNING for missing callout, SUGGESTION for incomplete/stale.
This check applies to both the main page and the yaml.mdx page.
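Check 7 and its fix can share one mapping. This sketch renders a callout in the expected format from the `authType` references and checks an existing page by label only, so it leaves extra detail such as deployment types to the author. It does not check placement (after the intro, before the table of contents). The helper names and the TODO fallback for unmapped references are hypothetical.

```python
# Label mapping copied from the list above.
AUTH_LABELS = {
    "basicAuth.json": ("Basic Auth", "Username and password authentication"),
    "iamAuthConfig.json": ("IAM Auth", "AWS IAM-based authentication with automatic temporary credential retrieval"),
    "awsCredentials.json": ("AWS Credentials", "AWS access key, secret key, and optional session token"),
    "gcpCredentials.json": ("GCP Credentials", "Google Cloud service account authentication"),
    "azureCredentials.json": ("Azure Credentials", "Azure service principal or managed identity authentication"),
}
TITLE = "**Supported Authentication Types:**"


def render_auth_callout(auth_refs: list[str]) -> str:
    """Build the <Info> callout; \\u2014 is the dash used by the expected format."""
    lines = ["<Info>", TITLE]
    for ref in auth_refs:
        label, detail = AUTH_LABELS.get(ref, (ref, "TODO: describe this authentication type"))
        lines.append(f"- **{label}** \u2014 {detail}")
    lines.append("</Info>")
    return "\n".join(lines)


def check_auth_callout(mdx: str, auth_refs: list[str]) -> list[tuple[str, str]]:
    """WARNING if the callout is missing; SUGGESTION for each schema auth type it omits."""
    if TITLE not in mdx:
        return [("WARNING", "Missing authentication types <Info> callout")]
    findings = []
    for ref in auth_refs:
        label = AUTH_LABELS.get(ref, (ref, ""))[0]
        if f"**{label}**" not in mdx:
            findings.append(("SUGGESTION", f"Callout does not mention {label}"))
    return findings
```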
## Phase 4: Report

Present a structured report:

```markdown
## Connector Documentation Review: {connector_name}

### Ground Truth (from schema + code)

**Service Type**: {service_type}
**Schema File**: {path}
**Source Files**: {paths}
**Supported Features**: [list]
**Unsupported Features**: [list]
**Required Permissions**: [list with explanations]
**Required Configuration Fields**: [list]

### Findings

#### {version}

| # | Check | Severity | Finding | Current | Expected |
|---|-------|----------|---------|---------|----------|
| 1 | Features | WARNING | Missing "Data Quality" in availableFeatures | [...] | [...] |
| 2 | Permissions | WARNING | Missing dynamodb:DescribeTable | - | Required for metadata extraction |
| ... | | | | | |

### Summary

- **Warnings**: {count} (should fix)
- **Suggestions**: {count} (nice to have)
```
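The Findings and Summary parts of the report can be rendered mechanically. This is a minimal sketch using a hypothetical `Finding` dataclass; the table header rows copy the template above.

```python
from dataclasses import dataclass

HEADER = "| # | Check | Severity | Finding | Current | Expected |"
SEPARATOR = "|---|-------|----------|---------|---------|----------|"


@dataclass
class Finding:
    check: str
    severity: str  # "WARNING" or "SUGGESTION"
    finding: str
    current: str = "-"
    expected: str = "-"


def render_findings(findings_by_version: dict[str, list[Finding]]) -> str:
    """Render the Findings tables (one per version) and the Summary counts."""
    lines = ["### Findings"]
    every: list[Finding] = []
    for version, findings in findings_by_version.items():
        lines += ["", f"#### {version}", HEADER, SEPARATOR]
        for number, f in enumerate(findings, start=1):
            cells = [str(number), f.check, f.severity, f.finding, f.current, f.expected]
            # Escape pipes so values like "View Lineage | Lineage" stay in one cell.
            lines.append("| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |")
        every.extend(findings)
    warnings = sum(f.severity == "WARNING" for f in every)
    suggestions = sum(f.severity == "SUGGESTION" for f in every)
    lines += [
        "",
        "### Summary",
        f"- **Warnings**: {warnings} (should fix)",
        f"- **Suggestions**: {suggestions} (nice to have)",
    ]
    return "\n".join(lines)
```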
## Phase 5: Fix

After presenting the report, fix all findings automatically:

Edit the `<ConnectorDetailsHeader>` component in both the main page and `yaml.mdx` to match the truth table. Ensure both pages use identical feature arrays.
Add missing permissions with clear explanations. Format as:
Make permissions user-friendly:
Update the YAML example to include all schema properties with correct defaults. Update ContentSection descriptions to match schema descriptions.
Add missing documentation sections using the standard snippet pattern. Import the appropriate shared snippets.
If the `<Info>` callout for authentication types is missing or incomplete, add or update it in both the main page and `yaml.mdx`. Insert it after the intro sentence ("In this section, we provide guides and references to use the {connector} connector.") and before the table of contents. Derive the authentication types from the JSON schema's `authType` property and use the label mapping from Check 7. For connectors with cloud-specific auth (IAM, GCP, Azure), include relevant details such as supported deployment types.
Apply the same fixes across all versions being reviewed.
## Phase 6: Verify

After applying fixes, report the before/after counts and the list of fixes:

```
Before: 5 warnings, 3 suggestions
After:  0 warnings, 0 suggestions

Fixed:
  #1 WARNING  Added "Data Quality" to availableFeatures
  #2 WARNING  Added dynamodb:DescribeTable to permissions
  #3 WARNING  Added missing hostPort field to YAML example
  ...
```
## Reference: Feature Strings by Service Type

### Database

```
Available:   "Metadata", "Query Usage", "Data Profiler", "Data Quality", "dbt",
             "View Lineage" | "Lineage", "View Column-level Lineage" | "Column-level Lineage",
             "Stored Procedures", "Sample Data", "Auto-Classification",
             "Owners", "Tags"
Unavailable: same strings for features NOT supported
```

### Pipeline

```
Available/Unavailable: "Pipelines", "Pipeline Status", "Lineage", "Owners", "Usage", "Tags"
```

### Dashboard

```
Available/Unavailable: "Dashboards", "Charts", "Datamodels", "Projects",
                       "Lineage", "Column Lineage", "Owners", "Usage", "Tags"
```

### Messaging

```
Available/Unavailable: "Topics", "Sample Data"
```

### Storage

```
Available/Unavailable: "Metadata", "Structured Containers", "Unstructured Containers"
```

### Search

```
Available/Unavailable: "Search Indexes", "Sample Data"
```

### ML Model

```
Available/Unavailable: "ML Features", "Hyperparameters", "ML Store"
```
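A vocabulary check catches misspelled or wrong-service strings (for example "Column Lineage" on a database page) before the other checks run. The sets below copy the lists above; the helper name is hypothetical.

```python
# Allowed feature strings per --service-type value, copied from the reference lists.
FEATURE_VOCABULARY = {
    "database": {
        "Metadata", "Query Usage", "Data Profiler", "Data Quality", "dbt",
        "View Lineage", "Lineage", "View Column-level Lineage", "Column-level Lineage",
        "Stored Procedures", "Sample Data", "Auto-Classification", "Owners", "Tags",
    },
    "pipeline": {"Pipelines", "Pipeline Status", "Lineage", "Owners", "Usage", "Tags"},
    "dashboard": {
        "Dashboards", "Charts", "Datamodels", "Projects",
        "Lineage", "Column Lineage", "Owners", "Usage", "Tags",
    },
    "messaging": {"Topics", "Sample Data"},
    "storage": {"Metadata", "Structured Containers", "Unstructured Containers"},
    "search": {"Search Indexes", "Sample Data"},
    "mlmodel": {"ML Features", "Hyperparameters", "ML Store"},
}


def unknown_feature_strings(service_type: str, features: list[str]) -> list[str]:
    """Return header entries that are not valid feature strings for this service type."""
    allowed = FEATURE_VOCABULARY[service_type]
    return [f for f in features if f not in allowed]
```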
## Reference: Schema Flags

| JSON Schema Flag | Default | Maps To |
|---|---|---|
| `supportsMetadataExtraction` | true | "Metadata" |
| `supportsUsageExtraction` | true | "Query Usage" |
| `supportsLineageExtraction` | true | "View Lineage" or "Lineage" |
| `supportsViewLineageExtraction` | true | "View Column-level Lineage" or "Column-level Lineage" |
| `supportsProfiler` | true | "Data Profiler" |
| `supportsDBTExtraction` | true | "dbt" |
| `supportsDataDiff` | true | "Data Quality" |
| `supportsSystemProfile` | false | (no direct feature, informational) |
| `supportsQueryComment` | true | (no direct feature, informational) |
| `supportsDatabase` | true | (no direct feature, structural) |
| `storedProcedureFilterPattern` | (present/absent) | "Stored Procedures" |
| `sampleDataStorageConfig` | (present/absent) | "Sample Data" |
| `supportsProfiler` (implicit) | same as profiler | "Auto-Classification" |
## Permission Documentation Guidelines

When documenting permissions, follow these guidelines:

### Database Connectors

```markdown
### Requirements

To extract metadata, the user needs the following permissions:

#### Metadata Ingestion

- `USAGE` on schemas — to list and access schemas
- `SELECT` on tables — to read table metadata and sample data

#### Profiler & Data Quality

- `SELECT` on tables — to run profiling queries

#### Usage & Lineage

- Access to query history views (e.g., `pg_stat_statements`, `stl_query`)
```
### Cloud Connectors (AWS)

IAM policy documents must be valid JSON, which does not allow comments, so explain each action in a list after the policy:

````markdown
### Requirements

The IAM user/role needs the following permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "service:ListAction",
        "service:DescribeAction",
        "service:ReadAction"
      ],
      "Resource": "*"
    }
  ]
}
```

- `service:ListAction`: required for discovering resources
- `service:DescribeAction`: required for extracting metadata
- `service:ReadAction`: required for profiling/sampling
````

### Cloud Connectors (GCP)

```markdown
### Requirements

The service account needs the following roles:

- `roles/viewer` — for metadata extraction
- `roles/bigquery.dataViewer` — for profiling and sampling
```
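IAM policy documents are strict JSON and do not allow `//` comments, so per-action explanations have to live outside the policy. Here is a sketch that generates both parts from one `{action: reason}` mapping. The helper name is hypothetical, and the DynamoDB actions in the usage are illustrative.

```python
import json


def build_iam_policy(actions: dict[str, str]) -> tuple[str, str]:
    """Return (policy JSON, Markdown bullet list) from an {action: reason} mapping."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": list(actions), "Resource": "*"}],
    }
    # Reasons go into a separate list because the policy itself must stay valid JSON.
    bullets = "\n".join(f"- `{action}`: {reason}" for action, reason in actions.items())
    return json.dumps(policy, indent=2), bullets
```

Usage: `build_iam_policy({"dynamodb:ListTables": "required for discovering tables"})` returns the policy text for the `json` fence and the bullet line that follows it.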
## YAML Configuration Guidelines

Each configuration field should have a `ContentSection` with:

- All `required` schema fields MUST appear in the YAML example
- Sensitive values use placeholders such as `"{password}"` and `"{access_key}"`
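You can check placeholder use for secrets against the schema. This sketch assumes secret fields are marked `"format": "password"` in the JSON schema, as OpenMetadata's auth schemas typically do. It inspects only top-level keys of the parsed YAML config. The helper name is hypothetical.

```python
import re

# A placeholder is a single {name} token, e.g. "{password}".
PLACEHOLDER = re.compile(r"^\{[^{}]+\}$")


def non_placeholder_secrets(schema: dict, config: dict) -> list[str]:
    """Return secret fields whose example value is not a {placeholder}."""
    secrets = [
        name
        for name, prop in schema.get("properties", {}).items()
        if prop.get("format") == "password" and name in config
    ]
    return [name for name in secrets if not PLACEHOLDER.match(str(config[name]))]
```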