| name | tuning-incremental-sync-config |
| description | Change the sync configuration of an existing data warehouse schema — switch sync_type, pick a different incremental_field, set primary_key_columns, choose cdc_table_mode, or change sync_frequency. Use when the user asks "switch my orders table from full refresh to incremental", "this table is syncing too slowly / too frequently", "I need to pick a different incremental column", "set up CDC for this Postgres table", or when diagnosis of a failing sync pointed to an incremental-field or PK misconfiguration.
|
Tuning incremental sync config
A sync's configuration lives on the ExternalDataSchema and can be changed any time via
external-data-schemas-partial-update. Most changes are non-destructive (take effect on the next sync), but a few
(switching sync_type, changing primary keys) require careful handling to avoid corrupting the synced data.
When to use this skill
- The user wants to change how an already-connected table is synced
- A diagnosis flagged the incremental field or primary key as wrong
- The table is syncing too often / not often enough
- Switching an incremental table to CDC (or vice versa)
- The source table was changed on the other side (new columns, dropped columns) and the sync config needs to catch up
If the user is setting up a brand-new source, use setting-up-a-data-warehouse-source instead — configuration is
chosen at creation time there.
Available tools
| Tool | Purpose |
|---|
external-data-schemas-retrieve | Current sync_type, incremental_field, PKs, sync_frequency |
external-data-schemas-incremental-fields-create | Refresh candidate incremental fields from the live source |
external-data-schemas-partial-update | Apply the config change |
external-data-schemas-reload | Trigger a sync with the new config |
external-data-schemas-resync | Wipe and re-import from scratch when the change invalidates existing data |
external-data-schemas-delete-data | Drop the synced table while keeping the schema entry |
external-data-sources-check-cdc-prerequisites-create | Pre-flight Postgres CDC (only when switching to/from CDC) |
external-data-sources-webhook-info-retrieve | Current webhook state (when switching to/from sync_type=webhook) |
external-data-sources-create-webhook-create | Register a webhook after switching a schema to sync_type=webhook |
external-data-sources-update-webhook-inputs-create | Rotate a webhook signing secret |
external-data-sources-delete-webhook-create | Unregister webhook when switching schemas off sync_type=webhook |
The fields you can tune
From the partial-update endpoint:
| Field | Values | Notes |
|---|
sync_type | full_refresh, incremental, append, cdc, webhook | Source must support the target type — check via incremental-fields |
incremental_field | Column name from the source | Must appear in incremental_fields list for the schema |
incremental_field_type | datetime, date, timestamp, integer, numeric, objectid | Must match the column's real type |
primary_key_columns | Array of column names | Required for CDC. Used for upsert dedup on incremental |
cdc_table_mode | consolidated, cdc_only, both | Only meaningful when sync_type=cdc |
sync_frequency | 1min, 5min, 15min, 30min, 1hour, 6hour, 12hour, 24hour, 7day, 30day, never | Applies to all non-CDC types |
sync_time_of_day | HH:MM:SS | When sync_frequency is daily/weekly-scale |
should_sync | true / false | Pause the schema without deleting it |
Workflow
Step 1 — Read the current config
Always start with external-data-schemas-retrieve({id}). Understanding the current state prevents mistakes like
"fixing" an incremental_field that's actually correct.
Note:
- Current
sync_type, incremental_field, incremental_field_type, primary_key_columns
- Current
status (don't tune a schema that's currently Running — wait or cancel first)
last_synced_at (so you can tell if the next sync worked)
latest_error if present (the error often tells you exactly what to change)
Step 2 — If changing sync_type or incremental_field, refresh candidates
Call external-data-schemas-incremental-fields-create({id}). Even though the operation name says "create", it
re-reads the source and returns the current candidate fields — use it to confirm the field you want to set actually
exists on the source and which sync types are now available for this table.
The response:
{
"incremental_fields": [{"field": "updated_at", "type": "datetime", ...}, ...],
"incremental_available": true,
"append_available": true,
"cdc_available": true,
"full_refresh_available": true,
"detected_primary_keys": ["id"],
"available_columns": [...]
}
If your target incremental_field isn't in the list, tell the user — they need to either pick a different field or
change the source table to add one.
Step 3 — Apply the change
Call external-data-schemas-partial-update({id}, {...changed fields}).
Only send the fields that are actually changing. Partial update means unspecified fields stay as they are.
Examples:
{
"sync_type": "incremental",
"incremental_field": "updated_at",
"incremental_field_type": "datetime"
}
{"sync_frequency": "1hour"}
{"primary_key_columns": ["tenant_id", "order_id"]}
{"should_sync": false}
Step 4 — Decide whether existing data is still valid
This is the step that's easy to get wrong. Some config changes invalidate the synced data; others don't.
Changes that DON'T invalidate existing data:
sync_frequency, sync_time_of_day — scheduling only
should_sync — on/off
cdc_table_mode in most cases — next sync will start writing to the new shape, but historical consolidated rows
stay valid
- Switching between
incremental and full_refresh with the same incremental_field — next sync just re-runs
fresh
- Switching to or from
sync_type: "webhook" — the synced data stays valid; only the ingestion path changes.
Remember to register or unregister the webhook (see sections below) alongside the sync_type change.
Changes that MAY invalidate existing data and need a resync:
- Changing
incremental_field to a different column — the high-water mark is from the old column and won't match.
Without a resync you'll miss rows that were updated between the two fields' histories.
- Changing
primary_key_columns — existing rows may be deduplicated incorrectly against new PK definitions.
- Switching from
full_refresh to append — the existing rows don't have the version-history shape that append
expects.
- Switching from
append to full_refresh — opposite problem; you'll end up with duplicate historical versions.
- Switching to/from
cdc — the table shape changes fundamentally.
When the change invalidates data, the clean flow is:
external-data-schemas-partial-update with the new config
- Warn the user this is destructive
external-data-schemas-resync to wipe and re-import under the new config
Or equivalently, external-data-schemas-delete-data → external-data-schemas-reload. delete-data + reload is
cleaner when the table is large and the user wants to start from zero.
Step 5 — Trigger and confirm
For non-destructive changes, call external-data-schemas-reload({id}) to pick up the new config immediately rather
than waiting for the schedule.
Wait a moment, then external-data-schemas-retrieve({id}) to confirm status = Running then Completed. Report
last_synced_at and any new latest_error.
Specific common changes
Switching full_refresh → incremental
incremental-fields-create to confirm the desired field exists and incremental_available: true.
partial-update: {sync_type: "incremental", incremental_field, incremental_field_type}.
- No data wipe needed — next sync just switches strategy. If the source is growing fast, the next incremental sync
is the cheap one.
Switching incremental → cdc (Postgres only)
- Run
external-data-sources-check-cdc-prerequisites-create on the parent source. Only proceed if valid: true.
incremental-fields-create to confirm cdc_available: true and see detected_primary_keys.
partial-update: {sync_type: "cdc", primary_key_columns: [...], cdc_table_mode: "consolidated"}.
- Resync required — CDC tables have a different shape. Trigger
external-data-schemas-resync after the update.
Warn the user this wipes existing data.
Fixing a stale incremental field after schema drift
Source dropped the updated_at column. Sync has been failing with "column does not exist".
incremental-fields-create to see what fields remain.
- Pick a replacement (or switch to
full_refresh if none are suitable).
partial-update with the new field + type (or new sync_type).
reload to retry.
Changing primary keys on a CDC table
partial-update: {primary_key_columns: [...]}.
- Resync required — existing CDC tombstones and upsert keys won't match the new PK definition, leading to row
duplication or missed updates.
resync, warn the user.
Changing sync_frequency
partial-update: {sync_frequency: "1hour"}.
- No reload needed — the next scheduled sync picks up the new cadence. Or reload manually if the user wants to
confirm nothing broke.
Switching a schema to sync_type: "webhook"
Only works for sources that implement WebhookSource (today: Stripe) and tables where supports_webhooks: true
from incremental-fields-create.
incremental-fields-create to confirm supports_webhooks: true for the table.
partial-update: {sync_type: "webhook"}.
- If the source doesn't already have a webhook registered (check with
webhook-info-retrieve), call
external-data-sources-create-webhook-create({source_id}) to register it.
- No resync required — the schema's existing bulk-synced data stays, and the webhook becomes the primary ingestion
path once the next reconciliation finishes.
- Keep
sync_frequency set (e.g. 24hour) — it acts as a safety-net reconciliation in case any webhook delivery
is missed.
Switching off sync_type: "webhook"
partial-update: {sync_type: "incremental"} (or whatever bulk type is appropriate) with the required
incremental_field + incremental_field_type.
- If no other schemas on the source are still using
sync_type: "webhook", call
external-data-sources-delete-webhook-create({source_id}) to unregister. Leaving an orphaned webhook
registered on the source side just means events will be received and dropped — not harmful, but messy.
- If other schemas on the source are still on webhook, leave the webhook registered — it's shared across all
webhook-type schemas on the source.
Rotating a webhook signing secret
The source's signing secret (e.g. Stripe's whsec_...) was rotated, and payloads are now failing signature
verification.
- Grab the new secret from the source's dashboard.
external-data-sources-update-webhook-inputs-create({source_id}, {inputs: {signing_secret: "whsec_..."}}).
- No reload needed — the next inbound webhook payload will verify against the new secret.
Pausing a schema
partial-update: {should_sync: false}. Schema stops syncing but stays configured.
- To resume later:
partial-update: {should_sync: true}, then reload for an immediate run.
Important notes
- Read before you write. Always retrieve the current config first.
partial-update doesn't complain if you set a
field to the value it already had, but you might be about to change something you didn't realize was already set.
- Not every sync_type is available on every schema. The
incremental-fields-create response tells you what's
available right now, which can be different from what was available at creation (e.g. CDC may have been
enabled for the team since).
- Wipe when the shape changes. Switching sync strategy often changes the physical table. If you don't resync,
you'll be mixing row shapes and queries will return garbage.
- CDC needs prerequisites. Never switch to
sync_type: "cdc" without running check-cdc-prerequisites-create
first. The sync will just fail immediately.
- Don't touch a Running schema. If the schema is currently running, either wait for it to finish or
external-data-schemas-cancel before applying the change. Updating config mid-sync can leave the incremental
high-water mark inconsistent.
- Sync frequency is cheap to change. Encourage experimentation there. Sync_type and incremental_field are
expensive to change — encourage care.
- Webhooks are registered at the source level, not the schema level. Multiple webhook-type schemas on the same
source share one webhook registration. Only delete the webhook when the last webhook-type schema on that
source is being switched away, otherwise other schemas stop receiving pushes.