# cloudflare-preview-worker-debugging
Debug Cloudflare preview Worker/Containers deploy failures by separating credential, deploy, DNS/custom-domain, Access, and container-runtime causes using `wrangler` plus Cloudflare/GitHub APIs.
| name | cloudflare-preview-worker-debugging |
| description | Debug Cloudflare preview Worker/Containers deploy failures by separating credential, deploy, DNS/custom-domain, Access, and container-runtime causes using wrangler plus Cloudflare/GitHub APIs. |
| version | 1.0.0 |
| author | Hermes Agent |
| license | MIT |
| metadata | {"hermes":{"tags":["Cloudflare","Workers","Containers","GitHub-Actions","DNS","Debugging"]}} |
Use when a GitHub Actions preview deploy for a Cloudflare Worker/Container fails, especially if the workflow ends in a vague health-check timeout.
A common failure mode is not bad Cloudflare credentials and not a broken container. The deploy can succeed while the workflow still fails because:
- the health check runs `curl -s ... || true` and hides the real error.

This skill is for separating those layers cleanly.
- `CLOUDFLARE_API_TOKEN`
- `CLOUDFLARE_ACCOUNT_ID`
- `wrangler` available, or run via `corepack pnpm dlx wrangler`

Do not guess from the workflow name.
Inspect the failed job and identify whether failure happened in:
- `wrangler deploy`, or
- a later step such as the health check.

For GitHub REST:
- `GET /repos/{owner}/{repo}/actions/jobs/{job_id}`
- `GET /repos/{owner}/{repo}/actions/jobs/{job_id}/logs`

What to look for:
- If the logs show `Deploy to Cloudflare succeeded` and `Health check failed`, credentials are probably fine.
- If you see `Not ready yet (response: ). Retrying in 60s...` with an empty response, then suspect DNS resolution or stderr being swallowed.

Check local env without printing secrets. Only confirm presence/length/masked value.
Then confirm CI also received them by looking for masked env lines in the job logs:
```
CLOUDFLARE_API_TOKEN: ***
CLOUDFLARE_ACCOUNT_ID: ***
```

If `wrangler deploy` succeeded, this is further evidence credentials are not the root cause.
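The local presence check can be scripted. Below is a minimal sketch; the helper name `check_secret` is my own, not from the repo. It reports only presence and length, never the value:

```shell
# check_secret NAME: report whether the named env var is set, and its length,
# without ever printing its value.
check_secret() {
  name="$1"
  eval "val=\${$name:-}"
  if [ -z "$val" ]; then
    echo "$name: MISSING"
    return 1
  fi
  echo "$name: set (length ${#val})"
}

check_secret CLOUDFLARE_API_TOKEN || true
check_secret CLOUDFLARE_ACCOUNT_ID || true
```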
Useful commands:
```shell
corepack pnpm dlx wrangler whoami
corepack pnpm dlx wrangler deployments list --name <worker-name> --json
corepack pnpm dlx wrangler versions list --name <worker-name>
corepack pnpm dlx wrangler versions view <version-id> --name <worker-name> --json
corepack pnpm dlx wrangler containers list
corepack pnpm dlx wrangler containers info <container-app-id>
corepack pnpm dlx wrangler containers instances <container-app-id>
```
Interpretation:
- `state=ready`, healthy instances > 0, failed instances = 0 -> container is not the primary failure.
- `containers instances` may still say no running instances if nothing is currently active; do not over-interpret that alone.
- The worker may be deployed even if the custom domain is not yet usable from the runner.
Helpful API endpoints:
- `GET /accounts/{account_id}/workers/services/{service}`
- `GET /accounts/{account_id}/workers/scripts/{script}/domains`
- `GET /accounts/{account_id}/workers/domains/records/{domain_record_id}`
- `GET /zones/{zone_id}/dns_records?name=<hostname>`

Important pattern:
- Worker custom domains appear as an `AAAA 100::` record with `proxied=true`.

Then probe the health endpoint from the runner:

```shell
curl -sS https://preview-host.example.com/health
```
Possible outcomes:
- `curl: (6) Could not resolve host` -> DNS / propagation issue
- `302` to Cloudflare Access login -> Access gate is working, but request lacks auth
- `200` with expected JSON -> route is healthy
- `5xx` / timeout -> app or network issue

This is a powerful diagnostic:
```shell
curl --resolve preview-host.example.com:443:104.18.8.21 \
  https://preview-host.example.com/health
```
If this works while plain hostname fails, the root cause is custom-domain DNS propagation/resolution, not the worker/container.
You can extract CF_ACCESS_CLIENT_ID and CF_ACCESS_CLIENT_SECRET from Worker settings/version bindings and test directly:
```shell
curl --resolve preview-host.example.com:443:104.18.8.21 \
  -H "CF-Access-Client-Id: ..." \
  -H "CF-Access-Client-Secret: ..." \
  https://preview-host.example.com/health
```
Interpretation:
- `302` without headers, `200` with headers -> Access was expected and functioning.
- `200` with headers + expected SHA -> worker/container is healthy.

Run:
```shell
corepack pnpm dlx wrangler tail <worker-name> --format json
```
Then generate a request.
What to look for:
- `/health` returning `200`
- `Error checking 80: The container is not listening in the TCP address 10.0.0.1:80`

This specific message can appear during early container startup and may be transient. If the final response is `200`, it is not the root cause of the workflow failure.
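A tail session is noisy; a small stdin filter (the helper name is my own) can isolate just the two signals above:

```shell
# tail_signals: filter a wrangler-tail stream (or any log stream on stdin)
# down to the lines relevant to this diagnosis.
tail_signals() {
  grep -E '/health|not listening in the TCP address'
}

# Usage (fill in <worker-name>):
#   corepack pnpm dlx wrangler tail <worker-name> --format json | tail_signals
```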
For Notifly-style Cloudflare Container deployments, do not stop at "the worker has env bindings". There are three separate layers that can drift:
- the app's runtime reads (`src/utils/env.ts`, direct `process.env.*` reads)
- the deploy workflow (`cf_deploy.yml` generated vars)
- the worker-to-container passthrough (`worker/index.ts` `envVars`, plus `worker/env.d.ts`)

A real failure mode is:
- a variable the app needs but missing from `cf_deploy.yml`,
- or present in `cf_deploy.yml` but missing from `worker/index.ts` `envVars`.

Check these files together:
- `services/server/web-console/src/utils/env.ts`
- `process.env.*` usages under the service
- `.github/workflows/cf_deploy.yml`
- `services/server/web-console/worker/index.ts`
- `services/server/web-console/worker/env.d.ts`
- `services/server/web-console/task-definitions-*.json`

Then apply these rules:

- Missing in `cf_deploy.yml` -> Worker never receives it.
- Present in `cf_deploy.yml` but missing in `worker/index.ts` `envVars` -> Worker has it, container does not.

In web-console, Cloudflare preview env drift included these categories:
Missing from Cloudflare deploy vs ECS/runtime:
- `APPLICATION_NAME`
- `INTERNAL_API_SERVICE_URL`
- `SLACK_NOTIFLY_OPS_BOT_TOKEN`
- `SLACK_NOTIFLY_OPS_JOB_REPORT_CHANNEL_ID`

Present in `cf_deploy.yml` but missing from worker-to-container passthrough:

- `KAKAO_BZM_CENTER_API_URL`
- `KAKAO_BZM_CENTER_UPLOAD_API_URL`
- `KAKAO_BZM_CENTER_PARTNER_KEY`

This kind of drift is a real bug, even when it is not the direct cause of the immediate health-check failure.
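Drift like this can be surfaced mechanically. A rough sketch (the helper name and the crude ALL_CAPS regex are my assumptions; it will over-match, so review output by hand rather than gating CI on it):

```shell
# env_drift FILE_A FILE_B: print UPPER_SNAKE_CASE identifiers that appear in
# FILE_A but not in FILE_B. Crude by design: any ALL_CAPS token matches.
env_drift() {
  tmp_a=$(mktemp)
  tmp_b=$(mktemp)
  grep -ohE '[A-Z][A-Z0-9_]{2,}' "$1" | sort -u > "$tmp_a"
  grep -ohE '[A-Z][A-Z0-9_]{2,}' "$2" | sort -u > "$tmp_b"
  comm -23 "$tmp_a" "$tmp_b"
  rm -f "$tmp_a" "$tmp_b"
}

# Example: vars the deploy workflow defines that the worker never passes on:
#   env_drift .github/workflows/cf_deploy.yml services/server/web-console/worker/index.ts
```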
Take `APPLICATION_NAME` as an example. If the app only uses it for DB `application_name` tagging via `resolvePgApplicationName`, its absence may degrade observability without crashing startup. Distinguish variables whose absence breaks startup from variables whose absence only degrades behavior or observability.
A transient log such as:
```
Error checking 80: The container is not listening in the TCP address 10.0.0.1:80
```
can happen during cold start before the app begins listening.
If the same request later returns 200, treat this as transient readiness delay, not proof of a crash loop.
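A health poll that tolerates cold start should retry rather than fail on the first non-200. A sketch with an injectable probe command so the retry logic itself is checkable; the curl line in the trailing comment is the assumed real probe:

```shell
# wait_ready PROBE MAX_TRIES: run PROBE (a command that prints an HTTP status
# code) until it prints 200 or attempts are exhausted.
wait_ready() {
  probe="$1"
  tries="$2"
  i=1
  while [ "$i" -le "$tries" ]; do
    code=$($probe)
    if [ "$code" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "${RETRY_SLEEP:-0}"   # a real workflow would sleep e.g. 10-60s
    i=$((i + 1))
  done
  echo "not ready after $tries attempts"
  return 1
}

# Real usage would pass a curl probe, e.g.:
#   wait_ready 'curl -s -o /dev/null -w %{http_code} https://preview-host.example.com/health' 10
```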
For the Notifly web-console container, startup is not instantaneous because `entrypoint.sh` first launches multiple `cloudflared access tcp` processes, waits briefly, and only then starts `node server.js`.
Use these rules:

- Hostname does not resolve from the runner, but `--resolve` against a Cloudflare edge IP works -> Diagnosis: custom-domain DNS propagation/resolution issue.
- Request is redirected (`302`) to a Cloudflare Access login -> Diagnosis: workflow health check is missing/incorrect Access auth, or route is protected differently than expected.
- Request reaches the worker but returns `5xx` or times out -> Diagnosis: app/container readiness or routing issue.
- `wrangler deploy` itself failed -> Diagnosis: inspect credentials, wrangler config, or Cloudflare API errors first.
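These rules can be collapsed into one dispatcher over the curl exit code and HTTP status. A sketch; the function and category names are mine, mirroring the rule set above:

```shell
# diagnose CURL_EXIT HTTP_STATUS: map a probe result onto the rules above.
diagnose() {
  if [ "$1" -eq 6 ]; then
    echo "dns-propagation"            # hostname did not resolve from the runner
    return
  fi
  case "$2" in
    302) echo "access-auth" ;;        # Access challenge; request lacked auth
    200) echo "healthy" ;;
    5??) echo "app-readiness" ;;      # reached the edge, app not answering well
    *)   echo "check-credentials-and-config" ;;
  esac
}
```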
- `curl -s ... || true` hides the real error. An empty response in logs may actually be DNS failure.
- The workflow can fail even though `wrangler deploy` completed successfully.
- `workers.dev` hostname behavior may differ from custom domain behavior; use it as an extra signal, not the only truth.

If root cause is DNS/custom-domain propagation:
- surface `curl` stderr (`-Ssv` or equivalent).

A real Cloudflare failure mode is:
- the preview hostname returning `NXDOMAIN` from public resolvers.

In one confirmed case, Cloudflare support traced this to an internal service that propagates DNS changes to authoritative nameservers incorrectly skipping an update for the zone. After Cloudflare deployed their fix:
- `dig <preview-host>` returned normal A records.

So when all of these are simultaneously true:
- `wrangler deploy` succeeds
- the hostname still returns `NXDOMAIN`

then keep a platform/vendor DNS propagation bug high on the hypothesis list. The correct next step may be support escalation plus a later deploy retry, not more application changes.
For Cloudflare preview deployments that expose both a workers.dev hostname and a branded custom domain, a better production workflow is:
- run `wrangler deploy`
- enable the `workers.dev` subdomain explicitly via the Cloudflare API
- poll `workers.dev/health` until the expected SHA is serving
- verify that an unauthenticated `workers.dev/health` is still challenged by Access
- if the subdomain was enabled only for diagnostics on `workers.dev`, disable that subdomain again as cleanup

Do not let the workflow go green just because `workers.dev` is healthy while the real preview domain is publicly resolvable but broken.
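Toggling the `workers.dev` subdomain from CI can be done against the Cloudflare REST API. This is a sketch: the `/workers/scripts/{script}/subdomain` path is my assumption (believed to be what wrangler's own `workers_dev` setting uses), so verify it against the current Cloudflare API reference before relying on it:

```shell
# Sketch: toggle the workers.dev subdomain for a script via the Cloudflare API.
# NOTE: the endpoint path below is an assumption; confirm it in the current
# Cloudflare API docs before depending on it in CI.
subdomain_url() {
  printf 'https://api.cloudflare.com/client/v4/accounts/%s/workers/scripts/%s/subdomain' "$1" "$2"
}

toggle_workers_dev() {   # toggle_workers_dev ACCOUNT_ID SCRIPT_NAME true|false
  curl -sS -X POST "$(subdomain_url "$1" "$2")" \
    -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
    -H 'Content-Type: application/json' \
    --data "{\"enabled\": $3}"
}

# e.g. enable for diagnostics, then disable in an always-run cleanup step:
#   toggle_workers_dev "$CLOUDFLARE_ACCOUNT_ID" my-preview-worker true
#   toggle_workers_dev "$CLOUDFLARE_ACCOUNT_ID" my-preview-worker false
```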
The only safe tolerance case is when the custom domain fails solely because of a confirmed DNS propagation delay while `workers.dev` is already serving the expected SHA. That distinguishes a platform-side propagation delay from an actual application failure.
A weak check like "response did not include .sha" is not enough. The unauthenticated workers.dev probe should explicitly expect an Access-style challenge, e.g. one of:
- `302`
- `401`
- `403`
- a response mentioning `Cloudflare Access`

Otherwise a public HTML page, redirect loop, or other non-health response can falsely look "protected".
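The strict check can be expressed as a tiny predicate over the probe's status and body (the helper name is mine):

```shell
# looks_protected STATUS BODY: succeed only for a genuine Access-style
# challenge: 302/401/403, or a body that explicitly mentions Cloudflare Access.
looks_protected() {
  case "$1" in
    302|401|403) return 0 ;;
  esac
  case "$2" in
    *"Cloudflare Access"*) return 0 ;;
  esac
  return 1
}
```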
If you enable `workers.dev` only for CI diagnostics, add a failure cleanup step that disables it again. Otherwise a failed deployment may leave an unnecessary public alternate hostname behind.

When a Cloudflare preview deploy fails after `wrangler deploy` succeeded, first test whether the hostname resolves from the runner. If not, the likely culprit is DNS/custom-domain propagation, not credentials and not the container runtime.