
tokenkey-stage0-edge-ip-rotation

Name: Tokenkey Stage0 Edge Ip Rotation
Author: youxuanxue

stars:2

forks:0

updated: May 23, 2026, 22:56

SKILL.md

name

tokenkey-stage0-edge-ip-rotation

description

Rotate / replace the egress Elastic IP of a TokenKey Stage0 edge (uk1/us1/sg1/fra1/…) when the live IP has been risk-blocked ("polluted") by an upstream API (Anthropic / OpenAI / Google). Drives the single canonical path: a workflow_dispatch of deploy-edge-stage0.yml with operation=rotate_egress_ip, which performs a CFN-native UpdateStack (no detach, no IMPORT, and therefore no template-vs-live drift). Auto-allocates a clean candidate (checked against edge-polluted-ips.json), swaps it in via CFN, verifies from the edge itself that SSM is Online, that the outbound IP is the new one, and that the Anthropic/OpenAI/Google pollution probe comes back clean, and auto-reverts on a polluted result. The only remaining operator steps are the DNS A-record update at Porkbun and committing the retired IP into edge-polluted-ips.json.

TokenKey: rotate an edge gateway's egress EIP

v2 (OPC). Replaces the v1 manual multi-step nano-probe / CFN-IMPORT runbook. The deploy workflow now owns rotation end-to-end; this skill is a thin wrapper that decides which workflow input to pass, not a sequence of bash commands.

The previous v1 runbook in docs/deploy/tokenkey-edge-ip-history.md is retained as the historical & recovery reference — read it only if (a) you are doing the one-time per-stack migration via deploy/aws/stage0/migrate-edge-eip-to-parameter.sh on a stack that has not yet been converted to EIP-as-parameter, or (b) the v2 path failed in a way that requires hand-recovery (rare).

Why this is short now

The earlier multi-step procedure existed because the CFN template treated the EIP as a stack-managed resource (an AWS::EC2::EIP plus an EIPAssociation, with Retain). Manual EIP swaps then left the template out of sync with the live state, and recovering required detach + IMPORT; that recovery sequence (specifically the IMPORT step on the EIPAssociation) is what silently disconnected the SSM agent on edge-uk1 on 2026-05-22.

The template has since been refactored so the EIP is an external EipAllocationId parameter, not a resource. When the parameter changes, CloudFormation natively disassociates the old EIP and associates the new one; the instance, its IAM profile, and its SSM agent are never touched. The entire class of "drift" disappears, and with it most of this skill's former bulk.

One canonical invocation

gh workflow run deploy-edge-stage0.yml \
  -f edge_id=<id> \
  -f operation=rotate_egress_ip \
  -f confirm_stack=tokenkey-edge-<id>-stage0 \
  -f rotation_reason='<short reason>' \
  [-f candidate_allocation_id=eipalloc-XXXX]

edge_id must match a key in deploy/aws/stage0/edge-targets.json (normalize edge-uk1 to uk1). rotation_reason is required; it ends up on the new EIP's tokenkey:replaces-reason tag and in the edge-polluted-ips.json snippet of the run summary.
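The input rules above can be sketched in Python. This is a minimal, hypothetical helper (normalize_edge_id and build_rotate_argv are not part of the repo); it assumes the caller has already loaded the keys of deploy/aws/stage0/edge-targets.json and passes them in, and it only builds the argv for the canonical gh invocation rather than running it.

```python
from typing import Iterable, List, Optional


def normalize_edge_id(raw: str) -> str:
    """Strip an optional 'edge-' prefix, e.g. 'edge-uk1' -> 'uk1'."""
    raw = raw.strip()
    return raw[len("edge-"):] if raw.startswith("edge-") else raw


def build_rotate_argv(raw_edge_id: str, reason: str, known_edges: Iterable[str],
                      candidate_allocation_id: Optional[str] = None) -> List[str]:
    """Build the `gh workflow run` argv for operation=rotate_egress_ip (hypothetical helper)."""
    edge_id = normalize_edge_id(raw_edge_id)
    if edge_id not in set(known_edges):
        raise ValueError(f"unknown edge_id: {edge_id!r}")
    if not reason.strip():
        raise ValueError("rotation_reason is required")
    argv = [
        "gh", "workflow", "run", "deploy-edge-stage0.yml",
        "-f", f"edge_id={edge_id}",
        "-f", "operation=rotate_egress_ip",
        # confirm_stack repeats the full target stack name (assumed to act as a typed confirmation).
        "-f", f"confirm_stack=tokenkey-edge-{edge_id}-stage0",
        "-f", f"rotation_reason={reason.strip()}",
    ]
    # The optional pre-vetted allocation is passed through only when set.
    if candidate_allocation_id:
        argv += ["-f", f"candidate_allocation_id={candidate_allocation_id}"]
    return argv
```

Passing the list to subprocess.run(argv, check=True) rather than building a shell string avoids quoting problems when rotation_reason contains spaces or quotes.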

candidate_allocation_id is optional. If unset, the workflow allocates a fresh EIP and refuses any allocation that lands on a known-polluted IP. Set it only when the operator has pre-vetted a specific allocation outside the workflow (rare — only useful if the auto path keeps drawing dirty IPs).
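The "allocate, cross-check, re-allocate if dirty" behaviour can be sketched as a bounded retry loop. This sketch assumes the polluted list is available as a set of IP strings and that allocate is any callable returning an (allocation_id, public_ip) pair (in a real run it might wrap aws ec2 allocate-address --domain vpc). The names, max_attempts, and the optional release callback are illustrative; whether the workflow releases each dirty draw before retrying is an assumption.

```python
from typing import Callable, Optional, Set, Tuple

Allocation = Tuple[str, str]  # (allocation_id, public_ip)


def pick_clean_candidate(allocate: Callable[[], Allocation],
                         polluted_ips: Set[str],
                         release: Optional[Callable[[str], None]] = None,
                         max_attempts: int = 5) -> Allocation:
    """Draw EIPs until one is not on the known-polluted list (hypothetical helper)."""
    for _ in range(max_attempts):
        alloc_id, ip = allocate()
        if ip not in polluted_ips:
            return alloc_id, ip
        # Dirty draw: hand it back so it is not left allocated (assumed behaviour).
        if release is not None:
            release(alloc_id)
    raise RuntimeError(f"no clean EIP after {max_attempts} attempts; consider another region")
```

Bounding the loop matches the "Out of scope" stance below: repeated dirty draws in one region are a signal to change region, not to keep retrying.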

What the workflow does, in order:

1. Reads the stack's current EipAllocationId (the rollback target).
2. Allocates a fresh EIP unless candidate_allocation_id is set; cross-checks it against edge-polluted-ips.json and re-allocates if it is dirty.
3. Runs aws cloudformation deploy --parameter-overrides EipAllocationId=<new>: an atomic CFN swap.
4. Polls ssm:DescribeInstanceInformation until PingStatus=Online (a post-mutation invariant; the uk1 incident of 2026-05-22 motivated this gate).
5. Runs the pollution probe via SSM on the edge itself (no throwaway nano): confirms the outbound IP, then curls Anthropic / OpenAI / Google with dummy keys, treating 403 + Cloudflare HTML as polluted and 401/400 + provider-shaped JSON as clean.
6. On polluted: automatic revert (CFN update-stack back to OLD_ALLOC), plus release of the freshly allocated EIP (only if the workflow itself allocated it).
7. On clean: curls https://<domain>/health with --resolve <domain>:443:<new_ip> to prove the data plane works end-to-end on the new IP before DNS propagates.
8. Emits a step summary with the old/new IP, a retired-IP JSON snippet ready to paste into edge-polluted-ips.json, and the Porkbun A-record change to make.
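The probe's decision rule (403 + Cloudflare HTML is polluted, 401/400 + provider-shaped JSON is clean) can be expressed as a small classifier. This is a sketch only: the inputs are the HTTP status and body a curl probe would capture, "provider-shaped JSON" is simplified to "parses as a JSON object", the Cloudflare check is a substring heuristic, and the "inconclusive" bucket is an assumption, since the text does not say how other responses are treated.

```python
import json


def _is_json_object(text: str) -> bool:
    try:
        return isinstance(json.loads(text), dict)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False


def classify_probe(status: int, body: str) -> str:
    """Classify one dummy-key probe sent from the edge's egress IP."""
    text = body.lstrip()
    lowered = text.lower()
    is_cloudflare_html = lowered.startswith(("<!doctype html", "<html")) and "cloudflare" in lowered
    if status == 403 and is_cloudflare_html:
        return "polluted"  # blocked in front of the provider API
    if status in (400, 401) and _is_json_object(text):
        return "clean"     # the provider itself rejected the dummy key
    return "inconclusive"


def overall_verdict(per_provider: dict) -> str:
    """Any polluted provider fails the rotation; only all-clean passes."""
    verdicts = set(per_provider.values())
    if "polluted" in verdicts:
        return "polluted"  # would trigger the automatic revert
    if verdicts == {"clean"}:
        return "clean"
    return "inconclusive"
```

Treating a single polluted provider as a failure mirrors the runbook's premise: an edge that is blocked by any one upstream is not usable.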

Two operator steps left (intentional)

DNS at Porkbun (or your provider): change the A record for api-<id>.tokenkey.dev to the new IP. The workflow does not automate this because the Porkbun API token is not in repo secrets; the run summary prints the exact transition.
Append the retired IP to edge-polluted-ips.json so future rotations refuse to re-allocate it; the run summary prints a paste-ready JSON entry. Once DNS has propagated (~1 hour), you may release the old allocation with aws ec2 release-address and set released_on on its entry.
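A sketch of the bookkeeping for the second step. The actual schema of edge-polluted-ips.json is not shown here, so this assumes a JSON list of objects; every field name except released_on (which the text mentions) is hypothetical, and the paste-ready entry from the run summary should be preferred. It only illustrates append-without-duplicates and the wait-for-propagation rule before releasing.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

DNS_PROPAGATION_WAIT = timedelta(hours=1)  # the "~1 hour" from the runbook


def append_retired_ip(entries: List[dict], ip: str, edge_id: str, reason: str,
                      retired_on: str) -> List[dict]:
    """Return entries plus one for ip, unless ip is already recorded."""
    if any(e.get("ip") == ip for e in entries):
        return entries
    # Hypothetical field names, except released_on (null until release-address runs).
    return entries + [{"ip": ip, "edge_id": edge_id, "reason": reason,
                       "retired_on": retired_on, "released_on": None}]


def safe_to_release(dns_changed_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once the A-record change has had ~1h to propagate."""
    now = now or datetime.now(timezone.utc)
    return now - dns_changed_at >= DNS_PROPAGATION_WAIT
```

Deduplicating on IP keeps the file safe to re-run against; write the result back with json.dump and commit it alongside the DNS change.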

Everything else is mechanized.

First-time migration (per stack)

A stack that still has the v1 shape (ElasticIP + EIPAssociation with Retain, no EipAllocationId parameter) cannot accept operation=rotate_egress_ip yet. Migrate it once:

# Dry run (read-only):
bash deploy/aws/stage0/migrate-edge-eip-to-parameter.sh <edge_id>

# Apply (changes live CFN):
bash deploy/aws/stage0/migrate-edge-eip-to-parameter.sh <edge_id> --apply

The migration keeps the same physical EIP — the public IP does NOT change. It only converts the CFN representation from "EIP is in the template" to "EIP is referenced by allocation-id parameter". After the migration, the stack accepts operation=rotate_egress_ip and the rest of this skill is in force.
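Whether a stack has already been migrated (and, once it has, the rollback target the workflow reads first) can be read from the Parameters array that aws cloudformation describe-stacks --stack-name <stack> --output json returns. A minimal sketch assuming that standard AWS CLI output shape; current_eip_allocation is a hypothetical helper name.

```python
import json
from typing import Optional


def current_eip_allocation(describe_stacks_json: str) -> Optional[str]:
    """Return the stack's EipAllocationId parameter, or None for a v1 (unmigrated) stack."""
    stacks = json.loads(describe_stacks_json).get("Stacks", [])
    if not stacks:
        raise ValueError("describe-stacks returned no stack")
    for param in stacks[0].get("Parameters", []):
        if param.get("ParameterKey") == "EipAllocationId":
            return param.get("ParameterValue")
    return None  # still the v1 shape: run migrate-edge-eip-to-parameter.sh first
```

A None result is exactly the "not yet migrated" condition that the stop-the-line rules refuse on.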

Stop-the-line rules

The workflow itself enforces the data-plane invariants. This skill must still refuse when:

The normalized edge_id is not a key in deploy/aws/stage0/edge-targets.json.
rotation_reason is empty or only whitespace.
The target stack has not been migrated yet (describe-stacks shows no EipAllocationId parameter) — direct the operator to migrate-edge-eip-to-parameter.sh first.
operation=rotate_egress_ip is requested against tokenkey-prod-stage0 (the production gateway). Prod IP rotation has a different blast radius (active client connections) and is intentionally not covered by this skill.

The workflow handles the rest as mechanical gates; the operator does not need this skill to babysit candidate allocation, probe results, or revert.
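The four refusal rules can be collected into a preflight check that reports every violated rule at once rather than stopping at the first. A sketch only: preflight_refusals is a hypothetical name, edge_id is expected to be already normalized, and the migration status is passed in as a flag (for example, derived from whether describe-stacks lists an EipAllocationId parameter).

```python
from typing import Iterable, List

PROD_STACK = "tokenkey-prod-stage0"


def preflight_refusals(edge_id: str, reason: str, known_edges: Iterable[str],
                       stack_name: str, stack_migrated: bool) -> List[str]:
    """Return the stop-the-line rules this request violates (empty list = proceed)."""
    refusals = []
    if stack_name == PROD_STACK:
        refusals.append("production gateway IP rotation is out of scope for this skill")
    if edge_id not in set(known_edges):
        refusals.append(f"{edge_id!r} is not a key in deploy/aws/stage0/edge-targets.json")
    if not reason.strip():
        refusals.append("rotation_reason is empty or whitespace-only")
    if not stack_migrated:
        refusals.append("stack has no EipAllocationId parameter; run migrate-edge-eip-to-parameter.sh first")
    return refusals
```

Only an empty list should lead to dispatching the workflow; anything else is reported back to the operator verbatim.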

Reporting contract

The workflow's step summary is the contract. Nothing else needs to be produced. If you need to summarize for a chat caller, mirror the values from the summary:

edge_id: <id>
region: <aws-region>
old_ip / old_alloc: <ip> / <eipalloc-…>
new_ip / new_alloc: <ip> / <eipalloc-…>
status: rotated | reverted-polluted | revert-failed
follow_up:
  - update DNS A-record at Porkbun: <domain> → <new_ip>
  - append retired IP entry to deploy/aws/stage0/edge-polluted-ips.json
  - (after ~1h DNS propagation) aws ec2 release-address --allocation-id <old_alloc>
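For a chat caller, the block above can be rendered mechanically from the step-summary values. A sketch that only reformats values it is given: the dict keys mirror the template, domain is an assumed extra key for the DNS line, status is checked against the three documented values, and emitting follow_up only for a successful rotation is an assumption (after a revert the live IP, and therefore DNS, is unchanged).

```python
from typing import Dict

STATUSES = {"rotated", "reverted-polluted", "revert-failed"}


def render_report(v: Dict[str, str]) -> str:
    """Render the reporting-contract block from step-summary values."""
    if v["status"] not in STATUSES:
        raise ValueError(f"unexpected status: {v['status']!r}")
    lines = [
        f"edge_id: {v['edge_id']}",
        f"region: {v['region']}",
        f"old_ip / old_alloc: {v['old_ip']} / {v['old_alloc']}",
        f"new_ip / new_alloc: {v['new_ip']} / {v['new_alloc']}",
        f"status: {v['status']}",
    ]
    # Assumed: follow-ups apply only when the new IP actually stayed in place.
    if v["status"] == "rotated":
        lines += [
            "follow_up:",
            f"  - update DNS A-record at Porkbun: {v['domain']} → {v['new_ip']}",
            "  - append retired IP entry to deploy/aws/stage0/edge-polluted-ips.json",
            f"  - (after ~1h DNS propagation) aws ec2 release-address --allocation-id {v['old_alloc']}",
        ]
    return "\n".join(lines)
```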

Out of scope

Production gateway IP rotation (tokenkey-prod-stage0).
Cross-region "clean EIP pool" maintenance. If the auto path repeatedly draws polluted IPs in a region, the answer is a different region (or an upstream Trust & Safety ticket), not a pre-warmed pool — adding a pool is premature.
DNS automation (Porkbun API). Documented as a known follow-up; would be a separate skill + a separate secret if/when wanted.

v1 (legacy) reference

The previous procedure — throwaway-nano probe, manual associate-address, drift-lock flag, recover-drift Phase 2 detach + IMPORT — is documented in docs/deploy/tokenkey-edge-ip-history.md. After all edges have been migrated to the parameter shape, that document becomes pure history.

related-skills.json

same repository

tokenkey-stage0-edge-lightsail-expansion.md

from "youxuanxue/sub2api"

End-to-end runbook for adding a TokenKey Stage0 Edge gateway on AWS Lightsail (parallel to the EC2/CFN path): register the edge in deploy/aws/lightsail/edge-targets-lightsail.json, ensure the one-time Lightsail IAM addon + GHCR PAT are in place, provision via deploy-edge-lightsail-stage0.yml, point DNS, smoke, and upgrade/rollback. EC2/CFN remains the default Edge path; this skill covers the Lightsail parallel path only.

2026-05-23 · stars: 2

tokenkey-stage0-edge-lightsail-ip-rotation.md

from "youxuanxue/sub2api"

Rotate the egress Static IP of a TokenKey Stage0 Lightsail Edge (uk1-ls / us1-ls / fra1-ls / sg1-ls) when the live IP has been risk-blocked ("polluted") by Anthropic / OpenAI / Google. Mirrors the EC2 EIP rotation posture: a single primitive (ops/lightsail/rotate-static-ip.sh) swaps the Static IP, the operator updates Porkbun DNS, and external verification runs from a clean-egress host. No CloudFormation drift step because Lightsail Edge is not CloudFormation-owned.

2026-05-23 · stars: 2

tokenkey-anthropic-oauth-config.md

from "youxuanxue/sub2api"

TokenKey Anthropic configuration write pipeline (snapshot → check → plan → apply → verify). **Three write surfaces**, all orchestrated by the same script, ops/anthropic/manage-anthropic-config.py, and all following "SQL derived from JSON, no static templates, the operator writes no SQL": (A) the tier baseline of edge Anthropic OAuth accounts (account fields such as concurrency / base_rpm / sticky_buffer / max_sessions), sourced from anthropic-oauth-stability-baselines-tiered.json; in the same transaction, the concurrency of users.id=1 is set to the sum of concurrency over that edge database's Anthropic accounts with schedulable=true. (B) credentials.pool_mode + pool_mode_retry_count on the prod Anthropic api-key mirror stubs (shaped base_url=api-*.tokenkey.dev), sourced from anthropic-stub-pool-baselines.json. (C) prod stub concurrency mirroring (plan-concurrency-mirror): a four-hop cascade that aligns edge users.id=1, the corresponding prod stub.concurrency, and prod users.id=1 to "Σ schedulable=true anthropic concurrency". Values are derived from live data and no new baseline JSON is introduced; stub↔edge links are matched deterministically on the domain field of edge-targets.json, never inferred. group.rpm_limit is not written by this pipeline; it is set directly and independently in the admin UI.

2026-05-23 · stars: 2

tokenkey-anthropic-oauth-priority-by-window.md

from "youxuanxue/sub2api"

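TokenKey pipeline that re-ranks Anthropic OAuth account priority across all deployable edges (snapshot → plan → apply → verify). Each account is scored by how much of its current 5h/7d usage window remains, and priority is re-ranked within the same stability tier (smaller wins): the more remaining, the smaller the priority (i.e., the earlier it is scheduled). It **only writes** the single field accounts.priority and leaves the tier baseline, group.rpm_limit, and credentials untouched. Orchestrated by a single script, ops/anthropic/rebalance-anthropic-priority.py, with writes fixed to one SQL template.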

2026-05-23 · stars: 2

tokenkey-online-log-troubleshooting.md

from "youxuanxue/sub2api"

Read-only TokenKey production/edge troubleshooting workflow for querying live logs, ops_error_logs, Docker containers, SSM targets, CI/deploy runs, and turning evidence into a stable root-cause summary without ad-hoc command guessing.

2026-05-23 · stars: 2

tokenkey-online-traffic-profile.md

from "youxuanxue/sub2api"

Read-only TokenKey production/edge traffic-profiling workflow. Reconstructs per-minute request-traffic series for the past N hours per account — base RPM (request-start minute), sticky vs non-sticky (load-balance) RPM split, active sessions (idle-window), and peak concurrency — then compares each against its cap (base_rpm / rpm_sticky_buffer / max_sessions / concurrency) and flags which limit is being touched. Use when asked to profile online traffic, see per-minute RPM/session/concurrency, validate the admin account-card gauges (concurrency 1/8, $/window cost, sessions 16/30, RPM 3/28), or explain "no available accounts" / throttling without ad-hoc command guessing.

2026-05-23 · stars: 2

package.json

"author": "youxuanxue"

"repository": "youxuanxue/sub2api"
