End-to-end runbook for adding a TokenKey Stage0 Edge gateway on AWS Lightsail (parallel to the EC2/CFN path): register the edge in deploy/aws/lightsail/edge-targets-lightsail.json, ensure the one-time Lightsail IAM addon + GHCR PAT are in place, provision via deploy-edge-lightsail-stage0.yml, point DNS, smoke, and upgrade/rollback. EC2/CFN remains the default Edge path; this skill covers the Lightsail parallel path only.
Rotate the egress Static IP of a TokenKey Stage0 Lightsail Edge (uk1-ls / us1-ls / fra1-ls / sg1-ls) when the live IP has been risk-blocked ("polluted") by Anthropic / OpenAI / Google. Mirrors the EC2 EIP rotation posture: a single primitive (ops/lightsail/rotate-static-ip.sh) swaps the Static IP, the operator updates Porkbun DNS, and external verification runs from a clean-egress host. No CloudFormation drift step because Lightsail Edge is not CloudFormation-owned.
Rotate / replace the egress Elastic IP of a TokenKey Stage0 edge (uk1/us1/sg1/fra1/…) when the live IP has been risk-blocked ("polluted") by an upstream API (Anthropic / OpenAI / Google). Drives the single canonical path: a workflow_dispatch of deploy-edge-stage0.yml with operation=rotate_egress_ip, which does a CFN-native UpdateStack — no detach, no IMPORT, no drift class. Auto-allocates a clean candidate (checked against edge-polluted-ips.json), swaps via CFN, verifies SSM Online + outbound IP + Anthropic/OpenAI/Google pollution probe from the edge itself, and auto-reverts on a polluted result. The only operator step that remains is the DNS A-record update at Porkbun (and committing the retired IP into edge-polluted-ips.json).
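TokenKey Anthropic configuration write pipeline (snapshot → check → plan → apply → verify). **Three write surfaces**, all orchestrated by the same script ops/anthropic/manage-anthropic-config.py, and all following "SQL derived from JSON, no static templates, operator writes no SQL": (A) the tier baseline of edge anthropic OAuth accounts (account fields such as concurrency / base_rpm / sticky_buffer / max_sessions), sourced from anthropic-oauth-stability-baselines-tiered.json; the same transaction updates the concurrency of users.id=1 to the sum of concurrency over the anthropic accounts with schedulable=true in that edge's database. (B) credentials.pool_mode + pool_mode_retry_count of the prod anthropic api-key mirror stubs (those whose base_url has the form api-*.tokenkey.dev), sourced from anthropic-stub-pool-baselines.json. (C) the prod stub concurrency mirror (plan-concurrency-mirror): a four-hop cascade that aligns edge users.id=1, the corresponding prod stub.concurrency, and prod users.id=1 to "Σ schedulable=true anthropic concurrency"; values are derived from live state and no new baseline JSON is introduced; the stub↔edge link is matched stably on the domain field in edge-targets.json and is never inferred. group.rpm_limit is not written by this pipeline; it is set directly and independently in the admin UI.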
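The value that surfaces (A) and (C) propagate is a plain sum over schedulable anthropic accounts. The sketch below illustrates that arithmetic only; the `Account` shape and the `platform` field name are assumptions for illustration, not the actual schema or the logic in manage-anthropic-config.py.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical row shape; the real accounts table may differ.
    platform: str
    schedulable: bool
    concurrency: int


def mirrored_concurrency(accounts):
    """Sum concurrency over anthropic accounts with schedulable=true."""
    return sum(
        a.concurrency
        for a in accounts
        if a.platform == "anthropic" and a.schedulable
    )


edge_accounts = [
    Account("anthropic", True, 8),
    Account("anthropic", False, 8),   # not schedulable, excluded
    Account("openai", True, 4),       # other platform, excluded
    Account("anthropic", True, 6),
]
target = mirrored_concurrency(edge_accounts)
```

Under these assumptions, `target` is the single number that edge users.id=1, the matching prod stub, and prod users.id=1 would all be aligned to.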
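TokenKey priority-rebalancing pipeline for Anthropic OAuth accounts across all deployable edges (snapshot → plan → apply → verify). Scores each account by how much of its current 5h/7d usage window remains available and reorders priority within the same stability tier (smaller wins): the more usage remaining, the smaller the priority value (scheduled earlier). Writes **only** the single field accounts.priority; it does not touch the tier baseline, group.rpm_limit, or credentials. Orchestrated by a single script, ops/anthropic/rebalance-anthropic-priority.py, with the write fixed in one SQL template.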
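A minimal sketch of the rebalance idea, under loud assumptions: the score is taken as the smaller of the 5h and 7d remaining fractions, and each tier reuses its existing priority values rather than inventing new ones. Neither choice is confirmed by the source, and the dict keys are hypothetical; the real scoring lives in rebalance-anthropic-priority.py.

```python
from collections import defaultdict


def rebalance(accounts):
    """Return {account_id: new_priority}, reordering only within each tier.

    Assumed input keys (hypothetical): id, tier, priority,
    remaining_5h, remaining_7d (fractions in [0, 1]).
    """
    by_tier = defaultdict(list)
    for acct in accounts:
        by_tier[acct["tier"]].append(acct)

    new_priority = {}
    for members in by_tier.values():
        # Reuse the tier's existing priority values, smallest first.
        slots = sorted(m["priority"] for m in members)
        # Most remaining headroom first; ties broken by id for determinism.
        ranked = sorted(
            members,
            key=lambda m: (-min(m["remaining_5h"], m["remaining_7d"]), m["id"]),
        )
        for slot, member in zip(slots, ranked):
            new_priority[member["id"]] = slot
    return new_priority


plan = rebalance([
    {"id": 1, "tier": "a", "priority": 10, "remaining_5h": 0.2, "remaining_7d": 0.9},
    {"id": 2, "tier": "a", "priority": 20, "remaining_5h": 0.8, "remaining_7d": 0.7},
    {"id": 3, "tier": "b", "priority": 5, "remaining_5h": 0.1, "remaining_7d": 0.1},
])
```

Here account 2 has more headroom than account 1, so it takes the smaller priority value in tier "a"; account 3 is alone in its tier and keeps its value.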
Read-only TokenKey production/edge troubleshooting workflow for querying live logs, ops_error_logs, Docker containers, SSM targets, CI/deploy runs, and turning evidence into a stable root-cause summary without ad-hoc command guessing.
Read-only TokenKey production/edge traffic-profiling workflow. Reconstructs per-minute request-traffic series for the past N hours per account — base RPM (request-start minute), sticky vs non-sticky (load-balance) RPM split, active sessions (idle-window), and peak concurrency — then compares each against its cap (base_rpm / rpm_sticky_buffer / max_sessions / concurrency) and flags which limit is being touched. Use when asked to profile online traffic, see per-minute RPM/session/concurrency, validate the admin account-card gauges (concurrency 1/8, $/window cost, sessions 16/30, RPM 3/28), or explain "no available accounts" / throttling without ad-hoc command guessing.
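Two of the series named above, base RPM by request-start minute and peak concurrency, can be sketched from request start/end timestamps. This is an illustration only: it assumes epoch-second timestamps and half-open request intervals, and the function names are hypothetical rather than part of the actual workflow.

```python
from collections import Counter


def per_minute_rpm(starts):
    """Count requests per minute bucket, keyed by request-start minute."""
    return Counter(int(s) // 60 for s in starts)


def peak_concurrency(intervals):
    """Maximum number of overlapping (start, end) intervals via a sweep line."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # At the same instant, process ends (-1) before starts (+1) so that
    # back-to-back requests are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


requests = [(0, 10), (5, 15), (10, 20), (61, 62)]
rpm = per_minute_rpm(start for start, _ in requests)
peak = peak_concurrency(requests)
```

Each bucket in `rpm` would then be compared with the account's base_rpm cap, and `peak` with its concurrency cap, to see which limit is being touched.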
End-to-end runbook for adding a new AWS Stage0 Edge gateway beyond existing uk1/fra1: prepare metadata and IAM/OIDC scope, provision edge stack, set DNS, run smoke/upgrade/rollback via deploy-edge-stage0.yml, and report structured acceptance results with known failure patterns.