process-logs
// Process error logs from admin panel - fetch new errors, analyze, create tasks, fix, and mark resolved. Checks Axiom/Pino if DB entries are filtered.
| name | process-logs |
| description | Process error logs from admin panel - fetch new errors, analyze, create tasks, fix, and mark resolved. Checks Axiom/Pino if DB entries are filtered. |
| version | 1.9.0 |
Automated workflow for processing error logs from /admin/logs.
YOU MUST FOLLOW THESE RULES. NO EXCEPTIONS.
EVERY error MUST have a Beads task. No direct fixes without tracking.
# ALWAYS run this FIRST for each error:
bd create --type=bug --priority=<1-3> --title="Fix: <error_message>" --files "<relevant_files>"
bd update <task_id> --status=in_progress
Route tasks by complexity:
| Complexity | Examples | Action |
|---|---|---|
| Simple | Typo fix, single import, config value | Execute directly |
| Medium | Multi-file fix, migration, API change | Delegate to subagent |
| Complex | Architecture change, new feature | Ask user first |
Subagent selection for MEDIUM tasks:
- database-architect
- fullstack-nextjs-specialist
- typescript-types-specialist
- nextjs-ui-designer

Execute directly for SIMPLE tasks:
ALWAYS query documentation before implementing:
mcp__context7__resolve-library-id → mcp__context7__query-docs
This is PRODUCTION. Every bug matters.
Fix fundamentally, not superficially:
Never ignore errors:
Propose improvements:
bd create --type=chore --title="Improve: <description>"
Quality over speed:
Always write notes when updating log status. Keep it brief, in English.
| Status | What to write in notes |
|---|---|
| `resolved` | Root cause + fix applied. Example: Missing constraint. Added 'approved' to enum via migration. |
| `auto_muted` | System-assigned. Don't change. Skip these errors in processing. |
| `ignored` | Never use. Fix or ask user. |
| `to_verify` | Why pending + what to check. Auto-resolved after 14d if no recurrence. Example: External API timeout. Monitor for 24h. |
| `in_progress` | Beads task ID. Example: Working on mc2-5ch |
Format: <root_cause>. <action_taken>. — Max 100 chars.
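The format rule can be enforced mechanically before a note is written. The sketch below is a hypothetical helper (`formatNote` and `MAX_NOTE_LENGTH` are illustrative names, not part of the skill or the codebase) that joins a root cause and an action into the required shape and rejects anything over 100 characters.

```typescript
// Hypothetical helper: builds a note in the "<root_cause>. <action_taken>." format.
const MAX_NOTE_LENGTH = 100; // limit stated in the format rule

function formatNote(rootCause: string, actionTaken: string): string {
  // Trim whitespace and trailing periods so the result never contains "..".
  const clean = (s: string): string => s.trim().replace(/\.+$/, "");
  const note = `${clean(rootCause)}. ${clean(actionTaken)}.`;
  if (note.length > MAX_NOTE_LENGTH) {
    throw new Error(`Note is ${note.length} chars; the limit is ${MAX_NOTE_LENGTH}`);
  }
  return note;
}

console.log(formatNote("ESM import conflict", "Renamed generator.ts to generator-node.ts"));
```

Throwing on overlong notes, rather than truncating, is a design assumption here: a truncated root cause is usually less useful than a rewritten one.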
Examples:
- ESM import conflict. Renamed generator.ts to generator-node.ts.
- Constraint missing 'approved'. Added via migration 20250115_fix_status.
- Cloudflare 500. External issue, retry logic already exists. Monitoring.

Some errors are automatically ignored by the system with status auto_muted. These are expected events, NOT bugs.
Current auto-mute rules (from src/shared/logger/auto-classification.ts, total: 58):
| Pattern | Reason | Description |
|---|---|---|
| `Redis connection (ended\|closed)` | graceful_shutdown | Redis disconnects during app restart |
| `graceful.*shutdown` | graceful_shutdown | Server shutdown events during deploys |
| `/api/trpc/health.*404` | monitoring_probe | tRPC health endpoint probes (Uptime Kuma) |
| `/health.*404` | monitoring_probe | Generic health check probes |
| `Cloudflare.*5\d{2}` | external_service | Cloudflare edge errors (502, 503, 521) |
| `ECONNRESET.*external` | external_service | External API connection resets |
| `Layer failed, trying next` | cascading_repair | Repair layer failed, trying next layer |
| `Critique-revise attempt failed` | cascading_repair | Layer 2 retry attempt failed |
| `Zod.*validation failed.*Layer` | cascading_repair | Layer 1 validation failed, escalating |
| `Job stalled` | job_lifecycle | BullMQ job restarted (long LLM operations) |
| `Unexpected exit code: 10` | job_lifecycle | Worker TTL timeout (10 min), will retry |
| `No RAG chunks found` | expected_behavior | Course without docs, generates w/o RAG |
| `Mermaid.*fallback.*used` | graceful_fallback | Diagram gen failed, fallback to text |
| `/trpc/.*401` | expected_behavior | Unauthenticated tRPC request, 401 correct |
| `Cache directory does not exist` | expected_behavior | Cache missing on fresh env, created later |
| `ModelConfigBunker.*sync.*fail` | external_service | Network issue, has retry with backoff |
| `Invalid status for approval` | ui_race_condition | User clicked approve but course progressed |
| `Job .+ not found` | expected_behavior | Frontend polls job status after cleanup |
| `Failed to log generation trace` | expected_behavior | Trace insert failed during pool pressure |
| `Patcher.*REJECTED.*truncated` | graceful_fallback | Truncated content detected, returns original |
| `Preprocessing failed.*using raw` | graceful_fallback | Preprocessing failed, using raw LLM output |
| `Stage 5.*Primary model attempt` | cascading_repair | Stage 5 primary model unavailable, will retry |
| `JSON repair failed after all` | graceful_fallback | JSON repair exhausted, LLM output too malformed |
| `ModelConfigBunker.*LKG file` | graceful_fallback | LKG atomic write race, has Redis+DB fallback |
| `could not renew lock for job` | job_lifecycle | BullMQ lock renewal failed, will restart |
| `Missing key for job.*moveToDelayed` | job_lifecycle | BullMQ race condition, job already done |
| `Critical language consistency` | expected_behavior | Cyrillic false positive in Russian courses |
| `Critical heuristic failures` | expected_behavior | Heuristic skipped LLM review (false positive) |
| `Rate limit exceeded` | expected_behavior | tRPC rate limiter working as designed |
| `/trpc/lessonContent.*429` | expected_behavior | HTTP 429 from rate limiter on partial generate |
| `/trpc/jobs\.getStatus 404` | expected_behavior | HTTP 404 from job status poll after cleanup |
| `Sufficiency verdict.*defaulting` | graceful_fallback | Phase 0.5 Zod validation fallback, non-blocking |
| `Batch section insert failed.*fallback` | graceful_fallback | Batch insert duplicate → individual fallback |
| `Failed to create section record` | graceful_fallback | Individual section insert skipped (already exists) |
| `Content failed sanity check.*non-blocking` | expected_behavior | Sanity check warning, content still accepted |
| `Unavailable For Legal Reasons\|content policy violation` | content_policy | Jina API content policy rejection (PII/legal) |
| `Using STALE phase config due to database error` | graceful_fallback | ModelConfigBunker stale config during DB outage |
| `\[Phase 6\] Max retries reached.*best-effort` | graceful_fallback | Phase 6 summary retries exhausted, best-effort |
| `MISCONF Redis.*unable to persist` | infrastructure | Redis RDB/AOF disk issue - infra problem, not app |
| `Concurrency limit exceeded` | external_service | Jina API server-side concurrency enforcement |
| `getaddrinfo EAI_AGAIN` | infrastructure | DNS resolution failure during deploy/restart |
| `Course not ready.*generation` | expected_behavior | Frontend polls after course completed or not ready |
Total rules: 60 (test validates sync with code)
Metadata-aware matching (v1.10):
shouldAutoMute() now accepts optional context?: { message?: string } parameter. For tRPC errors where error_message is generic ("tRPC error"), the actual error from metadata.message is also checked against all patterns. This catches ~2000+ errors per week that previously slipped through.
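A minimal sketch of how metadata-aware matching could work, assuming rules shaped like the `{ pattern, reason, description }` objects used in auto-classification.ts. Only three rules from the table are included, and `findAutoMuteRule` is a hypothetical helper; the real `shouldAutoMute()` may differ in its return type and internals.

```typescript
// Sketch only: the real rule list lives in auto-classification.ts and is much longer.
interface AutoMuteRule {
  pattern: RegExp;
  reason: string;
  description: string;
}

// Three rules copied from the table, for illustration.
const RULES: AutoMuteRule[] = [
  { pattern: /Rate limit exceeded/i, reason: "expected_behavior", description: "tRPC rate limiter working as designed" },
  { pattern: /Job stalled/i, reason: "job_lifecycle", description: "BullMQ job restarted (long LLM operations)" },
  { pattern: /Cloudflare.*5\d{2}/i, reason: "external_service", description: "Cloudflare edge errors (502, 503, 521)" },
];

// Hypothetical helper: returns the first matching rule, or undefined if none match.
function findAutoMuteRule(errorMessage: string, context?: { message?: string }): AutoMuteRule | undefined {
  // Check error_message first, then metadata.message when it was supplied.
  const candidates = [errorMessage, context?.message].filter(
    (text): text is string => typeof text === "string" && text.length > 0,
  );
  return RULES.find((rule) => candidates.some((text) => rule.pattern.test(text)));
}

function shouldAutoMute(errorMessage: string, context?: { message?: string }): boolean {
  return findAutoMuteRule(errorMessage, context) !== undefined;
}

console.log(shouldAutoMute("tRPC error")); // false: generic message, no context
console.log(shouldAutoMute("tRPC error", { message: "Rate limit exceeded" })); // true
```

The patterns carry no `g` flag, so `RegExp.test` keeps no state between calls and the same rule objects can be reused safely.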
Test environment auto-muting:
Errors from NODE_ENV=test (vitest) are automatically muted at insert time via muteTestEnvironmentLog() in error-service.ts. They get environment = 'test' and status = 'auto_muted' immediately. This prevents test errors from polluting the admin logs UI and triggering auto-reopen.
When you see auto_muted errors:
- The rules are defined in auto-classification.ts.
- Errors written via logPermanentFailure() (canonical path) may still get auto-muted after insert.

How to add a new auto-mute rule:
Edit packages/course-gen-platform/src/shared/logger/auto-classification.ts:
{
pattern: /your-pattern/i,
reason: 'category', // graceful_shutdown | monitoring_probe | external_service
description: 'Why this is expected',
}
Update this SKILL.md with the new pattern
When NOT to auto-mute:
Before fixing ANY error, search BOTH sources:
# Search by error keywords
bd search "<keyword>" --type=bug --status=closed
# Example searches:
bd search "constraint violation"
bd search "tRPC timeout"
bd search "undefined property"
What to look for in Beads:
-- Search similar errors by message (use mcp__supabase__execute_sql)
SELECT el.id, el.error_message, el.severity, lis.status, lis.notes, el.created_at
FROM error_logs el
LEFT JOIN log_issue_status lis ON lis.log_id = el.id AND lis.log_type = 'error_log'
WHERE to_tsvector('english', el.error_message) @@ plainto_tsquery('english', '<keyword>')
AND lis.status = 'resolved'
ORDER BY el.created_at DESC
LIMIT 5;
What to search for:
- constraint, undefined, timeout, not found
- notes field: contains root cause and fix
- Similar to mc2-xxx / <date>. Same fix applied.

Every new error MUST have a Beads task before fixing:
# 1. Create task with all required fields
bd create --type=bug --priority=<1-3> --title="Fix: <error_message>" --files "<relevant_files>"
# 2. Start working
bd update <task_id> --status=in_progress
# 3. After fix - close with detailed reason
bd close <task_id> --reason="Root cause: <why>. Fix: <what was done>."
Beads task MUST include:
- Relevant files (--files)

Invoke via: /process-logs or "обработай логи ошибок" ("process the error logs")
IMPORTANT: The /admin/logs UI shows errors from TWO tables:
- error_logs: system errors, validation failures, worker errors
- generation_trace (where error_data IS NOT NULL): LLM generation errors

Both tables must be checked. Logs without a log_issue_status record show as "Новый" (new) in the UI.
-- Use mcp__supabase__execute_sql
-- NOTE: This excludes auto_muted errors (they are handled automatically)
SELECT el.id, el.severity, el.error_message, el.metadata, el.stack_trace,
el.course_id, el.lesson_id, el.request_id, el.trpc_path, el.trpc_input, el.attempted_value
FROM error_logs el
LEFT JOIN log_issue_status lis ON lis.log_id = el.id AND lis.log_type = 'error_log'
WHERE lis.id IS NULL OR (lis.status NOT IN ('resolved', 'ignored', 'auto_muted'))
ORDER BY
CASE el.severity WHEN 'CRITICAL' THEN 1 WHEN 'ERROR' THEN 2 ELSE 3 END,
el.created_at DESC
LIMIT 20;
-- generation_trace with error_data shows as ERROR in UI
SELECT gt.id, gt.created_at, gt.stage, gt.phase, gt.step_name, gt.course_id,
(gt.error_data->>'message')::text as error_message
FROM generation_trace gt
LEFT JOIN log_issue_status lis ON gt.id = lis.log_id AND lis.log_type = 'generation_trace'
WHERE gt.error_data IS NOT NULL
AND (lis.id IS NULL OR lis.status NOT IN ('resolved', 'ignored', 'auto_muted'))
ORDER BY gt.created_at DESC
LIMIT 20;
-- Quick check: how many "new" errors in each table?
SELECT
'error_logs' as source,
(SELECT COUNT(*) FROM error_logs el
LEFT JOIN log_issue_status lis ON el.id = lis.log_id AND lis.log_type = 'error_log'
WHERE lis.id IS NULL) as new_count
UNION ALL
SELECT
'generation_trace' as source,
(SELECT COUNT(*) FROM generation_trace gt
LEFT JOIN log_issue_status lis ON gt.id = lis.log_id AND lis.log_type = 'generation_trace'
WHERE gt.error_data IS NOT NULL AND lis.id IS NULL) as new_count;
CRITICAL: Both DEV and STAGE are production-like servers. ALL errors on these environments must be investigated and fixed. Only LOCAL (NULL) can be bulk-resolved.
The error_logs table has an environment column that indicates where the error occurred:
| Value | Environment | Action |
|---|---|---|
| NULL | Local dev | Bulk resolve: local testing/development only |
| 'dev' | Dev server | MUST FIX: real errors affecting developers |
| 'stage' | Staging (prod) | MUST FIX: real production errors |
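The table reduces to a single rule: only a NULL environment may be bulk-resolved. A sketch of that rule follows; `actionForEnvironment` and the `EnvironmentAction` type are hypothetical names, and treating unrecognized values as server environments is a conservative assumption, not something the source states.

```typescript
type EnvironmentAction = "bulk_resolve" | "must_fix";

// Maps the error_logs.environment column to the action from the table.
function actionForEnvironment(environment: string | null): EnvironmentAction {
  // NULL means the error was logged on a local machine.
  if (environment === null) return "bulk_resolve";
  // 'dev' and 'stage' must be investigated. Any other value is treated
  // the same way (assumption: unknown environments are never bulk-resolved).
  return "must_fix";
}

console.log(actionForEnvironment(null)); // "bulk_resolve"
console.log(actionForEnvironment("stage")); // "must_fix"
```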
Always check environment distribution first:
-- Check how many errors per environment (includes both NULL status AND status='new')
SELECT environment, COUNT(*) as count
FROM error_logs el
LEFT JOIN log_issue_status lis ON lis.fingerprint = el.fingerprint AND lis.log_type = 'error_log'
WHERE lis.id IS NULL OR lis.status = 'new'
GROUP BY environment
ORDER BY count DESC;
Bulk resolve LOCAL errors only:
-- Bulk resolve ONLY local environment errors (environment IS NULL)
-- NEVER bulk resolve dev or stage errors - they must be investigated individually!
-- NOTE: This handles both NULL status AND status='new'
WITH local_fingerprints AS (
SELECT DISTINCT ON (el.fingerprint) el.id, el.fingerprint
FROM error_logs el
LEFT JOIN log_issue_status lis ON lis.fingerprint = el.fingerprint AND lis.log_type = 'error_log'
WHERE (lis.id IS NULL OR lis.status = 'new')
AND el.environment IS NULL
AND el.fingerprint IS NOT NULL
ORDER BY el.fingerprint, el.created_at DESC
)
INSERT INTO log_issue_status (log_type, log_id, status, notes, fingerprint, updated_at)
SELECT 'error_log', lf.id, 'resolved', 'Local environment: Testing/development errors', lf.fingerprint, NOW()
FROM local_fingerprints lf
ON CONFLICT (log_type, log_id) DO UPDATE SET status = 'resolved', notes = EXCLUDED.notes, updated_at = NOW();
Focus on server errors (dev + stage):
-- Get only SERVER errors (dev and stage environments)
-- NOTE: Includes both NULL status AND status='new'
SELECT
el.environment,
el.fingerprint,
el.severity,
MIN(el.error_message) as error_message,
COUNT(*) as count,
MAX(el.created_at) as last_seen
FROM error_logs el
LEFT JOIN log_issue_status lis ON lis.fingerprint = el.fingerprint AND lis.log_type = 'error_log'
WHERE (lis.id IS NULL OR lis.status = 'new')
AND el.fingerprint IS NOT NULL
AND el.environment IS NOT NULL -- Exclude local (NULL)
GROUP BY el.environment, el.fingerprint, el.severity
ORDER BY
CASE el.severity WHEN 'CRITICAL' THEN 1 WHEN 'ERROR' THEN 2 ELSE 3 END,
COUNT(*) DESC
LIMIT 20;
Why this matters:
Auto-resolution of stale to_verify fingerprints. Run on EVERY skill invocation.
Before processing new errors, resolve stale to_verify fingerprints:
-- Use mcp__supabase__execute_sql
-- Resolves inactive to_verify (14d no recurrence) and reopens recurred ones
SELECT resolve_inactive_to_verify(14);
Returns JSON:
{
"resolved_count": 3,
"reopened_count": 1,
"resolved_fingerprints": ["abc...", "def..."],
"reopened_fingerprints": ["ghi..."],
"inactive_days": 14
}
- resolved_count > 0: Fixes confirmed. Include count in Step 3 summary.
- reopened_count > 0: Errors recurred, so the fixes didn't work. These fingerprints are now in_progress and will appear in Step 2 processing. Prioritize them.
- Both counts 0: any to_verify fingerprints are still pending. Continue to Step 2.

-- Get details of reopened fingerprints for Step 2 processing
SELECT lis.fingerprint, lis.notes,
(SELECT MIN(el.error_message) FROM error_logs el WHERE el.fingerprint = lis.fingerprint) as error_message,
(SELECT COUNT(*) FROM error_logs el
WHERE el.fingerprint = lis.fingerprint
AND el.created_at > lis.updated_at - INTERVAL '14 days') as recent_count
FROM log_issue_status lis
WHERE lis.status = 'in_progress'
AND lis.notes LIKE 'Recurred after fix%'
AND lis.updated_at > NOW() - INTERVAL '5 minutes';
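The JSON returned by resolve_inactive_to_verify() can be turned into summary lines with a small helper. The sketch below types the payload from the sample JSON; `ToVerifyResult` and `summarizeToVerify` are hypothetical names, and the wording of the fallback line is an assumption.

```typescript
// Shape taken from the sample JSON returned by resolve_inactive_to_verify().
interface ToVerifyResult {
  resolved_count: number;
  reopened_count: number;
  resolved_fingerprints: string[];
  reopened_fingerprints: string[];
  inactive_days: number;
}

// Hypothetical helper: converts the result into lines for the Step 3 summary.
function summarizeToVerify(result: ToVerifyResult): string[] {
  const lines: string[] = [];
  if (result.resolved_count > 0) {
    lines.push(`Auto-resolved (${result.inactive_days}d no recurrence): ${result.resolved_count}`);
  }
  if (result.reopened_count > 0) {
    // Reopened fingerprints are now in_progress and should be handled first.
    lines.push(`Reopened (error recurred): ${result.reopened_count} [${result.reopened_fingerprints.join(", ")}]`);
  }
  if (lines.length === 0) {
    lines.push("Nothing resolved or reopened; continue to Step 2");
  }
  return lines;
}

const sample: ToVerifyResult = {
  resolved_count: 3,
  reopened_count: 1,
  resolved_fingerprints: ["abc...", "def..."],
  reopened_fingerprints: ["ghi..."],
  inactive_days: 14,
};
console.log(summarizeToVerify(sample).join("\n"));
```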
FOR each error:
1. CREATE BEADS TASK (MANDATORY):
bd create --type=bug --priority=<1-3> --title="Fix: <message>" --files "<files>"
bd update <id> --status=in_progress
2. ANALYZE error type and SELECT subagent:
- DB constraint → database-architect
- tRPC/API → fullstack-nextjs-specialist
- Types → typescript-types-specialist
- UI → nextjs-ui-designer
3. QUERY context7 for relevant docs
4. DELEGATE using Task tool:
Task(subagent_type="<selected>", prompt="Fix error: <details>...")
5. VERIFY results (MANDATORY):
- Read tool: check modified files
- Bash: pnpm type-check && pnpm build
- If errors → re-delegate
6. MARK resolved in DB:
-- For error_logs:
INSERT INTO log_issue_status (log_type, log_id, status, notes, updated_at)
VALUES ('error_log', '<id>', 'resolved', 'Fixed: <desc>', NOW())
ON CONFLICT (log_type, log_id) DO UPDATE SET status = 'resolved', notes = EXCLUDED.notes, updated_at = NOW();
-- For generation_trace:
INSERT INTO log_issue_status (log_type, log_id, status, notes, updated_at)
VALUES ('generation_trace', '<id>', 'resolved', 'Fixed: <desc>', NOW())
ON CONFLICT (log_type, log_id) DO UPDATE SET status = 'resolved', notes = EXCLUDED.notes, updated_at = NOW();
7. CLOSE Beads task:
bd close <id> --reason="Root cause: <why>. Fix: <what was done>."
## Log Processing Summary
| Severity | Fixed | Pending | To Verify |
| -------- | ----- | ------- | --------- |
| CRITICAL | X | Y | Z |
| ERROR | X | Y | Z |
| WARNING | X | Y | Z |
### to_verify Auto-Resolution
| Action | Count |
| --------------------------------- | ----- |
| Auto-resolved (14d no recurrence) | X |
| Reopened (error recurred) | Y |
### Beads Tasks Created:
- mc2-xxx: <description> → <status>
### Pending (need user input):
- <log_id>: <reason>
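The severity table in the summary template can be rendered from counts instead of being filled in by hand. This is a sketch under assumed names (`SeverityCounts`, `renderSeverityTable`); only the column layout comes from the template.

```typescript
type Severity = "CRITICAL" | "ERROR" | "WARNING";

interface SeverityCounts {
  fixed: number;
  pending: number;
  toVerify: number;
}

// Renders the "Log Processing Summary" severity table as Markdown.
function renderSeverityTable(counts: Record<Severity, SeverityCounts>): string {
  const order: Severity[] = ["CRITICAL", "ERROR", "WARNING"];
  const rows = order.map((severity) => {
    const c = counts[severity];
    // padEnd keeps the first column aligned with the 8-character header.
    return `| ${severity.padEnd(8)} | ${c.fixed} | ${c.pending} | ${c.toVerify} |`;
  });
  return [
    "| Severity | Fixed | Pending | To Verify |",
    "| -------- | ----- | ------- | --------- |",
    ...rows,
  ].join("\n");
}

console.log(
  renderSeverityTable({
    CRITICAL: { fixed: 1, pending: 0, toVerify: 2 },
    ERROR: { fixed: 4, pending: 1, toVerify: 0 },
    WARNING: { fixed: 0, pending: 0, toVerify: 0 },
  }),
);
```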
Task(
subagent_type="database-architect",
prompt="Fix DB constraint violation in error_logs.
Error: <full_error_message>
Context: <stack_trace>
Course: <course_id>
Create migration to fix the constraint."
)
Task(
subagent_type="fullstack-nextjs-specialist",
prompt="Fix tRPC error in <trpc_path>.
Error: <full_error_message>
Input: <trpc_input>
Stack: <stack_trace>
Fix the API endpoint."
)
Task(
subagent_type="typescript-types-specialist",
prompt="Fix TypeScript type error.
Error: <full_error_message>
File: <file_path>
Fix types and ensure compatibility."
)
Before marking ANY error as resolved:
- pnpm type-check passes
- pnpm build passes

| Pattern | Category | Subagent | Priority |
|---|---|---|---|
| `violates.*constraint` | DB constraint | database-architect | 1 |
| `tRPC error` | API bug | fullstack-nextjs-specialist | 2 |
| `Type.*error` | Type error | typescript-types-specialist | 2 |
| `Error querying` | Query bug | database-architect | 2 |
| Config missing | Config issue | ASK USER | 3 |
| External service | External | mark to_verify | 3 |
| Redis shutdown | Expected | SKIP (auto_muted) | - |
| Health probe 404 | Expected | SKIP (auto_muted) | - |
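The four regex rows of the routing table translate directly into a lookup. In the sketch below, `Route`, `ROUTES`, and `routeError` are hypothetical names, and first-match-wins ordering is an assumption; the descriptive rows (config, external service, auto-muted) are left out because the table gives no pattern for them.

```typescript
interface Route {
  category: string;
  subagent: string;
  priority: number;
}

// Regex rows from the routing table, in table order.
const ROUTES: Array<{ pattern: RegExp; route: Route }> = [
  { pattern: /violates.*constraint/, route: { category: "DB constraint", subagent: "database-architect", priority: 1 } },
  { pattern: /tRPC error/, route: { category: "API bug", subagent: "fullstack-nextjs-specialist", priority: 2 } },
  { pattern: /Type.*error/, route: { category: "Type error", subagent: "typescript-types-specialist", priority: 2 } },
  { pattern: /Error querying/, route: { category: "Query bug", subagent: "database-architect", priority: 2 } },
];

// Returns the first matching route, or undefined when the error needs manual triage.
function routeError(message: string): Route | undefined {
  return ROUTES.find((entry) => entry.pattern.test(message))?.route;
}

console.log(routeError('new row violates check constraint "status_check"')?.subagent);
// prints "database-architect"
```

An `undefined` result is a signal to fall back to the complexity rules rather than to guess a subagent.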
Errors with status auto_muted are automatically ignored by the system. Skip them.
- .claude/docs/admin-logs-guide.md
- packages/course-gen-platform/src/shared/logger/types.ts
- packages/course-gen-platform/src/server/routers/admin/logs.ts

The /admin/logs page aggregates errors from two sources:
| Table | log_type | What it contains |
|---|---|---|
| error_logs | 'error_log' | System errors, validation, worker failures |
| generation_trace | 'generation_trace' | LLM errors (where error_data IS NOT NULL) |
Status is tracked in log_issue_status table with composite key (log_type, log_id).
UI Logic: Status shows as "Новый" (new) when:
- No log_issue_status record exists for the fingerprint
- status = 'new'

IMPORTANT: Always check BOTH conditions when querying for new errors.
The UI has two views:
- By log_id
- By fingerprint, status by fingerprint

Auto-sync trigger (trg_sync_log_status_fingerprint):

- Runs when a row is written to log_issue_status for an error_log
- Fills in fingerprint from error_logs

IMPORTANT: You don't need to manually handle fingerprint; the trigger does it automatically. Just use the standard INSERT INTO log_issue_status by log_id.
Error flow after volume optimization:
logger.warn/error()
|
[Proxy Interceptor]
├── Pino → stdout → Axiom (ALWAYS, no filter)
└── writeToErrorLogs()
├── shouldAutoMute() → SKIP if matches auto-mute rules (58 patterns)
├── shouldWriteToDb() → SKIP if rate-limited (>5/min per fingerprint)
└── INSERT into error_logs
logPermanentFailure() (canonical path, bypasses proxy filters)
└── INSERT/UPSERT into error_logs → applyAutoMuteStatus()
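The rate-limit step in the flow above (skip when a fingerprint exceeds 5 writes per minute) could be implemented as a sliding window. This is a sketch only: the actual logic in rate-limiter.ts is not shown in this document, so the data structure, the constant names, and the `now` parameter are assumptions.

```typescript
const WINDOW_MS = 60000; // one minute
const MAX_WRITES_PER_WINDOW = 5; // limit from the flow diagram

// Timestamps of accepted writes, keyed by fingerprint.
const recentWrites = new Map<string, number[]>();

// Returns true if this error may be written to error_logs, false if rate-limited.
function shouldWriteToDb(fingerprint: string, now: number = Date.now()): boolean {
  const cutoff = now - WINDOW_MS;
  // Keep only the writes that fall inside the current window.
  const timestamps = (recentWrites.get(fingerprint) ?? []).filter((t) => t > cutoff);
  const allowed = timestamps.length < MAX_WRITES_PER_WINDOW;
  if (allowed) timestamps.push(now);
  recentWrites.set(fingerprint, timestamps);
  return allowed;
}

// Six errors with the same fingerprint at the same instant: the sixth is skipped.
const results = Array.from({ length: 6 }, () => shouldWriteToDb("fp-demo", 1000));
console.log(results); // [true, true, true, true, true, false]
```

The map is never pruned for fingerprints that stop recurring; a long-running process would need periodic cleanup. Rate-limited errors still reach Pino and Axiom, since only the DB write is skipped.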
Key points:
- logPermanentFailure() is the canonical DB write and is NOT affected by the pre-insert filter
- baseLogger.warn/error() bypasses the proxy entirely (Pino only, no DB)

If you suspect missing errors in error_logs (filtered by optimization):
Check Pino/Axiom logs for the full unfiltered stream:
- /admin/logs

Files involved in filtering:

- src/shared/logger/index.ts: proxy interceptor + writeToErrorLogs (pre-insert filter)
- src/shared/logger/auto-classification.ts: auto-mute patterns (58 rules)
- src/shared/logger/rate-limiter.ts: per-fingerprint rate limiter
- src/shared/logger/error-service.ts: logPermanentFailure (canonical path, has own auto-mute)