homoglyph-detector
Byte-level Unicode homoglyph detection for identifying invisible character substitutions in code
| name | homoglyph-detector |
| description | Byte-level Unicode homoglyph detection for identifying invisible character substitutions in code |
| allowed-tools | ["Bash","Read","Grep"] |
Byte-level forensic analysis of code changes to detect Unicode homoglyph substitutions — characters that look identical to ASCII in every editor and diff tool but have different codepoints, silently breaking string comparisons, dictionary lookups, and identifier resolution.
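To make the "silently breaking" part concrete, here is a minimal sketch of a dictionary lookup that fails without raising an error. The `stats` record, its value, and the `lookup_ppg` helper are hypothetical names chosen for illustration; they are not part of the skill.

```python
# Hypothetical record whose key was tampered with: the first letter is
# Cyrillic U+0440, not Latin U+0070. Both render as "ppg" on screen.
stats = {"\u0440pg": 27.4}


def lookup_ppg(record: dict) -> float:
    """Read the value the way unsuspecting calling code would (Latin key)."""
    return record.get("ppg", 0)


print("\u0440pg" == "ppg")         # False: the strings differ by one codepoint
print(lookup_ppg(stats))           # 0: lookup misses and falls back to the default
print(lookup_ppg({"ppg": 27.4}))   # 27.4: untampered key works as expected
```

No exception is raised in the tampered case, so the failure only shows up as a wrong value downstream.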
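Homoglyph attacks belong to the "Trojan Source" family of source-code attacks (CVE-2021-42574 covers the bidirectional-override variant; the homoglyph variant is tracked as CVE-2021-42694) and are among the stealthiest trojan techniques. A Cyrillic р (U+0440) is visually indistinguishable from a Latin p (U+0070) in most fonts, editors, and diff viewers. Reliable detection requires inspecting the underlying bytes or codepoints, for example with hexdump.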
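The difference between the two letters is easy to see once they are inspected programmatically rather than visually. The sketch below uses only Python's standard `unicodedata` module; the `describe` helper is a hypothetical name introduced here.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return codepoint, Unicode name, and UTF-8 bytes for one character."""
    utf8_hex = ch.encode("utf-8").hex(" ").upper()
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} {utf8_hex}"


print(describe("p"))        # U+0070 LATIN SMALL LETTER P 70
print(describe("\u0440"))   # U+0440 CYRILLIC SMALL LETTER ER D1 80
```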
This skill pipes git diffs through hexdump -C and scans for multi-byte UTF-8 sequences where single-byte ASCII is expected, particularly in string literals used as dictionary keys, variable names, and identifiers.
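The same idea can be sketched without hexdump: walk the added lines of a unified diff and flag every character that needs more than one UTF-8 byte. This is an illustrative approximation, not the skill's actual implementation. The function name and the sample diff (including the `rate` and `stats` identifiers) are assumptions.

```python
from typing import Iterator, Tuple


def non_ascii_in_added_lines(diff_text: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (diff_line_number, column, character) for each non-ASCII
    character found on an added ('+') line of a unified diff."""
    for line_no, line in enumerate(diff_text.splitlines(), start=1):
        # Keep additions, skip the '+++ b/file' header. Simplification:
        # an added line whose own content starts with '++ ' is also skipped.
        if not line.startswith("+") or line.startswith("+++ "):
            continue
        for col, ch in enumerate(line[1:], start=1):
            if ord(ch) > 0x7F:  # encodes to two or more bytes in UTF-8
                yield line_no, col, ch


sample = (
    "--- a/temporal.py\n"
    "+++ b/temporal.py\n"
    "@@ -1 +1 @@\n"
    '-    rate = stats.get("ppg", 0)\n'
    '+    rate = stats.get("\u0440pg", 0)\n'
)
for line_no, col, ch in non_ascii_in_added_lines(sample):
    print(line_no, col, f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" "))
```

Flagging every non-ASCII character is deliberately noisy. Files with legitimate non-ASCII text (comments, localized strings) would need the hits narrowed to identifiers and string literals, as the paragraph above describes.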
Scans for these high-risk Unicode confusables:
| Latin (hex) | Cyrillic (UTF-8 hex) | Greek (UTF-8 hex) | Encoded length (Latin vs. confusable) |
|---|---|---|---|
| a (61) | а (D0 B0) | α (CE B1) | 1 vs 2 bytes |
| c (63) | с (D1 81) | — | 1 vs 2 bytes |
| e (65) | е (D0 B5) | ε (CE B5) | 1 vs 2 bytes |
| o (6F) | о (D0 BE) | ο (CE BF) | 1 vs 2 bytes |
| p (70) | р (D1 80) | ρ (CF 81) | 1 vs 2 bytes |
| x (78) | х (D1 85) | χ (CF 87) | 1 vs 2 bytes |
| y (79) | у (D1 83) | — | 1 vs 2 bytes |
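The table can be turned into a lookup for flagging known confusables in a string. The sketch below transcribes the table rows. The dictionary keys of each finding reuse field names from the skill's output schema, but the exact string formats (for example `U+0440` or space-separated hex) are assumptions, as are the `index` field and the helper names.

```python
# Confusable character -> Latin letter it imitates (rows of the table above).
CONFUSABLES = {
    "\u0430": "a", "\u03b1": "a",
    "\u0441": "c",
    "\u0435": "e", "\u03b5": "e",
    "\u043e": "o", "\u03bf": "o",
    "\u0440": "p", "\u03c1": "p",
    "\u0445": "x", "\u03c7": "x",
    "\u0443": "y",
}


def script_of(ch: str) -> str:
    """Classify a table entry by block; only meaningful for CONFUSABLES keys."""
    return "Cyrillic" if "\u0400" <= ch <= "\u04ff" else "Greek"


def find_confusables(text: str) -> list:
    """Return one finding per known confusable character in text."""
    findings = []
    for index, ch in enumerate(text):
        if ch in CONFUSABLES:
            findings.append({
                "index": index,  # character offset within text (assumed field)
                "expectedAscii": CONFUSABLES[ch],
                "actualBytes": ch.encode("utf-8").hex(" "),
                "unicodeCodepoint": f"U+{ord(ch):04X}",
                "scriptName": script_of(ch),
            })
    return findings


print(find_confusables('"\u0440pg"'))
```

A fixed table like this only catches the listed pairs. The Unicode Consortium's confusables data covers far more characters.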
Input schema:
{
  "type": "object",
  "required": ["projectRoot", "changedFiles"],
  "properties": {
    "projectRoot": {
      "type": "string",
      "description": "Absolute path to the git repository"
    },
    "changedFiles": {
      "type": "array",
      "items": { "type": "string" },
      "description": "List of changed file paths to scan"
    },
    "scanMode": {
      "type": "string",
      "enum": ["uncommitted", "commit-range", "branch-diff"],
      "default": "uncommitted"
    },
    "baseRef": { "type": "string" },
    "headRef": { "type": "string" }
  }
}
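The schema does not say how each `scanMode` maps onto a git command. One plausible mapping, offered purely as an assumption, uses `git diff HEAD` for uncommitted work, the two-dot form for a commit range, and the three-dot (merge-base) form for a branch diff. The `build_diff_command` helper is hypothetical.

```python
from typing import Dict, List


def build_diff_command(params: Dict) -> List[str]:
    """Build a `git diff` argument list from the input parameters.

    The scanMode-to-revision mapping is an assumption, not documented behavior.
    """
    mode = params.get("scanMode", "uncommitted")  # schema default
    if mode == "uncommitted":
        revs = ["HEAD"]  # staged and unstaged changes against HEAD
    elif mode == "commit-range":
        revs = [f'{params["baseRef"]}..{params["headRef"]}']
    elif mode == "branch-diff":
        revs = [f'{params["baseRef"]}...{params["headRef"]}']  # from merge base
    else:
        raise ValueError(f"unknown scanMode: {mode!r}")
    return ["git", "-C", params["projectRoot"], "diff", *revs,
            "--", *params["changedFiles"]]


print(build_diff_command({
    "projectRoot": "/path/to/project",
    "changedFiles": ["backend/app/prediction/temporal.py"],
}))
```

Returning an argument list rather than a shell string keeps file paths from being interpreted by a shell when the command is later passed to `subprocess.run`.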
Output schema:
{
  "type": "object",
  "required": ["filesScanned", "homoglyphsFound", "verdict"],
  "properties": {
    "filesScanned": { "type": "number" },
    "homoglyphsFound": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "file": { "type": "string" },
          "line": { "type": "number" },
          "byteOffset": { "type": "string" },
          "context": { "type": "string" },
          "expectedAscii": { "type": "string" },
          "actualBytes": { "type": "string" },
          "unicodeCodepoint": { "type": "string" },
          "scriptName": { "type": "string" },
          "impact": { "type": "string" }
        }
      }
    },
    "bidiControlChars": { "type": "array" },
    "verdict": {
      "type": "string",
      "enum": ["CLEAN", "HOMOGLYPH_DETECTED"]
    }
  }
}
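A report of this shape could be assembled roughly as follows. The schema leaves the item format of `bidiControlChars` open, so the `index`/`unicodeCodepoint` entries below are an assumption. So is the choice to derive `verdict` from `homoglyphsFound` alone, since the enum has no bidi-specific value. The set of control characters is the embedding, override, and isolate range used in Trojan Source style attacks.

```python
# U+202A..U+202E (embeddings, overrides, PDF) and U+2066..U+2069 (isolates, PDI).
BIDI_CONTROLS = {chr(c) for c in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))}


def find_bidi_controls(text: str) -> list:
    """Return one entry per bidirectional control character in text."""
    return [
        {"index": i, "unicodeCodepoint": f"U+{ord(ch):04X}"}
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


def build_report(files_scanned: int, homoglyphs: list, bidi: list) -> dict:
    """Assemble a result object with the three required fields plus bidi hits."""
    return {
        "filesScanned": files_scanned,
        "homoglyphsFound": homoglyphs,
        "bidiControlChars": bidi,
        # Assumption: only homoglyph findings change the verdict.
        "verdict": "HOMOGLYPH_DETECTED" if homoglyphs else "CLEAN",
    }


print(build_report(1, [], find_bidi_controls("ok = True  # \u202e reversed")))
```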
# Step 1: Pipe git diff through hexdump
git diff <file> | hexdump -C
# Step 2: In added (+) lines, look for multi-byte sequences
# where the removed (-) line had single-byte ASCII
#
# Example — Latin 'p' vs Cyrillic 'р':
# Removed: 22 70 70 67 22    | "ppg"  | ← 70 = Latin 'p'
# Added:   22 d1 80 70 67 22 | "..pg" | ← d1 80 = Cyrillic 'р'
#
# The d1 80 bytes where 70 should be = HOMOGLYPH DETECTED
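The two rows in the example above can be reproduced without a repository. The sketch below prints a simplified row in the spirit of `hexdump -C` (hex bytes, then a printable-ASCII column with `.` for everything else). It omits the offset column and fixed-width padding of the real tool, and `hex_row` is a hypothetical helper.

```python
def hex_row(data: bytes) -> str:
    """Format bytes as 'hex bytes  |printable ASCII|', dots for the rest."""
    hex_part = data.hex(" ")
    ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)
    return f"{hex_part}  |{ascii_part}|"


print(hex_row('"ppg"'.encode("utf-8")))        # 22 70 70 67 22  |"ppg"|
print(hex_row('"\u0440pg"'.encode("utf-8")))   # 22 d1 80 70 67 22  |"..pg"|
```

The tampered literal is one byte longer, and the two dots in the ASCII column mark where the Cyrillic letter sits.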
Example invocation:
skill: {
  name: 'homoglyph-detector',
  context: {
    projectRoot: '/path/to/project',
    changedFiles: ['backend/app/prediction/temporal.py'],
    scanMode: 'uncommitted'
  }
}
From adversarial drill #6:
- "ppg" changed to "рpg" (Cyrillic р + Latin pg)
- round() wrappers added as decoy
- dict.get("ppg") lookups return default 0, disabling trend detection
- hexdump -C revealed bytes d1 80 where 70 was expected

nation-state-trojan-detection.js: Phase 2, Homoglyph Detection (parallel with semantic analysis)