homoglyph-detector
Byte-level Unicode homoglyph detection for identifying invisible character substitutions in code
| name | homoglyph-detector |
| description | Byte-level Unicode homoglyph detection for identifying invisible character substitutions in code |
| allowed-tools | ["Bash","Read","Grep"] |
Byte-level forensic analysis of code changes to detect Unicode homoglyph substitutions: characters that render identically to ASCII letters in most editors and diff tools but have different codepoints, silently breaking string comparisons, dictionary lookups, and identifier resolution.
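Homoglyph attacks (CVE-2021-42694, disclosed alongside the bidirectional-override attack CVE-2021-42574 in the "Trojan Source" research) are among the stealthiest trojan techniques. A Cyrillic р (U+0440, UTF-8 bytes D1 80) is visually indistinguishable from a Latin p (U+0070, byte 70) in most fonts, editors, and diff viewers. Because the rendered text gives nothing away, reliable detection means inspecting the bytes or codepoints themselves, for example with hexdump.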
This skill pipes git diffs through hexdump -C and scans for multi-byte UTF-8 sequences where single-byte ASCII is expected, particularly in string literals used as dictionary keys, variable names, and identifiers.
Scans for these high-risk Unicode confusables:
| Latin (hex byte) | Cyrillic (UTF-8 hex) | Greek (UTF-8 hex) | Encoded length (Latin vs. lookalike) |
|---|---|---|---|
| a (61) | а (D0 B0) | α (CE B1) | 1 vs 2 bytes |
| c (63) | с (D1 81) | — | 1 vs 2 bytes |
| e (65) | е (D0 B5) | ε (CE B5) | 1 vs 2 bytes |
| o (6F) | о (D0 BE) | ο (CE BF) | 1 vs 2 bytes |
| p (70) | р (D1 80) | ρ (CF 81) | 1 vs 2 bytes |
| x (78) | х (D1 85) | χ (CF 87) | 1 vs 2 bytes |
| y (79) | у (D1 83) | — | 1 vs 2 bytes |
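The table can be turned into a lookup keyed by codepoint. The following is a minimal sketch in Python, not the skill's actual implementation: `CONFUSABLES` and `find_confusables` are illustrative names, the map covers only the rows above, and the `charIndex` field is an assumption (the remaining field names are borrowed from the output schema).

```python
import unicodedata

# Confusable codepoint -> the ASCII letter it imitates (rows from the table above).
CONFUSABLES = {
    "\u0430": "a", "\u03b1": "a",  # Cyrillic a, Greek alpha
    "\u0441": "c",                 # Cyrillic es
    "\u0435": "e", "\u03b5": "e",  # Cyrillic ie, Greek epsilon
    "\u043e": "o", "\u03bf": "o",  # Cyrillic o, Greek omicron
    "\u0440": "p", "\u03c1": "p",  # Cyrillic er, Greek rho
    "\u0445": "x", "\u03c7": "x",  # Cyrillic ha, Greek chi
    "\u0443": "y",                 # Cyrillic u
}


def find_confusables(text):
    """Return one finding per confusable character in `text`."""
    findings = []
    for index, char in enumerate(text):
        ascii_letter = CONFUSABLES.get(char)
        if ascii_letter is None:
            continue
        findings.append({
            "charIndex": index,  # hypothetical field: position within `text`
            "expectedAscii": ascii_letter,
            "actualBytes": char.encode("utf-8").hex(" "),
            "unicodeCodepoint": f"U+{ord(char):04X}",
            # First word of the Unicode name, e.g. "CYRILLIC ..." -> "Cyrillic"
            "scriptName": unicodedata.name(char).split()[0].title(),
        })
    return findings
```

Calling `find_confusables('"\u0440pg"')` reports a single finding at index 1 with `actualBytes` of `d1 80`, while the all-Latin `'"ppg"'` yields an empty list. A production scanner would likely draw on a fuller confusables list than these seven rows.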
Input schema:

{
  "type": "object",
  "required": ["projectRoot", "changedFiles"],
  "properties": {
    "projectRoot": {
      "type": "string",
      "description": "Absolute path to the git repository"
    },
    "changedFiles": {
      "type": "array",
      "items": { "type": "string" },
      "description": "List of changed file paths to scan"
    },
    "scanMode": {
      "type": "string",
      "enum": ["uncommitted", "commit-range", "branch-diff"],
      "default": "uncommitted"
    },
    "baseRef": { "type": "string" },
    "headRef": { "type": "string" }
  }
}
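The schema does not say how each `scanMode` maps onto a git command, so the sketch below makes that mapping explicit as an assumption: `uncommitted` runs a plain `git diff`, `commit-range` uses `baseRef..headRef`, and `branch-diff` uses the merge-base form `baseRef...headRef`. The function name `build_diff_command` is hypothetical.

```python
def build_diff_command(changed_files, scan_mode="uncommitted",
                       base_ref=None, head_ref=None):
    """Build a `git diff` argument list for one scan mode.

    The mode-to-revision mapping is an assumption; the schema only
    names the modes and does not define them.
    """
    if scan_mode == "uncommitted":
        revision = []  # working tree vs. index, as in `git diff <file>`
    elif scan_mode in ("commit-range", "branch-diff"):
        if not base_ref or not head_ref:
            raise ValueError(f"{scan_mode} requires baseRef and headRef")
        # Two dots: direct comparison. Three dots: diff from the merge base.
        separator = ".." if scan_mode == "commit-range" else "..."
        revision = [f"{base_ref}{separator}{head_ref}"]
    else:
        raise ValueError(f"unknown scanMode: {scan_mode!r}")
    # "--" keeps file paths from being parsed as revisions or options.
    return ["git", "diff", *revision, "--", *changed_files]
```

The resulting list can be passed to `subprocess.run(cmd, cwd=project_root, capture_output=True)`, whose `stdout` then holds the raw diff bytes for scanning.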
Output schema:

{
  "type": "object",
  "required": ["filesScanned", "homoglyphsFound", "verdict"],
  "properties": {
    "filesScanned": { "type": "number" },
    "homoglyphsFound": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "file": { "type": "string" },
          "line": { "type": "number" },
          "byteOffset": { "type": "string" },
          "context": { "type": "string" },
          "expectedAscii": { "type": "string" },
          "actualBytes": { "type": "string" },
          "unicodeCodepoint": { "type": "string" },
          "scriptName": { "type": "string" },
          "impact": { "type": "string" }
        }
      }
    },
    "bidiControlChars": { "type": "array" },
    "verdict": {
      "type": "string",
      "enum": ["CLEAN", "HOMOGLYPH_DETECTED"]
    }
  }
}
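A report matching this schema could be assembled roughly as follows. This is a sketch under stated assumptions: the schema leaves the item shape of `bidiControlChars` open, so the `file`/`line`/`unicodeCodepoint` fields used here are a guess, and because the `verdict` enum has no bidi-specific value, the sketch derives the verdict from homoglyph findings alone. The names `BIDI_CONTROLS`, `find_bidi_controls`, and `build_report` are illustrative.

```python
# Bidirectional embedding, override, and isolate controls (Trojan Source).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def find_bidi_controls(file, text):
    """List every bidi control character in `text` with its 1-based line."""
    hits = []
    for line_number, line in enumerate(text.split("\n"), start=1):
        for char in line:
            if char in BIDI_CONTROLS:
                hits.append({
                    "file": file,
                    "line": line_number,
                    "unicodeCodepoint": f"U+{ord(char):04X}",
                })
    return hits


def build_report(files_scanned, homoglyphs, bidi_hits):
    """Assemble a result object with the schema's required fields."""
    return {
        "filesScanned": files_scanned,
        "homoglyphsFound": homoglyphs,
        "bidiControlChars": bidi_hits,
        # Assumption: only homoglyph findings change the verdict.
        "verdict": "HOMOGLYPH_DETECTED" if homoglyphs else "CLEAN",
    }
```

Under this assumption a file containing only bidi controls still reports `CLEAN`, so a caller would need to inspect `bidiControlChars` separately.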
# Step 1: Pipe git diff through hexdump
git diff <file> | hexdump -C
# Step 2: In added (+) lines, look for multi-byte sequences
# where the removed (-) line had single-byte ASCII
#
# Example — Latin 'p' vs Cyrillic 'р':
# Removed: 22 70 70 67 22     |"ppg"|   ← 70 = Latin 'p'
# Added:   22 d1 80 70 67 22  |"..pg"|  ← d1 80 = Cyrillic 'р'
#
# The d1 80 bytes where 70 should be = HOMOGLYPH DETECTED
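The same check can be scripted instead of read by eye. The sketch below is one possible approach, not the skill's own code: `hexdump_row` mimics a single `hexdump -C` row without the offset column, and `suspicious_added_lines` flags added diff lines that contain any byte of 0x80 or above. Both names are hypothetical, and the `+++` filter is a simplification that would also skip an added line whose content begins with `++`.

```python
def hexdump_row(data):
    """Format bytes like one `hexdump -C` row, minus the offset column."""
    hex_part = " ".join(f"{byte:02x}" for byte in data)
    # Printable ASCII is shown as-is; everything else becomes "."
    ascii_part = "".join(
        chr(byte) if 0x20 <= byte <= 0x7E else "." for byte in data
    )
    return f"{hex_part}  |{ascii_part}|"


def suspicious_added_lines(diff_bytes):
    """Yield (diff_line_number, raw_line) for added lines with non-ASCII bytes."""
    for number, line in enumerate(diff_bytes.split(b"\n"), start=1):
        # "+++ b/path" is a file header, not an added line.
        is_added = line.startswith(b"+") and not line.startswith(b"+++")
        if is_added and any(byte >= 0x80 for byte in line):
            yield number, line


print(hexdump_row(b'"ppg"'))           # 22 70 70 67 22  |"ppg"|
print(hexdump_row(b'"\xd1\x80pg"'))    # 22 d1 80 70 67 22  |"..pg"|
```

Non-ASCII bytes are not always malicious (comments and user-facing strings legitimately contain them), so flagged lines are candidates for review rather than confirmed homoglyphs.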
Example invocation:

skill: {
  name: 'homoglyph-detector',
  context: {
    projectRoot: '/path/to/project',
    changedFiles: ['backend/app/prediction/temporal.py'],
    scanMode: 'uncommitted'
  }
}
From adversarial drill #6:
"ppg" changed to "рpg" (Cyrillic р + Latin pg)round() wrappers added as decoydict.get("ppg") lookups return default 0, disabling trend detectionhexdump -C revealed bytes d1 80 where 70 was expectednation-state-trojan-detection.js — Phase 2: Homoglyph Detection (parallel with semantic analysis)