visual-memory
Embed reference media (photos, voice, video templates) as base64 data URIs in skills for self-sufficient, portable, consistent generation
| name | visual-memory |
| description | Embed reference media (photos, voice, video templates) as base64 data URIs in skills for self-sufficient, portable, consistent generation |
| tier | extended |
| applyTo | **/*visual*,**/*reference*,**/*portrait*,**/*base64* |
| muscle | .github/muscles/visual-memory.cjs |
| metadata | {"inheritance":"inheritable"} |
Self-sufficient skills that carry their own reference media — no external folder dependencies.
Applies To: Any skill needing consistent visual identity, voice, or motion style across multiple generation tasks.
Skills that depend on external reference files (photo folders, audio samples) break when:
Embed optimized reference assets directly in the skill as base64 data URIs. The skill becomes fully self-sufficient — it works anywhere, exactly the same way, every time.
skill-folder/
├── SKILL.md
├── synapses.json
└── visual-memory/
    ├── index.json          ← Metadata only (no binary data)
    ├── visual-memory.json  ← Full base64 data URIs (~30-80KB per photo)
    ├── subject-1.jpg       ← Optional: keep originals alongside
    └── subject-2.jpg
Reference photos for face-consistent portrait generation. Embedded to eliminate folder dependencies.
| Spec | Value |
|---|---|
| Target size | 512px longest edge |
| Quality | 85% JPEG |
| Per-photo size | ~40-80KB (vs ~2MB originals) |
| Format | data:image/jpeg;base64,<encoded> |
| Quantity | 5-8 photos per subject, varied angles |
When to use: Face-consistent portrait generation, AI character references, persona avatars.
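Base64 encodes every 3 bytes as 4 characters, so the embedded data URI is about a third larger than the JPEG on disk. The sketch below budgets for that overhead; `dataUriLength` is a hypothetical helper for illustration, not part of the skill.

```javascript
// Hypothetical helper: length in characters of a base64 data URI
// for a file of `byteLength` bytes with the given MIME type.
function dataUriLength(byteLength, mime = "image/jpeg") {
  const prefix = `data:${mime};base64,`;
  // Base64 emits 4 characters per 3-byte group, padding the last group.
  const encoded = 4 * Math.ceil(byteLength / 3);
  return prefix.length + encoded;
}

// A 60,000-byte JPEG, inside the 40-80KB target range:
console.log(dataUriLength(60000)); // 80023
```

When estimating the size of `visual-memory.json`, it is therefore safer to count roughly 1.33x the on-disk size of each optimized photo.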
Moved to dedicated skill: See audio-memory/SKILL.md for voice sample storage and TTS cloning
# Install ImageMagick if needed:
#   macOS:   brew install imagemagick
#   Windows: winget install ImageMagick.ImageMagick

# Resize single photo: 512px longest edge @ 85% JPEG quality
# (quote the geometry so the shell does not treat ">" as a redirect)
magick input.jpg -resize "512x512>" -quality 85 output.jpg

# Batch resize folder (PowerShell); create the output folder first
New-Item -ItemType Directory -Force resized | Out-Null
Get-ChildItem *.jpg | ForEach-Object {
    magick $_.Name -resize "512x512>" -quality 85 "resized/$($_.Name)"
}

# Convert PNG to optimized JPG
magick input.png -resize "512x512>" -quality 85 output.jpg
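The `>` suffix in the geometry tells ImageMagick to shrink only images that exceed the 512x512 box, keeping the aspect ratio and leaving smaller images untouched. The arithmetic can be sketched as below. `fitWithin` is a hypothetical helper for illustration, and ImageMagick's own rounding may differ by a pixel.

```javascript
// Hypothetical helper mirroring the "512x512>" geometry:
// scale down so the longest edge is at most maxEdge, never scale up.
function fitWithin(width, height, maxEdge = 512) {
  const longest = Math.max(width, height);
  if (longest <= maxEdge) return { width, height }; // already small enough
  const scale = maxEdge / longest;
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}

const large = fitWithin(4000, 3000); // 512 x 384
const small = fitWithin(300, 200);   // unchanged: 300 x 200
console.log(large.width, large.height, small.width, small.height);
```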
import { readFileSync, writeFileSync } from "fs";
import { basename } from "path";

function toDataUri(imagePath) {
  const buffer = readFileSync(imagePath);
  const ext = imagePath.toLowerCase().endsWith(".png") ? "png" : "jpeg";
  return `data:image/${ext};base64,${buffer.toString("base64")}`;
}

// Batch convert and write to JSON
const photos = ["photo1.jpg", "photo2.jpg", "photo3.jpg"];
const images = photos.map((p) => ({
  filename: basename(p),
  dataUri: toDataUri(p),
  notes: "",
}));
writeFileSync("images.json", JSON.stringify({ images }, null, 2));
Quick PowerShell (single file to clipboard):
[Convert]::ToBase64String([IO.File]::ReadAllBytes("photo.jpg")) | Set-Clipboard
# Paste with prefix: "data:image/jpeg;base64,<paste>"
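Before committing an encoded photo, it can be worth confirming that the string decodes back to a real image. A minimal sketch, assuming Node's global `Buffer`; `parseDataUri` and `looksLikeJpeg` are hypothetical helpers, not part of the skill.

```javascript
// Hypothetical helper: split a base64 data URI into MIME type and raw bytes.
function parseDataUri(dataUri) {
  const match = /^data:([^;,]+);base64,([A-Za-z0-9+/]*={0,2})$/.exec(dataUri);
  if (!match) throw new Error("Not a base64 data URI");
  return { mime: match[1], bytes: Buffer.from(match[2], "base64") };
}

// JPEG files begin with the bytes FF D8 FF.
function looksLikeJpeg(bytes) {
  return bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff;
}

// Round-trip a tiny stand-in payload (just a JPEG header, not a full photo).
const sample = `data:image/jpeg;base64,${Buffer.from([0xff, 0xd8, 0xff, 0xe0]).toString("base64")}`;
const { mime, bytes } = parseDataUri(sample);
console.log(mime, looksLikeJpeg(bytes)); // image/jpeg true
```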
{
  "schema": "visual-memory-v1",
  "generated": "2026-03-01",
  "subjects": {
    "person-name": {
      "description": "Brief visual description",
      "ageInfo": {
        "referenceAge": 30,
        "birthYear": 1990,
        "photoDate": "2026-03"
      },
      "images": [
        {
          "filename": "person-1.jpg",
          "dataUri": "data:image/jpeg;base64,<base64-encoded-image>",
          "notes": "Front-facing, natural lighting"
        },
        {
          "filename": "person-2.jpg",
          "dataUri": "data:image/jpeg;base64,<base64-encoded-image>",
          "notes": "3/4 profile, outdoor"
        }
      ]
    }
  }
}
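A loader can fail late and confusingly if the file drifts from this shape. The check below is a sketch that assumes `schema`, `subjects`, `images[].filename`, and `images[].dataUri` are required. The source does not say which fields are mandatory, and `validateVisualMemory` is a hypothetical name.

```javascript
// Hypothetical validator: returns a list of problems (empty when the data looks fine).
function validateVisualMemory(data) {
  const errors = [];
  if (data.schema !== "visual-memory-v1") {
    errors.push(`unexpected schema: ${data.schema}`);
  }
  for (const [name, subject] of Object.entries(data.subjects ?? {})) {
    if (!Array.isArray(subject.images) || subject.images.length === 0) {
      errors.push(`${name}: no images`);
      continue;
    }
    subject.images.forEach((image, i) => {
      if (typeof image.filename !== "string") {
        errors.push(`${name}[${i}]: missing filename`);
      }
      // Assumption: only JPEG and PNG data URIs are stored.
      if (!/^data:image\/(jpeg|png);base64,/.test(image.dataUri ?? "")) {
        errors.push(`${name}[${i}]: dataUri is not an image data URI`);
      }
    });
  }
  return errors;
}

// A file path left in place of an embedded image is reported.
const errors = validateVisualMemory({
  schema: "visual-memory-v1",
  subjects: {
    "person-name": {
      images: [{ filename: "person-1.jpg", dataUri: "file:///tmp/person-1.jpg" }],
    },
  },
});
console.log(errors.length, errors[0]);
```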
{
  "version": "1.0",
  "generated": "2026-03-01",
  "targetSize": 512,
  "subjects": {
    "person-name": {
      "count": 7,
      "files": ["person-1.jpg", "person-2.jpg", "person-3.jpg"]
    }
  }
}
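Because `index.json` carries only metadata, it can be derived from `visual-memory.json` rather than maintained by hand. The sketch assumes `count` is the number of embedded images and `files` lists their filenames, which the example suggests but does not state. `buildIndex` is a hypothetical helper.

```javascript
// Hypothetical helper: derive the metadata-only index from the full memory object.
function buildIndex(memory, targetSize = 512) {
  const subjects = {};
  for (const [name, subject] of Object.entries(memory.subjects)) {
    const files = subject.images.map((image) => image.filename);
    subjects[name] = { count: files.length, files };
  }
  return { version: "1.0", generated: memory.generated, targetSize, subjects };
}

const index = buildIndex({
  schema: "visual-memory-v1",
  generated: "2026-03-01",
  subjects: {
    "person-name": {
      images: [
        { filename: "person-1.jpg", dataUri: "data:image/jpeg;base64,AAAA" },
        { filename: "person-2.jpg", dataUri: "data:image/jpeg;base64,BBBB" },
      ],
    },
  },
});
console.log(index.subjects["person-name"].count); // 2
```

Generating the index this way keeps the two files from disagreeing about which photos a subject has.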
import { readFileSync } from "fs";
import { join, dirname } from "path";
import { fileURLToPath } from "url";

// ES modules have no __dirname, so derive it from the module URL.
const __dirname = dirname(fileURLToPath(import.meta.url));
const VISUAL_MEMORY_PATH = join(
  __dirname,
  ".github/skills/<skill-name>/visual-memory/visual-memory.json"
);

// Returns an object mapping each subject name to its array of data URIs.
function loadVisualMemory() {
  const data = JSON.parse(readFileSync(VISUAL_MEMORY_PATH, "utf8"));
  return Object.fromEntries(
    Object.entries(data.subjects).map(([name, subject]) => [
      name,
      subject.images.map((i) => i.dataUri),
    ])
  );
}

const visualMemory = loadVisualMemory();
// visualMemory["person-name"] → array of data URIs
The reference photos speak for themselves. Only describe:
NEVER include:
| Model | Parameter | Max Refs | Notes |
|---|---|---|---|
| nano-banana-pro | image_input | 14 | Array of data URIs, 4K output |
| nano-banana-2 | image_input | 14 | Faster/cheaper alternative (Gemini 3.1 Flash) |
| flux-2-pro | input_images | 8 | Array of data URIs |
| flux-2-flex | input_images | 10 | Max-quality editing |
| ideogram-v2 | ❌ None | n/a | No face reference |
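The parameter name and reference limit differ per model, so it helps to centralize them. The sketch below copies the values from the table. `referenceInput` is a hypothetical helper, and truncating to the limit is an assumption, since the source does not say how each model reacts to extra references.

```javascript
// Parameter names and limits copied from the table above.
const MODEL_REFS = new Map([
  ["nano-banana-pro", { param: "image_input", maxRefs: 14 }],
  ["nano-banana-2", { param: "image_input", maxRefs: 14 }],
  ["flux-2-pro", { param: "input_images", maxRefs: 8 }],
  ["flux-2-flex", { param: "input_images", maxRefs: 10 }],
]);

// Hypothetical helper: build the reference-image part of a model input.
function referenceInput(model, dataUris) {
  const spec = MODEL_REFS.get(model);
  if (!spec) throw new Error(`${model} does not accept reference images`);
  // Assumption: keep the first maxRefs references and drop the rest.
  return { [spec.param]: dataUris.slice(0, spec.maxRefs) };
}

// Ten placeholder URIs stand in for real embedded photos.
const refs = Array.from({ length: 10 }, (_, i) => `data:image/jpeg;base64,ref${i}`);
const input = referenceInput("flux-2-pro", refs);
console.log(input.input_images.length); // 8
```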
Always start the prompt with explicit reference instruction:
Generate a photo of EXACTLY the person shown in the reference images.
For multiple subjects at once:
Generate a photo with two people.
LEFT: EXACTLY the person from [Name A]'s reference images, wearing [clothing].
RIGHT: EXACTLY the person from [Name B]'s reference images, wearing [clothing].
[Scene description]. [Lighting]. Professional photography.
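The two-person template can be filled programmatically so the wording stays identical across generations. A minimal sketch; `buildGroupPrompt` and the sample names and values are hypothetical.

```javascript
// Hypothetical helper: fill the two-person prompt template.
function buildGroupPrompt(left, right, scene, lighting) {
  return [
    "Generate a photo with two people.",
    `LEFT: EXACTLY the person from ${left.name}'s reference images, wearing ${left.clothing}.`,
    `RIGHT: EXACTLY the person from ${right.name}'s reference images, wearing ${right.clothing}.`,
    `${scene}. ${lighting}. Professional photography.`,
  ].join("\n");
}

const prompt = buildGroupPrompt(
  { name: "Alex", clothing: "a navy blazer" },
  { name: "Sam", clothing: "a grey sweater" },
  "Standing in a modern office lobby",
  "Soft window light"
);
console.log(prompt);
```

The reference images for both subjects would still be passed through the model's reference parameter. The prompt only tells the model which references belong to which position.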
Store consistent motion style as JSON — not actual video files:
{
"videoStyles": {
| Spec       | Value                                       |
| ---------- | ------------------------------------------- |
| Quantity | 5-8 photos (more = better likeness) |
| Angles | Front, 3/4 left, 3/4 right, slight profile |
| Lighting | Mixed (natural, indoor, flash, outdoor) |
| Expression | Neutral, smiling, serious — varied |
| File size | 40-80KB each after 512px/85% optimization |
| Total size | ~500KB for 8-10 photos — acceptable |
> **For voice samples**: See [audio-memory/SKILL.md](../audio-memory/SKILL.md)
---
## Benefits Summary
| Without Visual Memory | With Visual Memory |
| ------------------------- | ------------------------------------- |
| External photo folder required | No external dependencies |
| Breaks on different machines | Works anywhere |
| Manual path management | Always correct path |
| Version control nightmare | JSON in version control |
| Different results per machine | Exact consistency everywhere |
| ~2MB per unoptimized original | ~40-80KB per optimized photo (~500KB total) |