| Field | Value |
|---|---|
| name | visual-memory |
| description | Embed reference media (photos, voice, video templates) as base64 data URIs in skills for self-sufficient, portable, consistent generation |
| tier | extended |
| applyTo | **/*visual*,**/*reference*,**/*portrait*,**/*base64* |
| muscle | .github/muscles/visual-memory.cjs |
| metadata | {"inheritance":"inheritable"} |
Visual Memory
Self-sufficient skills that carry their own reference media — no external folder dependencies.
Applies To: Any skill needing consistent visual identity, voice, or motion style across multiple generation tasks.
The Problem
Skills that depend on external reference files (photo folders, audio samples) break when:
- The skill is synced to a new machine without the original files
- Files are renamed, moved, or deleted
- A different project inherits the skill
- Binary reference files are kept out of version control (for example, excluded by `.gitignore`)
The Solution: Visual Memory
Embed optimized reference assets directly in the skill as base64 data URIs. The skill becomes fully self-sufficient: the same references travel with it to every machine and project, so every generation starts from identical inputs.
skill-folder/
├── SKILL.md
├── synapses.json
└── visual-memory/
    ├── index.json           ← Metadata only (no binary data)
    ├── visual-memory.json   ← Full base64 data URIs (~30-80KB per photo)
    ├── subject-1.jpg        ← Optional: keep originals alongside
    └── subject-2.jpg
Memory Types
Visual Memory (Photos as Base64)
Reference photos for face-consistent portrait generation. Embedded to eliminate folder dependencies.
| Spec | Value |
|---|---|
| Target size | 512px longest edge |
| Quality | 85% JPEG |
| Per-photo size | ~40-80KB (vs ~2MB originals) |
| Format | data:image/jpeg;base64,<encoded> |
| Quantity | 5-8 photos per subject, varied angles |
When to use: Face-consistent portrait generation, AI character references, persona avatars.
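The per-photo sizes above refer to the JPEG bytes. Base64 turns every 3 bytes into 4 characters, so each photo takes roughly a third more space once embedded in JSON. A minimal JavaScript sketch (the `dataUriLength` helper is hypothetical, not part of the muscle) that estimates the embedded length and cross-checks it against Node's `Buffer` encoder:

```javascript
// Estimate the length of a base64 data URI for an image of `byteLength` bytes.
// Base64 emits 4 characters per 3-byte group, padding the last group with "=",
// so the payload length is 4 * ceil(n / 3).
function dataUriLength(byteLength, mime = "image/jpeg") {
  const prefix = `data:${mime};base64,`;
  return prefix.length + 4 * Math.ceil(byteLength / 3);
}

// Cross-check against Node's encoder (Buffer is a Node global).
const sample = Buffer.alloc(60 * 1024, 0xab); // stand-in for a 60KB JPEG
const actual = `data:image/jpeg;base64,${sample.toString("base64")}`.length;
console.log(dataUriLength(sample.length), actual); // 81943 81943
```

In other words, a 60KB optimized JPEG costs about 80KB of JSON text.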
Audio Memory
Moved to a dedicated skill: see audio-memory/SKILL.md for voice sample storage and TTS cloning.
Implementing Visual Memory in a Skill
Step 1: Prepare Photos
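# Install ImageMagick if needed:
# macOS: brew install imagemagick
# Windows: winget install ImageMagick.ImageMagick
# Resize single photo: 512px longest edge @ 85% JPEG quality
# (quote the geometry: an unquoted ">" is parsed by the shell as output redirection)
magick input.jpg -resize "512x512>" -quality 85 output.jpg
# Batch resize folder (PowerShell; creates the output folder first)
New-Item -ItemType Directory -Force resized | Out-Null
Get-ChildItem *.jpg | ForEach-Object {
    magick $_.Name -resize "512x512>" -quality 85 "resized/$($_.Name)"
}
# Convert PNG to optimized JPG (flatten any transparency onto white, since JPEG has no alpha)
magick input.png -resize "512x512>" -background white -flatten -quality 85 output.jpg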
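ImageMagick's `512x512>` geometry fits the image inside a 512x512 box, preserves the aspect ratio, and only ever shrinks. The hypothetical `fitWithin` helper below approximates that arithmetic so you can predict output dimensions before running a batch; ImageMagick's own rounding may differ by a pixel.

```javascript
// Approximate "512x512>": scale so the longest edge is at most maxEdge,
// keep the aspect ratio, and never enlarge images that already fit.
function fitWithin(width, height, maxEdge = 512) {
  if (width <= maxEdge && height <= maxEdge) {
    return { width, height }; // the ">" flag leaves smaller images untouched
  }
  const scale = maxEdge / Math.max(width, height);
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}

console.log(fitWithin(4032, 3024)); // { width: 512, height: 384 }
console.log(fitWithin(300, 400)); // { width: 300, height: 400 }
```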
Step 2: Convert to Base64 Data URIs
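import { readFileSync, writeFileSync } from "fs";
import { basename } from "path";
// Read an image file and return it as a base64 data URI.
// The MIME type comes from the extension: ".png" gives png, anything else jpeg.
function toDataUri(imagePath) {
  const buffer = readFileSync(imagePath);
  const ext = imagePath.toLowerCase().endsWith(".png") ? "png" : "jpeg";
  return `data:image/${ext};base64,${buffer.toString("base64")}`;
}
const photos = ["photo1.jpg", "photo2.jpg", "photo3.jpg"];
const images = photos.map((p) => ({
  filename: basename(p),
  dataUri: toDataUri(p),
  notes: "", // describe angle and lighting, e.g. "3/4 profile, outdoor"
}));
// images.json holds one subject's images; copy them into visual-memory.json (Step 3).
writeFileSync("images.json", JSON.stringify({ images }, null, 2));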
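`toDataUri` picks the MIME type from the file extension, so a PNG saved with a `.jpg` name would be labeled `image/jpeg`. One alternative, sketched here under the assumption that only JPEG and PNG inputs matter, is to check the file's signature bytes (`FF D8 FF` for JPEG, the fixed 8-byte PNG signature). `sniffImageMime` and `bufferToDataUri` are hypothetical names.

```javascript
// Detect the image type from its leading signature bytes instead of the
// file extension. Only JPEG and PNG are recognized in this sketch.
function sniffImageMime(buffer) {
  if (buffer.length >= 3 && buffer[0] === 0xff && buffer[1] === 0xd8 && buffer[2] === 0xff) {
    return "image/jpeg";
  }
  const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (buffer.length >= 8 && pngSignature.every((b, i) => buffer[i] === b)) {
    return "image/png";
  }
  return null; // unknown format: let the caller decide what to do
}

// Build a data URI from raw bytes, refusing formats the sniffer doesn't know.
function bufferToDataUri(buffer) {
  const mime = sniffImageMime(buffer);
  if (!mime) throw new Error("Unsupported image format");
  return `data:${mime};base64,${buffer.toString("base64")}`;
}
```

With `readFileSync` from the step above, `bufferToDataUri(readFileSync("photo1.jpg"))` could stand in for `toDataUri("photo1.jpg")`.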
Quick PowerShell (single file to clipboard):
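"data:image/jpeg;base64," + [Convert]::ToBase64String([IO.File]::ReadAllBytes((Resolve-Path "photo.jpg").Path)) | Set-Clipboard
# Resolve-Path is needed because .NET resolves relative paths against the process directory, not the current PowerShell location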
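When data URIs are assembled by hand, a truncated paste or a missing `data:image/jpeg;base64,` prefix only shows up later as a failed generation. A small round-trip check, sketched with a hypothetical `parseDataUri` helper, decodes a data URI back into its MIME type and bytes:

```javascript
// Split a base64 data URI into its MIME type and decoded bytes.
// Surrounding whitespace (common after a clipboard paste) is ignored.
function parseDataUri(dataUri) {
  const match = /^data:([\w.+-]+\/[\w.+-]+);base64,([A-Za-z0-9+\/]+={0,2})$/.exec(dataUri.trim());
  if (!match) throw new Error("Not a base64 data URI");
  const [, mime, payload] = match;
  // Padded base64 always has a length that is a multiple of 4.
  if (payload.length % 4 !== 0) throw new Error("Base64 payload looks truncated");
  return { mime, bytes: Buffer.from(payload, "base64") };
}
```

A pasted URI that fails here, or whose decoded `bytes.length` is far below the expected 40-80KB, is worth re-copying before it goes into `visual-memory.json`.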
Step 3: Build visual-memory.json
{
  "schema": "visual-memory-v1",
  "generated": "2026-03-01",
  "subjects": {
    "person-name": {
      "description": "Brief visual description",
      "ageInfo": {
        "referenceAge": 30,
        "birthYear": 1996,
        "photoDate": "2026-03"
      },
      "images": [
        {
          "filename": "person-1.jpg",
          "dataUri": "data:image/jpeg;base64,<base64-encoded-image>",
          "notes": "Front-facing, natural lighting"
        },
        {
          "filename": "person-2.jpg",
          "dataUri": "data:image/jpeg;base64,<base64-encoded-image>",
          "notes": "3/4 profile, outdoor"
        }
      ]
    }
  }
}
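Before committing a hand-edited `visual-memory.json`, a structural check can catch a wrong `schema` value, a subject with no images, or an entry whose `dataUri` is missing or is not a `data:image/` URI. This is a sketch against the field names in the example above; `validateVisualMemory` is hypothetical and does not decode the image payloads.

```javascript
// Structural check for a parsed visual-memory.json object. Returns a list of
// problems; an empty list means the shape matches the example. Payloads are
// not decoded, so a well-formed but corrupt image will still pass.
function validateVisualMemory(data) {
  const problems = [];
  if (data?.schema !== "visual-memory-v1") {
    problems.push(`unexpected schema: ${data?.schema}`);
  }
  const subjects = data?.subjects ?? {};
  if (Object.keys(subjects).length === 0) problems.push("no subjects defined");
  for (const [name, subject] of Object.entries(subjects)) {
    if (!Array.isArray(subject?.images) || subject.images.length === 0) {
      problems.push(`${name}: no images`);
      continue;
    }
    subject.images.forEach((img, i) => {
      if (!img?.filename) problems.push(`${name}[${i}]: missing filename`);
      if (typeof img?.dataUri !== "string" || !img.dataUri.startsWith("data:image/")) {
        problems.push(`${name}[${i}]: dataUri is not an image data URI`);
      }
    });
  }
  return problems;
}
```

Typical use: `validateVisualMemory(JSON.parse(readFileSync("visual-memory.json", "utf8")))`, failing the commit if the returned list is non-empty.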
Step 4: Build index.json (No Binary Data)
{
  "version": "1.0",
  "generated": "2026-03-01",
  "targetSize": 512,
  "subjects": {
    "person-name": {
      "count": 7,
      "files": [
        "person-1.jpg", "person-2.jpg", "person-3.jpg", "person-4.jpg",
        "person-5.jpg", "person-6.jpg", "person-7.jpg"
      ]
    }
  }
}
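Because `index.json` only mirrors what `visual-memory.json` already contains, it can be derived rather than written by hand, which keeps `count` and `files` in sync. A sketch with a hypothetical `buildIndex` helper that copies filenames and counts while leaving the base64 payloads out:

```javascript
// Derive index.json content from a parsed visual-memory.json object.
// Only filenames and counts are copied; no base64 data is included.
function buildIndex(visualMemory, targetSize = 512) {
  const subjects = {};
  for (const [name, subject] of Object.entries(visualMemory.subjects)) {
    const files = subject.images.map((img) => img.filename);
    subjects[name] = { count: files.length, files };
  }
  return {
    version: "1.0",
    generated: new Date().toISOString().slice(0, 10), // UTC date, YYYY-MM-DD
    targetSize,
    subjects,
  };
}
```

The result can be written with `writeFileSync("index.json", JSON.stringify(buildIndex(data), null, 2))` whenever the full file changes.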
Step 5: Load Visual Memory at Runtime
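import { readFileSync } from "fs";
import { join, dirname } from "path";
import { fileURLToPath } from "url";
// ES modules have no built-in __dirname; derive it from this file's URL.
const __dirname = dirname(fileURLToPath(import.meta.url));
// Assumes this script sits at the repository root; adjust the relative
// path if it lives elsewhere (e.g. under .github/muscles/).
const VISUAL_MEMORY_PATH = join(
  __dirname,
  ".github/skills/<skill-name>/visual-memory/visual-memory.json"
);
// Returns { subjectName: [dataUri, ...] } for every subject in the file.
function loadVisualMemory() {
  const data = JSON.parse(readFileSync(VISUAL_MEMORY_PATH, "utf8"));
  return Object.fromEntries(
    Object.entries(data.subjects).map(([name, subject]) => [
      name,
      subject.images.map((i) => i.dataUri),
    ])
  );
}
const visualMemory = loadVisualMemory();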
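`loadVisualMemory()` returns a plain object keyed by subject name, so a typo in the name silently yields `undefined` and the model receives no references. A hypothetical `getReferences` wrapper that fails loudly with the list of known subjects instead:

```javascript
// Return one subject's data URIs from the { name: [dataUri, ...] } map
// produced by loadVisualMemory(), or throw listing the known subjects.
function getReferences(visualMemory, subject) {
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  const refs = Object.prototype.hasOwnProperty.call(visualMemory, subject)
    ? visualMemory[subject]
    : undefined;
  if (!Array.isArray(refs) || refs.length === 0) {
    const known = Object.keys(visualMemory).join(", ") || "(none)";
    throw new Error(`No visual memory for "${subject}". Known subjects: ${known}`);
  }
  return refs;
}
```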
Critical Generation Rules
Do NOT Describe Physical Appearance When Using References
The reference photos speak for themselves. Only describe:
- Scene / setting
- Clothing (specific colors, styles)
- Expression (smile, serious, thoughtful)
- Lighting (natural, studio, dramatic)
- Background (office, outdoors, neutral)
- Action / pose
NEVER include:
- Hair color, style, or texture
- Eye color
- Skin tone or complexion
- Body type / build
- Any physical description of the person
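The NEVER list can also be checked mechanically before a prompt is sent. A hypothetical `findAppearanceTerms` check that flags appearance wording for review; the term list is an illustrative starting point, not a complete vocabulary, and it will miss paraphrases.

```javascript
// Illustrative appearance vocabulary; extend it for your own subjects.
const APPEARANCE_TERMS = [
  "hair", "hairstyle", "blonde", "brunette", "redhead", "bald",
  "eye color", "skin", "complexion", "freckles",
  "slim", "muscular", "build",
];

// Return every listed term that appears as a whole word in the prompt
// (word boundaries keep "build" from matching "building").
function findAppearanceTerms(prompt) {
  const lower = prompt.toLowerCase();
  return APPEARANCE_TERMS.filter((term) => new RegExp(`\\b${term}\\b`).test(lower));
}
```

A non-empty result is a cue to drop that detail from the prompt and let the reference photos carry it.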
Model API Parameters for Reference Images
| Model | Parameter | Max Refs | Notes |
|---|---|---|---|
| nano-banana-pro | image_input | 14 | Array of data URIs, 4K output |
| nano-banana-2 | image_input | 14 | Faster/cheaper alternative (Gemini 3.1 Flash) |
| flux-2-pro | input_images | 8 | Array of data URIs |
| flux-2-flex | input_images | 10 | Max-quality editing |
| ideogram-v2 | ❌ None | n/a | No face reference |
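The parameter names and limits in the table can be kept in one lookup so callers never exceed a model's reference cap. The values below are transcribed from the table; the request shape `{ [parameter]: [...] }`, the helper name `referenceInput`, and the choice to keep the first N references are assumptions, so check each model's API documentation before relying on them.

```javascript
// Transcribed from the model table; models absent here accept no references.
const REFERENCE_PARAMS = {
  "nano-banana-pro": { param: "image_input", maxRefs: 14 },
  "nano-banana-2": { param: "image_input", maxRefs: 14 },
  "flux-2-pro": { param: "input_images", maxRefs: 8 },
  "flux-2-flex": { param: "input_images", maxRefs: 10 },
};

// Build the reference-image portion of a request body (shape assumed).
function referenceInput(model, dataUris) {
  const spec = Object.prototype.hasOwnProperty.call(REFERENCE_PARAMS, model)
    ? REFERENCE_PARAMS[model]
    : undefined;
  if (!spec) throw new Error(`${model} has no reference-image parameter`);
  if (dataUris.length > spec.maxRefs) {
    console.warn(`${model}: keeping the first ${spec.maxRefs} of ${dataUris.length} references`);
  }
  return { [spec.param]: dataUris.slice(0, spec.maxRefs) };
}
```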
Prompt Anchor Pattern
Always start the prompt with an explicit reference instruction:
Generate a photo of EXACTLY the person shown in the reference images.
For multiple subjects at once:
Generate a photo with two people.
LEFT: EXACTLY the person from [Name A]'s reference images, wearing [clothing].
RIGHT: EXACTLY the person from [Name B]'s reference images, wearing [clothing].
[Scene description]. [Lighting]. Professional photography.
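The anchor pattern is regular enough to template. A sketch with a hypothetical `buildAnchoredPrompt` helper covering the one- and two-person layouts above; the single-subject wording after the anchor sentence is an assumption, and the caller still has to pass each subject's reference images to the model.

```javascript
// Assemble a prompt following the anchor pattern. Only the one- and
// two-person layouts are handled; anything else throws.
function buildAnchoredPrompt({ subjects, scene, lighting }) {
  const closing = `${scene}. ${lighting}. Professional photography.`;
  if (subjects.length === 1) {
    const [{ clothing }] = subjects;
    return [
      "Generate a photo of EXACTLY the person shown in the reference images.",
      `The person is wearing ${clothing}.`,
      closing,
    ].join("\n");
  }
  if (subjects.length === 2) {
    const [left, right] = subjects;
    return [
      "Generate a photo with two people.",
      `LEFT: EXACTLY the person from ${left.name}'s reference images, wearing ${left.clothing}.`,
      `RIGHT: EXACTLY the person from ${right.name}'s reference images, wearing ${right.clothing}.`,
      closing,
    ].join("\n");
  }
  throw new Error("This sketch only handles one or two subjects");
}
```

Note that the template never mentions hair, eyes, or build, in line with the rules above.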
Video Memory (Style Templates)
Store consistent motion style as JSON — not actual video files:
{
"videoStyles": {
| Aspect | Guideline |
| ---------- | ------------------------------------------- |
| Quantity | 5-8 photos (more = better likeness) |
| Angles | Front, 3/4 left, 3/4 right, slight profile |
| Lighting | Mixed (natural, indoor, flash, outdoor) |
| Expression | Neutral, smiling, serious — varied |
| File size | 40-80KB each after 512px/85% optimization |
| Total size | ~500KB for 8-10 photos — acceptable |
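The guidelines in this table translate into a simple check on one subject's embedded images. A sketch, assuming the 5-8 photo range and an 80KB ceiling per decoded JPEG (`checkSubjectImages` is hypothetical); sizes are measured on the decoded bytes, not on the longer base64 text.

```javascript
// Check one subject's data URIs against the photo guidelines.
// Returns human-readable warnings; an empty array means all checks passed.
function checkSubjectImages(dataUris, { minCount = 5, maxCount = 8, maxBytes = 80 * 1024 } = {}) {
  const warnings = [];
  if (dataUris.length < minCount || dataUris.length > maxCount) {
    warnings.push(`expected ${minCount}-${maxCount} photos, found ${dataUris.length}`);
  }
  dataUris.forEach((uri, i) => {
    const payload = uri.slice(uri.indexOf(",") + 1); // drop the "data:...;base64," prefix
    const bytes = Buffer.from(payload, "base64").length;
    if (bytes > maxBytes) {
      warnings.push(`image ${i}: ${Math.round(bytes / 1024)}KB exceeds ${Math.round(maxBytes / 1024)}KB`);
    }
  });
  return warnings;
}
```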
> **For voice samples**: See [audio-memory/SKILL.md](../audio-memory/SKILL.md)
---
## Benefits Summary
| Without Visual Memory | With Visual Memory |
| ------------------------- | ------------------------------------- |
| External photo folder required | No external dependencies |
| Breaks on different machines | Works anywhere |
| Manual path management | Always correct path |
| Version control nightmare | JSON in version control |
| Different results per machine | Exact consistency everywhere |
| ~2MB per unoptimized original | ~40-80KB per optimized photo (~500KB per subject) |