visual-memory

Stars: 5

Forks: 3

Updated: April 15, 2026, 17:25

Embed reference media (photos, voice, video templates) as base64 data URIs in skills for self-sufficient, portable, consistent generation

Source

fabioc-aloha

fabioc-aloha/Alex_Plug_In

Visual Memory

Self-sufficient skills that carry their own reference media — no external folder dependencies.

Applies To: Any skill needing consistent visual identity, voice, or motion style across multiple generation tasks.

The Problem

Skills that depend on external reference files (photo folders, audio samples) break when:

Skill is synced to a new machine without the original files
Files are renamed, moved, or deleted
A different project inherits the skill
Version control doesn't track binary assets

The Solution: Visual Memory

Embed optimized reference assets directly in the skill as base64 data URIs. The skill becomes fully self-sufficient — it works anywhere, exactly the same way, every time.

skill-folder/
├── SKILL.md
├── synapses.json
└── visual-memory/
    ├── index.json              ← Metadata only (no binary data)
    ├── visual-memory.json      ← Full base64 data URIs (~40-80KB per photo)
    ├── subject-1.jpg           ← Optional: keep originals alongside
    └── subject-2.jpg

Memory Types

Visual Memory (Photos as Base64)

Reference photos for face-consistent portrait generation. Embedded to eliminate folder dependencies.

Spec	Value
Target size	512px longest edge
Quality	85% JPEG
Per-photo size	~40-80KB (vs ~2MB originals)
Format	`data:image/jpeg;base64,<encoded>`
Quantity	5-8 photos per subject, varied angles

When to use: Face-consistent portrait generation, AI character references, persona avatars.
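
As a rough check on these sizes: base64 encodes every 3 bytes as 4 characters, so an embedded photo takes about a third more space in JSON than it does on disk. The sketch below computes the encoded length and cross-checks it against Node's own encoder. The helper names `base64Length` and `jpegDataUriLength` are illustrative and not part of the skill.

```javascript
// Length of the padded base64 text for a payload of `byteCount` bytes:
// every group of 3 input bytes becomes 4 output characters.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Hypothetical helper: total data URI length for a JPEG of `byteCount` bytes.
function jpegDataUriLength(byteCount) {
  return "data:image/jpeg;base64,".length + base64Length(byteCount);
}

// Cross-check the formula against Node's encoder using a dummy 40KB buffer.
const sample = Buffer.alloc(40 * 1024);
console.log(base64Length(sample.length) === sample.toString("base64").length); // true

// Approximate JSON footprint of one 80KB photo, in characters.
console.log(jpegDataUriLength(80 * 1024));
```

Under this arithmetic, a 40-80KB JPEG occupies roughly 55-110KB once embedded, which is worth keeping in mind when budgeting the size of visual-memory.json.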

Audio Memory

Moved to dedicated skill: See audio-memory/SKILL.md for voice sample storage and TTS cloning

Implementing Visual Memory in a Skill

Step 1: Prepare Photos

# Install ImageMagick if needed:
# macOS: brew install imagemagick
# Windows: winget install ImageMagick.ImageMagick

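# Resize single photo: 512px longest edge @ 85% JPEG quality
# (quote the geometry: a bare > is parsed by the shell as a redirect)
magick input.jpg -resize "512x512>" -quality 85 output.jpg

# Batch resize folder (PowerShell); create the output folder first
New-Item -ItemType Directory -Force resized | Out-Null
Get-ChildItem *.jpg | ForEach-Object {
    magick $_.Name -resize "512x512>" -quality 85 "resized/$($_.Name)"
}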

# Convert PNG to optimized JPG
magick input.png -resize "512x512>" -quality 85 output.jpg

Step 2: Convert to Base64 Data URIs

import { readFileSync, writeFileSync } from "fs";
import { basename } from "path";

function toDataUri(imagePath) {
  const buffer = readFileSync(imagePath);
  const ext = imagePath.toLowerCase().endsWith(".png") ? "png" : "jpeg";
  return `data:image/${ext};base64,${buffer.toString("base64")}`;
}

// Batch convert and write to JSON
const photos = ["photo1.jpg", "photo2.jpg", "photo3.jpg"];
const images = photos.map((p) => ({
  filename: basename(p),
  dataUri: toDataUri(p),
  notes: "",
}));

writeFileSync("images.json", JSON.stringify({ images }, null, 2));

Quick PowerShell (single file to clipboard):

[Convert]::ToBase64String([IO.File]::ReadAllBytes("photo.jpg")) | Set-Clipboard
# Paste with prefix: "data:image/jpeg;base64,<paste>"

Step 3: Build visual-memory.json

{
  "schema": "visual-memory-v1",
  "generated": "2026-03-01",
  "subjects": {
    "person-name": {
      "description": "Brief visual description",
      "ageInfo": {
        "referenceAge": 30,
        "birthYear": 1990,
        "photoDate": "2026-03"
      },
      "images": [
        {
          "filename": "person-1.jpg",
          "dataUri": "data:image/jpeg;base64,<base64-encoded-image>",
          "notes": "Front-facing, natural lighting"
        },
        {
          "filename": "person-2.jpg",
          "dataUri": "data:image/jpeg;base64,<base64-encoded-image>",
          "notes": "3/4 profile, outdoor"
        }
      ]
    }
  }
}
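
Before committing a file like this, it can help to confirm that every entry actually holds an encoded image rather than a leftover placeholder. The following is a minimal sketch, assuming the layout shown above; `validateVisualMemory` is a hypothetical helper, not something the skill ships.

```javascript
// Returns a list of problems found in a parsed visual-memory.json object.
// An empty list means the structure matches the layout shown above.
function validateVisualMemory(memory) {
  const problems = [];
  if (memory.schema !== "visual-memory-v1") {
    problems.push(`unexpected schema: ${memory.schema}`);
  }
  for (const [name, subject] of Object.entries(memory.subjects ?? {})) {
    if (!Array.isArray(subject.images) || subject.images.length === 0) {
      problems.push(`${name}: no images`);
      continue;
    }
    for (const image of subject.images) {
      // Accept only JPEG or PNG data URIs with a non-empty base64 payload.
      const ok = /^data:image\/(jpeg|png);base64,[A-Za-z0-9+/]+=*$/.test(
        image.dataUri ?? ""
      );
      if (!ok) {
        problems.push(`${name}/${image.filename}: dataUri is not a base64 image data URI`);
      }
    }
  }
  return problems;
}

// Example: the second entry still contains the template placeholder.
const example = {
  schema: "visual-memory-v1",
  subjects: {
    "person-name": {
      images: [
        { filename: "person-1.jpg", dataUri: "data:image/jpeg;base64,/9j/4AAQ" },
        { filename: "person-2.jpg", dataUri: "data:image/jpeg;base64,<base64-encoded-image>" },
      ],
    },
  },
};
console.log(validateVisualMemory(example).length); // 1
```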

Step 4: Build index.json (No Binary Data)

{
  "version": "1.0",
  "generated": "2026-03-01",
  "targetSize": 512,
  "subjects": {
    "person-name": {
      "count": 3,
      "files": ["person-1.jpg", "person-2.jpg", "person-3.jpg"]
    }
  }
}
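
Because the index only repeats information already present in visual-memory.json, one option is to derive it instead of maintaining it by hand, which keeps `count` and `files` in agreement. This is a sketch under that assumption; `buildIndex` is a hypothetical helper, and `targetSize` is passed in because visual-memory.json does not store it.

```javascript
// Derive the metadata-only index from a parsed visual-memory.json object.
function buildIndex(memory, targetSize = 512) {
  const subjects = {};
  for (const [name, subject] of Object.entries(memory.subjects)) {
    const files = subject.images.map((image) => image.filename);
    // count is computed from the file list, so the two cannot drift apart.
    subjects[name] = { count: files.length, files };
  }
  return { version: "1.0", generated: memory.generated, targetSize, subjects };
}

// Example with the data URIs omitted for brevity.
const memory = {
  generated: "2026-03-01",
  subjects: {
    "person-name": {
      images: [{ filename: "person-1.jpg" }, { filename: "person-2.jpg" }],
    },
  },
};
const index = buildIndex(memory);
console.log(index.subjects["person-name"]);
// { count: 2, files: [ 'person-1.jpg', 'person-2.jpg' ] }
```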

Step 5: Load Visual Memory at Runtime

import { readFileSync } from "fs";
import { join, dirname } from "path";
import { fileURLToPath } from "url";

const __dirname = dirname(fileURLToPath(import.meta.url));
const VISUAL_MEMORY_PATH = join(
  __dirname,
  ".github/skills/<skill-name>/visual-memory/visual-memory.json"
);

function loadVisualMemory() {
  const data = JSON.parse(readFileSync(VISUAL_MEMORY_PATH, "utf8"));
  return Object.fromEntries(
    Object.entries(data.subjects).map(([name, subject]) => [
      name,
      subject.images.map((i) => i.dataUri),
    ])
  );
}

const visualMemory = loadVisualMemory();
// visualMemory["person-name"] → array of data URIs

Critical Generation Rules

Do NOT Describe Physical Appearance When Using References

The reference photos speak for themselves. Only describe:

Scene / setting
Clothing (specific colors, styles)
Expression (smile, serious, thoughtful)
Lighting (natural, studio, dramatic)
Background (office, outdoors, neutral)
Action / pose

NEVER include:

Hair color, style, or texture
Eye color
Skin tone or complexion
Body type / build
Any physical description of the person
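
These rules can be partly enforced with a simple check run on a prompt before it is sent. The sketch below is a keyword heuristic only: the term list is illustrative, it will miss many phrasings, and `findAppearanceTerms` is a hypothetical helper rather than part of the skill.

```javascript
// Illustrative term list drawn from the "NEVER include" items above.
const APPEARANCE_TERMS = ["hair", "eye color", "skin tone", "complexion", "body type"];

// Returns the listed terms that appear in the prompt as whole words.
function findAppearanceTerms(prompt) {
  return APPEARANCE_TERMS.filter((term) =>
    new RegExp(`\\b${term}\\b`, "i").test(prompt)
  );
}

console.log(findAppearanceTerms("Navy blazer, warm smile, studio lighting, office chair"));
// []
console.log(findAppearanceTerms("Short brown hair, navy blazer, studio lighting"));
// [ 'hair' ]
```

Whole-word matching keeps "chair" from triggering the "hair" rule; anything beyond that still needs a human read of the prompt.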

Model API Parameters for Reference Images

Model	Parameter	Max Refs	Notes
`nano-banana-pro`	`image_input`	14	Array of data URIs, 4K output
`nano-banana-2`	`image_input`	14	Faster/cheaper alternative (Gemini 3.1 Flash)
`flux-2-pro`	`input_images`	8	Array of data URIs
`flux-2-flex`	`input_images`	10	Max-quality editing
`ideogram-v2`	❌ None	—	No face reference
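
The table can be turned into a small lookup that picks the right parameter name and trims the reference list to each model's limit. This is a sketch only: the parameter names and limits are copied from the table, `buildReferenceInput` is a hypothetical helper, and the result is just the reference-image portion of a request, not a call to any real client library.

```javascript
// Parameter names and limits copied from the table above.
const MODEL_REFERENCE_PARAMS = {
  "nano-banana-pro": { param: "image_input", maxRefs: 14 },
  "nano-banana-2": { param: "image_input", maxRefs: 14 },
  "flux-2-pro": { param: "input_images", maxRefs: 8 },
  "flux-2-flex": { param: "input_images", maxRefs: 10 },
};

// Builds the reference-image part of a request body for the given model.
function buildReferenceInput(model, dataUris) {
  const spec = MODEL_REFERENCE_PARAMS[model];
  if (!spec) {
    // Covers ideogram-v2 and any model not listed in the table.
    throw new Error(`${model} does not accept reference images`);
  }
  return { [spec.param]: dataUris.slice(0, spec.maxRefs) };
}

// Twelve placeholder URIs standing in for real encoded photos.
const refs = Array.from({ length: 12 }, (_, i) => `data:image/jpeg;base64,ref${i}`);
console.log(buildReferenceInput("flux-2-pro", refs).input_images.length); // 8
console.log(buildReferenceInput("nano-banana-pro", refs).image_input.length); // 12
```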

Prompt Anchor Pattern

Always start the prompt with an explicit reference instruction:

Generate a photo of EXACTLY the person shown in the reference images.

For multiple subjects at once:

Generate a photo with two people.
LEFT: EXACTLY the person from [Name A]'s reference images, wearing [clothing].
RIGHT: EXACTLY the person from [Name B]'s reference images, wearing [clothing].
[Scene description]. [Lighting]. Professional photography.
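
Building prompts from a template keeps the anchor sentence first and identical across generations. A minimal sketch, assuming the wording above; `buildSinglePrompt` and `buildPairPrompt` are hypothetical helpers and the example values are made up.

```javascript
const ANCHOR = "Generate a photo of EXACTLY the person shown in the reference images.";

// `details` should hold only allowed descriptors: scene, clothing,
// expression, lighting, background, action or pose.
function buildSinglePrompt(details) {
  return [ANCHOR, ...details].join(" ");
}

// Two-subject variant following the LEFT/RIGHT layout above.
function buildPairPrompt(left, right, scene) {
  return [
    "Generate a photo with two people.",
    `LEFT: EXACTLY the person from ${left.name}'s reference images, wearing ${left.clothing}.`,
    `RIGHT: EXACTLY the person from ${right.name}'s reference images, wearing ${right.clothing}.`,
    `${scene} Professional photography.`,
  ].join("\n");
}

console.log(buildSinglePrompt(["Wearing a navy blazer.", "Modern office.", "Soft natural lighting."]));
console.log(
  buildPairPrompt(
    { name: "Name A", clothing: "a grey suit" },
    { name: "Name B", clothing: "a green sweater" },
    "Rooftop terrace at sunset. Warm backlighting."
  )
);
```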

Video Memory (Style Templates)

Store consistent motion style as JSON — not actual video files:

{
  "videoStyles": {

| Spec       | Value                                       |
| ---------- | ------------------------------------------- |
| Quantity   | 5-8 photos (more = better likeness)         |
| Angles     | Front, 3/4 left, 3/4 right, slight profile  |
| Lighting   | Mixed (natural, indoor, flash, outdoor)     |
| Expression | Neutral, smiling, serious (varied)          |
| File size  | 40-80KB each after 512px/85% optimization   |
| Total size | ~500KB for 8-10 photos (acceptable)         |

> **For voice samples**: See [audio-memory/SKILL.md](../audio-memory/SKILL.md)

---

## Benefits Summary

| Without Visual Memory     | With Visual Memory                    |
| ------------------------- | ------------------------------------- |
| External photo folder required | No external dependencies         |
| Breaks on different machines   | Works anywhere                    |
| Manual path management    | Always correct path                   |
| Version control nightmare  | JSON in version control              |
| Different results per machine  | Exact consistency everywhere      |
| ~2MB unoptimized originals | ~50MB → 500KB optimized               |

name	visual-memory
description	Embed reference media (photos, voice, video templates) as base64 data URIs in skills for self-sufficient, portable, consistent generation
tier	extended
applyTo	**/*visual*,**/*reference*,**/*portrait*,**/*base64*
muscle	.github/muscles/visual-memory.cjs
metadata	{"inheritance":"inheritable"}

