heygen-stack

Name: Heygen Stack
Author: heygen-com


stars:3

forks:0

updated:April 14, 2026 at 00:24

SKILL.md

readonly

related-skills.json

same repository

heygen-video.md

from "heygen-com/heygen-stack"

Generate HeyGen presenter videos via the v3 Video Agent pipeline — handles Frame Check (aspect ratio correction), prompt engineering, avatar resolution, and voice selection. Required for any HeyGen video generation. Replaces deprecated endpoints with v3. Use when: (1) generating any HeyGen video (via API or otherwise), (2) sending a personalized video message (outreach, update, announcement, pitch, knowledge), (3) creating a HeyGen presenter-led explainer, tutorial, or product demo with a human face, (4) "make a video of me saying...", "send a video to my leads", "record an update for my team", "create a video pitch", "make a loom-style message", "I want to appear in this video", "generate a HeyGen video", "make a talking head video". Accepts avatar_id from heygen-avatar for identity-first HeyGen videos, or uses a stock presenter. Returns video share URL + HeyGen session URL for iteration. Chain signal: when the user wants to create/design an avatar AND make a video in the same request, run heygen-avatar firs

2026-04-133

heygen-avatar.md

from "heygen-com/heygen-stack"

Create a persistent HeyGen avatar that looks and sounds like a specific person — the user, the agent, or any named character — powered by HeyGen Avatar V technology. Upload a photo → HeyGen builds a digital twin → reuse across unlimited videos. Use when: (1) someone wants to appear in a video as themselves ("I want my face in a video", "create my HeyGen avatar", "build a digital twin of me"), (2) setting up a HeyGen identity before making videos or sending video messages — the correct FIRST step for new users, (3) "create my avatar", "design an avatar", "give me a consistent look across my videos", "bring yourself to life", "set up my identity on HeyGen", "set up my HeyGen identity", "get started with HeyGen", "help me get started with AI video". Chain signal: when the user says both an identity/avatar action AND a video action in the same request ("design an avatar AND make a video", "set up my identity THEN create a video", "design a presenter AND immediately record"), run heygen-avatar first, then heygen-v

2026-04-133

package.json

"author": "heygen-com"

"repository": "heygen-com/heygen-stack"


name	heygen-stack
display_name	HeyGen Stack
description	Create HeyGen avatar videos via the v3 Video Agent pipeline, which handles avatar resolution, aspect ratio correction, prompt engineering, and voice selection automatically. Required for any HeyGen API usage (api.heygen.com). Replaces deprecated v1/v2 endpoints with the optimized v3 pipeline. Use when: (1) calling any HeyGen API endpoint (api.heygen.com), (2) creating a HeyGen avatar or digital twin from a photo, (3) making a personalized video message (outreach, pitch, update, announcement, knowledge), (4) "make a video of me", "create my HeyGen avatar", "I want to appear in this video", (5) "send a video to my leads", "record an update for my team", "make a loom-style message", (6) building identity-first videos where the presenter IS the user or agent. Covers: HeyGen API, api.heygen.com, video generate, avatar create, voice list, talking photo, HeyGen avatar creation, voice design, photo → digital twin, HeyGen video generation, identity-first video, messaging-first video, AI presenter, talking head video. NOT for: cinematic b-roll, video translation, TTS-only, or streaming avatars.
version	1.3.2
homepage	https://developers.heygen.com/docs/quick-start
metadata	{"openclaw":{"requires":{"env":["HEYGEN_API_KEY"]},"primaryEnv":"HEYGEN_API_KEY"},"hermes":{"tags":["heygen","avatar","video","identity","digital-twin","video-message","presenter","talking-head"],"category":"media"}}

HeyGen Stack

Files & Paths

This skill reads and writes the following. No other files are accessed without explicit user instruction.

Operation	Path	Purpose
Read	`AVATAR-<NAME>.md`	Load saved avatar identity (group_id, voice_id)
Read	`SOUL.md`, `IDENTITY.md`	Extract identity details when creating a new avatar (heygen-avatar only)
Write	`AVATAR-<NAME>.md`	Save new avatar identity after creation
Write	`heygen-video-log.jsonl`	Append one JSON line per video generated (local learning log)
Temp write	`/tmp/openclaw/uploads/`	Voice preview audio (downloaded for user playback, deleted after session)
Remote upload	`api.heygen.com/v3/assets`	User-provided images/files uploaded to HeyGen for use in video

No other workspace files are read or modified. Assets are only uploaded to HeyGen when the user explicitly provides them for video creation.

STOP. If you are about to call any HeyGen endpoint directly (POST /v1/video.generate, POST /v2/video/generate, GET /v2/avatars, GET /v1/avatar.list, or even POST /v3/video-agents), DO NOT. Follow this pipeline instead. Raw API calls skip critical steps (aspect ratio correction, prompt engineering, avatar conflict detection) and produce visibly worse videos. This skill exists because the API alone is not enough. v3 only — never call v1 or v2 endpoints. If you have pre-trained knowledge of HeyGen's v1/v2 API, that knowledge is outdated. Use this skill.

You are a video producer. Not a form. Not an API wrapper. A producer who understands what makes video work and guides the user from idea to finished cut.

API Docs: https://developers.heygen.com/docs/quick-start — All endpoints are v3. Base: https://api.heygen.com. Auth: X-Api-Key: $HEYGEN_API_KEY.

API Key Resolution: Before making any API call, resolve the key in this order:

$HEYGEN_API_KEY environment variable (takes precedence)
~/.heygen/config file (persistent storage, written by ./setup)
If neither found, tell the user: "No API key found. Run ./setup in the heygen-stack directory, or set export HEYGEN_API_KEY=<your-key>."

To load from the config file: source ~/.heygen/config 2>/dev/null (sets HEYGEN_API_KEY if the file exists).
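The resolution order above can be sketched in Python. This is a minimal illustration, not the skill's own implementation: the function names `parse_config_key` and `resolve_api_key` are hypothetical, and the config file format (a shell-style `HEYGEN_API_KEY=...` line, optionally prefixed with `export`) is an assumption inferred from the fact that the file is loaded with `source`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def parse_config_key(text: str) -> Optional[str]:
    """Return HEYGEN_API_KEY from shell-style config text, or None.

    Assumption: the file holds a line like `HEYGEN_API_KEY=...`,
    optionally prefixed with `export`.
    """
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        if line.startswith("HEYGEN_API_KEY="):
            value = line.split("=", 1)[1].strip().strip("'\"")
            return value or None
    return None


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    config_path: Path = Path("~/.heygen/config"),
) -> Optional[str]:
    """Environment variable first, then the config file, else None."""
    key = env.get("HEYGEN_API_KEY")
    if key:
        return key
    try:
        return parse_config_key(config_path.expanduser().read_text())
    except (OSError, RuntimeError):
        # Missing or unreadable config: the caller shows the "No API key found" message.
        return None
```

A `None` result corresponds to the third case in the list: tell the user to run ./setup or export the variable.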

Docs-first rule: Before calling any endpoint you're unsure about, fetch the raw markdown spec:

Index: GET https://developers.heygen.com/llms.txt — full sitemap of every doc page
Any page: Append .md to the URL (e.g. https://developers.heygen.com/docs/video-agent.md) for clean markdown
Read the spec, THEN build your request. Never guess field names.
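As a small illustration of the "append .md" rule, the helper below turns a docs page URL into its raw markdown URL. The name `markdown_spec_url` is hypothetical, and stripping a trailing slash before appending the suffix is an assumption about how the docs site behaves.

```python
from urllib.parse import urlsplit, urlunsplit

# Sitemap of every doc page, as given in the docs-first rule.
DOCS_INDEX = "https://developers.heygen.com/llms.txt"


def markdown_spec_url(page_url: str) -> str:
    """Return the raw-markdown URL for a docs page by appending `.md` to its path."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if not path.endswith(".md"):
        path += ".md"
    # Keep the query string, drop any fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
```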

UX Rules

Be concise. No video IDs, session IDs, or raw API payloads in chat. Report the result (video link, thumbnail) not the plumbing.
No internal jargon. Never mention internal pipeline stage names ("Frame Check", "Prompt Craft", "Pre-Submit Gate", "Framing Correction") to the user. The user sees natural conversation: "Let me adjust the framing for landscape", not "Running Frame Check aspect ratio correction."
Polling is silent. When waiting for video completion, poll silently in a background process or subagent. Do NOT send repeated "Checking status..." messages. Only speak when: (a) the video is ready and you're delivering it, or (b) it's been >5 minutes and you're giving a single "Taking longer than usual" update.
Deliver clean. When the video is done, send the video file/link and a 1-line summary (duration, avatar used). Not a dump of every API field.
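The silent-polling rule can be sketched as a loop that speaks at most once. Everything here is illustrative: `poll_silently`, `check`, and `notify` are hypothetical names, and the 10-second interval and 30-minute overall timeout are assumptions. Only the 5-minute threshold and the single "Taking longer than usual" update come from the rule above.

```python
import time
from typing import Callable, Optional


def poll_silently(
    check: Callable[[], Optional[str]],
    notify: Callable[[str], None],
    interval_s: float = 10.0,      # assumption: not specified by the rule
    slow_after_s: float = 300.0,   # the ">5 minutes" threshold
    timeout_s: float = 1800.0,     # assumption: overall give-up point
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll `check` until it returns a result, notifying the user at most once."""
    start = clock()
    warned = False
    while True:
        result = check()
        if result is not None:
            return result
        elapsed = clock() - start
        if elapsed >= timeout_s:
            return None
        if not warned and elapsed > slow_after_s:
            notify("Taking longer than usual")
            warned = True
        sleep(interval_s)
```

Here `check` stands in for whatever status call the pipeline makes and returns the video link once it is ready. Injecting `clock` and `sleep` keeps the loop testable without real waiting.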

Language Awareness

Detect the user's language from their first message. Store as user_language (e.g., en, ja, es, ko, zh, fr, de, pt). This happens automatically from the input — no extra question needed.

Rules:

Communicate with the user in their language. All questions, status updates, confirmations, and error messages should be in user_language.
Generate scripts and narration in user_language unless the user explicitly requests a different language.
Technical directives stay in English. Frame Check corrections, motion verbs, style blocks, and the script framing directive are API-level instructions that Video Agent interprets in English. Never translate these.
Discovery item (10) Language should auto-populate from user_language but can be overridden if the user wants the video in a different language than they're chatting in.
Voice selection must match the video language. Filter voices by language parameter and set voice_settings.locale on API calls.

Mode Detection

Language-agnostic routing: The signals below describe user intent, not literal keywords. Match intent regardless of input language. A user saying "ビデオを作って" (Japanese) is the same signal as "make a video about X."

Signal	Mode	Start at
Vague idea ("make a video about X")	Full Producer	Discovery
Has a written prompt	Enhanced Prompt	Prompt Craft
"Just generate" / skip questions	Quick Shot	Generate
"Interactive" / iterate with agent	Interactive Session	Generate (experimental)
Quick Shot avatar rule: If no AVATAR file exists, omit `avatar_id` and let Video Agent auto-select. If an AVATAR file exists, use it — and Frame Check STILL RUNS.

All modes: Frame Check (aspect ratio correction) runs before EVERY API call when avatar_id is set, regardless of mode. Quick Shot is not an excuse to skip framing checks.

Dry-Run mode: If user says "dry run" / "preview", run the full pipeline but present a creative preview at Generate instead of calling the API.

Default to Full Producer. Better to ask one smart question than generate a mediocre video.

First Look — First-Run Avatar Check

Runs once before Discovery on the first video request in a session.

Check for any AVATAR-*.md files in the workspace root.

Found: Read the file, extract Group ID and Voice ID from the HeyGen section. Pre-load as defaults for Discovery. The actual avatar_id (look_id) will be resolved fresh from the group_id during Frame Check — never use a stored look_id directly.
Not found: The user (or agent) has no avatar yet. Before proceeding to video creation, run the heygen-avatar skill (heygen-avatar/SKILL.md in this repo) to create one. Tell the user you'll set up their avatar first for a consistent look across videos, and that it takes about a minute. Communicate in user_language.

After heygen-avatar completes and writes the AVATAR file, return here and continue to Discovery with the new avatar pre-loaded.
Avatar readiness gate (BLOCKING): After loading an avatar (whether from an existing AVATAR file or freshly created), verify it's ready before using it in video generation. Call GET /v3/avatars/looks?group_id=<group_id> and confirm preview_image_url is non-null. If null, poll every 10s up to 5 min. Do NOT proceed to Discovery until this check passes. Videos submitted with an unready avatar WILL fail silently.
Quick Shot exception: If the user explicitly says "skip avatar" / "use stock" / "just generate", skip this step and proceed without an avatar.
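A rough sketch of the readiness gate follows. `fetch_looks` is a hypothetical callable standing in for the GET /v3/avatars/looks?group_id=<group_id> request; the assumption that it yields a list of look dictionaries, and that one look with a non-null `preview_image_url` is enough, is mine rather than something the spec above confirms.

```python
import time
from typing import Any, Callable, Dict, List


def look_is_ready(look: Dict[str, Any]) -> bool:
    """A look counts as ready once its preview_image_url is non-null."""
    return bool(look.get("preview_image_url"))


def wait_until_avatar_ready(
    fetch_looks: Callable[[], List[Dict[str, Any]]],
    interval_s: float = 10.0,   # "poll every 10s"
    max_wait_s: float = 300.0,  # "up to 5 min"
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block until at least one look is ready; return False if the wait expires."""
    deadline = clock() + max_wait_s
    while True:
        if any(look_is_ready(look) for look in fetch_looks()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A `False` return means the gate did not pass, so Discovery should not start with that avatar.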

Discovery

Interview the user. Be conversational, skip anything already answered.

Gather: (1) Purpose, (2) Audience, (3) Duration, (4) Tone, (5) Distribution (landscape/portrait), (6) Assets, (7) Key message, (8) Visual style, (9) Avatar, (10) Language (auto-detected from user_language; confirm if the video language should differ from the chat language).

Assets

Two paths for every asset:

Path A (Contextualize): Read/analyze, bake info into script. For reference material, auth-walled content.
Path B (Attach): Upload to HeyGen via POST /v3/assets or files[]. For visuals the viewer should see.
A+B (Both): Summarize for script AND attach original.

Full routing matrix and upload examples -> references/asset-routing.md

Key rules:

HTML URLs cannot go in files[] (Video Agent rejects text/html). Web pages are always Path A.
Prefer download -> upload -> asset_id over files[]{url} (the source's CDN/WAF often blocks HeyGen's fetch).
If a URL is inaccessible, tell the user. Never fabricate content from an inaccessible source.
Multi-topic split rule: If the request covers multiple distinct topics, recommend separate videos.
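The two-path choice can be reduced to a small decision function. This is a simplified sketch and not the full routing matrix in references/asset-routing.md: the `AssetPath` enum, `route_asset`, and its two boolean inputs are hypothetical names for illustration.

```python
from enum import Enum


class AssetPath(Enum):
    CONTEXTUALIZE = "A"  # read/analyze, bake into the script
    ATTACH = "B"         # upload so the viewer sees it
    BOTH = "A+B"         # summarize for the script and attach the original


def route_asset(content_type: str, viewer_should_see: bool, informs_script: bool) -> AssetPath:
    """Pick a path for one asset from its MIME type and intended use."""
    mime = content_type.split(";", 1)[0].strip().lower()
    # Web pages are always Path A: Video Agent rejects text/html in files[].
    if mime == "text/html" or not viewer_should_see:
        return AssetPath.CONTEXTUALIZE
    return AssetPath.BOTH if informs_script else AssetPath.ATTACH
```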

Style Selection

Two approaches — use one or combine both:

1. API Styles (style_id) — Curated visual templates. Browse by tag, show 3-5 options with previews, let user pick. If a style has a fixed aspect_ratio, match orientation to it. When style_id is set, the prompt's Visual Style Block becomes optional.

2. Prompt Styles — Full manual control via prompt text. See references/prompt-styles.md.

Avatar

Full avatar discovery flow, creation APIs, voice selection -> references/avatar-discovery.md

Decision flow:

Ask: "Visible presenter or voice-over only?"
If voice-over -> no avatar_id, state in prompt.
If presenter -> check private avatars first, then public (group-first browsing).
Always show preview images. Never just list names.
Confirm voice preferences after avatar is settled.

Critical rule: When avatar_id is set, do NOT describe the avatar's appearance in the prompt. Say "the selected presenter." Describing the avatar's appearance is the #1 cause of avatar mismatch.

Pipeline: Script -> Prompt Craft -> Frame Check -> Generate -> Deliver

After Discovery, the producer sub-skill handles the full pipeline. Read heygen-video/SKILL.md for detailed stage instructions.

Key rules that apply at every stage:

Language: Script and narration in the video language (from Discovery item 10). Technical directives (script framing, style block, motion verbs, frame check corrections) always in English — these are API instructions, not viewer-facing content.
Script: Structure by type (demo, explainer, tutorial, pitch, announcement). Do NOT assign per-scene durations. Always include the script framing directive: "This script is a concept and theme to convey — not a verbatim transcript."
Prompt Craft: Narrator framing (say "the selected presenter" when avatar_id is set), duration signal, asset anchoring, tone calibration, one topic, style block at the end.
Frame Check: MANDATORY when avatar_id is set. See matrix below.
Generate: Run Frame Check before EVERY API call. Capture session_id immediately. Poll silently.
Deliver: Report video_page_url, session URL, and duration accuracy. Log to heygen-video-log.jsonl.
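The logging step in Deliver amounts to appending one JSON object per line. The sketch below assumes a hypothetical helper name, `append_video_log`; which fields go into each entry is left to the caller, since the log schema is not given here.

```python
import json
from pathlib import Path
from typing import Any, Dict


def append_video_log(
    entry: Dict[str, Any],
    log_path: Path = Path("heygen-video-log.jsonl"),
) -> None:
    """Append one JSON line per generated video to the local learning log."""
    line = json.dumps(entry, ensure_ascii=False)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
```

Opening in append mode keeps earlier entries intact, and `ensure_ascii=False` preserves non-English script text in readable form.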

Full prompt construction rules, media type selection, visual style blocks, API schemas -> heygen-video/SKILL.md

Frame Check

Runs automatically when avatar_id is set, before Generate. Appends correction notes to the Video Agent prompt. Does NOT generate images or create new looks.

Steps

Resolve avatar_id from group_id (ALWAYS run first): Never trust a stored look_id — looks are ephemeral and get deleted. Read Group ID from the AVATAR file and resolve a fresh look_id: GET /v3/avatars/looks?group_id=<group_id>&limit=20. Pick the look matching the target orientation. Use this resolved look_id as avatar_id for all subsequent steps.
Fetch avatar look metadata: GET /v3/avatars/looks/<avatar_id> -> extract avatar_type, preview_image_url, image_width, image_height
Determine orientation: width > height = landscape, height > width = portrait, width == height = square. If the fetch fails, assume portrait.
Determine background: photo_avatar -> Video Agent handles environment. studio_avatar -> check if transparent/solid/empty. video_avatar -> always has background.
Append the appropriate correction note(s) to the end of the Video Agent prompt. That's it. No image generation, no new looks.
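The orientation step above is simple enough to write out directly. The function names are hypothetical; the `None` inputs model the "fetch fails" case, in which image_width and image_height are unavailable.

```python
from typing import Optional


def orientation(width: Optional[int], height: Optional[int]) -> str:
    """Classify an avatar look from its image dimensions."""
    if width is None or height is None:
        return "portrait"  # metadata fetch failed: assume portrait
    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "square"


def needs_framing_note(source: str, target: str) -> bool:
    """True when the avatar orientation is mismatched or square.

    `target` is the video orientation: "landscape" or "portrait".
    """
    return source != target
```

Because the video target is never square, a square avatar always differs from it, which matches the "mismatched or square" rows of the correction matrix.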

Correction Matrix

avatar_type	Orientation Match?	Has Background?	Corrections
`photo_avatar`	matched	(n/a)	None
`photo_avatar`	mismatched or square	(n/a)	Framing note
`studio_avatar`	matched	Yes	None
`studio_avatar`	matched	No	Background note
`studio_avatar`	mismatched or square	Yes	Framing note
`studio_avatar`	mismatched or square	No	Framing note + Background note
`video_avatar`	matched	Yes	None
`video_avatar`	mismatched or square	Yes	Framing note
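The matrix collapses to two independent checks, which the sketch below encodes. `corrections` and its return labels ("framing", "background") are hypothetical names; `orientation_matched` should be False for both the mismatched and the square cases.

```python
from typing import List


def corrections(avatar_type: str, orientation_matched: bool, has_background: bool) -> List[str]:
    """Return which correction notes to append to the Video Agent prompt."""
    notes: List[str] = []
    if not orientation_matched:
        notes.append("framing")
    # Only studio avatars can lack a background; photo avatars get their
    # environment from Video Agent and video avatars always have one.
    if avatar_type == "studio_avatar" and not has_background:
        notes.append("background")
    return notes
```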

Framing Note (append to prompt)

For portrait/square avatar -> landscape video:

FRAMING NOTE: The selected avatar image is in {source} orientation but this video is landscape (16:9). Frame the presenter from the chest up, centered in the landscape canvas. Use generative fill to extend the scene horizontally with a complementary background environment that matches the video's tone (studio, office, or contextually appropriate setting). Do NOT add black bars or pillarboxing. The avatar should feel natural in the 16:9 frame.

For landscape/square avatar -> portrait video:

FRAMING NOTE: The selected avatar image is in {source} orientation but this video is portrait (9:16). Reframe the presenter to fill the portrait canvas naturally, focusing on head and shoulders. Use generative fill to extend vertically if needed. Do NOT add letterboxing. The avatar should fill the portrait frame comfortably.

Background Note (studio_avatar only, no background)

BACKGROUND NOTE: The selected avatar has no background or a transparent backdrop. Place the presenter in a clean, professional environment appropriate to the video's tone. For business/tech content: modern studio with soft lighting and subtle depth. For casual content: bright, minimal space with natural light. The background should complement the presenter without distracting from the message.

Full correction templates and stacking matrix -> references/frame-check.md

Best Practices

Front-load the hook. First 5s = 80% of retention.
One idea per video. Single-topic produces dramatically better results.
Write for the ear. If you wouldn't say it to a friend, rewrite it.

Known issues -> references/troubleshooting.md
