ai-avatar-video

Create AI avatar and talking head videos via inference.sh CLI. Recommended: P-Video-Avatar (fastest, cheapest, built-in TTS). Also: OmniHuman, Fabric, PixVerse. Audio: Inworld TTS-2 (100+ languages, emotion steering for characters), ElevenLabs, Kokoro. Capabilities: audio-driven avatars, text-to-avatar, lipsync videos, talking head generation, virtual presenters, UGC content. Use for: AI presenters, explainer videos, virtual influencers, dubbing, marketing videos, UGC ads, gaming avatars, NPC dialogue. Triggers: ai avatar, talking head, lipsync, avatar video, virtual presenter, ai spokesperson, audio driven video, heygen alternative, synthesia alternative, talking avatar, lip sync, video avatar, ai presenter, digital human, ugc, ugc video, ugc ad, avatar ugc

Updated: 2026-05-25 05:31

Source

marktantongco

marktantongco/opencodelinux

Applicable occupation (SOC): Software Developers, Computer and Mathematical Occupations, 15-1252, L4

SKILL.md

name	ai-avatar-video
description	Create AI avatar and talking head videos via inference.sh CLI. Recommended: P-Video-Avatar (fastest, cheapest, built-in TTS). Also: OmniHuman, Fabric, PixVerse. Audio: Inworld TTS-2 (100+ languages, emotion steering for characters), ElevenLabs, Kokoro. Capabilities: audio-driven avatars, text-to-avatar, lipsync videos, talking head generation, virtual presenters, UGC content. Use for: AI presenters, explainer videos, virtual influencers, dubbing, marketing videos, UGC ads, gaming avatars, NPC dialogue. Triggers: ai avatar, talking head, lipsync, avatar video, virtual presenter, ai spokesperson, audio driven video, heygen alternative, synthesia alternative, talking avatar, lip sync, video avatar, ai presenter, digital human, ugc, ugc video, ugc ad, avatar ugc
allowed-tools	Bash(belt *)
expects	[{"key":"prompt","type":"string","description":"Description of the content to generate"},{"key":"model","type":"string","description":"AI model to use","default":"auto"},{"key":"style","type":"string","description":"Visual or creative style"}]
provides	[{"key":"output_url","type":"url","description":"URL to the generated output"},{"key":"file_path","type":"path","description":"Local path to the generated file"}]

Install the belt CLI skill: npx skills add belt-sh/cli

AI Avatar & Talking Head Videos

Create AI avatars and talking head videos via inference.sh CLI.

Quick Start

Requires the inference.sh CLI (belt); see the install instructions.

belt login

# Recommended: P-Video-Avatar (fastest, cheapest, built-in TTS)
belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "voice_script": "Hello, welcome to our product demo!",
  "voice": "Zephyr (Female)"
}'

Available Models

Start with P-Video-Avatar: going by the comparison below, it is roughly 15-18x faster and about 6x cheaper than OmniHuman 1.5 and Fabric 1.0, and it includes built-in TTS, dynamic backgrounds, and 1080p support.

| Model | App ID | Best For | Built-in TTS |
| --- | --- | --- | --- |
| P-Video-Avatar | `pruna/p-video-avatar` | Best overall: speed, cost, quality, control | Yes (30 voices, 10 languages) |
| OmniHuman 1.5 | `bytedance/omnihuman-1-5` | Multi-character, audio-driven | No |
| Fabric 1.0 | `falai/fabric-1-0` | Image talks with lipsync | Yes |
| PixVerse Lipsync | `falai/pixverse-lipsync` | Highly realistic lipsync | No |

Cost & Speed Comparison

| Model | Generation time per second of video | Cost per second of video |
| --- | --- | --- |
| P-Video-Avatar | ~1.83 s | $0.025 |
| OmniHuman 1.5 | ~28 s (about 15x slower) | $0.16 (6.4x more) |
| Fabric 1.0 | ~34 s (about 18x slower) | $0.14 (5.6x more) |
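
The multipliers in the table follow from simple division, and the same per-second figures can be used to budget a clip before running it. A minimal sketch in shell, assuming only the numbers listed above; the function names `ratio`, `estimate_cost`, and `estimate_gen_time` are illustrative and not part of the belt CLI:

```bash
# Ratio between two figures from the table, to one decimal place
# (e.g. how many times slower or more expensive one model is).
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Estimated total cost in USD: clip length (seconds) x cost per second.
estimate_cost() {
  awk -v secs="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", secs * rate }'
}

# Estimated generation time in seconds: clip length (seconds) x
# generation time per second of video.
estimate_gen_time() {
  awk -v secs="$1" -v factor="$2" 'BEGIN { printf "%.1f\n", secs * factor }'
}

# Examples, using the table values:
# ratio 28 1.83              # OmniHuman 1.5 vs P-Video-Avatar, speed
# estimate_cost 60 0.025     # 60-second clip on P-Video-Avatar
# estimate_gen_time 60 1.83  # same clip, approximate wait in seconds
```

For a 60-second clip, `estimate_cost 60 0.025` prints 1.50 and `estimate_cost 60 0.16` prints 9.60, so the gap grows quickly with clip length and volume.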

Examples

P-Video-Avatar (Recommended)

Generate avatar from portrait + text script with built-in TTS:

belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "voice_script": "Welcome to our product walkthrough. Today I will show you three key features.",
  "voice": "Puck (Male)",
  "voice_language": "English (US)",
  "resolution": "720p"
}'

With custom style control:

belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "voice_script": "This is exciting news!",
  "voice": "Aoede (Female)",
  "voice_prompt": "Enthusiastic and energetic tone",
  "video_prompt": "The person is presenting on stage with dramatic lighting",
  "resolution": "1080p"
}'

With audio file instead of TTS:

belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "audio": "https://speech.mp3"
}'

Full Workflow: Generate Portrait + Avatar

Use Pruna P-Image to generate the portrait, then create the avatar:

# 1. Generate a portrait image
belt app run pruna/p-image --input '{
  "prompt": "professional headshot portrait of a young woman, neutral background, looking at camera, studio lighting, photorealistic",
  "aspect_ratio": "9:16"
}'

# 2. Create avatar video with built-in TTS
belt app run pruna/p-video-avatar --input '{
  "image": "<image-url-from-step-1>",
  "voice_script": "Hi there! Let me walk you through our latest features.",
  "voice": "Zephyr (Female)"
}'

OmniHuman 1.5 (Multi-Character)

belt app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'

Supports specifying which character to drive in multi-person images.

Fabric 1.0 (Image Talks)

belt app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'

PixVerse Lipsync

belt app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'

Full Workflow: TTS + Avatar (Non-TTS Models)

For models without built-in TTS (OmniHuman, PixVerse), generate speech first:

# 1. Generate speech — Inworld TTS-2 for expressive character voices
belt app run inworld/text-to-speech-2 --input '{
  "text": "[friendly] Welcome to our product demo! [excited] Let me show you three features that will change how you work.",
  "voice_id": "Sarah",
  "delivery_mode": "CREATIVE"
}' > speech.json

# 2. Create avatar video with the speech
belt app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://presenter-photo.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'

Tip: For most use cases, P-Video-Avatar with built-in TTS is simpler — no separate audio step needed. Use this workflow only when you specifically need OmniHuman (multi-character) or PixVerse (realistic lipsync).
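
Step 1 writes its result to speech.json, and step 2 needs the audio URL from that file. The output schema is not shown here, so the sketch below rests on a clearly labeled assumption: the JSON holds the audio URL as a plain string value, and the first http(s) URL in the file is the one you want. `first_url_in` is a hypothetical helper, not a belt command; if you know the exact field name, a proper JSON tool is the better choice.

```bash
# Hypothetical helper: print the first http(s) URL found in a file.
# Assumption: the step's JSON output stores the audio URL as a string value.
first_url_in() {
  grep -oE 'https?://[^"[:space:]]+' "$1" | head -n 1
}

# Usage sketch (requires belt and a real speech.json from step 1):
# audio_url=$(first_url_in speech.json)
# belt app run bytedance/omnihuman-1-5 --input "{
#   \"image_url\": \"https://presenter-photo.jpg\",
#   \"audio_url\": \"$audio_url\"
# }"
```

Inspect speech.json once by hand before relying on this, since a file that lists several URLs would need a more specific match.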

Full Workflow: Dub Video in Another Language

# 1. Transcribe original video
belt app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate text (manually or with an LLM)

# 3. Generate speech in new language
belt app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with new audio
belt app run infsh/latentsync-1-6 --input '{
  "video_url": "https://original-video.mp4",
  "audio_url": "<new-audio-url>"
}'
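
Each step in this workflow depends on the one before it, so it can help to stop as soon as a step fails. A minimal sketch, assuming belt exits with a non-zero status on failure (an assumption worth verifying); `run_step` is a hypothetical wrapper, not part of the CLI:

```bash
# Hypothetical wrapper: log a label to stderr, run the command,
# and return non-zero if the command fails. The command's stdout
# passes through untouched, so it can still be redirected to a file.
run_step() {
  _step_label="$1"
  shift
  echo "==> $_step_label" >&2
  "$@" || { echo "step failed: $_step_label" >&2; return 1; }
}

# Usage sketch (requires belt):
# run_step "transcribe" belt app run infsh/fast-whisper-large-v3 \
#   --input '{"audio_url": "https://video.mp4"}' > transcript.json || exit 1
```

Because the label goes to stderr, redirecting stdout to transcript.json or new_speech.json captures only the step's own output.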

Avatar UGC Generation

Create UGC-style content with P-Video-Avatar — built-in TTS, no separate audio step needed:

# 1. Generate a relatable UGC-style portrait
belt app run pruna/p-image --input '{
  "prompt": "casual selfie-style photo of a young woman in a cozy room, natural lighting, looking at camera, warm smile, authentic feel",
  "aspect_ratio": "9:16"
}'

# 2. Create UGC avatar video with built-in TTS
belt app run pruna/p-video-avatar --input '{
  "image": "<image-url-from-step-1>",
  "voice_script": "Okay so I just tried this product and honestly? It is a game changer. I was not expecting to love it this much but here we are!",
  "voice": "Zephyr (Female)",
  "voice_prompt": "Excited, casual, authentic tone like talking to a friend",
  "video_prompt": "The person is talking casually to camera in their room, natural gestures",
  "resolution": "1080p"
}'

Why P-Video-Avatar for UGC

All-in-one: built-in TTS means no separate audio generation step
30 voices, 10 languages: match your target audience
Voice + video prompts: control tone, emotion, body language, and background independently
Up to 18x faster, about 6x cheaper: produce UGC at scale vs. Fabric/OmniHuman/HeyGen
1080p support: platform-ready vertical video from a single portrait image

Batch UGC: Same Product, Multiple Presenters

# Generate 3 variants with different voices (reuses one portrait; change "image" per iteration for different presenters)
for voice in "Zephyr (Female)" "Puck (Male)" "Aoede (Female)"; do
  belt app run pruna/p-video-avatar --input "{
    \"image\": \"https://portrait.jpg\",
    \"voice_script\": \"This changed my morning routine completely. Five minutes and I am done.\",
    \"voice\": \"$voice\",
    \"voice_prompt\": \"Casual, authentic, like a real testimonial\",
    \"video_prompt\": \"Person talking to camera in a bright kitchen\",
    \"resolution\": \"1080p\"
  }"
done
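
Hand-escaping quotes inside a double-quoted JSON string is easy to get wrong, especially once a script contains quotes of its own. A minimal sketch of one alternative, using input fields from the loop above; `json_escape` and `build_avatar_input` are illustrative helpers, and the escaping covers only backslashes and double quotes (not newlines or control characters):

```bash
# Escape backslashes and double quotes so a value can sit inside
# a JSON string literal. Keep values on a single line.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a P-Video-Avatar input object.
# $1 = image URL, $2 = voice script, $3 = voice name
build_avatar_input() {
  printf '{"image": "%s", "voice_script": "%s", "voice": "%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# Usage sketch (requires belt):
# for voice in "Zephyr (Female)" "Puck (Male)" "Aoede (Female)"; do
#   belt app run pruna/p-video-avatar \
#     --input "$(build_avatar_input "https://portrait.jpg" "Five minutes and I am done." "$voice")"
# done
```

Extra fields such as voice_prompt or resolution can be added to the format string in the same way.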

Use Cases

UGC & Marketing: Product demos, UGC-style ads with AI presenters
Education: Course videos, explainers
Localization: Dub content across 10 languages from one image
Social Media: Consistent virtual influencer content
Corporate: Training videos, announcements
Gaming: Character avatars, NPC dialogue

Tips

Use high-quality portrait photos (front-facing, good lighting)
Audio should be clear with minimal background noise
P-Video-Avatar supports built-in TTS — no need for a separate speech generation step
P-Video-Avatar output aspect ratio matches the input image
Generate portraits with pruna/p-image using 9:16 aspect ratio for vertical videos
OmniHuman 1.5 supports multiple people in one image
LatentSync is best for syncing existing videos to new audio

Related Skills

# Dedicated P-Video-Avatar skill
npx skills add inference-sh/skills@p-video-avatar

# Full platform skill (all 250+ apps)
npx skills add inference-sh/skills@infsh-cli

# Text-to-speech (generate audio for non-TTS avatar models)
npx skills add inference-sh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inference-sh/skills@speech-to-text

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inference-sh/skills@ai-image-generation

Browse all video apps: belt app list --category video

Documentation

Running Apps - How to run apps via CLI
Content Pipeline Example - Building media workflows
Streaming Results - Real-time progress updates