| name | fan-cam |
| description | Create personalized live sports broadcast fan-cam videos with genmedia. Use this for realistic spectator cutaways, stadium or arena crowd reactions, broadcast screenshots, sports TV shots, scoreboard overlays, TV channel bugs, and identity-preserving fan reaction videos from a user photo. |
Fan cam production with genmedia
Use this skill when the user wants a personalized spectator video that feels
like a real live sports broadcast cutaway. The usual input is one photo of the
person, event details, and a desired reaction or situation.
Runtime is the genmedia CLI. Use the genmedia skill for command syntax. Load
model-routing, fal-prompting, and genmedia-workflow when endpoint choice,
model-specific prompt craft, or pipeline execution details matter.
Do not encode private examples, local file paths, user-specific workflow names,
or conversation-specific details into prompts or docs. Keep this skill
generalized.
References
Load only what is needed:
- references/prompt-contract.md for the image prompt and Kling prompt rules.
- references/genmedia-commands.md for executable CLI command patterns.
- references/examples.md for sport-specific examples.
Required inputs
Ask only for missing information that changes execution:
- User photo: local path or URL. This is the identity reference.
- Event details: sport, matchup, venue, league, broadcast context, wardrobe,
scoreboard idea, crowd behavior, and any specific scenario.
- Reaction or situation: excited, happy, laughing, sad, neutral, angry,
surprised, nervous, focused, eating, distracted, caught on camera, noticing
the stadium screen, celebrating, disappointed, or another user-specified
moment.
- Budget or quality preference only when the user explicitly asks for economy,
preview, or native 4K final output. Otherwise use the standard fan-cam
defaults: GPT Image 2 edit at quality=high with a 3840x2160 frame, then
Kling v3 Pro.
If the user gives a local image path, upload it once with genmedia upload and
reuse the returned URL. If the user gives multiple references, treat the first
person image as the identity source and later images as optional venue,
broadcast, or styling references.
The user photo is an identity reference, not a Kling-ready start frame. Do not
skip GPT Image 2 edit just because the user supplied a person's photo. For a
personalized fan-cam, first use openai/gpt-image-2/edit to place the person
inside a realistic 16:9 broadcast scene, then use the approved generated frame
as Kling start_image_url.
Pipeline
Default graph:
photo URL -> prompt planning -> GPT Image 2 edit frame -> optional compression -> Kling v3 image-to-video -> downloaded video manifest
The GPT Image 2 edit frame is mandatory when the input is an ordinary person
photo. Only bypass this step if the user explicitly provides an already
approved 16:9 broadcast fan-cam frame and asks to animate that frame.
The planning step is performed by the agent using this skill. Do not call a
separate LLM endpoint just to write prompts unless the user explicitly asks for
a hosted planner. Write the image prompt and Kling multi prompts directly.
Endpoint selection
Always verify endpoints before use:
genmedia models --endpoint_id openai/gpt-image-2/edit --json
genmedia models --endpoint_id fal-ai/kling-video/v3/standard/image-to-video --json
genmedia models --endpoint_id fal-ai/kling-video/v3/pro/image-to-video --json
genmedia models --endpoint_id fal-ai/kling-video/v3/4k/image-to-video --json
Inspect schemas before running:
genmedia schema openai/gpt-image-2/edit --json
genmedia schema fal-ai/kling-video/v3/pro/image-to-video --format openapi --json
Use --format openapi for Kling v3 image-to-video endpoints because compact
schema output may omit top-level fields such as multi_prompt,
start_image_url, duration, prompt, elements, shot_type,
negative_prompt, and cfg_scale.
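A minimal sketch of that field check, assuming the schema output has been captured as JSON text and that field names appear as object keys somewhere in it. The helper names `collect_keys` and `missing_fields` are hypothetical, and the sample input is illustrative, not real genmedia output.

```python
import json

# Top-level fields this skill expects on Kling v3 image-to-video endpoints.
EXPECTED_FIELDS = {
    "multi_prompt", "start_image_url", "duration", "prompt",
    "elements", "shot_type", "negative_prompt", "cfg_scale",
}


def collect_keys(node):
    """Recursively gather every object key found in parsed JSON."""
    keys = set()
    if isinstance(node, dict):
        for key, value in node.items():
            keys.add(key)
            keys |= collect_keys(value)
    elif isinstance(node, list):
        for item in node:
            keys |= collect_keys(item)
    return keys


def missing_fields(schema_text, expected=EXPECTED_FIELDS):
    """Return the expected field names that never appear as a key, sorted."""
    return sorted(set(expected) - collect_keys(json.loads(schema_text)))


# Illustrative input only; a compact schema might look roughly like this.
sample = '{"properties": {"prompt": {"type": "string"}, "duration": {"type": "string"}}}'
print(missing_fields(sample))
```

If the list is non-empty for compact output, rerun the schema command with --format openapi before building the request.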
Check pricing when cost matters:
genmedia pricing openai/gpt-image-2/edit --json
genmedia pricing fal-ai/kling-video/v3/standard/image-to-video --json
genmedia pricing fal-ai/kling-video/v3/pro/image-to-video --json
genmedia pricing fal-ai/kling-video/v3/4k/image-to-video --json
GPT Image 2 quality choice
- Use quality=high by default for personalized fan-cam frames. GPT Image 2
pricing differs sharply between quality=low and quality=high, but fan-cam
identity, broadcast integration, and readable overlays need the stronger
default.
- Use quality=low only when the user explicitly requests economy, preview,
fast iteration, or lower-cost social drafts.
- Use output_format=jpeg for the generated broadcast frame unless the user
needs transparency or lossless output.
- Use a 16:9 4K frame size by default:
image_size={"width":3840,"height":2160}
Kling v3 choice
Select the endpoint based on the brief:
- fal-ai/kling-video/v3/pro/image-to-video: default fan-cam endpoint. Use it
for normal personalized sports cutaways, public examples, and any request
where the user did not explicitly ask for economy.
- fal-ai/kling-video/v3/standard/image-to-video: use only when the user
explicitly asks for economy, preview, fastest iteration, or lower cost.
- fal-ai/kling-video/v3/4k/image-to-video: use only for final premium 4K
delivery or when the user explicitly asks for 4K video. Check pricing first.
Do not choose from memory alone. Verify model status and schema with genmedia in
the current session.
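The selection rule can be sketched as follows. The function and argument names are hypothetical, and raising on a combined economy and 4K request is an assumption, since this skill does not say which should win. The sketch only picks an endpoint ID; it does not replace verifying status and schema with genmedia.

```python
KLING_ENDPOINTS = {
    "pro": "fal-ai/kling-video/v3/pro/image-to-video",
    "standard": "fal-ai/kling-video/v3/standard/image-to-video",
    "4k": "fal-ai/kling-video/v3/4k/image-to-video",
}


def choose_kling_endpoint(wants_economy=False, wants_4k=False):
    """Pick a Kling v3 endpoint ID from explicit user preferences."""
    if wants_economy and wants_4k:
        # Assumption: conflicting preferences go back to the user.
        raise ValueError("economy and 4K were both requested; ask the user")
    if wants_4k:
        return KLING_ENDPOINTS["4k"]
    if wants_economy:
        return KLING_ENDPOINTS["standard"]
    return KLING_ENDPOINTS["pro"]


print(choose_kling_endpoint())
```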
Shot and duration planning
The agent decides the number and duration of multi prompts.
Hard rules:
- Each multi prompt must be at least 3 seconds.
- Total video duration must be 15 seconds or less.
- Use 2 to 5 multi prompts.
- Set the top-level Kling duration equal to the sum of all beat durations.
- If a real user-provided or approved reference is supplied through Kling
elements, every multi prompt must reference @Element1. Do not invent
extra elements just to satisfy a prompt pattern.
- Keep every Kling prompt concise. Aim for 250-430 characters.
Recommended patterns:
- Simple cutaway: 2 beats, 6 seconds total.
- Standard reaction: 3 beats, 9 seconds total.
- Rich fan-cam moment: 4 beats, 12 seconds total.
- Full story beat: 5 beats, 15 seconds total.
Do not always use five beats. Pick the smallest number that expresses the
moment clearly.
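A minimal sketch of a pre-flight check for the duration and length rules above. The function name and the (prompt, seconds) tuple shape are hypothetical. The character range is treated as a warning because it is a target, not a hard limit.

```python
MIN_BEAT_SECONDS = 3
MAX_TOTAL_SECONDS = 15
MIN_BEATS, MAX_BEATS = 2, 5
PROMPT_CHARS = (250, 430)  # soft target for each Kling prompt


def check_beat_plan(beats):
    """beats: list of (prompt, seconds). Returns (total, errors, warnings)."""
    errors, warnings = [], []
    if not MIN_BEATS <= len(beats) <= MAX_BEATS:
        errors.append(f"expected 2 to 5 beats, got {len(beats)}")
    # The total is also the value to use for the top-level Kling duration.
    total = sum(seconds for _, seconds in beats)
    if total > MAX_TOTAL_SECONDS:
        errors.append(f"total {total}s exceeds {MAX_TOTAL_SECONDS}s")
    for index, (prompt, seconds) in enumerate(beats, start=1):
        if seconds < MIN_BEAT_SECONDS:
            errors.append(f"beat {index} is {seconds}s, minimum is 3s")
        if not PROMPT_CHARS[0] <= len(prompt) <= PROMPT_CHARS[1]:
            warnings.append(f"beat {index} prompt has {len(prompt)} characters")
    return total, errors, warnings


# Standard reaction pattern: 3 beats of 3 seconds with placeholder prompts.
plan = [("x" * 300, 3), ("x" * 300, 3), ("x" * 300, 3)]
print(check_beat_plan(plan))
```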
Scene planning
The fan-cam does not need to be only a zoom on the spectator. Design the scene
from the event details:
- A nervous fan watching a decisive point.
- A supporter eating or drinking when the broadcast camera catches them.
- A spectator noticing themselves on the stadium screen.
- A quiet tennis audience reaction during a tiebreak.
- A basketball lower-bowl fan reacting to a buzzer-beater.
- A race grandstand spectator turning toward a pass or crash offscreen.
- A combat sports crowd cutaway during a tense round.
- A watch-party or esports arena reaction if the user specifies it.
Keep the whole video anchored to the generated frame. Use motion, camera
correction, crowd behavior, expression changes, and offscreen event energy to
create the sequence.
Broadcast logo and overlay
Add a small top-right TV channel bug when it fits the brief. It should feel
sport-specific and broadcast-realistic, but stay generic unless the user
supplies an exact approved logo or explicitly requests a named network.
Good generic examples:
FOOTBALL LIVE
COURT LIVE
BASKET LIVE
RACE LIVE
FIGHT LIVE
MATCH CAM
Use compact score or timing overlays when the event calls for them. Keep them
small, integrated, and secondary to the spectator. Avoid fake sponsor marks,
large UI graphics, unstable text, and logos that dominate the frame.
Image prompt requirements
The GPT Image 2 edit prompt must:
- Use the uploaded photo as the identity reference.
- Preserve the real face, age impression, skin tone, hair, facial hair,
glasses, face structure, asymmetry, pores, wrinkles, blemishes, and ordinary
imperfections.
- Create a horizontal 16:9 live TV broadcast screenshot.
- Place the person naturally in the spectator area.
- Make the selected reaction or situation visible but not theatrical.
- Include sport-specific venue, crowd, wardrobe, scoreboard, and broadcast
language.
- Include realistic TV capture flaws: mild compression noise, subtle motion
blur, off-center crop, foreground occlusion, focus falloff, imperfect
background faces, natural venue light, and small exposure inconsistencies.
- Include a small top-right broadcast channel bug when appropriate.
The image prompt must avoid:
- Beauty retouching, AI influencer face, changed face anatomy, enlarged eyes,
jawline sharpening, face slimming, porcelain skin, waxy skin.
- Studio portrait, passport photo, selfie framing, isolated subject, pasted
face, face cutout, empty background.
- Fake sponsor marks, oversized logos, warped scoreboard text, random props
not requested by the user, CGI crowd, cloned faces, anime, cartoon.
Kling prompt requirements
The Kling prompts must:
- Reference @Element1 in every beat only when the request actually includes a
real user-provided or approved Kling elements entry. Otherwise describe the
featured spectator from the start_image_url; do not invent extra elements.
- Always submit Kling with generate_audio=true. Do not use
generate_audio=false in this skill.
- When using multi_prompt, do not send end_image_url; Kling rejects
end_image_url together with multi_prompt.
- Preserve the same person, face, outfit, seat area, crowd, overlay, lighting,
and channel bug.
- Animate realistic broadcast motion: small head movement, blinking, breath,
slight hand motion, food/drink gesture if present, nearby fans shifting,
camera push-in, pan, sidestep, operator correction, or crowd swell.
- Use sport-specific language. Never write generic alternatives like "field or
court" or "stadium or arena".
- Fit the chosen beat duration.
- If a spoken phrase should be external narration, do not write it as something
the featured spectator says. Phrase it as an off-screen broadcast commentator,
arena PA voice, or non-diegetic voiceover, and explicitly state that the
featured spectator stays silent with no lip sync and no mouth movement
matching the voice.
- Avoid face morphing, beautification, unstable scoreboard text, unstable logo,
wrong sport, impossible crowd action, excessive camera movement, and sudden
scene resets.
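The request-level rules above can be sketched as a payload builder. The function name is hypothetical. The shape of each multi_prompt entry and the numeric duration type are assumptions, so confirm them with genmedia schema --format openapi before submitting. Prompt content rules such as identity preservation and silent narration still need manual review.

```python
def build_kling_request(start_image_url, beats, negative_prompt, elements=None):
    """beats: list of (prompt, seconds). Returns a request payload dict."""
    uses_tag = ["@Element1" in prompt for prompt, _ in beats]
    if elements and not all(uses_tag):
        raise ValueError("every beat must reference @Element1 when elements is set")
    if not elements and any(uses_tag):
        raise ValueError("@Element1 is referenced but no elements entry was supplied")
    request = {
        "start_image_url": start_image_url,
        # Assumed entry shape; verify against the OpenAPI schema.
        "multi_prompt": [{"prompt": p, "duration": s} for p, s in beats],
        # Top-level duration equals the sum of beat durations.
        "duration": sum(s for _, s in beats),
        "negative_prompt": negative_prompt,
        "generate_audio": True,
    }
    if elements:
        request["elements"] = elements
    # end_image_url is deliberately never set alongside multi_prompt.
    return request


# Placeholder URL and prompts, for illustration only.
demo = build_kling_request(
    "https://example.com/frame.jpg",
    [("beat one", 3), ("beat two", 3)],
    "low quality, smeared face",
)
print(demo["duration"], demo["generate_audio"])
```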
Negative prompt
Use a negative prompt like this and adapt only when needed:
low quality, smeared face, distorted faces, duplicated face, deformed hands, broken fingers, fake sponsor marks, oversized logos, unstable broadcast logo, watermark, text artifacts, unstable broadcast banner, flickering scoreboard, warped scoreboard text, unreadable names, passport photo, studio portrait, glamour portrait, beauty lighting, AI influencer, beautified face, changed face, enlarged eyes, sharpened jawline, pasted face, face cutout, over-smoothed skin, plastic skin, waxy skin, CGI crowd, cloned crowd, anime, cartoon, excessive camera movement, wrong sport, wrong venue
Quality gate
Before returning, confirm that:
- The selected endpoint and schema were verified with genmedia.
- GPT Image 2 edit used quality=high and
image_size={"width":3840,"height":2160} unless the user explicitly requested
an economy or preview run.
- If the input was a person photo, a GPT Image 2 edit frame was generated and
approved before Kling. The raw person photo was not sent directly to Kling as
the fan-cam start frame.
- Kling used fal-ai/kling-video/v3/pro/image-to-video unless the user
explicitly requested economy/preview or native 4K video.
- Multi prompt durations are each at least 3 seconds.
- Total duration is 15 seconds or less.
- Top-level Kling duration equals the sum of beat durations.
- Every beat references @Element1 when a real Kling elements entry is used.
- No invented extra elements were added.
- Kling request uses generate_audio=true.
- Kling request does not combine multi_prompt with end_image_url.
- Any intended external narration is not lip-synced to the featured spectator;
verify the mouth does not move like the voice belongs to the person on camera.
- The generated frame is below Kling image limits. If not, compress it.
- The person remains recognizable and not beautified.
- The broadcast bug and scoreboard are small and stable enough.
- Final files were downloaded with --download.
Return a compact manifest with endpoint IDs, request IDs, model settings,
prompts used, output URLs, downloaded files, and any visible defects.
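One possible shape for that manifest, as a sketch. The class name, field names, and JSON layout are hypothetical, and every value below is a placeholder, not real output.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class FanCamManifest:
    endpoint_ids: list
    request_ids: list
    model_settings: dict
    prompts: dict
    output_urls: list
    downloaded_files: list
    visible_defects: list = field(default_factory=list)

    def to_json(self):
        """Serialize the manifest as indented JSON text."""
        return json.dumps(asdict(self), indent=2)


manifest = FanCamManifest(
    endpoint_ids=[
        "openai/gpt-image-2/edit",
        "fal-ai/kling-video/v3/pro/image-to-video",
    ],
    request_ids=["<image-request-id>", "<video-request-id>"],
    model_settings={"quality": "high", "generate_audio": True, "duration": 9},
    prompts={"image": "<image prompt>", "kling": ["<beat 1>", "<beat 2>", "<beat 3>"]},
    output_urls=["<frame-url>", "<video-url>"],
    downloaded_files=["<local-video-file>"],
)
print(manifest.to_json())
```

Listing defects in their own field, even when it is empty, makes it explicit that the quality gate was run.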