Text-to-video
CLI
Generate a video from a text description.
MCP
Use the generate_media tool with media_type: "video".
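As a rough sketch of what such a call could look like on the wire: MCP clients invoke tools with a JSON-RPC 2.0 tools/call request. The snippet below builds one for generate_media. The helper name, the prompt text, and the request id are illustrative placeholders, and most MCP client libraries build this envelope for you.

```python
import json


def build_generate_video_request(prompt: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for the generate_media tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # placeholder; any unique id works
        "method": "tools/call",
        "params": {
            "name": "generate_media",
            # Only the two fields named in this section are set here.
            "arguments": {"prompt": prompt, "media_type": "video"},
        },
    }


if __name__ == "__main__":
    request = build_generate_video_request("A paper boat drifting down a rainy street")
    print(json.dumps(request, indent=2))
```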
Image-to-video
Attach a start frame and describe the motion you want.
Talking video
Attach a face image and provide a script in the prompt to generate a talking-head video.
Lipsync
Attach both a face image and an audio file to sync lip movement to existing audio.
The --attachment flag is repeatable. For lipsync, provide the face image and audio file as separate attachments.
Video settings
CLI flags
| Flag | Description | Example |
|---|---|---|
| --model | Video model ID | kling-v2.6-pro |
| --aspect-ratio | Output aspect ratio | 16:9, 9:16, 1:1 |
| --output-format | File format | mp4, webm |
| --num-variations | Number of variations | 1-4 |
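One way to assemble these flags from a script is sketched below. It builds only the flag portion of the argument list (including the repeatable --attachment flag and --json mentioned elsewhere on this page); the subcommand that precedes the flags is not shown here, so it is left to the caller. The helper name build_video_flags and the file names are hypothetical.

```python
from typing import List, Optional, Sequence


def build_video_flags(
    model: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
    output_format: Optional[str] = None,
    num_variations: Optional[int] = None,
    attachments: Sequence[str] = (),
    json_output: bool = False,
) -> List[str]:
    """Return CLI flags for the options that were set, in a stable order."""
    flags: List[str] = []
    if model is not None:
        flags += ["--model", model]
    if aspect_ratio is not None:
        flags += ["--aspect-ratio", aspect_ratio]
    if output_format is not None:
        flags += ["--output-format", output_format]
    if num_variations is not None:
        flags += ["--num-variations", str(num_variations)]
    for path in attachments:
        # --attachment is repeatable: emit one flag per file.
        flags += ["--attachment", path]
    if json_output:
        flags.append("--json")
    return flags


if __name__ == "__main__":
    # Hypothetical lipsync inputs: one face image and one audio file.
    print(build_video_flags(model="kling-v2.6-pro", attachments=["face.png", "voice.mp3"]))
```

Passing the result as a list to a process launcher, rather than joining it into a shell string, avoids quoting problems with file names that contain spaces.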
MCP parameters
The generate_media tool accepts: prompt, model, aspect_ratio, media_type ("video"), output_format, num_variations, and attachments.
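A minimal sketch of assembling these arguments in Python follows. It assumes that unset optional parameters can simply be omitted and that attachments is a list of file references; neither is specified on this page. The class name VideoRequest and the file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class VideoRequest:
    """Arguments for a generate_media call with media_type fixed to "video"."""

    prompt: str
    model: Optional[str] = None
    aspect_ratio: Optional[str] = None
    output_format: Optional[str] = None
    num_variations: Optional[int] = None
    attachments: List[str] = field(default_factory=list)  # assumed: list of file references

    def to_arguments(self) -> Dict[str, Any]:
        """Return the arguments dict, leaving out any parameter that was not set."""
        args: Dict[str, Any] = {"prompt": self.prompt, "media_type": "video"}
        optional = {
            "model": self.model,
            "aspect_ratio": self.aspect_ratio,
            "output_format": self.output_format,
            "num_variations": self.num_variations,
        }
        args.update({key: value for key, value in optional.items() if value is not None})
        if self.attachments:
            args["attachments"] = list(self.attachments)
        return args


if __name__ == "__main__":
    lipsync = VideoRequest(
        prompt="Sync the speaker's lips to the audio",
        attachments=["face.png", "voice.mp3"],
    )
    print(lipsync.to_arguments())
```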
Batch video generation
Generate multiple video variations.
Tips
- Start frame matters. For image-to-video, the quality and composition of your start frame directly affect the output.
- Keep prompts focused. Describe one clear motion or scene rather than a complex sequence.
- Use --json in agent workflows so the next step can parse the result from the output.
- Async jobs. Video generation can take several minutes. Use pixa jobs follow <id> to poll, or in MCP use get_job_status with sync: true.
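The polling step could be scripted roughly as follows. This sketch shells out to pixa jobs follow <id> and assumes the command blocks until the job finishes and exits with a non-zero status on failure; neither behavior is confirmed here. The function name follow_job is hypothetical, and the run parameter exists only so the function can be exercised without the CLI installed.

```python
import subprocess
from typing import Callable, List


def follow_job(
    job_id: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run `pixa jobs follow <id>` and return whatever it printed to stdout."""
    command: List[str] = ["pixa", "jobs", "follow", job_id]
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    result = run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling follow_job with a real job ID returns the command's output as a string for the next step of a workflow to inspect.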
Related
- Video Generation (User Guide) — in-app video workflow
- CLI Command Reference — full flag and command reference