Generate Happy Horse videos from prompts, a starting image, or up to 9 character references
Describe the scene, character action, motion and camera. For references, mention character1, character2, etc. in image order.
1080p costs more tokens because Happy Horse pricing is per second by resolution.
Aspect ratio is sent only for Text-to-Video and Reference-to-Video modes.
Happy Horse 1.0 is Alibaba's video model that took #1 on the Artificial Analysis Video Arena in April 2026, beating Veo 3 and Kling 3.0 in blind preference voting. Prizmad ships it as a built-in tool — text-to-video, image-to-video, and reference-to-video with up to nine character images, audio included.
Happy Horse 1.0 is the video AI model Alibaba revealed on April 9, 2026, after it appeared as a stealth entry on the Artificial Analysis Video Arena and immediately ranked #1 in both Text-to-Video and Image-to-Video categories. Prizmad shipped it as a built-in tool on April 27, 2026 — the day it became publicly available.
Architecturally it's a 15-billion-parameter transformer with 40 layers and a unified self-attention sequence. Of those 40 layers, the 32 middle layers are shared across text, image, video, and audio, with no cross-attention modules. The model generates audio and video in a single forward pass, so dialogue, ambient sound, Foley, and lip-sync are produced together rather than stitched in post.
On Prizmad, Happy Horse runs alongside ChatGPT Image 2, AI avatars, voiceover, and music inside one subscription. Pick a mode, write your prompt, attach images if you have them, and the model returns a 3- to 15-second clip at up to 1080p in 16:9, 9:16, 1:1, 4:3, or 3:4.
Vertical 9:16 clips of a presenter handling a product with natural motion, soft lighting, and synced dialogue — Happy Horse's strongest format on the leaderboard.
Image-to-Video on a single product photo turns a static shot into a 5–10 s reveal with smooth camera push-in, environmental motion, and Foley audio.
Reference-to-Video with two or three character images keeps a presenter, a customer, and a product on-screen consistently across the whole clip without identity drift.
Multilingual lip-sync lets one prompt run in English, Mandarin, Japanese, German, and other supported languages — same shot, localized voice and lip motion.
Eight 5-second vertical clips produced with Happy Horse 1.0 on Prizmad. All examples loop muted. Once you generate your own clips, tap any tile in your asset library to download the original mp4 with audio.
Product reveal (9:16)
Lifestyle moment (9:16)
Presenter shot (9:16)
Cinematic close-up (9:16)
Brand showcase (9:16)
Vertical ad demo (9:16)
Studio motion (9:16)
Hero composition (9:16)
Tops the Artificial Analysis Video Arena at 1381 Elo (visual-only), a 107-point lead over the second-ranked model, equivalent to users preferring Happy Horse output ~65% of the time in blind head-to-head matchups.
Generates dialogue, ambient sound, and Foley effects in the same forward pass as the picture — the output mp4 already has the audio track, no separate voiceover step required.
Lip-sync support for English, Mandarin, Cantonese, Japanese, Korean, German, and French — useful for shipping the same clip in multiple markets without re-shooting.
In Reference-to-Video mode, attach 1–9 character images and refer to them as character1, character2, and so on in your prompt. This keeps each character's look, outfit, and identity consistent across the clip.
Strong on natural human motion, smooth camera moves, and realistic lighting — these are the categories blind preference voting on the leaderboard actually rewards, not raw resolution.
Outputs up to 1920×1080 across five aspect ratios (16:9, 9:16, 1:1, 4:3, 3:4) and any duration from 3 to 15 seconds — covers vertical Reels and TikTok, landscape ad placements, and square feed posts.
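The "~65% preference" figure quoted for the 107-point Elo lead can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the leaderboard uses the conventional logistic Elo model with a 400-point scale; the arena's actual rating method may differ, and the function name is illustrative only.

```python
def elo_win_probability(rating_gap: float) -> float:
    """Expected win rate of the higher-rated model under the standard Elo formula.

    Assumption: conventional logistic Elo with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# 107 is the lead over the second-ranked model quoted in the text.
share = elo_win_probability(107)
print(f"{share:.1%}")
```

Under that assumption the result lands just under 65%, which matches the rounded figure in the text. A gap of zero gives exactly 50%.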
Pick a mode at the top of the workspace: Text-to-Video for prompt-only generation, Image-to-Video to animate a starting frame, or Reference-to-Video to keep up to 9 specific characters consistent.
Write your prompt — describe the action, camera movement, lighting, and mood. For Reference-to-Video, mention character1, character2, and so on in the same order you uploaded the images.
Choose resolution (720p or 1080p), duration (3–15 s), and aspect ratio. Click Generate. Happy Horse returns the clip in roughly three minutes; it lands in your asset library and can be dropped straight into the video wizard.
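The three steps above amount to a small set of rules: one of three modes, a required prompt, 720p or 1080p, 3 to 15 seconds, one of five aspect ratios (not used for Image-to-Video), and 1 to 9 reference images tagged character1, character2, and so on in upload order. The sketch below encodes those rules as a plain validator. It is not Prizmad's API, which this page does not document; every class, function, and field name is hypothetical, and treating duration as a whole number of seconds is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

# Limits come from the page text; all names here are hypothetical.
MODES = ("text-to-video", "image-to-video", "reference-to-video")
RESOLUTIONS = ("720p", "1080p")
ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4")


@dataclass
class ClipSettings:
    mode: str
    prompt: str
    resolution: str = "720p"
    duration_s: int = 5
    aspect_ratio: str = "9:16"
    start_image: Optional[str] = None                     # Image-to-Video only
    reference_images: list = field(default_factory=list)  # Reference-to-Video only


def build_settings(s: ClipSettings) -> dict:
    """Validate the settings and return them as a plain dict."""
    if s.mode not in MODES:
        raise ValueError(f"unknown mode: {s.mode}")
    if not s.prompt.strip():
        raise ValueError("prompt is required")
    if s.resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {s.resolution}")
    if not 3 <= s.duration_s <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")

    out = {
        "mode": s.mode,
        "prompt": s.prompt,
        "resolution": s.resolution,
        "duration_s": s.duration_s,
    }

    if s.mode == "image-to-video":
        if s.start_image is None:
            raise ValueError("image-to-video needs a starting image")
        out["start_image"] = s.start_image
    else:
        # Aspect ratio is only sent for the other two modes.
        if s.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {s.aspect_ratio}")
        out["aspect_ratio"] = s.aspect_ratio

    if s.mode == "reference-to-video":
        if not 1 <= len(s.reference_images) <= 9:
            raise ValueError("attach between 1 and 9 character images")
        # Tags follow upload order: character1, character2, ...
        out["references"] = {
            f"character{i}": image
            for i, image in enumerate(s.reference_images, start=1)
        }
    return out
```

For example, `build_settings(ClipSettings(mode="reference-to-video", prompt="character1 hands the bottle to character2", reference_images=["presenter.png", "customer.png"]))` maps `character1` to `presenter.png` and `character2` to `customer.png`, while an Image-to-Video request comes back without an `aspect_ratio` entry.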
Happy Horse pricing on Prizmad scales with output: 720p costs 1 token per second, 1080p costs 2 tokens per second. A 5-second 1080p clip is 10 tokens; a 10-second 720p clip is also 10 tokens.
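The per-second pricing is easy to estimate before you click Generate. The sketch below uses only the rates stated above (1 token per second at 720p, 2 at 1080p) and the 3 to 15 second duration range; the function and constant names are illustrative, and whole-second durations are an assumption.

```python
# Rates as stated in the text; names are hypothetical.
TOKENS_PER_SECOND = {"720p": 1, "1080p": 2}


def clip_cost(resolution: str, duration_s: int) -> int:
    """Return the token cost of one clip at the given resolution and duration."""
    if resolution not in TOKENS_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 3 <= duration_s <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    return TOKENS_PER_SECOND[resolution] * duration_s


print(clip_cost("1080p", 5))   # 10
print(clip_cost("720p", 10))   # 10
print(clip_cost("1080p", 15))  # 30
```

The first two calls reproduce the examples in the text. The third shows the most expensive single clip under these rates, a 15-second 1080p render at 30 tokens.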
Happy Horse is unlocked on the same Prizmad account you use for ChatGPT Image 2, AI avatars, voiceover, and music — no separate accounts, no API key setup, no extra billing.
If your monthly tokens run dry mid-shoot, buy a one-off top-up that doesn't change your plan. Generated videos stay yours with full commercial rights.
| Feature | Happy Horse 1.0 | Veo 3 | Kling 3.0 |
|---|---|---|---|
| Artificial Analysis Elo (visual) | 1381 (#1) | 1217 | 1218–1242 |
| Native audio in output | Yes (single pass) | Yes | Limited |
| Multilingual lip-sync | 7 languages | Limited | Limited |
| Max resolution | 1080p | 1080p | 1080p |
| Duration range | 3–15 s | Up to 8 s | 5–10 s |
| Reference characters | Up to 9 | 1 | 1 |
| Available on Prizmad | Yes | No | Motion Control variant only |
| Best for | UGC ads, product reveals, multi-character | General text-to-video | Stylized motion, longer pans |
Happy Horse charges per second of output by resolution: 720p costs 1 token per second, 1080p costs 2 tokens per second. So a 5-second 1080p clip costs 10 tokens, and a 10-second 720p clip costs 10 tokens. Tokens come from your Prizmad plan first; if they run out you can buy a one-off top-up without changing your plan. There are no separate accounts to manage, no API key setup, and no extra billing.
Image-to-Video takes a single starting frame and animates it — the model decides the motion and continuation from that frame. Reference-to-Video takes 1–9 character images as identity references; the prompt drives the scene and Happy Horse keeps each referenced character consistent throughout. Use Image-to-Video when you have an exact frame you want to animate; use Reference-to-Video when you have product or person photos that need to appear inside a new scene.
Yes, the output includes audio. Happy Horse generates audio and video in a single forward pass, so dialogue, ambient sound, Foley effects, and lip-sync are produced together, not added in post. Lip-sync supports English, Mandarin, Cantonese, Japanese, Korean, German, and French. The output mp4 file already contains the audio track.
On Prizmad the Happy Horse workspace shows an estimated time of about three minutes per clip. Actual time scales with duration and resolution — short 720p clips return faster than 15-second 1080p clips. Generations run asynchronously, so you can queue another while one is rendering.
Yes, commercial use is allowed. You retain full commercial rights to every video generated with Happy Horse on Prizmad, so you can use them in paid ads, on landing pages, in social campaigns, and on owned channels with no extra licensing.
Published May 2026 · Last updated May 2026