Happy Horse

Generate Happy Horse videos from a prompt, a starting image, or up to 9 character references.

Describe the scene, character action, motion and camera. For references, mention character1, character2, etc. in image order.

1080p costs more tokens because Happy Horse pricing is per second by resolution.

Aspect ratio is sent only for Text-to-Video and Reference-to-Video modes.


Overview

Happy Horse 1.0 is Alibaba's video model that took #1 on the Artificial Analysis Video Arena in April 2026, beating Veo 3 and Kling 3.0 in blind preference voting. Prizmad ships it as a built-in tool with three modes: text-to-video, image-to-video, and reference-to-video with up to nine character images. Audio is included in the output.

Background

What is Happy Horse 1.0?

Happy Horse 1.0 is the video AI model Alibaba revealed on April 9, 2026, after it appeared as a stealth entry on the Artificial Analysis Video Arena and immediately ranked #1 in both Text-to-Video and Image-to-Video categories. Prizmad shipped it as a built-in tool on April 27, 2026 — the day it became publicly available.

Architecturally it's a 15-billion-parameter transformer with 40 layers that treats all inputs as one unified self-attention sequence: the 32 middle layers are shared across text, image, video, and audio, with no cross-attention modules. It generates audio and video in a single forward pass, so dialogue, ambient sound, Foley, and lip-sync are produced together rather than stitched in post.

On Prizmad, Happy Horse runs alongside ChatGPT Image 2, AI avatars, voiceover, and music inside one subscription. Pick a mode, write your prompt, attach images if you have them, and the model returns a 3- to 15-second clip at up to 1080p in 16:9, 9:16, 1:1, 4:3, or 3:4.

Use cases

What you can create with Happy Horse

UGC-style product ads

Vertical 9:16 clips of a presenter handling a product with natural motion, soft lighting, and synced dialogue — Happy Horse's strongest format on the leaderboard.

Product showcase reels

Image-to-Video on a single product photo turns a static shot into a 5–10 s reveal with smooth camera push-in, environmental motion, and Foley audio.

Multi-character scenes

Reference-to-Video with two or three character images keeps a presenter, a customer, and a product on-screen consistently across the whole clip without identity drift.

Localized ad variants

Multilingual lip-sync lets one prompt run in English, Mandarin, Japanese, German, and other supported languages — same shot, localized voice and lip motion.

Real clips generated with Happy Horse

Eight 5-second vertical clips produced with Happy Horse 1.0 on Prizmad. The examples on this page loop muted. For your own generations, tap any tile in your asset library to download the original mp4 with audio.

Clips shown (all 9:16): Product reveal, Lifestyle moment, Presenter shot, Cinematic close-up, Brand showcase, Vertical ad demo, Studio motion, Hero composition.

Capabilities

Key capabilities

#1 on Artificial Analysis arena

Tops the Artificial Analysis Video Arena at 1381 Elo (visual-only), a 107-point lead over the second-ranked model — equivalent to users preferring Happy Horse output ~65% of the time in blind head-to-head matchups.
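
The link between a 107-point lead and a roughly 65% preference rate can be checked with the standard Elo expected-score formula. This is a minimal sketch that assumes the arena uses the conventional logistic curve with a 400-point scale; the function name is illustrative and not part of any Prizmad or Artificial Analysis API.

```python
def elo_win_probability(rating_gap: float) -> float:
    """Expected preference share for the higher-rated model.

    Assumes the conventional Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# The 107-point lead quoted above.
share = elo_win_probability(107)
print(f"{share:.1%}")  # 64.9%
```

A gap of 0 gives exactly 50%, and each additional 100 points of gap moves the expected share further toward the higher-rated model.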

Native audio with the video

Generates dialogue, ambient sound, and Foley effects in the same forward pass as the picture — the output mp4 already has the audio track, no separate voiceover step required.

Multilingual lip-sync

Lip-sync support for English, Mandarin, Cantonese, Japanese, Korean, German, and French — useful for shipping the same clip in multiple markets without re-shooting.

Up to nine reference characters

In Reference-to-Video mode, attach 1–9 character images and reference them as character1 / character2 in your prompt — keeps look, outfit, and identity consistent across the clip.
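
Before spending tokens, it can help to check that the characterN names in a prompt line up with the number of uploaded images. The helper below is a hypothetical pre-flight check, not something Prizmad provides. Treating an unmentioned image as a problem is an assumption made for this sketch.

```python
import re


def check_character_refs(prompt: str, image_count: int) -> list[str]:
    """Return a list of problems; an empty list means the prompt is consistent."""
    problems = []
    # The page states that Reference-to-Video accepts 1 to 9 character images.
    if not 1 <= image_count <= 9:
        problems.append(f"expected 1-9 images, got {image_count}")

    # Collect every characterN mentioned in the prompt.
    used = {int(n) for n in re.findall(r"\bcharacter(\d+)\b", prompt)}

    for n in sorted(used):
        if n < 1 or n > image_count:
            problems.append(f"character{n} has no matching image")

    # Assumption: every uploaded image should be named at least once.
    for n in range(1, image_count + 1):
        if n not in used:
            problems.append(f"image {n} is never mentioned as character{n}")
    return problems


print(check_character_refs("character1 hands the box to character2", 2))  # []
print(check_character_refs("character3 waves at the camera", 2))
```

The second call reports three problems: character3 points past the two uploaded images, and neither image 1 nor image 2 is mentioned.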

Cinematic motion and camera

Strong on natural human motion, smooth camera moves, and realistic lighting — these are the categories blind preference voting on the leaderboard actually rewards, not raw resolution.

1080p, 3–15 seconds

Outputs up to 1920×1080 across five aspect ratios (16:9, 9:16, 1:1, 4:3, 3:4) and any duration from 3 to 15 seconds — covers vertical Reels and TikTok, landscape ad placements, and square feed posts.

Workflow

How to use Happy Horse on Prizmad

1. Pick a mode at the top of the workspace: Text-to-Video for prompt-only generation, Image-to-Video to animate a starting frame, or Reference-to-Video to keep up to 9 specific characters consistent.

2. Write your prompt: describe the action, camera movement, lighting, and mood. For Reference-to-Video, mention character1, character2, and so on in the same order you uploaded the images.

3. Choose resolution (720p or 1080p), duration (3–15 s), and aspect ratio. Click Generate. Happy Horse returns the clip in roughly three minutes; it lands in your asset library and can be dropped straight into the video wizard.
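
The three steps above amount to a small set of constraints on the workspace settings. The sketch below models them as a validation function. Prizmad does not document a public API on this page, so the `Settings` class, the field names, and the returned dictionary are all hypothetical; only the allowed values and the rule that aspect ratio is sent for Text-to-Video and Reference-to-Video alone come from the page.

```python
from dataclasses import dataclass

MODES = {"text-to-video", "image-to-video", "reference-to-video"}
RESOLUTIONS = {"720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
# Per the page, aspect ratio is sent only for these two modes.
ASPECT_RATIO_MODES = {"text-to-video", "reference-to-video"}


@dataclass
class Settings:  # hypothetical container for the workspace choices
    mode: str
    prompt: str
    resolution: str
    duration_s: int
    aspect_ratio: str


def build_request(s: Settings) -> dict:
    """Validate the settings and return an illustrative request payload."""
    if s.mode not in MODES:
        raise ValueError(f"unknown mode: {s.mode}")
    if not s.prompt.strip():
        raise ValueError("Prompt is required")
    if s.resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {s.resolution}")
    if not 3 <= s.duration_s <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    if s.aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {s.aspect_ratio}")

    payload = {
        "mode": s.mode,
        "prompt": s.prompt,
        "resolution": s.resolution,
        "duration_s": s.duration_s,
    }
    # Image-to-Video takes its framing from the starting image instead.
    if s.mode in ASPECT_RATIO_MODES:
        payload["aspect_ratio"] = s.aspect_ratio
    return payload


print(build_request(Settings("text-to-video", "a barista pours latte art", "1080p", 5, "9:16")))
```

Dropping the aspect ratio for Image-to-Video mirrors the note on the tool itself; the other checks simply restate the ranges listed in step 3.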

Pricing

How Happy Horse fits Prizmad pricing

Pay per second of video

Happy Horse pricing on Prizmad scales with output: 720p costs 1 token per second, 1080p costs 2 tokens per second. A 5-second 1080p clip is 10 tokens; a 10-second 720p clip is also 10 tokens.
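
The per-second pricing is simple enough to estimate before generating. Here is a minimal sketch using the two rates quoted above; the function name is illustrative, and billing in whole seconds is an assumption.

```python
# Token rates per second of output, as stated on this page.
TOKENS_PER_SECOND = {"720p": 1, "1080p": 2}


def clip_cost(resolution: str, seconds: int) -> int:
    """Estimate the token cost of one clip (assumes whole-second billing)."""
    if resolution not in TOKENS_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 3 <= seconds <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    return TOKENS_PER_SECOND[resolution] * seconds


print(clip_cost("1080p", 5))   # 10
print(clip_cost("720p", 10))   # 10
print(clip_cost("1080p", 15))  # 30
```

At these rates the most expensive single clip, 15 seconds at 1080p, comes to 30 tokens.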

Included in every plan

Happy Horse is unlocked on the same Prizmad account you use for ChatGPT Image 2, AI avatars, voiceover, and music — no separate accounts, no API key setup, no extra billing.

Run out? Top up, don't upgrade

If your monthly tokens run dry mid-shoot, buy a one-off top-up that doesn't change your plan. Generated videos stay yours with full commercial rights.

Comparison

Happy Horse 1.0 vs Veo 3 vs Kling 3.0

| Feature | Happy Horse 1.0 | Veo 3 | Kling 3.0 |
| --- | --- | --- | --- |
| Artificial Analysis Elo (visual) | 1381 (#1) | 1217 | 1218–1242 |
| Native audio in output | Yes (single pass) | Yes | Limited |
| Multilingual lip-sync | 7 languages | Limited | Limited |
| Max resolution | 1080p | 1080p | 1080p |
| Duration range | 3–15 s | Up to 8 s | 5–10 s |
| Reference characters | Up to 9 | 1 | 1 |
| Available on Prizmad | Yes | No | Motion Control variant only |
| Best for | UGC ads, product reveals, multi-character | General text-to-video | Stylized motion, longer pans |

FAQ

Frequently asked questions

How does Prizmad pricing work for Happy Horse?

Happy Horse charges per second of output by resolution: 720p costs 1 token per second, 1080p costs 2 tokens per second. So a 5-second 1080p clip costs 10 tokens, and a 10-second 720p clip costs 10 tokens. Tokens come from your Prizmad plan first; if they run out you can buy a one-off top-up without changing your plan. There are no separate accounts to manage, no API key setup, and no extra billing.

What's the difference between Image-to-Video and Reference-to-Video?

Image-to-Video takes a single starting frame and animates it — the model decides the motion and continuation from that frame. Reference-to-Video takes 1–9 character images as identity references; the prompt drives the scene and Happy Horse keeps each referenced character consistent throughout. Use Image-to-Video when you have an exact frame you want to animate; use Reference-to-Video when you have product or person photos that need to appear inside a new scene.

Does Happy Horse really generate audio with the video?

Yes. Happy Horse generates audio and video in a single forward pass: dialogue, ambient sound, Foley effects, and lip-sync are produced together, not added in post. Lip-sync supports English, Mandarin, Cantonese, Japanese, Korean, German, and French. The output mp4 file already contains the audio track.

How long does a generation take?

On Prizmad the Happy Horse workspace shows an estimated time of about three minutes per clip. Actual time scales with duration and resolution — short 720p clips return faster than 15-second 1080p clips. Generations run asynchronously, so you can queue another while one is rendering.

Do I own the rights to videos I generate?

Yes. You retain full commercial rights to every video generated with Happy Horse on Prizmad — use them in paid ads, on landing pages, in social campaigns, and on owned channels with no extra licensing.

Published May 2026 · Last updated May 2026