xAI has quietly pushed its most significant video update of the year. Grok Imagine Video 1.5 Preview is now live — API-first, before any broad consumer rollout — and it has landed at the top of the image-to-video leaderboards. Here’s what it actually is, what’s genuinely confirmed versus merely reported, and exactly how to call it as a developer.
⚠️ Reality check on access: Despite some write-ups implying you can “try it free in the browser,” the verified path right now is the API (xAI’s own api.x.ai, plus resellers like fal, Replicate, and Kie.ai). A wider consumer rollout to X Premium tiers is still listed as “in progress.” Treat browser-based “no setup” pitches from third-party sites as their own hosted wrappers, not an official xAI consumer launch.
Grok Imagine Video 1.5 Preview is xAI’s latest image-to-video generation model. You feed it a still image plus a motion-focused prompt, and it produces a short clip — with natively generated, synchronized audio (dialogue, sound effects, ambient sound, and music) created in the same inference pass as the video, rather than bolted on afterward. That single-pass audio remains one of its clearest differentiators versus Sora, Runway, and Kling. (Replicate readme, Kie.ai)
🔴 Important correction to some articles: The official xAI model page explicitly states this preview model “currently does not support text-to-video.” Replicate confirms it is image-to-video only (every request needs an input image), and points to the separate xai/grok-imagine-video model for text-to-video. So claims that “the 1.5 Preview supports text-to-video, video editing, and multi-image editing” describe the broader Imagine API suite, not this specific preview model. Don’t build a T2V pipeline expecting this exact model alias to handle it.
The benchmark: #1 on the Image-to-Video Arena
This is the headline that drove the attention. On the Arena image-to-video (720p) leaderboard, the current standings are (arena.ai/leaderboard):
| Rank | Model | Elo |
| --- | --- | --- |
| 1 | grok-imagine-video-1.5-preview-720p | 1473 ±9 |
| 2 | dreamina-seedance-2.0-720p | 1467 ±11 |
| 3 | happyhorse-1.0 | 1443 ±12 |
| 4 | grok-imagine-video-720p (the 1.0 predecessor) | 1421 ±6 |
| 5 | veo-3.1-audio | 1397 ±11 |
The widely cited “+52 Elo jump” checks out internally: 1473 (new) − 1421 (predecessor) = 52. The new score also edges past ByteDance’s Seedance 2.0 (1467). This is corroborated by Kie.ai (which cites 1473 vs Seedance’s 1467) and Oimi AI.
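To put those gaps in perspective, here is a minimal sketch that converts an Elo difference into an expected head-to-head win rate. It assumes the leaderboard follows the conventional Elo logistic curve on a 400-point scale, which the source does not confirm. The function names are illustrative.

```python
def expected_win_rate(elo_a: float, elo_b: float) -> float:
    """Expected share of pairwise votes for A over B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def intervals_overlap(a: float, a_err: float, b: float, b_err: float) -> bool:
    """True when the two rating intervals (rating +/- err) share any values."""
    return (a - a_err) <= (b + b_err) and (b - b_err) <= (a + a_err)


# Ratings taken from the arena.ai snapshot quoted in the standings.
vs_predecessor = expected_win_rate(1473, 1421)  # 52-point gap
vs_seedance = expected_win_rate(1473, 1467)     # 6-point gap

print(f"1.5 Preview vs 1.0 predecessor: {vs_predecessor:.1%}")
print(f"1.5 Preview vs Seedance 2.0: {vs_seedance:.1%}")
print("Top-two intervals overlap:", intervals_overlap(1473, 9, 1467, 11))
```

Under that assumption, the 52-point gap corresponds to winning roughly 57% of head-to-head votes against the predecessor, while the 6-point lead over Seedance 2.0 is close to a coin flip and sits inside both models’ error bars.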
⚠️ Conflicting numbers you’ll see online: The Basenor article cites “1404 ±6” as the debut Elo. That figure matches neither the arena.ai snapshot (1473) nor the +52 math, and is likely a different or earlier capture. Separately, the Artificial Analysis “Image-to-Video (with audio)” board uses a different Elo scale entirely (its leader sits at ~1191) and, at the time of checking, had not yet listed the 1.5 Preview; its top entry was still Seedance 2.0. (Artificial Analysis I2V leaderboard) Bottom line: the #1 ranking is real and well supported, but exact Elo values vary by board and shift constantly as votes accumulate. Treat any single number as a snapshot, not gospel.
Verified specs
These are confirmed across the official docs and multiple independent API providers (fal, Replicate, Kie.ai):
Straight from the official xAI model page — you’re billed per second of generated video, by resolution, plus a small charge per input image:
| Item | Price |
| --- | --- |
| Output (480p) | $0.08 / second |
| Output (720p) | $0.14 / second |
| Input image | $0.01 each |
| Rate limit | 60 requests / minute |
| Region (official) | us-east-1 |
Worked examples (matching fal’s and JXP’s published math):
5-second 480p clip ≈ $0.40
5-second 720p clip ≈ $0.70
10-second 720p clip ≈ $1.40
15-second 720p clip ≈ $2.10 (+ $0.01 per input image)
Cost scales linearly with duration; generated audio is included at no extra charge. (fal pricing)
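The worked examples above can be reproduced with a small estimator. This is a sketch that only encodes the per-second and per-image prices quoted from the xAI model page. `estimate_cost` is a hypothetical helper, not part of any xAI SDK, and it uses `Decimal` to avoid floating-point rounding in dollar amounts.

```python
from decimal import Decimal

# Per-second output prices and per-image charge as quoted in the pricing section.
PRICE_PER_SECOND = {"480p": Decimal("0.08"), "720p": Decimal("0.14")}
PRICE_PER_INPUT_IMAGE = Decimal("0.01")


def estimate_cost(seconds: int, resolution: str, input_images: int = 1) -> Decimal:
    """Estimate the USD charge for one clip (hypothetical helper, not an xAI API)."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if seconds <= 0 or input_images < 0:
        raise ValueError("seconds must be positive and input_images non-negative")
    return PRICE_PER_SECOND[resolution] * seconds + PRICE_PER_INPUT_IMAGE * input_images


for seconds, resolution in [(5, "480p"), (5, "720p"), (10, "720p"), (15, "720p")]:
    video_only = estimate_cost(seconds, resolution, input_images=0)
    with_image = estimate_cost(seconds, resolution)
    print(f"{seconds}s {resolution}: ${video_only} video only, ${with_image} with one input image")
```

Because every request to this model needs an input image, the figure with one input image is the more realistic per-clip number.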
How to use it (API only)
Since this is API-first, here’s the practical workflow.
1. Direct via xAI (api.x.ai): Create an API key in the xAI console, then call the image-to-video endpoint with model: grok-imagine-video-1.5-preview (or the dated alias), passing your input image, prompt, duration (1–15), resolution (480p/720p), and aspect_ratio. See the xAI Video Generation / Image-to-Video docs.
2. Via a reseller such as fal, Replicate, or Kie.ai (often the fastest way to test, with free Playgrounds).
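For the direct xAI route in step 1, the request can be sketched as follows. This is a minimal illustration under stated assumptions, not the official client. The model name and the duration and resolution limits come from the text above, but the `image_url` field name, the `16:9` default, bearer-token auth, and the JSON body layout are assumptions, and the endpoint path is deliberately left for you to copy from the xAI docs.

```python
import json
import urllib.request

ALLOWED_RESOLUTIONS = {"480p", "720p"}


def build_payload(image_url: str, prompt: str, duration: int = 5,
                  resolution: str = "720p", aspect_ratio: str = "16:9") -> dict:
    """Validate inputs against the documented limits and return a JSON-ready dict."""
    if not image_url:
        raise ValueError("this model is image-to-video only; an input image is required")
    if not 1 <= duration <= 15:
        raise ValueError("duration must be between 1 and 15")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError("resolution must be '480p' or '720p'")
    return {
        "model": "grok-imagine-video-1.5-preview",
        "image_url": image_url,  # assumed field name; confirm in the xAI docs
        "prompt": prompt,
        "duration": duration,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,  # default value here is an assumption
    }


def build_request(endpoint_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request; endpoint_url must come from the xAI docs."""
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Once the endpoint path and field names are confirmed against the official docs, the prepared request can be passed to `urllib.request.urlopen`. Validating locally first avoids spending a rate-limited call on a request the API would reject.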
The model already sees your image, so prompt for motion, not description (Replicate prompt guide):
Don’t re-describe the image — tell it what should change (action, camera move, atmosphere).
Don’t contradict the source — match the prompt to what’s actually in the photo.
Be specific about motion intensity — “car racing past at high speed” beats “car passing.”
Always give camera direction — pan, tilt, dolly, orbit, slow push-in, handheld, etc.
Negative prompts are ignored — describe what you want instead.
Add an AUDIO: block at the end to steer sound design (music, SFX, ambient, short dialogue).
Shorter is more stable — 5–8s is the sweet spot; 15s works but is more prone to artifacts.
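The tips above can be folded into a small prompt builder. This is a sketch only: the sentence layout and the `build_motion_prompt` helper are illustrative, and the exact formatting of the AUDIO: block is an assumption rather than a documented template.

```python
def build_motion_prompt(action: str, camera: str, atmosphere: str = "", audio: str = "") -> str:
    """Assemble a motion-first prompt: action, camera direction, optional mood and audio."""
    if not action.strip():
        raise ValueError("describe what should change in the image")
    if not camera.strip():
        raise ValueError("always give a camera direction (pan, dolly, orbit, ...)")
    parts = [action.strip(), f"Camera: {camera.strip()}"]
    if atmosphere.strip():
        parts.append(atmosphere.strip())
    # Join the pieces as short sentences, avoiding doubled full stops.
    prompt = ". ".join(part.rstrip(".") for part in parts) + "."
    if audio.strip():
        # The audio cue goes last, on its own line.
        prompt += f"\nAUDIO: {audio.strip()}"
    return prompt


print(build_motion_prompt(
    action="The red car races past at high speed, kicking up dust",
    camera="low-angle tracking shot, slight handheld shake",
    atmosphere="Late-afternoon haze",
    audio="engine roar, gravel spray, fading wind",
))
```

The builder takes no negative-prompt argument, since those are ignored, and it requires a camera direction so that an underspecified prompt fails before any generation is billed.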
Honest limitations
No 1080p yet — caps at 720p, while Sora and Kling offer 1080p. (1080p is on the rumored roadmap but unconfirmed by xAI.)
Quality degrades after ~2–3 chained extensions when stitching clips into longer narratives (community-reported; no xAI fix timeline). (JXP)
It’s a “Preview” — rankings and behavior can shift as it matures.
Weaker for long continuous scenes, complex multi-character choreography, and exact frame-by-frame control. (ImagineGo review)
Claims I could NOT fully verify (treat with caution)
These claims appear in secondary blogs but not in xAI’s official docs, and several sources contradict each other:
Generation speed: “~17 seconds” (JXP) vs “~20–30 seconds for a 5s 720p clip” (Basenor). Not independently confirmed.
“2–3× faster than Seedance 2.0” — Basenor claim only.
Infrastructure: “Aurora autoregressive MoE engine, trained on Colossus 2 with ~555,000 GPUs” (Basenor) vs “Aurora trained on 110,000 NVIDIA GB200 GPUs” (JXP). These conflict and neither is in the official model docs.
“Hotshot acquired March 2025” and “1.245 billion videos generated in January 2026” — single-source, unverified.
The @grok X announcement — could not load (login wall), so its specific wording is unconfirmed.