Everything you need to ship 12-file multimodal video with Seedance 2.0.
10 dated guides on Seedance 2.0 end to end. Native audio in one pass, 9 image plus 3 video plus 3 audio refs, running on fal.ai as ByteDance's chosen enterprise partner. Real numbers, real code, real pipelines.
Dated, opinionated, written once and kept current. Every entry is one subject, answered. No filler, just signal.
Real renders from the actual endpoint. Prompts are shown beneath each clip so you can copy the pattern.
See Seedance 2.0 in action.
A chef in a professional kitchen at dusk ties an apron in front of a pass window and looks up to say: 'Tonight, the kitchen opens at seven.'
A matte black wireless headphone slowly rotates on a concrete pedestal under one warm key light.
A calm coastal lake at dawn, camera slow push-in from behind a lighthouse toward the still water. Golden sunrise breaks across the horizon.
Two engineers in a server room look at a monitor. First says: 'That timing is off by 40 ms.' Second answers: 'Then we ship the rollback.'
Wide slow-motion shot of a falcon launching from a cliff at sunset, backlit by gold, feathers rim-lit, one long distant drumbeat over low wind.
A vintage analog synthesizer on a polished wooden desk, camera slow dolly past the glowing knobs and patch cables. Soft backlight, warm bokeh, tubes warming up.
Frequently asked.
01 How much does Seedance 2.0 cost on fal.ai?
Standard tier at `bytedance/seedance-2.0/text-to-video` bills at roughly $0.3034 per second of 720p output, with native audio included at no extra cost. A 5 second 720p render lands at $1.52. The Fast tier at `bytedance/seedance-2.0/fast/text-to-video` drops to $0.2419 per second on the same schema, about 20 percent cheaper for iteration. Reference-to-video with video refs applies a 0.6x duration multiplier, effectively $0.1814 per second. Math follows the token formula (height x width x duration x 24) divided by 1024 at $0.014 per 1k tokens on standard and $0.0112 per 1k on Fast. Validate at fal.ai/pricing.
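The token math above can be sketched as a quick cost estimator. A sketch only: the frame dimensions are illustrative 16:9 sizes (not confirmed schema values), the per-second headline figures round slightly differently, and the rates should be validated against fal.ai/pricing.

```python
# Estimate Seedance 2.0 cost on fal.ai from the token formula:
#   tokens = (height * width * duration_s * 24) / 1024, billed per 1k tokens.
# Frame sizes here are illustrative 16:9 dimensions, not confirmed schema values.
RESOLUTIONS = {"480p": (480, 854), "720p": (720, 1280)}
RATE_PER_1K = {"standard": 0.014, "fast": 0.0112}

def estimate_cost(resolution: str, duration_s: float, tier: str = "standard",
                  video_refs: bool = False) -> float:
    height, width = RESOLUTIONS[resolution]
    tokens = (height * width * duration_s * 24) / 1024
    cost = tokens / 1000 * RATE_PER_1K[tier]
    if video_refs:  # reference-to-video with video refs bills 0.6x of rendered time
        cost *= 0.6
    return round(cost, 2)

print(estimate_cost("720p", 5))               # about $1.51 for a 5 s standard render
print(estimate_cost("720p", 5, tier="fast"))  # about $1.21 on the Fast tier
```

The same function covers the 0.6x video-ref multiplier, so one estimator serves all six endpoints.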
02 What is the max resolution?
Seedance 2.0 caps at 720p. You pick between 480p and 720p on `bytedance/seedance-2.0/text-to-video`, with no 1080p or 4K option. The legacy Seedance 1.5 Pro endpoint still reaches 1080p if you need it, and Veo 3.1 pushes to 4K for broadcast work. For social delivery, creator content, and preview reviews, 720p with native audio is where most teams land; the cinematic grade and token budget go further at 720p than stretched 1080p. Upscale downstream with a dedicated upscaler if your target needs more pixel density.
03 How many reference images, videos, and audios can I pass?
Up to 12 files total split across three channels: 9 images, 3 videos, and 3 audios. That is the full multimodal surface on `bytedance/seedance-2.0/reference-to-video`. Images hold character, wardrobe, and composition anchors. Video refs drive camera move and motion rhythm. Audio refs drive room tone, ambience, and voice character. Duration caps still apply to the output: 4 to 15 seconds per single shot. Reference-to-video with video refs uses a 0.6x duration multiplier, so you pay for 60 percent of rendered time when video conditioning is active.
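A pre-flight check for those limits can save a failed call. This is a hypothetical client-side helper based on the numbers above (9/3/3 split, 4 to 15 second shots, 0.6x billing with video refs), not part of any fal SDK:

```python
# Validate a reference set against the documented 12-file limits:
# up to 9 images, 3 videos, 3 audios, and a 4-15 s single-shot duration.
LIMITS = {"images": 9, "videos": 3, "audios": 3}

def check_refs(images=(), videos=(), audios=(), duration_s=5):
    refs = {"images": list(images), "videos": list(videos), "audios": list(audios)}
    for channel, files in refs.items():
        if len(files) > LIMITS[channel]:
            raise ValueError(f"{channel}: {len(files)} refs exceeds limit of {LIMITS[channel]}")
    if not 4 <= duration_s <= 15:
        raise ValueError("duration must be 4-15 seconds per single shot")
    # video refs trigger the 0.6x duration multiplier on billing
    billed_s = duration_s * (0.6 if videos else 1.0)
    return billed_s

print(check_refs(images=["hero.png"], videos=["cam_move.mp4"], duration_s=10))  # 6.0
```

Returning the billed seconds makes the 0.6x multiplier visible before you commit the render.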
04 Can I generate videos with images of real people?
The face filter that blocks generation of real individuals without identity verification sits at the ByteDance model layer, not at fal.ai. No API provider has a bypass. Operators who need portraits of real people route through identity verification and licensed talent pipelines, which is the standard across every major commercial video provider. AI-generated portraits (faces that do not match a real person) remain the documented path on `bytedance/seedance-2.0/reference-to-video` when you want character continuity without licensing overhead. Brand campaigns with cleared talent use the verified-identity intake.
05 Seedance 2.0 vs Kling 3.0 Pro: which should I pick?
Pick `bytedance/seedance-2.0/text-to-video` when your brief uses multiple reference channels (image plus video plus audio), when you need 15 seconds of single-shot duration, and when you want native audio in a single pass. Pick Kling 3.0 Pro when you need 1080p output, when motion smoothness is the top priority, and when you are already in a Kling storyboarding flow. Seedance leads the head-to-head on I2V Arena Elo (1346 vs 1282) and on the multimodal surface. Kling leads on resolution ceiling and per-second price at 1080p.
06 What is the Fast tier?
`bytedance/seedance-2.0/fast/text-to-video` and `bytedance/seedance-2.0/fast/reference-to-video` are quicker-turnaround variants at $0.2419 per second (about 20 percent cheaper than the $0.3034 standard tier). Same input schema, same 12-file multimodal surface, same 720p ceiling, same duration caps. Use Fast for iteration passes where you want 20 versions of a single shot before committing to the final render. Native audio remains included at no extra cost. Token formula shifts from $0.014 per 1k to $0.0112 per 1k, which is where the Fast savings come from.
07 How do I call it from Python?
Install `fal-client`, set `FAL_KEY` in your environment, and subscribe to `bytedance/seedance-2.0/text-to-video`. The input dictionary mirrors the TypeScript SDK: prompt, duration (4 to 15), resolution (480p or 720p), aspect_ratio, generate_audio, and seed. Use `fal_client.subscribe` for synchronous waits or `fal_client.submit` for async jobs with webhooks. The queue returns logs you can stream with `with_logs=True`. Full schema and code shape live on the endpoint page under fal.ai/models. The Fast tier swaps the endpoint path to `bytedance/seedance-2.0/fast/text-to-video` without any other code change.
08 What happens when a render fails?
The fal async queue behind `bytedance/seedance-2.0/text-to-video` returns structured errors with the rule id or timeout reason. Soft retries: drop the duration from 15 seconds to 8, switch the seed, or loosen the prompt if a content rule fired. Hard retries: route the same input to the Fast tier for a lighter compute path, or fall back to Seedance 1.5 Pro if you need 1080p. The queue logs surface in your dashboard, and webhooks report completion state to your server. Transient queue failures retry once automatically; committed failures surface a clear error code.
09 Why run Seedance 2.0 on fal.ai?
Eight reasons. One, fal.ai is ByteDance's chosen enterprise partner for Seedance 2.0 with day-one access to all six endpoints. Two, a single FAL_KEY speaks to 600+ models, so your pipeline does not fragment across providers. Three, serverless scale with no cold starts plus an async queue that supports webhooks for fan-out. Four, the Fast tier at $0.2419 per second for iteration budgets on `bytedance/seedance-2.0/fast/text-to-video`. Five, regional points of presence for lower latency. Six, one `@fal-ai/client` SDK in TypeScript, Python, and Swift. Seven, free signup credits to kick the tires. Eight, Slack and Discord access to the fal team when a pipeline question needs a human.
10 What formats and aspect ratios does Seedance 2.0 support?
`bytedance/seedance-2.0/text-to-video` accepts aspect ratios 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, plus an auto mode that picks from the prompt. Duration ranges 4, 5, 8, 10, 12, and 15 seconds. Resolution is 480p or 720p. Output is MP4 with H.264 video and AAC audio when generate_audio is on. Seed input is supported for reproducible renders. For vertical social deliverables, 9:16 at 720p is the common pick; for cinematic widescreen, 21:9 is available natively without cropping downstream. Native audio runs at the same encode as the video track.
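Those option sets are easy to enforce before a request leaves your pipeline. A hypothetical validator built from the values listed above:

```python
# Validate render parameters against the documented option sets.
ASPECTS = {"21:9", "16:9", "4:3", "1:1", "3:4", "9:16", "auto"}
DURATIONS = {4, 5, 8, 10, 12, 15}
RESOLUTIONS = {"480p", "720p"}

def validate_params(duration: int, resolution: str, aspect_ratio: str = "auto") -> bool:
    if duration not in DURATIONS:
        raise ValueError(f"duration must be one of {sorted(DURATIONS)}")
    if resolution not in RESOLUTIONS:
        raise ValueError("resolution is 480p or 720p; upscale downstream for more")
    if aspect_ratio not in ASPECTS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECTS)}")
    return True

validate_params(5, "720p", "9:16")  # a vertical social deliverable passes
```

Failing fast on a bad duration costs nothing; failing at the endpoint costs queue time.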
Seedance 2.0 at a glance.
Seedance 2.0 is ByteDance Seed's flagship video model, announced February 12, 2026 and live on fal.ai as its chosen enterprise partner since April 15, 2026. The signature edge is a 12-file multimodal reference surface: up to 9 images, 3 videos, and 3 audios passed into a single call, combined with native audio generation in the same forward pass. You brief a shot with a character still, a camera-move reference clip, and a room-tone sample, and Seedance 2.0 returns the take with audio baked in. No separate TTS stitch, no lip sync pass, no identity matcher layered on top. On the Artificial Analysis Arena, Seedance 2.0 sits at rank two on both legs of the leaderboard with Elo 1270 on text-to-video and 1346 on image-to-video. HappyHorse 1.0 currently leads the Arena; Seedance 2.0 is the strongest of the major commercial endpoints that is generally available without partner allowlist.
The honest caveats matter. Max output resolution is 720p. You do not get 1080p or 4K out of Seedance 2.0, and if your delivery target is broadcast, you will upscale downstream or pick a different model. The model-layer face filter blocks generation of real people without identity verification. This is enforced at the ByteDance model layer, not at fal.ai, so no API provider has a bypass. Operators that need portraits of real individuals route through licensed talent and verified identity pipelines; AI-generated portraits remain the documented path across all commercial video providers. Where Seedance 2.0 wins: 15 second maximum single-shot duration beats Veo 3.1's 8 seconds, the 12-file multimodal reference surface is unique across the cohort, and native audio is included at no extra charge while Veo 3.1 charges $0.40 per second at 1080p.
Against the field, the picks sort cleanly. Against Kling 3.0 Pro: pick Seedance for longer duration, multimodal ref control, and joint audio. Against Veo 3.1: pick Seedance when $0.30 per second beats $0.40 per second and 15 seconds beats 8 seconds, concede to Veo when you need 4K or the broadcast color pipeline. Against Grok Imagine v1.0: pick Seedance for cinematic narrative work and multi-reference briefs, concede to Grok on raw iteration speed and $0.07 per second pricing. The Fast tier at $0.2419 per second gives you the same schema on iteration budgets, and six endpoints (text-to-video, image-to-video, reference-to-video plus Fast variants of each) cover every entry point a production pipeline needs.
- 01 Indie film directors briefing cinematic narrative shots with multi-reference control
- 02 Agency teams producing short-form content that needs native audio in one pass
- 03 Ad studios working with licensed talent portraits and brand-approved reference imagery
- 04 Research groups benchmarking hosted video generation against the Arena leaderboard
- 05 Creative pipelines that need the 12-file multimodal surface for character continuity
- 01 Your brief uses more than one reference channel (image plus video plus audio)
- 02 You need 15 second single-shot duration where Veo 3.1 tops out at 8 seconds
- 03 You want identical character and motion across multiple shots using shared refs
- 04 Your budget prefers $0.30 per second for cinematic work over $0.40 per second
- 05 You need native audio baked into the render without a separate synthesis pass
fal.ai is ByteDance's chosen enterprise partner for Seedance 2.0, giving you day-one access to all six endpoints, the full 12-file multimodal reference surface, and the Fast tier at $0.2419 per second for iteration. One fal.subscribe call, one billing dashboard, one SDK for every other model you need to route to next.
Call Seedance 2.0 in under 20 lines.
import { fal } from "@fal-ai/client";

fal.config({ credentials: process.env.FAL_KEY });

// Seedance 2.0 text-to-video on fal.ai
const result = await fal.subscribe("bytedance/seedance-2.0/text-to-video", {
  input: {
    prompt: "A chef in a professional kitchen at dusk looks to camera and says: 'Tonight, the kitchen opens at seven.' Warm key light, shallow depth of field, soft extractor hum, 24fps cinematic grade.",
    duration: 5, // 4 to 15 seconds
    resolution: "720p", // 480p or 720p
    aspect_ratio: "16:9",
    generate_audio: true, // native audio on by default
    seed: 42,
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs?.map((log) => log.message).forEach(console.log);
    }
  },
});

console.log(result.data.video.url);
{ video: { url: "https://v3.fal.media/files/..." }, seed: 42 }

What Seedance 2.0 costs on fal.ai.
5s 720p audio on
5s 720p from still
5s 720p, 4 image refs
5s 720p, 2 video refs
5s 720p audio on
5s 720p, 2 video refs
Pricing via token formula (h x w x duration x 24) / 1024 at $0.014/1k on standard, $0.0112/1k on Fast. Native audio included at no extra cost.
Official pricing page

Seedance 2.0 vs the field.
12-file multimodal refs, native audio, cinematic narrative
Motion smoothness, storyboarding
Broadcast-grade color, cinematic finish
Fastest and cheapest iteration
Seedance 2.0 leads the Kling/Veo/Grok cohort on duration, multimodal ref control, and I2V Arena Elo. Pick it when your brief needs more than one reference channel and you want audio baked in.
The posts we point people at when they ask where to start with Seedance 2.0.
Three to read first.
Debugging Seedance 2.0: Face Blocks, IP Warnings, and What to Do
Two Seedance 2.0 filters block a lot of first tries: the face guardrail on real-person references, and the copyright guardrail on trademarked characters. Both live at the model layer.
Seedance 2.0 Fast Tier vs Standard: The Pricing Math
Image-to-Video: Character Consistency Patterns That Hold
Every topic we cover.
Technique
- Image-to-Video: Character Consistency Patterns That Hold
- Native Audio Generation: When to Keep It On and When to Skip
Troubleshooting
Pricing
Integration
Prompting
Use case
Comparisons
Comparison
Workflow
Latest posts.
Integrating Seedance 2.0 Into a Production Render Queue
Async queue pattern for Seedance 2.0: submit with webhooks, poll for status, track cost per job, and retry around the face filter without burning budget.
Native Audio Generation: When to Keep It On and When to Skip
Prompting Seedance 2.0 with 12-File Multimodal References
Reference-to-Video: Building a Brief With the Rule of 12
Seedance 2.0 vs Kling 3.0 Pro vs Veo 3.1: Who Wins When
When to Fall Back to Seedance 1.5 Pro
Seedance 2.0 caps at 720p while 1.5 Pro still outputs 1080p. A guide to when the older model wins: broadcast deliverables, upscale pipelines, and archival clients who contract 1080p masters.
The numbers.
What this publication is and isn't, in numbers.
Each one is dated, second-person, and opinionated.
Filter by the constraint you care about.
Total length of every post in the archive.
Not a single U+2014 survives our ship check.
Editor-selected cover stories.
Custom covers on every featured post.
What we write about most.
Keyword frequency across every post. The bigger the word, the more often we come back to it.
Keep reading. The full blog is open.
No gates, no sign-up, no newsletter. Just 10 dated posts on Seedance 2.0.
Browse the full blog
Sort by date, filter by category, search by keyword.