Fish Audio Review: S2 Voice Cloning & Pricing

Reviewed July 28, 2026. Fish Audio’s current model pages, changelog and API pricing were checked for this refresh. This review is based on those sources, not on hands-on testing.

Verdict: Fish Audio is most compelling when expressive speech and fast voice cloning matter more than a polished, all-purpose editing suite. S2 gives creators unusually direct control through inline performance instructions, while the API is straightforward to price. The trade-off is that voice quality still needs script-by-script review, and cloning somebody’s voice requires explicit permission.

What Fish Audio Is Now

Fish Audio combines a web speech studio with developer APIs for text-to-speech, transcription, voice design, real-time streaming and voice cloning. The most important recent change is the S2 family. Fish’s changelog dates the original S2 launch to March 10, 2026; the current developer pricing table also lists S2.1 Pro.

Capability	What it does	What to verify
Expressive TTS	Uses inline instructions to shape delivery, emotion and vocal detail.	Listen for pronunciation, pacing and unwanted performance shifts.
Voice cloning	Builds a reusable voice from reference audio.	Use clean source audio and document the speaker’s consent.
Streaming API	Returns audio progressively for conversational applications.	Measure end-to-end latency with production-sized requests.
Multi-speaker work	Supports dialogue and longer narrative workflows.	Check speaker separation and consistency across the full piece.

S2 Control Is the Main Reason to Choose It

S2’s central idea is control inside the script. Fish describes open-domain inline tags that can direct delivery at the word or phrase level, so the creator does not have to regenerate an entire passage for one missed beat. That matters for dialogue, audiobooks, game characters and localization, where a technically intelligible read can still be dramatically wrong.

This control is not a substitute for audio direction. Treat each tag as an instruction to audition, not a guaranteed result. Names, abbreviations, numbers and mixed-language passages deserve a separate listening pass.

Fish Audio API Pricing

Fish’s API is pay as you go, with no subscription fee or monthly minimum for API access. Its July 2026 documentation lists S2.1 Pro, S2 Pro and S1 at $15 per million UTF-8 input bytes. Fish equates that unit to roughly 180,000 English words or about 12 hours of speech, but real output length varies by language and delivery. Transcription and voice design use different billing units.
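Because billing is by UTF-8 input bytes rather than characters or words, a script’s cost can be estimated before any request is sent. The sketch below assumes the $15 per million bytes rate quoted above applies to the chosen model; the function names are illustrative and are not part of Fish’s SDK.

```python
# Rate quoted in this review for S2.1 Pro, S2 Pro and S1 (USD per million bytes).
PRICE_PER_MILLION_BYTES_USD = 15.0


def utf8_bytes(text: str) -> int:
    """Count the UTF-8 bytes in a script, which is the billing unit."""
    return len(text.encode("utf-8"))


def estimate_tts_cost(text: str,
                      price_per_million: float = PRICE_PER_MILLION_BYTES_USD) -> float:
    """Estimate the USD cost of synthesizing `text` at a per-million-byte rate."""
    return utf8_bytes(text) / 1_000_000 * price_per_million


if __name__ == "__main__":
    # "hello" is 5 bytes, while "你好" is 6 bytes for only two characters.
    for sample in ("hello", "你好"):
        print(sample, utf8_bytes(sample), f"${estimate_tts_cost(sample):.8f}")
```

Non-ASCII scripts use more bytes per character, so a character count understates their cost. Taking Fish’s own equivalence at face value, a million bytes works out to roughly 5.6 bytes per English word and about $1.25 per hour of speech, although actual duration varies by language and delivery.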

That makes the API easy to estimate, but cost is only one production constraint. Concurrency depends on account tier, and a long narration occupies a request slot much longer than a short voice-agent reply. Benchmark your actual workload before promising throughput.
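A benchmark of that kind can start with a small timing harness. The sketch below is written against an assumed interface, not Fish’s actual client: `stream_fn` stands for any callable that takes text and yields audio chunks as bytes, and `fake_stream` is a stand-in used only so the example runs without network access.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(stream_fn: Callable[[str], Iterable[bytes]], text: str) -> dict:
    """Time a streaming synthesis call: first chunk, full response, bytes received."""
    start = time.perf_counter()
    first_chunk_at: Optional[float] = None
    total_bytes = 0
    for chunk in stream_fn(text):
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter()
        total_bytes += len(chunk)
    end = time.perf_counter()
    return {
        # None means the stream produced no audio at all.
        "time_to_first_chunk_s": None if first_chunk_at is None else first_chunk_at - start,
        "total_time_s": end - start,
        "audio_bytes": total_bytes,
    }


def fake_stream(text: str) -> Iterable[bytes]:
    """Hypothetical stand-in for a real streaming client; yields three small chunks."""
    for _ in range(3):
        time.sleep(0.01)
        yield b"\x00" * 10


if __name__ == "__main__":
    print(measure_stream(fake_stream, "A short voice-agent reply."))
```

Running the same harness with a short reply and a full narration chapter shows how long each kind of request holds a slot. That duration is the number to weigh against the concurrency limit of the account tier.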

Who Fish Audio Suits

Audio-first creators: narration, characters and multilingual dialogue where performance control matters.
Developers: applications that need streaming speech, cloning or programmatic voice management.
Localization teams: projects that can pair generation with native-speaker review.
Not a fit: teams expecting one-click, legally cleared celebrity voices or error-free long-form output.

Limits and Responsible Use

Voice cloning is the product’s sharpest feature and its biggest governance risk. Clone only voices you own or have permission to use; keep that permission tied to the model and project; label synthetic audio when context could mislead; and restrict who can generate with a shared voice. For public-facing work, review current terms and the output in full before release.

Fish Audio is a strong shortlist choice, not an automatic winner. Compare it with another production voice on your hardest scripts, not on a friendly demo sentence. Kingy’s coverage of NVIDIA Fugatto and the Miso One voice model provides useful context for the broader generative-audio field.
