Sunday, June 14, 2026
Kingy AI

Rio 3.5 Open 397B: A Serious Open Model Release, Or A Benchmark Claim In Need Of An Audit?

by Curtis Pyke
June 13, 2026
in AI News
Reading Time: 24 mins read

Rio 3.5 Open 397B is a real, large, open-weight Hugging Face release tied to Rio de Janeiro’s municipal technology company. Its specs are impressive. Its benchmark claims are even bigger. The right response is neither hype nor dismissal, but verification.

As of June 14, 2026, this analysis treats the Hugging Face repository and model card as primary sources while distinguishing confirmed repository evidence from first-party benchmark claims that have not yet been independently reproduced.

TL;DR

Rio 3.5 Open 397B is a newly published Hugging Face model under prefeitura-rio/Rio-3.5-Open-397B. The repository is real, large, and downloadable: the Hugging Face API lists 97 safetensor shards, 110 repository files, and safetensor metadata totaling 403,397,928,944 tensor parameters, with an index total size of 806,795,875,168 bytes, or roughly 807 GB decimal / 751 GiB binary. The model card identifies the base model as Qwen/Qwen3.5-397B-A17B, tags the release as MIT-licensed, and says it was developed by IplanRIO, the municipal IT company associated with Rio de Janeiro’s city government. Sources: Hugging Face model card, Hugging Face API, raw config, safetensors index
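The size figures above can be cross-checked with simple arithmetic. The sketch below assumes BF16 values occupy 2 bytes and F32 values 4 bytes, and uses the 8,640 F32 count reported in the safetensors metadata; the variable names are illustrative, not taken from the repository.

```python
# Cross-check the reported safetensors numbers (all inputs are figures quoted in this article).
TOTAL_PARAMS = 403_397_928_944   # tensor parameters reported by the Hugging Face API
F32_PARAMS = 8_640               # values stored as F32; the rest are assumed to be BF16
INDEX_BYTES = 806_795_875_168    # total size from the safetensors index

BYTES_BF16 = 2
BYTES_F32 = 4

bf16_params = TOTAL_PARAMS - F32_PARAMS
expected_bytes = bf16_params * BYTES_BF16 + F32_PARAMS * BYTES_F32

size_gb = INDEX_BYTES / 10**9    # decimal gigabytes
size_gib = INDEX_BYTES / 2**30   # binary gibibytes

print(f"expected bytes from dtype counts: {expected_bytes:,}")
print(f"matches index total: {expected_bytes == INDEX_BYTES}")
print(f"{size_gb:.1f} GB decimal, {size_gib:.1f} GiB binary")
```

Under those assumptions the dtype counts reproduce the index total exactly, which is a useful sign that the parameter count and the byte count describe the same set of tensors.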

The headline claims are substantial: about 397B total parameters, about 17B active parameters, a sparse Mixture-of-Experts architecture, multimodal input, multilingual coverage, a 1,010,000-token context window, and first-party benchmark results that place Rio close to leading open and proprietary models. The important caveat is that most Rio-specific benchmark claims appear only on the Rio model card at the time of writing. I found repository evidence for the weights, architecture, base-model relationship, modality configuration, and a permissive license tag. I did not find independent Rio benchmark reproductions, a published Rio training report, or repo-local code implementing the claimed SwiReasoning inference path.

That makes Rio 3.5 Open 397B interesting, but not yet settled. Builders should treat it as an auditable open-weight release with aggressive first-party claims, not as an independently proven new frontier champion. For broader context on choosing between closed, open, and local models, see Kingy AI’s guide: Which AI Model Should You Use?

Why This Release Matters

The unusual part is not just the scale. Open-weight MoE releases at hundreds of billions or trillions of total parameters are now part of the competitive landscape. Qwen, DeepSeek, Kimi, MiniMax, Meta, Z.ai/GLM, and others have already made large open or open-weight models central to the AI race.

The unusual part is the name on the repository. The Rio model card says the model was developed by IplanRIO – Empresa Municipal de Informatica e Planejamento S.A., which it describes as the municipal IT company of Rio de Janeiro’s city government. IplanRIO’s official site describes the organization as the municipal company responsible for administering the city’s information and communications technology resources. Sources: Rio model card, IplanRIO official site

That is a different release profile from the usual frontier-lab playbook. Frontier systems are normally launched by OpenAI, Anthropic, Google DeepMind, xAI, Alibaba/Qwen, DeepSeek, Moonshot, Meta, Mistral, MiniMax, or Z.ai. A city-government-associated technology company publishing a Qwen-derived, frontier-scale open-weight model is therefore notable even before the benchmark table enters the room.

It also lands at the center of a live strategic question: will the next useful AI stack be controlled mostly through closed APIs, or will open models become good enough for serious agentic software, government, and enterprise work? Kingy has covered that broader sovereignty question here: Open-source AI models, local LLMs, hardware, and AI sovereignty.

What Rio 3.5 Open 397B Is

Rio 3.5 Open 397B appears to be a post-trained derivative of Qwen3.5-397B-A17B. The Rio README front matter lists base_model: Qwen/Qwen3.5-397B-A17B, and the Hugging Face tags include base_model:Qwen/Qwen3.5-397B-A17B. The text configuration is also structurally aligned with Qwen3.5: qwen3_5_moe_text, 60 layers, 512 experts, 10 selected experts per token, 4096 hidden size, 32 attention heads, and 2 key-value heads. Sources: Rio README, Rio config, Qwen3.5-397B-A17B card

In plain English, it is not a dense 397B model that activates every parameter for every token. It is a Mixture-of-Experts model. In an MoE model, many expert subnetworks exist, but only a subset is routed for each token. That is why the model card can claim roughly 397B total parameters and roughly 17B active parameters. Total parameters shape storage and memory requirements; active parameters shape part of per-token compute cost. Both matter.
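To make the routing idea concrete, here is a minimal top-k routing sketch in plain Python. Only the expert count (512) and the number of selected experts (10) come from the Rio config; the random logits stand in for a learned router, and `route_token` is a hypothetical helper, not code from the Rio or Qwen repositories.

```python
import math
import random

NUM_EXPERTS = 512   # from the Rio config
TOP_K = 10          # experts selected per token, from the Rio config

def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Softmax over the selected logits only, so the mixing weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in for a learned router
selection = route_token(logits)

print(f"experts used for this token: {len(selection)} of {NUM_EXPERTS}")
print(f"fraction of experts active: {TOP_K / NUM_EXPERTS:.2%}")
```

Note that the share of experts used per token (10 of 512) is smaller than the claimed share of active parameters (roughly 17B of 397B), presumably because components outside the expert layers, such as attention and embeddings, run for every token.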

AI-generated illustration: Mixture-of-Experts routing. No benchmark or architectural data is embedded in this image.

Technical Specs

| Spec | What the source says | Verification status |
|---|---|---|
| Repository | prefeitura-rio/Rio-3.5-Open-397B on Hugging Face | Confirmed |
| Developer claim | IplanRIO / municipal IT company of Rio de Janeiro | Claimed by model card; IplanRIO role corroborated by official site |
| Base model | Qwen/Qwen3.5-397B-A17B | Confirmed in README metadata and HF tags |
| Architecture | Qwen3_5MoeForConditionalGeneration; text config qwen3_5_moe_text | Confirmed in config |
| Total parameters | Model card: about 397B; HF safetensors metadata: 403.397B tensor parameters | Confirmed with counting caveat |
| Active parameters | Model card/base model: about 17B active | Claimed by model card and Qwen base card |
| Text layers | 60 | Confirmed in config |
| Experts | 512 experts, 10 selected experts per token | Confirmed in config |
| Hidden size | 4096 | Confirmed in config |
| Attention heads | 32 query heads, 2 key-value heads | Confirmed in config |
| Context window | README claim: 1,010,000 tokens; config: max_position_embeddings 262,144; examples use 1,048,576 in vLLM/SGLang | Mixed evidence; long-context serving likely requires overrides/scaling |
| Modalities | Hugging Face pipeline: image-text-to-text; config includes image/video token IDs and vision config | Plausible/confirmed at config level; needs independent task testing |
| Languages | README lists multilingual support and front matter lists pt, en | Claimed; broad language quality not independently verified here |
| Precision | Safetensors metadata: BF16 parameters plus 8,640 F32 values | Confirmed by HF API |
| Weight footprint | Safetensors index total size: 806,795,875,168 bytes | Confirmed |
| Repository files | 97 .safetensors shards; 110 sibling files via HF API; no repo-local .py implementation files observed | Confirmed at retrieval time |
| License | README front matter license: mit; HF tags license:mit; no separate LICENSE file observed in the sibling list I retrieved | MIT-tagged, but package-level license artifacts should be audited before commercial use |
| Inference examples | Transformers, vLLM, and SGLang examples in README | Confirmed |
| Quantizations | Not part of the original repo files reviewed; Hugging Face model search shows derivative quantized listings may appear | Verify each derivative separately |

Chart: repository metadata versus prose claims

The context-window caveat deserves special attention. Rio’s README lists a 1,010,000-token context window and gives vLLM/SGLang examples using 1,048,576 tokens, but config.json sets max_position_embeddings to 262,144. Qwen’s own Qwen3.5 card says Qwen3.5 “natively supports” up to 262,144 tokens and gives YaRN/RoPE-scaling instructions for reaching 1,010,000 tokens. The safest interpretation is that 262K is the native, configured context, while roughly 1M is a long-context serving mode that likely depends on scaling overrides and memory budget. Sources: Rio README, Rio config, Qwen3.5 card long-context section
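The relationship between the three context figures is easier to see as ratios. The sketch below only does arithmetic on the numbers quoted above; the `rope_override` dictionary is a hypothetical illustration in the shape Transformers commonly uses for YaRN-style scaling, and its key names and factor are assumptions that should be checked against the Qwen3.5 card before use.

```python
NATIVE_CONTEXT = 262_144      # max_position_embeddings in Rio's config.json
README_CONTEXT = 1_010_000    # context window claimed in the README
EXAMPLE_CONTEXT = 1_048_576   # length used in the README's vLLM/SGLang examples

def scaling_factor(target_tokens, native_tokens=NATIVE_CONTEXT):
    """Ratio by which the native window must be stretched to reach target_tokens."""
    return target_tokens / native_tokens

print(f"README claim needs  ~{scaling_factor(README_CONTEXT):.2f}x the native window")
print(f"Serving examples use {scaling_factor(EXAMPLE_CONTEXT):.2f}x the native window")

# Hypothetical override for illustration only; key names and factor are assumptions.
rope_override = {
    "rope_type": "yarn",
    "factor": scaling_factor(EXAMPLE_CONTEXT),
    "original_max_position_embeddings": NATIVE_CONTEXT,
}
print(rope_override)
```

The serving-example length is exactly four times the configured window, which is consistent with reading the 1M figure as a scaled mode rather than a native one.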

AI-generated illustration: long-context processing. The verified config value is 262,144 positions; the 1M-token claim is a README/serving-mode claim.

Benchmark Claims

The Rio model card includes a large benchmark table covering agentic coding, software engineering, reasoning, math, multilingual, multimodal, instruction following, agents, and an estimated economic-value benchmark. The repository also includes the following benchmark graphic.

Hugging Face benchmark chart for Rio 3.5 Open 397B
Source image from the Rio 3.5 Open 397B Hugging Face repository: rio-3.5-open-benchmarks.png. Treat it as a first-party chart unless independently reproduced.

Here is the core issue: benchmark numbers are not all equal. A number can be independently reproduced, benchmark-host reported, vendor reported, model-card-only, or copied from a comparison table with unclear harness settings. Rio’s model-card numbers should currently be treated as first-party/model-card-only unless a benchmark host or third party publishes reproducible Rio runs.

| Category | Benchmark | Rio 3.5 Open 397B | Qwen3.5 base in Rio card | Source status |
|---|---|---|---|---|
| Agentic coding | Terminal-Bench 2.1 | 70.8 | 52.5 | Rio model card only |
| Agentic coding | DeepSWE | 23.0 | 6.0 | Rio model card only |
| Agentic coding | SWE-Bench Pro | 58.1 | 50.9 | Rio model card only |
| Agentic coding | SWE-Bench Verified | 80.2 | 76.2 | Rio model card only |
| Agentic coding | SWE-Bench Multilingual | 77.0 | 69.3 | Rio model card only |
| Knowledge/reasoning | GPQA Diamond | 90.9 | 88.4 | Rio model card only |
| Knowledge/reasoning | Humanity’s Last Exam | 36.5 | 28.7 | Rio model card only |
| Knowledge/reasoning | MMLU-Pro | 88.0 | 87.8 | Rio model card only |
| Knowledge/reasoning | MMLU-Redux | 94.6 | 94.9 | Rio model card only |
| Knowledge/reasoning | SuperGPQA | 72.3 | 70.4 | Rio model card only |
| Knowledge/reasoning | Apex | 29.2 | 9.4 | Rio model card only |
| Math | HMMT 2026 Feb | 93.9 | 87.9 | Rio model card only |
| Math | IMOAnswerBench | 89.5 | 80.9 | Rio model card only |
| Multilingual | MMMLU | 89.8 | 88.5 | Rio model card only |
| Multilingual | MMLU-ProX | 85.6 | 84.7 | Rio model card only |
| Multimodal | MMMU-Pro | 78.4 | 79.0 | Rio model card only |
| Multimodal | MathVision | 89.1 | 88.6 | Rio model card only |
| Multimodal | VideoMMMU | 81.6 | 84.7 | Rio model card only |
| Tool/instruction | MCP-Atlas | 74.2 | 74.2 | Rio model card only |
| Tool/instruction | IFBench | 78.4 | 76.5 | Rio model card only |
| Tool/instruction | IFEval | 93.4 | 92.6 | Rio model card only |
| Economic value | GDPval estimated | 1533 | 1200 | Rio model card only; methodology unclear from card |

Chart: selected coding and agent benchmarks from the Rio model card

The benchmark families themselves are real. Terminal-Bench evaluates terminal agents in realistic command-line environments; SWE-Bench Verified is a human-validated subset of GitHub issue-resolution tasks; SWE-Bench Pro extends software-engineering evaluation to harder enterprise-style tasks; DeepSWE is a long-horizon software-engineering benchmark; and MCP-Atlas evaluates tool use across real MCP servers. Sources: Terminal-Bench, SWE-Bench Verified, SWE-Bench Pro, DeepSWE, MCP-Atlas

But a real benchmark is not the same thing as a verified score. A model-card table does not tell us whether the run used a standard scaffold, whether the same harness was used across models, whether tool access was comparable, whether contamination screens were run, whether the benchmark version was the same, or whether the reported comparison values came from benchmark hosts, vendors, or internal reruns.

AI-generated illustration: benchmark auditing. The article’s benchmark tables are sourced from the Rio model card unless otherwise noted.

Reported Gains Over Qwen3.5 Base

The Rio card’s most meaningful comparison is not Rio versus every frontier model. It is Rio versus its base model. Because the model claims to be post-trained from Qwen3.5-397B-A17B, the base-model delta is the natural place to look first.

Chart: reported gains over Qwen3.5 base

The reported gains are large in some places: +18.3 on Terminal-Bench 2.1, +17.0 on DeepSWE, +19.8 on Apex, +8.6 on IMOAnswerBench, +7.8 on HLE, and +7.7 on SWE-Bench Multilingual. If independently reproduced, that would be a meaningful post-training result. But the word “if” is doing real work.
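The deltas can be recomputed directly from the model-card table. The sketch below copies a subset of the (Rio, Qwen3.5 base) score pairs from that table and sorts them by difference; the `deltas` helper is illustrative only.

```python
# (Rio score, Qwen3.5 base score) pairs copied from the Rio model-card table above.
SCORES = {
    "Terminal-Bench 2.1": (70.8, 52.5),
    "DeepSWE": (23.0, 6.0),
    "SWE-Bench Pro": (58.1, 50.9),
    "SWE-Bench Verified": (80.2, 76.2),
    "SWE-Bench Multilingual": (77.0, 69.3),
    "Humanity's Last Exam": (36.5, 28.7),
    "Apex": (29.2, 9.4),
    "IMOAnswerBench": (89.5, 80.9),
    "MMLU-Redux": (94.6, 94.9),
    "MMMU-Pro": (78.4, 79.0),
    "VideoMMMU": (81.6, 84.7),
}

def deltas(scores):
    """Return (benchmark, rio_minus_base) pairs sorted from largest gain to largest loss."""
    rows = [(name, round(rio - base, 1)) for name, (rio, base) in scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

for name, delta in deltas(SCORES):
    print(f"{name:<24} {delta:+.1f}")
```

Sorting this way also surfaces the rows where the card reports Rio below its base model (MMLU-Redux, MMMU-Pro, and VideoMMMU), which is worth keeping in view alongside the headline gains.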

One Hugging Face discussion adds useful color. A user asked whether the Rio benchmarks were run with SwiReasoning active and whether the model would still improve without latent reasoning. A reply from Sangu1nius answered “With!” and gave an ablation-style mini-table: Qwen base at 80.9 on IMOAnswerBench, 50.9 on SWE-Bench Pro, and 9.4 on Apex; “+ Training” at 84.5, 54.8, and 22.9; and “+ Latent Reasoning” at 89.5, 58.1, and 29.2. That suggests the published Rio headline numbers may depend materially on the latent-reasoning inference path. Source: HF discussion #2

That is important because not every inference stack can implement soft/latent reasoning. The same discussion explicitly notes that engines limited to discrete token generation, such as llama.cpp, cannot currently implement that style of continuous soft-embedding reasoning. Source: HF discussion #2
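The mini-table from the discussion thread can be split into a training contribution and a latent-reasoning contribution. The sketch below uses only the three rows quoted above and treats the stages as additive, which is a simplifying assumption; `split_gain` is a hypothetical helper.

```python
# Scores quoted in the Hugging Face discussion reply: (base, + training, + latent reasoning).
ABLATION = {
    "IMOAnswerBench": (80.9, 84.5, 89.5),
    "SWE-Bench Pro": (50.9, 54.8, 58.1),
    "Apex": (9.4, 22.9, 29.2),
}

def split_gain(base, trained, with_latent):
    """Split the total gain into a training part and a latent-reasoning part."""
    total = with_latent - base
    from_training = trained - base
    from_latent = with_latent - trained
    return from_training, from_latent, from_latent / total

for name, stages in ABLATION.items():
    training, latent, latent_share = split_gain(*stages)
    print(f"{name:<15} training {training:+.1f}, latent {latent:+.1f} "
          f"({latent_share:.0%} of the total gain)")
```

On these three rows, the latent-reasoning step accounts for roughly a third to more than half of the reported gain over base, so a stack that cannot run it should not be expected to reproduce the headline numbers.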

SwiReasoning: What It Is, And What Is Actually Released

Rio’s model card says the model “features SwiReasoning,” a training-free inference framework from Shi et al. that switches between explicit chain-of-thought reasoning and latent-space reasoning. The arXiv abstract for SwiReasoning describes exactly that: dynamic switching between explicit and latent reasoning guided by entropy trends in next-token distributions, plus a switch-count control mechanism to reduce overthinking. The paper reports average accuracy gains of 1.8%-3.1% and token-efficiency gains of 57%-79% under constrained budgets across mathematics, STEM, coding, and general benchmarks. Sources: SwiReasoning arXiv, SwiReasoning GitHub

The idea is straightforward, at least at a high level:

  • Explicit reasoning emits intermediate natural-language reasoning tokens. It is inspectable, but token-expensive.
  • Latent reasoning keeps some reasoning in continuous hidden space or soft-embedding form. It may be more token-efficient, but it is harder to inspect and harder to support in ordinary inference engines.
  • Switching tries to use entropy/confidence signals to decide when to explore in latent space and when to commit in explicit text.
AI-generated illustration: explicit step-by-step reasoning and latent reasoning paths converging. It is conceptual, not a diagram from the SwiReasoning paper.
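The switching idea in the list above can be illustrated with a toy controller. This is a deliberately simplified sketch and not the SwiReasoning algorithm: it assumes a rule of "go latent when entropy rises, go explicit when it falls" with a hard cap on mode changes, and every name in it (`choose_modes`, `max_switches`) is hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose_modes(distributions, max_switches=2):
    """Toy controller: go latent when entropy rises, explicit when it falls.

    Once max_switches mode changes have happened, the mode is frozen.
    """
    mode = "explicit"
    switches = 0
    previous = None
    modes = []
    for probs in distributions:
        current = entropy(probs)
        if previous is not None and switches < max_switches:
            wanted = "latent" if current > previous else "explicit"
            if wanted != mode:
                mode = wanted
                switches += 1
        modes.append(mode)
        previous = current
    return modes

steps = [
    [0.90, 0.05, 0.05],   # confident
    [0.40, 0.30, 0.30],   # uncertainty rises
    [0.34, 0.33, 0.33],   # still rising
    [0.80, 0.10, 0.10],   # confidence returns
    [0.40, 0.30, 0.30],   # rises again, but the switch budget is spent
]
print(choose_modes(steps))
```

The switch cap is what keeps the toy controller from oscillating indefinitely, which is the role the article attributes to the paper's switch-count control. A real implementation also has to feed soft embeddings back into the model during latent steps, which is the part ordinary token-by-token engines do not expose.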

The catch: the Rio repository files I reviewed do not include repo-local Python code implementing SwiReasoning. The Hugging Face API file list includes model weights, configs, tokenizer files, chat_template.jinja, and preprocessors, but no .py files. The README examples show standard Transformers loading plus vLLM and SGLang serving commands. That does not disprove that the model was trained to work well with SwiReasoning, but it does mean the integration is not self-evident from the released repository alone. Sources: Rio API, Rio README
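A file-listing check like the one described here is easy to repeat. The sketch below counts files by extension for any list of repository filenames; the example listing is hypothetical and is not the actual Rio file list.

```python
from collections import Counter
from pathlib import PurePosixPath

def count_by_suffix(filenames):
    """Count repository files by extension; files with no extension are grouped under ''."""
    return Counter(PurePosixPath(name).suffix for name in filenames)

def has_python_code(filenames):
    """True if any file in the listing is a Python source file."""
    return count_by_suffix(filenames)[".py"] > 0

# Hypothetical listing for illustration; it is not the actual Rio file list.
example_listing = [
    "README.md",
    "config.json",
    "chat_template.jinja",
    "model-00001-of-00097.safetensors",
    "model-00002-of-00097.safetensors",
    "model.safetensors.index.json",
]
print(count_by_suffix(example_listing))
print("contains .py files:", has_python_code(example_listing))
```

To run it against the live repository, one option is to pass in the result of `list_repo_files("prefeitura-rio/Rio-3.5-Open-397B")` from the `huggingface_hub` package. That call needs network access and reflects the repository at the time it runs, so results may differ from the retrieval described in this article.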

For readers who want a fuller primer on reasoning tokens and chain-of-thought tradeoffs, see Kingy AI’s explainer: Chain-of-thought explained.

How It Compares To Frontier Labs

Rio’s first-party numbers suggest a model that may be competitive with leading open-weight systems and within striking distance of some frontier API models on selected rows. But frontier comparison is where benchmark hygiene matters most.

OpenAI’s GPT-5.5 page, Anthropic’s Claude Opus 4.8 page, Google’s Gemini 3.1 Pro model page, and xAI’s Grok 4 materials all publish their own benchmark packages, often with different benchmark versions, scaffolds, tools, and inference settings. OpenAI’s GPT-5.5 page, for example, reports Terminal-Bench 2.0 and SWE-Bench Pro values, while Rio’s card lists Terminal-Bench 2.1. Those are not automatically comparable. Sources: OpenAI GPT-5.5, Anthropic Claude Opus 4.8, Google Gemini 3.1 Pro, xAI Grok 4

The real comparison looks like this:

| Dimension | Rio 3.5 Open 397B | Frontier lab models |
|---|---|---|
| Weights | Public Hugging Face weights | Usually closed, API-only |
| License/control | MIT-tagged on HF, but license package should be audited | API terms, no weights |
| Reproducibility | Weights are available; eval harness not published in repo | Evals usually vendor-reported; some third-party leaderboards exist |
| Deployment | Possible self-hosting, but ~807 GB BF16 footprint before KV cache and serving overhead | Hosted APIs hide infra complexity |
| Latency/cost | MoE active-parameter design helps, but full weights still need large distributed memory | API price/latency depends on vendor tier |
| Long context | 262K config; 1M serving claim likely via RoPE/YaRN-style scaling | Many frontier APIs advertise 1M-class contexts, but details vary |
| Multimodal | Config supports image/video tokens and vision tower | Frontier labs generally have mature multimodal products |
| Agentic ecosystem | README points to vLLM/SGLang/Transformers and Qwen Code | Closed labs have integrated agents, connectors, computer use, and managed tools |

Rio’s advantage is control. A user can download the weights, inspect the config, run local or private infrastructure, and fine-tune or adapt subject to license review and hardware budget. Frontier labs’ advantage is operational maturity: hosted reliability, safety/process documentation, product integrations, SDKs, tool ecosystems, and larger public testing footprints.

If you are building agents, that distinction matters. Agentic performance is not only “model IQ”; it is model plus tools, memory, sandboxing, retry logic, browser/terminal harness, context management, and eval discipline. Kingy has a broader implementation-oriented guide here: The state of AI agents in 2026, and an MCP-specific explainer here: What is Model Context Protocol?.

How It Compares To Heavyweight Open Models

Rio enters a crowded heavyweight open-weight field. The relevant peers are not only dense Llama-class models, but sparse MoE systems that try to combine huge total parameter counts with smaller active compute.

| Model | Public source | Parameters / active parameters | Context claim | Modality | License/open status |
|---|---|---|---|---|---|
| Rio 3.5 Open 397B | Hugging Face | Card: ~397B / ~17B active; HF metadata: 403.4B tensor params | Config 262K; README 1.01M / examples 1.048M | Image-text-to-text config | MIT-tagged HF repo; no separate LICENSE file observed in API sibling list |
| Qwen3.5-397B-A17B | Hugging Face/Qwen | 397B / 17B active | Native 262K; hosted Plus says 1M default; YaRN instructions to ~1.01M | Multimodal | Apache-2.0 tag |
| DeepSeek-V4-Pro | Hugging Face | 1.6T / 49B active | 1M | Text-focused MoE card; check repo for modality details | MIT |
| Kimi-K2.6 | Hugging Face/Moonshot | 1T / 32B active | 256K | Image-text-to-text; MoonViT vision encoder | Modified MIT |
| MiniMax-M3 | Hugging Face | ~428B / ~23B active | 1M | Native multimodal text/image/video | MiniMax community license |
| GLM-5 | Hugging Face/Z.ai | 744B / 40B active | Z.ai says long-context with DSA; HF snippet cites 744B/40B | Model family varies by endpoint/card | MIT according to Z.ai blog/HF card |
| Llama 4 Scout / Maverick | Meta | Scout: 17B active, 16 experts; Maverick: 17B active, 128 experts | Scout advertised up to 10M context | Natively multimodal | Open-weight Llama license |

Sources: Rio model card, Qwen3.5-397B-A17B, DeepSeek-V4-Pro, Kimi-K2.6, MiniMax-M3, GLM-5, Meta Llama 4

Against those peers, Rio’s positioning is narrower than the model-card headline might imply:

  • Versus Qwen3.5, Rio is best understood as a post-training and inference-strategy claim on top of the same base family. The reported gains over base are the key thing to reproduce.
  • Versus DeepSeek-V4-Pro, Rio is smaller in total and active parameters, and its license tag is attractive, but DeepSeek’s own V4 card claims a much larger 1.6T/49B MoE and 1M context.
  • Versus Kimi-K2.6, Rio is smaller and appears less obviously agent-productized. Kimi has a stronger public narrative around long-horizon coding and agent orchestration; Rio’s SWE-Bench Pro result is slightly lower than Kimi’s value in Rio’s own comparison table.
  • Versus MiniMax-M3, Rio is similar in total-parameter class but lower in claimed active parameters. MiniMax emphasizes native multimodality and sparse attention for 1M context.
  • Versus GLM-5, Rio is smaller and less documented, but potentially easier to reason about as a Qwen-derived release if the Qwen ecosystem supports it.
  • Versus Llama 4, Rio is in a different open-weight lineage: Qwen MoE rather than Meta’s Llama MoE family. Llama’s ecosystem maturity remains a major counterweight.

Kingy has covered some of these peer families in more detail here: DeepSeek V4 deep dive, Kimi K2.6 analysis, and AI model picker.

Deployment Realities

The “17B active” number can be misleading if read casually. It does not mean Rio is as easy to run as a 17B dense model. Sparse MoE still requires the full expert weights to be present somewhere in the serving system. For Rio, the safetensors index alone reports about 807 GB of model data. Add KV cache, allocator overhead, tensor parallelism, communication overhead, and long-context memory pressure, and this becomes a serious multi-GPU or hosted-inference project.

The README examples use --tensor-parallel-size 8 for vLLM and --tp 8 for SGLang. That does not prove eight GPUs are always sufficient, because GPU type, precision, quantization, context length, batch size, and KV-cache strategy dominate the practical answer. It does, however, signal that the authors expect distributed serving. Source: Rio README

Quantized derivatives may make experimentation easier, and Hugging Face search already exposes derivative quantized listings tied to the Rio base-model metadata. But quantization is not magic. It can reduce memory footprint, but long-context KV cache, expert routing, throughput, and quality degradation still need testing. Source: Hugging Face quantized model search
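A rough weights-only estimate shows why distributed serving is the default expectation. The sketch below assumes weights split evenly across eight tensor-parallel ranks and ignores KV cache, activations, quantization scales, and runtime overhead, so real requirements will be higher; the 8-bit and 4-bit rows are hypothetical precisions, not formats published in the Rio repository.

```python
TOTAL_PARAMS = 403_397_928_944   # tensor parameters from the Hugging Face metadata
TENSOR_PARALLEL = 8              # value used in the README's vLLM/SGLang examples

def weights_gb(params, bits_per_param):
    """Weight storage in decimal GB, ignoring quantization scales and other overhead."""
    return params * bits_per_param / 8 / 10**9

for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    total = weights_gb(TOTAL_PARAMS, bits)
    per_rank = total / TENSOR_PARALLEL
    print(f"{label:>5}: {total:7.1f} GB total, {per_rank:6.1f} GB per rank "
          f"at tensor-parallel {TENSOR_PARALLEL}")
```

Under these assumptions, BF16 weights alone come to about 100 GB per rank at an eight-way split, before any KV cache is allocated. That is why GPU memory size and precision matter more than the tensor-parallel number taken by itself.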

Claim Audit

| Claim | Rating | Why |
|---|---|---|
| The Hugging Face repo exists and contains large model weights | Confirmed | HF API lists the repo, 97 safetensor shards, and 806.8 GB index total size |
| It is based on Qwen3.5-397B-A17B | Confirmed | README front matter and HF tags list the Qwen base model |
| It uses a Qwen3.5 MoE architecture | Confirmed | Config lists Qwen3_5MoeForConditionalGeneration, 512 experts, 10 experts per token |
| It is roughly 397B total / 17B active | Plausible/mostly confirmed | Model card and Qwen base card say this; HF safetensors metadata counts 403.4B tensor parameters |
| It has a 1,010,000-token context window | Plausible but needs caveat | README claims it; config says 262,144 max positions; Qwen base docs describe 1.01M via RoPE/YaRN scaling |
| It is multimodal | Plausible/config-confirmed | HF pipeline is image-text-to-text and config has vision/video fields; independent multimodal evals not found |
| It is MIT licensed | Partially confirmed | README metadata and HF tags say MIT; no separate LICENSE file observed in retrieved sibling list |
| It includes SwiReasoning | Unclear | Card claims it and a discussion reply says benchmark numbers used it; repo has no local implementation files |
| It beats or matches leading open models | Unverified | Rio-specific scores are currently model-card-only in sources reviewed |
| It competes with frontier labs | Provisional | It may on selected first-party rows, but frontier comparisons mix benchmark versions, tool settings, and vendor-reported values |
| GDPval estimate of 1533 | Weak / unclear | Model card gives the value but not enough methodology to treat it as a reproducible public score |

Bottom Line

Rio 3.5 Open 397B is worth taking seriously because the weights and config are real, the base model is clear, the parameter footprint is large, and the release comes from an unusual institutional source. If the reported post-training and SwiReasoning gains reproduce, it could be an important open-weight model for coding agents, multilingual work, and high-control deployments.

But the responsible headline is not “municipal model beats frontier labs.” The responsible headline is: a Rio-associated team has released a large Qwen3.5-derived open-weight MoE with bold benchmark claims that now need independent reproduction.

For builders, the next steps are obvious:

  1. Run a small, auditable subset of SWE-Bench Pro, SWE-Bench Verified, Terminal-Bench 2.1, and DeepSWE with a published harness.
  2. Test Rio with and without SwiReasoning, because the discussion-thread ablation implies latent reasoning may account for a meaningful share of the gains.
  3. Verify long-context behavior at 262K, 524K, and ~1M with retrieval, reasoning, and degradation tests.
  4. Audit the license package before commercial deployment.
  5. Benchmark against Qwen3.5-397B-A17B under identical serving conditions.
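For steps 1 and 5, a paired comparison on the same task set is one reasonable way to report results. The sketch below is a generic paired bootstrap over per-task pass/fail outcomes; it is an assumption about how a builder might summarize a run, not a method described in the Rio card, and the outcome lists are made up to show the call shape.

```python
import random

def paired_bootstrap(rio_pass, base_pass, resamples=2000, seed=0):
    """Bootstrap a 95% interval for the pass-rate difference on the same task set.

    rio_pass and base_pass are equal-length lists of 0/1 outcomes, one per task.
    """
    if len(rio_pass) != len(base_pass):
        raise ValueError("both models must be scored on the same tasks")
    n = len(rio_pass)
    rng = random.Random(seed)
    diffs = []
    for _ in range(resamples):
        # Resample task indices with replacement, keeping each pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(rio_pass[i] - base_pass[i] for i in idx) / n)
    diffs.sort()
    observed = (sum(rio_pass) - sum(base_pass)) / n
    low = diffs[int(0.025 * resamples)]
    high = diffs[int(0.975 * resamples) - 1]
    return observed, low, high

# Made-up outcomes for ten tasks, purely to show the call shape.
rio = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
base = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
print(paired_bootstrap(rio, base))
```

Keeping the pairs together matters because both models see the same tasks; an interval that comfortably excludes zero is more persuasive than a single headline delta, especially on a small subset.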

Until then, Rio 3.5 Open 397B should sit in the promising-but-unproven bucket: a real open-weight release, a fascinating governance story, and a benchmark table that deserves a hard, careful second pass.

Source List

Primary Rio sources

  • Rio 3.5 Open 397B Hugging Face model card
  • Rio README
  • Rio raw config.json
  • Rio safetensors index
  • Rio Hugging Face API metadata
  • Rio benchmark image in repo
  • Rio discussion #2 on SwiReasoning
  • IplanRIO official site

Base model and open-weight peers

  • Qwen3.5-397B-A17B
  • DeepSeek-V4-Pro
  • Kimi-K2.6
  • Kimi K2.6 technical blog
  • MiniMax-M3
  • GLM-5
  • Z.ai GLM-5 blog
  • Meta Llama 4 announcement

Frontier-lab context

  • OpenAI GPT-5.5
  • Anthropic Claude Opus 4.8
  • Google DeepMind Gemini 3.1 Pro
  • xAI Grok 4

Benchmark and reasoning sources

  • Terminal-Bench
  • Terminal-Bench 2.1
  • SWE-Bench Verified dataset
  • SWE-Bench Pro dataset
  • DeepSWE
  • MCP-Atlas leaderboard
  • MCP-Atlas paper
  • SwiReasoning paper
  • SwiReasoning code

Internal Kingy AI links used

  • Which AI Model Should You Use?
  • Open-source AI models, local LLMs, hardware, and AI sovereignty
  • The state of AI agents in 2026
  • What is Model Context Protocol?
  • Chain-of-thought explained
  • DeepSeek V4 deep dive
  • Kimi K2.6 analysis
  • AI model picker
Tags: AI benchmarks, Hugging Face, Open Source AI, Open-Weight Models, Qwen, Rio 3.5 Open 397B
