On June 30, 2025, Baidu unveiled ERNIE 4.5, its latest open-source multimodal large language model family. Accessible via GitHub, Hugging Face, and Baidu AI Studio, ERNIE 4.5 excels in reasoning, multimodal processing, and cost efficiency, outperforming competitors such as GPT-4.5, Llama 3, and DeepSeek-V3 on key benchmarks while dramatically reducing deployment costs.
Introduction
Baidu has taken a bold step forward in the open-source AI movement with the release of ERNIE 4.5. As the company’s latest multimodal large language model (LLM), ERNIE 4.5 is engineered for both text and non-text inputs including images, audio, and video, making it a versatile tool for a wide array of applications. The official release on June 30, 2025, under the Apache 2.0 license, positions ERNIE 4.5 as a competitive player in the global AI landscape by combining advanced technical features, cost-efficient deployment, and expansive accessibility.
“ERNIE 4.5 is not just another LLM—it’s a multimodal powerhouse that’s rewriting the rules of what open-source AI can do.” — ERNIE 4.5 Blog
Where to Find and Access ERNIE 4.5
Baidu has made ERNIE 4.5 available on several platforms to ensure that researchers, developers, and enterprises can easily integrate this model into their workflows:
GitHub: The official repository is hosted under the PaddlePaddle organization. It includes the core codebase, model weights, and comprehensive documentation. Visit: GitHub – PaddlePaddle/ERNIE
Hugging Face: For those looking for pre-trained models and configuration files, Baidu’s ERNIE 4.5 is available on Hugging Face, facilitating effortless integration into various projects. Visit: Hugging Face – Baidu ERNIE 4.5
Baidu AI Studio: This platform provides additional tools, training resources, and deployment options, helping users get started with ERNIE 4.5 in a more guided environment. Visit: Baidu AI Studio – ERNIE 4.5
ERNIE Bot API: Baidu also exposes an easy-to-use API on its Intelligent Cloud platform, empowering developers to leverage ERNIE 4.5 via RESTful endpoints (a minimal request sketch follows this list). Visit: ERNIE Bot API
Each platform comes with detailed setup guides and documentation, ensuring smooth onboarding.
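For the API route, a call is an ordinary HTTPS request. The endpoint URL, authentication scheme, and payload fields below are placeholders rather than the documented ERNIE Bot API contract; substitute the values from Baidu Intelligent Cloud's API reference. A minimal sketch in Python:

```python
import requests

# Placeholder values: take the real endpoint, auth scheme, and request schema
# from the ERNIE Bot API documentation on Baidu Intelligent Cloud.
API_URL = "https://<erniebot-endpoint-from-baidu-docs>/chat"
API_KEY = "<your-api-key>"

payload = {
    "messages": [{"role": "user", "content": "Summarize ERNIE 4.5 in one sentence."}],
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json())
```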
How to Use ERNIE 4.5
Leveraging ERNIE 4.5 is straightforward for anyone familiar with Python and deep learning frameworks. Below are the steps and code examples to help you get started:
Installation
Make sure you have PaddlePaddle and ERNIEKit installed on your system. Run the following commands:
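The exact, version-pinned commands depend on your hardware and the release you target; the snippet below is a reasonable default that assumes the CPU build of PaddlePaddle from PyPI and installs ERNIEKit from source via the official GitHub repository (see the repo's README for GPU builds and pinned versions).

```bash
# Install PaddlePaddle (CPU build; use the paddlepaddle-gpu package on CUDA machines)
pip install --upgrade paddlepaddle

# Install ERNIEKit from the official repository (layout may differ by release;
# follow the README in PaddlePaddle/ERNIE if it does)
git clone https://github.com/PaddlePaddle/ERNIE.git
cd ERNIE
pip install -e .
```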
Key Technical Features
ERNIE 4.5 distinguishes itself through a series of innovative technical features that enable superior performance and versatility:
Heterogeneous Mixture-of-Experts (MoE) Architecture
ERNIE 4.5 employs a novel MoE design that separates experts for text and vision modalities while sharing knowledge across them. This architecture allows for:
Modality-Specific Experts: Experts dedicated to text and image processing.
Dynamic Modality Usage: The ability to bypass visual experts during text-only operations, boosting computational efficiency.
Resource Efficiency: Visual experts operate with one-third the intermediate dimension compared to text experts, reducing visual processing computation by 66%.
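The actual implementation lives in the PaddlePaddle codebase; the snippet below is only a conceptual NumPy sketch of the modality-specific routing described above (it omits the shared experts and the learned router). Names and dimensions are illustrative, not taken from the ERNIE source: text tokens are dispatched to text experts only, while vision tokens use a separate pool of experts running at one third the intermediate width.

```python
import numpy as np

HIDDEN = 512
TEXT_FFN = 2048             # intermediate size of a text expert
VISION_FFN = TEXT_FFN // 3  # visual experts run at one third the width

def make_expert(hidden, ffn, rng):
    """A single feed-forward expert: hidden -> ffn -> hidden (ReLU MLP)."""
    w_in = rng.normal(scale=0.02, size=(hidden, ffn))
    w_out = rng.normal(scale=0.02, size=(ffn, hidden))
    return lambda x: np.maximum(x @ w_in, 0.0) @ w_out

rng = np.random.default_rng(0)
text_experts = [make_expert(HIDDEN, TEXT_FFN, rng) for _ in range(4)]
vision_experts = [make_expert(HIDDEN, VISION_FFN, rng) for _ in range(4)]

def moe_layer(tokens, modality):
    """Route each token to the top-scoring expert of its modality's pool."""
    experts = text_experts if modality == "text" else vision_experts
    # Toy router: a random projection scores each expert per token
    # (a real router is learned jointly with the experts).
    router = rng.normal(scale=0.02, size=(HIDDEN, len(experts)))
    choice = np.argmax(tokens @ router, axis=-1)
    out = np.empty_like(tokens)
    for i, expert in enumerate(experts):
        mask = choice == i
        if mask.any():          # experts that received no tokens do no work
            out[mask] = expert(tokens[mask])
    return out

text_tokens = rng.normal(size=(16, HIDDEN))
print(moe_layer(text_tokens, "text").shape)  # (16, 512); vision experts stay idle
```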
Multimodal Capabilities
Designed to handle text, images, videos, and audio, ERNIE 4.5 supports:
Thinking Mode: Enhanced reasoning for complex tasks such as mathematical problem solving and visual puzzles.
Non-Thinking Mode: Optimized for perception tasks like document analysis and chart understanding.
Emotional Intelligence: The model’s ability to interpret humor, satire, and internet memes, providing engaging interactions for diverse user groups.
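To experiment with these modes hands-on, the sketch below loads one of the smaller ERNIE 4.5 checkpoints from Hugging Face with the transformers library. The model id and the enable_thinking flag are assumptions based on the Hugging Face model cards and may differ by checkpoint; check the card for the exact chat-template option and whether trust_remote_code is required.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint id; pick the exact one from Baidu's Hugging Face collection.
MODEL_ID = "baidu/ERNIE-4.5-0.3B-PT"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

messages = [{"role": "user",
             "content": "A train travels 180 km in 1.5 hours. What is its average speed?"}]

# `enable_thinking` is an assumed chat-template option (ignored if the template
# does not define it); set it False for perception-style answers without a
# reasoning trace.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,
    return_tensors="pt",
)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```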
Scalability and Efficiency
With a parameter range from 0.3B to 424B, ERNIE 4.5 ensures:
High Efficiency in Training: Incorporating FP8 mixed-precision and memory-efficient pipeline scheduling.
Cost-Effective Inference: Options for 4-bit and 2-bit lossless quantization accelerate inference while saving on computational expenses.
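The 2-bit and 4-bit quantization shipped with the release relies on Baidu's own tooling; as a rough illustration of why 4-bit weights cut memory and bandwidth so sharply, here is a small, generic NumPy sketch of symmetric per-row int4 quantization (not ERNIE's actual algorithm):

```python
import numpy as np

def quantize_int4(weights):
    """Symmetric per-row 4-bit quantization: values snapped to integer steps in [-7, 7]."""
    scale = np.abs(weights).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(4096, 4096)).astype(np.float32)

q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)

# 4-bit storage is ~8x smaller than fp32 (two int4 values pack into one byte).
print("mean abs error:", np.abs(w - w_hat).mean())
print("fp32 MB:", w.nbytes / 1e6, "-> int4 MB (packed):", q.size * 0.5 / 1e6)
```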
Post-Training Customization
Tailor the model to specific needs:
ERNIE 4.5 for Language Models: Fine-tuning via Supervised Fine-Tuning (SFT) or Direct Preference Optimization (DPO).
ERNIE 4.5 for Vision-Language Models: Enhanced for tasks that combine visual and textual data.
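ERNIEKit ships its own config-driven recipes for SFT and DPO, and the exact CLI should be taken from the repository. Purely as an illustration of what a supervised fine-tuning step looks like, the sketch below runs one SFT update on a small ERNIE 4.5 checkpoint through the Hugging Face transformers API (illustrative only, not ERNIEKit's interface; DPO follows a similar loop with a preference loss and is not shown):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint id; real fine-tuning runs over a curated instruction dataset.
MODEL_ID = "baidu/ERNIE-4.5-0.3B-PT"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One (instruction, response) pair used as a single training example.
example = "Instruction: Name the capital of France.\nResponse: Paris."
batch = tokenizer(example, return_tensors="pt")

model.train()
outputs = model(**batch, labels=batch["input_ids"])  # causal LM loss on the pair
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print("SFT loss:", outputs.loss.item())
```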
“ERNIE 4.5’s multimodal architecture and cost efficiency are a wake-up call for the industry.” — Analytics India Magazine
Benchmark Results and Performance
Extensive benchmarking attests to the performance and efficiency of ERNIE 4.5. Here’s a comprehensive look at its key statistics and head-to-head comparisons:
Multimodal Tasks:
MathVista (integrating math with vision): ERNIE 4.5 achieves 68%, ahead of competing models.
DocVQA (document-based Q&A): ERNIE 4.5 reaches 81.2% versus GPT-4.5's 78.5%.
Chinese Benchmarks: On tests like CMMLU and Chinese SimpleQA, ERNIE 4.5 outperforms non-Chinese models, reflecting its particular strength in Chinese-language tasks.
Cost Efficiency
ERNIE 4.5 is engineered for cost-effective deployment:
Operational Cost: Approximately 1% of GPT-4.5's cost and 50% of DeepSeek-V3's cost. For example, pricing runs at about $0.55 per million input tokens, compared to GPT-4.5's $55 per million input tokens.
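Taking the quoted input-token prices at face value, the gap compounds quickly at production volumes; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope comparison using the per-million-input-token prices
# quoted above (output-token pricing ignored for simplicity).
ernie_price = 0.55   # USD per 1M input tokens (ERNIE 4.5, as quoted)
gpt_price = 55.00    # USD per 1M input tokens (GPT-4.5, as quoted)

monthly_tokens_m = 500  # e.g. 500 million input tokens per month

print("ERNIE 4.5:", ernie_price * monthly_tokens_m, "USD")  # 275.0
print("GPT-4.5:  ", gpt_price * monthly_tokens_m, "USD")    # 27500.0
print("ratio:    ", ernie_price / gpt_price)                # 0.01 -> ~1%
```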
Direct Comparisons
ERNIE 4.5 vs. GPT-4.5:
ERNIE 4.5 excels in reasoning and multimodal evaluations while being significantly more cost-effective.
ERNIE 4.5 vs. DeepSeek-V3:
While both models offer competitive general knowledge, ERNIE 4.5 shows a slight edge in reasoning tasks and dominates in Chinese benchmarks.
ERNIE 4.5 vs. Llama 3:
Performance in multimodal and reasoning tasks consistently favors ERNIE 4.5 over Llama 3, and Llama 3 also carries higher deployment costs.
“ERNIE 4.5 is the first open-source LLM to beat GPT-4.5 in both reasoning and cost, making it a true democratizer of AI.” — This Week in AI Engineering
Conclusion
Baidu’s ERNIE 4.5 is setting new standards in the AI arena. With its state-of-the-art multimodal capabilities, innovative heterogeneous MoE architecture, and exceptional cost-efficiency, this open-source model empowers developers and enterprises alike to engage with high-performance AI without prohibitive expenses. ERNIE 4.5 not only outperforms rivals such as GPT-4.5, DeepSeek-V3, and Llama 3 on critical benchmarks but also paves the way for broader, more inclusive AI applications worldwide.