On June 30, 2025, Baidu unveiled ERNIE 4.5, its latest open-source multimodal large language model family. Accessible via GitHub, Hugging Face, and Baidu AI Studio, ERNIE 4.5 excels in reasoning, multimodal processing, and cost efficiency, outperforming competitors such as GPT-4.5, Llama 3, and DeepSeek-V3 on key benchmarks while dramatically reducing deployment costs.
Introduction
Baidu has taken a bold step forward in the open-source AI movement with the release of ERNIE 4.5. As the company’s latest multimodal large language model (LLM), ERNIE 4.5 is engineered for both text and non-text inputs including images, audio, and video, making it a versatile tool for a wide array of applications. The official release on June 30, 2025, under the Apache 2.0 license, positions ERNIE 4.5 as a competitive player in the global AI landscape by combining advanced technical features, cost-efficient deployment, and expansive accessibility.
“ERNIE 4.5 is not just another LLM—it’s a multimodal powerhouse that’s rewriting the rules of what open-source AI can do.” — ERNIE 4.5 Blog
Where to Find and Access ERNIE 4.5
Baidu has made ERNIE 4.5 available on several platforms to ensure that researchers, developers, and enterprises can easily integrate this model into their workflows:
GitHub: The official repository is hosted under the PaddlePaddle organization. It includes the core codebase, model weights, and comprehensive documentation. Visit: GitHub – PaddlePaddle/ERNIE
Hugging Face: For those looking for pre-trained models and configuration files, Baidu’s ERNIE 4.5 is available on Hugging Face, facilitating effortless integration into various projects. Visit: Hugging Face – Baidu ERNIE 4.5
Baidu AI Studio: This platform provides additional tools, training resources, and deployment options, helping users get started with ERNIE 4.5 in a more guided environment. Visit: Baidu AI Studio – ERNIE 4.5
ERNIE Bot API: Baidu also exposes an easy-to-use API on its Intelligent Cloud platform, empowering developers to leverage ERNIE 4.5 via RESTful endpoints (a minimal request sketch follows this list). Visit: ERNIE Bot API
Each platform comes with detailed setup guides and documentation, ensuring smooth onboarding.
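For the API route, a call is an ordinary HTTPS request. The endpoint URL, authentication scheme, and payload fields below are placeholders rather than the documented ERNIE Bot API contract; substitute the values from Baidu Intelligent Cloud's API reference. A minimal sketch in Python:

```python
import requests

# Placeholder values: take the real endpoint, auth scheme, and request schema
# from the ERNIE Bot API documentation on Baidu Intelligent Cloud.
API_URL = "https://<erniebot-endpoint-from-baidu-docs>/chat"
API_KEY = "<your-api-key>"

payload = {
    "messages": [{"role": "user", "content": "Summarize ERNIE 4.5 in one sentence."}],
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json())
```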
How to Use ERNIE 4.5
Leveraging ERNIE 4.5 is straightforward for anyone familiar with Python and deep learning frameworks. Below are the steps and code examples to help you get started:
Installation
Make sure you have PaddlePaddle and ERNIEKit installed on your system. Run the following commands:
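The exact, version-pinned commands depend on your hardware and the release you target; the snippet below is a reasonable default that assumes the CPU build of PaddlePaddle from PyPI and installs ERNIEKit from source via the official GitHub repository (see the repo's README for GPU builds and pinned versions).

```bash
# Install PaddlePaddle (CPU build; use the paddlepaddle-gpu package on CUDA machines)
pip install --upgrade paddlepaddle

# Install ERNIEKit from the official repository (layout may differ by release;
# follow the README in PaddlePaddle/ERNIE if it does)
git clone https://github.com/PaddlePaddle/ERNIE.git
cd ERNIE
pip install -e .
```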
Key Technical Features
ERNIE 4.5 distinguishes itself through a series of innovative technical features that enable superior performance and versatility:
Heterogeneous Mixture-of-Experts (MoE) Architecture
ERNIE 4.5 employs a novel MoE design that separates experts for text and vision modalities while sharing knowledge across them. This architecture allows for:
Modality-Specific Experts: Experts dedicated to text and image processing.
Dynamic Modality Usage: The ability to bypass visual experts during text-only operations, boosting computational efficiency.
Resource Efficiency: Visual experts operate with one-third the intermediate dimension compared to text experts, reducing visual processing computation by 66%.
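The actual implementation lives in the PaddlePaddle codebase; the snippet below is only a conceptual NumPy sketch of the modality-specific routing described above (it omits the shared experts and the learned router). Names and dimensions are illustrative, not taken from the ERNIE source: text tokens are dispatched to text experts only, while vision tokens use a separate pool of experts running at one third the intermediate width.

```python
import numpy as np

HIDDEN = 512
TEXT_FFN = 2048             # intermediate size of a text expert
VISION_FFN = TEXT_FFN // 3  # visual experts run at one third the width

def make_expert(hidden, ffn, rng):
    """A single feed-forward expert: hidden -> ffn -> hidden (ReLU MLP)."""
    w_in = rng.normal(scale=0.02, size=(hidden, ffn))
    w_out = rng.normal(scale=0.02, size=(ffn, hidden))
    return lambda x: np.maximum(x @ w_in, 0.0) @ w_out

rng = np.random.default_rng(0)
text_experts = [make_expert(HIDDEN, TEXT_FFN, rng) for _ in range(4)]
vision_experts = [make_expert(HIDDEN, VISION_FFN, rng) for _ in range(4)]

def moe_layer(tokens, modality):
    """Route each token to the top-scoring expert of its modality's pool."""
    experts = text_experts if modality == "text" else vision_experts
    # Toy router: a random projection scores each expert per token
    # (a real router is learned jointly with the experts).
    router = rng.normal(scale=0.02, size=(HIDDEN, len(experts)))
    choice = np.argmax(tokens @ router, axis=-1)
    out = np.empty_like(tokens)
    for i, expert in enumerate(experts):
        mask = choice == i
        if mask.any():          # experts that received no tokens do no work
            out[mask] = expert(tokens[mask])
    return out

text_tokens = rng.normal(size=(16, HIDDEN))
print(moe_layer(text_tokens, "text").shape)  # (16, 512); vision experts stay idle
```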
Multimodal Capabilities
Designed to handle text, images, videos, and audio, ERNIE 4.5 supports:
Thinking Mode: Enhanced reasoning for complex tasks such as mathematical problem solving and visual puzzles.
Non-Thinking Mode: Optimized for perception tasks like document analysis and chart understanding.
Emotional Intelligence: The model’s ability to interpret humor, satire, and internet memes, providing engaging interactions for diverse user groups.
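To experiment with these modes hands-on, the sketch below loads one of the smaller ERNIE 4.5 checkpoints from Hugging Face with the transformers library. The model id and the enable_thinking flag are assumptions based on the Hugging Face model cards and may differ by checkpoint; check the card for the exact chat-template option and whether trust_remote_code is required.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint id; pick the exact one from Baidu's Hugging Face collection.
MODEL_ID = "baidu/ERNIE-4.5-0.3B-PT"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

messages = [{"role": "user",
             "content": "A train travels 180 km in 1.5 hours. What is its average speed?"}]

# `enable_thinking` is an assumed chat-template option (ignored if the template
# does not define it); set it False for perception-style answers without a
# reasoning trace.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,
    return_tensors="pt",
)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```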
Scalability and Efficiency
With a parameter range from 0.3B to 424B, ERNIE 4.5 ensures:
High Efficiency in Training: Incorporating FP8 mixed-precision and memory-efficient pipeline scheduling.
Cost-Effective Inference: Options for 4-bit and 2-bit lossless quantization accelerate inference while saving on computational expenses.
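The 2-bit and 4-bit quantization shipped with the release relies on Baidu's own tooling; as a rough illustration of why 4-bit weights cut memory and bandwidth so sharply, here is a small, generic NumPy sketch of symmetric per-row int4 quantization (not ERNIE's actual algorithm):

```python
import numpy as np

def quantize_int4(weights):
    """Symmetric per-row 4-bit quantization: values snapped to integer steps in [-7, 7]."""
    scale = np.abs(weights).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(4096, 4096)).astype(np.float32)

q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)

# 4-bit storage is ~8x smaller than fp32 (two int4 values pack into one byte).
print("mean abs error:", np.abs(w - w_hat).mean())
print("fp32 MB:", w.nbytes / 1e6, "-> int4 MB (packed):", q.size * 0.5 / 1e6)
```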
Post-Training Customization
Tailor the model to specific needs:
ERNIE 4.5 for Language Models: Fine-tuning via Supervised Fine-Tuning (SFT) or Direct Preference Optimization (DPO).
ERNIE 4.5 for Vision-Language Models: Enhanced for tasks that combine visual and textual data.
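ERNIEKit ships its own config-driven recipes for SFT and DPO, and the exact CLI should be taken from the repository. Purely as an illustration of what a supervised fine-tuning step looks like, the sketch below runs one SFT update on a small ERNIE 4.5 checkpoint through the Hugging Face transformers API (illustrative only, not ERNIEKit's interface; DPO follows a similar loop with a preference loss and is not shown):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint id; real fine-tuning runs over a curated instruction dataset.
MODEL_ID = "baidu/ERNIE-4.5-0.3B-PT"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One (instruction, response) pair used as a single training example.
example = "Instruction: Name the capital of France.\nResponse: Paris."
batch = tokenizer(example, return_tensors="pt")

model.train()
outputs = model(**batch, labels=batch["input_ids"])  # causal LM loss on the pair
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print("SFT loss:", outputs.loss.item())
```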
“ERNIE 4.5’s multimodal architecture and cost efficiency are a wake-up call for the industry.” — Analytics India Magazine
Benchmark Results and Performance
Extensive benchmarking attests to the performance and efficiency of ERNIE 4.5. Here’s a comprehensive look at its key statistics and head-to-head comparisons:
Multimodal Tasks:
MathVista (integrating math with vision): ERNIE 4.5 achieves 68%, ahead of competing models.
DocVQA (document-based Q&A): ERNIE 4.5 reaches 81.2% versus GPT-4.5's 78.5%.
Chinese Benchmarks: On tests like CMMLU and Chinese SimpleQA, ERNIE 4.5 outperforms non-Chinese models, reflecting its particular strength in Chinese-language tasks.
Cost Efficiency
ERNIE 4.5 is engineered for cost-effective deployment:
Operational Cost: Approximately 1% of GPT-4.5's cost and 50% of DeepSeek-V3's cost. For example, pricing runs at about $0.55 per million input tokens, compared to GPT-4.5's $55 per million input tokens.
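Taking the quoted input-token prices at face value, the gap compounds quickly at production volumes; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope comparison using the per-million-input-token prices
# quoted above (output-token pricing ignored for simplicity).
ernie_price = 0.55   # USD per 1M input tokens (ERNIE 4.5, as quoted)
gpt_price = 55.00    # USD per 1M input tokens (GPT-4.5, as quoted)

monthly_tokens_m = 500  # e.g. 500 million input tokens per month

print("ERNIE 4.5:", ernie_price * monthly_tokens_m, "USD")  # 275.0
print("GPT-4.5:  ", gpt_price * monthly_tokens_m, "USD")    # 27500.0
print("ratio:    ", ernie_price / gpt_price)                # 0.01 -> ~1%
```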
Direct Comparisons
ERNIE 4.5 vs. GPT-4.5:
ERNIE 4.5 excels in reasoning and multimodal evaluations while being significantly more cost-effective.
ERNIE 4.5 vs. DeepSeek-V3:
While both models offer competitive general knowledge, ERNIE 4.5 shows a slight edge in reasoning tasks and dominates in Chinese benchmarks.
ERNIE 4.5 vs. Llama 3:
Performance in multimodal and reasoning tasks consistently favors ERNIE 4.5 over Llama 3, and Llama 3 also carries higher deployment costs.
“ERNIE 4.5 is the first open-source LLM to beat GPT-4.5 in both reasoning and cost, making it a true democratizer of AI.” — This Week in AI Engineering
Conclusion
Baidu’s ERNIE 4.5 is setting new standards in the AI arena. With its state-of-the-art multimodal capabilities, innovative heterogeneous MoE architecture, and exceptional cost-efficiency, this open-source model empowers developers and enterprises alike to engage with high-performance AI without prohibitive expenses. ERNIE 4.5 not only outperforms rivals such as GPT-4.5, DeepSeek-V3, and Llama 3 on critical benchmarks but also paves the way for broader, more inclusive AI applications worldwide.