TMTPOST -- Nvidia Corp. and French artificial intelligence (AI) startup Mistral AI have achieved significant performance breakthroughs through their latest collaboration, delivering up to 10 times faster inference speeds for Mistral's new model family on Nvidia's GB200 NVL72 systems compared to the previous-generation H200 chips.

Mistral AI on Tuesday released its Mistral 3 family of open-weight models, optimized for Nvidia platforms from data centers to edge devices. The release includes Mistral Large 3, a mixture-of-experts model with 675 billion total parameters and multilingual, multimodal capabilities, alongside nine smaller Ministral 3 variants designed for deployment on robots, drones and offline devices.
The partnership positions the two-year-old French company to better compete with leading AI labs including OpenAI and Google, particularly in enterprise deployments where customization and cost efficiency matter. Mistral has raised $2.7 billion at a $13.7 billion valuation, with Nvidia among its investors.
The collaboration delivers practical advantages for enterprise users. On the GB200 NVL72, Mistral Large 3 achieved over 5 million tokens per second per megawatt at 40 tokens per second per user, translating to lower per-token costs and improved energy efficiency for production AI systems.
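Those headline numbers pin down the per-token economics directly. A back-of-the-envelope sketch using only the two quoted figures (the 1,000-token response length is an assumption for illustration):

```python
# Back-of-the-envelope economics implied by the quoted GB200 NVL72 figures.
# Inputs are the two announced numbers; everything below is unit conversion.

fleet_throughput = 5_000_000   # tokens per second, per megawatt of power
per_user_rate = 40             # tokens per second delivered to each user

# Energy cost of one token: 1 MW spread across 5M tokens/s.
joules_per_token = 1_000_000 / fleet_throughput   # = 0.2 J/token

# Concurrent sessions one megawatt sustains at the quoted per-user rate.
concurrent_users = fleet_throughput / per_user_rate  # = 125,000 sessions/MW

# Energy for an assumed 1,000-token response, in watt-hours.
wh_per_response = joules_per_token * 1_000 / 3_600   # ≈ 0.056 Wh

print(f"{joules_per_token} J/token, {concurrent_users:,.0f} sessions/MW, "
      f"{wh_per_response:.3f} Wh per 1,000-token response")
```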
GB200 Systems Drive Performance Gains
Mistral Large 3's architecture leverages Nvidia's hardware optimizations to unlock substantial efficiency improvements. The model's mixture-of-experts design activates only a small subset of its expert subnetworks for each token rather than engaging all 675 billion parameters at once, reducing computational waste while maintaining accuracy.
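Mistral has not published Large 3's routing code, but the general mixture-of-experts pattern is well established. Below is a minimal sketch of top-k expert routing; the expert count, k, and dimensions are illustrative, not Mistral's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions; not Mistral Large 3's actual configuration.
NUM_EXPERTS, TOP_K, DIM = 8, 2, 16

# Each "expert" is a small weight matrix standing in for a full FFN block.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS))   # learned gating weights

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token through only its top-k experts, weighted by gate scores."""
    logits = token @ router
    top = np.argsort(logits)[-TOP_K:]                        # k best experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over winners
    # Only TOP_K of NUM_EXPERTS matrices are touched per token: the compute
    # saving that lets a huge total-parameter model run far fewer FLOPs.
    return sum(g * (token @ experts[i]) for g, i in zip(gates, top))

out = moe_layer(rng.standard_normal(DIM))
print(out.shape)   # (16,)
```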
The performance leap stems from several technical advances. Nvidia's TensorRT-LLM Wide Expert Parallelism exploits the GB200 NVL72's coherent memory domain over the NVLink fabric, enabling optimized expert placement and load balancing. The system also employs NVFP4 low-precision inference and Dynamo's disaggregated serving optimizations to deliver peak performance for large-scale deployment.
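NVFP4 is Nvidia's 4-bit floating-point format; its exact encoding is hardware-specific, but the core idea of low-precision inference (storing weights in a few bits plus a shared per-block scale, then dequantizing on the fly) can be sketched generically. This is a conceptual illustration only, not the NVFP4 specification:

```python
import numpy as np

def quantize_blockwise(weights: np.ndarray, levels: int = 16, block: int = 32):
    """Toy blockwise 4-bit quantization: 16 integer codes ~ 4 bits per weight.

    Real NVFP4 uses a 4-bit floating-point encoding with hardware per-block
    scales; this integer stand-in only shows the storage/accuracy trade-off.
    """
    w = weights.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / (levels / 2 - 1)
    codes = np.clip(np.round(w / scale), -(levels // 2), levels // 2 - 1)
    return codes.astype(np.int8), scale

def dequantize(codes, scale, shape):
    return (codes * scale).reshape(shape)

rng = np.random.default_rng(1)
w = rng.standard_normal((4, 64)).astype(np.float32)
codes, scale = quantize_blockwise(w)
w_hat = dequantize(codes, scale, w.shape)

# 4-bit storage is 8x smaller than float32 while per-block error stays modest.
print("mean abs error:", np.abs(w - w_hat).mean())
```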
These optimizations work across Nvidia's inference frameworks, including TensorRT-LLM, SGLang and vLLM. The models are available through leading open-source platforms and cloud service providers, with availability as Nvidia NIM microservices expected soon.
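Mistral's open-weight models are typically published on Hugging Face, so trying one in vLLM would look roughly like the sketch below. The model ID is a placeholder (check Mistral's Hugging Face organization for the actual Mistral 3 repository names), and a model of Large 3's scale would need tensor parallelism across a multi-GPU node:

```python
# Sketch of serving a Mistral open-weight model with vLLM's offline API.
# The model ID below is hypothetical; substitute the real Mistral 3 repo name.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Ministral-3-8B-Instruct",  # placeholder repository name
    tensor_parallel_size=1,                     # one GPU suffices for a small variant
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Summarize mixture-of-experts inference in one sentence."], params
)
print(outputs[0].outputs[0].text)
```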
Ministral 3 Targets Edge Deployment
The compact Ministral 3 suite brings AI capabilities to devices operating without network connectivity. Available in 3 billion, 8 billion and 14 billion parameter configurations, each size offers Base, Instruct and Reasoning variants to match specific use cases.
Performance on edge platforms demonstrates practical viability. The Ministral-3B variants achieve up to 385 tokens per second on Nvidia's RTX 5090 GPU. On Nvidia Jetson Thor, the models deliver 52 tokens per second for single concurrency, scaling to 273 tokens per second with eight concurrent requests.
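Those Jetson Thor numbers illustrate the standard batching trade-off: aggregate throughput climbs with concurrency while each individual stream slows somewhat. Working through the quoted figures:

```python
# Batching trade-off implied by the quoted Jetson Thor numbers.
single_stream = 52        # tokens/s with one request
aggregate_at_8 = 273      # tokens/s total with eight concurrent requests

per_stream_at_8 = aggregate_at_8 / 8        # ≈ 34 tokens/s per request
speedup = aggregate_at_8 / single_stream    # ≈ 5.25x aggregate throughput

print(f"per-stream at 8-way concurrency: {per_stream_at_8:.1f} tok/s "
      f"({speedup:.2f}x aggregate vs. single stream)")
```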
Guillaume Lample, Mistral co-founder and chief scientist, emphasized the efficiency advantage: "The huge majority of enterprise use cases are things that can be tackled by small models, especially if you fine-tune them." All Ministral 3 variants support vision, handle context windows of 128,000 to 256,000 tokens, and run on a single GPU, reducing deployment costs and latency.
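Lample's fine-tuning argument maps onto a now-standard workflow: parameter-efficient adaptation of an open-weight checkpoint. A minimal sketch using Hugging Face's transformers and peft libraries follows; the model ID is again a placeholder, and the target module names and hyperparameters are illustrative:

```python
# Minimal parameter-efficient fine-tuning sketch using LoRA adapters.
# Model ID is a placeholder for an actual Ministral 3 checkpoint; target
# module names vary by architecture and should be checked against the model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "mistralai/Ministral-3-8B-Instruct"   # hypothetical repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

lora = LoraConfig(
    r=16,                                   # adapter rank: tiny vs. the base model
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],    # attention projections, a typical choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# Only the adapter weights train; the base model stays frozen, which is why
# domain-specific fine-tunes of small models fit on modest enterprise hardware.
model.print_trainable_parameters()
```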
Commercial Push Intensifies Competition
The release comes as Mistral accelerates commercial activity following a 1.7 billion euro funding round in September that valued the company at 11.7 billion euros. Dutch chip equipment maker ASML contributed 1.3 billion euros, with Nvidia also participating.
Mistral has secured contracts worth hundreds of millions of dollars with corporate clients and announced a deal Monday with HSBC for financial analysis and translation tasks. The company is also expanding through acquisitions to compete with U.S. rivals establishing European operations; Anthropic and OpenAI both opened European offices this year.
The startup's open-weight approach contrasts with closed-source competitors. While OpenAI and Anthropic maintain proprietary models accessible only through APIs, Mistral releases its model weights publicly for download and customization. Lample argues this delivers superior results for specific enterprise deployments: "In many cases, you can actually match or even outperform closed-source models" through fine-tuning.

