Milestone Systems has announced the release of an advanced Vision Language Model (VLM) specializing in traffic analysis, powered by NVIDIA Cosmos Reason. This breakthrough underpins two new offerings: a Video Summarization tool for XProtect Video Management Software and a VLM as a Service (VLMaaS) for third-party integrations.
Video Summarization for XProtect
Modern video systems generate enormous volumes of footage, making manual review time-consuming and prone to operator fatigue. Milestone’s new Video Summarization tool, a generative AI-powered plug-in for the XProtect Smart Client, addresses this challenge by automating workflows and reducing false alarms. Early reports suggest the tool can cut operator fatigue by up to 30 percent.
The tool analyzes video snippets and produces structured text summaries in seconds. Users can search summaries by content rather than timestamps, bookmark and filter results, and integrate summaries with existing XProtect event logic. By filtering out irrelevant motion, the system ensures operators focus on valid events. Sovereign VLMs tailored to regional requirements, starting with the US and EU, further enhance flexibility. The plug-in is free to download, with costs incurred only when prompting the VLM.
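To make content-based search concrete, here is a minimal sketch in Python. It is not Milestone's implementation: the `ClipSummary` record, its fields, the `search` helper, and the sample summaries are all hypothetical, and it assumes only that each clip yields a short text summary that can be matched by keyword and filtered by bookmark.

```python
from dataclasses import dataclass


@dataclass
class ClipSummary:
    """Hypothetical record for one summarized clip (not the XProtect schema)."""
    camera: str
    start: str  # ISO 8601 start time, kept only for display
    text: str   # text summary produced by the model
    bookmarked: bool = False


def search(summaries, query, bookmarked_only=False):
    """Return summaries whose text contains every word of the query."""
    terms = query.lower().split()
    return [
        s for s in summaries
        if all(term in s.text.lower() for term in terms)
        and (s.bookmarked or not bookmarked_only)
    ]


# Made-up sample data for illustration only.
summaries = [
    ClipSummary("cam-01", "2025-01-01T08:00:00",
                "White van stops in the bus lane and unloads boxes."),
    ClipSummary("cam-02", "2025-01-01T08:05:00",
                "Cyclist crosses the intersection on a red light.",
                bookmarked=True),
    ClipSummary("cam-01", "2025-01-01T08:10:00",
                "Tree branches move in the wind; no vehicles present."),
]

van_hits = search(summaries, "bus lane van")
bookmarked_hits = search(summaries, "red light", bookmarked_only=True)
```

The operator's query is matched against what happened in the clip, so the timestamp is a result to display instead of the key to search by.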
VLM as a Service
For developers, Milestone introduces Hafnia VLMaaS, which provides API access to production-ready video intelligence. Built on NVIDIA’s latest technology and fine-tuned with responsibly sourced data, VLMaaS lets developers integrate generative AI into applications without bespoke training or infrastructure. According to Milestone, this approach accelerates development by up to 70 times compared with traditional fine-tuning.
Key features include prompt-based traffic operations, API-first delivery via HTTPS, fine-tuned models for US and EU markets, and compliance with GDPR and the EU AI Act. Pricing follows a pay-per-use model, eliminating large upfront costs. Early access is available at hafnia.milestonesys.com.
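For a sense of what prompt-based, API-first delivery over HTTPS could look like from a developer's side, the sketch below builds (but does not send) a JSON request using Python's standard library. The endpoint URL, payload fields, region codes, and bearer-token authentication are assumptions for illustration; the announcement does not document the actual Hafnia VLMaaS interface.

```python
import json
import urllib.request

# Placeholder endpoint: NOT a real Hafnia URL. The real API is not
# described in the announcement.
API_URL = "https://api.example.com/v1/analyze"


def build_request(video_url, prompt, region, api_key):
    """Build an HTTPS POST request for a hypothetical VLM endpoint."""
    # The article mentions US and EU models; these codes are assumed.
    if region not in ("us", "eu"):
        raise ValueError("region must be 'us' or 'eu'")
    payload = {"video_url": video_url, "prompt": prompt, "region": region}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_request(
    video_url="https://example.com/clips/intersection.mp4",
    prompt="Describe any vehicles blocking the intersection.",
    region="eu",
    api_key="demo-key",
)
# To send it, one would call urllib.request.urlopen(request) against
# the real endpoint and parse the JSON response.
```

Under a pay-per-use model, each such request would be a billable unit, so a client would typically send only the clips that actually need analysis.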
Industry Impact
Andrew Burnett, Acting CTO of Milestone Systems, emphasized: “With Video Summarization and VLMaaS, we’re tackling video overload and manual bottlenecks. Operators gain immediate insight within XProtect, while developers get API-first access to intelligence they can trust.” Cities such as Genoa, Italy, and Dubuque, Iowa, are already preparing to adopt these capabilities for enhanced traffic management.
Responsible AI Foundation
Both solutions are powered by Milestone’s Hafnia VLM, fine-tuned on 75,000 hours of responsibly sourced real-world traffic video using NVIDIA Cosmos Curator. Running in the cloud or in regional data centers, the platform represents one of the most advanced video AI systems available today, combining responsible data practices with cutting-edge performance.