A high-severity security flaw has been discovered in vLLM, a popular AI inference engine widely used in large language model (LLM) deployments. The vulnerability, tracked as CVE-2025-62164, affects vLLM versions 0.10.2 and later and allows attackers to achieve remote code execution (RCE) through the Completions API endpoint by sending maliciously crafted prompt embeddings.
Technical Details: Where the Flaw Resides
The vulnerability stems from unsafe tensor deserialization in entrypoints/renderer.py at line 148. When processing user-supplied embeddings, vLLM uses torch.load() without adequate validation checks. This oversight creates an opening for attackers to inject malicious payloads disguised as serialized tensors.
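The danger of this pattern follows from how torch.load() works: it is built on Python's pickle module, and unpickling attacker-controlled bytes can invoke arbitrary callables via `__reduce__`. The following stdlib-only sketch (the class name and the harmless `os.getcwd` stand-in are illustrative, not vLLM's actual code) shows why deserializing an untrusted "tensor" blob is equivalent to running attacker-chosen code:

```python
import os
import pickle


class MaliciousPayload:
    """Illustrative stand-in for a serialized 'tensor' payload.

    Unpickling it invokes an attacker-chosen callable -- here the
    harmless os.getcwd; a real attacker would pick something destructive.
    """

    def __reduce__(self):
        # pickle calls os.getcwd() during deserialization.
        return (os.getcwd, ())


blob = pickle.dumps(MaliciousPayload())  # what an attacker would send

# The "server side": blindly deserializing the blob executes the callable.
result = pickle.loads(blob)
print(type(result).__name__)  # a str came back, not a tensor -- code ran
```

This is exactly why a size or schema check before deserialization is not enough on its own: the act of loading the payload is itself the code-execution vector.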
Why This Happens: PyTorch Integrity Checks Disabled
The issue is compounded by a change introduced in PyTorch 2.8.0, which disabled sparse tensor integrity checks by default. With those checks off, an attacker can supply a sparse tensor whose indices fall outside its declared shape; when vLLM converts it via to_dense(), those out-of-range indices drive out-of-bounds memory writes. The result is memory corruption, server crashes, and potentially arbitrary code execution.
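The class of check that was disabled can be illustrated in plain Python. This is a conceptual sketch, not PyTorch's implementation: a COO-style "densify" step that validates every sparse index against the declared shape before writing, which is the kind of bounds check whose absence lets to_dense() write out of bounds.

```python
def densify(shape: int, indices: list, values: list) -> list:
    """Convert a COO-style sparse vector (index -> value) to a dense list.

    Without the integrity check below, an index outside `shape` would
    write past the end of the buffer -- the analogue of the to_dense()
    flaw when PyTorch's sparse tensor checks are disabled.
    """
    # Integrity check: every index must lie within the declared shape.
    for idx in indices:
        if not (0 <= idx < shape):
            raise ValueError(f"sparse index {idx} out of bounds for shape {shape}")

    dense = [0.0] * shape
    for idx, val in zip(indices, values):
        dense[idx] = val
    return dense


print(densify(4, [0, 3], [1.5, 2.5]))  # [1.5, 0.0, 0.0, 2.5]
```

In C-backed tensor code the same missing check does not raise an IndexError as Python lists would; it silently corrupts adjacent memory, which is what makes the flaw exploitable rather than merely a crash.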
Impact and Severity
- CVE ID: CVE-2025-62164
- Severity: High
- CVSS Score: 8.8 (High)
- Affected Versions: vLLM ≥ 0.10.2
- Attack Surface: Any deployment running vLLM as a server, especially those deserializing untrusted or model-provided payloads.
This vulnerability is particularly dangerous because no special privileges are required. Depending on how the API is exposed, either authenticated or unauthenticated users can trigger the flaw.
Why It’s a Big Deal for AI Deployments
Organizations using vLLM in production environments, cloud-based inference services, or shared infrastructure face significant risk. Successful exploitation could:
- Crash the vLLM server, causing denial-of-service (DoS).
- Enable remote code execution, compromising the entire server.
- Potentially allow attackers to pivot into adjacent systems within the same network.
Given the widespread adoption of vLLM for LLM inference, this vulnerability poses a serious threat to AI infrastructure security.
Mitigation and Best Practices
The vLLM team has addressed this issue in pull request #27204. Users should:
- Upgrade immediately to the patched version.
- Restrict API access to trusted users only.
- Implement input validation layers to inspect prompt embeddings before they reach the vLLM processing pipeline.
- Consider running vLLM in sandboxed environments to limit the blast radius of potential exploits.
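As a minimal sketch of the validation-layer idea, the gate below (the function name and size cap are hypothetical, and the assumption that embeddings arrive base64-encoded reflects how binary payloads are typically carried over JSON APIs) rejects malformed or oversized payloads before they reach any deserialization code. Note that such a pre-filter narrows the attack surface but does not make torch.load() safe on untrusted input; upgrading remains essential.

```python
import base64

# Hypothetical cap; tune per deployment based on expected embedding sizes.
MAX_PAYLOAD_BYTES = 16 * 1024 * 1024


def precheck_embedding_payload(b64_payload: str) -> bytes:
    """Cheap validation gate to run in front of the Completions API.

    Rejects payloads that are not valid base64 or exceed a size limit
    before they ever reach the deserialization code path. This does NOT
    make deserialization of the bytes safe -- patching is still required.
    """
    try:
        raw = base64.b64decode(b64_payload, validate=True)
    except Exception as exc:
        raise ValueError("payload is not valid base64") from exc
    if len(raw) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload exceeds size limit")
    return raw


ok = precheck_embedding_payload(base64.b64encode(b"\x00" * 128).decode())
print(len(ok))  # 128
```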
Responsible Disclosure
The vulnerability was discovered and responsibly disclosed by the AXION Security Research Team, underscoring the importance of coordinated vulnerability disclosure in the rapidly evolving AI infrastructure ecosystem.
Key Takeaways
- CVE-2025-62164 is a high-severity flaw in vLLM that enables remote code execution.
- Exploitation requires no special privileges, making it highly accessible.
- Immediate patching and API hardening are essential to protect AI deployments.