A high-severity security flaw has been discovered in vLLM, a popular AI inference engine widely used in large language model (LLM) deployments. The vulnerability, tracked as CVE-2025-62164, affects vLLM versions 0.10.2 and later and allows attackers to achieve remote code execution (RCE) through the Completions API endpoint by sending maliciously crafted prompt embeddings.
Technical Details: Where the Flaw Resides
The vulnerability stems from unsafe tensor deserialization in entrypoints/renderer.py at line 148. When processing user-supplied embeddings, vLLM uses torch.load() without adequate validation checks. This oversight creates an opening for attackers to inject malicious payloads disguised as serialized tensors.
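The danger of this pattern follows from how torch.load() works: it is built on Python's pickle module, and unpickling attacker-controlled bytes can invoke arbitrary callables via `__reduce__`. The following stdlib-only sketch (the class name and the harmless `os.getcwd` stand-in are illustrative, not vLLM's actual code) shows why deserializing an untrusted "tensor" blob is equivalent to running attacker-chosen code:

```python
import os
import pickle


class MaliciousPayload:
    """Illustrative stand-in for a serialized 'tensor' payload.

    Unpickling it invokes an attacker-chosen callable -- here the
    harmless os.getcwd; a real attacker would pick something destructive.
    """

    def __reduce__(self):
        # pickle calls os.getcwd() during deserialization.
        return (os.getcwd, ())


blob = pickle.dumps(MaliciousPayload())  # what an attacker would send

# The "server side": blindly deserializing the blob executes the callable.
result = pickle.loads(blob)
print(type(result).__name__)  # a str came back, not a tensor -- code ran
```

This is exactly why a size or schema check before deserialization is not enough on its own: the act of loading the payload is itself the code-execution vector.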
Why This Happens: PyTorch Integrity Checks Disabled
The issue is compounded by a change introduced in PyTorch 2.8.0, which disabled sparse tensor integrity checks by default. With those checks off, an attacker can supply a sparse tensor whose indices fall outside its declared shape; when vLLM converts it via to_dense(), those out-of-range indices drive out-of-bounds memory writes. The result is memory corruption, server crashes, and potentially arbitrary code execution.
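The class of check that was disabled can be illustrated in plain Python. This is a conceptual sketch, not PyTorch's implementation: a COO-style "densify" step that validates every sparse index against the declared shape before writing, which is the kind of bounds check whose absence lets to_dense() write out of bounds.

```python
def densify(shape: int, indices: list, values: list) -> list:
    """Convert a COO-style sparse vector (index -> value) to a dense list.

    Without the integrity check below, an index outside `shape` would
    write past the end of the buffer -- the analogue of the to_dense()
    flaw when PyTorch's sparse tensor checks are disabled.
    """
    # Integrity check: every index must lie within the declared shape.
    for idx in indices:
        if not (0 <= idx < shape):
            raise ValueError(f"sparse index {idx} out of bounds for shape {shape}")

    dense = [0.0] * shape
    for idx, val in zip(indices, values):
        dense[idx] = val
    return dense


print(densify(4, [0, 3], [1.5, 2.5]))  # [1.5, 0.0, 0.0, 2.5]
```

In C-backed tensor code the same missing check does not raise an IndexError as Python lists would; it silently corrupts adjacent memory, which is what makes the flaw exploitable rather than merely a crash.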
Impact and Severity
- CVE ID: CVE-2025-62164
- Severity: High
- CVSS Score: 8.8 (High)
- Affected Versions: vLLM ≥ 0.10.2
- Attack Surface: Any deployment running vLLM as a server, especially those deserializing untrusted or model-provided payloads.
This vulnerability is particularly dangerous because no special privileges are required. Depending on how the API is exposed, either authenticated or unauthenticated users can trigger the flaw.
Why It’s a Big Deal for AI Deployments
Organizations using vLLM in production environments, cloud-based inference services, or shared infrastructure face significant risk. Successful exploitation could:
- Crash the vLLM server, causing denial-of-service (DoS).
- Enable remote code execution, compromising the entire server.
- Potentially allow attackers to pivot into adjacent systems within the same network.
Given the widespread adoption of vLLM for LLM inference, this vulnerability poses a serious threat to AI infrastructure security.
Mitigation and Best Practices
The vLLM team has addressed this issue in pull request #27204. Users should:
- Upgrade immediately to the patched version.
- Restrict API access to trusted users only.
- Implement input validation layers to inspect prompt embeddings before they reach the vLLM processing pipeline.
- Consider running vLLM in sandboxed environments to limit the blast radius of potential exploits.
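As a minimal sketch of the validation-layer idea, the gate below (the function name and size cap are hypothetical, and the assumption that embeddings arrive base64-encoded reflects how binary payloads are typically carried over JSON APIs) rejects malformed or oversized payloads before they reach any deserialization code. Note that such a pre-filter narrows the attack surface but does not make torch.load() safe on untrusted input; upgrading remains essential.

```python
import base64

# Hypothetical cap; tune per deployment based on expected embedding sizes.
MAX_PAYLOAD_BYTES = 16 * 1024 * 1024


def precheck_embedding_payload(b64_payload: str) -> bytes:
    """Cheap validation gate to run in front of the Completions API.

    Rejects payloads that are not valid base64 or exceed a size limit
    before they ever reach the deserialization code path. This does NOT
    make deserialization of the bytes safe -- patching is still required.
    """
    try:
        raw = base64.b64decode(b64_payload, validate=True)
    except Exception as exc:
        raise ValueError("payload is not valid base64") from exc
    if len(raw) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload exceeds size limit")
    return raw


ok = precheck_embedding_payload(base64.b64encode(b"\x00" * 128).decode())
print(len(ok))  # 128
```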
Responsible Disclosure
The vulnerability was discovered and responsibly disclosed by the AXION Security Research Team, underscoring the importance of coordinated vulnerability disclosure in the rapidly evolving AI infrastructure ecosystem.
Key Takeaways
- CVE-2025-62164 is a high-severity flaw in vLLM that enables remote code execution.
- Exploitation requires no special privileges, making it highly accessible.
- Immediate patching and API hardening are essential to protect AI deployments.