OpenShell marks a necessary shift from fragile prompt-based guardrails to robust architectural enforcement by treating LLM agents as untrusted code. It is a pragmatic realization that true security must exist outside the model's own logic to be effective.
Deep Dive
OpenShell Agents
In this video, we look at OpenShell, the layer that runs the protection in NemoClaw Blueprints, but we actually do it with a LangChain DeepAgents harness to show how you can use a number of different agent options.
🔗 Links:
Github: https://github.com/langchain-ai/openshell-deepagent
NVIDIA: https://build.nvidia.com/openshell
Docs: https://docs.nvidia.com/openshell/home
Twitter: https://x.com/Sam_Witteveen
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes
👨💻 Github: https://github.com/samwit/llm-tutorials
⏱️ Time Stamps:
00:00 Intro
00:39 Quick Recap: NemoClaw
01:47 3 Flavors of NemoClaw
02:47 LangChain Deep Agent Framework
03:07 Deep Agent Architecture
04:26 Deep Agents+NemoClaw+OpenShell
04:52 Deep Agent Project
05:51 OpenShell: The Core Idea - Out-of-Process Enforcement
07:52 4 Things Supervisor Controls
10:14 End-to-end Walkthrough
#NVIDIAAI #langchain
So, it's been a couple of months since NVIDIA announced NemoClaw at GTC, and I want to come back to something I said in the first video I made about it, because I think it's aged pretty well. NemoClaw isn't the star of the show. The actually interesting piece sitting underneath it is OpenShell.
Here's the headline that I feel most people kind of missed: NemoClaw is a blueprint, and every flavor of that blueprint, whatever harness you put on top, deploys into OpenShell. That's the part that deserves a deep dive. So, in this video, we're going to walk through a fully local agent running inside OpenShell, and I'll show you why I think this is the pattern that's going to stick.
So, just a quick recap. NemoClaw is best thought of as an open blueprint for building specialized agents, and a NemoClaw blueprint is more than just OpenClaw. A NemoClaw blueprint always has three parts. You've got a harness: that's the agent loop, the planning, the tool calling. You've got a model: typically NVIDIA is using Nemotron, and in this video I'm going to be using the Nemotron Super model, but it doesn't have to be that model. Lastly, you've got a runtime, and that is OpenShell.
So the harness and the model can change. OpenShell is the constant here. OpenShell is the part doing the actual security work: the sandboxing, the policy enforcement, the network isolation. NemoClaw just wires these three things together with sensible defaults. So when NVIDIA talks about NemoClaw, what they're really talking about is one specific blueprint: OpenClaw plus Nemotron plus OpenShell. But the blueprint pattern itself is more general than that, and that's the part that I want you to walk away with.
So, just to make this concrete, there are kind of three flavors of NemoClaw at the moment, all with the same runtime underneath. Number one is OpenClaw plus Nemotron plus OpenShell. That's the version that NVIDIA shipped at GTC, and that's what most of the coverage has been about. Number two is one that I think is just around the corner, and if you look at NVIDIA's repos you can see it already: Hermes plus Nemotron plus OpenShell. So you swap out the OpenClaw harness for a Hermes-style agent. Different planning style, but still OpenShell as the same runtime. The third kind of blueprint is for more custom harnesses. In this video, I'm going to be using LangChain Deep Agents plus Nemotron plus OpenShell, and this is what we're going to spend the rest of the video on. So remember, the harness is interchangeable. OpenShell doesn't care which one is on top, because it's enforcing policy from outside the agent, which is actually the whole point here.
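One way to picture the three-part blueprint is as plain data, where the harness and model vary and the runtime stays fixed. This is only an illustrative sketch: the Blueprint class and its field names are made up for this example and are not part of NemoClaw or OpenShell.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Blueprint:
    """Hypothetical record of a blueprint's three parts (labels only)."""
    harness: str                 # the agent loop: planning, tool calling
    model: str                   # the LLM doing the reasoning
    runtime: str = "OpenShell"   # the constant across the flavors described


# The three flavors mentioned in the video, written as labels.
nvidia_default = Blueprint(harness="OpenClaw", model="Nemotron")
hermes_flavor = replace(nvidia_default, harness="Hermes")
deep_agents_flavor = replace(nvidia_default, harness="LangChain Deep Agents")

flavors = [nvidia_default, hermes_flavor, deep_agents_flavor]

# Swapping the harness never touches the runtime field.
runtimes = {bp.runtime for bp in flavors}
```

Swapping the harness is a one-field change here, which mirrors the point that the runtime does not depend on which agent loop sits on top.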
Okay, so if you haven't seen LangChain's Deep Agents framework yet, it's worth a minute to talk about, because this is what we're going to use as the harness here. Deep Agents is basically LangChain's attempt at packaging up the patterns that make agents like Claude Code and Manus actually work over long horizons. So you get a planning loop. You get sub-agents that the main agent can spawn to handle specific subtasks. You've got a file system that the agent can read and write to. And you've got a structured to-do list that the agent can use to track its own work. Really, this highlights all the key patterns that I've been talking about recently, things like sandboxes and file systems, that make up a modern agent. These are the patterns that we've seen emerge in production agents over the last year or so.
The reason I like Deep Agents as a harness is that it's model agnostic. You could use it with any particular model. Today, I'm going to be using it with a local Nemotron model running on a DGX Spark, which NVIDIA themselves have kindly sponsored for this video. The key thing about Deep Agents is that the agent loop is decoupled from the brain and from a lot of the tools it's using, which is exactly the kind of pattern that I've been talking about on the channel recently.
But even Deep Agents still has the same problem that other agents have: it has the keys to your machine. It can read files. It can hit any API it wants. That's fine if you're just messing around on your laptop. It's really not fine if you're trying to put this anywhere near production, or lock it down to make it a lot safer.
So, what we're going to build is a deep agent inside a NemoClaw blueprint. The harness is Deep Agents. The model is Nemotron. And the runtime is OpenShell, which means the agent still gets to do useful work like external search, but only through approved channels. All the work it does happens inside a trust boundary that we've explicitly defined. Nothing else.
All right. So this is what the deep agent project actually looks like. We've basically got our code in here. You can see I've got prompts, I've got the different backends, etc. And you can see that, for example, as I was testing this locally and not running it on the DGX Spark, I was actually using the Nemotron 3 model through the NVIDIA cloud API. But once we go to the version that we're going to use with OpenShell, that's when we will start using the Ollama version in here.
Now, you can see in here I've got the policy YAML file, and this is where I can define what can actually go in and out and which endpoints get used. I can define which folders in the sandbox and runtime are actually available to the agent, and I could lock any of these out. I can also set up network policies. In this case, for search I was using DuckDuckGo, so here you can see I've set that up. That means it won't be allowed to go to other URLs.
Okay, so this is where I want to slow down, because this part of the video is what actually matters most.
How does OpenShell actually pull this off? The core idea, and once you see this you kind of can't unsee it, is out-of-process enforcement. The old way of doing agent safety, and I'd say most agent frameworks still do this, is you put your rules in the system prompt. You've got something like: "You're a helpful assistant. You must not do X. You must not do Y. Please always check before doing something else." And then you hope that the model follows them.
The problem with that is that an LLM is a conditional probability machine. You can't ask a conditional probability machine to be the enforcement layer for its own rules. It's just the wrong layer. The moment somebody slides a prompt injection into a web page the agent reads, or a tool output the agent processes, the agent will happily ignore those rules in the prompt. We've all seen things like this happen before.
OpenShell takes a totally different approach. The policies don't live inside the agent at all. They live outside the agent process, enforced by a component called the supervisor. The supervisor starts before the agent does. It prepares the sandbox, it fetches the policy from the gateway, and then it launches the agent as a restricted child process with the policies already in place. So when the agent tries to open a network connection, write to a file, or call out to an inference endpoint, the supervisor evaluates that against the policy before it ever reaches anything real. The agent can't bypass the policies because the agent doesn't enforce the policies. The supervisor does. And because the enforcement sits outside the reach of the agent, even if your agent is fully compromised, somebody jailbreaks it, prompt injects it, whatever it is, the policies still hold, because the policies are not inside the thing that got compromised.
All right. So what does the supervisor actually control? There are four key things here that are worth knowing. The first is the network, and this is default deny. When the sandbox starts, nothing can leave the box. You can add things to an allow list, whether that's your search API or your inference endpoint, but everything else is blocked, and it's blocked outside of the agent's control.
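The default-deny check described here can be sketched in a few lines. This is a toy, single-process stand-in, so it cannot show the real isolation that comes from running the enforcer outside the agent process. The class names, method names, and hostnames are assumptions made for illustration, not OpenShell's API.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class NetworkPolicy:
    """Hypothetical allow list: any host not listed is denied."""
    allowed_hosts: set = field(default_factory=set)

    def permits(self, url: str) -> bool:
        host = urlsplit(url).hostname
        return host is not None and host in self.allowed_hosts


class Supervisor:
    """Toy enforcer: every connection request is checked before it proceeds."""

    def __init__(self, policy: NetworkPolicy):
        self.policy = policy
        self.audit_log = []  # (url, decision) pairs, useful for review

    def open_connection(self, url: str) -> str:
        allowed = self.policy.permits(url)
        self.audit_log.append((url, "allow" if allowed else "deny"))
        if not allowed:
            raise PermissionError(f"blocked by policy: {url}")
        return f"connected:{url}"  # placeholder, no real network I/O happens


# Example allow list: a search host and the in-sandbox inference endpoint.
supervisor = Supervisor(NetworkPolicy({"duckduckgo.com", "inference.local"}))
```

The agent in this picture never holds the policy object. It can only ask, and the answer is decided somewhere it cannot rewrite.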
So, the agent can't just talk its way around it. That default-deny network posture is the key enterprise feature here, by the way. It's the thing that you can hand to a compliance team when they ask, "Can you prove that the agent won't send data externally?" Yes: here's the policy file, denied by default. That's something that you can actually point to and show.
The second thing is files. Your container has its own workspace, but your host directories are not mounted in it. So when the agent reads or writes files, it's reading and writing inside the sandbox. It can't reach out into your home directory. It can't grab your SSH keys. It can't even read your environment files. The whole class of attacks that exfiltrate your credentials has just gone away.
The third is inference calls. There's a managed endpoint inside the sandbox, inference.local, and any model call the agent makes goes through that. The supervisor decides where each call actually routes to. A key thing here is that the provider credentials, like your LLM API key if you're calling something externally, are just never exposed. The agent just sees inference.local.
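To make the routing idea concrete, here is a minimal sketch of a router that receives a request addressed to inference.local and builds the outbound request with the credential attached on the supervisor side. The request shape, URL paths, model name, and key are all hypothetical. The source does not show OpenShell's actual request format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    upstream_url: str  # hypothetical real endpoint, known only to the router
    api_key: str       # held outside the agent, never returned to it


class InferenceRouter:
    """Toy router: maps a model name to an upstream and adds the credential."""

    def __init__(self, routes: dict):
        self._routes = routes  # model name -> Route

    def forward(self, agent_request: dict) -> dict:
        route = self._routes[agent_request["model"]]
        # Copy the headers so the agent's own request object is never modified.
        headers = dict(agent_request.get("headers", {}))
        headers["Authorization"] = f"Bearer {route.api_key}"
        return {
            "url": route.upstream_url,
            "headers": headers,
            "body": agent_request["body"],
        }


# What the agent builds: no key, only the local endpoint name.
agent_request = {
    "url": "http://inference.local/v1/chat",
    "model": "nemotron",
    "headers": {},
    "body": {"prompt": "hello"},
}

router = InferenceRouter(
    {"nemotron": Route("https://upstream.example/v1/chat", "sk-demo-not-real")}
)
outbound = router.forward(agent_request)
```

The design point is that the key only ever appears in the outbound request, which is assembled outside the agent.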
The supervisor handles the rest.
The fourth is credentials. API keys are never stored inside the sandbox. They're injected at runtime by the gateway through the supervisor, only alongside the configured policy paths. The sandbox doesn't keep them on disk. So if anyone does manage to compromise the agent, they don't get a stash of keys. They get a running process with no persistent secrets. This is so much better than a typical agent setup where you've got a .env file with every API key sitting on your disk.
So network, files, inference, and credentials: that's the full attack surface of an agent, locked down behind these policies. And that value proposition is what makes OpenShell, as the runtime, the part of this stack that really matters the most.
Okay, so now let's actually walk through what it looks like to run one of these end to end. I'm going to show you the deep agent blueprint we built, with the Deep Agents harness, the Nemotron model, and the OpenShell runtime, and walk through the moving pieces in here. The first thing is the policy file. This is a YAML file, and the cool thing here is that your policies are basically code. They live in a repo. They go through a pull request. They get diffed. They get reviewed. There's no magic here. If you want to know what your agent can actually do, you can read the file.
You'll see we've got a network section with the allow list. That's our search API and our inference endpoint. That's it. We've got a file system section declaring which paths inside the sandbox the agent can read versus read and write. That part is locked when the sandbox actually starts, so changing it means recreating the sandbox. And we've got the inference section that I talked about before, pointing inference.local, in this case, towards our Nemotron routing config. The network and inference policies can hot reload, but the file system can't, and that's by design.
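The split between hot-reloadable and locked sections can be sketched as follows. The section names follow the description above, but the keys inside each section, the paths, and the SandboxPolicy class are assumptions for illustration and do not reflect the real OpenShell policy schema.

```python
HOT_RELOADABLE = {"network", "inference"}  # can change while the sandbox runs
LOCKED_AT_START = {"filesystem"}           # fixed once the sandbox has started


class SandboxPolicy:
    """Toy policy holder that refuses to reload the locked section."""

    def __init__(self, network: dict, filesystem: dict, inference: dict):
        self._sections = {
            "network": network,
            "filesystem": filesystem,
            "inference": inference,
        }

    def get(self, section: str) -> dict:
        return self._sections[section]

    def reload(self, section: str, new_value: dict) -> None:
        if section in LOCKED_AT_START:
            raise RuntimeError(
                f"{section} policy is fixed at sandbox start; "
                "recreate the sandbox to change it"
            )
        if section not in HOT_RELOADABLE:
            raise KeyError(f"unknown policy section: {section}")
        self._sections[section] = new_value


policy = SandboxPolicy(
    network={"allow": ["duckduckgo.com", "inference.local"]},
    filesystem={"read_only": ["/sandbox/prompts"], "read_write": ["/sandbox/workspace"]},
    inference={"endpoint": "inference.local"},
)

# Tightening the network allow list works on a live policy object.
policy.reload("network", {"allow": ["inference.local"]})
```

Attempting policy.reload("filesystem", ...) raises instead of silently applying, which is the behavior the video attributes to the file system section.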
Credentials come in separately. When the gateway provisions the sandbox, the supervisor receives the credential material and injects it through a configured policy path. The agent itself never sees the raw key. It just makes a call to inference.local and the supervisor signs it on the way out. That's the bit that makes the hybrid setup actually work.
Okay. Once I've connected to the actual DGX machine, I can create a new OpenShell sandbox, just as simple as this. You can see that I've created the sandbox and given it the name SAM shell. Now, if I want to leave the sandbox, I can just exit it. If I want to connect back in, I can just connect back into it like that. And then the key thing is to set up our policy. To do that, we just run openshell policy set, then the name of the sandbox, and then we pass in the link to the policy file that we had before.
Okay, so the takeaway here: if you're building agents right now and you want them to be secure, stop thinking purely in frameworks. Start thinking in blueprints, sandboxes, and primitives, because once you see it that way, it snaps into place. The harness gives you the agent loop. The model gives you the reasoning. And OpenShell gives you the runtime that makes all of this safe.
That's the part to take away here. Swap the harness, swap the model. The runtime should stay the same. If you want to check this out more, I've put the links to the NemoClaw and OpenShell docs in the description, along with the first video I made, if you haven't seen it already. Let me know in the comments what you're building agents on top of, and whether you've started thinking about this runtime layer yet. As always, if you found the video useful, please click like and subscribe. And I will talk to you in the next video.