Letting inmates run the asylum: Using AI to secure AI

One of Anthropic's quieter releases recently was their "Security Review," where Claude Code can identify and fix security issues in your code. But how good is it really? In my case, will it find issues with code it helped me write for my newsletter service and Chrome extension?

The release states it uses a "specialized security-focused prompt that checks for common vulnerability patterns." With so much compute already thrown at model training, LLMs are nearing the top of the S-curve, so finding new ways to use their existing capabilities is becoming more important. In this case, a special prompt results in a new feature, much like OpenAI used a carefully-crafted prompt to release Study Mode in ChatGPT.

So is any of my code vulnerable?

First, I had it take a peek at Simple Wikiclaudia, the browser extension I wrote with Claude to simplify Wikipedia articles. One of the extension's biggest features is how simple it is, so I didn't expect it to find much. But how much can you trust the AI that wrote most of the code in the first place?

Appropriately found on a blog about o11y

Additionally, it looks like Claude's security review mainly targets low-hanging fruit included in the OWASP Top 10 (new version coming soon!). But what if there's something else the security review can't catch, maybe something specific to browser sandboxes and web extensions? Or other unknown unknowns? After all, Claude said all my stuff was peachy keen:

Claude: Well of course it's fine. I wrote it!

It all comes back to Defense in Depth. Do I think this feature has value? Of course it does. Would I rely solely on this LLM review before shipping to production? Of course not. So what else can you do? Human code review, static application security testing (SAST), dynamic application security testing (DAST), QA testing, fuzz testing, testing, testing, testing. And you still won't catch everything!

OK but is this code secure or not

Fortuitously, I recently started playing around with a free trial of Datadog. They just let anybody sign up! And included in the trial, I can have it evaluate my code on GitHub with all kinds of code quality tools.

"No code vulnerabilities" from a second tool is another good signal. Please ignore the Code Quality signals. Nothing to see here, please move along and quickly scroll past this next image.

Alright alright, I used to work for a company that specializes in logs, so give me a break if I have a bunch of logging statements. And calling them "HIGH" severity is a little overreactive. This is out-of-the-box tuning, so I can turn all these off, which is nice, and the first thing I did. 

How about the code for rsspberry2email?

Yes, the service I wrote with all kinds of weaker LLMs before Claude Code made these things way easier! There's a lot more potential for security risks here, especially since it involves more moving parts with emails, Atom/RSS, untrusted inputs, etc.
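One of those untrusted inputs is the feed content itself: anything pulled from an Atom/RSS feed should be treated as attacker-controlled before it lands in an email body. As a hedged illustration (this is not the actual rsspberry2email code; `escapeHtml`, `renderItem`, and the field names are made up for the sketch), escaping feed text before templating looks like this:

```javascript
// Hypothetical sketch: escape untrusted feed text before embedding it in an
// HTML email body. Names and structure here are illustrative, not the real repo.
function escapeHtml(untrusted) {
  return String(untrusted)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build an email snippet from a feed item, treating every field as untrusted.
function renderItem(item) {
  return `<h2>${escapeHtml(item.title)}</h2>\n` +
         `<p><a href="${escapeHtml(item.link)}">Read more</a></p>`;
}
```

This is exactly the class of issue (injection via untrusted input) that sits squarely in the OWASP Top 10, so it's fair game for both Claude's review and a SAST tool.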

I kicked off Claude's /security-review and waited. It can take a while, especially if you wander off and come back to find it waiting for you to press Confirm/Yes/Continue because you weren't watching it closely.

While that ran, I noodled around with Datadog's dashboard for this repo. I don't want to gush but it's pretty slick. Just look at all these vulnerabilities!

Seriously though, I love how easy it is to navigate to the alerts I actually care about, along with the recommended way to fix them. With vulnerable libraries, it even helps you fix them with a click of the Remediate button.

Good stuff 👏

Anyway, did Claude agree with Datadog's assessment? It actually did surface one issue in common, so I should accept the risk… I mean, probably fix that. Keep in mind, this is a little Node.js server running on a Raspberry Pi in my home closet to email my dozens of subscribers updates to this very site. The business risk is small, but I promise I'm protecting that little JSON file with your email addresses better than some others.

No, I didn't expect you to read all this. Did you read all this? Don't do that.

Conclusion

As another tool in a software engineer's toolbelt, Claude's security review is pretty cool, but far from a panacea. Optimistically, I would expect users to use this as part of a standard CI/CD workflow to keep from merging code with obviously-bad security practices. Pessimistically, I expect an uptick in lazy vulnerability reports. I'm happy to see one of the biggest AI companies shine a spotlight on security at all, and I expect to see bigger and better in the future.