
Why You Shouldn’t Ask Your AI to Audit Its Own Instructions

A viral post this week says you should ask Claude to audit its own instructions and delete whatever it flags as unnecessary.

I tried the opposite approach. Here’s why it works better.

I’ve been running an AI assistant full-time for 3 months. It manages my email, runs my marketing, posts to social media, sends my newsletter, and manages ad campaigns. It has about 50 rules in its instruction files.

Every one of those rules exists because something went wrong.

“Never add contacts to list 13” is there because we accidentally emailed 2,800 people a welcome message they’d already received. “Always verify links before sending” is there because a broken URL went out to my entire Udemy student base. “Never let the AI calculate dates” is there because it got the math wrong in 3 different cron jobs in one week.

The viral advice says: paste your rules into Claude and ask it which ones to cut. Claude will happily tell you to remove half of them. It’ll say things like “I already do this by default” and “this is redundant with your other instructions.”

The problem? Anthropic’s own engineering team has found that models are unreliable judges of their own behavior. My AI will confidently say “I already verify links by default” — and then not verify links. The rules aren’t there because the AI needs a reminder. They’re there because without the rule, it fails silently and I find out when a customer complains.

Here’s what actually works for keeping your AI instructions clean:

Run a separate model as the auditor. Don’t ask Claude to grade Claude’s homework. Spawn a second instance with explicit criteria: find contradictions, find duplicates, find rules that fight each other. Two perspectives catch what one misses.
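Here’s a rough Python sketch of what that auditor’s brief could look like. It only builds the prompt. `call_auditor` is a hypothetical placeholder for however you invoke the second model, and the exact wording of the instructions is an assumption, not a tested recipe.

```python
from typing import Callable

# The three checks from the tip above, stated explicitly for the auditor.
AUDIT_CRITERIA = [
    "Find rules that contradict each other.",
    "Find rules that duplicate each other.",
    "Find rules that fight each other.",
]


def build_audit_prompt(rules: list[str]) -> str:
    """Build the brief for a separate auditor, not the assistant that follows the rules."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    criteria = "\n".join(f"- {c}" for c in AUDIT_CRITERIA)
    return (
        "You are reviewing another assistant's instruction file.\n"
        "Do not suggest cutting a rule because the assistant 'already does this'.\n"
        f"Check only for the following:\n{criteria}\n\n"
        f"Rules:\n{numbered}\n\n"
        "Cite rule numbers for every finding."
    )


def run_audit(rules: list[str], call_auditor: Callable[[str], str]) -> str:
    # call_auditor is a hypothetical placeholder for however you invoke the second model.
    return call_auditor(build_audit_prompt(rules))


if __name__ == "__main__":
    print(build_audit_prompt([
        "Never add contacts to list 13.",
        "Always verify links before sending.",
    ]))
```

Keeping the criteria narrow is deliberate: the auditor is asked to find conflicts and duplicates, not to decide which guardrails the assistant can live without.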

Review rules when you upgrade models. A rule written for Sonnet 3.5 might be unnecessary on Opus 4. But a human has to make that call, not the model that wants to believe it doesn’t need guardrails.
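If each rule records which model it was written for, an upgrade can produce a review list for a human instead of a deletion list. This is a minimal sketch that assumes a hypothetical `written_for` field. The model assignments below are illustrative only, reusing the model names mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    text: str
    written_for: str  # hypothetical field: the model in use when the rule was added


def review_queue(rules: list[Rule], current_model: str) -> list[Rule]:
    """Return rules written for a different model so a human can re-check them.

    Nothing is deleted here; a person decides whether each rule is still needed.
    """
    return [r for r in rules if r.written_for != current_model]


# Illustrative values, not a real history.
rules = [
    Rule("Always verify links before sending.", written_for="Sonnet 3.5"),
    Rule("Never add contacts to list 13.", written_for="Opus 4"),
]
for rule in review_queue(rules, current_model="Opus 4"):
    print(f"Re-check by hand: {rule.text} (written for {rule.written_for})")
```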

Test before you cut. Delete a rule, run your 3 most common tasks, and check the output carefully.
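One way to make that repeatable is a small ablation check. This sketch assumes you can wrap your assistant in a `run_task(rules, task)` function and write a `passes(task, output)` check, which might just be you reading the output. Both are hypothetical stand-ins, not part of any real API.

```python
from typing import Callable


def try_cut(
    rules: list[str],
    rule_to_cut: str,
    tasks: list[str],
    run_task: Callable[[list[str], str], str],
    passes: Callable[[str, str], bool],
) -> tuple[bool, list[str]]:
    """Run each task without one rule; return (safe_to_cut, tasks_that_failed)."""
    trimmed = [r for r in rules if r != rule_to_cut]
    failed = [task for task in tasks if not passes(task, run_task(trimmed, task))]
    return (not failed, failed)
```

If the first value comes back `False`, the rule stays, and the list of failed tasks tells you what it was protecting.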

Keep a log of why each rule exists. When you know the failure that created a rule, you can judge whether the failure is still possible. When you don’t, you’re guessing.
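A minimal sketch of such a log in Python, assuming a simple in-code list (a spreadsheet or a notes file would work just as well). The three entries come from the incidents described earlier. `RuleRecord` and `unexplained` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleRecord:
    rule: str
    incident: str  # the failure that made this rule necessary


RULE_LOG = [
    RuleRecord("Never add contacts to list 13.",
               "Emailed 2,800 people a welcome message they'd already received."),
    RuleRecord("Always verify links before sending.",
               "A broken URL went out to the entire Udemy student base."),
    RuleRecord("Never let the AI calculate dates.",
               "Date math was wrong in 3 different cron jobs in one week."),
]


def unexplained(rules: list[str], log: list[RuleRecord]) -> list[str]:
    """Rules with no recorded incident: the ones you'd be guessing about."""
    known = {record.rule for record in log}
    return [rule for rule in rules if rule not in known]
```

Anything `unexplained` returns is a rule whose reason you’d have to reconstruct before deciding whether it can go.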

The viral post’s core insight — that instruction bloat is real and degrades output — is correct. But “ask the AI to audit itself” is the blind leading the blind. Your instruction files are scar tissue from real failures. Treat them with respect.

OpenClaw is a self-hosted AI assistant that runs on your own server 24/7. It keeps its own memory, runs scheduled tasks, and learns your workflows over time.

I teach a class on setting up and getting the most from OpenClaw — details at themeperks.com/openclaw-course/.
