Common Failure Modes

When CLAUDE.md, taste skills, UX review agents, parallel review, or trust levels stop working, look here. The five most common failure modes per topic, with the fix for each.

If a workflow from one of the guides has stopped working, the cause is almost always one of the patterns below. Each entry is a real failure I have hit or watched others hit, with the fix that resolved it.

Jump to a topic:


CLAUDE.md

“My CLAUDE.md is 800 lines and Claude still hallucinates.” Too long. The model loses signal in noise. Cut to 200 lines max. Move long rule sets to .claude/rules/{topic}.md and reference them from CLAUDE.md instead of inlining them.

“Claude ignores my preferences.” Your preferences are buried under code-style boilerplate. Move the design-system-specific rules to the top, right after the project summary. Order matters.

“My team disagrees about what should be in CLAUDE.md.” Treat CLAUDE.md the same way you treat your component README files. Open a PR. Review it. Merge it. The disagreement is governance, not preference.

“Claude keeps making the same mistake even after I added it to the failure log.” The log entry is too vague. Bad: “stop using marketing words.” Good: “Banned words in product UI: seamless, unlock, empower, revolutionize. If you write one, regenerate without it.” Be specific enough that the rule is impossible to misread.

Related guide: CLAUDE.md for design systems.


Taste skill

“My rules are too general.” Rules like “use good spacing” do nothing. Rules like “use 4, 8, 16, 24, 48, 96” work. Every rule needs a number, hex code, or explicit list.

“I listed too many rules.” Over 30 rules and Claude starts ignoring the tail. Keep the core skill tight. Push edge cases into project CLAUDE.md files.

“I listed things I want to see, not things I hate.” Preferences are easy to write. Anti-rules are where your taste actually lives. If your skill has zero “never” statements, it is not a taste skill yet.

“The audit keeps passing but the output is still boring.” Add an authorship rule: “At least one element on the page must be specifically non-default.” Then list what counts as authored for you.

Related guide: Build Your Own Taste Skill.


UX review agent

“The skill doesn’t load.” Check cat ~/.claude/skills/ux-review/SKILL.md. If the content is missing or the frontmatter is malformed, fix and retry. The name: field in the frontmatter must match the folder name.

“The output is generic.” Your rubric is too general. Add project-specific or product-specific context. Be brutal about what you want.

“The skill ignores my project’s CLAUDE.md rules.” Skills and CLAUDE.md are separate. To have the review use project-specific rules (like your brand voice or motion rules), reference them in the skill: “Also check against the project’s CLAUDE.md at [path] before returning findings.”

“It returned 15 issues instead of 5.” The model sometimes ignores the count. Add a line at the very end of the skill: “Hard cap: 5 issues. If you wrote more, delete the lowest-severity ones before returning.”

Related guide: Your first UX review agent.


Parallel review workflows

“The parallel tasks are running serially.” You are invoking them one at a time instead of dispatching. Make sure your prompt says “in parallel” and “as subagents.” In the CLI, the dispatch happens automatically if phrased as subtasks.

“Reports are long and redundant.” Each lane is too broad. Tighten the rubrics. Cap each lane at 5 issues, minimum severity High.

“The merge step dropped important issues.” The parent agent is applying its own taste to the merge. Fix: tell it explicitly “do not drop any Blocking or High issues from any lane. Group them, do not filter them.”

“Running all four is expensive.” It should cost 4× a normal turn. For a production ship, that is still cheap relative to a real design review. If budget matters, run the 4 lanes only for high-stakes screens.

Related guide: Parallel review workflows.


Trust levels

“My agents are all at Junior because I don’t trust them enough to promote.” Track the metric. If 95%+ of the last 8 weeks of PRs were approved without changes, the agent has earned Senior. Refusing to promote is wasting your review time.

“My agents are all at Autonomous because it was easier to set up that way.” You skipped the levels. Write the verifier. Write the kill switch. Demote everything to Junior until both exist. Without verification and a kill switch, Autonomous is just unmonitored production access.

“The agent’s track record is great but I demoted it anyway.” That is fine. Trust is a tool for governance, not a measure of model intelligence. Sometimes you demote because the cost of a future mistake went up (a launch, a regulated domain, a high-stakes release). Promotion and demotion are reversible.

“Different teammates have different levels for the same agent.” Write it into the CLAUDE.md. The agent’s level is per-repo, per-path, per-agent, not per-person. If the team disagrees, the disagreement is about the system, not the agent.

Related guide: Trust levels: how agents earn autonomy.