Guardrails

Hard limits on what an AI agent can do, enforced by the system rather than the prompt: scoped permissions, protected files, required reviews, and kill switches.

What it is

Guardrails are the limits an AI agent cannot cross, no matter what it decides to do: read-only access tokens, protected branches, files it cannot touch, actions that always require human approval, and a kill switch that stops everything. The defining property: they are enforced by the system, not requested in the prompt.

Why this matters for designers

“I told it not to” is not a safety strategy. Models drift, prompts get truncated out of context, and agents misread instructions in ways that look reasonable. Guardrails are what let you say yes to agents at all: you can give an agent real work on your design system precisely because the blast radius of its worst mistake is bounded in advance.

How it works in practice

  1. Scope access mechanically: read-only tokens for observers, branch protection for actors.
  2. Mark protected zones (token sources, brand assets, release pipelines) that no agent edits directly.
  3. Gate irreversible actions (merge, publish, delete) behind human approval.
  4. Keep a kill switch that revokes all agent access in one step, and test that it works.
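
The four steps above can be sketched as a single authorization check that runs outside the agent, so the agent's own reasoning never gets a vote. This is a minimal illustration, not a real platform's API: the names AgentSession, authorize, kill_switch, PROTECTED_GLOBS, and GATED_ACTIONS, along with the example paths, are all assumptions made for this sketch. In a real setup the same rules would more likely live in your hosting platform's token scopes and branch protection settings.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

# Hypothetical protected zones: token sources, brand assets, release pipelines.
PROTECTED_GLOBS = ("tokens/*", "brand/*", "release/*")
# Irreversible actions that always need a named human approver.
GATED_ACTIONS = {"merge", "publish", "delete"}


class GuardrailError(PermissionError):
    """Raised when a request crosses a guardrail."""


@dataclass
class AgentSession:
    name: str
    read_only: bool = True   # observers stay read-only by default
    revoked: bool = False    # flipped by the kill switch


def authorize(session, action, path="", approved_by=None):
    """Raise GuardrailError unless the request is inside the agent's limits."""
    # Step 4: a revoked session can do nothing, not even read.
    if session.revoked:
        raise GuardrailError(f"{session.name}: access revoked by kill switch")
    if action == "read":
        return
    # Step 1: read-only sessions cannot perform any other action.
    if session.read_only:
        raise GuardrailError(f"{session.name}: read-only session cannot {action}")
    # Step 2: protected zones are never changed directly by an agent.
    if path and any(fnmatch(path, glob) for glob in PROTECTED_GLOBS):
        raise GuardrailError(f"{session.name}: {path} is a protected zone")
    # Step 3: irreversible actions require human approval.
    if action in GATED_ACTIONS and not approved_by:
        raise GuardrailError(f"{session.name}: {action} requires human approval")


def kill_switch(sessions):
    """Revoke every agent session in one step."""
    for session in sessions:
        session.revoked = True
```

A short usage example, again with made-up names: an actor session may edit components/button.css, is refused on tokens/color.json, and can only merge when approved_by names a person. Calling kill_switch on the list of sessions makes every later authorize call fail, which is also the simplest way to test that the switch works.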