MCP Production Readiness Gate
MCP-style integrations should not move from demo to production only because a server exposes tools. Use this gate to decide whether an MCP server is blocked, safe for prototype work, safe for read-only use, allowed to perform writes with human approval, or approved for production workflows.
Updated: 2026-06-13. Review note: drafted with AI assistance and intended for human review before any operational policy adoption.
Sources and observations: compare the server against the Model Context Protocol specification, the server's own repository or package page, its changelog/release notes, permission documentation, and your own sandbox observations. Record links to every source used before assigning a production gate.
Decision first: the five production gates
- Block: Do not connect the server to agents. Use when auth is unclear, broad credentials are required, writes cannot be bounded, or sensitive data exposure is not documented.
- Prototype only: Allow local or sandbox testing with fake data. No real user, customer, financial, legal, health, credential, or private company data.
- Read-only: Allow production reads when scopes are narrow, logs are available, and the server cannot mutate external state.
- Writes with approval: Allow write tools only after a human reviews a preview, diff, or dry-run output and explicitly approves the action.
- Production approved: Allow routine production use only when evidence, scopes, audit logs, maintainer policy, recovery paths, and human-approval boundaries are documented and periodically reviewed.
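Because the five gates form an ordered scale, they can be represented as an ordered enumeration so that tooling can compare them. The sketch below is illustrative only: the names `Gate`, `allows_production_reads`, `allows_production_writes`, and `lower_to` are assumptions introduced here, not part of MCP or any library.

```python
from enum import IntEnum


class Gate(IntEnum):
    """The five gates, ordered from most to least restrictive."""
    BLOCK = 0
    PROTOTYPE_ONLY = 1
    READ_ONLY = 2
    WRITES_WITH_APPROVAL = 3
    PRODUCTION_APPROVED = 4


def allows_production_reads(gate: Gate) -> bool:
    # Real data may be read only from the Read-only gate upward.
    return gate >= Gate.READ_ONLY


def allows_production_writes(gate: Gate) -> bool:
    # Write tools are available only from Writes with approval upward.
    return gate >= Gate.WRITES_WITH_APPROVAL


def lower_to(gate: Gate, ceiling: Gate) -> Gate:
    # Missing evidence can only lower a gate, never raise it.
    return min(gate, ceiling)
```

An ordered type keeps comparisons such as "at least Read-only" explicit and makes it hard to raise a gate by accident.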
Evidence checklist for an MCP server
For each item, record a link, screenshot, command output, or sandbox note. Missing evidence should lower the gate even if the server appears to work during a demo.
- Tools exposed: names, descriptions, input schemas, output shape, side effects, idempotency, and whether broad tools like run_command or execute_sql exist.
- Resources exposed: resource URIs, data categories, pagination limits, caching behavior, and whether private files, tickets, notes, emails, or database rows can be read.
- Prompts and sampling: whether server-provided prompts or sampling requests can cause model calls, tool loops, or hidden data movement. Disable or limit sampling for untrusted servers.
- Authentication: token type, rotation process, revocation path, credential storage, environment variables used, and whether secrets are redacted from errors and logs.
- Scopes and least privilege: read/write separation, per-workspace or per-project scoping, short-lived credentials, and ability to create a read-only credential.
- Data touched: exact systems, tables, folders, accounts, and data classes reached by the server, including personal, customer, financial, legal, health, confidential, and credential-like data.
- Write actions: every mutation tool, dry-run or preview support, rollback path, duplicate prevention, rate limits, and whether writes return stable IDs for verification.
- Audit logs: ability to identify agent identity, tool name, input summary, affected object IDs, approval evidence, timestamp, and actor responsible for follow-up.
- Maintainer and update policy: owner, release cadence, security issue process, pinned version, changelog, dependency risk, and who reviews upgrades before production use.
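The checklist above can be tracked mechanically. This is a minimal sketch, assuming evidence is kept as a mapping from checklist item to a list of links or notes; the key names and the helpers `missing_evidence` and `broad_tools` are hypothetical, and the set of broad tool names contains only the two examples named in the checklist.

```python
# Checklist items from this section, as hypothetical dictionary keys.
CHECKLIST_ITEMS = (
    "tools_exposed",
    "resources_exposed",
    "prompts_and_sampling",
    "authentication",
    "scopes_and_least_privilege",
    "data_touched",
    "write_actions",
    "audit_logs",
    "maintainer_and_update_policy",
)

# Only the two broad tools named in the checklist; extend per your own policy.
BROAD_TOOL_NAMES = {"run_command", "execute_sql"}


def missing_evidence(evidence: dict[str, list[str]]) -> list[str]:
    """Return checklist items with no recorded link, screenshot, or note."""
    return [item for item in CHECKLIST_ITEMS if not evidence.get(item)]


def broad_tools(tool_names: list[str]) -> list[str]:
    """Return exposed tool names that match the known broad-tool list."""
    return sorted(set(tool_names) & BROAD_TOOL_NAMES)


if __name__ == "__main__":
    evidence = {"tools_exposed": ["sandbox-note-1"], "authentication": []}
    print(missing_evidence(evidence))
    print(broad_tools(["list_issues", "run_command"]))
```

An item with an empty list counts as missing, which matches the rule that missing evidence should lower the gate even when the demo works.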
Scoring rubric
Score each dimension from 0 to 3. The score informs the gate, but the gate is stricter than the total: one critical failure can still require Block.
- 0: Missing evidence, unsafe default, or unknown behavior.
- 1: Works in a demo but relies on broad permissions, manual assumptions, or undocumented behavior.
- 2: Usable with explicit guardrails, narrow credentials, and human verification.
- 3: Production-ready evidence, narrow scopes, clear auditability, stable maintainer policy, and tested recovery path.
Dimensions to score
- Tool boundaries: tools are narrow, named by action, schema-validatable, and do not hide dangerous side effects behind generic verbs.
- Permission model: read and write scopes are separated, credentials can be rotated, and production use does not require all-powerful account keys.
- Error recovery: failures explain what happened, rate limits include retry timing, partial failures are represented cleanly, and repeated calls are safe or idempotent.
- Verification: every write can be read back by ID, audit logs show agent/tool identity, and state changes are deterministic enough for tests.
- Sensitive-data control: the server documents what data can be read or written, supports minimization, and does not expose secrets, private records, or regulated data without explicit review.
- Maintainer reliability: ownership, release notes, upgrade review, vulnerability handling, and dependency risk are clear enough for production operations.
Interpreting the total and assigning the gate
- 0–5: Block until the missing evidence is supplied and the server is rescored.
- 6–9: Prototype only with fake or sandbox data.
- 10–13: Read-only if credentials are narrowly scoped and logs are available; otherwise Prototype only.
- 14–16: Writes with approval if dry-run, preview, rollback, and audit evidence are present.
- 17–18: Production approved only after human review confirms no critical failure remains.
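The bands above can be expressed as a small lookup. The sketch below returns the highest gate a total could justify (a candidate, not a decision) and forces Block when a reviewer flags a critical failure. The function name `candidate_gate`, the dimension keys, and the `critical_failure` flag are assumptions introduced for illustration; the per-band conditions (narrow credentials, dry-run evidence, human review) still have to be confirmed by a person.

```python
# The six scored dimensions, as hypothetical dictionary keys.
DIMENSIONS = (
    "tool_boundaries",
    "permission_model",
    "error_recovery",
    "verification",
    "sensitive_data_control",
    "maintainer_reliability",
)

# (low, high, gate) bands copied from the interpretation list.
BANDS = (
    (0, 5, "Block"),
    (6, 9, "Prototype only"),
    (10, 13, "Read-only"),
    (14, 16, "Writes with approval"),
    (17, 18, "Production approved"),
)


def candidate_gate(scores: dict[str, int], critical_failure: bool = False) -> tuple[int, str]:
    """Return (total, candidate gate); a critical failure always yields Block."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError("scores must cover exactly the six dimensions")
    for name, value in scores.items():
        if value not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be scored 0, 1, 2, or 3")
    total = sum(scores.values())
    if critical_failure:
        # The gate is stricter than the total.
        return total, "Block"
    for low, high, gate in BANDS:
        if low <= total <= high:
            return total, gate
    raise AssertionError("unreachable: total is always within 0..18")
```

Returning the total alongside the gate keeps the score visible in review records even when a critical failure overrides it.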
Sensitive-data and human-approval boundaries
- Default to Prototype only when the server can touch customer records, private messages, source code secrets, payment data, legal documents, medical data, or HR data.
- Require a named human approver for account changes, external messages, financial actions, destructive operations, permission changes, production database writes, or irreversible state changes.
- Approval should include the requested tool, sanitized input summary, expected objects affected, rollback plan, and a read-back verification step.
- Never treat a successful demo as permission to process real sensitive data. Upgrade the gate only after evidence is recorded.
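One way to enforce the approval contents listed above is to refuse execution while any required field is empty. This is a sketch under assumptions: `ApprovalRequest`, its field names, and `may_execute` are hypothetical and do not correspond to any MCP API.

```python
from dataclasses import dataclass, fields


@dataclass
class ApprovalRequest:
    """Fields a human approver should see before a write runs."""
    tool: str
    sanitized_input_summary: str
    expected_objects: list[str]
    rollback_plan: str
    readback_step: str
    approver: str = ""  # named human; stays empty until someone approves

    def gaps(self) -> list[str]:
        """Return the names of fields that are empty or whitespace-only."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, str):
                value = value.strip()
            if not value:
                missing.append(f.name)
        return missing


def may_execute(request: ApprovalRequest) -> bool:
    # A write may proceed only when nothing is missing, including the approver.
    return not request.gaps()
```

Keeping the approver as an ordinary field means an unapproved request fails the same completeness check as one with no rollback plan.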
Worked scoring example
Hypothetical example: a GitHub issue triage MCP server exposes list_issues, get_issue, add_label, and comment_on_issue. The team pins the package version, creates a repository-scoped token, enables audit logs, and requires human approval before posting comments.
- Tool boundaries: 3 — tools are narrow and side effects are visible in the names.
- Permission model: 2 — repository-scoped token exists, but comments and labels still require write permissions.
- Error recovery: 2 — rate limits and API errors are visible, but duplicate comment prevention needs testing.
- Verification: 3 — labels and comments can be read back by issue ID and audit logs identify the token.
- Sensitive-data control: 2 — public issues are low risk, but private repos require separate review.
- Maintainer reliability: 2 — package is pinned and changelog exists, but upgrade review is manual.
Total: 14/18. Gate: Writes with approval. The server can be used for read-only triage and approved labeling/commenting, but should not post comments autonomously until duplicate prevention, approval records, and rollback procedures are tested.
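Duplicate prevention is the open item in this example, so a small test harness can help before autonomy is considered. The sketch below is one possible approach, not how any real server behaves: it fingerprints a comment by issue ID and whitespace-normalized, lowercased body, and skips posting when an existing comment matches. `comment_fingerprint` and `should_post` are hypothetical helpers, and the existing bodies are assumed to come from a read-back call such as the example's get_issue.

```python
import hashlib


def comment_fingerprint(issue_id: int, body: str) -> str:
    """Stable fingerprint of a comment, ignoring case and whitespace runs."""
    normalized = " ".join(body.split()).lower()
    payload = f"{issue_id}:{normalized}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def should_post(issue_id: int, body: str, existing_bodies: list[str]) -> bool:
    """Return False when an equivalent comment already exists on the issue."""
    target = comment_fingerprint(issue_id, body)
    return all(
        comment_fingerprint(issue_id, existing) != target
        for existing in existing_bodies
    )
```

The fingerprint can also be stored with the approval record so that a retried call is recognizable in the audit log.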
Review record template
- Server and version: name, package/repository URL, pinned version or commit.
- Evidence links: spec, docs, repo, changelog, permission docs, sandbox notes, audit-log sample.
- Assigned gate: Block / Prototype only / Read-only / Writes with approval / Production approved.
- Known limits: missing evidence, sensitive-data exclusions, write actions excluded, required human approvals.
- Next review: owner and date for re-checking version, permissions, and logs.
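The template above can be stored as structured data so that records are comparable across servers. This is a minimal sketch: the `ReviewRecord` class and its field names are assumptions, and the values in any instance you create are your own.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date

GATES = (
    "Block",
    "Prototype only",
    "Read-only",
    "Writes with approval",
    "Production approved",
)


@dataclass
class ReviewRecord:
    server_name: str
    source_url: str            # package or repository URL
    pinned_version: str        # version string or commit hash
    evidence_links: list[str]
    assigned_gate: str
    known_limits: list[str]
    next_review_owner: str
    next_review_date: str      # ISO 8601 date, e.g. "2026-09-13"

    def __post_init__(self) -> None:
        if self.assigned_gate not in GATES:
            raise ValueError(f"unknown gate: {self.assigned_gate!r}")
        # Raises ValueError if the date is not a valid ISO 8601 date.
        date.fromisoformat(self.next_review_date)

    def to_json(self) -> str:
        """Serialize the record for storage alongside the evidence."""
        return json.dumps(asdict(self), indent=2)
```

Validating the gate name and review date at construction time catches typos before a record is filed.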