
Workflow-to-Agent Suitability Calculator

Use this calculator before building an AI agent. It separates workflows that are merely annoying from workflows that are safe, observable, and valuable enough to automate.

Updated: June 13, 2026 · Review note: AI-assisted draft; requires human review by Cly before any public distribution. This calculator is a decision aid, not a compliance certification, security audit, or guarantee that autonomous execution is safe.

How the score works

Score each of the ten dimensions from 0 to 4. Positive dimensions add points directly; risk dimensions are reverse-scored, so a safer workflow receives more points. With ten dimensions at up to 4 points each, the maximum score is 40.

Value fit: task frequency and time cost should be high enough to justify automation.
Safety fit: exception rate, reversibility, blast radius, data sensitivity, and approval needs should be manageable.
Execution fit: observability, tool/API maturity, and evaluation ability should make the agent testable and debuggable.
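The scoring rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's actual implementation: the dimension identifiers are hypothetical, the split between positive and risk dimensions is inferred from which items the worked example labels "reverse-scored", and the `4 - rating` formula for reverse scoring is an assumption.

```python
from typing import Mapping

# Hypothetical identifiers for the ten dimensions.
# Assumption: the four risk dimensions are the ones the worked example
# marks as "reverse-scored"; the other six add their rating directly.
POSITIVE_DIMENSIONS = (
    "task_frequency", "time_cost", "reversibility",
    "observability", "tool_api_maturity", "evaluation_ability",
)
RISK_DIMENSIONS = (
    "exception_rate", "blast_radius", "data_sensitivity", "approval_needs",
)
MAX_POINTS = 4


def dimension_points(name: str, rating: int) -> int:
    """Convert a 0-4 rating into points, reversing risk dimensions."""
    if not 0 <= rating <= MAX_POINTS:
        raise ValueError(f"{name}: rating must be between 0 and {MAX_POINTS}")
    if name in RISK_DIMENSIONS:
        # Assumed reverse-scoring rule: more risk means fewer points.
        return MAX_POINTS - rating
    if name in POSITIVE_DIMENSIONS:
        return rating
    raise KeyError(name)


def total_score(ratings: Mapping[str, int]) -> int:
    """Sum the points over exactly the ten dimensions."""
    expected = set(POSITIVE_DIMENSIONS) | set(RISK_DIMENSIONS)
    if set(ratings) != expected:
        raise ValueError("ratings must cover exactly the ten dimensions")
    return sum(dimension_points(n, r) for n, r in ratings.items())


# Best case: every positive dimension rated 4, every risk rated 0.
best_case = {name: 4 for name in POSITIVE_DIMENSIONS}
best_case.update({name: 0 for name in RISK_DIMENSIONS})
print(total_score(best_case))  # 40
```

Keeping the reversal in one helper means the raw risk level and the points it earns cannot be confused when the total is computed.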

Calculator

Pick the closest description for each input. Treat uncertain answers as the lower-scoring option.
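The tie-breaking rule for uncertain answers can be expressed as taking the minimum of the candidate point values. The sketch below assumes each candidate is already expressed in points (0 to 4, after any reverse scoring); the function name is hypothetical.

```python
from typing import Iterable


def resolve_uncertain(candidate_points: Iterable[int]) -> int:
    """Return the lowest-scoring option among the plausible answers."""
    points = list(candidate_points)
    if not points:
        raise ValueError("at least one candidate is required")
    if any(not 0 <= p <= 4 for p in points):
        raise ValueError("points must be between 0 and 4")
    return min(points)


# Unsure whether a dimension deserves 2 or 3 points: record 2.
print(resolve_uncertain([2, 3]))  # 2
```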

1. Task frequency

2. Time cost per occurrence

3. Exception rate

4. Reversibility

5. Blast radius

6. Data sensitivity

7. Observability

8. Approval needs

9. Tool/API maturity

10. Evaluation ability

Result

Score: 16 / 40

Prototype only. Explore with read-only access or synthetic data. Do not let the agent take unsupervised production actions yet.

Decision bands

0–9 · Do not automate: too rare, too risky, too opaque, or too sensitive. Improve process boundaries before agent work.
10–17 · Prototype only: useful for learning, but keep it in a sandbox with synthetic, redacted, or copied sandbox data.
18–25 · Human-in-the-loop: the agent can draft, classify, research, or prepare actions, but humans approve before side effects.
26–33 · Limited production: safe for narrow production use with logging, rollback, and explicit thresholds.
34–40 · Strong candidate: high-value, low-risk, observable, and testable enough for serious agent implementation.
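The bands above amount to a range lookup. A minimal sketch follows, with the thresholds and labels taken from the list and the function name chosen for illustration:

```python
# (low, high, label) for each decision band, inclusive on both ends.
BANDS = (
    (0, 9, "Do not automate"),
    (10, 17, "Prototype only"),
    (18, 25, "Human-in-the-loop"),
    (26, 33, "Limited production"),
    (34, 40, "Strong candidate"),
)


def decision_band(score: int) -> str:
    """Map a total score (0-40) to its decision band label."""
    for low, high, label in BANDS:
        if low <= score <= high:
            return label
    raise ValueError("score must be between 0 and 40")


print(decision_band(16))  # Prototype only
print(decision_band(24))  # Human-in-the-loop
```

The band is a starting point for discussion; it does not replace the reviews described under the no-go boundaries.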

Worked example: weekly invoice follow-up drafts

Assumptions: a small company spends 90 minutes each week checking overdue invoices and drafting polite follow-up emails. The agent reads invoice status from an accounting API, drafts messages, and creates email drafts. A human reviews and sends each email. The agent cannot change payment terms, issue refunds, or send messages directly.

Task frequency: weekly = 2
Time cost: 45–120 minutes = 3
Exception rate: 10–20% need judgment, reverse-scored = 2
Reversibility: drafts can be deleted before sending = 4
Blast radius: limited customer segment, reverse-scored = 2
Data sensitivity: customer billing data, reverse-scored = 1
Observability: accounting IDs and email draft IDs are logged = 3
Approval needs: humans approve all sends, reverse-scored = 1
Tool/API maturity: stable accounting and email APIs = 3
Evaluation ability: overdue status and draft checklist can be tested = 3

Example score: 24 / 40. Decision: human-in-the-loop. Build a draft-only agent first, measure how often and how heavily reviewers edit the drafts, and do not allow autonomous sending until quality and policy checks have been proven.
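The arithmetic behind the example can be checked with a short script. The point values are copied from the list above (already reverse-scored where noted), and the grouping follows the value, safety, and execution fit categories; the dictionary keys are hypothetical identifiers.

```python
# Points from the worked example, after reverse scoring where it applies.
EXAMPLE_POINTS = {
    "task_frequency": 2,
    "time_cost": 3,
    "exception_rate": 2,
    "reversibility": 4,
    "blast_radius": 2,
    "data_sensitivity": 1,
    "observability": 3,
    "approval_needs": 1,
    "tool_api_maturity": 3,
    "evaluation_ability": 3,
}

# Grouping taken from the value / safety / execution fit descriptions.
GROUPS = {
    "value_fit": ("task_frequency", "time_cost"),
    "safety_fit": ("exception_rate", "reversibility", "blast_radius",
                   "data_sensitivity", "approval_needs"),
    "execution_fit": ("observability", "tool_api_maturity",
                      "evaluation_ability"),
}

subtotals = {
    group: sum(EXAMPLE_POINTS[name] for name in names)
    for group, names in GROUPS.items()
}
total = sum(EXAMPLE_POINTS.values())

print(subtotals)  # {'value_fit': 5, 'safety_fit': 10, 'execution_fit': 9}
print(total)      # 24
```

The subtotals show where the example loses points: safety fit earns 10 of a possible 20, mostly because of data sensitivity and approval needs, which matches the human-in-the-loop decision.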

No-go boundaries

Do not automate workflows that require legal, medical, financial, or employment decisions without accountable human review.
Do not give agents broad credentials, production write access, or access to secrets unless the workflow has scoped permissions and audit logs.
Do not automate customer-visible actions when rollback is impossible and success cannot be independently verified.
Do not use sensitive personal data for prototypes. Use synthetic, redacted, or copied sandbox data.
Do not treat a high score as permission to skip security review, privacy review, or domain-owner approval.
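One way to keep these boundaries from being outvoted by a high score is to treat them as hard gates that are evaluated separately from the numeric result. The sketch below is an assumption about how that could look; the class, field, and function names are hypothetical and each flag paraphrases one boundary from the list above.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class NoGoChecks:
    """Hypothetical flags; True means the boundary is violated."""
    high_stakes_decision_without_human_review: bool = False
    unscoped_credentials_or_no_audit_log: bool = False
    irreversible_unverifiable_customer_action: bool = False
    sensitive_personal_data_in_prototype: bool = False
    required_reviews_skipped: bool = False


def violations(checks: NoGoChecks) -> List[str]:
    """List the names of every violated boundary, in field order."""
    return [name for name, hit in asdict(checks).items() if hit]


def gated_decision(band: str, checks: NoGoChecks) -> str:
    """Return the band only if no boundary is violated."""
    hits = violations(checks)
    if hits:
        return "Blocked: " + ", ".join(hits)
    return band


print(gated_decision("Strong candidate", NoGoChecks()))
print(gated_decision(
    "Strong candidate",
    NoGoChecks(required_reviews_skipped=True),
))
```

Under this design a single violated boundary blocks the workflow regardless of its band, which mirrors the last item in the list.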

Human review disclosure

This calculator is a decision aid, not a substitute for human accountability. A workflow owner should review the assumptions, risk scoring, permissions, logging, and rollback plan before any production agent is enabled. High-impact actions should remain human-approved even when the score is strong.

Sources and observations

The scoring model is based on recurring agent-automation review signals: task value, exception rate, reversibility, blast radius, data sensitivity, observability, approval design, tool/API maturity, and evaluation quality. Useful public references:

OWASP API Security Top 10 2023 for API risk categories that often affect agent tools.
OpenAPI Specification 3.1 for machine-readable contract expectations.
Model Context Protocol specification for MCP-style tool and resource boundaries.