Every time someone wants to give an AI agent a new capability, the conversation goes straight to MCP. Build a server, expose tools, connect your agent. It works. But for personal automation, it is more infrastructure than you need, and it costs you more tokens than you think.
I want to make a case for a different approach: a CLI bundled inside an Agent Skill. Then I will show you exactly how to build it.
The token problem with MCP
MCP tool schemas load into the context window before a single message is processed. Every tool has a name, description, and parameter definitions. All of it sits in context for the entire session, whether the agent uses the tool or not.
The numbers are not small. According to fastn.ai, the GitHub MCP server alone uses 55,000 tokens across its 93 tool definitions. One team tracked their Task Master MCP integration consuming 45-50k tokens, nearly 25% of Claude Code’s 200k context window, before any real work begins. Cerebras reports that connecting just three services (GitHub, Slack, Sentry) puts over 55,000 tokens of tool definitions in context, with token usage 3-42x higher than equivalent CLI calls.
StackOne ran a direct comparison: MCP used 4-32x more tokens per operation than CLI, and failed 28% of the time. The CLI version cost a fraction of the price and succeeded more reliably.
vensas.de puts it concisely: a single tool call might cost 900-3,000 tokens total versus 15,000+ for the equivalent MCP call.
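To put those figures against the window size, here is a small sketch of the arithmetic. The function name context_share is mine, and the only inputs are the numbers quoted above.

```python
def context_share(schema_tokens: int, window_tokens: int = 200_000) -> float:
    """Fraction of the context window taken up by tool definitions."""
    return schema_tokens / window_tokens


# Figures quoted above, measured against a 200k-token window.
for label, tokens in [
    ("Task Master, low estimate", 45_000),
    ("Task Master, high estimate", 50_000),
    ("GitHub MCP server", 55_000),
]:
    print(f"{label}: {context_share(tokens):.1%}")
```

That works out to 22.5%, 25.0%, and 27.5% of the window, spent before the first message.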
With a CLI, the agent gets one line in the skill:
uv run mytool do-something --flag value
That is it. The CLI handles all the logic. The agent does not need to know the parameter schema, the return type, or the implementation. It runs the command and reads stdout. The token cost of the instruction is the length of that one line, plus whatever the command prints.
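From the agent's side, "runs the command and reads stdout" amounts to something like the sketch below. It is not how any particular agent is implemented. run_cli is a hypothetical helper, and it assumes the CLI prints a single JSON object. The demo call uses the Python interpreter as a stand-in for mytool so the snippet runs anywhere.

```python
import json
import subprocess
import sys


def run_cli(argv: list[str]) -> dict:
    """Run a command, fail loudly on a non-zero exit, and parse stdout as JSON."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout)


# Stand-in for `uv run mytool ...`: a one-liner that prints JSON to stdout.
stand_in = [sys.executable, "-c", 'import json; print(json.dumps({"ok": True}))']
result = run_cli(stand_in)
print(result["ok"])
```

Nothing about the command's parameters or return type has to live in context for this to work. The caller only needs the command line and a convention for the output.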
The portability problem with MCP
MCP requires per-client configuration. You build a server for Claude Code, then configure it again for Kiro, then again for Codex. Each client has its own way of connecting to MCP servers.
Agent Skills is an open standard supported by all of them. You write the skill once. It works everywhere. The CLI is a Python package. It runs anywhere Python runs.
When MCP is the right choice
MCP is not wrong. Use it when:
- The tool needs to run as a persistent service (database connections, long-running processes)
- Multiple users or agents share the same tool
- You need streaming responses
- You need fine-grained access control per tool call
For everything else (personal automation, daily workflows, one-off tasks), CLI + Skill is simpler.
The architecture
The system has three parts:
agents-stuff/                      # git repo, uv workspace root
├── pyproject.toml                 # workspace definition
├── skills/
│   └── mytool/                    # one folder per skill
│       ├── SKILL.md               # agent instructions
│       ├── pyproject.toml         # Python package
│       └── src/mytool/
│           └── cli.py             # the CLI
└── .venv/                         # shared virtual environment
The skill folder is both an Agent Skill (the SKILL.md) and a Python package (the pyproject.toml). The agent reads the skill to know when and how to use the tool. The agent runs the CLI to do the work.
Step 1: Set up the uv workspace
uv is a fast Python package manager that supports workspaces. A workspace lets you manage multiple Python packages in one repo with a shared virtual environment.
Create the repo and initialize it:
mkdir agents-stuff && cd agents-stuff
git init
uv init --no-package
Edit the root pyproject.toml to declare the workspace:
[project]
name = "agents-stuff"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["mytool"]
[tool.uv.workspace]
members = ["skills/mytool"]
[tool.uv.sources]
mytool = { workspace = true }
The members list tells uv which folders are workspace packages. The sources section tells uv to resolve mytool from the workspace rather than PyPI. Listing mytool under dependencies is what makes uv sync install it into the shared environment. When you add more skills later, you add them in all three places.
Step 2: Create the skill package
mkdir -p skills/mytool/src/mytool
touch skills/mytool/src/mytool/__init__.py
Create skills/mytool/pyproject.toml:
[project]
name = "mytool"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["typer>=0.15", "pydantic>=2"]
[project.scripts]
mytool = "mytool.cli:app"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
packages = ["src/mytool"]
The [project.scripts] entry registers mytool as a CLI entry point. After uv sync, the agent runs it with uv run mytool. The src/ layout keeps the package code separate from the skill metadata files.
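The "mytool.cli:app" string follows the usual entry-point form: a module path, a colon, then the object to call. A rough sketch of how such a reference resolves, using json:dumps as a stand-in so it runs without the package installed. resolve_entry_point is my name for illustration; real installers generate a small launcher script rather than calling a helper like this.

```python
from importlib import import_module


def resolve_entry_point(spec: str):
    """Split 'package.module:attr' and return the referenced object."""
    module_name, _, attr_path = spec.partition(":")
    obj = import_module(module_name)
    # The part after the colon may itself be dotted, e.g. "Class.method".
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```

For mytool, the resolved object is the Typer app, and calling it is what starts the CLI.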
Step 3: Write the CLI
Use Typer for the CLI and Pydantic for input validation. The key principle: validate everything before touching any external API. If the agent passes bad data, Pydantic rejects it with a clear error message before any side effects occur.
Ideally you would write this in a statically typed language like Rust, where the compiler rejects whole classes of type errors before the program runs. I wrote about this in Living with Agentic Engineering: the stronger your types, the better your experience with agents. Rust and Go give you that guarantee at compile time. Python does not. But Python with Pydantic is a good middle ground. You get runtime validation with clear error messages, model-level invariants, and a familiar ecosystem. For personal tooling, that is enough.
# skills/mytool/src/mytool/cli.py
import json

import typer
from pydantic import BaseModel, field_validator

app = typer.Typer(help="Do the thing.")


class Input(BaseModel):
    name: str
    count: int

    @field_validator("count")
    @classmethod
    def check_count(cls, v: int) -> int:
        if v <= 0:
            raise ValueError("count must be greater than 0")
        return v


def _run(data: Input, dry: bool) -> None:
    if dry:
        typer.echo(json.dumps({"ok": True, "dry_run": True, "input": data.model_dump()}))
        return
    # actual work here
    result = {"ok": True, "processed": data.name, "count": data.count}
    typer.echo(json.dumps(result))


@app.command()
def run(
    name: str = typer.Argument(..., help="Name to process."),
    count: int = typer.Argument(1, help="How many times."),
    dry_run: bool = typer.Option(False, "--dry-run", help="Simulate only."),
) -> None:
    """Process a name."""
    data = Input(name=name, count=count)
    _run(data, dry_run)


if __name__ == "__main__":
    app()
A few design decisions worth explaining:
Output as JSON. The agent reads stdout. Structured JSON is easier to parse and reason about than human-readable text. The agent can extract specific fields without string parsing.
Always include --dry-run. The agent can validate and preview before committing to any side effects. This is the single most important flag for agent-facing CLIs. It lets the agent show you what it would do before doing it.
Validate in the model, not in the command. Pydantic runs validation at Input(name=name, count=count). If validation fails, the error message is clear and structured. The command function stays clean.
Separate _run from the command. The _run function handles the dry-run logic and the actual work. The command function only handles argument parsing. This makes the CLI easier to test.
Step 4: Write the SKILL.md
The skill file is what connects the CLI to the agent. It has three jobs: tell the agent when to load the skill, tell the agent exactly what command to run, and tell the agent when to act without asking questions.
---
name: mytool
description: Use this skill when the user wants to process a name.
---
# My Tool
## Commands
Run from: /path/to/agents-stuff
uv run mytool <name> [count] [--dry-run]
Examples:
uv run mytool "foo" # process foo once
uv run mytool "foo" 3 # process foo three times
uv run mytool "foo" --dry-run # preview without executing
## Defaults
- count defaults to 1
## When to act immediately
If the user says "process foo", run `uv run mytool "foo" --dry-run` first,
show the output, then ask for confirmation before running without --dry-run.
If the user says "process foo, confirmed", run `uv run mytool "foo"` immediately.
The description field is what the agent uses to decide when to load the skill. Be specific. “Use this skill when the user wants to process a name” is better than “Use this skill for processing tasks.” The more specific the description, the less likely the agent loads the skill when it should not.
The “act immediately” rules encode your preferences about when the agent should ask for confirmation. For destructive or irreversible actions, always dry-run first. For safe, idempotent actions, you can tell the agent to act immediately.
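To make the role of the description concrete, here is a simplified sketch of reading the frontmatter from a SKILL.md. It handles only flat key: value lines, which is all the example above uses. A real loader would use a YAML parser, and how each agent matches the description against a request is up to that agent. read_frontmatter is a hypothetical helper.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the opening and closing `---` lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter found


SKILL = """---
name: mytool
description: Use this skill when the user wants to process a name.
---
# My Tool
"""

meta = read_frontmatter(SKILL)
print(meta["name"], "->", meta["description"])
```

Those two fields are the skill's whole footprint until it is loaded, which is why the wording of the description matters so much.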
Step 5: Install and symlink
uv sync
This installs all workspace packages into the shared .venv. The mytool command is now available via uv run mytool.
Symlink the skills folder into your agent’s config directory:
# for Kiro
ln -s /path/to/agents-stuff/skills ~/.kiro/skills
# for Claude Code
ln -s /path/to/agents-stuff/skills ~/.claude/skills
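The agent discovers the skill at startup. The skill is version-controlled in your git repo. Because the agent reads the files through the symlink and uv installs workspace members in editable mode, a change to the CLI is live as soon as you save it; commit to keep the history. A change to SKILL.md may need a new session, since skills are discovered at startup.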
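If you would rather script the link, or want a guard against clobbering an existing skills directory, a sketch in Python might look like this. link_skills is a hypothetical helper, and the paths in the usage comment are the same placeholders as above.

```python
from pathlib import Path


def link_skills(source: Path, link: Path) -> bool:
    """Create `link` pointing at `source`. Return False if it already points there."""
    source = source.resolve()
    if link.is_symlink() and link.resolve() == source:
        return False  # already linked, nothing to do
    if link.exists() or link.is_symlink():
        raise FileExistsError(f"{link} already exists; move it aside first")
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(source, target_is_directory=True)
    return True


# Example (placeholder paths):
# link_skills(Path("/path/to/agents-stuff/skills"), Path.home() / ".claude" / "skills")
```

The explicit check matters because ln -s into an existing directory creates the link inside it instead of replacing it.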
Step 6: Adding more skills
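Each new skill is a new workspace member. Add it to the root pyproject.toml in all three places: the project dependencies (so that uv sync installs it), the workspace members, and the sources:
[project]
# name, version, and requires-python stay as they are
dependencies = ["mytool", "anothertool"]
[tool.uv.workspace]
members = ["skills/mytool", "skills/anothertool"]
[tool.uv.sources]
mytool = { workspace = true }
anothertool = { workspace = true }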
Create the folder structure, write the CLI, write the skill. Run uv sync. Done.
The skills share the same virtual environment. Dependencies are deduplicated. The repo stays clean.
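The folder structure is the same every time, so it is a natural thing to script. A minimal sketch, assuming the layout from Step 2. scaffold_skill and the placeholder file contents are mine, and you would still edit the root pyproject.toml and run uv sync afterwards.

```python
from pathlib import Path

PYPROJECT = """[project]
name = "{name}"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["typer>=0.15", "pydantic>=2"]

[project.scripts]
{name} = "{name}.cli:app"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/{name}"]
"""

SKILL_MD = """---
name: {name}
description: TODO describe when to use this skill.
---
"""


def scaffold_skill(root: Path, name: str) -> Path:
    """Create skills/<name>/ with an empty package, pyproject.toml, and SKILL.md."""
    skill_dir = root / "skills" / name
    package_dir = skill_dir / "src" / name
    package_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite a skill
    (package_dir / "__init__.py").touch()
    (package_dir / "cli.py").write_text("# TODO: define `app = typer.Typer()` here\n")
    (skill_dir / "pyproject.toml").write_text(PYPROJECT.format(name=name))
    (skill_dir / "SKILL.md").write_text(SKILL_MD.format(name=name))
    return skill_dir


# Example: scaffold_skill(Path("/path/to/agents-stuff"), "anothertool")
```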
The full flow
User: "process foo"
Agent loads SKILL.md (description matches)
Agent reads the command: uv run mytool "foo"
Agent runs: uv run mytool "foo" --dry-run
CLI validates input with Pydantic
CLI prints JSON preview to stdout
Agent shows preview to user
User confirms
Agent runs: uv run mytool "foo"
CLI executes, prints JSON result
Done
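The same flow, written out as a sketch. The runner and confirm callables are stand-ins I introduce for illustration: in practice the agent runs the command and the user does the confirming. Here both are faked so the control flow is visible.

```python
import json
from typing import Callable


def preview_then_run(
    name: str,
    runner: Callable[[list[str]], str],
    confirm: Callable[[dict], bool],
) -> dict:
    """Dry-run first, show the preview, and only execute if confirmed."""
    base = ["uv", "run", "mytool", name]
    preview = json.loads(runner(base + ["--dry-run"]))
    if not confirm(preview):
        return {"ok": False, "cancelled": True}
    return json.loads(runner(base))


def fake_runner(argv: list[str]) -> str:
    """Stand-in for the real subprocess call; mimics the CLI's JSON output."""
    name = argv[3]
    if "--dry-run" in argv:
        return json.dumps({"ok": True, "dry_run": True, "input": {"name": name, "count": 1}})
    return json.dumps({"ok": True, "processed": name, "count": 1})


confirmed = preview_then_run("foo", fake_runner, confirm=lambda preview: True)
declined = preview_then_run("foo", fake_runner, confirm=lambda preview: False)
print(confirmed)
print(declined)
```

The second call never reaches the real command, which is the point of putting the dry run first.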
Compare this to MCP:
User: "process foo"
Agent has tool schema in context (always, every session, ~1,000+ tokens)
Agent calls tool with parameters
MCP server receives call over the protocol
Server executes
Server returns result
Done
The MCP flow is not wrong. But the schema is always in context. The server is always running. The configuration is per-client. For a personal tool you use once a day, that is overhead you do not need.
Summary
The case for CLI + Skill over MCP for personal automation:
- Token cost. MCP tool schemas load into context for every session. A CLI invocation is one line. StackOne measured 4-32x higher token usage with MCP.
- No server. The CLI runs as a subprocess when the agent calls it. It exits when done. No process to start, no port to configure, no service to keep alive.
- Portability. Agent Skills is an open standard. The same skill works in Kiro, Claude Code, Cursor, and Codex. MCP requires per-client configuration.
- Debuggability. The CLI is a Python script. Run it directly in your terminal. Add --dry-run. Read the source. There is no protocol layer to debug.
- Validation. Pydantic validates inputs before any API call. The agent cannot pass bad data silently.
The pattern is: encode your business logic in a CLI, encode your agent behavior in a skill, keep them separate, run with uv run.