mcp-see
Vision for AI agents, without the context bloat.
Images bloat context windows. Your agent needs to understand an image, not load every pixel. A 1 MB screenshot becomes thousands of tokens, and most of those tokens are background noise.
mcp-see flips the approach: agents request descriptions, not raw data.
Structured Extraction
Instead of loading an image into context, your agent asks questions about it:
- describe — "What's in this image?" Get a text summary.
- detect — "Find all the buttons." Get objects with bounding boxes.
- describe_region — "What's in the top-right corner?" Zoom into areas of interest.
- analyze_colors — "What's the color palette?" Extract design specs.
The agent gets structured understanding without the context cost. A screenshot that would burn 2,000 tokens becomes a 50-token description.
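As a rough sketch of what such a question looks like on the wire: MCP tools are invoked with a JSON-RPC 2.0 `tools/call` request carrying a tool name and an arguments object. The tool name `describe` comes from the list above; the argument names `image` and `prompt` are assumptions for illustration, not mcp-see's documented schema.

```python
import json
from itertools import count

_ids = count(1)  # simple request-id generator


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request for the given tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "image" and "prompt" are hypothetical argument names; check the server's
# tool listing for the real schema.
request = tool_call(
    "describe",
    {"image": "screenshot.png", "prompt": "What's in this image?"},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library builds and sends this request for you; the point is that only a small text payload travels in each direction.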
Hierarchical Analysis
The real power is in the workflow. Start broad, then zoom in:
- Overview first. "Describe this dashboard mockup."
- Find regions of interest. "Detect all charts and graphs."
- Zoom into specifics. "Describe the bar chart in the bottom-left."
Your agent builds understanding incrementally, only loading detail where it matters. No wasted context on the parts that don't need attention.
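The three steps above can be sketched as a small driver function. This is a minimal illustration, not mcp-see's own code: `call_tool` stands in for whatever MCP client you use, and the argument names (`image`, `prompt`, `bbox`) and the shape of the `detect` result are assumptions.

```python
from typing import Callable

# A tool caller takes (tool name, arguments) and returns the tool's result.
ToolCaller = Callable[[str, dict], object]


def analyze(call_tool: ToolCaller, image: str, targets: str) -> dict:
    """Overview first, then detection, then one zoomed description per hit."""
    overview = call_tool("describe", {"image": image})
    regions = call_tool("detect", {"image": image, "prompt": targets})
    details = [
        call_tool("describe_region", {"image": image, "bbox": r["bbox"]})
        for r in regions
    ]
    return {"overview": overview, "regions": regions, "details": details}


def fake_call_tool(name: str, arguments: dict) -> object:
    """Stand-in for a real MCP client so the sketch runs offline."""
    if name == "detect":
        # Hypothetical result shape: a label plus a pixel bounding box.
        return [{"label": "bar chart", "bbox": [0, 300, 400, 600]}]
    return f"{name} result for {arguments.get('bbox', 'full image')}"


report = analyze(fake_call_tool, "dashboard.png", "charts and graphs")
print(report["details"])
```

Only the regions that `detect` returns get a follow-up call, so the detail the agent pays for scales with what it actually needs to look at.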
Design Workflow
mcp-see shines for design-to-code workflows. An agent can:
- Get an overview of a mockup
- Detect UI components (buttons, inputs, cards)
- Extract colors and spacing from specific regions
- Generate code without ever loading the full image
The mockup stays outside the context window. The structured specs go in.
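To make the last step concrete, here is a hedged sketch of code generation that consumes only a structured spec. The component dictionary is a hypothetical shape (a label, a pixel bounding box as `[x1, y1, x2, y2]`, and a hex color); mcp-see's actual output format may differ.

```python
def to_css(component: dict) -> str:
    """Render one detected component as a CSS rule from its spec alone."""
    x1, y1, x2, y2 = component["bbox"]
    return (
        f".{component['label']} {{\n"
        f"  width: {x2 - x1}px;\n"
        f"  height: {y2 - y1}px;\n"
        f"  background: {component['color']};\n"
        f"}}"
    )


# Hypothetical spec, as it might be assembled from detect and analyze_colors.
button = {"label": "button", "bbox": [20, 40, 140, 80], "color": "#0b5fff"}
css = to_css(button)
print(css)
```

The function never touches image data: everything it needs fits in a few dozen characters of structured text.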
Multi-Provider
mcp-see works with multiple vision providers:
- Gemini — Google's vision models
- OpenAI — GPT-4 Vision
- Claude — Anthropic's vision capabilities
Configure your preferred provider. The tools work the same way regardless.
Try It
Add to your MCP configuration:
{
  "mcpServers": {
    "mcp-see": {
      "command": "npx",
      "args": ["github:sanity-io/mcp-see"],
      "env": {
        "GEMINI_API_KEY": "your-key"
      }
    }
  }
}
Works with Gemini (free tier available), OpenAI, or Anthropic. Set the API key for your preferred provider.
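If you already have other servers configured, the entry needs to be merged in rather than pasted over the file. A minimal sketch, assuming your client's configuration has been loaded into a dictionary with an `mcpServers` key (where that file lives depends on your MCP client, so loading and saving are left out):

```python
import json


def add_mcp_see(config: dict, api_key: str) -> dict:
    """Return a copy of `config` with the mcp-see server entry added."""
    servers = dict(config.get("mcpServers", {}))  # copy; keep existing servers
    servers["mcp-see"] = {
        "command": "npx",
        "args": ["github:sanity-io/mcp-see"],
        "env": {"GEMINI_API_KEY": api_key},
    }
    return {**config, "mcpServers": servers}


# "other-server" is a made-up entry standing in for whatever you already run.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
updated = add_mcp_see(existing, "your-key")
print(json.dumps(updated, indent=2))
```

The function returns a new dictionary instead of mutating its input, so the original configuration stays intact if you decide not to write the result back.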
Your agent can now see without seeing.
→ GitHub
Built with Miriad.