Jay Taylor's notes
steipete/Peekaboo: Peekaboo is a macOS CLI & optional MCP server that enables AI agents to capture screenshots of applications, or the entire system, with optional visual question answering through local or remote AI models.
Original source (github.com)
Clipped on: 2025-12-04
steipete/Peekaboo
Latest commit: 8f60458 (Dec 3, 2025)
Peekaboo brings high-fidelity screen capture, AI analysis, and complete GUI automation to macOS. Version 3 adds native agent flows and multi-screen automation across the CLI and MCP server.
Note: v3 is currently in beta (3.0.0-beta1) and has a few known issues; see the changelog for details.
- Pixel-accurate captures (windows, screens, menu bar) with optional Retina 2x scaling.
- Natural-language agent that chains Peekaboo tools (see, click, type, scroll, hotkey, menu, window, app, dock, space).
- Menu and menubar discovery with structured JSON; no clicks required.
- Multi-provider AI: GPT-5.1 family, Claude 4.x, Grok 4-fast (vision), Gemini 2.5, and local Ollama models.
- MCP server for Claude Desktop and Cursor plus a native CLI; the same tools in both.
- Configurable, testable workflows with reproducible sessions and strict typing.
- Requires macOS Screen Recording + Accessibility permissions (see docs/permissions.md).
- macOS app + CLI (Homebrew): `brew install steipete/tap/peekaboo`
- MCP server (Node 22+, no global install needed): `npx -y @steipete/peekaboo`
```bash
# Capture full screen at Retina scale and save to Desktop
peekaboo image --mode screen --retina --path ~/Desktop/screen.png

# Click a button by label (captures, resolves, and clicks in one go).
# Command substitution keeps SID in the current shell under both bash and zsh;
# piping into `read` would lose it in bash.
SID=$(peekaboo see --app Safari --json-output | jq -r '.data.session_id')
peekaboo click --on "Reload this page" --session "$SID"

# Run a natural-language automation
peekaboo "Open Notes and create a TODO list with three items"

# Run as an MCP server (Claude/Cursor)
npx -y @steipete/peekaboo

# Minimal Claude Desktop config snippet (Developer → Edit Config):
# {
#   "mcpServers": {
#     "peekaboo": {
#       "command": "npx",
#       "args": ["-y", "@steipete/peekaboo"],
#       "env": {
#         "PEEKABOO_AI_PROVIDERS": "openai/gpt-5.1,anthropic/claude-opus-4"
#       }
#     }
#   }
# }
```
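The capture-then-click flow can be wrapped in a small shell function that fails early when no session ID comes back. This is a minimal sketch, not part of Peekaboo: `click_by_label` is a hypothetical helper name, and it assumes `peekaboo` and `jq` are on `PATH` and that `see --json-output` reports the session ID at `.data.session_id`, as in the quick-start snippet.

```bash
# Hypothetical helper (not shipped with Peekaboo): capture an app's UI,
# then click an element by its label using the session from that capture.
# Assumes the session ID is at .data.session_id in the `see` JSON output.
click_by_label() {
  local app="$1" label="$2" sid
  sid=$(peekaboo see --app "$app" --json-output | jq -r '.data.session_id')
  # jq -r prints the string "null" when the field is missing; treat that
  # and an empty result as a failed capture.
  if [ -z "$sid" ] || [ "$sid" = "null" ]; then
    echo "no session ID returned for $app" >&2
    return 1
  fi
  peekaboo click --on "$label" --session "$sid"
}
```

With the function defined, `click_by_label Safari "Reload this page"` performs the same two steps as the quick-start example and returns a non-zero status if the capture did not yield a session.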
| Command | Key flags / subcommands | What it does |
|---|---|---|
| `see` | `--app`, `--mode screen/window`, `--retina`, `--json-output` | Capture and annotate UI, return session + element IDs |
| `click` | `--on <id/query>`, `--session`, `--wait`, coords | Click by element ID, label, or coordinates |
| `type` | `--text`, `--clear`, `--delay-ms` | Enter text with pacing options |
| `press` | key names, `--repeat` | Special keys and sequences |
| `hotkey` | combos like `cmd,shift,t` | Modifier combos (cmd/ctrl/alt/shift) |
| `scroll` | `--on <id>`, `--direction up/down`, `--ticks` | Scroll views or elements |
| `swipe` | `--from`/`--to`, `--duration`, `--steps` | Smooth gesture-style drags |
| `drag` | `--from`/`--to`, modifiers, Dock/Trash targets | Drag-and-drop between elements/coords |
| `move` | `--to <id/coords>`, `--screen-index` | Position the cursor without clicking |
| `window` | `list`, `move`, `resize`, `focus`, `set-bounds` | Move/resize/focus windows and Spaces |
| `app` | `launch`, `quit`, `relaunch`, `switch`, `list` | Launch, quit, relaunch, switch apps |
| `space` | `list`, `switch`, `move-window` | List or switch macOS Spaces |
| `menu` | `list`, `list-all`, `click`, `click-extra` | List/click app menus and extras |
| `menubar` | `list`, `click` | Target status-bar items by name/index |
| `dock` | `launch`, `right-click`, `hide`, `show`, `list` | Interact with Dock items |
| `dialog` | `list`, `click`, `input`, `file`, `dismiss` | Drive system dialogs (open/save/etc.) |
| `image` | `--mode screen/window/menu`, `--retina`, `--analyze` | Screenshot screen/window/menu bar (+analyze) |
| `list` | `apps`, `windows`, `screens`, `menubar`, `permissions` | Enumerate apps, windows, screens, permissions |
| `tools` | `--source native\|mcp`, `--server <name>` | Inspect native + MCP tools |
| `config` | `init`, `show`, `add`, `login`, `models` | Manage credentials/providers/settings |
| `permissions` | `status`, `grant` | Check/grant required macOS permissions |
| `run` | `.peekaboo.json`, `--output`, `--no-fail-fast` | Execute .peekaboo.json automation scripts |
| `sleep` | `--duration` (ms) | Millisecond delays between steps |
| `clean` | `--all-sessions`, `--older-than`, `--session` | Prune sessions and caches |
| `agent` | `--model`, `--dry-run`, `--resume`, `--max-steps`, audio | Natural-language multi-step automation |
| `mcp` | `serve`, `list`, `add`, `enable/disable`, `test` | Manage external MCP servers and serve Peekaboo |
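Several of the commands in the table can be chained from a plain shell script. The sketch below is an illustration under assumptions, not a documented workflow: `step` and `type_and_scroll` are hypothetical names, the `DRY_RUN` variable is a convention of this example only, and the argument values (for instance `--delay-ms 20`) are guesses at reasonable forms. Only the flag names come from the table; check docs/commands/ for the exact syntax.

```bash
# Hypothetical wrapper (not part of Peekaboo): when DRY_RUN=1, print the
# command instead of running it, so a script can be reviewed first.
step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical workflow using only flags listed in the command table:
# type some text, pause for 500 ms, then scroll down three ticks.
# Stops at the first step that fails.
type_and_scroll() {
  local text="$1"
  step peekaboo type --text "$text" --delay-ms 20 || return 1
  step peekaboo sleep --duration 500 || return 1
  step peekaboo scroll --direction down --ticks 3
}
```

Running `DRY_RUN=1 type_and_scroll "hello"` prints the three commands without touching the GUI, which is a cheap way to check a script before granting it control of the desktop.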
- OpenAI: GPT-5.1 (default) and GPT-4.1/4o vision
- Anthropic: Claude 4.x
- xAI: Grok 4-fast reasoning + vision
- Google: Gemini 2.5 (pro/flash)
- Local: Ollama (llama3.3, llava, etc.)
Set providers via PEEKABOO_AI_PROVIDERS or peekaboo config add.
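As a rough illustration of the `PEEKABOO_AI_PROVIDERS` format, the sketch below splits a value into provider and model parts. It assumes the format seen in the Claude Desktop config example (comma-separated `provider/model` entries); `list_providers` is a hypothetical helper, not a Peekaboo command, and it says nothing about how Peekaboo itself interprets the list.

```bash
# Hypothetical helper: print one "provider<TAB>model" line per entry of a
# PEEKABOO_AI_PROVIDERS-style value. Assumes every entry has the form
# provider/model and splits each entry at its first slash.
list_providers() {
  local rest="$1" entry
  while [ -n "$rest" ]; do
    entry="${rest%%,*}"                       # text before the first comma
    printf '%s\t%s\n' "${entry%%/*}" "${entry#*/}"
    case "$rest" in
      *,*) rest="${rest#*,}" ;;               # drop the entry just printed
      *)   rest="" ;;                         # that was the last entry
    esac
  done
}

# Same value as in the Claude Desktop config example.
export PEEKABOO_AI_PROVIDERS="openai/gpt-5.1,anthropic/claude-opus-4"
list_providers "$PEEKABOO_AI_PROVIDERS"
```

A quick check like this can catch a stray space or a missing slash before the value is placed in an MCP client config.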
- Command reference: docs/commands/
- Architecture: docs/ARCHITECTURE.md
- Building from source: docs/building.md
- Testing guide: docs/testing/tools.md
- MCP setup: docs/commands/mcp.md
- Permissions: docs/permissions.md
- Ollama/local models: docs/ollama.md
- Agent chat loop: docs/agent-chat.md
- Service API reference: docs/service-api-reference.md
- Requirements: macOS 15+, Xcode 16+/Swift 6.2. Node 22+ only if you run the pnpm docs/build helper scripts (core CLI/app/MCP are Swift-only).
- Install deps: `pnpm install`, then `pnpm run build:cli` or `pnpm run test:safe`.
- Lint/format: `pnpm run lint && pnpm run format`.
License: MIT