Test any macOS app with AI agents

Browser automation tools like Puppeteer and Playwright only reach what runs in a browser. But your users also interact with native apps, desktop software, and simulators. Agent Vision lets AI agents test any macOS application the same way a human would: by looking at the screen, finding elements, and interacting with them.

Without Agent Vision

Your AI agent can only test web apps. Native macOS applications, iOS Simulators, Electron apps, and desktop software are invisible. QA for these requires manual testing or fragile AppleScript hacks that break whenever the UI changes. You maintain separate test approaches for web and native, doubling your effort.

With Agent Vision

Your AI agent screenshots the app, discovers every button, label, and input field through the macOS Accessibility API, then interacts with them using precise coordinates. It works the same whether the target is Safari, Xcode, Figma, or your custom SwiftUI app. One approach for everything on screen.
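That observe, discover, act, verify loop can be sketched end to end. This is a minimal dry-run sketch, not a verified script: the flags mirror the commands documented below, while the session ID handling and the element ID el-btn-001 are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Minimal sketch of the loop: capture -> discover -> act -> re-capture.
# Flags mirror the commands documented on this page; the session ID and
# the element ID (el-btn-001) are illustrative assumptions.
set -u

LOG=""
av() {                       # echo each command; run it only if installed
  LOG+="agent-vision $*"$'\n'
  echo "+ agent-vision $*"
  command -v agent-vision >/dev/null 2>&1 && agent-vision "$@"
  return 0
}

SID="sess-001"               # assumed session ID returned by `start`

av start --region 0,0,1440,900                  # lock onto the app window's region
av capture --session "$SID" --format png        # screenshot for the agent to analyze
av elements --session "$SID"                    # discover buttons, fields, labels
av click --element el-btn-001 --session "$SID"  # act on a discovered element
av capture --session "$SID" --format png        # re-capture to verify the change
```

The same five commands work unchanged whether the region contains Safari, Xcode, or a custom SwiftUI app; only the discovered element IDs differ.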


How it works

Start a session targeting the app window

$ agent-vision start --region 0,0,1440,900

Lock onto a screen region. Agent Vision tracks this region across captures; the session ID it returns is referenced as $SID in the commands below.

Capture the current state

$ agent-vision capture --session $SID --format png

Take a screenshot your AI agent can analyze. Returns a PNG path.

Discover all interactive elements

$ agent-vision elements --session $SID

Returns every button, text field, link, and label with coordinates and accessibility info.

Filter for specific element types

$ agent-vision elements --session $SID --filter button

Narrow discovery to just buttons, inputs, or any specific element type.

Click a discovered element

$ agent-vision click --element el-btn-001 --session $SID

Interact with an element by its discovered ID. Coordinates are mapped automatically.

Re-capture to verify the result

$ agent-vision capture --session $SID

After acting, screenshot again to verify the UI changed as expected.

Real scenario

Example: Testing a login flow in an iOS Simulator

1. AI agent starts an Agent Vision session targeting the Simulator window
2. Captures a screenshot and identifies the email field, password field, and login button
3. Types test credentials into each field using agent-vision type
4. Clicks the login button using agent-vision click
5. Re-captures and verifies the dashboard screen loaded (no error banners, expected elements present)
6. Reports pass/fail with annotated screenshots as evidence
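The steps above can be sketched as a single script. This is a hedged dry-run sketch: the commands mirror the examples on this page, but the flags for agent-vision type, the element IDs (el-field-email, el-field-password, el-btn-login), and the window region are illustrative assumptions, not documented API.

```shell
#!/usr/bin/env bash
# Sketch of the login-flow scenario above. The `type` flags, element IDs,
# and region geometry are illustrative assumptions.
set -u

LOG=""
av() {                       # echo each command; run it only if installed
  LOG+="agent-vision $*"$'\n'
  echo "+ agent-vision $*"
  command -v agent-vision >/dev/null 2>&1 && agent-vision "$@"
  return 0
}

SID="sess-001"               # assumed session ID returned by `start`

av start --region 0,0,1440,900            # 1. target the Simulator window
av capture --session "$SID" --format png  # 2. screenshot the login screen
av elements --session "$SID"              # 2. discover fields and buttons
av type --element el-field-email --session "$SID" --text "qa@example.com"  # 3.
av type --element el-field-password --session "$SID" --text "s3cret"       # 3.
av click --element el-btn-login --session "$SID"   # 4. submit
av capture --session "$SID" --format png  # 5. re-capture to verify the dashboard
```

In practice the AI agent would compare the final capture against the expected dashboard state (step 6) before reporting pass or fail.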

Try it now

$ brew tap rvanbaalen/agent-vision
$ brew install agent-vision

Requires macOS 13+ · No dependencies · ~4MB
