Browser automation tools like Puppeteer and Playwright stop at the browser's edge. But your users also interact with native apps, desktop software, and simulators. Agent Vision lets AI agents test any macOS application the same way a human would: by looking at the screen, finding elements, and interacting with them.
Your AI agent can only test web apps. Native macOS applications, iOS Simulators, Electron apps, and desktop software are invisible to it. QA for these means manual testing or fragile AppleScript hacks that break whenever the UI changes. You end up maintaining separate test approaches for web and native, doubling your effort.
Your AI agent screenshots the app, discovers every button, label, and input field through the macOS Accessibility API, then interacts with them using precise coordinates. It works the same whether the target is Safari, Xcode, Figma, or your custom SwiftUI app. One approach for everything on screen.
Start a session targeting the app window
Lock onto a screen region. Agent Vision tracks this region across captures.
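A minimal sketch of what a session with a locked region could look like. This is a hypothetical Python model, not Agent Vision's actual API: `Session`, `Region`, and their field names are illustrative. The point it shows is that the region is fixed once and reused for every later capture.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    """A screen rectangle the session locks onto (hypothetical model)."""
    x: int
    y: int
    width: int
    height: int

class Session:
    def __init__(self, app_name: str, region: Region):
        self.app_name = app_name
        self.region = region        # tracked across every capture
        self.capture_count = 0      # incremented by each screenshot

# Lock onto the app window's region once, at session start.
session = Session("MyApp", Region(x=0, y=0, width=1280, height=800))
```

Freezing the region up front means element coordinates stay comparable between captures, even as the UI inside the region changes.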
Capture the current state
Take a screenshot your AI agent can analyze. Returns a PNG path.
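A sketch of the capture contract described above ("returns a PNG path"), simulated with a stand-in writer so it runs anywhere. The `capture` function and its naming scheme are assumptions; only the shape of the result, a path to a PNG on disk, comes from the page.

```python
import os
import tempfile

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # real PNG files start with these 8 bytes

def capture(session_dir: str, step: int) -> str:
    """Hypothetical capture: write a (fake) PNG and return its path."""
    path = os.path.join(session_dir, f"capture-{step:03d}.png")
    with open(path, "wb") as f:
        f.write(PNG_MAGIC)  # stand-in for real screenshot bytes
    return path

with tempfile.TemporaryDirectory() as workdir:
    shot = capture(workdir, 1)
    # The agent receives a file path it can load and analyze.
    exists = os.path.exists(shot)
```

Returning a path rather than raw bytes keeps the protocol simple: the agent can open, diff, or archive the file with ordinary tooling.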
Discover all interactive elements
Returns every button, text field, link, and label with coordinates and accessibility info.
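A sketch of the element record discovery could return. The field names here are assumptions, not Agent Vision's actual schema; the page only guarantees that each element carries coordinates and accessibility info.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    """One discovered UI element (hypothetical schema)."""
    id: int
    role: str     # accessibility role, e.g. "button" or "textField"
    label: str    # accessibility label, if the app exposes one
    x: float      # frame origin within the locked region
    y: float
    width: float
    height: float

# A discovery pass might produce a flat list like this:
elements = [
    Element(1, "button", "Save", 20, 40, 80, 28),
    Element(2, "textField", "Name", 20, 80, 200, 24),
    Element(3, "link", "Help", 240, 40, 40, 18),
]
```

Because every element has a stable `id` plus a frame, the agent can reason about the UI symbolically instead of guessing pixel positions from the screenshot alone.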
Filter for specific element types
Narrow discovery to just buttons, inputs, or any specific element type.
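Narrowing discovery to one element type is conceptually just a filter over the discovered list. This sketch assumes a `role` field on each element record; the actual filter parameter Agent Vision exposes may differ.

```python
def filter_by_role(elements: list[dict], role: str) -> list[dict]:
    """Keep only elements whose accessibility role matches (hypothetical)."""
    return [e for e in elements if e["role"] == role]

elements = [
    {"id": 1, "role": "button", "label": "Save"},
    {"id": 2, "role": "textField", "label": "Name"},
    {"id": 3, "role": "button", "label": "Cancel"},
]

buttons = filter_by_role(elements, "button")
# buttons now holds only "Save" and "Cancel"
```

Filtering server-side keeps the agent's context small: on a dense UI, asking for just buttons can cut the element list from hundreds of entries to a handful.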
Click a discovered element
Interact with an element by its discovered ID. Coordinates are mapped automatically.
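One plausible reading of "coordinates are mapped automatically" is sketched below: resolve the element's frame to its center point, offset by the locked region's origin, and dispatch the click there. The function name and dict keys are illustrative assumptions.

```python
def click_target(element: dict, region_origin: tuple = (0, 0)) -> tuple:
    """Map an element's frame to the screen point a click would hit
    (hypothetical model of automatic coordinate mapping)."""
    cx = region_origin[0] + element["x"] + element["width"] / 2
    cy = region_origin[1] + element["y"] + element["height"] / 2
    return (cx, cy)

save_button = {"id": 1, "x": 20, "y": 40, "width": 80, "height": 28}

# With the locked region starting at (100, 50) on screen, the click
# lands at the button's center in absolute screen coordinates.
point = click_target(save_button, region_origin=(100, 50))
```

Clicking by discovered ID rather than raw pixels is what keeps tests stable when the window moves or the layout shifts slightly.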
Re-capture to verify the result
After acting, screenshot again to verify the UI changed as expected.
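The cheapest form of the capture-act-verify loop can be sketched as a hash comparison of the before and after screenshots; a real agent would go further and analyze the new image's contents. The bytes below are placeholders, not real PNG data.

```python
import hashlib

def digest(png_bytes: bytes) -> str:
    """Fingerprint a screenshot's raw bytes."""
    return hashlib.sha256(png_bytes).hexdigest()

# Placeholder byte strings standing in for two captured frames.
before = b"\x89PNG...frame-with-dialog-closed"
after = b"\x89PNG...frame-with-dialog-open"

# A differing digest confirms that *something* on screen changed;
# whether it changed as expected is the agent's job to judge.
changed = digest(before) != digest(after)
```

Hashing is a fast pre-check: if nothing changed at all, the agent can skip the expensive visual analysis and report the action as a no-op.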
Requires macOS 13+ · No dependencies · ~4MB