Automate any web app — no API needed

Most web apps have terrible APIs, rate-limited APIs, or no APIs at all. Agent Vision doesn't need one. Point it at a browser window and interact with the web app the same way a human would: by seeing the page and clicking things. Update Jira tickets, export Notion tables, fill Google Forms, navigate legacy enterprise admin panels. No API keys, no webhooks, no OAuth flows. It's the last-mile automation tool.

Without Agent Vision

You need API access for every web app you want to automate. Half the apps don't have public APIs. The ones that do require OAuth setup, API keys, rate limit management, and pagination handling. Legacy enterprise tools have internal APIs that IT won't give you access to. You end up doing it manually or building brittle Selenium scripts that break every time the CSS changes.

With Agent Vision

Point Agent Vision at the browser window. It discovers buttons, links, form fields, and text on the page through the macOS Accessibility API — no DOM access needed. Your AI agent reads the page visually, clicks buttons, fills forms, and navigates between pages. It works with any web app because it works at the screen level, not the API level.

Commands

How it works

Target the browser window

terminal

$ agent-vision start --region 0,0,1440,900 --name browser

Lock onto the browser window where the web app is open. The commands below refer to this session as $SID, so keep the session ID in that shell variable.

Discover page elements

terminal

$ agent-vision elements --session $SID

Find all interactive elements on the current page: buttons, links, inputs, dropdowns.

Click a navigation element

terminal

$ agent-vision click --element el-link-005 --session $SID

Navigate to a different page or section by clicking links and menu items.

Fill a form field

terminal

$ agent-vision type --element el-input-003 --text "PROJ-1234: Bug fix" --session $SID

Type into any input field on the page. Works with search boxes, text areas, and inline editors.

Submit or save

terminal

$ agent-vision click --element el-btn-save --session $SID

Click the save/submit/update button to commit the changes.
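
The fill and save steps above can be chained in a small shell function. This is a minimal sketch that uses only the type and click commands shown on this page. The AV_BIN variable and the fill_and_save name are illustrative assumptions, not part of Agent Vision. Setting AV_BIN=echo prints the commands instead of running them, which is handy for a dry run.

```shell
# Sketch only. AV_BIN is a hypothetical override (not an Agent Vision feature):
# set AV_BIN=echo to print the commands instead of running them.
AV_BIN="${AV_BIN:-agent-vision}"

# fill_and_save SESSION INPUT_ID TEXT SAVE_ID
# Types TEXT into the input element, then clicks the save button.
# Stops before the click if typing fails.
fill_and_save() {
  "$AV_BIN" type --element "$2" --text "$3" --session "$1" || return 1
  "$AV_BIN" click --element "$4" --session "$1"
}
```

For example: fill_and_save "$SID" el-input-003 "PROJ-1234: Bug fix" el-btn-save. Element IDs come from the discovery step, so run that first on the page you are editing.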

Real scenario

Example: Updating 20 Jira ticket statuses

workflow

1. Agent captures the Jira board in the browser and identifies all ticket cards

2. Agent clicks the first ticket to open its detail view

3. Agent discovers the status dropdown and clicks it to reveal options

4. Agent selects "In Review" from the dropdown list

5. Agent clicks the back button to return to the board view

6. Agent repeats for each remaining ticket, re-scanning after every navigation
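
The workflow above could be scripted roughly as follows. Treat this as a sketch under stated assumptions, not the way Agent Vision itself drives the page: the output format of the elements command is not documented here, so find_id assumes one element per line with the ID in the first column and the label text on the same line. The "Status" and "Back" labels, the AV_BIN override, and all function names are hypothetical as well.

```shell
# Sketch only. AV_BIN is a hypothetical override so the flow can be tested
# against a stub; it defaults to the real CLI.
AV_BIN="${AV_BIN:-agent-vision}"

# find_id SESSION LABEL
# ASSUMPTION: `elements` prints one element per line, ID first, with the
# label text somewhere on the same line. Adjust to the real output format.
find_id() {
  "$AV_BIN" elements --session "$1" | grep -F -- "$2" | head -n 1 | awk '{print $1}'
}

# click_label SESSION LABEL
# Re-scans the page, then clicks the first element whose line contains LABEL.
click_label() {
  local el_id
  el_id="$(find_id "$1" "$2")"
  [ -n "$el_id" ] || { echo "no element matching: $2" >&2; return 1; }
  "$AV_BIN" click --element "$el_id" --session "$1"
}

# update_ticket SESSION TICKET_KEY NEW_STATUS
# "Status" and "Back" are assumed labels; your Jira board may use others.
update_ticket() {
  click_label "$1" "$2" || return 1        # open the ticket's detail view
  click_label "$1" "Status" || return 1    # open the status dropdown
  click_label "$1" "$3" || return 1        # pick the new status
  click_label "$1" "Back"                  # return to the board view
}
```

A loop such as: for key in PROJ-101 PROJ-102; do update_ticket "$SID" "$key" "In Review" || break; done would then walk the board one ticket at a time (the ticket keys are placeholders). Because click_label re-scans before every click, element IDs from a previous page are never reused after a navigation.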
