Automate across apps that were never designed to talk

Real work happens across multiple applications. You copy from a spreadsheet, paste into a CRM, verify in a database tool, then update a project tracker. These apps have no shared API. Agent Vision lets AI agents bridge the gaps by interacting with each application through the screen, the one interface every app shares.

Without Agent Vision

Cross-app automation requires each application to have an API, and those APIs need to be compatible. Most desktop apps don't have APIs. Enterprise tools have locked-down integrations. Legacy software has nothing. You end up doing it manually: alt-tab, copy, paste, verify, repeat. Hours of mind-numbing screen-shuffling that's too irregular to script but too repetitive to enjoy.

With Agent Vision

Your AI agent treats every application as a visual interface it can read and interact with. It screenshots the spreadsheet, reads the values, switches to the CRM, finds the input fields, types the data, then switches to the database tool to verify. No APIs required. If a human can do it by looking at the screen and clicking, an AI agent with Agent Vision can do it too.

Commands

How it works

Start sessions for multiple app windows

terminal
$ agent-vision start --region 0,0,720,900 --name spreadsheet

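Create a named session for each application window by repeating the command with that window's region and name. The steps below assume each session's ID is held in a shell variable: $SHEET_SID, $CRM_SID, and $DB_SID.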
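A minimal sketch of filling the $SHEET_SID, $CRM_SID, and $DB_SID variables used in later steps. It rests on assumptions the page does not confirm: that `agent-vision start` prints the new session's ID to standard output, and that the region is given as x,y,width,height. The CRM and database regions and names are made-up values for three windows placed side by side, and `AGENT_VISION` is a hypothetical variable for pointing at a different binary.

```shell
# Sketch only. Assumes `start` prints the session ID on stdout (not confirmed).
AGENT_VISION=${AGENT_VISION:-agent-vision}   # hypothetical override for the binary

start_session() {
  # $1 = region in the same form as the example above, $2 = session name
  "$AGENT_VISION" start --region "$1" --name "$2"
}

start_all_sessions() {
  # Regions for the CRM and database windows are illustrative values.
  SHEET_SID=$(start_session 0,0,720,900 spreadsheet) || return 1
  CRM_SID=$(start_session 720,0,720,900 crm) || return 1
  DB_SID=$(start_session 1440,0,720,900 database) || return 1
}
```

Calling `start_all_sessions` once in the current shell would make the three variables available to the commands that follow.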

Capture from one app

terminal
$ agent-vision capture --session $SHEET_SID

Screenshot the spreadsheet to read the source data.

Switch to another app and discover fields

terminal
$ agent-vision elements --session $CRM_SID --filter textfield

Find input fields in the target application.

Type data from source into target

terminal
$ agent-vision type --element el-input-001 --text "Acme Corp" --session $CRM_SID

Enter the value read from the spreadsheet into the CRM field.
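Values read from a spreadsheet often contain spaces or punctuation, so each one should reach the `type` command as a single quoted argument. The helper below is a hypothetical wrapper around the command shown above; `type_into` and the `AGENT_VISION` variable are names invented for this sketch.

```shell
# Hypothetical wrapper around the `type` command shown above.
AGENT_VISION=${AGENT_VISION:-agent-vision}   # hypothetical override for the binary

type_into() {
  # $1 = session ID, $2 = element ID, $3 = text to type
  # Quoting "$3" keeps a value like "Acme Corp" as one argument.
  "$AGENT_VISION" type --element "$2" --text "$3" --session "$1"
}
```

For example, `type_into "$CRM_SID" el-input-001 "Acme Corp"` would run the same command as the step above.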

Verify in a third app

terminal
$ agent-vision capture --session $DB_SID

Screenshot the database tool to confirm the record was created.

Real scenario

Example: Migrating contacts from a spreadsheet to a CRM

workflow
01 Agent captures the spreadsheet and reads the first row of contact data
02 Agent switches to the CRM window and discovers the "New Contact" form fields
03 Agent types the name, email, and phone number into the corresponding fields
04 Agent clicks "Save" and re-captures to verify the contact was created
05 Agent switches back to the spreadsheet and moves to the next row
06 Repeats until all contacts are migrated, logging successes and failures
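The repeat-and-log part of this workflow can be sketched as a shell loop. This is an illustration under several assumptions, not the tool's own method: the agent has already read the rows and supplies them as pipe-separated lines (name|email|phone) on standard input; the element IDs el-input-002 and el-input-003 are invented (real IDs would come from the `elements` step); and because the page does not show a command for clicking "Save", `save_and_verify` is an empty placeholder to be filled in.

```shell
# Sketch only. Input format, element IDs, and function names are hypothetical.
AGENT_VISION=${AGENT_VISION:-agent-vision}   # hypothetical override for the binary

# Placeholder: the click and re-capture commands are not shown on the page,
# so this hook does nothing and always succeeds until it is filled in.
save_and_verify() {
  return 0
}

# Types one contact into three fields; stops at the first failing command.
enter_contact() {
  # $1 = CRM session ID, $2 = name, $3 = email, $4 = phone
  "$AGENT_VISION" type --element el-input-001 --text "$2" --session "$1" &&
  "$AGENT_VISION" type --element el-input-002 --text "$3" --session "$1" &&
  "$AGENT_VISION" type --element el-input-003 --text "$4" --session "$1"
}

# Reads name|email|phone rows from stdin and appends one log line per row.
migrate_contacts() {
  # $1 = CRM session ID, $2 = log file
  ok=0
  failed=0
  while IFS='|' read -r name email phone || [ -n "$name" ]; do
    [ -n "$name" ] || continue   # skip blank lines
    # </dev/null keeps the commands from consuming the rows on stdin.
    if enter_contact "$1" "$name" "$email" "$phone" </dev/null &&
       save_and_verify "$1" </dev/null; then
      ok=$((ok + 1))
      printf 'OK\t%s\n' "$name" >>"$2"
    else
      failed=$((failed + 1))
      printf 'FAIL\t%s\n' "$name" >>"$2"
    fi
  done
  printf '%d migrated, %d failed\n' "$ok" "$failed"
}
```

A run might look like `migrate_contacts "$CRM_SID" migration.log < contacts.txt`, where both file names are examples. Logging every row rather than stopping at the first failure lets the failed contacts be retried later.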

Try it now

$ brew tap rvanbaalen/agent-vision
$ brew install agent-vision

Requires macOS 13+ · No dependencies · ~4MB
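Since the install requires macOS 13 or later, a small pre-check can save a failed install. The function below is a sketch; `macos_supported` is a name invented here, and it only compares the major version number.

```shell
# Succeeds when the given macOS version string is 13 or newer.
macos_supported() {
  # $1 = version string such as 13.4.1 (e.g. from `sw_vers -productVersion`)
  major=${1%%.*}                      # keep everything before the first dot
  [ "$major" -ge 13 ] 2>/dev/null     # non-numeric or empty input counts as unsupported
}
```

On a Mac it could be called as `macos_supported "$(sw_vers -productVersion)"` before running the two brew commands above.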
