Fill forms in any app with AI agents

Forms exist everywhere: CRM tools, admin panels, native desktop apps, government portals, legacy enterprise software. Agent Vision lets AI agents discover input fields in any application, type values into them, navigate between fields, and submit the form. No app-specific APIs or integrations needed.

Without Agent Vision

Automating forms means writing custom scripts per application: Selenium for web forms, AppleScript for native Mac apps (often unreliable), and nothing at all for many enterprise desktop tools. Each integration is fragile and app-specific, and it breaks on UI updates. Cross-app workflows (copy from a spreadsheet, paste into a web form) require stitching together completely different automation frameworks.

With Agent Vision

Your AI agent sees the form the same way a human does. It discovers all input fields, labels, dropdowns, and buttons through the Accessibility API. It types into fields by element ID, tabs between them, selects dropdown values, and clicks submit. The same commands work whether the form is in Chrome, a native Mac app, or an Electron-based tool.

Commands

How it works

Discover all form fields

terminal
$ agent-vision elements --session $SID --filter textfield

Find every text input, textarea, and editable field in the current view.

Type into a specific field

terminal
$ agent-vision type --element el-input-002 --text "hello@example.com"

Enter text into a discovered field. Agent Vision focuses the field and types without stealing your cursor.

Clear a field before typing

terminal
$ agent-vision type --element el-input-002 --text "" --clear

Clear existing content before entering new text.

Tab to the next field

terminal
$ agent-vision key --key tab --session $SID

Send keyboard events to navigate between form fields.

Submit the form

terminal
$ agent-vision click --element el-btn-submit --session $SID

Click the submit button after filling all fields.
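
The commands above can be chained into a single pass. The following is a minimal sketch, not the tool's documented workflow. It assumes `$SID` already holds a session ID (how to obtain one is not shown on this page), it reuses the placeholder element IDs from the examples above, it assumes `--clear` can be combined with a non-empty `--text`, and it assumes `agent-vision` exits with a non-zero status on failure so that `&&` stops the sequence. The `AV` variable and the `fill_and_submit` function are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch only: chains the commands shown above into one form-filling pass.
# AV is the command to run. Set AV="echo agent-vision" to preview the
# commands without executing anything.
AV=${AV:-agent-vision}

# fill_and_submit EMAIL
# Assumes SID is set and that the element IDs below (placeholders taken from
# the examples above) match the form; real IDs come from the elements command.
fill_and_submit() {
  email=$1
  # $AV is left unquoted on purpose so "echo agent-vision" splits into words.
  $AV elements --session "$SID" --filter textfield &&
  $AV type --element el-input-002 --text "$email" --clear &&
  $AV key --key tab --session "$SID" &&
  $AV click --element el-btn-submit --session "$SID"
}
```

Setting `AV="echo agent-vision"` and `SID` to any value before calling `fill_and_submit hello@example.com` prints the four commands instead of running them, which is a cheap way to review the sequence first.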

Real scenario

Example: Filling an expense report in a desktop app

workflow
1. AI agent captures the expense form window and discovers all input fields
2. Reads field labels to understand which field is "Amount", "Date", "Category", "Description"
3. Types the expense amount into the amount field
4. Selects the date using the date picker
5. Chooses a category from the dropdown
6. Enters a description in the notes field
7. Clicks "Submit" and re-captures to verify the success confirmation
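
The two text-entry steps of this scenario (amount and description) could be driven from a small field table. This is a sketch under stated assumptions: the element IDs `el-input-amount` and `el-input-notes` are hypothetical, the `run` and `fill_fields` helpers are illustrative names, and the date picker, dropdown, and capture steps are left out because this page does not show their commands.

```shell
#!/bin/sh
# Sketch only: types a value into each field listed as "element-id|value".
# Element IDs here are hypothetical; real ones come from `agent-vision elements`.

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# fill_fields: read "element-id|value" lines from stdin and type each value.
fill_fields() {
  while IFS='|' read -r element value; do
    [ -n "$element" ] || continue   # skip blank lines
    # </dev/null keeps the command from consuming the remaining table rows.
    run agent-vision type --element "$element" --text "$value" --clear </dev/null
  done
}

# Dry run: prints one type command per row instead of executing it.
DRY_RUN=1
fill_fields <<'EOF'
el-input-amount|42.50
el-input-notes|Team lunch
EOF
```

Keeping the field values in a table separates the data from the commands, so the same loop can be reused for a different form by swapping the rows.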

Try it now

$ brew tap rvanbaalen/agent-vision
$ brew install agent-vision

Requires macOS 13+ · No dependencies · ~4MB
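
Before installing, the macOS 13+ requirement can be checked from the terminal. The sketch below is an illustration, not part of Agent Vision: `macos_supported` is a hypothetical helper name, and the check relies only on the standard `sw_vers -productVersion` command that ships with macOS.

```shell
#!/bin/sh
# macos_supported VERSION: succeeds when the major version is 13 or higher.
macos_supported() {
  major=${1%%.*}               # "13.4.1" -> "13"
  case $major in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$major" -ge 13 ]
}

# sw_vers exists only on macOS, so the check is skipped on other systems.
if command -v sw_vers >/dev/null 2>&1; then
  if macos_supported "$(sw_vers -productVersion)"; then
    echo "macOS version OK"
  else
    echo "Agent Vision requires macOS 13 or later" >&2
  fi
fi
```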
