Chain desktop apps into AI workflows

Think of it as Zapier for your actual desktop, with an AI agent making the decisions. Agent Vision sees and controls every app on your Mac: Mail, Calendar, Slack, Numbers, Preview, anything. Chain them into workflows: read an email, create a calendar event, post a summary to Slack, log it in a spreadsheet. No app needs to support integrations because Agent Vision works at the screen level. Every window is just another interface to scan and interact with.

Without Agent Vision

Desktop app automation on macOS usually means AppleScript, and most apps have partial or no AppleScript support. Shortcuts.app helps with simple tasks but can't handle complex multi-app workflows with decision logic. You end up doing multi-step desktop workflows by hand: read the email, copy the date, switch to Calendar, create an event, switch to Slack, type a message. That adds up to dozens of app switches per workflow.

With Agent Vision

Your AI agent manages one session per app window. It reads content from Mail.app, extracts dates and details, switches to Calendar to create an event, composes a Slack message with the summary, and logs everything in a Numbers spreadsheet. Each app is just a window to capture, scan for elements, and interact with. The AI agent handles the logic, context, and decision-making.

Commands

How it works

Start sessions for each app

terminal
$ agent-vision start --region 0,0,800,600 --name mail

Create a named session for Mail.app. Repeat for Calendar, Slack, and Numbers, giving each its own region and name.
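Repeating the start command for each app can be sketched as a small loop. This is a minimal sketch, not documented behavior: the regions for Calendar, Slack, and Numbers are hypothetical placeholders, and the `AV` variable is a helper introduced here so the loop can be previewed before anything runs. How a session's ID is returned is not shown on this page, so the sketch does not try to capture it.

```shell
#!/bin/sh
# Sketch: start one named session per app.
# AV is a hypothetical helper variable. By default it only prints each
# command (dry run). Set AV=agent-vision to run the commands for real.
AV="${AV:-echo agent-vision}"

start_sessions() {
  # name:region pairs. Only the mail region comes from the example above;
  # the other regions are placeholders to adjust to your window layout.
  for entry in \
    "mail:0,0,800,600" \
    "calendar:800,0,800,600" \
    "slack:0,600,800,600" \
    "numbers:800,600,800,600"
  do
    name="${entry%%:*}"    # text before the first colon
    region="${entry#*:}"   # text after the first colon
    $AV start --region "$region" --name "$name"
  done
}

start_sessions
```

Run it once as is to check the printed commands, then rerun with `AV=agent-vision` once the regions match your windows.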

Read email content

terminal
$ agent-vision capture --session $MAIL_SID

Screenshot the Mail session; `$MAIL_SID` stands for that session's ID. Your AI agent reads the email content from the image.

Create a calendar event

terminal
$ agent-vision click --element el-btn-new-event --session $CAL_SID

Switch to Calendar and click "New Event" to start creating an entry.

Fill in event details

terminal
$ agent-vision type --element el-input-title --text "Team standup re: Q2 planning" --session $CAL_SID

Type the event title extracted from the email.

Post to Slack

terminal
$ agent-vision type --element el-input-message --text "New meeting scheduled: Q2 planning" --session $SLACK_SID

Switch to Slack and type a summary message in the channel input.

Log in spreadsheet

terminal
$ agent-vision type --element el-cell-A1 --text "2026-03-31" --session $NUMBERS_SID

Switch to Numbers and log the event date in the tracking spreadsheet.

Real scenario

Example: Email → Calendar → Slack → Spreadsheet

workflow
01  Agent captures Mail.app and reads the latest email about a meeting request
02  Agent extracts the date, time, attendees, and subject from the email content
03  Agent switches to Calendar, clicks "New Event", and fills in all the details
04  Agent saves the calendar event and verifies it appears on the correct date
05  Agent switches to Slack, navigates to the team channel, and posts a summary message
06  Agent switches to Numbers and adds a row logging the meeting with date, subject, and attendees
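
The six steps above can be sketched as one script built only from the commands shown earlier on this page. Treat it as a sketch under assumptions: the `*_SID` defaults are placeholders for real session IDs, the `av` wrapper and `DRY_RUN` variable are helpers introduced here, the element IDs are copied from the examples above and will likely differ in your windows, and the meeting details are passed in as arguments where your AI agent would supply values it read from the captured email. Steps this page shows no command for (saving the event, navigating to a channel, sending the message) are left as comments.

```shell
#!/bin/sh
# Sketch of the Email -> Calendar -> Slack -> Spreadsheet chain.
# DRY_RUN and av are hypothetical helpers, not part of agent-vision.
DRY_RUN="${DRY_RUN:-1}"

av() {
  if [ "$DRY_RUN" = "1" ]; then
    # Print the command instead of running it (arguments shown unquoted).
    printf 'agent-vision'
    for arg in "$@"; do printf ' %s' "$arg"; done
    printf '\n'
  else
    agent-vision "$@"
  fi
}

# Placeholder session IDs; replace with the IDs of your running sessions.
MAIL_SID="${MAIL_SID:-mail-session}"
CAL_SID="${CAL_SID:-calendar-session}"
SLACK_SID="${SLACK_SID:-slack-session}"
NUMBERS_SID="${NUMBERS_SID:-numbers-session}"

run_workflow() {
  title="$1"        # event title the agent extracted from the email
  event_date="$2"   # event date the agent extracted from the email

  # 01: capture Mail.app so the agent can read the email
  av capture --session "$MAIL_SID"
  # 02: the agent extracts the details from the screenshot (outside this script)
  # 03: open a new event and fill in the title
  av click --element el-btn-new-event --session "$CAL_SID"
  av type --element el-input-title --text "$title" --session "$CAL_SID"
  # 04: saving and verifying the event is not covered by the commands shown here
  # 05: type the summary into the Slack message input
  av type --element el-input-message --text "New meeting scheduled: $title" --session "$SLACK_SID"
  # 06: log the date in the spreadsheet
  av type --element el-cell-A1 --text "$event_date" --session "$NUMBERS_SID"
}

run_workflow "Q2 planning" "2026-03-31"
```

Quoting `"$title"` and `"$event_date"` keeps text taken from an email as a single argument, so spaces or shell metacharacters in a subject line cannot split or alter the command. Set `DRY_RUN=0` to execute the commands instead of printing them.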

Try it now

$ brew tap rvanbaalen/agent-vision
$ brew install agent-vision

Requires macOS 13+ · No dependencies · ~4MB
