AI Desktop Agent

Computer control via AI prompts

What is it for?

This application automates work on your computer. It is not limited to a single application; it works across all of them. For example, you could ask it to "release an application to production." It would then commit changes in your IDE, open the build pipeline in a browser, wait for the build to finish, click the "Deploy to production" button, and finally send an email to clients through your email application.

How does it work?

It takes your prompt, adds a screenshot, and submits both to an AI engine (Gemini and ChatGPT are currently supported). The AI engine sends back a list of keyboard and mouse commands, which the application executes. If the AI needs feedback on the result, the application sends a new screenshot and the AI provides new commands. This cycle repeats until the AI confirms the task is complete. You can then review the results or enter a new prompt.
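
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the names Command, Reply, run_task, take_screenshot, ask_model and execute are all hypothetical, and the max_rounds safety limit is an added assumption so the sketch cannot loop forever. The screenshot capture, the AI engine call, and the input driver are passed in as plain functions, so no particular library or API is implied.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Command:
    """One keyboard or mouse action (hypothetical shape)."""
    kind: str                              # e.g. "click", "type", "key"
    args: dict = field(default_factory=dict)


@dataclass
class Reply:
    """What the AI engine is assumed to return for one round."""
    commands: List[Command]
    done: bool                             # True once the AI reports the task complete


def run_task(
    prompt: str,
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[str, bytes], Reply],
    execute: Callable[[Command], None],
    max_rounds: int = 20,                  # assumption: a cap on rounds, not from the README
) -> int:
    """Run the screenshot -> AI -> commands loop and return the rounds used."""
    for round_no in range(1, max_rounds + 1):
        screenshot = take_screenshot()         # fresh view of the desktop
        reply = ask_model(prompt, screenshot)  # prompt and screenshot go to the AI engine
        for command in reply.commands:         # replay the returned commands in order
            execute(command)
        if reply.done:                         # the AI confirmed completion
            return round_no
    raise TimeoutError(f"task not finished after {max_rounds} rounds")
```

To try it without a real AI engine, pass stand-in functions: a take_screenshot that returns fixed bytes, an ask_model that returns prepared Reply objects, and an execute that only records the commands it receives.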

How well does it work?

Currently, it can be slow and does not always perform as expected; it is a work in progress. It is more of a fun experiment than a production-ready tool, though it should become more useful as AI models improve. In my subjective experience, Gemini 3 has performed the better of the two engines, but you can try both for yourself.

Features

Gemini support
ChatGPT support
Sandbox, virtual desktop, and specific-app support