Help

Overview

The AI Desktop Agent is an application that sends user prompts and screenshots to an AI engine and executes the commands received in the AI's response. This process is repeated until the AI engine determines that the task is complete. The commands include keyboard and mouse control, as well as delays and requests for new screenshots. You can also fine-tune the process with various configurations.
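
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the Reply type and the take_screenshot, ask_engine, and execute callables are hypothetical stand-ins for the screenshot capture, the AI engine request, and the keyboard/mouse command execution.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Reply:
    """Hypothetical shape of one AI response: commands plus a completion flag."""
    commands: List[str] = field(default_factory=list)
    done: bool = False


def run_task(
    prompt: str,
    take_screenshot: Callable[[], bytes],
    ask_engine: Callable[[str, bytes], Reply],
    execute: Callable[[str], None],
) -> int:
    """Send prompt and screenshot, execute the returned commands, and repeat
    until the engine reports that the task is complete.
    Returns the number of requests that were made."""
    requests = 0
    while True:
        reply = ask_engine(prompt, take_screenshot())
        requests += 1
        for command in reply.commands:
            execute(command)
        if reply.done:
            return requests
```

In this sketch the loop ends only when the engine sets the completion flag. The real application also stops on cancellation and on the auto request limit described under Settings.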

Sandbox and virtual desktop support

If you're concerned about what the AI engine can see or do, you can specify an isolated screen area. The AI engine will only receive screenshots of that area and will only be able to move the mouse within it. For example, you can open Windows Sandbox, define it as the designated area, and the AI engine will operate exclusively within that sandbox.

Settings

Settings can be accessed via the "Main/Settings" menu.

AI client

Select the AI engine you want to use, either Gemini or ChatGPT.

API key

Enter a valid API key for the selected AI engine. Please note that the application does not track API key usage or associated costs, so monitoring them is your responsibility. You can create a Gemini API key at https://aistudio.google.com/app/api-keys and a ChatGPT API key at https://platform.openai.com/settings/organization/api-keys.

Model

Enter a valid model name for the AI engine. Gemini models can be found at https://ai.google.dev/gemini-api/docs/models (use the named model codes). ChatGPT models are listed at https://platform.openai.com/docs/models (use the named snapshots).

Delay

This sets the wait time between consecutive requests to the AI engine. A longer delay is useful if you're monitoring the execution and want time to stop the process for any reason.

Auto request limit

This is the maximum number of automatic iterations. After your initial prompt, the application runs automatically until the AI engine indicates that it has finished or you cancel the process. If the limit is reached before the task is complete, you can click "Generate" again to reset the counter and continue.
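
As an illustration of how such a limit might behave, here is a minimal sketch. The function name run_with_limit and the step callable are hypothetical. Each call to step stands for one automatic iteration and returns True once the AI engine reports that it has finished.

```python
from typing import Callable, Tuple


def run_with_limit(step: Callable[[], bool], limit: int) -> Tuple[bool, int]:
    """Call step() until it returns True or `limit` calls have been made.
    Returns (finished, number_of_calls)."""
    for calls in range(1, limit + 1):
        if step():
            return True, calls
    # The limit was reached before the task finished.
    return False, max(limit, 0)
```

Calling run_with_limit again with the same step corresponds to clicking "Generate" again: the counter starts from zero while the task continues from where it stopped.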

Run minimized

The application will minimize the main window when you start a prompt and restore it after the AI finishes.

Retry on error

When this option is enabled and an error occurs, the application retries and continues. When it is disabled, an error ends the process.
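
The difference between the two modes can be sketched as below. The names run_with_retry and step are hypothetical, and the max_attempts cap is an assumption added to keep the example bounded; this help page does not say how many times the application retries.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(step: Callable[[], T], retry_on_error: bool, max_attempts: int = 3) -> T:
    """Run step(). With retry_on_error, try again after a failure, up to
    max_attempts times (an assumed cap). Without it, the first error ends the run."""
    attempts = max(1, max_attempts) if retry_on_error else 1
    last_error: Exception = RuntimeError("step was never run")
    for _ in range(attempts):
        try:
            return step()
        except Exception as error:
            # Remember the failure; the loop retries if attempts remain.
            last_error = error
    raise last_error
```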

Screenshot area

These are the coordinates and dimensions of the screen area where the AI engine will operate. Only this area is sent to the AI engine in screenshots, and the mouse is confined to it. This is useful for restricting the AI engine to a sandbox or virtual desktop window.
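
One way to picture the confinement is the sketch below. It assumes that the AI engine returns mouse positions relative to the top-left corner of the screenshot, which this help page does not state. The Area type and the to_screen function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Area:
    """Screenshot area: top-left corner plus size, in screen pixels."""
    x: int
    y: int
    width: int
    height: int


def to_screen(area: Area, local_x: int, local_y: int) -> Tuple[int, int]:
    """Map a screenshot-relative point to screen coordinates, clamped so the
    result always lies inside the area."""
    clamped_x = min(max(local_x, 0), area.width - 1)
    clamped_y = min(max(local_y, 0), area.height - 1)
    return area.x + clamped_x, area.y + clamped_y
```

With an area at (100, 50) of size 800 by 600, a requested position of (10, 20) maps to the screen point (110, 70), and a position outside the screenshot is pulled back to the nearest edge of the area.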

Screenshot area Preview

This opens a new window that displays the screenshot area. You can move or resize the area by dragging with the mouse.

Screenshot area Reset

This will reset the screenshot area to the full screen.