Help
Overview
The AI Desktop Agent is an application that sends user prompts and screenshots to an AI engine and executes the commands received in the AI's response. This process is repeated until the AI engine determines that the task is complete. The commands include keyboard and mouse control, as well as delays and requests for new screenshots. You can fine-tune the process with the options described under Settings.
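The request/execute cycle described above can be sketched roughly as follows. This is only an illustration, not the application's actual code: the names Command, Response, run_agent, take_screenshot, ask_engine and execute are all hypothetical, and the engine call is passed in as a plain function so the sketch stays independent of any particular AI client.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Command:
    # Hypothetical command shape: kind could be "mouse", "keyboard",
    # "delay" or "screenshot"; payload carries its arguments.
    kind: str
    payload: str = ""


@dataclass
class Response:
    # Hypothetical response shape: the commands to run, plus a flag the
    # engine sets once it considers the task complete.
    commands: List[Command] = field(default_factory=list)
    done: bool = False


def run_agent(
    prompt: str,
    take_screenshot: Callable[[], bytes],
    ask_engine: Callable[[str, bytes], Response],
    execute: Callable[[Command], None],
) -> int:
    """Repeat send -> execute until the engine reports completion.

    Returns the number of requests that were sent.
    """
    requests = 0
    while True:
        response = ask_engine(prompt, take_screenshot())
        requests += 1
        for command in response.commands:
            execute(command)
        if response.done:
            return requests
```

In this sketch every request carries a fresh screenshot, so the engine always sees the result of the commands it issued in the previous round.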
Sandbox and virtual desktop support
If you're concerned about what the AI engine can see or do, you can specify an isolated screen area. The AI engine will only receive screenshots of that area and will only be able to move the mouse within it. For example, you can open Windows Sandbox, define it as the designated area, and the AI engine will operate exclusively within that sandbox.
Settings
Settings can be accessed via the "Main/Settings" menu.
AI client
Select the AI engine you want to use, either Gemini or ChatGPT.
API key
Enter a valid API key for the selected AI engine. The application does not track API key usage or the associated costs, so monitoring them is your responsibility. You can create a Gemini API key at https://aistudio.google.com/app/api-keys and a ChatGPT API key at https://platform.openai.com/settings/organization/api-keys.
Model
Enter a valid model name for the AI engine. Gemini models can be found at https://ai.google.dev/gemini-api/docs/models (use the named model codes). ChatGPT models are listed at https://platform.openai.com/docs/models (use the named snapshots).
Delay
This sets the time to wait between consecutive requests to the AI engine. A longer delay is useful if you're monitoring the execution and want time to stop the process for any reason.
Auto request limit
This is the maximum number of automatic iterations. After your initial prompt, the application will run automatically until the AI engine indicates it's finished, you cancel the process, or this limit is reached. If the limit is reached before the task is done, you can click "Generate" again to reset the counter and continue.
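As a rough illustration of how such a limit behaves, the sketch below runs a step function until it reports completion or the limit is hit. The names run_batch and step are hypothetical and not taken from the application; calling run_batch a second time stands in for clicking "Generate" again.

```python
from typing import Callable, Tuple


def run_batch(step: Callable[[], bool], limit: int) -> Tuple[bool, int]:
    """Call step() until it returns True or `limit` calls have been made.

    Returns (done, calls_made). The counter starts from zero on every
    call, which models the reset that happens on a new "Generate" click.
    """
    for count in range(1, limit + 1):
        if step():
            return True, count
    return False, limit
```

For example, a task that needs five iterations with a limit of three stops unfinished after the first batch and completes on the second iteration of the next one.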
Run minimized
The application will minimize the main window when you start a prompt and restore it after the AI finishes.
Retry on error
When this option is enabled and an error occurs, the application will retry and continue. When it is disabled, an error ends the process.
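One possible shape for this behaviour is sketched below. It is an assumption-laden example rather than the application's implementation: run_step and step are hypothetical names, and the max_attempts cap is an assumption added so the sketch cannot loop forever, since the text above does not say whether retries are limited.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_step(step: Callable[[], T], retry_on_error: bool, max_attempts: int = 3) -> T:
    """Run step(); on failure, retry only if retry_on_error is set.

    max_attempts is an assumed safety cap, not a documented setting.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return step()
        except Exception:
            # With retries disabled, or once the cap is reached, the
            # error propagates and the process ends.
            if not retry_on_error or attempts >= max_attempts:
                raise
```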
Screenshot area
These are the coordinates and dimensions of the screen area where the AI engine will operate. Only this area will be sent to the AI engine in screenshots, and the mouse will be confined to this area. This is useful for restricting the AI engine to a sandbox or virtual desktop window.
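Confining the mouse to a rectangular area amounts to clamping each target coordinate to the area's bounds. The sketch below shows the idea under stated assumptions: the Area class and its field names are hypothetical, coordinates are in pixels with the origin at the top-left of the screen, and the engine is assumed to report positions relative to the screenshot it received.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Area:
    # Hypothetical representation: top-left corner plus size, in pixels.
    x: int
    y: int
    width: int
    height: int

    def clamp(self, px: int, py: int) -> Tuple[int, int]:
        """Return the point inside the area that is closest to (px, py)."""
        cx = min(max(px, self.x), self.x + self.width - 1)
        cy = min(max(py, self.y), self.y + self.height - 1)
        return cx, cy

    def to_screen(self, rx: int, ry: int) -> Tuple[int, int]:
        """Map a screenshot-relative point to screen coordinates.

        Assumes (0, 0) is the top-left pixel of the screenshot. The
        result is clamped so it can never leave the area.
        """
        return self.clamp(self.x + rx, self.y + ry)
```

With an area at (100, 50) measuring 800 by 600 pixels, a request to move to screenshot position (10, 20) lands at screen position (110, 70), and any position outside the screenshot is pulled back to the nearest edge.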
Screenshot area Preview
This will open a new window that displays the screenshot area. You can move or resize the area in this window by dragging with the mouse.
Screenshot area Reset
This will reset the screenshot area to the full screen.