screenshot-analysis
// How to interpret accessibility tree elements and correlate them with screenshot regions for accurate GUI interaction.
| name | screenshot-analysis |
| description | How to interpret accessibility tree elements and correlate them with screenshot regions for accurate GUI interaction. |
The a11y_tree is your primary guide for finding interactive elements:
[role=window] "Document - LibreOffice Calc"
  [role=menubar]
    [role=menuitem] "File"
    [role=menuitem] "Edit"
  [role=toolbar]
    [role=button] "Save" (x=120, y=45)
  [role=table] "Sheet"
    [role=cell] "A1" (x=80, y=200)
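As a rough illustration of how a tree like the one above could be turned into click targets, here is a minimal Python sketch. It assumes each line has the shape `[role=...] "name" (x=..., y=...)` with the name and coordinates optional, as in the sample. The names `Element`, `parse_a11y_tree`, and `find_element` are hypothetical helpers for this example, not part of any tool's API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed line shape: [role=button] "Save" (x=120, y=45)
# The quoted name and the coordinate pair are both optional.
LINE_RE = re.compile(
    r'\[role=(?P<role>[\w-]+)\]'
    r'(?:\s+"(?P<name>[^"]*)")?'
    r'(?:\s+\(x=(?P<x>-?\d+),\s*y=(?P<y>-?\d+)\))?'
)


@dataclass
class Element:
    role: str
    name: Optional[str]
    x: Optional[int]
    y: Optional[int]


def parse_a11y_tree(text: str) -> List[Element]:
    """Parse every line that contains a [role=...] marker; skip the rest."""
    elements = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue
        x, y = m.group("x"), m.group("y")
        elements.append(
            Element(
                role=m.group("role"),
                name=m.group("name"),
                x=int(x) if x is not None else None,
                y=int(y) if y is not None else None,
            )
        )
    return elements


def find_element(elements: List[Element], role: str, name: str) -> Optional[Element]:
    """Return the first element matching both role and name, or None."""
    for el in elements:
        if el.role == role and el.name == name:
            return el
    return None


SAMPLE_TREE = '''
[role=window] "Document - LibreOffice Calc"
[role=menubar]
[role=menuitem] "File"
[role=menuitem] "Edit"
[role=toolbar]
[role=button] "Save" (x=120, y=45)
[role=table] "Sheet"
[role=cell] "A1" (x=80, y=200)
'''

elements = parse_a11y_tree(SAMPLE_TREE)
save = find_element(elements, "button", "Save")
```

Elements without coordinates (such as the menubar here) come back with `x` and `y` set to `None`, so a caller would still need the screenshot to locate them.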
- button: clickable action (Save, OK, Cancel)
- menuitem: menu entry (click to open submenu or execute)
- textbox/text: editable text field (click then type)
- combobox: dropdown (click to open, then select)
- checkbox: toggle (click to check/uncheck)
- cell: spreadsheet cell (click to select, then type)
- tab: tab selector (click to switch)
- dialog: modal window (must handle before other actions)
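The role descriptions above can be read as a small dispatch table from role to interaction. The sketch below is one possible encoding, not a definitive implementation. The action tuples `("click", x, y)` and `("type", text)` and the helpers `plan_interaction` and `has_open_dialog` are assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Roles where a click focuses the element and typing follows.
CLICK_THEN_TYPE = {"textbox", "text", "cell"}
# Roles where a single click is the whole first step. A combobox still
# needs a second click on the chosen option once the dropdown has opened.
CLICK_ONLY = {"button", "menuitem", "combobox", "checkbox", "tab"}


def has_open_dialog(roles: List[str]) -> bool:
    """A dialog is modal, so it must be handled before anything else."""
    return "dialog" in roles


def plan_interaction(
    role: str, x: int, y: int, text: Optional[str] = None
) -> List[Tuple]:
    """Return the hypothetical action tuples for one element."""
    if role in CLICK_THEN_TYPE:
        actions: List[Tuple] = [("click", x, y)]
        if text is not None:
            actions.append(("type", text))
        return actions
    if role in CLICK_ONLY:
        return [("click", x, y)]
    # Containers such as dialog, window, or toolbar are not direct targets.
    raise ValueError(f"no direct interaction defined for role {role!r}")


cell_plan = plan_interaction("cell", 80, 200, text="42")
save_plan = plan_interaction("button", 120, 45)
```

Keeping the dialog check separate from the per-element plan reflects the rule that a modal window blocks every other action until it is dismissed.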