What can you do with it?
The /browser-operator command is an AI agent that autonomously performs browser automation based on natural language instructions. You describe what you want accomplished, and the AI agent makes independent decisions on how to navigate, click, fill forms, extract data, and download files - no code required.
How to use it?
Basic Command Structure
Parameters
Required:
- task - Natural language description of what to do
Optional:
- model - AI model to use (default: google/gemini-2.5-flash):
  - google/gemini-2.5-flash
  - google/gemini-2.5-pro
  - openai/gpt-4o
  - openai/gpt-4o-mini
  - anthropic/claude-sonnet-4
- systemPrompt - Custom instructions to give the AI agent specialized behavior or context
- maxSteps - Maximum number of browser actions the agent can perform (default: 30)
- buildId - Prefix used to name all output files including logs, downloads, and scraped data
- disableCache - Set to true to disable action caching (default: false, caching is enabled)
- cacheDurationDays - How long cached actions remain valid in days (default: 7 days, recommended: 30 for stable sites, 1 for frequently changing sites)
- region - Geographic region where the browser session runs (default: us-west-2)
- proxies - Enable residential proxies to improve anti-detection capabilities (default: false)
- browserSettings - Browser configuration options:
  - viewport - Browser window dimensions in pixels (default: 1920x1080)
  - advancedStealth - Enable advanced anti-detection features (default: false)
  - blockAds - Block advertisements for faster page loading (default: true)
  - solveCaptchas - Automatically solve CAPTCHA challenges (default: true)
  - recordSession - Record the session for replay and debugging (default: true)
- collectionId - Specify which file storage resource to store output files in (defaults to Multimedia Artifact Collection if not specified)
- filesToUpload - Files to make available in the browser for upload operations, provided as an array of objects with url and fileName properties
- useContextService - ID of a saved browser login connection to use for authentication (automatically added when you select a browser login from the slash command menu)
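As a rough illustration of how these parameters fit together, the sketch below collects them in a Python dictionary. The parameter names and default values come from the list above. The helper name build_operator_params, the use of a dictionary, and the width/height shape for viewport are assumptions made for illustration, since the exact request format is not shown here.

```python
from typing import Any, Dict

# Defaults as documented in the parameter list above.
DEFAULTS: Dict[str, Any] = {
    "model": "google/gemini-2.5-flash",
    "maxSteps": 30,
    "disableCache": False,
    "cacheDurationDays": 7,
    "region": "us-west-2",
    "proxies": False,
    "browserSettings": {
        # Assumed shape: the documentation only states "1920x1080".
        "viewport": {"width": 1920, "height": 1080},
        "advancedStealth": False,
        "blockAds": True,
        "solveCaptchas": True,
        "recordSession": True,
    },
}


def build_operator_params(task: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides onto the documented defaults."""
    if not task.strip():
        raise ValueError("task is required")
    params: Dict[str, Any] = {"task": task, **DEFAULTS}
    # Copy the nested dict so callers never mutate DEFAULTS.
    params["browserSettings"] = dict(DEFAULTS["browserSettings"])
    params["browserSettings"].update(overrides.pop("browserSettings", {}))
    params.update(overrides)
    return params


params = build_operator_params(
    "Go to example.com and extract the page title",
    maxSteps=10,
    browserSettings={"advancedStealth": True},
)
```

Only the values that differ from the defaults need to be passed; everything else keeps its documented default.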
Response Format
Browser Operator returns immediately with session details while the automation runs in the background.
Self-Healing Automation
Browser Operator uses a self-healing technique that automatically adapts when websites change their structure or layout. Unlike traditional automation that breaks when a CSS class changes, Browser Operator understands the intent of your task and can find new ways to accomplish it.
How It Works
Instead of relying on brittle selectors that break with every UI update, Browser Operator uses AI to understand what you want to accomplish. Traditional selector-based automation is brittle under changes such as:
- CSS class or ID changes
- HTML structure modifications
- Element position changes
- Layout redesigns
Adapting to Website Changes
When websites evolve, Browser Operator automatically adapts.
Example scenario:
Before: Login button was directly in the header
After: Login button is nested inside an account menu
- Browser Operator recognizes the button is now nested
- Automatically performs: Open account menu → Click login button
- No code changes needed
Benefits
Maintenance reduction:
- Automations continue working through website redesigns
- No need to update selectors when HTML changes
- Fewer broken workflows after site updates
Intent-based adaptation:
- Understands intent rather than just following instructions
- Finds alternative paths when primary methods fail
- Recovers gracefully from minor structural changes
Natural language instructions:
- “Click the submit button” works regardless of button implementation
- “Fill in the email field” adapts to different form structures
- “Extract product prices” handles various page layouts
When Self-Healing Helps Most
Self-healing is particularly valuable for:
- Long-running production automations
- Sites that frequently update their UI
- Third-party websites you don’t control
- Automations that need to work across multiple similar sites
Self-Healing with Caching
Self-healing and caching work together to create fast, resilient automations. Here’s how they interact.
How it works:
- First execution (no cache):
  - AI agent explores and learns how to complete the task
  - Actions are saved to cache for future runs
  - Takes full time (e.g., 30-60 seconds)
- Subsequent executions (cache hit):
  - Cached actions replay instantly (10-100x faster)
  - No AI inference needed
  - Works perfectly if website hasn’t changed
- When website changes slightly (cache + self-healing):
  - Cached actions attempt to execute
  - Minor changes detected (e.g., button moved, class name changed)
  - Self-healing adapts the cached actions to the new structure
  - Continues working without full re-exploration
- When website changes significantly (cache invalidation):
  - Cached actions fail and self-healing cannot adapt
  - Cache is cleared automatically
  - Fresh exploration happens to re-learn the task
How the approaches compare when a website changes:
- Traditional automation: Breaks immediately, needs code update
- With cache only: Cache becomes invalid, requires full re-exploration (slow)
- With cache + self-healing: Adapts automatically, continues using cache (fast)
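The four cases above can be read as a small decision table. The sketch below models that table only; the Change enum, the plan_run function, and the step labels are hypothetical names chosen for illustration and do not describe Browser Operator's internal implementation.

```python
from enum import Enum
from typing import List


class Change(Enum):
    NONE = "none"    # website unchanged since the cached run
    MINOR = "minor"  # e.g., button moved, class name changed
    MAJOR = "major"  # self-healing cannot adapt


def plan_run(has_cache: bool, change: Change) -> List[str]:
    """Return the steps a run would take, following the four cases above."""
    if not has_cache:
        return ["explore with AI", "save actions to cache"]
    if change is Change.NONE:
        return ["replay cached actions"]
    if change is Change.MINOR:
        return ["replay cached actions", "self-heal changed steps"]
    # Significant change: cached actions fail, so the cache is rebuilt.
    return ["clear cache", "explore with AI", "save actions to cache"]


first_run = plan_run(has_cache=False, change=Change.NONE)
cache_hit = plan_run(has_cache=True, change=Change.NONE)
healed = plan_run(has_cache=True, change=Change.MINOR)
rebuilt = plan_run(has_cache=True, change=Change.MAJOR)
```

Only the first and last cases involve full AI exploration, which is why they are the slow paths.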
Limitations
Self-healing has limitations:
- Major functional changes may require task updates
- Completely new workflows need new task descriptions
- Significant structural changes may require cache invalidation to re-learn
Authentication
Browser Operator supports two methods for handling authentication.
Using Stored Secrets from Vault
Select a stored secret from the slash command menu to retrieve login credentials from your vault.
Using Browser Login Connections
Browser Login Connections store authenticated session details (not credentials) that you create by logging in through a live browser session. Once created, these saved sessions can be used with Browser Operator.
To create a Browser Login Connection:
- Create a new browser connection in your Connections page
- Authenticate through the live browser session
- Complete any 2FA or CAPTCHA challenges
- Save the connection
Automatic iFrame and Shadow DOM Support
Browser Operator automatically handles iFrame traversal and Shadow DOM elements without requiring any special configuration. Elements inside iFrames and Shadow DOMs can be targeted using the same natural language instructions as regular page elements.
What’s supported:
- Payment forms inside iFrames
- Embedded widgets and third-party content
- Nested iFrames (multiple levels deep)
- Shadow DOM components (custom web components)
- No frame switching or special selectors required
Parameter Notes
This section provides detailed information about Browser Operator’s configuration parameters and how they affect automation performance and capabilities.
Proxies
Residential proxies route your browser traffic through real residential IP addresses instead of datacenter IPs. This significantly improves success rates when automating websites that employ anti-bot detection.
Benefits:
- Appear as genuine residential user traffic
- Reduce likelihood of being flagged as automated
- Enable IP rotation for distributed requests
- Support geo-location testing from different regions
When to use:
- Scraping sites with anti-bot protection
- Production automation workflows
- E-commerce or booking platforms
- Sites that block datacenter IPs
When not needed:
- Local development and testing
- Internal tools with no restrictions
- Speed-critical applications where latency matters
Advanced Stealth
Advanced stealth mode implements sophisticated browser fingerprinting countermeasures that make automated browsers indistinguishable from human-driven browsers. This goes beyond basic anti-detection to defeat even advanced bot-detection systems.
What it does:
- Masks automation signatures in the browser environment
- Modifies browser fingerprinting characteristics
- Prevents detection through WebDriver properties
- Mimics genuine user behavior patterns
When to use:
- Websites with Cloudflare or similar protections
- Platforms with sophisticated bot detection
- Production scrapers requiring maximum stealth
- Compliance-sensitive automation
Block Ads
Ad blocking improves automation speed and reliability by preventing ad networks from loading. This reduces page weight, eliminates ad-related JavaScript, and prevents popups or overlays that can interfere with automation.
Benefits:
- Faster page load times (20-50% improvement typical)
- Reduced bandwidth and data transfer
- Fewer elements to process during automation
- Elimination of modal ads and interruptions
- More consistent page structure
Solve CAPTCHAs
Automatic CAPTCHA solving uses specialized services to detect and solve CAPTCHA challenges without manual intervention. This enables fully autonomous workflows even on CAPTCHA-protected sites.
How it works:
- Detects CAPTCHA challenges during automation
- Submits challenges to solving service
- Waits for solution and applies it automatically
- Continues automation after solving challenge
Supported CAPTCHA types:
- reCAPTCHA v2 and v3
- hCaptcha
- Image-based challenges
- Audio challenges
Record Session
Session recording captures a complete visual and technical record of the browser automation for replay and debugging. Every action, network request, and console log is preserved.
What’s captured:
- Frame-by-frame video of browser activity
- All network requests with timing data
- JavaScript console logs and errors
- Browser resource usage metrics
- Action timeline with timestamps
Use cases:
- Visual verification of what the agent did
- Identify where automations fail
- Review network issues or errors
- Share session replays with team members
- Audit automation behavior
Replay limitations:
- Replays are reconstructed from recorded DOM changes and may not be perfectly one-to-one with the live session due to the nature of the rrweb replay technology
- To ensure the full automation is captured in the replay, add a few seconds of wait time at the end of your task (e.g., “then wait 3 seconds”)
- The live session view during execution is always accurate - replay limitations only affect post-execution review
Viewport
Viewport dimensions determine the visible browser window size, which affects responsive layouts and how AI models perceive page content. Different sizes can reveal different UI variations or trigger responsive breakpoints.
Common configurations:
- 1920x1080 - Standard desktop (default)
- 1280x720 - Smaller desktop display
- 2560x1440 - Large desktop (QHD/1440p) displays
- 1024x768 - Tablet landscape
- 375x667 - Mobile phone simulation
Why it matters:
- Responsive sites show different content per viewport
- Some elements only appear at certain sizes
- AI computer-use models analyze pixel coordinates
- Form layouts may change with screen size
Caching
Action caching stores the AI agent’s decisions so identical tasks can replay without requiring new LLM inference. This dramatically reduces both execution time and cost for repeated operations.
How caching works:
- First execution: AI determines actions, which are saved
- Subsequent executions: Saved actions replay instantly
- Cache invalidation: Changes to task or page structure trigger re-learning
Benefits:
- 10-100x faster execution on cached operations
- Zero LLM API costs after initial run
- Deterministic behavior across runs
- Consistent timing and results
Cache matching:
- Exact task match = cache hit: If you run the same task description again, cached actions are reused
- Task change = cache miss: Any modification to the task description creates a new cache entry
- Automatic invalidation: Changing even a single word in your task invalidates the cache and triggers fresh exploration
Things to keep in mind:
- Keep task descriptions consistent to benefit from caching
- Minor wording changes invalidate cache (e.g., “click Submit” vs “click the Submit button”)
- Variables in tasks are part of the cache key
- This ensures cache accuracy but means precise wording matters for cache hits
Cache organization:
- Organized per organization and automation
- Task description hashed to generate unique cache identifier
- Separate caches for different automations
Cache duration:
- cacheDurationDays: 7 - Cached actions expire after 7 days (default)
- cacheDurationDays: 30 - Longer duration for stable, unchanging sites
- cacheDurationDays: 1 - Short duration for frequently updated sites
When caching helps most:
- Repetitive data extraction tasks
- Form submission workflows run multiple times
- Stable website structures
- Production automations with consistent requirements
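To make the wording sensitivity concrete, the sketch below derives a cache identifier by hashing the task description and checks expiry against cacheDurationDays. The hash function (SHA-256), the inclusion of organization and automation IDs in the key, and the function names are assumptions for illustration; the text above only states that the task description is hashed and that caches are organized per organization and automation.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def cache_key(org_id: str, automation_id: str, task: str) -> str:
    """Assumed scheme: SHA-256 over organization, automation, and exact task text."""
    raw = "\x1f".join([org_id, automation_id, task])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def is_expired(cached_at: datetime, now: datetime, cache_duration_days: int = 7) -> bool:
    """A cached entry is stale once it is older than cacheDurationDays."""
    return now - cached_at > timedelta(days=cache_duration_days)


# Two wordings of the same intent produce different keys, so the second is a cache miss.
key_a = cache_key("org-1", "auto-1", "click Submit")
key_b = cache_key("org-1", "auto-1", "click the Submit button")
key_a_again = cache_key("org-1", "auto-1", "click Submit")

saved = datetime(2025, 1, 1, tzinfo=timezone.utc)
checked = datetime(2025, 1, 9, tzinfo=timezone.utc)  # 8 days later
expired_default = is_expired(saved, checked)
expired_stable_site = is_expired(saved, checked, cache_duration_days=30)
```

Because any hash of the exact text changes completely when one word changes, storing task descriptions as fixed templates is the simplest way to keep hitting the cache.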
Region
The geographic region determines where the cloud browser session physically runs. This affects latency, data residency, and in some cases, content availability.
Available regions:
- us-west-2 - US West Coast (default)
- us-east-1 - US East Coast
- eu-west-1 - Europe (Ireland)
- Other regions available based on infrastructure
Considerations:
- Latency to target websites (choose closest region)
- Data residency compliance requirements
- Content geo-restrictions or geo-targeted content
- Legal/regulatory considerations
Max Steps
The maximum step count limits how many individual browser actions the AI agent can perform before terminating. This prevents infinite loops and controls execution time.
Setting appropriate limits:
- Simple tasks (click, extract): 5-10 steps
- Multi-step workflows: 20-40 steps
- Complex multi-page processes: 50-100 steps
What counts as a step:
- Navigate to URL
- Click element
- Fill form field
- Extract data
- Wait for condition
- Scroll action
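The way a step limit guards against infinite loops can be sketched as a bounded loop. This is a generic illustration, not Browser Operator's actual agent loop: run_with_step_limit and the next_action callback are hypothetical, and the callback returning None is an assumed way of signalling that the task is finished.

```python
from typing import Callable, List, Optional, Tuple


def run_with_step_limit(
    next_action: Callable[[int], Optional[str]],
    max_steps: int = 30,
) -> Tuple[List[str], bool]:
    """Run until the agent signals completion (None) or the step budget is spent."""
    performed: List[str] = []
    for step in range(max_steps):
        action = next_action(step)
        if action is None:
            return performed, True  # task finished within budget
        performed.append(action)
    return performed, False  # budget exhausted; stop instead of looping forever


# A short scripted task that finishes well inside its budget.
script = ["navigate", "click", "fill", "extract"]
done_actions, completed = run_with_step_limit(
    lambda i: script[i] if i < len(script) else None, max_steps=10
)

# An agent stuck scrolling is cut off once the budget is used up.
stuck_actions, stuck_completed = run_with_step_limit(lambda i: "scroll", max_steps=5)
```

A limit set too low ends a legitimate task early, which is why the ranges above scale with task complexity.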
Model Selection
Different AI models have varying capabilities, speeds, and costs. Choosing the right model depends on your task complexity and performance requirements.
Model characteristics:
- Gemini 2.5 Flash - Fastest execution, lowest cost, good for straightforward tasks
- Gemini 2.5 Pro - More capable for complex reasoning and extraction
- GPT-4o - Strong general-purpose model for diverse tasks
- GPT-4o Mini - Balanced speed and capability
- Claude Sonnet 4 - Excellent for complex form interactions and nuanced understanding
Recommendations:
- Start with the default (Gemini 2.5 Flash) for most tasks
- Switch to more capable models if you encounter failures
- Consider cost vs. capability tradeoffs for production use
- Test different models to find optimal performance for your specific use case
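The advice to start with the default and switch to a more capable model on failure can be expressed as a simple escalation loop. The sketch below is one possible way to do that from your own code; the function name, the run_task callback, and the two-model escalation order are assumptions, and the stand-in fake_run replaces a real Browser Operator run.

```python
from typing import Callable, List, Optional, Sequence

# Assumed escalation order: the default first, then a more capable model.
ESCALATION: Sequence[str] = ("google/gemini-2.5-flash", "google/gemini-2.5-pro")


def first_successful_model(
    run_task: Callable[[str], bool],
    models: Sequence[str] = ESCALATION,
) -> Optional[str]:
    """Try each model in order and return the first one whose run succeeds."""
    for model in models:
        if run_task(model):
            return model
    return None


# Stand-in for a real run: pretend only the Pro model completes the task.
attempts: List[str] = []


def fake_run(model: str) -> bool:
    attempts.append(model)
    return model.endswith("pro")


chosen = first_successful_model(fake_run)
none_worked = first_successful_model(lambda model: False)
```

Once a model is found that works for a task, pinning it explicitly avoids paying for failed attempts on later runs.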
System Prompt
System prompts provide specialized instructions or context to guide the AI agent’s behavior throughout the automation. This is useful for domain-specific tasks or when you need the agent to follow particular patterns.
When to use system prompts:
- Domain expertise needed (medical, legal, financial portals)
- Specific behavioral requirements (cautious, thorough, fast)
- Context about the website structure
- Special handling instructions
File Upload Details
When you provide files to Browser Operator, they are automatically uploaded to the browser’s ~/Downloads directory before your task executes. This makes them available for selection in any file picker dialogs.
How it works:
- Files are uploaded to browser session before task starts
- Files appear in ~/Downloads directory
- AI agent can select them from file pickers
- Task instruction simply references the filename
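A minimal sketch of a filesToUpload value and a matching task instruction follows. The url and fileName properties come from the parameter description; the example URL, the validate_files helper, and the rule that fileName should be a bare file name (no directory part) are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Array of objects with url and fileName properties, as described above.
files_to_upload: List[Dict[str, str]] = [
    {"url": "https://example.com/files/invoice.pdf", "fileName": "invoice.pdf"},
]


def validate_files(files: List[Dict[str, str]]) -> None:
    """Hypothetical pre-flight check before submitting the task."""
    for entry in files:
        missing = {"url", "fileName"} - set(entry)
        if missing:
            raise ValueError(f"missing properties: {sorted(missing)}")
        # Assumed rule: files land directly in ~/Downloads, so no subdirectories.
        if PurePosixPath(entry["fileName"]).name != entry["fileName"]:
            raise ValueError("fileName must be a bare file name")


validate_files(files_to_upload)

# The task only needs to mention the file name, not a path.
task = f"Open the upload form and attach {files_to_upload[0]['fileName']}"
```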
File Download Details
Files downloaded during browser automation are automatically captured and saved to your file storage. Browser Operator monitors download activity and retrieves all downloaded files after the session completes.
Download process:
- Your task instructs the agent to trigger downloads (click download link, submit form that returns file, etc.)
- Browser infrastructure captures the downloaded files
- After task completion, files are retrieved and saved to your file storage
- Files are named with your buildId prefix for easy organization
Supported download types:
- PDFs, spreadsheets, images, archives
- Form submission responses (CSV exports, reports)
- Dynamically generated files
- Multiple downloads in a single session
File naming:
- Pattern: {buildId}-download-{index}-{originalFileName}
- Example: reports-1730000000-download-1-quarterly-report.pdf
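The naming pattern can be reproduced with a one-line formatter, which is handy when predicting the stored file names for a given buildId. The function name is hypothetical, and the assumption that the index starts at 1 is taken from the single documented example.

```python
from typing import List


def download_file_name(build_id: str, index: int, original_file_name: str) -> str:
    """Build a name following {buildId}-download-{index}-{originalFileName}."""
    return f"{build_id}-download-{index}-{original_file_name}"


# Assumption: indices start at 1, as in the documented example.
originals: List[str] = ["quarterly-report.pdf", "summary.csv"]
names = [
    download_file_name("reports-1730000000", i, name)
    for i, name in enumerate(originals, start=1)
]
```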
DOM Settling
DOM settling refers to how long Browser Operator waits for web pages to stabilize before taking actions. This ensures that animations complete, lazy-loaded content appears, and JavaScript updates finish before the agent interacts with elements.
Why it matters:
- Dynamic pages with animations need time to complete transitions
- Lazy-loading content needs to appear before extraction
- JavaScript frameworks (React, Vue, Angular) need time to render
- Single-page applications update content asynchronously
How Browser Operator handles it:
- Automatically waits for DOM to stabilize before each action
- Adapts waiting based on page complexity
- Ensures elements are ready for interaction
- Prevents errors from interacting with elements that aren’t ready
Especially relevant for pages with:
- Heavy animations or transitions
- Infinite scroll or lazy-loading
- Complex single-page applications
- Dynamic content that loads after page navigation
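One common way to implement this kind of waiting is to poll a cheap snapshot of the page and proceed once it stops changing. The sketch below shows that general technique only; it is not Browser Operator's implementation, and the function name, the snapshot callback, and the poll counts are all assumptions. A real loop would also sleep briefly between polls.

```python
import itertools
from typing import Callable, Hashable


def wait_until_settled(
    snapshot: Callable[[], Hashable],
    stable_polls: int = 3,
    max_polls: int = 50,
) -> bool:
    """Return True once the snapshot is unchanged for stable_polls consecutive polls."""
    last = snapshot()
    stable = 0
    for _ in range(max_polls):
        current = snapshot()
        if current == last:
            stable += 1
            if stable >= stable_polls:
                return True
        else:
            # The page changed again, so restart the stability count.
            stable = 0
            last = current
    return False  # gave up: the page never stopped changing


# Simulated DOM node counts: the page grows twice, then stops changing.
page_states = iter([10, 42, 57, 57, 57, 57])
settled = wait_until_settled(lambda: next(page_states))

# A page that changes on every poll never settles within the poll budget.
ticker = itertools.count()
never_settled = wait_until_settled(lambda: next(ticker), max_polls=20)
```

The max_polls cap matters for pages with continuous animations or tickers, which would otherwise block the agent indefinitely.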

