Overview

Natural language or Playwright. Describe the task; the AI navigates, clicks, fills forms, extracts data. Output files (extracted data, downloads, screenshots) are returned automatically. For full tool parameters and schemas, see the browser-automation server reference.

Tools

All browser automation tools are on the /browser-automation server path.
Tool | Description
browser-automation_operator_run | Start a browser task using natural language
browser-automation_operator_run_continue | Poll for completion (call every 3-5 seconds)
browser-automation_playwright_run | Run Playwright JavaScript code in a cloud browser
browser-automation_playwright_run_continue | Poll for Playwright completion
browser-automation_logins_list | List saved browser login contexts
Use Browser Operator (natural language) by default. Use Playwright for precise programmatic control.

Browser Operator

Step 1: Start the task

curl -s -X POST "https://mcp.app.pinkfish.ai/browser-automation" \
  -H "Authorization: Bearer $PINKFISH_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "browser-automation_operator_run",
      "arguments": {
        "task": "Navigate to https://news.ycombinator.com, extract the top 10 story titles with their URLs and point counts, save as stories.json",
        "cacheKey": "hn8x2k4m",
        "model": "google/gemini-3-flash-preview",
        "agentMode": "hybrid",
        "maxSteps": 30,
        "blockAds": true,
        "solveCaptchas": true,
        "recordSession": true
      }
    },
    "id": 1
  }'
Response:
{
  "status": "RUNNING",
  "sessionId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "collectionId": "coll_xyz789",
  "buildId": "agent-1234567890",
  "logFileName": "agent-1234567890.log",
  "message": "Browser Operator task started. Use operator_run_continue in a loop to poll for completion."
}

Step 2: Poll for completion

curl -s -X POST "https://mcp.app.pinkfish.ai/browser-automation" \
  -H "Authorization: Bearer $PINKFISH_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "browser-automation_operator_run_continue",
      "arguments": {
        "sessionId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "collectionId": "coll_xyz789",
        "logFileName": "agent-1234567890.log"
      }
    },
    "id": 1
  }'
Call this every 3-5 seconds. While the task is running, the response has "status": "RUNNING". When it finishes, you get the completed response:
{
  "status": "completed",
  "sessionId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "result": "Successfully extracted top 10 stories from Hacker News",
  "files": [
    {
      "url": "https://s3-signed-url/stories.json",
      "fileName": "stories.json",
      "mimeType": "application/json",
      "size": 2456,
      "source": "extract"
    }
  ],
  "collectionId": "coll_xyz789",
  "logFileName": "agent-1234567890.log",
  "logContent": "Step 1: Navigating to https://news.ycombinator.com...\nStep 2: Extracting story titles...\n...",
  "message": "Browser Operator task completed! Use result.files to iterate over ALL output files."
}

Step 3: Download the files

Every file in the files array has a signed S3 URL:
curl -o stories.json "https://s3-signed-url/stories.json"

File Output

Browser automation automatically captures three categories of files:
Source | Description | Example
extract | Data the AI extracted from the page (JSON, CSV, text) | stories.json, product_data.csv
download | Files the browser downloaded by clicking links | report.pdf, export.xlsx
script | Files explicitly saved by Playwright code | screenshot.png, results.json
All files are returned in a unified files array on completion. Each file includes:
  • url — Signed S3 URL (valid for ~1 hour)
  • fileName — Original filename
  • mimeType — File type
  • size — File size in bytes
  • source — How the file was generated
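If you're scripting the download step in Python, here's a minimal sketch built only on the fields above (requests is the same dependency as the complete example below):
import requests

def download_outputs(files):
    """Download each output file from its signed S3 URL.

    files is the files array from a completed response. Signed URLs
    require no Authorization header but expire after ~1 hour.
    """
    for f in files:
        resp = requests.get(f["url"], timeout=60)
        resp.raise_for_status()
        with open(f["fileName"], "wb") as out:
            out.write(resp.content)
        print(f"Saved {f['fileName']} ({f['size']} bytes, source: {f['source']})")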

Parameters Reference

Browser Operator (operator_run)

Parameter | Type | Default | Description
task | string | (required) | Natural language task description (single line, be specific)
cacheKey | string | (required) | Unique 8-character alphanumeric ID (generate a new one each time you change the task)
model | enum | google/gemini-3-flash-preview | AI model: google/gemini-2.5-flash, google/gemini-2.5-pro, openai/gpt-4o, openai/gpt-4o-mini, anthropic/claude-sonnet-4
agentMode | enum | hybrid | dom (CSS selectors), hybrid (visual + DOM), cua (Computer Use Agent)
maxSteps | number | 30 | Max actions (1-100). Increase for complex multi-page tasks
systemPrompt | string | - | Role/context (e.g., “You are filling out insurance forms”)
region | enum | us-west-2 | us-west-2, us-east-1, eu-central-1, ap-southeast-1
proxies | boolean | false | Enable residential proxies (useful for geo-restricted sites)
advancedStealth | boolean | false | Advanced anti-detection measures
blockAds | boolean | true | Block advertisements
solveCaptchas | boolean | true | Automatically solve CAPTCHAs
recordSession | boolean | true | Enable session recording for replay
viewportWidth | number | 1288 | Browser viewport width
viewportHeight | number | 711 | Browser viewport height
filesToUpload | array | - | Files to make available for upload: [{ url: "https://...", fileName: "doc.pdf" }]
collectionId | string | - | Filestorage collection ID for output files
useContextService | string | - | Saved login context ID (see Reusing Saved Logins)
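For illustration, a hypothetical arguments payload that combines several of these parameters might look like this (all values are made up; example.com stands in for a real site):
arguments = {
    "task": "Go to https://example.com/portal, upload doc.pdf via the Documents form, then extract the confirmation number",
    "cacheKey": "p7q2r9s4",
    "region": "eu-central-1",
    "proxies": True,  # residential proxies for a geo-restricted site
    "filesToUpload": [
        {"url": "https://example.com/files/doc.pdf", "fileName": "doc.pdf"},
    ],
}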

Writing Good Tasks

The task parameter is the most important input. Write it as a single line with specific instructions.
Good examples:
Navigate to https://example.com/contact, fill out the contact form with name=John Doe, email=john@example.com, message=Test inquiry, then submit the form
Go to https://example.com/products, scroll through all pages, extract product name, price, and description for each product, save as products.json
Go to https://amazon.com, search for "wireless headphones", extract the first 5 results with title, price, rating, and review count
Bad examples (too vague):
Register on the site          -> Which site? What registration data?
Get the data                  -> What data? From where?
Fill out the form             -> Which form? What values?
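When the values come from variables, one way to keep the description specific (a sketch, not a required format) is to assemble the single line with an f-string:
url = "https://example.com/contact"
name, email, message = "John Doe", "john@example.com", "Test inquiry"

# One line, concrete values, explicit final action
task = (
    f"Navigate to {url}, fill out the contact form with "
    f"name={name}, email={email}, message={message}, then submit the form"
)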

Playwright: Code-Based Automation

For precise programmatic control, use Playwright. A page variable is pre-configured — you don’t need to launch a browser.

Start a Playwright task

curl -s -X POST "https://mcp.app.pinkfish.ai/browser-automation" \
  -H "Authorization: Bearer $PINKFISH_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "browser-automation_playwright_run",
      "arguments": {
        "code": "await page.goto(\"https://news.ycombinator.com\");\nconst stories = await page.$$eval(\".titleline > a\", links => links.slice(0, 10).map((a, i) => ({ rank: i + 1, title: a.textContent, url: a.href })));\nreturn { writeToCollection: true, fileName: \"stories.json\", fileContent: JSON.stringify(stories, null, 2) };",
        "buildId": "hn-scrape-001"
      }
    },
    "id": 1
  }'
Then poll with browser-automation_playwright_run_continue using the sessionId, exactly like the Browser Operator flow.
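In Python, that polling step might look like the sketch below. It reuses the mcp_call helper from the Complete Python Example further down and assumes playwright_run_continue takes the same sessionId/collectionId/logFileName arguments as the operator flow:
import time

def poll_playwright(start_result):
    """Poll until the Playwright session completes or fails.

    start_result is the parsed response from playwright_run; mcp_call is
    the JSON-RPC helper defined in the Complete Python Example.
    """
    result = start_result
    while result.get("status") not in ("completed", "failed"):
        time.sleep(5)  # poll every ~5 seconds
        result = mcp_call("browser-automation_playwright_run_continue", {
            "sessionId": start_result["sessionId"],
            "collectionId": start_result.get("collectionId"),
            "logFileName": start_result.get("logFileName"),
        })
    return result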

Saving files from Playwright

Return this structure from your code to save files:
return {
  writeToCollection: true,
  fileName: "results.json",
  fileContent: JSON.stringify(data, null, 2),
};
Browser downloads (clicking download links) are automatically captured — no special code needed.

Reusing Saved Logins

For sites that require authentication, you can reuse saved login contexts instead of re-authenticating each time.

List available logins

curl -s -X POST "https://mcp.app.pinkfish.ai/browser-automation" \
  -H "Authorization: Bearer $PINKFISH_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "browser-automation_logins_list",
      "arguments": {}
    },
    "id": 1
  }'
Response:
{
  "count": 2,
  "contexts": [
    {
      "id": "ctx_abc123",
      "label": "My LinkedIn",
      "service": "linkedin",
      "loginUrl": "https://www.linkedin.com/login",
      "status": "active",
      "createdAt": "2026-01-15T10:30:00Z"
    }
  ]
}
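To pick a context programmatically, a short sketch (again using the mcp_call helper from the Complete Python Example; field names follow the response above):
contexts = mcp_call("browser-automation_logins_list", {}).get("contexts", [])

# Pick the first active LinkedIn context, if any
linkedin_ctx = next(
    (c for c in contexts if c["service"] == "linkedin" and c["status"] == "active"),
    None,
)
if linkedin_ctx:
    print(f"Using saved login: {linkedin_ctx['label']} ({linkedin_ctx['id']})")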

Use a saved login

Pass the id as the useContextService parameter:
curl -s -X POST "https://mcp.app.pinkfish.ai/browser-automation" \
  -H "Authorization: Bearer $PINKFISH_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "browser-automation_operator_run",
      "arguments": {
        "task": "Go to https://www.linkedin.com/feed, extract the latest 10 posts from my feed",
        "cacheKey": "li9f3k2p",
        "useContextService": "ctx_abc123",
        "model": "google/gemini-3-flash-preview",
        "agentMode": "hybrid",
        "maxSteps": 30,
        "recordSession": true,
        "blockAds": true,
        "solveCaptchas": true
      }
    },
    "id": 1
  }'
The browser session starts already logged in — no credentials in your task description.

Caching

Browser Operator supports intelligent caching to avoid redundant executions:
  • cacheKey (required) — A unique 8-character alphanumeric string that identifies this task variant. Generate a new cacheKey whenever you change the task text.
  • disableCache — Set to true to bypass the cache and force a fresh execution.
  • cacheDurationDays — How long cached results remain valid (1-30 days, default: 7).
Caching is automatically disabled if the task contains the word “screenshot” (cached screenshots wouldn’t reflect current page state).
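Because the cacheKey must change whenever the task text changes, one convenient approach (a sketch; the API only requires 8 alphanumeric characters) is to derive the key from the task itself:
import hashlib

def make_cache_key(task: str) -> str:
    """Derive a deterministic 8-character alphanumeric cacheKey.

    The same task text always maps to the same key, and any edit to the
    task automatically produces a new one.
    """
    return hashlib.sha256(task.encode("utf-8")).hexdigest()[:8]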

Complete Python Example

import requests
import json
import time

MCP_URL = "https://mcp.app.pinkfish.ai"
TOKEN = "<YOUR_PLATFORM_JWT>"

HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
    "Accept": "application/json",
}


def mcp_call(tool_name, arguments):
    """Call a browser automation tool."""
    resp = requests.post(
        f"{MCP_URL}/browser-automation",
        headers=HEADERS,
        json={
            "jsonrpc": "2.0",
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
            "id": 1,
        },
    )
    resp.raise_for_status()
    result = resp.json().get("result", {})
    content = result.get("content", [{}])[0].get("text", "{}")
    return json.loads(content)


# Step 1: Start the browser task
print("Starting browser automation...")
run_result = mcp_call("browser-automation_operator_run", {
    "task": "Navigate to https://news.ycombinator.com, extract the top 10 stories with title, URL, and points, save as stories.json",
    "cacheKey": "hn8x2k4m",
    "model": "google/gemini-3-flash-preview",
    "agentMode": "hybrid",
    "maxSteps": 30,
    "blockAds": True,
    "solveCaptchas": True,
    "recordSession": True,
})

session_id = run_result["sessionId"]
collection_id = run_result.get("collectionId")
log_file = run_result.get("logFileName")
print(f"Session started: {session_id}")

# Step 2: Poll until complete
result = run_result
while result.get("status") not in ("completed", "failed"):
    time.sleep(5)
    result = mcp_call("browser-automation_operator_run_continue", {
        "sessionId": session_id,
        "collectionId": collection_id,
        "logFileName": log_file,
    })
    print(f"  Status: {result.get('status')}")

# Step 3: Process results
if result["status"] == "completed":
    print(f"\nTask completed: {result.get('result')}")
    print(f"\nFiles generated:")
    for f in result.get("files", []):
        print(f"  - {f['fileName']} ({f['mimeType']}, {f['size']} bytes, source: {f['source']})")
        print(f"    Download: {f['url'][:80]}...")
else:
    print(f"\nTask failed: {result.get('result')}")

# Step 4: Print the session log
if result.get("logContent"):
    print(f"\nSession log:\n{result['logContent']}")
Requires: pip install requests

Using Browser Automation in Workflows

Browser automation works inside Pinkfish workflows (see Workflows). The key pattern: put the entire run + poll + file saving loop in a single node function.
async function node_scrape_data(params) {
  // Start the browser task
  let result = await pf.mcp.callTool(
    "browser-automation",
    "browser-automation_operator_run",
    {
      task: params.task,
      cacheKey: "a1b2c3d4",
      model: "google/gemini-3-flash-preview",
      agentMode: "hybrid",
      maxSteps: 30,
      blockAds: true,
      solveCaptchas: true,
      recordSession: true,
    },
  );

  // Enable live preview in the Pinkfish UI
  await pf.run.updateMetadata({
    browserSessionId: result.sessionId,
    collectionId: result.collectionId,
    logFileName: result.logFileName,
  });

  // Poll until complete (same function — do NOT split into separate nodes)
  while (result.status !== "completed" && result.status !== "failed") {
    await new Promise((r) => setTimeout(r, 5000));
    result = await pf.mcp.callTool(
      "browser-automation",
      "browser-automation_operator_run_continue",
      {
        sessionId: result.sessionId,
        collectionId: result.collectionId,
        logFileName: result.logFileName,
      },
    );
  }

  // Save all output files
  for (const file of result.files || []) {
    await pf.files.writeFileFromUrl(file.fileName, file.url);
  }
  if (result.logContent) {
    await pf.files.writeFile("browser_session.log", result.logContent);
  }

  await pf.files.writeFile("node_scrape_data_output.json", result);
  return result;
}