
What can you do with it?

Firecrawl provides three powerful capabilities:
  • Scrape: Extract content from websites in various formats (markdown, HTML, screenshots, etc.) with browser automation
  • Extract: Use AI to extract structured data from multiple URLs using prompts and schemas
  • Search: Search the web and optionally scrape the results
The tool supports browser automation actions, caching for faster repeated operations, location/language settings for geo-specific content, and proxy options for challenging sites.

How to use it?

Choose one of these commands:

Scrape Command

/firecrawl scrape [urls]
Scrapes web pages and returns content in various formats with optional browser automation.

Extract Command

/firecrawl extract [urls] [prompt]
Uses AI to extract structured data from web pages based on your prompt.

Search Command

/firecrawl search [query]
Searches the web and optionally scrapes the results.

SCRAPE Command

Parameters for Scrape

Required:
  • urls - URL(s) to scrape
Optional Output Formats:
  • formats - Array of output formats (default: ["markdown"])
    • "markdown" - Clean markdown content
    • "html" - Processed HTML
    • "rawHtml" - Raw HTML source
    • "links" - Extracted links array
    • "summary" - AI-generated summary
    • For screenshots: [{"type": "screenshot", "fullPage": true, "quality": 80}]
    • For JSON extraction: [{"type": "json", "schema": {...}, "prompt": "..."}]
Browser Actions:
  • actions - Array of browser actions to perform:
    • {"type": "wait", "milliseconds": 2000} - Wait specified time
    • {"type": "click", "selector": "button"} - Click element
    • {"type": "write", "text": "search text"} - Type text
    • {"type": "press", "key": "Enter"} - Press keyboard key
    • {"type": "scroll", "selector": "body", "direction": "down"} - Scroll element
    • {"type": "screenshot", "fullPage": true} - Take screenshot
Content Control:
  • onlyMainContent - Extract only main content (default: true)
  • includeTags - HTML tags to include (e.g., ["div", "p", "h1"])
  • excludeTags - HTML tags to exclude (e.g., ["script", "style"])
  • removeBase64Images - Remove base64 encoded images
  • waitFor - Wait milliseconds before scraping
Caching:
  • maxAge - Use cache if younger than this in ms (default: 172800000 - 2 days)
  • storeInCache - Store results in cache (default: true)
Location/Language:
  • location - Object with:
    • country - ISO country code (US, GB, DE, FR, etc.)
    • languages - Language preferences (e.g., ["en", "es"])
Proxy:
  • proxy - Proxy type: "basic", "stealth", or "auto" (default)
File Storage:
  • file_links_expire_in_days - Days until file links expire
  • file_links_expire_in_minutes - Alternative to days
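
As a rough illustration, a scrape request combining several of the options above might look like this. The field names come from the parameter list; the exact payload shape depends on how the proxy forwards them:
{
  "urls": ["https://example.com"],
  "formats": ["markdown", "links"],
  "onlyMainContent": true,
  "waitFor": 2000,          // wait 2 seconds before scraping
  "maxAge": 172800000,      // accept cached results up to 2 days old
  "proxy": "auto"
}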

Response Format for Scrape

Our proxy returns URL-keyed results. For a single URL, the data is returned directly:
{
  "success": true,
  "data": {
    "markdown": "markdown content here...",
    "html": "<html>...</html>",
    "links": ["https://link1.com", "https://link2.com"],
    "screenshot": "https://storage.googleapis.com/...",  // If screenshot format requested
    "actions": {
      "screenshots": ["https://storage.googleapis.com/..."],  // If screenshot action used
      "scrapes": [],
      "javascriptReturns": [],
      "pdfs": []
    },
    "metadata": {
      "title": "Page Title",
      "description": "Page description",
      "language": "en",
      "sourceURL": "https://example.com",
      "url": "https://example.com/",
      "statusCode": 200,
      "contentType": "text/html",
      "proxyUsed": "basic",
      "creditsUsed": 1
    }
  },
  "file_urls": ["https://storage.example.com/screenshot-123.png"]  // Our storage URLs for screenshots
}
For multiple URLs (returns URL-keyed object):
{
  "https://example.com": {
    "success": true,
    "data": { "markdown": "...", "metadata": {...} },
    "file_urls": ["https://storage.example.com/screenshot-456.png"]
  },
  "https://another.com": {
    "success": true,
    "data": { "markdown": "...", "metadata": {...} },
    "file_urls": ["https://storage.example.com/screenshot-789.png"]
  }
}

Scrape Examples

Basic Usage
/firecrawl scrape https://example.com
Scrapes a single URL and returns the content in markdown format by default.

Advanced Usage with Actions

/firecrawl scrape https://example.com with screenshot after scrolling down and waiting 2 seconds
Performs browser actions before scraping the content.
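Illustratively, the browser actions behind this request would be an array like the following (action types taken from the list above):
"actions": [
  {"type": "scroll", "selector": "body", "direction": "down"},
  {"type": "wait", "milliseconds": 2000},
  {"type": "screenshot", "fullPage": true}
]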

Multiple URLs (Batch Processing)

/firecrawl scrape https://site1.com and https://site2.com and https://site3.com
Our proxy automatically uses batch processing for multiple URLs (3-5x faster).

Multiple Formats

/firecrawl scrape https://example.com get markdown, html, and links
Retrieves content in multiple formats simultaneously.

Cached Scraping

/firecrawl scrape https://news.site.com using cache if less than 1 hour old
Uses cached content if available and fresh (maxAge: 3600000).
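A sketch of the corresponding cache settings (3600000 ms = 1 hour):
{
  "maxAge": 3600000,
  "storeInCache": true
}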

Location-Specific Scraping

/firecrawl scrape https://global-site.com from Germany in German language
Scrapes content as if accessing from Germany with German language preference.
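The underlying location setting would look roughly like this, using the ISO country code and language preference fields described above:
"location": {
  "country": "DE",
  "languages": ["de"]
}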

Stealth Mode for Protected Sites

/firecrawl scrape https://protected-site.com using stealth proxy
Uses stealth proxy for sites with anti-bot protection.

Complex Browser Automation

/firecrawl scrape https://site.com login by typing "user@email.com" then pressing tab then typing "password" then clicking submit button
Automates login process with multiple browser actions.
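A minimal sketch of the resulting actions array; the CSS selectors here are hypothetical and depend on the target page:
"actions": [
  {"type": "click", "selector": "input[name=email]"},   // focus the email field (selector is illustrative)
  {"type": "write", "text": "user@email.com"},
  {"type": "press", "key": "Tab"},
  {"type": "write", "text": "password"},
  {"type": "click", "selector": "button[type=submit]"}  // selector is illustrative
]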

EXTRACT Command

Parameters for Extract

Required:
  • urls - Array of URLs to extract from (must be an array)
  • prompt - What to extract (e.g., "Extract product names and prices")
Optional:
  • schema - JSON schema for structured extraction:
    {
      "type": "object",
      "properties": {
        "products": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "price": {"type": "number"}
            }
          }
        }
      }
    }
    
Scrape Options:
  • formats - Formats to use for extraction (default: ["markdown"])
  • onlyMainContent - Extract only main content
  • proxy - Proxy type: "basic", "stealth", or "auto"
  • file_links_expire_in_days - Days until file links expire
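
As a sketch, an extract request might combine these fields like this (the schema shown earlier can be passed in the "schema" field; the URL is illustrative):
{
  "urls": ["https://store.com/products"],
  "prompt": "Extract product names and prices",
  "formats": ["markdown"],
  "onlyMainContent": true
}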

Response Format for Extract

Extract returns structured data with job completion info:
{
  "success": true,
  "status": "completed",
  "data": {
    "products": [
      {
        "name": "Product 1",
        "price": 29.99
      },
      {
        "name": "Product 2", 
        "price": 49.99
      }
    ]
  },
  "tokensUsed": 348,
  "expiresAt": "2024-01-15T10:30:00Z"
}

Extract Examples

Basic Extraction

/firecrawl extract product information from https://shop.com/products
Extracts data using AI based on the prompt.

Multiple URLs (Efficient Processing)

/firecrawl extract prices from https://shop1.com and https://shop2.com
Extract endpoint processes multiple URLs in one call (more efficient than individual scrape calls).

With Schema

/firecrawl extract products with name and price schema from https://store.com
Uses structured schema for consistent extraction.

Complex Extraction

/firecrawl extract company details, contact info, and services from https://company.com
Extracts multiple types of information.

SEARCH Command

Parameters for Search

Required:
  • query - Search query (e.g., "web scraping tools")
Optional Search Filters:
  • limit - Maximum results to return (default: 10)
  • sources - Result types: ["web", "news", "images"]
  • tbs - Time filter:
    • "qdr:h" - Past hour
    • "qdr:d" - Past day
    • "qdr:w" - Past week
    • "qdr:m" - Past month
    • "qdr:y" - Past year
    • Custom: "cdr:1,cd_min:12/1/2024,cd_max:12/31/2024"
  • location - Search from this location (e.g., "United States")
Scrape Options:
  • scrapeOptions - Object to scrape search results:
    • formats - Output formats for scraped results
    • onlyMainContent - Extract only main content
    • maxAge - Use cache if younger than this (ms)
Other:
  • timeout - Timeout in milliseconds
  • file_links_expire_in_days - Days until file links expire
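
As a sketch, a search request that also scrapes the results might combine the fields above like this:
{
  "query": "AI developments",
  "limit": 5,
  "sources": ["news"],
  "tbs": "qdr:w",                       // past week
  "location": "United States",
  "scrapeOptions": {
    "formats": ["markdown"],
    "onlyMainContent": true,
    "maxAge": 3600000
  }
}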
Response Format for Search

Search returns web search results with scraped content:
{
  "success": true,
  "data": {
    "web": [
      {
        "url": "https://example.com",
        "title": "Example Title",
        "description": "Example description of the page",
        "position": 1,
        "markdown": "Scraped content if scrapeOptions used..."
      },
      {
        "url": "https://another.com",
        "title": "Another Result",
        "description": "Another description",
        "position": 2
      }
    ]
  },
  "creditsUsed": 5,
  "file_urls": []
}

Search Examples

/firecrawl search for "web scraping tools"
Searches the web for the specified query.
/firecrawl search news about "AI developments" from past week
Searches news sources with time filtering.
/firecrawl search images of "modern architecture"
Searches for images matching the query.

Search and Scrape

/firecrawl search "Python tutorials" and scrape top 5 results
Searches and automatically scrapes the results.
/firecrawl search "restaurants" from Japan in Japanese
Searches from specific location with language preference.
/firecrawl search "tech news" from past 24 hours
Searches with time constraints.

Performance & Best Practices

Automatic Optimizations:
  • Our proxy automatically uses batch processing for multiple URLs (3-5x faster than sequential)
  • Conditional screenshots: Only processed when explicitly requested (30x performance improvement)
  • Extract endpoint can process multiple URLs in one call (~7-20 seconds for complex extractions)
E-commerce Extraction Strategy:
  • Homepage URLs (like https://store.com/) show 4-10 featured products, promotional content
  • Category URLs (like https://store.com/products/) show full product catalogs
  • Use MAP endpoint first to discover the right URLs for comprehensive product extraction
  • For dynamic content: Add waitFor: 3000 and scroll actions to load JavaScript content (see the sketch after this list)
Screenshot Guidelines:
  • Screenshots add significant processing time - only request when needed
  • Use [{"type": "screenshot", "fullPage": true}] format for full page captures
  • Screenshots are automatically uploaded to our storage and returned as signed URLs in file_urls
Other Notes:
  • Legacy parameters are auto-converted (maxDepth→maxDiscoveryDepth, allowBackwardLinks→allowExternalLinks, ignoreSitemap→sitemap)
  • Extract endpoint polls job to completion automatically
  • Search can optionally scrape results using scrapeOptions
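A minimal sketch of a scrape request for a JavaScript-heavy catalog page, combining waitFor with a scroll action (the URL is illustrative):
{
  "urls": ["https://store.com/products/"],
  "formats": ["markdown"],
  "waitFor": 3000,
  "actions": [
    {"type": "scroll", "selector": "body", "direction": "down"},
    {"type": "wait", "milliseconds": 1000}
  ]
}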

Content Extraction

  • Scrape multiple URLs
  • Extract main content
  • Remove unwanted elements
  • Clean HTML output
  • Convert to markdown

Output Formats

  • Markdown (default)
  • HTML (processed)
  • Raw HTML
  • JSON structure
  • Screenshots
  • Link extraction

Browser Actions

  • Take screenshots
  • Scroll pages
  • Wait for content
  • Click elements
  • Full page capture

Content Filtering

  • Include specific tags
  • Exclude unwanted tags
  • Main content only
  • Remove base64 images

Example Commands

Basic Scrape

/firecrawl scrape https://example.com

Multiple URLs

/firecrawl scrape https://site1.com and https://site2.com as markdown

With Screenshot

/firecrawl capture https://example.com with full page screenshot

Link Extraction

/firecrawl get all links from https://example.com

Custom Tags

/firecrawl scrape https://example.com including only div, p, h1, h2 tags

Configuration Options

Output Formats

  • markdown: Clean markdown content
  • html: Processed HTML
  • rawHtml: Original HTML
  • links: Array of links
  • screenshot: Screenshot image (returned as a hosted URL in file_urls)
  • json: Structured data

Browser Actions

{
  "type": "screenshot",
  "fullPage": true
}
{
  "type": "scroll",
  "direction": "down"
}

Wait Options

  • Default: 1000ms
  • Custom: specify milliseconds
  • Ensures content loads

Tag Filtering

Include Tags

  • div, p, h1, h2
  • Article content tags
  • Custom selections

Exclude Tags

  • script, style, noscript
  • Ads and tracking
  • Unwanted elements
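
As a sketch, the corresponding content-control options might be combined like this:
{
  "onlyMainContent": true,
  "includeTags": ["div", "p", "h1", "h2"],
  "excludeTags": ["script", "style", "noscript"]
}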

Response Data

Success Response

  • Scraped content
  • Metadata (title, language)
  • Source URL
  • Status code
  • File URLs

Metadata Includes

  • Page title
  • Language
  • Referrer
  • Scrape ID
  • Status code

Tips

  • Use markdown format for clean text
  • Enable screenshots for visual content
  • Filter tags for cleaner output
  • Set appropriate wait times for dynamic content