What can you do with it?
Firecrawl provides three powerful capabilities:
- Scrape: Extract content from websites in various formats (markdown, HTML, screenshots, etc.) with browser automation
- Extract: Use AI to extract structured data from multiple URLs using prompts and schemas
- Search: Search the web and optionally scrape the results
How to use it?
Choose one of these commands:
- Scrape Command
- Extract Command
- Search Command
SCRAPE Command
Parameters for Scrape
Required:
- urls - Array of URLs to scrape (e.g., ["https://example.com", "https://another-example.com"])

Optional:
- formats - Array of output formats (default: ["markdown"]):
  - "markdown" - Clean markdown content
  - "html" - Processed HTML
  - "rawHtml" - Raw HTML source
  - "links" - Extracted links array
  - "summary" - AI-generated summary
  - For screenshots: [{"type": "screenshot", "fullPage": true, "quality": 80}]
  - For JSON extraction: [{"type": "json", "schema": {...}, "prompt": "..."}]
- actions - Array of browser actions to perform:
  - {"type": "wait", "milliseconds": 2000} - Wait specified time
  - {"type": "click", "selector": "button"} - Click element
  - {"type": "write", "text": "search text"} - Type text
  - {"type": "press", "key": "Enter"} - Press keyboard key
  - {"type": "scroll", "selector": "body", "direction": "down"} - Scroll element
  - {"type": "screenshot", "fullPage": true} - Take screenshot
- onlyMainContent - Extract only main content (default: true)
- includeTags - HTML tags to include (e.g., ["div", "p", "h1"])
- excludeTags - HTML tags to exclude (e.g., ["script", "style"])
- removeBase64Images - Remove base64-encoded images
- waitFor - Wait milliseconds before scraping
- maxAge - Use cache if younger than this, in ms (default: 172800000, i.e. 2 days)
- storeInCache - Store results in cache (default: true)
- location - Object with:
  - country - ISO country code (US, GB, DE, FR, etc.)
  - languages - Language preferences (e.g., ["en", "es"])
- proxy - Proxy type: "basic", "stealth", or "auto" (default)
- file_links_expire_in_days - Days until file links expire
- file_links_expire_in_minutes - Alternative to days
Response Format for Scrape
Our proxy returns URL-keyed results. For a single URL, you get the data directly.

Scrape Examples
Basic Usage
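A minimal scrape request needs only urls; everything else falls back to defaults. A sketch (the URL is a placeholder):

```json
{
  "urls": ["https://example.com"],
  "formats": ["markdown"]
}
```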
Advanced Usage with Actions
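For pages that need interaction before capture, add an actions array. A sketch (the selector and wait time are placeholders):

```json
{
  "urls": ["https://example.com"],
  "formats": ["markdown"],
  "actions": [
    {"type": "wait", "milliseconds": 2000},
    {"type": "click", "selector": "button.load-more"},
    {"type": "scroll", "selector": "body", "direction": "down"}
  ]
}
```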
Multiple URLs (Batch Processing)
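Passing several URLs in one call lets the proxy batch them instead of scraping sequentially. A sketch with placeholder URLs:

```json
{
  "urls": [
    "https://example.com",
    "https://another-example.com",
    "https://example.org"
  ],
  "formats": ["markdown"]
}
```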
Multiple Formats
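Several output formats can be requested in a single call. A sketch:

```json
{
  "urls": ["https://example.com"],
  "formats": ["markdown", "html", "links", "summary"]
}
```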
Cached Scraping
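Setting maxAge lets the proxy reuse a cached copy if it is fresh enough. A sketch (3600000 ms is one hour; adjust as needed):

```json
{
  "urls": ["https://example.com"],
  "formats": ["markdown"],
  "maxAge": 3600000,
  "storeInCache": true
}
```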
Location-Specific Scraping
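The location object requests scraping from a specific country and language context. A sketch:

```json
{
  "urls": ["https://example.com"],
  "formats": ["markdown"],
  "location": {"country": "DE", "languages": ["de", "en"]}
}
```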
Stealth Mode for Protected Sites
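For sites behind aggressive bot protection, switch the proxy type to stealth. A sketch:

```json
{
  "urls": ["https://example.com"],
  "formats": ["markdown"],
  "proxy": "stealth"
}
```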
Complex Browser Automation
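Actions can be chained to drive a full interaction, for example typing a query and capturing the result. A sketch (the target URL, selector, and search text are placeholders):

```json
{
  "urls": ["https://example.com"],
  "formats": [{"type": "screenshot", "fullPage": true}],
  "actions": [
    {"type": "click", "selector": "input[name=q]"},
    {"type": "write", "text": "search text"},
    {"type": "press", "key": "Enter"},
    {"type": "wait", "milliseconds": 2000},
    {"type": "scroll", "selector": "body", "direction": "down"},
    {"type": "screenshot", "fullPage": true}
  ]
}
```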
EXTRACT Command
Parameters for Extract
Required:
- urls - Array of URLs to extract from (must be an array)
- prompt - What to extract (e.g., "Extract product names and prices")

Optional:
- schema - JSON schema for structured extraction
- formats - Formats to use for extraction (default: ["markdown"])
- onlyMainContent - Extract only main content
- proxy - Proxy type: "basic", "stealth", or "auto"
- file_links_expire_in_days - Days until file links expire
Response Format for Extract
Extract returns structured data with job completion info.

Extract Examples
Basic Extraction
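The simplest extraction passes urls plus a natural-language prompt. A sketch:

```json
{
  "urls": ["https://example.com"],
  "prompt": "Extract product names and prices"
}
```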
Multiple URLs (Efficient Processing)
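Listing several URLs in one extract call is cheaper than one call per page. A sketch with placeholder URLs:

```json
{
  "urls": [
    "https://example.com/products/a",
    "https://example.com/products/b",
    "https://example.com/products/c"
  ],
  "prompt": "Extract product names and prices"
}
```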
With Schema
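A JSON schema pins down the shape of the returned data. A sketch (the schema fields are illustrative):

```json
{
  "urls": ["https://example.com/products"],
  "prompt": "Extract all products with their prices",
  "schema": {
    "type": "object",
    "properties": {
      "products": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "price": {"type": "string"}
          }
        }
      }
    }
  }
}
```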
Complex Extraction
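Prompt, schema, and scrape options can be combined for harder targets. A sketch (the URLs and schema are illustrative):

```json
{
  "urls": ["https://example.com/pricing", "https://example.com/features"],
  "prompt": "Extract each plan's name, monthly price, and included features",
  "schema": {
    "type": "object",
    "properties": {
      "plans": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "monthly_price": {"type": "string"},
            "features": {"type": "array", "items": {"type": "string"}}
          }
        }
      }
    }
  },
  "onlyMainContent": true,
  "proxy": "auto"
}
```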
SEARCH Command
Parameters for Search
Required:
- query - Search query (e.g., "web scraping tools")

Optional:
- limit - Maximum results to return (default: 10)
- sources - Result types: ["web", "news", "images"]
- tbs - Time filter:
  - "qdr:h" - Past hour
  - "qdr:d" - Past day
  - "qdr:w" - Past week
  - "qdr:m" - Past month
  - "qdr:y" - Past year
  - Custom: "cdr:1,cd_min:12/1/2024,cd_max:12/31/2024"
- location - Search from this location (e.g., "United States")
- scrapeOptions - Object to scrape search results:
  - formats - Output formats for scraped results
  - onlyMainContent - Extract only main content
  - maxAge - Use cache if younger than this (ms)
- timeout - Timeout in milliseconds
- file_links_expire_in_days - Days until file links expire
Response Format for Search
Search returns web search results, with scraped content when scrapeOptions is provided.

Search Examples
Basic Search
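A basic search needs only a query. A sketch:

```json
{
  "query": "web scraping tools",
  "limit": 10
}
```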
News Search
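Restrict results to news sources, optionally combined with a time filter. A sketch:

```json
{
  "query": "web scraping",
  "sources": ["news"],
  "tbs": "qdr:w"
}
```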
Image Search
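Image results are requested the same way via sources. A sketch:

```json
{
  "query": "web scraping tools",
  "sources": ["images"],
  "limit": 20
}
```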
Search and Scrape
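Adding scrapeOptions makes the proxy scrape each search result as well. A sketch:

```json
{
  "query": "web scraping tools",
  "limit": 5,
  "scrapeOptions": {
    "formats": ["markdown"],
    "onlyMainContent": true,
    "maxAge": 172800000
  }
}
```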
Location-Based Search
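The location parameter runs the search as if issued from that place. A sketch:

```json
{
  "query": "web scraping tools",
  "location": "United States",
  "limit": 10
}
```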
Time-Filtered Search
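tbs restricts results to a time window; the custom cdr syntax covers arbitrary date ranges. A sketch using the past-month filter:

```json
{
  "query": "web scraping tools",
  "tbs": "qdr:m",
  "limit": 10
}
```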
Performance & Best Practices
Automatic Optimizations:
- Our proxy automatically uses batch processing for multiple URLs (3-5x faster than sequential)
- Conditional screenshots: Only processed when explicitly requested (30x performance improvement)
- Extract endpoint can process multiple URLs in one call (~7-20 seconds for complex extractions)
- Homepage URLs (like https://store.com/) typically show only 4-10 featured products and promotional content
- Category URLs (like https://store.com/products/) show full product catalogs
- Use MAP endpoint first to discover the right URLs for comprehensive product extraction
- For dynamic content: add waitFor: 3000 and scroll actions to load JavaScript content
- Screenshots add significant processing time - only request when needed
- Use the [{"type": "screenshot", "fullPage": true}] format for full-page captures
- Screenshots are automatically uploaded to our storage and returned as signed URLs in file_urls
- Legacy parameters are auto-converted (maxDepth→maxDiscoveryDepth, allowBackwardLinks→allowExternalLinks, ignoreSitemap→sitemap)
- Extract endpoint polls the job to completion automatically
- Search can optionally scrape results using scrapeOptions
Content Extraction
- Scrape multiple URLs
- Extract main content
- Remove unwanted elements
- Clean HTML output
- Convert to markdown
Output Formats
- Markdown (default)
- HTML (processed)
- Raw HTML
- JSON structure
- Screenshots
- Link extraction
Browser Actions
- Take screenshots
- Scroll pages
- Wait for content
- Click elements
- Full page capture
Content Filtering
- Include specific tags
- Exclude unwanted tags
- Main content only
- Remove base64 images
Example Commands
Basic Scrape
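A possible minimal payload (placeholder URL):

```json
{"urls": ["https://example.com"]}
```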
Multiple URLs
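A possible payload for batch scraping (placeholder URLs):

```json
{"urls": ["https://example.com", "https://another-example.com"]}
```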
With Screenshot
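A possible payload requesting a full-page screenshot:

```json
{"urls": ["https://example.com"], "formats": [{"type": "screenshot", "fullPage": true}]}
```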
Extract Links
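A possible payload that returns the page's links:

```json
{"urls": ["https://example.com"], "formats": ["links"]}
```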
Custom Tags
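A possible payload that keeps article tags and drops scripts and styles:

```json
{"urls": ["https://example.com"], "includeTags": ["div", "p", "h1"], "excludeTags": ["script", "style"]}
```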
Configuration Options
Output Formats
- markdown: Clean markdown content
- html: Processed HTML
- rawHtml: Original HTML
- links: Array of links
- screenshot: Base64 image
- json: Structured data
Browser Actions
Wait Options
- Default: 1000ms
- Custom: specify milliseconds
- Ensures content loads
Tag Filtering
Include Tags
- div, p, h1, h2
- Article content tags
- Custom selections
Exclude Tags
- script, style, noscript
- Ads and tracking
- Unwanted elements
Response Data
Success Response
- Scraped content
- Metadata (title, language)
- Source URL
- Status code
- File URLs
Metadata Includes
- Page title
- Language
- Referrer
- Scrape ID
- Status code
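Putting those fields together, a single-URL success response is shaped roughly like the sketch below. The key names are assumptions inferred from the fields listed above, not a guaranteed contract; inspect an actual response for the exact shape.

```json
{
  "success": true,
  "data": {
    "markdown": "# Example Domain\n\n...",
    "metadata": {
      "title": "Example Domain",
      "language": "en",
      "referrer": "https://example.com",
      "sourceURL": "https://example.com",
      "statusCode": 200,
      "scrapeId": "..."
    },
    "file_urls": []
  }
}
```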
Tips
- Use markdown format for clean text
- Enable screenshots for visual content
- Filter tags for cleaner output
- Set appropriate wait times for dynamic content