What can you do with it?
Crawling retrieves the contents of a website's pages by following links from one or more starting URLs, which makes it well suited to comprehensive data collection, documentation scraping, and content migration. It offers configurable depth controls, path filtering, subdomain inclusion, and the ability to apply scraping options (formats, proxy, caching) to every crawled page.

How to use it?
Basic Command Structure
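A minimal sketch of a crawl request, assuming the tool takes a single JSON object with the parameters documented below (the URL is a placeholder):

```json
{
  "urls": ["https://example.com"],
  "limit": 10,
  "scrapeOptions": {
    "formats": ["markdown"]
  }
}
```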
Parameters
Required:
- urls - Array of starting URLs to crawl

Optional:
- limit - Maximum number of pages to crawl (default: 10)
- maxDepth - Max depth to crawl (legacy, converts to maxDiscoveryDepth)
- maxDiscoveryDepth - How many link levels deep to discover
- allowBackwardLinks - Allow external domains (legacy, converts to allowExternalLinks)
- allowExternalLinks - Crawl external domains
- crawlEntireDomain - Crawl entire domain, not just children of the starting URL
- allowSubdomains - Include subdomains in the crawl
- includePaths - Only crawl URLs matching these patterns (e.g., ["/blog/*"])
- excludePaths - Skip URLs matching these patterns (e.g., ["/admin/*"])
- scrapeOptions - Object containing any scrape parameters:
  - formats - Output formats (["markdown", "html", "links", etc.])
  - onlyMainContent - Extract only main content
  - proxy - Proxy type ("basic", "stealth", "auto")
  - maxAge - Use cache if younger than this (milliseconds)
  - location - Location settings for geo-specific content
  - Any other scrape parameter
- file_links_expire_in_days - Days until file links expire
- file_links_expire_in_minutes - Alternative to days
Response Format
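The exact response shape isn't spelled out here; below is a plausible sketch based on the notes at the end of this page (the job is polled to completion, and screenshots come back as file_urls). Treat every field name other than file_urls as an assumption:

```json
{
  "status": "completed",
  "total": 10,
  "data": [
    {
      "url": "https://example.com/docs/intro",
      "markdown": "# Intro\n...",
      "file_urls": ["https://storage.example.com/screenshot-1.png"]
    }
  ]
}
```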
Examples
Basic Usage
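Assuming the JSON parameter object sketched above, a minimal crawl of up to 10 pages from one starting URL:

```json
{
  "urls": ["https://example.com"],
  "limit": 10
}
```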
Path Filtering
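Restricting the crawl with the documented includePaths and excludePaths glob patterns (the paths here are illustrative):

```json
{
  "urls": ["https://example.com"],
  "includePaths": ["/blog/*"],
  "excludePaths": ["/admin/*"],
  "limit": 25
}
```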
Entire Domain Crawl
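Using crawlEntireDomain to expand beyond children of the starting URL, with allowSubdomains to pull in subdomains as well:

```json
{
  "urls": ["https://example.com/docs/intro"],
  "crawlEntireDomain": true,
  "allowSubdomains": true,
  "limit": 50
}
```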
With Caching
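Passing maxAge through scrapeOptions so cached copies younger than one hour (3,600,000 ms) are reused instead of re-scraped:

```json
{
  "urls": ["https://example.com"],
  "limit": 20,
  "scrapeOptions": {
    "maxAge": 3600000
  }
}
```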
Stealth Mode Crawl
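Selecting the stealth proxy type via scrapeOptions, per the proxy parameter documented above:

```json
{
  "urls": ["https://example.com"],
  "limit": 10,
  "scrapeOptions": {
    "proxy": "stealth"
  }
}
```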
Multi-Format Output
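Requesting several output formats for each crawled page through scrapeOptions.formats:

```json
{
  "urls": ["https://example.com"],
  "limit": 10,
  "scrapeOptions": {
    "formats": ["markdown", "html", "links"]
  }
}
```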
Location-Specific Crawl
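Passing location settings through scrapeOptions for geo-specific content. The country/languages shape shown here is an assumption, since the exact structure of the location object isn't specified above:

```json
{
  "urls": ["https://example.com"],
  "limit": 10,
  "scrapeOptions": {
    "location": { "country": "DE", "languages": ["de"] }
  }
}
```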
Deep Crawl with Limits
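Combining maxDiscoveryDepth with a page limit to bound a deep crawl (keep in mind that each crawled page uses one credit):

```json
{
  "urls": ["https://example.com"],
  "maxDiscoveryDepth": 3,
  "limit": 100,
  "scrapeOptions": {
    "onlyMainContent": true
  }
}
```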
Notes
- Polls job to completion automatically
- Legacy parameters are auto-converted (maxDepth→maxDiscoveryDepth, allowBackwardLinks→allowExternalLinks)
- Screenshots are uploaded to storage and returned as file_urls
- Supports glob patterns for path filtering (e.g., "/docs/*", "/blog/*")
- All scrape parameters can be applied to crawled pages via scrapeOptions
- Each page crawled uses one credit