The /crawling command crawls websites and extracts content from multiple pages. It is well suited for:

  • Extracting content from websites
  • Bulk data collection
  • Content analysis
  • Documentation scraping
  • Research automation

Basic Usage

Use the command to crawl websites:

/crawling crawl "https://example.com" with limit 10 pages
/crawling extract content from docs section of "https://company.com"
/crawling scrape "https://blog.com" excluding blog posts

Key Features

Content Extraction

  • Extract markdown content
  • Retrieve HTML source
  • Capture page metadata
  • Multi-format output
  • Structured data

Crawling Control

  • Set page limits
  • Control crawl depth
  • Include/exclude paths
  • Follow link patterns
  • Backward link support

Output Formats

  • Markdown format
  • HTML source
  • Metadata extraction
  • Screenshot capture
  • Custom formats

Advanced Options

  • Webhook notifications
  • Path filtering
  • Depth control
  • Credit tracking
  • Progress monitoring

Example Commands

Basic Crawling

/crawling crawl "https://docs.example.com" with 20 page limit

Path Filtering

/crawling crawl "https://site.com" including only "docs/*" paths

Exclude Patterns

/crawling crawl "https://blog.com" excluding "admin/*" and "private/*"

Deep Crawling

/crawling crawl "https://wiki.com" with max depth 5

Multi-URL Crawling

/crawling crawl multiple URLs with markdown output

Parameters

Required Parameters

  • urls: Array of URLs to crawl

Optional Parameters

  • limit: Maximum number of pages to crawl per URL (default: 10)
  • maxDepth: Maximum link depth to follow from each starting URL
  • excludePaths: Glob patterns for paths to exclude
  • includePaths: Glob patterns for paths to include; everything else is skipped
  • allowBackwardLinks: Follow links that are not children of the starting URLs
  • scrapeOptions: Output format configuration
  • webhook: URL that receives progress notifications
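
As a reference for wiring these parameters together, the sketch below assembles a request payload in Python. The endpoint URL, the use of the requests library, and the exact shape of scrapeOptions are illustrative assumptions; substitute whatever interface your deployment exposes.

import requests  # assumed HTTP client; any client works

# Hypothetical endpoint -- replace with the actual crawl API for your setup.
CRAWL_ENDPOINT = "https://api.example.com/v1/crawl"

payload = {
    "urls": ["https://docs.example.com"],        # required: URLs to crawl
    "limit": 20,                                 # page cap (default 10)
    "maxDepth": 3,                               # how far to follow links
    "includePaths": ["docs/*"],                  # crawl only matching paths
    "excludePaths": ["docs/archive/*"],          # skip matching paths
    "allowBackwardLinks": False,                 # stay beneath the starting path
    "scrapeOptions": {"formats": ["markdown", "html"]},      # output formats
    "webhook": "https://hooks.example.com/crawl-progress",   # progress notifications
}

response = requests.post(CRAWL_ENDPOINT, json=payload, timeout=30)
response.raise_for_status()
print(response.json())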

Response Structure

Successful Crawl

{
  "https://example.com": {
    "status": "completed",
    "total": 4,
    "completed": 4,
    "creditsUsed": 4,
    "data": [
      {
        "markdown": "# Page Title\n\nContent...",
        "html": "<!DOCTYPE html>...",
        "metadata": {
          "title": "Page Title",
          "sourceURL": "https://example.com",
          "statusCode": 200
        }
      }
    ]
  }
}

Error Response

{
  "error": "Error message details"
}
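
The helper below sketches how a caller might consume these shapes. It assumes result is the parsed JSON response, keyed by starting URL on success or carrying a single error key on failure, as shown above.

def summarize_crawl(result: dict) -> None:
    # Error responses carry a single "error" key.
    if "error" in result:
        raise RuntimeError(f"Crawl failed: {result['error']}")

    # Successful responses are keyed by starting URL.
    for start_url, crawl in result.items():
        print(f"{start_url}: {crawl['completed']}/{crawl['total']} pages, "
              f"{crawl['creditsUsed']} credits, status={crawl['status']}")
        for page in crawl.get("data", []):
            meta = page.get("metadata", {})
            print(f"  - {meta.get('sourceURL')} ({meta.get('statusCode')}): "
                  f"{meta.get('title', 'untitled')}")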

Content Formats

Markdown Output

  • Clean, readable format
  • Structured content
  • Preserved formatting
  • Easy to process
  • Human-readable

HTML Output

  • Complete source code
  • Full page structure
  • All elements preserved
  • Raw format
  • Developer-friendly

Metadata

  • title: Page title
  • language: Content language
  • sourceURL: Original URL
  • description: Page description
  • statusCode: HTTP response code

Path Filtering

Include Patterns

"includePaths": ["docs/*", "help/*", "api/*"]

Exclude Patterns

"excludePaths": ["admin/*", "private/*", "temp/*"]

Glob Patterns

  • Use * for wildcards
  • Use ** for recursive matching
  • Combine multiple patterns
  • Case-sensitive matching
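
Exact glob semantics are up to the crawler; the sketch below shows one plausible, case-sensitive interpretation in which * stays within a single path segment and ** crosses segments. The helper names are hypothetical.

import re

def glob_to_regex(pattern: str):
    # "**" matches across path separators; "*" stays within one segment.
    escaped = re.escape(pattern)
    escaped = escaped.replace(r"\*\*", "<<ANY>>").replace(r"\*", "[^/]*")
    return re.compile("^" + escaped.replace("<<ANY>>", ".*") + "$")

def path_allowed(path: str, include=(), exclude=()) -> bool:
    # A path must match at least one include pattern (if any are given)
    # and must not match any exclude pattern.
    if include and not any(glob_to_regex(p).match(path) for p in include):
        return False
    return not any(glob_to_regex(p).match(path) for p in exclude)

print(path_allowed("docs/intro", include=["docs/*"]))    # True
print(path_allowed("docs/api/v2", include=["docs/*"]))   # False ("docs/**" would match)
print(path_allowed("admin/users", exclude=["admin/*"]))  # False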

Crawl Depth Control

Depth Levels (maxDepth)

  • 0: Only the starting URLs
  • 1: Pages linked directly from the starting URLs
  • 2: Pages linked from depth 1 pages
  • N: Continue following links up to depth N

Backward Links (allowBackwardLinks)

  • true: Follow any discovered link, including links outside the starting path
  • false: Only follow child links beneath the starting URLs

Together, maxDepth and allowBackwardLinks determine which links are followed and how far the crawl extends; for example, with maxDepth 1 only the starting pages and the pages they link to directly are crawled.

Credit Usage

Credit Calculation

  • 1 credit per page crawled
  • Failed pages don’t count
  • Redirects count as 1 page
  • Screenshots use additional credits

Monitoring

  • Track credits used
  • Monitor crawl progress
  • Set appropriate limits
  • Optimize for efficiency
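
A small helper, assuming the response structure shown earlier, that totals reported credit usage so it can be compared against a budget of your choosing:

def total_credits(result: dict, budget=None) -> int:
    # Sum the creditsUsed reported for each starting URL in a crawl result.
    used = sum(crawl.get("creditsUsed", 0)
               for crawl in result.values()
               if isinstance(crawl, dict))
    if budget is not None and used > budget:
        print(f"Warning: {used} credits used exceeds the budget of {budget}")
    return used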

Webhook Integration

Webhook Events

  • Crawl started
  • Page completed
  • Crawl finished
  • Error notifications

Webhook Payload

{
  "event": "page_completed",
  "url": "https://example.com/page",
  "status": "success",
  "progress": {
    "completed": 5,
    "total": 10
  }
}
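
A minimal receiver sketch, assuming payloads shaped like the example above; Flask and the /crawl-progress route are illustrative choices, not part of the command itself.

from flask import Flask, request

app = Flask(__name__)

@app.route("/crawl-progress", methods=["POST"])
def crawl_progress():
    event = request.get_json(force=True)
    progress = event.get("progress", {})
    print(f"{event.get('event')}: {event.get('url')} "
          f"({progress.get('completed')}/{progress.get('total')})")
    # Acknowledge quickly; defer heavy processing to a background job.
    return "", 204

if __name__ == "__main__":
    app.run(port=8000)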

Best Practices

  1. Set Appropriate Limits

    • Start with small limits
    • Monitor credit usage
    • Adjust based on needs
    • Avoid excessive crawling

  2. Use Path Filtering

    • Include relevant sections
    • Exclude unnecessary content
    • Use specific patterns
    • Optimize crawl scope

  3. Monitor Progress

    • Use webhooks for updates
    • Track completion status
    • Handle errors gracefully
    • Implement retries

  4. Respect Websites

    • Check robots.txt
    • Avoid overloading servers
    • Use reasonable delays
    • Follow site policies

Common Use Cases

Documentation Scraping

/crawling extract all docs from "https://docs.company.com"

Blog Content Collection

/crawling crawl blog posts from "https://blog.example.com"

Research Data Gathering

/crawling collect research papers from university site

Competitive Analysis

/crawling analyze competitor website content structure

Error Handling

Common Issues

  • Network timeouts
  • Access restrictions
  • Invalid URLs
  • Rate limiting

Error Recovery

  • Retry failed pages
  • Handle partial results
  • Check status codes
  • Validate responses
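
One way to implement the retry and validation steps above; fetch_page is a hypothetical stand-in for whatever issues the scrape request and returns a single page entry.

import time

def fetch_with_retries(fetch_page, url, attempts=3, base_delay=2.0):
    # Retry transient failures (timeouts, rate limits) with exponential backoff.
    for attempt in range(1, attempts + 1):
        try:
            page = fetch_page(url)
            status = page.get("metadata", {}).get("statusCode")
            if status == 429:  # rate limited: treat as retryable
                raise RuntimeError("rate limited")
            return page  # validate statusCode and content downstream
        except Exception as exc:
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} for {url} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)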

Performance Optimization

Speed Factors

  • Page load times
  • Network conditions
  • Site responsiveness
  • Crawl depth

Optimization Tips

  • Use appropriate limits
  • Filter paths effectively
  • Monitor progress
  • Batch similar requests

Tips

  • Start with small limits to test crawling behavior
  • Use path filtering to focus on relevant content
  • Monitor credit usage to control costs
  • Implement webhooks for real-time progress tracking
  • Respect website robots.txt and crawling policies
