What can you do with it?

With Crawling, crawl a website and retrieve the contents of its pages. Content is extracted systematically by following links, with configurable depth and path controls, making it well suited to comprehensive data collection, documentation scraping, and content migration.

How to use it?

Basic Command Structure

/crawling [urls] [limit]

Parameters

Required:
  • urls - Array of URLs to crawl
Optional:
  • limit - Maximum number of pages to crawl (default: 10)
  • maxDepth - Maximum depth of links to follow
  • excludePaths - Paths to exclude from crawling (glob patterns)
  • includePaths - Only include these paths when crawling (glob patterns)
  • allowBackwardLinks - Allow crawling links that aren’t direct children of the provided URL
  • scrapeOptions - Configure output formats (markdown, html)
  • webhook - URL to receive webhook events for crawl updates
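
To make the expected shapes concrete, here is a minimal TypeScript sketch of the parameters. The interface name CrawlParams is our own, and the field types are inferred from the list above rather than taken from a published type definition.

interface CrawlParams {
  urls: string[];                               // URLs to start crawling from (required)
  limit?: number;                               // maximum pages to crawl; defaults to 10
  maxDepth?: number;                            // how many link levels to follow
  excludePaths?: string[];                      // glob patterns to skip, e.g. ["blog/*"]
  includePaths?: string[];                      // glob patterns to allow, e.g. ["docs/*"]
  allowBackwardLinks?: boolean;                 // follow links outside the start URL's subtree
  scrapeOptions?: { formats: ("markdown" | "html")[] };  // output formats per page
  webhook?: string;                             // URL that receives crawl progress events
}

// Example payload mirroring the Advanced Usage example further below.
const advanced: CrawlParams = {
  urls: ["https://example.com"],
  limit: 50,
  includePaths: ["docs/*", "guides/*"],
  excludePaths: ["blog/*", "news/*"],
};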

Response Format

The command returns:
{
  "url": {
    "status": "completed",
    "total": 4,
    "completed": 4,
    "creditsUsed": 4,
    "data": [{
      "markdown": "content in markdown",
      "html": "content in HTML",
      "metadata": {
        "title": "page title",
        "sourceURL": "original URL"
      }
    }]
  }
}
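
The sketch below shows one way to walk that structure in TypeScript and pull out each page's content and metadata. The type names CrawlResult and CrawlPage are hypothetical; the fields mirror the response shown above.

interface CrawlPage {
  markdown?: string;
  html?: string;
  metadata: { title: string; sourceURL: string };
}

interface CrawlResult {
  status: string;        // e.g. "completed"
  total: number;         // pages discovered
  completed: number;     // pages actually crawled
  creditsUsed: number;   // one credit per page crawled
  data: CrawlPage[];
}

// The top-level object is keyed by the crawled URL.
function summarize(response: Record<string, CrawlResult>): void {
  for (const [url, result] of Object.entries(response)) {
    console.log(`${url}: ${result.completed}/${result.total} pages, ${result.creditsUsed} credits`);
    for (const page of result.data) {
      console.log(`  - ${page.metadata.title} (${page.metadata.sourceURL})`);
    }
  }
}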

Examples

Basic Usage

/crawling
urls: ["https://docs.example.com"]
limit: 20
Crawl a documentation site up to 20 pages.
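
If the command is backed by an HTTP endpoint, the same request could be issued programmatically along these lines. The endpoint URL below is a placeholder, not a documented API.

async function basicCrawl(): Promise<unknown> {
  const res = await fetch("https://api.example.com/crawling", {   // placeholder endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      urls: ["https://docs.example.com"],
      limit: 20,
    }),
  });
  return res.json();   // resolves to the response format documented above
}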

Advanced Usage

/crawling
urls: ["https://example.com"]
limit: 50
includePaths: ["docs/*", "guides/*"]
excludePaths: ["blog/*", "news/*"]
Crawl with path filtering to focus on specific sections.
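
As an illustration of how include and exclude globs interact, the helper below applies exclusions first and then requires a match against the include list. The real crawler's matching rules may differ in detail.

// Convert a simple glob like "docs/*" into an anchored regular expression.
function matchesGlob(path: string, pattern: string): boolean {
  const regex = new RegExp(
    "^" + pattern.split("*").map(s => s.replace(/[.+?^${}()|[\]\\]/g, "\\$&")).join(".*") + "$"
  );
  return regex.test(path);
}

function shouldCrawl(path: string, include: string[], exclude: string[]): boolean {
  if (exclude.some(p => matchesGlob(path, p))) return false;   // exclusions win
  if (include.length === 0) return true;                       // no allow-list: everything else is allowed
  return include.some(p => matchesGlob(path, p));              // must match at least one include pattern
}

// shouldCrawl("docs/getting-started", ["docs/*", "guides/*"], ["blog/*"]) => true
// shouldCrawl("blog/2024-roundup",    ["docs/*", "guides/*"], ["blog/*"]) => false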

Specific Use Case

/crawling
urls: ["https://old-site.example.com"]
limit: 100
maxDepth: 5
scrapeOptions: {formats: ["markdown", "html"]}
Deep crawl for content migration with multiple output formats.
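
For a migration workflow, the crawled pages can then be written to disk. The sketch below assumes the response shape documented earlier and a Node.js environment; the file-naming scheme is purely illustrative.

import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

interface MigratedPage {
  markdown?: string;
  html?: string;
  metadata: { title: string; sourceURL: string };
}

async function saveMigration(pages: MigratedPage[], outDir: string): Promise<void> {
  await mkdir(outDir, { recursive: true });
  for (const [i, page] of pages.entries()) {
    // Derive a filename from the source URL path, falling back to an index.
    const slug =
      new URL(page.metadata.sourceURL).pathname.replace(/\W+/g, "-").replace(/^-+|-+$/g, "") ||
      `page-${i}`;
    if (page.markdown) await writeFile(join(outDir, `${slug}.md`), page.markdown);
    if (page.html) await writeFile(join(outDir, `${slug}.html`), page.html);
  }
}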

Notes

  • Supports glob patterns for path filtering (e.g., “docs/*”, “blog/*”).
  • Output formats include markdown and HTML.
  • Each page crawled uses one credit.
  • Webhook integration is available for progress monitoring (see the sketch below).
  • Maximum depth control prevents crawls from following links indefinitely.
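
For webhook-based progress monitoring, a receiver can be as small as the Node.js sketch below. The /crawl-events path and the event payload shape are assumptions; consult the actual webhook documentation for the real field names.

import { createServer } from "node:http";

const server = createServer((req, res) => {
  if (req.method === "POST" && req.url === "/crawl-events") {
    let body = "";
    req.on("data", chunk => (body += chunk));
    req.on("end", () => {
      const event = JSON.parse(body);          // e.g. progress or completion updates
      console.log("crawl event:", event);
      res.writeHead(200).end("ok");
    });
  } else {
    res.writeHead(404).end();
  }
});

server.listen(3000, () => console.log("listening for webhook events on :3000"));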