
What can you do with it?

Crawl Map turns a single starting URL into a list of the pages on that website. Use it when you need to:

  • let the end user choose which links to scrape
  • get a quick inventory of a site's links
  • find pages related to a specific topic with the search parameter
  • identify specific pages before scraping

The map endpoint is optimized for speed. It returns each URL with its title and description when these are available. It also supports location settings for geo-specific site mapping.

How to use it?

Basic Command Structure

/crawl-map [urls] [includeSubdomains] [ignoreSitemap] [search]

Parameters

Required:
  • urls - One or more fully qualified starting URLs (http/https)
Optional Map Control:
  • search - Search term to filter URLs containing specific content
  • limit - Maximum number of URLs to return (default: 100)
  • includeSubdomains - Include subdomains in the map (default: false)
  • ignoreSitemap - Skip sitemap (legacy, converts to sitemap parameter)
  • sitemap - Sitemap usage: "include", "skip", or "only" (default: "include")
Location Settings:
  • location - Object with:
    • country - ISO country code (US, GB, DE, FR, etc.)
    • languages - Language preferences (e.g., ["en", "es"])
File Storage:
  • file_links_expire_in_days - Number of days until file links expire
  • file_links_expire_in_minutes - Number of minutes until file links expire (alternative to file_links_expire_in_days)
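The sketch below shows how the map options listed above might be assembled into a request body for one starting URL. Several parts are assumptions, not the documented wire format:

  • the helper name build_map_payload is hypothetical
  • the JSON body shape is assumed
  • sending one request per starting URL is assumed
  • treating the legacy ignoreSitemap=true as sitemap: "skip" is assumed

The file_links_* options are left out for brevity.

```python
from typing import Any, Optional

# Values documented for the sitemap parameter.
SITEMAP_MODES = ("include", "skip", "only")


def build_map_payload(
    url: str,
    search: Optional[str] = None,
    limit: int = 100,
    include_subdomains: bool = False,
    sitemap: str = "include",
    ignore_sitemap: Optional[bool] = None,
    country: Optional[str] = None,
    languages: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Assemble a request body for one starting URL (hypothetical helper)."""
    # Assumption: the legacy ignoreSitemap=True flag is equivalent to sitemap="skip".
    if ignore_sitemap:
        sitemap = "skip"
    if sitemap not in SITEMAP_MODES:
        raise ValueError(f"sitemap must be one of {SITEMAP_MODES}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")

    payload: dict[str, Any] = {
        "url": url,
        "limit": limit,
        "includeSubdomains": include_subdomains,
        "sitemap": sitemap,
    }
    if search:
        payload["search"] = search
    # Only send a location object when at least one of its fields is set.
    location: dict[str, Any] = {}
    if country:
        location["country"] = country.upper()  # ISO codes such as "DE"
    if languages:
        location["languages"] = list(languages)
    if location:
        payload["location"] = location
    return payload
```

For example, build_map_payload("https://global-site.com/", country="de") yields a body whose location is {"country": "DE"}. All other fields keep the defaults listed above.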

Response Format

For a single URL (the data object is returned directly):
{
  "success": true,
  "links": [
    {
      "url": "https://lights.com",
      "title": "Lights - Modern Lighting Store",
      "description": "Premium lighting solutions for your home"
    },
    {
      "url": "https://lights.com/pages/contact",
      "title": "Contact Us",
      "description": "Get in touch with our lighting experts"
    },
    {
      "url": "https://lights.com/products/foscarini-twiggy",
      "title": "Foscarini Twiggy Floor Lamp",
      "description": "Iconic Italian design floor lamp"
    }
  ],
  "scrapeId": "abc-123-def",
  "metadata": {
    "statusCode": 200
  }
}
For multiple URLs (an array of results is returned, one per URL):
[
  {
    "url": "https://lights.com",
    "success": true,
    "links": [...],
    "metadata": {...}
  },
  {
    "url": "https://another.com",
    "success": true,
    "links": [...],
    "metadata": {...}
  }
]
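The two response shapes above differ in one way. A single-URL response omits the "url" key, while a multi-URL response is a list whose entries each carry one. The following sketch folds both shapes into flat rows. The function names are hypothetical, and the caller is assumed to pass in the URL it requested for the single-URL case.

```python
from typing import Any, Union

Result = dict[str, Any]


def iter_results(response: Union[Result, list[Result]], requested_url: str = "") -> list[Result]:
    """Return a list of per-URL results for either response shape."""
    if isinstance(response, list):
        # Multiple URLs: each element carries its own "url" key.
        return response
    # Single URL: the data object comes back directly, without a "url" key,
    # so attach the URL the caller asked for.
    return [{"url": requested_url, **response}]


def collect_links(response: Union[Result, list[Result]], requested_url: str = "") -> list[dict[str, str]]:
    """Flatten successful results into rows of source URL, link URL, title, description."""
    rows: list[dict[str, str]] = []
    for result in iter_results(response, requested_url):
        if not result.get("success"):
            continue  # skip failed URLs; inspect them separately if needed
        for link in result.get("links", []):
            rows.append({
                "source": result.get("url", ""),
                "url": link.get("url", ""),
                # Titles and descriptions are only included when available.
                "title": link.get("title") or "",
                "description": link.get("description") or "",
            })
    return rows
```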

Examples

Basic Usage

/crawl-map map https://lights.com/
Maps the entire website starting from the homepage with default settings.

Search-Filtered Mapping

/crawl-map find all pages about "Foscarini" on https://lights.com/
Maps the website and filters results to pages containing “Foscarini”.

Include Subdomains

/crawl-map map https://example.com/ including all subdomains
Maps the main domain and all of its subdomains, such as blog.example.com and shop.example.com.

Sitemap Only

/crawl-map get sitemap urls only from https://example.com/
Returns only URLs found in the sitemap.xml file (sitemap: "only").

Skip Sitemap

/crawl-map map https://example.com/ without using sitemap
Discovers pages by following links only, ignoring sitemap.xml.

Limited Results

/crawl-map map https://large-site.com/ limit to 500 urls
Returns at most 500 URLs, even if more exist.

Location-Specific Mapping

/crawl-map map https://global-site.com/ from Germany
Maps the site as if it were accessed from Germany (location country: DE).

Product Search

/crawl-map find all product pages on https://store.com/ searching for "electronics"
Maps the e-commerce site and filters results to pages matching "electronics".

Notes

  • Legacy parameter ignoreSitemap converts to sitemap parameter
  • Links include URL, title, and description when available
  • Search parameter filters results after discovery
  • Map is optimized for speed over completeness
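Sometimes you already have map results and want to narrow them further without sending another request. In that case, a local case-insensitive match over each link's URL, title, and description gives a rough approximation of the search parameter. This is only a sketch. The service's own matching, and any ranking it applies, may differ. The function name filter_links is hypothetical.

```python
def filter_links(links: list[dict[str, str]], term: str) -> list[dict[str, str]]:
    """Keep links whose URL, title, or description contains term (case-insensitive)."""
    needle = term.casefold()

    def matches(link: dict[str, str]) -> bool:
        # Missing titles or descriptions are treated as empty strings.
        text = " ".join(link.get(key) or "" for key in ("url", "title", "description"))
        return needle in text.casefold()

    return [link for link in links if matches(link)]
```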

URL Discovery

  • Map entire websites
  • Follow internal links
  • Discover pages not listed in the sitemap
  • Generate comprehensive lists
  • Identify site structure
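One simple way to get a first view of site structure from a list of mapped URLs is to count pages per top-level path section. The sketch below treats the first path segment as a "section". That choice is an assumption that suits many sites but not all. The function name is hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def section_counts(urls: list[str]) -> Counter[str]:
    """Count URLs per top-level path section ("/" for the homepage)."""
    counts: Counter[str] = Counter()
    for url in urls:
        # Split the path and drop empty pieces from leading/trailing slashes.
        segments = [s for s in urlparse(url).path.split("/") if s]
        counts[("/" + segments[0]) if segments else "/"] += 1
    return counts
```

With the lights.com URLs used elsewhere on this page, the three /pages/... URLs are grouped under "/pages" and the homepage is counted under "/".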

Filtering Options

  • Include/exclude subdomains
  • Search for specific content
  • Filter by page types
  • Ignore sitemaps option
  • Custom crawl parameters
  • Find pages with keywords
  • Filter relevant content
  • Topic-based discovery
  • Brand-specific pages
  • Product searches

Example Commands

Basic Website Map

/crawl-map create full map of "https://example.com/"

Include Subdomains

/crawl-map map "https://company.com/" with all subdomains included

Search-Filtered Map

/crawl-map find all pages mentioning "products" on "https://store.com/"

Ignore Sitemap

/crawl-map map "https://site.com/" without using sitemap data

Multi-URL Mapping

/crawl-map map "https://lights.com/" and "https://another.com/"

Parameters

Required Parameters

  • urls: Array of starting URLs
    • Must include at least one valid URL
    • URLs should be fully qualified (http/https)
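The requirements above (at least one URL, each fully qualified with http or https) can be checked before any request is made. The sketch below uses Python's standard urllib.parse. The function name validate_start_urls is hypothetical.

```python
from urllib.parse import urlparse


def validate_start_urls(urls: list[str]) -> list[str]:
    """Return the URLs unchanged if all are valid; raise ValueError otherwise."""
    if not urls:
        raise ValueError("at least one starting URL is required")
    for url in urls:
        parsed = urlparse(url)
        # A fully qualified URL needs an http(s) scheme and a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a fully qualified http(s) URL: {url!r}")
    return urls
```

For instance, "lights.com" is rejected because it has no scheme, while "https://lights.com/" passes.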

Optional Parameters

  • includeSubdomains: Include subdomain pages (default: false)
  • ignoreSitemap: Skip sitemap.xml parsing (default: false)
  • search: Filter pages containing specific keywords

Response Structure

Success Response

{
  "https://lights.com/": {
    "success": true,
    "links": [
      "https://lights.com",
      "https://lights.com/pages/contact",
      "https://lights.com/pages/about-us",
      "https://lights.com/pages/help"
    ]
  }
}
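This response is shaped differently from the one under Response Format. It is keyed by starting URL, and its links are plain strings rather than objects. If your code may receive either form, one option is to normalize every link into the same dictionary shape. The sketch below does that. The function names are hypothetical. A missing title or description is represented as an empty string.

```python
from typing import Any, Union

Link = Union[str, dict[str, Any]]


def normalize_link(link: Link) -> dict[str, str]:
    """Turn a string or object link into {"url", "title", "description"}."""
    if isinstance(link, str):
        return {"url": link, "title": "", "description": ""}
    return {
        "url": link.get("url", ""),
        "title": link.get("title") or "",
        "description": link.get("description") or "",
    }


def flatten_keyed_response(response: dict[str, dict[str, Any]]) -> dict[str, list[dict[str, str]]]:
    """Map each starting URL to its normalized links, skipping unsuccessful entries."""
    return {
        start: [normalize_link(link) for link in result.get("links", [])]
        for start, result in response.items()
        if result.get("success")
    }
```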

Error Handling

  • success: Boolean indicating operation status
  • links: Array of discovered URLs
  • Error messages for failed operations

Use Cases

Web Scraping Preparation

/crawl-map map target site before scraping specific pages

Content Discovery

/crawl-map find all product pages on e-commerce site

Site Auditing

/crawl-map generate complete site structure for analysis

Competitive Research

/crawl-map discover competitor website structure and pages

SEO Analysis

/crawl-map map site to identify all indexable pages

Subdomain Handling

Include Subdomains (true)

  • Maps blog.example.com
  • Maps shop.example.com
  • Maps support.example.com
  • Comprehensive coverage

Exclude Subdomains (false)

  • Only main domain
  • Faster mapping
  • Focused results
  • Reduced scope
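If you map with subdomains included but later need only part of the result, you can classify each URL by host after the fact. The sketch below counts the bare domain and its www host as the main domain. That is an assumption; adjust it for sites where www is a separate property. The function name is hypothetical.

```python
from urllib.parse import urlparse


def classify_host(url: str, root_domain: str) -> str:
    """Return "main", "subdomain", or "external" for url relative to root_domain."""
    host = (urlparse(url).hostname or "").lower()
    root = root_domain.lower()
    # Assumption: example.com and www.example.com both count as the main domain.
    if host in (root, "www." + root):
        return "main"
    # The leading dot prevents "notexample.com" from matching "example.com".
    if host.endswith("." + root):
        return "subdomain"
    return "external"
```

For example, [u for u in urls if classify_host(u, "example.com") == "main"] keeps only main-domain pages.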

Sitemap Integration

Use Sitemap (sitemap: "include"; legacy ignoreSitemap: false)

  • Leverages sitemap.xml
  • Faster discovery
  • Official page list
  • Coverage depends on how up to date the sitemap is

Ignore Sitemap (sitemap: "skip"; legacy ignoreSitemap: true)

  • Link following only
  • Discovers unlisted pages
  • More thorough crawling
  • Hidden content finding
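To see what each approach contributes, one option is to map the same site twice: once with sitemap: "only" and once with sitemap: "skip". You can then compare the two URL lists. The sketch below does the comparison. It treats URLs that differ only by a trailing slash as the same page, which is a simplifying assumption. The function name is hypothetical.

```python
def compare_discovery(sitemap_urls: list[str], crawled_urls: list[str]) -> dict[str, list[str]]:
    """Split URLs into those found only via sitemap, only via links, or both."""

    def norm(url: str) -> str:
        # Assumption: "https://example.com/" and "https://example.com" are one page.
        return url.rstrip("/")

    from_sitemap = {norm(u) for u in sitemap_urls}
    from_links = {norm(u) for u in crawled_urls}
    return {
        "only_in_sitemap": sorted(from_sitemap - from_links),
        "only_by_links": sorted(from_links - from_sitemap),
        "both": sorted(from_sitemap & from_links),
    }
```

URLs under "only_by_links" are candidates for pages that the sitemap does not list.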

Search Filtering

  • Filter by page content
  • Brand mentions
  • Product names
  • Topic relevance

Search Examples

/crawl-map find "contact" pages on company website
/crawl-map discover "pricing" related pages
/crawl-map locate "support" documentation

Best Practices

  1. Start Small
    • Test with single URLs first
    • Verify results before scaling
    • Check site robots.txt
    • Respect rate limits
  2. Use Filters Wisely
    • Apply search terms for focus
    • Include subdomains when needed
    • Consider sitemap usage
    • Balance speed vs completeness
  3. Plan Your Scraping
    • Map before scraping
    • Identify target pages
    • Prioritize important content
    • Avoid unnecessary pages
  4. Monitor Performance
    • Large sites take time
    • Check for timeouts
    • Handle failed URLs
    • Validate results
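The "Plan Your Scraping" steps can be approximated in code by selecting target pages from a map result before scraping anything. The sketch below keeps unique URLs whose path starts with one of the given prefixes (for example "/products/"), preserving map order, and stops at a cap. The helper name, the prefix-based selection rule, and the de-duplication rule (ignore fragments and trailing slashes) are all assumptions.

```python
from urllib.parse import urlparse


def pick_targets(urls: list[str], prefixes: tuple[str, ...], max_pages: int = 50) -> list[str]:
    """Select unique URLs whose path starts with one of prefixes, preserving order."""
    seen: set[str] = set()
    targets: list[str] = []
    for url in urls:
        # Drop any #fragment and trailing slash so duplicates collapse together.
        key = url.split("#", 1)[0].rstrip("/")
        if key in seen:
            continue
        if urlparse(key).path.startswith(prefixes):
            seen.add(key)
            targets.append(key)
            if len(targets) >= max_pages:
                break  # stop once the scraping budget is reached
    return targets
```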

Common Patterns

E-commerce Mapping

/crawl-map find all product pages on online store

Blog Discovery

/crawl-map map blog subdomain for all articles

Documentation Crawl

/crawl-map discover all help and support pages

Brand Research

/crawl-map find pages mentioning specific brand names

Error Handling

Common Issues

  • Invalid URLs
  • Network timeouts
  • Access restrictions
  • Large site limits

Best Practices

  • Validate URLs before mapping
  • Handle partial failures
  • Check success flags
  • Retry failed operations
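The practices above (check success flags, handle partial failures, retry failed operations) can be combined in a small wrapper. The sketch below works as follows:

  • It retries each URL with exponential backoff.
  • It treats network errors (OSError, which includes TimeoutError and ConnectionError) the same as a result whose success flag is false.
  • It returns the successful results together with the URLs that still failed.

map_fn is a hypothetical stand-in for whatever actually calls the map service, and the "error" key is only added locally by this sketch.

```python
import time
from typing import Any, Callable

MapFn = Callable[[str], dict[str, Any]]


def map_with_retries(
    urls: list[str],
    map_fn: MapFn,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[dict[str, dict[str, Any]], list[str]]:
    """Map each URL, retrying failures with exponential backoff; return (results, failed)."""
    results: dict[str, dict[str, Any]] = {}
    failed: list[str] = []
    for url in urls:
        for attempt in range(attempts):
            try:
                result = map_fn(url)
            except OSError as exc:  # network errors such as timeouts
                result = {"success": False, "error": str(exc)}  # local marker only
            if result.get("success"):
                results[url] = result
                break
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... between tries
        else:
            # The loop never hit "break": every attempt failed.
            failed.append(url)
    return results, failed
```

Passing a custom sleep function keeps tests fast. In real use, a partial failure leaves the good results intact, and the failed list can be logged or retried later.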

Performance Considerations

Speed Factors

  • Site size affects time
  • Subdomain inclusion impacts speed
  • Search filtering adds processing
  • Network conditions matter

Optimization Tips

  • Use specific starting URLs
  • Apply filters early
  • Limit subdomain scope
  • Monitor response times

Tips

  • Always validate starting URLs before mapping
  • Use search parameters to focus on relevant content
  • Include subdomains for comprehensive coverage
  • Check robots.txt and respect crawling guidelines
  • Plan scraping strategy based on discovered URLs