What can you do with it?

Crawl Map provides the easiest way to go from a single URL to a map of the entire website. This is useful when you need to prompt the end user to choose which links to scrape, need a quick list of the links on a website, want to scrape only the pages related to a specific topic (using the search parameter), or only need to scrape specific pages of a website.

How to use it?

Basic Command Structure

/crawl-map [urls] [includeSubdomains] [ignoreSitemap] [search]

Parameters

Required:

  • urls - Array of starting URLs (must include at least one fully qualified http/https URL)

Optional:

  • includeSubdomains - Boolean to include subdomains in the map (default: true)

  • ignoreSitemap - Boolean to skip using the sitemap for discovery (default: false)

  • search - Search term to filter results for specific content (e.g., “Foscarini”)

Response Format

The command returns:

{
  "https://lights.com/": {
    "success": true,
    "links": [
      "https://lights.com",
      "https://lights.com/pages/contact",
      "https://lights.com/pages/about-us",
      "https://lights.com/pages/help",
      "https://lights.com/pages/return-policy",
      "https://lights.com/pages/warranty-policy"
    ]
  }
}
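
For downstream processing, here is a minimal Python sketch that flattens the per-URL response into a single list of links (it assumes the response has already been parsed into a dict shaped like the example above; all names are illustrative):

# Flatten the per-start-URL response into one list of discovered links.
response = {
    "https://lights.com/": {
        "success": True,
        "links": [
            "https://lights.com",
            "https://lights.com/pages/contact",
        ],
    }
}

all_links = []
for start_url, result in response.items():
    if result.get("success"):
        all_links.extend(result.get("links", []))

print(f"Discovered {len(all_links)} links")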

Examples

Basic Usage

/crawl-map
urls: ["https://lights.com/"]

Maps the entire website starting from the homepage with default settings.

Advanced Usage

/crawl-map
urls: ["https://lights.com/"]
includeSubdomains: true
ignoreSitemap: false
search: "Foscarini"

Maps the website including subdomains, uses the sitemap, and filters results to pages containing “Foscarini”.

Specific Use Case

/crawl-map
urls: ["https://example.com/"]
includeSubdomains: false
search: "products"

Maps only the main domain without subdomains and filters for pages containing “products”.

Notes

The tool sends requests to an internal endpoint, SCRAPER_FC_URL + 'scraperfc/map'. The response includes a success flag and an array of discovered links for each starting URL.
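
As a rough illustration only, a request to that endpoint might look like the Python sketch below. The HTTP method, payload shape, and exact URL joining are assumptions inferred from the documented parameters, not confirmed endpoint behavior:

import os
import requests

# Assumption: SCRAPER_FC_URL ends with a trailing slash.
base_url = os.environ["SCRAPER_FC_URL"]
endpoint = base_url + "scraperfc/map"

payload = {
    "urls": ["https://lights.com/"],
    "includeSubdomains": True,   # default: true
    "ignoreSitemap": False,      # default: false
    "search": "Foscarini",       # optional content filter
}

resp = requests.post(endpoint, json=payload, timeout=120)  # POST is an assumption
resp.raise_for_status()
site_map = resp.json()  # {start_url: {"success": bool, "links": [...]}}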

URL Discovery

  • Map entire websites
  • Follow internal links
  • Discover hidden pages
  • Generate comprehensive lists
  • Identify site structure

Filtering Options

  • Include or exclude subdomains (includeSubdomains)
  • Use or ignore the sitemap (ignoreSitemap)
  • Search for pages containing specific keywords (search): brand mentions, topic-based discovery, and product pages

Example Commands

Basic Website Map

/crawl-map create full map of "https://example.com/"

Include Subdomains

/crawl-map map "https://company.com/" with all subdomains included

Search-Filtered Map

/crawl-map find all pages mentioning "products" on "https://store.com/"

Ignore Sitemap

/crawl-map map "https://site.com/" without using sitemap data

Multi-URL Mapping

/crawl-map generate maps for multiple starting URLs

Parameters

Required Parameters

  • urls: Array of starting URLs
    • Must include at least one valid URL
    • URLs should be fully qualified (http/https); see the validation sketch after this list

Optional Parameters

  • includeSubdomains: Include subdomain pages (default: true)
  • ignoreSitemap: Skip sitemap.xml parsing (default: false)
  • search: Filter pages containing specific keywords
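
A small pre-flight check implied by these rules, as a Python sketch (the helper name is illustrative):

from urllib.parse import urlparse

def validate_start_urls(urls):
    # Require at least one URL, each fully qualified with http/https.
    if not urls:
        raise ValueError("urls must include at least one URL")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a fully qualified http/https URL: {url}")

validate_start_urls(["https://lights.com/"])  # passes
# validate_start_urls(["lights.com"])         # would raise ValueError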

Response Structure

Success Response

{
  "https://lights.com/": {
    "success": true,
    "links": [
      "https://lights.com",
      "https://lights.com/pages/contact",
      "https://lights.com/pages/about-us",
      "https://lights.com/pages/help"
    ]
  }
}

Error Handling

  • success: Boolean indicating operation status
  • links: Array of discovered URLs
  • Error messages for failed operations
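
One way these fields might be consumed, as a sketch (the "error" field name is an assumption; check the actual payload of a failed entry):

# Split a mapping response into successes and failures.
site_map = {
    "https://lights.com/": {"success": True, "links": ["https://lights.com"]},
    "https://unreachable.example/": {"success": False, "error": "timeout"},  # illustrative
}

succeeded, failed = {}, []
for start_url, result in site_map.items():
    if result.get("success"):
        succeeded[start_url] = result.get("links", [])
    else:
        failed.append((start_url, result.get("error", "unknown error")))

print("ok:", list(succeeded))
print("failed:", failed)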

Use Cases

Web Scraping Preparation

/crawl-map map target site before scraping specific pages

Content Discovery

/crawl-map find all product pages on e-commerce site

Site Auditing

/crawl-map generate complete site structure for analysis

Competitive Research

/crawl-map discover competitor website structure and pages

SEO Analysis

/crawl-map map site to identify all indexable pages

Subdomain Handling

Include Subdomains (true)

  • Maps blog.example.com
  • Maps shop.example.com
  • Maps support.example.com
  • Comprehensive coverage

Exclude Subdomains (false)

  • Only main domain
  • Faster mapping
  • Focused results
  • Reduced scope

Sitemap Integration

Use Sitemap (ignoreSitemap: false)

  • Leverages sitemap.xml
  • Faster discovery
  • Official page list
  • May miss pages not listed in the sitemap

Ignore Sitemap (ignoreSitemap: true)

  • Manual link following
  • Discovers unlisted pages
  • More thorough crawling
  • Finds hidden content

Search Filtering

  • Filter by page content
  • Brand mentions
  • Product names
  • Topic relevance

Search Examples

/crawl-map find "contact" pages on company website
/crawl-map discover "pricing" related pages
/crawl-map locate "support" documentation

Best Practices

  1. Start Small

    • Test with single URLs first
    • Verify results before scaling
    • Check the site's robots.txt
    • Respect rate limits

  2. Use Filters Wisely

    • Apply search terms for focus
    • Include subdomains when needed
    • Consider sitemap usage
    • Balance speed vs completeness

  3. Plan Your Scraping

    • Map before scraping
    • Identify target pages
    • Prioritize important content
    • Avoid unnecessary pages

  4. Monitor Performance

    • Large sites take time
    • Check for timeouts
    • Handle failed URLs
    • Validate results
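
For the performance points in item 4, here is a sketch of timing a mapping request and surfacing timeouts explicitly (endpoint and payload as in the Notes sketch; the 300-second budget is arbitrary):

import time
import requests

endpoint = "https://internal.example/scraperfc/map"  # placeholder, see Notes
payload = {"urls": ["https://lights.com/"]}

start = time.monotonic()
try:
    resp = requests.post(endpoint, json=payload, timeout=300)
    resp.raise_for_status()
except requests.Timeout:
    print("mapping timed out; large sites may need a bigger time budget")
except requests.RequestException as exc:
    print(f"mapping failed: {exc}")
else:
    print(f"mapped in {time.monotonic() - start:.1f}s")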

Common Patterns

E-commerce Mapping

/crawl-map find all product pages on online store

Blog Discovery

/crawl-map map blog subdomain for all articles

Documentation Crawl

/crawl-map discover all help and support pages

Brand Research

/crawl-map find pages mentioning specific brand names

Error Handling

Common Issues

  • Invalid URLs
  • Network timeouts
  • Access restrictions
  • Large site limits

Best Practices

  • Validate URLs before mapping
  • Handle partial failures
  • Check success flags
  • Retry failed operations
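
A sketch of the retry pattern suggested above, assuming a map_urls(urls) helper that wraps the request shown in the Notes section and returns the documented response shape (attempt counts and backoff are arbitrary):

import time

def retry_failed(map_urls, urls, attempts=3, backoff=2.0):
    # Re-map only the start URLs whose entries came back unsuccessful.
    pending = list(urls)
    results = {}
    for attempt in range(attempts):
        if not pending:
            break
        response = map_urls(pending)             # assumed helper, see Notes
        pending = []
        for url, result in response.items():
            if result.get("success"):
                results[url] = result
            else:
                pending.append(url)
        if pending:
            time.sleep(backoff * (attempt + 1))  # simple linear backoff
    return results, pending                      # pending = URLs still failing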

Performance Considerations

Speed Factors

  • Site size affects time
  • Subdomain inclusion impacts speed
  • Search filtering adds processing
  • Network conditions matter

Optimization Tips

  • Use specific starting URLs
  • Apply filters early
  • Limit subdomain scope
  • Monitor response times

Tips

  • Always validate starting URLs before mapping
  • Use search parameters to focus on relevant content
  • Include subdomains for comprehensive coverage
  • Check robots.txt and respect crawling guidelines
  • Plan scraping strategy based on discovered URLs