What can you do with it?
Crawl Map provides the easiest way to go from a single URL to a map of an entire website. This is useful when you need to prompt the end user to choose which links to scrape, quickly list all the links on a website, scrape pages related to a specific topic using the search parameter, or scrape only specific pages of a website.
How to use it?
Basic Command Structure
/crawl-map [urls] [includeSubdomains] [ignoreSitemap] [search]
Parameters
Required:
- urls - Array of starting URLs to map; must include at least one fully qualified (http/https) URL
Optional:
- includeSubdomains - Boolean to include subdomains in the map (default: true)
- ignoreSitemap - Boolean to skip using the sitemap for discovery (default: false)
- search - Search term to filter results for specific content (e.g., "Foscarini")
The command returns:
{
"https://lights.com/": {
"success": true,
"links": [
"https://lights.com",
"https://lights.com/pages/contact",
"https://lights.com/pages/about-us",
"https://lights.com/pages/help",
"https://lights.com/pages/return-policy",
"https://lights.com/pages/warranty-policy"
]
}
}
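As a minimal sketch, assuming this JSON has already been parsed into a Python dict (for example via json.loads), the result can be consumed like this:

# Minimal sketch: consuming a crawl-map result that has been parsed into
# a Python dict keyed by starting URL (mirroring the example response above).
result = {
    "https://lights.com/": {
        "success": True,
        "links": [
            "https://lights.com",
            "https://lights.com/pages/contact",
            "https://lights.com/pages/about-us",
        ],
    }
}

for start_url, entry in result.items():
    if entry.get("success"):
        print(f"{start_url}: {len(entry['links'])} links discovered")
    else:
        print(f"{start_url}: mapping failed")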
Examples
Basic Usage
/crawl-map
urls: ["https://lights.com/"]
Maps the entire website starting from the homepage with default settings.
Advanced Usage
/crawl-map
urls: ["https://lights.com/"]
includeSubdomains: true
ignoreSitemap: false
search: "Foscarini"
Maps the website including subdomains, uses the sitemap, and filters results to pages containing “Foscarini”.
Specific Use Case
/crawl-map
urls: ["https://example.com/"]
includeSubdomains: false
search: "products"
Maps only the main domain without subdomains and filters for pages containing “products”.
Notes
The tool uses an internal endpoint, SCRAPER_FC_URL + 'scraperfc/map', for processing requests. The response includes a success flag and an array of discovered links for each starting URL.
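A hypothetical sketch of calling that endpoint directly is shown below; the HTTP method, the JSON payload shape, and the SCRAPER_FC_URL environment variable are assumptions inferred from the parameter names above, not a confirmed API contract.

# Hypothetical sketch only: method, payload shape, and env var are assumptions.
import os
import requests

base = os.environ["SCRAPER_FC_URL"]  # assumed to end with a trailing slash

payload = {
    "urls": ["https://lights.com/"],
    "includeSubdomains": True,  # default: true
    "ignoreSitemap": False,     # default: false
    "search": "Foscarini",      # optional content filter
}

resp = requests.post(base + "scraperfc/map", json=payload, timeout=120)
resp.raise_for_status()
result = resp.json()  # dict keyed by starting URL, as shown above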
URL Discovery
- Map entire websites
- Follow internal links
- Discover hidden pages
- Generate comprehensive lists
- Identify site structure
Filtering Options
- Include/exclude subdomains
- Search for specific content
- Filter by page types
- Ignore sitemaps option
- Custom crawl parameters
Content Search
- Find pages with keywords
- Filter relevant content
- Topic-based discovery
- Brand-specific pages
- Product searches
Example Commands
Basic Website Map
/crawl-map create full map of "https://example.com/"
Include Subdomains
/crawl-map map "https://company.com/" with all subdomains included
Search-Filtered Map
/crawl-map find all pages mentioning "products" on "https://store.com/"
Ignore Sitemap
/crawl-map map "https://site.com/" without using sitemap data
Multi-URL Mapping
/crawl-map generate maps for multiple starting URLs
Parameters
Required Parameters
- urls: Array of starting URLs
  - Must include at least one valid URL
  - URLs should be fully qualified (http/https)
Optional Parameters
- includeSubdomains: Include subdomain pages (default: true)
- ignoreSitemap: Skip sitemap.xml parsing (default: false)
- search: Filter pages containing specific keywords
Response Structure
Success Response
{
"https://lights.com/": {
"success": true,
"links": [
"https://lights.com",
"https://lights.com/pages/contact",
"https://lights.com/pages/about-us",
"https://lights.com/pages/help"
]
}
}
Error Handling
- success: Boolean indicating operation status
- links: Array of discovered URLs
- Error messages for failed operations
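Since each entry carries its own success flag, partial failures can be separated from usable results. A small sketch follows; the "error" key name for failed entries is an assumption, as the source does not document the failure payload.

# Sketch: splitting a crawl-map response into successes and failures.
# The "error" key for failed entries is an assumption.
def split_results(result: dict) -> tuple[dict, dict]:
    succeeded, failed = {}, {}
    for start_url, entry in result.items():
        if entry.get("success"):
            succeeded[start_url] = entry.get("links", [])
        else:
            failed[start_url] = entry.get("error", "unknown error")
    return succeeded, failed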
Use Cases
Web Scraping Preparation
/crawl-map map target site before scraping specific pages
Content Discovery
/crawl-map find all product pages on e-commerce site
Site Auditing
/crawl-map generate complete site structure for analysis
Competitive Research
/crawl-map discover competitor website structure and pages
SEO Analysis
/crawl-map map site to identify all indexable pages
Subdomain Handling
Include Subdomains (true)
- Maps blog.example.com
- Maps shop.example.com
- Maps support.example.com
- Comprehensive coverage
Exclude Subdomains (false)
- Only main domain
- Faster mapping
- Focused results
- Reduced scope
Sitemap Integration
Use Sitemap (ignoreSitemap: false)
- Leverages sitemap.xml
- Faster discovery
- Official page list
- Complete coverage
Ignore Sitemap (ignoreSitemap: true)
- Manual link following
- Discovers unlisted pages
- More thorough crawling
- Hidden content finding
Search Filtering
Keyword Search
- Filter by page content
- Brand mentions
- Product names
- Topic relevance
Search Examples
/crawl-map find "contact" pages on company website
/crawl-map discover "pricing" related pages
/crawl-map locate "support" documentation
Best Practices
- Start Small
  - Test with single URLs first
  - Verify results before scaling
  - Check the site's robots.txt (see the sketch after this list)
  - Respect rate limits
- Use Filters Wisely
  - Apply search terms for focus
  - Include subdomains when needed
  - Consider sitemap usage
  - Balance speed vs. completeness
- Plan Your Scraping
  - Map before scraping
  - Identify target pages
  - Prioritize important content
  - Avoid unnecessary pages
- Monitor Performance
  - Large sites take time
  - Check for timeouts
  - Handle failed URLs
  - Validate results
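For the robots.txt check mentioned above, the standard library is enough; a minimal sketch:

# Sketch: checking robots.txt before mapping, using only the standard library.
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

def can_crawl(start_url: str, user_agent: str = "*") -> bool:
    parser = RobotFileParser()
    parser.set_url(urljoin(start_url, "/robots.txt"))
    parser.read()
    return parser.can_fetch(user_agent, start_url)

if can_crawl("https://lights.com/"):
    print("robots.txt permits crawling the start URL")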
Common Patterns
E-commerce Mapping
/crawl-map find all product pages on online store
Blog Discovery
/crawl-map map blog subdomain for all articles
Documentation Crawl
/crawl-map discover all help and support pages
Brand Research
/crawl-map find pages mentioning specific brand names
Error Handling
Common Issues
- Invalid URLs
- Network timeouts
- Access restrictions
- Large site limits
Best Practices
- Validate URLs before mapping
- Handle partial failures
- Check success flags
- Retry failed operations
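The first and last points can be combined into a small retry loop. In this sketch, map_urls is whatever callable you use to issue the /crawl-map request; it is passed in rather than defined here, since the transport is implementation-specific.

# Sketch: validate URLs up front, then retry only the entries that failed.
# map_urls is any callable that takes a list of URLs and returns the
# response structure shown above; it is supplied by the caller.
import time
from typing import Callable
from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

def map_with_retries(map_urls: Callable[[list], dict], urls: list, attempts: int = 3) -> dict:
    pending = [u for u in urls if is_valid_url(u)]
    results: dict = {}
    for attempt in range(attempts):
        if not pending:
            break
        response = map_urls(pending)
        results.update(response)
        pending = [u for u, e in response.items() if not e.get("success")]
        if pending:
            time.sleep(2 ** attempt)  # simple exponential backoff between retries
    return results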
Speed Factors
- Site size affects time
- Subdomain inclusion impacts speed
- Search filtering adds processing
- Network conditions matter
Optimization Tips
- Use specific starting URLs
- Apply filters early
- Limit subdomain scope
- Monitor response times
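A simple way to act on "monitor response times" is to time whatever call issues the map request; a minimal sketch:

# Sketch: timing any mapping call to keep an eye on response times.
import time

def timed(fn, *args, **kwargs):
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"mapping took {time.perf_counter() - start:.1f}s")
    return result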
Tips
- Always validate starting URLs before mapping
- Use search parameters to focus on relevant content
- Include subdomains for comprehensive coverage
- Check robots.txt and respect crawling guidelines
- Plan scraping strategy based on discovered URLs