What can you do with it?
Crawl Map provides the easiest way to go from a single URL to a map of the entire website. This is extremely useful when you need to prompt the end user to choose which links to scrape, quickly list the links on a website, find pages related to a specific topic using the search parameter, or identify specific pages before scraping. The map endpoint is optimized for speed and returns URLs with titles and descriptions. It also supports location settings for geo-specific site mapping.

How to use it?
Basic Command Structure
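The command itself depends on how you call the tool; as one hedged sketch, assuming the map is requested by POSTing a JSON body to an HTTP endpoint (the URL, header, and key below are placeholders, not part of this documentation):

```python
import requests

# Placeholder endpoint and key; substitute the real service URL and credentials.
API_URL = "https://api.example.com/map"
API_KEY = "YOUR_API_KEY"

payload = {
    "urls": ["https://lights.com/"],  # required: array of starting URLs
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```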
Parameters
Required:
urls
- Array of starting URLs (e.g., ["https://lights.com/"])
Optional:
search
- Search term to filter URLs containing specific content
limit
- Maximum number of URLs to return (default: 100)
includeSubdomains
- Include subdomains in the map (default: false)
ignoreSitemap
- Skip the sitemap (legacy; converts to the sitemap parameter)
sitemap
- Sitemap usage: "include", "skip", or "only" (default: "include")
location
- Object with:
  - country - ISO country code (US, GB, DE, FR, etc.)
  - languages - Language preferences (e.g., ["en", "es"])
file_links_expire_in_days
- Days until file links expire
file_links_expire_in_minutes
- Alternative to file_links_expire_in_days
Response Format
For a single URL, the response returns the data object directly:
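As an illustration only, a response built from the fields this page mentions (success, plus links carrying url, title, and description) might look like the following; the real schema may differ:

```python
# Illustrative only; field names taken from the Notes and Response Structure
# sections on this page, values invented for the example.
example_response = {
    "success": True,
    "links": [
        {
            "url": "https://lights.com/chandeliers",
            "title": "Chandeliers",
            "description": "Browse our chandelier collection.",
        },
        {
            "url": "https://lights.com/blog",
            "title": "Lighting Blog",
            "description": "Tips and inspiration for home lighting.",
        },
    ],
}
```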
Examples

Basic Usage
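A minimal request body, assuming the JSON shape described under Parameters (only the required urls field is set):

```python
payload = {
    "urls": ["https://lights.com/"],
}
```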
Search-Filtered Mapping
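A hypothetical body using the search parameter to keep only URLs related to a topic; the search value is illustrative:

```python
payload = {
    "urls": ["https://lights.com/"],
    "search": "chandelier",  # keep only URLs matching this term
}
```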
Include Subdomains
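A sketch that enables subdomain coverage; example.com is a placeholder domain:

```python
payload = {
    "urls": ["https://example.com/"],
    "includeSubdomains": True,  # also map blog.example.com, shop.example.com, etc.
}
```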
Sitemap Only
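Assuming the sitemap parameter accepts the string values listed above, a sitemap-only request might look like:

```python
payload = {
    "urls": ["https://example.com/"],
    "sitemap": "only",  # restrict discovery to URLs listed in the sitemap
}
```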
Skip Sitemap
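Likewise, a sketch that skips the sitemap and relies on link discovery instead:

```python
payload = {
    "urls": ["https://example.com/"],
    "sitemap": "skip",  # ignore sitemap.xml and follow links directly
}
```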
Limited Results
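A sketch that caps the number of returned URLs with limit; the value 25 is arbitrary:

```python
payload = {
    "urls": ["https://example.com/"],
    "limit": 25,  # cap the number of returned URLs (default is 100)
}
```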
Location-Specific Mapping
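A sketch using the location object described above; the country and language values are illustrative:

```python
payload = {
    "urls": ["https://example.com/"],
    "location": {
        "country": "DE",            # ISO country code
        "languages": ["de", "en"],  # preferred languages
    },
}
```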
Product Search
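A hypothetical product-focused request combining search and limit; the query is illustrative:

```python
payload = {
    "urls": ["https://lights.com/"],
    "search": "pendant light",  # surface product pages matching the query
    "limit": 50,
}
```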
Notes
- The legacy ignoreSitemap parameter is converted to the sitemap parameter
- Links include URL, title, and description when available
- Search parameter filters results after discovery
- Map is optimized for speed over completeness
URL Discovery
- Map entire websites
- Follow internal links
- Discover hidden pages
- Generate comprehensive lists
- Identify site structure
Filtering Options
- Include/exclude subdomains
- Search for specific content
- Filter by page types
- Ignore sitemaps option
- Custom crawl parameters
Content Search
- Find pages with keywords
- Filter relevant content
- Topic-based discovery
- Brand-specific pages
- Product searches
Example Commands
Basic Website Map
Include Subdomains
Search-Filtered Map
Ignore Sitemap
Multi-URL Mapping
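The single-URL variants mirror the Examples section earlier on this page; for mapping several starting URLs in one request, a hedged sketch (all domains are placeholders):

```python
payload = {
    "urls": [
        "https://example.com/",
        "https://docs.example.com/",
        "https://blog.example.com/",
    ],
    "includeSubdomains": True,
}
```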
Parameters
Required Parameters
- urls: Array of starting URLs
- Must include at least one valid URL
- URLs should be fully qualified (http/https)
Optional Parameters
- includeSubdomains: Include subdomain pages (default: true)
- ignoreSitemap: Skip sitemap.xml parsing (default: false)
- search: Filter pages containing specific keywords
Response Structure
Success Response
Error Handling
- success: Boolean indicating operation status
- links: Array of discovered URLs
- Error messages for failed operations
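Putting the fields above together, a hedged sketch of reading a map response; the error field name is an assumption, so check the actual schema:

```python
def summarize_map_result(result: dict) -> None:
    """Print discovered links, or the failure reason, from a map response."""
    if result.get("success"):
        for link in result.get("links", []):
            # Links may be plain URLs or objects with url/title/description.
            url = link["url"] if isinstance(link, dict) else link
            print(url)
    else:
        # "error" is a guess at the failure field; adjust to the real schema.
        print("Map failed:", result.get("error", "unknown error"))
```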
Use Cases
Web Scraping Preparation
Content Discovery
Site Auditing
Competitive Research
SEO Analysis
Subdomain Handling
Include Subdomains (true)
- Maps blog.example.com
- Maps shop.example.com
- Maps support.example.com
- Comprehensive coverage
Exclude Subdomains (false)
- Only main domain
- Faster mapping
- Focused results
- Reduced scope
Sitemap Integration
Use Sitemap (ignoreSitemap: false)
- Leverages sitemap.xml
- Faster discovery
- Official page list
- Complete coverage
Ignore Sitemap (ignoreSitemap: true)
- Manual link following
- Discovers unlisted pages
- More thorough crawling
- Hidden content finding
Search Filtering
Keyword Search
- Filter by page content
- Brand mentions
- Product names
- Topic relevance
Search Examples
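A few hypothetical search-filtered request bodies along these lines; all search terms and domains are illustrative:

```python
keyword_search = {"urls": ["https://example.com/"], "search": "pricing"}
brand_mentions = {"urls": ["https://example.com/"], "search": "Acme"}
product_pages = {"urls": ["https://lights.com/"], "search": "floor lamp"}
```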
Best Practices
Start Small
- Test with single URLs first
- Verify results before scaling
- Check site robots.txt
- Respect rate limits

Use Filters Wisely
- Apply search terms for focus
- Include subdomains when needed
- Consider sitemap usage
- Balance speed vs completeness

Plan Your Scraping
- Map before scraping
- Identify target pages
- Prioritize important content
- Avoid unnecessary pages

Monitor Performance
- Large sites take time
- Check for timeouts
- Handle failed URLs
- Validate results
Common Patterns
E-commerce Mapping
Blog Discovery
Documentation Crawl
Brand Research
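Hedged sketches of the four patterns named above, expressed as map request bodies; every domain and search term is a placeholder:

```python
ecommerce_map = {"urls": ["https://shop.example.com/"], "search": "product"}
blog_discovery = {"urls": ["https://example.com/"], "search": "blog"}
docs_crawl = {"urls": ["https://docs.example.com/"], "includeSubdomains": False}
brand_research = {"urls": ["https://example.com/"], "search": "Acme"}
```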
Error Handling
Common Issues
- Invalid URLs
- Network timeouts
- Access restrictions
- Large site limits
Best Practices
- Validate URLs before mapping
- Handle partial failures
- Check success flags
- Retry failed operations
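A hedged sketch of the validate-then-retry approach suggested above; the endpoint URL, retry count, and backoff are placeholders:

```python
import time
from urllib.parse import urlparse

import requests

API_URL = "https://api.example.com/map"  # placeholder endpoint


def is_valid_url(url: str) -> bool:
    """Accept only fully qualified http/https URLs."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def map_with_retries(urls: list[str], retries: int = 3) -> dict:
    """Validate starting URLs, then retry the map request on failure."""
    valid = [u for u in urls if is_valid_url(u)]
    if not valid:
        raise ValueError("no valid starting URLs")
    for attempt in range(retries):
        try:
            resp = requests.post(API_URL, json={"urls": valid}, timeout=60)
            resp.raise_for_status()
            result = resp.json()
            if result.get("success"):
                return result
        except requests.RequestException:
            pass  # network timeout or HTTP error; fall through to retry
        time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError("map request failed after retries")
```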
Performance Considerations
Speed Factors
- Site size affects time
- Subdomain inclusion impacts speed
- Search filtering adds processing
- Network conditions matter
Optimization Tips
- Use specific starting URLs
- Apply filters early
- Limit subdomain scope
- Monitor response times
Tips
- Always validate starting URLs before mapping
- Use search parameters to focus on relevant content
- Include subdomains for comprehensive coverage
- Check robots.txt and respect crawling guidelines
- Plan scraping strategy based on discovered URLs