Similarity Search
Compare and rank similar strings using the Levenshtein distance algorithm for fuzzy string matching and text similarity analysis.
Overview
The Similarity Search skill provides functionality for:
- Finding closest string matches using Levenshtein distance
- Ranking results by similarity score
- Smart extraction for path-like structures
- Configurable result filtering and limits
- Pattern exclusion for refined matching
Connection Requirements
This skill uses an internal helper service and doesn’t require external connections.
Basic Usage
// Simple similarity search
const searchRequest = {
  "searchTerm": "example search",
  "items": [
    "example text",
    "sample search",
    "different content",
    "example search match"
  ],
  "maxResults": 3
};
Key Features
String Matching
- Levenshtein Distance: Calculate edit distance between strings
- Ranked Results: Sort matches by distance, so the closest match (lowest distance) comes first
- Configurable Limits: Control maximum number of results returned
- Full String Comparison: Compare entire strings for similarity
Smart Extraction
- Path Processing: Extract relevant parts from path-like structures
- Pattern Exclusion: Exclude specific patterns during extraction
- Custom Splitting: Configure split characters for path parsing
- Intelligent Matching: Focus on meaningful path components
Common Operations
Basic Similarity Search
POST: /similarity-search
{
  "searchTerm": "lorem ipsum text",
  "items": [
    "lorem ipsum sample",
    "different content",
    "ipsum lorem variation",
    "completely unrelated"
  ],
  "maxResults": 3
}
Smart Extraction Search
POST: /similarity-search
{
  "searchTerm": "product catalog",
  "items": [
    "website/pages/product-catalog/main.html",
    "website/pages/user-profile/settings.html",
    "website/pages/product-catalog-v2/index.html",
    "mobile/screens/catalog/product-list.js"
  ],
  "useSmartExtraction": true,
  "excludePatterns": ["html", "js", "v2"],
  "splitValue": "/",
  "maxResults": 5
}
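The extraction rules themselves are not spelled out here, so the following is only a sketch of what path-aware extraction could look like. It assumes that a path is split on splitValue, that each segment is broken into word-like tokens, and that tokens listed in excludePatterns are dropped. The function name extractCandidates and the tokenizing regex are hypothetical, and the sketch does not attempt to reproduce the distances in the sample responses.

```javascript
// Hypothetical extraction step. The service's real rules are not documented
// on this page; this only illustrates the general idea.
function extractCandidates(path, { splitValue = "/", excludePatterns = [] } = {}) {
  const excluded = new Set(excludePatterns);
  return path
    .split(splitValue) // "a/b/c" -> ["a", "b", "c"]
    .map((segment) =>
      segment
        .split(/[.\-_\s]+/) // break a segment into word-like tokens
        .filter((token) => token.length > 0 && !excluded.has(token))
        .join(" ")
    )
    .filter((segment) => segment.length > 0); // drop segments that became empty
}

const candidates = extractCandidates(
  "website/pages/product-catalog-v2/index.html",
  { splitValue: "/", excludePatterns: ["html", "js", "v2"] }
);
console.log(candidates);
// [ 'website', 'pages', 'product catalog', 'index' ]
```

Under these assumptions, each remaining segment would then be compared against searchTerm, which is why "product catalog" can score well against a long path.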
Configuration Options
Required Parameters
- searchTerm: The string you want to find matches for
- items: Array of strings to search through
Optional Parameters
- useSmartExtraction (default: false): Enable path-aware extraction
- excludePatterns (default: []): Patterns to exclude during extraction
- splitValue (default: "/"): Character used to split paths
- maxResults (default: 5): Maximum results to return
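As a rough illustration of how the required and optional parameters fit together, the sketch below checks the two required fields and fills in the documented defaults on the client side. The helper name buildSearchRequest and its error messages are hypothetical. The service may apply its own validation.

```javascript
// Defaults as listed under Optional Parameters.
const DEFAULTS = {
  useSmartExtraction: false,
  excludePatterns: [],
  splitValue: "/",
  maxResults: 5,
};

// Hypothetical helper: validates required fields and fills in defaults.
function buildSearchRequest(params) {
  if (typeof params.searchTerm !== "string") {
    throw new TypeError("searchTerm must be a string");
  }
  if (!Array.isArray(params.items) || params.items.length === 0) {
    throw new TypeError("items must be a non-empty array of strings");
  }
  return {
    ...DEFAULTS,
    ...params, // caller-supplied values override the defaults
    // copy the array so requests never share the DEFAULTS instance
    excludePatterns: [...(params.excludePatterns ?? DEFAULTS.excludePatterns)],
  };
}

const request = buildSearchRequest({
  searchTerm: "product catalog",
  items: ["website/pages/product-catalog/main.html"],
  useSmartExtraction: true,
});
console.log(request.maxResults, request.splitValue); // 5 /
```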
Response Structure
Basic Response
{
  "searchTerm": "example search",
  "results": [
    {
      "item": "sample search",
      "distance": 2
    },
    {
      "item": "example text",
      "distance": 5
    },
    {
      "item": "example search match",
      "distance": 6
    }
  ],
  "config": {
    "useSmartExtraction": false,
    "excludePatterns": [],
    "splitValue": "",
    "maxResults": 3
  }
}
Smart Extraction Response
{
  "searchTerm": "product catalog",
  "results": [
    {
      "item": "website/pages/product-catalog/main.html",
      "distance": 0
    },
    {
      "item": "website/pages/product-catalog-v2/index.html",
      "distance": 2
    },
    {
      "item": "mobile/screens/catalog/product-list.js",
      "distance": 8
    }
  ],
  "config": {
    "useSmartExtraction": true,
    "excludePatterns": ["html", "js", "v2"],
    "splitValue": "/",
    "maxResults": 5
  }
}
Distance Scoring
Levenshtein Distance
- 0: Exact match
- 1-5: Very similar (minor differences)
- 6-10: Moderately similar
- 11-20: Somewhat similar
- 21+: Low similarity
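Interpretation
- Lower distance scores indicate higher similarity
- Distance is the minimum number of single-character edits needed to transform one string into the other
- Edits are insertions, deletions, and substitutions
- The ranges above are a rough guide: distance is not normalized by string length, so a distance of 5 means far more for a 6-character string than for a 60-character path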
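To make the scoring concrete, here is a minimal sketch of the standard dynamic-programming Levenshtein distance and a ranking step on top of it. It is not the service's implementation. The function names levenshtein and rankBySimilarity are illustrative, and comparing by Unicode code point is an assumption.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a, b) {
  // Compare by Unicode code point rather than UTF-16 code unit (assumption).
  const s = Array.from(a);
  const t = Array.from(b);
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution, or a free match
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

// Illustrative ranking: ascending distance, truncated to maxResults.
function rankBySimilarity(searchTerm, items, maxResults = 5) {
  return items
    .map((item) => ({ item, distance: levenshtein(searchTerm, item) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, maxResults);
}

console.log(levenshtein("kitten", "sitting")); // 3

const ranked = rankBySimilarity(
  "example search",
  ["example text", "sample search", "different content", "example search match"],
  3
);
console.log(ranked);
// sample search (2), example text (5), example search match (6)
```

These are plain full-string distances. A service that preprocesses or extracts parts of the input before comparing may report different values.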
Use Cases
File Paths
// Finding similar file paths
"components/Button.tsx" → matches "src/components/Button.tsx"
URL Paths
// Finding similar routes
"api/users/profile" → matches "api/v1/users/profile"
Category Hierarchies
// Finding similar categories
"electronics/phones" → matches "category/electronics/smartphones"
Important Notes
- Case Sensitivity: Matching is case-sensitive by default
- Unicode Support: Handles Unicode characters properly
- Performance: Optimized for moderate-sized string arrays
- Memory Usage: Consider memory limits for very large item arrays
- Smart Extraction: Only use when dealing with path-like structures
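Because matching is case-sensitive, "Catalog" and "catalog" are one edit apart. If that is not what you want, one option is to normalize both searchTerm and items before sending them. The sketch below shows one way to do that. The helper name normalizeForMatching and the particular steps (NFC normalization, whitespace collapsing, lowercasing) are assumptions, not documented behavior of the service.

```javascript
// Hypothetical client-side preprocessing for case-insensitive matching.
function normalizeForMatching(text) {
  return text
    .normalize("NFC") // unify composed and decomposed Unicode forms
    .trim() // drop leading and trailing whitespace
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .toLowerCase();
}

const rawItems = ["  Product   Catalog ", "PRODUCT LIST"];
const normalizedItems = rawItems.map(normalizeForMatching);
console.log(normalizedItems); // [ 'product catalog', 'product list' ]
```

Results will then contain the normalized strings, so keep the original array around if you need to map matches back by index.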
Best Practices
- Appropriate Limits: Set reasonable maxResults to avoid overwhelming responses
- Smart Extraction: Only enable for path-like data structures
- Pattern Exclusion: Use excludePatterns to filter out noise (file extensions, version numbers)
- Preprocessing: Clean input data for better matching results
- Threshold Filtering: Consider filtering results by distance threshold
- Performance: For large datasets, consider batching requests
- Validation: Validate input arrays are not empty before processing
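Two of the practices above, threshold filtering and batching, can be handled on the client. The sketch below assumes the response shape shown under Response Structure. The helpers filterByDistance and chunk, the threshold of 5, and the batch size are all illustrative choices.

```javascript
// Keep only results at or below a chosen distance threshold.
function filterByDistance(response, maxDistance) {
  return {
    ...response,
    results: response.results.filter((r) => r.distance <= maxDistance),
  };
}

// Split a large items array into batches of at most `size` entries.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Sample data in the documented response shape.
const sampleResponse = {
  searchTerm: "example search",
  results: [
    { item: "sample search", distance: 2 },
    { item: "example text", distance: 5 },
    { item: "example search match", distance: 6 },
  ],
};

const close = filterByDistance(sampleResponse, 5);
console.log(close.results.map((r) => r.item)); // [ 'sample search', 'example text' ]
console.log(chunk(["a", "b", "c", "d", "e"], 2)); // [ [ 'a', 'b' ], [ 'c', 'd' ], [ 'e' ] ]
```

When batching, each request returns its own top matches, so merge the per-batch results, sort them by distance, and truncate to maxResults to get the overall ranking.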