Semantic Search
AI-powered vector similarity with automatic embeddings
Full-Text Search
PostgreSQL tsvector with ranking and typo tolerance
Auto-Embeddings
Documents automatically embedded on index
Metadata Filters
Faceted filtering by category, type, and tags
Overview
Sylphx Search combines two complementary search paradigms into a single unified API. Full-text search uses PostgreSQL tsvector indexing with trigram-based typo tolerance for fast, precise keyword matching. Semantic search uses vector extensions with HNSW indexes and 1536-dimensional embeddings (text-embedding-3-small via OpenRouter) to find results based on meaning, not just keywords. Hybrid mode combines both approaches with reciprocal rank fusion for the highest relevance.
Documents are embedded automatically when indexed, so no manual embedding pipeline is needed. All embedding costs are tracked through OpenRouter and billed transparently.
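To make "meaning, not just keywords" concrete, here is a minimal sketch of cosine similarity, the measure used to compare embedding vectors. It is illustrative only: the function name and the toy 3-dimensional vectors are assumptions, and in Sylphx the comparison runs server-side against 1536-dimensional embeddings.

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; higher means the vectors point in a more similar direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention chosen for this sketch: a zero vector is similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for real embeddings (hypothetical data).
const queryVector = [1, 0, 1];
const docVectors: Record<string, number[]> = {
  a: [1, 0, 1],
  b: [0, 1, 0],
  c: [1, 1, 1],
};

// Rank documents by similarity to the query, most similar first.
const ranked = Object.keys(docVectors)
  .map((id) => ({ id, score: cosineSimilarity(queryVector, docVectors[id]) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked.map((r) => r.id)); // [ 'a', 'c', 'b' ]
```

An HNSW index avoids comparing the query against every stored vector, which is why this brute-force loop is only a mental model and not how a production lookup would work.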
Keyword Search
Best for exact terms, product names, SKUs, and technical jargon. Uses PostgreSQL full-text indexing with relevance ranking.
Semantic Search
Best for natural language queries. Understands synonyms, related concepts, and intent even when exact words do not match.
Hybrid Search
Combines keyword and semantic ranking with reciprocal rank fusion. Recommended for most production use cases.
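Reciprocal rank fusion can be sketched in a few lines. The version below uses the common formulation, where each document scores the sum of 1 / (k + rank) over every list it appears in. The constant k = 60 and the function name are assumptions for illustration; this page does not state the exact variant or constant Sylphx uses.

```typescript
// Reciprocal rank fusion over several ranked lists of document IDs.
// Each list is ordered best-first; ranks are 1-based.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60, // assumed smoothing constant, not taken from the Sylphx docs
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical result lists from a keyword pass and a semantic pass.
const keywordHits = ["prod_001", "prod_003", "prod_002"];
const semanticHits = ["prod_002", "prod_001", "prod_004"];

const fused = reciprocalRankFusion([keywordHits, semanticHits]);
console.log(fused.map((hit) => hit.id));
// [ 'prod_001', 'prod_002', 'prod_003', 'prod_004' ]
```

Documents that rank well in both lists rise to the top, while documents found by only one method still appear further down. This is why the fusion step needs only rank positions and not comparable raw scores.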
Quick Start
Index a document and search it in three steps:
import { platform } from '@/lib/platform'
// Index a document; the embedding is generated automatically
await platform.search.index({
  namespace: 'docs',
  externalId: 'getting-started',
  title: 'Getting Started with Sylphx',
  content: 'Learn how to set up authentication, analytics, and more in under 5 minutes.',
  metadata: { category: 'guides', author: 'team' },
})
Indexing Documents
When you index a document, Sylphx stores the content for full-text search and automatically generates a 1536-dimensional vector embedding for semantic search. Each document belongs to a namespace for data isolation.
await platform.search.index({
  // Namespace isolates documents into separate search domains
  namespace: 'products',
  // Your system's ID, used for upserts and lookups
  externalId: 'prod_wireless_hp',
  // Title is weighted higher in full-text ranking
  title: 'Wireless Noise-Canceling Headphones',
  // Main searchable content
  content: 'Premium over-ear headphones with active noise cancellation, 30-hour battery life, and multipoint Bluetooth connectivity.',
  // URL for linking back to the source
  url: '/products/wireless-hp',
  // Structured metadata for filtering
  metadata: {
    price: 299.99,
    brand: 'AudioPro',
    inStock: true,
  },
  // Pre-defined facets for fast filtering
  category: 'electronics',
  type: 'headphones',
  tags: ['audio', 'wireless', 'noise-canceling'],
})
For large datasets, use batch indexing to index up to 100 documents in a single request:
// Index multiple documents in one request
await platform.search.batchIndex({
  namespace: 'products',
  documents: [
    {
      externalId: 'prod_001',
      title: 'Wireless Headphones',
      content: 'Premium noise-canceling over-ear headphones.',
      category: 'electronics',
      tags: ['audio', 'wireless'],
    },
    {
      externalId: 'prod_002',
      title: 'Mechanical Keyboard',
      content: 'Compact 75% layout with hot-swappable switches.',
      category: 'electronics',
      tags: ['keyboard', 'mechanical'],
    },
    {
      externalId: 'prod_003',
      title: 'Ultrawide Monitor',
      content: '34-inch curved ultrawide with 165Hz refresh rate.',
      category: 'electronics',
      tags: ['display', 'monitor'],
    },
  ],
})
Upsert Behavior
If a document with the same namespace + externalId already exists, it is updated in place with fresh content and a regenerated embedding.
Querying
Query documents using keyword, semantic, or hybrid search. Results are ranked by relevance and include optional highlighting of matched terms.
// Full-text keyword search with typo tolerance
const results = await platform.search.query({
  query: 'wireless headpohnes', // typo-tolerant
  searchType: 'keyword',
  namespace: 'products',
  highlight: true,
  limit: 10,
})
// Results ranked by PostgreSQL ts_rank
for (const hit of results.results) {
  console.log(hit.title, hit.score)
  // highlight contains matched terms in <mark> tags
  console.log(hit.highlight)
}
Embedding Cost
Metadata Filtering
Narrow search results using structured facet fields and metadata filters. Facets are indexed for fast filtering, while metadata supports arbitrary key-value lookups.
// Combine search with facet filters
const results = await platform.search.query({
  query: 'wireless audio',
  searchType: 'hybrid',
  namespace: 'products',
  // Filter by indexed facet fields (fast)
  filters: {
    category: 'electronics',
    type: 'headphones',
    tags: ['wireless'],
  },
  // Filter by arbitrary metadata (flexible)
  metadata: {
    inStock: true,
    brand: 'AudioPro',
  },
  limit: 10,
})
| Filter Type | Field | Description |
|---|---|---|
| facet | category | Primary classification (e.g., "electronics", "clothing") |
| facet | type | Secondary classification (e.g., "headphones", "shirts") |
| facet | tags | Array of labels for multi-value filtering |
| metadata | { key: value } | Arbitrary JSON metadata (slower than facets, but fully flexible) |
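As a mental model for how these filters narrow a result set, the sketch below applies them to documents in memory. It is not the server's implementation, and the matching rules are assumptions: category and type are compared for equality, a tags filter requires every listed tag to be present, and metadata keys are compared with strict equality. The type and function names are hypothetical.

```typescript
// Hypothetical shapes mirroring the facet and metadata fields described above.
interface SearchDoc {
  category?: string;
  type?: string;
  tags?: string[];
  metadata?: Record<string, unknown>;
}

interface FacetFilters {
  category?: string;
  type?: string;
  tags?: string[];
}

// Returns true when the document satisfies every supplied filter.
// Assumption: tags use "all of" semantics; the real service may differ.
function matchesFilters(
  doc: SearchDoc,
  filters: FacetFilters = {},
  metadata: Record<string, unknown> = {},
): boolean {
  if (filters.category !== undefined && doc.category !== filters.category) return false;
  if (filters.type !== undefined && doc.type !== filters.type) return false;

  const docTags = doc.tags ?? [];
  if (filters.tags && !filters.tags.every((tag) => docTags.indexOf(tag) !== -1)) return false;

  const docMeta = doc.metadata ?? {};
  return Object.keys(metadata).every((key) => docMeta[key] === metadata[key]);
}

const headphones: SearchDoc = {
  category: "electronics",
  type: "headphones",
  tags: ["audio", "wireless", "noise-canceling"],
  metadata: { price: 299.99, brand: "AudioPro", inStock: true },
};

console.log(matchesFilters(headphones, { category: "electronics", tags: ["wireless"] }, { inStock: true })); // true
console.log(matchesFilters(headphones, { type: "keyboard" })); // false
```

A predicate like this can be handy in unit tests that stub out the search client, as long as the assumed semantics are checked against real query behavior.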
Facets vs Metadata
Use the category, type, and tags facets for frequently filtered fields. They have dedicated database indexes and are significantly faster than metadata filtering.
Search Types
keyword
PostgreSQL full-text search with tsvector/tsquery. Includes trigram-based typo tolerance and relevance ranking.
Fastest, exact matching
semantic
Vector cosine similarity with HNSW indexing. Understands synonyms, intent, and related concepts.
Best for natural language
hybrid
Combines keyword and semantic results with reciprocal rank fusion for the best overall relevance.
Recommended for production
API Reference
| Method | Description |
|---|---|
| search.index(doc) | Index a single document with automatic embedding generation |
| search.batchIndex(docs) | Batch index up to 100 documents in a single request |
| search.query(params) | Search documents by keyword, semantic similarity, or hybrid |
| search.delete(params) | Delete a document by internal ID or external ID |
| search.getStats() | Get index statistics including document counts and namespace breakdown |
| search.listDocuments(params) | List indexed documents with pagination and namespace filtering |
Index Parameters
| Property | Type | Description |
|---|---|---|
| namespace | string (default: "default") | Data isolation namespace (e.g., "products", "docs") |
| externalId | string | Your system's document ID for upserts and lookups |
| title | string | Document title (weighted higher in full-text ranking) |
| content (required) | string | Main document content for indexing and embedding |
| url | string | URL or path for linking back to the source document |
| metadata | Record<string, unknown> | Arbitrary key-value metadata for filtering |
| category | string | Facet field for primary classification |
| type | string | Facet field for secondary classification |
| tags | string[] | Facet field for multi-value label filtering |
| language | string (default: "english") | Language for full-text search stemming |
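The table above maps naturally onto a TypeScript type. The sketch below is a hypothetical local mirror of those parameters, not a type exported by the SDK, with a helper that applies the documented defaults before a call. The names IndexParams and withIndexDefaults are assumptions.

```typescript
// Hypothetical local mirror of the index parameters listed above.
interface IndexParams {
  namespace?: string;
  externalId?: string;
  title?: string;
  content: string; // the only required field
  url?: string;
  metadata?: Record<string, unknown>;
  category?: string;
  type?: string;
  tags?: string[];
  language?: string;
}

// Fills in the documented defaults and rejects empty content early.
function withIndexDefaults(
  params: IndexParams,
): IndexParams & { namespace: string; language: string } {
  if (params.content.trim() === "") {
    throw new Error("content is required and must not be empty");
  }
  return {
    ...params,
    namespace: params.namespace ?? "default",
    language: params.language ?? "english",
  };
}

const prepared = withIndexDefaults({
  externalId: "getting-started",
  title: "Getting Started with Sylphx",
  content: "Learn how to set up authentication, analytics, and more.",
});

console.log(prepared.namespace, prepared.language); // default english
```

Resolving defaults on the client is optional, since the service applies them anyway, but it makes the effective namespace visible in logs and tests.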
Query Parameters
| Property | Type | Description |
|---|---|---|
| query (required) | string | The search query text |
| searchType | "keyword" \| "semantic" \| "hybrid" (default: "hybrid") | Search algorithm to use |
| namespace | string (default: "default") | Namespace to search within |
| filters | { category?, type?, tags? } | Facet-based filters for fast pre-filtering |
| metadata | Record<string, unknown> | Arbitrary metadata filters |
| highlight | boolean (default: false) | Return matched terms wrapped in <mark> tags |
| limit | number (default: 10) | Maximum number of results to return |
| offset | number (default: 0) | Offset for pagination |
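Because pagination is expressed as limit and offset, a small helper can translate a 1-based page number into those two parameters. This is a minimal sketch; the helper name toPageParams is an assumption and is not part of the SDK.

```typescript
// Converts a 1-based page number into the limit/offset pair used by query().
function toPageParams(
  page: number,
  pageSize: number = 10, // matches the documented default for limit
): { limit: number; offset: number } {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be a positive integer");
  }
  return { limit: pageSize, offset: (page - 1) * pageSize };
}

console.log(toPageParams(1)); // { limit: 10, offset: 0 }
console.log(toPageParams(3, 25)); // { limit: 25, offset: 50 }
```

The result can be spread into a query call, for example `platform.search.query({ query: 'wireless audio', ...toPageParams(2) })`.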
Best Practices
Structure Documents Well
Provide a clear title, descriptive content, and relevant metadata for every document to maximize search quality.
Use Batch Indexing
Index documents in batches of up to 100 for significantly better throughput and lower embedding costs.
Leverage Namespaces
Separate documents into namespaces (e.g., "products", "docs", "faq") for isolated, faster searches.
Choose the Right Search Type
Use "keyword" for exact matches, "semantic" for meaning-based queries, or "hybrid" for the best of both.
Filter with Facets
Use category, type, and tags facets for fast pre-filtering instead of scanning metadata at query time.
Monitor Search Analytics
Track top queries, click-through rates, and zero-result queries to continuously improve relevance.
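The batch-indexing advice above implies splitting larger datasets into groups of at most 100 documents. A minimal sketch of that split follows; the chunk helper and the sample data are hypothetical, and only the limit of 100 comes from this page.

```typescript
// Documented per-request limit for batch indexing.
const MAX_BATCH_SIZE = 100;

// Splits an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number = MAX_BATCH_SIZE): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical dataset of 250 documents.
const allDocs = Array.from({ length: 250 }, (_, i) => ({
  externalId: `prod_${i}`,
  content: `Product ${i}`,
}));

const batches = chunk(allDocs);
console.log(batches.map((batch) => batch.length)); // [ 100, 100, 50 ]
```

Each batch can then be passed as the `documents` array of a `platform.search.batchIndex` call, one request per batch.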