Search | Sylphx Documentation

Semantic Search

AI-powered vector similarity with automatic embeddings

Full-Text Search

PostgreSQL tsvector with ranking and typo tolerance

Auto-Embeddings

Documents automatically embedded on index

Metadata Filters

Faceted filtering by category, type, and tags

Overview

Sylphx Search combines two complementary search paradigms in a single API. Full-text search uses PostgreSQL tsvector indexing with trigram-based typo tolerance for fast, precise keyword matching. Semantic search uses a vector extension with HNSW indexes and 1536-dimensional embeddings (text-embedding-3-small via OpenRouter) to find results by meaning rather than by exact keywords. Hybrid mode merges the keyword and semantic rankings with reciprocal rank fusion and is the default search type.

Documents are embedded automatically when they are indexed, so no separate embedding pipeline is needed. All embedding costs are tracked through OpenRouter and billed transparently.

Keyword Search

Best for exact terms, product names, SKUs, and technical jargon. Uses PostgreSQL full-text indexing with relevance ranking.

Semantic Search

Best for natural language queries. Understands synonyms, related concepts, and intent even when exact words do not match.

Hybrid Search

Combines keyword and semantic ranking with reciprocal rank fusion. Recommended for most production use cases.
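
To make the fusion step concrete, here is a minimal sketch of reciprocal rank fusion over two ranked lists of document IDs. It is illustrative only and not Sylphx's actual implementation: the function name `reciprocalRankFusion`, the constant `k = 60` (a commonly used value), and the sample IDs are all assumptions.

```typescript
// Illustrative sketch of reciprocal rank fusion (RRF); not Sylphx's actual code.
// Each input list holds document IDs ordered from most to least relevant.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60, // assumed smoothing constant; the value the service uses is not documented
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>()
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Ranks are 1-based, so the top hit of a list contributes 1 / (k + 1).
      const contribution = 1 / (k + index + 1)
      scores.set(id, (scores.get(id) ?? 0) + contribution)
    })
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  )
}

// Hypothetical result lists from a keyword pass and a semantic pass.
const keywordHits = ['prod_001', 'prod_002', 'prod_003']
const semanticHits = ['prod_003', 'prod_001', 'prod_004']
console.log(reciprocalRankFusion([keywordHits, semanticHits]).map((hit) => hit.id))
```

Documents that rank well in both lists rise to the top, while a document found by only one method still appears, just lower down.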

Quick Start

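Index a document, then search for it:

import { platform } from '@/lib/platform'

// Index a document; the embedding is generated automatically
await platform.search.index({
  namespace: 'docs',
  externalId: 'getting-started',
  title: 'Getting Started with Sylphx',
  content: 'Learn how to set up authentication, analytics, and more in under 5 minutes.',
  metadata: { category: 'guides', author: 'team' },
})

// Search the same namespace; searchType defaults to "hybrid"
const { results } = await platform.search.query({
  query: 'how do I set up authentication?',
  namespace: 'docs',
})
console.log(results[0]?.title, results[0]?.score)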

Indexing Documents

When you index a document, Sylphx stores the content for full-text search and automatically generates a 1536-dimensional vector embedding for semantic search. Each document belongs to a namespace for data isolation.

Single document

await platform.search.index({
  // Namespace isolates documents into separate search domains
  namespace: 'products',

  // Your system's ID — used for upserts and lookups
  externalId: 'prod_wireless_hp',

  // Title is weighted higher in full-text ranking
  title: 'Wireless Noise-Canceling Headphones',

  // Main searchable content
  content: 'Premium over-ear headphones with active noise cancellation, 30-hour battery life, and multipoint Bluetooth connectivity.',

  // URL for linking back to the source
  url: '/products/wireless-hp',

  // Structured metadata for filtering
  metadata: {
    price: 299.99,
    brand: 'AudioPro',
    inStock: true,
  },

  // Pre-defined facets for fast filtering
  category: 'electronics',
  type: 'headphones',
  tags: ['audio', 'wireless', 'noise-canceling'],
})

For large datasets, use batch indexing to index up to 100 documents in a single request:

Batch indexing

// Index multiple documents in one request
await platform.search.batchIndex({
  namespace: 'products',
  documents: [
    {
      externalId: 'prod_001',
      title: 'Wireless Headphones',
      content: 'Premium noise-canceling over-ear headphones.',
      category: 'electronics',
      tags: ['audio', 'wireless'],
    },
    {
      externalId: 'prod_002',
      title: 'Mechanical Keyboard',
      content: 'Compact 75% layout with hot-swappable switches.',
      category: 'electronics',
      tags: ['keyboard', 'mechanical'],
    },
    {
      externalId: 'prod_003',
      title: 'Ultrawide Monitor',
      content: '34-inch curved ultrawide with 165Hz refresh rate.',
      category: 'electronics',
      tags: ['display', 'monitor'],
    },
  ],
})
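
Because a single request accepts at most 100 documents, larger datasets have to be split across several calls. The sketch below shows one way to do that. The helpers `chunk` and `indexInBatches` and the `BatchDoc` type are hypothetical, and the `batchIndex` argument stands in for `platform.search.batchIndex` so that the example stays self-contained.

```typescript
// Hypothetical helper: split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer')
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Minimal document shape for this sketch; the real type accepts more fields.
interface BatchDoc {
  externalId: string
  content: string
  title?: string
}

type BatchIndexFn = (params: {
  namespace: string
  documents: BatchDoc[]
}) => Promise<unknown>

// Send the documents in sequential batches and return the number of requests made.
async function indexInBatches(
  namespace: string,
  documents: BatchDoc[],
  batchIndex: BatchIndexFn, // stand-in for platform.search.batchIndex
  batchSize = 100,
): Promise<number> {
  let requests = 0
  for (const batch of chunk(documents, batchSize)) {
    await batchIndex({ namespace, documents: batch })
    requests += 1
  }
  return requests
}
```

In application code this would be called as `await indexInBatches('products', docs, (params) => platform.search.batchIndex(params))`. Batches are sent one at a time here for simplicity; whether parallel requests are advisable depends on rate limits that this page does not state.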

Upsert Behavior

If a document with the same namespace + externalId already exists, it will be updated in place with fresh content and a re-generated embedding.

Querying

Query documents using keyword, semantic, or hybrid search. Results are ranked by relevance and include optional highlighting of matched terms.

// Full-text keyword search with typo tolerance
const results = await platform.search.query({
  query: 'wireless headpohnes', // typo-tolerant
  searchType: 'keyword',
  namespace: 'products',
  highlight: true,
  limit: 10,
})

// Results ranked by PostgreSQL ts_rank
for (const hit of results.results) {
  console.log(hit.title, hit.score)
  // highlight contains matched terms in <mark> tags
  console.log(hit.highlight)
}
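
To see why a misspelling like 'headpohnes' can still match, the sketch below computes a trigram similarity in the style of PostgreSQL's pg_trgm extension: each word is lowercased, padded with two leading spaces and one trailing space, and split into three-character windows. This is a simplified illustration with hypothetical function names. How Sylphx combines trigram matching with tsvector ranking, and which threshold it applies, is not documented here.

```typescript
// Simplified sketch of pg_trgm-style trigram similarity; illustrative only.
function trigrams(text: string): Set<string> {
  const result = new Set<string>()
  // Treat anything that is not a letter or digit as a word separator.
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0)
  for (const word of words) {
    // Pad each word with two leading spaces and one trailing space.
    const padded = `  ${word} `
    for (let i = 0; i + 3 <= padded.length; i++) {
      result.add(padded.slice(i, i + 3))
    }
  }
  return result
}

// Number of shared trigrams divided by the number of distinct trigrams overall.
function trigramSimilarity(a: string, b: string): number {
  const setA = trigrams(a)
  const setB = trigrams(b)
  if (setA.size === 0 && setB.size === 0) return 0
  let shared = 0
  setA.forEach((gram) => {
    if (setB.has(gram)) shared += 1
  })
  return shared / (setA.size + setB.size - shared)
}

console.log(trigramSimilarity('headphones', 'headpohnes'))
console.log(trigramSimilarity('headphones', 'keyboard'))
```

The transposed spelling shares 7 of 15 distinct trigrams with the correct one, which is well above pg_trgm's default similarity threshold of 0.3, while the unrelated word shares none.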

Embedding Cost

Semantic and hybrid queries generate a vector embedding of your query text. This embedding cost is tracked automatically and visible in your usage dashboard.
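
Once the query has been embedded, semantic search compares that vector with the stored document vectors by cosine similarity. The sketch below shows the calculation on toy three-dimensional vectors in place of real 1536-dimensional embeddings. The function name and the sample values are illustrative assumptions; in the service this comparison runs inside the database through the HNSW index rather than in application code.

```typescript
// Cosine similarity between two equal-length vectors; illustrative sketch.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('vectors must have the same length')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Toy vectors standing in for a query embedding and two document embeddings.
const queryVector = [0.9, 0.1, 0.0]
const closeDocument = [0.8, 0.2, 0.1]
const distantDocument = [0.0, 0.1, 0.9]
console.log(cosineSimilarity(queryVector, closeDocument))
console.log(cosineSimilarity(queryVector, distantDocument))
```

A value near 1 means the vectors point in almost the same direction, so the first document would rank above the second for this query.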

Metadata Filtering

Narrow search results using structured facet fields and metadata filters. Facets are indexed for fast filtering, while metadata supports arbitrary key-value lookups.

Filtered search

// Combine search with facet filters
const results = await platform.search.query({
  query: 'wireless audio',
  searchType: 'hybrid',
  namespace: 'products',

  // Filter by indexed facet fields (fast)
  filters: {
    category: 'electronics',
    type: 'headphones',
    tags: ['wireless'],
  },

  // Filter by arbitrary metadata (flexible)
  metadata: {
    inStock: true,
    brand: 'AudioPro',
  },

  limit: 10,
})

Filter Type	Field	Description
facet	category	Primary classification (e.g., "electronics", "clothing")
facet	type	Secondary classification (e.g., "headphones", "shirts")
facet	tags	Array of labels for multi-value filtering
metadata	{ key: value }	Arbitrary JSON metadata (slower than facets, but fully flexible)

Facets vs Metadata

Use category, type, and tags facets for frequently-filtered fields. They have dedicated database indexes and are significantly faster than metadata filtering.

Search Types

keyword

PostgreSQL full-text search with tsvector/tsquery. Includes trigram-based typo tolerance and relevance ranking.

Fastest, exact matching

semantic

Vector cosine similarity with HNSW indexing. Understands synonyms, intent, and related concepts.

Best for natural language

hybrid

Combines keyword and semantic results with reciprocal rank fusion for the best overall relevance.

Recommended for production

API Reference

Method	Description
search.index(doc)	Index a single document with automatic embedding generation
search.batchIndex(docs)	Batch index up to 100 documents in a single request
search.query(params)	Search documents by keyword, semantic similarity, or hybrid
search.delete(params)	Delete a document by internal ID or external ID
search.getStats()	Get index statistics including document counts and namespace breakdown
search.listDocuments(params)	List indexed documents with pagination and namespace filtering

Index Parameters

Property	Type	Default	Description
`namespace`	`string`	"default"	Data isolation namespace (e.g., "products", "docs")
`externalId`	`string`	-	Your system's document ID for upserts and lookups
`title`	`string`	-	Document title (weighted higher in full-text ranking)
`content`	`string`	-	Required. Main document content for indexing and embedding
`url`	`string`	-	URL or path for linking back to the source document
`metadata`	`Record<string, unknown>`	-	Arbitrary key-value metadata for filtering
`category`	`string`	-	Facet field for primary classification
`type`	`string`	-	Facet field for secondary classification
`tags`	`string[]`	-	Facet field for multi-value label filtering
`language`	`string`	"english"	Language for full-text search stemming
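
The index parameters can be written down as a TypeScript type for use in your own code. The sketch below is derived from the table only: the names `IndexParams` and `withIndexDefaults` are hypothetical, and the types actually exported by the SDK may differ.

```typescript
// Hypothetical type derived from the Index Parameters table;
// the SDK's own exported types may differ in name and shape.
interface IndexParams {
  namespace?: string // defaults to "default"
  externalId?: string
  title?: string
  content: string // the only required field
  url?: string
  metadata?: Record<string, unknown>
  category?: string
  type?: string
  tags?: string[]
  language?: string // defaults to "english"
}

// Fill in the documented defaults to see the effective values of a request.
function withIndexDefaults(
  params: IndexParams,
): IndexParams & { namespace: string; language: string } {
  return {
    ...params,
    namespace: params.namespace ?? 'default',
    language: params.language ?? 'english',
  }
}

console.log(withIndexDefaults({ content: 'Compact 75% layout with hot-swappable switches.' }))
```

A helper like this is only useful for logging or testing, since the service applies the same defaults when the fields are omitted.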

Query Parameters

Property	Type	Default	Description
`query`	`string`	-	Required. The search query text
`searchType`	`"keyword" | "semantic" | "hybrid"`	"hybrid"	Search algorithm to use
`namespace`	`string`	"default"	Namespace to search within
`filters`	`{ category?, type?, tags? }`	-	Facet-based filters for fast pre-filtering
`metadata`	`Record<string, unknown>`	-	Arbitrary metadata filters
`highlight`	`boolean`	false	Return matched terms wrapped in <mark> tags
`limit`	`number`	10	Maximum number of results to return
`offset`	`number`	0	Offset for pagination
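
The `limit` and `offset` parameters together support page-by-page retrieval. The sketch below collects results across pages until a short page signals the end. The names `fetchAllHits`, `Hit`, `QueryPage`, and `QueryFn` are hypothetical, the response shape is reduced to the fields used in the examples on this page, and `runQuery` stands in for a call to `platform.search.query`.

```typescript
// Minimal shapes for this sketch; the real response may carry more fields.
interface Hit {
  title: string
  score: number
}
interface QueryPage {
  results: Hit[]
}
type QueryFn = (params: {
  query: string
  limit: number
  offset: number
}) => Promise<QueryPage>

// Collect up to `maxResults` hits by advancing `offset` one page at a time.
async function fetchAllHits(
  runQuery: QueryFn, // stand-in for (params) => platform.search.query(params)
  query: string,
  pageSize = 10,
  maxResults = 100,
): Promise<Hit[]> {
  const hits: Hit[] = []
  for (let offset = 0; hits.length < maxResults; offset += pageSize) {
    const page = await runQuery({ query, limit: pageSize, offset })
    hits.push(...page.results)
    // A page shorter than the requested size means there is nothing further.
    if (page.results.length < pageSize) break
  }
  return hits.slice(0, maxResults)
}
```

The `maxResults` cap keeps a broad query from walking the whole index. Each semantic or hybrid page is a separate query, so whether the query embedding is regenerated per page is worth checking against your usage dashboard.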

Best Practices

Structure Documents Well

Provide a clear title, descriptive content, and relevant metadata for every document to maximize search quality

Use Batch Indexing

Index documents in batches of up to 100 for significantly better throughput and lower embedding costs

Leverage Namespaces

Separate documents into namespaces (e.g., "products", "docs", "faq") for isolated, faster searches

Choose the Right Search Type

Use "keyword" for exact matches, "semantic" for meaning-based queries, or "hybrid" for the best of both

Filter with Facets

Use category, type, and tags facets for fast pre-filtering instead of scanning metadata at query time

Monitor Search Analytics

Track top queries, click-through rates, and zero-result queries to continuously improve relevance
