Semantic Search

Hybrid vector + lexical product search with a soft lexical fallback — embeddings from Gemini or local Ollama, and an opt-in pgvector/Postgres path for large catalogs.

Cartwright product search is hybrid: it ranks by vector cosine-similarity and a lexical boost, so a query like "warm waterproof jacket for hiking" finds the right products even when none of those exact words appear in the title. When embeddings aren't ready, it falls back softly to pure lexical search — there is no hard dependency and no regression.

How ranking works

Each product gets an embedding built from its name, brand, category, description, and attributes — one searchable haystack. At query time:

The query is embedded with the same model.
Candidates are scored by cosine similarity (semantic) plus a lexical boost for direct token matches.
Results are ordered by the combined score.

The same path powers both the storefront search API (/api/products/search) and the agent-callable products.search tool, so a human and an AI agent get identically ranked results.

Embedding providers

Embeddings are produced by lib/ai/embeddings.ts, which routes by your AI configuration:

Provider	Model	Notes
Google Gemini	`text-embedding-004` (768-dim)	Default when a Gemini key is set.
Ollama (local)	`nomic-embed-text`	On-device fallback — no cloud calls.
none	—	Returns `null` → search uses lexical-only ranking.

Backfill or re-embed your whole catalog after a model change:

pnpm embeddings:backfill

New and updated products are re-embedded on a best-effort hook, so the index stays current as you edit the catalog.

Embeddings are stored in ProductEmbedding.vectorJson — portable JSON that works on every database driver, including the default Turso/libSQL.

Scaling: pgvector / Postgres (opt-in)

The default ranking runs in TypeScript over the embedding table — perfect for typical catalogs. For large catalogs (tens of thousands of products), an opt-in path pushes the nearest-neighbour search down into Postgres + a pgvector HNSW index, so search scales logarithmically instead of linearly.

Same ranking formula as the TS path → identical results.
Gated behind DATABASE_DRIVER=postgres; the default Turso/SQLite path always fires first.
Dual-write: the portable vectorJson and a native vector(768) column.

See Postgres & Supabase for the full setup (pnpm pgvector:setup, Supabase notes).

How ranking works

Embedding providers

Scaling: pgvector / Postgres (opt-in)

Postgres & Supabase

AI assistant

Local AI / Ollama

On this page