In the Weeds: Semantic Search at Scale with Supabase + pgvector

joey-io · 6 min read

A production-grade guide to building semantic search with Supabase and pgvector — from initial setup through indexing strategies, query optimization, and the hybrid search patterns that actually work at scale.

Why pgvector Won

The vector database market exploded in 2023-2024. Pinecone, Weaviate, Qdrant, Milvus, Chroma — dozens of purpose-built solutions for storing and querying embeddings. So why are so many production systems settling on pgvector — a PostgreSQL extension that stores vectors in the same database as everything else?

Three reasons: simplicity, co-location, and transactions.

Simplicity: no new infrastructure to manage. If you already have Postgres (and you probably do), you already have a vector database.

Co-location: your vectors live next to your metadata. No cross-system joins. No sync lag. No eventual consistency between your data and your embeddings.

Transactions: embeddings update atomically with their source data. Delete a document, and its embedding disappears in the same transaction. No orphaned vectors. No stale results.

Supabase makes this even more accessible by providing a managed Postgres instance with pgvector pre-installed, plus edge functions for embedding generation. Let me show you how to build production semantic search on this stack.

Schema Design

Start with a schema that co-locates content and embeddings:

```sql
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Content table with embedded vectors
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  embedding VECTOR(1536), -- OpenAI ada-002 dimensions
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Index for vector similarity search
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```

Key decisions in this schema:
- VECTOR(1536) matches OpenAI's ada-002 embedding model. If you use a different model, adjust the dimension.
- ivfflat index for approximate nearest neighbor search. Fast at scale, with tunable accuracy.
- metadata JSONB for filtering without additional tables.
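
To build intuition for that last decision, here is a rough Python analogue of the JSONB containment operator (@>) used for metadata filtering. This is a simplification for flat, scalar-valued objects (the real operator also handles nested objects and arrays), and the names are illustrative:

```python
def jsonb_contains(doc_metadata: dict, filter_metadata: dict) -> bool:
    """Rough analogue of Postgres JSONB @> for flat, scalar-valued objects.
    Real @> also matches inside nested objects and arrays."""
    return all(doc_metadata.get(key) == value
               for key, value in filter_metadata.items())

# A document tagged by category and language:
doc = {"category": "guide", "lang": "en", "author": "joey-io"}

print(jsonb_contains(doc, {"category": "guide"}))                # True
print(jsonb_contains(doc, {"category": "guide", "lang": "de"}))  # False
```

An empty filter matches everything, which is exactly why the search function below can default filter_metadata to '{}'.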

Embedding Generation

Generate embeddings at insert time using a Supabase Edge Function:

```typescript
// supabase/functions/embed-document/index.ts
import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'

serve(async (req) => {
  const { id, content } = await req.json()

  // Generate embedding
  const embeddingResponse = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer ' + Deno.env.get('OPENAI_API_KEY'),
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'text-embedding-ada-002',
      input: content.substring(0, 8000) // Token limit safety
    })
  })

  const { data } = await embeddingResponse.json()
  const embedding = data[0].embedding

  // Store in database
  const supabase = createClient(
    Deno.env.get('SUPABASE_URL')!,
    Deno.env.get('SUPABASE_SERVICE_KEY')!
  )

  await supabase
    .from('documents')
    .update({ embedding })
    .eq('id', id)

  return new Response(JSON.stringify({ success: true }))
})
```

The Search Function

Expose semantic search through a Postgres function (callable via Supabase RPC):

```sql
CREATE OR REPLACE FUNCTION search_documents(
  query_embedding VECTOR(1536),
  match_threshold FLOAT DEFAULT 0.78,
  match_count INT DEFAULT 10,
  filter_metadata JSONB DEFAULT '{}'
)
RETURNS TABLE (
  id UUID,
  title TEXT,
  content TEXT,
  similarity FLOAT,
  metadata JSONB
)
LANGUAGE plpgsql
AS $fn$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.title,
    d.content,
    1 - (d.embedding <=> query_embedding) AS similarity,
    d.metadata
  FROM documents d
  WHERE
    1 - (d.embedding <=> query_embedding) > match_threshold
    AND (filter_metadata = '{}' OR d.metadata @> filter_metadata)
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$fn$;
```

The <=> operator is cosine distance. Subtracting from 1 gives cosine similarity (higher is more similar). The match_threshold prevents returning irrelevant results — anything below 0.78 similarity is probably not what the user wants.
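
To make the distance-versus-similarity relationship concrete, here is a small pure-Python sketch (not part of the Supabase stack) of the same math the function performs with 1 - (embedding <=> query_embedding):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1 minus the cosine distance that
    pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.0, 0.0]
similar = [2.0, 0.0, 0.0]      # same direction, larger magnitude
related = [1.0, 1.0, 0.0]      # partial overlap
unrelated = [0.0, 0.0, 1.0]    # orthogonal

print(cosine_similarity(query, similar))    # 1.0 -> passes a 0.78 threshold
print(cosine_similarity(query, related))    # ~0.707 -> filtered out at 0.78
print(cosine_similarity(query, unrelated))  # 0.0
```

Note that cosine similarity ignores magnitude entirely, which is why the scaled-up vector still scores a perfect 1.0.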

Hybrid Search: Semantic + Full-Text

Pure vector search has a weakness: it can miss exact matches. If someone searches for a specific product name or technical term, semantic similarity might rank a conceptually similar but factually different document higher.

The solution is hybrid search — combining vector similarity with traditional full-text search:

```sql
CREATE OR REPLACE FUNCTION hybrid_search(
  query_text TEXT,
  query_embedding VECTOR(1536),
  match_count INT DEFAULT 10,
  semantic_weight FLOAT DEFAULT 0.7,
  text_weight FLOAT DEFAULT 0.3
)
RETURNS TABLE (
  id UUID,
  title TEXT,
  content TEXT,
  combined_score FLOAT
)
LANGUAGE plpgsql
AS $fn$
BEGIN
  RETURN QUERY
  WITH semantic_results AS (
    SELECT d.id, d.title, d.content,
           1 - (d.embedding <=> query_embedding) AS semantic_score
    FROM documents d
    ORDER BY d.embedding <=> query_embedding
    LIMIT match_count * 3
  ),
  text_results AS (
    SELECT d.id, d.title, d.content,
           ts_rank(to_tsvector('english', d.content), plainto_tsquery('english', query_text)) AS text_score
    FROM documents d
    WHERE to_tsvector('english', d.content) @@ plainto_tsquery('english', query_text)
    LIMIT match_count * 3
  ),
  combined AS (
    SELECT
      COALESCE(s.id, t.id) AS id,
      COALESCE(s.title, t.title) AS title,
      COALESCE(s.content, t.content) AS content,
      (COALESCE(s.semantic_score, 0) * semantic_weight +
       COALESCE(t.text_score, 0) * text_weight) AS combined_score
    FROM semantic_results s
    FULL OUTER JOIN text_results t ON s.id = t.id
  )
  SELECT c.id, c.title, c.content, c.combined_score
  FROM combined c
  ORDER BY c.combined_score DESC
  LIMIT match_count;
END;
$fn$;
```

The semantic_weight and text_weight parameters let you tune the balance. For general knowledge bases, 0.7/0.3 works well. For technical documentation with many exact terms, try 0.5/0.5.
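
The scoring logic of the combined CTE is easy to sanity-check outside the database. A minimal Python sketch (names are illustrative, not part of any API):

```python
def combine_scores(semantic_score, text_score,
                   semantic_weight=0.7, text_weight=0.3):
    """Weighted blend mirroring the combined CTE: a score missing
    from one result set counts as 0, just like COALESCE."""
    return ((semantic_score or 0.0) * semantic_weight
            + (text_score or 0.0) * text_weight)

# Found only by vector search:
only_semantic = combine_scores(0.85, None)   # 0.85 * 0.7
# Found by both, so the exact-match signal lifts it above the first:
both = combine_scores(0.80, 0.40)            # 0.80 * 0.7 + 0.40 * 0.3
```

One caveat worth knowing: ts_rank scores are not normalized to [0, 1] the way cosine similarity is, so in practice you may want to normalize or cap the text score before blending; otherwise the weights do not mean quite what they appear to.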

Scaling: Index Tuning

As your document count grows, the ivfflat index needs tuning:

```sql
-- For 10,000 documents: lists = 100
-- For 100,000 documents: lists = 316 (sqrt of count)
-- For 1,000,000 documents: lists = 1000

-- Rebuild with appropriate lists count
DROP INDEX documents_embedding_idx;
CREATE INDEX documents_embedding_idx ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 316);

-- Tune probes for accuracy vs speed
SET ivfflat.probes = 10; -- Default is 1. Higher = more accurate, slower.
```

The rule of thumb: lists should be approximately sqrt(row_count). Probes should be approximately sqrt(lists). These settings give you 90%+ recall with good performance.
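
The rule of thumb is simple enough to automate. A small helper (hypothetical, not a pgvector API) that turns a row count into starting values for lists and probes:

```python
import math

def ivfflat_params(row_count: int) -> tuple[int, int]:
    """Starting points per the rule of thumb:
    lists ~ sqrt(rows), probes ~ sqrt(lists)."""
    lists = max(1, round(math.sqrt(row_count)))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes

print(ivfflat_params(10_000))     # (100, 10)
print(ivfflat_params(100_000))    # (316, 18)
print(ivfflat_params(1_000_000))  # (1000, 32)
```

Treat these as starting points: measure recall on real queries before and after tuning, since the right probes value depends on how clustered your embeddings are.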

For maximum performance at large scale, consider HNSW indexes instead:

```sql
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```

HNSW is faster at query time but slower to build and uses more memory. For read-heavy workloads (which most search systems are), it is the better choice above 100,000 documents.

Chunking Strategy

Documents longer than the embedding model's context window need chunking. The strategy matters enormously:

```python
def chunk_document(content: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Chunk with overlap for context preservation."""
    chunks = []
    start = 0
    while start < len(content):
        end = start + chunk_size
        # Try to break at a sentence boundary
        if end < len(content):
            last_period = content[start:end].rfind('.')
            if last_period > chunk_size * 0.7:
                end = start + last_period + 1
        chunks.append(content[start:end])
        start = end - overlap
    return chunks
```

Each chunk gets its own embedding and its own row, but shares a document_id foreign key for grouping results.
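
One consequence of chunking: a strong document can match on several of its chunks, crowding out other documents in the top-k results. A common fix is to collapse chunk-level hits to the best chunk per document before displaying them. A sketch, assuming each hit is a dict with hypothetical document_id and similarity keys:

```python
def best_per_document(hits):
    """Keep only the highest-similarity chunk per parent document,
    then re-sort by that score. Hit shape is assumed, not a Supabase API."""
    best = {}
    for hit in hits:
        doc_id = hit["document_id"]
        if doc_id not in best or hit["similarity"] > best[doc_id]["similarity"]:
            best[doc_id] = hit
    return sorted(best.values(), key=lambda h: h["similarity"], reverse=True)

hits = [
    {"document_id": "doc-a", "chunk_index": 2, "similarity": 0.91},
    {"document_id": "doc-a", "chunk_index": 5, "similarity": 0.83},
    {"document_id": "doc-b", "chunk_index": 0, "similarity": 0.86},
]
print([h["document_id"] for h in best_per_document(hits)])  # ['doc-a', 'doc-b']
```

You can also do this deduplication in SQL with DISTINCT ON (document_id); the client-side version is shown here because it keeps the search function generic.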

Integration with the Stack

This Supabase + pgvector setup integrates cleanly with the tools in our ecosystem:

  • n8n can trigger re-embedding when documents change
  • Apify MCP can scrape content and feed it directly into the pipeline
  • Context7 can use the search function for documentation retrieval
  • Neon MCP provides an alternative serverless Postgres if you need scale-to-zero
  • txtai can provide the embedding generation locally if you want to avoid API calls

The beauty of building on Postgres is that everything else in your stack already knows how to talk to it. No new protocols. No new authentication. No new monitoring. Just SQL with vectors.

Production Checklist

Before going live:
1. Index tuning (lists and probes appropriate for your data size)
2. Embedding dimension matches your model exactly
3. Threshold tuned for your data (start at 0.78, adjust based on relevance testing)
4. Chunking strategy tested with real documents
5. Hybrid search weights tuned for your query patterns
6. Monitoring on query latency (alert if p95 exceeds 200ms)
7. Embedding generation error handling (retry logic, dead letter queue)
8. Periodic re-embedding as models improve
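
Item 7 deserves a concrete shape. A minimal retry-with-backoff wrapper, sketched in Python with hypothetical names (with_retries, dead_letter), not any Supabase or OpenAI API:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0, dead_letter=None):
    """Call fn, retrying with exponential backoff (base_delay, 2x, 4x, ...).
    After the final failure, hand the exception to a dead-letter hook
    and re-raise so the caller still sees the error."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts - 1:
                if dead_letter is not None:
                    dead_letter(exc)
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Wrap the embedding API call in something like with_retries(lambda: embed(doc)), and route dead-lettered failures to a table you can inspect and replay; a document with a missing embedding is invisible to search, so silent failures are the thing to design against.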

Semantic search is not magic. It is engineering. But engineering with the right tools — and Supabase + pgvector gives you remarkably capable tools with remarkably little complexity.
