You Don’t Need a Vector DB: Running hybrid search in production

This is the last part of the series on using Postgres for hybrid search. Here I want to focus on what changed when the same idea had to work under real load.

The hard part is not storing vectors. The hard part is keeping search fast when the data grows, background jobs keep running, and more users hit the database at the same time.

I still like keeping it in Postgres when the numbers allow it. The reason is simple: the data is already there, the team already knows how to run it, and I don't add a second search system too early.

When exact search was too slow

Without an ANN index, the vector query is still just an ordered scan:

SELECT id
FROM knowledge_base
ORDER BY embedding <=> :query_embedding
LIMIT 10;

That was perfectly acceptable on a small table. It became a different story once the row count moved into the millions. At that point I cared less about exact nearest neighbor as an ideal and more about results that were good enough with response time that stayed steady.

So I switched the vector side to approximate nearest neighbor. That trade felt reasonable for this kind of search. These were support, docs, and internal search workloads. I didn't need a perfect answer. I needed stable response time and results that were still clearly relevant.

Choosing between IVFFlat and HNSW

For pgvector, the real decision was usually IVFFlat versus HNSW.

IVFFlat was the lighter option. It built faster, used less memory, and was easier to introduce when I just wanted an ANN index in place. The downside was recall, plus a bit more need for tuning.

HNSW was the better default when read performance mattered more. It took longer to build and cost more memory, but recall was usually stronger and queries were usually faster once the index was warm.

This was the basic HNSW index shape:

CREATE INDEX ON knowledge_base USING hnsw (embedding vector_cosine_ops);

And this was the lighter variant:

CREATE INDEX ON knowledge_base USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

In production I also cared about build timing and lock behavior, but that's separate from the main idea here. I tried not to create IVFFlat too early on tiny datasets because it works better once there is enough data behind it. And whenever recall dropped after adding ANN, I went straight to hnsw.ef_search or ivfflat.probes rather than guessing.

The trigram query also needed an index

It's easy to focus on the vector side and forget that text search also needs an index. pg_trgm without a proper index works, but not in a way you want under real traffic.

The useful index for me was GIN with gin_trgm_ops:

CREATE INDEX ON knowledge_base USING gin (content gin_trgm_ops);

A normal B-tree does not help much once the query is fuzzy or pattern-based. If title mattered a lot, I indexed that as well. In a couple of cases I ended up building a search_text field because the raw text was too messy.

GIN was not free. It took space and it made writes heavier. Still, if the trigram query is part of every search request, paying that cost once is better than paying for slow text scans on every query.

The memory numbers changed the conversation

The moment I wrote down the rough storage math, the production discussion got more concrete. A 1536-dimensional vector is roughly 1536 * 4 bytes, so around 6 KB per embedding before row overhead and before ANN index overhead. At five million rows, that is already around 30 GB of raw embeddings.

That doesn't mean the design is wrong. It just means memory stops being a default setting you ignore. I usually started shared_buffers at something like a quarter of machine RAM and then measured what happened. What I cared about was whether the hot part of the data and the useful parts of the indexes could stay in memory often enough.

work_mem mattered too, because the hybrid query still sorts and joins. If it is too low, the merge step spills to disk. If it is too high for the whole database, concurrency hurts you somewhere else. I usually preferred a conservative default and a local override for the search path when I needed it.

SET LOCAL work_mem = '128MB';

Then I checked EXPLAIN (ANALYZE, BUFFERS) and adjusted from there. I don't think there is a more elegant way around that part.

The boring fixes helped most

Some of the biggest wins were simple. Filtering early mattered. If the user was already in one product area, one language, or one tenant, cutting the candidate set before the expensive ranking work helped both the vector and trigram queries.

Connection pooling mattered too. These hybrid queries are not absurdly expensive, but they do hold a connection longer than simple lookups. Under traffic that difference is enough to make backend counts annoying. So I still used PgBouncer and watched pool saturation closely.

Where I would stop pushing this pattern

I wouldn't keep forcing this into Postgres forever. If the workload becomes very write-heavy, if embeddings are reprocessed all the time, or if the data grows past what the box can carry comfortably, a dedicated search system can be the cleaner choice. The same is true if the team wants ranking logic that is too complex for SQL to stay pleasant.

But for the sort of search I was dealing with, Postgres held up longer than I expected. The useful lesson for me was not that vector databases are bad. It was that I didn't need one nearly as early as I thought I did.