RAG hybrid search for non‑technical readers: semantic vs keyword, RRF, and weighted fusion
If you are building or studying Retrieval‑Augmented Generation (RAG), you will hear three ideas over and over: semantic search (meaning-based), keyword search (exact words), and hybrid search (using both). This article explains them in plain language, then introduces two common ways to merge results: Reciprocal Rank Fusion (RRF) and weighted score sums. Technical details come from official vendor documentation and peer‑reviewed research—no invented formulas.
A simple story: two librarians
Imagine two librarians helping you find pages in a large textbook.
- Keyword librarian: Looks for the exact words you typed (and close variants). Great for names, codes, and phrases that must match precisely.
- Meaning librarian: Reads your question in everyday language and finds passages that mean the same thing, even if the wording differs. Great when users paraphrase or do not know the jargon used in the document.
Modern RAG systems often consult both librarians, then combine their lists. That is the everyday idea behind hybrid search.
What is “semantic search” in RAG?
In software, semantic search usually means: turn text into vectors (long lists of numbers) using an embedding model, then retrieve chunks whose vectors are closest to the question vector. Close vectors ≈ similar meaning in embedding space.
Strength: Handles synonyms and paraphrases (“car won’t start” vs “engine fails to turn over”).
Weakness: Can miss rare exact tokens where the “right” answer is defined by a precise string (policy IDs, error codes, legal subsection labels). That is not a moral failure of the model—it is a mismatch between “similar meaning” and “must match this exact token.”
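To make "close vectors ≈ similar meaning" concrete, here is a minimal sketch of the distance measure most vector databases use: cosine similarity. The three-number "embeddings" below are made up for illustration; real embedding models emit hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for this example.
query = [0.9, 0.1, 0.3]            # "car won't start"
chunk_paraphrase = [0.85, 0.15, 0.35]  # "engine fails to turn over"
chunk_unrelated = [0.1, 0.9, 0.2]      # something off-topic

print(cosine_similarity(query, chunk_paraphrase))  # close to 1.0
print(cosine_similarity(query, chunk_unrelated))   # much lower
```

Semantic retrieval is essentially "return the chunks whose vectors score highest against the query vector" computed at scale with specialized indexes.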
What is keyword / lexical search?
Keyword search (often powered by BM25-style scoring in search engines) ranks passages by how well they match the query terms—like a very good Ctrl+F with ranking. It cares about words and statistics, not learned “meaning” vectors.
Strength: Strong for exact matches, rare words, SKUs, and named entities when those strings appear in the text.
Weakness: Can miss relevant passages if the author used different words than the user (synonyms, multilingual phrasing, or conceptual overlap without shared tokens).
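For the curious, here is a toy sketch of BM25-style scoring in plain Python. It follows the standard BM25 formula (term frequency, inverse document frequency, and length normalization) but is a teaching aid, not a search engine; the documents and parameters are invented for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms, BM25-style."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter()
    for d in docs:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue  # term absent: contributes nothing
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "error code e401 unauthorized request".split(),
    "the request failed because the user lacked permission".split(),
]
print(bm25_scores(["e401"], docs))  # only the first doc scores above zero
```

Notice the flip side of the keyword librarian: the second document is clearly relevant to a human, but it never mentions "e401", so lexical scoring gives it nothing.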
Why hybrid search is the default for serious RAG
Most production RAG tutorials and databases document hybrid patterns because real users mix “concept questions” and “lookup questions” in the same chat. Vendors describe combining dense (semantic) and sparse (lexical) signals—for example Pinecone’s hybrid search guide and Weaviate’s hybrid search concept page.
Hybrid does not mean “twice as expensive forever.” Teams typically retrieve a small shortlist from each method, then fuse into one ranked list before sending the top chunks to the language model.
The tricky part: merging two ranked lists
Vector search produces similarity scores in embedding space; BM25 produces scores on a different scale entirely. They are not like two marks out of 100 from the same exam, so you cannot simply trust a naive "add score A + score B" unless the system has first put both on a comparable scale.
Engineers therefore use either rank-based fusion (use positions in each list, not raw scores) or normalized weighted sums (put scores on a common scale, then combine with weights).
Reciprocal Rank Fusion (RRF): ranks in, fused list out
RRF was introduced in information retrieval research by Cormack, Clarke, and Buettcher at SIGIR 2009 as a simple way to merge multiple ranked lists without training data. Google Research hosts the paper record at research.google/pubs/pub36196.
The intuition: being high on multiple independent rankings is stronger evidence than a single huge score from one ranking. RRF adds contributions from each list using rank (1st, 2nd, 3rd…), not the underlying score units.
Elasticsearch documents the same structure for production search: for each document d, accumulate 1 / (k + rank(d)) across child retrievers, where k is a constant (their default rank_constant is 60). See Reciprocal rank fusion | Elasticsearch Reference.
Elastic also notes a practical advantage: child retrievers can use different relevance signals (for example kNN vector retrieval plus a standard text query) and RRF still produces a single ranking—without requiring those signals to be on the same numeric scale.
For learners: think of RRF as “collect points for showing up near the top of multiple expert opinions,” instead of “average two incompatible grades.”
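The whole algorithm fits in a few lines. This sketch follows the structure described above: each document collects 1 / (k + rank) from every list it appears in, with k = 60 mirroring Elasticsearch's default rank_constant. The document IDs are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs; each list is best-first, ranks start at 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest accumulated score first.
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_a", "doc_b", "doc_c"]  # best-first from the meaning librarian
keyword = ["doc_b", "doc_d", "doc_a"]   # best-first from the keyword librarian
print(reciprocal_rank_fusion([semantic, keyword]))
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

doc_b wins even though it tops neither list outright: appearing near the top of both rankings beats a single first-place finish, which is exactly the "multiple expert opinions" intuition.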
Weighted sums: when scores are normalized
Another family of approaches normalizes dense and sparse scores into a comparable range, then computes a weighted combination—for example α × semantic + (1 − α) × keyword, with α between 0 and 1. The exact mechanics depend on the database. Weaviate’s hybrid documentation discusses balancing vector vs keyword contribution in hybrid queries (see their hybrid search docs linked above).
When teams pick weighted fusion: They often already trust the engine’s normalization, want an interpretable knob (more “semantic” vs more “keyword”), or need a single score for downstream rerankers.
When teams pick RRF: They want a robust merge when scores are not comparable, or they combine more than two retrievers (for example vector + BM25 + a metadata filter ranker) without hand‑tuning scale factors—Elastic’s RRF retriever requires two or more child retrievers.
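As a sketch of the weighted-sum family, the code below min-max normalizes each retriever's scores into [0, 1] before combining them with α. This is one common normalization choice, not the exact mechanics of any particular database; the scores and document IDs are invented for illustration.

```python
def min_max(scores):
    """Rescale a dict of scores into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def weighted_fusion(semantic_scores, keyword_scores, alpha=0.7):
    """alpha leans toward semantic: 1.0 = pure vector, 0.0 = pure keyword."""
    sem = min_max(semantic_scores)
    kw = min_max(keyword_scores)
    docs = set(sem) | set(kw)  # a doc missing from one list contributes 0 there
    fused = {d: alpha * sem.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0)
             for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

semantic_scores = {"doc_a": 0.91, "doc_b": 0.88, "doc_c": 0.40}  # cosine-like
keyword_scores = {"doc_b": 12.3, "doc_d": 9.1}                   # BM25-like scale
print(weighted_fusion(semantic_scores, keyword_scores, alpha=0.7))
```

The normalization step is the whole point: without it, BM25 scores in the tens would drown out cosine similarities below 1.0 no matter what α you chose.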
How this fits a typical RAG pipeline
- Ingest: Clean and chunk documents (headings, tables, and citations need thoughtful chunking—bad chunks defeat good retrieval).
- Index: Build both lexical indexes and vector indexes (or a product that supports hybrid in one stack—see OpenSearch vector search documentation for open-source patterns).
- Retrieve: Run semantic and keyword retrieval in parallel; fuse with RRF or weighted strategies.
- Optional rerank: A cross-encoder or dedicated reranker model can re-order the shortlist.
- Generate: Prompt the LLM with the top chunks and require citations to chunk IDs where possible.
- Evaluate: Measure answer correctness and citation accuracy—not just “cosine similarity feels high.”
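The retrieve-and-fuse steps above can be sketched end to end. The two retriever functions here are stubs returning hard-coded chunk IDs (a real system would query a vector index and a lexical index), and the fusion is the RRF formula discussed earlier; all names are hypothetical.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion over best-first lists of chunk IDs."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Stub retrievers standing in for real vector and BM25 queries.
def vector_retrieve(query, top_n=3):
    return ["chunk_7", "chunk_2", "chunk_9"]

def keyword_retrieve(query, top_n=3):
    return ["chunk_2", "chunk_5", "chunk_7"]

def retrieve_context(query, top_k=2):
    """Run both retrievers, fuse, and keep the top_k chunks for the prompt."""
    fused = rrf([vector_retrieve(query), keyword_retrieve(query)])
    return fused[:top_k]  # these chunk IDs get cited in the LLM's answer

print(retrieve_context("why does error e401 appear?"))
# → ['chunk_2', 'chunk_7']
```

From here, an optional reranker would re-order `fused` before truncation, and the generation step would prompt the LLM with the selected chunks.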
For a comparison of vector-first vs keyword-first RAG designs, read our Vector RAG vs vectorless RAG article—hybrid sits in the middle ground most teams end up in.
Official sources (bookmark these)
- Cormack, Clarke, and Buettcher, the Reciprocal Rank Fusion paper (SIGIR 2009), hosted at research.google/pubs/pub36196
- Reciprocal rank fusion | Elasticsearch Reference
- Pinecone's hybrid search guide
- Weaviate's hybrid search concept page
- OpenSearch vector search documentation
Paath.online teaches RAG, evaluation, and agent patterns in live 1:1 sessions so you connect documentation to working code—see Advanced AI tutoring and our AI News index for library releases.