Advanced Retrieval

Standard vector search is a starting point, not a ceiling. These seven guides cover the retrieval strategies that close the gap between a proof-of-concept RAG pipeline and one that holds up in production — when queries are ambiguous, documents are large, or the cost of a wrong answer is high.

Each guide is self-contained, includes a Google Colab notebook, and assumes you already have a working RAG pipeline. If you're new to RAG, start with RAG Fundamentals first.

Guides in this section

| Guide | Strategy | Best for | Difficulty |
| --- | --- | --- | --- |
| RAG Fusion | Multi-query + Reciprocal Rank Fusion | Queries with ambiguous or varied phrasing | Intermediate |
| GraphRAG | Entity extraction + relationship graph | Documents with dense entity relationships | Advanced |
| Self-RAG | Relevance grading + hallucination detection | High-stakes answers requiring verifiable grounding | Advanced |
| Parent Document Retriever | Small chunks retrieved, large chunks returned | Long documents where chunk context is lost | Intermediate |
| Query Decomposition | Sub-query generation + synthesis | Multi-part or multi-hop questions | Intermediate |
| Cross-Encoder Reranking | Retrieve wide, rerank narrow | Any pipeline where precision matters more than recall | Intermediate |
| Adaptive RAG | Complexity routing to fast/strong LLM | Cost-sensitive pipelines with mixed query difficulty | Advanced |

When to use which strategy

Your answers are inconsistent across phrasings — use RAG Fusion. Generating multiple query variants and fusing their ranked results smooths out sensitivity to wording.
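The fusion step is simple enough to sketch without any library code. This is a minimal, self-contained Reciprocal Rank Fusion implementation; the document IDs and the three ranked lists are illustrative, and `k=60` is the conventional RRF constant:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into a single ranking.

    A document's fused score is the sum of 1 / (k + rank) over every
    list it appears in, so documents ranked consistently well across
    query variants rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three query variants produced three ranked lists of document IDs;
# "d2" never wins a single list outright but is consistently near the top:
fused = reciprocal_rank_fusion([
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
])
```

In a full pipeline, each ranked list would come from a vector search over one LLM-generated rephrasing of the user's query.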

Your documents are about entities and their relationships — use GraphRAG. A relationship graph lets you answer multi-hop questions that defeat flat vector search.

You need to cite sources and cannot tolerate hallucinations — use Self-RAG. The grading loop surfaces low-relevance retrievals and detects unsupported claims before they reach the user.
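The control flow of the grading loop matters more than any particular grader. The sketch below shows that flow with hypothetical keyword-based stubs standing in for the LLM-backed retriever and graders; only the structure is the point:

```python
def self_rag_answer(query, retrieve, grade_relevance, generate, grade_support):
    """Retrieve, keep only docs graded relevant, generate an answer,
    then verify the answer is supported before returning it."""
    docs = [d for d in retrieve(query) if grade_relevance(query, d)]
    if not docs:
        return None  # nothing relevant retrieved: refuse rather than guess
    answer = generate(query, docs)
    return answer if grade_support(answer, docs) else None

# Hypothetical stubs for illustration (an LLM would play each role):
corpus = ["Paris is the capital of France.", "Bananas are yellow."]
retrieve = lambda q: corpus
relevant = lambda q, d: any(w in d.lower() for w in q.lower().split())
generate = lambda q, docs: docs[0]
supported = lambda a, docs: a in docs

grounded = self_rag_answer("capital of France", retrieve, relevant, generate, supported)
refused = self_rag_answer("quantum physics", retrieve, relevant, generate, supported)
```

Returning `None` on a failed grade is the key design choice: the pipeline degrades to "I don't know" instead of an unsupported answer.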

Your chunks are too small to give the LLM enough context — use the Parent Document Retriever. Retrieve on fine-grained child chunks, but pass the full parent chunk to the model.
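The core of the pattern is a child-to-parent mapping alongside the search index. A minimal sketch, with keyword overlap standing in for vector search and all chunk contents invented for illustration:

```python
parents = {
    "p1": "Full section on retrieval strategies ... (long parent chunk)",
    "p2": "Full section on evaluation ... (long parent chunk)",
}

# Small child chunks are what gets indexed and searched; each one
# records the parent it was split from.
children = [
    ("retrieval strategies overview", "p1"),
    ("reciprocal rank fusion", "p1"),
    ("evaluation metrics", "p2"),
]

def retrieve_parents(query, top_k=2):
    """Search over child chunks, then return the deduplicated
    parent chunks those children belong to."""
    q = set(query.lower().split())
    scored = sorted(children, key=lambda c: -len(q & set(c[0].split())))
    seen, out = set(), []
    for _, pid in scored[:top_k]:
        if pid not in seen:
            seen.add(pid)
            out.append(parents[pid])
    return out
```

Searching fine-grained children keeps retrieval precise; returning the parent restores the surrounding context the LLM needs to answer well.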

Your users ask compound questions — use Query Decomposition. Breaking a question into focused sub-queries produces richer, more accurate retrieval than searching the compound question directly.
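The pipeline shape is decompose, retrieve per sub-query, synthesize. In the sketch below, a hypothetical rule-based split on " and " stands in for the LLM that would generate sub-queries, and the search/synthesis steps are stubs:

```python
def decompose(question):
    """Stand-in for an LLM sub-query generator: split on ' and '."""
    return [part.strip() for part in question.split(" and ")]

def decomposed_answer(question, search, synthesize):
    sub_queries = decompose(question)                 # 1. break the question up
    retrieved = {q: search(q) for q in sub_queries}   # 2. retrieve per sub-query
    return synthesize(question, retrieved)            # 3. synthesize one answer

# Stubs for illustration:
search = lambda q: [f"doc about {q}"]
synthesize = lambda q, ctx: " | ".join(d for docs in ctx.values() for d in docs)

result = decomposed_answer("pricing of plan A and limits of plan B", search, synthesize)
```

Each sub-query hits the index with a focused embedding, so the retrieved set covers both halves of the question instead of averaging them into one fuzzy vector.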

Your top-k results are noisy — use Cross-Encoder Reranking. Retrieve a large candidate set cheaply, then score each candidate against the query precisely.
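The two-stage shape is easy to show in isolation. Here a hypothetical word-overlap scorer stands in for a real cross-encoder (e.g. a pairwise relevance model that reads query and document together); the candidate texts are invented:

```python
def rerank(query, candidates, score, top_k=2):
    """Stage 2 of retrieve-wide/rerank-narrow: score every candidate
    against the query with a precise (but slower) pairwise scorer
    and keep only the best few."""
    return sorted(candidates, key=lambda d: score(query, d), reverse=True)[:top_k]

def overlap(query, doc):
    """Stand-in for a cross-encoder: count shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

candidates = [            # wide, cheap first-stage retrieval results
    "unrelated filler text",
    "reciprocal rank fusion explained",
    "rank fusion for retrieval",
]
top = rerank("reciprocal rank fusion", candidates, overlap)
```

The economics are the point: the cheap first stage can fetch 50-100 candidates, and the expensive scorer only runs on that shortlist rather than the whole corpus.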

Your query volume is high and queries vary in complexity — use Adaptive RAG. Route simple queries to a fast, cheap model and reserve a stronger model for queries that need it.
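The router is the only new moving part. In practice a small classifier LLM labels each query's complexity; the sketch below uses a hypothetical heuristic (query length plus a few analytical keywords) purely to show where the routing decision sits:

```python
def route(query):
    """Pick a model tier for this query.

    Heuristic stand-in for an LLM-based complexity classifier:
    long or analytical queries go to the strong model, everything
    else to the fast, cheap one.
    """
    analytical = any(w in query.lower() for w in ("compare", "why", "explain"))
    return "strong-model" if analytical or len(query.split()) > 12 else "fast-model"
```

Because most traffic in a mixed workload is simple lookups, even a crude router can shift the bulk of queries to the cheap tier while preserving quality on the hard ones.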

Prerequisites

  • Python 3.10+
  • pip install synapsekit
  • An OpenAI API key (set as OPENAI_API_KEY)
  • Familiarity with the RAG Fundamentals guide series

Common imports

Every guide in this section shares the same import baseline:

```python
import asyncio
import os

from synapsekit import RAG
from synapsekit.llms.openai import OpenAILLM
from synapsekit.embeddings.openai import OpenAIEmbeddings
from synapsekit.vectorstores.memory import InMemoryVectorStore
```