# Retriever

The `Retriever` finds the most relevant chunks for a query using vector similarity and optional BM25 reranking.
## Basic usage

```python
from synapsekit.retriever import Retriever
from synapsekit.vectorstore import InMemoryVectorStore
from synapsekit.embeddings import SynapsekitEmbeddings

embeddings = SynapsekitEmbeddings()
store = InMemoryVectorStore(embeddings)
store.add(["Chunk one...", "Chunk two...", "Chunk three..."])

retriever = Retriever(store)
results = await retriever.retrieve("Your query here", top_k=3)
for doc in results:
    print(doc.text, doc.score)
```
## BM25 reranking

Enable hybrid retrieval (vector + BM25) for better precision:

```python
retriever = Retriever(store, use_bm25=True, bm25_weight=0.3)
results = await retriever.retrieve("Your query", top_k=5)
```

Requires `rank-bm25` (included as a hard dependency).
## Metadata filtering

Filter results by metadata before ranking:

```python
results = await retriever.retrieve(
    "Your query",
    top_k=5,
    metadata_filter={"source": "report.pdf"},
)
```

Only documents whose metadata contains all of the specified key-value pairs are considered.
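The filter semantics can be sketched as a simple predicate (a standalone illustration, not the library's internal code):

```python
def matches(metadata: dict, metadata_filter: dict) -> bool:
    """A document passes only if every filter key is present with an equal value."""
    return all(metadata.get(key) == value for key, value in metadata_filter.items())

# Extra metadata keys are allowed; missing or mismatched keys fail the filter.
assert matches({"source": "report.pdf", "page": 3}, {"source": "report.pdf"})
assert not matches({"source": "paper.pdf"}, {"source": "report.pdf"})
```

Note that an empty filter matches every document, which is why `metadata_filter=None` and `metadata_filter={}` behave the same way under this predicate.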
## MMR retrieval (diversity)

Maximal Marginal Relevance (MMR) balances relevance with diversity to reduce redundant results:

```python
results = await retriever.retrieve_mmr(
    "Your query",
    top_k=5,
    lambda_mult=0.5,  # 0 = max diversity, 1 = max relevance
    fetch_k=20,       # Initial candidate pool size
)
```

MMR greedily selects documents that maximize:

```
lambda * relevance(query, doc) - (1 - lambda) * max_similarity(doc, selected_docs)
```
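The greedy loop above can be sketched over precomputed similarity scores (a minimal standalone illustration, not the library's implementation):

```python
def mmr_select(query_sim, doc_sims, top_k, lambda_mult):
    """Greedy MMR over precomputed similarities.

    query_sim[i]   -- similarity of doc i to the query
    doc_sims[i][j] -- similarity between docs i and j
    """
    selected = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < top_k:
        def mmr_score(i):
            # Penalize similarity to anything already selected.
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lambda_mult * query_sim[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates; with lambda_mult=0.5, MMR picks the most
# relevant doc 0 first, then skips its duplicate in favour of diverse doc 2.
query_sim = [0.9, 0.85, 0.4]
doc_sims = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
print(mmr_select(query_sim, doc_sims, top_k=2, lambda_mult=0.5))  # [0, 2]
```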
## RAG Fusion

Generate multiple query variations with an LLM and fuse the results using Reciprocal Rank Fusion (RRF) for better recall:

```python
from synapsekit import RAGFusionRetriever

fusion = RAGFusionRetriever(
    retriever=retriever,
    llm=llm,
    num_queries=3,  # Number of query variations to generate
    rrf_k=60,       # RRF constant (higher = less aggressive reranking)
)
results = await fusion.retrieve("What is quantum computing?", top_k=5)
```

The process:

- The LLM generates `num_queries` variations of your query
- Each variation (plus the original) is used to retrieve results
- Results are fused using Reciprocal Rank Fusion scoring
- Documents appearing in multiple result sets rank higher
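Reciprocal Rank Fusion itself is only a few lines. This standalone sketch scores each document as `1 / (rrf_k + rank + 1)` per ranking (the unweighted form of the RRF formula used elsewhere in this library) and sums across rankings:

```python
def rrf_fuse(rankings, rrf_k=60):
    """Fuse several ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (rrf_k + rank + 1) per document; documents
    appearing in multiple lists accumulate score and rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# "b" appears in both result sets, so it outranks either list's top hit.
fused = rrf_fuse([["a", "b", "c"], ["b", "d"]])
print(fused[0])  # "b"
```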
## Contextual Retrieval

Inspired by Anthropic's Contextual Retrieval approach. Before embedding, each chunk is enriched with a short LLM-generated context sentence, improving accuracy for ambiguous chunks:

```python
from synapsekit import ContextualRetriever

cr = ContextualRetriever(
    retriever=retriever,
    llm=llm,
)

# Add chunks — each gets a context sentence prepended before embedding
await cr.add_with_context(["chunk one...", "chunk two..."])

# Retrieve as normal
results = await cr.retrieve("What is quantum computing?", top_k=5)
```

The process:

- For each chunk, the LLM generates a 1-2 sentence context
- The context is prepended to the chunk before embedding
- At retrieval time, the enriched embeddings improve search accuracy

You can customize the context generation prompt:

```python
cr = ContextualRetriever(
    retriever=retriever,
    llm=llm,
    context_prompt="Summarize this chunk in one sentence:\n{chunk}",
)
```
## Sentence Window Retrieval

Embeds individual sentences for fine-grained search, but returns a window of surrounding sentences for richer context:

```python
from synapsekit import SentenceWindowRetriever

swr = SentenceWindowRetriever(
    retriever=retriever,
    window_size=2,  # Include 2 sentences before and after the match
)

# Add full documents — they're split into sentences automatically
await swr.add_documents(["Full document text here. With multiple sentences. And more."])

# Retrieve — matched sentences are expanded with surrounding context
results = await swr.retrieve("query", top_k=3)
```

The process:

- Documents are split into individual sentences
- Each sentence is embedded independently for fine-grained matching
- At retrieval time, matched sentences are expanded with `window_size` surrounding sentences on each side
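The expansion step can be sketched as simple list slicing, clamped at document boundaries (illustrative only; `expand_window` is not a library function):

```python
def expand_window(sentences, match_index, window_size):
    """Return the matched sentence plus window_size neighbours on each side."""
    start = max(0, match_index - window_size)
    end = min(len(sentences), match_index + window_size + 1)
    return " ".join(sentences[start:end])

sentences = ["S0.", "S1.", "S2.", "S3.", "S4."]
print(expand_window(sentences, match_index=2, window_size=1))  # "S1. S2. S3."
print(expand_window(sentences, match_index=0, window_size=2))  # "S0. S1. S2."
```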
## Self-Query Retrieval

The `SelfQueryRetriever` uses an LLM to decompose a natural-language question into a semantic search query and structured metadata filters. This automates the process of extracting filters from user questions.

```python
from synapsekit import SelfQueryRetriever

sqr = SelfQueryRetriever(
    retriever=retriever,
    llm=llm,
    metadata_fields=["source", "author", "year", "category"],
)

# The LLM extracts filters automatically
results = await sqr.retrieve("Papers by John about ML from 2024", top_k=5)
```

The process:

- The LLM analyzes the question and extracts a semantic query (`"ML papers"`) and metadata filters (`{"author": "John", "year": "2024"}`)
- The semantic query is used for vector search
- The metadata filters are applied to narrow results
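The decomposition step can be sketched as parsing a structured LLM response. The JSON shape below (`{"query": ..., "filters": ...}`) is an assumption for illustration; the actual format `SelfQueryRetriever` uses internally is not specified here:

```python
import json

def parse_decomposition(llm_output: str, allowed_fields: list):
    """Parse a hypothetical LLM decomposition into (semantic_query, filters).

    Assumes the LLM was prompted to return JSON like
    {"query": "...", "filters": {...}} -- treat this as illustrative only.
    """
    data = json.loads(llm_output)
    # Keep only filters on fields the caller declared in metadata_fields.
    filters = {k: v for k, v in data.get("filters", {}).items() if k in allowed_fields}
    return data.get("query", ""), filters

query, filters = parse_decomposition(
    '{"query": "ML papers", "filters": {"author": "John", "year": "2024", "mood": "x"}}',
    allowed_fields=["source", "author", "year", "category"],
)
print(query, filters)  # ML papers {'author': 'John', 'year': '2024'}
```

Restricting filters to the declared `metadata_fields` guards against the LLM hallucinating filter keys that don't exist in your store.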
### Inspecting extracted filters

Use `retrieve_with_filters()` to see what the LLM extracted:

```python
results, info = await sqr.retrieve_with_filters(
    "Papers by John about ML from 2024", top_k=5
)
print(info["query"])    # "ML papers"
print(info["filters"])  # {"author": "John", "year": "2024"}
```

### Custom prompt

Override the default decomposition prompt:

```python
sqr = SelfQueryRetriever(
    retriever=retriever,
    llm=llm,
    metadata_fields=["source", "year"],
    prompt="Custom prompt with {fields} and {question} placeholders...",
)
```
## Parent Document Retrieval

The `ParentDocumentRetriever` embeds small chunks for precise matching but returns full parent documents for richer context:

```python
from synapsekit import ParentDocumentRetriever

pdr = ParentDocumentRetriever(
    retriever=retriever,
    chunk_size=200,
    chunk_overlap=50,
)

# Add full documents — they're chunked internally
await pdr.add_documents(["Full document one...", "Full document two..."])

# Retrieve — returns full parent documents, not small chunks
results = await pdr.retrieve("query", top_k=3)
```

The process:

- Documents are split into small overlapping chunks (controlled by `chunk_size` and `chunk_overlap`)
- Each chunk is embedded and stored with a reference to its parent document
- At retrieval time, matched chunks are traced back to their parent documents
- Duplicate parents are deduplicated — each parent appears at most once

This is ideal when you need the precision of small-chunk search but want to feed the LLM the full document for context.

### Adding documents with metadata

```python
await pdr.add_documents(
    ["Document one...", "Document two..."],
    metadata=[{"source": "report.pdf"}, {"source": "paper.pdf"}],
)
```

Metadata is propagated to all chunks of a document.
## Cross-Encoder Reranking

The `CrossEncoderReranker` uses a cross-encoder model to rerank retrieval results for higher precision. Cross-encoders score query-document pairs jointly, giving much more accurate relevance scores than bi-encoder similarity alone.

```python
from synapsekit import CrossEncoderReranker

reranker = CrossEncoderReranker(
    retriever=retriever,
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    fetch_k=20,  # Initial candidates to retrieve before reranking
)
results = await reranker.retrieve("What is RAG?", top_k=5)
```

The process:

- `fetch_k` candidates are retrieved using standard vector search
- Each candidate is scored jointly with the query using the cross-encoder
- Results are reranked by cross-encoder score and the top `top_k` are returned

### Getting scores

Use `retrieve_with_scores()` to see the cross-encoder scores:

```python
results = await reranker.retrieve_with_scores("What is RAG?", top_k=5)
for r in results:
    print(r["text"], r["cross_encoder_score"])
```

Requires `sentence-transformers`: `pip install synapsekit[semantic]`
## CRAG (Corrective RAG)

The `CRAGRetriever` implements self-correcting retrieval: it retrieves candidates, grades each for relevance using an LLM, and rewrites the query to retry if too few documents pass the relevance check.

```python
from synapsekit import CRAGRetriever

crag = CRAGRetriever(
    retriever=retriever,
    llm=llm,
    relevance_threshold=0.5,  # Fraction of docs that must be relevant
    max_retries=1,            # Max query rewrites before giving up
)
results = await crag.retrieve("What is quantum computing?", top_k=5)
```

The process:

- Retrieve `top_k` candidates using the base retriever
- The LLM grades each document as "relevant" or "irrelevant" to the query
- If fewer than a `relevance_threshold` fraction pass, the LLM rewrites the query
- Retry retrieval with the rewritten query (up to `max_retries` times)
- Return only the documents that passed relevance grading
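The retry decision reduces to a threshold check on the grade fractions. A standalone sketch (not the library's code) makes the boundary behaviour concrete:

```python
def needs_rewrite(grades, relevance_threshold=0.5):
    """True if the fraction of relevant docs falls below the threshold."""
    if not grades:
        return True  # Nothing retrieved: definitely retry.
    relevant = sum(1 for g in grades if g == "relevant")
    return relevant / len(grades) < relevance_threshold

# 2 of 5 docs relevant (0.4) is below a 0.5 threshold, so rewrite and retry.
assert needs_rewrite(["relevant", "irrelevant", "relevant", "irrelevant", "irrelevant"])
# 2 of 3 (0.67) passes, so the original results are kept.
assert not needs_rewrite(["relevant", "relevant", "irrelevant"])
```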
### Inspecting grades

Use `retrieve_with_grades()` to see grading details:

```python
results, info = await crag.retrieve_with_grades("query", top_k=5)
print(info["relevant_count"])   # Number of relevant docs
print(info["total_count"])      # Total docs retrieved
print(info["query_rewritten"])  # Whether the query was rewritten
print(info["final_query"])      # The (possibly rewritten) query used
```
## Query Decomposition

The `QueryDecompositionRetriever` uses an LLM to break complex queries into simpler sub-queries, retrieves for each, and deduplicates the results:

```python
from synapsekit import QueryDecompositionRetriever

qdr = QueryDecompositionRetriever(
    retriever=retriever,
    llm=llm,
    num_sub_queries=3,  # Number of sub-queries to generate
)
results = await qdr.retrieve("Compare quantum and classical computing for ML", top_k=5)
```

The process:

- The LLM decomposes the query into `num_sub_queries` simpler sub-queries
- Each sub-query is used to retrieve results independently
- Results are deduplicated and returned

### Inspecting sub-queries

```python
results, sub_queries = await qdr.retrieve_with_sub_queries("query", top_k=5)
print(sub_queries)  # ["What is quantum computing?", "What is classical computing?", ...]
```
## Contextual Compression

The `ContextualCompressionRetriever` retrieves documents, then uses an LLM to compress each one to only the content relevant to the query:

```python
from synapsekit import ContextualCompressionRetriever

ccr = ContextualCompressionRetriever(
    retriever=retriever,
    llm=llm,
    fetch_k=10,  # Retrieve this many, then compress
)
results = await ccr.retrieve("What is RAG?", top_k=5)
```

The process:

- Retrieve `fetch_k` candidates using the base retriever
- The LLM compresses each document, extracting only content relevant to the query
- Documents the LLM marks as "NOT_RELEVANT" are filtered out
- The top `top_k` compressed results are returned
## Ensemble Retrieval

The `EnsembleRetriever` fuses results from multiple retrievers using weighted Reciprocal Rank Fusion (RRF):

```python
from synapsekit import EnsembleRetriever

ensemble = EnsembleRetriever(
    retrievers=[retriever_a, retriever_b],
    weights=[0.7, 0.3],  # Optional, defaults to equal weights
    rrf_k=60,            # RRF constant
)
results = await ensemble.retrieve("What is RAG?", top_k=5)
```

The process:

- Each retriever independently retrieves candidates
- Results are scored using weighted RRF: `score = weight / (rrf_k + rank + 1)`
- Scores are summed across retrievers for documents appearing in multiple result sets
- Final results are sorted by fused score
## Cohere Reranking

The `CohereReranker` uses Cohere's rerank models to rerank retrieval results for higher precision. Unlike `CrossEncoderReranker` (a local model), this uses the Cohere Rerank API.

```python
from synapsekit import CohereReranker

reranker = CohereReranker(
    retriever=retriever,
    model="rerank-v3.5",
    fetch_k=20,  # Initial candidates to retrieve before reranking
)
results = await reranker.retrieve("What is RAG?", top_k=5)
```

The process:

- `fetch_k` candidates are retrieved using standard vector search
- Candidates are sent to the Cohere Rerank API
- Results are reranked by relevance score and the top `top_k` are returned

### Getting scores

Use `retrieve_with_scores()` to see the Cohere relevance scores:

```python
results = await reranker.retrieve_with_scores("What is RAG?", top_k=5)
for r in results:
    print(r["text"], r["relevance_score"])
```

### API key

The API key is resolved in order:

1. The `api_key` parameter
2. The `CO_API_KEY` environment variable

Requires `cohere`: `pip install synapsekit[cohere]`
## Step-Back Retrieval

The `StepBackRetriever` generates a more abstract "step-back" question using an LLM, retrieves for both the original and step-back queries in parallel, and merges the deduplicated results. This improves retrieval for specific or narrow questions by also searching with a broader perspective.

```python
from synapsekit import StepBackRetriever

step_back = StepBackRetriever(
    retriever=retriever,
    llm=llm,
)
results = await step_back.retrieve("What is the melting point of gold?", top_k=5)
```

The process:

- The LLM generates a step-back (more abstract) question from the original query
- Both the original and step-back queries are used to retrieve results in parallel
- Results are merged and deduplicated, preserving order

### Custom prompt template

Override the default prompt to control how step-back questions are generated:

```python
step_back = StepBackRetriever(
    retriever=retriever,
    llm=llm,
    prompt_template="Given this question, ask a more general version:\n{query}",
)
```

The template must include `{query}` as a placeholder for the user's question.
## FLARE (Forward-Looking Active REtrieval)

The `FLARERetriever` implements an iterative retrieve-generate-retrieve loop. It generates an answer, identifies parts that need more information (marked with `[SEARCH: ...]`), retrieves for those sub-queries, and regenerates — repeating until no more search markers appear or `max_iterations` is reached.

```python
from synapsekit import FLARERetriever

flare = FLARERetriever(
    retriever=retriever,
    llm=llm,
    max_iterations=3,
)
results = await flare.retrieve("Explain the history of quantum computing", top_k=5)
```

The process:

- Initial retrieval for the original query
- The LLM generates an answer, inserting `[SEARCH: sub-query]` markers where it needs more information
- Sub-queries are extracted from the markers
- If no markers are found, the current documents are returned
- New retrieval is performed for each sub-query
- Results are merged, deduplicated, and the process repeats (up to `max_iterations`)
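Marker extraction can be sketched with a regular expression. The exact marker grammar is an assumption based on the `[SEARCH: ...]` format shown above, and `extract_search_queries` is an illustrative helper, not a library function:

```python
import re

def extract_search_queries(answer: str):
    """Pull the sub-queries out of [SEARCH: ...] markers in a draft answer."""
    return re.findall(r"\[SEARCH:\s*(.*?)\]", answer)

draft = (
    "Quantum computing began in the 1980s [SEARCH: who proposed quantum computing]. "
    "Modern machines use superconducting qubits [SEARCH: superconducting qubit vendors]."
)
print(extract_search_queries(draft))
# ['who proposed quantum computing', 'superconducting qubit vendors']
```

An empty result list is the loop's termination signal: no markers means the LLM considered the answer complete.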
## HyDE (Hypothetical Document Embeddings)

The `HyDERetriever` generates a hypothetical answer to the query using an LLM, then uses that hypothetical answer as the search query. This often improves retrieval for complex or abstract questions because the hypothetical answer is closer in embedding space to relevant documents than the original question is.

```python
from synapsekit import HyDERetriever

hyde = HyDERetriever(
    retriever=retriever,
    llm=llm,
)
results = await hyde.retrieve("What is quantum entanglement?", top_k=5)
```

The process:

- The LLM generates a hypothetical passage that would answer the query
- The hypothetical passage is used as the search query instead of the original question
- Results are retrieved using the hypothetical passage, which is often closer to relevant documents in embedding space

### Custom prompt template

Override the default prompt to control how hypothetical answers are generated:

```python
hyde = HyDERetriever(
    retriever=retriever,
    llm=llm,
    prompt_template="Write a short paragraph answering: {query}",
)
```

The template must include `{query}` as a placeholder for the user's question.
## Hybrid Search Retrieval

The `HybridSearchRetriever` combines BM25 keyword matching with vector similarity using Reciprocal Rank Fusion (RRF). This gives you the best of both sparse (keyword) and dense (vector) retrieval.

```python
from synapsekit import HybridSearchRetriever

hybrid = HybridSearchRetriever(
    retriever=retriever,
    bm25_weight=0.5,
    vector_weight=0.5,
    rrf_k=60,
)

# Build the BM25 index from your documents
hybrid.add_documents(["doc one text...", "doc two text...", "doc three text..."])

# Retrieve — fuses BM25 and vector results via RRF
results = await hybrid.retrieve("search query", top_k=5)
```

The process:

- Vector retrieval via the base retriever
- BM25 scoring on the indexed documents
- RRF fusion: `score = weight / (rrf_k + rank + 1)` for both result sets
- Results are sorted by fused score and deduplicated

Uses the existing `rank-bm25` hard dependency — no extra install needed.
## Self-RAG (Self-Reflective RAG)

The `SelfRAGRetriever` implements a self-reflective retrieval loop: retrieve candidates, grade each for relevance, generate an answer, check whether the documents support the answer, and retry with a rewritten query if not.

```python
from synapsekit import SelfRAGRetriever

self_rag = SelfRAGRetriever(
    retriever=retriever,
    llm=llm,
    max_iterations=2,
    relevance_threshold=0.5,
)
results = await self_rag.retrieve("What is quantum computing?", top_k=5)
```

The process:

- Retrieve candidates using the base retriever
- The LLM grades each document as "relevant" or "irrelevant"
- The LLM generates an answer from the relevant documents
- The LLM checks whether the answer is "fully", "partially", or "not" supported
- If not fully supported, the query is rewritten and the process repeats

### Inspecting reflection metadata

```python
results, meta = await self_rag.retrieve_with_reflection("query", top_k=5)
print(meta["iterations"])     # Number of iterations performed
print(meta["support_level"])  # "fully", "partially", or "not"
```
## Adaptive RAG

The `AdaptiveRAGRetriever` uses an LLM to classify query complexity (simple/moderate/complex) and routes each query to a different retrieval strategy accordingly.

```python
from synapsekit import AdaptiveRAGRetriever

adaptive = AdaptiveRAGRetriever(
    llm=llm,
    simple_retriever=basic_retriever,
    moderate_retriever=fusion_retriever,
    complex_retriever=multi_step_retriever,
)
results = await adaptive.retrieve("What is 2+2?")  # → routed to simple
results = await adaptive.retrieve("Compare quantum and classical computing for ML")  # → routed to complex
```

The process:

- The LLM classifies the query as "simple", "moderate", or "complex"
- The query is routed to the corresponding retriever
- Fallback: if `moderate_retriever` is not provided, `simple_retriever` is used; if `complex_retriever` is not provided, `moderate_retriever` is used

### Inspecting classification

```python
results, classification = await adaptive.retrieve_with_classification("query")
print(classification)  # "simple", "moderate", or "complex"
```
## GraphRAG (Knowledge Graph Retrieval)

The `GraphRAGRetriever` combines knowledge graph traversal with vector retrieval. It extracts entities from the query, traverses a knowledge graph to find related documents, and merges those with standard vector retrieval results.

```python
from synapsekit import GraphRAGRetriever, KnowledgeGraph

# Build a knowledge graph
kg = KnowledgeGraph()
kg.add_triple("Python", "is_a", "programming language")
kg.add_triple("Python", "used_for", "machine learning")
kg.add_document_link("Python", "doc_1")
kg.add_document_link("machine learning", "doc_2")

# Or build from documents using an LLM
await kg.build_from_documents(["Python is a programming language used for ML..."], llm)

# Combine with vector retrieval
graphrag = GraphRAGRetriever(
    retriever=retriever,
    llm=llm,
    knowledge_graph=kg,
    max_hops=2,
)
results = await graphrag.retrieve("What is Python used for?", top_k=5)
```

The process:

- The LLM extracts entities from the query
- The knowledge graph is traversed up to `max_hops` from each entity
- Related documents are gathered from the graph
- Standard vector retrieval runs in parallel
- Results are merged and deduplicated
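The traversal step can be sketched as a breadth-first walk over adjacency and document-link maps. These dict structures are assumptions for illustration, not `KnowledgeGraph`'s actual internals:

```python
from collections import deque

def traverse(neighbors, doc_links, entities, max_hops=2):
    """Collect documents linked to any node within max_hops of the seed entities.

    neighbors -- dict mapping a node to the nodes it shares a triple with
    doc_links -- dict mapping a node to its linked document IDs
    """
    seen = set(entities)
    docs = set()
    frontier = deque((entity, 0) for entity in entities)
    while frontier:
        node, depth = frontier.popleft()
        docs.update(doc_links.get(node, []))
        if depth < max_hops:
            for nxt in neighbors.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return docs

# Mirrors the example graph above: "Python" links doc_1 directly, and its
# one-hop neighbour "machine learning" contributes doc_2.
neighbors = {"Python": ["programming language", "machine learning"]}
doc_links = {"Python": ["doc_1"], "machine learning": ["doc_2"]}
print(sorted(traverse(neighbors, doc_links, ["Python"])))  # ['doc_1', 'doc_2']
```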
### Inspecting graph metadata

```python
results, meta = await graphrag.retrieve_with_graph("query", top_k=5)
print(meta["entities_extracted"])  # Entities found in the query
print(meta["graph_docs"])          # Documents from graph traversal
print(meta["traversal_hops"])      # Max hops used
```
## Multi-Step Retrieval

The `MultiStepRetriever` performs iterative retrieval-generation: retrieve documents, generate an answer, identify information gaps, retrieve for those gaps, and repeat until the answer is complete or `max_steps` is reached.

```python
from synapsekit import MultiStepRetriever

ms = MultiStepRetriever(
    retriever=retriever,
    llm=llm,
    max_steps=3,
)
results = await ms.retrieve("What is the history and future of quantum computing?", top_k=5)
```

The process:

- Initial retrieval for the original query
- The LLM generates an answer from the retrieved documents
- The LLM identifies gaps — it returns search queries for missing information, or "COMPLETE" if done
- Gap queries are used for additional retrieval
- New documents are added (deduplicated) and the process repeats

### Inspecting the step trace

```python
results, trace = await ms.retrieve_with_steps("query")
for step in trace:
    print(step["step"], step["query"], step["new_docs"])
# step 0: initial query, N new docs
# step 1: ["gap query 1", "gap query 2"], M new docs
# step 2: None, 0 new docs, complete=True
```
## Parameters

### Retriever

| Parameter | Default | Description |
|---|---|---|
| `top_k` | 4 | Number of chunks to return |
| `use_bm25` | False | Enable BM25 reranking |
| `bm25_weight` | 0.3 | Weight for the BM25 score in hybrid ranking |
| `metadata_filter` | None | Filter by metadata key-value pairs |

### HyDERetriever

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `llm` | — | LLM for generating hypothetical answers |
| `prompt_template` | built-in | Custom prompt (must include `{query}`) |

### SelfQueryRetriever

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `llm` | — | LLM for query decomposition |
| `metadata_fields` | — | List of metadata field names the LLM can filter on |
| `prompt` | built-in | Custom decomposition prompt |

### ParentDocumentRetriever

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `chunk_size` | 200 | Characters per chunk |
| `chunk_overlap` | 50 | Overlap between chunks |

### CrossEncoderReranker

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `model` | "cross-encoder/ms-marco-MiniLM-L-6-v2" | Cross-encoder model name |
| `fetch_k` | 20 | Number of initial candidates to retrieve |

### CRAGRetriever

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `llm` | — | LLM for grading and query rewriting |
| `relevance_threshold` | 0.5 | Minimum fraction of docs that must be relevant |
| `max_retries` | 1 | Max query rewrites before returning what we have |

### QueryDecompositionRetriever

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `llm` | — | LLM for query decomposition |
| `num_sub_queries` | 3 | Number of sub-queries to generate |

### ContextualCompressionRetriever

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `llm` | — | LLM for document compression |
| `fetch_k` | 10 | Number of candidates to retrieve before compression |

### EnsembleRetriever

| Parameter | Default | Description |
|---|---|---|
| `retrievers` | — | List of `Retriever` instances |
| `weights` | equal | Weight for each retriever in RRF scoring |
| `rrf_k` | 60 | RRF constant (higher = less aggressive reranking) |

### CohereReranker

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `model` | "rerank-v3.5" | Cohere rerank model name |
| `api_key` | None | Cohere API key (falls back to the `CO_API_KEY` env var) |
| `fetch_k` | 20 | Number of initial candidates to retrieve |

### StepBackRetriever

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `llm` | — | LLM for generating step-back questions |
| `prompt_template` | built-in | Custom prompt (must include `{query}`) |

### FLARERetriever

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `llm` | — | LLM for answer generation |
| `max_iterations` | 3 | Maximum generate-retrieve cycles |
| `generate_prompt` | built-in | Prompt for initial answer generation |
| `regenerate_prompt` | built-in | Prompt for regeneration with new context |

### HybridSearchRetriever

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `bm25_weight` | 0.5 | Weight for BM25 scores in RRF fusion |
| `vector_weight` | 0.5 | Weight for vector scores in RRF fusion |
| `rrf_k` | 60 | RRF constant (higher = less aggressive reranking) |

### SelfRAGRetriever

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `llm` | — | LLM for grading, generation, and support checking |
| `max_iterations` | 2 | Max retrieve-grade-generate-check cycles |
| `relevance_threshold` | 0.5 | Minimum fraction of docs that must be graded relevant |

### AdaptiveRAGRetriever

| Parameter | Default | Description |
|---|---|---|
| `llm` | — | LLM for query classification |
| `simple_retriever` | — | Retriever for simple queries |
| `moderate_retriever` | None | Retriever for moderate queries (falls back to simple) |
| `complex_retriever` | None | Retriever for complex queries (falls back to moderate) |
| `classify_prompt` | built-in | Custom classification prompt |

### MultiStepRetriever

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `llm` | — | LLM for answer generation and gap identification |
| `max_steps` | 3 | Maximum retrieval-generation iterations |

### GraphRAGRetriever

| Parameter | Default | Description |
|---|---|---|
| `retriever` | — | Base `Retriever` instance |
| `llm` | — | LLM for entity extraction |
| `knowledge_graph` | None | `KnowledgeGraph` instance (falls back to vector-only if None) |
| `max_hops` | 2 | Maximum graph traversal hops from extracted entities |