RAG Pipeline

RAGPipeline is the full orchestrator. The RAG facade wraps it for the happy path.

Using the RAG facade

from synapsekit import RAG

rag = RAG(model="gpt-4o-mini", api_key="sk-...")
rag.add("Document text...")

# Async
answer = await rag.ask("Your question?")

# Streaming
async for token in rag.stream("Your question?"):
    print(token, end="")

# Sync
answer = rag.ask_sync("Your question?")
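
Putting the pieces together, here is a minimal end-to-end sketch. The document text and questions are placeholders, and asyncio.run() is only needed when calling the async API from a plain script:

import asyncio

from synapsekit import RAG

async def main():
    rag = RAG(model="gpt-4o-mini", api_key="sk-...")
    rag.add("SynapseKit pairs a vector store with an LLM for retrieval-augmented answers.")

    # Ask once, then stream the same question token by token
    answer = await rag.ask("What does SynapseKit do?")
    print(answer)

    async for token in rag.stream("What does SynapseKit do?"):
        print(token, end="")

asyncio.run(main())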

Loading documents

from synapsekit import RAG, PDFLoader, DirectoryLoader

rag = RAG(model="gpt-4o-mini", api_key="sk-...")

# From a loader (List[Document])
rag.add_documents(PDFLoader("report.pdf").load())

# Entire directory
rag.add_documents(DirectoryLoader("./docs/").load())

# Async versions
await rag.add_async("raw text string")
await rag.add_documents_async(docs)
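
As a sketch of a typical ingest-then-ask flow using the loaders above (the file paths and question are placeholders):

import asyncio

from synapsekit import RAG, DirectoryLoader, PDFLoader

async def ingest_and_ask() -> str:
    rag = RAG(model="gpt-4o-mini", api_key="sk-...")
    # Ingest one PDF plus everything under ./docs/
    rag.add_documents(PDFLoader("report.pdf").load())
    rag.add_documents(DirectoryLoader("./docs/").load())
    return await rag.ask("Summarize the key findings.")

print(asyncio.run(ingest_and_ask()))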

Persistence

Save and load your vector store across sessions:

rag.save("my_store.npz")

# Later
rag.load("my_store.npz")
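
A common pattern is to split ingestion and querying into separate runs, saving the store once and reloading it later. A sketch using only the calls shown above (the file name and question are placeholders):

from synapsekit import RAG, DirectoryLoader

# First run: embed documents and persist the store
rag = RAG(model="gpt-4o-mini", api_key="sk-...")
rag.add_documents(DirectoryLoader("./docs/").load())
rag.save("my_store.npz")

# Later run: reload the store instead of re-adding documents
rag = RAG(model="gpt-4o-mini", api_key="sk-...")
rag.load("my_store.npz")
print(rag.ask_sync("Your question?"))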

Using RAGPipeline directly

For full control over every component:

from synapsekit.rag.pipeline import RAGConfig, RAGPipeline
from synapsekit.llm.openai import OpenAILLM
from synapsekit.llm.base import LLMConfig
from synapsekit import SynapsekitEmbeddings, InMemoryVectorStore, Retriever
from synapsekit.memory.conversation import ConversationMemory
from synapsekit.observability.tracer import TokenTracer

llm = OpenAILLM(LLMConfig(model="gpt-4o-mini", api_key="sk-..."))
embeddings = SynapsekitEmbeddings()
store = InMemoryVectorStore(embeddings)
retriever = Retriever(store, rerank=True)

pipeline = RAGPipeline(RAGConfig(
    llm=llm,
    retriever=retriever,
    memory=ConversationMemory(window=10),
    tracer=TokenTracer(model="gpt-4o-mini"),
    retrieval_top_k=5,
    chunk_size=512,
    chunk_overlap=50,
))

await pipeline.add("Your document text...")
await pipeline.add_documents(docs) # List[Document]

answer = await pipeline.ask("Your question?")
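
Because the pipeline above is configured with ConversationMemory(window=10), follow-up questions can lean on earlier turns. A sketch with placeholder questions:

# First turn establishes context; the memory keeps the last 10 turns
first = await pipeline.ask("What does the report say about Q3 revenue?")

# The follow-up can refer back to the previous answer instead of restating it
followup = await pipeline.ask("How does that compare to Q2?")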

RAGConfig reference

Field            Type                Default                  Description
llm              BaseLLM             required                 LLM provider
retriever        Retriever           required                 Vector store + retrieval logic
memory           ConversationMemory  required                 Conversation history
tracer           TokenTracer | None  None                     Token/latency tracking
retrieval_top_k  int                 5                        Chunks to retrieve per query
system_prompt    str                 "Answer using only..."   LLM system instruction
chunk_size       int                 512                      Max characters per chunk
chunk_overlap    int                 50                       Overlap between chunks
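
As an illustration of the optional fields, a sketch that overrides the defaults above with a custom system prompt and larger chunks (llm, retriever, and memory are built as in the previous section; the prompt text is a placeholder):

pipeline = RAGPipeline(RAGConfig(
    llm=llm,
    retriever=retriever,
    memory=ConversationMemory(window=10),
    system_prompt="Answer strictly from the provided context; say 'I don't know' otherwise.",
    retrieval_top_k=3,     # retrieve fewer, more focused chunks
    chunk_size=1024,       # larger chunks, fewer boundaries
    chunk_overlap=100,
))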