Architecture Deep Dive

SynapseKit is intentionally small on the surface, but there is real machinery underneath. This page is a long-form tour of how the core pieces fit together and where to extend them. It is aimed at power users and contributors who want to understand execution flow, not just APIs.

If you only need the high-level overview, read the shorter Architecture page first. This deep dive assumes you already know the basic concepts (RAG, agents, graphs).

Reading map (use this to skip around)

If you care about event loops and sync wrappers: see section 3, Async runtime model.
If you are building RAG pipelines: see sections 4 to 6 (RAG facade, RAGPipeline internals, Retrieval and ranking pipeline).
If you want graph pipelines: see section 7, Graph engine internals.
If you build agents: see section 8, Agent loop internals, which also covers ReasoningAgent routing.
If you want extensions: see section 10, Plugins and extension hooks, and section 13, Extension points.

1) Layer model recap

SynapseKit is structured as a set of loosely-coupled layers you can use independently. The layer diagram remains the simplest mental model:

[Figure: SynapseKit layer architecture]

Key idea: layers are composable but not forced. You can use a single LLM provider without agents, or a graph without RAG, or RAG without graphs. Most modules are designed to be swappable behind clear interfaces.

Layer-by-layer quick guide (minimal)

LLM layer: provider-agnostic generation, streaming, and tool calls
Retrieval layer: vector stores, retrievers, rerankers
RAG layer: orchestration of retrieval + generation + memory
Agent layer: tool loops over LLMs
Graph layer: stateful DAG execution and checkpoints
Observability layer: spans, metrics, evaluation

This mental model helps decide where to extend: pick the smallest layer that solves your problem.

2) Core abstractions and contracts

This is the minimal set of types you need to understand to reason about the system. Most features build on top of these contracts.

BaseLLM

All providers implement a single async interface. This is the “thin waist” of the LLM layer.

Key methods (simplified):

class BaseLLM:
    async def generate(self, prompt: str, **kwargs) -> str: ...
    async def stream(self, prompt: str, **kwargs): ...
    async def call_with_tools(self, messages, tools, **kwargs) -> dict: ...

Where to read:

src/synapsekit/llm/base.py
docs/api/llm
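As a rough illustration of the contract, the sketch below implements the three methods on a toy provider. EchoLLM is hypothetical and not part of SynapseKit, and the shape of the dict returned by call_with_tools is an assumption, since the signature above only promises a dict.

```python
import asyncio
from typing import AsyncIterator


class EchoLLM:
    """Hypothetical toy provider that follows the BaseLLM shape shown above."""

    async def stream(self, prompt: str, **kwargs) -> AsyncIterator[str]:
        # Yield the prompt back one word at a time, as a real provider
        # would yield tokens.
        for word in prompt.split():
            await asyncio.sleep(0)  # hand control back to the event loop
            yield word + " "

    async def generate(self, prompt: str, **kwargs) -> str:
        # Build the full completion by draining the stream.
        tokens = [token async for token in self.stream(prompt, **kwargs)]
        return "".join(tokens).strip()

    async def call_with_tools(self, messages, tools, **kwargs) -> dict:
        # Assumed return shape: text content plus a list of tool calls.
        return {"content": messages[-1]["content"], "tool_calls": []}


def demo() -> str:
    return asyncio.run(EchoLLM().generate("hello thin waist"))
```

Implementing generate() on top of stream() keeps the two paths consistent, which is one reasonable way to keep a provider small.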

RAGPipeline

RAGPipeline orchestrates ingestion + retrieval + generation. It owns a splitter, a retriever, a memory buffer, and tracing hooks.

Where to read:

src/synapsekit/rag/pipeline.py
docs/rag/pipeline

StateGraph and CompiledGraph

StateGraph is a fluent builder for DAG workflows. CompiledGraph is the executable form.

Where to read:

src/synapsekit/graph/graph.py
src/synapsekit/graph/compiled.py
docs/graph/overview

BaseAgent

Agents are tool-using loops on top of an LLM. Two main flavors exist: ReAct and Function Calling.

Where to read:

src/synapsekit/agents/react.py
src/synapsekit/agents/function_calling.py
docs/agents/overview

TokenTracer

TokenTracer records token usage, latency, cost, and quality metrics. It is the core of observability in most flows.

Where to read:

src/synapsekit/observability/tracer.py
docs/observability/overview

PluginRegistry

Plugins are optional, user-defined hooks that can be loaded and unloaded. The registry is a small but important extension point.

Where to read:

src/synapsekit/plugins/registry.py
src/synapsekit/plugins/loader.py
docs/plugins

3) Async runtime model (and why sync wrappers exist)

SynapseKit is async-first: every public API is async under the hood. The framework still supports sync entry points for notebooks and quick scripts, and this is handled by two small utilities:

install_fast_loop(): tries to install uvloop if present
run_sync(): runs a coroutine safely in both loop and no-loop contexts

Relevant code:

src/synapsekit/_loop.py
src/synapsekit/_compat.py

Key behavior:

On import, install_fast_loop() runs once.
run_sync() checks if a loop is already running.
If a loop is running (e.g., Jupyter), it uses a fresh thread and asyncio.run.
If no loop is running, it calls asyncio.run directly.

This avoids deadlocks when calling sync wrappers from a running loop.

Example (conceptual):

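import asyncio
from concurrent.futures import ThreadPoolExecutor

def run_sync(coro):
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)
    # A loop is already running (e.g., Jupyter): run the coroutine on a
    # fresh thread with its own loop and block until it finishes.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()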

Mermaid view of the decision:

flowchart TD
    A[Sync wrapper called] --> B{Is event loop running?}
    B -- No --> C["asyncio.run(coro)"] --> Z[Return result]
    B -- Yes --> D[ThreadPoolExecutor]
    D --> E[asyncio.run in new thread]
    E --> Z

Best practice:

Prefer async APIs in servers.
Use sync wrappers only for scripts or notebooks.
Avoid calling sync wrappers from within a running asyncio loop in production servers: the call blocks the loop's thread until the worker thread returns.

4) RAG facade and ingestion flow

The RAG class is a convenience wrapper for the full pipeline. It chooses defaults and wires together the main components.

Key initialization steps (from src/synapsekit/rag/facade.py):

Build the LLM via make_llm(model, api_key, provider, ...)
Build embeddings via SynapsekitEmbeddings
Build vector store via InMemoryVectorStore
Build Retriever using that vector store
Optionally build KnowledgeGraphBuilder + HybridKGRetriever
Build ConversationMemory
Build TokenTracer
Construct RAGPipeline with a RAGConfig

This wiring is what makes the “3-line happy path” possible.

Provider auto-detection (factory)

The RAG facade calls make_llm(), which auto-detects provider from model name. This is why “gpt-4o-mini” works without specifying provider.

Provider resolution (simplified):

claude* -> anthropic
gemini* -> gemini
open-mistral* / mistral* -> mistral
deepseek* -> deepseek
moonshot* -> moonshot
minimax* / abab* -> minimax
glm* -> zhipu
@cf/* or @hf/* -> cloudflare
model contains “/” -> openrouter
otherwise -> openai

Where to read:

src/synapsekit/llm/_factory.py
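The resolution table above can be sketched as an ordered prefix match. This is an illustration only: detect_provider is a hypothetical name, and the assumption that rules are checked in the order listed (with case-insensitive matching) is not confirmed by the source.

```python
# Ordered (prefixes, provider) rules, copied from the table above.
PREFIX_RULES = [
    (("claude",), "anthropic"),
    (("gemini",), "gemini"),
    (("open-mistral", "mistral"), "mistral"),
    (("deepseek",), "deepseek"),
    (("moonshot",), "moonshot"),
    (("minimax", "abab"), "minimax"),
    (("glm",), "zhipu"),
    (("@cf/", "@hf/"), "cloudflare"),
]


def detect_provider(model: str) -> str:
    """Hypothetical helper: map a model name to a provider name."""
    name = model.lower()
    for prefixes, provider in PREFIX_RULES:
        if name.startswith(prefixes):  # str.startswith accepts a tuple
            return provider
    if "/" in name:
        # Names such as "org/model" fall through to openrouter.
        return "openrouter"
    return "openai"
```

Under these assumptions, detect_provider("gpt-4o-mini") falls through every rule and resolves to openai, which matches the behavior described above.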

Multimodal ingestion

The facade detects file paths and routes to specific loaders:

ImageLoader
AudioLoader
VideoLoader
PDFLoader

If no file is detected, it falls back to plain text ingestion.

This means RAG.add() can accept both text and file paths.

RAG facade quick guide (minimal)

Start with RAG(model, api_key)
Call add() for ingestion
Call ask_sync() for a quick answer
Use stream() for token streaming
Use RAGPipeline directly when you need custom components

5) RAGPipeline internals (ingest -> retrieve -> answer)

RAGPipeline owns the concrete flow and the safety logic. Some key points:

Uses a splitter (RecursiveCharacterTextSplitter) unless retriever overrides add_document.
Skips empty or whitespace-only chunks.
Supports metadata propagation on ingestion.
Tracks auto-eval tasks and async evaluation tasks.
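A minimal sketch of the ingestion rules listed above (split, skip blank chunks, propagate metadata). The fixed-size split_text is a stand-in for RecursiveCharacterTextSplitter, and the record layout and the chunk_index key are assumptions made for illustration.

```python
def split_text(text, chunk_size=40):
    # Stand-in splitter: fixed-size windows, not the recursive splitter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def ingest(text, metadata=None, chunk_size=40):
    """Return one record per non-empty chunk, each carrying the metadata."""
    records = []
    for index, chunk in enumerate(split_text(text, chunk_size)):
        if not chunk.strip():
            # Empty or whitespace-only chunks are skipped.
            continue
        records.append({
            "text": chunk,
            # Copy the caller's metadata onto every chunk.
            "metadata": {**(metadata or {}), "chunk_index": index},
        })
    return records
```

Copying metadata onto each chunk is what lets a retrieved chunk be traced back to its source later.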

Data flow diagram

[Figure: RAG pipeline data flow]

Query path (simplified)

start_span("rag.ask")
start_span("rag.retrieve")
call retriever.retrieve or retriever.retrieve_with_scores
optional context packer
end_span("rag.retrieve") with chunk count and top score
build prompt (system prompt + memory + context)
call LLM stream/generate
update memory
end_span("rag.ask")

You can see the span calls in src/synapsekit/rag/pipeline.py. This is why the tracing layer can show retrieval latency separately from model latency.
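To see why nested spans separate retrieval latency from total latency, here is a small stand-alone sketch. The span context manager, the SPANS list, and the ask function are hypothetical stand-ins for the start_span/end_span calls in the pipeline, not SynapseKit's implementation.

```python
import time
from contextlib import contextmanager

SPANS = []  # finished spans, innermost first


@contextmanager
def span(name, **attributes):
    record = {"name": name, "attributes": dict(attributes)}
    start = time.perf_counter()
    try:
        yield record
    finally:
        # Record the duration even if the wrapped code raises.
        record["duration_s"] = time.perf_counter() - start
        SPANS.append(record)


def ask(question, retrieve, generate):
    with span("rag.ask"):
        with span("rag.retrieve") as retrieval:
            chunks = retrieve(question)
            retrieval["attributes"]["chunks"] = len(chunks)
        # Prompt assembly and generation are timed by the outer span only.
        prompt = "\n".join(chunks) + "\n\nQuestion: " + question
        return generate(prompt)
```

Because rag.retrieve closes before the model is called, subtracting its duration from rag.ask gives a rough figure for prompt assembly plus generation.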

6) Retrieval and ranking pipeline

Retrieval is intentionally modular. The core Retriever is thin and delegates to vector store backends. Advanced strategies live under src/synapsekit/retrieval.

Base Retriever flow (step-by-step)

This is the default path used when no advanced strategy overrides it.

fetch_k is calculated as top_k (or top_k * 3 when rerank is enabled)
VectorStore.search() returns candidate chunks + metadata
Optional BM25 rerank narrows results back down to top_k
Results are returned to RAGPipeline for prompt assembly

The base retriever emits a reranker span when BM25 is used:

reranker.rerank (attributes: type, top_k, candidates)

Where to read:

src/synapsekit/retrieval/retriever.py
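The over-fetch-then-rerank flow above can be sketched in a few lines. The function names and the word-overlap reranker are assumptions made for illustration; the overlap score is a simple stand-in for BM25, not the library's reranker.

```python
def overlap_rerank(query, candidates):
    # Stand-in for BM25: rank by the number of shared lowercase words.
    terms = set(query.lower().split())
    return sorted(
        candidates,
        key=lambda text: len(terms & set(text.lower().split())),
        reverse=True,
    )


def retrieve(query, search, top_k=2, rerank=None):
    # Over-fetch by 3x only when a reranker will narrow the results.
    fetch_k = top_k * 3 if rerank is not None else top_k
    candidates = search(query, fetch_k)
    if rerank is not None:
        candidates = rerank(query, candidates)
    return candidates[:top_k]
```

Over-fetching gives the reranker room to promote a chunk the vector search ranked fourth or fifth, while the caller still receives exactly top_k results.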

MMR and diversity

Retriever.retrieve_mmr() delegates to VectorStore.search_mmr(). This is a diversity-focused retrieval strategy used by some advanced flows.

Where to read:

src/synapsekit/retrieval/retriever.py (retrieve_mmr)
src/synapsekit/retrieval/vectorstore.py (search_mmr)
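For readers unfamiliar with MMR (maximal marginal relevance), the sketch below shows the generic greedy algorithm over plain Python lists. It is not taken from search_mmr; the function names and the lambda_mult parameter are assumptions.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, top_k=2, lambda_mult=0.5):
    """Greedily pick indices that are relevant but not redundant."""
    selected = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < top_k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalize similarity to anything already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With lambda_mult=1.0 the penalty disappears and the result is plain similarity ranking; lower values trade relevance for diversity.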

Strategy modules (examples)

rag_fusion.py (RAG Fusion)
self_rag.py (self-rag)
query_decomposition.py
adaptive.py
kg/* (knowledge graph retrieval)
federated.py (fan-out to multiple retrievers)
parent_document.py (parent document retrieval)
sentence_window.py (windowed retrieval)

These are opt-in strategies that build on the same Retriever/VectorStore contracts.

VectorStore extension point

VectorStore implementations live under:

src/synapsekit/retrieval/*.py

Each backend implements add(), search(), search_mmr(), save(), load(). The interface is intentionally small so new backends are easy to add.
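To give a feel for how small a backend can be, here is a toy in-memory store. TinyVectorStore is hypothetical, its method signatures are assumptions (the real VectorStore signatures are not shown on this page), and search_mmr is omitted for brevity.

```python
import json
import math
import os
import tempfile


class TinyVectorStore:
    """Hypothetical in-memory backend with JSON persistence."""

    def __init__(self):
        self.texts = []
        self.vectors = []

    def add(self, texts, vectors):
        self.texts.extend(texts)
        self.vectors.extend(list(v) for v in vectors)

    def search(self, query_vector, top_k=3):
        # Returns (score, text) pairs, best match first.
        scored = [
            (self._cosine(query_vector, vector), text)
            for vector, text in zip(self.vectors, self.texts)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]

    def save(self, path):
        with open(path, "w", encoding="utf-8") as handle:
            json.dump({"texts": self.texts, "vectors": self.vectors}, handle)

    def load(self, path):
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
        self.texts, self.vectors = data["texts"], data["vectors"]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0


def roundtrip(store):
    # Save to a temporary file and load into a fresh store.
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "store.json")
        store.save(path)
        restored = TinyVectorStore()
        restored.load(path)
    return restored
```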

Advanced reranking

BM25 rerank is built-in (rank-bm25)
Cross-encoder and other rerankers live in retrieval/* modules

Ingestion override rule (important)

If Retriever implements add_document explicitly, RAGPipeline defers chunking to it. This allows advanced retrievers to control their own ingest behavior.

7) Graph engine internals

The graph engine is built around a fluent builder and a compiled executor. The builder is StateGraph; the compiled form is CompiledGraph.

Builder behavior (StateGraph)

add_node(name, fn)
add_edge(src, dst)
add_conditional_edge(src, condition_fn, mapping)
set_entry_point(name)
compile()

Validation happens in compile(), which calls _validate(). If entry point is missing or invalid, GraphConfigError is raised.

Where to read:

src/synapsekit/graph/graph.py
src/synapsekit/graph/edge.py
src/synapsekit/graph/node.py
src/synapsekit/graph/errors.py
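The builder-then-validate pattern described above might look roughly like this. The class below is a simplified stand-in written for this page, not the code in graph.py; conditional edges are left out, and returning a plain adjacency dict from compile() is an assumption.

```python
class GraphConfigError(Exception):
    """Raised when the graph definition is not runnable."""


class StateGraph:
    """Simplified stand-in for the fluent builder."""

    def __init__(self):
        self._nodes = {}
        self._edges = []
        self._entry = None

    def add_node(self, name, fn):
        self._nodes[name] = fn
        return self  # returning self enables chaining

    def add_edge(self, src, dst):
        self._edges.append((src, dst))
        return self

    def set_entry_point(self, name):
        self._entry = name
        return self

    def _validate(self):
        if self._entry is None:
            raise GraphConfigError("entry point is not set")
        if self._entry not in self._nodes:
            raise GraphConfigError(f"unknown entry point: {self._entry!r}")

    def compile(self):
        # Validation is deferred until compile(), as described above.
        self._validate()
        adjacency = {name: [] for name in self._nodes}
        for src, dst in self._edges:
            adjacency.setdefault(src, []).append(dst)
        return adjacency
```

Deferring validation to compile() means nodes and edges can be added in any order while building.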

Execution overview

[Figure: Graph execution flow]

CompiledGraph execution model (more detail)

CompiledGraph pre-builds an adjacency map for O(1) edge lookup per node. Execution happens in waves; nodes in the same wave can run concurrently.

Key internal details:

_adj is built once in __init__ for fast edge traversal
max_steps defaults to 100 to prevent infinite cycles
state is copied at the start of run()
transient keys are injected into state for subgraphs and stripped later

Relevant transient keys:

checkpointer
graph_id
step

Where to read:

src/synapsekit/graph/compiled.py
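Wave execution can be sketched with asyncio.gather. This is a simplified model under stated assumptions: run_waves is a hypothetical function, nodes are async callables that return dict updates, raising RuntimeError at max_steps is a guess, and a node is scheduled as soon as any predecessor finishes rather than waiting for all of them.

```python
import asyncio


async def run_waves(nodes, adjacency, entry, state, max_steps=100):
    state = dict(state)  # copy so the caller's dict is not mutated
    wave = [entry]
    steps = 0
    while wave:
        if steps >= max_steps:
            raise RuntimeError("max_steps exceeded; possible cycle")
        steps += 1
        # Nodes in the same wave run concurrently, each on a snapshot.
        updates = await asyncio.gather(
            *(nodes[name](dict(state)) for name in wave)
        )
        for update in updates:
            state.update(update)
        # The next wave is every successor of the current wave, deduplicated.
        next_wave = []
        for name in wave:
            for successor in adjacency.get(name, []):
                if successor not in next_wave:
                    next_wave.append(successor)
        wave = next_wave
    return state
```

The max_steps guard is what turns an accidental cycle into an error instead of a hang.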

Graph events and streaming

Graph execution emits typed events via EventHooks. These can be used for SSE and WebSocket streaming.

Key event types:

node_start
node_complete
wave_start
wave_complete
error

Where to read:

src/synapsekit/graph/streaming.py
docs/graph/mermaid

Mermaid view of graph event flow:

flowchart TD
    A[compiled.run] --> B[node_start]
    B --> C[node_complete]
    C --> D[wave_complete]
    C --> E[error]
    D --> F[next wave]
    F --> B
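As an illustration of how typed events can feed an SSE stream, the sketch below pairs a tiny hook registry with a Server-Sent Events formatter. The on/emit method names and the event payload shape are assumptions; only the event type names come from the list above.

```python
import json
from collections import defaultdict


class EventHooks:
    """Hypothetical minimal hook registry keyed by event type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def emit(self, event_type, **payload):
        event = {"type": event_type, **payload}
        for handler in self._handlers[event_type]:
            handler(event)


def to_sse(event):
    # One SSE frame: an event name, a JSON data line, and a blank line.
    return f"event: {event['type']}\ndata: {json.dumps(event)}\n\n"
```

A handler registered for node_start can append to_sse(event) to a response stream, while events with no registered handler are dropped.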

ExecutionTrace and GraphVisualizer

ExecutionTrace records all events with timestamps and duration. GraphVisualizer renders the trace in three formats:

ASCII timeline
Mermaid diagram (optionally trace-highlighted)
Standalone HTML with embedded Mermaid

Where to read:

src/synapsekit/graph/trace.py
src/synapsekit/graph/visualization.py

Checkpointing

Checkpointers are pluggable storage backends. Built-ins include memory, SQLite, Redis, Postgres, and JSON file checkpointers.

Where to read:

src/synapsekit/graph/checkpointers/*
docs/graph/checkpointing
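A pluggable checkpointer only needs to persist and restore state by graph id. The class below is a hypothetical JSON-file version; the save/load method names and their arguments are assumptions, not the interface in graph/checkpointers.

```python
import json
import tempfile
from pathlib import Path


class JSONFileCheckpointer:
    """Hypothetical checkpointer: one JSON file per graph id."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, graph_id):
        return self.directory / f"{graph_id}.json"

    def save(self, graph_id, step, state):
        # Overwrite the previous checkpoint for this graph.
        payload = {"step": step, "state": state}
        self._path(graph_id).write_text(json.dumps(payload), encoding="utf-8")

    def load(self, graph_id):
        path = self._path(graph_id)
        if not path.exists():
            return None  # nothing to resume from
        return json.loads(path.read_text(encoding="utf-8"))


def demo():
    with tempfile.TemporaryDirectory() as directory:
        checkpointer = JSONFileCheckpointer(directory)
        checkpointer.save("run-1", 2, {"x": 1})
        return checkpointer.load("run-1"), checkpointer.load("missing")
```

Resuming a workflow then amounts to loading the last checkpoint and continuing from its step.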

Graph how-to quick guide (minimal)

Build your StateGraph with clear node names.
Use TypedState when you need reducers or structured state.
Use EventHooks for logging and debugging.
Use ExecutionTrace when you need post-mortem analysis.
Use checkpointers for resumable workflows.

8) Agent loop internals (ReAct and Function Calling)

ReActAgent (tool loop)

ReActAgent uses a strict prompt format and a scratchpad. It parses “Thought / Action / Action Input” blocks, executes tools, and continues.

Key components:

ToolRegistry (tool schemas)
AgentScratchpad (history of steps)
_parse_action / _parse_final_answer helpers
AgentMemory (optional persistent memory)

Where to read:

src/synapsekit/agents/react.py
src/synapsekit/agents/memory.py
src/synapsekit/agents/registry.py

FunctionCallingAgent

Function calling relies on provider-native tool call APIs. The agent selects tools via structured tool call payloads rather than text parsing. This eliminates brittle string parsing and makes tool schemas first-class.

Where to read:

src/synapsekit/agents/function_calling.py

Agent loop step-by-step (ReAct)

The ReAct flow is deterministic at a high level:

Build system + user messages (including scratchpad)
LLM returns Thought / Action / Action Input
Tool is resolved via ToolRegistry
Tool is executed and Observation appended
Scratchpad is updated
Repeat until Final Answer

If a tool name is unknown, the agent raises an error rather than guessing. This is intentional to prevent hallucinated tool calls.
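The loop above can be sketched with two regular expressions and a plain dict of tools. Everything here is a stand-in: run_react, the regexes, and the callable llm are hypothetical, and the real agent uses ToolRegistry and AgentScratchpad rather than a dict and a string.

```python
import re

ACTION_RE = re.compile(
    r"Action:\s*(?P<tool>.+?)\s*\nAction Input:\s*(?P<input>.*)", re.DOTALL
)
FINAL_RE = re.compile(r"Final Answer:\s*(?P<answer>.*)", re.DOTALL)


def run_react(llm, tools, question, max_steps=5):
    scratchpad = ""
    for _ in range(max_steps):
        reply = llm(f"Question: {question}\n{scratchpad}")
        final = FINAL_RE.search(reply)
        if final:
            return final.group("answer").strip()
        action = ACTION_RE.search(reply)
        if action is None:
            raise ValueError("reply has neither an action nor a final answer")
        name = action.group("tool").strip()
        if name not in tools:
            # Fail loudly instead of guessing a similar tool.
            raise KeyError(f"unknown tool: {name}")
        observation = tools[name](action.group("input").strip())
        # Append the step so the next prompt includes the history.
        scratchpad += f"{reply}\nObservation: {observation}\n"
    raise RuntimeError("no final answer within max_steps")
```

Raising on an unknown tool name mirrors the behavior described above: the loop stops rather than executing something the model invented.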

ReasoningAgent routing

ReasoningAgent wraps a fast LLM and a reasoning LLM. It routes based on a ComplexityClassifier.

Key features:

LLM-based classification if classifier_llm is provided
Heuristic fallback (length, keywords, question count)
Budgeted reasoning LLM wrapper with thinking token limits

Where to read:

src/synapsekit/agents/reasoning_agent.py

Mermaid view of the routing logic:

flowchart TD
    Q[User query] --> C[ComplexityClassifier]
    C -->|simple| F[Fast LLM AgentExecutor]
    C -->|complex| R[Reasoning LLM AgentExecutor]
    R --> B[Budget guard]
    F --> A[Final answer]
    R --> A
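The heuristic fallback (length, keywords, question count) could look something like the sketch below. The keyword list and thresholds are invented for illustration, and classify and route are hypothetical names rather than the ComplexityClassifier API.

```python
# Illustrative keyword list; the real heuristic's keywords are not documented here.
REASONING_KEYWORDS = ("prove", "derive", "step by step", "compare", "why")


def classify(query, max_words=40):
    """Return "complex" if any heuristic signal fires, else "simple"."""
    text = query.lower()
    signals = [
        len(text.split()) > max_words,                        # long query
        any(keyword in text for keyword in REASONING_KEYWORDS),  # keywords
        text.count("?") > 1,                                  # several questions
    ]
    return "complex" if any(signals) else "simple"


def route(query, fast, reasoning):
    # Send the query to whichever callable matches its class.
    handler = reasoning if classify(query) == "complex" else fast
    return handler(query)
```

A heuristic like this costs no tokens, which is why it is a reasonable fallback when no classifier_llm is configured.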

9) Observability and evaluation pipeline

Observability is a combination of spans + token tracing. It is implemented by a small runtime that manages spans, exporters, and sampling.

Key pieces:

TokenTracer: records tokens, latency, cost, and quality metrics
observe.runtime: start_span / end_span / record_exception
RAGEvaluator: optional async evaluation of RAG quality
PrometheusMetrics: optional metrics exporter

Where to read:

src/synapsekit/observability/tracer.py
src/synapsekit/observe/runtime.py
src/synapsekit/observability/metrics.py
src/synapsekit/evaluation/rag_evaluator.py
docs/observability/overview
docs/evalci/overview

Observe runtime internals

Observe uses a global _STATE and a ContextVar to track the current span. It supports multiple exporters (console, OTLP, Jaeger, Langfuse, Honeycomb). Sampling is controlled by ObserveConfig.sample_rate. Sensitive keys can be redacted with ObserveConfig.redact_keys.

This is intentionally simple so instrumentation has low overhead.

Metrics pipeline (Prometheus)

PrometheusMetrics can record:

synapsekit_cost_usd_total
synapsekit_tokens_total
synapsekit_latency_seconds

These metrics are emitted when the llm.generate span is recorded. If you enable metrics, they attach to the observe runtime and are updated once per span.

How traces flow

For example, RAGPipeline.stream() emits:

rag.ask
rag.retrieve
rag.generate (inside LLM call)

Agent loops emit:

agent.run
agent.step
agent.final_answer

Graph flows emit:

graph.run
graph.wave
graph.node

These spans are deliberately named so dashboards can aggregate across features.

Observability quick guide (minimal)

Call observe.configure() early in your app
Choose an exporter (console, OTLP, Jaeger, Langfuse, or Honeycomb)
Optionally enable PrometheusMetrics
Verify spans by checking exporter output

10) Plugins and extension hooks

The plugin system is intentionally small but fully async.

Key classes:

BasePlugin: subclass with a name, optional on_load / on_unload
PluginRegistry: register, load, unload, list, get
load_plugin_from_path: dynamic import + registration

Where to read:

src/synapsekit/plugins/base.py
src/synapsekit/plugins/registry.py
src/synapsekit/plugins/loader.py
docs/plugins

Mermaid view of plugin lifecycle:

flowchart TD
    A[Plugin class] --> B[registry.register]
    B --> C[registry.load]
    C --> D[instance.on_load]
    D --> E[Plugin active]
    E --> F[registry.unload]
    F --> G[instance.on_unload]

This design keeps plugins explicit and predictable. There is no hidden global auto-loading unless you call it.
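The lifecycle in the diagram can be sketched as follows. These classes are simplified stand-ins that borrow the names BasePlugin and PluginRegistry; the method signatures, the choice to key plugins by name, and AuditPlugin are assumptions.

```python
import asyncio


class BasePlugin:
    name = "base"

    async def on_load(self):
        pass

    async def on_unload(self):
        pass


class PluginRegistry:
    def __init__(self):
        self._classes = {}  # registered but not necessarily active
        self._active = {}   # loaded instances

    def register(self, plugin_cls):
        self._classes[plugin_cls.name] = plugin_cls

    async def load(self, name):
        instance = self._classes[name]()
        await instance.on_load()
        self._active[name] = instance
        return instance

    async def unload(self, name):
        instance = self._active.pop(name)
        await instance.on_unload()

    def list(self):
        return sorted(self._active)


class AuditPlugin(BasePlugin):
    """Hypothetical plugin that records its own lifecycle."""

    name = "audit"

    def __init__(self):
        self.events = []

    async def on_load(self):
        self.events.append("loaded")

    async def on_unload(self):
        self.events.append("unloaded")
```

Registration and loading are separate steps, so a plugin does nothing until load() is awaited.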

11) API stability markers (public_api / experimental / deprecated)

SynapseKit uses lightweight decorators to mark API stability. These do not change runtime behavior, but they attach metadata and warnings.

Where to read:

src/synapsekit/_api.py

Summary:

public_api: marks stable interfaces
experimental: warns on first use
deprecated: warns with a reason and alternative

Why this matters:

You can build tooling to detect experimental APIs via attributes
You can surface deprecations in your own docs or CLI

This is helpful when deciding what to depend on.
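Stability decorators of this kind are usually a few lines each. The sketch below is one plausible shape, not the contents of _api.py: the __stability__ attribute name, the warning categories, and the deprecated(reason, alternative) signature are assumptions.

```python
import functools
import warnings


def public_api(obj):
    obj.__stability__ = "public"  # metadata only, no wrapper
    return obj


def experimental(fn):
    warned = False

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        nonlocal warned
        if not warned:  # warn on first use only
            warnings.warn(f"{fn.__name__} is experimental", FutureWarning, stacklevel=2)
            warned = True
        return fn(*args, **kwargs)

    wrapper.__stability__ = "experimental"
    return wrapper


def deprecated(reason, alternative):
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            message = f"{fn.__name__} is deprecated: {reason}. Use {alternative} instead."
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return fn(*args, **kwargs)

        wrapper.__stability__ = "deprecated"
        return wrapper

    return decorate


@experimental
def fast_path(x):
    return x * 2


@deprecated(reason="superseded", alternative="fast_path")
def slow_path(x):
    return x * 2
```

Tooling can then inspect the attribute, for example getattr(fast_path, "__stability__", None), to flag experimental dependencies.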

12) Optional dependencies and lazy imports

SynapseKit intentionally has only two hard dependencies. Everything else is opt-in and imported lazily.

Common extras:

synapsekit[openai]
synapsekit[chroma]
synapsekit[redis]
synapsekit[postgres]
synapsekit[performance]
synapsekit[all]

Lazy import behavior:

Optional modules are imported inside methods, not at module import time.
If an extra is missing, you get an ImportError only when you use that feature.

This keeps core installs small while still allowing advanced features.
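The lazy-import pattern can be sketched with importlib. The require helper and the wording of the error message are hypothetical; the point is only that the import happens inside the call, so a missing extra fails at use time rather than at package import time.

```python
import importlib


def require(module_name, extra):
    """Import an optional dependency on demand, with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f"Install it with: pip install 'synapsekit[{extra}]'"
        ) from exc


def parse_config(text):
    # The dependency is resolved only when this feature is used.
    json = require("json", "core")
    return json.loads(text)
```

Chaining with "from exc" keeps the original ImportError in the traceback, which helps when the module exists but one of its own imports fails.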

13) Extension points (where to hook in)

This section is intentionally practical. It lists the actual files you touch and the typical sequence.

Add a new LLM provider (guide)

Minimal steps:

Create src/synapsekit/llm/<provider>.py and implement BaseLLM
Wire it into the factory (src/synapsekit/llm/_factory.py)
Add optional dependency in pyproject.toml extras
Add tests under tests/llm
Add docs under synapsekit-docs/docs/llms/<provider>.md

Key file: src/synapsekit/llm/_factory.py (provider auto-detection)

Tip: provider auto-detection is done by model name prefix. If your models follow a unique prefix, add it there.

Add a vector store (guide)

Minimal steps:

Create src/synapsekit/retrieval/<backend>.py
Implement VectorStore.add/search/search_mmr/save/load
Add optional dependency in pyproject.toml extras
Add tests under tests/retrieval
Add docs under docs/rag/vector-stores or docs/api/vector-store

Tip: keep VectorStore small and let Retriever own reranking.

Add a tool (guide)

Minimal steps:

Implement BaseTool under src/synapsekit/agents/tools
Export in src/synapsekit/agents/__init__.py
Re-export in src/synapsekit/__init__.py
Add tests under tests/agents or tests/tools
Add docs under docs/agents/tools or docs/guides/agents

Tip: keep tools pure and side-effect free when possible.

Add a graph node type (guide)

Implement a node function that accepts and returns dict state
Register it with StateGraph.add_node()
If it needs metadata, attach metadata on add_node(..., metadata={...})
Use TypedState when merging needs reducers

Tip: use EventHooks to debug node ordering.

Add a plugin (guide)

Subclass BasePlugin and define name
Register with PluginRegistry.register()
Optionally load dynamically via load_plugin_from_path()
Use on_load/on_unload for async setup/teardown

These extension points are deliberately explicit and documented.

14) Common pitfalls (and how to avoid them)

Mixing sync wrappers inside a running asyncio loop. Use the async APIs in servers and long-running apps.
Assuming the pipeline still chunks documents for a custom retriever. If your Retriever implements add_document, RAGPipeline defers chunking to it, so make sure it handles ingestion explicitly.
Forgetting to pass metadata on ingestion. Metadata powers source tracing, filtering, and evaluation.
Overusing global state for tools or agents. Prefer dependency injection when possible.
Ignoring TokenTracer and spans. Observability makes debugging much faster in production.
Building graphs without setting an entry point. compile() raises GraphConfigError; call set_entry_point() before compile().
Calling graph streaming without attaching hooks. If you want events, pass EventHooks or use sse_stream/ws_stream.
Returning non-dict state from graph nodes. StateGraph expects dict state updates or reducers.

15) Suggested next reads

Graph Workflows overview: /docs/graph/overview
RAG pipeline details: /docs/rag/pipeline
Agent system overview: /docs/agents/overview
Observability overview: /docs/observability/overview
API reference: /docs/api/llm

16) Quick mental model (one paragraph)

Think of SynapseKit as a thin, async-first core with small but explicit contracts. RAGPipeline and StateGraph are orchestration layers built on BaseLLM and VectorStore. Agents are loops over those same contracts, with tool execution layered on top. Observability and evaluation are cross-cutting concerns that instrument the flow. Plugins and optional extras keep the system extensible without bloating the base install.
