Troubleshooting

Diagnosis steps and fixes for common SynapseKit symptoms, organized by what you observe.

RAG returns irrelevant results

Symptoms: Answers are off-topic, contain hallucinations, or say "I don't know" when the document clearly has the answer.

Diagnosis checklist:

Chunk size too large or too small. Large chunks dilute the signal; small chunks lose context.

# Start here: 512 tokens with 64-token overlap
from synapsekit.rag.splitters import RecursiveCharacterSplitter
splitter = RecursiveCharacterSplitter(chunk_size=512, chunk_overlap=64)

k too small. You may not be retrieving the relevant chunk at all.

retriever = store.as_retriever(k=6)  # retrieve more, then rerank

Wrong retrieval strategy. Try hybrid (BM25 + vector) or MMR for diverse coverage.

# Option 1: hybrid search (BM25 + vector)
retriever = store.as_retriever(strategy="hybrid", k=6)
# Option 2: MMR for diverse coverage
retriever = store.as_retriever(strategy="mmr", k=6, lambda_mult=0.5)

Embedding model mismatch. The embedding model used at index time must match query time.

# Both must use the same model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
store = ChromaVectorStore(embeddings=embeddings)

Inspect what is actually retrieved:

docs = await retriever.retrieve("your query")
for doc in docs:
    print(doc.metadata, doc.content[:200])
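
To get a feel for what lambda_mult trades off in the MMR strategy mentioned above, here is a minimal, library-independent sketch of maximal marginal relevance on toy 2-D vectors. The names cosine and mmr_select are hypothetical helpers written for this illustration; this is not SynapseKit's implementation, only the general technique.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet)
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

query = [1.0, 0.0]
docs = [
    [1.0, 0.0],   # 0: most relevant
    [0.99, 0.1],  # 1: near-duplicate of document 0
    [0.6, 0.8],   # 2: less relevant, but covers a different direction
]

relevance_only = mmr_select(query, docs, k=2, lambda_mult=1.0)
diverse = mmr_select(query, docs, k=2, lambda_mult=0.3)
print(relevance_only)  # [0, 1]
print(diverse)         # [0, 2]
```

With lambda_mult=1.0 the near-duplicate is retrieved; lowering it swaps the duplicate for a chunk that adds new information, which is often what a RAG prompt needs.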

My agent is looping

Symptoms: The agent keeps calling tools in circles, exceeds max_steps, or never returns a final answer.

Diagnosis checklist:

Set max_steps to catch infinite loops early:

agent = ReActAgent(llm=llm, tools=tools, max_steps=10)

Inspect the thought process. Add verbose logging to see each step:

agent = ReActAgent(llm=llm, tools=tools, verbose=True)

Temperature too high. High temperature causes inconsistent action selection.

from synapsekit.llms.config import LLMConfig
llm = OpenAILLM(model="gpt-4o", config=LLMConfig(temperature=0.0))

Tool descriptions are ambiguous. The LLM may not know when to stop using a tool. Make the stopping condition explicit in the tool description.

Prompt missing a termination instruction. Add a clear stopping condition:

When you have enough information to answer the question, stop using tools and return FINAL ANSWER: <answer>.
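
When reading the verbose trace, a loop usually shows up as the same tool being called with the same arguments over and over. The sketch below counts such repeats. It assumes you have copied the trace into a list of (tool name, arguments) pairs by hand; find_repeated_calls and the trace format are hypothetical and not part of SynapseKit.

```python
import json
from collections import Counter

def find_repeated_calls(steps, min_repeats=3):
    """Return (tool, serialized args) pairs seen at least min_repeats times."""
    counts = Counter(
        (tool, json.dumps(args, sort_keys=True)) for tool, args in steps
    )
    return [call for call, n in counts.items() if n >= min_repeats]

# Hypothetical trace transcribed from verbose output
trace = [
    ("search", {"q": "capital of France"}),
    ("search", {"q": "capital of France"}),
    ("calculator", {"expr": "2+2"}),
    ("search", {"q": "capital of France"}),
]

repeated = find_repeated_calls(trace)
print(repeated)
```

A non-empty result points at the tool whose description or stopping condition most likely needs tightening.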

Streaming is not working

Symptoms: async for chunk in llm.stream(prompt) hangs, returns all at once, or raises RuntimeError: no running event loop.

Diagnosis checklist:

Running in a sync context. Streaming requires an async context.

import asyncio

# Wrong: a plain for loop cannot iterate an async stream
for chunk in llm.stream("Hello"):  # typically raises TypeError (object is not iterable)
    print(chunk)

# Correct: iterate with async for inside a coroutine
async def main():
    async for chunk in llm.stream("Hello"):
        print(chunk, end="", flush=True)

asyncio.run(main())

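Jupyter notebooks. A notebook already runs an event loop, so asyncio.run() fails with RuntimeError: asyncio.run() cannot be called from a running event loop, and loop.run_until_complete() fails with RuntimeError: This event loop is already running. Either await the coroutine directly in a cell, or apply nest_asyncio: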

import nest_asyncio
nest_asyncio.apply()

FastAPI / Starlette. Use StreamingResponse with an async generator:

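from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/stream")
async def stream_endpoint(q: str):
    # Wrap the LLM stream in an async generator so chunks are sent as they arrive
    async def generate():
        async for chunk in llm.stream(q):
            yield chunk
    return StreamingResponse(generate(), media_type="text/plain")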

Provider does not support streaming. Check the provider's docs. OpenAI and Anthropic chat models generally support streaming; some smaller providers may not.
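
The sync-versus-async point above can be reproduced without any LLM. This sketch uses a stand-in async generator (fake_stream, a hypothetical name) in place of llm.stream and shows that a plain for loop fails while async for works.

```python
import asyncio

async def fake_stream(prompt):
    """Stand-in for a streaming LLM call: yields one word at a time."""
    for word in prompt.split():
        await asyncio.sleep(0)  # hand control back to the event loop
        yield word

def consume_sync(prompt):
    """Try to drive the async generator with a plain for loop."""
    try:
        for _ in fake_stream(prompt):
            pass
    except TypeError as exc:
        return str(exc)
    return "no error"

async def consume_async(prompt):
    """Drive the async generator correctly with async for."""
    return [chunk async for chunk in fake_stream(prompt)]

sync_error = consume_sync("hello streaming world")
chunks = asyncio.run(consume_async("hello streaming world"))
print(sync_error)
print(chunks)
```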

ImportError for a backend

Symptoms: ImportError: ChromaDB is not installed. Run: pip install synapsekit[chroma]

Fix: Install the extra indicated in the error message.

pip install "synapsekit[chroma]"
pip install "synapsekit[redis]"
pip install "synapsekit[postgres]"
pip install "synapsekit[serve]"
pip install "synapsekit[all]"   # install everything

See Error Reference — ImportError for the full extras table.

Graph checkpoint not saving

Symptoms: graph.resume() raises CheckpointNotFoundError, or state is lost between runs.

Diagnosis checklist:

Checkpointer not attached. You must pass a checkpointer at compile time.

from synapsekit.graph.checkpointing import SQLiteCheckpointer

checkpointer = SQLiteCheckpointer(path="checkpoints.db")
graph = workflow.compile(checkpointer=checkpointer)

Different thread_id between run and resume. Both calls must use the same ID.

thread_id = "my-conversation-1"
await graph.run(state, thread_id=thread_id)
await graph.resume(thread_id=thread_id, updates={...})

Redis/Postgres not reachable. Verify the REDIS_URL or DATABASE_URL environment variable is set and the service is running.

redis-cli -u "$REDIS_URL" ping  # should return PONG
psql "$DATABASE_URL" -c "SELECT 1"

Checkpointer table not created. For Postgres, run the schema migration:

await PostgresCheckpointer.create_tables(DATABASE_URL)
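
Why a mismatched thread_id looks like "no checkpoint" is easier to see in a toy model. The sketch below stores state in an in-memory SQLite table keyed by thread_id. The table layout and the save/load helpers are assumptions made for illustration; they do not describe SynapseKit's actual checkpoint schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkpoints (thread_id TEXT PRIMARY KEY, state TEXT)")

def save(thread_id, state):
    """Store (or overwrite) the state for one thread."""
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints (thread_id, state) VALUES (?, ?)",
        (thread_id, json.dumps(state)),
    )
    conn.commit()

def load(thread_id):
    """Fetch the state for a thread, or raise if nothing was saved under that ID."""
    row = conn.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no checkpoint for thread_id={thread_id!r}")
    return json.loads(row[0])

save("my-conversation-1", {"step": 3})
print(load("my-conversation-1"))

try:
    load("my-conversation-2")  # different ID, so nothing is found
except KeyError as exc:
    print("lookup failed:", exc)
```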

Rate limit exceeded

Symptoms: RateLimitError: Rate limit reached for gpt-4o after many requests.

Fix: Configure LLMConfig with a rate limiter and automatic retries.

from synapsekit.llms.config import LLMConfig

llm = OpenAILLM(
    model="gpt-4o",
    config=LLMConfig(
        max_retries=5,
        requests_per_minute=60,    # stay under the limit
        tokens_per_minute=90_000,
    )
)

For batch workloads, add delays between calls:

import asyncio

results = []
for batch in batches:
    result = await llm.complete(batch)
    results.append(result)
    await asyncio.sleep(1.0)  # pause 1 second between calls
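
A fixed delay is the simplest option; retries often work better with exponential backoff and jitter. The following is a minimal sketch of that pattern, assuming a stand-in exception (FakeRateLimitError) and a hypothetical helper call_with_backoff. It does not describe what LLMConfig(max_retries=...) does internally.

```python
import asyncio
import random

class FakeRateLimitError(Exception):
    """Stand-in for a provider's rate-limit exception."""

async def call_with_backoff(func, max_retries=5, base_delay=0.01, max_delay=1.0):
    """Retry func on rate-limit errors, doubling the delay cap each attempt."""
    for attempt in range(max_retries + 1):
        try:
            return await func()
        except FakeRateLimitError:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, delay))  # full jitter

attempts = 0

async def flaky_call():
    """Fails twice with a rate-limit error, then succeeds."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise FakeRateLimitError("rate limit reached")
    return "ok"

result = asyncio.run(call_with_backoff(flaky_call))
print(result, attempts)
```

The tiny base_delay keeps the demo fast; in practice a base of around a second is a more typical starting point.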

My eval cases always pass

Symptoms: @eval_case tests never fail even when the output is clearly wrong.

Diagnosis checklist:

Threshold too low. If the threshold is left at a permissive default (it may be as low as 0.5), weak outputs still pass. Set a stricter value explicitly.

@eval_case(metric="faithfulness", threshold=0.85)
async def test_rag():
    ...

Wrong metric for the task. Use faithfulness for RAG (does the answer match the context), answer_relevancy for Q&A, groundedness for factual claims.

MockLLM grader. If you are using MockLLM in tests, swap in a real LLM for the evaluation judge.

from synapsekit.evaluation import Evaluator
evaluator = Evaluator(judge_llm=OpenAILLM(model="gpt-4o"))

Empty or trivial outputs. If the RAG returns "I don't know" every time, faithfulness is technically 1.0 (no hallucinations). Check answer_relevancy as well.
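
The last point is simple arithmetic: a case gated on one metric passes whenever that metric is high, regardless of the others. This sketch, with made-up judge scores and a hypothetical passes helper, shows how adding a second threshold catches the "I don't know" answer.

```python
def passes(scores, thresholds):
    """Pass only if every listed metric meets its own threshold."""
    return all(scores[name] >= minimum for name, minimum in thresholds.items())

# Made-up judge scores for an answer of "I don't know"
scores = {"faithfulness": 1.0, "answer_relevancy": 0.1}

single_metric = passes(scores, {"faithfulness": 0.85})
both_metrics = passes(scores, {"faithfulness": 0.85, "answer_relevancy": 0.7})
print(single_metric)  # True
print(both_metrics)   # False
```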

synapsekit serve returns 500

Symptoms: All requests to the served endpoint return HTTP 500 Internal Server Error.

Diagnosis checklist:

Import path is wrong. Verify the module:attribute format.

# Check the object loads correctly in Python first
python -c "from myapp import rag; print(rag)"

# Then serve it
synapsekit serve myapp:rag

Object is not an Agent, Pipeline, or Graph. synapsekit serve only accepts these types. Pass a compatible object.

Missing environment variables. The server process inherits the shell environment. Ensure API keys are exported.

export OPENAI_API_KEY=sk-...
synapsekit serve myapp:rag

Check the server logs. Run with --log-level debug for the full traceback.

synapsekit serve myapp:rag --log-level debug
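
To separate "the module cannot be imported" from "the attribute does not exist", it can help to resolve the module:attribute string by hand. The helper below is a sketch of how such a string is commonly resolved with importlib; load_target is a hypothetical name, and this is not necessarily how synapsekit serve does it. The standard-library target json:dumps stands in for myapp:rag.

```python
import importlib

def load_target(spec):
    """Resolve a 'module:attribute' string to the object it names."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    module = importlib.import_module(module_name)  # ImportError if the module is missing
    if not hasattr(module, attr):
        raise AttributeError(f"module {module_name!r} has no attribute {attr!r}")
    return getattr(module, attr)

target = load_target("json:dumps")
print(target({"a": 1}))
```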

Memory not persisting between runs

Symptoms: The agent starts fresh on every run and has no recall of previous conversations.

Diagnosis checklist:

In-memory store. The default ConversationMemory is in-process and lost on restart. Switch to a persistent backend.

from synapsekit.memory import RedisConversationMemory, SQLiteConversationMemory

# Both backends persist across restarts; pick one
memory = SQLiteConversationMemory(path="memory.db", session_id="user-123")
# memory = RedisConversationMemory(redis_url="redis://localhost:6379/0", session_id="user-123")

Different session_id each run. The session ID must be the same to retrieve the same history.

# Use a stable identifier — user ID, conversation ID, etc.
memory = SQLiteConversationMemory(path="memory.db", session_id="user-123")

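SQLite file path changes. A relative path resolves against the current working directory, so launching the app from a different directory opens a different (empty) database file. Use an absolute path.

import os

db_path = os.path.expanduser("~/.synapsekit/memory.db")
os.makedirs(os.path.dirname(db_path), exist_ok=True)  # make sure the directory exists
memory = SQLiteConversationMemory(path=db_path, session_id="user-123")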
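
The effect of the working directory on a relative path can be checked with the standard library alone. This sketch resolves the same relative name from two directories and compares the results.

```python
import os
import tempfile
from pathlib import Path

relative = Path("memory.db")
start_dir = Path.cwd()

with tempfile.TemporaryDirectory() as tmp:
    first = relative.resolve()    # resolved against the original directory
    os.chdir(tmp)
    second = relative.resolve()   # same name, resolved against the temp directory
    os.chdir(start_dir)           # restore before the temp directory is removed

absolute = Path.home() / ".synapsekit" / "memory.db"
print(first != second)         # True: two different files
print(absolute.is_absolute())  # True: independent of the working directory
```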

Function calling not working

Symptoms: Tools are never called, the agent always responds with plain text, or you get NotImplementedError.

Supported providers: Function calling is only available on providers that expose a tool-call API.

Provider (models)	Function calling
OpenAI (gpt-4o, gpt-4o-mini, gpt-3.5-turbo)	Yes
Anthropic (claude-3+)	Yes
Groq (llama-3, mixtral)	Yes
Mistral (mistral-large, mistral-medium)	Yes
Google Gemini (gemini-1.5+)	Yes
AWS Bedrock (Claude, Llama 3)	Yes
Cohere (command-r+)	Yes
DeepSeek (deepseek-chat)	Yes
Ollama	Model-dependent

Fix for unsupported models: Use ReActAgent (text-based tool use via prompting) instead of FunctionCallingAgent.

# FunctionCallingAgent requires native tool-call support
agent = FunctionCallingAgent(llm=OpenAILLM(model="gpt-4o"), tools=[...])

# ReActAgent works with any model via prompting
agent = ReActAgent(llm=OllamaLLM(model="llama3"), tools=[...])

Embeddings are slow

Symptoms: Indexing or retrieval takes too long, especially in a loop.

Diagnosis and fixes:

Enable embedding caching. Re-embedding the same text repeatedly is wasteful.

from synapsekit.llms.caching import RedisLLMCache

cache = RedisLLMCache(redis_url="redis://localhost:6379/0")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", cache=cache)

Use batch embedding. Embed many texts in one API call.

texts = [doc.content for doc in documents]
vectors = await embeddings.embed_batch(texts)  # one API call

Use a local embedding model for development. Avoids network round-trips entirely.

from synapsekit.llms.embeddings import SentenceTransformerEmbeddings
embeddings = SentenceTransformerEmbeddings(model="all-MiniLM-L6-v2")

Parallelise large indexing jobs.

import asyncio

async def embed_all(texts: list[str]):
    tasks = [embeddings.embed(t) for t in texts]
    return await asyncio.gather(*tasks)
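
An unbounded gather starts every request at once, which can run straight into provider rate limits on large jobs. A common refinement is to cap concurrency with a semaphore. The sketch below uses a stand-in fake_embed coroutine instead of a real embeddings client, and embed_all_bounded is a hypothetical helper name.

```python
import asyncio

async def fake_embed(text):
    """Stand-in for an embeddings API call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [float(len(text))]

async def embed_all_bounded(texts, max_concurrency=8):
    """Embed all texts, running at most max_concurrency calls at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(text):
        async with semaphore:
            return await fake_embed(text)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(embed_one(t) for t in texts))

vectors = asyncio.run(embed_all_bounded(["a", "bb", "ccc"], max_concurrency=2))
print(vectors)  # [[1.0], [2.0], [3.0]]
```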

BudgetExceededError immediately

Symptoms: Every call raises BudgetExceededError even for cheap requests, including the very first one.

Diagnosis checklist:

Circuit breaker is tripped. Once a budget limit has been exceeded, the guard stays tripped and rejects every later call until you reset it.

guard.reset()  # reset the tripped circuit breaker

Limit is too tight. A per_request limit of 0.001 (0.1 cent) will fire even for short responses with gpt-4o.

from synapsekit import BudgetLimit

# Realistic limits (2026 prices)
limit = BudgetLimit(
    per_request=0.05,    # 5 cents per call
    per_day=5.00,        # $5 / day
    per_month=50.00,     # $50 / month
)

Cost estimate is using the wrong model pricing. Ensure the model name passed to BudgetGuard matches the model you are actually using.

guard = BudgetGuard(limit, model="gpt-4o-mini")  # uses gpt-4o-mini pricing

See Observability — Cost Tracker for the full BudgetGuard reference.
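
To sanity-check a per_request limit, estimate the cost of a typical call by hand. The sketch below does the arithmetic; the per-million-token prices are placeholders chosen for illustration, not real pricing, and estimate_cost is a hypothetical helper rather than a SynapseKit function.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimated cost in USD, given prices per million tokens."""
    return (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000

# Placeholder prices (USD per million tokens); substitute your provider's rates
cost = estimate_cost(500, 300, input_price_per_m=2.50, output_price_per_m=10.00)

print(f"{cost:.5f}")   # 0.00425
print(cost > 0.001)    # True: a 0.001 per-request limit would trip
print(cost > 0.05)     # False: a 0.05 per-request limit would not
```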

TypeError: 'async for' requires an object with __aiter__ method, got coroutine (Ollama)

Symptoms: Streaming from OllamaLLM (via stream(), stream_with_messages(), or the RAG facade with provider="ollama") raises:

TypeError: 'async for' requires an object with __aiter__ method, got coroutine

Cause: A breaking change in the ollama-python SDK turned AsyncClient.chat() from an async generator function into a regular coroutine that returns an async generator. Earlier SynapseKit versions iterated the coroutine directly instead of awaiting it first, which raises this error on newer ollama-python versions.

Fix: Upgrade to synapsekit>=1.9.1, which correctly awaits the coroutine before iterating:

pip install --upgrade "synapsekit[ollama]"
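
The failure mode described in the cause can be reproduced with plain asyncio. In this sketch, chat is a stand-in coroutine that returns an async generator (the newer shape described above); it is not the real ollama client. Iterating the coroutine directly raises the TypeError, while awaiting it first works.

```python
import asyncio

async def _chunks():
    """Async generator producing the streamed pieces."""
    for piece in ("Hel", "lo"):
        yield piece

async def chat():
    """Coroutine that returns an async generator."""
    return _chunks()

async def iterate_without_await():
    coro = chat()
    try:
        async for _ in coro:  # iterates the coroutine itself, which fails
            pass
    except TypeError as exc:
        return str(exc)
    finally:
        coro.close()  # avoid a "never awaited" warning
    return "no error"

async def iterate_with_await():
    # Await first to obtain the async generator, then iterate it
    return [piece async for piece in await chat()]

error_message = asyncio.run(iterate_without_await())
pieces = asyncio.run(iterate_with_await())
print(error_message)
print(pieces)
```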

sounddevice import error on import synapsekit (non-voice users)

Symptoms: Users who do not use SynapseKit's voice features see an import error such as:

ModuleNotFoundError: No module named 'sounddevice'

or:

OSError: PortAudio library not found

immediately on import synapsekit, even though they never use VoicePipeline or any STT/TTS class.

Cause: Prior to v1.9.1, the top-level synapsekit package eagerly imported the voice module tree at startup, which in turn imported sounddevice and other native audio dependencies. This caused failures in environments where those libraries are not installed (CI, serverless, Docker images without PortAudio, etc.).

Fix: Upgrade to synapsekit>=1.9.1. All voice exports (VoicePipeline, BaseSTT, BaseTTS, LocalWhisperSTT, OpenAIWhisperSTT, DeepgramSTT, OpenAITTS, ElevenLabsTTS, CartesiaTTS, PiperTTS, EnergyVAD, SileroVAD, and associated types) are now lazily loaded: each one is imported only when it is first accessed, so import synapsekit no longer loads any audio library.

pip install --upgrade synapsekit

If you actively use voice features and want to ensure the extras are present:

pip install "synapsekit[voice]"
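
For readers curious how lazy exports can work, one common approach in Python is a module-level __getattr__ hook (PEP 562). The sketch below demonstrates the technique on a throwaway module built at runtime; lazy_pkg and the export mapping are hypothetical, and this is not a copy of SynapseKit's implementation.

```python
import importlib
import types

# Hypothetical mapping: exported name -> module that actually defines it
_LAZY_EXPORTS = {"dumps": "json", "sqrt": "math"}
loaded = []  # records which names have been resolved so far

def _lazy_getattr(name):
    """Called only when normal attribute lookup on the module fails."""
    if name not in _LAZY_EXPORTS:
        raise AttributeError(f"module 'lazy_pkg' has no attribute {name!r}")
    module = importlib.import_module(_LAZY_EXPORTS[name])
    loaded.append(name)
    value = getattr(module, name)
    setattr(lazy_pkg, name, value)  # cache so the hook is not hit again
    return value

lazy_pkg = types.ModuleType("lazy_pkg")
lazy_pkg.__getattr__ = _lazy_getattr  # PEP 562 module-level hook

print(loaded)              # []: nothing resolved at creation time
print(lazy_pkg.sqrt(9.0))  # 3.0: resolved on first access
print(loaded)              # ['sqrt']
```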
