CostRouter & FallbackChain

SynapseKit provides two drop-in BaseLLM subclasses for intelligent model routing: CostRouter (cheapest model meeting quality constraints) and FallbackChain (ordered priority with cascading fallback).

CostRouter

Route every request to the cheapest model that meets a quality threshold. If that model fails, automatically try the next cheapest.

from synapsekit import CostRouter, CostRouterConfig, RouterModelSpec

router = CostRouter(CostRouterConfig(
    models=[
        RouterModelSpec(model="gpt-4o-mini", api_key="sk-...", provider="openai"),
        RouterModelSpec(model="gpt-4o", api_key="sk-...", provider="openai"),
        RouterModelSpec(model="claude-sonnet-4-20250514", api_key="sk-ant-...", provider="anthropic"),
    ],
    quality_threshold=0.7,   # Minimum quality score (0.0-1.0)
    fallback_on_error=True,  # Try next model on failure
))

# Use like any BaseLLM — streams from the cheapest qualifying model
async for token in router.stream("Explain quantum computing"):
    print(token, end="")

print(router.selected_model) # e.g. "gpt-4o-mini"

Quality table

CostRouter uses a built-in QUALITY_TABLE with scores for 30+ models:

from synapsekit import QUALITY_TABLE

print(QUALITY_TABLE["gpt-4o"]) # 0.92
print(QUALITY_TABLE["gpt-4o-mini"]) # 0.78
print(QUALITY_TABLE["claude-sonnet-4-20250514"]) # 0.90

Models not in the table default to a score of 0.5. The router first filters out models scoring below quality_threshold, then sorts the remaining candidates by price using the built-in COST_TABLE.
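The filter-then-sort selection rule above can be sketched as a standalone function. This is illustrative only: `quality_scores` and `cost_per_token` are hypothetical stand-ins for SynapseKit's QUALITY_TABLE and COST_TABLE, and the numbers below are made up for the example.

```python
def select_models(candidates, quality_scores, cost_per_token, quality_threshold):
    """Return the candidates meeting the quality threshold, cheapest first.

    Models missing from quality_scores default to 0.5, mirroring the
    documented behavior above.
    """
    qualifying = [
        model for model in candidates
        if quality_scores.get(model, 0.5) >= quality_threshold
    ]
    return sorted(qualifying, key=lambda model: cost_per_token[model])

# Hypothetical score and cost data for illustration:
scores = {"gpt-4o": 0.92, "gpt-4o-mini": 0.78, "claude-sonnet-4-20250514": 0.90}
costs = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15, "claude-sonnet-4-20250514": 3.00}

order = select_models(list(costs), scores, costs, quality_threshold=0.7)
# The cheapest qualifying model is tried first.
```

With fallback_on_error=True, a failure on the first model in this order simply moves the request to the next entry.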

Latency constraints

Each RouterModelSpec accepts an optional max_latency_ms cap, so latency-sensitive traffic can be kept off slower models:

from synapsekit import CostRouterConfig, RouterModelSpec

config = CostRouterConfig(
    models=[
        RouterModelSpec(model="gpt-4o-mini", api_key="sk-...", provider="openai", max_latency_ms=500),
        RouterModelSpec(model="gpt-4o", api_key="sk-...", provider="openai", max_latency_ms=2000),
    ],
    quality_threshold=0.7,
)

RouterModelSpec

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | required | Model name |
| api_key | str | required | API key for this model |
| provider | str | "openai" | Provider name |
| max_latency_ms | float \| None | None | Max acceptable latency in milliseconds |

CostRouterConfig

| Parameter | Type | Default | Description |
|---|---|---|---|
| models | list[RouterModelSpec] | required | Candidate models |
| quality_threshold | float | 0.0 | Minimum quality score |
| strategy | str | "cheapest" | Routing strategy |
| fallback_on_error | bool | True | Try next model on failure |

FallbackChain

Try models in a fixed priority order. If a model errors or returns a response shorter than min_response_length, escalate to the next.

from synapsekit import FallbackChain, FallbackChainConfig
from synapsekit.llm.openai import OpenAILLM
from synapsekit.llm.anthropic import AnthropicLLM
from synapsekit.llm.base import LLMConfig

fast = OpenAILLM(LLMConfig(model="gpt-4o-mini", api_key="sk-..."))
strong = AnthropicLLM(LLMConfig(model="claude-sonnet-4-20250514", api_key="sk-ant-...", provider="anthropic"))

chain = FallbackChain(FallbackChainConfig(
    models=[fast, strong],
    min_response_length=10,  # Escalate if response is shorter than 10 chars
))

result = await chain.generate("Summarize this document...")
print(chain.used_model) # Which model actually answered

FallbackChainConfig

| Parameter | Type | Default | Description |
|---|---|---|---|
| models | list[BaseLLM] | required | Models in priority order |
| min_response_length | int | 0 | Minimum response length before escalating |
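The escalation rule can be sketched as a simple loop. This is a simplified synchronous sketch, not SynapseKit's implementation: `run_chain` and the stand-in model classes are hypothetical, whereas the real FallbackChain wraps async BaseLLM instances.

```python
def run_chain(models, prompt, min_response_length=0):
    """Try each model in priority order; escalate on error or a too-short response."""
    last_error = None
    for model in models:
        try:
            response = model.generate(prompt)
        except Exception as exc:
            last_error = exc
            continue  # Model errored: escalate to the next one
        if len(response) >= min_response_length:
            return response  # Accept the first sufficiently long response
    if last_error is not None:
        raise last_error
    raise RuntimeError("all models returned responses below min_response_length")

# Hypothetical stand-in models for illustration:
class Flaky:
    def generate(self, prompt):
        raise TimeoutError("upstream timeout")

class Terse:
    def generate(self, prompt):
        return "ok"  # Shorter than min_response_length below, so the chain escalates

class Strong:
    def generate(self, prompt):
        return "A full answer to the prompt."

result = run_chain([Flaky(), Terse(), Strong()], "Summarize...", min_response_length=10)
```

The first model errors and the second answers too briefly, so the third model's response is returned.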

Drop-in compatibility

Both CostRouter and FallbackChain extend BaseLLM, so they work anywhere a regular LLM is expected:

from synapsekit import RAGPipeline, CostRouter

# `router` is the CostRouter built above; `retriever` is any SynapseKit retriever.
pipeline = RAGPipeline(llm=router, retriever=retriever)
answer = await pipeline.query("What is the revenue?")

See also