LLM Overview
All LLMs in SynapseKit extend BaseLLM and share the same interface.
Interface
```python
from abc import ABC
from collections.abc import AsyncIterator

class BaseLLM(ABC):
    async def stream(self, prompt: str, **kwargs) -> AsyncIterator[str]: ...
    async def generate(self, prompt: str, **kwargs) -> str: ...
    async def stream_with_messages(self, messages: list[dict], **kwargs) -> AsyncIterator[str]: ...
    async def generate_with_messages(self, messages: list[dict], **kwargs) -> str: ...
```
generate() is always implemented by draining stream() and joining the chunks; streaming is the primary code path.
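Concretely, the base-class default looks roughly like this (a sketch of the joining behavior described above, not the library's exact source):

```python
async def generate(self, prompt: str, **kwargs) -> str:
    # Drain the stream and concatenate the chunks into the final response.
    return "".join([chunk async for chunk in self.stream(prompt, **kwargs)])
```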
LLMConfig
```python
from synapsekit import LLMConfig

config = LLMConfig(
    model="gpt-4o-mini",
    api_key="sk-...",
    provider="openai",
    system_prompt="You are a helpful assistant.",
    temperature=0.2,
    max_tokens=1024,
    # Optional: caching and retries
    cache=False,              # Set True to enable LRU response caching
    cache_maxsize=128,        # Max cached responses
    cache_backend="memory",   # "memory" or "sqlite"
    max_retries=0,            # Retry attempts (0 = disabled)
    retry_delay=1.0,          # Initial retry delay in seconds
    # Rate limiting
    requests_per_minute=None, # Token-bucket rate limiter
)
```
See Caching & Retries for details on response caching and exponential backoff.
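As a sketch, here is a variant of the config above with all three features switched on (the specific values are illustrative, not recommendations):

```python
resilient_config = LLMConfig(
    model="gpt-4o-mini",
    api_key="sk-...",
    provider="openai",
    cache=True,               # serve repeated identical prompts from the LRU cache
    cache_maxsize=256,
    cache_backend="sqlite",   # "memory" or "sqlite"
    max_retries=3,            # exponential backoff starting at retry_delay
    retry_delay=1.0,
    requests_per_minute=60,   # token-bucket rate limiting
)
```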
Available providers
| Provider | Class | Extra | Provider string |
|---|---|---|---|
| OpenAI | OpenAILLM | pip install synapsekit[openai] | "openai" |
| Anthropic | AnthropicLLM | pip install synapsekit[anthropic] | "anthropic" |
| Ollama | OllamaLLM | pip install synapsekit[ollama] | "ollama" |
| Cohere | CohereLLM | pip install synapsekit[cohere] | "cohere" |
| Mistral | MistralLLM | pip install synapsekit[mistral] | "mistral" |
| Google Gemini | GeminiLLM | pip install synapsekit[gemini] | "gemini" |
| AWS Bedrock | BedrockLLM | pip install synapsekit[bedrock] | "bedrock" |
| Azure OpenAI | AzureOpenAILLM | pip install synapsekit[openai] | "azure" |
| Groq | GroqLLM | pip install synapsekit[groq] | "groq" |
| DeepSeek | DeepSeekLLM | pip install synapsekit[openai] | "deepseek" |
| OpenRouter | OpenRouterLLM | pip install synapsekit[openai] | "openrouter" |
| Together AI | TogetherLLM | pip install synapsekit[openai] | "together" |
| Fireworks AI | FireworksLLM | pip install synapsekit[openai] | "fireworks" |
| Perplexity AI | PerplexityLLM | pip install synapsekit[openai] | "perplexity" |
| Cerebras | CerebrasLLM | pip install synapsekit[openai] | "cerebras" |
| Google Vertex AI | VertexAILLM | pip install synapsekit[vertexai] | "vertexai" |
| Moonshot AI | MoonshotLLM | pip install synapsekit[openai] | "moonshot" |
| Zhipu AI | ZhipuLLM | pip install synapsekit[openai] | "zhipu" |
| Cloudflare AI | CloudflareLLM | built-in | "cloudflare" |
| AI21 Labs | AI21LLM | pip install synapsekit[ai21] | "ai21" |
| Databricks | DatabricksLLM | pip install synapsekit[openai] | "databricks" |
| Baidu ERNIE | ErnieLLM | pip install synapsekit[ernie] | "ernie" |
| llama.cpp | LlamaCppLLM | pip install synapsekit[llamacpp] | "llamacpp" |
| Minimax | MinimaxLLM | built-in | "minimax" |
| Aleph Alpha | AlephAlphaLLM | pip install synapsekit[aleph-alpha] | "aleph-alpha" |
| Hugging Face | HuggingFaceLLM | pip install synapsekit[huggingface] | "huggingface" |
| SambaNova | SambaNovaLLM | pip install synapsekit[openai] | "sambanova" |
| xAI (Grok) | XaiLLM | pip install synapsekit[openai] | "xai" |
| NovitaAI | NovitaLLM | pip install synapsekit[openai] | "novita" |
| Writer (Palmyra) | WriterLLM | pip install synapsekit[openai] | "writer" |
| LM Studio | LMStudioLLM | pip install synapsekit[lmstudio] | "lmstudio" |
| GPT4All | GPT4AllLLM | pip install synapsekit[gpt4all] | "gpt4all" |
| vLLM | VLLMLLM | pip install synapsekit[vllm] | "vllm" |
| Replicate | ReplicateLLM | pip install synapsekit[replicate] | "replicate" |
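To use a provider class directly, install its extra and pass it an LLMConfig. A minimal sketch, assuming the provider classes are exported at the package root like LLMConfig (the model name is only an example):

```python
# pip install synapsekit[anthropic]
from synapsekit import AnthropicLLM, LLMConfig

claude = AnthropicLLM(LLMConfig(model="claude-3-5-haiku-latest", api_key="sk-ant-..."))
```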
Auto-detection
The RAG facade auto-detects the provider from the model name:
| Model prefix | Detected provider |
|---|---|
| claude-* | anthropic |
| gemini-* | gemini |
| command-* | cohere |
| mistral-*, open-mistral-* | mistral |
| deepseek-* | deepseek |
| moonshot-* | moonshot |
| glm-* | zhipu |
| jamba-* | ai21 |
| @cf/*, @hf/* | cloudflare |
| dbrx-*, databricks-* | databricks |
| ernie-* | ernie |
| minimax-* | minimax |
| luminous-*, pharia-* | aleph-alpha |
| llama-*, mixtral-*, gemma-* | groq |
| */... (contains /) | openrouter |
| everything else | openai |
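The matching itself is simple prefix dispatch. A hypothetical sketch of the rule (not SynapseKit's actual internals); note the prefix check must run before the "/" check so @cf/* and @hf/* reach Cloudflare rather than OpenRouter:

```python
# Hypothetical sketch of the detection table above, not SynapseKit's source.
_PREFIXES = {
    "claude-": "anthropic",
    "gemini-": "gemini",
    "command-": "cohere",
    "@cf/": "cloudflare",
    "@hf/": "cloudflare",
    # ... remaining prefixes from the table above
}

def detect_provider(model: str) -> str:
    for prefix, provider in _PREFIXES.items():
        if model.startswith(prefix):
            return provider
    if "/" in model:       # e.g. an OpenRouter-style "org/model" name
        return "openrouter"
    return "openai"        # everything else
```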
SambaNova
SambaNova model names (e.g. Meta-Llama-3.1-8B-Instruct) don't have a unique prefix — always pass provider="sambanova" explicitly.
Override with the provider= argument:
```python
from synapsekit import RAG  # assuming a root-level export, as with LLMConfig

rag = RAG(model="llama3", api_key="", provider="ollama")
```
Tokens and cost tracking
Every provider tracks input/output tokens:
```python
from synapsekit import OpenAILLM  # assuming a root-level export, as with LLMConfig

llm = OpenAILLM(config)        # config from the LLMConfig example above
await llm.generate("Hello!")   # run inside an async context
print(llm.tokens_used)         # {"input": 12, "output": 8}
```
The TokenTracer in RAGPipeline aggregates this across all calls.
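Outside the pipeline you can aggregate manually. A sketch, assuming tokens_used accumulates across calls on each instance (fast_llm and smart_llm are hypothetical instances built as above):

```python
# Hypothetical aggregation: sum each instance's counters after a batch of calls.
total = {"input": 0, "output": 0}
for llm in (fast_llm, smart_llm):
    total["input"] += llm.tokens_used["input"]
    total["output"] += llm.tokens_used["output"]
print(total)
```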
Next steps
- OpenAI — GPT-4o, GPT-4o-mini, structured output, vision
- Anthropic — Claude models, extended context, tool use
- Ollama — run local models with no API key
- AI21 Labs — Jamba models, 256K context
- Databricks — DBRX and Llama on your workspace
- Baidu ERNIE — Chinese-English bilingual models
- llama.cpp — run GGUF models fully on-device
- Minimax — SSE streaming with group_id auth
- Aleph Alpha — European LLMs, German-language and multilingual
- Hugging Face — thousands of open-source models via Inference API and Dedicated Endpoints
- SambaNova — fast inference on Llama, Qwen, and other open models
- xAI — Grok models (grok-2, grok-2-mini)
- NovitaAI — hosted open models (Llama, Mistral, Qwen, etc.)
- Writer — Palmyra models including domain-specific (palmyra-med, palmyra-fin)
- LM Studio — local models via LM Studio's OpenAI-compatible API; no API key required
- GPT4All — fully on-device GGUF models via GPT4All Python bindings; no API key, no internet
- vLLM — high-throughput self-hosted inference via vLLM's OpenAI-compatible API
- Replicate — hosted open models (Llama, Mistral, SDXL, etc.) via the Replicate REST API
- Caching & Retries — LRU caching, exponential backoff, rate limiting
- CostRouter & FallbackChain — route to cheapest model or cascade on failure
- Cost Tracker — attribute and budget LLM spending