LLM Overview

All LLMs in SynapseKit extend BaseLLM and share the same interface.

Interface

from abc import ABC
from collections.abc import AsyncIterator

class BaseLLM(ABC):
    async def stream(self, prompt: str, **kwargs) -> AsyncIterator[str]: ...
    async def generate(self, prompt: str, **kwargs) -> str: ...
    async def stream_with_messages(self, messages: list[dict], **kwargs) -> AsyncIterator[str]: ...
    async def generate_with_messages(self, messages: list[dict], **kwargs) -> str: ...

generate() is always implemented on top of stream(): it concatenates the streamed chunks, roughly "".join([chunk async for chunk in stream(...)]). Streaming is the primary code path.
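For instance, a custom provider only has to yield chunks from stream() and gets generate() for free. A minimal sketch, reusing the imports above (EchoLLM is hypothetical, not a shipped provider, and it ignores any config the real base class may require):

class EchoLLM(BaseLLM):
    """Hypothetical provider: streams the prompt back word by word."""

    async def stream(self, prompt: str, **kwargs) -> AsyncIterator[str]:
        for word in prompt.split():
            yield word + " "

# generate() is inherited from BaseLLM: it joins the chunks yielded above,
# so generate("a b") would return "a b ".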

LLMConfig

from synapsekit import LLMConfig

config = LLMConfig(
    model="gpt-4o-mini",
    api_key="sk-...",
    provider="openai",
    system_prompt="You are a helpful assistant.",
    temperature=0.2,
    max_tokens=1024,
    # Optional: caching and retries
    cache=False,               # Enable LRU response caching
    cache_maxsize=128,         # Max cached responses
    cache_backend="memory",    # "memory" or "sqlite"
    max_retries=0,             # Retry attempts (0 = disabled)
    retry_delay=1.0,           # Initial retry delay in seconds
    # Rate limiting
    requests_per_minute=None,  # Token-bucket rate limiter
)

See Caching & Retries for details on response caching and exponential backoff.
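As a usage sketch, the config above can be handed straight to a provider class. The top-level OpenAILLM import path is an assumption here (the class name comes from the provider table below):

import asyncio

from synapsekit import OpenAILLM  # import path assumed

async def main() -> None:
    llm = OpenAILLM(config)  # config from the example above
    reply = await llm.generate("Summarize SynapseKit in one sentence.")
    print(reply)

asyncio.run(main())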

Available providers

| Provider | Class | Extra | Provider string |
| --- | --- | --- | --- |
| OpenAI | OpenAILLM | pip install synapsekit[openai] | "openai" |
| Anthropic | AnthropicLLM | pip install synapsekit[anthropic] | "anthropic" |
| Ollama | OllamaLLM | pip install synapsekit[ollama] | "ollama" |
| Cohere | CohereLLM | pip install synapsekit[cohere] | "cohere" |
| Mistral | MistralLLM | pip install synapsekit[mistral] | "mistral" |
| Google Gemini | GeminiLLM | pip install synapsekit[gemini] | "gemini" |
| AWS Bedrock | BedrockLLM | pip install synapsekit[bedrock] | "bedrock" |
| Azure OpenAI | AzureOpenAILLM | pip install synapsekit[openai] | "azure" |
| Groq | GroqLLM | pip install synapsekit[groq] | "groq" |
| DeepSeek | DeepSeekLLM | pip install synapsekit[openai] | "deepseek" |
| OpenRouter | OpenRouterLLM | pip install synapsekit[openai] | "openrouter" |
| Together AI | TogetherLLM | pip install synapsekit[openai] | "together" |
| Fireworks AI | FireworksLLM | pip install synapsekit[openai] | "fireworks" |
| Perplexity AI | PerplexityLLM | pip install synapsekit[openai] | "perplexity" |
| Cerebras | CerebrasLLM | pip install synapsekit[openai] | "cerebras" |
| Google Vertex AI | VertexAILLM | pip install synapsekit[vertexai] | "vertexai" |
| Moonshot AI | MoonshotLLM | pip install synapsekit[openai] | "moonshot" |
| Zhipu AI | ZhipuLLM | pip install synapsekit[openai] | "zhipu" |
| Cloudflare AI | CloudflareLLM | built-in | "cloudflare" |
| AI21 Labs | AI21LLM | pip install synapsekit[ai21] | "ai21" |
| Databricks | DatabricksLLM | pip install synapsekit[openai] | "databricks" |
| Baidu ERNIE | ErnieLLM | pip install synapsekit[ernie] | "ernie" |
| llama.cpp | LlamaCppLLM | pip install synapsekit[llamacpp] | "llamacpp" |
| Minimax | MinimaxLLM | built-in | "minimax" |
| Aleph Alpha | AlephAlphaLLM | pip install synapsekit[aleph-alpha] | "aleph-alpha" |
| Hugging Face | HuggingFaceLLM | pip install synapsekit[huggingface] | "huggingface" |
| SambaNova | SambaNovaLLM | pip install synapsekit[openai] | "sambanova" |
| xAI (Grok) | XaiLLM | pip install synapsekit[openai] | "xai" |
| NovitaAI | NovitaLLM | pip install synapsekit[openai] | "novita" |
| Writer (Palmyra) | WriterLLM | pip install synapsekit[openai] | "writer" |
| LM Studio | LMStudioLLM | pip install synapsekit[lmstudio] | "lmstudio" |
| GPT4All | GPT4AllLLM | pip install synapsekit[gpt4all] | "gpt4all" |
| vLLM | VLLMLlm | pip install synapsekit[vllm] | "vllm" |
| Replicate | ReplicateLLM | pip install synapsekit[replicate] | "replicate" |
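For example, to use Anthropic from the table: install the extra, then instantiate the class with the matching provider string. A sketch; the import path and model name are illustrative assumptions:

# pip install synapsekit[anthropic]
from synapsekit import AnthropicLLM, LLMConfig  # import path assumed

llm = AnthropicLLM(LLMConfig(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    api_key="sk-ant-...",
    provider="anthropic",
))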

Auto-detection

The RAG facade auto-detects the provider from the model name:

| Model prefix | Detected provider |
| --- | --- |
| claude-* | anthropic |
| gemini-* | gemini |
| command-* | cohere |
| mistral-*, open-mistral-* | mistral |
| deepseek-* | deepseek |
| moonshot-* | moonshot |
| glm-* | zhipu |
| jamba-* | ai21 |
| @cf/*, @hf/* | cloudflare |
| dbrx-*, databricks-* | databricks |
| ernie-* | ernie |
| minimax-* | minimax |
| luminous-*, pharia-* | aleph-alpha |
| llama-*, mixtral-*, gemma-* | groq |
| */... (contains /) | openrouter |
| everything else | openai |
SambaNova

SambaNova model names (e.g. Meta-Llama-3.1-8B-Instruct) have no unique prefix, so always pass provider="sambanova" explicitly.

Override with the provider= argument:

rag = RAG(model="llama3", api_key="", provider="ollama")
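Conversely, a model name with a recognized prefix needs no provider argument. A sketch (model names are illustrative):

# "claude-*" prefix -> auto-detected as anthropic
rag = RAG(model="claude-3-5-sonnet-latest", api_key="sk-ant-...")

# no recognized prefix -> would default to openai, so override explicitly
rag = RAG(model="Meta-Llama-3.1-8B-Instruct", api_key="...", provider="sambanova")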

Tokens and cost tracking

Every provider tracks input/output tokens:

llm = OpenAILLM(config)
await llm.generate("Hello!")
print(llm.tokens_used) # {"input": 12, "output": 8}

The TokenTracer in RAGPipeline aggregates this across all calls.
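Outside RAGPipeline you can turn these counters into a rough cost estimate yourself. A sketch: the prices are illustrative placeholders, and it assumes tokens_used reflects all calls made on the instance; see Cost Tracker for the built-in approach.

# Illustrative USD prices per million tokens; substitute your provider's real rates.
PRICE_PER_M = {"input": 0.15, "output": 0.60}

used = llm.tokens_used  # e.g. {"input": 12, "output": 8}
cost = (
    used["input"] * PRICE_PER_M["input"]
    + used["output"] * PRICE_PER_M["output"]
) / 1_000_000
print(f"~${cost:.6f}")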

Next steps

  • OpenAI — GPT-4o, GPT-4o-mini, structured output, vision
  • Anthropic — Claude models, extended context, tool use
  • Ollama — run local models with no API key
  • AI21 Labs — Jamba models, 256K context
  • Databricks — DBRX and Llama on your workspace
  • Baidu ERNIE — Chinese-English bilingual models
  • llama.cpp — run GGUF models fully on-device
  • Minimax — SSE streaming with group_id auth
  • Aleph Alpha — European LLMs, German-language and multilingual
  • Hugging Face — thousands of open-source models via Inference API and Dedicated Endpoints
  • SambaNova — fast inference on Llama, Qwen, and other open models
  • xAI — Grok models (grok-2, grok-2-mini)
  • NovitaAI — hosted open models (Llama, Mistral, Qwen, etc.)
  • Writer — Palmyra models including domain-specific (palmyra-med, palmyra-fin)
  • LM Studio — local models via LM Studio's OpenAI-compatible API; no API key required
  • GPT4All — fully on-device GGUF models via GPT4All Python bindings; no API key, no internet
  • vLLM — high-throughput self-hosted inference via vLLM's OpenAI-compatible API
  • Replicate — hosted open models (Llama, Mistral, SDXL, etc.) via the Replicate REST API
  • Caching & Retries — LRU caching, exponential backoff, rate limiting
  • CostRouter & FallbackChain — route to cheapest model or cascade on failure
  • Cost Tracker — attribute and budget LLM spending