Groq

Ultra-fast inference with Groq's LPU hardware. Supports Llama, Mixtral, Gemma, and other open models.

Install

pip install synapsekit[groq]

Usage

from synapsekit.llm.groq import GroqLLM
from synapsekit import LLMConfig

config = LLMConfig(
model="llama-3.3-70b-versatile",
api_key="gsk_...",
provider="groq",
)

llm = GroqLLM(config)

# Streaming
async for token in llm.stream("Explain quantum computing"):
print(token, end="")

# Generate
response = await llm.generate("What is Rust?")
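The snippets above use `async for` and `await` at the top level, so in a script they must run inside a coroutine driven by `asyncio.run`. A minimal sketch of that driver pattern, using a stand-in async generator in place of `llm.stream` so it runs without an API key:

```python
import asyncio

# Stand-in for llm.stream: any async generator of tokens is consumed the same way.
async def fake_stream(prompt: str):
    for token in ["Quantum ", "computing ", "explained."]:
        yield token

async def main() -> str:
    chunks = []
    # Same loop shape as `async for token in llm.stream(...)`.
    async for token in fake_stream("Explain quantum computing"):
        chunks.append(token)
    return "".join(chunks)

result = asyncio.run(main())
print(result)  # -> Quantum computing explained.
```

In a real script, replace `fake_stream` with `llm.stream` and keep the `asyncio.run(main())` entry point.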

Available models

Model                    Context  Notes
llama-3.3-70b-versatile  128K     Best quality
llama-3.1-8b-instant     128K     Fastest
mixtral-8x7b-32768       32K      Good balance
gemma2-9b-it             8K       Google Gemma

Function calling

result = await llm.call_with_tools(
messages=[{"role": "user", "content": "What's 2+2?"}],
tools=[...],
)
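The `tools` list above is elided. Groq's API accepts OpenAI-compatible function schemas, so a single tool entry likely looks like the following; whether synapsekit passes this structure through unchanged is an assumption, and the `add` tool itself is hypothetical:

```python
# Hypothetical tool definition in the OpenAI-compatible schema Groq's API accepts.
add_tool = {
    "type": "function",
    "function": {
        "name": "add",
        "description": "Add two integers and return the sum.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "integer", "description": "First addend"},
                "b": {"type": "integer", "description": "Second addend"},
            },
            "required": ["a", "b"],
        },
    },
}
```

A list such as `tools=[add_tool]` would then replace the `[...]` placeholder in the call above.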

Auto-detection

The RAG facade auto-detects Groq for model names beginning with the llama, mixtral, or gemma prefixes:

from synapsekit import RAG

rag = RAG(model="llama-3.3-70b-versatile", api_key="gsk_...")
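Prefix-based detection of this kind can be sketched as a pure function. This is a hypothetical illustration of the idea, not synapsekit's actual implementation:

```python
# Hypothetical sketch of prefix-based provider detection; the real
# synapsekit logic is not shown on this page.
GROQ_PREFIXES = ("llama", "mixtral", "gemma")

def detect_provider(model: str) -> str:
    # str.startswith accepts a tuple, checking every prefix in one call.
    if model.lower().startswith(GROQ_PREFIXES):
        return "groq"
    return "unknown"

print(detect_provider("llama-3.3-70b-versatile"))  # -> groq
print(detect_provider("gpt-4o"))                   # -> unknown
```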
Tip: Groq is ideal for latency-sensitive applications. Most models respond in under 500 ms.