AI21 Labs

AI21 Labs' Jamba models use a hybrid SSM-Transformer architecture, offering long context windows at low inference cost.

Install

pip install "synapsekit[ai21]"

Usage

from synapsekit.llm.ai21 import AI21LLM
from synapsekit import LLMConfig

config = LLMConfig(
    model="jamba-1.5-mini",
    api_key="...",
    provider="ai21",
)

llm = AI21LLM(config)

# Streaming
async for token in llm.stream("Explain transformer architecture"):
    print(token, end="")

# Generate
response = await llm.generate("What is the Jamba architecture?")

Available models

Model            Context   Notes
jamba-1.5-mini   256K      Fast, efficient
jamba-1.5-large  256K      Higher quality
jamba-instruct   256K      Instruction-tuned

Function calling

AI21 Jamba supports native function calling:

from synapsekit import FunctionCallingAgent, tool

@tool
def get_weather(city: str) -> str:
"""Get current weather for a city."""
return f"Sunny, 22°C in {city}"

agent = FunctionCallingAgent(llm=llm, tools=[get_weather])
answer = await agent.run("What's the weather in Paris?")
print(answer)

Auto-detection

The RAG facade auto-detects the AI21 provider for model names with the jamba- prefix:

from synapsekit import RAG

rag = RAG(model="jamba-1.5-mini", api_key="...")
rag.add("Your document text here")
answer = rag.ask_sync("Summarize this.")

Rate limits

See AI21 documentation for current rate limits.
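When a request does hit a rate limit, the standard remedy is to retry with exponential backoff and jitter. A minimal generic sketch (plain Python, not a SynapseKit helper; in real code you would catch the provider's specific rate-limit exception rather than a bare Exception):

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```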

Tip

Jamba models support 256K context windows, making them ideal for long-document RAG or multi-turn conversations that would overflow shorter-context models.
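Even with a 256K window, corpora larger than the context still need to be split before indexing. A minimal character-based chunker with overlap, shown only as an illustrative sketch (not part of SynapseKit's API):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    with each chunk overlapping the previous one by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

In practice a token-aware splitter is preferable, since provider limits are measured in tokens rather than characters.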