AI21 Labs

AI21 Labs' Jamba models use a hybrid SSM-Transformer architecture, offering long context windows at low inference cost.

Install

pip install "synapsekit[ai21]"

Usage

from synapsekit.llm.ai21 import AI21LLM
from synapsekit import LLMConfig

config = LLMConfig(
    model="jamba-1.5-mini",
    api_key="...",
    provider="ai21",
)

llm = AI21LLM(config)

# Streaming
async for token in llm.stream("Explain transformer architecture"):
    print(token, end="")

# Generate
response = await llm.generate("What is the Jamba architecture?")

Available models

Model            Context   Notes
jamba-1.5-mini   256K      Fast, efficient
jamba-1.5-large  256K      Higher quality
jamba-instruct   256K      Instruction-tuned

Function calling

AI21 Jamba supports native function calling:

from synapsekit import FunctionCallingAgent, tool

@tool
def get_weather(city: str) -> str:
"""Get current weather for a city."""
return f"Sunny, 22°C in {city}"

agent = FunctionCallingAgent(llm=llm, tools=[get_weather])
answer = await agent.run("What's the weather in Paris?")
print(answer)

Auto-detection

The RAG facade auto-detects the AI21 provider for model names with the jamba- prefix:

from synapsekit import RAG

rag = RAG(model="jamba-1.5-mini", api_key="...")
rag.add("Your document text here")
answer = rag.ask_sync("Summarize this.")

Rate limits

See AI21 documentation for current rate limits.
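When a request does hit a rate limit, the standard remedy is to retry with exponential backoff and jitter. A minimal generic sketch (plain Python, not a SynapseKit helper; in real code you would catch the provider's specific rate-limit exception rather than a bare Exception):

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```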

Tip

Jamba models support 256K context windows, making them ideal for long-document RAG or multi-turn conversations that would overflow shorter-context models.
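Even with a 256K window, corpora larger than the context still need to be split before indexing. A minimal character-based chunker with overlap, shown only as an illustrative sketch (not part of SynapseKit's API):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    with each chunk overlapping the previous one by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

In practice a token-aware splitter is preferable, since provider limits are measured in tokens rather than characters.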