Google Gemini

Use Google's Gemini models with up to 1M-token context windows (2M on the legacy gemini-1.5-pro), multimodal inputs, and native function calling.

Install

pip install synapsekit[gemini]

Via the RAG facade

from synapsekit import RAG

rag = RAG(model="gemini-2.0-flash", api_key="your-google-api-key")
rag.add("Your document text here")

answer = rag.ask_sync("Summarize the document.")
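You can index several documents before asking, assuming add can be called repeatedly to extend the same index (a minimal sketch using only the calls shown above):

rag.add("Q3 revenue grew 12% year over year.")
rag.add("Q3 operating costs were flat year over year.")

answer = rag.ask_sync("How did Q3 revenue compare to Q3 costs?")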

Direct usage

from synapsekit.llm.gemini import GeminiLLM
from synapsekit.llm.base import LLMConfig

llm = GeminiLLM(LLMConfig(
    model="gemini-2.0-flash",
    api_key="your-google-api-key",
    provider="gemini",
    temperature=0.3,
    max_tokens=1024,
))

async for token in llm.stream("Explain vector embeddings."):
    print(token, end="", flush=True)
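For a single, non-streamed completion, the same client exposes generate, which the examples below use throughout:

response = await llm.generate("Explain vector embeddings.")
print(response)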

Available models

Model                   Context   Input (per 1M)   Output (per 1M)   Notes
gemini-2.5-pro          1M        $1.25            $10.00            Most capable, multimodal
gemini-2.5-flash        1M        $0.075           $0.30             Fast, low cost
gemini-2.0-flash        1M        $0.075           $0.30             Stable, production-ready
gemini-2.0-flash-lite   1M        $0.01            $0.04             Cheapest
gemini-1.5-pro          2M        $1.25            $5.00             Legacy, largest context
gemini-1.5-flash        1M        $0.075           $0.30             Legacy fast
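Because pricing is per million tokens, estimating request cost is simple arithmetic. A quick sketch using the gemini-2.0-flash rates from the table above:

# gemini-2.0-flash: $0.075 per 1M input tokens, $0.30 per 1M output tokens
input_tokens, output_tokens = 50_000, 1_000
cost = input_tokens / 1e6 * 0.075 + output_tokens / 1e6 * 0.30
print(f"${cost:.6f}")  # $0.004050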

Google AI API vs Vertex AI

Feature          Google AI API   Vertex AI
Auth             API key         gcloud / service account
Cost             Pay-per-use     Same, + GCP billing
Region control   No              Yes
Enterprise SLA   No              Yes
Free tier        Yes             No

Google AI API (default)

llm = GeminiLLM(LLMConfig(
    model="gemini-2.0-flash",
    api_key="AIza...",
    provider="gemini",
))

Vertex AI

llm = GeminiLLM(
    LLMConfig(model="gemini-2.0-flash", api_key="", provider="gemini"),
    use_vertex=True,
    project_id="my-gcp-project",
    location="us-central1",
)

When use_vertex=True, SynapseKit authenticates with Application Default Credentials (ADC) via google-auth. Run gcloud auth application-default login first.
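To confirm ADC is configured before enabling use_vertex, you can resolve credentials directly with google-auth (a standalone sanity check, independent of SynapseKit):

import google.auth

# Raises DefaultCredentialsError if ADC has not been set up
credentials, project = google.auth.default()
print(f"ADC resolved for project: {project}")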

Function calling

GeminiLLM supports native function calling via call_with_tools(). SynapseKit automatically converts OpenAI-format tool schemas to Gemini's FunctionDeclaration format.
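As an illustration of the two shapes (a sketch, not SynapseKit's actual internals): an OpenAI-format tool nests everything under a "function" key, while a Gemini FunctionDeclaration takes name, description, and parameters at the top level.

openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}

# Roughly the equivalent Gemini declaration: the nested "function" object, hoisted
gemini_declaration = openai_tool["function"]

You never construct these by hand; SynapseKit performs the conversion when you call call_with_tools: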

from synapsekit import FunctionCallingAgent, CalculatorTool
from synapsekit.llm.gemini import GeminiLLM
from synapsekit.llm.base import LLMConfig

llm = GeminiLLM(LLMConfig(
    model="gemini-2.0-flash",
    api_key="your-google-api-key",
    provider="gemini",
))

agent = FunctionCallingAgent(
    llm=llm,
    tools=[CalculatorTool()],
)

answer = await agent.run("What is 144 divided by 12?")
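The snippets on this page use top-level await for brevity; in a regular script, drive the coroutine with asyncio.run:

import asyncio

async def main():
    answer = await agent.run("What is 144 divided by 12?")
    print(answer)

asyncio.run(main())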

Direct call_with_tools

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "What's the weather in Paris?"},
]

result = await llm.call_with_tools(messages, tools)
# {"content": None, "tool_calls": [{"id": "call_...", "name": "get_weather", "arguments": {"city": "Paris"}}]}
note

Gemini doesn't provide tool call IDs natively. SynapseKit generates them via uuid4 for compatibility.
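To finish the exchange, execute the tool yourself, append its output to the conversation, and call the model again. A minimal sketch, assuming call_with_tools accepts OpenAI-style assistant and tool messages (FunctionCallingAgent runs this loop for you):

call = result["tool_calls"][0]
weather = f"Sunny, 18°C in {call['arguments']['city']}"  # substitute a real weather lookup

messages.append({"role": "assistant", "content": None, "tool_calls": result["tool_calls"]})
messages.append({"role": "tool", "tool_call_id": call["id"], "content": weather})

final = await llm.call_with_tools(messages, tools)
print(final["content"])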

Multimodal inputs

from synapsekit.multimodal import ImageContent

# Analyze an image
message = {
    "role": "user",
    "content": [
        ImageContent.from_url("https://example.com/chart.png"),
        {"type": "text", "text": "Describe the trend shown in this chart."},
    ],
}

response = await llm.generate(message)
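For local image files, a from_bytes constructor mirroring the AudioContent example below would presumably work; only from_url is documented above, so treat this sketch as hypothetical:

# Hypothetical: assumes ImageContent.from_bytes mirrors AudioContent.from_bytes
with open("chart.png", "rb") as f:
    image = ImageContent.from_bytes(f.read(), media_type="image/png")

response = await llm.generate([image, "Describe the trend shown in this chart."])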

Audio inputs

from synapsekit.multimodal import AudioContent

with open("meeting_recording.mp3", "rb") as f:
    audio = AudioContent.from_bytes(f.read(), media_type="audio/mp3")

response = await llm.generate([audio, "Summarize this meeting recording."])

Long context: processing large documents

Gemini's 1M+ token context enables loading entire books or codebases:

# Load a 500-page PDF (as text) into context
with open("annual_report.txt") as f:
    document = f.read()

# Gemini 2.5 Pro handles ~750K words in a single request
llm = GeminiLLM(LLMConfig(
    model="gemini-2.5-pro",
    api_key="AIza...",
    provider="gemini",
    max_tokens=8192,
))

response = await llm.generate(
    f"Here is the annual report:\n\n{document}\n\nWhat were the top 3 risks mentioned?"
)
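To decide whether a document needs chunking at all, the same ~4 characters-per-token heuristic used below gives a quick estimate:

# Rough token estimate: English text averages ~4 characters per token
estimated_tokens = len(document) // 4
if estimated_tokens > 1_000_000:
    print(f"~{estimated_tokens:,} tokens: chunk and summarize instead")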

For documents exceeding 1M tokens, chunk and summarize progressively:

CHUNK_SIZE = 800_000  # tokens (approximate)

# The chunker assumes ~4 characters per token, hence the *4
chunks = [document[i:i + CHUNK_SIZE * 4] for i in range(0, len(document), CHUNK_SIZE * 4)]
summaries = []

for i, chunk in enumerate(chunks):
    summary = await llm.generate(f"Summarize section {i+1}:\n\n{chunk}")
    summaries.append(summary)

final = await llm.generate("Combine these summaries:\n\n" + "\n\n".join(summaries))

Rate limits

Tier            RPM (requests/min)   TPM (tokens/min)   Notes
Free            15                   1M                 For prototyping
Pay-as-you-go   360                  4M                 gemini-2.0-flash
Pay-as-you-go   360                  4M                 gemini-2.5-pro

Use requests_per_minute in LLMConfig to throttle if needed:

llm = GeminiLLM(LLMConfig(
    model="gemini-2.0-flash",
    api_key="AIza...",
    requests_per_minute=14,  # stay under free tier limit
))

Error handling

from synapsekit.exceptions import LLMError, RateLimitError, AuthenticationError

try:
    response = await llm.generate("Hello")
except AuthenticationError:
    print("Invalid API key — visit aistudio.google.com to create one")
except RateLimitError:
    print("Rate limit exceeded — upgrade to pay-as-you-go or reduce RPM")
except LLMError as e:
    print(f"Gemini error: {e}")
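For transient rate-limit errors, a small exponential backoff around generate is often enough (a minimal sketch built on the exceptions above):

import asyncio

async def generate_with_retry(prompt, retries=3):
    for attempt in range(retries):
        try:
            return await llm.generate(prompt)
        except RateLimitError:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # wait 1s, then 2s between attempts

response = await generate_with_retry("Hello")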
tip

Get a free API key at aistudio.google.com. The free tier includes 15 RPM and 1M tokens/day.