OpenAI

Use OpenAI's GPT models with streaming, function calling, vision, and structured output.

Install

pip install synapsekit[openai]

Usage

from synapsekit.llms import OpenAILLM, LLMConfig

llm = OpenAILLM(LLMConfig(
    model="gpt-4o-mini",
    api_key="sk-...",
    temperature=0.7,
))

# Streaming
async for token in llm.stream("Tell me about async Python."):
    print(token, end="", flush=True)

# Full response
response = await llm.generate("Tell me about async Python.")
print(response)

Available models

Model          Context  Input (per 1M tokens)  Output (per 1M tokens)  Notes
gpt-4o         128K     $2.50                  $10.00                  Best quality, multimodal
gpt-4o-mini    128K     $0.15                  $0.60                   Fast and cheap
gpt-4-turbo    128K     $10.00                 $30.00                  Legacy high-quality
gpt-3.5-turbo  16K      $0.50                  $1.50                   Legacy, cheapest
o1             200K     $15.00                 $60.00                  Reasoning, no streaming
o1-mini        128K     $3.00                  $12.00                  Reasoning, fast
o3-mini        200K     $1.10                  $4.40                   Latest reasoning

Any model supported by the OpenAI API works.
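
For example, a dated snapshot from OpenAI's model list works the same way:

llm = OpenAILLM(LLMConfig(
    model="gpt-4o-2024-08-06",  # any model name the OpenAI API accepts
    api_key="sk-...",
))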

Function calling

Use the @tool decorator or pass raw JSON Schema. SynapseKit auto-generates schemas from Python type hints.

from synapsekit import tool, FunctionCallingAgent
from synapsekit.llms import OpenAILLM, LLMConfig

@tool
def get_weather(city: str, unit: str = "celsius") -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 22 degrees {unit}, sunny"

@tool
def calculate(expression: str) -> float:
    """Evaluate a math expression."""
    # eval() is used here for brevity only -- never eval untrusted input.
    return eval(expression)

llm = OpenAILLM(LLMConfig(model="gpt-4o-mini", api_key="sk-..."))
agent = FunctionCallingAgent(llm=llm, tools=[get_weather, calculate])

answer = await agent.run("What's the weather in Paris? Also, what's 144 / 12?")
print(answer)

Raw tool schema

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "num_results": {"type": "integer", "default": 5},
                },
                "required": ["query"],
            },
        },
    }
]

result = await llm.call_with_tools(
    messages=[{"role": "user", "content": "What happened in AI this week?"}],
    tools=tools,
)
# {"content": None, "tool_calls": [{"id": "call_abc123", "name": "search_web", "arguments": {"query": "AI news this week"}}]}

Vision

Pass images alongside text using ImageContent:

from synapsekit.multimodal import ImageContent

message = {
    "role": "user",
    "content": [
        ImageContent.from_url("https://example.com/chart.png"),
        {"type": "text", "text": "Describe this chart."},
    ],
}

response = await llm.generate(message)

# From a local file
with open("screenshot.png", "rb") as f:
    image = ImageContent.from_bytes(f.read(), media_type="image/png")

response = await llm.generate([image, "What's in this image?"])

LLMConfig options

Parameter            Type   Default   Description
model                str    required  OpenAI model name
api_key              str    required  Your OpenAI API key
temperature          float  1.0       Sampling temperature (0-2)
max_tokens           int    None      Maximum output tokens
seed                 int    None      For deterministic outputs
max_retries          int    3         Auto-retry on transient errors
requests_per_minute  int    None      Rate limit (RPM) throttle
cache_backend        str    None      "sqlite" or "lru"
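
These options compose. For instance, a deterministic, throttled client with extra retries (all parameters from the table above):

llm = OpenAILLM(LLMConfig(
    model="gpt-4o-mini",
    api_key="sk-...",
    temperature=0.0,         # greedy-ish sampling
    seed=42,                 # request reproducible outputs
    max_retries=5,           # retry transient errors up to 5 times
    requests_per_minute=60,  # client-side RPM throttle
))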

Response caching

Enable caching to avoid re-requesting the same prompt:

llm = OpenAILLM(LLMConfig(
    model="gpt-4o-mini",
    api_key="sk-...",
    cache_backend="sqlite",
    cache_path="~/.synapsekit/cache.db",
))

# First call hits the API
response1 = await llm.generate("What is Python?")

# Second call is served from cache
response2 = await llm.generate("What is Python?")

Cost tracking

from synapsekit.observability import CostTracker

tracker = CostTracker()
llm = OpenAILLM(LLMConfig(model="gpt-4o", api_key="sk-..."))
llm.attach_tracker(tracker)

await llm.generate("Summarize the Python docs.")
await llm.generate("Write a haiku about async.")

print(tracker.total_cost_usd)
print(tracker.summary())

Reasoning models (o1, o3)

The o1 and o3 series have different constraints:

# o1 does not support streaming or system messages
llm = OpenAILLM(LLMConfig(
    model="o1",
    api_key="sk-...",
    # Do not set temperature -- unsupported for o1
))

response = await llm.generate("Solve this logic puzzle: ...")
caution

o1 and o3-mini do not support stream(), system messages, or temperature. Use generate() only.
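
If an application mixes chat and reasoning models, a small routing helper keeps this constraint in one place. The helper below is illustrative, not part of SynapseKit:

REASONING_MODELS = {"o1", "o1-mini", "o3-mini"}

async def answer(llm, model: str, prompt: str) -> str:
    # Reasoning models must use generate(); everything else can stream.
    if model in REASONING_MODELS:
        return await llm.generate(prompt)
    chunks = []
    async for token in llm.stream(prompt):
        print(token, end="", flush=True)
        chunks.append(token)
    return "".join(chunks)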

Error handling

from synapsekit.exceptions import LLMError, RateLimitError, AuthenticationError

try:
    response = await llm.generate("Hello")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except LLMError as e:
    print(f"LLM error: {e}")

SynapseKit automatically retries on 429 Too Many Requests and 5xx errors up to max_retries times with exponential backoff.
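
The built-in behavior is roughly equivalent to this manual loop (a sketch for illustration only; the real backoff schedule is internal to SynapseKit):

import asyncio

from synapsekit.exceptions import RateLimitError

async def generate_with_backoff(llm, prompt: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries + 1):
        try:
            return await llm.generate(prompt)
        except RateLimitError as e:
            if attempt == max_retries:
                raise
            # Honor the server's hint when present, else back off exponentially.
            await asyncio.sleep(e.retry_after or 2 ** attempt)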

Using the RAG facade

from synapsekit import RAG

rag = RAG(model="gpt-4o-mini", api_key="sk-...")
rag.add("SynapseKit is a Python library for building LLM applications.")
rag.add_file("docs/readme.txt")

answer = rag.ask_sync("What is SynapseKit?")
print(answer)
tip

Set OPENAI_API_KEY in your environment and omit api_key from LLMConfig -- SynapseKit will pick it up automatically.
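
For example, with OPENAI_API_KEY exported in your shell:

from synapsekit.llms import OpenAILLM, LLMConfig

# api_key omitted -- read from the OPENAI_API_KEY environment variable.
llm = OpenAILLM(LLMConfig(model="gpt-4o-mini"))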