Fireworks AI

Fireworks AI provides optimized inference for open-source models behind an OpenAI-compatible API. It offers some of the lowest latency available for popular models such as Llama and Mixtral, and its FireFunction models are purpose-built for reliable tool use.

Install

pip install synapsekit[openai]

Fireworks AI exposes an OpenAI-compatible API, so this integration requires the openai package.

Basic usage

from synapsekit import LLMConfig
from synapsekit.llm.fireworks import FireworksLLM

llm = FireworksLLM(LLMConfig(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    api_key="fw_...",
))

response = await llm.generate("Explain the difference between RAG and fine-tuning.")
print(response)
# RAG retrieves relevant context at inference time, while fine-tuning...

Streaming

from synapsekit import LLMConfig
from synapsekit.llm.fireworks import FireworksLLM

llm = FireworksLLM(LLMConfig(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    api_key="fw_...",
    temperature=0.6,
))

async for token in llm.stream("Write a Python function to parse JSON safely."):
    print(token, end="", flush=True)
# def safe_json_parse(text: str) -> dict | None:
#     try:
#         return json.loads(text)
#     except json.JSONDecodeError:
#         return None

Available models

| Model | ID | Context | Notes |
| --- | --- | --- | --- |
| Llama 3.3 70B | accounts/fireworks/models/llama-v3p3-70b-instruct | 131K | Best quality |
| Llama 3.1 8B | accounts/fireworks/models/llama-v3p1-8b-instruct | 131K | Fast, cheap |
| Mixtral 8x7B | accounts/fireworks/models/mixtral-8x7b-instruct | 32K | Strong reasoning |
| Qwen 2.5 72B | accounts/fireworks/models/qwen2p5-72b-instruct | 131K | Multilingual |
| FireFunction v2 | accounts/fireworks/models/firefunction-v2 | 8K | Optimized for tool use |
| Llama 3.1 405B | accounts/fireworks/models/llama-v3p1-405b-instruct | 131K | Largest open model |

See the full list at fireworks.ai/models.
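Since the full accounts/fireworks/models/... paths are easy to mistype, one convenient pattern is a small alias map. The IDs below are copied from the table above; the short alias keys are our own convention, not part of SynapseKit:

```python
# Full Fireworks model IDs (from the table above), keyed by informal alias.
FIREWORKS_MODELS = {
    "llama-70b": "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "llama-8b": "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "mixtral-8x7b": "accounts/fireworks/models/mixtral-8x7b-instruct",
    "qwen-72b": "accounts/fireworks/models/qwen2p5-72b-instruct",
    "firefunction-v2": "accounts/fireworks/models/firefunction-v2",
    "llama-405b": "accounts/fireworks/models/llama-v3p1-405b-instruct",
}

print(FIREWORKS_MODELS["llama-8b"])
# accounts/fireworks/models/llama-v3p1-8b-instruct
```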

Function calling

Fireworks offers FireFunction-v2, a model specifically optimized for reliable function calling:

from synapsekit import FunctionCallingAgent, tool
from synapsekit import LLMConfig
from synapsekit.llm.fireworks import FireworksLLM

@tool
def search_documentation(query: str, max_results: int = 3) -> list:
    """Search the SynapseKit documentation for a query."""
    # In practice, run a vector search
    return [
        {"title": f"Result {i}: {query}", "url": f"https://docs.example.com/{i}"}
        for i in range(1, max_results + 1)
    ]

@tool
def create_github_issue(title: str, body: str, labels: list[str] | None = None) -> dict:
    """Create a GitHub issue in the SynapseKit repository."""
    return {
        "number": 42,
        "title": title,
        "url": "https://github.com/SynapseKit/SynapseKit/issues/42",
        "labels": labels or [],
    }

# Use FireFunction-v2 for most reliable tool calling
llm = FireworksLLM(LLMConfig(
    model="accounts/fireworks/models/firefunction-v2",
    api_key="fw_...",
))

agent = FunctionCallingAgent(llm=llm, tools=[search_documentation, create_github_issue])
answer = await agent.run(
    "Search for 'streaming' in the docs and create an issue to improve those docs."
)
print(answer)
# Found 3 results for 'streaming'. Created issue #42: 'Improve streaming documentation'.

Raw call_with_tools

For lower-level control, you can skip the @tool decorator and pass an OpenAI-style tool schema directly to call_with_tools:

tools = [
    {
        "type": "function",
        "function": {
            "name": "classify_text",
            "description": "Classify text into a category",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "categories": {
                        "type": "array",
                        "items": {"type": "string"},
                    },
                },
                "required": ["text", "categories"],
            },
        },
    }
]

result = await llm.call_with_tools(
    messages=[{"role": "user", "content": "Is 'I love this product!' positive or negative?"}],
    tools=tools,
)
# result["tool_calls"] → [{"name": "classify_text", "arguments": {"text": "I love this product!", "categories": ["positive", "negative", "neutral"]}}]
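With the raw API, executing the requested calls is your responsibility. Assuming the result shape shown in the comment above (a tool_calls list of name/arguments dicts — verify against your SynapseKit version), a hypothetical dispatch helper might look like:

```python
def dispatch_tool_calls(result: dict, registry: dict) -> list:
    """Run each requested tool call against a name -> callable registry."""
    outputs = []
    for call in result.get("tool_calls", []):
        fn = registry[call["name"]]  # raises KeyError on unknown tool names
        outputs.append(fn(**call["arguments"]))
    return outputs


def classify_text(text: str, categories: list[str]) -> str:
    # Toy stand-in for a real classifier.
    return "positive" if "love" in text.lower() else categories[-1]


# The same result shape as shown above, hard-coded for the demo.
result = {
    "tool_calls": [
        {
            "name": "classify_text",
            "arguments": {
                "text": "I love this product!",
                "categories": ["positive", "negative", "neutral"],
            },
        }
    ]
}
outputs = dispatch_tool_calls(result, {"classify_text": classify_text})
print(outputs)  # ['positive']
```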

FireFunction models

Fireworks' FireFunction models are fine-tuned versions of Llama specifically for tool use:

| Model | Best for |
| --- | --- |
| accounts/fireworks/models/firefunction-v2 | Reliable single and parallel tool calls |

FireFunction-v2 is recommended over general-purpose models when your agent makes many tool calls, as it produces cleaner JSON arguments and fewer hallucinated tool names.
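Even with FireFunction-v2, it is worth guarding against hallucinated tool names before dispatching. A minimal sketch (validate_tool_calls is our own helper, not a SynapseKit API):

```python
def validate_tool_calls(tool_calls: list[dict], known_tools: set[str]) -> list[dict]:
    """Keep only tool calls whose name matches a registered tool."""
    valid = [c for c in tool_calls if c["name"] in known_tools]
    rejected = [c["name"] for c in tool_calls if c["name"] not in known_tools]
    if rejected:
        print(f"Dropped hallucinated tool call(s): {rejected}")
    return valid


calls = [
    {"name": "classify_text", "arguments": {"text": "hi"}},
    {"name": "delete_database", "arguments": {}},  # not a registered tool
]
valid = validate_tool_calls(calls, {"classify_text"})
print([c["name"] for c in valid])  # ['classify_text']
```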

Custom base URL

llm = FireworksLLM(
    LLMConfig(model="accounts/fireworks/models/llama-v3p3-70b-instruct", api_key="fw_..."),
    base_url="http://localhost:8000/v1",
)

Cost tracking

from synapsekit.observability import CostTracker

tracker = CostTracker()
llm = FireworksLLM(LLMConfig(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    api_key="fw_...",
))
llm.attach_tracker(tracker)

for i in range(10):
    await llm.generate(f"Summarize paragraph {i}.")

print(f"Total cost: ${tracker.total_cost_usd:.6f}")
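The accounting behind a tracker like this can be sketched with per-token pricing. The class below is illustrative only, and the rates are placeholders, not Fireworks' actual prices:

```python
class SimpleCostTracker:
    """Accumulate estimated USD spend from per-call token counts."""

    # Illustrative $/1M-token (input, output) rates -- NOT real Fireworks pricing.
    PRICES = {"accounts/fireworks/models/llama-v3p1-8b-instruct": (0.20, 0.20)}

    def __init__(self) -> None:
        self.total_cost_usd = 0.0

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        in_rate, out_rate = self.PRICES[model]
        self.total_cost_usd += (
            prompt_tokens * in_rate + completion_tokens * out_rate
        ) / 1_000_000


tracker = SimpleCostTracker()
tracker.record("accounts/fireworks/models/llama-v3p1-8b-instruct", 1000, 500)
print(f"Total cost: ${tracker.total_cost_usd:.6f}")  # Total cost: $0.000300
```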

Parameters reference

| Parameter | Description |
| --- | --- |
| model | Fireworks model ID (full accounts/fireworks/models/... path) |
| api_key | Your Fireworks API key (starts with fw_) |
| temperature | Sampling temperature (0.0–1.0) |
| max_tokens | Maximum output tokens |
| base_url | Custom API base URL (default: https://api.fireworks.ai/inference/v1) |

Error handling

from synapsekit.exceptions import LLMError, RateLimitError, AuthenticationError

try:
    response = await llm.generate("Hello")
except AuthenticationError:
    print("Invalid API key — get one at fireworks.ai")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except LLMError as e:
    print(f"Fireworks error: {e}")
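A retry loop with exponential backoff is a common companion to rate-limit handling. The sketch below is self-contained: RuntimeError stands in for synapsekit's RateLimitError, and the flaky stub simulates a provider that rejects the first two calls:

```python
import asyncio
import random


async def generate_with_retry(generate, prompt, max_attempts=4, base_delay=1.0):
    """Call an async generate function, retrying with exponential backoff + jitter."""
    for attempt in range(max_attempts):
        try:
            return await generate(prompt)
        except RuntimeError:  # stand-in for synapsekit's RateLimitError
            if attempt == max_attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt + random.random() * 0.05)


# Demo with a stub that fails twice, then succeeds.
attempts = {"n": 0}

async def flaky(prompt):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 rate limited")
    return f"ok: {prompt}"

result = asyncio.run(generate_with_retry(flaky, "Hello", base_delay=0.01))
print(result)  # ok: Hello
```

In real code you would also honor e.retry_after from RateLimitError when it is set, rather than relying purely on the computed backoff.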
Tip: Use firefunction-v2 when building production agents that need reliable tool calling. For general Q&A workloads, llama-v3p3-70b-instruct offers the best quality-to-cost ratio.