
Structured Output with Function Calling


Getting structured data out of an LLM — a typed dict, a list of objects, or a nested model — is one of the most practical patterns in production LLM applications. Function calling is the most reliable mechanism: the LLM fills in a JSON schema instead of generating free text, and Pydantic validates the result.

What you'll build: an agent that returns typed Pydantic models for tasks like entity extraction, report generation, and multi-step research with a structured summary.

Time: ~20 min. Difficulty: Intermediate

Prerequisites

pip install synapsekit pydantic
export OPENAI_API_KEY="sk-..."

What you'll learn

  • How to use a Pydantic model as a tool schema to force structured output
  • The "extraction tool" pattern: define the desired schema as a tool, the LLM "calls" it to produce output
  • How to validate and deserialize LLM output into typed Python objects
  • Combining structured output with real tool calls in the same agent
  • Handling optional fields and nested models

Step 1: Define output schemas as Pydantic models

import asyncio
import json
from typing import Any
from pydantic import BaseModel, Field
from synapsekit.agents import BaseTool, FunctionCallingAgent, ToolResult
from synapsekit.llms.openai import OpenAILLM

Define the target shape of the output using Pydantic:

class CompanyProfile(BaseModel):
    name: str = Field(description="Official company name")
    industry: str = Field(description="Primary industry sector")
    founded_year: int | None = Field(None, description="Year the company was founded")
    headquarters: str = Field(description="City and country of headquarters")
    key_products: list[str] = Field(description="Top 3-5 products or services")
    competitors: list[str] = Field(description="Main competitors")
    summary: str = Field(description="2-3 sentence company summary")


class NewsArticle(BaseModel):
    title: str
    source: str
    published_date: str | None = None
    key_points: list[str] = Field(description="3-5 key takeaways from the article")
    sentiment: str = Field(description="Overall sentiment: positive, negative, or neutral")


class ResearchReport(BaseModel):
    topic: str
    findings: list[str] = Field(description="Key facts and findings, one per item")
    sources: list[str] = Field(description="Source names or URLs")
    confidence: str = Field(description="Confidence level: high, medium, or low")
    conclusion: str = Field(description="Single-paragraph conclusion")
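These are ordinary Pydantic models, so you can check locally — with no agent or API key — exactly what will happen when the extraction tool receives the LLM's arguments. A minimal sketch, using the ResearchReport model defined above:

```python
from pydantic import BaseModel, Field, ValidationError


class ResearchReport(BaseModel):
    topic: str
    findings: list[str] = Field(description="Key facts and findings, one per item")
    sources: list[str] = Field(description="Source names or URLs")
    confidence: str = Field(description="Confidence level: high, medium, or low")
    conclusion: str = Field(description="Single-paragraph conclusion")


# A well-formed tool-call payload validates into a typed object
payload = {
    "topic": "Quantum computing",
    "findings": ["Finding one", "Finding two"],
    "sources": ["Wikipedia"],
    "confidence": "high",
    "conclusion": "A short conclusion.",
}
report = ResearchReport(**payload)

# A payload missing a required field raises ValidationError; the
# extraction tool surfaces this back to the LLM as an error message
try:
    ResearchReport(topic="x", findings=[], sources=[], confidence="high")
    count = 0
except ValidationError as e:
    count = e.error_count()
print(count)  # 1: only "conclusion" is missing
```

This is exactly the validation path the extraction tool in Step 2 relies on.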

Step 2: Create extraction tools from Pydantic models

The "extraction tool" pattern treats the Pydantic schema as a tool that the LLM "calls" to produce structured output. The tool's job is to receive the validated JSON, deserialize it, and store or return it.

def make_extraction_tool(model_class: type[BaseModel], tool_name: str, description: str) -> BaseTool:
    """Create a BaseTool that extracts data into a Pydantic model."""

    # Convert the Pydantic schema to an OpenAI-compatible JSON Schema
    pydantic_schema = model_class.model_json_schema()

    class ExtractionTool(BaseTool):
        name = tool_name
        description = description
        parameters = pydantic_schema

        # Store the last extracted result for retrieval after the run
        last_result: BaseModel | None = None

        async def run(self, **kwargs: Any) -> ToolResult:
            try:
                # Pydantic validates and coerces the LLM's JSON output
                instance = model_class(**kwargs)
                ExtractionTool.last_result = instance
                return ToolResult(output=instance.model_dump_json())
            except Exception as e:
                return ToolResult(output="", error=f"Schema validation failed: {e}")

    return ExtractionTool()

Step 3: Build a structured extraction agent

company_extractor = make_extraction_tool(
    CompanyProfile,
    tool_name="extract_company_profile",
    description=(
        "Call this tool to return a structured company profile. "
        "Use it when you have gathered enough information to fill all fields."
    ),
)

agent = FunctionCallingAgent(
    llm=OpenAILLM(model="gpt-4o-mini"),
    tools=[company_extractor],
    system_prompt=(
        "You are a business analyst. When asked about a company, "
        "call extract_company_profile with all available information. "
        "Do not answer in plain text — always call the tool."
    ),
    max_iterations=3,
)

Step 4: Combine real tools with structured output

For agents that need to search before extracting, combine research tools with the extraction tool:

from synapsekit.agents import DuckDuckGoSearchTool, WikipediaTool

research_extractor = make_extraction_tool(
    ResearchReport,
    tool_name="submit_research_report",
    description=(
        "Call this tool AFTER completing research to submit the final structured report. "
        "Fill all fields based on information gathered from search and Wikipedia."
    ),
)

research_agent = FunctionCallingAgent(
    llm=OpenAILLM(model="gpt-4o-mini"),
    tools=[
        DuckDuckGoSearchTool(),
        WikipediaTool(),
        research_extractor,
    ],
    system_prompt=(
        "You are a research analyst. Research the given topic using search and Wikipedia, "
        "then submit a structured report using submit_research_report. "
        "Always call submit_research_report as your final action."
    ),
    max_iterations=8,
)

Step 5: Extract and validate the typed result

After agent.run(), recover the typed Pydantic object from the extraction tool:

async def extract_company(company_name: str) -> CompanyProfile | None:
    await agent.run(f"Create a detailed profile for the company: {company_name}")
    return company_extractor.__class__.last_result

Complete working example

import asyncio
from typing import Any
from pydantic import BaseModel, Field
from synapsekit.agents import (
    BaseTool,
    DuckDuckGoSearchTool,
    FunctionCallingAgent,
    ToolResult,
    WikipediaTool,
)
from synapsekit.llms.openai import OpenAILLM


class ResearchReport(BaseModel):
    topic: str = Field(description="Research topic")
    findings: list[str] = Field(description="3-5 key findings, one per list item")
    sources: list[str] = Field(description="Source names referenced")
    confidence: str = Field(description="Confidence level: high, medium, or low")
    conclusion: str = Field(description="One-paragraph conclusion")


def make_report_tool() -> BaseTool:
    class ReportTool(BaseTool):
        name = "submit_research_report"
        description = (
            "Submit the final structured research report. "
            "Call this as your last action after gathering all information."
        )
        parameters = ResearchReport.model_json_schema()
        last_result: ResearchReport | None = None

        async def run(self, **kwargs: Any) -> ToolResult:
            try:
                report = ResearchReport(**kwargs)
                ReportTool.last_result = report
                return ToolResult(output=f"Report submitted: {report.topic}")
            except Exception as e:
                return ToolResult(output="", error=str(e))

    return ReportTool()


async def research_topic(topic: str) -> ResearchReport | None:
    report_tool = make_report_tool()

    agent = FunctionCallingAgent(
        llm=OpenAILLM(model="gpt-4o-mini"),
        tools=[
            DuckDuckGoSearchTool(),
            WikipediaTool(),
            report_tool,
        ],
        system_prompt=(
            "You are a research analyst. Use search and Wikipedia to research the topic. "
            "Then call submit_research_report with structured findings. "
            "Always end by calling submit_research_report."
        ),
        max_iterations=8,
    )

    await agent.run(f"Research the following topic and submit a report: {topic}")
    return report_tool.__class__.last_result


async def main() -> None:
    topics = [
        "The current state of quantum computing hardware",
        "SynapseKit Python library for LLM applications",
    ]

    for topic in topics:
        print(f"\nResearching: {topic}")
        print("-" * 60)
        report = await research_topic(topic)

        if report is None:
            print("No report generated.")
            continue

        print(f"Topic: {report.topic}")
        print(f"Confidence: {report.confidence}")
        print("\nFindings:")
        for i, finding in enumerate(report.findings, 1):
            print(f"  {i}. {finding}")
        print(f"\nSources: {', '.join(report.sources)}")
        print(f"\nConclusion:\n{report.conclusion}")

        # Access as a typed Python object: no manual JSON parsing needed
        report_dict = report.model_dump()
        print(f"\nJSON keys: {list(report_dict.keys())}")


asyncio.run(main())

Expected output

Researching: The current state of quantum computing hardware
------------------------------------------------------------
Topic: The current state of quantum computing hardware
Confidence: high

Findings:
1. IBM's Condor processor reached 1,000+ qubits in late 2023
2. Google achieved quantum supremacy benchmarks with Sycamore
3. Error correction remains the primary engineering challenge
4. Trapped-ion and superconducting approaches are the leading architectures

Sources: Wikipedia, DuckDuckGo search results
Conclusion:
Quantum computing hardware has advanced significantly, with IBM and Google leading...

JSON keys: ['topic', 'findings', 'sources', 'confidence', 'conclusion']

How it works

The extraction tool pattern works because the LLM interprets the instruction "call this tool to return your answer" as the termination condition. Instead of generating free text, it fills the tool's JSON schema — which Pydantic then validates. If required fields are missing or types are wrong, model_class(**kwargs) raises a ValidationError, which the tool returns as a ToolResult error, giving the LLM a chance to retry with corrected values.

model_json_schema() (Pydantic v2) converts the model's field definitions, type annotations, and Field(description=...) metadata into an OpenAI-compatible JSON Schema object. The description strings become the parameter descriptions that guide the LLM's field population.

Variations

Extract a list of objects by wrapping in a container model:

class ArticleList(BaseModel):
    articles: list[NewsArticle]
    total_count: int

list_extractor = make_extraction_tool(ArticleList, "submit_articles", "Submit a list of news articles.")

Use OpenAI's response_format for simpler cases (no tool call needed):

# Note: response_format is available on OpenAILLM via the underlying SDK
# For full control, the extraction tool pattern is more portable across providers

Alternatively, recover the object without storing it on the tool by scanning the agent's memory for the JSON observation that the extraction tool emitted via model_dump_json():

async def run_and_parse(agent, question: str, model_class: type[BaseModel]) -> BaseModel | None:
    await agent.run(question)
    for step in agent.memory.steps:
        try:
            data = json.loads(step.observation)
            return model_class(**data)
        except Exception:
            continue
    return None

Troubleshooting

ValidationError: field required — the LLM did not fill all required fields. Add field_name: str | None = None to make fields optional, or strengthen the system prompt: "You must populate every field in the schema."

Agent calls the extraction tool immediately without researching — the description says "call when done" but the LLM ignores it. Add to system_prompt: "You must call at least one research tool before calling submit_research_report."

model_json_schema() not found — you are using Pydantic v1. Replace with model_class.schema() for v1 compatibility.

Nested model fields populated as strings instead of objects — the LLM serialized the nested model as a JSON string. Normalize with json.loads() before passing to Pydantic, or use model_validate_json() instead of model_class(**kwargs).
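A sketch of the normalization step (the `Company`/`Address` models are illustrative, not from the tutorial):

```python
import json
from pydantic import BaseModel


class Address(BaseModel):
    city: str
    country: str


class Company(BaseModel):
    name: str
    headquarters: Address


# Some models emit a nested object as a JSON *string* rather than an object
raw = {"name": "Acme", "headquarters": '{"city": "Berlin", "country": "Germany"}'}

# Normalize string-valued fields that look like JSON objects before validating
for key, value in raw.items():
    if isinstance(value, str) and value.lstrip().startswith("{"):
        raw[key] = json.loads(value)

company = Company(**raw)
assert company.headquarters.city == "Berlin"

# model_validate_json covers the related case where the WHOLE payload
# arrives as one JSON string
company2 = Company.model_validate_json(
    '{"name": "Acme", "headquarters": {"city": "Oslo", "country": "Norway"}}'
)
assert company2.headquarters.country == "Norway"
```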

Next steps