SQL Database Agent

A SQL agent lets non-technical users query a database in plain English. Instead of writing SELECT COUNT(*) FROM orders WHERE status = 'shipped', they can ask "How many orders shipped this week?" and the agent translates the question, runs the query safely, and explains the results. What you'll build: a natural-language analytics assistant backed by an in-memory SQLite database, using SQLQueryTool for safe SELECT-only execution and SQLSchemaInspectionTool for schema discovery. Time: ~20 min. Difficulty: Intermediate

Prerequisites

pip install synapsekit
export OPENAI_API_KEY="sk-..."
# For PostgreSQL/MySQL, also: pip install sqlalchemy

What you'll learn

SQLQueryTool — parameterized SELECT-only query execution
SQLSchemaInspectionTool — automatic schema discovery for LLM context
How to seed an in-memory SQLite database for demos
Security: why read-only enforcement matters and how it works
Connecting to real databases via SQLAlchemy connection strings

Step 1: Import tools

import asyncio
import sqlite3
from synapsekit.agents import (
    FunctionCallingAgent,
    SQLQueryTool,
    SQLSchemaInspectionTool,
)
from synapsekit.llms.openai import OpenAILLM

Step 2: Create and seed a demo database

For production, point connection_string at your real database. For this guide we create an in-memory SQLite file on disk so both tools can connect independently.

def create_demo_db(path: str = "demo.db") -> str:
    """Seed a demo database with orders, customers, and products."""
    conn = sqlite3.connect(path)
    cursor = conn.cursor()

    cursor.executescript("""
        CREATE TABLE IF NOT EXISTS customers (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            email TEXT NOT NULL,
            country TEXT NOT NULL
        );

        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            category TEXT NOT NULL,
            price REAL NOT NULL
        );

        CREATE TABLE IF NOT EXISTS orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            product_id INTEGER REFERENCES products(id),
            quantity INTEGER NOT NULL,
            status TEXT NOT NULL,
            created_at TEXT NOT NULL
        );
    """)

    customers = [
        (1, "Alice Johnson", "alice@example.com", "USA"),
        (2, "Bob Smith", "bob@example.com", "Canada"),
        (3, "Carlos Rivera", "carlos@example.com", "Mexico"),
        (4, "Diana Wei", "diana@example.com", "USA"),
    ]
    cursor.executemany(
        "INSERT OR IGNORE INTO customers VALUES (?, ?, ?, ?)", customers
    )

    products = [
        (1, "Wireless Headphones", "Electronics", 89.99),
        (2, "Standing Desk", "Furniture", 349.00),
        (3, "Python Book", "Books", 49.99),
        (4, "Coffee Maker", "Appliances", 129.00),
    ]
    cursor.executemany(
        "INSERT OR IGNORE INTO products VALUES (?, ?, ?, ?)", products
    )

    orders = [
        (1, 1, 1, 2, "shipped", "2025-04-01"),
        (2, 2, 3, 1, "pending", "2025-04-03"),
        (3, 1, 4, 1, "shipped", "2025-04-05"),
        (4, 3, 2, 1, "delivered", "2025-03-28"),
        (5, 4, 1, 3, "shipped", "2025-04-08"),
        (6, 2, 4, 2, "pending", "2025-04-09"),
    ]
    cursor.executemany(
        "INSERT OR IGNORE INTO orders VALUES (?, ?, ?, ?, ?, ?)", orders
    )

    conn.commit()
    conn.close()
    return path

Step 3: Configure the SQL tools

SQLQueryTool enforces SELECT-only at the Python level before the query reaches the database driver — so even if the LLM hallucinates a DROP TABLE, it will be blocked.

db_path = create_demo_db()

query_tool = SQLQueryTool(
    connection_string=db_path,
    max_rows=50,           # cap result size to prevent huge observations
)

schema_tool = SQLSchemaInspectionTool(
    connection_string=db_path,
)

Step 4: Build the agent with schema context

The SQLSchemaInspectionTool lets the agent discover table names and column types on demand rather than requiring you to hard-code them in the system prompt. This is especially valuable when the schema changes or is too large to embed manually.

agent = FunctionCallingAgent(
    llm=OpenAILLM(model="gpt-4o-mini"),
    tools=[schema_tool, query_tool],
    system_prompt=(
        "You are a data analyst assistant. "
        "Before writing any SQL query, always call sql_schema_inspection first "
        "to understand the available tables and columns. "
        "Only use SELECT statements — never modify data. "
        "Return results in a readable format and explain what they mean."
    ),
    max_iterations=8,
)

Step 5: Query in natural language

async def query(question: str) -> str:
    return await agent.run(question)

Step 6: Connect to a real database

For PostgreSQL, MySQL, or other SQLAlchemy-supported databases, pass a full connection URL:

# PostgreSQL
pg_query_tool = SQLQueryTool(
    connection_string="postgresql://user:password@localhost:5432/mydb",
    max_rows=100,
)

# MySQL
mysql_query_tool = SQLQueryTool(
    connection_string="mysql+pymysql://user:password@localhost/mydb",
    max_rows=100,
)

Complete working example

import asyncio
import sqlite3
from synapsekit.agents import (
    ActionEvent,
    FinalAnswerEvent,
    FunctionCallingAgent,
    ObservationEvent,
    SQLQueryTool,
    SQLSchemaInspectionTool,
)
from synapsekit.llms.openai import OpenAILLM


def seed_db(path: str = "demo.db") -> str:
    conn = sqlite3.connect(path)
    c = conn.cursor()
    c.executescript("""
        CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
        CREATE TABLE IF NOT EXISTS products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
        CREATE TABLE IF NOT EXISTS orders (
            id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER,
            quantity INTEGER, status TEXT, created_at TEXT
        );
    """)
    c.executemany("INSERT OR IGNORE INTO customers VALUES (?,?,?)",
                  [(1,"Alice","USA"),(2,"Bob","Canada"),(3,"Carlos","Mexico")])
    c.executemany("INSERT OR IGNORE INTO products VALUES (?,?,?)",
                  [(1,"Headphones",89.99),(2,"Desk",349.00),(3,"Book",49.99)])
    c.executemany("INSERT OR IGNORE INTO orders VALUES (?,?,?,?,?,?)",
                  [(1,1,1,2,"shipped","2025-04-01"),
                   (2,2,3,1,"pending","2025-04-03"),
                   (3,1,2,1,"shipped","2025-04-05"),
                   (4,3,1,3,"delivered","2025-03-28")])
    conn.commit()
    conn.close()
    return path


def build_agent(db_path: str) -> FunctionCallingAgent:
    return FunctionCallingAgent(
        llm=OpenAILLM(model="gpt-4o-mini"),
        tools=[
            SQLSchemaInspectionTool(connection_string=db_path),
            SQLQueryTool(connection_string=db_path, max_rows=50),
        ],
        system_prompt=(
            "You are a SQL analytics assistant. "
            "Always inspect the schema before writing queries. "
            "Only use SELECT. Explain query results in plain English."
        ),
        max_iterations=8,
    )


async def main() -> None:
    db_path = seed_db()
    agent = build_agent(db_path)

    questions = [
        "How many orders are in each status category?",
        "Which customer has spent the most money in total?",
        "What are the top 3 products by total revenue?",
    ]

    for question in questions:
        print(f"\nQ: {question}")
        print("-" * 60)

        async for event in agent.stream_steps(question):
            if isinstance(event, ActionEvent):
                print(f"[{event.tool}] {str(event.tool_input)[:100]}")
            elif isinstance(event, ObservationEvent):
                print(f"  {event.observation[:200]}")
            elif isinstance(event, FinalAnswerEvent):
                print(f"\n{event.answer}")


asyncio.run(main())

Expected output

Q: How many orders are in each status category?
------------------------------------------------------------
[sql_schema_inspection] {}
  orders: id, customer_id, product_id, quantity, status, created_at
[sql_query] SELECT status, COUNT(*) as count FROM orders GROUP BY status
  status | count
  --- | ---
  delivered | 1
  pending | 1
  shipped | 2

There are 2 shipped orders, 1 pending order, and 1 delivered order.

Q: Which customer has spent the most money in total?
------------------------------------------------------------
[sql_query] SELECT c.name, SUM(p.price * o.quantity) as total ...
  name | total
  --- | ---
  Alice | 528.98

Alice has spent the most, with a total of $528.98 across her orders.

How it works

SQLQueryTool strips leading whitespace and checks that the query starts with SELECT before sending it to the database driver. Non-SELECT statements (INSERT, UPDATE, DELETE, DROP) are rejected with a ToolResult error before any database connection is opened. Parameters passed alongside the query are injected via the driver's native parameterized-query mechanism, preventing SQL injection regardless of what the LLM generates.

SQLSchemaInspectionTool uses PRAGMA table_info (SQLite) or information_schema.columns (SQLAlchemy) to discover the schema dynamically, then formats it as a compact text block that fits easily in the observation without blowing up the context window.

Variations

Hard-code the schema in the system prompt when you want to save one tool call per query on stable schemas:

schema_hint = """
Tables:
- customers(id, name, country)
- products(id, name, price)
- orders(id, customer_id, product_id, quantity, status, created_at)
"""
agent = FunctionCallingAgent(
    llm=OpenAILLM(model="gpt-4o-mini"),
    tools=[SQLQueryTool(connection_string=db_path)],
    system_prompt=f"You are a SQL assistant.\n\nSchema:\n{schema_hint}",
)

Chain with CodeInterpreterTool to visualize query results:

from synapsekit.agents import CodeInterpreterTool

agent = FunctionCallingAgent(
    llm=OpenAILLM(model="gpt-4o-mini"),
    tools=[
        SQLSchemaInspectionTool(connection_string=db_path),
        SQLQueryTool(connection_string=db_path),
        CodeInterpreterTool(timeout=10.0),
    ],
    system_prompt=(
        "Query the database and then use code_interpreter to create a "
        "matplotlib chart of the results."
    ),
)

Troubleshooting

Only SELECT queries are allowed error — the LLM generated a non-SELECT statement. Reinforce in system_prompt: "You may only use SELECT. Never generate INSERT, UPDATE, DELETE, or DROP."

Schema tool returns empty tables — the database file path is wrong or the file was not created before the agent started. Check that seed_db() runs before build_agent().

SQLAlchemy not found — install it for non-SQLite databases: pip install sqlalchemy. Also install the driver for your database: pip install psycopg2-binary (PostgreSQL) or pip install pymysql (MySQL).

Agent writes SQL with wrong column names — the schema observation was cut off. Reduce max_rows or increase max_iterations so the agent has room to call the schema tool without running out of steps.

Next steps

Code Execution Agent — visualize query results with matplotlib
Multi-Tool Orchestration — combine SQL with web search for enriched analytics
Structured Output with Function Calling — return query results as typed Pydantic models

Prerequisites​

What you'll learn​

Step 1: Import tools​

Step 2: Create and seed a demo database​

Step 3: Configure the SQL tools​

Step 4: Build the agent with schema context​

Step 5: Query in natural language​

Step 6: Connect to a real database​

Complete working example​

Expected output​

How it works​

Variations​

Troubleshooting​

Next steps​