Web Search Agent

A web search agent closes the knowledge gap between an LLM's training cutoff and today's world. By combining a fast search tool with a scraper that reads full page content, the agent can answer questions about current events, prices, documentation, and anything else that lives on the web.

What you'll build: an agent that uses DuckDuckGo for quick searches, scrapes full page content when it needs more detail, and optionally upgrades to Tavily for richer structured results.

Time: ~15 min. Difficulty: Beginner

Prerequisites

pip install synapsekit
export OPENAI_API_KEY="sk-..."
# Optional, for Tavily:
export TAVILY_API_KEY="tvly-..."

What you'll learn

DuckDuckGoSearchTool — zero-API-key web search
WebScraperTool — follow URLs and extract page text
TavilySearchTool — structured search results with AI-optimized excerpts
How to compose search + scraping for deeper research
Switching between free and paid search providers

Step 1: Import tools

import asyncio
from synapsekit.agents import (
    FunctionCallingAgent,
    DuckDuckGoSearchTool,
    WebScraperTool,
    TavilySearchTool,
)
from synapsekit.llms.openai import OpenAILLM

Step 2: Build the agent with free search

DuckDuckGoSearchTool requires no API key and is sufficient for most general queries. Pair it with WebScraperTool so the agent can follow links from search results and read the full page when a snippet is not enough.

agent = FunctionCallingAgent(
    llm=OpenAILLM(model="gpt-4o-mini"),
    tools=[
        DuckDuckGoSearchTool(),
        WebScraperTool(),
    ],
    system_prompt=(
        "You are a research assistant. When answering questions about current events "
        "or facts that may have changed recently, always search the web first. "
        "If a search snippet is insufficient, scrape the full page for detail."
    ),
    max_iterations=6,
)

Step 3: Run a current-events query

async def search(question: str) -> str:
    return await agent.run(question)

print(asyncio.run(search("What is the latest stable Python release?")))

Step 4: Upgrade to Tavily for richer results

Tavily returns AI-curated excerpts, source URLs, and relevance scores. Use it when you need cleaner, more structured search results — especially for technical topics where DuckDuckGo may surface low-quality pages.

import os

# Use Tavily when the API key is available, fall back to DuckDuckGo otherwise
if os.getenv("TAVILY_API_KEY"):
    search_tool = TavilySearchTool()
else:
    search_tool = DuckDuckGoSearchTool()

premium_agent = FunctionCallingAgent(
    llm=OpenAILLM(model="gpt-4o-mini"),
    tools=[search_tool, WebScraperTool()],
    system_prompt="You are a research assistant with access to web search.",
    max_iterations=6,
)

Step 5: Stream the agent's search process

Seeing which URLs the agent decides to visit and which it skips helps you understand whether your system prompt is steering it correctly.

from synapsekit.agents import ActionEvent, FinalAnswerEvent, ObservationEvent

async def stream_search(question: str) -> None:
    async for event in agent.stream_steps(question):
        if isinstance(event, ActionEvent):
            tool_input = str(event.tool_input)
            print(f"[{event.tool}] {tool_input[:100]}")
        elif isinstance(event, ObservationEvent):
            print(f"  -> {event.observation[:150]}...")
        elif isinstance(event, FinalAnswerEvent):
            print(f"\nAnswer:\n{event.answer}")

Step 6: Batch multiple queries

For research tasks that require answering several related questions, run them sequentially and collect answers into a report.

async def batch_research(questions: list[str]) -> dict[str, str]:
    results = {}
    for question in questions:
        results[question] = await agent.run(question)
    return results
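
The dictionary returned by batch_research can be turned into a plain-text report with standard-library Python alone. The following is a minimal sketch: format_report is a hypothetical helper, not part of synapsekit, and the numbered layout is only one possible choice.

```python
def format_report(results: dict[str, str]) -> str:
    """Render question/answer pairs as a numbered plain-text report."""
    sections = []
    for number, (question, answer) in enumerate(results.items(), start=1):
        # Strip stray whitespace so every section has the same shape.
        sections.append(f"{number}. {question.strip()}\n{answer.strip()}")
    # Separate sections with a blank line; an empty input gives an empty report.
    return "\n\n".join(sections)
```

Because dictionaries preserve insertion order, the report lists the questions in the order they were asked.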

Complete working example

import asyncio
import os
from synapsekit.agents import (
    ActionEvent,
    DuckDuckGoSearchTool,
    FinalAnswerEvent,
    FunctionCallingAgent,
    ObservationEvent,
    TavilySearchTool,
    WebScraperTool,
)
from synapsekit.llms.openai import OpenAILLM


def build_agent() -> FunctionCallingAgent:
    # Prefer Tavily for richer results; DuckDuckGo requires no key
    search_tool = (
        TavilySearchTool() if os.getenv("TAVILY_API_KEY") else DuckDuckGoSearchTool()
    )
    return FunctionCallingAgent(
        llm=OpenAILLM(model="gpt-4o-mini"),
        tools=[search_tool, WebScraperTool()],
        system_prompt=(
            "You are a concise research assistant. Search the web for up-to-date information. "
            "Cite at least one source URL in every answer."
        ),
        max_iterations=6,
    )


async def main() -> None:
    agent = build_agent()

    questions = [
        "What is the current version of Python and when was it released?",
        "What are the top AI announcements from the past month?",
    ]

    for question in questions:
        print(f"\nQ: {question}")
        print("-" * 60)

        async for event in agent.stream_steps(question):
            if isinstance(event, ActionEvent):
                print(f"  Calling {event.tool}: {str(event.tool_input)[:80]}")
            elif isinstance(event, ObservationEvent):
                print(f"  Got: {event.observation[:120]}...")
            elif isinstance(event, FinalAnswerEvent):
                print(f"\n{event.answer}")


asyncio.run(main())

Expected output

Q: What is the current version of Python and when was it released?
------------------------------------------------------------
  Calling duck_duck_go_search: current Python version 2025
  Got: Python 3.13 was released in October 2024...
  Calling web_scraper: https://www.python.org/downloads/
  Got: Python 3.13.2 - Feb. 4, 2025...

Python 3.13.2 is the latest stable release, published on February 4, 2025.
Source: https://www.python.org/downloads/

How it works

DuckDuckGoSearchTool uses the DuckDuckGo HTML API to return a list of result titles, URLs, and snippets. WebScraperTool takes a URL and returns the visible text content of the page after stripping HTML tags. The agent decides when to scrape by evaluating whether a snippet answers the question — this decision is made by the LLM based on your system_prompt instructions.
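
To make "visible text after stripping HTML tags" concrete, here is a minimal sketch of that kind of extraction using only the standard library. It is an illustration, not WebScraperTool's actual implementation; VisibleTextParser and visible_text are hypothetical names, and the set of skipped elements is an assumption.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping content inside non-visible elements."""

    # Assumption: these elements never contribute readable text.
    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not nested in a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def visible_text(html: str) -> str:
    """Return the text content of an HTML document without tags or scripts."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()
```

A page that builds its content in the browser yields almost nothing under this approach: visible_text('<div id="root"></div><script>render()</script>') returns an empty string, because the markup holds no text until the script runs.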

TavilySearchTool calls the Tavily API, which runs its own AI-powered extraction layer to return cleaner, more relevant excerpts. It reduces the need to follow up with WebScraperTool on individual pages.

Variations

Restrict to specific domains by post-processing search results in a wrapper tool:

from synapsekit.agents import BaseTool, DuckDuckGoSearchTool, ToolResult

class RestrictedSearchTool(BaseTool):
    name = "search"
    description = "Search for information on trusted sources only."
    parameters = DuckDuckGoSearchTool.parameters  # reuse schema

    def __init__(self, allowed_domains: list[str]) -> None:
        self._inner = DuckDuckGoSearchTool()
        self._allowed = allowed_domains

    async def run(self, **kwargs) -> ToolResult:
        result = await self._inner.run(**kwargs)
        # Keep only the lines that mention an allowed domain
        filtered = "\n".join(
            line for line in result.output.splitlines()
            if any(d in line for d in self._allowed)
        )
        # If nothing matches, the unfiltered results are returned as a fallback
        return ToolResult(output=filtered or result.output)
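
The substring test in the wrapper above also accepts look-alike hosts: "python.org" is a substring of "python.org.evil.example". If the search output exposes full URLs (an assumption about its format), a stricter check can compare parsed hostnames instead. The helper below is a hypothetical sketch of that check.

```python
from urllib.parse import urlparse


def host_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    # hostname is None for strings that are not absolute URLs.
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain.lower() or host.endswith("." + domain.lower())
        for domain in allowed_domains
    )
```

With allowed_domains=["python.org"], this accepts https://docs.python.org/3/ and rejects both https://notpython.org/ and https://python.org.evil.example/.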

Add a cache to avoid repeating identical searches within a session:

from synapsekit.agents import BaseTool, DuckDuckGoSearchTool, ToolResult

class CachedSearchTool(BaseTool):
    name = "search"
    description = "Search the web with result caching."
    parameters = {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}

    def __init__(self) -> None:
        self._inner = DuckDuckGoSearchTool()
        self._cache: dict[str, ToolResult] = {}

    async def run(self, query: str = "", **kwargs) -> ToolResult:
        if query in self._cache:
            return self._cache[query]
        result = await self._inner.run(query=query)
        self._cache[query] = result
        return result
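
The cache above is keyed on the raw query string and never expires. In a long-running session, web results go stale, and queries that differ only in case or spacing miss the cache. The sketch below shows one way to handle both with a normalized key and a time-to-live. TTLCache is a hypothetical class that stores plain strings, and it is independent of synapsekit.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, str]] = {}

    @staticmethod
    def key(query: str) -> str:
        # Ignore case and redundant whitespace so near-identical queries share an entry.
        return " ".join(query.lower().split())

    def get(self, query: str) -> Optional[str]:
        key = self.key(query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            # Entry is stale: drop it so the caller searches again.
            del self._entries[key]
            return None
        return value

    def put(self, query: str, value: str) -> None:
        self._entries[self.key(query)] = (self._clock(), value)
```

Inside a tool such as CachedSearchTool, run would call get first and fall through to the inner search, followed by put, only on a miss.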

Troubleshooting

DuckDuckGo returns empty results: DuckDuckGo rate-limits aggressive automated queries. Add a short await asyncio.sleep(1) between searches or switch to TavilySearchTool.

WebScraperTool returns garbled or near-empty text: some sites render their content with JavaScript, which the scraper does not execute. The observation will contain little usable text, so the agent should fall back to the search snippet.

Agent scrapes too many pages and hits max_iterations: add "Do not scrape more than two pages per question" to system_prompt.

Tavily key not recognized: export the environment variable before starting the Python process, for example export TAVILY_API_KEY="tvly-...".
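
For the empty-results case, a fixed pause can be replaced with a retry that backs off exponentially. The following is a minimal sketch under stated assumptions: search_with_retry is a hypothetical helper, and search stands for any coroutine that takes a query and returns result text. It does not use a synapsekit API.

```python
import asyncio
from typing import Awaitable, Callable


async def search_with_retry(
    search: Callable[[str], Awaitable[str]],
    query: str,
    attempts: int = 3,
    base_delay: float = 1.0,
) -> str:
    """Call search(query), retrying with exponential backoff while the result is empty."""
    result = ""
    for attempt in range(attempts):
        result = await search(query)
        if result.strip():
            return result
        if attempt < attempts - 1:
            # Wait base_delay, then 2x, then 4x ... before the next try.
            await asyncio.sleep(base_delay * 2**attempt)
    # Every attempt came back empty; return the last (empty) result.
    return result
```

With the defaults, the helper waits 1 second and then 2 seconds between its three attempts, which keeps the total delay bounded while giving a rate limit time to clear.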

Next steps

ReAct Research Assistant — add Wikipedia and arXiv alongside web search
Multi-Tool Orchestration — combine search with calculators, databases, and code execution
Agent with Safety Guardrails — validate that output does not contain PII or blocked topics
