Hugging Face

Access thousands of open-source models via the Hugging Face Inference API. Both the free Serverless Inference API and Dedicated Inference Endpoints are supported.

Install

pip install synapsekit[huggingface]

Setup

export HUGGINGFACE_API_KEY=hf_...

Usage

from synapsekit.llm.huggingface import HuggingFaceLLM
from synapsekit import LLMConfig
import os

# Serverless Inference API
config = LLMConfig(
    model="meta-llama/Llama-3.2-3B-Instruct",
    api_key=os.environ["HUGGINGFACE_API_KEY"],
    provider="huggingface",
)

llm = HuggingFaceLLM(config)

# Streaming
async for token in llm.stream("Explain transformer architecture"):
    print(token, end="")

# Generate
response = await llm.generate("What is the Hugging Face Hub?")
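
stream and generate are coroutines, so outside an already-async context (a notebook, an async web handler) they need an event loop. A minimal sketch wrapping the calls above with asyncio.run, assuming only the HuggingFaceLLM API shown on this page:

import asyncio
import os

from synapsekit import LLMConfig
from synapsekit.llm.huggingface import HuggingFaceLLM

async def main() -> None:
    config = LLMConfig(
        model="meta-llama/Llama-3.2-3B-Instruct",
        api_key=os.environ["HUGGINGFACE_API_KEY"],
        provider="huggingface",
    )
    llm = HuggingFaceLLM(config)

    # Stream tokens as they arrive, then request a full completion.
    async for token in llm.stream("Explain transformer architecture"):
        print(token, end="")
    print(await llm.generate("What is the Hugging Face Hub?"))

asyncio.run(main())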

Dedicated Inference Endpoints

llm = HuggingFaceLLM(
    config,
    endpoint_url="https://your-endpoint.huggingface.cloud",
)
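
A client constructed against a dedicated endpoint is used exactly like the serverless one. A minimal sketch, assuming endpoint_url only redirects requests and the rest of the HuggingFaceLLM API is unchanged; the HF_ENDPOINT_URL variable name is an assumption for illustration, not part of synapsekit:

import os

# Assumed convention: read the endpoint from the environment so the same code
# runs serverless (variable unset) or against a dedicated endpoint (variable set).
endpoint = os.environ.get("HF_ENDPOINT_URL")
llm = (
    HuggingFaceLLM(config, endpoint_url=endpoint)
    if endpoint
    else HuggingFaceLLM(config)
)

response = await llm.generate("What is the Hugging Face Hub?")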

Model                                 Notes
meta-llama/Llama-3.2-3B-Instruct      Fast, small Llama 3.2
meta-llama/Llama-3.1-8B-Instruct      Balanced quality
mistralai/Mistral-7B-Instruct-v0.3    Mistral 7B
HuggingFaceH4/zephyr-7b-beta          Strong instruction following
Qwen/Qwen2.5-7B-Instruct              Multilingual, 7B
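
Any model ID from the table drops into LLMConfig unchanged. A quick sketch, using only the API shown above:

config = LLMConfig(
    model="Qwen/Qwen2.5-7B-Instruct",  # swap in any model from the table
    api_key=os.environ["HUGGINGFACE_API_KEY"],
    provider="huggingface",
)
llm = HuggingFaceLLM(config)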

Via RAG facade

from synapsekit import RAG
import os

rag = RAG(
    model="meta-llama/Llama-3.2-3B-Instruct",
    api_key=os.environ["HUGGINGFACE_API_KEY"],
    provider="huggingface",
)
rag.add("Your document text here")
answer = rag.ask_sync("Summarize this.")
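
The facade can index several documents before answering. A slightly fuller sketch of the same flow, assuming only the RAG, add, and ask_sync calls shown above:

# Index several snippets, then ask a question grounded in all of them.
for doc in [
    "Hugging Face hosts models, datasets, and demos on the Hub.",
    "Dedicated Inference Endpoints serve a model on reserved infrastructure.",
]:
    rag.add(doc)

print(rag.ask_sync("How do Dedicated Inference Endpoints relate to the Hub?"))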
Tip: The free Serverless Inference API has rate limits. For production use, deploy a Dedicated Inference Endpoint and pass its URL via endpoint_url.
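
Until you move to a dedicated endpoint, a retry with exponential backoff can smooth over serverless rate limits. A minimal sketch; the exact exception synapsekit raises on a 429 is not documented here, so this catches a generic Exception and should be narrowed in real code:

import asyncio

async def generate_with_retry(llm, prompt: str, attempts: int = 5) -> str:
    delay = 1.0
    for attempt in range(attempts):
        try:
            return await llm.generate(prompt)
        except Exception:  # narrow to synapsekit's rate-limit error in practice
            if attempt == attempts - 1:
                raise  # out of retries, surface the error
            await asyncio.sleep(delay)  # back off before the next attempt
            delay *= 2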