Hugging Face

Access thousands of open-source models via the Hugging Face Inference API. Both the free Serverless Inference API and Dedicated Inference Endpoints are supported.

Install

pip install synapsekit[huggingface]

Setup

export HUGGINGFACE_API_KEY=hf_...

Usage

from synapsekit.llm.huggingface import HuggingFaceLLM
from synapsekit import LLMConfig
import os

# Serverless Inference API
config = LLMConfig(
    model="meta-llama/Llama-3.2-3B-Instruct",
    api_key=os.environ["HUGGINGFACE_API_KEY"],
    provider="huggingface",
)

llm = HuggingFaceLLM(config)

# Streaming
async for token in llm.stream("Explain transformer architecture"):
    print(token, end="")

# Generate
response = await llm.generate("What is the Hugging Face Hub?")
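
stream and generate are coroutines, so outside an already-async context (a notebook, an async web handler) they need an event loop. A minimal sketch wrapping the calls above with asyncio.run, assuming only the HuggingFaceLLM API shown on this page:

import asyncio
import os

from synapsekit import LLMConfig
from synapsekit.llm.huggingface import HuggingFaceLLM

async def main() -> None:
    config = LLMConfig(
        model="meta-llama/Llama-3.2-3B-Instruct",
        api_key=os.environ["HUGGINGFACE_API_KEY"],
        provider="huggingface",
    )
    llm = HuggingFaceLLM(config)

    # Stream tokens as they arrive, then request a full completion.
    async for token in llm.stream("Explain transformer architecture"):
        print(token, end="")
    print(await llm.generate("What is the Hugging Face Hub?"))

asyncio.run(main())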

Dedicated Inference Endpoints

llm = HuggingFaceLLM(
    config,
    endpoint_url="https://your-endpoint.huggingface.cloud",
)
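
A client constructed against a dedicated endpoint is used exactly like the serverless one. A minimal sketch, assuming endpoint_url only redirects requests and the rest of the HuggingFaceLLM API is unchanged; the HF_ENDPOINT_URL variable name is an assumption for illustration, not part of synapsekit:

import os

# Assumed convention: read the endpoint from the environment so the same code
# runs serverless (variable unset) or against a dedicated endpoint (variable set).
endpoint = os.environ.get("HF_ENDPOINT_URL")
llm = (
    HuggingFaceLLM(config, endpoint_url=endpoint)
    if endpoint
    else HuggingFaceLLM(config)
)

response = await llm.generate("What is the Hugging Face Hub?")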

Model                                 Notes
meta-llama/Llama-3.2-3B-Instruct      Fast, small Llama 3.2
meta-llama/Llama-3.1-8B-Instruct      Balanced quality
mistralai/Mistral-7B-Instruct-v0.3    Mistral 7B
HuggingFaceH4/zephyr-7b-beta          Strong instruction following
Qwen/Qwen2.5-7B-Instruct              Multilingual, 7B
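
Any model ID from the table drops into LLMConfig unchanged. A quick sketch, using only the API shown above:

config = LLMConfig(
    model="Qwen/Qwen2.5-7B-Instruct",  # swap in any model from the table
    api_key=os.environ["HUGGINGFACE_API_KEY"],
    provider="huggingface",
)
llm = HuggingFaceLLM(config)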

Via RAG facade

from synapsekit import RAG
import os

rag = RAG(
    model="meta-llama/Llama-3.2-3B-Instruct",
    api_key=os.environ["HUGGINGFACE_API_KEY"],
    provider="huggingface",
)
rag.add("Your document text here")
answer = rag.ask_sync("Summarize this.")
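
The facade can index several documents before answering. A slightly fuller sketch of the same flow, assuming only the RAG, add, and ask_sync calls shown above:

# Index several snippets, then ask a question grounded in all of them.
for doc in [
    "Hugging Face hosts models, datasets, and demos on the Hub.",
    "Dedicated Inference Endpoints serve a model on reserved infrastructure.",
]:
    rag.add(doc)

print(rag.ask_sync("How do Dedicated Inference Endpoints relate to the Hub?"))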
Tip: The free Serverless Inference API has rate limits. For production use, deploy a Dedicated Inference Endpoint and pass its URL via endpoint_url.
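
Until you move to a dedicated endpoint, a retry with exponential backoff can smooth over serverless rate limits. A minimal sketch; the exact exception synapsekit raises on a 429 is not documented here, so this catches a generic Exception and should be narrowed in real code:

import asyncio

async def generate_with_retry(llm, prompt: str, attempts: int = 5) -> str:
    delay = 1.0
    for attempt in range(attempts):
        try:
            return await llm.generate(prompt)
        except Exception:  # narrow to synapsekit's rate-limit error in practice
            if attempt == attempts - 1:
                raise  # out of retries, surface the error
            await asyncio.sleep(delay)  # back off before the next attempt
            delay *= 2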