Skip to main content

Cohere

Cohere provides Command R and Command R+ models optimized for retrieval-augmented generation, along with best-in-class embedding and reranking models.

Installation

pip install synapsekit[cohere]
export CO_API_KEY=...

Models

Chat / generation models

ModelContext windowBest forCost (input / output per 1M tokens)
command-r-plus-08-2024128KComplex reasoning, tool use$2.50 / $10.00
command-r-08-2024128KRAG, fast responses$0.15 / $0.60
command-r-plus128KPrevious generation flagship$3.00 / $15.00
command-r128KPrevious generation balanced$0.50 / $1.50
command4KLegacy, short tasks$1.00 / $2.00

Embedding models

ModelDimensionsBest for
embed-english-v3.01024English text
embed-multilingual-v3.01024100+ languages
embed-english-light-v3.0384Fast English

Reranking models

ModelBest for
rerank-english-v3.0High-accuracy English reranking
rerank-multilingual-v3.0Multilingual reranking

Via the RAG facade

from synapsekit import RAG

rag = RAG(model="command-r-08-2024", api_key="...")
rag.add("Your document text here")

answer = rag.ask_sync("Summarize the document.")

Direct usage

from synapsekit.llm.cohere import CohereLLM
from synapsekit.llm.base import LLMConfig

llm = CohereLLM(LLMConfig(
model="command-r-08-2024",
api_key="...",
provider="cohere",
temperature=0.3,
max_tokens=1024,
))

response = await llm.generate("Explain retrieval-augmented generation in three sentences")
print(response)

Streaming

async for token in llm.stream("Write a Python function to compute cosine similarity"):
print(token, end="", flush=True)

Function calling / tool use

Command R and Command R+ support native tool use:

from synapsekit.tools import tool
from synapsekit.agents import FunctionCallingAgent
from synapsekit.llm.cohere import CohereLLM

@tool
def search_documents(query: str, top_k: int = 5) -> list[dict]:
"""Search the document store for relevant passages."""
return [
{"id": 1, "text": "Vector embeddings represent semantic meaning...", "score": 0.92},
{"id": 2, "text": "RAG retrieves context before generation...", "score": 0.87},
]

@tool
def get_document(doc_id: int) -> str:
"""Retrieve the full text of a document by ID."""
docs = {1: "Full article on vector embeddings...", 2: "Full article on RAG..."}
return docs.get(doc_id, "Not found")

llm = CohereLLM(LLMConfig(model="command-r-plus-08-2024", api_key="...", provider="cohere"))
agent = FunctionCallingAgent(llm=llm, tools=[search_documents, get_document])

result = await agent.arun("Find articles about RAG and get the full content of the top result")
print(result)

Reranking

Rerank retrieval results for higher precision:

from synapsekit.retrievers.cohere_reranker import CohereReranker

reranker = CohereReranker(model="rerank-english-v3.0", api_key="...")

query = "What is RAG?"
documents = [
"RAG stands for Retrieval-Augmented Generation",
"Python is a programming language",
"RAG combines LLMs with document retrieval for accurate answers",
]

results = await reranker.rerank(query=query, documents=documents, top_n=2)
for r in results:
print(f"Score: {r.relevance_score:.3f}{r.document}")

Cost tracking

from synapsekit import CostTracker
from synapsekit.llm.cohere import CohereLLM

tracker = CostTracker()
llm = CohereLLM(LLMConfig(model="command-r-08-2024", api_key="...", provider="cohere"))

with tracker.scope("rag-query"):
response = await llm.generate("Summarize the benefits of RAG")
rec = tracker.record("command-r-08-2024", input_tokens=100, output_tokens=150)

print(f"Cost: ${rec.cost_usd:.6f}")

Error handling

import cohere
from synapsekit.llm.cohere import CohereLLM

llm = CohereLLM(LLMConfig(
model="command-r-08-2024",
api_key="...",
provider="cohere",
max_retries=3,
))

try:
response = await llm.generate("Hello")
except cohere.TooManyRequestsError:
print("Rate limit exceeded — reduce request frequency")
except cohere.UnauthorizedError:
print("Invalid API key — check CO_API_KEY")

Environment variables

VariableDescription
CO_API_KEYCohere API key
COHERE_API_KEYAlternative environment variable name

Supported models

  • command-r-plus-08-2024 — most capable
  • command-r-08-2024 — faster, cheaper, RAG-optimized
  • command-r-plus — previous generation
  • command-r — previous generation balanced
  • command — legacy

See Cohere docs for the full list.

See also