Source: Cerebras model docs. All model IDs use the prefix cerebras/. Powered by Cerebras wafer-scale chips — the world’s largest AI accelerator — delivering up to 3000+ tokens/second.

All Models
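Since every model below is addressed by a `cerebras/`-prefixed ID, a request body can be built the same way for all of them. The sketch below constructs an OpenAI-style chat-completions payload; the endpoint URL is a placeholder assumption (the docs above only specify the ID format, not a base URL).

```python
import json

# Placeholder endpoint -- an assumption for illustration, not from the docs above.
BASE_URL = "https://api.example-gateway.com/v1/chat/completions"

def build_chat_request(model_id: str, prompt: str, max_tokens: int = 256) -> str:
    """Build the JSON body for an OpenAI-style chat completion request
    against a cerebras/-prefixed model ID."""
    if not model_id.startswith("cerebras/"):
        raise ValueError("Cerebras model IDs use the 'cerebras/' prefix")
    body = {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(body)

payload = build_chat_request("cerebras/llama3.1-8b", "Say hello.")
```

The prefix check catches a common mistake: passing the bare model name (e.g. `llama3.1-8b`) instead of the fully qualified ID.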
gpt-oss-120b
Reasoning · Speed
cerebras/gpt-oss-120b
OpenAI’s open-weight MoE model with 120B total parameters (5.1B active per token), running at up to 3000 tokens/s on Cerebras wafer-scale hardware. Near-parity with o4-mini on reasoning benchmarks. Supports extended thinking. Apache 2.0.
- $0.35 / $0.75 per 1M tokens (input / output)
- 128K context
- Text input
- Thinking
- Knowledge cutoff Jun 2024
qwen-3-235b-a22b-instruct-2507
Reasoning · Speed
cerebras/qwen-3-235b-a22b-instruct-2507
Qwen 3 235B MoE model with 22B active parameters, running at ~1400 tokens/s on Cerebras wafer-scale hardware. Hybrid thinking mode with strong multilingual and agentic capabilities.
- $0.60 / $1.20 per 1M tokens (input / output)
- 128K context
- Text input
- Hybrid thinking
- Knowledge cutoff ~early 2025
zai-glm-4.7
Reasoning · Speed
cerebras/zai-glm-4.7
ZAI GLM 4.7 with 355B parameters, running at ~1000 tokens/s on Cerebras hardware. Strong multilingual reasoning and instruction-following capabilities.
- $2.25 / $2.75 per 1M tokens (input / output)
- 128K context
- Text input
- Knowledge cutoff ~early 2025
llama3.1-8b
Reasoning · Speed
cerebras/llama3.1-8b
Meta’s Llama 3.1 8B running at up to 2200 tokens/s on Cerebras wafer-scale hardware. The fastest and most affordable entry point for real-time applications.
- $0.10 / $0.10 per 1M tokens (input / output)
- 128K context
- Text input
- Knowledge cutoff Jul 2024
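The per-model prices listed above can be compared directly for a given workload. A minimal sketch, assuming the listed figures are USD per 1M tokens in input / output order (the standard convention; the unit is an assumption, as the listing gives only the raw numbers):

```python
# Prices copied from the listing above, interpreted as
# USD per 1M tokens (input, output) -- the per-1M unit is an assumption.
PRICES = {
    "cerebras/gpt-oss-120b": (0.35, 0.75),
    "cerebras/qwen-3-235b-a22b-instruct-2507": (0.60, 1.20),
    "cerebras/zai-glm-4.7": (2.25, 2.75),
    "cerebras/llama3.1-8b": (0.10, 0.10),
}

def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request for the given token counts."""
    in_price, out_price = PRICES[model_id]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 10K-in / 2K-out request on the cheapest model.
cost = estimate_cost("cerebras/llama3.1-8b", 10_000, 2_000)
```

For that example request, llama3.1-8b costs $0.0012 versus $0.005 on gpt-oss-120b, which is why the 8B model is described as the entry point for high-volume real-time use.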