Source: Cerebras model docs. All model IDs use the prefix cerebras/. Powered by Cerebras wafer-scale chips — the world’s largest AI accelerator — delivering up to 3000+ tokens/second.

All Models
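Since every model below is addressed by a `cerebras/`-prefixed ID, a request body can be built the same way for all of them. The sketch below constructs an OpenAI-style chat-completions payload; the endpoint URL is a placeholder assumption (the docs above only specify the ID format, not a base URL).

```python
import json

# Placeholder endpoint -- an assumption for illustration, not from the docs above.
BASE_URL = "https://api.example-gateway.com/v1/chat/completions"

def build_chat_request(model_id: str, prompt: str, max_tokens: int = 256) -> str:
    """Build the JSON body for an OpenAI-style chat completion request
    against a cerebras/-prefixed model ID."""
    if not model_id.startswith("cerebras/"):
        raise ValueError("Cerebras model IDs use the 'cerebras/' prefix")
    body = {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(body)

payload = build_chat_request("cerebras/llama3.1-8b", "Say hello.")
```

The prefix check catches a common mistake: passing the bare model name (e.g. `llama3.1-8b`) instead of the fully qualified ID.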
gpt-oss-120b
Reasoning · Speed
cerebras/gpt-oss-120b
OpenAI’s open-weight MoE model with 120B total parameters (5.1B active per token), running at up to 3000 tokens/s on Cerebras wafer-scale hardware. Near-parity with o4-mini on reasoning benchmarks. Supports extended thinking. Apache 2.0.
- $0.35 / $0.75 per 1M tokens (input / output)
- 128K context
- Text input
- Thinking
- Knowledge cutoff Jun 2024
qwen-3-235b-a22b-instruct-2507
Reasoning · Speed
cerebras/qwen-3-235b-a22b-instruct-2507
Qwen 3 235B MoE model with 22B active parameters, running at ~1400 tokens/s on Cerebras wafer-scale hardware. Hybrid thinking mode with strong multilingual and agentic capabilities.
- $0.60 / $1.20 per 1M tokens (input / output)
- 128K context
- Text input
- Hybrid thinking
- Knowledge cutoff ~early 2025
zai-glm-4.7
Reasoning · Speed
cerebras/zai-glm-4.7
ZAI GLM 4.7 with 355B parameters, running at ~1000 tokens/s on Cerebras hardware. Strong multilingual reasoning and instruction-following capabilities.
- $2.25 / $2.75 per 1M tokens (input / output)
- 128K context
- Text input
- Knowledge cutoff ~early 2025
llama3.1-8b
Reasoning · Speed
cerebras/llama3.1-8b
Meta’s Llama 3.1 8B running at up to 2200 tokens/s on Cerebras wafer-scale hardware. The fastest and most affordable entry point for real-time applications.
- $0.10 / $0.10 per 1M tokens (input / output)
- 128K context
- Text input
- Knowledge cutoff Jul 2024
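The per-model prices listed above can be compared directly for a given workload. A minimal sketch, assuming the listed figures are USD per 1M tokens in input / output order (the standard convention; the unit is an assumption, as the listing gives only the raw numbers):

```python
# Prices copied from the listing above, interpreted as
# USD per 1M tokens (input, output) -- the per-1M unit is an assumption.
PRICES = {
    "cerebras/gpt-oss-120b": (0.35, 0.75),
    "cerebras/qwen-3-235b-a22b-instruct-2507": (0.60, 1.20),
    "cerebras/zai-glm-4.7": (2.25, 2.75),
    "cerebras/llama3.1-8b": (0.10, 0.10),
}

def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request for the given token counts."""
    in_price, out_price = PRICES[model_id]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 10K-in / 2K-out request on the cheapest model.
cost = estimate_cost("cerebras/llama3.1-8b", 10_000, 2_000)
```

For that example request, llama3.1-8b costs $0.0012 versus $0.005 on gpt-oss-120b, which is why the 8B model is described as the entry point for high-volume real-time use.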