Source: Cerebras model docs. All model IDs use the prefix cerebras/. Powered by Cerebras wafer-scale chips, the world’s largest AI accelerator, delivering up to 3000 tokens/second.
All Models
gpt-oss-120b
Reasoning · Speed
cerebras/gpt-oss-120b
OpenAI’s open-weight MoE model with 120B total parameters (5.1B active per token), running at up to 3000 tokens/s on Cerebras wafer-scale hardware. Near-parity with o4-mini on reasoning benchmarks. Supports extended thinking. Apache 2.0.
- $0.35 / $0.75
- 128K context
- Text input
- Thinking
- Knowledge cutoff Jun 2024
zai-glm-4.7
Reasoning · Speed
cerebras/zai-glm-4.7
ZAI GLM 4.7 with 355B parameters, running at ~1000 tokens/s on Cerebras hardware. Strong multilingual reasoning and instruction-following capabilities.
- $2.25 / $2.75
- 128K context
- Text input
- Knowledge cutoff ~early 2025
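
As a rough illustration of the cerebras/ prefix convention, the two entries above could be kept in a small lookup table. This is a minimal sketch, not part of any Cerebras SDK: `ModelInfo`, `MODELS`, and `full_id` are hypothetical names, the field values are copied from the listings above, and the price strings are stored verbatim because the page does not state their units.

```python
from dataclasses import dataclass

PREFIX = "cerebras/"  # prefix stated at the top of the listing


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical record holding the fields shown on each model card."""
    model_id: str          # full ID, including the prefix
    context: str           # context window as listed
    thinking_listed: bool  # True only if the card shows a "Thinking" badge
    price: str             # copied verbatim; units are not stated in the source


# Values copied from the two cards above; nothing else is assumed.
MODELS = {
    "gpt-oss-120b": ModelInfo("cerebras/gpt-oss-120b", "128K", True, "$0.35 / $0.75"),
    "zai-glm-4.7": ModelInfo("cerebras/zai-glm-4.7", "128K", False, "$2.25 / $2.75"),
}


def full_id(name: str) -> str:
    """Return the prefixed model ID for a bare or already-prefixed name."""
    bare = name.removeprefix(PREFIX)  # requires Python 3.9+
    if bare not in MODELS:
        raise KeyError(f"unknown model: {name!r}")
    return PREFIX + bare


if __name__ == "__main__":
    for short_name in MODELS:
        print(full_id(short_name))
```

Accepting both bare and prefixed names keeps callers from accidentally producing a doubled prefix such as cerebras/cerebras/gpt-oss-120b.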