Source: Cerebras model docs. All model IDs use the prefix cerebras/. Models run on Cerebras wafer-scale chips, the world's largest AI accelerator, serving responses at over 3,000 tokens/second.
Cerebras models are not yet available through the Timbal platform proxy. A CEREBRAS_API_KEY is required to use these models. If you’d like to access Cerebras models via your Timbal API key, please contact sales.
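Since Cerebras models are not proxied through Timbal yet, requests go directly to Cerebras using your CEREBRAS_API_KEY. A minimal sketch, assuming Cerebras's OpenAI-compatible chat-completions endpoint (https://api.cerebras.ai/v1); the helper that strips the cerebras/ prefix reflects the model-ID convention noted above and is illustrative, not part of any official SDK:

```python
import json
import os
import urllib.request

# Assumption: Cerebras exposes an OpenAI-compatible chat completions endpoint.
API_URL = "https://api.cerebras.ai/v1/chat/completions"

def provider_model_id(model_id: str) -> str:
    """Strip the Timbal-style 'cerebras/' prefix to get the raw Cerebras model name."""
    prefix = "cerebras/"
    return model_id[len(prefix):] if model_id.startswith(prefix) else model_id

def chat(model_id: str, prompt: str) -> str:
    """Send a single-turn chat request and return the assistant's reply."""
    payload = {
        "model": provider_model_id(model_id),
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            # CEREBRAS_API_KEY is required, per the note above.
            "Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Only attempt a live call when a key is actually configured.
if os.environ.get("CEREBRAS_API_KEY"):
    print(chat("cerebras/llama3.1-8b", "Hello!"))
```

The same pattern works for any model on this page; swap in the matching cerebras/ model ID.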

All Models

gpt-oss-120b

Reasoning · Speed · cerebras/gpt-oss-120b

OpenAI's open-weight MoE model with 120B total parameters (5.1B active per token), running at up to 3,000 tokens/s on Cerebras wafer-scale hardware. Near parity with o4-mini on reasoning benchmarks. Supports extended thinking. Apache 2.0 license.
  • $0.35 / $0.75 per 1M tokens (input / output)
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff Jun 2024

qwen-3-235b-a22b-instruct-2507

Reasoning · Speed · cerebras/qwen-3-235b-a22b-instruct-2507

Qwen 3 235B MoE model with 22B active parameters, running at ~1,400 tokens/s on Cerebras wafer-scale hardware. Hybrid thinking mode with strong multilingual and agentic capabilities.
  • $0.60 / $1.20 per 1M tokens (input / output)
  • 128K context
  • Text input
  • Hybrid thinking
  • Knowledge cutoff ~early 2025

zai-glm-4.7

Reasoning · Speed · cerebras/zai-glm-4.7

ZAI GLM 4.7 with 355B parameters, running at ~1,000 tokens/s on Cerebras hardware. Strong multilingual reasoning and instruction-following capabilities.
  • $2.25 / $2.75 per 1M tokens (input / output)
  • 128K context
  • Text input
  • Knowledge cutoff ~early 2025

llama3.1-8b

Reasoning · Speed · cerebras/llama3.1-8b

Meta's Llama 3.1 8B, running at up to 2,200 tokens/s on Cerebras wafer-scale hardware. The fastest and most affordable entry point for real-time applications.
  • $0.10 / $0.10 per 1M tokens (input / output)
  • 128K context
  • Text input
  • Knowledge cutoff Jul 2024
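The price pairs above translate directly into per-request cost estimates. A minimal sketch, assuming the listed prices are USD per 1M input / output tokens (the usual convention for model pricing tables; this page does not state the unit explicitly):

```python
# Listed prices from this page, assumed to be USD per 1M tokens (input, output).
PRICES = {
    "cerebras/gpt-oss-120b": (0.35, 0.75),
    "cerebras/qwen-3-235b-a22b-instruct-2507": (0.60, 1.20),
    "cerebras/zai-glm-4.7": (2.25, 2.75),
    "cerebras/llama3.1-8b": (0.10, 0.10),
}

def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost for one request under the per-1M-token assumption."""
    price_in, price_out = PRICES[model_id]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token reply on llama3.1-8b
# costs about $0.00025 under this assumption.
print(estimate_cost("cerebras/llama3.1-8b", 2_000, 500))
```

For a sense of scale, one million input tokens on gpt-oss-120b comes to $0.35, versus $2.25 on zai-glm-4.7.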