Source: Groq model docs. All model IDs use the prefix groq/. Ultra-low latency inference via custom LPU hardware.

All Models
meta-llama/llama-4-scout-17b-16e-instruct
Reasoning · Speed
groq/meta-llama/llama-4-scout-17b-16e-instruct
Natively multimodal MoE model with 17B active parameters and 16 experts, supporting an industry-leading 10M token context length.
- $0.11 / $0.34
- 10M context
- Text, Image input
- Knowledge cutoff Aug 2024
llama-3.3-70b-versatile
Reasoning · Speed
groq/llama-3.3-70b-versatile
Multilingual instruction-tuned model with 70B parameters, optimized for versatile tasks with Groq’s ultra-low latency inference.
- $0.59 / $0.79
- 128K context
- Text input
- Knowledge cutoff Dec 2023
llama-3.1-8b-instant
Reasoning · Speed
groq/llama-3.1-8b-instant
The most compact Llama 3.1 model with 8B parameters, optimized for instant responses on Groq’s LPU hardware.
- $0.05 / $0.08
- 128K context
- Text input
- Knowledge cutoff Jul 2024
qwen/qwen3-32b
Reasoning · Speed
groq/qwen/qwen3-32b
The largest dense model in the Qwen3 series with hybrid thinking mode, delivered at Groq’s ultra-low latency for seamless switching between deep reasoning and fast chat.
- $0.29 / $0.59
- 131K context
- Text input
- Hybrid thinking
- Knowledge cutoff ~early 2025
openai/gpt-oss-120b
Reasoning · Speed
groq/openai/gpt-oss-120b
OpenAI’s open-weight MoE model with 120B total parameters (5.1B active per token), running at Groq speeds. Near-parity with o4-mini on reasoning benchmarks. Apache 2.0.
- $0.15 / $0.75
- 128K context
- Text input
- Thinking
- Knowledge cutoff Jun 2024
openai/gpt-oss-20b
Reasoning · Speed
groq/openai/gpt-oss-20b
OpenAI’s compact open-weight MoE model with 20B total parameters (3.6B active), delivering results similar to o3-mini at Groq’s ultra-low latency. Apache 2.0.
- $0.075 / $0.30
- 128K context
- Text input
- Thinking
- Knowledge cutoff Jun 2024
moonshotai/kimi-k2-instruct-0905
Reasoning · Speed
groq/moonshotai/kimi-k2-instruct-0905
Moonshot AI’s general-purpose chat and agentic model with 1T total parameters (32B active), running at Groq speeds for rapid tool-calling and autonomous task execution.
- $1 / $3
- 256K context
- Text input
- Knowledge cutoff Sep 2024
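
The catalog above can be captured in a small lookup table. The following is a minimal Python sketch, not part of the Groq docs: it builds the prefixed ID from a listed model name and estimates the cost of a request. The helper names `full_model_id` and `estimate_cost` are hypothetical. The page does not say what the two price figures mean, so the sketch assumes they are USD per 1M input tokens and USD per 1M output tokens; check that against the provider's pricing page before relying on it.

```python
PREFIX = "groq/"  # prefix stated at the top of this page

# Price pairs copied from the catalog above, keyed by the unprefixed model name.
# ASSUMPTION: (USD per 1M input tokens, USD per 1M output tokens).
PRICES = {
    "meta-llama/llama-4-scout-17b-16e-instruct": (0.11, 0.34),
    "llama-3.3-70b-versatile": (0.59, 0.79),
    "llama-3.1-8b-instant": (0.05, 0.08),
    "qwen/qwen3-32b": (0.29, 0.59),
    "openai/gpt-oss-120b": (0.15, 0.75),
    "openai/gpt-oss-20b": (0.075, 0.30),
    "moonshotai/kimi-k2-instruct-0905": (1.0, 3.0),
}


def full_model_id(name: str) -> str:
    """Return the prefixed ID; accepts names with or without the prefix."""
    bare = name[len(PREFIX):] if name.startswith(PREFIX) else name
    if bare not in PRICES:
        raise KeyError(f"unknown model: {name}")
    return PREFIX + bare


def estimate_cost(name: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD under the per-1M-token assumption above."""
    bare = full_model_id(name)[len(PREFIX):]
    input_price, output_price = PRICES[bare]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    print(full_model_id("llama-3.1-8b-instant"))
    print(estimate_cost("openai/gpt-oss-20b", 200_000, 50_000))
```

Keying the table by the unprefixed name lets the same lookup serve callers that pass either form of the ID.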