Source: Groq model docs. All model IDs use the prefix groq/. Ultra-low latency inference via custom LPU hardware.
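
As a minimal sketch of the prefix rule above, the helpers below add or remove the `groq/` prefix on a model name. The function names `to_prefixed_id` and `strip_prefix` are hypothetical and do not come from any Groq SDK; they only illustrate the naming convention stated here.

```python
GROQ_PREFIX = "groq/"


def to_prefixed_id(model_name: str) -> str:
    """Return the model ID with the groq/ prefix, adding it only if missing."""
    name = model_name.strip()
    return name if name.startswith(GROQ_PREFIX) else GROQ_PREFIX + name


def strip_prefix(model_id: str) -> str:
    """Return the bare model name, removing a leading groq/ prefix if present."""
    if model_id.startswith(GROQ_PREFIX):
        return model_id[len(GROQ_PREFIX):]
    return model_id


if __name__ == "__main__":
    # Names that already contain a vendor segment keep it after the prefix.
    print(to_prefixed_id("qwen/qwen3-32b"))
    print(strip_prefix("groq/llama-3.1-8b-instant"))
```

Making `to_prefixed_id` idempotent means it can be applied to user input whether or not the prefix was already typed.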

All Models

meta-llama/llama-4-scout-17b-16e-instruct

Reasoning · Speed
groq/meta-llama/llama-4-scout-17b-16e-instruct
Natively multimodal MoE model with 17B active parameters and 16 experts, supporting an industry-leading 10M-token context length.
  • $0.11 / $0.34 per 1M tokens (input / output)
  • 10M context
  • Text, Image input
  • Knowledge cutoff Aug 2024

llama-3.3-70b-versatile

Reasoning · Speed
groq/llama-3.3-70b-versatile
Multilingual instruction-tuned model with 70B parameters, optimized for versatile tasks with Groq’s ultra-low latency inference.
  • $0.59 / $0.79 per 1M tokens (input / output)
  • 128K context
  • Text input
  • Knowledge cutoff Dec 2023

llama-3.1-8b-instant

Reasoning · Speed
groq/llama-3.1-8b-instant
The most compact Llama 3.1 model with 8B parameters, optimized for instant responses on Groq’s LPU hardware.
  • $0.05 / $0.08 per 1M tokens (input / output)
  • 128K context
  • Text input
  • Knowledge cutoff Jul 2024

qwen/qwen3-32b

Reasoning · Speed
groq/qwen/qwen3-32b
The largest dense model in the Qwen3 series with hybrid thinking mode, delivered at Groq’s ultra-low latency for seamless switching between deep reasoning and fast chat.
  • $0.29 / $0.59 per 1M tokens (input / output)
  • 131K context
  • Text input
  • Hybrid thinking
  • Knowledge cutoff ~early 2025

openai/gpt-oss-120b

Reasoning · Speed
groq/openai/gpt-oss-120b
OpenAI’s open-weight MoE model with 120B total parameters (5.1B active per token), running at Groq speeds. Near-parity with o4-mini on reasoning benchmarks. Apache 2.0.
  • $0.15 / $0.75 per 1M tokens (input / output)
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff Jun 2024

openai/gpt-oss-20b

Reasoning · Speed
groq/openai/gpt-oss-20b
OpenAI’s compact open-weight MoE model with 20B total parameters (3.6B active), delivering results similar to o3-mini at Groq’s ultra-low latency. Apache 2.0.
  • $0.075 / $0.30 per 1M tokens (input / output)
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff Jun 2024

moonshotai/kimi-k2-instruct-0905

Reasoning · Speed
groq/moonshotai/kimi-k2-instruct-0905
Moonshot AI’s general-purpose chat and agentic model with 1T total parameters (32B active), running at Groq speeds for rapid tool-calling and autonomous task execution.
  • $1 / $3 per 1M tokens (input / output)
  • 256K context
  • Text input
  • Knowledge cutoff Sep 2024
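
The price pairs listed above can be turned into a rough per-request cost estimate. The sketch below assumes each pair is USD per 1M input tokens followed by USD per 1M output tokens; that reading is an assumption, not something the listing states. The `PRICES` table and the `estimate_cost` function are hypothetical names used only for illustration.

```python
# Price pairs copied from the listing above.
# ASSUMPTION: (USD per 1M input tokens, USD per 1M output tokens).
PRICES = {
    "groq/meta-llama/llama-4-scout-17b-16e-instruct": (0.11, 0.34),
    "groq/llama-3.3-70b-versatile": (0.59, 0.79),
    "groq/llama-3.1-8b-instant": (0.05, 0.08),
    "groq/qwen/qwen3-32b": (0.29, 0.59),
    "groq/openai/gpt-oss-120b": (0.15, 0.75),
    "groq/openai/gpt-oss-20b": (0.075, 0.30),
    "groq/moonshotai/kimi-k2-instruct-0905": (1.0, 3.0),
}

TOKENS_PER_PRICE_UNIT = 1_000_000  # assumed pricing unit


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request under the assumed pricing unit."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_price, output_price = PRICES[model_id]  # KeyError for unknown IDs
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Compare every listed model on the same hypothetical workload.
    for model_id in PRICES:
        cost = estimate_cost(model_id, input_tokens=20_000, output_tokens=2_000)
        print(f"{model_id}: ${cost:.6f}")
```

Because output tokens are priced higher than input tokens for every model listed, the output length of a workload can matter as much as the choice of model.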