Source: Fireworks model docs. All model IDs use the prefix fireworks/accounts/fireworks/models/. Prices are listed as input / output per 1M tokens; models marked “on-demand” require a dedicated deployment.
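
The model IDs below drop into any OpenAI-compatible client pointed at the Fireworks inference endpoint. A minimal sketch, assuming the serverless base URL `https://api.fireworks.ai/inference/v1` and that the REST API expects the ID path starting at `accounts/` (the leading `fireworks/` is a client-side routing prefix); only the request body is built here, so nothing is sent over the network.

```python
# Fireworks serves an OpenAI-compatible chat completions API.
# The base URL below is the documented serverless endpoint; verify it
# against your account before use.
BASE_URL = "https://api.fireworks.ai/inference/v1"

def chat_payload(model_id: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a chat completion request body from a catalog model ID."""
    # IDs in this catalog carry a "fireworks/" routing prefix; the REST
    # API's "model" field starts at "accounts/" (assumption noted above).
    model = model_id.removeprefix("fireworks/")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = chat_payload(
    "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct",
    "Summarize MoE routing in one sentence.",
)
print(payload["model"])  # accounts/fireworks/models/llama-v3p1-8b-instruct
```

The same helper works for every entry below; only the `model` string changes.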

Meta Llama

llama4-maverick-instruct-basic

Reasoning · Speed
fireworks/accounts/fireworks/models/llama4-maverick-instruct-basic
Natively multimodal MoE with 17B active parameters and 128 experts (400B total), supporting a 1M token context for multimodal tasks.
  • $0.27 / $0.85
  • 1M context
  • Text, Image input
  • Knowledge cutoff Aug 2024
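
For the image-capable entries (Maverick and Scout), image input rides in the OpenAI-style content-parts message format. A sketch of building such a message with an inline base64 data URL; the data-URL convention is the usual OpenAI-compatible one, not verified against Fireworks-specific docs.

```python
import base64

def image_message(text: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a mixed text+image user message in OpenAI content-parts form."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Dummy bytes stand in for a real image file here.
msg = image_message("What is in this chart?", b"\x89PNG...")
print(msg["content"][1]["type"])  # image_url
```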

llama4-scout-instruct-basic

Reasoning · Speed
fireworks/accounts/fireworks/models/llama4-scout-instruct-basic
Natively multimodal MoE with 17B active parameters and 16 experts, featuring an industry-leading 10M token context on a single GPU.
  • 10M context
  • Text, Image input
  • Knowledge cutoff Aug 2024

llama-v3p3-70b-instruct

Reasoning · Speed
fireworks/accounts/fireworks/models/llama-v3p3-70b-instruct
Multilingual 70B-parameter model matching Llama 3.2 90B on text-only tasks, with broad language support.
  • ~$0.90 / ~$0.90
  • 128K context
  • Text input
  • Knowledge cutoff Dec 2023

llama-v3p1-405b-instruct

Reasoning · Speed
fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct
The largest Llama 3.1 model with 405B parameters, optimized for synthetic data generation, LLM-as-a-Judge, and distillation use cases.
  • ~$3 / ~$3
  • 128K context
  • Text input
  • Knowledge cutoff Jul 2024

llama-v3p1-70b-instruct

Reasoning · Speed
fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct
Multilingual 70B-parameter model designed for large-scale AI-native applications, with 128K context and 8-language support.
  • ~$0.90 / ~$0.90
  • 128K context
  • Text input
  • Knowledge cutoff Jul 2024

llama-v3p1-8b-instruct

Reasoning · Speed
fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct
The most compact Llama 3.1 model with 8B parameters, suited to efficient deployment on consumer GPUs, with multilingual support.
  • ~$0.20 / ~$0.20
  • 128K context
  • Text input
  • Knowledge cutoff Jul 2024

Qwen

qwen3-coder-480b-a35b-instruct

Reasoning · Speed
fireworks/accounts/fireworks/models/qwen3-coder-480b-a35b-instruct
Qwen’s most agentic code model, a 480B-parameter MoE (35B active) for agentic coding, browser use, and repository-scale tasks.
  • $0.45 / $1.80
  • 262K context
  • Text input

qwen3-235b-a22b

Reasoning · Speed
fireworks/accounts/fireworks/models/qwen3-235b-a22b
MoE model with 235B total parameters (22B active), supporting seamless switching between thinking and non-thinking modes across 100+ languages.
  • 262K context
  • Text input
  • Hybrid thinking
  • Knowledge cutoff ~early 2025
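
The “hybrid thinking” entries in this family can be toggled per turn. Qwen3’s usage notes describe `/think` and `/no_think` soft switches appended to the user message; treat the exact switch strings as an assumption here and verify against the model card.

```python
def qwen3_user_turn(text: str, thinking: bool) -> dict:
    """Build a user message with Qwen3's per-turn thinking soft switch.
    The /think and /no_think suffixes follow Qwen3's published usage
    notes (assumption: served model honors them unchanged)."""
    switch = "/think" if thinking else "/no_think"
    return {"role": "user", "content": f"{text} {switch}"}

print(qwen3_user_turn("Prove sqrt(2) is irrational.", thinking=True)["content"])
```

The same pattern applies to the other hybrid-thinking Qwen3 entries below.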

qwen3-32b

Reasoning · Speed
fireworks/accounts/fireworks/models/qwen3-32b
The largest dense Qwen3 model with 32B parameters and hybrid thinking mode. Performs on par with Qwen2.5-72B despite being less than half the size.
  • 131K context
  • Text input
  • Hybrid thinking
  • Knowledge cutoff ~early 2025

qwen3-8b

Reasoning · Speed
fireworks/accounts/fireworks/models/qwen3-8b
Dense 8B-parameter model with seamless switching between thinking and non-thinking modes. Apache 2.0.
  • ~$0.20 / ~$0.20
  • 131K context
  • Text input
  • Hybrid thinking
  • Knowledge cutoff ~early 2025

qwen2p5-72b-instruct

Reasoning · Speed
fireworks/accounts/fireworks/models/qwen2p5-72b-instruct
Flagship dense model of the Qwen2.5 series with 72B parameters, strong across coding, math, and instruction following, with 29+ language support.
  • 128K context
  • Text input
  • Knowledge cutoff ~Oct 2023

qwq-32b

Reasoning · Speed
fireworks/accounts/fireworks/models/qwq-32b
Qwen’s dedicated reasoning model with 32B parameters, trained via reinforcement learning with outcome-based rewards. Competitive with DeepSeek-R1 and o1-mini. Apache 2.0.
  • 131K context
  • Text input
  • Thinking
  • Knowledge cutoff ~late 2024

DeepSeek

deepseek-r1

Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-r1
Open reasoning model with 671B total parameters (37B active, MoE), trained via large-scale reinforcement learning. Achieves performance comparable to OpenAI o1. MIT license.
  • $3 / $8
  • 128K context
  • Text input
  • Thinking always
  • Knowledge cutoff Jul 2024
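
“Thinking always” models like R1 emit their reasoning trace before the answer; in raw text output the trace is conventionally wrapped in `<think>…</think>` tags. A sketch of separating the two, assuming that raw-text convention (some OpenAI-compatible servers instead return the trace in a separate `reasoning_content` field).

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Split a completion into (reasoning trace, final answer).
    Assumes the <think>...</think> raw-text convention; returns an empty
    trace if no think block is present."""
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", completion, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", completion.strip()

trace, answer = split_reasoning("<think>2+2=4</think>The answer is 4.")
print(answer)  # The answer is 4.
```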

deepseek-v3p2

Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-v3p2
MoE model harmonizing efficiency with superior reasoning and agent performance, featuring DeepSeek Sparse Attention for long-context efficiency.
  • $0.56 / $1.68
  • 163K context
  • Text input

deepseek-v3p1

Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-v3p1
Hybrid model supporting both thinking and non-thinking modes, with significantly improved tool usage and agent task performance.
  • ~$0.56 / ~$1.68
  • 128K context
  • Text input
  • Hybrid thinking
  • Knowledge cutoff ~mid 2025

deepseek-r1-0528

Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-r1-0528
Updated R1 (also called R1.1) with improved RL, reduced hallucinations, JSON output, and function calling support. Performance approaching o3 and Gemini 2.5 Pro.
  • $3 / $8
  • 128K context
  • Text input
  • Thinking always
  • Knowledge cutoff Jul 2024

deepseek-r1-distill-llama-70b

Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-r1-distill-llama-70b
A 70B dense model derived from Llama-3.3-70B, fine-tuned on reasoning data generated by DeepSeek-R1 for strong chain-of-thought in a smaller form factor.
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff Jul 2024

Kimi / MiniMax / GLM

kimi-k2p5

Reasoning · Speed
fireworks/accounts/fireworks/models/kimi-k2p5
Moonshot AI’s open-source multimodal model with 1T total parameters (32B active) and Agent Swarm technology for coordinating up to 100 specialized AI agents.
  • $0.60 / $3
  • 256K context
  • Text, Image, Video input
  • Thinking
  • Knowledge cutoff Apr 2024

kimi-k2-instruct-0905

Reasoning · Speed
fireworks/accounts/fireworks/models/kimi-k2-instruct-0905
General-purpose chat and agentic model optimized as a reflex-grade model without long thinking, with strong autonomous tool calling.
  • 256K context
  • Text input
  • Knowledge cutoff Sep 2024

kimi-k2-thinking

Reasoning · Speed
fireworks/accounts/fireworks/models/kimi-k2-thinking
Open-source thinking agent that reasons step-by-step while dynamically invoking tools, supporting 200–300 sequential tool calls without drift.
  • $0.60 / $2.50
  • 256K context
  • Text input
  • Thinking always
  • Knowledge cutoff ~Sep 2024

minimax-m2p5

Reasoning · Speed
fireworks/accounts/fireworks/models/minimax-m2p5
MoE model with 230B total parameters (10B active), achieving SOTA in coding and agentic tool use, completing tasks 37% faster than its predecessor.
  • 200K context
  • Text input

glm-5

Reasoning · Speed
fireworks/accounts/fireworks/models/glm-5
Zhipu AI’s fifth-generation model with ~745B MoE parameters (44B active), designed for complex system engineering and long-range agent tasks.
  • $1 / $3.20
  • 200K context
  • Text input
  • Thinking
  • Knowledge cutoff late 2025

glm-4p5

Reasoning · Speed
fireworks/accounts/fireworks/models/glm-4p5
Zhipu AI’s foundation model with strong coding, reasoning, and agentic capabilities for real-world development environments.
  • $0.22 / $0.88
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff ~mid 2024

Other

gpt-oss-120b

Reasoning · Speed
fireworks/accounts/fireworks/models/gpt-oss-120b
OpenAI’s open-weight MoE model with 120B total parameters (5.1B active), achieving near-parity with o4-mini on reasoning benchmarks. Apache 2.0.
  • $0.15 / $0.60
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff Jun 2024

gpt-oss-20b

Reasoning · Speed
fireworks/accounts/fireworks/models/gpt-oss-20b
OpenAI’s compact open-weight MoE model with 20B total parameters (3.6B active), similar to o3-mini. Runs on edge devices with 16GB memory. Apache 2.0.
  • $0.07 / $0.30
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff Jun 2024

mistral-large-3-fp8

Reasoning · Speed
fireworks/accounts/fireworks/models/mistral-large-3-fp8
Mistral’s open-weight multimodal frontier model with a granular MoE architecture (41B active, 675B total), supporting 256K context.
  • 256K context
  • Text, Image input
  • On-demand pricing

mistral-small-24b-instruct-2501

Reasoning · Speed
fireworks/accounts/fireworks/models/mistral-small-24b-instruct-2501
A 24B dense model with native function calling, JSON output, and dozens of languages. Fits on a single RTX 4090.
  • 32K context
  • Text input
  • Knowledge cutoff Oct 2023
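
Native function calling on entries like this one uses the OpenAI-style `tools` / `tool_choice` request fields. A sketch of declaring a tool and attaching it to a request body; the `get_weather` function and its parameters are invented for illustration.

```python
# OpenAI-style tool schema; the function name and parameters below are
# hypothetical, for illustration only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

tool_payload = {
    "model": "accounts/fireworks/models/mistral-small-24b-instruct-2501",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
print(tool_payload["tools"][0]["function"]["name"])  # get_weather
```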

gemma-3-27b-it

Reasoning · Speed
fireworks/accounts/fireworks/models/gemma-3-27b-it
Google’s multimodal open model with 27B parameters, supporting 128K context and 140+ languages on a single GPU/TPU.
  • $0.10 / $0.10
  • 128K context
  • Text, Image input
  • Knowledge cutoff Aug 2024
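
Since the prices above read as input / output per 1M tokens, estimating a request's serverless cost is a one-liner. A quick sketch using the gpt-oss-120b rates from this listing ($0.15 in / $0.60 out):

```python
def cost_usd(input_tokens: int, output_tokens: int,
             price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate serverless cost from per-1M-token input/output prices."""
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# gpt-oss-120b from the listing above: $0.15 input / $0.60 output per 1M tokens.
print(round(cost_usd(50_000, 10_000, 0.15, 0.60), 4))  # 0.0135
```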