Source: Fireworks model docs. All model IDs use the prefix fireworks/accounts/fireworks/models/. Prices are listed as input / output per 1M tokens; models marked “on-demand” require a dedicated deployment.
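
The model IDs below drop into any OpenAI-compatible client pointed at the Fireworks inference endpoint. A minimal sketch, assuming the serverless base URL `https://api.fireworks.ai/inference/v1` and that the REST API expects the ID path starting at `accounts/` (the leading `fireworks/` is a client-side routing prefix); only the request body is built here, so nothing is sent over the network.

```python
# Fireworks serves an OpenAI-compatible chat completions API.
# The base URL below is the documented serverless endpoint; verify it
# against your account before use.
BASE_URL = "https://api.fireworks.ai/inference/v1"

def chat_payload(model_id: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a chat completion request body from a catalog model ID."""
    # IDs in this catalog carry a "fireworks/" routing prefix; the REST
    # API's "model" field starts at "accounts/" (assumption noted above).
    model = model_id.removeprefix("fireworks/")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = chat_payload(
    "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct",
    "Summarize MoE routing in one sentence.",
)
print(payload["model"])  # accounts/fireworks/models/llama-v3p1-8b-instruct
```

The same helper works for every entry below; only the `model` string changes.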

Meta Llama

llama4-maverick-instruct-basic

Reasoning · Speed
fireworks/accounts/fireworks/models/llama4-maverick-instruct-basic
Natively multimodal MoE with 17B active parameters and 128 experts (400B total), supporting a 1M token context for multimodal tasks.
  • $0.27 / $0.85
  • 1M context
  • Text, Image input
  • Knowledge cutoff Aug 2024
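
For the image-capable entries (Maverick and Scout), image input rides in the OpenAI-style content-parts message format. A sketch of building such a message with an inline base64 data URL; the data-URL convention is the usual OpenAI-compatible one, not verified against Fireworks-specific docs.

```python
import base64

def image_message(text: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a mixed text+image user message in OpenAI content-parts form."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Dummy bytes stand in for a real image file here.
msg = image_message("What is in this chart?", b"\x89PNG...")
print(msg["content"][1]["type"])  # image_url
```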

llama4-scout-instruct-basic

Reasoning · Speed
fireworks/accounts/fireworks/models/llama4-scout-instruct-basic
Natively multimodal MoE with 17B active parameters and 16 experts, featuring an industry-leading 10M token context on a single GPU.
  • 10M context
  • Text, Image input
  • Knowledge cutoff Aug 2024

llama-v3p3-70b-instruct

Reasoning · Speed
fireworks/accounts/fireworks/models/llama-v3p3-70b-instruct
Multilingual 70B-parameter model matching Llama 3.2 90B on text-only tasks, with broad language support.
  • ~$0.90 / ~$0.90
  • 128K context
  • Text input
  • Knowledge cutoff Dec 2023

llama-v3p1-405b-instruct

Reasoning · Speed
fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct
The largest Llama 3.1 model with 405B parameters, optimized for synthetic data generation, LLM-as-a-Judge, and distillation use cases.
  • ~$3 / ~$3
  • 128K context
  • Text input
  • Knowledge cutoff Jul 2024

llama-v3p1-70b-instruct

Reasoning · Speed
fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct
Multilingual 70B-parameter model designed for large-scale AI-native applications, with 128K context and 8-language support.
  • ~$0.90 / ~$0.90
  • 128K context
  • Text input
  • Knowledge cutoff Jul 2024

llama-v3p1-8b-instruct

Reasoning · Speed
fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct
The most compact Llama 3.1 model with 8B parameters, suited to efficient deployment on consumer GPUs, with multilingual support.
  • ~$0.20 / ~$0.20
  • 128K context
  • Text input
  • Knowledge cutoff Jul 2024

Qwen

qwen3-coder-480b-a35b-instruct

Reasoning · Speed
fireworks/accounts/fireworks/models/qwen3-coder-480b-a35b-instruct
Qwen’s most agentic code model, a 480B-parameter MoE (35B active) for agentic coding, browser use, and repository-scale tasks.
  • $0.45 / $1.80
  • 262K context
  • Text input

qwen3-235b-a22b

Reasoning · Speed
fireworks/accounts/fireworks/models/qwen3-235b-a22b
MoE model with 235B total parameters (22B active), supporting seamless switching between thinking and non-thinking modes across 100+ languages.
  • 262K context
  • Text input
  • Hybrid thinking
  • Knowledge cutoff ~early 2025
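
The “hybrid thinking” entries in this family can be toggled per turn. Qwen3’s usage notes describe `/think` and `/no_think` soft switches appended to the user message; treat the exact switch strings as an assumption here and verify against the model card.

```python
def qwen3_user_turn(text: str, thinking: bool) -> dict:
    """Build a user message with Qwen3's per-turn thinking soft switch.
    The /think and /no_think suffixes follow Qwen3's published usage
    notes (assumption: served model honors them unchanged)."""
    switch = "/think" if thinking else "/no_think"
    return {"role": "user", "content": f"{text} {switch}"}

print(qwen3_user_turn("Prove sqrt(2) is irrational.", thinking=True)["content"])
```

The same pattern applies to the other hybrid-thinking Qwen3 entries below.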

qwen3-32b

Reasoning · Speed
fireworks/accounts/fireworks/models/qwen3-32b
The largest dense Qwen3 model with 32B parameters and hybrid thinking mode. Performs on par with Qwen2.5-72B despite being less than half the size.
  • 131K context
  • Text input
  • Hybrid thinking
  • Knowledge cutoff ~early 2025

qwen3-8b

Reasoning · Speed
fireworks/accounts/fireworks/models/qwen3-8b
Dense 8B-parameter model with seamless switching between thinking and non-thinking modes. Apache 2.0.
  • ~$0.20 / ~$0.20
  • 131K context
  • Text input
  • Hybrid thinking
  • Knowledge cutoff ~early 2025

qwen2p5-72b-instruct

Reasoning · Speed
fireworks/accounts/fireworks/models/qwen2p5-72b-instruct
Flagship dense model of the Qwen2.5 series with 72B parameters, strong across coding, math, and instruction following, with 29+ language support.
  • 128K context
  • Text input
  • Knowledge cutoff ~Oct 2023

qwq-32b

Reasoning · Speed
fireworks/accounts/fireworks/models/qwq-32b
Qwen’s dedicated reasoning model with 32B parameters, trained via reinforcement learning with outcome-based rewards. Competitive with DeepSeek-R1 and o1-mini. Apache 2.0.
  • 131K context
  • Text input
  • Thinking
  • Knowledge cutoff ~late 2024

DeepSeek

deepseek-r1

Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-r1
Open reasoning model with 671B total parameters (37B active, MoE), trained via large-scale reinforcement learning. Achieves performance comparable to OpenAI o1. MIT license.
  • $3 / $8
  • 128K context
  • Text input
  • Thinking always
  • Knowledge cutoff Jul 2024
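
“Thinking always” models like R1 emit their reasoning trace before the answer; in raw text output the trace is conventionally wrapped in `<think>…</think>` tags. A sketch of separating the two, assuming that raw-text convention (some OpenAI-compatible servers instead return the trace in a separate `reasoning_content` field).

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Split a completion into (reasoning trace, final answer).
    Assumes the <think>...</think> raw-text convention; returns an empty
    trace if no think block is present."""
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", completion, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", completion.strip()

trace, answer = split_reasoning("<think>2+2=4</think>The answer is 4.")
print(answer)  # The answer is 4.
```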

deepseek-v3p2

Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-v3p2
MoE model harmonizing efficiency with superior reasoning and agent performance, featuring DeepSeek Sparse Attention for long-context efficiency.
  • $0.56 / $1.68
  • 163K context
  • Text input

deepseek-v3p1

Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-v3p1
Hybrid model supporting both thinking and non-thinking modes, with significantly improved tool usage and agent task performance.
  • ~$0.56 / ~$1.68
  • 128K context
  • Text input
  • Hybrid thinking
  • Knowledge cutoff ~mid 2025

deepseek-r1-0528

Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-r1-0528
Updated R1 (also called R1.1) with improved RL, reduced hallucinations, JSON output, and function calling support. Performance approaching o3 and Gemini 2.5 Pro.
  • $3 / $8
  • 128K context
  • Text input
  • Thinking always
  • Knowledge cutoff Jul 2024

deepseek-r1-distill-llama-70b

Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-r1-distill-llama-70b
A 70B dense model derived from Llama-3.3-70B, fine-tuned on reasoning data generated by DeepSeek-R1 for strong chain-of-thought in a smaller form factor.
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff Jul 2024

Kimi / MiniMax / GLM

kimi-k2p5

Reasoning · Speed
fireworks/accounts/fireworks/models/kimi-k2p5
Moonshot AI’s open-source multimodal model with 1T total parameters (32B active) and Agent Swarm technology for coordinating up to 100 specialized AI agents.
  • $0.60 / $3
  • 256K context
  • Text, Image, Video input
  • Thinking
  • Knowledge cutoff Apr 2024

kimi-k2-instruct-0905

Reasoning · Speed
fireworks/accounts/fireworks/models/kimi-k2-instruct-0905
General-purpose chat and agentic model optimized as a reflex-grade model without long thinking, with strong autonomous tool calling.
  • 256K context
  • Text input
  • Knowledge cutoff Sep 2024

kimi-k2-thinking

Reasoning · Speed
fireworks/accounts/fireworks/models/kimi-k2-thinking
Open-source thinking agent that reasons step-by-step while dynamically invoking tools, supporting 200–300 sequential tool calls without drift.
  • $0.60 / $2.50
  • 256K context
  • Text input
  • Thinking always
  • Knowledge cutoff ~Sep 2024

minimax-m2p5

Reasoning · Speed
fireworks/accounts/fireworks/models/minimax-m2p5
MoE model with 230B total parameters (10B active), achieving SOTA in coding and agentic tool use, completing tasks 37% faster than its predecessor.
  • 200K context
  • Text input

glm-5

Reasoning · Speed
fireworks/accounts/fireworks/models/glm-5
Zhipu AI’s fifth-generation model with ~745B MoE parameters (44B active), designed for complex system engineering and long-range agent tasks.
  • $1 / $3.20
  • 200K context
  • Text input
  • Thinking
  • Knowledge cutoff late 2025

glm-4p5

Reasoning · Speed
fireworks/accounts/fireworks/models/glm-4p5
Zhipu AI’s foundation model with strong coding, reasoning, and agentic capabilities for real-world development environments.
  • $0.22 / $0.88
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff ~mid 2024

Other

gpt-oss-120b

Reasoning · Speed
fireworks/accounts/fireworks/models/gpt-oss-120b
OpenAI’s open-weight MoE model with 120B total parameters (5.1B active), achieving near-parity with o4-mini on reasoning benchmarks. Apache 2.0.
  • $0.15 / $0.60
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff Jun 2024

gpt-oss-20b

Reasoning · Speed
fireworks/accounts/fireworks/models/gpt-oss-20b
OpenAI’s compact open-weight MoE model with 20B total parameters (3.6B active), similar to o3-mini. Runs on edge devices with 16GB memory. Apache 2.0.
  • $0.07 / $0.30
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff Jun 2024

mistral-large-3-fp8

Reasoning · Speed
fireworks/accounts/fireworks/models/mistral-large-3-fp8
Mistral’s open-weight multimodal frontier model with a granular MoE architecture (41B active, 675B total), supporting 256K context.
  • 256K context
  • Text, Image input
  • On-demand pricing

mistral-small-24b-instruct-2501

Reasoning · Speed
fireworks/accounts/fireworks/models/mistral-small-24b-instruct-2501
A 24B dense model with native function calling, JSON output, and dozens of languages. Fits on a single RTX 4090.
  • 32K context
  • Text input
  • Knowledge cutoff Oct 2023
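
Native function calling on entries like this one uses the OpenAI-style `tools` / `tool_choice` request fields. A sketch of declaring a tool and attaching it to a request body; the `get_weather` function and its parameters are invented for illustration.

```python
# OpenAI-style tool schema; the function name and parameters below are
# hypothetical, for illustration only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

tool_payload = {
    "model": "accounts/fireworks/models/mistral-small-24b-instruct-2501",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
print(tool_payload["tools"][0]["function"]["name"])  # get_weather
```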

gemma-3-27b-it

Reasoning · Speed
fireworks/accounts/fireworks/models/gemma-3-27b-it
Google’s multimodal open model with 27B parameters, supporting 128K context and 140+ languages on a single GPU/TPU.
  • $0.10 / $0.10
  • 128K context
  • Text, Image input
  • Knowledge cutoff Aug 2024
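
Since the prices above read as input / output per 1M tokens, estimating a request's serverless cost is a one-liner. A quick sketch using the gpt-oss-120b rates from this listing ($0.15 in / $0.60 out):

```python
def cost_usd(input_tokens: int, output_tokens: int,
             price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate serverless cost from per-1M-token input/output prices."""
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# gpt-oss-120b from the listing above: $0.15 input / $0.60 output per 1M tokens.
print(round(cost_usd(50_000, 10_000, 0.15, 0.60), 4))  # 0.0135
```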