Source: Fireworks model docs. All model IDs use the prefix
fireworks/accounts/fireworks/models/. Prices marked “on-demand” require a dedicated deployment.
Meta LLaMA
llama4-maverick-instruct-basic
Reasoning · Speed
fireworks/accounts/fireworks/models/llama4-maverick-instruct-basic
Natively multimodal MoE with 17B active parameters and 128 experts (400B total), supporting 1M token context for multimodal tasks.
- $0.27 / $0.85
- 1M context
- Text, Image input
- Knowledge cutoff Aug 2024
llama4-scout-instruct-basic
Reasoning · Speed
fireworks/accounts/fireworks/models/llama4-scout-instruct-basic
Natively multimodal MoE with 17B active parameters and 16 experts, featuring an industry-leading 10M token context on a single GPU.
- 10M context
- Text, Image input
- Knowledge cutoff Aug 2024
llama-v3p3-70b-instruct
Reasoning · Speed
fireworks/accounts/fireworks/models/llama-v3p3-70b-instruct
Multilingual 70B-parameter model matching Llama 3.2 90B on text-only tasks with broad language support.
- ~$0.90 / ~$0.90
- 128K context
- Text input
- Knowledge cutoff Dec 2023
llama-v3p1-405b-instruct
Reasoning · Speed
fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct
The largest Llama 3.1 model with 405B parameters, optimized for synthetic data generation, LLM-as-a-Judge, and distillation use cases.
- ~$3 / ~$3
- 128K context
- Text input
- Knowledge cutoff Jul 2024
llama-v3p1-70b-instruct
Reasoning · Speed
fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct
Multilingual 70B-parameter model designed for large-scale AI-native applications with 128K context and 8-language support.
- ~$0.90 / ~$0.90
- 128K context
- Text input
- Knowledge cutoff Jul 2024
llama-v3p1-8b-instruct
Reasoning · Speed
fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct
The most compact Llama 3.1 model with 8B parameters for efficient deployment on consumer GPUs with multilingual support.
- ~$0.20 / ~$0.20
- 128K context
- Text input
- Knowledge cutoff Jul 2024
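The price pairs listed in these entries (for example `$0.27 / $0.85`) can be turned into a rough per-request estimate. The sketch below assumes the two numbers are USD per 1M input tokens and USD per 1M output tokens; the list itself does not state the unit, so treat that as an assumption to verify. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# ASSUMPTION: prices in this list are quoted per 1M tokens (input / output).
TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: str, output_price: str) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Prices are passed as strings so Decimal keeps them exact.
    """
    input_cost = Decimal(input_tokens) / TOKENS_PER_PRICE_UNIT * Decimal(input_price)
    output_cost = Decimal(output_tokens) / TOKENS_PER_PRICE_UNIT * Decimal(output_price)
    return input_cost + output_cost


# llama4-maverick-instruct-basic is listed at $0.27 / $0.85.
cost = estimate_cost(200_000, 50_000, "0.27", "0.85")
print(f"Estimated cost: ${cost}")
```

Prices prefixed with `~` in the list are approximate, so any estimate built on them is approximate as well.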
Qwen
qwen3-coder-480b-a35b-instruct
Reasoning · Speed
fireworks/accounts/fireworks/models/qwen3-coder-480b-a35b-instruct
Qwen’s most agentic code model, a 480B-parameter MoE (35B active) for agentic coding, browser-use, and repository-scale tasks.
- $0.45 / $1.80
- 262K context
- Text input
qwen3-235b-a22b
Reasoning · Speed
fireworks/accounts/fireworks/models/qwen3-235b-a22b
MoE model with 235B total parameters (22B active), supporting seamless switching between thinking and non-thinking modes across 100+ languages.
- 262K context
- Text input
- Hybrid thinking
- Knowledge cutoff ~early 2025
qwen3-32b
Reasoning · Speed
fireworks/accounts/fireworks/models/qwen3-32b
The largest dense Qwen3 model with 32B parameters and hybrid thinking mode. Performs on par with Qwen2.5-72B despite being less than half the size.
- 131K context
- Text input
- Hybrid thinking
- Knowledge cutoff ~early 2025
qwen3-8b
Reasoning · Speed
fireworks/accounts/fireworks/models/qwen3-8b
Dense 8B-parameter model with seamless switching between thinking mode and non-thinking mode. Apache 2.0.
- ~$0.20 / ~$0.20
- 131K context
- Text input
- Hybrid thinking
- Knowledge cutoff ~early 2025
qwen2p5-72b-instruct
Reasoning · Speed
fireworks/accounts/fireworks/models/qwen2p5-72b-instruct
Flagship dense model of the Qwen2.5 series with 72B parameters, strong across coding, math, and instruction following with 29+ language support.
- 128K context
- Text input
- Knowledge cutoff ~Oct 2023
qwq-32b
Reasoning · Speed
fireworks/accounts/fireworks/models/qwq-32b
Qwen’s dedicated reasoning model with 32B parameters, trained via reinforcement learning with outcome-based rewards. Competitive with DeepSeek-R1 and o1-mini. Apache 2.0.
- 131K context
- Text input
- Thinking
- Knowledge cutoff ~late 2024
DeepSeek
deepseek-r1
Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-r1
Open reasoning model with 671B total parameters (37B active, MoE), trained via large-scale reinforcement learning. Achieves performance comparable to OpenAI o1. MIT license.
- $3 / $8
- 128K context
- Text input
- Thinking always
- Knowledge cutoff Jul 2024
deepseek-v3p2
Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-v3p2
MoE model harmonizing efficiency with superior reasoning and agent performance, featuring DeepSeek Sparse Attention for long-context efficiency.
- $0.56 / $1.68
- 163K context
- Text input
deepseek-v3p1
Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-v3p1
Hybrid model supporting both thinking and non-thinking modes with significantly improved tool usage and agent task performance.
- ~$0.56 / ~$1.68
- 128K context
- Text input
- Hybrid thinking
- Knowledge cutoff ~mid 2025
deepseek-r1-0528
Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-r1-0528
Updated R1 (also called R1.1) with improved RL, reduced hallucinations, JSON output, and function calling support. Performance approaching o3 and Gemini 2.5 Pro.
- $3 / $8
- 128K context
- Text input
- Thinking always
- Knowledge cutoff Jul 2024
deepseek-r1-distill-llama-70b
Reasoning · Speed
fireworks/accounts/fireworks/models/deepseek-r1-distill-llama-70b
A 70B dense model derived from Llama-3.3-70B, fine-tuned on reasoning data generated by DeepSeek-R1 for strong chain-of-thought in a smaller form factor.
- 128K context
- Text input
- Thinking
- Knowledge cutoff Jul 2024
Kimi / MiniMax / GLM
kimi-k2p5
Reasoning · Speed
fireworks/accounts/fireworks/models/kimi-k2p5
Moonshot AI’s open-source multimodal model with 1T total parameters (32B active) and Agent Swarm technology for coordinating up to 100 specialized AI agents.
- $0.60 / $3
- 256K context
- Text, Image, Video input
- Thinking
- Knowledge cutoff Apr 2024
kimi-k2-instruct-0905
Reasoning · Speed
fireworks/accounts/fireworks/models/kimi-k2-instruct-0905
General-purpose chat and agentic model optimized as a reflex-grade model without long thinking, with strong autonomous tool calling.
- 256K context
- Text input
- Knowledge cutoff Sep 2024
kimi-k2-thinking
Reasoning · Speed
fireworks/accounts/fireworks/models/kimi-k2-thinking
Open-source thinking agent that reasons step-by-step while dynamically invoking tools, supporting 200-300 sequential tool calls without drift.
- $0.60 / $2.50
- 256K context
- Text input
- Thinking always
- Knowledge cutoff ~Sep 2024
minimax-m2p5
Reasoning · Speed
fireworks/accounts/fireworks/models/minimax-m2p5
MoE model with 230B total parameters (10B active), achieving SOTA in coding and agentic tool use, completing tasks 37% faster than its predecessor.
- 200K context
- Text input
glm-5
Reasoning · Speed
fireworks/accounts/fireworks/models/glm-5
Zhipu AI’s fifth-generation model with ~745B MoE parameters (44B active), designed for complex system engineering and long-range agent tasks.
- $1 / $3.20
- 200K context
- Text input
- Thinking
- Knowledge cutoff late 2025
glm-4p5
Reasoning · Speed
fireworks/accounts/fireworks/models/glm-4p5
Zhipu AI’s foundation model with strong coding, reasoning, and agentic capabilities for real-world development environments.
- $0.22 / $0.88
- 128K context
- Text input
- Thinking
- Knowledge cutoff ~mid 2024
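When choosing among these entries by context window, the `NNNK context` / `NNM context` labels can be parsed into comparable numbers. The sketch below is illustrative only: it assumes K means 1,000 and M means 1,000,000 tokens (the list does not say whether labels are rounded from powers of two), and `parse_context`, `models_with_context`, and `ENTRIES` are hypothetical names. The sample rows are copied from the Kimi / MiniMax / GLM entries in this list.

```python
import re

# ASSUMPTION: "K" = 1,000 tokens and "M" = 1,000,000 tokens (approximate).
_UNITS = {"K": 1_000, "M": 1_000_000}
_LABEL = re.compile(r"\s*(\d+(?:\.\d+)?)([KM])(?:\s+context)?\s*")


def parse_context(label: str) -> int:
    """Convert a label such as '256K context' to an approximate token count."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized context label: {label!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


# (short name, context label) pairs copied from this list.
ENTRIES = [
    ("kimi-k2p5", "256K context"),
    ("minimax-m2p5", "200K context"),
    ("glm-5", "200K context"),
    ("glm-4p5", "128K context"),
]


def models_with_context(entries, min_tokens: int) -> list:
    """Return short names whose context window is at least min_tokens."""
    return [name for name, label in entries if parse_context(label) >= min_tokens]


print(models_with_context(ENTRIES, 200_000))
```

Because the labels are rounded, leave headroom rather than sizing a prompt to the exact parsed value.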
Other
gpt-oss-120b
Reasoning · Speed
fireworks/accounts/fireworks/models/gpt-oss-120b
OpenAI’s open-weight MoE model with 120B total parameters (5.1B active), achieving near-parity with o4-mini on reasoning benchmarks. Apache 2.0.
- $0.15 / $0.60
- 128K context
- Text input
- Thinking
- Knowledge cutoff Jun 2024
gpt-oss-20b
Reasoning · Speed
fireworks/accounts/fireworks/models/gpt-oss-20b
OpenAI’s compact open-weight MoE model with 20B total parameters (3.6B active), similar to o3-mini. Runs on edge devices with 16GB memory. Apache 2.0.
- $0.07 / $0.30
- 128K context
- Text input
- Thinking
- Knowledge cutoff Jun 2024
mistral-large-3-fp8
Reasoning · Speed
fireworks/accounts/fireworks/models/mistral-large-3-fp8
Mistral’s open-weight multimodal frontier model with granular MoE architecture (41B active, 675B total) supporting 256K context.
- 256K context
- Text, Image input
- On-demand pricing
mistral-small-24b-instruct-2501
Reasoning · Speed
fireworks/accounts/fireworks/models/mistral-small-24b-instruct-2501
A 24B dense model with native function calling, JSON output, and dozens of languages. Fits on a single RTX 4090.
- 32K context
- Text input
- Knowledge cutoff Oct 2023
gemma-3-27b-it
Reasoning · Speed
fireworks/accounts/fireworks/models/gemma-3-27b-it
Google’s multimodal open model with 27B parameters, supporting 128K context and 140+ languages on a single GPU/TPU.
- $0.10 / $0.10
- 128K context
- Text, Image input
- Knowledge cutoff Aug 2024
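The ID convention stated at the top of this list (every model ID is the short name with the prefix `fireworks/accounts/fireworks/models/`) can be sketched as a pair of small helpers. The function names `full_model_id` and `short_model_name` are hypothetical and not part of any SDK; only the prefix string comes from this list.

```python
# Prefix taken from the note at the top of this list.
MODEL_PREFIX = "fireworks/accounts/fireworks/models/"


def full_model_id(short_name: str) -> str:
    """Return the prefixed model ID for a short name (hypothetical helper).

    Names that already carry the prefix are returned unchanged.
    """
    name = short_name.strip()
    if name.startswith(MODEL_PREFIX):
        return name
    return MODEL_PREFIX + name


def short_model_name(model_id: str) -> str:
    """Strip the prefix from a full model ID, if present (hypothetical helper)."""
    if model_id.startswith(MODEL_PREFIX):
        return model_id[len(MODEL_PREFIX):]
    return model_id


print(full_model_id("gemma-3-27b-it"))
print(short_model_name("fireworks/accounts/fireworks/models/qwen3-8b"))
```

Keeping the prefix in one constant means a change to the naming scheme only has to be made in one place.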