Source: TogetherAI model docs. All model IDs use the prefix togetherai/. Prices are listed as input / output, in USD per 1M tokens.
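A minimal sketch of how a prefixed model ID from this catalog might be consumed: split off the togetherai/ provider prefix and build an OpenAI-style chat payload for the underlying model. The routing convention and helper name here are illustrative assumptions, not Together's documented API.

```python
# Sketch: splitting the provider prefix off a catalog model ID and
# building an OpenAI-style chat payload. The "provider"/"payload"
# structure is an assumption for illustration.

def build_request(model_id: str, prompt: str, max_tokens: int = 256) -> dict:
    """Split the provider prefix off the model ID and return an
    OpenAI-style chat-completion payload for the underlying model."""
    provider, _, model = model_id.partition("/")  # split at first "/" only
    return {
        "provider": provider,
        "payload": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
    }

req = build_request(
    "togetherai/meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "Summarize MoE routing in one sentence.",
)
print(req["provider"])          # togetherai
print(req["payload"]["model"])  # meta-llama/Llama-3.3-70B-Instruct-Turbo
```

Note that `partition` splits only at the first slash, so nested IDs like meta-llama/… survive intact.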

Meta LLaMA

Llama-4-Maverick-17B-128E-Instruct-FP8

Reasoning · Speed
togetherai/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
Natively multimodal MoE with 17B active parameters and 128 experts (400B total), supporting 1M token context. Outperforms GPT-4o and Gemini 2.0 Flash across multimodal benchmarks.
  • $0.27 / $0.85
  • 1M context
  • Text, Image input
  • Knowledge cutoff Aug 2024

Llama-3.3-70B-Instruct-Turbo

Reasoning · Speed
togetherai/meta-llama/Llama-3.3-70B-Instruct-Turbo
Multilingual instruction-tuned model with 70B parameters, delivering enhanced performance relative to Llama 3.1 70B and matching Llama 3.2 90B on text-only tasks.
  • $0.88 / $0.88
  • 128K context
  • Text input
  • Knowledge cutoff Dec 2023

Llama-3.2-3B-Instruct-Turbo

Reasoning · Speed
togetherai/meta-llama/Llama-3.2-3B-Instruct-Turbo
Lightweight 3B-parameter model optimized for on-device use cases including summarization, instruction following, and rewriting tasks.
  • $0.06 / $0.06
  • 128K context
  • Text input
  • Knowledge cutoff Dec 2023
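Assuming the two prices per entry are USD per 1M input and output tokens, a per-request cost works out as below; the helper name is illustrative.

```python
# Quick cost estimate from the listed prices, assuming they are
# USD per 1M tokens (input / output).

def request_cost(input_tokens: int, output_tokens: int,
                 in_per_m: float, out_per_m: float) -> float:
    """Return the USD cost of one request at per-1M-token rates."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# e.g. Llama-3.2-3B-Instruct-Turbo at $0.06 / $0.06:
cost = request_cost(10_000, 2_000, 0.06, 0.06)
print(f"${cost:.6f}")  # $0.000720
```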

Qwen

Qwen/Qwen3.5-397B-A17B

Reasoning · Speed
togetherai/Qwen/Qwen3.5-397B-A17B
Multimodal foundation model with 397B total parameters (17B active) featuring a Hybrid MoE architecture with early fusion vision-language training. State-of-the-art across chat, RAG, vision-language, and agentic workflows.
  • $0.30 / $1.20
  • 262K context
  • Text, Image input
  • Hybrid thinking
  • Knowledge cutoff ~2025

Qwen3-235B-A22B-Instruct-2507-tput

Reasoning · Speed
togetherai/Qwen/Qwen3-235B-A22B-Instruct-2507-tput
MoE model with 235B total parameters (22B active) that runs in non-thinking mode only, optimized for throughput. Supports multilingual dialogue across 100+ languages.
  • $0.20 / $0.60
  • 262K context
  • Text input
  • Knowledge cutoff ~early 2025

Qwen3-235B-A22B-Thinking-2507

Reasoning · Speed
togetherai/Qwen/Qwen3-235B-A22B-Thinking-2507
The always-thinking variant of Qwen3-235B, with deep chain-of-thought reasoning for complex math, coding, and analytical tasks.
  • $0.65 / $3.00
  • 262K context
  • Text input
  • Thinking always
  • Knowledge cutoff ~early 2025

Qwen3-Coder-480B-A35B-Instruct-FP8

Reasoning · Speed
togetherai/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
Qwen’s most agentic code model, a 480B-parameter MoE (35B active) achieving results comparable to Claude Sonnet on agentic coding, browser-use, and repository-scale tasks.
  • $0.22 / $1.00
  • 262K context
  • Text input

Qwen3-Coder-Next-FP8

Reasoning · Speed
togetherai/Qwen/Qwen3-Coder-Next-FP8
Next-generation coding model with hybrid thinking mode for adaptive reasoning depth.
  • 256K context
  • Text input
  • Hybrid thinking

Qwen3-Next-80B-A3B-Instruct

Reasoning · Speed
togetherai/Qwen/Qwen3-Next-80B-A3B-Instruct
First model in the Qwen3-Next series with 80B total parameters (3.9B active), featuring hybrid attention. Matches Qwen3-235B performance at less than 10% of the training cost.
  • $0.15 / $1.50
  • 262K context
  • Text input
  • Hybrid thinking

Qwen2.5-7B-Instruct-Turbo

Reasoning · Speed
togetherai/Qwen/Qwen2.5-7B-Instruct-Turbo
Part of the Qwen2.5 family with 7B parameters, featuring improvements in coding, mathematics, instruction following, and structured data understanding.
  • $0.30 / $1.20
  • 128K context
  • Text input
  • Knowledge cutoff ~Oct 2023

DeepSeek

DeepSeek-V3.1

Reasoning · Speed
togetherai/deepseek-ai/DeepSeek-V3.1
Hybrid model supporting both thinking and non-thinking modes. Features significantly improved tool usage and agent task performance, with quality comparable to DeepSeek-R1-0528 in thinking mode.
  • $0.60 / $1.70
  • 128K context
  • Text input
  • Hybrid thinking
  • Knowledge cutoff ~mid 2025

DeepSeek-R1

Reasoning · Speed
togetherai/deepseek-ai/DeepSeek-R1
Open reasoning model with 671B total parameters (37B active, MoE), trained via large-scale reinforcement learning. Achieves performance comparable to OpenAI o1 across math, code, and reasoning. MIT license.
  • ~$3 / ~$7
  • 128K context
  • Text input
  • Thinking always
  • Knowledge cutoff Jul 2024

Kimi / MiniMax / GLM / Other

Kimi-K2.5

Reasoning · Speed
togetherai/moonshotai/Kimi-K2.5
Moonshot AI’s open-source multimodal model with 1T total parameters (32B active, MoE). Features Agent Swarm technology coordinating up to 100 specialized AI agents simultaneously.
  • $0.50 / $2.80
  • 256K context
  • Text, Image, Video input
  • Thinking
  • Knowledge cutoff Apr 2024

Kimi-K2-Instruct-0905

Reasoning · Speed
togetherai/moonshotai/Kimi-K2-Instruct-0905
General-purpose chat and agentic model with 1T total parameters (32B active), optimized as a reflex-grade model without long thinking. Strong autonomous tool-calling.
  • $1.00 / $3.00
  • 256K context
  • Text input
  • Knowledge cutoff Sep 2024

Kimi-K2-Thinking

Reasoning · Speed
togetherai/moonshotai/Kimi-K2-Thinking
Open-source thinking agent that reasons step-by-step while dynamically invoking tools. Sets state-of-the-art on Humanity’s Last Exam and BrowseComp, supporting 200-300 sequential tool calls without drift.
  • $1.20 / $4.00
  • 256K context
  • Text input
  • Thinking always
  • Knowledge cutoff ~Sep 2024

MiniMax-M2.5

Reasoning · Speed
togetherai/MiniMaxAI/MiniMax-M2.5
MoE model with 230B total parameters (10B active), extensively trained with RL in complex real-world environments. Achieves state-of-the-art coding results (80.2% on SWE-Bench Verified) and strong agentic tool use.
  • $0.30 / $1.20
  • 200K context
  • Text input

GLM-5

Reasoning · Speed
togetherai/zai-org/GLM-5
Zhipu AI’s fifth-generation model with ~745B parameters in a MoE architecture (44B active), designed for complex system engineering and long-range agent tasks. Trained entirely on Huawei Ascend chips.
  • $1.00 / $3.20
  • 200K context
  • Text input
  • Thinking
  • Knowledge cutoff late 2025

GLM-4.7

Reasoning · Speed
togetherai/zai-org/GLM-4.7
Zhipu AI’s foundation model with ~400B parameters and 200K context, designed for real-world development environments with strong coding, reasoning, and agentic capabilities.
  • $0.45 / $2.00
  • 200K context
  • Text input
  • Thinking
  • Knowledge cutoff ~mid 2024

gpt-oss-120b

Reasoning · Speed
togetherai/openai/gpt-oss-120b
OpenAI’s open-weight MoE model with 120B total parameters (5.1B active per token). Achieves near-parity with o4-mini on core reasoning benchmarks while running on a single 80GB GPU. Apache 2.0.
  • $0.15 / $0.60
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff Jun 2024

gemma-3n-E4B-it

Reasoning · Speed
togetherai/google/gemma-3n-E4B-it
Google’s on-device multimodal model with 8B raw parameters but an effective 4B memory footprint. First sub-10B model to exceed 1300 on LMArena, running with as little as 3GB of memory.
  • $0.02 / $0.04
  • 32K context
  • Text, Image, Audio, Video input

gemma-3-27b-it

Reasoning · Speed
togetherai/google/gemma-3-27b-it
Google’s multimodal open model with 27B parameters, built from the same technology as Gemini 2.0. Supports 128K context, 140+ languages, and runs on a single GPU/TPU.
  • ~$0.10 / ~$0.10
  • 128K context
  • Text, Image input
  • Knowledge cutoff Aug 2024

cogito-v2-1-671b

Reasoning · Speed
togetherai/deepcogito/cogito-v2-1-671b
DeepCogito’s MoE model with 671B total parameters (37B active), trained via a novel process supervision approach that guides reasoning chains. Competitive with frontier closed models while using fewer tokens.
  • $1.25 / $1.25
  • 128K context
  • Text input
  • Thinking

Mistral-Small-24B-Instruct-2501

Reasoning · Speed
togetherai/mistralai/Mistral-Small-24B-Instruct-2501
A 24B-parameter dense model setting new benchmarks in the sub-70B category, with native function calling, JSON output, and support for dozens of languages. Fits on a single RTX 4090.
  • 32K context
  • Text input
  • Knowledge cutoff Oct 2023
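A catalog like this lends itself to programmatic model selection. The sketch below loads a few entries (values copied from the listing above) into records and picks the cheapest model meeting a context-length requirement; the 3:1 input/output price weighting is an arbitrary assumption, and prices are assumed to be USD per 1M tokens.

```python
# Sketch: representing catalog entries as records and selecting the
# cheapest model that satisfies a minimum context length.
from dataclasses import dataclass

@dataclass
class Model:
    id: str
    in_price: float   # USD per 1M input tokens
    out_price: float  # USD per 1M output tokens
    context: int      # context window in tokens

# A small subset of the catalog above.
CATALOG = [
    Model("togetherai/meta-llama/Llama-3.2-3B-Instruct-Turbo", 0.06, 0.06, 128_000),
    Model("togetherai/Qwen/Qwen3-235B-A22B-Instruct-2507-tput", 0.20, 0.60, 262_000),
    Model("togetherai/openai/gpt-oss-120b", 0.15, 0.60, 128_000),
]

def cheapest_with_context(min_context: int) -> Model:
    """Cheapest catalog entry whose context window is large enough."""
    candidates = [m for m in CATALOG if m.context >= min_context]
    # Rank by a blended price, weighting input 3:1 over output (assumption).
    return min(candidates, key=lambda m: 3 * m.in_price + m.out_price)

print(cheapest_with_context(200_000).id)
# → togetherai/Qwen/Qwen3-235B-A22B-Instruct-2507-tput
```

At 200K required context only the 262K-context Qwen entry qualifies; at 100K, the $0.06 Llama 3.2 3B wins on price.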