Source: TogetherAI model docs. All model IDs use the prefix togetherai/.

Meta LLaMA
Llama-4-Maverick-17B-128E-Instruct-FP8
Reasoning · Speed
togetherai/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
Natively multimodal MoE with 17B active parameters and 128 experts (400B total), supporting a 1M-token context. Outperforms GPT-4o and Gemini 2.0 Flash across multimodal benchmarks.
- $0.27 / $0.85
- 1M context
- Text, Image input
- Knowledge cutoff Aug 2024
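The price pairs in this list (e.g. $0.27 / $0.85) are input/output rates. Assuming they are quoted in USD per 1M tokens (the common convention, though this document does not say so), a request's cost can be estimated with a small helper like this hypothetical sketch:

```python
# Hypothetical helper: estimate request cost from a listed input/output
# price pair, ASSUMING the pair is USD per 1M tokens (a convention,
# not stated in this document).
def estimate_cost(input_price: float, output_price: float,
                  input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (input_tokens * input_price
            + output_tokens * output_price) / 1_000_000

# e.g. Llama-4-Maverick at $0.27 / $0.85, with a 10K-token prompt
# and a 2K-token response:
cost = estimate_cost(0.27, 0.85, 10_000, 2_000)
print(f"${cost:.6f}")  # → $0.004400
```

The same helper works for any entry below; approximate prices (e.g. ~$3 / ~$7 for DeepSeek-R1) give correspondingly approximate estimates.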
Llama-3.3-70B-Instruct-Turbo
togetherai/meta-llama/Llama-3.3-70B-Instruct-Turbo
Multilingual instruction-tuned model with 70B parameters, delivering enhanced performance relative to Llama 3.1 70B and matching Llama 3.2 90B on text-only tasks.
- $0.88 / $0.88
- 128K context
- Text input
- Knowledge cutoff Dec 2023
Llama-3.2-3B-Instruct-Turbo
togetherai/meta-llama/Llama-3.2-3B-Instruct-Turbo
Lightweight 3B-parameter model optimized for on-device use cases, including summarization, instruction following, and rewriting tasks.
- $0.06 / $0.06
- 128K context
- Text input
- Knowledge cutoff Dec 2023
Qwen
Qwen/Qwen3.5-397B-A17B
togetherai/Qwen/Qwen3.5-397B-A17B
Multimodal foundation model with 397B total parameters (17B active), featuring a hybrid MoE architecture with early-fusion vision-language training. State-of-the-art across chat, RAG, vision-language, and agentic workflows.
- $0.30 / $1.20
- 262K context
- Text, Image input
- Hybrid thinking
- Knowledge cutoff ~2025
Qwen3-235B-A22B-Instruct-2507-tput
togetherai/Qwen/Qwen3-235B-A22B-Instruct-2507-tput
MoE model with 235B total parameters (22B active) in non-thinking mode, optimized for throughput. Supports multilingual dialogue across 100+ languages.
- $0.20 / $0.60
- 262K context
- Text input
- Knowledge cutoff ~early 2025
Qwen3-235B-A22B-Thinking-2507
togetherai/Qwen/Qwen3-235B-A22B-Thinking-2507
The always-thinking variant of Qwen3-235B, with deep chain-of-thought reasoning for complex math, coding, and analytical tasks.
- $0.65 / $3.00
- 262K context
- Text input
- Thinking always
- Knowledge cutoff ~early 2025
Qwen3-Coder-480B-A35B-Instruct-FP8
togetherai/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
Qwen’s most agentic code model: a 480B-parameter MoE (35B active) achieving results comparable to Claude Sonnet on agentic coding, browser use, and repository-scale tasks.
- $0.22 / $1.00
- 262K context
- Text input
Qwen3-Coder-Next-FP8
togetherai/Qwen/Qwen3-Coder-Next-FP8
Next-generation coding model with a hybrid thinking mode for adaptive reasoning depth.
- 256K context
- Text input
- Hybrid thinking
Qwen3-Next-80B-A3B-Instruct
togetherai/Qwen/Qwen3-Next-80B-A3B-Instruct
First model in the Qwen3-Next series, with 80B total parameters (3.9B active) and hybrid attention. Matches Qwen3-235B performance at less than 10% of the training cost.
- $0.15 / $1.50
- 262K context
- Text input
- Hybrid thinking
Qwen2.5-7B-Instruct-Turbo
togetherai/Qwen/Qwen2.5-7B-Instruct-Turbo
Part of the Qwen2.5 family, with 7B parameters and improvements in coding, mathematics, instruction following, and structured-data understanding.
- $0.30 / $1.20
- 128K context
- Text input
- Knowledge cutoff ~Oct 2023
DeepSeek
DeepSeek-V3.1
togetherai/deepseek-ai/DeepSeek-V3.1
Hybrid model supporting both thinking and non-thinking modes. Features significantly improved tool use and agent-task performance, with thinking-mode quality comparable to DeepSeek-R1-0528.
- $0.60 / $1.70
- 128K context
- Text input
- Hybrid thinking
- Knowledge cutoff ~mid 2025
DeepSeek-R1
togetherai/deepseek-ai/DeepSeek-R1
Open reasoning model with 671B total parameters (37B active, MoE), trained via large-scale reinforcement learning. Achieves performance comparable to OpenAI o1 across math, code, and reasoning. MIT license.
- ~$3 / ~$7
- 128K context
- Text input
- Thinking always
- Knowledge cutoff Jul 2024
Kimi / MiniMax / GLM / Other
Kimi-K2.5
togetherai/moonshotai/Kimi-K2.5
Moonshot AI’s open-source multimodal model with 1T total parameters (32B active, MoE). Features Agent Swarm technology that coordinates up to 100 specialized AI agents simultaneously.
- $0.50 / $2.80
- 256K context
- Text, Image, Video input
- Thinking
- Knowledge cutoff Apr 2024
Kimi-K2-Instruct-0905
togetherai/moonshotai/Kimi-K2-Instruct-0905
General-purpose chat and agentic model with 1T total parameters (32B active), optimized as a reflex-grade model without long thinking. Strong autonomous tool calling.
- $1.00 / $3.00
- 256K context
- Text input
- Knowledge cutoff Sep 2024
Kimi-K2-Thinking
togetherai/moonshotai/Kimi-K2-Thinking
Open-source thinking agent that reasons step by step while dynamically invoking tools. Sets the state of the art on Humanity’s Last Exam and BrowseComp, supporting 200-300 sequential tool calls without drift.
- $1.20 / $4.00
- 256K context
- Text input
- Thinking always
- Knowledge cutoff ~Sep 2024
MiniMax-M2.5
togetherai/MiniMaxAI/MiniMax-M2.5
MoE model with 230B total parameters (10B active), extensively trained with RL in complex real-world environments. Achieves SOTA in coding (80.2% SWE-Bench Verified) and agentic tool use.
- $0.30 / $1.20
- 200K context
- Text input
GLM-5
togetherai/zai-org/GLM-5
Zhipu AI’s fifth-generation model, a ~745B-parameter MoE (44B active) designed for complex systems engineering and long-range agent tasks. Trained entirely on Huawei Ascend chips.
- $1.00 / $3.20
- 200K context
- Text input
- Thinking
- Knowledge cutoff late 2025
GLM-4.7
togetherai/zai-org/GLM-4.7
Zhipu AI’s foundation model with ~400B parameters and 200K context, designed for real-world development environments with strong coding, reasoning, and agentic capabilities.
- $0.45 / $2.00
- 200K context
- Text input
- Thinking
- Knowledge cutoff ~mid 2024
gpt-oss-120b
togetherai/openai/gpt-oss-120b
OpenAI’s open-weight MoE model with 120B total parameters (5.1B active per token). Achieves near-parity with o4-mini on core reasoning benchmarks while running on a single 80GB GPU. Apache 2.0 license.
- $0.15 / $0.60
- 128K context
- Text input
- Thinking
- Knowledge cutoff Jun 2024
gemma-3n-E4B-it
togetherai/google/gemma-3n-E4B-it
Google’s on-device multimodal model with 8B raw parameters and the effective memory footprint of a 4B model. First sub-10B model to exceed 1300 on LMArena, running in as little as 3GB of memory.
- $0.02 / $0.04
- 32K context
- Text, Image, Audio, Video input
gemma-3-27b-it
togetherai/google/gemma-3-27b-it
Google’s multimodal open model with 27B parameters, built from the same technology as Gemini 2.0. Supports 128K context and 140+ languages, and runs on a single GPU/TPU.
- ~$0.10 / ~$0.10
- 128K context
- Text, Image input
- Knowledge cutoff Aug 2024
cogito-v2-1-671b
togetherai/deepcogito/cogito-v2-1-671b
DeepCogito’s MoE model with 671B total parameters (37B active), trained via a novel process-supervision approach that guides reasoning chains. Competitive with frontier closed models while using fewer tokens.
- $1.25 / $1.25
- 128K context
- Text input
- Thinking
Mistral-Small-24B-Instruct-2501
togetherai/mistralai/Mistral-Small-24B-Instruct-2501
A 24B-parameter dense model setting new benchmarks in the sub-70B category, with native function calling, JSON output, and support for dozens of languages. Fits on a single RTX 4090.
- 32K context
- Text input
- Knowledge cutoff Oct 2023
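Every ID above carries the togetherai/ provider prefix, with the rest of the string being the model path as served by the provider (e.g. meta-llama/Llama-3.3-70B-Instruct-Turbo). A minimal sketch of splitting such an ID for routing purposes — the routing semantics themselves are my assumption; this document only states that IDs carry the prefix:

```python
# Hypothetical helper: split a catalog ID such as
# "togetherai/meta-llama/Llama-3.3-70B-Instruct-Turbo" into the provider
# prefix and the underlying model path. The routing behaviour is an
# assumption, not documented here.
def split_model_id(model_id: str) -> tuple[str, str]:
    """Return (provider, model_path) for a 'provider/model' ID."""
    # partition() splits at the FIRST "/" only, so model paths that
    # themselves contain "/" (as most here do) stay intact.
    provider, sep, model = model_id.partition("/")
    if not sep:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model

provider, model = split_model_id(
    "togetherai/meta-llama/Llama-3.3-70B-Instruct-Turbo")
print(provider)  # → togetherai
print(model)     # → meta-llama/Llama-3.3-70B-Instruct-Turbo
```

The bare model path is what an OpenAI-compatible client would typically pass as its model parameter; the prefix identifies the provider to route through.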