Source: TogetherAI model docs. All model IDs use the prefix togetherai/. Models with a dedicated-only warning are not serverless: you must create and start a dedicated endpoint first.
Meta LLaMA
Llama-3.3-70B-Instruct-Turbo
Reasoning · Speed
togetherai/meta-llama/Llama-3.3-70B-Instruct-Turbo
Multilingual instruction-tuned model with 70B parameters, delivering enhanced performance relative to Llama 3.1 70B and matching Llama 3.2 90B on text-only tasks.
- $0.88 / $0.88
- 128K context
- Text input
- Knowledge cutoff Dec 2023
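The ID convention stated at the top of this page (every ID starts with togetherai/, followed by an organization and a model name) can be checked in code before an ID is passed anywhere else. The following is a minimal sketch: the helper name `split_model_id` and the `PREFIX` constant are hypothetical and are not part of any TogetherAI SDK.

```python
# Hypothetical helper, not part of any SDK: validate a catalog ID and
# split it into (organization, model_name).
PREFIX = "togetherai/"


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split an ID such as 'togetherai/meta-llama/Llama-3.3-70B-Instruct-Turbo'."""
    if not model_id.startswith(PREFIX):
        raise ValueError(f"expected an ID starting with {PREFIX!r}: {model_id!r}")
    # Everything after the prefix should look like '<organization>/<model_name>'.
    org, sep, name = model_id[len(PREFIX):].partition("/")
    if not sep or not org or not name:
        raise ValueError(f"expected '{PREFIX}<org>/<model>': {model_id!r}")
    return org, name


if __name__ == "__main__":
    org, name = split_model_id("togetherai/meta-llama/Llama-3.3-70B-Instruct-Turbo")
    print(org, name)
```

Rejecting malformed IDs early keeps a typo in the prefix from surfacing later as a confusing "model not found" style error.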
Qwen
Qwen/Qwen3.5-397B-A17B
Reasoning · Speed
togetherai/Qwen/Qwen3.5-397B-A17B
Multimodal foundation model with 397B total parameters (17B active) featuring a Hybrid MoE architecture with early fusion vision-language training. State-of-the-art across chat, RAG, vision-language, and agentic workflows.
- $0.30 / $1.20
- 262K context
- Text, Image input
- Hybrid thinking
- Knowledge cutoff ~2025
Qwen3-235B-A22B-Instruct-2507-tput
Reasoning · Speed
togetherai/Qwen/Qwen3-235B-A22B-Instruct-2507-tput
MoE model with 235B total parameters (22B active) in non-thinking mode, optimized for throughput. Supports multilingual dialogue across 100+ languages.
- $0.20 / $0.60
- 262K context
- Text input
- Knowledge cutoff ~early 2025
Qwen3-Coder-480B-A35B-Instruct-FP8
Reasoning · Speed
togetherai/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
Qwen’s most agentic code model, a 480B-parameter MoE (35B active) achieving results comparable to Claude Sonnet on agentic coding, browser-use, and repository-scale tasks.
- $0.22 / $1
- 262K context
- Text input
Qwen3-Coder-Next-FP8
Reasoning · Speed
togetherai/Qwen/Qwen3-Coder-Next-FP8
Next-generation coding model with hybrid thinking mode for adaptive reasoning depth.
- $0.50 / $1.20
- 256K context
- Text input
- Hybrid thinking
Qwen3-Next-80B-A3B-Instruct
Reasoning · Speed
togetherai/Qwen/Qwen3-Next-80B-A3B-Instruct
First model in the Qwen3-Next series with 80B total parameters (3.9B active), featuring hybrid attention. Matches Qwen3-235B performance at less than 10% of the training cost.
- $0.15 / $1.50
- 262K context
- Text input
- Hybrid thinking
Qwen2.5-7B-Instruct-Turbo
Reasoning · Speed
togetherai/Qwen/Qwen2.5-7B-Instruct-Turbo
Part of the Qwen2.5 family with 7B parameters, featuring improvements in coding, mathematics, instruction following, and structured data understanding.
- $0.30 / $1.20
- 128K context
- Text input
- Knowledge cutoff ~Oct 2023
DeepSeek
DeepSeek-V3.1
Reasoning · Speed
togetherai/deepseek-ai/DeepSeek-V3.1
Hybrid model supporting both thinking and non-thinking modes. Features significantly improved tool usage and agent task performance, with quality comparable to DeepSeek-R1-0528 in thinking mode.
- $0.60 / $1.70
- 128K context
- Text input
- Hybrid thinking
- Knowledge cutoff ~mid 2025
DeepSeek-V4-Pro
Reasoning · Speed
togetherai/deepseek-ai/DeepSeek-V4-Pro
DeepSeek V4 Pro on Together serverless: hybrid attention, up to 512K context, strong coding and agent benchmarks.
- $2.10 / $4.40
- 512K context
- Text input
- Hybrid thinking
- Knowledge cutoff ~2025
Kimi / MiniMax / GLM / Other
Kimi-K2.6
Reasoning · Speed
togetherai/moonshotai/Kimi-K2.6
Moonshot Kimi K2.6 on Together serverless: 1T-scale MoE with tool calling and JSON mode for agentic and multimodal workloads.
- $1.20 / $4.50
- 262K context
- Text, Image input
- Knowledge cutoff ~2025
MiniMax-M2.7
Reasoning · Speed
togetherai/MiniMaxAI/MiniMax-M2.7
MiniMax’s successor MoE model (~229B parameters) with improved coding and agentic tool use, JSON mode, and prompt caching on Together serverless.
- $0.30 / $1.20
- 203K context
- Text input
GLM-5
Reasoning · Speed
togetherai/zai-org/GLM-5
Zhipu AI’s fifth-generation model with ~745B parameters in a MoE architecture (44B active), designed for complex system engineering and long-range agent tasks. Trained entirely on Huawei Ascend chips.
- $1 / $3.20
- 200K context
- Text input
- Thinking
- Knowledge cutoff late 2025
GLM-5.1
Reasoning · Speed
togetherai/zai-org/GLM-5.1
Z.ai post-training upgrade to GLM-5: 754B MoE (40B active), 200K context, thinking mode, tool calling, and stronger coding via RL.
- $1.40 / $4.40
- 200K context
- Text input
- Thinking
- Knowledge cutoff late 2025
GLM-4.7
Reasoning · Speed
togetherai/zai-org/GLM-4.7
Zhipu AI’s foundation model with ~400B parameters and 200K context, designed for real-world development environments with strong coding, reasoning, and agentic capabilities.
- $0.45 / $2
- 200K context
- Text input
- Thinking
- Knowledge cutoff ~mid 2024
gpt-oss-120b
Reasoning · Speed
togetherai/openai/gpt-oss-120b
OpenAI’s open-weight MoE model with 120B total parameters (5.1B active per token). Achieves near-parity with o4-mini on core reasoning benchmarks while running on a single 80GB GPU. Apache 2.0.
- $0.15 / $0.60
- 128K context
- Text input
- Thinking
- Knowledge cutoff Jun 2024
gpt-oss-20b
Reasoning · Speed
togetherai/openai/gpt-oss-20b
OpenAI’s compact 20B MoE delivering o3-mini-level results on Together serverless. Apache 2.0.
- $0.05 / $0.20
- 128K context
- Text input
- Thinking
- Knowledge cutoff Jun 2024
gemma-3n-E4B-it
Reasoning · Speed
togetherai/google/gemma-3n-E4B-it
Google’s on-device multimodal model with 8B raw parameters but an effective 4B memory footprint. First sub-10B model to exceed 1300 on LMArena, running with as little as 3GB of memory.
- $0.02 / $0.04
- 32K context
- Text, Image, Audio, Video input
gemma-3-27b-it
Reasoning · Speed
togetherai/google/gemma-3-27b-it
Google’s multimodal open model with 27B parameters, built from the same technology as Gemini 2.0. Supports 128K context, 140+ languages, and runs on a single GPU/TPU.
- ~$0.10 / ~$0.10
- 128K context
- Text, Image input
- Knowledge cutoff Aug 2024
cogito-v2-1-671b
Reasoning · Speed
togetherai/deepcogito/cogito-v2-1-671b
DeepCogito’s MoE model with 671B total parameters (37B active), trained via a novel process supervision approach that guides reasoning chains. Competitive with frontier closed models while using fewer tokens.
- $1.25 / $1.25
- 128K context
- Text input
- Thinking
Mistral-Small-24B-Instruct-2501
Reasoning · Speed
togetherai/mistralai/Mistral-Small-24B-Instruct-2501
A 24B-parameter dense model setting new benchmarks in the sub-70B category, with native function calling, JSON output, and support for dozens of languages. Fits on a single RTX 4090.
- $0.10 / $0.30
- 32K context
- Text input
- Knowledge cutoff Oct 2023
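The price pairs listed for each model can be turned into a rough per-request cost estimate. The sketch below assumes each "$X / $Y" pair means USD per 1M input tokens / USD per 1M output tokens; this page does not state the unit, so treat that as an assumption and confirm it against current pricing. The `PRICES` table copies three entries from the catalog above, and `estimate_cost` is a hypothetical helper.

```python
# ASSUMPTION: each "$X / $Y" pair is USD per 1M input tokens / per 1M output
# tokens. The catalog itself does not state the unit.
PRICES = {
    "togetherai/meta-llama/Llama-3.3-70B-Instruct-Turbo": (0.88, 0.88),
    "togetherai/openai/gpt-oss-120b": (0.15, 0.60),
    "togetherai/deepseek-ai/DeepSeek-V3.1": (0.60, 1.70),
}

TOKENS_PER_PRICE_UNIT = 1_000_000


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request under the assumption above."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_price, output_price = PRICES[model_id]  # KeyError for unlisted IDs
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Compare the same workload (20K tokens in, 2K tokens out) across models.
    for model_id in PRICES:
        cost = estimate_cost(model_id, input_tokens=20_000, output_tokens=2_000)
        print(f"{model_id}: ${cost:.4f}")
```

Because input and output prices differ for most models, the ratio of prompt tokens to generated tokens in a workload can change which model is cheapest.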