Source: TogetherAI model docs. All model IDs use the prefix togetherai/. Models marked "Dedicated only" are not available serverless; you must create and start a dedicated endpoint before using them.
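
The prefix and dedicated-only rules above can be sketched in Python. This is a minimal illustration, not part of any SDK: `PREFIX`, `DEDICATED_ONLY`, `full_model_id`, and `needs_dedicated_endpoint` are hypothetical names, and the set holds only two of the dedicated-only models listed on this page.

```python
# Hypothetical helpers illustrating the model ID convention on this page.
PREFIX = "togetherai/"

# Illustrative subset of the models this page marks "Dedicated only".
DEDICATED_ONLY = {
    "togetherai/deepseek-ai/DeepSeek-V3.1",
    "togetherai/zai-org/GLM-4.7",
}


def full_model_id(name: str) -> str:
    """Return the model ID with the togetherai/ prefix, adding it if missing."""
    return name if name.startswith(PREFIX) else PREFIX + name


def needs_dedicated_endpoint(name: str) -> bool:
    """True if the model appears in the dedicated-only subset above."""
    return full_model_id(name) in DEDICATED_ONLY


print(full_model_id("openai/gpt-oss-120b"))
print(needs_dedicated_endpoint("zai-org/GLM-4.7"))
```

A check like this could run before a request is sent, so that a dedicated-only model is caught early instead of failing at call time.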

Meta Llama

Llama-3.3-70B-Instruct-Turbo

Model ID: togetherai/meta-llama/Llama-3.3-70B-Instruct-Turbo
Multilingual instruction-tuned model with 70B parameters, delivering enhanced performance relative to Llama 3.1 70B and matching Llama 3.2 90B on text-only tasks.
  • $0.88 / $0.88
  • 128K context
  • Text input
  • Knowledge cutoff Dec 2023
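
As a rough worked example of reading the price line, the sketch below assumes the pair means USD per 1 million input tokens / USD per 1 million output tokens. This page does not state the unit, so treat that as an assumption; `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Estimate the cost of one request in USD.

    Assumption: both prices are USD per 1,000,000 tokens.
    """
    per_token_in = input_price / 1_000_000
    per_token_out = output_price / 1_000_000
    return input_tokens * per_token_in + output_tokens * per_token_out


# Llama-3.3-70B-Instruct-Turbo is listed at $0.88 / $0.88.
cost = estimate_cost(10_000, 2_000, 0.88, 0.88)
print(f"Estimated cost: ${cost:.5f}")
```

Under that assumption, 10,000 input tokens and 2,000 output tokens come to roughly one cent.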

Qwen

Qwen3.5-397B-A17B

Model ID: togetherai/Qwen/Qwen3.5-397B-A17B
Multimodal foundation model with 397B total parameters (17B active), featuring a hybrid MoE architecture with early-fusion vision-language training. State-of-the-art across chat, RAG, vision-language, and agentic workflows.
  • $0.30 / $1.20
  • 262K context
  • Text, Image input
  • Hybrid thinking
  • Knowledge cutoff ~2025
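
Each card's bullets follow a regular shape, so they can be read into a small record. The sketch below is one hypothetical way to do that: `parse_card`, the field names, and the regular expressions are assumptions, and the price pair is stored as plain numbers without interpreting its unit.

```python
import re


def parse_card(bullets: list[str]) -> dict:
    """Parse a model card's bullet strings into a dict (hypothetical schema)."""
    card: dict = {"features": []}
    for raw in bullets:
        # Drop indentation and the leading bullet character.
        text = raw.strip().lstrip("•").strip()
        price = re.fullmatch(r"~?\$([\d.]+) / ~?\$([\d.]+)", text)
        context = re.fullmatch(r"(\d+)K context", text)
        if price:
            card["input_price"] = float(price.group(1))
            card["output_price"] = float(price.group(2))
        elif context:
            card["context_k"] = int(context.group(1))
        elif text.endswith(" input"):
            card["modalities"] = text[: -len(" input")].split(", ")
        elif text.startswith("Knowledge cutoff "):
            card["knowledge_cutoff"] = text[len("Knowledge cutoff "):]
        else:
            # Anything else (e.g. "Hybrid thinking") is kept as a feature tag.
            card["features"].append(text)
    return card


card = parse_card([
    "  • $0.30 / $1.20",
    "  • 262K context",
    "  • Text, Image input",
    "  • Hybrid thinking",
    "  • Knowledge cutoff ~2025",
])
print(card)
```

The knowledge cutoff is kept as a raw string because the page gives it at varying precision ("Dec 2023", "~2025", "late 2025").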

Qwen3-235B-A22B-Instruct-2507-tput

Model ID: togetherai/Qwen/Qwen3-235B-A22B-Instruct-2507-tput
MoE model with 235B total parameters (22B active) in non-thinking mode, optimized for throughput. Supports multilingual dialogue across 100+ languages.
  • $0.20 / $0.60
  • 262K context
  • Text input
  • Knowledge cutoff ~early 2025

Qwen3-Coder-480B-A35B-Instruct-FP8

Model ID: togetherai/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
Qwen’s most agentic code model, a 480B-parameter MoE (35B active) achieving results comparable to Claude Sonnet on agentic coding, browser-use, and repository-scale tasks.
Dedicated only — create and start a dedicated endpoint before use.
  • $0.22 / $1
  • 262K context
  • Text input

Qwen3-Coder-Next-FP8

Model ID: togetherai/Qwen/Qwen3-Coder-Next-FP8
Next-generation coding model with hybrid thinking mode for adaptive reasoning depth.
Dedicated only — create and start a dedicated endpoint before use.
  • $0.50 / $1.20
  • 256K context
  • Text input
  • Hybrid thinking

Qwen3-Next-80B-A3B-Instruct

Model ID: togetherai/Qwen/Qwen3-Next-80B-A3B-Instruct
First model in the Qwen3-Next series, with 80B total parameters (3.9B active) and hybrid attention. Matches Qwen3-235B performance at less than 10% of the training cost.
Dedicated only — create and start a dedicated endpoint before use.
  • $0.15 / $1.50
  • 262K context
  • Text input
  • Hybrid thinking

Qwen2.5-7B-Instruct-Turbo

Model ID: togetherai/Qwen/Qwen2.5-7B-Instruct-Turbo
Part of the Qwen2.5 family with 7B parameters, featuring improvements in coding, mathematics, instruction following, and structured data understanding.
  • $0.30 / $1.20
  • 128K context
  • Text input
  • Knowledge cutoff ~Oct 2023

DeepSeek

DeepSeek-V3.1

Model ID: togetherai/deepseek-ai/DeepSeek-V3.1
Hybrid model supporting both thinking and non-thinking modes. Features significantly improved tool usage and agent task performance, with quality comparable to DeepSeek-R1-0528 in thinking mode.
Dedicated only — create and start a dedicated endpoint before use.
  • $0.60 / $1.70
  • 128K context
  • Text input
  • Hybrid thinking
  • Knowledge cutoff ~mid 2025

DeepSeek-V4-Pro

Model ID: togetherai/deepseek-ai/DeepSeek-V4-Pro
DeepSeek V4 Pro on Together serverless: hybrid attention, up to 512K context, strong coding and agent benchmarks.
  • $2.10 / $4.40
  • 512K context
  • Text input
  • Hybrid thinking
  • Knowledge cutoff ~2025

Kimi / MiniMax / GLM / Other

Kimi-K2.6

Model ID: togetherai/moonshotai/Kimi-K2.6
Moonshot Kimi K2.6 on Together serverless: 1T-scale MoE with tool calling and JSON mode for agentic and multimodal workloads.
  • $1.20 / $4.50
  • 262K context
  • Text, Image input
  • Knowledge cutoff ~2025

MiniMax-M2.7

Model ID: togetherai/MiniMaxAI/MiniMax-M2.7
Successor MiniMax MoE model (~229B parameters) with improved coding and agentic tool use, JSON mode, and prompt caching on Together serverless.
  • $0.30 / $1.20
  • 203K context
  • Text input

GLM-5

Model ID: togetherai/zai-org/GLM-5
Zhipu AI’s fifth-generation model with ~745B parameters in a MoE architecture (44B active), designed for complex system engineering and long-range agent tasks. Trained entirely on Huawei Ascend chips.
  • $1 / $3.20
  • 200K context
  • Text input
  • Thinking
  • Knowledge cutoff late 2025

GLM-5.1

Model ID: togetherai/zai-org/GLM-5.1
Z.ai post-training upgrade to GLM-5: 754B MoE (40B active), 200K context, thinking mode, tool calling, and stronger coding via RL.
  • $1.40 / $4.40
  • 200K context
  • Text input
  • Thinking
  • Knowledge cutoff late 2025

GLM-4.7

Model ID: togetherai/zai-org/GLM-4.7
Zhipu AI’s foundation model with ~400B parameters and 200K context, designed for real-world development environments with strong coding, reasoning, and agentic capabilities.
Dedicated only — create and start a dedicated endpoint before use.
  • $0.45 / $2
  • 200K context
  • Text input
  • Thinking
  • Knowledge cutoff ~mid 2024

gpt-oss-120b

Model ID: togetherai/openai/gpt-oss-120b
OpenAI’s open-weight MoE model with 120B total parameters (5.1B active per token). Achieves near-parity with o4-mini on core reasoning benchmarks while running on a single 80GB GPU. Apache 2.0.
  • $0.15 / $0.60
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff Jun 2024

gpt-oss-20b

Model ID: togetherai/openai/gpt-oss-20b
OpenAI’s compact 20B MoE delivering o3-mini-level results on Together serverless. Apache 2.0.
  • $0.05 / $0.20
  • 128K context
  • Text input
  • Thinking
  • Knowledge cutoff Jun 2024

gemma-3n-E4B-it

Model ID: togetherai/google/gemma-3n-E4B-it
Google’s on-device multimodal model with 8B raw parameters but an effective 4B memory footprint. First sub-10B model to exceed 1300 on LMArena, running with as little as 3GB of memory.
  • $0.02 / $0.04
  • 32K context
  • Text, Image, Audio, Video input

gemma-3-27b-it

Model ID: togetherai/google/gemma-3-27b-it
Google’s multimodal open model with 27B parameters, built from the same technology as Gemini 2.0. Supports 128K context, 140+ languages, and runs on a single GPU/TPU.
Dedicated only — create and start a dedicated endpoint before use.
  • ~$0.10 / ~$0.10
  • 128K context
  • Text, Image input
  • Knowledge cutoff Aug 2024

cogito-v2-1-671b

Model ID: togetherai/deepcogito/cogito-v2-1-671b
DeepCogito’s MoE model with 671B total parameters (37B active), trained via a novel process supervision approach that guides reasoning chains. Competitive with frontier closed models while using fewer tokens.
  • $1.25 / $1.25
  • 128K context
  • Text input
  • Thinking

Mistral-Small-24B-Instruct-2501

Model ID: togetherai/mistralai/Mistral-Small-24B-Instruct-2501
A 24B-parameter dense model setting new benchmarks in the sub-70B category, with native function calling, JSON output, and support for dozens of languages. Fits on a single RTX 4090.
Dedicated only — create and start a dedicated endpoint before use.
  • $0.10 / $0.30
  • 32K context
  • Text input
  • Knowledge cutoff Oct 2023
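
To compare entries across the catalog, one option is to filter on serverless availability and context size, then sort by price. The sketch below hard-codes four entries copied from this page; the `Model` dataclass and the `pick` function are hypothetical, and prices are compared as plain numbers without assuming a unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    model_id: str
    input_price: float
    output_price: float
    context_k: int          # context size as listed, e.g. 128 for "128K"
    dedicated_only: bool    # True if the card carries the dedicated-only note


# Four entries copied from the cards on this page (illustrative subset).
CATALOG = [
    Model("togetherai/openai/gpt-oss-120b", 0.15, 0.60, 128, False),
    Model("togetherai/openai/gpt-oss-20b", 0.05, 0.20, 128, False),
    Model("togetherai/zai-org/GLM-5", 1.00, 3.20, 200, False),
    Model("togetherai/zai-org/GLM-4.7", 0.45, 2.00, 200, True),
]


def pick(catalog, min_context_k, serverless_only=True):
    """Return matching models, cheapest listed output price first."""
    candidates = [
        m for m in catalog
        if m.context_k >= min_context_k
        and not (serverless_only and m.dedicated_only)
    ]
    return sorted(candidates, key=lambda m: m.output_price)


for m in pick(CATALOG, min_context_k=128):
    print(m.model_id, m.output_price)
```

Sorting on the output price is an arbitrary choice for this sketch; a workload dominated by long prompts might sort on the input price instead.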