TogetherAI - Timbal

Source: TogetherAI model docs. All model IDs use the prefix togetherai/.
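The prefix rule above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not part of any library API: the helper names `to_model_id` and `split_model_id` are hypothetical, and the `<org>/<model>` layout after the prefix is inferred from the IDs listed on this page.

```python
# Hypothetical helpers illustrating the "togetherai/" prefix convention.
# They are not part of Timbal or the Together AI SDK.

PREFIX = "togetherai/"


def to_model_id(upstream_name: str) -> str:
    """Return the prefixed model ID, adding "togetherai/" only if it is missing."""
    name = upstream_name.strip()
    if not name:
        raise ValueError("model name must not be empty")
    return name if name.startswith(PREFIX) else PREFIX + name


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split "togetherai/<org>/<model>" into (org, model).

    Assumes the layout seen in the IDs on this page: one organization
    segment followed by the model name.
    """
    if not model_id.startswith(PREFIX):
        raise ValueError(f"expected an ID starting with {PREFIX!r}: {model_id!r}")
    org, sep, model = model_id[len(PREFIX):].partition("/")
    if not sep or not org or not model:
        raise ValueError(f"expected togetherai/<org>/<model>: {model_id!r}")
    return org, model


if __name__ == "__main__":
    print(to_model_id("deepseek-ai/DeepSeek-R1"))
    print(split_model_id("togetherai/Qwen/Qwen3-Coder-Next-FP8"))
```

Making `to_model_id` idempotent means a value that already carries the prefix passes through unchanged, which avoids accidentally producing `togetherai/togetherai/...`.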

Meta LLaMA

Llama-4-Maverick-17B-128E-Instruct-FP8

togetherai/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8

Natively multimodal MoE with 17B active parameters and 128 experts (400B total), supporting 1M token context. Outperforms GPT-4o and Gemini 2.0 Flash across multimodal benchmarks.

$0.27 / $0.85
1M context
Text, Image input
Knowledge cutoff Aug 2024

Llama-3.3-70B-Instruct-Turbo

togetherai/meta-llama/Llama-3.3-70B-Instruct-Turbo

Multilingual instruction-tuned model with 70B parameters, delivering enhanced performance relative to Llama 3.1 70B and matching Llama 3.2 90B on text-only tasks.

$0.88 / $0.88
128K context
Text input
Knowledge cutoff Dec 2023

Llama-3.2-3B-Instruct-Turbo

togetherai/meta-llama/Llama-3.2-3B-Instruct-Turbo

Lightweight 3B-parameter model optimized for on-device use cases including summarization, instruction following, and rewriting tasks.

$0.06 / $0.06
128K context
Text input
Knowledge cutoff Dec 2023

Qwen

Qwen3.5-397B-A17B

togetherai/Qwen/Qwen3.5-397B-A17B

Multimodal foundation model with 397B total parameters (17B active) featuring a Hybrid MoE architecture with early fusion vision-language training. State-of-the-art across chat, RAG, vision-language, and agentic workflows.

$0.30 / $1.20
262K context
Text, Image input
Hybrid thinking
Knowledge cutoff ~2025

Qwen3-235B-A22B-Instruct-2507-tput

togetherai/Qwen/Qwen3-235B-A22B-Instruct-2507-tput

MoE model with 235B total parameters (22B active) in non-thinking mode, optimized for throughput. Supports multilingual dialogue across 100+ languages.

$0.20 / $0.60
262K context
Text input
Knowledge cutoff ~early 2025

Qwen3-235B-A22B-Thinking-2507

togetherai/Qwen/Qwen3-235B-A22B-Thinking-2507

The always-thinking variant of Qwen3-235B, with deep chain-of-thought reasoning for complex math, coding, and analytical tasks.

$0.65 / $3
262K context
Text input
Thinking always
Knowledge cutoff ~early 2025

Qwen3-Coder-480B-A35B-Instruct-FP8

togetherai/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8

Qwen’s most agentic code model, a 480B-parameter MoE (35B active) achieving results comparable to Claude Sonnet on agentic coding, browser-use, and repository-scale tasks.

$0.22 / $1
262K context
Text input

Qwen3-Coder-Next-FP8

togetherai/Qwen/Qwen3-Coder-Next-FP8

Next-generation coding model with hybrid thinking mode for adaptive reasoning depth.

$0.50 / $1.20
256K context
Text input
Hybrid thinking

Qwen3-Next-80B-A3B-Instruct

togetherai/Qwen/Qwen3-Next-80B-A3B-Instruct

First model in the Qwen3-Next series with 80B total parameters (3.9B active), featuring hybrid attention. Matches Qwen3-235B performance at less than 10% of the training cost.

$0.15 / $1.50
262K context
Text input
Hybrid thinking

Qwen2.5-7B-Instruct-Turbo

togetherai/Qwen/Qwen2.5-7B-Instruct-Turbo

Part of the Qwen2.5 family with 7B parameters, featuring improvements in coding, mathematics, instruction following, and structured data understanding.

$0.30 / $1.20
128K context
Text input
Knowledge cutoff ~Oct 2023

DeepSeek

DeepSeek-V3.1

togetherai/deepseek-ai/DeepSeek-V3.1

Hybrid model supporting both thinking and non-thinking modes. Features significantly improved tool usage and agent task performance, with quality comparable to DeepSeek-R1-0528 in thinking mode.

$0.60 / $1.70
128K context
Text input
Hybrid thinking
Knowledge cutoff ~mid 2025

DeepSeek-R1

togetherai/deepseek-ai/DeepSeek-R1

Open reasoning model with 671B total parameters (37B active, MoE), trained via large-scale reinforcement learning. Achieves performance comparable to OpenAI o1 across math, code, and reasoning. MIT license.

~$3 / ~$7
128K context
Text input
Thinking always
Knowledge cutoff Jul 2024

Kimi / MiniMax / GLM / Other

Kimi-K2.5

togetherai/moonshotai/Kimi-K2.5

Moonshot AI’s open-source multimodal model with 1T total parameters (32B active, MoE). Features Agent Swarm technology coordinating up to 100 specialized AI agents simultaneously.

$0.50 / $2.80
256K context
Text, Image, Video input
Thinking
Knowledge cutoff Apr 2024

Kimi-K2-Instruct-0905

togetherai/moonshotai/Kimi-K2-Instruct-0905

General-purpose chat and agentic model with 1T total parameters (32B active), optimized as a reflex-grade model without long thinking. Strong autonomous tool-calling.

$1 / $3
256K context
Text input
Knowledge cutoff Sep 2024

Kimi-K2-Thinking

togetherai/moonshotai/Kimi-K2-Thinking

Open-source thinking agent that reasons step-by-step while dynamically invoking tools. Sets state-of-the-art on Humanity’s Last Exam and BrowseComp, supporting 200-300 sequential tool calls without drift.

$1.20 / $4
256K context
Text input
Thinking always
Knowledge cutoff ~Sep 2024

MiniMax-M2.5

togetherai/MiniMaxAI/MiniMax-M2.5

MoE model with 230B total parameters (10B active), extensively trained with RL in complex real-world environments. Achieves SOTA in coding (80.2% SWE-Bench Verified) and agentic tool use.

$0.30 / $1.20
200K context
Text input

GLM-5

togetherai/zai-org/GLM-5

Zhipu AI’s fifth-generation model with ~745B parameters in a MoE architecture (44B active), designed for complex system engineering and long-range agent tasks. Trained entirely on Huawei Ascend chips.

$1 / $3.20
200K context
Text input
Thinking
Knowledge cutoff late 2025

GLM-4.7

togetherai/zai-org/GLM-4.7

Zhipu AI’s foundation model with ~400B parameters and 200K context, designed for real-world development environments with strong coding, reasoning, and agentic capabilities.

$0.45 / $2
200K context
Text input
Thinking
Knowledge cutoff ~mid 2024

gpt-oss-120b

togetherai/openai/gpt-oss-120b

OpenAI’s open-weight MoE model with 120B total parameters (5.1B active per token). Achieves near-parity with o4-mini on core reasoning benchmarks while running on a single 80GB GPU. Apache 2.0.

$0.15 / $0.60
128K context
Text input
Thinking
Knowledge cutoff Jun 2024

gemma-3n-E4B-it

togetherai/google/gemma-3n-E4B-it

Google’s on-device multimodal model with 8B raw parameters but an effective 4B memory footprint. First sub-10B model to exceed 1300 on LMArena, running with as little as 3GB of memory.

$0.02 / $0.04
32K context
Text, Image, Audio, Video input

gemma-3-27b-it

togetherai/google/gemma-3-27b-it

Google’s multimodal open model with 27B parameters, built from the same technology as Gemini 2.0. Supports 128K context, 140+ languages, and runs on a single GPU/TPU.

~$0.10 / ~$0.10
128K context
Text, Image input
Knowledge cutoff Aug 2024

cogito-v2-1-671b

togetherai/deepcogito/cogito-v2-1-671b

DeepCogito’s MoE model with 671B total parameters (37B active), trained via a novel process supervision approach that guides reasoning chains. Competitive with frontier closed models while using fewer tokens.

$1.25 / $1.25
128K context
Text input
Thinking

Mistral-Small-24B-Instruct-2501

togetherai/mistralai/Mistral-Small-24B-Instruct-2501

A 24B-parameter dense model setting new benchmarks in the sub-70B category, with native function calling, JSON output, and support for dozens of languages. Fits on a single RTX 4090.

$0.10 / $0.30
32K context
Text input
Knowledge cutoff Oct 2023
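
The price pairs listed for each model can be turned into a rough per-request cost estimate. The sketch below rests on an assumption this page does not state: that each pair is USD per 1M input tokens followed by USD per 1M output tokens. The function name `estimate_cost` is hypothetical, and the token counts in the example are made up for illustration.

```python
from decimal import Decimal

# ASSUMPTION: a listing such as "$0.27 / $0.85" means USD per 1M input tokens
# and USD per 1M output tokens. The page does not state the unit, so check the
# provider's pricing page before relying on these numbers.
TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: str, output_price: str) -> Decimal:
    """Estimate the USD cost of one request.

    Prices are passed as strings (e.g. "0.27") so that Decimal keeps them
    exact instead of inheriting binary floating-point rounding.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * Decimal(input_price)
             + Decimal(output_tokens) * Decimal(output_price))
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Llama-4-Maverick listing ($0.27 / $0.85) with illustrative token counts.
    print(estimate_cost(200_000, 50_000, "0.27", "0.85"))
```

Entries marked with a tilde (such as `~$3 / ~$7`) are approximate on this page, so any estimate built from them is approximate as well.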
