# GPT-3.5 Turbo (older v0613) vs Mixtral 8x7B Instruct

Side by side: benchmarks, pricing, and signals you can act on.
## Winner summary

GPT-3.5 Turbo (older v0613) wins 6 of 9 shared benchmarks, with its lead concentrated in knowledge tasks.

### Category leads

- Knowledge: GPT-3.5 Turbo (older v0613)
- Math: Mixtral 8x7B Instruct
## Hype vs reality

Attention vs. performance:

- GPT-3.5 Turbo (older v0613): #111 by performance · no attention signal
- Mixtral 8x7B Instruct: #54 by performance · no attention signal
## Best value

Mixtral 8x7B Instruct offers about 3.5x better value than GPT-3.5 Turbo (older v0613).

| Model | Value | Price |
|---|---|---|
| GPT-3.5 Turbo (older v0613) | 30.5 pts/$ | $1.50/M |
| Mixtral 8x7B Instruct | 107.0 pts/$ | $0.54/M |
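A minimal sketch of how the headline multiple falls out of the published figures: the per-model pts/$ values are taken as given (the aggregate performance score behind them is not shown on this page), and the value ratio is simply one divided by the other.

```python
# Minimal sketch: verify the "3.5x better value" claim from the published
# pts/$ figures. The underlying aggregate score is not shown on this page,
# so these numbers are taken as given rather than recomputed.

gpt35_pts_per_dollar = 30.5     # at a blended $1.50 per 1M tokens
mixtral_pts_per_dollar = 107.0  # at $0.54 per 1M tokens

ratio = mixtral_pts_per_dollar / gpt35_pts_per_dollar
print(f"{ratio:.1f}x better value")  # -> 3.5x better value
```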
## Vendor risk

Who is behind each model:

- OpenAI: $840.0B · Tier 1
- Mistral AI: $14.0B · Tier 1
## Head to head

9 benchmarks · 2 models.
### ANLI · GPT-3.5 Turbo (older v0613) leads by +4.3

ANLI (Adversarial NLI): an adversarially constructed natural language inference dataset where each round targets weaknesses found in previous model generations.

- GPT-3.5 Turbo (older v0613): 37.1
- Mixtral 8x7B Instruct: 32.8
### ARC AI2 · GPT-3.5 Turbo (older v0613) leads by +0.1

AI2 Reasoning Challenge: tests grade-school-level science knowledge with multiple-choice questions requiring reasoning beyond simple retrieval.

- GPT-3.5 Turbo (older v0613): 83.2
- Mixtral 8x7B Instruct: 83.1
### GPQA diamond · Mixtral 8x7B Instruct leads by +4.6

Graduate-Level Google-Proof QA (Diamond set): expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.

- GPT-3.5 Turbo (older v0613): 2.9
- Mixtral 8x7B Instruct: 7.5
### GSM8K · Mixtral 8x7B Instruct leads by +16.6

Grade School Math 8K: 8,500 linguistically diverse grade-school math word problems that require multi-step reasoning to solve.

- GPT-3.5 Turbo (older v0613): 57.8
- Mixtral 8x7B Instruct: 74.4
### MATH level 5 · GPT-3.5 Turbo (older v0613) leads by +1.7

MATH Level 5: the hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.

- GPT-3.5 Turbo (older v0613): 11.6
- Mixtral 8x7B Instruct: 9.9
### MMLU · Mixtral 8x7B Instruct leads by +4.4

Massive Multitask Language Understanding: 57 subjects spanning STEM, humanities, social sciences, and more; the standard benchmark for broad knowledge.

- GPT-3.5 Turbo (older v0613): 56.4
- Mixtral 8x7B Instruct: 60.8
### OpenBookQA · GPT-3.5 Turbo (older v0613) leads by +0.2

OpenBookQA: science questions that require combining a given core fact with broad common knowledge, mimicking an open-book exam setting.

- GPT-3.5 Turbo (older v0613): 81.3
- Mixtral 8x7B Instruct: 81.1
### TriviaQA · GPT-3.5 Turbo (older v0613) leads by +3.6

TriviaQA: a reading-comprehension benchmark of trivia questions, requiring models to find and reason over evidence in provided documents.

- GPT-3.5 Turbo (older v0613): 85.8
- Mixtral 8x7B Instruct: 82.2
### Winogrande · GPT-3.5 Turbo (older v0613) leads by +8.8

WinoGrande: a large-scale commonsense reasoning benchmark where models must resolve ambiguous pronouns in carefully constructed sentence pairs.

- GPT-3.5 Turbo (older v0613): 63.2
- Mixtral 8x7B Instruct: 54.4
## Full benchmark table

Benchmark descriptions are given in the head-to-head section above.

| Benchmark | GPT-3.5 Turbo (older v0613) | Mixtral 8x7B Instruct |
|---|---|---|
| ANLI | 37.1 | 32.8 |
| ARC AI2 | 83.2 | 83.1 |
| GPQA diamond | 2.9 | 7.5 |
| GSM8K | 57.8 | 74.4 |
| MATH level 5 | 11.6 | 9.9 |
| MMLU | 56.4 | 60.8 |
| OpenBookQA | 81.3 | 81.1 |
| TriviaQA | 85.8 | 82.2 |
| Winogrande | 63.2 | 54.4 |
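As a quick sanity check on the winner summary, a minimal sketch that tallies per-benchmark wins from the table above (scores are the published figures; a tie would count for neither model):

```python
# Minimal sketch of the "wins 6 of 9" tally: count, per shared benchmark,
# which model posts the higher published score.

scores = {
    # benchmark: (GPT-3.5 Turbo v0613, Mixtral 8x7B Instruct)
    "ANLI":         (37.1, 32.8),
    "ARC AI2":      (83.2, 83.1),
    "GPQA diamond": ( 2.9,  7.5),
    "GSM8K":        (57.8, 74.4),
    "MATH level 5": (11.6,  9.9),
    "MMLU":         (56.4, 60.8),
    "OpenBookQA":   (81.3, 81.1),
    "TriviaQA":     (85.8, 82.2),
    "Winogrande":   (63.2, 54.4),
}

gpt_wins = sum(1 for gpt, mix in scores.values() if gpt > mix)
print(f"GPT-3.5 Turbo wins {gpt_wins} of {len(scores)}")  # -> wins 6 of 9
```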
## Pricing

Per 1M tokens · projected $/mo at 10M tokens.

| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| GPT-3.5 Turbo (older v0613) | $1.00 | $2.00 | 4K tokens | $12.50 |
| Mixtral 8x7B Instruct | $0.54 | $0.54 | 33K tokens | $5.40 |
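The projected monthly figures are consistent with a 75/25 input/output token split at 10M total tokens. That split is an assumption (the page does not state the traffic mix it uses), but a minimal sketch under it reproduces both numbers:

```python
# Minimal sketch of the "Projected $/mo" column. The 75/25 input/output
# split is an assumption that reproduces the published figures; the page
# does not state the traffic mix it actually uses.

def monthly_cost(input_per_m: float, output_per_m: float,
                 total_m_tokens: float = 10.0,
                 input_share: float = 0.75) -> float:
    """Projected $/mo for a monthly volume given in millions of tokens."""
    input_m = total_m_tokens * input_share
    output_m = total_m_tokens - input_m
    return input_m * input_per_m + output_m * output_per_m

print(monthly_cost(1.00, 2.00))  # GPT-3.5 Turbo (older v0613) -> 12.5
print(monthly_cost(0.54, 0.54))  # Mixtral 8x7B Instruct       -> 5.4
```

For Mixtral the split is irrelevant, since input and output tokens are priced identically; its projection is simply 10 x $0.54 = $5.40.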