How much does Yi 6B cost?

Yi 6B is open source and can be self-hosted.

What benchmarks has Yi 6B been tested on?

Yi 6B has been evaluated on 13 benchmarks. Top scores: HellaSwag: 65.9, MMLU: 52.0, GSM8K: 44.9.

How does Yi 6B compare to Qwen2.5 3B Instruct?

Yi 6B has an average score of 42.8 while Qwen2.5 3B Instruct scores 42.9. Qwen2.5 3B Instruct slightly outperforms Yi 6B overall. See full comparison →

Home/Models/Yi 6B

Yi 6B

Name: Yi 6B
Author: Unknown

by Unknown · Released Jan 2024

Open Source

42.8

avg score

Rank #135

Compare

Better than 42% of all models

Context

N/A

Input $/1M

TBD

Output $/1M

TBD

Type

text

License

Open Source

Benchmarks

13 tested

Data updated today

About

Tested on 13 benchmarks with 31.4% average. Top scores: HellaSwag (65.9%), MMLU (52.0%), GSM8K (44.9%).

Capabilities

reasoning

19.6

#103 globally

math

18.4

#169 globally

knowledge

41.3

#145 globally

language

30.5

#123 globally

general

35.5

#27 globally

Benchmark Scores

Compare All

Tested on 13 benchmarks · Ranked across 5 categories

Score Distribution (all 233 models)

0255075100

▲ You are here

reasoningCompare reasoning →

BBH

BIG-Bench Hard. 23 challenging tasks from BIG-Bench where prior language models fell below average human performance.

29.6—

MUSR

HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning requiring chaining multiple facts together.

9.7—

mathCompare math →

GSM8K

Grade school math word problems. 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.

44.9—

MATH level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

5.2—

MATH Level 5

HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.

5.1—

knowledgeCompare knowledge →

HellaSwag

Sentence completion requiring commonsense reasoning about physical and social situations. Tests real-world understanding.

65.9—

MMLU

Massive Multitask Language Understanding. 57 subjects from STEM, humanities, and social sciences. The most widely-cited knowledge benchmark.

52.0—

Winogrande

Commonsense coreference resolution. Tests understanding of pronoun references in ambiguous sentences.

42.6—

Quick compare:

vs Qwen2.5 3B Instruct

vs Gemini 1.5 Flash (May 2024)

vs GPT-4.1 Mini

Excellent (85+) Good (70-85) Average (50-70) Below (<50)

Similar Models

Qwen2.5 3B Instruct

Alibaba

42.9TBD

Gemini 1.5 Flash (May 2024)

Links

Info

Unknown Pricing explorer Developers · API

Research

Documentation

Community

Source Code

BenchGecko API

yi-6b

Specifications

Typetext
ContextN/A
ReleasedJan 2024
LicenseOpen Source
Statusbenchmark-only

Available On

UnknownTBD

Frequently Asked Questions

Yi 6B is an open-source text AI model by Unknown, released in January 2024. It has an average benchmark score of 42.8.

Benchmarks

HellaSwag MMLU GSM8K Winogrande MMLU-PRO

Unknown · Provider Unknown · Economy All Models Compare Models Pricing Developers · API

Yi 6B

Frequently Asked Questions

Related Models

Benchmarks

Related Pages