
Yi 6B

by Unknown · Released Jan 2024

Open Source
Avg score: 42.8
Rank: #135
Better than 42% of all models
Context: N/A
Input $/1M: TBD
Output $/1M: TBD
Type: text
License: Open Source
Benchmarks: 13 tested
About

Tested on 13 benchmarks with 31.4% average. Top scores: HellaSwag (65.9%), MMLU (52.0%), GSM8K (44.9%).

Capabilities
reasoning: 19.6 (#103 globally)
math: 18.4 (#169 globally)
knowledge: 41.3 (#145 globally)
language: 30.5 (#123 globally)
general: 35.5 (#27 globally)
Benchmark Scores
Tested on 13 benchmarks · Ranked across 5 categories
[Figure: Score Distribution (all 233 models), axis from 0 to 100, with this model's position marked]
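The "Better than 42% of all models" figure appears consistent with the listed rank (#135) and the model count shown for the distribution (233). The sketch below shows one way such a percentage could be derived. The formula and the function name `percent_better_than` are assumptions for illustration, not the site's documented method.

```python
def percent_better_than(rank: int, total_models: int) -> float:
    """Percentage of models ranked below the given rank.

    Assumed formula (not confirmed by the source): (total - rank) / total.
    """
    if not 1 <= rank <= total_models:
        raise ValueError("rank must be between 1 and total_models")
    return 100.0 * (total_models - rank) / total_models


# Rank #135 out of 233 models, both values taken from this page.
print(round(percent_better_than(135, 233)))  # prints 42
```

Under this assumption, 98 of the 233 models rank lower, which rounds to the 42% shown above.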
BBH

BIG-Bench Hard. 23 challenging tasks from BIG-Bench where prior language models fell below average human performance.

29.6
MUSR

HuggingFace MuSR (Multistep Soft Reasoning). Tests multi-hop reasoning requiring chaining multiple facts together.

9.7
GSM8K

Grade school math word problems. 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.

44.9
MATH level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

5.2
MATH Level 5

HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced multi-step reasoning.

5.1
HellaSwag

Sentence completion requiring commonsense reasoning about physical and social situations. Tests real-world understanding.

65.9
MMLU

Massive Multitask Language Understanding. 57 subjects from STEM, humanities, and social sciences. The most widely-cited knowledge benchmark.

52.0
Winogrande

Commonsense coreference resolution. Tests understanding of pronoun references in ambiguous sentences.

42.6
Score legend: Excellent (85+), Good (70-85), Average (50-70), Below (<50)
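The legend's score bands can be applied to the benchmark scores listed above. This is a minimal sketch: the function name `score_tier` is hypothetical, and because the legend's ranges share endpoints (70 and 85), the code assumes each lower bound is inclusive.

```python
def score_tier(score: float) -> str:
    """Map a 0-100 benchmark score to the legend's tier label.

    Assumption: lower bounds are inclusive, so 70.0 counts as "Good"
    and 85.0 counts as "Excellent".
    """
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# Scores copied from the benchmark entries on this page.
scores = {
    "BBH": 29.6,
    "GSM8K": 44.9,
    "HellaSwag": 65.9,
    "MMLU": 52.0,
    "Winogrande": 42.6,
}

for name, value in scores.items():
    print(f"{name}: {value} -> {score_tier(value)}")
```

With these thresholds, HellaSwag and MMLU land in "Average" and the other three listed here fall in "Below".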
BenchGecko API: yi-6b
Specifications
  • Type: text
  • Context: N/A
  • Released: Jan 2024
  • License: Open Source
  • Status: benchmark-only
Yi 6B is an open-source text AI model by Unknown, released in January 2024. It has an average benchmark score of 42.8.