Stable Beluga 2

by Unknown · Released Jan 2024

Avg score: 55.5
Rank: #90
Better than 61% of all models
Context: N/A
Input $/1M: TBD
Output $/1M: TBD
Type: text
License: Proprietary
Benchmarks: 13 tested
Data updated today
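The "Better than 61% of all models" figure is consistent with rank #90 among the 233 models that the score distribution on this page reports. The page does not state how the percentage is derived, so the formula below is an assumption (the share of models ranked lower), and `percent_better_than` is a hypothetical helper name used only for illustration.

```python
def percent_better_than(rank: int, total_models: int) -> float:
    """Share of models ranked below `rank`, as a percentage.

    Assumed formula; the page does not document its own calculation.
    """
    if not 1 <= rank <= total_models:
        raise ValueError("rank must be between 1 and total_models")
    return 100.0 * (total_models - rank) / total_models


# Rank #90 out of 233 models, both values taken from this page.
print(round(percent_better_than(90, 233)))  # 61
```

Under this assumption, 143 of the 233 models rank lower, which rounds to the 61% shown.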
About

Tested on 13 benchmarks with 47.8% average. Top scores: ARC AI2 (81.5%), HellaSwag (78.8%), LAMBADA (71.3%).

Capabilities
reasoning: 38.9 (#68 globally)
math: 37.0 (#113 globally)
knowledge: 55.9 (#73 globally)
language: 37.9 (#117 globally)
general: 41.3 (#17 globally)
Benchmark Scores
Tested on 13 benchmarks · Ranked across 5 categories
[Chart: score distribution across all 233 models, on a 0 to 100 scale, with this model's position marked]
BBH: 59.1

BIG-Bench Hard. 23 challenging tasks from BIG-Bench on which prior language models fell below average human performance.

MuSR: 18.6

Hugging Face evaluation of MuSR (Multistep Soft Reasoning). Tests multi-hop reasoning that requires chaining multiple facts together.

GSM8K: 69.6

Grade school math word problems. About 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.

MATH Level 5: 4.4

Hugging Face evaluation of MATH Level 5 problems, the hardest difficulty tier of this competition math dataset, requiring advanced reasoning.

ARC AI2: 81.5

AI2 Reasoning Challenge. Grade-school science questions requiring multi-step reasoning. The Easy and Challenge sets test different difficulty levels.

HellaSwag: 78.8

Sentence completion requiring commonsense reasoning about physical and social situations. Tests real-world understanding.

LAMBADA: 71.3

Language modeling benchmark testing the ability to predict the last word of a passage, which requires long-range context understanding.

Score legend: Excellent (85+), Good (70-85), Average (50-70), Below (<50)
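The legend's bands can be applied to the scores listed above with a short sketch. The legend does not say which band owns the boundary values 50, 70, and 85, so treating each lower bound as inclusive is an assumption here, and `score_tier` is a hypothetical helper name, not part of any BenchGecko API.

```python
def score_tier(score: float) -> str:
    """Map a 0-100 benchmark score to the legend's band.

    Assumption: lower bounds are inclusive (85 -> Excellent, 70 -> Good,
    50 -> Average); the legend itself leaves the boundaries ambiguous.
    """
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# Scores copied from the benchmark list on this page.
scores = {
    "BBH": 59.1,
    "MuSR": 18.6,
    "GSM8K": 69.6,
    "MATH Level 5": 4.4,
    "ARC AI2": 81.5,
    "HellaSwag": 78.8,
    "LAMBADA": 71.3,
}

for name, value in scores.items():
    print(f"{name}: {value} -> {score_tier(value)}")
```

With these thresholds, ARC AI2, HellaSwag, and LAMBADA fall in Good, BBH and GSM8K in Average, and MuSR and MATH Level 5 in Below.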
BenchGecko API: stable-beluga-2
Specifications
  • Type: text
  • Context: N/A
  • Released: Jan 2024
  • License: Proprietary
  • Status: benchmark-only
Available On
  • Unknown: TBD
Stable Beluga 2 is a proprietary text AI model by Unknown, released in January 2024. It has an average benchmark score of 55.5.