Tested on 13 benchmarks with a 47.8% average score. Top scores: ARC (81.5%), HellaSwag (78.8%), LAMBADA (71.3%).
BIG-Bench Hard. 23 challenging tasks from BIG-Bench where prior language models fell below average human performance.
HuggingFace MuSR (Multistep Soft Reasoning). Tests multi-hop reasoning that requires chaining multiple facts together.
GSM8K. Grade school math word problems: 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.
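To illustrate what "multi-step arithmetic reasoning" means here, a minimal sketch of a GSM8K-style problem solved step by step (the problem itself is an invented example, not from the benchmark):

```python
# Hypothetical GSM8K-style word problem:
# "A baker makes 12 trays of 8 muffins each and sells 70 muffins.
#  How many muffins are left?"

def solve_sample_problem():
    muffins_baked = 12 * 8             # step 1: total muffins baked
    muffins_left = muffins_baked - 70  # step 2: subtract those sold
    return muffins_left

print(solve_sample_problem())  # 26
```

Grading is exact-match on the final numeric answer; credit depends on getting every intermediate step right, which is what makes these problems multi-step rather than single-lookup.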
HuggingFace evaluation of MATH Level 5 problems. Competition-level math requiring advanced multi-step reasoning.
AI2 Reasoning Challenge. Grade-school science questions requiring multi-step reasoning. Easy and Challenge sets test different difficulty levels.
HellaSwag. Sentence completion requiring commonsense reasoning about physical and social situations. Tests real-world understanding.
LAMBADA. Language modeling benchmark testing the ability to predict the last word of passages that require long-range context understanding.
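A sketch of how LAMBADA-style accuracy is scored: a prediction counts only if it exactly matches the passage's final word. The passages and predictions below are invented placeholders, not benchmark data:

```python
# Score last-word predictions against the true final word of each passage.
def last_word_accuracy(passages, predictions):
    correct = 0
    for passage, pred in zip(passages, predictions):
        target = passage.split()[-1]  # the held-out final word
        if pred == target:
            correct += 1
    return correct / len(passages)

passages = [
    "She opened the box and found her lost ring",
    "After hours of searching they finally gave up",
]
predictions = ["ring", "hope"]
print(last_word_accuracy(passages, predictions))  # 0.5
```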
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Proprietary
- Status: benchmark-only