Tested on 20 benchmarks with a 34.9% average score. Top scores: Chatbot Arena Elo, Overall (970.9), TriviaQA (77.9%), LAMBADA (75.2%).
BIG-Bench Hard. 23 challenging tasks from BIG-Bench where prior language models fell below average human performance.
HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning that requires chaining multiple facts together.
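The multi-hop pattern MuSR-style items test can be sketched minimally: answering requires following one fact to an intermediate entity and then a second fact from there, rather than recalling a single fact. The entities and relations below are hypothetical illustrations, not MuSR data.

```python
# Toy fact store; all names and relations are invented for illustration.
facts = {
    "Ann": {"works_in": "Lab 3"},
    "Lab 3": {"located_in": "Building B"},
}

def two_hop(entity: str, rel1: str, rel2: str) -> str:
    """Follow rel1 from `entity` to an intermediate, then rel2 from there."""
    intermediate = facts[entity][rel1]
    return facts[intermediate][rel2]

# "Where is Ann's workplace located?" needs both facts chained together.
print(two_hop("Ann", "works_in", "located_in"))  # Building B
```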
GSM8K. Grade school math word problems: 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.
HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.
TriviaQA. Trivia questions sourced from trivia enthusiasts and quiz websites. Tests breadth of general knowledge.
LAMBADA. Language modeling benchmark testing the ability to predict the last word of passages that require long-range context understanding.
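The LAMBADA metric reduces to exact-match accuracy on the final word. A minimal harness sketch, where `predict_last_word` stands in for any model (here a hypothetical callable from context to a predicted word), and the passage is a toy example rather than LAMBADA data:

```python
def split_last_word(passage: str) -> tuple[str, str]:
    """Split a passage into (context, target last word)."""
    context, _, target = passage.rstrip().rpartition(" ")
    return context, target

def lambada_accuracy(passages, predict_last_word) -> float:
    """Fraction of passages whose final word the model predicts exactly.

    `predict_last_word` is a hypothetical callable: context -> predicted word.
    """
    hits = 0
    for passage in passages:
        context, target = split_last_word(passage)
        hits += predict_last_word(context) == target
    return hits / len(passages)

# Toy check with an oracle that always answers "door".
passages = ["She reached for the handle and slowly opened the door"]
print(lambada_accuracy(passages, lambda ctx: "door"))  # 1.0
```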
Sentence completion requiring commonsense reasoning about physical and social situations. Tests real-world understanding.
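Sentence-completion benchmarks of this kind are typically scored by having the model rate each candidate ending and picking the highest-scoring one. A sketch under that assumption; `score` is a hypothetical callable (in practice, something like a length-normalized log-likelihood of the ending given the context), and the toy scorer below just prefers the shorter ending as a stand-in:

```python
def pick_ending(context: str, endings: list[str], score) -> int:
    """Return the index of the highest-scoring candidate ending.

    `score(context, ending)` is a hypothetical model-scoring callable.
    """
    return max(range(len(endings)), key=lambda i: score(context, endings[i]))

endings = [
    "puts the kettle on the stove",
    "throws the kettle out of the window",
]
# Toy scorer standing in for a model: shorter ending scores higher.
print(pick_ending("To make tea, she", endings, lambda c, e: -len(e)))  # 0
```

Accuracy is then the fraction of items where the chosen index matches the labeled ending.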
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Open Source
- Status: benchmark-only