Tested on 14 benchmarks with a 42.5% average score. Top scores: TriviaQA (79.6%), LAMBADA (76.5%), HellaSwag (74.3%).
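The headline number is an unweighted (macro) average over the per-benchmark scores. A minimal sketch of that computation, using only the three top scores quoted above (the reported 42.5% is the same calculation over all 14 benchmarks, whose individual scores are not listed here):

```python
def macro_average(scores):
    """Unweighted mean over per-benchmark scores (percentages)."""
    scores = list(scores)
    return sum(scores) / len(scores)

# The three top scores reported above.
top_scores = {"TriviaQA": 79.6, "LAMBADA": 76.5, "HellaSwag": 74.3}
print(round(macro_average(top_scores.values()), 1))  # 76.8
```

A macro average weights every benchmark equally regardless of how many questions it contains; a per-question (micro) average over the same runs would generally give a different number.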
BIG-Bench Hard. 23 challenging tasks from BIG-Bench on which prior language models fell below average human performance.
GSM8K. Grade school math word problems: 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.
MATH. Competition-level mathematics from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
TriviaQA. Trivia questions sourced from trivia enthusiasts and quiz websites. Tests breadth of general knowledge.
LAMBADA. Language-modeling benchmark testing the ability to predict the last word of passages that require long-range context understanding.
HellaSwag. Sentence completion requiring commonsense reasoning about physical and social situations. Tests real-world understanding.
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Open Source
- Status: benchmark-only