How much does Falcon-180B cost?

Falcon-180B is open source and can be self-hosted.

What benchmarks has Falcon-180B been tested on?

Falcon-180B has been evaluated on 17 benchmarks. Top scores: HellaSwag: 85.3, TriviaQA: 79.9, LAMBADA: 79.8.

Is Falcon-180B open source?

Yes, Falcon-180B is open source.

How does Falcon-180B compare to Gemini 2.0 Pro?

Falcon-180B has an average score of 50.5 while Gemini 2.0 Pro scores 50.6. Gemini 2.0 Pro slightly outperforms Falcon-180B overall. See full comparison →

Home/Models/Falcon-180B

Falcon-180B

Name: Falcon-180B
Author: TII

by TII · Released Jan 2024

Open Source

50.5

avg score

Rank #133

Compare

Better than 51% of all models

Context

N/A

Input $/1M

TBD

Output $/1M

TBD

Type

text

License

Open Source

Benchmarks

17 tested

Data updated today

About

Tested on 17 benchmarks with 44.4% average. Top scores: HellaSwag (85.3%), TriviaQA (79.9%), LAMBADA (79.8%).

Capabilities

reasoning

11.8

#148 globally

math

28.6

#165 globally

knowledge

56.3

#84 globally

language

32.6

#140 globally

general

21.9

#42 globally

Benchmark Scores

Compare All

Tested on 17 benchmarks · Ranked across 5 categories

Score Distribution (all 274 models)

0255075100

▲ You are here

reasoningCompare reasoning →

BBH

BIG-Bench Hard. 23 challenging tasks from BIG-Bench where prior language models fell below average human performance.

16.1—

MUSR

HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning requiring chaining multiple facts together.

7.5—

mathCompare math →

GSM8K

Grade school math word problems. 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.

54.4—

MATH Level 5

HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.

2.8—

knowledgeCompare knowledge →

HellaSwag

Sentence completion requiring commonsense reasoning about physical and social situations. Tests real-world understanding.

85.3—

TriviaQA

Trivia questions sourced from trivia enthusiasts and quiz websites. Tests breadth of general knowledge.

79.9—

LAMBADA

Language modeling benchmark testing ability to predict the last word of passages requiring long-range context understanding.

79.8—

Quick compare:

vs Gemini 2.0 Pro

vs Grok 4 Fast

vs Mistral 7B V0.1

Excellent (85+) Good (70-85) Average (50-70) Below (<50)

Similar Models

Links

Info

TII Pricing explorer Developers · API

Research

Documentation

Community

Source Code

BenchGecko API

falcon-180b

Specifications

Typetext
ContextN/A
ReleasedJan 2024
LicenseOpen Source
Statusbenchmark-only

Available On

TIITBD

Frequently Asked Questions

Falcon-180B is an open-source text AI model by TII, released in January 2024. It has an average benchmark score of 50.5.

Benchmarks

HellaSwag TriviaQA LAMBADA Winogrande PIQA

TII · Provider TII · Economy All Models Compare Models Pricing Developers · API

Falcon-180B

Frequently Asked Questions

Related Models

Benchmarks

Related Pages