
DeepSeek

Chinese AI lab that shipped DeepSeek V3 (MoE 671B) and R1 (open reasoning model) · crashed frontier pricing in 2025.


Level 1

DeepSeek is a Hangzhou-based AI research company. Their DeepSeek V3 (Dec 2024) was a 671B-parameter MoE with 37B parameters active per token, matching GPT-4-class benchmarks at roughly 1/30th the inference cost. DeepSeek R1 (Jan 2025) demonstrated that pure RL on verifiable rewards can produce reasoning behavior · and they open-sourced both the weights and the recipe. V3.2 (2025) and R2 (2026) continued the series. The DeepSeek moment reshaped expectations around what open-weight models could do.

Level 2

DeepSeek V3 architecture: 256 routed experts + 1 shared expert, 8 routed experts active per token, 671B total parameters, 37B active. Multi-head Latent Attention (MLA) reduces KV cache size by ~7× vs standard multi-head attention. DeepSeek R1 applied pure outcome-reward RL to a strong base model and observed emergent long-CoT reasoning, including self-reflection and self-correction · no process supervision needed. Distilled R1 variants (1.5B to 70B) perform well above their size class on reasoning benchmarks. DeepSeek's pricing: V3 at $0.27/M input (cache miss) and $1.10/M output, R1 at $2.19/M output · an order of magnitude cheaper than Western frontier APIs.
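The routing scheme above can be sketched in a few lines. This is an illustrative toy (hypothetical dimensions and random weights, not DeepSeek's actual gating function): a router scores all 256 routed experts per token, the top 8 are activated, and their gate weights are normalized.

```python
import numpy as np

N_ROUTED, TOP_K, D = 256, 8, 16  # 256 routed experts, 8 active; toy hidden dim

rng = np.random.default_rng(0)
router_w = rng.normal(size=(D, N_ROUTED))  # router projection (random for the sketch)
token = rng.normal(size=(D,))              # one token's hidden state

scores = token @ router_w                  # affinity score per routed expert
topk = np.argsort(scores)[-TOP_K:]         # indices of the 8 selected experts
gates = np.exp(scores[topk])
gates /= gates.sum()                       # normalized gate weights over the top-k

# Only these 8 routed experts (plus the always-on shared expert) run for
# this token · which is why only ~37B of 671B parameters are active.
print(len(topk), round(float(gates.sum()), 6))
```

Because each token touches only 8 of 256 routed experts, per-token FLOPs scale with the 37B active parameters, not the 671B total · the core of the cost advantage.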

Level 3

DeepSeek V3 training details (publicly disclosed): 14.8T tokens, FP8 training from scratch (first frontier-class model to do so), ~$5.6M disclosed GPU cost for the final run. Multi-head Latent Attention compresses the KV cache via low-rank projection. DualPipe pipelining overlaps communication and computation for efficient multi-GPU training. R1 training: GRPO (Group Relative Policy Optimization) · a PPO variant that normalizes rewards within a group of sampled completions per prompt, removing the need for a learned value network. The R1 paper (Jan 2025) showed that a pure-RL recipe without SFT warmup (R1-Zero) also works, contradicting conventional wisdom. Architecture + training efficiency are DeepSeek's differentiators · export controls mean they don't have access to H100s in the same quantity as Western labs.
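GRPO's group-relative advantage is simple enough to sketch directly. A minimal illustration, assuming binary outcome rewards on G sampled completions for one prompt (the rewards here are made up): each completion's advantage is its reward normalized against the group mean and standard deviation, so no value network is needed.

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantage: normalize each completion's reward
    against the mean/std of its own sampling group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Example: 4 sampled answers to one verifiable math prompt,
# outcome reward = 1.0 if the final answer checks out, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
adv = grpo_advantages(rewards)
print(adv)  # correct completions get positive advantage, incorrect negative
```

These advantages then weight a clipped PPO-style policy-gradient update; the group statistics play the role the critic would in standard PPO, which is what makes the recipe cheaper to run.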

Why this matters now

DeepSeek V3 and R1 opened the door to open-weight frontier quality at 1/30th the price. Every pricing comparison since treats DeepSeek as the floor.

The takeaway for you
If you are a Researcher
  • MoE + MLA + FP8 training · three stacked efficiencies
  • Pure RL produces reasoning without process supervision (R1 result)
  • GRPO as a cheaper RLHF alternative
If you are a Builder
  • DeepSeek V3 for cheap high-quality inference · $0.27/M input
  • R1 for open-weight reasoning · $2.19/M output
  • Self-host for max cost savings if you have the VRAM (800GB+ for the 671B MoE)
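The 800GB+ self-hosting figure above is easy to sanity-check. A back-of-envelope sketch under rough assumptions (FP8 weights at 1 byte per parameter; KV cache and runtime overhead on top; real deployments vary):

```python
# Rough VRAM budget for self-hosting the full 671B MoE.
PARAMS = 671e9            # total parameters (all experts must be resident)
BYTES_PER_PARAM_FP8 = 1   # assumed FP8 weight storage

weights_gb = PARAMS * BYTES_PER_PARAM_FP8 / 1e9
print(f"FP8 weights alone: ~{weights_gb:.0f} GB")

# KV cache, activations, and framework overhead push a practical budget
# past 800 GB · i.e. a multi-GPU node, not a single accelerator. Note all
# 671B parameters must fit in memory even though only 37B are active per token.
```

MoE sparsity saves compute, not memory: every expert can be routed to, so every expert's weights must be loaded.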
If you are an Investor
  • DeepSeek crashed frontier pricing expectations · reshapes US lab margins
  • Demonstrates Chinese AI labs can match frontier at a fraction of Western capex
  • Export controls on NVIDIA GPUs didn't prevent frontier-class training
If you are a Curious Normie
  • Chinese AI that's almost as smart as GPT-5 but way cheaper
  • Open source · anyone can download and run it
  • Kicked off the "AI is getting cheaper fast" narrative
Gecko's take

DeepSeek V3 + R1 were the single biggest pricing event of 2025. Every API price floor is now set by what DeepSeek charges.

Hangzhou, China. Founded in 2023 by the quant hedge fund High-Flyer.