Overview
The DeepSeek V4 API follows the OpenAI Chat Completions format. Two models are available: deepseek-v4-pro (1.6T parameters, 49B activated) and deepseek-v4-flash (284B parameters, 13B activated). Both support a one-million-token context window and Non-think, Think High, and Think Max reasoning modes.
deepseek-v4-pro
The Pro model is the larger variant in the series. It is positioned for deep reasoning, coding, math, and agentic workflows. OpenRouter pricing: $1.74 / 1M input tokens, $3.48 / 1M output tokens.
Use Pro when accuracy and depth of analysis matter more than speed.
deepseek-v4-flash
The Flash model keeps the one-million-token context window with a more compact size. OpenRouter pricing: $0.14 / 1M input tokens, $0.28 / 1M output tokens. Ideal for high-frequency calls and low-latency pipelines.
Use Flash for summaries, routine writing, and high-volume pipelines.
Integration
The API follows the OpenAI Chat Completions format. Change the base URL and key to migrate from an existing OpenAI client.
Endpoint: https://api.deepseek.com/v1/chat/completions. Standard parameters: messages, model, temperature, max_tokens, stream.
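As a minimal sketch with the OpenAI Python client (the environment variable name DEEPSEEK_API_KEY is an assumption, not part of the official docs):

```python
# Minimal sketch: a Chat Completions call against the DeepSeek V4 endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed variable name
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Summarize SSE in one sentence."}],
    temperature=1.0,  # sampling default recommended by the model card
    max_tokens=512,
)
print(response.choices[0].message.content)
```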
Long context
Both models support 1M tokens of context, allowing very long documents to be sent in a single request.
For Think Max, the model card recommends at least 384K tokens of thinking budget in max_tokens.
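A sketch of a single-request long-document call (report.txt is a hypothetical local file; the client setup repeats the Integration sketch):

```python
# Sketch: sending a long document in one request. With a 1M-token window,
# the document can run to hundreds of thousands of tokens.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed variable name
)

with open("report.txt", encoding="utf-8") as f:  # hypothetical file
    document = f.read()

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "Answer questions about the attached document."},
        {"role": "user", "content": document + "\n\nQuestion: list the key findings."},
    ],
)
print(response.choices[0].message.content)
```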
Parameters
Control reasoning depth via the thinking_mode parameter: non-think, think-high, or think-max.
Non-think prioritizes speed, Think High spends extra reasoning tokens for better accuracy, and Think Max allocates the largest reasoning budget for the hardest problems.
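The docs name thinking_mode but not how OpenAI clients should pass it; one plausible route with the Python client is extra_body (an assumption worth verifying against the official documentation):

```python
# Sketch: selecting a reasoning mode. thinking_mode is not a standard
# OpenAI parameter, so it is passed via extra_body here (assumption).
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed variable name
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    max_tokens=393216,  # >= 384K thinking budget recommended for Think Max
    extra_body={"thinking_mode": "think-max"},
)
print(response.choices[0].message.content)
```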
Streaming
The API supports Server-Sent Events streaming with stream: true for real-time responses.
Compatible with OpenAI Python and Node.js clients by simply changing base_url and api_key.
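A minimal streaming sketch using the standard OpenAI SSE handling (the key variable name is an assumption):

```python
# Sketch: printing tokens as they arrive over Server-Sent Events.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed variable name
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a haiku about long context."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content delta; skip them.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```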
Capabilities
V4 models support function calling and tool calls in the OpenAI format for agentic workflows.
Useful for agents that need to call external APIs, execute code, or orchestrate multi-step tasks.
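A sketch of a tool declaration in the OpenAI function-calling format; get_weather is a hypothetical function used only for illustration:

```python
# Sketch: declaring a tool and reading back the model's tool call.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed variable name
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
message = response.choices[0].message
if message.tool_calls:  # the model may also answer directly
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
```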
Resources
Model weights and source code are available on Hugging Face under the MIT license.
Community repository: Rooc/DeepSeek-V4-Pro on Hugging Face and GitHub for integration scripts.
Why use the DeepSeek V4 API
DeepSeek V4 combines a one-million-token context window, direct OpenAI compatibility, and competitive pricing.
Flash at $0.14 / 1M input tokens is among the cheapest options for a model with a 1M-token context window: processing 100M input tokens costs about $14 on Flash versus $174 on Pro. Pro at $1.74 / 1M input tokens remains competitive for complex tasks.
Change base_url to https://api.deepseek.com/v1 and replace your API key. The rest of the code stays identical for Chat Completions calls.
Non-think, Think High, and Think Max let you trade latency for quality depending on task complexity.
1M tokens allows sending entire contracts, large codebases, or long research notes in a single API request.
Quick integration
The API is OpenAI-compatible, so integration comes down to three short steps; a combined code sketch follows the list.
Step 1: Get an API key
Step 2: Configure the client
Step 3: Choose the model
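Putting the three steps together in a minimal sketch (the model-selection rule and variable names are illustrative, not official):

```python
# Sketch of the three steps: key, client configuration, model choice.
import os
from openai import OpenAI

# Step 1: the key generated at platform.deepseek.com,
# exported here as DEEPSEEK_API_KEY (assumed variable name).
api_key = os.environ["DEEPSEEK_API_KEY"]

# Step 2: configure an OpenAI-compatible client.
client = OpenAI(base_url="https://api.deepseek.com/v1", api_key=api_key)

# Step 3: choose the model; an illustrative rule following the
# Pro-for-depth / Flash-for-volume guidance above.
def pick_model(task_is_complex: bool) -> str:
    return "deepseek-v4-pro" if task_is_complex else "deepseek-v4-flash"

print(pick_model(False))  # -> deepseek-v4-flash
```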
Benchmarks
The official model card presents results on MMLU-Pro, HumanEval, GSM8K, LongBench-V2, SWE Verified, and MCPAtlas. Pro is positioned for tasks where accuracy and depth of reasoning matter most.

Model comparison
Pro (1.6T parameters, 49B activated) is optimized for deep reasoning. Flash (284B parameters, 13B activated) is more compact and about 12x cheaper on input. Both support 1M token context.

Long context
The NIAH (Needle In A Haystack) test measures the model's ability to retrieve specific information from a very long context. DeepSeek V4 maintains strong performance across the full 1M token window.

Official resources
All the resources you need to integrate and evaluate DeepSeek V4 in your projects.
The complete DeepSeek API documentation covers authentication, endpoints, parameters, streaming, function calling, and code examples in Python and Node.js.
The official model card details the architecture (hybrid attention, manifold-constrained hyper-connections, Muon optimizer), benchmarks (MMLU-Pro, HumanEval, LongBench-V2, SWE Verified, MCPAtlas), and recommended sampling parameters.
The GitHub repository contains integration scripts, code examples, encoding notes, and test cases for DeepSeek V4 Pro.
FAQ
Answers to the most common questions about integration and usage.
How do I get a DeepSeek V4 API key?
Create an account at platform.deepseek.com, then generate an API key in settings. Add credits to activate API calls. The key works like a standard OpenAI API key.
Should I use Pro or Flash?
Flash is about 12x cheaper than Pro on input ($0.14 vs $1.74 / 1M tokens). For high-volume pipelines or simple tasks, Flash is the cost-effective choice. Pro is justified for complex tasks where quality matters most.
Can I reuse my existing OpenAI code?
Yes. Change base_url to https://api.deepseek.com/v1 and replace api_key. The rest of the code (messages, temperature, max_tokens, stream, tools) stays identical.
How do I select a reasoning mode?
Pass the thinking_mode parameter with the value non-think, think-high, or think-max. For Think Max, plan for at least 384K tokens in max_tokens according to the official model card.
Are the model weights available?
Weights are available on Hugging Face (Rooc/DeepSeek-V4-Pro) under the MIT license. The model card includes encoding instructions and recommended sampling parameters (temperature 1.0, top_p 1.0).
Which benchmarks are reported?
The official model card includes MMLU-Pro, HumanEval, GSM8K, LongBench-V2, LiveCodeBench, SWE Verified, and MCPAtlas. These evaluations cover world knowledge, coding, math, long context, and agentic tasks.
Get started
Create an account at platform.deepseek.com to access the API. Check the official documentation for code examples and detailed parameters.