Overview
The DeepSeek V4 API follows the OpenAI Chat Completions format. Two models are available: deepseek-v4-pro (1.6T parameters, 49B activated) and deepseek-v4-flash (284B parameters, 13B activated). Both support a one-million-token context window and Non-think, Think High, and Think Max reasoning modes.
deepseek-v4-pro
The Pro model is the larger variant in the series. It is positioned for deep reasoning, coding, math, and agentic workflows. OpenRouter pricing: $1.74 / 1M input tokens, $3.48 / 1M output tokens.
Use Pro when accuracy and depth of analysis matter more than speed.
deepseek-v4-flash
The Flash model keeps the one-million-token context window with a more compact size. OpenRouter pricing: $0.14 / 1M input tokens, $0.28 / 1M output tokens. Ideal for high-frequency calls and low-latency pipelines.
Use Flash for summaries, routine writing, and high-volume pipelines.
Integration
The API follows the OpenAI Chat Completions format. Change the base URL and key to migrate from an existing OpenAI client.
Endpoint: https://api.deepseek.com/v1/chat/completions. Standard parameters: messages, model, temperature, max_tokens, stream.
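As a minimal sketch with the OpenAI Python client (the environment variable name DEEPSEEK_API_KEY is an assumption, not part of the official docs):

```python
# Minimal sketch: a Chat Completions call against the DeepSeek V4 endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed variable name
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Summarize SSE in one sentence."}],
    temperature=1.0,  # sampling default recommended by the model card
    max_tokens=512,
)
print(response.choices[0].message.content)
```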
Long context
Both models support 1M tokens of context, allowing very long documents to be sent in a single request.
For Think Max, the model card recommends at least 384K tokens of thinking budget in max_tokens.
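A sketch of a single-request long-document call (report.txt is a hypothetical local file; the client setup repeats the Integration sketch):

```python
# Sketch: sending a long document in one request. With a 1M-token window,
# the document can run to hundreds of thousands of tokens.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed variable name
)

with open("report.txt", encoding="utf-8") as f:  # hypothetical file
    document = f.read()

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "Answer questions about the attached document."},
        {"role": "user", "content": document + "\n\nQuestion: list the key findings."},
    ],
)
print(response.choices[0].message.content)
```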
Parameters
Control reasoning depth via the thinking_mode parameter: non-think, think-high, or think-max.
Non-think prioritizes speed, Think High spends extra reasoning tokens for better accuracy, and Think Max allocates the largest reasoning budget for the hardest problems.
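The docs name thinking_mode but not how OpenAI clients should pass it; one plausible route with the Python client is extra_body (an assumption worth verifying against the official documentation):

```python
# Sketch: selecting a reasoning mode. thinking_mode is not a standard
# OpenAI parameter, so it is passed via extra_body here (assumption).
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed variable name
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    max_tokens=393216,  # >= 384K thinking budget recommended for Think Max
    extra_body={"thinking_mode": "think-max"},
)
print(response.choices[0].message.content)
```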
Streaming
The API supports Server-Sent Events streaming with stream: true for real-time responses.
Compatible with OpenAI Python and Node.js clients by simply changing base_url and api_key.
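A minimal streaming sketch using the standard OpenAI SSE handling (the key variable name is an assumption):

```python
# Sketch: printing tokens as they arrive over Server-Sent Events.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed variable name
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a haiku about long context."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content delta; skip them.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```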
Capabilities
V4 models support function calling and tool calls in the OpenAI format for agentic workflows.
Useful for agents that need to call external APIs, execute code, or orchestrate multi-step tasks.
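A sketch of a tool declaration in the OpenAI function-calling format; get_weather is a hypothetical function used only for illustration:

```python
# Sketch: declaring a tool and reading back the model's tool call.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed variable name
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
message = response.choices[0].message
if message.tool_calls:  # the model may also answer directly
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
```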
Resources
Model weights and source code are available on Hugging Face under the MIT license.
Community repository: Rooc/DeepSeek-V4-Pro on Hugging Face and GitHub for integration scripts.
Why use the DeepSeek V4 API
DeepSeek V4 combines a one-million-token context window, direct OpenAI compatibility, and competitive pricing.
Flash at $0.14 / 1M input tokens is among the cheapest options for a model with a 1M-token context window: processing 100M input tokens costs about $14 on Flash versus $174 on Pro. Pro at $1.74 / 1M input tokens remains competitive for complex tasks.
Change base_url to https://api.deepseek.com/v1 and replace your API key. The rest of the code stays identical for Chat Completions calls.
Non-think, Think High, and Think Max let you trade latency for quality depending on task complexity.
1M tokens allows sending entire contracts, large codebases, or long research notes in a single API request.
Quick integration
The API is OpenAI-compatible, so integration comes down to three short steps; a combined code sketch follows the list.
Step 1: Get an API key
Step 2: Configure the client
Step 3: Choose the model
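Putting the three steps together in a minimal sketch (the model-selection rule and variable names are illustrative, not official):

```python
# Sketch of the three steps: key, client configuration, model choice.
import os
from openai import OpenAI

# Step 1: the key generated at platform.deepseek.com,
# exported here as DEEPSEEK_API_KEY (assumed variable name).
api_key = os.environ["DEEPSEEK_API_KEY"]

# Step 2: configure an OpenAI-compatible client.
client = OpenAI(base_url="https://api.deepseek.com/v1", api_key=api_key)

# Step 3: choose the model; an illustrative rule following the
# Pro-for-depth / Flash-for-volume guidance above.
def pick_model(task_is_complex: bool) -> str:
    return "deepseek-v4-pro" if task_is_complex else "deepseek-v4-flash"

print(pick_model(False))  # -> deepseek-v4-flash
```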
Benchmarks
The official model card presents results on MMLU-Pro, HumanEval, GSM8K, LongBench-V2, SWE Verified, and MCPAtlas. Pro is positioned for tasks where accuracy and depth of reasoning matter most.

Model comparison
Pro (1.6T parameters, 49B activated) is optimized for deep reasoning. Flash (284B parameters, 13B activated) is more compact and about 12x cheaper on input. Both support 1M token context.

Long context
The NIAH (Needle In A Haystack) test measures the model's ability to retrieve specific information from a very long context. DeepSeek V4 maintains strong performance across the full 1M token window.

Official resources
All the resources you need to integrate and evaluate DeepSeek V4 in your projects.
The complete DeepSeek API documentation covers authentication, endpoints, parameters, streaming, function calling, and code examples in Python and Node.js.
The official model card details the architecture (hybrid attention, manifold-constrained hyper-connections, Muon optimizer), benchmarks (MMLU-Pro, HumanEval, LongBench-V2, SWE Verified, MCPAtlas), and recommended sampling parameters.
The GitHub repository contains integration scripts, code examples, encoding notes, and test cases for DeepSeek V4 Pro.
FAQ
Answers to the most common questions about integration and usage.
How do I get a DeepSeek V4 API key?
Create an account at platform.deepseek.com, then generate an API key in settings. Add credits to activate API calls. The key works like a standard OpenAI API key.
Should I use Pro or Flash?
Flash is about 12x cheaper than Pro on input ($0.14 vs $1.74 / 1M tokens). For high-volume pipelines or simple tasks, Flash is the cost-effective choice. Pro is justified for complex tasks where quality matters most.
Can I reuse my existing OpenAI code?
Yes. Change base_url to https://api.deepseek.com/v1 and replace api_key. The rest of the code (messages, temperature, max_tokens, stream, tools) stays identical.
How do I select a reasoning mode?
Pass the thinking_mode parameter with the value non-think, think-high, or think-max. For Think Max, plan for at least 384K tokens in max_tokens according to the official model card.
Are the model weights available?
Weights are available on Hugging Face (Rooc/DeepSeek-V4-Pro) under the MIT license. The model card includes encoding instructions and recommended sampling parameters (temperature 1.0, top_p 1.0).
Which benchmarks are reported?
The official model card includes MMLU-Pro, HumanEval, GSM8K, LongBench-V2, LiveCodeBench, SWE Verified, and MCPAtlas. These evaluations cover world knowledge, coding, math, long context, and agentic tasks.
Get started
Create an account at platform.deepseek.com to access the API. Check the official documentation for code examples and detailed parameters.