A laboratory in China has launched one of the most formidable open AI models to date. The model, DeepSeek V3, was developed by the AI company DeepSeek and released on Wednesday under a permissive license that lets developers download and adapt it for a wide range of uses, including commercial applications.
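For readers who want to experiment, a minimal sketch of pulling the weights through Hugging Face's transformers library might look like the following. The repo id "deepseek-ai/DeepSeek-V3" and the generation settings are assumptions for illustration, and actually running this requires a serious multi-GPU setup:

```python
# Minimal sketch: load DeepSeek V3 from Hugging Face and generate text.
# The repo id below is an assumption based on the release; the checkpoint
# is hundreds of gigabytes and needs a multi-GPU cluster to run.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "deepseek-ai/DeepSeek-V3"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    device_map="auto",  # shard the model across all available GPUs
)

inputs = tokenizer("Write a short email declining a meeting.", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```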
DeepSeek V3 handles a wide range of text-based tasks, such as translation, essay writing, coding, and drafting emails from descriptive prompts.
According to DeepSeek's internal benchmarks, DeepSeek V3 outperforms both openly available models and proprietary AI systems that can only be accessed through an API. In a series of coding challenges hosted on Codeforces, a competitive programming platform, DeepSeek V3 outshines other models, including OpenAI's GPT-4o, Meta's Llama 3.1 405B, and Alibaba's Qwen 2.5 72B.
Moreover, DeepSeek V3 reigns supreme on Aider Polyglot, a test designed to evaluate whether a model can write new code that integrates seamlessly into existing codebases.
DeepSeek says DeepSeek V3 was trained on an enormous dataset of 14.8 trillion tokens. In data science, tokens represent fragments of raw data, with one million tokens equivalent to roughly 750,000 words.
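To make the token-to-word ratio concrete, here is a small illustrative sketch using an off-the-shelf tokenizer (GPT-2's, chosen only because it is freely available; DeepSeek V3's own tokenizer will produce different counts):

```python
# Rough illustration of tokens vs. words. Tokens are often sub-word
# pieces, so a text usually has more tokens than words.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Tokens are fragments of raw text, often smaller than whole words."
tokens = tokenizer.tokenize(text)
print(len(text.split()), "words ->", len(tokens), "tokens")
# By DeepSeek's stated ratio, 1 million tokens correspond to about
# 750,000 words, i.e. roughly 0.75 words per token.
```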
DeepSeek V3 is equally impressive in scale: it boasts 671 billion parameters, or 685 billion as listed on the AI development platform Hugging Face. Parameters are the internal variables a model uses to make predictions or decisions. That makes it roughly 1.6 times larger than Llama 3.1 405B, which has around 405 billion parameters.
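The size comparison is simple arithmetic:

```python
# Back-of-the-envelope check of the size comparison in the text.
deepseek_v3 = 671e9  # parameters, per DeepSeek's report
llama_31_405b = 405e9  # parameters, Llama 3.1 405B
print(f"DeepSeek V3 is {deepseek_v3 / llama_31_405b:.2f}x larger")  # ~1.66x
```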
An unoptimized version of DeepSeek V3 would need a cluster of high-performance GPUs to answer questions at acceptable speeds. Even if it is not the most practical model to run, DeepSeek V3 signifies a major milestone.