
DeepSeek marks a potential shift in the AI competitive landscape

Portfolio Manager Richard Clode discusses the market’s current concerns around DeepSeek's most recent LLM developments and what impact this may have on investors.

Richard Clode, CFA

Portfolio Manager


27 Jan 2025
6 minute read

Key takeaways:

  • DeepSeek’s innovative V3 LLM and reinforcement learning-based reasoning model R1 appear to suggest that the company has made advancements in offering more efficient and cost-effective AI solutions.
  • This is driving a reassessment of AI investment strategies, focusing attention on the sustainability of AI capital expenditures, the AI competitive landscape, and the monetisation of AI.
  • A more selective approach to identifying AI capex beneficiaries, as well as looking ahead to the next phases of AI investment opportunity, is crucial as this new tech wave develops.

What has DeepSeek achieved in terms of LLM innovation?

DeepSeek, the Chinese AI startup and developer of open-source large language models (LLMs), launched its third-generation V3 LLM in December 2024. DeepSeek-V3 is a mixture of experts (MoE) model that is benchmarking well against the best LLMs developed in the West, and this month the company followed with DeepSeek-R1, a reinforcement learning-based reasoning model that benchmarks well against OpenAI’s o1 generative pre-trained transformer (GPT). V3’s MoE architecture combines several smaller models working together, with 671 billion parameters in total but only 37 billion active parameters for each token at any given moment during inferencing. V3 adds further innovations, such as multi-head latent attention (MHLA) to reduce cache and memory size/usage, mixed-precision computation in FP8, and a re-architected post-training phase. An MoE model always looks more efficient because only a portion of the total parameters is active at any given point during token inferencing, so that is not overly surprising, although V3 looks even more efficient still: about 10x versus peers and 3-7x given the other innovations. DeepSeek-R1 is also claimed, uniquely, to have done away with supervised fine-tuning. So there does appear to be some genuine innovation here, even if many of the headline improvements come from more standard techniques, while there is a wider debate over how much of the work DeepSeek has done itself and how much comes from leveraging open-source third-party LLMs.
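To illustrate the MoE idea described above, here is a minimal, hypothetical sketch of sparse expert routing in PyTorch. The expert count, dimensions and top-k value are illustrative only and do not reflect DeepSeek-V3’s actual implementation; the point is simply that only a small subset of the total parameters is active for each token.

```python
# Toy mixture-of-experts (MoE) layer: a router activates only top_k of n_experts
# per token, so only a fraction of total parameters is used during inference.
# Sizes are illustrative, not DeepSeek-V3's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # gating network scores each expert
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )
        self.top_k = top_k

    def forward(self, x):                             # x: (num_tokens, d_model)
        scores = self.router(x)                       # (num_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # normalise the selected experts' weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                 # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

tokens = torch.randn(4, 64)                           # 4 tokens, hidden size 64
print(TinyMoE()(tokens).shape)                        # torch.Size([4, 64])
```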

3 key reasons why the markets are concerned with DeepSeek

1. DeepSeek appears to have significantly lower training costs

DeepSeek claims to have trained V3 on only 2,048 NVIDIA H800 GPUs over roughly two months, which, at US$2 per GPU hour, explains the announced headline total cost of around US$5 million. That is a fraction of what Western hyperscalers are throwing at their LLM training (e.g. it is around 9% of the compute used for Meta’s LLaMA 3.1 405B model).
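As a rough back-of-the-envelope check of how those inputs produce that headline figure (using the GPU count, two-month window and US$2 per GPU hour rate cited above, with 60 days assumed as an approximation of “two months”):

```python
# Back-of-the-envelope check of the headline training cost claim.
# Inputs are the figures cited above; the 60-day figure is an assumed
# approximation of "two months", not a number disclosed by DeepSeek.
gpus = 2048                  # NVIDIA H800 GPUs
days = 60                    # ~two months of training
rate_per_gpu_hour = 2.0      # US$ per GPU hour

gpu_hours = gpus * days * 24
cost = gpu_hours * rate_per_gpu_hour
print(f"{gpu_hours:,.0f} GPU hours -> ~US${cost / 1e6:.1f} million")
# ~2,949,120 GPU hours -> ~US$5.9 million, in the ballpark of the headline figure
```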

2. China can still compete despite US restrictions

DeepSeek shows that a Chinese company can compete with the best-of-breed US AI companies, despite the current restrictions on Chinese access to advanced US semiconductor technology. This evokes memories of a generation of Russian coders who, given restrictions on PC time in post-Soviet Russia, invented ingenious ways to code. Has the same thing happened in China, where semiconductor restrictions have forced greater LLM architecture innovation, versus the US, which has simply relied on throwing the compute kitchen sink at the problem?

3. AI monetisation

DeepSeek is charging significantly less than OpenAI to use its models (about 20-40x lower), which plays into the AI monetisation concern given the extraordinary amounts of capex deployed in the West.
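To illustrate why a pricing gap of that magnitude matters for monetisation, the sketch below compares hypothetical API prices roughly 27x apart (chosen only to fall within the 20-40x range cited above; these are not actual DeepSeek or OpenAI list prices, and the monthly token volume is invented for illustration):

```python
# Illustrative comparison of inference revenue/cost at two hypothetical
# price points ~27x apart; NOT actual DeepSeek or OpenAI list prices.
incumbent_price = 15.00          # hypothetical US$ per million tokens
challenger_price = 0.55          # hypothetical US$ per million tokens
monthly_tokens = 10_000_000_000  # assumed 10 billion tokens served per month

for name, price in [("incumbent", incumbent_price), ("challenger", challenger_price)]:
    cost = monthly_tokens / 1e6 * price
    print(f"{name}: US${cost:,.0f} per month")
print(f"price ratio: ~{incumbent_price / challenger_price:.0f}x")
```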

A notable AI force

The global AI ecosystem is taking note of DeepSeek’s developments. Despite being founded only two years ago (2023), DeepSeek benefits from the pedigree and backing of the team at quantitative fund High-Flyer Capital Management, as well as the success and innovation of its prior-generation models. While V3 was launched in December and R1 earlier this month, the market is only reacting now because R1’s reasoning capabilities are being viewed as cutting edge. Moreover, over the past weekend DeepSeek became the top free app on Apple’s App Store, overtaking ChatGPT. Silicon Valley investor Marc Andreessen posted that DeepSeek is “one of the most amazing and impressive breakthroughs I’ve ever seen,” which is high praise from a credible industry veteran. Comments like that have heightened the market’s concerns over the sustainability of AI capex and associated companies such as NVIDIA.

What do we make of all this?

  • New technology waves require innovation

Any new technology wave requires innovation to drive down the cost curve over time to enable mass adoption. We are witnessing multiple avenues of AI innovation to address scaling issues with training LLMs as well as more efficient inferencing. DeepSeek appears to bring some genuine innovation to the architecture of general purpose and reasoning models. Innovation and the driving down of costs are key to unlocking AI and enabling mass adoption longer term.

  • Distillation

DeepSeek’s model leverages a technique called distillation, which is being pursued more broadly in the AI industry. Distillation refers to equipping smaller models with the abilities of larger ones, by transferring the learnings of the larger, teacher model into the smaller, student one. However, it is important to note DeepSeek’s distillation techniques are reliant on the work of others. Exactly how reliant is a key question the market is grappling with currently.
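For context on what distillation means in practice, here is a minimal, generic sketch of soft-label knowledge distillation in PyTorch, the standard teacher-student technique rather than DeepSeek’s specific recipe; the model sizes, temperature and data are illustrative assumptions.

```python
# Generic knowledge distillation: a small "student" is trained to match the
# softened output distribution of a frozen, larger "teacher".
# Illustrates the standard technique only, not DeepSeek's actual recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))  # larger model
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))    # smaller model
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature used to soften the teacher's distribution

x = torch.randn(64, 32)              # one batch of illustrative inputs
with torch.no_grad():
    teacher_logits = teacher(x)      # teacher is frozen during distillation

optimizer.zero_grad()
student_logits = student(x)
loss = F.kl_div(                     # KL divergence between softened distributions
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T * T)
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")
```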

  • Take the capex number with a pinch of salt

Related to the above, the capex comparisons being made are apples to oranges. The US$5 million cited relates to just one training run, ignoring any prior training runs and the training of the larger teacher models, whether at DeepSeek or in the third-party open-source LLMs they were built on.

  • Open source innovation

As AI luminary Yann LeCun has noted, this is a victory for the open-source model of driving community innovation, with DeepSeek leveraging Meta’s Llama and Alibaba’s Qwen open-source models. Again, this is positive for the longer-term development of AI, driving and proliferating innovation. However, given the current state of geopolitics, one would probably expect greater US government scrutiny of other countries accessing state-of-the-art AI LLMs from the US.

  • LLMs commoditising?

It has long been our belief that monetising LLMs in the longer term will be challenging given the volume of competition, including from open source developers and competitors looking to monetise in alternative ways. The DeepSeek announcement only brings greater scrutiny to the return on investment (ROI) of the huge capex general purpose foundational model developers are spending.

Investment implications

The concerns around DeepSeek play into the growing debate on AI scaling challenges as well as the ROI of AI capex spend, and ultimately, concerns around the sustainability of AI capex beneficiary earnings and the prices the market is willing to pay. We continue to expect ongoing strong spending on AI capex as seen recently from announcements by Meta and the Stargate AI project. But we also think we need to be more selective in those AI capex beneficiaries, as well as think about the next phases of AI investment opportunity as this new tech wave develops.

We characterise infrastructure as the first phase of a new tech wave, followed by platforms, and then software, applications and services. We are approaching the pivot to the platform phase, led by the cloud, but still see longer-term investment opportunities in AI infrastructure as well. The market has rapidly shifted from worrying that AI capex is too high to worrying that AI capex is going to collapse. Both cannot be true simultaneously, and the truth likely lies somewhere in between. Ultimately, we think these developments are positive for the long-term health and development of AI. We continue to identify selective AI infrastructure beneficiaries and build our exposure to platforms that will benefit from more efficient AI compute, model training and inferencing.

Source for DeepSeek information: https://api-docs.deepseek.com/news/news250120

AI token: the smallest unit of data used by a language model to process and generate text.

Capex/capital expenditure: company spending to acquire or upgrade physical assets such as buildings, machinery, equipment, technology etc. to maintain or improve operations and foster future growth.

GPT or Generative Pre-trained Transformers: a family of neural network models that use the transformer architecture and power generative AI applications such as ChatGPT.

GPU: a graphics processing unit performs the complex mathematical and geometric calculations necessary for graphics rendering; GPUs are also used in gaming, content creation and machine learning.

Inference or inferencing: refers to artificial intelligence processing. Whereas machine learning and deep learning refer to training neural networks, AI inference applies knowledge from a trained neural network model and uses it to infer a result.

Hyperscalers: companies that provide infrastructure for cloud, networking, and internet services at scale. Examples include Google Cloud, Microsoft Azure, Facebook Infrastructure, Alibaba Cloud, and Amazon Web Services.

LLM (large language model): a specialised type of artificial intelligence that has been trained on vast amounts of text to understand existing content and generate original content.

MoE (Mixture of Experts Model): a machine learning approach that divides an AI model into separate sub-networks/experts to jointly perform a task. This enables significant cost reduction and faster performance for inferencing because specific experts are used for a task, instead of activating the entire neural network for every task.

Open source software: code that is designed to be publicly accessible, in terms of viewing, modifying and distributing.

Reinforcement Learning (RL): a technique where the AI learns by interacting with its environment and receiving feedback in the form of rewards or penalties. This allows the AI to adapt and evolve, as well as improve its logical and problem-solving skills.

ROI (return on investment): a financial ratio used to measure the performance of an investment, calculated by dividing net profit/loss by the initial cost of the investment.