Portfolio Manager Richard Clode discusses the key takeaways from NVIDIA’s annual GTC technology conference, an event that highlighted the rapid progression of the AI revolution, including the opportunities created by agentic AI.
Tokens are the ‘new oil’
Data used to be the ‘new oil’, but in a generative AI world where synthetic data creation is limitless, tokens are now the new resource of power. The transformer model that ushered in the generative AI era is built on tokenisation. Tokens are the units of data that AI models process during training and inference, enabling prediction, generation and reasoning. Hence tokens equal intelligence, and will ultimately drive greater revenues and profits.
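As an illustration of what a token actually is, the minimal Python sketch below splits a sentence into tokens using OpenAI’s open-source tiktoken library; the exact token boundaries and counts are tokeniser-specific and purely illustrative.

```python
# Minimal tokenisation sketch using the open-source tiktoken library
# (pip install tiktoken). Token boundaries and counts vary by model
# and tokeniser; this is purely illustrative.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

text = "Tokens are the new oil."
token_ids = encoding.encode(text)                    # text -> integer token IDs
tokens = [encoding.decode([t]) for t in token_ids]   # IDs -> the sub-word pieces

print(token_ids)  # a short list of integers
print(tokens)     # the sub-word units the model actually processes
```

Each of those integer IDs is a unit the model reads or produces, which is why token volumes map so directly onto compute demand and, ultimately, revenue.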
NVIDIA CEO Jensen Huang has long talked about ‘AI factories’, i.e. AI datacentres creating tokens, and therefore intelligence, to better design products, run businesses more efficiently and improve quality of service to customers. He envisages a future where every company will have two types of factories: manufacturing and mathematics. NVIDIA, the leader in advanced AI chips, is only now designing future chips with AI-accelerated Electronic Design Automation (EDA) software tools, as only recently has that software been optimised to run on NVIDIA’s CUDA programming language. The company also announced an all-encompassing partnership with GM (General Motors) to use AI to help GM design cars, improve efficiency and enable autonomous driving.
Agentic AI – the next wave of AI
We remain early in the innovation curve of generative AI. The new scaling law Jensen Huang has been highlighting is test-time scaling, embodied in ‘long think’ reasoning models, which take a longer, more deliberate thought process to arrive at a more accurate response rather than prioritising speed. Recently introduced to the market, these reasoning models enable agentic AI: AI that has agency, in that it understands the context of the problem it has been asked to solve. The breakthrough is that these models can now reason and plan a course of action to solve a problem in a multi-modal way. That could involve reading a website article or watching a video, simultaneously pursuing several potential paths to a solution, then sense-checking the answers for consistency or feeding them back into the question. This solves the challenges that ChatGPT and other one-shot inferencing models had with answering even simple questions, let alone more complex ones. Agentic AI is greater intelligence, enabling the next wave from co-pilots to AI agents that can complete tasks without supervision, with a high degree of accuracy and consistency. It significantly expands the addressable market for AI and opens up new physical AI applications such as humanoids and autonomous driving, where real-world forces such as gravity, friction and ’cause and effect’ come into play.
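For a rough sense of how ‘several potential paths, then sense-check’ works in practice, the hypothetical sketch below implements self-consistency sampling: ask the model the same question several times with sampling enabled, then majority-vote the answers. The ask_model function is a placeholder, not a real API; here it simulates a model that is right most of the time.

```python
# Hypothetical sketch of reasoning via multiple sampled paths plus a
# consistency check (self-consistency). ask_model() is a simulated
# stand-in for a real LLM API call.
import random
from collections import Counter

def ask_model(question: str) -> str:
    """Simulated LLM call with sampling: right ~80% of the time."""
    return random.choices(["42", "41"], weights=[0.8, 0.2])[0]

def self_consistent_answer(question: str, n_paths: int = 5) -> str:
    # Each extra path costs more test-time compute and more tokens,
    # but majority-voting the final answers improves reliability:
    # the test-time scaling trade-off in miniature.
    answers = [ask_model(question) for _ in range(n_paths)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer("What is 6 x 7?"))
```

Production agentic systems are far more elaborate (tools, planning, multi-modal inputs), but the economics are the same: accuracy is bought with more tokens and more compute.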
Clarifying the DeepSeek misunderstanding
Jensen was at pains to make the point that the market had completely misunderstood the implications of the launch of DeepSeek’s R1 model earlier this year. Comparing a response from DeepSeek with one from a standard, non-reasoning model from Meta, DeepSeek’s response was more accurate but required 20x as many tokens and 150x the compute. Far from signalling lower compute requirements going forward, DeepSeek was a ‘coming out’ party for reasoning models, which open up a new scaling vector for AI compute requirements.
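To make the arithmetic concrete, the back-of-the-envelope sketch below uses the ratios quoted above; they are illustrative figures from the keynote comparison, not measured benchmarks.

```python
# Back-of-the-envelope arithmetic using the ratios quoted above
# (illustrative, not measured benchmarks).
token_ratio = 20     # the reasoning model emitted ~20x the tokens...
compute_ratio = 150  # ...and consumed ~150x the total compute

# Implied cost per token: reasoning models don't just emit more
# tokens, each token is also more expensive to produce.
per_token_ratio = compute_ratio / token_ratio
print(f"~{per_token_ratio:.1f}x compute per token")  # ~7.5x
```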
NVIDIA also laid out its roadmap through 2027, culminating in its next AI superchip, Rubin Ultra, which will have over 400x the performance of Hopper. That matters because current US semiconductor export restrictions (aimed at limiting China’s access to advanced semiconductors and the equipment needed to produce them) put an absolute ceiling on AI compute at a degraded Hopper level. Over the next few years, new Chinese AI models will be constrained by that compute ceiling, while AI models elsewhere are likely to be trained on exponentially higher-performance AI infrastructure. That suggests we are likely at a high point in China’s AI capabilities relative to the rest of the world.
Full stack solutions will solve the AI power challenge
NVIDIA has never been just a semiconductor company: a significant amount of the performance gains and power savings it has delivered have been a function of software and networking innovation. Jensen has always talked about generative AI being a full stack problem that requires a full stack solution. At its GTC event, the company laid out new innovations such as co-packaged optics and its Dynamo virtualisation software. Optical networking in AI training clusters is a major power drain, with six transceivers per GPU each drawing 30 watts of power, so as training clusters scale, so does optical power consumption. By packaging optical components into the switches themselves, NVIDIA claims it can deliver 3.5x greater power efficiency using 4x fewer lasers. Dynamo is a virtualisation software layer that optimises inferencing workloads by virtualising the GPUs and slicing the workloads across them, driving 30x the inferencing performance.
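A rough sketch of that optical power arithmetic, using the per-GPU figures quoted above and a hypothetical cluster size:

```python
# Rough optical power arithmetic from the figures quoted above:
# 6 pluggable transceivers per GPU at ~30 W each, versus a
# co-packaged optics design claimed to be 3.5x more power efficient.
TRANSCEIVERS_PER_GPU = 6
WATTS_PER_TRANSCEIVER = 30
GPUS = 100_000  # hypothetical large training cluster

pluggable_watts = GPUS * TRANSCEIVERS_PER_GPU * WATTS_PER_TRANSCEIVER
cpo_watts = pluggable_watts / 3.5  # NVIDIA's claimed efficiency gain

print(f"Pluggable optics: {pluggable_watts / 1e6:.0f} MW")  # 18 MW
print(f"Co-packaged optics: {cpo_watts / 1e6:.1f} MW")      # ~5.1 MW
```

At that scale the saving is measured in megawatts, which is why NVIDIA treats networking as part of the power problem rather than an afterthought.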
We continue to believe the power challenges involved in advancing and running AI will be solved by technology innovation. Therefore, more compelling investment opportunities can be found across the technology stack rather than in utilities and power infrastructure.
Glossary

Agentic AI: uses sophisticated reasoning and iterative planning to autonomously solve complex, multi-step problems. Vast amounts of data from multiple data sources and third-party applications are used to independently analyse challenges, develop strategies and execute tasks.
CUDA: a programming language developed by NVIDIA that uses Graphics Processing Units (GPUs). It allows computations to be performed in parallel, delivering significant gains in speed. CUDA allows NVIDIA GPUs to perform common computing tasks, such as processing matrices and other linear algebra operations, rather than simply performing graphical calculations.
DeepSeek: a Chinese AI startup and developer of open-source advanced large language models (LLMs) such as DeepSeek-V3 – a key rival, and less expensive option compared to OpenAI’s ChatGPT and Google’s Gemini.
Electronic Design Automation (EDA): a specific category of hardware, software, services and processes that use computer-aided design to develop complex electronic systems like printed circuit boards, integrated circuits and microprocessors. The dense packing of elements onto a circuit board or microprocessor requires highly complex designs. EDA software uses automated, standardised processes that facilitate rapid development, while minimising bugs, defects and other design errors.
Full stack solution: refers to a comprehensive approach to software development that covers all layers of an application or project. This includes both the front-end and back-end components, as well as any other layers necessary for the application to function fully.
GPU: a graphics processing unit performs complex mathematical and geometric calculations that are necessary for graphics rendering and are also used in gaming, content creation and machine learning.
Inferencing: refers to artificial intelligence processing. Whereas machine learning and deep learning refer to training neural networks, AI inference applies knowledge from a trained neural network model and uses it to infer a result.
LLM (large language model): a specialised type of artificial intelligence that has been trained on vast amounts of text to understand existing content and generate original content.
Long think reasoning: a deliberate and extended process of considering information and potential outcomes, analysing multiple perspectives, considering long-term implications and carefully weighing various factors before reaching a conclusion.
One-shot inferencing: refers to the method where a model is provided with a single example or prompt to perform a task. It is reliant on a single, well-crafted prompt to achieve the desired output.
Test-time scaling: a language modelling approach that uses extra test-time compute to improve performance.
Token: AI tokens are the fundamental building blocks of input and output that Large Language Models (LLMs) use. These units of data are processed by AI models during training and inference, enabling prediction, generation and reasoning.