Our Thesis
Advancements in AI are fundamentally driven by progress in its three core pillars: data, compute, and algorithms. Innovations in any of these areas have the potential to significantly influence the development and scalability of AI systems. In the sections below, we explore key trends and technologies shaping these pillars and their transformative impact on the future of AI.
The future of LLM inference is in the cloud
Leading researchers believe that the techniques powering OpenAI's recently released o1 model could reshape the AI arms race and redefine the resources AI companies rely on, from energy consumption to the ever-growing demand for powerful GPUs.
The key innovation driving this shift is "test-time compute," a technique that improves model output during the inference phase, the stage at which models are actively used. Rather than immediately committing to a single answer, this approach lets a model generate and evaluate multiple candidate responses in real time and ultimately return the best one.
This shift will move us from a world of massive pre-training clusters toward inference clouds: distributed, cloud-based servers dedicated to running inference.
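As a rough illustration of the idea, the sketch below implements test-time compute in its simplest form, best-of-N sampling: draw several candidate answers at inference time, score each with a verifier, and return the highest-scoring one. The `generate` and `score` functions are hypothetical placeholders standing in for a model's sampling call and a reward/verifier model; this is not OpenAI's actual o1 method.

```python
# Minimal sketch of "test-time compute" as best-of-N sampling.
# More inference-time compute (a larger n) buys a better expected answer.
import random

def generate(prompt: str, temperature: float = 0.8) -> str:
    # Placeholder for a single sampled completion from an LLM.
    return f"candidate answer ({random.random():.3f}) to: {prompt}"

def score(prompt: str, answer: str) -> float:
    # Placeholder for a verifier / reward model that rates an answer.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: score(prompt, ans))

print(best_of_n("What is 17 * 24?"))
```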
Scaling pre-trained models now depends on access to private data
Until recently, the primary bottleneck for scaling AI systems was procuring GPU chips, whose supply is now rapidly increasing. Today, the most critical input for scaling AI is the availability of new training datasets. The most powerful AI systems to date, such as GPT-4o, have been trained on trillions of words of human-generated text scraped from the internet. This finite pool of publicly available data raises a pressing question: could the shortage of high-quality training data become the primary bottleneck to scaling? Ilya Sutskever, co-founder of the AI labs Safe Superintelligence (SSI) and OpenAI, puts it this way:
“Results from scaling up pre-training - the phase of training an AI model that uses a vast amount of unlabeled public data to understand language patterns and structures - have plateaued”
To overcome this limitation, access to high-quality, private datasets is becoming increasingly important. This shift underscores the need for innovative approaches to leverage untapped data sources, ensuring continued progress in AI development.
Embodied AI will be the catalyst for accelerating privacy-preserving AI
Given the computational and energy demands of large language models (LLMs), compute is shifting away from end-user devices and target applications toward the cloud. A notable example in the consumer space is the recent launch of Apple Intelligence, which leverages homomorphic encryption to perform complex machine learning computations securely in the cloud. Embodied AI systems may process inputs such as voice commands, video streams, biometric data, and environmental signals. To build trust and comply with stringent privacy regulations, embodied AI will require advanced privacy-preserving inference techniques that safeguard user data while enabling real-time computation. Looking ahead, future AI models will embrace architectures tailored to the privacy, performance, and scalability needs of embodied AI, ensuring secure, efficient, and context-aware operation that aligns with the dynamic requirements of real-world applications.
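To make the idea concrete, below is a minimal sketch of privacy-preserving inference for a single encrypted linear layer, using the open-source TenSEAL library's CKKS scheme. The feature values, weight matrix, and two-class head are illustrative assumptions, not any production pipeline; the point is that the server computes on ciphertext and only the key holder can read the result.

```python
# Sketch: encrypted linear layer (logits = x.W + b) under CKKS homomorphic
# encryption with TenSEAL. Parameters and values are illustrative only.
import tenseal as ts

# Client side: create a CKKS context and encrypt the input features.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()  # rotations needed for matrix multiplication

features = [0.32, -1.70, 0.05, 2.40]   # e.g. a biometric or sensor signal
enc_features = ts.ckks_vector(context, features)

# Server side: operate on the ciphertext without ever seeing the plaintext.
weight_matrix = [   # 4 input features -> 2 output logits (hypothetical head)
    [0.10, -0.20],
    [0.30, 0.40],
    [-0.50, 0.60],
    [0.70, -0.80],
]
bias = [0.01, -0.02]
enc_logits = enc_features.matmul(weight_matrix) + bias

# Client side: only the secret-key holder can decrypt the result.
print(enc_logits.decrypt())
```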
FHE accelerator chips are at least 3-4 years away, but even those will fall short
Currently, several research teams are exploring FPGA-based implementations of the TFHE circuit to accelerate arithmetic and bootstrapping operations. Some companies are also working on ASIC designs specifically tailored to arithmetic and bootstrapping, with public availability projected for 2027. However, FHE chips may face several challenges:
Transformers' computational complexity: Arithmetic operations are only a small subset of the linear operations within a transformer block. Beyond these, transformers also include numerous non-linear components, such as Softmax and activation functions like GeLU or ReLU in the feed-forward networks, which are even more computationally intensive under FHE (see the sketch below).
Speed acceleration limitation: FHE accelerator chips are projected to deliver roughly an order of magnitude of speedup, but even for a small LLM, achieving the throughput necessary for real-time inference remains an ambitious goal. FHE needs at least another 3-4 orders of magnitude of acceleration before becoming viable for AI-based applications.
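The sketch below illustrates why the non-linear layers are the expensive part. CKKS-style FHE schemes natively evaluate only additions and multiplications, so a function like GeLU must be replaced by a polynomial approximation; tighter accuracy demands a higher degree, which means a deeper multiplicative circuit and more bootstrapping. The interval and degrees chosen here are illustrative assumptions.

```python
# Sketch: cost of approximating GeLU with polynomials, as required under FHE.
import numpy as np

def gelu(x):
    # Tanh-based GeLU approximation used in many transformer implementations.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

xs = np.linspace(-4.0, 4.0, 2001)   # approximation interval (assumption)
ys = gelu(xs)

for degree in (3, 7, 15):
    # Least-squares Chebyshev fit; under FHE this polynomial stands in for GeLU.
    poly = np.polynomial.Chebyshev.fit(xs, ys, degree)
    max_err = np.max(np.abs(poly(xs) - ys))
    # Multiplicative depth grows roughly like log2(degree); every extra level
    # consumes ciphertext modulus budget and eventually forces bootstrapping.
    print(f"degree {degree:2d}: max |error| on [-4, 4] = {max_err:.4f}")
```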