Musk: Grok 3 requires 100,000 NVIDIA H100 AI GPUs

Elon Musk will train the next-generation Grok 3 AI chatbot on 100,000 NVIDIA H100 AI GPUs.

Elon Musk, CEO of Tesla and founder of xAI, made some bold predictions about the development of Artificial General Intelligence (AGI) and discussed the challenges facing the AI industry. He predicts that AGI could surpass human intelligence as early as next year or by 2026, but that it will require an extreme number of processors to train, which in turn requires huge amounts of electricity, Reuters reports.

Musk's company, xAI, is currently training the second version of its Grok large language model and expects to complete the next training phase in May. Training the Grok 2 model required as many as 20,000 Nvidia H100 GPUs, and Musk expects future iterations to demand even more: the Grok 3 model will need around 100,000 Nvidia H100 chips for training.


According to Musk, progress in AI is currently held back by two main factors: a shortage of high-end processors such as Nvidia's H100, since acquiring 100,000 of them quickly is difficult, and the availability of electricity.

Nvidia's H100 GPU draws about 700 W at full utilization, so 100,000 GPUs running AI and HPC workloads can consume up to 70 megawatts. Because these GPUs also need servers and cooling to operate, a data center with 100,000 Nvidia H100 processors would plausibly consume around 100 megawatts, roughly the electricity consumption of a small town.
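The power figures above follow from simple arithmetic. The sketch below reproduces them; the 700 W per-GPU figure comes from the article, while the overhead factor for servers and cooling is a hypothetical value chosen only so that the result matches the article's ~100 MW estimate, not a measured number for any real facility.

```python
# Back-of-the-envelope power estimate for a 100,000-GPU H100 cluster.
# 700 W per GPU at full load is the article's figure; the facility
# overhead factor below is a hypothetical assumption, not a measured value.

H100_POWER_W = 700      # per-GPU draw at full utilization (watts)
NUM_GPUS = 100_000      # Grok 3 training cluster size cited by Musk


def gpu_power_mw(num_gpus: int, watts_per_gpu: float) -> float:
    """Total GPU power draw in megawatts."""
    return num_gpus * watts_per_gpu / 1_000_000


def facility_power_mw(gpu_mw: float, overhead_factor: float) -> float:
    """Scale GPU power by an assumed overhead for servers and cooling."""
    return gpu_mw * overhead_factor


gpu_mw = gpu_power_mw(NUM_GPUS, H100_POWER_W)
# Hypothetical overhead of ~1.43x maps 70 MW of GPU load to ~100 MW in total.
total_mw = facility_power_mw(gpu_mw, overhead_factor=100 / 70)

print(f"GPUs alone: {gpu_mw:.0f} MW")
print(f"Whole data center (assumed overhead): {total_mw:.0f} MW")
```

Changing `overhead_factor` shows how sensitive the total is to cooling and server efficiency, which varies considerably between data centers.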

Musk emphasized that while the supply of compute GPUs has been a significant hurdle so far, the supply of electricity will become increasingly critical in the next year or two. This dual limitation underscores the challenges of scaling AI technologies to meet the growing computational demands.

Despite these challenges, advances in compute and memory architectures will enable the training of ever-larger language models (LLMs) in the coming years. At GTC 2024, Nvidia unveiled Blackwell B200, a GPU architecture and platform designed to scale to LLMs with trillions of parameters, which is expected to play a crucial role in the development of AGI.
