Tesla’s Dojo, a timeline | TechCrunch

Elon Musk doesn’t want Tesla to be just an automaker. He wants Tesla to be an AI company, one that’s figured out how to make cars drive themselves. 

Crucial to that mission is Dojo, Tesla’s custom-built supercomputer designed to train its Full Self-Driving (FSD) neural networks. FSD isn’t actually fully self-driving; it can perform some automated driving tasks, but still requires an attentive human behind the wheel. But Tesla thinks with more data, more compute power and more training, it can cross the threshold from almost self-driving to full self-driving. 

And that’s where Dojo comes in. 

Musk has been teasing Dojo for some time, but the executive has been ramping up discussions about the supercomputer throughout 2024. Dojo’s importance to Tesla might be existential – with EV sales slumping, investors want assurances that Tesla can achieve autonomy. Below is a timeline of Dojo mentions and promises. 

2019

First mentions of Dojo

April 22 – At Tesla’s Autonomy Day, the automaker brings its AI team on stage to talk about Autopilot and Full Self-Driving, and the AI powering them both. The company shares information about Tesla’s custom-built chips, which are designed specifically for neural networks and self-driving cars.

During the event, Musk teases Dojo, revealing that it’s a supercomputer for training AI. He also notes that all Tesla cars being produced at the time would have all hardware necessary for full self-driving and only needed a software update.

2020 

Musk begins the Dojo roadshow

Feb 2 – Musk says Tesla will soon have more than a million connected vehicles worldwide with sensors and compute needed for full self-driving — and touts Dojo’s capabilities. 

“Dojo, our training supercomputer, will be able to process vast amounts of video training data & efficiently run hyperspace arrays with a vast number of parameters, plenty of memory & ultra-high bandwidth between cores. More on this later.”

August 14 – Musk reiterates Tesla’s plan to develop a neural network training computer called Dojo “to process truly vast amounts of video data,” calling it “a beast.” He also says the first version of Dojo is “about a year away,” which would put its launch date somewhere around August 2021.

December 31 – Musk says Dojo isn’t needed, but it will make self-driving better. “It isn’t enough to be safer than human drivers, Autopilot ultimately needs to be more than 10 times safer than human drivers.”

2021

Tesla makes Dojo official

August 19 – The automaker officially announces Dojo at Tesla’s first AI Day, an event meant to attract engineers to Tesla’s AI team. Tesla also introduces its D1 chip, which the automaker says it will use, alongside Nvidia’s GPUs, to power the Dojo supercomputer. Tesla notes its AI cluster will house 3,000 D1 chips.

October 12 – Tesla releases a Dojo Technology whitepaper, “a guide to Tesla’s configurable floating point formats & arithmetic.” The whitepaper outlines a technical standard for a new type of binary floating-point arithmetic that’s used in deep learning neural networks and can be implemented “entirely in software, entirely in hardware, or in any combination of software and hardware.”

2022

Tesla reveals Dojo progress

August 12 – Musk says Tesla will “phase in Dojo. Won’t need to buy as many incremental GPUs next year.”

September 30 – At Tesla’s second AI Day, the company reveals that it has installed the first Dojo cabinet and run 2.2 megawatts of load testing. Tesla says it is building one tile, which is made up of 25 D1 chips, per day. Tesla demos Dojo onstage running a Stable Diffusion model to create an AI-generated image of a “Cybertruck on Mars.”

Importantly, the company sets a target of completing a full Exapod cluster by Q1 2023, and says it plans to build a total of seven Exapods in Palo Alto.

2023

A ‘long-shot bet’

April 19 – Musk tells investors during Tesla’s first-quarter earnings that Dojo “has the potential for an order of magnitude improvement in the cost of training,” and also “has the potential to become a sellable service that we would offer to other companies in the same way that Amazon Web Services offers web services.”

Musk also notes that he’d “look at Dojo as kind of a long-shot bet,” but a “bet worth making.”

June 21 — The Tesla AI X account posts that the company’s neural networks are already in customer vehicles. The thread includes a graph with a timeline of Tesla’s current and projected compute power, which places the start of Dojo production at July 2023, although it’s not clear if this refers to the D1 chips or the supercomputer itself. Musk says that same day that Dojo was already online and running tasks at Tesla data centers. 

The company also projects that Tesla’s compute will be in the top five in the world by around February 2024 (there is no indication this happened) and that Tesla will reach 100 exaflops by October 2024.

July 19 – Tesla notes in its second-quarter earnings report it has started production of Dojo. Musk also says Tesla plans to spend more than $1 billion on Dojo through 2024.  

September 6 – Musk posts on X that Tesla is limited by AI training compute, but that Nvidia and Dojo will fix that. He says managing the data from the roughly 160 billion frames of video Tesla gets from its cars per day is extremely difficult. 

2024

Plans to scale

January 24 – During Tesla’s fourth-quarter and full-year earnings call, Musk acknowledges again that Dojo is a high-risk, high-reward project. He also says that Tesla is pursuing “the dual path of Nvidia and Dojo,” that “Dojo is working” and that it is “doing training jobs.” He notes Tesla is scaling it up and has “plans for Dojo 1.5, Dojo 2, Dojo 3 and whatnot.”

January 26 – Tesla announces plans to spend $500 million to build a Dojo supercomputer in Buffalo. Musk then downplays the investment somewhat, posting on X that while $500 million is a large sum, it’s “only equivalent to a 10k H100 system from Nvidia. Tesla will spend more than that on Nvidia hardware this year. The table stakes for being competitive in AI are at least several billion dollars per year at this point.”

April 30 – At TSMC’s North American Technology Symposium, the company says Dojo’s next-generation training tile, the D2, is already in production, according to IEEE Spectrum. The D2 puts the entire Dojo tile onto a single silicon wafer, rather than connecting 25 chips to make one tile.

May 20 – Musk notes that the rear portion of the Giga Texas factory extension will include the construction of “a super dense, water-cooled supercomputer cluster.”

June 4 – A CNBC report reveals Musk diverted thousands of Nvidia chips reserved for Tesla to X and xAI. After initially saying the report was false, Musk posts on X that Tesla didn’t have a location to send the Nvidia chips to turn them on, due to the continued construction on the south extension of Giga Texas, “so they would have just sat in a warehouse.” He notes the extension will “house 50k H100s for FSD training.”

He also posts:

“Of the roughly $10B in AI-related expenditures I said Tesla would make this year, about half is internal, primarily the Tesla-designed AI inference computer and sensors present in all of our cars, plus Dojo. For building the AI training superclusters, NVidia hardware is about 2/3 of the cost. My current best guess for Nvidia purchases by Tesla are $3B to $4B this year.”

July 1 – Musk reveals on X that current Tesla vehicles may not have the right hardware for the company’s next-gen AI model. He says that the roughly 5x increase in parameter count with the next-gen AI “is very difficult to achieve without upgrading the vehicle inference computer.”

Nvidia supply challenges

July 23 – During Tesla’s second-quarter earnings call, Musk says demand for Nvidia hardware is “so high that it’s often difficult to get the GPUs.” 

“I think this therefore requires that we put a lot more effort on Dojo in order to ensure that we’ve got the training capability that we need,” Musk says. “And we do see a path to being competitive with Nvidia with Dojo.”

A graph in Tesla’s investor deck predicts that Tesla AI training capacity will ramp to roughly 90,000 H100 equivalent GPUs by the end of 2024, up from around 40,000 in June. Later that day on X, Musk posts that Dojo 1 will have “roughly 8k H100-equivalent of training online by end of year.” He also posts photos of the supercomputer, which appears to use the same fridge-like stainless steel exterior as Tesla’s Cybertrucks. 

July 30 – AI5 is ~18 months away from high-volume production, Musk says in reply to a post from someone claiming to be starting a club of “Tesla HW4/AI4 owners angry about getting left behind when AI5 comes out.”

August 3 – Musk posts on X that he did a walkthrough of “the Tesla supercompute cluster at Giga Texas (aka Cortex).” He notes that it will be made up of roughly 100,000 H100/H200 Nvidia GPUs with “massive storage for video training of FSD & Optimus.”



