How XPeng uses a $500 million budget to redefine autonomous EVs
The line between software companies and traditional automakers continues to blur. At the CVPR 2026 conference in Denver, Chinese company XPeng showed that it views itself more as an artificial intelligence company than a standard manufacturer of EVs. Dr. Xianming Liu, the head of XPeng's General Intelligence Center, revealed that the brand spends roughly 300 million RMB, which equals about £30 million, every month solely on training AI models.
This massive spending adds up to nearly £364 million per year, a surprising budget for a company that delivered only around 200,000 electric cars last year. However, XPeng relies on this heavy financial commitment to push its new VLA 2.0 software into mass production.
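The arithmetic behind those figures can be checked in a few lines. The sketch below uses only numbers quoted in the article; the variable names are illustrative, and the exchange rate is not an official figure but simply the rate implied by the article's own RMB and pound amounts. It also derives the training spend per car delivered, which the article does not state directly.

```python
# Figures quoted in the article (variable names are illustrative).
MONTHLY_TRAINING_RMB = 300_000_000   # roughly 300 million RMB per month
ANNUAL_GBP_REPORTED = 364_000_000    # "nearly £364 million per year"
CARS_DELIVERED = 200_000             # "around 200,000" cars last year

# Annual spend in RMB: twelve months at the quoted monthly rate.
annual_rmb = MONTHLY_TRAINING_RMB * 12

# Exchange rate implied by the article's own numbers (an inference, not a quote).
implied_rmb_per_gbp = annual_rmb / ANNUAL_GBP_REPORTED

# Training spend per delivered car, a derived figure.
training_rmb_per_car = annual_rmb / CARS_DELIVERED

print(f"Annual training spend: {annual_rmb:,} RMB")
print(f"Implied exchange rate: {implied_rmb_per_gbp:.2f} RMB per GBP")
print(f"Training spend per car delivered: {training_rmb_per_car:,.0f} RMB")
```

On these numbers, the AI training bill works out to about 18,000 RMB for every car the company delivered, which gives a sense of why the budget looks large next to the sales volume.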
During the conference, Dr. Liu explained a unique choice in XPeng's software architecture, stating that language can act as a bottleneck for autonomous driving systems. The first-generation system passed visual road data through language tokens before choosing a driving action; VLA 2.0 removes this middle step entirely.
XPeng engineers realized that an EV captures about two billion visual tokens every second from its cameras, but needs only 10 to 20 tokens to control the steering wheel and pedals. Forcing the system to describe the road in written words adds unnecessary computation and dangerous delays. Instead, VLA 2.0 keeps language only as an input, so drivers can still give verbal commands that the car interprets.
This new approach already operates inside actual consumer vehicles. XPeng has moved VLA 2.0 from the research phase into full mass production. Within the first month of its over-the-air rollout, the software was handling more than 50% of users' total assisted-driving mileage.
The architecture supports both Level 2 driver assistance and Level 4 fully autonomous operations. In fact, XPeng used its new GX platform to build its first Robotaxi, which is already rolling off the production line. This dedicated vehicle has an impressive onboard computing power of 3,000 TOPS to handle complex driving environments without any human intervention.
XPeng does not rely on VLA 2.0 alone to pilot its electric cars. The software works directly with what the company calls a "world model." VLA 2.0 learns how to drive by copying millions of hours of human behavior, but the world model focuses on learning the actual laws of physics. It predicts how the surrounding environment will change and how other drivers will move in the near future.
To support this system, the company developed three specialized sub-programs: X-World, X-Foresight, and X-Cache. X-Cache cuts redundant computation by roughly 70% with almost no loss in image quality, allowing the system to run 2.7 times faster.
Many EV companies disagree on the best sensor layouts, but XPeng chooses a middle ground. Models like the P7 and G7 use a vision-only system for the main driving brain, mirroring the strategy used by companies like Tesla. And yet, XPeng refuses to remove traditional safety sensors completely.
Every electric car still carries three mmWave radars and 12 ultrasonic sensors. Dr. Liu explained that these extra sensors do not talk to the main AI driving brain at all. Instead, they operate a completely separate safety system for automatic emergency braking and steering. This setup ensures that if the vision AI makes a mistake, a secondary physical layer can protect the passengers.
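The idea of a safety layer that never consults the driving AI can be illustrated with a small sketch. Everything in it is hypothetical: the function names, the `Command` type, and the 2-metre threshold are assumptions made for illustration, not XPeng's implementation. For brevity it models only emergency braking, not emergency steering. The point is structural: the safety check reads only radar and ultrasonic ranges, and it can override whatever the vision planner decided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    steering: float  # steering angle, hypothetical units
    throttle: float  # 0.0 to 1.0
    brake: float     # 0.0 to 1.0


def vision_planner(camera_frame) -> Command:
    """Stand-in for the vision-only driving model (hypothetical)."""
    return Command(steering=0.0, throttle=0.3, brake=0.0)


def emergency_detected(radar_ranges_m, ultrasonic_ranges_m, min_gap_m=2.0) -> bool:
    """Independent check: uses only radar and ultrasonic ranges, never the AI output.

    The 2.0 m threshold is an arbitrary illustrative value.
    """
    nearest = min([*radar_ranges_m, *ultrasonic_ranges_m], default=float("inf"))
    return nearest < min_gap_m


def arbitrate(planned: Command, emergency: bool) -> Command:
    """The safety layer wins whenever it fires: cut throttle, apply full brake."""
    if emergency:
        return Command(steering=planned.steering, throttle=0.0, brake=1.0)
    return planned


# Three radar readings and twelve ultrasonic readings, matching the sensor counts
# in the article; the distances themselves are made up.
planned = vision_planner(camera_frame=None)
blocked = arbitrate(planned, emergency_detected([30.0, 1.2, 45.0], [5.0] * 12))
clear = arbitrate(planned, emergency_detected([30.0, 25.0, 45.0], [5.0] * 12))
print(blocked)
print(clear)
```

Because the two paths share no inputs, a misjudgment by the vision model cannot suppress the brake command in this sketch, which is the property the article attributes to the separate safety system.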
To make these systems work, XPeng pushes the limits of the AI "Scaling Law," which states that models get smarter as you feed them more data and computing power. The current VLA 2.0 system holds billions of parameters and trains on hundreds of millions of video clips, consuming over four trillion tokens during each update iteration.
Despite the massive scale, XPeng has contained its high costs by boosting efficiency. Over the past 12 months, the company improved its single-job training efficiency by 4,360% and raised its GPU hardware utilization from 40% to 90%. This lets the company get more value out of its RMB 47.66 billion (£4.73 billion) cash reserve.
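To put those percentages in more familiar terms, the short sketch below converts them into multipliers. It uses only figures quoted in the article; the variable names are illustrative, and reading "improved by 4,360%" as an increase on top of the baseline (rather than a final level of 4,360% of baseline) is an assumption.

```python
# GPU utilization figures from the article, in percent.
gpu_util_before_pct = 40
gpu_util_after_pct = 90

# Useful compute extracted from the same hardware, as a multiplier.
utilization_gain = gpu_util_after_pct / gpu_util_before_pct  # 2.25

# "Improved by 4,360%", read as an increase over the baseline (assumption).
efficiency_gain_pct = 4_360
efficiency_factor = 1 + efficiency_gain_pct / 100

# Annual training spend as a share of the quoted cash reserve.
annual_training_rmb = 300e6 * 12   # 300 million RMB per month
cash_reserve_rmb = 47.66e9         # RMB 47.66 billion
training_share_of_cash = annual_training_rmb / cash_reserve_rmb

print(f"Same GPUs deliver {utilization_gain:.2f}x the useful compute")
print(f"Single-job training efficiency: about {efficiency_factor:.1f}x")
print(f"Annual training spend is {training_share_of_cash:.1%} of the cash reserve")
```

Under these readings, the utilization change alone means the existing GPU fleet does 2.25 times as much useful work, and a year of training at the quoted rate consumes under a tenth of the cash reserve.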
The rapid progress of the software sparked a high-stakes friendly wager within the company. Last year, XPeng CEO He Xiaopeng made a public bet with Dr. Liu about the performance of VLA 2.0: if the system fails to match Tesla's Full Self-Driving software by August 30, 2026, Dr. Liu must run naked across the Golden Gate Bridge in California.
Dr. Liu, for his part, expressed complete confidence that his clothes would stay on. He stated that internal testing in chaotic cities like Beijing showed that VLA 2.0 has already reached parity with older Tesla systems and matches or exceeds newer versions, which he attributed to the complex driving data found on Chinese roads.
XPeng sees its electric cars as physical hardware built to showcase its engineering prowess and collect data for its advanced VLA software. This perspective explains why the company licensed the VLA 2.0 architecture to Volkswagen earlier this year; VW will use it in vehicles launching in 2027.
The company plans to expand this artificial intelligence beyond the automotive industry and is preparing its "IRON" humanoid robot for mass production by the end of 2026. The intention is to deploy these robots as shopping assistants inside its retail stores starting in the first quarter of 2027, transforming the business from a traditional automaker into a physical AI enterprise.