The latest advanced driver-assistance systems (ADAS), such as Mercedes’ Drive Pilot and Tesla’s FSD, approach SAE Level 3 self-driving, where the driver must be ready to take back control when the vehicle requests it. Reaching Level 5 – full, unconditional autonomy – means facing a new class of challenges that existing technology and conventional approaches cannot solve. From a silicon perspective, it requires SoCs to scale in performance, memory usage, interconnect, chip area, and power consumption. In a new white paper, neural network processing IP company Expedera envisions ultra-efficient heterogeneous SoCs for Level 5 self-driving, increasing AI operations while decreasing power consumption in realizable solutions.
TOPS are only the start of the journey
Artificial intelligence (AI) technology is central to the self-driving discussion. Sensors, processing, and control elements must carefully coordinate every move a vehicle makes. But there’s a burning question: how much AI processing is needed to get to Level 5?
Ask ten people and you’ll get ten different answers, usually with one thing in common: it’s a big number. Until recently, the conversation was in TOPS – trillions of operations per second. Some observers say Level 5 will need 3 or 4 POPS – peta operations per second. That may not sound like a big deal, since earlier this year one SoC vendor announced a chip for self-driving applications with 1 POPS of performance, describing it as an “AI data center on wheels.” But when asked about its power consumption, the vendor is less forthcoming – ditto for the transistor count and die size, both probably massive.
These aren’t issues in a data center, but they are in a car. Every watt of power and pound of weight going into self-driving electronics cuts electric vehicle range, and bigger die sizes drive up wafer and package costs. Larger, more complex chip footprints often mean higher on-chip latency. Scaling AI inference TOPS without other improvements will soon run into a wall.
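To put the range argument in concrete terms, here is a back-of-envelope sketch. All figures below (battery capacity, driving efficiency, compute draw, speed) are generic assumptions for illustration, not numbers from the article or the white paper:

```python
# Back-of-envelope sketch of how compute power eats EV range.
# Every figure here is a generic assumption, not a sourced number.
battery_kwh = 75.0           # assumed battery capacity
drive_kwh_per_km = 0.18      # ~18 kWh per 100 km driving consumption
compute_w = 500              # assumed self-driving electronics draw
speed_kmh = 100              # assumed cruising speed

range_km = battery_kwh / drive_kwh_per_km
compute_kwh_per_km = (compute_w / 1000) / speed_kmh
range_with_compute = battery_kwh / (drive_kwh_per_km + compute_kwh_per_km)

print(f"Range lost to a {compute_w} W compute load: "
      f"{range_km - range_with_compute:.0f} km")
```

Under these assumptions, a 500 W stack costs roughly 11 km of range – and a multi-kilowatt “data center on wheels” would cost proportionally more.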
The self-driving processing pipeline workload
That’s not to say that today’s TOPS discussions reveal nothing useful about the compute workload. There are many unknowns: which sensor payloads provide better information, which AI models will perform best, and what form the self-driving software stack ultimately takes. Expedera’s white paper takes an in-depth look at the processing pipeline in search of answers, starting from a conceptual diagram.
Changes in the sensor package are ahead. There’s a debate around camera-only systems and whether they can detect all scenarios necessary to ensure safety. More sensors of different types and higher resolutions will likely appear and drive up processing requirements. In turn, more intensive AI models will be needed – and, in a fascinating observation from Expedera based on customer conversations, a self-driving processing pipeline may have ten, twenty, or more AI models operating concurrently.
Expedera expands on each of these phases in the white paper, looking at where the compute-intensive tasks may lie. To handle this self-driving workload, they anticipate a two- to three-order-of-magnitude increase in AI operations. At the same time, an order-of-magnitude decrease in power consumption (measured as thermal design power, or TDP) must occur for realizable implementations. According to Expedera, these combined requirements leave GPUs behind as a tool for AI inference in vehicles.
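The arithmetic behind that claim is worth making explicit. Taking a hypothetical 1 POPS baseline in a data-center-class 500 W envelope (assumed figures, not Expedera’s), a two-order-of-magnitude increase in operations combined with an order-of-magnitude drop in TDP implies a three-order-of-magnitude jump in efficiency:

```python
# Hypothetical illustration of the scaling gap described above.
# Baseline figures are assumptions, not numbers from the white paper.
baseline_tops = 1_000               # ~1 POPS today
baseline_tdp_w = 500                # assumed data-center-class envelope

target_tops = baseline_tops * 100   # two orders of magnitude more ops
target_tdp_w = baseline_tdp_w / 10  # order-of-magnitude lower power

baseline_eff = baseline_tops / baseline_tdp_w   # TOPS per watt
target_eff = target_tops / target_tdp_w

print(f"Required efficiency gain: {target_eff / baseline_eff:.0f}x")
```

A 1000x efficiency gain is far beyond what process-node scaling alone can deliver, which is why the architecture itself has to change.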
Ultra-efficient heterogeneous AI inference for scale
What could take the place of a GPU for more efficient AI inference? A neural network processing unit (NPU) as part of an ultra-efficient heterogeneous SoC – provided the limitations Expedera identifies in classical NPU hardware can be overcome. Scaling drives latency up and determinism down. Hardware utilization is low, perhaps only 30 to 40%, driving area and power consumption up. Multi-model execution poses scheduling and memory-usage problems. And partitioning TOPS to fit workloads may not be possible within the choices made in a custom SoC architecture.
Some themes Expedera sees in ultra-efficient heterogeneous SoC discussions with customers:
- Fire-and-forget task scheduling is crucial, with a simple runtime where jobs start and finish predictably, and tasks can be reordered to fit the models and workload.
- Independent, isolated AI inference engines are a must, where available TOPS are sliced into configurable pieces of processing to dedicate to groups of tasks.
- Higher resolution, longer-range sensors generate more intermediate data in neural networks, which can oversubscribe DDR memory.
- IP blocks that worked well in SoCs at lower performance levels prove unrealizable when scaled up – taking too much area, too much power, or both.
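The second theme – slicing available TOPS into isolated, dedicated engines – can be sketched in a few lines. This is purely illustrative: the names, model groups, and TOPS figures are invented for the example and do not reflect Expedera’s actual API or product:

```python
from dataclasses import dataclass

# Illustrative sketch of carving a fixed TOPS budget into isolated
# inference engines, one per group of concurrent models. All names
# and numbers are hypothetical, not Expedera's interface.

@dataclass
class EnginePartition:
    name: str
    tops: int
    models: list

def partition(total_tops, requests):
    """Slice the available TOPS into dedicated, isolated pieces."""
    allocated = sum(tops for _, tops, _ in requests)
    if allocated > total_tops:
        raise ValueError("workload oversubscribes the TOPS budget")
    return [EnginePartition(name, tops, models)
            for name, tops, models in requests]

engines = partition(1000, [
    ("perception", 600, ["camera_det", "lidar_seg"]),
    ("prediction", 250, ["trajectory"]),
    ("planning",   150, ["planner"]),
])
for e in engines:
    print(f"{e.name}: {e.tops} TOPS -> {e.models}")
```

The point of the isolation is determinism: each group of models gets a fixed slice of compute, so a spike in one pipeline stage cannot starve another.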
Expedera’s co-designed hardware and software neural network processing solution reaches new levels of TOPS area density – some 2.7x better than the competition – and pushes hardware utilization as high as 90%. It also lets OEMs differentiate their SoCs and explore different AI models, reducing the risk that models changing and growing on the road to Level 5 will force a redesign.
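Utilization matters as much as peak TOPS, because delivered throughput is peak TOPS multiplied by utilization. The sketch below uses the 30–40% and 90% utilization figures from the text above; the 1000-TOPS peak is an arbitrary illustrative number:

```python
# Effective throughput = peak TOPS x hardware utilization.
# Utilization figures echo the article; the peak is illustrative.
peak_tops = 1000

classical_npu = peak_tops * 0.35   # midpoint of the 30-40% range
high_util_npu = peak_tops * 0.90   # utilization as high as 90%

print(f"Effective TOPS: classical {classical_npu:.0f} "
      f"vs high-utilization {high_util_npu:.0f}")
print(f"Utilization alone buys ~{high_util_npu / classical_npu:.1f}x")
```

At these figures, utilization alone accounts for roughly a 2.6x gain in delivered throughput, before any area-density advantage is counted.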
We’ll save more details of the solution and the discussion of ultra-efficient heterogeneous SoCs for Level 5 self-driving for the Expedera white paper itself, which you can download here.