Mike Muller’s keynote focused on how much has changed since the ARM1 was designed in 1983. Back then ARM the company did not exist, and ARM was the next-generation processor for Acorn Computers, which was really in the hobby market and got its first boost from a contract to design the BBC Microcomputer to go along with a computer literacy project. My minor claim to fame about the ARM1 is that I installed the VLSI Technology design software used to design it on two Apollo workstations (which hadn’t actually shipped, so I had to go and install it at Apollo, not Acorn).
Mike decided to dig up the layout of the ARM1, which turned out to be more of a challenge than expected: it involved finding an Exabyte drive, converting VAX files to Linux, and converting from CIF (which VLSI Technology used for layout) to GDSII. The original ARMs were full custom designs (this predated synthesis). ARM have recently announced a low-end microcontroller, the Cortex-M0. It has roughly the same number of transistors as the ARM1, so it makes an interesting basis for comparison.
The chips are about 26 years apart, or 13 process generations, so the Cortex-M0 should be about 1/2^13 of the size (namely 1/8000). It is actually 1/10,000 of the size, so about right. Performance should have gone through about 6 performance scalings, making it 64 times as fast, but it is only 16 times as fast. This is because the 5V power supply that the ARM1 used should have scaled down to 8mV, but in fact it has only scaled to 950mV, and so the transistor threshold voltages haven’t scaled enough.
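The scaling arithmetic in this paragraph can be checked with a quick sketch. It assumes, as the talk's numbers imply, that each process generation halves the area of a given design and that each "performance scaling" doubles speed. The variable names are mine, not from the keynote.

```python
# Figures quoted in the text.
years_apart = 26
generations = 13           # process generations, roughly one every two years
performance_scalings = 6   # speed doublings the talk expected

# Assumption: each process generation halves the area of a given design.
ideal_area_shrink = 2 ** generations        # 8192, the "1/8000" in the text
actual_area_shrink = 10_000

# Assumption: each performance scaling doubles speed.
ideal_speedup = 2 ** performance_scalings   # 64
actual_speedup = 16

# How far the supply voltage actually scaled: 5 V down to 950 mV.
voltage_ratio = 5.0 / 0.950

print(f"ideal area shrink:  1/{ideal_area_shrink}")
print(f"actual area shrink: 1/{actual_area_shrink}")
print(f"speed shortfall:    {ideal_speedup // actual_speedup}x")
print(f"supply voltage reduced by only {voltage_ratio:.1f}x")
```

So area has tracked the ideal closely, while speed is a factor of 4 short of it.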
The big change in design is that the ARM1 took 6 months to lay out. The Cortex-M0 took 32 minutes. Basically synthesis, place and route have automated the whole process, whereas the ARM1 was custom. But that is really the only major improvement in design methodology.
Mike then looked at design productivity. The ARM1 took 6 man-years (MY) and was 25,000 transistors. The dual Cortex-A15 took 150MY and is 150 million transistors. Luckily design productivity has increased 240 times. It is when you look at software that things are scary. The ARM1 graphics library was 0.25MY of work and was 150 lines of code (LoC). Assembly, of course. The current ARM GPU, the Mali-T604, has OpenGL, OpenCL and other graphics support, and is 190MY of work and 1M LoC. Just a 7 times increase in productivity.
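As a rough check, productivity here is just output divided by effort: transistors per man-year for hardware, lines of code per man-year for software. The sketch below recomputes both ratios from the figures as quoted; the function and variable names are mine. The hardware figure comes out at 240 times. The software figure comes out nearer 9 times than 7 on these inputs, so the talk presumably used slightly different numbers; either way the gap between hardware and software is the point.

```python
def productivity(output_units: float, man_years: float) -> float:
    """Output (transistors or lines of code) produced per man-year."""
    return output_units / man_years


# Hardware: transistors per man-year.
hw_then = productivity(25_000, 6)          # ARM1
hw_now = productivity(150_000_000, 150)    # dual Cortex-A15

# Software: lines of code per man-year.
sw_then = productivity(150, 0.25)          # ARM1 graphics library
sw_now = productivity(1_000_000, 190)      # Mali-T604 software stack

hw_gain = hw_now / hw_then
sw_gain = sw_now / sw_then

print(f"hardware productivity gain: {hw_gain:.0f}x")
print(f"software productivity gain: {sw_gain:.1f}x")
```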
Mike pointed out that hardware people shouldn’t be complacent. We came up with synthesis and P&R so that we can essentially compile our chips. But we haven’t come up with anything comparable since. Software people have moved on to Python, cloud computing, development environments and lots of new goodies. So apart from the few people left having to write device drivers, the way software is being developed is changing fast.
Next Mike moved on to validation. The ARM1 was in 3um, had 24,000 transistors and took 6 MY to validate. The Cortex-M0 is in 20nm, has 32,000 transistors (not much more) and took 11 MY to validate. But the big difference is the machine resources brought to bear. The ARM1 took 2,000 hours to validate, but the Cortex-M0 took 1,439,000 hours. Taking the speed of the machines into account, this is 3,000,000 times less efficient. We waste a lot of computer cycles early, especially with constrained random verification, in order to avoid silicon respins later.
Mike feels we need to adopt more formal design techniques. Today formal verification is stuck on the side of the design process as an “optional extra” to be run by a specialist, rather than something embedded deeply in the design process. Without this we are stuck with constrained random verification, burning computer and verification-engineer cycles, with formal verification not part of every designer’s job.
As an aside, Mike talked a bit about building ARM’s datacenter in the parking lot. It has 200TFLOPS and 93TB of DRAM. It consumes 1.5MW (luckily there was a new housing tract going in across the street, so they actually managed to get a line like that installed). The UPS consists of two parts: first, 6 spinning flywheels which can each deliver 250kW for 20 seconds; then two 785bhp quad-turbo diesel generators that can get up to full power in 8 seconds. Looks pretty good too!
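The flywheel numbers line up neatly with the load, as a back-of-the-envelope sketch shows. It assumes all six flywheels discharge together and that the whole 1.5MW load sits behind the UPS; neither was stated in the talk, and the variable names are mine.

```python
# Figures quoted in the text.
flywheels = 6
kw_per_flywheel = 250
ride_through_s = 20      # how long each flywheel can deliver full power
generator_start_s = 8    # time for the diesels to reach full power
load_kw = 1_500          # 1.5 MW datacenter load

# Assumption: all flywheels discharge at the same time.
flywheel_power_kw = flywheels * kw_per_flywheel

# kW * s = kJ, so divide by 1000 for MJ.
stored_energy_mj = flywheel_power_kw * ride_through_s / 1000

# Slack between the flywheels running out and the generators being ready.
margin_s = ride_through_s - generator_start_s
covers_load = flywheel_power_kw >= load_kw

print(f"flywheel power: {flywheel_power_kw} kW (load is {load_kw} kW)")
print(f"usable energy:  {stored_energy_mj:.0f} MJ")
print(f"margin before generators are needed: {margin_s} s")
```

On these assumptions the flywheels match the 1.5MW load exactly and leave 12 seconds of slack after the generators reach full power.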
There is a video of the keynote here. The video starts with the DAC award session, and the keynote itself starts 29 minutes in. Wait for enough of the video to load and then you can skip straight to the start (unless you really want to see a re-run of the award session).