There is a new edition of Hennessy and Patterson’s book Computer Architecture: A Quantative Approach out. The 5th edition.
There is lots of fascinating stuff in it but what is especially interesting to those of us who are not actually computer architects, is the big picture stuff about what they choose to cover. Previous editions of the book have emphasized different things, depending on what the issues of the day were (RISC vs CISC in the first edition, for example). The more recent editions have mostly focused on instruction-level parallelism, namely how do we wring more performance from our microprocessor while keeping binary compatibility. I was at the Microprocessor Design Forum years ago and a famous computer architect, I forget who, said that all of the work on computer architecture over the years (long pipelines, out of order execution, branch prediction etc) had sped up microprocessors by 6X (Hennesy and Patterson reckon 7X in the latest edition of their book). All the remaining 1,000,000X improvement had come from Moore’s law.
If there are a few themes in the latest edition they are:
- How to do computation using the least amount of energy possible (battery life for your iPad, electricity and cooling costs for your warehose scale cloud computing installation)?
- How can we make use of very large numbers of cores efficiently?
- How can we structure systems to use memory efficiently?
The energy problem is acute for SoC design because without some breakthroughs we are going to run head on into the problem of dark silicon: we can design more stuff, especially multicore processors, onto a chip than we can power up at the same time. It used to be that power was free and transistors were expensive but now transistors are free and power is expensive. It is easy to put more transistors on a chip than we can power on.
The next problem is that instruction level parallelism has pretty much reached the limit. In the 5th edition it is relegated to a single chapter as opposed to being the focus of the book in the 4th edition. We have parallelism at the multicore SoC level (e.g. iPhone) and at the cloud computing (e.g. Google’s 100,000 processor datacenters). But for most stuff we simply don’t know how to exploit the architecture. Above 4 cores most algorithms see no speedup if you add more, and in many cases actually slow down, running more slowly on 16 cores than on 1 or 2.
Another old piece of conventional wisdom (just read anything about digital signal processing) is that multiplies are expensive and memory access is cheap. But a modern Intel microprocessor does a floating point multiply in 4 clock cycles but takes 200 to access DRAM.
These 3 walls, the power wall, the ILP wall, and the memory wall means that whereas CPU performance used to double every year and a half it is now getting to be over 5 years. How to solve this problem is getting harder too. Chips are too expensive for people to build new architectures in hardware. Compilers used to be thought of as more flexible than the microprocessor architecture but now it takes a decade for new ideas in compilers to get into production code, whereas the microprocessor running the code will go through 4 or 5 iterations in that time. So there is a crisis of sorts because we don’t know how to program the types of microprocessors that we know how to design and operate. We live in interesting times.