How do you benchmark a processor? It seems like it should be easy: just run some code and see how fast it runs. Traditionally, processors were indeed benchmarked by raw performance numbers such as GMACS, GFLOPS, memory bandwidth and so on. But in today's world, where systems have become very complex and applications very compute-intensive, the raw numbers don't mean very much.
If you are benchmarking a general-purpose processor for something like a PC, where you don't actually know what code it will run, then general-purpose benchmarks are appropriate. However, if you are benchmarking a specialized processor that will run a largely fixed workload, a general-purpose benchmark is completely inappropriate. Although there are designs where very high-performance processors can be used to keep power low (essentially "race to halt": finish fast, then power the whole system down until the next race begins), typically a fast-enough processor that minimizes power is the sweet spot (and it mustn't take up too much area; cost is an important aspect too, of course).
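The race-to-halt tradeoff is just energy arithmetic (energy = power × time): a faster core burns more watts, but for less time, and the system can then drop to an idle power. A toy calculation, where all the wattages and times are made-up illustrative numbers, not measurements of any real part:

```python
# Illustrative (made-up) numbers: energy = power * time.
# Fast core: 2 W active, finishes the job in 1 ms, then the system
#            idles at 5 mW for the rest of the period.
# Slow core: 0.5 W active, takes the full 10 ms for the same work.
frame_period = 0.010  # seconds of budget per work item (assumed)

fast_energy = 2.0 * 0.001 + 0.005 * (frame_period - 0.001)
slow_energy = 0.5 * 0.010

print(f"race-to-halt:    {fast_energy * 1e3:.3f} mJ")
print(f"slow-and-steady: {slow_energy * 1e3:.3f} mJ")
```

With these particular numbers the fast core wins on energy despite its higher power draw; change the idle power or the active-time ratio and the conclusion can flip, which is exactly why "fast enough at minimum power" is usually the sweet spot.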
A good piece to read is the Berkeley Design Technology Inc (BDTI, not to be confused with Berkeley Design Automation) white paper The Art of Processor Benchmarking: What Makes a Good Benchmark and Why You Should Care.
Good benchmarks need to be complex enough to exercise the entire system: the cache hierarchy (hit rates and miss latency), the branch predictors, and the memory subsystem. System performance is not just the raw performance of the processor core itself running a tight inner loop.
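To make the contrast concrete, here is a minimal Python sketch (illustrative only; the kernels and sizes are my own) comparing a tight multiply-accumulate loop with a kernel that does the same amount of arithmetic but chases randomly shuffled indices. In native code the scattered version exposes cache miss latency and hard-to-predict access patterns that a tight-loop benchmark never sees; in interpreted Python the interpreter overhead damps the timing gap, but the structural difference between the two workloads is the point:

```python
import random
import timeit

N = 1 << 20  # one million elements

def tight_mac_loop(a, b):
    """A 'peak MACs' style kernel: sequential multiply-accumulate.
    Perfectly predictable branches and sequential memory access."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc

def scattered_sum(data, indices):
    """A cache-hostile kernel: similar arithmetic work, but every
    access chases a pseudo-random index, defeating prefetching and
    (in native code) stressing cache miss latency."""
    acc = 0
    for i in indices:
        acc += data[i]
    return acc

a = list(range(N))
b = list(range(N))
indices = list(range(N))
random.shuffle(indices)

t_tight = timeit.timeit(lambda: tight_mac_loop(a, b), number=1)
t_scattered = timeit.timeit(lambda: scattered_sum(a, indices), number=1)
print(f"tight loop:     {t_tight:.3f} s")
print(f"scattered walk: {t_scattered:.3f} s")
```

A benchmark built only from kernels like the first function tells you almost nothing about how the part behaves on workloads dominated by the second.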
One challenge is that algorithms are constantly changing, especially in new areas such as new wireless standards, vision processing, face recognition, voice recognition and so on. Of course, these are exactly the areas where a product can leverage and differentiate by running good software on a well-matched hardware solution. Some of these areas have benchmark suites, but sometimes the algorithms are simply too unstable for benchmarks to have been created yet.
One area in particular where benchmarks are emerging is in vision processing. While not specifically a benchmark, the OpenCV vision processing library contains many common algorithms such as red-eye removal, object recognition, image similarity and so on. An appropriate selection of these algorithms can be used as a representative workload for evaluating a processor subsystem.
Two more benchmark suites, the San Diego Vision Benchmark Suite (SD-VBS) and the Michigan Embedded Vision Benchmark Suite (MEVBench), draw algorithms from a diverse set of application domains. SD-VBS (first published in 2009) includes 28 non-trivial, computationally intensive kernels such as feature tracking, image stitching and texture synthesis. MEVBench (first published in 2011) is built from full algorithms in areas such as virtual reality and, further, contains a subset suitable for mobile embedded vision.
This is obviously a long way from simply counting how many multiply-accumulates a processor can run when it is put in a tight loop. It requires looking at a real-world software load and actually digging down into the power, performance and area (PPA) points that can realistically be implemented in the target process. Anything less risks being completely misleading, leading to picking a processor that is not a good match for the job at hand.