Debugging Subtle Cache Problems

by Paul McLellan on 08-22-2012 at 5:11 pm

When I worked for virtual platform companies, one of the things I used to tell prospective customers was that virtual prototypes were not some second-rate approach to software and hardware development, to be dropped the moment real silicon was available. In many ways they are better than the real hardware, since they offer so much more controllability and observability. An interesting blog entry on Carbon’s website shows this clearly.

A customer was having a problem with cache corruption while booting Linux on an ARM-based platform. They had hardware, but they couldn’t freeze the system at the exact moment the corruption occurred, and the hardware didn’t give them enough visibility anyway, so they had not been able to find the root cause. Instead they had found a software workaround that at least got the system booted.

Carbon worked with the customer to put together a virtual platform of the system. They produced a model of the processor using Carbon IP Exchange (note that this did not require the customer to have detailed knowledge of the processor RTL). They omitted all the peripherals not needed for the boot, which also meant there was no driver initialization code to run, speeding and simplifying things. Presumably the problem was somewhere in the memory subsystem, and there was RTL for the L2 cache, AHB fabric, boot ROM, memory controller, and parts of their internal register/bus structure. This was all compiled with Carbon Model Studio to a 100% accurate model that could be dropped into the platform.

And when they booted the virtual platform, the cache problem indeed showed up. When you are having a debugging party it is always nice when the guest of honor deigns to show up. Nothing is worse than trying to fix a Heisenbug, where the problem goes away as soon as you add instrumentation to isolate it.

By analyzing waveforms on the Advanced High-performance Bus (AHB), the cache memory, and the memory controller, along with the assembly code (all views that Carbon’s SoC Designer provides), they isolated the problem. It turned out that the immediate problem was that a read was taking place from memory before the write to that memory location had been flushed from the cache, thus picking up a stale value. In turn, the real reason was that a write-through to the cache (which should write the cache and also write through to main memory) had been wrongly implemented as a write-back (which goes only to the cache immediately and gets written to main memory only when that line in the cache is flushed). With that knowledge it was easy to fix the problem.
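To make the difference between the two write policies concrete, here is a minimal Python sketch of a toy cache sitting in front of main memory. It is not the customer’s design or anything from Carbon’s tools: the `ToyCache` class, the `read_bypassing_cache` helper, the address and the data values are all made up for illustration, and the reader that goes straight to memory is an assumption about how the stale read could arise.

```python
class ToyCache:
    """Toy single-level cache in front of a dict acting as main memory.

    Hypothetical model for illustration only, not the customer's L2 cache.
    """

    def __init__(self, memory, policy):
        assert policy in ("write-through", "write-back")
        self.memory = memory
        self.policy = policy
        self.lines = {}     # address -> value held in the cache
        self.dirty = set()  # addresses newer in the cache than in memory

    def write(self, addr, value):
        self.lines[addr] = value
        if self.policy == "write-through":
            # Update main memory at the same time as the cache.
            self.memory[addr] = value
        else:
            # Write-back: memory is only updated when the line is flushed.
            self.dirty.add(addr)

    def flush(self):
        # Copy every dirty line out to main memory.
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()


def read_bypassing_cache(memory, addr):
    """Assumed reader that goes straight to memory, not through the cache."""
    return memory[addr]


def demo(policy):
    memory = {0x1000: 0xAA}           # old value in main memory
    cache = ToyCache(memory, policy)
    cache.write(0x1000, 0xBB)         # new value written via the cache
    before_flush = read_bypassing_cache(memory, 0x1000)
    cache.flush()
    after_flush = read_bypassing_cache(memory, 0x1000)
    return before_flush, after_flush
```

With the write-through policy, `demo` sees the new value `0xBB` both before and after the flush. With write-back, the read before the flush still returns the old `0xAA`, which is the kind of wrong value described above.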

They had not been able to fix this problem using real hardware because they couldn’t see enough of the hardware/software interaction, bus transactions and so forth. But with a virtual platform, where the system can be run with fine granularity, complete accuracy and full visibility, this type of bug can be tracked down. As Eric Raymond said about open source, “given enough eyeballs, all bugs are shallow.” Well, with enough visibility all bugs are shallow too.

More detail about this story is on the Carbon blog.

