I wrote back in March about Undo Software. They have a reverse debugging solution called UndoDB (the DB is for debug, not database). I have had a soft spot for reverse debugging ever since I watched one of the engineers at Virtutech type reverse single step, saw the code back up a single instruction, and realized that literally months of my career could have been saved if I had had the same capability. Undo have a similar capability for Linux running on x86 and ARM. Of course, under the hood it doesn’t actually run the code backwards. Instead, it regularly saves a snapshot of the running program and records all the inputs. To step backwards, it restores an earlier snapshot and re-runs the code with the same inputs, stopping just short of the current point. Because this is almost instantaneous, it gives the appearance of code running backwards.
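To make the snapshot-and-replay idea concrete, here is a toy sketch in Python. It is not how UndoDB is implemented; it only illustrates the principle for a deterministic step function. The names `Recorder`, `SNAPSHOT_INTERVAL`, `record`, `goto`, `reverse_step` and `running_total` are all made up for this illustration.

```python
import copy

SNAPSHOT_INTERVAL = 4  # hypothetical choice: snapshot every 4 steps


class Recorder:
    """Toy snapshot-and-replay recorder for a deterministic step function."""

    def __init__(self, initial_state, step_fn):
        self.step_fn = step_fn   # step_fn(state, value) -> new state
        self.state = copy.deepcopy(initial_state)
        self.inputs = []         # every input consumed so far, in order
        self.snapshots = {0: copy.deepcopy(initial_state)}  # step -> state
        self.position = 0        # number of steps the current state reflects

    def record(self, value):
        """Run one live step forward, logging the input it consumed."""
        if self.position != len(self.inputs):
            raise RuntimeError("go to the end of the recording before recording more")
        self.inputs.append(value)
        self.state = self.step_fn(self.state, value)
        self.position += 1
        if self.position % SNAPSHOT_INTERVAL == 0:
            self.snapshots[self.position] = copy.deepcopy(self.state)

    def goto(self, target):
        """Reach step `target` by restoring a snapshot and replaying inputs."""
        if not 0 <= target <= len(self.inputs):
            raise ValueError("target is outside the recorded history")
        # Latest snapshot taken at or before the target step.
        base = max(s for s in self.snapshots if s <= target)
        state = copy.deepcopy(self.snapshots[base])
        # Re-run forwards with the same inputs, stopping at the target.
        for value in self.inputs[base:target]:
            state = self.step_fn(state, value)
        self.state, self.position = state, target

    def reverse_step(self):
        """Appear to run one step backwards."""
        self.goto(self.position - 1)


def running_total(state, value):
    """Example deterministic program step: accumulate a total."""
    return {"total": state["total"] + value}
```

Feeding a `Recorder` a few inputs with `record` and then calling `reverse_step` lands on exactly the state the program had one step earlier, even though nothing ever ran backwards. The snapshot interval is the trade-off: more frequent snapshots cost memory but shorten the replay needed for each backwards step.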
Reverse debugging is a nice productivity kick for day-to-day programming. But it is especially powerful for tracking down the really difficult bugs where the source of the bug and the detection of the bug are a long way apart and where the source is completely non-obvious. For example, a data structure has been corrupted by being overwritten, and the code that did the overwriting might be anywhere. This is especially difficult in systems such as mobile phones, where the code doesn’t run the same way every time and errors can be very intermittent, or in tools like simulators that take in one language and turn it into binary code dynamically, which typically means that many software tools such as static analysis don’t work.
One company that has been making use of UndoDB is one that you are very familiar with: Cadence, in their advanced verification solutions (AVS) business unit. Synopsys and Mentor are also customers, apparently. Cadence AVS used the tool internally to track down hard-to-find bugs. But they also needed to be able to debug code on customer sites. Due to the crown-jewel nature of a lot of semiconductor designs, the designs are often not allowed to leave the customer’s own network and servers. Undo have an option called Out-and-about that can easily be run on customers’ servers, so Cadence could use UndoDB there too.
Over to Jonathan DeCock, a senior software architect at Cadence: “Our engineer had spent months struggling to try to track down the problem. It only struck in 1 in 300 runs, making finding it like looking for a needle in a haystack. We’d been using GDB, but that didn’t let us see what had caused the problem, as when the code failed we were so far past the point of failure that we couldn’t find the source of the bug.”
They set up a 20-machine server farm on the customer site, running multiple copies of the tool 24×7 until the problem struck.
DeCock again: “As soon as the code failed we got experts on the line and stepped backwards and forwards line-by-line using UndoDB. We found the bug in three hours, and it then took just two hours to solve, which was a huge win after three months of searching using other methods. Given its nature, we simply couldn’t have found it through source-code analysers, as it was generated within dynamic code.”
That was my experience at Virtutech’s customers too. Bugs that once might have taken months to track down (or, in some cases, had already taken months) were solved in minutes. You just don’t have those “how on earth did that happen?” problems.
UndoDB works on ARM and x86 (32- and 64-bit) processors, on the Linux and Android operating systems, and with any language supported by GDB, most notably C and C++.
Undo Software have a case study: Cadence Design Systems, Finding Customer-Critical Bugs with UndoDB. You can download the case study here.