It is self-evident that large systems of any type would not be possible without hierarchical design. Decomposing a large system objective into subsystems, and subsystems of subsystems, has multiple benefits. Smaller subsystems can be more easily understood and better tested when built, robust 3rd-party alternatives may be available for some subsystems, large systems can be partitioned among multiple design teams, and complete system implementation can (in principle) be reduced to assembly of finished or nearly finished subsystems.
But what makes for an optimal implementation doesn’t always align well with the partitioning that best served the purposes of logic design. Physical design teams have known this for a long time and have driven physical tool vendors to add many enhancements in support of:
· Adjusting logic partitioning to better balance sizes for physical units, while also minimizing inter-block routing to reduce demand on top-level routing resources
· Reducing delays in long inter-block signal routes with block feedthrus
· Duplicating high-fanout ports or even logic to reduce congestion
These methods worked well and still do, to some extent, but they paper over a rather obvious problem. The burden of resolving mismatches between logic and physical structure falls entirely on the physical design team. Yet the line between logical and physical design is more blurred than it used to be, which increases the likelihood of iteration between these phases and therefore of repeated effort and delay in re-discovering optimal implementation strategies on each iteration. In a climate of aggressive shift-left to minimize time to market, and with increasing cost-sensitivity disallowing any sub-optimal compromises, this approach to bridging the logic/implementation divide is not moving in the right direction.
For those who don’t understand why logical and physical design have become so entangled, here’s a brief recap of a few examples. I’ve mentioned before the effects of low-power structure. Similar power islands may appear in widely separated parts of the logic hierarchy, yet there are obvious area and PG routing benefits to combining such logic into a single power island. But this restructuring can’t simply be moved to physical design, because changes like this must also be reflected in the RTL netlist and power intent for functional/power verification. Or think about MBIST insertion. It would be impossibly expensive to require one MBIST controller per memory in a design containing thousands of memories, so controllers are shared between memories. But the best sharing strategy depends heavily on the floorplan, and changing the strategy obviously affects the RTL netlist and DFT verification. Or think of a safety-critical design in which a better implementation suggests duplicating some logic. If that logic has been fault-injection tested, it’s not clear to me that it can simply be duplicated in implementation without being re-verified in fault-testing.
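To make the MBIST point concrete, here is a minimal sketch of why controller sharing tracks the floorplan. Everything in it is an assumption for illustration: the function name `assign_mbist_controllers`, the tile-based binning, the per-controller limit and the coordinates are all hypothetical, and real DFT tools use far richer criteria (clock domains, memory types, test time). The only point is that moving one memory changes the sharing plan, and with it the netlist.

```python
from collections import defaultdict

def assign_mbist_controllers(memories, tile, max_per_controller):
    """Group memories onto shared controllers by floorplan region.

    memories: dict of memory name -> (x, y) placement (hypothetical units).
    tile: edge length of the square region served by one controller group.
    Returns a dict of controller name -> list of memory names.
    """
    bins = defaultdict(list)
    for name, (x, y) in sorted(memories.items()):
        bins[(int(x // tile), int(y // tile))].append(name)

    controllers = {}
    for (bx, by), names in sorted(bins.items()):
        # Split a crowded region across several controllers.
        for i in range(0, len(names), max_per_controller):
            ctrl = f"mbist_{bx}_{by}_{i // max_per_controller}"
            controllers[ctrl] = names[i:i + max_per_controller]
    return controllers

# Two floorplans that differ only in where mem3 is placed.
floorplan_a = {"mem0": (10, 10), "mem1": (20, 15), "mem2": (900, 900), "mem3": (30, 40)}
floorplan_b = dict(floorplan_a, mem3=(880, 910))

plan_a = assign_mbist_controllers(floorplan_a, tile=500, max_per_controller=2)
plan_b = assign_mbist_controllers(floorplan_b, tile=500, max_per_controller=2)
```

In this toy case the first floorplan needs three controllers and the second needs two, so a late floorplan change ripples back into the controller instances and connections that DFT verification has to see.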
The obvious solution is to hand over more of this “coarse-grained” restructuring to logic design, leaving fine-grained tuning to the implementation team. This view has already gained traction in several design houses. The challenge, though, is that manually restructuring an RTL netlist can be very expensive in both engineering resources and time. Unfortunately, hierarchy in this case is not our friend. Moving blocks around a hierarchy looks easy in principle, but maintaining all the right connections (rubber-banding connections) while not accidentally making incorrect connections (through naming collisions, for example) is a lot harder, especially in modern SoC designs where some blocks you want to move may have hundreds or even thousands of connections.
Which makes this task a natural for automation. The objective is complex but mechanical. Restructuring, as one example, requires large numbers of ports and nets to be added, changed or deleted in a systematic way that avoids accidental wire-ORs. Intelligent decisions need to be made on whether fanins/fanouts should be consolidated inside a block or outside (there should be some user control over this), and there should be strategies for handling tieoffs and opens. And at the end of it all, the modified netlist should still be human-readable. You would also like to see some level of changes reflected in constraint files like UPF and SDC. Probably these would still need designer cleanup to accurately reflect modified intent, but they should be a good running start.
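To give a feel for the mechanical part, here is a toy sketch of one restructuring step: grouping a few sibling instances under a new wrapper. It is not how STAR works internally; the `Module` class, `group_instances` and the naming scheme are hypothetical, and the model ignores pin directions, buses, tieoffs and opens. It does show the two chores described above: punching a port for every net that crosses the new boundary, and picking port names that cannot collide and silently short two nets together.

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    name: str
    nets: dict = field(default_factory=dict)   # net name -> set of (instance, pin)
    ports: set = field(default_factory=set)    # net names that are also module ports

def unique_name(base, taken):
    """Return base, or base_1, base_2, ... whichever is first not in taken."""
    name, i = base, 0
    while name in taken:
        i += 1
        name = f"{base}_{i}"
    return name

def group_instances(parent, members, wrapper_inst):
    """Move `members` of `parent` under a new wrapper instance (edits parent in place)."""
    wrapper = Module(name=f"{wrapper_inst}_wrap")
    for net in sorted(parent.nets):
        endpoints = parent.nets[net]
        inside = {ep for ep in endpoints if ep[0] in members}
        outside = endpoints - inside
        if not inside:
            continue  # net never touches the group
        if not outside and net not in parent.ports:
            # Fully internal net: it moves into the wrapper with the instances.
            wrapper.nets[unique_name(net, wrapper.nets)] = inside
            del parent.nets[net]
        else:
            # Boundary net: punch a port, named after a pin it drives or reads.
            port = unique_name(sorted(inside)[0][1], wrapper.nets)
            wrapper.ports.add(port)
            wrapper.nets[port] = inside
            parent.nets[net] = outside | {(wrapper_inst, port)}
    return wrapper

top = Module("top", ports={"clk", "rst_a", "rst_b"}, nets={
    "clk":   {("u_a", "clk"), ("u_b", "clk"), ("u_c", "clk")},
    "n1":    {("u_a", "out"), ("u_b", "in")},
    "n2":    {("u_b", "out"), ("u_c", "in")},
    "rst_a": {("u_a", "rst")},
    "rst_b": {("u_b", "rst")},
})
wrapper = group_instances(top, members={"u_a", "u_b"}, wrapper_inst="u_grp")
```

Note that `u_a` and `u_b` both have a pin called `rst` on different nets. A naive scheme that named both wrapper ports `rst` would merge two separate resets, so the second one becomes `rst_1` here. Now scale that bookkeeping to a block with thousands of connections.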
Sounds like magic? DeFacto offers these capabilities as a part of their STAR platform. In fact, they have been doing this in production for a while and cite some fairly compelling benchmark stats to support their claims. In one example, on a subsystem containing about 4K block instances, manual restructuring by a customer took 12 man-months, followed by 3 man-months to verify/correct the changed design against the original. Using STAR, the same restructuring was completed in 1.5 hours (3.5 hours for bit-blasted nets) and verification was an error-free run through equivalence checking. This flow has also been used to restructure gate-level netlists up to 10M instances (65M gates).
There’s the usual problem getting customer testimonials but a couple of organizations stepped up. Socionext in Japan stated that they saved up to 3% of die area by manipulating one of their designs in gates using STAR. They added that if they had pushed harder, they felt they could have got up to 10% area saving, which is a pretty massive claim. Marvell didn’t share stats but they did say that they had built a cost-effective IP integration and design restructuring system for large SoC designs at RTL. I happen to know that Marvell have been working on solutions of this type for years, so it’s impressive that they finally settled on STAR.
I mentioned restructuring was a part of the STAR platform. More generally, this platform can be used to build sub-system and SoC top-levels, to inject control fabrics (such as DFT or power management) on top of an existing netlist, or to seamlessly update memory instances, for improved power or performance, through auto-generated wrappers. The platform supports a wide variety of design inputs: RTL of all flavors, IP-XACT, Excel, JSON (believe it or not) and more. It’s also scriptable through Tcl, Python and other languages. You can learn more from DeFacto’s webinar on restructuring HERE.