We have been hearing for some time about the Synopsys HAPS-70 and how Synopsys co-created its hardware and software architecture for FPGA-based prototyping with customers. Now Synopsys has published details on how it collaborated with Imagination on the design of the PowerVR Series6XT GPU.
The first thing to come to grips with is just what a beast the PowerVR Series6XT GPU is. With up to eight unified shader clusters and an array of diverse co-processor units, testing every configuration and the concurrent execution of IP blocks pre-silicon is a tall order. As designs grow larger, the danger is making an error when partitioning the design across the FPGAs of a prototype. This hazard multiplies when customers put the PowerVR Series6XT GPU into their own designs alongside other IP.
Synopsys and Imagination worked together to partition a basic two-shader cluster, part of the GPU logic, and test logic that synchronized stimuli stored in DDR3 memory and provided a connection to a PC host. This spanned four Virtex-7 FPGAs on a HAPS-70 S48. Most of the two-week manual effort went into iterating the partitioning to find the right combination of logic placement and I/O multiplexing. The result was a prototype running at 8 MHz, which ran 7,000 regression tests successfully, all pre-silicon.
When the team attempted to scale up to the full Series6XT GPU design, it became evident that the test logic, which already consumed 90% of an FPGA in the initial prototype, would soon exceed the capacity of a single device. The logical choice would be to repartition again, but I/O multiplexing under the "manual" synthesis rules would cut system performance to 2 MHz. That would make evaluating the full-up GPU with live video output excruciatingly slow.
Automation came to the rescue. ProtoCompiler can synthesize code against HAPS-aware constraints, including the interconnect. The teams raised the FPGA count to six, set constraints that included capping FPGA utilization at 80%, and selected a pin-muxing strategy. Using the abstraction flow feature to explore FPGA-to-FPGA interconnect options quickly, typically in less than a minute, ProtoCompiler picked the best multiplexing ratio. The result was a full-up prototype capable of live video analysis that fit in five FPGAs rather than six and ran at 7.3 MHz.
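The article does not say how ProtoCompiler chooses a multiplexing ratio, but the underlying trade-off can be illustrated. The following is a minimal Python sketch under simplified, hypothetical assumptions: each partition candidate is reduced to the number of signals cut between FPGAs and the number of inter-FPGA pins available, the mux ratio is the smallest integer that fits those signals onto the pins, and the system clock is estimated as the pin transfer clock divided by the ratio plus a fixed overhead. The function names, the overhead term, and all numbers are illustrative only; this is not ProtoCompiler's algorithm or API.

```python
import math

# Hypothetical model of pin multiplexing; NOT ProtoCompiler's actual algorithm.

def mux_ratio(cut_signals: int, pins: int) -> int:
    """Smallest integer ratio that fits cut_signals onto the available pins."""
    if pins <= 0:
        raise ValueError("need at least one inter-FPGA pin")
    return max(1, math.ceil(cut_signals / pins))

def system_mhz(transfer_mhz: float, ratio: int, overhead: int = 2) -> float:
    """Rough estimate: one system cycle costs `ratio` transfer cycles
    plus a fixed (assumed) overhead for synchronization."""
    return transfer_mhz / (ratio + overhead)

def best_candidate(candidates: dict, transfer_mhz: float, overhead: int = 2):
    """Score each candidate partition (name -> (cut_signals, pins)) and
    return the one with the highest estimated system clock."""
    if not candidates:
        raise ValueError("no candidate partitions given")
    scored = {}
    for name, (signals, pins) in candidates.items():
        r = mux_ratio(signals, pins)
        scored[name] = (r, system_mhz(transfer_mhz, r, overhead))
    best = max(scored, key=lambda n: scored[n][1])
    return best, scored[best]

# Two hypothetical partitions sharing 600 inter-FPGA pins.
name, (ratio, mhz) = best_candidate({"A": (2400, 600), "B": (1500, 600)}, 100.0)
print(name, ratio, mhz)  # B 3 20.0
```

In this toy model, a partition that cuts fewer signals needs a lower mux ratio and therefore supports a faster system clock, and a faster pin transfer clock (the role a high-speed multiplexing scheme would play) raises the estimate for any given ratio.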
One more performance tweak would make the difference. With the partitioning settled, the team could turn to optimizing the interconnect. The HAPS-70 supports a high-speed time-domain multiplexing (HSTDM) feature on all connectors, and ProtoCompiler understands how to assign source-synchronous clocks, split multi-source nets, and handle the other details needed to use it. After a day spent exploring HSTDM schemes, full-up performance reached 12 MHz.
This successful effort retains all of the benefits of FPGA-based prototyping. Executing design changes in RTL is quick and easy. A host connection and debug tools provide control of and visibility into both the design and the test environment, enabling sophisticated tests such as video analysis through a compressor/decompressor and frame buffer. The project also shows what a synthesis environment with detailed knowledge of the prototyping platform can achieve.
Synopsys published these results in a presentation at the SNUG Japan sessions in September 2014 and in a short article in the 4Q2014 edition of Synopsys Insight (page 7). The author, Andy Jolley of Synopsys, who worked directly with the Imagination teams, will present a live webinar on his findings on February 4, 2015; registration for the event is now open.
Whether you are looking to use the PowerVR Series6XT GPU or simply facing a design of similar complexity, the lessons learned from this development are worth a look.