Timing Closure Complexity Mounts at FinFET Nodes

by Tom Simon on 01-27-2017 at 7:00 am

Timing closure is the perennial issue in digital IC design. The specific obstacles to closing timing have changed continuously over the decades, but the problem has always loomed large. It has become more severe in 16/14nm FinFET SoCs because of greater distances between IP blocks, higher performance requirements and lower drive voltages. It will only get worse in 10nm and 7nm SoCs.

By today’s standards, early timing closure challenges seem quaint. Initially, on-chip delays were dominated by gate delays. Later, as process nodes shrank, wire delays became the main factor. Wires grew longer and thinner and took on higher aspect ratios. The taller, thinner wires exhibited increased capacitive and coupling delays, aggravated by resistive shielding.

Still, designers were able to address these issues with logic changes, buffer insertion and clock tree optimization. For many years clock tree synthesis (CTS) was neglected by the major P&R vendors. Around 2006 Azuro shook up the CTS market, delivering big gains in performance along with reductions in area and power through its improved CTS. Cadence later acquired Azuro, and Synopsys is now paying attention to improving CTS as well. Concurrent logic and clock optimization has brought big changes.

But timing closure problems occur not only inside P&R blocks but also between them. Within blocks it is often possible to avoid multi-cycle paths. Connections between blocks at 28nm and below, however, are not so easy to deal with. According to Arteris, with a clock running at 600MHz (a period of about 1.67ns), you can reasonably expect ~1.42ns of usable time per clock cycle. Assuming a transport delay of 0.63 ns/mm, a signal can cover only about 2.2mm before registers need to be inserted into the data line. And in most 28nm SoCs, a large number of paths are longer than 2.2mm.
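The arithmetic behind that 2.2mm figure is easy to reproduce. The sketch below is only a back-of-the-envelope model, not anything from Arteris: it takes the usable cycle time and the per-millimeter transport delay quoted above as given, and the function names, the register-count helper and the 5.0mm example path are all hypothetical, introduced here for illustration.

```python
import math

# Figures quoted in the article (attributed to Arteris).
CLOCK_MHZ = 600.0
USABLE_NS = 1.42          # usable time per clock cycle
DELAY_NS_PER_MM = 0.63    # assumed wire transport delay


def clock_period_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / clock_mhz


def max_reach_mm(usable_ns: float, delay_ns_per_mm: float) -> float:
    """Longest wire a signal can traverse within the usable cycle time."""
    return usable_ns / delay_ns_per_mm


def pipeline_registers_needed(path_mm: float, reach_mm: float) -> int:
    """Registers to insert so that no segment of the path exceeds reach_mm.

    Hypothetical helper: splits the path into ceil(path / reach) segments,
    which requires one fewer register than there are segments.
    """
    segments = math.ceil(path_mm / reach_mm)
    return max(0, segments - 1)


reach = max_reach_mm(USABLE_NS, DELAY_NS_PER_MM)
print(f"period: {clock_period_ns(CLOCK_MHZ):.2f} ns")
print(f"reach:  {reach:.2f} mm")
# 5.0 mm is an illustrative path length, not a figure from the article.
print(f"registers for a 5.0 mm path: {pipeline_registers_needed(5.0, reach)}")
```

With the quoted numbers the reach comes out at roughly 2.25mm, which the article rounds to 2.2mm, and under the same assumptions a 5.0mm path would need two inserted registers. The model treats delay as strictly linear in distance and assumes register overhead is already folded into the usable cycle time, so it is a first-order estimate at best.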

Improving timing involves a myriad of decisions, with designers torn between a huge number of trade-offs. Low-threshold gates are faster but can wreak havoc with power budgets. Likewise, adding pipeline stages to the interconnect between major blocks must be weighed carefully because of power and resource limitations. Ideally, chip architects can look ahead and anticipate the timing issues that will arise later in the flow, as chip assembly takes place. However, this does not always work out as planned. The burden often falls to the back-end place-and-route team to rectify unaddressed timing closure issues.

Furthermore, when top-level timing closure issues are identified late in the flow, they can necessitate iterations back to the front-end team, causing massively expensive delays. The history of digital IC design is filled with innovations to deal with timing closure. Early placement and routing tools were the first tools used to address timing issues. They were quickly followed by floor planning tools. These floor planning tools were very good at estimating IP block parameters, but not so good at optimizing the placement of the interconnect between the IP blocks.

The designs most prone to difficult timing issues are large SoCs at advanced nodes. Their complexity has grown explosively. For timing closure within blocks, history has shown that linkages with the front end can help back-end tools do their job better. The same is likely to hold for connections between blocks.

In fact, over the last couple of years we have seen increasingly sophisticated approaches to top-level interconnect in SoCs. One example is the adoption of Network on Chip (NoC) technology to make connections between blocks more efficient, reducing area and offering higher performance at lower power. Arteris, a leading provider of Network on Chip technology, has recently hinted that NoC may be key to further improvements in top-level timing closure.

The largest SoCs, CPUs and GPUs are scaling up in size dramatically. The upper bounds have reached 10 to 15 billion transistors. Timing closure in these designs is paramount. However, the scale of the problem has moved beyond the ability of any one part of the flow to provide a comprehensive solution. Front-to-back integration will be essential. I predict that 2017 will prove to be a pivotal year for solutions to timing closure in SoCs.

