
Getting Physical to Improve Test – White Paper

by Tom Simon on 08-26-2020 at 6:00 am


One of the most significant and oft-repeated trends in EDA is the use of layout information to help drive other parts of the design flow. This has happened with simulation and synthesis, among other things. Of course, we think of test as a physical operation, but test pattern generation and sorting have been netlist-based operations. However, just as it was instrumental in other domains, physical information has now been shown to greatly assist with test pattern generation and selection.

In a very interesting white paper released by Mentor, a Siemens business, on the topic of Critical Area Based Test Pattern Optimization for High Quality Test, the authors Ron Press and Andreas Glowatz discuss how physical information from the design can help predict which patterns are going to effectively find the most likely faults. Mentor uses what they call Total Critical Area (TCA) to help assess the likelihood of certain faults occurring. With this information patterns can be gauged based on their effectiveness in reducing defects per million (DPM).

ATPG test patterns have the goal of detecting every possible fault in a design. The truth is that while it is possible, it is not practical. So, test teams spend enormous amounts of time trying to decide what patterns to use. The Mentor paper points out that even if you can detect an extremely high percentage of possible faults, the ones you miss might be the most common. To remedy this, they are looking at the geometry associated with potential faults to assign a priority to them. For instance, the likelihood of an interconnect bridge depends on a number of parameters. The diagram below from the paper shows how TCAs are calculated for their example.

Calculating Total Critical Area
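To make the geometry dependence concrete, here is a minimal sketch of a critical-area estimate for a bridge between two parallel wires. It is not the formula from the Mentor paper, which is not reproduced in this post. It assumes the classic textbook model in which a circular defect of diameter x bridges two wires with spacing s whenever x > s, and defect sizes follow an inverse-cube distribution above a minimum size x0. Under those assumptions the size-weighted critical area works out to L * x0^2 / s for a parallel run length L. The function and parameter names are hypothetical.

```python
def bridge_weighted_critical_area(run_length_um, spacing_um, min_defect_um):
    """Size-weighted critical area (um^2) for a bridge between two parallel wires.

    Assumptions (not from the white paper):
      * a defect of diameter x shorts the wires when x > spacing, giving a
        critical area of run_length * (x - spacing) for that defect size
      * defect sizes follow the density 2 * x0^2 / x^3 for x >= x0
    Integrating run_length * (x - s) * 2 * x0^2 / x^3 from s to infinity
    gives run_length * x0^2 / s when s >= x0.
    """
    if spacing_um < min_defect_um:
        raise ValueError("closed form assumes spacing >= minimum defect size")
    return run_length_um * min_defect_um ** 2 / spacing_um


# Two candidate bridges: same run length, different spacing.
tight = bridge_weighted_critical_area(100.0, 0.1, 0.05)
loose = bridge_weighted_critical_area(100.0, 0.2, 0.05)
print(f"tight spacing: {tight:.3f} um^2, loose spacing: {loose:.3f} um^2")
```

In this simplified model, halving the spacing doubles the weighted critical area, which is the kind of relationship that lets a tool rank one potential bridge as more likely than another.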

Mentor’s methodology not only looks at interconnect but looks at cell internals and interactions between adjacent cells. The standard cell library is analyzed by Calibre to produce cell aware models for faults. LEF/DEF data is used to add information about potential interconnect faults. All this is combined to produce the User Defined Fault Model (UDFM) which is design specific. With the UDFM, Mentor’s Tessent TestKompress can help produce the optimal test patterns to find the most important faults and reduce DPM.

The white paper does a good job of explaining each of the cell aware defect types that are modeled as cell-internal defects. In addition, they summarize interconnect fault types and inter-cell bridge defects. Taken together these new fault models are referred to as automotive-grade ATPG. This acknowledges the much higher fault detection rates that they make possible. There is also a short section on small delay defects and how they are handled. The paper also explains the command sequence that would be used to load and sort the pattern set to optimize fault detection based on TCAs.
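The idea of sorting a pattern set by critical area can be sketched as a greedy ordering: repeatedly pick the pattern that adds the most not-yet-covered TCA. This is only an illustration of the concept. It is not the Tessent command sequence from the paper, and the fault names, pattern names and TCA values below are made up.

```python
def sort_patterns_by_tca(pattern_faults, fault_tca):
    """Greedily order patterns by the additional critical area each one covers.

    pattern_faults: dict mapping pattern name -> set of fault names it detects
    fault_tca:      dict mapping fault name -> critical area weight
    Returns a list of (pattern, incremental_tca); patterns that add nothing
    new are dropped.
    """
    remaining = dict(pattern_faults)
    covered = set()
    ordered = []

    def gain(pattern):
        # Critical area of the faults this pattern would newly cover.
        return sum(fault_tca[f] for f in remaining[pattern] - covered)

    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=gain)
        best_gain = gain(best)
        if best_gain == 0:
            break
        ordered.append((best, best_gain))
        covered |= remaining.pop(best)
    return ordered


# Hypothetical data for illustration only.
fault_tca = {"bridge_n1_n2": 5.0, "open_n3": 1.0, "cell_int_u7": 2.5}
pattern_faults = {
    "p0": {"open_n3"},
    "p1": {"bridge_n1_n2", "open_n3"},
    "p2": {"cell_int_u7"},
}
print(sort_patterns_by_tca(pattern_faults, fault_tca))
```

With these inputs, p1 comes first because it covers the largest critical area, p2 follows, and p0 is dropped because its only fault is already covered by p1.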

TCA offers an innovative and rational system for weighted test pattern selection and sorting to help achieve the lowest DPM. Mentor continues to innovate in their test products. They have been a leader in the area for a long time and continue to show that they are investing to ensure that their leadership is maintained. The white paper has a comprehensive appendix showing the details of the critical area reporting. The body of the paper goes into more detail than can be covered here. If you are interested in learning more about the application of TCA, the white paper is available for download and reading from the Mentor website.


Xilinx Moves from Internal Flow to Commercial Flow for IP Integration

by Daniel Payne on 08-25-2020 at 10:00 am


I’ll never forget first learning about Xilinx when they got started back in 1984, because the concept of a Field Programmable Gate Array (FPGA) was so simple and elegant: rows and columns of logic gates that a designer could program to perform any logic function, then connect to IO pads to drive other chips on the board. At first their chips were used to gather up all of the glue logic found on a printed circuit board, which amounted to hundreds of logic gates. Today the situation is drastically different: with the new Adaptive Compute Acceleration Platform (ACAP) architecture they have managed to pack 50 billion transistors on a single die with an impressive IP library.

Xilinx has figured out how to offer SoC designers lots of features through IP blocks, and the fastest-growing market segments are loving it: AI, ADAS, IoT, Cloud computing, data center. Designers now have access to a wide range of configurable IP like:

  • Application processors, ARM cores
  • Real time processors
  • High Bandwidth Memory (HBM)
  • RF, ADC, DAC
  • SerDes
  • Programmable IO: CCX, DDR
  • Network-On-Chip
  • Programmable Logic

Nitin Navale from Xilinx presented at DAC on the topic, “IP Integration Challenges of Domain-Specific Architectures on Programmable SoC Platforms”, so I watched the 22-minute presentation and summarized the major findings in this blog.

Near the end of an SoC project all of the physical IP needs to be integrated correctly, quickly and with efficient use of disk storage. The top level of a Xilinx chip is constructed in a hierarchical fashion, but with different layout styles, such as custom layout and Place & Route (P&R). In addition, each IP block is either developed internally or provided by a third party. Data formats for IP blocks can be GDS II, LEF/DEF or the newer Oasis format.

Here’s a diagram showing how P&R blocks, custom blocks and 3rd party blocks can be assembled in a complex hierarchy to form a Xilinx SoC.

An IP block may contain P&R regions and some Custom, or P&R and some 3rd party, or Custom with P&R blocks, so lots of permutations are possible. It’s important that consistent versions of an IP are maintained across the entire hierarchy, so no mismatching versions can be allowed.

As an IP block has been verified clean and is no longer changing, it gets marked as Golden, so no further revisions are accepted. Having a QA methodology for checking and preserving IP integrity is a requirement, because getting to market with first-pass silicon success is what keeps a company profitable.

Xilinx formerly used an internal flow for IP integration, but now they prefer using a commercial EDA tool flow. Using the previous hierarchy diagram as a basis, consider how we want to trace the chip hierarchy in order to identify all IP layout data to build a merge list:

  1. Retrieve IC layouts for all top-level blocks
  2. Review each layout for undefined sub-blocks
  3. Recurse on undefined sub-blocks

This diagram shows how we start out at the top-level and identify Block A, recurse into it and find Blocks B, C and D. Recurse into Block D and find the lowest leaf cells.

Custom blocks are defined using OpenAccess, so the integration task is to stream out the GDS II from OA. Most 3rd party designs are stored on Unix as a GDS II file, so integration requires retrieving the Unix path. A P&R block could be saved either as Oasis or GDS II, so a custom script makes that decision.

In step 2 they use another utility to recurse and find any undefined sub-blocks. In the example both Block B and Block C are defined as P&R blocks, while Block D is made up of 3rd party IP.
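The three-step trace can be sketched as a simple depth-first walk. This is only an illustration, not the Xilinx utility: the hierarchy table, the leaf names and the source kinds assigned to the top level and Block A are assumptions made up for the example.

```python
# Hypothetical hierarchy: block -> (layout source kind, sub-blocks).
# Blocks A-D follow the example in the text; TOP, LEAF1 and LEAF2 and the
# kinds assigned to TOP and A are invented for illustration.
HIERARCHY = {
    "TOP": ("custom", ["A"]),
    "A": ("custom", ["B", "C", "D"]),
    "B": ("pnr", []),
    "C": ("pnr", []),
    "D": ("third_party", ["LEAF1", "LEAF2"]),
    "LEAF1": ("third_party", []),
    "LEAF2": ("third_party", []),
}


def build_merge_list(top, hierarchy):
    """Walk the hierarchy from `top` and list every layout exactly once."""
    merge_list = []
    seen = set()

    def visit(block):
        if block in seen:            # shared IP is only merged once
            return
        seen.add(block)
        kind, sub_blocks = hierarchy[block]
        merge_list.append((block, kind))
        for sub in sub_blocks:       # step 3: recurse on sub-blocks
            visit(sub)

    visit(top)
    return merge_list


for block, kind in build_merge_list("TOP", HIERARCHY):
    print(f"{block:6s} {kind}")
```

A real flow would replace the `kind` string with the action described above for each source: stream out GDS II from OpenAccess, look up a Unix path, or pick between Oasis and GDS II.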

This methodology continues for all blocks on the Xilinx chip, identifying between 40 and 70 layout files, so finally it’s time to merge the data. The internal CAD tool flow required two steps, creating a very large intermediate GDS II file, then converting that GDS II into the final Oasis file to satisfy the foundry. Problems with the internal tool flow:

  • Slow run times
  • Disk intensive
  • Cannot mark a layout as Golden

The new tool adopted at Xilinx for IP merge integration is called Skipper, and it’s from Empyrean. There were several reasons that Skipper was so attractive:

  • Fastest speed for both GDS II and Oasis viewing
  • Fast IP merging, 5-10X speedup
  • One step method, instead of two
  • Smaller disk space usage
  • Integrated with 3rd party DRC/LVS tools, easy to debug
  • Speedy LVL runtimes

The proof of these improvements is shown in a comparison of both runtime and disk usage on three test cases:

Runtime speed improvements range from 3.6X to 11.5X, while disk usage is quite impressive at 31X to 42.5X smaller when using Skipper.

Summary

A 50 billion transistor programmable SoC from Xilinx is a dazzling achievement because of the sheer engineering challenges, and the CAD team has moved from an internal IP integration methodology to a commercial tool flow using Skipper from Empyrean. Both run time improvements and disk efficiency metrics were quite convincing to make the switch. Watch the entire 22 minute video on YouTube.



Netlist CDC. Why You Need it and How You do it.

by Bernard Murphy on 08-25-2020 at 6:00 am


The most obvious question here is “why do I need netlist CDC?” A lot of what you’re looking for in CDC analysis is really complex behaviors, like handshakes between different clock domains, correct gray coding in synchronizing FIFOs, eliminating quasi-static signals and the like. Deeply functional, system-level intent stuff. How on earth could you deal with that at the netlist level? Not to mention that you’re already battling massive levels of false negatives. Wouldn’t these be a thousand times worse in a netlist design? Well yes, but – synthesis, ECOs, CTS and the like can mess up a lot of details relevant to CDC, during implementation. Which is why you also have to run CDC on netlists. The question then is – how?

Yes, you still have to do RTL-based CDC

First, I don’t believe anyone is claiming you can ignore CDC analysis at RTL and start from scratch on a netlist. RTL CDC is still essential to do all that functional/intent-based analysis, and to build up lists of exceptions and waivers. Because you’re going to need those controls and constraints for your netlist-based CDC. You’re also going to have to be careful with what netlist names you use in those controls. For example, not using names that may disappear in synthesis. The constraints and controls you develop in this RTL-level analysis will then be the starting point for netlist-based CDC.

Design changes in implementation

Second, what does implementation change that requires re-running CDC? Quite a lot. From a big-picture behavior point of view it’s low-level stuff, though still very important to CDC correctness. Some of these spring from optimizations in synthesis, for example design restructuring to optimize timing. Flip-flops move around in paths, causing CDC-correct pre-synthesis logic to become invalid. Some power related capabilities must be inserted in implementation. Some DFT choices are floorplan sensitive (such as MBIST controller sharing), requiring they be updated in implementation. Clock-tree balancing is a very delicate task, continuing to evolve in implementation, not only to optimize skews but also as clock gating plans evolve.

New CDC problems can also arise through ECOs. A functional, timing or power problem discovered late in implementation may force a logic change in the netlist, one reason we need netlist equivalence checking. Unfortunately EQ doesn’t help with CDC. Going all the way back to the RTL to fix a functional problem is too big a hit to the schedule at this late stage. More changes, more need to recheck CDC.

Glitching

Logic gates can glitch when two or more inputs change at the same time. The output briefly flips then flips back again. Not a big deal when the gate is safely cocooned between two levels of registers. But it can be a very big deal if the gate is controlling a clock or an asynchronous reset around a clock (or reset) domain crossing. These hazards can’t always be fixed at RTL. Synthesis can choose how to implement functions like muxes, for example, if you don’t carefully constrain the choice. The default synthesis selections may be glitch prone. You can’t figure this out until after synthesis.
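A small unit-delay simulation shows how a mux built from discrete gates can glitch. This is a sketch under simplifying assumptions: every gate has the same one-step delay, and the mux is implemented as out = (a AND s) OR (b AND NOT s), which is one possible implementation rather than what any particular synthesis tool would produce.

```python
def simulate_mux(a, b, s_trace):
    """Unit-delay simulation of out = (a & s) | (b & ~s).

    a, b:    constant data inputs (0 or 1)
    s_trace: select input value at each time step
    Returns the value of `out` observed at each time step.
    """
    # Start from the steady state for the first select value.
    s = s_trace[0]
    ns, t1, t2 = 1 - s, a & s, b & (1 - s)
    out = t1 | t2
    trace = []
    for s in s_trace:
        trace.append(out)
        # All gates switch together, each seeing the previous step's values
        # (the right-hand side is evaluated before any assignment happens).
        ns, t1, t2, out = 1 - s, a & s, b & ns, t1 | t2
    return trace


# Both data inputs are 1, so the output should ideally stay at 1
# while the select line falls from 1 to 0.
print(simulate_mux(1, 1, [1, 0, 0, 0, 0]))
```

The output dips to 0 for one step because the path through the inverter is one gate delay longer than the direct path. If that output were driving a clock or an asynchronous reset, the dip would look like a real edge.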

VC SpyGlass for netlist CDC

Synopsys has adapted SpyGlass CDC to be well aligned with these netlist CDC needs. Designers benefit from VC SpyGlass’ ability to support huge designs (1.5 billion+) and from a robust Tcl environment that reduces design setup between implementation and verification flows, including the ability to develop their own design query scripts and generate custom reports. The tool also supports features I’ve mentioned in earlier blogs, such as machine learning root cause analysis (ML RCA) to help generate constraints that reduce noise, and hierarchical flow support to improve

You can learn more from this white paper.

Also Read:

The Big Three Weigh in on Emulation Best Practices

Synopsys Presents SAT-Sweeping Enhancements for Logic Synthesis

DAC Panel – Artificial Intelligence Comes to CAD: Where’s the Data?


Semiconductors Not as Bad as Expected!

by Bill Jewell on 08-24-2020 at 4:00 pm


In the early stages of the global COVID-19 pandemic, most forecasters expected the semiconductor market to decline in 2020, including our May Semiconductor Intelligence projection of a 6% drop. However, the semiconductor market has shown surprising strength so far this year. WSTS reported the 2Q 2020 semiconductor market was only down 0.9% from 1Q 2020. The top semiconductor companies had mixed results for 2Q20. Nvidia’s acquisition of Mellanox and Infineon’s acquisition of Cypress increased their 2Q20 revenues. These acquisitions are excluded in the comparison of 2Q20 to 1Q20. The total revenues of the twelve companies listed were up 3.9% in 2Q20 from 1Q20. Seven of the twelve companies had revenue declines. The memory companies (Samsung, SK Hynix, Micron and Kioxia) were up 12.7% while the non-memory companies were down 1.4%.

The consolidated 3Q20 guidance of the non-memory companies is 4% revenue growth from 2Q20. However, the guidance is dragged down by Intel’s guidance of -7.7%. Excluding Intel from the non-memory companies, the resulting 3Q20 guidance is 15%.

The major memory suppliers did not provide guidance for 3Q20 revenues, except for Micron Technology. However, they all stated new products in smartphones and videogames are expected to drive revenue growth in the second half of 2020. Apple is expected to introduce its 5G iPhone 12 models in October. Microsoft will release its Xbox Series X gaming system in November. Sony is expected to release its PlayStation 5 gaming system before the end of 2020. In late June, Micron said it expected revenues to increase from 6% to 15% for its quarter ending in late August. However, in a financial conference earlier in August, Micron said the quarter would be “somewhat weaker” than its previous guidance.

IDC in late May and early June of this year forecast double-digit declines in 2020 for both smartphones and PCs. July data on 2Q 2020 shipments showed smartphones were down 16% versus a year ago – in line with IDC’s June forecast of an 11.9% decline in 2020 smartphone shipments. July data on 2Q 2020 PC shipments showed a surprising 11% year-to-year increase driven by increased use of PCs for working and learning from home during the pandemic. IDC will certainly revise its 2020 PC forecast upward from its May forecast of a 15% decline, possibly showing an increase in PC units for the year 2020.

The global recession of 2008 to 2009 was the worst economic downturn since the Great Depression of the 1920s and 1930s. In 2009, global GDP declined 1.7% and U.S. GDP declined 2.5%, according to the World Bank. The current global recession will certainly be much worse than 2009. The June economic forecast from the International Monetary Fund (IMF) was a 4.9% decline in global GDP in 2020. The advanced economies will be hit the worst, with an 8% decline. Emerging and developing economies are expected to decline 3%. Ironically, the only major economy expected to show growth in 2020 is China – the original source of the COVID-19 outbreak. The IMF projects global GDP will recover to 5.4% growth in 2021. Looking at the net change in GDP from 2019 to 2021, the advanced economies should see a 3.6% decline while the emerging and developing economies should see a 2.7% increase.
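The two-year net changes follow from compounding the annual rates. As a rough check using only the percentages quoted above, the 2021 growth implied by a 2020 decline and a 2019-to-2021 net change can be backed out as follows. The function name is made up for illustration, and the results are derived values, not IMF figures.

```python
def implied_second_year_growth(first_year_pct, two_year_net_pct):
    """Growth (%) in year two implied by year-one growth and the two-year net change.

    Uses compounding: (1 + g1) * (1 + g2) = 1 + net.
    """
    g1 = first_year_pct / 100.0
    net = two_year_net_pct / 100.0
    return ((1.0 + net) / (1.0 + g1) - 1.0) * 100.0


advanced = implied_second_year_growth(-8.0, -3.6)
emerging = implied_second_year_growth(-3.0, 2.7)
print(f"advanced economies, implied 2021 growth: {advanced:.1f}%")
print(f"emerging economies, implied 2021 growth: {emerging:.1f}%")
```

Note that an 8% decline followed by roughly 4.8% growth still leaves GDP below its starting point, because the growth applies to a smaller base.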

As pessimistic as the IMF economic update was in June, the COVID-19 pandemic has become more severe since. According to Johns Hopkins University, worldwide COVID-19 cases more than doubled from 10.5 million at the end of June (after the IMF report) to 23.4 million as of August 23. U.S. cases also more than doubled during the same time period, 2.6 million to 5.7 million. Recently cases have been increasing in hard-hit countries which seemed to have had COVID-19 under control such as the UK, Spain, France, and Italy.

Against this backdrop, we at Semiconductor Intelligence find it difficult to expect much of an increase in the world semiconductor market in 2020. However, the strength of the PC market and the relatively optimistic 3Q 2020 guidance of several major semiconductor companies indicates the semiconductor market will fare better than the overall economy in 2020. We are forecasting 1% growth in the semiconductor market in 2020 and 8% growth in 2021. Several recent forecasts for 2020 are around 3%. The Cowan LRA Model predicts 5.2% growth in 2020, but this model is based on historical trends and does not account for the current pandemic. For 2021, the Cowan LRA model forecasts 4.4% growth, WSTS expects 6.2%, and Semico Research predicts “low double digit” growth (shown as 10% on the chart).

We are in uncharted territory in 2020 – the worst pandemic in 100 years and the worst economic downturn in 90 years. The relative strength of the semiconductor market compared to the overall economy is largely due to the shifting nature of human interaction. Most countries have placed restrictions on workplaces, schools, and retail outlets. Thus, people increasingly work, learn and shop from home. This increases demand trends for PCs, smartphones, computing infrastructure and communications infrastructure. Even after the COVID-19 pandemic is over, many of these trends will continue – making semiconductors and electronics an even more important part of the global economy.

Also Read:

Semiconductors up in 2020? Not so fast

Is the Worst Over for Semiconductors?

COVID-19 and Semiconductors


Moving to Deeply Scaled Nodes for Power? There is a Better Way

by Mike Gianfagna on 08-24-2020 at 10:00 am


Did you know you can save 30% to 60% power without spending a fortune on a process migration? There is a better way than moving to deeply scaled nodes for power. Read on…

Have you heard of AGGIOS? You will. The name stands for AGGregated IO Systems, and a team of ex-ARM and Qualcomm engineers is reinventing power management. I’ll explain what AGGIOS is up to in a moment, but first a bit of backstory is in order to set the stage for why AGGIOS technology is so important. It’s a story of “three P’s”.

Price

For decades, semiconductor companies rode the Moore’s Law curve and migrated to the next process node every two to three years to get the latest boost in performance and reduction in power and area. Lately, that is slowing down quite a bit with a cycle of five to seven years (or more). The price to migrate is skyrocketing and the power benefits are rapidly diminishing.

Gartner estimates the price tag to design a 7nm SoC to be $270 to over $400 million. This is at least 3x more than the design cost for a 16nm SoC. Costs per chip stay manageable only if end user needs are addressed and the market is large enough to absorb design costs. If you are Apple or Samsung, you have an infrastructure and customer base to accomplish this.

For everyone else these are rare air prices. Gartner estimates that 7nm delivers 65% power reduction over 16nm. A good result, but at a steep price.  For perspective, from 7nm to 5nm the power gain is just 20% – 30%, according to TSMC and Samsung.

Performance

Performance is no longer the leader of the technology adoption curve. For a long time, it was. You could just move to the next process node, increase your clock speed and have a new, competitive product. The flattening of Moore’s and Dennard’s Laws has changed that.

Instead, system design approaches exploit parallelism. Hardware accelerator and multi-core architectures with high-speed communication backbones are leading the way to superior performance. As these technologies significantly stress the power budget, the same need for power reduction exists.

Power

Because of battery life demands, thermal constraints and overall cost to operate data centers, power optimization has become a primary business and technology driver in the semiconductor sector. Even governments are involved with power reduction mandates. All roads lead to reduced power consumption.

A lot of power reduction strategies focus on hardware and process. I’ve touched on process improvements and the associated price tag. Hardware techniques such as clock gating or voltage and frequency scaling can successfully reduce power consumption when closely aligned with software execution. There is an “elephant in the room” problem with all this, however. These methods can deliver good power reduction, but, as with other approaches, it takes a lot of in-house engineering effort, talent and a company-wide vertically integrated hardware/software infrastructure to make it effective. Again, Apple or Samsung has it. For everyone else, it is a challenge.
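For a rough feel of why voltage and frequency scaling is effective, recall the standard first-order model of CMOS dynamic power: P = a * C * V^2 * f, where a is the switching activity, C the switched capacitance, V the supply voltage and f the clock frequency. The sketch below is not from AGGIOS and ignores leakage; it only shows how the relative dynamic power changes when V and f are scaled together.

```python
def relative_dynamic_power(voltage_scale, frequency_scale):
    """Dynamic power relative to baseline under the first-order model P ~ V^2 * f.

    Activity factor and capacitance are assumed unchanged; leakage is ignored.
    """
    return voltage_scale ** 2 * frequency_scale


# Example: run at 90% of nominal voltage and 90% of nominal frequency.
ratio = relative_dynamic_power(0.9, 0.9)
print(f"dynamic power is {ratio:.1%} of baseline, a {1 - ratio:.1%} reduction")
```

Because voltage enters squared, a modest 10% reduction in both voltage and frequency cuts dynamic power by about 27% in this model. That is why it matters that the operating point tracks what the software is actually doing.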

Recap

Effective power management is something everyone needs but only a few can afford. Moving to the next process node is prohibitively expensive and the resultant power reduction is going down in advanced nodes. Performance improvements are being driven by architectural innovations that further stress the power budget. Hardware level approaches can reduce power, but the reductions take a lot of skill, effort and costly infrastructure. If you are moving to deeply scaled nodes for power there is a better way.

A Revolutionary Software-Based Approach to Power Management

A basic truth is that SoCs consume power when they run software. Hardware alone has a hard time understanding the dynamic behavior of the application and system software and its impact on power. What would happen if we enable software to directly optimize every milliwatt? AGGIOS saw the opportunity this presented early-on and developed a patented Software Defined Energy Management system. Their technology delivers fine-grained control of hardware power consumption to the software developer and provides fast and accurate feedback on the impact of software optimizations.

Typical power savings range from 30% to 60% with AGGIOS. So, you can achieve the same or better power reduction associated with expensive process migration and extensive engineering effort without hardware modifications or process migration. At last, what used to be an expensive and labor-intensive but valuable portion of power management is now available to all, not just the special few.  AGGIOS products act exclusively through automated system software and firmware optimization enabling longer lasting, cooler and smaller electronic devices delivered on schedule with much lower cost. Their approach can be applied to any SoC or FPGA architecture.

What would you do with all that power savings? You can read about some real case studies in a white paper from AGGIOS. They document actual results on a series of Xilinx Zynq UltraScale+ MPSoC applications using Xilinx Targeted Reference Designs (TRDs) and other reference applications. The applications include video streaming, video deep learning, ECC processing, memory throughput and two software-defined radios. Their white paper provides a lot of detail on the hardware and software architecture of the applications, how AGGIOS software is applied to the design and the detailed results they achieved.

Speaking of results, the actual power savings reported range from 35% to an eye-popping 86%. Recall that a difficult and expensive move from 16nm to 7nm delivers 65% power savings. The AGGIOS approach is even more effective at deeply scaled nodes as it can account for and even exploit the high variability of these processes to reduce energy consumption or increase performance.

If a new, cost-efficient and highly effective power management strategy sounds appealing, you need to download this white paper. It just may change the direction of your next project, especially if advanced technology migration is being considered or if you’re concerned that gains just won’t be enough. There is a better way than moving to deeply scaled nodes for power. You can download the white paper, titled AGGIOS Seedlings Power Reference Designs: Xilinx UltraScale+ here. 


High-throughput Workloads Get a Boost from Altair

by Daniel Nenni on 08-24-2020 at 6:00 am


Altair PBS Professional™ is the trusted leader in high-performance computing workload management. It efficiently schedules HPC workloads across all forms of computing infrastructure, and it scales easily to support systems of any size — from clusters to the largest supercomputers.

Scheduling for high-throughput workloads just got easier. With the release of Altair® PBS Professional® 2020 we’ve expanded our industry-leading HPC workload manager with brand new capabilities, including hierarchical scheduling that can handle the biggest volumes of small, high-throughput jobs.

PBS Professional 2020

The new hierarchical scheduler built into PBS Professional offloads the base scheduler to enable greater throughput and better license and resource utilization. Batches of short jobs are presented as one longer job while maintaining full visibility into each individual job.

In addition to delivering a single scheduler for all types of workloads, PBS Professional has more new features including:

  • Cloud bursting and dynamic extension with a built-in GUI
  • Forecasting and simulation
  • Allocation and budget management
  • Security, performance, and administrative and usability updates

And PBS Pro users still get all the tools they already rely on for workload orchestration and optimization.

Altair HPC Virtual Summit

Learn about solutions for high-throughput scheduling and more at the Altair HPC Virtual Summit September 9 and 10. We’ve packed two half-days with topics in two tracks including semiconductor design acceleration and HPC and IT optimization, kicking off with PBS Professional User Group sessions in two time zones.

Virtually network with Altair experts, partners, customers, and industry peers to learn about the leading-edge computing solutions that keep innovation moving forward with cost savings and enhanced efficiency. Featured sessions include:

  • Keynote speaker Michael Heroux of the Exascale Computing Project
  • Cloud roundtable: “Is Cloud Officially Inevitable?” featuring participants from Google Cloud, Microsoft Azure, and Oracle
  • Live Q&A with the Altair development team and technical experts

Our track for semiconductor IT pros and engineering experts includes sessions on how to drive efficiency at every step of the design process. Take chip design to the cloud with Rapid Scaling solutions that bring cloud costs closer than ever to real demand, save serious money with license-first scheduling tools, and meet the design flow mapping tools VLSI engineers at leading semiconductor organizations use to get new technology to market first.

Register today to save your spot

About Altair (Nasdaq: ALTR)
Altair is a global technology company that provides software and cloud solutions in the areas of product development, high performance computing (HPC) and data analytics. Altair enables organizations across broad industry segments to compete more effectively in a connected world while creating a more sustainable future. To learn more, please visit www.altair.com.

Also Read

Interview with Altair CTO Sam Mahalingam

Six Essential Steps For Optimizing EDA Productivity

Latest Updates to Altair Accelerator, the Industry’s Fastest Enterprise Job Scheduler


PCI Express in Depth – Physical Layer

by Luigi Filho on 08-23-2020 at 10:00 am


In the last article, I wrote about the basic concepts of PCIe. This article covers the physical layer of the PCIe standard.

The lowest PCI Express architectural layer is the Physical Layer. This layer is responsible for actually sending and receiving all the data to be sent across the PCI Express link. The Physical Layer interacts with its Data Link Layer and the physical PCI Express link.

This layer contains all the circuitry for the interface operation: input and output buffers, parallel-to-serial and serial-to-parallel converters, PLL(s) and impedance matching circuitry. It also contains some logic functions needed for interface initialization and maintenance.

The Physical Layer can be subdivided into two sub-blocks: the logical sub-block and the electrical sub-block.

Logical Sub-Block

The logical sub-block is the key decision maker for the Physical Layer. The logical sub-block has separate transmit and receive paths, referred to hereafter as the transmit unit and receive unit. Both units are capable of operating independently of one another.

The primary function of the transmit unit is to prepare data link packets received from the Data Link Layer for transmission. This process involves three primary stages: data scrambling, 8-bit/10-bit encoding, and packet framing. The receive unit functions similarly to the transmit unit, but in reverse. The receive unit takes the deserialized physical packet taken off the wire by the electrical sub-block, removes the framing, decodes it, and finally descrambles it.

The figure below illustrates this:

Remember that this processing applies to each lane of the PCIe link.

Some subtopics in the logical sub-block:

  • Data Scrambling – PCI Express employs a technique called data scrambling to reduce the possibility of electrical resonances on the link. The PCI Express specification defines a scrambling/descrambling algorithm that is implemented using a Linear Feedback Shift Register (LFSR). Scrambling or descrambling is accomplished by serially XORing the data with the output of an LFSR that is synchronized between PCI Express devices.
  • 8-Bit/10-Bit Encoding – The primary purpose of 8-bit/10-bit encoding is to embed a clock signal into the data stream. By embedding a clock into the data, this encoding scheme renders external clock signals unnecessary.
  • Packet Framing – In order to let the receiving device know where a packet starts and ends, identifying 10-bit special symbols are prepended and appended to the 8-bit/10-bit encoded data packet.
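The scrambling step can be sketched in a few lines. The example below uses a 16-bit LFSR with the polynomial x^16 + x^5 + x^4 + x^3 + 1 and a seed of 0xFFFF, which is the polynomial used by the 2.5 GT/s PCIe scrambler. The bit ordering and the handling of special symbols are simplified here, so treat this as an illustration of the XOR-with-LFSR idea rather than a bit-exact implementation of the specification.

```python
def lfsr_bits(seed=0xFFFF):
    """Yield the output bits of a 16-bit Galois LFSR.

    Polynomial: x^16 + x^5 + x^4 + x^3 + 1 (feedback taps 0x0039).
    Bit ordering is a simplification, not the spec's exact arrangement.
    """
    state = seed
    while True:
        msb = (state >> 15) & 1
        yield msb
        state = (state << 1) & 0xFFFF
        if msb:
            state ^= 0x0039


def scramble(data, seed=0xFFFF):
    """XOR each data byte with 8 successive LFSR output bits."""
    stream = lfsr_bits(seed)
    out = bytearray()
    for byte in data:
        key = 0
        for bit in range(8):
            key |= next(stream) << bit
        out.append(byte ^ key)
    return bytes(out)


payload = b"PCIe data"
on_the_wire = scramble(payload)
recovered = scramble(on_the_wire)   # same LFSR, same seed: XOR undoes itself
print(on_the_wire.hex(), recovered)
```

Because XOR is its own inverse, the receiver recovers the data by running an identical LFSR from the same starting state. This is why the two ends of the link must keep their LFSRs synchronized.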

Electrical Sub-Block

While the logical sub-block of the Physical Layer serves as the key decision maker, the electrical sub-block functions as the delivery mechanism for the physical architecture.

The electrical sub-block contains transmit and receive buffers that transform the data into electrical signals that can be transmitted across the link.

The electrical sub-block may also contain the PLL circuitry, which provides internal clocks for the device.

Some sub topics in the electrical sub-block:

  • Serial/Parallel Conversion – The transmit buffer in the electrical sub-block takes the encoded/packetized data from the logical sub-block and converts it into serial format. Once the data has been serialized it is then routed to an associated lane for transmission across the link. On the receive side the receivers deserialize the data and feed it back to the logical sub-block for further processing.
  • Clock Extraction – In addition to the serial-to-parallel conversion described above, the receive buffer in the electrical sub-block is responsible for recovering the link clock that has been embedded in the data.
  • Lane-to-Lane De-Skew – The receive buffer in the electrical sub-block de-skews data from the various lanes of the link prior to assembling the serial data into a parallel data packet. This is necessary to compensate for the allowable 20 nanoseconds of lane-to-lane skew.
  • Differential Signaling – PCI Express transmit and receive buffers are designed to convert the logical data symbols into a differential signal.
  • Phase Locked Loop (PLL) Circuit – A clock derived from a PLL circuit may provide the internal clocking to the PCI Express device. Each PCI Express device is given a 100 megahertz differential pair clock. This clock can be fed into a PLL circuit, which multiplies it by 25 to achieve 2.5 gigahertz.
  • AC Coupling – PCI Express uses AC coupling on the transmit side of the differential pair to eliminate the DC common mode element. That makes the buffer design process for PCI Express much simpler.
  • De-Emphasis – PCI Express utilizes a concept referred to as de-emphasis to reduce the effects of inter-symbol interference.
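
The clock numbers above combine nicely with the 8-bit/10-bit encoding overhead. The short Python sketch below works through the arithmetic. It assumes one bit is sent per cycle of the 2.5 gigahertz clock on each lane in each direction, and the constant and function names are made up for this example.

```python
REF_CLOCK_HZ = 100_000_000   # 100 MHz reference clock given to each device
PLL_MULTIPLIER = 25          # PLL multiplication factor from the text
ENCODED_BITS = 10            # 8-bit/10-bit encoding: 10 bits on the wire...
PAYLOAD_BITS = 8             # ...carry 8 bits of data


def line_rate_hz() -> int:
    """Clock produced by the PLL, assumed to equal the per-lane bit rate."""
    return REF_CLOCK_HZ * PLL_MULTIPLIER


def payload_bytes_per_second(lanes: int) -> int:
    """Data bytes per second in one direction after removing 8b/10b overhead."""
    payload_bits_per_second = line_rate_hz() * PAYLOAD_BITS // ENCODED_BITS
    return payload_bits_per_second // 8 * lanes


if __name__ == "__main__":
    print(f"line rate: {line_rate_hz() / 1e9} GHz")
    for lanes in (1, 4, 8, 16):
        rate_mb = payload_bytes_per_second(lanes) / 1e6
        print(f"x{lanes}: {rate_mb:.0f} MB/s per direction")
```

Under these assumptions a single lane carries 250 MB/s of data in each direction, because two of every ten transmitted bits are encoding overhead.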

As always, leave a comment if you want me to get into more details.


PCI Express in Depth

by Luigi Filho on 08-23-2020 at 8:00 am

This is another post that was requested by a user, and as always I'll do my best to cover, in a few articles, the basic information you'll need to understand how it works in depth.

PCI Express (or PCIe) is a high-speed serial computer expansion bus designed to replace the older PCI, PCI-X and AGP standards.

The first principle you need to understand is the lane. A lane is composed of two differential signaling pairs, with one pair for receiving data and the other for transmitting. Thus, each lane is composed of four wires or signal traces. Conceptually, each lane is used as a full-duplex byte stream, transporting data packets in eight-bit “byte” format simultaneously in both directions between endpoints of a link.

The connection between two PCIe devices is referred to as a link. Physical PCIe links may contain from 1 to 16 lanes, typically 1, 4, 8 or 16 lanes. Lane counts are written with an “x” prefix (for example, “x8” represents an eight-lane card or slot), with x16 being the largest size in common use. Lane sizes are also referred to via the terms “width” or “by”; for example, an eight-lane slot could be referred to as a “by 8” or as “8 lanes wide.”
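
As a quick check of the lane arithmetic, here is a tiny Python sketch. The names are invented for illustration, and the allowed widths are simply the ones listed in this post.

```python
PAIRS_PER_LANE = 2           # one differential pair to transmit, one to receive
WIRES_PER_PAIR = 2           # a differential pair is two wires
LINK_WIDTHS = (1, 4, 8, 16)  # lane counts named in this post


def link_label(lanes: int) -> str:
    """Return the usual 'x' notation for a link width, e.g. 'x8'."""
    if lanes not in LINK_WIDTHS:
        raise ValueError(f"unsupported link width in this sketch: {lanes}")
    return f"x{lanes}"


def signal_wires(lanes: int) -> int:
    """Number of signal wires (or traces) needed for the given lane count."""
    link_label(lanes)  # reuse the width check
    return lanes * PAIRS_PER_LANE * WIRES_PER_PAIR


if __name__ == "__main__":
    for width in LINK_WIDTHS:
        print(link_label(width), signal_wires(width), "signal wires")
```

So a “by 8” link needs 32 signal wires and an x16 link needs 64, counting only the data pairs.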

Other concepts include:

  • PCIe element types:
  1. Root Complex – The head or root of the connection.
  2. PCI Express-PCI bridge – As the name says, it has one PCI Express port and one or multiple PCI/PCI-X bus interfaces.
  3. Endpoint – A device that can request/complete PCI Express transactions for itself.
  4. Switch – Used to fan out a PCI Express hierarchy.
  • PCIe transaction types:
  1. Memory Transactions – Transactions targeting the memory space transfer data to or from a memory-mapped location.
  2. I/O Transactions – Transactions targeting the I/O space transfer data to or from an I/O-mapped location.
  3. Configuration Transactions – Transactions targeting the configuration space are used for device configuration and setup.
  4. Message Transactions – PCI Express adds a new transaction type to communicate a variety of miscellaneous messages between PCI Express devices.

The architecture is shown in the figure below:

The next three articles will be about these three layers: Physical Layer, Data Link Layer and Transaction Layer.

As always, leave a comment and tell me which protocol or standard you want to know more about.


Fully Self-Aligned 6-Track and 7-Track Cell Process Integration

by Fred Chen on 08-23-2020 at 6:00 am

For the 10nm – 5nm nodes, the leading-edge foundries are designing cells which utilize 6 or 7 metal tracks, entailing a wide metal line for every 4 or 5 minimum width lines, respectively (Figure 1).

Figure 1. Left: a 7-track cell. Right: a 6-track cell.

This is a fundamental vulnerability for lithography, as defocus can change the spacing between lines [1], leading to “pitch walking.” This happens when the first and highest orders go out of phase with each other, leading to one diminishing relative to the other. EUV makes things worse by introducing asymmetry between opposite sides of the pupil [2], leading to feature position shift. To get around this, self-aligned patterning is the default alternative. This also offers an exciting opportunity for both 6- and 7-track cells to be produced at the same time on the same chip.

7-track cell process

Targeting a 14-18 nm minimum half-pitch, we expect to use self-aligned quadruple patterning (SAQP) [1]. However, the core features which guide the SAQP consist of unequally sized lines. Specifically, a larger core feature is surrounded by a pair of smaller core features. When patterned lithographically, this also encounters defocus-induced pitch walking. So, self-aligned patterning is preferred for the core features as well. Self-aligned triple patterning (SATP) [3] naturally can provide the larger feature surrounded by a pair of smaller features. Since the starting pattern is now a single-size line/space pattern, there is no threat of pitch walking from defocus. Figure 2 illustrates the self-aligned spacer patterning stages of the process flow for producing the 7-track cell comprising 5 narrow lines surrounded by two wider lines. SATP followed by SAQP (2 x SADP) constitutes self-aligned duodecuple (12x) patterning (SADDP).

Figure 2. Self-aligned spacer patterning stages of the 7-track cell SADDP fabrication process flow. The fourth row indicates the core features for the SAQP stage, consisting of two SADP stages. For self-aligned cutting or blocking purposes, the finally patterned lines are assigned a red or blue color, depending on whether they are on the final spacer interior or exterior, respectively.

6-track cell process

SATP can also produce the SAQP core features for the 6-track cell process (Figure 3) for the same design rules. The main difference is that the SATP starting pitch is 22 times the minimum linewidth or half-pitch instead of 26 times in the 7-track case. With only a ~15% pitch difference, both cases can be patterned using SATP at the same time.

Figure 3. Self-aligned spacer patterning stages of the 6-track cell SADDP fabrication process flow. As in Figure 2, the fourth row indicates the core features for the SAQP stage, consisting of two SADP stages. For self-aligned cutting or blocking purposes, the finally patterned lines are assigned a red or blue color, depending on whether they are on the final spacer interior or exterior, respectively.
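
The numbers quoted for the two flows can be checked with a few lines of Python. This is only a worked-arithmetic sketch: the stage multipliers (3x for SATP, 2x for each SADP), the 26x and 22x starting-pitch multiples and the 14-18 nm half-pitch range come from the text, while the variable and function names are invented here.

```python
from math import prod

# Pitch multiplication contributed by each self-aligned stage.
SADDP_STAGES = (("SATP", 3), ("SADP", 2), ("SADP", 2))

# SATP starting pitch, expressed in multiples of the minimum half-pitch.
STARTING_PITCH_MULTIPLE = {"7-track": 26, "6-track": 22}

HALF_PITCH_RANGE_NM = (14, 18)


def multiplication_factor() -> int:
    """Overall pitch multiplication of the SATP + SAQP sequence."""
    return prod(factor for _, factor in SADDP_STAGES)


def starting_pitch_nm(cell: str, half_pitch_nm: int) -> int:
    """Lithographic starting pitch for a cell type at a given half-pitch."""
    return STARTING_PITCH_MULTIPLE[cell] * half_pitch_nm


def pitch_difference_percent() -> float:
    """Relative gap between the 7-track and 6-track starting pitches."""
    seven = STARTING_PITCH_MULTIPLE["7-track"]
    six = STARTING_PITCH_MULTIPLE["6-track"]
    return (seven - six) / seven * 100


if __name__ == "__main__":
    print("multiplication factor:", multiplication_factor())
    for cell in STARTING_PITCH_MULTIPLE:
        low, high = (starting_pitch_nm(cell, hp) for hp in HALF_PITCH_RANGE_NM)
        print(f"{cell}: starting pitch {low}-{high} nm")
    print(f"pitch difference: {pitch_difference_percent():.1f}%")
```

With these inputs the starting pitches fall between roughly 300 and 470 nm, and the two cell types differ by about 15%, which is the gap quoted above.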

Note that the SAQP core features are different from the arrangement that was presented in [4], since they are produced by SATP rather than by lithographic patterning. The advantage is that self-aligned line cutting is more effective.

Self-aligned line cutting or blocking

For both the 6-track and 7-track cases presented here, alternate lines, whether wider or narrower, can be assigned to one of two selectively etchable groups (indicated by the red or blue color), depending on whether the location is on the interior or the exterior of the spacers. This is advantageous for patterning line breaks, or cuts (blocks); neighboring lines will not be damaged by cut (block) placement error from overlay. The 6-track cell here maintains this advantage over that shown in [4], where some pairs of adjacent lines from the spacer interiors may still be simultaneously etched.

Pitch walking revisited

The SADDP sequence is essentially four successive SADP stages. Pitch walking will be determined mainly by the spacer deposition thickness control. In the best case, it can be a few to several percent [5,6]. If the spacer exterior is uncovered, it is also subject to etch thinning, which can also lead to pitch walking. This can be addressed by covering the exterior with another deposited layer, against which the spacer etch is selective [7].

Conclusion

The SADDP scheme is an extremely attractive and powerful approach for patterning 6-track and 7-track standard cells for the leading-edge technology nodes. It is currently the only presented approach where the track lithography is free from pitch walking and self-aligned cutting is also conveniently supported.

References

[1] https://www.linkedin.com/pulse/application-specific-lithography-5nm-6-track-cell-frederick-chen

[2] J. Finders et al., Proc. SPIE 9776, 97761P (2016).

[3] US Patent 7807575, assigned to Micron, filed Nov. 29, 2006.

[4] J. U. Lee et al., Proc. SPIE 10962, 109620N (2019).

[5] https://applied-multilayers.com/wp-content/uploads/2017/05/PECVD-DLC.pdf

[6] https://www.researchgate.net/publication/229964576_Film_Uniformity_in_Atomic_Layer_Deposition

[7] US Patent 7732341, assigned to Samsung, filed Mar. 23, 2007.


HCL Webinar Series – HCL Compass Delivers Defect Tracking and More

by Mike Gianfagna on 08-21-2020 at 10:00 am

As in my last post on the HCL DevOps webinar series, I will cover their presentation of HCL Compass, a webinar recorded on July 29 that shows how HCL Compass delivers defect tracking and more.

This webinar was presented by Steve Boone, head of product management at HCL Software DevOps, Howie Bernstein, product manager for HCL Compass and HCL VersionVault and Leah Nassar, technical lead for HCL Compass.

As before, Steve began by providing an overview of the DevOps portfolio from HCL Software. This portfolio was launched on June 16, 2020, so these are new capabilities. Steve stressed that all capabilities can be managed on the cloud or on prem, or whatever combination makes sense for your organization. That level of flexibility is refreshing.

Similar to the previous webinar, Howie and Leah discussed the product in an informal style, with a new addition. More on that in a moment. The current Compass product has been on the market for 23 years, starting as ClearQuest from Rational Software.

Howie explained that the product began as a flexible and customizable issue and defect management tool. He went on to explain that HCL Compass delivers defect tracking and more. This includes lifecycle management and the ability to customize that capability, without coding, for a particular organization’s needs. If you’re wondering how far this can go, Howie reported that a North American government is using the tool to manage their Social Security claims process. Now that’s a departure from SoC development.

Howie described a sophisticated customization process for the product that includes a GUI to describe workflow in great detail and a web-based interface to deploy and manage the resultant application. Very flexible with no coding required. The power of the tool became clear as the discussion continued. You’ll need to watch the webinar to get the full effect.

What followed was a unique online demo of HCL Compass presented by Leah. She went through an interactive creation of a defect tracking application. This included the definition of defects, attributes and assignment to employees. Leah went on to show how to deploy the tool in production and track the status and resolution of defects. Methods to communicate between users were also explained and demonstrated. The capability is quite flexible. She explained how to use the simple, out-of-the-box capability (good for small organizations) and how to customize communication with a lot of control if you’re part of a large organization.

The result was a complete defect tracking tool, built in real time as you watched. There was interesting discussion between Howie and Leah as Howie asked Leah to add new capabilities. Leah was able to do that easily, and Howie couldn’t “stump” her. Leah’s command of the product and its application scenarios was impressive and built confidence in the tool.

As the end of the webinar approached, Steve provided an honest, “from the heart” view of what remote development looks and feels like in the current environment. There were some great observations offered about how tools like HCL Compass can change the game in this “new normal”. To hear more, you’ll need to watch the webinar.

You can learn more about Compass on the HCL website Compass page. The short description provided there is:

Low-Code/No-Code change management software for enterprise level scaling, process customization, and control to accelerate project delivery and increase developer productivity.

And finally, here is where you can watch the webinar on HCL Compass – I highly recommend it.