Why the Chips Get Hot

Daniel Nenni

I found this interesting from Asianometry:

Today's leading edge chips get hot. Like really hot. And nowadays much work is being done to try and keep them from getting TOO hot. Fancy things like dipping the whole chip into water or oil. But why do the chips get hot? And how does that heat spread? Simple question, right? In this video, we explore heat at nanometer scale. Easiest video ever.

Hot Spots

Heat inside the transistor begins with hot spots. Whenever the transistor is using power, power is also being dissipated. And whenever power is being dissipated, we are also generating heat. Let us switch on a plain planar transistor, a classic one you might have seen before on this channel. It has a source, drain, and a gate stack - which itself consists of a metal gate and a thin layer of oxide called the...gate oxide. We apply a minimum threshold voltage to that transistor to activate it. In doing so, we create a lateral electric field that spreads and peaks near the drain. This field accelerates charge carriers, causing them to travel along the channel from the source to the drain. And when I say "charge carriers", I am referring to negatively charged electrons or positively charged electron holes. A transistor can transport one or the other.
So you have this rush of charge carriers. As they rush towards the drain, the charge carriers collide with various things they encounter within the silicon atom lattice. These collisions last for just a few picoseconds, which is very fast. They cause the charge carriers to slow down and release part of their kinetic energy into the lattice. The released energy creates a "collective excitation" of the atoms and molecules within the lattice. Remember this phrase, "collective excitations", because I am going to come back to it later.
As a result, you end up with this very small hot spot where the electric field is the strongest near the drain. At the start, this hot spot is generally understood to be just a few tens of nanometers large. But it can grow. If we raise the voltage applied to the transistor, then the electric field gets stronger. That motivates even more charge carriers to stampede through the channel. This creates yet more collisions, which leads to more heat. This whole phenomenon is known as "Joule heating", and it also happens to be the operating principle behind a hot plate.
If we keep running this and do not take away enough of the heat via active or passive means, then the hot spot grows, causing the transistor junction temperature - or just junction temperature - to rise. Kind of like some of the London Underground subways. I was there a few months ago and was sweating like a pig. What the heck is going on down there?!
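To put rough numbers on Joule heating and junction temperature, here is a minimal Python sketch. It assumes the usual lumped model, where dissipated power is I squared times R and the junction sits above ambient by power times a single thermal resistance. The function names and the example values are illustrative assumptions, not figures from the video.

```python
def joule_power_w(current_a, resistance_ohm):
    """Power dissipated as heat in a resistive path: P = I^2 * R (watts)."""
    return current_a ** 2 * resistance_ohm


def junction_temperature_c(ambient_c, power_w, theta_ja_c_per_w):
    """Steady-state junction temperature from a lumped thermal model.

    theta_ja_c_per_w is an assumed junction-to-ambient thermal resistance
    in degrees Celsius per watt. Better heat removal means a smaller value.
    """
    return ambient_c + power_w * theta_ja_c_per_w


# Hypothetical example: 2 A through 5 ohms, 2 C/W to a 25 C ambient.
power = joule_power_w(2.0, 5.0)
t_junction = junction_temperature_c(25.0, power, 2.0)
```

Halving the thermal resistance in this model halves the temperature rise above ambient, which is the whole point of taking heat away by active or passive means.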

SOIs, FinFETs and More

Heat generated by the planar transistor's hot spots can be dissipated into the silicon substrate. There the heat can then spread towards the back of the wafer. This makes the silicon substrate a heat sink, and it is a critical one. With the planar transistor, dissipating heat is relatively easy because silicon conducts heat well and the whole device is laid down flat on the substrate. There is a nice big interface for the heat to transmit down into the heat sink. But over time, the semiconductor industry has had to move away from this device for various reasons. Because life is not fair. And because the device's dimensions got so small that we could not prevent charge carriers from crossing over from the source to the drain. A new modification introduced to address this issue was the Silicon-on-Insulator. This is where we slide a layer of some insulating material like oxide under a planar transistor, separating it from the silicon substrate. The insulator lets the transistor switch faster and also helps prevent charge carriers from burrowing under the channel. It also bestows some radiation hardness benefits. But insulator materials like these oxides also tend to be poor thermal conductors. So making this insulator layer essentially means keeping the heat from dissipating into the silicon substrate. I mean, that's why they put it into houses right? Another next-generation transistor device is the widely adopted 3D FinFET, where the channel is turned on its side giving us the eponymous “fin”. Covering the channel this way on three sides grants us better control over it. This shape continues the transistor density scaling trend and is at the core of today's leading edge digital logic devices, but also adds new thermal issues. Dissipating heat generated by the FinFETs requires sending it down the fin which is not so easy. So the thermal conductivity of a FinFET at the 14-nanometer node is just 75% that of a planar transistor. 
The problem worsens as we scale the transistor because it means making the fins taller and skinnier. Taller and skinnier fins heat up more easily because the heat has to travel further to dissipate into the silicon. We also bunch them together in tight clusters, which only compounds the heating problem. Looking ahead, we have the N2 class of nodes, which will implement Gate-All-Around nanosheet transistors. I am super excited about these because they are going to enable a wild new world of AI chips. With Gate-All-Around, the fin is replaced by a stack of ribbons, with the gate stack wrapping around the entire channel. Nifty. While we have not yet confirmed the final heat profile of these things, just looking at it can tell you that this is not going to conduct heat well. The gate oxide surrounds the channel on all sides, the hot spots are further away from the substrate than ever before, and the whole thing is quite tall.
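Why taller and skinnier fins run hotter can be seen from the classical conduction resistance of a slab, R = L / (k * A). The sketch below treats the fin as a simple rectangular block with heat flowing straight down its height. That is a deliberate simplification: it ignores the gate stack, neighboring fins, and nanoscale effects, and the dimensions and conductivity used are hypothetical.

```python
def fin_thermal_resistance_k_per_w(height_m, width_m, length_m, k_w_per_m_k):
    """Conduction resistance of a rectangular fin, heat flowing down its height.

    R = path_length / (conductivity * cross_section), where the path length
    is the fin height and the cross-section is width * length.
    """
    cross_section_m2 = width_m * length_m
    return height_m / (k_w_per_m_k * cross_section_m2)


# Hypothetical fins: same length and material, different aspect ratios.
short_fat = fin_thermal_resistance_k_per_w(40e-9, 8e-9, 100e-9, 100.0)
tall_skinny = fin_thermal_resistance_k_per_w(80e-9, 4e-9, 100e-9, 100.0)
ratio = tall_skinny / short_fat  # doubling height and halving width gives 4x
```

In this model, doubling the height doubles the resistance and halving the width doubles it again, so the two scaling trends multiply.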

Heat Problems

I don't think I have to work very hard to convince you guys that heat is a bad thing for the chip. But in what ways is it so bad? First, thermal problems can cause delays in how transistors receive signals - leading to what we call timing failures. Simulations imply clock timings can be skewed by as much as 10% for every 40-degree Celsius rise. Second, heat interferes with the IC's analog circuit components. Since these parts interact with the real world, they are affected by real world conditions. Big temperature differences in the die can interfere with those signal levels, causing inaccurate performance. Third, higher junction temperatures reduce the chip's overall longevity. For example, the transistor's gate oxide between the metal gate and the channel is one of its most vulnerable components. Long term exposure to electric fields causes these gate oxides to eventually break down. Higher junction temperatures accelerate this process by increasing interactions between the oxide and tunneling electrons. Recent innovations like gate oxides based on exotic High-K materials only worsen this because their thermal conductivity is lower, so heat cannot easily dissipate away from the transistor.
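The timing figure above (up to 10% skew per 40-degree Celsius rise) can be turned into a quick back-of-the-envelope bound. This sketch assumes the skew scales linearly with temperature rise, which the source does not state, so treat it as an upper-bound rule of thumb only. The function name and the 1 ns clock period are hypothetical.

```python
def worst_case_skew_fraction(temp_rise_c, skew_per_step=0.10, step_c=40.0):
    """Upper-bound clock skew as a fraction of the nominal timing.

    Assumes (for illustration) that the quoted 10% per 40 C scales linearly.
    """
    return skew_per_step * temp_rise_c / step_c


# Hypothetical example: an 80 C rise on a 1 ns clock period.
fraction = worst_case_skew_fraction(80.0)
skew_ns = fraction * 1.0
```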

Runaway

Finally, heat compounds. It begets more heat. Until it runs away. A CMOS device uses energy in three ways: Switching/Dynamic power, Sub-threshold leakage power, and Short-circuit power. I am going to ignore short-circuit power because it is not relevant to our conversation. So that leaves switching and sub-threshold leakage. Switching power, as the name implies, refers to the power consumed while the transistors are switching. Sub-threshold leakage is the residual current flowing through even when the voltage is below the threshold voltage. In other words, the gate is supposed to be shut but some water is still dribbling through. Normally, switching accounts for most of a CMOS device's overall power usage, leaving leakage a small percentage. Switching power is actually weakly negatively affected by higher temperature. But junction temperature and sub-threshold leakage are strongly positively correlated. So as the transistor gets hotter, more power is being dissipated from leakage. Which in turn increases the junction temperature yet more, which in turn causes even more leakage! Baby, now you've got a stew goin', RIP Carl Weathers. If you cannot stop the cycle, then you end up with a situation known as thermal runaway. Thermal runaway causes permanent damage to the chip and should be avoided at all costs. I mean, unless you like your chips to have big holes in them. Like the reactor floor in Chernobyl.
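The leakage feedback loop can be sketched as a fixed-point iteration: temperature sets leakage, leakage adds power, power raises temperature. The model below is a toy under stated assumptions: switching power is held constant, leakage grows exponentially with temperature (the exponential form and the 30-degree scale are illustrative choices, not values from the video), and cooling is a single thermal resistance. All names and numbers are hypothetical.

```python
import math


def settle_temperature_c(p_switching_w, p_leak_ref_w, theta_c_per_w,
                         ambient_c=25.0, t_ref_c=25.0, leak_scale_c=30.0,
                         t_limit_c=150.0, max_iter=1000, tol=1e-6):
    """Iterate the heat/leakage loop until it settles or runs away.

    Returns the steady junction temperature in C, or None if the loop
    exceeds t_limit_c (treated here as thermal runaway) or never settles.
    """
    temp_c = ambient_c
    for _ in range(max_iter):
        # Assumed model: leakage power grows exponentially with temperature.
        p_leak_w = p_leak_ref_w * math.exp((temp_c - t_ref_c) / leak_scale_c)
        new_temp_c = ambient_c + theta_c_per_w * (p_switching_w + p_leak_w)
        if new_temp_c > t_limit_c:
            return None
        if abs(new_temp_c - temp_c) < tol:
            return new_temp_c
        temp_c = new_temp_c
    return None


good_cooling = settle_temperature_c(10.0, 1.0, theta_c_per_w=1.0)
poor_cooling = settle_temperature_c(10.0, 1.0, theta_c_per_w=6.0)
```

With the same chip and only the cooling changed, the first case settles at a stable temperature while the second never finds one, which is the stew the transcript describes.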

Interconnect Heat

An Integrated Circuit has more than just transistors. There are also interconnects, and these guys get hot too. These metal wires - most often copper - carry data and power around the IC. The integrated circuit will have several layers of these interconnects on top of one another, maybe up to a dozen. These wires also self-heat due to current and resistance. Such heat can accelerate a type of deterioration known as "electromigration". This is where high current densities cause the metal atoms in the interconnects to move around - creating either voids or bumps/bubbles known as hillocks. The latter in some cases can bridge to neighboring wires and create unintended electrical connections. But broadly speaking this heat was not as bad as that experienced by the transistors. And historically it was ameliorated with a good heat sink sitting on the top of the stack. But as always, new technology advances beget new complicating factors. First, there are now more interconnect layers than ever, and they are getting smaller and denser - especially at the very lowest levels closest to the transistors. Thanks EUV! Second, as the wires get thinner, we experience longer delays due to electrical resistance and capacitance. So designers add active devices called "repeaters" to regenerate the signal and drive it faster. These use power too, so they heat up as well. And this can cause problems because modern high performance chips might use a lot of them. Maybe thousands. Third, new materials inserted between the interconnects for electrical purposes do not conduct heat very well. Particularly more recent ones based on exotic polymers. Just like with the aforementioned silicon-on-insulator device, having these new, exotic material layers is going to exacerbate existing heat issues.
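Why designers reach for repeaters can be seen from a first-order wire delay model. In the Elmore approximation, the delay of a distributed RC wire grows with the square of its length, so cutting a long wire into shorter repeated segments pays off even though each repeater adds its own delay (and its own heat). This sketch ignores repeater input capacitance and drive resistance, and the per-millimeter values are hypothetical.

```python
def wire_delay_s(r_ohm_per_mm, c_f_per_mm, length_mm,
                 n_segments=1, repeater_delay_s=0.0):
    """First-order delay of a wire split into equal, repeated segments.

    Each segment is a distributed RC line with Elmore delay 0.5 * r * c * l^2.
    n_segments - 1 repeaters sit between the segments.
    """
    segment_mm = length_mm / n_segments
    per_segment_s = 0.5 * r_ohm_per_mm * c_f_per_mm * segment_mm ** 2
    n_repeaters = n_segments - 1
    return n_segments * per_segment_s + n_repeaters * repeater_delay_s


# Hypothetical 10 mm wire: 1000 ohm/mm, 0.2 pF/mm, 50 ps per repeater.
unbuffered = wire_delay_s(1000.0, 0.2e-12, 10.0)
repeated = wire_delay_s(1000.0, 0.2e-12, 10.0, n_segments=5,
                        repeater_delay_s=50e-12)
```

In this toy case, four repeaters cut the delay from about 10 ns to a little over 2 ns, at the cost of four more heat sources along the wire.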

Packaging Heat

Finally we have packaging, the interface between the silicon and the outside world. Heat absolutely wrecks semiconductor packaging. Often this is because it is made from materials with coefficients of thermal expansion, or CTEs, different from that of the die itself. For example, the die is bonded to a PCB or a package using solder. This makes the solder itself a critical connection point between materials with different CTEs. So when temperatures rise, the two materials on either side of the solder expand or contract at different rates. This in turn can cause the solder to deform and eventually crack. I have covered the heat issues with advanced packaging, which often stacks multiple dies bonded together in various ways. Being so close together, heat is an ever present concern with these stacks. And we have not yet found good solutions for mitigating that at the design phase.
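The CTE mismatch can be estimated with a common first-order formula: the shear strain in a solder joint is roughly the difference in expansion between the two sides, taken at the joint's distance from the center of the die, divided by the joint height. The sketch below assumes that simplified model. The CTE values are approximate textbook figures for silicon and a typical circuit board, and the geometry is hypothetical.

```python
def solder_shear_strain(distance_from_center_m, cte_a_per_c, cte_b_per_c,
                        delta_t_c, joint_height_m):
    """First-order shear strain in a solder joint from CTE mismatch.

    The two bonded materials grow by different amounts over delta_t_c,
    and the joint has to absorb the difference across its height.
    """
    mismatch_m = (distance_from_center_m
                  * abs(cte_a_per_c - cte_b_per_c) * delta_t_c)
    return mismatch_m / joint_height_m


# Assumed values: silicon ~2.6 ppm/C, board ~16 ppm/C, a joint 5 mm from
# the die center, 100 micrometers tall, and a 75 C temperature swing.
strain = solder_shear_strain(5e-3, 2.6e-6, 16e-6, 75.0, 100e-6)
```

That works out to about 5% shear strain per temperature swing in this example, and repeated cycles at that level are how solder deforms and eventually cracks.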

Heat Carriers

Let’s talk design now. The traditional tool for understanding how heat flows within a solid or solids is Fourier's Law. The law is named after the French genius Joseph Fourier. His work mathematically describes how heat naturally spreads out from a hotter area to a cooler one. And that spread goes faster in materials with higher thermal conductivity. The opposite direction doesn't happen without some form of assistance, meaning energy consumption. The law is well known and used for a variety of things from designing heat exchangers to modeling the flows of heat within the earth's crust to modeling how heat spreads from a die to a package. So it is indeed a trusty tool. But at nanometer scale - like inside a transistor - the tool falls apart due to quantum size effects. For example, the rate of heat energy spread and the resulting temperatures end up higher than what is predicted by classical formulas. So we need an alternate system. And that is modeling the movement of individual particles and quasiparticles and how they scatter. What particles and quasiparticles? Glad you asked. In a conducting solid metal, heat is carried by free electrons. Simple enough, though not really. In non-metals like silicon, the job is done by something called phonons. You can best imagine phonons as all the atoms or molecules in a crystal lattice structure vibing together. What does that mean? Silicon crystal has a lattice structure of silicon atoms, meaning they are connected together. Imagine it as kind of like balls attached together with springs. When energy is applied to the lattice, the whole thing starts vibrating. Physicists use the phrase “collective excitation” to describe this and I am strangely reminded of concert fans in the mosh pit when the beat drops. Quantum mechanics lets us treat these "collective excitations" as if they were discrete particles.
This helps us more easily comprehend something that otherwise might be somewhat difficult to grasp - which is the collective actions of an entire atomic lattice. Ergo, phonons do particle things like propagate, collide and interact with one another. We call them quasi-particles. We apply the same concept to electron holes. We treat the hole, which is really just the absence of an electron, as if it were a particle. Though really, it is not. Can't quite wrap your mind around it? Forget it Jake, it's quantum. Just accept it. By the way, interesting connection. The phonon concept was first proposed in 1930 by the Nobel-winning Soviet physicist Igor Tamm. Tamm appeared in another video about the Soviet hydrogen bomb program.
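Going back to Fourier's Law for a moment, here is what the classical picture looks like in code. The first function is the law itself for a flat slab, flux = k * (T_hot - T_cold) / thickness. The second is a simple explicit finite-difference sketch of heat spreading along a 1D bar with its two ends held at fixed temperatures. This is the macroscopic model that the text says breaks down at nanometer scale, so it is shown only as the baseline. The silicon conductivity of roughly 150 W/(m·K) is an approximate bulk value, and the grid parameters are arbitrary.

```python
def heat_flux_w_per_m2(k_w_per_m_k, t_hot, t_cold, thickness_m):
    """Fourier's Law for steady conduction through a slab (hot to cold)."""
    return k_w_per_m_k * (t_hot - t_cold) / thickness_m


def diffuse_1d(temps, alpha, dx, dt, steps):
    """Explicit finite-difference heat diffusion with fixed end temperatures.

    alpha is thermal diffusivity. The scheme is only stable when
    alpha * dt / dx^2 <= 0.5, so that is checked up front.
    """
    r = alpha * dt / dx ** 2
    if r > 0.5:
        raise ValueError("unstable step: need alpha * dt / dx**2 <= 0.5")
    current = list(temps)
    for _ in range(steps):
        updated = current[:]
        for i in range(1, len(current) - 1):
            updated[i] = current[i] + r * (
                current[i + 1] - 2.0 * current[i] + current[i - 1])
        current = updated
    return current


# A bar hot at one end: heat spreads until the profile is a straight line.
profile = diffuse_1d([100.0, 0.0, 0.0, 0.0, 0.0],
                     alpha=1.0, dx=1.0, dt=0.25, steps=500)
flux = heat_flux_w_per_m2(150.0, 100.0, 50.0, 1e-3)
```

The bar settles into a linear gradient from the hot end to the cold end, which is exactly the hot-to-cold spreading the law describes.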

Modeling Nano-heat

As they move, heat energy carriers such as electrons or phonons interact with things like atomic particles, impurities, grain boundaries between substances, nanoparticles, and even other phonons. These collisions are how heat spreads - as I mentioned earlier. At least some types of collisions. Other types of collisions might instead cause feedback effects that impede the movement of other heat carriers - almost like traffic gridlock. Understanding these complicated interactions requires us to simulate the movement and behavior of individual particles and quasi-particles. That is a computationally difficult problem that we have not yet solved. When we are working at the scale of a phonon's average distance of free travel before another interaction - which for silicon is between 2 and 300 nanometers - it is better to turn to the Boltzmann Transport Equation for insight. The equation was first derived by Ludwig Boltzmann in 1872. But since it requires long Monte Carlo simulations and very long computing time, we make simplifications. For example, assuming the charge carriers behave like fluids. Another high potential method that has shown up in recent years is the Non-equilibrium Green’s Function method, NEGF, which relies on a quantum theory known as Many Body Perturbation Theory to simulate electrons, phonons, and photons to varying levels of accuracy. I will just leave us at that.
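One common way to decide when the classical picture stops being trustworthy is to compare the carrier's mean free path with the size of the structure. The ratio is known as the Knudsen number. The sketch below uses the 300-nanometer upper figure quoted above for silicon. The regime thresholds of 0.1 and 10 are conventional rules of thumb assumed here for illustration, and the function names are hypothetical.

```python
def knudsen_number(mean_free_path_m, feature_size_m):
    """Ratio of carrier mean free path to the size of the structure."""
    return mean_free_path_m / feature_size_m


def transport_regime(kn, diffusive_below=0.1, ballistic_above=10.0):
    """Rough classification of heat transport (assumed thresholds).

    diffusive:    many collisions per crossing, Fourier's Law is reasonable
    ballistic:    carriers cross with few collisions, Fourier's Law fails
    transitional: in between, where particle-level models are needed
    """
    if kn < diffusive_below:
        return "diffusive"
    if kn > ballistic_above:
        return "ballistic"
    return "transitional"


# Same phonon mean free path, very different structure sizes.
package_scale = transport_regime(knudsen_number(300e-9, 1e-3))
transistor_scale = transport_regime(knudsen_number(300e-9, 6e-9))
```

At millimeter scale the phonons collide thousands of times on the way across, so averaging them into a smooth heat flow works. At a few nanometers they barely collide at all, which is why the particle-level methods come in.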

Conclusion

In the future, advanced transistors can have gate lengths as short as 6 nanometers. Intel recently demonstrated one such device at the IEEE International Electron Devices Meeting (IEDM) in San Francisco. Combine that with the more complicated structures coming down the pike, and you can see a real growing need to understand how these transistors generate and dissipate heat. We are finally getting to nanoscale sizes. We need to better understand or even take advantage of the quantum phenomena that we are starting to increasingly see. Things are starting to get weird and I am all for it.

Asianometry
 