Intel 13th and 14th gen Core i9 stability problems

Yeah, he is the most reliable Nvidia source on X. I don't know how reliable he is for Intel, but if EMR is hit, there is some issue with Raptor Cove/uncore. RPC is the fastest architecture they ever built, after all, IIRC.
I would recommend waiting for data on Emerald Rapids, though it sounds worth investigating. I was curious though that no one has mentioned problems with high clocked mobile parts which are mounted differently.

I wonder if any of the tech press have emailed Pat G about this directly :).
 
Why not ask this on the investor call? 🤣
 
I'd be surprised if Intel doesn't have a team working on this already. They'd probably be able to reproduce the problem easily enough, but figuring out what's causing it will take more time. In the meantime, they keep their fingers crossed and hope that the mainstream press doesn't pick up on it.
 
With the resources and talent at their disposal, I find it hard to believe they have not figured it out yet. Pretty sure they did but are just staying silent.
 
Honest and timely response is always the best action in such incidents.

Intel can afford to lose one generation of a particular processor's sales, but Intel can't afford to lose several generations of credibility and trust.
 
Apparently the Gamers Nexus investigation and its sources show this info regarding stability:
[Attached images: 1000079929.jpg, 1000079930.jpg]
 
Official Intel Statement: (note they touch on the oxidation theory below)

As per Intel PR Comms:

Based on extensive analysis of Intel Core 13th/14th Gen desktop processors returned to us due to instability issues, we have determined that elevated operating voltage is causing instability issues in some 13th/14th Gen desktop processors. Our analysis of returned processors confirms that the elevated operating voltage is stemming from a microcode algorithm resulting in incorrect voltage requests to the processor.

Intel is delivering a microcode patch which addresses the root cause of exposure to elevated voltages. We are continuing validation to ensure that scenarios of instability reported to Intel regarding its Core 13th/14th Gen desktop processors are addressed. Intel is currently targeting mid-August for patch release to partners following full validation.

Intel is committed to making this right with our customers, and we continue asking any customers currently experiencing instability issues on their Intel Core 13th/14th Gen desktop processors reach out to Intel Customer Support for further assistance.

July 2024 Update on Instability Reports on Intel Core 13th and 14th Gen Desktop Processors - Intel Community

This video by RobeyTech is also helpful and gives some tips on how to check your system to see if you are affected:


So that you don't have to hunt down the answer -> Questions about manufacturing or Via Oxidation as reported by Tech outlets:

Short answer:
We can confirm there was a via Oxidation manufacturing issue (addressed back in 2023) but it is not related to the instability issue.

Long answer: We can confirm that the via Oxidation manufacturing issue affected some early Intel Core 13th Gen desktop processors. However, the issue was root caused and addressed with manufacturing improvements and screens in 2023. We have also looked at it from the instability reports on Intel Core 13th Gen desktop processors and the analysis to-date has determined that only a small number of instability reports can be connected to the manufacturing issue.

For the Instability issue, we are delivering a microcode patch which addresses exposure to elevated voltages which is a key element of the Instability issue. We are currently validating the microcode patch to ensure the instability issues for 13th/14th Gen are addressed.


I'll be on the thread for the next couple of hours trying to address any questions you folks might have. Please keep in mind that I won't be able to answer every question but I'll do my best to address most of them.

Thanks

Lex H. - Intel
 
Opinion from MLID: Intel is just trying to buy time by lowering the voltage because there are not enough CPUs for RMA.
 

This dude is something else, only telling half-baked things. He was caught making videos about fake rumours 🤣🤣 The only good thing is the podcasts with the guests he has.
 
I do wonder though about the scenario where Intel set something that is slowly damaging the CPU, and then fixes it. Now you have a CPU with a shortened lifespan; is that really fair?

AMD did this recently with 7800X3D voltage guidance that wasn't clear, allowing some board makers to set higher SoC voltages. Some 7800X3Ds failed spectacularly, while others likely silently suffered damage. Once they updated the guidance and board makers rolled out BIOSes, there was no offer to exchange your CPU, even though it could have a shorter lifespan than originally.
 
Yeah, some people are used to running their CPU for years and then giving it away. Whether it keeps running for years is something only time can tell, but the people whose chips degraded deserve a replacement. The tray CPUs are the problem, as they only have a one-year warranty vs. three years for boxed, so they may be out of warranty, and you can feel cheated if you bought in six figures.
 
I do wonder though about the scenario where Intel set something that is slowly damaging the CPU, and then fixes it. Now you have a CPU with a shortened lifespan; is that really fair?

AMD did this recently with 7800X3D voltage guidance that wasn't clear, allowing some board makers to set higher SoC voltages. Some 7800X3Ds failed spectacularly, while others likely silently suffered damage. Once they updated the guidance and board makers rolled out BIOSes, there was no offer to exchange your CPU, even though it could have a shorter lifespan than originally.
Yeah, AMD did a much better job handling this. They didn't try to push blame onto their partners and were faster in communicating the problem. Now maybe it really did take this long for Intel to figure it out, but if that is the case they should have been updating customers with their progress.

This dude is something else, only telling half-baked things. He was caught making videos about fake rumours 🤣🤣 The only good thing is the podcasts with the guests he has.
Yeah, the idea that Intel (the company with a bunch of idle Intel 7 fabs, which will presumably be even worse this year) can't make enough RPL-S 8+16+1 dies (which is likely the smallest RPL die by volume) seems far-fetched to me. Also, it is pretty laughable seeing the complete lack of understanding that those RPL-W and RPL-HX chips are the same chips as desktop, or that Intel never said the problem was high power but rather that the CPUs were requesting more voltage than they should have at the respective chip's frequency targets.

Short answer: We can confirm there was a via Oxidation manufacturing issue (addressed back in 2023) but it is not related to the instability issue.

Long answer: We can confirm that the via Oxidation manufacturing issue affected some early Intel Core 13th Gen desktop processors. However, the issue was root caused and addressed with manufacturing improvements and screens in 2023. We have also looked at it from the instability reports on Intel Core 13th Gen desktop processors and the analysis to-date has determined that only a small number of instability reports can be connected to the manufacturing issue.
For the purposes of SemiWiki I think this is the most interesting nugget. Excursions happen to everyone, and what separates the men from the boys is when they are found. Considering all the testing/burn-in that these sorts of products go through, it is hard to imagine that many defective chips could get past without Intel knowing about it. So there are two question marks for me. One is why these chips were sent out if Intel knew about this, and why the allegedly small run of impacted dies was not recalled. Maybe the issue was judged not to be something that would reasonably cause problems, and the microcode causing the chips to request more voltage than they needed made it an issue?

The other question mark is what Intel will do to assure external customers that this won't happen on their products. If it was an intentional call to release these CPUs, then IF needs to demonstrate that enough inline metrology is in place for any issues to be detected inline before material gets to customer sort, and that foundry wafers will be scrapped if there are any issues. It is good business for an IDM to want to salvage every good die it can. However, a foundry customer will demand that their dollars don't buy at-risk material. If this oxidation excursion was an accidental slip, that is a far bigger issue, one that would really have me squirming if I was a customer. If this was something that slipped past all of Intel's quality nets, Intel needs to drop everything and give current and potential customers a roadmap with milestones on how they will rebuild their quality culture from the top up (assuming said customers haven't already asked, which, if we are being honest, they probably would have if this was a failure of IF's quality systems).
 
I haven't followed this in any detail and at a first reading found the Intel press release quite reassuring.

After reading @nghanaywem (above) and re-reading the Intel PR, I'm not so sure:

Short answer: We can confirm there was a via Oxidation manufacturing issue (addressed back in 2023) but it is not related to the instability issue.

Long answer: We can confirm that the via Oxidation manufacturing issue affected some early Intel Core 13th Gen desktop processors. However, the issue was root caused and addressed with manufacturing improvements and screens in 2023. We have also looked at it from the instability reports on Intel Core 13th Gen desktop processors and the analysis to-date has determined that only a small number of instability reports can be connected to the manufacturing issue.

Aren't those statements contradictory?

The first claims they are separate issues. The second states that there is a connection between them.

I'm also uneasy about the claimed fix. If it's oxidation, then manufacturing improvements seem credible. But how is the QA ("screens") change then relevant?

I get the impression - again from a very brief reading - that there are/were at least three technical issues at play here.

@nghanaywem is correct. All tech companies have issues like this from time to time. If you don't, you're not innovating enough. It's how you deal with them that sorts out the good from the bad. You either increase customer trust or erode it.
 
I am also suspicious, because the via oxidation thing was only uncovered by a GN source and then GN sent faulty CPU samples to a lab to be analyzed. Intel were then forced to admit that there was indeed a manufacturing problem, because the lab analysis would have proved the via oxidation claim to be true.
 
I haven't followed this in any detail and at a first reading found the Intel press release quite reassuring.

After reading @nghanaywem (above) and re-reading the Intel PR, I'm not so sure:

Short answer: We can confirm there was a via Oxidation manufacturing issue (addressed back in 2023) but it is not related to the instability issue.

Long answer: We can confirm that the via Oxidation manufacturing issue affected some early Intel Core 13th Gen desktop processors. However, the issue was root caused and addressed with manufacturing improvements and screens in 2023. We have also looked at it from the instability reports on Intel Core 13th Gen desktop processors and the analysis to-date has determined that only a small number of instability reports can be connected to the manufacturing issue.

Aren't those statements contradictory?
Technically yes. My reading between the lines has me less concerned though. To me it reads:
-The root cause is CPUs requesting more voltage from the socket than they should
-Hey, this fab excursion did happen
-The oxidation is some secondary failure that was dependent on the voltage issue to even be a problem, so if we are being technical, some small % of the failures are partially attributable to this
-Oxidation happened to such a small % of the total CPUs that the problem most of you face is highly unlikely to be anything besides the voltage issue
-We are disclosing the oxidation issue because if we said not a single CPU was impacted we would be lying

I could be wrong, but to me that is why the short and long answers don't perfectly match up. It is a case where, for all intents and purposes, the short answer is true, but not technically so, and Intel needs to disclose the full truth so nobody can ever say they lied or hid it.

As for my theory on the oxidation being just a contributing factor rather than the problem itself, my logic supporting this is the following three points:
1) It seems unlikely that large numbers of CPUs could get past die sort or even burn-in without Intel noticing, and if Intel knew and released them anyway, then it follows that they must have thought it wouldn't realistically be an issue in the intended environment over a 10-year lifetime.
2) Even if the CPU is only at 65 W and clocked at 1 GHz, you will fry it if you pump, say, 5 V into the core logic. I don't know exactly what the mechanisms behind high V killing chips are, but my assumption is dielectric breakdown in the ILD. Since interconnects are basically capacitors, if the charge in the wires gets too high you can see electricity arc from one to the other and ruin the insulative properties of the dielectric in between.
3) If the barrier layer is scuffed up on some of the CPUs, then those CPUs should have an easier time arcing, potentially at lower voltages. If it was really bad, maybe some of the Cu migrated into the ILD, and that could make arcing occur at even lower voltages.

If the above is what is happening in the field, then I get why Intel would say that oxidation isn't the root cause, but that some percentage of instability is related to it even if it wasn't technically the instigating event.
The first claims they are separate issues. The second states that there is a connection between them.

I'm also uneasy about the claimed fix. If it's oxidation, then manufacturing improvements seem credible. But how is the QA ("screens") change then relevant?

I get the impression - again from a very brief reading - that there are/were at least three technical issues at play here.
That was more so me commenting on the IF side of the equation. If the issue was caught in sort, that is bad for IF because foundry customers will likely be using their own OSATs. If the issue was caught inline, then the screens are fine, but Intel needs to make sure that suspect external material will be thrown out before it ever reaches customer hands. Think about it like food in your fridge. If I see something a little bit expired, I might smell it and see if there are any growths, and if it looks/smells fine I might eat it to avoid wasting perfectly good food. If a restaurant did the same thing, folks would be outraged: I am paying you and you fed me expired food. When you pay for something, you don't want to pay for something that is maybe good or partially good; you want guaranteed 100% good. Redoing how IF does quality is only needed if the oxidized material would not have been sold had Intel known about it and somehow a large number of dies slipped past quality checks without anyone noticing, as in that instance the current systems are insufficient.
@nghanaywem is correct. All tech companies have issues like this from time to time. If you don't, you're not innovating enough. It's how you deal with them that sorts out the good from the bad. You either increase customer trust or erode it.
While I do think that is true, I was more so talking from the perspective of manufacturing. Sometimes a tool just breaks and starts producing output that is not within control, maybe an engineer wasn't paying attention to their control charts, operator error causes something nonstandard, etc. A good factory will have business processes and systems in place to minimize the occurrence of these events and will always catch them before they leave the factory's walls.
I am also suspicious, because the via oxidation thing was only uncovered by a GN source and then GN sent faulty CPU samples to a lab to be analyzed. Intel were then forced to admit that there was indeed a manufacturing problem, because the lab analysis would have proved the via oxidation claim to be true.
Yeah... no. If this is something that only impacted some of the early RPL material, then it is highly unlikely GN has an oxidized CPU.
 
So, EM or SM?
Stress migration might explain why lower voltages and frequencies don't fix the issue (at least not completely). That said, the microcode fix Intel is going to release might suggest that electromigration is more likely. Are they gambling though? I mean, it is possible that reducing the max current and max voltage of such CPUs might reduce the number of failures quite consistently. But as many mentioned already, is it fine to get parts that might still fail sooner or later? Probably they would rather spread the blame over a longer timeframe for now; maybe they just can't afford to do the right thing immediately. And maybe they hope that the next gen of CPUs will win back full trust. If those "patched" CPUs last over 2 years, the hit they take is marginal in the end. Of course they risk losing their reputation completely if the problem explodes later and the fix is just delaying the inevitable. That's of course the worst-case scenario. I don't think it's gonna happen. EM fits well imo. Probably the node is marginal and definitely not meant to be pushed as hard as they did with the i9 parts. Let's wait and see.
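
For a rough feel of the EM argument above, Black's equation is the usual textbook model for electromigration lifetime: MTTF = A * J^(-n) * exp(Ea / (k*T)), with J the current density and T the absolute temperature. The sketch below only compares two operating points, so the constant A cancels. The exponent n = 2 and the activation energy Ea = 0.9 eV are illustrative assumptions, not Intel data, and the thread gives no mapping from a lower voltage request to an actual current density or temperature, so the inputs in the usage lines are hypothetical.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV_PER_K = 8.617333262e-5


def relative_mttf(j_ratio, temp_c_new, temp_c_ref, n=2.0, ea_ev=0.9):
    """Ratio MTTF_new / MTTF_ref from Black's equation.

    j_ratio    -- J_new / J_ref, the ratio of current densities
    temp_c_new -- interconnect temperature at the new operating point, in C
    temp_c_ref -- interconnect temperature at the reference point, in C
    n, ea_ev   -- ASSUMED model parameters (illustrative, not measured)
    """
    t_new = temp_c_new + 273.15
    t_ref = temp_c_ref + 273.15
    # Current-density term: (J_new / J_ref) ** (-n)
    current_term = j_ratio ** (-n)
    # Arrhenius term: exp(Ea/k * (1/T_new - 1/T_ref))
    thermal_term = math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_new - 1.0 / t_ref))
    return current_term * thermal_term


if __name__ == "__main__":
    # Hypothetical case: 10% lower current density at the same temperature
    print(relative_mttf(0.9, 90.0, 90.0))
    # Hypothetical case: same current density, running 10 C cooler
    print(relative_mttf(1.0, 80.0, 90.0))
```

With n = 2, a 10% cut in current density at constant temperature works out to roughly a 1.23x longer modeled lifetime, which is why a voltage fix can help an EM-limited part without undoing wear that has already happened.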
 