From: "Tim Mouraveiko"
Organization: IPCopper, Inc.
To: Pavel Machek
CC: linux-kernel@vger.kernel.org
Date: Thu, 04 Jan 2018 17:21:36 -0800
Subject: Re: Bricked x86 CPU with software?
Message-ID: <5A4ED320.12153.79B887@tim.ml.ipcopper.com>
In-reply-to: <20180104224018.GA20860@amd>
References: <5A4D7986.2138.FDC590CF@tim.ml.ipcopper.com>, <5A4EA724.8051.25FC07E@tim.ml.ipcopper.com>, <20180104224018.GA20860@amd>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Thu 2018-01-04 14:13:56, Tim Mouraveiko wrote:
> > > > As I mentioned before, I repeatedly and fully power-cycled the motherboard and reset BIOS
> > > > and etc. It made no difference. I can see that the processor was not drawing any power. The
> > > > software code behaved in a similar fashion on other processors, until I fixed it so that it would
> > > > not kill any more processors.
> > > >
> > >
> > > So you have code that killed more than one processor? Save it! We want
> > > a copy.
> > >
> > > Do you have model numbers of affected CPUs?
> > >
> >
> > Why would you want a copy? Last time I checked bricked CPUs do not work well, even as
> > decorations.
> >
> > I believe the processors were Intel Xeon series. The code would likely run on others too.
>
> Well... Intel's shares are overpriced, and you have code to fix that
> :-).
>
> Actually... I don't think your code works. That's why I'm curious.
> But if it works, it's rather big news... and I'm sure Intel and cloud
> providers are going to be interested.
>

I first discovered this issue over a year ago, quite by accident. I changed the code I was working on so as not to kill the CPU (as that is not what I was trying to do). We made Intel aware of it. They didn't care much; one of their people suggested that they already knew about it (whether this is true or not I couldn't say). It popped up again later, so I had to fix the code again. It could be a buggy implementation of a certain x86 functionality, but I left it at that because I had better things to do with my time.

Now this news came up about Meltdown and Spectre, and I was curious whether anyone else had experienced a CPU killed by software, too. Meltdown and Spectre are undeniably a problem, but their magnitude and practicality are questionable.

I suspect that what I discovered is either a kill switch, or an unintentional flaw that was introduced when the original feature was built into x86 and has kept propagating through successive generations of processors, or it could well be that I have a very destructive and targeted solar flare that is after my CPUs. So, I figured I would put the question out there, to see if anyone else had a similar experience.

Putting the solar flare idea aside, I can't conclusively say whether it is a flaw or a feature. Both options are supported at this time by my observations of the CPU behavior.