From: "Tim Mouraveiko"
Organization: IPCopper, Inc.
To: james harvey
CC: Pavel Machek, kernel list
Date: Fri, 05 Jan 2018 17:00:27 -0800
Subject: Re: Bricked x86 CPU with software?
Message-ID: <5A501FAB.5052.58CB6DD@tim.ml.ipcopper.com>

> On Thu, Jan 4, 2018 at 4:00 PM, Tim Mouraveiko wrote:
> > Pavel,
> >
> > As I mentioned before, I repeatedly and fully power-cycled the motherboard and reset BIOS
> > and etc. It made no difference. I can see that the processor was not drawing any power. The
> > software code behaved in a similar fashion on other processors, until I fixed it so that it would
> > not kill any more processors.
> >
> > In case you are curious there was no overheating, no 100% utilization, no tampering with
> > hardware (GPIO pins or anything of that sort), no overclocking and etc. No hardware issues
> > or changes at all.
> >
> > Tim
>
> To clarify, by "in a similar fashion on other processors", do you
> actually mean you consistently bricked multiple CPUs using the same
> code? Or, was it just this one CPU that bricked, and it was just
> acting buggy on other processors?
>
> Unless you consistently bricked multiples, my bet is coincidence.
> In your original post, "There were signs that something was not right,
> that the code was causing unusual behavior, which is what I was
> debugging." makes me think it was a defective CPU but still
> functional, and died as you were debugging/running the buggy code.

We live and we die by coincidence. The processor was functioning fine without the code. It
showed no signs of any problems. I had run a prior version of the code, then ran it without any
of that code and it was fine.

As I launched the nth version of the code, I thought of something and made another change. As I
turned around to install it, the screen was showing that it had just executed that nth version
of the code and then didn't progress any further. I was actually glad it froze because I was
able to gather the results of the execution of the code, which I needed for fine-tuning. It was
only after hitting the reset button several times that it occurred to me that there was
something wrong because the screen remained static.

I had added the code in hopes of speeding up the catching of a bug (that I caught later without
that code). The code made other processors behave the same way. I did not mean that I
consistently bricked processors - I removed the code entirely to avoid exactly that.