From: james harvey
Date: Thu, 4 Jan 2018 20:51:27 -0500
Subject: Re: Bricked x86 CPU with software?
To: Tim Mouraveiko
Cc: Pavel Machek, kernel list

On Thu, Jan 4, 2018 at 4:00 PM, Tim Mouraveiko wrote:
> Pavel,
>
> As I mentioned before, I repeatedly and fully power-cycled the motherboard and reset BIOS
> and etc. It made no difference. I can see that the processor was not drawing any power. The
> software code behaved in a similar fashion on other processors, until I fixed it so that it would
> not kill any more processors.
>
> In case you are curious there was no overheating, no 100% utilization, no tampering with
> hardware (GPIO pins or anything of that sort), no overclocking and etc. No hardware issues
> or changes at all.
>
> Tim

To clarify, by "in a similar fashion on other processors", do you
actually mean you consistently bricked multiple CPUs using the same
code? Or, was it just this one CPU that bricked, and it was just
acting buggy on other processors? Unless you consistently bricked
multiples, my bet is coincidence.
In your original post, "There were signs that something was not right, that the code was causing unusual behavior, which is what I was debugging." makes me think the CPU was defective but still functional, and that it died while you were debugging/running the buggy code.