From: "Luck, Tony"
To: Borislav Petkov, Prarit Bhargava
CC: Naoya Horiguchi, Vivek Goyal, "linux-kernel@vger.kernel.org", Junichi Nomura, Kiyoshi Ueda
Subject: RE: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
Date: Fri, 27 Feb 2015 18:27:16 +0000
Message-ID: <3908561D78D1C84285E8C5FCA982C28F329F18A7@ORSMSX114.amr.corp.intel.com>
In-Reply-To: <20150227120648.GA3337@pd.tnic>

> When CR4.MCE=0b and an MCE happens, it will shutdown the system, at
> least on Intel, according to Tony

I checked with the architects ... and I was right. If you clear CR4.MCE
you'll still see the machine check - and you'll pull the big system reset
lever.
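The C model below makes the hardware behaviour described above concrete. It is a hedged sketch, not kernel code. The names model_mce_delivery, enum mce_outcome, CR4_MCE and MCG_STATUS_MCIP are hypothetical. Only the bit positions come from the Intel SDM: CR4.MCE is bit 6 and IA32_MCG_STATUS.MCIP is bit 2.

The model also checks MCIP. Per the SDM, a second machine check that arrives while MCIP is still set sends the processor into shutdown as well. That is the state an unhandled CPU would be left in if it never cleared MCG_STATUS.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit positions (Intel SDM); the macro names are illustrative. */
#define CR4_MCE         (1ULL << 6)   /* CR4.MCE: machine-check enable */
#define MCG_STATUS_MCIP (1ULL << 2)   /* IA32_MCG_STATUS.MCIP: MC in progress */

static_assert(CR4_MCE == 0x40, "CR4.MCE is bit 6");

/* Hypothetical labels for the two outcomes modelled here. */
enum mce_outcome {
	MCE_DELIVER_EXCEPTION,	/* #MC (vector 18) is raised on the CPU */
	MCE_SHUTDOWN		/* processor shuts down: the "big system reset lever" */
};

/*
 * Simplified model of what happens when a machine check is signalled
 * to a CPU with the given CR4 and IA32_MCG_STATUS contents. Everything
 * except these two conditions is ignored.
 */
static enum mce_outcome model_mce_delivery(uint64_t cr4, uint64_t mcg_status)
{
	/* Clearing CR4.MCE does not hide the event, it escalates it. */
	if (!(cr4 & CR4_MCE))
		return MCE_SHUTDOWN;
	/* A machine check while MCIP is still set also ends in shutdown. */
	if (mcg_status & MCG_STATUS_MCIP)
		return MCE_SHUTDOWN;
	return MCE_DELIVER_EXCEPTION;
}
```

With CR4.MCE set and MCG_STATUS clear, model_mce_delivery() reports a normal #MC. Clearing either condition, or leaving MCIP set, yields MCE_SHUTDOWN.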
If you think the other cpus can survive the reset - then the right thing
to do is to have any offline cpus that show up in the machine check
handler just clear MCG_STATUS and return:

do_machine_check()
{
	/*
	 * offline cpus may show up for the party - but don't need
	 * to do anything here - send them back home
	 */
	if (!cpu_online(smp_processor_id())) {
		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
		return;
	}

	/* ... rest of the existing handler ... */
}

If we are crashing because of a machine check - I wonder how useful
it is to run kdump(). There is a very small set of ways that you can
induce a machine check from program action - normally the problem is
that something bad happened in the h/w ... a kdump will just fill your
disk and waste your time looking at what the s/w was doing when the
machine check happened.

-Tony
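Below is a userspace simulation of the suggested early return. It assumes a hypothetical 4-CPU system in which CPU 2 is offline. The names sim_cpu_online, sim_mcg_status, sim_events_handled, sim_broadcast_mce() and sim_do_machine_check() are made-up stand-ins for kernel state and for the real do_machine_check(). It only shows the shape of the policy: the offline CPU leaves with MCG_STATUS (and so MCIP) cleared, and it does no other work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SIM_CPUS 4

#define MCG_STATUS_RIPV (1ULL << 0)   /* restart IP valid */
#define MCG_STATUS_MCIP (1ULL << 2)   /* machine check in progress */

/* Hypothetical simulated state, not real kernel data structures. */
static bool sim_cpu_online[NR_SIM_CPUS] = { true, true, false, true };
static uint64_t sim_mcg_status[NR_SIM_CPUS];	/* per-CPU IA32_MCG_STATUS */
static int sim_events_handled;			/* events processed by online CPUs */

/* Simulated hardware: a broadcast machine check hits every CPU. */
static void sim_broadcast_mce(void)
{
	for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++)
		sim_mcg_status[cpu] = MCG_STATUS_RIPV | MCG_STATUS_MCIP;
}

/* Simulated handler implementing the "send offline cpus home" policy. */
static void sim_do_machine_check(int cpu)
{
	assert(cpu >= 0 && cpu < NR_SIM_CPUS);

	if (!sim_cpu_online[cpu]) {
		/* stands in for mce_wrmsrl(MSR_IA32_MCG_STATUS, 0) */
		sim_mcg_status[cpu] = 0;
		return;
	}

	/* Placeholder for the normal handling path on online CPUs. */
	sim_events_handled++;
	/* The simulated online path also clears MCG_STATUS once done. */
	sim_mcg_status[cpu] = 0;
}
```

Calling sim_broadcast_mce() and then sim_do_machine_check() for CPUs 0 to 3 leaves sim_events_handled at 3 and every sim_mcg_status entry at 0. Only the three online CPUs count the event.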