Date: Wed, 7 Aug 2019 15:07:33 +0200 (CEST)
From: Thomas Gleixner
To: Pingfan Liu
cc: Dave Young, Andy Lutomirski, x86@kernel.org, Ingo Molnar,
    Borislav Petkov, "H.
    Peter Anvin", Dave Hansen, Peter Zijlstra, Masami Hiramatsu, Qian Cai,
    Vlastimil Babka, Daniel Drake, Jacob Pan, Michal Hocko, Eric Biederman,
    linux-kernel@vger.kernel.org, Baoquan He, kexec@lists.infradead.org,
    tony.luck@intel.com, Xunlei Pang
Subject: Re: [PATCH 0/4] x86/mce: protect nr_cpus from rebooting by broadcast mce
In-Reply-To: <20190807075226.GA10392@mypc>
References: <1564995539-29609-1-git-send-email-kernelfans@gmail.com>
    <20190807025843.GA4776@dhcp-128-65.nay.redhat.com>
    <20190807075226.GA10392@mypc>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 7 Aug 2019, Pingfan Liu wrote:
> On Wed, Aug 07, 2019 at 11:00:41AM +0800, Dave Young wrote:
> > Add Tony and Xunlei in cc.
> > On 08/05/19 at 04:58pm, Pingfan Liu wrote:
> > > This series include two related groups:
> > > [1-3/4]: protect nr_cpus from rebooting by broadcast mce
> > > [4/4]: improve "kexec -l" robustness against broadcast mce
> > >
> > > When I tried to fix [1], Thomas raised concern about the nr_cpus' vulnerability
> > > to unexpected rebooting by broadcast mce. After analysis, I think only the
> > > following first case suffers from the rebooting by broadcast mce. [1-3/4] aims
> > > to fix that issue.
> >
> > I did not understand and read the MCE details, but we previously had a
> > MCE problem, Xunlei fixed in below commit:
> > commit 5bc329503e8191c91c4c40836f062ef771d8ba83
> > Author: Xunlei Pang
> > Date: Mon Mar 13 10:50:19 2017 +0100
> >
> >     x86/mce: Handle broadcasted MCE gracefully with kexec
> >
> > I wonder if this is same issue or not.
> > Also the old discussion is in
> > below thread:
> > https://lore.kernel.org/patchwork/patch/753530/
> >
> > Tony raised similar questions, but I'm not sure if it is still a problem
> > or it has been fixed.
> >
>
> Xunlei's patch is the precondition of the stability for the case 2: boot
> up by "kexec -p nr_cpus="

Correct. The only dangerous issue which is then left is that an MCE hits
_before_ all CPUs have CR4.MCE=1 set. That's a general issue also for cold
boot. Thanks to the brilliant hardware design, all we can do is pray....

> For case1/3, extra effort is needed.
>
> Thanks,
> Pingfan
> > >
> > > *** Back ground ***
> > >
> > > On x86 it's required to have all logical CPUs set CR4.MCE=1. Otherwise, a
> > > broadcast MCE observing CR4.MCE=0b on any core will shutdown the machine.

Pingfan, please trim your replies and remove the useless gunk after your
answer.

Thanks,

	tglx
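
For readers following the thread, the CR4.MCE condition quoted above can be
written down as a small model. This is only an illustrative user-space
sketch, not kernel code: the function name broadcast_mce_is_fatal() and the
array of per-CPU CR4 values are hypothetical and appear nowhere in the
series. The one hard fact it relies on is that CR4.MCE is bit 6 of CR4.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CR4.MCE (Machine-Check Enable) is bit 6 of CR4. */
#define CR4_MCE (UINT64_C(1) << 6)

/*
 * Hypothetical model, not kernel code: cr4[] holds one CR4 value per
 * logical CPU. A broadcast MCE reaches every logical CPU, so a single
 * CPU that still has CR4.MCE=0 is enough to shut the machine down.
 */
static bool broadcast_mce_is_fatal(const uint64_t *cr4, size_t nr_cpus)
{
	for (size_t i = 0; i < nr_cpus; i++) {
		if (!(cr4[i] & CR4_MCE))
			return true;	/* this CPU cannot take the MCE */
	}
	return false;			/* every CPU has CR4.MCE=1 */
}
```

In this model, the window discussed above (an MCE arriving before all CPUs
have set CR4.MCE=1) corresponds to calling the function while at least one
entry in cr4[] still has bit 6 clear.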