Subject: Re: [PATCH 2/2] x86/numa: instance all parsed numa node
From: Andy Lutomirski
Date: Tue, 9 Jul 2019 07:34:08 -0600
Cc: Thomas Gleixner, x86@kernel.org, Michal Hocko, Dave Hansen, Mike Rapoport, Tony Luck, Andy Lutomirski, Peter Zijlstra, Ingo Molnar, Borislav Petkov, "H.
Peter Anvin", Andrew Morton, Vlastimil Babka, Oscar Salvador, Pavel Tatashin, Mel Gorman, Benjamin Herrenschmidt, Michael Ellerman, Stephen Rothwell, Qian Cai, Barret Rhoden, Bjorn Helgaas, David Rientjes, linux-mm@kvack.org, LKML
Message-Id: <4AF3459B-28F2-425F-8E4B-40311DEF30C6@amacapital.net>
References: <1562300143-11671-1-git-send-email-kernelfans@gmail.com> <1562300143-11671-2-git-send-email-kernelfans@gmail.com>
To: Pingfan Liu
X-Mailing-List: linux-kernel@vger.kernel.org

> On Jul 9, 2019, at 1:24 AM, Pingfan Liu wrote:
>
>> On Tue, Jul 9, 2019 at 2:12 PM Thomas Gleixner wrote:
>>
>>> On Tue, 9 Jul 2019, Pingfan Liu wrote:
>>>> On Mon, Jul 8, 2019 at 5:35 PM Thomas Gleixner wrote:
>>>> It can and it does.
>>>>
>>>> That's the whole point why we bring up all CPUs in the 'nosmt' case and
>>>> shut the siblings down again after setting CR4.MCE. Actually that's in fact
>>>> a 'let's hope no MCE hits before that happened' approach, but that's all we
>>>> can do.
>>>>
>>>> If we don't do that then the MCE broadcast can hit a CPU which has some
>>>> firmware initialized state. The result can be a full system lockup, triple
>>>> fault etc.
>>>>
>>>> So when the MCE hits a CPU which is still in the crashed kernel lala state,
>>>> then all hell breaks loose.
>>> Thank you for the comprehensive explanation. With your guidance, I now have
>>> a full understanding of the issue.
>>>
>>> But when I tried to add something to enable CR4.MCE in
>>> crash_nmi_callback(), I realized that it is not doable in some cases (if
>>> crashed, we will not ask an offline SMT CPU to come online), and it is also
>>> needless. "kexec -l/-p" takes advantage of the CPU state in the
>>> first kernel, where all logical CPUs have CR4.MCE=1.
>>>
>>> So kexec is exempt from this bug if the first kernel already does it.
>>
>> No.
>> If the MCE broadcast is handled by a CPU which is stuck in the old
>> kernel stop loop, then it will execute on the old kernel and eventually run
>> into the memory corruption which crashed the old one.
>>
> Yes, you are right. A stuck CPU may execute the old do_machine_check()
> code. But I just found out that we have
> do_machine_check()->__mc_check_crashing_cpu() to guard against this case.
>
> And I think the MCE issue with nr_cpus is not closely related to
> this series; it can be a separate issue. I was wondering whether Andy
> will take it; if not, I am glad to do it.
>
>

Go for it. I’m not familiar enough with the SMP boot stuff that I would be able to do it any faster than you. I’ll gladly help review it.
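
For readers following the thread, the ordering Thomas describes for the 'nosmt' case (bring every CPU up, set CR4.MCE, then shut the siblings down again) can be sketched as a toy user-space model, and contrasted with a boot that only ever brings up a limited number of CPUs, as with nr_cpus. This is not kernel code. The struct toy_cpu type and the toy_* functions are hypothetical names invented for illustration, and the model assumes only what the thread states: a CPU that was never brought up by the kernel has no CR4.MCE set when a broadcast MCE arrives.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for this toy model; not a kernel structure. */
struct toy_cpu {
    bool online;      /* currently running kernel code */
    bool cr4_mce;     /* models CR4.MCE having been set by the kernel */
    bool smt_sibling; /* secondary hardware thread of a core */
};

/*
 * 'nosmt'-style boot as described in the thread: bring every CPU up once
 * so that each one sets CR4.MCE, then take the SMT siblings offline again.
 * In this model the bit stays set on the offlined siblings.
 */
static void toy_boot_nosmt(struct toy_cpu *cpus, int n)
{
    for (int i = 0; i < n; i++) {
        cpus[i].online = true;
        cpus[i].cr4_mce = true;
    }
    for (int i = 0; i < n; i++) {
        if (cpus[i].smt_sibling)
            cpus[i].online = false;
    }
}

/*
 * Boot that only brings up the first 'limit' CPUs (an nr_cpus-like cap).
 * The remaining CPUs are never touched and keep their initial state.
 */
static void toy_boot_limited(struct toy_cpu *cpus, int n, int limit)
{
    for (int i = 0; i < n && i < limit; i++) {
        cpus[i].online = true;
        cpus[i].cr4_mce = true;
    }
}

/*
 * Count the CPUs that a broadcast MCE would find without CR4.MCE set,
 * i.e. still in firmware-initialized state in this model.
 */
static int toy_unsafe_mce_targets(const struct toy_cpu *cpus, int n)
{
    int unsafe = 0;

    for (int i = 0; i < n; i++) {
        if (!cpus[i].cr4_mce)
            unsafe++;
    }
    return unsafe;
}
```

In this model, toy_boot_nosmt() leaves no unsafe targets even though the siblings end up offline, while toy_boot_limited() leaves every CPU above the cap as an unsafe target. The model does not cover the window before CR4.MCE is set, which Thomas calls the 'let's hope no MCE hits before that happened' part.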