Subject: Re: [PATCH 2/2] x86/numa: instance all parsed numa node
From: Andy Lutomirski
Date: Mon, 8 Jul 2019 11:53:30 -0600
To: Thomas Gleixner
Cc: Pingfan Liu, x86@kernel.org, Michal Hocko, Dave Hansen, Mike Rapoport,
    Tony Luck, Andy Lutomirski, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
    "H. Peter Anvin", Andrew Morton, Vlastimil Babka, Oscar Salvador,
    Pavel Tatashin, Mel Gorman, Benjamin Herrenschmidt, Michael Ellerman,
    Stephen Rothwell, Qian Cai, Barret Rhoden, Bjorn Helgaas, David Rientjes,
    linux-mm@kvack.org, LKML
Message-Id: <18D4CC9F-BC2C-4C82-873E-364CD1795EFB@amacapital.net>
References: <1562300143-11671-1-git-send-email-kernelfans@gmail.com>
    <1562300143-11671-2-git-send-email-kernelfans@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Jul 8, 2019, at 3:35 AM, Thomas Gleixner wrote:
>
>> On Mon, 8 Jul 2019, Pingfan Liu wrote:
>>> On Mon, Jul 8, 2019 at 3:44 AM Thomas Gleixner wrote:
>>>
>>>> On Fri, 5 Jul 2019, Pingfan Liu wrote:
>>>>
>>>> I hit a bug on an AMD machine with the kexec -l nr_cpus=4 option. The
>>>> nr_cpus option is used to speed up the kdump process, so it is not a
>>>> rare case.
>>>
>>> But fundamentally wrong, really.
>>>
>>> The rest of the CPUs are in a half baked state and any broadcast event,
>>> e.g. MCE or a stray IPI, will result in an undiagnosable crash.
>> I would appreciate it if you could elaborate on that. I tried to figure
>> out your point, but failed.
>>
>> For "a half baked state", I think you are concerned about the LAPIC
>> state, and I expand on this point as follows:
>
> It's not only the APIC state. It's the state of the CPUs in general.
>
>> For IPI: when the capture kernel BSP is up, the rest of the CPUs are
>> still looping inside crash_nmi_callback(), so there is no way to send a
>> new IPI from these CPUs. Also we call disable_local_APIC(), which
>> effectively prevents the LAPIC from responding to IPIs, except
>> NMI/INIT/SIPI, which will not occur in the crash case.
>
> Fair enough for the IPI case.
>
>> For MCE, I am not sure whether it can be broadcast between CPUs, but to
>> my understanding it cannot. Then is it a problem?
>
> It can and it does.
>
> That's the whole point why we bring up all CPUs in the 'nosmt' case and
> shut the siblings down again after setting CR4.MCE. Actually that's in fact
> a 'let's hope no MCE hits before that happened' approach, but that's all we
> can do.
>
> If we don't do that then the MCE broadcast can hit a CPU which has some
> firmware initialized state. The result can be a full system lockup, triple
> fault etc.
>
> So when the MCE hits a CPU which is still in the crashed kernel lala state,
> then all hell breaks loose.
>
>> From another viewpoint, is there any difference between nr_cpus=1 and
>> nr_cpus > 1 in the crashing case? If a stray IPI is an issue for
>> nr_cpus > 1, it is for nr_cpus=1 too.
>
> Anything less than the actual number of present CPUs is problematic unless
> you use the 'let's hope nothing happens' approach. We could add an option
> to stop the bringup at the early online state similar to what we do for
> 'nosmt'.
>
>

How about we change nr_cpus to do that instead so we never have to have this
conversation again?
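
For illustration only, here is a toy model of the idea discussed above: every present CPU is taken through an early bringup step that sets CR4.MCE, and only the first nr_cpus of them go fully online. This is a sketch under stated assumptions, not kernel code. Every name in it (toy_cpu, toy_early_bringup, toy_bringup, toy_mce_broadcast_safe) is hypothetical and corresponds to no real kernel function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU model; none of these names are kernel APIs. */
enum cpu_state {
    CPU_FIRMWARE,      /* untouched, still in firmware-initialized state */
    CPU_EARLY_ONLINE,  /* early bringup done, then parked */
    CPU_ONLINE         /* fully online and usable by the scheduler */
};

struct toy_cpu {
    enum cpu_state state;
    bool cr4_mce;      /* models CR4.MCE being set on this CPU */
};

/* Early bringup step: leave firmware state and enable MCE handling. */
void toy_early_bringup(struct toy_cpu *cpu)
{
    cpu->cr4_mce = true;
    cpu->state = CPU_EARLY_ONLINE;
}

/*
 * Take every present CPU through early bringup, but fully online only
 * the first nr_cpus of them. Returns the number of fully online CPUs.
 */
size_t toy_bringup(struct toy_cpu *cpus, size_t present, size_t nr_cpus)
{
    size_t online = 0;

    for (size_t i = 0; i < present; i++) {
        toy_early_bringup(&cpus[i]);
        if (i < nr_cpus) {
            cpus[i].state = CPU_ONLINE;
            online++;
        }
    }
    return online;
}

/* True if a broadcast MCE would find CR4.MCE set on every present CPU. */
bool toy_mce_broadcast_safe(const struct toy_cpu *cpus, size_t present)
{
    for (size_t i = 0; i < present; i++) {
        if (!cpus[i].cr4_mce)
            return false;
    }
    return true;
}
```

With 8 present CPUs and nr_cpus=4, toy_bringup() leaves CPUs 0-3 online and CPUs 4-7 parked in the early online state, all with the modelled CR4.MCE set. The model leaves out the window before the early step runs, which is the 'let's hope no MCE hits before that happened' part.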