Date: Mon, 8 Jul 2019 11:35:35 +0200 (CEST)
From: Thomas Gleixner
To: Pingfan Liu
cc: x86@kernel.org, Michal Hocko, Dave Hansen, Mike Rapoport, Tony Luck, Andy Lutomirski, Peter Zijlstra, Ingo Molnar, Borislav Petkov, "H.
Peter Anvin", Andrew Morton, Vlastimil Babka, Oscar Salvador, Pavel Tatashin, Mel Gorman, Benjamin Herrenschmidt, Michael Ellerman, Stephen Rothwell, Qian Cai, Barret Rhoden, Bjorn Helgaas, David Rientjes, linux-mm@kvack.org, LKML
Subject: Re: [PATCH 2/2] x86/numa: instance all parsed numa node
In-Reply-To:
Message-ID:
References: <1562300143-11671-1-git-send-email-kernelfans@gmail.com> <1562300143-11671-2-git-send-email-kernelfans@gmail.com>
User-Agent: Alpine 2.21 (DEB 202 2017-01-01)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Linutronix-Spam-Score: -1.0
X-Linutronix-Spam-Level: -
X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 8 Jul 2019, Pingfan Liu wrote:
> On Mon, Jul 8, 2019 at 3:44 AM Thomas Gleixner wrote:
> >
> > On Fri, 5 Jul 2019, Pingfan Liu wrote:
> >
> > > I hit a bug on an AMD machine, with kexec -l nr_cpus=4 option. nr_cpus option
> > > is used to speed up kdump process, so it is not a rare case.
> >
> > But fundamentally wrong, really.
> >
> > The rest of the CPUs are in a half baken state and any broadcast event,
> > e.g. MCE or a stray IPI, will result in a undiagnosable crash.
> Very appreciate if you can pay more word on it? I tried to figure out
> your point, but fail.
>
> For "a half baked state", I think you concern about LAPIC state, and I
> expand this point like the following:

It's not only the APIC state. It's the state of the CPUs in general.

> For IPI: when capture kernel BSP is up, the rest cpus are still loop
> inside crash_nmi_callback(), so there is no way to eject new IPI from
> these cpu. Also we disable_local_APIC(), which effectively prevent the
> LAPIC from responding to IPI, except NMI/INIT/SIPI, which will not
> occur in crash case.

Fair enough for the IPI case.

> For MCE, I am not sure whether it can broadcast or not between cpus,
> but as my understanding, it can not. Then is it a problem?

It can and it does.

That's the whole point why we bring up all CPUs in the 'nosmt' case and
shut the siblings down again after setting CR4.MCE. That is in fact a
'let's hope no MCE hits before that happened' approach, but that's all we
can do.

If we don't do that then the MCE broadcast can hit a CPU which has some
firmware initialized state. The result can be a full system lockup, triple
fault etc.

So when the MCE hits a CPU which is still in the crashed kernel lala state,
then all hell breaks loose.

> From another view point, is there any difference between nr_cpus=1 and
> nr_cpus> 1 in crashing case? If stray IPI raises issue to nr_cpus>1,
> it does for nr_cpus=1.

Anything less than the actual number of present CPUs is problematic unless
you use the 'let's hope nothing happens' approach. We could add an option
to stop the bringup at the early online state similar to what we do for
'nosmt'.

Thanks,

	tglx
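
For illustration only, the MCE argument above can be put into a toy model.
This is not kernel code: the names Cpu, boot() and broadcast_mce() are
hypothetical, and the only hardware fact assumed is that a machine check
arriving on a CPU with CR4.MCE clear cannot be handled by that CPU. The
sketch compares a plain nr_cpus-limited boot with a 'nosmt'-style bringup
in which every present CPU sets CR4.MCE before the surplus ones are parked.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    cpu_id: int
    cr4_mce: bool  # True once the running kernel has set CR4.MCE on this CPU


def boot(present, nr_cpus, nosmt_style):
    """Model a boot with `present` CPUs of which only `nr_cpus` stay online.

    nosmt_style=True models the approach described in the mail: every CPU is
    brought up far enough to set CR4.MCE, then the surplus CPUs are shut
    down again. nosmt_style=False models a CPU that is never touched and
    keeps whatever state the firmware or the crashed kernel left behind.
    """
    cpus = []
    for i in range(present):
        stays_online = i < nr_cpus
        cpus.append(Cpu(i, cr4_mce=stays_online or nosmt_style))
    return cpus


def broadcast_mce(cpus):
    """Return the ids of CPUs that cannot handle a broadcast machine check."""
    return [c.cpu_id for c in cpus if not c.cr4_mce]


if __name__ == "__main__":
    print("nr_cpus=4 of 8, plain:      ", broadcast_mce(boot(8, 4, False)))
    print("nr_cpus=4 of 8, nosmt-style:", broadcast_mce(boot(8, 4, True)))
```

In this model any non-empty result stands for the lockup or triple fault
case; it is empty only when all present CPUs were initialized at least once.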