From: Pingfan Liu
Date: Tue, 9 Jul 2019 15:24:30 +0800
Subject: Re: [PATCH 2/2] x86/numa: instance all parsed numa node
To: Thomas Gleixner
Cc: x86@kernel.org, Michal Hocko, Dave Hansen, Mike Rapoport, Tony Luck,
    Andy Lutomirski, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
    "H. Peter Anvin", Andrew Morton, Vlastimil Babka, Oscar Salvador,
    Pavel Tatashin, Mel Gorman, Benjamin Herrenschmidt, Michael Ellerman,
    Stephen Rothwell, Qian Cai, Barret Rhoden, Bjorn Helgaas,
    David Rientjes, linux-mm@kvack.org, LKML
References: <1562300143-11671-1-git-send-email-kernelfans@gmail.com>
    <1562300143-11671-2-git-send-email-kernelfans@gmail.com>

On Tue, Jul 9, 2019 at 2:12 PM Thomas Gleixner wrote:
>
> On Tue, 9 Jul 2019, Pingfan Liu wrote:
> > On Mon, Jul 8, 2019 at 5:35 PM Thomas Gleixner wrote:
> > > It can and it does.
> > >
> > > That's the whole point why we bring up all CPUs in the 'nosmt' case and
> > > shut the siblings down again after setting CR4.MCE. Actually that's in fact
> > > a 'let's hope no MCE hits before that happened' approach, but that's all we
> > > can do.
> > >
> > > If we don't do that then the MCE broadcast can hit a CPU which has some
> > > firmware initialized state. The result can be a full system lockup, triple
> > > fault etc.
> > >
> > > So when the MCE hits a CPU which is still in the crashed kernel lala state,
> > > then all hell breaks lose.
> > Thank you for the comprehensive explain. With your guide, now, I have
> > a full understanding of the issue.
> >
> > But when I tried to add something to enable CR4.MCE in
> > crash_nmi_callback(), I realized that it is undo-able in some case (if
> > crashed, we will not ask an offline smt cpu to online), also it is
> > needless. "kexec -l/-p" takes the advantage of the cpu state in the
> > first kernel, where all logical cpu has CR4.MCE=1.
> >
> > So kexec is exempt from this bug if the first kernel already do it.
>
> No. If the MCE broadcast is handled by a CPU which is stuck in the old
> kernel stop loop, then it will execute on the old kernel and eventually run
> into the memory corruption which crashed the old one.
>
Yes, you are right. A stuck CPU may execute the old do_machine_check()
code. But I just found out that do_machine_check() calls
__mc_check_crashing_cpu() to guard against this case.

I also think the MCE issue with nr_cpus is not closely related to this
series and can be handled as a separate issue. I wonder whether Andy
will take it; if not, I am glad to do it.

Thanks and regards,
Pingfan