From: Thomas Gleixner
To: Pingfan Liu
Cc: linux-kernel@vger.kernel.org, Eric Biederman, Peter Zijlstra, Valentin Schneider, Vincent Donnefort, Ingo Molnar, Mark Rutland, YueHaibing, Baokun Li, Randy Dunlap, Baoquan He, kexec@lists.infradead.org
Subject: Re: [PATCHv3 1/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable
References: <20220509041305.15056-1-kernelfans@gmail.com> <20220509041305.15056-2-kernelfans@gmail.com> <87ee13rn52.ffs@tglx>
Date: Tue, 10 May 2022 10:28:11 +0200
Message-ID: <87y1z9pzac.ffs@tglx>

On Tue, May 10 2022 at 11:38, Pingfan Liu wrote:
> On Mon, May 09, 2022 at 12:55:21PM +0200, Thomas Gleixner wrote:
>> On Mon, May 09 2022 at 12:13, Pingfan Liu wrote:
>> > The following code chunk repeats in both
>> > migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
>> > This is due to a breakage like the following:
>>
>> I don't see what's broken here.
>>
>
> No, no broken. Could it be better to replace 'breakage' with
> 'breakin'?

There is no break-in. There is a phase where CPU hotplug is reenabled,
which might be avoided.

>> > +/* primary_cpu keeps unchanged after migrate_to_reboot_cpu() */
>>
>> This comment makes no sense.
>>
>
> Since migrate_to_reboot_cpu() disables cpu hotplug, so the selected
> valid online cpu -- primary_cpu keeps unchange.

So what is that parameter for then? If migrate_to_reboot_cpu() ensured
that the current task is on the reboot CPU, then this parameter is
useless, no?

>> > void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
>> > {
>> > 	unsigned int cpu;
>> > 	int error;
>> >
>> > +	/*
>> > +	 * Block other cpu hotplug event, so primary_cpu is always online if
>> > +	 * it is not touched by us
>> > +	 */
>> > 	cpu_maps_update_begin();
>> > -
>> > 	/*
>> > -	 * Make certain the cpu I'm about to reboot on is online.
>> > -	 *
>> > -	 * This is inline to what migrate_to_reboot_cpu() already do.
>> > +	 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
>> > +	 * no further code needs to use CPU hotplug (which is true in
>> > +	 * the reboot case). However, the kexec path depends on using
>> > +	 * CPU hotplug again; so re-enable it here.
>>
>> You want to reduce confusion, but in reality this is even more confusing
>> than before.
>>
>
> This __cpu_hotplug_enable() can be considered to defer from kernel_kexec() to
> arch-dependent code chunk (here), which is a more proper point.
>
> Could it make things better by rephrasing the words as the following?
> migrate_to_reboot_cpu() disables CPU hotplug to prevent the selected
> reboot cpu from disappearing. But arches need cpu_down to hot remove
> cpus except rebooting-cpu, so re-enabling cpu hotplug again.

Can you please use proper words? "arches" is not a word, and it's closer
to the plural of arch than to the word architecture. This is not
Twitter.

And no, the architectures do not need cpu_down() at all. This very
function, smp_shutdown_nonboot_cpus(), invokes cpu_down_maps_locked() to
shut down the non-boot CPUs. That fails when cpu_hotplug_disabled != 0.

>> > 	 */
>> > -	if (!cpu_online(primary_cpu))
>> > -		primary_cpu = cpumask_first(cpu_online_mask);
>> > +	__cpu_hotplug_enable();
>>
>> How is this decrement solving anything? At the end of this function, the
>> counter is incremented again. So what's the point of this exercise?
>>
> This decrement enables the cpu hot-removing. Since
> smp_shutdown_nonboot_cpus()->cpu_down_maps_locked(), if
> cpu_hotplug_disabled, it returns -EBUSY.

Correct, so why can't you spell that out in concise words in the first
place, right at the comment which reenables hotplug?

>> What does that for arch/powerpc/kernel/kexec_machine64.c now?
>>
>> Nothing, as far as I can tell. Which means you basically reverted
>> 011e4b02f1da ("powerpc, kexec: Fix "Processor X is stuck" issue during
>> kexec from ST mode") unless I'm completely confused.
>>
>
> Oops. Forget about powerpc. Considering the cpu hotplug is an
> arch-dependent feature in machine_shutdown(), as x86 does not need it.

It's not a feature, it's an architecture-specific requirement. x86 is
irrelevant here because this is a powerpc requirement.

>> This is tinkering at best. Can we please sit down and rethink this whole
>> machinery instead of applying random duct tape to it?
>>
> I try to make code look consistent.

Emphasis on try. So far the attempt failed and resulted in a regression.

Thanks,

        tglx