From: Thomas Gleixner
To: Pingfan Liu, linux-kernel@vger.kernel.org
Cc: Pingfan Liu, Eric Biederman, Peter Zijlstra, Valentin Schneider,
    Vincent Donnefort, Ingo Molnar, Mark Rutland, YueHaibing, Baokun Li,
    Randy Dunlap, Baoquan He, kexec@lists.infradead.org
Subject: Re: [PATCHv3 1/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable
In-Reply-To: <20220509041305.15056-2-kernelfans@gmail.com>
References: <20220509041305.15056-1-kernelfans@gmail.com> <20220509041305.15056-2-kernelfans@gmail.com>
Date: Mon, 09 May 2022 12:55:21 +0200
Message-ID: <87ee13rn52.ffs@tglx>

On Mon, May 09 2022 at 12:13, Pingfan Liu wrote:
> The following code chunk repeats in both
> migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
>
>	if (!cpu_online(primary_cpu))
>		primary_cpu = cpumask_first(cpu_online_mask);
>
> This is due to a breakage like the following:

I don't see what's broken here.

>   kernel_kexec()
>     migrate_to_reboot_cpu();
>     cpu_hotplug_enable();
>                  -----------> comes a cpu_down(this_cpu) on other cpu
>     machine_shutdown();
>       smp_shutdown_nonboot_cpus(); // re-check "if (!cpu_online(primary_cpu))" to protect against the former breakin
>
> Although the kexec-reboot task can get through a cpu_down() on its cpu,
> this code looks a little confusing.

Confusing != broken.

> +/* primary_cpu keeps unchanged after migrate_to_reboot_cpu() */

This comment makes no sense.

>  void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
>  {
>  	unsigned int cpu;
>  	int error;
>
> +	/*
> +	 * Block other cpu hotplug event, so primary_cpu is always online if
> +	 * it is not touched by us
> +	 */
>  	cpu_maps_update_begin();
> -
>  	/*
> -	 * Make certain the cpu I'm about to reboot on is online.
> -	 *
> -	 * This is inline to what migrate_to_reboot_cpu() already do.
> +	 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
> +	 * no further code needs to use CPU hotplug (which is true in
> +	 * the reboot case). However, the kexec path depends on using
> +	 * CPU hotplug again; so re-enable it here.

You want to reduce confusion, but in reality this is even more confusing
than before.

>  	 */
> -	if (!cpu_online(primary_cpu))
> -		primary_cpu = cpumask_first(cpu_online_mask);
> +	__cpu_hotplug_enable();

How is this decrement solving anything? At the end of this function, the
counter is incremented again. So what's the point of this exercise?

>  	for_each_online_cpu(cpu) {
>  		if (cpu == primary_cpu)
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 68480f731192..db4fa6b174e3 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1168,14 +1168,12 @@ int kernel_kexec(void)
>  		kexec_in_progress = true;
>  		kernel_restart_prepare("kexec reboot");
>  		migrate_to_reboot_cpu();
> -
>  		/*
> -		 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
> -		 * no further code needs to use CPU hotplug (which is true in
> -		 * the reboot case). However, the kexec path depends on using
> -		 * CPU hotplug again; so re-enable it here.
> +		 * migrate_to_reboot_cpu() disables CPU hotplug. If an arch
> +		 * relies on the cpu teardown to achieve reboot, it needs to
> +		 * re-enable CPU hotplug there.

What does that for arch/powerpc/kernel/kexec_machine64.c now?

Nothing, as far as I can tell. Which means you basically reverted

  011e4b02f1da ("powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode")

unless I'm completely confused.

>  		 */
> -		cpu_hotplug_enable();

This is tinkering at best. Can we please sit down and rethink this whole
machinery instead of applying random duct tape to it?

Thanks,

        tglx
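
Note on the counter question: the decrement/increment sequence in the patched smp_shutdown_nonboot_cpus() can be traced with a small userspace model. This is only a sketch under stated assumptions, not kernel code. The names hotplug_disabled, model_hotplug_disable(), model_hotplug_enable(), model_shutdown_nonboot_cpus() and model_patched_kexec_path() are hypothetical stand-ins, and the model assumes the hotplug-disable state is a plain depth counter that migrate_to_reboot_cpu() raises once before the shutdown function runs.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's hotplug-disable depth counter;
 * in this model hotplug is blocked while the value is non-zero. */
static int hotplug_disabled;

/* Models disabling hotplug: one more reason to block it. */
static void model_hotplug_disable(void)
{
	hotplug_disabled++;
}

/* Models the enable side: drop one reason, never go below zero. */
static void model_hotplug_enable(void)
{
	if (hotplug_disabled > 0)
		hotplug_disabled--;
}

/* Models the patched shutdown function from the quoted diff: decrement
 * on entry, increment again before returning. Returns the depth that is
 * in effect while the non-boot CPUs would be taken down. */
static int model_shutdown_nonboot_cpus(void)
{
	int depth_during_teardown;

	model_hotplug_enable();
	depth_during_teardown = hotplug_disabled;
	/* ... non-boot CPUs would be taken offline here ... */
	hotplug_disabled++;
	return depth_during_teardown;
}

/* Models the patched kexec path: the disable assumed to be done by
 * migrate_to_reboot_cpu(), then the shutdown function. Returns the
 * depth left behind afterwards. */
static int model_patched_kexec_path(void)
{
	hotplug_disabled = 0;
	model_hotplug_disable();
	model_shutdown_nonboot_cpus();
	return hotplug_disabled;
}
```

In this model the depth is lower only between the two statements inside model_shutdown_nonboot_cpus(); on return it is back to the value it had on entry. Whether that window is what the patch intends is the open question in the review above.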