Received: by 2002:a05:6359:c8b:b0:c7:702f:21d4 with SMTP id go11csp3702214rwb; Tue, 20 Sep 2022 03:33:41 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6/eDGP2QnpRfxagE1Gwj27deMUrGJh8k3UV+b5RBmWWp7iopnH+72Sx2lBthbjqmSSj5li X-Received: by 2002:a17:907:6d90:b0:77c:68a8:a5b with SMTP id sb16-20020a1709076d9000b0077c68a80a5bmr16789748ejc.342.1663670021358; Tue, 20 Sep 2022 03:33:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1663670021; cv=none; d=google.com; s=arc-20160816; b=F3FKwzH1Lx/M+PU26M0hj+3QwgW0jgcDPSIJfGlQ0dlnkX7JW29Ln6dgHXR9vCI/p4 RyBpy/+tC5uBb4krnLHjd80PIcV8uCqacKEaOC7TjAbEUHvQNcrTGmPsqTeM4Kt8zGCu MNt2GEY+Apt62q4fmP0m8GagOaf0wzGlH22DRw+A9r5OpYYYVnk4TzwgBbBcirnXsfcd HqZNxj6TDmfIr7QtMOOOmc0zlbQO74mmkAMvUopwRDHfw3n5GgAgboi+c52lqKZQAtH4 QTFrW3Y1v1FO7Yrckvz+mJfLF4S3HCwwFJ+fqNxERA6WnJSF7jV3ZCvszJW/Oe52A0h7 Lgbg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id; bh=UIRn7sNqTzXlq1Prju+lzb5bw7eC6+fPq7e2Fc/5dQw=; b=kK4A873SFDmygnLlt4pRez3o/34TNwnGzp9bxwbmPWugYK3IK5noJ5KLI/cduEHrtz BS5piysFskI7zNkMcEeiRrHgYyuDELxBijA7fRxzHBMxN0yBEKVkh+zl4rMBHsX7/ygv SQOs9Wr/y/eiZA63vquLM/uC5wKnSYJxLuLKDs8nt30eTcM6IwUfydO8Er7qsfhv/3+k mSEH8DDzLHn/TOX+9GHYTx6BosPdGxZSx4ZeqCaBwo1Y0bBfF00OVunjMDFL+tLYvZR0 NBxIJrAtdDNRbXcTp+am6H/W3Fis8BupaIahXcwfrSd33ISV8C7WtP6FeyJkCqlr1JZW tVcg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b20-20020a056402279400b00447b0cc1501si1446704ede.99.2022.09.20.03.33.14; Tue, 20 Sep 2022 03:33:41 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229656AbiITJ7N (ORCPT + 99 others); Tue, 20 Sep 2022 05:59:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50640 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230253AbiITJ7L (ORCPT ); Tue, 20 Sep 2022 05:59:11 -0400 Received: from wp530.webpack.hosteurope.de (wp530.webpack.hosteurope.de [IPv6:2a01:488:42:1000:50ed:8234::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94A3B6F268 for ; Tue, 20 Sep 2022 02:59:08 -0700 (PDT) Received: from [2a02:8108:963f:de38:eca4:7d19:f9a2:22c5]; authenticated by wp530.webpack.hosteurope.de running ExIM with esmtpsa (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) id 1oaa1y-0000La-Uh; Tue, 20 Sep 2022 11:59:06 +0200 Message-ID: <42237005-5a6e-9462-bfdd-f1ba0c725c72@leemhuis.info> Date: Tue, 20 Sep 2022 11:59:06 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.2.1 Subject: Re: [PATCH v5] cpu/hotplug: Do not bail-out in DYING/STARTING sections Content-Language: en-US, de-DE To: tglx@linutronix.de Cc: linux-kernel@vger.kernel.org, vschneid@redhat.com, kernel-team@android.com, Derek Dolney , Vincent Donnefort , peterz@infradead.org References: <20220725095952.206884-1-vdonnefort@google.com> From: Thorsten Leemhuis In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-bounce-key: webpack.hosteurope.de;regressions@leemhuis.info;1663667948;1c3a15dd; X-HE-SMSGID: 1oaa1y-0000La-Uh X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00,NICE_REPLY_A, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 17.08.22 11:46, Thorsten Leemhuis wrote: > > Hi, this is your Linux kernel regression tracker. > > On 25.07.22 11:59, Vincent Donnefort wrote: >> The DYING/STARTING callbacks are not expected to fail. However, as reported >> by Derek, drivers such as tboot are still free to return errors within >> those sections, which halts the hot(un)plug and leaves the CPU in an >> unrecoverable state. >> >> No rollback being possible there, let's only log the failures and proceed >> with the following steps. This restores the hotplug behaviour prior to >> commit 453e41085183 ("cpu/hotplug: Add cpuhp_invoke_callback_range()") >> >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215867 >> Fixes: 453e41085183 ("cpu/hotplug: Add cpuhp_invoke_callback_range()") >> Reported-by: Derek Dolney >> Signed-off-by: Vincent Donnefort >> Tested-by: Derek Dolney > > What's the status here? Did that patch to fixing a regression fall > through the cracks? It looks like nothing happened for 3 weeks now, > that's why I wondered, but maybe I missed something. Hmm, Vincent seems to be MIA, at least I see no recent messages from him on lore. Odd. But well, it's still a fix for a regression and it's up to v5 already; Valentin already added his Reviewed-by, too. Would be a shame to waste this. Thomas, could you maybe take a look at the patch? Maybe we're lucky and the patch is already good to go... Ciao, Thorsten #regzbot poke > P.S.: As the Linux kernel's regression tracker I deal with a lot of > reports and sometimes miss something important when writing mails like > this. If that's the case here, don't hesitate to tell me in a public > reply, it's in everyone's interest to set the public record straight. > > >> v4 -> v5: >> - Remove WARN, only log broken states with pr_warn. >> v3 -> v4: >> - Sorry ... wrong commit description style ... >> v2 -> v3: >> - Tested-by tag. >> - Refine commit description. >> - Bugzilla link. >> v1 -> v2: >> - Commit message rewording. >> - More details in the warnings. >> - Some variable renaming >> >> diff --git a/kernel/cpu.c b/kernel/cpu.c >> index bbad5e375d3b..621e5af42d57 100644 >> --- a/kernel/cpu.c >> +++ b/kernel/cpu.c >> @@ -663,21 +663,51 @@ static bool cpuhp_next_state(bool bringup, >> return true; >> } >> >> -static int cpuhp_invoke_callback_range(bool bringup, >> - unsigned int cpu, >> - struct cpuhp_cpu_state *st, >> - enum cpuhp_state target) >> +static int __cpuhp_invoke_callback_range(bool bringup, >> + unsigned int cpu, >> + struct cpuhp_cpu_state *st, >> + enum cpuhp_state target, >> + bool nofail) >> { >> enum cpuhp_state state; >> - int err = 0; >> + int ret = 0; >> >> while (cpuhp_next_state(bringup, &state, st, target)) { >> + int err; >> + >> err = cpuhp_invoke_callback(cpu, state, bringup, NULL, NULL); >> - if (err) >> + if (!err) >> + continue; >> + >> + if (nofail) { >> + pr_warn("CPU %u %s state %s (%d) failed (%d)\n", >> + cpu, bringup ? "UP" : "DOWN", >> + cpuhp_get_step(st->state)->name, >> + st->state, err); >> + ret = -1; >> + } else { >> + ret = err; >> break; >> + } >> } >> >> - return err; >> + return ret; >> +} >> + >> +static inline int cpuhp_invoke_callback_range(bool bringup, >> + unsigned int cpu, >> + struct cpuhp_cpu_state *st, >> + enum cpuhp_state target) >> +{ >> + return __cpuhp_invoke_callback_range(bringup, cpu, st, target, false); >> +} >> + >> +static inline void cpuhp_invoke_callback_range_nofail(bool bringup, >> + unsigned int cpu, >> + struct cpuhp_cpu_state *st, >> + enum cpuhp_state target) >> +{ >> + __cpuhp_invoke_callback_range(bringup, cpu, st, target, true); >> } >> >> static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st) >> @@ -999,7 +1029,6 @@ static int take_cpu_down(void *_param) >> struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state); >> enum cpuhp_state target = max((int)st->target, CPUHP_AP_OFFLINE); >> int err, cpu = smp_processor_id(); >> - int ret; >> >> /* Ensure this CPU doesn't handle any more interrupts. */ >> err = __cpu_disable(); >> @@ -1012,13 +1041,11 @@ static int take_cpu_down(void *_param) >> */ >> WARN_ON(st->state != (CPUHP_TEARDOWN_CPU - 1)); >> >> - /* Invoke the former CPU_DYING callbacks */ >> - ret = cpuhp_invoke_callback_range(false, cpu, st, target); >> - >> /* >> + * Invoke the former CPU_DYING callbacks >> * DYING must not fail! >> */ >> - WARN_ON_ONCE(ret); >> + cpuhp_invoke_callback_range_nofail(false, cpu, st, target); >> >> /* Give up timekeeping duties */ >> tick_handover_do_timer(); >> @@ -1296,16 +1323,14 @@ void notify_cpu_starting(unsigned int cpu) >> { >> struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu); >> enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE); >> - int ret; >> >> rcu_cpu_starting(cpu); /* Enables RCU usage on this CPU. */ >> cpumask_set_cpu(cpu, &cpus_booted_once_mask); >> - ret = cpuhp_invoke_callback_range(true, cpu, st, target); >> >> /* >> * STARTING must not fail! >> */ >> - WARN_ON_ONCE(ret); >> + cpuhp_invoke_callback_range_nofail(true, cpu, st, target); >> } >> >> /*