From: Frederic Weisbecker
To: "Paul E. McKenney"
Cc: LKML, Frederic Weisbecker, Joel Fernandes
Subject: [PATCH 1/2] rcu: Fix missing nocb gp wake on rcu_barrier()
Date: Tue, 11 Oct 2022 00:39:55 +0200
Message-Id: <20221010223956.1041247-2-frederic@kernel.org>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20221010223956.1041247-1-frederic@kernel.org>
References: <20221010223956.1041247-1-frederic@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

When rcu_barrier() entrains a callback on a NOCB CPU, no wake-up is
issued to the corresponding nocb_gp kthread. As a result, that callback
and all subsequent ones on that CPU may be ignored, at least until an
RCU_NOCB_WAKE_FORCE timer is armed or another NOCB CPU belonging to the
same group enqueues a callback on an empty queue.

Here is a possible bad scenario:

1) CPU 0 is NOCB, unlike all other CPUs.

2) CPU 0 queues a callback.

3) The grace period related to that callback elapses.

4) The callback is moved to the done list (but is not invoked yet);
   there are no more pending callbacks for CPU 0.

5) CPU 1 calls rcu_barrier() and sends an IPI to CPU 0.

6) CPU 0 entrains the callback but doesn't wake up nocb_gp.

7) CPU 1 blocks forever, unless CPU 0 ever queues enough further
   callbacks to arm an RCU_NOCB_WAKE_FORCE timer.

Make sure the wake-up is issued whenever it is needed.

Reported-by: Joel Fernandes (Google)
Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")
Signed-off-by: Frederic Weisbecker
---
 kernel/rcu/tree.c      | 6 ++++++
 kernel/rcu/tree.h      | 1 +
 kernel/rcu/tree_nocb.h | 5 +++++
 3 files changed, 12 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 96d678c9cfb6..025f59f6f97f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3914,6 +3914,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 {
 	unsigned long gseq = READ_ONCE(rcu_state.barrier_sequence);
 	unsigned long lseq = READ_ONCE(rdp->barrier_seq_snap);
+	bool wake_nocb = false;
+	bool was_alldone = false;
 
 	lockdep_assert_held(&rcu_state.barrier_lock);
 	if (rcu_seq_state(lseq) || !rcu_seq_state(gseq) || rcu_seq_ctr(lseq) != rcu_seq_ctr(gseq))
@@ -3922,6 +3924,7 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 	rdp->barrier_head.func = rcu_barrier_callback;
 	debug_rcu_head_queue(&rdp->barrier_head);
 	rcu_nocb_lock(rdp);
+	was_alldone = rcu_rdp_is_offloaded(rdp) && !rcu_segcblist_pend_cbs(&rdp->cblist);
 	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
 	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
 		atomic_inc(&rcu_state.barrier_cpu_count);
@@ -3929,7 +3932,10 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 		debug_rcu_head_unqueue(&rdp->barrier_head);
 		rcu_barrier_trace(TPS("IRQNQ"), -1, rcu_state.barrier_sequence);
 	}
+	wake_nocb = was_alldone && rcu_segcblist_pend_cbs(&rdp->cblist);
 	rcu_nocb_unlock(rdp);
+	if (wake_nocb)
+		wake_nocb_gp(rdp, false);
 	smp_store_release(&rdp->barrier_seq_snap, gseq);
 }
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index d4a97e40ea9c..925dd98f8b23 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -439,6 +439,7 @@ static void zero_cpu_stall_ticks(struct rcu_data *rdp);
 static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
 static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
+static bool wake_nocb_gp(struct rcu_data *rdp, bool force);
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 				  unsigned long j);
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f77a6d7e1356..094fd454b6c3 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1558,6 +1558,11 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
 {
 }
 
+static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
+{
+	return false;
+}
+
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 				  unsigned long j)
 {
-- 
2.25.1
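
For readers who want to see the wake condition in isolation, the scenario
from the changelog can be modelled in user space. The sketch below is an
illustration only and is not kernel code: struct toy_cpu, toy_entrain() and
toy_barrier_entrain() are hypothetical names, plain counters stand in for
the segmented callback list, and locking, bypass flushing and the actual
kthread wake-up are left out. It assumes, as in the patch, that a wake-up is
wanted only when an offloaded CPU goes from "no pending callbacks" to "some
pending callbacks" across the entrain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one CPU's callback state (not struct rcu_data). */
struct toy_cpu {
	bool offloaded;          /* callbacks are handled by a nocb_gp kthread */
	size_t pending_cbs;      /* callbacks still waiting for a grace period */
	size_t done_cbs;         /* callbacks ready to invoke, not yet invoked */
	unsigned int gp_wakeups; /* number of nocb_gp wake-ups issued */
};

/*
 * Queue the barrier callback as pending. As a simplification of the
 * behaviour described in the changelog, a completely empty list is left
 * alone and the entrain reports failure.
 */
static bool toy_entrain(struct toy_cpu *cpu)
{
	if (cpu->pending_cbs + cpu->done_cbs == 0)
		return false;
	cpu->pending_cbs++;
	return true;
}

/*
 * Entrain the barrier callback and decide whether nocb_gp needs a wake-up.
 * with_fix == false models the old behaviour (never wake), with_fix == true
 * models the patched behaviour. Returns whether a wake-up was needed.
 */
static bool toy_barrier_entrain(struct toy_cpu *cpu, bool with_fix)
{
	bool was_alldone, wake;

	assert(cpu != NULL);
	/* Sample "nothing pending" before queueing, as the patch does. */
	was_alldone = cpu->offloaded && cpu->pending_cbs == 0;
	(void)toy_entrain(cpu);
	/* A wake-up is needed only on the empty to non-empty transition. */
	wake = was_alldone && cpu->pending_cbs > 0;
	if (with_fix && wake)
		cpu->gp_wakeups++;
	return wake;
}
```

Starting from the state in step 4 (one callback on the done list, nothing
pending), calling toy_barrier_entrain() with with_fix set to false leaves
gp_wakeups at zero even though a callback is now pending, which is the
stall described above; with with_fix set to true the counter is bumped once.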