2024-02-01 01:42:21

by Boqun Feng

Subject: [PATCH 0/6] RCU nocb updates for v6.9

Hi,

This series contains RCU nocb updates and fixes for v6.9.
You can also find the series at:

https://github.com/fbq/linux.git rcu-nocb.2024.01.29a

Here is the detailed list of changes:

Frederic Weisbecker (4):
rcu/nocb: Remove needless LOAD-ACQUIRE
rcu/nocb: Remove needless full barrier after callback advancing
rcu/nocb: Make IRQs disablement symmetric
rcu/nocb: Re-arrange call_rcu() NOCB specific code

Zqiang (2):
rcu/nocb: Fix WARN_ON_ONCE() in the rcu_nocb_bypass_lock()
rcu/nocb: Check rdp_gp->nocb_timer in __call_rcu_nocb_wake()

kernel/rcu/tree.c | 53 +++++++++++++++++++++++-------------------
kernel/rcu/tree.h | 9 ++++---
kernel/rcu/tree_nocb.h | 47 ++++++++++++++++++++++---------------
3 files changed, 61 insertions(+), 48 deletions(-)

--
2.43.0



2024-02-01 01:42:32

by Boqun Feng

Subject: [PATCH 1/6] rcu/nocb: Remove needless LOAD-ACQUIRE

From: Frederic Weisbecker <[email protected]>

The LOAD-ACQUIRE access performed on rdp->nocb_cb_sleep advertises
ordering of callback execution against grace period completion. However
this is contradicted by the following:

* This LOAD-ACQUIRE doesn't pair with anything. The only counterpart
barrier that can be found is the smp_mb() placed after callbacks
advancing in nocb_gp_wait(). However that barrier is placed _after_ the
->nocb_cb_sleep write.

* Callbacks can be concurrently advanced between the LOAD-ACQUIRE on
->nocb_cb_sleep and the call to rcu_segcblist_extract_done_cbs() in
rcu_do_batch(), making any ordering based on ->nocb_cb_sleep broken.

* Both rcu_segcblist_extract_done_cbs() and rcu_advance_cbs() are called
under the nocb_lock, which already provides the desired ACQUIRE
semantics.

Therefore it is safe to access ->nocb_cb_sleep with a simple compiler
barrier.
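
For illustration, here is a minimal LKMM litmus sketch (not part of this
patch, and using made-up variable names rather than the actual RCU state)
of the lock-handoff guarantee that the last point relies on: once the
reader's critical section follows the writer's on the same lock, the lock
acquisition alone orders the reader against everything the writer did
before releasing that lock, so no extra acquire load is needed:

C nocb-lock-provides-acquire

{}

// Writer side: establish state, then mark callbacks ready under the lock
P0(spinlock_t *nocb_lock, int *gp_state, int *cbs_ready)
{
	WRITE_ONCE(*gp_state, 1);
	spin_lock(nocb_lock);
	WRITE_ONCE(*cbs_ready, 1);
	spin_unlock(nocb_lock);
}

// Reader side: observe readiness under the same lock, then act on the state
P1(spinlock_t *nocb_lock, int *gp_state, int *cbs_ready)
{
	int r0;
	int r1;

	spin_lock(nocb_lock);
	r0 = READ_ONCE(*cbs_ready);
	spin_unlock(nocb_lock);
	r1 = READ_ONCE(*gp_state);
}

exists (1:r0=1 /\ 1:r1=0) (* Forbidden: the lock handoff provides the ACQUIRE ordering. *)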

Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
kernel/rcu/tree_nocb.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 4efbf7333d4e..785946834c6b 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -933,8 +933,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
nocb_cb_wait_cond(rdp));

- // VVV Ensure CB invocation follows _sleep test.
- if (smp_load_acquire(&rdp->nocb_cb_sleep)) { // ^^^
+ if (READ_ONCE(rdp->nocb_cb_sleep)) {
WARN_ON(signal_pending(current));
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
}
--
2.43.0


2024-02-01 01:43:21

by Boqun Feng

Subject: [PATCH 4/6] rcu/nocb: Re-arrange call_rcu() NOCB specific code

From: Frederic Weisbecker <[email protected]>

Currently the call_rcu() function interleaves NOCB and !NOCB enqueue
code in a complicated way such that:

* The bypass enqueue code may or may not have enqueued and may or may
not have locked the ->nocb_lock. Everything that follows is in a
Schrödinger locking state for the unwary reviewer's eyes.

* The was_alldone flag is always set but only used in NOCB-related code.

* The NOCB wake up is distantly related to the locking hopefully
performed by the bypass enqueue code that did not enqueue on the
bypass list.

Unconfuse the whole thing and gather the NOCB and !NOCB specific enqueue
code into their own functions.
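
For reference, here is a condensed sketch of the pre-patch call_rcu()
tail (reconstructed from the hunks removed below, tracing elided),
annotated with the conditional locking state that this patch untangles:

	check_cb_ovld(rdp);
	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy)) {
		local_irq_restore(flags);
		return; // Enqueued onto ->nocb_bypass, so just leave.
	}
	// On an offloaded CPU, rcu_nocb_try_bypass() returned with
	// ->nocb_lock held; on a !NOCB CPU no lock is held at all.
	rcu_segcblist_enqueue(&rdp->cblist, head);
	// ... kvfree/callback tracing elided ...
	if (unlikely(rcu_rdp_is_offloaded(rdp)))
		__call_rcu_nocb_wake(rdp, was_alldone, flags); // releases ->nocb_lock
	else
		__call_rcu_core(rdp, head, flags); // was_alldone was set but is unused here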

Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
kernel/rcu/tree.c | 44 +++++++++++++++++++-----------------------
kernel/rcu/tree.h | 9 ++++-----
kernel/rcu/tree_nocb.h | 18 ++++++++++++++---
3 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a402dc4e9a9c..cc0e169e299a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2597,12 +2597,26 @@ static int __init rcu_spawn_core_kthreads(void)
return 0;
}

+static void rcutree_enqueue(struct rcu_data *rdp, struct rcu_head *head, rcu_callback_t func)
+{
+ rcu_segcblist_enqueue(&rdp->cblist, head);
+ if (__is_kvfree_rcu_offset((unsigned long)func))
+ trace_rcu_kvfree_callback(rcu_state.name, head,
+ (unsigned long)func,
+ rcu_segcblist_n_cbs(&rdp->cblist));
+ else
+ trace_rcu_callback(rcu_state.name, head,
+ rcu_segcblist_n_cbs(&rdp->cblist));
+ trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
+}
+
/*
* Handle any core-RCU processing required by a call_rcu() invocation.
*/
-static void __call_rcu_core(struct rcu_data *rdp, struct rcu_head *head,
- unsigned long flags)
+static void call_rcu_core(struct rcu_data *rdp, struct rcu_head *head,
+ rcu_callback_t func, unsigned long flags)
{
+ rcutree_enqueue(rdp, head, func);
/*
* If called from an extended quiescent state, invoke the RCU
* core in order to force a re-evaluation of RCU's idleness.
@@ -2698,7 +2712,6 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
unsigned long flags;
bool lazy;
struct rcu_data *rdp;
- bool was_alldone;

/* Misaligned rcu_head! */
WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1));
@@ -2735,28 +2748,11 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
}

check_cb_ovld(rdp);
- if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy)) {
- local_irq_restore(flags);
- return; // Enqueued onto ->nocb_bypass, so just leave.
- }
- // If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
- rcu_segcblist_enqueue(&rdp->cblist, head);
- if (__is_kvfree_rcu_offset((unsigned long)func))
- trace_rcu_kvfree_callback(rcu_state.name, head,
- (unsigned long)func,
- rcu_segcblist_n_cbs(&rdp->cblist));
- else
- trace_rcu_callback(rcu_state.name, head,
- rcu_segcblist_n_cbs(&rdp->cblist));
-
- trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));

- /* Go handle any RCU core processing required. */
- if (unlikely(rcu_rdp_is_offloaded(rdp))) {
- __call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
- } else {
- __call_rcu_core(rdp, head, flags);
- }
+ if (unlikely(rcu_rdp_is_offloaded(rdp)))
+ call_rcu_nocb(rdp, head, func, flags, lazy);
+ else
+ call_rcu_core(rdp, head, func, flags);
local_irq_restore(flags);
}

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e9821a8422db..bf478da89a8f 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -467,11 +467,10 @@ static void rcu_init_one_nocb(struct rcu_node *rnp);
static bool wake_nocb_gp(struct rcu_data *rdp, bool force);
static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
unsigned long j, bool lazy);
-static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
- bool *was_alldone, unsigned long flags,
- bool lazy);
-static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
- unsigned long flags);
+static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
+ rcu_callback_t func, unsigned long flags, bool lazy);
+static void __maybe_unused __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
+ unsigned long flags);
static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
static bool do_nocb_deferred_wakeup(struct rcu_data *rdp);
static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 1d5c03c5c702..9e8052ba14b9 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -622,6 +622,18 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
}
}

+static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
+ rcu_callback_t func, unsigned long flags, bool lazy)
+{
+ bool was_alldone;
+
+ if (!rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy)) {
+ /* Not enqueued on bypass but locked, do regular enqueue */
+ rcutree_enqueue(rdp, head, func);
+ __call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
+ }
+}
+
static int nocb_gp_toggle_rdp(struct rcu_data *rdp,
bool *wake_state)
{
@@ -1764,10 +1776,10 @@ static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
return true;
}

-static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
- bool *was_alldone, unsigned long flags, bool lazy)
+static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
+ rcu_callback_t func, unsigned long flags, bool lazy)
{
- return false;
+ WARN_ON_ONCE(1); /* Should be dead code! */
}

static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
--
2.43.0


2024-02-01 01:46:49

by Boqun Feng

Subject: [PATCH 2/6] rcu/nocb: Remove needless full barrier after callback advancing

From: Frederic Weisbecker <[email protected]>

A full barrier is issued from nocb_gp_wait() upon callbacks advancing
to order grace period completion with callbacks execution.

However these two events are already ordered by the
smp_mb__after_unlock_lock() barrier within the call to
raw_spin_lock_rcu_node() that is necessary for callbacks advancing to
happen.

The following litmus test shows the kind of guarantee that this barrier
provides:

C smp_mb__after_unlock_lock

{}

// rcu_gp_cleanup()
P0(spinlock_t *rnp_lock, int *gpnum)
{
	// Grace period cleanup increases the gp sequence number
	spin_lock(rnp_lock);
	WRITE_ONCE(*gpnum, 1);
	spin_unlock(rnp_lock);
}

// nocb_gp_wait()
P1(spinlock_t *rnp_lock, spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
{
	int r1;

	// Call rcu_advance_cbs() from nocb_gp_wait()
	spin_lock(nocb_lock);
	spin_lock(rnp_lock);
	smp_mb__after_unlock_lock();
	r1 = READ_ONCE(*gpnum);
	WRITE_ONCE(*cb_ready, 1);
	spin_unlock(rnp_lock);
	spin_unlock(nocb_lock);
}

// nocb_cb_wait()
P2(spinlock_t *nocb_lock, int *cb_ready, int *cb_executed)
{
	int r2;

	// rcu_do_batch() -> rcu_segcblist_extract_done_cbs()
	spin_lock(nocb_lock);
	r2 = READ_ONCE(*cb_ready);
	spin_unlock(nocb_lock);

	// Actual callback execution
	WRITE_ONCE(*cb_executed, 1);
}

P3(int *cb_executed, int *gpnum)
{
	int r3;

	WRITE_ONCE(*cb_executed, 2);
	smp_mb();
	r3 = READ_ONCE(*gpnum);
}

exists (1:r1=1 /\ 2:r2=1 /\ cb_executed=2 /\ 3:r3=0) (* Bad outcome. *)

Here the bad outcome only occurs if the smp_mb__after_unlock_lock() is
removed. This barrier orders the grace period completion against
callbacks advancing and even later callbacks invocation, thanks to the
opportunistic propagation via the ->nocb_lock to nocb_cb_wait().

Therefore the smp_mb() placed after callbacks advancing can be safely
removed.

Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
kernel/rcu/tree.c | 6 ++++++
kernel/rcu/tree_nocb.h | 1 -
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b2bccfd37c38..d540d210e5c7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2145,6 +2145,12 @@ static void rcu_do_batch(struct rcu_data *rdp)
* Extract the list of ready callbacks, disabling IRQs to prevent
* races with call_rcu() from interrupt handlers. Leave the
* callback counts, as rcu_barrier() needs to be conservative.
+ *
+ * Callbacks execution is fully ordered against preceding grace period
+ * completion (materialized by rnp->gp_seq update) thanks to the
+ * smp_mb__after_unlock_lock() upon node locking required for callbacks
+ * advancing. In NOCB mode this ordering is then further relayed through
+ * the nocb locking that protects both callbacks advancing and extraction.
*/
rcu_nocb_lock_irqsave(rdp, flags);
WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 785946834c6b..b2c3145c4c13 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -779,7 +779,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
needwake = rdp->nocb_cb_sleep;
WRITE_ONCE(rdp->nocb_cb_sleep, false);
- smp_mb(); /* CB invocation -after- GP end. */
} else {
needwake = false;
}
--
2.43.0


2024-02-01 01:46:59

by Boqun Feng

Subject: [PATCH 3/6] rcu/nocb: Make IRQs disablement symmetric

From: Frederic Weisbecker <[email protected]>

Currently IRQs are disabled in call_rcu() and then, depending on the
context:

* If the CPU is in nocb mode:

- If the callback is enqueued in the bypass list, IRQs are re-enabled
implicitly by rcu_nocb_try_bypass()

- If the callback is enqueued in the normal list, IRQs are re-enabled
implicitly by __call_rcu_nocb_wake()

* If the CPU is NOT in nocb mode, IRQs are re-enabled explicitly from call_rcu()

This makes the code a bit hard to follow, especially as it interleaves
with nocb locking.

To make the IRQ flags coverage clearer and also in order to prepare for
moving all the nocb enqueue code to its own function, always re-enable
the IRQ flags explicitly from call_rcu().
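
Concretely (a condensed sketch of the pattern applied by the hunks below,
not new code), every exit path of the nocb helpers touched here changes
from:

	rcu_nocb_unlock_irqrestore(rdp, flags);

to:

	rcu_nocb_unlock(rdp);	/* IRQs remain disabled */

so that the caller owns the flags and the save/restore pair becomes
symmetric in __call_rcu_common():

	local_irq_save(flags);
	...
	if (unlikely(rcu_rdp_is_offloaded(rdp)))
		__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks ->nocb_lock only */
	else
		__call_rcu_core(rdp, head, flags);
	local_irq_restore(flags);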

Reviewed-by: Neeraj Upadhyay (AMD) <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
kernel/rcu/tree.c | 9 ++++++---
kernel/rcu/tree_nocb.h | 20 +++++++++-----------
2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d540d210e5c7..a402dc4e9a9c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2735,8 +2735,10 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
}

check_cb_ovld(rdp);
- if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy))
+ if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy)) {
+ local_irq_restore(flags);
return; // Enqueued onto ->nocb_bypass, so just leave.
+ }
// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
rcu_segcblist_enqueue(&rdp->cblist, head);
if (__is_kvfree_rcu_offset((unsigned long)func))
@@ -2754,8 +2756,8 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
} else {
__call_rcu_core(rdp, head, flags);
- local_irq_restore(flags);
}
+ local_irq_restore(flags);
}

#ifdef CONFIG_RCU_LAZY
@@ -4646,8 +4648,9 @@ void rcutree_migrate_callbacks(int cpu)
__call_rcu_nocb_wake(my_rdp, true, flags);
} else {
rcu_nocb_unlock(my_rdp); /* irqs remain disabled. */
- raw_spin_unlock_irqrestore_rcu_node(my_rnp, flags);
+ raw_spin_unlock_rcu_node(my_rnp); /* irqs remain disabled. */
}
+ local_irq_restore(flags);
if (needwake)
rcu_gp_kthread_wake();
lockdep_assert_irqs_enabled();
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index b2c3145c4c13..1d5c03c5c702 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -532,9 +532,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
// 2. Both of these conditions are met:
// a. The bypass list previously had only lazy CBs, and:
// b. The new CB is non-lazy.
- if (ncbs && (!bypass_is_lazy || lazy)) {
- local_irq_restore(flags);
- } else {
+ if (!ncbs || (bypass_is_lazy && !lazy)) {
// No-CBs GP kthread might be indefinitely asleep, if so, wake.
rcu_nocb_lock(rdp); // Rare during call_rcu() flood.
if (!rcu_segcblist_pend_cbs(&rdp->cblist)) {
@@ -544,7 +542,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
} else {
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
TPS("FirstBQnoWake"));
- rcu_nocb_unlock_irqrestore(rdp, flags);
+ rcu_nocb_unlock(rdp);
}
}
return true; // Callback already enqueued.
@@ -570,7 +568,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
// If we are being polled or there is no kthread, just leave.
t = READ_ONCE(rdp->nocb_gp_kthread);
if (rcu_nocb_poll || !t) {
- rcu_nocb_unlock_irqrestore(rdp, flags);
+ rcu_nocb_unlock(rdp);
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
TPS("WakeNotPoll"));
return;
@@ -583,17 +581,17 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
rdp->qlen_last_fqs_check = len;
// Only lazy CBs in bypass list
if (lazy_len && bypass_len == lazy_len) {
- rcu_nocb_unlock_irqrestore(rdp, flags);
+ rcu_nocb_unlock(rdp);
wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
TPS("WakeLazy"));
} else if (!irqs_disabled_flags(flags)) {
/* ... if queue was empty ... */
- rcu_nocb_unlock_irqrestore(rdp, flags);
+ rcu_nocb_unlock(rdp);
wake_nocb_gp(rdp, false);
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
TPS("WakeEmpty"));
} else {
- rcu_nocb_unlock_irqrestore(rdp, flags);
+ rcu_nocb_unlock(rdp);
wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE,
TPS("WakeEmptyIsDeferred"));
}
@@ -611,15 +609,15 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
if ((rdp->nocb_cb_sleep ||
!rcu_segcblist_ready_cbs(&rdp->cblist)) &&
!timer_pending(&rdp->nocb_timer)) {
- rcu_nocb_unlock_irqrestore(rdp, flags);
+ rcu_nocb_unlock(rdp);
wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_FORCE,
TPS("WakeOvfIsDeferred"));
} else {
- rcu_nocb_unlock_irqrestore(rdp, flags);
+ rcu_nocb_unlock(rdp);
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
}
} else {
- rcu_nocb_unlock_irqrestore(rdp, flags);
+ rcu_nocb_unlock(rdp);
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
}
}
--
2.43.0


2024-02-01 01:47:20

by Boqun Feng

Subject: [PATCH 5/6] rcu/nocb: Fix WARN_ON_ONCE() in the rcu_nocb_bypass_lock()

From: Zqiang <[email protected]>

For the kernels built with CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y and
CONFIG_RCU_LAZY=y, the following scenario triggers WARN_ON_ONCE()
in the rcu_nocb_bypass_lock() and rcu_nocb_wait_contended() functions:

CPU2                                   CPU11
kthread
rcu_nocb_cb_kthread                    ksys_write
rcu_do_batch                           vfs_write
rcu_torture_timer_cb                   proc_sys_write
__kmem_cache_free                      proc_sys_call_handler
kmemleak_free                          drop_caches_sysctl_handler
delete_object_full                     drop_slab
__delete_object                        shrink_slab
put_object                             lazy_rcu_shrink_scan
call_rcu                               rcu_nocb_flush_bypass
__call_rcu_common                      rcu_nocb_bypass_lock
                                       raw_spin_trylock(&rdp->nocb_bypass_lock) fail
                                       atomic_inc(&rdp->nocb_lock_contended);
rcu_nocb_wait_contended                WARN_ON_ONCE(smp_processor_id() != rdp->cpu);
WARN_ON_ONCE(atomic_read(&rdp->nocb_lock_contended))          |
     |_ _ _ _ _ _ _ _ _ _same rdp and rdp->cpu != 11_ _ _ _ _ _ _ _ _ __|

Reproduce this bug with "echo 3 > /proc/sys/vm/drop_caches".

This commit therefore uses rcu_nocb_try_flush_bypass() instead of
rcu_nocb_flush_bypass() in lazy_rcu_shrink_scan(). If the nocb_bypass
queue is already being flushed, then rcu_nocb_try_flush_bypass() returns
directly.

Signed-off-by: Zqiang <[email protected]>
Reviewed-by: Joel Fernandes (Google) <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
kernel/rcu/tree_nocb.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 9e8052ba14b9..ffa69a5e18f4 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1391,7 +1391,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
rcu_nocb_unlock_irqrestore(rdp, flags);
continue;
}
- WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
+ rcu_nocb_try_flush_bypass(rdp, jiffies);
rcu_nocb_unlock_irqrestore(rdp, flags);
wake_nocb_gp(rdp, false);
sc->nr_to_scan -= _count;
--
2.43.0


2024-02-01 01:47:32

by Boqun Feng

Subject: [PATCH 6/6] rcu/nocb: Check rdp_gp->nocb_timer in __call_rcu_nocb_wake()

From: Zqiang <[email protected]>

Currently, only rdp_gp->nocb_timer is used; for the ->nocb_timer of a
non-rdp_gp rcu_data structure, timer_pending() always returns false.
This commit therefore checks rdp_gp->nocb_timer in
__call_rcu_nocb_wake().

Signed-off-by: Zqiang <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
kernel/rcu/tree_nocb.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index ffa69a5e18f4..f124d4d45ce6 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -564,6 +564,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
long lazy_len;
long len;
struct task_struct *t;
+ struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;

// If we are being polled or there is no kthread, just leave.
t = READ_ONCE(rdp->nocb_gp_kthread);
@@ -608,7 +609,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
smp_mb(); /* Enqueue before timer_pending(). */
if ((rdp->nocb_cb_sleep ||
!rcu_segcblist_ready_cbs(&rdp->cblist)) &&
- !timer_pending(&rdp->nocb_timer)) {
+ !timer_pending(&rdp_gp->nocb_timer)) {
rcu_nocb_unlock(rdp);
wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_FORCE,
TPS("WakeOvfIsDeferred"));
--
2.43.0


2024-02-24 16:31:02

by Oleksandr Natalenko

Subject: Re: [PATCH 5/6] rcu/nocb: Fix WARN_ON_ONCE() in the rcu_nocb_bypass_lock()

Hello.

On Thursday 1 February 2024 at 2:40:57 CET Boqun Feng wrote:
> From: Zqiang <[email protected]>
>
> For the kernels built with CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y and
> CONFIG_RCU_LAZY=y, the following scenarios will trigger WARN_ON_ONCE()
> in the rcu_nocb_bypass_lock() and rcu_nocb_wait_contended() functions:
>
> CPU2 CPU11
> kthread
> rcu_nocb_cb_kthread ksys_write
> rcu_do_batch vfs_write
> rcu_torture_timer_cb proc_sys_write
> __kmem_cache_free proc_sys_call_handler
> kmemleak_free drop_caches_sysctl_handler
> delete_object_full drop_slab
> __delete_object shrink_slab
> put_object lazy_rcu_shrink_scan
> call_rcu rcu_nocb_flush_bypass
> __call_rcu_commn rcu_nocb_bypass_lock
> raw_spin_trylock(&rdp->nocb_bypass_lock) fail
> atomic_inc(&rdp->nocb_lock_contended);
> rcu_nocb_wait_contended WARN_ON_ONCE(smp_processor_id() != rdp->cpu);
> WARN_ON_ONCE(atomic_read(&rdp->nocb_lock_contended)) |
> |_ _ _ _ _ _ _ _ _ _same rdp and rdp->cpu != 11_ _ _ _ _ _ _ _ _ __|
>
> Reproduce this bug with "echo 3 > /proc/sys/vm/drop_caches".
>
> This commit therefore uses rcu_nocb_try_flush_bypass() instead of
> rcu_nocb_flush_bypass() in lazy_rcu_shrink_scan(). If the nocb_bypass
> queue is being flushed, then rcu_nocb_try_flush_bypass will return
> directly.
>
> Signed-off-by: Zqiang <[email protected]>
> Reviewed-by: Joel Fernandes (Google) <[email protected]>
> Reviewed-by: Frederic Weisbecker <[email protected]>
> Reviewed-by: Paul E. McKenney <[email protected]>
> Signed-off-by: Boqun Feng <[email protected]>
> ---
> kernel/rcu/tree_nocb.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 9e8052ba14b9..ffa69a5e18f4 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1391,7 +1391,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> rcu_nocb_unlock_irqrestore(rdp, flags);
> continue;
> }
> - WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
> + rcu_nocb_try_flush_bypass(rdp, jiffies);
> rcu_nocb_unlock_irqrestore(rdp, flags);
> wake_nocb_gp(rdp, false);
> sc->nr_to_scan -= _count;
>

Does this fix [1] [2]?

Thank you.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=217948
[2] https://lore.kernel.org/lkml/[email protected]/

--
Oleksandr Natalenko (post-factum)



2024-02-27 00:21:41

by Paul E. McKenney

Subject: Re: [PATCH 5/6] rcu/nocb: Fix WARN_ON_ONCE() in the rcu_nocb_bypass_lock()

On Sat, Feb 24, 2024 at 05:30:29PM +0100, Oleksandr Natalenko wrote:
> Hello.
>
> On čtvrtek 1. února 2024 2:40:57 CET Boqun Feng wrote:
> > From: Zqiang <[email protected]>
> >
> > For the kernels built with CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y and
> > CONFIG_RCU_LAZY=y, the following scenarios will trigger WARN_ON_ONCE()
> > in the rcu_nocb_bypass_lock() and rcu_nocb_wait_contended() functions:
> >
> > CPU2 CPU11
> > kthread
> > rcu_nocb_cb_kthread ksys_write
> > rcu_do_batch vfs_write
> > rcu_torture_timer_cb proc_sys_write
> > __kmem_cache_free proc_sys_call_handler
> > kmemleak_free drop_caches_sysctl_handler
> > delete_object_full drop_slab
> > __delete_object shrink_slab
> > put_object lazy_rcu_shrink_scan
> > call_rcu rcu_nocb_flush_bypass
> > __call_rcu_commn rcu_nocb_bypass_lock
> > raw_spin_trylock(&rdp->nocb_bypass_lock) fail
> > atomic_inc(&rdp->nocb_lock_contended);
> > rcu_nocb_wait_contended WARN_ON_ONCE(smp_processor_id() != rdp->cpu);
> > WARN_ON_ONCE(atomic_read(&rdp->nocb_lock_contended)) |
> > |_ _ _ _ _ _ _ _ _ _same rdp and rdp->cpu != 11_ _ _ _ _ _ _ _ _ __|
> >
> > Reproduce this bug with "echo 3 > /proc/sys/vm/drop_caches".
> >
> > This commit therefore uses rcu_nocb_try_flush_bypass() instead of
> > rcu_nocb_flush_bypass() in lazy_rcu_shrink_scan(). If the nocb_bypass
> > queue is being flushed, then rcu_nocb_try_flush_bypass will return
> > directly.
> >
> > Signed-off-by: Zqiang <[email protected]>
> > Reviewed-by: Joel Fernandes (Google) <[email protected]>
> > Reviewed-by: Frederic Weisbecker <[email protected]>
> > Reviewed-by: Paul E. McKenney <[email protected]>
> > Signed-off-by: Boqun Feng <[email protected]>
> > ---
> > kernel/rcu/tree_nocb.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index 9e8052ba14b9..ffa69a5e18f4 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -1391,7 +1391,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > rcu_nocb_unlock_irqrestore(rdp, flags);
> > continue;
> > }
> > - WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
> > + rcu_nocb_try_flush_bypass(rdp, jiffies);
> > rcu_nocb_unlock_irqrestore(rdp, flags);
> > wake_nocb_gp(rdp, false);
> > sc->nr_to_scan -= _count;
> >
>
> Does this fix [1] [2]?
>
> Thank you.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=217948
> [2] https://lore.kernel.org/lkml/[email protected]/

It might, but why not apply it to the exact kernel version on which the
bug appeared and see if it prevents it?

Thanx, Paul