Changes since v1 (https://lore.kernel.org/lkml/[email protected]/):
* Use mutex_trylock() to avoid an inverted dependency chain against
  allocations.
* WARN if an rdp is part of the nocb mask but is not offloaded
Tested through the shrinker debugfs interface.
Frederic Weisbecker (4):
rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading
rcu/nocb: Fix shrinker race against callback enqueuer
rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker
  rcu/nocb: Make shrinker iterate only over NOCB CPUs
kernel/rcu/tree_nocb.h | 52 ++++++++++++++++++++++++++++++++++++++----
1 file changed, 47 insertions(+), 5 deletions(-)
--
2.34.1
The shrinker may run concurrently with callbacks (de-)offloading. As
such, calling rcu_nocb_lock() is very dangerous because it performs
conditional locking. The worst outcome is that rcu_nocb_lock() doesn't
lock but rcu_nocb_unlock() eventually unlocks, or the reverse, creating
an imbalance.
Fix this by protecting against (de-)offloading using the barrier mutex.
However, if the barrier mutex is contended, which should be rare, step
aside so as not to trigger a mutex vs. allocation dependency chain.
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/rcu/tree_nocb.h | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f2280616f9d5..1a86883902ce 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1336,13 +1336,33 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
unsigned long flags;
unsigned long count = 0;
+ /*
+ * Protect against concurrent (de-)offloading. Otherwise nocb locking
+ * may be ignored or imbalanced.
+ */
+ if (!mutex_trylock(&rcu_state.barrier_mutex)) {
+ /*
+ * But really don't insist if barrier_mutex is contended since we
+ * can't guarantee that it will never engage in a dependency
+ * chain involving memory allocation. The lock is seldom contended
+ * anyway.
+ */
+ return 0;
+ }
+
/* Snapshot count of all CPUs */
for_each_possible_cpu(cpu) {
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
- int _count = READ_ONCE(rdp->lazy_len);
+ int _count;
+
+ if (!rcu_rdp_is_offloaded(rdp))
+ continue;
+
+ _count = READ_ONCE(rdp->lazy_len);
if (_count == 0)
continue;
+
rcu_nocb_lock_irqsave(rdp, flags);
WRITE_ONCE(rdp->lazy_len, 0);
rcu_nocb_unlock_irqrestore(rdp, flags);
@@ -1352,6 +1372,9 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
if (sc->nr_to_scan <= 0)
break;
}
+
+ mutex_unlock(&rcu_state.barrier_mutex);
+
return count ? count : SHRINK_STOP;
}
--
2.34.1
The shrinker resets the lazy callbacks counter in order to trigger the
pending lazy queue flush through the rcuog kthread. The counter reset is
protected by the ->nocb_lock against concurrent accesses... except
for one of them. Here is a list of the existing synchronized readers/writer:
1) The first lazy enqueuer (incrementing ->lazy_len to 1) does so under
->nocb_lock and ->nocb_bypass_lock.
2) The further lazy enqueuers (incrementing ->lazy_len above 1) do so
under ->nocb_bypass_lock _only_.
3) The lazy flush checks and resets to 0 under ->nocb_lock and
->nocb_bypass_lock.
The shrinker protects its ->lazy_len reset against cases 1) and 3) but
not against 2). As such, setting ->lazy_len to 0 under the ->nocb_lock
may be cancelled right away by an overwrite from an enqueuer, leading
rcuog to ignore the flush.
To avoid that, use the proper bypass flush API, which takes care of all
those details.
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/rcu/tree_nocb.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 1a86883902ce..c321fce2af8e 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1364,7 +1364,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
continue;
rcu_nocb_lock_irqsave(rdp, flags);
- WRITE_ONCE(rdp->lazy_len, 0);
+ WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
rcu_nocb_unlock_irqrestore(rdp, flags);
wake_nocb_gp(rdp, false);
sc->nr_to_scan -= _count;
--
2.34.1
On Wed, Mar 29, 2023 at 06:02:00PM +0200, Frederic Weisbecker wrote:
> The shrinker may run concurrently with callbacks (de-)offloading. As
> such, calling rcu_nocb_lock() is very dangerous because it performs
> conditional locking. The worst outcome is that rcu_nocb_lock() doesn't
> lock but rcu_nocb_unlock() eventually unlocks, or the reverse, creating
> an imbalance.
>
> Fix this by protecting against (de-)offloading using the barrier mutex.
> However, if the barrier mutex is contended, which should be rare, step
> aside so as not to trigger a mutex vs. allocation dependency chain.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> ---
> kernel/rcu/tree_nocb.h | 25 ++++++++++++++++++++++++-
> 1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index f2280616f9d5..1a86883902ce 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1336,13 +1336,33 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> unsigned long flags;
> unsigned long count = 0;
>
> + /*
> + * Protect against concurrent (de-)offloading. Otherwise nocb locking
> + * may be ignored or imbalanced.
> + */
> + if (!mutex_trylock(&rcu_state.barrier_mutex)) {
This looks much better, thank you!
> + /*
> + * But really don't insist if barrier_mutex is contended since we
> + * can't guarantee that it will never engage in a dependency
> + * chain involving memory allocation. The lock is seldom contended
> + * anyway.
> + */
> + return 0;
> + }
> +
> /* Snapshot count of all CPUs */
> for_each_possible_cpu(cpu) {
> struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> - int _count = READ_ONCE(rdp->lazy_len);
> + int _count;
> +
> + if (!rcu_rdp_is_offloaded(rdp))
> + continue;
> +
> + _count = READ_ONCE(rdp->lazy_len);
>
> if (_count == 0)
> continue;
> +
And I just might have unconfused myself here. We get here only if this
CPU is offloaded, in which case it might also have non-zero ->lazy_len,
so this is in fact *not* dead code.
> rcu_nocb_lock_irqsave(rdp, flags);
> WRITE_ONCE(rdp->lazy_len, 0);
> rcu_nocb_unlock_irqrestore(rdp, flags);
> @@ -1352,6 +1372,9 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> if (sc->nr_to_scan <= 0)
> break;
> }
> +
> + mutex_unlock(&rcu_state.barrier_mutex);
> +
> return count ? count : SHRINK_STOP;
> }
>
> --
> 2.34.1
>
On Wed, Mar 29, 2023 at 06:02:01PM +0200, Frederic Weisbecker wrote:
> The shrinker resets the lazy callbacks counter in order to trigger the
> pending lazy queue flush through the rcuog kthread. The counter reset is
> protected by the ->nocb_lock against concurrent accesses... except
> for one of them. Here is a list of the existing synchronized readers/writer:
>
> 1) The first lazy enqueuer (incrementing ->lazy_len to 1) does so under
> ->nocb_lock and ->nocb_bypass_lock.
>
> 2) The further lazy enqueuers (incrementing ->lazy_len above 1) do so
> under ->nocb_bypass_lock _only_.
>
> 3) The lazy flush checks and resets to 0 under ->nocb_lock and
> ->nocb_bypass_lock.
>
> The shrinker protects its ->lazy_len reset against cases 1) and 3) but
> not against 2). As such, setting ->lazy_len to 0 under the ->nocb_lock
> may be cancelled right away by an overwrite from an enqueuer, leading
> rcuog to ignore the flush.
>
> To avoid that, use the proper bypass flush API, which takes care of all
> those details.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> ---
> kernel/rcu/tree_nocb.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 1a86883902ce..c321fce2af8e 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1364,7 +1364,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> continue;
>
> rcu_nocb_lock_irqsave(rdp, flags);
> - WRITE_ONCE(rdp->lazy_len, 0);
> + WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
And I do feel much better about this version. ;-)
> rcu_nocb_unlock_irqrestore(rdp, flags);
> wake_nocb_gp(rdp, false);
> sc->nr_to_scan -= _count;
> --
> 2.34.1
>
On Wed, Mar 29, 2023 at 01:44:53PM -0700, Paul E. McKenney wrote:
> On Wed, Mar 29, 2023 at 06:02:00PM +0200, Frederic Weisbecker wrote:
> > + /*
> > + * But really don't insist if barrier_mutex is contended since we
> > + * can't guarantee that it will never engage in a dependency
> > + * chain involving memory allocation. The lock is seldom contended
> > + * anyway.
> > + */
> > + return 0;
> > + }
> > +
> > /* Snapshot count of all CPUs */
> > for_each_possible_cpu(cpu) {
> > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > - int _count = READ_ONCE(rdp->lazy_len);
> > + int _count;
> > +
> > + if (!rcu_rdp_is_offloaded(rdp))
> > + continue;
> > +
> > + _count = READ_ONCE(rdp->lazy_len);
> >
> > if (_count == 0)
> > continue;
> > +
>
> And I just might have unconfused myself here. We get here only if this
> CPU is offloaded, in which case it might also have non-zero ->lazy_len,
> so this is in fact *not* dead code.
Right. Now whether it's really alive remains to be proven ;)
On Wed, Mar 29, 2023 at 06:01:59PM +0200, Frederic Weisbecker wrote:
> Changes since v1 (https://lore.kernel.org/lkml/[email protected]/):
>
> * Use mutex_trylock() to avoid an inverted dependency chain against
>   allocations.
>
> * WARN if an rdp is part of the nocb mask but is not offloaded
>
> Tested through the shrinker debugfs interface.
I pulled this one in, thank you!
As discussed, we do need some way to test lazy callbacks, but that should
not block this series. And it might well be a separate test.
Thanx, Paul
> Frederic Weisbecker (4):
> rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading
> rcu/nocb: Fix shrinker race against callback enqueuer
> rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker
>   rcu/nocb: Make shrinker iterate only over NOCB CPUs
>
> kernel/rcu/tree_nocb.h | 52 ++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 47 insertions(+), 5 deletions(-)
>
> --
> 2.34.1
>