2022-06-20 22:45:54

by Paul E. McKenney

Subject: [PATCH rcu 0/7] Callback-offload (nocb) updates for v5.20

Hello!

This series provides NOCB-CPU updates:

1. Add/del rdp to iterate from rcuog itself, courtesy of Frederic
Weisbecker.

2. Invert rcu_state.barrier_mutex VS hotplug lock locking order,
courtesy of Zqiang.

3. Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload()
direct call, courtesy of Zqiang.

4. Add an option to offload all CPUs on boot, courtesy of Joel
Fernandes.

5. Add nocb_cb_kthread check to rcu_is_callbacks_kthread(), courtesy
of Zqiang.

6. Add option to opt rcuo kthreads out of RT priority, courtesy of
"Uladzislau Rezki (Sony)".

7. Avoid polling when my_rdp->nocb_head_rdp list is empty, courtesy
of Zqiang.

Thanx, Paul

------------------------------------------------------------------------

b/Documentation/admin-guide/kernel-parameters.txt | 6
b/kernel/rcu/Kconfig | 13 ++
b/kernel/rcu/tree.c | 4
b/kernel/rcu/tree.h | 1
b/kernel/rcu/tree_nocb.h | 138 +++++++++++-----------
b/kernel/rcu/tree_plugin.h | 33 +++--
kernel/rcu/Kconfig | 16 ++
kernel/rcu/tree.c | 6
kernel/rcu/tree.h | 2
kernel/rcu/tree_nocb.h | 130 ++++++++++++++++----
10 files changed, 236 insertions(+), 113 deletions(-)


2022-06-20 22:45:59

by Paul E. McKenney

Subject: [PATCH rcu 2/7] rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order

From: Zqiang <[email protected]>

In case of failure to spawn either rcuog or rcuo[p] kthreads for a given
rdp, rcu_nocb_rdp_deoffload() needs to be called with the hotplug
lock and the barrier_mutex held. However, the cpus write lock is
already held while calling rcutree_prepare_cpu(). It is not possible to
call rcu_nocb_rdp_deoffload() from there while taking only the
barrier_mutex, as this would create a lock inversion against
rcu_nocb_cpu_deoffload(), which takes both locks in the reverse order.

Solve this by inverting the locking order inside
rcu_nocb_cpu_[de]offload(). This is also a prerequisite for toggling
NOCB states through cpusets.
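
For reference, here is a condensed before/after sketch of the locking
order (not part of the patch, just restating the hunks below):

	/* Old order: barrier_mutex first, which can deadlock against */
	/* a holder of the CPU-hotplug write lock.                    */
	mutex_lock(&rcu_state.barrier_mutex);
	cpus_read_lock();
	...
	cpus_read_unlock();
	mutex_unlock(&rcu_state.barrier_mutex);

	/* New order: hotplug read lock first, consistent with callers */
	/* that already hold the hotplug lock.                         */
	cpus_read_lock();
	mutex_lock(&rcu_state.barrier_mutex);
	...
	mutex_unlock(&rcu_state.barrier_mutex);
	cpus_read_unlock();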

Signed-off-by: Zqiang <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Joel Fernandes <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tree_nocb.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index dac74952e1d1b..f2f2cab6285a1 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1055,8 +1055,8 @@ int rcu_nocb_cpu_deoffload(int cpu)
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
int ret = 0;

- mutex_lock(&rcu_state.barrier_mutex);
cpus_read_lock();
+ mutex_lock(&rcu_state.barrier_mutex);
if (rcu_rdp_is_offloaded(rdp)) {
if (cpu_online(cpu)) {
ret = work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
@@ -1067,8 +1067,8 @@ int rcu_nocb_cpu_deoffload(int cpu)
ret = -EINVAL;
}
}
- cpus_read_unlock();
mutex_unlock(&rcu_state.barrier_mutex);
+ cpus_read_unlock();

return ret;
}
@@ -1134,8 +1134,8 @@ int rcu_nocb_cpu_offload(int cpu)
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
int ret = 0;

- mutex_lock(&rcu_state.barrier_mutex);
cpus_read_lock();
+ mutex_lock(&rcu_state.barrier_mutex);
if (!rcu_rdp_is_offloaded(rdp)) {
if (cpu_online(cpu)) {
ret = work_on_cpu(cpu, rcu_nocb_rdp_offload, rdp);
@@ -1146,8 +1146,8 @@ int rcu_nocb_cpu_offload(int cpu)
ret = -EINVAL;
}
}
- cpus_read_unlock();
mutex_unlock(&rcu_state.barrier_mutex);
+ cpus_read_unlock();

return ret;
}
--
2.31.1.189.g2e36527f23

2022-06-20 22:52:09

by Paul E. McKenney

Subject: [PATCH rcu 6/7] rcu/nocb: Add option to opt rcuo kthreads out of RT priority

From: "Uladzislau Rezki (Sony)" <[email protected]>

This commit introduces a RCU_NOCB_CPU_CB_BOOST Kconfig option that
prevents rcuo kthreads from running at real-time priority, even in
kernels built with RCU_BOOST. This capability is important to devices
needing low-latency (as in a few milliseconds) response from expedited
RCU grace periods, but which are not running a classic real-time workload.
On such devices, permitting the rcuo kthreads to run at real-time priority
results in unacceptable latencies imposed on the application tasks,
which run as SCHED_OTHER.

See for example the following trace output:

<snip>
<...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
<snip>

If that rcuop kthread were permitted to run at real-time SCHED_FIFO
priority, it would monopolize its CPU for hundreds of milliseconds
while invoking those 34619 RCU callback functions, which would cause an
unacceptably long latency spike for many application stacks on Android
platforms.

However, some existing real-time workloads require that callback
invocation run at SCHED_FIFO priority, for example, those running on
systems with heavy SCHED_OTHER background loads. (It is the real-time
system's administrator's responsibility to make sure that important
real-time tasks run at a higher priority than do RCU's kthreads.)

Therefore, this new RCU_NOCB_CPU_CB_BOOST Kconfig option defaults to
"y" on kernels built with PREEMPT_RT and defaults to "n" otherwise.
The effect is to preserve current behavior for real-time systems, but for
other systems to allow expedited RCU grace periods to run with real-time
priority while continuing to invoke RCU callbacks as SCHED_OTHER.

As you would expect, this RCU_NOCB_CPU_CB_BOOST Kconfig option has no
effect except on CPUs with offloaded RCU callbacks.
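
As a usage sketch for the non-real-time case described above
(illustrative only; other required options such as RCU_EXPERT are
omitted), a kernel could keep boosting enabled while leaving the new
option unset:

	CONFIG_RCU_BOOST=y
	CONFIG_RCU_NOCB_CPU=y
	# CONFIG_RCU_NOCB_CPU_CB_BOOST is not set

With such a configuration, the boosting-related kthreads keep running
at kthread_prio (which defaults to SCHED_FIFO priority 1 when
RCU_BOOST=y), while the rcuo callback kthreads remain SCHED_OTHER, as
shown in the tree_nocb.h hunk below.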

Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Acked-by: Joel Fernandes (Google) <[email protected]>
---
kernel/rcu/Kconfig | 16 ++++++++++++++++
kernel/rcu/tree.c | 6 +++++-
kernel/rcu/tree_nocb.h | 3 ++-
3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 27aab870ae4cf..c05ca52cdf64d 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -275,6 +275,22 @@ config RCU_NOCB_CPU_DEFAULT_ALL
Say Y here if you want offload all CPUs by default on boot.
Say N here if you are unsure.

+config RCU_NOCB_CPU_CB_BOOST
+ bool "Offload RCU callback from real-time kthread"
+ depends on RCU_NOCB_CPU && RCU_BOOST
+ default y if PREEMPT_RT
+ help
+ Use this option to invoke offloaded callbacks as SCHED_FIFO
+ to avoid starvation by heavy SCHED_OTHER background load.
+ Of course, running as SCHED_FIFO during callback floods will
+ cause the rcuo[ps] kthreads to monopolize the CPU for hundreds
+ of milliseconds or more. Therefore, when enabling this option,
+ it is your responsibility to ensure that latency-sensitive
+ tasks either run with higher priority or run on some other CPU.
+
+ Say Y here if you want to set RT priority for offloading kthreads.
+ Say N here if you are building a !PREEMPT_RT kernel and are unsure.
+
config TASKS_TRACE_RCU_READ_MB
bool "Tasks Trace RCU readers use memory barriers in user and idle"
depends on RCU_EXPERT && TASKS_TRACE_RCU
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 74455671e6cf2..3b9f45ebb4999 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -154,7 +154,11 @@ static void sync_sched_exp_online_cleanup(int cpu);
static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);

-/* rcuc/rcub/rcuop kthread realtime priority */
+/*
+ * rcuc/rcub/rcuop kthread realtime priority. The "rcuop"
+ * real-time priority(enabling/disabling) is controlled by
+ * the extra CONFIG_RCU_NOCB_CPU_CB_BOOST configuration.
+ */
static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
module_param(kthread_prio, int, 0444);

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 60cc92cc66552..fa8e4f82e60c0 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1315,8 +1315,9 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
goto end;

- if (kthread_prio)
+ if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_CB_BOOST) && kthread_prio)
sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
+
WRITE_ONCE(rdp->nocb_cb_kthread, t);
WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
return;
--
2.31.1.189.g2e36527f23

2022-06-20 22:52:33

by Paul E. McKenney

Subject: [PATCH rcu 5/7] rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()

From: Zqiang <[email protected]>

Callbacks are invoked in RCU kthreads when callbacks are offloaded
(rcu_nocbs boot parameter) or when RCU's softirq handler has been
offloaded to rcuc kthreads (use_softirq==0). The current code allows
for the rcu_nocbs case but not the use_softirq case. This commit adds
support for the use_softirq case.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tree.c | 4 ++--
kernel/rcu/tree.h | 2 +-
kernel/rcu/tree_plugin.h | 33 +++++++++++++++++++--------------
3 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c25ba442044a6..74455671e6cf2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2530,7 +2530,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
trace_rcu_batch_end(rcu_state.name, 0,
!rcu_segcblist_empty(&rdp->cblist),
need_resched(), is_idle_task(current),
- rcu_is_callbacks_kthread());
+ rcu_is_callbacks_kthread(rdp));
return;
}

@@ -2608,7 +2608,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
rcu_nocb_lock_irqsave(rdp, flags);
rdp->n_cbs_invoked += count;
trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(),
- is_idle_task(current), rcu_is_callbacks_kthread());
+ is_idle_task(current), rcu_is_callbacks_kthread(rdp));

/* Update counts and requeue any remaining callbacks. */
rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4f8532c33558f..649ad4f0129b1 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -426,7 +426,7 @@ static void rcu_flavor_sched_clock_irq(int user);
static void dump_blkd_tasks(struct rcu_node *rnp, int ncheck);
static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
-static bool rcu_is_callbacks_kthread(void);
+static bool rcu_is_callbacks_kthread(struct rcu_data *rdp);
static void rcu_cpu_kthread_setup(unsigned int cpu);
static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp);
static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c8ba0fe17267c..0483e1338c413 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1012,6 +1012,25 @@ static void rcu_cpu_kthread_setup(unsigned int cpu)
WRITE_ONCE(rdp->rcuc_activity, jiffies);
}

+static bool rcu_is_callbacks_nocb_kthread(struct rcu_data *rdp)
+{
+#ifdef CONFIG_RCU_NOCB_CPU
+ return rdp->nocb_cb_kthread == current;
+#else
+ return false;
+#endif
+}
+
+/*
+ * Is the current CPU running the RCU-callbacks kthread?
+ * Caller must have preemption disabled.
+ */
+static bool rcu_is_callbacks_kthread(struct rcu_data *rdp)
+{
+ return rdp->rcu_cpu_kthread_task == current ||
+ rcu_is_callbacks_nocb_kthread(rdp);
+}
+
#ifdef CONFIG_RCU_BOOST

/*
@@ -1151,15 +1170,6 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
}
}

-/*
- * Is the current CPU running the RCU-callbacks kthread?
- * Caller must have preemption disabled.
- */
-static bool rcu_is_callbacks_kthread(void)
-{
- return __this_cpu_read(rcu_data.rcu_cpu_kthread_task) == current;
-}
-
#define RCU_BOOST_DELAY_JIFFIES DIV_ROUND_UP(CONFIG_RCU_BOOST_DELAY * HZ, 1000)

/*
@@ -1242,11 +1252,6 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}

-static bool rcu_is_callbacks_kthread(void)
-{
- return false;
-}
-
static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
{
}
--
2.31.1.189.g2e36527f23

2022-06-20 22:52:34

by Paul E. McKenney

Subject: [PATCH rcu 4/7] rcu/nocb: Add an option to offload all CPUs on boot

From: Joel Fernandes <[email protected]>

Systems built with CONFIG_RCU_NOCB_CPU=y but booted without either
the rcu_nocbs= or nohz_full= kernel-boot parameters will not have
callback offloading on any of the CPUs, nor can any of the CPUs be
switched to enable callback offloading at runtime. Although this is
intentional, it would be nice to have a way to offload all the CPUs
without having to make random bootloaders specify either the rcu_nocbs=
or the nohz_full= kernel-boot parameters.

This commit therefore provides a new CONFIG_RCU_NOCB_CPU_DEFAULT_ALL
Kconfig option that switches the default so as to offload callback
processing on all of the CPUs. This default can still be overridden
using the rcu_nocbs= and nohz_full= kernel-boot parameters.
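
For illustration (the CPU numbers below are made up), a kernel built
with CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y behaves as follows:

	# No rcu_nocbs= or nohz_full= parameter: all CPUs are offloaded.

	# Explicit mask still takes precedence over the Kconfig default,
	# so only CPUs 4-7 are offloaded:
	rcu_nocbs=4-7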

Reviewed-by: Kalesh Singh <[email protected]>
Reviewed-by: Uladzislau Rezki <[email protected]>
(In v4.1, fixed issues with CONFIG maze reported by kernel test robot).
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
kernel/rcu/Kconfig | 13 +++++++++++++
kernel/rcu/tree_nocb.h | 15 ++++++++++++++-
3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2522b11e593f2..34605c275294c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3659,6 +3659,9 @@
just as if they had also been called out in the
rcu_nocbs= boot parameter.

+ Note that this argument takes precedence over
+ the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
+
noiotrap [SH] Disables trapped I/O port accesses.

noirqdebug [X86-32] Disables the code which attempts to detect and
@@ -4557,6 +4560,9 @@
no-callback mode from boot but the mode may be
toggled at runtime via cpusets.

+ Note that this argument takes precedence over
+ the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
+
rcu_nocb_poll [KNL]
Rather than requiring that offloaded CPUs
(specified by rcu_nocbs= above) explicitly
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 1c630e573548d..27aab870ae4cf 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -262,6 +262,19 @@ config RCU_NOCB_CPU
Say Y here if you need reduced OS jitter, despite added overhead.
Say N here if you are unsure.

+config RCU_NOCB_CPU_DEFAULT_ALL
+ bool "Offload RCU callback processing from all CPUs by default"
+ depends on RCU_NOCB_CPU
+ default n
+ help
+ Use this option to offload callback processing from all CPUs
+ by default, in the absence of the rcu_nocbs or nohz_full boot
+ parameter. This also avoids the need to use any boot parameters
+ to achieve the effect of offloading all CPUs on boot.
+
+ Say Y here if you want offload all CPUs by default on boot.
+ Say N here if you are unsure.
+
config TASKS_TRACE_RCU_READ_MB
bool "Tasks Trace RCU readers use memory barriers in user and idle"
depends on RCU_EXPERT && TASKS_TRACE_RCU
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 4cf9a29bba79d..60cc92cc66552 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1197,11 +1197,21 @@ void __init rcu_init_nohz(void)
{
int cpu;
bool need_rcu_nocb_mask = false;
+ bool offload_all = false;
struct rcu_data *rdp;

+#if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL)
+ if (!rcu_state.nocb_is_setup) {
+ need_rcu_nocb_mask = true;
+ offload_all = true;
+ }
+#endif /* #if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) */
+
#if defined(CONFIG_NO_HZ_FULL)
- if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask))
+ if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask)) {
need_rcu_nocb_mask = true;
+ offload_all = false; /* NO_HZ_FULL has its own mask. */
+ }
#endif /* #if defined(CONFIG_NO_HZ_FULL) */

if (need_rcu_nocb_mask) {
@@ -1222,6 +1232,9 @@ void __init rcu_init_nohz(void)
cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
#endif /* #if defined(CONFIG_NO_HZ_FULL) */

+ if (offload_all)
+ cpumask_setall(rcu_nocb_mask);
+
if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
pr_info("\tNote: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.\n");
cpumask_and(rcu_nocb_mask, cpu_possible_mask,
--
2.31.1.189.g2e36527f23

2022-06-20 22:52:35

by Paul E. McKenney

Subject: [PATCH rcu 3/7] rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call

From: Zqiang <[email protected]>

If spawning the rcuog or rcuo[p] kthreads fails, the offloaded rdp needs
to be explicitly deoffloaded; otherwise the target rdp is still considered
offloaded even though nothing actually handles its callbacks.
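
Condensed from the rcu_spawn_cpu_nocb_kthread() hunk below, both spawn
failure paths now branch to a common label that performs this explicit
de-offload (only the barrier_mutex is taken here because the caller
already holds the CPU-hotplug write lock, which is what motivated the
lock-order inversion in the previous patch):

	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, ...", __func__))
		goto end;
	...
end:
	mutex_lock(&rcu_state.barrier_mutex);
	if (rcu_rdp_is_offloaded(rdp)) {
		rcu_nocb_rdp_deoffload(rdp);
		cpumask_clear_cpu(cpu, rcu_nocb_mask);
	}
	mutex_unlock(&rcu_state.barrier_mutex);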

Signed-off-by: Zqiang <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Joel Fernandes <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tree_nocb.h | 80 +++++++++++++++++++++++++++++++++---------
1 file changed, 64 insertions(+), 16 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f2f2cab6285a1..4cf9a29bba79d 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -986,10 +986,7 @@ static int rdp_offload_toggle(struct rcu_data *rdp,
}
raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);

- if (wake_gp)
- wake_up_process(rdp_gp->nocb_gp_kthread);
-
- return 0;
+ return wake_gp;
}

static long rcu_nocb_rdp_deoffload(void *arg)
@@ -997,9 +994,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
struct rcu_data *rdp = arg;
struct rcu_segcblist *cblist = &rdp->cblist;
unsigned long flags;
- int ret;
+ int wake_gp;
+ struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;

- WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
+ /*
+ * rcu_nocb_rdp_deoffload() may be called directly if
+ * rcuog/o[p] spawn failed, because at this time the rdp->cpu
+ * is not online yet.
+ */
+ WARN_ON_ONCE((rdp->cpu != raw_smp_processor_id()) && cpu_online(rdp->cpu));

pr_info("De-offloading %d\n", rdp->cpu);

@@ -1023,10 +1026,41 @@ static long rcu_nocb_rdp_deoffload(void *arg)
*/
rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
invoke_rcu_core();
- ret = rdp_offload_toggle(rdp, false, flags);
- swait_event_exclusive(rdp->nocb_state_wq,
- !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
- SEGCBLIST_KTHREAD_GP));
+ wake_gp = rdp_offload_toggle(rdp, false, flags);
+
+ mutex_lock(&rdp_gp->nocb_gp_kthread_mutex);
+ if (rdp_gp->nocb_gp_kthread) {
+ if (wake_gp)
+ wake_up_process(rdp_gp->nocb_gp_kthread);
+
+ /*
+ * If rcuo[p] kthread spawn failed, directly remove SEGCBLIST_KTHREAD_CB.
+ * Just wait SEGCBLIST_KTHREAD_GP to be cleared by rcuog.
+ */
+ if (!rdp->nocb_cb_kthread) {
+ rcu_nocb_lock_irqsave(rdp, flags);
+ rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_KTHREAD_CB);
+ rcu_nocb_unlock_irqrestore(rdp, flags);
+ }
+
+ swait_event_exclusive(rdp->nocb_state_wq,
+ !rcu_segcblist_test_flags(cblist,
+ SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP));
+ } else {
+ /*
+ * No kthread to clear the flags for us or remove the rdp from the nocb list
+ * to iterate. Do it here instead. Locking doesn't look stricly necessary
+ * but we stick to paranoia in this rare path.
+ */
+ rcu_nocb_lock_irqsave(rdp, flags);
+ rcu_segcblist_clear_flags(&rdp->cblist,
+ SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP);
+ rcu_nocb_unlock_irqrestore(rdp, flags);
+
+ list_del(&rdp->nocb_entry_rdp);
+ }
+ mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
+
/*
* Lock one last time to acquire latest callback updates from kthreads
* so we can later handle callbacks locally without locking.
@@ -1047,7 +1081,7 @@ static long rcu_nocb_rdp_deoffload(void *arg)
WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));


- return ret;
+ return 0;
}

int rcu_nocb_cpu_deoffload(int cpu)
@@ -1079,7 +1113,8 @@ static long rcu_nocb_rdp_offload(void *arg)
struct rcu_data *rdp = arg;
struct rcu_segcblist *cblist = &rdp->cblist;
unsigned long flags;
- int ret;
+ int wake_gp;
+ struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;

WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
/*
@@ -1089,6 +1124,9 @@ static long rcu_nocb_rdp_offload(void *arg)
if (!rdp->nocb_gp_rdp)
return -EINVAL;

+ if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
+ return -EINVAL;
+
pr_info("Offloading %d\n", rdp->cpu);

/*
@@ -1113,7 +1151,9 @@ static long rcu_nocb_rdp_offload(void *arg)
* WRITE flags READ callbacks
* rcu_nocb_unlock() rcu_nocb_unlock()
*/
- ret = rdp_offload_toggle(rdp, true, flags);
+ wake_gp = rdp_offload_toggle(rdp, true, flags);
+ if (wake_gp)
+ wake_up_process(rdp_gp->nocb_gp_kthread);
swait_event_exclusive(rdp->nocb_state_wq,
rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB) &&
rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
@@ -1126,7 +1166,7 @@ static long rcu_nocb_rdp_offload(void *arg)
rcu_segcblist_clear_flags(cblist, SEGCBLIST_RCU_CORE);
rcu_nocb_unlock_irqrestore(rdp, flags);

- return ret;
+ return 0;
}

int rcu_nocb_cpu_offload(int cpu)
@@ -1248,7 +1288,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
"rcuog/%d", rdp_gp->cpu);
if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) {
mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
- return;
+ goto end;
}
WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
if (kthread_prio)
@@ -1260,12 +1300,20 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
t = kthread_run(rcu_nocb_cb_kthread, rdp,
"rcuo%c/%d", rcu_state.abbr, cpu);
if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
- return;
+ goto end;

if (kthread_prio)
sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
WRITE_ONCE(rdp->nocb_cb_kthread, t);
WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
+ return;
+end:
+ mutex_lock(&rcu_state.barrier_mutex);
+ if (rcu_rdp_is_offloaded(rdp)) {
+ rcu_nocb_rdp_deoffload(rdp);
+ cpumask_clear_cpu(cpu, rcu_nocb_mask);
+ }
+ mutex_unlock(&rcu_state.barrier_mutex);
}

/* How many CB CPU IDs per GP kthread? Default of -1 for sqrt(nr_cpu_ids). */
--
2.31.1.189.g2e36527f23

2022-06-20 22:52:39

by Paul E. McKenney

Subject: [PATCH rcu 1/7] rcu/nocb: Add/del rdp to iterate from rcuog itself

From: Frederic Weisbecker <[email protected]>

NOCB rdps are part of a group whose list is iterated by the
corresponding rdp leader.

This list is RCU-traversed because an rdp can be added or deleted
concurrently. Upon addition, a new iteration over the list after a
synchronization point (a LOCK/UNLOCK pair on ->nocb_gp_lock) is forced
to make sure:

1) we didn't miss a new element added in the middle of an iteration
2) we didn't ignore a whole subset of the list due to an element being
quickly deleted and then re-added.
3) we are protected against other, harder-to-foresee surprises...

Although this layout is expected to be safe, it doesn't help anybody
to sleep well.

Instead, simplify the nocb state toggling by moving the list
modification from the nocb (de-)offloading workqueue to the rcuog
kthreads.

Whenever the rdp leader is expected to (re-)set the SEGCBLIST_KTHREAD_GP
flag of a target rdp, the latter is queued so that the leader handles
the flag flip along with adding the target rdp to, or deleting it from,
the list to iterate. This way the list modification and iteration happen
on the same kthread and those operations cannot race with each other.

As a bonus, the flags for each rdp don't need to be checked locklessly
before each iteration, which is one less opportunity to produce
nightmares.
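
A condensed sketch of the resulting hand-off (full details are in the
tree_nocb.h hunks below):

	/* (De-)offload side, in rdp_offload_toggle(), under ->nocb_gp_lock: */
	WRITE_ONCE(rdp_gp->nocb_toggling_rdp, rdp);
	/* ...then wake the rcuog kthread if it was sleeping... */

	/* rcuog side, in nocb_gp_wait(), once the wait is over: */
	ret = nocb_gp_toggle_rdp(rdp_toggling, &wake_state);
	if (ret == 1)		/* Now offloaded: start iterating this rdp. */
		list_add_tail(&rdp_toggling->nocb_entry_rdp, &my_rdp->nocb_head_rdp);
	else if (ret == 0)	/* Now de-offloaded: stop iterating this rdp. */
		list_del(&rdp_toggling->nocb_entry_rdp);
	if (wake_state)
		swake_up_one(&rdp_toggling->nocb_state_wq);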

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Zqiang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tree.h | 1 +
kernel/rcu/tree_nocb.h | 138 +++++++++++++++++++++--------------------
2 files changed, 71 insertions(+), 68 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 2ccf5845957df..4f8532c33558f 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -235,6 +235,7 @@ struct rcu_data {
* if rdp_gp.
*/
struct list_head nocb_entry_rdp; /* rcu_data node in wakeup chain. */
+ struct rcu_data *nocb_toggling_rdp; /* rdp queued for (de-)offloading */

/* The following fields are used by CB kthread, hence new cacheline. */
struct rcu_data *nocb_gp_rdp ____cacheline_internodealigned_in_smp;
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 46694e13398a3..dac74952e1d1b 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -546,52 +546,43 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
}
}

-/*
- * Check if we ignore this rdp.
- *
- * We check that without holding the nocb lock but
- * we make sure not to miss a freshly offloaded rdp
- * with the current ordering:
- *
- * rdp_offload_toggle() nocb_gp_enabled_cb()
- * ------------------------- ----------------------------
- * WRITE flags LOCK nocb_gp_lock
- * LOCK nocb_gp_lock READ/WRITE nocb_gp_sleep
- * READ/WRITE nocb_gp_sleep UNLOCK nocb_gp_lock
- * UNLOCK nocb_gp_lock READ flags
- */
-static inline bool nocb_gp_enabled_cb(struct rcu_data *rdp)
-{
- u8 flags = SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_GP;
-
- return rcu_segcblist_test_flags(&rdp->cblist, flags);
-}
-
-static inline bool nocb_gp_update_state_deoffloading(struct rcu_data *rdp,
- bool *needwake_state)
+static int nocb_gp_toggle_rdp(struct rcu_data *rdp,
+ bool *wake_state)
{
struct rcu_segcblist *cblist = &rdp->cblist;
+ unsigned long flags;
+ int ret;

- if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED)) {
- if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
- rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_GP);
- if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
- *needwake_state = true;
- }
- return false;
+ rcu_nocb_lock_irqsave(rdp, flags);
+ if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED) &&
+ !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
+ /*
+ * Offloading. Set our flag and notify the offload worker.
+ * We will handle this rdp until it ever gets de-offloaded.
+ */
+ rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_GP);
+ if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
+ *wake_state = true;
+ ret = 1;
+ } else if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED) &&
+ rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
+ /*
+ * De-offloading. Clear our flag and notify the de-offload worker.
+ * We will ignore this rdp until it ever gets re-offloaded.
+ */
+ rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP);
+ if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
+ *wake_state = true;
+ ret = 0;
+ } else {
+ WARN_ON_ONCE(1);
+ ret = -1;
}

- /*
- * De-offloading. Clear our flag and notify the de-offload worker.
- * We will ignore this rdp until it ever gets re-offloaded.
- */
- WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
- rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP);
- if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
- *needwake_state = true;
- return true;
-}
+ rcu_nocb_unlock_irqrestore(rdp, flags);

+ return ret;
+}

/*
* No-CBs GP kthreads come here to wait for additional callbacks to show up
@@ -609,7 +600,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
bool needwait_gp = false; // This prevents actual uninitialized use.
bool needwake;
bool needwake_gp;
- struct rcu_data *rdp;
+ struct rcu_data *rdp, *rdp_toggling = NULL;
struct rcu_node *rnp;
unsigned long wait_gp_seq = 0; // Suppress "use uninitialized" warning.
bool wasempty = false;
@@ -634,19 +625,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
* is added to the list, so the skipped-over rcu_data structures
* won't be ignored for long.
*/
- list_for_each_entry_rcu(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp, 1) {
- bool needwake_state = false;
-
- if (!nocb_gp_enabled_cb(rdp))
- continue;
+ list_for_each_entry(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp) {
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
rcu_nocb_lock_irqsave(rdp, flags);
- if (nocb_gp_update_state_deoffloading(rdp, &needwake_state)) {
- rcu_nocb_unlock_irqrestore(rdp, flags);
- if (needwake_state)
- swake_up_one(&rdp->nocb_state_wq);
- continue;
- }
+ lockdep_assert_held(&rdp->nocb_lock);
bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
if (bypass_ncbs &&
(time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
@@ -656,8 +638,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
rcu_nocb_unlock_irqrestore(rdp, flags);
- if (needwake_state)
- swake_up_one(&rdp->nocb_state_wq);
continue; /* No callbacks here, try next. */
}
if (bypass_ncbs) {
@@ -705,8 +685,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
}
if (needwake_gp)
rcu_gp_kthread_wake();
- if (needwake_state)
- swake_up_one(&rdp->nocb_state_wq);
}

my_rdp->nocb_gp_bypass = bypass;
@@ -739,15 +717,49 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
!READ_ONCE(my_rdp->nocb_gp_sleep));
trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq, TPS("EndWait"));
}
+
if (!rcu_nocb_poll) {
raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
+ // (De-)queue an rdp to/from the group if its nocb state is changing
+ rdp_toggling = my_rdp->nocb_toggling_rdp;
+ if (rdp_toggling)
+ my_rdp->nocb_toggling_rdp = NULL;
+
if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
del_timer(&my_rdp->nocb_timer);
}
WRITE_ONCE(my_rdp->nocb_gp_sleep, true);
raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
+ } else {
+ rdp_toggling = READ_ONCE(my_rdp->nocb_toggling_rdp);
+ if (rdp_toggling) {
+ /*
+ * Paranoid locking to make sure nocb_toggling_rdp is well
+ * reset *before* we (re)set SEGCBLIST_KTHREAD_GP or we could
+ * race with another round of nocb toggling for this rdp.
+ * Nocb locking should prevent from that already but we stick
+ * to paranoia, especially in rare path.
+ */
+ raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
+ my_rdp->nocb_toggling_rdp = NULL;
+ raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
+ }
+ }
+
+ if (rdp_toggling) {
+ bool wake_state = false;
+ int ret;
+
+ ret = nocb_gp_toggle_rdp(rdp_toggling, &wake_state);
+ if (ret == 1)
+ list_add_tail(&rdp_toggling->nocb_entry_rdp, &my_rdp->nocb_head_rdp);
+ else if (ret == 0)
+ list_del(&rdp_toggling->nocb_entry_rdp);
+ if (wake_state)
+ swake_up_one(&rdp_toggling->nocb_state_wq);
}
+
my_rdp->nocb_gp_seq = -1;
WARN_ON(signal_pending(current));
}
@@ -966,6 +978,8 @@ static int rdp_offload_toggle(struct rcu_data *rdp,
swake_up_one(&rdp->nocb_cb_wq);

raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
+ // Queue this rdp for add/del to/from the list to iterate on rcuog
+ WRITE_ONCE(rdp_gp->nocb_toggling_rdp, rdp);
if (rdp_gp->nocb_gp_sleep) {
rdp_gp->nocb_gp_sleep = false;
wake_gp = true;
@@ -1013,8 +1027,6 @@ static long rcu_nocb_rdp_deoffload(void *arg)
swait_event_exclusive(rdp->nocb_state_wq,
!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
SEGCBLIST_KTHREAD_GP));
- /* Stop nocb_gp_wait() from iterating over this structure. */
- list_del_rcu(&rdp->nocb_entry_rdp);
/*
* Lock one last time to acquire latest callback updates from kthreads
* so we can later handle callbacks locally without locking.
@@ -1079,16 +1091,6 @@ static long rcu_nocb_rdp_offload(void *arg)

pr_info("Offloading %d\n", rdp->cpu);

- /*
- * Cause future nocb_gp_wait() invocations to iterate over
- * structure, resetting ->nocb_gp_sleep and waking up the related
- * "rcuog". Since nocb_gp_wait() in turn locks ->nocb_gp_lock
- * before setting ->nocb_gp_sleep again, we are guaranteed to
- * iterate this newly added structure before "rcuog" goes to
- * sleep again.
- */
- list_add_tail_rcu(&rdp->nocb_entry_rdp, &rdp->nocb_gp_rdp->nocb_head_rdp);
-
/*
* Can't use rcu_nocb_lock_irqsave() before SEGCBLIST_LOCKING
* is set.
--
2.31.1.189.g2e36527f23

2022-06-20 23:09:44

by Paul E. McKenney

Subject: [PATCH rcu 7/7] rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty

From: Zqiang <[email protected]>

Currently, if the 'rcu_nocb_poll' kernel boot parameter is enabled, all
rcuog kthreads enter polling mode. However, if all of a given group
of rcuo kthreads correspond to CPUs that have been de-offloaded, the
corresponding rcuog kthread will nonetheless still wake up periodically,
unnecessarily consuming power and perturbing workloads. Fortunately,
this situation is easily detected by the fact that the rcuog kthread's
CPU's rcu_data structure's ->nocb_head_rdp list is empty.

This commit saves power and avoids unnecessarily perturbing workloads
by putting an rcuog kthread to sleep during any time period when all of
its rcuo kthreads' CPUs are de-offloaded.
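
In essence (condensed from the hunk below, with the ->nocb_gp_sleep
setup under ->nocb_gp_lock elided), the rcu_nocb_poll case now parks on
the existing ->nocb_gp_wq whenever the group list is empty, instead of
unconditionally rescheduling every jiffy:

	if (list_empty(&my_rdp->nocb_head_rdp)) {
		/* Nothing to poll for: sleep until an rdp is (re-)offloaded. */
		nocb_gp_sleep(my_rdp, cpu);
	} else {
		schedule_timeout_idle(1);
	}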

Co-developed-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tree_nocb.h | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index fa8e4f82e60c0..a8f574d8850d2 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -584,6 +584,14 @@ static int nocb_gp_toggle_rdp(struct rcu_data *rdp,
return ret;
}

+static void nocb_gp_sleep(struct rcu_data *my_rdp, int cpu)
+{
+ trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Sleep"));
+ swait_event_interruptible_exclusive(my_rdp->nocb_gp_wq,
+ !READ_ONCE(my_rdp->nocb_gp_sleep));
+ trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("EndSleep"));
+}
+
/*
* No-CBs GP kthreads come here to wait for additional callbacks to show up
* or for grace periods to end.
@@ -701,13 +709,19 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
/* Polling, so trace if first poll in the series. */
if (gotcbs)
trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Poll"));
- schedule_timeout_idle(1);
+ if (list_empty(&my_rdp->nocb_head_rdp)) {
+ raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
+ if (!my_rdp->nocb_toggling_rdp)
+ WRITE_ONCE(my_rdp->nocb_gp_sleep, true);
+ raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
+ /* Wait for any offloading rdp */
+ nocb_gp_sleep(my_rdp, cpu);
+ } else {
+ schedule_timeout_idle(1);
+ }
} else if (!needwait_gp) {
/* Wait for callbacks to appear. */
- trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Sleep"));
- swait_event_interruptible_exclusive(my_rdp->nocb_gp_wq,
- !READ_ONCE(my_rdp->nocb_gp_sleep));
- trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("EndSleep"));
+ nocb_gp_sleep(my_rdp, cpu);
} else {
rnp = my_rdp->mynode;
trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq, TPS("StartWait"));
--
2.31.1.189.g2e36527f23

2022-07-19 09:33:41

by Neeraj Upadhyay

Subject: Re: [PATCH rcu 2/7] rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order



On 6/21/2022 4:14 AM, Paul E. McKenney wrote:
> From: Zqiang <[email protected]>
>
> In case of failure to spawn either rcuog or rcuo[p] kthreads for a given
> rdp, rcu_nocb_rdp_deoffload() needs to be called with the hotplug
> lock and the barrier_mutex held. However, the cpus write lock is
> already held while calling rcutree_prepare_cpu(). It is not possible to
> call rcu_nocb_rdp_deoffload() from there while taking only the
> barrier_mutex, as this would create a lock inversion against
> rcu_nocb_cpu_deoffload(), which takes both locks in the reverse order.
>
> Solve this by inverting the locking order inside
> rcu_nocb_cpu_[de]offload(). This is also a prerequisite for toggling
> NOCB states through cpusets.
>
> Signed-off-by: Zqiang <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> ---

Reviewed-by: Neeraj Upadhyay <[email protected]>


Thanks
Neeraj

> kernel/rcu/tree_nocb.h | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index dac74952e1d1b..f2f2cab6285a1 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1055,8 +1055,8 @@ int rcu_nocb_cpu_deoffload(int cpu)
> struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> int ret = 0;
>
> - mutex_lock(&rcu_state.barrier_mutex);
> cpus_read_lock();
> + mutex_lock(&rcu_state.barrier_mutex);
> if (rcu_rdp_is_offloaded(rdp)) {
> if (cpu_online(cpu)) {
> ret = work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
> @@ -1067,8 +1067,8 @@ int rcu_nocb_cpu_deoffload(int cpu)
> ret = -EINVAL;
> }
> }
> - cpus_read_unlock();
> mutex_unlock(&rcu_state.barrier_mutex);
> + cpus_read_unlock();
>
> return ret;
> }
> @@ -1134,8 +1134,8 @@ int rcu_nocb_cpu_offload(int cpu)
> struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> int ret = 0;
>
> - mutex_lock(&rcu_state.barrier_mutex);
> cpus_read_lock();
> + mutex_lock(&rcu_state.barrier_mutex);
> if (!rcu_rdp_is_offloaded(rdp)) {
> if (cpu_online(cpu)) {
> ret = work_on_cpu(cpu, rcu_nocb_rdp_offload, rdp);
> @@ -1146,8 +1146,8 @@ int rcu_nocb_cpu_offload(int cpu)
> ret = -EINVAL;
> }
> }
> - cpus_read_unlock();
> mutex_unlock(&rcu_state.barrier_mutex);
> + cpus_read_unlock();
>
> return ret;
> }

2022-07-19 09:44:59

by Neeraj Upadhyay

Subject: Re: [PATCH rcu 1/7] rcu/nocb: Add/del rdp to iterate from rcuog itself



On 6/21/2022 4:14 AM, Paul E. McKenney wrote:
> From: Frederic Weisbecker <[email protected]>
>
> NOCB rdps are part of a group whose list is iterated by the
> corresponding rdp leader.
>
> This list is RCU-traversed because an rdp can be added or deleted
> concurrently. Upon addition, a new iteration over the list after a
> synchronization point (a LOCK/UNLOCK pair on ->nocb_gp_lock) is forced
> to make sure:
>
> 1) we didn't miss a new element added in the middle of an iteration
> 2) we didn't ignore a whole subset of the list due to an element being
> quickly deleted and then re-added.
> 3) we are protected against other, harder-to-foresee surprises...
>
> Although this layout is expected to be safe, it doesn't help anybody
> to sleep well.
>
> Instead, simplify the nocb state toggling by moving the list
> modification from the nocb (de-)offloading workqueue to the rcuog
> kthreads.
>
> Whenever the rdp leader is expected to (re-)set the SEGCBLIST_KTHREAD_GP
> flag of a target rdp, the latter is queued so that the leader handles
> the flag flip along with adding the target rdp to, or deleting it from,
> the list to iterate. This way the list modification and iteration happen
> on the same kthread and those operations cannot race with each other.
>
> As a bonus, the flags for each rdp don't need to be checked locklessly
> before each iteration, which is one less opportunity to produce
> nightmares.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Cc: Zqiang <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> ---

Reviewed-by: Neeraj Upadhyay <[email protected]>


Thanks
Neeraj

> kernel/rcu/tree.h | 1 +
> kernel/rcu/tree_nocb.h | 138 +++++++++++++++++++++--------------------
> 2 files changed, 71 insertions(+), 68 deletions(-)
>
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 2ccf5845957df..4f8532c33558f 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -235,6 +235,7 @@ struct rcu_data {
> * if rdp_gp.
> */
> struct list_head nocb_entry_rdp; /* rcu_data node in wakeup chain. */
> + struct rcu_data *nocb_toggling_rdp; /* rdp queued for (de-)offloading */
>
> /* The following fields are used by CB kthread, hence new cacheline. */
> struct rcu_data *nocb_gp_rdp ____cacheline_internodealigned_in_smp;
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 46694e13398a3..dac74952e1d1b 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -546,52 +546,43 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
> }
> }
>
> -/*
> - * Check if we ignore this rdp.
> - *
> - * We check that without holding the nocb lock but
> - * we make sure not to miss a freshly offloaded rdp
> - * with the current ordering:
> - *
> - * rdp_offload_toggle() nocb_gp_enabled_cb()
> - * ------------------------- ----------------------------
> - * WRITE flags LOCK nocb_gp_lock
> - * LOCK nocb_gp_lock READ/WRITE nocb_gp_sleep
> - * READ/WRITE nocb_gp_sleep UNLOCK nocb_gp_lock
> - * UNLOCK nocb_gp_lock READ flags
> - */
> -static inline bool nocb_gp_enabled_cb(struct rcu_data *rdp)
> -{
> - u8 flags = SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_GP;
> -
> - return rcu_segcblist_test_flags(&rdp->cblist, flags);
> -}
> -
> -static inline bool nocb_gp_update_state_deoffloading(struct rcu_data *rdp,
> - bool *needwake_state)
> +static int nocb_gp_toggle_rdp(struct rcu_data *rdp,
> + bool *wake_state)
> {
> struct rcu_segcblist *cblist = &rdp->cblist;
> + unsigned long flags;
> + int ret;
>
> - if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED)) {
> - if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
> - rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_GP);
> - if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
> - *needwake_state = true;
> - }
> - return false;
> + rcu_nocb_lock_irqsave(rdp, flags);
> + if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED) &&
> + !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
> + /*
> + * Offloading. Set our flag and notify the offload worker.
> + * We will handle this rdp until it ever gets de-offloaded.
> + */
> + rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_GP);
> + if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
> + *wake_state = true;
> + ret = 1;
> + } else if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED) &&
> + rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
> + /*
> + * De-offloading. Clear our flag and notify the de-offload worker.
> + * We will ignore this rdp until it ever gets re-offloaded.
> + */
> + rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP);
> + if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
> + *wake_state = true;
> + ret = 0;
> + } else {
> + WARN_ON_ONCE(1);
> + ret = -1;
> }
>
> - /*
> - * De-offloading. Clear our flag and notify the de-offload worker.
> - * We will ignore this rdp until it ever gets re-offloaded.
> - */
> - WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
> - rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP);
> - if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
> - *needwake_state = true;
> - return true;
> -}
> + rcu_nocb_unlock_irqrestore(rdp, flags);
>
> + return ret;
> +}
>
> /*
> * No-CBs GP kthreads come here to wait for additional callbacks to show up
> @@ -609,7 +600,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> bool needwait_gp = false; // This prevents actual uninitialized use.
> bool needwake;
> bool needwake_gp;
> - struct rcu_data *rdp;
> + struct rcu_data *rdp, *rdp_toggling = NULL;
> struct rcu_node *rnp;
> unsigned long wait_gp_seq = 0; // Suppress "use uninitialized" warning.
> bool wasempty = false;
> @@ -634,19 +625,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> * is added to the list, so the skipped-over rcu_data structures
> * won't be ignored for long.
> */
> - list_for_each_entry_rcu(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp, 1) {
> - bool needwake_state = false;
> -
> - if (!nocb_gp_enabled_cb(rdp))
> - continue;
> + list_for_each_entry(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp) {
> trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> rcu_nocb_lock_irqsave(rdp, flags);
> - if (nocb_gp_update_state_deoffloading(rdp, &needwake_state)) {
> - rcu_nocb_unlock_irqrestore(rdp, flags);
> - if (needwake_state)
> - swake_up_one(&rdp->nocb_state_wq);
> - continue;
> - }
> + lockdep_assert_held(&rdp->nocb_lock);
> bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> if (bypass_ncbs &&
> (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> @@ -656,8 +638,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> } else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
> rcu_nocb_unlock_irqrestore(rdp, flags);
> - if (needwake_state)
> - swake_up_one(&rdp->nocb_state_wq);
> continue; /* No callbacks here, try next. */
> }
> if (bypass_ncbs) {
> @@ -705,8 +685,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> }
> if (needwake_gp)
> rcu_gp_kthread_wake();
> - if (needwake_state)
> - swake_up_one(&rdp->nocb_state_wq);
> }
>
> my_rdp->nocb_gp_bypass = bypass;
> @@ -739,15 +717,49 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> !READ_ONCE(my_rdp->nocb_gp_sleep));
> trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq, TPS("EndWait"));
> }
> +
> if (!rcu_nocb_poll) {
> raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
> + // (De-)queue an rdp to/from the group if its nocb state is changing
> + rdp_toggling = my_rdp->nocb_toggling_rdp;
> + if (rdp_toggling)
> + my_rdp->nocb_toggling_rdp = NULL;
> +
> if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
> WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> del_timer(&my_rdp->nocb_timer);
> }
> WRITE_ONCE(my_rdp->nocb_gp_sleep, true);
> raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
> + } else {
> + rdp_toggling = READ_ONCE(my_rdp->nocb_toggling_rdp);
> + if (rdp_toggling) {
> + /*
> + * Paranoid locking to make sure nocb_toggling_rdp is well
> + * reset *before* we (re)set SEGCBLIST_KTHREAD_GP or we could
> + * race with another round of nocb toggling for this rdp.
> + * Nocb locking should prevent from that already but we stick
> + * to paranoia, especially in rare path.
> + */
> + raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
> + my_rdp->nocb_toggling_rdp = NULL;
> + raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
> + }
> + }
> +
> + if (rdp_toggling) {
> + bool wake_state = false;
> + int ret;
> +
> + ret = nocb_gp_toggle_rdp(rdp_toggling, &wake_state);
> + if (ret == 1)
> + list_add_tail(&rdp_toggling->nocb_entry_rdp, &my_rdp->nocb_head_rdp);
> + else if (ret == 0)
> + list_del(&rdp_toggling->nocb_entry_rdp);
> + if (wake_state)
> + swake_up_one(&rdp_toggling->nocb_state_wq);
> }
> +
> my_rdp->nocb_gp_seq = -1;
> WARN_ON(signal_pending(current));
> }
> @@ -966,6 +978,8 @@ static int rdp_offload_toggle(struct rcu_data *rdp,
> swake_up_one(&rdp->nocb_cb_wq);
>
> raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
> + // Queue this rdp for add/del to/from the list to iterate on rcuog
> + WRITE_ONCE(rdp_gp->nocb_toggling_rdp, rdp);
> if (rdp_gp->nocb_gp_sleep) {
> rdp_gp->nocb_gp_sleep = false;
> wake_gp = true;
> @@ -1013,8 +1027,6 @@ static long rcu_nocb_rdp_deoffload(void *arg)
> swait_event_exclusive(rdp->nocb_state_wq,
> !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
> SEGCBLIST_KTHREAD_GP));
> - /* Stop nocb_gp_wait() from iterating over this structure. */
> - list_del_rcu(&rdp->nocb_entry_rdp);
> /*
> * Lock one last time to acquire latest callback updates from kthreads
> * so we can later handle callbacks locally without locking.
> @@ -1079,16 +1091,6 @@ static long rcu_nocb_rdp_offload(void *arg)
>
> pr_info("Offloading %d\n", rdp->cpu);
>
> - /*
> - * Cause future nocb_gp_wait() invocations to iterate over
> - * structure, resetting ->nocb_gp_sleep and waking up the related
> - * "rcuog". Since nocb_gp_wait() in turn locks ->nocb_gp_lock
> - * before setting ->nocb_gp_sleep again, we are guaranteed to
> - * iterate this newly added structure before "rcuog" goes to
> - * sleep again.
> - */
> - list_add_tail_rcu(&rdp->nocb_entry_rdp, &rdp->nocb_gp_rdp->nocb_head_rdp);
> -
> /*
> * Can't use rcu_nocb_lock_irqsave() before SEGCBLIST_LOCKING
> * is set.

2022-07-19 09:46:28

by Neeraj Upadhyay

Subject: Re: [PATCH rcu 3/7] rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call



On 6/21/2022 4:14 AM, Paul E. McKenney wrote:
> From: Zqiang <[email protected]>
>
> If spawning the rcuog or rcuo[p] kthreads fails, the offloaded rdp needs
> to be explicitly deoffloaded; otherwise the target rdp is still considered
> offloaded even though nothing actually handles its callbacks.
>
> Signed-off-by: Zqiang <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> ---

Reviewed-by: Neeraj Upadhyay <[email protected]>


Thanks
Neeraj

> kernel/rcu/tree_nocb.h | 80 +++++++++++++++++++++++++++++++++---------
> 1 file changed, 64 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index f2f2cab6285a1..4cf9a29bba79d 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -986,10 +986,7 @@ static int rdp_offload_toggle(struct rcu_data *rdp,
> }
> raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
>
> - if (wake_gp)
> - wake_up_process(rdp_gp->nocb_gp_kthread);
> -
> - return 0;
> + return wake_gp;
> }
>
> static long rcu_nocb_rdp_deoffload(void *arg)
> @@ -997,9 +994,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
> struct rcu_data *rdp = arg;
> struct rcu_segcblist *cblist = &rdp->cblist;
> unsigned long flags;
> - int ret;
> + int wake_gp;
> + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
>
> - WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> + /*
> + * rcu_nocb_rdp_deoffload() may be called directly if
> + * rcuog/o[p] spawn failed, because at this time the rdp->cpu
> + * is not online yet.
> + */
> + WARN_ON_ONCE((rdp->cpu != raw_smp_processor_id()) && cpu_online(rdp->cpu));
>
> pr_info("De-offloading %d\n", rdp->cpu);
>
> @@ -1023,10 +1026,41 @@ static long rcu_nocb_rdp_deoffload(void *arg)
> */
> rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
> invoke_rcu_core();
> - ret = rdp_offload_toggle(rdp, false, flags);
> - swait_event_exclusive(rdp->nocb_state_wq,
> - !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
> - SEGCBLIST_KTHREAD_GP));
> + wake_gp = rdp_offload_toggle(rdp, false, flags);
> +
> + mutex_lock(&rdp_gp->nocb_gp_kthread_mutex);
> + if (rdp_gp->nocb_gp_kthread) {
> + if (wake_gp)
> + wake_up_process(rdp_gp->nocb_gp_kthread);
> +
> + /*
> + * If rcuo[p] kthread spawn failed, directly remove SEGCBLIST_KTHREAD_CB.
> + * Just wait SEGCBLIST_KTHREAD_GP to be cleared by rcuog.
> + */
> + if (!rdp->nocb_cb_kthread) {
> + rcu_nocb_lock_irqsave(rdp, flags);
> + rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_KTHREAD_CB);
> + rcu_nocb_unlock_irqrestore(rdp, flags);
> + }
> +
> + swait_event_exclusive(rdp->nocb_state_wq,
> + !rcu_segcblist_test_flags(cblist,
> + SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP));
> + } else {
> + /*
> + * No kthread to clear the flags for us or remove the rdp from the nocb list
> + * to iterate. Do it here instead. Locking doesn't look stricly necessary
> + * but we stick to paranoia in this rare path.
> + */
> + rcu_nocb_lock_irqsave(rdp, flags);
> + rcu_segcblist_clear_flags(&rdp->cblist,
> + SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP);
> + rcu_nocb_unlock_irqrestore(rdp, flags);
> +
> + list_del(&rdp->nocb_entry_rdp);
> + }
> + mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
> +
> /*
> * Lock one last time to acquire latest callback updates from kthreads
> * so we can later handle callbacks locally without locking.
> @@ -1047,7 +1081,7 @@ static long rcu_nocb_rdp_deoffload(void *arg)
> WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
>
>
> - return ret;
> + return 0;
> }
>
> int rcu_nocb_cpu_deoffload(int cpu)
> @@ -1079,7 +1113,8 @@ static long rcu_nocb_rdp_offload(void *arg)
> struct rcu_data *rdp = arg;
> struct rcu_segcblist *cblist = &rdp->cblist;
> unsigned long flags;
> - int ret;
> + int wake_gp;
> + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
>
> WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> /*
> @@ -1089,6 +1124,9 @@ static long rcu_nocb_rdp_offload(void *arg)
> if (!rdp->nocb_gp_rdp)
> return -EINVAL;
>
> + if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
> + return -EINVAL;
> +
> pr_info("Offloading %d\n", rdp->cpu);
>
> /*
> @@ -1113,7 +1151,9 @@ static long rcu_nocb_rdp_offload(void *arg)
> * WRITE flags READ callbacks
> * rcu_nocb_unlock() rcu_nocb_unlock()
> */
> - ret = rdp_offload_toggle(rdp, true, flags);
> + wake_gp = rdp_offload_toggle(rdp, true, flags);
> + if (wake_gp)
> + wake_up_process(rdp_gp->nocb_gp_kthread);
> swait_event_exclusive(rdp->nocb_state_wq,
> rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB) &&
> rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
> @@ -1126,7 +1166,7 @@ static long rcu_nocb_rdp_offload(void *arg)
> rcu_segcblist_clear_flags(cblist, SEGCBLIST_RCU_CORE);
> rcu_nocb_unlock_irqrestore(rdp, flags);
>
> - return ret;
> + return 0;
> }
>
> int rcu_nocb_cpu_offload(int cpu)
> @@ -1248,7 +1288,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
> "rcuog/%d", rdp_gp->cpu);
> if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) {
> mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
> - return;
> + goto end;
> }
> WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
> if (kthread_prio)
> @@ -1260,12 +1300,20 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
> t = kthread_run(rcu_nocb_cb_kthread, rdp,
> "rcuo%c/%d", rcu_state.abbr, cpu);
> if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
> - return;
> + goto end;
>
> if (kthread_prio)
> sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
> WRITE_ONCE(rdp->nocb_cb_kthread, t);
> WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
> + return;
> +end:
> + mutex_lock(&rcu_state.barrier_mutex);
> + if (rcu_rdp_is_offloaded(rdp)) {
> + rcu_nocb_rdp_deoffload(rdp);
> + cpumask_clear_cpu(cpu, rcu_nocb_mask);
> + }
> + mutex_unlock(&rcu_state.barrier_mutex);
> }
>
> /* How many CB CPU IDs per GP kthread? Default of -1 for sqrt(nr_cpu_ids). */

2022-07-19 09:47:37

by Neeraj Upadhyay

Subject: Re: [PATCH rcu 4/7] rcu/nocb: Add an option to offload all CPUs on boot



On 6/21/2022 4:15 AM, Paul E. McKenney wrote:
> From: Joel Fernandes <[email protected]>
>
> Systems built with CONFIG_RCU_NOCB_CPU=y but booted without either
> the rcu_nocbs= or nohz_full= kernel-boot parameters will not have
> callback offloading on any of the CPUs, nor can any of the CPUs be
> switched to enable callback offloading at runtime. Although this is
> intentional, it would be nice to have a way to offload all the CPUs
> without having to make random bootloaders specify either the rcu_nocbs=
> or the nohz_full= kernel-boot parameters.
>
> This commit therefore provides a new CONFIG_RCU_NOCB_CPU_DEFAULT_ALL
> Kconfig option that switches the default so as to offload callback
> processing on all of the CPUs. This default can still be overridden
> using the rcu_nocbs= and nohz_full= kernel-boot parameters.
>
> Reviewed-by: Kalesh Singh <[email protected]>
> Reviewed-by: Uladzislau Rezki <[email protected]>
> (In v4.1, fixed issues with CONFIG maze reported by kernel test robot).
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: Joel Fernandes <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> ---


Reviewed-by: Neeraj Upadhyay <[email protected]>

One query on cpumask_setall() below

> Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
> kernel/rcu/Kconfig | 13 +++++++++++++
> kernel/rcu/tree_nocb.h | 15 ++++++++++++++-
> 3 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 2522b11e593f2..34605c275294c 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3659,6 +3659,9 @@
> just as if they had also been called out in the
> rcu_nocbs= boot parameter.
>
> + Note that this argument takes precedence over
> + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
> +
> noiotrap [SH] Disables trapped I/O port accesses.
>
> noirqdebug [X86-32] Disables the code which attempts to detect and
> @@ -4557,6 +4560,9 @@
> no-callback mode from boot but the mode may be
> toggled at runtime via cpusets.
>
> + Note that this argument takes precedence over
> + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
> +
> rcu_nocb_poll [KNL]
> Rather than requiring that offloaded CPUs
> (specified by rcu_nocbs= above) explicitly
> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> index 1c630e573548d..27aab870ae4cf 100644
> --- a/kernel/rcu/Kconfig
> +++ b/kernel/rcu/Kconfig
> @@ -262,6 +262,19 @@ config RCU_NOCB_CPU
> Say Y here if you need reduced OS jitter, despite added overhead.
> Say N here if you are unsure.
>
> +config RCU_NOCB_CPU_DEFAULT_ALL
> + bool "Offload RCU callback processing from all CPUs by default"
> + depends on RCU_NOCB_CPU
> + default n
> + help
> + Use this option to offload callback processing from all CPUs
> + by default, in the absence of the rcu_nocbs or nohz_full boot
> + parameter. This also avoids the need to use any boot parameters
> + to achieve the effect of offloading all CPUs on boot.
> +
> + Say Y here if you want to offload all CPUs by default on boot.
> + Say N here if you are unsure.
> +
> config TASKS_TRACE_RCU_READ_MB
> bool "Tasks Trace RCU readers use memory barriers in user and idle"
> depends on RCU_EXPERT && TASKS_TRACE_RCU
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 4cf9a29bba79d..60cc92cc66552 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1197,11 +1197,21 @@ void __init rcu_init_nohz(void)
> {
> int cpu;
> bool need_rcu_nocb_mask = false;
> + bool offload_all = false;
> struct rcu_data *rdp;
>
> +#if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL)
> + if (!rcu_state.nocb_is_setup) {
> + need_rcu_nocb_mask = true;
> + offload_all = true;
> + }
> +#endif /* #if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) */
> +
> #if defined(CONFIG_NO_HZ_FULL)
> - if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask))
> + if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask)) {
> need_rcu_nocb_mask = true;
> + offload_all = false; /* NO_HZ_FULL has its own mask. */
> + }
> #endif /* #if defined(CONFIG_NO_HZ_FULL) */
>
> if (need_rcu_nocb_mask) {
> @@ -1222,6 +1232,9 @@ void __init rcu_init_nohz(void)
> cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
> #endif /* #if defined(CONFIG_NO_HZ_FULL) */
>
> + if (offload_all)
> + cpumask_setall(rcu_nocb_mask);

Do we need to do a cpumask_and(rcu_nocb_mask, cpu_possible_mask,
rcu_nocb_mask) after setting all cpus in rcu_nocb_mask (cpumask_subset()
check below takes care of it though)?
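
Something along these lines, right after the setall (just a sketch of the
suggestion, not part of the posted patch):

	if (offload_all)
		cpumask_setall(rcu_nocb_mask);

	/*
	 * Suggested hardening: keep only CPUs that actually exist, so
	 * that sparse CPU numbering does not leave impossible bits set.
	 */
	cpumask_and(rcu_nocb_mask, cpu_possible_mask, rcu_nocb_mask);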


Thanks
Neeraj

> +
> if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
> pr_info("\tNote: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.\n");
> cpumask_and(rcu_nocb_mask, cpu_possible_mask,

2022-07-19 10:20:52

by Neeraj Upadhyay

[permalink] [raw]
Subject: Re: [PATCH rcu 5/7] rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()



On 6/21/2022 4:15 AM, Paul E. McKenney wrote:
> From: Zqiang <[email protected]>
>
> Callbacks are invoked in RCU kthreads when callbacks are offloaded
> (rcu_nocbs boot parameter) or when RCU's softirq handler has been
> offloaded to rcuc kthreads (use_softirq==0). The current code allows
> for the rcu_nocbs case but not the use_softirq case. This commit adds
> support for the use_softirq case.
>
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: Zqiang <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> ---


Reviewed-by: Neeraj Upadhyay <[email protected]>


Thanks
Neeraj

> kernel/rcu/tree.c | 4 ++--
> kernel/rcu/tree.h | 2 +-
> kernel/rcu/tree_plugin.h | 33 +++++++++++++++++++--------------
> 3 files changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c25ba442044a6..74455671e6cf2 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2530,7 +2530,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> trace_rcu_batch_end(rcu_state.name, 0,
> !rcu_segcblist_empty(&rdp->cblist),
> need_resched(), is_idle_task(current),
> - rcu_is_callbacks_kthread());
> + rcu_is_callbacks_kthread(rdp));
> return;
> }
>
> @@ -2608,7 +2608,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> rcu_nocb_lock_irqsave(rdp, flags);
> rdp->n_cbs_invoked += count;
> trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(),
> - is_idle_task(current), rcu_is_callbacks_kthread());
> + is_idle_task(current), rcu_is_callbacks_kthread(rdp));
>
> /* Update counts and requeue any remaining callbacks. */
> rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 4f8532c33558f..649ad4f0129b1 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -426,7 +426,7 @@ static void rcu_flavor_sched_clock_irq(int user);
> static void dump_blkd_tasks(struct rcu_node *rnp, int ncheck);
> static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
> static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
> -static bool rcu_is_callbacks_kthread(void);
> +static bool rcu_is_callbacks_kthread(struct rcu_data *rdp);
> static void rcu_cpu_kthread_setup(unsigned int cpu);
> static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp);
> static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index c8ba0fe17267c..0483e1338c413 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1012,6 +1012,25 @@ static void rcu_cpu_kthread_setup(unsigned int cpu)
> WRITE_ONCE(rdp->rcuc_activity, jiffies);
> }
>
> +static bool rcu_is_callbacks_nocb_kthread(struct rcu_data *rdp)
> +{
> +#ifdef CONFIG_RCU_NOCB_CPU
> + return rdp->nocb_cb_kthread == current;
> +#else
> + return false;
> +#endif
> +}
> +
> +/*
> + * Is the current CPU running the RCU-callbacks kthread?
> + * Caller must have preemption disabled.
> + */
> +static bool rcu_is_callbacks_kthread(struct rcu_data *rdp)
> +{
> + return rdp->rcu_cpu_kthread_task == current ||
> + rcu_is_callbacks_nocb_kthread(rdp);
> +}
> +
> #ifdef CONFIG_RCU_BOOST
>
> /*
> @@ -1151,15 +1170,6 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
> }
> }
>
> -/*
> - * Is the current CPU running the RCU-callbacks kthread?
> - * Caller must have preemption disabled.
> - */
> -static bool rcu_is_callbacks_kthread(void)
> -{
> - return __this_cpu_read(rcu_data.rcu_cpu_kthread_task) == current;
> -}
> -
> #define RCU_BOOST_DELAY_JIFFIES DIV_ROUND_UP(CONFIG_RCU_BOOST_DELAY * HZ, 1000)
>
> /*
> @@ -1242,11 +1252,6 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> }
>
> -static bool rcu_is_callbacks_kthread(void)
> -{
> - return false;
> -}
> -
> static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
> {
> }

2022-07-19 10:21:21

by Neeraj Upadhyay

[permalink] [raw]
Subject: Re: [PATCH rcu 7/7] rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty



On 6/21/2022 4:15 AM, Paul E. McKenney wrote:
> From: Zqiang <[email protected]>
>
> Currently, if the 'rcu_nocb_poll' kernel boot parameter is enabled, all
> rcuog kthreads enter polling mode. However, if all of a given group
> of rcuo kthreads correspond to CPUs that have been de-offloaded, the
> corresponding rcuog kthread will nonetheless still wake up periodically,
> unnecessarily consuming power and perturbing workloads. Fortunately,
> this situation is easily detected by the fact that the rcuog kthread's
> CPU's rcu_data structure's ->nocb_head_rdp list is empty.
>
> This commit saves power and avoids unnecessarily perturbing workloads
> by putting an rcuog kthread to sleep during any time period when all of
> its rcuo kthreads' CPUs are de-offloaded.
>
> Co-developed-by: Frederic Weisbecker <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Signed-off-by: Zqiang <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> ---


Reviewed-by: Neeraj Upadhyay <[email protected]>


Thanks
Neeraj

> kernel/rcu/tree_nocb.h | 24 +++++++++++++++++++-----
> 1 file changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index fa8e4f82e60c0..a8f574d8850d2 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -584,6 +584,14 @@ static int nocb_gp_toggle_rdp(struct rcu_data *rdp,
> return ret;
> }
>
> +static void nocb_gp_sleep(struct rcu_data *my_rdp, int cpu)
> +{
> + trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Sleep"));
> + swait_event_interruptible_exclusive(my_rdp->nocb_gp_wq,
> + !READ_ONCE(my_rdp->nocb_gp_sleep));
> + trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("EndSleep"));
> +}
> +
> /*
> * No-CBs GP kthreads come here to wait for additional callbacks to show up
> * or for grace periods to end.
> @@ -701,13 +709,19 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> /* Polling, so trace if first poll in the series. */
> if (gotcbs)
> trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Poll"));
> - schedule_timeout_idle(1);
> + if (list_empty(&my_rdp->nocb_head_rdp)) {
> + raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
> + if (!my_rdp->nocb_toggling_rdp)
> + WRITE_ONCE(my_rdp->nocb_gp_sleep, true);
> + raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
> + /* Wait for any offloading rdp */
> + nocb_gp_sleep(my_rdp, cpu);
> + } else {
> + schedule_timeout_idle(1);
> + }
> } else if (!needwait_gp) {
> /* Wait for callbacks to appear. */
> - trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Sleep"));
> - swait_event_interruptible_exclusive(my_rdp->nocb_gp_wq,
> - !READ_ONCE(my_rdp->nocb_gp_sleep));
> - trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("EndSleep"));
> + nocb_gp_sleep(my_rdp, cpu);
> } else {
> rnp = my_rdp->mynode;
> trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq, TPS("StartWait"));

2022-07-19 10:25:33

by Neeraj Upadhyay

[permalink] [raw]
Subject: Re: [PATCH rcu 6/7] rcu/nocb: Add option to opt rcuo kthreads out of RT priority



On 6/21/2022 4:15 AM, Paul E. McKenney wrote:
> From: "Uladzislau Rezki (Sony)" <[email protected]>
>
> This commit introduces a RCU_NOCB_CPU_CB_BOOST Kconfig option that
> prevents rcuo kthreads from running at real-time priority, even in
> kernels built with RCU_BOOST. This capability is important to devices
> needing low-latency (as in a few milliseconds) response from expedited
> RCU grace periods, but which are not running a classic real-time workload.
> On such devices, permitting the rcuo kthreads to run at real-time priority
> results in unacceptable latencies imposed on the application tasks,
> which run as SCHED_OTHER.
>
> See for example the following trace output:
>
> <snip>
> <...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
> <snip>
>
> If that rcuop kthread were permitted to run at real-time SCHED_FIFO
> priority, it would monopolize its CPU for hundreds of milliseconds
> while invoking those 34619 RCU callback functions, which would cause an
> unacceptably long latency spike for many application stacks on Android
> platforms.
>
> However, some existing real-time workloads require that callback
> invocation run at SCHED_FIFO priority, for example, those running on
> systems with heavy SCHED_OTHER background loads. (It is the real-time
> system's administrator's responsibility to make sure that important
> real-time tasks run at a higher priority than do RCU's kthreads.)
>
> Therefore, this new RCU_NOCB_CPU_CB_BOOST Kconfig option defaults to
> "y" on kernels built with PREEMPT_RT and defaults to "n" otherwise.
> The effect is to preserve current behavior for real-time systems, but for
> other systems to allow expedited RCU grace periods to run with real-time
> priority while continuing to invoke RCU callbacks as SCHED_OTHER.
>
> As you would expect, this RCU_NOCB_CPU_CB_BOOST Kconfig option has no
> effect except on CPUs with offloaded RCU callbacks.
>
> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> Acked-by: Joel Fernandes (Google) <[email protected]>
> ---


Reviewed-by: Neeraj Upadhyay <[email protected]>


Thanks
Neeraj

> kernel/rcu/Kconfig | 16 ++++++++++++++++
> kernel/rcu/tree.c | 6 +++++-
> kernel/rcu/tree_nocb.h | 3 ++-
> 3 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> index 27aab870ae4cf..c05ca52cdf64d 100644
> --- a/kernel/rcu/Kconfig
> +++ b/kernel/rcu/Kconfig
> @@ -275,6 +275,22 @@ config RCU_NOCB_CPU_DEFAULT_ALL
> Say Y here if you want offload all CPUs by default on boot.
> Say N here if you are unsure.
>
> +config RCU_NOCB_CPU_CB_BOOST
> + bool "Offload RCU callback from real-time kthread"
> + depends on RCU_NOCB_CPU && RCU_BOOST
> + default y if PREEMPT_RT
> + help
> + Use this option to invoke offloaded callbacks as SCHED_FIFO
> + to avoid starvation by heavy SCHED_OTHER background load.
> + Of course, running as SCHED_FIFO during callback floods will
> + cause the rcuo[ps] kthreads to monopolize the CPU for hundreds
> + of milliseconds or more. Therefore, when enabling this option,
> + it is your responsibility to ensure that latency-sensitive
> + tasks either run with higher priority or run on some other CPU.
> +
> + Say Y here if you want to set RT priority for offloading kthreads.
> + Say N here if you are building a !PREEMPT_RT kernel and are unsure.
> +
> config TASKS_TRACE_RCU_READ_MB
> bool "Tasks Trace RCU readers use memory barriers in user and idle"
> depends on RCU_EXPERT && TASKS_TRACE_RCU
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 74455671e6cf2..3b9f45ebb4999 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -154,7 +154,11 @@ static void sync_sched_exp_online_cleanup(int cpu);
> static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
> static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
>
> -/* rcuc/rcub/rcuop kthread realtime priority */
> +/*
> + * rcuc/rcub/rcuop kthread realtime priority. The "rcuop"
> + * real-time priority(enabling/disabling) is controlled by
> + * the extra CONFIG_RCU_NOCB_CPU_CB_BOOST configuration.
> + */
> static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
> module_param(kthread_prio, int, 0444);
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 60cc92cc66552..fa8e4f82e60c0 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1315,8 +1315,9 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
> if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
> goto end;
>
> - if (kthread_prio)
> + if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_CB_BOOST) && kthread_prio)
> sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
> +
> WRITE_ONCE(rdp->nocb_cb_kthread, t);
> WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
> return;

2022-07-19 18:15:49

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH rcu 4/7] rcu/nocb: Add an option to offload all CPUs on boot

On Tue, Jul 19, 2022 at 03:04:07PM +0530, Neeraj Upadhyay wrote:
>
>
> On 6/21/2022 4:15 AM, Paul E. McKenney wrote:
> > From: Joel Fernandes <[email protected]>
> >
> > Systems built with CONFIG_RCU_NOCB_CPU=y but booted without either
> > the rcu_nocbs= or rcu_nohz_full= kernel-boot parameters will not have
> > callback offloading on any of the CPUs, nor can any of the CPUs be
> > switched to enable callback offloading at runtime. Although this is
> > intentional, it would be nice to have a way to offload all the CPUs
> > without having to make random bootloaders specify either the rcu_nocbs=
> > or the rcu_nohz_full= kernel-boot parameters.
> >
> > This commit therefore provides a new CONFIG_RCU_NOCB_CPU_DEFAULT_ALL
> > Kconfig option that switches the default so as to offload callback
> > processing on all of the CPUs. This default can still be overridden
> > using the rcu_nocbs= and rcu_nohz_full= kernel-boot parameters.
> >
> > Reviewed-by: Kalesh Singh <[email protected]>
> > Reviewed-by: Uladzislau Rezki <[email protected]>
> > (In v4.1, fixed issues with CONFIG maze reported by kernel test robot).
> > Reported-by: kernel test robot <[email protected]>
> > Signed-off-by: Joel Fernandes <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> > ---
>
>
> Reviewed-by: Neeraj Upadhyay <[email protected]>
>
> One query on cpumask_setall() below
>
> > Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
> > kernel/rcu/Kconfig | 13 +++++++++++++
> > kernel/rcu/tree_nocb.h | 15 ++++++++++++++-
> > 3 files changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 2522b11e593f2..34605c275294c 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -3659,6 +3659,9 @@
> > just as if they had also been called out in the
> > rcu_nocbs= boot parameter.
> > + Note that this argument takes precedence over
> > + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
> > +
> > noiotrap [SH] Disables trapped I/O port accesses.
> > noirqdebug [X86-32] Disables the code which attempts to detect and
> > @@ -4557,6 +4560,9 @@
> > no-callback mode from boot but the mode may be
> > toggled at runtime via cpusets.
> > + Note that this argument takes precedence over
> > + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
> > +
> > rcu_nocb_poll [KNL]
> > Rather than requiring that offloaded CPUs
> > (specified by rcu_nocbs= above) explicitly
> > diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> > index 1c630e573548d..27aab870ae4cf 100644
> > --- a/kernel/rcu/Kconfig
> > +++ b/kernel/rcu/Kconfig
> > @@ -262,6 +262,19 @@ config RCU_NOCB_CPU
> > Say Y here if you need reduced OS jitter, despite added overhead.
> > Say N here if you are unsure.
> > +config RCU_NOCB_CPU_DEFAULT_ALL
> > + bool "Offload RCU callback processing from all CPUs by default"
> > + depends on RCU_NOCB_CPU
> > + default n
> > + help
> > + Use this option to offload callback processing from all CPUs
> > + by default, in the absence of the rcu_nocbs or nohz_full boot
> > + parameter. This also avoids the need to use any boot parameters
> > + to achieve the effect of offloading all CPUs on boot.
> > +
> > + Say Y here if you want to offload all CPUs by default on boot.
> > + Say N here if you are unsure.
> > +
> > config TASKS_TRACE_RCU_READ_MB
> > bool "Tasks Trace RCU readers use memory barriers in user and idle"
> > depends on RCU_EXPERT && TASKS_TRACE_RCU
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index 4cf9a29bba79d..60cc92cc66552 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -1197,11 +1197,21 @@ void __init rcu_init_nohz(void)
> > {
> > int cpu;
> > bool need_rcu_nocb_mask = false;
> > + bool offload_all = false;
> > struct rcu_data *rdp;
> > +#if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL)
> > + if (!rcu_state.nocb_is_setup) {
> > + need_rcu_nocb_mask = true;
> > + offload_all = true;
> > + }
> > +#endif /* #if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) */
> > +
> > #if defined(CONFIG_NO_HZ_FULL)
> > - if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask))
> > + if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask)) {
> > need_rcu_nocb_mask = true;
> > + offload_all = false; /* NO_HZ_FULL has its own mask. */
> > + }
> > #endif /* #if defined(CONFIG_NO_HZ_FULL) */
> > if (need_rcu_nocb_mask) {
> > @@ -1222,6 +1232,9 @@ void __init rcu_init_nohz(void)
> > cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
> > #endif /* #if defined(CONFIG_NO_HZ_FULL) */
> > + if (offload_all)
> > + cpumask_setall(rcu_nocb_mask);
>
> Do we need to do a cpumask_and(rcu_nocb_mask, cpu_possible_mask,
> rcu_nocb_mask) after setting all cpus in rcu_nocb_mask (cpumask_subset()
> check below takes care of it though)?

Without that cpumask_and(), systems with sparse CPU numbering schemes
(for example, 0, 4, 8, 12, ...) will get a pr_info(), and as you noted,
the needed cpumask_and().

I am inclined to see a complaint before we change this. And perhaps if
this is to change, the change should be in cpumask_setall() rather than
in rcu_init_nohz(). But that is an argument for later, if at all. ;-)

And thank you for reviewing this series!

Thanx, Paul

> Thanks
> Neeraj
>
> > +
> > if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
> > pr_info("\tNote: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.\n");
> > cpumask_and(rcu_nocb_mask, cpu_possible_mask,

2022-07-19 23:03:53

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH rcu 4/7] rcu/nocb: Add an option to offload all CPUs on boot

On Tue, Jul 19, 2022 at 06:42:00PM -0400, Joel Fernandes wrote:
>
>
> On 7/19/2022 2:12 PM, Paul E. McKenney wrote:
> > On Tue, Jul 19, 2022 at 03:04:07PM +0530, Neeraj Upadhyay wrote:
> >>
> >>
> >> On 6/21/2022 4:15 AM, Paul E. McKenney wrote:
> >>> From: Joel Fernandes <[email protected]>
> >>>
> >>> Systems built with CONFIG_RCU_NOCB_CPU=y but booted without either
> >>> the rcu_nocbs= or rcu_nohz_full= kernel-boot parameters will not have
> >>> callback offloading on any of the CPUs, nor can any of the CPUs be
> >>> switched to enable callback offloading at runtime. Although this is
> >>> intentional, it would be nice to have a way to offload all the CPUs
> >>> without having to make random bootloaders specify either the rcu_nocbs=
> >>> or the rcu_nohz_full= kernel-boot parameters.
> >>>
> >>> This commit therefore provides a new CONFIG_RCU_NOCB_CPU_DEFAULT_ALL
> >>> Kconfig option that switches the default so as to offload callback
> >>> processing on all of the CPUs. This default can still be overridden
> >>> using the rcu_nocbs= and rcu_nohz_full= kernel-boot parameters.
> >>>
> >>> Reviewed-by: Kalesh Singh <[email protected]>
> >>> Reviewed-by: Uladzislau Rezki <[email protected]>
> >>> (In v4.1, fixed issues with CONFIG maze reported by kernel test robot).
> >>> Reported-by: kernel test robot <[email protected]>
> >>> Signed-off-by: Joel Fernandes <[email protected]>
> >>> Signed-off-by: Paul E. McKenney <[email protected]>
> >>> ---
> >>
> >>
> >> Reviewed-by: Neeraj Upadhyay <[email protected]>
> >>
> >> One query on cpumask_setall() below
> >>
> >>> Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
> >>> kernel/rcu/Kconfig | 13 +++++++++++++
> >>> kernel/rcu/tree_nocb.h | 15 ++++++++++++++-
> >>> 3 files changed, 33 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >>> index 2522b11e593f2..34605c275294c 100644
> >>> --- a/Documentation/admin-guide/kernel-parameters.txt
> >>> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >>> @@ -3659,6 +3659,9 @@
> >>> just as if they had also been called out in the
> >>> rcu_nocbs= boot parameter.
> >>> + Note that this argument takes precedence over
> >>> + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
> >>> +
> >>> noiotrap [SH] Disables trapped I/O port accesses.
> >>> noirqdebug [X86-32] Disables the code which attempts to detect and
> >>> @@ -4557,6 +4560,9 @@
> >>> no-callback mode from boot but the mode may be
> >>> toggled at runtime via cpusets.
> >>> + Note that this argument takes precedence over
> >>> + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
> >>> +
> >>> rcu_nocb_poll [KNL]
> >>> Rather than requiring that offloaded CPUs
> >>> (specified by rcu_nocbs= above) explicitly
> >>> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> >>> index 1c630e573548d..27aab870ae4cf 100644
> >>> --- a/kernel/rcu/Kconfig
> >>> +++ b/kernel/rcu/Kconfig
> >>> @@ -262,6 +262,19 @@ config RCU_NOCB_CPU
> >>> Say Y here if you need reduced OS jitter, despite added overhead.
> >>> Say N here if you are unsure.
> >>> +config RCU_NOCB_CPU_DEFAULT_ALL
> >>> + bool "Offload RCU callback processing from all CPUs by default"
> >>> + depends on RCU_NOCB_CPU
> >>> + default n
> >>> + help
> >>> + Use this option to offload callback processing from all CPUs
> >>> + by default, in the absence of the rcu_nocbs or nohz_full boot
> >>> + parameter. This also avoids the need to use any boot parameters
> >>> + to achieve the effect of offloading all CPUs on boot.
> >>> +
> >>> + Say Y here if you want to offload all CPUs by default on boot.
> >>> + Say N here if you are unsure.
> >>> +
> >>> config TASKS_TRACE_RCU_READ_MB
> >>> bool "Tasks Trace RCU readers use memory barriers in user and idle"
> >>> depends on RCU_EXPERT && TASKS_TRACE_RCU
> >>> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> >>> index 4cf9a29bba79d..60cc92cc66552 100644
> >>> --- a/kernel/rcu/tree_nocb.h
> >>> +++ b/kernel/rcu/tree_nocb.h
> >>> @@ -1197,11 +1197,21 @@ void __init rcu_init_nohz(void)
> >>> {
> >>> int cpu;
> >>> bool need_rcu_nocb_mask = false;
> >>> + bool offload_all = false;
> >>> struct rcu_data *rdp;
> >>> +#if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL)
> >>> + if (!rcu_state.nocb_is_setup) {
> >>> + need_rcu_nocb_mask = true;
> >>> + offload_all = true;
> >>> + }
> >>> +#endif /* #if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) */
> >>> +
> >>> #if defined(CONFIG_NO_HZ_FULL)
> >>> - if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask))
> >>> + if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask)) {
> >>> need_rcu_nocb_mask = true;
> >>> + offload_all = false; /* NO_HZ_FULL has its own mask. */
> >>> + }
> >>> #endif /* #if defined(CONFIG_NO_HZ_FULL) */
> >>> if (need_rcu_nocb_mask) {
> >>> @@ -1222,6 +1232,9 @@ void __init rcu_init_nohz(void)
> >>> cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
> >>> #endif /* #if defined(CONFIG_NO_HZ_FULL) */
> >>> + if (offload_all)
> >>> + cpumask_setall(rcu_nocb_mask);
> >>
> >> Do we need to do a cpumask_and(rcu_nocb_mask, cpu_possible_mask,
> >> rcu_nocb_mask) after setting all cpus in rcu_nocb_mask (cpumask_subset()
> >> check below takes care of it though)?
> >
> > Without that cpumask_and(), systems with sparse CPU numbering schemes
> > (for example, 0, 4, 8, 12, ...) will get a pr_info(), and as you noted,
> > the needed cpumask_and().
> >
> > I am inclined to see a complaint before we change this. And perhaps if
> > this is to change, the change should be in cpumask_setall() rather than
> > in rcu_init_nohz(). But that is an argument for later, if at all. ;-)
> >
> >>> +
> >>> if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
>
> We could also suppress the pr_info() by making it conditional.
>
> like:
>
> if (!CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) {
> pr_info(...);
> }
>
> In other words, we could make the cpumask_and() as expected/normal on
> systems with sparse CPU numbering schemes. Would that work?

That would be a good within-RCU workaround if we get an urgent complaint,
but if this requires a change, shouldn't cpumask_setall() refrain from
setting bits for non-existent CPUs? It does refrain from setting any
bits beyond the largest-numbered CPU.
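
(For concreteness, restricting cpumask_setall() to possible CPUs would
amount to something like the following, whether folded into the helper
itself or done at the call site.  Just a sketch; cpumask_setall_possible()
is a hypothetical name, not an existing API:)

	/*
	 * Hypothetical variant of cpumask_setall() that sets bits only
	 * for CPUs present in cpu_possible_mask.
	 */
	static inline void cpumask_setall_possible(struct cpumask *dstp)
	{
		cpumask_copy(dstp, cpu_possible_mask);
	}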

But perhaps there is an early boot reason why cpumask_setall() cannot
do this?

Either way, we are just doing a pr_info(), not a WARN_ON() or similar,
so the current state is probably fine.

Thanx, Paul

> Thanks,
>
> - Joel
>
>
> >>> pr_info("\tNote: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.\n");
>
>
>
> >>> cpumask_and(rcu_nocb_mask, cpu_possible_mask,

2022-07-19 23:18:37

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH rcu 4/7] rcu/nocb: Add an option to offload all CPUs on boot



On 7/19/2022 2:12 PM, Paul E. McKenney wrote:
> On Tue, Jul 19, 2022 at 03:04:07PM +0530, Neeraj Upadhyay wrote:
>>
>>
>> On 6/21/2022 4:15 AM, Paul E. McKenney wrote:
>>> From: Joel Fernandes <[email protected]>
>>>
>>> Systems built with CONFIG_RCU_NOCB_CPU=y but booted without either
>>> the rcu_nocbs= or rcu_nohz_full= kernel-boot parameters will not have
>>> callback offloading on any of the CPUs, nor can any of the CPUs be
>>> switched to enable callback offloading at runtime. Although this is
>>> intentional, it would be nice to have a way to offload all the CPUs
>>> without having to make random bootloaders specify either the rcu_nocbs=
>>> or the rcu_nohz_full= kernel-boot parameters.
>>>
>>> This commit therefore provides a new CONFIG_RCU_NOCB_CPU_DEFAULT_ALL
>>> Kconfig option that switches the default so as to offload callback
>>> processing on all of the CPUs. This default can still be overridden
>>> using the rcu_nocbs= and rcu_nohz_full= kernel-boot parameters.
>>>
>>> Reviewed-by: Kalesh Singh <[email protected]>
>>> Reviewed-by: Uladzislau Rezki <[email protected]>
>>> (In v4.1, fixed issues with CONFIG maze reported by kernel test robot).
>>> Reported-by: kernel test robot <[email protected]>
>>> Signed-off-by: Joel Fernandes <[email protected]>
>>> Signed-off-by: Paul E. McKenney <[email protected]>
>>> ---
>>
>>
>> Reviewed-by: Neeraj Upadhyay <[email protected]>
>>
>> One query on cpumask_setall() below
>>
>>> Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
>>> kernel/rcu/Kconfig | 13 +++++++++++++
>>> kernel/rcu/tree_nocb.h | 15 ++++++++++++++-
>>> 3 files changed, 33 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>> index 2522b11e593f2..34605c275294c 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -3659,6 +3659,9 @@
>>> just as if they had also been called out in the
>>> rcu_nocbs= boot parameter.
>>> + Note that this argument takes precedence over
>>> + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
>>> +
>>> noiotrap [SH] Disables trapped I/O port accesses.
>>> noirqdebug [X86-32] Disables the code which attempts to detect and
>>> @@ -4557,6 +4560,9 @@
>>> no-callback mode from boot but the mode may be
>>> toggled at runtime via cpusets.
>>> + Note that this argument takes precedence over
>>> + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
>>> +
>>> rcu_nocb_poll [KNL]
>>> Rather than requiring that offloaded CPUs
>>> (specified by rcu_nocbs= above) explicitly
>>> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
>>> index 1c630e573548d..27aab870ae4cf 100644
>>> --- a/kernel/rcu/Kconfig
>>> +++ b/kernel/rcu/Kconfig
>>> @@ -262,6 +262,19 @@ config RCU_NOCB_CPU
>>> Say Y here if you need reduced OS jitter, despite added overhead.
>>> Say N here if you are unsure.
>>> +config RCU_NOCB_CPU_DEFAULT_ALL
>>> + bool "Offload RCU callback processing from all CPUs by default"
>>> + depends on RCU_NOCB_CPU
>>> + default n
>>> + help
>>> + Use this option to offload callback processing from all CPUs
>>> + by default, in the absence of the rcu_nocbs or nohz_full boot
>>> + parameter. This also avoids the need to use any boot parameters
>>> + to achieve the effect of offloading all CPUs on boot.
>>> +
>>> + Say Y here if you want to offload all CPUs by default on boot.
>>> + Say N here if you are unsure.
>>> +
>>> config TASKS_TRACE_RCU_READ_MB
>>> bool "Tasks Trace RCU readers use memory barriers in user and idle"
>>> depends on RCU_EXPERT && TASKS_TRACE_RCU
>>> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
>>> index 4cf9a29bba79d..60cc92cc66552 100644
>>> --- a/kernel/rcu/tree_nocb.h
>>> +++ b/kernel/rcu/tree_nocb.h
>>> @@ -1197,11 +1197,21 @@ void __init rcu_init_nohz(void)
>>> {
>>> int cpu;
>>> bool need_rcu_nocb_mask = false;
>>> + bool offload_all = false;
>>> struct rcu_data *rdp;
>>> +#if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL)
>>> + if (!rcu_state.nocb_is_setup) {
>>> + need_rcu_nocb_mask = true;
>>> + offload_all = true;
>>> + }
>>> +#endif /* #if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) */
>>> +
>>> #if defined(CONFIG_NO_HZ_FULL)
>>> - if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask))
>>> + if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask)) {
>>> need_rcu_nocb_mask = true;
>>> + offload_all = false; /* NO_HZ_FULL has its own mask. */
>>> + }
>>> #endif /* #if defined(CONFIG_NO_HZ_FULL) */
>>> if (need_rcu_nocb_mask) {
>>> @@ -1222,6 +1232,9 @@ void __init rcu_init_nohz(void)
>>> cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
>>> #endif /* #if defined(CONFIG_NO_HZ_FULL) */
>>> + if (offload_all)
>>> + cpumask_setall(rcu_nocb_mask);
>>
>> Do we need to do a cpumask_and(rcu_nocb_mask, cpu_possible_mask,
>> rcu_nocb_mask) after setting all cpus in rcu_nocb_mask (cpumask_subset()
>> check below takes care of it though)?
>
> Without that cpumask_and(), systems with sparse CPU numbering schemes
> (for example, 0, 4, 8, 12, ...) will get a pr_info(), and as you noted,
> the needed cpumask_and().
>
> I am inclined to see a complaint before we change this. And perhaps if
> this is to change, the change should be in cpumask_setall() rather than
> in rcu_init_nohz(). But that is an argument for later, if at all. ;-)
>
>>> +
>>> if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {

We could also suppress the pr_info() by making it conditional.

like:

if (!CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) {
pr_info(...);
}

In other words, we could make the cpumask_and() as expected/normal on
systems with sparse CPU numbering schemes. Would that work?
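
For concreteness, a minimal sketch of that suppression in rcu_init_nohz(),
using IS_ENABLED() rather than a bare #if (untested, just to illustrate
the idea):

	if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
		if (!IS_ENABLED(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL))
			pr_info("\tNote: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.\n");
		cpumask_and(rcu_nocb_mask, cpu_possible_mask, rcu_nocb_mask);
	}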

Thanks,

- Joel


>>> pr_info("\tNote: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.\n");



>>> cpumask_and(rcu_nocb_mask, cpu_possible_mask,

2022-07-19 23:47:06

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH rcu 4/7] rcu/nocb: Add an option to offload all CPUs on boot



On 7/19/2022 6:53 PM, Paul E. McKenney wrote:
> On Tue, Jul 19, 2022 at 06:42:00PM -0400, Joel Fernandes wrote:
>>
>>
>> On 7/19/2022 2:12 PM, Paul E. McKenney wrote:
>>> On Tue, Jul 19, 2022 at 03:04:07PM +0530, Neeraj Upadhyay wrote:
>>>>
>>>>
>>>> On 6/21/2022 4:15 AM, Paul E. McKenney wrote:
>>>>> From: Joel Fernandes <[email protected]>
>>>>>
>>>>> Systems built with CONFIG_RCU_NOCB_CPU=y but booted without either
>>>>> the rcu_nocbs= or rcu_nohz_full= kernel-boot parameters will not have
>>>>> callback offloading on any of the CPUs, nor can any of the CPUs be
>>>>> switched to enable callback offloading at runtime. Although this is
>>>>> intentional, it would be nice to have a way to offload all the CPUs
>>>>> without having to make random bootloaders specify either the rcu_nocbs=
>>>>> or the rcu_nohz_full= kernel-boot parameters.
>>>>>
>>>>> This commit therefore provides a new CONFIG_RCU_NOCB_CPU_DEFAULT_ALL
>>>>> Kconfig option that switches the default so as to offload callback
>>>>> processing on all of the CPUs. This default can still be overridden
>>>>> using the rcu_nocbs= and rcu_nohz_full= kernel-boot parameters.
>>>>>
>>>>> Reviewed-by: Kalesh Singh <[email protected]>
>>>>> Reviewed-by: Uladzislau Rezki <[email protected]>
>>>>> (In v4.1, fixed issues with CONFIG maze reported by kernel test robot).
>>>>> Reported-by: kernel test robot <[email protected]>
>>>>> Signed-off-by: Joel Fernandes <[email protected]>
>>>>> Signed-off-by: Paul E. McKenney <[email protected]>
>>>>> ---
>>>>
>>>>
>>>> Reviewed-by: Neeraj Upadhyay <[email protected]>
>>>>
>>>> One query on cpumask_setall() below
>>>>
>>>>> Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
>>>>> kernel/rcu/Kconfig | 13 +++++++++++++
>>>>> kernel/rcu/tree_nocb.h | 15 ++++++++++++++-
>>>>> 3 files changed, 33 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>>>> index 2522b11e593f2..34605c275294c 100644
>>>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>>>> @@ -3659,6 +3659,9 @@
>>>>> just as if they had also been called out in the
>>>>> rcu_nocbs= boot parameter.
>>>>> + Note that this argument takes precedence over
>>>>> + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
>>>>> +
>>>>> noiotrap [SH] Disables trapped I/O port accesses.
>>>>> noirqdebug [X86-32] Disables the code which attempts to detect and
>>>>> @@ -4557,6 +4560,9 @@
>>>>> no-callback mode from boot but the mode may be
>>>>> toggled at runtime via cpusets.
>>>>> + Note that this argument takes precedence over
>>>>> + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
>>>>> +
>>>>> rcu_nocb_poll [KNL]
>>>>> Rather than requiring that offloaded CPUs
>>>>> (specified by rcu_nocbs= above) explicitly
>>>>> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
>>>>> index 1c630e573548d..27aab870ae4cf 100644
>>>>> --- a/kernel/rcu/Kconfig
>>>>> +++ b/kernel/rcu/Kconfig
>>>>> @@ -262,6 +262,19 @@ config RCU_NOCB_CPU
>>>>> Say Y here if you need reduced OS jitter, despite added overhead.
>>>>> Say N here if you are unsure.
>>>>> +config RCU_NOCB_CPU_DEFAULT_ALL
>>>>> + bool "Offload RCU callback processing from all CPUs by default"
>>>>> + depends on RCU_NOCB_CPU
>>>>> + default n
>>>>> + help
>>>>> + Use this option to offload callback processing from all CPUs
>>>>> + by default, in the absence of the rcu_nocbs or nohz_full boot
>>>>> + parameter. This also avoids the need to use any boot parameters
>>>>> + to achieve the effect of offloading all CPUs on boot.
>>>>> +
> >>>>> + Say Y here if you want to offload all CPUs by default on boot.
>>>>> + Say N here if you are unsure.
>>>>> +
>>>>> config TASKS_TRACE_RCU_READ_MB
>>>>> bool "Tasks Trace RCU readers use memory barriers in user and idle"
>>>>> depends on RCU_EXPERT && TASKS_TRACE_RCU
>>>>> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
>>>>> index 4cf9a29bba79d..60cc92cc66552 100644
>>>>> --- a/kernel/rcu/tree_nocb.h
>>>>> +++ b/kernel/rcu/tree_nocb.h
>>>>> @@ -1197,11 +1197,21 @@ void __init rcu_init_nohz(void)
>>>>> {
>>>>> int cpu;
>>>>> bool need_rcu_nocb_mask = false;
>>>>> + bool offload_all = false;
>>>>> struct rcu_data *rdp;
>>>>> +#if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL)
>>>>> + if (!rcu_state.nocb_is_setup) {
>>>>> + need_rcu_nocb_mask = true;
>>>>> + offload_all = true;
>>>>> + }
>>>>> +#endif /* #if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) */
>>>>> +
>>>>> #if defined(CONFIG_NO_HZ_FULL)
>>>>> - if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask))
>>>>> + if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask)) {
>>>>> need_rcu_nocb_mask = true;
>>>>> + offload_all = false; /* NO_HZ_FULL has its own mask. */
>>>>> + }
>>>>> #endif /* #if defined(CONFIG_NO_HZ_FULL) */
>>>>> if (need_rcu_nocb_mask) {
>>>>> @@ -1222,6 +1232,9 @@ void __init rcu_init_nohz(void)
>>>>> cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
>>>>> #endif /* #if defined(CONFIG_NO_HZ_FULL) */
>>>>> + if (offload_all)
>>>>> + cpumask_setall(rcu_nocb_mask);
>>>>
>>>> Do we need to do a cpumask_and(rcu_nocb_mask, cpu_possible_mask,
>>>> rcu_nocb_mask) after setting all cpus in rcu_nocb_mask (cpumask_subset()
>>>> check below takes care of it though)?
>>>
>>> Without that cpumask_and(), systems with sparse CPU numbering schemes
>>> (for example, 0, 4, 8, 12, ...) will get a pr_info(), and as you noted,
>>> the needed cpumask_and().
>>>
>>> I am inclined to see a complaint before we change this. And perhaps if
>>> this is to change, the change should be in cpumask_setall() rather than
>>> in rcu_init_nohz(). But that is an argument for later, if at all. ;-)
>>>
>>>>> +
>>>>> if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
>>
>> We could also suppress the pr_info() by making it conditional.
>>
>> like:
>>
>> if (!CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) {
>> pr_info(...);
>> }
>>
>> In other words, we could make the cpumask_and() as expected/normal on
>> systems with sparse CPU numbering schemes. Would that work?
>
> That would be a good within-RCU workaround if we get an urgent complaint,
> but if this requires a change, shouldn't cpumask_setall() refrain from
> setting bits for non-existent CPUs? It does refrain from setting any
> bits beyond the largest-numbered CPU.
>
> But perhaps there is an early boot reason why cpumask_setall() cannot
> do this?

Agreed, it would be great if it did not set those bits. I checked other
places in the kernel, like kernel/sched/core.c, and could not find anything
masking the bits after the setall(), so maybe it's OK?

> Either way, we are just doing a pr_info(), not a WARN_ON() or similar,
> so the current state is probably fine.

Agreed, thanks.

- Joel

>
> Thanx, Paul
>
>> Thanks,
>>
>> - Joel
>>
>>
>>>>> pr_info("\tNote: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.\n");
>>
>>
>>
>>>>> cpumask_and(rcu_nocb_mask, cpu_possible_mask,