Changes since v1 after Paul's reviews:
* Clarify why the DEL vs ADD possible race on rdp group list is ok [1/6]
* Update kernel parameters documentation [5/6]
* Only create rcuo[sp] kthreads for CPUs that have ever come online [4/6]
* Consider nohz_full= in changelogs
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
rcu/nocb
HEAD: da51363e5ddf54d6ce9c2cfbab946f8914519290
Thanks,
Frederic
---
Frederic Weisbecker (6):
rcu/nocb: Remove rdp from nocb list when de-offloaded
rcu/nocb: Prepare nocb_cb_wait() to start with a non-offloaded rdp
rcu/nocb: Optimize kthreads and rdp initialization
rcu/nocb: Create kthreads on all CPUs if "rcu_nocb=" or "nohz_full=" are passed
rcu/nocb: Allow empty "rcu_nocbs" kernel parameter
rcu/nocb: Merge rcu_spawn_cpu_nocb_kthread() and rcu_spawn_one_nocb_kthread()
Documentation/admin-guide/kernel-parameters.txt | 28 ++++--
kernel/rcu/tree.h | 7 +-
kernel/rcu/tree_nocb.h | 119 +++++++++++++++---------
3 files changed, 96 insertions(+), 58 deletions(-)
nocb_gp_wait() iterates over all CPUs within the rcuog's group even if they
have been de-offloaded. This is suboptimal if only a few CPUs are
offloaded within the group, and it will become even more of a problem
once a nocb kthread is created for all possible CPUs in the future.
Therefore use a standard doubly linked list to link all the offloaded
rdps and safely add/del their nodes as they are (de-)offloaded.
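For reference, here is a minimal self-contained sketch of the RCU list
pattern relied on here (the demo_* names are hypothetical and only for
illustration; the series uses its own updater-side serialization):

	#include <linux/rculist.h>
	#include <linux/spinlock.h>
	#include <linux/printk.h>

	struct demo_rdp {
		int cpu;
		struct list_head entry;		/* like nocb_entry_rdp */
	};

	static LIST_HEAD(demo_head);		/* like nocb_head_rdp */
	static DEFINE_SPINLOCK(demo_lock);	/* serializes updaters */

	/* Updater side: offload/de-offload analogue. */
	static void demo_add(struct demo_rdp *rdp)
	{
		spin_lock(&demo_lock);
		list_add_tail_rcu(&rdp->entry, &demo_head);
		spin_unlock(&demo_lock);
	}

	static void demo_del(struct demo_rdp *rdp)
	{
		spin_lock(&demo_lock);
		list_del_rcu(&rdp->entry);
		spin_unlock(&demo_lock);
		/* Wait for a grace period before reusing/freeing rdp. */
	}

	/* Reader side: may run concurrently with demo_add()/demo_del(). */
	static void demo_scan(void)
	{
		struct demo_rdp *rdp;

		rcu_read_lock();
		list_for_each_entry_rcu(rdp, &demo_head, entry)
			pr_info("cpu %d\n", rdp->cpu);
		rcu_read_unlock();
	}

A reader may miss entries added or removed while it iterates, which is
exactly the race tolerated by the comment added to nocb_gp_wait() below.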
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
---
kernel/rcu/tree.h | 7 +++++--
kernel/rcu/tree_nocb.h | 37 ++++++++++++++++++++++++++++++-------
2 files changed, 35 insertions(+), 9 deletions(-)
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index deeaf2fee714..486fc901bd08 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -221,8 +221,11 @@ struct rcu_data {
struct swait_queue_head nocb_gp_wq; /* For nocb kthreads to sleep on. */
bool nocb_cb_sleep; /* Is the nocb CB thread asleep? */
struct task_struct *nocb_cb_kthread;
- struct rcu_data *nocb_next_cb_rdp;
- /* Next rcu_data in wakeup chain. */
+ struct list_head nocb_head_rdp; /*
+ * Head of rcu_data list in wakeup chain,
+ * if rdp_gp.
+ */
+ struct list_head nocb_entry_rdp; /* rcu_data node in wakeup chain. */
/* The following fields are used by CB kthread, hence new cacheline. */
struct rcu_data *nocb_gp_rdp ____cacheline_internodealigned_in_smp;
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 2461fe8d0c23..cc1165559177 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -625,7 +625,15 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
* and the global grace-period kthread are awakened if needed.
*/
WARN_ON_ONCE(my_rdp->nocb_gp_rdp != my_rdp);
- for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
+ /*
+ * An rdp can be removed from the list after being de-offloaded or added
+ * to the list before being (re-)offloaded. If the below loop happens while
+ * an rdp is de-offloaded and then re-offloaded shortly afterward, we may
+ * shortcut and ignore a part of the rdp list due to racy list iteration.
+ * Fortunately a new run through the entire loop is forced after an rdp is
+ * added here so that such a race gets quickly fixed.
+ */
+ list_for_each_entry_rcu(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp, 1) {
bool needwake_state = false;
if (!nocb_gp_enabled_cb(rdp))
@@ -1003,6 +1011,8 @@ static long rcu_nocb_rdp_deoffload(void *arg)
swait_event_exclusive(rdp->nocb_state_wq,
!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
SEGCBLIST_KTHREAD_GP));
+ /* Don't bother iterating this one anymore in nocb_gp_wait() */
+ list_del_rcu(&rdp->nocb_entry_rdp);
/*
* Lock one last time to acquire latest callback updates from kthreads
* so we can later handle callbacks locally without locking.
@@ -1066,6 +1076,15 @@ static long rcu_nocb_rdp_offload(void *arg)
return -EINVAL;
pr_info("Offloading %d\n", rdp->cpu);
+
+ /*
+ * Iterate this CPU on nocb_gp_wait(). We do it before locking nocb_gp_lock,
+ * resetting nocb_gp_sleep and waking up the related "rcuog". Since nocb_gp_wait()
+ * in turn locks nocb_gp_lock before setting nocb_gp_sleep again, we are guaranteed
+ * to iterate this new rdp before "rcuog" goes to sleep again.
+ */
+ list_add_tail_rcu(&rdp->nocb_entry_rdp, &rdp->nocb_gp_rdp->nocb_head_rdp);
+
/*
* Can't use rcu_nocb_lock_irqsave() before SEGCBLIST_LOCKING
* is set.
@@ -1268,7 +1287,6 @@ static void __init rcu_organize_nocb_kthreads(void)
int nl = 0; /* Next GP kthread. */
struct rcu_data *rdp;
struct rcu_data *rdp_gp = NULL; /* Suppress misguided gcc warn. */
- struct rcu_data *rdp_prev = NULL;
if (!cpumask_available(rcu_nocb_mask))
return;
@@ -1288,8 +1306,8 @@ static void __init rcu_organize_nocb_kthreads(void)
/* New GP kthread, set up for CBs & next GP. */
gotnocbs = true;
nl = DIV_ROUND_UP(rdp->cpu + 1, ls) * ls;
- rdp->nocb_gp_rdp = rdp;
rdp_gp = rdp;
+ INIT_LIST_HEAD(&rdp->nocb_head_rdp);
if (dump_tree) {
if (!firsttime)
pr_cont("%s\n", gotnocbscbs
@@ -1302,12 +1320,11 @@ static void __init rcu_organize_nocb_kthreads(void)
} else {
/* Another CB kthread, link to previous GP kthread. */
gotnocbscbs = true;
- rdp->nocb_gp_rdp = rdp_gp;
- rdp_prev->nocb_next_cb_rdp = rdp;
if (dump_tree)
pr_cont(" %d", cpu);
}
- rdp_prev = rdp;
+ rdp->nocb_gp_rdp = rdp_gp;
+ list_add_tail(&rdp->nocb_entry_rdp, &rdp_gp->nocb_head_rdp);
}
if (gotnocbs && dump_tree)
pr_cont("%s\n", gotnocbscbs ? "" : " (self only)");
@@ -1369,6 +1386,7 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
{
char bufw[20];
char bufr[20];
+ struct rcu_data *nocb_next_rdp;
struct rcu_segcblist *rsclp = &rdp->cblist;
bool waslocked;
bool wassleep;
@@ -1376,11 +1394,16 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
if (rdp->nocb_gp_rdp == rdp)
show_rcu_nocb_gp_state(rdp);
+ nocb_next_rdp = list_next_or_null_rcu(&rdp->nocb_gp_rdp->nocb_head_rdp,
+ &rdp->nocb_entry_rdp,
+ typeof(*rdp),
+ nocb_entry_rdp);
+
sprintf(bufw, "%ld", rsclp->gp_seq[RCU_WAIT_TAIL]);
sprintf(bufr, "%ld", rsclp->gp_seq[RCU_NEXT_READY_TAIL]);
pr_info(" CB %d^%d->%d %c%c%c%c%c%c F%ld L%ld C%d %c%c%s%c%s%c%c q%ld %c CPU %d%s\n",
rdp->cpu, rdp->nocb_gp_rdp->cpu,
- rdp->nocb_next_cb_rdp ? rdp->nocb_next_cb_rdp->cpu : -1,
+ nocb_next_rdp ? nocb_next_rdp->cpu : -1,
"kK"[!!rdp->nocb_cb_kthread],
"bB"[raw_spin_is_locked(&rdp->nocb_bypass_lock)],
"cC"[!!atomic_read(&rdp->nocb_lock_contended)],
--
2.25.1
In order to be able to toggle the offloaded state from cpusets, a nocb
kthread will need to be created for all possible CPUs whenever the
"rcu_nocbs=" or "nohz_full=" parameters are passed.
Therefore the nocb_cb_wait() kthread must be prepared to start running on a
de-offloaded rdp. Simply moving the sleeping condition to the beginning
of the kthread function is enough to prevent it from running callbacks
before the rdp ever becomes offloaded.
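A minimal sketch of the resulting shape (demo_* names are hypothetical,
heavily simplified from the real nocb_cb_wait()):

	/* Sleep first: a kthread created for a de-offloaded rdp parks
	 * here until the rdp is offloaded and work arrives.
	 */
	static void demo_cb_wait(struct demo_rdp *rdp)
	{
		do {
			swait_event_interruptible_exclusive(rdp->cb_wq,
							    demo_wait_cond(rdp));
		} while (!demo_can_run(rdp));

		demo_invoke_callbacks(rdp);	/* reached only once offloaded */
	}

	/* The kthread itself just loops on the wait/invoke step above. */
	static int demo_cb_kthread(void *arg)
	{
		for (;;)
			demo_cb_wait(arg);
		return 0;
	}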
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
---
kernel/rcu/tree_nocb.h | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index cc1165559177..e1cb06840454 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -797,6 +797,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
bool can_sleep = true;
struct rcu_node *rnp = rdp->mynode;
+ do {
+ swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
+ nocb_cb_wait_cond(rdp));
+
+ // VVV Ensure CB invocation follows _sleep test.
+ if (smp_load_acquire(&rdp->nocb_cb_sleep)) { // ^^^
+ WARN_ON(signal_pending(current));
+ trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
+ }
+ } while (!nocb_cb_can_run(rdp));
+
+
local_irq_save(flags);
rcu_momentary_dyntick_idle();
local_irq_restore(flags);
@@ -849,17 +861,6 @@ static void nocb_cb_wait(struct rcu_data *rdp)
if (needwake_state)
swake_up_one(&rdp->nocb_state_wq);
-
- do {
- swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
- nocb_cb_wait_cond(rdp));
-
- // VVV Ensure CB invocation follows _sleep test.
- if (smp_load_acquire(&rdp->nocb_cb_sleep)) { // ^^^
- WARN_ON(signal_pending(current));
- trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
- }
- } while (!nocb_cb_can_run(rdp));
}
/*
--
2.25.1
Currently cpumask_available() is used to prevent unwanted
NOCB initialization. However if neither the "rcu_nocbs=" nor the "nohz_full="
parameter is passed but CONFIG_CPUMASK_OFFSTACK=n, the initialization
path is still taken, running through all sorts of needless operations
and iterations on an empty cpumask.
Fix this by relying on a real initialization state instead. This
also optimizes kthread creation, sparing the iteration over all online
CPUs when nocb isn't initialized.
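A minimal sketch of the guard pattern introduced here (demo_* names are
hypothetical; the real flag is rcu_nocb_is_setup below):

	#include <linux/init.h>

	static bool demo_is_setup;

	static int __init demo_setup(char *str)
	{
		demo_is_setup = true;	/* parameter seen on the command line */
		return 1;
	}
	__setup("demo_param=", demo_setup);

	static void __init demo_init(void)
	{
		if (!demo_is_setup)
			return;		/* nothing requested: skip all the work */
		/* ... cpumask allocation and per-CPU initialization ... */
	}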
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
---
kernel/rcu/tree_nocb.h | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index e1cb06840454..d8ed3ee47a67 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -60,6 +60,9 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
* Parse the boot-time rcu_nocb_mask CPU list from the kernel parameters.
* If the list is invalid, a warning is emitted and all CPUs are offloaded.
*/
+
+static bool rcu_nocb_is_setup;
+
static int __init rcu_nocb_setup(char *str)
{
alloc_bootmem_cpumask_var(&rcu_nocb_mask);
@@ -67,6 +70,7 @@ static int __init rcu_nocb_setup(char *str)
pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
cpumask_setall(rcu_nocb_mask);
}
+ rcu_nocb_is_setup = true;
return 1;
}
__setup("rcu_nocbs=", rcu_nocb_setup);
@@ -1159,13 +1163,17 @@ void __init rcu_init_nohz(void)
need_rcu_nocb_mask = true;
#endif /* #if defined(CONFIG_NO_HZ_FULL) */
- if (!cpumask_available(rcu_nocb_mask) && need_rcu_nocb_mask) {
- if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
- pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n");
- return;
+ if (need_rcu_nocb_mask) {
+ if (!cpumask_available(rcu_nocb_mask)) {
+ if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
+ pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n");
+ return;
+ }
}
+ rcu_nocb_is_setup = true;
}
- if (!cpumask_available(rcu_nocb_mask))
+
+ if (!rcu_nocb_is_setup)
return;
#if defined(CONFIG_NO_HZ_FULL)
@@ -1267,8 +1275,10 @@ static void __init rcu_spawn_nocb_kthreads(void)
{
int cpu;
- for_each_online_cpu(cpu)
- rcu_spawn_cpu_nocb_kthread(cpu);
+ if (rcu_nocb_is_setup) {
+ for_each_online_cpu(cpu)
+ rcu_spawn_cpu_nocb_kthread(cpu);
+ }
}
/* How many CB CPU IDs per GP kthread? Default of -1 for sqrt(nr_cpu_ids). */
--
2.25.1
In order to be able to (de-)offload any CPU using cpuset in the future,
create a NOCB kthread for all possible CPUs. For now this is done only
as long as the "rcu_nocb=" or "nohz_full=" kernel parameters are passed
to avoid the unnecessary overhead for most users.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
---
kernel/rcu/tree_nocb.h | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index d8ed3ee47a67..9d37916278d4 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1229,11 +1229,8 @@ static void rcu_spawn_one_nocb_kthread(int cpu)
struct rcu_data *rdp_gp;
struct task_struct *t;
- /*
- * If this isn't a no-CBs CPU or if it already has an rcuo kthread,
- * then nothing to do.
- */
- if (!rcu_is_nocb_cpu(cpu) || rdp->nocb_cb_kthread)
+ /* If it already has an rcuo kthread, then nothing to do. */
+ if (rdp->nocb_cb_kthread)
return;
/* If we didn't spawn the GP kthread first, reorganize! */
@@ -1261,7 +1258,7 @@ static void rcu_spawn_one_nocb_kthread(int cpu)
*/
static void rcu_spawn_cpu_nocb_kthread(int cpu)
{
- if (rcu_scheduler_fully_active)
+ if (rcu_scheduler_fully_active && rcu_nocb_is_setup)
rcu_spawn_one_nocb_kthread(cpu);
}
@@ -1311,7 +1308,7 @@ static void __init rcu_organize_nocb_kthreads(void)
* Should the corresponding CPU come online in the future, then
* we will spawn the needed set of rcu_nocb_kthread() kthreads.
*/
- for_each_cpu(cpu, rcu_nocb_mask) {
+ for_each_possible_cpu(cpu) {
rdp = per_cpu_ptr(&rcu_data, cpu);
if (rdp->cpu >= nl) {
/* New GP kthread, set up for CBs & next GP. */
@@ -1335,7 +1332,8 @@ static void __init rcu_organize_nocb_kthreads(void)
pr_cont(" %d", cpu);
}
rdp->nocb_gp_rdp = rdp_gp;
- list_add_tail(&rdp->nocb_entry_rdp, &rdp_gp->nocb_head_rdp);
+ if (cpumask_test_cpu(cpu, rcu_nocb_mask))
+ list_add_tail(&rdp->nocb_entry_rdp, &rdp_gp->nocb_head_rdp);
}
if (gotnocbs && dump_tree)
pr_cont("%s\n", gotnocbscbs ? "" : " (self only)");
--
2.25.1
If a user wants to boot without any CPU in offloaded mode initially but
with the possibility to offload them later using cpusets, provide a way
to simply pass an empty "rcu_nocbs" kernel parameter which will enforce
the creation of dormant nocb kthreads.
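Illustrative command lines (the first form is already supported, the
second is what this patch adds):

	rcu_nocbs=0-3	# offload CPUs 0-3 from boot
	rcu_nocbs	# offload nothing, but create the dormant rcuo kthreads

Note that registering the handler as __setup("rcu_nocbs", ...) (without
the trailing '=') matches both forms, and the handler receives whatever
follows the matched prefix, hence the *str == '=' check in the diff below.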
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 26 ++++++++++++-------
kernel/rcu/tree_nocb.h | 10 ++++---
2 files changed, 23 insertions(+), 13 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cd860dc7c60b..6ff1a5f06383 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4351,20 +4351,28 @@
Disable the Correctable Errors Collector,
see CONFIG_RAS_CEC help text.
- rcu_nocbs= [KNL]
- The argument is a cpu list, as described above.
+ rcu_nocbs[=cpu-list]
+ [KNL] The optional argument is a cpu list,
+ as described above.
- In kernels built with CONFIG_RCU_NOCB_CPU=y, set
- the specified list of CPUs to be no-callback CPUs.
- Invocation of these CPUs' RCU callbacks will be
- offloaded to "rcuox/N" kthreads created for that
- purpose, where "x" is "p" for RCU-preempt, and
- "s" for RCU-sched, and "N" is the CPU number.
- This reduces OS jitter on the offloaded CPUs,
+ In kernels built with CONFIG_RCU_NOCB_CPU=y, enable the
+ no-callback CPU mode. Invocation of such CPUs' RCU
+ callbacks will be offloaded to "rcuox/N" kthreads
+ created for that purpose, where "x" is "p" for
+ RCU-preempt, and "s" for RCU-sched, and "N" is the CPU
+ number. This reduces OS jitter on the offloaded CPUs,
which can be useful for HPC and real-time
workloads. It can also improve energy efficiency
for asymmetric multiprocessors.
+ If a cpulist is passed as an argument, the specified
+ list of CPUs is set to no-callback mode from boot.
+
+ Otherwise, if the '=' sign and the cpulist argument are
+ omitted, no CPU will be set to no-callback mode from
+ boot, but cpuset will allow toggling that mode at
+ runtime.
+
rcu_nocb_poll [KNL]
Rather than requiring that offloaded CPUs
(specified by rcu_nocbs= above) explicitly
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 9d37916278d4..d915780d40c8 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -66,14 +66,16 @@ static bool rcu_nocb_is_setup;
static int __init rcu_nocb_setup(char *str)
{
alloc_bootmem_cpumask_var(&rcu_nocb_mask);
- if (cpulist_parse(str, rcu_nocb_mask)) {
- pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
- cpumask_setall(rcu_nocb_mask);
+ if (*str == '=') {
+ if (cpulist_parse(++str, rcu_nocb_mask)) {
+ pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
+ cpumask_setall(rcu_nocb_mask);
+ }
}
rcu_nocb_is_setup = true;
return 1;
}
-__setup("rcu_nocbs=", rcu_nocb_setup);
+__setup("rcu_nocbs", rcu_nocb_setup);
static int __init parse_rcu_nocb_poll(char *arg)
{
--
2.25.1
rcu_spawn_one_nocb_kthread() is only called by
rcu_spawn_cpu_nocb_kthread(). Don't bother with two separate functions
and merge them.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
---
kernel/rcu/tree_nocb.h | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index d915780d40c8..7e8da346127a 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1225,12 +1225,15 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
* rcuo CB kthread, spawn it. Additionally, if the rcuo GP kthread
* for this CPU's group has not yet been created, spawn it as well.
*/
-static void rcu_spawn_one_nocb_kthread(int cpu)
+static void rcu_spawn_cpu_nocb_kthread(int cpu)
{
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
struct rcu_data *rdp_gp;
struct task_struct *t;
+ if (!rcu_scheduler_fully_active || !rcu_nocb_is_setup)
+ return;
+
/* If it already has an rcuo kthread, then nothing to do. */
if (rdp->nocb_cb_kthread)
return;
@@ -1254,16 +1257,6 @@ static void rcu_spawn_one_nocb_kthread(int cpu)
WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
}
-/*
- * If the specified CPU is a no-CBs CPU that does not already have its
- * rcuo kthread, spawn it.
- */
-static void rcu_spawn_cpu_nocb_kthread(int cpu)
-{
- if (rcu_scheduler_fully_active && rcu_nocb_is_setup)
- rcu_spawn_one_nocb_kthread(cpu);
-}
-
/*
* Once the scheduler is running, spawn rcuo kthreads for all online
* no-CBs CPUs. This assumes that the early_initcall()s happen before
--
2.25.1
Hi,
On 23/11/21 01:37, Frederic Weisbecker wrote:
> Changes since v1 after Paul's reviews:
>
> * Clarify why the DEL vs ADD possible race on rdp group list is ok [1/6]
> * Update kernel parameters documentation [5/6]
> * Only create rcuo[sp] kthreads for CPUs that have ever come online [4/6]
> * Consider nohz_full= in changelogs
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> rcu/nocb
>
> HEAD: da51363e5ddf54d6ce9c2cfbab946f8914519290
I ran this on RT. It survived rcutorture with different kernel cmdline
configurations, and the creation of rcuox/N kthreads seemed sane. :)
FWIW, feel free to add
Tested-by: Juri Lelli <[email protected]>
Thanks for working on this!
Best,
Juri
Hi,
On 23/11/21 01:37, Frederic Weisbecker wrote:
> In order to be able to (de-)offload any CPU using cpuset in the future,
> create a NOCB kthread for all possible CPUs. For now this is done only
> as long as the "rcu_nocb=" or "nohz_full=" kernel parameters are passed
Super nitpick! In subject and above sentence, "rcu_nocb=" is actually
"rcu_nocbs=", isn't it?
Feel free to ignore the noise, of course. :)
Best,
Juri
On Tue, Nov 23, 2021 at 01:37:05AM +0100, Frederic Weisbecker wrote:
> Currently cpumask_available() is used to prevent unwanted
> NOCB initialization. However if neither the "rcu_nocbs=" nor the "nohz_full="
> parameter is passed but CONFIG_CPUMASK_OFFSTACK=n, the initialization
> path is still taken, running through all sorts of needless operations
> and iterations on an empty cpumask.
>
> Fix this by relying on a real initialization state instead. This
> also optimizes kthread creation, sparing the iteration over all online
> CPUs when nocb isn't initialized.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> ---
> kernel/rcu/tree_nocb.h | 24 +++++++++++++++++-------
> 1 file changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index e1cb06840454..d8ed3ee47a67 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -60,6 +60,9 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
> * Parse the boot-time rcu_nocb_mask CPU list from the kernel parameters.
> * If the list is invalid, a warning is emitted and all CPUs are offloaded.
> */
> +
> +static bool rcu_nocb_is_setup;
I am taking this as is for now (modulo wordsmithing), but should this
variable instead be in the rcu_state structure? The advantage of putting
it there is keeping the state together. The corresponding disadvantage
is that the state is globally visible within RCU.
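A minimal sketch of that alternative (hypothetical, not something this
series does):

	/* kernel/rcu/tree.h */
	struct rcu_state {
		/* ... existing fields ... */
		bool nocb_is_setup;	/* rcu_nocbs= or nohz_full= was passed. */
	};

with the call sites testing rcu_state.nocb_is_setup instead of the
file-local flag.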
Thoughts?
Thanx, Paul
> +
> static int __init rcu_nocb_setup(char *str)
> {
> alloc_bootmem_cpumask_var(&rcu_nocb_mask);
> @@ -67,6 +70,7 @@ static int __init rcu_nocb_setup(char *str)
> pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
> cpumask_setall(rcu_nocb_mask);
> }
> + rcu_nocb_is_setup = true;
> return 1;
> }
> __setup("rcu_nocbs=", rcu_nocb_setup);
> @@ -1159,13 +1163,17 @@ void __init rcu_init_nohz(void)
> need_rcu_nocb_mask = true;
> #endif /* #if defined(CONFIG_NO_HZ_FULL) */
>
> - if (!cpumask_available(rcu_nocb_mask) && need_rcu_nocb_mask) {
> - if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
> - pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n");
> - return;
> + if (need_rcu_nocb_mask) {
> + if (!cpumask_available(rcu_nocb_mask)) {
> + if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
> + pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n");
> + return;
> + }
> }
> + rcu_nocb_is_setup = true;
> }
> - if (!cpumask_available(rcu_nocb_mask))
> +
> + if (!rcu_nocb_is_setup)
> return;
>
> #if defined(CONFIG_NO_HZ_FULL)
> @@ -1267,8 +1275,10 @@ static void __init rcu_spawn_nocb_kthreads(void)
> {
> int cpu;
>
> - for_each_online_cpu(cpu)
> - rcu_spawn_cpu_nocb_kthread(cpu);
> + if (rcu_nocb_is_setup) {
> + for_each_online_cpu(cpu)
> + rcu_spawn_cpu_nocb_kthread(cpu);
> + }
> }
>
> /* How many CB CPU IDs per GP kthread? Default of -1 for sqrt(nr_cpu_ids). */
> --
> 2.25.1
>
On Tue, Nov 23, 2021 at 05:28:14PM +0000, Juri Lelli wrote:
> Hi,
>
> On 23/11/21 01:37, Frederic Weisbecker wrote:
> > In order to be able to (de-)offload any CPU using cpuset in the future,
> > create a NOCB kthread for all possible CPUs. For now this is done only
> > as long as the "rcu_nocb=" or "nohz_full=" kernel parameters are passed
>
> Super nitpick! In subject and above sentence, "rcu_nocb=" is actually
> "rcu_nocbs=", isn't it?
>
> Feel free to ignore the noise, of course. :)
What is life without a little noise? ;-)
I fixed this, and thank you for spotting it.
Thanx, Paul
On Tue, Nov 23, 2021 at 01:37:07AM +0100, Frederic Weisbecker wrote:
> If a user wants to boot without any CPU in offloaded mode initially but
> with the possibility to offload them later using cpusets, provide a way
> to simply pass an empty "rcu_nocbs" kernel parameter which will enforce
> the creation of dormant nocb kthreads.
Huh. This would have been a use for Yury Norov's "none" bitmask
specifier. ;-)
I pulled this one in with the usual wordsmithing.
Thanx, Paul
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> ---
> .../admin-guide/kernel-parameters.txt | 26 ++++++++++++-------
> kernel/rcu/tree_nocb.h | 10 ++++---
> 2 files changed, 23 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index cd860dc7c60b..6ff1a5f06383 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4351,20 +4351,28 @@
> Disable the Correctable Errors Collector,
> see CONFIG_RAS_CEC help text.
>
> - rcu_nocbs= [KNL]
> - The argument is a cpu list, as described above.
> + rcu_nocbs[=cpu-list]
> + [KNL] The optional argument is a cpu list,
> + as described above.
>
> - In kernels built with CONFIG_RCU_NOCB_CPU=y, set
> - the specified list of CPUs to be no-callback CPUs.
> - Invocation of these CPUs' RCU callbacks will be
> - offloaded to "rcuox/N" kthreads created for that
> - purpose, where "x" is "p" for RCU-preempt, and
> - "s" for RCU-sched, and "N" is the CPU number.
> - This reduces OS jitter on the offloaded CPUs,
> + In kernels built with CONFIG_RCU_NOCB_CPU=y, enable the
> + no-callback CPU mode. Invocation of such CPUs' RCU
> + callbacks will be offloaded to "rcuox/N" kthreads
> + created for that purpose, where "x" is "p" for
> + RCU-preempt, and "s" for RCU-sched, and "N" is the CPU
> + number. This reduces OS jitter on the offloaded CPUs,
> which can be useful for HPC and real-time
> workloads. It can also improve energy efficiency
> for asymmetric multiprocessors.
>
> + If a cpulist is passed as an argument, the specified
> + list of CPUs is set to no-callback mode from boot.
> +
> + Otherwise, if the '=' sign and the cpulist argument are
> + omitted, no CPU will be set to no-callback mode from
> + boot, but cpuset will allow toggling that mode at
> + runtime.
> +
> rcu_nocb_poll [KNL]
> Rather than requiring that offloaded CPUs
> (specified by rcu_nocbs= above) explicitly
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 9d37916278d4..d915780d40c8 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -66,14 +66,16 @@ static bool rcu_nocb_is_setup;
> static int __init rcu_nocb_setup(char *str)
> {
> alloc_bootmem_cpumask_var(&rcu_nocb_mask);
> - if (cpulist_parse(str, rcu_nocb_mask)) {
> - pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
> - cpumask_setall(rcu_nocb_mask);
> + if (*str == '=') {
> + if (cpulist_parse(++str, rcu_nocb_mask)) {
> + pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
> + cpumask_setall(rcu_nocb_mask);
> + }
> }
> rcu_nocb_is_setup = true;
> return 1;
> }
> -__setup("rcu_nocbs=", rcu_nocb_setup);
> +__setup("rcu_nocbs", rcu_nocb_setup);
>
> static int __init parse_rcu_nocb_poll(char *arg)
> {
> --
> 2.25.1
>
On Wed, Nov 24, 2021 at 04:47:20PM -0800, Paul E. McKenney wrote:
> On Tue, Nov 23, 2021 at 01:37:07AM +0100, Frederic Weisbecker wrote:
> > If a user wants to boot without any CPU in offloaded mode initially but
> > with the possibility to offload them later using cpusets, provide a way
> > to simply pass an empty "rcu_nocbs" kernel parameter which will enforce
> > the creation of dormant nocb kthreads.
>
> Huh. This would have been a use for Yury Norov's "none" bitmask
> specifier. ;-)
This must be the last missing piece before we can finally support NR_CPUS=0 ;)
>
> I pulled this one in with the usual wordsmithing.
Thanks!
On Tue, Nov 23, 2021 at 05:25:34PM +0000, Juri Lelli wrote:
> Hi,
>
> On 23/11/21 01:37, Frederic Weisbecker wrote:
> > Changes since v1 after Paul's reviews:
> >
> > * Clarify why the DEL vs ADD possible race on rdp group list is ok [1/6]
> > * Update kernel parameters documentation [5/6]
> > * Only create rcuo[sp] kthreads for CPUs that have ever come online [4/6]
> > * Consider nohz_full= in changelogs
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > rcu/nocb
> >
> > HEAD: da51363e5ddf54d6ce9c2cfbab946f8914519290
>
> I ran this on RT. It survived rcutorture with different kernel cmdline
> configurations, and the creation of rcuox/N kthreads seemed sane. :)
>
> FWIW, feel free to add
>
> Tested-by: Juri Lelli <[email protected]>
>
> Thanks for working on this!
And thank you for testing! Applied to all but the first one, which
I will catch on the next rebase.
Thanx, Paul
On Thu, Nov 25, 2021 at 01:55:26AM +0100, Frederic Weisbecker wrote:
> On Wed, Nov 24, 2021 at 04:47:20PM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 23, 2021 at 01:37:07AM +0100, Frederic Weisbecker wrote:
> > > If a user wants to boot without any CPU in offloaded mode initially but
> > > with the possibility to offload them later using cpusets, provide a way
> > > to simply pass an empty "rcu_nocbs" kernel parameter which will enforce
> > > the creation of dormant nocb kthreads.
> >
> > Huh. This would have been a use for Yury Norov's "none" bitmask
> > specifier. ;-)
>
> This must be the last missing piece before we can finally support NR_CPUS=0 ;)
;-) ;-) ;-)
Thanx, Paul
> > I pulled this one in with the usual wordsmithing.
>
> Thanks!
On Wed, Nov 24, 2021 at 04:47:20PM -0800, Paul E. McKenney wrote:
> On Tue, Nov 23, 2021 at 01:37:07AM +0100, Frederic Weisbecker wrote:
> > If a user wants to boot without any CPU in offloaded mode initially but
> > with the possibility to offload them later using cpusets, provide a way
> > to simply pass an empty "rcu_nocbs" kernel parameter which will enforce
> > the creation of dormant nocb kthreads.
>
> Huh. This would have been a use for Yury Norov's "none" bitmask
> specifier. ;-)
>
> I pulled this one in with the usual wordsmithing.
>
> Thanx, Paul
I think 'rcu_nocbs=,' should work as 'none'. But I admit that it looks
awkward. The following patch adds clear 'none' semantics to the parser.
If you like it, I think you may drop the non-documentation part of this
patch.
From e3a9cfe4830141c88aa5d8a93eae3512b2ae2882 Mon Sep 17 00:00:00 2001
From: Yury Norov <[email protected]>
Date: Wed, 24 Nov 2021 19:34:05 -0800
Subject: [PATCH] lib/bitmap: add 'none' specifier for bitmap_parselist()
Currently bitmap_parselist() has no clear notation for specifying an empty
bitmap. The format allows ',' or '0:0-N' for doing this, but it looks hacky.
Frederic Weisbecker needs to pass an empty rcu_nocbs to the kernel, and
without such a notation has to hack his own code:
https://lore.kernel.org/rcu/20211125005526.GA490855@lothringen/
This patch adds 'none' to bitmap_parselist(), so that no such hacks are
needed. 'None' is case-insensitive and doesn't support group semantics
('none:1/2' is meaningless and wouldn't work).
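Illustrative examples, drawn from the test cases below: "none" parses to
an empty mask, and it composes with other regions as a no-op, so
"rcu_nocbs=none" yields no CPUs while "none,0-15:16/31" is equivalent to
"0-15:16/31".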
Signed-off-by: Yury Norov <[email protected]>
---
Documentation/admin-guide/kernel-parameters.rst | 7 +++++--
lib/bitmap.c | 12 +++++++++++-
lib/test_bitmap.c | 13 +++++++++++++
3 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 01ba293a2d70..af261018bab5 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -76,11 +76,14 @@ to change, such as less cores in the CPU list, then N and any ranges using N
will also change. Use the same on a small 4 core system, and "16-N" becomes
"16-3" and now the same boot input will be flagged as invalid (start > end).
+The special case-tolerant group name "none" has a meaning of selecting no CPUs,
+so that "rcu_nocbs=none" would disable callback offloading for all CPUs.
+
The special case-tolerant group name "all" has a meaning of selecting all CPUs,
so that "nohz_full=all" is the equivalent of "nohz_full=0-N".
-The semantics of "N" and "all" is supported on a level of bitmaps and holds for
-all users of bitmap_parse().
+The semantics of "none", "N" and "all" are supported at the bitmap level and
+hold for all users of bitmap_parse().
This document may not be entirely up to date and comprehensive. The command
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
diff --git a/lib/bitmap.c b/lib/bitmap.c
index d7b80a069819..bcb38d055ec1 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -771,6 +771,16 @@ static const char *bitmap_parse_region(const char *str, struct region *r)
goto check_pattern;
}
+ if (!strncasecmp(str, "none", 4)) {
+ r->start = 0;
+ r->end = 0;
+ r->off = 0;
+ r->group_len = r->nbits;
+ str += 4;
+
+ goto out;
+ }
+
str = bitmap_getnum(str, &r->start, lastbit);
if (IS_ERR(str))
return str;
@@ -806,7 +816,7 @@ static const char *bitmap_parse_region(const char *str, struct region *r)
no_pattern:
r->off = r->end + 1;
r->group_len = r->end + 1;
-
+out:
return end_of_str(*str) ? NULL : str;
}
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 0c82f07f74fc..1111d0d0df5f 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -351,18 +351,26 @@ static const struct test_bitmap_parselist parselist_tests[] __initconst = {
{0, ",, ,, , , ,", &exp1[12 * step], 8, 0},
{0, " , ,, , , ", &exp1[12 * step], 8, 0},
{0, " , ,, , , \n", &exp1[12 * step], 8, 0},
+ {0, "none", &exp1[12 * step], 8, 0},
+ {0, " , NONE ,, , , \n", &exp1[12 * step], 8, 0},
+ {0, " , ,none, , , \n", &exp1[12 * step], 8, 0},
+ {0, " , ,, , ,none \n", &exp1[12 * step], 8, 0},
{0, "0-0", &exp1[0], 32, 0},
{0, "1-1", &exp1[1 * step], 32, 0},
{0, "15-15", &exp1[13 * step], 32, 0},
{0, "31-31", &exp1[14 * step], 32, 0},
+ {0, "31-31,none", &exp1[14 * step], 32, 0},
{0, "0-0:0/1", &exp1[12 * step], 32, 0},
+ {0, "0-0:0/1,none", &exp1[12 * step], 32, 0},
{0, "0-0:1/1", &exp1[0], 32, 0},
{0, "0-0:1/31", &exp1[0], 32, 0},
{0, "0-0:31/31", &exp1[0], 32, 0},
{0, "1-1:1/1", &exp1[1 * step], 32, 0},
{0, "0-15:16/31", &exp1[2 * step], 32, 0},
+ {0, "0-15:16/31,none", &exp1[2 * step], 32, 0},
+ {0, "none,0-15:16/31", &exp1[2 * step], 32, 0},
{0, "15-15:1/2", &exp1[13 * step], 32, 0},
{0, "15-15:31/31", &exp1[13 * step], 32, 0},
{0, "15-31:1/31", &exp1[13 * step], 32, 0},
@@ -381,6 +389,7 @@ static const struct test_bitmap_parselist parselist_tests[] __initconst = {
{0, "0-N:1/3,1-N:1/3,2-N:1/3", &exp1[8 * step], 32, 0},
{0, "0-31:1/3,1-31:1/3,2-31:1/3", &exp1[8 * step], 32, 0},
{0, "1-10:8/12,8-31:24/29,0-31:0/3", &exp1[9 * step], 32, 0},
+ {0, "1-10:8/12,none,8-31:24/29,0-31:0/3", &exp1[9 * step], 32, 0},
{0, "all", &exp1[8 * step], 32, 0},
{0, "0, 1, all, ", &exp1[8 * step], 32, 0},
@@ -388,6 +397,10 @@ static const struct test_bitmap_parselist parselist_tests[] __initconst = {
{0, "ALL:1/2", &exp1[4 * step], 32, 0},
{-EINVAL, "al", NULL, 8, 0},
{-EINVAL, "alll", NULL, 8, 0},
+ {-EINVAL, "non", NULL, 8, 0},
+ {-EINVAL, "one", NULL, 8, 0},
+ {-EINVAL, "NONEE", NULL, 8, 0},
+ {-EINVAL, "NONE:1/2", NULL, 8, 0},
{-EINVAL, "-1", NULL, 8, 0},
{-EINVAL, "-0", NULL, 8, 0},
--
2.25.1
On Wed, Nov 24, 2021 at 08:41:32PM -0800, Yury Norov wrote:
> On Wed, Nov 24, 2021 at 04:47:20PM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 23, 2021 at 01:37:07AM +0100, Frederic Weisbecker wrote:
...
> + if (!strncasecmp(str, "none", 4)) {
> + r->start = 0;
> + r->end = 0;
> + r->off = 0;
> + r->group_len = r->nbits;
> + str += 4;
> + goto out;
Side remark: I would make the name of the label a bit more verbose about what
is done there, "out_end_of_region:" or the like.
> + }
--
With Best Regards,
Andy Shevchenko
On Wed, Nov 24, 2021 at 08:41:32PM -0800, Yury Norov wrote:
> On Wed, Nov 24, 2021 at 04:47:20PM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 23, 2021 at 01:37:07AM +0100, Frederic Weisbecker wrote:
> > > If a user wants to boot without any CPU in offloaded mode initially but
> > > with the possibility to offload them later using cpusets, provide a way
> > > to simply pass an empty "rcu_nocbs" kernel parameter which will enforce
> > > the creation of dormant nocb kthreads.
> >
> > Huh. This would have been a use for Yury Norov's "none" bitmask
> > specifier. ;-)
> >
> > I pulled this one in with the usual wordsmithing.
> >
> > Thanx, Paul
>
> I think 'rcu_nocbs=,' should work as 'none'. But I admit that it looks
> awkward. The following patch adds clear 'none' semantics to the parser.
> If you like it, I think you may drop the non-documentation part of this
> patch.
I don't have a real objection, but I fear that "rcu_nocbs=none" might be
interpreted as rcu_nocbs being entirely deactivated, whereas "rcu_nocbs"
alone makes it clear that we are turning something on.
We can support both though.
Thanks.
On Thu, Nov 25, 2021 at 02:28:53PM +0100, Frederic Weisbecker wrote:
> On Wed, Nov 24, 2021 at 08:41:32PM -0800, Yury Norov wrote:
> > On Wed, Nov 24, 2021 at 04:47:20PM -0800, Paul E. McKenney wrote:
> > > On Tue, Nov 23, 2021 at 01:37:07AM +0100, Frederic Weisbecker wrote:
> > > > If a user wants to boot without any CPU in offloaded mode initially but
> > > > with the possibility to offload them later using cpusets, provide a way
> > > > to simply pass an empty "rcu_nocbs" kernel parameter which will enforce
> > > > the creation of dormant nocb kthreads.
> > >
> > > Huh. This would have been a use for Yury Norov's "none" bitmask
> > > specifier. ;-)
> > >
> > > I pulled this one in with the usual wordsmithing.
> > >
> > > Thanx, Paul
> >
> > I think 'rcu_nocbs=,' should work as 'none'. But I admit that it looks
> > awkward. The following patch adds clear 'none' semantics to the parser.
> > If you like it, I think you may drop the non-documentation part of this
> > patch.
>
> I don't have a real objection, but I fear that "rcu_nocbs=none" might be
> interpreted as rcu_nocbs being entirely deactivated, whereas "rcu_nocbs"
> alone makes it clear that we are turning something on.
How about "rcu_nocbs=,"? ;-)
Thanx, Paul
> We can support both though.
>
> Thanks.
On 11/23/2021 6:07 AM, Frederic Weisbecker wrote:
> nocb_gp_wait() iterates over all CPUs within the rcuog's group even if they
> have been de-offloaded. This is suboptimal if only a few CPUs are
> offloaded within the group, and it will become even more of a problem
> once a nocb kthread is created for all possible CPUs in the future.
>
> Therefore use a standard doubly linked list to link all the offloaded
> rdps and safely add/del their nodes as they are (de-)offloaded.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> ---
Few queries below.
Reviewed-by: Neeraj Upadhyay <[email protected]>
> kernel/rcu/tree.h | 7 +++++--
> kernel/rcu/tree_nocb.h | 37 ++++++++++++++++++++++++++++++-------
> 2 files changed, 35 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index deeaf2fee714..486fc901bd08 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -221,8 +221,11 @@ struct rcu_data {
> struct swait_queue_head nocb_gp_wq; /* For nocb kthreads to sleep on. */
> bool nocb_cb_sleep; /* Is the nocb CB thread asleep? */
> struct task_struct *nocb_cb_kthread;
> - struct rcu_data *nocb_next_cb_rdp;
> - /* Next rcu_data in wakeup chain. */
> + struct list_head nocb_head_rdp; /*
> + * Head of rcu_data list in wakeup chain,
> + * if rdp_gp.
> + */
> + struct list_head nocb_entry_rdp; /* rcu_data node in wakeup chain. */
>
> /* The following fields are used by CB kthread, hence new cacheline. */
> struct rcu_data *nocb_gp_rdp ____cacheline_internodealigned_in_smp;
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 2461fe8d0c23..cc1165559177 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -625,7 +625,15 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> * and the global grace-period kthread are awakened if needed.
> */
> WARN_ON_ONCE(my_rdp->nocb_gp_rdp != my_rdp);
> - for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> + /*
> + * An rdp can be removed from the list after being de-offloaded or added
> + * to the list before being (re-)offloaded. If the below loop happens while
> + * an rdp is de-offloaded and then re-offloaded shortly afterward, we may
> + * shortcut and ignore a part of the rdp list due to racy list iteration.
> + * Fortunately a new run through the entire loop is forced after an rdp is
> + * added here so that such a race gets quickly fixed.
> + */
> + list_for_each_entry_rcu(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp, 1) {
Can we hit an (unlikely) case where repeated calls to de-offload/offload
cause this loop to continue for a long time?
> bool needwake_state = false;
>
> if (!nocb_gp_enabled_cb(rdp))
Now that we can probe the flags here without holding nocb_gp_lock first
(the case where de-offload and offload happen while we are iterating
the list), can it cause a WARNING from the code below?
WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP);
Is a sequence like this possible?
1. <de-offload>
Clear SEGCBLIST_OFFLOADED
2. nocb_gp_wait() clears SEGCBLIST_KTHREAD_GP in
nocb_gp_update_state_deoffloading() and continues to next rdp.
3. <offload>
rdp_offload_toggle() hasn't been called yet.
4. The rcuog thread migrates to a different CPU while executing the
loop in nocb_gp_wait().
5. nocb_gp_wait() reaches the tail rdp.
6. The current CPU, where the rcuog thread is running, hasn't observed
the SEGCBLIST_OFFLOADED clearing done in step 1; so nocb_gp_enabled_cb()
passes.
7. nocb_gp_wait() acquires the rdp's nocb lock and reads the state as
deoffloaded; however, SEGCBLIST_KTHREAD_GP is not set and
we hit WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist,
SEGCBLIST_KTHREAD_GP));
Thanks
Neeraj
> @@ -1003,6 +1011,8 @@ static long rcu_nocb_rdp_deoffload(void *arg)
> swait_event_exclusive(rdp->nocb_state_wq,
> !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
> SEGCBLIST_KTHREAD_GP));
> + /* Don't bother iterating this one anymore in nocb_gp_wait() */
> + list_del_rcu(&rdp->nocb_entry_rdp);
> /*
> * Lock one last time to acquire latest callback updates from kthreads
> * so we can later handle callbacks locally without locking.
> @@ -1066,6 +1076,15 @@ static long rcu_nocb_rdp_offload(void *arg)
> return -EINVAL;
>
> pr_info("Offloading %d\n", rdp->cpu);
> +
> + /*
> + * Iterate this CPU on nocb_gp_wait(). We do it before locking nocb_gp_lock,
> + * resetting nocb_gp_sleep and waking up the related "rcuog". Since nocb_gp_wait()
> + * in turn locks nocb_gp_lock before setting nocb_gp_sleep again, we are guaranteed
> + * to iterate this new rdp before "rcuog" goes to sleep again.
> + */
> + list_add_tail_rcu(&rdp->nocb_entry_rdp, &rdp->nocb_gp_rdp->nocb_head_rdp);
> +
> /*
> * Can't use rcu_nocb_lock_irqsave() before SEGCBLIST_LOCKING
> * is set.
> @@ -1268,7 +1287,6 @@ static void __init rcu_organize_nocb_kthreads(void)
> int nl = 0; /* Next GP kthread. */
> struct rcu_data *rdp;
> struct rcu_data *rdp_gp = NULL; /* Suppress misguided gcc warn. */
> - struct rcu_data *rdp_prev = NULL;
>
> if (!cpumask_available(rcu_nocb_mask))
> return;
> @@ -1288,8 +1306,8 @@ static void __init rcu_organize_nocb_kthreads(void)
> /* New GP kthread, set up for CBs & next GP. */
> gotnocbs = true;
> nl = DIV_ROUND_UP(rdp->cpu + 1, ls) * ls;
> - rdp->nocb_gp_rdp = rdp;
> rdp_gp = rdp;
> + INIT_LIST_HEAD(&rdp->nocb_head_rdp);
> if (dump_tree) {
> if (!firsttime)
> pr_cont("%s\n", gotnocbscbs
> @@ -1302,12 +1320,11 @@ static void __init rcu_organize_nocb_kthreads(void)
> } else {
> /* Another CB kthread, link to previous GP kthread. */
> gotnocbscbs = true;
> - rdp->nocb_gp_rdp = rdp_gp;
> - rdp_prev->nocb_next_cb_rdp = rdp;
> if (dump_tree)
> pr_cont(" %d", cpu);
> }
> - rdp_prev = rdp;
> + rdp->nocb_gp_rdp = rdp_gp;
> + list_add_tail(&rdp->nocb_entry_rdp, &rdp_gp->nocb_head_rdp);
> }
> if (gotnocbs && dump_tree)
> pr_cont("%s\n", gotnocbscbs ? "" : " (self only)");
> @@ -1369,6 +1386,7 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
> {
> char bufw[20];
> char bufr[20];
> + struct rcu_data *nocb_next_rdp;
> struct rcu_segcblist *rsclp = &rdp->cblist;
> bool waslocked;
> bool wassleep;
> @@ -1376,11 +1394,16 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
> if (rdp->nocb_gp_rdp == rdp)
> show_rcu_nocb_gp_state(rdp);
>
> + nocb_next_rdp = list_next_or_null_rcu(&rdp->nocb_gp_rdp->nocb_head_rdp,
> + &rdp->nocb_entry_rdp,
> + typeof(*rdp),
> + nocb_entry_rdp);
> +
> sprintf(bufw, "%ld", rsclp->gp_seq[RCU_WAIT_TAIL]);
> sprintf(bufr, "%ld", rsclp->gp_seq[RCU_NEXT_READY_TAIL]);
> pr_info(" CB %d^%d->%d %c%c%c%c%c%c F%ld L%ld C%d %c%c%s%c%s%c%c q%ld %c CPU %d%s\n",
> rdp->cpu, rdp->nocb_gp_rdp->cpu,
> - rdp->nocb_next_cb_rdp ? rdp->nocb_next_cb_rdp->cpu : -1,
> + nocb_next_rdp ? nocb_next_rdp->cpu : -1,
> "kK"[!!rdp->nocb_cb_kthread],
> "bB"[raw_spin_is_locked(&rdp->nocb_bypass_lock)],
> "cC"[!!atomic_read(&rdp->nocb_lock_contended)],
>
On 11/23/2021 6:07 AM, Frederic Weisbecker wrote:
> In order to be able to toggle the offloaded state from cpusets, a nocb
> kthread will need to be created for all possible CPUs whenever the
> "rcu_nocbs=" or "nohz_full=" parameters are passed.
>
> Therefore the nocb_cb_wait() kthread must be prepared to start running on a
> de-offloaded rdp. Simply moving the sleeping condition to the beginning
> of the kthread function is enough to prevent it from running callbacks
> before the rdp ever becomes offloaded.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> ---
Reviewed-by: Neeraj Upadhyay <[email protected]>
Thanks
Neeraj
> kernel/rcu/tree_nocb.h | 23 ++++++++++++-----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index cc1165559177..e1cb06840454 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -797,6 +797,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> bool can_sleep = true;
> struct rcu_node *rnp = rdp->mynode;
>
> + do {
> + swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> + nocb_cb_wait_cond(rdp));
> +
> + // VVV Ensure CB invocation follows _sleep test.
> + if (smp_load_acquire(&rdp->nocb_cb_sleep)) { // ^^^
> + WARN_ON(signal_pending(current));
> + trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
> + }
> + } while (!nocb_cb_can_run(rdp));
> +
> +
> local_irq_save(flags);
> rcu_momentary_dyntick_idle();
> local_irq_restore(flags);
> @@ -849,17 +861,6 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>
> if (needwake_state)
> swake_up_one(&rdp->nocb_state_wq);
> -
> - do {
> - swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> - nocb_cb_wait_cond(rdp));
> -
> - // VVV Ensure CB invocation follows _sleep test.
> - if (smp_load_acquire(&rdp->nocb_cb_sleep)) { // ^^^
> - WARN_ON(signal_pending(current));
> - trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
> - }
> - } while (!nocb_cb_can_run(rdp));
> }
>
> /*
>
On 11/23/2021 6:07 AM, Frederic Weisbecker wrote:
> Currently cpumask_available() is used to prevent unwanted
> NOCB initialization. However if neither the "rcu_nocbs=" nor the "nohz_full="
> parameter is passed but CONFIG_CPUMASK_OFFSTACK=n, the initialization
> path is still taken, running through all sorts of needless operations
> and iterations on an empty cpumask.
>
> Fix this by relying on a real initialization state instead. This
> also optimizes kthread creation, sparing the iteration over all online
> CPUs when nocb isn't initialized.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> ---
Reviewed-by: Neeraj Upadhyay <[email protected]>
> kernel/rcu/tree_nocb.h | 24 +++++++++++++++++-------
> 1 file changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index e1cb06840454..d8ed3ee47a67 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -60,6 +60,9 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
> * Parse the boot-time rcu_nocb_mask CPU list from the kernel parameters.
> * If the list is invalid, a warning is emitted and all CPUs are offloaded.
> */
> +
> +static bool rcu_nocb_is_setup;
> +
> static int __init rcu_nocb_setup(char *str)
> {
> alloc_bootmem_cpumask_var(&rcu_nocb_mask);
> @@ -67,6 +70,7 @@ static int __init rcu_nocb_setup(char *str)
> pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
> cpumask_setall(rcu_nocb_mask);
> }
> + rcu_nocb_is_setup = true;
> return 1;
> }
> __setup("rcu_nocbs=", rcu_nocb_setup);
> @@ -1159,13 +1163,17 @@ void __init rcu_init_nohz(void)
> need_rcu_nocb_mask = true;
> #endif /* #if defined(CONFIG_NO_HZ_FULL) */
>
> - if (!cpumask_available(rcu_nocb_mask) && need_rcu_nocb_mask) {
> - if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
> - pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n");
> - return;
> + if (need_rcu_nocb_mask) {
> + if (!cpumask_available(rcu_nocb_mask)) {
> + if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
> + pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n");
> + return;
> + }
> }
> + rcu_nocb_is_setup = true;
> }
> - if (!cpumask_available(rcu_nocb_mask))
> +
> + if (!rcu_nocb_is_setup)
> return;
>
> #if defined(CONFIG_NO_HZ_FULL)
> @@ -1267,8 +1275,10 @@ static void __init rcu_spawn_nocb_kthreads(void)
> {
> int cpu;
>
> - for_each_online_cpu(cpu)
> - rcu_spawn_cpu_nocb_kthread(cpu);
> + if (rcu_nocb_is_setup) {
> + for_each_online_cpu(cpu)
> + rcu_spawn_cpu_nocb_kthread(cpu);
> + }
> }
>
> /* How many CB CPU IDs per GP kthread? Default of -1 for sqrt(nr_cpu_ids). */
>
On 11/23/2021 6:07 AM, Frederic Weisbecker wrote:
> In order to be able to (de-)offload any CPU using cpuset in the future,
> create a NOCB kthread for all possible CPUs. For now this is done only
> as long as the "rcu_nocb=" or "nohz_full=" kernel parameters are passed
> to avoid the unnecessary overhead for most users.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> ---
Reviewed-by: Neeraj Upadhyay <[email protected]>
> kernel/rcu/tree_nocb.h | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index d8ed3ee47a67..9d37916278d4 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1229,11 +1229,8 @@ static void rcu_spawn_one_nocb_kthread(int cpu)
> struct rcu_data *rdp_gp;
> struct task_struct *t;
>
> - /*
> - * If this isn't a no-CBs CPU or if it already has an rcuo kthread,
> - * then nothing to do.
> - */
> - if (!rcu_is_nocb_cpu(cpu) || rdp->nocb_cb_kthread)
As rcu_is_nocb_cpu() does not have a user, we can probably remove it?
Thanks
Neeraj
> + /* If it already has an rcuo kthread, then nothing to do. */
> + if (rdp->nocb_cb_kthread)
> return;
>
> /* If we didn't spawn the GP kthread first, reorganize! */
> @@ -1261,7 +1258,7 @@ static void rcu_spawn_one_nocb_kthread(int cpu)
> */
> static void rcu_spawn_cpu_nocb_kthread(int cpu)
> {
> - if (rcu_scheduler_fully_active)
> + if (rcu_scheduler_fully_active && rcu_nocb_is_setup)
> rcu_spawn_one_nocb_kthread(cpu);
> }
>
> @@ -1311,7 +1308,7 @@ static void __init rcu_organize_nocb_kthreads(void)
> * Should the corresponding CPU come online in the future, then
> * we will spawn the needed set of rcu_nocb_kthread() kthreads.
> */
> - for_each_cpu(cpu, rcu_nocb_mask) {
> + for_each_possible_cpu(cpu) {
> rdp = per_cpu_ptr(&rcu_data, cpu);
> if (rdp->cpu >= nl) {
> /* New GP kthread, set up for CBs & next GP. */
> @@ -1335,7 +1332,8 @@ static void __init rcu_organize_nocb_kthreads(void)
> pr_cont(" %d", cpu);
> }
> rdp->nocb_gp_rdp = rdp_gp;
> - list_add_tail(&rdp->nocb_entry_rdp, &rdp_gp->nocb_head_rdp);
> + if (cpumask_test_cpu(cpu, rcu_nocb_mask))
> + list_add_tail(&rdp->nocb_entry_rdp, &rdp_gp->nocb_head_rdp);
> }
> if (gotnocbs && dump_tree)
> pr_cont("%s\n", gotnocbscbs ? "" : " (self only)");
>
On 11/23/2021 6:07 AM, Frederic Weisbecker wrote:
> If a user wants to boot without any CPU in offloaded mode initially but
> with the possibility to offload them later using cpusets, provide a way
> to simply pass an empty "rcu_nocbs" kernel parameter which will enforce
> the creation of dormant nocb kthreads.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> ---
Reviewed-by: Neeraj Upadhyay <[email protected]>
> .../admin-guide/kernel-parameters.txt | 26 ++++++++++++-------
> kernel/rcu/tree_nocb.h | 10 ++++---
> 2 files changed, 23 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index cd860dc7c60b..6ff1a5f06383 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4351,20 +4351,28 @@
> Disable the Correctable Errors Collector,
> see CONFIG_RAS_CEC help text.
>
> - rcu_nocbs= [KNL]
> - The argument is a cpu list, as described above.
> + rcu_nocbs[=cpu-list]
> + [KNL] The optional argument is a cpu list,
> + as described above.
>
> - In kernels built with CONFIG_RCU_NOCB_CPU=y, set
> - the specified list of CPUs to be no-callback CPUs.
> - Invocation of these CPUs' RCU callbacks will be
> - offloaded to "rcuox/N" kthreads created for that
> - purpose, where "x" is "p" for RCU-preempt, and
> - "s" for RCU-sched, and "N" is the CPU number.
> - This reduces OS jitter on the offloaded CPUs,
> + In kernels built with CONFIG_RCU_NOCB_CPU=y, enable the
> + no-callback CPU mode. Invocation of such CPUs' RCU
> + callbacks will be offloaded to "rcuox/N" kthreads
> + created for that purpose, where "x" is "p" for
> + RCU-preempt, and "s" for RCU-sched, and "N" is the CPU
> + number. This reduces OS jitter on the offloaded CPUs,
> which can be useful for HPC and real-time
> workloads. It can also improve energy efficiency
> for asymmetric multiprocessors.
>
> + If a cpulist is passed as an argument, the specified
> + list of CPUs is set to no-callback mode from boot.
> +
> + Otherwise, if the '=' sign and the cpulist argument are
> + omitted, no CPU will be set to no-callback mode from
> + boot, but cpuset will allow toggling that mode at
> + runtime.
> +
> rcu_nocb_poll [KNL]
> Rather than requiring that offloaded CPUs
> (specified by rcu_nocbs= above) explicitly
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 9d37916278d4..d915780d40c8 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -66,14 +66,16 @@ static bool rcu_nocb_is_setup;
> static int __init rcu_nocb_setup(char *str)
> {
> alloc_bootmem_cpumask_var(&rcu_nocb_mask);
> - if (cpulist_parse(str, rcu_nocb_mask)) {
> - pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
> - cpumask_setall(rcu_nocb_mask);
> + if (*str == '=') {
> + if (cpulist_parse(++str, rcu_nocb_mask)) {
> + pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
> + cpumask_setall(rcu_nocb_mask);
> + }
> }
> rcu_nocb_is_setup = true;
> return 1;
> }
> -__setup("rcu_nocbs=", rcu_nocb_setup);
> +__setup("rcu_nocbs", rcu_nocb_setup);
>
> static int __init parse_rcu_nocb_poll(char *arg)
> {
>
On 11/23/2021 6:07 AM, Frederic Weisbecker wrote:
> rcu_spawn_one_nocb_kthread() is only called by
> rcu_spawn_cpu_nocb_kthread(). Don't bother with two separate functions
> and merge them.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> ---
Reviewed-by: Neeraj Upadhyay <[email protected]>
> kernel/rcu/tree_nocb.h | 15 ++++-----------
> 1 file changed, 4 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index d915780d40c8..7e8da346127a 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1225,12 +1225,15 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
> * rcuo CB kthread, spawn it. Additionally, if the rcuo GP kthread
> * for this CPU's group has not yet been created, spawn it as well.
> */
> -static void rcu_spawn_one_nocb_kthread(int cpu)
> +static void rcu_spawn_cpu_nocb_kthread(int cpu)
> {
> struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> struct rcu_data *rdp_gp;
> struct task_struct *t;
>
> + if (!rcu_scheduler_fully_active || !rcu_nocb_is_setup)
> + return;
> +
> /* If it already has an rcuo kthread, then nothing to do. */
> if (rdp->nocb_cb_kthread)
> return;
> @@ -1254,16 +1257,6 @@ static void rcu_spawn_one_nocb_kthread(int cpu)
> WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
> }
>
> -/*
> - * If the specified CPU is a no-CBs CPU that does not already have its
> - * rcuo kthread, spawn it.
> - */
> -static void rcu_spawn_cpu_nocb_kthread(int cpu)
> -{
> - if (rcu_scheduler_fully_active && rcu_nocb_is_setup)
> - rcu_spawn_one_nocb_kthread(cpu);
> -}
> -
> /*
> * Once the scheduler is running, spawn rcuo kthreads for all online
> * no-CBs CPUs. This assumes that the early_initcall()s happen before
>
On Wed, Dec 01, 2021 at 02:55:16PM +0530, Neeraj Upadhyay wrote:
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index 2461fe8d0c23..cc1165559177 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -625,7 +625,15 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > * and the global grace-period kthread are awakened if needed.
> > */
> > WARN_ON_ONCE(my_rdp->nocb_gp_rdp != my_rdp);
> > - for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> > + /*
> > + * An rdp can be removed from the list after being de-offloaded or added
> > + * to the list before being (re-)offloaded. If the below loop happens while
> > + * an rdp is de-offloaded and then re-offloaded shortly afterward, we may
> > + * shortcut and ignore a part of the rdp list due to racy list iteration.
> > + * Fortunately a new run through the entire loop is forced after an rdp is
> > + * added here so that such a race gets quickly fixed.
> > + */
> > + list_for_each_entry_rcu(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp, 1) {
>
> Can we hit an (unlikely) case where repeated calls to de-offload/offload
> cause this loop to continue for a long time?
Probably not, I guess the only situation for that to happen would be:
DEOFFLOAD rdp 0
OFFLOAD rdp 0
DEOFFLOAD rdp 1
OFFLOAD rdp 1
DEOFFLOAD rdp 2
OFFLOAD rdp 2
etc...
But even then you'd need very bad luck.
>
>
> > bool needwake_state = false;
> > if (!nocb_gp_enabled_cb(rdp))
>
> Now that we can probe the flags here without holding nocb_gp_lock first
> (the case where de-offload and offload happen while we are iterating
> the list), can it cause a WARNING from the code below?
>
>
> WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
> rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP);
>
> Is a sequence like this possible?
>
> 1. <de-offload>
> Clear SEGCBLIST_OFFLOADED
> 2. nocb_gp_wait() clears SEGCBLIST_KTHREAD_GP in
> nocb_gp_update_state_deoffloading() and continues to next rdp.
> 3. <offload>
> rdp_offload_toggle() hasn't been called yet.
> 4. The rcuog thread migrates to a different CPU while executing the
> loop in nocb_gp_wait().
> 5. nocb_gp_wait() reaches the tail rdp.
> 6. The current CPU, where the rcuog thread is running, hasn't observed
> the SEGCBLIST_OFFLOADED clearing done in step 1; so nocb_gp_enabled_cb()
> passes.
This shouldn't happen. If a task migrates from CPU X to CPU Y, CPU Y is
guaranteed to see what CPU X saw due to scheduler ordering.
But you have a good point that checking those flags outside the nocb lock
is asking for trouble. It used to be an optimization but now that the rdp
entries are removed when they are not needed anymore, we should probably
lock unconditionally before the call to nocb_gp_enabled_cb().
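A rough, untested sketch of that direction, just to make the idea
concrete (not a proposed patch):

	list_for_each_entry_rcu(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp, 1) {
		rcu_nocb_lock_irqsave(rdp, flags);
		/*
		 * nocb_gp_enabled_cb() and the SEGCBLIST_KTHREAD_GP state
		 * handling now run under the lock instead of being sampled
		 * locklessly first.
		 */
		if (!nocb_gp_enabled_cb(rdp)) {
			/* ... de-offloading state update, under the lock ... */
			rcu_nocb_unlock_irqrestore(rdp, flags);
			continue;
		}
		/* ... rest of the existing loop body, lock already held ... */
	}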
Thanks.
> 7. nocb_gp_wait() acquires the rdp's nocb lock and reads the state as
> deoffloaded; however, SEGCBLIST_KTHREAD_GP is not set and
> we hit WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist,
> SEGCBLIST_KTHREAD_GP));
>
> Thanks
> Neeraj
On Wed, Dec 01, 2021 at 02:57:18PM +0530, Neeraj Upadhyay wrote:
>
>
> On 11/23/2021 6:07 AM, Frederic Weisbecker wrote:
> > In order to be able to (de-)offload any CPU using cpuset in the future,
> > create a NOCB kthread for all possible CPUs. For now this is done only
> > as long as the "rcu_nocb=" or "nohz_full=" kernel parameters are passed
> > to avoid the unnecessary overhead for most users.
> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Neeraj Upadhyay <[email protected]>
> > Cc: Boqun Feng <[email protected]>
> > Cc: Uladzislau Rezki <[email protected]>
> > Cc: Josh Triplett <[email protected]>
> > Cc: Joel Fernandes <[email protected]>
> > ---
>
> Reviewed-by: Neeraj Upadhyay <[email protected]>
>
>
> > kernel/rcu/tree_nocb.h | 14 ++++++--------
> > 1 file changed, 6 insertions(+), 8 deletions(-)
> >
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index d8ed3ee47a67..9d37916278d4 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -1229,11 +1229,8 @@ static void rcu_spawn_one_nocb_kthread(int cpu)
> > struct rcu_data *rdp_gp;
> > struct task_struct *t;
> > - /*
> > - * If this isn't a no-CBs CPU or if it already has an rcuo kthread,
> > - * then nothing to do.
> > - */
> > - if (!rcu_is_nocb_cpu(cpu) || rdp->nocb_cb_kthread)
>
> As rcu_is_nocb_cpu() does not have a user, we can probably remove it?
Ah nice, I'll check that.
Thanks!
On Wed, Nov 24, 2021 at 04:30:26PM -0800, Paul E. McKenney wrote:
> On Tue, Nov 23, 2021 at 01:37:05AM +0100, Frederic Weisbecker wrote:
> > Currently cpumask_available() is used to prevent unwanted
> > NOCB initialization. However if neither the "rcu_nocbs=" nor the "nohz_full="
> > parameter is passed but CONFIG_CPUMASK_OFFSTACK=n, the initialization
> > path is still taken, running through all sorts of needless operations
> > and iterations on an empty cpumask.
> >
> > Fix this by relying on a real initialization state instead. This
> > also optimizes kthread creation, sparing the iteration over all online
> > CPUs when nocb isn't initialized.
> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Neeraj Upadhyay <[email protected]>
> > Cc: Boqun Feng <[email protected]>
> > Cc: Uladzislau Rezki <[email protected]>
> > Cc: Josh Triplett <[email protected]>
> > Cc: Joel Fernandes <[email protected]>
> > ---
> > kernel/rcu/tree_nocb.h | 24 +++++++++++++++++-------
> > 1 file changed, 17 insertions(+), 7 deletions(-)
> >
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index e1cb06840454..d8ed3ee47a67 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -60,6 +60,9 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
> > * Parse the boot-time rcu_nocb_mask CPU list from the kernel parameters.
> > * If the list is invalid, a warning is emitted and all CPUs are offloaded.
> > */
> > +
> > +static bool rcu_nocb_is_setup;
>
> I am taking this as is for now (modulo wordsmithing), but should this
> variable instead be in the rcu_state structure? The advantage of putting
> it there is keeping the state together. The corresponding disadvantage
> is that the state is globally visible within RCU.
>
> Thoughts?
Yes good point, will do!
Thanks!