Hello!
This series contains miscellaneous fixes:
1. Decrease FQS scan wait time in case of callback overloading.
2. Avoid tracing a few functions executed in stop machine, courtesy
of Patrick Wang.
3. Add rnp->cbovldmask check in rcutree_migrate_callbacks(),
courtesy of Zqiang.
4. Immediately boost preempted readers for strict grace periods,
courtesy of Zqiang.
5. Forbid RCU_STRICT_GRACE_PERIOD in TINY_RCU kernels.
6. locking/csd_lock: Change csdlock_debug from early_param to
__setup, courtesy of Chen Zhongjin.
7. tiny: Record kvfree_call_rcu() call stack for KASAN, courtesy
of Johannes Berg.
8. Cleanup RCU urgency state for offline CPU, courtesy of Zqiang.
9. Remove useless monitor_todo flag, courtesy of "Joel Fernandes
(Google)".
10. Initialize first_gp_fqs at declaration in rcu_gp_fqs().
11. Add comment to describe GP-done condition in fqs loop, courtesy
of Neeraj Upadhyay.
12. Block less aggressively for expedited grace periods.
Thanx, Paul
------------------------------------------------------------------------
 b/include/linux/rcutiny.h  |   11 +++++++++-
 b/kernel/rcu/Kconfig.debug |    2 -
 b/kernel/rcu/srcutree.c    |   20 ++++++++++++------
 b/kernel/rcu/tiny.c        |   14 +++++++++++++
 b/kernel/rcu/tree.c        |    5 ++++
 b/kernel/rcu/tree_plugin.h |    6 ++---
 b/kernel/smp.c             |    4 +--
 kernel/rcu/tree.c          |   48 ++++++++++++++++++++++++++-------------------
 kernel/rcu/tree_plugin.h   |    3 +-
 9 files changed, 78 insertions(+), 35 deletions(-)
From: Chen Zhongjin <[email protected]>
The csdlock_debug kernel-boot parameter is parsed by the
early_param() function csdlock_debug(). If set, csdlock_debug()
invokes static_branch_enable() to enable the csd_lock_wait feature, which
triggers a panic on arm64 for kernels built with CONFIG_SPARSEMEM=y and
CONFIG_SPARSEMEM_VMEMMAP=n.
With CONFIG_SPARSEMEM_VMEMMAP=n, __nr_to_section is called in
static_key_enable() and returns NULL, resulting in a NULL dereference
because mem_section is initialized only later in sparse_init().
This is also a problem for powerpc because early_param() functions
are invoked earlier than jump_label_init(), also resulting in
static_key_enable() failures. These failures cause the warning "static
key 'xxx' used before call to jump_label_init()".
Thus, early_param() is too early for csd_lock_wait to run
static_branch_enable(), so this commit changes it to __setup() to fix
both problems.
Fixes: 8d0968cc6b8f ("locking/csd_lock: Add boot parameter for controlling CSD lock debugging")
Cc: [email protected]
Reported-by: Chen jingwen <[email protected]>
Signed-off-by: Chen Zhongjin <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/smp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index dd215f4394264..650810a6f29b3 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -174,9 +174,9 @@ static int __init csdlock_debug(char *str)
 	if (val)
 		static_branch_enable(&csdlock_debug_enabled);
 
-	return 0;
+	return 1;
 }
-early_param("csdlock_debug", csdlock_debug);
+__setup("csdlock_debug=", csdlock_debug);
 
 static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
 static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
--
2.31.1.189.g2e36527f23
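
For reference, here is a minimal sketch (hypothetical "my_debug" parameter
and static key, not part of the patch) of the __setup() style that the
patch switches to, including the return-value convention behind the
"return 0" to "return 1" hunk: early_param() handlers return 0 on success
but can run from setup_arch() via parse_early_param(), before
jump_label_init() and sparse_init(), whereas __setup() handlers run from
the later parse_args() pass in start_kernel() and return 1 to mark the
option as consumed.

	#include <linux/init.h>
	#include <linux/jump_label.h>
	#include <linux/kernel.h>

	static DEFINE_STATIC_KEY_FALSE(my_debug_enabled);	/* hypothetical key */

	static int __init my_debug_setup(char *str)
	{
		int val = 0;

		get_option(&str, &val);		/* parse "my_debug=<int>" */
		if (val)
			static_branch_enable(&my_debug_enabled); /* safe this late in boot */
		return 1;			/* __setup(): option consumed */
	}
	__setup("my_debug=", my_debug_setup);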
The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
callback overloading and does an immediate initial scan for idle CPUs
if so. However, subsequent rescans will be carried out at as leisurely a
rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
module parameter. It might be tempting to just continue immediately
rescanning, but this turns the RCU grace-period kthread into a CPU hog.
It might also be tempting to reduce the time between rescans to a single
jiffy, but this can be problematic on larger systems.
This commit therefore divides the normal time between rescans by three,
rounding up. Thus a small system running at HZ=1000 that is suffering
from callback overload will wait only one jiffy instead of the normal
three between rescans.
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tree.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c25ba442044a6..c19d5926886fb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
 			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
 				   jiffies + (j ? 3 * j : 2));
 		}
+		if (rcu_state.cbovld) {
+			j = (j + 2) / 3;
+			if (j <= 0)
+				j = 1;
+		}
 		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
 				       TPS("fqswait"));
 		WRITE_ONCE(rcu_state.gp_state, RCU_GP_WAIT_FQS);
--
2.31.1.189.g2e36527f23
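
As an aside, (j + 2) / 3 is integer division by three rounding up; here is
a standalone userspace sketch of the arithmetic (hypothetical values, with
the patch's floor-of-one-jiffy check carried over):

	#include <stdio.h>

	static unsigned long overload_wait(unsigned long j)
	{
		j = (j + 2) / 3;	/* ceiling division by 3 */
		if (j <= 0)		/* 'j' is unsigned, so this is j == 0 */
			j = 1;		/* never wait zero jiffies */
		return j;
	}

	int main(void)
	{
		/* The HZ=1000 default of 3 jiffies drops to 1 jiffy. */
		printf("j=3 -> %lu\n", overload_wait(3));
		/* An already-minimal wait stays at the 1-jiffy floor. */
		printf("j=1 -> %lu\n", overload_wait(1));
		return 0;
	}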
On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
> callback overloading and does an immediate initial scan for idle CPUs
> if so. However, subsequent rescans will be carried out at as leisurely a
> rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
> module parameter. It might be tempting to just continue immediately
> rescanning, but this turns the RCU grace-period kthread into a CPU hog.
> It might also be tempting to reduce the time between rescans to a single
> jiffy, but this can be problematic on larger systems.
>
> This commit therefore divides the normal time between rescans by three,
> rounding up. Thus a small system running at HZ=1000 that is suffering
> from callback overload will wait only one jiffy instead of the normal
> three between rescans.
>
> Signed-off-by: Paul E. McKenney <[email protected]>
> ---
> kernel/rcu/tree.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c25ba442044a6..c19d5926886fb 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
>  			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
>  				   jiffies + (j ? 3 * j : 2));
>  		}
> +		if (rcu_state.cbovld) {
> +			j = (j + 2) / 3;
> +			if (j <= 0)
> +				j = 1;
> +		}
We update 'j' here, after setting rcu_state.jiffies_force_qs:

	WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);

So, we return from swait_event_idle_timeout_exclusive() after 1/3 of
the usual wait duration:

	swait_event_idle_timeout_exclusive(rcu_state.gp_wq,
			rcu_gp_fqs_check_wake(&gf), j);

This can result in the !time_after() check returning false, so we will
enter the 'else' (stray signal) block?

This might not matter for the first 2 fqs loop iterations, where
RCU_GP_FLAG_OVLD is set in 'gf', but subsequent iterations won't benefit
from this patch?

	if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
	    (gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
		...
	} else {
		/* Deal with stray signal. */
	}

So, do we need to move this calculation above the 'if' block which sets
rcu_state.jiffies_force_qs?

	if (!ret) {
		WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
		...
	}
Thanks
Neeraj
>  		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
>  				       TPS("fqswait"));
>  		WRITE_ONCE(rcu_state.gp_state, RCU_GP_WAIT_FQS);
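
To make the ordering concern concrete, here is a minimal userspace mockup
(hypothetical numbers; time_after() re-implemented for illustration) of
why the v1 placement can route the wakeup into the stray-signal branch:

	#include <stdio.h>

	/* Userspace stand-in for the kernel's time_after() macro. */
	#define time_after(a, b)	((long)((b) - (a)) < 0)

	int main(void)
	{
		unsigned long jiffies = 1000;	/* hypothetical current time */
		unsigned long j = 3;		/* e.g. jiffies_till_next_fqs at HZ=1000 */

		unsigned long force_qs = jiffies + j;	/* deadline recorded first: 1003 */
		j = (j + 2) / 3;			/* v1 divides too late: j = 1 */
		jiffies += j;				/* wait times out near 1001 */

		/* The deadline is still in the future, so absent FQS/OVLD
		 * flags in 'gf', the kthread takes the 'else' branch. */
		printf("!time_after() = %d (0 means stray-signal path)\n",
		       !time_after(force_qs, jiffies));
		return 0;
	}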
On Tue, Jun 21, 2022 at 10:59:58AM +0530, Neeraj Upadhyay wrote:
>
>
> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> > The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
> > callback overloading and does an immediate initial scan for idle CPUs
> > if so. However, subsequent rescans will be carried out at as leisurely a
> > rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
> > module parameter. It might be tempting to just continue immediately
> > rescanning, but this turns the RCU grace-period kthread into a CPU hog.
> > It might also be tempting to reduce the time between rescans to a single
> > jiffy, but this can be problematic on larger systems.
> >
> > This commit therefore divides the normal time between rescans by three,
> > rounding up. Thus a small system running at HZ=1000 that is suffering
> > from callback overload will wait only one jiffy instead of the normal
> > three between rescans.
> >
> > Signed-off-by: Paul E. McKenney <[email protected]>
> > ---
> > kernel/rcu/tree.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index c25ba442044a6..c19d5926886fb 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
> >  			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
> >  				   jiffies + (j ? 3 * j : 2));
> >  		}
> > +		if (rcu_state.cbovld) {
> > +			j = (j + 2) / 3;
> > +			if (j <= 0)
> > +				j = 1;
> > +		}
>
> We update 'j' here, after setting rcu_state.jiffies_force_qs:
>
> 	WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
>
> So, we return from swait_event_idle_timeout_exclusive() after 1/3 of
> the usual wait duration:
>
> 	swait_event_idle_timeout_exclusive(rcu_state.gp_wq,
> 			rcu_gp_fqs_check_wake(&gf), j);
>
> This can result in the !time_after() check returning false, so we will
> enter the 'else' (stray signal) block?
>
> This might not matter for the first 2 fqs loop iterations, where
> RCU_GP_FLAG_OVLD is set in 'gf', but subsequent iterations won't benefit
> from this patch?
>
> 	if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
> 	    (gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
> 		...
> 	} else {
> 		/* Deal with stray signal. */
> 	}
>
> So, do we need to move this calculation above the 'if' block which sets
> rcu_state.jiffies_force_qs?
>
> 	if (!ret) {
> 		WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
> 		...
> 	}
Good catch, thank you! How about the updated patch shown below?
Thanx, Paul
------------------------------------------------------------------------
commit 77de092c78f549b5c28075bfee9998a525d21f84
Author: Paul E. McKenney <[email protected]>
Date: Tue Apr 12 15:08:14 2022 -0700
rcu: Decrease FQS scan wait time in case of callback overloading
The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
callback overloading and does an immediate initial scan for idle CPUs
if so. However, subsequent rescans will be carried out at as leisurely a
rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
module parameter. It might be tempting to just continue immediately
rescanning, but this turns the RCU grace-period kthread into a CPU hog.
It might also be tempting to reduce the time between rescans to a single
jiffy, but this can be problematic on larger systems.
This commit therefore divides the normal time between rescans by three,
rounding up. Thus a small system running at HZ=1000 that is suffering
from callback overload will wait only one jiffy instead of the normal
three between rescans.
[ paulmck: Apply Neeraj Upadhyay feedback. ]
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c25ba442044a6..52094e72866e5 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1983,7 +1983,12 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
 		gf = RCU_GP_FLAG_OVLD;
 	ret = 0;
 	for (;;) {
-		if (!ret) {
+		if (rcu_state.cbovld) {
+			j = (j + 2) / 3;
+			if (j <= 0)
+				j = 1;
+		}
+		if (!ret || time_before(jiffies + j, rcu_state.jiffies_force_qs)) {
 			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
 			/*
 			 * jiffies_force_qs before RCU_GP_WAIT_FQS state
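
With this ordering, the shortened 'j' is computed before the deadline is
written, and the added time_before() check re-arms rcu_state.jiffies_force_qs
whenever the reduced wait would land before the previously recorded deadline,
so the later iterations Neeraj called out also benefit from the shorter
rescan interval.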
On 6/22/2022 3:49 AM, Paul E. McKenney wrote:
> On Tue, Jun 21, 2022 at 10:59:58AM +0530, Neeraj Upadhyay wrote:
>>
>>
>> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
>>> The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
>>> callback overloading and does an immediate initial scan for idle CPUs
>>> if so. However, subsequent rescans will be carried out at as leisurely a
>>> rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
>>> module parameter. It might be tempting to just continue immediately
>>> rescanning, but this turns the RCU grace-period kthread into a CPU hog.
>>> It might also be tempting to reduce the time between rescans to a single
>>> jiffy, but this can be problematic on larger systems.
>>>
>>> This commit therefore divides the normal time between rescans by three,
>>> rounding up. Thus a small system running at HZ=1000 that is suffering
>>> from callback overload will wait only one jiffy instead of the normal
>>> three between rescans.
>>>
>>> Signed-off-by: Paul E. McKenney <[email protected]>
>>> ---
>>> kernel/rcu/tree.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>> index c25ba442044a6..c19d5926886fb 100644
>>> --- a/kernel/rcu/tree.c
>>> +++ b/kernel/rcu/tree.c
>>> @@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
>>>  			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
>>>  				   jiffies + (j ? 3 * j : 2));
>>>  		}
>>> +		if (rcu_state.cbovld) {
>>> +			j = (j + 2) / 3;
>>> +			if (j <= 0)
>>> +				j = 1;
>>> +		}
>>
>> We update 'j' here, after setting rcu_state.jiffies_force_qs:
>>
>> 	WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
>>
>> So, we return from swait_event_idle_timeout_exclusive() after 1/3 of
>> the usual wait duration:
>>
>> 	swait_event_idle_timeout_exclusive(rcu_state.gp_wq,
>> 			rcu_gp_fqs_check_wake(&gf), j);
>>
>> This can result in the !time_after() check returning false, so we will
>> enter the 'else' (stray signal) block?
>>
>> This might not matter for the first 2 fqs loop iterations, where
>> RCU_GP_FLAG_OVLD is set in 'gf', but subsequent iterations won't benefit
>> from this patch?
>>
>> 	if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
>> 	    (gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
>> 		...
>> 	} else {
>> 		/* Deal with stray signal. */
>> 	}
>>
>> So, do we need to move this calculation above the 'if' block which sets
>> rcu_state.jiffies_force_qs?
>>
>> 	if (!ret) {
>> 		WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
>> 		...
>> 	}
>
> Good catch, thank you! How about the updated patch shown below?
>
Looks good to me.
Thanks
Neeraj
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 77de092c78f549b5c28075bfee9998a525d21f84
> Author: Paul E. McKenney <[email protected]>
> Date: Tue Apr 12 15:08:14 2022 -0700
>
> rcu: Decrease FQS scan wait time in case of callback overloading
>
> The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
> callback overloading and does an immediate initial scan for idle CPUs
> if so. However, subsequent rescans will be carried out at as leisurely a
> rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
> module parameter. It might be tempting to just continue immediately
> rescanning, but this turns the RCU grace-period kthread into a CPU hog.
> It might also be tempting to reduce the time between rescans to a single
> jiffy, but this can be problematic on larger systems.
>
> This commit therefore divides the normal time between rescans by three,
> rounding up. Thus a small system running at HZ=1000 that is suffering
> from callback overload will wait only one jiffy instead of the normal
> three between rescans.
>
> [ paulmck: Apply Neeraj Upadhyay feedback. ]
>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c25ba442044a6..52094e72866e5 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1983,7 +1983,12 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
>  		gf = RCU_GP_FLAG_OVLD;
>  	ret = 0;
>  	for (;;) {
> -		if (!ret) {
> +		if (rcu_state.cbovld) {
> +			j = (j + 2) / 3;
> +			if (j <= 0)
> +				j = 1;
> +		}
> +		if (!ret || time_before(jiffies + j, rcu_state.jiffies_force_qs)) {
>  			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
>  			/*
>  			 * jiffies_force_qs before RCU_GP_WAIT_FQS state