2021-05-16 05:31:17

by Xu, Yanfei

[permalink] [raw]
Subject: [PATCH] rcu: wrong stall log when only one task blocked within RCU read-side

From: Yanfei Xu <[email protected]>

The for loop in rcu_print_task_stall() always omits the first blocked
task in RCU read-ride critical section. It will cause status of the
blocked task can't be printed and lead a wrong ndetected variable.
With the above, the wired stall log comes, All QSes seen, but qsmask
still is not zero:

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-1 rcu_node (CPUs 12-23):
(detected by 15, t=6504 jiffies, g=164777, q=9011209)
rcu: All QSes seen, last rcu_preempt kthread activity 1
(4295252379-4295252378), jiffies_till_next_fqs=1, root ->qsmask 0x2
BUG: sleeping function called from invalid context at
include/linux/uaccess.h:156
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 70613, name:
msgstress04
INFO: lockdep is turned off.
Preemption disabled at:
[<ffff8000104031a4>] create_object.isra.0+0x204/0x4b0
CPU: 15 PID: 70613 Comm: msgstress04 Kdump: loaded Not tainted
5.12.2-yoctodev-standard #1
Hardware name: Marvell OcteonTX CN96XX board (DT)
Call trace:
dump_backtrace+0x0/0x2cc
show_stack+0x24/0x30
dump_stack+0x110/0x188
___might_sleep+0x214/0x2d0
__might_sleep+0x7c/0xe0

Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
Signed-off-by: Yanfei Xu <[email protected]>
---
kernel/rcu/tree_stall.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 59b95cc5cbdf..b72311d24a9f 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -280,8 +280,8 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
break;
}
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
- for (i--; i; i--) {
- t = ts[i];
+ while (i) {
+ t = ts[--i];
if (!try_invoke_on_locked_down_task(t, check_slow_task, &rscr))
pr_cont(" P%d", t->pid);
else
--
2.27.0



2021-05-16 09:02:17

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcu: wrong stall log when only one task blocked within RCU read-side

On Sun, May 16, 2021 at 12:45:11AM +0800, [email protected] wrote:
> From: Yanfei Xu <[email protected]>
>
> The for loop in rcu_print_task_stall() always omits the first blocked
> task in RCU read-ride critical section. It will cause status of the
> blocked task can't be printed and lead a wrong ndetected variable.
> With the above, the wired stall log comes, All QSes seen, but qsmask
> still is not zero:
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: Tasks blocked on level-1 rcu_node (CPUs 12-23):
> (detected by 15, t=6504 jiffies, g=164777, q=9011209)
> rcu: All QSes seen, last rcu_preempt kthread activity 1
> (4295252379-4295252378), jiffies_till_next_fqs=1, root ->qsmask 0x2
> BUG: sleeping function called from invalid context at
> include/linux/uaccess.h:156
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 70613, name:
> msgstress04
> INFO: lockdep is turned off.
> Preemption disabled at:
> [<ffff8000104031a4>] create_object.isra.0+0x204/0x4b0
> CPU: 15 PID: 70613 Comm: msgstress04 Kdump: loaded Not tainted
> 5.12.2-yoctodev-standard #1
> Hardware name: Marvell OcteonTX CN96XX board (DT)
> Call trace:
> dump_backtrace+0x0/0x2cc
> show_stack+0x24/0x30
> dump_stack+0x110/0x188
> ___might_sleep+0x214/0x2d0
> __might_sleep+0x7c/0xe0
>
> Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
> Signed-off-by: Yanfei Xu <[email protected]>

Apologies for that bug, and thank you for the fix! Queued for testing
and review. I wordsmithed as usual, so could you please check to make
sure that I did not mess anything up?

Thanx, Paul

------------------------------------------------------------------------

commit 9855fdb508e2fa136551294861682d83b71daded
Author: Yanfei Xu <[email protected]>
Date: Sun May 16 00:45:11 2021 +0800

rcu: Fix to include first blocked task in stall warning

The for loop in rcu_print_task_stall() always omits ts[0], which points
to the first task blocking the stalled grace period. This in turn fails
to count this first task, which means that ndetected will be equal to
zero when all CPUs have passed through their quiescent states and only
one task is blocking the stalled grace period. This zero value for
ndetected will in turn result in an incorrect "All QSes seen" message:

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-1 rcu_node (CPUs 12-23):
(detected by 15, t=6504 jiffies, g=164777, q=9011209)
rcu: All QSes seen, last rcu_preempt kthread activity 1 (4295252379-4295252378), jiffies_till_next_fqs=1, root ->qsmask 0x2
BUG: sleeping function called from invalid context at include/linux/uaccess.h:156
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 70613, name: msgstress04
INFO: lockdep is turned off.
Preemption disabled at:
[<ffff8000104031a4>] create_object.isra.0+0x204/0x4b0
CPU: 15 PID: 70613 Comm: msgstress04 Kdump: loaded Not tainted
5.12.2-yoctodev-standard #1
Hardware name: Marvell OcteonTX CN96XX board (DT)
Call trace:
dump_backtrace+0x0/0x2cc
show_stack+0x24/0x30
dump_stack+0x110/0x188
___might_sleep+0x214/0x2d0
__might_sleep+0x7c/0xe0

This commit therefore fixes the loop to include ts[0].

Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
Signed-off-by: Yanfei Xu <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index f4152aa18f48..a10ea1f1f81f 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -280,8 +280,8 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
break;
}
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
- for (i--; i; i--) {
- t = ts[i];
+ while (i) {
+ t = ts[--i];
if (!try_invoke_on_locked_down_task(t, check_slow_task, &rscr))
pr_cont(" P%d", t->pid);
else

2021-05-16 10:15:24

by Xu, Yanfei

[permalink] [raw]
Subject: Re: [PATCH] rcu: wrong stall log when only one task blocked within RCU read-side



On 5/16/21 2:36 AM, Paul E. McKenney wrote:
> [Please note: This e-mail is from an EXTERNAL e-mail address]
>
> On Sun, May 16, 2021 at 12:45:11AM +0800, [email protected] wrote:
>> From: Yanfei Xu <[email protected]>
>>
>> The for loop in rcu_print_task_stall() always omits the first blocked
>> task in RCU read-ride critical section. It will cause status of the
>> blocked task can't be printed and lead a wrong ndetected variable.
>> With the above, the wired stall log comes, All QSes seen, but qsmask
>> still is not zero:
>>
>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> rcu: Tasks blocked on level-1 rcu_node (CPUs 12-23):
>> (detected by 15, t=6504 jiffies, g=164777, q=9011209)
>> rcu: All QSes seen, last rcu_preempt kthread activity 1
>> (4295252379-4295252378), jiffies_till_next_fqs=1, root ->qsmask 0x2
>> BUG: sleeping function called from invalid context at
>> include/linux/uaccess.h:156
>> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 70613, name:
>> msgstress04
>> INFO: lockdep is turned off.
>> Preemption disabled at:
>> [<ffff8000104031a4>] create_object.isra.0+0x204/0x4b0
>> CPU: 15 PID: 70613 Comm: msgstress04 Kdump: loaded Not tainted
>> 5.12.2-yoctodev-standard #1
>> Hardware name: Marvell OcteonTX CN96XX board (DT)
>> Call trace:
>> dump_backtrace+0x0/0x2cc
>> show_stack+0x24/0x30
>> dump_stack+0x110/0x188
>> ___might_sleep+0x214/0x2d0
>> __might_sleep+0x7c/0xe0
>>
>> Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
>> Signed-off-by: Yanfei Xu <[email protected]>
>
> Apologies for that bug, and thank you for the fix! Queued for testing
> and review. I wordsmithed as usual, so could you please check to make
> sure that I did not mess anything up?
>

Ah, this new commit message is more clear. Thank you!

Regards,
Yanfei

> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 9855fdb508e2fa136551294861682d83b71daded
> Author: Yanfei Xu <[email protected]>
> Date: Sun May 16 00:45:11 2021 +0800
>
> rcu: Fix to include first blocked task in stall warning
>
> The for loop in rcu_print_task_stall() always omits ts[0], which points
> to the first task blocking the stalled grace period. This in turn fails
> to count this first task, which means that ndetected will be equal to
> zero when all CPUs have passed through their quiescent states and only
> one task is blocking the stalled grace period. This zero value for
> ndetected will in turn result in an incorrect "All QSes seen" message:
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: Tasks blocked on level-1 rcu_node (CPUs 12-23):
> (detected by 15, t=6504 jiffies, g=164777, q=9011209)
> rcu: All QSes seen, last rcu_preempt kthread activity 1 (4295252379-4295252378), jiffies_till_next_fqs=1, root ->qsmask 0x2
> BUG: sleeping function called from invalid context at include/linux/uaccess.h:156
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 70613, name: msgstress04
> INFO: lockdep is turned off.
> Preemption disabled at:
> [<ffff8000104031a4>] create_object.isra.0+0x204/0x4b0
> CPU: 15 PID: 70613 Comm: msgstress04 Kdump: loaded Not tainted
> 5.12.2-yoctodev-standard #1
> Hardware name: Marvell OcteonTX CN96XX board (DT)
> Call trace:
> dump_backtrace+0x0/0x2cc
> show_stack+0x24/0x30
> dump_stack+0x110/0x188
> ___might_sleep+0x214/0x2d0
> __might_sleep+0x7c/0xe0
>
> This commit therefore fixes the loop to include ts[0].
>
> Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
> Signed-off-by: Yanfei Xu <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index f4152aa18f48..a10ea1f1f81f 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -280,8 +280,8 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
> break;
> }
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> - for (i--; i; i--) {
> - t = ts[i];
> + while (i) {
> + t = ts[--i];
> if (!try_invoke_on_locked_down_task(t, check_slow_task, &rscr))
> pr_cont(" P%d", t->pid);
> else
>