2023-07-21 08:19:18

by Leizhen (ThunderTown)

[permalink] [raw]
Subject: [PATCH 0/2] rcu: Simplify check_cpu_stall()

From: Zhen Lei <[email protected]>

Eliminate the duplicated code and comments.

Zhen Lei (2):
rcu: Delete a redundant check in check_cpu_stall()
rcu: Simplify check_cpu_stall()

kernel/rcu/tree_stall.h | 36 +++++++++++-------------------------
1 file changed, 11 insertions(+), 25 deletions(-)

--
2.25.1



2023-07-21 08:21:31

by Leizhen (ThunderTown)

[permalink] [raw]
Subject: [PATCH 1/2] rcu: Delete a redundant check in check_cpu_stall()

From: Zhen Lei <[email protected]>

j = jiffies;
js = READ_ONCE(rcu_state.jiffies_stall); (1)
if (ULONG_CMP_LT(j, js)) (2)
return;

if (cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) (3)
didstall = true;

if (didstall && READ_ONCE(rcu_state.jiffies_stall) == jn) { (4)
jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
WRITE_ONCE(rcu_state.jiffies_stall, jn);
}

For ease of description, the pseudo code is extracted as above. First,
assume that only one CPU is operating, the condition 'didstall' is true
means that (3) succeeds. That is, the value of rcu_state.jiffies_stall
must be 'jn'.

Then, assuming that another CPU is also operating at the same time, there
are two cases:
1. That CPU sees the updated result at (1), it returns at (2).
2. That CPU does not see the updated result at (1), it fails at (3) and
cmpxchg returns jn. So that, didstall cannot be true.

Signed-off-by: Zhen Lei <[email protected]>
---
kernel/rcu/tree_stall.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index cc884cd49e026a3..371713f3f7d15d9 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -794,7 +794,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
rcu_ftrace_dump(DUMP_ALL);
didstall = true;
}
- if (didstall && READ_ONCE(rcu_state.jiffies_stall) == jn) {
+ if (didstall) {
jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
WRITE_ONCE(rcu_state.jiffies_stall, jn);
}
--
2.25.1


2023-07-21 08:23:02

by Leizhen (ThunderTown)

[permalink] [raw]
Subject: [PATCH 2/2] rcu: Simplify check_cpu_stall()

From: Zhen Lei <[email protected]>

The code and comments of self detected and other detected are the same
except the output function. Combine the same parts to avoid duplication.

Signed-off-by: Zhen Lei <[email protected]>
---
kernel/rcu/tree_stall.h | 36 +++++++++++-------------------------
1 file changed, 11 insertions(+), 25 deletions(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 371713f3f7d15d9..0ade7f9dcaa18da 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -711,7 +711,7 @@ static void print_cpu_stall(unsigned long gps)

static void check_cpu_stall(struct rcu_data *rdp)
{
- bool didstall = false;
+ bool self_detected;
unsigned long gs1;
unsigned long gs2;
unsigned long gps;
@@ -758,10 +758,10 @@ static void check_cpu_stall(struct rcu_data *rdp)
return; /* No stall or GP completed since entering function. */
rnp = rdp->mynode;
jn = jiffies + ULONG_MAX / 2;
+ self_detected = READ_ONCE(rnp->qsmask) & rdp->grpmask;
if (rcu_gp_in_progress() &&
- (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
+ (self_detected || ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) &&
cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
-
/*
* If a virtual machine is stopped by the host it can look to
* the watchdog like an RCU stall. Check to see if the host
@@ -770,31 +770,17 @@ static void check_cpu_stall(struct rcu_data *rdp)
if (kvm_check_and_clear_guest_paused())
return;

- /* We haven't checked in, so go dump stack. */
- print_cpu_stall(gps);
- if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
- rcu_ftrace_dump(DUMP_ALL);
- didstall = true;
-
- } else if (rcu_gp_in_progress() &&
- ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
- cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
-
- /*
- * If a virtual machine is stopped by the host it can look to
- * the watchdog like an RCU stall. Check to see if the host
- * stopped the vm.
- */
- if (kvm_check_and_clear_guest_paused())
- return;
+ if (self_detected) {
+ /* We haven't checked in, so go dump stack. */
+ print_cpu_stall(gps);
+ } else {
+ /* They had a few time units to dump stack, so complain. */
+ print_other_cpu_stall(gs2, gps);
+ }

- /* They had a few time units to dump stack, so complain. */
- print_other_cpu_stall(gs2, gps);
if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
rcu_ftrace_dump(DUMP_ALL);
- didstall = true;
- }
- if (didstall) {
+
jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
WRITE_ONCE(rcu_state.jiffies_stall, jn);
}
--
2.25.1


2023-07-21 23:59:49

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 1/2] rcu: Delete a redundant check in check_cpu_stall()

On Fri, Jul 21, 2023 at 03:57:15PM +0800, [email protected] wrote:
> From: Zhen Lei <[email protected]>
>
> j = jiffies;
> js = READ_ONCE(rcu_state.jiffies_stall); (1)
> if (ULONG_CMP_LT(j, js)) (2)
> return;
>
> if (cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) (3)
> didstall = true;
>
> if (didstall && READ_ONCE(rcu_state.jiffies_stall) == jn) { (4)
> jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
> WRITE_ONCE(rcu_state.jiffies_stall, jn);
> }
>
> For ease of description, the pseudo code is extracted as above. First,
> assume that only one CPU is operating, the condition 'didstall' is true
> means that (3) succeeds. That is, the value of rcu_state.jiffies_stall
> must be 'jn'.
>
> Then, assuming that another CPU is also operating at the same time, there
> are two cases:
> 1. That CPU sees the updated result at (1), it returns at (2).
> 2. That CPU does not see the updated result at (1), it fails at (3) and
> cmpxchg returns jn. So that, didstall cannot be true.

The history behind this one is that there are races in which the stall
can end in the midst of check_cpu_stall(). For example, when the activity
of producing the stall warning breaks things loose.

And yes, long ago, I figured that if things had been static for so
many seconds, they were unlikely to change, and thus omitted any and
all synchronization. The Linux kernel taught me better. ;-)

Thanx, Paul

> Signed-off-by: Zhen Lei <[email protected]>
> ---
> kernel/rcu/tree_stall.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index cc884cd49e026a3..371713f3f7d15d9 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -794,7 +794,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
> rcu_ftrace_dump(DUMP_ALL);
> didstall = true;
> }
> - if (didstall && READ_ONCE(rcu_state.jiffies_stall) == jn) {
> + if (didstall) {
> jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
> WRITE_ONCE(rcu_state.jiffies_stall, jn);
> }
> --
> 2.25.1
>

2023-07-22 00:08:14

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 2/2] rcu: Simplify check_cpu_stall()

On Fri, Jul 21, 2023 at 03:57:16PM +0800, [email protected] wrote:
> From: Zhen Lei <[email protected]>
>
> The code and comments of self detected and other detected are the same
> except the output function. Combine the same parts to avoid duplication.
>
> Signed-off-by: Zhen Lei <[email protected]>

This looks like a nice simplification, thank you!

Please make this not depend on the earlier patch and resend.

Thanx, Paul

> ---
> kernel/rcu/tree_stall.h | 36 +++++++++++-------------------------
> 1 file changed, 11 insertions(+), 25 deletions(-)
>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 371713f3f7d15d9..0ade7f9dcaa18da 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -711,7 +711,7 @@ static void print_cpu_stall(unsigned long gps)
>
> static void check_cpu_stall(struct rcu_data *rdp)
> {
> - bool didstall = false;
> + bool self_detected;
> unsigned long gs1;
> unsigned long gs2;
> unsigned long gps;
> @@ -758,10 +758,10 @@ static void check_cpu_stall(struct rcu_data *rdp)
> return; /* No stall or GP completed since entering function. */
> rnp = rdp->mynode;
> jn = jiffies + ULONG_MAX / 2;
> + self_detected = READ_ONCE(rnp->qsmask) & rdp->grpmask;
> if (rcu_gp_in_progress() &&
> - (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
> + (self_detected || ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) &&
> cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> -
> /*
> * If a virtual machine is stopped by the host it can look to
> * the watchdog like an RCU stall. Check to see if the host
> @@ -770,31 +770,17 @@ static void check_cpu_stall(struct rcu_data *rdp)
> if (kvm_check_and_clear_guest_paused())
> return;
>
> - /* We haven't checked in, so go dump stack. */
> - print_cpu_stall(gps);
> - if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
> - rcu_ftrace_dump(DUMP_ALL);
> - didstall = true;
> -
> - } else if (rcu_gp_in_progress() &&
> - ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
> - cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> -
> - /*
> - * If a virtual machine is stopped by the host it can look to
> - * the watchdog like an RCU stall. Check to see if the host
> - * stopped the vm.
> - */
> - if (kvm_check_and_clear_guest_paused())
> - return;
> + if (self_detected) {
> + /* We haven't checked in, so go dump stack. */
> + print_cpu_stall(gps);
> + } else {
> + /* They had a few time units to dump stack, so complain. */
> + print_other_cpu_stall(gs2, gps);
> + }
>
> - /* They had a few time units to dump stack, so complain. */
> - print_other_cpu_stall(gs2, gps);
> if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
> rcu_ftrace_dump(DUMP_ALL);
> - didstall = true;
> - }
> - if (didstall) {
> +
> jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
> WRITE_ONCE(rcu_state.jiffies_stall, jn);
> }
> --
> 2.25.1
>

2023-07-24 03:47:26

by Leizhen (ThunderTown)

[permalink] [raw]
Subject: Re: [PATCH 1/2] rcu: Delete a redundant check in check_cpu_stall()



On 2023/7/22 6:37, Paul E. McKenney wrote:
> On Fri, Jul 21, 2023 at 03:57:15PM +0800, [email protected] wrote:
>> From: Zhen Lei <[email protected]>
>>
>> j = jiffies;
>> js = READ_ONCE(rcu_state.jiffies_stall); (1)
>> if (ULONG_CMP_LT(j, js)) (2)
>> return;
>>
>> if (cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) (3)
>> didstall = true;
>>
>> if (didstall && READ_ONCE(rcu_state.jiffies_stall) == jn) { (4)
>> jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
>> WRITE_ONCE(rcu_state.jiffies_stall, jn);
>> }
>>
>> For ease of description, the pseudo code is extracted as above. First,
>> assume that only one CPU is operating, the condition 'didstall' is true
>> means that (3) succeeds. That is, the value of rcu_state.jiffies_stall
>> must be 'jn'.
>>
>> Then, assuming that another CPU is also operating at the same time, there
>> are two cases:
>> 1. That CPU sees the updated result at (1), it returns at (2).
>> 2. That CPU does not see the updated result at (1), it fails at (3) and
>> cmpxchg returns jn. So that, didstall cannot be true.
>
> The history behind this one is that there are races in which the stall
> can end in the midst of check_cpu_stall(). For example, when the activity
> of producing the stall warning breaks things loose.
>
> And yes, long ago, I figured that if things had been static for so
> many seconds, they were unlikely to change, and thus omitted any and
> all synchronization. The Linux kernel taught me better. ;-)

Okay, got it, thank you

>
> Thanx, Paul
>
>> Signed-off-by: Zhen Lei <[email protected]>
>> ---
>> kernel/rcu/tree_stall.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
>> index cc884cd49e026a3..371713f3f7d15d9 100644
>> --- a/kernel/rcu/tree_stall.h
>> +++ b/kernel/rcu/tree_stall.h
>> @@ -794,7 +794,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>> rcu_ftrace_dump(DUMP_ALL);
>> didstall = true;
>> }
>> - if (didstall && READ_ONCE(rcu_state.jiffies_stall) == jn) {
>> + if (didstall) {
>> jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
>> WRITE_ONCE(rcu_state.jiffies_stall, jn);
>> }
>> --
>> 2.25.1
>>
> .
>

--
Regards,
Zhen Lei