2018-11-30 15:13:49

by Prateek Sood

Subject: [PATCH] percpu_rwsem: fix missed wakeup due to reordering of load

Consider a scenario where the cpu_hotplug_lock percpu_rw_semaphore has already
been acquired for a read operation by P1 using percpu_down_read().

Now P1 is in the path of releasing the cpu_hotplug_lock and P2 is in the
process of acquiring the cpu_hotplug_lock.

P1                                          P2
percpu_up_read() path                       percpu_down_write() path

                                            rcu_sync_enter() //gp_state=GP_PASSED

rcu_sync_is_idle() //returns false          down_write(rw_sem)

__percpu_up_read()

[L] task = rcu_dereference(w->task) //NULL

smp_rmb()                                   [S] w->task = current

                                            smp_mb()

                                            [L] readers_active_check() //fails
                                            schedule()

[S] __this_cpu_dec(read_count)

Since the load of task can result in NULL, this can lead to a missed wakeup
in rcuwait_wake_up(). The above sequence violates the following constraint
in rcuwait_wake_up():

     WAIT                     WAKE
[S] tsk = current        [S] cond = true
     MB (A)                   MB (B)
[L] cond                 [L] tsk

This can happen because, on arm64, the smp_rmb() in rcuwait_wake_up() only
orders the loads before the barrier against the loads and stores after the
barrier. The requirement here is to order the store before the smp_rmb()
against the load after the smp_rmb().

For the usage of rcuwait_wake_up() in __percpu_up_read(), a full barrier
(smp_mb()) is required to satisfy the constraint of rcuwait_wake_up().
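
For illustration only, the WAIT/WAKE constraint above is the classic
store-buffering pattern. Below is a standalone C11 sketch of it; the variable
names and atomic types are stand-ins invented here, not the kernel code. With
full (seq_cst) fences at both (A) and (B), at least one side is guaranteed to
observe the other's store; weakening (B) to a read/acquire fence allows both
loads to miss, which is exactly the lost wakeup.

#include <stdatomic.h>
#include <stdbool.h>

/*
 * Stand-ins: read_count != 0 means "readers active"; waiter_present != 0
 * plays the role of w->task being non-NULL.
 */
static _Atomic int read_count = 1;
static _Atomic int waiter_present = 0;

/* WAKE side: __percpu_up_read() + rcuwait_wake_up() */
static bool wake_side(void)
{
        /* [S] cond = true (the reader drops its count) */
        atomic_fetch_sub_explicit(&read_count, 1, memory_order_relaxed);
        /*
         * MB (B): must be a full fence; a read fence does not order the
         * store above against the load below.
         */
        atomic_thread_fence(memory_order_seq_cst);
        /* [L] tsk: is there a waiter to wake? */
        return atomic_load_explicit(&waiter_present, memory_order_relaxed) != 0;
}

/* WAIT side: percpu_down_write() via rcuwait_wait_event() */
static bool wait_side(void)
{
        /* [S] tsk = current (publish the waiter) */
        atomic_store_explicit(&waiter_present, 1, memory_order_relaxed);
        /* MB (A) */
        atomic_thread_fence(memory_order_seq_cst);
        /* [L] cond: may we stop waiting? */
        return atomic_load_explicit(&read_count, memory_order_relaxed) == 0;
}

If both functions return false in the same execution, the wakeup is lost; C11
forbids that outcome only when both fences are full fences.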

Signed-off-by: Prateek Sood <[email protected]>
---
kernel/exit.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index f1d74f0..a10820d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -306,7 +306,7 @@ void rcuwait_wake_up(struct rcuwait *w)
* MB (A) MB (B)
* [L] cond [L] tsk
*/
- smp_rmb(); /* (B) */
+ smp_mb(); /* (B) */

/*
* Avoid using task_rcu_dereference() magic as long as we are careful,
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.



2018-12-03 06:39:29

by Davidlohr Bueso

Subject: Re: [PATCH] percpu_rwsem: fix missed wakeup due to reordering of load

On 2018-11-30 07:10, Prateek Sood wrote:
> Consider a scenario where the cpu_hotplug_lock percpu_rw_semaphore has already
> been acquired for a read operation by P1 using percpu_down_read().
>
> Now P1 is in the path of releasing the cpu_hotplug_lock and P2 is in the
> process of acquiring the cpu_hotplug_lock.
>
> P1                                          P2
> percpu_up_read() path                       percpu_down_write() path
>
>                                             rcu_sync_enter() //gp_state=GP_PASSED
>
> rcu_sync_is_idle() //returns false          down_write(rw_sem)
>
> __percpu_up_read()
>
> [L] task = rcu_dereference(w->task) //NULL
>
> smp_rmb()                                   [S] w->task = current
>
>                                             smp_mb()
>
>                                             [L] readers_active_check() //fails
>                                             schedule()
>
> [S] __this_cpu_dec(read_count)
>
> Since the load of task can result in NULL, this can lead to a missed wakeup
> in rcuwait_wake_up(). The above sequence violates the following constraint
> in rcuwait_wake_up():
>
>      WAIT                     WAKE
> [S] tsk = current        [S] cond = true
>      MB (A)                   MB (B)
> [L] cond                 [L] tsk
>

Hmm yeah, we don't want rcuwait_wake_up() to get hoisted over the
__this_cpu_dec(read_count). The smp_rmb() does not make sense to me here
in the first place. Did you run into this scenario by code inspection, or
did you actually hit the issue?
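
For context, this is roughly what __percpu_up_read() looked like in the
kernels under discussion (paraphrased from kernel/locking/percpu-rwsem.c,
comments abridged, not verbatim). The store in question is the
__this_cpu_dec(), and the load that must not be hoisted above it is the
rcu_dereference(w->task) inside rcuwait_wake_up():

void __percpu_up_read(struct percpu_rw_semaphore *sem)
{
        smp_mb(); /* B matches C */
        /*
         * If the writer sees our decrement, it must also see everything we
         * did inside the read-side critical section.
         */
        __this_cpu_dec(*sem->read_count);

        /* Prod the writer to re-evaluate readers_active_check(). */
        rcuwait_wake_up(&sem->writer);
}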

Thanks,
Davidlohr

2018-12-03 19:37:31

by Prateek Sood

Subject: Re: [PATCH] percpu_rwsem: fix missed wakeup due to reordering of load

On 12/03/2018 12:08 PM, Davidlohr Bueso wrote:
> On 2018-11-30 07:10, Prateek Sood wrote:
>> Consider a scenario where the cpu_hotplug_lock percpu_rw_semaphore has already
>> been acquired for a read operation by P1 using percpu_down_read().
>>
>> Now P1 is in the path of releasing the cpu_hotplug_lock and P2 is in the
>> process of acquiring the cpu_hotplug_lock.
>>
>> P1                                               P2
>> percpu_up_read() path                      percpu_down_write() path
>>
>>                                           rcu_sync_enter() //gp_state=GP_PASSED
>>
>> rcu_sync_is_idle() //returns false        down_write(rw_sem)
>>
>> __percpu_up_read()
>>
>> [L] task = rcu_dereference(w->task) //NULL
>>
>> smp_rmb()                                  [S] w->task = current
>>
>>                                             smp_mb()
>>
>>                                            [L] readers_active_check() //fails
>>                                            schedule()
>>
>> [S] __this_cpu_dec(read_count)
>>
>> Since the load of task can result in NULL, this can lead to a missed wakeup
>> in rcuwait_wake_up(). The above sequence violates the following constraint
>> in rcuwait_wake_up():
>>
>>      WAIT                WAKE
>> [S] tsk = current      [S] cond = true
>> MB (A)                        MB (B)
>> [L] cond          [L] tsk
>>
>
> Hmm yeah, we don't want rcuwait_wake_up() to get hoisted over the __this_cpu_dec(read_count). The smp_rmb() does not make sense to me here in the first place. Did you run into this scenario by code inspection, or did you actually hit the issue?
>
> Thanks,
> Davidlohr

I have looked into one issue where the cpu hotplug code path was not
able to get cpu_hotplug_lock in write mode, while a reader was pending
on percpu_rw_semaphore->rwsem, waiting for the cpu hotplug path to
release it before the reader could acquire cpu_hotplug_lock.
This caused a deadlock.

From code inspection it also seems that the code does not adhere to the
arm64 smp_rmb() guarantee, which only provides load/load and load/store
ordering.
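
For reference, the writer side looks roughly like this (paraphrased from
kernel/locking/percpu-rwsem.c of that era, comments abridged, not verbatim).
If the wakeup from the last reader is lost, the writer sleeps in
rcuwait_wait_event() while still holding rw_sem, and any subsequent reader
that takes the slow path blocks on rw_sem, which matches the deadlock
described above:

void percpu_down_write(struct percpu_rw_semaphore *sem)
{
        /* Notify readers to take the slow path. */
        rcu_sync_enter(&sem->rss);

        down_write(&sem->rw_sem);

        /* Notify new readers to block. */
        WRITE_ONCE(sem->readers_block, 1);

        smp_mb(); /* D matches A */

        /*
         * Wait for all currently active readers to complete; if the last
         * reader's wakeup is missed, we sleep here forever while holding
         * rw_sem.
         */
        rcuwait_wait_event(&sem->writer, readers_active_check(sem));
}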


Thanks,
Prateek

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc., is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project

2018-12-12 14:29:01

by Prateek Sood

Subject: Re: [PATCH] percpu_rwsem: fix missed wakeup due to reordering of load

On 12/04/2018 01:06 AM, Prateek Sood wrote:
> On 12/03/2018 12:08 PM, Davidlohr Bueso wrote:
>> On 2018-11-30 07:10, Prateek Sood wrote:
>>> Consider a scenario where the cpu_hotplug_lock percpu_rw_semaphore has already
>>> been acquired for a read operation by P1 using percpu_down_read().
>>>
>>> Now P1 is in the path of releasing the cpu_hotplug_lock and P2 is in the
>>> process of acquiring the cpu_hotplug_lock.
>>>
>>> P1                                               P2
>>> percpu_up_read() path                      percpu_down_write() path
>>>
>>>                                           rcu_sync_enter() //gp_state=GP_PASSED
>>>
>>> rcu_sync_is_idle() //returns false        down_write(rw_sem)
>>>
>>> __percpu_up_read()
>>>
>>> [L] task = rcu_dereference(w->task) //NULL
>>>
>>> smp_rmb()                                  [S] w->task = current
>>>
>>>                                             smp_mb()
>>>
>>>                                            [L] readers_active_check() //fails
>>>                                            schedule()
>>>
>>> [S] __this_cpu_dec(read_count)
>>>
>>> Since the load of task can result in NULL, this can lead to a missed wakeup
>>> in rcuwait_wake_up(). The above sequence violates the following constraint
>>> in rcuwait_wake_up():
>>>
>>>      WAIT                WAKE
>>> [S] tsk = current      [S] cond = true
>>> MB (A)                        MB (B)
>>> [L] cond          [L] tsk
>>>
>>
>> Hmm yeah, we don't want rcuwait_wake_up() to get hoisted over the __this_cpu_dec(read_count). The smp_rmb() does not make sense to me here in the first place. Did you run into this scenario by code inspection, or did you actually hit the issue?
>>
>> Thanks,
>> Davidlohr
>
> I have looked into one issue where the cpu hotplug code path was not
> able to get cpu_hotplug_lock in write mode, while a reader was pending
> on percpu_rw_semaphore->rwsem, waiting for the cpu hotplug path to
> release it before the reader could acquire cpu_hotplug_lock.
> This caused a deadlock.
>
> From code inspection it also seems that the code does not adhere to the
> arm64 smp_rmb() guarantee, which only provides load/load and load/store
> ordering.
>
>
> Thanks,
> Prateek
>

Folks,

Please confirm whether the suspicion about smp_rmb() is correct.
IMO, it should be smp_mb(), which translates to dmb ish on arm64.
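
For reference, the arm64 barrier mappings look roughly as follows
(paraphrased from arch/arm64/include/asm/barrier.h; the exact macro names
vary slightly across kernel versions):

#define dmb(opt)        asm volatile("dmb " #opt : : : "memory")

#define __smp_mb()      dmb(ish)        /* full barrier */
#define __smp_rmb()     dmb(ishld)      /* orders prior loads against later
                                           loads and stores, but NOT prior
                                           stores against later loads */
#define __smp_wmb()     dmb(ishst)      /* store/store only */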


Thanks
Prateek

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc., is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project

2018-12-12 15:31:44

by Andrea Parri

Subject: Re: [PATCH] percpu_rwsem: fix missed wakeup due to reordering of load

On Fri, Nov 30, 2018 at 08:40:56PM +0530, Prateek Sood wrote:
> Consider a scenario where the cpu_hotplug_lock percpu_rw_semaphore has already
> been acquired for a read operation by P1 using percpu_down_read().
>
> Now P1 is in the path of releasing the cpu_hotplug_lock and P2 is in the
> process of acquiring the cpu_hotplug_lock.
>
> P1                                          P2
> percpu_up_read() path                       percpu_down_write() path
>
>                                             rcu_sync_enter() //gp_state=GP_PASSED
>
> rcu_sync_is_idle() //returns false          down_write(rw_sem)
>
> __percpu_up_read()
>
> [L] task = rcu_dereference(w->task) //NULL
>
> smp_rmb()                                   [S] w->task = current
>
>                                             smp_mb()
>
>                                             [L] readers_active_check() //fails
>                                             schedule()
>
> [S] __this_cpu_dec(read_count)
>
> Since the load of task can result in NULL, this can lead to a missed wakeup
> in rcuwait_wake_up(). The above sequence violates the following constraint
> in rcuwait_wake_up():
>
>      WAIT                     WAKE
> [S] tsk = current        [S] cond = true
>      MB (A)                   MB (B)
> [L] cond                 [L] tsk
>
> This can happen because, on arm64, the smp_rmb() in rcuwait_wake_up() only
> orders the loads before the barrier against the loads and stores after the
> barrier. The requirement here is to order the store before the smp_rmb()
> against the load after the smp_rmb().
>
> For the usage of rcuwait_wake_up() in __percpu_up_read(), a full barrier
> (smp_mb()) is required to satisfy the constraint of rcuwait_wake_up().
>
> Signed-off-by: Prateek Sood <[email protected]>

I know this is going to sound ridiculous (coming from me or from
the Italian that I am), but it looks like we could both work on
our English. ;-)

But the fix seems correct to me:

Reviewed-by: Andrea Parri <[email protected]>

It might be a good idea to integrate this fix with fixes to the
inline comments/annotations: for example, I see that the comment
in rcuwait_wake_up() mentions a non-existing rcuwait_trywake();
moreover, the memory-barrier annotation "B" is used also for the
smp_mb() preceding the __this_cpu_dec() in __percpu_up_read().

Andrea


> ---
> kernel/exit.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index f1d74f0..a10820d 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -306,7 +306,7 @@ void rcuwait_wake_up(struct rcuwait *w)
> * MB (A) MB (B)
> * [L] cond [L] tsk
> */
> - smp_rmb(); /* (B) */
> + smp_mb(); /* (B) */
>
> /*
> * Avoid using task_rcu_dereference() magic as long as we are careful,
> --
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
> is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
>

2018-12-12 15:33:30

by Davidlohr Bueso

Subject: Re: [PATCH] percpu_rwsem: fix missed wakeup due to reordering of load

On 2018-12-12 06:26, Prateek Sood wrote:
> Please confirm whether the suspicion about smp_rmb() is correct.
> IMO, it should be smp_mb(), which translates to dmb ish on arm64.

Feel free to add my ack. This should also be Cc'd to stable as of v4.11.

Fixes: 8f95c90ceb54 (sched/wait, RCU: Introduce rcuwait machinery)

Thanks,
Davidlohr

2018-12-21 08:59:31

by Prateek Sood

Subject: Re: [PATCH] percpu_rwsem: fix missed wakeup due to reordering of load

On 12/12/2018 08:58 PM, Andrea Parri wrote:
> On Fri, Nov 30, 2018 at 08:40:56PM +0530, Prateek Sood wrote:
>> Consider a scenario where the cpu_hotplug_lock percpu_rw_semaphore has already
>> been acquired for a read operation by P1 using percpu_down_read().
>>
>> Now P1 is in the path of releasing the cpu_hotplug_lock and P2 is in the
>> process of acquiring the cpu_hotplug_lock.
>>
>> P1                                          P2
>> percpu_up_read() path                       percpu_down_write() path
>>
>>                                             rcu_sync_enter() //gp_state=GP_PASSED
>>
>> rcu_sync_is_idle() //returns false          down_write(rw_sem)
>>
>> __percpu_up_read()
>>
>> [L] task = rcu_dereference(w->task) //NULL
>>
>> smp_rmb()                                   [S] w->task = current
>>
>>                                             smp_mb()
>>
>>                                             [L] readers_active_check() //fails
>>                                             schedule()
>>
>> [S] __this_cpu_dec(read_count)
>>
>> Since the load of task can result in NULL, this can lead to a missed wakeup
>> in rcuwait_wake_up(). The above sequence violates the following constraint
>> in rcuwait_wake_up():
>>
>>      WAIT                     WAKE
>> [S] tsk = current        [S] cond = true
>>      MB (A)                   MB (B)
>> [L] cond                 [L] tsk
>>
>> This can happen because, on arm64, the smp_rmb() in rcuwait_wake_up() only
>> orders the loads before the barrier against the loads and stores after the
>> barrier. The requirement here is to order the store before the smp_rmb()
>> against the load after the smp_rmb().
>>
>> For the usage of rcuwait_wake_up() in __percpu_up_read(), a full barrier
>> (smp_mb()) is required to satisfy the constraint of rcuwait_wake_up().
>>
>> Signed-off-by: Prateek Sood <[email protected]>
>
> I know this is going to sound ridiculous (coming from me or from
> the Italian that I am), but it looks like we could both work on
> our English. ;-)
>
> But the fix seems correct to me:
>
> Reviewed-by: Andrea Parri <[email protected]>
>
> It might be a good idea to integrate this fix with fixes to the
> inline comments/annotations: for example, I see that the comment
> in rcuwait_wake_up() mentions a non-existing rcuwait_trywake();
OK, I will update the comment in the next version of the patch.

> moreover, the memory-barrier annotation "B" is used also for the
> smp_mb() preceding the __this_cpu_dec() in __percpu_up_read().
Here annotation "B" corresponds to annotation "A" in
rcuwait_wait_event(), so this seems to be correct.
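
For reference, a trimmed sketch of rcuwait_wait_event() as it looked at the
time (paraphrased from include/linux/rcuwait.h, not verbatim). The full
barrier implied by set_current_state() is the (A) that the (B) barrier in
rcuwait_wake_up() pairs with:

#define rcuwait_wait_event(w, condition)                                \
({                                                                      \
        rcu_assign_pointer((w)->task, current);                         \
        for (;;) {                                                      \
                /*                                                      \
                 * Implicit barrier (A) pairs with (B) in               \
                 * rcuwait_wake_up().                                   \
                 */                                                     \
                set_current_state(TASK_UNINTERRUPTIBLE);                \
                if (condition)                                          \
                        break;                                          \
                                                                        \
                schedule();                                             \
        }                                                               \
                                                                        \
        WRITE_ONCE((w)->task, NULL);                                    \
        __set_current_state(TASK_RUNNING);                              \
})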

>
> Andrea
>
>
>> ---
>> kernel/exit.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/exit.c b/kernel/exit.c
>> index f1d74f0..a10820d 100644
>> --- a/kernel/exit.c
>> +++ b/kernel/exit.c
>> @@ -306,7 +306,7 @@ void rcuwait_wake_up(struct rcuwait *w)
>> * MB (A) MB (B)
>> * [L] cond [L] tsk
>> */
>> - smp_rmb(); /* (B) */
>> + smp_mb(); /* (B) */
>>
>> /*
>> * Avoid using task_rcu_dereference() magic as long as we are careful,
>> --
>> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
>> is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
>>


--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc., is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project

2018-12-21 17:02:39

by Prateek Sood

Subject: [PATCH] percpu_rwsem: fix missed wakeup due to reordering of load

P1 is releasing the cpu_hotplug_lock and P2 is acquiring the
cpu_hotplug_lock.

P1                                          P2
percpu_up_read() path                       percpu_down_write() path

                                            rcu_sync_enter() //gp_state=GP_PASSED

rcu_sync_is_idle() //returns false          down_write(rw_sem)

__percpu_up_read()

[L] task = rcu_dereference(w->task) //NULL

smp_rmb()                                   [S] w->task = current

                                            smp_mb()

                                            [L] readers_active_check() //fails
                                            schedule()

[S] __this_cpu_dec(read_count)

Since the load of task can result in NULL, it can lead to a missed wakeup
in rcuwait_wake_up(). The above sequence violates the following constraint
in rcuwait_wake_up():

     WAIT                     WAKE
[S] tsk = current        [S] cond = true
     MB (A)                   MB (B)
[L] cond                 [L] tsk

This can happen because, on arm64, the smp_rmb() in rcuwait_wake_up() only
orders the loads before the barrier against the loads and stores after the
barrier. The requirement here is to order the store before the smp_rmb()
against the load after the smp_rmb().

For the usage of rcuwait_wake_up() in __percpu_up_read(), a full barrier
(smp_mb()) is required to satisfy the constraint of rcuwait_wake_up().
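
For context, with this change applied rcuwait_wake_up() ends up looking
roughly as follows (paraphrased from kernel/exit.c, comments abridged, not
verbatim); the only functional change is the strengthened barrier at (B):

void rcuwait_wake_up(struct rcuwait *w)
{
        struct task_struct *task;

        rcu_read_lock();

        /*
         * Order the condition vs the load of @task: everything prior to
         * the load of @task must be visible. Pairs with barrier (A)
         * implied by set_current_state() in rcuwait_wait_event().
         */
        smp_mb(); /* (B) */

        task = rcu_dereference(w->task);
        if (task)
                wake_up_process(task);

        rcu_read_unlock();
}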

Signed-off-by: Prateek Sood <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>

---
kernel/exit.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index ac1a814..696e0e1 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -298,7 +298,7 @@ void rcuwait_wake_up(struct rcuwait *w)
/*
* Order condition vs @task, such that everything prior to the load
* of @task is visible. This is the condition as to why the user called
- * rcuwait_trywake() in the first place. Pairs with set_current_state()
+ * rcuwait_wake_up() in the first place. Pairs with set_current_state()
* barrier (A) in rcuwait_wait_event().
*
* WAIT WAKE
@@ -306,7 +306,7 @@ void rcuwait_wake_up(struct rcuwait *w)
* MB (A) MB (B)
* [L] cond [L] tsk
*/
- smp_rmb(); /* (B) */
+ smp_mb(); /* (B) */

/*
* Avoid using task_rcu_dereference() magic as long as we are careful,
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


2018-12-21 18:29:11

by Andrea Parri

Subject: Re: [PATCH] percpu_rwsem: fix missed wakeup due to reordering of load

On Fri, Dec 21, 2018 at 12:59:13PM +0530, Prateek Sood wrote:
> P1 is releasing the cpu_hotplug_lock and P2 is acquiring the
> cpu_hotplug_lock.
>
> P1                                          P2
> percpu_up_read() path                       percpu_down_write() path
>
>                                             rcu_sync_enter() //gp_state=GP_PASSED
>
> rcu_sync_is_idle() //returns false          down_write(rw_sem)
>
> __percpu_up_read()
>
> [L] task = rcu_dereference(w->task) //NULL
>
> smp_rmb()                                   [S] w->task = current
>
>                                             smp_mb()
>
>                                             [L] readers_active_check() //fails
>                                             schedule()
>
> [S] __this_cpu_dec(read_count)
>
> Since the load of task can result in NULL, it can lead to a missed wakeup
> in rcuwait_wake_up(). The above sequence violates the following constraint
> in rcuwait_wake_up():
>
>      WAIT                     WAKE
> [S] tsk = current        [S] cond = true
>      MB (A)                   MB (B)
> [L] cond                 [L] tsk
>
> This can happen because, on arm64, the smp_rmb() in rcuwait_wake_up() only
> orders the loads before the barrier against the loads and stores after the
> barrier. The requirement here is to order the store before the smp_rmb()
> against the load after the smp_rmb().
>
> For the usage of rcuwait_wake_up() in __percpu_up_read(), a full barrier
> (smp_mb()) is required to satisfy the constraint of rcuwait_wake_up().
>
> Signed-off-by: Prateek Sood <[email protected]>
> Acked-by: Davidlohr Bueso <[email protected]>

It looks like Peter has already queued this, c.f.,

https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=locking/core&id=73685b8af253cf32b1b41b3045f2828c6fb2439e

with a modified changelog and my Reviewed-by (that I confirm).

I can't tell how/when this is going to be upstreamed (guess via -tip),
Peter?

Andrea


>
> ---
> kernel/exit.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index ac1a814..696e0e1 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -298,7 +298,7 @@ void rcuwait_wake_up(struct rcuwait *w)
> /*
> * Order condition vs @task, such that everything prior to the load
> * of @task is visible. This is the condition as to why the user called
> - * rcuwait_trywake() in the first place. Pairs with set_current_state()
> + * rcuwait_wake_up() in the first place. Pairs with set_current_state()
> * barrier (A) in rcuwait_wait_event().
> *
> * WAIT WAKE
> @@ -306,7 +306,7 @@ void rcuwait_wake_up(struct rcuwait *w)
> * MB (A) MB (B)
> * [L] cond [L] tsk
> */
> - smp_rmb(); /* (B) */
> + smp_mb(); /* (B) */
>
> /*
> * Avoid using task_rcu_dereference() magic as long as we are careful,
> --
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
> is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
>

Subject: [tip:locking/core] sched/wait: Fix rcuwait_wake_up() ordering

Commit-ID: 6dc080eeb2ba01973bfff0d79844d7a59e12542e
Gitweb: https://git.kernel.org/tip/6dc080eeb2ba01973bfff0d79844d7a59e12542e
Author: Prateek Sood <[email protected]>
AuthorDate: Fri, 30 Nov 2018 20:40:56 +0530
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 21 Jan 2019 11:15:36 +0100

sched/wait: Fix rcuwait_wake_up() ordering

For some peculiar reason rcuwait_wake_up() has the right barrier in
the comment, but not in the code.

This mistake has been observed to cause a deadlock in the following
situation:

P1                                      P2

percpu_up_read()                        percpu_down_write()
  rcu_sync_is_idle() // false
                                          rcu_sync_enter()
                                          ...
  __percpu_up_read()

[S] ,-  __this_cpu_dec(*sem->read_count)
    |   smp_rmb();
[L] |   task = rcu_dereference(w->task) // NULL
    |
    |                                   [S] w->task = current
    |                                       smp_mb();
    |                                   [L] readers_active_check() // fail
    `-> <store happens here>

Where the smp_rmb() (obviously) fails to constrain the store.

[ peterz: Added changelog. ]

Signed-off-by: Prateek Sood <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Andrea Parri <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: 8f95c90ceb54 ("sched/wait, RCU: Introduce rcuwait machinery")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/exit.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 284f2fe9a293..3fb7be001964 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -307,7 +307,7 @@ void rcuwait_wake_up(struct rcuwait *w)
* MB (A) MB (B)
* [L] cond [L] tsk
*/
- smp_rmb(); /* (B) */
+ smp_mb(); /* (B) */

/*
* Avoid using task_rcu_dereference() magic as long as we are careful,