2021-08-05 19:11:58

by Thomas Gleixner

Subject: [patch V3 56/64] futex: Correct the number of requeued waiters for PI

From: Thomas Gleixner <[email protected]>

The accounting is wrong when either the PI sanity check or the
requeue PI operation fails. Adjust it in the failure path.

Will be simplified in the next step.

Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/futex.c | 4 ++++
1 file changed, 4 insertions(+)
---
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2131,6 +2131,8 @@ static int futex_requeue(u32 __user *uad

/* Ensure we requeue to the expected futex for requeue_pi. */
if (requeue_pi && !match_futex(this->requeue_pi_key, &key2)) {
+ /* Don't account for it */
+ task_count--;
ret = -EINVAL;
break;
}
@@ -2172,6 +2174,8 @@ static int futex_requeue(u32 __user *uad
*/
this->pi_state = NULL;
put_pi_state(pi_state);
+ /* Don't account for it */
+ task_count--;
/*
* We stop queueing more waiters and let user
* space deal with the mess.


2021-08-08 17:07:50

by Davidlohr Bueso

Subject: Re: [patch V3 56/64] futex: Correct the number of requeued waiters for PI

On Thu, 05 Aug 2021, Thomas Gleixner wrote:

>From: Thomas Gleixner <[email protected]>
>
>The accounting is wrong when either the PI sanity check or the
>requeue PI operation fails. Adjust it in the failure path.

Ok, fortunately these accounting errors are benign considering they
are in error paths. This also made me wonder about the requeue PI
top-waiter wakeup from futex_proxy_trylock_atomic(), which is always
required with nr_wake == 1. We account for it in the successful
case where we acquired the lock on its behalf (and thus
requeue_pi_wake_futex() was called), but if the corresponding
lookup_pi_state() fails, we'll retry. So, shouldn't the task_count++
only be done when we know the requeueing is next (a successful top
waiter acquiring the lock + pi state)?

@@ -2260,7 +2260,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
*/
if (ret > 0) {
WARN_ON(pi_state);
- task_count++;
/*
* If we acquired the lock, then the user space value
* of uaddr2 should be vpid. It cannot be changed by
@@ -2275,6 +2274,8 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
*/
ret = lookup_pi_state(uaddr2, ret, hb2, &key2,
&pi_state, &exiting);
+ if (!ret)
+ task_count++;
}

switch (ret) {

Also, looking at the code, I think we can use the goto retry_private
optimization for private futexes upon futex_proxy_trylock_atomic()/
lookup_pi_state() errors:

@@ -2290,8 +2290,11 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
double_unlock_hb(hb1, hb2);
hb_waiters_dec(hb2);
ret = fault_in_user_writeable(uaddr2);
- if (!ret)
+ if (!ret) {
+ if (!(flags & FLAGS_SHARED))
+ goto retry_private;
goto retry;
+ }
return ret;
case -EBUSY:
case -EAGAIN:


Thanks,
Davidlohr

2021-08-09 09:54:51

by Thomas Gleixner

Subject: Re: [patch V3 56/64] futex: Correct the number of requeued waiters for PI

On Sun, Aug 08 2021 at 10:05, Davidlohr Bueso wrote:
> On Thu, 05 Aug 2021, Thomas Gleixner wrote:
>
>>From: Thomas Gleixner <[email protected]>
>>
>>The accounting is wrong when either the PI sanity check or the
>>requeue PI operation fails. Adjust it in the failure path.
>
> Ok, fortunately these accounting errors are benign considering they
> are in error paths. This also made me wonder about the requeue PI
> top-waiter wakeup from futex_proxy_trylock_atomic(), which is always
> required with nr_wake == 1. We account for it in the successful
> case where we acquired the lock on its behalf (and thus
> requeue_pi_wake_futex() was called), but if the corresponding
> lookup_pi_state() fails, we'll retry. So, shouldn't the task_count++
> only be done when we know the requeueing is next (a successful top
> waiter acquiring the lock + pi state)?
>
> @@ -2260,7 +2260,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
> */
> if (ret > 0) {
> WARN_ON(pi_state);
> - task_count++;
> /*
> * If we acquired the lock, then the user space value
> * of uaddr2 should be vpid. It cannot be changed by
> @@ -2275,6 +2274,8 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
> */
> ret = lookup_pi_state(uaddr2, ret, hb2, &key2,
> &pi_state, &exiting);
> + if (!ret)
> + task_count++;
> }

Yes, but if futex_proxy_trylock_atomic() succeeds and lookup_pi_state()
fails then the user space futex value is already the VPID of the top
waiter and a retry will then fail the futex_proxy_trylock_atomic().
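
As a stand-alone illustration of that point, a toy model (not kernel
code; it deliberately ignores FUTEX_WAITERS/FUTEX_OWNER_DIED handling,
and the VPID value is made up) of the cmpxchg-from-0 step the proxy
trylock depends on might look like this: once the word holds the top
waiter's VPID, a second attempt that expects 0 cannot succeed.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Toy model of the uncontended takeover that futex_proxy_trylock_atomic()
 * performs on the uaddr2 word: it can only install the top waiter's VPID
 * if the word is still 0.
 */
static bool toy_proxy_trylock(_Atomic uint32_t *uval, uint32_t vpid)
{
	uint32_t expected = 0;

	return atomic_compare_exchange_strong(uval, &expected, vpid);
}

int main(void)
{
	_Atomic uint32_t uaddr2 = 0;
	uint32_t top_waiter_vpid = 1234;	/* made-up VPID */

	/* First pass: futex2 is unowned, so the proxy trylock succeeds. */
	printf("first attempt: %d\n", toy_proxy_trylock(&uaddr2, top_waiter_vpid));

	/*
	 * The word now holds the VPID.  If lookup_pi_state() failed and we
	 * retried, the word is no longer 0, so the trylock cannot succeed
	 * a second time -- the situation described above.
	 */
	printf("retry:         %d\n", toy_proxy_trylock(&uaddr2, top_waiter_vpid));
	return 0;
}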

> switch (ret) {
>
> Also, looking at the code, I think we can use the goto retry_private
> optimization for private futexes upon futex_proxy_trylock_atomic()/
> lookup_pi_state() errors:
>
> @@ -2290,8 +2290,11 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
> double_unlock_hb(hb1, hb2);
> hb_waiters_dec(hb2);
> ret = fault_in_user_writeable(uaddr2);
> - if (!ret)
> + if (!ret) {
> + if (!(flags & FLAGS_SHARED))
> + goto retry_private;
> goto retry;
> + }

Yes, we can, but let me stare at that code some more.

Thanks,

tglx

2021-08-09 12:32:15

by Thomas Gleixner

Subject: Re: [patch V3 56/64] futex: Correct the number of requeued waiters for PI

On Mon, Aug 09 2021 at 10:18, Thomas Gleixner wrote:
> On Sun, Aug 08 2021 at 10:05, Davidlohr Bueso wrote:
>> On Thu, 05 Aug 2021, Thomas Gleixner wrote:
>>
>>>From: Thomas Gleixner <[email protected]>
>>>
>>>The accounting is wrong when either the PI sanity check or the
>>>requeue PI operation fails. Adjust it in the failure path.
>>
>> Ok, fortunately these accounting errors are benign considering they
>> are in error paths. This also made me wonder about the requeue PI
>> top-waiter wakeup from futex_proxy_trylock_atomic(), which is always
>> required with nr_wake == 1. We account for it in the successful
>> case where we acquired the lock on its behalf (and thus
>> requeue_pi_wake_futex() was called), but if the corresponding
>> lookup_pi_state() fails, we'll retry. So, shouldn't the task_count++
>> only be done when we know the requeueing is next (a successful top
>> waiter acquiring the lock + pi state)?
>>
>> @@ -2260,7 +2260,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
>> */
>> if (ret > 0) {
>> WARN_ON(pi_state);
>> - task_count++;
>> /*
>> * If we acquired the lock, then the user space value
>> * of uaddr2 should be vpid. It cannot be changed by
>> @@ -2275,6 +2274,8 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
>> */
>> ret = lookup_pi_state(uaddr2, ret, hb2, &key2,
>> &pi_state, &exiting);
>> + if (!ret)
>> + task_count++;
>> }
>
> Yes, but if futex_proxy_trylock_atomic() succeeds and lookup_pi_state()
> fails then the user space futex value is already the VPID of the top
> waiter and a retry will then fail the futex_proxy_trylock_atomic().

Actually lookup_pi_state() cannot fail here.

If futex_proxy_trylock_atomic() takes the user space futex, then there
are no waiters on futex2, and the task for which the proxy trylock
acquired futex2 in the user space storage cannot be exiting because it
is still enqueued on futex1.

That means lookup_pi_state() will take the attach_to_pi_owner() path,
and that will succeed because the VPID belongs to an alive task.
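
A stand-alone sketch of that reasoning (toy types and names, not the
kernel implementation; only the two-way decision inside
lookup_pi_state() is modeled, under the assumptions stated above):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-ins; the real code lives in kernel/futex.c. */
struct toy_task {
	bool exiting;
};

struct toy_hb {
	struct toy_task *top_waiter;	/* first waiter queued on futex2, if any */
};

/* attach_to_pi_owner() succeeds as long as the designated owner is alive. */
static int toy_attach_to_pi_owner(struct toy_task *owner)
{
	return (owner && !owner->exiting) ? 0 : -1;
}

static int toy_lookup_pi_state(struct toy_hb *hb2, struct toy_task *new_owner)
{
	/*
	 * With a waiter on futex2 we would attach to its pi_state.  That
	 * cannot be the case here: futex_proxy_trylock_atomic() only
	 * succeeded because futex2 was uncontended.
	 */
	if (hb2->top_waiter)
		return 0;	/* attach_to_pi_state() path, not reachable here */

	/*
	 * So we always take the attach_to_pi_owner() path.  The owner is
	 * the top waiter of futex1, which is still enqueued there and
	 * therefore cannot be exiting -> this cannot fail.
	 */
	return toy_attach_to_pi_owner(new_owner);
}

int main(void)
{
	struct toy_task futex1_top_waiter = { .exiting = false };
	struct toy_hb hb2 = { .top_waiter = NULL };

	printf("lookup_pi_state: %d\n",
	       toy_lookup_pi_state(&hb2, &futex1_top_waiter));
	return 0;
}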

What's wrong in that code though is the condition further up:

if (requeue_pi && (task_count - nr_wake < nr_requeue)) {

nr_wake has to be 1 for requeue PI. For the first round, task_count is 0,
which means the condition is true for any value of nr_requeue >= 0
(0 - 1 = -1 is smaller than any non-negative nr_requeue).

It does not really matter because none of the code below that condition
takes the retry path with task_count already incremented, but it's at
least confusing as hell.
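
For completeness, a trivial stand-alone check of that arithmetic
(nothing kernel specific, values chosen per the text above):

#include <stdio.h>

int main(void)
{
	int task_count = 0;	/* first round: nothing accounted yet */
	int nr_wake = 1;	/* requeue PI mandates nr_wake == 1 */

	for (int nr_requeue = 0; nr_requeue <= 3; nr_requeue++)
		printf("nr_requeue=%d: %d\n", nr_requeue,
		       task_count - nr_wake < nr_requeue);

	/* Prints 1 every time: -1 is below any non-negative nr_requeue. */
	return 0;
}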

Let me fix all of that muck.

Thanks,

tglx