2010-06-23 09:13:15

by Michal Hocko

[permalink] [raw]
Subject: futex: race in lock and unlock&exit for robust futex with PI?

Hi,

attached you can find a simple test case which fails quite easily on the
following glibc assert:
"SharedMutexTest: pthread_mutex_lock.c:289: __pthread_mutex_lock:
Assertion `(-(e)) != 3 || !robust' failed." "

AFAIU, this assertion says that futex syscall cannot fail with ESRCH
for robust futex because it should either succeed or fail with
EOWNERDEAD.

We have seen this problem on SLES11 and SLES11SP1 but I was able to
reproduce it with the 2.6.34 kernel as well.

The test case is quite easy.

Executed with a parameter it creates a test file and initializes shared,
robust pthread mutex (optionaly compile time configured with priority
inheritance) backed by the mmapped test file. Without a parameter it
mmaps the file and just locks, unlocks mutex and checks for EOWNERDEAD
(this should never happen during the test as the process never dies with
the lock held) in the loop.

If I run this application for multiple users in parallel I can see the
above assertion. However, if priority inheritance is turned off then
there is no problem. I am not able to reproduce also if the test case is
run under a single user.

I am using the attached runSimple.sh script to run the test case like
this:

rm test.file simple
for i in `seq 10`
do
sh runSimple.sh
done

To disable IP just comment out USE_PI variable in the script.
You need to change USER1 and USER2 variables to match you system. You
will need to run the script as root if you do not set any special
setting to run su on behalf of those users.

I have tried to look at futex_{un}lock_pi but it is really hard to
understand. I assume that lookup_pi_state is the one which sets ESRCH
after it is not able to find the pid of the current owner.

This would suggest that we are racing with the unlock of the current
lock holder but I don't see how is this possible as both lock and unlock
paths hold fshared lock for all operations over the lock value. I have
noticed that the lock path drops fshared if the current holder is dying
but then it retries the whole process again.

Any advice would be highly appreciated.

Let me know if you need any further information

Thanks
--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic


Attachments:
(No filename) (2.19 kB)
runSimple.sh (951.00 B)
simple.c (2.59 kB)
Download all attachments

2010-06-25 02:42:56

by Darren Hart

[permalink] [raw]
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?

On 06/23/2010 02:13 AM, Michal Hocko wrote:
> Hi,

Hi Michal,

Thanks for reporting the issue and providing a testcase.

>
> attached you can find a simple test case which fails quite easily on the
> following glibc assert:
> "SharedMutexTest: pthread_mutex_lock.c:289: __pthread_mutex_lock:
> Assertion `(-(e)) != 3 || !robust' failed." "

I've run runSimple.sh in a tight loop for a couple hours (about 2k
iterations so far) and haven't seen anything other than "Here we go"
printed to the console.

I had to add -D_GNU_SOURCE to get it to build on my system (RHEL5.2 +
2.6.34). Perhaps this is just a difference in the toolchain.

> AFAIU, this assertion says that futex syscall cannot fail with ESRCH
> for robust futex because it should either succeed or fail with
> EOWNERDEAD.

I'll have to think on that and review the libc source. We do need to
confirm that the assert is even doing the right thing.

>
> We have seen this problem on SLES11 and SLES11SP1 but I was able to
> reproduce it with the 2.6.34 kernel as well.

What kind of system are you seeing this on? I've been running on a 4-way
x86_64 blade.

> The test case is quite easy.
>
> Executed with a parameter it creates a test file and initializes shared,
> robust pthread mutex (optionaly compile time configured with priority
> inheritance) backed by the mmapped test file. Without a parameter it
> mmaps the file and just locks, unlocks mutex and checks for EOWNERDEAD
> (this should never happen during the test as the process never dies with
> the lock held) in the loop.

Have you found the PI parameter to be required for reproducing the
error? From the comments below I'm assuming so... just want to be sure.

>
> If I run this application for multiple users in parallel I can see the
> above assertion. However, if priority inheritance is turned off then
> there is no problem. I am not able to reproduce also if the test case is
> run under a single user.
>
> I am using the attached runSimple.sh script to run the test case like
> this:
>
> rm test.file simple
> for i in `seq 10`
> do
> sh runSimple.sh
> done
>
> To disable IP just comment out USE_PI variable in the script.
> You need to change USER1 and USER2 variables to match you system. You
> will need to run the script as root if you do not set any special
> setting to run su on behalf of those users.
>
> I have tried to look at futex_{un}lock_pi but it is really hard to
> understand.

*grin* tell me about it...

See Documentation/pi-futex.txt if you haven't already.

> I assume that lookup_pi_state is the one which sets ESRCH
> after it is not able to find the pid of the current owner.
>
> This would suggest that we are racing with the unlock of the current
> lock holder but I don't see how is this possible as both lock and unlock
> paths hold fshared lock for all operations over the lock value. I have
> noticed that the lock path drops fshared if the current holder is dying
> but then it retries the whole process again.
>
> Any advice would be highly appreciated.

If I can reproduce this I should be able to get some trace points in
there to get a better idea of the execution path leading up to the problem.

This would be a great time to have those futex fault injection patches...


--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

2010-06-25 08:27:16

by Michal Hocko

[permalink] [raw]
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?

On Thu 24-06-10 19:42:50, Darren Hart wrote:
> On 06/23/2010 02:13 AM, Michal Hocko wrote:
> >Hi,

Hi,

>
> Hi Michal,
>
> Thanks for reporting the issue and providing a testcase.
>
> >
> >attached you can find a simple test case which fails quite easily on the
> >following glibc assert:
> >"SharedMutexTest: pthread_mutex_lock.c:289: __pthread_mutex_lock:
> > Assertion `(-(e)) != 3 || !robust' failed." "
>
> I've run runSimple.sh in a tight loop for a couple hours (about 2k
> iterations so far) and haven't seen anything other than "Here we go"
> printed to the console.

Maybe a higher load on CPUs would help (busy loop on other CPUs).

>
> I had to add -D_GNU_SOURCE to get it to build on my system (RHEL5.2
> + 2.6.34). Perhaps this is just a difference in the toolchain.

I assume that you got PTHREAD_PRIO_INHERIT undeclared error, don't you?
I have hacked around that by #define __USE_UNIX98 which worked on Debian
and OpenSuse. But you are right _GNU_SOURCE is definitely better
solution.

>
> >AFAIU, this assertion says that futex syscall cannot fail with ESRCH
> >for robust futex because it should either succeed or fail with
> >EOWNERDEAD.
>
> I'll have to think on that and review the libc source. We do need to
> confirm that the assert is even doing the right thing.

Sure. I have looked through the glibc lock implementation and it makes
quite a good sense to me. A robust lock should never return with ESRCH.

>
> >
> >We have seen this problem on SLES11 and SLES11SP1 but I was able to
> >reproduce it with the 2.6.34 kernel as well.
>
> What kind of system are you seeing this on? I've been running on a
> 4-way x86_64 blade.

* Debian (squeeze/sid) with
- Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz
- kernel: vanilla 2.6.34
- glibc: 2.11.1-3
- i386

* OpenSuse 11.2 with
- Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz
- kernel: distribution 2.6.31.12-0.2-desktop
- glibc: 2.10.1-10.5.1
- i386

* SLES11SP1
- Dual-Core AMD Opteron(tm) Processor 1218
- kernel: distribution 2.6.32.12-0.3-default
- glibc: 2.11.1-0.17.4
- x86_64

Each box shows a different number of asserts during 10 iterations.

>
> >The test case is quite easy.
> >
> >Executed with a parameter it creates a test file and initializes shared,
> >robust pthread mutex (optionaly compile time configured with priority
> >inheritance) backed by the mmapped test file. Without a parameter it
> >mmaps the file and just locks, unlocks mutex and checks for EOWNERDEAD
> >(this should never happen during the test as the process never dies with
> >the lock held) in the loop.
>
> Have you found the PI parameter to be required for reproducing the
> error? From the comments below I'm assuming so... just want to be
> sure.

Yes. If you comment out USE_PI variable in the script the problem is not
shown at all.

>
> >
> >If I run this application for multiple users in parallel I can see the
> >above assertion. However, if priority inheritance is turned off then
> >there is no problem. I am not able to reproduce also if the test case is
> >run under a single user.
> >
> >I am using the attached runSimple.sh script to run the test case like
> >this:
> >
> >rm test.file simple
> >for i in `seq 10`
> >do
> > sh runSimple.sh
> >done
> >
> >To disable IP just comment out USE_PI variable in the script.
> >You need to change USER1 and USER2 variables to match you system. You
> >will need to run the script as root if you do not set any special
> >setting to run su on behalf of those users.
> >
> >I have tried to look at futex_{un}lock_pi but it is really hard to
> >understand.
>
> *grin* tell me about it...
>
> See Documentation/pi-futex.txt if you haven't already.

Will do.

>
> >I assume that lookup_pi_state is the one which sets ESRCH
> >after it is not able to find the pid of the current owner.
> >
> >This would suggest that we are racing with the unlock of the current
> >lock holder but I don't see how is this possible as both lock and unlock
> >paths hold fshared lock for all operations over the lock value. I have
> >noticed that the lock path drops fshared if the current holder is dying
> >but then it retries the whole process again.
> >
> >Any advice would be highly appreciated.
>
> If I can reproduce this I should be able to get some trace points in
> there to get a better idea of the execution path leading up to the
> problem.

Please make sure that you run the test case with two different users. I
couldn't reproduce the issue with a single user.

If you have some ideas about patches which I could try then just pass it
to me.

>
> This would be a great time to have those futex fault injection patches...
>
>
> --
> Darren Hart
> IBM Linux Technology Center
> Real-Time Linux Team

Thanks for looking into it.
--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2010-06-25 17:54:21

by Darren Hart

[permalink] [raw]
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?

On 06/25/2010 01:27 AM, Michal Hocko wrote:
> On Thu 24-06-10 19:42:50, Darren Hart wrote:
>> On 06/23/2010 02:13 AM, Michal Hocko wrote:
>>> Hi,
>
> Hi,
>
>>
>> Hi Michal,
>>
>> Thanks for reporting the issue and providing a testcase.
>>
>>>
>>> attached you can find a simple test case which fails quite easily on the
>>> following glibc assert:
>>> "SharedMutexTest: pthread_mutex_lock.c:289: __pthread_mutex_lock:
>>> Assertion `(-(e)) != 3 || !robust' failed." "
>>
>> I've run runSimple.sh in a tight loop for a couple hours (about 2k
>> iterations so far) and haven't seen anything other than "Here we go"
>> printed to the console.
>
> Maybe a higher load on CPUs would help (busy loop on other CPUs).

Must have been a build issue. I can reproduce _something_ now. Within 10
iterations of runSimple.sh the test hangs. ps shows all the simple
processes sitting in pause.

(gdb) bt
#0 0x0000003c0060e030 in __pause_nocancel () from /lib64/libpthread.so.0
#1 0x0000003c006085fc in __pthread_mutex_lock_full ()
from /lib64/libpthread.so.0
#2 0x0000000000400cd6 in main (argc=1, argv=0x7fffc016e508) at simple.c:101

There is only one call to pause* in pthread_mutex_lock.c: (line ~316):

/* ESRCH can happen only for non-robust PI mutexes where
the owner of the lock died. */
assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);

/* Delay the thread indefinitely. */
while (1)
pause_not_cancel ();

Right now I'm thinking that NDEBUG is set in my build for whatever
reason, but I think I'm seeing the same issue you are. I'll review the
futex code and prepare a trace patch and see if I can reproduce with that.

Note: confirmed, the glibc rpm has -DNDEBUG=1

--
Darren

>
>>
>> I had to add -D_GNU_SOURCE to get it to build on my system (RHEL5.2
>> + 2.6.34). Perhaps this is just a difference in the toolchain.
>
> I assume that you got PTHREAD_PRIO_INHERIT undeclared error, don't you?
> I have hacked around that by #define __USE_UNIX98 which worked on Debian
> and OpenSuse. But you are right _GNU_SOURCE is definitely better
> solution.
>
>>
>>> AFAIU, this assertion says that futex syscall cannot fail with ESRCH
>>> for robust futex because it should either succeed or fail with
>>> EOWNERDEAD.
>>
>> I'll have to think on that and review the libc source. We do need to
>> confirm that the assert is even doing the right thing.
>
> Sure. I have looked through the glibc lock implementation and it makes
> quite a good sense to me. A robust lock should never return with ESRCH.
>
>>
>>>
>>> We have seen this problem on SLES11 and SLES11SP1 but I was able to
>>> reproduce it with the 2.6.34 kernel as well.
>>
>> What kind of system are you seeing this on? I've been running on a
>> 4-way x86_64 blade.
>
> * Debian (squeeze/sid) with
> - Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz
> - kernel: vanilla 2.6.34
> - glibc: 2.11.1-3
> - i386
>
> * OpenSuse 11.2 with
> - Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz
> - kernel: distribution 2.6.31.12-0.2-desktop
> - glibc: 2.10.1-10.5.1
> - i386
>
> * SLES11SP1
> - Dual-Core AMD Opteron(tm) Processor 1218
> - kernel: distribution 2.6.32.12-0.3-default
> - glibc: 2.11.1-0.17.4
> - x86_64
>
> Each box shows a different number of asserts during 10 iterations.
>
>>
>>> The test case is quite easy.
>>>
>>> Executed with a parameter it creates a test file and initializes shared,
>>> robust pthread mutex (optionaly compile time configured with priority
>>> inheritance) backed by the mmapped test file. Without a parameter it
>>> mmaps the file and just locks, unlocks mutex and checks for EOWNERDEAD
>>> (this should never happen during the test as the process never dies with
>>> the lock held) in the loop.
>>
>> Have you found the PI parameter to be required for reproducing the
>> error? From the comments below I'm assuming so... just want to be
>> sure.
>
> Yes. If you comment out USE_PI variable in the script the problem is not
> shown at all.
>
>>
>>>
>>> If I run this application for multiple users in parallel I can see the
>>> above assertion. However, if priority inheritance is turned off then
>>> there is no problem. I am not able to reproduce also if the test case is
>>> run under a single user.
>>>
>>> I am using the attached runSimple.sh script to run the test case like
>>> this:
>>>
>>> rm test.file simple
>>> for i in `seq 10`
>>> do
>>> sh runSimple.sh
>>> done
>>>
>>> To disable IP just comment out USE_PI variable in the script.
>>> You need to change USER1 and USER2 variables to match you system. You
>>> will need to run the script as root if you do not set any special
>>> setting to run su on behalf of those users.
>>>
>>> I have tried to look at futex_{un}lock_pi but it is really hard to
>>> understand.
>>
>> *grin* tell me about it...
>>
>> See Documentation/pi-futex.txt if you haven't already.
>
> Will do.
>
>>
>>> I assume that lookup_pi_state is the one which sets ESRCH
>>> after it is not able to find the pid of the current owner.
>>>
>>> This would suggest that we are racing with the unlock of the current
>>> lock holder but I don't see how is this possible as both lock and unlock
>>> paths hold fshared lock for all operations over the lock value. I have
>>> noticed that the lock path drops fshared if the current holder is dying
>>> but then it retries the whole process again.
>>>
>>> Any advice would be highly appreciated.
>>
>> If I can reproduce this I should be able to get some trace points in
>> there to get a better idea of the execution path leading up to the
>> problem.
>
> Please make sure that you run the test case with two different users. I
> couldn't reproduce the issue with a single user.
>
> If you have some ideas about patches which I could try then just pass it
> to me.
>
>>
>> This would be a great time to have those futex fault injection patches...
>>
>>
>> --
>> Darren Hart
>> IBM Linux Technology Center
>> Real-Time Linux Team
>
> Thanks for looking into it.


--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

2010-06-25 23:35:23

by Darren Hart

[permalink] [raw]
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?

On 06/25/2010 10:53 AM, Darren Hart wrote:
> On 06/25/2010 01:27 AM, Michal Hocko wrote:
>> On Thu 24-06-10 19:42:50, Darren Hart wrote:
>>> On 06/23/2010 02:13 AM, Michal Hocko wrote:

>>>> attached you can find a simple test case which fails quite easily on
>>>> the
>>>> following glibc assert:
>>>> "SharedMutexTest: pthread_mutex_lock.c:289: __pthread_mutex_lock:
>>>> Assertion `(-(e)) != 3 || !robust' failed." "
>>>
>>> I've run runSimple.sh in a tight loop for a couple hours (about 2k
>>> iterations so far) and haven't seen anything other than "Here we go"
>>> printed to the console.
>>
>> Maybe a higher load on CPUs would help (busy loop on other CPUs).
>
> Must have been a build issue. I can reproduce _something_ now. Within 10
> iterations of runSimple.sh the test hangs. ps shows all the simple
> processes sitting in pause.
>
> (gdb) bt
> #0 0x0000003c0060e030 in __pause_nocancel () from /lib64/libpthread.so.0
> #1 0x0000003c006085fc in __pthread_mutex_lock_full ()
> from /lib64/libpthread.so.0
> #2 0x0000000000400cd6 in main (argc=1, argv=0x7fffc016e508) at simple.c:101
>
> There is only one call to pause* in pthread_mutex_lock.c: (line ~316):
>
> /* ESRCH can happen only for non-robust PI mutexes where
> the owner of the lock died. */
> assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
>
> /* Delay the thread indefinitely. */
> while (1)
> pause_not_cancel ();
>
> Right now I'm thinking that NDEBUG is set in my build for whatever
> reason, but I think I'm seeing the same issue you are. I'll review the
> futex code and prepare a trace patch and see if I can reproduce with that.
>
> Note: confirmed, the glibc rpm has -DNDEBUG=1

The simple tracing patch (below) confirms that we are indeed returning
-ESRCH to userspace from futex_lock_pi(). Notice that the pids of the
two "simple" processes lingering after the runSimple.sh script are the
ones that return -ESRCH to userspace, and therefor end up in the
pause_not_cancel() trap inside glibc.

# trace-cmd record -p nop ./runSimple.sh
<snip>

# ps -eLo pid,comm,wchan | grep "simple "
20636 simple pause
20876 simple pause

# trace-cmd report
version = 6
CPU 0 is empty
cpus=4
field->offset = 24 size=8
<...>-20636 [003] 1778.965860: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH
<...>-20636 [003] 1778.965865: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
<...>-20636 [003] 1778.965866: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
>>---> <...>-20636 [003] 1778.965867: bprint: futex_lock_pi : returning -ESRCH to userspace
<...>-20876 [001] 1780.199394: bprint: futex_lock_pi_atomic : cmpxchg failed, retrying
<...>-20876 [001] 1780.199400: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH
<...>-20876 [001] 1780.199401: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
<...>-20876 [001] 1780.199402: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
>>---> <...>-20876 [001] 1780.199403: bprint: futex_lock_pi : returning -ESRCH to userspace
<...>-21316 [002] 1782.300695: bprint: futex_lock_pi_atomic : cmpxchg failed, retrying
<...>-21316 [002] 1782.300698: bprint: futex_lock_pi_atomic : cmpxchg failed, retrying

Attaching gdb to 20636, we can see the state of the mutex:
(gdb) print (struct __pthread_mutex_s)*mutex
$1 = {__lock = 0, __count = 1, __owner = 0, __nusers = 0, __kind = 176, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}

This is consistent with hex dump of the first bits of the backing file:
# xxd test.file | head -n 3
0000000: 0000 0000 0100 0000 0000 0000 0000 0000 ................
0000010: b000 0000 0000 0000 0000 0000 0000 0000 ................
0000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................

The futex (__lock) value is 0, indicating it is unlocked and has no waiters. The count being 1 however suggests a task has acquired it once, which, if I read the glibc source correctly, means the owner field and __lock fields should not be 0. This supports Michal's thought about lock racing with unlock, seeing it's held, then unable to find the owner (pi_state) as it has since been unlocked. Possibly some horkage with the WAITERS bit leading to glibc performing atomic acquistions/releases and rendering the mutex inconsistent with the kernel's view. This should be protected against, but that is the direction I am going to start looking.

--
Darren Hart

>From 92014a07df73489460ff788274506255ff0f775d Mon Sep 17 00:00:00 2001
From: Darren Hart <[email protected]>
Date: Fri, 25 Jun 2010 13:54:25 -0700
Subject: [PATCH] robust pi futex tracing

---
kernel/futex.c | 24 ++++++++++++++++++++----
1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e7a35f1..24ac437 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -683,6 +683,8 @@ retry:
*/
if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
/* Keep the OWNER_DIED bit */
+ if (ownerdied)
+ trace_printk("ownerdied, taking over lock\n");
newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(task);
ownerdied = 0;
lock_taken = 1;
@@ -692,14 +694,18 @@ retry:

if (unlikely(curval == -EFAULT))
return -EFAULT;
- if (unlikely(curval != uval))
+ if (unlikely(curval != uval)) {
+ trace_printk("cmpxchg failed, retrying\n");
goto retry;
+ }

/*
* We took the lock due to owner died take over.
*/
- if (unlikely(lock_taken))
+ if (unlikely(lock_taken)) {
+ trace_printk("ownerdied, lock acquired, return 1\n");
return 1;
+ }

/*
* We dont have the lock. Look up the PI state (or create it if
@@ -710,13 +716,16 @@ retry:
if (unlikely(ret)) {
switch (ret) {
case -ESRCH:
+ trace_printk("lookup_pi_state: -ESRCH\n");
/*
* No owner found for this futex. Check if the
* OWNER_DIED bit is set to figure out whether
* this is a robust futex or not.
*/
- if (get_futex_value_locked(&curval, uaddr))
+ if (get_futex_value_locked(&curval, uaddr)) {
+ trace_printk("get_futex_value_locked: -EFAULT\n");
return -EFAULT;
+ }

/*
* We simply start over in case of a robust
@@ -724,10 +733,13 @@ retry:
* and return happy.
*/
if (curval & FUTEX_OWNER_DIED) {
+ trace_printk("ownerdied, goto retry\n");
ownerdied = 1;
goto retry;
}
+ trace_printk("ownerdied not detected, returning -ESRCH\n");
default:
+ trace_printk("lookup_pi_state: %d\n", ret);
break;
}
}
@@ -1950,6 +1962,8 @@ retry_private:
put_futex_key(fshared, &q.key);
cond_resched();
goto retry;
+ case -ESRCH:
+ trace_printk("returning -ESRCH to userspace\n");
default:
goto out_unlock_put_key;
}
@@ -2537,8 +2551,10 @@ void exit_robust_list(struct task_struct *curr)
/*
* Avoid excessively long or circular lists:
*/
- if (!--limit)
+ if (!--limit) {
+ trace_printk("excessively long list, aborting\n");
break;
+ }

cond_resched();
}
--
1.7.0.4

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

2010-06-28 14:42:57

by Michal Hocko

[permalink] [raw]
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?

Hi Darren,

On Fri 25-06-10 16:35:14, Darren Hart wrote:
[...]
> # trace-cmd record -p nop ./runSimple.sh
> <snip>
>
> # ps -eLo pid,comm,wchan | grep "simple "
> 20636 simple pause
> 20876 simple pause
>
> # trace-cmd report
> version = 6
> CPU 0 is empty
> cpus=4
> field->offset = 24 size=8
> <...>-20636 [003] 1778.965860: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH
> <...>-20636 [003] 1778.965865: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
> <...>-20636 [003] 1778.965866: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
> >>---> <...>-20636 [003] 1778.965867: bprint: futex_lock_pi : returning -ESRCH to userspace
> <...>-20876 [001] 1780.199394: bprint: futex_lock_pi_atomic : cmpxchg failed, retrying
> <...>-20876 [001] 1780.199400: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH
> <...>-20876 [001] 1780.199401: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
> <...>-20876 [001] 1780.199402: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
> >>---> <...>-20876 [001] 1780.199403: bprint: futex_lock_pi : returning -ESRCH to userspace
> <...>-21316 [002] 1782.300695: bprint: futex_lock_pi_atomic : cmpxchg failed, retrying
> <...>-21316 [002] 1782.300698: bprint: futex_lock_pi_atomic : cmpxchg failed, retrying
>
[...]

I have updated the test case slightly (reduced the number of lock/unlock
cycles to 1).

Then, I have used the additional patch (see bellow) on top of the one
you have posted and here is the log I am getting:

version = 6
cpus=2
field->offset = 16 size=4
<...>-13232 [001] 226.693880: bprint: do_futex : futex_lock_pi start
<...>-13232 [001] 226.693886: bprint: do_futex : futex_lock_pi done ret=0
<...>-13235 [001] 226.700204: bprint: do_futex : futex_lock_pi start
<...>-13235 [001] 226.700210: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH for pid=13242
<...>-13235 [001] 226.700211: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
<...>-13235 [001] 226.700211: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
<...>-13235 [001] 226.700212: bprint: futex_lock_pi : returning -ESRCH to userspace
<...>-13235 [001] 226.700212: bprint: do_futex : futex_lock_pi done ret=-3
<...>-13240 [000] 226.705574: bprint: do_futex : futex_lock_pi start
<...>-13240 [000] 226.705580: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH for pid=13242
<...>-13240 [000] 226.705581: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
<...>-13240 [000] 226.705582: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
<...>-13240 [000] 226.705582: bprint: futex_lock_pi : returning -ESRCH to userspace
<...>-13240 [000] 226.705583: bprint: do_futex : futex_lock_pi done ret=-3
<...>-13231 [000] 226.708095: bprint: do_futex : futex_lock_pi start
<...>-13231 [000] 226.708101: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH for pid=13242
<...>-13231 [000] 226.708102: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
<...>-13231 [000] 226.708102: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
<...>-13231 [000] 226.708103: bprint: futex_lock_pi : returning -ESRCH to userspace
<...>-13231 [000] 226.708103: bprint: do_futex : futex_lock_pi done ret=-3
<...>-13242 [001] 226.709246: bprint: do_futex : futex_unlock_pi start
<...>-13242 [001] 226.709249: bprint: do_futex : futex_unlock_pi: TID->0 transition 2147496890
<...>-13242 [001] 226.709250: bprint: do_futex : futex_unlock_pi: no waiters, unlock the futex ret=0 uval=-2147470406
<...>-13242 [001] 226.709250: bprint: do_futex : futex_unlock_pi done ret=0

As you can see lookup_pi_state fails for the pid (13242) which is at the very
bottom and that is unlocking the futex. This smells fishy to me. I can
see this pattern consistently for all failures. Maybe I am doing
something wrong or the timestamps are not precise enough but from what I
can see this looks like a bug in lookup_pi_state which doesn't find an
existing PID.

--
>From 733816347db91670f27d206382b8c2e57e5ef125 Mon Sep 17 00:00:00 2001
From: Michal Hocko <[email protected]>
Date: Mon, 28 Jun 2010 13:42:29 +0200
Subject: [PATCH] futex pi unlock tracing added

---
kernel/futex.c | 24 +++++++++++++++++++-----
1 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 24ac437..d114fee 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -716,7 +716,8 @@ retry:
if (unlikely(ret)) {
switch (ret) {
case -ESRCH:
- trace_printk("lookup_pi_state: -ESRCH\n");
+ trace_printk("lookup_pi_state: -ESRCH for pid=%u\n",
+ uval & FUTEX_TID_MASK);
/*
* No owner found for this futex. Check if the
* OWNER_DIED bit is set to figure out whether
@@ -2070,8 +2071,10 @@ retry:
* again. If it succeeds then we can return without waking
* anyone else up:
*/
- if (!(uval & FUTEX_OWNER_DIED))
+ if (!(uval & FUTEX_OWNER_DIED)) {
uval = cmpxchg_futex_value_locked(uaddr, task_pid_vnr(current), 0);
+ trace_printk("futex_unlock_pi: TID->0 transition %u\n", uval);
+ }


if (unlikely(uval == -EFAULT))
@@ -2080,8 +2083,10 @@ retry:
* Rare case: we managed to release the lock atomically,
* no need to wake anyone else up:
*/
- if (unlikely(uval == task_pid_vnr(current)))
+ if (unlikely(uval == task_pid_vnr(current))) {
+ trace_printk("futex_unlock_pi: release without wakeup\n");
goto out_unlock;
+ }

/*
* Ok, other tasks may need to be woken up - check waiters
@@ -2093,6 +2098,7 @@ retry:
if (!match_futex (&this->key, &key))
continue;
ret = wake_futex_pi(uaddr, uval, this);
+ trace_printk("futex_unlock_pi: wake ret=%d uval=%u this=%p\n", ret, uval, this);
/*
* The atomic access to the futex value
* generated a pagefault, so retry the
@@ -2107,6 +2113,8 @@ retry:
*/
if (!(uval & FUTEX_OWNER_DIED)) {
ret = unlock_futex_pi(uaddr, uval);
+ trace_printk("futex_unlock_pi: no waiters, unlock the futex ret=%d uval=%d\n",
+ ret, uval);
if (ret == -EFAULT)
goto pi_faulted;
}
@@ -2600,12 +2608,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
break;
case FUTEX_LOCK_PI:
- if (futex_cmpxchg_enabled)
+ if (futex_cmpxchg_enabled) {
+ trace_printk("futex_lock_pi start\n");
ret = futex_lock_pi(uaddr, fshared, val, timeout, 0);
+ trace_printk("futex_lock_pi done ret=%d\n", ret);
+ }
break;
case FUTEX_UNLOCK_PI:
- if (futex_cmpxchg_enabled)
+ if (futex_cmpxchg_enabled) {
+ trace_printk("futex_unlock_pi start\n");
ret = futex_unlock_pi(uaddr, fshared);
+ trace_printk("futex_unlock_pi done ret=%d\n", ret);
+ }
break;
case FUTEX_TRYLOCK_PI:
if (futex_cmpxchg_enabled)
--
1.7.1

>
> --
> Darren Hart
>
> From 92014a07df73489460ff788274506255ff0f775d Mon Sep 17 00:00:00 2001
> From: Darren Hart <[email protected]>
> Date: Fri, 25 Jun 2010 13:54:25 -0700
> Subject: [PATCH] robust pi futex tracing
>
> ---
> kernel/futex.c | 24 ++++++++++++++++++++----
> 1 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index e7a35f1..24ac437 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -683,6 +683,8 @@ retry:
> */
> if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
> /* Keep the OWNER_DIED bit */
> + if (ownerdied)
> + trace_printk("ownerdied, taking over lock\n");
> newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(task);
> ownerdied = 0;
> lock_taken = 1;
> @@ -692,14 +694,18 @@ retry:
>
> if (unlikely(curval == -EFAULT))
> return -EFAULT;
> - if (unlikely(curval != uval))
> + if (unlikely(curval != uval)) {
> + trace_printk("cmpxchg failed, retrying\n");
> goto retry;
> + }
>
> /*
> * We took the lock due to owner died take over.
> */
> - if (unlikely(lock_taken))
> + if (unlikely(lock_taken)) {
> + trace_printk("ownerdied, lock acquired, return 1\n");
> return 1;
> + }
>
> /*
> * We dont have the lock. Look up the PI state (or create it if
> @@ -710,13 +716,16 @@ retry:
> if (unlikely(ret)) {
> switch (ret) {
> case -ESRCH:
> + trace_printk("lookup_pi_state: -ESRCH\n");
> /*
> * No owner found for this futex. Check if the
> * OWNER_DIED bit is set to figure out whether
> * this is a robust futex or not.
> */
> - if (get_futex_value_locked(&curval, uaddr))
> + if (get_futex_value_locked(&curval, uaddr)) {
> + trace_printk("get_futex_value_locked: -EFAULT\n");
> return -EFAULT;
> + }
>
> /*
> * We simply start over in case of a robust
> @@ -724,10 +733,13 @@ retry:
> * and return happy.
> */
> if (curval & FUTEX_OWNER_DIED) {
> + trace_printk("ownerdied, goto retry\n");
> ownerdied = 1;
> goto retry;
> }
> + trace_printk("ownerdied not detected, returning -ESRCH\n");
> default:
> + trace_printk("lookup_pi_state: %d\n", ret);
> break;
> }
> }
> @@ -1950,6 +1962,8 @@ retry_private:
> put_futex_key(fshared, &q.key);
> cond_resched();
> goto retry;
> + case -ESRCH:
> + trace_printk("returning -ESRCH to userspace\n");
> default:
> goto out_unlock_put_key;
> }
> @@ -2537,8 +2551,10 @@ void exit_robust_list(struct task_struct *curr)
> /*
> * Avoid excessively long or circular lists:
> */
> - if (!--limit)
> + if (!--limit) {
> + trace_printk("excessively long list, aborting\n");
> break;
> + }
>
> cond_resched();
> }
> --
> 1.7.0.4
>
> --
> Darren Hart
> IBM Linux Technology Center
> Real-Time Linux Team

--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2010-06-28 14:57:08

by Darren Hart

[permalink] [raw]
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?

On 06/28/2010 07:42 AM, Michal Hocko wrote:
> Hi Darren,
>
> On Fri 25-06-10 16:35:14, Darren Hart wrote:
> [...]
>> # trace-cmd record -p nop ./runSimple.sh
>> <snip>
>>
>> # ps -eLo pid,comm,wchan | grep "simple "
>> 20636 simple pause
>> 20876 simple pause
>>
>> # trace-cmd report
>> version = 6
>> CPU 0 is empty
>> cpus=4
>> field->offset = 24 size=8
>> <...>-20636 [003] 1778.965860: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH
>> <...>-20636 [003] 1778.965865: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
>> <...>-20636 [003] 1778.965866: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
>>>> ---> <...>-20636 [003] 1778.965867: bprint: futex_lock_pi : returning -ESRCH to userspace
>> <...>-20876 [001] 1780.199394: bprint: futex_lock_pi_atomic : cmpxchg failed, retrying
>> <...>-20876 [001] 1780.199400: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH
>> <...>-20876 [001] 1780.199401: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
>> <...>-20876 [001] 1780.199402: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
>>>> ---> <...>-20876 [001] 1780.199403: bprint: futex_lock_pi : returning -ESRCH to userspace
>> <...>-21316 [002] 1782.300695: bprint: futex_lock_pi_atomic : cmpxchg failed, retrying
>> <...>-21316 [002] 1782.300698: bprint: futex_lock_pi_atomic : cmpxchg failed, retrying
>>
> [...]
>
> I have updated the test case slightly (reduced the number of lock/unlock
> cycles to 1).
>
> Then, I have used the additional patch (see bellow) on top of the one
> you have posted and here is the log I am getting:


Interesting. I'm going to start pouring over lookup_pi_state() and
futex_lock_pi() to see if I can spot any races. I'm also wondering about
glibc's usage of the FUTEX_WAITERS bit.

While I investigate that, could you do a git bisect from 2.6.31 (after
which a lot of the most recent futex changes hit) and see if you can
identify a particular patch, or even kernel version, where this broke?

Thanks,

Darren

>
> version = 6
> cpus=2
> field->offset = 16 size=4
> <...>-13232 [001] 226.693880: bprint: do_futex : futex_lock_pi start
> <...>-13232 [001] 226.693886: bprint: do_futex : futex_lock_pi done ret=0
> <...>-13235 [001] 226.700204: bprint: do_futex : futex_lock_pi start
> <...>-13235 [001] 226.700210: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH for pid=13242
> <...>-13235 [001] 226.700211: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
> <...>-13235 [001] 226.700211: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
> <...>-13235 [001] 226.700212: bprint: futex_lock_pi : returning -ESRCH to userspace
> <...>-13235 [001] 226.700212: bprint: do_futex : futex_lock_pi done ret=-3
> <...>-13240 [000] 226.705574: bprint: do_futex : futex_lock_pi start
> <...>-13240 [000] 226.705580: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH for pid=13242
> <...>-13240 [000] 226.705581: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
> <...>-13240 [000] 226.705582: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
> <...>-13240 [000] 226.705582: bprint: futex_lock_pi : returning -ESRCH to userspace
> <...>-13240 [000] 226.705583: bprint: do_futex : futex_lock_pi done ret=-3
> <...>-13231 [000] 226.708095: bprint: do_futex : futex_lock_pi start
> <...>-13231 [000] 226.708101: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH for pid=13242
> <...>-13231 [000] 226.708102: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
> <...>-13231 [000] 226.708102: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
> <...>-13231 [000] 226.708103: bprint: futex_lock_pi : returning -ESRCH to userspace
> <...>-13231 [000] 226.708103: bprint: do_futex : futex_lock_pi done ret=-3
> <...>-13242 [001] 226.709246: bprint: do_futex : futex_unlock_pi start
> <...>-13242 [001] 226.709249: bprint: do_futex : futex_unlock_pi: TID->0 transition 2147496890
> <...>-13242 [001] 226.709250: bprint: do_futex : futex_unlock_pi: no waiters, unlock the futex ret=0 uval=-2147470406
> <...>-13242 [001] 226.709250: bprint: do_futex : futex_unlock_pi done ret=0
>
> As you can see lookup_pi_state fails for the pid (13242) which is at the very
> bottom and that is unlocking the futex. This smells fishy to me. I can
> see this pattern consistently for all failures. Maybe I am doing
> something wrong or the timestamps are not precise enough but from what I
> can see this looks like a bug in lookup_pi_state which doesn't find an
> existing PID.
>
> --
> From 733816347db91670f27d206382b8c2e57e5ef125 Mon Sep 17 00:00:00 2001
> From: Michal Hocko<[email protected]>
> Date: Mon, 28 Jun 2010 13:42:29 +0200
> Subject: [PATCH] futex pi unlock tracing added
>
> ---
> kernel/futex.c | 24 +++++++++++++++++++-----
> 1 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 24ac437..d114fee 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -716,7 +716,8 @@ retry:
> if (unlikely(ret)) {
> switch (ret) {
> case -ESRCH:
> - trace_printk("lookup_pi_state: -ESRCH\n");
> + trace_printk("lookup_pi_state: -ESRCH for pid=%u\n",
> + uval& FUTEX_TID_MASK);
> /*
> * No owner found for this futex. Check if the
> * OWNER_DIED bit is set to figure out whether
> @@ -2070,8 +2071,10 @@ retry:
> * again. If it succeeds then we can return without waking
> * anyone else up:
> */
> - if (!(uval& FUTEX_OWNER_DIED))
> + if (!(uval& FUTEX_OWNER_DIED)) {
> uval = cmpxchg_futex_value_locked(uaddr, task_pid_vnr(current), 0);
> + trace_printk("futex_unlock_pi: TID->0 transition %u\n", uval);
> + }
>
>
> if (unlikely(uval == -EFAULT))
> @@ -2080,8 +2083,10 @@ retry:
> * Rare case: we managed to release the lock atomically,
> * no need to wake anyone else up:
> */
> - if (unlikely(uval == task_pid_vnr(current)))
> + if (unlikely(uval == task_pid_vnr(current))) {
> + trace_printk("futex_unlock_pi: release without wakeup\n");
> goto out_unlock;
> + }
>
> /*
> * Ok, other tasks may need to be woken up - check waiters
> @@ -2093,6 +2098,7 @@ retry:
> if (!match_futex (&this->key,&key))
> continue;
> ret = wake_futex_pi(uaddr, uval, this);
> + trace_printk("futex_unlock_pi: wake ret=%d uval=%u this=%p\n", ret, uval, this);
> /*
> * The atomic access to the futex value
> * generated a pagefault, so retry the
> @@ -2107,6 +2113,8 @@ retry:
> */
> if (!(uval& FUTEX_OWNER_DIED)) {
> ret = unlock_futex_pi(uaddr, uval);
> + trace_printk("futex_unlock_pi: no waiters, unlock the futex ret=%d uval=%d\n",
> + ret, uval);
> if (ret == -EFAULT)
> goto pi_faulted;
> }
> @@ -2600,12 +2608,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
> ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
> break;
> case FUTEX_LOCK_PI:
> - if (futex_cmpxchg_enabled)
> + if (futex_cmpxchg_enabled) {
> + trace_printk("futex_lock_pi start\n");
> ret = futex_lock_pi(uaddr, fshared, val, timeout, 0);
> + trace_printk("futex_lock_pi done ret=%d\n", ret);
> + }
> break;
> case FUTEX_UNLOCK_PI:
> - if (futex_cmpxchg_enabled)
> + if (futex_cmpxchg_enabled) {
> + trace_printk("futex_unlock_pi start\n");
> ret = futex_unlock_pi(uaddr, fshared);
> + trace_printk("futex_unlock_pi done ret=%d\n", ret);
> + }
> break;
> case FUTEX_TRYLOCK_PI:
> if (futex_cmpxchg_enabled)


--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

2010-06-28 15:32:22

by Michal Hocko

[permalink] [raw]
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?

On Mon 28-06-10 16:42:46, Michal Hocko wrote:
> Hi Darren,

Hmm, I think that I've found the reason. I have used one additional
tracing patch (bellow) and we are getting ESRCH because cread and pcred
don't match:


version = 6
CPU 0 is empty
cpus=2
field->offset = 16 size=4
<...>-22260 [001] 139.669672: bprint: do_futex : futex_lock_pi start
<...>-22260 [001] 139.669678: bprint: do_futex : futex_lock_pi done ret=0
<...>-22281 [001] 139.693690: bprint: do_futex : futex_lock_pi start
<...>-22281 [001] 139.693696: bprint: lookup_pi_state : cred(1004,1004) != pcred(1003,1003)
<...>-22281 [001] 139.693697: bprint: lookup_pi_state : futex_find_get_task failed with -3
<...>-22281 [001] 139.693697: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH for pid=22280
<...>-22281 [001] 139.693698: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
<...>-22281 [001] 139.693698: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
<...>-22281 [001] 139.693699: bprint: futex_lock_pi : returning -ESRCH to userspace
<...>-22281 [001] 139.693700: bprint: do_futex : futex_lock_pi done ret=-3
<...>-22280 [001] 139.694033: bprint: do_futex : futex_unlock_pi start
<...>-22280 [001] 139.694035: bprint: do_futex : futex_unlock_pi: TID->0 transition 2147505928
<...>-22280 [001] 139.694036: bprint: do_futex : futex_unlock_pi: no waiters, unlock the futex ret=0 uval=-2147461368
<...>-22280 [001] 139.694036: bprint: do_futex : futex_unlock_pi done ret=0
<...>-22488 [001] 139.874967: bprint: do_futex : futex_lock_pi start
<...>-22488 [001] 139.874972: bprint: do_futex : futex_lock_pi done ret=0

This would answer why we cannot reproduce with a single user.
Btw. I guess that there is a typo in the condition and it should be
like this:
if (cred->euid != pcred->euid &&
- cred->euid != pcred->uid)
+ cred->uid != pcred->uid)

--
>From f0743d2b2d2f81c40700c4d6816ad3abfd113c1b Mon Sep 17 00:00:00 2001
From: Michal Hocko <[email protected]>
Date: Mon, 28 Jun 2010 17:08:56 +0200
Subject: [PATCH] more traces

---
kernel/futex.c | 13 +++++++++++--
1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index d114fee..d1c2489 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -435,11 +435,16 @@ static struct task_struct * futex_find_get_task(pid_t pid)
p = find_task_by_vpid(pid);
if (!p) {
p = ERR_PTR(-ESRCH);
+ trace_printk("%u pid not found\n", pid);
} else {
pcred = __task_cred(p);
if (cred->euid != pcred->euid &&
- cred->euid != pcred->uid)
+ cred->euid != pcred->uid) {
p = ERR_PTR(-ESRCH);
+ trace_printk("cred(%u,%u) != pcred(%u,%u)\n",
+ cred->euid, cred->uid,
+ pcred->euid, pcred->uid);
+ }
else
get_task_struct(p);
}
@@ -564,8 +569,11 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
if (!pid)
return -ESRCH;
p = futex_find_get_task(pid);
- if (IS_ERR(p))
+ if (IS_ERR(p)) {
+ trace_printk("futex_find_get_task failed with %d\n",
+ PTR_ERR(p));
return PTR_ERR(p);
+ }

/*
* We need to look at the task state flags to figure out,
@@ -581,6 +589,7 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
* cleanup:
*/
int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
+ trace_printk("pid=%u is exiting ret=%d\n", pid, ret);

raw_spin_unlock_irq(&p->pi_lock);
put_task_struct(p);
--
1.7.1


>
> --
> From 733816347db91670f27d206382b8c2e57e5ef125 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <[email protected]>
> Date: Mon, 28 Jun 2010 13:42:29 +0200
> Subject: [PATCH] futex pi unlock tracing added
>
> ---
> kernel/futex.c | 24 +++++++++++++++++++-----
> 1 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 24ac437..d114fee 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -716,7 +716,8 @@ retry:
> if (unlikely(ret)) {
> switch (ret) {
> case -ESRCH:
> - trace_printk("lookup_pi_state: -ESRCH\n");
> + trace_printk("lookup_pi_state: -ESRCH for pid=%u\n",
> + uval & FUTEX_TID_MASK);
> /*
> * No owner found for this futex. Check if the
> * OWNER_DIED bit is set to figure out whether
> @@ -2070,8 +2071,10 @@ retry:
> * again. If it succeeds then we can return without waking
> * anyone else up:
> */
> - if (!(uval & FUTEX_OWNER_DIED))
> + if (!(uval & FUTEX_OWNER_DIED)) {
> uval = cmpxchg_futex_value_locked(uaddr, task_pid_vnr(current), 0);
> + trace_printk("futex_unlock_pi: TID->0 transition %u\n", uval);
> + }
>
>
> if (unlikely(uval == -EFAULT))
> @@ -2080,8 +2083,10 @@ retry:
> * Rare case: we managed to release the lock atomically,
> * no need to wake anyone else up:
> */
> - if (unlikely(uval == task_pid_vnr(current)))
> + if (unlikely(uval == task_pid_vnr(current))) {
> + trace_printk("futex_unlock_pi: release without wakeup\n");
> goto out_unlock;
> + }
>
> /*
> * Ok, other tasks may need to be woken up - check waiters
> @@ -2093,6 +2098,7 @@ retry:
> if (!match_futex (&this->key, &key))
> continue;
> ret = wake_futex_pi(uaddr, uval, this);
> + trace_printk("futex_unlock_pi: wake ret=%d uval=%u this=%p\n", ret, uval, this);
> /*
> * The atomic access to the futex value
> * generated a pagefault, so retry the
> @@ -2107,6 +2113,8 @@ retry:
> */
> if (!(uval & FUTEX_OWNER_DIED)) {
> ret = unlock_futex_pi(uaddr, uval);
> + trace_printk("futex_unlock_pi: no waiters, unlock the futex ret=%d uval=%d\n",
> + ret, uval);
> if (ret == -EFAULT)
> goto pi_faulted;
> }
> @@ -2600,12 +2608,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
> ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
> break;
> case FUTEX_LOCK_PI:
> - if (futex_cmpxchg_enabled)
> + if (futex_cmpxchg_enabled) {
> + trace_printk("futex_lock_pi start\n");
> ret = futex_lock_pi(uaddr, fshared, val, timeout, 0);
> + trace_printk("futex_lock_pi done ret=%d\n", ret);
> + }
> break;
> case FUTEX_UNLOCK_PI:
> - if (futex_cmpxchg_enabled)
> + if (futex_cmpxchg_enabled) {
> + trace_printk("futex_unlock_pi start\n");
> ret = futex_unlock_pi(uaddr, fshared);
> + trace_printk("futex_unlock_pi done ret=%d\n", ret);
> + }
> break;
> case FUTEX_TRYLOCK_PI:
> if (futex_cmpxchg_enabled)
> --
> 1.7.1
>
> >
> > --
> > Darren Hart
> >
> > From 92014a07df73489460ff788274506255ff0f775d Mon Sep 17 00:00:00 2001
> > From: Darren Hart <[email protected]>
> > Date: Fri, 25 Jun 2010 13:54:25 -0700
> > Subject: [PATCH] robust pi futex tracing
> >
> > ---
> > kernel/futex.c | 24 ++++++++++++++++++++----
> > 1 files changed, 20 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index e7a35f1..24ac437 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -683,6 +683,8 @@ retry:
> > */
> > if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
> > /* Keep the OWNER_DIED bit */
> > + if (ownerdied)
> > + trace_printk("ownerdied, taking over lock\n");
> > newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(task);
> > ownerdied = 0;
> > lock_taken = 1;
> > @@ -692,14 +694,18 @@ retry:
> >
> > if (unlikely(curval == -EFAULT))
> > return -EFAULT;
> > - if (unlikely(curval != uval))
> > + if (unlikely(curval != uval)) {
> > + trace_printk("cmpxchg failed, retrying\n");
> > goto retry;
> > + }
> >
> > /*
> > * We took the lock due to owner died take over.
> > */
> > - if (unlikely(lock_taken))
> > + if (unlikely(lock_taken)) {
> > + trace_printk("ownerdied, lock acquired, return 1\n");
> > return 1;
> > + }
> >
> > /*
> > * We dont have the lock. Look up the PI state (or create it if
> > @@ -710,13 +716,16 @@ retry:
> > if (unlikely(ret)) {
> > switch (ret) {
> > case -ESRCH:
> > + trace_printk("lookup_pi_state: -ESRCH\n");
> > /*
> > * No owner found for this futex. Check if the
> > * OWNER_DIED bit is set to figure out whether
> > * this is a robust futex or not.
> > */
> > - if (get_futex_value_locked(&curval, uaddr))
> > + if (get_futex_value_locked(&curval, uaddr)) {
> > + trace_printk("get_futex_value_locked: -EFAULT\n");
> > return -EFAULT;
> > + }
> >
> > /*
> > * We simply start over in case of a robust
> > @@ -724,10 +733,13 @@ retry:
> > * and return happy.
> > */
> > if (curval & FUTEX_OWNER_DIED) {
> > + trace_printk("ownerdied, goto retry\n");
> > ownerdied = 1;
> > goto retry;
> > }
> > + trace_printk("ownerdied not detected, returning -ESRCH\n");
> > default:
> > + trace_printk("lookup_pi_state: %d\n", ret);
> > break;
> > }
> > }
> > @@ -1950,6 +1962,8 @@ retry_private:
> > put_futex_key(fshared, &q.key);
> > cond_resched();
> > goto retry;
> > + case -ESRCH:
> > + trace_printk("returning -ESRCH to userspace\n");
> > default:
> > goto out_unlock_put_key;
> > }
> > @@ -2537,8 +2551,10 @@ void exit_robust_list(struct task_struct *curr)
> > /*
> > * Avoid excessively long or circular lists:
> > */
> > - if (!--limit)
> > + if (!--limit) {
> > + trace_printk("excessively long list, aborting\n");
> > break;
> > + }
> >
> > cond_resched();
> > }
> > --
> > 1.7.0.4
> >
> > --
> > Darren Hart
> > IBM Linux Technology Center
> > Real-Time Linux Team
>
> --
> Michal Hocko
> L3 team
> SUSE LINUX s.r.o.
> Lihovarska 1060/12
> 190 00 Praha 9
> Czech Republic

--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2010-06-28 15:40:06

by Michal Hocko

[permalink] [raw]
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?

On Mon 28-06-10 17:32:14, Michal Hocko wrote:
[...]
> Btw. I guess that there is a typo in the condition and it should be
> like this:
> if (cred->euid != pcred->euid &&
> - cred->euid != pcred->uid)
> + cred->uid != pcred->uid)

Scratch that. This looks correct.


--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2010-06-28 15:58:51

by Michal Hocko

[permalink] [raw]
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?

On Mon 28-06-10 17:32:14, Michal Hocko wrote:
> On Mon 28-06-10 16:42:46, Michal Hocko wrote:
> > Hi Darren,
>
> Hmm, I think that I've found the reason. I have used one additional
> tracing patch (bellow) and we are getting ESRCH because cread and pcred
> don't match:
>
>
> version = 6
> CPU 0 is empty
> cpus=2
> field->offset = 16 size=4
> <...>-22260 [001] 139.669672: bprint: do_futex : futex_lock_pi start
> <...>-22260 [001] 139.669678: bprint: do_futex : futex_lock_pi done ret=0
> <...>-22281 [001] 139.693690: bprint: do_futex : futex_lock_pi start
> <...>-22281 [001] 139.693696: bprint: lookup_pi_state : cred(1004,1004) != pcred(1003,1003)
> <...>-22281 [001] 139.693697: bprint: lookup_pi_state : futex_find_get_task failed with -3
> <...>-22281 [001] 139.693697: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH for pid=22280
> <...>-22281 [001] 139.693698: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
> <...>-22281 [001] 139.693698: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
> <...>-22281 [001] 139.693699: bprint: futex_lock_pi : returning -ESRCH to userspace
> <...>-22281 [001] 139.693700: bprint: do_futex : futex_lock_pi done ret=-3
> <...>-22280 [001] 139.694033: bprint: do_futex : futex_unlock_pi start
> <...>-22280 [001] 139.694035: bprint: do_futex : futex_unlock_pi: TID->0 transition 2147505928
> <...>-22280 [001] 139.694036: bprint: do_futex : futex_unlock_pi: no waiters, unlock the futex ret=0 uval=-2147461368
> <...>-22280 [001] 139.694036: bprint: do_futex : futex_unlock_pi done ret=0
> <...>-22488 [001] 139.874967: bprint: do_futex : futex_lock_pi start
> <...>-22488 [001] 139.874972: bprint: do_futex : futex_lock_pi done ret=0
>
> This would answer why we cannot reproduce with a single user.

I am trying to understand why futex_find_get_task has to check the
credentials at all. I guess that there might be some concerns from
futex_requeue callers but does it make sense from futex_lock_pi_atomic
call path? This would mean that the futex is unusable in multi-user process
synchronization, right? (Is this documented somewhere?)

[...]
>
> --
> From f0743d2b2d2f81c40700c4d6816ad3abfd113c1b Mon Sep 17 00:00:00 2001
> From: Michal Hocko <[email protected]>
> Date: Mon, 28 Jun 2010 17:08:56 +0200
> Subject: [PATCH] more traces
>
> ---
> kernel/futex.c | 13 +++++++++++--
> 1 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index d114fee..d1c2489 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -435,11 +435,16 @@ static struct task_struct * futex_find_get_task(pid_t pid)
> p = find_task_by_vpid(pid);
> if (!p) {
> p = ERR_PTR(-ESRCH);
> + trace_printk("%u pid not found\n", pid);
> } else {
> pcred = __task_cred(p);
> if (cred->euid != pcred->euid &&
> - cred->euid != pcred->uid)
> + cred->euid != pcred->uid) {
> p = ERR_PTR(-ESRCH);
> + trace_printk("cred(%u,%u) != pcred(%u,%u)\n",
> + cred->euid, cred->uid,
> + pcred->euid, pcred->uid);
> + }
> else
> get_task_struct(p);
> }
> @@ -564,8 +569,11 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
> if (!pid)
> return -ESRCH;
> p = futex_find_get_task(pid);
> - if (IS_ERR(p))
> + if (IS_ERR(p)) {
> + trace_printk("futex_find_get_task failed with %d\n",
> + PTR_ERR(p));
> return PTR_ERR(p);
> + }
>
> /*
> * We need to look at the task state flags to figure out,
> @@ -581,6 +589,7 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
> * cleanup:
> */
> int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
> + trace_printk("pid=%u is exiting ret=%d\n", pid, ret);
>
> raw_spin_unlock_irq(&p->pi_lock);
> put_task_struct(p);
> --
> 1.7.1
>
>
> >
> > --
> > From 733816347db91670f27d206382b8c2e57e5ef125 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <[email protected]>
> > Date: Mon, 28 Jun 2010 13:42:29 +0200
> > Subject: [PATCH] futex pi unlock tracing added
> >
> > ---
> > kernel/futex.c | 24 +++++++++++++++++++-----
> > 1 files changed, 19 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index 24ac437..d114fee 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -716,7 +716,8 @@ retry:
> > if (unlikely(ret)) {
> > switch (ret) {
> > case -ESRCH:
> > - trace_printk("lookup_pi_state: -ESRCH\n");
> > + trace_printk("lookup_pi_state: -ESRCH for pid=%u\n",
> > + uval & FUTEX_TID_MASK);
> > /*
> > * No owner found for this futex. Check if the
> > * OWNER_DIED bit is set to figure out whether
> > @@ -2070,8 +2071,10 @@ retry:
> > * again. If it succeeds then we can return without waking
> > * anyone else up:
> > */
> > - if (!(uval & FUTEX_OWNER_DIED))
> > + if (!(uval & FUTEX_OWNER_DIED)) {
> > uval = cmpxchg_futex_value_locked(uaddr, task_pid_vnr(current), 0);
> > + trace_printk("futex_unlock_pi: TID->0 transition %u\n", uval);
> > + }
> >
> >
> > if (unlikely(uval == -EFAULT))
> > @@ -2080,8 +2083,10 @@ retry:
> > * Rare case: we managed to release the lock atomically,
> > * no need to wake anyone else up:
> > */
> > - if (unlikely(uval == task_pid_vnr(current)))
> > + if (unlikely(uval == task_pid_vnr(current))) {
> > + trace_printk("futex_unlock_pi: release without wakeup\n");
> > goto out_unlock;
> > + }
> >
> > /*
> > * Ok, other tasks may need to be woken up - check waiters
> > @@ -2093,6 +2098,7 @@ retry:
> > if (!match_futex (&this->key, &key))
> > continue;
> > ret = wake_futex_pi(uaddr, uval, this);
> > + trace_printk("futex_unlock_pi: wake ret=%d uval=%u this=%p\n", ret, uval, this);
> > /*
> > * The atomic access to the futex value
> > * generated a pagefault, so retry the
> > @@ -2107,6 +2113,8 @@ retry:
> > */
> > if (!(uval & FUTEX_OWNER_DIED)) {
> > ret = unlock_futex_pi(uaddr, uval);
> > + trace_printk("futex_unlock_pi: no waiters, unlock the futex ret=%d uval=%d\n",
> > + ret, uval);
> > if (ret == -EFAULT)
> > goto pi_faulted;
> > }
> > @@ -2600,12 +2608,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
> > ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
> > break;
> > case FUTEX_LOCK_PI:
> > - if (futex_cmpxchg_enabled)
> > + if (futex_cmpxchg_enabled) {
> > + trace_printk("futex_lock_pi start\n");
> > ret = futex_lock_pi(uaddr, fshared, val, timeout, 0);
> > + trace_printk("futex_lock_pi done ret=%d\n", ret);
> > + }
> > break;
> > case FUTEX_UNLOCK_PI:
> > - if (futex_cmpxchg_enabled)
> > + if (futex_cmpxchg_enabled) {
> > + trace_printk("futex_unlock_pi start\n");
> > ret = futex_unlock_pi(uaddr, fshared);
> > + trace_printk("futex_unlock_pi done ret=%d\n", ret);
> > + }
> > break;
> > case FUTEX_TRYLOCK_PI:
> > if (futex_cmpxchg_enabled)
> > --
> > 1.7.1
> >
> > >
> > > --
> > > Darren Hart
> > >
> > > From 92014a07df73489460ff788274506255ff0f775d Mon Sep 17 00:00:00 2001
> > > From: Darren Hart <[email protected]>
> > > Date: Fri, 25 Jun 2010 13:54:25 -0700
> > > Subject: [PATCH] robust pi futex tracing
> > >
> > > ---
> > > kernel/futex.c | 24 ++++++++++++++++++++----
> > > 1 files changed, 20 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/kernel/futex.c b/kernel/futex.c
> > > index e7a35f1..24ac437 100644
> > > --- a/kernel/futex.c
> > > +++ b/kernel/futex.c
> > > @@ -683,6 +683,8 @@ retry:
> > > */
> > > if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
> > > /* Keep the OWNER_DIED bit */
> > > + if (ownerdied)
> > > + trace_printk("ownerdied, taking over lock\n");
> > > newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(task);
> > > ownerdied = 0;
> > > lock_taken = 1;
> > > @@ -692,14 +694,18 @@ retry:
> > >
> > > if (unlikely(curval == -EFAULT))
> > > return -EFAULT;
> > > - if (unlikely(curval != uval))
> > > + if (unlikely(curval != uval)) {
> > > + trace_printk("cmpxchg failed, retrying\n");
> > > goto retry;
> > > + }
> > >
> > > /*
> > > * We took the lock due to owner died take over.
> > > */
> > > - if (unlikely(lock_taken))
> > > + if (unlikely(lock_taken)) {
> > > + trace_printk("ownerdied, lock acquired, return 1\n");
> > > return 1;
> > > + }
> > >
> > > /*
> > > * We dont have the lock. Look up the PI state (or create it if
> > > @@ -710,13 +716,16 @@ retry:
> > > if (unlikely(ret)) {
> > > switch (ret) {
> > > case -ESRCH:
> > > + trace_printk("lookup_pi_state: -ESRCH\n");
> > > /*
> > > * No owner found for this futex. Check if the
> > > * OWNER_DIED bit is set to figure out whether
> > > * this is a robust futex or not.
> > > */
> > > - if (get_futex_value_locked(&curval, uaddr))
> > > + if (get_futex_value_locked(&curval, uaddr)) {
> > > + trace_printk("get_futex_value_locked: -EFAULT\n");
> > > return -EFAULT;
> > > + }
> > >
> > > /*
> > > * We simply start over in case of a robust
> > > @@ -724,10 +733,13 @@ retry:
> > > * and return happy.
> > > */
> > > if (curval & FUTEX_OWNER_DIED) {
> > > + trace_printk("ownerdied, goto retry\n");
> > > ownerdied = 1;
> > > goto retry;
> > > }
> > > + trace_printk("ownerdied not detected, returning -ESRCH\n");
> > > default:
> > > + trace_printk("lookup_pi_state: %d\n", ret);
> > > break;
> > > }
> > > }
> > > @@ -1950,6 +1962,8 @@ retry_private:
> > > put_futex_key(fshared, &q.key);
> > > cond_resched();
> > > goto retry;
> > > + case -ESRCH:
> > > + trace_printk("returning -ESRCH to userspace\n");
> > > default:
> > > goto out_unlock_put_key;
> > > }
> > > @@ -2537,8 +2551,10 @@ void exit_robust_list(struct task_struct *curr)
> > > /*
> > > * Avoid excessively long or circular lists:
> > > */
> > > - if (!--limit)
> > > + if (!--limit) {
> > > + trace_printk("excessively long list, aborting\n");
> > > break;
> > > + }
> > >
> > > cond_resched();
> > > }
> > > --
> > > 1.7.0.4
> > >
> > > --
> > > Darren Hart
> > > IBM Linux Technology Center
> > > Real-Time Linux Team
> >
> > --
> > Michal Hocko
> > L3 team
> > SUSE LINUX s.r.o.
> > Lihovarska 1060/12
> > 190 00 Praha 9
> > Czech Republic
>
> --
> Michal Hocko
> L3 team
> SUSE LINUX s.r.o.
> Lihovarska 1060/12
> 190 00 Praha 9
> Czech Republic

--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2010-06-28 16:39:56

by Michal Hocko

[permalink] [raw]
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?

[I am sorry for bombarding you with emails]

On Mon 28-06-10 17:58:45, Michal Hocko wrote:
> On Mon 28-06-10 17:32:14, Michal Hocko wrote:
> > On Mon 28-06-10 16:42:46, Michal Hocko wrote:
> > > Hi Darren,
> >
> > Hmm, I think that I've found the reason. I have used one additional
> > tracing patch (bellow) and we are getting ESRCH because cread and pcred
> > don't match:
> >
> >
> > version = 6
> > CPU 0 is empty
> > cpus=2
> > field->offset = 16 size=4
> > <...>-22260 [001] 139.669672: bprint: do_futex : futex_lock_pi start
> > <...>-22260 [001] 139.669678: bprint: do_futex : futex_lock_pi done ret=0
> > <...>-22281 [001] 139.693690: bprint: do_futex : futex_lock_pi start
> > <...>-22281 [001] 139.693696: bprint: lookup_pi_state : cred(1004,1004) != pcred(1003,1003)
> > <...>-22281 [001] 139.693697: bprint: lookup_pi_state : futex_find_get_task failed with -3
> > <...>-22281 [001] 139.693697: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH for pid=22280
> > <...>-22281 [001] 139.693698: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
> > <...>-22281 [001] 139.693698: bprint: futex_lock_pi_atomic : lookup_pi_state: -3
> > <...>-22281 [001] 139.693699: bprint: futex_lock_pi : returning -ESRCH to userspace
> > <...>-22281 [001] 139.693700: bprint: do_futex : futex_lock_pi done ret=-3
> > <...>-22280 [001] 139.694033: bprint: do_futex : futex_unlock_pi start
> > <...>-22280 [001] 139.694035: bprint: do_futex : futex_unlock_pi: TID->0 transition 2147505928
> > <...>-22280 [001] 139.694036: bprint: do_futex : futex_unlock_pi: no waiters, unlock the futex ret=0 uval=-2147461368
> > <...>-22280 [001] 139.694036: bprint: do_futex : futex_unlock_pi done ret=0
> > <...>-22488 [001] 139.874967: bprint: do_futex : futex_lock_pi start
> > <...>-22488 [001] 139.874972: bprint: do_futex : futex_lock_pi done ret=0
> >
> > This would answer why we cannot reproduce with a single user.
>
> I am trying to understand why futex_find_get_task has to check the
> credentials at all. I guess that there might be some concerns from
> futex_requeue callers but does it make sense from futex_lock_pi_atomic
> call path? This would mean that the futex is unusable in multi-user process
> synchronization, right? (Is this documented somewhere?)

Would something like the following be acceptable (just a compile
tested without comments). It simply makes caller of lookup_pi_state to
decide whether credentials should be checked.

diff --git a/kernel/futex.c b/kernel/futex.c
index e7a35f1..dfe4b11 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -426,7 +426,7 @@ static void free_pi_state(struct futex_pi_state *pi_state)
* Look up the task based on what TID userspace gave us.
* We dont trust it.
*/
-static struct task_struct * futex_find_get_task(pid_t pid)
+static struct task_struct * futex_find_get_task(pid_t pid, bool check_cred)
{
struct task_struct *p;
const struct cred *cred = current_cred(), *pcred;
@@ -436,10 +436,12 @@ static struct task_struct * futex_find_get_task(pid_t pid)
if (!p) {
p = ERR_PTR(-ESRCH);
} else {
- pcred = __task_cred(p);
- if (cred->euid != pcred->euid &&
- cred->euid != pcred->uid)
- p = ERR_PTR(-ESRCH);
+ if (check_cred) {
+ pcred = __task_cred(p);
+ if (cred->euid != pcred->euid &&
+ cred->euid != pcred->uid)
+ p = ERR_PTR(-ESRCH);
+ }
else
get_task_struct(p);
}
@@ -506,7 +508,7 @@ void exit_pi_state_list(struct task_struct *curr)

static int
lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
- union futex_key *key, struct futex_pi_state **ps)
+ union futex_key *key, struct futex_pi_state **ps, bool check_cred)
{
struct futex_pi_state *pi_state = NULL;
struct futex_q *this, *next;
@@ -563,7 +565,7 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
*/
if (!pid)
return -ESRCH;
- p = futex_find_get_task(pid);
+ p = futex_find_get_task(pid, check_cred);
if (IS_ERR(p))
return PTR_ERR(p);

@@ -705,7 +707,7 @@ retry:
* We dont have the lock. Look up the PI state (or create it if
* we are the first waiter):
*/
- ret = lookup_pi_state(uval, hb, key, ps);
+ ret = lookup_pi_state(uval, hb, key, ps, false);

if (unlikely(ret)) {
switch (ret) {
@@ -1258,7 +1260,7 @@ retry_private:
ret = get_futex_value_locked(&curval2, uaddr2);
if (!ret)
ret = lookup_pi_state(curval2, hb2, &key2,
- &pi_state);
+ &pi_state, true);
}

switch (ret) {

>
> [...]
> >
> > --
> > From f0743d2b2d2f81c40700c4d6816ad3abfd113c1b Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <[email protected]>
> > Date: Mon, 28 Jun 2010 17:08:56 +0200
> > Subject: [PATCH] more traces
> >
> > ---
> > kernel/futex.c | 13 +++++++++++--
> > 1 files changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index d114fee..d1c2489 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -435,11 +435,16 @@ static struct task_struct * futex_find_get_task(pid_t pid)
> > p = find_task_by_vpid(pid);
> > if (!p) {
> > p = ERR_PTR(-ESRCH);
> > + trace_printk("%u pid not found\n", pid);
> > } else {
> > pcred = __task_cred(p);
> > if (cred->euid != pcred->euid &&
> > - cred->euid != pcred->uid)
> > + cred->euid != pcred->uid) {
> > p = ERR_PTR(-ESRCH);
> > + trace_printk("cred(%u,%u) != pcred(%u,%u)\n",
> > + cred->euid, cred->uid,
> > + pcred->euid, pcred->uid);
> > + }
> > else
> > get_task_struct(p);
> > }
> > @@ -564,8 +569,11 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
> > if (!pid)
> > return -ESRCH;
> > p = futex_find_get_task(pid);
> > - if (IS_ERR(p))
> > + if (IS_ERR(p)) {
> > + trace_printk("futex_find_get_task failed with %d\n",
> > + PTR_ERR(p));
> > return PTR_ERR(p);
> > + }
> >
> > /*
> > * We need to look at the task state flags to figure out,
> > @@ -581,6 +589,7 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
> > * cleanup:
> > */
> > int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
> > + trace_printk("pid=%u is exiting ret=%d\n", pid, ret);
> >
> > raw_spin_unlock_irq(&p->pi_lock);
> > put_task_struct(p);
> > --
> > 1.7.1
> >
> >
> > >
> > > --
> > > From 733816347db91670f27d206382b8c2e57e5ef125 Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <[email protected]>
> > > Date: Mon, 28 Jun 2010 13:42:29 +0200
> > > Subject: [PATCH] futex pi unlock tracing added
> > >
> > > ---
> > > kernel/futex.c | 24 +++++++++++++++++++-----
> > > 1 files changed, 19 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/kernel/futex.c b/kernel/futex.c
> > > index 24ac437..d114fee 100644
> > > --- a/kernel/futex.c
> > > +++ b/kernel/futex.c
> > > @@ -716,7 +716,8 @@ retry:
> > > if (unlikely(ret)) {
> > > switch (ret) {
> > > case -ESRCH:
> > > - trace_printk("lookup_pi_state: -ESRCH\n");
> > > + trace_printk("lookup_pi_state: -ESRCH for pid=%u\n",
> > > + uval & FUTEX_TID_MASK);
> > > /*
> > > * No owner found for this futex. Check if the
> > > * OWNER_DIED bit is set to figure out whether
> > > @@ -2070,8 +2071,10 @@ retry:
> > > * again. If it succeeds then we can return without waking
> > > * anyone else up:
> > > */
> > > - if (!(uval & FUTEX_OWNER_DIED))
> > > + if (!(uval & FUTEX_OWNER_DIED)) {
> > > uval = cmpxchg_futex_value_locked(uaddr, task_pid_vnr(current), 0);
> > > + trace_printk("futex_unlock_pi: TID->0 transition %u\n", uval);
> > > + }
> > >
> > >
> > > if (unlikely(uval == -EFAULT))
> > > @@ -2080,8 +2083,10 @@ retry:
> > > * Rare case: we managed to release the lock atomically,
> > > * no need to wake anyone else up:
> > > */
> > > - if (unlikely(uval == task_pid_vnr(current)))
> > > + if (unlikely(uval == task_pid_vnr(current))) {
> > > + trace_printk("futex_unlock_pi: release without wakeup\n");
> > > goto out_unlock;
> > > + }
> > >
> > > /*
> > > * Ok, other tasks may need to be woken up - check waiters
> > > @@ -2093,6 +2098,7 @@ retry:
> > > if (!match_futex (&this->key, &key))
> > > continue;
> > > ret = wake_futex_pi(uaddr, uval, this);
> > > + trace_printk("futex_unlock_pi: wake ret=%d uval=%u this=%p\n", ret, uval, this);
> > > /*
> > > * The atomic access to the futex value
> > > * generated a pagefault, so retry the
> > > @@ -2107,6 +2113,8 @@ retry:
> > > */
> > > if (!(uval & FUTEX_OWNER_DIED)) {
> > > ret = unlock_futex_pi(uaddr, uval);
> > > + trace_printk("futex_unlock_pi: no waiters, unlock the futex ret=%d uval=%d\n",
> > > + ret, uval);
> > > if (ret == -EFAULT)
> > > goto pi_faulted;
> > > }
> > > @@ -2600,12 +2608,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
> > > ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
> > > break;
> > > case FUTEX_LOCK_PI:
> > > - if (futex_cmpxchg_enabled)
> > > + if (futex_cmpxchg_enabled) {
> > > + trace_printk("futex_lock_pi start\n");
> > > ret = futex_lock_pi(uaddr, fshared, val, timeout, 0);
> > > + trace_printk("futex_lock_pi done ret=%d\n", ret);
> > > + }
> > > break;
> > > case FUTEX_UNLOCK_PI:
> > > - if (futex_cmpxchg_enabled)
> > > + if (futex_cmpxchg_enabled) {
> > > + trace_printk("futex_unlock_pi start\n");
> > > ret = futex_unlock_pi(uaddr, fshared);
> > > + trace_printk("futex_unlock_pi done ret=%d\n", ret);
> > > + }
> > > break;
> > > case FUTEX_TRYLOCK_PI:
> > > if (futex_cmpxchg_enabled)
> > > --
> > > 1.7.1
> > >
> > > >
> > > > --
> > > > Darren Hart
> > > >
> > > > From 92014a07df73489460ff788274506255ff0f775d Mon Sep 17 00:00:00 2001
> > > > From: Darren Hart <[email protected]>
> > > > Date: Fri, 25 Jun 2010 13:54:25 -0700
> > > > Subject: [PATCH] robust pi futex tracing
> > > >
> > > > ---
> > > > kernel/futex.c | 24 ++++++++++++++++++++----
> > > > 1 files changed, 20 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/kernel/futex.c b/kernel/futex.c
> > > > index e7a35f1..24ac437 100644
> > > > --- a/kernel/futex.c
> > > > +++ b/kernel/futex.c
> > > > @@ -683,6 +683,8 @@ retry:
> > > > */
> > > > if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
> > > > /* Keep the OWNER_DIED bit */
> > > > + if (ownerdied)
> > > > + trace_printk("ownerdied, taking over lock\n");
> > > > newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(task);
> > > > ownerdied = 0;
> > > > lock_taken = 1;
> > > > @@ -692,14 +694,18 @@ retry:
> > > >
> > > > if (unlikely(curval == -EFAULT))
> > > > return -EFAULT;
> > > > - if (unlikely(curval != uval))
> > > > + if (unlikely(curval != uval)) {
> > > > + trace_printk("cmpxchg failed, retrying\n");
> > > > goto retry;
> > > > + }
> > > >
> > > > /*
> > > > * We took the lock due to owner died take over.
> > > > */
> > > > - if (unlikely(lock_taken))
> > > > + if (unlikely(lock_taken)) {
> > > > + trace_printk("ownerdied, lock acquired, return 1\n");
> > > > return 1;
> > > > + }
> > > >
> > > > /*
> > > > * We dont have the lock. Look up the PI state (or create it if
> > > > @@ -710,13 +716,16 @@ retry:
> > > > if (unlikely(ret)) {
> > > > switch (ret) {
> > > > case -ESRCH:
> > > > + trace_printk("lookup_pi_state: -ESRCH\n");
> > > > /*
> > > > * No owner found for this futex. Check if the
> > > > * OWNER_DIED bit is set to figure out whether
> > > > * this is a robust futex or not.
> > > > */
> > > > - if (get_futex_value_locked(&curval, uaddr))
> > > > + if (get_futex_value_locked(&curval, uaddr)) {
> > > > + trace_printk("get_futex_value_locked: -EFAULT\n");
> > > > return -EFAULT;
> > > > + }
> > > >
> > > > /*
> > > > * We simply start over in case of a robust
> > > > @@ -724,10 +733,13 @@ retry:
> > > > * and return happy.
> > > > */
> > > > if (curval & FUTEX_OWNER_DIED) {
> > > > + trace_printk("ownerdied, goto retry\n");
> > > > ownerdied = 1;
> > > > goto retry;
> > > > }
> > > > + trace_printk("ownerdied not detected, returning -ESRCH\n");
> > > > default:
> > > > + trace_printk("lookup_pi_state: %d\n", ret);
> > > > break;
> > > > }
> > > > }
> > > > @@ -1950,6 +1962,8 @@ retry_private:
> > > > put_futex_key(fshared, &q.key);
> > > > cond_resched();
> > > > goto retry;
> > > > + case -ESRCH:
> > > > + trace_printk("returning -ESRCH to userspace\n");
> > > > default:
> > > > goto out_unlock_put_key;
> > > > }
> > > > @@ -2537,8 +2551,10 @@ void exit_robust_list(struct task_struct *curr)
> > > > /*
> > > > * Avoid excessively long or circular lists:
> > > > */
> > > > - if (!--limit)
> > > > + if (!--limit) {
> > > > + trace_printk("excessively long list, aborting\n");
> > > > break;
> > > > + }
> > > >
> > > > cond_resched();
> > > > }
> > > > --
> > > > 1.7.0.4
> > > >
> > > > --
> > > > Darren Hart
> > > > IBM Linux Technology Center
> > > > Real-Time Linux Team
> > >
> > > --
> > > Michal Hocko
> > > L3 team
> > > SUSE LINUX s.r.o.
> > > Lihovarska 1060/12
> > > 190 00 Praha 9
> > > Czech Republic
> >
> > --
> > Michal Hocko
> > L3 team
> > SUSE LINUX s.r.o.
> > Lihovarska 1060/12
> > 190 00 Praha 9
> > Czech Republic
>
> --
> Michal Hocko
> L3 team
> SUSE LINUX s.r.o.
> Lihovarska 1060/12
> 190 00 Praha 9
> Czech Republic

--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2010-06-28 16:46:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?

On Mon, 2010-06-28 at 18:39 +0200, Michal Hocko wrote:

> [I am sorry for bombarding you with emails]

You can lessen the burden on all of us by trimming your emails more
aggressively. Your last one has 4 dead tails in it.

2010-06-28 16:49:42

by Peter Zijlstra

[permalink] [raw]
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?

On Mon, 2010-06-28 at 18:39 +0200, Michal Hocko wrote:
> Would something like the following be acceptable (just a compile
> tested without comments). It simply makes caller of lookup_pi_state to
> decide whether credentials should be checked.

So it was Ingo, who in c87e2837be8 (pi-futex:
futex_lock_pi/futex_unlock_pi support) introduced the euid checks:

+futex_find_get_task():
+ if ((current->euid != p->euid) && (current->euid != p->uid)) {
+ p = NULL;
+ goto out_unlock;
+ }

Ingo, do you remember the rationale behind that? It seems to be causing
grief when two different users contend on the same (shared) futex.

See the below proposed solution.

> diff --git a/kernel/futex.c b/kernel/futex.c
> index e7a35f1..dfe4b11 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -426,7 +426,7 @@ static void free_pi_state(struct futex_pi_state *pi_state)
> * Look up the task based on what TID userspace gave us.
> * We dont trust it.
> */
> -static struct task_struct * futex_find_get_task(pid_t pid)
> +static struct task_struct * futex_find_get_task(pid_t pid, bool check_cred)
> {
> struct task_struct *p;
> const struct cred *cred = current_cred(), *pcred;
> @@ -436,10 +436,12 @@ static struct task_struct * futex_find_get_task(pid_t pid)
> if (!p) {
> p = ERR_PTR(-ESRCH);
> } else {
> - pcred = __task_cred(p);
> - if (cred->euid != pcred->euid &&
> - cred->euid != pcred->uid)
> - p = ERR_PTR(-ESRCH);
> + if (check_cred) {
> + pcred = __task_cred(p);
> + if (cred->euid != pcred->euid &&
> + cred->euid != pcred->uid)
> + p = ERR_PTR(-ESRCH);
> + }
> else
> get_task_struct(p);
> }
> @@ -506,7 +508,7 @@ void exit_pi_state_list(struct task_struct *curr)
>
> static int
> lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
> - union futex_key *key, struct futex_pi_state **ps)
> + union futex_key *key, struct futex_pi_state **ps, bool check_cred)
> {
> struct futex_pi_state *pi_state = NULL;
> struct futex_q *this, *next;
> @@ -563,7 +565,7 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
> */
> if (!pid)
> return -ESRCH;
> - p = futex_find_get_task(pid);
> + p = futex_find_get_task(pid, check_cred);
> if (IS_ERR(p))
> return PTR_ERR(p);
>
> @@ -705,7 +707,7 @@ retry:
> * We dont have the lock. Look up the PI state (or create it if
> * we are the first waiter):
> */
> - ret = lookup_pi_state(uval, hb, key, ps);
> + ret = lookup_pi_state(uval, hb, key, ps, false);
>
> if (unlikely(ret)) {
> switch (ret) {
> @@ -1258,7 +1260,7 @@ retry_private:
> ret = get_futex_value_locked(&curval2, uaddr2);
> if (!ret)
> ret = lookup_pi_state(curval2, hb2, &key2,
> - &pi_state);
> + &pi_state, true);
> }
>
> switch (ret) {

2010-06-28 16:56:47

by Michal Hocko

[permalink] [raw]
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?

On Mon 28-06-10 18:45:50, Peter Zijlstra wrote:
> On Mon, 2010-06-28 at 18:39 +0200, Michal Hocko wrote:
>
> > [I am sorry for bombarding you with emails]
>
> You can lessen the burden on all of us by trimming your emails more
> aggressively.

OK.

> Your last one has 4 dead tails in it.

I just wanted to have all tracing patches in there, but you are right I
should have removed that they are not helpful anymore. Sorry about that.

--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2010-06-29 08:42:37

by Michal Hocko

[permalink] [raw]
Subject: [PATCH] futex: futex_find_get_task make credentials check conditional

On Mon 28-06-10 18:49:08, Peter Zijlstra wrote:
> On Mon, 2010-06-28 at 18:39 +0200, Michal Hocko wrote:
> > Would something like the following be acceptable (just a compile
> > tested without comments). It simply makes caller of lookup_pi_state to
> > decide whether credentials should be checked.
>
> So it was Ingo, who in c87e2837be8 (pi-futex:
> futex_lock_pi/futex_unlock_pi support) introduced the euid checks:
>
> +futex_find_get_task():
> + if ((current->euid != p->euid) && (current->euid != p->uid)) {
> + p = NULL;
> + goto out_unlock;
> + }
>
> Ingo, do you remember the rationale behind that? It seems to be causing
> grief when two different users contend on the same (shared) futex.
>
> See the below proposed solution.

Here is the patch with comments and rationale:
(reference to the original discussion: http://lkml.org/lkml/2010/6/23/52)

--
>From f477a6d989dfde11c5bb5f28d5ce21d0682f4e25 Mon Sep 17 00:00:00 2001
From: Michal Hocko <[email protected]>
Date: Tue, 29 Jun 2010 10:02:58 +0200
Subject: [PATCH] futex: futex_find_get_task make credentials check conditional

futex_find_get_task is currently used (through lookup_pi_state) from two
contexts, futex_requeue and futex_lock_pi_atomic. While credentials check
makes sense in the first code path, the second one is more problematic
because this check requires that the PI lock holder (pid parameter) has
the same uid and euid as the process's euid which is trying to lock the
same futex (current).

This results in glibc assert failure or process hang (if glibc is
compiled without assert support) for shared robust pthread mutex with
priority inheritance if a process tries to lock already held lock owned
by a process with a different euid:

pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.

The problem is that futex_lock_pi_atomic which is called when we try to
lock already held lock checks the current holder (tid is stored in the
futex value) to get the PI state. It uses lookup_pi_state which in turn
gets task struct from futex_find_get_task. ESRCH is returned either when
the task is not found or if credentials check fails.
futex_lock_pi_atomic simply returns if it gets ESRCH. glibc code,
however, doesn't expect that robust lock returns with ESRCH because it
should get either success or owner died.

Let's make credentials check conditional (as a new parameter) in
futex_find_get_task. Then we can prevent from check in the pi lock path
and still preserve it in the futex_requeue path.

Signed-off-by: Michal Hocko <[email protected]>
---
kernel/futex.c | 24 +++++++++++++++---------
1 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e7a35f1..79b69e5 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -425,8 +425,9 @@ static void free_pi_state(struct futex_pi_state *pi_state)
/*
* Look up the task based on what TID userspace gave us.
* We dont trust it.
+ * Check the credentials if required by check_cred
*/
-static struct task_struct * futex_find_get_task(pid_t pid)
+static struct task_struct * futex_find_get_task(pid_t pid, bool check_cred)
{
struct task_struct *p;
const struct cred *cred = current_cred(), *pcred;
@@ -436,10 +437,12 @@ static struct task_struct * futex_find_get_task(pid_t pid)
if (!p) {
p = ERR_PTR(-ESRCH);
} else {
- pcred = __task_cred(p);
- if (cred->euid != pcred->euid &&
- cred->euid != pcred->uid)
- p = ERR_PTR(-ESRCH);
+ if (check_cred) {
+ pcred = __task_cred(p);
+ if (cred->euid != pcred->euid &&
+ cred->euid != pcred->uid)
+ p = ERR_PTR(-ESRCH);
+ }
else
get_task_struct(p);
}
@@ -504,9 +507,10 @@ void exit_pi_state_list(struct task_struct *curr)
raw_spin_unlock_irq(&curr->pi_lock);
}

+/* check_cred is just passed through to futex_find_get_task */
static int
lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
- union futex_key *key, struct futex_pi_state **ps)
+ union futex_key *key, struct futex_pi_state **ps, bool check_cred)
{
struct futex_pi_state *pi_state = NULL;
struct futex_q *this, *next;
@@ -563,7 +567,7 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
*/
if (!pid)
return -ESRCH;
- p = futex_find_get_task(pid);
+ p = futex_find_get_task(pid, check_cred);
if (IS_ERR(p))
return PTR_ERR(p);

@@ -704,8 +708,10 @@ retry:
/*
* We dont have the lock. Look up the PI state (or create it if
* we are the first waiter):
+ * Do not ask for credentials check because we want to share the
+ * lock between processes with different (e)uids
*/
- ret = lookup_pi_state(uval, hb, key, ps);
+ ret = lookup_pi_state(uval, hb, key, ps, false);

if (unlikely(ret)) {
switch (ret) {
@@ -1258,7 +1264,7 @@ retry_private:
ret = get_futex_value_locked(&curval2, uaddr2);
if (!ret)
ret = lookup_pi_state(curval2, hb2, &key2,
- &pi_state);
+ &pi_state, true);
}

switch (ret) {
--
1.7.1


--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2010-06-29 14:56:39

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH] futex: futex_find_get_task make credentials check conditional

On 06/29/2010 01:42 AM, Michal Hocko wrote:
> On Mon 28-06-10 18:49:08, Peter Zijlstra wrote:
>> On Mon, 2010-06-28 at 18:39 +0200, Michal Hocko wrote:
>>> Would something like the following be acceptable (just a compile
>>> tested without comments). It simply makes caller of lookup_pi_state to
>>> decide whether credentials should be checked.
>>
>> So it was Ingo, who in c87e2837be8 (pi-futex:
>> futex_lock_pi/futex_unlock_pi support) introduced the euid checks:
>>
>> +futex_find_get_task():
>> + if ((current->euid != p->euid)&& (current->euid != p->uid)) {
>> + p = NULL;
>> + goto out_unlock;
>> + }
>>
>> Ingo, do you remember the rationale behind that? It seems to be causing
>> grief when two different users contend on the same (shared) futex.
>>
>> See the below proposed solution.
>
> Here is the patch with comments and rationale:
> (reference to the original discussion: http://lkml.org/lkml/2010/6/23/52)
>
> --
> From f477a6d989dfde11c5bb5f28d5ce21d0682f4e25 Mon Sep 17 00:00:00 2001
> From: Michal Hocko<[email protected]>
> Date: Tue, 29 Jun 2010 10:02:58 +0200
> Subject: [PATCH] futex: futex_find_get_task make credentials check conditional
>
> futex_find_get_task is currently used (through lookup_pi_state) from two
> contexts, futex_requeue and futex_lock_pi_atomic. While credentials check
> makes sense in the first code path, the second one is more problematic
> because this check requires that the PI lock holder (pid parameter) has
> the same uid and euid as the process's euid which is trying to lock the
> same futex (current).
>
> This results in glibc assert failure or process hang (if glibc is
> compiled without assert support) for shared robust pthread mutex with
> priority inheritance if a process tries to lock already held lock owned
> by a process with a different euid:
>
> pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.
>
> The problem is that futex_lock_pi_atomic which is called when we try to
> lock already held lock checks the current holder (tid is stored in the
> futex value) to get the PI state. It uses lookup_pi_state which in turn
> gets task struct from futex_find_get_task. ESRCH is returned either when
> the task is not found or if credentials check fails.
> futex_lock_pi_atomic simply returns if it gets ESRCH. glibc code,
> however, doesn't expect that robust lock returns with ESRCH because it
> should get either success or owner died.
>
> Let's make credentials check conditional (as a new parameter) in
> futex_find_get_task. Then we can prevent from check in the pi lock path
> and still preserve it in the futex_requeue path.
>


Hi Michal,

All the above is accurate, however I think it emphasizes glibc's
expectations when the core of the issue is that shared PI futexes don't
work across processes with different uid's.

It seems like most users of shared futexes do so from the same uid,
however I can think of situations where it would be useful to use them
from different uid's. Since shared futexes key on their physical
address, their shouldn't be any security issues with allowing different
uids.


> Signed-off-by: Michal Hocko<[email protected]>
> ---
> kernel/futex.c | 24 +++++++++++++++---------
> 1 files changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index e7a35f1..79b69e5 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -425,8 +425,9 @@ static void free_pi_state(struct futex_pi_state *pi_state)
> /*
> * Look up the task based on what TID userspace gave us.
> * We dont trust it.
> + * Check the credentials if required by check_cred

While we're changing comment blocks, please update it to a proper
kerneldoc function descriptor:

/**
* futex_find_get_task() - Lookup task by TID
* @pid: TID of the task_struct to find
* @check_cred: check credentials (1) or not (0)
*
* Look up the task based on the TID userspace gave us. We don't trust
* it. Optionally check the credentials.
*
* Returns a valid task_struct pointer or an error code embedded in the
* pointer value.
*/

The above should probably also include whatever motivation Ingo comes
back with for having done the uid check in the first place - which I
confess I am not seeing.

> */
> -static struct task_struct * futex_find_get_task(pid_t pid)
> +static struct task_struct * futex_find_get_task(pid_t pid, bool check_cred)

bool is nice, not used elsewhere, but clearly defines purpose. I may
need to update some of the other flags throughout the file in a
follow-on patch.

> {
> struct task_struct *p;
> const struct cred *cred = current_cred(), *pcred;
> @@ -436,10 +437,12 @@ static struct task_struct * futex_find_get_task(pid_t pid)
> if (!p) {
> p = ERR_PTR(-ESRCH);
> } else {
> - pcred = __task_cred(p);
> - if (cred->euid != pcred->euid&&
> - cred->euid != pcred->uid)
> - p = ERR_PTR(-ESRCH);
> + if (check_cred) {
> + pcred = __task_cred(p);
> + if (cred->euid != pcred->euid&&
> + cred->euid != pcred->uid)
> + p = ERR_PTR(-ESRCH);
> + }
> else
> get_task_struct(p);
> }
> @@ -504,9 +507,10 @@ void exit_pi_state_list(struct task_struct *curr)
> raw_spin_unlock_irq(&curr->pi_lock);
> }
>
> +/* check_cred is just passed through to futex_find_get_task */
> static int
> lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
> - union futex_key *key, struct futex_pi_state **ps)
> + union futex_key *key, struct futex_pi_state **ps, bool check_cred)

Wrap at 80.

> {
> struct futex_pi_state *pi_state = NULL;
> struct futex_q *this, *next;
> @@ -563,7 +567,7 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
> */
> if (!pid)
> return -ESRCH;
> - p = futex_find_get_task(pid);
> + p = futex_find_get_task(pid, check_cred);
> if (IS_ERR(p))
> return PTR_ERR(p);
>
> @@ -704,8 +708,10 @@ retry:
> /*
> * We dont have the lock. Look up the PI state (or create it if
> * we are the first waiter):
> + * Do not ask for credentials check because we want to share the
> + * lock between processes with different (e)uids

Please merge the new comments into the old. Keeping the original colon
confuses the comment block. Try:

/*
* We dont have the lock. Look up the PI state (or create it if
* we are the first waiter). Don't ask for a credentials check
* as we need to allow shared locks between processes with
* different (e)uids.

Thanks,

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

2010-06-29 15:25:00

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] futex: futex_find_get_task make credentials check conditional

On Tue 29-06-10 07:56:06, Darren Hart wrote:
> On 06/29/2010 01:42 AM, Michal Hocko wrote:
> >On Mon 28-06-10 18:49:08, Peter Zijlstra wrote:
> >>On Mon, 2010-06-28 at 18:39 +0200, Michal Hocko wrote:
> >>>Would something like the following be acceptable (just a compile
> >>>tested without comments). It simply makes caller of lookup_pi_state to
> >>>decide whether credentials should be checked.
> >>
> >>So it was Ingo, who in c87e2837be8 (pi-futex:
> >>futex_lock_pi/futex_unlock_pi support) introduced the euid checks:
> >>
> >>+futex_find_get_task():
> >>+ if ((current->euid != p->euid)&& (current->euid != p->uid)) {
> >>+ p = NULL;
> >>+ goto out_unlock;
> >>+ }
> >>
> >>Ingo, do you remember the rationale behind that? It seems to be causing
> >>grief when two different users contend on the same (shared) futex.
> >>
> >>See the below proposed solution.
> >
> >Here is the patch with comments and rationale:
> >(reference to the original discussion: http://lkml.org/lkml/2010/6/23/52)
> >
> >--
> > From f477a6d989dfde11c5bb5f28d5ce21d0682f4e25 Mon Sep 17 00:00:00 2001
> >From: Michal Hocko<[email protected]>
> >Date: Tue, 29 Jun 2010 10:02:58 +0200
> >Subject: [PATCH] futex: futex_find_get_task make credentials check conditional
> >
> >futex_find_get_task is currently used (through lookup_pi_state) from two
> >contexts, futex_requeue and futex_lock_pi_atomic. While credentials check
> >makes sense in the first code path, the second one is more problematic
> >because this check requires that the PI lock holder (pid parameter) has
> >the same uid and euid as the process's euid which is trying to lock the
> >same futex (current).
> >
> >This results in glibc assert failure or process hang (if glibc is
> >compiled without assert support) for shared robust pthread mutex with
> >priority inheritance if a process tries to lock already held lock owned
> >by a process with a different euid:
> >
> >pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.
> >
> >The problem is that futex_lock_pi_atomic which is called when we try to
> >lock already held lock checks the current holder (tid is stored in the
> >futex value) to get the PI state. It uses lookup_pi_state which in turn
> >gets task struct from futex_find_get_task. ESRCH is returned either when
> >the task is not found or if credentials check fails.
> >futex_lock_pi_atomic simply returns if it gets ESRCH. glibc code,
> >however, doesn't expect that robust lock returns with ESRCH because it
> >should get either success or owner died.
> >
> >Let's make credentials check conditional (as a new parameter) in
> >futex_find_get_task. Then we can prevent from check in the pi lock path
> >and still preserve it in the futex_requeue path.
> >
>
>
> Hi Michal,

Hey,

>
> All the above is accurate, however I think it emphasizes glibc's
> expectations when the core of the issue is that shared PI futexes
> don't work across processes with different uid's.

I understand that but I failed to find any documentation which would
point that out. The issue is that this may be non-intuitive for users
who are trying to "improve" their application which uses robust mutexes
to use PI and that fails silently (even if glibc would fix the assert).

>
> It seems like most users of shared futexes do so from the same uid,
> however I can think of situations where it would be useful to use
> them from different uid's. Since shared futexes key on their
> physical address, their shouldn't be any security issues with
> allowing different uids.

The original issue came from our customer (working on Firebird). I don't
know many details why they need different users but AFAIU they are
accessing some files and separate functionality into processes with
different users.

I don't see any security concerns for shared locks as well, but I am not
a security guy.

>
>
> >Signed-off-by: Michal Hocko<[email protected]>
> >---
> > kernel/futex.c | 24 +++++++++++++++---------
> > 1 files changed, 15 insertions(+), 9 deletions(-)
> >
> >diff --git a/kernel/futex.c b/kernel/futex.c
> >index e7a35f1..79b69e5 100644
> >--- a/kernel/futex.c
> >+++ b/kernel/futex.c
> >@@ -425,8 +425,9 @@ static void free_pi_state(struct futex_pi_state *pi_state)
> > /*
> > * Look up the task based on what TID userspace gave us.
> > * We dont trust it.
> >+ * Check the credentials if required by check_cred
>
> While we're changing comment blocks, please update it to a proper
> kerneldoc function descriptor:
>
> /**
> * futex_find_get_task() - Lookup task by TID
> * @pid: TID of the task_struct to find
> * @check_cred: check credentials (1) or not (0)
> *
> * Look up the task based on the TID userspace gave us. We don't trust
> * it. Optionally check the credentials.
> *
> * Returns a valid task_struct pointer or an error code embedded in the
> * pointer value.
> */
>
> The above should probably also include whatever motivation Ingo
> comes back with for having done the uid check in the first place -
> which I confess I am not seeing.

OK, no problem.

>
> > */
> >-static struct task_struct * futex_find_get_task(pid_t pid)
> >+static struct task_struct * futex_find_get_task(pid_t pid, bool check_cred)
>
> bool is nice, not used elsewhere, but clearly defines purpose. I may
> need to update some of the other flags throughout the file in a
> follow-on patch.
>
> > {
> > struct task_struct *p;
> > const struct cred *cred = current_cred(), *pcred;
> >@@ -436,10 +437,12 @@ static struct task_struct * futex_find_get_task(pid_t pid)
> > if (!p) {
> > p = ERR_PTR(-ESRCH);
> > } else {
> >- pcred = __task_cred(p);
> >- if (cred->euid != pcred->euid&&
> >- cred->euid != pcred->uid)
> >- p = ERR_PTR(-ESRCH);
> >+ if (check_cred) {
> >+ pcred = __task_cred(p);
> >+ if (cred->euid != pcred->euid&&
> >+ cred->euid != pcred->uid)
> >+ p = ERR_PTR(-ESRCH);
> >+ }
> > else
> > get_task_struct(p);
> > }
> >@@ -504,9 +507,10 @@ void exit_pi_state_list(struct task_struct *curr)
> > raw_spin_unlock_irq(&curr->pi_lock);
> > }
> >
> >+/* check_cred is just passed through to futex_find_get_task */
> > static int
> > lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
> >- union futex_key *key, struct futex_pi_state **ps)
> >+ union futex_key *key, struct futex_pi_state **ps, bool check_cred)
>
> Wrap at 80.

Sure

>
> > {
> > struct futex_pi_state *pi_state = NULL;
> > struct futex_q *this, *next;
> >@@ -563,7 +567,7 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
> > */
> > if (!pid)
> > return -ESRCH;
> >- p = futex_find_get_task(pid);
> >+ p = futex_find_get_task(pid, check_cred);
> > if (IS_ERR(p))
> > return PTR_ERR(p);
> >
> >@@ -704,8 +708,10 @@ retry:
> > /*
> > * We dont have the lock. Look up the PI state (or create it if
> > * we are the first waiter):
> >+ * Do not ask for credentials check because we want to share the
> >+ * lock between processes with different (e)uids
>
> Please merge the new comments into the old. Keeping the original
> colon confuses the comment block. Try:
>
> /*
> * We dont have the lock. Look up the PI state (or create it if
> * we are the first waiter). Don't ask for a credentials check
> * as we need to allow shared locks between processes with
> * different (e)uids.

Sure.

>
> Thanks,

Thanks for comments!

>
> --
> Darren Hart
> IBM Linux Technology Center
> Real-Time Linux Team

--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2010-06-29 16:41:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] futex: futex_find_get_task make credentials check conditional

On Tue, Jun 29, 2010 at 1:42 AM, Michal Hocko <[email protected]> wrote:
>
> futex_find_get_task is currently used (through lookup_pi_state) from two
> contexts, futex_requeue and futex_lock_pi_atomic. While credentials check
> makes sense in the first code path, the second one is more problematic
> because this check requires that the PI lock holder (pid parameter) has
> the same uid and euid as the process's euid which is trying to lock the
> same futex (current).

So exactly why does it make sense to check the credentials in the
first code path then? Shouldn't the futex issue in the end depend on
whether you have a shared page or not - and not on credentials at all?
Any two processes that share a futex in the same shared page should be
able to use that without any regard for whether they are the same
user. That's kind of the point, no?

IOW, I personally dislike these kinds of conditional checks,
especially since the discussion (at least the part I've seen) hasn't
made it clear why it should be conditional - or exist - in the first
place.

So I'd like the patch to include an explanation of exactly why the two
cases are different.

The other thing I'd like to see is to move the whole cred checking up
a level. There's no reason to check the credentials in
futex_find_get_task() that I can see - why not do it in the caller
instead? IOW, I think futex_find_get_task() should just look something
like this instead:


static struct task_struct * futex_find_get_task(pid_t pid)
{
struct task_struct *p;

rcu_read_lock();
p = find_task_by_vpid(pid);
if (p)
get_task_struct(p);
rcu_read_unlock();

return p;
}

and then in the caller we'd do something like

p = futex_find_get_task(pid);
if (!p)
return -ESEARCH;

if ( .. check p credentials is necessary and fails..)
goto put_task_and_exit;

because especially with not everybody needing the credentials check, I
do not think it should be done at the lowest level (it's clearly not
fundamental to the operation, so it shouldn't be part of the core
lookup).

With some re-factoring, it might even be possible to avoid a dynamic
check at all, and just have two different static paths for the two
cases. That's a separate issue, though.

Linus

2010-06-29 16:58:29

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH] futex: futex_find_get_task make credentials check conditional

On 06/29/2010 09:41 AM, Linus Torvalds wrote:
> On Tue, Jun 29, 2010 at 1:42 AM, Michal Hocko<[email protected]> wrote:
>>
>> futex_find_get_task is currently used (through lookup_pi_state) from two
>> contexts, futex_requeue and futex_lock_pi_atomic. While credentials check
>> makes sense in the first code path, the second one is more problematic
>> because this check requires that the PI lock holder (pid parameter) has
>> the same uid and euid as the process's euid which is trying to lock the
>> same futex (current).
>
> So exactly why does it make sense to check the credentials in the
> first code path then? Shouldn't the futex issue in the end depend on
> whether you have a shared page or not - and not on credentials at all?
> Any two processes that share a futex in the same shared page should be
> able to use that without any regard for whether they are the same
> user. That's kind of the point, no?

I agree and haven't been able to come up with a need for the test
either, but I wanted to hear back from Ingo as the he authored the
original check.

I was trying to see if futex_lock_pi() could somehow be abused, but if
so, I don't see it:

TaskUserA TaskUserB
futex_lock_pi(addrA)
*addrB = TID_OF(TaskUserA)
futex_lock_pi(addrB)

TaskUserB would lookup the pi_state, not find it as addrB and addrA
don't hash to the same key, create a new pi_state and mark TaskUserA as
the owner, then block.

Once TaskUserA exits, the pi_list will contain the pi_state for the
addrB futex. This is "bad", but the kernel cleans it up, releases the
lock - but doesn't wake TaskUserB. That seems acceptable to me since
TaskUserB is in the wrong here.


> IOW, I personally dislike these kinds of conditional checks,
> especially since the discussion (at least the part I've seen) hasn't
> made it clear why it should be conditional - or exist - in the first
> place.
>
> So I'd like the patch to include an explanation of exactly why the two
> cases are different.

Agreed, waiting on Ingo at the moment.

> The other thing I'd like to see is to move the whole cred checking up
> a level. There's no reason to check the credentials in
> futex_find_get_task() that I can see - why not do it in the caller
> instead? IOW, I think futex_find_get_task() should just look something
> like this instead:


/me beats head on desk, duh. Still, I'm hoping this isn't necessary and
we can lose the credentials checking entirely.

Thanks,

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

2010-06-29 18:05:42

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] futex: futex_find_get_task make credentials check conditional

On Tue, 29 Jun 2010, Darren Hart wrote:
> On 06/29/2010 09:41 AM, Linus Torvalds wrote:
> >
> > The other thing I'd like to see is to move the whole cred checking up
> > a level. There's no reason to check the credentials in
> > futex_find_get_task() that I can see - why not do it in the caller
> > instead? IOW, I think futex_find_get_task() should just look something
> > like this instead:
>
>
> /me beats head on desk, duh. Still, I'm hoping this isn't necessary and we can
> lose the credentials checking entirely.

I did some archaeology in my mail and code archives.

The only hint I found from the early days of the PI futex code is a
mail which mutters something about pid reuse. The euid/uid check
surfaced in the code archive right after that, but to be honest I fail
to see how it should be related. I rather believe that the whole thing
resulted out of being overly paranoid about the PI usage.

Jakub, any idea why this check got there ?

Today I cannot see a real reason to keep that check, it just adds a
restriction for no benefit.

Thanks,

tglx

2010-06-30 07:01:21

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] futex: futex_find_get_task make credentials check conditional

On Tue 29-06-10 09:41:02, Linus Torvalds wrote:
> On Tue, Jun 29, 2010 at 1:42 AM, Michal Hocko <[email protected]> wrote:
> >
> > futex_find_get_task is currently used (through lookup_pi_state) from two
> > contexts, futex_requeue and futex_lock_pi_atomic. While credentials check
> > makes sense in the first code path, the second one is more problematic
> > because this check requires that the PI lock holder (pid parameter) has
> > the same uid and euid as the process's euid which is trying to lock the
> > same futex (current).
>
> So exactly why does it make sense to check the credentials in the
> first code path then?

I though that requeue needs this for security reasons (don't let requeue
process for other user), but when I thought about that again you are
right and the only what matters should be accessibility of the shared
memory.


--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2010-06-30 09:55:32

by Michal Hocko

[permalink] [raw]
Subject: [PATCH] futex: futex_find_get_task remove credentails check

On Wed 30-06-10 09:01:15, Michal Hocko wrote:
> On Tue 29-06-10 09:41:02, Linus Torvalds wrote:
> > On Tue, Jun 29, 2010 at 1:42 AM, Michal Hocko <[email protected]> wrote:
> > >
> > > futex_find_get_task is currently used (through lookup_pi_state) from two
> > > contexts, futex_requeue and futex_lock_pi_atomic. While credentials check
> > > makes sense in the first code path, the second one is more problematic
> > > because this check requires that the PI lock holder (pid parameter) has
> > > the same uid and euid as the process's euid which is trying to lock the
> > > same futex (current).
> >
> > So exactly why does it make sense to check the credentials in the
> > first code path then?
>
> I though that requeue needs this for security reasons (don't let requeue
> process for other user), but when I thought about that again you are
> right and the only what matters should be accessibility of the shared
> memory.

And here is the patch which does the thing.

--

>From 082c5ad2c482a8e78b61b17e213e750b006176aa Mon Sep 17 00:00:00 2001
From: Michal Hocko <[email protected]>
Date: Wed, 30 Jun 2010 09:51:19 +0200
Subject: [PATCH] futex: futex_find_get_task remove credentails check

futex_find_get_task is currently used (through lookup_pi_state) from two
contexts, futex_requeue and futex_lock_pi_atomic. None of the paths
looks it needs the credentials check, though. Different (e)uids
shouldn't matter at all because the only thing that is important for
shared futex is the accessibility of the shared memory.

The credentail check results in glibc assert failure or process hang (if
glibc is compiled without assert support) for shared robust pthread
mutex with priority inheritance if a process tries to lock already held
lock owned by a process with a different euid:

pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.

The problem is that futex_lock_pi_atomic which is called when we try to
lock already held lock checks the current holder (tid is stored in the
futex value) to get the PI state. It uses lookup_pi_state which in turn
gets task struct from futex_find_get_task. ESRCH is returned either when
the task is not found or if credentials check fails.
futex_lock_pi_atomic simply returns if it gets ESRCH. glibc code,
however, doesn't expect that robust lock returns with ESRCH because it
should get either success or owner died.

Signed-off-by: Michal Hocko <[email protected]>
---
kernel/futex.c | 17 ++++-------------
1 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e7a35f1..6a3a5fa 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -429,20 +429,11 @@ static void free_pi_state(struct futex_pi_state *pi_state)
static struct task_struct * futex_find_get_task(pid_t pid)
{
struct task_struct *p;
- const struct cred *cred = current_cred(), *pcred;

rcu_read_lock();
p = find_task_by_vpid(pid);
- if (!p) {
- p = ERR_PTR(-ESRCH);
- } else {
- pcred = __task_cred(p);
- if (cred->euid != pcred->euid &&
- cred->euid != pcred->uid)
- p = ERR_PTR(-ESRCH);
- else
- get_task_struct(p);
- }
+ if (p)
+ get_task_struct(p);

rcu_read_unlock();

@@ -564,8 +555,8 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
if (!pid)
return -ESRCH;
p = futex_find_get_task(pid);
- if (IS_ERR(p))
- return PTR_ERR(p);
+ if (!p)
+ return -ESRCH;

/*
* We need to look at the task state flags to figure out,
--
1.7.1


--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2010-06-30 16:43:46

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH] futex: futex_find_get_task remove credentails check

On 06/30/2010 02:55 AM, Michal Hocko wrote:
> On Wed 30-06-10 09:01:15, Michal Hocko wrote:
>> On Tue 29-06-10 09:41:02, Linus Torvalds wrote:
>>> On Tue, Jun 29, 2010 at 1:42 AM, Michal Hocko<[email protected]> wrote:
>>>>
>>>> futex_find_get_task is currently used (through lookup_pi_state) from two
>>>> contexts, futex_requeue and futex_lock_pi_atomic. While credentials check
>>>> makes sense in the first code path, the second one is more problematic
>>>> because this check requires that the PI lock holder (pid parameter) has
>>>> the same uid and euid as the process's euid which is trying to lock the
>>>> same futex (current).
>>>
>>> So exactly why does it make sense to check the credentials in the
>>> first code path then?
>>
>> I though that requeue needs this for security reasons (don't let requeue
>> process for other user), but when I thought about that again you are
>> right and the only what matters should be accessibility of the shared
>> memory.
>
> And here is the patch which does the thing.
>
> --
>
> From 082c5ad2c482a8e78b61b17e213e750b006176aa Mon Sep 17 00:00:00 2001
> From: Michal Hocko<[email protected]>
> Date: Wed, 30 Jun 2010 09:51:19 +0200
> Subject: [PATCH] futex: futex_find_get_task remove credentails check
>
> futex_find_get_task is currently used (through lookup_pi_state) from two
> contexts, futex_requeue and futex_lock_pi_atomic. None of the paths
> looks it needs the credentials check, though. Different (e)uids
> shouldn't matter at all because the only thing that is important for
> shared futex is the accessibility of the shared memory.
>
> The credentail check results in glibc assert failure or process hang (if
> glibc is compiled without assert support) for shared robust pthread
> mutex with priority inheritance if a process tries to lock already held
> lock owned by a process with a different euid:
>
> pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.
>
> The problem is that futex_lock_pi_atomic which is called when we try to
> lock already held lock checks the current holder (tid is stored in the
> futex value) to get the PI state. It uses lookup_pi_state which in turn
> gets task struct from futex_find_get_task. ESRCH is returned either when
> the task is not found or if credentials check fails.
> futex_lock_pi_atomic simply returns if it gets ESRCH. glibc code,
> however, doesn't expect that robust lock returns with ESRCH because it
> should get either success or owner died.
>
> Signed-off-by: Michal Hocko<[email protected]>

Without hearing back from Ingo on the original intent of the credentials
check, this looks right to me.

Acked-by: Darren Hart <[email protected]>


> ---
> kernel/futex.c | 17 ++++-------------
> 1 files changed, 4 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index e7a35f1..6a3a5fa 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -429,20 +429,11 @@ static void free_pi_state(struct futex_pi_state *pi_state)
> static struct task_struct * futex_find_get_task(pid_t pid)
> {
> struct task_struct *p;
> - const struct cred *cred = current_cred(), *pcred;
>
> rcu_read_lock();
> p = find_task_by_vpid(pid);
> - if (!p) {
> - p = ERR_PTR(-ESRCH);
> - } else {
> - pcred = __task_cred(p);
> - if (cred->euid != pcred->euid&&
> - cred->euid != pcred->uid)
> - p = ERR_PTR(-ESRCH);
> - else
> - get_task_struct(p);
> - }
> + if (p)
> + get_task_struct(p);
>
> rcu_read_unlock();
>
> @@ -564,8 +555,8 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
> if (!pid)
> return -ESRCH;
> p = futex_find_get_task(pid);
> - if (IS_ERR(p))
> - return PTR_ERR(p);
> + if (!p)
> + return -ESRCH;
>
> /*
> * We need to look at the task state flags to figure out,


--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

2010-07-08 09:28:23

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] futex: futex_find_get_task remove credentails check

On Wed 30-06-10 09:43:27, Darren Hart wrote:
> On 06/30/2010 02:55 AM, Michal Hocko wrote:
> >On Wed 30-06-10 09:01:15, Michal Hocko wrote:
> >>On Tue 29-06-10 09:41:02, Linus Torvalds wrote:
> >>>On Tue, Jun 29, 2010 at 1:42 AM, Michal Hocko<[email protected]> wrote:
> >>>>
> >>>>futex_find_get_task is currently used (through lookup_pi_state) from two
> >>>>contexts, futex_requeue and futex_lock_pi_atomic. While credentials check
> >>>>makes sense in the first code path, the second one is more problematic
> >>>>because this check requires that the PI lock holder (pid parameter) has
> >>>>the same uid and euid as the process's euid which is trying to lock the
> >>>>same futex (current).
> >>>
> >>>So exactly why does it make sense to check the credentials in the
> >>>first code path then?
> >>
> >>I though that requeue needs this for security reasons (don't let requeue
> >>process for other user), but when I thought about that again you are
> >>right and the only what matters should be accessibility of the shared
> >>memory.
> >
> >And here is the patch which does the thing.
> >
> >--
> >
> > From 082c5ad2c482a8e78b61b17e213e750b006176aa Mon Sep 17 00:00:00 2001
> >From: Michal Hocko<[email protected]>
> >Date: Wed, 30 Jun 2010 09:51:19 +0200
> >Subject: [PATCH] futex: futex_find_get_task remove credentails check
> >
> >futex_find_get_task is currently used (through lookup_pi_state) from two
> >contexts, futex_requeue and futex_lock_pi_atomic. None of the paths
> >looks it needs the credentials check, though. Different (e)uids
> >shouldn't matter at all because the only thing that is important for
> >shared futex is the accessibility of the shared memory.
> >
> >The credentail check results in glibc assert failure or process hang (if
> >glibc is compiled without assert support) for shared robust pthread
> >mutex with priority inheritance if a process tries to lock already held
> >lock owned by a process with a different euid:
> >
> >pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.
> >
> >The problem is that futex_lock_pi_atomic which is called when we try to
> >lock already held lock checks the current holder (tid is stored in the
> >futex value) to get the PI state. It uses lookup_pi_state which in turn
> >gets task struct from futex_find_get_task. ESRCH is returned either when
> >the task is not found or if credentials check fails.
> >futex_lock_pi_atomic simply returns if it gets ESRCH. glibc code,
> >however, doesn't expect that robust lock returns with ESRCH because it
> >should get either success or owner died.
> >
> >Signed-off-by: Michal Hocko<[email protected]>
>
> Without hearing back from Ingo on the original intent of the
> credentials check, this looks right to me.

Could you comment on that Ingo, please?

>
> Acked-by: Darren Hart <[email protected]>
>
>
> >---
> > kernel/futex.c | 17 ++++-------------
> > 1 files changed, 4 insertions(+), 13 deletions(-)
> >
> >diff --git a/kernel/futex.c b/kernel/futex.c
> >index e7a35f1..6a3a5fa 100644
> >--- a/kernel/futex.c
> >+++ b/kernel/futex.c
> >@@ -429,20 +429,11 @@ static void free_pi_state(struct futex_pi_state *pi_state)
> > static struct task_struct * futex_find_get_task(pid_t pid)
> > {
> > struct task_struct *p;
> >- const struct cred *cred = current_cred(), *pcred;
> >
> > rcu_read_lock();
> > p = find_task_by_vpid(pid);
> >- if (!p) {
> >- p = ERR_PTR(-ESRCH);
> >- } else {
> >- pcred = __task_cred(p);
> >- if (cred->euid != pcred->euid&&
> >- cred->euid != pcred->uid)
> >- p = ERR_PTR(-ESRCH);
> >- else
> >- get_task_struct(p);
> >- }
> >+ if (p)
> >+ get_task_struct(p);
> >
> > rcu_read_unlock();
> >
> >@@ -564,8 +555,8 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
> > if (!pid)
> > return -ESRCH;
> > p = futex_find_get_task(pid);
> >- if (IS_ERR(p))
> >- return PTR_ERR(p);
> >+ if (!p)
> >+ return -ESRCH;
> >
> > /*
> > * We need to look at the task state flags to figure out,
>
>
> --
> Darren Hart
> IBM Linux Technology Center
> Real-Time Linux Team

--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2010-07-08 09:33:31

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] futex: futex_find_get_task remove credentails check


* Michal Hocko <[email protected]> wrote:

> On Wed 30-06-10 09:43:27, Darren Hart wrote:
> > On 06/30/2010 02:55 AM, Michal Hocko wrote:
> > >On Wed 30-06-10 09:01:15, Michal Hocko wrote:
> > >>On Tue 29-06-10 09:41:02, Linus Torvalds wrote:
> > >>>On Tue, Jun 29, 2010 at 1:42 AM, Michal Hocko<[email protected]> wrote:
> > >>>>
> > >>>>futex_find_get_task is currently used (through lookup_pi_state) from two
> > >>>>contexts, futex_requeue and futex_lock_pi_atomic. While credentials check
> > >>>>makes sense in the first code path, the second one is more problematic
> > >>>>because this check requires that the PI lock holder (pid parameter) has
> > >>>>the same uid and euid as the process's euid which is trying to lock the
> > >>>>same futex (current).
> > >>>
> > >>>So exactly why does it make sense to check the credentials in the
> > >>>first code path then?
> > >>
> > >>I though that requeue needs this for security reasons (don't let requeue
> > >>process for other user), but when I thought about that again you are
> > >>right and the only what matters should be accessibility of the shared
> > >>memory.
> > >
> > >And here is the patch which does the thing.
> > >
> > >--
> > >
> > > From 082c5ad2c482a8e78b61b17e213e750b006176aa Mon Sep 17 00:00:00 2001
> > >From: Michal Hocko<[email protected]>
> > >Date: Wed, 30 Jun 2010 09:51:19 +0200
> > >Subject: [PATCH] futex: futex_find_get_task remove credentails check
> > >
> > >futex_find_get_task is currently used (through lookup_pi_state) from two
> > >contexts, futex_requeue and futex_lock_pi_atomic. None of the paths
> > >looks it needs the credentials check, though. Different (e)uids
> > >shouldn't matter at all because the only thing that is important for
> > >shared futex is the accessibility of the shared memory.
> > >
> > >The credentail check results in glibc assert failure or process hang (if
> > >glibc is compiled without assert support) for shared robust pthread
> > >mutex with priority inheritance if a process tries to lock already held
> > >lock owned by a process with a different euid:
> > >
> > >pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.
> > >
> > >The problem is that futex_lock_pi_atomic which is called when we try to
> > >lock already held lock checks the current holder (tid is stored in the
> > >futex value) to get the PI state. It uses lookup_pi_state which in turn
> > >gets task struct from futex_find_get_task. ESRCH is returned either when
> > >the task is not found or if credentials check fails.
> > >futex_lock_pi_atomic simply returns if it gets ESRCH. glibc code,
> > >however, doesn't expect that robust lock returns with ESRCH because it
> > >should get either success or owner died.
> > >
> > >Signed-off-by: Michal Hocko<[email protected]>
> >
> > Without hearing back from Ingo on the original intent of the
> > credentials check, this looks right to me.
>
> Could you comment on that Ingo, please?

I think that's more of a question to Thomas :-)

My memories are hazy and nothing springs out as some credible original intent.
So please assume it doesnt exist :-)

Ingo

2010-07-08 09:39:27

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] futex: futex_find_get_task remove credentails check

On Thu 08-07-10 11:32:41, Ingo Molnar wrote:
>
> * Michal Hocko <[email protected]> wrote:
>
> > On Wed 30-06-10 09:43:27, Darren Hart wrote:
> > > On 06/30/2010 02:55 AM, Michal Hocko wrote:
> > > >On Wed 30-06-10 09:01:15, Michal Hocko wrote:
> > > >>On Tue 29-06-10 09:41:02, Linus Torvalds wrote:
> > > >>>On Tue, Jun 29, 2010 at 1:42 AM, Michal Hocko<[email protected]> wrote:
> > > >>>>
> > > >>>>futex_find_get_task is currently used (through lookup_pi_state) from two
> > > >>>>contexts, futex_requeue and futex_lock_pi_atomic. While credentials check
> > > >>>>makes sense in the first code path, the second one is more problematic
> > > >>>>because this check requires that the PI lock holder (pid parameter) has
> > > >>>>the same uid and euid as the process's euid which is trying to lock the
> > > >>>>same futex (current).
> > > >>>
> > > >>>So exactly why does it make sense to check the credentials in the
> > > >>>first code path then?
> > > >>
> > > >>I though that requeue needs this for security reasons (don't let requeue
> > > >>process for other user), but when I thought about that again you are
> > > >>right and the only what matters should be accessibility of the shared
> > > >>memory.
> > > >
> > > >And here is the patch which does the thing.
> > > >
> > > >--
> > > >
> > > > From 082c5ad2c482a8e78b61b17e213e750b006176aa Mon Sep 17 00:00:00 2001
> > > >From: Michal Hocko<[email protected]>
> > > >Date: Wed, 30 Jun 2010 09:51:19 +0200
> > > >Subject: [PATCH] futex: futex_find_get_task remove credentails check
> > > >
> > > >futex_find_get_task is currently used (through lookup_pi_state) from two
> > > >contexts, futex_requeue and futex_lock_pi_atomic. None of the paths
> > > >looks it needs the credentials check, though. Different (e)uids
> > > >shouldn't matter at all because the only thing that is important for
> > > >shared futex is the accessibility of the shared memory.
> > > >
> > > >The credentail check results in glibc assert failure or process hang (if
> > > >glibc is compiled without assert support) for shared robust pthread
> > > >mutex with priority inheritance if a process tries to lock already held
> > > >lock owned by a process with a different euid:
> > > >
> > > >pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.
> > > >
> > > >The problem is that futex_lock_pi_atomic which is called when we try to
> > > >lock already held lock checks the current holder (tid is stored in the
> > > >futex value) to get the PI state. It uses lookup_pi_state which in turn
> > > >gets task struct from futex_find_get_task. ESRCH is returned either when
> > > >the task is not found or if credentials check fails.
> > > >futex_lock_pi_atomic simply returns if it gets ESRCH. glibc code,
> > > >however, doesn't expect that robust lock returns with ESRCH because it
> > > >should get either success or owner died.
> > > >
> > > >Signed-off-by: Michal Hocko<[email protected]>
> > >
> > > Without hearing back from Ingo on the original intent of the
> > > credentials check, this looks right to me.
> >
> > Could you comment on that Ingo, please?
>
> I think that's more of a question to Thomas :-)
>
> My memories are hazy and nothing springs out as some credible original intent.
> So please assume it doesnt exist :-)

OK, so do you need an ACK from Thomas, or can you grab the patch and
push it through one of your trees?

>
> Ingo

--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2010-07-08 09:44:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] futex: futex_find_get_task remove credentails check

On Thu, 2010-07-08 at 11:39 +0200, Michal Hocko wrote:
> OK, so do you need an ACK from Thomas, or can you grab the patch and
> push it through one of your trees?
>
Have a look at Linus' tree.

2010-07-08 09:50:53

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] futex: futex_find_get_task remove credentails check

On Thu 08-07-10 11:43:53, Peter Zijlstra wrote:
> On Thu, 2010-07-08 at 11:39 +0200, Michal Hocko wrote:
> > OK, so do you need an ACK from Thomas, or can you grab the patch and
> > push it through one of your trees?
> >
> Have a look at Linus' tree.

Ahh, you mean 7a0ea09ad5352efce8fe79ed853150449903b9f5. I didn't get any
email about the committing.

Thanks!

--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic