2011-03-28 07:40:37

by xby

Subject: PROBLEM:a bug about pi-futex maybe let the program going to hang

hi, all.

There may be a bug in pi-futex that can cause a user-space program to hang.

We have a board with a dual-core PowerPC 8572 CPU. After running for one month, the user-space state of a pi-futex went bad: mutex->__data.__lock is 0x8000023e, mutex->__data.__count is 0, and mutex->__data.__owner is 0.
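
For reference, the reported lock word can be split into its fields using the futex word layout from <linux/futex.h>. The following is a minimal sketch, not code from the report: the WORD_* macros and the helper names are hypothetical, chosen only for illustration, and the macro values are assumed to match FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to carry the same values as FUTEX_WAITERS, FUTEX_OWNER_DIED and
 * FUTEX_TID_MASK in <linux/futex.h>; renamed (hypothetical names) so this
 * sketch does not clash with that header. */
#define WORD_WAITERS    0x80000000u
#define WORD_OWNER_DIED 0x40000000u
#define WORD_TID_MASK   0x3fffffffu

/* TID of the owner recorded in the futex word (0 means unowned). */
static uint32_t word_owner_tid(uint32_t word)
{
    return word & WORD_TID_MASK;
}

/* Non-zero if the kernel marked the word as having waiters. */
static int word_has_waiters(uint32_t word)
{
    return (word & WORD_WAITERS) != 0;
}

/* Non-zero if the owner-died bit is set. */
static int word_owner_died(uint32_t word)
{
    return (word & WORD_OWNER_DIED) != 0;
}

/* Formats a one-line summary of the word into buf; returns snprintf's result. */
static int describe_lock_word(uint32_t word, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "tid=%u waiters=%d owner_died=%d",
                    (unsigned)word_owner_tid(word),
                    word_has_waiters(word),
                    word_owner_died(word));
}
```

Applied to 0x8000023e, these helpers give owner TID 574 (0x23e) with the waiters bit set and the owner-died bit clear, while the glibc-side __owner field in the same dump is 0.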

I then reviewed the file "kernel/futex.c" (Linux 2.6.38) and found the following case:

Suppose there are 3 threads, named threadA, threadB and threadC. threadA holds mutexM, and threadB and threadC are waiting on mutexM. They run through the following steps:

1. threadB and threadC sleep at line 1984.
2. threadB receives a signal and is woken up.
3. threadA unlocks mutexM and gives mutexM to threadB.
4. threadB calls fixup_owner and tries to give the mutex to threadC.
5. At line 1580, threadB triggers an address fault and jumps to handle_fault.
6. At line 1617, threadB releases the spinlock and then handles the fault.
7. threadC takes the spinlock, calls fixup_owner, and gets mutexM.
8. threadC gives mutexM to threadB.
9. threadB retakes the spinlock, finds "pi_state->owner == oldowner", and retries the fixup.
10. threadB gives mutexM to threadC, which is wrong.

We have written a program that demonstrates all of the above.
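
The ordering in steps 3 to 10 can be replayed as a small single-threaded model. This is only a sketch of the sequence as reported, not kernel code and not the reporter's test program: the struct, the enum and both function names are hypothetical, and the model assumes that the retry decision depends solely on comparing the recorded owner with the snapshot taken before the fault. It does not show that the kernel can actually reach this interleaving.

```c
#include <assert.h>
#include <stdio.h>

enum thread_id { NOBODY = 0, THREAD_A, THREAD_B, THREAD_C };

/* Hypothetical stand-in for the kernel's pi_state; only the owner is modelled. */
struct model_pi_state {
    enum thread_id owner;
};

/* The check described in step 9: an unchanged owner is taken to mean that
 * nobody else fixed things up while the spinlock was dropped. */
static int must_retry_fixup(const struct model_pi_state *s,
                            enum thread_id oldowner)
{
    return s->owner == oldowner;
}

/* Replays the reported steps in order and returns the final recorded owner. */
static enum thread_id replay_reported_sequence(void)
{
    struct model_pi_state s = { THREAD_A };   /* threadA holds mutexM */
    enum thread_id oldowner;

    s.owner = THREAD_B;        /* step 3: threadA unlocks, hands M to threadB */
    oldowner = s.owner;        /* step 4: threadB enters fixup, snapshots owner */
    assert(oldowner == THREAD_B);

    /* steps 5-6: threadB faults and drops the spinlock */
    s.owner = THREAD_C;        /* step 7: threadC fixes up and gets M */
    s.owner = THREAD_B;        /* step 8: threadC hands M back to threadB */

    /* step 9: threadB retakes the spinlock; the owner looks unchanged */
    if (must_retry_fixup(&s, oldowner))
        s.owner = THREAD_C;    /* step 10: stale retry gives M to threadC */

    printf("final owner id: %d\n", (int)s.owner);
    return s.owner;
}
```

In this model the owner goes from threadB to threadC and back to threadB while the lock is dropped, so the equality check cannot tell the changed state from the original one, and the retry ends with threadC recorded as owner.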


2011-03-28 08:26:50

by Peter Zijlstra

Subject: Re: PROBLEM:a bug about pi-futex maybe let the program going to hang

On Mon, 2011-03-28 at 15:25 +0800, xby wrote:
> hi, all.

Works better if you also CC people who actually work on that code.

> There may be a bug in pi-futex that can cause a user-space program to hang.
>
> We have a board with a dual-core PowerPC 8572 CPU. After running for one month, the user-space state of a pi-futex went bad: mutex->__data.__lock is 0x8000023e, mutex->__data.__count is 0, and mutex->__data.__owner is 0.
>
> I then reviewed the file "kernel/futex.c" (Linux 2.6.38) and found the following case:
>
> Suppose there are 3 threads, named threadA, threadB and threadC. threadA holds mutexM, and threadB and threadC are waiting on mutexM. They run through the following steps:
>
> 1. threadB and threadC sleep at line 1984.
> 2. threadB receives a signal and is woken up.
> 3. threadA unlocks mutexM and gives mutexM to threadB.
> 4. threadB calls fixup_owner and tries to give the mutex to threadC.
> 5. At line 1580, threadB triggers an address fault and jumps to handle_fault.
> 6. At line 1617, threadB releases the spinlock and then handles the fault.
> 7. threadC takes the spinlock, calls fixup_owner, and gets mutexM.
> 8. threadC gives mutexM to threadB.
> 9. threadB retakes the spinlock, finds "pi_state->owner == oldowner", and retries the fixup.
> 10. threadB gives mutexM to threadC, which is wrong.
>
> We have written a program that demonstrates all of the above.

It would have been ever so much more useful if you'd have included that.

2011-03-28 10:01:46

by xby

Subject: Re:Re: PROBLEM:a bug about pi-futex maybe let the program going to hang


At 2011-03-28 16:26:22, "Peter Zijlstra" <[email protected]> wrote:

>On Mon, 2011-03-28 at 15:25 +0800, xby wrote:
>> hi, all.
>
>Works better if you also CC people who actually work on that code.
>
>> There may be a bug in pi-futex that can cause a user-space program to hang.
>>
>> We have a board with a dual-core PowerPC 8572 CPU. After running for one month, the user-space state of a pi-futex went bad: mutex->__data.__lock is 0x8000023e, mutex->__data.__count is 0, and mutex->__data.__owner is 0.
>>
>> I then reviewed the file "kernel/futex.c" (Linux 2.6.38) and found the following case:
>>
>> Suppose there are 3 threads, named threadA, threadB and threadC. threadA holds mutexM, and threadB and threadC are waiting on mutexM. They run through the following steps:
>>
>> 1. threadB and threadC sleep at line 1984.
>> 2. threadB receives a signal and is woken up.
>> 3. threadA unlocks mutexM and gives mutexM to threadB.
>> 4. threadB calls fixup_owner and tries to give the mutex to threadC.
>> 5. At line 1580, threadB triggers an address fault and jumps to handle_fault.
>> 6. At line 1617, threadB releases the spinlock and then handles the fault.
>> 7. threadC takes the spinlock, calls fixup_owner, and gets mutexM.
>> 8. threadC gives mutexM to threadB.
>> 9. threadB retakes the spinlock, finds "pi_state->owner == oldowner", and retries the fixup.
>> 10. threadB gives mutexM to threadC, which is wrong.
>>
>> We have written a program that demonstrates all of the above.
>
>It would have been ever so much more useful if you'd have included that.

Sorry, the code is at the office, so I can't mail it to everyone right now. I'm at home now ^-^

2011-03-28 17:38:01

by Steven Rostedt

Subject: Re: Re: PROBLEM:a bug about pi-futex maybe let the program going to hang

On Mon, Mar 28, 2011 at 05:43:49PM +0800, xby wrote:
> >
> >It would have been ever so much more useful if you'd have included that.
>
> Sorry, the code is at the office, so I can't mail it to everyone right now. I'm at home now ^-^
>

Also, please do not post multiple times, as it is embarrassing that I
sent out the same email that Peter did.

If no one answers:

1) Add Cc's to those that work on the code (like Peter said)

2) Either reply to your original post, or say you posted before.

But do not just repost the exact same email a couple of days later!

-- Steve

2011-03-28 22:13:41

by Darren Hart

Subject: Re: PROBLEM:a bug about pi-futex maybe let the program going to hang



On 03/28/2011 01:26 AM, Peter Zijlstra wrote:
> On Mon, 2011-03-28 at 15:25 +0800, xby wrote:
>> hi, all.
>
> Works better if you also CC people who actually work on that code.
>
>> There may be a bug in pi-futex that can cause a user-space program to hang.
>>
>> We have a board with a dual-core PowerPC 8572 CPU. After running for one month, the user-space state of a pi-futex went bad: mutex->__data.__lock is 0x8000023e, mutex->__data.__count is 0, and mutex->__data.__owner is 0.
>>
>> I then reviewed the file "kernel/futex.c" (Linux 2.6.38) and found the following case:
>>
>> Suppose there are 3 threads, named threadA, threadB and threadC. threadA holds mutexM, and threadB and threadC are waiting on mutexM. They run through the following steps:
>>
>> 1. threadB and threadC sleep at line 1984.
>> 2. threadB receives a signal and is woken up.
>> 3. threadA unlocks mutexM and gives mutexM to threadB.
>> 4. threadB calls fixup_owner and tries to give the mutex to threadC.
>> 5. At line 1580, threadB triggers an address fault and jumps to handle_fault.
>> 6. At line 1617, threadB releases the spinlock and then handles the fault.
>> 7. threadC takes the spinlock, calls fixup_owner, and gets mutexM.
>> 8. threadC gives mutexM to threadB.
>> 9. threadB retakes the spinlock, finds "pi_state->owner == oldowner", and retries the fixup.
>> 10. threadB gives mutexM to threadC, which is wrong.
>>
>> We have written a program that demonstrates all of the above.
>
> It would have been ever so much more useful if you'd have included that.


Please reply with the testcase and your glibc version. If this is
a custom kernel, please make your .config available as well.
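
One way to get the glibc version from the target itself is to ask the library at run time. This is a minimal sketch that assumes the board's C library is glibc (gnu_get_libc_version() and gnu_get_libc_release() are glibc-specific); the function name format_glibc_info is hypothetical.

```c
#include <assert.h>
#include <gnu/libc-version.h>
#include <stdio.h>

/* Writes "glibc <version> (<release>)" into buf for the library the program
 * is actually running against; returns snprintf's result. */
static int format_glibc_info(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "glibc %s (%s)",
                    gnu_get_libc_version(), gnu_get_libc_release());
}
```

Calling format_glibc_info() from the testcase and printing the buffer reports the run-time library, which can differ from the headers the program was built against.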

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel