2009-07-27 08:00:49

by Jens Rosenboom

Subject: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

We have a problem with infinitely running processes on kernels since at
least 2.6.29.4. On a loaded machine that has been running for a couple of
days, a "ps ax" seems to get stuck in get_futex_key() while exiting.
Sadly, your patch does not fix it, as I had hoped from the description.
Maybe the following tracebacks, taken a couple of minutes apart from the
same process, can help in identifying some further bug here:

ps R running 0 12886 12884 0x00000000
c9189cc4 c136ea4b 03d5e000 00000058 c9189c68 c1053959 00000000 c40d6e00
0000061e c9189cb4 c104b558 fffff000 00000007 c1b18000 80000000 c9189d18
00000000 c9189c9c c1020e3f 00000163 80000000 b7f1c000 c9189cc0 c1020135
Call Trace:
[<c136ea4b>] ? schedule+0x28b/0x970
[<c1057bce>] ? trace_hardirqs_on_caller+0x5e/0x180
[<c1020e3f>] ? kmap_atomic+0x1f/0x30
[<c1020135>] ? gup_pte_range+0x115/0x190
[<c1020252>] ? gup_pud_range+0xa2/0x120
[<c1020405>] ? get_user_pages_fast+0x135/0x170
[<c1057cfb>] ? trace_hardirqs_on+0xb/0x10
[<c1020405>] ? get_user_pages_fast+0x135/0x170
[<c105bed5>] ? get_futex_key+0x95/0x1c0
[<c105c60c>] ? futex_wake+0x4c/0x110
[<c105de0d>] ? do_futex+0x21d/0xd00
[<c101bd86>] ? no_context+0x26/0x1a0
[<c102a013>] ? finish_task_switch+0x33/0xf0
[<c101bfbb>] ? __bad_area_nosemaphore+0xbb/0x180
[<c1058d8d>] ? __lock_acquire+0x39d/0x18e0
[<c1058d8d>] ? __lock_acquire+0x39d/0x18e0
[<c101c0c9>] ? __bad_area+0x29/0x50
[<c101c0da>] ? __bad_area+0x3a/0x50
[<c101c122>] ? bad_area_access_error+0x12/0x20
[<c1002e1c>] ? restore_all_notrace+0x0/0x18
[<c101c210>] ? do_page_fault+0x0/0x280
[<c1057c9c>] ? trace_hardirqs_on_caller+0x12c/0x180
[<c105e992>] ? sys_futex+0xa2/0x130
[<c101c210>] ? do_page_fault+0x0/0x280
[<c102fa68>] ? mm_release+0xa8/0xc0
[<c1033668>] ? exit_mm+0x18/0x110
[<c1065121>] ? acct_collect+0x131/0x180
[<c10353cb>] ? do_exit+0x60b/0x680
[<c101c35d>] ? do_page_fault+0x14d/0x280
[<c104b8f6>] ? up_read+0x16/0x30
[<c103547c>] ? do_group_exit+0x3c/0xa0
[<c10354f3>] ? sys_exit_group+0x13/0x20
[<c1002d68>] ? sysenter_do_call+0x12/0x36

ps R running 0 12886 12884 0x00000000
c9189ca4 00200046 c1036db8 c9189c6c c1036f02 00200046 c1067e57 00000001
c15239dc c15239dc c15239dc 00000001 f6bfee60 00051f77 00000000 00000001
c1590000 c1592424 c15955c0 f6bfee64 f6bfecc0 c9189c80 c103716c f6bfecc0
Call Trace:
[<c1036db8>] ? _local_bh_enable+0x48/0xb0
[<c1036f02>] ? __do_softirq+0xe2/0x130
[<c1067e57>] ? handle_fasteoi_irq+0x87/0xc0
[<c103716c>] ? irq_exit+0x3c/0x80
[<c136f25c>] ? preempt_schedule_irq+0x2c/0x60
[<c1057c9c>] ? trace_hardirqs_on_caller+0x12c/0x180
[<c136f262>] preempt_schedule_irq+0x32/0x60
[<c1002cf2>] need_resched+0x1f/0x21
[<c102040a>] ? get_user_pages_fast+0x13a/0x170
[<c105beef>] ? get_futex_key+0xaf/0x1c0
[<c105c60c>] ? futex_wake+0x4c/0x110
[<c105de0d>] ? do_futex+0x21d/0xd00
[<c101bd86>] ? no_context+0x26/0x1a0
[<c102a013>] ? finish_task_switch+0x33/0xf0
[<c101bfbb>] ? __bad_area_nosemaphore+0xbb/0x180
[<c1058d8d>] ? __lock_acquire+0x39d/0x18e0
[<c1058d8d>] ? __lock_acquire+0x39d/0x18e0
[<c101c0c9>] ? __bad_area+0x29/0x50
[<c101c0da>] ? __bad_area+0x3a/0x50
[<c101c122>] ? bad_area_access_error+0x12/0x20
[<c1002e1c>] ? restore_all_notrace+0x0/0x18
[<c101c210>] ? do_page_fault+0x0/0x280
[<c1057c9c>] ? trace_hardirqs_on_caller+0x12c/0x180
[<c105e992>] ? sys_futex+0xa2/0x130
[<c101c210>] ? do_page_fault+0x0/0x280
[<c102fa68>] ? mm_release+0xa8/0xc0
[<c1033668>] ? exit_mm+0x18/0x110
[<c1065121>] ? acct_collect+0x131/0x180
[<c10353cb>] ? do_exit+0x60b/0x680
[<c101c35d>] ? do_page_fault+0x14d/0x280
[<c104b8f6>] ? up_read+0x16/0x30
[<c103547c>] ? do_group_exit+0x3c/0xa0
[<c10354f3>] ? sys_exit_group+0x13/0x20
[<c1002d68>] ? sysenter_do_call+0x12/0x36


2009-07-27 11:28:54

by Peter Zijlstra

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

On Mon, 2009-07-27 at 10:00 +0200, Jens Rosenboom wrote:
> We have a problem with infinitely running processes on kernels at least
> since 2.6.29.4. It happens on a loaded machine after running for a
> couple of days,

What kinds of machine, i386? Could you please enable
CONFIG_FRAME_POINTER, these backtraces are quite mangled.

> that a "ps ax" seems to get stuck in get_futex_key while
> exiting. Sadly your patch

Whose patch, and which patch? 7c8fa4f04ab956076605422d5ed37410893a8a73?
That was only regarding huge pages.

The only loop in get_futex_key() appears to be the one around
get_user_pages_fast(), and I'm not quite sure how that could get stuck
like this.

Could it be glibc loops on futex_wake() returning -EFAULT?

> does not fix it as I hoped from the
> description, maybe the following tracebacks taken a couple of minutes
> apart from the same process can help in identifying some further bug
> here:
>
> ps R running 0 12886 12884 0x00000000
> c9189cc4 c136ea4b 03d5e000 00000058 c9189c68 c1053959 00000000 c40d6e00
> 0000061e c9189cb4 c104b558 fffff000 00000007 c1b18000 80000000 c9189d18
> 00000000 c9189c9c c1020e3f 00000163 80000000 b7f1c000 c9189cc0 c1020135
> Call Trace:
> [<c136ea4b>] ? schedule+0x28b/0x970
> [<c1057bce>] ? trace_hardirqs_on_caller+0x5e/0x180
> [<c1020e3f>] ? kmap_atomic+0x1f/0x30
> [<c1020135>] ? gup_pte_range+0x115/0x190
> [<c1020252>] ? gup_pud_range+0xa2/0x120
> [<c1020405>] ? get_user_pages_fast+0x135/0x170
> [<c1057cfb>] ? trace_hardirqs_on+0xb/0x10
> [<c1020405>] ? get_user_pages_fast+0x135/0x170
> [<c105bed5>] ? get_futex_key+0x95/0x1c0
> [<c105c60c>] ? futex_wake+0x4c/0x110
> [<c105de0d>] ? do_futex+0x21d/0xd00
> [<c101bd86>] ? no_context+0x26/0x1a0
> [<c102a013>] ? finish_task_switch+0x33/0xf0
> [<c101bfbb>] ? __bad_area_nosemaphore+0xbb/0x180
> [<c1058d8d>] ? __lock_acquire+0x39d/0x18e0
> [<c1058d8d>] ? __lock_acquire+0x39d/0x18e0
> [<c101c0c9>] ? __bad_area+0x29/0x50
> [<c101c0da>] ? __bad_area+0x3a/0x50
> [<c101c122>] ? bad_area_access_error+0x12/0x20
> [<c1002e1c>] ? restore_all_notrace+0x0/0x18
> [<c101c210>] ? do_page_fault+0x0/0x280
> [<c1057c9c>] ? trace_hardirqs_on_caller+0x12c/0x180
> [<c105e992>] ? sys_futex+0xa2/0x130
> [<c101c210>] ? do_page_fault+0x0/0x280
> [<c102fa68>] ? mm_release+0xa8/0xc0
> [<c1033668>] ? exit_mm+0x18/0x110
> [<c1065121>] ? acct_collect+0x131/0x180
> [<c10353cb>] ? do_exit+0x60b/0x680
> [<c101c35d>] ? do_page_fault+0x14d/0x280
> [<c104b8f6>] ? up_read+0x16/0x30
> [<c103547c>] ? do_group_exit+0x3c/0xa0
> [<c10354f3>] ? sys_exit_group+0x13/0x20
> [<c1002d68>] ? sysenter_do_call+0x12/0x36

2009-07-27 12:17:08

by Jens Rosenboom

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

On Mon, 2009-07-27 at 13:31 +0200, Peter Zijlstra wrote:
> On Mon, 2009-07-27 at 10:00 +0200, Jens Rosenboom wrote:
> > We have a problem with infinitely running processes on kernels at least
> > since 2.6.29.4. It happens on a loaded machine after running for a
> > couple of days,
>
> What kinds of machine, i386? Could you please enable
> CONFIG_FRAME_POINTER, these backtraces are quite mangled.

i686, or an AMD dual-core Opteron to be exact. CONFIG_FRAME_POINTER is
enabled; the complete kernel config is attached. Maybe some other
debugging options are needed? I copied just the part pertaining to the
stuck process, though, so maybe the complete log has the parts you are
missing.

> > that a "ps ax" seems to get stuck in get_futex_key while
> > exiting. Sadly your patch
>
> Whose patch, and which patch? 7c8fa4f04ab956076605422d5ed37410893a8a73?
> That was only regarding huge pages.

Yes, that is the one I was talking about and the commit message seemed
to match what I was seeing here.

> The only loop in get_futex_key() appears to be the one around
> get_user_pages_fast(), and I'm not quite sure how that could get stuck
> like this.
>
> Could it be glibc loops on futex_wake() returning -EFAULT?

How would I be able to check that?


Attachments:
config (70.16 kB)
pstrace1.txt.gz (23.10 kB)

2009-07-27 12:21:13

by Peter Zijlstra

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

On Mon, 2009-07-27 at 14:16 +0200, Jens Rosenboom wrote:
> On Mon, 2009-07-27 at 13:31 +0200, Peter Zijlstra wrote:
> > On Mon, 2009-07-27 at 10:00 +0200, Jens Rosenboom wrote:
> > > We have a problem with infinitely running processes on kernels at least
> > > since 2.6.29.4. It happens on a loaded machine after running for a
> > > couple of days,
> >
> > What kinds of machine, i386? Could you please enable
> > CONFIG_FRAME_POINTER, these backtraces are quite mangled.
>
> i686 or AMD dualcore Opteron to be exact. CONFIG_FRAME_POINTER is
> enabled, the complete kernel-config is attached, maybe some other
> debugging options are needed? But I copied just the part pertaining to
> the stuck process, maybe the complete log has the parts you are missing?

Ah, weird. The question of course is, does an x86_64 kernel suffer the
same problem?

> > > that a "ps ax" seems to get stuck in get_futex_key while
> > > exiting. Sadly your patch
> >
> > Whose patch, and which patch? 7c8fa4f04ab956076605422d5ed37410893a8a73?
> > That was only regarding huge pages.
>
> Yes, that is the one I was talking about and the commit message seemed
> to match what I was seeing here.

Are you in fact using huge pages?

> > The only loop in get_futex_key() appears to be the one around
> > get_user_pages_fast(), and I'm not quite sure how that could get stuck
> > like this.
> >
> > Could it be glibc loops on futex_wake() returning -EFAULT?
>
> How would I be able to check that?

strace the stuck process, I think; you'd see tons of sys_futex() calls
with FUTEX_WAKE* returning -EFAULT.
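
For reference, here is a minimal userspace sketch of the kind of failing call meant here. It assumes that a shared (non-private) FUTEX_WAKE on an aligned but unmapped address fails with EFAULT, which is my understanding of current Linux behaviour. The helper name futex_wake_errno_on_unmapped() is made up for this example and is not part of any library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() and MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Issue a shared FUTEX_WAKE on an address that is no longer mapped.
 * Returns the errno of the failed call, or 0 if the call succeeded.
 */
int futex_wake_errno_on_unmapped(void)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	void *page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	long ret;

	assert(page != MAP_FAILED);

	/* Unmap again: the address stays aligned but is now invalid. */
	munmap(page, page_size);

	errno = 0;
	ret = syscall(SYS_futex, (uint32_t *)page, FUTEX_WAKE, 1,
		      NULL, NULL, 0);
	return ret == -1 ? errno : 0;
}
```

A process looping on such a call would show one failing futex() line per iteration under strace, each reported with EFAULT.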

2009-07-27 12:46:09

by Jens Rosenboom

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

On Mon, 2009-07-27 at 14:23 +0200, Peter Zijlstra wrote:
> On Mon, 2009-07-27 at 14:16 +0200, Jens Rosenboom wrote:
> > On Mon, 2009-07-27 at 13:31 +0200, Peter Zijlstra wrote:
> > > On Mon, 2009-07-27 at 10:00 +0200, Jens Rosenboom wrote:
> > > > We have a problem with infinitely running processes on kernels at least
> > > > since 2.6.29.4. It happens on a loaded machine after running for a
> > > > couple of days,
> > >
> > > What kinds of machine, i386? Could you please enable
> > > CONFIG_FRAME_POINTER, these backtraces are quite mangled.
> >
> > i686 or AMD dualcore Opteron to be exact. CONFIG_FRAME_POINTER is
> > enabled, the complete kernel-config is attached, maybe some other
> > debugging options are needed? But I copied just the part pertaining to
> > the stuck process, maybe the complete log has the parts you are missing?
>
> Ah, weird. The question of course is, does an x86_64 kernel suffer the
> same problem?

Good question, but as this happens on a production machine, I cannot
easily change the installation to check this.

> > > > that a "ps ax" seems to get stuck in get_futex_key while
> > > > exiting. Sadly your patch
> > >
> > > Whose patch, and which patch? 7c8fa4f04ab956076605422d5ed37410893a8a73?
> > > That was only regarding huge pages.
> >
> > Yes, that is the one I was talking about and the commit message seemed
> > to match what I was seeing here.
>
> Are you in fact using huge pages?

The process that gets stuck is a standard ps from procps version 3.2.8,
which is called from within a perl script, so the answer is probably:
no. Which means let us forget that patch and look at this as a distinct
issue.

> > > The only loop in get_futex_key() appears to be the one around
> > > get_user_pages_fast(), and I'm not quite sure how that could get stuck
> > > like this.
> > >
> > > Could it be glibc loops on futex_wake() returning -EFAULT?
> >
> > How would I be able to check that?
>
> strace the stuck process, I think; you'd see tons of sys_futex() calls
> with FUTEX_WAKE* returning -EFAULT.

Attaching an strace to the process gives just

# strace -p 12886
Process 12886 attached - interrupt to quit

and nothing further.

2009-07-27 13:34:37

by Peter Zijlstra

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

On Mon, 2009-07-27 at 14:45 +0200, Jens Rosenboom wrote:

> Good question, but as this happens on a production machine, I cannot
> easily change the installation to check this.

Uhm, but you do run .31-rc4 on it? :-)

> Attaching an strace to the process gives just
>
> # strace -p 12886
> Process 12886 attached - interrupt to quit
>
> and nothing further.

Bugger.. how easy is it to reproduce?

2009-07-27 13:42:28

by Eric Dumazet

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

Peter Zijlstra wrote:
> On Mon, 2009-07-27 at 14:45 +0200, Jens Rosenboom wrote:
>
>> Good question, but as this happens on a production machine, I cannot
>> easily change the installation to check this.
>
> Uhm, but you do run .31-rc4 on it? :-)
>
>> Attaching an strace to the process gives just
>>
>> # strace -p 12886
>> Process 12886 attached - interrupt to quit
>>
>> and nothing further.
>
> Bugger.. how easy is it to reproduce?

The strange thing is that "ps ax" doesn't use threads at all.

The stack trace seems to show mm_release() calling futex code,
but this part should not be reached when a single-threaded program
exits (its clear_child_tid should be NULL at this point).

2009-07-27 14:07:22

by Jens Rosenboom

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

On Mon, Jul 27, 2009 at 03:36:52PM +0200, Peter Zijlstra wrote:
> On Mon, 2009-07-27 at 14:45 +0200, Jens Rosenboom wrote:
>
> > Good question, but as this happens on a production machine, I cannot
> > easily change the installation to check this.
>
> Uhm, but you do run .31-rc4 on it? :-)

In the hope of verifying a patch, yes. Before that we had 2.6.29.4
for some time.

> > Attaching an strace to the process gives just
> >
> > # strace -p 12886
> > Process 12886 attached - interrupt to quit
> >
> > and nothing further.
>
> Bugger.. how easy is it to reproduce?

There is a lot of stuff happening on this machine, I'm not sure
what part of it is necessary to trigger the bug. Currently it
takes anything from a couple of hours to a couple of days for
the first stuck ps to appear.

2009-07-27 16:01:05

by Ray Lee

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

On Mon, Jul 27, 2009 at 7:00 AM, Jens Rosenboom
<[email protected]> wrote:
>
> On Mon, Jul 27, 2009 at 03:36:52PM +0200, Peter Zijlstra wrote:
> > On Mon, 2009-07-27 at 14:45 +0200, Jens Rosenboom wrote:
> >
> > > Good question, but as this happens on a production machine, I cannot
> > > easily change the installation to check this.
> >
> > Uhm, but you do run .31-rc4 on it? :-)
>
> In the hope of verifying a patch, yes. Before that we had 2.6.29.4
> for some time.

In principle you should be able to run an x86_64 kernel on that
machine with the 32-bit userspace unchanged. The only hassle would be
if you are using any kernel modules outside the normal tree. So, no
need to change the full installation.

2009-07-29 06:23:05

by Jens Rosenboom

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

On Mon, 2009-07-27 at 15:36 +0200, Peter Zijlstra wrote:
[...]
> Bugger.. how easy is it to reproduce?

Okay, my colleague found the right combination of scripts: take the two
attached, run them both a couple of times in parallel for some hours,
and you get a stuck ps. This happens both on an old 2.6.29.1 I happened
to still have on one machine and with 2.6.31-rc4. Both machines are
dual-core Opterons, like the original one. If you want further
tracebacks or other information, let me know.


Attachments:
null.pl (79.00 B)
pees.pl (99.00 B)

2009-07-29 09:57:41

by Jens Rosenboom

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

On Wed, 2009-07-29 at 08:22 +0200, Jens Rosenboom wrote:
> On Mon, 2009-07-27 at 15:36 +0200, Peter Zijlstra wrote:
> [...]
> > Bugger.. how easy is it to reproduce?
>
> Okay, my colleague found the right combination of scripts, take the two
> attached, run them both a couple of times in parallel for some hours,
> and get a stuck ps. This happens both on an old 2.6.29.1 I happened to
> still have on one machine as with 2.6.31-rc4. Both of them dual-core
> Opterons as the original one. If you want further tracebacks or other
> information, let me know.

You can even forget about null.pl: just run pees.pl twice, plus a top to
watch it. That has reproduced the problem for me in less than an hour
several times now.

2009-07-29 09:59:14

by Peter Zijlstra

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

On Wed, 2009-07-29 at 11:57 +0200, Jens Rosenboom wrote:
> On Wed, 2009-07-29 at 08:22 +0200, Jens Rosenboom wrote:
> > On Mon, 2009-07-27 at 15:36 +0200, Peter Zijlstra wrote:
> > [...]
> > > Bugger.. how easy is it to reproduce?
> >
> > Okay, my colleague found the right combination of scripts, take the two
> > attached, run them both a couple of times in parallel for some hours,
> > and get a stuck ps. This happens both on an old 2.6.29.1 I happened to
> > still have on one machine as with 2.6.31-rc4. Both of them dual-core
> > Opterons as the original one. If you want further tracebacks or other
> > information, let me know.
>
> Forget about null.pl even, just run pees.pl twice and a top to watch it,
> has worked for me within less than an hour several times now.

Thanks, I'll give it a go..

2009-07-29 10:29:13

by Eric Dumazet

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

Jens Rosenboom wrote:
> On Wed, 2009-07-29 at 08:22 +0200, Jens Rosenboom wrote:
>> On Mon, 2009-07-27 at 15:36 +0200, Peter Zijlstra wrote:
>> [...]
>>> Bugger.. how easy is it to reproduce?
>> Okay, my colleague found the right combination of scripts, take the two
>> attached, run them both a couple of times in parallel for some hours,
>> and get a stuck ps. This happens both on an old 2.6.29.1 I happened to
>> still have on one machine as with 2.6.31-rc4. Both of them dual-core
>> Opterons as the original one. If you want further tracebacks or other
>> information, let me know.
>
> Forget about null.pl even, just run pees.pl twice and a top to watch it,
> has worked for me within less than an hour several times now.
>

Ah that makes sense now...

maybe execve() forgets to clear clear_child_tid

2009-07-29 10:56:29

by Eric Dumazet

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

Eric Dumazet wrote:
> Jens Rosenboom wrote:
>> On Wed, 2009-07-29 at 08:22 +0200, Jens Rosenboom wrote:
>>> On Mon, 2009-07-27 at 15:36 +0200, Peter Zijlstra wrote:
>>> [...]
>>>> Bugger.. how easy is it to reproduce?
>>> Okay, my colleague found the right combination of scripts, take the two
>>> attached, run them both a couple of times in parallel for some hours,
>>> and get a stuck ps. This happens both on an old 2.6.29.1 I happened to
>>> still have on one machine as with 2.6.31-rc4. Both of them dual-core
>>> Opterons as the original one. If you want further tracebacks or other
>>> information, let me know.
>> Forget about null.pl even, just run pees.pl twice and a top to watch it,
>> has worked for me within less than an hour several times now.
>>
>
> Ah that makes sense now...
>
> maybe execve() forgets to clear clear_child_tid
>

Sorry, I hit 'Send' before completing the mail...

Could you please try the following patch?

[PATCH] exec: must clear clear_child_tid

While reading /proc/xxx files, "ps" has to raise the mm_users count of
the inspected process (via a call to get_task_mm()).

So when an exiting process that is being inspected by a ps calls
mm_release(), it can take this branch:

	if (tsk->clear_child_tid
	    && !(tsk->flags & PF_SIGNALED)
	    && atomic_read(&mm->mm_users) > 1) {
		u32 __user * tidptr = tsk->clear_child_tid;
		tsk->clear_child_tid = NULL;

		/*
		 * We don't check the error code - if userspace has
		 * not set up a proper pointer then tough luck.
		 */
		put_user(0, tidptr);
		sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
	}

This can happen if execve() doesn't set clear_child_tid to NULL: we then
call futex() on a tidptr that has no meaning in the new program (it had a
meaning only in bash's address space, before its thread did the execve()).

Furthermore, if the initial thread exits, put_user(0, tidptr) can
overwrite some integer in our user space and corrupt user memory.


Reported-by: Jens Rosenboom <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
---
diff --git a/fs/exec.c b/fs/exec.c
index 4a8849e..e275652 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1343,6 +1343,7 @@ int do_execve(char * filename,
mutex_unlock(&current->cred_guard_mutex);
acct_update_integrals(current);
free_bprm(bprm);
+ current->clear_child_tid = NULL;
if (displaced)
put_files_struct(displaced);
return retval;

2009-07-29 14:29:42

by Jens Rosenboom

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

On Wed, 2009-07-29 at 12:56 +0200, Eric Dumazet wrote:
[...]
> Could you please try following patch ?
>
> [PATCH] exec: must clear clear_child_tid
[...]

Running fine for three hours now, but I'll keep it running at least
until tomorrow to be sure.

2009-07-30 14:13:53

by Jens Rosenboom

Subject: Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

Another 24 hours without stuck processes have earned you my

Tested-by: Jens Rosenboom <[email protected]>

2009-07-31 10:01:03

by Eric Dumazet

Subject: [PATCH] execve: must clear current->clear_child_tid

While looking at Jens Rosenboom's bug report about a strange sys_futex()
call made from a dying "ps" program, we found the following problem.

The clone() syscall has special support for the TID of created threads.
This support includes two features.

The first (CLONE_CHILD_SETTID) stores the TID value in an integer in
user memory.

The second (CLONE_CHILD_CLEARTID) clears this same integer once
the created thread dies.

The location of the integer is a user-provided pointer, passed at
clone() time.

The kernel stores this pointer in current->clear_child_tid.
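
As a small illustration of this per-thread pointer: it can also be set after the fact with the set_tid_address() syscall, which returns the caller's thread ID. The sketch below is not part of the patch. The names tid_slot, register_clear_tid_slot() and current_tid() are invented for the example, and it uses raw syscall(2) rather than assuming a libc wrapper. Note that calling it in a real program replaces whatever pointer glibc registered for the thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Word the kernel may clear when this thread exits (illustrative name). */
static int tid_slot;

/*
 * Point the calling thread's clear_child_tid at tid_slot.
 * set_tid_address() returns the caller's thread ID.
 */
pid_t register_clear_tid_slot(void)
{
	pid_t tid = (pid_t)syscall(SYS_set_tid_address, &tid_slot);

	assert(tid > 0);
	return tid;
}

/* Thread ID as reported by gettid(), for comparison. */
pid_t current_tid(void)
{
	return (pid_t)syscall(SYS_gettid);
}
```

Both helpers should return the same value when called from the same thread.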

At execve() time, we should make sure the kernel doesn't keep this
user-provided pointer, as the whole user memory is replaced by a new one.

As glibc fork() actually uses the clone() syscall with CLONE_CHILD_SETTID
and CLONE_CHILD_CLEARTID set, chances are high that we might corrupt
user memory in forked processes.

The following sequence could happen:

1) bash (or any program) starts a new process with a fork() call
   that glibc maps to a clone( ... CLONE_CHILD_SETTID
   | CLONE_CHILD_CLEARTID ...) syscall.

2) When the new process starts, its current->clear_child_tid is set to a
   location that has a meaning only in the context of bash (or the
   initial program): &THREAD_SELF->tid.

3) This new process does the execve() syscall to start a new program.
   current->clear_child_tid is left unchanged (a non-NULL value).

4) If this new program creates some threads and the initial thread exits,
   the kernel will attempt to clear the integer pointed to by
   current->clear_child_tid from mm_release():

        if (tsk->clear_child_tid
            && !(tsk->flags & PF_SIGNALED)
            && atomic_read(&mm->mm_users) > 1) {
                u32 __user * tidptr = tsk->clear_child_tid;
                tsk->clear_child_tid = NULL;

                /*
                 * We don't check the error code - if userspace has
                 * not set up a proper pointer then tough luck.
                 */
<< here >>      put_user(0, tidptr);
                sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
        }

5) OR: if the new program is not multi-threaded but is inspected by
   /proc/pid users (the ps command, for example), mm_users > 1, and the
   exiting program could corrupt 4 bytes in a persistent memory area
   (shm or a memory-mapped file).

If current->clear_child_tid points to a writable portion of the new
program's memory, the kernel happily and silently corrupts 4 bytes of
memory, with unexpected effects.
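
The sequence above can be mimicked with a small userspace toy model. It is not kernel code: struct toy_task and the toy_*() functions are invented for illustration, the PF_SIGNALED check is left out, and a plain store stands in for put_user(). It only shows that a stale pointer makes the exit path zero a word owned by the new program, and that resetting the pointer at execve() time avoids this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the relevant task state; not the kernel's layout. */
struct toy_task {
	uint32_t *clear_child_tid; /* pointer registered at clone() time */
	int mm_users;              /* > 1 when e.g. ps holds the mm */
};

/* Model of execve(): with the fix applied, the stale pointer is reset. */
void toy_execve(struct toy_task *t, int apply_fix)
{
	assert(t != NULL);
	if (apply_fix)
		t->clear_child_tid = NULL;
}

/* Model of the mm_release() branch quoted above. */
void toy_mm_release(struct toy_task *t)
{
	assert(t != NULL);
	if (t->clear_child_tid && t->mm_users > 1) {
		uint32_t *tidptr = t->clear_child_tid;

		t->clear_child_tid = NULL;
		*tidptr = 0; /* stands in for put_user(0, tidptr) */
	}
}

/* Run clone -> execve -> exit; return the word the new program owns. */
uint32_t toy_run(int apply_fix)
{
	/* After execve(), this address belongs to the new program. */
	uint32_t word = 0xdeadbeefu;
	struct toy_task t = { .clear_child_tid = &word, .mm_users = 2 };

	toy_execve(&t, apply_fix);
	toy_mm_release(&t);
	return word;
}
```

Without the fix, toy_run(0) comes back with the word zeroed. With the fix, toy_run(1) returns the original value untouched.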

The fix is straightforward and should not break any sane program.

Reported-by: Jens Rosenboom <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Tested-by: Jens Rosenboom <[email protected]>
---
fs/compat.c | 1 +
fs/exec.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/fs/compat.c b/fs/compat.c
index 94502da..deb1049 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1550,6 +1550,7 @@ int compat_do_execve(char * filename,
mutex_unlock(&current->cred_guard_mutex);
acct_update_integrals(current);
free_bprm(bprm);
+ current->clear_child_tid = NULL;
if (displaced)
put_files_struct(displaced);
return retval;
diff --git a/fs/exec.c b/fs/exec.c
index 4a8849e..e275652 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1343,6 +1343,7 @@ int do_execve(char * filename,
mutex_unlock(&current->cred_guard_mutex);
acct_update_integrals(current);
free_bprm(bprm);
+ current->clear_child_tid = NULL;
if (displaced)
put_files_struct(displaced);
return retval;