2011-04-12 10:32:40

by Robert Święcki

Subject: Processes hang in an unkillable state

Hi, while fuzzing Linux system calls (32bit fuzzer, 64bit Linux
kernel), it happens after some time (10-20mins) that some processes
enter a state which makes them un-killable. They are either in R or D
state.

# strace ps wwuax
...
...
open("/proc/450/cmdline", O_RDONLY) = 6
read(6, - hangs....

# kill -9 450
# kill -9 450 (no ESRCH)

More data in the attachment - I'll keep it in the kdb session for
further examination.
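
The task state can also be read from /proc/<pid>/stat rather than through ps, which reads /proc/<pid>/cmdline as the strace above shows. The following is a minimal sketch, assuming the documented "pid (comm) state ..." layout of that file; the function names are made up for illustration, and the thread does not establish whether a read of stat would also block on the affected kernel.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the state character ('R', 'D', 'S', ...) from one line in
 * /proc/<pid>/stat format, or '\0' if the line cannot be parsed.
 * The comm field is wrapped in parentheses and may itself contain
 * spaces or ')', so the search starts from the last ')'.
 */
char stat_line_state(const char *line)
{
	const char *p = strrchr(line, ')');

	if (p == NULL || p[1] != ' ' || p[2] == '\0')
		return '\0';
	return p[2];
}

/* Read /proc/<pid>/stat and return the state character, or '\0' on error. */
char pid_state(int pid)
{
	char path[64], buf[1024];
	char state = '\0';
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/stat", pid);
	f = fopen(path, "r");
	if (f == NULL)
		return '\0';
	if (fgets(buf, sizeof(buf), f) != NULL)
		state = stat_line_state(buf);
	fclose(f);
	return state;
}
```

Calling pid_state(450) on the machine above would be expected to give 'R' or 'D' for the stuck task, going by the report.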

--
Robert Święcki


Attachments:
450.txt (8.94 kB)

2011-04-12 12:44:36

by Cong Wang

Subject: Re: Processes hang in an unkillable state

2011/4/12 Robert Święcki <[email protected]>:
> Hi, while fuzzing Linux system calls (32bit fuzzer, 64bit Linux
> kernel), it happens after some time (10-20mins) that some processes
> enter a state which makes them un-killable. They are either in R or D
> state.
>
> # strace ps wwuax
> ...
> ...
> open("/proc/450/cmdline", O_RDONLY)     = 6
> read(6,  -             hangs....
>
> # kill -9 450
> # kill -9 450 (no ESRCH)
>
> More data in the attachment - I'll keep it in the kdb session for
> further examination.

Hmm, it must be stuck at

lib/rwsem.c

        /* wait to be given the lock */
        for (;;) {
                if (!waiter.task)
                        break;
                schedule();
                set_task_state(tsk, TASK_UNINTERRUPTIBLE);
        }

don't know why it still can't acquire the ->mmap_sem...

Cc'ing Oleg...

2011-04-12 13:03:24

by Robert Święcki

Subject: Re: Processes hang in an unkillable state

On Tue, Apr 12, 2011 at 2:44 PM, Américo Wang <[email protected]> wrote:
> 2011/4/12 Robert Święcki <[email protected]>:
>> Hi, while fuzzing Linux system calls (32bit fuzzer, 64bit Linux
>> kernel), it happens after some time (10-20mins) that some processes
>> enter a state which makes them un-killable. They are either in R or D
>> state.
>>
>> # strace ps wwuax
>> ...
>> ...
>> open("/proc/450/cmdline", O_RDONLY)     = 6
>> read(6,  -             hangs....
>>
>> # kill -9 450
>> # kill -9 450 (no ESRCH)
>>
>> More data in the attachment - I'll keep it in the kdb session for
>> further examination.
>
> Hmm, it must be stuck at
>
> lib/rwsem.c
>
>        /* wait to be given the lock */
>        for (;;) {
>                if (!waiter.task)
>                        break;
>                schedule();
>                set_task_state(tsk, TASK_UNINTERRUPTIBLE);
>        }
>
> don't know why it still can't acquire the ->mmap_sem...

btw, the ps process trying to read /proc/450/cmdline is stuck in

[0]kdb> bt
Stack traceback for pid 6959
0xffff880113334590 6959 18384 0 1 D 0xffff880113334a10 ps
<c> ffff88011f8f9d00<c> 0000000000000082<c> 00000040ffffffff<c>
0000000000000000<c>
<c> ffff88012bffcc08<c> ffff88011f8f8000<c> ffff88011f8f8000<c>
ffff880113334590<c>
<c> ffff88011f8f8010<c> ffff880113334948<c> ffff88011f8f9fd8<c>
ffff88011f8f9fd8<c>
Call Trace:
[<ffffffff8224f665>] rwsem_down_failed_common+0xc5/0x160
[<ffffffff8224f735>] rwsem_down_read_failed+0x15/0x17
[<ffffffff81595694>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff810b31d0>] ? get_task_mm+0x40/0x80
[<ffffffff8224e957>] ? down_read+0x17/0x20
[<ffffffff811788eb>] access_process_vm+0x4b/0x1f0
[<ffffffff8224ffba>] ? _raw_spin_unlock+0x1a/0x40
[<ffffffff8120b15d>] proc_pid_cmdline+0x6d/0x120
[<ffffffff811925c1>] ? alloc_pages_current+0xa1/0x100
[<ffffffff8120bc9d>] proc_info_read+0xad/0xf0
[<ffffffff811abc55>] vfs_read+0xc5/0x190
[<ffffffff811abe21>] sys_read+0x51/0x90
[<ffffffff8104f082>] system_call_fastpath+0x16/0x1b


--
Robert Święcki

2011-04-12 13:08:47

by Robert Święcki

Subject: Re: Processes hang in an unkillable state

On Tue, Apr 12, 2011 at 3:03 PM, Robert Święcki <[email protected]> wrote:
> On Tue, Apr 12, 2011 at 2:44 PM, Américo Wang <[email protected]> wrote:
>> 2011/4/12 Robert Święcki <[email protected]>:
>>> Hi, while fuzzing Linux system calls (32bit fuzzer, 64bit Linux
>>> kernel), it happens after some time (10-20mins) that some processes
>>> enter a state which makes them un-killable. They are either in R or D
>>> state.
>>>
>>> # strace ps wwuax
>>> ...
>>> ...
>>> open("/proc/450/cmdline", O_RDONLY)     = 6
>>> read(6,  -             hangs....
>>>
>>> # kill -9 450
>>> # kill -9 450 (no ESRCH)
>>>
>>> More data in the attachment - I'll keep it in the kdb session for
>>> further examination.
>>
>> Hmm, it must be stuck at
>>
>> lib/rwsem.c
>>
>>        /* wait to be given the lock */
>>        for (;;) {
>>                if (!waiter.task)
>>                        break;
>>                schedule();
>>                set_task_state(tsk, TASK_UNINTERRUPTIBLE);
>>        }
>>
>> don't know why it still can't acquire the ->mmap_sem...
>
> btw, the ps process trying to read /proc/450/cmdline is stuck in
>
> [0]kdb> bt
> Stack traceback for pid 6959
> 0xffff880113334590     6959    18384  0    1   D  0xffff880113334a10  ps
> <c> ffff88011f8f9d00<c> 0000000000000082<c> 00000040ffffffff<c>
> 0000000000000000<c>
> <c> ffff88012bffcc08<c> ffff88011f8f8000<c> ffff88011f8f8000<c>
> ffff880113334590<c>
> <c> ffff88011f8f8010<c> ffff880113334948<c> ffff88011f8f9fd8<c>
> ffff88011f8f9fd8<c>
> Call Trace:
>  [<ffffffff8224f665>] rwsem_down_failed_common+0xc5/0x160
>  [<ffffffff8224f735>] rwsem_down_read_failed+0x15/0x17
>  [<ffffffff81595694>] call_rwsem_down_read_failed+0x14/0x30
>  [<ffffffff810b31d0>] ? get_task_mm+0x40/0x80
>  [<ffffffff8224e957>] ? down_read+0x17/0x20
>  [<ffffffff811788eb>] access_process_vm+0x4b/0x1f0
>  [<ffffffff8224ffba>] ? _raw_spin_unlock+0x1a/0x40
>  [<ffffffff8120b15d>] proc_pid_cmdline+0x6d/0x120
>  [<ffffffff811925c1>] ? alloc_pages_current+0xa1/0x100
>  [<ffffffff8120bc9d>] proc_info_read+0xad/0xf0
>  [<ffffffff811abc55>] vfs_read+0xc5/0x190
>  [<ffffffff811abe21>] sys_read+0x51/0x90
>  [<ffffffff8104f082>] system_call_fastpath+0x16/0x1b

And I thought that kdb "dumpall" might help as well (attached).

--
Robert Święcki


Attachments:
dumpall.txt (325.57 kB)

2011-04-12 18:28:43

by Oleg Nesterov

Subject: Re: Processes hang in an unkillable state

On 04/12, Américo Wang wrote:
>
> 2011/4/12 Robert Święcki <[email protected]>:
> > Hi, while fuzzing Linux system calls (32bit fuzzer, 64bit Linux
> > kernel), it happens after some time (10-20mins) that some processes
> > enter a state which makes them un-killable. They are either in R or D
> > state.
> >
> > # strace ps wwuax
> > ...
> > ...
> > open("/proc/450/cmdline", O_RDONLY)     = 6
> > read(6,  -             hangs....
> >
> > # kill -9 450
> > # kill -9 450 (no ESRCH)
> >
> > More data in the attachment - I'll keep it in the kdb session for
> > further examination.
>
> http://marc.info/?t0260440100004
>
> Hmm, it must be stuck at
>
> lib/rwsem.c
>
>         /* wait to be given the lock */
>         for (;;) {
>                 if (!waiter.task)
>                         break;
>                 schedule();
>                 set_task_state(tsk, TASK_UNINTERRUPTIBLE);
>         }
>
> don't know why it still can't acquire the ->mmap_sem...
>
> Cc'ing Oleg...

I seem to understand...

Please wait a bit, I need to recheck.

Oleg.

2011-04-12 18:34:12

by Robert Święcki

Subject: Re: Processes hang in an unkillable state

On Tue, Apr 12, 2011 at 8:28 PM, Oleg Nesterov <[email protected]> wrote:
> On 04/12, Américo Wang wrote:
>>
>> 2011/4/12 Robert Święcki <[email protected]>:
>> > Hi, while fuzzing Linux system calls (32bit fuzzer, 64bit Linux
>> > kernel), it happens after some time (10-20mins) that some processes
>> > enter a state which makes them un-killable. They are either in R or D
>> > state.
>> >
>> > # strace ps wwuax
>> > ...
>> > ...
>> > open("/proc/450/cmdline", O_RDONLY)     = 6
>> > read(6,  -             hangs....
>> >
>> > # kill -9 450
>> > # kill -9 450 (no ESRCH)
>> >
>> > More data in the attachment - I'll keep it in the kdb session for
>> > further examination.
>>
>> http://marc.info/?t 0260440100004
>>
>> Hmm, it must be stuck at
>>
>> lib/rwsem.c
>>
>>         /* wait to be given the lock */
>>         for (;;) {
>>                 if (!waiter.task)
>>                         break;
>>                 schedule();
>>                 set_task_state(tsk, TASK_UNINTERRUPTIBLE);
>>         }
>>
>> don't know why it still can't acquire the ->mmap_sem...
>>
>> Cc'ing Oleg...
>
> I seem to understand...
>
> Please wait a bit, I need to recheck.

Btw, Linus Torvalds is looking into a similar case in another thread -
http://marc.info/?l=linux-kernel&m=130262886420218&w=2

--
Robert Święcki

2011-04-12 19:08:52

by Oleg Nesterov

Subject: [PATCH 0/1] Was: Processes hang in an unkillable state

(add cc's)

On 04/12, Oleg Nesterov wrote:
>
> On 04/12, Américo Wang wrote:
> >
> > Hmm, it must be stuck at
> >
> > lib/rwsem.c
> >
> >         /* wait to be given the lock */
> >         for (;;) {
> >                 if (!waiter.task)
> >                         break;
> >                 schedule();
> >                 set_task_state(tsk, TASK_UNINTERRUPTIBLE);
> >         }
> >
> > don't know why it still can't acquire the ->mmap_sem...
> >
> > Cc'ing Oleg...
>
> I seem to understand...
>
> Please wait a bit, I need to recheck.

Yes, mlock looks buggy. I'll report more info a bit later, but
I think we need something like this patch.

Oleg.

2011-04-12 19:09:28

by Oleg Nesterov

Subject: [PATCH 1/1] __mlock_vma_pages_range: stack_guard_page() case returns the wrong value

__mlock_vma_pages_range() simply changes addr/nr_pages when
stack_guard_page(vma, start). But this means that __get_user_pages()
returns a number which doesn't match the [start, end) interval and
the caller can be confused.

If we skip the first page, we should return 1 if gup fails, or add
1 to the number it returns.

Signed-off-by: Oleg Nesterov <[email protected]>
---

mm/mlock.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

--- sigprocmask/mm/mlock.c~do_mlock_pages_stack_guard_page 2011-04-06 21:33:50.000000000 +0200
+++ sigprocmask/mm/mlock.c 2011-04-12 20:50:30.000000000 +0200
@@ -159,9 +159,8 @@ static long __mlock_vma_pages_range(stru
                                    int *nonblocking)
 {
        struct mm_struct *mm = vma->vm_mm;
-       unsigned long addr = start;
        int nr_pages = (end - start) / PAGE_SIZE;
-       int gup_flags;
+       int gup_flags, skip_page, ret;

        VM_BUG_ON(start & ~PAGE_MASK);
        VM_BUG_ON(end   & ~PAGE_MASK);
@@ -189,13 +188,22 @@ static long __mlock_vma_pages_range(stru
                gup_flags |= FOLL_MLOCK;

        /* We don't try to access the guard page of a stack vma */
+       skip_page = 0;
        if (stack_guard_page(vma, start)) {
-               addr += PAGE_SIZE;
+               skip_page = 1;
+               start += PAGE_SIZE;
                nr_pages--;
        }

-       return __get_user_pages(current, mm, addr, nr_pages, gup_flags,
+       ret = __get_user_pages(current, mm, start, nr_pages, gup_flags,
                                NULL, NULL, nonblocking);
+
+       if (ret >= 0)
+               ret += skip_page;
+       else if (skip_page)
+               ret = 1;
+
+       return ret;
 }

/*
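
The return-value mismatch described in the patch can be illustrated with a small userspace model. This is only a sketch under stated assumptions: model_gup() is a hypothetical stand-in for __get_user_pages() that always succeeds, model_walk() is a made-up caller that advances by the number of pages reported (not a copy of the kernel's caller), and the guard page is passed in as an address instead of being derived from a vma.

```c
#define MODEL_PAGE_SIZE 4096UL

typedef long (*range_fn)(unsigned long start, unsigned long end,
			 unsigned long guard_addr);

/* Hypothetical stand-in for __get_user_pages(): every requested page succeeds. */
long model_gup(unsigned long start, long nr_pages)
{
	(void)start;
	return nr_pages > 0 ? nr_pages : 0;
}

/* Pre-patch accounting: the guard page is skipped, the gup count is returned as is. */
long model_range_old(unsigned long start, unsigned long end,
		     unsigned long guard_addr)
{
	unsigned long addr = start;
	long nr_pages = (long)((end - start) / MODEL_PAGE_SIZE);

	if (start == guard_addr) {
		addr += MODEL_PAGE_SIZE;
		nr_pages--;
	}
	return model_gup(addr, nr_pages);
}

/* Post-patch accounting: the skipped page is included in the returned count. */
long model_range_new(unsigned long start, unsigned long end,
		     unsigned long guard_addr)
{
	long nr_pages = (long)((end - start) / MODEL_PAGE_SIZE);
	int skip_page = 0;
	long ret;

	if (start == guard_addr) {
		skip_page = 1;
		start += MODEL_PAGE_SIZE;
		nr_pages--;
	}
	ret = model_gup(start, nr_pages);
	if (ret >= 0)
		ret += skip_page;
	else if (skip_page)
		ret = 1;
	return ret;
}

/*
 * Hypothetical caller: covers [start, end) by advancing past as many
 * pages as the helper reports. Returns the number of helper calls, or
 * -1 if a call reports no forward progress.
 */
long model_walk(range_fn fn, unsigned long start, unsigned long end,
		unsigned long guard_addr)
{
	long calls = 0;

	while (start < end) {
		long ret = fn(start, end, guard_addr);

		calls++;
		if (ret <= 0)
			return -1;
		start += (unsigned long)ret * MODEL_PAGE_SIZE;
	}
	return calls;
}
```

In this model a range that consists only of the guard page makes the old accounting report 0 pages, so the hypothetical caller cannot advance, while the new accounting reports 1. For a four-page range starting at the guard page the old version reports 3, so the caller's idea of how far it got is off by one page.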

2011-04-12 19:18:49

by Robert Święcki

Subject: Re: [PATCH 1/1] __mlock_vma_pages_range: stack_guard_page() case returns the wrong value

On Tue, Apr 12, 2011 at 9:08 PM, Oleg Nesterov <[email protected]> wrote:
> __mlock_vma_pages_range() simply changes addr/nr_pages when
> stack_guard_page(vma, start). But this means that __get_user_pages()
> returns a number which doesn't match the [start, end) interval and
> the caller can be confused.
>
> If we skip the first page, we should return 1 if gup fails, or add
> 1 to the number it returns.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
>
>  mm/mlock.c |   16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> --- sigprocmask/mm/mlock.c~do_mlock_pages_stack_guard_page      2011-04-06 21:33:50.000000000 +0200
> +++ sigprocmask/mm/mlock.c      2011-04-12 20:50:30.000000000 +0200
> @@ -159,9 +159,8 @@ static long __mlock_vma_pages_range(stru
>                                    int *nonblocking)
>  {
>        struct mm_struct *mm = vma->vm_mm;
> -       unsigned long addr = start;
>        int nr_pages = (end - start) / PAGE_SIZE;
> -       int gup_flags;
> +       int gup_flags, skip_page, ret;
>
>        VM_BUG_ON(start & ~PAGE_MASK);
>        VM_BUG_ON(end   & ~PAGE_MASK);
> @@ -189,13 +188,22 @@ static long __mlock_vma_pages_range(stru
>                gup_flags |= FOLL_MLOCK;
>
>        /* We don't try to access the guard page of a stack vma */
> +       skip_page = 0;
>        if (stack_guard_page(vma, start)) {
> -               addr += PAGE_SIZE;
> +               skip_page = 1;
> +               start += PAGE_SIZE;
>                nr_pages--;
>        }
>
> -       return __get_user_pages(current, mm, addr, nr_pages, gup_flags,
> +       ret = __get_user_pages(current, mm, start, nr_pages, gup_flags,
>                                NULL, NULL, nonblocking);
> +
> +       if (ret >= 0)
> +               ret += skip_page;
> +       else if (skip_page)
> +               ret = 1;
> +
> +       return ret;
>  }
>
>  /*

Compiling with Linus' new patch now, lemme know if you agree on which
one might be the more correct one :). Otherwise I'll stick to the
first choice, and let you know tomorrow if it worked after some more
extensive testing.

--
Robert Święcki

2011-04-12 19:22:08

by Oleg Nesterov

Subject: Re: Processes hang in an unkillable state

On 04/12, Robert Święcki wrote:
>
> On Tue, Apr 12, 2011 at 8:28 PM, Oleg Nesterov <[email protected]> wrote:
> >
> > I seem to understand...
> >
> > Please wait a bit, I need to recheck.
>
> Btw, Linus Torvalds is looking into a similar case in another thread -
> http://marc.info/?l=linux-kernel&m=130262886420218&w=2

Argh. thanks ;)

I wish I had known this before I started to investigate...

So, Linus's patch does the same, we can ignore the patch I sent.

Oleg.

2011-04-12 20:23:33

by Linus Torvalds

Subject: Re: Processes hang in an unkillable state

On Tue, Apr 12, 2011 at 1:03 PM, Robert Święcki <[email protected]> wrote:
>>
>> Ok, applied Linus' patch and got the following (kdb dump in the attachment):
>>
>> It contains references to sys_mlock, but in another process/user that
>> oopsed (there are iknowthis and iknowthis2 processes running under
>> test and test2 users). I think I'll simply disable sys_madvise in the
>> fuzzer; and treat this oops as a separate issue.

This does seem to be something else.

It looks like some kind of live-lock situation between two processes
both doing madvise() and causing vmtruncate_range() calls.

Miklos wrote this patch for something bad in this area to serialize
concurrent unmap_mapping_range() calls in order to not restart forever
on vm_truncate_count. That got merged into 2.6.38, so it's there, but
I wonder if there is some case it misses.

Linus

2011-04-12 21:47:13

by Linus Torvalds

Subject: Re: Processes hang in an unkillable state

On Tue, Apr 12, 2011 at 1:56 PM, Robert Święcki <[email protected]> wrote:
>
> Ok, just to update you with what I'm currently doing:
>
> I'm testing now with 2.6.39-rc3 - according to
> http://www.kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.39-rc3
> it has vma_to_resize patch included
> (982134ba62618c2d69fbbbd166d0a11ee3b7e3d8) - I applied the latest
> Linus' patch for sys_mlock (the one patching memory.c and mlock.c),
> disabled the sys_madvise in the fuzzer, and now I got the following
> (full kdb dump attached)

Ok, that's different from the apparent livelock.

Except it once again is one of the BUG_ON's in vma_prio_tree_add() -
and again, your kgdb thing has corrupted the bug information.

Can you make a bug-report to the kgdb people? It's annoying as hell
that all the *critical* bug information that the kernel prints out
apparently gets totally lost when you attach with the debugger. It's
not an Oops, it should have that nice BUG: together with filename and
line number.

> <d>Pid: 18598, comm: iknowthis Not tainted 2.6.39-rc3 #1<c> Dell Inc.
>               Precision WorkStation 390    <c>/0GH911<c>
> <d>RIP: 0010:[<ffffffff8116c842>]  [<ffffffff8116c842>] vma_prio_tree_add+0xc2/0xd0

Code disassembly shows:

   0:   58                      pop    %rax
   1:   48 89 7e 68             mov    %rdi,0x68(%rsi)
   5:   c9                      leaveq
   6:   c3                      retq
   7:   66 90                   xchg   %ax,%ax
   9:   48 8b 56 50             mov    0x50(%rsi),%rdx
   d:   48 8d 47 50             lea    0x50(%rdi),%rax
  11:   48 89 42 08             mov    %rax,0x8(%rdx)
  15:   48 89 57 50             mov    %rdx,0x50(%rdi)
  19:   48 8d 56 50             lea    0x50(%rsi),%rdx
  1d:   48 89 57 58             mov    %rdx,0x58(%rdi)
  21:   48 89 46 50             mov    %rax,0x50(%rsi)
  25:   c9                      leaveq
  26:   c3                      retq
  27:*  0f 0b                   ud2         <-- trapping instruction
  29:   eb fe                   jmp    0x29
  2b:*  0f 0b                   ud2         <-- trapping instruction
  2d:   eb fe                   jmp    0x2d
  2f:   eb 08                   jmp    0x39
and scripts/decodecode is wrong, it's the _second_ of the two ud2's
that traps, as shown by the Code: line.

But whether that is the first or the second in the source code, who
knows? Gcc may have re-ordered things completely, and kdb has thrown
away the information that the kernel should have printed out.
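
For reference, the oops "Code:" line marks the byte at the faulting RIP by wrapping it in angle brackets, which is how the trapping ud2 can be told apart from its neighbour. Below is a minimal sketch of locating that marker; the function name is made up, and the input is assumed to be only the hex-byte part of the line.

```c
#include <ctype.h>

/*
 * Return the byte offset of the <xx> marked byte within the hex-byte
 * part of an oops "Code:" line, or -1 if no byte is marked.
 */
long code_line_trap_offset(const char *code)
{
	const char *p = code;
	long offset = 0;

	while (*p != '\0') {
		if (*p == '<')
			return offset;
		if (isxdigit((unsigned char)p[0]) &&
		    isxdigit((unsigned char)p[1])) {
			/* one two-digit hex byte consumed */
			offset++;
			p += 2;
		} else {
			p++;
		}
	}
	return -1;
}
```

As a constructed illustration (not the actual Code: line from the dump), the bytes from offset 0x25 of the disassembly above with the second ud2 marked would read "c9 c3 0f 0b eb fe <0f> 0b eb fe"; the function returns 6 for that string, and 0x25 + 6 = 0x2b.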

Anyway, it looks _very_ much exactly like the old mremap() issue. But
if you are running -rc3, then you already have commit 42933bac11e8 in
your tree, so maybe there is some other way to trigger a vm_pgoff
overflow.

You've lost Hugh's patch that did the vma dump instead of having the
BUG_ON(). Can you try that one? And once more, I think that if you had
CONFIG_OPTIMIZE_SIZE on, then I think gcc wouldn't re-order the basic
blocks, and the BUG_ON() info would be easier to track.

> Call Trace:
>  [<ffffffff8116c9a1>] vma_prio_tree_insert+0x41/0x60
>  [<ffffffff8117cb8c>] __vma_link_file+0x4c/0x90
>  [<ffffffff8117d568>] vma_adjust+0xe8/0x570
>  [<ffffffff8117db31>] __split_vma+0x141/0x280
>  [<ffffffff8117dc95>] split_vma+0x25/0x30
>  [<ffffffff8117c1a1>] mlock_fixup+0x171/0x1c0
>  [<ffffffff8117c529>] do_mlock+0xc9/0x100
>  [<ffffffff8117c6d7>] sys_mlock+0xe7/0x130
>  [<ffffffff82284e03>] ia32_do_call+0x13/0x13

Hmm. mlock() itself should not be causing any pgoff expansion.

I wonder if this is related to that whole stack expansion thing (you
clearly are hitting the stack vma judging by the other bug you found),
and we have a pgoff underflow when expanding the stack?
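
The underflow being suspected here is plain unsigned arithmetic: when a mapping grows downward by some number of pages, its page offset has to shrink by the same amount, and if the offset is smaller than the growth the subtraction wraps. The sketch below is a userspace model with made-up names, not the attached patch.

```c
#include <limits.h>

/* Unchecked model: subtracting more pages than pgoff holds wraps around. */
unsigned long model_grow_down_unchecked(unsigned long pgoff,
					unsigned long grow_pages)
{
	return pgoff - grow_pages;
}

/*
 * Checked model: store the new page offset and return 0, or return -1
 * and leave *new_pgoff untouched if the subtraction would underflow.
 */
int model_grow_down_checked(unsigned long pgoff, unsigned long grow_pages,
			    unsigned long *new_pgoff)
{
	if (grow_pages > pgoff)
		return -1;
	*new_pgoff = pgoff - grow_pages;
	return 0;
}
```

With pgoff = 1 and a growth of 2 pages the unchecked version yields ULONG_MAX, while the checked version refuses the growth.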

Attached patch for your enjoyment. COMPLETELY UNTESTED, as usual.

Guys, can you think of any other thing that might expand a mapping?
Rather than find them one-by-one as Robert plays with his fuzzer?

Linus


Attachments:
patch.diff (748.00 B)

2011-04-12 21:59:40

by Robert Święcki

Subject: Re: Processes hang in an unkillable state

>> Ok, just to update you with what I'm currently doing:
>>
>> I'm testing now with 2.6.39-rc3 - according to
>> http://www.kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.39-rc3
>> it has vma_to_resize patch included
>> (982134ba62618c2d69fbbbd166d0a11ee3b7e3d8) - I applied the latest
>> Linus' patch for sys_mlock (the one patching memory.c and mlock.c),
>> disabled the sys_madvise in the fuzzer, and now I got the following
>> (full kdb dump attached)
>
> Ok, that's different from the apparent livelock.
>
> Except it once again is one of the BUG_ON's in vma_prio_tree_add() -
> and again, your kgdb thing has corrupted the bug information.
>
> Can you make a bug-report to the kgdb people?

Ok,

> You've lost Hugh's patch that did the vma dump instead of having the
> BUG_ON(). Can you try that one? And once more, I think that if you had
> CONFIG_OPTIMIZE_SIZE on, then I think gcc wouldn't re-order the basic
> blocks, and the BUG_ON() info would be easier to track.

Compiling now with CONFIG_OPTIMIZE_SIZE and vma dump code. Will
probably post some results tomorrow.

--
Robert Święcki

2011-04-12 22:13:52

by Linus Torvalds

Subject: Re: Processes hang in an unkillable state

On Tue, Apr 12, 2011 at 2:59 PM, Robert Święcki <[email protected]> wrote:
>
> Compiling now with CONFIG_OPTIMIZE_SIZE and vma dump code. Will
> probably post some results tomorrow.

.. and if you've added my patch to the stack growth case, hopefully
there won't _be_ any results ;)

Linus

2011-04-12 22:16:18

by Robert Święcki

Subject: Re: Processes hang in an unkillable state

On Wed, Apr 13, 2011 at 12:12 AM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Apr 12, 2011 at 2:59 PM, Robert Święcki <[email protected]> wrote:
>>
>> Compiling now with CONFIG_OPTIMIZE_SIZE and vma dump code. Will
>> probably post some results tomorrow.
>
> .. and if you've added my patch to the stack growth case, hopefully
> there won't _be_ any results ;)

I can, depending on whether you'd like to see vma dump results for
this case or not. Let me know, it's still cooooompiling :).

--
Robert Święcki

2011-04-12 22:19:35

by Linus Torvalds

Subject: Re: Processes hang in an unkillable state

On Tue, Apr 12, 2011 at 3:16 PM, Robert Święcki <[email protected]> wrote:
> On Wed, Apr 13, 2011 at 12:12 AM, Linus Torvalds
> <[email protected]> wrote:
>> On Tue, Apr 12, 2011 at 2:59 PM, Robert Święcki <[email protected]> wrote:
>>>
>>> Compiling now with CONFIG_OPTIMIZE_SIZE and vma dump code. Will
>>> probably post some results tomorrow.
>>
>> .. and if you've added my patch to the stack growth case, hopefully
>> there won't _be_ any results ;)
>
> I can, depending on whether you'd like to see vma dump results for
> this case or not. Let me know, it's still cooooompiling :).

Please do add it. Since it can take a long time to trigger, it's best
to just try to fix this issue asap. If it never triggers, and we don't
see any vma dumps, I won't cry.

Linus

2011-04-12 22:30:59

by Robert Święcki

Subject: Re: Processes hang in an unkillable state

On Wed, Apr 13, 2011 at 12:18 AM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Apr 12, 2011 at 3:16 PM, Robert Święcki <[email protected]> wrote:
>> On Wed, Apr 13, 2011 at 12:12 AM, Linus Torvalds
>> <[email protected]> wrote:
>>> On Tue, Apr 12, 2011 at 2:59 PM, Robert Święcki <[email protected]> wrote:
>>>>
>>>> Compiling now with CONFIG_OPTIMIZE_SIZE and vma dump code. Will
>>>> probably post some results tomorrow.
>>>
>>> .. and if you've added my patch to the stack growth case, hopefully
>>> there won't _be_ any results ;)
>>
>> I can, depending on whether you'd like to see vma dump results for
>> this case or not. Let me know, it's still cooooompiling :).
>
> Please do add it. Since it can take a long time to trigger, it's best
> to just try to fix this issue asap. If it never triggers, and we don't
> see any vma dumps, I won't cry.

Ok,

btw, here might be another path which hits this (at least I think so).

http://alt.swiecki.net/linux_kernel/sys_mprotect-2.6.38.txt

And, generally: here are a few deadlocks/BUG_ONs/oopses gathered
earlier (some fixed already) - http://alt.swiecki.net/linux_kernel/ -
I'll try to ask for fixes for them one by one, as soon as they repeat
and I have proper kdb/perf dumps.

--
Robert Święcki

2011-04-12 22:44:41

by Linus Torvalds

Subject: Re: Processes hang in an unkillable state

On Tue, Apr 12, 2011 at 3:30 PM, Robert Święcki <[email protected]> wrote:
>
> btw, here might be another path which hits this (at least I think so).

So both mprotect and mlock will do the same "split/merge vma's as
necessary", but neither of them should actually ever _expand_ a
mapping or change the vm_pgoff of a vma (except to fix up the pgoff as
a vma is split).

So what I think is happening is that a previous vma operation (like
the mremap or the stack expansion) did the expand and created a vma
with a wrapping vm_pgoff. But nothing bad happened, because nobody
really _cares_ about the wrapping until later, when we split the vma.

So I think (and hope) that your mprotect issue is exactly the same as
your mlock issue, and that the deeper problem was the earlier stack
expansion.

That said, I'm not at all going to guarantee that it's about stack
expansion. There might be something else going on, and the stack
expansion was just the first thing that I could think of as doing
something similar to mremap(), causing a wrapping vm_pgoff.
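
To make the "nobody cares until we split" point concrete: a wrapped range can sit unnoticed until something computes a page offset inside it. The following is a userspace sketch with made-up names that assumes page offsets are unsigned long, as vm_pgoff is; it is not kernel code.

```c
#include <limits.h>

/* Returns 1 if a mapping of npages starting at pgoff wraps past ULONG_MAX. */
int model_pgoff_wraps(unsigned long pgoff, unsigned long npages)
{
	return pgoff + npages < pgoff;
}

/*
 * Page offset of the upper piece when a mapping starting at pgoff is
 * split split_pages pages in. Unsigned arithmetic, so the sum can wrap
 * and come out smaller than the offset of the lower piece.
 */
unsigned long model_split_pgoff(unsigned long pgoff,
				unsigned long split_pages)
{
	return pgoff + split_pages;
}
```

For a mapping with pgoff = ULONG_MAX - 1 that has been expanded to 4 pages, model_pgoff_wraps() reports the wrap, and splitting it 3 pages in gives the upper piece a page offset of 1, below the offset of the piece it was split from.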

Linus

2011-04-13 12:19:13

by Robert Święcki

Subject: Re: Processes hang in an unkillable state

On Wed, Apr 13, 2011 at 12:43 AM, Linus Torvalds
<[email protected]> wrote:
>> btw, here might be another path which hits this (at least I think so).
>
> So both mprotect and mlock will do the same "split/merge vma's as
> necessary", but neither of them should actually ever _expand_ a
> mapping or change the vm_pgoff of a vma (except to fix up the pgoff as
> a vma is split).
>
> So what I think is happening is that a previous vma operation (like
> the mremap or the stack expansion) did the expand and created a vma
> with a wrapping vm_pgoff. But nothing bad happened, because nobody
> really _cares_ about the wrapping until later, when we split the vma.
>
> So I think (and hope) that your mprotect issue is exactly the same as
> your mlock issue, and that the deeper problem was the earlier stack
> expansion.

So, after ~12h of testing I don't see any crashes. Currently, I'm
testing with 2.6.39-rc3 with 2 of your patches applied (1st patching
mlock.c/memory.c, 2nd: mmap.c).

It's still crashing with sys_madvise (as reported earlier), and I'm
going to re-enable all syscalls now (madvise, getdents(64), readdir),
which were disabled before. If something unrelated to the problems
discussed in this thread happens, I'll report it in another thread.

--
Robert Święcki