LinuxLists.cc - kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

FWIW I got :

* Starting local
Kernel panic - not syncing: Segfault with no mm
08335ed4: [<082b0b3b>] dump_stack+0x22/0x24
08335eec: [<082b0ba0>] panic+0x63/0x167
08335f14: [<080614af>] segv+0x27f/0x2f0
08335fcc: [<08061561>] segv_handler+0x41/0x60
08335fec: [<08071da4>] sig_handler_common+0x44/0xb0

EIP: 0000:[<00000000>] CPU: 0 Not tainted EFLAGS: 00000000
Not tainted
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: 00000000 DS: 0000 ES: 0000
08335e88: [<0807935d>] show_regs+0xed/0x120
08335ea4: [<0806179c>] panic_exit+0x2c/0x50
08335eb4: [<080a2b9c>] notifier_call_chain+0x4c/0x70
08335edc: [<080a2c13>] atomic_notifier_call_chain+0x23/0x30
08335eec: [<082b0bc8>] panic+0x8b/0x167
08335f14: [<080614af>] segv+0x27f/0x2f0
08335fcc: [<08061561>] segv_handler+0x41/0x60
08335fec: [<08071da4>] sig_handler_common+0x44/0xb0

and gdb, in another session started to reproduce the bug, gives this:

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
rwsem_down_failed_common (sem=0x84f4000, flags=<value optimized out>,
    adjustment=<value optimized out>) at lib/rwsem.c:189
189             adjustment += RWSEM_WAITING_BIAS;
(gdb) bt
#0  rwsem_down_failed_common (sem=0x84f4000, flags=<value optimized out>,
    adjustment=<value optimized out>) at lib/rwsem.c:189
#1  0x082b28f5 in rwsem_down_write_failed (sem=0x84f4000) at lib/rwsem.c:236
#2  0x082b0ba2 in call_rwsem_down_write_failed ()
    at arch/um/sys-i386/../../x86/lib/semaphore_32.S:92
#3  0x082b20e7 in __down_write_nested (sem=0x18916274)
    at /home/tfoerste/devel/linux-2.6/arch/x86/include/asm/rwsem.h:105
#4  __down_write (sem=0x18916274)
    at /home/tfoerste/devel/linux-2.6/arch/x86/include/asm/rwsem.h:121
#5  down_write (sem=0x18916274) at kernel/rwsem.c:51
#6  0x080d78e5 in sys_brk (brk=139411456) at mm/mmap.c:254
#7  0x08061dc6 in handle_syscall (r=0x19634d50)
    at arch/um/kernel/skas/syscall.c:35
#8  0x08075ed1 in handle_trap (regs=0x19634d50)
    at arch/um/os-Linux/skas/process.c:201
#9  userspace (regs=0x19634d50) at arch/um/os-Linux/skas/process.c:417
#10 0x0805ef34 in fork_handler () at arch/um/kernel/process.c:181
#11 0x00000000 in ?? ()
(gdb) bt full
#0  rwsem_down_failed_common (sem=0x84f4000, flags=<value optimized out>,
    adjustment=<value optimized out>) at lib/rwsem.c:189
        waiter = {list = {next = 0x84cf27c, prev = 0x6}, task = 0x19634b80,
          flags = 2}
        tsk = 0x19634b80
        count = <value optimized out>
#1  0x082b28f5 in rwsem_down_write_failed (sem=0x84f4000) at lib/rwsem.c:236
No locals.
#2  0x082b0ba2 in call_rwsem_down_write_failed ()
    at arch/um/sys-i386/../../x86/lib/semaphore_32.S:92
No locals.
#3  0x082b20e7 in __down_write_nested (sem=0x18916274)
    at /home/tfoerste/devel/linux-2.6/arch/x86/include/asm/rwsem.h:105
        tmp = 411836076
#4  __down_write (sem=0x18916274)
    at /home/tfoerste/devel/linux-2.6/arch/x86/include/asm/rwsem.h:121
No locals.
#5  down_write (sem=0x18916274) at kernel/rwsem.c:51
No locals.
#6  0x080d78e5 in sys_brk (brk=139411456) at mm/mmap.c:254
        rlim = <value optimized out>
        newbrk = <value optimized out>
        oldbrk = 0
        mm = 0x18916240
#7  0x08061dc6 in handle_syscall (r=0x19634d50)
    at arch/um/kernel/skas/syscall.c:35
        syscall = <value optimized out>
#8  0x08075ed1 in handle_trap (regs=0x19634d50)
    at arch/um/os-Linux/skas/process.c:201
        err = <value optimized out>
        status = 0
#9  userspace (regs=0x19634d50) at arch/um/os-Linux/skas/process.c:417
        sig = <value optimized out>
        timer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0,
            tv_usec = 3999}}
        nsecs = <value optimized out>
        err = <value optimized out>
        status = 34175
        op = 31
        pid = 11337
        local_using_sysemu = 2
#10 0x0805ef34 in fork_handler () at arch/um/kernel/process.c:181
No locals.
#11 0x00000000 in ?? ()
No symbol table info available.

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2011-05-19 17:00:37

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

Hi,

Please also CC [email protected],
so you can reach many more UML users. :)

2011/5/19 Toralf Förster <[email protected]>:
> I got a segfault as soon as I try to access the phpmyadmin web page of the UML
> instance at https://<uml_hostname>/phpmyadmin/ :

Hmm, strange.
phpmyadmin works fine on my UML test bed.
Can you bisect the issue?

--
Thanks,
//richard

2011-05-19 17:20:55

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

richard -rw- weinberger wrote at 19:00:35
> Hi,
>
> Please also CC [email protected],
> so you can reach many more UML users. :)
>
> 2011/5/19 Toralf Förster <[email protected]>:
> > I got a segfault as soon as I try to access the phpmyadmin web page of
> > the UML instance at https://<uml_hostname>/phpmyadmin/ :
>
> Hmm, strange.
> phpmyadmin works fine on my UML test bed.
> Can you bisect the issue?

Errm, automatic bisecting doesn't work, because the issue can't be reproduced by a
simple "wget https://...". When I use Konqueror, I'm asked 6-10 times to
confirm a cookie or something else before the crash occurs.

And if I use "lynx -accept_all_cookies https://n22_uml/phpmyadmin/" then I'm
able to log in, so I suspected it had something to do with HTTP frames. But a
shutdown of the UML instance wasn't possible either after such a try, so probably
something other than the HTTP frames themselves triggers the issue ...

In short: it is not phpmyadmin (3.4.0) itself, but it triggers the bug (in
fact, sometimes I could even see the phpmyadmin login window in Firefox
before the crash happened).

Because manual interaction is therefore needed (or do you know an automated
way for konqueror/ff/...?), it would at least be helpful if the bisecting
could be narrowed down to a given path or something else.

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2011-05-19 17:25:55

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

2011/5/19 Toralf Förster <[email protected]>:
> Errm, automatic bisecting doesn't work, because the issue can't be reproduced by a
> simple "wget https://...". When I use Konqueror, I'm asked 6-10 times to
> confirm a cookie or something else before the crash occurs.
>
> And if I use "lynx -accept_all_cookies https://n22_uml/phpmyadmin/" then I'm
> able to log in, so I suspected it had something to do with HTTP frames. But a
> shutdown of the UML instance wasn't possible either after such a try, so probably
> something other than the HTTP frames themselves triggers the issue ...
>
> In short: it is not phpmyadmin (3.4.0) itself, but it triggers the bug (in
> fact, sometimes I could even see the phpmyadmin login window in Firefox
> before the crash happened).
>
> Because manual interaction is therefore needed (or do you know an automated
> way for konqueror/ff/...?), it would at least be helpful if the bisecting
> could be narrowed down to a given path or something else.

BTW: Haven't you had such an issue a few months ago?
Does it work without https? Maybe mod_ssl triggers the bug...

--
Thanks,
//richard

2011-05-19 20:18:27

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

richard -rw- weinberger wrote at 19:00:35
> Can you bisect the issue?

tfoerste@n22 ~/devel/linux-2.6 $ git bisect bad
2e12978a9f7a7abd54e8eb9ce70a7718767b8b2c is the first bad commit
commit 2e12978a9f7a7abd54e8eb9ce70a7718767b8b2c
Author: Lai Jiangshan <[email protected]>
Date: Wed Dec 22 14:18:50 2010 +0800

futex,plist: Pass the real head of the priority list to plist_del()

Some plist_del()s in kernel/futex.c are passed a faked head of the
priority list.

It does not fail because the current code does not require the real head
in plist_del(). The current code of plist_del() just uses the head for
checking, so it will not cause a bad result even when we use a faked head.

But it is undocumented usage:

/**
* plist_del - Remove a @node from plist.
*
* @node: &struct plist_node pointer - entry to be removed
* @head: &struct plist_head pointer - list head
*/

The document says that the @head is the "list head" head of the priority
list.

In futex code, several places use "plist_del(&q->list, &q->list.plist);",
they pass a fake head. We need to fix them all.

Thanks to Darren Hart for many suggestions.

Acked-by: Darren Hart <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>

:040000 040000 78d47de377f8da1c131007a17ca915fbd13f7ff6
ffac93205aaf22fda0667d6395c8da7c7bf692e4 M kernel

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2011-05-19 20:43:52

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

On Thu, May 19, 2011 at 10:18:16PM +0200, Toralf Förster wrote:
>
> richard -rw- weinberger wrote at 19:00:35
> > Can you bisect the issue?

Is this bug fully reproducible? If not, then you may have had a git
bisect good, when it should have been git bisect bad.

The futex/plist should not be affecting rwsem.

-- Steve

>
> tfoerste@n22 ~/devel/linux-2.6 $ git bisect bad
> 2e12978a9f7a7abd54e8eb9ce70a7718767b8b2c is the first bad commit
> commit 2e12978a9f7a7abd54e8eb9ce70a7718767b8b2c
> Author: Lai Jiangshan <[email protected]>
> Date: Wed Dec 22 14:18:50 2010 +0800
>
> futex,plist: Pass the real head of the priority list to plist_del()
>
> Some plist_del()s in kernel/futex.c are passed a faked head of the
> priority list.
>
> It does not fail because the current code does not require the real head
> in plist_del(). The current code of plist_del() just uses the head for
> checking, so it will not cause a bad result even when we use a faked head.
>
> But it is undocumented usage:
>
> /**
> * plist_del - Remove a @node from plist.
> *
> * @node: &struct plist_node pointer - entry to be removed
> * @head: &struct plist_head pointer - list head
> */
>
> The document says that the @head is the "list head" head of the priority
> list.
>
> In futex code, several places use "plist_del(&q->list, &q->list.plist);",
> they pass a fake head. We need to fix them all.
>
> Thanks to Darren Hart for many suggestions.
>
> Acked-by: Darren Hart <[email protected]>
> Signed-off-by: Lai Jiangshan <[email protected]>
> LKML-Reference: <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>
>
> :040000 040000 78d47de377f8da1c131007a17ca915fbd13f7ff6
> ffac93205aaf22fda0667d6395c8da7c7bf692e4 M kernel

2011-05-20 06:44:41

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

Hi,

a few more questions/ideas. :)

2011/5/19 Toralf Förster <[email protected]>:
> FWIW I got :
>
> * Starting local

What is this "Starting local",
was UML crashing while starting your distro?

> Kernel panic - not syncing: Segfault with no mm
> 08335ed4: [<082b0b3b>] dump_stack+0x22/0x24
> 08335eec: [<082b0ba0>] panic+0x63/0x167
> 08335f14: [<080614af>] segv+0x27f/0x2f0
> 08335fcc: [<08061561>] segv_handler+0x41/0x60
> 08335fec: [<08071da4>] sig_handler_common+0x44/0xb0
>
>
> EIP: 0000:[<00000000>] CPU: 0 Not tainted EFLAGS: 00000000
> Not tainted
> EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
> ESI: 00000000 EDI: 00000000 EBP: 00000000 DS: 0000 ES: 0000
> 08335e88: [<0807935d>] show_regs+0xed/0x120
> 08335ea4: [<0806179c>] panic_exit+0x2c/0x50
> 08335eb4: [<080a2b9c>] notifier_call_chain+0x4c/0x70
> 08335edc: [<080a2c13>] atomic_notifier_call_chain+0x23/0x30
> 08335eec: [<082b0bc8>] panic+0x8b/0x167
> 08335f14: [<080614af>] segv+0x27f/0x2f0
> 08335fcc: [<08061561>] segv_handler+0x41/0x60
> 08335fec: [<08071da4>] sig_handler_common+0x44/0xb0
>
>
> and gdb gives in another session to reproduce the bug this:
>
> (gdb) c
> Continuing.

GDB stopped here and UML got SIGSEGV after you continued?
GDB has to ignore SIGSEGV. UML uses this signal to handle page faults.
type: handle SIGSEGV noprint nostop pass
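
As background for the hint above: a process can treat SIGSEGV as a recoverable page-fault notification rather than a fatal error, which is why a debugger that stops on every SIGSEGV gets in the way. The following is a minimal user-space sketch of that idea in C, assuming Linux and POSIX signals. It is not UML's actual fault path, and touch_protected_page and on_segv are hypothetical names used only for illustration.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t faults_handled;
static long page_size;

/* Hypothetical handler: make the faulting page accessible and return,
 * so the faulting instruction is retried. mprotect() is not formally
 * async-signal-safe; this is a common Linux idiom, used here as an
 * assumption of the sketch. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);
    if (mprotect((void *)addr, (size_t)page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    faults_handled++;
}

/* Maps one inaccessible page, touches it, and returns how many faults
 * the handler had to fix up (-1 on setup failure). */
int touch_protected_page(void)
{
    struct sigaction sa, old;

    page_size = sysconf(_SC_PAGESIZE);
    faults_handled = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    volatile char *page = mmap(NULL, (size_t)page_size, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) {
        sigaction(SIGSEGV, &old, NULL);
        return -1;
    }

    page[0] = 42;              /* faults; the handler unprotects the page */
    int ok = (page[0] == 42);  /* no further fault expected */

    munmap((void *)page, (size_t)page_size);
    sigaction(SIGSEGV, &old, NULL);
    return ok ? (int)faults_handled : -1;
}
```

Run under gdb with default signal handling, the debugger would stop at the faulting store even though the program recovers from it by itself, which is the situation the "handle SIGSEGV noprint nostop pass" setting avoids.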

I fear the backtrace is garbage.

Can you reproduce the issue using the default config?
Are you using hostfs?
What exactly is the output when it crashes? (Without GDB)

Your host's kernel ring buffer should contain a line like this one
after the crash:
linux[123]: segfault at 0 ip xxx sp xxx error 4 in linux[xxx+yyy]
Please share this line with me.

--
Thanks,
//richard

2011-05-20 07:37:24

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

Steven Rostedt wrote at 22:43:43
> Is this bug fully reproducible? If not, then you may have had a git
> bisect good, when it should have been git bisect bad.

Yes, I bisected it again and got the same commit.
Furthermore, I explicitly checked out that revision and tested it: the issue exists.
I then reverted exactly that commit on top of the checked-out tree and tested it
again: the issue went away.
Then I recompiled the buggy version with CONFIG_DEBUG_INFO=y;
here's the output:

...
Kernel panic - not syncing: Kernel mode fault at addr 0x0, ip 0x80a9f6b
08324b44: [<0829e78b>] dump_stack+0x22/0x24
08324b5c: [<0829e7f0>] panic+0x63/0x167
08324b84: [<080603d2>] segv+0x1e2/0x2b0
08324c3c: [<080604e1>] segv_handler+0x41/0x60
08324c5c: [<08070c54>] sig_handler_common+0x44/0xb0
08324cd8: [<08070e32>] sig_handler+0x42/0x50
08324ce8: [<0807106c>] handle_signal+0x5c/0xa0
08324d0c: [<08073408>] hard_handler+0x18/0x20
08324d1c: [<b7715400>] 0xb7715400

EIP: 0073:[<400008d2>] CPU: 0 Tainted: G W ESP: 007b:4ef22270 EFLAGS: 00200206
Tainted: G W
EAX: ffffffda EBX: 081efe10 ECX: 00000081 EDX: 00000001
ESI: 083f6758 EDI: 081efe0c EBP: 080a88a8 DS: 007b ES: 007b
08324af8: [<080780bd>] show_regs+0xed/0x120
08324b14: [<0806071c>] panic_exit+0x2c/0x50
08324b24: [<0809fc1c>] notifier_call_chain+0x4c/0x70
08324b4c: [<0809fc93>] atomic_notifier_call_chain+0x23/0x30
08324b5c: [<0829e818>] panic+0x8b/0x167
08324b84: [<080603d2>] segv+0x1e2/0x2b0
08324c3c: [<080604e1>] segv_handler+0x41/0x60
08324c5c: [<08070c54>] sig_handler_common+0x44/0xb0
08324cd8: [<08070e32>] sig_handler+0x42/0x50
08324ce8: [<0807106c>] handle_signal+0x5c/0xa0
08324d0c: [<08073408>] hard_handler+0x18/0x20
08324d1c: [<b7715400>] 0xb7715400

The file /var/log/messages of the UML says :

2011-05-20T09:33:03.455+02:00 n22_uml kernel: ------------[ cut here ]------------
2011-05-20T09:33:03.455+02:00 n22_uml kernel: WARNING: at kernel/futex.c:789 wake_futex+0x28/0x60()
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5bd14: [<0829e78b>] dump_stack+0x22/0x24
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5bd2c: [<0808205a>] warn_slowpath_common+0x5a/0x80
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5bd54: [<080820a3>] warn_slowpath_null+0x23/0x30
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5bd64: [<080a9eb8>] wake_futex+0x28/0x60
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5bd7c: [<080a9faf>] futex_wake+0xbf/0x100
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5bda4: [<080abb1d>] do_futex+0xcd/0x6c0
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5be08: [<080ac184>] sys_futex+0x74/0x140
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5be60: [<0807ffc1>] mm_release+0xd1/0x130
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5be8c: [<08083dad>] exit_mm+0x1d/0x100
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5beb8: [<08085b73>] do_exit+0xc3/0x660
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5bf14: [<080861e9>] sys_exit+0x19/0x20
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5bf20: [<08060d16>] handle_syscall+0xa6/0xb0
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5bf68: [<08074cf1>] userspace+0x361/0x500
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5bfe8: [<0805e0cb>] fork_handler+0x5b/0x70
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5bffc: [<00000000>] 0x0
2011-05-20T09:33:03.457+02:00 n22_uml kernel:
2011-05-20T09:33:03.457+02:00 n22_uml kernel: ---[ end trace 95fb08f635a473e8 ]---
2011-05-20T09:33:03.831+02:00 n22_uml kernel: ------------[ cut here ]------------
2011-05-20T09:33:03.831+02:00 n22_uml kernel: WARNING: at kernel/futex.c:789 wake_futex+0x28/0x60()
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99d14: [<0829e78b>] dump_stack+0x22/0x24
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99d2c: [<0808205a>] warn_slowpath_common+0x5a/0x80
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99d54: [<080820a3>] warn_slowpath_null+0x23/0x30
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99d64: [<080a9eb8>] wake_futex+0x28/0x60
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99d7c: [<080a9faf>] futex_wake+0xbf/0x100
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99da4: [<080abb1d>] do_futex+0xcd/0x6c0
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99e08: [<080ac184>] sys_futex+0x74/0x140
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99e60: [<0807ffc1>] mm_release+0xd1/0x130
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99e8c: [<08083dad>] exit_mm+0x1d/0x100
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99eb8: [<08085b73>] do_exit+0xc3/0x660
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99f14: [<080861e9>] sys_exit+0x19/0x20
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99f20: [<08060d16>] handle_syscall+0xa6/0xb0
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99f68: [<08074cf1>] userspace+0x361/0x500
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99fe8: [<0805e0cb>] fork_handler+0x5b/0x70
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99ffc: [<00000000>] 0x0
2011-05-20T09:33:03.832+02:00 n22_uml kernel:
2011-05-20T09:33:03.832+02:00 n22_uml kernel: ---[ end trace 95fb08f635a473e9 ]---
2011-05-20T09:33:03.951+02:00 n22_uml kernel: ------------[ cut here ]------------
2011-05-20T09:33:03.951+02:00 n22_uml kernel: WARNING: at kernel/futex.c:789 wake_futex+0x28/0x60()
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bd78: [<0829e78b>] dump_stack+0x22/0x24
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bd90: [<0808205a>] warn_slowpath_common+0x5a/0x80
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bdb8: [<080820a3>] warn_slowpath_null+0x23/0x30
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bdc8: [<080a9eb8>] wake_futex+0x28/0x60
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bde0: [<080ab702>] futex_requeue+0x362/0x6b0
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5be64: [<080abceb>] do_futex+0x29b/0x6c0
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bec8: [<080ac184>] sys_futex+0x74/0x140
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bf20: [<08060d16>] handle_syscall+0xa6/0xb0
2011-05-20T09:33:03.955+02:00 n22_uml kernel: 19e5bf68: [<08074cf1>] userspace+0x361/0x500
2011-05-20T09:33:03.955+02:00 n22_uml kernel: 19e5bfe8: [<0805e0cb>] fork_handler+0x5b/0x70
2011-05-20T09:33:03.955+02:00 n22_uml kernel: 19e5bffc: [<00000000>] 0x0
2011-05-20T09:33:03.955+02:00 n22_uml kernel:
2011-05-20T09:33:03.955+02:00 n22_uml kernel: ---[ end trace 95fb08f635a473ea ]---
2011-05-20T09:33:04.000+02:00 n22_uml sshd[738]: Server listening on 0.0.0.0 port 22.
2011-05-20T09:33:06.100+02:00 n22_uml kernel: ------------[ cut here ]------------
2011-05-20T09:33:06.100+02:00 n22_uml kernel: WARNING: at kernel/futex.c:789 wake_futex+0x28/0x60()
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0d14: [<0829e78b>] dump_stack+0x22/0x24
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0d2c: [<0808205a>] warn_slowpath_common+0x5a/0x80
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0d54: [<080820a3>] warn_slowpath_null+0x23/0x30
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0d64: [<080a9eb8>] wake_futex+0x28/0x60
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0d7c: [<080a9faf>] futex_wake+0xbf/0x100
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0da4: [<080abb1d>] do_futex+0xcd/0x6c0
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0e08: [<080ac184>] sys_futex+0x74/0x140
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0e60: [<0807ffc1>] mm_release+0xd1/0x130
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0e8c: [<08083dad>] exit_mm+0x1d/0x100
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0eb8: [<08085b73>] do_exit+0xc3/0x660
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0f14: [<080861e9>] sys_exit+0x19/0x20
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0f20: [<08060d16>] handle_syscall+0xa6/0xb0
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0f68: [<08074cf1>] userspace+0x361/0x500
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0fe8: [<0805e0cb>] fork_handler+0x5b/0x70
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0ffc: [<00000000>] 0x0
2011-05-20T09:33:06.104+02:00 n22_uml kernel:
2011-05-20T09:33:06.104+02:00 n22_uml kernel: ---[ end trace 95fb08f635a473eb ]---
2011-05-20T09:33:09.000+02:00 n22_uml cron[851]: (CRON) STARTUP (V5.0)
2011-05-20T09:33:10.112+02:00 n22_uml kernel: Virtual console 1 assigned device '/dev/pts/5'

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2011-05-20 07:43:16

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

richard -rw- weinberger wrote at 08:44:39
> > * Starting local
>
> What is this "Starting local",
> was UML crashing while starting your distro?
No, that's the last init script of the Gentoo system I'm using within UML.

> I fear the backtrace is garbage.
yep, maybe this helps more :

Program received signal SIGSEGV, Segmentation fault.
0x080a9f6b in futex_wake (uaddr=<value optimized out>, flags=<value optimized out>, nr_wake=<value optimized out>,
bitset=4294967295) at kernel/futex.c:958
958 plist_for_each_entry_safe(this, next, head, list) {
(gdb) bt
#0 0x080a9f6b in futex_wake (uaddr=<value optimized out>, flags=<value optimized out>, nr_wake=<value optimized out>,
bitset=4294967295) at kernel/futex.c:958
#1 0x080abb1d in do_futex (uaddr=0x81efe10, op=0, val=0, timeout=0x0, uaddr2=0x81efe0c, val2=0, val3=4294967295)
at kernel/futex.c:2610
#2 0x080ac184 in sys_futex (uaddr=0x81efe10, op=129, val=1, utime=0x83f89b8, uaddr2=0x81efe0c, val3=134908072)
at kernel/futex.c:2678
#3 0x08060d16 in handle_syscall (r=0x19545290) at arch/um/kernel/skas/syscall.c:35
#4 0x08074cf1 in handle_trap (regs=0x19545290) at arch/um/os-Linux/skas/process.c:201
#5 userspace (regs=0x19545290) at arch/um/os-Linux/skas/process.c:417
#6 0x0805e0cb in fork_handler () at arch/um/kernel/process.c:181
#7 0x00000000 in ?? ()

>
> Can you reproduce the issue using the default config?
yes

> Are you using hostfs?
Yes, but the issue is independent of that.

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2011-05-20 07:56:04

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

2011/5/20 Toralf Förster <[email protected]>:
> ...
> Kernel panic - not syncing: Kernel mode fault at addr 0x0, ip 0x80a9f6b

Looks like a NULL-pointer bug.
What code is at address 80a9f6b?
Use "objdump -d -S | less" to find it.
Please note, kernel binary and log message have to match!

> The file /var/log/messages of the UML says :
>
> 2011-05-20T09:33:03.455+02:00 n22_uml kernel: ------------[ cut here ]------------
> 2011-05-20T09:33:03.455+02:00 n22_uml kernel: WARNING: at kernel/futex.c:789 wake_futex+0x28/0x60()

Is this really 2.6.39?
Line 789 contains no WARN*().
http://lxr.linux.no/#linux+v2.6.39/kernel/futex.c#L789

--
Thanks,
//richard

2011-05-20 08:39:06

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

2011/5/20 richard -rw- weinberger <[email protected]>:
> Use "objdump -d -S | less" to find it.

Ick, I meant "objdump -d -S vmlinux | less"

--
Thanks,
//richard

2011-05-20 08:42:26

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

richard -rw- weinberger wrote at 09:56:02
> 2011/5/20 Toralf Förster <[email protected]>:
> > ...
> > Kernel panic - not syncing: Kernel mode fault at addr 0x0, ip 0x80a9f6b
>
> Looks like a NULL-pointer bug.
> What code is at address 80a9f6b?
> Use "objdump -d -S | less" to find it.
if (unlikely(ret != 0))
80a9f3a: 85 c0 test %eax,%eax
80a9f3c: 75 ca jne 80a9f08 <futex_wake+0x18>
goto out;

hb = hash_futex(&key);
80a9f3e: 8d 45 e8 lea -0x18(%ebp),%eax
80a9f41: e8 aa f6 ff ff call 80a95f0 <hash_futex>
80a9f46: 89 c2 mov %eax,%edx
spin_lock(&hb->lock);
head = &hb->chain;

plist_for_each_entry_safe(this, next, head, list) {
80a9f48: 8b 48 08 mov 0x8(%eax),%ecx
80a9f4b: 83 c2 08 add $0x8,%edx
80a9f4e: 8d 41 f4 lea -0xc(%ecx),%eax
80a9f51: 39 ca cmp %ecx,%edx
80a9f53: 8b 70 0c mov 0xc(%eax),%esi
80a9f56: 74 6a je 80a9fc2 <futex_wake+0xd2>
80a9f58: 89 d9 mov %ebx,%ecx
80a9f5a: 83 ee 0c sub $0xc,%esi
80a9f5d: 89 d3 mov %edx,%ebx
80a9f5f: 89 fa mov %edi,%edx
80a9f61: 89 cf mov %ecx,%edi
80a9f63: eb 12 jmp 80a9f77 <futex_wake+0x87>
80a9f65: 8d 76 00 lea 0x0(%esi),%esi
80a9f68: 8d 46 0c lea 0xc(%esi),%eax
80a9f6b: 8b 4e 0c mov 0xc(%esi),%ecx
80a9f6e: 39 c3 cmp %eax,%ebx
80a9f70: 74 4e je 80a9fc0 <futex_wake+0xd0>
80a9f72: 89 f0 mov %esi,%eax
80a9f74: 8d 71 f4 lea -0xc(%ecx),%esi
if (match_futex (&this->key, &key)) {
80a9f77: 83 f8 e4 cmp $0xffffffe4,%eax
80a9f7a: 74 ec je 80a9f68 <futex_wake+0x78>
80a9f7c: 8b 48 1c mov 0x1c(%eax),%ecx
80a9f7f: 3b 4d e8 cmp -0x18(%ebp),%ecx
80a9f82: 75 e4 jne 80a9f68 <futex_wake+0x78>
/*
* Return 1 if two futex_keys are equal, 0 otherwise.
*/

> Is this really 2.6.39?
No, but I didn't want to change the subject line, the bisected version is :
v2.6.38-rc8-1-g2e12978

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2011-05-20 08:58:26

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

richard -rw- weinberger wrote at 10:39:03
> 2011/5/20 richard -rw- weinberger <[email protected]>:
> > Use "objdump -d -S | less" to find it.
>
> Ick, I meant "objdump -d -S vmlinux | less"
Well (BTW nearly similar to the result of "objdump -d -S linux" ), but anyway
here it is :

spin_lock(&hb->lock);
head = &hb->chain;

plist_for_each_entry_safe(this, next, head, list) {
80a9f48: 8b 48 08 mov 0x8(%eax),%ecx
80a9f4b: 83 c2 08 add $0x8,%edx
80a9f4e: 8d 41 f4 lea -0xc(%ecx),%eax
80a9f51: 39 ca cmp %ecx,%edx
80a9f53: 8b 70 0c mov 0xc(%eax),%esi
80a9f56: 74 6a je 80a9fc2 <futex_wake+0xd2>
80a9f58: 89 d9 mov %ebx,%ecx
80a9f5a: 83 ee 0c sub $0xc,%esi
80a9f5d: 89 d3 mov %edx,%ebx
80a9f5f: 89 fa mov %edi,%edx
80a9f61: 89 cf mov %ecx,%edi
80a9f63: eb 12 jmp 80a9f77 <futex_wake+0x87>
80a9f65: 8d 76 00 lea 0x0(%esi),%esi
80a9f68: 8d 46 0c lea 0xc(%esi),%eax
80a9f6b: 8b 4e 0c mov 0xc(%esi),%ecx
80a9f6e: 39 c3 cmp %eax,%ebx
80a9f70: 74 4e je 80a9fc0 <futex_wake+0xd0>
80a9f72: 89 f0 mov %esi,%eax
80a9f74: 8d 71 f4 lea -0xc(%ecx),%esi
if (match_futex (&this->key, &key)) {
80a9f77: 83 f8 e4 cmp $0xffffffe4,%eax
80a9f7a: 74 ec je 80a9f68 <futex_wake+0x78>
80a9f7c: 8b 48 1c mov 0x1c(%eax),%ecx
80a9f7f: 3b 4d e8 cmp -0x18(%ebp),%ecx
80a9f82: 75 e4 jne 80a9f68 <futex_wake+0x78>
/*
* Return 1 if two futex_keys are equal, 0 otherwise.
*/
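
One way to read the disassembly above, assuming the fault address 0x0 and ip 0x80a9f6b from the panic line reported earlier in the thread refer to the load at 80a9f6b: the loop derives the next entry by subtracting 0xc from a list pointer ("lea -0xc(%ecx),%esi") and later loads from that entry plus 0xc ("mov 0xc(%esi),%ecx"). If the list pointer it followed was NULL, that load lands exactly on address 0. The arithmetic can be checked with a small C sketch on 32-bit unsigned values; the function and macro names are hypothetical.

```c
#include <stdint.h>

/* Offset of the list member inside the entry, as seen in the
 * disassembly (the 0xc in "lea -0xc(%ecx),%esi"). */
#define LIST_MEMBER_OFFSET 0x0cu

/* Models "lea -0xc(%ecx),%esi": entry address from a list pointer. */
uint32_t entry_from_list_ptr(uint32_t list_ptr)
{
    return list_ptr - LIST_MEMBER_OFFSET;
}

/* Models the address read by "mov 0xc(%esi),%ecx". */
uint32_t load_address(uint32_t entry)
{
    return entry + LIST_MEMBER_OFFSET;
}
```

With a NULL list pointer, entry_from_list_ptr(0) wraps to 0xfffffff4, and load_address() of that value is 0, matching the reported fault address. This only shows the two numbers are consistent; it does not by itself prove how the pointer became NULL.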

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2011-05-20 09:02:37

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

2011/5/20 Toralf Förster <[email protected]>:
>> Ick, I meant "objdump -d -S vmlinux | less"
> Well (BTW nearly similar to the result of "objdump -d -S linux" ), but anyway
> here it is :

I hope it's _exactly_ the same result. vmlinux and linux are hard linked...

--
Thanks,
//richard

2011-05-20 09:19:41

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

richard -rw- weinberger wrote at 11:02:35

> I hope it's _exactly_ the same result. vmlinux and linux are hard linked...
tfoerste@n22 ~/devel/linux-2.6 $ ls -lid vmlinux linux
2034205 -rwxr-xr-x 2 tfoerste users 35624874 May 20 09:28 linux
2034205 -rwxr-xr-x 2 tfoerste users 35624874 May 20 09:28 vmlinux

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2011-05-20 15:55:14

by Darren Hart

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

On 05/20/2011 12:56 AM, richard -rw- weinberger wrote:
> 2011/5/20 Toralf Förster <[email protected]>:
>> ...
>> Kernel panic - not syncing: Kernel mode fault at addr 0x0, ip 0x80a9f6b
>
> Looks like a NULL-pointer bug.
> What code is at address 80a9f6b?
> Use "objdump -d -S | less" to find it.
> Please note, kernel binary and log message have to match!
>
>> The file /var/log/messages of the UML says :
>>
>> 2011-05-20T09:33:03.455+02:00 n22_uml kernel: ------------[ cut here ]------------
>> 2011-05-20T09:33:03.455+02:00 n22_uml kernel: WARNING: at kernel/futex.c:789 wake_futex+0x28/0x60()
>
> Is this really 2.6.39?
> Line 789 contains no WARN*().
> http://lxr.linux.no/#linux+v2.6.39/kernel/futex.c#L789
>

I suspect Toralf is hitting the WARN_ON in __unqueue_futex:

if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
|| plist_node_empty(&q->list)))

Toralf, can you instrument that and let us know which of the conditions is
triggering the WARN_ON? Something like the following should be adequate
to get you the line number. I suspect it is plist_node_empty, given the
git bisect results you reported.

diff --git a/kernel/futex.c b/kernel/futex.c
index abd5324..7f31bca 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -782,8 +782,11 @@ static void __unqueue_futex(struct futex_q *q)
{
struct futex_hash_bucket *hb;

- if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
- || plist_node_empty(&q->list)))
+ if (WARN_ON(!q->lock_ptr))
+ return;
+ if (!spin_is_locked(q->lock_ptr))
+ return;
+ if (plist_node_empty(&q->list))
return;

hb = container_of(q->lock_ptr, struct futex_hash_bucket, lock);
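
The idea behind the patch above is to test the three conditions one at a time, so that the reported location identifies which one fired. A minimal user-space sketch of that idea in C follows. The struct model_futex_q, its fields, and first_failed_check are hypothetical stand-ins rather than kernel code, and the lock is modelled as a plain "held" flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Which of the three conditions failed first, in the order they
 * appear in the combined check. */
enum unqueue_check {
    CHECK_OK,
    CHECK_NO_LOCK_PTR,
    CHECK_NOT_LOCKED,
    CHECK_NODE_EMPTY
};

/* Hypothetical model of the fields the check looks at. */
struct model_futex_q {
    const bool *lock_ptr; /* stands in for q->lock_ptr; points at a "held" flag */
    bool node_empty;      /* stands in for plist_node_empty(&q->list) */
};

/* Evaluates the conditions separately and reports the first that fails. */
enum unqueue_check first_failed_check(const struct model_futex_q *q)
{
    if (!q->lock_ptr)
        return CHECK_NO_LOCK_PTR;
    if (!*q->lock_ptr)
        return CHECK_NOT_LOCKED;
    if (q->node_empty)
        return CHECK_NODE_EMPTY;
    return CHECK_OK;
}
```

A caller could log the returned value instead of a single combined warning, which gives the same information as separate warning sites with distinct line numbers.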

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

2011-05-20 16:04:20

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

On Fri, 2011-05-20 at 08:55 -0700, Darren Hart wrote:

> I suspect Toralf is hitting the WARN_ON in __unqueue_futex:
>
> if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
> || plist_node_empty(&q->list)))
>
> Toralf, can you instrument that and let us know which of the conditions is
> triggering the WARN_ON? Something like the following should be adequate
> to get you the line number. I suspect it is plist_node_empty, given the
> git bisect results you reported.
>
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index abd5324..7f31bca 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -782,8 +782,11 @@ static void __unqueue_futex(struct futex_q *q)
> {
> struct futex_hash_bucket *hb;
>
> - if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
> - || plist_node_empty(&q->list)))
> + if (WARN_ON(!q->lock_ptr))
> + return;
> + if (!spin_is_locked(q->lock_ptr))
> + return;
> + if (plist_node_empty(&q->list))
> return;
>

Wait! This is where we need the WARN_ON_SMP(), do we have that patch in?

I think UML is UP, and that spin_is_locked() will always return false.
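
A toy model of that suspicion, as a sketch only. The toy_* names are made up
here and are not kernel APIs. It assumes, as stated above, that on UP the
lock query always reports "not locked", and it assumes the two fixes make
the lock-related conditions count only on SMP:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; NOT the real definitions. */
struct toy_lock { int held; };
struct toy_futex_q {
    struct toy_lock *lock_ptr;
    bool queued;              /* models !plist_node_empty(&q->list) */
};

/* Assumption: on UP the lock state cannot be inspected and the query is a
 * constant 0; on SMP it reflects the real state. */
static int toy_spin_is_locked(const struct toy_lock *l, bool smp)
{
    return smp ? l->held : 0;
}

/* Combined check as quoted in the thread: true means "warn and return". */
static bool toy_bails_out_old(const struct toy_futex_q *q, bool smp)
{
    return !q->lock_ptr || !toy_spin_is_locked(q->lock_ptr, smp) || !q->queued;
}

/* Sketch of the fixed idea: lock conditions are only evaluated on SMP. */
static bool toy_bails_out_fixed(const struct toy_futex_q *q, bool smp)
{
    bool lock_problem = smp &&
        (!q->lock_ptr || !toy_spin_is_locked(q->lock_ptr, smp));

    return lock_problem || !q->queued;
}

/* Usage: a correctly locked, queued waiter on SMP and on UP. */
static void toy_demo(void)
{
    struct toy_lock lock = { 1 };
    struct toy_futex_q q = { &lock, true };

    assert(!toy_bails_out_old(&q, true));    /* SMP: unqueue proceeds */
    assert(toy_bails_out_old(&q, false));    /* UP: spurious early return */
    assert(!toy_bails_out_fixed(&q, false)); /* UP with the fix: proceeds */
}
```

If the model is right, every unqueue on UP returns early without removing
the waiter from the hash bucket, which would leave stale entries behind.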

-- Steve

2011-05-20 16:11:40

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

On Fri, 2011-05-20 at 12:04 -0400, Steven Rostedt wrote:
> On Fri, 2011-05-20 at 08:55 -0700, Darren Hart wrote:
>
> > I suspect Toralf is hitting the WARN_ON in __unqueue_futex:
> >
> > if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
> > || plist_node_empty(&q->list)))
> >
> > Toralf, can you instrument that and let us know which of the conditions is
> > triggering the WARN_ON? Something like the following should be adequate
> > to get you the line number. I suspect it is plist_node_empty, given the
> > git bisect results you reported.
> >
> >
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index abd5324..7f31bca 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -782,8 +782,11 @@ static void __unqueue_futex(struct futex_q *q)
> > {
> > struct futex_hash_bucket *hb;
> >
> > - if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
> > - || plist_node_empty(&q->list)))
> > + if (WARN_ON(!q->lock_ptr))
> > + return;
> > + if (!spin_is_locked(q->lock_ptr))
> > + return;
> > + if (plist_node_empty(&q->list))
> > return;
> >
>
> Wait! This is where we need the WARN_ON_SMP(), do we have that patch in?
>
> I think UML is UP, and that spin_is_locked() will always return false.
>

Could you apply these patches:

2092e6be WARN_ON_SMP(): Allow use in if() statements on UP
29096202 futex: Fix WARN_ON() test for UP

On top of this commit, and see if the problem goes away. What could have
happened, is that you have two bugs, with one of them fixed. If the git
bisect stumbled on this bug, it will show this one, even though later
on, this code was fixed. If you apply the above two patches and it works
again, then this isn't the bug you are looking for.

-- Steve

2011-05-20 16:24:58

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

2011/5/20 Toralf Förster <[email protected]>:
>
> richard -rw- weinberger wrote at 09:56:02
>> 2011/5/20 Toralf Förster <[email protected]>:
>> > ...
>> > Kernel panic - not syncing: Kernel mode fault at addr 0x0, ip 0x80a9f6b
>>
>> Looks like a NULL-pointer bug.
>> What code is at address 80a9f6b?
>> Use "objdump -d -S | less" to find it.
>         if (unlikely(ret != 0))
>  80a9f3a:       85 c0                   test   %eax,%eax
>  80a9f3c:       75 ca                   jne    80a9f08 <futex_wake+0x18>
>                 goto out;
>
>         hb = hash_futex(&key);
>  80a9f3e:       8d 45 e8                lea    -0x18(%ebp),%eax
>  80a9f41:       e8 aa f6 ff ff          call   80a95f0 <hash_futex>
>  80a9f46:       89 c2                   mov    %eax,%edx
>         spin_lock(&hb->lock);
>         head = &hb->chain;
>
>         plist_for_each_entry_safe(this, next, head, list) {
>  80a9f48:       8b 48 08                mov    0x8(%eax),%ecx
>  80a9f4b:       83 c2 08                add    $0x8,%edx
>  80a9f4e:       8d 41 f4                lea    -0xc(%ecx),%eax
>  80a9f51:       39 ca                   cmp    %ecx,%edx
>  80a9f53:       8b 70 0c                mov    0xc(%eax),%esi
>  80a9f56:       74 6a                   je     80a9fc2 <futex_wake+0xd2>
>  80a9f58:       89 d9                   mov    %ebx,%ecx
>  80a9f5a:       83 ee 0c                sub    $0xc,%esi
>  80a9f5d:       89 d3                   mov    %edx,%ebx
>  80a9f5f:       89 fa                   mov    %edi,%edx
>  80a9f61:       89 cf                   mov    %ecx,%edi
>  80a9f63:       eb 12                   jmp    80a9f77 <futex_wake+0x87>
>  80a9f65:       8d 76 00                lea    0x0(%esi),%esi
>  80a9f68:       8d 46 0c                lea    0xc(%esi),%eax
>  80a9f6b:       8b 4e 0c                mov    0xc(%esi),%ecx

A NULL pointer dereference happens here in futex_wake().
Steve, any ideas?
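
A small sketch of how a fault at address 0x0 can show up at
"mov 0xc(%esi),%ecx" even though the loop holds entry pointers rather than
raw list pointers. The assumption is that each entry is derived from a link
pointer by subtracting the link's offset (container_of style), so a NULL
link gives an entry at minus-offset and the next load lands on address 0.
The toy_* layout below is hypothetical, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration; NOT the kernel's definitions. */
struct toy_list_head { struct toy_list_head *next, *prev; };
struct toy_plist_node {
    int prio;
    struct toy_list_head prio_list;
    struct toy_list_head node_list;   /* link used to walk the chain */
};
struct toy_futex_q {
    struct toy_plist_node list;       /* first member, as an assumption */
    void *task;
};

/* Offset of the link inside the entry (0xc with 4-byte int and pointers). */
#define TOY_LINK_OFFSET offsetof(struct toy_futex_q, list.node_list)

/* container_of-style step: entry address derived from a raw link address. */
static uintptr_t toy_entry_from_link(uintptr_t link)
{
    return link - TOY_LINK_OFFSET;
}

/* Address the loop reads to fetch the following link of that entry. */
static uintptr_t toy_next_load_address(uintptr_t entry)
{
    return entry + TOY_LINK_OFFSET + offsetof(struct toy_list_head, next);
}

static void toy_demo(void)
{
    uintptr_t entry = toy_entry_from_link(0);   /* NULL link */

    assert(entry != 0);                         /* entry looks non-NULL */
    assert(toy_next_load_address(entry) == 0);  /* but the load hits 0 */
}
```

Under that assumption the faulting instruction is consistent with walking
onto a list node whose link pointer was NULL.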

--
Thanks,
//richard

2011-05-20 17:10:24

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

Steven Rostedt wrote at 18:11:36
> > Wait! This is where we need the WARN_ON_SMP(), do we have that patch in?
> >
> > I think UML is UP, and that spin_is_locked() will always return false.
>
> Could you apply these patches:
>
> 2092e6be WARN_ON_SMP(): Allow use in if() statements on UP
> 29096202 futex: Fix WARN_ON() test for UP
>
> On top of this commit, and see if the problem goes away. What could have
> happened, is that you have two bugs, with one of them fixed. If the git
> bisect stumbled on this bug, it will show this one, even though later
> on, this code was fixed. If you apply the above two patches and it works
> again, then this isn't the bug you are looking for.
>
> -- Steve
Right - applying those 2 commits on top of v2.6.38-rc8-1-g2e12978 works - now
the issue is gone.

And yes - now I have to look for the other of the two bugs, which was introduced
between the tags v2.6.38 and v2.6.39, I think.

Is there an easy way to check whether a given checked-out git tree contains
a specific commit id?

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2011-05-20 17:19:17

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

On Fri, 2011-05-20 at 18:24 +0200, richard -rw- weinberger wrote:
> 2011/5/20 Toralf Förster <[email protected]>:
> >

> A NULL pointer dereference happens here in futex_wake().
> Steve, any ideas?
>

Yes, if this is from the bisect, and does not contain the two commits
that I've posted in another email. Without those fixes, the futex code
will error and return without doing the proper work and cause all sorts
of bugs.

-- Steve

2011-05-20 17:35:56

by Darren Hart

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

On 05/20/2011 09:04 AM, Steven Rostedt wrote:
> On Fri, 2011-05-20 at 08:55 -0700, Darren Hart wrote:
>
>> I suspect Toralf is hitting the WARN_ON in __unqueue_futex:
>>
>> if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
>> || plist_node_empty(&q->list)))
>>
>> Toralf, can you instrument that and let us know which of the conditions is
>> triggering the WARN_ON? Something like the following should be adequate
>> to get you the line number. I suspect it is plist_node_empty, given the
>> git bisect results you reported.
>>
>>
>> diff --git a/kernel/futex.c b/kernel/futex.c
>> index abd5324..7f31bca 100644
>> --- a/kernel/futex.c
>> +++ b/kernel/futex.c
>> @@ -782,8 +782,11 @@ static void __unqueue_futex(struct futex_q *q)
>> {
>> struct futex_hash_bucket *hb;
>>
>> - if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
>> - || plist_node_empty(&q->list)))
>> + if (WARN_ON(!q->lock_ptr))
>> + return;
>> + if (!spin_is_locked(q->lock_ptr))
>> + return;
>> + if (plist_node_empty(&q->list))
>> return;
>>
>

Whoops, there should have been WARN_ON's in all the if blocks... duh.
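
What the instrumentation was presumably meant to look like, as a userspace
sketch: one warning per condition, so the reported line number tells the
three cases apart. TOY_WARN_ON and the toy_* types are stand-ins invented
here; the real WARN_ON() lives in the kernel and prints far more:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int last_warn_line;   /* line of the most recent warning, 0 if none */

/* Report where a condition fired and pass its truth value through. */
static int toy_warn(int cond, const char *file, int line)
{
    if (cond) {
        last_warn_line = line;
        fprintf(stderr, "WARNING: at %s:%d\n", file, line);
    }
    return cond;
}
#define TOY_WARN_ON(cond) toy_warn(!!(cond), __FILE__, __LINE__)

/* Toy waiter: the three fields model the three conditions being tested. */
struct toy_futex_q {
    void *lock_ptr;
    bool lock_held;
    bool queued;
};

enum toy_result { UNQUEUED, NO_LOCK_PTR, LOCK_NOT_HELD, NOT_QUEUED };

/* Each condition gets its own warning, hence its own line number. */
static enum toy_result toy_unqueue(const struct toy_futex_q *q)
{
    if (TOY_WARN_ON(!q->lock_ptr))
        return NO_LOCK_PTR;
    if (TOY_WARN_ON(!q->lock_held))
        return LOCK_NOT_HELD;
    if (TOY_WARN_ON(!q->queued))
        return NOT_QUEUED;
    return UNQUEUED;
}

static void toy_demo(void)
{
    int lock;
    struct toy_futex_q unlocked = { &lock, false, true };
    struct toy_futex_q empty = { &lock, true, false };
    int first;

    assert(toy_unqueue(&unlocked) == LOCK_NOT_HELD);
    first = last_warn_line;
    assert(toy_unqueue(&empty) == NOT_QUEUED);
    assert(last_warn_line != first);   /* distinct lines, distinct causes */
}
```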

> Wait! This is where we need the WARN_ON_SMP(), do we have that patch in?

Hrm, I thought he said he was on 2.6.39-rc-something. Those patches went
in pre 2.6.39-rc1 according to gitk.

>
> I think UML is UP, and that spin_is_locked() will always return false.
>
> -- Steve
>
>

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

2011-05-20 17:41:18

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

On Fri, 2011-05-20 at 10:35 -0700, Darren Hart wrote:

> > Wait! This is where we need the WARN_ON_SMP(), do we have that patch in?
>
> Hrm, I thought he said he was on 2.6.39-rc-something. Those patches went
> in pre 2.6.39-rc1 according to gitk.

A git bisect can easily stumble on this where the fix is not made.

-- Steve

2011-05-20 17:44:06

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

On Fri, 2011-05-20 at 19:10 +0200, Toralf Förster wrote:

> Is there an easy way to check whether a given checked-out git tree contains
> a specific commit id?

You can try:

git log --pretty=oneline | grep <SHA1>

-- Steve

2011-05-20 17:46:12

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

On Fri, 2011-05-20 at 19:10 +0200, Toralf Förster wrote:
> Steven Rostedt wrote at 18:11:36
> > > Wait! This is where we need the WARN_ON_SMP(), do we have that patch in?
> > >
> > > I think UML is UP, and that spin_is_locked() will always return false.
> >
> > Could you apply these patches:
> >
> > 2092e6be WARN_ON_SMP(): Allow use in if() statements on UP
> > 29096202 futex: Fix WARN_ON() test for UP
> >
> > On top of this commit, and see if the problem goes away. What could have
> > happened, is that you have two bugs, with one of them fixed. If the git
> > bisect stumbled on this bug, it will show this one, even though later
> > on, this code was fixed. If you apply the above two patches and it works
> > again, then this isn't the bug you are looking for.
> >
> > -- Steve
> Right - applying those 2 commits on top of v2.6.38-rc8-1-g2e12978 works - now
> the issue is gone.
>
> And yes - now I have to look for the other of the two bugs, which was introduced
> between the tags v2.6.38 and v2.6.39, I think.

Another thing you could do is checkout 29096202 and see if it works. If
it does, you can base that as your "git bisect good" and you should not
be affected by this bug again.

-- Steve

2011-05-20 22:54:06

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

Steven Rostedt wrote at 18:11:36
> Could you apply these patches:
>
> 2092e6be WARN_ON_SMP(): Allow use in if() statements on UP
> 29096202 futex: Fix WARN_ON() test for UP
>
> On top of this commit, and see if the problem goes away. What could have
> happened, is that you have two bugs, with one of them fixed. If the git
> bisect stumbled on this bug, it will show this one, even though later
> on, this code was fixed. If you apply the above two patches and it works
> again, then this isn't the bug you are looking for.

I bisected it again and applied those 2 commits at every step where commit
2e12978 was in the source too.

Furthermore, it was necessary to use a fresh instance of firefox every time to
reproduce the issue, which has now somehow changed: the UML system was no longer
reachable (neither ping nor ssh into it was possible) as soon as I tried to
point firefox to https://n22_uml/phpmyadmin/, and no crash occurred any longer.
In addition, a previously opened ssh session to that UML hangs completely.

Bisecting gave :

git bisect bad
d123375425d7df4b6081a631fc1203fceafa59b2 is the first bad commit
commit d123375425d7df4b6081a631fc1203fceafa59b2
Author: Thomas Gleixner <[email protected]>
Date: Wed Jan 26 21:32:01 2011 +0100

rwsem: Remove redundant asmregparm annotation

Peter Zijlstra pointed out, that the only user of asmregparm (x86) is
compiling the kernel already with -mregparm=3. So the annotation of
the rwsem functions is redundant. Remove it.

Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: David Howells <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Chris Zankel <[email protected]>
LKML-Reference:
<[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>

:040000 040000 f373822625e4f5d03d89997cc9f06ef0e21c6d08
272479d3450a4924f3ad2d06a058d77c577ec0d4 M include
:040000 040000 9294321acb9db51e4db72b8e7c95fbd1531a7f26
393fc63299ae482792439384485618a492619787 M lib

/enjoy
:-)

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2011-05-21 08:54:10

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

Toralf Förster wrote at 00:53:50
> Bisecting gave :
>
>
> git bisect bad
> d123375425d7df4b6081a631fc1203fceafa59b2 is the first bad commit
> commit d123375425d7df4b6081a631fc1203fceafa59b2
> Author: Thomas Gleixner <[email protected]>
> Date: Wed Jan 26 21:32:01 2011 +0100
>
> rwsem: Remove redundant asmregparm annotation
>
> Peter Zijlstra pointed out, that the only user of asmregparm (x86) is
> compiling the kernel already with -mregparm=3. So the annotation of
> the rwsem functions is redundant. Remove it.

BTW I double checked that this commit is the culprit of the hang - it is.
Furthermore I added these kernel config options :

tfoerste@n22 ~/devel/linux-2.6 $ diff .config* | grep '<' | grep -v '#'
< CONFIG_X86_CPU=y
< CONFIG_CMPXCHG_LOCAL=y
< CONFIG_IP_FIB_HASH=y
< CONFIG_RPCSEC_GSS_KRB5=y
< CONFIG_DEBUG_RT_MUTEXES=y
< CONFIG_DEBUG_PI_LIST=y
< CONFIG_DEBUG_MUTEXES=y
< CONFIG_BKL=y
< CONFIG_DEBUG_INFO=y
< CONFIG_DEBUG_WRITECOUNT=y
< CONFIG_DEBUG_LIST=y
< CONFIG_DEBUG_PAGEALLOC=y
< CONFIG_WANT_PAGE_DEBUG_FLAGS=y
< CONFIG_PAGE_POISONING=y

I attached gdb to the top process of linux and ran wireshark in parallel when I did
$>firefox https://n22_uml/phpmyadmin/ &

GDB gave :

Program received signal SIGSEGV, Segmentation fault.
0x0829f2bd in rwsem_down_failed_common (sem=0x84f6000, flags=2, adjustment=65535) at lib/rwsem.c:189
189 adjustment += RWSEM_WAITING_BIAS;
(gdb) bt full
#0 0x0829f2bd in rwsem_down_failed_common (sem=0x84f6000, flags=2, adjustment=65535) at lib/rwsem.c:189
waiter = {list = {next = 0x0, prev = 0x6}, task = 0x1809d480, flags = 2}
tsk = 0x1809d480
count = <value optimized out>
#1 0x0829f375 in rwsem_down_write_failed (sem=0x84f6000) at lib/rwsem.c:236
No locals.
#2 0x0829d0e2 in call_rwsem_down_write_failed () at arch/um/sys-i386/../../x86/lib/semaphore_32.S:97
No locals.
#3 0x0829eaa7 in __down_write_nested (sem=0x182aa774)
at /home/tfoerste/devel/linux-2.6/arch/x86/include/asm/rwsem.h:105
tmp = -1
#4 __down_write (sem=0x182aa774) at /home/tfoerste/devel/linux-2.6/arch/x86/include/asm/rwsem.h:121
No locals.
#5 down_write (sem=0x182aa774) at kernel/rwsem.c:51
No locals.
#6 0x080d2de3 in sys_brk (brk=139419648) at mm/mmap.c:254
rlim = <value optimized out>
newbrk = <value optimized out>
oldbrk = 0
mm = 0x182aa740
#7 0x08060d16 in handle_syscall (r=0x1809d650) at arch/um/kernel/skas/syscall.c:35
syscall = <value optimized out>
#8 0x08074ca1 in handle_trap (regs=0x1809d650) at arch/um/os-Linux/skas/process.c:201
err = <value optimized out>
status = 0
#9 userspace (regs=0x1809d650) at arch/um/os-Linux/skas/process.c:417
sig = <value optimized out>
timer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 5999}}
nsecs = <value optimized out>
err = <value optimized out>
status = 34175
op = 31
pid = 12820
local_using_sysemu = 2
#10 0x0805e0cb in fork_handler () at arch/um/kernel/process.c:181
No locals.
#11 0xaaaaaaaa in ?? ()
No symbol table info available.
(gdb) quit

I attached the wireshark stream to this mail - the network connection at packet #92:

TLSv1 754 [TCP Retransmission] Change Cipher Spec, Encrypted Handshake Message, Application Data

might give a hint, since in a few cases the UML system even hangs during the start of sshd, mightn't it?

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

Attachments:

uml_hang.pcap (58.30 kB)

2011-05-21 10:12:31

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

2011/5/21 Toralf Förster <[email protected]>:
> Bisecting gave :
>
>
> git bisect bad
> d123375425d7df4b6081a631fc1203fceafa59b2 is the first bad commit
> commit d123375425d7df4b6081a631fc1203fceafa59b2
> Author: Thomas Gleixner <[email protected]>
> Date:   Wed Jan 26 21:32:01 2011 +0100
>
>     rwsem: Remove redundant asmregparm annotation
>
>     Peter Zijlstra pointed out, that the only user of asmregparm (x86) is
>     compiling the kernel already with -mregparm=3. So the annotation of
>     the rwsem functions is redundant. Remove it.

Ok, this bisect makes much more sense.

Thomas, Peter, please revert d123375425d7df4b6081a631fc1203fceafa59b2.
We cannot compile UML with -mregparm=3; it would cause a lot of trouble.
It would break 32bit UML on 64bit and also on older 32bit systems like RHEL5.

--
Thanks,
//richard

2011-05-21 22:38:09

by Peter Zijlstra

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

On Sat, 2011-05-21 at 12:12 +0200, richard -rw- weinberger wrote:
> 2011/5/21 Toralf Förster <[email protected]>:
> > Bisecting gave :
> >
> >
> > git bisect bad
> > d123375425d7df4b6081a631fc1203fceafa59b2 is the first bad commit
> > commit d123375425d7df4b6081a631fc1203fceafa59b2
> > Author: Thomas Gleixner <[email protected]>
> > Date: Wed Jan 26 21:32:01 2011 +0100
> >
> > rwsem: Remove redundant asmregparm annotation
> >
> > Peter Zijlstra pointed out, that the only user of asmregparm (x86) is
> > compiling the kernel already with -mregparm=3. So the annotation of
> > the rwsem functions is redundant. Remove it.
>
> Ok, this bisect makes much more sense.
>
> Thomas, Peter, please revert d123375425d7df4b6081a631fc1203fceafa59b2.
> We cannot compile UML with -mregparm=3; it would cause a lot of trouble.
> It would break 32bit UML on 64bit and also on older 32bit systems like RHEL5.

But why?

Also, having to carry that asmregparm notation just for uml doesn't seem
worth the trouble.

2011-05-21 23:06:49

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

On Sun, May 22, 2011 at 12:37 AM, Peter Zijlstra <[email protected]> wrote:
> On Sat, 2011-05-21 at 12:12 +0200, richard -rw- weinberger wrote:
>> 2011/5/21 Toralf Förster <[email protected]>:
>> > Bisecting gave :
>> >
>> >
>> > git bisect bad
>> > d123375425d7df4b6081a631fc1203fceafa59b2 is the first bad commit
>> > commit d123375425d7df4b6081a631fc1203fceafa59b2
>> > Author: Thomas Gleixner <[email protected]>
>> > Date:   Wed Jan 26 21:32:01 2011 +0100
>> >
>> >     rwsem: Remove redundant asmregparm annotation
>> >
>> >     Peter Zijlstra pointed out, that the only user of asmregparm (x86) is
>> >     compiling the kernel already with -mregparm=3. So the annotation of
>> >     the rwsem functions is redundant. Remove it.
>>
>> Ok, this bisect makes much more sense.
>>
>> Thomas, Peter, please revert d123375425d7df4b6081a631fc1203fceafa59b2.
>> We cannot compile UML with -mregparm=3; it would cause a lot of trouble.
>> It would break 32bit UML on 64bit and also on older 32bit systems like RHEL5.
>
> But why?

Why reverting?
d123375 effectively reverts commit d50efc6c (x86: fix UML and -regparm=3).

> Also, having to carry that asmregparm notation just for uml doesn't seem
> worth the trouble.
>

Frankly, I don't know exactly why UML breaks without asmregparm.
I saw this -regparm=3 thing for the very first time today; I'll dig into it...
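
One possible reading of the bisect result, stated as an assumption rather
than a confirmed diagnosis: the 32-bit assembly stub
(call_rwsem_down_write_failed in the backtrace) hands sem to the C slow path
in a register. If the C function is built without -mregparm=3 and no longer
carries the annotation, it would look for its argument on the stack instead,
which would fit the sem value changing between frames #3 and #1 of the gdb
backtrace. The sketch below only shows what such an annotation pins down;
toy_asmregparm and the toy_* names are hypothetical stand-ins:

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's asmregparm annotation: on 32-bit
 * x86 it forces the first three integer arguments into registers no matter
 * which -mregparm value the file is compiled with. Elsewhere it is empty. */
#if defined(__i386__)
#define toy_asmregparm __attribute__((regparm(3)))
#else
#define toy_asmregparm
#endif

struct toy_rwsem { long count; };

/* Slow-path entry as an assembly stub would see it: with the annotation the
 * stub and the C code agree that sem arrives in a register. */
static toy_asmregparm struct toy_rwsem *
toy_down_write_failed(struct toy_rwsem *sem)
{
    sem->count += 1;   /* placeholder for the real slow-path work */
    return sem;
}

static void toy_demo(void)
{
    struct toy_rwsem sem = { 0 };

    assert(toy_down_write_failed(&sem) == &sem);
    assert(sem.count == 1);
}
```

Calls made from C always match, because the compiler sees the same
declaration on both sides. A mismatch can only appear where hand-written
assembly makes the call, which is why the annotation would matter for a
build that does not use -mregparm=3 globally.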

--
Thanks,
//richard

2011-05-23 19:17:37

Subject: Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

2011/5/21 Toralf Förster <[email protected]>:
>
> Toralf Förster wrote at 00:53:50
>> Bisecting gave :
>>
>>
>> git bisect bad
>> d123375425d7df4b6081a631fc1203fceafa59b2 is the first bad commit
>> commit d123375425d7df4b6081a631fc1203fceafa59b2
>> Author: Thomas Gleixner <[email protected]>
>> Date:   Wed Jan 26 21:32:01 2011 +0100
>>
>>     rwsem: Remove redundant asmregparm annotation
>>
>>     Peter Zijlstra pointed out, that the only user of asmregparm (x86) is
>>     compiling the kernel already with -mregparm=3. So the annotation of
>>     the rwsem functions is redundant. Remove it.
>
> BTW I double checked that this commit is the culprit of the hang - it is.
> Furthermore I added these kernel config options :

Toralf,

does this patch fix your problem?
http://userweb.kernel.org/~rw/rwsem.diff

Please do a make clean before rebuilding...

--
Thanks,
//richard

2011-05-23 19:48:32