2004-10-29 23:13:23

by Steven Dake

Subject: 2.6.9 kernel oops with openais

Mark,

Have you seen the following oops in 2.6.x? I can generate it easily
with two nodes by letting openais run for 15-20 seconds on 2.6.9.

I had to turn mlockall off in order to get openais to run in the first
place, otherwise openais runs out of ram which causes a memset to a null
address in parse.c (we should fix that:). Have you had problems with
mlock when working with a 2.6 kernel?
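The memset-to-NULL crash described above is the usual result of using an allocation without checking it. A minimal sketch of a checked, zeroing allocator follows; the name zalloc_checked and its "what" label are hypothetical and are not taken from parse.c.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not the actual openais code: allocate `size`
 * zeroed bytes. On failure it logs a message and returns NULL so the
 * caller can back out instead of calling memset() on a null pointer.
 */
void *zalloc_checked(size_t size, const char *what)
{
    void *p = calloc(1, size);

    if (p == NULL) {
        fprintf(stderr, "out of memory allocating %zu bytes for %s\n",
                size, what);
    }
    return p;
}
```

A caller would test the return value and fail the parse cleanly when it is NULL.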

<1>Unable to handle kernel NULL pointer dereference at virtual address
0000000c
printing eip:
c016dd7b
*pde = 00000000
Oops: 0000 [#2]
PREEMPT SMP
Modules linked in:
CPU: 2
EIP: 0060:[<c016dd7b>] Not tainted VLI
EFLAGS: 00010286 (2.6.9)
EIP is at dnotify_flush+0x1e/0xad
eax: 00000000 ebx: f6cdfb80 ecx: 00000000 edx: f6cdfb80
esi: 00000000 edi: f7baf880 ebp: f6cdfb80 esp: f6cefd50
ds: 007b es: 007b ss: 0068
Process aisexec (pid: 929, threadinfo=f6cee000 task=f7cc2810)
Stack: c0154240 f7224a70 f7cdea80 f6cdfb80 00000000 f7baf880 f7baf880
c0152a6f
f6cdfb80 f7baf880 00000005 00000007 0000000f c011e344 f6cdfb80
f7baf880
00000020 00000001 f7baf880 f7cc2d38 f7cc2810 f7a9a0ac c011f11e
f7cc2810
Call Trace:
[<c0154240>] __fput+0x86/0xd4
[<c0152a6f>] filp_close+0x46/0x86
[<c011e344>] put_files_struct+0x87/0xec
[<c011f11e>] do_exit+0x1a8/0x360
[<c01070fd>] do_divide_error+0x0/0x13e
[<c0116640>] do_page_fault+0x251/0x5af
[<c0108aa3>] do_IRQ+0xd2/0x139
[<c033b760>] move_addr_to_user+0x5c/0x67
[<c033d815>] sys_recvmsg+0x21d/0x226
[<c033f3ec>] release_sock+0x1b/0x71
[<c033f3a7>] lock_sock+0x17/0x41
[<c01555d7>] invalidate_inode_buffers+0x1b/0x7e
[<c01163ef>] do_page_fault+0x0/0x5af
[<c01068e5>] error_code+0x2d/0x38
[<c033c550>] sock_poll+0xe/0x31
[<c01664c1>] do_pollfd+0x8c/0x90
[<c016652b>] do_poll+0x66/0xc6
[<c01666cb>] sys_poll+0x140/0x1fd
[<c0165a45>] __pollwait+0x0/0xc5
[<c0105e7b>] syscall_call+0x7/0xb



2004-10-29 23:18:38

by Mark Haverkamp

Subject: Re: 2.6.9 kernel oops with openais

On Fri, 2004-10-29 at 15:51 -0700, Steven Dake wrote:
> Mark,
>
> Have you seen the following oops in 2.6.x? I can generate it easily
> with two nodes by letting openais run for 15-20 seconds on 2.6.9.
>
> I had to turn mlockall off in order to get openais to run in the first
> place, otherwise openais runs out of ram which causes a memset to a null
> address in parse.c (we should fix that:). Have you had problems with
> mlock when working with a 2.6 kernel?

Funny that you should ask. Just this afternoon I updated one of my
machines from 2.6.8-rc4 to 2.6.10-rc1 and saw the memset problem. (I
got around it by commenting out the group.conf file). And then got a
segfault later. I didn't see a kernel panic though since I couldn't get
it to run that long. I don't know about any mlock problems. Maybe the
kernel mailing list archives have something.


Mark.

>
> <1>Unable to handle kernel NULL pointer dereference at virtual address
> 0000000c
> printing eip:
> c016dd7b
> *pde = 00000000
> Oops: 0000 [#2]
> PREEMPT SMP
> Modules linked in:
> CPU: 2
> EIP: 0060:[<c016dd7b>] Not tainted VLI
> EFLAGS: 00010286 (2.6.9)
> EIP is at dnotify_flush+0x1e/0xad
> eax: 00000000 ebx: f6cdfb80 ecx: 00000000 edx: f6cdfb80
> esi: 00000000 edi: f7baf880 ebp: f6cdfb80 esp: f6cefd50
> ds: 007b es: 007b ss: 0068
> Process aisexec (pid: 929, threadinfo=f6cee000 task=f7cc2810)
> Stack: c0154240 f7224a70 f7cdea80 f6cdfb80 00000000 f7baf880 f7baf880
> c0152a6f
> f6cdfb80 f7baf880 00000005 00000007 0000000f c011e344 f6cdfb80
> f7baf880
> 00000020 00000001 f7baf880 f7cc2d38 f7cc2810 f7a9a0ac c011f11e
> f7cc2810
> Call Trace:
> [<c0154240>] __fput+0x86/0xd4
> [<c0152a6f>] filp_close+0x46/0x86
> [<c011e344>] put_files_struct+0x87/0xec
> [<c011f11e>] do_exit+0x1a8/0x360
> [<c01070fd>] do_divide_error+0x0/0x13e
> [<c0116640>] do_page_fault+0x251/0x5af
> [<c0108aa3>] do_IRQ+0xd2/0x139
> [<c033b760>] move_addr_to_user+0x5c/0x67
> [<c033d815>] sys_recvmsg+0x21d/0x226
> [<c033f3ec>] release_sock+0x1b/0x71
> [<c033f3a7>] lock_sock+0x17/0x41
> [<c01555d7>] invalidate_inode_buffers+0x1b/0x7e
> [<c01163ef>] do_page_fault+0x0/0x5af
> [<c01068e5>] error_code+0x2d/0x38
> [<c033c550>] sock_poll+0xe/0x31
> [<c01664c1>] do_pollfd+0x8c/0x90
> [<c016652b>] do_poll+0x66/0xc6
> [<c01666cb>] sys_poll+0x140/0x1fd
> [<c0165a45>] __pollwait+0x0/0xc5
> [<c0105e7b>] syscall_call+0x7/0xb
>
--
Mark Haverkamp <[email protected]>

2004-10-29 23:28:15

by Steven Dake

Subject: Re: 2.6.9 kernel oops with openais

On Fri, 2004-10-29 at 16:08, Mark Haverkamp wrote:
> On Fri, 2004-10-29 at 15:51 -0700, Steven Dake wrote:
> > Mark,
> >
> > Have you seen the following oops in 2.6.x? I can generate it easily
> > with two nodes by letting openais run for 15-20 seconds on 2.6.9.
> >
> > I had to turn mlockall off in order to get openais to run in the first
> > place, otherwise openais runs out of ram which causes a memset to a null
> > address in parse.c (we should fix that:). Have you had problems with
> > mlock when working with a 2.6 kernel?
>
> Funny that you should ask. Just this afternoon I updated one of my
> machines from 2.6.8-rc4 to 2.6.10-rc1 and saw the memset problem. (I
> got around it by commenting out the group.conf file). And then got a
> segfault later. I didn't see a kernel panic though since I couldn't get
> it to run that long. I don't know about any mlock problems. Maybe the
> kernel mailing list archives have something.
>

Can you see if you can duplicate the oops? I can get other oopses as
well, probably all related. The best way around the memset problem is
to comment out the code that does the mlockall (the function is
aisexec_mlockall()). This then allows all memory allocations to
succeed. I think there must be some new limit with mlockall in the
2.6.9 kernel series or later.

Thanks
-steve

>
> Mark.
>
> >
> > <1>Unable to handle kernel NULL pointer dereference at virtual address
> > 0000000c
> > printing eip:
> > c016dd7b
> > *pde = 00000000
> > Oops: 0000 [#2]
> > PREEMPT SMP
> > Modules linked in:
> > CPU: 2
> > EIP: 0060:[<c016dd7b>] Not tainted VLI
> > EFLAGS: 00010286 (2.6.9)
> > EIP is at dnotify_flush+0x1e/0xad
> > eax: 00000000 ebx: f6cdfb80 ecx: 00000000 edx: f6cdfb80
> > esi: 00000000 edi: f7baf880 ebp: f6cdfb80 esp: f6cefd50
> > ds: 007b es: 007b ss: 0068
> > Process aisexec (pid: 929, threadinfo=f6cee000 task=f7cc2810)
> > Stack: c0154240 f7224a70 f7cdea80 f6cdfb80 00000000 f7baf880 f7baf880
> > c0152a6f
> > f6cdfb80 f7baf880 00000005 00000007 0000000f c011e344 f6cdfb80
> > f7baf880
> > 00000020 00000001 f7baf880 f7cc2d38 f7cc2810 f7a9a0ac c011f11e
> > f7cc2810
> > Call Trace:
> > [<c0154240>] __fput+0x86/0xd4
> > [<c0152a6f>] filp_close+0x46/0x86
> > [<c011e344>] put_files_struct+0x87/0xec
> > [<c011f11e>] do_exit+0x1a8/0x360
> > [<c01070fd>] do_divide_error+0x0/0x13e
> > [<c0116640>] do_page_fault+0x251/0x5af
> > [<c0108aa3>] do_IRQ+0xd2/0x139
> > [<c033b760>] move_addr_to_user+0x5c/0x67
> > [<c033d815>] sys_recvmsg+0x21d/0x226
> > [<c033f3ec>] release_sock+0x1b/0x71
> > [<c033f3a7>] lock_sock+0x17/0x41
> > [<c01555d7>] invalidate_inode_buffers+0x1b/0x7e
> > [<c01163ef>] do_page_fault+0x0/0x5af
> > [<c01068e5>] error_code+0x2d/0x38
> > [<c033c550>] sock_poll+0xe/0x31
> > [<c01664c1>] do_pollfd+0x8c/0x90
> > [<c016652b>] do_poll+0x66/0xc6
> > [<c01666cb>] sys_poll+0x140/0x1fd
> > [<c0165a45>] __pollwait+0x0/0xc5
> > [<c0105e7b>] syscall_call+0x7/0xb
> >

2004-10-30 00:01:47

by Chris Wright

Subject: Re: 2.6.9 kernel oops with openais

* Steven Dake ([email protected]) wrote:
> The use case is this (on 2.6.9):
>
> task starts as uid 0
> task calls mlockall
> task allocates several mb of ram
> task drops root privs to a non-privileged uid
> further memory allocations fail

What's the RLIMIT_MEMLOCK?
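One way to answer that question from inside the process is to call getrlimit(). A minimal sketch follows; the helper names format_rlim and print_memlock_limit are made up for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Render one limit value; RLIM_INFINITY is shown as "unlimited". */
const char *format_rlim(rlim_t value, char *buf, size_t len)
{
    if (value == RLIM_INFINITY)
        snprintf(buf, len, "unlimited");
    else
        snprintf(buf, len, "%llu", (unsigned long long)value);
    return buf;
}

/* Print the soft and hard RLIMIT_MEMLOCK values in bytes.
 * Returns 0 on success, -1 if getrlimit() fails. */
int print_memlock_limit(FILE *out)
{
    struct rlimit rl;
    char soft[32];
    char hard[32];

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit(RLIMIT_MEMLOCK)");
        return -1;
    }
    fprintf(out, "RLIMIT_MEMLOCK soft=%s hard=%s\n",
            format_rlim(rl.rlim_cur, soft, sizeof soft),
            format_rlim(rl.rlim_max, hard, sizeof hard));
    return 0;
}
```

Calling print_memlock_limit(stderr) both before and after the uid drop would show which limit the process is running under at each point.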

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-10-30 00:01:46

by Steven Dake

Subject: Re: 2.6.9 kernel oops with openais

The use case is this (on 2.6.9):

task starts as uid 0
task calls mlockall
task allocates several mb of ram
task drops root privs to a non-privileged uid
further memory allocations fail

Thanks
-steve

On Fri, 2004-10-29 at 16:39, Chris Wright wrote:
> * Steven Dake ([email protected]) wrote:
> > well, probably all related. The best way around the memset problem is
> > to comment out the code that does the mlockall (the function is
> > aisexec_mlockall()). This then allows all memory allocations to
> > succeed. I think there must be some new limit with mlockall in the
> > 2.6.9 kernel series or later.
>
> What's the mlock issue? I changed that code about 2.6.9-rc4.
>
> thanks,
> -chris

2004-10-30 00:01:40

by Steven Dake

Subject: Re: 2.6.9 kernel oops with openais

Chris

The change was that from 2.6.8 to 2.6.9 the rlimit for memlock was
changed from infinity to 32k (and at the same time, normal users are now
allowed to use mlockall if they don't have a lot of memory to mlock). I
fixed up the openais code by doing something evil from uid 0 like:

struct rlimit rlimit;

rlimit.rlim_cur = RLIM_INFINITY;
rlimit.rlim_max = RLIM_INFINITY;
setrlimit (RLIMIT_MEMLOCK, &rlimit);

Thanks
-steve

On Fri, 2004-10-29 at 16:45, Chris Wright wrote:
> * Steven Dake ([email protected]) wrote:
> > The use case is this (on 2.6.9):
> >
> > task starts as uid 0
> > task calls mlockall
> > task allocates several mb of ram
> > task drops root privs to a non-privileged uid
> > further memory allocations fail
>
> What's the RLIMIT_MEMLOCK?
>
> thanks,
> -chris

2004-10-29 23:58:36

by Steven Dake

Subject: Re: 2.6.9 kernel oops with openais

Mark

I think I found the changes to 2.6.9 which cause mlockall to fail.
mlockall can now be called by non-privileged applications without root
privileges, but those processes are limited to 32kb of ram. aisexec
starts as root, mlocks, then allocates several mb of ram, then drops
root privileges. This causes further memory allocations to fail since
sbrk cannot find new memory in the process space.

I think this can be fixed with setrlimit but I've not tested this.

I tried 2.6.8 with nfs root to see if the oops is gone when running
openais, but get a different oops on startup:

Unable to handle kernel NULL pointer dereference at virtual address
00000014
printing eip:
c01eff35
*pde = 00000000
Oops: 0002 [#1]
PREEMPT SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c01eff35>] Not tainted
EFLAGS: 00010246 (2.6.8)
EIP is at nfs_request_init+0x1e/0x2d
eax: 00000000 ebx: f7b4b800 ecx: 00000000 edx: f7f4ed80
esi: 00000000 edi: f7b4b858 ebp: f7f92000 esp: f7f93a24
ds: 007b es: 007b ss: 0068
Process swapper (pid: 1, threadinfo=f7f92000 task=f7f916b0)
Stack: f7b59680 f7f4ed80 f7b4b800 c01ee3f1 f7b4b800 f7f4ed80 f7fb1c80
c1ffea00
00001000 f7b59680 00000000 c01f1091 f7f4ed80 f7b59680 c1ffea00
00000000
00001000 f7b59728 00000000 f7f93b10 c1ffea18 c1ffea00 f7f93bc4
00000000
Call Trace:
[<c01ee3f1>] nfs_create_request+0xe5/0xe9
[<c01f1091>] readpage_async_filler+0x70/0x147
[<c013ea1e>] read_cache_pages+0xcc/0x154
[<c01f11ca>] nfs_readpages+0x62/0xce
[<c01f1021>] readpage_async_filler+0x0/0x147
[<c013ebef>] read_pages+0x149/0x152
[<c013bd36>] __alloc_pages+0x2ee/0x32d
[<c013f0f8>] __do_page_cache_readahead+0x111/0x18f
[<c013edac>] page_cache_readahead+0xb9/0x22e
[<c0137b11>] do_generic_mapping_read+0x103/0x507
[<c01381f0>] __generic_file_aio_read+0x1ef/0x22b
[<c0137f15>] file_read_actor+0x0/0xec
[<c0140212>] cache_grow+0x12a/0x1c6
[<c0138286>] generic_file_aio_read+0x5a/0x74
[<c01ea034>] nfs_file_read+0xb3/0xf5
[<c01575c6>] do_sync_read+0x84/0xad
[<c01576ab>] vfs_read+0xbc/0x127
[<c0162a38>] kernel_read+0x50/0x5f
[<c01637a5>] prepare_binprm+0xde/0xfc
[<c0163cbf>] do_execve+0x1a8/0x28c
[<c01048f5>] sys_execve+0x50/0x80
[<c0105e47>] syscall_call+0x7/0xb
[<c0100442>] run_init_process+0x1c/0x2a
[<c01005c7>] init+0x177/0x1e7
[<c0100450>] init+0x0/0x1e7
[<c0103f49>] kernel_thread_helper+0x5/0xb
Code: f0 ff 40 14 89 43 18 8b 5c 24 08 83 c4 0c c3 8b 4c 24 04 8b
<0>Kernel panic: Attempted to kill init!

I'll have to spend some time debugging some of these oopses to see if I
can get a 2.6 kernel up and running that I can fix the defect-169 with.

Thanks
-steve

On Fri, 2004-10-29 at 16:08, Mark Haverkamp wrote:
> On Fri, 2004-10-29 at 15:51 -0700, Steven Dake wrote:
> > Mark,
> >
> > Have you seen the following oops in 2.6.x? I can generate it easily
> > with two nodes by letting openais run for 15-20 seconds on 2.6.9.
> >
> > I had to turn mlockall off in order to get openais to run in the first
> > place, otherwise openais runs out of ram which causes a memset to a null
> > address in parse.c (we should fix that:). Have you had problems with
> > mlock when working with a 2.6 kernel?
>
> Funny that you should ask. Just this afternoon I updated one of my
> machines from 2.6.8-rc4 to 2.6.10-rc1 and saw the memset problem. (I
> got around it by commenting out the group.conf file). And then got a
> segfault later. I didn't see a kernel panic though since I couldn't get
> it to run that long. I don't know about any mlock problems. Maybe the
> kernel mailing list archives have something.
>
>
> Mark.
>
> >
> > <1>Unable to handle kernel NULL pointer dereference at virtual address
> > 0000000c
> > printing eip:
> > c016dd7b
> > *pde = 00000000
> > Oops: 0000 [#2]
> > PREEMPT SMP
> > Modules linked in:
> > CPU: 2
> > EIP: 0060:[<c016dd7b>] Not tainted VLI
> > EFLAGS: 00010286 (2.6.9)
> > EIP is at dnotify_flush+0x1e/0xad
> > eax: 00000000 ebx: f6cdfb80 ecx: 00000000 edx: f6cdfb80
> > esi: 00000000 edi: f7baf880 ebp: f6cdfb80 esp: f6cefd50
> > ds: 007b es: 007b ss: 0068
> > Process aisexec (pid: 929, threadinfo=f6cee000 task=f7cc2810)
> > Stack: c0154240 f7224a70 f7cdea80 f6cdfb80 00000000 f7baf880 f7baf880
> > c0152a6f
> > f6cdfb80 f7baf880 00000005 00000007 0000000f c011e344 f6cdfb80
> > f7baf880
> > 00000020 00000001 f7baf880 f7cc2d38 f7cc2810 f7a9a0ac c011f11e
> > f7cc2810
> > Call Trace:
> > [<c0154240>] __fput+0x86/0xd4
> > [<c0152a6f>] filp_close+0x46/0x86
> > [<c011e344>] put_files_struct+0x87/0xec
> > [<c011f11e>] do_exit+0x1a8/0x360
> > [<c01070fd>] do_divide_error+0x0/0x13e
> > [<c0116640>] do_page_fault+0x251/0x5af
> > [<c0108aa3>] do_IRQ+0xd2/0x139
> > [<c033b760>] move_addr_to_user+0x5c/0x67
> > [<c033d815>] sys_recvmsg+0x21d/0x226
> > [<c033f3ec>] release_sock+0x1b/0x71
> > [<c033f3a7>] lock_sock+0x17/0x41
> > [<c01555d7>] invalidate_inode_buffers+0x1b/0x7e
> > [<c01163ef>] do_page_fault+0x0/0x5af
> > [<c01068e5>] error_code+0x2d/0x38
> > [<c033c550>] sock_poll+0xe/0x31
> > [<c01664c1>] do_pollfd+0x8c/0x90
> > [<c016652b>] do_poll+0x66/0xc6
> > [<c01666cb>] sys_poll+0x140/0x1fd
> > [<c0165a45>] __pollwait+0x0/0xc5
> > [<c0105e7b>] syscall_call+0x7/0xb
> >

2004-10-30 00:21:01

by Chris Wright

Subject: Re: 2.6.9 kernel oops with openais

* Steven Dake ([email protected]) wrote:
> The change was that from 2.6.8 to 2.6.9 the rlimit for memlock was
> changed from infinity to 32k (and at the same time, normal users are now
> allowed to use mlockall if they don't have a lot of memory to mlock). I
> fixed up the openais code by doing something evil from uid 0 like:
>
> struct rlimit rlimit;
>
> rlimit.rlim_cur = RLIM_INFINITY;
> rlimit.rlim_max = RLIM_INFINITY;
> setrlimit (RLIMIT_MEMLOCK, &rlimit);

Yeah, that'll do it (although, certainly wouldn't hurt to size it
down ;-). Hopefully most users aren't dropping uid (I doubt it, since
I hadn't seen this problem pop up before).
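A minimal sketch of the "sized down" variant: the same setrlimit() call as in the quoted snippet, but with a finite byte count and a checked return value. The function name set_memlock_limit is hypothetical, and the right byte count depends on how much memory the daemon actually locks, which this thread does not state.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Set both the soft and hard RLIMIT_MEMLOCK to `bytes`.
 * Raising the hard limit needs privilege, so in the scenario from
 * this thread the call would be made while still running as uid 0.
 * Returns 0 on success, -1 on failure.
 */
int set_memlock_limit(rlim_t bytes)
{
    struct rlimit rl;

    rl.rlim_cur = bytes;
    rl.rlim_max = bytes;
    if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("setrlimit(RLIMIT_MEMLOCK)");
        return -1;
    }
    return 0;
}
```

Passing RLIM_INFINITY reproduces the original workaround; a finite value caps the damage if the process later leaks locked memory.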

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-10-30 00:22:40

by Steven Dake

Subject: Re: 2.6.9 kernel oops with openais

What would be preferable instead of dropping UID when privileged
services are needed? More specifically I need
* CAP_NET_RAW (bindtodevice)
* CAP_SYS_NICE (setscheduler)
* CAP_IPC_LOCK (mlockall)

I had thought about adding the correct code to get these capabilities
but it still requires a start-from-uid0 environment

Thanks
-steve

On Fri, 2004-10-29 at 17:01, Chris Wright wrote:
> * Steven Dake ([email protected]) wrote:
> > The change was that from 2.6.8 to 2.6.9 the rlimit for memlock was
> > changed from infinity to 32k (and at the same time, normal users are now
> > allowed to use mlockall if they don't have a lot of memory to mlock). I
> > fixed up the openais code by doing something evil from uid 0 like:
> >
> > struct rlimit rlimit;
> >
> > rlimit.rlim_cur = RLIM_INFINITY;
> > rlimit.rlim_max = RLIM_INFINITY;
> > setrlimit (RLIMIT_MEMLOCK, &rlimit);
>
> Yeah, that'll do it (although, certainly wouldn't hurt to size it
> down ;-). Hopefully most users aren't dropping uid (I doubt it, since
> I hadn't seen this problem pop up before).
>
> thanks,
> -chris

2004-10-29 23:58:36

by Chris Wright

Subject: Re: 2.6.9 kernel oops with openais

* Steven Dake ([email protected]) wrote:
> well, probably all related. The best way around the memset problem is
> to comment out the code that does the mlockall (the function is
> aisexec_mlockall()). This then allows all memory allocations to
> succeed. I think there must be some new limit with mlockall in the
> 2.6.9 kernel series or later.

What's the mlock issue? I changed that code about 2.6.9-rc4.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-10-30 00:33:26

by Chris Wright

Subject: Re: 2.6.9 kernel oops with openais

* Steven Dake ([email protected]) wrote:
> What would be preferable instead of dropping UID when privileged
> services are needed? More specifically I need
> * CAP_NET_RAW (bindtodevice)
> * CAP_SYS_NICE (setscheduler)
> * CAP_IPC_LOCK (mlockall)

You could drop all but those specific capabilities. But, since you only
seem to need those during startup there's not a huge value in doing
anything other than what you're already doing.

> I had thought about adding the correct code to get these capabilities
> but it still requires a start-from-uid0 environment

Dropping uid is a fine idea, esp. since you have to start from uid 0
to get the bind/setsched/mlock bits done. It just exposes a case where
the mlock change might surprise users, which is why I hope it's not the
common usage pattern (and I think most are root apps, so we should be ok).
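For reference, a minimal sketch of the "drop all but those specific capabilities" alternative, using the raw capget/capset syscalls. It assumes current kernel headers (_LINUX_CAPABILITY_VERSION_3 is newer than the 2.6.9 kernel discussed here), and the function names are hypothetical. Note that capabilities are normally cleared when a process changes from uid 0 to a non-zero uid unless it first sets PR_SET_KEEPCAPS via prctl(); that step is not shown.

```c
#include <linux/capability.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Bit mask of the three capabilities named in this thread.
 * All three are numbered below 32, so they sit in data[0]. */
unsigned int startup_caps_mask(void)
{
    return CAP_TO_MASK(CAP_NET_RAW) |
           CAP_TO_MASK(CAP_SYS_NICE) |
           CAP_TO_MASK(CAP_IPC_LOCK);
}

/* Shrink the calling thread's permitted and effective sets to the
 * capabilities above that it already holds, and clear the
 * inheritable set. Returns 0 on success, -1 on failure. */
int keep_only_startup_caps(void)
{
    struct __user_cap_header_struct hdr = { _LINUX_CAPABILITY_VERSION_3, 0 };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
    unsigned int keep;

    if (syscall(SYS_capget, &hdr, data) != 0) {
        perror("capget");
        return -1;
    }

    /* Never ask for more than is currently permitted. */
    keep = data[0].permitted & startup_caps_mask();
    data[0].effective = keep;
    data[0].permitted = keep;
    data[0].inheritable = 0;
    data[1].effective = 0;
    data[1].permitted = 0;
    data[1].inheritable = 0;

    if (syscall(SYS_capset, &hdr, data) != 0) {
        perror("capset");
        return -1;
    }
    return 0;
}
```

As noted above, when the privileged calls are only needed at startup, doing them as root and then dropping the uid achieves much the same result with less code.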

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-10-30 00:52:20

by Lee Revell

Subject: Re: 2.6.9 kernel oops with openais

On Fri, 2004-10-29 at 17:11 -0700, Steven Dake wrote:
> What would be preferable instead of dropping UID when privileged
> services are needed? More specifically I need
> * CAP_NET_RAW (bindtodevice)
> * CAP_SYS_NICE (setscheduler)
> * CAP_IPC_LOCK (mlockall)
>
> I had thought about adding the correct code to get these capabilities
> but it still requires a start-from-uid0 environment

Not sure about #1, but Jack (http://jackit.sf.net) needed #2 and #3 and
the realtime LSM was developed as a result. See the LKML thread of the
same name.

HTH,

Lee