Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S263681AbUJ2X2P (ORCPT ); Fri, 29 Oct 2004 19:28:15 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S263682AbUJ2XUK (ORCPT ); Fri, 29 Oct 2004 19:20:10 -0400 Received: from rav-az.mvista.com ([65.200.49.157]:366 "EHLO zipcode.az.mvista.com") by vger.kernel.org with ESMTP id S263677AbUJ2XQ7 (ORCPT ); Fri, 29 Oct 2004 19:16:59 -0400 Subject: Re: 2.6.9 kernel oops with openais From: Steven Dake Reply-To: sdake@mvista.com To: Mark Haverkamp Cc: Openais List , linux-kernel In-Reply-To: <1099091302.13961.42.camel@markh1.pdx.osdl.net> References: <1099090282.14581.19.camel@persist.az.mvista.com> <1099091302.13961.42.camel@markh1.pdx.osdl.net> Content-Type: text/plain Organization: MontaVista Software, Inc. Message-Id: <1099091816.14581.22.camel@persist.az.mvista.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Fri, 29 Oct 2004 16:16:57 -0700 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3256 Lines: 84 On Fri, 2004-10-29 at 16:08, Mark Haverkamp wrote: > On Fri, 2004-10-29 at 15:51 -0700, Steven Dake wrote: > > Mark, > > > > Have you seen the following oops in 2.6.x? I can generate it easily > > with two nodes by letting openais run for 15-20 seconds on 2.6.9. > > > > I had to turn mlockall off in order to get openais to run in the first > > place, otherwise openais runs out of ram which causes a memset to a null > > address in parse.c (we should fix that:). Have you had problems with > > mlock when working with a 2.6 kernel? > > Funny that you should ask. Just this afternoon I updated one of my > machines from 2.6.8-rc4 to 2.6.10-rc1 and saw the memset problem. (I > got around it by commenting out the group.conf file). And then got a > segfault later. I didn't see a kernel panic though since I couldn't get > it to run that long. I don't know about any mlock problems. Maybe the > kernel mailing list archives has something. > Can you see if you can duplicate the oops? I can get other oopses as well probably all related.. The best way around the memset problem is to comment out the code that does the mlockall (the function is aisexec_mlockall(). This then allows all memory allocations to succeed. I think there must be some new limit with mlockall in the 2.6.9 kernel series or later. Thanks -steve > > Mark. > > > > > <1>Unable to handle kernel NULL pointer dereference at virtual address > > 0000000c > > printing eip: > > c016dd7b > > *pde = 00000000 > > Oops: 0000 [#2] > > PREEMPT SMP > > Modules linked in: > > CPU: 2 > > EIP: 0060:[] Not tainted VLI > > EFLAGS: 00010286 (2.6.9) > > EIP is at dnotify_flush+0x1e/0xad > > eax: 00000000 ebx: f6cdfb80 ecx: 00000000 edx: f6cdfb80 > > esi: 00000000 edi: f7baf880 ebp: f6cdfb80 esp: f6cefd50 > > ds: 007b es: 007b ss: 0068 > > Process aisexec (pid: 929, threadinfo=f6cee000 task=f7cc2810) > > Stack: c0154240 f7224a70 f7cdea80 f6cdfb80 00000000 f7baf880 f7baf880 > > c0152a6f > > f6cdfb80 f7baf880 00000005 00000007 0000000f c011e344 f6cdfb80 > > f7baf880 > > 00000020 00000001 f7baf880 f7cc2d38 f7cc2810 f7a9a0ac c011f11e > > f7cc2810 > > Call Trace: > > [] __fput+0x86/0xd4 > > [] filp_close+0x46/0x86 > > [] put_files_struct+0x87/0xec > > [] do_exit+0x1a8/0x360 > > [] do_divide_error+0x0/0x13e > > [] do_page_fault+0x251/0x5af > > [] do_IRQ+0xd2/0x139 > > [] move_addr_to_user+0x5c/0x67 > > [] sys_recvmsg+0x21d/0x226 > > [] release_sock+0x1b/0x71 > > [] lock_sock+0x17/0x41 > > [] invalidate_inode_buffers+0x1b/0x7e > > [] do_page_fault+0x0/0x5af > > [] error_code+0x2d/0x38 > > [] sock_poll+0xe/0x31 > > [] do_pollfd+0x8c/0x90 > > [] do_poll+0x66/0xc6 > > [] sys_poll+0x140/0x1fd > > [] __pollwait+0x0/0xc5 > > [] syscall_call+0x7/0xb > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/