Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Thu, 27 Mar 2003 04:29:57 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Thu, 27 Mar 2003 04:29:57 -0500
Received: from [218.244.176.102] ([218.244.176.102]:32268 "EHLO wideinfo.com.cn") by vger.kernel.org with ESMTP id ; Thu, 27 Mar 2003 04:29:50 -0500
From: "Zhenghui Zhou"
To:
Subject: Re: Ooops in 2.4.18 through 2.4.20, now kswapd is defunct
Date: Thu, 27 Mar 2003 17:40:12 +0800
Message-ID: <005d01c2f444$dd414170$a9b0f4da@wideinfo.com.cn>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.2627
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
X-Authenticated-Sender: zzh@wideinfo.com.cn
X-MDRemoteIP: 192.168.0.14
X-Return-Path: zhouzhenghui@cn99.com
X-MDaemon-Deliver-To: linux-kernel@vger.kernel.org
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 8303
Lines: 224

>
> I have a variety of systems running kernels ranging from
> 2.4.18 through 2.4.20 and am seeing fairly frequent
> kernel oopsen. After the oops, kswapd is defunct, however
> the systems are still running (they are dual CPU systems).
> Has anyone seen this before, and is there a patch to fix
> it yet ? Thanks.
>
> Kelvin Edwards
> System Admin
> Jefferson Lab
>
> Here's some Ooops run through ksymoops:
>
> Mar 21 15:20:00 MachX kernel: Unable to handle kernel NULL pointer dereference
> at virtual address 00000004
> Mar 21 15:20:00 MachX kernel: c0152b51
> Mar 21 15:20:00 MachX kernel: *pde = 00000000
> Mar 21 15:20:00 MachX kernel: Oops: 0000
> Mar 21 15:20:00 MachX kernel: CPU: 4
> Mar 21 15:20:00 MachX kernel: EIP: 0010:[destroy_inode+33/80] Not
> tainted
> Mar 21 15:20:00 MachX kernel: EIP: 0010:[] Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> Mar 21 15:20:00 MachX kernel: EFLAGS: 00010246
> Mar 21 15:20:00 MachX kernel: eax: 00000000 ebx: e3c999c0 ecx: 00000000
> edx: e3c999c0
> Mar 21 15:20:00 MachX kernel: esi: e3c999c0 edi: 00000001 ebp: 00000a55
> esp: f7a8fefc
> Mar 21 15:20:00 MachX kernel: ds: 0018 es: 0018 ss: 0018
> Mar 21 15:20:00 MachX kernel: Process kswapd (pid: 11, stackpage=f7a8f000)
> Mar 21 15:20:00 MachX kernel: Stack: e3c999c0 c0154335 e3c999c0 d940e7a0
> c034c700 00150c00 f8a09e9a ca60c738
> Mar 21 15:20:00 MachX kernel: ca60c720 e3c999c0 c0151a31 e3c999c0
> e3c999c0 c0135e63 d7f4e400 f7a8e000
> Mar 21 15:20:00 MachX kernel: ffffffff 000001d0 c02e6308 f7a8e000
> 00000003 0000001f 000001d0 00000006
> Mar 21 15:20:00 MachX kernel: Call Trace: [iput+629/640]
> [appletalk:__insmod_appletalk_S.bss_L268+425722/117747542]
> [prune_dcache+225/368] [shrink_cache+819/976] [shrink_dcache_memory+32/48]
> Mar 21 15:20:00 MachX kernel: Call Trace: [] []
> [] [] []
> Mar 21 15:20:00 MachX kernel: [] [] []
> [] [] []
> Mar 21 15:20:00 MachX kernel: [] [] []
> Mar 21 15:20:00 MachX kernel: Code: 8b 40 04 85 c0 74 08 53 ff d0 59 eb 11 89
> f6 53 8b 15 b4 ae
>
> >>EIP; c0152b51 <=====
> Trace; c0154335
> Trace; f8a09e9a <[nfs]nfs_dentry_iput+5a/80>
> Trace; c0151a31
> Trace; c0135e63
> Trace; c0151de0
> Trace; c0136087
> Trace; c01360ec
> Trace; c01361ff
> Trace; c0136276
> Trace; c01363b1
> Trace; c0136310
> Trace; c0105000 <_stext+0/0>
> Trace; c0107296
> Trace; c0136310
> Code; c0152b51
> 00000000 <_EIP>:
> Code; c0152b51 <=====
>    0: 8b 40 04             mov 0x4(%eax),%eax <=====
> Code; c0152b54
>    3: 85 c0                test %eax,%eax
> Code; c0152b56
>    5: 74 08                je f <_EIP+0xf> c0152b60
>
> Code; c0152b58
>    7: 53                   push %ebx
> Code; c0152b59
>    8: ff d0                call *%eax
> Code; c0152b5b
>    a: 59                   pop %ecx
> Code; c0152b5c
>    b: eb 11                jmp 1e <_EIP+0x1e> c0152b6f
>
> Code; c0152b5e
>    d: 89 f6                mov %esi,%esi
> Code; c0152b60
>    f: 53                   push %ebx
> Code; c0152b61
>   10: 8b 15 b4 ae 00 00    mov 0xaeb4,%edx
>

I am seeing a similar problem. My server runs on the Internet under heavy
load, and I cannot trace the fault cleanly. I also tested kernels 2.4.18
through 2.4.20 and hit the same failure on each; for now I have to limit the
number of processes running on the server to reduce the errors.

The error occurs while a program is being loaded from disk and run. When the
program is started by hand, it shows as "Segmentation Fault". dmesg shows:

<1>Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
dfd91718
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[] Not tainted
EFLAGS: 00010286
eax: bffffae4 ebx: cff46000 ecx: 00000000 edx: cff46000
esi: c0106c33 edi: 0000000b ebp: cff47fb8 esp: cff47f84
ds: 0018 es: 0018 ss: 0018
Process more (pid: 20846, stackpage=cff47000)
Stack: cff46000 c0106c33 0000000b 00000000 d3ef2000 0000000b cff47fbc c0105987
       bffffae4 c01059a7 00000000 00000a3a 00000020 bffffa5c dfd918e8 00000000
       00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000002b
Call Trace: [] [] []
Code: 8b 51 04 83 fa ff 0f 84 56 01 00 00 83 fa fc 77 07 c7 41 04

I ran ksymoops with the correct vmlinux and System.map specified; the result
is:

Warning (compare_maps): ksyms_base symbol default_idle_R__ver_default_idle not found in vmlinux. Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol machine_real_restart_R__ver_machine_real_restart not found in vmlinux. Ignoring ksyms_base entry
Reading Oops report from the terminal
Using defaults from ksymoops -t elf32-i386 -a i386
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000004
dfd91718
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[] Not tainted
EFLAGS: 00010286
eax: bffffae4 ebx: cff46000 ecx: 00000000 edx: cff46000
esi: c0106c33 edi: 0000000b ebp: cff47fb8 esp: cff47f84
ds: 0018 es: 0018 ss: 0018
Process more (pid: 20846, stackpage=cff47000)
Stack: cff46000 c0106c33 0000000b 00000000 d3ef2000 0000000b cff47fbc c0105987
       bffffae4 c01059a7 00000000 00000a3a 00000020 bffffa5c dfd918e8 00000000
       00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000002b
Call Trace: [] [] []
Code: 8b 51 04 83 fa ff 0f 84 56 01 00 00 83 fa fc 77 07 c7 41 04

>>EIP; dfd91718 <_end+1fb00020/205fa908> <=====
>>eax; bffffae4 Before first symbol
>>ebx; cff46000 <_end+fcb4908/205fa908>
>>edx; cff46000 <_end+fcb4908/205fa908>
>>esi; c0106c33
>>ebp; cff47fb8 <_end+fcb68c0/205fa908>
>>esp; cff47f84 <_end+fcb688c/205fa908>

Trace; c0106c33
Trace; c0105987
Trace; c01059a7

Code; dfd91718 <_end+1fb00020/205fa908>
00000000 <_EIP>:
Code; dfd91718 <_end+1fb00020/205fa908> <=====
   0: 8b 51 04                mov 0x4(%ecx),%edx <=====
Code; dfd9171b <_end+1fb00023/205fa908>
   3: 83 fa ff                cmp $0xffffffff,%edx
Code; dfd9171e <_end+1fb00026/205fa908>
   6: 0f 84 56 01 00 00       je 162 <_EIP+0x162> dfd9187a <_end+1fb00182/205fa908>
Code; dfd91724 <_end+1fb0002c/205fa908>
   c: 83 fa fc                cmp $0xfffffffc,%edx
Code; dfd91727 <_end+1fb0002f/205fa908>
   f: 77 07                   ja 18 <_EIP+0x18> dfd91730 <_end+1fb00038/205fa908>
Code; dfd91729 <_end+1fb00031/205fa908>
  11: c7 41 04 00 00 00 00    movl $0x0,0x4(%ecx)

Linux jsgx 2.4.20 #2 Tue Mar 25 22:47:46 CST 2003 i686 unknown

Gnu C                  2.96
Gnu make               3.78.1
binutils               2.11.90.0.8
util-linux             2.10f
mount                  2.10r
modutils               2.4.13
e2fsprogs              1.32
pcmcia-cs              3.1.20
PPP                    2.4.1
isdn4k-utils           3.1beta7
Linux C Library        so.8.0
Linux C Library        2.1.3
Dynamic linker (ldd)   2.1.3
Procps                 2.0.6
Net-tools              1.60
Console-tools          0.3.3
Sh-utils               2.0
Modules Loaded

I have tested several configurations and cannot get rid of it.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/