Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261665AbUKAJ0W (ORCPT ); Mon, 1 Nov 2004 04:26:22 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261676AbUKAJ0W (ORCPT ); Mon, 1 Nov 2004 04:26:22 -0500
Received: from gyre.foreca.com ([193.94.59.26]:30423 "EHLO gyre.weather.fi") by vger.kernel.org with ESMTP id S261665AbUKAJ0I (ORCPT ); Mon, 1 Nov 2004 04:26:08 -0500
Date: Mon, 1 Nov 2004 11:25:57 +0200 (EET)
From: Jaakko Hyvätti
X-X-Sender: jaakko@gyre.weather.fi
To: linux-kernel@vger.kernel.org, Andrew Morton
Subject: ext3 and nfsd do not work under load (Re: x86_64, LOCKUP on CPU0, kjournald)
In-Reply-To:
Message-ID:
References:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 9563
Lines: 144

Here is another oops and lockup, with nfsd now there in the trace also:

Unable to handle kernel paging request at ffffffff00000808 RIP:
{cache_alloc_refill+329}
PML4 103027 PGD 0
Oops: 0002 [1] SMP
CPU 0
Modules linked in: w83627hf i2c_sensor i2c_isa i2c_core nfsd exportfs lockd sunrpc md5 ipv6 parport_pc lp parport tg3 ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod ohci_hcd button battery asus_acpi ac ext3 jbd 3w_xxxx sd_mod scsi_mod
Pid: 1968, comm: nfsd Not tainted 2.6.8-1.521smp
RIP: 0010:[] {cache_alloc_refill+329}
RSP: 0018:000001003ad8d728 EFLAGS: 00010046
RAX: ffffffff00000800 RBX: 00000000ffffffff RCX: 0000010001000000
RDX: 00000100400112c8 RSI: 00000000000000d0 RDI: 0000010040011280
RBP: 000001003ffb7000 R08: a322eb30453b40eb R09: 0000000000000008
R10: 000001004d6bdb00 R11: 000001004d6bdb00 R12: 00000100400112c8
R13: 0000010040011280 R14: 00000000000000d0 R15: 0000000000000001
FS: 0000002a958624c0(0000) GS:ffffffff804e2d80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff00000808
CR3: 0000000000101000 CR4: 00000000000006e0
Process nfsd (pid: 1968, threadinfo 000001003ad8c000, task 00000100be7e5550)
Stack: 0000010076ba4388 0000010076ba4388 fffffffffffffff4 0000010076ba4388
       000001003ad8d7e8 0000000000000000 000001009c01a188 ffffffff80161eae
       0000000000000283 ffffffff80195755
Call Trace:{kmem_cache_alloc+52} {d_alloc+196}
       {cached_lookup+40} {__lookup_hash+117}
       {lookup_one_len+87} {:nfsd:compose_entry_fh+279}
       {:nfsd:encode_entry+460} {thread_return+41}
       {__generic_unplug_device+49} {:nfsd:nfs3svc_encode_entry_plus+0}
       {:nfsd:nfs3svc_encode_entry_plus+9} {:ext3:call_filldir+137}
       {:nfsd:nfs3svc_encode_entry_plus+0} {:ext3:ext3_dx_readdir+331}
       {:nfsd:nfs3svc_encode_entry_plus+0} {:ext3:ext3_readdir+131}
       {:nfsd:nfs3svc_encode_entry_plus+0} {:nfsd:fh_verify+1324}
       {recalc_task_prio+332} {:nfsd:nfs3svc_encode_entry_plus+0}
       {vfs_readdir+123} {:nfsd:nfs3svc_encode_entry_plus+0}
       {:nfsd:nfsd_readdir+104} {:nfsd:nfsd3_proc_readdirplus+245}
       {:nfsd:nfsd_dispatch+226} {:sunrpc:svc_process+931}
       {:nfsd:nfsd+750} {:nfsd:nfsd+0}
       {child_rip+8} {:nfsd:nfsd+0}
       {:nfsd:nfsd+0} {child_rip+0}

Code: 48 89 50 08 48 89 02 48 c7 41 08 00 02 20 00 66 83 79 24 ff
RIP {cache_alloc_refill+329} RSP <000001003ad8d728>
CR2: ffffffff00000808
NMI Watchdog detected LOCKUP on CPU0, registers:
CPU 0
Modules linked in: w83627hf i2c_sensor i2c_isa i2c_core nfsd exportfs lockd sunrpc md5 ipv6 parport_pc lp parport tg3 ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod ohci_hcd button battery asus_acpi ac ext3 jbd 3w_xxxx sd_mod scsi_mod
Pid: 1968, comm: nfsd Not tainted 2.6.8-1.521smp
RIP: 0010:[] {.text.lock.slab+338}
RSP: 0018:ffffffff80480b38 EFLAGS: 00000086
RAX: 000000010fe430f5 RBX: 0000010040011280 RCX: 000000000000001e
RDX: 000000000000001d RSI: 000001003ffb7000 RDI: 0000010040011280
RBP: 0000010040011370 R08: 000001003ffac010 R09: 00000100bfc34d80
R10: 0000010040011280 R11: 0000010040011280 R12: 00000100bfc34e28
R13: 0000000000000002 R14: ffffffff804af7e0 R15: 0000000000000000
FS: 0000002a958624c0(0000) GS:ffffffff804e2d80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff00000808 CR3: 0000000000101000 CR4: 00000000000006e0
Process nfsd (pid: 1968, threadinfo 000001003ad8c000, task 00000100be7e5550)
Stack: 0000010001e0fb60 0000000000000000 ffffffff80480b98 0000000000000292
       0000000000000000 0000010001e13020 ffffffff80480bb8 000001003ad8d678
       0000000000000002 ffffffff80162bcd
Call Trace: {reap_timer_fnc+31} {run_timer_softirq+427}
       {__do_softirq+65} {__do_softirq+76}
       {do_softirq+49} {apic_timer_interrupt+133}
       {oops_end+80} {oops_end+18}
       {do_page_fault+1008} {__getblk+17}
       {:ext3:ext3_getblk+195} {bh_lru_install+221}
       {error_exit+0} {cache_alloc_refill+329}
       {d_instantiate+202} {kmem_cache_alloc+52}
       {d_alloc+196} {cached_lookup+40}
       {__lookup_hash+117} {lookup_one_len+87}
       {:nfsd:compose_entry_fh+279} {:nfsd:encode_entry+460}
       {thread_return+41} {__generic_unplug_device+49}
       {:nfsd:nfs3svc_encode_entry_plus+0} {:nfsd:nfs3svc_encode_entry_plus+9}
       {:ext3:call_filldir+137} {:nfsd:nfs3svc_encode_entry_plus+0}
       {:ext3:ext3_dx_readdir+331} {:nfsd:nfs3svc_encode_entry_plus+0}
       {:ext3:ext3_readdir+131} {:nfsd:nfs3svc_encode_entry_plus+0}
       {:nfsd:fh_verify+1324} {recalc_task_prio+332}
       {:nfsd:nfs3svc_encode_entry_plus+0} {vfs_readdir+123}
       {:nfsd:nfs3svc_encode_entry_plus+0} {:nfsd:nfsd_readdir+104}
       {:nfsd:nfsd3_proc_readdirplus+245} {:nfsd:nfsd_dispatch+226}
       {:sunrpc:svc_process+931} {:nfsd:nfsd+750}
       {:nfsd:nfsd+0} {child_rip+8}
       {:nfsd:nfsd+0} {:nfsd:nfsd+0}
       {child_rip+0}

Code: 80 7d b8 00 7e f8 e9 b4 f4 ff ff f3 90 80 bb a8 00 00 00 00
console shuts up ...


On Tue, 26 Oct 2004, Jaakko Hyvätti wrote:

> Last night we got this trace, saved by serial console.
>
> The machine is dual AMD Opteron 240 (1.4GHz), MSI K8D motherboard, 3G
> mem, disks in raid configuration with 3ware SATA adapter, so they show
> up as scsi devices.
> Kernel is updated Fedora Core 2 2.6.8-1.521smp,
> but the same probably happened with 2.6.6 at least, cannot tell as I
> did not have serial console back then. Disks are shared over nfs to
> a few very very busy clients, some running Linux, some running IRIX.
>
>
> NMI Watchdog detected LOCKUP on CPU0, registers:
> CPU 0
> Modules linked in: loop w83627hf i2c_sensor i2c_isa i2c_core nfsd exportfs lockd sunrpc md5 ipv6 parport_pc lp parport tg3 ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod ohci_hcd button battery asus_acpi ac ext3 jbd 3w_xxxx sd_mod scsi_mod
> Pid: 210, comm: kjournald Not tainted 2.6.8-1.521smp
> RIP: 0010:[] {cache_alloc_refill+397}
> RSP: 0018:00000100bf4efa58 EFLAGS: 00000013
> RAX: 00000100bff39000 RBX: 0000000000000031 RCX: 00000100400116d8
> RDX: 00000100400116c8 RSI: 0000000000000850 RDI: 0000010040011680
> RBP: 000001003ffbf000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000010095e05e00 R12: 00000100400116c8
> R13: 0000010040011680 R14: 0000000000000850 R15: 0000010031cdb9d0
> FS: 0000002a958624c0(0000) GS:ffffffff804e2d80(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000a02b5c CR3: 0000000000101000 CR4: 00000000000006e0
> Process kjournald (pid: 210, threadinfo 00000100bf4ee000, task 0000010037d201f0)
> Stack: 000001007e657558 000001007e657558 00000100bdc356e0 000001007e657558
>        000001003ffb5e00 000001003c2ed180 000001007e657558 ffffffff80161eae
>        0000000000000202 ffffffff80180d98
> Call Trace:{kmem_cache_alloc+52} {alloc_buffer_head+17}
>        {:jbd:journal_write_metadata_buffer+130}
>        {:jbd:journal_commit_transaction+2847}
>        {autoremove_wake_function+0} {autoremove_wake_function+0}
>        {:jbd:kjournald+333} {autoremove_wake_function+0}
>        {autoremove_wake_function+0} {:jbd:commit_timeout+0}
>        {child_rip+8} {:jbd:kjournald+0}
>        {child_rip+0}
>
> Code: 4c 89 61 08 49 89 0c 24 85 db 0f 8f 2c ff ff ff 8b 45 00 49
> console shuts up ...

--
Foreca Ltd                                      Jaakko.Hyvatti@foreca.com
Pursimiehenkatu 29-31 B, FIN-00150 Helsinki, Finland    http://www.foreca.com