Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1762709AbXKQRx4 (ORCPT ); Sat, 17 Nov 2007 12:53:56 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755265AbXKQRxt (ORCPT ); Sat, 17 Nov 2007 12:53:49 -0500 Received: from py-out-1112.google.com ([64.233.166.183]:58744 "EHLO py-out-1112.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754884AbXKQRxr (ORCPT ); Sat, 17 Nov 2007 12:53:47 -0500 DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=Dvw9HNTLWCtfPkv6Ut7JK2VJ+NeTir6N2N3nGIGLVePUY5MdwIC3QY5ys+zYwF4H4QkiucQPEa2uwF7D+6TrvZAHfzrcwOFfynJ6KDELyoL/YrxaOUSb6TnkbHcat4tfILHTgFqdGILV3xdDSrSijJQCT+x7AnJkw0HXtJIrl/Y= Message-ID: <64bb37e0711170953p67d1be49lf4eaa190d662e2b4@mail.gmail.com> Date: Sat, 17 Nov 2007 18:53:45 +0100 From: "Torsten Kaiser" To: "Kamalesh Babulal" Subject: Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4 Cc: "Andrew Morton" , LKML , linuxppc-dev@ozlabs.org, nfs@lists.sourceforge.net, "Andy Whitcroft" , "Balbir Singh" , "Jan Blunck" , "Trond Myklebust" In-Reply-To: <473DA608.1020804@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <473DA608.1020804@linux.vnet.ibm.com> Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7104 Lines: 133 On Nov 16, 2007 3:15 PM, Kamalesh Babulal wrote: > Hi Andrew, > > The kernel enters the xmon state while running the file system > stress on nfs v4 mounted partition. [snip] > 0:mon> t > [c0000000dbd4fb50] c000000000069768 .__wake_up+0x54/0x88 > [c0000000dbd4fc00] d00000000086b890 .nfs_sb_deactive+0x44/0x58 [nfs] > [c0000000dbd4fc80] d000000000872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs] > [c0000000dbd4fd10] d000000000598510 .rpc_release_calldata+0x50/0x74 [sunrpc] > [c0000000dbd4fda0] c00000000008d960 .run_workqueue+0x10c/0x1f4 > [c0000000dbd4fe50] c00000000008ec70 .worker_thread+0x118/0x138 > [c0000000dbd4ff00] c0000000000939f4 .kthread+0x78/0xc4 > [c0000000dbd4ff90] c00000000002b060 .kernel_thread+0x4c/0x68 Definitely not a ppc problem. Got nearly the same backtrace on 64bit x86: [ 966.712167] BUG: soft lockup - CPU#3 stuck for 11s! [rpciod/3:605] [ 966.718522] CPU 3: [ 966.720589] Modules linked in: radeon drm nfsd exportfs ipv6 w83792d tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common v4l1_compat hid sg i2c_nforce2 pata_amd [ 966.748306] Pid: 605, comm: rpciod/3 Not tainted 2.6.24-rc2-mm1 #4 [ 966.754653] RIP: 0010:[] [] _spin_lock_irqsave+0x12/0x30 [ 966.763424] RSP: 0018:ffff81007ef33e28 EFLAGS: 00000286 [ 966.768879] RAX: 0000000000000286 RBX: ffff81007ef33e60 RCX: 0000000000000000 [ 966.776204] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff81011e107960 [ 966.783511] RBP: ffff81011cc6c588 R08: ffff8100db918130 R09: ffff81011cc6c540 [ 966.790837] R10: 0000000000000000 R11: ffffffff80266390 R12: ffff8100d2d693a8 [ 966.798170] R13: ffff81011cc6c588 R14: ffff8100d2d693a8 R15: ffffffff80302726 [ 966.805505] FS: 00007f9e739d96f0(0000) GS:ffff81011ff12700(0000) knlGS:0000000000000000 [ 966.813805] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b [ 966.819703] CR2: 0000000001b691d0 CR3: 0000000069861000 CR4: 00000000000006e0 [ 966.827039] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 966.834362] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 966.841687] [ 966.841687] Call Trace: [ 966.845728] [] __wake_up+0x2d/0x70 [ 966.850900] [] nfs_free_unlinkdata+0x1e/0x50 [ 966.857004] [] rpc_release_calldata+0x26/0x50 [ 966.863161] [] rpc_async_schedule+0x0/0x10 [ 966.869078] [] run_workqueue+0xcc/0x170 [ 966.874705] [] worker_thread+0x0/0xb0 [ 966.880163] [] worker_thread+0x0/0xb0 [ 966.885610] [] worker_thread+0x6d/0xb0 [ 966.891148] [] autoremove_wake_function+0x0/0x30 [ 966.897606] [] worker_thread+0x0/0xb0 [ 966.903045] [] worker_thread+0x0/0xb0 [ 966.908485] [] kthread+0x4b/0x80 [ 966.913484] [] child_rip+0xa/0x12 [ 966.918579] [] kthread+0x0/0x80 [ 966.923498] [] child_rip+0x0/0x12 [ 966.928584] Sadly lockdep does not work for me, as it gets turned off early: [ 39.851594] --------------------------------- [ 39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage. [ 39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes: [ 39.866963] (&n->list_lock){-+..}, at: [] add_partial+0x31/0xa0 [ 39.874712] {softirq-on-W} state was registered at: [ 39.879788] [] __lock_acquire+0x3e8/0x1140 [ 39.885763] [] debug_check_no_locks_freed+0x188/0x1a0 [ 39.892682] [] lock_acquire+0x55/0x70 [ 39.898840] [] add_partial+0x31/0xa0 [ 39.904288] [] _spin_lock+0x1e/0x30 [ 39.909650] [] add_partial+0x31/0xa0 [ 39.915097] [] kmem_cache_open+0x1cc/0x330 [ 39.921066] [] _spin_unlock_irq+0x24/0x30 [ 39.926946] [] create_kmalloc_cache+0x64/0xf0 [ 39.933172] [] init_alloc_cpu_cpu+0x70/0x90 [ 39.939226] [] kmem_cache_init+0x65/0x1d0 [ 39.945289] [] start_kernel+0x23e/0x350 [ 39.950996] [] _sinittext+0x12d/0x140 [ 39.956529] [] 0xffffffffffffffff [ 39.961720] irq event stamp: 1207 [ 39.965048] hardirqs last enabled at (1206): [] debug_check_no_locks_freed+0x188/0x1a0 [ 39.974701] hardirqs last disabled at (1207): [] __slab_free+0x3b/0x190 [ 39.982968] softirqs last enabled at (570): [] call_softirq+0x1c/0x30 [ 39.991148] softirqs last disabled at (1197): [] call_softirq+0x1c/0x30 [ 39.999415] [ 39.999416] other info that might help us debug this: [ 40.005990] no locks held by swapper/0. [ 40.010018] [ 40.010018] stack backtrace: [ 40.014429] [ 40.014429] Call Trace: [ 40.018407] [] print_usage_bug+0x18c/0x1a0 [ 40.024817] [] mark_lock+0x64c/0x660 [ 40.030057] [] __lock_acquire+0x39e/0x1140 [ 40.035818] [] save_trace+0x37/0xa0 [ 40.040972] [] __rcu_process_callbacks+0x8d/0x250 [ 40.047335] [] lock_acquire+0x55/0x70 [ 40.052663] [] add_partial+0x31/0xa0 [ 40.057905] [] trace_hardirqs_on+0x83/0x160 [ 40.063750] [] _spin_lock+0x1e/0x30 [ 40.068905] [] add_partial+0x31/0xa0 [ 40.074311] [] __slab_free+0x100/0x190 [ 40.079724] [] __rcu_process_callbacks+0x8d/0x250 [ 40.086088] [] tasklet_action+0x2c/0xc0 [ 40.091588] [] rcu_process_callbacks+0x23/0x50 [ 40.097694] [] tasklet_action+0x4a/0xc0 [ 40.103194] [] __do_softirq+0x7a/0x100 [ 40.108607] [] call_softirq+0x1c/0x30 [ 40.113935] [] do_softirq+0x55/0xb0 [ 40.119089] [] irq_exit+0x97/0xa0 [ 40.124073] [] smp_apic_timer_interrupt+0x7c/0xc0 [ 40.130434] [] default_idle+0x0/0x60 [ 40.135840] [] default_idle+0x0/0x60 [ 40.141080] [] apic_timer_interrupt+0x6b/0x70 [ 40.147100] [] default_idle+0x37/0x60 [ 40.153066] [] default_idle+0x35/0x60 [ 40.158393] [] cpu_idle+0x5f/0x90 [ 40.163374] [ 40.164888] INFO: lockdep is turned off. Don't know who to bug about that. Torsten - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/