From: Peter Zijlstra
Subject: Re: [NFS] [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
Date: Sun, 18 Nov 2007 00:05:08 +0100
Message-ID: <20071117230508.GB25905__39604.3313354768$1195352263$gmane$org@dyad>
References: <473DA608.1020804@linux.vnet.ibm.com>
 <64bb37e0711170953p67d1be49lf4eaa190d662e2b4@mail.gmail.com>
 <20071117180946.GA14055@elte.hu>
 <20071117101957.7562639d.akpm@linux-foundation.org>
 <64bb37e0711171140w5f1451e0qea081a4fbc7a45f7@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Cc: Trond Myklebust, Peter Zijlstra, steved@redhat.com, LKML,
 Kamalesh Babulal, linuxppc-dev-mnsaURCQ41sdnm+yROfE0A@public.gmane.org,
 nfs@lists.sourceforge.net, Andy Whitcroft, Andrew Morton, Jan Blunck,
 Ingo Molnar, Balbir Singh
To: Torsten Kaiser
Received: from sc8-sf-mx1-b.sourceforge.net ([10.3.1.91] helo=mail.sourceforge.net)
 by sc8-sf-list2-new.sourceforge.net with esmtp (Exim 4.43) id 1ItWis-0005at-48
 for nfs@lists.sourceforge.net; Sat, 17 Nov 2007 15:05:10 -0800
Received: from viefep18-int.chello.at ([213.46.255.22] helo=viefep16-int.chello.at)
 by mail.sourceforge.net with esmtp (Exim 4.44) id 1ItWix-0008PP-F3
 for nfs@lists.sourceforge.net; Sat, 17 Nov 2007 15:05:16 -0800
In-Reply-To: <64bb37e0711171140w5f1451e0qea081a4fbc7a45f7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Sender: linux-nfs-owner@vger.kernel.org

On Sat, Nov 17, 2007 at 08:40:22PM +0100, Torsten Kaiser wrote:
> Lockdep triggers immediately before the freeze, but the result is still
> not helpful:
>
> [ 221.565011] INFO: trying to register non-static key.
> [ 221.566999] the code is fine but needs lockdep annotation.
> [ 221.569206] turning off the locking correctness validator.
> [ 221.571404]
> [ 221.571405] Call Trace:
> [ 221.572996] [] __lock_acquire+0x4c4/0x1140
> [ 221.575298] [] lock_acquire+0x55/0x70
> [ 221.577429] [] __wake_up+0x2d/0x70
> [ 221.579457] [] _spin_lock_irqsave+0x34/0x50
> [ 221.581800] [] _spin_unlock_irqrestore+0x55/0x70
> [ 221.584317] [] __wake_up+0x2d/0x70
> [ 221.586344] [] rpc_async_schedule+0x0/0x10
> [ 221.588648] [] nfs_free_unlinkdata+0x1e/0x50
> [ 221.591023] [] rpc_release_calldata+0x26/0x50
> [ 221.593428] [] run_workqueue+0x16f/0x210
> [ 221.595662] [] trace_hardirqs_on+0xc1/0x160
> [ 221.598004] [] worker_thread+0x0/0xb0
> [ 221.600130] [] worker_thread+0x0/0xb0
> [ 221.602265] [] worker_thread+0x6d/0xb0
> [ 221.604431] [] autoremove_wake_function+0x0/0x30
> [ 221.606939] [] worker_thread+0x0/0xb0
> [ 221.609067] [] worker_thread+0x0/0xb0
> [ 221.611199] [] kthread+0x4b/0x80
> [ 221.613156] [] child_rip+0xa/0x12
> [ 221.615151] [] restore_args+0x0/0x30
> [ 221.617247] [] kthread+0x0/0x80
> [ 221.619162] [] child_rip+0x0/0x12
> [ 221.621147]
> [ 221.621749] INFO: lockdep is turned off.

I've been staring at this NFS code for a while and can't make any sense
out of it. It seems to correctly initialize the waitqueue. So this would
indicate corruption of some sort.

> I also had another BUG output during system startup, but that should
> be unrelated:
> [ 103.254681] BUG: sleeping function called from invalid context at
> kernel/rwsem.c:20
> [ 103.257757] in_atomic():0, irqs_disabled():1
> [ 103.259469] 1 lock held by artsd/5883:
> [ 103.259470] #0: (pm_qos_lock){....}, at: []
> pm_qos_add_requirement+0x6b/0xf0
> [ 103.263316] irq event stamp: 49712
> [ 103.263318] hardirqs last enabled at (49711): []
> __kmalloc+0x10d/0x180
> [ 103.263321] hardirqs last disabled at (49712): []
> _spin_lock_irqsave+0x1a/0x50
> [ 103.263326] softirqs last enabled at (48820): []
> unix_release_sock+0x79/0x240
> [ 103.263330] softirqs last disabled at (48818): []
> _write_lock_bh+0x9/0x30
> [ 103.263333]
> [ 103.263333] Call Trace:
> [ 103.263335] [] down_read+0x15/0x40
> [ 103.263338] [] __blocking_notifier_call_chain+0x46/0x90
> [ 103.263341] [] pm_qos_add_requirement+0x93/0xf0
> [ 103.263344] [] snd_pcm_hw_params+0x2fa/0x380
> [ 103.263347] [] snd_pcm_common_ioctl1+0xb4c/0xdc0
> [ 103.263350] [] __do_fault+0x227/0x470
> [ 103.263353] [] __lock_acquire+0x745/0x1140
> [ 103.263357] [] _spin_unlock_irqrestore+0x55/0x70
> [ 103.263359] [] trace_hardirqs_on+0xc1/0x160
> [ 103.263362] [] snd_pcm_playback_ioctl1+0x48/0x240
> [ 103.263365] [] snd_pcm_playback_ioctl+0x36/0x50
> [ 103.263367] [] vfs_ioctl+0x2f/0xa0
> [ 103.263369] [] do_vfs_ioctl+0x260/0x2e0
> [ 103.263371] [] trace_hardirqs_on+0xc1/0x160
> [ 103.263373] [] sys_ioctl+0x91/0xb0
> [ 103.263376] [] system_call+0x7e/0x83
> [ 103.263379]

This pm-qos code is fubar, it calls blocking_notifier_call_chain while
holding a spinlock (and that is after 'fixing' it from a
srcu_notifier_call_chain - which is equally wrong).
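
For readers unfamiliar with the second report: the trace shows down_read() being reached through __blocking_notifier_call_chain() while pm_qos_lock is held with interrupts disabled. The following is a minimal userspace sketch of that pattern, not kernel code. Every name in it (model_spin_lock, model_might_sleep, add_requirement_buggy, and so on) is hypothetical and exists only for illustration; the reordered variant is just one way to avoid the sleeping call under the lock, and is not a claim about how pm-qos was or should be fixed.

```c
#include <assert.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
int atomic_depth;   /* > 0 while a modelled spinlock is held */
int violations;     /* sleeping calls made while atomic_depth > 0 */

void model_spin_lock(void)
{
    atomic_depth++;
}

void model_spin_unlock(void)
{
    assert(atomic_depth > 0);   /* unlock without lock is a modelling error */
    atomic_depth--;
}

/* Stand-in for a might_sleep() check: count the violation instead of
 * printing "BUG: sleeping function called from invalid context". */
void model_might_sleep(void)
{
    if (atomic_depth > 0)
        violations++;
}

/* Stand-in for a blocking notifier call, which may sleep because it
 * takes a read-write semaphore before walking the chain. */
void model_blocking_notify(void)
{
    model_might_sleep();
}

/* Pattern from the report: notify while the spinlock is still held. */
void add_requirement_buggy(int *target, int value)
{
    model_spin_lock();
    *target = value;
    model_blocking_notify();    /* may sleep in atomic context */
    model_spin_unlock();
}

/* One possible reordering (an assumption): update under the lock,
 * then notify only after the lock has been dropped. */
void add_requirement_reordered(int *target, int value)
{
    model_spin_lock();
    *target = value;
    model_spin_unlock();
    model_blocking_notify();    /* no lock held here */
}
```

Calling add_requirement_buggy() once records one violation in this model, while add_requirement_reordered() records none. The reordering changes what the notifier callbacks may observe, so whether it is acceptable depends on the real code's requirements.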