From: "J. Bruce Fields"
Subject: Re: [PATCH] nfs lockd: detect grace_list corruption
Date: Tue, 12 May 2009 15:13:46 -0400
Message-ID: <20090512191346.GE19164@fieldses.org>
References: <49F12D78.2040304@cn.fujitsu.com>
	<20090424231252.GD22477@fieldses.org>
	<4A0155A0.4020008@cn.fujitsu.com>
	<20090506203227.GM9861@fieldses.org>
	<4A0284EB.9050202@cn.fujitsu.com>
	<20090508182648.GD20539@fieldses.org>
	<4A07C21E.3020309@cn.fujitsu.com>
	<20090511205741.GH793@fieldses.org>
	<4A08C63D.5080002@cn.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: neilb@suse.de, Trond.Myklebust@netapp.com, linux-nfs@vger.kernel.org, FNST-Bian Naimeng
To: Wang Chen
Return-path:
Received: from mail.fieldses.org ([141.211.133.115]:33036 "EHLO pickle.fieldses.org" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751359AbZELTNs (ORCPT );
	Tue, 12 May 2009 15:13:48 -0400
In-Reply-To: <4A08C63D.5080002@cn.fujitsu.com>
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Tue, May 12, 2009 at 08:43:41AM +0800, Wang Chen wrote:
> J. Bruce Fields said the following on 2009-5-12 4:57:
> > On Mon, May 11, 2009 at 02:13:50PM +0800, Wang Chen wrote:
> >> J. Bruce Fields said the following on 2009-5-9 2:26:
> >>> On Thu, May 07, 2009 at 02:51:23PM +0800, Wang Chen wrote:
> >>>> J. Bruce Fields said the following on 2009-5-7 4:32:
> >>>>> On Wed, May 06, 2009 at 05:17:20PM +0800, Wang Chen wrote:
> >>>>>> J. Bruce Fields said the following on 2009-4-25 7:12:
> >>>>>>> On Fri, Apr 24, 2009 at 11:09:44AM +0800, Wang Chen wrote:
> >>>>>>>> Although I can't reproduce it now, it really happened that some lock manager
> >>>>>>>> started grace period but didn't end it.
> >>>>>>>> This causes an lm entry be left in grace_list, and when service nfs restart,
> >>>>>>>> the same lm will be added again into the list.
> >>>>>>>> As you know, adding an entry, which is in the list, to a list will leads to
> >>>>>>>> list corruption.
> >>>>>>> I'd really like to understand why locks_end_grace() isn't being called.
> >>>>>>> I'm probably overlooking something obvious, but I just can't see how
> >>>>>>> lockd or nfsd can be shut down right now without locks_end_grace() being
> >>>>>>> called.
> >>>>>>>
> >>>>>> Me neither can figure out why locks_end_grace() isn't being called.
> >>>>>>
> >>>>>> But do locks_start_grace() twice can trigger this warning too.
> >>>>>> You can do
> >>>>>> 1. service nfs restart
> >>>>>> 2. (immediately) kill -s SIGKILL lockd
> >>>>>> this can trigger
> >>>>>> ---
> >>>>>> lockd(void *vrqstp)
> >>>>>> ...
> >>>>>>     if (signalled()) {
> >>>>>>         flush_signals(current);
> >>>>>>         if (nlmsvc_ops) {
> >>>>>>             nlmsvc_invalidate_all();
> >>>>>>             set_grace_period();
> >>>>>> ---
> >>>>>> and makes locks_start_grace() be called twice without locks_end_grace().
> >>>>> Ah-hah!
> >>>>>
> >>>>>> So I still suggest to do something to protect the lm list. :)
> >>>>> I wouldn't be opposed to a simple WARN_ON(!list_empty()) in
> >>>>> locks_start_grace(), but I'm mainly worried about fixing the original
> >>>>> bug. How about the following?
> >>>>>
> >>>> Yeah, the following fix is OK to me, although it only fixed
> >>>> "start_grace again after start_grace" case.
> >>> OK, thanks.
> >>>
> >>>> The bug about "quit lockd without end_grace", which I encountered before
> >>>> incidentally, maybe is still there.
> >>> You're talking about the report that started this thread?:
> >>>
> >>> http://marc.info/?l=linux-nfs&m=124054262421444&w=2
> >>>
> >> Yes. I mean this.
> >>
> >>> It looks to me like that could be explained by two start_grace's in a
> >>> row.
> >>>
> >> But in that report, I didn't post the total message.
> >> Here are something show that:
> >> 1. not only lockd has the problem, but nfsd also.
> >> 2. every time I do "service nfs restart", I got the warning, so this is not
> >> "two start_grace's in a row" problem.
> >
> > Once the list is corrupted, it stays corrupted, so that's expected; the
> > only interesting warning is the first one.
> >
>
> But as you see the logs, nfsd made list corrupted first.

Are you sure? You may be right, I just don't understand why.

If you look at the definitions in lib/list_debug.c and
include/linux/list.h, and trace through what would happen e.g. in a
sequence like:

	list_add(item, head);
	list_add(item, head);
	list_del(item);
	list_add(item, head);

where would corruption first be reported? I don't think it would be at
the first place where the corruption was actually *created*. In fact I
*think* the warning would first occur on the third list_add(), resulting
in exactly the behavior seen in your logs--but someone should check
that.

--b.

> Your fix only "two start_grace's in a row" of lockd .
>
> > --b.
> >
> >> Following is more message I got on last month.
> >> ------------------------------------------------------
> >> Apr 16 16:35:41 localhost mountd[15061]: Caught signal 15, un-registering and exiting.
> >> Apr 16 16:35:42 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 16 16:35:43 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 16 16:35:43 localhost kernel: ------------[ cut here ]------------
> >> Apr 16 16:35:43 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> >> Apr 16 16:35:43 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> >> Apr 16 16:35:43 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
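For anyone who wants to check the add, add, del, add trace proposed in this thread without a debug kernel, here is a minimal user-space sketch. It is not the kernel code: the struct and all function names below are stand-ins, the add path only mirrors the two sanity checks reported by the list_add warning, and the removal is assumed to be an unchecked unlink that re-initialises the entry (as a list_del_init()-style helper would). A checked list_del() might well complain one step earlier; that case is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static int warned;             /* set when a sanity check in the add path fails */
static int next_prev_was_next; /* set if the failing next->prev pointed at next itself */

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert after head, with the same two checks the debug list_add reports. */
static void checked_list_add(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head, *next = head->next;

	if (next->prev != prev) {
		warned = 1;
		next_prev_was_next = (next->prev == next);
	}
	if (prev->next != next)
		warned = 1;

	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

/* Assumption: removal does no checking and re-initialises the entry. */
static void unchecked_list_del_init(struct list_head *entry)
{
	struct list_head *prev = entry->prev, *next = entry->next;

	next->prev = prev;
	prev->next = next;
	init_list_head(entry);
}

/* Runs add, add, del, add; returns the 1-based step that first warns, or 0. */
int first_warning_step(void)
{
	struct list_head head, item;
	int step;

	init_list_head(&head);
	init_list_head(&item);
	assert(head.next == &head && head.prev == &head);
	warned = 0;
	next_prev_was_next = 0;

	for (step = 1; step <= 4; step++) {
		if (step == 3)
			unchecked_list_del_init(&item);
		else
			checked_list_add(&item, &head);
		if (warned)
			return step;
	}
	return 0;
}
```

Under these assumptions the second add passes both checks even though it is the call that damages the list, and the first complaint comes from the add that follows the removal, with next->prev pointing back at next. That is the same shape as the "but was ef8ff128. (next=ef8ff128)" text in the quoted warnings, but it only confirms the model, not what the real kernel paths did.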
> >> Apr 16 16:35:43 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> >> Apr 16 16:35:43 localhost kernel: Pid: 17455, comm: rpc.nfsd Tainted: G W 2.6.30-rc2 #3
> >> Apr 16 16:35:43 localhost kernel: Call Trace:
> >> Apr 16 16:35:43 localhost kernel: [] warn_slowpath+0x71/0xa0
> >> Apr 16 16:35:43 localhost kernel: [] ? nfsd4_build_namelist+0x0/0x8e [nfsd]
> >> Apr 16 16:35:43 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 16 16:35:43 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> >> Apr 16 16:35:43 localhost kernel: [] ? mntput_no_expire+0x1c/0x101
> >> Apr 16 16:35:43 localhost kernel: [] ? dput+0x35/0x103
> >> Apr 16 16:35:43 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> >> Apr 16 16:35:43 localhost kernel: [] __list_add+0x27/0x5c
> >> Apr 16 16:35:43 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> >> Apr 16 16:35:43 localhost kernel: [] nfs4_state_start+0x7a/0xdd [nfsd]
> >> Apr 16 16:35:43 localhost kernel: [] nfsd_svc+0x57/0xf9 [nfsd]
> >> Apr 16 16:35:43 localhost kernel: [] ? write_threads+0x0/0x59 [nfsd]
> >> Apr 16 16:35:43 localhost kernel: [] write_threads+0x35/0x59 [nfsd]
> >> Apr 16 16:35:43 localhost kernel: [] nfsctl_transaction_write+0x3b/0x58 [nfsd]
> >> Apr 16 16:35:43 localhost kernel: [] ? nfsctl_transaction_write+0x0/0x58 [nfsd]
> >> Apr 16 16:35:43 localhost kernel: [] vfs_write+0x7c/0xad
> >> Apr 16 16:35:43 localhost kernel: [] sys_write+0x3b/0x60
> >> Apr 16 16:35:43 localhost kernel: [] sysenter_do_call+0x12/0x3c
> >> Apr 16 16:35:43 localhost kernel: ---[ end trace fa484bd6d19ade87 ]---
> >> Apr 16 16:35:43 localhost kernel: NFSD: starting 90-second grace period
> >> ...snip...
> >> Apr 17 13:02:54 localhost mountd[17468]: Caught signal 15, un-registering and exiting.
> >> Apr 17 13:02:54 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 13:02:55 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 13:02:55 localhost kernel: ------------[ cut here ]------------
> >> Apr 17 13:02:55 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> >> Apr 17 13:02:55 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> >> Apr 17 13:02:55 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> >> Apr 17 13:02:55 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> >> Apr 17 13:02:55 localhost kernel: Pid: 22642, comm: rpc.nfsd Tainted: G W 2.6.30-rc2 #3
> >> Apr 17 13:02:55 localhost kernel: Call Trace:
> >> Apr 17 13:02:55 localhost kernel: [] warn_slowpath+0x71/0xa0
> >> Apr 17 13:02:55 localhost kernel: [] ? nfsd4_build_namelist+0x0/0x8e [nfsd]
> >> Apr 17 13:02:55 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 13:02:55 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> >> Apr 17 13:02:55 localhost kernel: [] ? mntput_no_expire+0x1c/0x101
> >> Apr 17 13:02:55 localhost kernel: [] ? dput+0x35/0x103
> >> Apr 17 13:02:55 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> >> Apr 17 13:02:55 localhost kernel: [] __list_add+0x27/0x5c
> >> Apr 17 13:02:55 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> >> Apr 17 13:02:55 localhost kernel: [] nfs4_state_start+0x7a/0xdd [nfsd]
> >> Apr 17 13:02:55 localhost kernel: [] nfsd_svc+0x57/0xf9 [nfsd]
> >> Apr 17 13:02:55 localhost kernel: [] ? write_threads+0x0/0x59 [nfsd]
> >> Apr 17 13:02:55 localhost kernel: [] write_threads+0x35/0x59 [nfsd]
> >> Apr 17 13:02:55 localhost kernel: [] nfsctl_transaction_write+0x3b/0x58 [nfsd]
> >> Apr 17 13:02:55 localhost kernel: [] ? nfsctl_transaction_write+0x0/0x58 [nfsd]
> >> Apr 17 13:02:55 localhost kernel: [] vfs_write+0x7c/0xad
> >> Apr 17 13:02:55 localhost kernel: [] sys_write+0x3b/0x60
> >> Apr 17 13:02:55 localhost kernel: [] sysenter_do_call+0x12/0x3c
> >> Apr 17 13:02:55 localhost kernel: ---[ end trace fa484bd6d19ade88 ]---
> >> Apr 17 13:02:55 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 13:04:07 localhost mountd[22655]: Caught signal 15, un-registering and exiting.
> >> Apr 17 13:04:07 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 13:04:07 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 13:04:07 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 13:05:04 localhost mountd[22760]: Caught signal 15, un-registering and exiting.
> >> Apr 17 13:05:04 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 13:05:05 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 13:05:05 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 13:06:10 localhost mountd[22859]: Caught signal 15, un-registering and exiting.
> >> Apr 17 13:06:10 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 13:06:10 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 13:06:10 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 13:08:07 localhost mountd[22960]: Caught signal 15, un-registering and exiting.
> >> Apr 17 13:08:07 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 13:08:07 localhost kernel: ------------[ cut here ]------------
> >> Apr 17 13:08:07 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> >> Apr 17 13:08:07 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> >> Apr 17 13:08:07 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> >> Apr 17 13:08:07 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> >> Apr 17 13:08:07 localhost kernel: Pid: 23062, comm: lockd Tainted: G W 2.6.30-rc2 #3
> >> Apr 17 13:08:07 localhost kernel: Call Trace:
> >> Apr 17 13:08:07 localhost kernel: [] warn_slowpath+0x71/0xa0
> >> Apr 17 13:08:07 localhost kernel: [] ? update_curr+0x11d/0x125
> >> Apr 17 13:08:07 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 13:08:07 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> >> Apr 17 13:08:07 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> >> Apr 17 13:08:07 localhost kernel: [] __list_add+0x27/0x5c
> >> Apr 17 13:08:07 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> >> Apr 17 13:08:07 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> >> Apr 17 13:08:07 localhost kernel: [] ? lock_kernel+0x1c/0x28
> >> Apr 17 13:08:07 localhost kernel: [] lockd+0x64/0x164 [lockd]
> >> Apr 17 13:08:07 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 13:08:07 localhost kernel: [] ? complete+0x34/0x3e
> >> Apr 17 13:08:07 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 13:08:07 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 13:08:07 localhost kernel: [] kthread+0x45/0x6b
> >> Apr 17 13:08:07 localhost kernel: [] ? kthread+0x0/0x6b
> >> Apr 17 13:08:07 localhost kernel: [] kernel_thread_helper+0x7/0x10
> >> Apr 17 13:08:07 localhost kernel: ---[ end trace fa484bd6d19ade89 ]---
> >> Apr 17 13:08:07 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 13:08:07 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 14:39:45 localhost mountd[23074]: Caught signal 15, un-registering and exiting.
> >> Apr 17 14:39:45 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 14:39:45 localhost kernel: ------------[ cut here ]------------
> >> Apr 17 14:39:45 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> >> Apr 17 14:39:45 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> >> Apr 17 14:39:45 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> >> Apr 17 14:39:45 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> >> Apr 17 14:39:45 localhost kernel: Pid: 24287, comm: lockd Tainted: G W 2.6.30-rc2 #3
> >> Apr 17 14:39:45 localhost kernel: Call Trace:
> >> Apr 17 14:39:45 localhost kernel: [] warn_slowpath+0x71/0xa0
> >> Apr 17 14:39:45 localhost kernel: [] ? update_curr+0x11d/0x125
> >> Apr 17 14:39:45 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 14:39:45 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> >> Apr 17 14:39:45 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> >> Apr 17 14:39:45 localhost kernel: [] __list_add+0x27/0x5c
> >> Apr 17 14:39:45 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> >> Apr 17 14:39:45 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> >> Apr 17 14:39:45 localhost kernel: [] ? lock_kernel+0x1c/0x28
> >> Apr 17 14:39:45 localhost kernel: [] lockd+0x64/0x164 [lockd]
> >> Apr 17 14:39:45 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 14:39:45 localhost kernel: [] ? complete+0x34/0x3e
> >> Apr 17 14:39:45 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 14:39:45 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 14:39:45 localhost kernel: [] kthread+0x45/0x6b
> >> Apr 17 14:39:45 localhost kernel: [] ? kthread+0x0/0x6b
> >> Apr 17 14:39:45 localhost kernel: [] kernel_thread_helper+0x7/0x10
> >> Apr 17 14:39:45 localhost kernel: ---[ end trace fa484bd6d19ade8a ]---
> >> Apr 17 14:39:45 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 14:39:45 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 14:41:32 localhost mountd[24299]: Caught signal 15, un-registering and exiting.
> >> Apr 17 14:41:32 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 14:41:33 localhost kernel: ------------[ cut here ]------------
> >> Apr 17 14:41:33 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> >> Apr 17 14:41:33 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> >> Apr 17 14:41:33 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> >> Apr 17 14:41:33 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> >> Apr 17 14:41:33 localhost kernel: Pid: 24399, comm: lockd Tainted: G W 2.6.30-rc2 #3
> >> Apr 17 14:41:33 localhost kernel: Call Trace:
> >> Apr 17 14:41:33 localhost kernel: [] warn_slowpath+0x71/0xa0
> >> Apr 17 14:41:33 localhost kernel: [] ? update_curr+0x11d/0x125
> >> Apr 17 14:41:33 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 14:41:33 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> >> Apr 17 14:41:33 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> >> Apr 17 14:41:33 localhost kernel: [] __list_add+0x27/0x5c
> >> Apr 17 14:41:33 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> >> Apr 17 14:41:33 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> >> Apr 17 14:41:33 localhost kernel: [] ? lock_kernel+0x1c/0x28
> >> Apr 17 14:41:33 localhost kernel: [] lockd+0x64/0x164 [lockd]
> >> Apr 17 14:41:33 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 14:41:33 localhost kernel: [] ? complete+0x34/0x3e
> >> Apr 17 14:41:33 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 14:41:33 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 14:41:33 localhost kernel: [] kthread+0x45/0x6b
> >> Apr 17 14:41:33 localhost kernel: [] ? kthread+0x0/0x6b
> >> Apr 17 14:41:33 localhost kernel: [] kernel_thread_helper+0x7/0x10
> >> Apr 17 14:41:33 localhost kernel: ---[ end trace fa484bd6d19ade8b ]---
> >> Apr 17 14:41:33 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 14:41:33 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 14:42:16 localhost mountd[24411]: Caught signal 15, un-registering and exiting.
> >> Apr 17 14:42:17 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 14:42:17 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 14:42:17 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 14:42:52 localhost mountd[24508]: Caught signal 15, un-registering and exiting.
> >> Apr 17 14:42:52 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 14:42:53 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 14:42:53 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 14:43:28 localhost mountd[24602]: Caught signal 15, un-registering and exiting.
> >> Apr 17 14:43:28 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 14:43:29 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 14:43:29 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 14:43:59 localhost mountd[24697]: Caught signal 15, un-registering and exiting.
> >> Apr 17 14:43:59 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 14:44:00 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 14:44:00 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 14:44:28 localhost mountd[24791]: Caught signal 15, un-registering and exiting.
> >> Apr 17 14:44:28 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 14:44:29 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 14:44:29 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 14:45:33 localhost mountd[24885]: Caught signal 15, un-registering and exiting.
> >> Apr 17 14:45:33 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 14:45:34 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 14:45:34 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 14:46:05 localhost mountd[24988]: Caught signal 15, un-registering and exiting.
> >> Apr 17 14:46:05 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 14:46:05 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 14:46:05 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 14:46:34 localhost mountd[25082]: Caught signal 15, un-registering and exiting.
> >> Apr 17 14:46:34 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 14:46:35 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 14:46:35 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 15:35:01 localhost mountd[25176]: Caught signal 15, un-registering and exiting.
> >> Apr 17 15:35:02 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 15:35:02 localhost kernel: ------------[ cut here ]------------
> >> Apr 17 15:35:02 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> >> Apr 17 15:35:02 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> >> Apr 17 15:35:02 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> >> Apr 17 15:35:02 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> >> Apr 17 15:35:02 localhost kernel: Pid: 25883, comm: lockd Tainted: G W 2.6.30-rc2 #3
> >> Apr 17 15:35:02 localhost kernel: Call Trace:
> >> Apr 17 15:35:02 localhost kernel: [] warn_slowpath+0x71/0xa0
> >> Apr 17 15:35:02 localhost kernel: [] ? update_curr+0x11d/0x125
> >> Apr 17 15:35:02 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 15:35:02 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> >> Apr 17 15:35:02 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> >> Apr 17 15:35:02 localhost kernel: [] __list_add+0x27/0x5c
> >> Apr 17 15:35:02 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> >> Apr 17 15:35:02 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> >> Apr 17 15:35:02 localhost kernel: [] ? lock_kernel+0x1c/0x28
> >> Apr 17 15:35:02 localhost kernel: [] lockd+0x64/0x164 [lockd]
> >> Apr 17 15:35:02 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 15:35:02 localhost kernel: [] ? complete+0x34/0x3e
> >> Apr 17 15:35:02 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 15:35:02 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 15:35:02 localhost kernel: [] kthread+0x45/0x6b
> >> Apr 17 15:35:02 localhost kernel: [] ? kthread+0x0/0x6b
> >> Apr 17 15:35:02 localhost kernel: [] kernel_thread_helper+0x7/0x10
> >> Apr 17 15:35:02 localhost kernel: ---[ end trace fa484bd6d19ade8c ]---
> >> Apr 17 15:35:02 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 15:35:02 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 15:55:22 localhost mountd[25895]: Caught signal 15, un-registering and exiting.
> >> Apr 17 15:55:22 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 15:55:23 localhost kernel: ------------[ cut here ]------------
> >> Apr 17 15:55:23 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> >> Apr 17 15:55:23 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> >> Apr 17 15:55:23 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> >> Apr 17 15:55:23 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> >> Apr 17 15:55:23 localhost kernel: Pid: 26230, comm: lockd Tainted: G W 2.6.30-rc2 #3
> >> Apr 17 15:55:23 localhost kernel: Call Trace:
> >> Apr 17 15:55:23 localhost kernel: [] warn_slowpath+0x71/0xa0
> >> Apr 17 15:55:23 localhost kernel: [] ? update_curr+0x11d/0x125
> >> Apr 17 15:55:23 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 15:55:23 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> >> Apr 17 15:55:23 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> >> Apr 17 15:55:23 localhost kernel: [] __list_add+0x27/0x5c
> >> Apr 17 15:55:23 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> >> Apr 17 15:55:23 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> >> Apr 17 15:55:23 localhost kernel: [] ? lock_kernel+0x1c/0x28
> >> Apr 17 15:55:23 localhost kernel: [] lockd+0x64/0x164 [lockd]
> >> Apr 17 15:55:23 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 15:55:23 localhost kernel: [] ? complete+0x34/0x3e
> >> Apr 17 15:55:23 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 15:55:23 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 15:55:23 localhost kernel: [] kthread+0x45/0x6b
> >> Apr 17 15:55:23 localhost kernel: [] ? kthread+0x0/0x6b
> >> Apr 17 15:55:23 localhost kernel: [] kernel_thread_helper+0x7/0x10
> >> Apr 17 15:55:23 localhost kernel: ---[ end trace fa484bd6d19ade8d ]---
> >> Apr 17 15:55:23 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 15:55:23 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 16:54:27 localhost mountd[26242]: Caught signal 15, un-registering and exiting.
> >> Apr 17 16:54:27 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 16:54:28 localhost kernel: ------------[ cut here ]------------
> >> Apr 17 16:54:28 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> >> Apr 17 16:54:28 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> >> Apr 17 16:54:28 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> >> Apr 17 16:54:28 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> >> Apr 17 16:54:28 localhost kernel: Pid: 27044, comm: lockd Tainted: G W 2.6.30-rc2 #3
> >> Apr 17 16:54:28 localhost kernel: Call Trace:
> >> Apr 17 16:54:28 localhost kernel: [] warn_slowpath+0x71/0xa0
> >> Apr 17 16:54:28 localhost kernel: [] ? update_curr+0x11d/0x125
> >> Apr 17 16:54:28 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 16:54:28 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> >> Apr 17 16:54:28 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> >> Apr 17 16:54:28 localhost kernel: [] __list_add+0x27/0x5c
> >> Apr 17 16:54:28 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> >> Apr 17 16:54:28 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> >> Apr 17 16:54:28 localhost kernel: [] ? lock_kernel+0x1c/0x28
> >> Apr 17 16:54:28 localhost kernel: [] lockd+0x64/0x164 [lockd]
> >> Apr 17 16:54:28 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 16:54:28 localhost kernel: [] ? complete+0x34/0x3e
> >> Apr 17 16:54:28 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 16:54:28 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 16:54:28 localhost kernel: [] kthread+0x45/0x6b
> >> Apr 17 16:54:28 localhost kernel: [] ? kthread+0x0/0x6b
> >> Apr 17 16:54:28 localhost kernel: [] kernel_thread_helper+0x7/0x10
> >> Apr 17 16:54:28 localhost kernel: ---[ end trace fa484bd6d19ade8e ]---
> >> Apr 17 16:54:28 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 16:54:28 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 16:59:55 localhost mountd[27056]: Caught signal 15, un-registering and exiting.
> >> Apr 17 16:59:55 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 16:59:56 localhost kernel: ------------[ cut here ]------------
> >> Apr 17 16:59:56 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> >> Apr 17 16:59:56 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> >> Apr 17 16:59:56 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> >> Apr 17 16:59:56 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> >> Apr 17 16:59:56 localhost kernel: Pid: 27197, comm: lockd Tainted: G W 2.6.30-rc2 #3
> >> Apr 17 16:59:56 localhost kernel: Call Trace:
> >> Apr 17 16:59:56 localhost kernel: [] warn_slowpath+0x71/0xa0
> >> Apr 17 16:59:56 localhost kernel: [] ? update_curr+0x11d/0x125
> >> Apr 17 16:59:56 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 16:59:56 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> >> Apr 17 16:59:56 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> >> Apr 17 16:59:56 localhost kernel: [] __list_add+0x27/0x5c
> >> Apr 17 16:59:56 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> >> Apr 17 16:59:56 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> >> Apr 17 16:59:56 localhost kernel: [] ? lock_kernel+0x1c/0x28
> >> Apr 17 16:59:56 localhost kernel: [] lockd+0x64/0x164 [lockd]
> >> Apr 17 16:59:56 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> >> Apr 17 16:59:56 localhost kernel: [] ? complete+0x34/0x3e
> >> Apr 17 16:59:56 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 16:59:56 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> >> Apr 17 16:59:56 localhost kernel: [] kthread+0x45/0x6b
> >> Apr 17 16:59:56 localhost kernel: [] ? kthread+0x0/0x6b
> >> Apr 17 16:59:56 localhost kernel: [] kernel_thread_helper+0x7/0x10
> >> Apr 17 16:59:56 localhost kernel: ---[ end trace fa484bd6d19ade8f ]---
> >> Apr 17 16:59:56 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> >> Apr 17 16:59:56 localhost kernel: NFSD: starting 90-second grace period
> >> Apr 17 17:02:50 localhost mountd[27209]: Caught signal 15, un-registering and exiting.
> >> Apr 17 17:02:50 localhost kernel: nfsd: last server has exited, flushing export cache
> >> Apr 17 17:02:51 localhost kernel: ------------[ cut here ]------------
> >> Apr 17 17:02:51 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> >> Apr 17 17:02:51 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> >> Apr 17 17:02:51 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> >> Apr 17 17:02:51 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode] > >> Apr 17 17:02:51 localhost kernel: Pid: 27349, comm: lockd Tainted: G W 2.6.30-rc2 #3 > >> Apr 17 17:02:51 localhost kernel: Call Trace: > >> Apr 17 17:02:51 localhost kernel: [] warn_slowpath+0x71/0xa0 > >> Apr 17 17:02:51 localhost kernel: [] ? update_curr+0x11d/0x125 > >> Apr 17 17:02:51 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 > >> Apr 17 17:02:51 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd > >> Apr 17 17:02:51 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa > >> Apr 17 17:02:51 localhost kernel: [] __list_add+0x27/0x5c > >> Apr 17 17:02:51 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd] > >> Apr 17 17:02:51 localhost kernel: [] set_grace_period+0x39/0x53 [lockd] > >> Apr 17 17:02:51 localhost kernel: [] ? lock_kernel+0x1c/0x28 > >> Apr 17 17:02:51 localhost kernel: [] lockd+0x64/0x164 [lockd] > >> Apr 17 17:02:51 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 > >> Apr 17 17:02:51 localhost kernel: [] ? complete+0x34/0x3e > >> Apr 17 17:02:51 localhost kernel: [] ? lockd+0x0/0x164 [lockd] > >> Apr 17 17:02:51 localhost kernel: [] ? lockd+0x0/0x164 [lockd] > >> Apr 17 17:02:51 localhost kernel: [] kthread+0x45/0x6b > >> Apr 17 17:02:51 localhost kernel: [] ? 
kthread+0x0/0x6b > >> Apr 17 17:02:51 localhost kernel: [] kernel_thread_helper+0x7/0x10 > >> Apr 17 17:02:51 localhost kernel: ---[ end trace fa484bd6d19ade90 ]--- > >> Apr 17 17:02:51 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory > >> Apr 17 17:02:51 localhost kernel: NFSD: starting 90-second grace period > >> Apr 17 17:08:09 localhost mountd[27361]: authenticated mount request from 10.167.141.101:695 for /tmp/nfs3 (/tmp/nfs3) > >> > >>> --b. > >>> > >>>>> --b. > >>>>> > >>>>> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c > >>>>> index abf8388..1a54ae1 100644 > >>>>> --- a/fs/lockd/svc.c > >>>>> +++ b/fs/lockd/svc.c > >>>>> @@ -104,6 +104,16 @@ static void set_grace_period(void) > >>>>> schedule_delayed_work(&grace_period_end, grace_period); > >>>>> } > >>>>> > >>>>> +static void restart_grace(void) > >>>>> +{ > >>>>> + if (nlmsvc_ops) { > >>>>> + cancel_delayed_work_sync(&grace_period_end); > >>>>> + locks_end_grace(&lockd_manager); > >>>>> + nlmsvc_invalidate_all(); > >>>>> + set_grace_period(); > >>>>> + } > >>>>> +} > >>>>> + > >>>>> /* > >>>>> * This is the lockd kernel thread > >>>>> */ > >>>>> @@ -149,10 +159,7 @@ lockd(void *vrqstp) > >>>>> > >>>>> if (signalled()) { > >>>>> flush_signals(current); > >>>>> - if (nlmsvc_ops) { > >>>>> - nlmsvc_invalidate_all(); > >>>>> - set_grace_period(); > >>>>> - } > >>>>> + restart_grace(); > >>>>> continue; > >>>>> } > >>>>> > >>>>> >