From: "J. Bruce Fields"
Subject: Re: [PATCH] nfs lockd: detect grace_list corruption
Date: Mon, 11 May 2009 16:57:41 -0400
Message-ID: <20090511205741.GH793@fieldses.org>
In-Reply-To: <4A07C21E.3020309@cn.fujitsu.com>
References: <49F12D78.2040304@cn.fujitsu.com> <20090424231252.GD22477@fieldses.org> <4A0155A0.4020008@cn.fujitsu.com> <20090506203227.GM9861@fieldses.org> <4A0284EB.9050202@cn.fujitsu.com> <20090508182648.GD20539@fieldses.org> <4A07C21E.3020309@cn.fujitsu.com>
To: Wang Chen
Cc: neilb@suse.de, Trond.Myklebust@netapp.com, linux-nfs@vger.kernel.org, FNST-Bian Naimeng

On Mon, May 11, 2009 at 02:13:50PM +0800, Wang Chen wrote:
> J. Bruce Fields said the following on 2009-5-9 2:26:
> > On Thu, May 07, 2009 at 02:51:23PM +0800, Wang Chen wrote:
> >> J. Bruce Fields said the following on 2009-5-7 4:32:
> >>> On Wed, May 06, 2009 at 05:17:20PM +0800, Wang Chen wrote:
> >>>> J. Bruce Fields said the following on 2009-4-25 7:12:
> >>>>> On Fri, Apr 24, 2009 at 11:09:44AM +0800, Wang Chen wrote:
> >>>>>> Although I can't reproduce it now, it really happened that some lock manager
> >>>>>> started a grace period but didn't end it.
> >>>>>> This leaves an lm entry in grace_list, and when "service nfs restart" runs,
> >>>>>> the same lm is added to the list again.
> >>>>>> As you know, adding an entry that is already on a list to that list again
> >>>>>> leads to list corruption.
> >>>>> I'd really like to understand why locks_end_grace() isn't being called.
> >>>>> I'm probably overlooking something obvious, but I just can't see how
> >>>>> lockd or nfsd can be shut down right now without locks_end_grace() being
> >>>>> called.
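To make the failure mode concrete, here is a small user-space sketch in C. It re-implements a kernel-style circular doubly linked list (the names follow include/linux/list.h for readability, but this is not the kernel code) and replays one hypothetical ordering of grace-period starts and ends in which entry A is added a second time without being removed first. The ordering is an assumption chosen for illustration, not a reconstruction of what happened on the reporter's machine; it was picked because the final add trips a neighbour check of the same shape as the "next->prev should be prev" check in lib/list_debug.c, with the reported "was" pointer equal to "next", which is the pattern in the logs quoted later in this message.

```c
#include <stdbool.h>
#include <stdio.h>

/* User-space re-implementation of a kernel-style circular doubly linked
 * list. Names follow include/linux/list.h for readability only; this is
 * not the kernel code. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert @new right after @head. Like a CONFIG_DEBUG_LIST build, first
 * check that the future neighbours still point at each other; report a
 * problem but insert anyway, as a WARN() would. Returns false when an
 * inconsistency was detected. */
static bool list_add_checked(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head, *next = head->next;
	bool ok = true;

	if (next->prev != prev) {
		fprintf(stderr, "list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
			(void *)prev, (void *)next->prev, (void *)next);
		ok = false;
	}
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
	return ok;
}

/* Unlink @entry with no sanity checks, then point it at itself. */
static void list_del_init(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	list_init(entry);
}

/* Hypothetical event ordering (an assumption, chosen because it yields
 * the "was == next" pattern): A starts grace, B starts grace, A starts
 * grace again with no end in between, B ends grace, B starts again.
 * Returns the result of the final checked add. */
static bool replay_double_start(struct list_head *head,
				struct list_head *a, struct list_head *b)
{
	list_init(head);
	list_init(a);
	list_init(b);
	list_add_checked(a, head);        /* head -> A                        */
	list_add_checked(b, head);        /* head -> B -> A                   */
	list_add_checked(a, head);        /* duplicate add: check passes, but
	                                     A and B now point at each other  */
	list_del_init(b);                 /* leaves a->prev == a              */
	return list_add_checked(b, head); /* next->prev != prev: reported     */
}
```

In this model the duplicate add itself passes the check, because at that moment its neighbours are still consistent; the warning fires only on a later add of a different entry. So, at least in this model, the call trace of the first warning need not point at the code path that did the double add.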
> >>>>>
> >>>> I can't figure out why locks_end_grace() isn't being called either.
> >>>>
> >>>> But calling locks_start_grace() twice can trigger this warning too.
> >>>> You can do this:
> >>>> 1. service nfs restart
> >>>> 2. (immediately) kill -s SIGKILL lockd
> >>>> This can trigger
> >>>> ---
> >>>> lockd(void *vrqstp)
> >>>> ...
> >>>> 		if (signalled()) {
> >>>> 			flush_signals(current);
> >>>> 			if (nlmsvc_ops) {
> >>>> 				nlmsvc_invalidate_all();
> >>>> 				set_grace_period();
> >>>> ---
> >>>> and make locks_start_grace() get called twice without an intervening locks_end_grace().
> >>> Ah-hah!
> >>>
> >>>> So I still suggest doing something to protect the lm list. :)
> >>> I wouldn't be opposed to a simple WARN_ON(!list_empty()) in
> >>> locks_start_grace(), but I'm mainly worried about fixing the original
> >>> bug. How about the following?
> >>>
> >> Yeah, the following fix is OK with me, although it only fixes the
> >> "start_grace again after start_grace" case.
> >
> > OK, thanks.
> >
> >> The bug about "quit lockd without end_grace", which I happened to run into
> >> before, may still be there.
> >
> > You're talking about the report that started this thread?:
> >
> > http://marc.info/?l=linux-nfs&m=124054262421444&w=2
> >
>
> Yes, I mean that one.
>
> > It looks to me like that could be explained by two start_grace's in a
> > row.
> >
>
> But in that report, I didn't post the whole log.
> Here is what it shows:
> 1. Not only lockd has the problem, but nfsd does too.
> 2. Every time I ran "service nfs restart", I got the warning, so this is not
> the "two start_grace's in a row" problem.

Once the list is corrupted, it stays corrupted, so that's expected; the
only interesting warning is the first one.

--b.

> Following are more of the messages I got last month.
> ------------------------------------------------------
> Apr 16 16:35:41 localhost mountd[15061]: Caught signal 15, un-registering and exiting.
> Apr 16 16:35:42 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 16 16:35:43 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 16 16:35:43 localhost kernel: ------------[ cut here ]------------
> Apr 16 16:35:43 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> Apr 16 16:35:43 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> Apr 16 16:35:43 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> Apr 16 16:35:43 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> Apr 16 16:35:43 localhost kernel: Pid: 17455, comm: rpc.nfsd Tainted: G W 2.6.30-rc2 #3
> Apr 16 16:35:43 localhost kernel: Call Trace:
> Apr 16 16:35:43 localhost kernel: [] warn_slowpath+0x71/0xa0
> Apr 16 16:35:43 localhost kernel: [] ? nfsd4_build_namelist+0x0/0x8e [nfsd]
> Apr 16 16:35:43 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 16 16:35:43 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> Apr 16 16:35:43 localhost kernel: [] ? mntput_no_expire+0x1c/0x101
> Apr 16 16:35:43 localhost kernel: [] ? dput+0x35/0x103
> Apr 16 16:35:43 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> Apr 16 16:35:43 localhost kernel: [] __list_add+0x27/0x5c
> Apr 16 16:35:43 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> Apr 16 16:35:43 localhost kernel: [] nfs4_state_start+0x7a/0xdd [nfsd]
> Apr 16 16:35:43 localhost kernel: [] nfsd_svc+0x57/0xf9 [nfsd]
> Apr 16 16:35:43 localhost kernel: [] ? write_threads+0x0/0x59 [nfsd]
> Apr 16 16:35:43 localhost kernel: [] write_threads+0x35/0x59 [nfsd]
> Apr 16 16:35:43 localhost kernel: [] nfsctl_transaction_write+0x3b/0x58 [nfsd]
> Apr 16 16:35:43 localhost kernel: [] ? nfsctl_transaction_write+0x0/0x58 [nfsd]
> Apr 16 16:35:43 localhost kernel: [] vfs_write+0x7c/0xad
> Apr 16 16:35:43 localhost kernel: [] sys_write+0x3b/0x60
> Apr 16 16:35:43 localhost kernel: [] sysenter_do_call+0x12/0x3c
> Apr 16 16:35:43 localhost kernel: ---[ end trace fa484bd6d19ade87 ]---
> Apr 16 16:35:43 localhost kernel: NFSD: starting 90-second grace period
> ...snip...
> Apr 17 13:02:54 localhost mountd[17468]: Caught signal 15, un-registering and exiting.
> Apr 17 13:02:54 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 13:02:55 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 13:02:55 localhost kernel: ------------[ cut here ]------------
> Apr 17 13:02:55 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> Apr 17 13:02:55 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> Apr 17 13:02:55 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> Apr 17 13:02:55 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> Apr 17 13:02:55 localhost kernel: Pid: 22642, comm: rpc.nfsd Tainted: G W 2.6.30-rc2 #3
> Apr 17 13:02:55 localhost kernel: Call Trace:
> Apr 17 13:02:55 localhost kernel: [] warn_slowpath+0x71/0xa0
> Apr 17 13:02:55 localhost kernel: [] ? nfsd4_build_namelist+0x0/0x8e [nfsd]
> Apr 17 13:02:55 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 13:02:55 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> Apr 17 13:02:55 localhost kernel: [] ? mntput_no_expire+0x1c/0x101
> Apr 17 13:02:55 localhost kernel: [] ? dput+0x35/0x103
> Apr 17 13:02:55 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> Apr 17 13:02:55 localhost kernel: [] __list_add+0x27/0x5c
> Apr 17 13:02:55 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> Apr 17 13:02:55 localhost kernel: [] nfs4_state_start+0x7a/0xdd [nfsd]
> Apr 17 13:02:55 localhost kernel: [] nfsd_svc+0x57/0xf9 [nfsd]
> Apr 17 13:02:55 localhost kernel: [] ? write_threads+0x0/0x59 [nfsd]
> Apr 17 13:02:55 localhost kernel: [] write_threads+0x35/0x59 [nfsd]
> Apr 17 13:02:55 localhost kernel: [] nfsctl_transaction_write+0x3b/0x58 [nfsd]
> Apr 17 13:02:55 localhost kernel: [] ? nfsctl_transaction_write+0x0/0x58 [nfsd]
> Apr 17 13:02:55 localhost kernel: [] vfs_write+0x7c/0xad
> Apr 17 13:02:55 localhost kernel: [] sys_write+0x3b/0x60
> Apr 17 13:02:55 localhost kernel: [] sysenter_do_call+0x12/0x3c
> Apr 17 13:02:55 localhost kernel: ---[ end trace fa484bd6d19ade88 ]---
> Apr 17 13:02:55 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 13:04:07 localhost mountd[22655]: Caught signal 15, un-registering and exiting.
> Apr 17 13:04:07 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 13:04:07 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 13:04:07 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 13:05:04 localhost mountd[22760]: Caught signal 15, un-registering and exiting.
> Apr 17 13:05:04 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 13:05:05 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 13:05:05 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 13:06:10 localhost mountd[22859]: Caught signal 15, un-registering and exiting.
> Apr 17 13:06:10 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 13:06:10 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 13:06:10 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 13:08:07 localhost mountd[22960]: Caught signal 15, un-registering and exiting.
> Apr 17 13:08:07 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 13:08:07 localhost kernel: ------------[ cut here ]------------
> Apr 17 13:08:07 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> Apr 17 13:08:07 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> Apr 17 13:08:07 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> Apr 17 13:08:07 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> Apr 17 13:08:07 localhost kernel: Pid: 23062, comm: lockd Tainted: G W 2.6.30-rc2 #3
> Apr 17 13:08:07 localhost kernel: Call Trace:
> Apr 17 13:08:07 localhost kernel: [] warn_slowpath+0x71/0xa0
> Apr 17 13:08:07 localhost kernel: [] ? update_curr+0x11d/0x125
> Apr 17 13:08:07 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 13:08:07 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> Apr 17 13:08:07 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> Apr 17 13:08:07 localhost kernel: [] __list_add+0x27/0x5c
> Apr 17 13:08:07 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> Apr 17 13:08:07 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> Apr 17 13:08:07 localhost kernel: [] ? lock_kernel+0x1c/0x28
> Apr 17 13:08:07 localhost kernel: [] lockd+0x64/0x164 [lockd]
> Apr 17 13:08:07 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 13:08:07 localhost kernel: [] ? complete+0x34/0x3e
> Apr 17 13:08:07 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 13:08:07 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 13:08:07 localhost kernel: [] kthread+0x45/0x6b
> Apr 17 13:08:07 localhost kernel: [] ? kthread+0x0/0x6b
> Apr 17 13:08:07 localhost kernel: [] kernel_thread_helper+0x7/0x10
> Apr 17 13:08:07 localhost kernel: ---[ end trace fa484bd6d19ade89 ]---
> Apr 17 13:08:07 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 13:08:07 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 14:39:45 localhost mountd[23074]: Caught signal 15, un-registering and exiting.
> Apr 17 14:39:45 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 14:39:45 localhost kernel: ------------[ cut here ]------------
> Apr 17 14:39:45 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> Apr 17 14:39:45 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> Apr 17 14:39:45 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> Apr 17 14:39:45 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> Apr 17 14:39:45 localhost kernel: Pid: 24287, comm: lockd Tainted: G W 2.6.30-rc2 #3
> Apr 17 14:39:45 localhost kernel: Call Trace:
> Apr 17 14:39:45 localhost kernel: [] warn_slowpath+0x71/0xa0
> Apr 17 14:39:45 localhost kernel: [] ? update_curr+0x11d/0x125
> Apr 17 14:39:45 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 14:39:45 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> Apr 17 14:39:45 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> Apr 17 14:39:45 localhost kernel: [] __list_add+0x27/0x5c
> Apr 17 14:39:45 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> Apr 17 14:39:45 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> Apr 17 14:39:45 localhost kernel: [] ? lock_kernel+0x1c/0x28
> Apr 17 14:39:45 localhost kernel: [] lockd+0x64/0x164 [lockd]
> Apr 17 14:39:45 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 14:39:45 localhost kernel: [] ? complete+0x34/0x3e
> Apr 17 14:39:45 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 14:39:45 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 14:39:45 localhost kernel: [] kthread+0x45/0x6b
> Apr 17 14:39:45 localhost kernel: [] ? kthread+0x0/0x6b
> Apr 17 14:39:45 localhost kernel: [] kernel_thread_helper+0x7/0x10
> Apr 17 14:39:45 localhost kernel: ---[ end trace fa484bd6d19ade8a ]---
> Apr 17 14:39:45 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 14:39:45 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 14:41:32 localhost mountd[24299]: Caught signal 15, un-registering and exiting.
> Apr 17 14:41:32 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 14:41:33 localhost kernel: ------------[ cut here ]------------
> Apr 17 14:41:33 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> Apr 17 14:41:33 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> Apr 17 14:41:33 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> Apr 17 14:41:33 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> Apr 17 14:41:33 localhost kernel: Pid: 24399, comm: lockd Tainted: G W 2.6.30-rc2 #3
> Apr 17 14:41:33 localhost kernel: Call Trace:
> Apr 17 14:41:33 localhost kernel: [] warn_slowpath+0x71/0xa0
> Apr 17 14:41:33 localhost kernel: [] ? update_curr+0x11d/0x125
> Apr 17 14:41:33 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 14:41:33 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> Apr 17 14:41:33 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> Apr 17 14:41:33 localhost kernel: [] __list_add+0x27/0x5c
> Apr 17 14:41:33 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> Apr 17 14:41:33 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> Apr 17 14:41:33 localhost kernel: [] ? lock_kernel+0x1c/0x28
> Apr 17 14:41:33 localhost kernel: [] lockd+0x64/0x164 [lockd]
> Apr 17 14:41:33 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 14:41:33 localhost kernel: [] ? complete+0x34/0x3e
> Apr 17 14:41:33 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 14:41:33 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 14:41:33 localhost kernel: [] kthread+0x45/0x6b
> Apr 17 14:41:33 localhost kernel: [] ? kthread+0x0/0x6b
> Apr 17 14:41:33 localhost kernel: [] kernel_thread_helper+0x7/0x10
> Apr 17 14:41:33 localhost kernel: ---[ end trace fa484bd6d19ade8b ]---
> Apr 17 14:41:33 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 14:41:33 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 14:42:16 localhost mountd[24411]: Caught signal 15, un-registering and exiting.
> Apr 17 14:42:17 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 14:42:17 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 14:42:17 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 14:42:52 localhost mountd[24508]: Caught signal 15, un-registering and exiting.
> Apr 17 14:42:52 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 14:42:53 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 14:42:53 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 14:43:28 localhost mountd[24602]: Caught signal 15, un-registering and exiting.
> Apr 17 14:43:28 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 14:43:29 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 14:43:29 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 14:43:59 localhost mountd[24697]: Caught signal 15, un-registering and exiting.
> Apr 17 14:43:59 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 14:44:00 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 14:44:00 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 14:44:28 localhost mountd[24791]: Caught signal 15, un-registering and exiting.
> Apr 17 14:44:28 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 14:44:29 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 14:44:29 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 14:45:33 localhost mountd[24885]: Caught signal 15, un-registering and exiting.
> Apr 17 14:45:33 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 14:45:34 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 14:45:34 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 14:46:05 localhost mountd[24988]: Caught signal 15, un-registering and exiting.
> Apr 17 14:46:05 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 14:46:05 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 14:46:05 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 14:46:34 localhost mountd[25082]: Caught signal 15, un-registering and exiting.
> Apr 17 14:46:34 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 14:46:35 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 14:46:35 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 15:35:01 localhost mountd[25176]: Caught signal 15, un-registering and exiting.
> Apr 17 15:35:02 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 15:35:02 localhost kernel: ------------[ cut here ]------------
> Apr 17 15:35:02 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> Apr 17 15:35:02 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> Apr 17 15:35:02 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> Apr 17 15:35:02 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> Apr 17 15:35:02 localhost kernel: Pid: 25883, comm: lockd Tainted: G W 2.6.30-rc2 #3
> Apr 17 15:35:02 localhost kernel: Call Trace:
> Apr 17 15:35:02 localhost kernel: [] warn_slowpath+0x71/0xa0
> Apr 17 15:35:02 localhost kernel: [] ? update_curr+0x11d/0x125
> Apr 17 15:35:02 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 15:35:02 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> Apr 17 15:35:02 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> Apr 17 15:35:02 localhost kernel: [] __list_add+0x27/0x5c
> Apr 17 15:35:02 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> Apr 17 15:35:02 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> Apr 17 15:35:02 localhost kernel: [] ? lock_kernel+0x1c/0x28
> Apr 17 15:35:02 localhost kernel: [] lockd+0x64/0x164 [lockd]
> Apr 17 15:35:02 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 15:35:02 localhost kernel: [] ? complete+0x34/0x3e
> Apr 17 15:35:02 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 15:35:02 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 15:35:02 localhost kernel: [] kthread+0x45/0x6b
> Apr 17 15:35:02 localhost kernel: [] ? kthread+0x0/0x6b
> Apr 17 15:35:02 localhost kernel: [] kernel_thread_helper+0x7/0x10
> Apr 17 15:35:02 localhost kernel: ---[ end trace fa484bd6d19ade8c ]---
> Apr 17 15:35:02 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 15:35:02 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 15:55:22 localhost mountd[25895]: Caught signal 15, un-registering and exiting.
> Apr 17 15:55:22 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 15:55:23 localhost kernel: ------------[ cut here ]------------
> Apr 17 15:55:23 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> Apr 17 15:55:23 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> Apr 17 15:55:23 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> Apr 17 15:55:23 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> Apr 17 15:55:23 localhost kernel: Pid: 26230, comm: lockd Tainted: G W 2.6.30-rc2 #3
> Apr 17 15:55:23 localhost kernel: Call Trace:
> Apr 17 15:55:23 localhost kernel: [] warn_slowpath+0x71/0xa0
> Apr 17 15:55:23 localhost kernel: [] ? update_curr+0x11d/0x125
> Apr 17 15:55:23 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 15:55:23 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> Apr 17 15:55:23 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> Apr 17 15:55:23 localhost kernel: [] __list_add+0x27/0x5c
> Apr 17 15:55:23 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> Apr 17 15:55:23 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> Apr 17 15:55:23 localhost kernel: [] ? lock_kernel+0x1c/0x28
> Apr 17 15:55:23 localhost kernel: [] lockd+0x64/0x164 [lockd]
> Apr 17 15:55:23 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 15:55:23 localhost kernel: [] ? complete+0x34/0x3e
> Apr 17 15:55:23 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 15:55:23 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 15:55:23 localhost kernel: [] kthread+0x45/0x6b
> Apr 17 15:55:23 localhost kernel: [] ? kthread+0x0/0x6b
> Apr 17 15:55:23 localhost kernel: [] kernel_thread_helper+0x7/0x10
> Apr 17 15:55:23 localhost kernel: ---[ end trace fa484bd6d19ade8d ]---
> Apr 17 15:55:23 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 15:55:23 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 16:54:27 localhost mountd[26242]: Caught signal 15, un-registering and exiting.
> Apr 17 16:54:27 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 16:54:28 localhost kernel: ------------[ cut here ]------------
> Apr 17 16:54:28 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> Apr 17 16:54:28 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> Apr 17 16:54:28 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> Apr 17 16:54:28 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> Apr 17 16:54:28 localhost kernel: Pid: 27044, comm: lockd Tainted: G W 2.6.30-rc2 #3
> Apr 17 16:54:28 localhost kernel: Call Trace:
> Apr 17 16:54:28 localhost kernel: [] warn_slowpath+0x71/0xa0
> Apr 17 16:54:28 localhost kernel: [] ? update_curr+0x11d/0x125
> Apr 17 16:54:28 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 16:54:28 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> Apr 17 16:54:28 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> Apr 17 16:54:28 localhost kernel: [] __list_add+0x27/0x5c
> Apr 17 16:54:28 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> Apr 17 16:54:28 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> Apr 17 16:54:28 localhost kernel: [] ? lock_kernel+0x1c/0x28
> Apr 17 16:54:28 localhost kernel: [] lockd+0x64/0x164 [lockd]
> Apr 17 16:54:28 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 16:54:28 localhost kernel: [] ? complete+0x34/0x3e
> Apr 17 16:54:28 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 16:54:28 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 16:54:28 localhost kernel: [] kthread+0x45/0x6b
> Apr 17 16:54:28 localhost kernel: [] ? kthread+0x0/0x6b
> Apr 17 16:54:28 localhost kernel: [] kernel_thread_helper+0x7/0x10
> Apr 17 16:54:28 localhost kernel: ---[ end trace fa484bd6d19ade8e ]---
> Apr 17 16:54:28 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 16:54:28 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 16:59:55 localhost mountd[27056]: Caught signal 15, un-registering and exiting.
> Apr 17 16:59:55 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 16:59:56 localhost kernel: ------------[ cut here ]------------
> Apr 17 16:59:56 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> Apr 17 16:59:56 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> Apr 17 16:59:56 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> Apr 17 16:59:56 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> Apr 17 16:59:56 localhost kernel: Pid: 27197, comm: lockd Tainted: G W 2.6.30-rc2 #3
> Apr 17 16:59:56 localhost kernel: Call Trace:
> Apr 17 16:59:56 localhost kernel: [] warn_slowpath+0x71/0xa0
> Apr 17 16:59:56 localhost kernel: [] ? update_curr+0x11d/0x125
> Apr 17 16:59:56 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 16:59:56 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> Apr 17 16:59:56 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> Apr 17 16:59:56 localhost kernel: [] __list_add+0x27/0x5c
> Apr 17 16:59:56 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> Apr 17 16:59:56 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> Apr 17 16:59:56 localhost kernel: [] ? lock_kernel+0x1c/0x28
> Apr 17 16:59:56 localhost kernel: [] lockd+0x64/0x164 [lockd]
> Apr 17 16:59:56 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 16:59:56 localhost kernel: [] ? complete+0x34/0x3e
> Apr 17 16:59:56 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 16:59:56 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 16:59:56 localhost kernel: [] kthread+0x45/0x6b
> Apr 17 16:59:56 localhost kernel: [] ? kthread+0x0/0x6b
> Apr 17 16:59:56 localhost kernel: [] kernel_thread_helper+0x7/0x10
> Apr 17 16:59:56 localhost kernel: ---[ end trace fa484bd6d19ade8f ]---
> Apr 17 16:59:56 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 16:59:56 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 17:02:50 localhost mountd[27209]: Caught signal 15, un-registering and exiting.
> Apr 17 17:02:50 localhost kernel: nfsd: last server has exited, flushing export cache
> Apr 17 17:02:51 localhost kernel: ------------[ cut here ]------------
> Apr 17 17:02:51 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
> Apr 17 17:02:51 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
> Apr 17 17:02:51 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
> Apr 17 17:02:51 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
> Apr 17 17:02:51 localhost kernel: Pid: 27349, comm: lockd Tainted: G W 2.6.30-rc2 #3
> Apr 17 17:02:51 localhost kernel: Call Trace:
> Apr 17 17:02:51 localhost kernel: [] warn_slowpath+0x71/0xa0
> Apr 17 17:02:51 localhost kernel: [] ? update_curr+0x11d/0x125
> Apr 17 17:02:51 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 17:02:51 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
> Apr 17 17:02:51 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
> Apr 17 17:02:51 localhost kernel: [] __list_add+0x27/0x5c
> Apr 17 17:02:51 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
> Apr 17 17:02:51 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
> Apr 17 17:02:51 localhost kernel: [] ? lock_kernel+0x1c/0x28
> Apr 17 17:02:51 localhost kernel: [] lockd+0x64/0x164 [lockd]
> Apr 17 17:02:51 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
> Apr 17 17:02:51 localhost kernel: [] ? complete+0x34/0x3e
> Apr 17 17:02:51 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 17:02:51 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
> Apr 17 17:02:51 localhost kernel: [] kthread+0x45/0x6b
> Apr 17 17:02:51 localhost kernel: [] ? kthread+0x0/0x6b
> Apr 17 17:02:51 localhost kernel: [] kernel_thread_helper+0x7/0x10
> Apr 17 17:02:51 localhost kernel: ---[ end trace fa484bd6d19ade90 ]---
> Apr 17 17:02:51 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Apr 17 17:02:51 localhost kernel: NFSD: starting 90-second grace period
> Apr 17 17:08:09 localhost mountd[27361]: authenticated mount request from 10.167.141.101:695 for /tmp/nfs3 (/tmp/nfs3)
>
> > --b.
> >
> >>> --b.
> >>>
> >>> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> >>> index abf8388..1a54ae1 100644
> >>> --- a/fs/lockd/svc.c
> >>> +++ b/fs/lockd/svc.c
> >>> @@ -104,6 +104,16 @@ static void set_grace_period(void)
> >>>  	schedule_delayed_work(&grace_period_end, grace_period);
> >>>  }
> >>>
> >>> +static void restart_grace(void)
> >>> +{
> >>> +	if (nlmsvc_ops) {
> >>> +		cancel_delayed_work_sync(&grace_period_end);
> >>> +		locks_end_grace(&lockd_manager);
> >>> +		nlmsvc_invalidate_all();
> >>> +		set_grace_period();
> >>> +	}
> >>> +}
> >>> +
> >>>  /*
> >>>   * This is the lockd kernel thread
> >>>   */
> >>> @@ -149,10 +159,7 @@ lockd(void *vrqstp)
> >>>
> >>>  		if (signalled()) {
> >>>  			flush_signals(current);
> >>> -			if (nlmsvc_ops) {
> >>> -				nlmsvc_invalidate_all();
> >>> -				set_grace_period();
> >>> -			}
> >>> +			restart_grace();
> >>>  			continue;
> >>>  		}
> >>>
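For comparison, here is a second user-space sketch in C, again not kernel code, that models the two remedies discussed in this thread: the end-then-start ordering used by restart_grace() in the patch, and the WARN_ON(!list_empty()) idea for locks_start_grace(). The names start_grace_model(), end_grace_model(), restart_grace_model(), count_entries() and struct lm_model are hypothetical stand-ins. There is no locking, no delayed work and no nlmsvc_invalidate_all() here, and refusing the duplicate add (rather than only warning about it) is this sketch's own choice, not something the patch does.

```c
#include <stdbool.h>

/* User-space model of a kernel-style list_head; not kernel code. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert @new right after @head. */
static void add_front(struct list_head *new, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = new;
	new->next = next;
	new->prev = head;
	head->next = new;
}

/* Unlink @e and point it at itself; harmless if @e is not on a list. */
static void del_init(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
	init_list(e);
}

/* Hypothetical stand-in for struct lock_manager: only its list linkage. */
struct lm_model {
	struct list_head list;
};

/* Model of locks_start_grace() plus a !list_empty() check on the entry.
 * Returns false, leaving the list untouched, if @lm is already listed. */
static bool start_grace_model(struct list_head *grace, struct lm_model *lm)
{
	if (!list_is_empty(&lm->list))
		return false;
	add_front(&lm->list, grace);
	return true;
}

/* Model of locks_end_grace() with list_del_init() semantics. */
static void end_grace_model(struct lm_model *lm)
{
	del_init(&lm->list);
}

/* Same order as restart_grace() in the patch: end, then start again.
 * (The real function also cancels the pending grace_period_end work and
 * calls nlmsvc_invalidate_all(); both are omitted from this model.) */
static bool restart_grace_model(struct list_head *grace, struct lm_model *lm)
{
	end_grace_model(lm);
	return start_grace_model(grace, lm);
}

/* Count entries on @head; return -1 if a back-pointer is wrong or more
 * than @limit entries are seen (a corrupted list may loop forever). */
static int count_entries(const struct list_head *head, int limit)
{
	int n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next) {
		if (p->next->prev != p || ++n > limit)
			return -1;
	}
	return n;
}
```

With the end-before-start ordering, restarting the same manager any number of times leaves one entry per manager on the list, and count_entries() confirms the back-pointers stay consistent; the extra check additionally turns a stray second start into a refused no-op instead of silent corruption.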