From: Wang Chen
Subject: Re: [PATCH] nfs lockd: detect grace_list corruption
Date: Tue, 12 May 2009 08:43:41 +0800
Message-ID: <4A08C63D.5080002@cn.fujitsu.com>
References: <49F12D78.2040304@cn.fujitsu.com> <20090424231252.GD22477@fieldses.org> <4A0155A0.4020008@cn.fujitsu.com> <20090506203227.GM9861@fieldses.org> <4A0284EB.9050202@cn.fujitsu.com> <20090508182648.GD20539@fieldses.org> <4A07C21E.3020309@cn.fujitsu.com> <20090511205741.GH793@fieldses.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: neilb@suse.de, Trond.Myklebust@netapp.com, linux-nfs@vger.kernel.org, FNST-Bian Naimeng
To: "J. Bruce Fields"
Return-path:
Received: from cn.fujitsu.com ([222.73.24.84]:55381 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1753000AbZELAoZ (ORCPT ); Mon, 11 May 2009 20:44:25 -0400
In-Reply-To: <20090511205741.GH793@fieldses.org>
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

J. Bruce Fields said the following on 2009-5-12 4:57:
> On Mon, May 11, 2009 at 02:13:50PM +0800, Wang Chen wrote:
>> J. Bruce Fields said the following on 2009-5-9 2:26:
>>> On Thu, May 07, 2009 at 02:51:23PM +0800, Wang Chen wrote:
>>>> J. Bruce Fields said the following on 2009-5-7 4:32:
>>>>> On Wed, May 06, 2009 at 05:17:20PM +0800, Wang Chen wrote:
>>>>>> J. Bruce Fields said the following on 2009-4-25 7:12:
>>>>>>> On Fri, Apr 24, 2009 at 11:09:44AM +0800, Wang Chen wrote:
>>>>>>>> Although I can't reproduce it now, it really happened that some lock manager
>>>>>>>> started grace period but didn't end it.
>>>>>>>> This causes an lm entry be left in grace_list, and when service nfs restart,
>>>>>>>> the same lm will be added again into the list.
>>>>>>>> As you know, adding an entry, which is in the list, to a list will leads to
>>>>>>>> list corruption.
>>>>>>> I'd really like to understand why locks_end_grace() isn't being called.
>>>>>>> I'm probably overlooking something obvious, but I just can't see how
>>>>>>> lockd or nfsd can be shut down right now without locks_end_grace() being
>>>>>>> called.
>>>>>>>
>>>>>> Me neither can figure out why locks_end_grace() isn't being called.
>>>>>>
>>>>>> But do locks_start_grace() twice can trigger this warning too.
>>>>>> You can do
>>>>>> 1. service nfs restart
>>>>>> 2. (immediately) kill -s SIGKILL lockd
>>>>>> this can trigger
>>>>>> ---
>>>>>> lockd(void *vrqstp)
>>>>>> ...
>>>>>> 		if (signalled()) {
>>>>>> 			flush_signals(current);
>>>>>> 			if (nlmsvc_ops) {
>>>>>> 				nlmsvc_invalidate_all();
>>>>>> 				set_grace_period();
>>>>>> ---
>>>>>> and makes locks_start_grace() be called twice without locks_end_grace().
>>>>> Ah-hah!
>>>>>
>>>>>> So I still suggest to do something to protect the lm list. :)
>>>>> I wouldn't be opposed to a simple WARN_ON(!list_empty()) in
>>>>> locks_start_grace(), but I'm mainly worried about fixing the original
>>>>> bug. How about the following?
>>>>>
>>>> Yeah, the following fix is OK to me, although it only fixed
>>>> "start_grace again after start_grace" case.
>>> OK, thanks.
>>>
>>>> The bug about "quit lockd without end_grace", which I encountered before
>>>> incidentally, maybe is still there.
>>> You're talking about the report that started this thread?:
>>>
>>> http://marc.info/?l=linux-nfs&m=124054262421444&w=2
>>>
>> Yes. I mean this.
>>
>>> It looks to me like that could be explained by two start_grace's in a
>>> row.
>>>
>> But in that report, I didn't post the total message.
>> Here are something show that:
>> 1. not only lockd has the problem, but nfsd also.
>> 2. every time I do "service nfs restart", I got the warning, so this is not
>> "two start_grace's in a row" problem.
>
> Once the list is corrupted, it stays corrupted, so that's expected; the
> only interesting warning is the first one.
>
But as you can see in the logs, nfsd corrupted the list first.
Your fix only covers the "two start_grace's in a row" case in lockd.
> --b.
> >> Following is more message I got on last month. >> ------------------------------------------------------ >> Apr 16 16:35:41 localhost mountd[15061]: Caught signal 15, un-registering and exiting. >> Apr 16 16:35:42 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 16 16:35:43 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 16 16:35:43 localhost kernel: ------------[ cut here ]------------ >> Apr 16 16:35:43 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c() >> Apr 16 16:35:43 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2) >> Apr 16 16:35:43 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128). >> Apr 16 16:35:43 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode] >> Apr 16 16:35:43 localhost kernel: Pid: 17455, comm: rpc.nfsd Tainted: G W 2.6.30-rc2 #3 >> Apr 16 16:35:43 localhost kernel: Call Trace: >> Apr 16 16:35:43 localhost kernel: [] warn_slowpath+0x71/0xa0 >> Apr 16 16:35:43 localhost kernel: [] ? nfsd4_build_namelist+0x0/0x8e [nfsd] >> Apr 16 16:35:43 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 16 16:35:43 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa >> Apr 16 16:35:43 localhost kernel: [] ? mntput_no_expire+0x1c/0x101 >> Apr 16 16:35:43 localhost kernel: [] ? dput+0x35/0x103 >> Apr 16 16:35:43 localhost kernel: [] ? 
_raw_spin_lock+0x53/0xfa >> Apr 16 16:35:43 localhost kernel: [] __list_add+0x27/0x5c >> Apr 16 16:35:43 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd] >> Apr 16 16:35:43 localhost kernel: [] nfs4_state_start+0x7a/0xdd [nfsd] >> Apr 16 16:35:43 localhost kernel: [] nfsd_svc+0x57/0xf9 [nfsd] >> Apr 16 16:35:43 localhost kernel: [] ? write_threads+0x0/0x59 [nfsd] >> Apr 16 16:35:43 localhost kernel: [] write_threads+0x35/0x59 [nfsd] >> Apr 16 16:35:43 localhost kernel: [] nfsctl_transaction_write+0x3b/0x58 [nfsd] >> Apr 16 16:35:43 localhost kernel: [] ? nfsctl_transaction_write+0x0/0x58 [nfsd] >> Apr 16 16:35:43 localhost kernel: [] vfs_write+0x7c/0xad >> Apr 16 16:35:43 localhost kernel: [] sys_write+0x3b/0x60 >> Apr 16 16:35:43 localhost kernel: [] sysenter_do_call+0x12/0x3c >> Apr 16 16:35:43 localhost kernel: ---[ end trace fa484bd6d19ade87 ]--- >> Apr 16 16:35:43 localhost kernel: NFSD: starting 90-second grace period >> ...snip... >> Apr 17 13:02:54 localhost mountd[17468]: Caught signal 15, un-registering and exiting. >> Apr 17 13:02:54 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 13:02:55 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 13:02:55 localhost kernel: ------------[ cut here ]------------ >> Apr 17 13:02:55 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c() >> Apr 17 13:02:55 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2) >> Apr 17 13:02:55 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128). 
>> Apr 17 13:02:55 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode] >> Apr 17 13:02:55 localhost kernel: Pid: 22642, comm: rpc.nfsd Tainted: G W 2.6.30-rc2 #3 >> Apr 17 13:02:55 localhost kernel: Call Trace: >> Apr 17 13:02:55 localhost kernel: [] warn_slowpath+0x71/0xa0 >> Apr 17 13:02:55 localhost kernel: [] ? nfsd4_build_namelist+0x0/0x8e [nfsd] >> Apr 17 13:02:55 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 17 13:02:55 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa >> Apr 17 13:02:55 localhost kernel: [] ? mntput_no_expire+0x1c/0x101 >> Apr 17 13:02:55 localhost kernel: [] ? dput+0x35/0x103 >> Apr 17 13:02:55 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa >> Apr 17 13:02:55 localhost kernel: [] __list_add+0x27/0x5c >> Apr 17 13:02:55 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd] >> Apr 17 13:02:55 localhost kernel: [] nfs4_state_start+0x7a/0xdd [nfsd] >> Apr 17 13:02:55 localhost kernel: [] nfsd_svc+0x57/0xf9 [nfsd] >> Apr 17 13:02:55 localhost kernel: [] ? write_threads+0x0/0x59 [nfsd] >> Apr 17 13:02:55 localhost kernel: [] write_threads+0x35/0x59 [nfsd] >> Apr 17 13:02:55 localhost kernel: [] nfsctl_transaction_write+0x3b/0x58 [nfsd] >> Apr 17 13:02:55 localhost kernel: [] ? 
nfsctl_transaction_write+0x0/0x58 [nfsd] >> Apr 17 13:02:55 localhost kernel: [] vfs_write+0x7c/0xad >> Apr 17 13:02:55 localhost kernel: [] sys_write+0x3b/0x60 >> Apr 17 13:02:55 localhost kernel: [] sysenter_do_call+0x12/0x3c >> Apr 17 13:02:55 localhost kernel: ---[ end trace fa484bd6d19ade88 ]--- >> Apr 17 13:02:55 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 13:04:07 localhost mountd[22655]: Caught signal 15, un-registering and exiting. >> Apr 17 13:04:07 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 13:04:07 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 13:04:07 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 13:05:04 localhost mountd[22760]: Caught signal 15, un-registering and exiting. >> Apr 17 13:05:04 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 13:05:05 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 13:05:05 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 13:06:10 localhost mountd[22859]: Caught signal 15, un-registering and exiting. >> Apr 17 13:06:10 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 13:06:10 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 13:06:10 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 13:08:07 localhost mountd[22960]: Caught signal 15, un-registering and exiting. >> Apr 17 13:08:07 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 13:08:07 localhost kernel: ------------[ cut here ]------------ >> Apr 17 13:08:07 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c() >> Apr 17 13:08:07 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2) >> Apr 17 13:08:07 localhost kernel: list_add corruption. 
next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128). >> Apr 17 13:08:07 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode] >> Apr 17 13:08:07 localhost kernel: Pid: 23062, comm: lockd Tainted: G W 2.6.30-rc2 #3 >> Apr 17 13:08:07 localhost kernel: Call Trace: >> Apr 17 13:08:07 localhost kernel: [] warn_slowpath+0x71/0xa0 >> Apr 17 13:08:07 localhost kernel: [] ? update_curr+0x11d/0x125 >> Apr 17 13:08:07 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 17 13:08:07 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd >> Apr 17 13:08:07 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa >> Apr 17 13:08:07 localhost kernel: [] __list_add+0x27/0x5c >> Apr 17 13:08:07 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd] >> Apr 17 13:08:07 localhost kernel: [] set_grace_period+0x39/0x53 [lockd] >> Apr 17 13:08:07 localhost kernel: [] ? lock_kernel+0x1c/0x28 >> Apr 17 13:08:07 localhost kernel: [] lockd+0x64/0x164 [lockd] >> Apr 17 13:08:07 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 17 13:08:07 localhost kernel: [] ? complete+0x34/0x3e >> Apr 17 13:08:07 localhost kernel: [] ? lockd+0x0/0x164 [lockd] >> Apr 17 13:08:07 localhost kernel: [] ? lockd+0x0/0x164 [lockd] >> Apr 17 13:08:07 localhost kernel: [] kthread+0x45/0x6b >> Apr 17 13:08:07 localhost kernel: [] ? 
kthread+0x0/0x6b >> Apr 17 13:08:07 localhost kernel: [] kernel_thread_helper+0x7/0x10 >> Apr 17 13:08:07 localhost kernel: ---[ end trace fa484bd6d19ade89 ]--- >> Apr 17 13:08:07 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 13:08:07 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 14:39:45 localhost mountd[23074]: Caught signal 15, un-registering and exiting. >> Apr 17 14:39:45 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 14:39:45 localhost kernel: ------------[ cut here ]------------ >> Apr 17 14:39:45 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c() >> Apr 17 14:39:45 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2) >> Apr 17 14:39:45 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128). >> Apr 17 14:39:45 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode] >> Apr 17 14:39:45 localhost kernel: Pid: 24287, comm: lockd Tainted: G W 2.6.30-rc2 #3 >> Apr 17 14:39:45 localhost kernel: Call Trace: >> Apr 17 14:39:45 localhost kernel: [] warn_slowpath+0x71/0xa0 >> Apr 17 14:39:45 localhost kernel: [] ? update_curr+0x11d/0x125 >> Apr 17 14:39:45 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 17 14:39:45 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd >> Apr 17 14:39:45 localhost kernel: [] ? 
_raw_spin_lock+0x53/0xfa >> Apr 17 14:39:45 localhost kernel: [] __list_add+0x27/0x5c >> Apr 17 14:39:45 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd] >> Apr 17 14:39:45 localhost kernel: [] set_grace_period+0x39/0x53 [lockd] >> Apr 17 14:39:45 localhost kernel: [] ? lock_kernel+0x1c/0x28 >> Apr 17 14:39:45 localhost kernel: [] lockd+0x64/0x164 [lockd] >> Apr 17 14:39:45 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 17 14:39:45 localhost kernel: [] ? complete+0x34/0x3e >> Apr 17 14:39:45 localhost kernel: [] ? lockd+0x0/0x164 [lockd] >> Apr 17 14:39:45 localhost kernel: [] ? lockd+0x0/0x164 [lockd] >> Apr 17 14:39:45 localhost kernel: [] kthread+0x45/0x6b >> Apr 17 14:39:45 localhost kernel: [] ? kthread+0x0/0x6b >> Apr 17 14:39:45 localhost kernel: [] kernel_thread_helper+0x7/0x10 >> Apr 17 14:39:45 localhost kernel: ---[ end trace fa484bd6d19ade8a ]--- >> Apr 17 14:39:45 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 14:39:45 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 14:41:32 localhost mountd[24299]: Caught signal 15, un-registering and exiting. >> Apr 17 14:41:32 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 14:41:33 localhost kernel: ------------[ cut here ]------------ >> Apr 17 14:41:33 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c() >> Apr 17 14:41:33 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2) >> Apr 17 14:41:33 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128). 
>> Apr 17 14:41:33 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode] >> Apr 17 14:41:33 localhost kernel: Pid: 24399, comm: lockd Tainted: G W 2.6.30-rc2 #3 >> Apr 17 14:41:33 localhost kernel: Call Trace: >> Apr 17 14:41:33 localhost kernel: [] warn_slowpath+0x71/0xa0 >> Apr 17 14:41:33 localhost kernel: [] ? update_curr+0x11d/0x125 >> Apr 17 14:41:33 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 17 14:41:33 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd >> Apr 17 14:41:33 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa >> Apr 17 14:41:33 localhost kernel: [] __list_add+0x27/0x5c >> Apr 17 14:41:33 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd] >> Apr 17 14:41:33 localhost kernel: [] set_grace_period+0x39/0x53 [lockd] >> Apr 17 14:41:33 localhost kernel: [] ? lock_kernel+0x1c/0x28 >> Apr 17 14:41:33 localhost kernel: [] lockd+0x64/0x164 [lockd] >> Apr 17 14:41:33 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 17 14:41:33 localhost kernel: [] ? complete+0x34/0x3e >> Apr 17 14:41:33 localhost kernel: [] ? lockd+0x0/0x164 [lockd] >> Apr 17 14:41:33 localhost kernel: [] ? lockd+0x0/0x164 [lockd] >> Apr 17 14:41:33 localhost kernel: [] kthread+0x45/0x6b >> Apr 17 14:41:33 localhost kernel: [] ? 
kthread+0x0/0x6b >> Apr 17 14:41:33 localhost kernel: [] kernel_thread_helper+0x7/0x10 >> Apr 17 14:41:33 localhost kernel: ---[ end trace fa484bd6d19ade8b ]--- >> Apr 17 14:41:33 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 14:41:33 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 14:42:16 localhost mountd[24411]: Caught signal 15, un-registering and exiting. >> Apr 17 14:42:17 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 14:42:17 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 14:42:17 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 14:42:52 localhost mountd[24508]: Caught signal 15, un-registering and exiting. >> Apr 17 14:42:52 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 14:42:53 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 14:42:53 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 14:43:28 localhost mountd[24602]: Caught signal 15, un-registering and exiting. >> Apr 17 14:43:28 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 14:43:29 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 14:43:29 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 14:43:59 localhost mountd[24697]: Caught signal 15, un-registering and exiting. >> Apr 17 14:43:59 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 14:44:00 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 14:44:00 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 14:44:28 localhost mountd[24791]: Caught signal 15, un-registering and exiting. 
>> Apr 17 14:44:28 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 14:44:29 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 14:44:29 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 14:45:33 localhost mountd[24885]: Caught signal 15, un-registering and exiting. >> Apr 17 14:45:33 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 14:45:34 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 14:45:34 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 14:46:05 localhost mountd[24988]: Caught signal 15, un-registering and exiting. >> Apr 17 14:46:05 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 14:46:05 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 14:46:05 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 14:46:34 localhost mountd[25082]: Caught signal 15, un-registering and exiting. >> Apr 17 14:46:34 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 14:46:35 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 14:46:35 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 15:35:01 localhost mountd[25176]: Caught signal 15, un-registering and exiting. >> Apr 17 15:35:02 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 15:35:02 localhost kernel: ------------[ cut here ]------------ >> Apr 17 15:35:02 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c() >> Apr 17 15:35:02 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2) >> Apr 17 15:35:02 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128). 
>> Apr 17 15:35:02 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode] >> Apr 17 15:35:02 localhost kernel: Pid: 25883, comm: lockd Tainted: G W 2.6.30-rc2 #3 >> Apr 17 15:35:02 localhost kernel: Call Trace: >> Apr 17 15:35:02 localhost kernel: [] warn_slowpath+0x71/0xa0 >> Apr 17 15:35:02 localhost kernel: [] ? update_curr+0x11d/0x125 >> Apr 17 15:35:02 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 17 15:35:02 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd >> Apr 17 15:35:02 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa >> Apr 17 15:35:02 localhost kernel: [] __list_add+0x27/0x5c >> Apr 17 15:35:02 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd] >> Apr 17 15:35:02 localhost kernel: [] set_grace_period+0x39/0x53 [lockd] >> Apr 17 15:35:02 localhost kernel: [] ? lock_kernel+0x1c/0x28 >> Apr 17 15:35:02 localhost kernel: [] lockd+0x64/0x164 [lockd] >> Apr 17 15:35:02 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 17 15:35:02 localhost kernel: [] ? complete+0x34/0x3e >> Apr 17 15:35:02 localhost kernel: [] ? lockd+0x0/0x164 [lockd] >> Apr 17 15:35:02 localhost kernel: [] ? lockd+0x0/0x164 [lockd] >> Apr 17 15:35:02 localhost kernel: [] kthread+0x45/0x6b >> Apr 17 15:35:02 localhost kernel: [] ? 
kthread+0x0/0x6b >> Apr 17 15:35:02 localhost kernel: [] kernel_thread_helper+0x7/0x10 >> Apr 17 15:35:02 localhost kernel: ---[ end trace fa484bd6d19ade8c ]--- >> Apr 17 15:35:02 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 15:35:02 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 15:55:22 localhost mountd[25895]: Caught signal 15, un-registering and exiting. >> Apr 17 15:55:22 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 15:55:23 localhost kernel: ------------[ cut here ]------------ >> Apr 17 15:55:23 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c() >> Apr 17 15:55:23 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2) >> Apr 17 15:55:23 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128). >> Apr 17 15:55:23 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode] >> Apr 17 15:55:23 localhost kernel: Pid: 26230, comm: lockd Tainted: G W 2.6.30-rc2 #3 >> Apr 17 15:55:23 localhost kernel: Call Trace: >> Apr 17 15:55:23 localhost kernel: [] warn_slowpath+0x71/0xa0 >> Apr 17 15:55:23 localhost kernel: [] ? update_curr+0x11d/0x125 >> Apr 17 15:55:23 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 17 15:55:23 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd >> Apr 17 15:55:23 localhost kernel: [] ? 
_raw_spin_lock+0x53/0xfa >> Apr 17 15:55:23 localhost kernel: [] __list_add+0x27/0x5c >> Apr 17 15:55:23 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd] >> Apr 17 15:55:23 localhost kernel: [] set_grace_period+0x39/0x53 [lockd] >> Apr 17 15:55:23 localhost kernel: [] ? lock_kernel+0x1c/0x28 >> Apr 17 15:55:23 localhost kernel: [] lockd+0x64/0x164 [lockd] >> Apr 17 15:55:23 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 17 15:55:23 localhost kernel: [] ? complete+0x34/0x3e >> Apr 17 15:55:23 localhost kernel: [] ? lockd+0x0/0x164 [lockd] >> Apr 17 15:55:23 localhost kernel: [] ? lockd+0x0/0x164 [lockd] >> Apr 17 15:55:23 localhost kernel: [] kthread+0x45/0x6b >> Apr 17 15:55:23 localhost kernel: [] ? kthread+0x0/0x6b >> Apr 17 15:55:23 localhost kernel: [] kernel_thread_helper+0x7/0x10 >> Apr 17 15:55:23 localhost kernel: ---[ end trace fa484bd6d19ade8d ]--- >> Apr 17 15:55:23 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory >> Apr 17 15:55:23 localhost kernel: NFSD: starting 90-second grace period >> Apr 17 16:54:27 localhost mountd[26242]: Caught signal 15, un-registering and exiting. >> Apr 17 16:54:27 localhost kernel: nfsd: last server has exited, flushing export cache >> Apr 17 16:54:28 localhost kernel: ------------[ cut here ]------------ >> Apr 17 16:54:28 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c() >> Apr 17 16:54:28 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2) >> Apr 17 16:54:28 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128). 
>> Apr 17 16:54:28 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode] >> Apr 17 16:54:28 localhost kernel: Pid: 27044, comm: lockd Tainted: G W 2.6.30-rc2 #3 >> Apr 17 16:54:28 localhost kernel: Call Trace: >> Apr 17 16:54:28 localhost kernel: [] warn_slowpath+0x71/0xa0 >> Apr 17 16:54:28 localhost kernel: [] ? update_curr+0x11d/0x125 >> Apr 17 16:54:28 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 17 16:54:28 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd >> Apr 17 16:54:28 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa >> Apr 17 16:54:28 localhost kernel: [] __list_add+0x27/0x5c >> Apr 17 16:54:28 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd] >> Apr 17 16:54:28 localhost kernel: [] set_grace_period+0x39/0x53 [lockd] >> Apr 17 16:54:28 localhost kernel: [] ? lock_kernel+0x1c/0x28 >> Apr 17 16:54:28 localhost kernel: [] lockd+0x64/0x164 [lockd] >> Apr 17 16:54:28 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150 >> Apr 17 16:54:28 localhost kernel: [] ? complete+0x34/0x3e >> Apr 17 16:54:28 localhost kernel: [] ? lockd+0x0/0x164 [lockd] >> Apr 17 16:54:28 localhost kernel: [] ? lockd+0x0/0x164 [lockd] >> Apr 17 16:54:28 localhost kernel: [] kthread+0x45/0x6b >> Apr 17 16:54:28 localhost kernel: [] ? 
kthread+0x0/0x6b
>> Apr 17 16:54:28 localhost kernel: [] kernel_thread_helper+0x7/0x10
>> Apr 17 16:54:28 localhost kernel: ---[ end trace fa484bd6d19ade8e ]---
>> Apr 17 16:54:28 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
>> Apr 17 16:54:28 localhost kernel: NFSD: starting 90-second grace period
>> Apr 17 16:59:55 localhost mountd[27056]: Caught signal 15, un-registering and exiting.
>> Apr 17 16:59:55 localhost kernel: nfsd: last server has exited, flushing export cache
>> Apr 17 16:59:56 localhost kernel: ------------[ cut here ]------------
>> Apr 17 16:59:56 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
>> Apr 17 16:59:56 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
>> Apr 17 16:59:56 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
>> Apr 17 16:59:56 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
>> Apr 17 16:59:56 localhost kernel: Pid: 27197, comm: lockd Tainted: G W 2.6.30-rc2 #3
>> Apr 17 16:59:56 localhost kernel: Call Trace:
>> Apr 17 16:59:56 localhost kernel: [] warn_slowpath+0x71/0xa0
>> Apr 17 16:59:56 localhost kernel: [] ? update_curr+0x11d/0x125
>> Apr 17 16:59:56 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
>> Apr 17 16:59:56 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
>> Apr 17 16:59:56 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
>> Apr 17 16:59:56 localhost kernel: [] __list_add+0x27/0x5c
>> Apr 17 16:59:56 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
>> Apr 17 16:59:56 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
>> Apr 17 16:59:56 localhost kernel: [] ? lock_kernel+0x1c/0x28
>> Apr 17 16:59:56 localhost kernel: [] lockd+0x64/0x164 [lockd]
>> Apr 17 16:59:56 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
>> Apr 17 16:59:56 localhost kernel: [] ? complete+0x34/0x3e
>> Apr 17 16:59:56 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
>> Apr 17 16:59:56 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
>> Apr 17 16:59:56 localhost kernel: [] kthread+0x45/0x6b
>> Apr 17 16:59:56 localhost kernel: [] ? kthread+0x0/0x6b
>> Apr 17 16:59:56 localhost kernel: [] kernel_thread_helper+0x7/0x10
>> Apr 17 16:59:56 localhost kernel: ---[ end trace fa484bd6d19ade8f ]---
>> Apr 17 16:59:56 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
>> Apr 17 16:59:56 localhost kernel: NFSD: starting 90-second grace period
>> Apr 17 17:02:50 localhost mountd[27209]: Caught signal 15, un-registering and exiting.
>> Apr 17 17:02:50 localhost kernel: nfsd: last server has exited, flushing export cache
>> Apr 17 17:02:51 localhost kernel: ------------[ cut here ]------------
>> Apr 17 17:02:51 localhost kernel: WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
>> Apr 17 17:02:51 localhost kernel: Hardware name: Presario M2000 (PT365PA#AB2)
>> Apr 17 17:02:51 localhost kernel: list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
>> Apr 17 17:02:51 localhost kernel: Modules linked in: fuse i915 drm i2c_algo_bit nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 p4_clockmod dm_multipath uinput snd_intel8x0m snd_intel8x0 snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore 8139cp firewire_ohci firewire_core snd_page_alloc tifm_7xx1 i2c_i801 iTCO_wdt 8139too tifm_core i2c_core yenta_socket crc_itu_t iTCO_vendor_support pcspkr mii rsrc_nonstatic wmi video output ata_generic pata_acpi [last unloaded: microcode]
>> Apr 17 17:02:51 localhost kernel: Pid: 27349, comm: lockd Tainted: G W 2.6.30-rc2 #3
>> Apr 17 17:02:51 localhost kernel: Call Trace:
>> Apr 17 17:02:51 localhost kernel: [] warn_slowpath+0x71/0xa0
>> Apr 17 17:02:51 localhost kernel: [] ? update_curr+0x11d/0x125
>> Apr 17 17:02:51 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
>> Apr 17 17:02:51 localhost kernel: [] ? trace_hardirqs_on+0xb/0xd
>> Apr 17 17:02:51 localhost kernel: [] ? _raw_spin_lock+0x53/0xfa
>> Apr 17 17:02:51 localhost kernel: [] __list_add+0x27/0x5c
>> Apr 17 17:02:51 localhost kernel: [] locks_start_grace+0x22/0x30 [lockd]
>> Apr 17 17:02:51 localhost kernel: [] set_grace_period+0x39/0x53 [lockd]
>> Apr 17 17:02:51 localhost kernel: [] ? lock_kernel+0x1c/0x28
>> Apr 17 17:02:51 localhost kernel: [] lockd+0x64/0x164 [lockd]
>> Apr 17 17:02:51 localhost kernel: [] ? trace_hardirqs_on_caller+0x18/0x150
>> Apr 17 17:02:51 localhost kernel: [] ? complete+0x34/0x3e
>> Apr 17 17:02:51 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
>> Apr 17 17:02:51 localhost kernel: [] ? lockd+0x0/0x164 [lockd]
>> Apr 17 17:02:51 localhost kernel: [] kthread+0x45/0x6b
>> Apr 17 17:02:51 localhost kernel: [] ? kthread+0x0/0x6b
>> Apr 17 17:02:51 localhost kernel: [] kernel_thread_helper+0x7/0x10
>> Apr 17 17:02:51 localhost kernel: ---[ end trace fa484bd6d19ade90 ]---
>> Apr 17 17:02:51 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
>> Apr 17 17:02:51 localhost kernel: NFSD: starting 90-second grace period
>> Apr 17 17:08:09 localhost mountd[27361]: authenticated mount request from 10.167.141.101:695 for /tmp/nfs3 (/tmp/nfs3)
>>
>>> --b.
>>>
>>>>> --b.
>>>>>
>>>>> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
>>>>> index abf8388..1a54ae1 100644
>>>>> --- a/fs/lockd/svc.c
>>>>> +++ b/fs/lockd/svc.c
>>>>> @@ -104,6 +104,16 @@ static void set_grace_period(void)
>>>>>  	schedule_delayed_work(&grace_period_end, grace_period);
>>>>>  }
>>>>>
>>>>> +static void restart_grace(void)
>>>>> +{
>>>>> +	if (nlmsvc_ops) {
>>>>> +		cancel_delayed_work_sync(&grace_period_end);
>>>>> +		locks_end_grace(&lockd_manager);
>>>>> +		nlmsvc_invalidate_all();
>>>>> +		set_grace_period();
>>>>> +	}
>>>>> +}
>>>>> +
>>>>>  /*
>>>>>   * This is the lockd kernel thread
>>>>>   */
>>>>> @@ -149,10 +159,7 @@ lockd(void *vrqstp)
>>>>>
>>>>>  		if (signalled()) {
>>>>>  			flush_signals(current);
>>>>> -			if (nlmsvc_ops) {
>>>>> -				nlmsvc_invalidate_all();
>>>>> -				set_grace_period();
>>>>> -			}
>>>>> +			restart_grace();
>>>>>  			continue;
>>>>>  		}
>>>>>
>>>>>
>>>>>
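
For illustration only (this is not part of the patch): a minimal userspace sketch of how a second locks_start_grace() with no locks_end_grace() in between can stay silent at first and only produce the list_add warning on a later restart. It assumes locks_start_grace() is a plain list_add() onto grace_list and locks_end_grace() is a list_del_init(), which is how I read fs/lockd/grace.c. struct list_head is re-implemented here, and checked_list_add(), list_del_init_entry() and restart_after_double_start() are made-up names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct list_head (not kernel code). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/*
 * Modeled on the debug __list_add(): warn if the neighbours disagree,
 * then link the entry anyway. Returns false when a warning was printed.
 */
static bool checked_list_add(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head, *next = head->next;
	bool ok = true;

	if (next->prev != prev) {
		fprintf(stderr, "list_add corruption. next->prev should be "
			"prev (%p), but was %p. (next=%p).\n",
			(void *)prev, (void *)next->prev, (void *)next);
		ok = false;
	}
	if (prev->next != next) {
		fprintf(stderr, "list_add corruption. prev->next should be "
			"next (%p), but was %p. (prev=%p).\n",
			(void *)next, (void *)prev->next, (void *)prev);
		ok = false;
	}
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
	return ok;
}

/* Modeled on list_del_init(): unlink, then make the entry point at itself. */
static void list_del_init_entry(struct list_head *entry)
{
	struct list_head *prev = entry->prev, *next = entry->next;

	next->prev = prev;
	prev->next = next;
	init_list_head(entry);
}

/*
 * Hypothetical helper: replays start, start, end, start on one manager.
 * Returns the result of the last add (false means the warning fired).
 */
static bool restart_after_double_start(void)
{
	struct list_head grace_list, lm;
	bool first, second;

	init_list_head(&grace_list);
	init_list_head(&lm);

	first = checked_list_add(&lm, &grace_list);  /* normal start */
	second = checked_list_add(&lm, &grace_list); /* start again, no end */
	assert(first && second);                     /* both adds are silent */

	list_del_init_entry(&lm);                    /* end grace at shutdown */
	return checked_list_add(&lm, &grace_list);   /* start at next restart */
}
```

In this sketch the second add passes both checks but leaves the entry pointing at itself, so the list head still references it after list_del_init(). The warning then fires on the following add, with next->prev equal to next, which is the same shape as "but was ef8ff128. (next=ef8ff128)" in the log above.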