Return-Path: linux-nfs-owner@vger.kernel.org
Received: from userp1040.oracle.com ([156.151.31.81]:18241 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758539Ab3JQSDX convert rfc822-to-8bit (ORCPT ); Thu, 17 Oct 2013 14:03:23 -0400
Content-Type: text/plain; charset=US-ASCII
Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\))
Subject: Re: 'umount -f /mnt/foo' fails if server IP is gone.
From: Chuck Lever
In-Reply-To: <52601FED.6070708@candelatech.com>
Date: Thu, 17 Oct 2013 14:03:05 -0400
Cc: Linux NFS Mailing List, Jeff Layton
Message-Id: <758E075A-2DCD-4456-848F-FD983FECF5E5@oracle.com>
References: <525D899F.5010604@candelatech.com> <52601FED.6070708@candelatech.com>
To: Ben Greear
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Oct 17, 2013, at 1:35 PM, Ben Greear wrote:

> On 10/15/2013 11:29 AM, Ben Greear wrote:
>> Is 'umount -f' supposed to always work, even if the file server
>> goes away?
>>
>> I have a user's system that just hangs forever in this case.
>>
>> Could be local changes we have made, but I'm curious about
>> the expected behaviour before I go digging too deep...
>
> Any input on this? I don't mind trying to fix it, but I
> would like to know how it is supposed to work.

Recent kernels emit a GETATTR at umount time. It is probably this operation that is stuck.

> Older kernels do not hang (we tried 3.0.x), but I'm not sure
> exactly where the problem started.
>
> Test case was to set up NFSv3 mount, then pull the Ethernet cable
> on the nfs client machine. This system is running 3.9.11+ kernel.
>
> From /proc/mounts:
>
> 10.2.46.90:/nfs_export on /mnt/lf/nfs3-001 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.2.46.90,mountvers=3,mountport=19408,mountproto=udp,srcaddr=10.2.46.91,local_lock=none,addr=10.2.46.90)
>
> # umount /mnt/lf/nfs3-001
> ^C
> # umount -f /mnt/lf/nfs3-001
> [hangs forever it seems, certainly for a long time]
>
>
> Here is a stack trace of hung processes, for instance:
>
> Oct 17 10:24:18 localhost kernel: [688601.930366] SysRq : Show Blocked State
> Oct 17 10:24:18 localhost kernel: [688601.931016] task PC stack pid father
> Oct 17 10:24:18 localhost kernel: [688601.931016] mkdir D f1bf6700 0 16898 16831 0x00000082
> Oct 17 10:24:18 localhost kernel: [688601.931016] f070bd8c 00000046 00000282 f1bf6700 f5b55a20 c0d7e400 f5b55a20 c0d7e400
> Oct 17 10:24:18 localhost kernel: [688601.931016] c0d7e400 c0d7e400 c0d7e400 f79e9400 f5b55a20 f79e9400 f5b55a20 f58b19c0
> Oct 17 10:24:18 localhost kernel: [688601.931016] f8dc4fd0 f070bd50 f0ce9924 f070bd50 f8ec6bff f070bd94 f8dbf9f7 ee91a138
> Oct 17 10:24:18 localhost kernel: [688601.931016] Call Trace:
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? rpc_put_task+0xf/0x20 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? nfs_initiate_write+0xb7/0xe0 [nfs]
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? ktime_get_ts+0x3e/0x110
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] schedule+0x23/0x60
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] io_schedule+0x76/0xc0
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] sleep_on_page+0xd/0x20
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] __wait_on_bit+0x4d/0x70
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? __lock_page+0x90/0x90
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] wait_on_page_bit+0x91/0xa0
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? wake_atomic_t_function+0x50/0x50
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] filemap_fdatawait_range+0xcb/0x150
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] filemap_write_and_wait_range+0x97/0xb0
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] nfs_file_fsync+0x44/0xa0 [nfs]
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? nfs_file_fsync_commit+0xb0/0xb0 [nfs]
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] vfs_fsync_range+0x59/0x70
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] vfs_fsync+0x27/0x30
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] nfs_file_flush+0x6b/0x90 [nfs]
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] filp_close+0x31/0x80
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] put_files_struct+0x85/0xe0
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] exit_files+0x47/0x60
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] do_exit+0x25c/0x980
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? SyS_stat64+0x2e/0x40
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] do_group_exit+0x3e/0xa0
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] SyS_exit_group+0x18/0x20
> Oct 17 10:24:18 localhost kernel: [688601.931016] [] sysenter_do_call+0x12/0x28
> Oct 17 10:24:18 localhost kernel: [688601.931016] umount.nfs D f11c4900 0 17150 17149 0x00000080
> Oct 17 10:24:18 localhost kernel: [688602.225057] f3955d00 00000082 efea0d8c f11c4900 f3955c8c c08d9f96 f104e700 c0d7e400
> Oct 17 10:24:18 localhost kernel: [688602.225057] c0d7e400 c0d7e400 c0d7e400 efea0d8c efea0c80 f79db400 f104e700 c0c3e980
> Oct 17 10:24:18 localhost kernel: [688602.225057] f3955cd0 f3955cb4 f3955e90 0000002c 0000005c 132df575 efea0d80 0000005c
> Oct 17 10:24:18 localhost kernel: [688602.225057] Call Trace:
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? __kfree_skb+0x36/0x90
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] schedule+0x23/0x60
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] rpc_wait_bit_killable+0x2d/0x70 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] __wait_on_bit+0x4d/0x70
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] out_of_line_wait_on_bit+0xab/0xc0
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? wake_atomic_t_function+0x50/0x50
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] __rpc_execute+0x11e/0x290 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? rpcproc_decode_null+0x10/0x10 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? rpcproc_decode_null+0x10/0x10 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? wake_up_bit+0x5f/0x70
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] rpc_execute+0x3c/0xa0 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] rpc_run_task+0x59/0x70 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] rpc_call_sync+0x42/0xa0 [sunrpc]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] nfs3_rpc_wrapper.clone.0+0x5c/0xa0 [nfsv3]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] nfs3_proc_getattr+0x34/0x40 [nfsv3]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] __nfs_revalidate_inode+0xc7/0x140 [nfs]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] nfs_revalidate_inode+0x2f/0x60 [nfs]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] nfs_weak_revalidate+0x38/0x50 [nfs]
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] complete_walk+0xa8/0xf0
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] path_lookupat+0x63/0x690
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] filename_lookup+0x2e/0xc0
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] user_path_at_empty+0x43/0x80
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? __d_free+0x2e/0x50
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? security_capable+0x1c/0x30
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] user_path_at+0x1f/0x30
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] SyS_umount+0x83/0x380
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? __audit_syscall_exit+0x1f6/0x290
> Oct 17 10:24:18 localhost kernel: [688602.225057] [] sysenter_do_call+0x12/0x28
>
> ....
>
> Oct 17 10:24:42 localhost kernel: [688631.186190] INFO: task mkdir:16898 blocked for more than 180 seconds.
> Oct 17 10:24:42 localhost kernel: [688631.195666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 17 10:24:42 localhost kernel: [688631.206304] mkdir D f1bf6700 0 16898 16831 0x00000082
> Oct 17 10:24:42 localhost kernel: [688631.215220] f070bd8c 00000046 00000282 f1bf6700 f5b55a20 c0d7e400 f5b55a20 c0d7e400
> Oct 17 10:24:42 localhost kernel: [688631.225933] c0d7e400 c0d7e400 c0d7e400 f79e9400 f5b55a20 f79e9400 f5b55a20 f58b19c0
> Oct 17 10:24:42 localhost kernel: [688631.236712] f8dc4fd0 f070bd50 f0ce9924 f070bd50 f8ec6bff f070bd94 f8dbf9f7 ee91a138
> Oct 17 10:24:42 localhost kernel: [688631.247550] Call Trace:
> Oct 17 10:24:42 localhost kernel: [688631.252746] [] ? rpc_put_task+0xf/0x20 [sunrpc]
> Oct 17 10:24:42 localhost kernel: [688631.261369] [] ? nfs_initiate_write+0xb7/0xe0 [nfs]
> Oct 17 10:24:42 localhost kernel: [688631.270065] [] ? ktime_get_ts+0x3e/0x110
> Oct 17 10:24:42 localhost kernel: [688631.277724] [] schedule+0x23/0x60
> Oct 17 10:24:42 localhost kernel: [688631.285298] [] io_schedule+0x76/0xc0
> Oct 17 10:24:42 localhost kernel: [688631.292738] [] sleep_on_page+0xd/0x20
> Oct 17 10:24:42 localhost kernel: [688631.300316] [] __wait_on_bit+0x4d/0x70
> Oct 17 10:24:42 localhost kernel: [688631.308117] [] ? __lock_page+0x90/0x90
> Oct 17 10:24:42 localhost kernel: [688631.315731] [] wait_on_page_bit+0x91/0xa0
> Oct 17 10:24:42 localhost kernel: [688631.323630] [] ? wake_atomic_t_function+0x50/0x50
> Oct 17 10:24:42 localhost kernel: [688631.332536] [] filemap_fdatawait_range+0xcb/0x150
> Oct 17 10:24:42 localhost kernel: [688631.341221] [] filemap_write_and_wait_range+0x97/0xb0
> Oct 17 10:24:42 localhost kernel: [688631.350224] [] nfs_file_fsync+0x44/0xa0 [nfs]
> Oct 17 10:24:42 localhost kernel: [688631.358569] [] ? nfs_file_fsync_commit+0xb0/0xb0 [nfs]
> Oct 17 10:24:42 localhost kernel: [688631.367764] [] vfs_fsync_range+0x59/0x70
> Oct 17 10:24:42 localhost kernel: [688631.375818] [] vfs_fsync+0x27/0x30
> Oct 17 10:24:42 localhost kernel: [688631.383346] [] nfs_file_flush+0x6b/0x90 [nfs]
> Oct 17 10:24:42 localhost kernel: [688631.392117] [] filp_close+0x31/0x80
> Oct 17 10:24:42 localhost kernel: [688631.399741] [] put_files_struct+0x85/0xe0
> Oct 17 10:24:42 localhost kernel: [688631.407871] [] exit_files+0x47/0x60
> Oct 17 10:24:42 localhost kernel: [688631.415535] [] do_exit+0x25c/0x980
> Oct 17 10:24:42 localhost kernel: [688631.423133] [] ? SyS_stat64+0x2e/0x40
> Oct 17 10:24:42 localhost kernel: [688631.431078] [] do_group_exit+0x3e/0xa0
> Oct 17 10:24:42 localhost kernel: [688631.439103] [] SyS_exit_group+0x18/0x20
> Oct 17 10:24:42 localhost kernel: [688631.447169] [] sysenter_do_call+0x12/0x28
> Oct 17 10:24:54 localhost kernel: [688643.517069] RPC: AUTH_GSS upcall timed out.
>
>
> Thanks,
> Ben
>
>
> --
> Ben Greear
> Candela Technologies Inc http://www.candelatech.com
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
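
A "Show Blocked State" dump like the one quoted above is easier to read once the unreliable frames (those marked "?") are dropped. The following is only a minimal Python sketch for doing that, not something posted in this thread: the helper name blocked_tasks and both regular expressions are hypothetical, and they assume the log layout shown in the quoted trace (a "D"-state task header followed by "symbol+0xoff/0xlen" frames). On the quoted umount.nfs task it leaves a chain that passes through nfs3_proc_getattr and ends in rpc_wait_bit_killable, which would be consistent with a GETATTR waiting on the unreachable server.

```python
import re

# Task header, e.g. "... ] umount.nfs D f11c4900 0 17150 17149 0x00000080".
# Assumption: only tasks in state "D" (uninterruptible sleep) are of interest.
TASK = re.compile(
    r"\]\s+(\S+)\s+D\s+[0-9a-f]+\s+\d+\s+(\d+)\s+\d+\s+0x[0-9a-f]+"
)

# Stack frame, e.g. "[] ? rpc_put_task+0xf/0x20 [sunrpc]".
# Group 1 is set when the frame carries the "?" (unreliable) marker.
FRAME = re.compile(
    r"\[[^\]]*\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def blocked_tasks(log_text):
    """Map (command, pid) to its reliable frames, innermost frame first."""
    tasks = {}
    current = None
    for line in log_text.splitlines():
        header = TASK.search(line)
        if header:
            current = (header.group(1), int(header.group(2)))
            tasks[current] = []  # a repeated dump of a task replaces the old one
            continue
        frame = FRAME.search(line)
        if frame and current is not None and not frame.group(1):
            tasks[current].append(frame.group(2))
    return tasks


# A few lines copied from the quoted umount.nfs trace.
SAMPLE = """\
Oct 17 10:24:18 localhost kernel: [688601.931016] umount.nfs D f11c4900 0 17150 17149 0x00000080
Oct 17 10:24:18 localhost kernel: [688602.225057] Call Trace:
Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? __kfree_skb+0x36/0x90
Oct 17 10:24:18 localhost kernel: [688602.225057] [] schedule+0x23/0x60
Oct 17 10:24:18 localhost kernel: [688602.225057] [] rpc_wait_bit_killable+0x2d/0x70 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] nfs3_proc_getattr+0x34/0x40 [nfsv3]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] SyS_umount+0x83/0x380
"""

for (comm, pid), frames in blocked_tasks(SAMPLE).items():
    print(comm, pid, " <- ".join(frames))
```

The patterns use search() rather than match(), so the same function should also accept the trace with its "> " quote prefixes or with real addresses inside the brackets.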