Return-Path: linux-nfs-owner@vger.kernel.org
Received: from mail.candelatech.com ([208.74.158.172]:50004 "EHLO ns3.lanforge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756706Ab3JQRfm (ORCPT ); Thu, 17 Oct 2013 13:35:42 -0400
Received: from [192.168.100.236] (firewall.candelatech.com [70.89.124.249]) (authenticated bits=0) by ns3.lanforge.com (8.14.2/8.14.2) with ESMTP id r9HHZgDU000898 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Thu, 17 Oct 2013 10:35:42 -0700
Message-ID: <52601FED.6070708@candelatech.com>
Date: Thu, 17 Oct 2013 10:35:41 -0700
From: Ben Greear
MIME-Version: 1.0
To: "linux-nfs@vger.kernel.org"
Subject: Re: 'umount -f /mnt/foo' fails if server IP is gone.
References: <525D899F.5010604@candelatech.com>
In-Reply-To: <525D899F.5010604@candelatech.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On 10/15/2013 11:29 AM, Ben Greear wrote:
> Is 'umount -f' supposed to always work, even if the file server
> goes away?
>
> I have a user's system that just hangs forever in this case.
>
> Could be local changes we have made, but I'm curious about
> the expected behaviour before I go digging too deep...

Any input on this? I don't mind trying to fix it, but I would like
to know how it is supposed to work.

Older kernels do not hang (we tried 3.0.x), but I'm not sure exactly
where the problem started.

Test case was to set up NFSv3 mount, then pull the Ethernet cable on
the nfs client machine.

This system is running 3.9.11+ kernel.
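
For anyone reading the mount entry quoted below, here is a minimal Python sketch that pulls out the options most relevant to the hang. The helper name `parse_nfs_options` is hypothetical and only for illustration. It relies on the nfs(5) convention that `timeo` is given in tenths of a second, so `timeo=600` is 60 seconds per attempt, and that a `hard` mount keeps retrying requests rather than returning an error.

```python
def parse_nfs_options(mount_line):
    """Return the comma-separated options inside the trailing parentheses as a dict.

    Flag options such as 'hard' map to True; key=value options keep
    their value as a string.
    """
    start = mount_line.rindex("(")
    end = mount_line.rindex(")")
    options = {}
    for item in mount_line[start + 1:end].split(","):
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


# The mount entry exactly as quoted in this report.
MOUNT_LINE = (
    "10.2.46.90:/nfs_export on /mnt/lf/nfs3-001 type nfs "
    "(rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,"
    "proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.2.46.90,"
    "mountvers=3,mountport=19408,mountproto=udp,srcaddr=10.2.46.91,"
    "local_lock=none,addr=10.2.46.90)"
)

opts = parse_nfs_options(MOUNT_LINE)
# timeo is expressed in tenths of a second (nfs(5)).
timeout_seconds = int(opts["timeo"]) / 10

print("hard mount:", opts.get("hard", False))
print("per-request timeout (s):", timeout_seconds)
print("retrans:", opts["retrans"])
```

This only summarizes the mount configuration; it does not say anything about why 'umount -f' blocks.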

From /proc/mounts:

10.2.46.90:/nfs_export on /mnt/lf/nfs3-001 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.2.46.90,mountvers=3,mountport=19408,mountproto=udp,srcaddr=10.2.46.91,local_lock=none,addr=10.2.46.90)

# umount /mnt/lf/nfs3-001
^C
# umount -f /mnt/lf/nfs3-001
[hangs forever it seems, certainly for a long time]

Here is a stack trace of hung processes, for instance:

Oct 17 10:24:18 localhost kernel: [688601.930366] SysRq : Show Blocked State
Oct 17 10:24:18 localhost kernel: [688601.931016] task PC stack pid father
Oct 17 10:24:18 localhost kernel: [688601.931016] mkdir D f1bf6700 0 16898 16831 0x00000082
Oct 17 10:24:18 localhost kernel: [688601.931016] f070bd8c 00000046 00000282 f1bf6700 f5b55a20 c0d7e400 f5b55a20 c0d7e400
Oct 17 10:24:18 localhost kernel: [688601.931016] c0d7e400 c0d7e400 c0d7e400 f79e9400 f5b55a20 f79e9400 f5b55a20 f58b19c0
Oct 17 10:24:18 localhost kernel: [688601.931016] f8dc4fd0 f070bd50 f0ce9924 f070bd50 f8ec6bff f070bd94 f8dbf9f7 ee91a138
Oct 17 10:24:18 localhost kernel: [688601.931016] Call Trace:
Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? rpc_put_task+0xf/0x20 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? nfs_initiate_write+0xb7/0xe0 [nfs]
Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? ktime_get_ts+0x3e/0x110
Oct 17 10:24:18 localhost kernel: [688601.931016] [] schedule+0x23/0x60
Oct 17 10:24:18 localhost kernel: [688601.931016] [] io_schedule+0x76/0xc0
Oct 17 10:24:18 localhost kernel: [688601.931016] [] sleep_on_page+0xd/0x20
Oct 17 10:24:18 localhost kernel: [688601.931016] [] __wait_on_bit+0x4d/0x70
Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? __lock_page+0x90/0x90
Oct 17 10:24:18 localhost kernel: [688601.931016] [] wait_on_page_bit+0x91/0xa0
Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? wake_atomic_t_function+0x50/0x50
Oct 17 10:24:18 localhost kernel: [688601.931016] [] filemap_fdatawait_range+0xcb/0x150
Oct 17 10:24:18 localhost kernel: [688601.931016] [] filemap_write_and_wait_range+0x97/0xb0
Oct 17 10:24:18 localhost kernel: [688601.931016] [] nfs_file_fsync+0x44/0xa0 [nfs]
Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? nfs_file_fsync_commit+0xb0/0xb0 [nfs]
Oct 17 10:24:18 localhost kernel: [688601.931016] [] vfs_fsync_range+0x59/0x70
Oct 17 10:24:18 localhost kernel: [688601.931016] [] vfs_fsync+0x27/0x30
Oct 17 10:24:18 localhost kernel: [688601.931016] [] nfs_file_flush+0x6b/0x90 [nfs]
Oct 17 10:24:18 localhost kernel: [688601.931016] [] filp_close+0x31/0x80
Oct 17 10:24:18 localhost kernel: [688601.931016] [] put_files_struct+0x85/0xe0
Oct 17 10:24:18 localhost kernel: [688601.931016] [] exit_files+0x47/0x60
Oct 17 10:24:18 localhost kernel: [688601.931016] [] do_exit+0x25c/0x980
Oct 17 10:24:18 localhost kernel: [688601.931016] [] ? SyS_stat64+0x2e/0x40
Oct 17 10:24:18 localhost kernel: [688601.931016] [] do_group_exit+0x3e/0xa0
Oct 17 10:24:18 localhost kernel: [688601.931016] [] SyS_exit_group+0x18/0x20
Oct 17 10:24:18 localhost kernel: [688601.931016] [] sysenter_do_call+0x12/0x28
Oct 17 10:24:18 localhost kernel: [688601.931016] umount.nfs D f11c4900 0 17150 17149 0x00000080
Oct 17 10:24:18 localhost kernel: [688602.225057] f3955d00 00000082 efea0d8c f11c4900 f3955c8c c08d9f96 f104e700 c0d7e400
Oct 17 10:24:18 localhost kernel: [688602.225057] c0d7e400 c0d7e400 c0d7e400 efea0d8c efea0c80 f79db400 f104e700 c0c3e980
Oct 17 10:24:18 localhost kernel: [688602.225057] f3955cd0 f3955cb4 f3955e90 0000002c 0000005c 132df575 efea0d80 0000005c
Oct 17 10:24:18 localhost kernel: [688602.225057] Call Trace:
Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? __kfree_skb+0x36/0x90
Oct 17 10:24:18 localhost kernel: [688602.225057] [] schedule+0x23/0x60
Oct 17 10:24:18 localhost kernel: [688602.225057] [] rpc_wait_bit_killable+0x2d/0x70 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] __wait_on_bit+0x4d/0x70
Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] out_of_line_wait_on_bit+0xab/0xc0
Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? wake_atomic_t_function+0x50/0x50
Oct 17 10:24:18 localhost kernel: [688602.225057] [] __rpc_execute+0x11e/0x290 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? rpcproc_decode_null+0x10/0x10 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? rpcproc_decode_null+0x10/0x10 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? wake_up_bit+0x5f/0x70
Oct 17 10:24:18 localhost kernel: [688602.225057] [] rpc_execute+0x3c/0xa0 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] rpc_run_task+0x59/0x70 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] rpc_call_sync+0x42/0xa0 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] nfs3_rpc_wrapper.clone.0+0x5c/0xa0 [nfsv3]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] nfs3_proc_getattr+0x34/0x40 [nfsv3]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] __nfs_revalidate_inode+0xc7/0x140 [nfs]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] nfs_revalidate_inode+0x2f/0x60 [nfs]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] nfs_weak_revalidate+0x38/0x50 [nfs]
Oct 17 10:24:18 localhost kernel: [688602.225057] [] complete_walk+0xa8/0xf0
Oct 17 10:24:18 localhost kernel: [688602.225057] [] path_lookupat+0x63/0x690
Oct 17 10:24:18 localhost kernel: [688602.225057] [] filename_lookup+0x2e/0xc0
Oct 17 10:24:18 localhost kernel: [688602.225057] [] user_path_at_empty+0x43/0x80
Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? __d_free+0x2e/0x50
Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? security_capable+0x1c/0x30
Oct 17 10:24:18 localhost kernel: [688602.225057] [] user_path_at+0x1f/0x30
Oct 17 10:24:18 localhost kernel: [688602.225057] [] SyS_umount+0x83/0x380
Oct 17 10:24:18 localhost kernel: [688602.225057] [] ? __audit_syscall_exit+0x1f6/0x290
Oct 17 10:24:18 localhost kernel: [688602.225057] [] sysenter_do_call+0x12/0x28

....

Oct 17 10:24:42 localhost kernel: [688631.186190] INFO: task mkdir:16898 blocked for more than 180 seconds.
Oct 17 10:24:42 localhost kernel: [688631.195666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 10:24:42 localhost kernel: [688631.206304] mkdir D f1bf6700 0 16898 16831 0x00000082
Oct 17 10:24:42 localhost kernel: [688631.215220] f070bd8c 00000046 00000282 f1bf6700 f5b55a20 c0d7e400 f5b55a20 c0d7e400
Oct 17 10:24:42 localhost kernel: [688631.225933] c0d7e400 c0d7e400 c0d7e400 f79e9400 f5b55a20 f79e9400 f5b55a20 f58b19c0
Oct 17 10:24:42 localhost kernel: [688631.236712] f8dc4fd0 f070bd50 f0ce9924 f070bd50 f8ec6bff f070bd94 f8dbf9f7 ee91a138
Oct 17 10:24:42 localhost kernel: [688631.247550] Call Trace:
Oct 17 10:24:42 localhost kernel: [688631.252746] [] ? rpc_put_task+0xf/0x20 [sunrpc]
Oct 17 10:24:42 localhost kernel: [688631.261369] [] ? nfs_initiate_write+0xb7/0xe0 [nfs]
Oct 17 10:24:42 localhost kernel: [688631.270065] [] ? ktime_get_ts+0x3e/0x110
Oct 17 10:24:42 localhost kernel: [688631.277724] [] schedule+0x23/0x60
Oct 17 10:24:42 localhost kernel: [688631.285298] [] io_schedule+0x76/0xc0
Oct 17 10:24:42 localhost kernel: [688631.292738] [] sleep_on_page+0xd/0x20
Oct 17 10:24:42 localhost kernel: [688631.300316] [] __wait_on_bit+0x4d/0x70
Oct 17 10:24:42 localhost kernel: [688631.308117] [] ? __lock_page+0x90/0x90
Oct 17 10:24:42 localhost kernel: [688631.315731] [] wait_on_page_bit+0x91/0xa0
Oct 17 10:24:42 localhost kernel: [688631.323630] [] ? wake_atomic_t_function+0x50/0x50
Oct 17 10:24:42 localhost kernel: [688631.332536] [] filemap_fdatawait_range+0xcb/0x150
Oct 17 10:24:42 localhost kernel: [688631.341221] [] filemap_write_and_wait_range+0x97/0xb0
Oct 17 10:24:42 localhost kernel: [688631.350224] [] nfs_file_fsync+0x44/0xa0 [nfs]
Oct 17 10:24:42 localhost kernel: [688631.358569] [] ? nfs_file_fsync_commit+0xb0/0xb0 [nfs]
Oct 17 10:24:42 localhost kernel: [688631.367764] [] vfs_fsync_range+0x59/0x70
Oct 17 10:24:42 localhost kernel: [688631.375818] [] vfs_fsync+0x27/0x30
Oct 17 10:24:42 localhost kernel: [688631.383346] [] nfs_file_flush+0x6b/0x90 [nfs]
Oct 17 10:24:42 localhost kernel: [688631.392117] [] filp_close+0x31/0x80
Oct 17 10:24:42 localhost kernel: [688631.399741] [] put_files_struct+0x85/0xe0
Oct 17 10:24:42 localhost kernel: [688631.407871] [] exit_files+0x47/0x60
Oct 17 10:24:42 localhost kernel: [688631.415535] [] do_exit+0x25c/0x980
Oct 17 10:24:42 localhost kernel: [688631.423133] [] ? SyS_stat64+0x2e/0x40
Oct 17 10:24:42 localhost kernel: [688631.431078] [] do_group_exit+0x3e/0xa0
Oct 17 10:24:42 localhost kernel: [688631.439103] [] SyS_exit_group+0x18/0x20
Oct 17 10:24:42 localhost kernel: [688631.447169] [] sysenter_do_call+0x12/0x28
Oct 17 10:24:54 localhost kernel: [688643.517069] RPC: AUTH_GSS upcall timed out.

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc  http://www.candelatech.com