Return-Path: linux-nfs-owner@vger.kernel.org
Received: from mail-ie0-f176.google.com ([209.85.223.176]:50516 "EHLO mail-ie0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755779AbaCRSpN (ORCPT ); Tue, 18 Mar 2014 14:45:13 -0400
Received: by mail-ie0-f176.google.com with SMTP id rd18so7390898iec.7 for ; Tue, 18 Mar 2014 11:45:13 -0700 (PDT)
Message-ID: <1395168308.11244.3.camel@leira.trondhjem.org>
Subject: Re: [PATCH 1/2] SUNRPC: Ensure that call_connect times out correctly
From: Trond Myklebust
To: Steve Dickson
Cc: linux-nfs@vger.kernel.org
Date: Tue, 18 Mar 2014 14:45:08 -0400
In-Reply-To: <53288146.4010601@RedHat.com>
References: <1395081645-11906-1-git-send-email-trond.myklebust@primarydata.com> <53286A9D.2020007@RedHat.com> <362845B0-35A4-4DDF-96F6-42582D66334B@primarydata.com> <53288146.4010601@RedHat.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Tue, 2014-03-18 at 13:24 -0400, Steve Dickson wrote:
>
> On 03/18/2014 11:58 AM, Trond Myklebust wrote:
> >
> > On Mar 18, 2014, at 11:47, Steve Dickson wrote:
> >
> >> Hey,
> >>
> >> On 03/17/2014 02:40 PM, Trond Myklebust wrote:
> >>> When the server is unavailable due to a networking error, etc, we want
> >>> the RPC client to respect the timeout delays when attempting to reconnect.
> >>>
> >>> Fixes: 561ec1603171 (SUNRPC: call_connect_status should recheck bind..)
> >>> Signed-off-by: Trond Myklebust
> >>> ---
> >>>  net/sunrpc/clnt.c | 8 +++-----
> >>>  1 file changed, 3 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> >>> index 0edada973434..f22d3a115fda 100644
> >>> --- a/net/sunrpc/clnt.c
> >>> +++ b/net/sunrpc/clnt.c
> >>> @@ -1798,10 +1798,6 @@ call_connect_status(struct rpc_task *task)
> >>>          trace_rpc_connect_status(task, status);
> >>>          task->tk_status = 0;
> >>>          switch (status) {
> >>> -                /* if soft mounted, test if we've timed out */
> >>> -        case -ETIMEDOUT:
> >>> -                task->tk_action = call_timeout;
> >>> -                return;
> >>>          case -ECONNREFUSED:
> >>>          case -ECONNRESET:
> >>>          case -ECONNABORTED:
> >>> @@ -1812,7 +1808,9 @@ call_connect_status(struct rpc_task *task)
> >>>                  if (RPC_IS_SOFTCONN(task))
> >>>                          break;
> >>>          case -EAGAIN:
> >>> -                task->tk_action = call_bind;
> >>> +        case -ETIMEDOUT:
> >>> +                /* Check if we've timed out before looping back to call_bind */
> >>> +                task->tk_action = call_timeout;
> >>>                  return;
> >>>          case 0:
> >>>                  clnt->cl_stats->netreconn++;
> >>>
> >> How is this supposed to work if the trunking code still ignores timeouts?
> >>
> >> [ 2076.045176] NFS: nfs4_discover_server_trunking after status -110, retrying
> >
> > The above patch fixes the regression that Neil tracked down in Linux 3.12, and that
> > affects the generic RPC handling of soft timeouts.
> >
> > The trunking code's handling of ETIMEDOUT has been there since Linux 3.7
> > and hasn’t changed, so I really don’t see how it can have worked at one time before 3.12.
> Maybe it's been broken that long....
:-)
>
> But here is the obvious loop that hangs a mount forever:
>
> #8 [ffff88007a22b7e8] rpc_call_sync at ffffffffa0220210 [sunrpc]
> #9 [ffff88007a22b840] nfs4_proc_setclientid at ffffffffa0505c49 [nfsv4]
> #10 [ffff88007a22b988] nfs40_discover_server_trunking at ffffffffa0514489 [nfsv4]
> #11 [ffff88007a22b9d0] nfs4_discover_server_trunking at ffffffffa0516f2d [nfsv4]
> #12 [ffff88007a22ba28] nfs4_init_client at ffffffffa051e9a4 [nfsv4]
> #13 [ffff88007a22bb20] nfs_get_client at ffffffffa04bd6ba [nfs]
> #14 [ffff88007a22bb80] nfs4_set_client at ffffffffa051dfb0 [nfsv4]
> #15 [ffff88007a22bc00] nfs4_create_server at ffffffffa051f4ce [nfsv4]
> #16 [ffff88007a22bc88] nfs4_remote_mount at ffffffffa051790e [nfsv4]
> #17 [ffff88007a22bcb0] mount_fs at ffffffff811b3dd9
>
> The SETCLIENTID times out:
> NFS call setclientid auth=UNIX, 'Linux NFSv4.0 10.19.60.77/10.19.60.33 tcp'
> NFS reply setclientid: -110
>
> Then nfs4_discover_server_trunking() retries:
> NFS: nfs4_discover_server_trunking after status -110, retrying
>
> This happens when the server is down and so the connections
> fail with ECONNREFUSED:
> RPC: 2 call_connect_status (status -111)
>
> The mount system call never times out, which it did in the past.

Why should a mount system call time out other than perhaps in the case
of a soft mount?

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
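
For readers following the thread, here is a minimal userspace sketch of the control flow under discussion. It is not kernel code: `struct toy_task`, `toy_connect_status()`, `toy_timeout()` and `toy_run()` are hypothetical stand-ins, the retry budget is a simplification of the real RPC timeout parameters, and the choice of -ETIMEDOUT as the final status of a soft task is an assumption made for illustration. The sketch only models the idea in the patch, namely that a failed connect is routed through a timeout check before looping back to bind, so that a soft task eventually gives up while a hard task keeps retrying.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's struct rpc_task. */
enum toy_step { STEP_TRANSMIT, STEP_TIMEOUT_CHECK, STEP_EXIT };

struct toy_task {
    bool soft;          /* models a soft mount: give up on a major timeout */
    int  retries_left;  /* models the remaining retry budget (assumption) */
    int  result;        /* final status reported to the caller */
};

/*
 * Models the patched dispatch: connection errors, -EAGAIN and -ETIMEDOUT
 * all go through the timeout check instead of straight back to bind.
 */
static enum toy_step toy_connect_status(int status)
{
    switch (status) {
    case 0:
        return STEP_TRANSMIT;
    case -ECONNREFUSED:
    case -ECONNRESET:
    case -ECONNABORTED:
    case -EAGAIN:
    case -ETIMEDOUT:
        return STEP_TIMEOUT_CHECK;
    default:
        return STEP_EXIT;
    }
}

/* Models the timeout check: returns true if the task should retry. */
static bool toy_timeout(struct toy_task *task)
{
    assert(task != NULL);
    if (task->retries_left > 0) {
        task->retries_left--;       /* budget remains: retry */
        return true;
    }
    if (task->soft) {
        task->result = -ETIMEDOUT;  /* soft task out of budget: give up */
        return false;
    }
    return true;                    /* hard task: retry indefinitely */
}

/*
 * Simulates a server that always answers a connect with connect_status.
 * Returns the number of connect attempts made, capped at max_attempts so
 * that the hard-mount case terminates in this toy model.
 */
static int toy_run(struct toy_task *task, int connect_status, int max_attempts)
{
    int attempts = 0;

    while (attempts < max_attempts) {
        attempts++;
        switch (toy_connect_status(connect_status)) {
        case STEP_TRANSMIT:
            task->result = 0;
            return attempts;
        case STEP_TIMEOUT_CHECK:
            if (!toy_timeout(task))
                return attempts;
            break;                  /* loop back to bind and connect */
        default:
            task->result = connect_status;
            return attempts;
        }
    }
    return attempts;
}
```

In this model, a soft task with a budget of two retries that keeps seeing ECONNREFUSED stops on its third attempt with -ETIMEDOUT, while a hard task runs until the artificial cap. That matches the point of the question above: without a soft mount there is nothing in this path that would make the caller give up.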