Return-Path: Received: from mx2.netapp.com ([216.240.18.37]:51713 "EHLO mx2.netapp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752466Ab0BEXFX (ORCPT ); Fri, 5 Feb 2010 18:05:23 -0500 Cc: "Alexandros Batsakis" , , Message-ID: <77EBFB14-A6B6-41DC-90DC-7A00548DFAEA@netapp.com> From: "Batsakis, Alexandros" To: "Chuck Lever" In-Reply-To: <4B6C9FA7.2010702@oracle.com> Content-Type: text/plain; format=flowed; delsp=yes; charset="us-ascii" Subject: Re: [PATCH 6/6] RPC: adjust timeout for connect, bind, restablish so that they sensitive to the major time out value Date: Fri, 5 Feb 2010 15:04:03 -0800 References: <1265155576-7618-1-git-send-email-batsakis@netapp.com> <1265155576-7618-2-git-send-email-batsakis@netapp.com> <1265155576-7618-3-git-send-email-batsakis@netapp.com> <1265155576-7618-4-git-send-email-batsakis@netapp.com> <1265155576-7618-5-git-send-email-batsakis@netapp.com> <1265155576-7618-6-git-send-email-batsakis@netapp.com> <1265155576-7618-7-git-send-email-batsakis@netapp.com> <4B6C7BCA.2040806@oracle.com> <383F4881-BD88-4155-B605-4D24F5B05BDD@netapp.com> <4B6C9FA7.2010702@oracle.com> Sender: linux-nfs-owner@vger.kernel.org List-ID: MIME-Version: 1.0 On Feb 5, 2010, at 14:47, "Chuck Lever" wrote: > On 02/05/2010 05:14 PM, Batsakis, Alexandros wrote: >> Yeah sure, >> >> So imagine that for a specific connection the remaining major timeo >> value is 30secs. Xs_connect has a default timeout before attempting >> to >> reconnect of 60secs. The user (NFS) expects to "hear back" from the >> rpc >> layer within the timeout as in often cases e.g. lease renewal, it's >> of >> no benefit for an operation to reach the server at a later time and >> miss >> the critical time because it was sleeping for an arbitrary amount >> of time. >> >> My patch ensures that the reconnection attempts happen within the >> user-specified limits by replacing the 60secs default timeout with a >> value that is less than the time to expire. If this is not ideal the >> user should readjust the timeout value. >> >> Does it make sense? > > That's better, but this explanation needs to accompany the patch, as > the long description. It's hard for reviewers who are not familiar > with the constraints on NFSv4 RENEW to understand what you're trying > to fix. Same comment applies to the other patches in your series, I > think. (And see further comments in the code below). > Yeah you are right about that > You are changing a generic part of the RPC client to fix an issue > with one specific NFSv4 procedure, it seems to me. Did you actually > observe a situation where the reconnect delay caused the server to > miss a RENEW request? > Yeah, it's very easy to reproduce too > There are good reasons why there is a timeout before reestablishing > a connection. Have you tested this patch with NFSv3 servers that > are going up and down repeatedly, for example? I think skipping the > reconnect delay could have consequences for the cases which the > original xs_connect logic is supposed to address, and it's not > something we want in many other cases. > I am not skipping the reconnect delay. What i am saying is that it seems wrong to me to hard-code the reconnection delay. Why 60secs for example ? To me it seems that this value should be related to the timeout. Do you disagree ? > Perhaps a better idea would be to mark these particular RPCs with > some kind of indication that the RPC client has to connect > immediately for this request, if possible. Similar to > RPC_TASK_SOFTCONN. > In general, sunrpc.ko has a problem with this kind of "urgent" RPC > request. If the RPC backlog queue is large for a particular > rpc_clnt, it can often take many seconds (like longer than the major > timeout) for a request to actually get down to the transport and get > sent. I don't see that these timeout changes necessarily address > that at all. > Bu if the timeout has expired rpc_execute will quit the task anyways. What is the downside of instead of sleeping for a long, arbitrary period to wake up and poll the server at intervals that actually make some sense to the client? > I would suggest that the best solution might be a separate transport > for such requests, but that has other consequences. Maybe our RPC > client needs a separate "urgent" request queue that bypasses the > normal backlog queue for such time-dependent requests? > A seperate transport for requests that will fail as soon as the timeout expires and not later? Because, as I said, the request is gonna fail anyways. I dont't think that this should be treated as a special case, but that's just my opinion. Maybe your experience says differently. I ll address the code comments when we aggree how to proceed. Thanks! -alexandros >> -alexandros >> >> On Feb 5, 2010, at 12:14, "Chuck Lever" >> wrote: >> >>> Hi Alexandros- >>> >>> I think we need a little more than the short description for these >>> patches. Can you explain why the extra logic in xs_connect is >>> necessary? What exactly happens if the RENEW doesn't reach the >>> server >>> because the transport can't be reconnected? >>> >>> In other words, what problem are you trying to address here? >>> >>> On 02/02/2010 07:06 PM, Alexandros Batsakis wrote: >>>> Signed-off-by: Alexandros Batsakis >>>> --- >>>> net/sunrpc/clnt.c | 2 +- >>>> net/sunrpc/sunrpc.h | 2 ++ >>>> net/sunrpc/xprt.c | 18 +++++++++++++++--- >>>> net/sunrpc/xprtsock.c | 12 ++++++++++-- >>>> 4 files changed, 28 insertions(+), 6 deletions(-) >>>> >>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c >>>> index 154034b..8e552cd 100644 >>>> --- a/net/sunrpc/clnt.c >>>> +++ b/net/sunrpc/clnt.c >>>> @@ -1019,7 +1019,7 @@ call_bind(struct rpc_task *task) >>>> task->tk_action = call_connect; >>>> if (!xprt_bound(xprt)) { >>>> task->tk_action = call_bind_status; >>>> - task->tk_timeout = xprt->bind_timeout; >>>> + xprt_set_timeout_sane(task, xprt->bind_timeout); >>>> xprt->ops->rpcbind(task); >>>> } >>>> } >>>> diff --git a/net/sunrpc/sunrpc.h b/net/sunrpc/sunrpc.h >>>> index 90c292e..87c7aaa 100644 >>>> --- a/net/sunrpc/sunrpc.h >>>> +++ b/net/sunrpc/sunrpc.h >>>> @@ -37,6 +37,8 @@ struct rpc_buffer { >>>> char data[]; >>>> }; >>>> >>>> +void xprt_set_timeout_sane(struct rpc_task *task, unsigned long >>>> timeout); >>>> + >>>> static inline int rpc_reply_expected(struct rpc_task *task) >>>> { >>>> return (task->tk_msg.rpc_proc != NULL)&& >>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c >>>> index 469de29..f6f3988 100644 >>>> --- a/net/sunrpc/xprt.c >>>> +++ b/net/sunrpc/xprt.c >>>> @@ -482,7 +482,7 @@ void xprt_wait_for_buffer_space(struct rpc_task >>>> *task, rpc_action action) >>>> struct rpc_rqst *req = task->tk_rqstp; >>>> struct rpc_xprt *xprt = req->rq_xprt; >>>> >>>> - task->tk_timeout = req->rq_timeout; >>>> + xprt_set_timeout_sane(task, req->rq_timeout); >>>> rpc_sleep_on(&xprt->pending, task, action); >>>> } >>>> EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space); >>>> @@ -518,7 +518,7 @@ EXPORT_SYMBOL_GPL(xprt_write_space); >>>> */ >>>> void xprt_set_retrans_timeout_def(struct rpc_task *task) >>>> { >>>> - task->tk_timeout = task->tk_rqstp->rq_timeout; >>>> + xprt_set_timeout_sane(task, task->tk_rqstp->rq_timeout); >>>> } >>>> EXPORT_SYMBOL_GPL(xprt_set_retrans_timeout_def); >>>> >>>> @@ -682,6 +682,18 @@ out_abort: >>>> spin_unlock(&xprt->transport_lock); >>>> } >>>> >>>> +void xprt_set_timeout_sane(struct rpc_task *task, unsigned long >>>> timeout) >>> >>> xprt_set_timeout_sane? That's amusing, but it isn't very descriptive >>> to someone who doesn't know why this extra function is needed. >>> >>>> +{ >>>> + unsigned long majorto = task->tk_rqstp->rq_majortimeo; >>>> + >>>> + if (majorto<= jiffies) >>>> + task->tk_timeout = 5 * HZ; >>>> + else if (timeout> majorto - jiffies) >>>> + task->tk_timeout = 2 * (majorto - jiffies) / 3; >>>> + else >>>> + task->tk_timeout = timeout; >>>> +} >>>> + >>>> /** >>>> * xprt_connect - schedule a transport connect operation >>>> * @task: RPC task that is requesting the connect >>>> @@ -710,7 +722,7 @@ void xprt_connect(struct rpc_task *task) >>>> if (task->tk_rqstp) >>>> task->tk_rqstp->rq_bytes_sent = 0; >>>> >>>> - task->tk_timeout = xprt->connect_timeout; >>>> + xprt_set_timeout_sane(task, xprt->connect_timeout); >>>> rpc_sleep_on(&xprt->pending, task, xprt_connect_status); >>>> xprt->stat.connect_start = jiffies; >>>> xprt->ops->connect(task); >>>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >>>> index 721bafd..7fcc97f 100644 >>>> --- a/net/sunrpc/xprtsock.c >>>> +++ b/net/sunrpc/xprtsock.c >>>> @@ -2020,9 +2020,17 @@ static void xs_connect(struct rpc_task >>>> *task) >>>> return; >>>> >>>> if (transport->sock != NULL&& !RPC_IS_SOFTCONN(task)) { >>>> + unsigned long majorto = task->tk_rqstp->rq_majortimeo; >>>> + >>>> + if (majorto<= jiffies) >>>> + xprt->reestablish_timeout = 5 * HZ; >>>> + else if (xprt->reestablish_timeout> majorto - jiffies) >>>> + xprt->reestablish_timeout = 2 * (majorto - jiffies) >>>> + / 3; >>> >>> This looks like the code you added above in _sane. Can you reuse >>> that >>> code here? >>> >>>> + >>>> dprintk("RPC: xs_connect delayed xprt %p for %lu " >>>> - "seconds\n", >>>> - xprt, xprt->reestablish_timeout / HZ); >>>> + "seconds\n", >>>> + xprt, xprt->reestablish_timeout + (HZ - 1) / HZ); >>>> queue_delayed_work(rpciod_workqueue, >>>> &transport->connect_worker, >>>> xprt->reestablish_timeout); >>> >>> >>> -- >>> chuck[dot]lever[at]oracle[dot]com > > > -- > chuck[dot]lever[at]oracle[dot]com