Received: by 2002:a25:1985:0:0:0:0:0 with SMTP id 127csp773892ybz; Fri, 17 Apr 2020 09:47:49 -0700 (PDT) X-Google-Smtp-Source: APiQypK8b8WNmSDXbJiA1dCJLxEujNBI5HEVAp42ed7P3uFemc9HZBedEA1QUd5FzBpGEA5Io7Lb X-Received: by 2002:a17:906:c9ce:: with SMTP id hk14mr3849602ejb.314.1587142069622; Fri, 17 Apr 2020 09:47:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1587142069; cv=none; d=google.com; s=arc-20160816; b=qawBmDCsT3aZMKBdPVA0kEIzqiIse5NDxCgdEhCZgU57eI1P5d2WVo/3t9aFqnZ9w0 vis7XHDDAAIAW3NgVkUrscLwZzR/bTW/JDxAHIrzoT/X6t4DyUyrbheT0gxew5rp7sZl pHBmw1uyhj5IQbZuq1kzLh6lz33HCbf5jukyCXE/YoWKEpXGV9IM2Zglguu4lncDmg+C qon/lexEej6DF/5fEDN0DOy15hq3kzOH5sdpg/mLMi6OyYP9Amb0OkFmGD6i3rIN6R1s mHmjAH5IzVeK1MtcNFfcpAnIdSrrM9cu8Vq64LSO1x1TZgWDKXdSo8dkj4nGPQnKJfEf vM2Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=cUliEBi+ERFAFP46IFgCR44yakpNosHPlLMv4V9Zq0o=; b=xnEvuZCQwLLTveatEf24LStaTRf8Gxbefp6OhszHcCzK96Wey7gUwG0yVmTtsjDcjh NOqoSxNTIH14PIdOb9n10uZeyUF6YrUi0OgHp/vOdxtAZsTYLAnxcvrk+7vhl6SWZDZJ 2k4FNRLCO/rbPDJiJQibFFtnxYZXCfFn+Zre/LHQxTc/4f/CF71jMHePrxww4vEyZrC+ SApM6RJWUTaLsMRnsq/MX9eKzhietHc7NaFHYFg5rYDifyG7G26WOSpqu/63UzeZYngf Y1GXjSyFHdZXn+4IPiC9XWBp+r9CNcl/wLeja3smVEWwss5hUBftYA1hhjsjNkrB9/S2 LXjA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=XkW+g8uJ; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id qh10si3112738ejb.49.2020.04.17.09.47.14; Fri, 17 Apr 2020 09:47:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=XkW+g8uJ; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726390AbgDQQrL (ORCPT + 99 others); Fri, 17 Apr 2020 12:47:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56760 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726387AbgDQQrL (ORCPT ); Fri, 17 Apr 2020 12:47:11 -0400 Received: from mail-wm1-x342.google.com (mail-wm1-x342.google.com [IPv6:2a00:1450:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B88A4C061A0C for ; Fri, 17 Apr 2020 09:47:10 -0700 (PDT) Received: by mail-wm1-x342.google.com with SMTP id u127so2602949wmg.1 for ; Fri, 17 Apr 2020 09:47:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=cUliEBi+ERFAFP46IFgCR44yakpNosHPlLMv4V9Zq0o=; b=XkW+g8uJ9yfAZkuRa67S40V7TNJ84Ams9c/2NJ11hpxzEcRvibV3zGjDVadlbDtmjH QF9m6ddNc/vnKrDGFYvmsxrPExVyKHO/TogeaevQHxLncC3DbS3z45nUhlu2xX7ykqZJ XYJHjJBckylgsYroC1D/Io8281XSMoiMWa9zmCzF7ysA3s8IAWPxirX+n33FdZMBlKHc amIxICsZvoC2xD/ELF7K5ijrE7aeNlnPpa307LxV6V1pBELplAMnQq8bIoiPUM5meBFn I0dvQUZWdJrr1manl5TPyeTs1dXMW0hYqXPBYHapi1aVzfnNawnhqW+8KBXUPAD79jQW nMaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=cUliEBi+ERFAFP46IFgCR44yakpNosHPlLMv4V9Zq0o=; b=etxxXyuyyC8pLfQ1L2I1k8SKjOcNew4lJccQkgVzGF246N8oX2oNe6544eO6ErEkxB oZEg0koBtnDsucLl8cVYAYpGmps79bXxtZbBk+1uB9hdSDYta/0uFp2RwWG6J1/d0HO1 RwcwXhrpdrOiK5ktvPI0jh+OzmJdYmqxczP5x7+J9p7/8R+YaQfL3wpt20OnqgLn8fM1 a2ipIc23SFAliDBrI1Gj9+GyPS1XIX9bnFs18lCYTGAyh7oLZQh89sKPYFjdStfIfJO0 S2Xl+RRC3MO0kWO0G27uWWL0QSlO8V46KI5SFsVzwpU1EjMZfkEIRPW9Hptn5f8xN5kX lWVg== X-Gm-Message-State: AGi0PuZbE9RQO25r+bY6J7Sv86i8uUpSdORZACGpMHIsYcrIGANlA9Hx imj9WXtLGhkPUkmiWpYuOYhv2Z1uU13Ctu2C0iBojA== X-Received: by 2002:a1c:dc0a:: with SMTP id t10mr4108434wmg.113.1587142029368; Fri, 17 Apr 2020 09:47:09 -0700 (PDT) MIME-Version: 1.0 References: <20200417151540.22111-1-olga.kornievskaia@gmail.com> <9c6c72708f360f543e2b8caaf56cc074aa825c96.camel@hammerspace.com> <7dd1b9300d2a0ec1a31fb3879c62a94f535ccad5.camel@hammerspace.com> In-Reply-To: <7dd1b9300d2a0ec1a31fb3879c62a94f535ccad5.camel@hammerspace.com> From: Olga Kornievskaia Date: Fri, 17 Apr 2020 12:46:58 -0400 Message-ID: Subject: Re: [PATCH 1/1] NFSv4.1: fix lone sequence transport assignment To: Trond Myklebust Cc: "linux-nfs@vger.kernel.org" , "anna.schumaker@netapp.com" Content-Type: text/plain; charset="UTF-8" Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On Fri, Apr 17, 2020 at 12:20 PM Trond Myklebust wrote: > > On Fri, 2020-04-17 at 11:43 -0400, Olga Kornievskaia wrote: > > Hi Trond, > > > > On Fri, Apr 17, 2020 at 11:31 AM Trond Myklebust > > wrote: > > > Hi Olga, > > > > > > On Fri, 2020-04-17 at 11:15 -0400, Olga Kornievskaia wrote: > > > > When nconnect is used, SEQUENCE operation currently isn't bound > > > > to > > > > a particular transport. The problem is created on an idle mount, > > > > where SEQUENCE is the only operation being sent and opened TPC > > > > connections are slowly being close from the lack of use. If > > > > SEQUENCE > > > > is not assigned to the main connection, the main connection can > > > > be closed and with that so is the back channel bound to that > > > > connection. > > > > > > > > Since the only way client handles callback_path down is by > > > > sending > > > > BIND_CONN_TO_SESSION requesting to bind both backchannel and fore > > > > channel on the connection that was left going, but that > > > > connection > > > > was already bound to only forechannel. According to the spec, > > > > it's > > > > not allowed to change channel binding after they are done. > > > > > > > > The fix is to make sure that a lone SEQUENCE always goes on the > > > > main connection, keeping backchannel alive. > > > > > > > > Fixes: 5a0c257f8 ("NFS: send state management on a single > > > > connection") > > > > Signed-off-by: Olga Kornievskaia > > > > --- > > > > fs/nfs/nfs4proc.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > > > > index 99e9f2e..461f85d 100644 > > > > --- a/fs/nfs/nfs4proc.c > > > > +++ b/fs/nfs/nfs4proc.c > > > > @@ -8857,7 +8857,7 @@ static struct rpc_task > > > > *_nfs41_proc_sequence(struct nfs_client *clp, > > > > .rpc_client = clp->cl_rpcclient, > > > > .rpc_message = &msg, > > > > .callback_ops = &nfs41_sequence_ops, > > > > - .flags = RPC_TASK_ASYNC | RPC_TASK_TIMEOUT, > > > > + .flags = RPC_TASK_ASYNC | RPC_TASK_TIMEOUT | > > > > RPC_TASK_NO_ROUND_ROBIN, > > > > }; > > > > struct rpc_task *ret; > > > > > > > > > > This works only in the case where the client is only sending > > > SEQUENCE > > > instructions. There are other cases where it could be sending out > > > other > > > operations that also renew the lease, but is doing it very > > > infrequently. Won't that also run into the same problem? > > > > Hm... I see so main channel can still go idle and close, when > > infrequent operations are happening on the other connections before > > round-robin-ing to the main connection.... > > > > > Is the fundamental problem here that we're not handling the > > > SEQ4_STATUS_CB_PATH_DOWN / SEQ4_STATUS_CB_PATH_DOWN_SESSION flags > > > correctly or is there something else going on? > > > > Yes the client doesn't recover properly. But the fix wasn't trivial > > to > > me (so I thought my patch was enough but I see it's not). Say client > > shuts down the main connection because it was idle. Now whatever > > operations goes on a different connection is going to get callback > > down. The only way the client can create a new backchannel (according > > to the spec) is if it creates a brand new connection and sends > > BIND_CONN_TO_SESSION there (all existing connections are already > > bound > > to fore channel and according to the spec you can't modify the > > existing binding). But then we'd need to make sure that it's the > > first > > one in the list of connections we iterate thru (as i think 1st marks > > the main connection?) as the other operations that supposed to only > > go > > on main connection need to know which connection to pick. > > > > The reason it's not seen against linux is because it doesn't follow > > the spec is doesn't reject attempts to bind a backchannel to an > > already existing connection that was only bound for fore channel. > > > > Oh, I see. So the server is replying NFS4ERR_INVAL in order to let the > client know that it is trying to change the channel bindings for that > connection. Well server isn't failing because client is asking for FORE_OR_BOTH and it's a choice so server is returning FORE. I'm not sure we can ask the server to fail the request with ERR_INVAL.... (rather I can ask but ) rather can we expect the server to do that always? > Hmm... Is there any reason why we can't just add a handler to > nfs4_bind_one_conn_to_session_done() that intercepts NFS4ERR_INVAL, and > disconnects the xprt before retrying? > We should probably add a wrapper to xprt_force_disconnect() in > include/linux/sunrpc/clnt.h. Something like the following? > > > static inline void rpc_task_close_connection(struct rpc_task *task) > { > if (task->tk_xprt) > xprt_force_disconnect(task->tk_xprt); > } > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com > >