Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp851700ybt; Wed, 8 Jul 2020 13:21:39 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxtAflDi9IJH7zIDxFnbu3Qe22l6VLiRodfDcZPtrLQs19VxinrL+2PpkEoj0O+Jw5PuE1d X-Received: by 2002:a17:906:5657:: with SMTP id v23mr55030897ejr.196.1594239699066; Wed, 08 Jul 2020 13:21:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1594239699; cv=none; d=google.com; s=arc-20160816; b=Cl6G364FwzdZE2dibfb9UiT6iqUUcqFbmu2FQlk5ZcnWMzF3GcpEtHs3orUjMxjnfG 8kH0WATQUE4YeExwonAq99PaO5C78J9d3kQ4qs74+B2Qzmqdkebwy9QRhN2ZkDsaWixj DBklCu0YHyaVyzP2abyW8365ckDuX4zwNhupWfBRGrn5hflr0I145ZcyF0sd/u3yujQF M5uJmRrKuICXfqJxZh+8NAcb+Ckop7k/lh1vU+ozHUIjGRNi2QU5HAsc/L3NpQUlkT/D FfIfV7OrTfS08Ji6XlO84eUlz7pa2CsDnl9pi9svxIk3qJPnuethElVpI6dwudV2V5AU CMJQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=I2PfEQPxgZY4hwjqV2T4P8YyR49aWSh/8TcC3UdSwTo=; b=ynf6Zb+csHhLvqHSB+KqXbprJOauAlL/vhez8UiWIIxGdFFyWyPmk0CGI9UEL7cwxR +FQhItz0PH/zizibMhGx4ymnyJ4J9uQMaRxDrh/XkifzZtbcna20hpjVtAXKoo3SaZQq 1RMKXy/0SKHehhz3GElYo3jj2/pTzOSFyMEXtLWcubUNWJw6JHTJkBGm2m+0RFwzjxEg E582oLkmB65N38nXctivn5HC1Aqd8N8y8CrN3CID3+w6KibjM5SKrZ2yu6GhKa/mgzir rXu2Gxx7KT84rZ+y9UiIVB7IgmwOwoTs2JU44rv1rYB9VSYWE+vcQcTSBjFrSySXWxLX RKRA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b="a6NM/8t/"; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id i2si609934eds.437.2020.07.08.13.21.05; Wed, 08 Jul 2020 13:21:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b="a6NM/8t/"; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726506AbgGHUTy (ORCPT + 99 others); Wed, 8 Jul 2020 16:19:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48096 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726122AbgGHUTx (ORCPT ); Wed, 8 Jul 2020 16:19:53 -0400 Received: from mail-ej1-x643.google.com (mail-ej1-x643.google.com [IPv6:2a00:1450:4864:20::643]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 97A5BC061A0B for ; Wed, 8 Jul 2020 13:19:53 -0700 (PDT) Received: by mail-ej1-x643.google.com with SMTP id dp18so51845574ejc.8 for ; Wed, 08 Jul 2020 13:19:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=I2PfEQPxgZY4hwjqV2T4P8YyR49aWSh/8TcC3UdSwTo=; b=a6NM/8t/C1nW0y2F+3GU/dJ3Mxc4RHm9xCUN9wLdl5rFCfpsmPTWG3bK5vW/jc/25K W19tGKWvwS9WWGX3BUT7geP0IqZzbWDPrYcsUQExQgLGgM1qT+ls+RDTGvMOFrpKubLy 730/MfPtnBN3BR2galbYIwDWqjuoMcX5g1YUeMY14OyPiJvcW7OMNn21M3eMC4zzUpXo NMyYYC9WXSlz9G2nYcC+BcM0GUgEAA4WkOkXd/dxZ5F4NyqcrLl1deMe5m1DooEJPkRo 9Lx/blJX0QKAzhPGdJccbkFlthgfBKhxByNBJEczHxd+W7MIrUDuNaKhdtViSyAA1Tpc eSgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=I2PfEQPxgZY4hwjqV2T4P8YyR49aWSh/8TcC3UdSwTo=; b=az6PW4y3bKBhaQC04t8NcKXvxcfu8t6LKMgdvKEoi9HK6vj+bLmMVX7+e27o8pnD5a AA/B+XTCHffbpUVejyGzzzKD++TrCEVuukbKBXOjpg953CNFpS7oSLo+Eu9c6qdC7ePl lXa+QiIONvLK2v65orxs8okub2jrJeJyF+iOt1ZV1lS48nblFCEazvPcOT20kFzU9ZSm ZVoVF2a+0nMlOtIRrSN9fSaJhQrWzlJsrxw3zFdGex8NRqPgScAcPbjDdadJfLdWf0Gn xLo0l8LVQt+asXcmukWjNEmvaan7ECGHCAG9tPj+tyuW9XxTeT++FAEwXeMD2905zVio DTzg== X-Gm-Message-State: AOAM531CWs3x4CEE2NU0ZUxwGxGRyclq1YkuoSIkmG+VHEkuICApGx7p JfG3sxfrfkUbll6lDWbsDY4hLA19ddd1A6FYoTphJw== X-Received: by 2002:a17:906:70d1:: with SMTP id g17mr51951860ejk.436.1594239592232; Wed, 08 Jul 2020 13:19:52 -0700 (PDT) MIME-Version: 1.0 References: <20200708155018.110150-1-Anna.Schumaker@Netapp.com> <20200708155018.110150-2-Anna.Schumaker@Netapp.com> <25e89e208bd3c6e44f8041d64c96be238b78c3b6.camel@hammerspace.com> In-Reply-To: From: Anna Schumaker Date: Wed, 8 Jul 2020 16:19:36 -0400 Message-ID: Subject: Re: [PATCH v1 1/1] NFS: Fix interrupted slots by sending a solo SEQUENCE operation To: Trond Myklebust Cc: "linux-nfs@vger.kernel.org" Content-Type: text/plain; charset="UTF-8" Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On Wed, Jul 8, 2020 at 12:08 PM Anna Schumaker wrote: > > On Wed, Jul 8, 2020 at 12:00 PM Trond Myklebust wrote: > > > > On Wed, 2020-07-08 at 11:50 -0400, schumaker.anna@gmail.com wrote: > > > From: Anna Schumaker > > > > > > We used to do this before 3453d5708b33, but this was changed to > > > better > > > handle the NFS4ERR_SEQ_MISORDERED error code. This commit fixed the > > > slot > > > re-use case when the server doesn't receive the interrupted > > > operation, > > > but if the server does receive the operation then it could still end > > > up > > > replying to the client with mis-matched operations from the reply > > > cache. > > > > > > We can fix this by sending a SEQUENCE to the server while recovering > > > from > > > a SEQ_MISORDERED error when we detect that we are in an interrupted > > > slot > > > situation. > > > > > > Fixes: 3453d5708b33 (NFSv4.1: Avoid false retries when RPC calls are > > > interrupted) > > > Signed-off-by: Anna Schumaker > > > --- > > > fs/nfs/nfs4proc.c | 17 +++++++++++++++-- > > > 1 file changed, 15 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > > > index e32717fd1169..5de41a5772f0 100644 > > > --- a/fs/nfs/nfs4proc.c > > > +++ b/fs/nfs/nfs4proc.c > > > @@ -774,6 +774,14 @@ static void nfs4_slot_sequence_acked(struct > > > nfs4_slot *slot, > > > slot->seq_nr_last_acked = seqnr; > > > } > > > > > > +static void nfs4_probe_sequence(struct nfs_client *client, const > > > struct cred *cred, > > > + struct nfs4_slot *slot) > > > +{ > > > + struct rpc_task *task = _nfs41_proc_sequence(client, cred, > > > slot, true); > > > + if (!IS_ERR(task)) > > > + rpc_wait_for_completion_task(task); > > > > Hmm... I am a little concerned about the wait here, since we don't know > > what kind of thread this is. I've been playing with this all afternoon. > > > > Any chance we could kick off a _nfs41_proc_sequence asynchronously, and > > then perhaps requeue the original task to wait for the next free slot? I haven't had much luck getting this to work. The asynchronous task is easy enough, but I haven't been able to get the original onto a new slot yet. Is there a good way to do this without a new call to nfs4_setup_sequence()? nfs41_sequence_process() only has the sequence_res available, and there are enough call sites that adding in sequence_args creates a lot of churn. > > I suppose one issue there would be if the 'original task is an earlier > > call to _nfs41_proc_sequence, but perhaps that can be worked around? I could use the rpc task to see if it's sending a sequence, and only do this if it's not. I don't know if there is a cleaner way to do this. Do you have any suggestions? Anna > > I'll try it and see what happens. Thanks for the feedback! > Anna > > > > > > +} > > > + > > > static int nfs41_sequence_process(struct rpc_task *task, > > > struct nfs4_sequence_res *res) > > > { > > > @@ -790,6 +798,7 @@ static int nfs41_sequence_process(struct rpc_task > > > *task, > > > goto out; > > > > > > session = slot->table->session; > > > + clp = session->clp; > > > > > > trace_nfs4_sequence_done(session, res); > > > > > > @@ -804,7 +813,6 @@ static int nfs41_sequence_process(struct rpc_task > > > *task, > > > nfs4_slot_sequence_acked(slot, slot->seq_nr); > > > /* Update the slot's sequence and clientid lease timer > > > */ > > > slot->seq_done = 1; > > > - clp = session->clp; > > > do_renew_lease(clp, res->sr_timestamp); > > > /* Check sequence flags */ > > > nfs41_handle_sequence_flag_errors(clp, res- > > > >sr_status_flags, > > > @@ -852,10 +860,15 @@ static int nfs41_sequence_process(struct > > > rpc_task *task, > > > /* > > > * Were one or more calls using this slot interrupted? > > > * If the server never received the request, then our > > > - * transmitted slot sequence number may be too high. > > > + * transmitted slot sequence number may be too high. > > > However, > > > + * if the server did receive the request then it might > > > + * accidentally give us a reply with a mismatched > > > operation. > > > + * We can sort this out by sending a lone sequence > > > operation > > > + * to the server on the same slot. > > > */ > > > if ((s32)(slot->seq_nr - slot->seq_nr_last_acked) > 1) > > > { > > > slot->seq_nr--; > > > + nfs4_probe_sequence(clp, task->tk_msg.rpc_cred, > > > slot); > > > goto retry_nowait; > > > } > > > /* > > -- > > Trond Myklebust > > CTO, Hammerspace Inc > > 4984 El Camino Real, Suite 208 > > Los Altos, CA 94022 > > www.hammer.space > > > >