From: Olga Kornievskaia
Date: Tue, 28 Apr 2020 22:06:44 -0400
Subject: Re: handling ERR_SERVERFAULT on RESTOREFH
To: Trond Myklebust
Cc: "linux-nfs@vger.kernel.org"

On Tue, Apr 28, 2020 at 7:42 PM Trond Myklebust wrote:
>
> On Tue, 2020-04-28 at 19:02 -0400, Olga Kornievskaia wrote:
> > On Tue, Apr 28, 2020 at 5:32 PM Trond Myklebust <
> > trondmy@hammerspace.com> wrote:
> > > On Tue, 2020-04-28 at 16:40 -0400, Olga Kornievskaia wrote:
> > > > On Tue, Apr 28, 2020 at 2:47 PM Trond Myklebust <
> > > > trondmy@hammerspace.com> wrote:
> > > > > Hi Olga,
> > > > >
> > > > > On Tue, 2020-04-28 at 14:14 -0400, Olga Kornievskaia wrote:
> > > > > > Hi folk,
> > > > > >
> > > > > > Looking for guidance on what folks think. A client is
> > > > > > sending a LINK operation to the server. This compound
> > > > > > after the LINK has RESTOREFH and GETATTR. Server returns
> > > > > > SERVER_FAULT on RESTOREFH. But LINK is done successfully.
> > > > > > Client still fails the system call with EIO. We have a
> > > > > > hardlink and "ln" saying hardlink failed.
> > > > > >
> > > > > > Should the client not fail the system call in this case?
> > > > > > The fact that we couldn't get up-to-date attributes
> > > > > > doesn't seem like a reason to fail the system call?
> > > > > >
> > > > > > Thank you.
> > > > >
> > > > > I don't really see this as worth fixing on the client. It is
> > > > > very clearly a server bug.
> > > >
> > > > Why is that a server bug? A server can legitimately have an
> > > > issue trying to execute an operation (RESTOREFH) and
> > > > legitimately return an error.
> > >
> > > If it is happening consistently on the server, then it is a bug,
> > > and it gets reported by the client in the same way we always
> > > report NFS4ERR_SERVERFAULT, by converting to an EREMOTEIO.
> >
> > Yes but the client doesn't retry so it can't assess if it's
> > consistently happening or not. It can be a transient error (or
> > ENOMEM) that's later resolved.
>
> If the server wants to signal a transient error, it should send
> NFS4ERR_DELAY.

ERR_DELAY is not an allowed error for RESTOREFH. But let's say the
server does return it: then the client is not following the spec,
because if it gets this error it will retry the whole compound
(causing a different error by redoing a non-idempotent operation).
The spec says the client is responsible for handling a partially
completed compound. The client should only retry the failed
operations in a compound; I don't see that the client does that.

> > > > NFS client also ignores errors of the returning GETATTR after
> > > > the RESTOREFH. So I'm not sure why we are then not ignoring
> > > > errors (or some errors) of the RESTOREFH.
> > >
> > > We do need to check the value of RESTOREFH in order to figure
> > > out if we can continue reading the XDR buffer to decode the file
> > > attributes. We want to read those file attributes because we do
> > > expect the change attribute, the ctime and the nlinks values to
> > > all change as a result of the operation.
> >
> > I have nothing against decoding the error and using it in a
> > decision to keep decoding. But the client doesn't have to
> > propagate the RESTOREFH error to the application?
> >
> > In all other non-idempotent operations that have other operations
> > (i.e. GETATTR) following them, the client ignores the errors. Btw I
> > just noticed that on the OPEN compound, since we ignore a decode
> > error from the GETATTR, it would continue decoding LAYOUTGET...
> >
> > CREATE has a problem if the following GETFH returns EDELAY. The
> > client doesn't deal with retrying a part of the compound. It
> > retries the whole compound. It leads to an error (since a
> > non-idempotent operation is retried). But I guess that's a 2nd
> > issue (or a 3rd if we count the decoding of LAYOUTGET)....
> >
> > All this is under the umbrella of how to handle errors on
> > non-idempotent operations in a compound....
>
> There is no point in trying to handle errors that make no sense. If
> the server has a bug, then let's expose it instead of trying to hide
> it in the sofa cushions.

EDELAY on GETFH is a reasonable error for the server to return.

>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
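
As a rough illustration of the client behaviour argued for above, here is a minimal Python sketch of classifying a partially completed compound. It is not the Linux client's code. The `OpResult` type, the `NON_IDEMPOTENT` set, the outcome labels, and both helper functions are hypothetical names made up for this example, and status codes are modelled as plain strings. The filehandle operations that would precede LINK or CREATE in a real compound are left out.

```python
from dataclasses import dataclass
from typing import List, Optional

# Status codes modelled as strings for readability (assumption: a real
# client would use the numeric nfsstat4 values).
NFS4_OK = "NFS4_OK"
NFS4ERR_DELAY = "NFS4ERR_DELAY"
NFS4ERR_SERVERFAULT = "NFS4ERR_SERVERFAULT"

# Illustrative subset of operations whose effect must not be replayed.
NON_IDEMPOTENT = {"LINK", "CREATE", "REMOVE", "RENAME"}


@dataclass
class OpResult:
    op: str      # operation name, e.g. "LINK"
    status: str  # status the server returned for this operation


def first_failure(results: List[OpResult]) -> Optional[int]:
    """Return the index of the first failed op, or None if all succeeded."""
    for i, r in enumerate(results):
        if r.status != NFS4_OK:
            return i
    return None


def syscall_outcome(results: List[OpResult]) -> str:
    """Classify a compound reply.

    "ok"             every operation succeeded
    "ok-attrs-stale" a non-idempotent op completed, a later op failed
    "error"          the failure came before any non-idempotent op completed
    """
    failed = first_failure(results)
    if failed is None:
        return "ok"
    done = any(r.op in NON_IDEMPOTENT for r in results[:failed])
    return "ok-attrs-stale" if done else "error"


def safe_to_resend_whole(results: List[OpResult]) -> bool:
    """True only if the compound stopped on NFS4ERR_DELAY before any
    non-idempotent operation had completed."""
    failed = first_failure(results)
    if failed is None or results[failed].status != NFS4ERR_DELAY:
        return False
    return not any(r.op in NON_IDEMPOTENT for r in results[:failed])


# The LINK case from this thread: LINK succeeded, RESTOREFH failed.
link_reply = [OpResult("LINK", NFS4_OK),
              OpResult("RESTOREFH", NFS4ERR_SERVERFAULT)]
# The CREATE case: CREATE succeeded, GETFH asked the client to wait.
create_reply = [OpResult("CREATE", NFS4_OK),
                OpResult("GETFH", NFS4ERR_DELAY)]
```

Under these assumptions the LINK reply is classified as "ok-attrs-stale": the link exists and only the cached attributes need revalidating. The CREATE reply is flagged as unsafe to resend in full, because doing so would replay CREATE. A real client that retried only the failed tail would also have to re-establish the current filehandle first, which this sketch does not model.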