Date: Mon, 26 Aug 2019 21:13:54 -0400
From: "bfields@fieldses.org"
To: Trond Myklebust
Cc: "linux-nfs@vger.kernel.org", "bfields@redhat.com"
Subject: Re: [PATCH 0/3] Handling NFSv3 I/O errors in knfsd
Message-ID: <20190827011354.GB30827@fieldses.org>
References: <20190826165021.81075-1-trond.myklebust@hammerspace.com> <20190826205156.GA27834@fieldses.org> <20190827004811.GA30827@fieldses.org>
X-Mailing-List: linux-nfs@vger.kernel.org

On Tue, Aug 27, 2019 at 12:56:07AM +0000, Trond Myklebust wrote:
> On Mon, 2019-08-26 at 20:48 -0400, bfields@fieldses.org wrote:
> > On Mon, Aug 26, 2019 at 09:02:31PM +0000, Trond Myklebust wrote:
> > > On Mon, 2019-08-26 at 16:51 -0400, J. Bruce Fields wrote:
> > > > On Mon, Aug 26, 2019 at 12:50:18PM -0400, Trond Myklebust wrote:
> > > > > Note that if multiple clients were writing to the same file,
> > > > > then we probably want to bump the boot verifier anyway, since
> > > > > only one COMMIT will see the error report (because the cached
> > > > > file is also shared).
> > > >
> > > > I'm confused by the "probably should". So that's future work?
> > > > I guess it'd mean some additional work to identify that case.
> > > > You can't really even distinguish clients in the NFSv3 case, but
> > > > I suppose you could use IP address or TCP connection as an
> > > > approximation.
> > >
> > > I'm suggesting we should do this too, but I haven't done so yet in
> > > these patches. I'd like to hear other opinions (particularly from
> > > you, Chuck and Jeff).
> >
> > Does this process actually converge, or do we end up with all the
> > clients retrying the writes and, again, only one of them getting the
> > error?
>
> The client that gets the error should stop retrying if the error is
> fatal.

Have clients historically been good about that? I just wonder whether
it's a concern that boot-verifier-bumping could magnify the impact of
clients that are overly persistent about retrying IO errors.

> > I wonder what the typical errors are, anyway.
>
> I would expect ENOSPC, and EIO to be the most common. The former if
> delayed allocation and/or snapshots result in writes failing after
> writing to the page cache. The latter if we hit a disk outage or other
> such problem.

Makes sense.

--b.
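
For readers of the archive, the scenario discussed in this thread can be
illustrated with a toy model. This is only a minimal sketch under stated
assumptions, not knfsd code: ToyServer, ToyClient, simulate and the
bump_on_error flag are hypothetical names invented here. The model assumes
a writeback error on the shared file is reported to exactly one COMMIT, and
that a client resends its unstable writes when the verifier returned by
COMMIT differs from the one it saw at WRITE time.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyServer:
    """Hypothetical stand-in for a server exporting one shared file."""
    verifier: int = 1                    # models the boot (write) verifier
    pending_error: Optional[str] = None  # writeback error, reported once
    bump_on_error: bool = True           # models the proposed behavior

    def write_unstable(self) -> int:
        # An unstable WRITE reply carries the current verifier.
        return self.verifier

    def fail_writeback(self, err: str) -> None:
        # Assumption: the error is recorded on the shared cached file.
        self.pending_error = err

    def commit(self) -> Tuple[str, Optional[int]]:
        # Only the first COMMIT after the failure consumes the error.
        err, self.pending_error = self.pending_error, None
        if err is not None:
            if self.bump_on_error:
                self.verifier += 1
            return err, None
        return "OK", self.verifier


@dataclass
class ToyClient:
    name: str
    write_verf: Optional[int] = None

    def write(self, server: ToyServer) -> None:
        self.write_verf = server.write_unstable()

    def commit(self, server: ToyServer) -> str:
        status, verf = server.commit()
        if status != "OK":
            return "error:" + status     # fatal error: stop retrying
        if verf != self.write_verf:
            return "resend"              # verifier changed: data may be lost
        return "committed"


def simulate(bump_on_error: bool) -> dict:
    server = ToyServer(bump_on_error=bump_on_error)
    a, b = ToyClient("a"), ToyClient("b")
    a.write(server)
    b.write(server)
    server.fail_writeback("ENOSPC")
    return {"a": a.commit(server), "b": b.commit(server)}
```

In this model, simulate(False) leaves client b believing its data was
committed even though writeback failed, while simulate(True) makes b see a
changed verifier and resend. Whether that resend then converges is exactly
the open question raised above; the sketch does not attempt to answer it.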