Date: Wed, 5 Sep 2018 10:51:42 +1000
From: Dave Chinner
To: Vito Caputo
Cc: Jeff Layton, "J.
Bruce Fields", Rogier Wolff, 焦晓冬, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: POSIX violation by writeback error
Message-ID: <20180905005141.GD27618@dastard>
References: <20180904075347.GH11854@BitWizard.nl>
 <82ffc434137c2ca47a8edefbe7007f5cbecd1cca.camel@redhat.com>
 <20180904161203.GD17478@fieldses.org>
 <20180904162348.GN17123@BitWizard.nl>
 <20180904185411.GA22166@fieldses.org>
 <20180904203534.yumaest6v5p6izln@shells.gnugeneration.com>
In-Reply-To: <20180904203534.yumaest6v5p6izln@shells.gnugeneration.com>

On Tue, Sep 04, 2018 at 01:35:34PM -0700, Vito Caputo wrote:
> On Tue, Sep 04, 2018 at 04:18:18PM -0400, Jeff Layton wrote:
> > On Tue, 2018-09-04 at 14:54 -0400, J. Bruce Fields wrote:
> > > On Tue, Sep 04, 2018 at 06:23:48PM +0200, Rogier Wolff wrote:
> > > > On Tue, Sep 04, 2018 at 12:12:03PM -0400, J. Bruce Fields wrote:
> > > > > Well, I think the point was that in the above examples you'd prefer that
> > > > > the read just fail--no need to keep the data. A bit marking the file
> > > > > (or even the entire filesystem) unreadable would satisfy posix, I guess.
> > > > > Whether that's practical, I don't know.
> > > >
> > > > When you would do it like that (mark the whole filesystem as "in
> > > > error") things go from bad to worse even faster. The Linux kernel
> > > > tries to keep the system up even in the face of errors.
> > > >
> > > > With that suggestion, having one application run into a writeback
> > > > error would effectively crash the whole system because the filesystem
> > > > may be the root filesystem and stuff like "sshd" that you need to
> > > > diagnose the problem needs to be read from the disk....
> > >
> > > Well, the absolutist position on posix compliance here would be that a
> > > crash is still preferable to returning the wrong data. And for the
> > > cases 焦晓冬 gives, that sounds right? Maybe it's the wrong balance in
> > > general, I don't know. And we do already have filesystems with
> > > panic-on-error options, so if they aren't used maybe then maybe users
> > > have already voted against that level of strictness.
> >
> >
> > Yeah, idk. The problem here is that this is squarely in the domain of
> > implementation defined behavior. I do think that the current "policy"
> > (if you call it that) of what to do after a wb error is weird and wrong.
> > What we probably ought to do is start considering how we'd like it to
> > behave.
> >
> > How about something like this?
> >
> > Mark the pages as "uncleanable" after a writeback error. We'll satisfy
> > reads from the cached data until someone calls fsync, at which point
> > we'd return the error and invalidate the uncleanable pages.
> >
> > If no one calls fsync and scrapes the error, we'll hold on to it for as
> > long as we can (or up to some predefined limit) and then after that
> > we'll invalidate the uncleanable pages and start returning errors on
> > reads. If someone eventually calls fsync afterward, we can return to
> > normal operation.
> >
> > As always though...what about mmap? Would we need to SIGBUS at the point
> > where we'd start returning errors on read()?
> >
> > Would that approximate the current behavior enough and make sense?
> > Implementing it all sounds non-trivial though...
> >
> >
> Here's a crazy and potentially stupid idea:
>
> Implement a new class of swap space for backing dirty pages which fail
> to write back. Pages in this space survive reboots, essentially backing
> the implicit commitment POSIX establishes in the face of asynchronous
> writeback errors. Rather than evicting these pages as clean, they are
> swapped out to the persistent swap.

And when that "swap" area gets write errors, too? What then? We're
straight back to the same "what the hell do we do with the error"
problem. Adding more turtles doesn't help solve this issue.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com