Subject: Re: POSIX violation by writeback error
From: Jeff Layton
To: 焦晓冬
Cc: bfields@fieldses.org, R.E.Wolff@bitwizard.nl, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Wed, 05 Sep 2018 06:55:15 -0400
Message-ID: <09ba078797a1327713e5c2d3111641246451c06e.camel@redhat.com>

On Wed, 2018-09-05 at 16:24 +0800, 焦晓冬 wrote:
> On Wed, Sep 5, 2018 at 4:18 AM Jeff Layton wrote:
> >
> > On Tue, 2018-09-04 at 14:54 -0400, J. Bruce Fields wrote:
> > > On Tue, Sep 04, 2018 at 06:23:48PM +0200, Rogier Wolff wrote:
> > > > On Tue, Sep 04, 2018 at 12:12:03PM -0400, J. Bruce Fields wrote:
> > > > > Well, I think the point was that in the above examples you'd prefer that
> > > > > the read just fail--no need to keep the data. A bit marking the file
> > > > > (or even the entire filesystem) unreadable would satisfy posix, I guess.
> > > > > Whether that's practical, I don't know.
> > > >
> > > > When you would do it like that (mark the whole filesystem as "in
> > > > error") things go from bad to worse even faster. The Linux kernel
> > > > tries to keep the system up even in the face of errors.
> > > >
> > > > With that suggestion, having one application run into a writeback
> > > > error would effectively crash the whole system because the filesystem
> > > > may be the root filesystem and stuff like "sshd" that you need to
> > > > diagnose the problem needs to be read from the disk....
> > >
> > > Well, the absolutist position on posix compliance here would be that a
> > > crash is still preferable to returning the wrong data. And for the
> > > cases 焦晓冬 gives, that sounds right? Maybe it's the wrong balance in
> > > general, I don't know. And we do already have filesystems with
> > > panic-on-error options, so if they aren't used then maybe users
> > > have already voted against that level of strictness.
> > >
> >
> > Yeah, idk. The problem here is that this is squarely in the domain of
> > implementation defined behavior. I do think that the current "policy"
> > (if you call it that) of what to do after a wb error is weird and wrong.
> > What we probably ought to do is start considering how we'd like it to
> > behave.
> >
> > How about something like this?
> >
> > Mark the pages as "uncleanable" after a writeback error. We'll satisfy
> > reads from the cached data until someone calls fsync, at which point
> > we'd return the error and invalidate the uncleanable pages.
>
> Totally agree with you.
>
> >
> > If no one calls fsync and scrapes the error, we'll hold on to it for as
> > long as we can (or up to some predefined limit) and then after that
> > we'll invalidate the uncleanable pages and start returning errors on
> > reads. If someone eventually calls fsync afterward, we can return to
> > normal operation.
>
> Agree with you except that using fsync() as `clear_error_mark()` seems
> weird and counter-intuitive.
>

That is essentially how fsync (and the errseq_t infrastructure) works.
Once the kernel has hit a wb error, it reports that error to fsync
exactly once per fd.
In practice, the errors are not "cleared", but it appears that way to
the fsync caller.

> >
> > As always though...what about mmap? Would we need to SIGBUS at the point
> > where we'd start returning errors on read()?
>
> I think SIGBUS to mmap() is the same thing as EIO to read().
>
> >
> > Would that approximate the current behavior enough and make sense?
> > Implementing it all sounds non-trivial though...
>
> No.
> No problem is reported because nowadays we are relying on the
> underlying disk drives. They transparently redirect bad sectors and
> use S.M.A.R.T. to warn us long before a real EIO could be seen.
> As to network filesystems, if I'm not wrong, close() op calls fsync()
> inside the implementation. So there is also no problem.

There is no requirement for a filesystem to flush data on close(). In
fact, most local filesystems do not. NFS does, but that's because it has
to in order to provide close-to-open cache consistency semantics.

-- 
Jeff Layton
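For illustration only: the "reported to fsync exactly once per fd, never really cleared" behavior can be modeled with a simplified, single-threaded sketch loosely based on the kernel's errseq_t (lib/errseq.c). The names eseq_record(), eseq_sample() and eseq_check_and_advance() are hypothetical stand-ins for errseq_set(), errseq_sample() and errseq_check_and_advance(). The bit layout (errno in the low 12 bits, a "seen" flag, a counter above it) is assumed to mirror the kernel's scheme, but the real code updates the word atomically with cmpxchg(), which this sketch omits.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical single-threaded model of errseq_t; no atomics. */
typedef unsigned int eseq_t;

#define ESEQ_MAX_ERRNO 4095u        /* low 12 bits: the errno value      */
#define ESEQ_SEEN      (1u << 12)   /* set once some fd has reported it  */
#define ESEQ_CTR_INC   (1u << 13)   /* counter occupies the high bits    */

/* Writeback failed on the mapping: record a negative errno such as -EIO. */
static void eseq_record(eseq_t *eseq, int err)
{
    assert(err < 0 && err >= -(int)ESEQ_MAX_ERRNO);
    eseq_t old = *eseq;
    eseq_t next = (old & ~(ESEQ_MAX_ERRNO | ESEQ_SEEN)) | (unsigned int)-err;

    /* Bump the counter only if the previous error was already reported,
     * so every fd that sampled earlier will see the value change. */
    if (old & ESEQ_SEEN)
        next += ESEQ_CTR_INC;
    *eseq = next;
}

/* open(): snapshot for a new file description. If the current error has
 * not been reported by anyone yet, start from 0 so this fd reports it. */
static eseq_t eseq_sample(const eseq_t *eseq)
{
    eseq_t cur = *eseq;
    return (cur & ESEQ_SEEN) ? cur : 0;
}

/* fsync(): report an error only if the value moved since this fd's
 * snapshot, then advance the snapshot. Nothing is ever cleared. */
static int eseq_check_and_advance(eseq_t *eseq, eseq_t *since)
{
    eseq_t cur = *eseq;
    if (cur == *since)
        return 0;
    cur |= ESEQ_SEEN;
    *eseq = cur;
    *since = cur;
    return -(int)(cur & ESEQ_MAX_ERRNO);
}
```

In this model two fds opened before a failure each get -EIO from their first fsync and 0 afterwards, while an fd opened after the error was already reported gets nothing, which is the "appears cleared to the fsync caller" effect.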
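Since close() is not required to flush, an application that cares about writeback errors has to call fsync() itself and check the result. A minimal POSIX sketch, assuming a hypothetical helper name write_file_durably() that is not part of any library:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path. Returns 0 only if every
 * write succeeded and fsync() reported no error; else a negative errno. */
static int write_file_durably(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && (buf != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted before writing: retry */
            int err = -errno;
            close(fd);
            return err;
        }
        p += n;
        len -= (size_t)n;
    }

    /* A writeback error such as EIO surfaces here; on most local
     * filesystems a bare close() would not wait for writeback at all. */
    if (fsync(fd) < 0) {
        int err = -errno;
        close(fd);
        return err;
    }

    /* Still check close(): some filesystems (NFS, for example) flush here. */
    if (close(fd) < 0)
        return -errno;
    return 0;
}
```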