Subject: Re: POSIX violation by writeback error
From: Jeff Layton
To: 焦晓冬, R.E.Wolff@bitwizard.nl
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Tue, 04 Sep 2018 07:09:34 -0400

On Tue, 2018-09-04 at 16:58 +0800, 焦晓冬 wrote:
> On Tue, Sep 4, 2018 at 3:53 PM Rogier Wolff wrote:
> > ...
> > >
> > > Jlayton's patch is simple but wonderful idea towards correct error
> > > reporting. It seems one crucial thing is still here to be fixed. Does
> > > anyone have some idea?
> > >
> > > The crucial thing may be that a read() after a successful
> > > open()-write()-close() may return old data.
> > >
> > > That may happen where an async writeback error occurs after close()
> > > and the inode/mapping get evicted before read().
> >
> > Suppose I have 1Gb of RAM. Suppose I open a file, write 0.5Gb to it
> > and then close it. Then I repeat this 9 times.
> >
> > Now, when writing those files to storage fails, there is 5Gb of data
> > to remember and only 1Gb of RAM.
> >
> > I can choose any part of that 5Gb and try to read it.
> >
> > Please make a suggestion about where we should store that data?
>
> That is certainly not possible to be done. But at least, shall we report
> error on read()? Silently returning wrong data may cause further damage,
> such as removing wrong files since it was marked as garbage in the old file.
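In the quoted open()-write()-close() sequence, write() typically only copies data into the page cache. A writeback failure that happens later may therefore never be reported to the writer. The following is a minimal C sketch of the sequence that does give the application a chance to see such an error. It checks every write(), calls fsync() before close(), and checks close() as well. The helper name write_and_sync() is hypothetical. The calls themselves are standard POSIX.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and make sure any
 * writeback error is reported to the caller before the fd goes away.
 * Returns 0 on success or a positive errno value on failure. */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return errno;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            int err = errno;
            close(fd);
            return err;
        }
        /* write() usually just copies into the page cache; a short
         * write is possible, so advance and loop. */
        p += n;
        len -= (size_t)n;
    }

    /* Force writeback now; an I/O failure (e.g. EIO) is reported here
     * instead of being lost after close(). */
    if (fsync(fd) < 0) {
        int err = errno;
        close(fd);
        return err;
    }

    /* close() can report errors too (e.g. on NFS), so check it. */
    if (close(fd) < 0)
        return errno;
    return 0;
}
```

Even this only tells the writer about the failure. It does not change what a later read() returns, and that is the question the thread is actually about.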
Is the data wrong though? You tried to write and then that failed.
Eventually we want to be able to get at the data that's actually in the
file -- what is that point?

If I get an error back on a read, why should I think that it has
anything at all to do with writes that previously failed? It may even
have been written by a completely separate process that I had nothing at
all to do with.

> As I can see, that is all about error reporting.
>
> As for suggestion, maybe the error flag of inode/mapping, or the entire inode
> should not be evicted if there was an error. That hopefully won't take much
> memory. On extreme conditions, where too much error inode requires staying
> in memory, maybe we should panic rather than spread the error.
>
> >
> > In the easy case, where the data easily fits in RAM, you COULD write a
> > solution. But when the hardware fails, the SYSTEM will not be able to
> > follow the posix rules.
>
> Nope, we are able to follow the rules. The above is one way that follows the
> POSIX rules.

This is something we discussed at LSF this year.

We could attempt to keep dirty data around for a little while, at least
long enough to ensure that reads reflect earlier writes until the errors
can be scraped out by fsync. That would sort of redefine fsync from
being "ensure that my writes are flushed" to "synchronize my cache with
the current state of the file".

The problem of course is that applications are not required to do
fsync at all. At what point do we give up on it, and toss out the pages
that can't be cleaned?

We could allow for a tunable that does a kernel panic if writebacks fail
and the errors are never fetched via fsync, and we run out of memory. I
don't think that is something most users would want though.

Another thought: maybe we could OOM kill any process that has the file
open and then toss out the page data in that situation?

I'm wide open to (good) ideas here.
-- 
Jeff Layton
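The proposal in this message can be made concrete with a toy userspace model in C. It is not kernel code. In this model, pages whose writeback failed are retained so that reads still see the written data. fsync() reports the error once and then releases those pages. A per-file budget with a tunable overflow policy decides what happens when too many failed pages pile up. The policies are the options raised in the message: tossing the pages, a kernel panic, or an OOM kill of processes holding the file open. Every name here (struct file_model, model_writeback_failed(), model_fsync(), the budget field) is hypothetical and invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* What to do when more failed pages are retained than the budget allows. */
enum overflow_policy {
    POLICY_DISCARD,   /* toss the pages that can't be cleaned */
    POLICY_PANIC,     /* the "kernel panic" tunable from the mail */
    POLICY_OOM_KILL,  /* kill processes with the file open, then toss */
};

enum overflow_action {
    ACTION_NONE,
    ACTION_DISCARDED,
    ACTION_PANIC,
    ACTION_OOM_KILL,
};

/* Hypothetical per-file state; not a real kernel structure. */
struct file_model {
    size_t retained_pages;        /* failed-writeback pages kept for reads */
    bool error_pending;           /* error not yet reported via fsync() */
    size_t budget;                /* max pages we are willing to retain */
    enum overflow_policy policy;
};

/* Writeback of npages failed: keep them so later reads still see the data
 * that was written, unless that would exceed the budget. */
enum overflow_action model_writeback_failed(struct file_model *f, size_t npages)
{
    assert(f != NULL);
    f->retained_pages += npages;
    f->error_pending = true;
    if (f->retained_pages <= f->budget)
        return ACTION_NONE;

    switch (f->policy) {
    case POLICY_PANIC:
        return ACTION_PANIC;          /* a real panic would not return */
    case POLICY_OOM_KILL:
        f->retained_pages = 0;        /* openers killed, pages tossed */
        return ACTION_OOM_KILL;
    case POLICY_DISCARD:
    default:
        f->retained_pages = 0;        /* data lost, error still pending */
        return ACTION_DISCARDED;
    }
}

/* fsync(): report a pending error once, then release the retained pages. */
int model_fsync(struct file_model *f)
{
    assert(f != NULL);
    if (!f->error_pending)
        return 0;
    f->error_pending = false;
    f->retained_pages = 0;
    return -EIO;
}
```

With any finite budget, a file whose writer never calls fsync() eventually lands in the overflow branch. That is exactly the "at what point do we give up on it" question raised in the message.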