Date: Tue, 5 Jan 2021 11:26:46 -0500
From: Vivek Goyal
To: Amir Goldstein
Cc: Matthew Wilcox, Sargun Dhillon, linux-fsdevel, linux-kernel,
 overlayfs, Jeff Layton, Miklos Szeredi, Jan Kara, NeilBrown, Al Viro,
 Christoph Hellwig, Chengguang Xu
Subject: Re: [PATCH 3/3] overlayfs: Report writeback errors on upper
Message-ID: <20210105162646.GD3200@redhat.com>
References: <20201223200746.GR874@casper.infradead.org>
 <20201223202140.GB11012@ircssh-2.c.rugged-nimbus-611.internal>
 <20201223204428.GS874@casper.infradead.org>
 <20210104151424.GA63879@redhat.com>
 <20210104154015.GA73873@redhat.com>
 <20210104224447.GG63879@redhat.com>

On Tue, Jan 05, 2021 at 09:11:23AM +0200, Amir Goldstein wrote:
> > >
> > > What I would rather see is:
> > > - Non-volatile: first syncfs in every container gets an error (nice to have)
> >
> > I am not sure why we are making this behavior per container. This should
> > be no different from the current semantics we have for syncfs() on a
> > regular filesystem. And that will provide what you are looking for. If you
> > want a single error to be reported in all overlay mounts, then make
> > sure you have one fd open in each mount after mount, then call syncfs()
> > on that fd.
> >
>
> Ok.
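[ For concreteness, a minimal userspace sketch of that "one fd per
  mount, then syncfs() on that fd" pattern. The mount path is made up
  and error handling is trimmed; this only illustrates the API and is
  not part of any patch. ]

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        /* Hypothetical overlay mount point, opened right after mount */
        int fd = open("/containers/c1/merged", O_RDONLY | O_DIRECTORY);
        if (fd < 0)
                return 1;

        /* ... container runs, writeback may fail on the upper layer ... */

        /*
         * syncfs() reports a writeback error once per fd that sampled
         * the sb error cursor before the error occurred.
         */
        if (syncfs(fd) < 0)
                fprintf(stderr, "writeback error: %s\n", strerror(errno));

        close(fd);
        return 0;
}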
> > Not sure why overlayfs behavior/semantics should be any different
> > than what regular filesystems like ext4/xfs are offering. Once we
> > get page cache sharing sorted out with xfs reflink, people will not
> > even need overlayfs and will be able to launch containers just using
> > xfs reflink and a shared base image. In that case, too, they will
> > need to keep an fd open per container they want to see an error in.
> >
> > So my patches provide exactly that. syncfs() behavior with overlayfs
> > is the same as an application gets on other filesystems. And to me
> > it is important to keep the behavior the same.
> >
> > > - Volatile: every syncfs and every fsync in every container gets an error
> > >   (important IMO)
> >
> > For volatile mounts, I agree that we need to fail the overlayfs
> > instance as soon as the first error is detected since mount. And this
> > applies not only to syncfs()/fsync() but to read/write and other
> > operations too.
> >
> > For that we will need additional patches, which are floating around,
> > to keep an errseq sample in overlay and check for errors in all
> > paths (syncfs/fsync/read/write/...) and fail the fs.
> > But these patches build on top of my patches.
>
> Here we disagree.
>
> I don't see how Jeff's patch is "building on top of your patches"
> seeing that it is perfectly well contained and does not in fact depend
> on your patches.

Jeff's patches solve the problem only for volatile mounts, and they
propagate the error to the overlayfs sb. My patches solve the issue
both for volatile and non-volatile mounts, and solve it using the same
method, so there is no confusion.

So there are multiple pieces to this puzzle, and IMHO it should
probably be fixed in this order:

A. First fix the syncfs() path to return errors for both volatile and
   non-volatile mounts.

B. Then add patches to fail the filesystem for volatile mounts as soon
   as the first error is detected (either in the syncfs path or in
   other paths like read/write/...). This will probably require saving
   an errseq sample in ovl_fs, comparing it with the upper_sb in
   critical paths, and failing the filesystem as soon as an error is
   detected. (See the sketch at the end of this mail.)

C. Finally fix the issues related to mount/remount error detection
   which Sargun wants to fix. This will be largely solved by B, except
   for saving the errseq on disk.

My patches should fix the first problem, and more patches can be
applied on top to fix issues B and C. If we agree with this, then in
this context I see fixing problems B and C as building on top of my
patches, which fix problem A.

> And I do insist that the fix for volatile mounts syncfs/fsync error
> reporting should be applied before your patches or at the very least
> not heavily depend on them.

I still don't understand why volatile syncfs() error reporting is more
important than non-volatile syncfs(), but I will stop harping on this
point now.

My issue with Jeff's patches is that syncfs() error reporting should be
dealt with in the same way for both volatile and non-volatile mounts.
That is, compare file->f_sb_err and upper_sb->s_wb_err to figure out
whether there is an error to report to user space. Currently these
patches only solve the problem for volatile mounts, and they use
propagation to the overlay sb, which conflicts with the non-volatile
case.
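[ Roughly the shape of that comparison, as a sketch only: exactly where
  this hooks into the overlayfs syncfs path, and how f_sb_err comes to
  hold a sample of the upper sb's error cursor, are glossed over.
  ovl_upper_mnt(), sync_filesystem() and errseq_check_and_advance() are
  existing kernel helpers; f_sb_err/s_wb_err are the fields added by
  the v5.8 syncfs error reporting work. ]

static int ovl_sync_fs_sketch(struct file *file)
{
        struct super_block *sb = file->f_path.dentry->d_sb; /* overlay sb */
        struct ovl_fs *ofs = sb->s_fs_info;
        struct super_block *upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
        int ret, ret2;

        /* Sync the upper filesystem, as ovl_sync_fs() already does */
        down_read(&upper_sb->s_umount);
        ret = sync_filesystem(upper_sb);
        up_read(&upper_sb->s_umount);

        /*
         * Compare the *upper* sb error cursor with the sample stored
         * in this fd; the overlay sb is left alone (approach B below).
         */
        ret2 = errseq_check_and_advance(&upper_sb->s_wb_err,
                                        &file->f_sb_err);

        return ret ? ret : ret2;
}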
IIUC, your primary concern with volatile mounts is that you want to
detect a writeback error as soon as it happens and flag it to the
container manager, so that the container manager can stop the
container, throw away the upper layer, and restart from scratch. If
yes, what you want can be solved by solving problem B and backporting
it to the LTS kernel. I think the patches for that will be well
contained within overlayfs (no VFS changes) and should be relatively
easy to backport.

IOW, backportability to the LTS kernel should not be a concern/blocker
for my patch series, which fixes the syncfs() issue for overlayfs.

Thanks
Vivek

> volatile mount was introduced in the brand new v5.10, which is also an
> LTS kernel. It would be inconsiderate of volatile mount users and
> developers to make backporting that fix to v5.10.y any harder than it
> should be.
>
> > My patches don't solve this problem of failing the overlay mount for
> > the volatile mount case.
> >
>
> Here we agree.
>
> > >
> > > This is why I prefer to sample the upper sb error on mount and
> > > propagate new errors to the overlayfs sb (Jeff's patch).
> >
> > Ok, I think this is one of the key points of the whole discussion:
> > what mechanism should be used to propagate writeback errors through
> > overlayfs?
> >
> > A. Propagate errors from upper sb to overlay sb.
> > B. Leave overlay sb alone and use upper sb for error checks.
> >
> > We don't have a good model for propagating errors between super
> > blocks, so Jeff preferred not to do error propagation between super
> > blocks for regular mounts.
> >
> > https://lore.kernel.org/linux-fsdevel/bff90dfee3a3392d67a4f3516ab28989e87fa25f.camel@kernel.org/
> >
> > If we are not defining new semantics for syncfs() for overlayfs,
> > then I can't see the advantage of coming up with a new mechanism to
> > propagate errors to the overlay sb. Approach B should work just fine
> > and provide the syncfs() semantics we want for overlayfs (the same
> > semantics as other filesystems).
> >
>
> Ok. I am on board with B.
>
> Philosophically, the overlayfs model is somewhere between "passthrough"
> and "proxy" when handling pure upper files, and as overlayfs evolves,
> it steadily moves towards the "proxy" model, with page cache and
> writeback being the largest remaining piece to convert.
>
> So I concede that as long as overlayfs writeback is mostly passthrough,
> syncfs might as well be passthrough to upper fs as well.
>
> Thanks,
> Amir.
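[ P.S. The sketch referenced under B above. Purely illustrative: the
  ovl_fs field "err_marker" and the helper below are made up and not
  from any posted patch; errseq_sample()/errseq_check() and
  ovl_upper_mnt() are the existing helpers. ]

/*
 * At mount time for a volatile mount, sample the upper sb error cursor
 * into ovl_fs (hypothetical field "err_marker"):
 *
 *      ofs->err_marker = errseq_sample(&upper_sb->s_wb_err);
 *
 * Then check it in the critical paths (syncfs/fsync/read/write/...):
 */
static int ovl_check_volatile_error(struct ovl_fs *ofs)
{
        struct super_block *upper_sb = ovl_upper_mnt(ofs)->mnt_sb;

        /* Any new writeback error since mount fails the instance */
        if (errseq_check(&upper_sb->s_wb_err, ofs->err_marker))
                return -EIO;
        return 0;
}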