Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1169326AbdDXLug (ORCPT ); Mon, 24 Apr 2017 07:50:36 -0400
Received: from mail-qk0-f179.google.com ([209.85.220.179]:35016 "EHLO mail-qk0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1169307AbdDXLuW (ORCPT ); Mon, 24 Apr 2017 07:50:22 -0400
Message-ID: <1493034618.2895.10.camel@redhat.com>
Subject: Re: [PATCH v2 08/17] fs: retrofit old error reporting API onto new infrastructure
From: Jeff Layton
To: NeilBrown , linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, tytso@mit.edu, jack@suse.cz, willy@infradead.org, viro@zeniv.linux.org.uk
Date: Mon, 24 Apr 2017 07:50:18 -0400
In-Reply-To: <87pog2rbpd.fsf@notabene.neil.brown.name>
References: <20170412120614.6111-1-jlayton@redhat.com> <20170412120614.6111-9-jlayton@redhat.com> <87fuhduvcv.fsf@notabene.neil.brown.name> <1492036881.19286.1.camel@redhat.com> <87vaq2tzhu.fsf@notabene.neil.brown.name> <1492778818.7308.8.camel@redhat.com> <87pog2rbpd.fsf@notabene.neil.brown.name>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.22.6 (3.22.6-2.fc25)
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4306
Lines: 92

On Mon, 2017-04-24 at 08:38 +1000, NeilBrown wrote:
> On Fri, Apr 21 2017, Jeff Layton wrote:
>
> > On Tue, 2017-04-18 at 08:56 +1000, NeilBrown wrote:
> > > On Wed, Apr 12 2017, Jeff Layton wrote:
> > >
> > > > On Thu, 2017-04-13 at 08:14 +1000, NeilBrown wrote:
> > > > >
> > > > > I suspect that the filemap_check_wb_error() will need to be moved
> > > > > into some parent of the current call site, which is essentially what you
> > > > > suggest below. It would be nice if we could do that first, rather than
> > > > > having the current rather odd code. But maybe this way is an easier
> > > > > transition.
> > > > > It isn't obviously wrong, it just isn't obviously right
> > > > > either.
> > > > >
> > > >
> > > > Yeah. It's just such a daunting task to have to change so much of the
> > > > existing code. I'm looking for ways to make this simpler.
> > > >
> > > > I think it probably is reasonable for filemap_write_and_wait* to just
> > > > sample it as early as possible in those functions. filemap_fdatawait is
> > > > the real questionable one, as you may have already had some writebacks
> > > > complete with errors.
> > > >
> > > > In any case, my thinking was that the old code is not obviously correct
> > > > either, so while this shortens the "error capture window" on these
> > > > calls, it seems like a reasonable place to start improving things.
> > >
> > > I agree. It wouldn't hurt to add a note to this effect in the patch
> > > comment so that people understand that the code isn't seen to be
> > > "correct" but only "no worse" with clear direction on what sort of
> > > improvement might be appropriate.
> > >
> >
> > I've got a cleaned-up set that is getting close to ready for
> > reposting. Before I do though, I think there is another option here
> > that's worth discussing.
> >
> > We could store a second wb_err_t (aka errseq_t in the new set) in the
> > mapping that would basically act as a "cursor" for these cases.
> > filemap_check_errors would need to do something like
> > filemap_report_wb_error, but it would swap the value into the mapping's
> > cursor instead of dealing with the one in struct file.
> >
> > I don't really like adding yet another field here, but the struct
> > address_space definition has this:
> >
> >     __attribute__((aligned(sizeof(long))));
> >
> > Adding the wb_err field means that we end up growing the struct by 8
> > bytes on x86_64 anyway. Adding another 4 bytes would just consume the
> > pad, so it wouldn't cost anything there. YMMV on other arches of
> > course.
> >
> > That's also not perfectly like what we have with AS_EIO/AS_ENOSPC
> > flags, but is probably close enough not to matter.
> >
> > So...this would let us limp along for even longer with the model of
> > reporting since last check. I'm not sure that's a good thing though. A
> > long term goal here is to have kernel code that's dealing with
> > writeback be more deliberate about the point from which it's checking
> > errors, and this doesn't help promote that.
>
> I think this question needs some input from filesystem developers who
> might be affected by the answer.
>
> My preference is to not add this field. I think we would eventually
> want to remove it again, and it is easier to ensure it doesn't stay
> forever if it is never added.
> The version without this field isn't (I think) too bad, but maybe it is
> bad enough to motivate fs developers to create a better solution in each
> individual case.
>
> If some filesystem developer says they don't like that sort of social
> engineering, or objects for any other reason, I will bow to the superior
> stake they hold.
>

That's pretty much my view too. I just figured I needed to throw the
option out there in the interest of full disclosure.

I think keeping a per-mapping cursor like this does make sense in some
situations though. For instance, there does seem to be quite a bit of
local fs journaling code that goes through the pagecache. For those, I
could see keeping the cursor in some sort of per-journal structure, and
doing a check-and-advance against that in appropriate places.

This is an option we can bring up for folks who do want to continue to
use a similar error tracking model in these situations though.

--
Jeff Layton
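
For illustration only, here is a minimal userspace sketch of the
"check-and-advance against a per-journal cursor" idea mentioned above.
It is not code from the patch set and does not reproduce the kernel's
errseq_t. Every name in it (wb_cursor_t, wb_record_error,
wb_check_and_advance, struct journal_model) is hypothetical, and it
assumes a single-threaded caller, so it leaves out the atomics and the
"seen" tracking that a real implementation would need.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model, NOT the kernel's errseq_t: the low 12 bits hold
 * the most recent errno (stored as a positive value), the upper bits
 * hold a counter that is bumped each time an error is recorded.
 */
typedef uint32_t wb_cursor_t;

#define WB_ERR_BITS 12u
#define WB_ERR_MASK ((1u << WB_ERR_BITS) - 1u)

/* Record a writeback error; err is a negative errno such as -EIO. */
static void wb_record_error(wb_cursor_t *latest, int err)
{
	uint32_t seq = (*latest >> WB_ERR_BITS) + 1u;

	*latest = (seq << WB_ERR_BITS) | ((uint32_t)(-err) & WB_ERR_MASK);
}

/*
 * Return the error recorded since *cursor was last advanced (0 if
 * none), then advance *cursor so the same error is not reported twice
 * through this cursor.
 */
static int wb_check_and_advance(const wb_cursor_t *latest, wb_cursor_t *cursor)
{
	wb_cursor_t now = *latest;

	if (now == *cursor)
		return 0;
	*cursor = now;
	return -(int)(now & WB_ERR_MASK);
}

/* Hypothetical per-journal structure that owns its own cursor. */
struct journal_model {
	wb_cursor_t j_wb_cursor;
};

/* Walk through the sample / record / check-and-advance sequence. */
static void journal_model_demo(void)
{
	wb_cursor_t mapping_err = 0;	/* stands in for the mapping's value */
	struct journal_model j = { .j_wb_cursor = mapping_err };
	int rc;

	rc = wb_check_and_advance(&mapping_err, &j.j_wb_cursor);
	assert(rc == 0);		/* nothing recorded yet */

	wb_record_error(&mapping_err, -EIO);
	rc = wb_check_and_advance(&mapping_err, &j.j_wb_cursor);
	assert(rc == -EIO);		/* reported once ... */
	rc = wb_check_and_advance(&mapping_err, &j.j_wb_cursor);
	assert(rc == 0);		/* ... and not again */
	(void)rc;
}
```

The point of the sketch is that the journal samples the value at a time
of its own choosing and only reports errors recorded after that point,
which is the "more deliberate" checking discussed in the thread. A
struct file holding its own copy of the value would be unaffected by the
journal advancing its cursor.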