Date: Fri, 7 Feb 2020 14:21:30 -0800
From: Andres Freund
To: Jeff Layton
Cc: Dave Chinner, viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, willy@infradead.org, dhowells@redhat.com, hch@infradead.org, jack@suse.cz, akpm@linux-foundation.org
Subject: Re: [PATCH v3 0/3] vfs: have syncfs() return error when there are writeback errors
Message-ID: <20200207222130.urcfi3i3dlfscimy@alap3.anarazel.de>
Hi,

On 2020-02-07 17:05:28 -0500, Jeff Layton wrote:
> On Fri, 2020-02-07 at 13:20 -0800, Andres Freund wrote:
> > On 2020-02-08 07:52:43 +1100, Dave Chinner wrote:
> > > On Fri, Feb 07, 2020 at 12:04:20PM -0500, Jeff Layton wrote:
> > > > You're probably wondering -- Where are v1 and v2 sets?
> > > > The basic idea is to track writeback errors at the superblock level,
> > > > so that we can quickly and easily check whether something bad happened
> > > > without having to fsync each file individually. syncfs is then changed
> > > > to reliably report writeback errors, and a new ioctl is added to allow
> > > > userland to get at the current errseq_t value w/o having to sync out
> > > > anything.
> > >
> > > So what, exactly, can userspace do with this error? It has no idea
> > > at all what file the writeback failure occurred on or even
> > > what files syncfs() even acted on so there's no obvious error
> > > recovery that it could perform on reception of such an error.
> >
> > Depends on the application. For e.g. postgres it'd to be to reset
> > in-memory contents and perform WAL replay from the last checkpoint. Due
> > to various reasons* it's very hard for us (without major performance
> > and/or reliability impact) to fully guarantee that by the time we fsync
> > specific files we do so on an old enough fd to guarantee that we'd see
> > the an error triggered by background writeback. But keeping track of
> > all potential filesystems data resides on (with one fd open permanently
> > for each) and then syncfs()ing them at checkpoint time is quite doable.
> >
> > *I can go into details, but it's probably not interesting enough
>
> Do applications (specifically postgresql) need the ability to check
> whether there have been writeback errors on a filesystem w/o blocking on
> a syncfs() call? I thought that you had mentioned a specific usecase
> for that, but if you're actually ok with syncfs() then we can drop that
> part altogether.

It'd be considerably better if we could check for errors without a
blocking syncfs(). A syncfs() triggers writeback of far more dirty pages
than we need for durability. Our checkpoint writes are throttled to
reduce the impact on ongoing IO, and we try to ensure there's not much
outstanding IO before calling fsync() on FDs, all to avoid stalls. That
matters even more because plenty of installations also keep temporary
files (e.g. for bigger-than-memory sorts) on the same FS.

So if we had to syncfs() to reliably detect errors, it'd cause some
pain, but it would still be an improvement. With a nonblocking check,
though, we could compare the error count from the last checkpoint with
the current count before finalizing the checkpoint, without causing
unnecessary writeback.

Currently, after a crash (any unclean shutdown, be it a PG bug, the OS
dying, or kill -9), we iterate over all files to make sure they're
fsynced before starting WAL replay. That can take quite a while on some
systems. It'd be much nicer if we could just syncfs() the involved
filesystems and still get errors. We can determine those filesystems
much more quickly than by iterating over the whole directory tree, since
there are only a few places where we support separate mounts.

> Yeah, if we do end up keeping it, I'm leaning toward making this
> fetchable via fsinfo() (once that's merged). If we do that, then we'll
> split this into a struct with two fields -- the most recent errno and an
> opaque token that you can keep to tell whether new errors have been
> recorded since.
>
> I think that should be a little cleaner from an API standpoint. Probably
> we can just drop the ioctl, under the assumption that fsinfo() will be
> available in 5.7.

Sounds like a plan. I guess an alternative could be to expose the error
count in /sys, but that looks like it'd be a bigger project, as there
doesn't seem to be a good pre-existing hierarchy to hook into.

Greetings,

Andres Freund
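A minimal C sketch of the checkpoint scheme described above: keep one fd open permanently on each filesystem the data resides on, and syncfs() each of them at checkpoint time. The struct fs_tracker type, the MAX_FS limit and the tracker_* helpers are hypothetical names for illustration only. The sketch assumes a kernel that includes this series, so that syncfs() reports a writeback error recorded since the fd was opened or last synced; on kernels without it, syncfs() may return 0 despite such errors.

```c
#define _GNU_SOURCE            /* syncfs() is a GNU/Linux extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_FS 8               /* hypothetical limit on tracked filesystems */

/* Hypothetical helper: one permanently open fd per filesystem. */
struct fs_tracker {
    int fds[MAX_FS];
    int count;
};

/* Open a directory on a filesystem holding data and keep the fd open.
 * Returns 0 on success, -1 if the tracker is full or open() fails. */
static int tracker_add(struct fs_tracker *t, const char *dir)
{
    assert(t != NULL && dir != NULL);
    if (t->count >= MAX_FS)
        return -1;
    int fd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    t->fds[t->count++] = fd;
    return 0;
}

/* At checkpoint time, syncfs() every tracked filesystem. All of them are
 * synced even if one fails; the first errno seen (e.g. EIO) is returned,
 * or 0 if every call succeeded. Nonzero means: don't finalize. */
static int tracker_checkpoint(const struct fs_tracker *t)
{
    int first_err = 0;
    for (int i = 0; i < t->count; i++) {
        if (syncfs(t->fds[i]) < 0 && first_err == 0)
            first_err = errno;
    }
    return first_err;
}

/* Close all tracked fds. */
static void tracker_close(struct fs_tracker *t)
{
    for (int i = 0; i < t->count; i++)
        close(t->fds[i]);
    t->count = 0;
}
```

On a nonzero return, the caller would skip finalizing the checkpoint and, as described above for postgres, reset in-memory state and replay WAL from the last completed checkpoint. Keeping the fds open for the lifetime of the process follows the reasoning above: an fd opened well before a failure is guaranteed to observe the error, whereas a freshly opened one may not if the error was already reported elsewhere.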