From: David Howells
To: Dave Chinner
Cc: dhowells@redhat.com, Amir Goldstein, linux-cachefs@redhat.com, Jeff Layton, David Wysochanski, "Matthew Wilcox (Oracle)", "J. Bruce Fields", Christoph Hellwig, Dave Chinner, Alexander Viro, linux-afs@lists.infradead.org, Linux NFS Mailing List, CIFS, ceph-devel, v9fs-developer@lists.sourceforge.net, linux-fsdevel, linux-kernel, Miklos Szeredi
Subject: Re: fscache: Redesigning the on-disk cache
Date: Tue, 09 Mar 2021 09:21:45 +0000
Message-ID: <152281.1615281705@warthog.procyon.org.uk>
In-Reply-To: <20210308215535.GA63242@dread.disaster.area>
References: <20210308215535.GA63242@dread.disaster.area> <2653261.1614813611@warthog.procyon.org.uk> <517184.1615194835@warthog.procyon.org.uk>
Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903
X-Mailing-List: linux-nfs@vger.kernel.org

Dave Chinner wrote:

> > > With ->fiemap() you can at least make the distinction between a non
> > > existing and an UNWRITTEN extent.
> >
> > I can't use that for XFS, Ext4 or btrfs, I suspect. Christoph and Dave's
> > assertion is that the cache can't rely on the backing filesystem's metadata
> > because these can arbitrarily insert or remove blocks of zeros to bridge or
> > split extents.
>
> Well, that's not the big problem.
> The issue that makes FIEMAP
> unusable for determining if there is user data present in a file is
> that on-disk extent maps aren't exactly coherent with in-memory user
> data state.
>
> That is, we can have a hole on disk with delalloc user data in
> memory. There's user data in the file, just not on disk. Same goes
> for unwritten extents - there can be dirty data in memory over an
> unwritten extent, and it won't get converted to written until the
> data is written back and the filesystem runs a conversion
> transaction.
>
> So, yeah, if you use FIEMAP to determine where data lies in a file
> that is being actively modified, you're going get corrupt data
> sooner rather than later. SEEK_HOLE/DATA are coherent with in
> memory user data, so don't have this problem.

I thought you and/or Christoph said it *was* a problem to use the backing
filesystem's metadata to track presence of data in the cache because the
filesystem (or its tools) can arbitrarily insert blocks of zeros to
bridge/break up extents.

If that is the case, then that is a big problem, and SEEK_HOLE/DATA won't
suffice.

If it's not a problem - maybe if I can set a mark on a file to tell the
filesystem and tools not to do that - then that would obviate the need for me
to store my own maps.

David
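
For illustration only (this is not code from the thread): a minimal userspace sketch of walking a file's data regions with SEEK_DATA/SEEK_HOLE, the interface described above as coherent with in-memory user data. It is written in Python and assumes a platform where os.SEEK_DATA and os.SEEK_HOLE are available (e.g. Linux); the helper name data_regions is hypothetical. It reports only what the filesystem presents as data, so it does not address the concern that a filesystem or its tools may bridge or split extents with blocks of zeros.

```python
import errno
import os


def data_regions(fd):
    """Yield (start, end) byte ranges that the filesystem reports as data.

    Hypothetical helper for illustration. Note that it moves the file
    offset of fd as a side effect of calling lseek().
    """
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            # Find the next offset >= `offset` that contains data.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                return  # only a hole remains up to end of file
            raise
        # Find the end of this data region; EOF counts as an implicit hole.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield (start, end)
        offset = end
```

On a filesystem without sparse-file support the whole file may simply be reported as one data region, so the result is a lower bound on holes rather than a map of what was actually written.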