From: David Howells
To: "Theodore Y. Ts'o"
Cc: dhowells@redhat.com, linux-fsdevel@vger.kernel.org,
    viro@zeniv.linux.org.uk, hch@lst.de, adilger.kernel@dilger.ca,
    darrick.wong@oracle.com, clm@fb.com, josef@toxicpanda.com,
    dsterba@suse.com, linux-ext4@vger.kernel.org,
    linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: Problems with determining data presence by examining extents?
Date: Wed, 15 Jan 2020 13:50:11 +0000
Message-ID: <22056.1579096211@warthog.procyon.org.uk>
In-Reply-To: <20200114224917.GA165687@mit.edu>

Theodore Y. Ts'o wrote:

> but I'm not sure we would want to make any guarantees with respect to (b).

Um.  That would potentially make disconnected operation problematic.  Now,
it's unlikely that I'll want to store a 256KiB block of zeros, but not
impossible.

> I suspect I understand why you want this; I've fielded some requests
> for people wanting to do something very like this at $WORK, for what I
> assume to be for the same reason you're seeking to do this; to create
> do incremental caching of files and letting the file system track what
> has and hasn't been cached yet.

Exactly so.
If I can't tap into the filesystem's own map of what data is present in a
file, then I have to do it myself in parallel.  Keeping my own list or map
has a number of issues:

 (1) It's redundant.  I have to maintain a second copy of what the
     filesystem already maintains.  This uses extra space.

 (2) My map may get out of step with the filesystem after a crash.  The
     filesystem has tools to deal with this in its own structures.

 (3) If the file is very large and sparse, then keeping a bit-per-block
     map in a single xattr may not suffice or may become unmanageable.
     There's a limit of 64k per xattr, which for bit-per-256k limits the
     maximum mappable size to 128GiB (65536 bytes * 8 bits * 256KiB).  I
     could use multiple xattrs, but some filesystems may have total xattr
     limits, and whatever the size, I need a single buffer big enough to
     hold it.

     I could use a second file as a metadata cache - but that has worse
     coherency properties.  (As I understand it, setxattr is synchronous
     and journalled.)

> If we were going to add such a facility, what we could perhaps do is
> to define a new flag indicating that a particular file should have no
> extent mapping optimization applied, such that FIEMAP would return a
> mapping if and only if userspace had written to a particular block, or
> had requested that a block be preallocated using fallocate().  The
> flag could only be set on a zero-length file, and this might disable
> certain advanced file system features, such as reflink, at the file
> system's discretion; and there might be unspecified performance
> impacts if this flag is set on a file.

That would be fine for cachefiles.

Also, I don't need to know *where* the data is, only that the first byte
of my block exists, provided that a DIO read returns short when it reaches
a hole.

David
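
For reference, the sizing in point (3) can be checked with a rough sketch
like the one below.  It is only an illustration, not what cachefiles does:
it assumes "64k" means 65536 bytes for a single xattr value and one bit per
256KiB block, and the names (XATTR_VALUE_LIMIT, BLOCK_SIZE and the helper
functions) are all made up for the example.

```python
# Sketch of a bit-per-block presence map sized against an xattr value limit.
# All names and constants here are illustrative assumptions.

XATTR_VALUE_LIMIT = 64 * 1024   # bytes; "64k" taken to mean 65536 bytes
BLOCK_SIZE = 256 * 1024         # bytes of file data tracked by each bit


def max_mappable_bytes(xattr_bytes=XATTR_VALUE_LIMIT, block_size=BLOCK_SIZE):
    """Largest file size one map of xattr_bytes can describe."""
    return xattr_bytes * 8 * block_size


def map_bytes_needed(file_size, block_size=BLOCK_SIZE):
    """Bytes of bitmap needed to cover file_size, rounding up."""
    blocks = -(-file_size // block_size)   # ceiling division
    return -(-blocks // 8)


def set_present(bitmap, block):
    """Mark a block as cached in a bytearray bitmap."""
    bitmap[block // 8] |= 1 << (block % 8)


def is_present(bitmap, block):
    """Report whether a block has been marked as cached."""
    return bool(bitmap[block // 8] & (1 << (block % 8)))


if __name__ == "__main__":
    print(max_mappable_bytes() // 2**30, "GiB mappable by one 64KiB map")
    print(map_bytes_needed(2**40) // 1024, "KiB of map needed for 1TiB")
```

Under those assumptions a 1TiB file would need a 512KiB map, i.e. eight
xattrs of 64KiB each, and the whole map still has to be held in a buffer
while it is being updated.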