Date: Tue, 14 Jan 2020 17:49:17 -0500
From: "Theodore Y. Ts'o"
To: David Howells
Cc: linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk, hch@lst.de, adilger.kernel@dilger.ca, darrick.wong@oracle.com, clm@fb.com, josef@toxicpanda.com, dsterba@suse.com, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Problems with determining data presence by examining extents?
Message-ID: <20200114224917.GA165687@mit.edu>
References: <4467.1579020509@warthog.procyon.org.uk>
In-Reply-To: <4467.1579020509@warthog.procyon.org.uk>
X-Mailing-List: linux-ext4@vger.kernel.org

On Tue, Jan 14, 2020 at 04:48:29PM +0000, David Howells wrote:
> Again with regard to my rewrite of fscache and cachefiles:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
>
> I've got rid of my use of bmap()! Hooray!
>
> However, I'm informed that I can't trust the extent map of a backing file to
> tell me accurately whether content exists in a file because:
>
>  (a) Not-quite-contiguous extents may be joined by insertion of blocks of
>      zeros by the filesystem optimising itself. This would give me a false
>      positive when trying to detect the presence of data.
>
>  (b) Blocks of zeros that I write into the file may get punched out by
>      filesystem optimisation since a read back would be expected to read zeros
>      there anyway, provided it's below the EOF. This would give me a false
>      negative.
>
> Is there some setting I can use to prevent these scenarios on a file - or can
> one be added?

I don't think there's any way to do this in a portable way, at least today.

There is a hack we could use that would work for ext4 today, at least with respect to (a), but I'm not sure we would want to make any guarantees with respect to (b).

I suspect I understand why you want this; I've fielded requests from people wanting to do something very similar at $WORK, presumably for the same reason you're seeking to do this: to do incremental caching of files and let the file system track what has and hasn't been cached yet.
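For reference, the extent-map check being discussed can be made from userspace with the FS_IOC_FIEMAP ioctl. The following is a minimal sketch, assuming Linux and a file system that implements FIEMAP (tmpfs, for example, does not); the helper name range_has_extent() is hypothetical. Given caveats (a) and (b), its answer is only a hint about whether data was written to a range, not a guarantee.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fiemap.h>
#include <linux/fs.h>

/* Hypothetical helper: returns 1 if FIEMAP reports at least one extent
 * overlapping [start, start + len), 0 if none, and -1 on error (including
 * file systems that do not implement FIEMAP). As discussed in the thread,
 * a 1 or 0 here does not prove that data was or was not written. */
static int range_has_extent(int fd, uint64_t start, uint64_t len)
{
    /* Room for the fiemap header plus a single extent record. */
    struct fiemap *fm = calloc(1, sizeof(*fm) + sizeof(struct fiemap_extent));
    if (fm == NULL)
        return -1;

    fm->fm_start = start;
    fm->fm_length = len;
    fm->fm_flags = FIEMAP_FLAG_SYNC; /* sync the file before mapping extents */
    fm->fm_extent_count = 1;         /* one extent is enough to answer yes/no */

    int result;
    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0)
        result = -1;
    else
        result = fm->fm_mapped_extents > 0;

    free(fm);
    return result;
}
```

FIEMAP_FLAG_SYNC asks the kernel to sync the file before mapping its extents, so recently written data has been allocated by the time the mapping is read. Requesting a single extent keeps the buffer small, since the only question is whether anything overlaps the range.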
If we were going to add such a facility, we could perhaps define a new flag indicating that a particular file should have no extent mapping optimizations applied, such that FIEMAP would return a mapping if and only if userspace had written to a particular block, or had requested that the block be preallocated using fallocate(). The flag could only be set on a zero-length file. Setting it might disable certain advanced file system features, such as reflink, at the file system's discretion, and there might be unspecified performance impacts on that file. File systems which do not support this feature would not allow the flag to be set.

- Ted
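The proposal hinges on FIEMAP distinguishing blocks that userspace actually wrote from blocks it only preallocated with fallocate(). The proposed per-file flag does not exist, so this sketch uses only current interfaces: it reports whether the first extent overlapping a range is a hole, a preallocated extent (FIEMAP_EXTENT_UNWRITTEN), or written data. The enum range_state and classify_range() are hypothetical names, and without such a flag the file system remains free to merge or punch extents as described in (a) and (b).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fiemap.h>
#include <linux/fs.h>

/* Hypothetical classification of a byte range as seen through FIEMAP. */
enum range_state {
    RANGE_ERROR = -1,    /* ioctl failed, e.g. FIEMAP not supported */
    RANGE_HOLE = 0,      /* no extent overlaps the range */
    RANGE_UNWRITTEN = 1, /* first overlapping extent is preallocated only */
    RANGE_WRITTEN = 2    /* first overlapping extent holds written data */
};

/* Looks only at the first extent overlapping [start, start + len). */
static enum range_state classify_range(int fd, uint64_t start, uint64_t len)
{
    struct fiemap *fm = calloc(1, sizeof(*fm) + sizeof(struct fiemap_extent));
    if (fm == NULL)
        return RANGE_ERROR;

    fm->fm_start = start;
    fm->fm_length = len;
    fm->fm_flags = FIEMAP_FLAG_SYNC; /* sync the file before mapping extents */
    fm->fm_extent_count = 1;         /* only the first overlapping extent */

    enum range_state st;
    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0)
        st = RANGE_ERROR;
    else if (fm->fm_mapped_extents == 0)
        st = RANGE_HOLE;
    else if (fm->fm_extents[0].fe_flags & FIEMAP_EXTENT_UNWRITTEN)
        st = RANGE_UNWRITTEN;
    else
        st = RANGE_WRITTEN;

    free(fm);
    return st;
}
```

On ext4, XFS and btrfs, fallocate() with mode 0 normally creates unwritten extents, so a freshly preallocated range would be expected to classify as RANGE_UNWRITTEN. Because only the first overlapping extent is examined, a range that straddles extents of different kinds is summarized by its first one.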