From: Andreas Dilger
Subject: Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
Date: Sun, 17 Apr 2011 20:45:56 -0600
Message-ID: <6C89E159-A5F6-4A06-A3D2-273BE4CFB9B5@dilger.ca>
References: <20110414140222.GB1679@x4.trippels.de>
 <4DA70BD3.1070409@draigBrady.com>
 <4DA717B2.3020305@sandeen.net>
 <20110414225904.GK21395@dastard>
 <4DA7836A.5040604@draigBrady.com>
 <20110415000940.GL21395@dastard>
 <76FFF648-CA02-494B-A862-566C66A8CB82@dilger.ca>
 <20110416005040.GP21395@dastard>
 <4EEEA16E-1FDB-4430-A372-8F8701196E4C@mit.edu>
 <20110418004040.GS21395@dastard>
In-Reply-To: <20110418004040.GS21395@dastard>
To: Dave Chinner
Cc: Theodore Tso, Eric Sandeen, xfs-oss,
 "coreutils-mXXj517/zsQ@public.gmane.org",
 "linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org",
 Markus Trippelsdorf
List-Id: linux-ext4.vger.kernel.org

On 2011-04-17, at 6:40 PM, Dave Chinner wrote:
> On Sat, Apr 16, 2011 at 08:21:28AM -0400, Theodore Tso wrote:
>>
>> On Apr 16, 2011, at 1:11 AM, Andreas Dilger wrote:
>>> In that case, it means cp should just always use FIEMAP_FLAG_SYNC, which is fine.
>>
>> Except that if someone is copying a large delay allocated file, it will cause
>> the file to be immediately snapped to disk, which might not be the greatest
>> thing in the world.
>
> Obvious workaround - if the initial fiemap call shows unwritten
> extents, redo it with the sync flag set. Though that assumes that
> you can trust things like delalloc extents to only cover the range
> that valid data exists in. Which, of course, you can't assume,
> either. :/

Always passing FIEMAP_FLAG_SYNC is fine in this case. It should only do anything if there is unwritten data, which is the only case we are concerned with at this point. In any case, this is a simple solution for coreutils until such time as a more complex solution is added in the kernel (if ever).

>> Christoph is right, SEEK_HOLE and SEEK_DATA are
>> a much better API for what cp would like to do. Unfortunately it hasn't
>> been implemented yet in the VFS...
>
> Agreed, SEEK_HOLE/SEEK_DATA is the right way to solve this problem.

I don't see how this will change the problem in any meaningful way. There will still need to be code that traverses the on-disk mapping and also keeps it coherent with unwritten data in the page cache.

Since FIEMAP already exists for most Linux filesystems, it probably makes sense to implement SEEK_{HOLE,DATA} by calling FIEMAP to get the disk mapping in the first place.

I agree that SEEK_{HOLE,DATA} is an easier programming interface, and probably what cp, tar, etc. should use, once it is implemented.

Cheers, Andreas
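For readers wondering what the "easier programming interface" would look like from userspace, here is a minimal sketch of a hole-aware copy built on SEEK_DATA/SEEK_HOLE. It is not the coreutils implementation. It assumes a Linux system and a Python 3 build that exposes os.SEEK_DATA and os.SEEK_HOLE; the helper names data_segments, sparse_copy and demo are made up for illustration. On a filesystem without real support, the kernel's generic fallback reports the whole file as one data segment, so the copy remains correct but is no longer sparse.

```python
import errno
import os
import tempfile


def data_segments(fd):
    """Yield (start, end) byte ranges that the kernel reports as data."""
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            # Jump to the next byte at or after offset that holds data.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # only a hole remains up to EOF
                break
            raise
        # The data run ends at the next hole (EOF counts as a hole).
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        offset = end


def sparse_copy(src_path, dst_path, chunk=1 << 20):
    """Copy only the data segments, leaving holes unallocated in the copy."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src).st_size
        for start, end in data_segments(src):
            pos = start
            while pos < end:
                buf = os.pread(src, min(chunk, end - pos), pos)
                if not buf:  # file shrank underneath us
                    break
                os.pwrite(dst, buf, pos)
                pos += len(buf)
        # Extend the copy to full length so a trailing hole is preserved.
        os.ftruncate(dst, size)
    finally:
        os.close(src)
        os.close(dst)


def demo():
    """Build a file with a 1 MiB hole, copy it, and compare the contents."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        with open(src, "wb") as f:
            f.write(b"head")
            f.seek(1 << 20)
            f.write(b"tail")
        sparse_copy(src, dst)
        with open(src, "rb") as a, open(dst, "rb") as b:
            return a.read() == b.read()
```

Note that the caller never sees extents, flags, or delalloc state. Keeping the answer coherent with dirty pages in the page cache becomes the kernel's job, which is the point made above that this work does not go away, it only moves.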