2011-04-19 21:08:25

by Dave Chinner

Subject: Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)

On Tue, Apr 19, 2011 at 10:09:09AM -0400, Ted Ts'o wrote:
> On Tue, Apr 19, 2011 at 05:45:38PM +1000, Dave Chinner wrote:
> > You are *not listening*. There is no #2. FIEMAP returns the extent
> > state _on disk_ at the time of the call.
>
> Dave, you're being rather strident about your insistence about what
> FIEMAP's semantics are.

The bit about the page cache state being relevant? That's what I was
referring to here.

> Part of the problem here is that it's *not*
> clear or settled.
>
> If it really is the state _on_ _disk_, does XFS really have a DELALLOC
> bit _on_ _disk_?

Sigh. No.

This whole thing blew up because of unwritten extent behaviour when
there is dirty page cache covering an unwritten extent. Delalloc
was not the issue - what I said is absolutely true for unwritten
extents. Somewhere in the middle someone started talking about
delalloc extents and conflating their behaviour with unwritten
extents, but I continued to talk about unwritten extents and
cached pages.

Even so, for delalloc extents the dirty page state in the page cache
is irrelevant. I've said earlier that XFS delalloc extents can span
regions that have no page cache state - they don't get reported as
holes by FIEMAP because they are tracked as delalloc. IOWs, like
unwritten extents, you can't rely on delalloc extents to tell you
where the data is in the file.

So, it logically follows that you need to use the SYNC flag for both
unwritten extents and delalloc extents to find out where the data
really lies by converting them to real, written extents. i.e. the
only extents from FIEMAP that you can trust to contain data are the
real extents on disk....
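
As an illustration of the point above, here is a minimal C sketch of a
FIEMAP call that sets FIEMAP_FLAG_SYNC and then separates real,
written extents from unwritten and delalloc ones. The helper names
extent_is_real() and dump_extents() and the MAX_EXTENTS batch size are
assumptions made for this example; they are not taken from coreutils
or from any filesystem code, and the sketch does not claim to be how
cp should decide what to copy.

```c
#include <assert.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

#define MAX_EXTENTS 32 /* arbitrary batch size chosen for this sketch */

/* Hypothetical helper: nonzero only when the extent is flagged as
 * neither unwritten nor delalloc, i.e. a real, written extent. */
int extent_is_real(uint32_t fe_flags)
{
    return (fe_flags & (FIEMAP_EXTENT_UNWRITTEN | FIEMAP_EXTENT_DELALLOC)) == 0;
}

/* Hypothetical helper: print every extent of fd, asking the kernel to
 * sync the file first. Returns the number of extents, or -1 on error. */
int dump_extents(int fd)
{
    size_t size = sizeof(struct fiemap) +
                  MAX_EXTENTS * sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, size);
    uint64_t start = 0;
    int total = 0, last = 0;

    if (!fm)
        return -1;

    while (!last) {
        fm->fm_start = start;
        fm->fm_length = FIEMAP_MAX_OFFSET - start;
        fm->fm_flags = FIEMAP_FLAG_SYNC; /* flush dirty data before mapping */
        fm->fm_extent_count = MAX_EXTENTS;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
            perror("FS_IOC_FIEMAP");
            free(fm);
            return -1;
        }
        if (fm->fm_mapped_extents == 0)
            break; /* nothing mapped beyond 'start' */
        assert(fm->fm_mapped_extents <= MAX_EXTENTS);

        for (uint32_t i = 0; i < fm->fm_mapped_extents; i++) {
            const struct fiemap_extent *fe = &fm->fm_extents[i];

            printf("logical=%llu length=%llu %s\n",
                   (unsigned long long)fe->fe_logical,
                   (unsigned long long)fe->fe_length,
                   extent_is_real(fe->fe_flags) ? "real"
                                                : "unwritten/delalloc");
            start = fe->fe_logical + fe->fe_length;
            if (fe->fe_flags & FIEMAP_EXTENT_LAST)
                last = 1;
            total++;
        }
    }
    free(fm);
    return total;
}
```

A caller would open the file read-only and pass the descriptor to
dump_extents(). Filesystems without FIEMAP support make the ioctl fail,
which the sketch reports as -1.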

Cheers,

Dave.
--
Dave Chinner


2011-04-20 15:29:59

by Christoph Hellwig

Subject: Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)

On Wed, Apr 20, 2011 at 07:08:25AM +1000, Dave Chinner wrote:
> So, it logically follows that you need to use the SYNC flag for both
> unwritten extents and delalloc extents to find out where the data
> really lies by converting them to real, written extents. i.e. the
> only extents from FIEMAP that you can trust to contain data are the
> real extents on disk....

Even more funny is that the bug report that started this thread involved
software that didn't actually care about the location on disk, at all.

cp from coreutils really just wanted an efficient way to skip holes
in sparse files, and we got into a chain reaction of various flaws
and oversights:

(1) Linux lacks the SEEK_HOLE/SEEK_DATA interface that would make
    skipping holes trivial, and thus coreutils has to use FIEMAP.
(2) ext4 and btrfs in some cases mishandled reporting delalloc
    extents, which means coreutils had to add the sync flag,
    despite not caring where data is on disk.
(3) coreutils tried to treat unwritten extents as holes. Which
    makes some sense given their high-level description, although
    probably not too much in practice given that we explicitly
    allocated blocks to these "holes" to optimize performance.
    But the main issue here is that there is no documentation
    that clearly states that unwritten extents reported by
    FIEMAP may actually contain useful data. In fact there's
    no useful documentation for FIEMAP outside the kernel tree.
    An interface that complex really needs a manpage.
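
To make item (1) concrete: later kernels (3.1 onward) did gain lseek()
support for SEEK_DATA and SEEK_HOLE. The C sketch below shows roughly
how a copy tool could walk the data segments of a sparse file with that
interface, without asking where anything lives on disk. The function
name sum_data_bytes() is an assumption made for this example and is not
how coreutils implements it.

```c
#define _GNU_SOURCE /* glibc exposes SEEK_DATA/SEEK_HOLE under this macro */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: print each data segment of fd and return the
 * total number of bytes that lie in data segments, or -1 on error. */
off_t sum_data_bytes(int fd)
{
    off_t pos = 0, total = 0;

    for (;;) {
        /* Find the next offset at or after 'pos' that holds data. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)
                break; /* no data at or beyond 'pos' */
            return -1;
        }

        /* Find where that data segment ends. End of file counts as
         * an implicit hole, so this succeeds for the last segment. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        assert(hole > data); /* guards against looping forever */

        printf("data: [%lld, %lld)\n", (long long)data, (long long)hole);
        total += hole - data;
        pos = hole;
    }
    return total;
}
```

A copy loop would read and write only the [data, hole) ranges and seek
over everything else. Filesystems that do not track holes may report
the whole file as a single data segment, which is still correct, just
not faster.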