On Wed, Feb 13, 2013 at 05:20:52PM -0500, Theodore Ts'o wrote:
> Telldir() and seekdir() are basically implementation horrors for any
> file system that is using anything other than a simple array of
> directory entries ala the V7 Unix file system or the BSD FFS. For any
> file system which is using a more advanced data structure, like
> b-trees hash trees, etc, there **can't** possibly be a "offset" into a
> readdir stream.
I'll just point you to this:
http://marc.info/?l=linux-ext4&m=136081996316453&w=2
so you can see that XFS implements what you say can't possibly be
done. ;)
FWIW, that post only talked about the data segment. I didn't mention
that XFS has 2 other segments in the directory file (both beyond
EOF) for the directory data indexes. One contains the name-hash btree
index used for name based lookups and the other contains a freespace
index for tracking free space in the data segment.
IOWs persistent, deterministic, low cost telldir/seekdir behaviour
was a problem solved in the 1990s. :)
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Thu, Feb 14, 2013 at 05:10:02PM +1100, Dave Chinner wrote:
> On Wed, Feb 13, 2013 at 05:20:52PM -0500, Theodore Ts'o wrote:
> > Telldir() and seekdir() are basically implementation horrors for any
> > file system that is using anything other than a simple array of
> > directory entries ala the V7 Unix file system or the BSD FFS. For any
> > file system which is using a more advanced data structure, like
> > b-trees hash trees, etc, there **can't** possibly be a "offset" into a
> > readdir stream.
>
> I'll just point you to this:
>
> http://marc.info/?l=linux-ext4&m=136081996316453&w=2
>
> so you can see that XFS implements what you say can't possibly be
> done. ;)
>
> FWIW, that post only talked about the data segment. I didn't mention
> that XFS has 2 other segments in the directory file (both beyond
> EOF) for the directory data indexes. One contains the name-hash btree
> index used for name based lookups and the other contains a freespace
> index for tracking free space in the data segment.
OK, so in some sense that reduces the problem to that of implementing
readdir cookies for directories that are stored in a simple linear
array.
Which I should know how to do but I don't: I guess all you need is a
provision for making holes on remove (so that you aren't required move
existing entries, messing up offsets for concurrent readers)?
Purely out of curiosity: is there a more detailed writeup of XFS's
directory format? (Or a pointer to a piece of the code a person could
understand without losing a month to it?)
--b.
>
> IOWs persistent, deterministic, low cost telldir/seekdir behaviour
> was a problem solved in the 1990s. :)
On Thu, Feb 14, 2013 at 05:01:10PM -0500, J. Bruce Fields wrote:
> On Thu, Feb 14, 2013 at 05:10:02PM +1100, Dave Chinner wrote:
> > On Wed, Feb 13, 2013 at 05:20:52PM -0500, Theodore Ts'o wrote:
> > > Telldir() and seekdir() are basically implementation horrors for any
> > > file system that is using anything other than a simple array of
> > > directory entries ala the V7 Unix file system or the BSD FFS. For any
> > > file system which is using a more advanced data structure, like
> > > b-trees hash trees, etc, there **can't** possibly be a "offset" into a
> > > readdir stream.
> >
> > I'll just point you to this:
> >
> > http://marc.info/?l=linux-ext4&m=136081996316453&w=2
> >
> > so you can see that XFS implements what you say can't possibly be
> > done. ;)
> >
> > FWIW, that post only talked about the data segment. I didn't mention
> > that XFS has 2 other segments in the directory file (both beyond
> > EOF) for the directory data indexes. One contains the name-hash btree
> > index used for name based lookups and the other contains a freespace
> > index for tracking free space in the data segment.
>
> OK, so in some sense that reduces the problem to that of implementing
> readdir cookies for directories that are stored in a simple linear
> array.
*nod*
> Which I should know how to do but I don't: I guess all you need is a
> provision for making holes on remove (so that you aren't required move
> existing entries, messing up offsets for concurrent readers)?
Exactly.
The data segment is a virtual mapping that is maintained by the
extent tree, so we can simply punch holes in it for directory blocks
that are empty and no longer referenced. i.e. the data segement
really is just a sparse file.
The result of doing block mapping this way is that the freespace
tracking segment actually only needs to track space in partially
used blocks. Hence we only need to allocate new blocks when the
freespace map empties, And we work out where to allocate the new
block in the virtual map by doing an extent tree lookup to find the
first hole....
> Purely out of curiosity: is there a more detailed writeup of XFS's
> directory format? (Or a pointer to a piece of the code a person could
> understand without losing a month to it?)
Not really. There's documentation of the on-disk structures, but
it's a massive leap from there to understanding the structure and
how it all ties together. I've been spending the past couple of
months deep in the depths of the XFS directory code so how it all
works is front-and-center in my brain right now...
That said, the thought had crossed my mind that there's a a couple
of LWN articles/conference talks I could put together as a brain
dump. ;)
Cheers,
Dave.
--
Dave Chinner
[email protected]