2007-09-07 05:46:39

by Bharata B Rao

Subject: [RFC] Union Mount: Readdir approaches

Hi,

Any filesystem namespace unification solution (Union Mount, Unionfs) needs to
provide a unified or merged view of directory contents. Typically this is
done by reading the directory entries of all the union'ed layers (starting
from the top and working downwards) and merging the result by eliminating
the duplicate entries. This is done by extending the getdents/readdir system
calls to support the notion of union'ed directories.

Here I try to describe briefly the approaches we have tried with Union Mount
and the issues we are facing. The intention is to solicit feedback and
suggestions on how to solve this problem of merging directory contents.

Approach 1
----------
This is the approach which is present in the current Union Mount patches.

In this approach, the directory entries are read starting from the topmost
directory and they are maintained in a cache. Subsequently, when the entries
from the lower directories are read, they are checked for duplicates (in the
cache) before being passed out to the user space. There can be multiple calls
to readdir/getdents routines for reading the entries of a single directory.
But the union directory cache is not maintained across these calls. Instead,
for every call, the previously read entries are re-read into the cache
and newly read entries are compared against these for duplicates before
they are returned to user space.
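
To make the duplicate elimination concrete, here is a minimal user-space
sketch of a single top-down pass. It is not the Union Mount patch code: the
names struct layer, struct seen_cache and union_merge() are made up for
illustration, layers are plain arrays of names rather than real directories,
and the cache is a linear array where a real one would likely use a hash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEEN 64

/* Hypothetical in-memory stand-in for one layer's directory. */
struct layer {
    const char **names;
    size_t count;
};

/* Names already emitted; a real cache would hash, this one scans. */
struct seen_cache {
    const char *names[MAX_SEEN];
    size_t count;
};

static int seen_before(const struct seen_cache *c, const char *name)
{
    for (size_t i = 0; i < c->count; i++)
        if (strcmp(c->names[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Walk the layers top-down and copy each name to out[] the first time
 * it is seen. Returns the number of merged entries.
 */
static size_t union_merge(const struct layer *layers, size_t nlayers,
                          const char **out, size_t out_max)
{
    struct seen_cache cache = { .count = 0 };
    size_t n = 0;

    assert(layers != NULL && out != NULL);
    for (size_t l = 0; l < nlayers; l++) {
        for (size_t i = 0; i < layers[l].count; i++) {
            const char *name = layers[l].names[i];

            if (seen_before(&cache, name))
                continue;       /* duplicate of an upper-layer entry */
            if (cache.count == MAX_SEEN || n == out_max)
                return n;       /* sketch: stop when out of room */
            cache.names[cache.count++] = name;
            out[n++] = name;
        }
    }
    return n;
}
```

With a top layer holding "a" and "b" and a lower layer holding "b" and "c",
the merged view is "a", "b", "c". In Approach 1 the cache built here is
thrown away when the call returns, so the next call has to repeat the walk
up to the point where it left off.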

Since Union Mount does unioning at VFS layer, the file descriptor
of the topmost directory represents all the underlying directories. So
even though readdir() happens on the same file struct, it needs to internally
obtain file structs for lower directories and issue readdir() on them
separately. While this itself is fine, it causes a problem when the
directory is lseek'ed.

In this approach, the file position of the topmost directory is linearly
scaled to accommodate the (inode) sizes of the underlying directories.
For example, if the union has two layers (with inode sizes 100 each), then
at the end of reading the lower directory, the file->f_pos of the topmost
directory would be 200. So, it is possible for the file position
(of the topmost directory) to exceed its inode size.
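
The scaling can be sketched as a mapping from the union-wide position back
to a layer and an offset inside it. This is only an illustration under the
assumption that every layer's offsets stay below its inode size; the names
struct layer_pos and union_pos_to_layer() are hypothetical and do not come
from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Where a union-wide position lands: which layer, and how far into it. */
struct layer_pos {
    size_t layer;
    long offset;
};

/*
 * Map a position in the union directory to (layer, offset), treating
 * the layers' directory sizes as laid end to end, topmost first.
 * Returns 0 on success, -1 if pos is negative or past the last layer.
 */
static int union_pos_to_layer(const long *sizes, size_t nlayers, long pos,
                              struct layer_pos *out)
{
    assert(sizes != NULL && out != NULL);
    if (pos < 0)
        return -1;
    for (size_t l = 0; l < nlayers; l++) {
        if (pos < sizes[l]) {
            out->layer = l;
            out->offset = pos;
            return 0;
        }
        pos -= sizes[l];
    }
    return -1;  /* at or beyond the end of the union */
}
```

With two layers of size 100 each, position 150 maps to offset 50 in the
lower layer and position 200 is the end of the union. The mapping has no
sensible answer once a layer hands out offsets larger than its size.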

This approach works only if all the layers of the union have flat file
directories (like ext2), and for such directories it allows a sane lseek()
behaviour on the union. It fails for those filesystems which return a
directory offset (dirent->d_off) greater than the inode size.

Ref: http://lkml.org/lkml/2007/7/30/174

Approach 2
----------
This approach was part of a few versions of Union Mount patches.

Like Approach 1, this also maintains a cache of directory
entries, but this cache persists across readdir calls. So unlike the
previous approach, here the cache is not destroyed and rebuilt for
every readdir(). Instead the cache information is left hanging off
the file structure of the topmost directory. In addition to the
cached entries, it also stores information about the current
directory (vfsmnt, dentry) in the union stack on which the last readdir() was
done and also the offset (file->f_pos) at which that readdir() stopped.
A subsequent readdir() will fetch this information from the file struct (of
the top layer) and start reading the entries from where the previous readdir()
had stopped. This approach doesn't try to interpret or change file->f_pos.

Since the information about the directory and the offset within it is
stored separately when the readdir() ends, this approach works for all
filesystems, irrespective of the method used by them to encode directory's
file->f_pos. But again, this can't support a sane lseek() on union'ed
directories in its current form.
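
As a rough user-space sketch of the state that persists between calls
(not the actual patch code): struct union_rdstate, its fields and
union_readdir() are hypothetical names, layers are plain arrays of names,
and the per-layer offset is a simple array index standing in for whatever
the underlying filesystem stores in file->f_pos.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEEN 64

struct layer {
    const char **names;
    size_t count;
};

/*
 * State kept between calls; in the description above this hangs off
 * the file structure of the topmost directory.
 */
struct union_rdstate {
    size_t cur_layer;               /* layer the last call stopped in */
    size_t cur_off;                 /* offset within that layer */
    const char *seen[MAX_SEEN];     /* entries returned so far */
    size_t nseen;
};

static int already_seen(const struct union_rdstate *st, const char *name)
{
    for (size_t i = 0; i < st->nseen; i++)
        if (strcmp(st->seen[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Return up to max entries, resuming where the previous call stopped.
 * Returns the number of entries stored in out[]; 0 means nothing more
 * can be returned.
 */
static size_t union_readdir(struct union_rdstate *st,
                            const struct layer *layers, size_t nlayers,
                            const char **out, size_t max)
{
    size_t n = 0;

    assert(st != NULL && layers != NULL && out != NULL);
    while (st->cur_layer < nlayers && n < max) {
        const struct layer *l = &layers[st->cur_layer];

        if (st->cur_off == l->count) {  /* layer exhausted, go down */
            st->cur_layer++;
            st->cur_off = 0;
            continue;
        }
        const char *name = l->names[st->cur_off];
        if (!already_seen(st, name)) {
            if (st->nseen == MAX_SEEN)
                break;                  /* sketch: cache full, stop early */
            st->seen[st->nseen++] = name;
            out[n++] = name;
        }
        st->cur_off++;
    }
    return n;
}
```

Nothing here interprets the offset of one layer in terms of another, which
is why the scheme is independent of the filesystem type. It is also why an
arbitrary seek has nowhere to land: the only positions the state knows about
are the ones it has walked through.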

Ref: http://lkml.org/lkml/2007/6/20/21

Approach 3
----------
Christoph Hellwig suggested that to avoid a single file struct representing
multiple objects, ->readdir() which is currently a file operation should
instead be changed to an inode operation. But it is not clear at the moment
how this could help, because getdents(2) and readdir(2) restrict us to using
an open file descriptor and there is one fd for union'ed directories. Moreover we need
a single offset element to correctly represent the offset into lower
directories so as to support lseek(2) correctly.

Ref: http://lkml.org/lkml/2007/6/20/107

Questions
---------
The main problem in getting a sane readdir() implementation in Union Mount
is the fact that a single vfs object (file structure) is used to represent
more than one (underlying) directory. Because of this, it is unclear as to
how lseek(2) needs to behave in such cases.

First of all, should we even expect a sane lseek(2) on union mounted
directories? If not, will Approach 2, which works uniformly for
all filesystem types, be acceptable?

If lseek(2) needs to be supported, then how do we define the seek behaviour
when two different types of directories are union'ed? For example, how do we
define lseek(2) on a union of an ext2 directory on top of an NFS directory?
Since both of them use different encoding methods for filp->f_pos, how do we
establish a common lseek(2) behaviour here?

And finally, what is the use case for directory seek? Would anybody walk
directory by directory by seeking into a directory file?

Regards,
Bharata.


2007-09-07 07:40:36

by J. R. Okajima

Subject: Re: [RFC] Union Mount: Readdir approaches


Hello Bharata,
I am developing a linux stackable/unification filesystem too.

Bharata B Rao:
> Questions
> ---------
:::
> First of all, should we even expect a sane lseek(2) on union mounted
> directories ? If not, will the Approach 2, which works uniformly for
> all filesystem types be acceptable ?
>
> If lseek(2) needs to be supported, then how do we define the seek behaviour
> when two different types of directories are union'ed ? For eg. how do we define
> lseek(2) on a union of ext2 directory on top of a nfs directory ? Since both
> of them use different encoding methods for filp->f_pos, how do we establish
> a common lseek(2) behaviour here ?
>
> And finally, what is the use case for directory seek ? Would anybody walk
> directory by directory by seeking into a directory file ?

Although I don't remember exactly, NFS or smbfs seeks in a
directory. Additionally, any user process can call seekdir or
something. So I believe any stackable/unification filesystem should
support it.

Here is my approach. While I don't think it is the best approach since
it consumes much memory and cpu, I hope it helps you. (or assists
you? Sorry, I don't know the correct English word)

- the stackable fs has its own inode, file, dentry object, which has an
array for the underlying inode pointers. and the whiteout is a regular
file with a special name, instead of a flag in inode. this is the biggest
architectural difference from your unification embedded in VFS.
- the virtual dir inode object has a cache for its child entries. it is
called vdir. the cache has a version and a customizable lifetime too.
- all the existing underlying (same-named) dir are opened too. the file
objects are stored in the virtual file object as an array.
- the virtual file object has a cache for its child entries too. it is a
copy of the one in the inode object.

When the first readdir is issued:
- call vfs_readdir for every underlying opened dir (file) object.
- store every entry in either the hash table for the result or the one for
whiteouts, if a same-named entry does not already exist in the tables.
- to improve performance, the memory allocated for the hash
tables is managed in a pointer array, and the elements are
logically concatenated by pointers.
- the pointer to the result table, the version, and the current jiffies
are set in vdir, which is a cache in an inode.
- all cache are copied to a member in a file object.
- the index of the cache memory block and the offset in an array is
handled as the seek position.
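
One way to picture the last item is packing the block index and the offset
inside the block into a single seek value. This is only a sketch of the
idea, not how aufs lays it out: the 16-bit split and the vdir_pos_*() names
are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed split: low 16 bits are the offset inside a block. */
#define VDIR_OFF_BITS 16
#define VDIR_OFF_MASK ((UINT64_C(1) << VDIR_OFF_BITS) - 1)

/* Pack (block index, offset in block) into one seek position. */
static uint64_t vdir_pos_encode(uint32_t block, uint32_t off)
{
    assert(off <= VDIR_OFF_MASK);
    return ((uint64_t)block << VDIR_OFF_BITS) | off;
}

/* Recover the index of the cache memory block. */
static uint32_t vdir_pos_block(uint64_t pos)
{
    return (uint32_t)(pos >> VDIR_OFF_BITS);
}

/* Recover the offset inside that block. */
static uint32_t vdir_pos_off(uint64_t pos)
{
    return (uint32_t)(pos & VDIR_OFF_MASK);
}
```

Position 0 decodes to block 0, offset 0, so rewinding lands on the first
cached entry. Because the position is defined on the cache, it does not
depend on how any underlying filesystem encodes its own directory offsets.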

Suppose the application issues this sequence:
- opendir()
- readdir()
- creat or unlink an entry under the dir
- readdir()

When an entry under the dir was removed or added, the inode version will
be updated. Since readdir can compare it with the cached version or the
lifetime (jiffies) in the file object, it can refresh the entries. But
in this case, it doesn't, since the file position is not 0. If the
application needs the latest entries, it has to call rewinddir.
The cache in the file object will be updated only when it is obsolete AND
the file position is 0.
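
The refresh rule above can be written down as a small predicate. This is a
user-space sketch, not aufs code: struct vdir_cache, its fields and both
function names are hypothetical, and plain integers stand in for the inode
version and jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vdir_cache {
    uint64_t version;   /* directory inode version when the cache was built */
    uint64_t built_at;  /* timestamp (jiffies in the kernel) at build time */
};

/* The cache is stale if the directory changed or the lifetime ran out. */
static bool vdir_is_obsolete(const struct vdir_cache *c,
                             uint64_t inode_version, uint64_t now,
                             uint64_t lifetime)
{
    assert(c != NULL);
    return c->version != inode_version || now - c->built_at > lifetime;
}

/* Refresh only at position 0, i.e. after opendir() or rewinddir(). */
static bool vdir_should_refresh(const struct vdir_cache *c,
                                uint64_t inode_version, uint64_t now,
                                uint64_t lifetime, uint64_t file_pos)
{
    return file_pos == 0 && vdir_is_obsolete(c, inode_version, now, lifetime);
}
```

A reader in the middle of a directory therefore keeps a consistent snapshot
even after a creat or unlink, and only sees the change once it rewinds.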

When a dir which already has its vdir is opened, the cache in the inode
object will be used without calling vfs_readdir, after checking the
version and the lifetime which are stored in the inode object. If it is
obsolete, vfs_readdir will be called again in order to update the cache
in the inode.


If you are interested in this approach, please refer to
http://aufs.sf.net. It is working and used by several people.


Junjiro Okajima

2007-09-07 07:59:23

by Bharata B Rao

Subject: Re: [RFC] Union Mount: Readdir approaches

On Fri, Sep 07, 2007 at 04:31:26PM +0900, [email protected] wrote:
>
> When the first readdir is issued:
> - call vfs_readdir for every underlying opened dir (file) object.
> - store every entry to either the hash table for the result or the
> whiteout, when the same-named entry didn't exist in the tables.
> - to improvement the performance, the allocated memory for the hash
> tables are managed in a pointer array. and the elements are
> concatinated logically by the pointer.
> - the pointer for the result-table, the version, and the currect jiffies
> are set to vdir, which is a cache in an inode.
> - all cache are copied to a member in a file object.
> - the index of the cache memory block and the offset in an array is
> handled as the seek position.

Ok, interesting approach. So you define the seek behaviour on your
directory cache rather than allowing the underlying filesystems to
interpret the seek. I guess we can do something similar with Union
Mounts also.

> If you are interested in this approach, please refer to
> http://aufs.sf.net. It is working and used by several people.

Will look at it. And thanks Junjiro for your detailed explanation of
the aufs approach.

Regards,
Bharata.

2007-09-07 11:57:50

by Al Boldi

Subject: Re: [RFC] Union Mount: Readdir approaches

[email protected] wrote:
> If you are interested in this approach, please refer to
> http://aufs.sf.net. It is working and used by several people.

Any chance you can post a patch against 2.6.22?


Thanks!

--
Al

2007-09-07 12:50:21

by J. R. Okajima

Subject: Re: [RFC] Union Mount: Readdir approaches


Al Boldi:
> > If you are interested in this approach, please refer to
> > http://aufs.sf.net. It is working and used by several people.
>
> Any chance you can post a patch against 2.6.22?

Unfortunately there are many reasons that keep me from sending a
patch.
- it is large (48 source files).
- it supports linux-2.6.16 and later, and uses
#if ... LINUX_VERSION_CODE ...
conditions a lot. I want to keep handling them as long as possible
since some users use it.
- there are several pieces of code I want to rewrite to be cleaner or
better behaved.
- and other reasons.

But if you really want to read or try it, you can get all source files
from sourceforge. Read http://aufs.sf.net and try,
$ cvs -d:pserver:[email protected]:/cvsroot/aufs login
(CVS password is empty)
$ cvs -z3 -d:pserver:[email protected]:/cvsroot/aufs co aufs


Regards,
Junjiro Okajima

2007-09-07 14:39:21

by Jan Engelhardt

Subject: Re: [RFC] Union Mount: Readdir approaches

On Sep 7 2007 11:16, Bharata B Rao wrote:

>Questions
>---------
>The main problem in getting a sane readdir() implementation in Union Mount
>is the fact that a single vfs object (file structure) is used to represent
>more than one (underlying) directory. Because of this, it is unclear as to
>how lseek(2) needs to behave in such cases.
>
>First of all, should we even expect a sane lseek(2) on union mounted
>directories ? If not, will the Approach 2, which works uniformly for
>all filesystem types be acceptable ?

Filesystems also have this problem. http://lkml.org/lkml/2007/4/7/107
If you ask me, we should entirely do away with the telldir/seekdir
black magic (and add appropriate compat code).



Jan
--

2007-09-07 17:40:27

by Josef 'Jeff' Sipek

Subject: Re: [RFC] Union Mount: Readdir approaches

On Fri, Sep 07, 2007 at 01:28:55PM +0530, Bharata B Rao wrote:
> On Fri, Sep 07, 2007 at 04:31:26PM +0900, [email protected] wrote:
> >
> > When the first readdir is issued:
> > - call vfs_readdir for every underlying opened dir (file) object.
> > - store every entry to either the hash table for the result or the
> > whiteout, when the same-named entry didn't exist in the tables.
> > - to improvement the performance, the allocated memory for the hash
> > tables are managed in a pointer array. and the elements are
> > concatinated logically by the pointer.
> > - the pointer for the result-table, the version, and the currect jiffies
> > are set to vdir, which is a cache in an inode.
> > - all cache are copied to a member in a file object.
> > - the index of the cache memory block and the offset in an array is
> > handled as the seek position.
>
> Ok, interesting approach. So you define the seek behaviour on your
> directory cache rather than allowing the underlying filesystems to
> interpret the seek. I guess we can do something similar with Union
> Mounts also.

Unless I misunderstood something, Unionfs uses the same approach. Even
Unionfs's ODF branch does the same thing. The major difference is that we
keep the cache in a file on disk.

Josef 'Jeff' Sipek.

--
Evolution, n.:
A hypothetical process whereby infinitely improbable events occur with
alarming frequency, order arises from chaos, and no one is given credit.

2007-09-07 17:54:44

by Erez Zadok

Subject: Re: [RFC] Union Mount: Readdir approaches

In message <[email protected]>, "Josef 'Jeff' Sipek" writes:
> On Fri, Sep 07, 2007 at 01:28:55PM +0530, Bharata B Rao wrote:
> > On Fri, Sep 07, 2007 at 04:31:26PM +0900, [email protected] wrote:
> > >
> > > When the first readdir is issued:
> > > - call vfs_readdir for every underlying opened dir (file) object.
> > > - store every entry to either the hash table for the result or the
> > > whiteout, when the same-named entry didn't exist in the tables.
> > > - to improvement the performance, the allocated memory for the hash
> > > tables are managed in a pointer array. and the elements are
> > > concatinated logically by the pointer.
> > > - the pointer for the result-table, the version, and the currect jiffies
> > > are set to vdir, which is a cache in an inode.
> > > - all cache are copied to a member in a file object.
> > > - the index of the cache memory block and the offset in an array is
> > > handled as the seek position.
> >
> > Ok, interesting approach. So you define the seek behaviour on your
> > > directory cache rather than allowing the underlying filesystems to
> > interpret the seek. I guess we can do something similar with Union
> > Mounts also.
>
> Unless I missunderstood something, Unionfs uses the same approach. Even
> Unionfs's ODF branch does the same thing. The major difference is that we
> keep the cache in a file on a disk.

Yup.

Bharata, in the long run, storing a cache of the readdir state on disk, is
the best approach by far. Since you already spend the CPU and memory
resources to create a merged view, storing it on disk as a contiguous file
isn't that much more effort. That effort pays off later on esp. if the
directories don't change often:

- you get a compatible behavior with seekdir/telldir (no matter how
braindead that interface is :-)

- for subsequent directory reading, your performance actually improves
because you don't have to repeat the duplicate elimination and whiteout
processing -- just read the cached file from disk as any other file. You
then benefit from traditional readahead, and from not having to cache the
entire contents of the readdir state file, so it falls under normal
paging/flushing policies.
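
To illustrate why a contiguous file makes seekdir/telldir easy, here is a
user-space sketch that stores the merged entries as fixed-size records, so
a directory position is just a record index. This is not the ODF on-disk
format: struct dir_rec, REC_NAME_MAX, cache_write() and cache_read_at() are
all made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define REC_NAME_MAX 256

/* Hypothetical fixed-size record; not the ODF on-disk format. */
struct dir_rec {
    char name[REC_NAME_MAX];
};

/* Append merged names to the cache file, one record per entry. */
static int cache_write(FILE *f, const char **names, size_t count)
{
    assert(f != NULL && names != NULL);
    for (size_t i = 0; i < count; i++) {
        struct dir_rec rec;

        memset(&rec, 0, sizeof(rec));
        strncpy(rec.name, names[i], REC_NAME_MAX - 1);
        if (fwrite(&rec, sizeof(rec), 1, f) != 1)
            return -1;
    }
    return fflush(f) == 0 ? 0 : -1;
}

/* Read the entry at record index idx; the index doubles as telldir(). */
static int cache_read_at(FILE *f, long idx, struct dir_rec *out)
{
    assert(f != NULL && out != NULL);
    if (idx < 0 || fseek(f, idx * (long)sizeof(*out), SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof(*out), 1, f) == 1 ? 0 : -1;
}
```

Once the merged view has been written, reading entry N needs no duplicate
elimination or whiteout processing at all, only a seek and a read, and the
file is paged in and out like any other.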

Any policy which merges the readdir info and keeps it in memory indefinitely
is problematic -- you increase average memory pressure on the system over a
longer period of time; and when you purge your readdir state from memory,
you have to recreate it from scratch, re-consuming the same CPU/memory
resources.

Our ODF code implements the readdir state caching policy, as described in
the ODF design document here:

<http://www.filesystems.org/unionfs-odf.txt>

Finally, I don't think it'll be so easy to get rid of seekdir/telldir, b/c
some of it is the default behavior of non-linux NFS/smb clients (we've seen
it with Solaris NFS clients).

Erez.

2007-09-07 23:08:24

by Matt Keenan

Subject: Re: [RFC] Union Mount: Readdir approaches

[email protected] wrote:
> Hello Bharata,
> I am developing a linux stackable/unification filesystem too.
>
> Bharata B Rao:
>
>> Questions
>> ---------
>>
> :::
>
>> First of all, should we even expect a sane lseek(2) on union mounted
>> directories ? If not, will the Approach 2, which works uniformly for
>> all filesystem types be acceptable ?
>>
>> If lseek(2) needs to be supported, then how do we define the seek behaviour
>> when two different types of directories are union'ed ? For eg. how do we define
>> lseek(2) on a union of ext2 directory on top of a nfs directory ? Since both
>> of them use different encoding methods for filp->f_pos, how do we establish
>> a common lseek(2) behaviour here ?
>>
>> And finally, what is the use case for directory seek ? Would anybody walk
>> directory by directory by seeking into a directory file ?
>>
>
> Although I don't remember exactly, NFS or smbfs seek for a
> directory. Additionally, any user process can call seekdir or
> something. So I believe any stackable/unification filesystem should
> support it.
>
> Here is my approach. While I don't think it is the best approach since
> it consumes much memory and cpu, I hope it help you. (or assist
> you? Sorry, I don't know correct English word)
>
> - the stackable fs has its own inode, file, dentry object, which has an
> array for the underlying inode pointers. and the whiteout is a regular
> file with a special name, instead of a flag in inode. this is the most
> different architecture from your unification embeded in VFS.
> - the vritual dir inode object has a cache for its child entries. it is
> called vdir. the cache has a version and a customizlable lifetime too.
> - all the existing underlying (same-named) dir are opened too. the file
> objects are stored in the virtual file object as an array.
> - the virtual file object has a cache for its child entries too. it is a
> copy of the one in the inode object.
>
> When the first readdir is issued:
> - call vfs_readdir for every underlying opened dir (file) object.
> - store every entry to either the hash table for the result or the
> whiteout, when the same-named entry didn't exist in the tables.
> - to improvement the performance, the allocated memory for the hash
> tables are managed in a pointer array. and the elements are
> concatinated logically by the pointer.
> - the pointer for the result-table, the version, and the currect jiffies
> are set to vdir, which is a cache in an inode.
> - all cache are copied to a member in a file object.
> - the index of the cache memory block and the offset in an array is
> handled as the seek position.
>
> In the case of the application issued this sequence:
> - opendir()
> - readdir()
> - creat or unlink an entry under the dir
> - readdir()
>
> When an entry under the dir was removed or added, the inode version will
> be updated. Since readdir can compare it with the cached version or the
> lifetime (jiffies) in the file object, it can refresh the entries. But
> in this case, it doesn't, since the file position is not 0. If the
> application needs the latest entries, it has to call rewinddir.
> The cache in the file object will updated only the case of obsoleted AND
> the file position is 0.
>
> When a dir who has already its vdir is opened, the cache in the inode
> object will be used without calling vfs_readdir, after checking the
> version and the lifetime which are stored in the inode object. If it is
> obsoleted, vfs_readdir will be called again in order to update the cache
> in the inode.
>
>
This sounds like a good approach. How does aufs handle low memory
situations? Union mounts seem to be quite common on low memory embedded
systems. Is there a way for the VM to signal to aufs/the union
filesystem to trim its cache? Also on the memory consumption front I
guess you could get the union fs to refer to a singleton name entry
directly instead of creating a new virtual inode et al. This may lead to
some unusualness though for mounts over different filesystems that have
different length directory files (eg vfat and ext3). This does run
counter to the model described above in some ways.

Matt

2007-09-10 02:18:01

by J. R. Okajima

Subject: Re: [RFC] Union Mount: Readdir approaches


Hello Jeff,

"Josef 'Jeff' Sipek":
> Unless I missunderstood something, Unionfs uses the same approach. Even
> Unionfs's ODF branch does the same thing. The major difference is that we
> keep the cache in a file on a disk.

The approach unionfs-2.1.2 took differs from mine.
The major differences are:
- in my approach, the cache in struct file is the completed virtual
disk block for the dir.
- readdir doesn't have to call vfs_readdir when the cache exists and is
valid. it simply calls filldir.
- it supports SEEK_END.


Junjiro Okajima

2007-09-10 02:18:37

by J. R. Okajima

Subject: Re: [RFC] Union Mount: Readdir approaches


Matt Keenan:
> This sounds like a good approach. How does aufs handle low memory
> situations? Union mounts seem to be quite common on low memory embedded
> systems. Is there a way for the VM to signal to aufs/the union
> filesystem to trim its cache? Also on the memory consumption front I

I also want to find a way to make this implementation smaller.

Basically there is no such signal or message.
The cache for entries will be discarded when the inode cache is shrunk
or the file is closed (and the file object is destroyed), because it is
in the inode or file object.
When the inode cache is shrunk depends on your system memory size and
so on.


Junjiro Okajima

2007-09-10 03:46:55

by Bharata B Rao

Subject: Re: [RFC] Union Mount: Readdir approaches

On Fri, Sep 07, 2007 at 01:39:41PM -0400, Josef 'Jeff' Sipek wrote:
> On Fri, Sep 07, 2007 at 01:28:55PM +0530, Bharata B Rao wrote:
> > On Fri, Sep 07, 2007 at 04:31:26PM +0900, [email protected] wrote:
> > >
> > > When the first readdir is issued:
> > > - call vfs_readdir for every underlying opened dir (file) object.
> > > - store every entry to either the hash table for the result or the
> > > whiteout, when the same-named entry didn't exist in the tables.
> > > - to improvement the performance, the allocated memory for the hash
> > > tables are managed in a pointer array. and the elements are
> > > concatinated logically by the pointer.
> > > - the pointer for the result-table, the version, and the currect jiffies
> > > are set to vdir, which is a cache in an inode.
> > > - all cache are copied to a member in a file object.
> > > - the index of the cache memory block and the offset in an array is
> > > handled as the seek position.
> >
> > Ok, interesting approach. So you define the seek behaviour on your
> > directory cache rather than allowing the underlying filesystems to
> > interpret the seek. I guess we can do something similar with Union
> > Mounts also.
>
> Unless I missunderstood something, Unionfs uses the same approach. Even

But in the version of unionfs present in -mm, lseek on directories is
still limited in functionality as it allows seeking to only the
beginning and to the current position.

> Unionfs's ODF branch does the same thing. The major difference is that we
> keep the cache in a file on a disk.

And as Erez explained, it is ODF which is allowing you to have a
complete lseek behaviour.

Regards,
Bharata.

2007-09-10 05:15:30

by Bharata B Rao

Subject: Re: [RFC] Union Mount: Readdir approaches

On Fri, Sep 07, 2007 at 01:54:18PM -0400, Erez Zadok wrote:
> In message <[email protected]>, "Josef 'Jeff' Sipek" writes:
> > On Fri, Sep 07, 2007 at 01:28:55PM +0530, Bharata B Rao wrote:
> > > On Fri, Sep 07, 2007 at 04:31:26PM +0900, [email protected] wrote:
> > > >
> > > > When the first readdir is issued:
> > > > - call vfs_readdir for every underlying opened dir (file) object.
> > > > - store every entry to either the hash table for the result or the
> > > > whiteout, when the same-named entry didn't exist in the tables.
> > > > - to improvement the performance, the allocated memory for the hash
> > > > tables are managed in a pointer array. and the elements are
> > > > concatinated logically by the pointer.
> > > > - the pointer for the result-table, the version, and the currect jiffies
> > > > are set to vdir, which is a cache in an inode.
> > > > - all cache are copied to a member in a file object.
> > > > - the index of the cache memory block and the offset in an array is
> > > > handled as the seek position.
> > >
> > > Ok, interesting approach. So you define the seek behaviour on your
> > > > directory cache rather than allowing the underlying filesystems to
> > > interpret the seek. I guess we can do something similar with Union
> > > Mounts also.
> >
> > Unless I missunderstood something, Unionfs uses the same approach. Even
> > Unionfs's ODF branch does the same thing. The major difference is that we
> > keep the cache in a file on a disk.
>
> Yup.
>
> Bharata, in the long run, storing a cache of the readdir state on disk, is
> the best approach by far. Since you already spend the CPU and memory
> resources to create a merged view, storing it on disk as a contiguous file
> isn't that much more effort. That effort pays off later on esp. if the
> directories don't change often:

Erez, I agree that there are positives in storing the readdir state
on disk, but ...

>
> - you get a compatible behavior with seekdir/telldir (no matter how
> braindead that interface is :-)

The lseek problem can also be solved by defining the seek on the cached (in
memory) readdir state. (similar to aufs)

>
> - for subsequent directory reading, your performance actually improves
> because you don't have to repeat the duplicate elimination and whiteout
> processing -- just read the cached file from disk as any other file. You
> then benefit from traditional readahead, and from not having to cache the
> entire contents of the readdir state file, so it falls under normal
> paging/flushing policies.

I guess the same performance benefits can be obtained if we leave the
readdir state in memory for some time. In Approach 2, which I
described in my original post, the readdir state was stored as part of
the file structure and it remained there until the last close of the
directory.

>
> Any policy which merges the readdir info and keeps it in memory indefinitely
> is problematic -- you increase average memory pressure on the system over a
> longer period of time; and when you purge your readdir state from memory,
> you have to recreate it from scratch, re-consuming the same CPU/memory
> resources.

Yes, keeping the state in memory indefinitely is problematic.
But we can always purge it under memory pressure. If it comes
to that then it is a tradeoff between recreating the state after purging
and having the state stored in a separate physical file system.

Not related directly to readdir, but I had concerns about ODF:

- Creating/Replicating the entire directory tree of the union. So you
can potentially have a very large tree duplicated (of course with zero
length files, but still) in ODF.

- Storing whiteouts in ODF might be a feasible solution for unionfs, but for
union mount it looks like an overhead. You can afford to have an
extra (whiteout) lookup into ODF from your unionfs lookup code, since
you do all this from unionfs filesystem code. But doing anything similar
with union mounts from the VFS layer is going to look ugly.

Regards,
Bharata.

2007-09-12 02:06:51

by J. R. Okajima

Subject: Re: [RFC] Union Mount: Readdir approaches


"Josef 'Jeff' Sipek":
> So, if I understand correctly, you create the entire block as if you were
> going to write to disk? Unionfs keeps the data in a linked list.

Basically yes.
But the dir block in the cache has no holes; it is contiguous memory.


Junjiro Okajima

2007-09-12 10:46:37

by Al Boldi

Subject: Re: [RFC] Union Mount: Readdir approaches

[email protected] wrote:
> But if you really want to read or try it, you can get all source files
> from sourceforge. Read http://aufs.sf.net and try,
> $ cvs -d:pserver:[email protected]:/cvsroot/aufs login
> (CVS password is empty)
> $ cvs -z3 -d:pserver:[email protected]:/cvsroot/aufs co
> aufs

This is way too complicated, but I tried it anyway, only to find it doesn't
compile:

CHK include/linux/version.h
CHK include/linux/utsrelease.h
CALL scripts/checksyscalls.sh
CHK include/linux/compile.h
CC fs/aufs/dentry.o
fs/aufs/dentry.c:630:1: directives may not be used inside a macro argument
fs/aufs/dentry.c:629:65: unterminated argument list invoking macro "unlikely"
fs/aufs/dentry.c: In function `h_d_revalidate':
fs/aufs/dentry.c:631: `unlikely' undeclared (first use in this function)
fs/aufs/dentry.c:631: (Each undeclared identifier is reported only once
fs/aufs/dentry.c:631: for each function it appears in.)
fs/aufs/dentry.c:635: parse error before ')' token
fs/aufs/dentry.c:571: warning: unused variable `h_plus'
fs/aufs/dentry.c:571: warning: unused variable `is_nfs'
fs/aufs/dentry.c:572: warning: unused variable `p'
fs/aufs/dentry.c:575: warning: unused variable `h_inode'
fs/aufs/dentry.c:575: warning: unused variable `h_cached_inode'
fs/aufs/dentry.c:576: warning: unused variable `h_mode'
fs/aufs/dentry.c:578: warning: unused variable `reval'
fs/aufs/dentry.c:639: label `err' used but not defined
fs/aufs/dentry.c: At top level:
fs/aufs/dentry.c:642: warning: type defaults to `int' in declaration of
`reval'
fs/aufs/dentry.c:642: warning: initialization makes integer from pointer
without a cast
fs/aufs/dentry.c:642: warning: data definition has no type or storage class
fs/aufs/dentry.c:643: parse error before "if"
fs/aufs/dentry.c:649: warning: type defaults to `int' in declaration of `err'
fs/aufs/dentry.c:649: `h_dentry' undeclared here (not in a function)
fs/aufs/dentry.c:649: `p' undeclared here (not in a function)
fs/aufs/dentry.c:649: called object is not a function
fs/aufs/dentry.c:649: warning: data definition has no type or storage class
fs/aufs/dentry.c:650: parse error before "if"
fs/aufs/dentry.c:653: warning: type defaults to `int' in declaration of
`fake_dm_release'
fs/aufs/dentry.c:653: warning: parameter names (without types) in function
declaration
fs/aufs/dentry.c:653: conflicting types for `fake_dm_release'

...and more...


It would make matters much easier if you could just publish a link to a
combo-patch against at least the latest stable kernel, like 2.6.22.


Thanks!

--
Al

2007-09-12 18:25:29

by Jan Engelhardt

Subject: Re: [RFC] Union Mount: Readdir approaches


On Sep 12 2007 13:46, Al Boldi wrote:
>[email protected] wrote:
>> But if you really want to read or try it, you can get all source files
>> from sourceforge. Read http://aufs.sf.net and try,
>> $ cvs -d:pserver:[email protected]:/cvsroot/aufs login
>> (CVS password is empty)
>> $ cvs -z3 -d:pserver:[email protected]:/cvsroot/aufs co
>> aufs
>
>This is way too complicated, but I tried it anyway, only to find it doesn't
>compile:

cvs up -D 2007-08-07

that one works ;-)



Jan
--

2007-09-13 02:16:33

by J. R. Okajima

Subject: Re: [RFC] Union Mount: Readdir approaches


Jan Engelhardt:
> On Sep 12 2007 13:46, Al Boldi wrote:
::
> >This is way too complicated, but I tried it anyway, only to find it doesn't
> >compile:
>
> cvs up -D 2007-08-07
>
> that one works ;-)

Jan, do you mean that only the one-month-old version could be compiled?
That is rather surprising, since I know some users have compiled the newer
versions. Could you tell me how you ran 'make'? I think a personal
mail to me is preferable to the ML.

To Al Boldi,
Could you send me directly the message that Jan quoted? It was not
delivered to me.

Thanks in advance.
Junjiro Okajima

2007-09-13 05:33:28

by Al Boldi

Subject: Re: [RFC] Union Mount: Readdir approaches

[email protected] wrote:
> Jan Engelhardt:
> > On Sep 12 2007 13:46, Al Boldi wrote:
> > >This is way too complicated, but I tried it anyway, only to find it
> > > doesn't compile:
> >
> > cvs up -D 2007-08-07
> >
> > that one works ;-)
>
> Jan, do you mean that only the one-month-old version could be compiled?
> That is rather surprising, since I know some users have compiled the newer
> versions. Could you tell me how you ran 'make'? I think a personal
> mail to me is preferable to the ML.
>
> To Al Boldi,
> Could you send me directly the message that Jan quoted? It was not
> delivered to me.
>
> Thanks in advance.
> Junjiro Okajima

It turns out that the problem was this in dentry.c:

627- if (unlikely(do_udba
628- && !is_root
629- && (unhashed != d_unhashed(h_dentry)
630://#if 1
631- || name->len != h_dentry->d_name.len
632- || memcmp(name->name, h_dentry->d_name.name,
633- name->len)
634-//#endif
635- ))) {
636- LKTRTrace("unhash 0x%x 0x%x, %.*s %.*s\n",
637- unhashed, d_unhashed(h_dentry),
638- DLNPair(dentry), DLNPair(h_dentry));
639- goto err;
640- }

Commenting out the '#if 1' and '#endif' lines makes it compile now.

Works great too, even performance-wise. Needs more testing, though.

You really need to post a cleaned up version for review and possible
inclusion into mainline. It definitely looks solid.


Thanks!

--
Al

---------- Original Message ----------

Subject: Re: [RFC] Union Mount: Readdir approaches
Date: Wednesday 12 September 2007 01:46 pm
From: Al Boldi <[email protected]>
To: [email protected]
Cc: [email protected], [email protected],
[email protected], [email protected], Jan Blunck
<[email protected]>, "Josef 'Jeff' Sipek" <[email protected]>

[email protected] wrote:
> But if you really want to read or try it, you can get all source files
> from sourceforge. Read http://aufs.sf.net and try,
> $ cvs -d:pserver:[email protected]:/cvsroot/aufs login
> (CVS password is empty)
> $ cvs -z3 -d:pserver:[email protected]:/cvsroot/aufs co
> aufs

This is way too complicated, but I tried it anyway, only to find it doesn't
compile:

CHK include/linux/version.h
CHK include/linux/utsrelease.h
CALL scripts/checksyscalls.sh
CHK include/linux/compile.h
CC fs/aufs/dentry.o
fs/aufs/dentry.c:630:1: directives may not be used inside a macro argument
fs/aufs/dentry.c:629:65: unterminated argument list invoking macro "unlikely"
fs/aufs/dentry.c: In function `h_d_revalidate':
fs/aufs/dentry.c:631: `unlikely' undeclared (first use in this function)
fs/aufs/dentry.c:631: (Each undeclared identifier is reported only once
fs/aufs/dentry.c:631: for each function it appears in.)
fs/aufs/dentry.c:635: parse error before ')' token
fs/aufs/dentry.c:571: warning: unused variable `h_plus'
fs/aufs/dentry.c:571: warning: unused variable `is_nfs'
fs/aufs/dentry.c:572: warning: unused variable `p'
fs/aufs/dentry.c:575: warning: unused variable `h_inode'
fs/aufs/dentry.c:575: warning: unused variable `h_cached_inode'
fs/aufs/dentry.c:576: warning: unused variable `h_mode'
fs/aufs/dentry.c:578: warning: unused variable `reval'
fs/aufs/dentry.c:639: label `err' used but not defined
fs/aufs/dentry.c: At top level:
fs/aufs/dentry.c:642: warning: type defaults to `int' in declaration of
`reval'
fs/aufs/dentry.c:642: warning: initialization makes integer from pointer
without a cast
fs/aufs/dentry.c:642: warning: data definition has no type or storage class
fs/aufs/dentry.c:643: parse error before "if"
fs/aufs/dentry.c:649: warning: type defaults to `int' in declaration of `err'
fs/aufs/dentry.c:649: `h_dentry' undeclared here (not in a function)
fs/aufs/dentry.c:649: `p' undeclared here (not in a function)
fs/aufs/dentry.c:649: called object is not a function
fs/aufs/dentry.c:649: warning: data definition has no type or storage class
fs/aufs/dentry.c:650: parse error before "if"
fs/aufs/dentry.c:653: warning: type defaults to `int' in declaration of
`fake_dm_release'
fs/aufs/dentry.c:653: warning: parameter names (without types) in function
declaration
fs/aufs/dentry.c:653: conflicting types for `fake_dm_release'

...and more...


It would make matters much easier if you could just publish a link to a
combo-patch against at least the latest stable kernel, like 2.6.22.


Thanks!

--
Al

2007-09-13 05:53:51

by J. R. Okajima

Subject: Re: [RFC] Union Mount: Readdir approaches


Al Boldi:
> It turns out that the problem was this in dentry.c:
:::
> Commenting out the '#if 1' and '#endif' lines makes it compile now.
>
> Works great too, even performance-wise. Needs more testing, though.

Thank you for your report and forwarding your original message.
And I am glad that it is working for you.

It seems that '#if ... #endif' in an 'unlikely' macro argument is bad
coding. I don't know why my compiler and other users' compilers didn't
produce an error.
Anyway, I'll fix such code.
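
One possible way to restructure such code is sketched below. It is not the
actual aufs fix. The idea is to move the conditional part out of the macro
argument into a helper, so that no preprocessor directive appears between
the parentheses of unlikely(). The C standard (C99 6.10.3, paragraph 11)
leaves directives inside a function-like macro's argument list undefined,
which may explain why some compilers accept the original code and others
reject it. The names CHECK_NAME, name_differs() and needs_revalidate() are
hypothetical, and plain strings stand in for the dentry names.

```c
#include <string.h>

/* Same shape as the kernel's unlikely(); __builtin_expect is a GCC/Clang builtin. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical switch standing in for the '#if 1' in dentry.c. */
#define CHECK_NAME 1

/*
 * Hypothetical helper: the #if lives here, outside any macro argument.
 * Returns non-zero when the two names differ in length or content.
 */
static int name_differs(const char *name, const char *h_name)
{
#if CHECK_NAME
	size_t len = strlen(name);

	return len != strlen(h_name) || memcmp(name, h_name, len) != 0;
#else
	(void)name;
	(void)h_name;
	return 0;
#endif
}

/*
 * Hypothetical stand-in for the test at dentry.c lines 627-635.
 * The argument of unlikely() is now an ordinary expression.
 */
int needs_revalidate(int do_udba, int is_root, int unhashed,
		     int h_unhashed, const char *name, const char *h_name)
{
	return unlikely(do_udba
			&& !is_root
			&& (unhashed != h_unhashed
			    || name_differs(name, h_name)));
}
```

With this layout, switching CHECK_NAME between 1 and 0 changes only the
helper body, and the unlikely() call stays the same in both cases.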


> You really need to post a cleaned up version for review and possible
> inclusion into mainline. It definitely looks solid.

I'll try in the future.


Thanks
Junjiro Okajima

2007-09-13 06:29:32

by Jan Engelhardt

Subject: Re: [RFC] Union Mount: Readdir approaches


On Sep 13 2007 14:52, [email protected] wrote:
>Al Boldi:
>> It turns out that the problem was this in dentry.c:
> :::
>> Commenting out the '#if 1' and '#endif' lines makes it compile now.
>>
>> Works great too, even performance-wise. Needs more testing, though.
>
>Thank you for your report and forwarding your original message.
>And I am glad that it is working for you.
>
>It seems that '#if ... #endif' in an 'unlikely' macro argument is bad
>coding. I don't know why my compiler and other users' compilers didn't
>produce an error.

http://lkml.org/lkml/2007/8/5/249 might be related.



Jan
--