2001-02-19 00:18:39

by David Konerding

[permalink] [raw]
Subject: problems with reiserfs + nfs using 2.4.2-pre4


Hi,

I migrated some exported disks over to reiserfs and had no luck when I
mounted the disk via NFS on another machine. I've noticed many messages
about reiser and NFS in the archives, but my understanding was that
it had been cleared up. In particular, the Configure.help in 2.4.2-pre4
says "reiserfs can be used for anything that ext2 can be used for".

Here's my setup:

server box running RH71beta, kernel 2.4.2-pre4 compiled with gcc-2.95.2.

client box running RH71beta, kernel 2.4.2-pre4 compiled with gcc-2.95.2.

On the server, I built a standard kernel with NFS server, LVM, and reiser.
On the client, I built a standard kernel with NFS client.

On the server, the filesystem was created over an LVM logical volume.
When I mounted the server-exported filesystem on the client node,
there are no visible files or directories. Reverting to ext2 fixed the problem.

I'd like to hear from people who have tried recent 2.4 kernels with RH7.1,
NFS, and reiser, to see if anybody has had luck.

Dave


2001-02-19 00:39:37

by NeilBrown

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4

On Sunday February 18, [email protected] wrote:
>
> Hi,
>
> I migrated some exported disks over to reiserfs and had no luck when I
> mounted the disk via NFS on another machine. I've noticed many messages
> about reiser and NFS in the archives, but my understanding was that
> it had been cleared up. In particular, the Configure.help in 2.4.2-pre4
> says "reiserfs can be used for anything that ext2 can be used for".

pure marketing! Last I checked, it didn't have quotas either (well, it
is available as an extra patch, but they might make incompatable
changes later I gather).

If you go to

http://www.cse.unsw.edu.au/~neilb/patches/linux/2.4.2-pre3/

and pick up patch-B-nfsdops and patch-C-reisernfs
you should get reasonable nfs service for reiserfs.
Note that this is not final code though. The format of the filehandle
will probably change shortly as it doesn't currently contain a
generation number.
A similar patch is available somewhere under http://www.namesys.com I
believe.

NeilBrown

2001-02-19 01:55:50

by Alan

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4

> it had been cleared up. In particular, the Configure.help in 2.4.2-pre4
> says "reiserfs can be used for anything that ext2 can be used for".

The configure.help is wrong on that and one other thing. NFS doesnt work
without extra patches and big endian boxes dont work with reiserfs currently

Chris - would it be worth sending me a patch that notes the NFS thing in
Configure.help and includes the patch url ?

2001-02-19 02:56:42

by David Konerding

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4

Neil Brown writes:
>On Sunday February 18, [email protected] wrote:
>>
>> Hi,
>>
>> I migrated some exported disks over to reiserfs and had no luck when I
>> mounted the disk via NFS on another machine. I've noticed many messages
>> about reiser and NFS in the archives, but my understanding was that
>> it had been cleared up. In particular, the Configure.help in 2.4.2-pre4
>> says "reiserfs can be used for anything that ext2 can be used for".
>
>
>If you go to
>
> http://www.cse.unsw.edu.au/~neilb/patches/linux/2.4.2-pre3/
>
>and pick up patch-B-nfsdops and patch-C-reisernfs
>you should get reasonable nfs service for reiserfs.
>Note that this is not final code though. The format of the filehandle
>will probably change shortly as it doesn't currently contain a
>generation number.
>A similar patch is available somewhere under http://www.namesys.com I
>believe.

OK, I grabbed these patches and applied them against 2.4.2-pre4 and
recompiled, rebooted. I am now able to use reiserfs with NFS,
basic operations appear to work as expected but I haven't done large amounts
of file IO or lots of concurrent requests.

What is the plan with regards to these patches, or ones like it, making it into
the distribution? I noticedAlan Cox just posted the following:

>The configure.help is wrong on that and one other thing. NFS doesnt work
>without extra patches and big endian boxes dont work with reiserfs currently
>Chris - would it be worth sending me a patch that notes the NFS thing in
>Configure.help and includes the patch url ?

which suggests that the Configure.help mis-statement is going to be corrected but
doesn't imply the fix is going in any time soon. I think NFS over Reiser is going
to be interesting to people who are running big fileservers, so if this patch
really does fix it (without breaking anything else) then itwould make sense for it
to go in.

Dave

2001-02-19 03:40:42

by NeilBrown

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4

On Sunday February 18, [email protected] wrote:
>
> OK, I grabbed these patches and applied them against 2.4.2-pre4 and
> recompiled, rebooted. I am now able to use reiserfs with NFS,
> basic operations appear to work as expected but I haven't done large amounts
> of file IO or lots of concurrent requests.
>
> What is the plan with regards to these patches, or ones like it, making it into
> the distribution?

The changes to knfsd are fairly big and mean that every filesystem
type that is to be exported needs to explicitly provide support for
knfsd.

I have almost completed doing this for
ext2fs reiserfs isofs ufs efs

which were the only ones that were mentioned when I asked "what
filesystem types do you want to export" on the nfs list.
The isofs stuff needs more work to be *right*, but it should be at
least as good as the current code provides. (i.e. it works as long as
things don't get flushed out of the dentry cache).

The reiserfs bit needs work to get generation number checking done.
This is being worked on by Danilov Nikita.

I hope to put out a patch set for testing in a day or so and possibly
suggest it to Alan for his -ac series. I don't see it going into
2.4.2, but 2.4.3 might be possible if Linus agrees.

NeilBrown

2001-02-19 11:24:52

by Alan

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4

> I hope to put out a patch set for testing in a day or so and possibly
> suggest it to Alan for his -ac series. I don't see it going into
> 2.4.2, but 2.4.3 might be possible if Linus agrees.

Im not interested in a patch that requires NFS is hacked for each file system
that tells me the implementation is wrong. The previous setup worked perfectly
for everything but reiserfs, so why isnt the newer setup one allowing each fs
to override a generic behaviour which is the current working behaviour, thereby
meaning all the other fs's work.

FS authors shouldnt have to do extra work to support knfsd unless they are doing
interesting and unusual things

2001-02-19 15:38:14

by Chris Mason

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4



On Monday, February 19, 2001 01:55:57 AM +0000 Alan Cox
<[email protected]> wrote:

>> it had been cleared up. In particular, the Configure.help in 2.4.2-pre4
>> says "reiserfs can be used for anything that ext2 can be used for".
>
> The configure.help is wrong on that and one other thing. NFS doesnt work
> without extra patches and big endian boxes dont work with reiserfs
> currently
>
> Chris - would it be worth sending me a patch that notes the NFS thing in
> Configure.help and includes the patch url ?

Yes, it would. I'd rather put a link to http://www.reiserfs.org, where the
quota/NFS/big endian patches will be linked to. Sound good?

-chris

2001-02-20 00:41:32

by NeilBrown

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4

On Monday February 19, [email protected] wrote:
> > I hope to put out a patch set for testing in a day or so and possibly
> > suggest it to Alan for his -ac series. I don't see it going into
> > 2.4.2, but 2.4.3 might be possible if Linus agrees.
>
> Im not interested in a patch that requires NFS is hacked for each file system
> that tells me the implementation is wrong. The previous setup worked perfectly
> for everything but reiserfs, so why isnt the newer setup one allowing each fs
> to override a generic behaviour which is the current working behaviour, thereby
> meaning all the other fs's work.
>
> FS authors shouldnt have to do extra work to support knfsd unless they are doing
> interesting and unusual things

A fair question (it was a question, wasn't it :-) and I will see if I
can provide a convincing answer.

My answer has three parts:
1/ iget abuse
2/ lookup("..")
3/ requirements to support knfsd.

Part 1 also contains some comments about reiserfs for Chris Mason.

---------------------------------------------------------------
1/ The linux-2.4 Todo list has, in section
"10. To Do But Non Showstopper"

- iget abuse in knfsd


Part of the need for change to knfsd and filesystems is to fix this
abuse. But what is this abuse? Well..

iget is a (n often misunderstood I think) helper function for
filesystems. It helps maintain an inode cache.

If a filesystem chooses, it may provide a read_inode super_operation
and then call iget whenever it wants to find a particular inode.
If iget finds the inode in the cache, it will return it. If it
doesn't it will create a new inode, set the i_ino field (and a few
others) and pass this to read_inode for the filesystem to finish
off. iget handles any locking needed to make sure no inode gets
created or read_inode'd twice.

It is worth noting that read_inode does *not* have to actually read
in the inode, though this is the typical (i.e. ext2fs) thing to do.
nfs_read_inode, for example, just initialises a few extra fields in
the inode. After the call to iget completes, other parts of the nfs
code fills in the rest of the inode from data received from the NFS
server.

It is also worth noting that not all filesystems use
iget/read_inode. msdos/fat maintains it's own inode cache and so
doesn't use the one provided by iget.

A final point to note - and this is actually quite important - is
that the value passed to iget (an inode number) needs to contain
enough information to be able to answer "is this the right inode"
(recognition) but does not need to be able to answer "where can I
find this inode" (recall), though these two go together for ext2 and
similar file systems.

The case of nfs, already mentioned, shows this. A filehandle is
needed to find an inode, but a filehandle is not passed to iget.

nfs did need an extension to iget though, and hence created iget4.
The assumption that 4bytes is enough to recognise an inode is a bit
dated. NFSv3 needs at least 8bytes (because the fileid is 64bit)
and Trond decided (quite reasonably) to use 16bytes (including the
fsid). Because these cannot be squeezed into a 32bit ino_t, he
create iget4 which also had an opaque datum (arbitrary size) and an
"actor" which would compare that datum against a given inode to see
if there was a match. This works fine, but is a bit awkward as the
whole opaque datum is not available for hash calculations. Maybe
this can be tided up in 2.5.

Note that this extension did not change the fact the iget has enough
information to recognise, but not necessarily to recall an inode.

When reiserfs came along, it abused this, and re-interpreted the
opaque datum to contain information for recalling (locating) an
inode - if read_inode2 was defined. I think this is wrong.
What reiserfs *should* do (I think) is:
- Just pass the 32bit inode number to iget - this seems to be enough
to recognise an inode in reiserfs.
- have read_inode mark the inode as "incomplete" in some way.
- In reiserfs_iget, after calling "iget", it should
+ lock the inode,
+ check if it is incomplete or not
+ if it is incomplete, read in the inode
+ unlock the inode.

This removes the need for read_inode2 and maintains the same
concurrency controls. Whether the locking can be done with
I_LOCK/i_wait or not I am not sure.


Anyway, back to nfsd: nfsd currently gets an inode number out of
the file handle and passes it to iget. The above discussion of iget
should show that this is the wrong thing to do. In particular,
knfsd needs to be able to recall (locate) an inode, and iget only
guarantees being able to recognise one. It happens to work
for ext2fs, but it is an abuse of the interface and must change.

If knfsd is not to use iget, it needs some other interface to use.
My patch provides such an interface by defining an nfsd_operations
structure which contains encode_fh and decode_fh so that a
filesystem can determine exactly how to treat a filehandle for that
filesystem. It also provides helper functions so that the filesystem
needs to do very little work itself. But it does need to do some as
the next section discusses.

------------------------------------------------------

2/ lookup("..").

The VFS layer requires that every directory in the dcache be
properly connected to the root of the filesystem.
A particular reason for this is so that it can provide protection
against two concurrent diretory renames creating an unreachable loop
in the filesystem. See s_vfs_rename_sem and is_subdir.

When knfsd maps a filehandle to a directory, the ancestors of this
directory may not be in the dcache, and so nfsd has to put them
there before it does anything with the directory dentry.
To be able to do this it must be able to find the parent of a given
directory. However, this is something that no other part of the
filesystem code needs to do.
You might think that chdir("..") needs to find the parent of a
directory, but the filesystem never gets a chance to see the "..".
This lookup is performed entirely in the dcache by the VFS layer.
See follow_dotdot in fs/namei.c.

Most filesystems (i.e. ext2fs) do support lookup("..") and will find
the parent inode. isofs is actually an interesting case. It tries
to even though there is no need, and it fails :-(

So currently knfsd does a lookup of ".." to find the parent
directory, and this works fairly well. However the locking issues
are a bit complex.
Doing the 'lookup("..")' through normal (i.e. VFS, channels) causes a
dentry for ".." to be created, which is quite un-natural for the
VFS. It breaks rules such as "one dentry per directory".

I think that the code in the current kernel that does this gets all
the locking right, but I'm not 100% confident, particularly because
it is difficult to reason about. It is hard to reason about
behaviour when you are breaking rules.

In the most recent rewrite of the knfsd code where I started getting
the filesystem involved in resolving file handles, I found that it
was much easier to be confident of correctness if I got the
filesystem to do the lookup("..") underneath the VFS layer.
It could be argued that lookup_parent should be an inode_operation
instead of an nfsd_operation, but either way, it needs to be
exported for knfsd to be able to do it's job properly.


-------------------------------------------------------------

3/ requirements to support knfsd.

It is already the case that a filesystem needs to do some special
things to support knfsd:
- it should maintain a generation number in i_generation so that
kfnsd can tell if an inode number has been reused.
- it must provide a read_inode routine that completely fills in the
inode
- the read_inode routine should cope if asked to read an inode that
doesn't exist, but returning a bad_inode.
- it should support lookup("..").


Before 2.2.14, no filesystem lived up to all of these. Since then,
ext2 does, but no other.

What is needed for knfsd support with my patches?
1/ A filesystem needs to assert that it is exportable. Doing a
lookup by filehandle is quite different from doing a lookup by
name, and knfsd needs to know if it is reasonable with a given
filesystem.
knfsd currently guess exportable filesystems be rejecting:
those that have NODEV
those that have no read_inode
This is fairly adhoc.
The NODEV requirement is still there, so knfsd can identify the
filesystem, but instead of the read_inode test, a filesystem
simply presents a non-NULL nfsd_operations structure.
It can leave all fields NULL, but this cannot give full support:

REQUIREMENT: set sb->s_nfsd_op to an nfsd_operation structure.

The filesystem can define encode_fh and decode_fh itself to do
whatever it likes. It can also define get_name and get_dentry to
direct the filehandle lookup at a finer grained level. However if
it want to use the simple, least "interesting and unusual" approach,
it must meet the following requirements:

2/ As discussed above, knfsd needs to lookup(".."), and the vfs
interface does not provide a clean way do do that, so:

REQUIREMENT: define get_parent in the nfsd_operation structure to
be a function which find the parent of a directory. This is
typically about the same size as the inode_operation->lookup
function.

3/ Any time that an inode number gets reused for a different file,
i_generation must be set to a new value (with high
degree of probability at least).
As I said above, this is already a requirement, and only ext2
meets it.

REQUIREMENT: maintain i_generation

4/ As discussed above:

REQUIREMENT: read_inode, must completely fill in the inode,
and must reject requests for non-existing inodes by returning a
bad_inode

5/ If a normal VFS operation builds a dcache path down to a
directory at the same time that knfsd is trying to build a dcache
path up from that directory, there is a real possibility of
clashing and making duplicate dentries for some directories. To
guard against this, lookup and get_parent must co-operate with
locking to make sure that only one of them attaches a dentry to a
given inode. This is easily done by calling a support routine in
dcache.c

REQUIREMENT: lookup must call d_splice_alias when it finds
an inode, and get_parent must call d_make_alias.


This may seem like a lot, but several of these are already
requirements which most filesystems don't meet, and other are there
to tidy-up interfaces and make locking more straight forward.

If there anything here has seems unreasonable?

NeilBrown

2001-02-20 01:16:47

by Chris Mason

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4



On Tuesday, February 20, 2001 11:40:24 AM +1100 Neil Brown
<[email protected]> wrote:
>
> When reiserfs came along, it abused this, and re-interpreted the
> opaque datum to contain information for recalling (locating) an
> inode - if read_inode2 was defined. I think this is wrong.
>

I suggested to Al Viro a while ago to break iget up into two calls, and
then got sucked into something else and didn't follow up. Independently,
he came up with the below message during a thread on fs-devel about
read_inode. I think it is very similar to what you've described, and it
should work. I'm willing to help integrate/code once things settle down
for 2.4.

His last paragraph is also important, I'd rather see this as a new
call...BTW, I believe XFS and GFS actually have 64 bit inode numbers, while
reiserfs has a unique 32 bit inode number (objectid) and a unique 32 bit
packing locality that are both required to find the object.

Cut n' paste from Viro's mail, beware of newline munging:

> From: Alexander Viro <[email protected]>
> To: Steve Whitehouse <[email protected]>
> cc: [email protected]
> Subject: Re: read_inode() extra argument ?
> Date-Sent: Tuesday, December 19, 2000 06:41:42 AM -0500
>
[snip]
> Basically, read_inode() (and iget()) is _not_ a universal API. It's
> a conveniency thing for filesystems that are comfortable with it. Notice
> that quite a few filesystems do not have ->read_inode() at all.
>
> If you need more than just an inumber to fill the inode - don't use
> iget() at all. It's just a wrong API for that kind of situations.
>
> In all fairness, ->read_inode() should not be a method. Correct way to
> deal with that would be along the lines
>
> --- fs/inode.c Tue Dec 19 09:42:41 2000
> +++ fs/inode.c.new Tue Dec 19 09:59:37 2000
> @@ -661,22 +661,10 @@
> inode->i_ino = ino;
> inode->i_flags = 0;
> atomic_set(&inode->i_count, 1);
> - inode->i_state = I_LOCK;
> + inode->i_state = I_LOCK | I_NEW;
> spin_unlock(&inode_lock);
>
> clean_inode(inode);
> - sb->s_op->read_inode(inode);
> -
> - /*
> - * This is special! We do not need the spinlock
> - * when clearing I_LOCK, because we're guaranteed
> - * that nobody else tries to do anything about the
> - * state of the inode when it is locked, as we
> - * just created it (so there can be no old holders
> - * that haven't tested I_LOCK).
> - */
> - inode->i_state &= ~I_LOCK;
> - wake_up(&inode->i_wait);
>
> return inode;
> }
> @@ -693,6 +681,20 @@
> wait_on_inode(inode);
> }
> return inode;
> +}
> +
> +void unlock_new_inode(struct inode *inode)
> +{
> + /*
> + * This is special! We do not need the spinlock
> + * when clearing I_LOCK, because we're guaranteed
> + * that nobody else tries to do anything about the
> + * state of the inode when it is locked, as we
> + * just created it (so there can be no old holders
> + * that haven't tested I_LOCK).
> + */
> + inode->i_state &= ~(I_LOCK|I_NEW);
> + wake_up(&inode->i_wait);
> }
>
> static inline unsigned long hash(struct super_block *sb, unsigned long
> i_ino) --- fs/ext2/namei.c Tue Dec 19 09:59:54 2000
> +++ fs/ext2/namei.c.new Tue Dec 19 10:01:22 2000
> @@ -175,9 +175,12 @@
> unsigned long ino = le32_to_cpu(de->inode);
> brelse (bh);
> inode = iget(dir->i_sb, ino);
> -
> if (!inode)
> return ERR_PTR(-EACCES);
> + if (inode->i_state & I_NEW) {
> + ext2_read_inode(inode);
> + unlock_new_inode(inode);
> + }
> }
> d_add(dentry, inode);
> return NULL;
>
> (modulo obvious changes to fs.h, exporting (or inlining)
> unlock_new_inode(), etc.)
>
> The point being: allow filesystems to use whatever means they find
> convenient for filling the inode. All we really need from icache is an
> analog of iget() that would leave new struct inode locked, recognizable
> as new and would _not_ do anything fs-specific. Quite possibly we want a
> new function for that leaving iget() as-is so that existing callers would
> stay intact - patch above just describes the general idea.




2001-02-20 01:24:49

by Alan

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4

> This may seem like a lot, but several of these are already
> requirements which most filesystems don't meet, and other are there
> to tidy-up interfaces and make locking more straight forward.

As a 2.5 thing it sounds like a very sensible path. It will also provide
some of the operations groundwork needed for file systems that can only use
NFS4 temporary handles

2001-02-20 01:38:24

by NeilBrown

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4

On Tuesday February 20, [email protected] wrote:
> > This may seem like a lot, but several of these are already
> > requirements which most filesystems don't meet, and other are there
> > to tidy-up interfaces and make locking more straight forward.
>
> As a 2.5 thing it sounds like a very sensible path. It will also provide
> some of the operations groundwork needed for file systems that can only use
> NFS4 temporary handles

I had wanted to get it in before 2.4, but didn't quite make it.

I guess that means that, from your perspective, nfs exporting of
reiserfs remains an extra-kernel patch at least until 2.5 starts up
and the code has been tested extensively and it gets backported to
2.4, or maybe even until 2.6/3.0. Such is life.

NeilBrown

2001-02-20 01:35:04

by NeilBrown

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4

On Monday February 19, [email protected] wrote:
>
>
> On Tuesday, February 20, 2001 11:40:24 AM +1100 Neil Brown
> <[email protected]> wrote:
> >
> > When reiserfs came along, it abused this, and re-interpreted the
> > opaque datum to contain information for recalling (locating) an
> > inode - if read_inode2 was defined. I think this is wrong.
> >
>
> I suggested to Al Viro a while ago to break iget up into two calls, and
> then got sucked into something else and didn't follow up. Independently,
> he came up with the below message during a thread on fs-devel about
> read_inode. I think it is very similar to what you've described, and it
> should work. I'm willing to help integrate/code once things settle down
> for 2.4.

I must have missed that thread...
Yes, what Al Viro suggests is exactly the same idea as mine, except
that I suggest leaving iget as it is and getting the filesystem to do
a bit of locking.

>
> His last paragraph is also important, I'd rather see this as a new
> call...BTW, I believe XFS and GFS actually have 64 bit inode numbers, while
> reiserfs has a unique 32 bit inode number (objectid) and a unique 32 bit
> packing locality that are both required to find the object.

I think the particular need for a new call is to handle long inode
identifiers better. The current iget4 is a bit ugly.
If/when a new iget is done to handle long identifiers elegantly, it
would probably be good to include the I_NEW stuff as well, but for now
(2.4) both long identifiers and delayed fill-in can be done without
changing iget.

NeilBrown

(what's happened to Al Viro anyway, he has been awfully quite lately?)

2001-02-20 02:12:07

by David Konerding

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4

Alan Cox writes:
>> This may seem like a lot, but several of these are already
>> requirements which most filesystems don't meet, and other are there
>> to tidy-up interfaces and make locking more straight forward.
>
>As a 2.5 thing it sounds like a very sensible path. It will also provide
>some of the operations groundwork needed for file systems that can only use
>NFS4 temporary handles
>

OK so I think what I can take from this is: for kernel 2.4 in the foreseeable
future, reiserfs over NFS won't work without a special patch. And, filesystems
other than ext2 in general might not support NFS. This all has to do with
internal design decisions, results of people coding against those decisions,
possibly abusing the decisions to acheive their own goals, etc, etc, all
of which lead to problems when the underlying design decisions changed and broke
code which depended on the old decisions.

For the foreseeable future I am going to stick with ext2 for my NFS-exported
directories. I think these problems need to be carefully stated in the Configure.help
for each filesystem which does not support NFS ... Configure.help is often the
most authoritative and up-to-date technical description of a particular kernel's
capabilities.

Dave

2001-02-20 02:47:58

by Roman Zippel

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4

Hi,

On Tue, 20 Feb 2001, Neil Brown wrote:

> 2/ lookup("..").

A small question:
Why exactly is this needed?

bye, Roman

2001-02-20 09:44:49

by Trond Myklebust

[permalink] [raw]
Subject: Re: [NFS] Re: problems with reiserfs + nfs using 2.4.2-pre4

>>>>> " " == Roman Zippel <[email protected]> writes:

> Hi, On Tue, 20 Feb 2001, Neil Brown wrote:

>> 2/ lookup("..").

> A small question: Why exactly is this needed?

Short answer: the existence of 'rename' makes it necessary, since it
means that the directory path is volatile as far as the client is
concerned.

IIRC several NFS implementations (not Linux though) rely on being able
to walk back up the directory tree in order to discover the path at
any given moment.

Under Linux, our reliance on dentries doesn't allow for directory
paths to be volatile, so we don't support it. Instead we end up having
to support aliased directories.

Cheers,
Trond

2001-02-20 14:15:36

by Roman Zippel

[permalink] [raw]
Subject: Re: [NFS] Re: problems with reiserfs + nfs using 2.4.2-pre4

Hi,

On 20 Feb 2001, Trond Myklebust wrote:

> IIRC several NFS implementations (not Linux though) rely on being able
> to walk back up the directory tree in order to discover the path at
> any given moment.

If I read the source correctly, namespace operation are done with dir file
handle + file name. I'm playing with the idea if we could relax the rule,
that all dentries must be connected to the root. Inode to dentry lookups
are really evil, e.g. the current code ignores that there might be a fs
that supports links to dirs (besides that vfs doesn't support that very
well either).
What IMO knfsd needs is only a file handle <-> inode operation and as long
as the inode is not connected to a dcache entry (i_dentry is empty) it
gets a dummy dentry, which is used for further lookups. As soon as a real
dentry lookups that inode, we can flush the dummy dentry (small change to
d_instantiate()).
This would make it possible to support fs, that can't lookup ".." or it
would avoid extra checks for fs, that don't have real ".." dir entries.
All what a fs needs to do is to generate a 16(?) byte cookie, which can be
used to find the inode back (with the default to i_ino + i_generation).
This is nothing for 2.4, but IMO something that could be tried with 2.5.

bye, Roman

PS: /me is searching his fire proof underwear. :)

2001-02-20 15:17:10

by Trond Myklebust

[permalink] [raw]
Subject: Re: [NFS] Re: problems with reiserfs + nfs using 2.4.2-pre4

>>>>> " " == Roman Zippel <[email protected]> writes:

> If I read the source correctly, namespace operation are done
> with dir file handle + file name. I'm playing with the idea if
> we could relax the rule, that all dentries must be connected to
> the root. Inode to dentry lookups are really evil, e.g. the
> current code ignores that there might be a fs that supports
> links to dirs (besides that vfs doesn't support that very well
> either). What IMO knfsd needs is only a file handle <-> inode
> operation and as long as the inode is not connected to a dcache
> entry (i_dentry is empty) it gets a dummy dentry, which is used
> for further lookups. As soon as a real dentry lookups that
> inode, we can flush the dummy dentry (small change to
> d_instantiate()). This would make it possible to support fs,
> that can't lookup ".." or it would avoid extra checks for fs,
> that don't have real ".." dir entries. All what a fs needs to
> do is to generate a 16(?) byte cookie, which can be used to
> find the inode back (with the default to i_ino + i_generation).
> This is nothing for 2.4, but IMO something that could be tried
> with 2.5.

Isn't this more or less the default already in 2.4?

If I read the code correctly, we set the dentry d_flag
DCACHE_NFSD_DISCONNECTED on such dummy dentries. We only force a
lookup of the full path if the inode represents a directory or the
NFSEXP_NOSUBTREECHECK export flag is not set.

It doesn't seem like a major change to delay that full path lookup of
the dentry until nfsd_lookup('..') is actually called (in the case
where the 'subtree_check' flag isn't used).
However, outright banning lookups of '..' by any one filesystem isn't
an option: path lookups are used for a lot more than just
`getcwd'. Imagine for instance trying to follow a relative soft link
across such a filesystem.

Cheers,
Trond

2001-02-20 16:33:00

by Roman Zippel

[permalink] [raw]
Subject: Re: [NFS] Re: problems with reiserfs + nfs using 2.4.2-pre4

Hi,

On Tue, 20 Feb 2001, Trond Myklebust wrote:

> If I read the code correctly, we set the dentry d_flag
> DCACHE_NFSD_DISCONNECTED on such dummy dentries. We only force a
> lookup of the full path if the inode represents a directory or the
> NFSEXP_NOSUBTREECHECK export flag is not set.

IMO you can't safely delay the release of the dummy entry without help of
vfs. Are these dummy entries always properly released?
It seems I forgot about the subtree check, so it seems a fs that can't
provide a get_parent, can only be exported completely?

> It doesn't seem like a major change to delay that full path lookup of
> the dentry until nfsd_lookup('..') is actually called (in the case
> where the 'subtree_check' flag isn't used).
> However, outright banning lookups of '..' by any one filesystem isn't
> an option: path lookups are used for a lot more than just
> `getcwd'. Imagine for instance trying to follow a relative soft link
> across such a filesystem.

AFAIK this is already done in the generic code (in path_walk(), which is
also called by vfs_follow_link()).

bye, Roman

2001-02-20 22:55:12

by Brian May

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4

>>>>> "dek" == dek ml <[email protected]> writes:

dek> OK so I think what I can take from this is: for kernel 2.4 in
dek> the foreseeable future, reiserfs over NFS won't work without
dek> a special patch. And, filesystems other than ext2 in general

Does this apply to the user space NFS server? kernel space NFS server?
Or both?
--
Brian May <[email protected]>

2001-02-20 23:32:51

by Chris Mason

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4



On Wednesday, February 21, 2001 09:54:19 AM +1100 Brian May
<[email protected]> wrote:

>>>>>> "dek" == dek ml <[email protected]> writes:
>
> dek> OK so I think what I can take from this is: for kernel 2.4 in
> dek> the foreseeable future, reiserfs over NFS won't work without
> dek> a special patch. And, filesystems other than ext2 in general
>
> Does this apply to the user space NFS server? kernel space NFS server?
> Or both?

Both, but in different ways. Andi Kleen has a set patches for using the
userspace NFS server with reiserfs:

ftp.suse.com/pub/people/ak/nfs

-chris

2001-02-21 03:02:41

by NeilBrown

[permalink] [raw]
Subject: Re: problems with reiserfs + nfs using 2.4.2-pre4

On Tuesday February 20, [email protected] wrote:
> Hi,
>
> On Tue, 20 Feb 2001, Neil Brown wrote:
>
> > 2/ lookup("..").
>
> A small question:
> Why exactly is this needed?
>
> bye, Roman

Having read the subsequent posts, I now see what you are thinking and
know how to answer this.

The problem that I see is indeed related to rename, but not in the way
that Trond mentions. (Tronds issue is a real, though very different,
issue).

Suppose I have a directory path

/a/b/c/d/e/f/g

Where filehandle X refers to /a/b and filehande Y refers to
/a/b/c/d/e/f/g

Then an NFS request comes in saying

RENAME X/c to Y/h

If this was performed, then you would end up with a directory path

/a/b

and a disconnected loop:


d/e/f/g/h
^ v
+-------+

which is not good.
This possibility is protected against by the VFS layer. It serialises
all directory renames using s_vfs_rename_sem and then checks that the
source of the rename isn't an ancestor of the target.
For this check to be reliable, each dentry for a directory must be
properly connected to the root.

Now if you have a filesystem that:

- cannot do ".." lookups efficiently, or doesn't want to and
- can protect against this sort of loop (and any other issues that
the VFS usually protects against) itself

then it can (with my patch) simply define decode_fh and encode_fh and
do whatever it likes, such as create a dentry that isn't properly
connected.

get_parent is only called by nfsd_find_fh_dentry which is a helper
routine which is normally called by the decode_fh function (and is
called by the default decode_fh function).
If a filesystem provides it's own decode_fh which doesn't call
nfsd_find_fh_dentry, then get_parent won't be called and isn't needed.

NeilBrown

2001-02-21 12:43:43

by Trond Myklebust

[permalink] [raw]
Subject: Re: [NFS] Re: problems with reiserfs + nfs using 2.4.2-pre4

>>>>> " " == Neil Brown <[email protected]> writes:

> - cannot do ".." lookups efficiently, or doesn't want to and
> - can protect against this sort of loop (and any other issues
> that
> the VFS usually protects against) itself

> then it can (with my patch) simply define decode_fh and
> encode_fh and do whatever it likes, such as create a dentry
> that isn't properly connected.

Do we really want to design for this though?

Being able to look up the parent directory is explicitly encoded in
all revisions of the NFS protocol. In NFSv2/v3 the names '.' and '..'
have a clearly defined meaning, whereas in NFSv4 you are required to
support LOOKUPP.

I can't speculate about the needs of other networked filesystems which
may want to use this handle interface, but as far NFS is concerned we
are not interested in considering such special cases.

Cheers,
Trond