2001-07-09 17:29:18

by Craig Soules

[permalink] [raw]
Subject: NFS Client patch

--- linux/include/linux/nfs_fs.h Fri May 25 21:02:11 2001
+++ /usr/src/linux/include/linux/nfs_fs.h Sun Jul 8 14:40:57 2001
@@ -160,13 +160,18 @@
static __inline__ struct rpc_cred *
nfs_file_cred(struct file *file)
{
- struct rpc_cred *cred = (struct rpc_cred *)(file->private_data);
+ struct nfs_file_private *priv =
+ (struct nfs_file_private *)(file->private_data);
+ struct rpc_cred *cred = priv->cred;
#ifdef RPC_DEBUG
if (cred && cred->cr_magic != RPCAUTH_CRED_MAGIC)
BUG();
#endif
return cred;
}
+
+#define NFS_FILE_LASTMOD(filep) \
+ ((struct nfs_file_private *)((filep)->private_data))->lastmod

/*
* linux/fs/nfs/dir.c
--- linux/include/linux/nfs_fs_i.h Mon Feb 19 20:13:00 2001
+++ /usr/src/linux/include/linux/nfs_fs_i.h Sun Jul 8 14:18:16 2001
@@ -98,4 +98,12 @@
*/
#define NFS_LCK_GRANTED 0x0001 /* lock has been granted */

+/*
+ * NFS private data in the file descriptor
+ */
+struct nfs_file_private {
+ struct rpc_cred *cred;
+ unsigned long lastmod;
+};
+
#endif
--- linux/fs/nfs/dir.c Sat May 19 21:02:45 2001
+++ /usr/src/linux/fs/nfs/dir.c Sun Jul 8 14:35:55 2001
@@ -384,6 +384,11 @@
memset(&my_entry, 0, sizeof(my_entry));

desc->file = filp;
+
+ if(NFS_FILE_LASTMOD(filp) < NFS_ATTRTIMEO_UPDATE(inode)) {
+ filp->f_pos = 0;
+ NFS_FILE_LASTMOD(filp) = NFS_ATTRTIMEO_UPDATE(inode);
+ }
desc->target = filp->f_pos;
desc->entry = &my_entry;
desc->decode = NFS_PROTO(inode)->decode_dirent;
--- linux/fs/nfs/inode.c Sat May 19 21:14:38 2001
+++ /usr/src/linux/fs/nfs/inode.c Sun Jul 8 14:36:30 2001
@@ -799,13 +799,14 @@
*/
int nfs_open(struct inode *inode, struct file *filp)
{
+ struct nfs_file_private *priv;
struct rpc_auth *auth;
- struct rpc_cred *cred;

lock_kernel();
+ priv = kmalloc(sizeof(struct nfs_file_private), GFP_KERNEL);
auth = NFS_CLIENT(inode)->cl_auth;
- cred = rpcauth_lookupcred(auth, 0);
- filp->private_data = cred;
+ priv->cred = rpcauth_lookupcred(auth, 0);
+ filp->private_data = priv;
unlock_kernel();
return 0;
}
@@ -814,12 +815,17 @@
{
struct rpc_auth *auth;
struct rpc_cred *cred;
+ struct nfs_file_private *priv;

lock_kernel();
- auth = NFS_CLIENT(inode)->cl_auth;
- cred = nfs_file_cred(filp);
- if (cred)
- rpcauth_releasecred(auth, cred);
+ priv = filp->private_data;
+ if(priv) {
+ auth = NFS_CLIENT(inode)->cl_auth;
+ cred = nfs_file_cred(filp);
+ if (cred)
+ rpcauth_releasecred(auth, cred);
+ kfree(priv);
+ }
unlock_kernel();
return 0;
}


2001-07-09 18:34:33

by Andi Kleen

[permalink] [raw]
Subject: Re: NFS Client patch

Craig Soules <[email protected]> writes:

> Our system does automatic directory compaction through the use of a tree
> structure, and so the cookie needs to be invalidated. Also, any other
> file system whicih does immediate directory compaction would require this.

Actually all the file systems who do that on Linux (JFS, XFS, reiserfs)
have fixed the issue properly server side, by adding a layer that generates
stable cookies. You should too.

-Andi

2001-07-09 18:59:55

by Trond Myklebust

[permalink] [raw]
Subject: NFS Client patch

>>>>> " " == Craig Soules <[email protected]> writes:

> Hello, I hope that I am sending this to the appropriate people.
> I have been working on a project known as Self-Securing Storage
> here at Carnegie Mellon University. We have developed our
> storage server to act as an NFSv2 server, and have been using
> the Linux NFSv2 client to do our benchmarking. I have run
> across a small problem with the 2.4 implementation of the Linux
> NFSv2 client.

> The problem is in the readdir() operation. The current cookie
> for a given readdir operation is being stored in the file
> descriptor. The problem is that it is not being reset to 0 if
> a change has been made to the directory as is indicated in the
> NFSv2 spec. This problem is often seen when doing an operation
> such as rm -rf to a large directory tree due to the
> asynchronous remove operation that has been implemented.

The NFSv2 spec says no such thing. It simply says that you set the
cookie to zero when you want to start at the beginning of the
directory. This is only needed when we want to reread the directory
into the page cache.

Your patch will automatically lead to duplicate entries in readdir()
on most if not all servers whenever the attributes on the inode have
been refreshed (whether or not the cache has been invalidated). That's
a bug...

Cheers,
Trond

2001-07-09 19:46:09

by Craig Soules

[permalink] [raw]
Subject: Re: NFS Client patch

On Mon, 9 Jul 2001, Trond Myklebust wrote:
> The NFSv2 spec says no such thing. It simply says that you set the
> cookie to zero when you want to start at the beginning of the
> directory. This is only needed when we want to reread the directory
> into the page cache.

Ok, perhaps I mis-spoke slightly. What the spec does state is that the
cookie is opaque. This has generally been interpreted to mean that you
should not trust it to be stable after a change to that directory.

> Your patch will automatically lead to duplicate entries in readdir()
> on most if not all servers whenever the attributes on the inode have
> been refreshed (whether or not the cache has been invalidated). That's
> a bug...

If I were to do a create during a readdir() operation which inserted
itself in the directory before the place it left off, that entry would be
left out of the listing. That is also a bug, wouldn't you think?

I'd also like to point out that every other operating system which I have
tested this with has resulted in the correct behavior (NetBSD, FreeBSD,
Digital Unix, ...)

As for the refresh vs. invalidate problem, I would be happy to add
another time stamp to the in-core nfs inode exclusively for directory
invalidation.

Craig

2001-07-09 19:53:36

by Charles Cazabon

[permalink] [raw]
Subject: Re: NFS Client patch

Craig Soules <[email protected]> wrote:
>
> Ok, perhaps I mis-spoke slightly. What the spec does state is that the
> cookie is opaque. This has generally been interpreted to mean that you
> should not trust it to be stable after a change to that directory.

I thought the generally accepted meaning of "opaque" in this context was
"don't expect to be able to infer anything relevant from this data". This has
nothing to do with how long a particular cookie is valid.

Charles
--
-----------------------------------------------------------------------
Charles Cazabon <[email protected]>
GPL'ed software available at: http://www.qcc.sk.ca/~charlesc/software/
-----------------------------------------------------------------------

2001-07-09 20:05:56

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS Client patch

>>>>> " " == Craig Soules <[email protected]> writes:

>> Your patch will automatically lead to duplicate entries in
>> readdir() on most if not all servers whenever the attributes on
>> the inode have been refreshed (whether or not the cache has
>> been invalidated). That's a bug...

> If I were to do a create during a readdir() operation which
> inserted itself in the directory before the place it left off,
> that entry would be left out of the listing. That is also a
> bug, wouldn't you think?

No: it's POSIX

If the client discovers that the cache is invalid, it clears it, and
refills the cache. We then start off at the next cookie after the last
read cookie. Test it on an ordinary filesystem and you'll see the
exact same behaviour. The act of creating or deleting files is *not*
supposed invalidate the readdir offset.

You are confusing the act of detecting whether or not the cache is
invalid with that of recovering after a cache invalidation. In the
former case we do have room for improvement: see for instance

http://www.fys.uio.no/~trondmy/src/2.4.6/linux-2.4.6-cto.dif

which strengthens the attribute checking on open().

Cheers,
Trond

2001-07-09 21:47:52

by J. Richard Sladkey

[permalink] [raw]
Subject: RE: NFS Client patch

> Ok, perhaps I mis-spoke slightly. What the spec does state is that the
> cookie is opaque. This has generally been interpreted to mean that you
> should not trust it to be stable after a change to that directory.

This interpretation isn't useful. If a second client modifies the
directory while the first client is reading a directory, the first
client has no way of knowing that its cookie is now invalid, yet it
clearly will be invalid if the server's cookies are invalid after
any directory modifying operation.

The solution is that the server must cope with directory modifying
operations and still keep its cookies valid. It can do this by
cooperating with the filesystem to update the "true" index associated
with the "opaque" index stored in any outstanding cookies. Or it can
simply have a more sophisticated cookie, such as passing the inode number
of the first unread directory entry in the cookie and the offset of the
entry. If they are the same continue as usual. If they are different,
re-read the directory from the beginning, searching for that specific
inode. Or any other scheme. Note that this information is still
completely opaque to the client.

Other operating systems may not show the problem if they pre-read
much larger blocks of directory entries. However, you should be able
to provoke the problem out of any OS by creating a sufficiently
large directory. In any case, the two-client argument clearly shows
that cookies should be so fragile that any directory operation
makes them invalid. This makes your server more complicated, but it
seems like the correct behavior.

2001-07-09 22:10:08

by Craig Soules

[permalink] [raw]
Subject: Re: NFS Client patch

On Mon, 9 Jul 2001, Trond Myklebust wrote:
> If the client discovers that the cache is invalid, it clears it, and
> refills the cache. We then start off at the next cookie after the last
> read cookie. Test it on an ordinary filesystem and you'll see the
> exact same behaviour. The act of creating or deleting files is *not*
> supposed invalidate the readdir offset.

I would say that assuming that the readdir cookie is an offset is a break
in the spec. In fact, there are a few things in the spec which I'd like
to point out. First of all, "All of the procedures in the NFS protocol
are assumed to be synchronous." Which means that you should not even be
making asynchronous remove calls. Second, the server is meant to be
"as stateless as possible." I would argue that this means that you should
not make assumptions about the cookie's state if another operation is
interposed between two readdir() operations. As an aside, by adding a
translation layer to the cookies (as suggested by an earlier post) would
break this, as the server would have to store that state in the event of a
server crash, thus breaking the spec.

> You are confusing the act of detecting whether or not the cache is
> invalid with that of recovering after a cache invalidation. In the
> former case we do have room for improvement: see for instance

It may be that my solution is imperfect for the problem I would like
fixed, however, I do not believe that your current recovery mechanism is
correct.

Craig

2001-07-10 08:22:44

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS Client patch

>>>>> " " == Craig Soules <[email protected]> writes:

> On Mon, 9 Jul 2001, Trond Myklebust wrote:
>> If the client discovers that the cache is invalid, it clears
>> it, and refills the cache. We then start off at the next cookie
>> after the last read cookie. Test it on an ordinary filesystem
>> and you'll see the exact same behaviour. The act of creating or
>> deleting files is *not* supposed invalidate the readdir offset.

> I would say that assuming that the readdir cookie is an offset
> is a break in the spec. In fact, there are a few things in the
> spec which I'd like to point out. First of all, "All of the
> procedures in the NFS protocol are assumed to be synchronous."
> Which means that you should not even be making asynchronous
> remove calls. Second, the server is meant to be "as stateless
> as possible." I would argue that this means that you should
> not make assumptions about the cookie's state if another
> operation is interposed between two readdir() operations. As
> an aside, by adding a translation layer to the cookies (as
> suggested by an earlier post) would break this, as the server
> would have to store that state in the event of a server crash,
> thus breaking the spec.

Imagine if somebody gives you a 1Gb directory. Would it or would it
not piss you off if your file pointer got reset to 0 every time
somebody created a file?

The current semantics are scalable. Anything which resets the file
pointer upon change of a file/directory/whatever isn't...

Cheers,
Trond

2001-07-10 13:33:37

by Chris Wedgwood

[permalink] [raw]
Subject: Re: NFS Client patch

On Mon, Jul 09, 2001 at 08:33:31PM +0200, Andi Kleen wrote:

Actually all the file systems who do that on Linux (JFS, XFS,
reiserfs) have fixed the issue properly server side, by adding a
layer that generates stable cookies. You should too.

I've always thought that was a stupid fix. Why not have the clients be
smarted and make them responsible for getting a new cookie if the old
one is hosed?

For linux, with the dcache, I'm not even sure that this would be all
the hard. Persumable Solaris could (does?) do the same?



--cw

2001-07-10 13:38:37

by Chris Wedgwood

[permalink] [raw]
Subject: Re: NFS Client patch

On Tue, Jul 10, 2001 at 10:22:16AM +0200, Trond Myklebust wrote:

Imagine if somebody gives you a 1Gb directory. Would it or would
it not piss you off if your file pointer got reset to 0 every time
somebody created a file?

The current semantics are scalable. Anything which resets the file
pointer upon change of a file/directory/whatever isn't...

Anyone using a 1GB directory deserves for it not to scale. I think
this is a very poor example.

No that I disagree with you, the largest directories I have on my
system here are 2.6MB (freedb, lots of hashed flat-files in one
directory), here I do agree that you should not have to reset the
counter everytime.





--cw

2001-07-10 13:41:57

by Andi Kleen

[permalink] [raw]
Subject: Re: NFS Client patch

On Wed, Jul 11, 2001 at 01:33:11AM +1200, Chris Wedgwood wrote:
> On Mon, Jul 09, 2001 at 08:33:31PM +0200, Andi Kleen wrote:
>
> Actually all the file systems who do that on Linux (JFS, XFS,
> reiserfs) have fixed the issue properly server side, by adding a
> layer that generates stable cookies. You should too.
>
> I've always thought that was a stupid fix. Why not have the clients be
> smarted and make them responsible for getting a new cookie if the old
> one is hosed?

Because to get that new cookie you would need another cookie; otherwise
you could violate the readdir guarantee that it'll never return files
twice.

> For linux, with the dcache, I'm not even sure that this would be all
> the hard. Persumable Solaris could (does?) do the same?

dcache is not populated on readdir for good reasons (it would
require reading the inodes and tie a of lot of memory) and you would need
to lock all the dcache entries belong to a directory while a nfs readdir;
tieing up even more memory. Also a readdir() is not bounded in time.

BTW; the cookie issue is not an NFS only problem. It occurs on local
IO as well. Just consider rm -rf - reading directories and in parallel
deleting them (the original poster's file system would have surely
gotten that wrong). Another tricky case is telldir().


-Andi

2001-07-10 15:07:05

by Craig Soules

[permalink] [raw]
Subject: RE: NFS Client patch

On Mon, 9 Jul 2001, J. Richard Sladkey wrote:
> This interpretation isn't useful. If a second client modifies the
> directory while the first client is reading a directory, the first
> client has no way of knowing that its cookie is now invalid, yet it
> clearly will be invalid if the server's cookies are invalid after
> any directory modifying operation.

I believe that the behavior in this case is still undetermanistic. More
generically, if one person is writing to a file and another is reading, it
is unclear what that person will get at all.

Craig

2001-07-10 16:48:45

by Craig Soules

[permalink] [raw]
Subject: Re: NFS Client patch

On Tue, 10 Jul 2001, Andi Kleen wrote:
> Because to get that new cookie you would need another cookie; otherwise
> you could violate the readdir guarantee that it'll never return files
> twice.

I cannot locate any such guarantee in the NFS spec... are you refering to
another spec which applies?

> BTW; the cookie issue is not an NFS only problem. It occurs on local
> IO as well. Just consider rm -rf - reading directories and in parallel
> deleting them (the original poster's file system would have surely
> gotten that wrong). Another tricky case is telldir().

I don't believe that the behavior in this case is deterministic. If you
have multiple people accessing a single file, reading and writing to it,
there is no guarantee as to what the behavior is. The client should be
able to handle any errors it creates for itself while doing this kind of
parallel operation.

Craig

2001-07-10 17:06:25

by Andi Kleen

[permalink] [raw]
Subject: Re: NFS Client patch

On Tue, Jul 10, 2001 at 12:48:20PM -0400, Craig Soules wrote:
> On Tue, 10 Jul 2001, Andi Kleen wrote:
> > Because to get that new cookie you would need another cookie; otherwise
> > you could violate the readdir guarantee that it'll never return files
> > twice.
>
> I cannot locate any such guarantee in the NFS spec... are you refering to
> another spec which applies?

It's the unix semantics of readdir(); e.g. specified in Single Unix:

`` The type DIR, which is defined in the header <dirent.h>, represents
a directory stream, which is an ordered sequence of all the
directory entries in a particular directory. Directory entries
represent files; files may be removed from a directory or added to
a directory asynchronously to the operation of readdir(). ''

An ordered sequence does not include cycles.


>
> > BTW; the cookie issue is not an NFS only problem. It occurs on local
> > IO as well. Just consider rm -rf - reading directories and in parallel
> > deleting them (the original poster's file system would have surely
> > gotten that wrong). Another tricky case is telldir().
>
> I don't believe that the behavior in this case is deterministic. If you
> have multiple people accessing a single file, reading and writing to it,
> there is no guarantee as to what the behavior is. The client should be
> able to handle any errors it creates for itself while doing this kind of
> parallel operation.

What happens with new entries added is unspecified; but old entries removed
in parallel should never cause a violation of the rule above.

A simple index into a rebalancing btree unfortunately doesn't fulfil this;
but there are ways to add additional layers to fix it.

The easiest test for it is rm -rf.


-Andi

2001-07-10 18:05:06

by Chris Wedgwood

[permalink] [raw]
Subject: Re: NFS Client patch

On Tue, Jul 10, 2001 at 07:06:02PM +0200, Andi Kleen wrote:

It's the unix semantics of readdir(); e.g. specified in Single Unix:

`` The type DIR, which is defined in the header <dirent.h>, represents
a directory stream, which is an ordered sequence of all the
directory entries in a particular directory. Directory entries
represent files; files may be removed from a directory or added to
a directory asynchronously to the operation of readdir(). ''

An ordered sequence does not include cycles.

*Who* says NFS has to be a *unix* like filesystem?



--cw

2001-07-11 08:15:11

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS Client patch

>>>>> " " == Chris Wedgwood <[email protected]> writes:

> On Tue, Jul 10, 2001 at 10:22:16AM +0200, Trond Myklebust
> wrote:
> Imagine if somebody gives you a 1Gb directory. Would it or
> would it not piss you off if your file pointer got reset to
> 0 every time somebody created a file?

> The current semantics are scalable. Anything which resets
> the file pointer upon change of a file/directory/whatever
> isn't...

> Anyone using a 1GB directory deserves for it not to scale. I
> think this is a very poor example.

It's an extreme case, but it illustrates something the kernel should
be able to cope with.

> No that I disagree with you, the largest directories I have on
> my system here are 2.6MB (freedb, lots of hashed flat-files in
> one directory), here I do agree that you should not have to
> reset the counter everytime.

Right: this is what most people expect. The reason why the readdir
code went through several quite different incarnations in the 2.3.x
series was that duplicate directory entries were not acceptable to
people.

readdir() is not an atomic operation. You can't lock a directory while
doing a series of readdir calls either on local filesystems nor over
NFS. As such, the idea of volatile cookies doesn't really make sense,
nor is it supported in rfc1094 (NFSv2):

Each "entry" contains a "fileid" which consists of a unique number
to identify the file within a filesystem, the "name" of the file,
and a "cookie" which is an opaque pointer to the next entry in the
directory. The cookie is used in the next READDIR call to get more
entries starting at a given point in the directory.

Nothing there states that the cookie can be invalidated, nor is there
even an error to tell you that this is the case.


In rfc1813 (NFSv3), they recognized that NFSv2 couldn't cope with
stale cookies (yes: this fact is explicitly written down on pages 77
and 78), and hence they introduced the cookie verifier and the
NFS3ERR_BAD_COOKIE error, that can be used by the server to declare a
cookie as being stale. In this case, some extra recovery action might
make sense. I can see 3 possible solutions:

1) The behaviour in this case is undefined. Leave it up to the
user to reopen the directory, reset the file pointer, or whatever.
This is what we do now.

2) implement some extra caching info to allow an improved recovery
of the last file position. This would likely have to involve
storing the fileid + filename of the last entry somewhere in the
struct file.

This scheme means that lseek() breaks, and can undermine
glibc. The latter has a lousy getdents algorithm in which it
reads a number n of entries into a temporary buffer, the copy <=
n entries to the user (because their struct dirent is larger than
the kernel struct dirent), and then use lseek() to jump back.

3) Implement something like Craig suggests whereby you reset the
file pointer.

This gives unexpected results as far as the user is concerned as
it causes duplicate entries to pop up without any warning. It too
breaks lseek(). It's a policy decision on behalf of the user.

If someone can persuade the glibc people to implement a sane algorithm
for getdents() that precalculates the upper limit on how much padding
is needed, and drops the use of lseek(), then (2) might possibly be
worth doing for 2.5.x (I believe I heard that Solaris does something
along these lines).

If not, given a choice between (1) and (3), I choose (1).



Finally, any scheme that assumes cookie staleness in all cases where
the directory mtime changes will not be passed on to Linus.

Cheers,
Trond

2001-07-12 20:58:04

by Alan

[permalink] [raw]
Subject: Re: NFS Client patch

> An ordered sequence does not include cycles.
>
> *Who* says NFS has to be a *unix* like filesystem?

NFS is most emphatically not a posix compliant FS, at the best of times.

2001-07-13 11:27:13

by Chris Wedgwood

[permalink] [raw]
Subject: Re: NFS Client patch

On Thu, Jul 12, 2001 at 09:57:55PM +0100, Alan Cox wrote:

NFS is most emphatically not a posix compliant FS, at the best of
times.

Thats my point... why are we requiring file-systems and knfsd do all
sorts of psycho-acoustic-dildonics to make it look like one? Why not
just accept its not very posix like and that good-enough, is, well,
good-enough?



--cw

2001-07-17 22:03:47

by Hans Reiser

[permalink] [raw]
Subject: Re: NFS Client patch

Andi Kleen wrote:
>
> Craig Soules <[email protected]> writes:
>
> > Our system does automatic directory compaction through the use of a tree
> > structure, and so the cookie needs to be invalidated. Also, any other
> > file system whicih does immediate directory compaction would require this.
>
> Actually all the file systems who do that on Linux (JFS, XFS, reiserfs)
> have fixed the issue properly server side, by adding a layer that generates
> stable cookies. You should too.
>
> -Andi
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

I take issue with the word "properly". We have bastardized our FS design to do it. NFS should not
be allowed to impose stable cookie maintenance on filesystems, it violates layering. Simply
returning the last returned filename is so simple to code, much simpler than what we have to do to
cope with cookies. Linux should fix the protocol for NFS, not ask Craig to screw over his FS
design. Not that I think that will happen.....

Hans

2001-07-17 22:14:52

by Craig Soules

[permalink] [raw]
Subject: Re: NFS Client patch

On Wed, 18 Jul 2001, Hans Reiser wrote:
> I take issue with the word "properly". We have bastardized our FS design to do it. NFS should not
> be allowed to impose stable cookie maintenance on filesystems, it violates layering. Simply
> returning the last returned filename is so simple to code, much simpler than what we have to do to
> cope with cookies. Linux should fix the protocol for NFS, not ask Craig to screw over his FS
> design. Not that I think that will happen.....

Unfortunately to comply with NFSv2, the cookie cannot be larger than
32-bits. I believe this oversight has been correct in later NFS versions.

I do agree that forcing the underlying fs to "fix" itself for NFS is the
wrong solution. I can understand their desire to follow unix semantics
(although I don't entirely agree with them), so until I think up a more
palatable solution for the linux community, I will just keep my patches to
myself :)

Craig

2001-07-17 22:23:13

by Hans Reiser

[permalink] [raw]
Subject: Re: NFS Client patch

Craig Soules wrote:
>
> On Wed, 18 Jul 2001, Hans Reiser wrote:
> > I take issue with the word "properly". We have bastardized our FS design to do it. NFS should not
> > be allowed to impose stable cookie maintenance on filesystems, it violates layering. Simply
> > returning the last returned filename is so simple to code, much simpler than what we have to do to
> > cope with cookies. Linux should fix the protocol for NFS, not ask Craig to screw over his FS
> > design. Not that I think that will happen.....
>
> Unfortunately to comply with NFSv2, the cookie cannot be larger than
> 32-bits. I believe this oversight has been correct in later NFS versions.
>
> I do agree that forcing the underlying fs to "fix" itself for NFS is the
> wrong solution. I can understand their desire to follow unix semantics
> (although I don't entirely agree with them), so until I think up a more
> palatable solution for the linux community, I will just keep my patches to
> myself :)
>
> Craig

64 bits as in NFS v4 is still not large enough to hold a filename. For practical reasons, ReiserFS
does what is needed to work with NFS, but what is needed bad design features, and any FS designer
who doesn't feel the need to get along with NFS should not have acceptance of bad design be made a
criterion for the acceptance of his patches. Just let NFS not work for Craig's FS, what is the
problem with that?

Hans

2001-07-18 13:26:53

by Daniel Phillips

[permalink] [raw]
Subject: Re: NFS Client patch

On Wednesday 18 July 2001 00:21, Hans Reiser wrote:
> Craig Soules wrote:
> > On Wed, 18 Jul 2001, Hans Reiser wrote:
> > > I take issue with the word "properly". We have bastardized our
> > > FS design to do it. NFS should not be allowed to impose stable
> > > cookie maintenance on filesystems, it violates layering. Simply
> > > returning the last returned filename is so simple to code, much
> > > simpler than what we have to do to cope with cookies. Linux
> > > should fix the protocol for NFS, not ask Craig to screw over his
> > > FS design. Not that I think that will happen.....
> >
> > Unfortunately to comply with NFSv2, the cookie cannot be larger
> > than 32-bits. I believe this oversight has been correct in later
> > NFS versions.
> >
> > I do agree that forcing the underlying fs to "fix" itself for NFS
> > is the wrong solution. I can understand their desire to follow unix
> > semantics (although I don't entirely agree with them), so until I
> > think up a more palatable solution for the linux community, I will
> > just keep my patches to myself :)
> >
> > Craig
>
> 64 bits as in NFS v4 is still not large enough to hold a filename.
> For practical reasons, ReiserFS does what is needed to work with NFS,
> but what is needed bad design features, and any FS designer who
> doesn't feel the need to get along with NFS should not have
> acceptance of bad design be made a criterion for the acceptance of
> his patches. Just let NFS not work for Craig's FS, what is the
> problem with that?

I was planning to add coalesce-on-delete to my ext2 directory index
patch at some point, now I see I'll step right into this NFS
doo-d^H^H^H^H^H problem. What to do? Obviously it's not an option
to have NFS not work for ext2. Just leaving the directory
uncoalesced fixes the problem in some sense and doesn't hurt things
all that much. Ext2 has been running that way for years.

Can I automagically know that a directory is mounted via NFS and
disable the coalescing? Or maybe I need a -o coalesce=on/off, with
"off" as the default. Ugh.

As you point out, this sucks.

--
Daniel

2001-07-18 13:58:47

by Chris Mason

[permalink] [raw]
Subject: Re: NFS Client patch



On Wednesday, July 18, 2001 02:02:33 AM +0400 Hans Reiser <[email protected]> wrote:

> Andi Kleen wrote:
>>
>> Craig Soules <[email protected]> writes:
>>
>> > Our system does automatic directory compaction through the use of a tree
>> > structure, and so the cookie needs to be invalidated. Also, any other
>> > file system whicih does immediate directory compaction would require this.
>>
>> Actually all the file systems who do that on Linux (JFS, XFS, reiserfs)
>> have fixed the issue properly server side, by adding a layer that generates
>> stable cookies. You should too.
>>

> I take issue with the word "properly". We have bastardized our FS design to do it. NFS should not
> be allowed to impose stable cookie maintenance on filesystems, it violates layering. Simply
> returning the last returned filename is so simple to code, much simpler than what we have to do to
> cope with cookies. Linux should fix the protocol for NFS, not ask Craig to screw over his FS
> design. Not that I think that will happen.....
>

Well, returning the last filename won't do much for filesystems that don't
have any directory indexes, but that's besides the point. Could nfsv4 be
better than it is? probably. Can we change older NFS protocols to have
a linux specific hack that makes them more filesystem (or at least reiserfs)
friendly? probably.

NFS is what it is, good and bad. People use it because it fits their setup
better than the others, so the filesystems need to cater to it, exactly
because it has such a large cross platform user base. Linux specific mods
take away from that cross platform base.

-chris

2001-07-18 14:01:07

by Jan Harkes

[permalink] [raw]
Subject: Re: NFS Client patch

On Wed, Jul 18, 2001 at 02:21:46AM +0400, Hans Reiser wrote:
> Craig Soules wrote:
> > Unfortunately to comply with NFSv2, the cookie cannot be larger than
> > 32-bits. I believe this oversight has been correct in later NFS versions.
> >
> > I do agree that forcing the underlying fs to "fix" itself for NFS is the
> > wrong solution. I can understand their desire to follow unix semantics
> > (although I don't entirely agree with them), so until I think up a more
> > palatable solution for the linux community, I will just keep my patches to
> > myself :)
> >
> > Craig
>
> 64 bits as in NFS v4 is still not large enough to hold a filename.
> For practical reasons, ReiserFS does what is needed to work with NFS,
> but what is needed bad design features, and any FS designer who
> doesn't feel the need to get along with NFS should not have acceptance
> of bad design be made a criterion for the acceptance of his patches.
> Just let NFS not work for Craig's FS, what is the problem with that?

Those 64-bits could be used for a simple hash to identify the filename.

In any case, what happens if the file was renamed or removed between the
2 readdir calls. A cookie identifying a name that was returned last, or
should be read next is just as volatile as a cookie that contains an
offset into the directory.

Jan

2001-07-18 14:48:22

by Hans Reiser

[permalink] [raw]
Subject: Re: NFS Client patch

Daniel Phillips wrote:
>
> On Wednesday 18 July 2001 00:21, Hans Reiser wrote:
> > Craig Soules wrote:
> > > On Wed, 18 Jul 2001, Hans Reiser wrote:
> > > > I take issue with the word "properly". We have bastardized our
> > > > FS design to do it. NFS should not be allowed to impose stable
> > > > cookie maintenance on filesystems, it violates layering. Simply
> > > > returning the last returned filename is so simple to code, much
> > > > simpler than what we have to do to cope with cookies. Linux
> > > > should fix the protocol for NFS, not ask Craig to screw over his
> > > > FS design. Not that I think that will happen.....
> > >
> > > Unfortunately to comply with NFSv2, the cookie cannot be larger
> > > than 32-bits. I believe this oversight has been correct in later
> > > NFS versions.
> > >
> > > I do agree that forcing the underlying fs to "fix" itself for NFS
> > > is the wrong solution. I can understand their desire to follow unix
> > > semantics (although I don't entirely agree with them), so until I
> > > think up a more palatable solution for the linux community, I will
> > > just keep my patches to myself :)
> > >
> > > Craig
> >
> > 64 bits as in NFS v4 is still not large enough to hold a filename.
> > For practical reasons, ReiserFS does what is needed to work with NFS,
> > but what is needed bad design features, and any FS designer who
> > doesn't feel the need to get along with NFS should not have
> > acceptance of bad design be made a criterion for the acceptance of
> > his patches. Just let NFS not work for Craig's FS, what is the
> > problem with that?
>
> I was planning to add coalesce-on-delete to my ext2 directory index
> patch at some point, now I see I'll step right into this NFS
> doo-d^H^H^H^H^H problem. What to do? Obviously it's not an option
> to have NFS not work for ext2. Just leaving the directory
> uncoalesced fixes the problem in some sense and doesn't hurt things
> all that much. Ext2 has been running that way for years.
>
> Can I automagically know that a directory is mounted via NFS and
> disable the coalescing? Or maybe I need a -o coalesce=on/off, with
> "off" as the default. Ugh.
>
> As you point out, this sucks.
>
> --
> Daniel

The NFS v4 committee clowns had this pointed out to them, and failed to fix it. I personally think
we should just fix Linux NFS to use filenames as cookies if both client and server are
"v4.cookie_monster" compliant. That is not to say that I actually have a person free to do that
work, but it sure gets tempting at times.....

When we implement more advanced directories for ReiserFS (which we deferred doing until after
Reiser4), we will probably have to code up a v4.cookie_monster implementation.

Hans

2001-07-18 14:47:59

by Hans Reiser

[permalink] [raw]
Subject: Re: NFS Client patch

Jan Harkes wrote:
>
> On Wed, Jul 18, 2001 at 02:21:46AM +0400, Hans Reiser wrote:
> > Craig Soules wrote:
> > > Unfortunately to comply with NFSv2, the cookie cannot be larger than
> > > 32-bits. I believe this oversight has been correct in later NFS versions.
> > >
> > > I do agree that forcing the underlying fs to "fix" itself for NFS is the
> > > wrong solution. I can understand their desire to follow unix semantics
> > > (although I don't entirely agree with them), so until I think up a more
> > > palatable solution for the linux community, I will just keep my patches to
> > > myself :)
> > >
> > > Craig
> >
> > 64 bits as in NFS v4 is still not large enough to hold a filename.
> > For practical reasons, ReiserFS does what is needed to work with NFS,
> > but what is needed bad design features, and any FS designer who
> > doesn't feel the need to get along with NFS should not have acceptance
> > of bad design be made a criterion for the acceptance of his patches.
> > Just let NFS not work for Craig's FS, what is the problem with that?
>
> Those 64-bits could be used for a simple hash to identify the filename.

That is what reiserfs does today. It sucks. The performance is worse for many to most apps because
files are not in lexicographic order, and stem compression is impossible.

>
> In any case, what happens if the file was renamed or removed between the
> 2 readdir calls. A cookie identifying a name that was returned last, or
> should be read next is just as volatile as a cookie that contains an
> offset into the directory.

No, if the file was removed, it still tells you where to start your search. A missing filename is
just as good a marker as a present one.

Hans

2001-07-19 11:36:57

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS Client patch

>>>>> " " == Chris Mason <[email protected]> writes:

> Well, returning the last filename won't do much for filesystems
> that don't have any directory indexes, but that's besides the
> point. Could nfsv4 be better than it is? probably. Can we
> change older NFS protocols to have a linux specific hack that
> makes them more filesystem (or at least reiserfs) friendly?
> probably.

NFSv2 and v3 have a fixed format for readdir calls. There's bugger all
you can do to change this without making the resulting protocol
incompatible with NFS.

If you don't want Reiserfs to be NFS compatible, then fine, but I
personally don't want to see hacks to the NFS v2/v3 code that rely on
'hidden knowledge' of the filesystem on the server.

Cheers,
Trond

2001-07-19 18:06:55

by Hans Reiser

[permalink] [raw]
Subject: Re: NFS Client patch

Trond Myklebust wrote:
>
> >>>>> " " == Chris Mason <[email protected]> writes:
>
> > Well, returning the last filename won't do much for filesystems
> > that don't have any directory indexes, but that's besides the
> > point. Could nfsv4 be better than it is? probably. Can we
> > change older NFS protocols to have a linux specific hack that
> > makes them more filesystem (or at least reiserfs) friendly?
> > probably.
>
> NFSv2 and v3 have a fixed format for readdir calls. There's bugger all
> you can do to change this without making the resulting protocol
> incompatible with NFS.
>
> If you don't want Reiserfs to be NFS compatible, then fine, but I
> personally don't want to see hacks to the NFS v2/v3 code that rely on
> 'hidden knowledge' of the filesystem on the server.
>
> Cheers,
> Trond
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

The current code does rely on hidden knowledge of the filesytem on the server, and refuses to
operate with any FS that does not describe a position in a directory as an offset or hash that fits
into 32 or 64 bits.

But be calm, I am not planning on fixing this myself anytime in the next year, we have an ugly and
hideous hack deployed in ReiserFS that works, for now I am just saying the folks who designed NFS
did a bad job and resolutely continue doing a bad job, and if someone wanted to fix it, they could
fix cookies to use filenames instead of byte offsets for those filesytems able to better use
filenames than byte offsets to describe a position within a directory, and for those clients and
servers who are both smart enough to understand filenames instead of cookies (able to understand the
cookie monster protocol).

Hans

2001-07-20 08:51:25

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS Client patch

>>>>> " " == Hans Reiser <[email protected]> writes:

> The current code does rely on hidden knowledge of the filesytem
> on the server, and refuses to operate with any FS that does not
> describe a position in a directory as an offset or hash that
> fits into 32 or 64 bits.

I'm not saying that ReiserFS is wrong to question the correctness of
this. I'm just saying that NFSv2 and v3 are fixed protocols, and that
it's too late to do anything about them. I read Chris mail as a
suggestion of creating yet another NQNFS, and this would IMHO be a
mistake. Better to concentrate on NFSv4 which is meant to be
extendible.

> But be calm, I am not planning on fixing this myself anytime in
> the next year, we have an ugly and hideous hack deployed in
> ReiserFS that works, for now I am just saying the folks who
> designed NFS did a bad job and resolutely continue doing a bad
> job, and if someone wanted to fix it, they could fix cookies to
> use filenames instead of byte offsets for those filesytems able
> to better use filenames than byte offsets to describe a
> position within a directory, and for those clients and servers
> who are both smart enough to understand filenames instead of
> cookies (able to understand the cookie monster protocol).

This is something which I believe you raised in the NFSv4 group, and
which could indeed be a candidate for an NFSv4 extension. After all,
this is in essence a recognition of the method most NFS clients
implement for recovering from an EBADCOOKIE error. Why was the idea
dropped?

(Note: As I said, under Linux we're currently hampered when
considering the above alternatives by the fact that glibc requires the
ability to lseek() on directories. This is a bug that they could
easily fix, and it affects not only your suggestion, but also all the
other suggestions in which one implements non-permanent cookies)

Cheers,
Trond

2001-07-20 11:34:25

by Hans Reiser

[permalink] [raw]
Subject: Re: NFS Client patch

Trond Myklebust wrote:
>
> >>>>> " " == Hans Reiser <[email protected]> writes:
>
> > The current code does rely on hidden knowledge of the filesytem
> > on the server, and refuses to operate with any FS that does not
> > describe a position in a directory as an offset or hash that
> > fits into 32 or 64 bits.
>
> I'm not saying that ReiserFS is wrong to question the correctness of
> this. I'm just saying that NFSv2 and v3 are fixed protocols, and that
> it's too late to do anything about them. I read Chris mail as a
> suggestion of creating yet another NQNFS, and this would IMHO be a
> mistake. Better to concentrate on NFSv4 which is meant to be
> extendible.
>
> > But be calm, I am not planning on fixing this myself anytime in
> > the next year, we have an ugly and hideous hack deployed in
> > ReiserFS that works, for now I am just saying the folks who
> > designed NFS did a bad job and resolutely continue doing a bad
> > job, and if someone wanted to fix it, they could fix cookies to
> > use filenames instead of byte offsets for those filesytems able
> > to better use filenames than byte offsets to describe a
> > position within a directory, and for those clients and servers
> > who are both smart enough to understand filenames instead of
> > cookies (able to understand the cookie monster protocol).
>
> This is something which I believe you raised in the NFSv4 group, and
> which could indeed be a candidate for an NFSv4 extension. After all,
> this is in essence a recognition of the method most NFS clients
> implement for recovering from an EBADCOOKIE error. Why was the idea
> dropped?

Lack of desire to do anything, near as I could tell.

>
> (Note: As I said, under Linux we're currently hampered when
> considering the above alternatives by the fact that glibc requires the
> ability to lseek() on directories. This is a bug that they could
> easily fix, and it affects not only your suggestion, but also all the
> other suggestions in which one implements non-permanent cookies)

I would be quite happy if you (or anyone) could fix it, sometime in the next 3 years.

>
> Cheers,
> Trond

2001-07-20 14:08:11

by Chris Mason

[permalink] [raw]
Subject: Re: NFS Client patch



On Friday, July 20, 2001 10:50:57 AM +0200 Trond Myklebust <[email protected]> wrote:

>>>>>> " " == Hans Reiser <[email protected]> writes:
>
> > The current code does rely on hidden knowledge of the filesytem
> > on the server, and refuses to operate with any FS that does not
> > describe a position in a directory as an offset or hash that
> > fits into 32 or 64 bits.
>
> I'm not saying that ReiserFS is wrong to question the correctness of
> this. I'm just saying that NFSv2 and v3 are fixed protocols, and that
> it's too late to do anything about them. I read Chris mail as a
> suggestion of creating yet another NQNFS, and this would IMHO be a
> mistake. Better to concentrate on NFSv4 which is meant to be
> extendible.

Ah, then I was unclear...I think that while we certainly could
make linux (or reiserfs) specific changes to NFSvOld, it would be
a really bad idea. In my mind, the biggest strength behind NFS
is its cross platform support, and maintaining some extension
would only be slightly more fun than daily visits to the dentist ;-)

I also think it is easy to call NFSv4 poorly designed, but much
harder to design it to exploit the strengths of every FS on every
unix flavor. Shrug, there are tradeoffs everywhere.

I don't plan on supporting NFSv4 because it is the best network
filesystem ever made, but because it is in our best interest to
be compatible with those kinds of industry standards.

-chris

2001-07-22 22:11:23

by Pavel Machek

[permalink] [raw]
Subject: Re: NFS Client patch

Hi!

> > In any case, what happens if the file was renamed or removed between the
> > 2 readdir calls. A cookie identifying a name that was returned last, or
> > should be read next is just as volatile as a cookie that contains an
> > offset into the directory.
>
> No, if the file was removed, it still tells you where to start your search. A missing filename is
> just as good a marker as a present one.

And if new file is created with same name?
Pavel

--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2001-07-23 00:17:39

by Rob Landley

[permalink] [raw]
Subject: Re: NFS Client patch

On Thursday 19 July 2001 14:24, Pavel Machek wrote:
> Hi!
>
> > > In any case, what happens if the file was renamed or removed between
> > > the 2 readdir calls. A cookie identifying a name that was returned
> > > last, or should be read next is just as volatile as a cookie that
> > > contains an offset into the directory.
> >
> > No, if the file was removed, it still tells you where to start your
> > search. A missing filename is just as good a marker as a present one.
>
> And if new file is created with same name?
> Pavel

The same thing that happens as if a new file was inserted BEFORE your cursor,
in the part of the directory you've already looked at. You ignore it.

The "filename cookie" indicates the LAST file we looked at. We've already
seen it. Therefore, whether it's the same file or not, we don't care. We
just want the next file AFTER that one.

We're doing fairly arbitrary, unlocked reads across volatile data. No
algorithm is going to behave perfectly here. We just want a behavior that is
consistent, guaranteed to complete, and doesn't violate any obvious
constraints about how filesystems should behave (like producing duplicate
entries, which returning two different sets of data with the same filename
would do).

Makes sense to me, anyway...

Rob

2001-07-23 03:28:12

by Horst von Brand

[permalink] [raw]
Subject: Re: NFS Client patch

Rob Landley <[email protected]> said:
> On Thursday 19 July 2001 14:24, Pavel Machek wrote:

[...]

> > > No, if the file was removed, it still tells you where to start your
> > > search. A missing filename is just as good a marker as a present one.

> > And if new file is created with same name?

> The same thing that happens as if a new file was inserted BEFORE your
> cursor, in the part of the directory you've already looked at. You
> ignore it.

Who says that if I've got files A, B, C, D, and delete B, and create a new
B, whatever underlying directory structure there is will place it where the
old B was? It might reuse holes before A...
--
Horst von Brand [email protected]
Casilla 9G, Vin~a del Mar, Chile +56 32 672616

2001-07-23 18:59:50

by Rob Landley

[permalink] [raw]
Subject: Re: NFS Client patch

On Sunday 22 July 2001 22:02, Horst von Brand wrote:
> Rob Landley <[email protected]> said:
> > On Thursday 19 July 2001 14:24, Pavel Machek wrote:
>
> [...]
>
> > > > No, if the file was removed, it still tells you where to start your
> > > > search. A missing filename is just as good a marker as a present
> > > > one.
> > >
> > > And if new file is created with same name?
> >
> > The same thing that happens as if a new file was inserted BEFORE your
> > cursor, in the part of the directory you've already looked at. You
> > ignore it.
>
> Who says that if I've got files A, B, C, D, and delete B, and create a new
> B, whatever underlying directory structure there is will place it where the
> old B was? It might reuse holes before A...

I suppose the assumption was that the directory entries are returned in
alphabetically sorted order, even if the underlying filesystem doesn't do
that. Maybe this is a waste of effort on the server's part (and
generating/maintaining other sorts of cookies aren't?), but it also seems
fairly easy to make it work. (I could easily be missing something obvious...)

Rob