2010-10-18 01:20:30

by Ian Munsie

Subject: NFS sillyrename side effect

Hi Trond,

I'm currently investigating a bug report related to NFS for an internal
project, and I wanted to get your input as I've been able to reproduce
it on an upstream 2.6.36-rc7 kernel on Power7 hardware. I have tracked
it down to a side effect of the nfs_sillyrename function.

I'm not very familiar with the inner workings of the NFS layer. Is the
purpose of the sillyrename function to work around a limitation of NFS
when a file is unlinked while in use?

It seems to me that if the sillyrename function is indeed necessary,
we should not reveal the temporary filename it produces to userspace,
but I wonder if that might create issues (say, if userspace thinks a
directory is empty when it actually has a sillyrename file in it)?

Basically I want to know if the behaviour I've outlined below is
the expected behaviour of NFS, or if you consider this a bug?



The gist of the original bug report is that running the syscalls test
suite from the Linux Test Project on an nfsroot would emit warnings on
the recvmsg01 test, however that test never emits warnings when run
individually. The particular warning in question is:

tst_rmdir(): rmobj(/tmp/ltp-Url7aJLwfH/recZ7XHW6) failed:
unlink(/tmp/ltp-Url7aJLwfH/recZ7XHW6/.nfs00000000019f12d80000002e) failed;
errno=16: Device or resource busy

However removing a number of preceding tests would instead produce a
warning like the following:

recvmsg01 0 TWARN : tst_rmdir():
rmobj(/tmp/ltp-eenc4irncU/recRpxeMd) failed:
lstat(/tmp/ltp-eenc4irncU/recRpxeMd/.nfs00000000400a0e9a0000023c) failed;
errno=2: No such file or directory


Scott (on CC), the original reporter, produced that second behaviour by
commenting out the first 475 tests and all the tests after recvmsg01.
I've been able to produce it over 50% of the time with just the
following two test cases:

foo echo
recvmsg01 recvmsg01

Presumably this is a race and the exact conditions necessary to
reproduce it will vary depending on network, hardware, etc.
Scott reported that this behaviour only began after transitioning from
2.6.31 to 2.6.32 - I haven't checked 2.6.31 to determine if the issue
may have existed and just have been harder to produce or if 2.6.31
really did not exhibit this behaviour.



The recvmsg01 testcase sets up a temporary directory, performs a mkstemp
to generate a temporary file within that directory which it promptly
unlinks, and creates a unix socket with that filename to perform its
tests. That all seems to work fine.

When it cleans up it kills its server, unlinks that unix socket, then
calls tst_rmdir(), which recursively removes (by calling its rmobj
function) the temporary directory and its contents.

The race seems to be that when the socket is unlinked it's still in use,
so the nfs layer performs a sillyrename on it to asynchronously unlink
it, but before that completes the tst_rmdir function has already read
the contents of the directory and spotted the .nfs00000... file from the
sillyrename, which depending on the exact timing of the unlink will
cause one of the above two warnings.
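
The sequence described above can be approximated in userspace with a
minimal Python sketch. This is not the LTP test itself: the function
name unlink_while_open and its base_dir parameter are hypothetical, and
the same process keeps the socket open instead of a separate server. On
a local filesystem the listing should come back empty; on an NFS mount
it may contain a leftover .nfs... entry, depending on timing.

```python
import os
import shutil
import socket
import tempfile


def unlink_while_open(base_dir=None):
    """Mimic the recvmsg01 sequence and return the directory listing
    observed right after the in-use socket is unlinked."""
    workdir = tempfile.mkdtemp(prefix="rec", dir=base_dir)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # mkstemp() reserves a unique name, which is unlinked at once
        # so that the name can be reused for the socket.
        fd, path = tempfile.mkstemp(prefix="udsock", dir=workdir)
        os.close(fd)
        os.unlink(path)
        sock.bind(path)   # creates the socket node in the directory
        os.unlink(path)   # unlink while the socket is still open
        return sorted(os.listdir(workdir))
    finally:
        sock.close()
        shutil.rmtree(workdir, ignore_errors=True)


if __name__ == "__main__":
    # Point base_dir at an NFS mount to look for a leftover ".nfs..." entry.
    print(unlink_while_open())
```

Passing a directory on an NFS mount as base_dir is the interesting
case; the default temporary directory only exercises the local path.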


A somewhat filtered (grep -v permission) output of the race after
activating nfs_debug is below:

setup:
NFS: lookup(recRpxeMd/udsockOxmZAh)
NFS: create(0:d/1074400921), udsockOxmZAh
NFS: nfs_fhget(0:d/1074400922 ct=1)
NFS: dentry_delete(recRpxeMd/udsockOxmZAh, 0)
NFS: nfs_update_inode(0:d/1074400921 ct=1 info=0x7d7f)
NFS: isize change on server for file 0:d/1074400921
NFS: unlink(0:d/1074400921, udsockOxmZAh)
NFS: safe_remove(recRpxeMd/udsockOxmZAh)
NFS: dentry_delete(recRpxeMd/udsockOxmZAh, 18)
NFS: nfs_update_inode(0:d/1074400921 ct=1 info=0x7d7f)
NFS: isize change on server for file 0:d/1074400921
NFS: lookup(recRpxeMd/udsockOxmZAh)
NFS: mknod(0:d/1074400921), udsockOxmZAh
...
cleanup:
NFS: unlink(0:d/1074400921, udsockOxmZAh)
NFS: silly-rename(recRpxeMd/udsockOxmZAh, ct=3)
NFS: trying to rename udsockOxmZAh to .nfs00000000400a0e9a0000023c
NFS: lookup(recRpxeMd/.nfs00000000400a0e9a0000023c)
NFS: dentry_delete(recRpxeMd/.nfs00000000400a0e9a0000023c, 10)
NFS: nfs_update_inode(0:d/1074400921 ct=1 info=0x7d7f)
NFS: isize change on server for file 0:d/1074400921
NFS: nfs_update_inode(0:d/3758724786 ct=2 info=0x1fdff)
NFS: nfs_update_inode(0:d/1074400921 ct=1 info=0x7d7f)
NFS: dentry_delete(recRpxeMd/.nfs00000000400a0e9a0000023c, 102)
NFS: dentry_delete(recRpxeMd/smtAmobkv, 8)
NFS: dentry_delete(recRpxeMd/smtAmobkv, 8)
NFS: unlink(0:d/1074400921, smtAmobkv)
NFS: safe_remove(recRpxeMd/smtAmobkv)
NFS: dentry_delete(recRpxeMd/smtAmobkv, 18)
NFS: nfs_update_inode(0:d/1074400921 ct=1 info=0x7d7f)
NFS: mtime change on server for file 0:d/1074400921
NFS: isize change on server for file 0:d/1074400921
NFS: lookup(recRpxeMd/.nfs00000000400a0e9a0000023c)
NFS: dentry_delete(recRpxeMd/.nfs00000000400a0e9a0000023c, 0)
NFS: dentry_delete(bin/recvmsg02, 8)
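
The silly names in this log appear to follow a fixed layout: ".nfs",
then 16 hex digits, then 8 hex digits. The Python sketch below decodes
that layout. The layout is an assumption read off the names in this
report, and parse_silly_name is a hypothetical helper, not an NFS
interface. Decoding the name from the log gives 1074400922 for the
first field, which matches the number in the nfs_fhget line above.

```python
import re

# Assumed layout, inferred from the names in the log: ".nfs" followed by
# 16 hex digits (file ID) and 8 hex digits (a counter).
SILLY_NAME = re.compile(r"^\.nfs([0-9a-f]{16})([0-9a-f]{8})$")


def parse_silly_name(name):
    """Return (fileid, counter) for a sillyrename-style name, else None."""
    match = SILLY_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


if __name__ == "__main__":
    print(parse_silly_name(".nfs00000000400a0e9a0000023c"))
```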


Cheers,
-Ian


2010-10-18 14:48:16

by Cláudio Martins

Subject: Re: NFS sillyrename side effect


On Mon, 18 Oct 2010 10:10:59 -0400 Jeff Layton <[email protected]> wrote:
> See:
>
> http://nfs.sourceforge.net/
>
> ...section D2. The faq mentions that NFSv4 could do away with it
> because it's stateful, but that's not really the case either.
>

Section D2 ends with:

"The NFS version 4 protocol is stateful, and could actually support
delete-on-last-close. Unfortunately there isn't an easy way to do this
and remain backwards-compatible with version 2 and 3 accessors."

So, theoretically, could one modify the code to selectively disable
silly rename on a client, when it knows it is talking v4 with the
server?

Thanks

Cláudio


2010-10-19 06:40:18

by Benny Halevy

Subject: Re: NFS sillyrename side effect

On 2010-10-18 19:10, J. Bruce Fields wrote:
> On Mon, Oct 18, 2010 at 11:01:38AM -0400, Jeff Layton wrote:
>> On Mon, 18 Oct 2010 15:53:44 +0100
>> Cláudio Martins <[email protected]> wrote:
>>
>>>
>>> On Mon, 18 Oct 2010 15:48:11 +0100 Cláudio Martins <[email protected]> wrote:
>>>>
>>>> Section D2 ends with:
>>>>
>>>> "The NFS version 4 protocol is stateful, and could actually support
>>>> delete-on-last-close. Unfortunately there isn't an easy way to do this
>>>> and remain backwards-compatible with version 2 and 3 accessors."
>>>>
>>>> So, theoretically, could one modify the code to selectively disable
>>>> silly rename on a client, when it knows it is talking v4 with the
>>>> server?
>>>>
>>>
>>> BTW, to clarify, I'm assuming a scenario where the server is
>>> configured to talk v4 only, which I suspect should be common, at least
>>> when you're relying on v4 kerberos security.
>>>
>>
>> Sadly, no...
>>
>> The server does generally hold the file open as long as the client has
>> the file open. So, you could delete the file while nfsd has it open and
>> everything would probably still work.
>>
>> Suppose though that the server crashes and reboots. When it comes back
>> up, fsck figures out that the file has been unlinked and frees the
>> blocks on the disk. Now you can't reclaim the state on the open file.
>>
>> We're pretty much stuck with silly-renaming even for v4.
>
> The server could do something like rename the file into a special
> directory somewhere

The clients can do something similar too, like sillyrenaming the
files onto <mountpoint>/.unlinked_while_opened.<client-id>
Removing this directory when it empties.

Benny

> that only it had access to, preserving the file
> across reboot. Then at the end of the grace period it would remove any
> files in that directory that had not been reclaimed by some client.
>
> The problem would still remain that the *client* wouldn't know that the
> server was capable of doing this, so would still be stuck doing
> sillyrename on its own just to be sure.
>
> NFSv4.1 adds an open return flag which allows the server to tell the
> client that the client doesn't need to do sillyrename; see the
> discussion of OPEN4_RESULT_PRESERVE_UNLINKED flag in rfc 5661.
>
> I don't think anyone's looked into implementing that yet; might be a fun
> project.
>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-10-19 13:33:46

by Myklebust, Trond

Subject: Re: NFS sillyrename side effect

On Tue, 2010-10-19 at 08:40 +0200, Benny Halevy wrote:
> On 2010-10-18 19:10, J. Bruce Fields wrote:
> > On Mon, Oct 18, 2010 at 11:01:38AM -0400, Jeff Layton wrote:
> >> On Mon, 18 Oct 2010 15:53:44 +0100
> >> Cláudio Martins <[email protected]> wrote:
> >>
> >>>
> >>> On Mon, 18 Oct 2010 15:48:11 +0100 Cláudio Martins <[email protected]> wrote:
> >>>>
> >>>> Section D2 ends with:
> >>>>
> >>>> "The NFS version 4 protocol is stateful, and could actually support
> >>>> delete-on-last-close. Unfortunately there isn't an easy way to do this
> >>>> and remain backwards-compatible with version 2 and 3 accessors."
> >>>>
> >>>> So, theoretically, could one modify the code to selectively disable
> >>>> silly rename on a client, when it knows it is talking v4 with the
> >>>> server?
> >>>>
> >>>
> >>> BTW, to clarify, I'm assuming a scenario where the server is
> >>> configured to talk v4 only, which I suspect should be common, at least
> >>> when you're relying on v4 kerberos security.
> >>>
> >>
> >> Sadly, no...
> >>
> >> The server does generally hold the file open as long as the client has
> >> the file open. So, you could delete the file while nfsd has it open and
> >> everything would probably still work.
> >>
> >> Suppose though that the server crashes and reboots. When it comes back
> >> up, fsck figures out that the file has been unlinked and frees the
> >> blocks on the disk. Now you can't reclaim the state on the open file.
> >>
> >> We're pretty much stuck with silly-renaming even for v4.
> >
> > The server could do something like rename the file into a special
> > directory somewhere
>
> The clients can do something similar too, like sillyrenaming the
> files onto <mountpoint>/.unlinked_while_opened.<client-id>
> Removing this directory when it empties.

This would require access, directory create, and delete permissions for
the directory '<mountpoint>/', which is not always a given.

Besides, what if the server reboots, and my clientid changes?

Trond

2010-10-18 16:21:44

by Lyle Seaman

Subject: Re: NFS sillyrename side effect

You would need to do the state recovery before or during the fsck
(which would require the local filesystem to have some loose
integration with the network protocol).

Also, re the original issue, can't the client do a silly rename of the
directory if it contains only .nfs files during rmdir?

> Suppose though that the server crashes and reboots. When it comes back
> up, fsck figures out that the file has been unlinked and frees the
> blocks on the disk. Now you can't reclaim the state on the open file.

2010-10-18 17:00:20

by Cláudio Martins

Subject: Re: NFS sillyrename side effect


On Mon, 18 Oct 2010 12:21:43 -0400 Lyle Seaman <[email protected]> wrote:
> You would need to do the state recovery before or during the fsck
> (which would require the local filesystem to have some loose
> integration with the network protocol).
>
> Also, re the original issue, can't the client do a silly rename of the
> directory if it contains only .nfs files during rmdir?
>

I think the problem is that would be racy. You have no guarantee that
other clients won't create files on that directory between the time you
check the directory and the time you'd silly-rename it, so you would
end up "deleting" directories which are not "empty".
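
To make the window concrete, here is a small deterministic Python
sketch. The directory is modelled as a plain set of names, and
rmdir_if_only_silly and its between_check_and_rename hook are
hypothetical stand-ins, not NFS client code. The hook plays the part of
another client creating a file after the check but before the rename.

```python
def rmdir_if_only_silly(directory, between_check_and_rename=None):
    """Check-then-rename on a directory modelled as a set of names.

    between_check_and_rename is a hypothetical hook standing in for
    another client acting inside the race window.
    """
    only_silly = all(name.startswith(".nfs") for name in directory)
    if not only_silly:
        return False          # directory is visibly not empty
    if between_check_and_rename is not None:
        between_check_and_rename(directory)
    # At this point the client would silly-rename the directory, taking
    # along whatever it contains by now.
    return True


if __name__ == "__main__":
    shared = {".nfs00000000400a0e9a0000023c"}
    removed = rmdir_if_only_silly(shared, lambda d: d.add("new_file"))
    print(removed, sorted(shared))
```

The call reports success even though new_file was added in between,
which is the "deleting a non-empty directory" outcome.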

Best regards

Cláudio


2010-10-21 17:50:16

by Benny Halevy

Subject: Re: NFS sillyrename side effect

On 2010-10-19 15:32, Trond Myklebust wrote:
> On Tue, 2010-10-19 at 08:40 +0200, Benny Halevy wrote:
>> On 2010-10-18 19:10, J. Bruce Fields wrote:
>>> On Mon, Oct 18, 2010 at 11:01:38AM -0400, Jeff Layton wrote:
>>>> On Mon, 18 Oct 2010 15:53:44 +0100
>>>> Cláudio Martins <[email protected]> wrote:
>>>>
>>>>>
>>>>> On Mon, 18 Oct 2010 15:48:11 +0100 Cláudio Martins <[email protected]> wrote:
>>>>>>
>>>>>> Section D2 ends with:
>>>>>>
>>>>>> "The NFS version 4 protocol is stateful, and could actually support
>>>>>> delete-on-last-close. Unfortunately there isn't an easy way to do this
>>>>>> and remain backwards-compatible with version 2 and 3 accessors."
>>>>>>
>>>>>> So, theoretically, could one modify the code to selectively disable
>>>>>> silly rename on a client, when it knows it is talking v4 with the
>>>>>> server?
>>>>>>
>>>>>
>>>>> BTW, to clarify, I'm assuming a scenario where the server is
>>>>> configured to talk v4 only, which I suspect should be common, at least
>>>>> when you're relying on v4 kerberos security.
>>>>>
>>>>
>>>> Sadly, no...
>>>>
>>>> The server does generally hold the file open as long as the client has
>>>> the file open. So, you could delete the file while nfsd has it open and
>>>> everything would probably still work.
>>>>
>>>> Suppose though that the server crashes and reboots. When it comes back
>>>> up, fsck figures out that the file has been unlinked and frees the
>>>> blocks on the disk. Now you can't reclaim the state on the open file.
>>>>
>>>> We're pretty much stuck with silly-renaming even for v4.
>>>
>>> The server could do something like rename the file into a special
>>> directory somewhere
>>
>> The clients can do something similar too, like sillyrenaming the
>> files onto <mountpoint>/.unlinked_while_opened.<client-id>
>> Removing this directory when it empties.
>
> This would require access, directory create, and delete permissions for
> the directory '<mountpoint>/', which is not always a given.

Correct. But that's pretty easy to probe :)

>
> Besides, what if the server reboots, and my clientid changes?

By <client-id> I meant a unique identifier for the client, not necessarily
the nfsv4.x clientid. The client could just as well use its ip address
(to help admins deal with the aftermath in case the client goes away) and its
boot time, or even a random string.

Benny

>
> Trond

2010-10-21 18:02:04

by J. Bruce Fields

Subject: Re: NFS sillyrename side effect

On Thu, Oct 21, 2010 at 07:50:12PM +0200, Benny Halevy wrote:
> On 2010-10-19 15:32, Trond Myklebust wrote:
> > This would require access, directory create, and delete permissions for
> > the directory '<mountpoint>/', which is not always a given.
>
> Correct. But that's pretty easy to probe :)
>
> > Besides, what if the server reboots, and my clientid changes?
>
> By <client-id> I meant a unique identifier for the client, not necessarily
> the nfsv4.x clientid. The client could just as well use its ip address
> (to help admins deal with the aftermath in case the client goes away) and its
> boot time, or even a random string.

Like the current sillyrename, it'd help in some cases, but not all. The
set of cases where it worked would be a little larger, but also a little
harder to explain. There may be diminishing returns to trying to
perfect this kind of workaround.

With 4.1 almost actually ready, time might be better spent there, as it
offers the chance of a complete solution.

--b.

2010-10-18 14:16:07

by Jeff Layton

Subject: Re: NFS sillyrename side effect

On Mon, 18 Oct 2010 12:20:19 +1100
Ian Munsie <[email protected]> wrote:

> Hi Trond,
>
> I'm currently investigating a bug report related to NFS for an internal
> project, and I wanted to get your input as I've been able to reproduce
> it on an upstream 2.6.36-rc7 kernel on Power7 hardware. I have tracked
> it down to a side effect of the nfs_sillyrename function.
>
> I'm not very familiar with the inner workings of the NFS layer. Is the
> purpose of the sillyrename function to work around a limitation of NFS
> when a file is unlinked while in use?
>

See:

http://nfs.sourceforge.net/

...section D2. The faq mentions that NFSv4 could do away with it
because it's stateful, but that's not really the case either.

> It seems to me that if the sillyrename function is indeed necessary,
> we should not reveal the temporary filename it produces to userspace,
> but I wonder if that might create issues (say, if userspace thinks a
> directory is empty when it actually has a sillyrename file in it)?
>
> Basically I want to know if the behaviour I've outlined below is
> the expected behaviour of NFS, or if you consider this a bug?
>


It's expected. Sillyrenaming sucks, but there really is no great
alternative to it. It's one of the prices we pay for having NFSv2/3 be
stateless.

I suppose in principle we could do things like hide silly-renamed
dentries from userspace, but that might also be problematic. You'd
still be unable to remove a directory that has a silly-renamed file in
it, for instance, even though it looks empty. There would also be
inconsistencies as other machines and the server would still see
the .nfsXXXXX files.

The bottom line is that you need to be really careful with programs
that use delete-on-last-close when running on NFS.
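
As an illustration of that care, a userspace cleanup routine can
tolerate entries that vanish underneath it and retry the final rmdir
for a short while. The Python sketch below is one possible approach
under stated assumptions, not how tst_rmdir() works: the name
remove_tree_tolerant and the retries and delay parameters are
hypothetical, and the retry only helps if whoever holds the file
closes it in time.

```python
import errno
import os
import tempfile
import time


def remove_tree_tolerant(path, retries=5, delay=0.2):
    """Recursively remove path, tolerating entries that vanish (ENOENT)
    and retrying the final rmdir while leftovers keep it non-empty."""
    for name in os.listdir(path):
        child = os.path.join(path, name)
        try:
            if os.path.isdir(child) and not os.path.islink(child):
                remove_tree_tolerant(child, retries, delay)
            else:
                os.unlink(child)
        except FileNotFoundError:
            pass       # entry disappeared between listdir() and now
        except OSError as exc:
            if exc.errno != errno.EBUSY:
                raise  # EBUSY: still in use, leave it for the retries
    for _ in range(retries):
        try:
            os.rmdir(path)
            return True
        except FileNotFoundError:
            return True
        except OSError as exc:
            if exc.errno not in (errno.ENOTEMPTY, errno.EEXIST, errno.EBUSY):
                raise
            time.sleep(delay)
    return False


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    open(os.path.join(root, "example"), "w").close()
    print(remove_tree_tolerant(root))
```

A False return means something was still holding the directory
non-empty after the last retry, so the caller can decide whether to
warn or to leave the directory behind.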

>
>
> The gist of the original bug report is that running the syscalls test
> suite from the Linux Test Project on an nfsroot would emit warnings on
> the recvmsg01 test, however that test never emits warnings when run
> individually. The particular warning in question is:
>
> tst_rmdir(): rmobj(/tmp/ltp-Url7aJLwfH/recZ7XHW6) failed:
> unlink(/tmp/ltp-Url7aJLwfH/recZ7XHW6/.nfs00000000019f12d80000002e) failed;
> errno=16: Device or resource busy
>
> However removing a number of preceding tests would instead produce a
> warning like the following:
>
> recvmsg01 0 TWARN : tst_rmdir():
> rmobj(/tmp/ltp-eenc4irncU/recRpxeMd) failed:
> lstat(/tmp/ltp-eenc4irncU/recRpxeMd/.nfs00000000400a0e9a0000023c) failed;
> errno=2: No such file or directory
>
>
> Scott (on CC), the original reporter, produced that second behaviour by
> commenting out the first 475 tests and all the tests after recvmsg01.
> I've been able to produce it over 50% of the time with just the
> following two test cases:
>
> foo echo
> recvmsg01 recvmsg01
>
> Presumably this is a race and the exact conditions necessary to
> reproduce it will vary depending on network, hardware, etc.
> Scott reported that this behaviour only began after transitioning from
> 2.6.31 to 2.6.32 - I haven't checked 2.6.31 to determine if the issue
> may have existed and just have been harder to produce or if 2.6.31
> really did not exhibit this behaviour.
>
>
>
> The recvmsg01 testcase sets up a temporary directory, performs a mkstemp
> to generate a temporary file within that directory which it promptly
> unlinks, and creates a unix socket with that filename to perform its
> tests. That all seems to work fine.
>
> When it cleans up it kills its server, unlinks that unix socket, then
> calls tst_rmdir(), which recursively removes (by calling its rmobj
> function) the temporary directory and its contents.
>
> The race seems to be that when the socket is unlinked it's still in use,
> so the nfs layer performs a sillyrename on it to asynchronously unlink
> it, but before that completes the tst_rmdir function has already read
> the contents of the directory and spotted the .nfs00000... file from the
> sillyrename, which depending on the exact timing of the unlink will
> cause one of the above two warnings.
>
>
> A somewhat filtered (grep -v permission) output of the race after
> activating nfs_debug is below:
>
> setup:
> NFS: lookup(recRpxeMd/udsockOxmZAh)
> NFS: create(0:d/1074400921), udsockOxmZAh
> NFS: nfs_fhget(0:d/1074400922 ct=1)
> NFS: dentry_delete(recRpxeMd/udsockOxmZAh, 0)
> NFS: nfs_update_inode(0:d/1074400921 ct=1 info=0x7d7f)
> NFS: isize change on server for file 0:d/1074400921
> NFS: unlink(0:d/1074400921, udsockOxmZAh)
> NFS: safe_remove(recRpxeMd/udsockOxmZAh)
> NFS: dentry_delete(recRpxeMd/udsockOxmZAh, 18)
> NFS: nfs_update_inode(0:d/1074400921 ct=1 info=0x7d7f)
> NFS: isize change on server for file 0:d/1074400921
> NFS: lookup(recRpxeMd/udsockOxmZAh)
> NFS: mknod(0:d/1074400921), udsockOxmZAh
> ...
> cleanup:
> NFS: unlink(0:d/1074400921, udsockOxmZAh)
> NFS: silly-rename(recRpxeMd/udsockOxmZAh, ct=3)
> NFS: trying to rename udsockOxmZAh to .nfs00000000400a0e9a0000023c
> NFS: lookup(recRpxeMd/.nfs00000000400a0e9a0000023c)
> NFS: dentry_delete(recRpxeMd/.nfs00000000400a0e9a0000023c, 10)
> NFS: nfs_update_inode(0:d/1074400921 ct=1 info=0x7d7f)
> NFS: isize change on server for file 0:d/1074400921
> NFS: nfs_update_inode(0:d/3758724786 ct=2 info=0x1fdff)
> NFS: nfs_update_inode(0:d/1074400921 ct=1 info=0x7d7f)
> NFS: dentry_delete(recRpxeMd/.nfs00000000400a0e9a0000023c, 102)
> NFS: dentry_delete(recRpxeMd/smtAmobkv, 8)
> NFS: dentry_delete(recRpxeMd/smtAmobkv, 8)
> NFS: unlink(0:d/1074400921, smtAmobkv)
> NFS: safe_remove(recRpxeMd/smtAmobkv)
> NFS: dentry_delete(recRpxeMd/smtAmobkv, 18)
> NFS: nfs_update_inode(0:d/1074400921 ct=1 info=0x7d7f)
> NFS: mtime change on server for file 0:d/1074400921
> NFS: isize change on server for file 0:d/1074400921
> NFS: lookup(recRpxeMd/.nfs00000000400a0e9a0000023c)
> NFS: dentry_delete(recRpxeMd/.nfs00000000400a0e9a0000023c, 0)
> NFS: dentry_delete(bin/recvmsg02, 8)
>
>
> Cheers,
> -Ian


--
Jeff Layton <[email protected]>

2010-10-18 17:10:29

by J. Bruce Fields

Subject: Re: NFS sillyrename side effect

On Mon, Oct 18, 2010 at 11:01:38AM -0400, Jeff Layton wrote:
> On Mon, 18 Oct 2010 15:53:44 +0100
> Cláudio Martins <[email protected]> wrote:
>
> >
> > On Mon, 18 Oct 2010 15:48:11 +0100 Cláudio Martins <[email protected]> wrote:
> > >
> > > Section D2 ends with:
> > >
> > > "The NFS version 4 protocol is stateful, and could actually support
> > > delete-on-last-close. Unfortunately there isn't an easy way to do this
> > > and remain backwards-compatible with version 2 and 3 accessors."
> > >
> > > So, theoretically, could one modify the code to selectively disable
> > > silly rename on a client, when it knows it is talking v4 with the
> > > server?
> > >
> >
> > BTW, to clarify, I'm assuming a scenario where the server is
> > configured to talk v4 only, which I suspect should be common, at least
> > when you're relying on v4 kerberos security.
> >
>
> Sadly, no...
>
> The server does generally hold the file open as long as the client has
> the file open. So, you could delete the file while nfsd has it open and
> everything would probably still work.
>
> Suppose though that the server crashes and reboots. When it comes back
> up, fsck figures out that the file has been unlinked and frees the
> blocks on the disk. Now you can't reclaim the state on the open file.
>
> We're pretty much stuck with silly-renaming even for v4.

The server could do something like rename the file into a special
directory somewhere that only it had access to, preserving the file
across reboot. Then at the end of the grace period it would remove any
files in that directory that had not been reclaimed by some client.

The problem would still remain that the *client* wouldn't know that the
server was capable of doing this, so would still be stuck doing
sillyrename on its own just to be sure.

NFSv4.1 adds an open return flag which allows the server to tell the
client that the client doesn't need to do sillyrename; see the
discussion of OPEN4_RESULT_PRESERVE_UNLINKED flag in rfc 5661.

I don't think anyone's looked into implementing that yet; might be a fun
project.
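
A rough Python sketch of the client-side decision this flag would
allow is below. It is only an illustration: needs_sillyrename and its
parameters are hypothetical, nothing here is existing client code, and
the numeric flag value is quoted from memory of rfc 5661, so check it
against the RFC before relying on it.

```python
# Flag bit for OPEN results as I recall it from rfc 5661 (assumption:
# verify the value against the RFC).
OPEN4_RESULT_PRESERVE_UNLINKED = 0x00000008


def needs_sillyrename(minor_version, open_rflags, still_open):
    """Decide whether a client would have to silly-rename on unlink.

    minor_version: NFSv4 minor version in use (0 for v4.0, 1 for v4.1).
    open_rflags:   result flags the server returned from OPEN.
    still_open:    True if some local process still has the file open.
    """
    if not still_open:
        return False   # nobody holds the file, a plain remove is enough
    if minor_version >= 1 and open_rflags & OPEN4_RESULT_PRESERVE_UNLINKED:
        return False   # server promised to keep unlinked-but-open files
    return True        # fall back to sillyrename, just to be sure


if __name__ == "__main__":
    print(needs_sillyrename(1, OPEN4_RESULT_PRESERVE_UNLINKED, True))
    print(needs_sillyrename(0, OPEN4_RESULT_PRESERVE_UNLINKED, True))
```

The v4.0 case deliberately ignores the bit, since the flag only exists
from 4.1 onwards.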

--b.

2010-10-18 14:53:47

by Cláudio Martins

Subject: Re: NFS sillyrename side effect


On Mon, 18 Oct 2010 15:48:11 +0100 Cláudio Martins <[email protected]> wrote:
>
> Section D2 ends with:
>
> "The NFS version 4 protocol is stateful, and could actually support
> delete-on-last-close. Unfortunately there isn't an easy way to do this
> and remain backwards-compatible with version 2 and 3 accessors."
>
> So, theoretically, could one modify the code to selectively disable
> silly rename on a client, when it knows it is talking v4 with the
> server?
>

BTW, to clarify, I'm assuming a scenario where the server is
configured to talk v4 only, which I suspect should be common, at least
when you're relying on v4 kerberos security.

Best regards

Cláudio


2010-10-18 15:01:40

by Jeff Layton

Subject: Re: NFS sillyrename side effect

On Mon, 18 Oct 2010 15:53:44 +0100
Cláudio Martins <[email protected]> wrote:

>
> On Mon, 18 Oct 2010 15:48:11 +0100 Cláudio Martins <[email protected]> wrote:
> >
> > Section D2 ends with:
> >
> > "The NFS version 4 protocol is stateful, and could actually support
> > delete-on-last-close. Unfortunately there isn't an easy way to do this
> > and remain backwards-compatible with version 2 and 3 accessors."
> >
> > So, theoretically, could one modify the code to selectively disable
> > silly rename on a client, when it knows it is talking v4 with the
> > server?
> >
>
> BTW, to clarify, I'm assuming a scenario where the server is
> configured to talk v4 only, which I suspect should be common, at least
> when you're relying on v4 kerberos security.
>

Sadly, no...

The server does generally hold the file open as long as the client has
the file open. So, you could delete the file while nfsd has it open and
everything would probably still work.

Suppose though that the server crashes and reboots. When it comes back
up, fsck figures out that the file has been unlinked and frees the
blocks on the disk. Now you can't reclaim the state on the open file.

We're pretty much stuck with silly-renaming even for v4.

--
Jeff Layton <[email protected]>

2010-10-21 18:28:48

by Myklebust, Trond

Subject: Re: NFS sillyrename side effect

On Thu, 2010-10-21 at 19:50 +0200, Benny Halevy wrote:
> On 2010-10-19 15:32, Trond Myklebust wrote:
> > On Tue, 2010-10-19 at 08:40 +0200, Benny Halevy wrote:
> >> The clients can do something similar too, like sillyrenaming the
> >> files onto <mountpoint>/.unlinked_while_opened.<client-id>
> >> Removing this directory when it empties.
> >
> > This would require access, directory create, and delete permissions for
> > the directory '<mountpoint>/', which is not always a given.
>
> Correct. But that's pretty easy to probe :)

Yup, but you'd then need to handle both cases in order to avoid
regressions.

You'd also have to deal with more complex locking: avoiding deadlocks
when creating mountpoint/.unlinked_while_open.<clientid> while still
holding the directory mutex for the unlink(). That basically requires
you to take mutexes in the opposite order to that required for a
cross-directory rename.

IOW: is it really worth the effort?

Note also that userland can do exactly the same thing. You are basically
proposing to do the whole 'trashcan' concept in kernel space...

> >
> > Besides, what if the server reboots, and my clientid changes?
>
> By <client-id> I meant a unique identifier for the client, not necessarily
> the nfsv4.x clientid. The client could just as well use its ip address
> (to help admins deal with the aftermath in case the client goes away) and its
> boot time, or even a random string.

Trond

2010-10-18 15:44:16

by Cláudio Martins

Subject: Re: NFS sillyrename side effect


On Mon, 18 Oct 2010 11:01:38 -0400 Jeff Layton <[email protected]> wrote:
> On Mon, 18 Oct 2010 15:53:44 +0100
> Cláudio Martins <[email protected]> wrote:
>
> > >
> > > So, theoretically, could one modify the code to selectively disable
> > > silly rename on a client, when it knows it is talking v4 with the
> > > server?
> > >
> >
> > BTW, to clarify, I'm assuming a scenario where the server is
> > configured to talk v4 only, which I suspect should be common, at least
> > when you're relying on v4 kerberos security.
> >
>
> Sadly, no...
>
> The server does generally hold the file open as long as the client has
> the file open. So, you could delete the file while nfsd has it open and
> everything would probably still work.
>
> Suppose though that the server crashes and reboots. When it comes back
> up, fsck figures out that the file has been unlinked and frees the
> blocks on the disk. Now you can't reclaim the state on the open file.
>
> We're pretty much stuck with silly-renaming even for v4.
>

Jeff,

Thank you for the explanation, you make a good point.

Best regards

Cláudio


2010-10-19 05:18:59

by Ian Munsie

Subject: Re: NFS sillyrename side effect

Hi all,

Excerpts from Jeff Layton's message of Tue Oct 19 01:10:59 +1100 2010:
> See:
>
> http://nfs.sourceforge.net/
>
> ...section D2. The faq mentions that NFSv4 could do away with it
> because it's stateful, but that's not really the case either.

Thanks for the pointer.

> > It seems to me that if the sillyrename function is indeed necessary,
> > we should not reveal the temporary filename it produces to userspace,
> > but I wonder if that might create issues (say, if userspace thinks a
> > directory is empty when it actually has a sillyrename file in it)?
> >
> > Basically I want to know if the behaviour I've outlined below is
> > the expected behaviour of NFS, or if you consider this a bug?
> >
>
>
> It's expected. Sillyrenaming sucks, but there really is no great
> alternative to it. It's one of the prices we pay for having NFSv2/3 be
> stateless.
>
> I suppose in principle we could do things like hide silly-renamed
> dentries from userspace, but that might also be problematic. You'd
> still be unable to remove a directory that has a silly-renamed file in
> it, for instance, even though it looks empty. There would also be
> inconsistencies as other machines and the server would still see
> the .nfsXXXXX files.

I had a feeling that would be the case.

> The bottom line is that you need to be really careful with programs
> that use delete-on-last-close when running on NFS.


Thanks to everyone who replied for clarifying this matter for me.

Cheers,
-Ian