2005-11-11 11:42:15

by Edward Hibbert

Subject: Rename window

I have noticed an atomicity problem with the handling of rename by the
Linux NFS client, and just wanted to check whether this is something that
we just have to work around, or whether there is some other solution.

The problem arises when two different machines issue an identical file
rename.

E.g. rename oldname newname

You would expect that one machine would be successful, and the other
would fail. However, the NFS client seems to be issuing three NFS
operations to perform the rename:

* LOOKUP oldname
* LOOKUP newname
* RENAME oldname newname

There is a timing window where both LOOKUPs can succeed (if the other
machine does the rename at the right time). In this case, the Linux
NFS client does not issue the final NFS RENAME operation - BUT still
returns success to the caller.

Thus both machines have succeeded in renaming the file.

You might argue that since they are performing the same operation that
it is OK for both to succeed, but I have an application that depends on
the atomicity of the operation, as it uses the filename to hold a
counter, and rename to assign a unique counter to a process.

Any help appreciated.

Edward.



2005-11-11 13:30:54

by Neil Horman

Subject: Re: Rename window

On Fri, Nov 11, 2005 at 11:42:01AM -0000, Edward Hibbert wrote:
> I have noticed an atomicity problem with the handling of rename by the
> Linux NFS client, and just wanted to check whether this is something that
> we just have to work around, or whether there is some other solution.
>
> The problem arises when two different machines issue a file rename which
> is identical.
>
> E.g. rename oldname newname
>
> You would expect that one machine would be successful, and the other
> would fail. However, the NFS client seems to be issuing 3 NFS
> operations to perform the rename
>
> * LOOKUP oldname
> * LOOKUP newname
> * RENAME oldname newname
>
> There is a timing window where both LOOKUPs can succeed (if the other
> machine does the rename at the right time). In this case, the Linux
> NFS client does not issue the final NFS RENAME operation - BUT still
> returns success to the caller.
>
> Thus both machines have succeeded in renaming the file.
>
> You might argue that since they are performing the same operation that
> it is OK for both to succeed, but I have an application that depends on
> the atomicity of the operation, as it uses the filename to hold a
> counter, and rename to assign a unique counter to a process.
>
> Any help appreciated.
>

I don't believe rename ever guaranteed atomicity of its operation. If you want
to guarantee atomicity I expect you will want to implement a locking strategy
around the directories/files you are modifying in the rename operation, using
fcntl locking.

Regards
Neil

> Edward.
>
>

--
/***************************************************
*Neil Horman
*Software Engineer
*Red Hat, Inc.
*[email protected]
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/


-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2005-11-11 13:38:55

by Trond Myklebust

Subject: Re: Rename window

On Fri, 2005-11-11 at 11:42 +0000, Edward Hibbert wrote:
> I have noticed an atomicity problem with the handling of rename by the
> Linux NFS client, and just wanted to check whether this is something that
> we just have to work around, or whether there is some other solution.
>
> The problem arises when two different machines issue a file rename
> which is identical.
>
> E.g. rename oldname newname
>
> You would expect that one machine would be successful, and the other
> would fail. However, the NFS client seems to be issuing 3 NFS
> operations to perform the rename
> * LOOKUP oldname
> * LOOKUP newname
> * RENAME oldname newname
> There is a timing window where both LOOKUPs can succeed (if the other
> machine does the rename at the right time). In this case, the Linux
> NFS client does not issue the final NFS RENAME operation - BUT still
> returns success to the caller.
>
> Thus both machines have succeeded in renaming the file.
>
> You might argue that since they are performing the same operation that
> it is OK for both to succeed, but I have an application that depends
> on the atomicity of the operation, as it uses the filename to hold a
> counter, and rename to assign a unique counter to a process.

This is another of those cases where the VFS has optimised for local
filesystem behaviour, and where fixing this problem would require
significant VFS changes.
You'd basically have to add a new lookup intent for the RENAME function
in order to tell the NFS layer that it can ignore the lookup requests.
We probably will get round to doing that sometime, but it is not one of
the most pressing bugs on my list.

In the meantime, I'd suggest just not relying on this level of atomicity
(there are in any case situations where this is always impossible - for
instance in sillyrename() situations).

Cheers,
Trond




2005-11-11 13:41:40

by Trond Myklebust

Subject: Re: Rename window

On Fri, 2005-11-11 at 08:30 -0500, Neil Horman wrote:

> I don't believe rename ever guaranteed atomicity of its operation. If you want
> to guarantee atomicity I expect you will want to implement a locking strategy
> around the directories/files you are modifying in the rename operation using fcntl
> locking.

No. Edward is correct in that a POSIX filesystem is supposed to
guarantee atomicity (this is one reason why we have a dedicated rename()
instead of playing games like link(a,b); unlink(a);).

Unfortunately, as I said, there are issues in the VFS...

Cheers,
Trond




2005-11-11 14:58:20

by Peter Staubach

Subject: Re: Rename window

Trond Myklebust wrote:

>On Fri, 2005-11-11 at 11:42 +0000, Edward Hibbert wrote:
>
>
>>I have noticed an atomicity problem with the handling of rename by the
>>Linux NFS client, and just wanted to check whether this is something that
>>we just have to work around, or whether there is some other solution.
>>
>>The problem arises when two different machines issue a file rename
>>which is identical.
>>
>>E.g. rename oldname newname
>>
>>You would expect that one machine would be successful, and the other
>>would fail. However, the NFS client seems to be issuing 3 NFS
>>operations to perform the rename
>> * LOOKUP oldname
>> * LOOKUP newname
>> * RENAME oldname newname
>>There is a timing window where both LOOKUPs can succeed (if the other
>>machine does the rename at the right time). In this case, the Linux
>>NFS client does not issue the final NFS RENAME operation - BUT still
>>returns success to the caller.
>>
>>Thus both machines have succeeded in renaming the file.
>>
>>You might argue that since they are performing the same operation that
>>it is OK for both to succeed, but I have an application that depends
>>on the atomicity of the operation, as it uses the filename to hold a
>>counter, and rename to assign a unique counter to a process.
>>
>>
>
>This is another of those cases where the VFS has optimised for local
>filesystem behaviour, and where fixing this problem would require
>significant VFS changes.
>You'd basically have to add a new lookup intent for the RENAME function
>in order to tell the NFS layer that it can ignore the lookup requests.
>We probably will get round to doing that sometime, but it is not one of
>the most pressing bugs on my list.
>
>In the meantime, I'd suggest just not relying on this level of atomicity
>(there are in any case situations where this is always impossible - for
>instance in sillyrename() situations).
>

It is true that rename(2) is supposed to be atomic.

It is also true that this is caused by an optimization in the file system
independent layer of current Linux. There is a check which compares the
"from" inode and the "to" inode and, if they are the same, just returns no
error, i.e. success. NFS does need at least the lookup on the "to" name so
that it can handle the rename to an open file case.
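
A toy model may make the window easier to see. The sketch below is not
kernel or NFS code: ToyServer, client_rename and the handle number 42 are
hypothetical names invented for illustration, and the "same handle means
success" test is only a stand-in for the inode comparison described above.

```python
# Toy, in-memory model of the rename window. Nothing here is real NFS or
# VFS code; every name is an assumption made for illustration.

class ToyServer:
    """Maps names to file handles, like one exported directory."""

    def __init__(self, entries):
        self.entries = dict(entries)
        self.renames_served = 0

    def lookup(self, name):
        return self.entries.get(name)  # None models ENOENT

    def rename(self, old, new):
        if old not in self.entries:
            raise FileNotFoundError(old)
        self.entries[new] = self.entries.pop(old)
        self.renames_served += 1


def client_rename(server, old, new, between_lookups=None):
    """Model of the client path: LOOKUP old, LOOKUP new, then maybe RENAME."""
    old_handle = server.lookup(old)
    if old_handle is None:
        raise FileNotFoundError(old)
    if between_lookups is not None:
        between_lookups()  # another machine acts inside the timing window
    new_handle = server.lookup(new)
    if new_handle is not None and new_handle == old_handle:
        # Short circuit: source and target appear to be the same file,
        # so success is reported without sending RENAME.
        return "success (no RENAME sent)"
    server.rename(old, new)
    return "success (RENAME sent)"


server = ToyServer({"oldname": 42})


def machine_b():
    # Machine B completes its whole rename between machine A's two lookups.
    return client_rename(server, "oldname", "newname")


result_a = client_rename(server, "oldname", "newname", between_lookups=machine_b)
print(result_a)                                   # success (no RENAME sent)
print("RENAMEs served:", server.renames_served)   # RENAMEs served: 1
```

Both callers see success, yet the toy server has only served one RENAME,
which is the behaviour Edward reported.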

Couldn't this be (fairly) easily handled by making the short circuit check
in vfs_rename() conditional and not use it for NFS mounted file systems
and perhaps others?

Thanx...

ps



2005-11-11 15:13:39

by Trond Myklebust

Subject: Re: Rename window

On Fri, 2005-11-11 at 09:58 -0500, Peter Staubach wrote:
> Trond Myklebust wrote:
>
> >On Fri, 2005-11-11 at 11:42 +0000, Edward Hibbert wrote:
> >
> >
> >>I have noticed an atomicity problem with the handling of rename by the
> >>Linux NFS client, and just wanted to check whether this is something that
> >>we just have to work around, or whether there is some other solution.
> >>
> >>The problem arises when two different machines issue a file rename
> >>which is identical.
> >>
> >>E.g. rename oldname newname
> >>
> >>You would expect that one machine would be successful, and the other
> >>would fail. However, the NFS client seems to be issuing 3 NFS
> >>operations to perform the rename
> >> * LOOKUP oldname
> >> * LOOKUP newname
> >> * RENAME oldname newname
> >>There is a timing window where both LOOKUPs can succeed (if the other
> >>machine does the rename at the right time). In this case, the Linux
> >>NFS client does not issue the final NFS RENAME operation - BUT still
> >>returns success to the caller.
> >>
> >>Thus both machines have succeeded in renaming the file.
> >>
> >>You might argue that since they are performing the same operation that
> >>it is OK for both to succeed, but I have an application that depends
> >>on the atomicity of the operation, as it uses the filename to hold a
> >>counter, and rename to assign a unique counter to a process.
> >>
> >>
> >
> >This is another of those cases where the VFS has optimised for local
> >filesystem behaviour, and where fixing this problem would require
> >significant VFS changes.
> >You'd basically have to add a new lookup intent for the RENAME function
> >in order to tell the NFS layer that it can ignore the lookup requests.
> >We probably will get round to doing that sometime, but it is not one of
> >the most pressing bugs on my list.
> >
> >In the meantime, I'd suggest just not relying on this level of atomicity
> >(there are in any case situations where this is always impossible - for
> >instance in sillyrename() situations).
> >
>
> It is true that rename(2) is supposed to be atomic.
>
> It is also true that this is caused by an optimization in the file system
> independent layer of current Linux. There is a check which compares the
> "from" inode and the "to" inode and, if they are the same, just returns no
> error, i.e. success. NFS does need at least the lookup on the "to" name so
> that it can handle the rename to an open file case.
>
> Couldn't this be (fairly) easily handled by making the short circuit check
> in vfs_rename() conditional and not use it for NFS mounted file systems
> and perhaps others?

That causes weird behaviour to pop up in other places. One classic is
the rename of an open file onto a hard link of itself, where unless
someone (either NFS or the VFS) tests those inodes for equality, NFS
will end up sillyrenaming the hard link, then renaming the open file in
complete violation of POSIX.
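
For reference, the rule at stake here can be observed on a local
filesystem. POSIX says that when the old and new names are different
directory entries for the same existing file, rename() returns success
without doing anything. The sketch below assumes a local filesystem that
supports hard links; the helper name and the file names "a" and "b" are
made up for the demonstration, and it says nothing about what an NFS
client sends over the wire.

```python
import os
import tempfile


def rename_onto_own_hard_link(directory):
    """Create a file, hard-link it, then rename one name onto the other."""
    a = os.path.join(directory, "a")
    b = os.path.join(directory, "b")
    with open(a, "w") as f:
        f.write("payload")
    os.link(a, b)    # "b" is now a second name for the same inode
    os.rename(a, b)  # same file on both sides: succeeds and changes nothing
    return os.path.exists(a), os.path.exists(b), os.stat(b).st_nlink


with tempfile.TemporaryDirectory() as d:
    a_exists, b_exists, nlink = rename_onto_own_hard_link(d)
    print(a_exists, b_exists, nlink)  # True True 2
```

Both names survive and the link count stays at 2, which is why something
has to compare the two inodes before a sillyrename is attempted.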

Cheers,
Trond




2005-11-11 15:25:14

by Peter Staubach

Subject: Re: Rename window

Trond Myklebust wrote:

>On Fri, 2005-11-11 at 09:58 -0500, Peter Staubach wrote:
>
>
>>Trond Myklebust wrote:
>>
>>
>>
>>>On Fri, 2005-11-11 at 11:42 +0000, Edward Hibbert wrote:
>>>
>>>
>>>
>>>
>>>>I have noticed an atomicity problem with the handling of rename by the
>>>>Linux NFS client, and just wanted to check whether this is something that
>>>>we just have to work around, or whether there is some other solution.
>>>>
>>>>The problem arises when two different machines issue a file rename
>>>>which is identical.
>>>>
>>>>E.g. rename oldname newname
>>>>
>>>>You would expect that one machine would be successful, and the other
>>>>would fail. However, the NFS client seems to be issuing 3 NFS
>>>>operations to perform the rename
>>>> * LOOKUP oldname
>>>> * LOOKUP newname
>>>> * RENAME oldname newname
>>>>There is a timing window where both LOOKUPs can succeed (if the other
>>>>machine does the rename at the right time). In this case, the Linux
>>>>NFS client does not issue the final NFS RENAME operation - BUT still
>>>>returns success to the caller.
>>>>
>>>>Thus both machines have succeeded in renaming the file.
>>>>
>>>>You might argue that since they are performing the same operation that
>>>>it is OK for both to succeed, but I have an application that depends
>>>>on the atomicity of the operation, as it uses the filename to hold a
>>>>counter, and rename to assign a unique counter to a process.
>>>>
>>>>
>>>>
>>>>
>>>This is another of those cases where the VFS has optimised for local
>>>filesystem behaviour, and where fixing this problem would require
>>>significant VFS changes.
>>>You'd basically have to add a new lookup intent for the RENAME function
>>>in order to tell the NFS layer that it can ignore the lookup requests.
>>>We probably will get round to doing that sometime, but it is not one of
>>>the most pressing bugs on my list.
>>>
>>>In the meantime, I'd suggest just not relying on this level of atomicity
>>>(there are in any case situations where this is always impossible - for
>>>instance in sillyrename() situations).
>>>
>>>
>>>
>>It is true that rename(2) is supposed to be atomic.
>>
>>It is also true that this is caused by an optimization in the file system
>>independent layer of current Linux. There is a check which compares the
>>"from" inode and the "to" inode and, if they are the same, just returns no
>>error, i.e. success. NFS does need at least the lookup on the "to" name so
>>that it can handle the rename to an open file case.
>>
>>Couldn't this be (fairly) easily handled by making the short circuit check
>>in vfs_rename() conditional and not use it for NFS mounted file systems
>>and perhaps others?
>>
>>
>
>That causes weird behaviour to pop up in other places. One classic is
>the rename of an open file onto a hard link of itself, where unless
>someone (either NFS or the VFS) tests those inodes for equality, NFS
>will end up sillyrenaming the hard link, then renaming the open file in
>complete violation of POSIX.
>

I suppose that a check could be added for the hard link count, but yuck,
that is a nasty situation. Oh well, good point...

ps



2005-11-11 15:28:18

by Edward Hibbert

Subject: RE: Rename window

Thanks for your replies. We're getting round this by ensuring that we
don't rename to the same name - we can put something machine-specific in
the destination filename fairly easily.
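
A minimal sketch of that kind of workaround, for illustration only: the
function name claim, the owner_tag parameter and the ".claimed-by."
naming scheme are all hypothetical and are not taken from the real
application. Because every caller renames to a different destination, the
source and destination can no longer resolve to the same file, so a loser
should see the source as missing rather than get a silent success. The
demo runs on a local temporary directory and only shows the local
semantics.

```python
import os
import socket
import tempfile


def claim(directory, counter_name, owner_tag=None):
    """Try to claim counter_name by renaming it to a caller-specific name.

    Returns the new path on success, or None if the source was already gone.
    """
    if owner_tag is None:
        # Hypothetical tag: hostname plus pid, assumed unique per process.
        owner_tag = f"{socket.gethostname()}.{os.getpid()}"
    src = os.path.join(directory, counter_name)
    dst = os.path.join(directory, f"{counter_name}.claimed-by.{owner_tag}")
    try:
        os.rename(src, dst)
    except FileNotFoundError:
        return None  # somebody else renamed the source first
    return dst


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "counter.7"), "w").close()
    first = claim(d, "counter.7", owner_tag="hostA.100")
    second = claim(d, "counter.7", owner_tag="hostB.200")
    print(first is not None, second)  # True None
```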

However I should probably have said that our test app suggests this
doesn't occur with Solaris NFS clients, which might influence people's
opinion on whether it should be fixed.

Regards,

Edward.


2005-11-11 15:29:40

by Trond Myklebust

Subject: RE: Rename window

On Fri, 2005-11-11 at 15:28 +0000, Edward Hibbert wrote:

> However I should probably have said that our test app suggests this
> doesn't occur with Solaris NFS clients, which might influence people's
> opinion on whether it should be fixed.

Not really, no. ;-)

Cheers,
Trond




2005-11-11 15:31:52

by Edward Hibbert

Subject: RE: Rename window

Hey, who said "it" couldn't refer to "Solaris"?

Edward.


2005-11-11 15:33:17

by Peter Staubach

Subject: Re: Rename window

Edward Hibbert wrote:

>Thanks for your replies. We're getting round this by ensuring that we
>don't rename to the same name - we can put something machine-specific in
>the destination filename fairly easily.
>
>However I should probably have said that our test app suggests this
>doesn't occur with Solaris NFS clients, which might influence people's
>opinion on whether it should be fixed.
>

I'd be surprised if this couldn't happen with Solaris clients. They do
the same lookups and have many of the same decision making statements.
It is implemented differently, completely down in the NFS file system,
but it works about the same.

ps

