2004-06-25 17:49:43

by Ara.T.Howard

Subject: client side noac


is there any way for user code to make nfs file handles behave as if they were
mounted noac?

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it.
| --Dogen
===============================================================================


-------------------------------------------------------
This SF.Net email sponsored by Black Hat Briefings & Training.
Attend Black Hat Briefings & Training, Las Vegas July 24-29 -
digital self defense, top technical experts, no vendor pitches,
unmatched networking opportunities. Visit http://www.blackhat.com
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-06-25 18:02:58

by Lever, Charles

Subject: RE: client side noac

> is there any way for user code to make nfs file handles
> behave as if they were mounted noac?

can you be more specific about what you need?



2004-06-25 18:07:36

by Trond Myklebust

Subject: Re: client side noac

On Fri, 25/06/2004 at 13:49, Ara.T.Howard wrote:
> is there any way for user code to make nfs file handles behave as if they were
> mounted noac?

I don't understand the question.

Filehandles are used as the unique identifier of a file. If a filehandle
changes, it is considered to belong to a different file. In consequence,
the only time you can revalidate a filehandle is at file lookup time.

Cheers,
Trond



2004-06-25 18:52:54

by Ara.T.Howard

Subject: RE: client side noac

On Fri, 25 Jun 2004, Lever, Charles wrote:

>> is there any way for user code to make nfs file handles
>> behave as if they were mounted noac?
>
> can you be more specific about what you need?


that these types of operations should never yield ESTALE

- open path
- cp nfs_from nfs_to

-a

2004-06-25 18:59:30

by Ara.T.Howard

Subject: Re: client side noac

On Fri, 25 Jun 2004, Trond Myklebust wrote:

> On Fri, 25/06/2004 at 13:49, Ara.T.Howard wrote:
>> is there any way for user code to make nfs file handles behave as if they were
>> mounted noac?
>
> I don't understand the question.
>
> Filehandles are used as the unique identifier of a file. If a filehandle
> changes, it is considered to belong to a different file. In consequence,
> the only time you can revalidate a filehandle is at file lookup time.
>
> Cheers,
> Trond

hi trond-

what i mean is, surprising things can raise ESTALE (i'm using an exception
based language that maps errno to exceptions - ruby)

for example

- File::copy src, dest
- open path, 'r'

to get around this i have a method, 'uncache', which:

- opens a file, retrying one time if this returns ESTALE
- applies a non-blocking shared fcntl lock to file
- links the file to a unique name, rm's name
- does a chmod file to current mode
- does a utime file to current file's utime

this is everything i know of which could possibly invalidate the attribute
cache.
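
The steps listed above could be sketched roughly as follows. This is a hypothetical reconstruction, not the poster's actual method: the helper name `uncache` comes from the post, but its body, the temporary link name, and the use of `File#flock` in place of an fcntl lock (Ruby has no portable fcntl-lock wrapper) are all assumptions.

```ruby
# Hypothetical reconstruction of the 'uncache' steps from the post.
# Assumption: File#flock stands in for the fcntl lock the post mentions.
def uncache(path)
  file =
    begin
      File.open(path, 'r')
    rescue Errno::ESTALE
      File.open(path, 'r')            # retry exactly once
    end

  begin
    # Non-blocking shared lock; a refused lock is simply ignored here.
    file.flock(File::LOCK_SH | File::LOCK_NB)

    # Link the file to a unique (made-up) name, then remove that name.
    tmp = "#{path}.uncache.#{Process.pid}.#{rand(1_000_000)}"
    File.link(path, tmp)
    File.unlink(tmp)

    # Re-apply the file's current mode and timestamps.
    st = File.stat(path)
    File.chmod(st.mode & 0o7777, path)
    File.utime(st.atime, st.mtime, path)
  ensure
    file.close
  end
  true
end
```

Each call issues several extra operations against the server, which fits the performance cost described in the post.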

this seems to work but it's a performance KILLER. i'm exploring keeping
certain files (in this case an nfs accessed object database work queue) on a
fs exported/mounted noac and sync to get around some problems my code is
having. my code is very fault tolerant now and deals with errors quite gracefully
- i've fed about 50,000 jobs through my queue with no errors - but i am wondering
if there are more elegant approaches.

regards.

-a

2004-06-25 19:17:03

by Trond Myklebust

Subject: Re: client side noac

On Fri, 25/06/2004 at 14:59, Ara.T.Howard wrote:

> what i mean is, surprising things can raise ESTALE (i'm using an exception
> based language that maps errno to exceptions - ruby)

If the server is acting sanely, the only thing that can cause an ESTALE
to occur is if a file gets deleted from some process on another client.
Unless you are playing really silly games (allowing other clients to
delete *open* files from underneath you), you should usually be able to
get round that problem by turning off attribute caching on the parent
directories: acdirmin=acdirmax=0.

Cheers,
Trond



2004-06-25 19:31:34

by Ara.T.Howard

Subject: Re: client side noac

On Fri, 25 Jun 2004, Trond Myklebust wrote:

> On Fri, 25/06/2004 at 14:59, Ara.T.Howard wrote:
>
>> what i mean is, surprising things can raise ESTALE (i'm using an exception
>> based language that maps errno to exceptions - ruby)
>

> If the server is acting sanely, the only thing that can cause an ESTALE
> to occur is if a file gets deleted from some process on another client.

yes this is what is happening.

> Unless you are playing really silly games (allowing other clients to
> delete *open* files from underneath you)

no silly games: all access to the file in question is guarded via a lockfile
(link(2)) and an fcntl based exclusive lock placed on the lockfile itself to
prevent open->lock race conditions. eg:

f = create_lockfile # external lock 1
f.lockf # external lock 2

...
open file_in_question # access file - throws ESTALE
...
on_some_conditions_file_is_deleted
...

f.unlock
destroy_lockfile f

no read locks are ever used, and the file format itself is sensitive to
corruption and i am seeing none. therefore, i'm as confident as i can be that
no two processes are accessing the file at the same moment in time. however,
as the above shows, sometimes the file is deleted within the confines of the
double lock. i think what is happening is that the nfs client is caching info
about the file_in_question between invocations. i assumed that a call to open
would flush the cache - i am not maintaining an open file handle across
invocations - the file is closed each time, also within the confines of the
lock.
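
For readers unfamiliar with the link(2) lockfile technique mentioned above, here is a minimal sketch. It is not the poster's code: the method name `with_link_lock` and the naming of the unique file are hypothetical, and the second fcntl-based lock from the post is left out.

```ruby
require 'socket'

# Sketch of a link(2)-based lockfile; all names here are hypothetical.
# Returns true if the lock was taken and the block ran, false otherwise.
def with_link_lock(lockfile)
  unique = "#{lockfile}.#{Socket.gethostname}.#{Process.pid}.#{rand(1_000_000)}"
  File.open(unique, File::WRONLY | File::CREAT | File::EXCL) do |f|
    f.puts Process.pid
  end
  begin
    begin
      File.link(unique, lockfile)     # link creation is atomic on the server
    rescue SystemCallError
      # The reply can be lost or misreported over NFS, so the link count
      # checked below is treated as the real answer.
    end
    return false unless File.stat(unique).nlink == 2

    begin
      yield
    ensure
      File.unlink(lockfile)           # release the lock
    end
    true
  ensure
    File.unlink(unique)               # always remove the private name
  end
end
```

Checking the link count rather than the return value of `link` is the usual reason this technique is considered safe over NFS; whether the original code does the same is not stated in the thread.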

> , you should usually be able to get round that problem by turning off
> attribute caching on the parent directories: acdirmin=acdirmax=0.

i am unfamiliar with that option but will read about it and fwd to my
sysad....

>
> Cheers,
> Trond
>

-a

2004-06-26 22:17:13

by Ara.T.Howard

Subject: Re: client side noac

On Fri, 25 Jun 2004, Trond Myklebust wrote:

> open() should not cause cache flushes unless it detects that the file
> has been modified.
>
> As for whether or not lookups are cached: We check the mtime of the
> parent directory in order to find out whether or not it has changed. If
> it has, then we do a new lookup of the file.
>
> Cheers,
> Trond

so i can see how opening a file may result in a later ESTALE (mtime caching)
but how could open ITSELF return ESTALE? this is the behaviour i'm seeing.

confused.

-a

2004-06-26 23:11:28

by Ara.T.Howard

Subject: Re: client side noac

On Fri, 25 Jun 2004, Trond Myklebust wrote:

> open() should not cause cache flushes unless it detects that the file
> has been modified.
>
> As for whether or not lookups are cached: We check the mtime of the
> parent directory in order to find out whether or not it has changed. If
> it has, then we do a new lookup of the file.
>
> Cheers,
> Trond

here are two small ruby scripts which will demonstrate the problem:

this script simply loops removing the file 'foobar', ignoring failure if it did
not exist

================
file: rm.rb
================
loop do
  begin
    File::unlink 'foobar'
  rescue Errno::ENOENT
  end
end
================

this script loops opening up the file 'foobar' and copying it to 'barfoo'. if
'foobar' is found not to exist it is created by making a tmp file and mv'ing it
on top of 'foobar'. if you are unfamiliar with ruby note that open, when
called with a block, automatically closes the open file handle at the end of
the block. if you doubt this do

~/shared > touch foobar && strace ruby -e 'open("foobar"){}' 2>&1 | egrep 'open|close' | tail -2
open("foobar", O_RDONLY|O_LARGEFILE) = 3
close(3) = 0

================
file: open.rb
================
loop do
  begin
    open('foobar','r') do |fb|
      open('barfoo','w') do |bf|
        bf.write fb.read
      end
    end
  rescue Errno::ENOENT
    open('foobar.tmp', 'w'){|f| f.puts 42}
    File::rename 'foobar.tmp', 'foobar'
  end
end
================


here is a demo run:


on one client run the rm.rb script:

~/shared > uname -n
carp.ngdc.noaa.gov

~/shared > ruby rm.rb

on another run the open.rb script:

~/shared > uname -n
jib.ngdc.noaa.gov

~/shared > ruby open.rb
open.rb:3:in `initialize': Stale NFS file handle - foobar (Errno::ESTALE)
from open.rb:3:in `open'
from open.rb:3
from open.rb:1:in `loop'
from open.rb:12


this makes no sense to me. i understand that the open call should throw
Errno::ENOENT - since the rm.rb is constantly removing it. but why should it
throw Errno::ESTALE? it seems like this could only be caused by a buggy client
sending an invalid file handle across the wire? shouldn't clients handle this
case - knowing that it makes little sense for open to return ESTALE (what would
be 'stale' after all)?? my code simply retries (once) if it raises ESTALE on an
open - and this always works - shouldn't the nfs client code also do this?

cheers.

-a

2004-06-26 23:19:19

by Ara.T.Howard

Subject: Re: client side noac

On Sat, 26 Jun 2004, Ara.T.Howard wrote:


forgot to attach the strace from running 'ruby open.rb':

...
...
open("foobar", O_RDONLY|O_LARGEFILE) = 3
open("barfoo", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 4
fstat64(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
fstat64(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb741a000
_llseek(3, 0, [0], SEEK_CUR) = 0
read(3, "", 4096) = 0
close(4) = 0
close(3) = 0
munmap(0xb741a000, 4096) = 0
open("foobar", O_RDONLY|O_LARGEFILE) = -1 ESTALE (Stale NFS file handle)
...
...

again, seems like this should be impossible...


-a

2004-06-27 19:28:35

by Trond Myklebust

Subject: Re: client side noac

On Sat, 26/06/2004 at 18:17, Ara.T.Howard wrote:
> On Fri, 25 Jun 2004, Trond Myklebust wrote:
> > As for whether or not lookups are cached: We check the mtime of the
> > parent directory in order to find out whether or not it has changed. If
> > it has, then we do a new lookup of the file.

> so i can see how opening a file may result in a later ESTALE (mtime caching)
> but how could open ITSELF return ESTALE? this is the behaviour i'm seeing.

Read the above paragraph carefully: we check the mtime on the parent
directory in order to decide whether or not to lookup again...

That mtime value is subject to caching rules. The whole point is to
reduce the number of on-the-wire RPC calls on sane setups.

Cheers,
Trond
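
The caching rule described in this message can be illustrated with a toy model. It is only a sketch under stated assumptions, not the kernel's code: the classes `ToyServer` and `ToyClient`, the integer clock, and the single-directory layout are all invented for illustration. The client re-reads the directory mtime only when its cached copy is older than a timeout (playing the role of acdirmin/acdirmax), and repeats the lookup only when that mtime has changed.

```ruby
# Toy model only; every name in this block is hypothetical.
class ToyServer
  attr_reader :dir_mtime

  def initialize
    @files = {}          # name => filehandle
    @dir_mtime = 0
    @next_fh = 0
  end

  def create(name)
    @files[name] = (@next_fh += 1)
    @dir_mtime += 1
  end

  def remove(name)
    @files.delete(name)
    @dir_mtime += 1
  end

  def lookup(name)
    @files.fetch(name) { raise Errno::ENOENT, name }
  end

  def getattr(filehandle)
    raise Errno::ESTALE unless @files.value?(filehandle)
    :ok
  end
end

class ToyClient
  def initialize(server, acdir_timeout)
    @server = server
    @acdir_timeout = acdir_timeout
    @dentries = {}       # name => [filehandle, dir mtime seen at lookup]
    @cached_mtime = nil
    @cached_at = nil
  end

  # 'now' is a caller-supplied clock tick, to keep the model deterministic.
  def open(name, now)
    filehandle = cached_lookup(name, now)
    @server.getattr(filehandle)        # the file itself is always checked
    filehandle
  end

  private

  def dir_mtime(now)
    if @cached_at.nil? || now - @cached_at >= @acdir_timeout
      @cached_mtime = @server.dir_mtime
      @cached_at = now
    end
    @cached_mtime
  end

  def cached_lookup(name, now)
    mtime = dir_mtime(now)
    filehandle, seen = @dentries[name]
    if filehandle.nil? || seen != mtime
      filehandle = @server.lookup(name)
      @dentries[name] = [filehandle, mtime]
    end
    filehandle
  end
end
```

In this model, if another client removes and recreates 'foobar' while the directory mtime is still cached, `open` hands the old filehandle to the server and gets ESTALE; with a timeout of 0 the lookup is repeated and the new filehandle is found.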



2004-06-28 03:55:26

by Ara.T.Howard

Subject: Re: client side noac

On Sun, 27 Jun 2004, Trond Myklebust wrote:

> On Sat, 26/06/2004 at 18:17, Ara.T.Howard wrote:
>> On Fri, 25 Jun 2004, Trond Myklebust wrote:
>>> As for whether or not lookups are cached: We check the mtime of the
>>> parent directory in order to find out whether or not it has changed. If
>>> it has, then we do a new lookup of the file.
>
>> so i can see how opening a file may result in a later ESTALE (mtime caching)
>> but how could open ITSELF return ESTALE? this is the behaviour i'm seeing.
>
> Read the above paragraph carefully: we check the mtime on the parent
> directory in order to decide whether or not to lookup again...
>
> That mtime value is subject to caching rules. The whole point is to
> reduce the number of on-the-wire RPC calls on sane setups.
>
> Cheers,
> Trond

i understand what you are saying. i guess i should clarify by saying "i don't
understand _why_ open itself should return ESTALE". i mean, in the case of a
read or write the client cannot know what action to take. in this case it
seems like the client certainly can: make an RPC call. i realize the design
emphasizes making the minimum number of calls, but this seems like one too few
doesn't it? i really don't know the code at all so please forgive me if this
sounds stupid: but why, upon finding a stale filehandle, is it deemed better
to return this known bad value instead of checking for a newer (and still
possibly bad) one? are there certain conditions which warrant this behaviour?

more to the point, this is what i'm doing in my code

  fd =
    begin
      open path
    rescue Errno::ESTALE
      open path
    end

in other words - if ESTALE is found on open, try once more. this seems to
work. should i not be doing this? if i should, why shouldn't the nfs client
code do it for me?
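
The retry-once pattern above can be wrapped in a small helper so that it applies to any operation. This is a sketch only: the name `retry_on_estale` and the `attempts` parameter are hypothetical and not part of the poster's code.

```ruby
# Hypothetical helper: run the block, retrying a bounded number of times
# if it raises Errno::ESTALE. The last failure is re-raised unchanged.
def retry_on_estale(attempts = 2)
  tries = 0
  begin
    tries += 1
    yield
  rescue Errno::ESTALE
    retry if tries < attempts
    raise
  end
end
```

A call such as `fd = retry_on_estale { File.open(path) }` then behaves like the snippet above, and the bound keeps a persistently stale handle from looping forever.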

thanks for the help understanding this matter.

-a

2004-06-28 04:37:28

by Trond Myklebust

Subject: Re: client side noac

On Sun, 27/06/2004 at 23:55, Ara.T.Howard wrote:

> i understand what you are saying. i guess i should clarify by saying "i don't
> understand _why_ open itself should return ESTALE". i mean, in the case of a
> read or write the client cannot know what action to take. in this case it
> seems like the client certainly can: make an RPC call. i realize the design
> emphasizes making the minimum number of calls, but this seems like one too few
> doesn't it? i really don't know the code at all so please forgive me if this
> sounds stupid: but why, upon finding a stale filehandle, is it deemed better
> to return this known bad value instead of checking for a newer (and still
> possibly bad) one? are there certain conditions which warrant this behaviour?
>
> more to the point, this is what i'm doing in my code
>
>   fd =
>     begin
>       open path
>     rescue Errno::ESTALE
>       open path
>     end
>
> in other words - if ESTALE is found on open, try once more. this seems to
> work. should i not be doing this? if i should, why shouldn't the nfs client
> code do it for me?

In 2.6.x kernels, we don't do strict revalidation of the parent
directories in the path: only the target file itself, so if someone
deletes the current directory, you may end up with a stale filehandle.
Please also note that renaming directories (and files) into other
directories is also prone to this sort of ESTALE error if your Linux
server isn't exporting with the "no_subtree_check" option.

Otherwise, the file itself should always be strictly checked (unless
you've also got the "nocto" mount flag set). I won't exclude that the
lookup may be using stale data if there turns out to be some race with
readdirplus on NFSv3 that we've overlooked. You could check by comparing
with an NFSv2 mount.

Cheers,
Trond



2004-06-28 14:10:03

by Ara.T.Howard

Subject: Re: client side noac

On Mon, 28 Jun 2004, Trond Myklebust wrote:

> In 2.6.x kernels, we don't do strict revalidation of the parent directories
> in the path: only the target file itself, so if someone deletes the current
> directory, you may end up with a stale filehandle.
> Please also note that renaming directories (and files) into other
> directories is also prone to this sort of ESTALE error if your Linux server
> isn't exporting with the "no_subtree_check" option.
>
> Otherwise, the file itself should always be strictly checked (unless
> you've also got the "nocto" mount flag set). I won't exclude that the
> lookup may be using stale data if there turns out to be some race with
> readdirplus on NFSv3 that we've overlooked. You could check by comparing
> with an NFSv2 mount.
>
> Cheers,
> Trond

we are using

~ > uname -srm && cat /etc/redhat-release; mount | grep nfs
Linux 2.4.21-15.0.2.ELsmp i686
Red Hat Enterprise Linux WS release 3 (Taroon Update 2)
moby.b:/raid/array0/part1/export on /dmsp/moby-0-1 type nfs (rw,bg,hard,intr,rsize=8192,wsize=8192,addr=10.1.186.54)
moby.b:/raid/array1/part1/export on /dmsp/moby-1-1 type nfs (rw,bg,hard,intr,rsize=8192,wsize=8192,addr=10.1.186.54)
moby.b:/raid/array1/part2/export on /dmsp/moby-1-2 type nfs (rw,bg,hard,intr,rsize=8192,wsize=8192,addr=10.1.186.54)
moby.b:/raid/array1/part3/export on /dmsp/moby-1-3 type nfs (rw,bg,hard,intr,rsize=8192,wsize=8192,addr=10.1.186.54)
moby.b:/raid/array1/part4/export on /dmsp/moby-1-4 type nfs (rw,bg,hard,intr,rsize=8192,wsize=8192,addr=10.1.186.54)
moby.b:/raid/array1/part5/export on /dmsp/moby-1-5 type nfs (rw,bg,hard,intr,rsize=8192,wsize=8192,addr=10.1.186.54)
moby.b:/raid/array1/part6/export on /dmsp/moby-1-6 type nfs (rw,bg,hard,intr,rsize=8192,wsize=8192,addr=10.1.186.54)

so i guess that first bit does not apply. the code IS renaming files but only
into the same directory, the directory is never removed and we are not using
the 'nocto' mount option. i am 99.9% positive that we are using nfsv3 - how
can i determine this from the client side? am i reading you correctly in
saying that this setup/usage pattern (demo scripts) should result in strict
checking of files and that ESTALE should not be returned to the client from open()?
or have i mis-understood?
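
On the question of finding the NFS version from the client side: on Linux the options column of /proc/mounts usually carries a version token for NFS mounts. The sketch below assumes that token looks like `v3` (older kernels) or `vers=3` / `nfsvers=3`; that format, the helper name `nfs_version`, and the sample line in the usage note are assumptions to check against the actual system.

```ruby
# Sketch: extract the NFS protocol version for a mount point from text in
# /proc/mounts format. Returns an Integer, or nil if nothing matches.
# Assumption: the version appears as "v3", "vers=3" or "nfsvers=3".
def nfs_version(mounts_text, mount_point)
  mounts_text.each_line do |line|
    _device, dir, type, options = line.split
    next unless dir == mount_point && type.to_s.start_with?('nfs')

    options.to_s.split(',').each do |option|
      return $1.to_i if option =~ /\A(?:v|vers=|nfsvers=)(\d+)/
    end
  end
  nil
end
```

Usage would be along the lines of `nfs_version(File.read('/proc/mounts'), '/dmsp/moby-0-1')`. If the options column holds no version token, the method returns nil and another tool is needed.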

sorry this is dragging on.... in summary i am wanting to know:

0) if you think this is a possible bug which i should explore further

1) if the behaviour i am seeing (demonstrated by scripts sent earlier) is
   consistent with our software configuration and, if so, whether the
   'correct' thing for user code to do in this case is simply to retry one
   time if open returns ESTALE, or whether user code should attempt to force
   cache invalidation, etc.

cheers (again!).

-a