Dear all,
Server: RedHat 7.3, kernel 2.4.20-19.7smp, nfs-utils 1.0.5
Client: RedHat 7.3, kernel 2.4.18-27.7.x, nfs-utils 0.3.3-6.73
We have over 200 entries in /etc/exports. After issuing 'exportfs -r', some
clients are left with stale handles on certain home directories.
On the client, the dmesg output says:
nfs: server x not responding, still trying
nfs: server x OK
nfs_statfs: statfs error = 116
nfs_statfs: statfs error = 116
nfs_statfs: statfs error = 116
nfs_statfs: statfs error = 116
bash# ls -ld /home/user
ls: /home/user: Stale NFS file handle
On the server, I'm looking at the traffic from the client ('tcpdump -s300 -i
eth2 -Nt host client'):
...
client.4180178520 > fs.nfs: 112 read fh Unknown/1 4096 bytes @
0x000029000 (DF)
x.nfs > client.4180178520: reply ok 32 (DF)
client.4196955736 > fs.nfs: 112 read fh Unknown/1 4096 bytes @
0x000029000 (DF)
x.nfs > client.4196955736: reply ok 32 (DF)
client.4213732952 > fs.nfs: 112 read fh Unknown/1 4096 bytes @
0x000029000 (DF)
x.nfs > client.4213732952: reply ok 32 (DF)
client.4230510168 > fs.nfs: 112 read fh Unknown/1 4096 bytes @
0x000029000 (DF)
x.nfs > client.4230510168: reply ok 32 (DF)
client.4247287384 > fs.nfs: 112 read fh Unknown/1 4096 bytes @
0x000029000 (DF)
x.nfs > client.4247287384: reply ok 32 (DF)
client.4264064600 > fs.nfs: 112 read fh Unknown/1 4096 bytes @
0x000029000 (DF)
x.nfs > client.4264064600: reply ok 32 (DF)
client.4280841816 > fs.nfs: 112 read fh Unknown/1 4096 bytes @
0x000029000 (DF)
x.nfs > client.4280841816: reply ok 32 (DF)
...
I don't know how to interpret this, but the facts are:
- the server x is running NFS; most clients still work without problems
- the client cannot recover from the stale handle, and it looks like it is
spamming the server with retries
My questions are:
- Is this a race?
- Is there a way to get the client working again without having to reboot
(or kill all the users' processes and umount the home directory, if that's
possible)? I tried restarting rpc.statd on the client, but that did not help.
- How can I provide more debugging information if needed?
- Could this be related to the thread "[NFS] nfs errors clutter up logs
after 2.4.20 -> 2.4.22-pre10"? We really do see a lot of messages like that
on all clients.
Thanks for your help.
Regards,
Marc
Marc Schmitt wrote:
>
> My questions are:
>
> - Is this a race?
It sounds to me like it could be a server issue under
a very heavy load... How many nfsd are you running? Try
increasing the number to see if that helps....
>
> - Is there a way to get the client working again w/o having to reboot
> (or kill all the users processes and umount the home if that's
> possible)? I tried restarting rpc.statd on the client but that did not
> help.
not really... :(
> - How can I provide more debugging infos if needed?
Ethereal traces have more information and are
generally more useful... IMO...
SteveD.
> > - Is there a way to get the client working again w/o having to reboot
> > (or kill all the users processes and umount the home if that's
> > possible)? I tried restarting rpc.statd on the client but that did not
> > help.
>
> not really... :(
>
Mounting the directory again should help. IMHO not a nice solution, but it
has always worked on our systems.
Cheers,
Bernd
On Thu, 2003-09-18 at 21:24, Bernd Schubert wrote:
> > > - Is there a way to get the client working again w/o having to reboot
> > > (or kill all the users processes and umount the home if that's
> > > possible)? I tried restarting rpc.statd on the client but that did not
> > > help.
> >
> > not really... :(
> >
>
> Mounting the directory again should help. IMHO not a nice solution, but it
> has always worked on our systems.
Do you mean 'mount -o remount /home/user'?
I've tried that, it didn't work.
Greetz
Marc
On Thu, 2003-09-18 at 15:49, Steve Dickson wrote:
> Marc Schmitt wrote:
>
> >
> > My questions are:
> >
> > - Is this a race?
>
> It sounds to me like it could be a server issue under
> a very heavy load... How many nfsd are you running? Try
> increasing the number to see if that helps....
Thanks, I'm trying that; I've changed the number of nfsd from 32 to 64 on
the production system.
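For the record, I believe the Red Hat init script takes the thread count
from RPCNFSDCOUNT in /etc/sysconfig/nfs, so the change was roughly the
following (the variable and file location may differ on other setups):

    # /etc/sysconfig/nfs  (assumption: read by /etc/rc.d/init.d/nfs)
    RPCNFSDCOUNT=64

followed by an '/etc/rc.d/init.d/nfs restart'. As far as I know, rpc.nfsd
also accepts the thread count directly (e.g. 'rpc.nfsd 64') if you want to
change it on the fly.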
> > - How can I provide more debugging infos if needed?
>
> Ethereal traces have more information and are
> generally more useful... IMO...
I'll try to get a test setup running with the same software versions,
create a couple of hundred exports, and hammer it from one of our
clusters with bonnie++ runs. That way I'll hopefully be able to reproduce
this re-exporting issue.
A user has found a bug that shows up when checking out a big Subversion
repository from the same server over NFS: the checkout always times out
under the huge number of file manipulations and fails. He then
reproduced the issue with a small script that basically loops over these
four calls:
rename("old/bla", "new/bla")
stat("new/bla", ...)
chmod("new/bla", ...)
rename("new/bla", "old/bla")
Within 1000 iterations the script fails with: Error setting new/bla to
read-only! We'll try to narrow this down on the test cluster, too. One
peculiarity has already been found: the bug only appears if the renaming
crosses a directory boundary.
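For reference, here is a rough sketch of what that loop does. This is my
reconstruction, not the user's actual script; it assumes the directories
old/ and new/ and the file old/bla already exist on the NFS mount:

/* Sketch of the rename/stat/chmod/rename loop described above.
 * Assumes old/ and new/ are sibling directories on the NFS mount
 * and old/bla already exists. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;
    int i;

    for (i = 0; i < 1000; i++) {
        if (rename("old/bla", "new/bla") != 0) {
            perror("rename old/bla -> new/bla");
            return 1;
        }
        if (stat("new/bla", &st) != 0) {
            perror("stat new/bla");
            return 1;
        }
        if (chmod("new/bla", 0444) != 0) {
            fprintf(stderr, "Error setting new/bla to read-only! (iteration %d)\n", i);
            return 1;
        }
        if (rename("new/bla", "old/bla") != 0) {
            perror("rename new/bla -> old/bla");
            return 1;
        }
    }
    printf("completed %d iterations without error\n", i);
    return 0;
}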
Regards,
Marc
On Thursday 18 September 2003 21:31, Marc Schmitt wrote:
> On Thu, 2003-09-18 at 21:24, Bernd Schubert wrote:
> > > > - Is there a way to get the client working again w/o having to reboot
> > > > (or kill all the users processes and umount the home if that's
> > > > possible)? I tried restarting rpc.statd on the client but that did
> > > > not help.
> > >
> > > not really... :(
> >
> > Mounting the directory again should help. IMHO not a nice solution, but it
> > has always worked on our systems.
>
> Do you mean 'mount -o remount /home/user'?
> I've tried that, it didn't work.
>
No, no, I simply mean:
mount -t nfs server:export_dir target_dir
so just 'overmounting' the already mounted directory.
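For example, with placeholder names for the server and the exported path:

    mount -t nfs fs:/export/home/user /home/user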
Hope it helps,
Bernd
On September 18, [email protected] wrote:
>
> A user has found a bug that shows up when checking out a big Subversion
> repository from the same server over NFS: the checkout always times out
> under the huge number of file manipulations and fails. He then
> reproduced the issue with a small script that basically loops over these
> four calls:
>
> rename("old/bla", "new/bla")
> stat("new/bla", ...)
> chmod("new/bla", ...)
> rename("new/bla", "old/bla")
>
> Within 1000 iterations the script fails with: Error setting new/bla to
> read-only! We'll try to narrow this down on the test cluster, too. One
> peculiarity has already been found: the bug only appears if the renaming
> crosses a directory boundary.
Sounds like you are using the "subtree_check" export flag on that
export (possibly implicitly). You don't want that; i.e., add
"no_subtree_check" after reading "man exports".
NeilBrown
Neil Brown wrote:
>On September 18, [email protected] wrote:
>
>
>>A user has found a bug that shows up when checking out a big Subversion
>>repository from the same server over NFS: the checkout always times out
>>under the huge number of file manipulations and fails. He then
>>reproduced the issue with a small script that basically loops over these
>>four calls:
>>
>>rename("old/bla", "new/bla")
>>stat("new/bla", ...)
>>chmod("new/bla", ...)
>>rename("new/bla", "old/bla")
>>
>>Within 1000 iterations the script fails with: Error setting new/bla to
>>read-only! We'll try to narrow this down on the test cluster, too. One
>>peculiarity has already been found: the bug only appears if the renaming
>>crosses a directory boundary.
>>
>>
>
>Sounds like you are using the "subtree_check" export flag on that
>export (possibly implicitly). You don't want that; i.e., add
>"no_subtree_check" after reading "man exports".
>
>
Excellent, that worked. Thank you very much.
Marc
On Thu, 2003-09-18 at 15:49, Steve Dickson wrote:
> Marc Schmitt wrote:
>
> >
> > My questions are:
> >
> > - Is this a race?
>
> It sounds to me like it could be a server issue under
> a very heavy load... How many nfsd are you running? Try
> increasing the number to see if that helps....
I remembered that the NFS-HowTo addresses this with a rule of thumb for
determining whether one needs more nfsd threads running. The HowTo says
(http://nfs.sourceforge.net/nfs-howto/performance.html, section 5.6):
"If you are using a 2.4 or higher kernel and you want to see how heavily
each nfsd thread is being used, you can look at the file
/proc/net/rpc/nfsd. The last ten numbers on the th line in that file
indicate the number of seconds that the thread usage was at that
percentage of the maximum allowable. If you have a large number in the
top three deciles, you may wish to increase the number of nfsd
instances."
The th line looks like this (after changing to 64 nfsd, obviously):
th 64 6121728 134012.900 61327.500 34092.130 21573.980 22513.750
8121.200 5826.550 4062.540 3129.340 26975.820
The last ten numbers are then:
134012.900 61327.500 34092.130 21573.980 22513.750 8121.200 5826.550
4062.540 3129.340 26975.820
But what is meant by the "top three deciles"? I have an English
comprehension problem here, sorry. I looked up the word "decile" and it
means what I guessed: one tenth, or one unit out of ten. Does that mean
that the top three deciles are:
134012.900 61327.500 34092.130 ?
That does not make sense to me, because the HowTo says "If you have a large
number...", referring to a single number; or should it read "If you have a
large number amongst the top..."?
And then, what is "the thread usage at that percentage of the maximum
allowable"? Which value is the maximum? 6121728?
Can someone please try to explain this to me? I'm pretty much lost...
TIA
Marc
Hi Bernd,
Thanks for that hint, I've used it a couple of times in the meantime and it
works fine.
Marc
Bernd Schubert wrote:
> No, no, I simply mean:
>
>mount -t nfs server:export_dir target_dir
>
>so just 'overmounting' the already mounted directory.
>
>Hope it helps,
>Bernd
>
>