Hello Neil,
I've been trying to get at this problem for a while now, and had been
concentrating on the client-side of the problem (and consequently
bothering Trond about it) [1,2]. I am now pretty much convinced this is a
server-side problem, and as I've patched 2.4.20 with all the NFS patches
pending (that didn't have to do with the kernel lock breaking) and still
see the issue, I decided to report this bug.
The scenario is: a set of NFS clients with root mounted over nfs from a
single server. Clients run vanilla 2.4.20, server runs 2.4.20 patched
with your server-side patches I mentioned above. The clients run okay
for a period, and then one of them will start to hang for long periods
of time for certain operations (it happens on startup and shutdown, for
instance). Once the client hangs start, the server needs to be rebooted
for them to clear up.
It seems to be reproducible by having the client hang or reboot without
shutting down properly. Another tip is that the server gets files left
over in /var/lib/nfs/sm/ for the hanging client(s).
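A minimal sketch for listing those leftover entries on the server,
assuming the usual nfs-utils layout in which statd keeps one file per
monitored host under /var/lib/nfs/sm. The function name monitored_hosts
is made up for illustration:

```python
import os


def monitored_hosts(sm_dir="/var/lib/nfs/sm"):
    """Return the sorted entry names found in statd's monitor directory.

    Assumption: each entry is named after a host that statd is
    currently monitoring. A missing directory yields an empty list.
    """
    try:
        return sorted(os.listdir(sm_dir))
    except FileNotFoundError:
        return []
```

Calling monitored_hosts() on the server after a client has crashed
should show whether an entry for that client is still present.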
I've been trying to track this down for a while, but since I'm not very
proficient with debugging at this level, I haven't had much luck. It's
really a problem because I need to reboot and make 20 people stop
working when the problem gets serious. Trond has had a hand trying
to help me, but we still haven't uncovered anything. I wonder if you
have any clue what could be happening?
The other details are standard: the clients are debian woodys with
nfs-utils 1.0.1 installed, and the server has the same version. The
server runs reiserfs over RAID-1 partitions (using the kernel md
driver). Could it be triggered because of this perhaps unusual
combination?
Some of the messages I point out below have some info about the issue -
including tcpdumps and traces of nlm_debug on the server and client.
Mount options follow for the client filesystems:
anthem:/export/root/ / nfs defaults,rw,rsize=8192,wsize=8192,nfsvers=2 0 0
anthem:/home /home nfs defaults,rw,rsize=8192,wsize=8192,nfsvers=3 0 0
I have checked and, yes, root is mounted using version 2 and the rest as
version 3. Perhaps I should try getting the kernel to mount root using
version 3?
[1] http://groups.google.com/groups?q=trond+christian+nfs&hl=pt&lr=&ie=UTF-8&client=googlet&scoring=d&selm=20030108151424.N2628%40blackjesus.async.com.br.lucky.linux.kernel&rnum=1
[2] http://groups.google.com/groups?hl=pt&lr=&ie=UTF-8&client=googlet&th=3575b3c5f3360eb0&seekm=20030108151424.N2628%40blackjesus.async.com.br.lucky.linux.kernel&frame=off
Thanks for any help you can give.
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
On Friday January 24, Christian Reis wrote:
>
> Hello Neil,
Hi.
>
> I've been trying to get at this problem for a while now, ....
>
> It seems to be reproducible by having the client hang or reboot without
> shutting down properly. Another tip is that the server gets files left
> over in /var/lib/nfs/sm/ for the hanging client(s).
>
> Mount options follow for the client filesystems:
>
> anthem:/export/root/ / nfs defaults,rw,rsize=8192,wsize=8192,nfsvers=2 0 0
> anthem:/home /home nfs defaults,rw,rsize=8192,wsize=8192,nfsvers=3 0 0
>
Hmmm. So you have several clients all mounting the same root
filesystem, and mounting it writable? That doesn't sound like a plan
for success. How do you make sure the clients don't tread over each
other when using /etc files?
I suspect that what you really want is to mount root read-only, or
mount separate roots for each client, and then in either case to mount
with the "nolock" flag.
I suspect that your problem is related to the client trying to do
locking, but not having statd running on the client.
You cannot meaningfully do locking on an NFS mounted root filesystem.
In fact, I think it would be good if the default mount options for nfs
root included nolock... and if I read fs/nfs/nfsroot.c:root_nfs_name
correctly, nolock is the default. Are you overriding that default
by explicitly setting "lock"??
NeilBrown
On Sat, Jan 25, 2003 at 02:54:09PM +1100, Neil Brown wrote:
> Hmmm. So you have several clients all mounting the same root
> filesystem, and mounting it writable? That doesn't sound like a plan
> for success. How do you make sure the clients don't tread over each
> other when using /etc files?
The truth is few (broken wrt the FHS) programs actually write to /etc. I
have set up everything so nothing is written to in /etc, and it actually
works very well (have to use a special init(8) that doesn't write to
/etc/ioctl.save). This setup has been running for almost a year now,
with the locking problem being the only one left to fix.
> I suspect that what you really want is to mount root read-only, or
> mount separate roots for each client, and then in either case to mount
> with the "nolock" flag.
Well, mounting root read-only is a good idea but it sacrifices being
able to administer the system from any station, and it also puts a lot
of burden on me to fix *all* programs to not write to anywhere on it.
This shouldn't be too hard, but we're still just working around the bug,
which I would really like to identify and fix.
> I suspect that your problem is related to the client trying to do
> locking, but not having statd running on the client.
I am 100% positive statd runs on every single client. This problem here
only happens spuriously. It goes away when I restart nfsd and mountd
(in that order). It really does look like a bug <wink>
> You cannot meaningfully do locking on an NFS mounted root filesystem.
> In fact, I think it would be good if the default mount options for nfs
> root included nolock... and if I read fs/nfs/nfsroot.c:root_nfs_name
> correctly, nolock is the default. Are you overriding that default
> by explicitly setting "lock"??
Nope. I've just tested and the default (specifying no lock option upon
bootup) really is nolock:
/dev/root on / type nfs (rw,v3,rsize=8192,wsize=8192,hard,udp,nolock,addr=192.168.99.4)
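The same check can be scripted. This is only a sketch, assuming the
standard /proc/mounts layout (device, mount point, type, options, dump,
pass); the helper names nfs_mounts and locking_disabled are
illustrative, not part of any tool:

```python
def nfs_mounts(mounts_text):
    """Return [(mount_point, [options])] for NFS entries.

    mounts_text is text in /proc/mounts format, one mount per line.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "nfs":
            found.append((fields[1], fields[3].split(",")))
    return found


def locking_disabled(options):
    """True if the option list contains the nolock flag."""
    return "nolock" in options
```

On a client, passing the contents of /proc/mounts to nfs_mounts() and
then each option list to locking_disabled() shows which NFS mounts are
running without locking.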
I wonder why you can't do locking on NFS root (if it's a current
limitation or if it doesn't make sense).
But I also think this problem shouldn't be happening if no locking was
going on. And when I checked using nlm_debug it sure did seem locking
was being used. What do you make of it?
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
>>>>> " " == Christian Reis writes:
> I wonder why you can't do locking on NFS root (if it's a
> current limitation or if it doesn't make sense).
locking supposes that you are already running a statd daemon, which
you clearly cannot be doing on an nfsroot system. If you need locking
on a root partition, then you'll need to set up an initrd from which
to start all the necessary daemons...
BTW: Did I understand you and Neil correctly when you appeared to say
that you were sharing the *same* root partition between several
clients?
If so, then that could easily explain your problem: a directory like
/var/lib/nfs simply cannot be shared among several different
machines. Read the 'statd' manpage, and I'm sure you will understand
why.
Cheers,
Trond
On Sun, Jan 26, 2003 at 10:49:14PM +0100, Trond Myklebust wrote:
> >>>>> " " == Christian Reis writes:
>
> > I wonder why you can't do locking on NFS root (if it's a
> > current limitation or if it doesn't make sense).
>
> locking supposes that you are already running a statd daemon, which
> you clearly cannot be doing on an nfsroot system. If you need locking
> on a root partition, then you'll need to set up an initrd from which
> to start all the necessary daemons...
This makes a lot of sense, I just had never thought about it properly.
I'm not sure I *need* locking, so I'll run with nolock till it bites me.
> BTW: Did I understand you and Neil correctly when you appeared to say
> that you were sharing the *same* root partition between several
> clients?
Yes, you did understand correctly. The same root partition is mounted by
around 20 machines. It works, too. The bug that we have manifests itself
very rarely, and only when one of the machines does an unclean shutdown.
I still haven't been able to reproduce it, so I haven't found a
solution yet.
> If so, then that could easily explain your problem: a directory like
> /var/lib/nfs simply cannot be shared among several different
> machines. Read the 'statd' manpage, and I'm sure you will understand
> why.
Well, none of the machines by default exports anything through NFS, so
none of them explicitly *need* /var/lib/nfs. I've done some careful
study and separated the directories which are written to on a per-host
basis, and used a lot of tmpfs. It works quite well, to be honest. A
breakdown of "special" directories:
- /var/spool and /var/log need to be separate, for obvious reasons.
- /etc/mtab should be a symlink to /proc/mounts to avoid the need for
writing there.
- /tmp, /var/tmp, /dev/shm, /var/lock, /var/run, /var/lib/nfs,
/var/yp/binding, /var/lib/sendmail are tmpfs.
None of the users have root access, so the partition is only written to
as a result of servers running. I used a lot of reboots and ls
-lt to find out what needs to be separate, and there are few issues that
need fixing (/etc/ioctl.save being the latest).
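The reboot-and-look approach above can be approximated with a small
script. This is a sketch only: the function name modified_since and the
idea of comparing against a timestamp (for example, the time of the last
boot) are assumptions, not the exact procedure used here:

```python
import os


def modified_since(root, since):
    """Return paths under root modified after `since`, newest first.

    `since` is a time in seconds since the epoch. Both files and
    directories are reported; entries that vanish mid-walk are skipped.
    Note that os.walk does not stop at filesystem boundaries.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames + dirnames:
            full = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(full).st_mtime
            except OSError:
                continue
            if mtime > since:
                hits.append((mtime, full))
    hits.sort(reverse=True)
    return [path for _, path in hits]
```

Running it against /etc or /var with the boot time as `since` gives a
list comparable to what ls -lt shows at the top of each directory.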
One issue I ran into that I only discovered today (well, we all have to
learn someday) was that a shared /dev is not a good idea, because some
programs write to it. Case in point was syslogd, which creates /dev/log
- all but the last machine had logging broken. Since nobody needs logs
on these boxes anyway, it had gone on unnoticed, but I'm now using
devfs, and it works fine.
Everybody seems to find this setup a bit bizarre. It's not. It keeps
maintenance down to zero for everything, and adding a new box means
running a script once.
statd(8) does indicate that /var/lib/nfs is private, so I just mount it
as tmpfs. Should I make it persistent, or is the fact those files
disappear on an unclean reboot a sign of trouble?
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
>>>>> " " == Christian Reis writes:
> statd(8) does indicate that /var/lib/nfs is private, so I just
> mount it as tmpfs. Should I make it persistent, or is the fact
> those files disappear on an unclean reboot a sign of trouble?
If you want locking to work, then /var/lib/nfs *MUST* be
persistent and unique for each client.
If not then the server will fail to be notified that it needs to
release any POSIX locks it might think you were holding if/when your
NFS client fails to shutdown cleanly.
That again will typically cause a deadlock the next time you try to
access your mailspool (if the server thinks it is already holding a
lock on your behalf).
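A quick way to check the persistence requirement on a client is to look
up which filesystem /var/lib/nfs lives on. The sketch below assumes
/proc/mounts-style input; the names fstype_for and is_persistent, and
the choice of tmpfs and ramfs as the "volatile" types, are illustrative
assumptions:

```python
import os

VOLATILE_TYPES = ("tmpfs", "ramfs")


def fstype_for(path, mounts_text):
    """Return the fs type of the mount containing path, or None."""
    path = os.path.normpath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        point, fstype = fields[1], fields[2]
        prefix = point.rstrip("/") + "/"
        # Longest matching mount point wins; later entries shadow earlier.
        if (path + "/").startswith(prefix) and len(point) >= len(best_point):
            best_point, best_type = point, fstype
    return best_type


def is_persistent(path, mounts_text):
    """True if path sits on a mount that survives a reboot."""
    fstype = fstype_for(path, mounts_text)
    return fstype is not None and fstype not in VOLATILE_TYPES
```

If is_persistent("/var/lib/nfs", ...) is false on a client that uses
locking, statd's state is lost on every reboot, which is the situation
described above. Uniqueness per client still has to be checked
separately.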
Cheers,
Trond
On Mon, Jan 27, 2003 at 12:02:00AM +0100, Trond Myklebust wrote:
> >>>>> " " == Christian Reis writes:
>
> > statd(8) does indicate that /var/lib/nfs is private, so I just
> > mount it as tmpfs. Should I make it persistent, or is the fact
> > those files disappear on an unclean reboot a sign of trouble?
>
> If you want locking to work, then /var/lib/nfs *MUST* be
> persistent and unique for each client.
I had never realized this; things are symmetric in an odd way with NFS,
and this bit with locking can trick you. I've changed the clients to
mount private nfs directories (the perks of shared root for diskless ;)
and I do hope things will work out from now on.
One thing worth noting is that the private /var/lib/nfs directory has to
be mounted a) with nolock (I assume) and b) *before* statd and lockd
have gone up.
> That again will typically cause a deadlock the next time you try to
> access your mailspool (if the server thinks it is already holding a
> lock on your behalf).
I am now left wondering how it bit us so little here. Is there a way of
finding out exactly *which* files are being locked at a certain time for
a certain client?
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
>>>>> " " == Christian Reis writes:
> Is there a way of finding out exactly *which* files are being
> locked at a certain time for a certain client?
Not really. 'cat /proc/locks' is about the closest you can get. That
will give you no NFS-specific information though.
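For what it is worth, /proc/locks is easy to parse. The sketch below
assumes the line layout used by current kernels (id, lock type,
ADVISORY or MANDATORY, access mode, pid, major:minor:inode with the
device numbers in hex, start, end); older kernels may print additional
fields. The names Lock and parse_locks are made up for illustration:

```python
from collections import namedtuple

Lock = namedtuple(
    "Lock", "blocked kind mode access pid major minor inode start end"
)


def parse_locks(text):
    """Parse /proc/locks-style text into a list of Lock tuples."""
    locks = []
    for line in text.splitlines():
        fields = line.split()[1:]  # drop the leading "N:" id
        # A "->" marker means this entry is waiting on the lock above it.
        blocked = bool(fields) and fields[0] == "->"
        if blocked:
            fields = fields[1:]
        if len(fields) < 7 or fields[4].count(":") != 2:
            continue  # not a line in the assumed layout
        kind, mode, access, pid, dev, start, end = fields[:7]
        major, minor, inode = dev.split(":")
        locks.append(Lock(blocked, kind, mode, access, int(pid),
                          int(major, 16), int(minor, 16), int(inode),
                          start, end))
    return locks
```

The device and inode numbers identify the locked file (something like
find -inum on the right filesystem can map them back to a name), but as
noted above nothing in this output says which NFS client a lock belongs
to.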
Cheers,
Trond
> Hmm, interesting. Can you try disabling some of the probes for
> extended keyboards in atkbd.c to see if some of them could confuse
> your keyboard so that the BIOS doesn't like it after boot? Also you
> may want to kill the keyboard reset on reboot ... (atkbd_cleanup) ...
I've been following this because my Dell Latitude C810 has the
"keyboard/mouse doesn't work after reboot" problem with all of the
recent 2.5.x kernels that I have tried.
When I saw the suggestion above to try removing the keyboard reset I
thought that was just too easy to pass up giving it a try. Sure enough,
removing just the one line that performs the keyboard reset from
atkbd_cleanup solves the problem for me. Now I guess it's time to try
to determine why, and what the real fix should likely be.
Just thought it might be valuable to have another report about the
problem.
Later,
Tom
On 26 January 2003 18:02, Christian Reis wrote:
> On Sat, Jan 25, 2003 at 02:54:09PM +1100, Neil Brown wrote:
> > Hmmm. So you have several clients all mounting the same root
> > filesystem, and mounting it writable? That doesn't sound like a
> > plan for success. How do you make sure the clients don't tread
> > over each other when using /etc files?
>
> The truth is few (broken wrt the FHS) programs actually write to
> /etc. I have set up everything so nothing is written to in /etc, and
> it actually works very well (have to use a special init(8) that
> doesn't write to /etc/ioctl.save). This setup has been running for
> almost a year now, with the locking problem being the only one left
> to fix.
My root fs is RO. Works wonders. Clients simply CANNOT trash their
/bin, /lib etc ;)
> > I suspect that what you really want is to mount root read-only, or
> > mount separate roots for each client, and then in either case to
> > mount with the "nolock" flag.
>
> Well, mounting root read-only is a good idea but it sacrifices being
> able to administer the system from any station, and it also puts a
> lot of burden on me to fix *all* programs to not write to anywhere on
> it. This shouldn't be too hard, but we're still just working around
> the bug, which I would really like to identify and fix.
It was not really *that* difficult for me. I used devfs and symlinks.
/etc, /var, /tmp are different directories per client,
/home, /usr are shared. The rest stays on root fs readonly.
ssh to NFS server if you want to modify some files on root fs.
Separate etc/var/tmp files for each client = no concurrent rw access.
> > I suspect that your problem is related to the client trying to do
> > locking, but not having statd running on the client.
>
> I am 100% positive statd runs on every single client. This problem
> here only happens spuriously. It goes away when I restart nfsd and
> mountd (in that order). It really does look like a bug <wink>
File locking over the network is hard to do reliably.
I have no experience with that in NFS, but presume there
can be problems in some situations (statd or portmap
crashed on a client, client hung/disconnected from the net,
etc etc etc...)
Anyway, such corner cases are painful, thank you for
your efforts to nail it down.
--
vda
On 27 January 2003 00:47, Christian Reis wrote:
> On Sun, Jan 26, 2003 at 10:49:14PM +0100, Trond Myklebust wrote:
> > >>>>> " " == Christian Reis writes:
> > > I wonder why you can't do locking on NFS root (if it's a
> > > current limitation or if it doesn't make sense).
> >
> > locking supposes that you are already running a statd daemon, which
> > you clearly cannot be doing on an nfsroot system. If you need
> > locking on a root partition, then you'll need to set up an initrd
> > from which to start all the necessary daemons...
>
> This makes a lot of sense, I just had never thought about it
> properly. I'm not sure I *need* locking, so I'll run with nolock till
> it bites me.
>
> > BTW: Did I understand you and Neil correctly when you appeared to
> > say that you were sharing the *same* root partition between several
> > clients?
>
> Yes, you did understand correctly. The same root partition is mounted
> by around 20 machines. It works, too. The bug that we have manifests
> itself very rarely, and only when one of the machines does an unclean
> shutdown. I still haven't been able to reproduce it so I still
> haven't seen a solution yet.
>
> > If so, then that could easily explain your problem: a directory
> > like /var/lib/nfs simply cannot be shared among several different
> > machines. Read the 'statd' manpage, and I'm sure you will
> > understand why.
>
> Well, none of the machines by default exports anything through NFS,
> so none of them explicitly *need* /var/lib/nfs. I've done some
> careful study and separated the directories which are written to on a
> per-host basis, and used a lot of tmpfs. It works quite well, to be
> honest. A breakdown of "special" directories:
>
> - /var/spool and /var/log need to be separate, for obvious reasons.
> - /proc/mounts should be linked to /etc/mtab to avoid the need for
> writing there.
> - /tmp, /var/tmp, /dev/shm, /var/lock, /var/run, /var/lib/nfs,
> /var/yp/binding, /var/lib/sendmail are tmpfs.
I did the same.
You will end up amending this list. Simplify it:
/var needs to be separate, for obvious reasons. ;)
/tmp needs to be separate
/etc needs to be separate
> None of the users have root access so writing to the partition only
> is done as the result of servers running. I used a lot of reboots and
> ls -lt to find out what needs to be separate, and there are few
> issues that need fixing (/etc/ioctl.save being the latest).
Entire /etc. How can you have different per-client configs for
e.g. /etc/resolv.conf? I know you don't usually need that.
Sometimes we need to do unusual things ;)
> One issue I ran into that I only discovered today (well, we all have
> to learn someday) was that a shared /dev is not a good idea, because
> some programs write to it. Case in point was syslogd, which creates
> /dev/log - all but the last machine had logging broken. Since nobody
> needs logs on these boxes anyway, it had gone on unnoticed, but I'm
> now using devfs, and it works fine.
Same here. Devfs is cool ;)
For one, it forces people to think before they get strange ideas about
putting something foreign in /dev. Like abm syslogd.
> Everybody seems to find this setup a bit bizarre. It's not. It keeps
> maintenance down to zero for everything, and adding a new box means
> running a script once.
Yeah! ;) What a contrast with the typical Windows network mess
you can find in a random office!
--
vda
On Tue, Jan 28, 2003 at 10:00:05AM +0200, Denis Vlasenko wrote:
> > Well, mounting root read-only is a good idea but it sacrifices being
> > able to administer the system from any station, and it also puts a
> > lot of burden on me to fix *all* programs to not write to anywhere on
> > it. This shouldn't be too hard, but we're still just working around
> > the bug, which I would really like to identify and fix.
>
> It was not really *that* difficult for me. I used devfs and symlinks.
> /etc, /var, /tmp are different directories per client,
> /home, /usr are shared. The rest stays on root fs readonly.
> ssh to NFS server if you want to modify some files on root fs.
>
> Separate etc/var/tmp files for each client = no concurrent rw access.
I agree it is a lot simpler; however, you have to give up the ability to
install and upgrade system software seamlessly. When Debian reports a
security issue, all I do is apt-get -u upgrade and skim through it - all
boxes are magically updated. No need to update the individual /etc files
for the changes, and no messy links either.
It does require you take care, though. The most important issue is
finding out what files are written to in these directories (in violation
of the LFS/FHS, I must say). The current culprit I am after is
/sbin/init, which writes to /etc/ioctl.save (why, I wonder). After a lot
of cleanup, I've managed to pare this down to the minimum, and I'm going
after some of the last culprits now.
> File locking over the network is hard to do reliably.
> I have no experience with that in NFS, but presume there
> can be problems in some situations (statd or portmap
> crashed on a client, client hung/disconnected from the net,
> etc etc etc...)
>
> Anyway, such corner cases are painful, thank you for
> your efforts to nail it down.
It seems Trond has given us the answer to the problem: the persistence
of /var/lib/nfs seems to be essential to a healthy diskless client. One
of our co-workers who was an expert at triggering the problems is at the
beach this week, so I can't tell for sure, but next Tuesday or so I hope
to post to NFS-list with [SUMMARY] in the Subject line <wink>
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
On Tue, Jan 28, 2003 at 10:14:09AM +0200, Denis Vlasenko wrote:
> > None of the users have root access so writing to the partition only
> > is done as the result of servers running. I used a lot of reboots and
> > ls -lt to find out what needs to be separate, and there are few
> > issues that need fixing (/etc/ioctl.save being the latest).
>
> Entire /etc. How can you have different per-client configs for
> e.g. /etc/resolv.conf? I know you don't usually need that.
> Sometimes we need to do unusual things ;)
Well, the per-client configurations are an exception at our office, and
the only things we customize are XFree86 (we use the
Xfree86Config.hostname capability), gpm and the kernel, which is
dealt out by DHCPD. We also have a special startup script that is run
for the named box if it exists (/etc/init.d/host-specific/`hostname`).
I'm sure this won't work for everybody, but it does work for us, a
smallish development team.
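The host-specific hook is simple enough to sketch. The real one is a
startup script; the version below only illustrates the lookup, and the
function name host_script and the "start" argument mentioned afterwards
are assumptions:

```python
import os


def host_script(hostname, base="/etc/init.d/host-specific"):
    """Return the per-host startup script path, or None.

    A script counts only if a regular, executable file named after
    the host exists under base.
    """
    path = os.path.join(base, hostname)
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return path
    return None
```

At boot, something like host_script(socket.gethostname()) followed by
subprocess.call([path, "start"]) when a path comes back would run the
script on the named box and do nothing on every other client.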
> Same here. Devfs is cool ;)
> For one, it forces people to think before they get strange ideas about
> putting something foreign in /dev. Like abm syslogd.
Only after using devfs in this context have I come to appreciate how
nice it is. And I had it in place after 10 minutes of reading and two
kernel recompiles. Amazing.
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
On Tue, 2003-01-28 at 09:00, Denis Vlasenko wrote:
> It was not really *that* difficult for me. I used devfs and symlinks.
> /etc, /var, /tmp are different directories per client,
> /home, /usr are shared. The rest stays on root fs readonly.
> ssh to NFS server if you want to modify some files on root fs.
This will only work well if the server runs the same OS on the
same architecture and its own system is well enough equipped to
do software installations and bootstraps. Although my server runs
Linux on the same architecture as most of the clients, I sometimes
run into trouble working in the chrooted client environment.
--
Servus,
Daniel