2008-07-21 02:14:09

by Michel Lespinasse

Subject: Re: NFS hangs with 2.6.25/2.6.26 despite server being reachable

On Thu, Jul 17, 2008 at 11:04:05PM -0700, Michel Lespinasse wrote:
> On Wed, Jul 16, 2008 at 03:15:53PM -0400, J. Bruce Fields wrote:
> > On Tue, Jul 15, 2008 at 10:40:53PM -0700, Michel Lespinasse wrote:
> > > I'm getting frequent NFS hangs when running 2.6.25 or 2.6.26 on my
> > > NFS clients, while 2.6.24 seems to work fine.
> > > [...]
> > > Any ideas about what might be going wrong and/or what additional
> > > information I should try to collect about the hangs ?
> >
> > A sysrq-T trace showing where the clients were hung might help. (So,
> > "echo T >/proc/sysrq-trigger", then look at the logs.)
> Thanks for the reply. I'm now running with sysrq enabled.
> Have not captured the failure yet, but then again it's been only one night.
> I prefer to go with 2.6.25 instead of 2.6.26 because 2.6.25 generally
> recovers from the failure after a few minutes - so there is a higher chance
> that I'll actually get something useful logged.

It took me a while, as for some reason I could not get things to fail
this week (I probably don't yet know all the factors that trigger the
NFS hangs). Then today I got two NFS hangs in a row, running
kernel version on my K7-based client.

In both cases I captured information using alt-sysrq-t, the system
hung there for a few minutes, I double-checked that the machine was
pingable from the server, and I got the dumps out of kern.log after
the machine recovered. The logs are incomplete, given that syslog
could not run well with the rootfs hung. I'm not sure if a larger
dmesg buffer would help? Anyway, please get the logs from


In both cases I see a lot of nfs_wait_schedule, wait_on_bit_lock,
nfs_revalidate_inode, nfs_check_verifier. Not sure if that's expected,
but that's what I get, and the machine is pingable from the server side.
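
For anyone who wants to summarize such a dump rather than read it by eye,
here is a minimal sketch that tallies how often each function shows up in
the stack traces. It assumes each frame is logged roughly as
"[<address>] symbol+0xoff/0xlen" (optionally with a "?" before the
symbol); that format, the function name count_frames and the file name in
the usage note are assumptions for illustration and are not taken from the
logs referenced above.

```python
import re
from collections import Counter

# Assumed frame shape: "[<c01a2b3c>] symbol+0x1a/0x40", optionally "? symbol".
# Adjust the pattern if your kernel prints stack frames differently.
FRAME_RE = re.compile(
    r"\[<[0-9a-fA-F]+>\]\s+\??\s*([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
)


def count_frames(lines):
    """Return a Counter mapping symbol name -> number of frames seen."""
    counts = Counter()
    for line in lines:
        # A single log line may carry more than one frame, so collect all matches.
        counts.update(FRAME_RE.findall(line))
    return counts
```

Usage would be along the lines of opening the saved log (say, a copy of
kern.log) and printing count_frames(f).most_common(20); a large count for
the NFS wait functions would just confirm what is visible in the raw dump.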

Hope this helps. Let me know if you want me to try something else.

Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.