2007-08-22 18:54:28

by Morrison, Tom

Subject: NFS/RPC Hangs after updating time...

Hi,

I've got an unusual problem with a corner case that I
am investigating. I've tried 'googling' around to
find if someone has discussed this before, but
I have yet to find any discussion close to this -
so I bring it to your collective minds...

I am working with a 2.6.11++ kernel on an embedded
server platform that is NFS serving Linux rootfs
for other embedded NFS Client boards.

Everything works fine when the system time on the
server board is roughly synchronized with real time.

The server hangs after I update the time from a
nonsensical value (e.g., 2 months in the past). The most
significant part is that it hangs only if it has started
serving its NFS client boards before I update the time.


The most significant output (when turning on
RPC debugging) is from:

linux/net/sunrpc/cache.c (cache_check) - line 90:

>> Want update, refage=1800, age=4288285

It continually loops through this function, and the cache
never gets updated, even though some additional
sleuthing (i.e., additional debug printks) shows that it
thinks there is a cache update pending.
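
For illustration only, here is a minimal Python sketch of how a large
forward time jump could produce the values in that debug line. It is
not taken from the 2.6.11 source. It assumes that refage is the entry's
expiry time minus its last refresh time, that age is the current
wall-clock seconds minus the last refresh time, and that an update is
wanted once the entry has expired or age exceeds half of refage. The
names CacheEntry, wants_update and STALE_BOOT_TIME are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    last_refresh: int  # wall-clock seconds at the last refresh
    expiry_time: int   # wall-clock seconds at which the entry expires


def wants_update(entry: CacheEntry, now: int):
    """Return (want_update, refage, age) under the assumptions stated above."""
    refage = entry.expiry_time - entry.last_refresh
    age = now - entry.last_refresh
    expired = entry.expiry_time < now
    return (expired or age > refage // 2), refage, age


STALE_BOOT_TIME = 1_000_000  # arbitrary stand-in for the nonsensical clock value
JUMP = 4_288_285             # seconds; the age value from the debug output

# Entry created while the clock was still wrong, with a 1800 s lifetime.
entry = CacheEntry(last_refresh=STALE_BOOT_TIME,
                   expiry_time=STALE_BOOT_TIME + 1800)

# One minute later, before the clock is corrected: no update wanted.
before = wants_update(entry, STALE_BOOT_TIME + 60)

# After the clock is moved forward: the entry looks weeks old.
after = wants_update(entry, STALE_BOOT_TIME + JUMP)

want, refage, age = after
if want:
    print(f"Want update, refage={refage}, age={age}")
print(f"age in days: {age / 86400:.1f}")
```

An age of 4288285 seconds is about 49.6 days, which is consistent with
the clock having been moved forward by roughly two months after the
entry was created. The sketch only models why an update is requested on
every check; it does not explain why the pending update never completes.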

Thanks for any ideas you have on this subject.

Sincerely,

Tom Morrison
Principal S/W Engineer
Empirix, Inc (http://www.empirix.com)
[email protected]
(781) 266 - 3567



2007-08-22 20:26:49

by J. Bruce Fields

Subject: Re: NFS/RPC Hangs after updating time...

On Wed, Aug 22, 2007 at 02:37:22PM -0400, Morrison, Tom wrote:
> I am working with a 2.6.11++ kernel on an embedded
> server platform that is NFS serving Linux rootfs
> for other embedded NFS Client boards.
>
> Everything works fine when the system time on the
> server board is roughly synchronized with real time.
>
> The server hangs after I update the time from a
> nonsensical value (e.g., 2 months in the past). The most
> significant part is that it hangs only if it has started
> serving its NFS client boards before I update the time.
>
>
> The most significant output (when turning on
> RPC debugging) is from:
>
> linux/net/sunrpc/cache.c (cache_check) - line 90:
>
> >> Want update, refage=1800, age=4288285
>
> It continually loops through this function, and the cache
> never gets updated, even though some additional
> sleuthing (i.e., additional debug printks) shows that it
> thinks there is a cache update pending.

Can you reproduce the problem with the current kernel? (Say 2.6.22 or
later?)

--b.

2007-08-31 18:36:21

by Morrison, Tom

Subject: Follow up to: NFS/RPC Hangs after updating time...

This is a follow-up...

After a huge pain in the rear upgrading from
2.6.11++ to 2.6.23-rc3 (I'll give the powerpc
folks a 'piece' of my mind on that front), the
NFS hang problem that I was experiencing on the
older kernel is NOT occurring on the new version.

Now what do I do?

Are the net/sunrpc and net/nfsx pieces isolated enough
from the rest of the kernel that I could fork-lift
them back to 2.6.11, or is that really a lost cause?

Thanks for any/all feedback on this front!



2007-08-31 21:46:06

by J. Bruce Fields

Subject: Re: Follow up to: NFS/RPC Hangs after updating time...

On Fri, Aug 31, 2007 at 02:35:19PM -0400, Morrison, Tom wrote:
> This is a follow-up...
>
> After a huge pain in the rear upgrading from
> 2.6.11++ to 2.6.23-rc3 (I'll give the powerpc
> folks a 'piece' of my mind on that front), the
> NFS hang problem that I was experiencing on the
> older kernel is NOT occurring on the new version.
>
> Now what do I do?

Well, between the time jump and the rpc debugging output, you've got
some great clues there--given some time I'm sure it would be possible to
completely figure out what's going on.

Unfortunately the people with the most knowledge of the code probably
don't have the time to fix problems on old kernels, so unless somebody
else recognizes the problem immediately, I'm not sure what to suggest.
Obviously, a wholesale upgrade to a more recent kernel would be the one
sure bet....

> Are the net/sunrpc and net/nfsx pieces isolated enough
> from the rest of the kernel that I could fork-lift
> them back to 2.6.11, or is that really a lost cause?

I suspect it's a lost cause. A lot has happened in the last couple
years.

--b.
