2002-10-14 23:31:07

by Andrew Ryan

Subject: 2.4.19+RPC_ALL hangs running dbench 2.0

I've been running tests on the 2.4.19_NFS_ALL (the one from Oct 5) kernel
and seeing an easily reproducible hang on my machine (2x1.4 GHz PIII,
Compaq DL380G2, 4GB RAM), mounting a Netapp (F820 running 6.2R2) with the
mount options:
rw,tcp,nfsvers=3,rsize=32768,wsize=32768,intr,hard

The symptom is, I start a dbench run, and it starts up and runs for a
bit...
$ ~/dbench-2.0/dbench 16
clients started
16 23801 21.45 MB/sec

Then it hangs: the dbench process is still running, but the MB/sec
number keeps dropping rapidly, approaching 0.

At this point:
* Any commands in other shells that are currently running (e.g. 'top') are
hung.
* My other shells are not hung, but if I try to execute any commands, the
commands hang forever.
* I can kill the dbench process with Ctrl-C, but that just gives me a
shell that cannot execute any commands (they all hang, like the other
shells).
* The nmi_watchdog is never triggered, even though the system is
completely unresponsive from a user level.

When I Ctrl-C the hung dbench process, the kernel sometimes generates
an oops, but not always. If I have kdb on, I can get a backtrace, but
I was hoping there was an easier way to figure out what is causing this
bug. The one oops I get says something about 'kernel BUG at
highmem.c:159!'

Note I do *NOT* get this error if I run without the NFS_ALL. I also tested
this with just the RPC_ALL and I get the same error. So it definitely has
to be something in the RPC_ALL patchset. I'm confused though, because this
is the patchset which claims to have specific fixes for HIGHMEM.

All I really want is a fast, stable client for my 4GB, 2 CPU boxes. I'd
use the stock 2.4.19 but the RPC_ALL patchset leads me to believe that
there are HIGHMEM bugs in the stock 2.4.19 NFS client.

I'm willing to do some testing to chase this down, if it helps.


andrew



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2002-10-15 01:05:45

by Andrew Ryan

Subject: Re: 2.4.19+RPC_ALL hangs running dbench 2.0

Further testing shows that the hang still appears when the only patch
applied to vanilla 2.4.19 is linux-2.4.19-01-fix_kmap1.dif, if this helps
narrow down the problem.


andrew




2002-10-15 03:06:06

by Trond Myklebust

Subject: Re: 2.4.19+RPC_ALL hangs running dbench 2.0

>>>>> " " == Andrew Ryan <[email protected]> writes:

> When I ctl-C the hung dbench process, sometimes the kernel
> generates an oops, but other times not. If I have kdb on, I can
> get a backtrace, but I was hoping there was an easier way to
> figure out what is causing this bug. The one oops I get says
> something about 'kernel BUG at highmem.c:159!'

The following patch removes an unbalanced kunmap() that should really
have gone in -fix_kmap2.dif.

Marcelo: this needs to be applied to 2.4.20-pre10 (it's already in 2.5.x).

Cheers,
Trond

--- linux-2.4.19-smp/fs/nfs/dir.c.orig Sat Oct 5 03:55:12 2002
+++ linux-2.4.19-smp/fs/nfs/dir.c Tue Oct 15 04:59:27 2002
@@ -152,7 +152,6 @@
 	return 0;
  error:
 	SetPageError(page);
-	kunmap(page);
 	UnlockPage(page);
 	invalidate_inode_pages(inode);
 	desc->error = error;
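
As an illustration of what "unbalanced kunmap()" means, here is a minimal userspace sketch. It is not kernel code: struct model_page, model_kmap(), model_kunmap() and model_filler() are hypothetical names invented for this example, and it assumes the mapping has already been released by the time the error path runs, which the thread does not spell out. It only models the counting rule that every unmap needs a matching earlier map.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a highmem page; NOT the kernel's struct page. */
struct model_page {
    int map_count;   /* outstanding model_kmap() calls */
    int underflowed; /* set when an unmap arrives with no mapping held */
};

/* Model of taking a temporary mapping: one more reference. */
static void model_kmap(struct model_page *p)
{
    p->map_count++;
}

/* Model of dropping a mapping: flags an error if nothing is mapped. */
static void model_kunmap(struct model_page *p)
{
    if (p->map_count == 0) {
        p->underflowed = 1; /* a real kernel would treat this as a bug */
        return;
    }
    p->map_count--;
}

/*
 * Model of a page filler. The page is mapped only while it is filled
 * (an assumption of this sketch). With buggy != 0 the error path
 * unmaps a second time, which is the unbalanced call.
 */
static int model_filler(struct model_page *p, int fail, int buggy)
{
    model_kmap(p);
    /* ... fill the page here ... */
    model_kunmap(p);         /* balanced: pairs with the map above */

    if (!fail)
        return 0;

    if (buggy)
        model_kunmap(p);     /* unbalanced: no mapping is held any more */
    return -1;
}
```

Calling model_filler(&page, 1, 1) on a zero-initialised page sets page.underflowed, while model_filler(&page, 1, 0) leaves the count at zero with no underflow. Dropping the extra call on the error path, as the patch above does, corresponds to the second case.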



2002-10-15 05:01:53

by Andrew Ryan

Subject: Re: 2.4.19+RPC_ALL hangs running dbench 2.0

I applied this patch on top of Oct-05 NFS_ALL, rebuilt my kernel, and still
get the same problem.

Anything else I can try?





2002-10-15 12:40:58

by Trond Myklebust

Subject: Re: 2.4.19+RPC_ALL hangs running dbench 2.0

>>>>> " " == Andrew Ryan <[email protected]> writes:

> I applied this patch on top of Oct-05 NFS_ALL, rebuilt my
> kernel, and still get the same problem.

Are you still seeing the Oops, or just the hangs?

Cheers,
Trond



2002-10-15 13:25:53

by Andrew Ryan

Subject: Re: 2.4.19+RPC_ALL hangs running dbench 2.0

Sorry, no Oops, just hangs.



andrew

Trond Myklebust wrote:
> >>>>> " " == Andrew Ryan <[email protected]> writes:
>
> > I applied this patch on top of Oct-05 NFS_ALL, rebuilt my
> > kernel, and still get the same problem.
>
> You are still seeing the Oops, or just the hangs?
>
> Cheers,
> Trond


