2003-11-12 18:06:06

by Jason Holmes

Subject: 2.4 vs. 2.6 nfs client performance

Hi,

I'm running some NFS performance tests to determine the best way to
configure a few fileservers for some Linux clusters I run. Right now
I'm putting together a suite of test programs representative of the
typical applications run on our clusters (scientific applications
such as Ansys or Abaqus, custom MPI code, etc.) for use as a benchmark
suite that I can distribute across 16 or so machines to create a decent
load on the fileservers. I've just started with a program called
Gaussian03 (molecular modelling code that does a lot of I/O) and I'm
already seeing some odd performance differences between the 2.4.22
client and the 2.6.0-test9-mm1 client (both against a 2.6.0-test9-mm2
server):

async mounts
------------
2.4.22: 110.87user 43.69system 4:04.58elapsed 63%CPU
2.6.0-test9-mm1: 111.88user 315.57system 9:51.87elapsed 72%CPU

sync mounts
-----------
2.4.22: 109.99user 45.49system 32:04.44elapsed 8%CPU
2.6.0-test9-mm1: 112.33user 197.76system 1:08:13elapsed 7%CPU

Note that the 1:08:13 in the sync 2.6.0-test9-mm1 result is 1 *hour*, 8
minutes, not 1 *minute*. In both cases the 2.6 client took more than
twice as long. A local run not using NFS finishes in 2:13.49.
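The elapsed times above are in GNU time's [hours:]minutes:seconds format, which is easy to misread (hence the note about 1:08:13). A small Python sketch that converts them to seconds and computes the slowdown; the helper name `elapsed_to_seconds` is hypothetical, and the figures are copied from the runs above.

```python
def elapsed_to_seconds(field: str) -> float:
    """Convert a GNU time elapsed field ([hours:]minutes:seconds) to seconds."""
    total = 0.0
    for part in field.split(":"):
        # Each colon shifts the accumulated value up by one base-60 place.
        total = total * 60 + float(part)
    return total


# Elapsed times from the runs above: (2.4.22 client, 2.6.0-test9-mm1 client)
RUNS = {
    "async": ("4:04.58", "9:51.87"),
    "sync": ("32:04.44", "1:08:13"),
}
LOCAL = "2:13.49"  # local run, no NFS

ratios = {}
for mount, (t24, t26) in RUNS.items():
    s24, s26 = elapsed_to_seconds(t24), elapsed_to_seconds(t26)
    ratios[mount] = s26 / s24
    print(f"{mount}: 2.4 = {s24:.1f}s, 2.6 = {s26:.1f}s, "
          f"2.6/2.4 = {ratios[mount]:.2f}, "
          f"2.4/local = {s24 / elapsed_to_seconds(LOCAL):.2f}")
```

With these figures the 2.6 client takes about 2.42 times as long as 2.4.22 on the async mounts and about 2.13 times as long on the sync mounts, while the 2.4.22 async run is itself about 1.83 times the local run.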

The nfsstat output for two async runs looks like:

-- 2.4.22 --
Client nfs v3:
null getattr setattr lookup access readlink
0 0% 880 0% 0 0% 160 0% 4745 0% 0 0%
read write create mkdir symlink mknod
780 0% 178588 32% 24 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
24 0% 0 0% 0 0% 0 0% 18 0% 0 0%
fsstat fsinfo pathconf commit
3 0% 3 0% 0 0% 369185 66%

Client rpc stats:
calls retrans authrefrsh
554410 2099 0


-- 2.6.0-test9-mm1 --

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 740 0% 0 0% 126 0% 249 0% 0 0%
read write create mkdir symlink mknod
855 0% 178574 14% 24 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
24 0% 0 0% 0 0% 0 0% 18 0% 0 0%
fsstat fsinfo pathconf commit
0 0% 3 0% 0 0% 1023777 85%

Client rpc stats:
calls retrans authrefrsh
1204390 5 0

The 2.6 client makes over twice as many RPC calls and almost 3 times
as many commits, whereas the 2.4 client makes a lot more access calls.
Mounts were done with rsize=32768,wsize=32768. The exported filesystem
is ext3. The two machines are connected via gigabit ethernet and the
traffic between them never goes above 20-30 MB/s.
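To compare the two runs numerically, the counters can be pulled out of the pasted nfsstat text. nfsstat prints alternating rows of operation names and values (a count plus a percentage in the per-operation tables, bare counts in the RPC table). The Python sketch below assumes exactly that layout as pasted above; `parse_nfsstat` is a hypothetical helper, and only the rows needed here are copied in.

```python
def parse_nfsstat(text: str) -> dict[str, int]:
    """Map each operation name in pasted nfsstat output to its count."""
    # Drop blank lines and section titles such as "Client nfs v3:".
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.rstrip().endswith(":")]
    counts: dict[str, int] = {}
    # Rows alternate: a row of names, then a row of values.
    for names, values in zip(rows[0::2], rows[1::2]):
        # Per-op tables print "count pct%"; keep only the counts.
        nums = [int(v) for v in values if not v.endswith("%")]
        if len(nums) != len(names):
            raise ValueError(f"row mismatch: {names} vs {values}")
        counts.update(zip(names, nums))
    return counts


# Excerpts of the two async runs above.
STATS_24 = """
Client nfs v3:
read write create mkdir symlink mknod
780 0% 178588 32% 24 0% 0 0% 0 0% 0 0%
fsstat fsinfo pathconf commit
3 0% 3 0% 0 0% 369185 66%
Client rpc stats:
calls retrans authrefrsh
554410 2099 0
"""
STATS_26 = """
Client nfs v3:
read write create mkdir symlink mknod
855 0% 178574 14% 24 0% 0 0% 0 0% 0 0%
fsstat fsinfo pathconf commit
0 0% 3 0% 0 0% 1023777 85%
Client rpc stats:
calls retrans authrefrsh
1204390 5 0
"""

s24, s26 = parse_nfsstat(STATS_24), parse_nfsstat(STATS_26)
for label, s in (("2.4.22", s24), ("2.6.0-test9-mm1", s26)):
    print(f"{label}: {s['calls']} calls, {s['write']} writes, "
          f"{s['commit']} commits, "
          f"{s['commit'] / s['write']:.2f} commits per write")
print(f"2.6/2.4 calls: {s26['calls'] / s24['calls']:.2f}, "
      f"commits: {s26['commit'] / s24['commit']:.2f}")
```

On these excerpts the 2.6 client issues about 2.17 times as many RPC calls and 2.77 times as many commits, and its commits-per-write figure rises from about 2.07 to about 5.73.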

Can someone clue me in as to why this may be happening and if it's a
"bug" or not?

Thanks,

--
Jason Holmes


-------------------------------------------------------
This SF.Net email sponsored by: ApacheCon 2003,
16-19 November in Las Vegas. Learn firsthand the latest
developments in Apache, PHP, Perl, XML, Java, MySQL,
WebDAV, and more! http://www.apachecon.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-11-12 20:20:41

by Trond Myklebust

Subject: Re: 2.4 vs. 2.6 nfs client performance

>>>>> " " == Jason Holmes <[email protected]> writes:

> Can someone clue me in as to why this may be happening and if
> it's a "bug" or not?

Large numbers of 'commit' operations are a typical sign of the memory
management going wild. Have you compared this to the standard
2.6.0-test9 kernel?

What sort of load did you otherwise have on that machine?

Cheers,
Trond



2003-11-12 20:51:43

by Jason Holmes

Subject: Re: 2.4 vs. 2.6 nfs client performance

Trond Myklebust wrote:
>
> >>>>> " " == Jason Holmes <[email protected]> writes:
>
> > Can someone clue me in as to why this may be happening and if
> > it's a "bug" or not?
>
> Large numbers of 'commit' operations are a typical sign of the memory
> management going wild. Have you compared this to the standard
> 2.6.0-test9 kernel?
>
> What sort of load did you otherwise have on that machine?

The machines (both client and server) are idle other than the testing
I'm doing and I'm doing only one test at a time at this point.

I haven't tried plain 2.6.0-test9 yet, but here are 2.6.0-test6 and
2.6.0-test9-bk17:

-- 2.6.0-test6 --

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 749 0% 0 0% 130 0% 251 0% 0 0%
read write create mkdir symlink mknod
833 0% 178577 14% 24 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
24 0% 0 0% 0 0% 0 0% 18 0% 1 0%
fsstat fsinfo pathconf commit
0 0% 3 0% 0 0% 1024801 85%

Client rpc stats:
calls retrans authrefrsh
1205411 7 0

-- 2.6.0-test9-bk17 --

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 744 0% 0 0% 129 0% 252 0% 0 0%
read write create mkdir symlink mknod
873 0% 178574 14% 24 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
24 0% 0 0% 0 0% 0 0% 18 0% 0 0%
fsstat fsinfo pathconf commit
0 0% 3 0% 0 0% 1023451 84%

Client rpc stats:
calls retrans authrefrsh
1204092 3 0

Thanks,

--
Jason Holmes



2003-11-12 21:18:41

by Trond Myklebust

Subject: Re: 2.4 vs. 2.6 nfs client performance

>>>>> " " == Jason Holmes <[email protected]> writes:

> Client nfs v3:
> null getattr setattr lookup access readlink
> 0 0% 744 0% 0 0% 129 0% 252 0% 0 0%
> read write create mkdir symlink mknod
> 873 0% 178574 14% 24 0% 0 0% 0 0% 0 0%
> remove rmdir rename link readdir readdirplus
> 24 0% 0 0% 0 0% 0 0% 18 0% 0 0%
> fsstat fsinfo pathconf commit
> 0 0% 3 0% 0 0% 1023451 84%

> Client rpc stats:
> calls retrans authrefrsh
> 1204092 3 0

Something is seriously screwed up here: it should be impossible to get
more commit requests than there are write requests.

Basically, commit is only sent if at least one unstable write has been
sent, and is waiting on the commit list. The commit list itself is
emptied before the RPC call gets sent. If the commit fails, it
resends all pending writes.
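The bookkeeping described here can be illustrated with a toy Python model. It is not the kernel's fs/nfs/write.c, and the class and method names are hypothetical. Unstable writes go onto a commit list, a COMMIT is sent only when that list is non-empty, the list is emptied before the COMMIT goes out, and a failed COMMIT puts every pending write back on the wire. Under those rules each COMMIT needs at least one WRITE sent since the previous COMMIT, so the commit count cannot exceed the write count, however often a flush is requested.

```python
class ToyNfsClient:
    """Toy model of unstable-write/COMMIT bookkeeping (illustrative only)."""

    def __init__(self, failing_commits: int = 0) -> None:
        self.commit_list: list[int] = []  # pages written UNSTABLE, awaiting COMMIT
        self.writes_sent = 0
        self.commits_sent = 0
        self.failing_commits = failing_commits  # how many upcoming COMMITs fail

    def write_unstable(self, page: int) -> None:
        self.writes_sent += 1
        self.commit_list.append(page)

    def commit(self) -> None:
        if not self.commit_list:
            return  # no unstable writes outstanding: no COMMIT is sent
        pending = self.commit_list
        self.commit_list = []  # list is emptied before the RPC goes out
        self.commits_sent += 1
        if self.failing_commits > 0:
            self.failing_commits -= 1
            for page in pending:  # failed COMMIT: resend every pending write
                self.write_unstable(page)


client = ToyNfsClient(failing_commits=3)
for page in range(100):
    client.write_unstable(page)
    if page % 10 == 9:
        client.commit()
# Extra flush requests (e.g. under memory pressure) are no-ops on an empty list.
for _ in range(1000):
    client.commit()
print(client.writes_sent, client.commits_sent)
```

In this run three failed COMMITs force 60 resent writes, giving 160 writes against 10 commits; the 1000 extra flush calls add no commits at all.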

IOW: it looks like you are seeing some form of list corruption. I
suspect that the problem here might be something scribbling over the
nfs_inode. Are you running with slab debugging and stack overflow
checking enabled?

Cheers,
Trond



2003-11-12 21:34:12

by Jason Holmes

Subject: Re: 2.4 vs. 2.6 nfs client performance

Trond Myklebust wrote:
>
> >>>>> " " == Jason Holmes <[email protected]> writes:
>
> > Client nfs v3:
> > null getattr setattr lookup access readlink
> > 0 0% 744 0% 0 0% 129 0% 252 0% 0 0%
> > read write create mkdir symlink mknod
> > 873 0% 178574 14% 24 0% 0 0% 0 0% 0 0%
> > remove rmdir rename link readdir readdirplus
> > 24 0% 0 0% 0 0% 0 0% 18 0% 0 0%
> > fsstat fsinfo pathconf commit
> > 0 0% 3 0% 0 0% 1023451 84%
>
> > Client rpc stats:
> > calls retrans authrefrsh
> > 1204092 3 0
>
> Something is seriously screwed up here: it should be impossible to get
> more commit requests than there are write requests.
>
> Basically, commit is only sent if at least one unstable write has been
> sent, and is waiting on the commit list. The commit list itself is
> emptied before the RPC call gets sent. If the commit fails, it
> resends all pending writes.
>
> IOW: it looks like you are seeing some form of list corruption. I
> suspect that the problem here might be something scribbling over the
> nfs_inode. Are you running with slab debugging and stack overflow
> checking enabled?

I wasn't, but I'm recompiling test9-bk17 with all of the debugging
options turned on right now. Is there anything in particular you want
me to do?

Thanks,

--
Jason Holmes



2003-11-12 22:08:01

by Eric Whiting

Subject: Re: 2.4 vs. 2.6 nfs client performance

Jason Holmes wrote:
>
<snip>
> I'm
> already seeing some odd performance differences between the 2.4.22
> client and the 2.6.0-test9-mm1 client (both against a 2.6.0-test9-mm2
> server):

Similar observations here. Our users have reported NFS slowdowns of 10x on
2.6.0-test6-mm4 nfs clients compared to the same box and setup running 2.4.22.
Something is goofy in 2.6.0-test6-mm4 that has a bad effect on NFS client
performance.

eric



2003-11-12 23:22:18

by Trond Myklebust

Subject: Re: 2.4 vs. 2.6 nfs client performance

>>>>> " " == Jason Holmes <[email protected]> writes:

> I wasn't, but I'm recompiling test9-bk17 with all of the
> debugging options turned on right now. Is there anything in
> particular you want me to do?

Come to think of it, could you try running with and without the patch
at

http://www.fys.uio.no/~trondmy/src/Linux-2.6.x/2.6.0-test9/linux-2.6.0-01-fix_deadlock.dif

That fixes a race that is likely to show up when memory is
low. Normally it should cause a deadlock rather than corruption, but
it is conceivable that there are other side-effects...

Please make sure that you run with slab debugging enabled...

Cheers,
Trond



2003-11-13 01:20:34

by Jason Holmes

Subject: Re: 2.4 vs. 2.6 nfs client performance

Trond Myklebust wrote:
>
> >>>>> " " == Jason Holmes <[email protected]> writes:
>
> > I wasn't, but I'm recompiling test9-bk17 with all of the
> > debugging options turned on right now. Is there anything in
> > particular you want me to do?
>
> Come to think of it, could you try running with and without the patch
> at
>
> http://www.fys.uio.no/~trondmy/src/Linux-2.6.x/2.6.0-test9/linux-2.6.0-01-fix_deadlock.dif
>
> That fixes a race that is likely to show up when memory is
> low. Normally it should cause a deadlock rather than corruption, but
> it is conceivable that there are other side-effects...
>
> Please make sure that you run with slab debugging enabled...

With all debugging enabled, test9-bk17:

patched: 112.21user 259.86system 10:53.48elapsed 56%CPU
unpatched: 112.17user 252.35system 10:37.43elapsed 57%CPU

-- patched --

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 752 0% 0 0% 134 0% 260 0% 0 0%
read write create mkdir symlink mknod
873 0% 178571 14% 24 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
24 0% 0 0% 0 0% 0 0% 18 0% 0 0%
fsstat fsinfo pathconf commit
0 0% 3 0% 0 0% 1044537 85%

Client rpc stats:
calls retrans authrefrsh
1225196 8 0

-- unpatched --

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 753 0% 0 0% 134 0% 260 0% 0 0%
read write create mkdir symlink mknod
873 0% 178575 14% 24 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
24 0% 0 0% 0 0% 0 0% 18 0% 0 0%
fsstat fsinfo pathconf commit
0 0% 3 0% 0 0% 1050417 85%

Client rpc stats:
calls retrans authrefrsh
1231081 5 0

I didn't see anything to indicate that any of the debug code had
triggered (I'm assuming that if it had, the kernel would have printed
an oops?).

FWIW, I ran the test with nfsvers=2 (with the export still being async)
and it finished in 112.53user 45.02system 2:43.50elapsed 96%CPU.

Thanks,

--
Jason Holmes



2003-11-13 02:07:54

by Trond Myklebust

Subject: Re: 2.4 vs. 2.6 nfs client performance

>>>>> " " == Jason Holmes <[email protected]> writes:

> FWIW, I ran the test with nfsvers=2 (with the export still
> being async) and it finished in 112.53user 45.02system
> 2:43.50elapsed 96%CPU.

For comparison, on my own machine, I get stats of the form:

Client rpc stats:
calls retrans authrefrsh
248291 6225 0
Client nfs v2:
null getattr setattr root lookup readlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 77 0% 0 0% 60 0% 94 0% 0 0%
read write create mkdir symlink mknod
0 0% 240378 96% 8 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
33 0% 0 0% 0 0% 0 0% 0 0% 12 0%
fsstat fsinfo pathconf commit
0 0% 1 0% 0 0% 7628 3%


Any hints on how I can reproduce your results?

Cheers,
Trond



2003-11-13 14:16:44

by Jason Holmes

Subject: Re: 2.4 vs. 2.6 nfs client performance

Trond Myklebust wrote:
>
> >>>>> " " == Jason Holmes <[email protected]> writes:
>
> > FWIW, I ran the test with nfsvers=2 (with the export still
> > being async) and it finished in 112.53user 45.02system
> > 2:43.50elapsed 96%CPU.
>
> For comparison, on my own machine, I get stats of the form:
>
> Client rpc stats:
> calls retrans authrefrsh
> 248291 6225 0
> Client nfs v2:
> null getattr setattr root lookup readlink
> 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
> read wrcache write create remove rename
> 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
> link symlink mkdir rmdir readdir fsstat
> 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
>
> Client nfs v3:
> null getattr setattr lookup access readlink
> 0 0% 77 0% 0 0% 60 0% 94 0% 0 0%
> read write create mkdir symlink mknod
> 0 0% 240378 96% 8 0% 0 0% 0 0% 0 0%
> remove rmdir rename link readdir readdirplus
> 33 0% 0 0% 0 0% 0 0% 0 0% 12 0%
> fsstat fsinfo pathconf commit
> 0 0% 1 0% 0 0% 7628 3%
>
> Any hints on how I can reproduce your results?

Unfortunately, Gaussian's licensing prohibits me from passing it
along so you could reproduce this. I'll try to reproduce the problem
with another program this morning and get back to you.

Thanks,

--
Jason Holmes

