2003-06-27 16:58:11

by Larry Sendlosky

[permalink] [raw]
Subject: ~2.4.20-pre3 -> 2.4.21 : nfs client read performance "broken"

We recently started using 2.4.21, a move up from 2.4.18. Somebody
in the group brought to my attention a nfs client performance
problem using 2.4.21 vs 2.4.18. Single client read throughput can be
cut in half!

I've narrowed down the problem to the "congestion
avoidance" patches that went into RPC, specifically changesets
1.555.49.11, .12, and .13. A kernel built with 1.555.48.10
works just fine, and everything after exhibits the performance
problem I will describe below. Building a kernel from the RH9
CD sources shows the same problem.

I've searched the mailing lists and google'd and cannot find any
other mention of this problem, so I'm hoping that I've missed something
obvious because nobody else has complained.

Client: Dell 600 connected to 100Mb switch. 512MB memory.
Server: Tyan 2510 running 2.4.18 connected to same 100Mb switch. 1GB memory.

The 100MB switch has 5 ports with 2 other basically inactive systems
in my office.

Performance summary:

Client running "pre-congestion control" RPC (circa 2.4.20-pre3)
can read 50MB of file data in about 5 seconds realtime with
each 256K buffer read taking less than 40ms with 0 RPC client retransmits.

Client running 2.4.21 can read 50MB of file data in about 10 seconds
realtime with 2/3 of the 256K buffer reads taking greater than 40ms with about
300 RPC client retransmits.

The only conclusion I can draw from my data is that the "congestion avoidance"
code is fundamentally broken because it can't handle the degenerate
case where there is *no* congestion.

Extra info:

In the RPC code, I tried treating UDP and TCP the same wrt the
"nocongestion control" flag, .i.e. in xprt_setup, in xprt.c

xprt->cwnd = RPC_MAXCWND;
xprt->nocong = 1;

for both protocols. My intend was to "neuter" the congestion avoidance
code. Well, the client reads never finished.

I also did some "spot" instrumentation and noticed that the new
code "ratchets" down the RPC timeout values as part of the algorithm.
In my test case, the timer gets down to 40ms and timeouts start to
occur and it continues to get some timeouts using that value thereafter.
This causes the retransmits. So, in another attempt to "neuter" the code, I
changed rpc_calc_rto in sunrpc/timer.c to never return a timer value
less than 100ms. I know all my reads happen much faster. However, with
that change, I see just as many timeouts of 100ms timers! And, many
I/O completion times go above 100ms. This is one of the reasons why I
think the code is broken.


Performance data:

Below shows the actual data I was referring to above. The 'workq' program
is an in-house tool that we use to "beat on" disks and filesystems.
'workq -r -t1 -f10 -S5M -s256k -dv' means use 1 thread to read 5MB
from 10 files using a 256k buffer. The files already exist.

Thanks for any help,

larry


[root@johny2 root]# uname -a
Linux pooperscooper 2.4.20-pre3 #2 SMP Fri Jun 27 09:35:01 EDT 2003 i686 n

[root@pooperscooper root]# !mount
mount -t nfs -o rsize=8192,wsize=8192 lws02:/var/data /var/data

[root@pooperscooper root]# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
/proc /proc proc rw 0 0
/dev/md1 /boot ext3 rw 0 0
/dev/md7 /var ext3 rw 0 0
none /dev/pts devpts rw 0 0
none /dev/shm tmpfs rw 0 0
/dev/md10 /.metadata ext3 rw,noatime 0 0
lws02:/var/data /var/data nfs rw,v3,rsize=8192,wsize=8192,hard,udp,lock,addr=lw0

[root@pooperscooper root]# nfsstat -c
Client rpc stats:
calls retrans authrefrsh
3 0 0
Client nfs v2:
null getattr setattr root lookup readlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 1 33% 0 0% 0 0% 0 0% 0 0%
read write create mkdir symlink mknod
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
fsstat fsinfo pathconf commit
1 33% 1 33% 0 0% 0 0%

[root@pooperscooper root]# cd /var
[root@pooperscooper var]# ls -l data/fifty.dat
-rw-r--r-- 1 root root 52428800 Jun 24 16:25 data/fifty.dat

[root@pooperscooper var]# time cat data/fifty.dat > /dev/null

real 0m4.820s
user 0m0.010s
sys 0m0.120s


[root@pooperscooper var]# nfsstat -c
Client rpc stats:
calls retrans authrefrsh
12736 0 0
Client nfs v2:
null getattr setattr root lookup readlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 6 0% 0 0% 1 0% 9 0% 0 0%
read write create mkdir symlink mknod
6400 99% 0 0% 0 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 1 0% 0 0%
fsstat fsinfo pathconf commit
1 0% 1 0% 0 0% 0 0%

[root@pooperscooper var]# !umount
umount /var/data

[root@pooperscooper var]# !mount
mount -t nfs -o rsize=8192,wsize=8192 lws02:/var/data /var/data

[root@pooperscooper var]# nfsstat -c
Client rpc stats:
calls retrans authrefrsh
12739 0 0
Client nfs v2:
null getattr setattr root lookup readlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 7 0% 0 0% 1 0% 9 0% 0 0%
read write create mkdir symlink mknod
6400 99% 0 0% 0 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 1 0% 0 0%
fsstat fsinfo pathconf commit
2 0% 2 0% 0 0% 0 0%

[root@pooperscooper var]# ls -l data
total 102576
-rwxr-xr-x 1 root root 5242880 Jun 23 11:15 disk0.dat
-rwxr-xr-x 1 root root 5242880 Jun 23 11:15 disk1.dat
-rwxr-xr-x 1 root root 5242880 Jun 23 11:15 disk2.dat
-rwxr-xr-x 1 root root 5242880 Jun 23 11:15 disk3.dat
-rwxr-xr-x 1 root root 5242880 Jun 23 11:15 disk4.dat
-rwxr-xr-x 1 root root 5242880 Jun 23 11:15 disk5.dat
-rwxr-xr-x 1 root root 5242880 Jun 23 11:15 disk6.dat
-rwxr-xr-x 1 root root 5242880 Jun 23 11:15 disk7.dat
-rwxr-xr-x 1 root root 5242880 Jun 23 11:15 disk8.dat
-rwxr-xr-x 1 root root 5242880 Jun 23 11:15 disk9.dat
-rw-r--r-- 1 root root 52428800 Jun 24 16:25 fifty.dat

[root@pooperscooper var]# /usr/local/bin/workq -r -t1 -f10 -S5M -s256k -dv

Copyright (c) 2000-2003, Storigen Systems, Inc.
All rights reserved.

OP THDS IOSZ FILS FSIZ BOFF FOFF BWDTH CPUT CLOCKT %CPUMbs CSWIT FAULT
---- ---- ---- ---- ---- ---- ---- ----- ------- ------ ------- ----- -----
READ 1 256 10 5 0 0 010.3 000.130 0004.9 0.01625 0 7
10ms = 0
20ms = 4
30ms = 192
40ms = 4
50ms = 0
60ms = 0
70ms = 0
80ms = 0
90ms = 0
100ms = 0
200ms = 0
300ms = 0
400ms = 0
500ms = 0
600ms = 0
700ms = 0
800ms = 0
900ms = 0
1000ms = 0
>1s = 0
all_ios = 200
min = 0.012
max = 0.036

[root@pooperscooper var]# nfsstat -c
Client rpc stats:
calls retrans authrefrsh
25395 0 0
Client nfs v2:
null getattr setattr root lookup readlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 18 0% 0 0% 12 0% 42 0% 0 0%
read write create mkdir symlink mknod
12800 99% 0 0% 0 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 2 0% 0 0%
fsstat fsinfo pathconf commit
2 0% 2 0% 0 0% 0 0%

[root@pooperscooper var]#

*******************************************************************************
Bad performance
******************************************************************************

[root@pooperscooper root]# uname -a

Linux pooperscooper 2.4.21 #1 SMP Tue Jun 24 08:40:50 EDT 2003 in

[root@pooperscooper root]# cd /var

[root@pooperscooper var]# !mount
mount -t nfs -o rsize=8192,wsize=8192 lws02:/var/data /var/data

[root@pooperscooper var]# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
/proc /proc proc rw 0 0
/dev/md1 /boot ext3 rw 0 0
/dev/md7 /var ext3 rw 0 0
none /dev/pts devpts rw 0 0
none /dev/shm tmpfs rw 0 0
/dev/md10 /.metadata ext3 rw,noatime 0 0
lws02:/var/data /var/data nfs rw,v3,rsize=8192,wsize=8192,hard,udp,lock,addr=lw0

[root@pooperscooper var]# nfsstat -c
Client rpc stats:
calls retrans authrefrsh
3 0 0
Client nfs v2:
null getattr setattr root lookup readlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 1 33% 0 0% 0 0% 0 0% 0 0%
read write create mkdir symlink mknod
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
fsstat fsinfo pathconf commit
1 33% 1 33% 0 0% 0 0%

[root@pooperscooper var]# time cat data/fifty.dat > /dev/null

real 0m10.183s
user 0m0.000s
sys 0m0.030s

[root@pooperscooper var]# nfsstat -c
Client rpc stats:
calls retrans authrefrsh
6412 304 0
Client nfs v2:
null getattr setattr root lookup readlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 4 0% 0 0% 1 0% 4 0% 0 0%
read write create mkdir symlink mknod
6400 99% 0 0% 0 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 1 0% 0 0%
fsstat fsinfo pathconf commit
1 0% 1 0% 0 0% 0 0%

[root@pooperscooper var]# !umount
umount /var/data

[root@pooperscooper var]# !mount
mount -t nfs -o rsize=8192,wsize=8192 lws02:/var/data /var/data

[root@pooperscooper var]# nfsstat -c
Client rpc stats:
calls retrans authrefrsh
6415 304 0
Client nfs v2:
null getattr setattr root lookup readlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 5 0% 0 0% 1 0% 4 0% 0 0%
read write create mkdir symlink mknod
6400 99% 0 0% 0 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 1 0% 0 0%
fsstat fsinfo pathconf commit
2 0% 2 0% 0 0% 0 0%

[root@pooperscooper var]# /usr/local/bin/workq -r -t1 -f10 -S5M -s256k -dv

Copyright (c) 2000-2003, Storigen Systems, Inc.
All rights reserved.

OP THDS IOSZ FILS FSIZ BOFF FOFF BWDTH CPUT CLOCKT %CPUMbs CSWIT FAULT
---- ---- ---- ---- ---- ---- ---- ----- ------- ------ ------- ----- -----
READ 1 256 10 5 0 0 004.6 000.100 0011.0 0.01250 0 7
10ms = 0
20ms = 61
30ms = 4
40ms = 0
50ms = 29
60ms = 24
70ms = 0
80ms = 20
90ms = 56
100ms = 4
200ms = 1
300ms = 0
400ms = 0
500ms = 1
600ms = 0
700ms = 0
800ms = 0
900ms = 0
1000ms = 0
>1s = 0
all_ios = 200
min = 0.012
max = 0.46

[root@pooperscooper var]# nfsstat -c
Client rpc stats:
calls retrans authrefrsh
12847 644 0
Client nfs v2:
null getattr setattr root lookup readlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 5 0% 0 0% 11 0% 25 0% 0 0%
read write create mkdir symlink mknod
12800 99% 0 0% 0 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 2 0% 0 0%
fsstat fsinfo pathconf commit
2 0% 2 0% 0 0% 0 0%