2007-12-10 22:53:33

by Fredrik Lindgren

Subject: Client performance questions

Hello

We have a mail-application running on a set of Linux machines using NFS
for the storage. Recently iowait on the machines has started to become
a problem, and it seems that they can't quite keep up. Iowait figures of
50% or above are not uncommon during peak hours.

What I'd like to know is whether there is something we could do to make
things run more smoothly, or whether we've hit some performance cap and
adding more machines is the best answer.

From what we can tell, the NFS server doesn't seem to be the bottleneck.
Its performance metrics look fine, and we've run tests from other clients
during both peak and off-peak hours, seeing almost the same results regardless.
The server is a BlueArc cluster.

The clients are quad 2.2 GHz Opteron machines running Linux kernel 2.6.18.3,
except for one which was moved to 2.6.23.9 today.

Mount options on the clients are as follows:
bg,intr,timeo=600,retrans=2,vers=3,proto=tcp,rsize=32768,wsize=32768
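
(For reference, the equivalent mount command would look something like the
following; the server name and paths are placeholders, not our real ones:)

mount -t nfs -o bg,intr,timeo=600,retrans=2,vers=3,proto=tcp,rsize=32768,wsize=32768 \
    filer:/export/mail /mnt/mail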

MTU is 9000 bytes and they're all in the same Gigabit Ethernet switch along
with the NFS server.

Each client seems to be doing somewhere around 3500 NFS ops/s during peak hours.
The average read/write size seems to be around 16 KB, although reads and writes
make up just ~30% of the operations.
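
(For reference, one way to estimate the ops/s rate from the cumulative nfsstat
counters below is to take two snapshots and divide the per-op deltas by the
interval; the temporary file names here are arbitrary:)

nfsstat -c > /tmp/nfs.1 ; sleep 60 ; nfsstat -c > /tmp/nfs.2
# compare the two snapshots; each delta divided by 60 gives that op's rate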

This is from the 2.6.23.9 client:
Client nfs v3:
null          getattr       setattr       lookup        access        readlink
0          0% 11020402  20% 2823881    5% 7708643   14% 13259044  24% 20         0%
read          write         create        mkdir         symlink       mknod
8693411   16% 6750099   12% 3107       0% 120        0% 0          0% 0          0%
remove        rmdir         rename        link          readdir       readdirplus
1729       0% 0          0% 1558       0% 0          0% 7          0% 2738003    5%
fsstat        fsinfo        pathconf      commit
74550      0% 40         0% 0          0% 0          0%

This is from a 2.6.18.3 one:
Client nfs v3:
null           getattr        setattr        lookup         access         readlink
0           0% 2147483647 23% 495517229   5% 1234824013 13% 2147483647 23% 22972       0%
read           write          create         mkdir          symlink        mknod
1505525496 16% 1095925729 12% 492815      0% 14863       0% 0           0% 0           0%
remove         rmdir          rename         link           readdir        readdirplus
206499      0% 67          0% 273202      0% 0           0% 324         0% 447735359   4%
fsstat         fsinfo         pathconf       commit
31254030    0% 18          0% 0           0% 0           0%

10:37:03 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
10:37:08 PM all 15.72 0.00 9.68 57.49 0.15 2.15 14.82 7671.80
10:37:08 PM 0 16.40 0.00 8.20 61.60 0.00 1.80 12.00 1736.40
10:37:08 PM 1 13.80 0.00 9.60 51.40 0.20 2.00 23.00 1503.00
10:37:08 PM 2 17.40 0.00 10.20 63.40 0.20 2.60 6.20 2424.00
10:37:08 PM 3 15.20 0.00 10.60 53.80 0.20 2.40 18.20 2008.00

Is this the level of performance that could be expected from these machines?
Any suggestions on what to change to squeeze some more performance from them?

Regards,
Fredrik Lindgren







2007-12-10 23:10:34

by Trond Myklebust

Subject: Re: Client performance questions


On Mon, 2007-12-10 at 22:52 +0100, Fredrik Lindgren wrote:
> Hello
>
> We have a mail-application running on a set of Linux machines using NFS
> for the storage. Recently iowait on the machines has started to become
> a problem, and it seems that they can't quite keep up. Iowait figures of
> 50% or above are not uncommon during peak hours.

Have you tried increasing the value
in /proc/sys/fs/nfs/nfs_congestion_kb ?
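
For example (the value below is purely illustrative; the default is scaled to
the amount of memory in the machine, so check what you currently have first):

cat /proc/sys/fs/nfs/nfs_congestion_kb            # current limit, in kilobytes
echo 524288 > /proc/sys/fs/nfs/nfs_congestion_kb  # raise it; 524288 is just an example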

Cheers
Trond


2007-12-10 23:26:17

by Jeff Layton

Subject: Re: Client performance questions

On Mon, 10 Dec 2007 22:52:51 +0100
"Fredrik Lindgren" <[email protected]> wrote:

> Hello
>
> We have a mail-application running on a set of Linux machines using
> NFS for the storage. Recently iowait on the machines has started to
> become a problem, and it seems that they can't quite keep up. Iowait
> figures of 50% or above are not uncommon during peak hours.
>

High iowait numbers are not a problem in and of themselves. A high
iowait number is just indicative that the machine is spending a lot of
its time waiting for I/O. On a busy NFS client, that may be expected
particularly if the machine isn't doing much else CPU-wise.

The big question is whether you're getting enough throughput for your
applications. iowait percentages can't tell you that. That's not to say
that you don't have a performance issue, but you'll probably need to
get some better metrics than iowait figures to track it down.

--
Jeff Layton <[email protected]>

2007-12-11 15:37:11

by Aaron Wiebe

Subject: Re: Client performance questions

Greetings - we also run mail on NFS.

On Dec 10, 2007 4:52 PM, Fredrik Lindgren <[email protected]> wrote:

> Mount options on the clients are as follows:
> bg,intr,timeo=600,retrans=2,vers=3,proto=tcp,rsize=32768,wsize=32768
>
> MTU is 9000 bytes and they're all in the same Gigabit Ethernet switch along
> with the NFS server.
>
> Each client seems to be doing somewhere around 3500 NFS ops/s during peak hours.
> Average read/write size seems to be around 16kb, although these operations make
> up just ~30% of the activity.

This suggestion may get some controversy here - however it is what we
have done and it's made a difference.

Try switching to UDP. We've found that with mail applications a large
proportion of the transactions are small, and the windowing of TCP
actually slows things down. This is what we use for mount options:

rw,v3,rsize=8192,wsize=8192,acregmin=15,acregmax=120,acdirmin=45,acdirmax=120,hard,lock,proto=udp
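
(As a hypothetical /etc/fstab entry, with the server name and paths as
placeholders, that would look something like:)

filer:/export/mail  /var/spool/mail  nfs  rw,vers=3,proto=udp,rsize=8192,wsize=8192,acregmin=15,acregmax=120,acdirmin=45,acdirmax=120,hard,lock  0  0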

With a 9000 byte jumbo frame this will fit every datagram into a single
frame. Also, you may want to play with the sysctl
sunrpc.udp_slot_table_entries. We keep this at 32, but it can go right up
to 128.
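
(Something like this, for example; if I recall correctly the default is 16,
and the new value only takes effect for mounts created after the change:)

sysctl sunrpc.udp_slot_table_entries          # show the current value
sysctl -w sunrpc.udp_slot_table_entries=32    # what we run; 128 is the maximum
# add "sunrpc.udp_slot_table_entries = 32" to /etc/sysctl.conf to keep it across reboots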

In our case, our issues with this config are nearly exclusively in
keeping our server side storage fast and responsive.

-Aaron

2007-12-11 17:05:08

by Chuck Lever III

Subject: Re: Client performance questions

On Dec 11, 2007, at 10:37 AM, Aaron Wiebe wrote:
> Greetings - we also run mail on NFS.
>
> On Dec 10, 2007 4:52 PM, Fredrik Lindgren <[email protected]> wrote:
>
>> Mount options on the clients are as follows:
>> bg,intr,timeo=600,retrans=2,vers=3,proto=tcp,rsize=32768,wsize=32768
>>
>> MTU is 9000 bytes and they're all in the same Gigabit Ethernet
>> switch along
>> with the NFS server.
>>
>> Each client seems to be doing somewhere around 3500 NFS ops/s
>> during peak hours.
>> Average read/write size seems to be around 16kb, although these
>> operations make
>> up just ~30% of the activity.
>
> This suggestion may get some controversy here - however it is what we
> have done and its made a difference.

Although TCP is generally our recommendation, your settings are
reasonable. I would complain only if you were using "soft" ! :-)

> Try switching to UDP. We've found that with mail applications, a
> large amount of the transactions are small, and the windowing of TCP
> actually slows things down.

If windowing is a problem, have you tried boosting the default size
of the socket send and receive buffers on both ends?
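
A sketch of the usual knobs on the Linux side (the values here are only
illustrative, and the BlueArc end would need its own equivalent tuning):

sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_max=4194304
sysctl -w net.ipv4.tcp_rmem="4096 262144 4194304"
sysctl -w net.ipv4.tcp_wmem="4096 262144 4194304"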

> This is what we use for mount options:
>
> rw,v3,rsize=8192,wsize=8192,acregmin=15,acregmax=120,acdirmin=45,acdirmax=120,hard,lock,proto=udp
>
> With a 9000 byte jumbo frame, this will restrict every datagram into a
> single frame

Disclaimer: kids, if you try this at home, it will work well as long
as the network links between your client and server all run at the
same speed (i.e. it will not work well if you have, for example, 100Mb
connections from client to switch and GbE from switch to server).

> - also, you may want to play with sysctl
> sunrpc.udp_slot_table_entries. We keep this at 32., but it can go
> right up to 128.
>
> In our case, our issues with this config are nearly exclusively in
> keeping our server side storage fast and responsive.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2007-12-11 17:11:56

by Aaron Wiebe

Subject: Re: Client performance questions

Afternoon..

On Dec 11, 2007 12:03 PM, Chuck Lever <[email protected]> wrote:
>
> If windowing is a problem, have you tried boosting the default size
> of the socket send and receive buffers on both ends?

I believe we tweaked that on the client side at one point, but with the
current settings most of our problems are on the storage side at present.
We've shifted our focus mostly towards getting our storage itself
performing better, rather than getting client throughput higher so we
can crush our storage even more. But since these things tend to be
cyclical, I'll keep it in mind for when our bottleneck shifts back
again. :)

-Aaron