2009-05-13 16:42:10

by Andrew Morton

Subject: Re: 2.6.30-rc deadline scheduler performance regression for iozone over NFS

On Wed, 13 May 2009 12:20:57 -0400 Olga Kornievskaia <[email protected]> wrote:

> I believe what you are seeing is how well TCP autotuning performs.
> What old NFS code was doing is disabling autotuning and instead using
> #nfsd thread to scale TCP recv window. You are providing an example of
> where setting TCP buffer sizes outperforms TCP autotuning. While this
> is a valid example, there is also an alternative example of where old
> NFS design hurts performance.

<scratches head>

Jeff's computer got slower. Can we fix that?


2009-05-13 18:16:42

by Olga Kornievskaia

Subject: Re: 2.6.30-rc deadline scheduler performance regression for iozone over NFS

On Wed, May 13, 2009 at 12:32 PM, Andrew Morton
<[email protected]> wrote:
> On Wed, 13 May 2009 12:20:57 -0400 Olga Kornievskaia <[email protected]> wrote:
>
>> I believe what you are seeing is how well TCP autotuning performs.
>> What old NFS code was doing is disabling autotuning and instead using
>> #nfsd thread to scale TCP recv window. You are providing an example of
>> where setting TCP buffer sizes outperforms TCP autotuning. While this
>> is a valid example, there is also an alternative example of where old
>> NFS design hurts performance.
>
> <scratches head>
>
> Jeff's computer got slower.  Can we fix that?

We realize that decreased performance is a problem, and we understand that
reverting the patch might be the appropriate course of action!

But we are curious why this is happening. Jeff, if it's not too much
trouble, could you generate tcpdumps for both cases? We are curious what
the max window sizes are in both cases. Also, could you give us the TCP and
network sysctl values for the testing environment (both client and server
values), which you can get with "sysctl -a | grep tcp" and
"sysctl -a | grep net.core"?


Poor performance using TCP autotuning can be demonstrated outside of NFS
by using iperf. It can be shown that iperf will work better if the "-w"
flag is used. When this flag is set, iperf calls setsockopt(), which in
the kernel turns off autotuning.

As for fixing this, it would be great if we could get some help from the
TCP kernel folks.

Another thing I should mention is that the proposed NFS patch does
reach into the TCP buffers, because we need to make sure the recv buffer
is big enough to receive an RPC. To use autotuning, NFS would have to
rely on the system-wide sysctl values. One way to ensure that an RPC
would fit would be to increase the system-wide default TCP recv buffer,
but then all connections would be using that value. We thought that,
instead of imposing such a requirement, we would internally set the
buffer size big enough.

2009-05-13 18:34:04

by Jim Rees

Subject: Re: 2.6.30-rc deadline scheduler performance regression for iozone over NFS

Andrew Morton wrote:

Jeff's computer got slower. Can we fix that?

TCP autotuning can reduce performance by up to about 10% in some cases.
Jeff found one of these cases. While the autotuning penalty never exceeds
10% as far as I know, I can provide examples of other cases where autotuning
improves nfsd performance by more than a factor of 100.

The right thing is to fix autotuning. If autotuning is considered too
broken to use, it should be turned off everywhere, not just in nfsd, as it
hurts/benefits all TCP clients, not just nfs.

This topic has been discussed before on netdev:
http://www.spinics.net/lists/netdev/msg68650.html
http://www.spinics.net/lists/netdev/msg68155.html

2009-05-13 19:45:38

by Trond Myklebust

Subject: Re: 2.6.30-rc deadline scheduler performance regression for iozone over NFS

On Wed, 2009-05-13 at 14:25 -0400, Jim Rees wrote:
> Andrew Morton wrote:
>
> Jeff's computer got slower. Can we fix that?
>
> TCP autotuning can reduce performance by up to about 10% in some cases.
> Jeff found one of these cases. While the autotuning penalty never exceeds
> 10% as far as I know, I can provide examples of other cases where autotuning
> improves nfsd performance by more than a factor of 100.
>
> The right thing is to fix autotuning. If autotuning is considered too
> broken to use, it should be turned off everywhere, not just in nfsd, as it
> hurts/benefits all TCP clients, not just nfs.
>
> This topic has been discussed before on netdev:
> http://www.spinics.net/lists/netdev/msg68650.html
> http://www.spinics.net/lists/netdev/msg68155.html

Yes, but one consequence of this patch is that the socket send buffer
size sk->sk_sndbuf is now initialised to a smaller value than before.

This again means that the test for xprt->xpt_ops->xpo_has_wspace() in
svc_xprt_enqueue() will fail more often, and so you will be able to
process fewer incoming requests in parallel while you are waiting for
the send window size to build up.

Perhaps the right thing to do here is to allow some limited violation of
the xpo_has_wspace() test while the send window is in the process of
building up?

Cheers
Trond