2005-12-20 15:38:46

by Kenny Simpson

Subject: NFS TCP settings

Hello,
Why does the Linux NFS client force the rsize and wsize to be a power of 2, and why is the TCP
window size equal to the rsize or wsize?

I am seeing some unfortunate behavior in the NFS traffic patterns that I believe is associated
with the interaction of TCP window size and rsize/wsize.

I have a dedicated connection to a NetApp via a GbE crossover cable to eliminate any switching
effects. The client is a dual P4 machine running 2.6.15-rc5 + nfs_all patch with an
e1000 NIC. Jumbo frames (8160) are set on both client and server.

I max out the TCP window size and xfer size on the NetApp at 64k, and set the client to mount with
rsize=wsize=1M. After the mount, /proc/mounts shows the rsize and wsize to be 64k;
ethereal shows me the NetApp provided this value during the mount (and I followed along in the
source).

With xfer size == TCP window size, the client must always do an extra round trip to send the
data. The total payload is xfer size + RPC overhead, right? I see in ethereal that the client
fills the TCP window, waits for an ACK, then sends the final packet.

My understanding of RPC is that the server must wait for the final byte of the message before it
can process the request. Therefore, since the window size is just a little too small, we get an
extra round trip delay penalty.

Next I tried dropping the xfer size on the NetApp down to 60k (61440), but this made the client
drop its block sizes to 32k. From the source (inode.c) I see the rsize and wsize are adjusted by
nfs_block_size, which forces a power of 2.
Is there really a need for this to be a power of 2?
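
The 60k-to-32k drop described above is what rounding down to a power of two would produce. The following is a minimal sketch of that arithmetic only; `round_down_pow2` is a hypothetical helper written for illustration, not the kernel's nfs_block_size code, and any clamping the kernel applies is ignored.

```python
def round_down_pow2(size: int) -> int:
    """Return the largest power of two that is <= size (size must be positive)."""
    if size <= 0:
        raise ValueError("size must be positive")
    # bit_length() - 1 is the index of the highest set bit.
    return 1 << (size.bit_length() - 1)


# Sizes mentioned in the message: 64k, 60k (61440) and 32k.
for advertised in (65536, 61440, 32768):
    print(advertised, "->", round_down_pow2(advertised))
```

Under this model a server-advertised size of 61440 lands on 32768, matching the behavior reported in the message, while 65536 is left unchanged.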

This change does help the write behavior, as the TCP window is no longer filled and the extra
round trip is eliminated.

For read behavior, I see the Linux client setting the TCP window size to be the same as the
rsize, hitting the xfer size == TCP window size behavior described above (i.e. an extra round
trip penalty due to filling the TCP window).
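
The extra-round-trip reasoning can be written out as a small calculation. This is a deliberately simplified sketch of the model the message describes: the sender puts at most one window of data in flight and then waits for an ACK. A real TCP window slides as ACKs arrive, so this is not a description of actual stack behavior. The 200-byte RPC overhead is a hypothetical figure chosen for illustration.

```python
def window_bursts(xfer_size: int, rpc_overhead: int, window: int) -> int:
    """Bursts needed to send one RPC message when at most `window` bytes
    may be in flight before the sender waits for an ACK (simplified model)."""
    if window <= 0:
        raise ValueError("window must be positive")
    total = xfer_size + rpc_overhead
    return -(-total // window)  # ceiling division on integers


RPC_OVERHEAD = 200   # hypothetical bytes of RPC/NFS header per request
WINDOW = 64 * 1024   # 64k window, as described in the message

for xfer in (64 * 1024, 32 * 1024):
    bursts = window_bursts(xfer, RPC_OVERHEAD, WINDOW)
    print(f"xfer={xfer}: {bursts} burst(s), {bursts - 1} extra round trip(s)")
```

With a 64k transfer the payload exceeds the window by the header size alone, so the model needs a second burst; with a 32k transfer everything fits in one.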

Are these easily changeable? Am I just missing something?

-Kenny


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com


-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems? Stop! Download the new AJAX search engine that makes
searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-12-20 15:47:38

by Trond Myklebust

Subject: Re: NFS TCP settings

On Tue, 2005-12-20 at 07:38 -0800, Kenny Simpson wrote:
> Hello,
> Why does the Linux NFS client force the rsize and wsize to be a power of 2, and why is the TCP
> window size equal to the rsize or wsize?

The NFS/RPC code has no direct control over the TCP window size. That
would be an issue for the networking people.

As for forcing the rsize/wsize to be a power of two, two reasons come to
mind:

1) We want to make it easy to match PAGE_CACHE_SIZE and read/write
request sizes.
2) server block sizes are usually powers of two.

Cheers,
Trond




2005-12-21 16:33:01

by Kenny Simpson

Subject: Re: NFS TCP settings

--- Trond Myklebust <[email protected]> wrote:
> The NFS/RPC code has no direct control over the TCP window size. That
> would be an issue for the networking people.

I have tracked this a bit further, and am reporting back for completeness.

The version of OnTap we are using (6.4.1p1) does not seem to support RFC 1323 (large TCP windows),
but the Linux client does. The window scale factor is communicated in the initial connection (SYN), so
if ethereal does not catch that, it will incorrectly report the window size as the unscaled number,
making the Linux client appear to be using a very small window.
The reason it appeared that the small window size was being honored is the direct
crossover cable, which makes the BDP (bandwidth-delay product) very small.
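
The reporting problem can be illustrated with the RFC 1323 arithmetic: the 16-bit window field in each segment is shifted left by the scale factor negotiated in the handshake. A minimal sketch follows; `effective_window` is a hypothetical helper, and the sample values (5792, shift 2) are made up for illustration rather than taken from the capture discussed here.

```python
from typing import Optional


def effective_window(raw_window: int, shift: Optional[int]) -> int:
    """Receive window in bytes. `shift` is None when the handshake was not
    captured, in which case only the unscaled header value can be reported."""
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("the TCP window field is 16 bits")
    if shift is None:
        return raw_window
    if not 0 <= shift <= 14:
        raise ValueError("RFC 1323 limits the shift count to 14")
    return raw_window << shift


# Hypothetical values: the same header field read with and without the SYN.
print(effective_window(5792, None))  # what a trace missing the SYN would show
print(effective_window(5792, 2))     # the window actually in effect
```

A capture that starts after the handshake has no way to know the shift, so it can only display the first figure.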

As for the server side, because large TCP windows are not supported, the best it can do is 64k,
which it does. Running with GbE and jumbo packets fills the window quite fast, obviating the need
for RFC 1323.
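
A rough bandwidth-delay product calculation shows why a 64k window can be enough on a direct GbE link. This is a sketch under stated assumptions: the 100 microsecond round-trip time is a hypothetical figure for a crossover cable, not a value measured in this thread, and both helper names are invented for the example.

```python
def bdp_bytes(bandwidth_bps: int, rtt_us: int) -> int:
    """Bandwidth-delay product in bytes for a link speed in bits per second
    and a round-trip time in microseconds."""
    return bandwidth_bps * rtt_us // 8_000_000


def max_rtt_us(window_bytes: int, bandwidth_bps: int) -> int:
    """Largest round-trip time (microseconds) at which a window of
    `window_bytes` can still keep the link full."""
    return window_bytes * 8_000_000 // bandwidth_bps


GBE = 1_000_000_000           # 1 Gb/s
ASSUMED_RTT_US = 100          # hypothetical RTT over a crossover cable
MAX_UNSCALED_WINDOW = 65_535  # largest window without RFC 1323 scaling

print("BDP in bytes:", bdp_bytes(GBE, ASSUMED_RTT_US))
print("RTT limit in microseconds:", max_rtt_us(MAX_UNSCALED_WINDOW, GBE))
```

Under the assumed RTT the BDP is well below 64k, so an unscaled window does not limit throughput; window scaling starts to matter only once the round-trip time grows past roughly half a millisecond at this link speed.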

Thanks to those who took the time to swing a clue stick in my direction.

-Kenny



2005-12-21 19:02:43

by Dan Stromberg

Subject: Re: NFS TCP settings

On Wed, 2005-12-21 at 08:32 -0800, Kenny Simpson wrote:
> --- Trond Myklebust <[email protected]> wrote:
> > The NFS/RPC code has no direct control over the TCP window size. That
> > would be an issue for the networking people.
>
> I have tracked this a bit further, and am reporting back for completeness.
>
> The version of OnTap we are using (6.4.1p1) does not seem to support RFC 1323 (large TCP windows),
> but the Linux client does. The window scaling is communicated in the initial connection (SYN), so
> if ethereal does not catch that, it will incorrectly report the window size as the unscaled number
> - making the Linux client appear to be using a very small window.
> The reason it appeared that the small window size was being honored is a result of the direct
> crossover cable making the BDP (bandwidth-delay product) very small.
>
> As for the server side, because large TCP windows are not supported, the best it can do is 64k -
> which it does. Running with GbE and jumbo packets fills the window quite fast obviating the need
> for RFC 1323.
>
> Thanks to those who took the time to swing a clue stick in my direction.

Speaking of TCP Window Scaling... :)

I have connections between some pairs of hosts for which all of the
following appear to be true:

1) TCP Window Scaling is enabled on both endpoints
2) TCP Window Scaling is mentioned in the initial SYN and SYN/ACK
3) tcptrace thinks that TCP Window Scaling isn't happening

Is there something other than the SYN and SYN/ACK that might cause TCP
Window Scaling to go away?

(Other than SACK that is - we have that enabled - at least according to
tcptrace)

Thanks!





2005-12-21 22:06:05

by Jerome Warnier

Subject: Re: NFS TCP settings

On Wednesday, 21 December 2005 at 08:32 -0800, Kenny Simpson wrote:
> --- Trond Myklebust <[email protected]> wrote:
> > The NFS/RPC code has no direct control over the TCP window size. That
> > would be an issue for the networking people.
>
> I have tracked this a bit further, and am reporting back for completeness.
>
> The version of OnTap we are using (6.4.1p1) does not seem to support RFC 1323 (large TCP windows),
> but the Linux client does. The window scaling is communicated in the initial connection (SYN), so
> if ethereal does not catch that, it will incorrectly report the window size as the unscaled number
> - making the Linux client appear to be using a very small window.
> The reason it appeared that the small window size was being honored is a result of the direct
> crossover cable making the BDP (bandwidth-delay product) very small.
>
> As for the server side, because large TCP windows are not supported, the best it can do is 64k -
> which it does. Running with GbE and jumbo packets fills the window quite fast obviating the need
> for RFC 1323.
I'm experiencing serious performance trouble with NFS on a GbE NIC (Intel
eepro1000) and a 2.6.8 kernel, to the point that NFS is unusable and I
have to force the NIC to 100 Mb/s. The clients are 10/100 Mb/s.
I have done no particular tuning on the server side for GbE. Could
someone help me get the most out of this GbE NIC?

Thanks

> Thanks to those who took the time to swing a clue stick in my direction.
>
> -Kenny





2005-12-22 22:41:48

by Trond Myklebust

Subject: Re: NFS TCP settings

On Wed, 2005-12-21 at 11:02 -0800, Dan Stromberg wrote:

> I have some connection between some pairs of hosts for which all of the
> following appear to be true:
>
> 1) TCP Window Scaling is enabled on both endpoints
> 2) TCP Window Scaling is mentioned in the initial SYN and SYN/ACK
> 3) tcptrace thinks that TCP Window Scaling isn't happening
>
> Is there something other than the SYN and SYN/ACK that might cause TCP
> Window Scaling to go away?

Routers/switches that don't support TCP window scaling are a typical
cause.
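
One way to test for this is to capture the handshake at both endpoints and compare the TCP options each side actually received. Under RFC 1323, scaling is used only if both the SYN and the SYN/ACK carry the window scale option (kind 3), so a device on the path that removes or overwrites it in either direction disables scaling for the connection. The sketch below parses a raw TCP options block; the function names and the sample option bytes are hypothetical and are not taken from any trace in this thread.

```python
from typing import Optional


def window_scale_option(options: bytes) -> Optional[int]:
    """Return the shift count from a raw TCP options block, or None if the
    window scale option (kind 3, length 3) is not present."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:            # End of Option List
            break
        if kind == 1:            # No-Operation: a single padding byte
            i += 1
            continue
        if i + 1 >= len(options):
            break                # truncated option, stop parsing
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                # malformed length, stop parsing
        if kind == 3 and length == 3:
            return options[i + 2]
        i += length
    return None


def scaling_in_effect(syn_options: bytes, synack_options: bytes) -> bool:
    """Scaling applies only when both handshake segments carry the option."""
    return (window_scale_option(syn_options) is not None
            and window_scale_option(synack_options) is not None)


# Hypothetical option blocks: MSS 1460, SACK permitted, NOP, window scale 7.
sent = bytes.fromhex("020405b40402" "01" "030307")
# The same block with the window scale option overwritten by NOP padding.
stripped = bytes.fromhex("020405b40402" "01" "010101")

print(window_scale_option(sent), window_scale_option(stripped))
print(scaling_in_effect(sent, sent), scaling_in_effect(sent, stripped))
```

If the option is present in the SYN as sent but missing from the SYN as received on the other host, something in between is the likely culprit.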

Cheers,
Trond


