2005-06-28 15:30:23

by Anthony Howe

Subject: slow file opens on nfs mount across high-latency network

I have searched through the mail lists and google and
have not found material describing the nfs file open
threshold effect that I am experiencing.

I have been experimenting with opening files on NFS
mounts over varying network latencies. I notice that
there seems to be a threshold on the number of
concurrent nfs file opens as network latency
increases. Up to and including the threshold, nfs
file open performance is fine. After the threshold of
concurrent opens, performance degrades at a linear
rate.

For example, the graph in the attached file shows
this threshold effect for various network latencies:
- 0ms network latency - no max limit
- 20ms network latency - 40 maximum concurrent opens
- 40ms network latency - 20 maximum concurrent opens
- 80ms network latency - 15 maximum concurrent opens
- 120ms network latency - 5 maximum concurrent opens

What would cause this "hockey stick" threshold effect
shown in the attached file?
Are there any settings that would change this effect?

Here are the details of my experiment:
- testing with Red Hat Enterprise Linux AS 3 servers
connected via a 100Mbps switch
- inserting latency with NIST Net
- the test process spawns X threads, each of which
opens a file on an NFS mount; the time taken for
each file open is recorded
- adjusting rsize and wsize does not affect the
"hockey stick" threshold effect
- adjusting /proc/sys/net/core/rmem* does not affect
the "hockey stick" threshold effect
- adjusting the number of nfsd processes does not
affect the "hockey stick" threshold effect
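
The experiment described in the list above could be sketched roughly as follows. This is a minimal illustration in Python, not the original test program: the function names, the file naming scheme, and the thread counts are assumptions, and the temporary directory is only a placeholder that would need to be replaced with a directory on the NFS mount under test.

```python
import os
import tempfile
import threading
import time


def timed_open(path, results, index):
    """Open one file for writing, record how long the open took, then close it."""
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    results[index] = time.perf_counter() - start
    os.close(fd)


def run_trial(directory, num_threads):
    """Start num_threads threads that each open a distinct file in directory."""
    results = [None] * num_threads
    barrier = threading.Barrier(num_threads)

    def worker(i):
        barrier.wait()  # release all threads together so the opens overlap
        timed_open(os.path.join(directory, f"open_test_{i}.dat"), results, i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Placeholder target; point this at a directory on the NFS mount instead.
    with tempfile.TemporaryDirectory() as target:
        for n in (1, 5, 10, 20, 40):
            times = run_trial(target, n)
            print(f"{n:3d} threads: mean open time {sum(times) / n * 1000:.3f} ms")
```

Plotting the mean open time against the thread count for each injected latency would give one curve per latency, which is the shape of graph discussed in this thread.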

Thanks in advance for any tips or directions where I
can look for more information on this topic.

Regards,

Anthony Howe

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com


Attachments:
nfsperformance.pdf (36.72 kB)
91626793-nfsperformance.pdf

2005-06-28 16:53:34

by Anthony Howe

Subject: RE: slow file opens on nfs mount across high-latency network

I probably should have provided this link to the pdf
document instead of the attachment:

http://ahowe_ca.tripod.com/nfsperformance.pdf

This graph shows the hockey stick effect that I
described in my previous post.

Anthony



-------------------------------------------------------
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477&alloc_id=16492&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2005-06-29 12:04:58

by Neil Horman

Subject: Re: slow file opens on nfs mount across high-latency network


Do you have tcpdumps taken at all these data points? It kind of sounds to me
like you're getting a geared slowdown from additional congestion caused by RPC
retransmits (i.e., on a high-latency link, you'll wind up with more RPC
retransmits, resulting in more congestion, resulting in more lost packets,
resulting in more retransmits, etc.).

Regards
Neil


--
/***************************************************
*Neil Horman
*Software Engineer
*Red Hat, Inc.
*[email protected]
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/



2005-06-29 12:36:58

by Peter Staubach

Subject: Re: slow file opens on nfs mount across high-latency network

Some questions --

Are these mounts done using the NFSv3 protocol? Over TCP or over UDP?

Is each thread opening an independent file? In independent directories
or in a common directory?

Thanx...

ps



2005-06-29 13:43:57

by Anthony Howe

Subject: RE: slow file opens on nfs mount across high-latency network


We are using the NFSv3 protocol over TCP. Here are the options for both the
client and the server:

- client options "rw, noexec, nosuid, nodev, noatime, hard, intr, tcp"

- server options "rw, async, wdelay, all_squash, root_squash, anonuid=500,
anongid=500"

Each thread is opening an independent file in the same common NFS
directory, so 100 threads means 100 separate files will be created.

(I am going to do some tcpdumps to see if there is a buildup of RPC
retransmits.)

Thanks,

Anthony




2005-06-29 13:57:00

by Trond Myklebust

Subject: Re: slow file opens on nfs mount across high-latency network

On Tue, 28.06.2005 at 11:30 (-0400), Anthony Howe wrote:

> What would cause this "hockey stick" threshold effect
> shown in the attached file?

Have you cross-checked your graph with the figures from "nfsstat"?
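
One way to do such a cross-check is to compare the client RPC counters before and after a test run. The sketch below is only an illustration: it assumes the counters are read from /proc/net/rpc/nfs and that the line beginning with "rpc" carries the call count followed by the retransmit count, which should be verified on the system in question. The function names are made up for this example.

```python
def parse_client_rpc_stats(text):
    """Return (calls, retrans) from the 'rpc' line of /proc/net/rpc/nfs-style text.

    Assumed layout of that line: "rpc <calls> <retrans> <authrefrsh>".
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "rpc":
            return int(fields[1]), int(fields[2])
    raise ValueError("no 'rpc' line found")


def rpc_delta(before, after):
    """Return (calls, retrans) issued between two snapshots."""
    return after[0] - before[0], after[1] - before[1]


if __name__ == "__main__":
    # Made-up sample snapshots standing in for two reads of /proc/net/rpc/nfs.
    sample_before = "net 0 0 0 0\nrpc 1000 2 0\n"
    sample_after = "net 0 0 0 0\nrpc 1400 2 0\n"
    calls, retrans = rpc_delta(parse_client_rpc_stats(sample_before),
                               parse_client_rpc_stats(sample_after))
    print(f"calls during run: {calls}, retransmits during run: {retrans}")
```

If the retransmit count stays flat while the open times climb, the slowdown is unlikely to be caused by RPC retransmits.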

Cheers,
Trond




2005-07-06 19:16:56

by Anthony Howe

Subject: RE: slow file opens on nfs mount across high-latency network


I have found a solution (workaround) to the latency problem that I
described in the above posting. The solution seems to be to add more mounts
to the same nfs share and spread the threads evenly among the mounts.

This solution is highlighted by the graph:
http://ahowe_ca.tripod.com/nfsperformance2.pdf. The graph shows the
performance of 40 threads each performing file opens across an nfs share.
Each thread opens a file for write and closes it, repeating this 10
times. The x-axis is the round-trip network latency. The y-axis is
the mean time to open a file across the nfs mount. Each line represents a
different number of mount points. The top line is one mount point, with 40
threads concurrently calling file opens on this mount. The bottom line
represents 40 mounts to the same share, each thread with its own mount. From
this graph I can come to the following conclusions:

1. Evenly distributing file access threads across many NFS mounts to the
same share reduces the effects of latency.

2. There seems to be a resource limit per mount point. The higher the network
latency, the sooner this limit is reached. This explains the hockey stick
effect shown in the graph: http://ahowe_ca.tripod.com/nfsperformance.pdf.

3. This resource limit is not related to the hard limit of 256 read or write
operations per mount point, since we are only using 40 threads.

4. Removing all network latency eliminates this resource problem, and 1 mount
or 40 mounts display the same performance.
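
The workaround of spreading threads evenly among several mounts of the same share amounts to a round-robin assignment. A minimal sketch in Python follows; the mount point paths and the function name are hypothetical, chosen only for illustration.

```python
from collections import Counter


def assign_mounts(num_threads, mount_points):
    """Return a list where entry i is the mount point used by thread i (round-robin)."""
    if not mount_points:
        raise ValueError("at least one mount point is required")
    return [mount_points[i % len(mount_points)] for i in range(num_threads)]


if __name__ == "__main__":
    # Hypothetical paths, each assumed to be a separate mount of the same share.
    mounts = [f"/mnt/share{k}" for k in range(4)]
    assignment = assign_mounts(40, mounts)
    for mount, count in sorted(Counter(assignment).items()):
        print(f"{mount}: {count} threads")
```

With 40 threads and 4 mounts, each mount serves 10 threads, so any per-mount limit is shared by fewer concurrent opens.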

Does anyone know what would be causing this resource limit? And is there any
way to improve it so that I don't have to add more mounts to reduce the
effects of latency?

Anthony






2005-07-06 19:21:59

by Lever, Charles

Subject: RE: slow file opens on nfs mount across high-latency network


you may be encountering the limit of 16 concurrent RPC requests per
mount point.
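
To see why a fixed number of request slots per mount could produce a flat-then-linear curve, here is a deliberately simplified model in Python. It assumes each open costs exactly one RPC, each RPC holds a slot for one round trip, and the server adds no delay; all of these are simplifications made for illustration, and the function name is made up. It does not attempt to explain why the observed threshold moves with latency.

```python
def mean_wait_ms(num_requests, slots, rtt_ms):
    """Mean completion time when at most `slots` requests are in flight at once.

    Request i (0-based) has to wait for i // slots earlier waves of requests,
    so it completes after (i // slots + 1) round trips.
    """
    if num_requests <= 0 or slots <= 0:
        raise ValueError("num_requests and slots must be positive")
    total = 0.0
    for i in range(num_requests):
        total += (i // slots + 1) * rtt_ms
    return total / num_requests


if __name__ == "__main__":
    for n in (8, 16, 32, 48, 64):
        print(f"{n:3d} concurrent opens: mean {mean_wait_ms(n, 16, 40.0):6.1f} ms")
```

Under these assumptions the mean stays at one round trip until the number of concurrent requests exceeds the slot count, then rises steadily, which resembles the hockey stick shape.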

in 2.6 this is tunable:

    echo 64 > /proc/sys/sunrpc/{udp,tcp}_slot_table_entries

then remount.

in 2.4, you'll have to rebuild the kernel after changing RPC_MAXCONG to
a larger number (like 64) in include/linux/sunrpc/xprt.h.
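
For the 2.6 case, the current setting can be inspected or changed from a script. The following Python sketch assumes the /proc/sys/sunrpc files named above exist on the client; the helper names and the `base` parameter are made up for illustration. Writing normally requires root, and the new value only applies to mounts made afterwards.

```python
from pathlib import Path

SUNRPC_DIR = Path("/proc/sys/sunrpc")


def slot_table_path(transport, base=SUNRPC_DIR):
    """Return the path of the slot table tunable for 'tcp' or 'udp'."""
    if transport not in ("tcp", "udp"):
        raise ValueError("transport must be 'tcp' or 'udp'")
    return Path(base) / f"{transport}_slot_table_entries"


def read_slot_table_entries(transport, base=SUNRPC_DIR):
    """Read the current number of RPC slots for the given transport."""
    return int(slot_table_path(transport, base).read_text().strip())


def write_slot_table_entries(transport, value, base=SUNRPC_DIR):
    """Write a new slot count; remount afterwards for it to take effect."""
    slot_table_path(transport, base).write_text(f"{int(value)}\n")


if __name__ == "__main__":
    for transport in ("udp", "tcp"):
        try:
            print(transport, read_slot_table_entries(transport))
        except OSError as exc:
            print(f"{transport}: could not read tunable ({exc})")
```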

