2014-04-23 18:01:51

by Cedric Blancher

Subject: Tuning Linux NFSv4 for high latency connections?

Are there any options to improve the Linux NFSv4 performance over a
high latency connection?

We currently use Solaris/Illumos as NFSv4 server and client over a
cross-continental Internet connection. Latency is terrible (~220 ms),
but we counter this by running work in parallel, so the latency is
mostly mitigated.

We now wish to migrate to Linux (in short: away from Oracle, because
their support is basically unbearable). We tested SuSE 13.1 and current
Fedora, and build times are 17 times (!!!) SLOWER than on Solaris/Illumos.

Are there any tunables besides actimeo=300?

Ced
--
Cedric Blancher <[email protected]>
Institute Pasteur


2014-04-24 03:12:26

by Jim Rees

Subject: Re: Tuning Linux NFSv4 for high latency connections?

Cedric Blancher wrote:

> Are there any options to improve the Linux NFSv4 performance over a
> high latency connection?

We did some work along these lines at CITI years ago. As I remember, the
main thing was to increase net.ipv4.tcp_[rw]mem on the server side, because
TCP auto-tuning was being defeated. This may be less of an issue with your
workload, which sounds like many small files rather than one big one. In
theory, NFSv4 delegations should help, but I don't know how well that works.
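A minimal sketch of the sizing behind raising tcp_[rw]mem, assuming the socket buffers should at least cover the bandwidth-delay product (BDP) of the path. The 100 Mbit/s link rate, the 2x headroom factor, and the min/default fields are hypothetical placeholders; only the ~220 ms RTT comes from this thread.

```python
# Sketch: derive net.ipv4.tcp_[rw]mem maximums from the bandwidth-delay product.
# ASSUMPTIONS (hypothetical): 100 Mbit/s link, 2x headroom, and the min/default
# fields below; only the ~220 ms RTT is taken from the thread.


def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Bytes that must be in flight to keep the path full."""
    return round(bandwidth_bits_per_s * rtt_s / 8)


def tcp_mem_line(name: str, max_bytes: int,
                 min_bytes: int = 4096, default_bytes: int = 87380) -> str:
    """Format a sysctl.conf-style 'min default max' line (all in bytes)."""
    return f"{name} = {min_bytes} {default_bytes} {max_bytes}"


if __name__ == "__main__":
    bdp = bdp_bytes(100e6, 0.220)  # hypothetical 100 Mbit/s, ~220 ms RTT
    limit = 2 * bdp                # arbitrary headroom: part of a buffer is overhead
    print(bdp)
    for name in ("net.ipv4.tcp_rmem", "net.ipv4.tcp_wmem"):
        print(tcp_mem_line(name, limit))
```

With these placeholder numbers the BDP is 2750000 bytes (about 2.75 MB), so the generated lines raise the maximum to 5500000 bytes. The headroom factor is a judgment call, since the kernel counts some bookkeeping overhead against each socket buffer.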

2014-04-28 05:23:15

by Dean

Subject: Re: Tuning Linux NFSv4 for high latency connections?



On 4/24/14, 10:22 AM, Cedric Blancher wrote:
> On 24 April 2014 05:12, Jim Rees <[email protected]> wrote:
>> Cedric Blancher wrote:
>>
>> Are there any options to improve the Linux NFSv4 performance over a
>> high latency connection?
>>
>> We did some work along these lines at CITI years ago. As I remember, the
>> main thing was to increase net.ipv4.tcp_[rw]mem on the server side, because
>> tcp auto-tuning was being defeated. This may be less of an issue with your
>> workload, which sounds like many small files rather than one big one. In
>> theory, NFSv4 delegations should help, but I don't know how well that works.

Following up on Jim's work, we did a fair bit more, but in general we
found that NFS clients just can't do well over a large RTT due to the slow
TCP window ramp-up and the adverse reaction to packet loss. Unfortunately,
the only way to overcome these issues (other than using a custom UDP
protocol, which isn't supported) is to use multiple TCP connections,
which is what we do by using multiple nodes.
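To make the packet-loss point concrete, here is a textbook Reno-style AIMD model: a loss halves the affected connection's window, which then grows back by one segment per RTT. It is a simplification (CUBIC and modern loss recovery behave differently), and it assumes the aggregate window is split evenly, a single loss hits exactly one connection, a 1448-byte MSS, and a hypothetical 100 Mbit/s link; only the ~220 ms RTT comes from the thread.

```python
import math

# Sketch: time to recover from a single loss under a textbook Reno-style AIMD
# model (window halves on loss, then grows one MSS per RTT). This is a
# simplification; CUBIC and modern loss recovery behave differently.
# ASSUMPTIONS: aggregate window split evenly across connections, one loss hits
# exactly one connection, 1448-byte MSS, hypothetical 100 Mbit/s link.

MSS = 1448


def recovery_seconds(aggregate_window_bytes: int, rtt_s: float, connections: int) -> float:
    """Time for the affected connection to regain its pre-loss window."""
    per_conn_segments = aggregate_window_bytes / connections / MSS
    lost_segments = per_conn_segments / 2      # multiplicative decrease
    return math.ceil(lost_segments) * rtt_s    # additive increase: +1 MSS per RTT


def aggregate_drop_fraction(connections: int) -> float:
    """Share of the total window given up when one connection halves."""
    return 1 / (2 * connections)


if __name__ == "__main__":
    window = round(100e6 * 0.220 / 8)  # BDP of the hypothetical link, in bytes
    for n in (1, 4, 16):
        secs = recovery_seconds(window, 0.220, n)
        print(f"{n:2} connection(s): ~{secs:.0f} s to recover, "
              f"{aggregate_drop_fraction(n):.1%} of the window lost")
```

Under these assumptions a single connection needs roughly 209 s to climb back after one loss, versus about 52 s with four connections and about 13 s with sixteen, and each loss costs the aggregate only 1/(2N) of its window. That is the effect multiple connections (or multiple nodes) exploit.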

I have some basic instructions here on what we do in our environments:
http://researcher.watson.ibm.com/researcher/view_person_subpage.php?id=4427

Dean

2014-04-23 20:24:07

by Malahal Naineni

Subject: Re: Tuning Linux NFSv4 for high latency connections?

Cedric Blancher [[email protected]] wrote:
> Are there any options to improve the Linux NFSv4 performance over a
> high latency connection?
>
> We currently use Solaris/Illumos as NFSv4 server and client over a
> cross continental Internet connection. Latency is terrible (~220ms)
> but we counter this by running work in parallel so the latency is
> mostly mitigated.
>
> We now wish to migrate (short: Away from Oracle because support is
> basically unbearable) to Linux (tested SuSE 13.1 and current Fedora)
> and build times are 17 times (!!!) SLOWER than on Solaris/Illumos.
>
> Are there any tunables besides actimeo=300?

Tuning rsize and wsize may help. You need to figure out whether reads or
writes are the issue before you dig further.
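One way to answer the read-versus-write question is to rank per-operation counters. This sketch assumes the per-op layout of /proc/self/mountstats (statvers=1.1), where each line reads "OP: ops transmissions major_timeouts bytes_sent bytes_received queue_ms rtt_ms execute_ms" (newer kernels may append fields), and that you pass in the text for a single mount. The SAMPLE numbers are invented purely for illustration.

```python
# Sketch: rank NFS operations by cumulative time to see what dominates.
# ASSUMPTION: per-op lines follow the /proc/self/mountstats layout
# "OP: ops trans timeouts bytes_sent bytes_recv queue_ms rtt_ms execute_ms".
# The SAMPLE figures are invented for illustration only.

SAMPLE = """\
\tper-op statistics
\t        NULL: 0 0 0 0 0 0 0 0
\t        READ: 120 120 0 17280 491520 12 26400 26450
\t       WRITE: 40 40 0 168000 5760 3 8800 8810
\t     GETATTR: 3000 3000 0 540000 720000 90 660000 661000
\t      LOOKUP: 2500 2500 0 500000 400000 80 550000 551000
"""


def parse_per_op(mount_block: str) -> dict:
    """Map op name -> counters for the first mount's per-op section."""
    stats = {}
    in_per_op = False
    for line in mount_block.splitlines():
        line = line.strip()
        if line == "per-op statistics":
            in_per_op = True
        elif line.startswith("device ") and in_per_op:
            break  # the next mount begins; stop collecting
        elif in_per_op and ":" in line:
            name, _, rest = line.partition(":")
            fields = rest.split()
            if len(fields) >= 8 and all(f.isdigit() for f in fields):
                stats[name] = {"ops": int(fields[0]),
                               "rtt_ms": int(fields[6]),
                               "exec_ms": int(fields[7])}
    return stats


def top_ops(stats: dict, n: int = 3) -> list:
    """Op names ordered by cumulative execute time, largest first."""
    return sorted(stats, key=lambda op: stats[op]["exec_ms"], reverse=True)[:n]


if __name__ == "__main__":
    stats = parse_per_op(SAMPLE)
    for op in top_ops(stats):
        s = stats[op]
        avg = s["exec_ms"] / s["ops"] if s["ops"] else 0.0
        print(f"{op:8} ops={s['ops']:6} avg={avg:.0f} ms")
```

In this invented sample the metadata operations (GETATTR, LOOKUP) outweigh both READ and WRITE; real counters from the mount in question are needed to say which pattern applies.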

Regards, Malahal.


2014-04-28 10:35:40

by Jim Rees

Subject: Re: Tuning Linux NFSv4 for high latency connections?

Dean wrote:

On 4/24/14, 10:22 AM, Cedric Blancher wrote:
>On 24 April 2014 05:12, Jim Rees <[email protected]> wrote:
>>Cedric Blancher wrote:
>>
>> Are there any options to improve the Linux NFSv4 performance over a
>> high latency connection?
>>
>>We did some work along these lines at CITI years ago. As I remember, the
>>main thing was to increase net.ipv4.tcp_[rw]mem on the server side, because
>>tcp auto-tuning was being defeated. This may be less of an issue with your
>>workload, which sounds like many small files rather than one big one. In
>>theory, NFSv4 delegations should help, but I don't know how well that works.

Following up on Jim's work, we did a fair bit more, but in general we
found that NFS clients just can't do well over a large RTT due to the slow
TCP window ramp-up and the adverse reaction to packet loss. Unfortunately,
the only way to overcome these issues (other than using a custom UDP
protocol, which isn't supported) is to use multiple TCP connections, which
is what we do by using multiple nodes.

Yeah, at the time I think Reno was the default congestion control
algorithm, and you need something with a faster ramp-up. I believe CUBIC
is the default now, and it's pretty good but still not good enough. Andy
Adamson did some work too, making the number of RPC slots dynamic, and I
think that's in the kernel now.
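A small sketch for checking these settings on a given client, assuming the standard Linux procfs paths: the active and available TCP congestion control algorithms under /proc/sys/net/ipv4, and the sunrpc slot settings under /proc/sys/sunrpc, which exist only while the sunrpc module is loaded (tcp_max_slot_table_entries only on kernels that have the dynamic slot table). The proc_sys parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_optional(path: Path):
    """Return the stripped file contents, or None if the file does not exist."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None


def transport_settings(proc_sys: str = "/proc/sys") -> dict:
    """Collect congestion control and RPC slot settings from a procfs tree."""
    base = Path(proc_sys)
    available = read_optional(base / "net/ipv4/tcp_available_congestion_control")
    return {
        "congestion_control": read_optional(base / "net/ipv4/tcp_congestion_control"),
        "available": available.split() if available else [],
        # Present only while the sunrpc module is loaded.
        "tcp_slot_table_entries": read_optional(base / "sunrpc/tcp_slot_table_entries"),
        # Dynamic slot table limit; absent on kernels without that feature.
        "tcp_max_slot_table_entries": read_optional(base / "sunrpc/tcp_max_slot_table_entries"),
    }


if __name__ == "__main__":
    for key, value in transport_settings().items():
        print(f"{key}: {value}")
```

To try a different algorithm, the usual knob is sysctl -w net.ipv4.tcp_congestion_control=<name>, where <name> must appear in the available list.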

If you've got a very high speed network, say 10 Gb/s with >100 ms RTT, you
may need to do some tuning in the Ethernet driver, increasing ring buffer
sizes and so on. Your congestion window can grow to hundreds of MB in this
case.
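The "hundreds of MB" figure follows directly from the bandwidth-delay product. A quick check, assuming 1448-byte segments (a 1500-byte MTU minus 20-byte IP and TCP headers and 12 bytes of TCP timestamp option) to give a feel for the number of packets in flight:

```python
# Worked arithmetic: window needed to keep a 10 Gb/s path busy at a given RTT.
# ASSUMPTION: 1448-byte segments for the packets-in-flight count.


def window_needed(bits_per_s: float, rtt_s: float, mss: int = 1448):
    """Return (bytes in flight, segments in flight) for the given path."""
    window_bytes = round(bits_per_s * rtt_s / 8)
    segments = -(-window_bytes // mss)  # ceiling division
    return window_bytes, segments


if __name__ == "__main__":
    for rtt in (0.100, 0.220):
        wb, seg = window_needed(10e9, rtt)
        print(f"RTT {rtt * 1000:.0f} ms: {wb / 1e6:.0f} MB in flight, ~{seg} segments")
```

At 10 Gb/s that works out to 125 MB in flight at 100 ms and 275 MB at 220 ms, or roughly 86,000 and 190,000 full-sized segments.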

And there's no getting around the fact that NFS is fairly chatty.

2014-04-23 22:14:07

by Cedric Blancher

Subject: Re: Tuning Linux NFSv4 for high latency connections?

On 23 April 2014 23:15, Malahal Naineni <[email protected]> wrote:
> Cedric Blancher [[email protected]] wrote:
>> On 23 April 2014 22:44, Malahal Naineni <[email protected]> wrote:
>> > Cedric Blancher [[email protected]] wrote:
>> >> On 23 April 2014 22:24, Malahal Naineni <[email protected]> wrote:
>> >> > Cedric Blancher [[email protected]] wrote:
>> >> >> Are there any options to improve the Linux NFSv4 performance over a
>> >> >> high latency connection?
>> >> >>
>> >> >> We currently use Solaris/Illumos as NFSv4 server and client over a
>> >> >> cross continental Internet connection. Latency is terrible (~220ms)
>> >> >> but we counter this by running work in parallel so the latency is
>> >> >> mostly mitigated.
>> >> >>
>> >> >> We now wish to migrate (short: Away from Oracle because support is
>> >> >> basically unbearable) to Linux (tested SuSE 13.1 and current Fedora)
>> >> >> and build times are 17 times (!!!) SLOWER than on Solaris/Illumos.
>> >> >>
>> >> >> Are there any tunables besides actimeo=300?
>> >> >
>> >> > rsize and wsize may help! You need to figure out if the read is the
>> >> > issue or the write before you dig further.
>> >>
>> >> I already tried to tune rsize/wsize, making them both smaller or the
>> >> maximum of 1048576 bytes, with no effect.
>> >>
>> >> One possible theory is that maybe something in Linux doesn't allow
>> >> multiple requests to be issued in parallel and waits for each request
>> >> to be completed before issuing the next one?
>> >
>> > Linux NFS client can issue I/Os in parallel. Should be limited by number
>> > of RPC slots though.
>>
>> What controls the number of RPC slots? is there a tunable? Is there
>> something to monitor the usage?
>
> sysctl sunrpc.tcp_slot_table_entries (if you are using tcp)

It's 16.
NFSv4 is TCP only.

I tried bumping the value to 128, without effect, but the change does
not persist across reboots. Is there something like Solaris
/etc/system which the kernel reads to set these values?

> Also, mountstats <mount-point> would be very helpful.

I don't have that command; my test machine is likely too old.

Ced

2014-04-23 21:15:35

by Malahal Naineni

Subject: Re: Tuning Linux NFSv4 for high latency connections?

Cedric Blancher [[email protected]] wrote:
> On 23 April 2014 22:44, Malahal Naineni <[email protected]> wrote:
> > Cedric Blancher [[email protected]] wrote:
> >> On 23 April 2014 22:24, Malahal Naineni <[email protected]> wrote:
> >> > Cedric Blancher [[email protected]] wrote:
> >> >> Are there any options to improve the Linux NFSv4 performance over a
> >> >> high latency connection?
> >> >>
> >> >> We currently use Solaris/Illumos as NFSv4 server and client over a
> >> >> cross continental Internet connection. Latency is terrible (~220ms)
> >> >> but we counter this by running work in parallel so the latency is
> >> >> mostly mitigated.
> >> >>
> >> >> We now wish to migrate (short: Away from Oracle because support is
> >> >> basically unbearable) to Linux (tested SuSE 13.1 and current Fedora)
> >> >> and build times are 17 times (!!!) SLOWER than on Solaris/Illumos.
> >> >>
> >> >> Are there any tunables besides actimeo=300?
> >> >
> >> > rsize and wsize may help! You need to figure out if the read is the
> >> > issue or the write before you dig further.
> >>
> >> I already tried to tune rsize/wsize, making them both smaller or the
> >> maximum of 1048576 bytes, with no effect.
> >>
> >> One possible theory is that maybe something in Linux doesn't allow
> >> multiple requests to be issued in parallel and waits for each request
> >> to be completed before issuing the next one?
> >
> > Linux NFS client can issue I/Os in parallel. Should be limited by number
> > of RPC slots though.
>
> What controls the number of RPC slots? is there a tunable? Is there
> something to monitor the usage?

sysctl sunrpc.tcp_slot_table_entries (if you are using TCP)
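A back-of-the-envelope sketch of what the slot table means at this latency, assuming every RPC costs one full ~220 ms round trip, the server adds no time, and the slots are always kept full. Under those assumptions a mount completes at most slots / RTT operations per second and moves at most slots x payload / RTT bytes per second; the 1 MiB payload matches the rsize maximum mentioned in the thread.

```python
# Sketch: upper bounds imposed by a fixed number of RPC slots on a long RTT.
# ASSUMPTIONS: each RPC takes exactly one RTT, zero server time, slots always full.


def max_ops_per_s(slots: int, rtt_s: float) -> float:
    """At most `slots` RPCs complete per round trip."""
    return slots / rtt_s


def max_bytes_per_s(slots: int, payload_bytes: int, rtt_s: float) -> float:
    """Each in-flight RPC carries at most `payload_bytes` (e.g. rsize/wsize)."""
    return slots * payload_bytes / rtt_s


if __name__ == "__main__":
    for slots in (16, 128):
        ops = max_ops_per_s(slots, 0.220)
        mb = max_bytes_per_s(slots, 1048576, 0.220) / 1e6
        print(f"{slots:3} slots: <= {ops:.0f} ops/s, <= {mb:.0f} MB/s at 1 MiB rsize")
```

These bounds only bite when enough independent requests are outstanding. A step that issues one lookup or getattr at a time still pays a full RTT per call regardless of the slot count, which could be one reason a larger slot table does not help a particular workload.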

Also, mountstats <mount-point> would be very helpful.

Regards, Malahal.


2014-04-23 20:30:42

by Cedric Blancher

Subject: Re: Tuning Linux NFSv4 for high latency connections?

On 23 April 2014 22:24, Malahal Naineni <[email protected]> wrote:
> Cedric Blancher [[email protected]] wrote:
>> Are there any options to improve the Linux NFSv4 performance over a
>> high latency connection?
>>
>> We currently use Solaris/Illumos as NFSv4 server and client over a
>> cross continental Internet connection. Latency is terrible (~220ms)
>> but we counter this by running work in parallel so the latency is
>> mostly mitigated.
>>
>> We now wish to migrate (short: Away from Oracle because support is
>> basically unbearable) to Linux (tested SuSE 13.1 and current Fedora)
>> and build times are 17 times (!!!) SLOWER than on Solaris/Illumos.
>>
>> Are there any tunables besides actimeo=300?
>
> rsize and wsize may help! You need to figure out if the read is the
> issue or the write before you dig further.

I already tried tuning rsize/wsize, setting both smaller or to the
maximum of 1048576 bytes, with no effect.

Could it be that something in Linux doesn't allow multiple requests to
be issued in parallel, and instead waits for each request to complete
before issuing the next one?

Help!

Ced

2014-04-24 17:22:10

by Cedric Blancher

Subject: Re: Tuning Linux NFSv4 for high latency connections?

On 24 April 2014 05:12, Jim Rees <[email protected]> wrote:
> Cedric Blancher wrote:
>
> Are there any options to improve the Linux NFSv4 performance over a
> high latency connection?
>
> We did some work along these lines at CITI years ago. As I remember, the
> main thing was to increase net.ipv4.tcp_[rw]mem on the server side, because
> tcp auto-tuning was being defeated. This may be less of an issue with your
> workload, which sounds like many small files rather than one big one. In
> theory, NFSv4 delegations should help, but I don't know how well that works.

Trond, can you help?

Ced

2014-04-23 22:58:01

by Malahal Naineni

Subject: Re: Tuning Linux NFSv4 for high latency connections?

Cedric Blancher [[email protected]] wrote:
> On 23 April 2014 23:15, Malahal Naineni <[email protected]> wrote:
> > Cedric Blancher [[email protected]] wrote:
> >> On 23 April 2014 22:44, Malahal Naineni <[email protected]> wrote:
> >> > Cedric Blancher [[email protected]] wrote:
> >> >> On 23 April 2014 22:24, Malahal Naineni <[email protected]> wrote:
> >> >> > Cedric Blancher [[email protected]] wrote:
> >> >> >> Are there any options to improve the Linux NFSv4 performance over a
> >> >> >> high latency connection?
> >> >> >>
> >> >> >> We currently use Solaris/Illumos as NFSv4 server and client over a
> >> >> >> cross continental Internet connection. Latency is terrible (~220ms)
> >> >> >> but we counter this by running work in parallel so the latency is
> >> >> >> mostly mitigated.
> >> >> >>
> >> >> >> We now wish to migrate (short: Away from Oracle because support is
> >> >> >> basically unbearable) to Linux (tested SuSE 13.1 and current Fedora)
> >> >> >> and build times are 17 times (!!!) SLOWER than on Solaris/Illumos.
> >> >> >>
> >> >> >> Are there any tunables besides actimeo=300?
> >> >> >
> >> >> > rsize and wsize may help! You need to figure out if the read is the
> >> >> > issue or the write before you dig further.
> >> >>
> >> >> I already tried to tune rsize/wsize, making them both smaller or the
> >> >> maximum of 1048576 bytes, with no effect.
> >> >>
> >> >> One possible theory is that maybe something in Linux doesn't allow
> >> >> multiple requests to be issued in parallel and waits for each request
> >> >> to be completed before issuing the next one?
> >> >
> >> > Linux NFS client can issue I/Os in parallel. Should be limited by number
> >> > of RPC slots though.
> >>
> >> What controls the number of RPC slots? is there a tunable? Is there
> >> something to monitor the usage?
> >
> > sysctl sunrpc.tcp_slot_table_entries (if you are using tcp)
>
> Its 16
> NFSv4 is tcp only
>
> I tried to bump the value to 128 - without effect - but the change is
> not persistent across reboots. Is there something like Solaris
> /etc/system which the kernel reads to set these values?

Probably depends on your distro. Look at /etc/sysctl.conf if you have
that file.
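For the persistence question, a minimal sketch that checks which keys a sysctl.conf-style file would set ("key = value" lines with "#" or ";" comments; sysctl accepts "/" as well as "." in key names, so the sketch normalizes that). It is a plain parser for illustration, not a replacement for sysctl -p. One caveat worth hedging: sunrpc.* keys exist only once the sunrpc module is loaded, so a boot-time sysctl pass may run too early for them; the sunrpc module also accepts tcp_slot_table_entries as a module option (for example an "options sunrpc tcp_slot_table_entries=128" line in a file under /etc/modprobe.d/), and which approach works best may depend on the distribution.

```python
# Sketch: check which keys a sysctl.conf-style file would set.
# ASSUMPTION: simple "key = value" lines with '#' or ';' comments; this is an
# illustration, not a full reimplementation of sysctl(8) parsing.


def parse_sysctl_conf(text: str) -> dict:
    """Return {key: value}; later assignments override earlier ones."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep:
            # sysctl accepts either '.' or '/' as the key separator.
            settings[key.strip().replace("/", ".")] = value.strip()
    return settings


if __name__ == "__main__":
    conf = """\
# NFS over a long RTT
sunrpc.tcp_slot_table_entries = 128
; net.ipv4.tcp_rmem is left at its default here
"""
    settings = parse_sysctl_conf(conf)
    print(settings.get("sunrpc.tcp_slot_table_entries", "not set"))
```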

> > Also, mountstats <mount-point> would be very helpful.
>
> I don't have that command. likely my test machine is too old

Hmm, my RHEL 6.4 has it. Which nfs-utils package do you have?

Regards, Malahal.


2014-04-23 21:04:32

by Cedric Blancher

Subject: Re: Tuning Linux NFSv4 for high latency connections?

On 23 April 2014 22:44, Malahal Naineni <[email protected]> wrote:
> Cedric Blancher [[email protected]] wrote:
>> On 23 April 2014 22:24, Malahal Naineni <[email protected]> wrote:
>> > Cedric Blancher [[email protected]] wrote:
>> >> Are there any options to improve the Linux NFSv4 performance over a
>> >> high latency connection?
>> >>
>> >> We currently use Solaris/Illumos as NFSv4 server and client over a
>> >> cross continental Internet connection. Latency is terrible (~220ms)
>> >> but we counter this by running work in parallel so the latency is
>> >> mostly mitigated.
>> >>
>> >> We now wish to migrate (short: Away from Oracle because support is
>> >> basically unbearable) to Linux (tested SuSE 13.1 and current Fedora)
>> >> and build times are 17 times (!!!) SLOWER than on Solaris/Illumos.
>> >>
>> >> Are there any tunables besides actimeo=300?
>> >
>> > rsize and wsize may help! You need to figure out if the read is the
>> > issue or the write before you dig further.
>>
>> I already tried to tune rsize/wsize, making them both smaller or the
>> maximum of 1048576 bytes, with no effect.
>>
>> One possible theory is that maybe something in Linux doesn't allow
>> multiple requests to be issued in parallel and waits for each request
>> to be completed before issuing the next one?
>
> Linux NFS client can issue I/Os in parallel. Should be limited by number
> of RPC slots though.

What controls the number of RPC slots? Is there a tunable? Is there
something to monitor the usage?

Ced

2014-04-23 20:44:12

by Malahal Naineni

Subject: Re: Tuning Linux NFSv4 for high latency connections?

Cedric Blancher [[email protected]] wrote:
> On 23 April 2014 22:24, Malahal Naineni <[email protected]> wrote:
> > Cedric Blancher [[email protected]] wrote:
> >> Are there any options to improve the Linux NFSv4 performance over a
> >> high latency connection?
> >>
> >> We currently use Solaris/Illumos as NFSv4 server and client over a
> >> cross continental Internet connection. Latency is terrible (~220ms)
> >> but we counter this by running work in parallel so the latency is
> >> mostly mitigated.
> >>
> >> We now wish to migrate (short: Away from Oracle because support is
> >> basically unbearable) to Linux (tested SuSE 13.1 and current Fedora)
> >> and build times are 17 times (!!!) SLOWER than on Solaris/Illumos.
> >>
> >> Are there any tunables besides actimeo=300?
> >
> > rsize and wsize may help! You need to figure out if the read is the
> > issue or the write before you dig further.
>
> I already tried to tune rsize/wsize, making them both smaller or the
> maximum of 1048576 bytes, with no effect.
>
> One possible theory is that maybe something in Linux doesn't allow
> multiple requests to be issued in parallel and waits for each request
> to be completed before issuing the next one?

The Linux NFS client can issue I/Os in parallel, though it should be
limited by the number of RPC slots.
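To put the "one request at a time" hypothesis into numbers, a sketch comparing strictly serial issue with up to `slots` requests in flight. It assumes N independent RPCs, each costing exactly one ~220 ms RTT, and no server time; the 10,000-request workload is hypothetical.

```python
import math

# Sketch: wall-clock time for N independent RPCs, serial vs. slot-limited parallel.
# ASSUMPTIONS: every RPC costs exactly one RTT, zero server time, and the
# requests are independent (no RPC waits on another's result).


def serial_seconds(n_requests: int, rtt_s: float) -> float:
    """One RPC outstanding at a time."""
    return n_requests * rtt_s


def parallel_seconds(n_requests: int, rtt_s: float, slots: int) -> float:
    """Up to `slots` RPCs outstanding; each batch of `slots` costs one RTT."""
    return math.ceil(n_requests / slots) * rtt_s


if __name__ == "__main__":
    n = 10_000  # hypothetical workload size
    print(f"serial:   {serial_seconds(n, 0.220):.0f} s")
    for slots in (16, 128):
        print(f"{slots:3} slots: {parallel_seconds(n, 0.220, slots):.0f} s")
```

With 10,000 requests that is about 2,200 s serially versus about 138 s with 16 slots, so a client that truly serialized its RPCs would be dramatically slower than one keeping the slot table full.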

Regards, Malahal.