2019-09-19 07:44:38

by Alkis Georgopoulos

Subject: rsize,wsize=1M causes severe lags in 10/100 Mbps, what sets those defaults?

Hi, in any recent distribution that I tried, the default NFS wsize/rsize
was 1 MB.

On 10/100 Mbps networks, this causes severe lags, timeouts, and dmesg
fills with messages like:

> [ 316.404250] nfs: server 192.168.1.112 not responding, still trying
> [ 316.759512] nfs: server 192.168.1.112 OK

Forcing wsize/rsize to 32K makes all the problems disappear and NFS
access snappier, without sacrificing any speed, at least up to the
gigabit networks that I tested with.

I would like to request that the defaults be changed to 32K, but I
couldn't find out where these defaults come from, or where to file the
issue along with my test case / benchmarks to support it.

I initially reported it against the klibc nfsmount program that I was
using, but that just uses the NFS defaults, which are the ones that
should be amended. The initial test case / benchmarks are there:
https://lists.zytor.com/archives/klibc/2019-September/004234.html

Please Cc me as I'm not in the list.

Thank you,
Alkis Georgopoulos


2019-09-19 17:32:15

by Trond Myklebust

Subject: Re: rsize,wsize=1M causes severe lags in 10/100 Mbps, what sets those defaults?

On Thu, 19 Sep 2019 at 03:44, Alkis Georgopoulos <[email protected]> wrote:
>
> Hi, in any recent distribution that I tried, the default NFS wsize/rsize
> was 1 MB.
>
> On 10/100 Mbps networks, this causes severe lags, timeouts, and dmesg
> fills with messages like:
>
> > [ 316.404250] nfs: server 192.168.1.112 not responding, still trying
> > [ 316.759512] nfs: server 192.168.1.112 OK
>
> Forcing wsize/rsize to 32K makes all the problems disappear and NFS
> access snappier, without sacrificing any speed, at least up to the
> gigabit networks that I tested with.
>
> I would like to request that the defaults be changed to 32K, but I
> couldn't find out where these defaults come from, or where to file the
> issue along with my test case / benchmarks to support it.
>
> I initially reported it against the klibc nfsmount program that I was
> using, but that just uses the NFS defaults, which are the ones that
> should be amended. The initial test case / benchmarks are there:
> https://lists.zytor.com/archives/klibc/2019-September/004234.html
>
> Please Cc me as I'm not in the list.
>

The default client behaviour is just to go with whatever recommended
value the server specifies. You can change that value yourself on the
knfsd server by editing the pseudo-file in
/proc/fs/nfsd/max_block_size.
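
For example, on a setup that uses the nfs-kernel-server unit, something
along these lines should do it (a rough sketch; as far as I remember the
value can only be changed while nfsd is stopped, and it only affects
mounts made afterwards):

systemctl stop nfs-kernel-server
cat /proc/fs/nfsd/max_block_size     # current advertised maximum
echo 32768 > /proc/fs/nfsd/max_block_size
systemctl start nfs-kernel-server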

Cheers
Trond

2019-09-19 18:21:33

by Alkis Georgopoulos

Subject: Re: rsize,wsize=1M causes severe lags in 10/100 Mbps

On 9/19/19 6:08 PM, Trond Myklebust wrote:
> The default client behaviour is just to go with whatever recommended
> value the server specifies. You can change that value yourself on the
> knfsd server by editing the pseudo-file in
> /proc/fs/nfsd/max_block_size.


Thank you, and I guess I can automate this, by running
`systemctl edit nfs-kernel-server`, and adding:
[Service]
ExecStartPre=sh -c 'echo 32768 > /proc/fs/nfsd/max_block_size'
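
(Or, I suppose, the same cap could be requested per mount from the client
side with the usual rsize/wsize options, e.g. something like:

mount -t nfs -o rsize=32768,wsize=32768 server:/share /mnt

so the server default wouldn't need touching at all.)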

But isn't it a problem that the defaults cause errors in dmesg and
severe lags in 10/100 Mbps, and even make 1000 Mbps a lot less snappy
than with 32K?

In any case thank you again.
Alkis Georgopoulos

2019-09-19 18:27:33

by Trond Myklebust

Subject: Re: rsize,wsize=1M causes severe lags in 10/100 Mbps

On Thu, 2019-09-19 at 18:58 +0300, Alkis Georgopoulos wrote:
> On 9/19/19 6:08 PM, Trond Myklebust wrote:
> > The default client behaviour is just to go with whatever recommended
> > value the server specifies. You can change that value yourself on the
> > knfsd server by editing the pseudo-file in
> > /proc/fs/nfsd/max_block_size.
>
> Thank you, and I guess I can automate this, by running
> `systemctl edit nfs-kernel-server`, and adding:
> [Service]
> ExecStartPre=sh -c 'echo 32768 > /proc/fs/nfsd/max_block_size'
>
> But isn't it a problem that the defaults cause errors in dmesg and
> severe lags in 10/100 Mbps, and even make 1000 Mbps a lot less snappy
> than with 32K?
>

No. It is not a problem, because nfs-utils defaults to using TCP
mounts. Fragmentation is only a problem with UDP, and we stopped
defaulting to that almost 2 decades ago.

However it may well be that klibc is still defaulting to using UDP, in
which case it should be fixed. There are major Linux distros out there
today that don't even compile in support for NFS over UDP any more.
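
(You can check which transport an existing mount actually negotiated by
looking at the proto= field in its mount options, e.g.:

grep nfs /proc/mounts
nfsstat -m

Both of those show proto=tcp or proto=udp for each NFS mount.)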

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-09-19 22:14:41

by Alkis Georgopoulos

Subject: Re: rsize,wsize=1M causes severe lags in 10/100 Mbps

On 9/19/19 7:11 PM, Trond Myklebust wrote:
> No. It is not a problem, because nfs-utils defaults to using TCP
> mounts. Fragmentation is only a problem with UDP, and we stopped
> defaulting to that almost 2 decades ago.
>
> However it may well be that klibc is still defaulting to using UDP, in
> which case it should be fixed. There are major Linux distros out there
> today that don't even compile in support for NFS over UDP any more.


I haven't tested with UDP at all; the problem was with TCP.
I saw the problem in klibc nfsmount with TCP + NFS 3,
and in `mount -t nfs -o timeo=7 server:/share /mnt` with TCP + NFS 4.2.

Steps to reproduce:
1) Connect server <=> client at 10 or 100 Mbps.
Gigabit is also "less snappy" but it's less obvious there.
For reliable results, I made sure that server/client/network didn't have
any other load at all.

2) Server:
echo '/srv *(ro,async,no_subtree_check)' >> /etc/exports
exportfs -ra
truncate -s 10G /srv/10G.file
The sparse file ensures that disk IO bandwidth isn't an issue.

3) Client:
mount -t nfs -o timeo=7 192.168.1.112:/srv /mnt
dd if=/mnt/10G.file of=/dev/null status=progress

4) Result:
dd starts at 11.2 MB/sec, which is fine/expected, then slowly drops to
2 MB/sec after a while; it lags, omitting some seconds from its output
line, e.g.
507510784 bytes (508 MB, 484 MiB) copied, 186 s, 2,7 MB/s^C,
at which point Ctrl+C needs 30+ seconds to stop dd, because of IO
waiting etc.

In another terminal tab, `dmesg -w` is full of these:
[ 316.404250] nfs: server 192.168.1.112 not responding, still trying
[ 316.759512] nfs: server 192.168.1.112 OK

5) Remarks:
With timeo=600, there are no errors in dmesg.
The fact that timeo=7 (the nfsmount default) causes errors proves that
some packets need more than 0.7 seconds to arrive.
This in turn explains why all the applications open extremely slowly
and feel sluggish on netroot = 100 Mbps, NFS, TCP.

Lowering rsize,wsize from 1M to 32K solves all those issues without any
negative side effects that I can see. Even on gigabit, 32K makes
applications a lot more snappy so it's better even there.
On 10 Mbps, rsize=1M is completely unusable.
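
(For the 32K comparison I simply repeated step 3 with the sizes forced
at mount time, roughly:

mount -t nfs -o timeo=7,rsize=32768,wsize=32768 192.168.1.112:/srv /mnt
dd if=/mnt/10G.file of=/dev/null status=progress

and the "not responding" messages and the slowdown went away.)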

So I'm not sure when rsize=1M would be a better default. Is it only for
10G+ connections?

Thank you very much,
Alkis Georgopoulos

2019-09-19 22:15:13

by Trond Myklebust

Subject: Re: rsize,wsize=1M causes severe lags in 10/100 Mbps

On Thu, 2019-09-19 at 22:21 +0300, Alkis Georgopoulos wrote:
> On 9/19/19 7:11 PM, Trond Myklebust wrote:
> > No. It is not a problem, because nfs-utils defaults to using TCP
> > mounts. Fragmentation is only a problem with UDP, and we stopped
> > defaulting to that almost 2 decades ago.
> >
> > However it may well be that klibc is still defaulting to using UDP,
> > in which case it should be fixed. There are major Linux distros out
> > there today that don't even compile in support for NFS over UDP any
> > more.
>
> I haven't tested with UDP at all; the problem was with TCP.
> I saw the problem in klibc nfsmount with TCP + NFS 3,
> and in `mount -t nfs -o timeo=7 server:/share /mnt` with TCP + NFS
> 4.2.
>
> Steps to reproduce:
> 1) Connect server <=> client at 10 or 100 Mbps.
> Gigabit is also "less snappy" but it's less obvious there.
> For reliable results, I made sure that server/client/network didn't
> have any other load at all.
>
> 2) Server:
> echo '/srv *(ro,async,no_subtree_check)' >> /etc/exports
> exportfs -ra
> truncate -s 10G /srv/10G.file
> The sparse file ensures that disk IO bandwidth isn't an issue.
>
> 3) Client:
> mount -t nfs -o timeo=7 192.168.1.112:/srv /mnt
> dd if=/mnt/10G.file of=/dev/null status=progress
>
> 4) Result:
> dd starts at 11.2 MB/sec, which is fine/expected, then slowly drops to
> 2 MB/sec after a while; it lags, omitting some seconds from its output
> line, e.g.
> 507510784 bytes (508 MB, 484 MiB) copied, 186 s, 2,7 MB/s^C,
> at which point Ctrl+C needs 30+ seconds to stop dd, because of IO
> waiting etc.
>
> In another terminal tab, `dmesg -w` is full of these:
> [ 316.404250] nfs: server 192.168.1.112 not responding, still trying
> [ 316.759512] nfs: server 192.168.1.112 OK
>
> 5) Remarks:
> With timeo=600, there are no errors in dmesg.
> The fact that timeo=7 (the nfsmount default) causes errors proves that
> some packets need more than 0.7 seconds to arrive.
> This in turn explains why all the applications open extremely slowly
> and feel sluggish on netroot = 100 Mbps, NFS, TCP.
>
> Lowering rsize,wsize from 1M to 32K solves all those issues without
> any negative side effects that I can see. Even on gigabit, 32K makes
> applications a lot more snappy so it's better even there.
> On 10 Mbps, rsize=1M is completely unusable.
>
> So I'm not sure when rsize=1M would be a better default. Is it only
> for 10G+ connections?
>

I don't understand why klibc would default to supplying a timeo=7
argument at all. It would be MUCH better if it just let the kernel set
the default, which in the case of TCP is timeo=600.
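
(For reference, timeo is in tenths of a second, so:
timeo=7   => wait 0.7 seconds for a reply before retrying
timeo=600 => wait 60 seconds
per nfs(5).)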

I agree with your argument that replaying requests every 0.7 seconds is
just going to cause congestion. TCP provides for reliable delivery of
RPC messages to the server, which is why the kernel default is a full
minute.

So please ask the klibc developers to change libmount to let the kernel
decide the default mount options. Their current setting is just plain
wrong.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-09-19 22:15:20

by Alkis Georgopoulos

Subject: Re: rsize,wsize=1M causes severe lags in 10/100 Mbps

On 9/19/19 10:51 PM, Trond Myklebust wrote:
> I don't understand why klibc would default to supplying a timeo=7
> argument at all. It would be MUCH better if it just let the kernel set
> the default, which in the case of TCP is timeo=600.
>
> I agree with your argument that replaying requests every 0.7 seconds is
> just going to cause congestion. TCP provides for reliable delivery of
> RPC messages to the server, which is why the kernel default is a full
> minute.
>
> So please ask the klibc developers to change libmount to let the kernel
> decide the default mount options. Their current setting is just plain
> wrong.


This was what I asked in my first message to their mailing list,
https://lists.zytor.com/archives/klibc/2019-September/004234.html

Then I realized that timeo=600 just hides the real problem,
which is rsize=1M.

NFS defaults: timeo=600,rsize=1M => lag
nfsmount defaults: timeo=7,rsize=1M => lag AND dmesg errors

My proposal: timeo=whatever,rsize=32K => all fine

If more benchmarks are needed from me to document the
"NFS defaults: timeo=600,rsize=1M => lag"
I can surely provide them.

Thanks,
Alkis

2019-09-19 22:16:11

by Trond Myklebust

Subject: Re: rsize,wsize=1M causes severe lags in 10/100 Mbps

On Thu, 2019-09-19 at 22:57 +0300, Alkis Georgopoulos wrote:
> On 9/19/19 10:51 PM, Trond Myklebust wrote:
> > I don't understand why klibc would default to supplying a timeo=7
> > argument at all. It would be MUCH better if it just let the kernel
> > set the default, which in the case of TCP is timeo=600.
> >
> > I agree with your argument that replaying requests every 0.7 seconds
> > is just going to cause congestion. TCP provides for reliable delivery
> > of RPC messages to the server, which is why the kernel default is a
> > full minute.
> >
> > So please ask the klibc developers to change libmount to let the
> > kernel decide the default mount options. Their current setting is
> > just plain wrong.
>
> This was what I asked in my first message to their mailing list,
> https://lists.zytor.com/archives/klibc/2019-September/004234.html
>
> Then I realized that timeo=600 just hides the real problem,
> which is rsize=1M.
>
> NFS defaults: timeo=600,rsize=1M => lag
> nfsmount defaults: timeo=7,rsize=1M => lag AND dmesg errors
>
> My proposal: timeo=whatever,rsize=32K => all fine
>
> If more benchmarks are needed from me to document the
> "NFS defaults: timeo=600,rsize=1M => lag"
> I can surely provide them.

There are plenty of operations that can take longer than 700 ms to
complete. Synchronous writes to disk are one, but COMMIT (i.e. the NFS
equivalent of fsync()) can often take much longer even though it has no
payload.

So the problem is not the size of the WRITE payload. The real problem
is the timeout.

The bottom line is that if you want to keep timeo=7 as a mount option
for TCP, then you are on your own.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]