2019-09-19 22:16:03

by Alkis Georgopoulos

Subject: Re: rsize,wsize=1M causes severe lags in 10/100 Mbps

On 9/19/19 11:05 PM, Trond Myklebust wrote:
> There are plenty of operations that can take longer than 700 ms to
> complete. Synchronous writes to disk are one, but COMMIT (i.e. the NFS
> equivalent of fsync()) can often take much longer even though it has no
> payload.
>
> So the problem is not the size of the WRITE payload. The real problem
> is the timeout.
>
> The bottom line is that if you want to keep timeo=7 as a mount option
> for TCP, then you are on your own.
>

The problem isn't timeo at all.
If I understand it correctly, when I try to launch firefox over nfsroot,
NFS will wait until it fills 1M before "replying" to the application.
Thus the applications will launch a lot slower, as they get "disk
feedback" in larger chunks and not "snappy".

In numbers:
timeo=600,rsize=1M => firefox opens in 30 secs
timeo=600,rsize=32k => firefox opens in 20 secs
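
For scale, the time a single READ of each rsize spends on the wire can be estimated with the rough sketch below. It assumes the link runs at its full nominal rate and ignores protocol overhead, latency, and server time, so real figures will be worse. The helper name wire_seconds is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def wire_seconds(payload_bytes: int, link_mbps: float) -> float:
    """Idealized transfer time of one payload over a link of link_mbps megabits/s.

    Assumption: no protocol overhead, no latency, link fully available.
    """
    return payload_bytes * 8 / (link_mbps * 1_000_000)


for label, rsize in (("32k", 32 * KIB), ("1M", MIB)):
    for mbps in (10, 100):
        ms = wire_seconds(rsize, mbps) * 1000
        print(f"rsize={label} at {mbps} Mbps: {ms:.1f} ms per READ")
```

Under these assumptions a 1M READ needs roughly 0.84 s at 10 Mbps, which is already longer than the 700 ms mentioned above, and roughly 84 ms at 100 Mbps. A 32k READ needs about 26 ms and 2.6 ms respectively.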

Anyway, thank you very much for your time and feedback.

Kind regards,
Alkis Georgopoulos


2019-09-19 22:17:37

by Trond Myklebust

Subject: Re: rsize,wsize=1M causes severe lags in 10/100 Mbps

On Thu, 2019-09-19 at 23:20 +0300, Alkis Georgopoulos wrote:
> On 9/19/19 11:05 PM, Trond Myklebust wrote:
> > There are plenty of operations that can take longer than 700 ms to
> > complete. Synchronous writes to disk are one, but COMMIT (i.e. the NFS
> > equivalent of fsync()) can often take much longer even though it has no
> > payload.
> >
> > So the problem is not the size of the WRITE payload. The real problem
> > is the timeout.
> >
> > The bottom line is that if you want to keep timeo=7 as a mount option
> > for TCP, then you are on your own.
> >
>
> The problem isn't timeo at all.
> If I understand it correctly, when I try to launch firefox over nfsroot,
> NFS will wait until it fills 1M before "replying" to the application.
> Thus the applications will launch a lot slower, as they get "disk
> feedback" in larger chunks and not "snappy".
>
> In numbers:
> timeo=600,rsize=1M => firefox opens in 30 secs
> timeo=600,rsize=32k => firefox opens in 20 secs
>

That's a different problem, and is most likely due to readahead causing
your client to read more data than it needs to. It is also true that
the maximum readahead size is proportional to the rsize and that maybe
it shouldn't be.
However, the VM layer is supposed to ensure that the kernel doesn't try
to read ahead more than necessary. It is bounded by the maximum we set
in the NFS layer, but it isn't supposed to hit that maximum unless the
readahead heuristics show that the application may need it.
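
To illustrate the "proportional to the rsize" point, here is a minimal sketch that computes a readahead ceiling as a fixed multiple of rsize. The multiplier of 15 is an assumption made for the example and is not stated in this thread; check the kernel source for the version in use. The name max_readahead_bytes is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumption for illustration only: the ceiling is rsize times a fixed factor.
ASSUMED_READAHEAD_MULTIPLIER = 15


def max_readahead_bytes(rsize: int,
                        multiplier: int = ASSUMED_READAHEAD_MULTIPLIER) -> int:
    """Upper bound on readahead if it scales linearly with rsize."""
    return rsize * multiplier


for label, rsize in (("32k", 32 * KIB), ("1M", MIB)):
    ceiling_kib = max_readahead_bytes(rsize) // KIB
    print(f"rsize={label}: readahead ceiling {ceiling_kib} KiB")
```

With that assumed factor the ceiling is 480 KiB for rsize=32k and 15360 KiB (15 MiB) for rsize=1M. This is only the upper bound; how much is actually read ahead depends on the VM heuristics described above.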

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace