2019-06-03 15:16:00

by Mkrtchyan, Tigran

Subject: NFS (pNFS) and VM dirty bytes



Dear NFS fellows,

though this is not directly an NFS issue, I post this question
here as we are mostly affected by NFS clients (and you have enough
kernel connections to route it to the right people).

We have 25 new data processing nodes with 32 cores, 256 GB RAM and 25 Gb/s NIC.
They run CentOS 7 (but this is irrelevant, I think).

When each node runs 24 parallel write-intensive (75% write, 25% read) workloads, we see a spike of
IO errors on close. The client runs into a timeout due to a slow network or IO starvation on the NFS servers.
It stumbles, disconnects, establishes a new connection and stumbles again...

As the default values for dirty pages are

vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_ratio = 30

the first data does not get sent until at least ~25 GB of dirty data has accumulated (10% of 256 GB of RAM).

To make the full deployment more responsive, we have reduced the defaults to something more reasonable:

vm.dirty_background_ratio = 0
vm.dirty_ratio = 0
vm.dirty_background_bytes = 67108864
vm.dirty_bytes = 536870912

IOW, we force the client to start sending data as soon as 64 MB is written. The question is how to find
optimal values and how to make them file system/mount point specific.
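
For reference, a minimal sketch of how such settings could be persisted; the
drop-in file name below is arbitrary and the values are simply the ones above,
not a recommendation:

# /etc/sysctl.d/90-nfs-writeback.conf
# start background writeback once 64 MB of dirty data has accumulated
vm.dirty_background_bytes = 67108864
# throttle writers once 512 MB of dirty data has accumulated
vm.dirty_bytes = 536870912

Applied with "sysctl --system" (or at boot). Writing the *_bytes keys makes
the corresponding *_ratio keys read back as zero, so the ratios do not need
to be set explicitly.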

Thanks in advance,
Tigran.


2019-06-03 16:13:49

by Trond Myklebust

Subject: Re: NFS (pNFS) and VM dirty bytes

On Mon, 2019-06-03 at 17:07 +0200, Mkrtchyan, Tigran wrote:
>
> Dear NFS fellows,
>
> though this is not directly an NFS issue, I post this question
> here as we are mostly affected by NFS clients (and you have enough
> kernel connections to route it to the right people).
>
> We have 25 new data processing nodes with 32 cores, 256 GB RAM and 25
> Gb/s NIC.
> They run CentOS 7 (but this is irrelevant, I think).
>
> When each node runs 24 parallel write-intensive (75% write, 25% read)
> workloads, we see a spike of
> IO errors on close. The client runs into a timeout due to a slow network
> or IO starvation on the NFS servers.
> It stumbles, disconnects, establishes a new connection and stumbles
> again...

You can adjust the pNFS timeout behaviour using the 'dataserver_timeo'
and 'dataserver_retrans' module parameters on both the files and
flexfiles pNFS driver modules.
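
Assuming the usual module names (nfs_layout_nfsv41_files for the files
layout, nfs_layout_flexfiles for flexfiles), a minimal sketch of setting
these via a modprobe drop-in; the numbers are illustrative, not
recommendations:

# /etc/modprobe.d/pnfs-ds-timeouts.conf
# dataserver_timeo is, like the 'timeo' mount option, in tenths of a second
options nfs_layout_nfsv41_files dataserver_timeo=300 dataserver_retrans=3
options nfs_layout_flexfiles dataserver_timeo=300 dataserver_retrans=3

The options take effect when the modules are loaded; on a running client the
current values can be inspected under /sys/module/<module>/parameters/,
where exposed.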

>
> As the default values for dirty pages are
>
> vm.dirty_background_bytes = 0
> vm.dirty_background_ratio = 10
> vm.dirty_bytes = 0
> vm.dirty_ratio = 30
>
> the first data does not get sent until at least ~25 GB of dirty data has
> accumulated (10% of 256 GB of RAM).
>
> To make the full deployment more responsive, we have reduced the
> defaults to something more reasonable:
>
> vm.dirty_background_ratio = 0
> vm.dirty_ratio = 0
> vm.dirty_background_bytes = 67108864
> vm.dirty_bytes = 536870912
>
> IOW, we force the client to start sending data as soon as 64 MB is
> written. The question is how to find optimal values and how to make
> them file system/mount point specific.

The memory management system knows nothing about mount points, and the
filesystems know nothing about the memory management limits. That is by
design.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]