2015-03-30 21:44:04

by lyndat3

Subject: large data transfer rate slowdowns over NFSv4 local LAN with kernels 3.19.x & 3.16.x ?

I was just pinged on this by a client; I can reproduce it here.

I have two openSUSE 13.2 machines. NFS transfers between them -- cp & rsync -- slow to a crawl: < 1 MB/s in the worst case, over a 1 Gbit local LAN.

Chats @ #networking/#nfs suggest this is a kernel+NFS issue, so I'm checking here first.

Both machines run kernel

uname -rm
3.19.3-1.gf10e7fc-default x86_64

Both have NFS installed. Packages include

nfs-client-1.3.0-4.2.1.x86_64
nfs-kernel-server-1.3.0-4.2.1.x86_64

The server's store is at

/NAS/NAS1

It's on an LV on a software (mdadm v3.3.1) RAID-10 array.

The client's mounted it at

/mnt/NFS4/NAS1

mount | egrep "NFS|NAS1"
/etc/auto.nfs4 on /mnt/NFS4 type autofs (rw,relatime,fd=6,pgrp=2619,timeout=10,minproto=5,maxproto=5,indirect)
xen01.loc:/ on /mnt/NFS4/NAS1 type nfs4 (rw,nosuid,nodev,relatime,sync,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.101,fsc,local_lock=none,addr=10.0.0.1)

The diagnostics I've thought to run follow.

creating test files @ both machines

@ server
dd if=/dev/zero of=/NAS/NAS1/dump-server-file bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 1.37631 s, 744 MB/s

@ client
dd if=/dev/zero of=~/dump-client-file bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 1.988 s, 515 MB/s
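
One caveat on the dd numbers above: without a sync flag, dd reports largely page-cache speed, since the data may still be in RAM when dd exits. A variant with conv=fdatasync (path and size here are illustrative) flushes before reporting:

```shell
# conv=fdatasync makes dd call fdatasync() on the output file before
# printing its rate, so the figure reflects actual disk writeback
# rather than just the page cache. (Illustrative path and size.)
dd if=/dev/zero of=/tmp/dd-fdatasync-test bs=1024 count=10000 conv=fdatasync
```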


TESTS

(1) server -> server, local cp
rm -f /tmp/dump-server-file
time /bin/cp /NAS/NAS1/dump-server-file /tmp/
real 0m0.486s
user 0m0.004s
sys 0m0.480s
~= 2100MB/s (real)
~= 2100MB/s (sys)

(2) server -> server, local rsync
rm -f /tmp/dump-server-file
time /usr/bin/rsync /NAS/NAS1/dump-server-file /tmp/
real 0m2.491s
user 0m3.344s
sys 0m1.264s
~= 411 MB/s (real)
~= 810 MB/s (sys)

(3) client -> server, PING

ping -c 10 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.307 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.280 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.303 ms
64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=0.280 ms
64 bytes from 10.0.0.1: icmp_seq=5 ttl=64 time=0.262 ms
64 bytes from 10.0.0.1: icmp_seq=6 ttl=64 time=0.290 ms
64 bytes from 10.0.0.1: icmp_seq=7 ttl=64 time=0.281 ms
64 bytes from 10.0.0.1: icmp_seq=8 ttl=64 time=0.286 ms
64 bytes from 10.0.0.1: icmp_seq=9 ttl=64 time=0.287 ms
64 bytes from 10.0.0.1: icmp_seq=10 ttl=64 time=0.291 ms

--- 10.0.0.1 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8999ms
rtt min/avg/max/mdev = 0.262/0.286/0.307/0.023 ms

(4) client -> `iperf3 -s`@server: TCP, one stream

iperf3 -c 10.0.0.1 -t 60 -i 15 -F ~/dump-client-file -f M
[ ID] Interval Transfer Bandwidth Retr
[ 5] 0.00-3.08 sec 337 MBytes 918 Mbits/sec 10 sender
[ 5] 0.00-3.08 sec 336 MBytes 915 Mbits/sec receiver

(5) client -> `iperf3 -s`@server: TCP, 100 parallel streams

iperf3 -c 10.0.0.1 -t 60 -i 15 -F ~/dump-client-file -f M -P100
[ ID] Interval Transfer Bandwidth Retr
...
[SUM] 0.00-31.08 sec 3.40 GBytes 940 Mbits/sec 121 sender
[SUM] 0.00-31.08 sec 3.39 GBytes 937 Mbits/sec receiver

(6) client -> `iperf3 -s`@server: rate-capped (-b 1G), one stream (no -u flag, so still TCP)

iperf3 -c 10.0.0.1 -t 60 -i 15 -F ~/dump-client-file -f M -b 1G -P 1
[ ID] Interval Transfer Bandwidth Retr
...
[ 5] 0.00-8.97 sec 977 MBytes 913 Mbits/sec 25 sender
[ 5] 0.00-8.97 sec 976 MBytes 912 Mbits/sec receiver

(7) client -> `iperf3 -s`@server: rate-capped (-b 1G), 100 parallel streams (no -u flag, so still TCP)

iperf3 -c 10.0.0.1 -t 60 -i 15 -F ~/dump-client-file -f M -b 1G -P 100
[ ID] Interval Transfer Bandwidth Retr
...
[SUM] 0.00-60.01 sec 6.56 GBytes 939 Mbits/sec 180 sender
[SUM] 0.00-60.01 sec 6.55 GBytes 937 Mbits/sec receiver
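
For scale, the iperf line rates above convert to byte throughput roughly as:

```shell
# ~940 Mbit/s of TCP goodput is about 117.5 MB/s -- i.e. the wire can
# carry two orders of magnitude more than the ~1 MB/s the NFS transfers
# in tests (8)-(9) are getting.
awk 'BEGIN { printf "%.1f MB/s\n", 940 / 8 }'
```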

(8) client -> server, cp over NFS

rm -f /mnt/NFS4/NAS1/dump-client-file
time /bin/cp ~/dump-client-file /mnt/NFS4/NAS1/
real 0m54.589s
user 0m0.005s
sys 0m1.225s
~= 18.75 MB/s (real)
~= 810 MB/s (sys)

(9) client -> server, rsync over NFS

rm -f /mnt/NFS4/NAS1/dump-client-file
time /usr/bin/rsync ~/dump-client-file /mnt/NFS4/NAS1/
real 18m13.408s
user 0m4.642s
sys 0m2.627s
~= 0.937 MB/s (real)
~= 390 MB/s (sys)
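
For reference, the "~=" MB/s figures are just the 1024000000-byte file size divided by the relevant `time` component; e.g., for test (9)'s real time:

```shell
# 1,024,000,000 bytes over 18m13.408s of wall time:
awk 'BEGIN { printf "%.3f MB/s\n", 1024000000 / (18*60 + 13.408) / 1e6 }'
```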


EDIT:

ruling out rsync alone


(10) rsync, no NFS

time /usr/bin/rsync ~/dump-client-file [email protected]:/NAS/NAS1
real 0m19.179s
user 0m16.505s
sys 0m4.135s

(11) rsync, over NFS

time /usr/bin/rsync ~/dump-client-file /mnt/NFS4/NAS1/
real 18m25.726s
user 0m4.647s
sys 0m2.912s


and

Testing for a kernel dependency, downgrading kernel-default to

uname -rm
3.16.7-7-default x86_64

and retesting the slow cases, there's still no significant change

(12) client -> server, cp over NFS

rm -f /mnt/NFS4/NAS1/dump-client-file
time /bin/cp ~/dump-client-file /mnt/NFS4/NAS1/
real 0m56.064s
user 0m0.003s
sys 0m1.266s
~= 18.26 MB/s (real)
~= 809 MB/s (sys)

(13) client -> server, rsync over NFS

rm -f /mnt/NFS4/NAS1/dump-client-file
time /usr/bin/rsync ~/dump-client-file /mnt/NFS4/NAS1/
real 17m59.312s
user 0m4.116s
sys 0m2.226s
~= 0.949 MB/s (real)
~= 460 MB/s (sys)


If there's additional diagnostic info, I can provide it.


LT



2015-03-31 14:21:17

by lyndat3

Subject: file xfer over NFSv4 with 'sync' ~300X slower than with 'async' ?

Narrowing down the issue: NFSv4 file transfer with 'sync' appears, here, to be ~300X slower than with 'async'.

(1) for NFSv4 mount with 'sync'

grep NAS1 /etc/auto.nfs4
NAS1 -fstype=nfs4,_netdev,rw,proto=tcp,sync,... xen01.loc:/

a 100MB file xfer takes ~8 minutes

rm -f /mnt/NFS4/NAS1/file.out && \
time dd if=/dev/zero of=/mnt/NFS4/NAS1/file.out bs=32K count=3K
3072+0 records in
3072+0 records out
100663296 bytes (101 MB) copied, 485.721 s, 207 kB/s

real 8m5.861s
user 0m0.012s
sys 0m0.250s

(2) Change mount 'sync' -> 'async',

vi /etc/auto.nfs4
- NAS1 -fstype=nfs4,_netdev,rw,proto=tcp,sync,... xen01.loc:/
+ NAS1 -fstype=nfs4,_netdev,rw,proto=tcp,async,... xen01.loc:/
systemctl restart autofs

the same 100MB file xfer takes ~ 2 seconds

rm -f /mnt/NFS4/NAS1/file.out && \
time dd if=/dev/zero of=/mnt/NFS4/NAS1/file.out bs=32K count=3K
3072+0 records in
3072+0 records out
100663296 bytes (101 MB) copied, 1.65577 s, 60.8 MB/s

real 0m1.658s
user 0m0.000s
sys 0m0.089s


I'd expect 'sync' to be slower than 'async', but 300X ?

Is there additional config that cures, or at least drastically improves, this slowdown?

Some very old (10+ years) posts suggested kernel bugs, but those were fixed ages ago. Maybe they've reemerged?

LT

2015-04-01 17:54:37

by J. Bruce Fields

Subject: Re: file xfer over NFSv4 with 'sync' ~300X slower than with 'async' ?

On Tue, Mar 31, 2015 at 07:21:16AM -0700, [email protected] wrote:
> Narrowing doen the issue, NFSv4 file xfer with 'sync' appears, here, to be ~ 300X slower than with 'async'.
>
> (1) for NFSv4 mount with 'sync'
>
> grep NAS1 /etc/auto.nfs4
> NAS1 -fstype=nfs4,_netdev,rw,proto=tcp,sync,... xen01.loc:/
>
> a 100MB file xfer takes ~8 minutes
>
> rm -f /mnt/NFS4/NAS1/file.out && \
> time dd if=/dev/zero of=/mnt/NFS4/NAS1/file.out bs=32K count=3K
> 3072+0 records in
> 3072+0 records out
> 100663296 bytes (101 MB) copied, 485.721 s, 207 kB/s
>
> real 8m5.861s
> user 0m0.012s
> sys 0m0.250s
>
> (2) Change mount 'sync' -> 'async',
>
> vi /etc/auto.nfs4
> - NAS1 -fstype=nfs4,_netdev,rw,proto=tcp,sync,... xen01.loc:/
> + NAS1 -fstype=nfs4,_netdev,rw,proto=tcp,async,... xen01.loc:/
> systemctl restart autofs
>
> the same 100MB file xfer takes ~ 2 seconds
>
> rm -f /mnt/NFS4/NAS1/file.out && \
> time dd if=/dev/zero of=/mnt/NFS4/NAS1/file.out bs=32K count=3K
> 3072+0 records in
> 3072+0 records out
> 100663296 bytes (101 MB) copied, 1.65577 s, 60.8 MB/s
>
> real 0m1.658s
> user 0m0.000s
> sys 0m0.089s
>
>
> I'd expect 'sync' to be slower than 'async', but 300X ?

There's no maximum sync/async ratio. You could make that ratio lower or
higher by varying dd's block size, for example.

The way I'd look at it, your dd of a 100MB file above is doing 3072
writes, and taking about 8*60/3072 =~ .16 seconds per write.

That does sound high. Things to look at to understand why might include
the round-trip ping time to the server, and the time for the server's
disk to do a synchronous write.
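
Spelled out (a rough sketch, using the 485.721 s the dd run actually reported rather than the 8*60 approximation):

```shell
# 3072 synchronous 32K writes in ~486 s of wall time:
awk 'BEGIN { printf "%.1f ms per write\n", 485.721 / 3072 * 1000 }'
```

On a LAN with ~0.3 ms RTT, ~158 ms per write points at the server's storage (or its write barriers) rather than the network.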

--b.

>
> Is there additional config that cures, or at least drastically improves, this slow down?
>
> Some very old (10+ years) posts suggested kernel bugs, but those were fixed ages ago. Maybe reemerged?
>
> LT
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2015-04-01 18:08:04

by lyndat3

Subject: Re: file xfer over NFSv4 with 'sync' ~300X slower than with 'async' ?

> There's no maximum sync/async ratio. You could make that ratio lower or
> higher by varying dd's block size, for example.
>
> The way I'd look at it, your dd of a 100MB file above is doing 3072
> writes, and taking about 8*60/3072 =~ .16 seconds per write.
>
> That does sound high. Things to look at to understand why might include
> the round-trip ping time to the server, and the time for the server's
> disk to do a synchronous write.
>

After comments from one of the NFS client maintainers, it turns out the slowness is simply a matter of not-quite-right configuration.

As helpfully commented here

http://serverfault.com/questions/499174/etc-exports-mount-option/500553#500553

IIUC, there are two *separate* 'sync's to consider -- one on the server's export, and one on the client's mount.

'sync' on the EXPORT and 'async' on the MOUNT is the sane approach; that config also appears to restore the performance.

The many recommendations online to use 'sync' for data integrity are, IIUC, about 'sync' on the server's export.

With 'async' set on the mount and 'sync' on the export, it appears write integrity is properly assured, and I've got rsync-over-NFSv4 performance back in the ~30-60 MB/s range.
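
As a sketch of the resulting config (the export line and options below are illustrative, not a copy of my actual /etc/exports):

```shell
# Server side: 'sync' on the export, so the server commits data to stable
# storage before acknowledging it -- this is the one that matters for
# integrity. Illustrative /etc/exports line:
#
#   /NAS/NAS1  10.0.0.0/24(rw,sync,no_subtree_check)
#
exportfs -ra    # re-export after editing /etc/exports

# Client side: 'async' (the default) in the automount map, letting the
# client batch writes and commit them in bulk:
#
#   NAS1 -fstype=nfs4,_netdev,rw,proto=tcp,async xen01.loc:/
```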

LT

2015-04-01 18:51:47

by J. Bruce Fields

Subject: Re: file xfer over NFSv4 with 'sync' ~300X slower than with 'async' ?

On Wed, Apr 01, 2015 at 11:02:17AM -0700, [email protected] wrote:
> > There's no maximum sync/async ratio. You could make that ratio lower or
> > higher by varying dd's block size, for example.
> >
> > The way I'd look at it, your dd of a 100MB file above is doing 3072
> > writes, and taking about 8*60/3072 =~ .16 seconds per write.
> >
> > That does sound high. Things to look at to understand why might include
> > the round-trip ping time to the server, and the time for the server's
> > disk to do a synchronous write.
> >
>
> After comments from one of the nfs client maintainers, it turns out the slowness issue is simply one of not-quite-MIS-configuration.
>
> As helpfully commented here
>
> http://serverfault.com/questions/499174/etc-exports-mount-option/500553#500553
>
> , IIUC there are two *separate* syncs to consider -- at the server, and at the client.
>
> 'sync' on the EXPORT, and 'async' on the MOUNT is the sane approach; That config also appears to return the performance.
>
> The many recommendations online to use 'sync' for data integrity are IIUC for sync on the server.

Yes. This is a common source of confusion. In retrospect maybe the
export sync/async option should have had a different name from the
client mount option.

--b.

> With 'async' set on the moount, and 'sync' on the exports, It appears integrity of writes is properly assured and I've got rsync-over-NFSv4 performance back in the ~30-60 MB/s range.
>
> LT

2015-04-01 18:59:49

by Trond Myklebust

Subject: Re: file xfer over NFSv4 with 'sync' ~300X slower than with 'async' ?

On Wed, Apr 1, 2015 at 2:51 PM, J. Bruce Fields <[email protected]> wrote:
> On Wed, Apr 01, 2015 at 11:02:17AM -0700, [email protected] wrote:
>> > There's no maximum sync/async ratio. You could make that ratio lower or
>> > higher by varying dd's block size, for example.
>> >
>> > The way I'd look at it, your dd of a 100MB file above is doing 3072
>> > writes, and taking about 8*60/3072 =~ .16 seconds per write.
>> >
>> > That does sound high. Things to look at to understand why might include
>> > the round-trip ping time to the server, and the time for the server's
>> > disk to do a synchronous write.
>> >
>>
>> After comments from one of the nfs client maintainers, it turns out the slowness issue is simply one of not-quite-MIS-configuration.
>>
>> As helpfully commented here
>>
>> http://serverfault.com/questions/499174/etc-exports-mount-option/500553#500553
>>
>> , IIUC there are two *separate* syncs to consider -- at the server, and at the client.
>>
>> 'sync' on the EXPORT, and 'async' on the MOUNT is the sane approach; That config also appears to return the performance.
>>
>> The many recommendations online to use 'sync' for data integrity are IIUC for sync on the server.
>
> Yes. This is a common source of confusion. In retrospect maybe the
> export sync/async option should have had a different name from the
> client mount option.--b.
>

Do we still need a server 'async' export option? Who is still using
NFSv2 for any type of performance-critical work?

Trond

2015-04-01 19:04:07

by lyndat3

Subject: Re: file xfer over NFSv4 with 'sync' ~300X slower than with 'async' ?

Hi

> > Yes. This is a common source of confusion. In retrospect maybe the
> > export sync/async option should have had a different name from the
> > client mount option.--b.
> >
>
> Do we still need a server 'async' export option? Who is still using
> NFSv2 for any type of performance-critical work?


Just to be clear -- MY pebkac was that I'd set the CLIENT mount as 'sync' -- based on the misunderstanding that write integrity required it 'everywhere' -- on the export AND the mount -- and that 'async' was potentially unsafe.

The server was always exporting 'sync'.

LT

2015-04-01 19:50:34

by J. Bruce Fields

Subject: Re: file xfer over NFSv4 with 'sync' ~300X slower than with 'async' ?

On Wed, Apr 01, 2015 at 02:59:48PM -0400, Trond Myklebust wrote:
> On Wed, Apr 1, 2015 at 2:51 PM, J. Bruce Fields <[email protected]> wrote:
> > On Wed, Apr 01, 2015 at 11:02:17AM -0700, [email protected] wrote:
> >> > There's no maximum sync/async ratio. You could make that ratio lower or
> >> > higher by varying dd's block size, for example.
> >> >
> >> > The way I'd look at it, your dd of a 100MB file above is doing 3072
> >> > writes, and taking about 8*60/3072 =~ .16 seconds per write.
> >> >
> >> > That does sound high. Things to look at to understand why might include
> >> > the round-trip ping time to the server, and the time for the server's
> >> > disk to do a synchronous write.
> >> >
> >>
> >> After comments from one of the nfs client maintainers, it turns out the slowness issue is simply one of not-quite-MIS-configuration.
> >>
> >> As helpfully commented here
> >>
> >> http://serverfault.com/questions/499174/etc-exports-mount-option/500553#500553
> >>
> >> , IIUC there are two *separate* syncs to consider -- at the server, and at the client.
> >>
> >> 'sync' on the EXPORT, and 'async' on the MOUNT is the sane approach; That config also appears to return the performance.
> >>
> >> The many recommendations online to use 'sync' for data integrity are IIUC for sync on the server.
> >
> > Yes. This is a common source of confusion. In retrospect maybe the
> > export sync/async option should have had a different name from the
> > client mount option.--b.
> >
>
> Do we still need a server 'async' export option? Who is still using
> NFSv2 for any type of performance-critical work?

It also bypasses commits on metadata operations. Not that that makes it
a good idea, but it could still easily make a noticeable difference.

--b.

2015-04-01 19:56:03

by J. Bruce Fields

Subject: Re: file xfer over NFSv4 with 'sync' ~300X slower than with 'async' ?

On Wed, Apr 01, 2015 at 12:04:07PM -0700, [email protected] wrote:
> Hi
>
> > > Yes. This is a common source of confusion. In retrospect maybe the
> > > export sync/async option should have had a different name from the
> > > client mount option.--b.
> > >
> >
> > Do we still need a server 'async' export option? Who is still using
> > NFSv2 for any type of performance-critical work?
>
>
> Just to be clear -- MY pebkac was that I'd set the CLIENT mount as 'sync' -- based on the misunderstanding that write integrity required it 'everywhere' -- on the export AND the mount -- and that 'async' was potentially unsafe.
>
> The server was always exporting 'sync'.

Yeah, understood, I just meant that if we'd originally named that export
option, I don't know, "trash_me_on_reboot", then you wouldn't have
gotten the "don't use async, it's unsafe" idea, and wouldn't have gotten
into this mess. But, too late to do anything about that, I guess.

--b.

2015-04-01 20:02:58

by Trond Myklebust

Subject: Re: file xfer over NFSv4 with 'sync' ~300X slower than with 'async' ?

On Wed, Apr 1, 2015 at 3:56 PM, J. Bruce Fields <[email protected]> wrote:
> On Wed, Apr 01, 2015 at 12:04:07PM -0700, [email protected] wrote:
>> Hi
>>
>> > > Yes. This is a common source of confusion. In retrospect maybe the
>> > > export sync/async option should have had a different name from the
>> > > client mount option.--b.
>> > >
>> >
>> > Do we still need a server 'async' export option? Who is still using
>> > NFSv2 for any type of performance-critical work?
>>
>>
>> Just to be clear -- MY pebkac was that I'd set the CLIENT mount as 'sync' -- based on the misunderstanding that write integrity required it 'everywhere' -- on the export AND the mount -- and that 'async' was potentially unsafe.
>>
>> The server was always exporting 'sync'.
>
> Yeah, understood, I just meant that if we'd originally named that export
> option, I don't know, "trash_me_on_reboot", then you wouldn't have
> gotten the "don't use async, it's unsafe" idea, and wouldn't have gotten
> into this mess. But, too late to do anything about that, I guess.
>

I like that name... Just set it up as an alias for 'async' in
/etc/exports and add nagware to exportfs.

Trond

2015-04-01 20:24:33

by lyndat3

Subject: Re: file xfer over NFSv4 with 'sync' ~300X slower than with 'async' ?

> Yeah, understood, I just meant that if we'd originally named that export
> option, I don't know, "trash_me_on_reboot", then you wouldn't have
> gotten the "don't use async, it's unsafe" idea, and wouldn't have gotten
> into this mess. But, too late to do anything about that, I guess.


I'm surprised at how ineffective I was at finding & understanding the whole business to begin with.

If not for Trond's comments and stumbling on that reference re: NFS sync/async issues on "not-my-distro", I'd still be thrashing around.

A bunch of other people responding on the issue weren't clear about, or aware of, the distinction -- at least not that I was hearing. It's entirely possible I was asking the wrong question the wrong way, so I'll take my share of the blame.

Just looking at the decade+ of "my NFS is slow. why?" posts & comments I found & read, having this async/sync biz documented & easy to find for normal humans might be helpful. If it's there I sure as heck missed it.

My two cents.

Thanks again!

LT