2008-01-17 20:54:18

by Patrick J. LoPresti

Subject: Fast network file copy; "recvfile()" ?

I need to copy large (> 100GB) files between machines on a fast
network. Both machines have reasonably fast disk subsystems, with
read/write performance benchmarked at > 800 MB/sec. Using 10GigE cards
and the usual tweaks to tcp_rmem etc., I am getting single-stream TCP
throughput better than 600 MB/sec.
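
(By "the usual tweaks" I mean the standard large-buffer sysctl
settings, something along these lines; the values below are
illustrative, not the exact ones from my setup:)

    # illustrative 10GigE buffer tuning; example values only
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216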

My question is how best to move the actual file. NFS writes appear to
max out at a little over 100 MB/sec on this configuration. FTP and
rcp give me around 250 MB/sec. Thus I am planning to write custom
code to send and receive the file.

For sending, I believe my best options are:

1) O_DIRECT read() + send()
2) mmap() + madvise(WILLNEED) + send()
3) fadvise(WILLNEED) + sendfile()

I am leaning towards (3), since I gather that sendfile() is supposed
to be pretty fast.
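
Concretely, I am picturing something like this for (3). This is an
untested sketch; the whole-file WILLNEED hint and the 1 MB chunk
size are guesses, and error handling is mostly elided:

    #include <fcntl.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>

    /* Untested sketch: hint the page cache, then push the
     * file out with sendfile().  1 MB chunk is arbitrary. */
    static int send_file(int sock, int fd)
    {
        struct stat st;
        off_t off = 0;

        if (fstat(fd, &st) < 0)
            return -1;

        /* len == 0 means "through end of file" */
        posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);

        while (off < st.st_size) {
            ssize_t n = sendfile(sock, fd, &off, 1 << 20);
            if (n < 0)
                return -1;      /* or retry on EINTR */
        }
        return 0;
    }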

My question is what to do on the receiving end. In short, if I want
the equivalent of a "recvfile()" to go with sendfile(), what is my
best bet on Linux?

I will probably try recv() + O_DIRECT write(), but I am curious if
there are other approaches I should try.
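
For what it's worth, the receive side I have in mind looks roughly
like this (untested; O_DIRECT alignment rules depend on the
filesystem and device, and the unaligned tail of the file would
need a separate write with O_DIRECT cleared, which is not shown):

    #define _GNU_SOURCE             /* O_DIRECT */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define CHUNK (1 << 20)         /* 1 MB; arbitrary */

    static int recv_file(int sock, const char *path)
    {
        void *buf;
        int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);

        if (fd < 0 || posix_memalign(&buf, 4096, CHUNK) != 0)
            return -1;

        for (;;) {
            /* MSG_WAITALL keeps each write a full, aligned
             * chunk; the final short chunk will not be
             * aligned and must be written without O_DIRECT. */
            ssize_t n = recv(sock, buf, CHUNK, MSG_WAITALL);
            if (n <= 0)
                break;              /* EOF or error */
            if (write(fd, buf, n) != n)
                break;
        }
        free(buf);
        close(fd);
        return 0;
    }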

Thanks!

- Pat


2008-01-17 22:52:00

by Jan Engelhardt

Subject: Re: Fast network file copy; "recvfile()" ?


On Jan 17 2008 12:53, Patrick J. LoPresti wrote:

>Using 10GigE cards and the usual tweaks to tcp_rmem etc., I am
>getting single-stream TCP throughput better than 600 MB/sec.

Hm, be aware not to hit the sequence wrap :) (at 600 MB/sec the
32-bit sequence space wraps in about seven seconds, so TCP
timestamps/PAWS need to be on)

>1) O_DIRECT read() + send()
>2) mmap() + madvise(WILLNEED) + send()
>3) fadvise(WILLNEED) + sendfile()
>
>I am leaning towards (3), since I gather that sendfile() is supposed
>to be pretty fast.
>
>My question is what to do on the receiving end. In short, if I want
>the equivalent of a "recvfile()" to go with sendfile(), what is my
>best bet on Linux?

splice() I think.
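
Something like socket -> pipe -> file, since splice() always needs
a pipe on one side (untested sketch; 64 KB matches the default
pipe size):

    #define _GNU_SOURCE             /* splice() */
    #include <fcntl.h>
    #include <unistd.h>

    static int recv_file_splice(int sock, int fd)
    {
        int p[2];

        if (pipe(p) < 0)
            return -1;

        for (;;) {
            /* socket -> pipe */
            ssize_t n = splice(sock, NULL, p[1], NULL, 1 << 16,
                               SPLICE_F_MOVE | SPLICE_F_MORE);
            if (n <= 0)
                break;              /* EOF or error */
            /* pipe -> file: drain what just arrived */
            while (n > 0) {
                ssize_t m = splice(p[0], NULL, fd, NULL, n,
                                   SPLICE_F_MOVE);
                if (m <= 0)
                    goto out;
                n -= m;
            }
        }
    out:
        close(p[0]);
        close(p[1]);
        return 0;
    }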

2008-01-21 18:30:08

by dean gaudet

Subject: Re: Fast network file copy; "recvfile()" ?

On Thu, 17 Jan 2008, Patrick J. LoPresti wrote:

> I need to copy large (> 100GB) files between machines on a fast
> network. Both machines have reasonably fast disk subsystems, with
> read/write performance benchmarked at > 800 MB/sec. Using 10GigE cards
> and the usual tweaks to tcp_rmem etc., I am getting single-stream TCP
> throughput better than 600 MB/sec.
>
> My question is how best to move the actual file. NFS writes appear to
> max out at a little over 100 MB/sec on this configuration.

did your "usual tweaks" include mounting with -o tcp,rsize=262144,wsize=262144?

i should have kept better notes last time i was experimenting with this,
but from memory here's what i found:

- if i used three NFS clients and was reading from page cache on the
server i hit 1.2GB/s total throughput from the server. the client
NFS code was maxing out one CPU on each of the client machines.

- disk subsystem (sw raid10 far2) was capable of 600MB/s+ when read
locally on the NFS server, but topped out around 250MB/s when read
remotely (no matter how many clients).

my workload was read-intensive so i didn't experiment with writes...

-dean