I need to copy large (> 100GB) files between machines on a fast
network. Both machines have reasonably fast disk subsystems, with
read/write performance benchmarked at > 800 MB/sec. Using 10GigE cards
and the usual tweaks to tcp_rmem etc., I am getting single-stream TCP
throughput better than 600 MB/sec.
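By "the usual tweaks" I mean the standard large-buffer sysctls, roughly
along these lines (the exact values here are only illustrative):

    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216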
My question is how best to move the actual file. NFS writes appear to
max out at a little over 100 MB/sec on this configuration. FTP and
rcp give me around 250 MB/sec. Thus I am planning to write custom
code to send and receive the file.
For sending, I believe my best options are:
1) O_DIRECT read() + send()
2) mmap() + madvise(WILLNEED) + send()
3) fadvise(WILLNEED) + sendfile()
I am leaning towards (3), since I gather that sendfile() is supposed
to be pretty fast.
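For concreteness, (3) would look roughly like this.  This is just a
sketch: "fd" is the open input file, "sock" is an already-connected TCP
socket, and most error handling is omitted.

/* Option (3): hint the kernel about the upcoming sequential read,
 * then loop over sendfile() until the whole file has gone out the
 * socket. */
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>

static int send_whole_file(int fd, int sock)
{
    struct stat st;
    off_t off = 0;

    if (fstat(fd, &st) < 0)
        return -1;

    /* Start readahead on the whole file. */
    posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED);

    while (off < st.st_size) {
        ssize_t n = sendfile(sock, fd, &off, st.st_size - off);
        if (n < 0)
            return -1;      /* real code would handle EINTR etc. */
    }
    return 0;
}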
My question is what to do on the receiving end. In short, if I want
the equivalent of a "recvfile()" to go with sendfile(), what is my
best bet on Linux?
I will probably try recv() + O_DIRECT write(), but I am curious if
there are other approaches I should try.
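Something like the following is what I have in mind for the receiver;
again just a sketch, with the chunk size and alignment picked more or
less arbitrarily and the awkward final (unaligned) chunk not handled:

/* Receive side: recv() into a page-aligned buffer, then write() it
 * out through a descriptor opened with O_DIRECT. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

#define CHUNK (1 << 20)     /* 1 MiB per write */

static int recv_whole_file(int sock, const char *path)
{
    void *buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);

    if (fd < 0 || posix_memalign(&buf, 4096, CHUNK) != 0)
        return -1;

    for (;;) {
        size_t got = 0;

        while (got < CHUNK) {
            ssize_t n = recv(sock, (char *)buf + got, CHUNK - got, 0);
            if (n < 0)
                return -1;
            if (n == 0)
                break;      /* sender closed the connection */
            got += n;
        }
        if (got == 0)
            break;
        /* O_DIRECT needs aligned lengths; a short final chunk would
         * have to be padded or written via a second, non-direct
         * descriptor. */
        if (write(fd, buf, got) != (ssize_t)got)
            return -1;
        if (got < CHUNK)
            break;
    }
    free(buf);
    return close(fd);
}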
Thanks!
- Pat
On Jan 17 2008 12:53, Patrick J. LoPresti wrote:
>Using 10GigE cards and the usual tweaks to tcp_rmem etc., I am
>getting single-stream TCP throughput better than 600 MB/sec.
Hm, be aware not to hit the TCP sequence number wrap :)
>1) O_DIRECT read() + send()
>2) mmap() + madvise(WILLNEED) + send()
>3) fadvise(WILLNEED) + sendfile()
>
>I am leaning towards (3), since I gather that sendfile() is supposed
>to be pretty fast.
>
>My question is what to do on the receiving end. In short, if I want
>the equivalent of a "recvfile()" to go with sendfile(), what is my
>best bet on Linux?
splice() I think.
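Untested sketch of the idea: splice from the socket into a pipe and
from the pipe into the file, so the payload stays inside the kernel.

/* Receive side using splice(): socket -> pipe -> file, no copy
 * through userspace.  Error/cleanup handling is minimal. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

static int recv_whole_file_splice(int sock, int fd)
{
    int p[2];
    off_t off = 0;

    if (pipe(p) < 0)
        return -1;

    for (;;) {
        ssize_t n = splice(sock, NULL, p[1], NULL, 1 << 20,
                           SPLICE_F_MOVE | SPLICE_F_MORE);
        if (n < 0)
            return -1;
        if (n == 0)
            break;          /* EOF from the sender */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, fd, &off, n,
                               SPLICE_F_MOVE | SPLICE_F_MORE);
            if (m <= 0)
                return -1;
            n -= m;
        }
    }
    close(p[0]);
    close(p[1]);
    return 0;
}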
On Thu, 17 Jan 2008, Patrick J. LoPresti wrote:
> I need to copy large (> 100GB) files between machines on a fast
> network. Both machines have reasonably fast disk subsystems, with
> read/write performance benchmarked at > 800 MB/sec. Using 10GigE cards
> and the usual tweaks to tcp_rmem etc., I am getting single-stream TCP
> throughput better than 600 MB/sec.
>
> My question is how best to move the actual file. NFS writes appear to
> max out at a little over 100 MB/sec on this configuration.
did your "usual tweaks" include mounting with -o tcp,rsize=262144,wsize=262144?
i should have kept better notes last time i was experimenting with this,
but from memory here's what i found:
- if i used three NFS clients and was reading from page cache on the
  server, i hit 1.2GB/s total throughput from the server.  the client
  NFS code was maxing out one CPU on each of the client machines.
- the disk subsystem (sw raid10, far2 layout) was capable of 600MB/s+
  when read locally on the NFS server, but topped out around 250MB/s
  when read remotely (no matter how many clients).
my workload was read-intensive so i didn't experiment with writes...
-dean