Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S964856AbXADONr (ORCPT ); Thu, 4 Jan 2007 09:13:47 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S964874AbXADONr (ORCPT ); Thu, 4 Jan 2007 09:13:47 -0500
Received: from brick.kernel.dk ([62.242.22.158]:9004 "EHLO kernel.dk"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S964856AbXADONq (ORCPT ); Thu, 4 Jan 2007 09:13:46 -0500
Date: Thu, 4 Jan 2007 15:16:38 +0100
From: Jens Axboe
To: saeed bishara
Cc: linux-kernel@vger.kernel.org
Subject: Re: using splice/vmsplice to improve file receive performance
Message-ID: <20070104141638.GB11203@kernel.dk>
References: <20061222094858.GP17199@kernel.dk> <20061222113917.GQ17199@kernel.dk>
    <20061222124710.GR17199@kernel.dk> <20070104140813.GZ11203@kernel.dk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20070104140813.GZ11203@kernel.dk>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6003
Lines: 134

On Thu, Jan 04 2007, Jens Axboe wrote:
> On Wed, Jan 03 2007, saeed bishara wrote:
> > On 12/22/06, Jens Axboe wrote:
> > >On Fri, Dec 22 2006, saeed bishara wrote:
> > >> On 12/22/06, Jens Axboe wrote:
> > >> >On Fri, Dec 22 2006, saeed bishara wrote:
> > >> >> On 12/22/06, Jens Axboe wrote:
> > >> >> >On Thu, Dec 21 2006, saeed bishara wrote:
> > >> >> >> Hi,
> > >> >> >> I'm trying to use the splice/vmsplice system calls to improve the
> > >> >> >> samba server write throughput, but before touching the smbd, I
> > >> >> >> started to improve the ttcp tool since it is simple and has the
> > >> >> >> same flow. I'm expecting to avoid the "copy_from_user" path when
> > >> >> >> using those syscalls.
> > >> >> >> so far, I couldn't make any improvement, actually the throughput
> > >> >> >> gets worse. the new receive flow looks like this (code also
> > >> >> >> attached):
> > >> >> >> 1. read tcp packet (64 pages) to page aligned buffer.
> > >> >> >> 2. vmsplice the buffer to pipe with SPLICE_F_MOVE.
> > >> >> >> 3. splice the pipe to the file, also with SPLICE_F_MOVE.
> > >> >> >>
> > >> >> >> the strace shows that the splice takes a lot of time. also when
> > >> >> >> profiling the kernel, I found that the memcpy() is called too
> > >> >> >> often !!
> > >> >> >
> > >> >> >(didn't see this until now, axboe@suse.de doesn't work anymore)
> > >> >> >
> > >> >> >I'm assuming that you mean you vmsplice with SPLICE_F_GIFT, to hand
> > >> >> >ownership of the pages to the kernel (in which case SPLICE_F_MOVE
> > >> >> >will work, otherwise you get a copy)? If not, that'll surely cost
> > >> >> >you a data copy
> > >> >> I'll try the vmsplice with SPLICE_F_GIFT and splice with MOVE. btw,
> > >> >> I noticed that the splice system call takes the bulk of the time,
> > >> >> does it mean anything?
> > >> >
> > >> >Hard to say without seeing some numbers :-)
> > >> I'm out of the office, I'll send it later. btw, my test bed ( the
> > >> receiver side ) is arm9. does it matter?
> > >
> > >The vmsplice is basically vm intensive, so it could matter.
> > >
> > >> >> >This sounds remarkably like a recent thread on lkml, you may want to
> > >> >> >read up on that. Basically using splice for network receive is a bit
> > >> >> >of a work-around now, since you do need the one copy and then
> > >> >> >vmsplice that into a pipe. To realize the full potential of splice,
> > >> >> >we first need socket receive support so you can skip that step
> > >> >> >(splice from socket to pipe, splice pipe to file).
> > >> >> Ashwini Kulkarni posted patches that implement that, see
> > >> >> http://lkml.org/lkml/2006/9/20/272 . is that right?
> > >> >> >
> > >> >> >There was no test code attached, btw.
> > >> >> sorry, here it is.
> > >> >> can you please add sample application to your test tools (splice,fio
> > >> >> ,,) that demonstrates my flow; socket to file using read & vmsplice?
> > >> >
> > >> >I didn't add such an example, since I had hoped that we would have
> > >> >splice from socket support sooner rather than later. But I can do so, of
> > >> >course.
> > >> do you have any preliminary patches? I can start playing with it.
> > >
> > >I don't, Intel posted a set of patches a few months ago though. I didn't
> > >have time to look at that at the time, but you should be able to find
> > >them in the archives.
> > >
> > >> >I'll try your test. One thing that sticks out initially is that you
> > >> >should be using full pages, the splice pipe will not merge page
> > >> >segments. So don't use a buflen less than the page size.
> > >>
> > >> yes, actually I run the ttcp with -l65536 ( 64KB ), and the buffer is
> > >> always page aligned. also, the splice/vmsplice with MOVE or GIFT will
> > >> fail if the buffer is not whole pages. am I right?
> > >
> > >Yes.
> > >
> > >I added a simple splice-fromnet example in the splice git repo, see if
> > >you can repeat your results with that. Doing:
> > >
> > ># ./splice-fromnet -g 2001 | ./splice-out -m /dev/null
> > >
> > >and
> > >
> > ># cat /dev/zero | netcat localhost 2001
> > >
> > >gets me about 490MiB/sec, using a recv/write loop is around 413MiB/sec.
> > >Not migrating pages gets me around 422MiB/sec.
> > >
> > >--
> > >Jens Axboe
> > >
> > >
> > I've done some investigation in the splice flow and found the following:
> > even when using vmsplice with GIFT and splice with MOVE, the user
> > buffers are still copied, I see that the memcpy from pipe_to_file() is
> > called.
> > I added debug messages in this function and here is what I got:
> > 1. the generic_pipe_buf_steal always fails, this is because the
> > page_count is 2.
> > 2. after that, the find_lock_page fails as well.
> > 3. page_cache_alloc_cold succeeds.
> > 4. but, since the buf->page differs from the page (returned by
> > page_cache_alloc_cold) the memcpy function is called.
> >
> > this behavior is true for all the buffers that are vmspliced to ext3 file.
> > is this the expected behavior? is there any way to make the steal
> > operation return with success?
>
> It works for me, with most pages. Using the vmsplice/splice-out from the
> splice tools, doing
>
> $ ./vmsplice -g | ./splice-out -m g
>
> about half of the pages have count==1 and the steal succeeds.
>
> find_lock_page() will only succeed, if the file exists and is cached
> already. splice-out will truncate the file, so it should never succeed
> for that case. For both the find_lock_page() success and failure case
> (page being allocated), it's a given that we need to copy the data.

Testing a simpler case (not switching buffers), all but one page was
stolen. I tested with on-stack and posix_memalign returned buffers.

--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
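
For reference, the flow discussed in this thread (fill a page-aligned
buffer, vmsplice it into a pipe with SPLICE_F_GIFT, then splice the pipe to
the file with SPLICE_F_MOVE) could be sketched in C roughly as below. This is
only a minimal sketch, not the ttcp patch or the code from the splice tools.
The helper name gift_to_file is hypothetical, and it assumes Linux with
glibc, a buffer that is page aligned and a whole number of pages, and a
caller that does not touch the buffer again after gifting it. SPLICE_F_MOVE
is only a request, so the kernel may still fall back to copying, as
described above.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: push len bytes from a page-aligned buffer into
 * out_fd through the pipe pipefd[] (read end [0], write end [1]).
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t gift_to_file(int pipefd[2], int out_fd, char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        struct iovec iov = { .iov_base = buf + done, .iov_len = len - done };

        /* SPLICE_F_GIFT hands the pages to the kernel. The call may queue
         * fewer bytes than requested when the pipe fills up. */
        ssize_t queued = vmsplice(pipefd[1], &iov, 1, SPLICE_F_GIFT);
        if (queued <= 0)
            return -1;

        /* Drain exactly what was queued before gifting more pages. */
        ssize_t left = queued;
        while (left > 0) {
            ssize_t n = splice(pipefd[0], NULL, out_fd, NULL,
                               (size_t)left, SPLICE_F_MOVE);
            if (n <= 0)
                return -1;
            left -= n;
        }
        done += (size_t)queued;
    }
    return (ssize_t)done;
}
```

A caller would allocate the buffer with posix_memalign() using the page
size from sysconf(_SC_PAGESIZE), fill it from the socket with read(), and
then pass it to the helper together with a pipe created by pipe() and the
output file descriptor.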