To: torvalds@linux-foundation.org
CC: miklos@szeredi.hu, jens.axboe@oracle.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
In-reply-to: (message from Linus Torvalds on Tue, 24 Jun 2008 10:30:21 -0700 (PDT))
Subject: Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
References: <20080621154607.154640724@szeredi.hu> <20080621154726.494538562@szeredi.hu> <20080624080440.GJ20851@kernel.dk> <20080624111913.GP20851@kernel.dk>
From: Miklos Szeredi
Date: Tue, 24 Jun 2008 20:24:16 +0200

> > OK it could be done, possibly at great pain. But why is it important?
> > What's the use case where it matters that splice-in should not block
> > on the read?
>
> If you're splicing from one file to another, the _goal_ you should have is
> that you want to have a mode where you can literally steal the page, and
> never _ever_ be IO-synchronous (well, meta-data accesses will be, you
> can't really avoid that sanely).
>
> IOW, it should be possible to do a
>
>  - splice() file->pipe with SPLICE_STEAL
>    don't even wait for the read to finish!
>
>  - splice() pipe->file
>    insert the page into the destination page cache, mark it dirty
>
> an no, we probably do not support that yet (for example, I wouldn't be
> surprised if "dirty + !uptodate" is considered an error for the VM even
> though the page should still be locked from the read), but it really was a
> design goal.

OK. But currently we have an implementation that

 1) doesn't do any of this, unless readahead is disabled

 2) if readahead is disabled, does the async splice-in (file->pipe), but
    blocks on splice-out (pipe->any)

 3) blocks on read(pipefd, ...) even if pipefd is set to O_NONBLOCK

And in addition, splice-in and splice-out can return a short count, or even
a zero count, if the filesystem invalidates the cached pages during the
splice (for example, because the data became stale).

Are these the right semantics? I'm not sure.

> Also, asynchronous is important even when you "just" want to overlap IO
> with CPU, so even if it's going to the network, then if you can delay the
> "wait for IO to complete" until the last possible moment (ie the _second_
> splice, when you end up copying it into an SKB, then both your throughput
> and your latency are likely going to be noticeably better, because you've
> now been able to do a lot of the costly CPU work (system exit + entry at
> the least, but hopefully a noticeable portion of the TCP stack too)
> overlapped with the disk seeking.

My feeling (and I'm not an expert in this area at all) is that disk seeking
will be many orders of magnitude slower than any CPU work associated with
getting the data out to the network.

> So asynchronous ops was really one of the big goals for splice.

Well, if it can be implemented right, I have nothing against that. But
what we currently have is very far from that, and it seems to me there
are very big hurdles yet to overcome.
Miklos
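For comparison with point 3 of the message, this is what O_NONBLOCK normally means on a pipe: a read() with no data available fails with EAGAIN instead of sleeping. A minimal sketch, assuming Python on a POSIX system; the helper read_nowait() is hypothetical. It shows only the ordinary case and does not reproduce the spliced-page blocking described above.

```python
import os
from typing import Optional


def read_nowait(fd: int, size: int) -> Optional[bytes]:
    """Read up to `size` bytes from fd, which must have O_NONBLOCK set.

    Returns None instead of sleeping when no data is available
    (read() failed with EAGAIN/EWOULDBLOCK). Hypothetical helper.
    """
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None


# Usage: an empty non-blocking pipe reports "no data" instead of blocking.
rd, wr = os.pipe()
os.set_blocking(rd, False)      # sets O_NONBLOCK on the read end
print(read_nowait(rd, 4096))    # None: pipe is empty
os.write(wr, b"data")
print(read_nowait(rd, 4096))    # b'data'
os.close(rd)
os.close(wr)
```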