Date: Thu, 31 Jul 2008 18:21:11 +0100
From: Jamie Lokier
To: Linus Torvalds
Cc: Miklos Szeredi, jens.axboe@oracle.com, akpm@linux-foundation.org,
    nickpiggin@yahoo.com.au, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch v3] splice: fix race with page invalidation
Message-ID: <20080731172111.GA23644@shareable.org>
References: <20080731001131.GA30900@shareable.org>
    <20080731004214.GA32207@shareable.org>
    <20080731061201.GA7156@shareable.org>

Linus Torvalds wrote:
> > , or (b) while sendfile claims those
> > pages, they are marked COW.
>
> .. and this one shows that you have no clue about performance of a memcpy.
>
> Once you do that COW, you're actually MUCH BETTER OFF just copying.
> Copying a page is much cheaper than doing COW on it.

That sounds familiar :-)

But did you miss the bit where you DON'T COPY ANYTHING EVER*?  COW is
able to provide _correctness_ for the rare corner cases which you're
not optimising for.  You don't actually copy more than 0.0% (*approx).

Copying is supposed to be _so_ rare, in this, that it doesn't count.

Correctness is so you can say "I've written that, when the syscall
returns I know what data the other end will receive, if it succeeds".
Knowing what can happen in what order is bread and butter around here,
you know how useful that can be.

The cost of COW is TLB flushes*.  But for splice, there ARE NO TLB
FLUSHES because such files are not mapped writable!  And you don't
intend to write the file soon either.  A program would be daft to use
splice _intending_ to do those things; it would obviously be a poor use
of the interface.  The kernel may as well copy the data if it did (and
it's in a good position to decide).

> Doing a "write()" really isn't that expensive. People think that
> memory is slow, but memory isn't all that slow, and caches work
> really well. Yes, memory is slow compared to a few reference count
> increments, but memory is absolutely *not* slow when compared to the
> overhead of TLB invalidates across CPUs etc.

You're missing the real point of network splice().

It's not just for speed.

It's for sharing data.  Your TCP buffers can share data, when the same
big lump is in flight to lots of clients.  Think static file / web /
FTP server, the kind with 80% of hits going to 0.01% of the files,
which are roughly the same size as your RAM.

You want network splice() for the same reason you want shared
libraries.  So that memory use scales better with some loads**.

You don't know how much good that will do, only, like shared libraries,
that it's intrinsically good if it doesn't cost anything.

And I'm suggesting that since no TLB flushes or COW copies are
expected, and you can just copy at sendfile time if the page is already
write-mapped anywhere, so the page references aren't complicated, it
shouldn't cost anything.

** - Admittedly this is rather load dependent.  But it's potentially
O(c*d) for write vs. O(d) for sendfile, hand-wavingly, where c is the
number of connections using d data.

(Then again, as I work out the figures, RAM is getting cheaper faster
than bandwidth-latency products are getting bigger...  It's not a
motivator except for cheapskates.  But that doesn't detract from the
intrinsic goodness.)
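
To put rough numbers on the O(c*d) vs. O(d) footnote, here is a
back-of-envelope sketch in Python.  The function names and the example
figures (1000 connections, 1 MiB in flight) are illustrative
assumptions, not measurements.  It also assumes every connection has
the same part of the same file in flight, and it ignores per-connection
bookkeeping.

```python
# Back-of-envelope model of buffer memory, following the footnote:
# c connections each have d bytes of the same file data in flight.
# All names and numbers here are illustrative assumptions.

MIB = 1024 * 1024


def copied_buffer_bytes(connections: int, in_flight: int) -> int:
    """write(): every connection holds its own private copy, O(c*d)."""
    return connections * in_flight


def shared_buffer_bytes(connections: int, in_flight: int) -> int:
    """sendfile()/splice(): connections reference the same page-cache
    pages, so the data is held once however many readers there are, O(d)."""
    return in_flight if connections > 0 else 0


if __name__ == "__main__":
    c, d = 1000, 1 * MIB
    print(copied_buffer_bytes(c, d) // MIB, "MiB buffered with write()")
    print(shared_buffer_bytes(c, d) // MIB, "MiB buffered with sendfile()")
```

With these assumed figures the copying model buffers c times as much as
the sharing model, which is the whole of the scaling argument; whether
the difference matters depends on the load.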
-- Jamie