Date: Thu, 31 Jul 2008 11:54:56 -0700 (PDT)
From: Linus Torvalds
To: Jamie Lokier
cc: Miklos Szeredi, jens.axboe@oracle.com, akpm@linux-foundation.org, nickpiggin@yahoo.com.au, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch v3] splice: fix race with page invalidation
In-Reply-To: <20080731172111.GA23644@shareable.org>

On Thu, 31 Jul 2008, Jamie Lokier wrote:
>
> But did you miss the bit where you DON'T COPY ANYTHING EVER*? COW is
> able provide _correctness_ for the rare corner cases which you're not
> optimising for. You don't actually copy more than 0.0% (*approx).

The thing is, just even _marking_ things COW is the expensive part. If we
have to walk page tables - we're screwed.

> The cost of COW is TLB flushes*. But for splice, there ARE NO TLB
> FLUSHES because such files are not mapped writable!

For splice, there are also no flags to set, no extra tracking costs, etc
etc.

But yes, we could make splice (from a file) do something like

 - just fall back to copy if the page is already mapped (page->mapcount
   gives us that)

 - set a bit ("splicemapped") when we splice it in, and increment
   page->mapcount for each splice copy.

 - if a "splicemapped" page is ever mmap'ed or written to (either through
   write or truncate), we COW it then (and actually move the page cache
   page - it would be a "woc": a reverse cow, not a normal one).

 - do all of this with page lock held, to make sure that there are no
   writers or new mappers happening.

So it's probably doable.

(We could have a separate "splicecount", and actually allow non-writable
mappings, but I suspect we cannot afford the space in the "struct page"
for a whole new count).

> You're missing the real point of network splice().
>
> It's not just for speed.
>
> It's for sharing data. Your TCP buffers can share data, when the same
> big lump is in flight to lots of clients. Think static file / web /
> FTP server, the kind with 80% of hits to 0.01% of the files roughly
> the same of your RAM.

Maybe. Does it really show up as a big thing?

		Linus
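
For illustration only, the scheme in the bulleted list above (fall back to
copy for already-mapped pages, mark and count pages spliced by reference,
and do a reverse COW when such a page is written) can be modelled in user
space roughly as follows. This is a single-threaded toy, not kernel code:
"struct toy_page", "struct toy_slot", "toy_splice" and "toy_write" are all
hypothetical names made up for the sketch, the page lock is left out
because nothing runs concurrently, and freeing of pages is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 16 /* tiny on purpose; real pages are much larger */

/* Toy stand-in for a page cache page. This is NOT the kernel's struct page. */
struct toy_page {
	char data[TOY_PAGE_SIZE];
	int mapcount;      /* mmap users, or outstanding splice references */
	bool splicemapped; /* hypothetical bit: page was spliced by reference */
};

/* One page cache slot: which page currently backs this file offset. */
struct toy_slot {
	struct toy_page *page;
};

enum toy_splice_result {
	TOY_SPLICE_FAILED,
	TOY_SPLICE_BY_COPY,
	TOY_SPLICE_BY_REF
};

/* Allocate a zeroed page and copy up to TOY_PAGE_SIZE bytes into it. */
static struct toy_page *toy_page_alloc(const char *bytes, size_t len)
{
	struct toy_page *pg = calloc(1, sizeof *pg);

	if (pg == NULL)
		return NULL;
	if (len > TOY_PAGE_SIZE)
		len = TOY_PAGE_SIZE;
	memcpy(pg->data, bytes, len);
	return pg;
}

/*
 * Splice the page behind a slot. A real implementation would hold the page
 * lock across this; the model is single-threaded, so locking is left out.
 */
static enum toy_splice_result toy_splice(struct toy_slot *slot,
					 struct toy_page **out)
{
	struct toy_page *pg = slot->page;

	if (pg->mapcount > 0 && !pg->splicemapped) {
		/* Already mmap'ed by someone: fall back to a plain copy. */
		*out = toy_page_alloc(pg->data, TOY_PAGE_SIZE);
		return *out ? TOY_SPLICE_BY_COPY : TOY_SPLICE_FAILED;
	}

	/* Share the page itself and count this splice reference. */
	pg->splicemapped = true;
	pg->mapcount++;
	*out = pg;
	return TOY_SPLICE_BY_REF;
}

/*
 * Write one byte through the page cache. If the current page is
 * splicemapped, do the reverse COW first: the cache moves to a fresh copy
 * and the splice holders keep the old, now frozen, page.
 */
static int toy_write(struct toy_slot *slot, size_t off, char byte)
{
	struct toy_page *pg = slot->page;

	assert(off < TOY_PAGE_SIZE);
	if (pg->splicemapped) {
		struct toy_page *fresh = toy_page_alloc(pg->data, TOY_PAGE_SIZE);

		if (fresh == NULL)
			return -1;
		slot->page = fresh;
		pg = fresh;
	}
	pg->data[off] = byte;
	return 0;
}
```

The point of the model is that the writer pays for the copy, and only when
a spliced page is actually modified; the splice path itself just sets a
flag and bumps a counter. Whether the real page cache could afford even
that is exactly the open question in the mail above.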