Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1756695AbYBDSWV (ORCPT ); Mon, 4 Feb 2008 13:22:21 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S1752369AbYBDSWK (ORCPT ); Mon, 4 Feb 2008 13:22:10 -0500
Received: from accolon.hansenpartnership.com ([76.243.235.52]:46739 "EHLO
    accolon.hansenpartnership.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1753196AbYBDSWI (ORCPT ); Mon, 4 Feb 2008 13:22:08 -0500
Subject: Re: Integration of SCST in the mainstream Linux kernel
From: James Bottomley
To: Vladislav Bolkhovitin
Cc: Bart Van Assche, Linus Torvalds, Andrew Morton, FUJITA Tomonori,
    linux-scsi@vger.kernel.org, scst-devel@lists.sourceforge.net,
    linux-kernel@vger.kernel.org
In-Reply-To: <47A751C5.60600@vlnb.net>
References: <1201639331.3069.58.camel@localhost.localdomain>
    <47A05CBD.5050803@vlnb.net> <47A7049A.9000105@vlnb.net>
    <1202139015.3096.5.camel@localhost.localdomain>
    <47A73C86.3060604@vlnb.net>
    <1202144767.3096.38.camel@localhost.localdomain>
    <47A7488B.4080000@vlnb.net>
    <1202145901.3096.49.camel@localhost.localdomain>
    <47A751C5.60600@vlnb.net>
Content-Type: text/plain
Date: Mon, 04 Feb 2008 12:22:02 -0600
Message-Id: <1202149322.3096.66.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Evolution 2.12.3 (2.12.3-1.fc8)
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3582
Lines: 86

On Mon, 2008-02-04 at 20:56 +0300, Vladislav Bolkhovitin wrote:
> James Bottomley wrote:
> > On Mon, 2008-02-04 at 20:16 +0300, Vladislav Bolkhovitin wrote:
> >
> >>James Bottomley wrote:
> >>
> >>>>>>So, James, what is your opinion on the above? Or the overall SCSI target
> >>>>>>project simplicity doesn't matter much for you and you think it's fine
> >>>>>>to duplicate Linux page cache in the user space to keep the in-kernel
> >>>>>>part of the project as small as possible?
> >>>>>
> >>>>>
> >>>>>The answers were pretty much contained here
> >>>>>
> >>>>>http://marc.info/?l=linux-scsi&m=120164008302435
> >>>>>
> >>>>>and here:
> >>>>>
> >>>>>http://marc.info/?l=linux-scsi&m=120171067107293
> >>>>>
> >>>>>Weren't they?
> >>>>
> >>>>No, sorry, it doesn't look so for me. They are about performance, but
> >>>>I'm asking about the overall project's architecture, namely about one
> >>>>part of it: simplicity. Particularly, what do you think about
> >>>>duplicating Linux page cache in the user space to have zero-copy cached
> >>>>I/O? Or can you suggest another architectural solution for that problem
> >>>>in the STGT's approach?
> >>>
> >>>
> >>>Isn't that an advantage of a user space solution? It simply uses the
> >>>backing store of whatever device supplies the data. That means it takes
> >>>advantage of the existing mechanisms for caching.
> >>
> >>No, please reread this thread, especially this message:
> >>http://marc.info/?l=linux-kernel&m=120169189504361&w=2. This is one of
> >>the advantages of the kernel space implementation. The user space
> >>implementation has to have data copied between the cache and user space
> >>buffer, but the kernel space one can use pages in the cache directly,
> >>without extra copy.
> >
> >
> > Well, you've said it thrice (the bellman cried) but that doesn't make it
> > true.
> >
> > The way a user space solution should work is to schedule mmapped I/O
> > from the backing store and then send this mmapped region off for target
> > I/O. For reads, the page gather will ensure that the pages are up to
> > date from the backing store to the cache before sending the I/O out.
> > For writes, You actually have to do a msync on the region to get the
> > data secured to the backing store.
>
> James, have you checked how fast is mmaped I/O if work size > size of
> RAM? It's several times slower comparing to buffered I/O. It was many
> times discussed in LKML and, seems, VM people consider it unavoidable.

Erm, but if you're using the case of work size > size of RAM, you'll
find buffered I/O won't help because you don't have the memory for
buffers either.

> So, using mmaped IO isn't an option for high performance. Plus, mmaped
> IO isn't an option for high reliability requirements, since it doesn't
> provide a practical way to handle I/O errors.

I think you'll find it does ... the page gather returns -EFAULT if
there's an I/O error in the gathered region. msync does something
similar if there's a write failure.

> > You also have to pull tricks with
> > the mmap region in the case of writes to prevent useless data being read
> > in from the backing store.
>
> Can you be more exact and specify what kind of tricks should be done for
> that?

Actually, just avoid touching it seems to do the trick with a recent
kernel.

James
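
For illustration only, the write side of the mmap approach discussed above
(store into a shared mapping, then msync the region so the data is secured to
the backing store, and treat an msync failure as a write error) might look
roughly like this from user space. This is a minimal sketch, not code from
STGT or SCST: the helper name mmap_write_sync() and its arguments are
hypothetical, it always writes at offset 0 and resizes the file to the write
length, and the kernel-side page gather for target I/O is not shown.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not an STGT/SCST API):
 * copy `len` bytes from `data` into the file at `path` through a shared
 * mapping, then msync() the region before reporting success.
 * Returns 0 on success or a negative errno value on failure.
 */
int mmap_write_sync(const char *path, const void *data, size_t len)
{
    int fd, ret = 0;
    void *map;

    if (len == 0)
        return -EINVAL;    /* mmap() rejects zero-length mappings */

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -errno;

    /* The file must be at least as long as the region we map. */
    if (ftruncate(fd, (off_t)len) < 0) {
        ret = -errno;
        goto out_close;
    }

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        ret = -errno;
        goto out_close;
    }

    /* Stores to a MAP_SHARED mapping dirty the page-cache pages. */
    memcpy(map, data, len);

    /* MS_SYNC waits for writeback; a write failure is reported here. */
    if (msync(map, len, MS_SYNC) < 0)
        ret = -errno;

    if (munmap(map, len) < 0 && ret == 0)
        ret = -errno;

out_close:
    if (close(fd) < 0 && ret == 0)
        ret = -errno;
    return ret;
}
```

A caller would check the return value, e.g. treat a nonzero result from
mmap_write_sync("/tmp/backing.img", buf, buf_len) as a failed write and
report it to the initiator; the path here is a placeholder.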