Subject: Re: [git patches] xfs and block fixes for virtually indexed arches
From: James Bottomley
To: Dave Chinner
Cc: FUJITA Tomonori, jens.axboe@oracle.com, torvalds@linux-foundation.org, tytso@mit.edu, kyle@mcmartin.ca, linux-parisc@vger.kernel.org, linux-kernel@vger.kernel.org, hch@infradead.org, linux-arch@vger.kernel.org
Date: Fri, 18 Dec 2009 08:10:39 +0100
Message-Id: <1261120239.3013.10.camel@mulgrave.site>
In-Reply-To: <20091218024440.GG4850@discord.disaster>
References: <20091217193648.GI4489@kernel.dk> <1261094220.2752.27.camel@mulgrave.site> <20091218095944G.fujita.tomonori@lab.ntt.co.jp> <20091218024440.GG4850@discord.disaster>

On Fri, 2009-12-18 at 13:44 +1100, Dave Chinner wrote:
> On Fri, Dec 18, 2009 at 10:00:21AM +0900, FUJITA Tomonori wrote:
> > On Fri, 18 Dec 2009 00:57:00 +0100
> > James Bottomley wrote:
> > 
> > > On Thu, 2009-12-17 at 20:36 +0100, Jens Axboe wrote:
> > > > On Thu, Dec 17 2009, Linus Torvalds wrote:
> > > > > 
> > > > > 
> > > > > On Thu, 17 Dec 2009, tytso@mit.edu wrote:
> > > > > > 
> > > > > > Sure, but there's some rumors/oral traditions going around that some
> > > > > > block devices want bio address which are page aligned, because they
> > > > > > want to play some kind of refcounting game,
> > > > > 
> > > > > Yeah, you might be right at that.
> > > > > 
> > > > > > And it's Weird Shit(tm) (aka iSCSI, AoE) type drivers, that most of us
> > > > > > don't have access to, so just because it works Just Fine on SATA doesn't
> > > > > > mean anything.
> > > > > > 
> > > > > > And none of this is documented anywhere, which is frustrating as hell.
> > > > > > Just rumors that "if you do this, AoE/iSCSI will corrupt your file
> > > > > > systems".
> > > > > 
> > > > > ACK. Jens?
> > > > 
> > > > I've heard those rumours too, and I don't even know if they are true.
> > > > Who has a pointer to such a bug report and/or issue? The block layer
> > > > itself does not have any such requirements, and the only places where
> > > > we play page games is for bio's that were explicitly mapped with pages
> > > > by itself (like mapping user data).
> > > 
> > > OK, so what happened is that prior to the map single fix
> > > 
> > > commit df46b9a44ceb5af2ea2351ce8e28ae7bd840b00f
> > > Author: Mike Christie
> > > Date:   Mon Jun 20 14:04:44 2005 +0200
> > > 
> > >     [PATCH] Add blk_rq_map_kern()
> > > 
> > > 
> > > bio could only accept user space buffers, so we had a special path for
> > > kernel allocated buffers. That commit unified the path (with a separate
> > > block API) so we could now submit kmalloc'd buffers via block APIs.
> > > 
> > > So the rule now is we can accept any user mapped area via
> > > blk_rq_map_user and any kmalloc'd area via blk_rq_map_kern(). We might
> > > not be able to do a stack area (depending on how the arch maps the
> > > stack) and we definitely cannot do a vmalloc'd area.
> > > 
> > > So it sounds like we only need a blk_rq_map_vmalloc() using the same
> > > techniques as the patch set and we're good to go.
> > 
> > I'm not sure about it.
> > 
> > As I said before (when I was against this 'adding vmalloc support to
> > the block layer' stuff), are there potential users of this except for
> > XFS? Is there anyone who does such a thing now?
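
For anyone following along, here is a rough illustration of what handing the block layer the underlying pages (rather than a vmalloc'd/vmap'd address) involves. It is a user-space C sketch, not kernel code: it assumes a 4096-byte page, and struct seg and split_into_page_segments() are made-up names. It walks a virtually contiguous range and cuts it at page boundaries into (page index, offset, length) triples, which is the shape of what would be fed, one page at a time, to something like bio_add_page().

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4096-byte pages. The kernel's PAGE_SIZE is arch-dependent. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for one bio segment: which page backs this
 * piece of the mapping, where in that page it starts, and its length. */
struct seg {
	size_t page_index;
	unsigned int offset;
	unsigned int len;
};

/*
 * Split the byte range [start, start + len) of a virtually contiguous
 * buffer (built from an array of pages) into per-page segments.
 * Returns the number of segments stored in out, or 0 if the range
 * would need more than max_segs segments (0 also for an empty range).
 */
size_t split_into_page_segments(size_t start, size_t len,
				struct seg *out, size_t max_segs)
{
	size_t n = 0;

	assert(out != NULL || max_segs == 0);

	while (len > 0) {
		unsigned int offset = (unsigned int)(start % SKETCH_PAGE_SIZE);
		size_t room = SKETCH_PAGE_SIZE - offset; /* bytes left in this page */
		size_t chunk = len < room ? len : room;

		if (n == max_segs)
			return 0; /* caller's segment array is too small */

		out[n].page_index = start / SKETCH_PAGE_SIZE;
		out[n].offset = offset;
		out[n].len = (unsigned int)chunk;
		n++;

		start += chunk;
		len -= chunk;
	}
	return n;
}
```

For example, a 10000-byte range starting 100 bytes into the mapping comes out as three segments: page 0 at offset 100 for 3996 bytes, page 1 at offset 0 for 4096 bytes, and page 2 at offset 0 for 1908 bytes. It deliberately ignores the cache-aliasing problem on virtually indexed arches, which is what the patches in this thread deal with.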
> 
> As Christoph already mentioned, XFS is not passing the vmalloc'd
> range to the block layer - it passes the underlying pages to the
> block layer. Hence I'm not sure there actually is anyone who is
> passing vmalloc'd addresses to the block layer. Perhaps we should
> put a WARN_ON() in the block layer to catch anyone doing such a
> thing before considering supporting vmalloc'd addresses in the block
> layer?

In the statements above, "vmalloc" is just shorthand for vmap/vmalloc
(basically any buffer with an additional kernel virtual mapping, which
creates aliases). If we support vmap, we naturally support vmalloc as
well.

> > This API might be useful for only journaling file systems using log
> > formats that need large contiguous buffers. Sounds like only XFS?
> 
> FWIW, mapped buffers larger than PAGE_SIZE are used for more than just log
> recovery in XFS. e.g. filesystems with a directory block size larger
> than the page size use mapped buffers.

However, XFS is the only filesystem that actually uses a kernel virtual
mapping to solve this problem.

James