From: Linus Torvalds
Date: Wed, 28 Nov 2012 12:13:27 -0800
Subject: Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
To: Mikulas Patocka
Cc: Jens Axboe, Jeff Chua, Lai Jiangshan, Jan Kara, lkml, linux-fsdevel
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 28, 2012 at 12:03 PM, Linus Torvalds wrote:
>
> mmap() is in *no* way special. The exact same thing happens for
> regular read/write. Yet somehow the mmap code is special-cased, while
> the normal read-write code is not.

I just double-checked, because it's been a long time since I actually
looked at the code.

But yeah, block device read/write uses the pure page cache functions.
IOW, it has the *exact* same IO engine as mmap() would have.

So here's my suggestion:

 - get rid of *all* the locking in aio_read/write and the splice paths

 - get rid of all the stupid mmap games

 - instead, add the locking to the functions that actually use
   "blkdev_get_block()" and "blkdev_get_blocks()" and nowhere else.

   That's a fairly limited number of functions:
   blkdev_{read,write}page(), blkdev_direct_IO() and
   blkdev_write_{begin,end}()

Doesn't that sound simpler? And more logical: it protects the actual
places that use the block size of the device.

I dunno.
Maybe there is some fundamental reason why the above is broken,
but it seems to be a much simpler approach. Sure, you need to
guarantee that the people who get the write lock cannot possibly cause
IO while holding it, but since the only reason to get the write lock
would be to change the block size, that should be pretty simple, no?

Yeah, yeah, I'm probably missing something fundamental, but the above
sounds like the simple approach to fixing things. The aim is to have
the block size read lock taken by the things that pass in the
block size itself.

It would be nice for things to be logical and straightforward.

                   Linus