From: Linus Torvalds
Date: Wed, 28 Nov 2012 12:03:23 -0800
Subject: Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
To: Mikulas Patocka
Cc: Jens Axboe, Jeff Chua, Lai Jiangshan, Jan Kara, lkml, linux-fsdevel
References: <20121120180949.GG1408@quack.suse.cz> <50AF7901.20401@kernel.dk> <50B46E05.70906@kernel.dk> <50B4B313.3030707@kernel.dk> <50B5CC5A.8060607@kernel.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 28, 2012 at 11:50 AM, Mikulas Patocka wrote:
>
> mmap_region() doesn't care about the block size. But a lot of
> page-in/page-out code does.

That seems a bogus argument.

mmap() is in *no* way special. The exact same thing happens for
regular read/write. Yet somehow the mmap code is special-cased, while
the normal read-write code is not.

I suspect it might be *easier* to trigger some issues with mmap, but
that still isn't a good enough reason to special-case it. We don't add
locking to one place just because that one place shows some race
condition more easily. We fix the locking.

So for example, maybe the code that *actually* cares about the buffer
size (the stuff that allocates buffers in fs/buffer.c) needs to take
that new percpu read lock. Basically, any caller of
"alloc_page_buffers()/create_empty_buffers()" or whatever.

I also wonder whether we need it *at*all*.
I suspect that we could easily have multiple block-sizes these days
for the same block device. It *used* to be (millions of years ago,
when dinosaurs roamed the earth) that the block buffers were global
and shared with all users of a partition. But that hasn't been true
since we started using the page cache, and I suspect that some of the
block size changing issues are simply entirely stale.

Yeah, yeah, there could be some coherency issues if people write to
the block device through different block sizes, but I think we have
those coherency issues anyway. The page-cache is not coherent across
different mapping inodes anyway.

So I really suspect that some of this is "legacy logic". Or at least
perhaps _should_ be.

                  Linus
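
A rough userspace sketch of the locking suggestion in the message above (take the read lock in the code that actually depends on the buffer size, rather than special-casing mmap). This is an analogy only, not kernel code: a POSIX rwlock stands in for the percpu read lock, and every name here (toy_bdev, toy_create_empty_buffers, toy_set_blocksize, TOY_PAGE_SIZE) is hypothetical and invented for illustration. The writer side, a block size change, is an assumption based on the "block size changing issues" the message mentions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a page size; not taken from the kernel. */
#define TOY_PAGE_SIZE 4096u

/* Hypothetical device: the rwlock guards block_size. */
struct toy_bdev {
    pthread_rwlock_t bs_lock;
    unsigned block_size;        /* bytes per buffer */
};

/* Result of carving one page into buffers of the current block size. */
struct toy_buffers {
    unsigned size;              /* block size the buffers were built for */
    unsigned count;             /* number of buffers per page */
};

void toy_bdev_init(struct toy_bdev *b, unsigned block_size)
{
    pthread_rwlock_init(&b->bs_lock, NULL);
    b->block_size = block_size;
}

/*
 * Reader side: the one place that really cares about the block size.
 * Every path that ends up here (read, write, or mmap page-in) is covered
 * by the same read lock, so no caller needs special treatment.
 */
struct toy_buffers toy_create_empty_buffers(struct toy_bdev *b)
{
    struct toy_buffers out;

    pthread_rwlock_rdlock(&b->bs_lock);
    out.size = b->block_size;
    assert(out.size != 0);
    out.count = TOY_PAGE_SIZE / out.size;
    pthread_rwlock_unlock(&b->bs_lock);
    return out;
}

/*
 * Writer side (assumed): changing the block size excludes all readers.
 * Accepts only powers of two between 512 and TOY_PAGE_SIZE.
 */
int toy_set_blocksize(struct toy_bdev *b, unsigned new_size)
{
    if (new_size < 512 || new_size > TOY_PAGE_SIZE ||
        (new_size & (new_size - 1)) != 0)
        return -1;

    pthread_rwlock_wrlock(&b->bs_lock);
    b->block_size = new_size;
    pthread_rwlock_unlock(&b->bs_lock);
    return 0;
}
```

Build with -pthread. The point of the sketch is where the lock sits: in the allocator that consumes the size, so every path into it gets the same protection.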