From: Linus Torvalds
Date: Thu, 29 Nov 2012 10:12:30 -0800
Subject: Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
To: Chris Mason, Linus Torvalds, Mikulas Patocka, Jens Axboe, Jeff Chua, Lai Jiangshan, Jan Kara, lkml, linux-fsdevel, Al Viro
In-Reply-To: <20121129175102.GA3490@shiny>
References: <20121129141249.GB30766@shiny> <20121129175102.GA3490@shiny>

On Thu, Nov 29, 2012 at 9:51 AM, Chris Mason wrote:
>
> The bigger question is do we have users that expect to be able to set
> the blocksize after mmaping the block device (no writes required)? I
> actually feel a little bad for taking up internet bandwidth asking, but
> it is a change in behaviour.

Yeah, it is.

That said, I don't think people will really notice. Nobody mmap's block
devices outside of some databases, afaik, and nobody sane mounts a
partition at the same time a DB is using it. So I think the new EBUSY
check is *ugly*, but I don't realistically believe that it is a
problem. The ugliness of the locking is why I'm not a huge fan of it,
but if it works I can live with it.

But yes, the mmap tests are new with the locking, and could in theory
be problematic if somebody reports that it breaks anything.

And like the locking, they'd just go away if we just do the fs/buffer.c
approach instead.
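For readers following the thread: the "new EBUSY check" being discussed refuses a block-size change while the block device is memory-mapped. The sketch below is a minimal user-space model of that rule only. `struct fake_bdev`, its `nr_mmaps` counter and `fake_set_blocksize()` are hypothetical names, not the code from the patch; the 512..4096 size range is an assumption chosen to resemble a typical soft block size check.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical block-device state, for illustration only. */
struct fake_bdev {
    unsigned int blkbits;   /* current soft block size, as log2(bytes) */
    unsigned int nr_mmaps;  /* number of live mmap()s of the device */
};

static int is_power_of_two(unsigned int x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical setter: validate the size, then refuse while mapped. */
int fake_set_blocksize(struct fake_bdev *bdev, unsigned int size)
{
    unsigned int bits = 0;

    if (size < 512 || size > 4096 || !is_power_of_two(size))
        return -EINVAL;

    /* The behaviour change asked about: mapped devices are now rejected. */
    if (bdev->nr_mmaps > 0)
        return -EBUSY;

    while ((1u << bits) < size)
        bits++;
    assert((1u << bits) == size);  /* size was a power of two */

    bdev->blkbits = bits;
    return 0;
}
```

A caller that used to change the block size after mapping the device would now get `-EBUSY` instead of success, which is the (probably rarely noticed) change in behaviour.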
Because doing things in fs/buffer.c simply means that we don't care
(and serialization is provided by the page lock on a per-page basis,
which is what mmap relies on anyway).

So doing the per-page fs/buffer.c approach (along with the
"ACCESS_ONCE()" on inode->i_blkbits to make sure we get *one*
consistent value, even if we don't care *which* value it is) would
basically revert to all the old semantics. The only thing it would
change is that we wouldn't see oopses.

(And in theory, it would allow us to actively mix-and-match different
block sizes for a block device, but realistically I don't think there
are any actual users of that - although I could imagine that a
filesystem would use a smaller block size for file tail-blocks etc,
and still want to use the fs/buffer.c code, so it's *possible* that it
would be useful, but filesystems have been able to do things like that
by just doing their buffers by hand anyway, so it's not really
fundamentally new, just a possible generalization of code)

                 Linus
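The per-page approach can be sketched as a small user-space model. This is not fs/buffer.c: `struct fake_inode`, `struct fake_page` and `fake_page_attach_buffers()` are hypothetical names, a 4096-byte page is assumed, and the caller is assumed to hold the page lock (not modelled here). The `ACCESS_ONCE()` macro has the same shape as the kernel's historical definition, a volatile access the compiler must perform exactly once, so the block size and the buffer count are both derived from a single snapshot of `i_blkbits`.

```c
#include <assert.h>

#define FAKE_PAGE_SHIFT 12u
#define FAKE_PAGE_SIZE  (1u << FAKE_PAGE_SHIFT)

/* Force exactly one read of x (volatile access). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-ins for the inode and page cache page. */
struct fake_inode {
    unsigned int i_blkbits;     /* may be changed concurrently by others */
};

struct fake_page {
    unsigned int buf_blkbits;   /* block size the buffers were built with; 0 = none */
    unsigned int nr_buffers;    /* buffers covering this page */
};

/*
 * Attach buffers to a page that has none yet, and return the block size
 * (log2) this page uses. Assumes the caller holds the page lock.
 */
unsigned int fake_page_attach_buffers(struct fake_page *page,
                                      struct fake_inode *inode)
{
    if (page->buf_blkbits == 0) {
        /* One snapshot: every derived value agrees, whichever value it is. */
        unsigned int blkbits = ACCESS_ONCE(inode->i_blkbits);

        assert(blkbits >= 9 && blkbits <= FAKE_PAGE_SHIFT);
        page->buf_blkbits = blkbits;
        page->nr_buffers = FAKE_PAGE_SIZE >> blkbits;
    }
    return page->buf_blkbits;
}
```

Because each page records the block size its buffers were built with, a later change to `i_blkbits` only affects pages that get buffers afterwards; existing pages keep their old size, which is the "mix-and-match" case mentioned in the last paragraph.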