Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1762603AbXEWNvv (ORCPT ); Wed, 23 May 2007 09:51:51 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755337AbXEWNvm (ORCPT ); Wed, 23 May 2007 09:51:42 -0400 Received: from zeniv.linux.org.uk ([195.92.253.2]:54971 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755150AbXEWNvl (ORCPT ); Wed, 23 May 2007 09:51:41 -0400 Date: Wed, 23 May 2007 14:51:40 +0100 From: Al Viro To: Miklos Szeredi Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, akpm@linux-foundation.org, torvalds@linux-foundation.org Subject: Re: [RFC PATCH] file as directory Message-ID: <20070523135140.GW4095@ftp.linux.org.uk> References: <20070523095127.GQ4095@ftp.linux.org.uk> <20070523102437.GS4095@ftp.linux.org.uk> <20070523113925.GT4095@ftp.linux.org.uk> <20070523121620.GU4095@ftp.linux.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4748 Lines: 91 On Wed, May 23, 2007 at 03:01:38PM +0200, Miklos Szeredi wrote: > Someone might think of a way to make those work with directories. > Invisible directory entries, anyone? Not unless you manage to get working union-mount [*NOT* unionfs] > > * invalidation on unlink is still an open problem. > > * locking in final mntput() doesn't look nice; we probably need > > a new refcounting scheme for vfsmounts to make that work. I have a variant > > that might work here (and make life much easier for expiry logics in > > automount/shared trees, which is what it had been initially proposed for), > > Which variant? We had that "detached subtrees" thing, is that it? Umm... It is related to detached subtrees, but I'm not sure if it is what you are thinking about. Short version of the story: new counter (mnt_busy) that would be defined in the following way: the number of external references (not due to the vfsmount tree structure or from namespace to root) + the number of children that have non-zero ->mnt_busy. And a per-vfsmount flag ("goner"). The rules for handling ->mnt_busy: * duplicating external reference: increment m->mnt_busy * getting from m to child: increment child->mnt_busy, if it went from 0 to non-zero - increment m->mnt_busy as well (that's done under vfsmount_lock, so we can safely check for zero here). * getting from m to parent: increment parent->mnt_busy. * dropping external reference: decrement m->mnt_busy; if it's still non-zero, we are done. If it's zero, we are in for some work (and had acquired vfsmount_lock by atomic_dec_and_lock()). Here's what we do: * go through ancestors, decrementing ->mnt_busy, until we hit the root or get to one with ->mnt_busy staying non-zero. * find the most remote ancestor that has zero ->mnt_busy and is marked as goner (might be m itself). * if no such beast exists, we are done. * otherwise, detach the subtree rooted in that ancestor from its parent (if any) and unhash its root (if hashed). Now there is no external references to any vfsmount in that subtree. * now we can kill all vfsmounts in that subtree. * detaching m from parent: nothing; we trade a busy child of parent for new external reference to parent. * lazy umount: in addition to detaching everything from parents and dropping resulting external references to parents, mark everything in the subtree as goners. * normal umount: check ->mnt_busy *and* lack of children, detach, mark as goner, drop resulting external reference to parent. * fun new stuff - umount of intact subtree: detach the subtree from parent, do *not* dissolve it, mark everything in subtree as goners. If something we mark as goner is not busy, we can kill it and all its descendents. The subtree will be shrinking as its pieces lose external references. * check for expirability: "we hold an external reference to m and m->mnt_busy is 1". No need to look into children, etc. * your vfsmounts: simply mark them goners from the very beginning. > > but it still doesn't kill the need to deal with invalidation. And > > yes, NFS still needs it (and so do all network filesystems, really). > > The question of caching is related to that. > > So what's so special about invalidation? Why not just treat > dir-on-file mounts the same as any other ref on the dentry? Because of the case of having something mounted in that subtree. The current code doesn't even try to evict such stuff. NFS *does*, but it's not in position to do that decently (not NFS fault, it's just that we don't have the data needed for it). Note that one problem we used to have back then is gone - namely, per-namespace semaphores. It's a global semaphore now, so we *can* do cross-namespace rogering of mount trees without that kind of locking horrors. What we really need is "go through dentry subtree, try to evict everything we can, for anything that has stuff mounted on it go through all such vfsmounts and kick them and all their descendents out". That's what should happen on invalidation. From generic code, so that NFS wouldn't have to bother. And _that_ is what we could call from ->unlink() on your inode - would take care of submounts. Note that I'm not all that happy about this scheme; we might make it work, but I still want to see a good use scenario for that kind of stuff. Invalidation logics is a separate story - it's simply needed for existing stuff; that area sucks *badly*, regardless of adding these hybrid objects. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/