Date: Tue, 21 Apr 2015 22:20:07 +0100
From: Al Viro
To: Richard Weinberger
Cc: Christoph Hellwig, Linus Torvalds, NeilBrown,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
Message-ID: <20150421212007.GU889@ZenIV.linux.org.uk>
In-Reply-To: <20150421154504.GT889@ZenIV.linux.org.uk>

On Tue, Apr 21, 2015 at 04:45:04PM +0100, Al Viro wrote:
> On Tue, Apr 21, 2015 at 05:12:01PM +0200, Richard Weinberger wrote:
>
> > I'm pretty sure we can kill it. I had the plan to rip it out during this
> > merge window along with other broken UML stuff but I was too late to ask
> > on the UML mailinglist if someone is using it (which I really doubt).
> > So, let's kill it with v4.2.
>
> Let's do it.  Then ->put_link() is left in an interesting situation - *all*
> instances only use the 'cookie' argument...

OK, so here's what we have:
	* a lot of filesystems are using page_follow_link_light(); for RCU
	  mode they should simply look for page and if it's there and
	  uptodate, that's it - just grab a reference and be done.  If it's
	  not uptodate - oh, well, fall back to non-RCU mode.
	  Corresponding ->put_link() doesn't give a damn which inode or
	  dentry it is - it's just page_cache_release() (we need to get rid
	  of that kmap() crap anyway).
	* a lot of fast symlinks are using only inode; no ->put_link(), no
	  blocking operations, etc.  No problem at all.
	* shmem would probably want something similar to what
	  page_follow_link_light() would be doing for RCU case.
	* befs: should switch to page_follow_link_light(); just a matter of
	  giving it proper ->readpage().
	* NFS: probably as in Neil's series, except that we really ought to
	  add a helper for what page_follow_link_light() would do in RCU
	  case, rather than open-coding it here (and, again, kmap/kunmap
	  crap should go)
	* /proc/self and its per-thread ilk: just do GFP_ATOMIC allocation
	  for RCU case (and handle failure as -ECHILD rather than usual
	  -ENOMEM).
	* proc_symlink() stuff: uses only inode, nothing blocking, no problem.
	* 9p, cifs and fuse: those always query server on ->follow_link();
	  -ECHILD and be done with that.  _IF_ they want some kind of
	  caching, they can do as NFS does.  hostfs is that way too.
	* gfs2: _probably_ want to bugger off with -ECHILD; OTOH, ocfs2 uses
	  page_follow_link_light(), maybe correctly, maybe not, and it ought
	  to have similar issues...
	* kernfs, configfs: -ECHILD.  And git rm is _very_ tempting after
	  reading that code...
	* lustre: hell knows, maybe always -ECHILD, maybe something like NFS.
	* XFS: see above.
	* hppfs: agreed to kill it off
	* autofs: not sure; it would be almost the usual fast symlink, if
	  not for the fact that it marks an object reached from dentry as
	  "used now".  With RCU pathwalk it's _probably_ harmless, but I'd
	  like a confirmation from autofs folks.
	* /proc/*/ns/*: in theory, we might make it handle RCU mode, but
	  it's probably easier to say "just bugger off"
	* /proc/*/fd/*, /proc/*/exe, /proc/*/cwd, /proc/*/root: in principle
	  doable, but not without serious massage.
	* /proc/*/map_files/*: -ECHILD.
	* overlayfs: usual "use GFP_ATOMIC in RCU mode, treat failures as
	  -ECHILD".
	* ecryptfs: -ECHILD (and its use of ->readlink() is fishy, IMO).

I agree that unlazy_walk() attempted when walking a symlink ought to fail
with -ECHILD; we can't legitimize the symlink itself, so once we are out of
RCU mode, there's nothing to hold the inode of symlink (and its body) from
getting freed.  Solution is wrong though; for example, when nested symlink
occurs in the middle of a trailing one, we should *not* remove the flag upon
leaving the nested symlink.

Another unpleasant thing is that ->follow_link() saying "can't do that in
RCU mode" ends up with restart from scratch - that actually risks being
worse than the mainline; there we would attempt unlazy_walk() and normally
it would've succeeded.  AFAICS, the real rule is "can't unlazy if
nd->last.name points into a symlink body and we might still need to access
it"...
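
For reference, the three recurring patterns in the list above (page cache lookup without blocking, GFP_ATOMIC allocation with failure reported as -ECHILD, and the unlazy rule from the last paragraph) can be modelled in plain userspace C.  This is only a rough sketch, not kernel code: struct model_page and every model_*() function below are hypothetical names made up for illustration, and the "blocking read" is reduced to setting a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page cache page holding a symlink body. */
struct model_page {
	bool present;		/* page is in the cache */
	bool uptodate;		/* contents are valid */
	int refcount;		/* references held by walkers */
	const char *body;	/* symlink target text */
};

/*
 * Model of a page-cache-backed ->follow_link().  In RCU mode nothing may
 * block: either the page is there and uptodate (grab a reference, done),
 * or we return -ECHILD so the caller falls back to non-RCU mode.
 */
static int model_page_follow_link(struct model_page *pg, bool rcu_mode,
				  const char **link)
{
	if (rcu_mode && !(pg->present && pg->uptodate))
		return -ECHILD;
	if (!rcu_mode) {
		/* Stand-in for a read that is allowed to block. */
		pg->present = true;
		pg->uptodate = true;
	}
	pg->refcount++;
	*link = pg->body;
	return 0;
}

/* Model of the matching ->put_link(): it only drops the page reference. */
static void model_page_put_link(struct model_page *pg)
{
	pg->refcount--;
}

/*
 * Model of "GFP_ATOMIC in RCU mode, treat failures as -ECHILD": a failed
 * atomic allocation asks for a retry in non-RCU mode instead of -ENOMEM.
 */
static int model_alloc_follow_link(bool rcu_mode, bool alloc_ok)
{
	if (alloc_ok)
		return 0;
	return rcu_mode ? -ECHILD : -ENOMEM;
}

/*
 * Model of the unlazy rule: leaving RCU mode is refused only while the
 * current name points into a symlink body that may still be needed.
 */
static bool model_may_unlazy(bool name_in_symlink_body, bool still_needed)
{
	return !(name_in_symlink_body && still_needed);
}
```

The point of returning -ECHILD rather than a hard error is that the caller can retry the same step with blocking allowed; what that retry costs (unlazy in place vs. restart from scratch) is exactly the open question above.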