Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755544AbbEEFWM (ORCPT ); Tue, 5 May 2015 01:22:12 -0400 Received: from zeniv.linux.org.uk ([195.92.253.2]:39588 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752764AbbEEFWI (ORCPT ); Tue, 5 May 2015 01:22:08 -0400 Date: Tue, 5 May 2015 06:22:05 +0100 From: Al Viro To: Linus Torvalds Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Christoph Hellwig , Neil Brown Subject: [RFC][PATCHSET] non-recursive pathname resolution Message-ID: <20150505052205.GS889@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9407 Lines: 197 My apologies for the patchbomb from hell, but I really think it needs a public review. There are several things in that queue; most of it is fs/namei.c rewrite and closer to the end we get * non-recursive link_path_walk() * the only symlink-related limitation is that there should be no more than 40 symlinks traversals, no matter how nested they are. * stack footprint independent from the nesting and about on par with mainline in "no symlinks" case. * some work towards the symlink traversal without dropping from RCU mode; not complete, but much better starting point than the current mainline. * IMO more readable logics around the pathname resolution in general, but then I'd spent the last couple of weeks messing with that code, so my perception might be seriously warped. Most of that stuff is mine, with exception of 3 commits from Neil's series (with some modifications). I've marked those with [N] in the list below. It really needs review and more testing; it *does* pass xfstests, LTP and local tests with no regressions, but at the very least it'll need careful profiling, as well as more eyes to read it through. The whole thing is based at -rc1. Individual patches are in followups; those who prefer to access it via git can fetch it at git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git link_path_walk Please, read and comment. Roadmap: * assorted preliminary patches: [1/79] 9p: don't bother with 4K allocation for 24-byte local array... [2/79] 9p: don't bother with __getname() in ->follow_link() [3/79][N] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link [4/79] ext4: split inode_operations for encrypted symlinks off the rest * getting rid of boilerplate for inline symlinks [5/79] libfs: simple_follow_link() [6/79] ext2: use simple_follow_link() [7/79] befs: switch to simple_follow_link() [8/79] ext3: switch to simple_follow_link() [9/79] ext4: switch to simple_follow_link() [10/79] jffs2: switch to simple_follow_link() [11/79] shmem: switch to simple_follow_link() [12/79] debugfs: switch to simple_follow_link() [13/79] ufs: switch to simple_follow_link() [14/79] ubifs: switch to simple_follow_link() [15/79] sysv: switch to simple_follow_link() [16/79] jfs: switch to simple_follow_link() [17/79] freevxfs: switch to simple_follow_link() [18/79] exofs: switch to {simple,page}_symlink_inode_operations [19/79] ceph: switch to simple_follow_link() * more assorted preparations: [20/79] logfs: fix a pagecache leak for symlinks [21/79][N] SECURITY: remove nameidata arg from inode_follow_link. [22/79] uninline walk_component() * getting trailing symlinks handling in do_last() into saner shape: [23/79] namei: take O_NOFOLLOW treatment into do_last() [24/79] do_last: kill symlink_ok [25/79] do_last: regularize the logics around following symlinks * getting rid of nameidata on stack during ->rmdir()/->unlink() and double nameidata during ->rename(): [26/79] namei: get rid of lookup_hash() [27/79] name: shift nameidata down into user_path_walk() * single instance of nameidata across the adjacent path_mountpoint() calls: [28/79] namei: lift nameidata into filename_mountpoint() At that point preparations end. The next one changes the calling conventions for ->follow_link/->put_link. We still are passing nameidata to ->follow_link(), but almost all instances do not use it anymore. Moreover, nd->saved_names[] is gone and so's nameidata in generic_readlink(). That one has the biggest footprint outside of fs/namei.c - it's a flagday change. [29/79] new ->follow_link() and ->put_link() calling conventions * some preparations for link_path_walk() massage: [30/79] namei.c: separate the parts of follow_link() that find the link body [31/79] namei: don't bother with ->follow_link() if ->i_link is set [32/79] namei: introduce nameidata->link [33/79] do_last: move path there from caller's stack frame * link_path_walk()/follow_link() rewrite. It's done in small steps, getting the damn thing into form that would allow to get rid of recursion: [34/79] namei: expand nested_symlink() in its only caller [35/79] namei: expand the call of follow_link() in link_path_walk() [36/79] namei: move the calls of may_follow_link() into follow_link() [37/79] namei: rename follow_link to trailing_symlink, move it down [38/79] link_path_walk: handle get_link() returning ERR_PTR() immediately [39/79] link_path_walk: don't bother with walk_component() after jumping link [40/79] link_path_walk: turn inner loop into explicit goto [41/79] link_path_walk: massage a bit more [42/79] link_path_walk: get rid of duplication [43/79] link_path_walk: final preparations to killing recursion * dumb and staightforward "put all variables we needed to be separate in each frame into an array of structs" killing of recursion. Array is still on stack frame of link_path_walk(), we are still with the same depth limits, etc. All of that will change shortly afterwards. [44/79] link_path_walk: kill the recursion * post-operation cleanup: [45/79] link_path_walk: split "return from recursive call" path [46/79] link_path_walk: cleanup - turn goto start; into continue; * array moves into nameidata, with extra (0th) element used instead of link/cookie pairs of local variables we used to have in trailing symlink loops: [47/79] namei: move link/cookie pairs into nameidata * more post-op cleanup - arguments of several functions are always the fields of the same array element or pointers to such: [48/79] namei: trim redundant arguments of trailing_symlink() [49/79] namei: trim redundant arguments of fs/namei.c:put_link() [50/79] namei: trim the arguments of get_link() * trim the array in nameidata to a couple of elements, use it until we need more and allocate a big (41-element) array when needed. Result: no more nesting limitations. [51/79] namei: remove restrictions on nesting depth * the use of array is seriously counterintuitive - for everything except the last component we are _not_ using the first element of array; moreover, way too many places are manipulating nd->depth. I've tried to sanitize all of that in one patch, but after several failures I ended up splitting the massage into smaller steps. The composition of those has ended up fairly small, but convincing yourself that it's correct will mean reporoducing this splitup ;-/ [52/79] link_path_walk: nd->depth massage, part 1 [53/79] link_path_walk: nd->depth massage, part 2 [54/79] link_path_walk: nd->depth massage, part 3 [55/79] link_path_walk: nd->depth massage, part 4 [56/79] trailing_symlink: nd->depth massage, part 5 [57/79] get_link: nd->depth massage, part 6 [58/79] trailing_symlink: nd->depth massage, part 7 [59/79] put_link: nd->depth massage, part 8 [60/79] link_path_walk: nd->depth massage, part 9 [61/79] link_path_walk: nd->depth massage, part 10 [62/79] link_path_walk: end of nd->depth massage [63/79] namei: we never need more than MAXSYMLINKS entries in nd->stack * getting more regular rules for calling terminate_walk()/put_link(). The reason for that part is (besides getting simpler logics in the end) that with RCU-symlink series we'll really need to have terminate_walk() trigger put_link() on everything we have on stack: [64/79] namei: lift (open-coded) terminate_walk() in follow_dotdot_rcu() into callers [65/79] lift terminate_walk() into callers of walk_component() [66/79] namei: lift (open-coded) terminate_walk() into callers of get_link() [67/79] namei: take put_link() into {lookup,mountpoint,do}_last() [68/79] namei: have terminate_walk() do put_link() on everything left [69/79] link_path_walk: move the OK: inside the loop * more put_link() work - basically, we are consolidating the "we decide that we'd found a symlink that needs to be followed" logics. Again, that's one of the areas RCU-symlink will need to modify: [70/79] namei: new calling conventions for walk_component() [71/79] namei: make should_follow_link() store the link in nd->link [72/79] namei: move link count check and stack allocation into pick_link() At that point the parts of the core work I consider reasonably stable are over. Next 4 commits are about not passing nameidata to ->follow_link() - it's the same situation as with pt_regs and IRQ handlers and the same kind of solution. Only 2 instances out of 26 use nameidata at all, and only to pass it to nd_jump_link(); seeing that we already have the "begin using this nameidata instance/end using it" pairs bracketing the areas where thread works with given instance (we need that for sane freeing of on-demand allocated nd->stack), the solution is obvious... [73/79] lustre: rip the private symlink nesting limit out [74/79][N] VFS: replace {, total_}link_count in task_struct with pointer to nameidata [75/79] namei: simplify the callers of follow_managed() [76/79] don't pass nameidata to ->follow_link() The rest is very preliminary bits and pieces for RCU-symlink; it's almost certainly going to be replaced and can be pretty much ignored for now: [77/79] namei: move unlazying on symlinks towards get_link() call sites [78/79] namei: new helper - is_stale() [79/79] namei: stretch RCU mode into get_link() -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/