Date: Wed, 22 Apr 2015 22:05:53 +0100
From: Al Viro
To: NeilBrown
Cc: Christoph Hellwig, Linus Torvalds, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint

On Wed, Apr 22, 2015 at 09:12:38PM +0100, Al Viro wrote:
> On Wed, Apr 22, 2015 at 07:07:59PM +0100, Al Viro wrote:
> > And one more: may_follow_link() is now potentially oopsable.  Look: suppose
> > we've reached the link in RCU mode, just as it got unlinked.  link->dentry
> > has become negative and may_follow_link() steps into
> >         /* Allowed if owner and follower match. */
> >         inode = link->dentry->d_inode;
> >         if (uid_eq(current_cred()->fsuid, inode->i_uid))
> >                 return 0;
> > Oops...  Incidentally, I suspect that your __read_seqcount_retry() in
> > follow_link() might be lacking a barrier; why isn't full read_seqcount_retry()
> > needed?
> >
> > FWIW, I would rather fetch ->d_inode *and* check ->seq prior to calling
> > get_link(), and pass inode to it as an explicit argument.  And pass it
> > to may_follow_link() as well...
>
> Hrm...  You know, something really weird is going on here.  Where are
> you setting nd->seq?  I don't see anything in follow_link() doing that.
> And nd->seq _definitely_ needs setting if you want to stay in RCU mode -
> at that point it matches the dentry of symlink, not that of nd->path
> (== parent directory).  Neil, could you tell me which kernel you'd been
> testing (ideally - commit ID in a public git tree), what config and what
> tests had those been?

FWIW, there's a wart that had been annoying me for quite a while, and it
might be related to dealing with that properly.  Namely, walk_component()
calling conventions.  We have walk_component(nd, &path, follow), which can
	* return -E..., and leave us with pathwalk terminated; path contains
junk, and so does nd->path.
	* return 0, update nd->path, nd->inode and nd->seq.  The contents of
path are in undefined state - it might be unchanged, it might be equal to
nd->path (and not pinned down, RCU mode or not).  In any case, callers do
not touch it afterwards.  That's the normal case.
	* return 1, update nd->seq, leave nd->path and nd->inode unchanged and
set path pointing to our symlink.  nd->seq matches path, not nd->path.

In all cases the original contents of path are ignored - it's purely an 'out'
parameter, but the compiler can't establish that on its own; it _might_ be
left untouched.  In all cases when its contents survive we don't look at it
afterwards, but proving that requires a non-trivial analysis.

And in case when we return 1 (== symlink to be followed), we bugger nd->seq.
It's left as we need it for unlazy_walk() (and after unlazy_walk() we don't
care about it at all), so currently everything works, but if we want to
stay in RCU mode for symlink traversal, we _will_ need ->d_seq of parent
directory.
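
To make the three outcomes above concrete, here is a minimal user-space sketch of that calling convention.  It is not kernel code: the toy_* types, the WALK_* names and the pre-resolved child argument (standing in for the lookup of the next component) are all assumptions made for illustration.  It only shows which fields each return value touches, including the overwrite of nd->seq in the symlink case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT the kernel definitions. */
struct toy_dentry {
	int is_symlink;
	unsigned d_seq;		/* stand-in for the ->d_seq sample */
};

struct toy_path {
	struct toy_dentry *dentry;
};

struct toy_nameidata {
	struct toy_path path;	/* where the walk currently is */
	unsigned seq;		/* d_seq sample the walk relies on */
};

enum { WALK_DONE = 0, WALK_SYMLINK = 1 };	/* hypothetical names */

/*
 * 'child' is the already looked-up next component (NULL == lookup failed).
 * 'path' is purely an out parameter, written only in the symlink case here.
 */
static int toy_walk_component(struct toy_nameidata *nd,
			      struct toy_path *path,
			      struct toy_dentry *child)
{
	if (!child)
		return -ENOENT;		/* walk terminated; *path not written */

	if (child->is_symlink) {
		path->dentry = child;	/* out parameter names the symlink */
		nd->seq = child->d_seq;	/* parent's sample is lost here */
		return WALK_SYMLINK;	/* nd->path stays at the parent */
	}

	nd->path.dentry = child;	/* normal case: step into child */
	nd->seq = child->d_seq;
	return WALK_DONE;
}
```

After a WALK_SYMLINK return in this model, nd->seq describes the symlink while nd->path still names the parent, which is exactly the mismatch described above.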

I wonder if the right way to solve that would be to drop the path argument
entirely and store the bugger in nameidata.  As in
	union {
		struct qstr last;
		struct path link;
	};
	...
	union {
		int last_type;
		unsigned link_seq;
	};
in struct nameidata.  We never need both at the same time; after
walk_component() (or its analogue in do_last()) we don't need the component
name anymore.  That way walk_component() would not trash nd->seq when
finding a symlink...

It would also shrink the stack footprint a bit - local struct path next in
link_path_walk() would be gone, along with the same thing in path_lookupat()
and friends.  Not a lot of win (4 pointers total), but it might be enough
to excuse putting ->d_seq of root in there, along with
->link.dentry->d_inode, to avoid rechecking its ->d_seq.  As a matter of
fact, we do have an odd number of 32bit fields in there, so ->d_seq of root
would fit nicely...

Comments?
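
For reference, the union layout proposed above can be modelled in user space roughly as follows.  This is only a sketch under stated assumptions: toy_qstr and toy_path are simplified stand-ins for struct qstr and struct path, and toy_record_symlink() is a hypothetical helper that does not exist in the kernel.  The point is that recording the symlink in nameidata leaves nd->seq (the parent's sample) alone.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; NOT the kernel's struct qstr / struct path. */
struct toy_qstr {
	unsigned hash;
	unsigned len;
	const unsigned char *name;
};

struct toy_path {
	void *mnt;
	void *dentry;
};

/* Anonymous unions need C11 (or the GNU extension). */
struct toy_nameidata {
	union {
		struct toy_qstr last;	/* component name, before lookup */
		struct toy_path link;	/* symlink found by the lookup */
	};
	union {
		int last_type;		/* used together with 'last' */
		unsigned link_seq;	/* used together with 'link' */
	};
	unsigned seq;			/* keeps matching the parent */
};

/*
 * Hypothetical helper: record a symlink without touching nd->seq.
 * Overwrites 'last'/'last_type', which are no longer needed by then.
 */
static void toy_record_symlink(struct toy_nameidata *nd,
			       void *mnt, void *dentry, unsigned d_seq)
{
	nd->link.mnt = mnt;
	nd->link.dentry = dentry;
	nd->link_seq = d_seq;
}
```

Since 'last' and 'link' share storage in this model, the separate local struct path is no longer needed, which is where the stack saving mentioned above would come from.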