Date: Wed, 13 May 2015 23:25:33 +0100
From: Al Viro
To: Linus Torvalds
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Christoph Hellwig, Neil Brown
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks
Message-ID: <20150513222533.GA24192@ZenIV.linux.org.uk>
References: <20150505052205.GS889@ZenIV.linux.org.uk> <20150511180650.GA4147@ZenIV.linux.org.uk>
In-Reply-To: <20150511180650.GA4147@ZenIV.linux.org.uk>

More on top of the current vfs.git#for-next (== the posted patchset with a couple of fixes): more fs/namei.c reorganization and stack footprint reduction (below 1Kb now). One interesting piece of that is that we don't touch current->fs->lock anymore - unlazy_walk() used to, but now we can get rid of that.

FWIW, at that point I'm starting to seriously look into a primitive that would take the usual dfd+name+flags and (path x inode x bool -> int) callback (since we don't have closures, it'd have to be

    int filename_apply(int dfd, struct filename *name, unsigned flags,
                       int (*act)(struct path *path, struct inode *inode,
                                  bool may_block, void *ctx),
                       void *ctx);

) with lookup done and if it ends up at something positive, act() called for it. If we end up reaching the very end in RCU mode, act() gets called with false as the third argument, _without_ dropping rcu_read_lock() or grabbing references.
It may return -ECHILD, in which case we'll unlazy and call it again with may_block being true; if it does *not* return -ECHILD, we'll check for mount lock and d_seq still being valid. If they are, we are done, if not - restart the lookup from scratch in non-lazy mode.

Basically, that's your "could we get stat(2) without ever dirtying anything shared?" thing, except that it might be applicable to some of the getxattr(), statfs(), access(), listxattr() and readlink() as well.

The obstacles for stat() are

* ->d_weak_revalidate() needs to be taught about being called in RCU mode. Not a problem - flags are already passed and one of two instances is already checking for LOOKUP_RCU (what with being ->d_revalidate() at the same time). That one applies to all of them, not just stat().

* Linux S&M with its usual habit of sticking hooks into every orifice out there (and if there hadn't been one, the hook still goes in, of course). In this case it's not just selinux, as with follow_link - apparmor, tomoyo and smack are also there. selinux one looks like it could be made to work if given an inode and may_block in addition to struct path; the rest... no idea.

* telling ->getattr() that we are in RCU mode. And giving it inode, of course. As a first approximation, we could live with just the "if ->getattr isn't NULL, chicken out and return -ECHILD", but e.g. ext4, btrfs and xfs have non-NULL ->getattr(). In this case I wonder if adding a new method wouldn't be the right thing...

Overall, it seems to be doable, and with the results of massage already done to fs/namei.c the PITA promises to be fairly limited. How generic do we really want it? I mean, is e.g. access(2) (faccessat(2)) worth bothering with? Or getxattr(2), for that matter...

Comments?
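
For reference, the retry protocol described above for act() can be modelled roughly as follows. This is only a userspace sketch, not kernel code and not the proposed implementation: struct path and struct inode are stand-in types, and struct walk_state, apply_sketch(), read_size() and needs_blocking() are hypothetical names made up for illustration. The "mount lock and d_seq still valid" check is reduced to a boolean, and the restart from scratch is modelled as simply calling act() again with may_block set.

```c
/* Userspace model of the calling convention sketched above; NOT kernel code.
 * Every type and helper below is a hypothetical stand-in. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct inode { long size; };
struct path  { struct inode *inode; };

/* Callback shape from the proposal: (path, inode, may_block, ctx) -> int */
typedef int (*act_fn)(struct path *path, struct inode *inode,
                      bool may_block, void *ctx);

struct walk_state {
    bool seq_still_valid;  /* stands in for "mount lock and d_seq still valid" */
    int  rcu_calls;        /* act() invocations with may_block == false */
    int  blocking_calls;   /* act() invocations with may_block == true */
};

static int apply_sketch(struct walk_state *ws, struct path *path,
                        act_fn act, void *ctx)
{
    int err;

    assert(ws && path && path->inode && act);

    /* Lazy attempt: no references grabbed, nothing shared is dirtied. */
    ws->rcu_calls++;
    err = act(path, path->inode, false, ctx);

    if (err == -ECHILD) {
        /* act() refuses to run without blocking: unlazy, call it again. */
        ws->blocking_calls++;
        return act(path, path->inode, true, ctx);
    }

    /* Any other result counts only if nothing changed under us. */
    if (ws->seq_still_valid)
        return err;

    /* Stale: a real version would redo the whole lookup in non-lazy mode;
     * here that is modelled as one more call that is allowed to block. */
    ws->blocking_calls++;
    return act(path, path->inode, true, ctx);
}

/* Example callback that is happy to run in the lazy mode. */
static int read_size(struct path *path, struct inode *inode,
                     bool may_block, void *ctx)
{
    (void)path; (void)may_block;
    *(long *)ctx = inode->size;
    return 0;
}

/* Example callback that always bails out of the lazy mode. */
static int needs_blocking(struct path *path, struct inode *inode,
                          bool may_block, void *ctx)
{
    (void)path;
    if (!may_block)
        return -ECHILD;
    *(long *)ctx = inode->size;
    return 0;
}
```

With seq_still_valid set, read_size() runs exactly once and never in blocking mode, which is the "nothing shared gets dirtied" case; needs_blocking() models something like a filesystem with a non-NULL ->getattr() under the "chicken out and return -ECHILD" approximation.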

Anyway, additional pieces of the series follow:

    namei: unlazy_walk() doesn't need to mess with current->fs anymore
    lustre: kill unused macro (LOOKUP_CONTINUE)
    lustre: kill unused helper
    get rid of assorted nameidata-related debris
    [Neil's] Documentation: remove outdated information from automount-support.txt
    namei: be careful with mountpoint crossings in follow_dotdot_rcu()
    namei: uninline set_root{,_rcu}()
    namei: pass the struct path to store the result down into path_lookupat()
    namei: move putname() call into filename_lookup()
    namei: shift nameidata inside filename_lookup()
    namei: make filename_lookup() reject ERR_PTR() passed as name
    namei: shift nameidata down into filename_parentat()
    namei: saner calling conventions for filename_create()
    namei: saner calling conventions for filename_parentat()
    namei: fold path_cleanup() into terminate_walk()
    namei: stash dfd and name into nameidata
    namei: trim do_last() arguments
    inline user_path_parent()
    inline user_path_create()
    namei: move saved_nd pointer into struct nameidata
    turn user_{path_at,path,lpath,path_dir}() into static inlines