Date: Sat, 16 Apr 2016 01:52:32 +0100
From: Al Viro
To: Linus Torvalds
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCHSET][RFC][CFT] parallel lookups
Message-ID: <20160416005232.GV25498@ZenIV.linux.org.uk>

The thing appears to be working.  It's in vfs.git#work.lookups; the last
5 commits are the infrastructure (fs/namei.c and fs/dcache.c; no changes
in fs/*/*) + actual switch to rwsem.

The missing bits: down_write_killable() (there had been a series posted
introducing just that; for now I've replaced mutex_lock_killable() calls
with plain inode_lock() - they are not critical for any testing and as
soon as down_write_killable() gets there I'll replace those), lockdep
bits might need corrections and right now it's only for lookups.

I'm going to add readdir to the mix; the primitive added in this series
(d_alloc_parallel()) will need to be used in dcache pre-seeding paths,
ncpfs use of dentry_update_name_case() will need to be changed to
something less hacky and syscalls calling iterate_dir() will need to
switch to fdget_pos() (with FMODE_ATOMIC_POS set for directories as well
as regulars).  The last bit is needed for exclusion on struct file level -
there's a bunch of cases where we maintain data structures hanging off
file->private and those really need to be serialized.
Besides, serializing ->f_pos updates is needed for sane semantics; right
now we tend to use ->i_mutex for that, but it would be easier to go for
the same mechanism as for regular files.  With any luck we'll have working
parallel readdir in addition to parallel lookups in this cycle as well.

The patchset is on top of switching getxattr to passing dentry and inode
separately; that part will get changes (in particular, the stuff agruen
has posted lately), but the lookups queue proper cares only about being
able to move security_d_instantiate() to the point before dentry is
attached to inode.

1/15: security_d_instantiate(): move to the point prior to attaching
dentry to inode.
	Depends on getxattr changes; allows doing the "attach to inode" and
	"add to dentry hash" parts without dropping ->d_lock in between.

2/15 -- 8/15: preparations - stuff similar to what went in during the
last cycle; several places switched to lookup_one_len_unlocked(), a bunch
of direct manipulations of ->i_mutex replaced with inode_lock() and
similar helpers.
	kernfs: use lookup_one_len_unlocked().
	configfs_detach_prep(): make sure that wait_mutex won't go away
	ocfs2: don't open-code inode_lock/inode_unlock
	orangefs: don't open-code inode_lock/inode_unlock
	reiserfs: open-code reiserfs_mutex_lock_safe() in reiserfs_unpack()
	reconnect_one(): use lookup_one_len_unlocked()
	ovl_lookup_real(): use lookup_one_len_unlocked()

9/15: lookup_slow(): bugger off on IS_DEADDIR() from the very beginning
	Open-code the real_lookup() call in lookup_slow(), move the
	IS_DEADDIR check upwards.

10/15: __d_add(): don't drop/regain ->d_lock
	That's what 1/15 had been for; might make sense to reorder closer
	to it.

11/15 -- 14/15: actual machinery for parallel lookups.
	This stuff could've been a single commit, along with the actual
	switch to rwsem and shared lock in lookup_slow(), but it's easier
	to review if carved up like that.
	From the testing POV it's one chunk - it is bisect-safe, but the
	added code really comes into play only after we go for shared lock,
	which happens in 15/15.  That's the core of the series.
	beginning of transition to parallel lookups - marking in-lookup dentries
	parallel lookups machinery, part 2
	parallel lookups machinery, part 3
	parallel lookups machinery, part 4 (and last)

15/15: parallel lookups: actual switch to rwsem

Note that filesystems would be free to switch some of their own uses of
inode_lock() to grabbing it shared - it's really up to them.  This series
deals only with directory locking, but the field has become an rwsem for
all inodes.  XFS folks in particular might be interested in using it...

I'll post the individual patches in followups.  Again, this is also
available in vfs.git#work.lookups (head at e2d622a right now).  The thing
survives LTP and xfstests without regressions, but more testing would
certainly be appreciated.  So would review, of course.