Date: Sat, 13 Nov 2010 10:17:05 +1100
From: Nick Piggin
To: Linus Torvalds
Cc: Nick Piggin, Nick Piggin, Eric Dumazet, Al Viro, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Dave Chinner
Subject: Re: [patch 1/6] fs: icache RCU free inodes
Message-ID: <20101112231705.GB3317@amd>

On Fri, Nov 12, 2010 at 09:33:11AM -0800, Linus Torvalds wrote:
> On Thu, Nov 11, 2010 at 10:49 PM, Nick Piggin wrote:
> >
> > In reality, it's likely to be well under 0.1% in any real workload, even
> > an inode intensive one. So I much prefer to err on the side of less
> > complexity, to start with. There just isn't much risk of regression
> > AFAIKS, and much more risk of becoming unmaintainable too complex.
>
> Well, I have to say that if we don't get this lockless path lookup
> thing merged in the next merge window (ie 38-rc1), I'm going to be
> personally very disappointed (*).

I'm trying to piece things together. I'll hopefully be able to post
patches again soon for review.

> So yes, the "initial complexity" argument is certainly acceptable to
> me.
> It does make me suspect something is wrong, though, because quite
> frankly, the actual accesses to the inode during the lockless walk
> should be very _very_ controlled anyway. And it's trivial to do a "is
> this inode still the same one I started with" with zero locking, by
> just checking that "dentry->d_inode" is the same after-the-fact and
> checking that the dentry is still hashed. The inode type had better
> _NOT_ change if the dentry pointer is still there.
>
> So even if the type or i_ops changes, none of that should matter in
> the least. Nobody should _care_. We might get two wildly different
> results, but we have a trivial way to check whether the inode was
> stable after-the-fact, and just punt if it wasn't. So it really smells
> like if this is an issue, there's something wrong going on.

Yes, you are very right about that: it is actually possible to use
seqlocks and re-check things to verify them after the fact. That is
why I'm optimistic that we can tackle any and all regressions that
come up.

An example of where it can get more complicated: a filesystem has an
->op function which gets the sb from inode->i_sb, and then does the
container_of() thing to get the filesystem-specific superblock so it
can check flags to determine something (e.g. whether it is case
sensitive or not). If the inode goes away and its memory is reused,
i_sb can change under us, and this can oops.

We basically just need to further tighten the rules and further audit
everyone. I'm not saying it can't be done, I'm just saying it's not
_totally_ trivial like the usual DESTROY_BY_RCU pattern, so let's just
see what the incremental patches look like.

I'm glad you agree at this point (and if it does turn out to be much
simpler than I anticipate, then hey, that's great: we can just move to
DESTROY_BY_RCU even quicker).
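For readers following along, here is a minimal userspace sketch of the "check it after the fact and just punt" idea, assuming a per-dentry sequence counter in the style of the kernel's seqcount_t (read_seqcount_begin()/read_seqcount_retry()). Everything in it is hypothetical: model_dentry, model_inode, model_set_inode() and model_lookup_mode() are illustrative names, not kernel code; C11 <stdatomic.h> stands in for the kernel primitives; and the RCU freeing that keeps the inode memory valid during the walk is assumed rather than modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct inode and struct dentry. */
struct model_inode {
	atomic_uint i_mode;		/* read speculatively by the walker */
};

struct model_dentry {
	atomic_uint d_seq;		/* even: stable, odd: update in progress */
	_Atomic(struct model_inode *) d_inode;
	atomic_bool d_hashed;
};

static void model_dentry_init(struct model_dentry *d, struct model_inode *ino)
{
	atomic_init(&d->d_seq, 0u);
	atomic_init(&d->d_inode, ino);
	atomic_init(&d->d_hashed, true);
}

/* Writer side (think rename/unlink); assumed to be serialised by a lock. */
static void model_set_inode(struct model_dentry *d, struct model_inode *ino,
			    bool hashed)
{
	unsigned int s = atomic_load_explicit(&d->d_seq, memory_order_relaxed);

	atomic_store_explicit(&d->d_seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->d_inode, ino, memory_order_relaxed);
	atomic_store_explicit(&d->d_hashed, hashed, memory_order_relaxed);
	atomic_store_explicit(&d->d_seq, s + 2, memory_order_release);
}

/* Reader side, analogous to read_seqcount_begin()/read_seqcount_retry(). */
static unsigned int model_read_begin(struct model_dentry *d)
{
	return atomic_load_explicit(&d->d_seq, memory_order_acquire);
}

static bool model_read_retry(struct model_dentry *d, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);
	return (start & 1u) ||
	       atomic_load_explicit(&d->d_seq, memory_order_relaxed) != start;
}

/*
 * Lockless step: read d_inode and i_mode with no locks, then check after
 * the fact that the dentry was not changed or unhashed in the meantime.
 * Returns false when the caller should punt to the locked slow path.
 */
static bool model_lookup_mode(struct model_dentry *d, unsigned int *mode)
{
	unsigned int seq = model_read_begin(d);
	struct model_inode *ino =
		atomic_load_explicit(&d->d_inode, memory_order_relaxed);
	bool hashed = atomic_load_explicit(&d->d_hashed, memory_order_relaxed);
	unsigned int m;

	if (ino == NULL || !hashed)
		return false;
	/* Assumes the inode memory stays valid (RCU-freed); not modelled. */
	m = atomic_load_explicit(&ino->i_mode, memory_order_relaxed);
	if (model_read_retry(d, seq))
		return false;
	*mode = m;
	return true;
}
```

A caller would try model_lookup_mode() first and fall back to the locked lookup when it returns false. Note what the final check does not cover: everything between model_read_begin() and model_read_retry() still runs on possibly stale data, so it must not oops on its own. That is the i_sb/container_of() hazard above if inode memory can be reused for another filesystem's inode.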
Thanks,
Nick