Date: Sun, 2 Aug 2015 11:53:16 -0700 (PDT)
From: Hugh Dickins
To: Linus Torvalds, Al Viro
cc: Hugh Dickins, Dominique Martinet, "J. Bruce Fields", Dominique Martinet, Linux Kernel Mailing List, linux-fsdevel, David Howells
Subject: Re: [git pull] vfs.git spurious ENOTDIR fix
References: <20150731205036.GA3752@nautica> <20150801072603.GV17109@ZenIV.linux.org.uk> <20150802001402.GY17109@ZenIV.linux.org.uk> <20150802002318.GZ17109@ZenIV.linux.org.uk> <20150802014139.GA17109@ZenIV.linux.org.uk>

On Sat, 1 Aug 2015, Linus Torvalds wrote:
> On Sat, Aug 1, 2015 at 9:06 PM, Hugh Dickins wrote:
> >
> > (I don't actually understand why the clearing of DCACHE_ENTRY_TYPE in
> > dentry_iput() is not of continuing concern; but don't worry, there's
> > plenty I don't understand - so long as you're both satisfied that
> > it's not a concern, no need to persuade me.)
>
> So dentry_iput() is only called as the dentry is being thrown away,
> and is stale.
>
> Yes, such a stale dentry can be seen by an RCU lookup, but the RCU
> lookups should always revalidate things after the lookup, so it
> shouldn't matter.
> The problem here was that there was a missing
> revalidate of the RCU lookup for an error case, so the error that
> _should_ have been a harmless race that got handled later by the
> proper validation instead turned into a real user-visible error.

Thank you both for leading me through that: I really should have
rechecked the sequence count invalidation in the source for myself
(I had a wrong picture of it in my head), before inserting that
parenthesis and taking your time over it; but had been in a hurry
to get a response back.

>
> But we didn't use to clear the flags in dentry_iput, so before things
> generally "happened to work" anyway, because this rare error case
> didn't actually ever trigger in the first place.
>
> (And I still don't think we necessarily *should* clear the flags in
> dentry_iput(), but it really shouldn't be a correctness issue)
>
> > Do we have any idea why a bug introduced in v3.13 should only now
> > stand out, both for Dominique and for me?  Has the RCU lookup somehow
> > become much more effective recently?
>
> So I do think that the clearing of the dentry flags exposed a
> situation that was harder to hit before.

Right, that does indeed make sense of why it appeared now.

I cannot actually report success from yesterday's testing, since it
hung after 20 hours for, I believe, the same unrelated reason that I
ran into before.  I mentioned jbd2 last time, but I doubt that's at
fault: it's almost certainly an issue with recent vmscan changes and/or
recent loop changes - the business of page reclaim waiting on page
writeback has always been tricky and fragile and deadlock-prone, the
more so when loop is involved: probably the balance has got shifted
slightly by recent changes, I'll look into it (but definitely not rc5
material).

Thanks,
Hugh
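
The revalidation rule discussed in this thread (an optimistic lookup must
recheck its sequence count before trusting what it read, including on the
error path) can be illustrated with a small user-space sketch. This is not
kernel code: struct toy_dentry, toy_kill(), toy_walk_step() and toy_demo()
are hypothetical names invented for illustration, the concurrent teardown is
simulated by a flag rather than a real thread, and the toy writer simply
bumps a counter. The one detail borrowed from the kernel is the convention
that an RCU-mode walk returns -ECHILD to ask for a retry in the slower
reference-counted mode.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dentry: a sequence count plus a type flag. */
struct toy_dentry {
    unsigned seq;   /* changes whenever the entry is modified */
    bool is_dir;    /* stand-in for the DCACHE_ENTRY_TYPE bits */
};

/* Toy writer: throwing the entry away invalidates seq and clears the type. */
static void toy_kill(struct toy_dentry *d)
{
    d->seq += 2;        /* any earlier sample of seq is now stale */
    d->is_dir = false;  /* the flag clearing that exposed the race */
}

/*
 * One optimistic walk step.
 * race:              simulate a teardown landing after seq was sampled.
 * validate_on_error: true  = recheck seq before reporting ENOTDIR (fixed),
 *                    false = report ENOTDIR unchecked (the buggy shape).
 */
static int toy_walk_step(struct toy_dentry *d, bool race, bool validate_on_error)
{
    unsigned seq = d->seq;      /* sample the sequence count first */

    if (race)
        toy_kill(d);            /* "concurrent" teardown */

    bool is_dir = d->is_dir;    /* possibly stale read */

    if (!is_dir) {
        if (validate_on_error && d->seq != seq)
            return -ECHILD;     /* stale: ask for a slow-path retry */
        return -ENOTDIR;        /* genuine, or spurious if unchecked */
    }
    if (d->seq != seq)
        return -ECHILD;         /* success path was always revalidated */
    return 0;
}

/* Usage: the same race yields a spurious error only without the recheck. */
void toy_demo(void)
{
    struct toy_dentry d = { .seq = 0, .is_dir = true };
    assert(toy_walk_step(&d, false, true) == 0);         /* no race */

    d = (struct toy_dentry){ .seq = 0, .is_dir = true };
    assert(toy_walk_step(&d, true, false) == -ENOTDIR);  /* buggy shape */

    d = (struct toy_dentry){ .seq = 0, .is_dir = true };
    assert(toy_walk_step(&d, true, true) == -ECHILD);    /* fixed shape */
}
```

In the sketch a directory that is torn down mid-walk is reported as "not a
directory" when the error path skips the recheck, while the rechecked
version turns the same race into a retry request. An entry that really is
not a directory still gets -ENOTDIR either way.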