Date: Fri, 31 Jul 2015 12:42:30 -0700 (PDT)
From: Hugh Dickins
To: Linus Torvalds
cc: "J. Bruce Fields", Dominique Martinet, Hugh Dickins, Al Viro,
    Linux Kernel Mailing List, linux-fsdevel
Subject: Re: v4.2-rc dcache regression, probably 75a6f82a0d10

On Fri, 31 Jul 2015, Linus Torvalds wrote:
> On Fri, Jul 31, 2015 at 10:46 AM, Hugh Dickins wrote:
> >
> > Sounds like a dcache problem, and 75a6f82a0d10 seemed the only
> > likely candidate, so I experimented with reverting it yesterday,
> > and ran successfully for 24 hours.
>
> Hmm. Sounds odd. Are you running nfsd? That would explain why it
> happens on ext4 but not tmpfs: ext4 has a get_parent method that can
> get a disconnected entry, while tmpfs does not.
>
> That said, your load doesn't sound like it would actually ever trigger
> this, unless you just didn't mention that you also end up using that
> filesystem over nfs on another machine.

No, no nfsd nor any kind of networking filesystem stuff going on.
Right, I never looked to see what DCACHE_DISCONNECTED is actually about,
just rushed ahead and tried running with the revert.

>
> So leave it running a while longer, but maybe it's 4bf46a272647 like
> Dominique suspects. Although I don't see how that could trigger
> anything either..

I restarted with a slightly different version of the load this morning,
which has sometimes shown the issue more easily - I thought it better
to restart with a variant than persist with a run that might have
settled into a protected pattern.  We'll see what that shows later on.

It will indeed be weird and odd if it confirms that DCACHE_DISCONNECTED
revert is good.

I agree that Dominique's 4bf46a272647 seems now more likely, if still
unlikely; but that was included in v4.1, and I saw no problem with v4.1
once the rmap_walk() skip was fixed.

There may be some completely unrelated commit which alters the timing
enough to expose or mask whatever is the guilty commit.  Or something
corrupting dentry->d_flags occasionally.

Hugh