Date: Tue, 10 Sep 2013 13:39:18 -0400
Subject: Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d
From: Josh Boyer
To: Linus Torvalds
Cc: Al Viro, Waiman Long, "Linux-Kernel@Vger. Kernel. Org", moneta.mace@gmail.com

On Tue, Sep 10, 2013 at 1:33 PM, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 10:14 AM, Josh Boyer wrote:
>>
>> We've had a user report a backtrace from hitting the
>> BUG_ON(!ret->d_lockref.count) added with the lockref infrastructure
>> (commit 98474236f72) on rawhide today[1]. I've grabbed the backtrace
>> below. The user has btrfs, NFS, and sshfs in usage with this oops.
>>
>> I've not seen anything similar, but I could have missed it. Does this
>> look familiar to anyone?
>
> Nope. And the dget_parent() case itself hasn't even changed - that
> BUG_ON() wasn't really added by the lockref code, it's just a
> search-and-replace change of a BUG_ON(!d_count) to
> BUG_ON(!d_lockref.count). The BUG_ON() existed before.
>
> That whole "dget_parent()" thing is also in the _simple_ case (not RCU
> mode), and the BUG_ON is for when the dentry is properly locked, so
> that's all "safe" code. The refcount must have gotten corrupted
> earlier.
>
> Do you have the mainline git ID of that rawhide kernel? Because there
> *was* a real bug in d_rcu_to_refcount. I don't see how it could
> trigger that particular issue, but it could trigger scheduling while
> in the rcu-protected region and that in turn could result in odd
> things down the line, so..
>
> That particular bug exists between commits 15570086b590 ("vfs:
> reimplement d_rcu_to_refcount() using lockref_get_or_lock()") that
> introduced it, and e5c832d55588 ("vfs: fix dentry RCU to refcounting
> possibly sleeping dput()") that should have fixed it. But I don't know
> what mainline kernel that "kernel-3.12.0-0.rc0.git16.2.fc21.x86_64" is
> based on. I'm sure that information exists somewhere..

The subject says v3.11-7890-ge5c832d, which is the git-describe output
of the mainline kernel for that Fedora build. Sorry, I should have
made that clearer.

So according to that, it should be based on the actual commit you
identified as the fix. I have your latest tree as of this morning
(v3.11-8716-g26b0332) building right now and have asked Moneta to test
it.

josh
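
For readers unfamiliar with the version strings in this message, here is a
minimal sketch of how a git-describe string such as v3.11-7890-ge5c832d
breaks down into a tag, a count of commits since that tag, and an
abbreviated commit ID. The helper name parse_describe and the regular
expression are illustrative assumptions, not part of git or of any kernel
tooling; the sketch only handles the plain <tag>-<count>-g<hash> form.

```python
import re

# Plain `git describe` form: <tag>-<commits since tag>-g<abbreviated hash>.
# This pattern is an illustrative assumption and ignores suffixes like "-dirty".
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(text):
    """Split a git-describe string into (tag, commits_since_tag, short_sha)."""
    match = DESCRIBE_RE.match(text)
    if match is None:
        raise ValueError("not a <tag>-<count>-g<hash> string: %r" % text)
    return match.group("tag"), int(match.group("count")), match.group("sha")


# The build named in the subject line of this thread.
tag, count, sha = parse_describe("v3.11-7890-ge5c832d")

# Commit ID quoted earlier in the thread as the fix for d_rcu_to_refcount.
FIX_COMMIT = "e5c832d55588"

# True when the build's abbreviated ID is a prefix of the fix commit's ID.
build_is_at_fix = FIX_COMMIT.startswith(sha)
```

Here the abbreviated ID from the subject line matches the start of the fix
commit's ID, which is consistent with the Fedora build sitting exactly on
that commit rather than before it.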