From: Mark Lord
Date: Sun, 20 Feb 2011 09:39:23 -0500
To: "Ted Ts'o", Linux Kernel, linux-ext4@vger.kernel.org
Subject: Re: ext4 crash on 2.6.37: NULL ptr in ext4_discard_preallocations
Message-ID: <4D61279B.5030203@teksavvy.com>
In-Reply-To: <4D611D62.2030703@teksavvy.com>

On 11-02-20 08:55 AM, Mark Lord wrote:
> On 11-02-20 01:15 AM, Ted Ts'o wrote:
>> On Sun, Feb 20, 2011 at 12:05:27AM -0500, Mark Lord wrote:
>>> I suppose it must be, as there's no other 0x3c offset in that function.
>>> Which means it's probably this line that's crashing:
>>>
>>>     BUG_ON(pa->pa_obj_lock != &ei->i_prealloc_lock);
>>>
>>> ...which could only happen if "pa" was NULL there.
>>> I wonder how that happened ?
>>
>> Which could only happen if ei->i_prealloc_list were not properly
>> initialized (i..e, it was still NULL).  Which shouldn't ever
>> happen...., since all ext4_inodes are initialized in
>> ext4_alloc_inode().
>>
>> Hmm, can you replicate the crash?
>
> So far it has been a one time deal here,
> but stuff like this is pretty serious nonetheless.
>
> I suppose it could also happen if another thread did a list-delete
> at the same time as that function was running.  Which would require
> that there be a locking bug/confusion somewhere.
>
> Looking over the code, most places use rcu to protect accesses,
> except for the fragment that crashed.  That's probably just fine,
> but something to reexamine just out of paranoia.
>
> Also, the spinlock pointer appears to be dynamic, one of two
> possible spinlocks.  Maybe something got confused there
> (well, obviously *something* got confused, so..).

That looks like the best candidate: perhaps pa->pa_obj_lock was one of
the per-cpu lg_prealloc_lock's at that point in time.  In which case an
item could be deleted from the pa list concurrently with the function
that actually crashed?

That's as far as I can get with it in the time available.
You folks do know this code much better, so perhaps just expend
a few little grey cells on that theory before calling it quits?

Cheers!
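
To make the theory above a little more concrete, here is a rough userspace
sketch in C.  It is not the ext4 source and it does not reproduce the race.
The names struct lock, struct prealloc, check_entry() and check_list() are
made up for illustration; only the idea of a per-entry lock pointer
(modelled on pa->pa_obj_lock) compared against the inode's lock comes from
the BUG_ON quoted earlier.  The sketch reports the two suspicious states
(a NULL entry, or an entry whose lock pointer is not the inode's lock)
instead of crashing on them.

```c
/* Userspace model only; none of this is taken from fs/ext4. */
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for spinlock_t; only its address matters here. */
struct lock { int unused; };

/* Hypothetical, simplified preallocation entry. */
struct prealloc {
    struct prealloc *next;      /* simplified singly linked list */
    struct lock *obj_lock;      /* models pa->pa_obj_lock */
};

enum pa_check { PA_OK, PA_NULL_ENTRY, PA_FOREIGN_LOCK };

/*
 * Apply the same comparison as the quoted BUG_ON to one entry, but
 * return a status instead of dereferencing a NULL pointer or halting.
 */
enum pa_check check_entry(const struct prealloc *pa,
                          const struct lock *inode_lock)
{
    if (pa == NULL)
        return PA_NULL_ENTRY;       /* the "pa was NULL" case */
    if (pa->obj_lock != inode_lock)
        return PA_FOREIGN_LOCK;     /* entry is guarded by some other lock */
    return PA_OK;
}

/* Walk the list and return the first non-OK status, or PA_OK. */
enum pa_check check_list(const struct prealloc *head,
                         const struct lock *inode_lock)
{
    assert(inode_lock != NULL);
    for (const struct prealloc *pa = head; pa != NULL; pa = pa->next) {
        enum pa_check r = check_entry(pa, inode_lock);
        if (r != PA_OK)
            return r;
    }
    return PA_OK;
}
```

If an entry on the inode's list ever came back as PA_FOREIGN_LOCK, then
holding the inode lock would not stop a deletion done under the other
lock, which is the scenario speculated about above.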