Date: Sun, 12 May 2013 23:07:12 -0400
From: "Theodore Ts'o"
To: Tony Luck
Cc: Dmitry Monakhov, eunb.song@samsung.com, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: EXT4 panic at jbd2_journal_put_journal_head() in 3.9+

On Sun, May 12, 2013 at 07:04:45PM -0700, Tony Luck wrote:
> My git bisect finally competed and points the a finger at:
>
> commit ae4647fb7654676fc44a97e86eb35f9f06b99f66
> Author: Jan Kara
> Date:   Fri Apr 12 00:03:42 2013 -0400
>
>     jbd2: reduce journal_head size
>
>     Remove unused t_cow_tid field (ext4 copy-on-write support doesn't seem
>     to be happening) and change b_modified and b_jlist to bitfields thus
>     saving 8 bytes in the structure.

Both you and Eunbong Song bisected to the same commit, so presumably the right thing to do at this point is to revert it. Have you tried reverting the commit and demonstrating that the problem goes away afterwards?
The reason I ask is that I'm completely at a loss to understand why this commit could be making a difference. Looking at the commit, we're converting two unsigned fields, which use no more than 4 bits and 1 bit respectively, to bitfields. Why this could be causing __journal_remove_journal_head() to fail, especially in the way that it does, makes no sense to me.

We are technically accessing jh->b_jlist without first calling jbd2_lock_bh_state(), but (a) it shouldn't make a difference whether that field is a bitfield or a 32-bit unsigned value, and (b) by the time we get to __journal_remove_journal_head(), nothing should be using the journal head, and we've taken jbd_lock_bh_journal_head(), which should prevent anyone else from starting to use it.

Applying a patch when I don't understand how it would make things better, even if it is a revert, scares me. If we are going to do this, then, since I haven't yet been able to reproduce the crash on my test setup, could you take Linus's just-released 3.10-rc1, revert commit ae4647fb765467, and confirm that this avoids the crash you are seeing?

Thanks,

- Ted
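For readers following along, a minimal sketch of what converting the two unsigned fields into bitfields means at the C level. The structs jh_plain and jh_bits and the helpers set_jlist()/set_modified() are hypothetical simplifications, not the real struct journal_head; only the field names and the 4-bit/1-bit widths come from the discussion above.

```c
#include <assert.h>

/* Hypothetical "before" layout: two independent unsigned members.
 * Each member is its own memory location. */
struct jh_plain {
	unsigned b_jlist;    /* list index; only ever needs 4 bits */
	unsigned b_modified; /* flag; only ever needs 1 bit */
};

/* Hypothetical "after" layout: the same fields packed as bitfields.
 * Both now share one storage unit (typically a single unsigned int). */
struct jh_bits {
	unsigned b_jlist:4;
	unsigned b_modified:1;
};

/* A store to one bitfield is typically compiled as a read-modify-write
 * of the whole storage unit it shares with its neighbour. */
static void set_jlist(struct jh_bits *jh, unsigned jlist)
{
	assert(jlist < 16);  /* must fit in 4 bits */
	jh->b_jlist = jlist;
}

static void set_modified(struct jh_bits *jh, unsigned modified)
{
	assert(modified <= 1);  /* must fit in 1 bit */
	jh->b_modified = modified;
}
```

In single-threaded code the two layouts behave identically apart from size. The difference is that, under the C11 memory model, adjacent bitfields form a single memory location, whereas two plain unsigned members are separate locations.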