Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1030738AbWJKBnh (ORCPT ); Tue, 10 Oct 2006 21:43:37 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1030739AbWJKBnh (ORCPT ); Tue, 10 Oct 2006 21:43:37 -0400 Received: from sandeen.net ([209.173.210.139]:23904 "EHLO sandeen.net") by vger.kernel.org with ESMTP id S1030738AbWJKBng (ORCPT ); Tue, 10 Oct 2006 21:43:36 -0400 Message-ID: <452C4C47.2000107@sandeen.net> Date: Tue, 10 Oct 2006 20:43:35 -0500 From: Eric Sandeen User-Agent: Thunderbird 1.5.0.7 (Macintosh/20060909) MIME-Version: 1.0 To: Badari Pulavarty CC: Eric Sandeen , Jan Kara , Dave Jones , Andrew Morton , Linux Kernel Subject: Re: 2.6.18 ext3 panic. References: <20061002194711.GA1815@redhat.com> <20061003052219.GA15563@redhat.com> <4521F865.6060400@sandeen.net> <20061002231945.f2711f99.akpm@osdl.org> <452AA716.7060701@sandeen.net> <1160431165.17103.21.camel@dyn9047017100.beaverton.ibm.com> <20061009225036.GC26728@redhat.com> <20061010141145.GM23622@atrey.karlin.mff.cuni.cz> <452C18A6.3070607@redhat.com> <1160519106.28299.4.camel@dyn9047017100.beaverton.ibm.com> In-Reply-To: <1160519106.28299.4.camel@dyn9047017100.beaverton.ibm.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2860 Lines: 64 Badari Pulavarty wrote: > On Tue, 2006-10-10 at 17:03 -0500, Eric Sandeen wrote: >> Jan Kara wrote: >> >>> I think it's really the 1KB block size that makes it happen. >>> I've looked at journal_dirty_data() code and I think the following can >>> happen: >>> sync() eventually ends up in journal_dirty_data(bh) as Eric writes. >>> There is finds dirty buffer attached to the comitting transaction. So it drops >>> all locks and calls sync_dirty_buffer(bh). >>> Now in other process, file is truncated so that 'bh' gets just after EOF. >>> As we have 1kb buffers, it can happen that bh is in the partially >>> truncated page. Buffer is marked unmapped and clean. But in a moment the page >>> is marked dirty and msync() is called. That eventually calls >>> set_page_dirty() and all buffers in the page are marked dirty. >>> The first process now wakes up, locks the buffer, clears the dirty bit >>> and does submit_bh() - Oops. >> Hm, just FWIW I have a couple traces* of the buffer getting unmapped >> -before- journal_submit_data_buffers ever even finds it... >> >> journal_submit_data_buffers():[fs/jbd/commit.c:242] needs writeout, >> adding to array pid 1836 >> b_state:0x114025 b_jlist:BJ_SyncData cpu:0 b_count:2 b_blocknr:27130 >> b_jbd:1 b_frozen_data:0000000000000000 >> b_committed_data:0000000000000000 >> b_transaction:1 b_next_transaction:0 b_cp_transaction:0 >> b_trans_is_running:0 >> b_trans_is_comitting:1 b_jcount:0 pg_dirty:0 >> >> so it's already unmapped at this point. Could >> journal_submit_data_buffers benefit from some buffer_mapped checks? Or >> is that just a bandaid too late... > > Hmm.. > > b_state: 0x114025 > ^ > means BH_Mapped. Isn't it ? Whoops, I pasted in the wrong one, I guess, from earlier in the trace. Here are the ones I was looking at: journal_submit_data_buffers():[fs/jbd/commit.c:242] needs writeout, adding to array pid 1690 b_state:0x104005 b_jlist:BJ_SyncData cpu:0 b_count:2 b_blocknr:30045 b_jbd:1 b_frozen_data:0000000000000000 b_committed_data:0000000000000000 b_transaction:1 b_next_transaction:0 b_cp_transaction:0 b_trans_is_running:0 b_trans_is_comitting:1 b_jcount:0 pg_dirty:1 and journal_submit_data_buffers():[fs/jbd/commit.c:242] needs writeout, adding to array pid 1836 b_state:0x114005 b_jlist:BJ_SyncData cpu:1 b_count:2 b_blocknr:27130 b_jbd:1 b_frozen_data:0000000000000000 b_committed_data:0000000000000000 b_transaction:1 b_next_transaction:0 b_cp_transaction:0 b_trans_is_running:0 b_trans_is_comitting:1 b_jcount:0 pg_dirty:1 -Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/