From: Andrew Morton
Subject: Re: [PATCH] jbd jbd2: fix dio write returning EIO when try_to_release_page fails
Date: Mon, 4 Aug 2008 14:50:47 -0700
Message-ID: <20080804145047.04794bf3.akpm@linux-foundation.org>
References: <6.0.0.20.2.20080804185338.03bcd488@172.19.0.2>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: cmm@us.ibm.com, jack@suse.cz, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
To: Hisashi Hifumi
Return-path:
In-Reply-To: <6.0.0.20.2.20080804185338.03bcd488@172.19.0.2>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org

On Mon, 04 Aug 2008 20:10:33 +0900 Hisashi Hifumi wrote:

> Hi
>
> Dio write returns EIO when try_to_release_page fails because bh is
> still referenced.
> The patch
> "commit 3f31fddfa26b7594b44ff2b34f9a04ba409e0f91
> Author: Mingming Cao
> Date: Fri Jul 25 01:46:22 2008 -0700
>
>     jbd: fix race between free buffer and commit transaction
> "
> was merged into 2.6.27-rc1, but I noticed that this patch is not enough
> to fix the race.
> I did fsstress test heavily to 2.6.27-rc1, and found that dio write still
> sometimes got EIO through this test.
> The patch above fixed race between freeing buffer(dio) and committing
> transaction(jbd) but I discovered that there is another race,
> freeing buffer(dio) and ext3/4_ordered_writepage.
> : background_writeout()
>     ->write_cache_pages()
>       ->ext3_ordered_writepage()
>           walk_page_buffers() <- take a bh ref
>           block_write_full_page() <- unlock_page
>               : <- end_page_writeback
>               : <- race! (dio write->try_to_release_page fails)
>           walk_page_buffers() <-release a bh ref
>
> ext3_ordered_writepage holds bh ref and does unlock_page remaining
> taking a bh ref, so this causes the race and failure of
> try_to_release_page.
>
> Following patch fixes this race.

Please don't patch both filesystems in a single patch - they go into
the tree via different routes.

>
> Signed-off-by :Hisashi Hifumi

"Signed-off-by: ", please.
>
> diff -Nrup linux-2.6.27-rc1.org/fs/jbd/transaction.c linux-2.6.27-rc1/fs/jbd/transaction.c
> --- linux-2.6.27-rc1.org/fs/jbd/transaction.c	2008-07-29 19:28:47.000000000 +0900
> +++ linux-2.6.27-rc1/fs/jbd/transaction.c	2008-07-29 20:40:12.000000000 +0900
> @@ -1764,6 +1764,12 @@ int journal_try_to_free_buffers(journal_
>  	 */
>  	if (ret == 0 && (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS)) {
>  		journal_wait_for_transaction_sync_data(journal);
> +
> +		bh = head;
> +		do {
> +			while (atomic_read(&bh->b_count))
> +				schedule();
> +		} while ((bh = bh->b_this_page) != head);
>  		ret = try_to_free_buffers(page);
>  	}

The loop is problematic.  If the scheduler decides to keep running this
task then we have a busy loop.  If this task has realtime policy then
it might even lock up the kernel.

Perhaps we can use wait_on_page_writeback()?
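
To illustrate the general point about polling versus sleeping, here is a
rough user-space analogy. It is not kernel code and not a proposed fix:
struct ref, ref_init(), ref_put(), ref_wait_zero() and holder() are
hypothetical names, and a pthread condition variable stands in for the
kernel's own wait mechanisms. The waiter blocks until the last reference
is dropped, so it never spins even if the scheduler would keep picking it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-in for a reference-counted buffer. */
struct ref {
	pthread_mutex_t lock;
	pthread_cond_t zero;	/* broadcast when count drops to 0 */
	int count;
};

static void ref_init(struct ref *r, int count)
{
	pthread_mutex_init(&r->lock, NULL);
	pthread_cond_init(&r->zero, NULL);
	r->count = count;
}

/* Drop one reference and wake any waiter once the last one is gone. */
static void ref_put(struct ref *r)
{
	pthread_mutex_lock(&r->lock);
	assert(r->count > 0);
	if (--r->count == 0)
		pthread_cond_broadcast(&r->zero);
	pthread_mutex_unlock(&r->lock);
}

/*
 * Sleep until the count reaches zero, then return the observed count.
 * The caller is blocked inside pthread_cond_wait() while it waits,
 * rather than re-reading the counter in a yield loop.
 */
static int ref_wait_zero(struct ref *r)
{
	int seen;

	pthread_mutex_lock(&r->lock);
	while (r->count != 0)
		pthread_cond_wait(&r->zero, &r->lock);
	seen = r->count;
	pthread_mutex_unlock(&r->lock);
	return seen;
}

/* Thread body: a holder that releases its single reference. */
static void *holder(void *arg)
{
	ref_put(arg);
	return NULL;
}
```

A caller would ref_init() the count to the number of holders, start the
holders, and call ref_wait_zero() before attempting the release. The
kernel-side equivalent would need an existing event to sleep on, which
is why something like wait_on_page_writeback() is the suggestion above.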