From: Jan Kara Subject: Re: [PATCH] ext4: Fix delalloc sync hang with journal lock inversion Date: Thu, 5 Jun 2008 18:22:09 +0200 Message-ID: <20080605162209.GG27370@duck.suse.cz> References: <1212154769-16486-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1212154769-16486-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1212154769-16486-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1212154769-16486-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1212154769-16486-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1212154769-16486-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20080602093459.GC30613@duck.suse.cz> <20080602095956.GB9225@skywalker> <20080602102759.GG30613@duck.suse.cz> <20080605135413.GI8942@skywalker> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jan Kara , cmm@us.ibm.com, linux-ext4@vger.kernel.org To: "Aneesh Kumar K.V" Return-path: Received: from styx.suse.cz ([82.119.242.94]:41497 "EHLO mail.suse.cz" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1759199AbYFEQWL (ORCPT ); Thu, 5 Jun 2008 12:22:11 -0400 Content-Disposition: inline In-Reply-To: <20080605135413.GI8942@skywalker> Sender: linux-ext4-owner@vger.kernel.org List-ID: On Thu 05-06-08 19:24:13, Aneesh Kumar K.V wrote: > > Hmm, I've just got an idea that it may be better to introduce a new flag > > for wbc like range_cont and it would mean that we start scan at > > writeback_index (we use range_start if writeback_index is not set) and > > end with range_end. That way we don't have to be afraid of interference > > with other range_cyclic users and in principle, range_cyclic is originally > > meant for other uses... > > > > something like below ?. With this ext4_da_writepages have > > pgoff_t writeback_index = 0; > ..... > if (!wbc->range_cyclic) { > /* > * If range_cyclic is not set force range_cont > * and save the old writeback_index > */ > wbc->range_cont = 1; > writeback_index = mapping->writeback_index; > mapping->writeback_index = 0; > } > ... > mpage_da_writepages(..) > .. > if (writeback_index) > mapping->writeback_index = writeback_index; > return ret; > > > > mm: Add range_cont mode for writeback. > > From: Aneesh Kumar K.V > > Filesystems like ext4 needs to start a new transaction in > the writepages for block allocation. This happens with delayed > allocation and there is limit to how many credits we can request > from the journal layer. So we call write_cache_pages multiple > times with wbc->nr_to_write set to the maximum possible value > limitted by the max journal credits available. > > Add a new mode to writeback that enables us to handle this > behaviour. If mapping->writeback_index is not set we use > wbc->range_start to find the start index and then at the end > of write_cache_pages we store the index in writeback_index. Next > call to write_cache_pages will start writeout from writeback_index. > Also we limit writing to the specified wbc->range_end. > > > Signed-off-by: Aneesh Kumar K.V > --- > > include/linux/writeback.h | 1 + > mm/page-writeback.c | 10 +++++++++- > 2 files changed, 10 insertions(+), 1 deletions(-) I like it. I'm only not sure whether there cannot be two users of write_cache_pages() operating on the same mapping at the same time. Because then they could alter writeback_index under each other and that would probably result in unpleasant behavior. I think there can be two parallel calls for example from sync_single_inode() and sync_page_range(). In that case we'd need something like writeback_index inside wbc (or maybe just alter range_start automatically when range_cont is set?) so that parallel callers do no influence each other. Honza > diff --git a/include/linux/writeback.h b/include/linux/writeback.h > index f462439..0d8573e 100644 > --- a/include/linux/writeback.h > +++ b/include/linux/writeback.h > @@ -63,6 +63,7 @@ struct writeback_control { > unsigned for_writepages:1; /* This is a writepages() call */ > unsigned range_cyclic:1; /* range_start is cyclic */ > unsigned more_io:1; /* more io to be dispatched */ > + unsigned range_cont:1; > }; > > /* > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index 789b6ad..014a9f2 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -882,6 +882,12 @@ int write_cache_pages(struct address_space *mapping, > if (wbc->range_cyclic) { > index = mapping->writeback_index; /* Start from prev offset */ > end = -1; > + } else if (wbc->range_cont) { > + if (!mapping->writeback_index) > + index = wbc->range_start >> PAGE_CACHE_SHIFT; > + else > + index = mapping->writeback_index; > + end = wbc->range_end >> PAGE_CACHE_SHIFT; > } else { > index = wbc->range_start >> PAGE_CACHE_SHIFT; > end = wbc->range_end >> PAGE_CACHE_SHIFT; > @@ -954,7 +960,9 @@ int write_cache_pages(struct address_space *mapping, > index = 0; > goto retry; > } > - if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0)) > + if (wbc->range_cyclic || > + (range_whole && wbc->nr_to_write > 0) || > + wbc->range_cont) > mapping->writeback_index = index; > return ret; > } -- Jan Kara SUSE Labs, CR