From: Badari Pulavarty <pbadari@gmail.com>
Subject: Re: Reversing order of transaction start and page_lock for ext3/4
Date: Mon, 17 Mar 2008 08:22:17 -0800
Message-ID: <1205770937.26074.6.camel@dyn9047017100.beaverton.ibm.com>
References: <20080313180519.GL12523@duck.suse.cz>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: ext4 <linux-ext4@vger.kernel.org>
To: Jan Kara <jack@suse.cz>
In-Reply-To: <20080313180519.GL12523@duck.suse.cz>
Sender: linux-ext4-owner@vger.kernel.org

On Thu, 2008-03-13 at 19:05 +0100, Jan Kara wrote:
>   Hi,
> 
>   As Mark Fasheh pointed out, we cannot take page_lock inside a transaction
> commit because that could possibly deadlock with another thread holding the
> page_lock and waiting for commit to finish in journal_start. This is
> kind of a blocker for my new approach to handling ordered mode in JBD.
> 
> So first, I'd like to ask what other people think about reversing locking
> order of page_lock and transaction start in ext3/4? I personally find it a
> good thing anyway (I've stumbled on problems with the current locking order
> several times, but so far I could always work around them), logically it
> simply "makes sense" as a transaction handle is naturally more long-lived
> than a lock on one page.
> 
> For the case that we agree we want to reverse the order, I've looked into
> how hard it would be. The ordinary write path is trivial. If we provide a
> page_mkwrite function (which should be quite simple), we don't have to be
> afraid of instantiating holes in writepage so that makes things in
> writepage simpler (although we'd pay some performance for writing a page
> of zeros into the hole and later writing real data in writepage - currently
> we do only the second write together with block allocation). With
> page_mkwrite, we don't have to start transaction at all in writepage in
> writeback and ordered modes.

Three years ago, I tried to support writepages() for ext3 and ran into this.
At that time, page_mkwrite() was still in the works, so I couldn't use it.
I think with page_mkwrite() we can do this, except that it's a slight
behaviour change in the sense that the app could get allocation errors
(ENOSPC) while writing to an mmap()ed region. But I think it's a good thing.

For ext4 delayed allocation, we might see a small performance issue - but
I think it would be a special case.

Thanks,
Badari