2008-03-13 18:05:21

by Jan Kara

Subject: Reversing order of transaction start and page_lock for ext3/4

Hi,

As Mark Fasheh pointed out, we cannot take page_lock inside a transaction
commit because that could deadlock with another thread that holds the
page_lock and is waiting in journal_start for the commit to finish. This is
kind of a blocker for my new approach to handling ordered mode in JBD.
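
A minimal userspace sketch of the deadlock described above, assuming it
can be reduced to two actors that each hold one resource while waiting for
another. The names (RES_PAGE_LOCK, RES_COMMIT, struct waiter,
abba_deadlock) are hypothetical labels for illustration, not kernel
identifiers, and the model ignores everything else JBD does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for this sketch; these are not kernel identifiers. */
enum resource { RES_NONE, RES_PAGE_LOCK, RES_COMMIT };

/* One actor: what it holds while it blocks waiting for something else. */
struct waiter {
    enum resource holds;
    enum resource wants;
};

/* ABBA deadlock: each actor waits for exactly what the other one holds. */
bool abba_deadlock(struct waiter a, struct waiter b)
{
    /* An actor never waits for the resource it already holds. */
    assert(a.holds != a.wants && b.holds != b.wants);
    return a.wants == b.holds && b.wants == a.holds;
}
```

In this model a writer that holds the page lock and waits for the commit
deadlocks against a commit that wants the page lock, whereas a writer that
already has its handle and waits for nothing while holding the page lock
does not.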

So first, I'd like to ask what other people think about reversing the
locking order of page_lock and transaction start in ext3/4. I personally
find it a good thing anyway (I've stumbled on problems with the current
locking order several times, but so far I could always work around them).
Logically it simply makes sense, as a transaction handle is naturally more
long-lived than a lock on one page.

In case we agree that we want to reverse the order, I've looked into how
hard it would be. The ordinary write path is trivial. If we provide a
page_mkwrite function (which should be quite simple), we don't have to be
afraid of instantiating holes in writepage, which makes things in
writepage simpler (although we'd pay some performance for writing a page
of zeros into the hole and later writing the real data in writepage;
currently we do only the second write, together with block allocation).
With page_mkwrite, we don't have to start a transaction at all in
writepage in writeback and ordered modes. In data=journal mode, we still
have to start the transaction, so we'd have to do something like unlocking
the page, starting the transaction, locking the page again, and then
carefully checking whether the page got truncated in the meantime, etc. It
is a question for discussion whether this would be an appropriate moment
to replace data=journal mode with ordered mode, but I guess the feature
removal will take longer.
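
The data=journal sequence described above (unlock the page, start the
transaction, relock, recheck for truncation) could be sketched in userspace
roughly as follows. Everything here is a toy stand-in: toy_page,
toy_mapping, toy_journal and the functions are hypothetical and are not
the kernel's struct page, handle_t or JBD API. The recheck assumes that a
truncated page can be recognized by its mapping pointer no longer matching
the file's mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; NOT the kernel's struct page, address_space or handle_t. */
struct toy_mapping { int id; };
struct toy_page {
    bool locked;
    struct toy_mapping *mapping;   /* NULL once the page is truncated away */
};
struct toy_journal { int open_handles; };

enum wp_result { WP_WRITTEN, WP_SKIPPED_TRUNCATED };

/* Hook run while the page is unlocked, used to simulate a racing truncate. */
typedef void (*race_hook)(struct toy_page *);

void toy_truncate(struct toy_page *page)
{
    page->mapping = NULL;
}

enum wp_result toy_writepage_journalled(struct toy_page *page,
                                        struct toy_mapping *mapping,
                                        struct toy_journal *journal,
                                        race_hook race)
{
    assert(page->locked);          /* entered with the page locked */

    page->locked = false;          /* 1. drop the page lock */
    if (race)
        race(page);                /* window in which truncate may run */
    journal->open_handles++;       /* 2. start the transaction */
    page->locked = true;           /* 3. retake the page lock */

    /* 4. recheck: does the page still belong to this file? */
    if (page->mapping != mapping) {
        page->locked = false;
        journal->open_handles--;   /* stop the transaction, nothing to do */
        return WP_SKIPPED_TRUNCATED;
    }

    /* ... the page data would be journalled here ... */
    page->locked = false;
    journal->open_handles--;       /* stop the transaction */
    return WP_WRITTEN;
}
```

The point of the sketch is only the ordering: the transaction is started
with no page lock held, and anything learned about the page before the lock
was dropped has to be verified again after it is retaken.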

And as far as I can see that's all :). Comments, ideas, opinions welcome :).

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR


2008-03-17 15:22:01

by Badari Pulavarty

Subject: Re: Reversing order of transaction start and page_lock for ext3/4

On Thu, 2008-03-13 at 19:05 +0100, Jan Kara wrote:
> Hi,
>
> As Mark Fasheh pointed out, we cannot take page_lock inside a transaction
> commit because that could deadlock with another thread that holds the
> page_lock and is waiting in journal_start for the commit to finish. This is
> kind of a blocker for my new approach to handling ordered mode in JBD.
>
> So first, I'd like to ask what other people think about reversing the
> locking order of page_lock and transaction start in ext3/4. I personally
> find it a good thing anyway (I've stumbled on problems with the current
> locking order several times, but so far I could always work around them).
> Logically it simply makes sense, as a transaction handle is naturally more
> long-lived than a lock on one page.
>
> In case we agree that we want to reverse the order, I've looked into how
> hard it would be. The ordinary write path is trivial. If we provide a
> page_mkwrite function (which should be quite simple), we don't have to be
> afraid of instantiating holes in writepage, which makes things in
> writepage simpler (although we'd pay some performance for writing a page
> of zeros into the hole and later writing the real data in writepage;
> currently we do only the second write, together with block allocation).
> With page_mkwrite, we don't have to start a transaction at all in
> writepage in writeback and ordered modes.

Three years ago, I tried to support writepages() for ext3 and ran into
this. At that time, page_mkwrite() was still in the works, so I couldn't
use it. I think with page_mkwrite() we can do this, except that it's a
slight behaviour change in the sense that the app could get allocation
errors (ENOSPC) while writing to mmap()ed memory. But I think that's a
good thing.
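
A rough userspace sketch of that behaviour change, assuming allocation for
a hole moves from writeback time to the moment the page is first made
writable. The names (toy_fs, toy_block, toy_page_mkwrite, toy_writepage)
are hypothetical and do not match any kernel interface, and the sketch
does not model how the error is actually delivered to the application.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy filesystem state; hypothetical, not an ext3/ext4 structure. */
struct toy_fs { int free_blocks; };
struct toy_block { bool allocated; };

/*
 * Fault-time hook: back the hole with a block before the page becomes
 * writable, so an out-of-space condition shows up at write time.
 */
int toy_page_mkwrite(struct toy_fs *fs, struct toy_block *blk)
{
    assert(fs->free_blocks >= 0);
    if (blk->allocated)
        return 0;                  /* already backed, nothing to do */
    if (fs->free_blocks == 0)
        return -ENOSPC;            /* the app sees the failure now */
    fs->free_blocks--;
    blk->allocated = true;
    return 0;
}

/*
 * Writeback: every page dirtied through toy_page_mkwrite already has a
 * block, so no allocation (and in this toy, no transaction) is needed.
 * -EIO for an unbacked page is purely a marker chosen for this sketch.
 */
int toy_writepage(const struct toy_block *blk)
{
    return blk->allocated ? 0 : -EIO;
}
```

With one free block, the first hole written through the mapping gets its
block at fault time and writeback has nothing left to allocate; a second
hole fails immediately with -ENOSPC instead of failing silently later.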

For ext4 delayed allocation, we might see a small performance issue, but
I think it would be a special case.

Thanks,
Badari