2005-03-02 17:00:58

by Muthian Sivathanu

Subject: ext3 journal commit performance

Hi,

I have a question on ext3 journal commit code. When a
transaction is committed in the ordered mode, ext3
first issues the data writes, waits for them to
finish, then issues the journal writes, waits for them
to finish, and then writes out the commit record.

It appears that the first wait (for the data blocks)
is unnecessary because all that is required is that
before the commit, both the data and the metadata
blocks should be on disk. This extra wait can
potentially reduce performance in cases where the
journal is on a separate disk, because you lose
parallelism between data writes and the metadata
writes.
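The requirement stated above can be written down as a small check. The sketch below is a hypothetical model, not ext3/JBD code, and the event names are made up for illustration. It replays a log of issue, complete, and commit events. It accepts the current serialized order and the proposed "issue both, then wait" order. It rejects a commit that happens while a write is still in flight.

```python
# Hypothetical model of the ordered-mode rule: the commit record may be
# written only once every data write and every journal write has completed.
# Event names are illustrative, not ext3/JBD identifiers.

def commit_is_safe(events):
    """Return True if no write is still in flight when the commit is written."""
    in_flight = set()
    for op, target in events:
        if op == "issue":
            in_flight.add(target)
        elif op == "complete":
            in_flight.discard(target)
        elif op == "commit" and in_flight:
            return False
    return True

# Current ordered mode: data, wait, journal, wait, commit.
serialized = [
    ("issue", "data"), ("complete", "data"),
    ("issue", "journal"), ("complete", "journal"),
    ("commit", None),
]

# Proposed: issue both writes, wait for both, then commit.
overlapped = [
    ("issue", "data"), ("issue", "journal"),
    ("complete", "journal"), ("complete", "data"),
    ("commit", None),
]

# Broken: commit while the data write is still outstanding.
premature = [
    ("issue", "data"), ("issue", "journal"),
    ("complete", "journal"),
    ("commit", None),
]

print(commit_is_safe(serialized))  # True
print(commit_is_safe(overlapped))  # True
print(commit_is_safe(premature))   # False
```

Under this model the first wait is not what makes a commit safe. The only requirement is that nothing is outstanding at the moment of the commit.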

Does anyone have an idea as to why this extra wait was
introduced?

thanks very much,
Muthian

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com


2005-03-02 17:44:19

by linux-os (Dick Johnson)

Subject: Re: ext3 journal commit performance

On Wed, 2 Mar 2005, Muthian Sivathanu wrote:

> Hi,
>
> I have a question on ext3 journal commit code. When a
> transaction is committed in the ordered mode, ext3
> first issues the data writes, waits for them to
> finish, then issues the journal writes, waits for them
> to finish, and then writes out the commit record.
>
> It appears that the first wait (for the data blocks)
> is unnecessary because all that is required is that

Wrong. If you perform two buffered writes back-to-back,
is there any guarantee that both are on the disk when
the second finishes? Not on your life! They can (read: will)
be reordered depending upon the closest seek. So it is
mandatory to wait, to make sure that the two writes
occur in order.
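The reordering described above can be illustrated with a toy scheduler. This sketch assumes a pure shortest-seek-first (SSTF) policy, with a made-up head position and made-up block numbers. Real Linux elevators and disk firmware use more elaborate policies, so treat this only as a demonstration that issue order and completion order can differ.

```python
# Hypothetical shortest-seek-first (SSTF) scheduler: always service the
# pending block nearest the current head position. Numbers are made up.

def sstf_order(head, pending):
    """Return the order in which pending blocks are serviced."""
    pending = list(pending)
    order = []
    while pending:
        nearest = min(pending, key=lambda blk: abs(blk - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest  # the head is now parked at the block just serviced
    return order

# Two writes issued back-to-back: block 9000 first, then block 120.
issued = [9000, 120]
print(sstf_order(head=100, pending=issued))  # [120, 9000]
```

The write issued second reaches the disk first. Ordering therefore has to be enforced by waiting for completion, not by issue order.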

> before the commit, both the data and the metadata
> blocks should be on disk. This extra wait can
> potentially reduce performance in cases where the
> journal is on a separate disk, because you lose
> parallelism between data writes and the metadata
> writes.
>
> Does anyone have an idea as to why this extra wait was
> introduced?
>
> thanks very much,
> Muthian
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.

2005-03-02 17:49:23

by Muthian Sivathanu

Subject: Re: ext3 journal commit performance

> >
> > I have a question on ext3 journal commit code. When a
> > transaction is committed in the ordered mode, ext3
> > first issues the data writes, waits for them to
> > finish, then issues the journal writes, waits for them
> > to finish, and then writes out the commit record.
> >
> > It appears that the first wait (for the data blocks)
> > is unnecessary because all that is required is that
>
> Wrong. If you perform two buffered writes back-to-back
> will you guarantee that they are both on the disk when
> the second finishes? Not on your life! They can (read will)
> be reordered depending upon the closest seek. So it is
> mandatory that one wait to make sure that both writes
> occur in order.
>

Sorry if I was unclear. I did not mean that waiting
for the metadata guarantees that the data is committed
as well. My point is that you can issue both to disk
and then wait for them _together_, instead of
serializing them at the issue stage itself.

Muthian

2005-03-02 18:01:50

by Andrew Morton

Subject: Re: ext3 journal commit performance

Muthian Sivathanu <[email protected]> wrote:
>
> Hi,
>
> I have a question on ext3 journal commit code. When a
> transaction is committed in the ordered mode, ext3
> first issues the data writes, waits for them to
> finish, then issues the journal writes, waits for them
> to finish, and then writes out the commit record.
>
> It appears that the first wait (for the data blocks)
> is unnecessary because all that is required is that
> before the commit, both the data and the metadata
> blocks should be on disk. This extra wait can
> potentially reduce performance in cases where the
> journal is on a separate disk, because you lose
> parallelism between data writes and the metadata
> writes.

1) write the data
2) wait on the data write
3) write the journal
4) wait on the journal write

If we were to omit step 2), we wouldn't be ordering data any more: we will
commit the journal while there are still data writes in flight.

However, what you are proposing is, I think,

1) write the data
2) write the journal
3) wait on the data write
4) wait on the journal write

That would work, and could possibly speed things up a little.

But bear in mind that the journal write is just a single seek: the
journal tends to be at one end of the disk, and we need to seek to it
anyway. Issuing both writes together would give the elevator and the
disk some opportunity to optimise away a seek.
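A back-of-the-envelope model shows why the gain is likely to be small. It assumes the data and the journal sit on independent disks with no other queued I/O, and all latencies are hypothetical. Serializing costs the sum of the two write latencies. Overlapping costs only the larger one. When the journal write is a single short seek, the difference is just the journal time. On a single shared disk the two writes compete for one head, so this model does not apply.

```python
# Hypothetical latency model, all numbers in milliseconds.
# Assumes data and journal live on independent disks with no other queued I/O.

def serialized_ms(data, journal, commit):
    # write data, wait; write journal, wait; write commit, wait
    return data + journal + commit

def overlapped_ms(data, journal, commit):
    # write data and journal, wait for both; write commit, wait
    return max(data, journal) + commit

# A short journal write (one seek plus a sequential transfer) saves little.
print(serialized_ms(data=20, journal=2, commit=1))  # 23
print(overlapped_ms(data=20, journal=2, commit=1))  # 21
```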

2005-03-02 21:33:30

by Andrew Morton

Subject: Re: ext3 journal commit performance

Andrew Morton <[email protected]> wrote:
>
> However, what you are proposing is, I think,
>
> 1) write the data
> 2) write the journal
> 3) wait on the data write
> 4) wait on the journal write

5) write the commit block
6) wait on the commit block
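The six steps can be mirrored with a user-space analogue. This is a hypothetical sketch, not how JBD works. Two ordinary files stand in for the data area and the journal, and `os.fsync()` stands in for "wait on the write". Both durable writes start before either is waited on, and the commit file is written only after both have returned. A real crash-safe design would also need to fsync the containing directory so that the new directory entries persist. That step is omitted here.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def durable_write(path, payload):
    """Write payload to path and block until fsync() reports it durable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, payload)  # small payloads; partial writes not handled
        os.fsync(fd)
    finally:
        os.close(fd)

def commit_transaction(directory, data, journal):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Steps 1-2: start the data and journal writes before waiting on either.
        data_f = pool.submit(durable_write, os.path.join(directory, "data"), data)
        journal_f = pool.submit(durable_write, os.path.join(directory, "journal"), journal)
        # Steps 3-4: wait on both; result() re-raises any I/O error.
        data_f.result()
        journal_f.result()
    # Steps 5-6: only now write the commit block and wait for it.
    commit_path = os.path.join(directory, "commit")
    durable_write(commit_path, b"COMMIT\n")
    return commit_path

with tempfile.TemporaryDirectory() as d:
    path = commit_transaction(d, b"file contents", b"metadata blocks")
    with open(path, "rb") as f:
        print(f.read())
```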