2008-08-15 21:33:09

by Milan Broz

[permalink] [raw]
Subject: Mount ext3 with barrier=1 doesn't send real barrier bio?

Hi,

I run some barrier tests over device-mapper (which currently doesn't
support barrier bio at all) and even if I set barrier=1 in ext3 mount,
there is never any bio with barrier flag... (in 2.6.27-rc)

How is the barrier=1 flag supposed to work in ext3 (JBD) now?

See:
If you specify barrier=1, JFS_BARRIER flag is set in ext3_init_journal_params
journal->j_flags |= JFS_BARRIER;

Now, journal_write_commit_record is called and this happens:

if (journal->j_flags & JFS_BARRIER) {
set_buffer_ordered(bh);
barrier_done = 1;
}
ret = sync_dirty_buffer(bh);

if (barrier_done)
clear_buffer_ordered(bh);

if (ret == -EOPNOTSUPP && barrier_done) {
...

>From this code I expect that EOPNOTSUPP is returned if barrier is not
supported (yes, that exactly does device-mapper now without barrier patches).

But it *never* happens because:

sync_dirty_buffer always calls
submit_bh(WRITE_SYNC, bh)

and in submit_bh is this test:

if (buffer_ordered(bh) && (rw == WRITE))
rw = WRITE_BARRIER;

but there is rw == WRITE_SYNC, not WRITE !

So the barrier flag for bio is never set and normal sync write
is performed.

Why it isn't done like in attached patch? Is it intentional or it is bug?

I think it was caused by change in this commit:

commit 18ce3751ccd488c78d3827e9f6bf54e6322676fb
Author: Jens Axboe <[email protected]>
Date: Tue Jul 1 09:07:34 2008 +0200

Properly notify block layer of sync writes

Milan
--

Set BIO_RW_BARRIER flag even for submit_bh sync write request.

Signed-off-by: Milan Broz <[email protected]>
---
fs/buffer.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2926,16 +2926,16 @@ int submit_bh(int rw, struct buffer_head * bh)
BUG_ON(!buffer_mapped(bh));
BUG_ON(!bh->b_end_io);

- if (buffer_ordered(bh) && (rw == WRITE))
- rw = WRITE_BARRIER;
-
/*
* Only clear out a write error when rewriting, should this
* include WRITE_SYNC as well?
*/
- if (test_set_buffer_req(bh) && (rw == WRITE || rw == WRITE_BARRIER))
+ if (test_set_buffer_req(bh) && rw == WRITE)
clear_buffer_write_io_error(bh);

+ if (buffer_ordered(bh) && ((rw & RW_MASK) == WRITE))
+ rw |= (1 << BIO_RW_BARRIER);
+
/*
* from here on down, it's all bio -- do the initial mapping,
* submit_bio -> generic_make_request may further map this bio around




2008-08-20 23:38:21

by Eric Sandeen

[permalink] [raw]
Subject: Re: Mount ext3 with barrier=1 doesn't send real barrier bio?

Milan Broz wrote:
> Hi,
>
> I run some barrier tests over device-mapper (which currently doesn't
> support barrier bio at all) and even if I set barrier=1 in ext3 mount,
> there is never any bio with barrier flag... (in 2.6.27-rc)
>
> How is the barrier=1 flag supposed to work in ext3 (JBD) now?

Milan, you're right. Ric saw this same strange behavior when doing some
benchmarking with and without barriers; Chris noticed the change in
submit_bh; I was about to write up a similar patch to what you've sent
already. Jens, does Milan's fix look good to you?

Incidentally, I ran Ric's test on ext3 on a sata drive:

# fs_mark -d /mnt/test -n 1600 -t 1 -s 20480

cfq:
files/s
2.6.25 2.6.26.2 2.6.26.2+patch
barrier=0 169 127 126
barrier=1 33 126 33

noop:
files/s
2.6.25 2.6.26.2 2.6.26.2+patch
barrier=0 191 184 185
barrier=1 33 180 33

deadline:
files/s
2.6.25 2.6.26.2 2.6.26.2+patch
barrier=0 181 182 185
barrier=1 33 185 33

anticipatory:
files/s
2.6.25 2.6.26.2 2.6.26.2+patch
barrier=0 187 133 132
barrier=1 34 134 33

-Eric

> See:
> If you specify barrier=1, JFS_BARRIER flag is set in ext3_init_journal_params
> journal->j_flags |= JFS_BARRIER;
>
> Now, journal_write_commit_record is called and this happens:
>
> if (journal->j_flags & JFS_BARRIER) {
> set_buffer_ordered(bh);
> barrier_done = 1;
> }
> ret = sync_dirty_buffer(bh);
>
> if (barrier_done)
> clear_buffer_ordered(bh);
>
> if (ret == -EOPNOTSUPP && barrier_done) {
> ...
>
> From this code I expect that EOPNOTSUPP is returned if barrier is not
> supported (yes, that exactly does device-mapper now without barrier patches).
>
> But it *never* happens because:
>
> sync_dirty_buffer always calls
> submit_bh(WRITE_SYNC, bh)
>
> and in submit_bh is this test:
>
> if (buffer_ordered(bh) && (rw == WRITE))
> rw = WRITE_BARRIER;
>
> but there is rw == WRITE_SYNC, not WRITE !
>
> So the barrier flag for bio is never set and normal sync write
> is performed.
>
> Why it isn't done like in attached patch? Is it intentional or it is bug?
>
> I think it was caused by change in this commit:
>
> commit 18ce3751ccd488c78d3827e9f6bf54e6322676fb
> Author: Jens Axboe <[email protected]>
> Date: Tue Jul 1 09:07:34 2008 +0200
>
> Properly notify block layer of sync writes
>
> Milan
> --
>
> Set BIO_RW_BARRIER flag even for submit_bh sync write request.
>
> Signed-off-by: Milan Broz <[email protected]>
> ---
> fs/buffer.c | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2926,16 +2926,16 @@ int submit_bh(int rw, struct buffer_head * bh)
> BUG_ON(!buffer_mapped(bh));
> BUG_ON(!bh->b_end_io);
>
> - if (buffer_ordered(bh) && (rw == WRITE))
> - rw = WRITE_BARRIER;
> -
> /*
> * Only clear out a write error when rewriting, should this
> * include WRITE_SYNC as well?
> */
> - if (test_set_buffer_req(bh) && (rw == WRITE || rw == WRITE_BARRIER))
> + if (test_set_buffer_req(bh) && rw == WRITE)
> clear_buffer_write_io_error(bh);
>
> + if (buffer_ordered(bh) && ((rw & RW_MASK) == WRITE))
> + rw |= (1 << BIO_RW_BARRIER);
> +
> /*
> * from here on down, it's all bio -- do the initial mapping,
> * submit_bio -> generic_make_request may further map this bio around
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2008-08-21 05:56:29

by Jens Axboe

[permalink] [raw]
Subject: Re: Mount ext3 with barrier=1 doesn't send real barrier bio?

On Wed, Aug 20 2008, Eric Sandeen wrote:
> Milan Broz wrote:
> > Hi,
> >
> > I run some barrier tests over device-mapper (which currently doesn't
> > support barrier bio at all) and even if I set barrier=1 in ext3 mount,
> > there is never any bio with barrier flag... (in 2.6.27-rc)
> >
> > How is the barrier=1 flag supposed to work in ext3 (JBD) now?
>
> Milan, you're right. Ric saw this same strange behavior when doing some
> benchmarking with and without barriers; Chris noticed the change in
> submit_bh; I was about to write up a similar patch to what you've sent
> already. Jens, does Milan's fix look good to you?

Yep looks good, thanks a lot Milan! I'll send in the patch, unless I'm
badly mistaken we need it for 2.6.26-stable as well.

--
Jens Axboe

2008-08-21 10:45:26

by Ric Wheeler

[permalink] [raw]
Subject: Re: Mount ext3 with barrier=1 doesn't send real barrier bio?

Jens Axboe wrote:
> On Wed, Aug 20 2008, Eric Sandeen wrote:
>
>> Milan Broz wrote:
>>
>>> Hi,
>>>
>>> I run some barrier tests over device-mapper (which currently doesn't
>>> support barrier bio at all) and even if I set barrier=1 in ext3 mount,
>>> there is never any bio with barrier flag... (in 2.6.27-rc)
>>>
>>> How is the barrier=1 flag supposed to work in ext3 (JBD) now?
>>>
>> Milan, you're right. Ric saw this same strange behavior when doing some
>> benchmarking with and without barriers; Chris noticed the change in
>> submit_bh; I was about to write up a similar patch to what you've sent
>> already. Jens, does Milan's fix look good to you?
>>
>
> Yep looks good, thanks a lot Milan! I'll send in the patch, unless I'm
> badly mistaken we need it for 2.6.26-stable as well.
>
>

I think we definitely need it there as well, thanks!

Ric

2008-08-21 22:24:21

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: Mount ext3 with barrier=1 doesn't send real barrier bio?

Eric Sandeen <[email protected]> writes:

>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -2926,16 +2926,16 @@ int submit_bh(int rw, struct buffer_head * bh)
>> BUG_ON(!buffer_mapped(bh));
>> BUG_ON(!bh->b_end_io);
>>
>> - if (buffer_ordered(bh) && (rw == WRITE))
>> - rw = WRITE_BARRIER;
>> -
>> /*
>> * Only clear out a write error when rewriting, should this
>> * include WRITE_SYNC as well?
>> */
>> - if (test_set_buffer_req(bh) && (rw == WRITE || rw == WRITE_BARRIER))
>> + if (test_set_buffer_req(bh) && rw == WRITE)
>> clear_buffer_write_io_error(bh);

This should be ((rw & RW_MASK) == WRITE) too? Anyway, this seems change
behavior of submit_bh(WRITE_BARRIER) (maybe reiserfs only), it wouldn't
be your intent...

>> + if (buffer_ordered(bh) && ((rw & RW_MASK) == WRITE))
>> + rw |= (1 << BIO_RW_BARRIER);
>> +
>> /*
>> * from here on down, it's all bio -- do the initial mapping,
>> * submit_bio -> generic_make_request may further map this bio around
--
OGAWA Hirofumi <[email protected]>

2008-08-22 06:38:38

by Jens Axboe

[permalink] [raw]
Subject: Re: Mount ext3 with barrier=1 doesn't send real barrier bio?

On Fri, Aug 22 2008, OGAWA Hirofumi wrote:
> Eric Sandeen <[email protected]> writes:
>
> >> --- a/fs/buffer.c
> >> +++ b/fs/buffer.c
> >> @@ -2926,16 +2926,16 @@ int submit_bh(int rw, struct buffer_head * bh)
> >> BUG_ON(!buffer_mapped(bh));
> >> BUG_ON(!bh->b_end_io);
> >>
> >> - if (buffer_ordered(bh) && (rw == WRITE))
> >> - rw = WRITE_BARRIER;
> >> -
> >> /*
> >> * Only clear out a write error when rewriting, should this
> >> * include WRITE_SYNC as well?
> >> */
> >> - if (test_set_buffer_req(bh) && (rw == WRITE || rw == WRITE_BARRIER))
> >> + if (test_set_buffer_req(bh) && rw == WRITE)
> >> clear_buffer_write_io_error(bh);
>
> This should be ((rw & RW_MASK) == WRITE) too? Anyway, this seems change
> behavior of submit_bh(WRITE_BARRIER) (maybe reiserfs only), it wouldn't
> be your intent...

Yes, I believe the simpler and more correct fix is:

diff --git a/fs/buffer.c b/fs/buffer.c
index 38653e3..16b2263 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2926,14 +2926,13 @@ int submit_bh(int rw, struct buffer_head * bh)
BUG_ON(!buffer_mapped(bh));
BUG_ON(!bh->b_end_io);

- if (buffer_ordered(bh) && (rw == WRITE))
+ if (buffer_ordered(bh) && (rw & WRITE))
rw = WRITE_BARRIER;

/*
- * Only clear out a write error when rewriting, should this
- * include WRITE_SYNC as well?
+ * Only clear out a write error when rewriting
*/
- if (test_set_buffer_req(bh) && (rw == WRITE || rw == WRITE_BARRIER))
+ if (test_set_buffer_req(bh) && (rw & WRITE))
clear_buffer_write_io_error(bh);

/*

--
Jens Axboe

2008-08-22 07:46:19

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: Mount ext3 with barrier=1 doesn't send real barrier bio?

Jens Axboe <[email protected]> writes:

>> This should be ((rw & RW_MASK) == WRITE) too? Anyway, this seems change
>> behavior of submit_bh(WRITE_BARRIER) (maybe reiserfs only), it wouldn't
>> be your intent...
>
> Yes, I believe the simpler and more correct fix is:
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 38653e3..16b2263 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2926,14 +2926,13 @@ int submit_bh(int rw, struct buffer_head * bh)
> BUG_ON(!buffer_mapped(bh));
> BUG_ON(!bh->b_end_io);
>
> - if (buffer_ordered(bh) && (rw == WRITE))
> + if (buffer_ordered(bh) && (rw & WRITE))
> rw = WRITE_BARRIER;

I see. But, umm..., this means WRITE_SYNC with barrier was deprecated?
Or typo?
--
OGAWA Hirofumi <[email protected]>

2008-08-22 07:58:34

by Jens Axboe

[permalink] [raw]
Subject: Re: Mount ext3 with barrier=1 doesn't send real barrier bio?

On Fri, Aug 22 2008, OGAWA Hirofumi wrote:
> Jens Axboe <[email protected]> writes:
>
> >> This should be ((rw & RW_MASK) == WRITE) too? Anyway, this seems change
> >> behavior of submit_bh(WRITE_BARRIER) (maybe reiserfs only), it wouldn't
> >> be your intent...
> >
> > Yes, I believe the simpler and more correct fix is:
> >
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 38653e3..16b2263 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -2926,14 +2926,13 @@ int submit_bh(int rw, struct buffer_head * bh)
> > BUG_ON(!buffer_mapped(bh));
> > BUG_ON(!bh->b_end_io);
> >
> > - if (buffer_ordered(bh) && (rw == WRITE))
> > + if (buffer_ordered(bh) && (rw & WRITE))
> > rw = WRITE_BARRIER;
>
> I see. But, umm..., this means WRITE_SYNC with barrier was deprecated?
> Or typo?

It was supposed to read rw |= WRITE_BARRIER :-)

--
Jens Axboe