2008-12-04 14:01:12

by Mikulas Patocka

Subject: Device loses barrier support (was: Fixed patch for simple barriers.)

On Thu, 4 Dec 2008, Andi Kleen wrote:

> On Thu, Dec 04, 2008 at 12:09:56AM -0500, Mikulas Patocka wrote:
> >
> > BTW. how is this patch supposed to work with pvmove? I.e. you advertise to
> > a filesystem that you support barriers, then the user runs pvmove and you
> > drop barrier support while the filesystem is mounted - that will confuse
> the filesystem and maybe produce data corruption. I wouldn't recommend
>
> File systems handle this generally. Also the pvmove itself will
> act as a barrier.
>
> -Andi

How do you want to handle this?

Imagine:
the filesystem submits a 1st write request
the filesystem submits a 2nd write barrier request
the filesystem submits a 3rd write request

... time passes ...

the 1st write request ends with success
the 2nd write request ends with -EOPNOTSUPP
the 3rd write request ends with success

--- when you first see -EOPNOTSUPP, you have already corrupted the
filesystem (the 3rd write passed while the filesystem expected that it
would be finished after the 2nd write) and you are in an interrupt
context, where you can't reissue the -EOPNOTSUPP request. So what do you
want to do?
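
To make the failure mode concrete, here is a minimal sketch of that
submission pattern against the 2.6-era bio API (submit_bio, WRITE_BARRIER,
bi_end_io); fs_end_io and fs_commit are hypothetical names, not code from
any real filesystem:

#include <linux/bio.h>
#include <linux/fs.h>

/* Completion callback -- runs in interrupt context when the I/O ends. */
static void fs_end_io(struct bio *bio, int error)
{
        if (error == -EOPNOTSUPP && bio_barrier(bio)) {
                /*
                 * Too late: we can't sleep or resubmit here, and the
                 * 3rd write is already in flight, so the ordering the
                 * filesystem relied on is gone.
                 */
        }
        bio_put(bio);
}

static void fs_commit(struct bio *b1, struct bio *b2, struct bio *b3)
{
        b1->bi_end_io = fs_end_io;
        b2->bi_end_io = fs_end_io;
        b3->bi_end_io = fs_end_io;

        submit_bio(WRITE, b1);          /* 1st write request        */
        submit_bio(WRITE_BARRIER, b2);  /* 2nd write, the barrier   */
        submit_bio(WRITE, b3);          /* 3rd write request        */
}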


Possible ways how to solve it:

1) Wait synchronously for barriers; don't issue any other writes while a
barrier is pending.

- this basically suppresses any performance advantage barriers could have.
Ext3 is doing this. (A sketch of this approach follows the list below.)

- this solution is correct. But if this is "the way it should be done", you
could rip barriers out of the kernel completely and replace them with a
simple call to flush the hardware cache. Used this way, they have no
advantage over a simple cache flush.

2) Resubmit the failed -EOPNOTSUPP request from a thread.

- this is what XFS is doing. Bad for code complexity (there must be a
special thread just to catch failed IOs). Also, it still produces a
corrupted filesystem for a brief period of time.

3) Fail barriers only synchronously? (so that the caller can detect
missing barrier support before issuing other writes)

- unimplementable in device mapper: if the device is suspended, it
queues bios.

4) Disallow losing barrier support?

- to me this looks like a sensible solution.
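
For illustration, here is a rough sketch of option 1 against the 2.6-era
block API; the helper names are made up and error handling is reduced to
the bare minimum:

#include <linux/bio.h>
#include <linux/completion.h>
#include <linux/fs.h>

/* Record a failed barrier in the bio flags and wake the submitter. */
static void barrier_end_io(struct bio *bio, int error)
{
        if (error == -EOPNOTSUPP)
                set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
        complete(bio->bi_private);
}

/*
 * Option 1: the commit block is the only write in flight, so the
 * caller can sleep on it and still see -EOPNOTSUPP "synchronously",
 * before any later write has been issued.
 */
static int write_barrier_and_wait(struct bio *bio)
{
        DECLARE_COMPLETION_ONSTACK(done);

        bio->bi_private = &done;
        bio->bi_end_io = barrier_end_io;

        submit_bio(WRITE_BARRIER, bio);
        wait_for_completion(&done);

        if (bio_flagged(bio, BIO_EOPNOTSUPP))
                return -EOPNOTSUPP;     /* caller falls back to a cache
                                           flush plus a plain write */
        return bio_flagged(bio, BIO_UPTODATE) ? 0 : -EIO;
}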

Mikulas


2008-12-04 14:09:21

by Andi Kleen

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

> the 1st write request ends with success
> the 2nd write request ends with -EOPNOTSUPP
> the 3rd write request ends with success
>
> --- when you first see -EOPNOTSUPP, you have already corrupted the filesystem
> (the 3rd write passed while the filesystem expected that it would be

There's no passing of requests during pvmove. It's a really strong
barrier.

> finished after the 2nd write) and you are in an interrupt context, where
> you can't reissue the -EOPNOTSUPP request. So what do you want to do?

The barrier aware file systems I know of just resubmit synchronously when
a barrier fails.

-Andi

2008-12-04 14:17:20

by Mikulas Patocka

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

On Thu, 4 Dec 2008, Andi Kleen wrote:

> > the 1st write request ends with success
> > the 2nd write request ends with -EOPNOTSUPP
> > the 3rd write request ends with success
> >
> > --- when you first see -EOPNOTSUPP, you have already corrupted the filesystem
> > (the 3rd write passed while the filesystem expected that it would be
>
> There's no passing of requests during pvmove. It's a really strong
> barrier.

You start pvmove. The filesystem doesn't know about pvmove.

The next time the filesystem does something, it submits these 3 requests
and the 2nd will unexpectedly fail.

So the fact that pvmove drains the request queue won't help you.

> > finished after the 2nd write) and you are in an interrupt context, where
> > you can't reissue the -EOPNOTSUPP request. So what do you want to do?
>
> The barrier aware file systems I know of just resubmit synchronously when
> a barrier fails.

... and produce structure corruption for a certain period of time, because
the writes meant to be ordered are submitted unordered.

Mikulas

> -Andi
>

2008-12-04 14:47:20

by Andi Kleen

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

> You start pvmove. The filesystem doesn't know about pvmove.
>
> The next time the filesystem does something, it submits these 3 requests
> and the 2nd will unexpectedly fail.

Again, the file systems handle failing barriers; they expect it.

>
> So the fact that pvmove drains the request queue won't help you.

Help you against what?

>
> > > finished after the 2nd write) and you are in an interrupt context, where
> > > you can't reissue the -EOPNOTSUPP request. So what do you want to do?
> >
> > The barrier aware file systems I know of just resubmit synchronously when
> > a barrier fails.
>
> ... and produce structure corruption for a certain period of time, because
> the writes meant to be ordered are submitted unordered.

No there is nothing unordered. The file system path typically looks like

commit of a transaction:
    if (I have never seen a barrier failing) {
        write block with barrier
        if (EOPNOTSUPP) {
            record failure
            submit synchronously
        }
    } else
        submit synchronously


So if a pvmove barrier fails it will just submit synchronously.

The write block with barrier bit varies, jbd/gfs2 do it synchronously
too and xfs does it asynchronously (with io done callbacks), but
in both cases they handle an EOPNOTSUPP coming out in the final
io done.

When pvmove migrates from no barrier support to barrier support,
there won't be any barriers on the file system for the rest of the
current mount, but that's also fine.

-Andi

2008-12-04 16:46:32

by Mikulas Patocka

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

> > > > finished after the 2nd write) and you are in an interrupt context, where
> > > > you can't reissue -EOPNOTSUPP request. So what do you want to do?
> > >
> > > The barrier aware file systems I know of just resubmit synchronously when
> > > a barrier fails.
> >
> > ... and produce structure corruption for a certain period of time, because
> > the writes meant to be ordered are submitted unordered.
>
> No there is nothing unordered. The file system path typically looks like
>
> commit of a transaction:
>     if (I have never seen a barrier failing) {
>         write block with barrier
>         if (EOPNOTSUPP) {
>             record failure
>             submit synchronously
>         }
>     } else
>         submit synchronously
>

If you view this as the "right" way of using barriers, then you can drop
barrier support altogether and replace this code sequence with:

flush disk cache
submit write synchronously
flush disk cache

--- because synchronous barriers bring you no performance advantage over
the above sequence.

> So if a pvmove barrier fails it will just submit synchronously.
>
> The write block with barrier bit varies, jbd/gfs2 do it synchronously
> too and xfs does it asynchronously (with io done callbacks), but

And how does xfs preserve write ordering, if the barrier asynchronously
fails with -EOPNOTSUPP and there are other writes submitted after the
barrier?

> in both cases they handle an EOPNOTSUPP coming out in the final
> io done.

Mikulas

2008-12-04 17:43:47

by Andi Kleen

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

On Thu, Dec 04, 2008 at 11:45:44AM -0500, Mikulas Patocka wrote:
> > No there is nothing unordered. The file system path typically looks like
> >
> > commit of a transaction:
> >     if (I have never seen a barrier failing) {
> >         write block with barrier
> >         if (EOPNOTSUPP) {
> >             record failure
> >             submit synchronously
> >         }
> >     } else
> >         submit synchronously
> >
>
> If you view this as the "right" way of using barriers, then you can drop

It's the way the file systems do it. If you don't believe me feel
free to read the code for yourself.

> barrier support altogether and replace this code sequence with:
>
> flush disk cache
> submit write synchronously
> flush disk cache
>
> --- because synchronous barriers bring you no performance advantage over
> the above sequence.

Remember this is done by a commit thread in a journaling file system.
Commits are ordered, so the thread cannot really issue them out of
order anyway. And yes, the barriers are essentially a way to flush the
cache regularly for selected commits. The alternative (if you
want to guarantee transaction order) would be to disable
the write cache completely and do it synchronously on each IO.

>
> > So if a pvmove barrier fails it will just submit synchronously.
> >
> > The write block with barrier bit varies, jbd/gfs2 do it synchronously
> > too and xfs does it asynchronously (with io done callbacks), but
>
> And how does xfs preserve write ordering, if the barrier asynchronously
> fails with -EOPNOTSUPP and there are other writes submitted after the
> barrier?

From the high level journaling perspective they are not asynchronous
I think. Just the low level xfs_buf interface happens to use the asynchronous
callbacks instead of calling into the block layer directly like jbd et al.
do.

-Andi

2008-12-04 17:53:24

by Christoph Hellwig

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

On Thu, Dec 04, 2008 at 06:48:38PM +0100, Andi Kleen wrote:
> I think. Just the low level xfs_buf interface happens to use the asynchronous
> callbacks instead of calling into the block layer directly like jbd et al.
> do.

Only delwri buffers are delayed in XFS, but the journaling code only
uses async buffers, which *synchronously* call into the block layer but
just don't wait for the I/O to complete.

2008-12-04 19:42:53

by Mikulas Patocka

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)



On Thu, 4 Dec 2008, Andi Kleen wrote:

> On Thu, Dec 04, 2008 at 11:45:44AM -0500, Mikulas Patocka wrote:
> > > No there is nothing unordered. The file system path typically looks like
> > >
> > > commit of a transaction:
> > >     if (I have never seen a barrier failing) {
> > >         write block with barrier
> > >         if (EOPNOTSUPP) {
> > >             record failure
> > >             submit synchronously
> > >         }
> > >     } else
> > >         submit synchronously
> > >
> >
> > If you view this as the "right" way of using barriers, then you can drop
>
> It's the way the file systems do it. If you don't believe me feel
> free to read the code for yourself.
>
> > barrier support altogether and replace this code sequence with:
> >
> > flush disk cache
> > submit write synchronously
> > flush disk cache
> >
> > --- because synchronous barriers bring you no performance advantage over
> > the above sequence.
>
> Remember this is done by a commit thread in a journaling file system.
> Commits are ordered, so the thread cannot really issue them out of
> order anyway. And yes, the barriers are essentially a way to flush the
> cache regularly for selected commits. The alternative (if you
> want to guarantee transaction order) would be to disable
> the write cache completely and do it synchronously on each IO.

Some time ago, I used barriers the asynchronous way in the spadfs
filesystem and they helped very much. The commit logic is:

- take the transaction lock (it prevents any updates)
- write the remaining dirty buffers (but don't wait)
- submit a barrier write to move to the new transaction (but don't wait)
- drop the transaction lock
- possibly wait for completion of the writes (if this was issued by fsync
or sync), or don't wait at all if it was issued by other things (the
periodic syncer, an emergency sync to reclaim free space, and so on)

The advantage of this approach is that the lock is held for a very short
time; no IOs are actually waited for inside the lock. So it doesn't block
concurrent activity while someone is committing. Note that after the lock
is dropped, anyone can for example call mark_buffer_dirty, but writing of
the new buffer won't be reordered with the barrier write that is already
pending in the queue, so consistency is maintained.

I somehow got the idea that this was the reason why barriers are
implemented the way they are and that this was their intended mode of
operation. Note that you can't achieve the same thing (not waiting inside
the lock) if you submit barriers synchronously or if you use non-barrier
writes and disk cache flushes.
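
A minimal sketch of that commit sequence, assuming the 2.6-era
buffer_head API (the lock and buffer parameters are illustrative, not
actual spadfs code; commit_bh is assumed mapped and prepared by the
caller):

#include <linux/buffer_head.h>
#include <linux/fs.h>
#include <linux/mutex.h>

static void spadfs_style_commit(struct mutex *trans_lock,
                                struct buffer_head **dirty_bhs, int nr,
                                struct buffer_head *commit_bh,
                                int issued_by_fsync)
{
        int i;

        mutex_lock(trans_lock);                 /* blocks all updates */

        /* Write the remaining dirty buffers, but don't wait for them. */
        ll_rw_block(WRITE, nr, dirty_bhs);

        /* Barrier write that opens the new transaction -- also no wait. */
        get_bh(commit_bh);
        lock_buffer(commit_bh);
        clear_buffer_dirty(commit_bh);
        set_buffer_uptodate(commit_bh);
        commit_bh->b_end_io = end_buffer_write_sync;
        set_buffer_ordered(commit_bh);          /* BIO_RW_BARRIER */
        submit_bh(WRITE, commit_bh);

        mutex_unlock(trans_lock);       /* no I/O was waited for here */

        /* Only fsync/sync wait; the periodic syncer etc. don't. */
        if (issued_by_fsync) {
                for (i = 0; i < nr; i++)
                        wait_on_buffer(dirty_bhs[i]);
                wait_on_buffer(commit_bh);
        }
}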

If you are pushing what you are pushing --- barriers being allowed to
return EOPNOTSUPP at any time --- then asynchronous barrier submission can
no longer be used, because by the time EOPNOTSUPP is detected, the
filesystem is already corrupted.

Also, read Documentation/block/barrier.txt and see what you are breaking
in the document:

"all requests queued after the barrier request must be started only after
the barrier request is finished (again, made it to the physical medium)."

"Preceding requests are processed before the barrier and
following requests after."

--- you are going to break these rules. If the barrier can return
-EOPNOTSUPP at any time, the following request can be finished before the
barrier write is finished (because the barrier write must be resubmitted
without the barrier).

Basically, if you allow randomly failing barriers, you can drop barriers
altogether and replace them with blkdev_issue_flush();
write&wait_synchronous(); blkdev_issue_flush(). There is no more reason to
maintain the complicated logic of barriers if you cripple them to the
point of uselessness, where they randomly fail.
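
That replacement sequence, as a minimal sketch against the same 2.6-era
block API (the completion plumbing is illustrative):

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/completion.h>
#include <linux/fs.h>

/* Wake the submitter when the data write finishes. */
static void write_end_io(struct bio *bio, int error)
{
        complete(bio->bi_private);
}

/* flush + synchronous write + flush, with no barrier anywhere. */
static int commit_without_barriers(struct block_device *bdev,
                                   struct bio *bio)
{
        DECLARE_COMPLETION_ONSTACK(done);

        blkdev_issue_flush(bdev, NULL);         /* 1st cache flush */

        bio->bi_private = &done;
        bio->bi_end_io = write_end_io;
        submit_bio(WRITE, bio);                 /* write & wait    */
        wait_for_completion(&done);
        if (!bio_flagged(bio, BIO_UPTODATE))
                return -EIO;

        return blkdev_issue_flush(bdev, NULL);  /* 2nd cache flush */
}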


Another thing:

I'm wondering, where in fsync() does Linux wait for hardware disk cache to
be flushed? Isn't there a bug that fsync() will return before the cache is
flushed? I couldn't really find it. The last thing do_fsync calls is
filemap_fdatawait and it doesn't do cache flush (blkdev_issue_flush).

Mikulas

2008-12-04 22:04:52

by Andi Kleen

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

> If you are pushing what you are pushing --- barriers being allowed to
> return EOPNOTSUPP at any time --- then asynchronous barrier submission can
> no longer be used, because by the time EOPNOTSUPP is detected, the
> filesystem is already corrupted.

Chris Mason pointed out that this can actually already happen. From
a quick review this can happen in MD raid1 at least (their barriers_work
flag is pretty similar to the DM implementation I did). So everyone
has to handle this already anyway.

> I'm wondering, where in fsync() does Linux wait for hardware disk cache to
> be flushed? Isn't there a bug that fsync() will return before the cache is
> flushed? I couldn't really find it. The last thing do_fsync calls is
> filemap_fdatawait and it doesn't do cache flush (blkdev_issue_flush).

At least in fsync() on a journaling fs the metadata update should push it.

-Andi

--
[email protected]

2008-12-04 23:08:49

by Mikulas Patocka

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

On Thu, 4 Dec 2008, Andi Kleen wrote:

> > If you are pushing what you are pushing --- barriers being allowed to
> > return EOPNOTSUPP at any time --- then asynchronous barrier submission can
> > no longer be used, because by the time EOPNOTSUPP is detected, the
> > filesystem is already corrupted.
>
> Chris Mason pointed out that this can actually already happen. From
> a quick review this can happen in MD raid1 at least (their barriers_work
> flag is pretty similar to the DM implementation I did). So everyone
> has to handle this already anyway.

So: remove barriers completely and use only blkdev_issue_flush to flush
the disk cache. None of the major filesystems has learned to use
barrier-optimized commits, and this
"barriers-randomly-fail-with-EOPNOTSUPP" fact makes it impossible to use
them in an optimized way anyway.

There is another point:
"what is the main performance advantage of barriers?" - "that the user can
turn on the hardware write cache with the hdparm -W 1 command".

And if barriers fail at random points, the user can't turn on the disk cache
anyway (he would get data corruption if a barrier write failed and the hardware
write cache was enabled). So barriers make no sense here.

> > I'm wondering, where in fsync() does Linux wait for hardware disk cache to
> > be flushed? Isn't there a bug that fsync() will return before the cache is
> > flushed? I couldn't really find it. The last thing do_fsync calls is
> > filemap_fdatawait and it doesn't do cache flush (blkdev_issue_flush).
>
> At least in fsync() on journaling fs the metadata update should push it.
>
> -Andi

And what about fdatasync()?

Mikulas

2008-12-05 00:37:50

by Andi Kleen

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

> And if barriers fail at random points, the user can't turn on the disk cache
> anyway (he would get data corruption if a barrier write failed and the hardware

I think we already established earlier in the thread that there is no disk
corruption.

> > > I'm wondering, where in fsync() does Linux wait for hardware disk cache to
> > > be flushed? Isn't there a bug that fsync() will return before the cache is
> > > flushed? I couldn't really find it. The last thing do_fsync calls is
> > > filemap_fdatawait and it doesn't do cache flush (blkdev_issue_flush).
> >
> > At least in fsync() on journaling fs the metadata update should push it.
> >
> > -Andi
>
> And what about fdatasync()?

I don't know. The surest way to find out is to instrument it and try.

-Andi
--
[email protected]

2008-12-05 01:17:15

by Mikulas Patocka

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

On Fri, 5 Dec 2008, Andi Kleen wrote:

> > And if barriers fail at random points, the user can't turn on the disk cache
> > anyway (he would get data corruption if a barrier write failed and the hardware
>
> I think we already established earlier in the thread that there is no disk
> corruption.

So, the facts are:

* barrier support in md-raid1 deviates from the specification at
Documentation/block/barrier.txt. The specification says that requests
submitted after the barrier request hit the media after the barrier
request hits the media. The reality is that the barrier request can be
randomly aborted and the requests submitted after it hit the media before
the barrier request.

* the filesystems developed hacks to work around this issue, the hacks
involve not submitting more requests after the barrier request,
synchronously waiting for the barrier request and, if needed, retrying it.
These hacks suppress any performance advantage barriers could bring.

* you submitted a patch that makes barriers deviate from the specification
even more often, and you argue that the patch is correct because
filesystems handle this deviation.

This is runaway logic that will eventually turn Linux into an
unmaintainable mess. What do you think we'll be doing when we implement
barriers in other dm targets? Do you really think it'll be fun to write
code to double-submit all metadata writes just because you and some person
at md-raid1 provided an unreliable interface?

I am again repeating: either make barriers consistent with the
specification, or remove them altogether.

Mikulas

2008-12-05 01:26:40

by Andi Kleen

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

> * barrier support in md-raid1 deviates from the specification at
> Documentation/block/barrier.txt. The specification says that requests
> submitted after the barrier request hit the media after the barrier
> request hits the media. The reality is that the barrier request can be
> randomly aborted and the requests submitted after it hit the media before
> the barrier request.

Yes, the spec should probably be updated.

But also see Linus' rant from yesterday about code vs documentation.
When in doubt the code wins.
>
> * the filesystems developed hacks to work around this issue, the hacks
> involve not submitting more requests after the barrier request,

I suspect the reason the file systems did it this way is that
it was a much simpler change than to rewrite the transaction
manager for this.

> synchronously waiting for the barrier request and, if needed, retrying it.
> These hacks suppress any performance advantage barriers could bring.
>
> * you submitted a patch that makes barriers deviate from the specification
> even more often, and you argue that the patch is correct because
> filesystems handle this deviation.

Sorry, what counts is the code behaviour, not the specification.

-Andi

--
[email protected]

2008-12-05 02:26:57

by Mikulas Patocka

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)



On Fri, 5 Dec 2008, Andi Kleen wrote:

> > * barrier support in md-raid1 deviates from the specification at
> > Documentation/block/barrier.txt. The specification says that requests
> > submitted after the barrier request hit the media after the barrier
> > request hits the media. The reality is that the barrier request can be
> > randomly aborted and the requests submitted after it hit the media before
> > the barrier request.
>
> Yes, the spec should probably be updated.
>
> But also see Linus' rant from yesterday about code vs documentation.
> When in doubt the code wins.

The only offender is "md". It is less overhead to change "md" to play
nice and be reliable than to double-submit requests in all the places that
need write ordering.

> > * the filesystems developed hacks to work around this issue, the hacks
> > involve not submitting more requests after the barrier request,
>
> I suspect the reason the file systems did it this way is that
> it was a much simpler change than to rewrite the transaction
> manager for this.

It could be the initial reason. But this unreliability also disallows any
improvement in filesystems. No one can write an asynchronous transaction
manager because of that evil EOPNOTSUPP.

> > synchronously waiting for the barrier request and, if needed, retrying it.
> > These hacks suppress any performance advantage barriers could bring.
> >
> > * you submitted a patch that makes barriers deviate from the specification
> > even more often, and you argue that the patch is correct because
> > filesystems handle this deviation.
>
> Sorry, what counts is the code behaviour, not the specification.

The better interface is the one that has less maintenance overhead. And I
don't see requiring the programmers of all IO code to double-submit
requests as less maintenance overhead.

> -Andi

Mikulas

---

If you want to make it easier to infer functionality from the code, apply
this patch :)

---
block/blk-core.c | 8 ++++++++
1 file changed, 8 insertions(+)

Index: linux-2.6.28-rc5-devel/block/blk-core.c
===================================================================
--- linux-2.6.28-rc5-devel.orig/block/blk-core.c	2008-12-05 02:54:25.000000000 +0100
+++ linux-2.6.28-rc5-devel/block/blk-core.c	2008-12-05 03:14:23.000000000 +0100
@@ -28,6 +28,7 @@
 #include <linux/task_io_accounting_ops.h>
 #include <linux/blktrace_api.h>
 #include <linux/fault-inject.h>
+#include <linux/random.h>
 
 #include "blk.h"
 
@@ -1528,6 +1529,13 @@ void submit_bio(int rw, struct bio *bio)
 
 	bio->bi_rw |= rw;
 
+	/* At least, make the true nature of write barriers obvious. */
+
+	if (bio_barrier(bio) && !(random32() % 42)) {
+		bio_endio(bio, -EOPNOTSUPP);
+		return;
+	}
+
 	/*
 	 * If it's a regular read/write or a barrier with data attached,
 	 * go through the normal accounting stuff before submission.

2008-12-05 02:58:37

by Andi Kleen

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

> The only offender is "md".

I'm not sure. It wouldn't surprise me if it can happen with other setups
too. Perhaps Chris knows more.

> It is less overhead to change "md" to play
> nice and be reliable than to double-submit requests in all the places that
> need write ordering.

They do that already anyway.

>
> > > * the filesystems developed hacks to work around this issue, the hacks
> > > involve not submitting more requests after the barrier request,
> >
> > I suspect the reason the file systems did it this way is that
> > it was a much simpler change than to rewrite the transaction
> > manager for this.
>
> It could be the initial reason. But this unreliability also disallows any
> improvement in filesystems. No one can write an asynchronous transaction
> manager because of that evil EOPNOTSUPP.

Doesn't seem right. It would be a simple state machine
to handle it fully asynchronously. Alternatively you could
always use empty barriers.

But we can worry about that when some in-tree file system
actually tries to do that.
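
One possible shape of such a state machine, sketched against the 2.6-era
block API; all names are hypothetical, and allocation failure and bio
re-initialisation are elided:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct commit_state {
        struct work_struct work;
        struct block_device *bdev;
        struct bio *bio;
};

/* Process context: flushing and blocking are allowed here. */
static void barrier_fallback_fn(struct work_struct *work)
{
        struct commit_state *s = container_of(work, struct commit_state,
                                              work);

        blkdev_issue_flush(s->bdev, NULL);
        /* A real fs would re-initialise the bio before resubmitting. */
        submit_bio(WRITE, s->bio);
        /* ...then wait for it, flush again, and advance the commit. */
}

/* I/O completion -- interrupt context, so only record state and punt. */
static void commit_end_io(struct bio *bio, int error)
{
        struct commit_state *s = bio->bi_private;

        if (error == -EOPNOTSUPP) {
                schedule_work(&s->work);        /* state: retrying */
                return;
        }
        kfree(s);               /* state: committed; advance here */
}

static void submit_commit_barrier(struct block_device *bdev,
                                  struct bio *bio)
{
        /* Allocation failure handling elided for brevity. */
        struct commit_state *s = kmalloc(sizeof(*s), GFP_NOFS);

        INIT_WORK(&s->work, barrier_fallback_fn);
        s->bdev = bdev;
        s->bio = bio;
        bio->bi_private = s;
        bio->bi_end_io = commit_end_io;
        submit_bio(WRITE_BARRIER, bio);
}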

-Andi

--
[email protected]

2008-12-05 03:27:26

by Eric Sandeen

Subject: Re: Device loses barrier support

Mikulas Patocka wrote:

> Another thing:
>
> I'm wondering, where in fsync() does Linux wait for hardware disk cache to
> be flushed? Isn't there a bug that fsync() will return before the cache is
> flushed? I couldn't really find it. The last thing do_fsync calls is
> filemap_fdatawait and it doesn't do cache flush (blkdev_issue_flush).

ext4, reiserfs, and xfs all call blkdev_issue_flush() in their ->fsync
file operations (or down that path).
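
For reference, the rough shape such an ->fsync takes in this era; the
myfs_* names are hypothetical and the real filesystems differ in detail:

#include <linux/blkdev.h>
#include <linux/fs.h>

/* Hypothetical stand-in for the filesystem's own commit path. */
static int myfs_commit_metadata(struct inode *inode)
{
        /* journal commit / metadata writeback elided */
        return 0;
}

static int myfs_fsync(struct file *file, struct dentry *dentry,
                      int datasync)
{
        struct inode *inode = dentry->d_inode;
        int err;

        err = myfs_commit_metadata(inode);
        if (err)
                return err;

        /* Push the disk's volatile write cache out to the medium. */
        return blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
}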

-Eric

2008-12-05 05:56:17

by Timothy Shimmin

Subject: Re: Device loses barrier support

Andi Kleen wrote:
> The write block with barrier bit varies, jbd/gfs2 do it synchronously
> too and xfs does it asynchronously (with io done callbacks), but
> in both cases they handle an EOPNOTSUPP coming out in the final
> io done.
>
Yes, XFS handles it; however,
it doesn't look like we currently propagate the EOPNOTSUPP
up to where we test it (it is not set for b_error).
A patch was discussed recently on the xfs list to rectify this.

--Tim

2008-12-05 11:54:15

by Alan

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

On Fri, 5 Dec 2008 02:37:39 +0100
Andi Kleen <[email protected]> wrote:

> > * barrier support in md-raid1 deviates from the specification at
> > Documentation/block/barrier.txt. The specification says that requests
> > submitted after the barrier request hit the media after the barrier
> > request hits the media. The reality is that the barrier request can be
> > randomly aborted and the requests submitted after it hit the media before
> > the barrier request.
>
> Yes, the spec should probably be updated.
>
> But also see Linus' rant from yesterday about code vs documentation.
> When in doubt the code wins.

Not when the fundamental design of the code is broken and trashes
performance. The documented behaviour is strongly desirable.

Alan

2008-12-05 12:18:29

by Andi Kleen

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

> Not when the fundamental design of the code is broken and trashes
> performance.

Sorry but that's just not correct. There's nothing in late failing
barriers that "trashes performance". The file system writers have
to be careful to handle it, but at least the current ones all do.
And also, if someone writes a hypothetical fully asynchronously driven
barrier-based IO transaction system, it would still be possible to handle
the late failing barrier without too many complications.

Also, late failing barriers are pretty much the only sane way to implement
barriers in software remapping schemes like DM and MD.

-Andi

--
[email protected]

2008-12-05 18:21:23

by Bodo Eggert

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

Andi Kleen <[email protected]> wrote:

>> Not when the fundamental design of the code is broken and trashes
>> performance.
>
> Sorry but that's just not correct. There's nothing in late failing
> barriers that "trashes performance". The file system writers have
> to be careful to handle it, but at least the current ones all do.

So let's keep requiring the "trashes performance" hacks, because they're
not directly in the barriers code? Hey, we nailed one foot to the ground,
and since it's not the nail's fault, let's do the same to the other foot?
It does not matter, since we can't move around right now, so it's a pretty
neat idea, isn't it?

> And also, if someone writes a hypothetical fully asynchronously driven
> barrier-based IO transaction system, it would still be possible to handle
> the late failing barrier without too many complications.

You can just replace barriers with NOPs without too many complications,
because it only matters in cases of system failure. And even then,
it's only the difference between having a filesystem corruption and not
having it ... Who'd care?

> Also, late failing barriers are pretty much the only sane way to implement
> barriers in software remapping schemes like DM and MD.

No, and seeing the same resubmit logic in more than one place should give
you the clue, even if you don't grasp that intermixing requests which
explicitly MUST NOT be intermixed is wrong.

The sane way is having a software barrier in DM and MD, falling back to
sync if the new hardware does not support barriers. The
filesystems should not even need to know whether barriers are supported;
they should just use them, and they'll DTRT in any case. Any other
interface is worse than useless.

2008-12-05 18:30:21

by Andi Kleen

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

On Fri, Dec 05, 2008 at 07:21:06PM +0100, Bodo Eggert wrote:
> Andi Kleen <[email protected]> wrote:
>
> >> Not when the fundamental design of the code is broken and trashes
> >> performance.
> >
> > Sorry but that's just not correct. There's nothing in late failing
> > barriers that "trashes performance". The file system writers have
> > to be careful to handle it, but at least the current ones all do.
>
> So let's keep requiring the "trashes performance" hacks, because they're

Again, there's nothing about the code that handles this that
"trashes performance".

Mikulas was just ranting that the current file systems don't use
barriers in the way he thinks they ought to be used, but that
is completely orthogonal to the problem of whether barriers may fail
late or not.

In Mikulas's imaginary barrier world it would be slightly more
complicated to handle it, but far from impossible. And again,
it wouldn't affect performance. In the real Linux barrier world,
handling late failing barriers in the fs is also currently very easy.

-Andi

2008-12-07 04:17:24

by Dave Chinner

Subject: Re: Device loses barrier support (was: Fixed patch for simple barriers.)

On Thu, Dec 04, 2008 at 11:45:44AM -0500, Mikulas Patocka wrote:
> > The write block with barrier bit varies, jbd/gfs2 do it synchronously
> > too and xfs does it asynchronously (with io done callbacks), but
>
> And how does xfs preserve write ordering, if the barrier asynchronously
> fails with -EOPNOTSUPP and there are other writes submitted after the
> barrier?

Doesn't matter. XFS executes journal state changes during the I/O
completion callbacks a layer above this resubmit code. Hence,
if the I/O is resubmitted before the completion callbacks are
run, it just appears that the I/O has taken longer than expected
and the state change is delayed....

Cheers,

Dave.
--
Dave Chinner
[email protected]