2008-02-15 12:08:35

by Andi Kleen

Subject: [PATCH] Implement barrier support for single device DM devices

Implement barrier support for single device DM devices

This patch implements barrier support in DM for the common case of dm linear
just remapping a single underlying device. In this case we can safely
pass the barrier through because there can be no reordering between
devices.

Signed-off-by: Andi Kleen <[email protected]>

---
drivers/md/dm-linear.c | 1 +
drivers/md/dm-table.c | 27 ++++++++++++++++++++++++++-
drivers/md/dm.c | 14 ++++----------
drivers/md/dm.h | 2 ++
4 files changed, 33 insertions(+), 11 deletions(-)

Index: linux/drivers/md/dm-table.c
===================================================================
--- linux.orig/drivers/md/dm-table.c
+++ linux/drivers/md/dm-table.c
@@ -38,6 +38,9 @@ struct dm_table {
sector_t *highs;
struct dm_target *targets;

+ unsigned single_device : 1;
+ unsigned barrier_supported : 1;
+
/*
* Indicates the rw permissions for the new logical
* device. This should be a combination of FMODE_READ
@@ -584,12 +587,21 @@ EXPORT_SYMBOL_GPL(dm_set_device_limits);
int dm_get_device(struct dm_target *ti, const char *path, sector_t start,
sector_t len, int mode, struct dm_dev **result)
{
- int r = __table_get_device(ti->table, ti, path,
+ struct dm_table *t = ti->table;
+ int r = __table_get_device(t, ti, path,
start, len, mode, result);

if (!r)
dm_set_device_limits(ti, (*result)->bdev);

+ if (!r) {
+ /* Only got single device? */
+ if (t->devices.next->next == &t->devices)
+ t->single_device = 1;
+ else
+ t->single_device = 0;
+ }
+
return r;
}

@@ -1023,6 +1035,16 @@ struct mapped_device *dm_table_get_md(st
return t->md;
}

+int dm_table_barrier_ok(struct dm_table *t)
+{
+ return t->single_device && t->barrier_supported;
+}
+
+void dm_table_support_barrier(struct dm_table *t)
+{
+ t->barrier_supported = 1;
+}
+
EXPORT_SYMBOL(dm_vcalloc);
EXPORT_SYMBOL(dm_get_device);
EXPORT_SYMBOL(dm_put_device);
@@ -1033,3 +1055,5 @@ EXPORT_SYMBOL(dm_table_get_md);
EXPORT_SYMBOL(dm_table_put);
EXPORT_SYMBOL(dm_table_get);
EXPORT_SYMBOL(dm_table_unplug_all);
+EXPORT_SYMBOL(dm_table_barrier_ok);
+EXPORT_SYMBOL(dm_table_support_barrier);
Index: linux/drivers/md/dm.c
===================================================================
--- linux.orig/drivers/md/dm.c
+++ linux/drivers/md/dm.c
@@ -801,7 +801,10 @@ static int __split_bio(struct mapped_dev
ci.map = dm_get_table(md);
if (unlikely(!ci.map))
return -EIO;
-
+ if (unlikely(bio_barrier(bio) && !dm_table_barrier_ok(ci.map))) {
+ bio_endio(bio, -EOPNOTSUPP);
+ return 0;
+ }
ci.md = md;
ci.bio = bio;
ci.io = alloc_io(md);
@@ -837,15 +840,6 @@ static int dm_request(struct request_que
int rw = bio_data_dir(bio);
struct mapped_device *md = q->queuedata;

- /*
- * There is no use in forwarding any barrier request since we can't
- * guarantee it is (or can be) handled by the targets correctly.
- */
- if (unlikely(bio_barrier(bio))) {
- bio_endio(bio, -EOPNOTSUPP);
- return 0;
- }
-
down_read(&md->io_lock);

disk_stat_inc(dm_disk(md), ios[rw]);
Index: linux/drivers/md/dm.h
===================================================================
--- linux.orig/drivers/md/dm.h
+++ linux/drivers/md/dm.h
@@ -116,6 +116,8 @@ void dm_table_unplug_all(struct dm_table
* To check the return value from dm_table_find_target().
*/
#define dm_target_is_valid(t) ((t)->table)
+int dm_table_barrier_ok(struct dm_table *t);
+void dm_table_support_barrier(struct dm_table *t);

/*-----------------------------------------------------------------
* A registry of target types.
Index: linux/drivers/md/dm-linear.c
===================================================================
--- linux.orig/drivers/md/dm-linear.c
+++ linux/drivers/md/dm-linear.c
@@ -52,6 +52,7 @@ static int linear_ctr(struct dm_target *
ti->error = "dm-linear: Device lookup failed";
goto bad;
}
+ dm_table_support_barrier(ti->table);

ti->private = lc;
return 0;


2008-02-15 12:20:33

by Alasdair G Kergon

Subject: Re: [PATCH] Implement barrier support for single device DM devices

On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote:
> Implement barrier support for single device DM devices

Thanks. We've got some (more-invasive) dm patches in the works that
attempt to use flushing to emulate barriers where we can't just
pass them down like that.

Alasdair
--
[email protected]

2008-02-15 13:08:07

by Michael Tokarev

Subject: Re: [PATCH] Implement barrier support for single device DM devices

Alasdair G Kergon wrote:
> On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote:
>> Implement barrier support for single device DM devices
>
> Thanks. We've got some (more-invasive) dm patches in the works that
> attempt to use flushing to emulate barriers where we can't just
> pass them down like that.

I wonder if it's worth the effort to try to implement this.

As far as I understand (*), if a filesystem realizes that the
underlying block device does not support barriers, it will
switch to using regular flushes instead - isn't that the same
thing as you're trying to do at the MD level?

Note that a filesystem must understand barriers/flushes on the
underlying block device anyway, since many disk drives don't
support barriers.

(*) this is, in fact, an interesting question. I still can't
find complete information about this. For example, how safe
is xfs if barriers are not supported or turned off? Is it
"less safe" than with barriers? Will it use regular cache
flushes if barriers are not available? Ditto for ext3fs, though
there, barriers are not enabled by default.

/mjt

2008-02-15 13:44:35

by Andi Kleen

Subject: Re: [PATCH] Implement barrier support for single device DM devices

On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
> Alasdair G Kergon wrote:
> > On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote:
> >> Implement barrier support for single device DM devices
> >
> > Thanks. We've got some (more-invasive) dm patches in the works that
> > attempt to use flushing to emulate barriers where we can't just
> > pass them down like that.
>
> I wonder if it's worth the effort to try to implement this.

DM in theory has some more knowledge to use for optimization. For example, if
it knows that a stream of requests hits only a single device then
it can just pass the barriers through, and only flush when there
is really a request dependency between different devices. File systems can't
do it at that fine a granularity; for them it's either all or nothing.

I don't know if doing it fine grained will make much difference in performance
though. The only way to find out would be to try it.
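
To make that concrete, here is a rough standalone sketch (toy structures
and names, not the real dm code) of the decision involved. The
single-device condition mirrors dm_table_barrier_ok() from the patch;
the other branch is where flush-based emulation would go:

#include <errno.h>

/* Toy stand-ins, not the real dm structures. */
struct toy_table {
	int ndevices;			/* underlying devices mapped by the table */
	int target_supports_barriers;	/* set by targets such as dm-linear */
};

/* Same condition as dm_table_barrier_ok() in the patch. */
static int toy_barrier_ok(const struct toy_table *t)
{
	return t->ndevices == 1 && t->target_supports_barriers;
}

/* Stub: where flush-based emulation across all component devices would go. */
static int toy_flush_all_devices(const struct toy_table *t)
{
	(void)t;
	return -EOPNOTSUPP;	/* today's behaviour: no emulation, just fail */
}

/* The decision for an incoming barrier request. */
static int toy_handle_barrier(const struct toy_table *t)
{
	if (toy_barrier_ok(t))
		return 0;	/* single device: pass the barrier straight through */
	return toy_flush_all_devices(t);
}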

-Andi

2008-02-15 14:12:51

by Alasdair G Kergon

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote:
> On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
> > I wonder if it's worth the effort to try to implement this.

My personal view (which seems to be in the minority) is that it's a
waste of our development time *except* in the (rare?) cases similar to
the ones Andi is talking about.

But the decision has already been made for us in the block layer:
dm is now pretty much required to support (zero-length) barriers.
Unfortunately we didn't get this finished in time for 2.6.25, but we
intend to get it done for 2.6.26: Woe betide any callers today that
don't handle EOPNOTSUPP!
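
To spell out what "handling EOPNOTSUPP" means for such a caller, here is a
purely illustrative standalone sketch (toy types and a stub submit helper,
not the real bio/block-layer API) of the usual retry-without-barrier fallback:

#include <errno.h>
#include <stdbool.h>

struct toy_write {
	bool barrier;	/* ask for barrier/flush semantics on this write */
};

/* Stub standing in for "submit the write and wait for completion". */
static int toy_submit_and_wait(struct toy_write *w)
{
	(void)w;
	return 0;	/* or -EOPNOTSUPP from a device that can't do barriers */
}

/* Issue a write we would like to be a barrier.  If the device completes
 * it with -EOPNOTSUPP, remember that barriers don't work, clear the flag
 * and resubmit the write as an ordinary one. */
static int toy_barrier_write(struct toy_write *w, bool *barriers_work)
{
	int err;

	w->barrier = *barriers_work;
	err = toy_submit_and_wait(w);
	if (err == -EOPNOTSUPP && w->barrier) {
		*barriers_work = false;
		w->barrier = false;
		err = toy_submit_and_wait(w);
	}
	return err;
}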

Alasdair
--
[email protected]

2008-02-15 14:59:13

by Andi Kleen

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

On Fri, Feb 15, 2008 at 02:12:29PM +0000, Alasdair G Kergon wrote:
> On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote:
> > On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
> > > I wonder if it's worth the effort to try to implement this.
>
> My personal view (which seems to be in the minority) is that it's a
> waste of our development time *except* in the (rare?) cases similar to

At least for my machines it is the standard case; it is not rare.

And don't RH distributions install with LVM by default these days?
For those it should be the standard case too on all systems with
only a single disk.

The other relatively simple case I plan to look into (in fact
I already wrote something, but it's not postable yet) is dm-crypt
on a single device. But it's a little more complicated than the
simple dm-linear case.

-Andi

2008-02-15 15:41:33

by Alan

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

> And don't RH distributions install with LVM by default these days?
> For those it should be the standard case too on all systems with
> only a single disk.

Yes - I make a point of turning it off ;)

Alan

2008-02-17 23:31:59

by David Chinner

Subject: Re: [PATCH] Implement barrier support for single device DM devices

On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
> Alasdair G Kergon wrote:
> > On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote:
> >> Implement barrier support for single device DM devices
> >
> > Thanks. We've got some (more-invasive) dm patches in the works that
> > attempt to use flushing to emulate barriers where we can't just
> > pass them down like that.
>
> I wonder if it's worth the effort to try to implement this.
>
> As far as I understand (*), if a filesystem realizes that the
> underlying block device does not support barriers, it will
> switch to using regular flushes instead

No, typically the filesystems won't issue flushes, either.

> - isn't it the same
> thing as you're trying to do on an MD level?
>
> Note that a filesystem must understand barriers/flushes on
> underlying block device, since many disk drives don't support
> barriers anyway.
>
> (*) this is, in fact, an interesting question. I still can't
> find complete information about this. For example, how safe
> xfs is if barriers are not supported or turned off? Is it
> "less safe" than with barriers? Will it use regular cache
> flushes if barriers are not here?

Try reading the XFS FAQ:

http://oss.sgi.com/projects/xfs/faq/#wcache

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2008-02-18 12:49:15

by Ric Wheeler

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

Alasdair G Kergon wrote:
> On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote:
>> On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
>>> I wonder if it's worth the effort to try to implement this.
>
> My personal view (which seems to be in the minority) is that it's a
> waste of our development time *except* in the (rare?) cases similar to
> the ones Andi is talking about.

Using working barriers is important for normal users when you really
care about data loss and have normal drives in a box. We do power fail
testing on boxes (with reiserfs and ext3) and can definitely see a lot
of file system corruption eliminated over power failures when barriers
are enabled properly.

It is not unreasonable for some machines to disable barriers to get a
performance boost, but I would not do that when you are storing things
you really need back.

Of course, you don't need barriers when you either disable the write
cache on the drives or use a battery backed RAID array which gives you a
write cache that will survive power outages...

ric

2008-02-18 13:24:39

by Michael Tokarev

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

Ric Wheeler wrote:
> Alasdair G Kergon wrote:
>> On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote:
>>> On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
>>>> I wonder if it's worth the effort to try to implement this.
>>
>> My personal view (which seems to be in the minority) is that it's a
>> waste of our development time *except* in the (rare?) cases similar to
>> the ones Andi is talking about.
>
> Using working barriers is important for normal users when you really
> care about data loss and have normal drives in a box. We do power fail
> testing on boxes (with reiserfs and ext3) and can definitely see a lot
> of file system corruption eliminated over power failures when barriers
> are enabled properly.
>
> It is not unreasonable for some machines to disable barriers to get a
> performance boost, but I would not do that when you are storing things
> you really need back.

The talk here is about something different - about supporting barriers
on md/dm devices, i.e., on pseudo-devices which use multiple real devices
as components (software RAIDs etc). In this "world" it's nearly impossible
to support barriers if there is more than one underlying component device;
barriers only work if there's a single component. And the talk is about
supporting barriers only in a "minority" of cases - mostly for the simplest
device-mapper case only, NOT covering raid1 or other "fancy" configurations.

> Of course, you don't need barriers when you either disable the write
> cache on the drives or use a battery backed RAID array which gives you a
> write cache that will survive power outages...

Two things here.

First, I still don't understand why in God's sake barriers are "working"
while regular cache flushes are not. Almost no consumer-grade hard drive
supports write barriers, but they all support regular cache flushes, and
the latter should be enough (while not the most speed-optimal) to ensure
data safety. Why to require write cache disable (like in XFS FAQ) instead
of going the flush-cache-when-appropriate (as opposed to write-barrier-
when-appropriate) way?

And second, "surprisingly", battery-backed RAID write caches tends to fail
too, sometimes... ;) Usually, such a battery is enough to keep the data
in memory for several hours only (sine many RAID controllers uses regular
RAM for memory caches, which requires some power to keep its state), --
I come across this issue the hard way, and realized that only very few
persons around me who manages raid systems even knows about this problem -
that the battery-backed cache is only for some time... For example,
power failed at evening, and by tomorrow morning, batteries are empty
already. Or, with better batteries, think about a weekend... ;)
(I've seen some vendors now uses flash-based backing store for caches
instead, which should ensure far better results here).

/mjt

2008-02-18 13:52:58

by Ric Wheeler

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

Michael Tokarev wrote:
> Ric Wheeler wrote:
>> Alasdair G Kergon wrote:
>>> On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote:
>>>> On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
>>>>> I wonder if it's worth the effort to try to implement this.
>>> My personal view (which seems to be in the minority) is that it's a
>>> waste of our development time *except* in the (rare?) cases similar to
>>> the ones Andi is talking about.
>> Using working barriers is important for normal users when you really
>> care about data loss and have normal drives in a box. We do power fail
>> testing on boxes (with reiserfs and ext3) and can definitely see a lot
>> of file system corruption eliminated over power failures when barriers
>> are enabled properly.
>>
>> It is not unreasonable for some machines to disable barriers to get a
>> performance boost, but I would not do that when you are storing things
>> you really need back.
>
> The talk here is about something different - about supporting barriers
> on md/dm devices, i.e., on pseudo-devices which uses multiple real devices
> as components (software RAIDs etc). In this "world" it's nearly impossible
> to support barriers if there are more than one underlying component device,
> barriers only works if there's only one component. And the talk is about
> supporting barriers only in "minority" of cases - mostly for simplest
> device-mapper case only, NOT covering any raid1 or other "fancy" configurations.

I understand that. Most of the time, dm or md devices are composed of
uniform components which will uniformly support (or not) the cache flush
commands used by barriers.

>
>> Of course, you don't need barriers when you either disable the write
>> cache on the drives or use a battery backed RAID array which gives you a
>> write cache that will survive power outages...
>
> Two things here.
>
> First, I still don't understand why in God's sake barriers are "working"
> while regular cache flushes are not. Almost no consumer-grade hard drive
> supports write barriers, but they all support regular cache flushes, and
> the latter should be enough (while not the most speed-optimal) to ensure
> data safety. Why to require write cache disable (like in XFS FAQ) instead
> of going the flush-cache-when-appropriate (as opposed to write-barrier-
> when-appropriate) way?

Barriers have different flavors, but can be composed of "cache" flushes
which are supported on all drives that I have seen (S-ATA and ATA) for
many years now. That is the flavor of barriers that we test with S-ATA &
ATA drives.

The issue is that without flushing/invalidating (or some other way of
controlling the behavior of your storage), the file system has no way to
make sure that all data is on persistent & non-volatile media.

>
> And second, "surprisingly", battery-backed RAID write caches tends to fail
> too, sometimes... ;) Usually, such a battery is enough to keep the data
> in memory for several hours only (sine many RAID controllers uses regular
> RAM for memory caches, which requires some power to keep its state), --
> I come across this issue the hard way, and realized that only very few
> persons around me who manages raid systems even knows about this problem -
> that the battery-backed cache is only for some time... For example,
> power failed at evening, and by tomorrow morning, batteries are empty
> already. Or, with better batteries, think about a weekend... ;)
> (I've seen some vendors now uses flash-based backing store for caches
> instead, which should ensure far better results here).
>
> /mjt
>

That is why you need to get a good array, not just a simple controller ;-)

Most arrays do not use batteries to hold up the write cache; they use
the batteries to move any cached data to non-volatile media within the
hold-up time the batteries provide.

You could certainly get this kind of behavior from the flash scheme you
describe above as well...

ric

2008-02-18 22:17:17

by David Chinner

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

On Mon, Feb 18, 2008 at 04:24:27PM +0300, Michael Tokarev wrote:
> First, I still don't understand why in God's sake barriers are "working"
> while regular cache flushes are not. Almost no consumer-grade hard drive
> supports write barriers, but they all support regular cache flushes, and
> the latter should be enough (while not the most speed-optimal) to ensure
> data safety. Why to require write cache disable (like in XFS FAQ) instead
> of going the flush-cache-when-appropriate (as opposed to write-barrier-
> when-appropriate) way?

Devil's advocate:

Why should we need to support multiple different block layer APIs
to do the same thing? Surely any hardware that doesn't support barrier
operations can emulate them with cache flushes when they receive a
barrier I/O from the filesystem....

Also, given that disabling the write cache still allows CTQ/NCQ to
operate effectively and that in most cases WCD+CTQ is as fast as
WCE+barriers, the simplest thing to do is turn off volatile write
caches and not require any extra software kludges for safe
operation.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2008-02-19 02:39:27

by Alasdair G Kergon

Subject: Re: [PATCH] Implement barrier support for single device DM devices

On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
> Alasdair G Kergon wrote:
> > On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote:
> >> Implement barrier support for single device DM devices
> > Thanks. We've got some (more-invasive) dm patches in the works that
> > attempt to use flushing to emulate barriers where we can't just
> > pass them down like that.
> I wonder if it's worth the effort to try to implement this.

The decision got taken to allocate barrier bios to implement the basic
flush so dm has little choice in this matter now. (If you're going to
implement barriers for flush, you might as well implement them more
generally.)

Maybe I should spell this out more clearly for those who weren't
tracking this block layer change: AFAIK you cannot currently flush a
device-mapper block device without doing some jiggery-pokery.

> For example, how safe
> xfs is if barriers are not supported or turned off?

The last time we tried xfs with dm it didn't seem to notice -EOPNOTSUPP
everywhere it should => recovery may find corruption.

Alasdair
--
[email protected]

2008-02-19 02:45:47

by Alasdair G Kergon

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

On Mon, Feb 18, 2008 at 08:52:10AM -0500, Ric Wheeler wrote:
> I understand that. Most of the time, dm or md devices are composed of
> uniform components which will uniformly support (or not) the cache flush
> commands used by barriers.

As a dm developer, it's "almost none of the time" because trivial
configurations aren't the ones that require lots of testing effort.

Let's stop arguing over "most of the time":-)

As Andi points out, there are certainly enough real-world users of
"single linear or crypt target using one physical device" for it to be
worth our supporting it.

Alasdair
--
[email protected]

2008-02-19 02:57:05

by Alasdair G Kergon

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote:
> Surely any hardware that doesn't support barrier
> operations can emulate them with cache flushes when they receive a
> barrier I/O from the filesystem....

My complaint about having to support them within dm when more than one
device is involved is that any efficiencies disappear: you can't send
further I/O to any one device until all the other devices have completed
their barrier (or else later I/O to that device could overtake the
barrier on another device). And then I argue that it would be better
for the filesystem to have the information that these are not hardware
barriers so it has the opportunity of tuning its behaviour (e.g.
flushing less often because it's a more expensive operation).

Alasdair
--
[email protected]

2008-02-19 05:37:26

by David Chinner

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

On Tue, Feb 19, 2008 at 02:56:43AM +0000, Alasdair G Kergon wrote:
> On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote:
> > Surely any hardware that doesn't support barrier
> > operations can emulate them with cache flushes when they receive a
> > barrier I/O from the filesystem....
>
> My complaint about having to support them within dm when more than one
> device is involved is because any efficiencies disappear: you can't send
> further I/O to any one device until all the other devices have completed
> their barrier (or else later I/O to that device could overtake the
> barrier on another device).

Right - it's a horrible performance hit.

But - how is what you describe any different to the filesystem doing:

- flush block device
- issue I/O
- wait for completion
- flush block device

around any I/O that it would otherwise simply tag as a barrier?
That serialisation at the filesystem layer is a horrible, horrible
performance hit.
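
A minimal userspace analogue of that sequence (just an illustration using
fdatasync() on a regular file, not block-layer code) shows why every
emulated barrier costs two synchronous round trips to stable storage:

#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Emulate a "barrier" write with explicit flushes: flush everything
 * issued earlier, write the record, flush again.  The caller blocks
 * for two full round trips per record. */
static int write_with_flushes(int fd, const void *buf, size_t len, off_t off)
{
	if (fdatasync(fd) < 0)
		return -1;
	if (pwrite(fd, buf, len, off) != (ssize_t)len)
		return -1;
	return fdatasync(fd);
}

int main(void)
{
	const char rec[] = "commit record";
	int fd = open("journal.img", O_RDWR | O_CREAT, 0600);

	if (fd < 0 || write_with_flushes(fd, rec, sizeof(rec), 0) < 0) {
		perror("write_with_flushes");
		return 1;
	}
	close(fd);
	return 0;
}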

And then there's the fact that we can't implement that in XFS
because all the barrier I/Os we issue are asynchronous. We'd
basically have to serialise all metadata operations and now we
are talking about far worse performance hits than implementing
barrier emulation in the block device.

Also, it's instructive to look at the implementation of
blkdev_issue_flush() - the API one is supposed to use to trigger a
full block device flush. It doesn't work on DM/MD either, because
it uses a no-I/O barrier bio:

bio->bi_end_io = bio_end_empty_barrier;
bio->bi_private = &wait;
bio->bi_bdev = bdev;
submit_bio(1 << BIO_RW_BARRIER, bio);

wait_for_completion(&wait);

So, if the underlying block device doesn't support barriers,
there's no point in changing the filesystem to issue flushes,
either...

> And then I argue that it would be better
> for the filesystem to have the information that these are not hardware
> barriers so it has the opportunity of tuning its behaviour (e.g.
> flushing less often because it's a more expensive operation).

There is generally no option from the filesystem POV to "flush
less". Either we use barrier I/Os where we need to and are safe with
volatile caches or we corrupt filesystems with volatile caches when
power loss occurs. There is no in-between where "flushing less"
will save us from corruption....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2008-02-19 07:20:06

by Jeremy Higdon

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote:
> On Mon, Feb 18, 2008 at 04:24:27PM +0300, Michael Tokarev wrote:
> > First, I still don't understand why in God's sake barriers are "working"
> > while regular cache flushes are not. Almost no consumer-grade hard drive
> > supports write barriers, but they all support regular cache flushes, and
> > the latter should be enough (while not the most speed-optimal) to ensure
> > data safety. Why to require write cache disable (like in XFS FAQ) instead
> > of going the flush-cache-when-appropriate (as opposed to write-barrier-
> > when-appropriate) way?
>
> Devil's advocate:
>
> Why should we need to support multiple different block layer APIs
> to do the same thing? Surely any hardware that doesn't support barrier
> operations can emulate them with cache flushes when they receive a
> barrier I/O from the filesystem....
>
> Also, given that disabling the write cache still allows CTQ/NCQ to
> operate effectively and that in most cases WCD+CTQ is as fast as
> WCE+barriers, the simplest thing to do is turn off volatile write
> caches and not require any extra software kludges for safe
> operation.


I'll put it even more strongly. My experience is that disabling write
cache plus disabling barriers is often much faster than enabling both
barriers and write cache, when doing metadata intensive
operations, as long as you have a drive that is good at CTQ/NCQ.

The only time write cache + barriers is significantly faster is when
doing single threaded data writes, such as direct I/O, or if CTQ/NCQ
is not enabled, or the drive does a poor job at it.

jeremy

2008-02-19 07:59:00

by Michael Tokarev

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

Jeremy Higdon wrote:
[]
> I'll put it even more strongly. My experience is that disabling write
> cache plus disabling barriers is often much faster than enabling both
> barriers and write cache enabled, when doing metadata intensive
> operations, as long as you have a drive that is good at CTQ/NCQ.

Now, and it's VERY interesting at least for me (and is off-topic in
this thread) -- which drive(s) are good at NCQ? I tried numerous SATA
(NCQ is about sata, right? :) drives, but NCQ either does nothing in
terms of performance or hurts. Yesterday we ordered another drive
from Hitachi (their "raid edition" thing), -- will try it tomorrow,
but I've no hope here as it's some 5th or 6th model/brand already.

(Ol'good SCSI drives, even 10 years old, shows large difference when
TCQ is enabled...)

Thanks!

2008-02-19 09:43:56

by Andi Kleen

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

> My complaint about having to support them within dm when more than one
> device is involved is because any efficiencies disappear: you can't send
> further I/O to any one device until all the other devices have completed
> their barrier (or else later I/O to that device could overtake the
> barrier on another device). And then I argue that it would be better

I was wondering: would it help DM to have the concept of a "barrier window"
As in "this barrier is only affective for this group of requests"
With such a concept DM would need to stall only inside the groups
and possible even issue such barrier groups in parallel, couldn't it?

I'm sure you guys all have thought far more about barriers than
I ever did; if that idea came up before why was it dismissed?

-Andi

2008-02-19 11:13:01

by David Chinner

Subject: Re: [PATCH] Implement barrier support for single device DM devices

On Tue, Feb 19, 2008 at 02:39:00AM +0000, Alasdair G Kergon wrote:
> > For example, how safe
> > xfs is if barriers are not supported or turned off?
>
> The last time we tried xfs with dm it didn't seem to notice -EOPNOTSUPP
> everywhere it should => recovery may find corruption.

Bug reports, please. What we don't know about, we can't fix.

As of this commit:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0bfefc46dc028df60120acdb92062169c9328769

XFS should be handling all cases of -EOPNOTSUPP for barrier
I/Os. If you are still having problems, please let us know.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2008-02-20 13:39:33

by Ric Wheeler

Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

Jeremy Higdon wrote:
> On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote:
>> On Mon, Feb 18, 2008 at 04:24:27PM +0300, Michael Tokarev wrote:
>>> First, I still don't understand why in God's sake barriers are "working"
>>> while regular cache flushes are not. Almost no consumer-grade hard drive
>>> supports write barriers, but they all support regular cache flushes, and
>>> the latter should be enough (while not the most speed-optimal) to ensure
>>> data safety. Why to require write cache disable (like in XFS FAQ) instead
>>> of going the flush-cache-when-appropriate (as opposed to write-barrier-
>>> when-appropriate) way?
>> Devil's advocate:
>>
>> Why should we need to support multiple different block layer APIs
>> to do the same thing? Surely any hardware that doesn't support barrier
>> operations can emulate them with cache flushes when they receive a
>> barrier I/O from the filesystem....
>>
>> Also, given that disabling the write cache still allows CTQ/NCQ to
>> operate effectively and that in most cases WCD+CTQ is as fast as
>> WCE+barriers, the simplest thing to do is turn off volatile write
>> caches and not require any extra software kludges for safe
>> operation.
>
>
> I'll put it even more strongly. My experience is that disabling write
> cache plus disabling barriers is often much faster than enabling both
> barriers and write cache enabled, when doing metadata intensive
> operations, as long as you have a drive that is good at CTQ/NCQ.
>
> The only time write cache + barriers is significantly faster is when
> doing single threaded data writes, such as direct I/O, or if CTQ/NCQ
> is not enabled, or the drive does a poor job at it.
>
> jeremy
>

It would be interesting to compare numbers.

In the large, single threaded write case, what I have measured is
roughly 2x faster writes with barriers/write cache enabled on S-ATA/ATA
class drives. I think that this case alone is a fairly common one.

For very small file sizes, I have seen write cache off beat barriers +
write cache enabled as well, but barriers start outperforming write
cache disabled when you get up to moderate sizes (need to rerun tests to
get precise numbers/cross-over data).

The type of workload is also important. In the test cases that I ran,
the application needs to fsync() each file so we beat up on the barrier
code pretty heavily.

ric