2009-04-07 20:42:18

by Anton Ertl

Subject: Out-of-order writing by disk drives

I have released a new version of hdtest, a program that tests whether
hard disks write out-of-order relative to the order in which the writes
were passed to them from the OS. You can find the program at

http://www.complang.tuwien.ac.at/anton/hdtest/

Here I mainly present the results from my tests, and explain enough
about the program so you know what I am talking about.


HOW DOES IT WORK?

It writes the blocks in an order like this:

1000-0-1001-0-1002-0-...

This sequence seems to inspire PATA and SATA disks to write
out-of-order (in the order 1000-1001-1002-...-0). To detect this, you
turn off the drive's power while the program is running. The written
blocks contain certain data that another program from the suite can
check after you power the drive up again.
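
The write pattern described above can be sketched in Python roughly as
follows. This is only an illustration, not hdtest's actual code: the
block size, the marker bytes, and the function names are assumptions
made for the sketch. O_SYNC is used because a later message in this
thread says hdtest opens the device with it.

```python
import os
import struct

BLOCK_SIZE = 512      # assumed; hdtest's real block size may differ
MAGIC = b"SKETCH01"   # hypothetical marker, not hdtest's actual magic
BASE = 1000           # first far-away block, as in 1000-0-1001-0-...


def write_order(count, base=BASE):
    """Yield block numbers in the order base, 0, base+1, 0, ..."""
    for i in range(count):
        yield base + i
        yield 0


def make_block(seq):
    """One block: marker, 8-byte write sequence number, zero padding."""
    return (MAGIC + struct.pack("<Q", seq)).ljust(BLOCK_SIZE, b"\0")


def run_writes(path, count):
    """Write the pattern to path, stamping each write with its position."""
    # O_SYNC as mentioned in the thread; fall back to 0 where it is absent.
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        for seq, block_no in enumerate(write_order(count)):
            os.lseek(fd, block_no * BLOCK_SIZE, os.SEEK_SET)
            os.write(fd, make_block(seq))
    finally:
        os.close(fd)


def read_seq(path, block_no):
    """Return the sequence number stored in a block, or None if unstamped."""
    with open(path, "rb") as f:
        f.seek(block_no * BLOCK_SIZE)
        data = f.read(BLOCK_SIZE)
    if not data.startswith(MAGIC):
        return None
    return struct.unpack("<Q", data[len(MAGIC):len(MAGIC) + 8])[0]
```

In such a scheme, a checker run after the power cut would compare the
sequence number found in block 0 with the sequence numbers found in the
blocks from 1000 upwards; blocks stamped more than one write later than
block 0 would indicate out-of-order writing.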


RESULTS

I performed two sets of tests, one in November 1999, and one in April
2009. The results have not changed much. In both tests disks wrote
data seriously out-of-order in their default configuration; they can
delay the writing of block 0 in this test for quite a long time.

In more detail:

In 2009 I tested three drives (and accessed the whole drive) under
Linux 2.6.18 on Debian Etch; the USB enclosure used was a Tsunami
Elegant 3.5" Enclosure that has PATA and SATA disk drive interfaces.

* Maxtor L300R0 PATA (300GB) connected through a USB enclosure: In
two tests it wrote the consecutive blocks up to 47 and 34 blocks,
respectively, past the last written block 0.

* Seagate ST340062 Model 0A PATA (7200.10, 400GB):
  connected through a USB enclosure:
    3 times the result was as if it had written the blocks in-order
    1 time it wrote 3064 blocks out-of-order
    2 times it wrote 18384 blocks out-of-order
  connected directly via PATA cable:
    1 time it wrote 1972 blocks out-of-order

* Seagate ST340062 Model 0AS SATA (7200.10, 400GB) connected through a
  USB enclosure:
    1 time the result was as if it had written the blocks in-order
    2 times it wrote 3064 blocks out-of-order
    1 time it wrote 6128 blocks out-of-order
    1 time it wrote 12256 blocks out-of-order
    1 time it did not write block 0 at all

It is interesting that the number of blocks that is found to be
out-of-order is often a multiple of 3064. Maybe this is a multiple of
a track size; no other explanations come to mind.
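
The multiples can be checked with a few lines of Python. The list holds
the out-of-order counts reported above for the two Seagate drives; the
function name is just a label chosen for this sketch. Five of the six
figures are multiples of 3064 (1x, 2x, 4x and 6x), while the 1972 from
the direct PATA run is not.

```python
UNIT = 3064

# Out-of-order block counts reported above for the two Seagate drives.
observed = [3064, 18384, 1972, 3064, 6128, 12256]


def split_by_unit(counts, unit=UNIT):
    """Map each count to (quotient, remainder) with respect to unit."""
    return {n: divmod(n, unit) for n in counts}


for n, (q, r) in sorted(split_by_unit(observed).items()):
    note = f"= {q} x {UNIT}" if r == 0 else f"not a multiple (remainder {r})"
    print(f"{n:6d} {note}")
```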

In 1999 I tested two drives (and accessed one partition) under
Linux-2.2.1 on RedHat 5.1. The two drives were a Quantum Fireball
CR8.4A (8GB) and an IBM-DHEA-36480 (6GB), both connected directly via
PATA. I did one test with each of the disks, and they did not even
write block 0 once on the platters before I turned off the power.

I also tested the Quantum with write caching disabled (hdparm -W 0).
Hdtest was now quite noisy and produced the in-order result.


CONCLUSION

Applications and file systems requiring in-order writes (i.e.,
basically all of them) should use barriers or turn off write caching
for the disk drive(s) they use. Unfortunately, the Linux ext3 file
system does not use barriers by default; use the mount option
barrier=1 to enable them, e.g. by putting a line like this in
/etc/fstab:

/dev/md2 /home ext3 defaults,barrier=1 1 2
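
To check whether a given fstab line already asks for barriers, a small
Python sketch like the following could be used. The function names are
made up for this example, and it only looks at the option string; it
says nothing about whether the underlying device honours barriers.

```python
def mount_options(fstab_line):
    """Split one fstab line into (device, mountpoint, fstype, options)."""
    fields = fstab_line.split()
    device, mountpoint, fstype, options = fields[:4]
    return device, mountpoint, fstype, options.split(",")


def has_barriers_enabled(fstab_line):
    """True if the line mounts ext3 with the barrier=1 option."""
    _, _, fstype, options = mount_options(fstab_line)
    return fstype == "ext3" and "barrier=1" in options
```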

- anton


2009-04-14 14:10:48

by Andi Kleen

Subject: Re: Out-of-order writing by disk drives

"Anton Ertl" <[email protected]> writes:
>
> /dev/md2 /home ext3 defaults,barrier=1 1 2

Just make sure /dev/md2 is a RAID1; MD RAID0/5/10 don't support barriers.

See also my general treatise on multiple device barriers earlier today.

-Andi
--
[email protected] -- Speaking for myself only.

2009-04-14 16:34:06

by Anton Ertl

Subject: Re: Out-of-order writing by disk drives

Andi Kleen wrote:
>
> "Anton Ertl" <[email protected]> writes:
> >
> > /dev/md2 /home ext3 defaults,barrier=1 1 2
>
> Just make sure /dev/md2 is a RAID1, MD RAID0/5/10 don't support barriers.

Thank you. I added the following to the README:

|Note that, as of this writing (2009-04), not all Linux devices support
|barriers, in particular md devices only support them in RAID 1 mode;
|the kernel will reportedly warn about the lack of barriers if you try
|to use ext3 with barriers on a device that does not support barriers
|(look in, e.g., dmesg).

> See also my general treatise of multiple device barriers earlier today.

I guess you mean <[email protected]> in the
"dm-multipath and write request ordering" thread. Thank you for the
pointer.

- anton

2009-04-14 17:21:28

by Andi Kleen

Subject: Re: Out-of-order writing by disk drives

On Tue, Apr 14, 2009 at 06:33:50PM +0200, Anton Ertl wrote:
> Andi Kleen wrote:
> >
> > "Anton Ertl" <[email protected]> writes:
> > >
> > > /dev/md2 /home ext3 defaults,barrier=1 1 2
> >
> > Just make sure /dev/md2 is a RAID1, MD RAID0/5/10 don't support barriers.
>
> Thank you. I added the following to the README:
>
> |Note that, as of this writing (2009-04), not all Linux devices support
> |barriers, in particular md devices only support them in RAID 1 mode;
> |the kernel will reportedly warn about the lack of barriers if you try
> |to use ext3 with barriers on a device that does not support barriers
> |(look in, e.g., dmesg).

A full listing of which devices do and don't support barriers would
likely be very long. You would actually need to list down to the level
of individual hard disks.

A common problem is barriers over LVM. Since 2.6.29 they work
with a single device (and if the underlying device supports it) with
dm linear, but not in any other LVM setup.

So it might be better to just generally recommend checking
dmesg.
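
The dmesg check could be scripted along these lines. This is a minimal
Python sketch that simply filters kernel log text for the word
"barrier"; it does not rely on the exact wording of any kernel message,
and both function names are made up for the example.

```python
import subprocess


def barrier_lines(log_text):
    """Return the lines of a kernel log that mention barriers."""
    return [line for line in log_text.splitlines()
            if "barrier" in line.lower()]


def read_dmesg():
    """Return dmesg output as text (may need privileges on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True,
                            check=True)
    return result.stdout
```

Calling barrier_lines(read_dmesg()) after mounting would then list any
kernel messages that mention barriers for closer inspection.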

-Andi

--
[email protected] -- Speaking for myself only.

2009-04-14 17:40:22

by Mark Lord

Subject: Re: Out-of-order writing by disk drives

Andi Kleen wrote:
> On Tue, Apr 14, 2009 at 06:33:50PM +0200, Anton Ertl wrote:
>> Andi Kleen wrote:
>>> "Anton Ertl" <[email protected]> writes:
>>>> /dev/md2 /home ext3 defaults,barrier=1 1 2
>>> Just make sure /dev/md2 is a RAID1, MD RAID0/5/10 don't support barriers.
>> Thank you. I added the following to the README:
>>
>> |Note that, as of this writing (2009-04), not all Linux devices support
>> |barriers, in particular md devices only support them in RAID 1 mode;
>> |the kernel will reportedly warn about the lack of barriers if you try
>> |to use ext3 with barriers on a device that does not support barriers
>> |(look in, e.g., dmesg).
>
> A full listing of what devices do and don't support barriers would
> be likely very long. You would actually need to list down to hard disks.
>
> A common problem is barriers over LVM. Since 2.6.29 they work
> with a single device (and if the underlying device supports it) with
> dm linear, but not in any other LVM setup.

..

Does anyone else here find this rather peculiar?

The folks who actually care about barriers the most
(apart from kernel developers) are probably enterprise users.

And who is most likely to be using RAID and LVM,
where barriers generally don't work at all?

2009-04-14 17:49:03

by Michael Tokarev

Subject: Re: Out-of-order writing by disk drives

Mark Lord wrote:
> Andi Kleen wrote:
[]
>> A common problem is barriers over LVM. Since 2.6.29 they work
>> with a single device (and if the underlying device supports it) with
>> dm linear, but not in any other LVM setup.
>
> Does anyone else here find this rather peculiar?
>
> The folks who actually care about barriers the most
> (apart from kernel developers) are probably enterprise users.
>
> And who is most likely to be using RAID and LVM,
> where barriers generally don't work at all ?

And esp. RAID10.

(Not being an "enterprise" user really, but I do still use
several hard drives and databases).

For this very reason, I stopped using both RAID10 and LVM.

For now, I have several large RAID1 volumes with GPT partitions
inside them, and use those for various databases etc., mostly with
XFS. This setup also does not have the constant alignment problems
that LVM has(*). (I can align GPT properly, even though the tools
for doing so (parted derivatives) are still of very bad quality,
crashing left and right.) Yes, it's not as easy to use as LVM, but it
is MUCH faster for all the reasons stated.

(*) Another LVM issue, which is hidden behind the scenes for
most users and is especially serious on raid5 or raid6, is
misaligned blocks. The good thing is that raid[56] are not
usually used for write-intensive applications like databases.

/mjt

2009-04-14 17:51:57

by Andi Kleen

Subject: Re: Out-of-order writing by disk drives

> Does anyone else here find this rather peculiar?
>
> The folks who actually care about barriers the most
> (apart from kernel developers) are probably enterprise users.
>
> And who is most likely to be using RAID and LVM,
> where barriers generally don't work at all ?

The big enterprise users often have a SAN, so the LVM/RAID part is
hidden somewhere behind a block device, together with a lot of
battery-backed cache RAM (so that even running uncached is not too bad).
But on the other hand they also have UPSes, so data loss on power failure
might not be that big a problem for them.

Where I see it as a problem is with virtualization; LVM seems to be the
most sane way to manage file systems for lots of VMs and you likely
want barriers there too.

-Andi

--
[email protected] -- Speaking for myself only.

2009-04-14 18:10:07

by Michael Tokarev

Subject: Re: Out-of-order writing by disk drives

Andi Kleen wrote:
[]
> Where I see it as a problem is with virtualization; LVM seems to be the
> most sane way to manage file systems for lots of VMs and you likely
> want barriers there too.

Virtualisation is the best fit for partitionable raid1 arrays.
It is, in fact, what we have here -- I tested LVM but rejected
it because of this very issue: it does not support barriers.
I used partitions inside RAID1 arrays instead. This is not
as easy as with LVM (it requires some work with numbers instead
of names, but that's again not that problematic if you think
about /dev/disk/by-name/...), and it does not support resizing,
but again, this is not really a very necessary feature here
(it is easy to copy data to a new, larger place).

And by the way, extlinux comes in very handy here. Inside
guests I don't use partitions but "whole disks", including the
boot disk (/dev/vda for kvm), with ext3fs on it and extlinux to
boot it (works flawlessly on a partition-less device).

/mjt

2009-04-14 18:48:36

by Andi Kleen

Subject: Re: Out-of-order writing by disk drives

On Tue, Apr 14, 2009 at 10:09:48PM +0400, Michael Tokarev wrote:
> Andi Kleen wrote:
> []
> >Where I see it as a problem is with virtualization; LVM seems to be the
> >most sane way to manage file systems for lots of VMs and you likely
> >want barriers there too.
>
> Virtualisation is the best fit for partitionable raid1 arrays.
> It is, in fact, what we have here -- I tested LVM but rejected
> it because of this very issue - it does not support barriers.

It does now as of 2.6.29, as long as you only have a single
underlying device and use dm linear.

-Andi

--
[email protected] -- Speaking for myself only.

2009-04-14 19:27:42

by Chris Mason

Subject: Re: Out-of-order writing by disk drives

On Tue, 2009-04-14 at 20:50 +0200, Andi Kleen wrote:
> On Tue, Apr 14, 2009 at 10:09:48PM +0400, Michael Tokarev wrote:
> > Andi Kleen wrote:
> > []
> > >Where I see it as a problem is with virtualization; LVM seems to be the
> > >most sane way to manage file systems for lots of VMs and you likely
> > >want barriers there too.
> >
> > Virtualisation is the best fit for partitionable raid1 arrays.
> > It is, in fact, what we have here -- I tested LVM but rejected
> > it because of this very issue - it does not support barriers.
>
> It does now as of 2.6.29, as long as you only have a single
> underlying device and use dm linear.

Eric Sandeen noticed this is actually still broken:

http://lkml.org/lkml/2009/3/23/360

-chris

2009-04-15 12:46:19

by Jens Axboe

Subject: Re: Out-of-order writing by disk drives

On Tue, Apr 14 2009, Chris Mason wrote:
> On Tue, 2009-04-14 at 20:50 +0200, Andi Kleen wrote:
> > On Tue, Apr 14, 2009 at 10:09:48PM +0400, Michael Tokarev wrote:
> > > Andi Kleen wrote:
> > > []
> > > >Where I see it as a problem is with virtualization; LVM seems to be the
> > > >most sane way to manage file systems for lots of VMs and you likely
> > > >want barriers there too.
> > >
> > > Virtualisation is the best fit for partitionable raid1 arrays.
> > > It is, in fact, what we have here -- I tested LVM but rejected
> > > it because of this very issue - it does not support barriers.
> >
> > It does now as of 2.6.29, as long as you only have a single
> > underlying device and use dm linear.
>
> Eric Sandeen noticed this is actually still broken:
>
> http://lkml.org/lkml/2009/3/23/360

Alasdair promised to push the remaining barrier bits for 2.6.30, so
hopefully it should all be in working order Real Soon Now.

--
Jens Axboe

2009-04-17 19:47:20

by folkert

Subject: Re: Out-of-order writing by disk drives

>> A full listing of what devices do and don't support barriers would
>> be likely very long. You would actually need to list down to hard disks.
>>
>> A common problem is barriers over LVM. Since 2.6.29 they work
>> with a single device (and if the underlying device supports it) with
>> dm linear, but not in any other LVM setup.
> ..
>
> Does anyone else here find this rather peculiar?
> The folks who actually care about barriers the most
> (apart from kernel developers) are probably enterprise users.
> And who is most likely to be using RAID and LVM,
> where barriers generally don't work at all ?

What about iSCSI? Does it support barriers?


Folkert van Heusden

--
MultiTail is a versatile program for reading logs and carrying out
given commands. Filtering, coloring, merging, 'diff-view', etc.
http://www.vanheusden.com/multitail/
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com

2009-04-17 20:26:18

by Andi Kleen

Subject: Re: Out-of-order writing by disk drives

> What about iSCSI? Does it support barriers?

As the name implies, iSCSI just uses SCSI as the high-level protocol,
and that supports barriers. But of course it always depends on
the backend storage device and the particular driver.

-Andi

--
[email protected] -- Speaking for myself only.

2009-04-17 21:07:27

by folkert

Subject: Re: Out-of-order writing by disk drives

> I have released a new version of hdtest, a program that tests whether
> hard disks write out-of-order relative to the order that the writes
> were passed to them from the OS. You find the program at
> http://www.complang.tuwien.ac.at/anton/hdtest/

Not sure if it matters but it seems open-iscsi (both target and
initiator are Linux systems) works fine with respect to write
barriers: while running hdtest on an iSCSI device I suddenly stopped the
traffic using an iptables DROP rule. Then of course I stopped
the iSCSI initiator, removed the rules, restarted the initiator and ran
hdcheck: all above the line have the correct magic.


Folkert van Heusden

--
http://www.vanheusden.com/multitail - win a vlaai from multivlaai! Make
sure that multitail gets included in Fedora Core, AIX, Solaris or
HP/UX and win a vlaai of your choice
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com

2009-04-18 09:06:19

by Anton Ertl

Subject: Re: Out-of-order writing by disk drives

Folkert van Heusden wrote:
>
> > I have released a new version of hdtest, a program that tests whether
> > hard disks write out-of-order relative to the order that the writes
> > were passed to them from the OS. You find the program at
> > http://www.complang.tuwien.ac.at/anton/hdtest/
>
> Not sure if it matters but it seems open-iscsi (both target and
> initiator are linux systems) works fine with respect to the write
> barriers: while running hdtest on an iscsi device I suddenly stopped the
> traffic flowing using an iptables DROP-rule. Then of course I stopped
> the iscsi initiator, removed the rules, restarted the initiator and ran
> hdcheck: all above the line have the correct magic.

hdtest does not use barriers (if it did, my results would hopefully be
different; BTW, how would I use device barriers from a user program?).
But it does write to the device opened with O_SYNC. So I expect the
kernel to pass each request synchronously to the device (due to
O_SYNC), but the device has no particular reason (like barriers) to
write the data in-order. So I would not expect your disconnection to
result in out-of-order writing, just as I would not expect
disconnecting the USB or SATA connection to have that effect when
using a setup like mine (but I have not tried that).

In short, your experiment tells us nothing about barriers over iSCSI,
because barriers are not used (AFAIK).

- anton