2007-05-12 02:02:50

by Andreas Dilger

Subject: [RFC] store RAID stride in superblock

It is possible to specify the RAID stride to mke2fs to allow it to optimize
the layout of the bitmaps. With the new mballoc it is also possible to
tell it via a mount option to do large allocations aligned on the RAID
stride (by default it aligns on 1MB boundaries from the start of the LUN).

What would be rather convenient is to store the RAID stride value in the
superblock. That would spare a lot of hassle on the part of the admin
to tune the filesystem optimally for the underlying storage. There is
also a library used in the XFS tools that knows how to probe various
kinds of block devices (e.g. MD RAID, LVM/DM, etc) to get their storage
layout, which would avoid the need for the user to specify anything.

Any thoughts on this?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


2007-05-12 02:21:27

by Eric Sandeen

Subject: Re: [RFC] store RAID stride in superblock

Andreas Dilger wrote:
> It is possible to specify the RAID stride to mke2fs to allow it to optimize
> the layout of the bitmaps. With the new mballoc it is also possible to
> tell it via a mount option to do large allocations aligned on the RAID
> stride (by default it aligns on 1MB boundaries from the start of the LUN).
>
> What would be rather convenient is to store the RAID stride value in the
> superblock. That would spare a lot of hassle on the part of the admin
> to tune the filesystem optimally for the underlying storage. There is
> also a library used in the XFS tools that knows how to probe various
> kinds of block devices (e.g. MD RAID, LVM/DM, etc) to get their storage
> layout that would avoid the need for the user to specify anything.
>
> Any thoughts on this?

I think it sounds great. I think ext4 would benefit greatly from
knowing a bit more about the underlying device geometry & allocating
accordingly...

-Eric

2007-05-12 08:11:52

by Eric Anopolsky

Subject: Re: [RFC] store RAID stride in superblock

On Fri, 2007-05-11 at 19:02 -0700, Andreas Dilger wrote:
> What would be rather convenient is to store the RAID stride value in the
> superblock.
> There is also a library used in the XFS tools that knows how to probe various
> kinds of block devices (e.g. MD RAID, LVM/DM, etc) to get their storage
> layout that would avoid the need for the user to specify anything.
>
> Any thoughts on this?

It's late at night here so my thoughts are a little fuzzy. Nonetheless:

The concept is really tempting. RAID is good, and not asking the user
for information that the system can find out for itself is good too.

In the unlikely event that the RAID stride were to change, I think the
autodetect-each-time method would be superior to the store-in-superblock
method. Doubly so if the code to detect MD and LVM stride is lean and
clean.

I wonder if, in a RAID 0 configuration, deliberately misaligning data
structures smaller than (size of stride * number of disks in array)
would yield a performance benefit.

Cheers,

Eric



2007-05-12 08:33:48

by Alex Tomas

Subject: Re: [RFC] store RAID stride in superblock


Eric wrote:
> The concept is really tempting. RAID is good, and not asking the user
> for information that the system can find out for itself is good too.
>
> In the unlikely event that the RAID stride were to change, I think the
> autodetect-each-time method would be superior to the store-in-superblock
> method. Doubly so if the code to detect MD and LVM stride is lean and
> clean.

true, but in some cases (hardware raid, SAN, etc) there is no easy way
to learn that other than asking the user.

thanks, Alex

2007-05-12 09:32:52

by Eric Anopolsky

Subject: Re: [RFC] store RAID stride in superblock

On Sat, 2007-05-12 at 12:33 +0400, Alex Tomas wrote:
> > In the unlikely event that the RAID stride were to change, I think the
> > autodetect-each-time method would be superior to the store-in-superblock
> > method.
>
> true, but in some cases (hardware raid, SAN, etc) there is no easy way
> to learn that other than asking the user.

That hadn't occurred to me. Perhaps the filesystem driver or mkfs could
probe for the stride in those cases? If the code asks for, say, 10MiB of
data from the block device and it gets back sectors that are spaced
128KiB apart before it gets the rest of the data, it can make an
intelligent guess about the stride.

I wonder what penalties would come from a bad guess due to a cache in
between the block device driver and the disk platters, or other load on
a SAN...


Eric



2007-05-12 09:38:19

by Alex Tomas

Subject: Re: [RFC] store RAID stride in superblock

I don't quite follow. How would you "probe"? For example,
there is a DDN array which writes well only with 1MB aligned/sized
requests. Thus, mballoc tries to align allocation
requests with respect to this constraint. Do you mean incorporating
a storage benchmark into the mount procedure?

thanks, Alex

Eric wrote:
> That hadn't occurred to me. Perhaps the filesystem driver or mkfs could
> probe for the stride in those cases? If the code asks for, say, 10MiB of
> data from the block device and it gets back sectors that are spaced
> 128KiB apart before it gets the rest of the data, it can make an
> intelligent guess about the stride.
>
> I wonder what penalties would come from a bad guess due to a cache in
> between the block device driver and the disk platters, or other load on
> a SAN...
>
>
> Eric
>

2007-05-12 15:50:57

by Andreas Dilger

Subject: Re: [RFC] store RAID stride in superblock

On May 12, 2007 01:11 -0700, Eric wrote:
> The concept is really tempting. RAID is good, and not asking the user
> for information that the system can find out for itself is good too.
>
> In the unlikely event that the RAID stride were to change, I think the
> autodetect-each-time method would be superior to the store-in-superblock
> method. Doubly so if the code to detect MD and LVM stride is lean and
> clean.

I've asked the block layer folks a couple of times if it would be possible
to have an interface for this in the kernel, but so far I've had little
success in getting them to do it and I don't have time for it myself.

I agree that auto-detection is best (would need a userspace interface too)
but a lot can be done with a format-time detection. It is unlikely that
the RAID striping will change under the filesystem, and if it does then
the stripe size is usually kept the same (e.g. RAID 5 restriping to add
a disk).

Even if the striping does change, the current alignment of bitmaps is
about the worst possible case for power-of-two stride sizes because a
single disk has all of the bitmaps (using the terms "stripe = N * stride"
for N+1 RAID5 or N+2 RAID6 - if anyone knows the "more correct" terms
please speak up). It would also be possible to use tune2fs to change
the stride + stripe size in the superblock to at least tune the mballoc
allocation even if we can't move the bitmaps around very easily.

> I wonder if, in a RAID 0 configuration, deliberately misaligning data
> structures smaller than (size of stride * number of disks in array)
> would yield a performance benefit.

Yes, that would definitely be something to do. If you have N-disk RAID0,
each disk having "stride" blocks at a time, then offsetting the bitmaps by
"stride" blocks each is exactly what "mke2fs -E stride=" does. The
mballoc "stripe" option tries to put large allocations covering the whole
stripe to avoid parity read-modify-write if possible.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
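
To illustrate the bitmap placement point made in the message above, here
is a minimal sketch in C. It assumes a plain RAID 0 model where each disk
holds "stride" consecutive blocks in turn, and it treats the bitmap as
sitting at the very start of each group. The function names are
hypothetical and this is not the mke2fs code; a real tool would also have
to keep the shift inside the group, which is omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RAID 0 model: each disk holds `stride` consecutive
 * filesystem blocks before the next disk takes over. */
static unsigned disk_of_block(uint64_t block, unsigned stride, unsigned ndisks)
{
    assert(stride > 0 && ndisks > 0);
    return (unsigned)((block / stride) % ndisks);
}

/* Bitmap position when every group places it at the same offset
 * (simplified here to the first block of the group). */
static uint64_t bitmap_unshifted(unsigned group, uint32_t blocks_per_group)
{
    return (uint64_t)group * blocks_per_group;
}

/* Bitmap position when group g is shifted by g * stride blocks.
 * Wrapping the shift to stay inside the group is left out. */
static uint64_t bitmap_shifted(unsigned group, uint32_t blocks_per_group,
                               unsigned stride)
{
    return bitmap_unshifted(group, blocks_per_group) + (uint64_t)group * stride;
}
```

With a stride of 16 blocks, 4 disks and 32768 blocks per group, the
unshifted bitmaps of every group map to disk 0, because 32768 / 16 is a
multiple of 4. The shifted bitmaps of groups 0, 1, 2 and 3 map to disks
0, 1, 2 and 3.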

2007-05-12 16:14:22

by Eric Anopolsky

Subject: Re: [RFC] store RAID stride in superblock

> > Perhaps the filesystem driver or mkfs could
> > probe for the stride in those cases? If the code asks for, say, 10MiB of
> > data from the block device and it gets back sectors that are spaced
> > 128KiB apart before it gets the rest of the data, it can make an
> > intelligent guess about the stride.
>
> Do you mean incorporating
> a storage benchmark into the mount procedure?

Yes. If the benefits of automatically aligning on-disk data structures
to the stride of the array are great enough, then a storage
mini-benchmark may be of use.

For example, suppose we have an array with a stride of 1MiB and the
filesystem driver requests 10MiB of contiguous data from the start of
the block device. Then the data at +0MiB from the start of the device,
the data at +1MiB, the data at +2MiB, and so on ought to arrive earlier
than the data at, say, +0.5MiB, +1.5MiB and +2.5MiB. This would allow the
filesystem driver to detect the stride even when the striping isn't
being done by the MD or LVM/DM drivers in Linux (which, apparently, have
well-defined interfaces for discovering the stride in software).

I imagine this would work well for a run-of-the-mill hardware RAID card
in a PC. However, as you pointed out in your original email, there are
SANs to be considered. If another host is putting load on the SAN, it
could throw off the read timings and cause the filesystem driver to make
a bad guess.

Cheers,

Eric
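
The timing idea in the message above could be sketched roughly as
follows. This is only an illustration of the guess, not code from any
filesystem driver: the function name, the sample layout (one arrival time
per fixed-size step from the start of the device) and the "first drop in
arrival time marks a chunk boundary" rule are all assumptions. As noted
in the thread, caches or other load on a SAN could make the samples
meaningless, in which case the sketch returns 0.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: samples[i] is the arrival time of the data at
 * offset i * step_kib from the start of the device. Returns the guessed
 * stride in KiB, or 0 if no consistent period is found.
 */
static unsigned guess_stride_kib(const double *samples, size_t n,
                                 unsigned step_kib)
{
    size_t period = 0;

    assert(samples != NULL && step_kib > 0);

    /* The first drop in arrival time marks the start of the second chunk. */
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < samples[i - 1]) {
            period = i;
            break;
        }
    }
    if (period == 0)
        return 0;

    /* Require a drop at every multiple of the period and nowhere else. */
    for (size_t i = 1; i < n; i++) {
        int dropped = samples[i] < samples[i - 1];
        int boundary = (i % period == 0);
        if (dropped != boundary)
            return 0;
    }
    return (unsigned)(period * step_kib);
}
```

Sampling every 512 KiB on an array with a 1 MiB stride would, in this
idealised model, give a drop at every second sample and a guess of
1024 KiB.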



2007-05-19 06:16:45

by Theodore Ts'o

Subject: Re: [RFC] store RAID stride in superblock

On Fri, May 11, 2007 at 07:02:48PM -0700, Andreas Dilger wrote:
> It is possible to specify the RAID stride to mke2fs to allow it to optimize
> the layout of the bitmaps. With the new mballoc it is also possible to
> tell it via a mount option to do large allocations aligned on the RAID
> stride (by default it aligns on 1MB boundaries from the start of the LUN).

You asked for it, you got it.

- Ted

# HG changeset patch
# User [email protected]
# Date 1179540413 14400
# Node ID 2afd1c039d26aaa66c55ede30770df1990392f84
# Parent f95d161b454ec94e8974946d38e3e94c612f2cd2
Store the RAID stride value in the superblock and take advantage of it

Store the RAID stride value when a filesystem is created with a requested
RAID stride, and then use it automatically in resize2fs.

Signed-off-by: "Theodore Ts'o" <[email protected]>

diff -r f95d161b454e -r 2afd1c039d26 lib/ext2fs/ChangeLog
--- a/lib/ext2fs/ChangeLog Fri May 18 21:44:29 2007 -0400
+++ b/lib/ext2fs/ChangeLog Fri May 18 22:06:53 2007 -0400
@@ -1,3 +1,10 @@ 2007-05-08 Eric Sandeen <[email protected]
+2007-05-18 Theodore Tso <[email protected]>
+
+ * openfs.c (ext2fs_open2): Set fs->stride from the superblock's
+ s_raid_stride value.
+
+ * ext2_fs.h: Allocate space for RAID stride in the superblock.
+
2007-05-08 Eric Sandeen <[email protected]>

* ext2_fs.h (inode_uid, inode_gid): The inode_uid() and
diff -r f95d161b454e -r 2afd1c039d26 lib/ext2fs/ext2_fs.h
--- a/lib/ext2fs/ext2_fs.h Fri May 18 21:44:29 2007 -0400
+++ b/lib/ext2fs/ext2_fs.h Fri May 18 22:06:53 2007 -0400
@@ -573,7 +573,9 @@ struct ext2_super_block {
__u16 s_min_extra_isize; /* All inodes have at least # bytes */
__u16 s_want_extra_isize; /* New inodes should reserve # bytes */
__u32 s_flags; /* Miscellaneous flags */
- __u32 s_reserved[167]; /* Padding to the end of the block */
+ __u16 s_raid_stride; /* RAID stride */
+ __u16 s_pad; /* Padding */
+ __u32 s_reserved[166]; /* Padding to the end of the block */
};

/*
diff -r f95d161b454e -r 2afd1c039d26 lib/ext2fs/openfs.c
--- a/lib/ext2fs/openfs.c Fri May 18 21:44:29 2007 -0400
+++ b/lib/ext2fs/openfs.c Fri May 18 22:06:53 2007 -0400
@@ -297,6 +297,8 @@ errcode_t ext2fs_open2(const char *name,
dest += fs->blocksize;
}

+ fs->stride = fs->super->s_raid_stride;
+
*ret_fs = fs;
return 0;
cleanup:
diff -r f95d161b454e -r 2afd1c039d26 misc/ChangeLog
--- a/misc/ChangeLog Fri May 18 21:44:29 2007 -0400
+++ b/misc/ChangeLog Fri May 18 22:06:53 2007 -0400
@@ -1,4 +1,6 @@ 2007-05-18 Theodore Tso <[email protected]
2007-05-18 Theodore Tso <[email protected]>
+
+ * mke2fs.c (main): Save the raid stride to the superblock

* blkid.c (main): Add -g option to blkid which will garbage
collect the cache.
diff -r f95d161b454e -r 2afd1c039d26 misc/mke2fs.c
--- a/misc/mke2fs.c Fri May 18 21:44:29 2007 -0400
+++ b/misc/mke2fs.c Fri May 18 22:06:53 2007 -0400
@@ -1611,7 +1611,7 @@ int main (int argc, char *argv[])
test_disk(fs, &bb_list);

handle_bad_blocks(fs, bb_list);
- fs->stride = fs_stride;
+ fs->stride = fs->super->s_raid_stride = fs_stride;
retval = ext2fs_allocate_tables(fs);
if (retval) {
com_err(program_name, retval,
diff -r f95d161b454e -r 2afd1c039d26 resize/ChangeLog
--- a/resize/ChangeLog Fri May 18 21:44:29 2007 -0400
+++ b/resize/ChangeLog Fri May 18 22:06:53 2007 -0400
@@ -1,3 +1,9 @@ 2007-03-18 Theodore Tso <[email protected]
+2007-05-18 Theodore Tso <[email protected]>
+
+ * main.c (determine_fs_stride): Use the superblock s_raid_stride
+ if it is set; save the hueristically determined stride to
+ the superblock if it is not set.
+
2007-03-18 Theodore Tso <[email protected]>

* resize2fs.c (check_and_change_inodes): Check to make sure the
diff -r f95d161b454e -r 2afd1c039d26 resize/main.c
--- a/resize/main.c Fri May 18 21:44:29 2007 -0400
+++ b/resize/main.c Fri May 18 22:06:53 2007 -0400
@@ -101,6 +101,8 @@ static void determine_fs_stride(ext2_fil
unsigned int has_sb, prev_has_sb, num;
int i_stride, b_stride;

+ if (fs->stride)
+ return;
num = 0; sum = 0;
for (group = 0; group < fs->group_desc_count; group++) {
has_sb = ext2fs_bg_has_super(fs, group);
@@ -131,6 +133,9 @@ static void determine_fs_stride(ext2_fil
fs->stride = sum / num;
else
fs->stride = 0;
+
+ fs->super->s_raid_stride = fs->stride;
+ ext2fs_mark_super_dirty(fs);

#if 0
if (fs->stride)
@@ -348,7 +353,8 @@ int main (int argc, char ** argv)
_("Invalid stride length"));
exit(1);
}
- fs->stride = use_stride;
+ fs->stride = fs->super->s_raid_stride = use_stride;
+ ext2fs_mark_super_dirty(fs);
} else
determine_fs_stride(fs);


2007-05-24 11:44:44

by Andreas Dilger

Subject: Re: [RFC] store RAID stride in superblock

On May 22, 2007 01:22 +0530, Kalpak Shah wrote:
> __u16 s_raid_stride; /* RAID stride */
> - __u16 s_pad; /* Padding */
> + __u16 s_mmp_interval; /* Wait for # seconds in MMP
> checking */
> + __u64 s_mmp_block; /* Block for multi-mount protection
> */

Ted, I just noticed this updated patch w.r.t. your recent s_raid_stride
addition. I also want to have a separate parameter for "s_raid_stripe_width"
which is normally N * s_raid_stride, where N is the number of disks in a
RAID 5 N+1 (or RAID 6 N+2) parity stripe. This is for delalloc+mballoc to
allow it to align and size new allocations so that writes do not impose
read-modify-write overhead on the RAID stripes.

My understanding from the code is that s_raid_stride is to put the bitmaps
for different groups on different disks to avoid always having a single
disk busy with bitmap updates.


Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
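
The relation "stripe width = N * stride" from the message above can be
worked through with a small C sketch. It assumes the per-disk chunk size
of the array is known in bytes and is a multiple of the filesystem block
size. The function names are hypothetical, not part of e2fsprogs.

```c
#include <assert.h>
#include <stdint.h>

/* Stride in filesystem blocks: how many blocks one disk holds before
 * the array moves on to the next disk. */
static uint32_t raid_stride_blocks(uint32_t chunk_bytes, uint32_t block_bytes)
{
    assert(block_bytes > 0 && chunk_bytes % block_bytes == 0);
    return chunk_bytes / block_bytes;
}

/* Stripe width in blocks: N data disks times the stride. Parity disks
 * (1 for RAID 5, 2 for RAID 6) are not counted. */
static uint32_t raid_stripe_width_blocks(uint32_t stride, uint32_t data_disks)
{
    return stride * data_disks;
}
```

For example, a 64 KiB chunk with 4 KiB blocks gives a stride of 16
blocks, and a 4+1 RAID 5 set then has a stripe width of 64 blocks.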

2007-05-24 14:12:01

by Rupesh Thakare

Subject: Re: [RFC] store RAID stride in superblock

Hello,
I've added an "s_raid_stripe_width" parameter to the superblock.
I've also incorporated the "s_raid_stride" and "s_raid_stripe_width"
parameters into tune2fs.
The new options can be specified using '-E options' in both mke2fs and
tune2fs.
Both man pages (mke2fs and tune2fs) are updated accordingly.
The patch is attached.
Thanks,
Rupesh.

Andreas Dilger wrote:
> On May 22, 2007 01:22 +0530, Kalpak Shah wrote:
>
>> __u16 s_raid_stride; /* RAID stride */
>> - __u16 s_pad; /* Padding */
>> + __u16 s_mmp_interval; /* Wait for # seconds in MMP
>> checking */
>> + __u64 s_mmp_block; /* Block for multi-mount protection
>> */
>>
>
> Ted, I just noticed this updated patch w.r.t. your recent s_raid_stride
> addition. I also want to have a separate parameter for "s_raid_stripe_width"
> which is normally N * s_raid_stride, where N is the number of disks in a
> RAID 5 N+1 (or RAID 6 N+2) parity stripe. This is for delalloc+mballoc to
> allow it to align and size new allocations so that writes do not impose
> read-modify-write overhead on the RAID stripes.
>
> My understanding from the code is that s_raid_stride is to put the bitmaps
> for different groups on different disks to avoid always having a single
> disk busy with bitmap updates.
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>


Attachments:
e2fsprogs_stride.patch (9.87 kB)

2007-05-31 16:21:11

by Theodore Ts'o

Subject: Re: [RFC] store RAID stride in superblock

On Thu, May 24, 2007 at 07:45:32PM +0530, Rupesh Thakare wrote:
> Hello,
> I've added "s_raid_stripe_width" parameter in superblock.
> I've also incorporated "s_raid_stride" and "s_raid_stripe_width"
> parameters in tune2fs.
> The new options can be specified using '-E options' in both mke2fs and
> tune2fs.
> Both the Man pages (mke2fs and tune2fs) are updated accordingly.
> Patch is attached herewith.

Thanks. I've used a different offset for the raid_stripe_width, to
avoid conflicting with Kalpak's mmp patch.

Could you send me a signed-off-by for your patch?

Thanks,

- Ted

2007-05-31 20:19:08

by Andreas Dilger

Subject: Re: [RFC] store RAID stride in superblock

On May 31, 2007 12:21 -0400, Theodore Tso wrote:
> On Thu, May 24, 2007 at 07:45:32PM +0530, Rupesh Thakare wrote:
> > I've added "s_raid_stripe_width" parameter in superblock.
> > I've also incorporated "s_raid_stride" and "s_raid_stripe_width"
> > parameters in tune2fs.
> > The new options can be specified using '-E options' in both mke2fs and
> > tune2fs.
> > Both the Man pages (mke2fs and tune2fs) are updated accordingly.
> > Patch is attached herewith.
>
> Thanks. I've used a different offset for the raid_stripe_width, to
> avoid conflicting with Kalpak's mmp patch.

Ah, we've been doing it the other way around here. It makes sense to keep
the s_raid_stripe_width fields together. I think this code is preliminary
enough that nobody has actually started using it yet. Can you please post
what the end of ext2_super_block looks like (whether you decide to reorder
the fields or not).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-05-31 20:58:55

by Kalpak Shah

Subject: Re: [RFC] store RAID stride in superblock

On Thu, 2007-05-31 at 14:19 -0600, Andreas Dilger wrote:
> On May 31, 2007 12:21 -0400, Theodore Tso wrote:
> > On Thu, May 24, 2007 at 07:45:32PM +0530, Rupesh Thakare wrote:
> > > I've added "s_raid_stripe_width" parameter in superblock.
> > > I've also incorporated "s_raid_stride" and "s_raid_stripe_width"
> > > parameters in tune2fs.
> > > The new options can be specified using '-E options' in both mke2fs and
> > > tune2fs.
> > > Both the Man pages (mke2fs and tune2fs) are updated accordingly.
> > > Patch is attached herewith.
> >
> > Thanks. I've used a different offset for the raid_stripe_width, to
> > avoid conflicting with Kalpak's mmp patch.
>
> Ah, we've been doing it the other way around here. It makes sense to keep
> the s_raid_stripe_width fields together. I think this code is preliminary
> enough that nobody has actually started using it yet. Can you please post
> what the end of ext2_super_block looks like (whether you decide to reorder
> the fields or not).

I can update the MMP patches when I actually send them for inclusion. So
I think it makes sense to keep the s_raid_* fields together.

Thanks,
Kalpak.

>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>

2007-05-31 21:33:04

by Theodore Ts'o

Subject: Re: [RFC] store RAID stride in superblock

On Thu, May 31, 2007 at 02:19:02PM -0600, Andreas Dilger wrote:
> Ah, we've been doing it the other way around here. It makes sense to keep
> the s_raid_stripe_width fields together. I think this code is preliminary
> enough that nobody has actually started using it yet. Can you please post
> what the end of ext2_super_block looks like (whether you decide to reorder
> the fields or not).

Oops, I just pushed a set of bugfixes to Linux that included the
superblock field reservations. I was going back and forth about
whether to keep them together, or whether to keep the extra u16 s_pad
and then have to reserve another u16 field plus another u16 field for
MMP seconds field. Since you guys had been talking about the MMP code
for a longer period of time (I think you first made the proposal a few
months ago), I had assumed it had precedence (and had possibly already
been in use at some customer somewhere), so I used Kalpak's original
MMP superblock field reservations.

I don't think it's worth changing at this point. (If no one is using
it yet, it won't be too hard to switch around so we're all doing the
same thing. :-) What is in the e2fsprogs hg repository as well as the
for_linus branch of ext4.git is:

..
__u16 s_raid_stride; /* RAID stride */
__u16 s_mmp_interval; /* # seconds to wait in MMP checking */
__u64 s_mmp_block; /* Block for multi-mount protection */
__u32 s_raid_stripe_width; /* blocks on all data disks (N*stride)*/
__u32 s_reserved[163]; /* Padding to the end of the block */
};

One question which does come to mind; is there any reason why we might
want to know the RAID level and/or the number of disks (as opposed to
just the stripe width)? And has anyone investigated whether there are
magic ioctls or libdevmapper APIs so we can get the RAID parameters
automatically? If so, patches so that mke2fs can get the information
automatically (as opposed to forcing the user to have to specify lots
of annoying options) would be most welcome....

- Ted
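
A quick way to check the arithmetic behind the layout listed above: the
four new fields take 2 + 2 + 8 + 4 = 16 bytes, which is four __u32
slots, so s_reserved shrinks from 167 to 163 entries. The C11 sketch
below models only the tail of the structure starting at s_flags, using
stdint types in place of the kernel __u16/__u32/__u64 types. The struct
names are hypothetical, and it assumes s_flags itself starts at an
8-byte-aligned offset in the real superblock, which the sketch does not
verify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tail of the superblock before the change, starting at s_flags. */
struct sb_tail_old {
    uint32_t s_flags;
    uint32_t s_reserved[167];
};

/* Tail as listed in the message above, starting at s_flags. */
struct sb_tail_new {
    uint32_t s_flags;
    uint16_t s_raid_stride;
    uint16_t s_mmp_interval;
    uint64_t s_mmp_block;
    uint32_t s_raid_stripe_width;
    uint32_t s_reserved[163];
};

/* The new fields must consume exactly the padding they replace. */
static_assert(sizeof(struct sb_tail_new) == sizeof(struct sb_tail_old),
              "new fields must not change the superblock size");

/* The two 16-bit fields after s_flags leave the 64-bit field naturally
 * aligned, so the compiler inserts no hidden padding before it. */
static_assert(offsetof(struct sb_tail_new, s_mmp_block) == 8,
              "s_mmp_block should follow the two 16-bit fields directly");
```

If either assertion fails to compile, the field order or the size of
s_reserved no longer matches the listing.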

2007-05-31 22:01:13

by Eric Sandeen

Subject: Re: [RFC] store RAID stride in superblock

Theodore Tso wrote:
> And has anyone investigated whether there are
> magic ioctls or libdevmapper APIs so we can get the RAID parameters
> automatically? If so, patches so that mke2fs can get the information
> automatically (as opposed to forcing the user to have to specify lots
> of annoying options) would be most welcome....

xfsprogs has a libdisk which does this for evms, lvm, md, dm, and xvm(!)

see for example md_get_subvol_stripe() in xfsprogs.

-Eric

2007-05-31 22:03:10

by Andreas Dilger

Subject: Re: [RFC] store RAID stride in superblock

On May 31, 2007 17:33 -0400, Theodore Tso wrote:
> Oops, I just pushed a set of bugfixes to Linux that included the
> superblock field reservations.

Oh well.

> What is in the e2fsprogs hg repository ... is:
>
> ..
> __u16 s_raid_stride; /* RAID stride */
> __u16 s_mmp_interval; /* # seconds to wait in MMP checking */
> __u64 s_mmp_block; /* Block for multi-mount protection */
> __u32 s_raid_stripe_width; /* blocks on all data disks (N*stride)*/
> __u32 s_reserved[163]; /* Padding to the end of the block */
> };

We're updating our patches to be based on the new HG code.

> One question which does come to mind; is there any reason why we might
> want to know the RAID level and/or the number of disks (as opposed to
> just the stripe width)?

Not so far. The raid_stride is for bitmap placement (and could also be
used for alignment of random IOs to avoid making 2 disks busy when 1
would do). The raid_stripe_width is the amount that delalloc+mballoc
will use for allocations+writes to avoid read-modify-write of RAID stripes.
It doesn't really matter what the RAID level is.

> And has anyone investigated whether there are
> magic ioctls or libdevmapper APIs so we can get the RAID parameters
> automatically? If so, patches so that mke2fs can get the information
> automatically (as opposed to forcing the user to have to specify lots
> of annoying options) would be most welcome....

For now we will specify this via mke2fs or tune2fs for existing filesystems.
The XFS folks mentioned they have a library to extract this info for linux
devices (e.g. DM, MD, etc), but of course that still won't work for e.g.
external RAID devices.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
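
As a rough illustration of how an allocator might use raid_stripe_width
in the way described above, the C sketch below rounds a starting block up
to a stripe boundary and counts how many whole stripes a request covers.
It is not the mballoc code; the function names are hypothetical and the
stripe width is assumed to be given in filesystem blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Round a starting block up to the next stripe boundary, so a large
 * write can begin on a full stripe. */
static uint64_t align_start_to_stripe(uint64_t block, uint32_t stripe_width)
{
    assert(stripe_width > 0);
    return (block + stripe_width - 1) / stripe_width * stripe_width;
}

/* Number of whole stripes covered by `len` blocks. Only these can be
 * written without a parity read-modify-write; any remainder cannot. */
static uint64_t full_stripes(uint64_t len, uint32_t stripe_width)
{
    assert(stripe_width > 0);
    return len / stripe_width;
}
```

With a stripe width of 64 blocks, a request that would start at block
100 is moved to block 128, and a 200-block request covers 3 full stripes
with 8 blocks left over.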