2009-10-20 17:32:03

by Doug Hunley

Subject: changing stride and stripe_width post-fs-creation?

Is it safe to use tune2fs to alter stride and stripe_width on an ext4
fs once it has been created? Any caveats I should know about? Thanks

--
Douglas J Hunley, RHCT
[email protected] : http://douglasjhunley.com : Twitter: @hunleyd

Obsessively opposed to the typical.


2009-10-20 21:08:51

by Andreas Dilger

Subject: Re: changing stride and stripe_width post-fs-creation?

On 20-Oct-09, at 11:32, Doug Hunley wrote:
> Is it safe to use tune2fs to alter stride and stripe_width on an ext4
> fs once it has been created? Any caveats I should know about? Thanks


The stride is mostly used at fs creation time, but there is no problem
with changing it. The stripe_width is used by the allocator to align
file allocations with the RAID layout.
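
As a rough illustration of how the two values relate to the RAID layout, the
sketch below converts a chunk size and a data-disk count into filesystem
blocks. The struct and function names are hypothetical and are not part of
e2fsprogs; the arithmetic assumes the usual convention that stride is the
chunk size in blocks and stripe_width is the stride times the number of
data-bearing disks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: RAID geometry expressed in filesystem blocks. */
struct raid_geometry {
    uint32_t stride;        /* blocks per chunk on a single disk */
    uint32_t stripe_width;  /* blocks per full stripe across the data disks */
};

/* Assumes chunk_bytes is a multiple of block_bytes; parity disks are
 * excluded from data_disks. */
struct raid_geometry raid_geometry_in_blocks(uint32_t chunk_bytes,
                                             uint32_t data_disks,
                                             uint32_t block_bytes)
{
    struct raid_geometry g;

    assert(block_bytes > 0);
    g.stride = chunk_bytes / block_bytes;
    g.stripe_width = g.stride * data_disks;
    return g;
}
```

For a 64kB chunk, 4 data disks and 4kB blocks this gives stride=16 and
stripe_width=64, which could then be applied with something like
`tune2fs -E stride=16,stripe_width=64 <device>`.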

One question for Eric is whether the new libdisk patches he made will set
the stripe_width to something ridiculous like 512 or 4096 bytes, or if it
just leaves that field unset in that case. I suspect it would be bad for
mballoc to see the stripe_width be such a small value.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2009-10-20 21:16:55

by Eric Sandeen

Subject: Re: changing stride and stripe_width post-fs-creation?

Andreas Dilger wrote:
> On 20-Oct-09, at 11:32, Doug Hunley wrote:
>> Is it safe to use tune2fs to alter stride and stripe_width on an ext4
>> fs once it has been created? Any caveats I should know about? Thanks
>
>
> The stride is mostly used at fs creation time, but there is no problem
> with changing it. The stripe_width is used by the allocator to align
> file allocations with the RAID layout.
>
> One question for Eric is whether the new libdisk patches he made will set
> the stripe_width to something ridiculous like 512 or 4096 bytes, or if it
> just leaves that field unset in that case. I suspect it would be bad for
> mballoc to see the stripe_width be such a small value.

well... yes, it does set it to whatever is reported:

+ min_io = blkid_topology_get_minimum_io_size(tp);
+ opt_io = blkid_topology_get_optimal_io_size(tp);
+ blocksize = EXT2_BLOCK_SIZE(fs_param);
+
+ fs_param->s_raid_stride = min_io / blocksize;
+ fs_param->s_raid_stripe_width = opt_io / blocksize;


if mballoc can't handle certain values then maybe the kernel code should
be changed to ignore it? Small values could just as easily come from a
user too ...
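
To make the small-value case concrete, here is a standalone sketch of the
same integer division. The function name is hypothetical and this is not
the e2fsprogs code; it only mirrors the `io_size / blocksize` step from the
excerpt above.

```c
#include <assert.h>

/* Hypothetical stand-in for the division in the patch excerpt:
 * converts a reported I/O size in bytes to filesystem blocks.
 * Integer division truncates, so sizes below one block become 0. */
unsigned int io_size_to_blocks(unsigned long io_bytes, unsigned int blocksize)
{
    assert(blocksize > 0);
    return (unsigned int)(io_bytes / blocksize);
}
```

With 4kB blocks, a device reporting 512 bytes yields 0 blocks and one
reporting 4096 bytes yields 1 block, which is the "no alignment" or
"1-block alignment" case being asked about.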

-Eric

2009-10-20 21:30:56

by Andreas Dilger

Subject: Re: changing stride and stripe_width post-fs-creation?

On 20-Oct-09, at 15:16, Eric Sandeen wrote:
> Andreas Dilger wrote:
>> The stride is mostly used at fs creation time, but there is no problem
>> with changing it. The stripe_width is used by the allocator to align
>> file allocations with the RAID layout.
>>
>> One question for Eric is whether the new libdisk patches he made will set
>> the stripe_width to something ridiculous like 512 or 4096 bytes, or if it
>> just leaves that field unset in that case. I suspect it would be bad for
>> mballoc to see the stripe_width be such a small value.
>
> well... yes, it does set it to whatever is reported:
>
> + min_io = blkid_topology_get_minimum_io_size(tp);
> + opt_io = blkid_topology_get_optimal_io_size(tp);
> + blocksize = EXT2_BLOCK_SIZE(fs_param);
> +
> + fs_param->s_raid_stride = min_io / blocksize;
> + fs_param->s_raid_stripe_width = opt_io / blocksize;
>
>
> if mballoc can't handle certain values then maybe the kernel code
> should be changed to ignore it? Small values could just as easily
> come from a user too

Having the kernel ignore the value probably makes the most sense. It's
not that it can't "handle" it, just that I suspect mballoc will work
poorly if it is trying to align the allocations to 1-block values (i.e.
no alignment at all). Even with regular disks, reading in 64kB-aligned
chunks is more efficient than reading misaligned chunks because of the
track buffer.

Probably ignoring anything below 64kB makes sense, or possibly using some
multiple of the specified size until it is larger than 64kB is better (in
case someone formats their RAID-5 with 5 disks * 8kB chunk size or
similar).
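
The second option (scaling a small stripe up by a whole multiple) could be
sketched roughly as follows. This is an illustration only, not kernel or
mballoc code: the function and macro names are hypothetical, and "larger
than 64kB" is read here as "at least 64kB", which is an assumption.

```c
#include <assert.h>

#define MIN_ALIGN_BYTES (64u * 1024u)  /* 64kB threshold from the discussion */

/* Hypothetical helper: returns the allocation alignment in blocks.
 * A stripe of 0 blocks means "unset" and yields no alignment. Otherwise
 * the stripe is multiplied by the smallest whole number that brings it
 * to at least 64kB, so the result stays a multiple of the RAID stripe. */
unsigned int effective_stripe_blocks(unsigned int stripe_blocks,
                                     unsigned int blocksize)
{
    unsigned int min_blocks, multiple;

    assert(blocksize > 0 && blocksize <= MIN_ALIGN_BYTES);
    if (stripe_blocks == 0)
        return 0;

    min_blocks = MIN_ALIGN_BYTES / blocksize;
    /* Ceiling division: whole stripes needed to cover min_blocks. */
    multiple = (min_blocks + stripe_blocks - 1) / stripe_blocks;
    return stripe_blocks * multiple;
}
```

With 4kB blocks, a 40kB stripe (10 blocks, as in the 5 * 8kB example) would
be doubled to 20 blocks (80kB), while a stripe that is already 64kB or
larger is returned unchanged.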

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.