From: Kevin Corry <[email protected]>
The dm-stripe target currently does not enforce that the size of a stripe
device be a multiple of the chunk-size. Under certain conditions, this can
lead to I/O requests going off the end of an underlying device. This
test-case shows one example.
echo "0 100 linear /dev/hdb1 0" | dmsetup create linear0
echo "0 100 linear /dev/hdb1 100" | dmsetup create linear1
echo "0 200 striped 2 32 /dev/mapper/linear0 0 /dev/mapper/linear1 0" | \
dmsetup create stripe0
dd if=/dev/zero of=/dev/mapper/stripe0 bs=1k
This will produce the output:
dd: writing '/dev/mapper/stripe0': Input/output error
97+0 records in
96+0 records out
And in the kernel log will be:
attempt to access beyond end of device
dm-0: rw=0, want=104, limit=100
The patch below will check that the table size is a multiple of the stripe
chunk-size when the table is created, which will prevent the above striped
device from being created.
This should not affect tools like LVM or EVMS, since in all the cases I can
think of, striped devices are always created with the sizes being a multiple
of the chunk-size.
The size of a stripe device must be a multiple of its chunk-size.
Signed-off-by: Kevin Corry <[email protected]>
Signed-Off-By: Alasdair G Kergon <[email protected]>
dm-stripe.c | 6 ++++++
1 file changed, 6 insertions(+)
Index: linux-2.6.16-rc5/drivers/md/dm-stripe.c
===================================================================
--- linux-2.6.16-rc5.orig/drivers/md/dm-stripe.c 2006-03-14 18:25:38.000000000 +0000
+++ linux-2.6.16-rc5/drivers/md/dm-stripe.c 2006-03-15 17:56:37.000000000 +0000
@@ -103,9 +103,15 @@ static int stripe_ctr(struct dm_target *
return -EINVAL;
}
+ if (((uint32_t)ti->len) & (chunk_size - 1)) {
+ ti->error = "dm-stripe: Target length not divisible by "
+ "chunk size";
+ return -EINVAL;
+ }
+
width = ti->len;
if (sector_div(width, stripes)) {
- ti->error = "dm-stripe: Target length not divisable by "
+ ti->error = "dm-stripe: Target length not divisible by "
"number of stripes";
return -EINVAL;
}
This patch broke my system. I am booting from a hardware fakeraid
consisting of a raid-0 between two WD 36 gig 10,000 rpm raptors on a VIA
sata raid controller. The dmraid utility previously would correctly
configure dm to access the device ( though I did see some of the IO
errors you mention in my logs ) and now dmraid fails to configure the dm
table because this patch rejects it. My working dmraid table follows:
via_hfciifae: 0 144607678 striped 2 128 8:0 0 8:16 0
I believe the correct thing to do is to special case the last stripe in
the volume like such:
0-31 64-67
32-63 68-71
Alasdair G Kergon wrote:
> From: Kevin Corry <[email protected]>
>
> The dm-stripe target currently does not enforce that the size of a stripe
> device be a multiple of the chunk-size. Under certain conditions, this can
> lead to I/O requests going off the end of an underlying device. This
> test-case shows one example.
>
> echo "0 100 linear /dev/hdb1 0" | dmsetup create linear0
> echo "0 100 linear /dev/hdb1 100" | dmsetup create linear1
> echo "0 200 striped 2 32 /dev/mapper/linear0 0 /dev/mapper/linear1 0" | \
> dmsetup create stripe0
> dd if=/dev/zero of=/dev/mapper/stripe0 bs=1k
>
> This will produce the output:
> dd: writing '/dev/mapper/stripe0': Input/output error
> 97+0 records in
> 96+0 records out
>
> And in the kernel log will be:
> attempt to access beyond end of device
> dm-0: rw=0, want=104, limit=100
>
> The patch below will check that the table size is a multiple of the stripe
> chunk-size when the table is created, which will prevent the above striped
> device from being created.
>
> This should not affect tools like LVM or EVMS, since in all the cases I can
> think of, striped devices are always created with the sizes being a multiple
> of the chunk-size.
>
> The size of a stripe device must be a multiple of its chunk-size.
>
> Signed-off-by: Kevin Corry <[email protected]>
> Signed-Off-By: Alasdair G Kergon <[email protected]>
>
> dm-stripe.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> Index: linux-2.6.16-rc5/drivers/md/dm-stripe.c
> ===================================================================
> --- linux-2.6.16-rc5.orig/drivers/md/dm-stripe.c 2006-03-14 18:25:38.000000000 +0000
> +++ linux-2.6.16-rc5/drivers/md/dm-stripe.c 2006-03-15 17:56:37.000000000 +0000
> @@ -103,9 +103,15 @@ static int stripe_ctr(struct dm_target *
> return -EINVAL;
> }
>
> + if (((uint32_t)ti->len) & (chunk_size - 1)) {
> + ti->error = "dm-stripe: Target length not divisible by "
> + "chunk size";
> + return -EINVAL;
> + }
> +
> width = ti->len;
> if (sector_div(width, stripes)) {
> - ti->error = "dm-stripe: Target length not divisable by "
> + ti->error = "dm-stripe: Target length not divisible by "
> "number of stripes";
> return -EINVAL;
> }
On Thu, Oct 12, 2006 at 12:01:21AM -0400, Phillip Susi wrote:
> now dmraid fails to configure the dm
> table because this patch rejects it.
> I believe the correct thing to do is to special case the last stripe in
> 0-31 64-67
> 32-63 68-71
AFAIK current versions of dmraid handle this correctly - Heinz?
Alasdair
--
[email protected]
Define 'handle it correctly'? The correct thing to do is for dm to
accept the values and work properly. dmraid only passes the parameters
to dm from the on disk data structure created by the bios.
Alasdair G Kergon wrote:
> On Thu, Oct 12, 2006 at 12:01:21AM -0400, Phillip Susi wrote:
>> now dmraid fails to configure the dm
>> table because this patch rejects it.
>
>> I believe the correct thing to do is to special case the last stripe in
>> 0-31 64-67
>> 32-63 68-71
>
> AFAIK current versions of dmraid handle this correctly - Heinz?
>
> Alasdair
On Thu, Oct 12, 2006 at 11:31:28AM -0400, Phillip Susi wrote:
> The correct thing to do is for dm to
> accept the values and work properly.
But there's ambiguity: should dm truncate just the last stripe(s) or all
stripes equally, or use some other combination of truncation? The answer
depends on the circumstances, and so it is for userspace, which should know
which of those is required, to resolve the matter by supplying appropriate
chunk sizes.
Alasdair
--
[email protected]
So you are saying that dmraid should build 3 tables: 1 for the bulk of
the array, 1 for only the last stripe, and 1 linear to connect them?
I'm not sure where the ambiguity is. Obviously all the stripes prior to
the last should behave normally, the only problem comes from the last
stripe. How else could you map the last stripe other than laying down x
sectors onto y drives as x / y sectors on each drive in sequence?
Alasdair G Kergon wrote:
> But there's ambiguity: should dm truncate just the last stripe(s) or all
> stripes equally, or use some other combination of truncation? The answer
> depends on the circumstances, and so it is for userspace, which should know
> which of those is required, to resolve the matter by supplying appropriate
> chunk sizes.
>
> Alasdair
On Thu, Oct 12, 2006 at 02:14:05PM -0400, Phillip Susi wrote:
> So you are saying that dmraid should build 3 tables: 1 for the bulk of
> the array, 1 for only the last stripe, and 1 linear to connect them?
No. 1 table. 2 consecutive targets with different stripe sizes, if that's
how the data is actually laid out.
> the only problem comes from the last
> stripe. How else could you map the last stripe other than laying down x
> sectors onto y drives as x / y sectors on each drive in sequence?
Depends whether or not you give precedence to the stripe size.
The underlying device might be much larger - dm doesn't know or care - and
the intention of userspace might have been to truncate a larger striped
device part-way through one of the stripes - an equally reasonable thing to
do.
Alasdair
--
[email protected]
Alasdair G Kergon wrote:
> On Thu, Oct 12, 2006 at 02:14:05PM -0400, Phillip Susi wrote:
>> So you are saying that dmraid should build 3 tables: 1 for the bulk of
>> the array, 1 for only the last stripe, and 1 linear to connect them?
>
> No. 1 table. 2 consecutive targets with different stripe sizes, if that's
> how the data is actually laid out.
>
One stripe table can only contain one stripe size, so to have two would
require two tables, and a third table to tie them back together.
>> the only problem comes from the last
>> stripe. How else could you map the last stripe other than laying down x
>> sectors onto y drives as x / y sectors on each drive in sequence?
>
> Depends whether or not you give precedence to the stripe size.
> The underlying device might be much larger - dm doesn't know or care - and
> the intention of userspace might have been to truncate a larger striped
> device part-way through one of the stripes - an equally reasonable thing to
> do.
The entire idea of a stripe is that you are using multiple identical
drives ( or partitions ), so it doesn't make any sense to be able to
truncate one of the drives. In any case, this is not something you can
do now, so the fact that you could not do it then either does not seem
to be a good argument against allowing partial tails.
On Thu, Oct 12, 2006 at 04:47:59PM -0400, Phillip Susi wrote:
> One stripe table can only contain one stripe size, so to have two would
> require two tables, and a third table to tie them back together.
One device-mapper table can contain two concatenated stripe targets.
http://people.redhat.com/agk/talks/FOSDEM_2005/text6.html
> The entire idea of a stripe is that you are using multiple identical
> drives ( or partitions ), so it doesn't make any sense to be able to
> truncate one of the drives.
dmraid is not the only user of device-mapper striping. Userspace volume
managers may want to use all sorts of odd layouts quite legitimately.
> In any case, this is not something you can
> do now,
[Actually that's what the code did before this patch:-(]
> so the fact that you could not do it then either does not seem
> to be a good argument against allowing partial tails.
The arguments are (1) to avoid the ambiguity I've discussed and (2) to avoid
the additional complexity the in-kernel striped target would require
(calculating a stripe size for the end of the device to override the one
supplied), when it's so simple for userspace to specify exactly what it
requires by using two targets.
Alasdair
--
[email protected]
On Thu, Oct 12, 2006 at 02:59:45PM +0100, Alasdair G Kergon wrote:
> On Thu, Oct 12, 2006 at 12:01:21AM -0400, Phillip Susi wrote:
> > now dmraid fails to configure the dm
> > table because this patch rejects it.
>
> > I believe the correct thing to do is to special case the last stripe in
> > 0-31 64-67
> > 32-63 68-71
>
> AFAIK current versions of dmraid handle this correctly - Heinz?
Yes, that's correct.
>
> Alasdair
> --
> [email protected]
--
Regards,
Heinz -- The LVM Guy --
*** Software bugs are stupid.
Nevertheless it needs not so stupid people to solve them ***
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Heinz Mauelshagen Red Hat GmbH
Consulting Development Engineer Am Sonnenhang 11
Storage Development 56242 Marienrachdorf
Germany
[email protected] PHONE +49 171 7803392
FAX +49 2626 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Turns out my raid volume's size according to the metadata on disk IS an
even multiple of the stripe factor, so it looks like this is a bug in
dmraid, and I have taken the issue to the ataraid mailing list.
Heinz Mauelshagen wrote:
> On Thu, Oct 12, 2006 at 02:59:45PM +0100, Alasdair G Kergon wrote:
>> On Thu, Oct 12, 2006 at 12:01:21AM -0400, Phillip Susi wrote:
>>> now dmraid fails to configure the dm
>>> table because this patch rejects it.
>>
>>> I believe the correct thing to do is to special case the last stripe in
>>> 0-31 64-67
>>> 32-63 68-71
>>
>> AFAIK current versions of dmraid handle this correctly - Heinz?
>
> Yes, that's correct.
>
>> Alasdair
>> --
>> [email protected]
>