2004-11-28 14:25:47

by Robert Murray

Subject: raid1 oops in 2.6.9 (debian package 2.6.9-1-686-smp)

Hi

The complete console log can be found at http://haylott.plus.com/~robbie/md-oops.txt

hde is a failed drive. In this log, hdg (the other drive in the raid1
array) is not present. This oops also occurs when hdg is present. I
don't know why it tries to use hde when it has been failed for some
time now. This doesn't occur with 2.6.8 (also a debian kernel). I
don't have a log of the oops when hdg was present, but I can provide
one if necessary.

Please let me know if there is any other information I can provide to
help to debug this. For now I have removed hde and everything is
working fine.

Best regards

Rob


2004-11-29 10:07:21

by Jakob Oestergaard

Subject: Re: raid1 oops in 2.6.9 (debian package 2.6.9-1-686-smp)

On Sun, Nov 28, 2004 at 02:28:41PM +0000, Robert Murray wrote:
> Hi
>
> The complete console log can be found at http://haylott.plus.com/~robbie/md-oops.txt
>
> hde is a failed drive. In this log, hdg (the other drive in the raid1
> array) is not present. This oops also occurs when hdg is present. I
> don't know why it tries to use hde when it has been failed for some
> time now. This doesn't occur with 2.6.8 (also a debian kernel). I
> don't have a log of the oops when hdg was present, but I can provide
> one if necessary.
>
> Please let me know if there is any other information I can provide to
> help to debug this. For now I have removed hde and everything is
> working fine.

On a second note: Could someone please provide an explanation of why
the raid10 driver exists? People have created RAID-10 sets for years
using the RAID-0 driver on top of several RAID-1 arrays - this works
beautifully, it's simple, and it's easy to explain to people.
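To illustrate the layering described above, here is a minimal Python sketch of where a logical chunk lands when RAID-0 is striped over RAID-1 pairs. It assumes equal-sized members and plain round-robin striping. The function name and the device names are hypothetical and are not taken from md itself.

```python
def raid0_over_raid1(chunk, mirrors):
    """Map a logical chunk to the mirror set that stores it.

    Hypothetical helper. `mirrors` is a list of RAID-1 sets, for example
    [("hda", "hdc"), ("hde", "hdg")]. Assumes equal-sized members, so
    RAID-0 simply deals chunks out round-robin.
    """
    n = len(mirrors)
    member = chunk % n    # RAID-0 picks the member round-robin
    offset = chunk // n   # position of the chunk inside that member
    # Every disk in the chosen RAID-1 set holds an identical copy.
    return member, offset, mirrors[member]
```

Each layer does only one job: RAID-0 chooses the member, and RAID-1 duplicates whatever reaches that member.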

Why oh why, do we need raid10 ?

(I don't mean to bitch and moan over it - I just assume that there is a
good reason for it which was somehow never conveyed, or that I
overlooked in my search for this explanation)

And if raid10 does not provide new functionality that was not possible
with raid1 + raid0, why oh why does this get accepted in a stable kernel
series? (ok, 2.6 is not stable, but I assume the intention is to make
it stable eventually, and accepting new functionality does not help this
process - all in all I do not understand the raid10 submission at all,
but I hope to be enlightened by someone (Neil?))

Also, I'd love to add a mention of raid10 in the HOWTO, but I need to
know why raid10 even exists before I can reasonably do that.

--

/ jakob "baffled Software-RAID HOWTO co-author"

2004-11-29 12:03:14

by jurriaan

Subject: Re: raid1 oops in 2.6.9 (debian package 2.6.9-1-686-smp)

From: Jakob Oestergaard <[email protected]>
Date: Mon, Nov 29, 2004 at 11:07:08AM +0100
> Why oh why, do we need raid10 ?

Raid-10 allows things currently not possible with raid-0/raid-1, like
spreading 2 copies of the data over 3 hard disks.
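The following is a minimal sketch of how 2 copies can be spread over 3 disks. It is modelled loosely on the "near" layout of md raid10: the copies of each chunk occupy consecutive chunk slots, and those slots are numbered row by row across the disks. The function and its parameters are hypothetical. Real md also has to deal with chunk sizes, device offsets and other layouts.

```python
def near_layout(chunk, copies=2, disks=3):
    """Return (disk, row) for each copy of a logical chunk.

    Hypothetical helper modelled on a "near"-style layout: the copies
    of a chunk fill consecutive slots, and slots are numbered row by
    row across the disks.
    """
    placements = []
    for k in range(copies):
        slot = chunk * copies + k  # copies of one chunk sit in consecutive slots
        placements.append((slot % disks, slot // disks))
    return placements


# Chunks A, B, C with 2 copies on 3 disks lay out as:
#   row 0: A A B
#   row 1: B C C
for c in range(3):
    print("ABC"[c], near_layout(c))
```

Because the two copies always occupy adjacent slots, they land on different disks. No RAID-0-over-RAID-1 stack can produce this arrangement on an odd number of disks.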

There was an introductory message on the linux-raid mailing list, but
it's more than a month old so I don't have a local copy.

> And; if raid10 does not provide new functionality that was not possible
> with raid1 + raid0, why oh why does this get accepted in a stable kernel
> series?

New drivers that are not enabled by default have always been allowed in
stable kernels, since they don't have an impact on stability for the
average user.

My $0.02,
Jurriaan
--
If something was not wrong things would not be right.
Sergeant Ortega - Zorro
Debian (Unstable) GNU/Linux 2.6.10-rc2-mm3 2x6078 bogomips load 1.44

2004-11-29 12:54:32

by Jakob Oestergaard

Subject: Re: raid1 oops in 2.6.9 (debian package 2.6.9-1-686-smp)

On Mon, Nov 29, 2004 at 01:02:59PM +0100, Jurriaan wrote:
> From: Jakob Oestergaard <[email protected]>
> Date: Mon, Nov 29, 2004 at 11:07:08AM +0100
> > Why oh why, do we need raid10 ?
>
> Raid-10 allows things currently not possible with raid-0/raid-1, like
> spreading 2 copies of the data over 3 hard disks.

Sounds weird to me, but hey, that's probably just me :)

>
> There was an introductory message on the linux-raid mailing list, but
> it's more than a month old so I don't have a local copy.

I must have missed it when I looked for it then - I'll look again.

Thanks!

>
> > And; if raid10 does not provide new functionality that was not possible
> > with raid1 + raid0, why oh why does this get accepted in a stable kernel
> > series?
>
> New drivers that are not enabled by default have always been allowed in
> stable kernels, since they don't have an impact on stability for the
> average user.

True

--

/ jakob

2004-11-30 02:02:59

by NeilBrown

Subject: Re: raid1 oops in 2.6.9 (debian package 2.6.9-1-686-smp)

On Sunday November 28, [email protected] wrote:
> Hi
>
> The complete console log can be found at
> http://haylott.plus.com/~robbie/md-oops.txt

This looks like a known bug that is fixed in current 2.6.10
pre-releases.
>
> hde is a failed drive. In this log, hdg (the other drive in the raid1
> array) is not present. This oops also occurs when hdg is present. I
> don't know why it tries to use hde when it has been failed for some
> time now.

It tries to use hde because it sees no reason not to.
When a drive fails, md never writes to it again, so the record of it
being part of a raid1 array is still there.
If it is assembled with another drive that "knows" that hde has
failed, then it won't accept hde into the array. But with hdg missing,
hde is the best bet it has, and it tries it anyway.
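The decision described above can be sketched as follows. The model is heavily simplified: each superblock is reduced to an event counter plus the set of members that the device has recorded as failed. The `Superblock` class and the `choose_members` function are hypothetical. They are not md's real superblock format or assembly code.

```python
from dataclasses import dataclass, field


@dataclass
class Superblock:
    """Hypothetical, heavily simplified view of an md member's superblock."""
    device: str
    events: int                                # bumped on each superblock update
    failed: set = field(default_factory=set)   # members this device recorded as failed


def choose_members(found):
    """Pick which discovered devices to assemble into the array.

    The freshest superblock (highest event count) is trusted. Any device
    that is stale, or that the freshest superblock records as failed,
    is left out.
    """
    if not found:
        return []
    freshest = max(found, key=lambda sb: sb.events)
    return [sb.device for sb in found
            if sb.events == freshest.events and sb.device not in freshest.failed]
```

When hdg is present, its newer superblock "knows" that hde failed, so hde is rejected. When hde is the only device found, its superblock is the freshest one available. Because md stopped writing to hde when it failed, that superblock still lists hde as a healthy member, so hde gets used.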

NeilBrown