2002-10-02 13:01:42

by Roy Sigurd Karlsbakk

Subject: AARGH! Please help. IDE controller fsckup

hi all

I have this cute little server with some 16 120gig IDE drives, and I've got
some serious problems with it.

Controllers:
One onboard IDE controller (2 channels).
Two promise ATA100 (2 channels each).
One CMD649 (2 channels).

something seriously bad about the CMD649 makes Linux believe it's the first
controller with hd[abcd]. On these, there are two RAID-1s (/ and /var). Due
to the fact that the box has some 1.6TB disk space, we haven't got any backup
solution (we have an identical box in order to mirror them).

so - now - the CMD649 has suddenly begun to fail - losing contact with one or
two drives, and I _really_ need to get what's on /data (RAID-5 on
hd[efghijklmnop]) out. Problem is - the replacement controller I've got from
the vendor works fine (turns up as controller 3 serving hd[mnop]). How can I
revert this most easily to be able to boot again?

I hope this is not too off topic... Please excuse that.

roy

--
Roy Sigurd Karlsbakk, Datavaktmester
ProntoTV AS - http://www.pronto.tv/
Tel: +47 9801 3356

Computers are like air conditioners.
They stop working when you open Windows.


2002-10-03 09:47:47

by Jakob Oestergaard

Subject: Re: AARGH! Please help. IDE controller fsckup

On Wed, Oct 02, 2002 at 03:16:46PM +0200, Roy Sigurd Karlsbakk wrote:
> hi all
>
> I have this cute little server with some 16 120gig IDE drives, and I've got
> some serious problems with it.
>
> Controllers:
> One onboard IDE controller (2 channels).
> Two promise ATA100 (2 channels each).
> One CMD649 (2 channels).
>
> something seriously bad about the CMD649 makes Linux believe it's the first
> controller with hd[abcd]. On these, there are two RAID-1s (/ and /var). Due
> to the fact that the box has some 1.6TB disk space, we haven't got any backup
> solution (we have an identical box in order to mirror them).
>
> so - now - the CMD649 has suddenly begun to fail - losing contact with one or
> two drives, and I _really_ need to get what's on /data (RAID-5 on
> hd[efghijklmnop]) out. Problem is - the replacement controller I've got from
> the vendor works fine (turns up as controller 3 serving hd[mnop]). How can I
> revert this most easily to be able to boot again?

Hindsight: had you used persistent superblocks, this would not have
been a problem. The kernel would know the correct ordering from the
superblocks, not the device names.

Solution 1: Write to the RAID mailing list and have one of the mdadm
gurus give you a one-liner to initialize the array with the proper
ordering.

Solution 2: Edit your /etc/raidtab to reflect the new device naming and
run raidstart.
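
For Solution 2, a minimal Python sketch of the renaming step, assuming a
raidtab-style file in which each member is listed on a "device" line. The
helper name remap_raidtab, the sample array and the old-to-new mapping are
all hypothetical; the real mapping has to come from how the drives actually
show up on the box.

```python
import re

def remap_raidtab(text, mapping):
    """Rewrite 'device' lines in raidtab-style text using old -> new names."""
    def swap(match):
        old = match.group(2)
        # Names that are not in the mapping are left untouched.
        return match.group(1) + mapping.get(old, old)
    # Only lines whose first word is 'device' are changed; 'raiddev' is not.
    return re.sub(r"(?m)^(\s*device\s+)(\S+)", swap, text)

# Made-up three-disk array, for illustration only.
raidtab = """\
raiddev /dev/md2
    raid-level            5
    nr-raid-disks         3
    persistent-superblock 1
    device                /dev/hde
    raid-disk             0
    device                /dev/hdf
    raid-disk             1
    device                /dev/hdg
    raid-disk             2
"""

# Hypothetical renaming after a controller moved; not taken from the thread.
mapping = {"/dev/hde": "/dev/hda", "/dev/hdf": "/dev/hdb"}

print(remap_raidtab(raidtab, mapping))
```

Doing all renames in one pass means that two names can swap places without
one rename clobbering the other. The raid-disk numbers are left alone, so
the ordering of the array is not changed by this step.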

If you start up the array with a bad ordering, no amount of magic is
going to bring back your data (after parity has been "reconstructed" on
various chunks of your existing data).


>
> I hope this is not too off topic... Please excuse that.
>

linux-raid is a better place.


Cheers,

--
................................................................
: [email protected] : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............:

2002-10-03 10:09:42

by Roy Sigurd Karlsbakk

Subject: Re: AARGH! Please help. IDE controller fsckup

> > so - now - the CMD649 has suddenly begun to fail - losing contact with
> > one or two drives, and I _really_ need to get what's on /data (RAID-5 on
> > hd[efghijklmnop]) out. Problem is - the replacement controller I've got
> > from the vendor works fine (turns up as controller 3 serving hd[mnop]).
> > How can I revert this most easily to be able to boot again?
>
> Hindsight: had you used persistent superblocks, this would not have
> been a problem. The kernel would know the correct ordering from the
> superblocks, not the device names.

I have used persistent superblocks, but md0,1,2,3 will be differently ordered
if I change the disk order... At least I think so. It surely didn't work.

> Solution 1: Write to the RAID mailing list and have one of the mdadm
> gurus give you a one-liner to initialize the array with the proper
> ordering.
>
> Solution 2: Edit your /etc/raidtab to reflect the new device naming and
> run raidstart.

ok. but this won't be necessary with persistent superblocks? right?

> If you start up the array with a bad ordering, no amount of magic is
> going to bring back your data (after parity has been "reconstructed" on
> various chunks of your existing data).

But ... with persistent superblock - is it possible to fsckup the raid?

> linux-raid is a better place.

I'll mail them. Thanks anyway

roy

2002-10-03 11:34:51

by Jakob Oestergaard

Subject: Re: AARGH! Please help. IDE controller fsckup

On Thu, Oct 03, 2002 at 12:25:11PM +0200, Roy Sigurd Karlsbakk wrote:
> > > so - now - the CMD649 has suddenly begun to fail - losing contact with
> > > one or two drives, and I _really_ need to get what's on /data (RAID-5 on
> > > hd[efghijklmnop]) out. Problem is - the replacement controller I've got
> > > from the vendor works fine (turns up as controller 3 serving hd[mnop]).
> > > How can I revert this most easily to be able to boot again?
> >
> > Hindsight: had you used persistent superblocks, this would not have
> > been a problem. The kernel would know the correct ordering from the
> > superblocks, not the device names.
>
> I have used persistent superblocks, but md0,1,2,3 will be differently ordered
> if I change the disk order... At least I think so. It surely didn't work.

No. md0 would stay md0. This is another effect of using superblocks,
and in fact this is also (ironically) more or less the only argument
*against* using them :)

(Imagine inserting a disk which knows that it is disk 0 of md0 into some
machine that already has a perfectly fine md0 running)

>
> > Solution 1: Write to the RAID mailing list and have one of the mdadm
> > gurus give you a one-liner to initialize the array with the proper
> > ordering.
> >
> > Solution 2: Edit your /etc/raidtab to reflect the new device naming and
> > run raidstart.
>
> ok. but this won't be necessary with persistent superblocks? right?

right

>
> > If you start up the array with a bad ordering, no amount of magic is
> > going to bring back your data (after parity has been "reconstructed" on
> > various chunks of your existing data).
>
> But ... with persistent superblock - is it possible to fsckup the raid?

You're root, it is indeed possible :)

But you would not need to perform any of the special operations that you
need to now.

Persistent superblocks save you from a number of "bad" situations you
can encounter with normal production systems (such as replacing a
controller or moving disks around).

One should be careful when moving disks with persistent superblocks
between systems though. You don't want the kernel to autodetect the
"wrong" md0 on boot :) I consider this problem nonexistent in the
production environment that I administer, but I know that some people
feel differently about it. You should consider these pros and cons in
relation to your environment and make a decision based on that.

Cheers,


2002-10-03 13:18:22

by Jakob Oestergaard

Subject: Re: AARGH! Please help. IDE controller fsckup

On Thu, Oct 03, 2002 at 03:13:28PM +0200, Roy Sigurd Karlsbakk wrote:
> > > I have used persistent superblocks, but md0,1,2,3 will be differently
> > > ordered if I change the disk order... At least I think so. It surely
> > > didn't work.
> >
> > No. md0 would stay md0. This is another effect of using superblocks,
> > and in fact this is also (ironically) more or less the only argument
> > *against* using them :)
> >
> > (Imagine inserting a disk which knows that it is disk 0 of md0 into some
> > machine that already has a perfectly fine md0 running)
>
> ok. so. theoretically - as long as the system finds all 16 drives, I should be
> able to shuffle them around and attach them to whichever controller there is?
> right?

It will not reattach your disks (you need to move cables to do that),
but it will know "First disk of md0" from "Second disk of md0"
regardless of whether those disks are /dev/hda or /dev/sdg.

You can shuffle your disks around as much as you please. When the RAID
code looks at your disks, it will read their superblocks and correctly
make the first disk of md0 the first disk of md0, and so forth,
regardless of the actual device name of the disk.
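
The idea can be sketched in a few lines of Python. This is a toy model, not
the md code: the Superblock class, its two fields and the assemble() helper
are made up for illustration, and the device names are only examples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Superblock:
    """Toy stand-in for a per-disk RAID superblock (fields are illustrative)."""
    array: str   # which array the disk belongs to, e.g. "md0"
    slot: int    # position of the disk inside that array

def assemble(disks):
    """Group disks by array and order them by slot, ignoring device names.

    `disks` maps a device name to the Superblock found on that device.
    Returns {array: [device names in slot order]}.
    """
    members = {}
    for dev, sb in disks.items():
        members.setdefault(sb.array, []).append((sb.slot, dev))
    return {name: [dev for _, dev in sorted(found)]
            for name, found in members.items()}

# Before the controller swap.
before = {
    "/dev/hda": Superblock("md0", 0),
    "/dev/hdc": Superblock("md0", 1),
}
# After the swap the same two disks show up under different names.
after = {
    "/dev/hdo": Superblock("md0", 1),
    "/dev/hdm": Superblock("md0", 0),
}

print(assemble(before))  # {'md0': ['/dev/hda', '/dev/hdc']}
print(assemble(after))   # {'md0': ['/dev/hdm', '/dev/hdo']}
```

In both cases the disk carrying slot 0 ends up first, whatever it happens to
be called this boot.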

>
> ok.
>
> now, I've replaced the faulty controller, and booting up. the new controller
> is also (like the old one) a CMD649...
>

RAID doesn't care about controllers.

RAID without persistent superblocks cares about disk device names.

RAID with persistent superblocks doesn't care about disk device names.

> huh?

eh?

>
> it works. but it surely didn't work last time...
>

Good for you :)

> thanks
>
> > > But ... with persistent superblock - is it possible to fsckup the raid?
> >
> > You're root, it is indeed possible :)
>
> er - yes. I more meant like 'automagically'

It will only automagically screw up your arrays if you shuffle disks
between machines (mix several RAID arrays from other systems in one
system) (you can of course move all your disks to one new machine, if
it has none of its original RAIDed disks left).

Just don't mix disks with persistent superblocks from multiple machines
into one single machine. Unless you know exactly what you're doing.
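
A rough Python sketch of the situation to watch out for, assuming each
member disk records both the array name it expects and some identifier that
is unique per array. The find_conflicts() helper, the identifier strings and
the device names are hypothetical.

```python
from collections import defaultdict

def find_conflicts(disks):
    """Return array names claimed by disks from more than one array identity.

    `disks` maps device name -> (array_name, array_id), where array_id is a
    hypothetical unique identifier stored in each member's superblock.
    """
    ids_by_name = defaultdict(set)
    for array_name, array_id in disks.values():
        ids_by_name[array_name].add(array_id)
    return sorted(name for name, ids in ids_by_name.items() if len(ids) > 1)

# Two local md0 members plus one disk pulled from another machine's md0.
disks = {
    "/dev/hda": ("md0", "id-local"),
    "/dev/hdc": ("md0", "id-local"),
    "/dev/hde": ("md0", "id-foreign"),
    "/dev/hdg": ("md1", "id-local-md1"),
}

print(find_conflicts(disks))  # ['md0']
```

Running a check like this before letting anything autostart would flag the
foreign disk instead of leaving it to chance which md0 wins.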


2002-10-03 12:58:02

by Roy Sigurd Karlsbakk

Subject: Re: AARGH! Please help. IDE controller fsckup

> > I have used persistent superblocks, but md0,1,2,3 will be differently
> > ordered if I change the disk order... At least I think so. It surely
> > didn't work.
>
> No. md0 would stay md0. This is another effect of using superblocks,
> and in fact this is also (ironically) more or less the only argument
> *against* using them :)
>
> (Imagine inserting a disk which knows that it is disk 0 of md0 into some
> machine that already has a perfectly fine md0 running)

ok. so. theoretically - as long as the system finds all 16 drives, I should be
able to shuffle them around and attach them to whichever controller there is?
right?

ok.

now, I've replaced the faulty controller, and booting up. the new controller
is also (like the old one) a CMD649...

huh?

it works. but it surely didn't work last time...

thanks

> > But ... with persistent superblock - is it possible to fsckup the raid?
>
> You're root, it is indeed possible :)

er - yes. I more meant like 'automagically'

> But you would not need to perform any of the special operations that you
> need to now.
>
> Persistent superblocks save you from a number of "bad" situations you
> can encounter with normal production systems (such as replacing a
> controller or moving disks around).
>
> One should be careful when moving disks with persistent superblocks
> between systems though. You don't want the kernel to autodetect the
> "wrong" md0 on boot :) I consider this problem nonexistent in the
> production environment that I administer, but I know that some people
> feel differently about it. You should consider these pros and cons in
> relation to your environment and make a decision based on that.



2002-10-03 20:02:32

by Andre Hedrick

Subject: Re: AARGH! Please help. IDE controller fsckup


One of the observed issues under raid-tools is that it does not look at all
the devices' superblocks. Looking at them would allow for out-of-order
initialization: treating the devices like domino chips, stuffing them back
in random order, and having it work.

If I am wrong here, great. Somebody please make the correction.

Cheers,

On Thu, 3 Oct 2002, Jakob Oestergaard wrote:

> On Thu, Oct 03, 2002 at 03:13:28PM +0200, Roy Sigurd Karlsbakk wrote:
> > > > I have used persistent superblocks, but md0,1,2,3 will be differently
> > > > ordered if I change the disk order... At least I think so. It surely
> > > > didn't work.
> > >
> > > No. md0 would stay md0. This is another effect of using superblocks,
> > > and in fact this is also (ironically) more or less the only argument
> > > *against* using them :)
> > >
> > > (Imagine inserting a disk which knows that it is disk 0 of md0 into some
> > > machine that already has a perfectly fine md0 running)
> >
> > ok. so. theoretically - as long as the system finds all 16 drives, I should be
> > able to shuffle them around and attach them to whichever controller there is?
> > right?
>
> It will not reattach your disks (you need to move cables to do that),
> but it will know "First disk of md0" from "Second disk of md0"
> regardless of whether those disks are /dev/hda or /dev/sdg.
>
> You can shuffle your disks around as much as you please. When the RAID
> code looks at your disks, it will read their superblocks and correctly
> make the first disk of md0 the first disk of md0, and so forth,
> regardless of the actual device name of the disk.
>
> >
> > ok.
> >
> > now, I've replaced the faulty controller, and booting up. the new controller
> > is also (like the old one) a CMD649...
> >
>
> RAID doesn't care about controllers.
>
> RAID without persistent superblocks cares about disk device names.
>
> RAID with persistent superblocks doesn't care about disk device names.
>
> > huh?
>
> eh?
>
> >
> > it works. but it surely didn't work last time...
> >
>
> Good for you :)
>
> > thanks
> >
> > > > But ... with persistent superblock - is it possible to fsckup the raid?
> > >
> > > You're root, it is indeed possible :)
> >
> > er - yes. I more meant like 'automagically'
>
> It will only automagically screw up your arrays if you shuffle disks
> between machines (mix several RAID arrays from other systems in one
> system) (you can of course move all your disks to one new machine, if
> it has none of its original RAIDed disks left).
>
> Just don't mix disks with persistent superblocks from multiple machines
> into one single machine. Unless you know exactly what you're doing.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

Andre Hedrick
LAD Storage Consulting Group

2002-10-05 15:36:45

by Roy Sigurd Karlsbakk

Subject: Re: AARGH! Please help. IDE controller fsckup

Jakob Oestergaard wrote:

>>>>But ... with persistent superblock - is it possible to fsckup the raid?
>>>>
>>>>
>>>You're root, it is indeed possible :)
>>>
>>>
>>er - yes. I more meant like 'automagically'
>>
>>
>
>It will only automagically screw up your arrays if you shuffle disks
>between machines (mix several RAID arrays from other systems in one
>system) (you can of course move all your disks to one new machine, if
>it has none of its original RAIDed disks left).
>
>Just don't mix disks with persistent superblocks from multiple machines
>into one single machine. Unless you know exactly what you're doing.
>
>
Could it be some kind of idea to 'sign' the disks with some hash out of
hostname and IP or something?

roy
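
A minimal Python sketch of what such a signature could look like, assuming
the tag is simply a truncated SHA-256 over hostname and IP. The function
names, the "|" separator, the 16-character length and the example hosts are
all assumptions for illustration; nothing like this is claimed to exist in
the md code.

```python
import hashlib

def host_signature(hostname, ip):
    """Derive a short host tag from hostname and IP (scheme is an assumption)."""
    return hashlib.sha256(f"{hostname}|{ip}".encode("utf-8")).hexdigest()[:16]

def is_foreign(disk_signature, hostname, ip):
    """True if the tag stored on a disk was not made by this host."""
    return disk_signature != host_signature(hostname, ip)

# Tag written to a disk when the array was created on "boxa".
tag = host_signature("boxa", "192.0.2.10")

print(is_foreign(tag, "boxa", "192.0.2.10"))  # False
print(is_foreign(tag, "boxb", "192.0.2.11"))  # True
```

One catch with this scheme: renaming the host or changing its IP would make
every one of its own disks look foreign, so a random identifier generated
once per array may be the more robust choice.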