2003-09-06 21:32:41

by Chris Meadors

Subject: Panic when finishing raidreconf on 2.4.0-test4 with preempt

I've done this twice now; I'd prefer not to do it again, but I can upon
request, if you really need the oops output.

Running raidreconf to expand a 4-disk array to 5 seems to work
correctly until the very end. I'm guessing it happens as the RAID
superblock is being written. A preempt error is triggered and the kernel
panics. Upon reboot the MD driver doesn't think the 5th disk is valid
for consideration in the array and skips over it, leaving a very
corrupted file system.


2003-09-09 18:12:08

by Jakob Oestergaard

Subject: Re: Panic when finishing raidreconf on 2.4.0-test4 with preempt

On Sat, Sep 06, 2003 at 05:32:30PM -0400, Chris Meadors wrote:
> I've done this twice now; I'd prefer not to do it again, but I can upon
> request, if you really need the oops output.
>
> Running raidreconf to expand a 4-disk array to 5 seems to work
> correctly until the very end. I'm guessing it happens as the RAID
> superblock is being written. A preempt error is triggered and the kernel
> panics. Upon reboot the MD driver doesn't think the 5th disk is valid
> for consideration in the array and skips over it, leaving a very
> corrupted file system.

raidreconf does no "funny business" with the kernel, so I think this
points to either:
*) a bug which mkraid can trigger as well
*) an API change combined with missing error handling, which raidreconf
now triggers (by calling the old API)
*) a more general kernel bug - there is a *massive* VM load when
raidreconf does its magic, perhaps calling mkraid after beating
the VM halfway to death can trigger the same error?

raidreconf, upon complete reconfiguration, will set up the new
superblock for your array, mark it as "unclean", and add the disks one by
one. Once all disks are added, the kernel should start calculating
parity information (because raidreconf does not do this during the
conversion, and hence marks the newly set up array as unclean in order
to have the kernel do this dirty work).
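
(Aside: the parity calculation itself is nothing exotic - plain XOR
across the data disks. A minimal shell sketch, with made-up byte values:

    # RAID5 parity is the XOR of the corresponding blocks on the data disks
    d0=0x5a; d1=0x3c; d2=0x99; d3=0x42
    p=$(( d0 ^ d1 ^ d2 ^ d3 ))
    # any one lost block is the XOR of the parity with the survivors
    d2_again=$(( p ^ d0 ^ d1 ^ d3 ))
    printf 'parity=0x%02x rebuilt=0x%02x\n' "$p" "$d2_again"

The resync thread just walks every stripe recomputing that XOR, which is
why the unclean array is usable right away.)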

There should be nothing special about this, compared to normal mkraid
and raidhotadd usage - except raidreconf is probably a lot more likely
to trigger races.
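
By "normal usage" I mean roughly the following - device names purely
illustrative:

    mkraid /dev/md0                # write superblocks, build the array
    raidhotadd /dev/md0 /dev/sde1  # hot-add a disk to the running array
    cat /proc/mdstat               # watch the resync progress

raidreconf ends up writing the same kind of superblocks, just after
having shuffled all the data around first.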

Ah, fingerpointing ;)

(/me sits back, confident that his code is perfect and the kernel alone
is to blame)

--
................................................................
: [email protected] : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............:

2003-09-09 19:25:47

by Chris Meadors

Subject: Re: Panic when finishing raidreconf on 2.4.0-test4 with preempt

On Tue, 2003-09-09 at 14:11, Jakob Oestergaard wrote:

> raidreconf does no "funny business" with the kernel, so I think this
> points to either:
> *) a bug which mkraid can trigger as well
> *) an API change combined with missing error handling, which raidreconf
> now triggers (by calling the old API)
> *) a more general kernel bug - there is a *massive* VM load when
> raidreconf does its magic, perhaps calling mkraid after beating
> the VM halfway to death can trigger the same error?
>
> raidreconf, upon complete reconfiguration, will set up the new
> superblock for your array, mark it as "unclean", and add the disks one by
> one. Once all disks are added, the kernel should start calculating
> parity information (because raidreconf does not do this during the
> conversion, and hence marks the newly set up array as unclean in order
> to have the kernel do this dirty work).
>
> There should be nothing special about this, compared to normal mkraid
> and raidhotadd usage - except raidreconf is probably a lot more likely
> to trigger races.
>
> Ah, fingerpointing ;)
>
> (/me sits back, confident that his code is perfect and the kernel alone
> is to blame)

I'll mess around this evening a bit if I get a chance. I really wasn't
in the mood to see this error again (losing five years' worth of data can
do that to a person, but I've come to terms (with my own arrogance and
stupidity, along with the data loss (just old e-mails and pictures, but
stuff that is nice to hold onto anyway)) and pre-ordered Plextor's new
DVD burner). But that does leave me with a few blank drives that I can
beat on as much as anyone needs.

I'll be putting -test5 on first. I had planned on disabling the
preempt, but since that was reported in the oops, I'll leave it on.

I initially ran my mkraid under 2.4.20, but I'll see how it does with
2.6.0-test5. I'll mkraid a 4-drive RAID5 setup and see if it
completes, then raidreconf it to 5 drives. I'll scribble down the oops
this time too, if I see it again.
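
The plan is roughly this - the raidtab entries and partition names
below are illustrative, not my exact config:

    # /etc/raidtab.old - the existing 4-drive array
    raiddev /dev/md0
        raid-level            5
        nr-raid-disks         4
        persistent-superblock 1
        parity-algorithm      left-symmetric
        chunk-size            32
        device                /dev/sda1
        raid-disk             0
        device                /dev/sdb1
        raid-disk             1
        device                /dev/sdc1
        raid-disk             2
        device                /dev/sdd1
        raid-disk             3

    # /etc/raidtab.new is identical, except nr-raid-disks becomes 5
    # and /dev/sde1 is added as raid-disk 4.

    mkraid -c /etc/raidtab.old /dev/md0   # build the array, let resync finish
    raidstop /dev/md0                     # raidreconf wants the array offline
    raidreconf -o /etc/raidtab.old -n /etc/raidtab.new -m /dev/md0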

Anything else anyone wants me to try? Or other data to fill in blanks?

--
Chris

2003-09-09 20:42:16

by Jakob Oestergaard

Subject: Re: Panic when finishing raidreconf on 2.4.0-test4 with preempt

On Tue, Sep 09, 2003 at 03:21:31PM -0400, Chris Meadors wrote:
...
> I'll mess around this evening a bit if I get a chance. I really wasn't
> in the mood to see this error again (losing five years' worth of data can
> do that to a person, but I've come to terms (with my own arrogance and
> stupidity, along with the data loss (just old e-mails and pictures, but
> stuff that is nice to hold onto anyway)) and pre-ordered Plextor's new
> DVD burner). But that does leave me with a few blank drives that I can
> beat on as much as anyone needs.

Eh, ok, I'm not really sure what you did...

You ran raidreconf once, and after the entire reconfiguration had run,
the kernel barfed.

Then what? You re-ran the reconfiguration? Same as before?

If so, then I can pretty much guarantee you that your data are lost. You
may get Ibas (ibas.no) to scrape off the upper layers of your disk
platters, run some pattern analysis on what's left, and possibly then
retrieve some of your old data, but that's about the only chance I can
see you having.

If you only ran raidreconf once, then there might still be a good chance
to get your data back. To me it doesn't sound like this is the case,
but if it is, please let me know.

Sorry about your loss (but running an experimental raid reconfiguration
tool on an experimental kernel without backups, well... ;)

>
> I'll be putting -test5 on first. I had planned on disabling the
> preempt, but since that was reported in the oops, I'll leave it on.

Ok. It would be interesting to see if the oops goes away when preempt is
disabled.

--
................................................................
: [email protected] : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............:

2003-09-12 07:16:14

by Chris Meadors

Subject: Re: Panic when finishing raidreconf on 2.6.0-test4 with preempt

[Kernel version corrected in the subject line.]
[Plus forgot to include l-k.]

On Tue, 2003-09-09 at 16:42, Jakob Oestergaard wrote:
> On Tue, Sep 09, 2003 at 03:21:31PM -0400, Chris Meadors wrote:
>
> Eh, ok, I'm not really sure what you did...
>
> You ran raidreconf once, and after the entire reconfiguration had run,
> the kernel barfed.

That's what I figured...

> Then what? You re-ran the reconfiguration? Same as before?

...after I ran it the second time.

The problem was, it takes a while for the reconf to run, so I went to
watch a movie or something. I got back and my screen was blanked, and key
presses wouldn't clear it. Even Alt+SysRq wouldn't respond. So I hit
the reset button. That is when I saw that the kernel wouldn't recognize
the new drive as being part of the array.

I figured the kernel had panicked, and went to reproduce it. As the reconf
was running the second time, I started thinking to myself that maybe it
wasn't a good idea, and that I could probably have recovered data if I had
run fsck on the initial result.

> If so, then I can pretty much guarantee you that your data are lost. You
> may get Ibas (ibas.no) to scrape off the upper layers of your disk
> platters, run some pattern analysis on what's left, and possibly then
> retrieve some of your old data, but that's about the only chance I can
> see you having.
>
> If you only ran raidreconf once, then there might still be a good chance
> to get your data back. To me it doesn't sound like this is the case,
> but if it is, please let me know.

Nope, as I said, I ran it twice. Since my machine was hung solid and
the screen was blank, I didn't know exactly what had happened, and then
my fingers worked faster than my brain.

> Sorry about your loss (but running an experimental raid reconfiguration
> tool on an experimental kernel without backups, well... ;)

Exactly, the raidreconf HOWTO also plainly says, "unless you don't
consider the data important, you should make a backup of your current
RAID array now." I don't know if it is worth adding to the
documentation not to rerun raidreconf, if it fails for whatever reason,
until the array has been recovered to a fully consistent state.

> Ok. It would be interesting to see if the oops goes away when preempt is
> disabled.

Okay, it took some time, but here is what I've tested, all on
2.6.0-test5 now:

First, by mistake, I ran raidreconf on a started array: it gets all the
way through, but when it discovers at the end that md0 already has disks,
it exits gracefully with no oops.

Second, raidreconf still triggers the oops in -test5 when expanding a
4-disk RAID5 to 5 disks.

Third, mkraid completes without any trouble when building a new 4-disk
array.

Last, even in a kernel built without preempt support (I don't know why I
thought that was the problem initially, I must have misread something),
raidreconf still oopses the machine when attempting to write the new
superblocks.

Here is some of the output from the oops, copied by hand, as it hangs
the machine solid and I don't have anything else to capture it with:

EIP is at blk_read_rq_sectors+0x50/0xd0

Process md0_raid5

Stack trace:

__end_that_request_first+0x127/0x230
scsi_end_request+0x3f/0xf0
scsi_io_completion+0x1bb/0x470
sym_xpt_done+0x3b/0x50
sd_rw_intr+0x5a/0x1d0
scsi_finish_command+0x76/0xc0
run_timer_softirq+0x10a/0x1b0
scsi_softirq+0x99/0xa0
do_IRQ+0xfe/0x130
common_interrupt+0x18/0x20
xor_p5_mmx_5+0x6b/0x180
xor_block+0x5b/0xc0
compute_parity+0x15d/0x340
default_wake_function+0x0/0x30
handle_stripe+0x95f/0xcc0
__wake_up_common+0x31/0x60
raid5d+0x7d/0x140
default_wake_function+0x0/0x30
md_thread+0x0/0x190
kernel_thread_helper+0x5/0x10


If you need anything else, I can reproduce this at will. It just takes
about 30 minutes to reconf to five 9GB drives.

--
Chris