2006-03-09 03:29:33

by Martin Michlmayr

Subject: Kernel panic on PC with broken hard drive, after DMA errors

My laptop hard drive recently died (or is in the process of dying).
HP wanted me to do some more tests before sending me a replacement, so
I tried booting Linux again today. I got lots of DMA errors, which
was really to be expected, but then I got a kernel panic. While I'd
not blame the kernel when a panic occurs with broken RAM or CPU, I'm
sure the kernel shouldn't panic just because of a broken IDE drive.

I posted a picture of the panic at http://cyrius.com/tmp/ide_panic.jpg
Is this something that can be fixed or is my hardware really so broken
that the kernel cannot deal with it?
--
Martin Michlmayr
http://www.cyrius.com/


2006-03-09 04:03:42

by Robert Hancock

Subject: Re: Kernel panic on PC with broken hard drive, after DMA errors

Martin Michlmayr wrote:
> My laptop hard drive recently died (or is in the process of dying).
> HP wanted me to do some more tests before sending me a replacement, so
> I tried booting Linux again today. I got lots of DMA errors, which
> was really to be expected, but then I got a kernel panic. While I'd
> not blame the kernel when a panic occurs with broken RAM or CPU, I'm
> sure the kernel shouldn't panic just because of a broken IDE drive.
>
> I posted a picture of the panic at http://cyrius.com/tmp/ide_panic.jpg
> Is this something that can be fixed or is my hardware really so broken
> that the kernel cannot deal with it?

This is probably a genuine bug. These kinds of reports have come up a few
times recently, as I recall; it seems some of the error handling in the
drivers/ide code isn't as robust as it should be.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-03-09 15:15:20

by Martin Michlmayr

Subject: Re: Kernel panic on PC with broken hard drive, after DMA errors

* Robert Hancock <[email protected]> [2006-03-08 22:03]:
> This is probably a genuine bug. These kinds of reports have come up a few
> times recently, as I recall; it seems some of the error handling in
> the drivers/ide code isn't as robust as it should be.

Was the traceback I posted enough for someone to find out what's going
on, or do you need more information? I can hook up a serial console
and try to capture the full log, but I'm not sure I can reproduce this
kernel panic. The dying hard drive is quite unpredictable about whether
it shows errors or works fine...
--
Martin Michlmayr
http://www.cyrius.com/

2006-03-09 16:40:08

by Alan

Subject: Re: Kernel panic on PC with broken hard drive, after DMA errors

On Thu, 2006-03-09 at 15:14 +0000, Martin Michlmayr wrote:
> on, or do you need more information? I can hook up a serial console
> and try to capture the full log, but I'm not sure I can reproduce this
> kernel panic. The dying hard drive is quite unpredictable about whether
> it shows errors or works fine...

It's an ancient, known problem. However, I'd be interested to know
whether you can break libata and the PATA IDE patches the same way.

2006-03-09 16:54:12

by Martin Michlmayr

Subject: Re: Kernel panic on PC with broken hard drive, after DMA errors

* Alan Cox <[email protected]> [2006-03-09 16:45]:
> It's an ancient, known problem. However, I'd be interested to know
> whether you can break libata and the PATA IDE patches the same way.

I can try, but like I said, the hard drive acts pretty arbitrarily and
won't always fail when I want it to. Do you know if there's a way to
trigger the problem? Otherwise I'll just try a couple of times,
but without a good way to trigger the problem I cannot really say if
it's gone with libata.
--
Martin Michlmayr
http://www.cyrius.com/

2006-03-09 18:31:40

by Alan

Subject: Re: Kernel panic on PC with broken hard drive, after DMA errors

On Thu, 2006-03-09 at 16:53 +0000, Martin Michlmayr wrote:
> * Alan Cox <[email protected]> [2006-03-09 16:45]:
> > It's an ancient, known problem. However, I'd be interested to know
> > whether you can break libata and the PATA IDE patches the same way.
>
> I can try, but like I said, the hard drive acts pretty arbitrarily and
> won't always fail when I want it to. Do you know if there's a way to
> trigger the problem? Otherwise I'll just try a couple of times,
> but without a good way to trigger the problem I cannot really say if
> it's gone with libata.

You could try heavy I/O (find / -print type stuff), or, if it's specific
problem blocks, cp /dev/hda /dev/null (/dev/sda for libata).

Libata should either error correctly or recover cleanly from the
problems.
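A plain `cp /dev/hda /dev/null` typically gives up at the first read error, which makes it hard to tell which blocks are bad. The following is a minimal Python sketch, assuming a Linux host and a hypothetical helper named `scan_for_read_errors` (and a hypothetical script name `scan.py`), that keeps reading past failures and records the offset of every chunk whose read raised an error (usually EIO on a dying drive), so the problem area can be re-read deliberately:

```python
import os
import sys


def scan_for_read_errors(path, chunk_size=1 << 20):
    """Read `path` from start to end in `chunk_size` pieces.

    Returns the starting byte offsets of chunks whose read raised
    OSError (EIO on a failing drive), continuing past each failure
    instead of stopping at the first one.
    """
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # For a Linux block device, seeking to the end yields its size in bytes.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                os.pread(fd, length, offset)
            except OSError:
                bad_offsets.append(offset)
            offset += length
    finally:
        os.close(fd)
    return bad_offsets


if __name__ == "__main__":
    # Hypothetical usage, as root: python3 scan.py /dev/hda  (/dev/sda under libata)
    for off in scan_for_read_errors(sys.argv[1]):
        print(f"read error in chunk starting at byte {off}")
```

Reading with `os.pread` at explicit offsets means a failed chunk does not disturb the file position, so the loop simply moves on to the next chunk.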

2006-03-27 11:13:32

by Martin Michlmayr

Subject: Re: Kernel panic on PC with broken hard drive, after DMA errors

* Alan Cox <[email protected]> [2006-03-09 16:45]:
> > The dying hard drive is quite unpredictable about whether it shows
> > errors or works fine...
>
> It's an ancient, known problem. However, I'd be interested to know
> whether you can break libata and the PATA IDE patches the same way.

Sorry, but I'm not able to give you more information. I tried again
several times with PATA and never saw the oops again, so I don't think
trying libata would help, since not seeing an oops wouldn't mean
anything at all. Unless there is a _specific_ way to trigger this bug
("cause much disk IO" isn't enough, because it only led to an oops once
in roughly 30-40 tries), I cannot do anything.
--
Martin Michlmayr
http://www.cyrius.com/
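Martin's point about a missing oops proving nothing can be put into numbers. This is a minimal sketch under a simplifying assumption: every heavy-I/O run on a kernel that still has the bug triggers the oops independently with the same probability p (a dying drive is unlikely to behave this regularly). The chance of n clean runs on an unfixed kernel is then (1 - p)^n, so ruling the bug out at the 5% level needs n >= ln(0.05) / ln(1 - p). The hypothetical helper `tries_needed` computes that bound for his estimate of one oops in 30 to 40 runs:

```python
import math


def tries_needed(p_oops, alpha=0.05):
    """Smallest n such that (1 - p_oops) ** n <= alpha.

    Assumes every heavy-I/O run triggers the oops independently with
    the same probability p_oops on a kernel that still has the bug.
    """
    return math.ceil(math.log(alpha) / math.log(1.0 - p_oops))


for runs_per_oops in (30, 35, 40):
    print(runs_per_oops, tries_needed(1 / runs_per_oops))
# prints:
# 30 89
# 35 104
# 40 119
```

Under these assumptions, roughly 90 to 120 consecutive clean runs with libata would be needed before its absence of oopses meant much, which is consistent with his conclusion that a handful of attempts says little.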