I have been using the 2.6.10.x series kernel for a while on my other
systems with no issues at all.
But on my laptop, which has the VIA 686 chipset, I started having some
weird issues. This happens after about 2 weeks without a shutdown.
Here is a sample of the data from my kernel log.
About the device
May 24 16:22:22 laptop kernel: Uniform Multi-Platform E-IDE driver
Revision: 7.00alpha2
May 24 16:22:22 laptop kernel: ide: Assuming 33MHz system bus speed for
PIO modes; override with idebus=xx
May 24 16:22:22 laptop kernel: VP_IDE: IDE controller at PCI slot
0000:00:07.1
May 24 16:22:22 laptop kernel: VP_IDE: chipset revision 16
May 24 16:22:22 laptop kernel: VP_IDE: not 100%% native mode: will probe
irqs later
May 24 16:22:22 laptop kernel: VP_IDE: VIA vt82c686a (rev 22) IDE UDMA66
controller on pci0000:00:07.1
May 24 16:22:22 laptop kernel: ide0: BM-DMA at 0x1100-0x1107, BIOS
settings: hda:DMA, hdb:pio
May 24 16:22:22 laptop kernel: ide1: BM-DMA at 0x1108-0x110f, BIOS
settings: hdc:DMA, hdd:pio
May 24 16:22:22 laptop kernel: Probing IDE interface ide0...
May 24 16:22:22 laptop kernel: hda: FUJITSU MHS2030AT, ATA DISK drive
May 24 16:22:22 laptop kernel: ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
May 24 16:22:22 laptop kernel: Probing IDE interface ide1...
May 24 16:22:22 laptop kernel: hdc: DW-224E, ATAPI CD/DVD-ROM drive
May 24 16:22:22 laptop kernel: ide1 at 0x170-0x177,0x376 on irq 15
May 24 16:22:22 laptop kernel: hda: max request size: 128KiB
May 24 16:22:22 laptop kernel: hda: 58605120 sectors (30005 MB)
w/2048KiB Cache, CHS=58140/16/63
May 24 16:22:22 laptop kernel: hda: cache flushes supported
May 24 16:22:22 laptop kernel: hda: hda1 hda2 hda3 hda4
Error Messages
First sign of the problem
May 24 01:37:03 laptop kernel: ide: failed opcode was: unknown
May 24 01:37:03 laptop kernel: ide0: reset: success
May 24 01:38:15 laptop kernel: hda: irq timeout: status=0xd0 { Busy }
May 24 01:38:15 laptop kernel:
May 24 01:38:15 laptop kernel: ide: failed opcode was: unknown
May 24 01:38:17 laptop kernel: ide0: reset: success
May 24 01:47:57 laptop kernel: hda: irq timeout: status=0xd0 { Busy }
May 24 01:47:57 laptop kernel:
May 24 01:47:57 laptop kernel: ide: failed opcode was: unknown
May 24 01:48:32 laptop kernel: ide0: reset timed-out, status=0xd0
May 24 01:48:32 laptop kernel: hda: status timeout: status=0xd0 { Busy }
May 24 01:48:32 laptop kernel:
May 24 01:48:32 laptop kernel: ide: failed opcode was: unknown
May 24 01:48:32 laptop kernel: hda: drive not ready for command
May 24 01:48:32 laptop kernel: ide0: reset: success
May 24 01:50:59 laptop kernel: hda: irq timeout: status=0xd0 { Busy }
May 24 01:50:59 laptop kernel:
May 24 01:50:59 laptop kernel: ide: failed opcode was: unknown
May 24 01:51:04 laptop kernel: ide0: reset: success
May 24 01:53:44 laptop kernel: hda: irq timeout: status=0xd0 { Busy }
May 24 01:53:49 laptop kernel:
May 24 01:53:49 laptop kernel: ide: failed opcode was: unknown
May 24 01:53:49 laptop kernel: ide0: reset: success
May 24 01:54:14 laptop kernel: hda: irq timeout: status=0xd0 { Busy }
May 24 01:54:25 laptop kernel:
May 24 01:54:25 laptop kernel: ide: failed opcode was: unknown
May 24 01:54:25 laptop kernel: ide0: reset: success
May 24 02:00:12 laptop kernel: hda: irq timeout: status=0xd0 { Busy }
May 24 02:00:12 laptop kernel:
May 24 02:00:12 laptop kernel: ide: failed opcode was: unknown
May 24 02:00:16 laptop kernel: ide0: reset: success
May 24 02:00:35 laptop kernel: hda: irq timeout: status=0xd0 { Busy }
May 24 02:00:35 laptop kernel:
May 24 02:00:35 laptop kernel: ide: failed opcode was: unknown
May 24 02:00:47 laptop kernel: ide0: reset: success
May 24 02:23:36 laptop kernel: hda: status timeout: status=0xd0 { Busy }
Different error messages
May 24 16:10:47 laptop kernel: hda: status timeout: status=0xd0 { Busy }
May 24 16:10:47 laptop kernel:
May 24 16:10:47 laptop kernel: ide: failed opcode was: unknown
May 24 16:10:47 laptop kernel: hda: no DRQ after issuing MULTWRITE
May 24 16:10:50 laptop kernel: ide0: reset: success
May 24 16:14:50 laptop kernel: hda: status timeout: status=0xd0 { Busy }
May 24 16:14:50 laptop kernel:
May 24 16:14:50 laptop kernel: ide: failed opcode was: unknown
May 24 16:14:50 laptop kernel: hda: no DRQ after issuing MULTWRITE
May 24 16:14:55 laptop kernel: ide0: reset: success
May 24 16:15:35 laptop kernel: hda: irq timeout: status=0xd0 { Busy }
May 24 16:15:35 laptop kernel:
May 24 16:15:35 laptop kernel: ide: failed opcode was: unknown
May 24 16:15:37 laptop kernel: ide0: reset: success
May 24 16:16:01 laptop kernel: hda: status timeout: status=0xd0 { Busy }
May 24 16:16:01 laptop kernel:
May 24 16:16:01 laptop kernel: ide: failed opcode was: unknown
May 24 16:16:01 laptop kernel: hda: no DRQ after issuing MULTWRITE
May 24 16:16:01 laptop kernel: ide0: reset: success
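
For readers following the log excerpts, here is a minimal sketch of how the
status byte in these messages breaks down. The bit names follow the ones the
legacy IDE driver prints; the helper names decode_status and kernel_style are
made up for this illustration and are not part of any kernel interface. The
driver only reports "Busy" when the busy bit is set, since the remaining bits
are not meaningful while the drive is busy.

```python
# ATA status register bits, highest bit first (names as printed by the
# legacy IDE driver's status dump).
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the names of every bit set in the raw status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


def kernel_style(status):
    """Mimic the log output: if Busy is set, the other bits are ignored."""
    if status & 0x80:
        return ["Busy"]
    return decode_status(status)


if __name__ == "__main__":
    print(hex(0xD0), decode_status(0xD0), kernel_style(0xD0))
```

In other words, status=0xd0 has the busy bit set and the error bit clear, so
the log is reporting a drive that never answered rather than one that
answered with an error.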
The issue is also described here in this forum post from someone else
http://forums.viaarena.com/messageview.aspx?catid=28&threadid=66084&enterthread=y
--
----
Jim Gifford
[email protected]
I tested the hard drive and it passes. Any other suggestions?
That's not a bad platter issue. It could be the electronics on the
drive have a problem, but more likely something happened like the
drive spun down. If that is the case, the reset at the end should
have woken it up. Does the drive work correctly after the reset?
Ross
On 5/25/05, Jim Gifford <[email protected]> wrote:
> Tested the hard drive it passes. Any other suggestions
Ross,
I thought of that too. I happen to have two identical laptops here
now, and I have placed a different drive in the other one so we can test
it. Looking through all the logs that I backed up, though, this problem
started when I put 2.6.8 on the laptop. I'm wondering if it could be an
ACPI issue or an IDE issue, but I have no idea what to really look for.
May 24 16:22:31 laptop kernel: mtrr: 0x40000000,0x800000 overlaps
existing 0x40000000,0x400000
May 24 18:25:11 laptop kernel: hda: status timeout: status=0xd0 { Busy }
May 24 18:25:11 laptop kernel:
May 24 18:25:11 laptop kernel: ide: failed opcode was: unknown
May 24 18:25:11 laptop kernel: hda: no DRQ after issuing MULTWRITE
May 24 18:25:12 laptop kernel: ide0: reset: success
May 24 12:21:15 laptop2 kernel: mtrr: 0x40000000,0x800000 overlaps
existing 0x40000000,0x400000
May 24 16:55:53 laptop2 kernel: hda: status timeout: status=0xd0 { Busy }
May 24 16:55:53 laptop2 kernel:
May 24 16:55:54 laptop2 kernel: ide: failed opcode was: unknown
May 24 16:55:54 laptop2 kernel: hda: no DRQ after issuing MULTWRITE
May 24 16:55:54 laptop2 kernel: ide0: reset: success
Here is a link to my .config file
http://ftp.jg555.com/configs/laptop.config
Since you say the problem started with 2.6.8, perhaps the easiest
thing to do is to take the kernel you say worked, along with its config,
and diff it against 2.6.8.
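
As a rough sketch of the config half of that comparison, the following reads
two .config files and lists every option whose value differs. The function
names and the sample option values in the usage note are assumptions for
illustration only; plain diff on the two files works as well, this just
ignores ordering and comments.

```python
import re


def parse_config(text):
    """Map CONFIG_ option names to values; 'is not set' is recorded as 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"(CONFIG_\w+)=(.*)$", line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = re.match(r"# (CONFIG_\w+) is not set$", line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that changed.

    A value of None means the option is absent from that config.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key), new.get(key)
        if a != b:
            changes[key] = (a, b)
    return changes
```

Usage would be something like diff_configs(open("config-old").read(),
open("config-2.6.8").read()), where both file names are placeholders. The
IDE and ACPI options are the ones worth reading first in the result.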
I took a second look at your original message, and since it's an irq
timeout, it could just be that your drive is slow to answer a command
and then is confused by the error recovery code. Having two identical
laptops makes this easy. If your other laptop has the same problem,
then it's software (or buggy hardware that the software hasn't figured
out how to work around). If it doesn't, then it's hardware, and there
are a couple of things that can be tweaked to see if that fixes things.
Ross
If you are using the legacy IDE layer you want to tweak WAIT_CMD. For
testing, you can make it really large and see if that impacts your
problem. A drive vendor once told me that it could take more than a
minute for an IDE drive to complete a command. I no longer purchase
drives from that vendor.
I'm not sure what libata uses, but my guess is that it takes its
default from the SCSI layer.
If tweaking these timeouts makes your problem go away, odds are that
your drive remapped a few more bad sectors and now takes a little too
long to complete commands. The Linux IDE error recovery code does a
WIN_IDLE_IMMEDIATE when there is a problem. This is allowed by the
ATA-2 spec, but it confuses most modern drives. So once you start
getting errors, the drive often gets so confused that they never stop.
Ross
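
Before changing any timeout, it can help to see how long the drive actually
stays stuck. The sketch below pairs each timeout line in a syslog excerpt
with the next "ide0: reset: success" line and reports the gap in seconds. It
assumes the log format shown earlier in this thread, an English locale for
month names, and the year 2005 (syslog lines carry no year); the function
names are invented for this example.

```python
import re
from datetime import datetime

# "May 24 01:38:15 laptop kernel: hda: irq timeout: status=0xd0 { Busy }"
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: (.*)$")


def parse(line, year=2005):
    """Split a syslog line into (timestamp, host, message), or None."""
    m = LINE_RE.match(line)
    if not m:
        return None
    stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return stamp, m.group(2), m.group(3)


def recovery_delays(lines):
    """Return (error kind, seconds until the next successful ide0 reset)."""
    delays = []
    pending = None  # first timeout of the current episode
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        stamp, _host, msg = parsed
        if "timeout: status=" in msg and pending is None:
            pending = (stamp, msg.split(":")[1].strip())
        elif msg.startswith("ide0: reset: success") and pending is not None:
            seconds = int((stamp - pending[0]).total_seconds())
            delays.append((pending[1], seconds))
            pending = None
    return delays
```

If the gaps grow as a larger timeout is tried, that would point toward a
drive that is merely slow to answer; a gap that stays flat suggests the
timeout is not the limiting factor.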
On 5/26/05, Jim Gifford <[email protected]> wrote:
> What do you recommend trying?
>
Ross Biro wrote:
> problem. A drive vendor once told me that it could take more than a
> minute for an IDE drive to complete a command. I no longer purchase
> drives from that vendor.
All vendors could potentially take more than 30 seconds to complete ATA
commands such as FLUSH CACHE EXT, in extreme cases.
Jeff