2001-03-20 11:19:06

by Jules Bean

[permalink] [raw]
Subject: IDE DMA resets with ALI15X3 on Aladdin V

Hi,

I have an intermittent problem with my IDE setup:

pear# dmesg | grep -i ide
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with
idebus=xx
ALI15X3: IDE controller on PCI bus 00 dev 78
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:DMA
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
scsi0 : SCSI host adapter emulation for IDE ATAPI devices

pear# lspci
00:00.0 Host bridge: Acer Laboratories Inc. [ALi] M1541 (rev 04)
00:01.0 PCI bridge: Acer Laboratories Inc. [ALi] M5243 (rev 04)
00:02.0 USB Controller: Acer Laboratories Inc. [ALi] M5237 USB (rev
03)
00:07.0 ISA bridge: Acer Laboratories Inc. [ALi] M1533 PCI to ISA
Bridge [Aladdin IV] (rev c3)
00:08.0 Multimedia audio controller: Ensoniq ES1371 [AudioPCI-97] (rev
08)
00:0f.0 IDE interface: Acer Laboratories Inc. [ALi] M5229 IDE (rev c2)
01:00.0 VGA compatible controller: 3Dfx Interactive, Inc. Voodoo 3
(rev 01)


It works absolutely fine under normal load; I see the very occasional

Mar 18 12:23:01 pear kernel: hda: dma_intr: status=0x51 { DriveReady
SeekComplete Error }
Mar 18 12:23:01 pear kernel: hda: dma_intr: error=0x84 {
DriveStatusError BadCRC }

but they don't seem to affect performance.

However, very occasionally, normally when the HD is under very heavy
load, I get messages like this:

Mar 20 10:24:05 pear kernel: hda: timeout waiting for DMA
Mar 20 10:24:05 pear kernel: ide_dmaproc: chipset supported
ide_dma_timeout func only: 14
Mar 20 10:24:05 pear kernel: hda: irq timeout: status=0x58 {
DriveReady SeekComplete DataRequest }
Mar 20 10:24:05 pear kernel: hda: status timeout: status=0xd0 { Busy }
Mar 20 10:24:05 pear kernel: hda: drive not ready for command
Mar 20 10:24:05 pear kernel: ide0: reset: success

This is accompanied by a freeze of the machine, which in this
particular instance sorted itself out. Sometimes, the machine goes
down hard, causing some disk corruption (always minor, thankfully, so
far).

This all sounds very like that described in the thread which starts at
http://www.uwsg.iu.edu/hypermail/linux/kernel/0006.1/0924.html
although he didn't seem to get an actually crashes from it.

Sometimes I also hear alarming clicking sounds from the disk; on those
occasions, the crash is hard, and the machine has to be reset.

Is this a hardware fault? I would think so, except for the
intermittent dma_intr errors suggesting there could be a motherboard
problem too?

The disk is as follows:

pear# cat /proc/ide/hda/model
Maxtor 91080D5

(It does normally seem to be that disk, but the other disk is much
smaller anyhow).

I do also sometime see problems with hdd, which is an ATAPI
cd-rom. This normally happens after there's been some problem with
hda, and I get:

Mar 20 09:56:04 pear kernel: scsi0 : SCSI host adapter emulation for
IDE ATAPI devices
Mar 20 09:56:04 pear kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 Inquiry 00 00 00 ff 00
Mar 20 09:56:04 pear kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 Inquiry 00 00 00 ff 00
Mar 20 09:56:04 pear kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Mar 20 09:56:04 pear kernel: SCSI bus is being reset for host 0
channel 0.
Mar 20 09:56:04 pear kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 Inquiry 00 00 00 ff 00
[repeated many times, before finally:]
Mar 20 09:56:04 pear kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Mar 20 09:56:04 pear kernel: SCSI bus is being reset for host 0
channel 0.
Mar 20 09:56:04 pear kernel: hdc: timeout waiting for DMA
Mar 20 09:56:04 pear kernel: ide_dmaproc: chipset supported
ide_dma_timeout func only: 14
Mar 20 09:56:04 pear kernel: hdc: irq timeout: status=0x50 {
DriveReady SeekComplete }
Mar 20 09:56:04 pear kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 Inquiry 00 00 00 ff 00
Mar 20 09:56:04 pear kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Mar 20 09:56:04 pear kernel: SCSI bus is being reset for host 0
channel 0.
Mar 20 09:56:04 pear kernel: hdd: irq timeout: status=0xd0 { Busy }
Mar 20 09:56:04 pear kernel: hdd: DMA disabled
Mar 20 09:56:04 pear kernel: hdd: ATAPI reset complete
Mar 20 09:56:04 pear kernel: hdd: status error: status=0x08 {
DataRequest }
Mar 20 09:56:04 pear kernel: hdd: drive not ready for command
Mar 20 09:56:04 pear kernel: hdd: status timeout: status=0x80 { Busy }
Mar 20 09:56:04 pear kernel: hdd: drive not ready for command
Mar 20 09:56:04 pear kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 Inquiry 00 00 00 ff 00
Mar 20 09:56:04 pear kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Mar 20 09:56:04 pear kernel: SCSI bus is being reset for host 0
channel 0.
Mar 20 09:56:04 pear kernel: hdd: ATAPI reset complete
Mar 20 09:56:04 pear kernel: hdd: irq timeout: status=0x80 { Busy }
Mar 20 09:56:04 pear kernel: hdd: status error: status=0x08 {
DataRequest }
Mar 20 09:56:04 pear kernel: hdd: drive not ready for command
Mar 20 09:56:04 pear kernel: hdd: status timeout: status=0x80 { Busy }
Mar 20 09:56:04 pear kernel: hdd: drive not ready for command
Mar 20 09:56:04 pear kernel: hdd: ATAPI reset complete

This is another reason to think it's not my hda, right? If it gets
confused with hdc and hdd too?

This is all using kernel 2.4.0, but I did used to see similar errors
with 2.2 + IDE patches.

Obviously I'll supply any more detail needed.

Please Cc: me on replies.

Jules

Meta: Is this the right forum for this question? If not, where? The
FAQ at tux.org suggests mailing the driver author directly, but normal
netiquette suggests to me that Andre gets enough mail as it is and
would prefer questions to go via the mailing list.


2001-03-20 14:50:04

by Jeremy Jackson

[permalink] [raw]
Subject: Re: IDE DMA resets with ALI15X3 on Aladdin V

Jules Bean wrote:

> Hi,
>
> I have an intermittent problem with my IDE setup:
>
> pear# dmesg | grep -i ide
> Uniform Multi-Platform E-IDE driver Revision: 6.31
> ide: Assuming 33MHz system bus speed for PIO modes; override with
> idebus=xx
> ALI15X3: IDE controller on PCI bus 00 dev 78
> ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
> ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:DMA
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> ide1 at 0x170-0x177,0x376 on irq 15
> scsi0 : SCSI host adapter emulation for IDE ATAPI devices
> {snip}
>
> It works absolutely fine under normal load; I see the very occasional
>
> Mar 18 12:23:01 pear kernel: hda: dma_intr: status=0x51 { DriveReady
> SeekComplete Error }
> Mar 18 12:23:01 pear kernel: hda: dma_intr: error=0x84 {
> DriveStatusError BadCRC }
>

This is clue #1 - BadCRC

>
> but they don't seem to affect performance.
>
> However, very occasionally, normally when the HD is under very heavy
> load, I get messages like this:
>
> Mar 20 10:24:05 pear kernel: hda: timeout waiting for DMA
> Mar 20 10:24:05 pear kernel: ide_dmaproc: chipset supported
> ide_dma_timeout func only: 14
> Mar 20 10:24:05 pear kernel: hda: irq timeout: status=0x58 {
> DriveReady SeekComplete DataRequest }
> Mar 20 10:24:05 pear kernel: hda: status timeout: status=0xd0 { Busy }
> Mar 20 10:24:05 pear kernel: hda: drive not ready for command
> Mar 20 10:24:05 pear kernel: ide0: reset: success
>
> This is accompanied by a freeze of the machine, which in this
> particular instance sorted itself out. Sometimes, the machine goes
> down hard, causing some disk corruption (always minor, thankfully, so
> far).
>
> This all sounds very like that described in the thread which starts at
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0006.1/0924.html
> although he didn't seem to get an actually crashes from it.
>
> Sometimes I also hear alarming clicking sounds from the disk; on those

This is clue #2.

>
> occasions, the crash is hard, and the machine has to be reset.
>
> Is this a hardware fault? I would think so, except for the
> intermittent dma_intr errors suggesting there could be a motherboard
> problem too?
>
> The disk is as follows:
>
> pear# cat /proc/ide/hda/model
> Maxtor 91080D5

If memory serves me correctly, this disk is the same model as
one I send back for RMA... I think your disk is toast.
you could try using badblocks to scan it, prefferably in read-write
mode (you'll have to move your data to another disk first, of course,
since this erases data) You could try just a read scan first.
The clicking is likely the drive trying to re-read the bad sectors.
You should try to exchange it under warranty, before your data
disappears.

You could try using an Ultra-DMA 66 cable to improve signal quality,
even though this chipset (I think) only supports UDMA33. Also,
try it with the drive by itsself (not sharing cable with a cdrom or
another disk) on an interface.

Cheers,

Jeremy

>
>
> (It does normally seem to be that disk, but the other disk is much
> smaller anyhow).
>
> I do also sometime see problems with hdd, which is an ATAPI
> cd-rom. This normally happens after there's been some problem with
> hda, and I get:

{snip}