2006-08-26 16:45:08

by Peter

Subject: wrt: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Using 2.6.17.11 with reiser4 patch only. Voluntary preempt.

Recently, I have been receiving this sequence of errors:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hdb: DMA disabled

Previously, with a different hdb drive, hda and hdb were reversed in the
messages above, but the output was otherwise the same.
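
For readers decoding the log above: the status and error values are bit
fields. Below is a minimal sketch that unpacks them. The status labels
match the wording in the log; the error labels are the ATA register
abbreviations. The names STATUS_BITS, ERROR_BITS and decode are
illustrative only and do not come from the kernel source.

```python
# Illustrative decoder for the ATA status and error registers.
# STATUS_BITS uses the labels seen in the kernel log; ERROR_BITS uses the
# ATA abbreviations. All names here are made up for this sketch.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

ERROR_BITS = {
    0x80: "ICRC",   # interface CRC error on a UDMA transfer ("BadCRC" in the log)
    0x40: "UNC",    # uncorrectable data error
    0x20: "MC",     # media changed
    0x10: "IDNF",   # sector ID not found
    0x08: "MCR",    # media change requested
    0x04: "ABRT",   # command aborted ("DriveStatusError" in the log)
    0x02: "TK0NF",  # track 0 not found
    0x01: "AMNF",   # address mark not found
}


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in table.items() if value & mask]


print(decode(0x51, STATUS_BITS))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode(0x84, ERROR_BITS))   # ['ICRC', 'ABRT']
```

Read this way, error=0x84 is an aborted command with the interface CRC
bit set, meaning the transfer was corrupted between the drive and the
controller rather than on the media.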

Diagnostics on the drives are fine. Removing the b drive removes the
messages. System functions fine anyway, and no data is lost as a result
of the errors. The persistence of it is frustrating!

With so many moving parts in a kernel, applications, drivers, and a user's
system, it's hard to pin down the root of the problem. I have a mix of
filesystems, drivers, etc. While I cannot recall exactly when I started
noticing these problems, I would venture it was in the 2.6.16 series. I
have two thoughts about these errors.

1) In my case, hda and hdb are different in capabilities.
/dev/hda:

Model=Maxtor 6Y200P0, FwRev=YAR41BW0, SerialNo=Y65WFMPE
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=57
BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: (null): ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

/dev/hdb:

Model=Maxtor 5T030H3, FwRev=TAH71DP0, SerialNo=T3H1876C
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=57
BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
(maybe): CurCHS=65535/16/0, CurSects=0, LBA=yes, LBAsects=60030432
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: ATA/ATAPI-6 T13 1410D revision 0: ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6

Could it be that the DMA or I/O handler has trouble when the two devices
on the same IDE channel have different capabilities? Or could reiserfs
or reiser4 be turning off DMA because something is timing out?
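
To make the capability difference concrete, here is a small sketch that
pulls the supported and currently selected (starred) UDMA modes out of
the two "UDMA modes:" lines shown above. The function name udma_modes is
hypothetical, and the sketch only compares the two listings; it says
nothing about how the driver actually tunes each drive.

```python
import re


def udma_modes(line):
    """Parse an hdparm -i 'UDMA modes:' line.

    Returns (supported, active): the list of supported mode numbers and
    the mode marked with '*', or None if no mode is marked.
    """
    supported, active = [], None
    # The pattern is case sensitive, so the "UDMA modes:" label is skipped.
    for star, num in re.findall(r"(\*?)udma(\d)", line):
        supported.append(int(num))
        if star:
            active = int(num)
    return supported, active


hda = "UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6"
hdb = "UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5"

hda_supported, hda_active = udma_modes(hda)
hdb_supported, hdb_active = udma_modes(hdb)

# Highest mode that both drives list as supported.
highest_common = max(set(hda_supported) & set(hdb_supported))

print(hda_active, hdb_active, highest_common)  # 6 5 5
```

So hda is running at udma6 and hdb at udma5, and udma5 is the fastest
mode both drives advertise.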

2) VMWare. I have noticed recently that the errors are occurring during or
after VMWare is run.

From dmesg:

/dev/vmnet: open called by PID 9443 (vmware-vmx)
eth0: Promiscuous mode enabled.
device eth0 entered promiscuous mode
bridge-eth0: enabled promiscuous mode
/dev/vmnet: port on hub 0 successfully opened
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown

I have tried compiling with all three preemption models, including with
the ck, beyond, and other variant patchsets. Currently, I am testing with
NO preemption and the errors are fewer but still there.
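
To compare error rates between kernel builds with something firmer than
an impression, the errors can be counted from saved dmesg output. A
minimal sketch follows; the function name count_dma_errors and the
sample text are illustrative, and the pattern only matches the
"dma_intr: error=" lines shown in this thread.

```python
import re
from collections import Counter

# Matches lines such as: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ERROR_LINE = re.compile(r"^(hd[a-z]): dma_intr: error=0x([0-9a-fA-F]{2})")


def count_dma_errors(log_text):
    """Count dma_intr error lines per (device, error code) in dmesg text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            device, code = match.group(1), int(match.group(2), 16)
            counts[(device, code)] += 1
    return counts


sample = """\
/dev/vmnet: port on hub 0 successfully opened
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
"""

for (device, code), n in count_dma_errors(sample).items():
    print(f"{device} error=0x{code:02x}: {n}")
```

Running it over the dmesg output saved from each build gives one count
per drive and error code to compare.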

The motherboard BIOS has DMA enabled, including UDMA, and prefetch is on.
Again, it's interesting to note I did not observe this error with only one
drive on ide0. I cannot pin down any more details at the moment, but if
anyone wants to explore this further (I know this has been an issue for
years), I am happy to try and help.

--
Peter
+++++
Do not reply to this email, it is a spam trap and not monitored.
I can be reached via this list, or via
jabber: pete4abw at jabber.org
ICQ: 73676357


2006-08-26 17:42:15

by Peter

Subject: Re: wrt: dma_intr: status=0x51 { DriveReady SeekComplete Error }

On Sat, 26 Aug 2006 16:12:52 +0000, Peter wrote:

> Using 2.6.17.11 with reiser4 patch only. Voluntary preempt.
>
> Recently, I have been receiving this sequence of errors:
>
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> ide: failed opcode was: unknown

snip....

I should mention that I am using an 80-conductor IDE cable, and I did try
replacing it. Same result. This problem is new, so I do not believe the
cable is the issue, especially since the manufacturer's diagnostics show
all is well.

--
Peter

2006-08-26 22:24:27

by Alan

Subject: Re: wrt: dma_intr: status=0x51 { DriveReady SeekComplete Error }

On Sat, 2006-08-26 at 16:12 +0000, Peter wrote:
> 2) VMWare. I have noticed recently that the errors are occurring during or
> after VMWare is run.

Then please report them to the vmware people unless you can reproduce it
from a clean boot which never touched vmware or loaded any vmware
modules.

The only real way this error can occur other than hardware is if
something crapped on the drive configuration. If it is triggered by
vmware you know who to talk to 8)
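
One way to confirm that a boot really is clean in this sense is to check
the loaded-module list before testing. The sketch below scans the text
of /proc/modules for module names beginning with vmmon or vmnet. Those
prefixes are an assumption about how the VMware modules are named on
this system, and the function name is hypothetical.

```python
def vmware_modules_loaded(proc_modules_text, prefixes=("vmmon", "vmnet")):
    """Return names of loaded modules whose name starts with a given prefix.

    proc_modules_text is the content of /proc/modules, where the first
    whitespace-separated field of each line is the module name.
    The default prefixes are an assumption, not an exhaustive list.
    """
    loaded = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(prefixes):
            loaded.append(fields[0])
    return loaded


# Made-up sample in /proc/modules format, for illustration only.
sample = """\
vmnet 35392 12 - Live 0xf8a5e000
vmmon 110000 0 - Live 0xf8a80000
reiser4 400000 1 - Live 0xf8b00000
"""

print(vmware_modules_loaded(sample))  # ['vmnet', 'vmmon']

# On a live system (Linux only):
#     with open("/proc/modules") as f:
#         print(vmware_modules_loaded(f.read()))
```

An empty list after a fresh boot, followed by the same dma_intr errors,
would be the reproduction asked for here.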

2006-08-27 07:11:33

by Rogier Wolff

Subject: Re: wrt: dma_intr: status=0x51 { DriveReady SeekComplete Error }

On Sat, Aug 26, 2006 at 04:12:52PM +0000, Peter wrote:
> Diagnostics on the drives are fine. Removing the b drive removes the
> messages. System functions fine anyway, and no data is lost as a result
> of the errors. The persistence of it is frustrating!

What diagnostics did you try?

(I've got experience with a guy saying: "I have 5 which test perfect
with my diagnostics, but my embedded machine refuses them. What's
wrong?" All of them report through SMART that they HAVE reported media
errors, and they all have bad blocks.)

Do you have "smartd" running? I vaguely remember that it sometimes
triggered error messages from the normal driver.

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ

2006-08-27 10:49:30

by Peter

Subject: Re: wrt: dma_intr: status=0x51 { DriveReady SeekComplete Error }

On Sun, 27 Aug 2006 09:11:31 +0200, Rogier Wolff wrote:

> On Sat, Aug 26, 2006 at 04:12:52PM +0000, Peter wrote:
>> Diagnostics on the drives are fine. Removing the b drive removes the
>> messages. System functions fine anyway, and no data is lost as a result
>> of the errors. The persistence of it is frustrating!
>
> What diagnostics did you try?
>
First badblocks, then Manufacturer's (Maxtor) overnight burn in tests.

> (I've got experience with a guy saying: "I have 5 which test perfect
> with my diagnostics, but my embedded machine refuses them. What's
> wrong?" All of them report through SMART that they HAVE reported media
> errors, and they all have bad blocks.)

I tried to be more careful before posting here :). I even ran a second
test on the drives on a second machine.
>
> Do you have "smartd" running? I vaguely remember that it sometimes
> triggered error messages from the normal driver.
>
No, but I think there is a driver issue, such as VMware or reiserfs.
Reiser even mentions this error in his FAQ, suggesting a bad IDE cable
as the cause. Of course, I changed mine (now testing #4). Switching to
no preemption, however, has greatly reduced the volume of errors; last
night there were none at all.

thx.

--
Peter