2005-01-23 02:58:49

by Sven Köhler

[permalink] [raw]
Subject: 2.6 more picky about IDE drives than 2.4 ?

Hi,

i have many problems with kernel 2.6.10 since it won't run stable with
an IDE-device. It's an internal IDE-RAID subsystem. The DMA is
frequently disabled, and even writes/reads fail and the kernel reports
I/O-Errors for many sectors. The RAID-device doesn't report any errors
it it's own event-log. You can have a closer look at the error-messages
below.

I'm mailing to the LKML, since i haven't been abled to reproduce the
problem with a kernel 2.4 bases system, but it randomly happens with 2.6
kernels. Let's take the latest Knoppix as an example (it comes with both
kernels):
- if i boot kernel 2.4, i can stress test the harddisk as much as i
want. the kernel does report any problem and it doesn't disable DMA well
- if i boot kernel 2.6, after a while, there are the error-message below
in the log. "hdparm -k1" doesn't help, the kernel will disable DMA mode.
There was a also a bigger problems for two times now, where the kernel
refused to write to the devide, due to the I/O-Errors below. I'm very
sad, that i haven't the log-lines prior to the I/O-Errors.

I testes the RAID-subsystem with two different PC-systems. Always the
same result: 2.4 works, 2.6 does not. It's hard for me to reproduce the
Errors through. I'm still writing an application to reliably reproduce
them :-( Does anybody know a good stress-test perhaps? Sequential
reading doesn't seem to do the trick.

What changes have been applied to the IDE subsystem from kernel 2.4 to
kernel 2.6? What may cause this different behaviour? What does
"status=0x51" mean? And why is "error=0x00" although the Error-Bit in
the status-byte has been set. (i guess this is what status=0x51 means).

How can the behaviour of kernel 2.6 be reverted to the behaviour of
kernel 2.4? I already tried "hda=nowerr" in the append-line, but it
doesn't help either. Is it a Bug of kernel 2.6, or should i smash the
manufactures doors, to make them release a firmware-update of the
RAID-subsystem since it reports strange values to the OS?


The first kind of errors:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x00 { }
ide: failed opcode was: unknown
hda: recal_intr: status=0x51 { DriveReady SeekComplete Error }
hda: recal_intr: error=0x00 { }
ide: failed opcode was: unknown
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x00 { }
ide: failed opcode was: unknown
hda: DMA disabled
ide0: reset: success

"dmesg" after with bigger problems:

end_request: I/O error, dev hdc, sector 709679458
ReiserFS: hdc3: warning: vs-13070: reiserfs_read_locked_inode: i/o
failure occurred trying to find stat data of [1018017 1018816 0x0 SD]
ReiserFS: hdc3: warning: clm-6006: writing inode 283 on readonly FS
end_request: I/O error, dev hdc, sector 705275426
ReiserFS: hdc3: warning: vs-13070: reiserfs_read_locked_inode: i/o
failure occurred trying to find stat data of [1018017 1018708 0x0 SD]
ReiserFS: hdc3: warning: clm-6006: writing inode 283 on readonly FS
end_request: I/O error, dev hdc, sector 709687130
ReiserFS: hdc3: warning: vs-13070: reiserfs_read_locked_inode: i/o
failure occurred trying to find stat data of [1018017 1018817 0x0 SD]
ReiserFS: hdc3: warning: clm-6006: writing inode 283 on readonly FS
end_request: I/O error, dev hdc, sector 709695114
ReiserFS: hdc3: warning: vs-13070: reiserfs_read_locked_inode: i/o
failure occurred trying to find stat data of [1018017 1018818 0x0 SD]
ReiserFS: hdc3: warning: clm-6006: writing inode 283 on readonly FS
end_request: I/O error, dev hdc, sector 709107250
...


Subject: Re: 2.6 more picky about IDE drives than 2.4 ?

On Sun, 23 Jan 2005 04:00:52 +0100, Sven K?hler <[email protected]> wrote:
> Hi,

Hi,

> i have many problems with kernel 2.6.10 since it won't run stable with
> an IDE-device. It's an internal IDE-RAID subsystem. The DMA is
> frequently disabled, and even writes/reads fail and the kernel reports
> I/O-Errors for many sectors. The RAID-device doesn't report any errors
> it it's own event-log. You can have a closer look at the error-messages
> below.
>
> I'm mailing to the LKML, since i haven't been abled to reproduce the
> problem with a kernel 2.4 bases system, but it randomly happens with 2.6
> kernels. Let's take the latest Knoppix as an example (it comes with both
> kernels):
> - if i boot kernel 2.4, i can stress test the harddisk as much as i
> want. the kernel does report any problem and it doesn't disable DMA well
> - if i boot kernel 2.6, after a while, there are the error-message below
> in the log. "hdparm -k1" doesn't help, the kernel will disable DMA mode.
> There was a also a bigger problems for two times now, where the kernel
> refused to write to the devide, due to the I/O-Errors below. I'm very
> sad, that i haven't the log-lines prior to the I/O-Errors.

You didn't give any information about your hardware (controller type,
drives used etc). Please read REPORTING-BUGS in the kernel source
directory. Also please find last working kernel version (2.5 or 2.6).

> I testes the RAID-subsystem with two different PC-systems. Always the
> same result: 2.4 works, 2.6 does not. It's hard for me to reproduce the
> Errors through. I'm still writing an application to reliably reproduce
> them :-( Does anybody know a good stress-test perhaps? Sequential
> reading doesn't seem to do the trick.
>
> What changes have been applied to the IDE subsystem from kernel 2.4 to
> kernel 2.6? What may cause this different behaviour? What does
> "status=0x51" mean? And why is "error=0x00" although the Error-Bit in
> the status-byte has been set. (i guess this is what status=0x51 means).
>
> How can the behaviour of kernel 2.6 be reverted to the behaviour of
> kernel 2.4? I already tried "hda=nowerr" in the append-line, but it
> doesn't help either. Is it a Bug of kernel 2.6, or should i smash the
> manufactures doors, to make them release a firmware-update of the
> RAID-subsystem since it reports strange values to the OS?

Dunno, I don't have a magic ball... ;)

Bartlomiej

2005-01-23 17:31:30

by Sven Köhler

[permalink] [raw]
Subject: Re: 2.6 more picky about IDE drives than 2.4 ?

>>i have many problems with kernel 2.6.10 since it won't run stable with
>>an IDE-device. It's an internal IDE-RAID subsystem. The DMA is
>>frequently disabled, and even writes/reads fail and the kernel reports
>>I/O-Errors for many sectors. The RAID-device doesn't report any errors
>>it it's own event-log. You can have a closer look at the error-messages
>>below.
>>
>>I'm mailing to the LKML, since i haven't been abled to reproduce the
>>problem with a kernel 2.4 bases system, but it randomly happens with 2.6
>>kernels. Let's take the latest Knoppix as an example (it comes with both
>>kernels):
>>- if i boot kernel 2.4, i can stress test the harddisk as much as i
>>want. the kernel does report any problem and it doesn't disable DMA well
>>- if i boot kernel 2.6, after a while, there are the error-message below
>>in the log. "hdparm -k1" doesn't help, the kernel will disable DMA mode.
>>There was a also a bigger problems for two times now, where the kernel
>>refused to write to the devide, due to the I/O-Errors below. I'm very
>>sad, that i haven't the log-lines prior to the I/O-Errors.
>
> You didn't give any information about your hardware (controller type,
> drives used etc). Please read REPORTING-BUGS in the kernel source
> directory. Also please find last working kernel version (2.5 or 2.6).

The hardware used was a Intel 440GX Chipset (PIIX southbridge, max
UDMA33), and a Sis 755 (SiS964 soutbridge, max UDMA133). The drive is an
EasyRAID R5A.

http://www.easyraid.com/index.php?Products:easyRAID_R5A

>>I testes the RAID-subsystem with two different PC-systems. Always the
>>same result: 2.4 works, 2.6 does not. It's hard for me to reproduce the
>>Errors through. I'm still writing an application to reliably reproduce
>>them :-( Does anybody know a good stress-test perhaps? Sequential
>>reading doesn't seem to do the trick.
>>
>>What changes have been applied to the IDE subsystem from kernel 2.4 to
>>kernel 2.6? What may cause this different behaviour? What does
>>"status=0x51" mean? And why is "error=0x00" although the Error-Bit in
>>the status-byte has been set. (i guess this is what status=0x51 means).
>>
>>How can the behaviour of kernel 2.6 be reverted to the behaviour of
>>kernel 2.4? I already tried "hda=nowerr" in the append-line, but it
>>doesn't help either. Is it a Bug of kernel 2.6, or should i smash the
>>manufactures doors, to make them release a firmware-update of the
>>RAID-subsystem since it reports strange values to the OS?
>
> Dunno, I don't have a magic ball... ;)

So i guess you will not know The R5A, and the Problems of Kernel 2.6
seems to be controller independant (Intel and SiS testen).

2005-01-25 17:20:01

by Sven Köhler

[permalink] [raw]
Subject: Re: 2.6 more picky about IDE drives than 2.4 ?

The manufacturer of the easyRAID R5A sent be a firmware-update, that
fixes the problem.

So forget about this.