2008-07-20 19:07:54

by Simen Thoresen

Subject: Misidentification and failing revalidations of ide dvd-roms with libata

Hi all,

I have a curious issue on an NForce4, x86_64 system. After reinstalling
it with a new distro (Ubuntu 8.04-based MythBuntu 8 - 2.6.24-19-generic,
i686), I am experiencing failing revalidations of my IDE DVD-ROM drives:

[ 152.874745] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[ 152.874753] ata4.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 8 in
[ 152.874754] cdb 4a 01 00 00 10 00 00 00 08 00 00 00 00 00 00 00
[ 152.874755] res 68/00:01:00:08:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
[ 152.874758] ata4.00: status: { DRDY DF DRQ }
[ 152.874775] ata4: soft resetting link
[ 153.124616] ata4.00: model number mismatch 'Pioneer DVD-ROM ATAPIModel DVD-116 0109' != 'Pio?eer?DVD?ROM?ATA?IMo?el ?VD-?16 ?010?'
[ 153.124621] ata4.00: revalidation failed (errno=-19)
[ 153.124624] ata4: failed to recover some devices, retrying in 5 secs
[ 155.206856] ata4: soft resetting link
[ 155.457098] ata4.00: model number mismatch 'Pioneer DVD-ROM ATAPIModel DVD-116 0109' != 'Pio?eer?DVD?ROM?ATA?IMo?el ?VD-?16 ?010?'
[ 155.457103] ata4.00: revalidation failed (errno=-19)
[ 155.457106] ata4.00: disabled
[ 155.666359] ata4: soft resetting link
[ 155.916202] ata4.00: ATAPI: Pio?eer?DVD?ROM?ATA?IMo?el ?VD-?16 ?010?, E1.?9 ?, max UDMA7
[ 155.919834] ata4.00: failed to set xfermode (err_mask=0x1)
[ 155.919837] ata4: failed to recover some devices, retrying in 5 secs
[ 158.002170] ata4: soft resetting link
[ 158.255648] ata4.00: failed to set xfermode (err_mask=0x1)
[ 158.255653] ata4.00: limiting speed to UDMA/100:PIO3
[ 158.255655] ata4: failed to recover some devices, retrying in 5 secs
[ 160.338395] ata4: soft resetting link
[ 160.591046] ata4.00: failed to set xfermode (err_mask=0x1)
[ 160.591050] ata4.00: disabled
[ 160.800818] sr 3:0:0:0: rejecting I/O to offline device
[ 160.800827] ata4: EH complete
[ 160.801023] ata4.00: detaching (SCSI 3:0:0:0)
[ 160.801192] scsi 3:0:0:0: rejecting I/O to dead device
[ 160.801205] scsi 3:0:0:0: rejecting I/O to dead device
[ 160.801208] scsi 3:0:0:0: rejecting I/O to dead device

As I understand, this would most commonly indicate that the drive has
gone bad, but this also occurs on the /other/ drive (same make/model) in
the system. Once it has occurred on one of them, it has so far not
occurred on the other (until I reboot the system). This has only happened
while the drives have been in use (ripping audio CDs), and it seems to
happen fairly quickly when a drive is in use (during the first few CDs).
If only one drive is in use, it will still fail.
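
To check whether failures really cluster on a single port per boot, the log can be tallied mechanically. What follows is a minimal Python sketch and not part of the original report; the function name and the regular expression are illustrative assumptions, derived only from the log lines quoted above.

```python
import re
from collections import Counter

# Matches lines such as:
#   [ 153.124621] ata4.00: revalidation failed (errno=-19)
# The pattern is an assumption based on the excerpt in this report.
REVALIDATION_RE = re.compile(r"\b(ata\d+)\.\d+: revalidation failed")


def count_revalidation_failures(dmesg_text):
    """Return a Counter mapping ATA port name (e.g. 'ata4') to failure count."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = REVALIDATION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the log excerpt above.
sample = """\
[ 153.124621] ata4.00: revalidation failed (errno=-19)
[ 153.124624] ata4: failed to recover some devices, retrying in 5 secs
[ 155.457103] ata4.00: revalidation failed (errno=-19)
[ 155.457106] ata4.00: disabled
"""

failures = count_revalidation_failures(sample)
for port, count in sorted(failures.items()):
    print(port, count)
```

Running this over a saved dmesg from each boot would show which port (ata3 or ata4) was affected in that boot.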

This behaviour was never seen on the previous distro (CentOS4,
2.6.9-something). On that system, the drives were accessed as /dev/hda
and /dev/hdc.

Also, I notice that the kernel inconsistently identifies the capabilities
of the drives:
[ 24.048722] Driver 'sr' needs updating - please use bus_type methods
[ 24.053214] sr0: scsi3-mmc drive: 40x/40x cd/rw xa/form2 cdda tray
[ 24.053220] Uniform CD-ROM driver Revision: 3.20
[ 24.053268] sr 2:0:0:0: Attached scsi CD-ROM sr0
[ 24.081327] sr1: scsi3-mmc drive: 12x/40x cd/rw xa/form2 cdda tray
[ 24.081379] sr 3:0:0:0: Attached scsi CD-ROM sr1

Here, the speed ratings are inconsistent. Both drives are the same
make/model, yet the speed rating is seen as either 40x/40x, 12x/40x or
125x/40x. These vary, seemingly at random, between boots. I believe I've
never seen one of them /not/ be 40x/40x. It is possible that the one
that is not 40x/40x is the one that fails, but I'm in no way certain
about this.
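
To test the suspicion that the drive with the odd rating is the one that later fails, the reported figures could be extracted per boot and compared. This is a rough Python sketch under stated assumptions: the regular expression is modelled only on the two "scsi3-mmc drive" lines above, and the helper name is hypothetical. It deliberately calls the two numbers "first" and "second" rather than assigning them a meaning.

```python
import re

# Matches lines such as:
#   [ 24.081327] sr1: scsi3-mmc drive: 12x/40x cd/rw xa/form2 cdda tray
# The pattern is an assumption based on the excerpt in this report.
SPEED_RE = re.compile(r"\b(sr\d+): scsi3-mmc drive: (\d+)x/(\d+)x")


def reported_speeds(dmesg_text):
    """Map each sr device to the (first, second) speed figures it reported."""
    speeds = {}
    for line in dmesg_text.splitlines():
        match = SPEED_RE.search(line)
        if match:
            speeds[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return speeds


# Sample lines copied from the log excerpt above.
sample = """\
[ 24.053214] sr0: scsi3-mmc drive: 40x/40x cd/rw xa/form2 cdda tray
[ 24.053268] sr 2:0:0:0: Attached scsi CD-ROM sr0
[ 24.081327] sr1: scsi3-mmc drive: 12x/40x cd/rw xa/form2 cdda tray
[ 24.081379] sr 3:0:0:0: Attached scsi CD-ROM sr1
"""

speeds = reported_speeds(sample)
# Devices whose two figures disagree are the candidates to watch.
suspects = sorted(name for name, (first, second) in speeds.items() if first != second)
print(speeds)
print(suspects)
```

Recording `suspects` at each boot alongside the port that later fails would confirm or rule out the correlation.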

I'm not sure what to make of this, as this is my first system that uses
libata for ATAPI-devices like these.

The drives themselves are identified correctly.

[ 18.452227] ata3.00: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-116 0109, E1.09, max UDMA/66
[ 18.623964] ata3.00: configured for UDMA/66
[ 19.221877] ata4.00: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-116 0109, E1.09, max UDMA/66
[ 19.393614] ata4.00: configured for UDMA/66
[ 19.395336] scsi 2:0:0:0: CD-ROM PIONEER DVD-ROM DVD-116 1.09 PQ: 0 ANSI: 5
[ 19.395413] scsi 2:0:0:0: Attached scsi generic sg2 type 5
[ 19.396657] scsi 3:0:0:0: CD-ROM PIONEER DVD-ROM DVD-116 1.09 PQ: 0 ANSI: 5
[ 19.396702] scsi 3:0:0:0: Attached scsi generic sg3 type 5


Yours,
-S
--
Simen Thoresen, Dolphin ICS
Systems Administration and Wulfkit Support


2008-07-20 19:19:15

by Alan

Subject: Re: Misidentification and failing revalidations of ide dvd-roms with libata

> [ 155.457098] ata4.00: model number mismatch 'Pioneer DVD-ROM
> ATAPIModel DVD-116 0109' != 'Pioïeer¡DVD­ROM¡ATAñIMoåel åVD-±16 ¡010¹'
> [ 155.457103] ata4.00: revalidation failed (errno=-19)

So it failed because the data read from the drive was corrupted.
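
The corruption can be made visible by listing the character positions at which the two strings differ. The snippet below is an illustrative Python sketch, not something posted in the thread. It compares only the leading 15 characters of each string, taken from the '?' rendering in the original report, because the two full strings differ in length as archived. The function name is hypothetical.

```python
def differing_positions(expected, observed):
    """Return the indices at which two strings differ, over their common length."""
    return [i for i, (a, b) in enumerate(zip(expected, observed)) if a != b]


# Leading 15 characters of the expected and the misread model strings,
# as they appear in the original report.
expected = "Pioneer DVD-ROM"
observed = "Pio?eer?DVD?ROM"

positions = differing_positions(expected, observed)
print(positions)  # [3, 7, 11]
```

In this prefix the damaged characters fall at regular intervals rather than at random positions.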

> As I understand, this would most commonly indicate that the drive has
> gone bad, but this also occurs on the /other/ drive (same make/model) in

Or a cable problem.

2008-07-20 19:34:48

by Simen Thoresen

Subject: Re: Misidentification and failing revalidations of ide dvd-roms with libata

Alan Cox wrote:
>> [ 155.457098] ata4.00: model number mismatch 'Pioneer DVD-ROM
>> ATAPIModel DVD-116 0109' != 'Pioïeer¡DVD­ROM¡ATAñIMoåel åVD-±16 ¡010¹'
>> [ 155.457103] ata4.00: revalidation failed (errno=-19)
>
> So it failed because the data read from the drive was corrupted.
>
>> As I understand, this would most commonly indicate that the drive has
>> gone bad, but this also occurs on the /other/ drive (same make/model) in
>
> Or a cable problem.

Hi Alan,

Should this apply here too? There are two separate cables, and I've
never seen the other drive fail after the first drive has failed.
I've now ripped ~50 CDs with the still-working drive, while this drive
failed during the first 5 CDs on my two previous boots.

I don't doubt that the revalidation goes bad, but I can't see hardware
being the cause of it, as it hits either drive, but so far only one drive
per boot.

-S


--
Simen Thoresen, Dolphin ICS
Systems Administration and Wulfkit Support

2008-07-20 20:13:50

by Alistair John Strachan

Subject: Re: Misidentification and failing revalidations of ide dvd-roms with libata

On Sunday 20 July 2008 20:34:48 Simen Timian Thoresen wrote:
> Alan Cox wrote:
> >> [ 155.457098] ata4.00: model number mismatch 'Pioneer DVD-ROM
> >> ATAPIModel DVD-116 0109' != 'Pioïeer¡DVD­ROM¡ATAñIMoåel åVD-±16 ¡010¹'
> >> [ 155.457103] ata4.00: revalidation failed (errno=-19)
> >
> > So it failed because the data read from the drive was corrupted.
> >
> >> As I understand, this would most commonly indicate that the drive has
> >> gone bad, but this also occurs on the /other/ drive (same make/model) in
> >
> > Or a cable problem.
>
> Hi Alan,
>
> Should this apply here too? There are two separate cables, and I've
> never seen the other drive to fail after the first drive has failed.
> I've now ripped ~50 CDs with the still working drive, while this drive
> failed during the first 5 CDs on my two previous boots.
>
> I don't doubt that the revalidation goes bad, but I can't see hardware
> being the cause of it as it hits any drive, but so-far only one drive per
> boot.

Power supply maybe? Check the rails. The +12VDC must be good.

--
Cheers,
Alistair.

2008-07-20 20:48:59

by Simen Thoresen

Subject: Re: Misidentification and failing revalidations of ide dvd-roms with libata

Alistair John Strachan wrote:
> On Sunday 20 July 2008 20:34:48 Simen Timian Thoresen wrote:
>> Alan Cox wrote:
>>>> [ 155.457098] ata4.00: model number mismatch 'Pioneer DVD-ROM
>>>> ATAPIModel DVD-116 0109' != 'Pioïeer¡DVD­ROM¡ATAñIMoåel åVD-±16 ¡010¹'
>>>> [ 155.457103] ata4.00: revalidation failed (errno=-19)
>>> So it failed because the data read from the drive was corrupted.
>>>
>
> Power supply maybe? Check the rails. The +12VDC must be good.
>
Hi Alistair,

Good suggestion - I've rebooted and am running the box with one of the
DVDroms connected to an external PSU. I'd now expect the one on the
internal rail to be the failing one, if either fails.

I'm done ripping CDs for now, but I'm checking to see if I can trigger
the same failure when ripping DVDs.

Yours,
-S

--
Simen Thoresen, Dolphin ICS
Systems Administration and Wulfkit Support

2008-07-27 19:33:18

by Simen Thoresen

Subject: Re: Misidentification and failing revalidations of ide dvd-roms with libata

Alistair John Strachan wrote:
> On Sunday 20 July 2008 20:34:48 Simen Timian Thoresen wrote:
>> Alan Cox wrote:
>>>> [ 155.457098] ata4.00: model number mismatch 'Pioneer DVD-ROM
>>>> ATAPIModel DVD-116 0109' != 'Pio?eer?DVD?ROM?ATA?IMo?el ?VD-?16 ?010?'
>>>> [ 155.457103] ata4.00: revalidation failed (errno=-19)
>>> So it failed because the data read from the drive was corrupted.
>>>
>>>> As I understand, this would most commonly indicate that the drive has
>>>> gone bad, but this also occurs on the /other/ drive (same make/model) in
>>> Or a cable problem.
>> Hi Alan,
>>
>> Should this apply here too? There are two separate cables, and I've
>> never seen the other drive to fail after the first drive has failed.
>> I've now ripped ~50 CDs with the still working drive, while this drive
>> failed during the first 5 CDs on my two previous boots.
>>
>> I don't doubt that the revalidation goes bad, but I can't see hardware
>> being the cause of it as it hits any drive, but so-far only one drive per
>> boot.
>
> Power supply maybe? Check the rails. The +12VDC must be good.

Hi Alistair, Alan,

I've spent a few hours more looking into this, and I'm not really
getting much clearer;

I've run either and both of the two DVDroms on an external power-supply
(drive-power from a jumped, believed good ATX-PSU), and the symptoms
remain; inconsistent capabilities reported at boot, and when I start
ripping CDs, one of them will fail as above.

As I still have the impression that this started when I switched distros
(ie went from CentOS4 2.6.9 pre-libata-kernel to the current Ubuntu 8.04
2.6.24-kernel), I've now started playing around with the libata module
parameters.

Further suggestions are most welcome .-)

Yours,
-S

--
Simen Thoresen, Dolphin ICS
Systems Administration and Wulfkit Support

2008-07-27 20:48:23

by Alistair John Strachan

Subject: Re: Misidentification and failing revalidations of ide dvd-roms with libata

On Sunday 27 July 2008 20:33:21 Simen Timian Thoresen wrote:
[snip]
> I've spent a few hours more looking into this, and I'm not really
> getting much clearer;
>
> I've run either and both of the two DVDroms on an external power-supply
> (drive-power from a jumped, believed good ATX-PSU), and the symptoms
> remain; inconsistent capabilities reported at boot, and when I start
> ripping CDs, one of them will fail as above.
>
> As I still have the impression that this started when I switched distros
> (ie went from CentOS4 2.6.9 pre-libata-kernel to the current Ubuntu 8.04
> 2.6.24-kernel), I've now started playing around with the libata module
> parameters.

If it's not power then I'm still with Alan re the cables. Maybe you have very
long cables?

I used Alan's nForce4 pata driver for a year or so and didn't have any
problems. OTOH maybe at this stage it's worth checking whether amd74xx (old
IDE) breaks in similar ways?

Also, I assume you've memtest86'ed the machine?

--
Cheers,
Alistair.

2008-07-30 17:49:11

by Simen Thoresen

Subject: Re: Misidentification and failing revalidations of ide dvd-roms with libata

Alistair John Strachan wrote:
> On Sunday 27 July 2008 20:33:21 Simen Timian Thoresen wrote:
> [snip]
>> I've spent a few hours more looking into this, and I'm not really
>> getting much clearer;
>>
>> I've run either and both of the two DVDroms on an external power-supply
>> (drive-power from a jumped, believed good ATX-PSU), and the symptoms
>> remain; inconsistent capabilities reported at boot, and when I start
>> ripping CDs, one of them will fail as above.
>>
>> As I still have the impression that this started when I switched distros
>> (ie went from CentOS4 2.6.9 pre-libata-kernel to the current Ubuntu 8.04
>> 2.6.24-kernel), I've now started playing around with the libata module
>> parameters.

Hi Alistair,

I'm sorry I have not yet come back on this. I /think/ I've had success
disabling DMA for ATAPI devices (adding "options libata dma=5" to my
modprobe.conf file). I've been able to rip a number of CDs without any
issues with this (except a reduced speed, I believe). I do get some
noise in dmesg:

[ 9686.570804] end_request: I/O error, dev sr1, sector 0
[ 9686.570814] Buffer I/O error on device sr1, logical block 0
[ 9686.570819] Buffer I/O error on device sr1, logical block 1
[ 9686.572134] end_request: I/O error, dev sr1, sector 1024
[ 9686.573105] end_request: I/O error, dev sr1, sector 1024
[ 9686.574101] end_request: I/O error, dev sr1, sector 1024


...but cdparanoia (ripper application) never complains and I've not
been able to notice any defects in the extracted audio.
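
For readers wondering why the value 5 turns off DMA for ATAPI only: the libata "dma" parameter is a bitmask. The Python sketch below decodes it, assuming the bit meanings 0x1 = ATA, 0x2 = ATAPI and 0x4 = CompactFlash. That assignment is an assumption worth checking against `modinfo libata` on the running kernel, and the helper name is purely illustrative.

```python
# Assumed bit meanings for the libata "dma" module parameter:
#   0x1 = ATA disks, 0x2 = ATAPI devices, 0x4 = CompactFlash.
DMA_MASK_BITS = {0x1: "ATA", 0x2: "ATAPI", 0x4: "CF"}


def decode_dma_mask(mask):
    """Return which device classes have DMA enabled for a given mask value."""
    return {name: bool(mask & bit) for bit, name in DMA_MASK_BITS.items()}


decoded = decode_dma_mask(5)
print(decoded)  # {'ATA': True, 'ATAPI': False, 'CF': True}
```

Under that assumption, dma=5 (binary 101) leaves DMA on for disks and CompactFlash while forcing ATAPI devices such as these DVD-ROM drives to PIO.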

As this was disabling DMA, and I've seen that frequency adjustments on
the CPU can cause DMA issues (at the time, with the ivtv-driver) on this
motherboard, I thought that I'd try disabling Cool'n'Quiet and thereby
lock the CPU to full speed, while reenabling DMA, but that did not help.
Ripping the first CDs, one of the drives failed again.

> If it's not power then I'm still with Alan re the cables. Maybe you have very
> long cables?

Nope, normal 40cm-ish 80-conductor IDE cables. While one could be bad, I
wouldn't expect both to be (etc).

> I used Alan's nForce4 pata driver for a year or so and didn't have any
> problems. OTOH maybe at this stage it's worth checking whether amd74xx (old
> IDE) breaks in similar ways?

from dmesg, I have this;

[ 17.647757] pata_amd 0000:00:06.0: version 0.3.10
[ 17.650097] scsi0 : pata_amd
[ 17.650229] scsi1 : pata_amd
(that's the one, right?)

...so the driver exists in the running kernel. Would I have to build my
own kernel, or is there some way to have this driver take hold instead
of libata? ...or can I do something in modprobe.conf to use the other
driver?

> Also, I assume you've memtest86'ed the machine?

Not strictly recently. I'll leave it doing so this evening, as I'll want
to reboot it to reactivate Cool'n'quiet.

Again, thank you for helping me look into this. For now it /looks/ like
I could be happy doing PIO, but I'd prefer to have this solved properly
if possible.

-S

--
Simen Thoresen, Dolphin ICS
Systems Administration and Wulfkit Support

2008-07-30 20:39:57

by Alistair John Strachan

Subject: Re: Misidentification and failing revalidations of ide dvd-roms with libata

On Wednesday 30 July 2008 18:31:15 Simen Timian Thoresen wrote:
> Alistair John Strachan wrote:
> > I used Alan's nForce4 pata driver for a year or so and didn't have any
> > problems. OTOH maybe at this stage it's worth checking whether amd74xx
> > (old IDE) breaks in similar ways?
>
> from dmesg, I have this;
>
> [ 17.647757] pata_amd 0000:00:06.0: version 0.3.10
> [ 17.650097] scsi0 : pata_amd
> [ 17.650229] scsi1 : pata_amd
> (that's the one, right?)
>
> ...so the driver exists in the running kernel. Would I have to build my
> own kernel, or is there some way to have this driver take hold instead
> of libata? ...or can I do something in modprobe.conf to use the other
> driver?

I have no clue what your distributor has enabled or disabled. The easiest way
would be to ignore your distribution kernel and just build your own one.

You'll probably need to disable pata_amd (from "Serial ATA (prod) and Parallel
ATA (experimental) drivers") and enable amd74xx (from "ATA/ATAPI/MFM/RLL
support -> Enhanced IDE/MFM/RLL disk/cdrom/tape/floppy support -> AMD and
nVidia IDE support (NEW)"). Note that you'll also need ide-cd for this
subsystem.
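
Before rebuilding, it may help to check which of these drivers a given kernel configuration enables. The following Python sketch scans .config-style text; it is an illustration rather than a tested procedure. The symbol names CONFIG_PATA_AMD, CONFIG_BLK_DEV_AMD74XX and CONFIG_BLK_DEV_IDECD are my assumption about how the menu entries above map to Kconfig symbols, and the sample configuration is made up.

```python
def enabled_symbols(config_text):
    """Return the set of CONFIG_ symbols set to 'y' or 'm' in .config-style text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments (including "# CONFIG_FOO is not set") and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


# Hypothetical configuration for the old-IDE setup described above.
# The symbol names are assumptions, not taken from the thread.
sample_config = """\
# CONFIG_PATA_AMD is not set
CONFIG_BLK_DEV_AMD74XX=y
CONFIG_BLK_DEV_IDECD=y
"""

symbols = enabled_symbols(sample_config)
uses_old_ide = "CONFIG_BLK_DEV_AMD74XX" in symbols and "CONFIG_PATA_AMD" not in symbols
print(sorted(symbols))
print(uses_old_ide)
```

The same function could be pointed at the distribution's shipped config to see whether both drivers are built, in which case whichever loads first would claim the controller.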

As for DMA, I know there were issues with ATAPI on nForce 4 chipsets, but I
don't really know the specifics. Could you try sata_nv.adma=0 on the kernel
cmdline? This is probably a red herring though, as I'm not sure what relation
the sata_nv driver has to pata_amd.

--
Cheers,
Alistair.

2008-07-30 22:58:01

by Alan

Subject: Re: Misidentification and failing revalidations of ide dvd-roms with libata

> cmdline? This is probably a red herring though, as I'm not sure what relation
> the sata_nv driver has to pata_amd.

None whatsoever. pata_amd drives the PATA port, sata_nv is a driver for
the older Nvidia SATA ports.

2008-07-31 04:44:19

by Simen Thoresen

Subject: Re: Misidentification and failing revalidations of ide dvd-roms with libata

Simen Timian Thoresen wrote:
> Alistair John Strachan wrote:
>> On Sunday 27 July 2008 20:33:21 Simen Timian Thoresen wrote:
>> [snip]
>>> I've spent a few hours more looking into this, and I'm not really
>>> getting much clearer;
>>>
>>> I've run either and both of the two DVDroms on an external power-supply
>>> (drive-power from a jumped, believed good ATX-PSU), and the symptoms
>>> remain; inconsistent capabilities reported at boot, and when I start
>>> ripping CDs, one of them will fail as above.
>>>
>>> As I still have the impression that this started when I switched distros
>>> (ie went from CentOS4 2.6.9 pre-libata-kernel to the current Ubuntu 8.04
>>> 2.6.24-kernel), I've now started playing around with the libata module
>>> parameters.
>
> Hi Alistair,
>
> I'm sorry I have not yet come back on this. I /think/ I've had success
> disabling DMA for atapi-devices (adding "options libata dma=5" to my
> modprobe.conf file). I've been able to rip a number of cds without any
> issues with this (except a reduced speed, I believe). I do get some
> noise in dmesg;
>
> [ 9686.570804] end_request: I/O error, dev sr1, sector 0
> [ 9686.570814] Buffer I/O error on device sr1, logical block 0
> [ 9686.570819] Buffer I/O error on device sr1, logical block 1
> [ 9686.572134] end_request: I/O error, dev sr1, sector 1024
> [ 9686.573105] end_request: I/O error, dev sr1, sector 1024
> [ 9686.574101] end_request: I/O error, dev sr1, sector 1024
>
>
> ...but cdparanoia (ripper application) never complains and I've not
> been able to notice any defects in the extracted audio.
>
> As this was disabling DMA, and I've seen that frequency adjustments on
> the CPU can cause DMA issues (at the time, with the ivtv-driver) on this
> motherboard, I thought that I'd try disabling Cool'n'Quiet and thereby
> lock the CPU to full speed, while reenabling DMA, but that did not help.
> Ripping the first CDs, one of the drives failed again.
>
>> If it's not power then I'm still with Alan re the cables. Maybe you
>> have very long cables?
>
> Nope, normal 40cm-ish 80-conductor IDE cables. While one could be bad, I
> wouldn't expect both to be (etc).
>
>> I used Alan's nForce4 pata driver for a year or so and didn't have any
>> problems. OTOH maybe at this stage it's worth checking whether amd74xx
>> (old IDE) breaks in similar ways?
>
> from dmesg, I have this;
>
> [ 17.647757] pata_amd 0000:00:06.0: version 0.3.10
> [ 17.650097] scsi0 : pata_amd
> [ 17.650229] scsi1 : pata_amd
> (that's the one, right?)
>
> ...so the driver exists in the running kernel. Would I have to build my
> own kernel, or is there some way to have this driver take hold instead
> of libata? ...or can I do something in modprobe.conf to use the other
> driver?
>
>> Also, I assume you've memtest86'ed the machine?
>
> Not strictly recently. I'll leave it doing so this evening, as I'll want
> to reboot it to reactivate Cool'n'quiet.

It passed 22 runs overnight.

I'll aim to build a new kernel using the amd74xx pata driver over the
weekend. For now, I can work with the system as it is using PIO for the
DVDrom drives.

Yours,
-S

> Again, thank you for helping me look into this. For now it /looks/ like
> I could be happy doing PIO, but I'd prefer to have this solved properly
> if possible.
>
> -S
>


--
Simen Thoresen, Dolphin ICS
Systems Administration and Wulfkit Support