2007-05-01 04:33:27

by Tejun Heo

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

[cc'ing linux-ide and Albert, Hi!]

William Thompson wrote:
> On Mon, Apr 30, 2007 at 12:22:21PM +0200, Tejun Heo wrote:
>> William Thompson wrote:
>>> I've been playing with libata on a few machines and I found that this machine
>>> (An old Dell Dimension L866r) gives me this when it loads and does not give me
>>> access to the cdrom. This is the only machine that I've tested that I know
>>> for a fact cannot do DMA on the cdrom. I searched and noticed a similar
>>> problem with 2.6.19-rc versions but I'm not sure if it's the same problem.
>>>
>>> dmesg output:
>>> libata version 2.00 loaded.
>>> ata_piix 0000:00:1f.1: version 2.00ac7
>>> PCI: Setting latency timer of device 0000:00:1f.1 to 64
>>> ata1: PATA max UDMA/66 cmd 0x1F0 ctl 0x3F6 bmdma 0xFFA0 irq 14
>>> ata2: PATA max UDMA/66 cmd 0x170 ctl 0x376 bmdma 0xFFA8 irq 15
>>> scsi0 : ata_piix
>>> ata1.00: ATA-4, max UDMA/33, 10018890 sectors: LBA
>>> ata1.00: ata1: dev 0 multi count 16
>>> ata1.00: configured for UDMA/33
>>> scsi1 : ata_piix
>>> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x1)
>> Hmm... IDENTIFY failed on the second port. How reproducible is the
>> problem? Every time? Does it work with the IDE driver? If so, please
>> post the result of 'hdparm -I /dev/hdX'.
>
> Reproducible? Everytime
>
> Yes, it works fine with the IDE driver, so long as DMA is disabled.
>
> hdparm -I:
> /dev/hdc:
>
> ATAPI CD-ROM, with removable media
> Model Number: Lite-On LTN483S 48x Max
> Serial Number:
> Firmware Revision: PD02
> Standards:
> Used: ATAPI for CD-ROMs, SFF-8020i, r2.5
> Supported: CD-ROM ATAPI-1
> Configuration:
> DRQ response: <=10ms with INTRQ
> Packet size: 12 bytes
> Capabilities:
> LBA, IORDY(can be disabled)
> DMA: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 *udma2
> Cycle time: min=120ns recommended=120ns
> PIO: pio0 pio1 pio2 pio3 pio4
> Cycle time: no flow control=120ns IORDY flow control=120ns

The err_mask is AC_ERR_DEV indicating that the device raised aborted the
IDENTIFY command. I wonder what's going on.

Can you change "#undef ATA_DEBUG" in include/linux/libata.h to "#define
ATA_DEBUG" and report the resulting dmesg? There will be a LOT of
messages so you probably want to increase printk buffer size and detach
other devices if possible. It would be best if your root device isn't
driven by libata so that you can just insert the module and store the
resulting dmesg.

Thanks.

--
tejun


2007-05-01 12:17:09

by William Thompson

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

On Tue, May 01, 2007 at 06:32:07AM +0200, Tejun Heo wrote:
> [cc'ing linux-ide and Albert, Hi!]

And be sure to keep me in CC, I'm not on any of these lists.

> William Thompson wrote:
> > On Mon, Apr 30, 2007 at 12:22:21PM +0200, Tejun Heo wrote:
> >> William Thompson wrote:
> >>> I've been playing with libata on a few machines and I found that this machine
> >>> (An old Dell Dimension L866r) gives me this when it loads and does not give me
> >>> access to the cdrom. This is the only machine that I've tested that I know
> >>> for a fact cannot do DMA on the cdrom. I searched and noticed a similar
> >>> problem with 2.6.19-rc versions but I'm not sure if it's the same problem.
> >>>
> >>> dmesg output:
> >>> libata version 2.00 loaded.
> >>> ata_piix 0000:00:1f.1: version 2.00ac7
> >>> PCI: Setting latency timer of device 0000:00:1f.1 to 64
> >>> ata1: PATA max UDMA/66 cmd 0x1F0 ctl 0x3F6 bmdma 0xFFA0 irq 14
> >>> ata2: PATA max UDMA/66 cmd 0x170 ctl 0x376 bmdma 0xFFA8 irq 15
> >>> scsi0 : ata_piix
> >>> ata1.00: ATA-4, max UDMA/33, 10018890 sectors: LBA
> >>> ata1.00: ata1: dev 0 multi count 16
> >>> ata1.00: configured for UDMA/33
> >>> scsi1 : ata_piix
> >>> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x1)
> >> Hmm... IDENTIFY failed on the second port. How reproducible is the
> >> problem? Every time? Does it work with the IDE driver? If so, please
> >> post the result of 'hdparm -I /dev/hdX'.
> >
> > Reproducible? Everytime
> >
> > Yes, it works fine with the IDE driver, so long as DMA is disabled.
> >
> > hdparm -I:
> > /dev/hdc:
> >
> > ATAPI CD-ROM, with removable media
> > Model Number: Lite-On LTN483S 48x Max
> > Serial Number:
> > Firmware Revision: PD02
> > Standards:
> > Used: ATAPI for CD-ROMs, SFF-8020i, r2.5
> > Supported: CD-ROM ATAPI-1
> > Configuration:
> > DRQ response: <=10ms with INTRQ
> > Packet size: 12 bytes
> > Capabilities:
> > LBA, IORDY(can be disabled)
> > DMA: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 *udma2
> > Cycle time: min=120ns recommended=120ns
> > PIO: pio0 pio1 pio2 pio3 pio4
> > Cycle time: no flow control=120ns IORDY flow control=120ns
>
> The err_mask is AC_ERR_DEV indicating that the device raised aborted the
> IDENTIFY command. I wonder what's going on.

FYI: This did work with ide-scsi. But I don't think that really matters.

> Can you change "#undef ATA_DEBUG" in include/linux/libata.h to "#define
> ATA_DEBUG" and report the resulting dmesg? There will be a LOT of
> messages so you probably want to increase printk buffer size and detach
> other devices if possible. It would be best if your root device isn't
> driven by libata so that you can just insert the module and store the
> resulting dmesg.

Yes, definately. The system is a generic system bootable from usb, cdrom,
or network. It's not installed on the local hard disk.

The following is a diff between before loading any libata modules and after
loading ata_piix.

--- noata 2007-05-01 08:04:00.207711900 +0000
+++ libata-atapiix 2007-05-01 08:04:33.398212573 +0000
@@ -180,3 +180,162 @@
logips2pp: Detected unknown logitech mouse model 90
input: ImExPS/2 Logitech Explorer Mouse as /class/input/input1
eth0: setting full-duplex.
+libata version 2.00 loaded.
+piix_init: pci_register_driver
+ata_piix 0000:00:1f.1: version 2.00ac7
+ata_pci_init_one: ENTER
+PCI: Setting latency timer of device 0000:00:1f.1 to 64
+ata_device_add: ENTER
+ata_port_add: ENTER
+ata_port_start: prd alloc, virt cd0f3000, dma d0f3000
+ata1: PATA max UDMA/66 cmd 0x1F0 ctl 0x3F6 bmdma 0xFFA0 irq 14
+__ata_port_freeze: ata1 port frozen
+ata_port_add: ENTER
+ata_port_start: prd alloc, virt cd0eb000, dma d0eb000
+ata2: PATA max UDMA/66 cmd 0x170 ctl 0x376 bmdma 0xFFA8 irq 15
+__ata_port_freeze: ata2 port frozen
+ata_device_add: probe begin
+scsi1 : ata_piix
+ata_port_schedule_eh: port EH scheduled
+ata_scsi_error: ENTER
+ata_port_flush_task: ENTER
+ata_port_flush_task: flush #1
+ata1: ata_port_flush_task: flush #2
+ata1: ata_port_flush_task: EXIT
+ata_eh_autopsy: ENTER
+ata_eh_recover: ENTER
+ata_eh_prep_resume: ENTER
+ata_eh_prep_resume: EXIT
+__ata_port_freeze: ata1 port frozen
+ata_std_softreset: ENTER
+ata_std_softreset: about to softreset, devmask=3
+ata_bus_softreset: ata1: bus reset via SRST
+ata_dev_classify: found ATA device by sig
+ata_dev_classify: found ATA device by sig
+ata_std_softreset: EXIT, classes[0]=1 [1]=5
+ata_std_postreset: ENTER
+ata_std_postreset: EXIT
+ata_eh_thaw_port: ata1 port thawed
+ata_eh_revalidate_and_attach: ENTER
+ata1.00: ata_dev_read_id: ENTER, host 1, dev 0
+ata_exec_command_pio: ata1: cmd 0xEC
+ata_hsm_move: ata1: protocol 2 task_state 1 (dev_stat 0x58)
+ata_pio_sector: data read
+ata_hsm_move: ata1: protocol 2 task_state 2 (dev_stat 0x50)
+ata_hsm_move: ata1: dev 0 command complete, drv_stat 0x50
+ata_port_flush_task: ENTER
+ata_port_flush_task: flush #1
+ata1: ata_port_flush_task: flush #2
+ata1: ata_port_flush_task: EXIT
+ata1.00: ATA-4, max UDMA/33, 10018890 sectors: LBA
+ata1.00: ata1: dev 0 multi count 16
+ata_eh_revalidate_and_attach: EXIT
+ata_eh_resume: ENTER
+ata_eh_resume: EXIT
+ata_dev_set_xfermode: set features - xfer mode
+ata_exec_command_pio: ata1: cmd 0xEF
+ata_hsm_move: ata1: protocol 1 task_state 2 (dev_stat 0x50)
+ata_hsm_move: ata1: dev 0 command complete, drv_stat 0x50
+ata_port_flush_task: ENTER
+ata_port_flush_task: flush #1
+ata1: ata_port_flush_task: flush #2
+ata1: ata_port_flush_task: EXIT
+ata_dev_set_xfermode: EXIT, err_mask=0
+ata1.00: ata_dev_read_id: ENTER, host 1, dev 0
+ata_exec_command_pio: ata1: cmd 0xEC
+ata_hsm_move: ata1: protocol 2 task_state 1 (dev_stat 0x58)
+ata_pio_sector: data read
+ata_hsm_move: ata1: protocol 2 task_state 2 (dev_stat 0x50)
+ata_hsm_move: ata1: dev 0 command complete, drv_stat 0x50
+ata_port_flush_task: ENTER
+ata_port_flush_task: flush #1
+ata1: ata_port_flush_task: flush #2
+ata1: ata_port_flush_task: EXIT
+ata_dev_set_mode: xfer_shift=12, xfer_mode=0x42
+ata1.00: configured for UDMA/33
+ata_eh_suspend: ENTER
+ata_eh_suspend: EXIT
+ata_eh_recover: EXIT, rc=0
+ata_scsi_error: EXIT
+scsi2 : ata_piix
+ata_port_schedule_eh: port EH scheduled
+ata_scsi_error: ENTER
+ata_port_flush_task: ENTER
+ata_port_flush_task: flush #1
+ata2: ata_port_flush_task: flush #2
+ata2: ata_port_flush_task: EXIT
+ata_eh_autopsy: ENTER
+ata_eh_recover: ENTER
+ata_eh_prep_resume: ENTER
+ata_eh_prep_resume: EXIT
+__ata_port_freeze: ata2 port frozen
+ata_std_softreset: ENTER
+ata_std_softreset: about to softreset, devmask=1
+ata_bus_softreset: ata2: bus reset via SRST
+ata_dev_classify: found ATA device by sig
+ata_dev_classify: unknown device
+ata_std_softreset: EXIT, classes[0]=1 [1]=5
+ata_std_postreset: ENTER
+ata_std_postreset: EXIT
+ata_eh_thaw_port: ata2 port thawed
+ata_eh_revalidate_and_attach: ENTER
+ata2.00: ata_dev_read_id: ENTER, host 2, dev 0
+ata_exec_command_pio: ata2: cmd 0xEC
+ata_hsm_move: ata2: protocol 2 task_state 1 (dev_stat 0x51)
+ata_hsm_move: ata2: protocol 2 task_state 3 (dev_stat 0x51)
+ata_port_flush_task: ENTER
+ata_port_flush_task: flush #1
+ata2: ata_port_flush_task: flush #2
+ata2: ata_port_flush_task: EXIT
+ata2.00: failed to IDENTIFY (I/O error, err_mask=0x1)
+ata_eh_revalidate_and_attach: EXIT
+ata_eh_prep_resume: ENTER
+ata_eh_prep_resume: EXIT
+__ata_port_freeze: ata2 port frozen
+ata_std_softreset: ENTER
+ata_std_softreset: about to softreset, devmask=1
+ata_bus_softreset: ata2: bus reset via SRST
+ata_dev_classify: unknown device
+ata_std_softreset: EXIT, classes[0]=5 [1]=5
+ata_std_postreset: ENTER
+ata_std_postreset: EXIT, no device
+ata_eh_thaw_port: ata2 port thawed
+ata_eh_revalidate_and_attach: ENTER
+ata_eh_revalidate_and_attach: EXIT
+ata_eh_resume: ENTER
+ata_eh_resume: EXIT
+ata_eh_suspend: ENTER
+ata_eh_suspend: EXIT
+ata_eh_recover: EXIT, rc=0
+ata_scsi_error: EXIT
+ata_device_add: host probe begin
+ata_scsi_dump_cdb: CDB (1:0,0,0) 12 00 00 00 24 00 00 00 00
+ata_scsi_dump_cdb: CDB (1:0,0,0) 12 00 00 00 60 00 00 00 00
+scsi 1:0:0:0: Direct-Access ATA QUANTUM FIREBALL A08. PQ: 0 ANSI: 5
+ata_scsi_dump_cdb: CDB (1:0,0,0) 00 00 00 00 00 00 00 00 00
+ata_scsi_dump_cdb: CDB (1:0,0,0) 25 00 00 00 00 00 00 00 00
+SCSI device sdb: 10018890 512-byte hdwr sectors (5130 MB)
+ata_scsi_dump_cdb: CDB (1:0,0,0) 5a 00 3f 00 00 00 00 00 08
+sdb: Write Protect is off
+sdb: Mode Sense: 00 3a 00 00
+ata_scsi_dump_cdb: CDB (1:0,0,0) 5a 00 08 00 00 00 00 00 08
+ata_scsi_dump_cdb: CDB (1:0,0,0) 5a 00 08 00 00 00 00 00 24
+SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
+ata_scsi_dump_cdb: CDB (1:0,0,0) 00 00 00 00 00 00 00 00 24
+ata_scsi_dump_cdb: CDB (1:0,0,0) 25 00 00 00 00 00 00 00 00
+SCSI device sdb: 10018890 512-byte hdwr sectors (5130 MB)
+ata_scsi_dump_cdb: CDB (1:0,0,0) 5a 00 3f 00 00 00 00 00 08
+sdb: Write Protect is off
+sdb: Mode Sense: 00 3a 00 00
+ata_scsi_dump_cdb: CDB (1:0,0,0) 5a 00 08 00 00 00 00 00 08
+ata_scsi_dump_cdb: CDB (1:0,0,0) 5a 00 08 00 00 00 00 00 24
+SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
+ sdb:<3>ata_scsi_dump_cdb: CDB (1:0,0,0) 28 00 00 00 00 00 00 00 08
+ata_sg_setup: 1 sg elements mapped
+ata_exec_command_pio: ata1: cmd 0xC8
+ata_hsm_move: ata1: protocol 3 task_state 2 (dev_stat 0x50)
+ata_hsm_move: ata1: dev 0 command complete, drv_stat 0x50
+ sdb1
+sd 1:0:0:0: Attached scsi disk sdb
+sd 1:0:0:0: Attached scsi generic sg1 type 0
+piix_init: done

2007-05-01 12:54:15

by Mark Lord

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

Tejun Heo wrote:
..
>>>> scsi1 : ata_piix
>>>> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x1)
..
> The err_mask is AC_ERR_DEV indicating that the device raised aborted the
> IDENTIFY command. I wonder what's going on.
>
> Can you change "#undef ATA_DEBUG" in include/linux/libata.h to "#define
> ATA_DEBUG" and report the resulting dmesg? There will be a LOT of
> messages so you probably want to increase printk buffer size and detach
> other devices if possible. It would be best if your root device isn't
> driven by libata so that you can just insert the module and store the
> resulting dmesg.

If the command really was "IDENTIFY", then of course it failed.
This is an ATAPI device, and it wants "PACKET_IDENTIFY".

Maybe libata somehow thinks it's ATA rather than ATAPI?

2007-05-01 12:55:29

by Mark Lord

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

Mark Lord wrote:
> Tejun Heo wrote:
> ..
>>>>> scsi1 : ata_piix
>>>>> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x1)
> ..
>> The err_mask is AC_ERR_DEV indicating that the device raised aborted the
>> IDENTIFY command. I wonder what's going on.
>>
>> Can you change "#undef ATA_DEBUG" in include/linux/libata.h to "#define
>> ATA_DEBUG" and report the resulting dmesg? There will be a LOT of
>> messages so you probably want to increase printk buffer size and detach
>> other devices if possible. It would be best if your root device isn't
>> driven by libata so that you can just insert the module and store the
>> resulting dmesg.
>
> If the command really was "IDENTIFY", then of course it failed.
> This is an ATAPI device, and it wants "PACKET_IDENTIFY".
>
> Maybe libata somehow thinks it's ATA rather than ATAPI?

And this is the second one today where it would be very useful
to see a tf dump. It's time to add one to that code patch, methinks.

2007-05-01 13:04:55

by Mark Lord

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

William Thompson wrote:
>
> +scsi2 : ata_piix
> +ata_port_schedule_eh: port EH scheduled
> +ata_scsi_error: ENTER
> +ata_port_flush_task: ENTER
> +ata_port_flush_task: flush #1
> +ata2: ata_port_flush_task: flush #2
> +ata2: ata_port_flush_task: EXIT
> +ata_eh_autopsy: ENTER
> +ata_eh_recover: ENTER
> +ata_eh_prep_resume: ENTER
> +ata_eh_prep_resume: EXIT
> +__ata_port_freeze: ata2 port frozen
> +ata_std_softreset: ENTER
> +ata_std_softreset: about to softreset, devmask=1
> +ata_bus_softreset: ata2: bus reset via SRST
> +ata_dev_classify: found ATA device by sig
> +ata_dev_classify: unknown device
> +ata_std_softreset: EXIT, classes[0]=1 [1]=5
> +ata_std_postreset: ENTER
> +ata_std_postreset: EXIT
> +ata_eh_thaw_port: ata2 port thawed
> +ata_eh_revalidate_and_attach: ENTER
> +ata2.00: ata_dev_read_id: ENTER, host 2, dev 0
> +ata_exec_command_pio: ata2: cmd 0xEC
> +ata_hsm_move: ata2: protocol 2 task_state 1 (dev_stat 0x51)
> +ata_hsm_move: ata2: protocol 2 task_state 3 (dev_stat 0x51)
> +ata_port_flush_task: ENTER
> +ata_port_flush_task: flush #1
> +ata2: ata_port_flush_task: flush #2
> +ata2: ata_port_flush_task: EXIT
> +ata2.00: failed to IDENTIFY (I/O error, err_mask=0x1)
> +ata_eh_revalidate_and_attach: EXIT
..

Yep. Libata seems to be treating the ATAPI drive as ATA.

Tejun, don't we have a fallback for when IDENTIFY fails?
If the drive rejects it (err=0x04), then this can mean only one thing:
unsupported command, so we next must try PACKET_IDENTIFY.

-ml

2007-05-01 13:18:21

by William Thompson

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

On Tue, May 01, 2007 at 09:04:46AM -0400, Mark Lord wrote:
> William Thompson wrote:
> >
> >+scsi2 : ata_piix
> >+ata_port_schedule_eh: port EH scheduled
> >+ata_scsi_error: ENTER
> >+ata_port_flush_task: ENTER
> >+ata_port_flush_task: flush #1
> >+ata2: ata_port_flush_task: flush #2
> >+ata2: ata_port_flush_task: EXIT
> >+ata_eh_autopsy: ENTER
> >+ata_eh_recover: ENTER
> >+ata_eh_prep_resume: ENTER
> >+ata_eh_prep_resume: EXIT
> >+__ata_port_freeze: ata2 port frozen
> >+ata_std_softreset: ENTER
> >+ata_std_softreset: about to softreset, devmask=1
> >+ata_bus_softreset: ata2: bus reset via SRST
> >+ata_dev_classify: found ATA device by sig
> >+ata_dev_classify: unknown device
> >+ata_std_softreset: EXIT, classes[0]=1 [1]=5
> >+ata_std_postreset: ENTER
> >+ata_std_postreset: EXIT
> >+ata_eh_thaw_port: ata2 port thawed
> >+ata_eh_revalidate_and_attach: ENTER
> >+ata2.00: ata_dev_read_id: ENTER, host 2, dev 0
> >+ata_exec_command_pio: ata2: cmd 0xEC
> >+ata_hsm_move: ata2: protocol 2 task_state 1 (dev_stat 0x51)
> >+ata_hsm_move: ata2: protocol 2 task_state 3 (dev_stat 0x51)
> >+ata_port_flush_task: ENTER
> >+ata_port_flush_task: flush #1
> >+ata2: ata_port_flush_task: flush #2
> >+ata2: ata_port_flush_task: EXIT
> >+ata2.00: failed to IDENTIFY (I/O error, err_mask=0x1)
> >+ata_eh_revalidate_and_attach: EXIT
> ..
>
> Yep. Libata seems to be treating the ATAPI drive as ATA.
>
> Tejun, don't we have a fallback for when IDENTIFY fails?
> If the drive rejects it (err=0x04), then this can mean only one thing:
> unsupported command, so we next must try PACKET_IDENTIFY.

Is it doing that just for this drive? I use libata on another machine (Dell
Dimension 2400) and it finds the cdrom just fine.

The one thing I do know, the machine with the non-working libata cdrom also
does not work with the ide driver *ONLY IF* DMA is turned on.

2007-05-01 13:24:47

by Mark Lord

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

William Thompson wrote:
>
> The one thing I do know, the machine with the non-working libata cdrom also
> does not work with the ide driver *ONLY IF* DMA is turned on.

That's probably because it's a mdma2 device, and not many chipsets seem
to do mdma2 correctly. I had a drive like that around here, too.
It could do dma on only one motherboard out of our half dozen or so.

But libata might actually be able to use mdma2 with it,
as I believe (unsubstantiated) that Alan may have done
a better implementation of setting the timings than what
we had with our old IDE drivers.

Cheers

2007-05-01 13:33:29

by Alan

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

> But libata might actually be able to use mdma2 with it,
> as I believe (unsubstantiated) that Alan may have done
> a better implementation of setting the timings than what
> we had with our old IDE drivers.

The new code knows how to set MWDMA2 timings properly, and it knows about
picking timings that also work for PIO. That's not my credit however, it
came for free when we adopted and used the ide-timing.h code from Vojtech
Pavlik.

I know Sergei thinks we should probably switch timings between PIO and
DMA operations when needed rather than run slower so it will be
interesting to know what this drive actually needs to behave with libata.

2007-05-01 13:33:46

by Tejun Heo

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

Mark Lord wrote:
> And this is the second one today where it would be very useful
> to see a tf dump. It's time to add one to that code patch, methinks.

Yeah, we have all these fancy ata_msg_() thingies which can be used to
provide a lot of debugging info without affecting hot path. We're just
too lazy to actually use it. :-( Putting it on my ever growing to-do list.

--
tejun

2007-05-01 13:41:50

by Mark Lord

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

Tejun Heo wrote:
> Mark Lord wrote:
>> And this is the second one today where it would be very useful
>> to see a tf dump. It's time to add one to that code patch, methinks.
>
> Yeah, we have all these fancy ata_msg_() thingies which can be used to
> provide a lot of debugging info without affecting hot path. We're just
> too lazy to actually use it. :-( Putting it on my ever growing to-do list.

The thing about failures is, we're lucky enough to ever see even a single
end-user error log. So when we do get the *one* error log from grandma,
it had better contain sufficient info for us to at least guess at the
solution, because grandma has gone back to OS/X in the meanwhile. ;)

It's a good thing that failure paths are not normally "hot".
We can load those up with a little extra info, so long as they
cannot normally be trigged from userspace to generate a Denial-Of-Service
style of attack.

I think libata drive probing is a pretty safe area to show lots
of information (by default) when something goes wrong.

Cheers!

2007-05-01 13:41:59

by Tejun Heo

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

Hello,

William Thompson wrote:
> On Tue, May 01, 2007 at 09:04:46AM -0400, Mark Lord wrote:
>>> +ata_dev_classify: found ATA device by sig
>>> +ata_dev_classify: unknown device
>>> +ata_std_softreset: EXIT, classes[0]=1 [1]=5
>>> +ata_std_postreset: ENTER
>>> +ata_std_postreset: EXIT
>>> +ata_eh_thaw_port: ata2 port thawed
>>> +ata_eh_revalidate_and_attach: ENTER
>>> +ata2.00: ata_dev_read_id: ENTER, host 2, dev 0
>>> +ata_exec_command_pio: ata2: cmd 0xEC
>>> +ata_hsm_move: ata2: protocol 2 task_state 1 (dev_stat 0x51)
>>> +ata_hsm_move: ata2: protocol 2 task_state 3 (dev_stat 0x51)
>>> +ata_port_flush_task: ENTER
>>> +ata_port_flush_task: flush #1
>>> +ata2: ata_port_flush_task: flush #2
>>> +ata2: ata_port_flush_task: EXIT
>>> +ata2.00: failed to IDENTIFY (I/O error, err_mask=0x1)
>>> +ata_eh_revalidate_and_attach: EXIT
>> ..
>>
>> Yep. Libata seems to be treating the ATAPI drive as ATA.

Yeah, indeed.

>> Tejun, don't we have a fallback for when IDENTIFY fails?
>> If the drive rejects it (err=0x04), then this can mean only one thing:
>> unsupported command, so we next must try PACKET_IDENTIFY.

Up until now, we've been depending on the device giving us the correct
signature on reset. This is the first reported case which screws that
up. Gee... Two crazy screwed up devices today. What's going on?

> Is it doing that just for this drive? I use libata on another machine (Dell
> Dimension 2400) and it finds the cdrom just fine.

That specific cdrom is crazy. It's telling libata that it's a disk when
it apparently isn't.

> The one thing I do know, the machine with the non-working libata cdrom also
> does not work with the ide driver *ONLY IF* DMA is turned on.

This is probably as Mark explained in the other thread.

Anyways, oh well, it looks like we need fallback mechanism to the other
IDENTIFY command after all. Jeff, any ideas or objections?

--
tejun

2007-05-01 13:51:42

by Mark Lord

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

>Gee... Two crazy screwed up devices today. What's going on?

libata is finally seeing use by non-kernel developers.

Cheers

2007-05-01 14:19:40

by Jeff Garzik

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

Mark Lord wrote:
>> Gee... Two crazy screwed up devices today. What's going on?
>
> libata is finally seeing use by non-kernel developers.

Amusing but factually incorrect. libata has been shipping in major
distros for years, with bazoodles of active users.

Jeff



2007-05-01 14:23:11

by Mark Lord

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

Jeff Garzik wrote:
> Mark Lord wrote:
>>> Gee... Two crazy screwed up devices today. What's going on?
>>
>> libata is finally seeing use by non-kernel developers.
>
> Amusing but factually incorrect. libata has been shipping in major
> distros for years, with bazoodles of active users.

But of course.

Corrected version:

libata-PATA is finally seeing use by more non-kernel developers.

2007-05-01 14:25:22

by Alan

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

On Tue, 01 May 2007 10:19:33 -0400
Jeff Garzik <[email protected]> wrote:

> Mark Lord wrote:
> >> Gee... Two crazy screwed up devices today. What's going on?
> >
> > libata is finally seeing use by non-kernel developers.
>
> Amusing but factually incorrect. libata has been shipping in major
> distros for years, with bazoodles of active users.

But not for PATA devices, SATA is *far* saner ...

2007-05-01 17:21:22

by William Thompson

[permalink] [raw]
Subject: Re: 2.6.20 libata cdrom

On Tue, May 01, 2007 at 03:40:36PM +0200, Tejun Heo wrote:
> >> Tejun, don't we have a fallback for when IDENTIFY fails?
> >> If the drive rejects it (err=0x04), then this can mean only one thing:
> >> unsupported command, so we next must try PACKET_IDENTIFY.
>
> Up until now, we've been depending on the device giving us the correct
> signature on reset. This is the first reported case which screws that
> up. Gee... Two crazy screwed up devices today. What's going on?

libata-pata is getting it's field testing now!

> > Is it doing that just for this drive? I use libata on another machine (Dell
> > Dimension 2400) and it finds the cdrom just fine.
>
> That specific cdrom is crazy. It's telling libata that it's a disk when
> it apparently isn't.

Wasn't sure, I thought it was something with the chipset. Every Dell
Dimension L series machine has done this to me (The DMA thing, that is).
Some gateways have as well.

> > The one thing I do know, the machine with the non-working libata cdrom also
> > does not work with the ide driver *ONLY IF* DMA is turned on.
>
> This is probably as Mark explained in the other thread.
>
> Anyways, oh well, it looks like we need fallback mechanism to the other
> IDENTIFY command after all. Jeff, any ideas or objections?

Got a patch or anything I can do to test?

Subject: Re: 2.6.20 libata cdrom


Hi,

On Tuesday 01 May 2007, William Thompson wrote:
> On Tue, May 01, 2007 at 09:04:46AM -0400, Mark Lord wrote:
> > William Thompson wrote:
> > >
> > >+scsi2 : ata_piix
> > >+ata_port_schedule_eh: port EH scheduled
> > >+ata_scsi_error: ENTER
> > >+ata_port_flush_task: ENTER
> > >+ata_port_flush_task: flush #1
> > >+ata2: ata_port_flush_task: flush #2
> > >+ata2: ata_port_flush_task: EXIT
> > >+ata_eh_autopsy: ENTER
> > >+ata_eh_recover: ENTER
> > >+ata_eh_prep_resume: ENTER
> > >+ata_eh_prep_resume: EXIT
> > >+__ata_port_freeze: ata2 port frozen
> > >+ata_std_softreset: ENTER
> > >+ata_std_softreset: about to softreset, devmask=1
> > >+ata_bus_softreset: ata2: bus reset via SRST
> > >+ata_dev_classify: found ATA device by sig
> > >+ata_dev_classify: unknown device
> > >+ata_std_softreset: EXIT, classes[0]=1 [1]=5
> > >+ata_std_postreset: ENTER
> > >+ata_std_postreset: EXIT
> > >+ata_eh_thaw_port: ata2 port thawed
> > >+ata_eh_revalidate_and_attach: ENTER
> > >+ata2.00: ata_dev_read_id: ENTER, host 2, dev 0
> > >+ata_exec_command_pio: ata2: cmd 0xEC
> > >+ata_hsm_move: ata2: protocol 2 task_state 1 (dev_stat 0x51)
> > >+ata_hsm_move: ata2: protocol 2 task_state 3 (dev_stat 0x51)
> > >+ata_port_flush_task: ENTER
> > >+ata_port_flush_task: flush #1
> > >+ata2: ata_port_flush_task: flush #2
> > >+ata2: ata_port_flush_task: EXIT
> > >+ata2.00: failed to IDENTIFY (I/O error, err_mask=0x1)
> > >+ata_eh_revalidate_and_attach: EXIT
> > ..
> >
> > Yep. Libata seems to be treating the ATAPI drive as ATA.
> >
> > Tejun, don't we have a fallback for when IDENTIFY fails?
> > If the drive rejects it (err=0x04), then this can mean only one thing:
> > unsupported command, so we next must try PACKET_IDENTIFY.
>
> Is it doing that just for this drive? I use libata on another machine (Dell
> Dimension 2400) and it finds the cdrom just fine.
>
> The one thing I do know, the machine with the non-working libata cdrom also
> does not work with the ide driver *ONLY IF* DMA is turned on.

Please give 2.6.21 kernel w/ IDE subsystem and DMA enabled a try
(2.6.21 kernel contains some important updates for both piix IDE driver
and DMA tuning code).

Also there is a newer firmware for this drive (PD03, your drive uses PD02),
although the changelog doesn't say anything about DMA problems it wouldn't
hurt to give it a try:

http://support.us.dell.com/support/downloads/download.aspx?c=us&l=en&s=gen&releaseid=R20664&formatcnt=1&libid=0&fileid=20516

Thanks,
Bart

2007-05-07 16:42:43

by Tejun Heo

[permalink] [raw]
Subject: [PATCH] libata: fallback to the other IDENTIFY on device error

It seems the world isn't as frank as we thought and some devices lie
about who they are. Fallback to the other IDENTIFY if IDENTIFY fails
with device error. As this is the strategy used by IDE for a long
time, it shouldn't cause too much problem.

Signed-off-by: Tejun Heo <[email protected]>
Cc: William Thompson <[email protected]>
---
Willam, please verify this fixes your problem. Jeff, after all, we
seem to need this. :-(

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index ca67484..f543109 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -1652,7 +1652,7 @@ int ata_dev_read_id(struct ata_device *d
struct ata_taskfile tf;
unsigned int err_mask = 0;
const char *reason;
- int tried_spinup = 0;
+ int may_fallback = 1, tried_spinup = 0;
int rc;

if (ata_msg_ctl(ap))
@@ -1696,11 +1696,30 @@ int ata_dev_read_id(struct ata_device *d
return -ENOENT;
}

+ if (may_fallback && (err_mask == AC_ERR_DEV)) {
+ may_fallback = 0;
+
+ /* Device or controller might have reported
+ * the wrong device class. Give a shot at then
+ * other IDENTIFY
+ */
+ if (class == ATA_DEV_ATA)
+ class = ATA_DEV_ATAPI;
+ else
+ class = ATA_DEV_ATA;
+ goto retry;
+ }
+
rc = -EIO;
reason = "I/O error";
goto err_out;
}

+ /* Falling back doesn't make sense if ID data was read
+ * successfully at least once.
+ */
+ may_fallback = 0;
+
swap_buf_le16(id, ATA_ID_WORDS);

/* sanity check */

2007-05-07 17:34:16

by William Thompson

[permalink] [raw]
Subject: Re: [PATCH] libata: fallback to the other IDENTIFY on device error

On Mon, May 07, 2007 at 06:42:26PM +0200, Tejun Heo wrote:
> It seems the world isn't as frank as we thought and some devices lie
> about who they are. Fallback to the other IDENTIFY if IDENTIFY fails
> with device error. As this is the strategy used by IDE for a long
> time, it shouldn't cause too much problem.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: William Thompson <[email protected]>
> ---
> Willam, please verify this fixes your problem. Jeff, after all, we
> seem to need this. :-(

Seems that I have a different version than you do. See below.

> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index ca67484..f543109 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -1652,7 +1652,7 @@ int ata_dev_read_id(struct ata_device *d
> struct ata_taskfile tf;
> unsigned int err_mask = 0;
> const char *reason;
> - int tried_spinup = 0;
> + int may_fallback = 1, tried_spinup = 0;
> int rc;

My version doesn't have "int tried_spinup = 0;" in it at all. I did find
where const char *reason; is (line 1455)

I'm using 2.6.20

I probably won't use 2.6.21 due to some of the problems I read about on the
kernel list.

I'll manually add this patch to mine and see how far I get.

2007-05-07 18:05:58

by William Thompson

[permalink] [raw]
Subject: Re: [PATCH] libata: fallback to the other IDENTIFY on device error

On Mon, May 07, 2007 at 06:42:26PM +0200, Tejun Heo wrote:
> It seems the world isn't as frank as we thought and some devices lie
> about who they are. Fallback to the other IDENTIFY if IDENTIFY fails
> with device error. As this is the strategy used by IDE for a long
> time, it shouldn't cause too much problem.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: William Thompson <[email protected]>
> ---
> Willam, please verify this fixes your problem. Jeff, after all, we
> seem to need this. :-(

Even though you have a different libata-core.c than I, the patch does work!

2007-05-08 08:02:25

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] libata: fallback to the other IDENTIFY on device error

William Thompson wrote:
> On Mon, May 07, 2007 at 06:42:26PM +0200, Tejun Heo wrote:
>> It seems the world isn't as frank as we thought and some devices lie
>> about who they are. Fallback to the other IDENTIFY if IDENTIFY fails
>> with device error. As this is the strategy used by IDE for a long
>> time, it shouldn't cause too much problem.
>>
>> Signed-off-by: Tejun Heo <[email protected]>
>> Cc: William Thompson <[email protected]>
>> ---
>> Willam, please verify this fixes your problem. Jeff, after all, we
>> seem to need this. :-(
>
> Even though you have a different libata-core.c than I, the patch does work!

The different part is in libata devel tree and the rejection at the
variable declaration doesn't matter as long as you add the variable.
Thanks for testing. :-)

--
tejun

2007-05-10 00:08:41

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] libata: fallback to the other IDENTIFY on device error

Tejun Heo wrote:
> It seems the world isn't as frank as we thought and some devices lie
> about who they are. Fallback to the other IDENTIFY if IDENTIFY fails
> with device error. As this is the strategy used by IDE for a long
> time, it shouldn't cause too much problem.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: William Thompson <[email protected]>
> ---
> Willam, please verify this fixes your problem. Jeff, after all, we
> seem to need this. :-(
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index ca67484..f543109 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -1652,7 +1652,7 @@ int ata_dev_read_id(struct ata_device *d
> struct ata_taskfile tf;
> unsigned int err_mask = 0;
> const char *reason;
> - int tried_spinup = 0;
> + int may_fallback = 1, tried_spinup = 0;
> int rc;
>
> if (ata_msg_ctl(ap))
> @@ -1696,11 +1696,30 @@ int ata_dev_read_id(struct ata_device *d
> return -ENOENT;
> }
>
> + if (may_fallback && (err_mask == AC_ERR_DEV)) {
> + may_fallback = 0;
> +
> + /* Device or controller might have reported
> + * the wrong device class. Give a shot at then
> + * other IDENTIFY
> + */
> + if (class == ATA_DEV_ATA)
> + class = ATA_DEV_ATAPI;
> + else
> + class = ATA_DEV_ATA;

The error handling should be more specific -- check and make sure the
error is 'command aborted'

Jeff



2007-05-11 12:35:52

by Tejun Heo

[permalink] [raw]
Subject: [PATCH] libata: fallback to the other IDENTIFY on device error, take#2

It seems the world isn't as frank as we thought and some devices lie
about who they are. Fallback to the other IDENTIFY if IDENTIFY is
aborted by the device. As this is the strategy used by IDE for a long
time, it shouldn't cause too much problem.

Signed-off-by: Tejun Heo <[email protected]>
Cc: William Thompson <[email protected]>
---
Updated to fallback iff IDENTIFY is aborted by the device.

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 4595d1f..96a7c6f 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -1654,7 +1654,7 @@ int ata_dev_read_id(struct ata_device *d
struct ata_taskfile tf;
unsigned int err_mask = 0;
const char *reason;
- int tried_spinup = 0;
+ int may_fallback = 1, tried_spinup = 0;
int rc;

if (ata_msg_ctl(ap))
@@ -1698,11 +1698,31 @@ int ata_dev_read_id(struct ata_device *d
return -ENOENT;
}

+ /* Device or controller might have reported the wrong
+ * device class. Give a shot at the other IDENTIFY if
+ * the current one is aborted by the device.
+ */
+ if (may_fallback &&
+ (err_mask == AC_ERR_DEV) && (tf.feature & ATA_ABORTED)) {
+ may_fallback = 0;
+
+ if (class == ATA_DEV_ATA)
+ class = ATA_DEV_ATAPI;
+ else
+ class = ATA_DEV_ATA;
+ goto retry;
+ }
+
rc = -EIO;
reason = "I/O error";
goto err_out;
}

+ /* Falling back doesn't make sense if ID data was read
+ * successfully at least once.
+ */
+ may_fallback = 0;
+
swap_buf_le16(id, ATA_ID_WORDS);

/* sanity check */

2007-05-11 22:10:29

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] libata: fallback to the other IDENTIFY on device error, take#2

Tejun Heo wrote:
> It seems the world isn't as frank as we thought and some devices lie
> about who they are. Fallback to the other IDENTIFY if IDENTIFY is
> aborted by the device. As this is the strategy used by IDE for a long
> time, it shouldn't cause too much problem.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: William Thompson <[email protected]>
> ---
> Updated to fallback iff IDENTIFY is aborted by the device.

applied


2007-05-11 22:10:59

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] libata: fallback to the other IDENTIFY on device error, take#2

Tejun Heo wrote:
> + if (class == ATA_DEV_ATA)
> + class = ATA_DEV_ATAPI;
> + else
> + class = ATA_DEV_ATA;


the 'else' branch is obviously redundant

2007-05-13 12:59:20

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] libata: fallback to the other IDENTIFY on device error, take#2

Jeff Garzik wrote:
> Tejun Heo wrote:
>> + if (class == ATA_DEV_ATA)
>> + class = ATA_DEV_ATAPI;
>> + else
>> + class = ATA_DEV_ATA;
>
>
> the 'else' branch is obviously redundant

Why? We can also fallback from ATAPI to ATA.

--
tejun

2007-05-13 16:14:21

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] libata: fallback to the other IDENTIFY on device error, take#2

On Sun, May 13, 2007 at 02:57:44PM +0200, Tejun Heo wrote:
> Jeff Garzik wrote:
> > Tejun Heo wrote:
> >> + if (class == ATA_DEV_ATA)
> >> + class = ATA_DEV_ATAPI;
> >> + else
> >> + class = ATA_DEV_ATA;
> >
> >
> > the 'else' branch is obviously redundant
>
> Why? We can also fallback from ATAPI to ATA.

Then did you mean to write..

+ if (class == ATA_DEV_ATA)
+ class = ATA_DEV_ATAPI;
+ else if (class == ATA_DEV_ATAPI)
+ class = ATA_DEV_ATA;

?

Otherwise, as Jeff mentions, you're doing a redundant assignment
in the else branch.

Dave

--
http://www.codemonkey.org.uk

2007-05-13 17:51:44

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] libata: fallback to the other IDENTIFY on device error, take#2

Dave Jones wrote:
> On Sun, May 13, 2007 at 02:57:44PM +0200, Tejun Heo wrote:
> > Jeff Garzik wrote:
> > > Tejun Heo wrote:
> > >> + if (class == ATA_DEV_ATA)
> > >> + class = ATA_DEV_ATAPI;
> > >> + else
> > >> + class = ATA_DEV_ATA;
> > >
> > >
> > > the 'else' branch is obviously redundant
> >
> > Why? We can also fallback from ATAPI to ATA.
>
> Then did you mean to write..
>
> + if (class == ATA_DEV_ATA)
> + class = ATA_DEV_ATAPI;
> + else if (class == ATA_DEV_ATAPI)
> + class = ATA_DEV_ATA;
>
> ?
>
> Otherwise, as Jeff mentions, you're doing a redundant assignment
> in the else branch.

Hmmm... I'm feeling very dense today. At that point, class is either
ATA_DEV_ATA or ATA_DEV_ATAPI. The if-else clause tries to flip between
the two.

1. if class == ATA_DEV_ATA, the 'if' test succeeds and "class =
ATA_DEV_ATAPI" runs, so it flips correctly.

2. if class == ATA_DEV_ATAPI, the 'if' test fails and "class =
ATA_DEV_ATA" runs, so it flips correctly.

What am I missing here? Feel free to scream at me and hammer me into
senses. :-)

--
tejun

2007-05-13 18:14:43

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] libata: fallback to the other IDENTIFY on device error, take#2

On Sun, May 13, 2007 at 07:50:16PM +0200, Tejun Heo wrote:
> > Otherwise, as Jeff mentions, you're doing a redundant assignment
> > in the else branch.
>
> Hmmm... I'm feeling very dense today. At that point, class is either
> ATA_DEV_ATA or ATA_DEV_ATAPI. The if-else clause tries to flip between
> the two.
>
> 1. if class == ATA_DEV_ATA, the 'if' test succeeds and "class =
> ATA_DEV_ATAPI" runs, so it flips correctly.
>
> 2. if class == ATA_DEV_ATAPI, the 'if' test fails and "class =
> ATA_DEV_ATA" runs, so it flips correctly.
>
> What am I missing here? Feel free to scream at me and hammer me into
> senses. :-)

actually, I think I'm denser today. Ignore that last mail.

Dave

--
http://www.codemonkey.org.uk