2004-11-20 18:42:19

by Alan Chandler

[permalink] [raw]
Subject: ide-cd problem

I have been trying to track down why all attempts to burn a cd on my ide cdrw
fails (see bug #3741 on bugzilla ), with a subprocess of cdrecord ending up
hanging in uninterruptable sleep state.

I think I understand what is happening, I just don't know what to do about it.

Inside drivers/ide/ide-cd.c

the ide_do_rw_cdrom routine has been called via a request with the request
flags having the REQ_BLOCK_PC flag set. The request->data_len of this
request is set to 0.

This request is sent to the device and it generates interrupts to eventually
land it up inside the routine cdrom_newpc_intr.

At this point the status register on the hardware is set to 0x58 - implying, I
think that the DRQ_STAT bit is set and that something should be sent to the
device.

Normally, because the requested data_len is not zero, the data is sent. In
this case however, because the original request had nothing to send, the
while/if clauses to initiate a new transfer are skipped and the routine ends
up setting a new interrupt handler address and returning to await an
interrupt that will never come.

Question: should something validate that the request length is not zero
earlier, or should there be a check in ide-cd.c, or is it my hardware (its a
generic cd read/rewriter which announces itself as 'CW078D CD-R/RW')
--
Alan Chandler
[email protected]
First they ignore you, then they laugh at you,
then they fight you, then you win. --Gandhi


2004-11-20 19:48:37

by Jens Axboe

[permalink] [raw]
Subject: Re: ide-cd problem

On Sat, Nov 20 2004, Alan Chandler wrote:
> I have been trying to track down why all attempts to burn a cd on my
> ide cdrw fails (see bug #3741 on bugzilla ), with a subprocess of
> cdrecord ending up hanging in uninterruptable sleep state.
>
> I think I understand what is happening, I just don't know what to do
> about it.
>
> Inside drivers/ide/ide-cd.c
>
> the ide_do_rw_cdrom routine has been called via a request with the
> request flags having the REQ_BLOCK_PC flag set. The request->data_len
> of this request is set to 0.
>
> This request is sent to the device and it generates interrupts to
> eventually land it up inside the routine cdrom_newpc_intr.
>
> At this point the status register on the hardware is set to 0x58 -
> implying, I think that the DRQ_STAT bit is set and that something
> should be sent to the device.
>
> Normally, because the requested data_len is not zero, the data is
> sent. In this case however, because the original request had nothing
> to send, the while/if clauses to initiate a new transfer are skipped
> and the routine ends up setting a new interrupt handler address and
> returning to await an interrupt that will never come.

The big question is - what does the original command look like? Just
dumping rq->cmd[0] would be a big help, but really just put code in
sg_io() in block/scsi_ioctl.c to dump the completed sg_io_hdr_t and send
that.

> Question: should something validate that the request length is not
> zero earlier, or should there be a check in ide-cd.c, or is it my
> hardware (its a generic cd read/rewriter which announces itself as
> 'CW078D CD-R/RW')

It's hard to know, you would have to parse every command type to verify
if the dxfer_len made sense or not. It's perfectly possible to generate
a command that would hang the drive as you describe above, only to be
aborted after it times out.

--
Jens Axboe

2004-11-21 00:53:51

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

On Saturday 20 November 2004 19:47, Jens Axboe wrote:
> On Sat, Nov 20 2004, Alan Chandler wrote:
...
> > Normally, because the requested data_len is not zero, the data is
> > sent. In this case however, because the original request had nothing
> > to send, the while/if clauses to initiate a new transfer are skipped
> > and the routine ends up setting a new interrupt handler address and
> > returning to await an interrupt that will never come.
>
> The big question is - what does the original command look like? Just
> dumping rq->cmd[0] would be a big help, but really just put code in
> sg_io() in block/scsi_ioctl.c to dump the completed sg_io_hdr_t and send
> that.

I haven't dumped the whole request header, but the command (after it has been
retrieved from the user) and the dxfer_length. Is there anything else I
should dump?

Here is the output leading up to the point where ide-cd hangs because the IO
is just left pending

Nov 21 00:44:20 kanger kernel: sg_io command length 10
Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
Nov 21 00:44:20 kanger kernel: sg_io command length 10
Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
Nov 21 00:44:20 kanger kernel: sg_io command length 10
Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
Nov 21 00:44:20 kanger kernel: sg_io command length 10
Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
Nov 21 00:44:20 kanger kernel: sg_io command length 6
Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 0
Nov 21 00:44:20 kanger kernel: sg_io command length 6
Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x1b
Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x3
Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 0
Nov 21 00:45:00 kanger kernel: hdc: lost interrupt
Nov 21 00:45:40 kanger kernel: hdc: lost interrupt
Nov 21 00:47:00 kanger last message repeated 2 times
Nov 21 00:47:40 kanger kernel: hdc: lost interrupt



--
Alan Chandler
[email protected]
First they ignore you, then they laugh at you,
then they fight you, then you win. --Gandhi

2004-11-21 08:57:30

by Jens Axboe

[permalink] [raw]
Subject: Re: ide-cd problem

On Sun, Nov 21 2004, Alan Chandler wrote:
> On Saturday 20 November 2004 19:47, Jens Axboe wrote:
> > On Sat, Nov 20 2004, Alan Chandler wrote:
> ...
> > > Normally, because the requested data_len is not zero, the data is
> > > sent. In this case however, because the original request had nothing
> > > to send, the while/if clauses to initiate a new transfer are skipped
> > > and the routine ends up setting a new interrupt handler address and
> > > returning to await an interrupt that will never come.
> >
> > The big question is - what does the original command look like? Just
> > dumping rq->cmd[0] would be a big help, but really just put code in
> > sg_io() in block/scsi_ioctl.c to dump the completed sg_io_hdr_t and send
> > that.
>
> I haven't dumped the whole request header, but the command (after it has been
> retrieved from the user) and the dxfer_length. Is there anything else I
> should dump?

No that's fine, that's all I need.

> Here is the output leading up to the point where ide-cd hangs because the IO
> is just left pending
>
> Nov 21 00:44:20 kanger kernel: sg_io command length 10
> Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
> Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
> Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
> Nov 21 00:44:20 kanger kernel: sg_io command length 10
> Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
> Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
> Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
> Nov 21 00:44:20 kanger kernel: sg_io command length 10
> Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
> Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
> Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
> Nov 21 00:44:20 kanger kernel: sg_io command length 10
> Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
> Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
> Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
> Nov 21 00:44:20 kanger kernel: sg_io command length 6
> Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 0
> Nov 21 00:44:20 kanger kernel: sg_io command length 6
> Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x1b
> Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x3
> Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 0
> Nov 21 00:45:00 kanger kernel: hdc: lost interrupt
> Nov 21 00:45:40 kanger kernel: hdc: lost interrupt
> Nov 21 00:47:00 kanger last message repeated 2 times
> Nov 21 00:47:40 kanger kernel: hdc: lost interrupt

So the last request is a START_STOP unit, which doesn't transfer any
data. If the drive has DRQ_STAT stat set here, it looks very odd. Any
chance you could instrument cdrom_newpc_intr() as well to dump status
bytes and expected transfer lengths from the drive?

--
Jens Axboe

2004-11-21 10:25:31

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

On Sunday 21 November 2004 08:56, Jens Axboe wrote:
> On Sun, Nov 21 2004, Alan Chandler wrote:
...
> > Nov 21 00:44:20 kanger kernel: sg_io command length 6
> > Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x1b
> > Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> > Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> > Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> > Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x3
> > Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> > Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 0
> > Nov 21 00:45:00 kanger kernel: hdc: lost interrupt
> > Nov 21 00:45:40 kanger kernel: hdc: lost interrupt
> > Nov 21 00:47:00 kanger last message repeated 2 times
> > Nov 21 00:47:40 kanger kernel: hdc: lost interrupt
>
> So the last request is a START_STOP unit, which doesn't transfer any
> data. If the drive has DRQ_STAT stat set here, it looks very odd. Any
> chance you could instrument cdrom_newpc_intr() as well to dump status
> bytes and expected transfer lengths from the drive?

As below - note I have also got a printk in cdrom_timer_expiry () - but
nothing there. I have included a couple of commands before the key one, and
also a few 'lost interrupt' calls after the lock to show how it then doesn't
recover. (and I made a spelling error in one of the messages the message -
seld should mean self).

Nov 21 10:13:44 kanger kernel: scsi_cmd_ioctl - cmd = 0x2285
Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io cmd [0] = 0x3c
Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io dxfer_len = 64512
Nov 21 10:13:44 kanger kernel: scsi_ioctl: about to execute cmd 0x3c
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_do_block_pc
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_start_packet_command - xferlen =
64512 - dma = 1
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (before decode status)
- dma = 1 dma_error = 0x0
Nov 21 10:13:44 kanger kernel: scsi_ioctl: completed command with status 0x0
Nov 21 10:13:44 kanger kernel: scsi_cmd_ioctl - cmd = 0x2285
Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io cmd [0] = 0x3c
Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io dxfer_len = 64512
Nov 21 10:13:44 kanger kernel: scsi_ioctl: about to execute cmd 0x3c
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_do_block_pc
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_start_packet_command - xferlen =
64512 - dma = 1
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (before decode status)
- dma = 1 dma_error = 0x0
Nov 21 10:13:44 kanger kernel: scsi_ioctl: completed command with status 0x0
Nov 21 10:13:44 kanger kernel: scsi_cmd_ioctl - cmd = 0x2285
Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io cmd [0] = 0x0
Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io dxfer_len = 0
Nov 21 10:13:44 kanger kernel: scsi_ioctl: about to execute cmd 0x0
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_do_block_pc
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_start_packet_command - xferlen = 0
- dma = 0
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (before decode status)
- dma = 0 dma_error = 0x0
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (in pio after reading
registers) ireason = 2 len = 0 stat = 0x58
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (DRQ not clear) - rq
cmd[0] = 0x0 rq len = 0
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (about to call
ide_set_handler with seld as rentry) timeout = 40000
Nov 21 10:14:24 kanger kernel: hdc: lost interrupt
Nov 21 10:14:24 kanger kernel: ide-cd:cdrom_newpc_intr (before decode status)
- dma = 0 dma_error = 0x0
Nov 21 10:14:24 kanger kernel: ide-cd:cdrom_newpc_intr (in pio after reading
registers) ireason = 2 len = 0 stat = 0x58
Nov 21 10:14:24 kanger kernel: ide-cd:cdrom_newpc_intr (DRQ not clear) - rq
cmd[0] = 0x0 rq len = 0
Nov 21 10:14:24 kanger kernel: ide-cd:cdrom_newpc_intr (about to call
ide_set_handler with seld as rentry) timeout = 40000
Nov 21 10:15:04 kanger kernel: hdc: lost interrupt
Nov 21 10:15:04 kanger kernel: ide-cd:cdrom_newpc_intr (before decode status)
- dma = 0 dma_error = 0x0
Nov 21 10:15:04 kanger kernel: ide-cd:cdrom_newpc_intr (in pio after reading
registers) ireason = 2 len = 0 stat = 0x58
Nov 21 10:15:04 kanger kernel: ide-cd:cdrom_newpc_intr (DRQ not clear) - rq
cmd[0] = 0x0 rq len = 0
Nov 21 10:15:04 kanger kernel: ide-cd:cdrom_newpc_intr (about to call
ide_set_handler with seld as rentry) timeout = 40000



--
Alan Chandler
[email protected]
First they ignore you, then they laugh at you,
then they fight you, then you win. --Gandhi

2004-11-21 16:20:19

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

On Sunday 21 November 2004 10:25, Alan Chandler wrote:
...
> 0x0 Nov 21 10:13:44 kanger kernel: scsi_cmd_ioctl - cmd = 0x2285
> Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io cmd [0] = 0x0
...
> Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (in pio after
> reading registers) ireason = 2 len = 0 stat = 0x58
> Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (DRQ not clear) - rq
> cmd[0] = 0x0 rq len = 0
> Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (about to call
> ide_set_handler with seld as rentry) timeout = 40000
> Nov 21 10:14:24 kanger kernel: hdc: lost interrupt

This seems to be some combination of frequently occuring timing problem, and
the difference treatment in cdrom_newpc_intr to cdrom_pc_intr

I instrumented the latter interrupt routine as well as the first, and got the
following whilst loading up the system. It seems that there are several
times where DRQ is asserted initially, but after a timeout no longer is.

ide-cd:cdrom_do_packet_command
ide-cd:cdrom_start_packet_command - xferlen = 24 - dma = 0
ide-cd:cdrpm_pc_intr - from command 0x5a
ide-cd:cdrom_pc_intr - stat = 0x58 ireason = 2 len = 24
ide-cd:cdrpm_pc_intr - from command 0x5a
ide-cd:cdrom_pc_intr - stat = 0x50 ireason = 3 len = 24
hdc: ATAPI 48X CD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-cd:cdrom_do_packet_command
ide-cd:cdrom_start_packet_command - xferlen = 0 - dma = 0
ide-cd:cdrpm_pc_intr - from command 0x0
ide-cd:cdrom_pc_intr - stat = 0x50 ireason = 3 len = 0
ide-cd:cdrom_do_packet_command
ide-cd:cdrom_start_packet_command - xferlen = 8 - dma = 0
ide-cd:cdrpm_pc_intr - from command 0x25
ide-cd:cdrom_pc_intr - stat = 0x58 ireason = 2 len = 8
ide-cd:cdrpm_pc_intr - from command 0x25
ide-cd:cdrom_pc_intr - stat = 0x50 ireason = 3 len = 8
ide-cd:cdrom_do_packet_command
ide-cd:cdrom_start_packet_command - xferlen = 4 - dma = 0
ide-cd:cdrpm_pc_intr - from command 0x43
ide-cd:cdrom_do_packet_command
ide-cd:cdrom_start_packet_command - xferlen = 18 - dma = 0
ide-cd:cdrpm_pc_intr - from command 0x3
ide-cd:cdrom_pc_intr - stat = 0x58 ireason = 2 len = 18
ide-cd:cdrpm_pc_intr - from command 0x3
ide-cd:cdrom_pc_intr - stat = 0x50 ireason = 3 len = 18

The other point is that the point of error is not entirely consistent between
runs. Occasssionally the 0x0 command succeeds and moves on to the 0x1b
command

[It might also be worth pointing out here that I am using 2.6.9 and tried with
2.6.10-rc2 and could not even get the system to get past the cd-rom
initialisation stage. I haven't been able to find out what might be
different between the two ways of initialising the system - I can't see much
difference inside ide-cd.c]


--
Alan Chandler
[email protected]
First they ignore you, then they laugh at you,
then they fight you, then you win. --Gandhi

2004-11-22 07:52:32

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

On Sunday 21 November 2004 16:13, Alan Chandler wrote:
...
>
> This seems to be some combination of frequently occuring timing problem,
> and the difference treatment in cdrom_newpc_intr to cdrom_pc_intr

I put a ndelay(400) at the head of cdrom_newpc_intr and the problem of DRQ
being set when there was no data to transfer disappeared. It appears that my
hardware is too slow.

I have been reading the ATA/ATAPI - 6 spec, and it implies that the state of
DRQ line need one pio cycle before being correct and that you should read the
alternative status register to achieve this. I tried a simple

HWIF(drive)->INB( IDE_ALTSTATUS_REG);

But that made no difference.



--
Alan Chandler
[email protected]
First they ignore you, then they laugh at you,
then they fight you, then you win. --Gandhi

2004-11-22 08:04:15

by Jens Axboe

[permalink] [raw]
Subject: Re: ide-cd problem

On Mon, Nov 22 2004, Alan Chandler wrote:
> On Sunday 21 November 2004 16:13, Alan Chandler wrote:
> ...
> >
> > This seems to be some combination of frequently occuring timing problem,
> > and the difference treatment in cdrom_newpc_intr to cdrom_pc_intr
>
> I put a ndelay(400) at the head of cdrom_newpc_intr and the problem of
> DRQ being set when there was no data to transfer disappeared. It
> appears that my hardware is too slow.
>
> I have been reading the ATA/ATAPI - 6 spec, and it implies that the
> state of DRQ line need one pio cycle before being correct and that you
> should read the alternative status register to achieve this. I tried
> a simple
>
> HWIF(drive)->INB( IDE_ALTSTATUS_REG);
>
> But that made no difference.

ALTSTATUS read should be fine as well, but the implicit delay is
probably better.

Is this enough to fix it? For ->drq_interrupt we already should have
an adequate delay, Alan fixed this one recently.

===== drivers/ide/ide-cd.c 1.97 vs edited =====
--- 1.97/drivers/ide/ide-cd.c 2004-11-07 02:54:41 +01:00
+++ edited/drivers/ide/ide-cd.c 2004-11-22 08:58:15 +01:00
@@ -890,8 +890,13 @@
ide_execute_command(drive, WIN_PACKETCMD, handler, ATAPI_WAIT_PC, cdrom_timer_expiry);
return ide_started;
} else {
+ unsigned long flags;
+
/* packet command */
- HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
+ spin_lock_irqsave(&ide_lock, flags);
+ HWIF(drive)->OUTBSYNC(drive, WIN_PACKETCMD, IDE_COMMAND_REG);
+ ndelay(400);
+ spin_unlock_irqrestore(&ide_lock, flags);
return (*handler) (drive);
}
}

--
Jens Axboe

2004-11-22 10:30:46

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

Jens Axboe writes:

> On Mon, Nov 22 2004, Alan Chandler wrote:
>> On Sunday 21 November 2004 16:13, Alan Chandler wrote:
>> ...
>> >
>> > This seems to be some combination of frequently occuring timing problem,
>> > and the difference treatment in cdrom_newpc_intr to cdrom_pc_intr
>>
>> I put a ndelay(400) at the head of cdrom_newpc_intr and the problem of
>> DRQ being set when there was no data to transfer disappeared. It
>> appears that my hardware is too slow.
>>
>> I have been reading the ATA/ATAPI - 6 spec, and it implies that the
>> state of DRQ line need one pio cycle before being correct and that you
>> should read the alternative status register to achieve this. I tried
>> a simple
>>
>> HWIF(drive)->INB( IDE_ALTSTATUS_REG);
>>
>> But that made no difference.
>
> ALTSTATUS read should be fine as well, but the implicit delay is
> probably better.
>

I don't know why, but the ALTSTATUS read did NOT work when I tried it
yesterday (am currently at work using web mail to access my mail - can't do
more until this evening). Its possible I put it in the wrong place (ie
after the cdrom_decode_status call, but I don't think so.

The ndelay(400) did work.

> Is this enough to fix it? For ->drq_interrupt we already should have
> an adequate delay, Alan fixed this one recently.
>

Yes, I had included this patch quite early in my process of tracking the
problem down (when I corrected it like you have (add the drive parameter to
the OUTBSYNC macro like you have, you also need to declare an unsigned long
flags at the head of the routine that was also not in that patch). Indeed
it was this that was the inspiration for the 400 nanosecs in my change. I
have no idea what the right number should be


Alan Chandler
[email protected]

2004-11-22 10:56:13

by Jens Axboe

[permalink] [raw]
Subject: Re: ide-cd problem

On Mon, Nov 22 2004, Alan Chandler wrote:
> Jens Axboe writes:
>
> >On Mon, Nov 22 2004, Alan Chandler wrote:
> >>On Sunday 21 November 2004 16:13, Alan Chandler wrote:
> >>...
> >>>
> >>> This seems to be some combination of frequently occuring timing problem,
> >>> and the difference treatment in cdrom_newpc_intr to cdrom_pc_intr
> >>
> >>I put a ndelay(400) at the head of cdrom_newpc_intr and the problem of
> >>DRQ being set when there was no data to transfer disappeared. It
> >>appears that my hardware is too slow.
> >>
> >>I have been reading the ATA/ATAPI - 6 spec, and it implies that the
> >>state of DRQ line need one pio cycle before being correct and that you
> >>should read the alternative status register to achieve this. I tried
> >>a simple
> >>
> >>HWIF(drive)->INB( IDE_ALTSTATUS_REG);
> >>
> >>But that made no difference.
> >
> >ALTSTATUS read should be fine as well, but the implicit delay is
> >probably better.
> >
>
> I don't know why, but the ALTSTATUS read did NOT work when I tried it
> yesterday (am currently at work using web mail to access my mail - can't do
> more until this evening). Its possible I put it in the wrong place (ie
> after the cdrom_decode_status call, but I don't think so.
>
> The ndelay(400) did work.
>
> >Is this enough to fix it? For ->drq_interrupt we already should have
> >an adequate delay, Alan fixed this one recently.
> >
>
> Yes, I had included this patch quite early in my process of tracking
> the problem down (when I corrected it like you have (add the drive
> parameter to the OUTBSYNC macro like you have, you also need to
> declare an unsigned long flags at the head of the routine that was
> also not in that patch). Indeed it was this that was the inspiration
> for the 400 nanosecs in my change. I have no idea what the right
> number should be

400ns is the correctl value. Your writing is a little unclear to me -
did it work or not, with that change alone?

--
Jens Axboe

2004-11-22 11:30:15

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

Jens Axboe writes:

> On Mon, Nov 22 2004, Alan Chandler wrote:
>> Jens Axboe writes:
>>
>> >On Mon, Nov 22 2004, Alan Chandler wrote:
>> >>On Sunday 21 November 2004 16:13, Alan Chandler wrote:
>> >>...
>> >>>
>> >>> This seems to be some combination of frequently occuring timing problem,
>> >>> and the difference treatment in cdrom_newpc_intr to cdrom_pc_intr
>> >>
>> >>I put a ndelay(400) at the head of cdrom_newpc_intr and the problem of
>> >>DRQ being set when there was no data to transfer disappeared. It
>> >>appears that my hardware is too slow.
>> >>
>> >>I have been reading the ATA/ATAPI - 6 spec, and it implies that the
>> >>state of DRQ line need one pio cycle before being correct and that you
>> >>should read the alternative status register to achieve this. I tried
>> >>a simple
>> >>
>> >>HWIF(drive)->INB( IDE_ALTSTATUS_REG);
>> >>
>> >>But that made no difference.
>> >
>> >ALTSTATUS read should be fine as well, but the implicit delay is
>> >probably better.
>> >
>>
>> I don't know why, but the ALTSTATUS read did NOT work when I tried it
>> yesterday (am currently at work using web mail to access my mail - can't do
>> more until this evening). Its possible I put it in the wrong place (ie
>> after the cdrom_decode_status call, but I don't think so.
>>
>> The ndelay(400) did work.
>>
>> >Is this enough to fix it? For ->drq_interrupt we already should have
>> >an adequate delay, Alan fixed this one recently.
>> >
>>
>> Yes, I had included this patch quite early in my process of tracking
>> the problem down (when I corrected it like you have (add the drive
>> parameter to the OUTBSYNC macro like you have, you also need to
>> declare an unsigned long flags at the head of the routine that was
>> also not in that patch). Indeed it was this that was the inspiration
>> for the 400 nanosecs in my change. I have no idea what the right
>> number should be
>
> 400ns is the correctl value. Your writing is a little unclear to me -
> did it work or not, with that change alone?
>

To be clear ...


I have modified ide-cd.c with

1) ndelay(400) at the head of cdrom_newpc_intr()

2) Alan Cox's patch in the place he originally identified for it to go

3) Some printk's in cdrom_newpc_intr() after the point where it reads the
status and IREASON and length registers and just for the purposes of
diagnostics.

With only those changes it now works.



Alan Chandler
[email protected]

2004-11-22 11:32:50

by Jens Axboe

[permalink] [raw]
Subject: Re: ide-cd problem

On Mon, Nov 22 2004, Alan Chandler wrote:
> Jens Axboe writes:
>
> >On Mon, Nov 22 2004, Alan Chandler wrote:
> >>Jens Axboe writes:
> >>
> >>>On Mon, Nov 22 2004, Alan Chandler wrote:
> >>>>On Sunday 21 November 2004 16:13, Alan Chandler wrote:
> >>>>...
> >>>>>
> >>>>> This seems to be some combination of frequently occuring timing
> >>problem,
> >>>>> and the difference treatment in cdrom_newpc_intr to cdrom_pc_intr
> >>>>
> >>>>I put a ndelay(400) at the head of cdrom_newpc_intr and the problem of
> >>>>DRQ being set when there was no data to transfer disappeared. It
> >>>>appears that my hardware is too slow.
> >>>>
> >>>>I have been reading the ATA/ATAPI - 6 spec, and it implies that the
> >>>>state of DRQ line need one pio cycle before being correct and that you
> >>>>should read the alternative status register to achieve this. I tried
> >>>>a simple
> >>>>
> >>>>HWIF(drive)->INB( IDE_ALTSTATUS_REG);
> >>>>
> >>>>But that made no difference.
> >>>
> >>>ALTSTATUS read should be fine as well, but the implicit delay is
> >>>probably better.
> >>>
> >>
> >>I don't know why, but the ALTSTATUS read did NOT work when I tried it
> >>yesterday (am currently at work using web mail to access my mail - can't
> >>do more until this evening). Its possible I put it in the wrong place
> >>(ie after the cdrom_decode_status call, but I don't think so.
> >>
> >>The ndelay(400) did work.
> >>
> >>>Is this enough to fix it? For ->drq_interrupt we already should have
> >>>an adequate delay, Alan fixed this one recently.
> >>>
> >>
> >>Yes, I had included this patch quite early in my process of tracking
> >>the problem down (when I corrected it like you have (add the drive
> >>parameter to the OUTBSYNC macro like you have, you also need to
> >>declare an unsigned long flags at the head of the routine that was
> >>also not in that patch). Indeed it was this that was the inspiration
> >>for the 400 nanosecs in my change. I have no idea what the right
> >>number should be
> >
> >400ns is the correctl value. Your writing is a little unclear to me -
> >did it work or not, with that change alone?
> >
>
> To be clear ...
>
>
> I have modified ide-cd.c with
>
> 1) ndelay(400) at the head of cdrom_newpc_intr()
>
> 2) Alan Cox's patch in the place he originally identified for it to go
>
> 3) Some printk's in cdrom_newpc_intr() after the point where it reads the
> status and IREASON and length registers and just for the purposes of
> diagnostics.
>
> With only those changes it now works.

You are not answering my question :-)

Here's is Alans patch as I posted some mails ago. Does it work with that
alone?? I'm curious of it is enough. It should not be necessary to incur
extra delay in the interrupt handler, if it is invoked from a real irq.

===== drivers/ide/ide-cd.c 1.97 vs edited =====
--- 1.97/drivers/ide/ide-cd.c 2004-11-07 02:54:41 +01:00
+++ edited/drivers/ide/ide-cd.c 2004-11-22 08:58:15 +01:00
@@ -890,8 +890,13 @@
ide_execute_command(drive, WIN_PACKETCMD, handler, ATAPI_WAIT_PC, cdrom_timer_expiry);
return ide_started;
} else {
+ unsigned long flags;
+
/* packet command */
- HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
+ spin_lock_irqsave(&ide_lock, flags);
+ HWIF(drive)->OUTBSYNC(drive, WIN_PACKETCMD, IDE_COMMAND_REG);
+ ndelay(400);
+ spin_unlock_irqrestore(&ide_lock, flags);
return (*handler) (drive);
}
}

--
Jens Axboe

2004-11-22 12:55:19

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

Jens Axboe writes:

> On Mon, Nov 22 2004, Alan Chandler wrote:
>> Jens Axboe writes:

>> >400ns is the correctl value. Your writing is a little unclear to me -
>> >did it work or not, with that change alone?
>> >
>>
>> To be clear ...
>>
>>
>> I have modified ide-cd.c with
>>
>> 1) ndelay(400) at the head of cdrom_newpc_intr()
>>
>> 2) Alan Cox's patch in the place he originally identified for it to go
>>
>> 3) Some printk's in cdrom_newpc_intr() after the point where it reads the
>> status and IREASON and length registers and just for the purposes of
>> diagnostics.
>>
>> With only those changes it now works.
>
> You are not answering my question :-)
>
> Here's is Alans patch as I posted some mails ago. Does it work with that
> alone?? I'm curious of it is enough. It should not be necessary to incur
> extra delay in the interrupt handler, if it is invoked from a real irq.

Sorry, I misunderstood what you meant. I presume you think that the
interrupt may be triggered immediately the command packet has been sent but
before 400ns delay had occurred.

NO - with Alan's patch alone, this did not work.

The delay seesm to be needed in the path between the interrupt occuring and
the IDE_STATUS_REG being read.

I had seen an note on a web site that said that there was two delays
required in the ATA/ATAPI spec - the 400ns which Alan's patch deals with and
a shorter delay (one PIO cycle) between busy being cleared and DRQ reaching
the correct state where the technique had been to read the ALTSTATUS
register. That was why I had tried that approach but found it not to work.

(I have subsequently downloaded a copy of the full spec and haven't been
able to find this - but then its just short of 500 pages of dense text:-)).

Thinking about it now, I tried the ALTSTATUS delay before applying Alan's
patch, so maybe its the some of the two delays that maybe necessary. If you
think its appropriate I will try that again this evening.

--
Alan Chandler
[email protected]

2004-11-22 13:03:46

by Jens Axboe

[permalink] [raw]
Subject: Re: ide-cd problem

On Mon, Nov 22 2004, Alan Chandler wrote:
> Jens Axboe writes:
>
> >On Mon, Nov 22 2004, Alan Chandler wrote:
> >>Jens Axboe writes:
>
> >>>400ns is the correctl value. Your writing is a little unclear to me -
> >>>did it work or not, with that change alone?
> >>>
> >>
> >>To be clear ...
> >>
> >>
> >>I have modified ide-cd.c with
> >>
> >>1) ndelay(400) at the head of cdrom_newpc_intr()
> >>
> >>2) Alan Cox's patch in the place he originally identified for it to go
> >>
> >>3) Some printk's in cdrom_newpc_intr() after the point where it reads the
> >>status and IREASON and length registers and just for the purposes of
> >>diagnostics.
> >>
> >>With only those changes it now works.
> >
> >You are not answering my question :-)
> >
> >Here's is Alans patch as I posted some mails ago. Does it work with that
> >alone?? I'm curious of it is enough. It should not be necessary to incur
> >extra delay in the interrupt handler, if it is invoked from a real irq.
>
> Sorry, I misunderstood what you meant. I presume you think that the
> interrupt may be triggered immediately the command packet has been sent but
> before 400ns delay had occurred.
>
> NO - with Alan's patch alone, this did not work.
>
> The delay seesm to be needed in the path between the interrupt occuring and
> the IDE_STATUS_REG being read.
>
> I had seen an note on a web site that said that there was two delays
> required in the ATA/ATAPI spec - the 400ns which Alan's patch deals with
> and a shorter delay (one PIO cycle) between busy being cleared and DRQ
> reaching the correct state where the technique had been to read the
> ALTSTATUS register. That was why I had tried that approach but found it
> not to work.
> (I have subsequently downloaded a copy of the full spec and haven't been
> able to find this - but then its just short of 500 pages of dense text:-)).
>
> Thinking about it now, I tried the ALTSTATUS delay before applying Alan's
> patch, so maybe its the some of the two delays that maybe necessary. If
> you think its appropriate I will try that again this evening.

I think the more correct patch is the following. It seems I was wrong in
assuming that the ide_intr() path already waited 400ns for us, I think
this should work for you. Can you test it?

===== drivers/ide/ide-iops.c 1.31 vs edited =====
--- 1.31/drivers/ide/ide-iops.c 2004-11-01 18:06:50 +01:00
+++ edited/drivers/ide/ide-iops.c 2004-11-22 13:59:27 +01:00
@@ -476,10 +476,8 @@
if (drive->waiting_for_dma)
return hwif->ide_dma_test_irq(drive);

-#if 0
/* need to guarantee 400ns since last command was issued */
- udelay(1);
-#endif
+ ndelay(400);

#ifdef CONFIG_IDEPCI_SHARE_IRQ
/*

--
Jens Axboe

2004-11-22 19:22:59

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

On Monday 22 November 2004 13:02, Jens Axboe wrote:

>
> I think the more correct patch is the following. It seems I was wrong in
> assuming that the ide_intr() path already waited 400ns for us, I think
> this should work for you. Can you test it?

Bad news - it didn't work.

It certainly looks as though it should - I am trying to find out what not.


>
> ===== drivers/ide/ide-iops.c 1.31 vs edited =====
> --- 1.31/drivers/ide/ide-iops.c 2004-11-01 18:06:50 +01:00
> +++ edited/drivers/ide/ide-iops.c 2004-11-22 13:59:27 +01:00
> @@ -476,10 +476,8 @@
> if (drive->waiting_for_dma)
> return hwif->ide_dma_test_irq(drive);
>
> -#if 0
> /* need to guarantee 400ns since last command was issued */
> - udelay(1);
> -#endif
> + ndelay(400);
>
> #ifdef CONFIG_IDEPCI_SHARE_IRQ
> /*

--
Alan Chandler
[email protected]
First they ignore you, then they laugh at you,
then they fight you, then you win. --Gandhi

2004-11-22 23:54:10

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

On Monday 22 November 2004 19:19, Alan Chandler wrote:
> On Monday 22 November 2004 13:02, Jens Axboe wrote:
> > I think the more correct patch is the following. It seems I was wrong in
> > assuming that the ide_intr() path already waited 400ns for us, I think
> > this should work for you. Can you test it?
>
> Bad news - it didn't work.
>
> It certainly looks as though it should - I am trying to find out what not.
>

I meant of course "I am trying to find out why not".

If I make the delay 600ns it works - I guess my hardware is a little off spec.

If I leave the original udelay(1) in there, I get irq timeouts all over the
place.



--
Alan Chandler
[email protected]
First they ignore you, then they laugh at you,
then they fight you, then you win. --Gandhi

2004-11-23 07:14:56

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

On Monday 22 November 2004 23:48, Alan Chandler wrote:
...
> If I make the delay 600ns it works - I guess my hardware is a little off
> spec.
>

I did a binary chop on the value to find the cut off point between what works
and what doesn't. Its approx 535ns (534 failed, 537 worked).

All this was with 2.6.9,

2.6.10-rc2 is still failing during the cd initialisation on boot. Here I
tried with bot 600ns and 700ns delays in drive_is_ready, but both values fail
with what looks like missed interrupts. I'll try instrumenting this a bit
more to find out what is happening.

--
Alan Chandler
[email protected]
First they ignore you, then they laugh at you,
then they fight you, then you win. --Gandhi

2004-11-23 14:52:08

by Jens Axboe

[permalink] [raw]
Subject: Re: ide-cd problem

On Tue, Nov 23 2004, Alan Chandler wrote:
> On Monday 22 November 2004 23:48, Alan Chandler wrote:
> ...
> > If I make the delay 600ns it works - I guess my hardware is a little off
> > spec.
> >
>
> I did a binary chop on the value to find the cut off point between what works
> and what doesn't. Its approx 535ns (534 failed, 537 worked).
>
> All this was with 2.6.9,
>
> 2.6.10-rc2 is still failing during the cd initialisation on boot. Here I
> tried with bot 600ns and 700ns delays in drive_is_ready, but both values fail
> with what looks like missed interrupts. I'll try instrumenting this a bit
> more to find out what is happening.

It's getting more and more interesting! Look forward to hearing what
your instrumentation brings.

There are other reports of acpi causing interrupt problems with cdroms
in 2.6.10-rc2, so it would be best if you stuck to 2.6.9 for testing
this particular problem.

--
Jens Axboe

2004-11-23 18:39:07

by Jeff Garzik

[permalink] [raw]
Subject: Re: ide-cd problem

Jens Axboe wrote:
> /* packet command */
> - HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
> + spin_lock_irqsave(&ide_lock, flags);
> + HWIF(drive)->OUTBSYNC(drive, WIN_PACKETCMD, IDE_COMMAND_REG);
> + ndelay(400);
> + spin_unlock_irqrestore(&ide_lock, flags);
> return (*handler) (drive);


FWIW this ndelay(400) is required by the spec, when you submit a command.

Jeff


2004-11-23 19:18:23

by Jens Axboe

[permalink] [raw]
Subject: Re: ide-cd problem

On Tue, Nov 23 2004, Jeff Garzik wrote:
> Jens Axboe wrote:
> > /* packet command */
> >- HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
> >+ spin_lock_irqsave(&ide_lock, flags);
> >+ HWIF(drive)->OUTBSYNC(drive, WIN_PACKETCMD, IDE_COMMAND_REG);
> >+ ndelay(400);
> >+ spin_unlock_irqrestore(&ide_lock, flags);
> > return (*handler) (drive);
>
>
> FWIW this ndelay(400) is required by the spec, when you submit a command.

Yes, I know it isn't pulled out of thin air :)

--
Jens Axboe

2004-11-23 21:52:51

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

On Tuesday 23 November 2004 14:51, Jens Axboe wrote:
> On Tue, Nov 23 2004, Alan Chandler wrote:
> > On Monday 22 November 2004 23:48, Alan Chandler wrote:
> > ...
> >
> > > If I make the delay 600ns it works - I guess my hardware is a little
> > > off spec.
> >
> > I did a binary chop on the value to find the cut off point between what
> > works and what doesn't. Its approx 535ns (534 failed, 537 worked).
> >
> > All this was with 2.6.9,
> >
> > 2.6.10-rc2 is still failing during the cd initialisation on boot. Here I
> > tried with bot 600ns and 700ns delays in drive_is_ready, but both values
> > fail with what looks like missed interrupts. I'll try instrumenting this
> > a bit more to find out what is happening.
>
> It's getting more and more interesting! Look forward to hearing what
> your instrumentation brings.
>
> There are other reports of acpi causing interrupt problems with cdroms
> in 2.6.10-rc2, so it would be best if you stuck to 2.6.9 for testing
> this particular problem.

There is good and bad news related to 2.6.10-rc2

The good news is that the acpi problem was the cause of the startup issues.
adding pci=noacpi to the boot command line fixed that.

The bad news is that with the delay at 800ns in drive_is_ready() I am getting
the exact same symptoms I got with 2.6.9 before upping the delay to over
540ns.

Before, I thought my hardware was a little out of spec - now I think there is
something else at play here.

Firstly, the symptoms are the same between 2.6.9 and 2.6.10-rc2. The halt
seem to always be in exactly the same place. If it was a timing problem I
would have thought it would have varied.

Secondly, the command before seems to have an expectation of a 2048 transfer
rather than the 0 in the command, before the problem and then you get the
strange DRQ=1 but 0 in the len register.

Nov 23 20:37:33 kanger kernel: ide-cd:ide_do_rq_cdrom - cmd = 0x0
Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x0 stat=0x50
ireason=3 len=2048 rq len=0
Nov 23 20:37:33 kanger kernel: ide-cd:ide_do_rq_cdrom - cmd = 0x1b
Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x1b stat=0x58
ireason=2 len=0 rq len=0

I have got myself a copy of the ATA/ATAPI spec (document T13/1410D revision
3), I think I need to read more of it to understand what is the code is
trying to do.



--
Alan Chandler
[email protected]
First they ignore you, then they laugh at you,
then they fight you, then you win. --Gandhi

2004-11-26 23:46:11

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

On Tuesday 23 November 2004 21:49, Alan Chandler wrote:

>
> Before, I thought my hardware was a little out of spec - now I think there
> is something else at play here.
>

Firstly, I think there might be another race condition like the one Alan Cox
found. I attach a patch below with the fix for that (against 2.6.10-rc2, an
including Alan's patch) I'm not 100% sure its necessary, but it seems fix a
variation I have been seeing.

With it in place, and apart from the ongoing issue - see below, I have managed
to remove the delay in drive_is_ready() altogether without any ill effects.

[my reading of the ATA spec is that 400ns is needed after reading the status
reg before IRQ is removed, I had wondered whether it would be better to
record the time here and then check whether we had used up the 400ns just
before returning from the interrupt state]

> Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x0 stat=0x50
> ireason=3 len=2048 rq len=0
..
> Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x1b stat=0x58
> ireason=2 len=0 rq len=0

I think these two lines hold the crux of the problem. They are the result of
a printk in cdrom_newpc_intr just after reading the interrupt reason register
and byte count registers.

However, I have been unable to get any closer as to why DRQ gets set.

The patch

--- ide-cd.c 2004-11-26 20:29:59.000000000 +0000
+++ ide-cd.patch.c 2004-11-26 20:51:23.000000000 +0000
@@ -890,8 +890,12 @@
ide_execute_command(drive, WIN_PACKETCMD, handler, ATAPI_WAIT_PC,
cdrom_timer_expiry);
return ide_started;
} else {
+ unsigned long flags;
+ spin_lock_irqsave(&ide_lock, flags);
+ HWIF(drive)->OUTBSYNC(drive, WIN_PACKETCMD, IDE_COMMAND_REG);
+ ndelay(400);
+ spin_unlock_irqrestore(&ide_lock, flags);
/* packet command */
- HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
return (*handler) (drive);
}
}
@@ -938,8 +942,12 @@
cmd_len = ATAPI_MIN_CDB_BYTES;

/* Send the command to the device. */
+ unsigned long flags;
+ spin_lock_irqsave(&ide_lock, flags);
HWIF(drive)->atapi_output_bytes(drive, rq->cmd, cmd_len);
-
+ ndelay(400);
+ spin_unlock_irqrestore(&ide_lock, flags);
+
/* Start the DMA if need be */
if (info->dma)
hwif->dma_start(drive);

--
Alan Chandler
[email protected]
First they ignore you, then they laugh at you,
then they fight you, then you win. --Gandhi

2004-11-27 01:48:12

by Jens Axboe

[permalink] [raw]
Subject: Re: ide-cd problem

On Wed, Nov 24 2004, Alan Cox wrote:
> > the following. It seems I was wrong in
> > assuming that the ide_intr() path already waited 400ns for us, I think
> > this should work for you. Can you test it?
>
> The locking on ide_execute_command and friends is supposed to ensure
> this, can we please keep it out of the IRQ handler once its debugged ?

Something is funky with this drive, it requires an extra delay after it
has raised an interrupt. But we can restrict it to ide-cd once it's
fully understood.

--
Jens Axboe

2004-11-27 01:51:54

by Alan

[permalink] [raw]
Subject: Re: ide-cd problem

> the following. It seems I was wrong in
> assuming that the ide_intr() path already waited 400ns for us, I think
> this should work for you. Can you test it?

The locking on ide_execute_command and friends is supposed to ensure
this, can we please keep it out of the IRQ handler once its debugged ?

2004-11-27 01:51:55

by Alan

[permalink] [raw]
Subject: Re: ide-cd problem

On Iau, 2004-11-25 at 15:29, Jens Axboe wrote:
> Something is funky with this drive, it requires an extra delay after it
> has raised an interrupt. But we can restrict it to ide-cd once it's
> fully understood.

You are assuming its a drive issue. Some controllers have errata around
this area that might be involved too. We can certainly stick a quirk for
IRQ delay into the drive tho

2004-11-27 02:11:52

by Jens Axboe

[permalink] [raw]
Subject: Re: ide-cd problem

On Thu, Nov 25 2004, Alan Cox wrote:
> On Iau, 2004-11-25 at 15:29, Jens Axboe wrote:
> > Something is funky with this drive, it requires an extra delay after it
> > has raised an interrupt. But we can restrict it to ide-cd once it's
> > fully understood.
>
> You are assuming its a drive issue. Some controllers have errata around
> this area that might be involved too. We can certainly stick a quirk for
> IRQ delay into the drive tho

That's true - Alan (Chandler), can you provide an lspci of the system as
well?

--
Jens Axboe

2004-11-27 06:02:59

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

On Thursday 25 November 2004 18:12, Jens Axboe wrote:
> On Thu, Nov 25 2004, Alan Cox wrote:
> > On Iau, 2004-11-25 at 15:29, Jens Axboe wrote:
> > > Something is funky with this drive, it requires an extra delay after it
> > > has raised an interrupt. But we can restrict it to ide-cd once it's
> > > fully understood.
> >
> > You are assuming its a drive issue. Some controllers have errata around
> > this area that might be involved too. We can certainly stick a quirk for
> > IRQ delay into the drive tho
>
> That's true - Alan (Chandler), can you provide an lspci of the system as
> well?

lspci -vvx output

0000:00:00.0 Host bridge: Silicon Integrated Systems [SiS] 735 Host (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort+ >SERR- <PERR-
Latency: 32
Region 0: Memory at d0000000 (32-bit, non-prefetchable) [size=64M]
Capabilities: <available only to root>
00: 39 10 35 07 07 00 10 22 01 00 00 06 00 20 80 00
10: 00 00 00 d0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 c0 00 00 00 00 00 00 00 00 00 00 00

0000:00:01.0 PCI bridge: Silicon Integrated Systems [SiS] Virtual PCI-to-PCI
bridge (AGP) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
I/O behind bridge: 0000b000-0000bfff
Memory behind bridge: cfe00000-cfefffff
Prefetchable memory behind bridge: afc00000-cfcfffff
BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
00: 39 10 01 00 07 01 00 00 00 00 04 06 00 40 01 00
10: 00 00 00 00 00 00 00 00 00 01 01 40 b0 b0 00 20
20: e0 cf e0 cf c0 af c0 cf 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0a 00

0000:00:02.0 ISA bridge: Silicon Integrated Systems [SiS] SiS85C503/5513 (LPC
Bridge)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 0
00: 39 10 18 00 0f 00 00 02 00 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:02.1 SMBus: Silicon Integrated Systems [SiS]: Unknown device 0016
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Interrupt: pin B routed to IRQ 5
Region 4: I/O ports at 0c00 [size=32]
00: 39 10 16 00 01 00 00 02 00 00 05 0c 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 0c 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00

0000:00:02.2 USB Controller: Silicon Integrated Systems [SiS] USB 1.0
Controller (rev 07) (prog-if 10 [OHCI])
Subsystem: Elitegroup Computer Systems: Unknown device 0a14
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping-
SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64 (20000ns max), Cache Line Size: 0x08 (32 bytes)
Interrupt: pin D routed to IRQ 11
Region 0: Memory at cfffe000 (32-bit, non-prefetchable) [size=4K]
00: 39 10 01 70 17 01 80 02 07 10 03 0c 08 40 00 00
10: 00 e0 ff cf 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 19 10 14 0a
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 04 00 50

0000:00:02.3 USB Controller: Silicon Integrated Systems [SiS] USB 1.0
Controller (rev 07) (prog-if 10 [OHCI])
Subsystem: Elitegroup Computer Systems: Unknown device 0a14
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping-
SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR+
Latency: 64 (20000ns max), Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 10
Region 0: Memory at cffff000 (32-bit, non-prefetchable) [size=4K]
00: 39 10 01 70 17 01 80 82 07 10 03 0c 08 40 00 00
10: 00 f0 ff cf 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 19 10 14 0a
30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 50

0000:00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE] (rev
d0) (prog-if 80 [Master])
Subsystem: Silicon Integrated Systems [SiS] SiS5513 EIDE Controller (A,B
step)
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 128
Region 4: I/O ports at ff00 [size=16]
00: 39 10 13 55 05 00 00 00 d0 80 01 01 00 80 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 ff 00 00 00 00 00 00 00 00 00 00 39 10 13 55
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:0b.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev
07)
Subsystem: Creative Labs SBLive! Player 5.1
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64 (500ns min, 5000ns max)
Interrupt: pin A routed to IRQ 5
Region 0: I/O ports at d800 [size=32]
Capabilities: <available only to root>
00: 02 11 02 00 05 01 90 02 07 00 01 04 00 40 80 00
10: 01 d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 02 11 61 80
30: 00 00 00 00 dc 00 00 00 00 00 00 00 05 01 02 14

0000:00:0b.1 Input device controller: Creative Labs SB Live! MIDI/Game Port
(rev 07)
Subsystem: Creative Labs Gameport Joystick
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64
Region 0: I/O ports at dc00 [size=8]
Capabilities: <available only to root>
00: 02 11 02 70 05 01 90 02 07 00 80 09 00 40 80 00
10: 01 dc 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 02 11 20 00
30: 00 00 00 00 dc 00 00 00 00 00 00 00 00 00 00 00

0000:00:0f.0 Ethernet controller: 3Com Corporation 3cSOHO100-TX Hurricane (rev
30)
Subsystem: 3Com Corporation 3cSOHO100-TX Hurricane
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping-
SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64 (2500ns min, 2500ns max), Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at d400 [size=128]
Region 1: Memory at cfffdf80 (32-bit, non-prefetchable) [size=128]
Expansion ROM at cffc0000 [disabled] [size=128K]
Capabilities: <available only to root>
00: b7 10 46 76 17 01 10 02 30 00 00 02 08 40 00 00
10: 01 d4 00 00 80 df ff cf 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 b7 10 46 76
30: 00 00 fc cf dc 00 00 00 00 00 00 00 0b 01 0a 0a

0000:01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon
9200] (rev 01) (prog-if 00 [VGA])
Subsystem: Giga-byte Technology: Unknown device 4018
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64 (2000ns min), Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 11
Region 0: Memory at c0000000 (32-bit, prefetchable) [size=128M]
Region 1: I/O ports at b800 [size=256]
Region 2: Memory at cfef0000 (32-bit, non-prefetchable) [size=64K]
Expansion ROM at cfec0000 [disabled] [size=128K]
Capabilities: <available only to root>
00: 02 10 61 59 07 01 b0 02 01 00 00 03 08 40 80 00
10: 08 00 00 c0 01 b8 00 00 00 00 ef cf 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 58 14 18 40
30: 00 00 ec cf 58 00 00 00 00 00 00 00 00 01 08 00

0000:01:00.1 Display controller: ATI Technologies Inc RV280 [Radeon 9200]
(Secondary) (rev 01)
Subsystem: Giga-byte Technology: Unknown device 4019
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64 (2000ns min), Cache Line Size: 0x08 (32 bytes)
Region 0: Memory at b8000000 (32-bit, prefetchable) [size=128M]
Region 1: Memory at cfee0000 (32-bit, non-prefetchable) [size=64K]
Capabilities: <available only to root>
00: 02 10 41 59 07 00 b0 02 01 00 80 03 08 40 00 00
10: 08 00 00 b8 00 00 ee cf 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 58 14 19 40
30: 00 00 00 00 50 00 00 00 00 00 00 00 ff 00 08 00


--
Alan Chandler
[email protected]
First they ignore you, then they laugh at you,
then they fight you, then you win. --Gandhi

2004-11-29 20:46:20

by Bill Davidsen

[permalink] [raw]
Subject: Re: ide-cd problem

Alan Chandler wrote:
> On Tuesday 23 November 2004 21:49, Alan Chandler wrote:
>
>
>>Before, I thought my hardware was a little out of spec - now I think there
>>is something else at play here.
>>
>
>
> Firstly, I think there might be another race condition like the one Alan Cox
> found. I attach a patch below with the fix for that (against 2.6.10-rc2, an
> including Alan's patch) I'm not 100% sure its necessary, but it seems fix a
> variation I have been seeing.
>
> With it in place, and apart from the ongoing issue - see below, I have managed
> to remove the delay in drive_is_ready() altogether without any ill effects.
>
> [my reading of the ATA spec is that 400ns is needed after reading the status
> reg before IRQ is removed, I had wondered whether it would be better to
> record the time here and then check whether we had used up the 400ns just
> before returning from the interrupt state]

The method is sound, I did some industrial control in a previous job and
used a back-to-back timer in a similar way. Just before starting an
operation I checked to see that it wasn't too soon and did a delay if so.

Depending on the path through the code, it may be easier to get right by
just putting in the delay.

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2004-11-30 09:03:42

by Alan Chandler

[permalink] [raw]
Subject: Re: ide-cd problem

On Tuesday 23 November 2004 21:49, Alan Chandler wrote:
> On Tuesday 23 November 2004 14:51, Jens Axboe wrote:
> > On Tue, Nov 23 2004, Alan Chandler wrote:
> > > On Monday 22 November 2004 23:48, Alan Chandler wrote:
> > > ...
> > >
> > > > If I make the delay 600ns it works - I guess my hardware is a little
> > > > off spec.
...
> There is good and bad news related to 2.6.10-rc2
>
> The good news is that the acpi problem was the cause of the startup issues.
> adding pci=noacpi to the boot command line fixed that.
>
> The bad news is that with the delay at 800ns in drive_is_ready() I am
> getting the exact same symptoms I got with 2.6.9 before upping the delay to
> over 540ns.
...
>
> Firstly, the symptoms are the same between 2.6.9 and 2.6.10-rc2. The halt
> seem to always be in exactly the same place. If it was a timing problem I
> would have thought it would have varied.
>
> Secondly, the command before seems to have an expectation of a 2048
> transfer rather than the 0 in the command, before the problem and then you
> get the strange DRQ=1 but 0 in the len register.
>
> Nov 23 20:37:33 kanger kernel: ide-cd:ide_do_rq_cdrom - cmd = 0x0
> Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x0 stat=0x50
> ireason=3 len=2048 rq len=0
> Nov 23 20:37:33 kanger kernel: ide-cd:ide_do_rq_cdrom - cmd = 0x1b
> Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x1b stat=0x58
> ireason=2 len=0 rq len=0
>

I have looked at this "every which way" to try and understand what is
happening, and now think I have a handle on it.

The Test Unit Ready command (cmd=0x00) is the first command in the sequence
that is issued with an expectation of no data being returned (rq len=0), but
for some reason (which I don't know why) the device has decided it has data
to send us (len=2048). I was confused for a long time because in normal
circumstances cdrom_newpc_intr() is entered twice for a packet command with a
data transfer attached, the first time with DRQ asserted and rq len == len,
and once the transfer has taken place with DRQ not asserted and rq len having
been set to 0 whilst len remains at its original value. But in the case
shown above, the first time cdrom_newpc_intr() is entered rq len (actually
rq->data_len) is set to zero

In this case and (as far as I can see related to the packet commands) only
this case, with my drive (CW078D), there is a delay between IRQ being
asserted to say the command has been completed and DRQ being raised. (DRQ for
the other cases seems to be asserted by the time of the interrupt).

Once DRQ is asserted in this way, and until some command tries to read the
data it wants to send us, it remains asserted. That is why the start stop
unit command (0x1b) appears to be the one that is failing. In one trial I
conducted I made this command into a no op - returning successfully, and the
problem just full into the next command in sequence.

I was also experiencing different symptoms between 2.6.9 and 2.6.10-rc2, in
that a change made to solve the problem in 2.6.9 (changing the delay in
drive_is_ready to 600ns) did not appear to work in 2.6.10-rc2. I haven't
quite got to the bottom of this - but as the patch below shows, I have
plugged what might be race conditions similar to the one identified by Alan
Cox. I do not fully understand the logic of the Alan's race condition, as
from comments elsewhere in the code (and I really don't know how it really
works) it says linux does not allow rentrant interrupts on the same unit - so
I don't understand why the spinlocks need to be there, so my logic maybe
totally spurious in this case - except it did make the difference between it
working in 2.6.9 and getting it to work in 2.6.10-rc2.

I did attempt in early tests I did with 2.6.9 to determine the accurately the
size of the delay needed, it appeared to be around 540ns (400ns was not
enough). I have done all recent tests with 2.6.10-rc2 and 400ns is not
enough 600ns is. I have searched all over the ATAPI spec, but I can not see
any really definitive statement on DRQ assertion in this case. There is of
course the 400ns delay in removing IRQ after either reading the status
register or writing to the command register (clause 5.2.9), which the code
seems to interpret as 400ns after either occurrence, but which my reading of
the spec would be 400ns after the first occurence of one of these. As I
indicated in a previous note on this, as we always read the status register
at the start of interrupt processing (in drive_is_ready), perhaps the
solution is to store jiffies at this point and then delay 400ns - difference
between jiffies now and as we exit ide_intr().

The other place where any delay is mentioned is when transitioning from bus
idle state to various command state following issue of a command. Again the
400ns is mentioned. In the specific case of the packet command, this delay
seems to need to be when the packet command is written - and before the
command packet is transfered. The only delay mentioned after that point is
the one pio cycle (where it recommends reading the alt status reg) before
checking status. Given that we read the status twice (drive_is_ready and
cdrom_decode_status) it appears we fufil this rule.

In fact, unless someone can point me at another place in the spec, the
implication is that DRQ will be correct at the time of the interrupt and two
reads of the status register that we do when entering cdrom_newpc_intr. So I
Have to conclude that my CD is operating outside of spec in this case.

This patch below is not a definitive answer to the issue, rather an
illustration of the test envionment I was using to show things working. You
will see from the patches in drivers/ide/ide-iops.c that I have removed most
of the timing delays, mainly to get a real handle on what was needed in
ide-cd.c. I never had any problem with delays apart from the one described
above

Index: drivers/ide/ide-iops.c
===================================================================
--- drivers/ide/ide-iops.c (.../vendor/2.6.10-rc2) (revision 14)
+++ drivers/ide/ide-iops.c (.../akc/delay) (revision 14)
@@ -476,10 +476,6 @@
if (drive->waiting_for_dma)
return hwif->ide_dma_test_irq(drive);

-#if 0
- /* need to guarantee 400ns since last command was issued */
- udelay(1);
-#endif

#ifdef CONFIG_IDEPCI_SHARE_IRQ
/*
@@ -560,7 +556,6 @@
return 1;
}

- udelay(1); /* spec allows drive 400ns to assert "BUSY" */
if ((stat = hwif->INB(IDE_STATUS_REG)) & BUSY_STAT) {
local_irq_set(flags);
timeout += jiffies;
@@ -589,11 +584,13 @@
* rather than expensively fail things immediately.
* This fix courtesy of Matthew Faupel & Niccolo Rigacci.
*/
- for (i = 0; i < 10; i++) {
- udelay(1);
+
+ /* take out this fix temporarily to test out what is happening without
distractions*/
+/* for (i = 0; i < 10; i++) {
+ udelay(1);*/
if (OK_STAT((stat = hwif->INB(IDE_STATUS_REG)), good, bad))
return 0;
- }
+/* }*/
*startstop = DRIVER(drive)->error(drive, "status error", stat);
return 1;
}
Index: drivers/ide/ide-cd.c
===================================================================
--- drivers/ide/ide-cd.c (.../vendor/2.6.10-rc2) (revision 14)
+++ drivers/ide/ide-cd.c (.../akc/delay) (revision 14)
@@ -890,8 +890,13 @@
ide_execute_command(drive, WIN_PACKETCMD, handler, ATAPI_WAIT_PC,
cdrom_timer_expiry);
return ide_started;
} else {
- /* packet command */
- HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
+ unsigned long flags;
+
+ /* packet command */
+ spin_lock_irqsave(&ide_lock, flags);
+ HWIF(drive)->OUTBSYNC(drive, WIN_PACKETCMD, IDE_COMMAND_REG);
+ ndelay(400);
+ spin_unlock_irqrestore(&ide_lock, flags);
return (*handler) (drive);
}
}
@@ -938,8 +943,12 @@
cmd_len = ATAPI_MIN_CDB_BYTES;

/* Send the command to the device. */
+ unsigned long flags;
+ spin_lock_irqsave(&ide_lock, flags);
HWIF(drive)->atapi_output_bytes(drive, rq->cmd, cmd_len);
-
+ ndelay(400);
+ spin_unlock_irqrestore(&ide_lock, flags);
+
/* Start the DMA if need be */
if (info->dma)
hwif->dma_start(drive);
@@ -1649,6 +1658,8 @@
xfer_func_t *xferfunc;
unsigned long flags;

+ /* delay here enough to ensure the command has had time to complete and
decide if its going to set DRQ */
+ ndelay(600);
/* Check for errors. */
dma_error = 0;
dma = info->dma;
@@ -1665,7 +1676,6 @@
*/
if (dma) {
if (dma_error) {
- printk("ide-cd: dma error\n");
__ide_dma_off(drive);
return DRIVER(drive)->error(drive, "dma error", stat);
}
@@ -1737,10 +1747,19 @@
if (blen > thislen)
blen = thislen;

- xferfunc(drive, ptr, blen);
-
thislen -= blen;
len -= blen;
+
+ if((thislen <=0) && (len<=0)) {
+ unsigned long flags;
+ /* this is the last of this output */
+ spin_lock_irqsave(&ide_lock, flags);
+ xferfunc(drive, ptr, blen);
+ ndelay (400);
+ spin_unlock_irqrestore(&ide_lock, flags);
+ } else
+ xferfunc(drive,ptr,blen);
+
rq->data_len -= blen;

if (rq->bio)
@@ -1748,7 +1767,7 @@
else
rq->data += blen;
}
-
+
/*
* pad, if necessary
*/
@@ -1756,10 +1775,19 @@
while (len > 0) {
int pad = 0;

- xferfunc(drive, &pad, sizeof(pad));
len -= sizeof(pad);
+ if (len <= 0) {
+ unsigned long flags;
+
+ /* last output */
+ spin_lock_irqsave(&ide_lock,flags);
+ xferfunc(drive, &pad, sizeof(pad));
+ ndelay(400);
+ spin_unlock_irqrestore(&ide_lock,flags);
+ } else
+ xferfunc(drive, &pad, sizeof(pad));
}
- }
+ }

if (HWGROUP(drive)->handler != NULL)
BUG();



--
Alan Chandler
[email protected]
First they ignore you, then they laugh at you,
then they fight you, then you win. --Gandhi