Working on tracing down Fedora bug #115458
(https://bugzilla.redhat.com/bugzilla/process_bug.cgi) I found what
appears to be a race between the IDE CD driver and the hardware status.
It doesn't appear to explain the bug at all but it does look like a bug
of itself
When we issue an ide command the status bits don't become valid for
400nS. In the DMA case ide_execute_command handles this but in the PIO
case we don't do the needed locking, use OUTBSYNC to avoid posting or
delay. This means that in some situations we can execute the command
handler in PIO mode before the command status bits are valid and the
handler may read and act wrongly.
--- drivers/ide/ide-cd.c~ 2004-11-17 14:08:42.950485320 +0000
+++ drivers/ide/ide-cd.c 2004-11-17 14:08:42.951485168 +0000
@@ -897,7 +897,10 @@
return ide_started;
} else {
/* packet command */
- HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
+ spin_lock_irqsave(&ide_lock, flags);
+ HWIF(drive)->OUTBSYNC(WIN_PACKETCMD, IDE_COMMAND_REG);
+ ndelay(400);
+ spin_unlock_irqsave(&ide_lock, flags);
return (*handler) (drive);
}
}
On Wed, Nov 17 2004, Alan Cox wrote:
> Working on tracing down Fedora bug #115458
> (https://bugzilla.redhat.com/bugzilla/process_bug.cgi) I found what
> appears to be a race between the IDE CD driver and the hardware status.
> It doesn't appear to explain the bug at all but it does look like a bug
> of itself
>
> When we issue an ide command the status bits don't become valid for
> 400nS. In the DMA case ide_execute_command handles this but in the PIO
> case we don't do the needed locking, use OUTBSYNC to avoid posting or
> delay. This means that in some situations we can execute the command
> handler in PIO mode before the command status bits are valid and the
> handler may read and act wrongly.
>
> --- drivers/ide/ide-cd.c~ 2004-11-17 14:08:42.950485320 +0000
> +++ drivers/ide/ide-cd.c 2004-11-17 14:08:42.951485168 +0000
> @@ -897,7 +897,10 @@
> return ide_started;
> } else {
> /* packet command */
> - HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
> + spin_lock_irqsave(&ide_lock, flags);
> + HWIF(drive)->OUTBSYNC(WIN_PACKETCMD, IDE_COMMAND_REG);
> + ndelay(400);
> + spin_unlock_irqsave(&ide_lock, flags);
> return (*handler) (drive);
> }
> }
What good does the lock do?
--
Jens Axboe
On Mer, 2004-11-17 at 15:37, Jens Axboe wrote:
> > - HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
> > + spin_lock_irqsave(&ide_lock, flags);
> > + HWIF(drive)->OUTBSYNC(WIN_PACKETCMD, IDE_COMMAND_REG);
> > + ndelay(400);
> > + spin_unlock_irqsave(&ide_lock, flags);
> > return (*handler) (drive);
> > }
> > }
>
> What good does the lock do?
The same as in ide_execute_command - make sure we don't take an IDE
interrupt that tries to read the state during the delay. That is the old
2.4 "may drives shared IRQ random fails" fix and why the lock is taken
in ide_execute_command.
On Wed, Nov 17 2004, Alan Cox wrote:
> On Mer, 2004-11-17 at 15:37, Jens Axboe wrote:
> > > - HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
> > > + spin_lock_irqsave(&ide_lock, flags);
> > > + HWIF(drive)->OUTBSYNC(WIN_PACKETCMD, IDE_COMMAND_REG);
> > > + ndelay(400);
> > > + spin_unlock_irqsave(&ide_lock, flags);
> > > return (*handler) (drive);
> > > }
> > > }
> >
> > What good does the lock do?
>
> The same as in ide_execute_command - make sure we don't take an IDE
> interrupt that tries to read the state during the delay. That is the old
> 2.4 "may drives shared IRQ random fails" fix and why the lock is taken
> in ide_execute_command.
Ah I see, makes sense. I didn't think about the shared irq case.
--
Jens Axboe
On Wed, Nov 17 2004, Jens Axboe wrote:
> On Wed, Nov 17 2004, Alan Cox wrote:
> > Working on tracing down Fedora bug #115458
> > (https://bugzilla.redhat.com/bugzilla/process_bug.cgi) I found what
> > appears to be a race between the IDE CD driver and the hardware status.
> > It doesn't appear to explain the bug at all but it does look like a bug
> > of itself
> >
> > When we issue an ide command the status bits don't become valid for
> > 400nS. In the DMA case ide_execute_command handles this but in the PIO
> > case we don't do the needed locking, use OUTBSYNC to avoid posting or
> > delay. This means that in some situations we can execute the command
> > handler in PIO mode before the command status bits are valid and the
> > handler may read and act wrongly.
> >
> > --- drivers/ide/ide-cd.c~ 2004-11-17 14:08:42.950485320 +0000
> > +++ drivers/ide/ide-cd.c 2004-11-17 14:08:42.951485168 +0000
> > @@ -897,7 +897,10 @@
> > return ide_started;
> > } else {
> > /* packet command */
> > - HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
> > + spin_lock_irqsave(&ide_lock, flags);
> > + HWIF(drive)->OUTBSYNC(WIN_PACKETCMD, IDE_COMMAND_REG);
> > + ndelay(400);
> > + spin_unlock_irqsave(&ide_lock, flags);
> > return (*handler) (drive);
> > }
btw alan, have you attempted to compile this? It averages 2 errors out
of 4 lines :)
--
Jens Axboe
On Llu, 2004-11-22 at 07:58, Jens Axboe wrote:
> > > + spin_unlock_irqsave(&ide_lock, flags);
> > > return (*handler) (drive);
> > > }
>
> btw alan, have you attempted to compile this? It averages 2 errors out
> of 4 lines :)
Guess why it said "for comment". The one that does compile is in current
-ac
but the fixes are kind of obvious 8)