2002-03-18 18:17:31

by Martin Dalecki

[permalink] [raw]
Subject: Re: Console output from IDE lockup

Adam J. Richter wrote:
> I don't know if this is helpful, but I was able to extract the
> console output from a machine got an IDE hang. I have not seen these
> messages previously, and they may have been output long before the IDE
> hang and be unrelated. However, they are the last thing in the console
> log:
>
> hda: timeout waiting for DMA
> hda: ide_dma_timeout: Lets do it again!stat = 0x58, dma_stat = 0x20
> hda: DMA disabled
> hda: ide_set_handler: handler not null; old=d8802258, new=d88008ac
> bug: kernel timer added twice at d8801f51.


Ah this actually gives some insight!

It is showing us, that the problem is occurring in the
case where the drivers fall back from DMA mode to PIO
transfer mode.

Apparently it seems that in this case we don't recover properly
from the previous transfer.

But, since you mention that the lockup is very infrequent
and there are quite a lot of places in the IDE code, where
timeout values are directly compared by subtracting from
jiffies it could be that we have here just a simple case
of 32 bit timer overflow. (Assuming that you have a *very*
highly clocked CPU.)

Since unfortunately the code path involved here depends
upon the particular chipset you use:
ll {} \;
./include/linux/ide.h: ide_dma_lostirq, ide_dma_
timeout
./drivers/ide/icside.c: case ide_dma_timeout:
./drivers/ide/hpt366.c: case ide_dma_timeout:
./drivers/ide/hpt366.c: func = ide_dma_timeout;
./drivers/ide/hpt366.c: case ide_dma_timeout:
./drivers/ide/ide-dma.c: case ide_dma_timeout:
./drivers/ide/pdc202xx.c: case ide_dma_timeout:
./drivers/ide/aec62xx.c: case ide_dma_timeout:
./drivers/ide/ide.c:void ide_dma_timeout_retry(ide_drive_t *drive)
./drivers/ide/ide.c: hwif->dmaproc(ide_dma_timeout, drive);
./drivers/ide/ide.c: ide_dma_timeout_retry(dr
ive);
./drivers/ide/ide-features.c: case ide_dma_timeout: return("
ide_dma_timeout");
./drivers/ide/ide-pmac.c: case ide_dma_timeout:
[root@kozaczek linux]#

Could you have a look at the corresponding ide_dma_timeout
or ide_dma_lostirq usage places by just comparing whatever
there are some omissions in comparision of the different
implementations found there? (Four eyes may see more than
just two...) Thank you in advance.

I will just try to replace all the subtractions done
there by proper additive values at the places where the
ticks matter. This has to be done anywaym, since the macros
time_before and time_after are for serious reasons
out there ;-).