2004-03-18 10:56:56

by Michael Frank

Subject: 2.6.4 under heavy ioload disables sis5513 DMA

Happens every few hours with heavy io and cpu load:

hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error
hda: dma timeout error: status=0xd0 { Busy }

hda: DMA disabled
ide0: reset: success

DMA is automatically re-enabled because hdparm -k is set at boot.
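The two status bytes in the log can be decoded by hand. The sketch below assumes the standard bus-master IDE status layout from SFF-8038i (bit 0 active, bit 1 error, bit 2 interrupt, bits 5 and 6 drive 0/1 DMA-capable, bit 7 simplex) and the ATA status bits as named in the kernel's <linux/hdreg.h>. The short labels for the bus-master bits are made up for readability; this is a decoding aid, not kernel code.

```python
from __future__ import annotations

# Bus-master IDE status register bits (SFF-8038i). Labels are hypothetical.
BMDMA_BITS = {
    0x01: "ACTIVE",        # bus-master engine still running
    0x02: "ERROR",         # DMA transfer error
    0x04: "INTR",          # drive interrupt latched by the controller
    0x20: "DRV0_DMA_CAP",  # drive 0 marked DMA-capable
    0x40: "DRV1_DMA_CAP",  # drive 1 marked DMA-capable
    0x80: "SIMPLEX",       # only one channel may do DMA at a time
}

# ATA status register bits, named as in <linux/hdreg.h>.
ATA_STATUS_BITS = {
    0x80: "BUSY_STAT",
    0x40: "READY_STAT",
    0x20: "WRERR_STAT",
    0x10: "SEEK_STAT",
    0x08: "DRQ_STAT",
    0x04: "ECC_STAT",
    0x02: "INDEX_STAT",
    0x01: "ERR_STAT",
}


def decode(value: int, table: dict[int, str]) -> list[str]:
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]


if __name__ == "__main__":
    print("dma status 0x21:", decode(0x21, BMDMA_BITS))
    print("ata status 0xd0:", decode(0xD0, ATA_STATUS_BITS))
```

With these tables, 0x21 decodes to DRV0_DMA_CAP and ACTIVE with the INTR bit clear, which suggests the bus-master engine was still running and no completion interrupt had been latched. 0xd0 decodes to BUSY_STAT, READY_STAT and SEEK_STAT; since the ATA spec treats the other status bits as not valid while BSY is set, that is presumably why the kernel prints only { Busy }.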

lspci -vv

00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE] (prog-if 80 [Master])
    Subsystem: Micro-Star International Co., Ltd.: Unknown device 5332
    Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 128
    Interrupt: pin ? routed to IRQ 10
    Region 4: I/O ports at 4000 [size=16]
    Capabilities: [58] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-


hdparm -iI

/dev/hda:

Model=IC35L090AVV207-0, FwRev=V23OA63A, SerialNo=VNVC00G3CABSMD
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=52
BuffType=DualPortCache, BuffSize=1821kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=160836480
IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: ATA/ATAPI-6 T13 1410D revision 3a: 2 3 4 5 6


ATA device, with non-removable media
    powers-up in standby; SET FEATURES subcmd spins-up.
    Model Number:       IC35L090AVV207-0
    Serial Number:      VNVC00G3CABSMD
    Firmware Revision:  V23OA63A
Standards:
    Used: ATA/ATAPI-6 T13 1410D revision 3a
    Supported: 6 5 4 3
Configuration:
    Logical         max     current
    cylinders       16383   65535
    heads           16      1
    sectors/track   63      63
    --
    CHS current addressable sectors:    4128705
    LBA    user addressable sectors:  160836480
    LBA48  user addressable sectors:  160836480
    device size with M = 1024*1024:       78533 MBytes
    device size with M = 1000*1000:       82348 MBytes (82 GB)
Capabilities:
    LBA, IORDY(can be disabled)
    bytes avail on r/w long: 52     Queue depth: 32
    Standby timer values: spec'd by Standard, no device specific minimum
    R/W multiple sector transfer: Max = 16  Current = 16
    Advanced power management level: unknown setting (0x0000)
    Recommended acoustic management value: 128, current value: 254
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
         Cycle time: no flow control=240ns  IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    NOP cmd
       *    READ BUFFER cmd
       *    WRITE BUFFER cmd
       *    Host Protected Area feature set
            Release interrupt
       *    Look-ahead
       *    Write cache
       *    Power Management feature set
            Security Mode feature set
       *    SMART feature set
       *    FLUSH CACHE EXT command
       *    Mandatory FLUSH CACHE command
       *    Device Configuration Overlay feature set
       *    48-bit Address feature set
            Automatic Acoustic Management feature set
            SET MAX security extension
            Address Offset Reserved Area Boot
            SET FEATURES subcommand required to spinup after power up
            Power-Up In Standby feature set
            Advanced Power Management feature set
       *    READ/WRITE DMA QUEUED
       *    General Purpose Logging feature set
       *    SMART self-test
       *    SMART error logging
Security:
    Master password revision code = 65534
            supported
    not     enabled
    not     locked
    not     frozen
    not     expired: security count
    not     supported: enhanced erase
    46min for SECURITY ERASE UNIT.
HW reset results:
    CBLID- above Vih
    Device num = 0 determined by the jumper
Checksum: correct

Regards
Michael


2004-03-18 11:30:42

by Lionel Bouton

Subject: SiS APIC, hacker looking for docs/help, was : Re: 2.6.4 under heavy ioload disables sis5513 DMA

Michael Frank wrote the following on 03/18/2004 11:52 AM :
> Happens every few hours with heavy io and cpu load:
>
> hda: dma_timer_expiry: dma status == 0x21
> hda: DMA timeout error
> hda: dma timeout error: status=0xd0 { Busy }
>
> hda: DMA disabled
> ide0: reset: success
>
> DMA auto-reenabled by boot time hdparm -k
>

Hmm, I'm wondering whether -k is fully functional (the hdparm man page hints
that it isn't supported by all drives, and I don't remember any
success/failure stories here).

> lspci -vv
>
> 00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE]

SiS chipset: is the APIC functional? (cat /proc/interrupts)

If not, I believe this might be the problem, and the solution still
eludes me (I think the problem lies not in the IDE driver but in the APIC
support).
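Lionel's check can be scripted. In 2.6-era kernels, each numbered row of /proc/interrupts lists the per-CPU counts, then the controller handling that line (XT-PIC, or IO-APIC-edge / IO-APIC-level when the IO-APIC is in use), then the devices registered on it. The sketch below assumes that layout; the helper names and the SAMPLE text are illustrative only and do not come from Michael's machine. It also lists lines shared by several devices, in case IRQ sharing is a suspect.

```python
from __future__ import annotations

# Illustrative /proc/interrupts text in the 2.6-era layout (not real data).
SAMPLE = """\
           CPU0
  0:    8123456          XT-PIC  timer
  1:       1234          XT-PIC  i8042
 10:      56789          XT-PIC  eth0, uhci_hcd
 14:     345678          XT-PIC  ide0
NMI:          0
ERR:          0
"""


def parse_interrupts(text: str) -> dict[str, tuple[str, list[str]]]:
    """Map each numbered IRQ row to (controller, devices).

    Rows without a numeric IRQ (the CPU header, NMI, LOC, ERR) are skipped.
    """
    result = {}
    for line in text.splitlines():
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        if not sep or not irq.isdigit():
            continue
        tokens = rest.split()
        i = 0
        while i < len(tokens) and tokens[i].isdigit():
            i += 1  # skip the per-CPU counters
        if i == len(tokens):
            continue  # no controller column on this row
        controller = tokens[i]
        devices = [d.strip() for d in " ".join(tokens[i + 1:]).split(",") if d.strip()]
        result[irq] = (controller, devices)
    return result


def uses_io_apic(table: dict[str, tuple[str, list[str]]]) -> bool:
    """True if any IRQ line is routed through an IO-APIC."""
    return any(ctrl.startswith("IO-APIC") for ctrl, _ in table.values())


def shared_irqs(table: dict[str, tuple[str, list[str]]]) -> dict[str, list[str]]:
    """IRQ lines with more than one registered device."""
    return {irq: devs for irq, (_, devs) in table.items() if len(devs) > 1}


if __name__ == "__main__":
    # On a live system, pass the contents of /proc/interrupts instead.
    table = parse_interrupts(SAMPLE)
    print("IO-APIC in use:", uses_io_apic(table))
    print("shared IRQ lines:", shared_irqs(table))
```

An all-XT-PIC table, as in the sample, corresponds to the configuration Michael describes below, with the APIC disabled.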

I have 2 SiS-based mainboards here (SiS735 and SiS645) that are forced to
use XT-PIC, but without this kind of problem (everything works until I
start adding too many PCI cards to one system...). I'm willing to start
hacking on this (mostly on the 645, as the 735 is an always-on system).

Is reading the arch/i386/kernel/*pic* files (and probably others) enough
to start, or is there somewhere else to look for information?

Regards,
--
Lionel Bouton - inet6
---------------------------------------------------------------------
o Registered office: 51, rue de Verdun - 92158 Suresnes
/ _ __ _ Office entrance: 33 rue Benoit Malon - 92150 Suresnes
/ /\ /_ / /_ France
\/ \/_ / /_/ Tel. +33 (0) 1 41 44 85 36
Inetsys S.A. Fax +33 (0) 1 46 97 20 10

2004-03-18 11:58:00

by Michael Frank

Subject: Re: SiS APIC, hacker looking for docs/help, was : Re: 2.6.4 under heavy ioload disables sis5513 DMA

On Thu, 18 Mar 2004 12:30:17 +0100, Lionel Bouton wrote:

> Michael Frank wrote the following on 03/18/2004 11:52 AM :
>> Happens every few hours with heavy io and cpu load:
>>
>> hda: dma_timer_expiry: dma status == 0x21
>> hda: DMA timeout error
>> hda: dma timeout error: status=0xd0 { Busy }
>>
>> hda: DMA disabled
>> ide0: reset: success
>>
>> DMA auto-reenabled by boot time hdparm -k
>>
>
> Hmm, I'm wondering whether -k is fully functional (the hdparm man page hints
> that it isn't supported by all drives, and I don't remember any
> success/failure stories here).

It works here; otherwise I/O performance would collapse and hdparm -iI would show PIO.
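Michael's check relies on hdparm marking the currently selected transfer mode with an asterisk (*udma5 in the output above). A minimal sketch of that check on captured hdparm -i/-I text follows; the function names are hypothetical, and it only inspects text without talking to the drive.

```python
from __future__ import annotations

import re

# A '*' immediately followed by a mode name marks the active mode. Lines such
# as "*    NOP cmd" in the feature list do not match, because no mode name follows.
_STARRED_MODE = re.compile(r"\*((?:udma|mdma|sdma|pio)\d)")


def active_modes(hdparm_text: str) -> list[str]:
    """Return every mode marked with '*' in captured hdparm -i/-I output."""
    return _STARRED_MODE.findall(hdparm_text)


def dma_active(hdparm_text: str) -> bool:
    """True if any starred mode is a DMA mode (udma, mdma, sdma) and not PIO."""
    return any(not mode.startswith("pio") for mode in active_modes(hdparm_text))


if __name__ == "__main__":
    # Two lines copied from the hdparm -iI output earlier in this thread.
    captured = (
        "UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5\n"
        "DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5\n"
    )
    print(active_modes(captured), dma_active(captured))
```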

>
>> lspci -vv
>>
>> 00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE]
>
> SiS chipset: is the APIC functional? (cat /proc/interrupts)

The APIC is disabled in the BIOS, and APIC support is not compiled into
the kernel. This board did not work well with the APIC: lots of spurious
interrupts and hangs during delay-loop calibration at boot, also with
2.4. I have not tried it in 6 months, though.

>
> If not, I believe this might be the problem and the solutions still
> eludes me (I don't think the problem lies in the IDE driver but in APIC
> support).
>
> I have 2 SiS-based mainboards here (SiS735 and SiS645) that are forced to
> use XT-PIC, but without this kind of problem (everything works until I
> start adding too many PCI cards to one system...). I'm willing to start
> hacking on this (mostly on the 645, as the 735 is an always-on system).

No PCI or AGP cards are plugged in.

>
> Is reading the arch/i386/kernel/*pic* files (and probably others) enough
> to start, or is there somewhere else to look for information?
>

Dunno how a DMA timeout is related to interrupts, or are you suggesting it
is losing DMA-complete interrupts?

The same board runs the same and higher loads flawlessly with 2.4.23, 2.4.24 and 2.4.25.
It also ran 8 hours OK with 2.4.26-pre4 last night, plus 430 cycles of swsusp2.

So far, it has happened four times in 2.7 hours with 2.6.4.

By the way, I/O is still at 20 MB/s on average, so it is still using DMA.

Regards
Michael

2004-03-18 12:26:41

by Lionel Bouton

Subject: Re: SiS APIC, hacker looking for docs/help, was : Re: 2.6.4 under heavy ioload disables sis5513 DMA

Michael Frank wrote the following on 03/18/2004 12:52 PM :

>
>>
>> Is reading the arch/i386/kernel/*pic* files (and probably others) enough
>> to start, or is there somewhere else to look for information?
>>
>
> Dunno how a DMA timeout is related to interrupts, or are you suggesting it
> is losing DMA-complete interrupts?
>

I don't know the details (yet, I hope), but I'm quite sure that interrupt
handling in XT-PIC mode leads to problems on several SiS configurations
(when reorganizing the PCI cards in a system changes the behaviour, or
when disabling the VGA IRQ makes some things start to work, the suspect
becomes obvious). One user reported that putting 2 disks on one channel
instead of one on each (so 1 IRQ used instead of 2) solved instability
issues too... I ruled out IDE driver problems several times by
repeatedly checking the code and the run-time register values against
known-good values. My lack of knowledge of the interrupt-handling
details is what prevents me from being 100% sure that the problem lies
here. This is why I'm willing to work on this subject.

> The same board runs the same and higher loads flawlessly with 2.4.23, 2.4.24 and 2.4.25.
> It also ran 8 hours OK with 2.4.26-pre4 last night, plus 430 cycles of swsusp2.
>

Now I remember why your name rang a bell! Thanks for your testing
work on swsusp; my free time went up thanks to 2.4's swsusp.

Regards,

--
Lionel Bouton - inet6


2004-03-18 13:46:25

by Ross Dickson

Subject: Re: SiS APIC, hacker looking for docs/help, was : Re: 2.6.4 under heavy ioload disables sis5513 DMA

<snip>
> > Dunno how a DMA timeout is related to interrupts, or are you suggesting it
> > is losing DMA-complete interrupts?
> >
> > I don't know the details (yet, I hope), but I'm quite sure that interrupt
> > handling in XT-PIC mode leads to problems on several SiS configurations
> > (when reorganizing the PCI cards in a system changes the behaviour, or
> > when disabling the VGA IRQ makes some things start to work, the suspect
> > becomes obvious). One user reported that putting 2 disks on one channel
> > instead of one on each (so 1 IRQ used instead of 2) solved instability
> > issues too... I ruled out IDE driver problems several times by
> > repeatedly checking the code and the run-time register values against
> > known-good values. My lack of knowledge of the interrupt-handling
> > details is what prevents me from being 100% sure that the problem lies
> > here. This is why I'm willing to work on this subject.

I had a report that my patch helped a SiS740 board run in APIC/IO-APIC mode.
I don't know whether it would help in your situation. Here is the relevant link:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-03/4278.html

There is a problem with APM mode with my patch; a small fix is here if required:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-03/4410.html

Hope it helps.
Ross.

> > The same board runs the same and higher loads flawlessly with 2.4.23, 2.4.24 and 2.4.25.
> > It also ran 8 hours OK with 2.4.26-pre4 last night, plus 430 cycles of swsusp2.
> >