2001-11-30 15:40:46

by Emmanuele Bassi

Subject: Deadlock on kernels > 2.4.13-pre6

Hi everyone,

I've recently compiled and tested each kernel since 2.4.13-pre6[0], and
I've noticed a recurrent (and reproducible[1]) deadlock on my system
when I try to play an mp3[2].

It occurs at random, i.e. not after a fixed amount of playing time, but
each and every time I try to play an mp3 file my box suddenly
``freezes'': no signs of life at all (SysRq keys, network, even a serial
terminal), no Oops, no trace in the logs. The box simply `dies'.

I've tried hundreds of combinations, trying to understand where the
problem lies, and I've come up with... er... nothing...

o it's not ext3: even vanilla kernels lock up;
o it's not a hardware problem: I've tested my RAM and compiled
  kernel after kernel with (and without) optimization;
o kernels <= 2.4.13-pre6 work properly;
o it's not the fault of the player or library: I've tried many
  players, on different libraries; besides, a user-level program
  shouldn't cause such deadlocks;
o every other operation on kernels > 2.4.13-pre6 works quite well
  (this new VM is *great*), *except* when I try to listen to an
  mp3[3]: that always leads to disaster.

So far, I've excluded everything but a bug in the OSS sound drivers,
but, according to the ChangeLogs, they did not change from 2.4.13-pre6
(the last working kernel) to 2.4.13.

TIA.

+++

[0] Mainly, because it was the first kernel with the new VM and with the
ext3 patch available, excluding 2.4.10.

[1] At least, on my box.

[2] I use a SoundBlaster AWE64 (ISA) perfectly recognized both by isapnp
and 2.4.x kernels, using OSS modules. Yes, I've also tried not to use
modules. No, I did not try ALSA. Yes, the card works perfectly.

[3] Any other format, except .MOD files, works perfectly. And that's why
I suspect the sequencer code.

Bye,
Emmanuele.


--
Emmanuele Bassi (Zefram) [ http://digilander.iol.it/ebassi ]
GnuPG Key fingerprint = 4DD0 C90D 4070 F071 5738 08BD 8ECC DB8F A432 0FF4


2001-11-30 15:50:15

by Alan

Subject: Re: Deadlock on kernels > 2.4.13-pre6

> So far, I've excluded everything but a bug in the OSS sound drivers,
> but, according to the ChangeLogs, they did not change from 2.4.13-pre6
> (the last working kernel) to 2.4.13.

The OSS core and SB AWE driver have to all intents not changed since before
2.4 was released.

You might want to check when the various VIA chipset fixes went in if you
are using a VIA chipset
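
Whether a machine has a VIA chipset can usually be read off the PCI device list. The following is only a minimal sketch: it assumes lspci (from pciutils) is installed, and has_via_chipset is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch: succeed if lspci-style output on stdin mentions a VIA device.
# has_via_chipset is a hypothetical name chosen for this example.
has_via_chipset() {
    # lspci prints the vendor name "VIA Technologies" for VIA devices.
    grep -q 'VIA Technologies'
}

# Usage on a live system (assumes lspci is available):
#   lspci | has_via_chipset && echo "VIA chipset present"
```

The function only inspects text, so it can also be run against a saved lspci listing.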

2001-11-30 16:07:29

by Andreas Steinmetz

Subject: Re: Deadlock on kernels > 2.4.13-pre6

I had this kind of deadlock on an MSI-6215 (i815) running in console mode (no X).
It always happened during screen blanking while there was interrupt load
(networking via ISDN). APM-based screen blanking didn't work, so I suspected APM,
but that is at most half true. The system runs fine with APM but without APM
screen blanking if you disable console blanking completely by issuing:

echo -n -e "\033[9;0]\033[10;0]\033[11;0]\033[14;0]"

during the boot sequence, with the output sent to /dev/console (this silences
beeps too, but I believe that can be ignored). By now I suspect the console
blanking code, not the APM code, to be the trigger of the lockup.
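
For reference, the echo command above sends four Linux console control sequences. The sketch below spells them out one by one, with meanings as documented in console_codes(4). noblank_sequence is a hypothetical helper name, and writing to /dev/console is assumed to require root.

```shell
#!/bin/sh
# Sketch: emit the same bytes as the echo command, one sequence per line.
# noblank_sequence is a hypothetical name chosen for this example.
noblank_sequence() {
    printf '\033[9;0]'    # screen blank timeout: 0 minutes (never blank)
    printf '\033[10;0]'   # bell frequency: 0 Hz
    printf '\033[11;0]'   # bell duration: 0 ms
    printf '\033[14;0]'   # VESA powerdown interval: 0 minutes (off)
}

# Usage from a boot script, run as root (assumption: /dev/console is the
# Linux virtual console):
#   noblank_sequence > /dev/console
```

Only the first and last sequences affect blanking. The two bell settings are what silences the beeps.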

On 30-Nov-2001 Alan Cox wrote:
>> So far, I've excluded everything but a bug in the OSS sound drivers,
>> but, according to the ChangeLogs, they did not change from 2.4.13-pre6
>> (the last working kernel) to 2.4.13.
>
> The OSS core and SB AWE driver have to all intents not changed since before
> 2.4 was released.
>
> You might want to check when the various VIA chipset fixes went in if you
> are using a VIA chipset

Andreas Steinmetz
D.O.M. Datenverarbeitung GmbH

2001-11-30 21:13:46

by Emmanuele Bassi

Subject: Re: Deadlock on kernels > 2.4.13-pre6

* Alan Cox wrote:

> You might want to check when the various VIA chipset fixes went in if you
> are using a VIA chipset

I am, indeed, using a VIA chipset, and I found something in the 2.4.10
ChangeLog, but that does not explain why the 2.4.13-pre6 kernel does
work properly... Unless some corrections, not reported in the
changelogs, were made between the 2.4.13-pre* series and 2.4.13.

Frankly, I have given up hope of getting an explanation...

+++

Even if it turns out to be a VIA problem, what do I have to do to get my
system to work properly with this chipset?

[Output of lspci -v -v -v -v]
00:00.0 Host bridge: VIA Technologies, Inc. VT82C598 [Apollo MVP3] (rev 04)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR+
Latency: 64
Region 0: Memory at e0000000 (32-bit, prefetchable) [size=64M]
Capabilities: <available only to root>

00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP] (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: d0000000-dfffffff
Prefetchable memory behind bridge: a0000000-afffffff
BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C586/A/B PCI-to-ISA [Apollo VP] (rev 47)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0

00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64
Region 4: I/O ports at f000 [size=16]

00:07.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 02) (prog-if 00 [UHCI])
Subsystem: Unknown device 0925:1234
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64, cache line size 08
Interrupt: pin D routed to IRQ 11
Region 4: I/O ports at 6400 [size=32]

00:07.3 Bridge: VIA Technologies, Inc. VT82C586B ACPI (rev 10)
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:0c.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
Subsystem: Realtek Semiconductor Co., Ltd. RT8029(AS)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at 6800 [size=32]

01:00.0 VGA compatible controller: nVidia Corporation Riva TnT [NV04] (rev 04) (prog-if 00 [VGA])
Subsystem: Creative Labs Graphics Blaster CT6710
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (1250ns min, 250ns max)
Interrupt: pin A routed to IRQ 12
Region 0: Memory at d0000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at a0000000 (32-bit, prefetchable) [size=16M]
Expansion ROM at <unassigned> [disabled] [size=64K]
Capabilities: <available only to root>

Bye,
Emmanuele.

--
Emmanuele Bassi (Zefram) [ http://digilander.iol.it/ebassi ]
GnuPG Key fingerprint = 4DD0 C90D 4070 F071 5738 08BD 8ECC DB8F A432 0FF4

2001-11-30 22:31:28

by Alan

Subject: Re: Deadlock on kernels > 2.4.13-pre6

> work properly... Unless some corrections, not reported in the
> changelogs, were made between the 2.4.13-pre* series and 2.4.13.

Nope.

> Even if it turns out to be a VIA problem, what do I have to do to get my
> system to work properly with this chipset?

The reason I ask is VIA have had a history of weird ISA DMA hangs when doing
certain other operations. It could be some combination of these triggering
problems.

> 01:00.0 VGA compatible controller: nVidia Corporation Riva TnT [NV04] (rev 04) (prog-if 00 [VGA])

Are you seeing the hangs in X11, and which X setup (one with agp loaded ?)
Also does it print "Activating ISA DMA workarounds" during the boot ?
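
One way to check for that message after boot is to search the kernel log. This is a minimal sketch, assuming the boot messages are still in the kernel ring buffer. saw_isa_dma_workaround is a hypothetical helper name, and the pattern is deliberately loose because the exact wording of the message may differ between kernel versions.

```shell
#!/bin/sh
# Sketch: succeed if kernel log text on stdin mentions the ISA DMA workaround.
# saw_isa_dma_workaround is a hypothetical name chosen for this example.
saw_isa_dma_workaround() {
    # Case-insensitive, loose match (assumption: the message contains
    # "ISA DMA" followed later on the same line by "workaround").
    grep -qi 'ISA DMA.*workaround'
}

# Usage shortly after boot:
#   dmesg | saw_isa_dma_workaround && echo "workaround active"
```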

Alan

2001-12-01 13:39:52

by Emmanuele Bassi

Subject: Re: Deadlock on kernels > 2.4.13-pre6

* Alan Cox wrote:

> > Even if it turns out to be a VIA problem, what do I have to do to get my
> > system to work properly with this chipset?
>
> The reason I ask is VIA have had a history of weird ISA DMA hangs when doing
> certain other operations. It could be some combination of these triggering
> problems.

Considering this issue, and everything else I've tried so far, I'm
beginning to think that the controller is, indeed, guilty as charged.

> > 01:00.0 VGA compatible controller: nVidia Corporation Riva TnT [NV04] (rev 04) (prog-if 00 [VGA])
>
> Are you seeing the hangs in X11, and which X setup (one with agp loaded ?)

I see hangs in both X11 and console, but under console I use the
framebuffer device (rivafb).

This is the section about my hardware inside /etc/X11/XF86Config-4:

# **********************************************************************
# Graphics device section
# **********************************************************************

# Device configured by xf86config:

Section "Device"
Identifier "Creative VideoBlaster"
Driver "nv"
VideoRam 16384
EndSection

+++

Just to be certain, I have an old PCI graphics card (a Matrox Mystique)
that has worked nicely for three years now... If that works well, this
should be the final proof...

> Also does it print "Activating ISA DMA workarounds" during the boot ?

Yes.

Bye,
Emmanuele.

--
Emmanuele Bassi (Zefram) [ http://digilander.iol.it/ebassi ]
GnuPG Key fingerprint = 4DD0 C90D 4070 F071 5738 08BD 8ECC DB8F A432 0FF4

2001-12-01 17:42:26

by Emmanuele Bassi

Subject: Re: Deadlock on kernels > 2.4.13-pre6

* Emmanuele Bassi wrote:

> Just to be certain, I have an old PCI graphics card (a Matrox Mystique)
> that has worked nicely for three years now... If that works well, this
> should be the final proof...

Tested with the old Mystique: the system still hangs after a while (kernel
2.4.16)... At this point, I think I should blame the kernel, or some
workaround for this fscking chipset.

Bye,
Emmanuele.

--
Emmanuele Bassi (Zefram) [ http://digilander.iol.it/ebassi ]
GnuPG Key fingerprint = 4DD0 C90D 4070 F071 5738 08BD 8ECC DB8F A432 0FF4

2001-12-03 08:51:51

by Chris Siebenmann

Subject: Re: Deadlock on kernels > 2.4.13-pre6

| I've recently compiled and tested each kernel since 2.4.13-pre6[0],
| and I've noticed a recurrent (and reproducible[1]) deadlock on my
| system when I try to play an mp3[2].

I have now seen two lockups under 2.4.16 that may be the same problem.
Points of similarity:
- playing mp3s (both times have been streaming mp3 audio over my PPP link)
- Creative Soundblaster AWE (AWE32 for me, though); mine is configured
entirely through the kernel's PnP mechanisms.

I believe my lockups are irregular and infrequent, but it may just feel
that way to me because I haven't listened to mp3's on this machine very
much.

When the hang happens, the first symptom is that my mouse cursor in
X locks and stops tracking; music continues playing for a bit longer
before stopping. As far as I can tell nothing really happens after that
point; if I forcefully disconnect the PPP link it does not redial, for
example. SysRq-B will reboot the machine but SysRq-S to sync it will
produce no audible results, and SysRq-U doesn't seem to have any effect.
(This is an isolated home machine, which makes more precise diagnostics
hard to get when it hangs in X.)

2.4.13-ac5 is, as far as I can tell, completely stable. I don't have
experience with intermediate kernel versions; I jumped straight from
2.4.13-ac5 to 2.4.16.

My environment is UP Pentium II, ext2, aic7880 SCSI with a single disk,
Matrox G400 AGP graphics, XFree86 4.1.0 (using the fully free drivers).
Both lockups have happened while X was running.

I will see if I can reproduce a lockup in the console and capture
SysRq output that's got some useful information.

- cks