2004-01-26 23:06:02

by Måns Rullgård

[permalink] [raw]
Subject: Strange xmms deaths under high disk load


I have encountered a very strange (to me, at least) error. I can
reliably cause xmms to crash by simply doing some intensive disk IO.
Copying a few hundred megabytes usually does it. After about 20
seconds of heavy disk IO, xmms will die with this message:

** WARNING **: snd_pcm_wait: Input/output error
Xlib: unexpected async reply (sequence 0x2ef5c)!

The hex number varies.

The machine is an Alpha SX164 running Linux 2.6.2-rc2 patched up to
ALSA 1.0.1. The problem has been around for quite a while, probably
also with kernel 2.4.21, though I can't confirm that at the moment.
The sound card is a cmi8738.

If I play music with TCVP instead, it keeps playing, but sound is
choppy at intervals. Below is vmstat output during a copying.

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 86128 3424 165920 0 0 66 66 24 4 23 23 54 0
0 0 0 86128 3424 165920 0 0 0 0 1131 1352 17 14 69 0
0 0 0 86128 3424 165920 0 0 0 0 1128 1360 13 4 84 0
0 0 0 86128 3424 165920 0 0 0 0 1129 1351 13 4 84 0
0 0 0 86128 3424 165920 0 0 0 153 1142 1361 14 3 83 0
0 0 0 86128 3424 165920 0 0 0 0 1127 1353 14 4 83 0
0 0 0 86128 3424 165920 0 0 0 0 1128 1348 12 3 84 0
0 0 0 86128 3424 165920 0 0 0 0 1130 1344 17 15 69 0
0 0 0 86000 3424 166048 0 0 128 0 1132 1356 13 3 84 0
1 1 0 71648 3472 179992 0 0 7064 81 1255 1540 16 26 44 14
2 0 0 34912 3480 216296 0 0 18081 88 1409 1792 20 61 1 18
1 0 0 2016 3464 248720 0 0 19420 0 1423 1795 20 65 0 15
14 1 0 2400 3392 246544 0 0 11632 24929 1596 1489 16 60 1 23
11 2 0 2400 3392 246368 0 0 768 26112 969 626 25 71 0 3
3 0 0 1952 3392 248552 0 0 10112 6310 1250 1396 29 51 0 20
1 0 0 1568 3376 248760 0 0 18328 4 1409 1794 20 63 0 17
1 0 0 2336 1696 249464 0 0 19576 0 1417 1858 20 68 0 11
7 1 0 2400 1704 248160 0 0 8904 22984 1230 955 16 70 1 13
11 0 0 3040 1704 248160 0 0 0 33704 696 58 10 90 0 0
1 1 0 2016 1704 249840 0 0 13980 88 1345 1710 29 60 0 10
0 1 0 2016 1696 250056 0 0 17920 0 1479 1903 22 65 0 13
1 0 0 2016 1712 250616 0 0 17968 0 1605 2085 27 66 0 7
16 0 0 2016 1736 249352 0 0 3224 53457 1284 379 11 86 0 4
4 0 0 2208 1736 250448 0 0 9736 88 1558 1793 41 45 2 12
1 1 0 2272 1728 250368 0 0 10200 105 1406 1691 46 42 0 12
4 1 0 2272 1728 250544 0 0 11024 0 1362 1790 53 40 0 8
2 0 0 2336 1720 250480 0 0 16760 33 1488 1916 26 62 0 12
19 0 0 2976 1720 248904 0 0 5644 53328 1537 682 12 81 0 6
2 0 0 1760 1744 251080 0 0 12216 73 1365 1700 31 57 0 12
1 0 0 1568 1736 251400 0 0 18520 72 1410 1809 21 66 0 13
0 1 0 1568 1736 251368 0 0 17316 20 1405 1759 20 61 0 19
13 0 0 1888 1736 250632 0 0 5352 32205 1090 625 11 74 0 15
1 1 0 2528 1736 250608 0 0 1756 21332 965 590 35 56 0 8
4 0 0 1760 1696 251352 0 0 20040 0 1428 1825 21 69 0 10
1 1 0 1504 1696 251824 0 0 15164 16 1366 1742 24 65 0 11
6 1 0 2400 1696 249424 0 0 16460 27005 2106 1992 19 68 0 13

Whenever there was a high value for "bo" the sound was really choppy.

I don't mind an occasional glitch in the sound too much, but the fact
that xmms died makes me a little nervous. What's worse, yesterday it
even corrupted the playlist file on disk rather badly.

The filesystems are XFS on device-mapper on raid1 over raid0 on four
disks connected to a hpt374 controller.

Should I just call it an xmms bug, or should I start digging for
hardware problems or kernel bugs?

--
M?ns Rullg?rd
[email protected]


2004-01-26 23:27:46

by Nick Piggin

[permalink] [raw]
Subject: Re: Strange xmms deaths under high disk load



M?ns Rullg?rd wrote:

>I have encountered a very strange (to me, at least) error. I can
>reliably cause xmms to crash by simply doing some intensive disk IO.
>Copying a few hundred megabytes usually does it. After about 20
>seconds of heavy disk IO, xmms will die with this message:
>
>** WARNING **: snd_pcm_wait: Input/output error
>Xlib: unexpected async reply (sequence 0x2ef5c)!
>
>The hex number varies.
>
>The machine is an Alpha SX164 running Linux 2.6.2-rc2 patched up to
>ALSA 1.0.1. The problem has been around for quite a while, probably
>also with kernel 2.4.21, though I can't confirm that at the moment.
>The sound card is a cmi8738.
>
>If I play music with TCVP instead, it keeps playing, but sound is
>choppy at intervals. Below is vmstat output during a copying.
>
>procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 0 0 86128 3424 165920 0 0 66 66 24 4 23 23 54 0
> 0 0 0 86128 3424 165920 0 0 0 0 1131 1352 17 14 69 0
> 0 0 0 86128 3424 165920 0 0 0 0 1128 1360 13 4 84 0
> 0 0 0 86128 3424 165920 0 0 0 0 1129 1351 13 4 84 0
> 0 0 0 86128 3424 165920 0 0 0 153 1142 1361 14 3 83 0
> 0 0 0 86128 3424 165920 0 0 0 0 1127 1353 14 4 83 0
> 0 0 0 86128 3424 165920 0 0 0 0 1128 1348 12 3 84 0
> 0 0 0 86128 3424 165920 0 0 0 0 1130 1344 17 15 69 0
> 0 0 0 86000 3424 166048 0 0 128 0 1132 1356 13 3 84 0
>
...

>11 2 0 2400 3392 246368 0 0 768 26112 969 626 25 71 0 3
>
...

>11 0 0 3040 1704 248160 0 0 0 33704 696 58 10 90 0 0
>
...

>
> 1 1 0 2528 1736 250608 0 0 1756 21332 965 590 35 56 0 8
>

Looks like you might be losing timer interrupts, possibly caused
by an IDE disk doing PIO?


2004-01-26 23:50:03

by Måns Rullgård

[permalink] [raw]
Subject: Re: Strange xmms deaths under high disk load

Nick Piggin <[email protected]> writes:

>>If I play music with TCVP instead, it keeps playing, but sound is
>>choppy at intervals. Below is vmstat output during a copying.
>>
>>procs ---------memory---------- ---swap-- -----io---- --system-- ----cpu----
>> r b swpd free buff cache si so bi bo in cs us sy id wa
>> 0 0 0 86128 3424 165920 0 0 66 66 24 4 23 23 54 0
>> 0 0 0 86128 3424 165920 0 0 0 0 1131 1352 17 14 69 0
>> 0 0 0 86128 3424 165920 0 0 0 0 1128 1360 13 4 84 0
>> 0 0 0 86128 3424 165920 0 0 0 0 1129 1351 13 4 84 0
>> 0 0 0 86128 3424 165920 0 0 0 153 1142 1361 14 3 83 0
>> 0 0 0 86128 3424 165920 0 0 0 0 1127 1353 14 4 83 0
>> 0 0 0 86128 3424 165920 0 0 0 0 1128 1348 12 3 84 0
>> 0 0 0 86128 3424 165920 0 0 0 0 1130 1344 17 15 69 0
>> 0 0 0 86000 3424 166048 0 0 128 0 1132 1356 13 3 84 0
>>
> ...
>
>>11 2 0 2400 3392 246368 0 0 768 26112 969 626 25 71 0 3
>>
> ...
>
>>11 0 0 3040 1704 248160 0 0 0 33704 696 58 10 90 0 0
>>
> ...
>
>>
>> 1 1 0 2528 1736 250608 0 0 1756 21332 965 590 35 56 0 8
>>
>
> Looks like you might be losing timer interrupts, possibly caused by
> an IDE disk doing PIO?

All the disks are doing ATA-100, so that's not it. Something else
could have caused it, of course. Rather bad of xmms to crash, too.

What worries me most is the file corruption I saw in connection with
one such crash. It hasn't happened again, though.

--
M?ns Rullg?rd
[email protected]

2004-01-27 10:36:11

by Piotr Kowalczyk

[permalink] [raw]
Subject: Re: Strange xmms deaths under high disk load

Hello World\n

> I have encountered a very strange (to me, at least) error. I can
> reliably cause xmms to crash by simply doing some intensive disk IO.
> Copying a few hundred megabytes usually does it. After about 20
> seconds of heavy disk IO, xmms will die with this message:

[...]

> Should I just call it an xmms bug, or should I start digging for
> hardware problems or kernel bugs?

Well, at least not hardware problems, I think. I don't know anything
about messages that xmms throws, but it hangs regularly under heavy
disk load. Usually clicking pause and play is enough to bring it back.
In case of doing anything else it hangs completely.
I have Athlon XP 1700+, sound via via82xx, ATA-100,
kernel 2.6.1 (with 2.6.0 was the same).
Had no problems with 2.4, but haven't used alsa then. Maybe i'll try to
debug it a bit more if i find some time.
chears,

--
Piotr Kowalczyk <[email protected]>
PRDKW (iols)