2008-12-04 09:48:51

by Eric Rannaud

Subject: Freeze under OOM condition (2.6.27)

Version: 2.6.27.5-117.fc10.x86_64

Reporting this as a bug because: at first the machine behaves in a
typical OOM fashion, but the lack of network response, the absence of
significant disk activity and no OOM kill appear abnormal.



The test program allocates less than 50 MB: it essentially performs a
sequence of realloc() calls on a single buffer, in 16k increments. When
this program runs under Valgrind, memory consumption shoots up (as
expected), enough to use up 3GB of RAM (that's a little surprising, but
beside the point).
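For reference, a minimal sketch of the allocation pattern described above.
This is not the actual test program: the function name grow_buffer(), the
fill byte and the error handling are illustrative assumptions, and it may or
may not reproduce the memory blow-up when run under Valgrind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the test program: grow a single buffer with
 * realloc() in fixed-size steps until it holds at least `total` bytes.
 * Each new chunk is written to so that its pages are really touched.
 * Returns the number of realloc() calls made, or 0 if an allocation failed
 * (or if total is 0 and nothing had to be allocated).
 */
size_t grow_buffer(size_t total, size_t step)
{
    char *buf = NULL;
    size_t size = 0;
    size_t calls = 0;

    assert(step > 0);

    while (size < total) {
        char *tmp = realloc(buf, size + step);
        if (tmp == NULL) {
            free(buf);              /* the old block is still valid here */
            return 0;
        }
        buf = tmp;
        memset(buf + size, 0xAA, step);   /* touch the newly added bytes */
        size += step;
        calls++;
    }

    free(buf);
    return calls;
}
```

Calling grow_buffer(48 * 1024 * 1024, 16 * 1024) from a small main() gives
a buffer just under 50 MB built in 16k steps, which is roughly the workload
described here.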

The machine is nearly dead: X is frozen and sshd doesn't respond. In the
first test the machine wasn't pingable (no route to host); in the second
test it was. In the first test, audio was playing and stopped (no audio
was playing in the second test run). There evidently is some disk
activity (HDD light, activity noise), but not much.

The OOM killer is not triggered, even after up to 10 minutes in this
mostly frozen state.

No information in logs, except pulseaudio complaining about increased
latency (in the first case). Note that there was no audio activity in
the second case, and no pulseaudio output in the logs then.
NetworkManager also complains about not receiving messages on time.
Although I didn't keep track of the exact timing of the events, these
messages appear to have been printed to /var/log/messages during the
first 2 minutes of the "freeze".


Reproducible: Yes; reproduced twice.

Default Fedora 10 .config, attached.

Can provide more information or a test program on demand.
I can also run a kernel with instrumentation, if this can be helpful.


Thanks.


P.S. I reported a different (hard) lockup a couple of days ago, but the
two seem unrelated (https://bugzilla.redhat.com/show_bug.cgi?id=474255)



--- /var/log/messages
Dec 4 00:35:32 nc050 NetworkManager: <WARN> killswitch_getpower_reply(): Error getting killswitch power: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken..
Dec 4 00:35:52 nc050 NetworkManager: <WARN> killswitch_getpower_reply(): HAL did not reply to killswitch power request; assuming radio is blocked.
Dec 4 00:35:52 nc050 NetworkManager: <info> Wireless now disabled by radio killswitch
Dec 4 00:35:52 nc050 NetworkManager: <info> (wlan0): device state change: 8 -> 2
Dec 4 00:35:52 nc050 NetworkManager: <info> (wlan0): deactivating device (reason: 0).
Dec 4 00:35:52 nc050 pulseaudio[5910]: alsa-util.c: snd_pcm_avail_update() returned a value that is exceptionally large: 4320940 bytes (24495 ms) Most likely this is an ALSA driver bug. Please report this issue to the PulseAudio developers.
Dec 4 00:35:52 nc050 pulseaudio[5910]: alsa-util.c: snd_pcm_avail_update() returned a value that is exceptionally large: 2197868 bytes (12459 ms) Most likely this is an ALSA driver bug. Please report this issue to the PulseAudio developers.
Dec 4 00:35:52 nc050 pulseaudio[5910]: alsa-util.c: snd_pcm_avail_update() returned a value that is exceptionally large: 231372 bytes (1311 ms) Most likely this is an ALSA driver bug. Please report this issue to the PulseAudio developers.
Dec 4 00:35:52 nc050 pulseaudio[5910]: alsa-util.c: snd_pcm_avail_update() returned a value that is exceptionally large: 1326444 bytes (7519 ms) Most likely this is an ALSA driver bug. Please report this issue to the PulseAudio developers.
Dec 4 00:36:03 nc050 pulseaudio[5910]: alsa-util.c: snd_pcm_avail_update() returned a value that is exceptionally large: 3835116 bytes (21741 ms) Most likely this is an ALSA driver bug. Please report this issue to the PulseAudio developers.
Dec 4 00:36:27 nc050 pulseaudio[5910]: alsa-util.c: snd_pcm_avail_update() returned a value that is exceptionally large: 4248364 bytes (24083 ms) Most likely this is an ALSA driver bug. Please report this issue to the PulseAudio developers.
Dec 4 00:36:28 nc050 pulseaudio[5910]: alsa-util.c: snd_pcm_avail_update() returned a value that is exceptionally large: 198668 bytes (1126 ms) Most likely this is an ALSA driver bug. Please report this issue to the PulseAudio developers.
Dec 4 00:36:31 nc050 pulseaudio[5910]: module-alsa-sink.c: Increasing wakeup watermark to 34.01 ms
Dec 4 00:36:34 nc050 pulseaudio[5910]: module-alsa-sink.c: Increasing wakeup watermark to 68.03 ms
---


Attachments:
config-2.6.27.5-117.fc10.x86_64 (83.00 kB)