LinuxLists.cc - scheduler went mad?

2001-04-11 14:29:29

by Priit Randla

Subject: scheduler went mad?

Hi,

Yesterday I tried to start cdda2wav, but somehow it didn't do anything.
It didn't die on kill -9 either. The machine was slow but usable.
vmstat 10 output:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 2  0  1   2972  40916    108  18292   0   0     0     0  121 12735   0 100   0
 2  0  1   2972  40492    108  18292   0   0     0     0  109 12740   1  99   0
 2  0  1   2972  40492    108  18292   0   0     0     0  103 12996   0 100   0
 3  0  0   2972  40492    108  18292   0   0     0     0  102 12932   0 100   0
 3  0  1   2972  40492    108  18292   0   0     0     0  131 12652   1  99   0
 2  0  0   2972  40496    108  18292   0   0     0     0  142 12562   1  99   0
 2  0  0   2972  40500    108  18292   0   0     0     0  120 12684   0 100   0
 2  0  1   2972  40496    108  18292   0   0     0     0  140 12480   1  99   0
 2  0  0   2972  39952    108  18292   0   0     0     0  160 11445   7  93   0
 3  0  0   2972  39952    108  18292   0   0     0     0  178 12295   2  98   0
 2  0  0   2972  39956    108  18292   0   0     0     0  214 11958   2  98   0
 3  0  1   2972  39952    108  18292   0   0     0     0  138 12579   1  99   0

The cs (context switch) field is absolutely ridiculous for my machine.

ps showed cdda2wav & kswapd eating all of the processor time. When I tried
to close netscape, it hung too and joined cdda2wav and kswapd:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT LIB %CPU %MEM   TIME COMMAND
 9990 priitr    17   0 42380  41M  9928 R      0 32.5 33.4  21:47 netscape-commun
    3 root      17   0     0    0     0 SW     0 32.3  0.0  11:12 kswapd
10538 priitr    16   0    84    8     0 R      0 32.3  0.0  11:09 cdda2wav
    5 root       9   0     0    0     0 SW     0  1.5  0.0   0:19 bdflush
10616 priitr    13   0   856  856   668 R      0  0.7  0.6   0:00 top
  657 root       9   0 21160  20M  1668 S      0  0.1 16.7  29:36 X

I couldn't leave X and had to kill it. After that, both netscape and
cdda2wav were gone and everything has looked normal since then.
I'm running 2.4.3ac3 right now.

dmesg:

2001-04-11 18:50:04

by Josh McKinney

Subject: Re: scheduler went mad?

I had almost exactly the same thing happen to me just yesterday: I started up
xcdroast, and cdda2wav and kswapd went crazy. I backed out of X and all was
well, and still is.

Same kernel as you too.

2001-04-12 14:57:27

by Valdis Klētnieks

Subject: Re: scheduler went mad?

I've seen the same scenario about 2-3 times a week. kswapd and one or
more processes all CPU bound, totalling to 100%. I've had 'esdplay' hung
on several occasions, and 2-3 times it's been xscreensaver (3.29) hung.
The 'hung' processes are consistently immune to kill -9, even as root, which
indicates to me that they're hung inside a kernel call or something.

Sometimes, something *else* will exit, and everything will 'break loose'
and return to normal after a minute or so.

It *may* not be related, but I also have a lot of this in 'dmesg':

__alloc_pages: 4-order allocation failed.
__alloc_pages: 3-order allocation failed.
i810_audio: DMA overrun on send

There was a recent posting re: the i810_audio driver amounting to "I've got
one bug to fix and then I'll put up a patch" for the 'dma overrun' message.
__alloc_pages doesn't give much information on who its caller was, so
that's somewhat of a dead end...

In page_alloc.c, __alloc_pages() has a 'goto try_again;' which will
cause it to loop around and try to get more memory. I'm wondering if
the "hung" process is entering __alloc_pages(), and gets wedged in the
'try_again' loop - which has a call to wakeup_kswapd() inside it, which
would explain the high context-switch rate. I'm not clear on how kswapd
can end up getting stuck and failing to free up something - unless it ends
up calling __alloc_pages itself indirectly and the PF_MEMALLOC bit isn't
enough to get it the memory it needs, causing a deadlock/loop between
kswapd and __alloc_pages/wakeup_kswapd().

Unfortunately, I've just exhausted my ability to debug this one here.. ;)

I'm running the 2.4.3 kernel, with the following patches:

Reiserfs: 2.4.3-3.6.25.quota.bz2
linux-2.4.3-knfsd-6.g.patch.gz
linux-2.4.3-reiserfs-20010327.patch.bz2

IPv6: linux24-2.4.3-usagi-20010406.patch.gz
Crypto: patch-int-2.4.3.1

I am using ReiserFS-on-LVM for basically all filesystems, if that matters...

--
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech

2001-04-12 15:11:07

by Alan Cox

Subject: Re: scheduler went mad?

> I've seen the same scenario about 2-3 times a week. kswapd and one or
> more processes all CPU bound, totalling to 100%. I've had 'esdplay' hung
> on several occasions, and 2-3 times it's been xscreensaver (3.29) hung.
> The 'hung' processes are consistently immune to kill -9, even as root, which
> indicates to me that they're hung inside a kernel call or something.

Do you have > 800Mb of RAM ?

> In page_alloc.c, __alloc_pages() has a 'goto try_again;' which will
> cause it to loop around and try to get more memory. I'm wondering if

Even outside of that, certain drivers also loop on alloc failures, as does
TCP.

> would explain the high context-switch rate. I'm not clear on how kswapd
> can end up getting stuck and failing to free up something - unless it ends
> up calling __alloc_pages itself indirectly and the PF_MEMALLOC bit isn't
> enough to get it the memory it needs, causing a deadlock/loop between
> kswapd and __alloc_pages/wakeup_kswapd().

bounce buffers for one

2001-04-12 15:42:51

by Hugh Dickins

Subject: Re: scheduler went mad?

On Thu, 12 Apr 2001, Valdis Kletnieks wrote:
> I've seen the same scenario about 2-3 times a week. kswapd and one or
> more processes all CPU bound, totalling to 100%. I've had 'esdplay' hung
> on several occasions, and 2-3 times it's been xscreensaver (3.29) hung.
> The 'hung' processes are consistently immune to kill -9, even as root, which
> indicates to me that they're hung inside a kernel call or something.
[snip]
> __alloc_pages: 4-order allocation failed.
> __alloc_pages: 3-order allocation failed.
[snip]
> In page_alloc.c, __alloc_pages() has a 'goto try_again;' which will
> cause it to loop around and try to get more memory. I'm wondering if
[snip]
> I'm running the 2.4.3 kernel

2.4.3-pre6 quietly made a very significant change there:
it used to say "if (!order) goto try_again;" and now just
says "goto try_again;". Which seems very sensible since
__GFP_WAIT is set, but I do wonder if it was a safe change.
We have mechanisms for freeing pages (order 0), but whether
any higher orders come out of that is a matter of chance.

(But of course, this may not be related to your problem,
and your "N-order allocation failed" messages must have
been from other instances than stuck in this loop.)
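
To illustrate the point about higher orders being a matter of chance, here is a minimal user-space sketch in C. It is not kernel code: the byte-per-page free map and the names max_free_order() and count_free() are assumptions made up for this example. It shows that two systems with the same number of free pages can differ completely in the largest naturally aligned block they can hand out.

```c
#include <assert.h>
#include <stddef.h>

/* Count free pages; free_map[i] != 0 means page i is free (toy model). */
static size_t count_free(const unsigned char *free_map, size_t npages)
{
    size_t n = 0;
    for (size_t i = 0; i < npages; i++)
        n += free_map[i] ? 1 : 0;
    return n;
}

/*
 * Largest order k such that some naturally aligned run of 2^k pages is
 * entirely free, or -1 if no page is free at all.
 */
static int max_free_order(const unsigned char *free_map, size_t npages)
{
    int best = -1;

    assert(free_map != NULL);
    for (int order = 0; ((size_t)1 << order) <= npages; order++) {
        size_t block = (size_t)1 << order;
        int found = 0;

        for (size_t start = 0; start + block <= npages && !found; start += block) {
            size_t i = 0;
            while (i < block && free_map[start + i])
                i++;
            found = (i == block);
        }
        if (!found)
            break;      /* no block of this order, so no larger one either */
        best = order;
    }
    return best;
}
```

With 16 pages of which every second one is free, count_free() reports 8 free pages but max_free_order() reports 0, so an order-3 request cannot be satisfied however often the caller retries. With the same 8 free pages packed at the start of the map, max_free_order() reports 3.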

Hugh

2001-04-12 15:46:01

by Valdis Klētnieks

Subject: Re: scheduler went mad?

On Thu, 12 Apr 2001 16:12:55 BST, Alan Cox said:
> > I've seen the same scenario about 2-3 times a week. kswapd and one or
> > more processes all CPU bound, totalling to 100%. I've had 'esdplay' hung
> > on several occasions, and 2-3 times it's been xscreensaver (3.29) hung.
> > The 'hung' processes are consistently immune to kill -9, even as root, which
> > indicates to me that they're hung inside a kernel call or something.
>
> Do you have > 800Mb of RAM ?

256M of RAM, 256M of swap.

Here's /proc/meminfo as I type:
[~]3 cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 260276224 246419456 13856768 0 8347648 75317248
Swap: 271392768 58589184 212803584
MemTotal: 254176 kB
MemFree: 13532 kB
MemShared: 0 kB
Buffers: 8152 kB
Cached: 73552 kB
Active: 49716 kB
Inact_dirty: 28800 kB
Inact_clean: 3188 kB
Inact_target: 212 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 254176 kB
LowFree: 13532 kB
SwapTotal: 265032 kB
SwapFree: 207816 kB
[~]3

> > would explain the high context-switch rate. I'm not clear on how kswapd
> > can end up getting stuck and failing to free up something - unless it ends
> > up calling __alloc_pages itself indirectly and the PF_MEMALLOC bit isn't
> > enough to get it the memory it needs, causing a deadlock/loop between
> > kswapd and __alloc_pages/wakeup_kswapd().
>
> bounce buffers for one

It's a Dell Optiplex GX110, using IDE. Grepping for 'bounce buffer' in
the source shows most hits in the SCSI code, and nothing obviously jumping
out at me...

<just speculating> Is it possible that i810_audio.c is to blame? I'm looking
at alloc_dmabuf() in there, and it tries to grab a big chunk of memory
for a DMA buffer (starting at order-4), which probably explains my __alloc_pages
messages. In addition, I run Enlightenment with audio enabled - so it's
quite possible that xscreensaver will generate a 'click' sound when it
pops up its dialog window - again tossing us into i810_audio. (Scenario
there: a mouse event happens while the screen is locked, xscreensaver wakes up
and starts mapping a window, E plays the sound, hosing the i810_audio driver,
and then when xscreensaver gets the CPU back, its next call for a page
gets wedged.)
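
For reference, the "N-order" wording in those messages maps to sizes as in the following small C sketch. It assumes 4096-byte pages (as on x86); the helper names order_to_bytes() and size_to_order() are invented for this example and are not the kernel's own functions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u   /* assumed x86 page size */

/* An order-N allocation is 2^N physically contiguous pages. */
static size_t order_to_bytes(unsigned order)
{
    return (size_t)TOY_PAGE_SIZE << order;
}

/* Smallest order whose block is large enough to hold `bytes`. */
static unsigned size_to_order(size_t bytes)
{
    unsigned order = 0;

    assert(bytes > 0);
    while (order_to_bytes(order) < bytes)
        order++;
    return order;
}
```

Under that assumption an order-4 request is 16 contiguous pages (64 KB) and an order-3 request is 8 pages (32 KB).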

Would it be worth applying Ed Tomlinson's icache/dcache patches and seeing
if that helps?

--
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech

2001-04-12 15:47:11

by Jens Axboe

Subject: Re: scheduler went mad?

On Wed, Apr 11 2001, Josh McKinney wrote:
> I had the almost exact same thing happen to me just yesterday, I started up
> xcdroast, and cdda2wav and kswapd went crazy, backed out of X and all was
> well, and still is.
>
> Same kernel as you too.

I can tell you why this happens. Earlier kernels allocated one cd frame
worth of data for cdda ripping, but it was recently bumped to allow as
many as the ripping program asks for (up to 8). This requires a 4-5 page
allocation on x86, which is of course not reliable. cdrom.c adjusts for
failed allocations and drops to a smaller number of frames (8 -> 4 -> 2 and
then just 1), but apparently the vm isn't handling this too well if
kswapd is going crazy.
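
The fallback described above can be sketched in user-space C roughly as follows. This is only an illustration, not the cdrom.c code: the alloc_fn hook, alloc_frames() and small_only_alloc() are hypothetical names, a 2352-byte raw audio frame and 4096-byte pages are assumed, and malloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FRAME_BYTES 2352u   /* assumed size of one raw CD audio frame */
#define PAGE_BYTES  4096u   /* assumed x86 page size */

/* Pages needed to hold nframes raw frames, rounded up. */
static unsigned pages_for_frames(unsigned nframes)
{
    return (nframes * FRAME_BYTES + PAGE_BYTES - 1) / PAGE_BYTES;
}

/* Hypothetical allocator hook, so that failures can be simulated. */
typedef void *(*alloc_fn)(size_t bytes);

/*
 * Ask for `want` frames; on failure halve the request until a single
 * frame is left. Returns the buffer (or NULL) and stores the number of
 * frames actually obtained in *got.
 */
static void *alloc_frames(unsigned want, alloc_fn alloc, unsigned *got)
{
    assert(want >= 1 && alloc != NULL && got != NULL);
    for (unsigned n = want; n >= 1; n /= 2) {
        void *buf = alloc((size_t)n * FRAME_BYTES);
        if (buf != NULL) {
            *got = n;
            return buf;
        }
    }
    *got = 0;
    return NULL;
}

/* Example hook: refuses anything larger than two pages. */
static void *small_only_alloc(size_t bytes)
{
    return bytes > 2 * PAGE_BYTES ? NULL : malloc(bytes);
}
```

Under these assumptions 8 frames are 18816 bytes, so pages_for_frames(8) is 5, which matches the "4-5 page" figure. Calling alloc_frames(8, small_only_alloc, &got) fails at 8 and 4 frames and succeeds at 2. The sketch only works if each failed attempt returns promptly, which is exactly what the looping allocator does not do.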

I can switch to a static 8 frame allocation, but IMHO the vm should be
able to handle situations like this. It's not that unusual for a driver
to ask for a bigger chunk of memory if it can go faster that way, and
then be prepared to settle for less if need be. For cdda ripping, it
really does make a difference.

However, I can change ide-cd to do scatter gather in this case. It's the
nicer thing to do anyway. Does cdda2wav have some sort of 'do X number
of frames at a time' option? If so, use 1 and there should be no
problems.

--
Jens Axboe

2001-04-12 16:01:13

by Alan Cox

Subject: Re: scheduler went mad?

> 2.4.3-pre6 quietly made a very significant change there:
> it used to say "if (!order) goto try_again;" and now just
> says "goto try_again;". Which seems very sensible since
> __GFP_WAIT is set, but I do wonder if it was a safe change.
> We have mechanisms for freeing pages (order 0), but whether
> any higher orders come out of that is a matter of chance.

The fundamental problem is that it should say

wait_for_mm_progress();
goto try_again;

and we don't have that facility right now. With that in place, the problem of
looping on failed allocations would be OK, as we would allow someone to make
progress. That leaves the bounce buffers for > 800Mb RAM, which currently are
seriously horked and will loop and, by inspection, may even overflow the stack.
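
As a rough illustration of the difference between retrying blindly and retrying only after progress, here is a toy user-space model in C. Nothing in it exists in the kernel: struct toy_mm, toy_reclaim() and the TOY_MUST_WAIT result are all invented for this sketch, and TOY_MUST_WAIT merely marks the point where a real implementation would have to sleep until someone else frees memory, rather than spin or fail.

```c
#include <assert.h>
#include <stddef.h>

enum toy_result { TOY_ALLOC_OK, TOY_MUST_WAIT };

/* Toy allocator state; every field is hypothetical. */
struct toy_mm {
    unsigned long progress;  /* bumped whenever reclaim frees something */
    unsigned reclaimable;    /* reclaim passes that can still free a page */
    unsigned shortfall;      /* pages still missing for the request */
    unsigned attempts;       /* how often the allocation was tried */
};

static int toy_try_alloc(struct toy_mm *mm)
{
    mm->attempts++;
    return mm->shortfall == 0;
}

static void toy_reclaim(struct toy_mm *mm)
{
    if (mm->reclaimable == 0)
        return;                 /* nothing left to free: no progress */
    mm->reclaimable--;
    mm->progress++;
    if (mm->shortfall > 0)
        mm->shortfall--;
}

/* Retry only while reclaim keeps making progress. */
static enum toy_result toy_alloc(struct toy_mm *mm)
{
    assert(mm != NULL);
    for (;;) {
        unsigned long before;

        if (toy_try_alloc(mm))
            return TOY_ALLOC_OK;
        before = mm->progress;
        toy_reclaim(mm);
        if (mm->progress == before)
            return TOY_MUST_WAIT;   /* retrying now would only burn CPU */
    }
}
```

A request that is two pages short with three reclaimable pages succeeds on the third attempt. A request that is one page short with nothing reclaimable stops after a single attempt instead of looping.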

2001-04-12 16:29:36

by Rik van Riel

Subject: Re: scheduler went mad?

On Thu, 12 Apr 2001, Alan Cox wrote:

> > 2.4.3-pre6 quietly made a very significant change there:
> > it used to say "if (!order) goto try_again;" and now just
> > says "goto try_again;". Which seems very sensible since
> > __GFP_WAIT is set, but I do wonder if it was a safe change.
> > We have mechanisms for freeing pages (order 0), but whether
> > any higher orders come out of that is a matter of chance.
>
> The fundamental problem is that it should say
>
> wait_for_mm_progress();
> goto try_again;
>
> and we don't have that facility right now.

From mm/page_alloc.c, around line 453:

        if (gfp_mask & __GFP_WAIT) {
                memory_pressure++;
                try_to_free_pages(gfp_mask);
                wakeup_bdflush(0);
                goto try_again;
        }

I guess we should remove the wakeup_bdflush(0) ... who put it
there anyway ?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-04-12 16:49:59

by Marcelo Tosatti

Subject: Re: scheduler went mad?

On Thu, 12 Apr 2001, Rik van Riel wrote:

> On Thu, 12 Apr 2001, Alan Cox wrote:
>
> > > 2.4.3-pre6 quietly made a very significant change there:
> > > it used to say "if (!order) goto try_again;" and now just
> > > says "goto try_again;". Which seems very sensible since
> > > __GFP_WAIT is set, but I do wonder if it was a safe change.
> > > We have mechanisms for freeing pages (order 0), but whether
> > > any higher orders come out of that is a matter of chance.
> >
> > The fundamental problem is that it should say
> >
> > wait_for_mm_progress();
> > goto try_again;
> >
> > and we don't have that facility right now.
>
> From mm/page_alloc.c, around line 453:
>
> if (gfp_mask & __GFP_WAIT) {
> memory_pressure++;
> try_to_free_pages(gfp_mask);
> wakeup_bdflush(0);
> goto try_again;
> }
>
> I guess we should remove the wakeup_bdflush(0) ... who put it
> there anyway ?

I did :)

This should fix it

--- mm/page_alloc.c.orig Thu Apr 12 13:47:53 2001
+++ mm/page_alloc.c Thu Apr 12 13:48:06 2001
@@ -454,7 +454,7 @@
         if (gfp_mask & __GFP_WAIT) {
                 memory_pressure++;
                 try_to_free_pages(gfp_mask);
-                wakeup_bdflush(0);
+                balance_dirty(NODEV);
                 goto try_again;
         }

2001-04-12 16:52:39

by Marcelo Tosatti

Subject: Re: scheduler went mad?

On Thu, 12 Apr 2001, Marcelo Tosatti wrote:
>
> I did :)
>
> This should fix it
>
> --- mm/page_alloc.c.orig Thu Apr 12 13:47:53 2001
> +++ mm/page_alloc.c Thu Apr 12 13:48:06 2001
> @@ -454,7 +454,7 @@
> if (gfp_mask & __GFP_WAIT) {
> memory_pressure++;
> try_to_free_pages(gfp_mask);
> - wakeup_bdflush(0);
> + balance_dirty(NODEV);
> goto try_again;
> }

This patch is broken, ignore it.

Just removing wakeup_bdflush() is indeed correct.

We already wakeup bdflush at try_to_free_buffers() anyway.

2001-04-12 16:54:59

by Rik van Riel

Subject: Re: scheduler went mad?

On Thu, 12 Apr 2001, Marcelo Tosatti wrote:

> This should fix it
>
> --- mm/page_alloc.c.orig Thu Apr 12 13:47:53 2001
> +++ mm/page_alloc.c Thu Apr 12 13:48:06 2001
> @@ -454,7 +454,7 @@
> if (gfp_mask & __GFP_WAIT) {
> memory_pressure++;
> try_to_free_pages(gfp_mask);
> - wakeup_bdflush(0);
> + balance_dirty(NODEV);
> goto try_again;
> }

Remember that we can ONLY do this if we have __GFP_IO ...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-04-12 17:09:33

by Szabolcs Szakacsits

Subject: Re: scheduler went mad?

On Thu, 12 Apr 2001, Marcelo Tosatti wrote:

> This patch is broken, ignore it.
> Just removing wakeup_bdflush() is indeed correct.
> We already wakeup bdflush at try_to_free_buffers() anyway.

I still feel a bit uncomfortable about processes looping forever in
__alloc_pages. Because of this, the oom_killer also can't be moved to the
page fault handler, where I think it belongs. I'm using the patch
below.

Szaka

--- mm/page_alloc.c.orig Sat Mar 31 19:07:22 2001
+++ mm/page_alloc.c Mon Apr 2 21:05:31 2001
@@ -453,8 +453,12 @@
                  */
                 if (gfp_mask & __GFP_WAIT) {
                         memory_pressure++;
-                        try_to_free_pages(gfp_mask);
-                        wakeup_bdflush(0);
+                        if (!try_to_free_pages(gfp_mask))
+                                return NULL;
                         goto try_again;
                 }
         }

2001-04-12 17:59:59

by Rik van Riel

Subject: Re: scheduler went mad?

On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
> On Thu, 12 Apr 2001, Marcelo Tosatti wrote:
>
> > This patch is broken, ignore it.
> > Just removing wakeup_bdflush() is indeed correct.
> > We already wakeup bdflush at try_to_free_buffers() anyway.
>
> I still feel a bit uncomfortable about processes looping forever in
> __alloc_pages and because of this oom_killer also can't be moved to
> page fault handler where I think its place should be. I'm using the
> patch below.

It's BROKEN. This means that if you have one task using up
all memory and you're waiting for the OOM kill of that task
to have effect, your syslogd, etc... will have their allocations
fail and will die.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-04-12 18:32:06

by Valdis Klētnieks

Subject: Re: scheduler went mad?

On Thu, 12 Apr 2001 16:12:55 BST, Alan Cox said:

> Do you have > 800Mb of RAM ?

Following up - it just bit again (twice)

The first time, it was xmms/kswapd fighting for CPU, and xmms was again immune
to kill -9. Interestingly enough, several minutes later, I closed 'netscape',
and xmms took the kill within a second or two.

10 minutes later, and another 2 programs that do audio got
wedged up. Oddly enough, I did an 'su', and they broke loose immediately.

I've ruled out i810_audio.c as a culprit - although I have programs that
do audio hanging, *those* programs are always writing their data down
a Unix socket to the actual process that writes to /dev/audio/dsp.
Hmm.. 'su' writes to syslog, and netscape has a few Unix sockets too.
Could the problem be related to running out of some resource related
to Unix-domain sockets, which clears up once some socket is closed?

Oddly enough, while I had 2 programs doing audio wedged, I was still
seeing (hearing actually ;) *new* processes open a connection to esd
and play sounds. Weird.

--
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech

2001-04-12 19:03:22

by Szabolcs Szakacsits

Subject: Re: scheduler went mad?

On Thu, 12 Apr 2001, Rik van Riel wrote:
> On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
> > I still feel a bit uncomfortable about processes looping forever in
> > __alloc_pages and because of this oom_killer also can't be moved to
> > page fault handler where I think its place should be. I'm using the
> > patch below.
> It's BROKEN. This means that if you have one task using up
> all memory and you're waiting for the OOM kill of that task
> to have effect, your syslogd, etc... will have their allocations
> fail and will die.

You mean without dropping the out_of_memory() test in kswapd and calling
oom_kill() in the page fault handler [i.e. without an additional patch]? Yes,
you're completely right, but I do have that patch [see the example below; 'm1'
is the bad guy]. I just haven't had time to test it extensively, and I don't
know whether getting rid of this infinite looping in __alloc_pages() has side
effects, but locked-up processes apparently don't make people very happy ;)

Szaka

Out of Memory: Killed process 830 (m1), saved process 696 (httpd)
   procs                      memory    swap          io     system
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs
 6  0  0      0   9492    100   1496   0   0  1386     2 2904  3877
 5  0  0      0   7812    104   1788   0   0   289     0  689    22
 5  0  0      0   6248    104   1788   0   0     0     0  108    19
 5  0  0      0   4748    108   1840   0   0    56     0  219    21
 5  0  0      0   3268    108   1868   0   0    28     0  165    23
 5  0  1      0   1864     76   1868   0   0     0     5  120    61
 5  0  1      0   1432     76   1252   0   0     0     0  108  1130
 5  0  1      0   1236     80    796   0   0    65     0  246  4588
 5  0  1      0   1236     80    668   0   0     0     0  110  8869
 6  0  1      0    948    112    696   0   0   805     0 1814  8231
Out of Memory: Killed process 858 (m1), saved process 811 (vmstat)
 5  0  1      0    924    152    444   0   0  1153     0 2731 18231
 4  0  1      0   1720    148    828   0   0   750     3 1711  1876
 5  0  1      0   1156    148    760   0   0   290     0  723  1967
 4  0  1      0   1152    132    664   0   0    70     0  277  7249
 4  0  1      0   1140    144    560   0   0    54     0  238  7942
 4  0  1      0   1140    144    460   0   0    32     0  212  7521
Out of Memory: Killed process 834 (m1), saved process 418 (identd)

2001-04-12 20:19:34

by Rik van Riel

Subject: Re: scheduler went mad?

On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:

> You mean without dropping out_of_memory() test in kswapd and calling
> oom_kill() in page fault [i.e. without additional patch]?

No. I think it's ok for __alloc_pages() to call oom_kill()
IF we turn out to be out of memory, but that should not even
be needed.

Also, when a task in __alloc_pages() is OOM-killed, it will
have PF_MEMALLOC set and will immediately break out of the
loop. The rest of the system will spin around in the loop
until the victim has exited and then their allocations will
succeed.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-04-12 21:53:32

by Szabolcs Szakacsits

Subject: Re: scheduler went mad?

On Thu, 12 Apr 2001, Rik van Riel wrote:
> On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
> > You mean without dropping out_of_memory() test in kswapd and calling
> > oom_kill() in page fault [i.e. without additional patch]?
> No. I think it's ok for __alloc_pages() to call oom_kill()
> IF we turn out to be out of memory, but that should not even
> be needed.

It's not __alloc_pages() that calls oom_kill() but do_page_fault(). That's not
the same. After the system has tried *really* hard to get *one* free page and
couldn't manage it, why loop forever? To eat CPU while waiting for
out_of_memory() to *guess* when the system is OOM? I don't think so. If
processes can't make progress because the system can't page in any of their
pages, somebody must go.

> Also, when a task in __alloc_pages() is OOM-killed, it will
> have PF_MEMALLOC set and will immediately break out of the
> loop. The rest of the system will spin around in the loop
> until the victim has exited and then their allocations will
> succeed.

Yes, I think this is a problem. With the page fault approach, on OOM a "bad"
process is selected, scheduled and killed, and everybody else runs happily
without even noticing that the system is low on memory. That is fast and
graceful process killing instead of a slow, painful death IF out_of_memory()
correctly detects OOM.

Szaka

2001-04-12 22:24:57

by Valdis Klētnieks

Subject: Re: scheduler went mad?

On Fri, 13 Apr 2001 01:02:21 +0200, Szabolcs Szakacsits said:

> Not __alloc_pages() calls oom_kill() however do_page_fault(). Not the
> same. After the system tried *really* hard to get *one* free page and
> couldn't managed why loop forever? To eat CPU and waiting for

For what it's worth, this *IS NOT* the case I'm getting bit by:

While kswapd was hung, I already had (from /proc/meminfo)

MemFree: 34064 kB

I suspect that kswapd is getting hung spinning on some *specific*
requirement that it's falling short on?

/Valdis
