2000-11-02 07:23:42

by Wayne Whitney

Subject: [BUG?] two swapping processes freeze 2.4.0-test10 (but not 2.2.18pre19)


Hi,

My group runs computations on a small linux cluster of RedHat 7.0 dual
PIII-700's with 512MB RAM/512MB swap. We have been experiencing some
lockups/poor performance on 2.4.x kernels when running two computations at
once. I've narrowed it down to a reproducible problem under 2.4.0-test10
(gcc 2.91.66), using a test computation whose memory footprint grows to
about 830MB over the course of a few minutes:

I simultaneously run "top d1" and two of the test computations. All is
well (top updates smoothly) until physical RAM is exhausted. However, as
soon as swap is touched, then top freezes and does not update. In this
state, I can switch virtual consoles but not login to a new one; the
machine is pingable but does not respond to ssh. Once swap is exhausted,
the OOM killer kicks in and kills one of the test computations; then all
is well and everything works as expected.
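
In case it helps anyone reproduce this without our software: the real
job is a MAGMA computation (the magma.exe in the logs below), but I
would expect any memory hog with a similar footprint to hit the same
path. A rough stand-in (file name, sizes and timing are made up, not
the actual workload) would be something like this, run twice alongside
"top d1":

/* memhog.c -- rough stand-in for the real computation: grows to about
 * 830MB in 10MB steps, touching every page so the memory is really
 * committed, then keeps re-touching it to hold the working set hot. */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define TARGET_MB 830
#define CHUNK_MB  10

int main(void)
{
    char *chunks[TARGET_MB / CHUNK_MB];
    int n = 0, i;

    while (n < TARGET_MB / CHUNK_MB) {
        chunks[n] = malloc(CHUNK_MB * 1024 * 1024);
        if (!chunks[n])
            break;
        memset(chunks[n], 1, CHUNK_MB * 1024 * 1024);
        n++;
        sleep(1);   /* grow over a minute or two, like the real job */
    }
    for (;;)        /* keep the whole footprint recently used */
        for (i = 0; i < n; i++)
            memset(chunks[i], 1, CHUNK_MB * 1024 * 1024);
    return 0;
}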

A few observations/comments:

(1) Under 2.2.18pre19, this problem does not occur. Even while swapping,
top, sshd, etc work fine.

(2) If I run only one process, this problem does not occur.

(3) I sometimes find one of the machines frozen in the morning after
running two computations overnight (pingable, no ssh or console
switching). The last time this happened (under 2.4.0-test10-pre6/gcc 2.96
[before I knew better]) there were some unusual log messages, which I've
attached below in case this is related.

(4) I noticed a recent message on the kernel mailing list that I thought
might be the same problem:

On Wed, 1 Nov 2000, Rik van Riel wrote (in "Re: [BUG] /proc/<pid>/stat
access stalls badly for swapping process, 2.4.0-test10"):

> I have one possible reason for this ....
>
> 1) the procfs process does (in fs/proc/array.c::proc_pid_stat)
> down(&mm->mmap_sem);
>
> 2) but, in order to do that, it has to wait until the process
> it is trying to stat has /finished/ its page fault, and is
> not into its next one ...
>
> 3) combine this with the elevator starvation stuff (ask Jens
> Axboe for blk-7 to alleviate this issue) and you have a
> scenario where processes using /proc/<pid>/stat have the
> possibility to block on multiple processes that are in the
> process of handling a page fault (but are being starved)
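
To check my own understanding of that scenario, it seems analogous to
the following userspace sketch (a made-up analogy, compile with
-lpthread; the mutex stands in for mmap_sem, one thread for the
faulting process, the other for top reading /proc/<pid>/stat):

/* One thread ("the faulting process") holds the lock while doing slow
 * "I/O"; the other ("top") blocks on the same lock and looks frozen
 * until the first one finishes. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mmap_sem = PTHREAD_MUTEX_INITIALIZER;

static void *page_fault(void *arg)
{
    pthread_mutex_lock(&mmap_sem);   /* like down(&mm->mmap_sem) in the fault */
    sleep(10);                       /* starved swap I/O */
    pthread_mutex_unlock(&mmap_sem);
    return NULL;
}

static void *stat_reader(void *arg)
{
    pthread_mutex_lock(&mmap_sem);   /* like proc_pid_stat's down() */
    printf("got the stats\n");       /* only after the fault completes */
    pthread_mutex_unlock(&mmap_sem);
    return NULL;
}

int main(void)
{
    pthread_t fault, reader;

    pthread_create(&fault, NULL, page_fault, NULL);
    sleep(1);                        /* let the "fault" grab the lock first */
    pthread_create(&reader, NULL, stat_reader, NULL);
    pthread_join(fault, NULL);
    pthread_join(reader, NULL);
    return 0;
}

If that is the right picture, then blk-7 (or anything else that keeps
the fault's I/O from being starved) would shorten the stall rather than
remove the dependency on mmap_sem itself.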

Any help would be greatly appreciated. I am not on the kernel mailing
list but read it daily via an archive, so please cc: me if you'd like a
timely response.

Best wishes,
Wayne


Oct 30 15:35:27 mf2 kernel: eed.
Oct 30 15:35:27 mf2 kernel: __alloc_pages: 0-order allocation failed.
Oct 30 15:35:27 mf2 last message repeated 363 times
Oct 30 19:33:53 mf2 kernel: Out of Memory: Killed process 1485 (magma.exe).eed.
Oct 30 19:33:54 mf2 kernel: __alloc_pages: 0-order allocation failed.
Oct 30 19:33:54 mf2 kernel: __alloc_pages: 0-order allocation failed.
Oct 30 19:33:54 mf2 kernel: __alloc_pages: 0-order allocatied.
Oct 30 19:33:54 mf2 kernel: __alloc_pages: 0-order allocation failed.
Oct 30 19:33:54 mf2 last message repeated 363 times
Oct 31 01:52:46 mf2 kernel: <ed.
Oct 31 01:52:46 mf2 kernel: __alloc_pages: 0-order allocation failed.
Oct 31 01:52:46 mf2 last message repeated 363 times
Oct 31 02:47:53 mf2 kernel: eed.
Oct 31 02:47:57 mf2 kernel: __alloc_pages: 0-order allocation failed.
Oct 31 02:47:58 mf2 last message repeated 89 times
Oct 31 02:47:58 mf2 kernel: __alloc_pages: 0-order allocation faied.
Oct 31 02:47:58 mf2 kernel: __alloc_pages: 0-order allocation failed.
Oct 31 02:47:58 mf2 last message repeated 363 times
Oct 31 12:01:46 mf2 kernel: Out of Memory: Killed process 1691 (magma.exe).<3>__alloc_pages: 0-order allocation failed.
Oct 31 12:01:47 mf2 kernel: __alloc_pages: 0-order allocation failed.
Oct 31 12:01:47 mf2 last message repeated 89 times
Oct 31 12:01:47 mf2 kernel: ed.
Oct 31 12:01:47 mf2 kernel: __alloc_pages: 0-order allocation failed.
Oct 31 12:01:47 mf2 last message repeated 363 times
Oct 31 16:02:36 mf2 kernel: <ed.
Oct 31 16:02:45 mf2 kernel: __alloc_pages: 0-order allocation failed.
Oct 31 16:02:46 mf2 last message repeated 89 times
Oct 31 16:02:46 mf2 kernel: __alloc_pages: 0-order allocation faied.
Oct 31 16:02:46 mf2 kernel: __alloc_pages: 0-order allocation failed.
Oct 31 16:02:46 mf2 last message repeated 363 times
Oct 31 16:03:03 mf2 kernel: <ed.
Oct 31 16:03:03 mf2 kernel: __alloc_pages: 0-order allocation failed.
Oct 31 16:03:04 mf2 last message repeated 89 times
Oct 31 16:03:04 mf2 kernel: __alloc_pages: 0-order allocation faied.
Oct 31 16:03:04 mf2 kernel: __alloc_pages: 0-order allocation failed.


2000-11-02 08:12:36

by Wayne Whitney

Subject: Re: [BUG?] two swapping processes freeze 2.4.0-test10 (but not 2.2.18pre19)

On Wed, 1 Nov 2000, Wayne Whitney wrote:

> On Wed, 1 Nov 2000, Rik van Riel wrote (in "Re: [BUG] /proc/<pid>/stat
> access stalls badly for swapping process, 2.4.0-test10"):
>
> > 3) combine this with the elevator starvation stuff (ask Jens
> > Axboe for blk-7 to alleviate this issue) and you have a
> > scenario where processes using /proc/<pid>/stat have the
> > possibility to block on multiple processes that are in the
> > process of handling a page fault (but are being starved)

I just tried patching 2.4.0-test10 with Jens Axboe's blk-7 patch for
2.4.0-test10-pre6. It applied cleanly, apart from "Hunk #1 succeeded at
855 (offset -20 lines)" on mm/filemap.c. Unfortunately, it had no
effect on the behavior I described . . .

Best wishes,
Wayne



2000-11-04 01:32:23

by Bernd Eckenfels

Subject: Re: [BUG?] two swapping processes freeze 2.4.0-test10 (but not 2.2.18pre19)

In article <Pine.LNX.4.21.0011012222210.1296-100000@shimura.math.berkeley.edu> you wrote:
> I simultaneously run "top d1" and two of the test computations. All is
> well (top updates smoothly) until physical RAM is exhausted. However, as
> soon as swap is touched, then top freezes and does not update. In this
> state, I can switch virtual consoles but not login to a new one; the
> machine is pingable but does not respond to ssh. Once swap is exhausted,
> the OOM killer kicks in and kills one of the test computations; then all
> is well and everything works as expected.

Yes, I described the same behaviour. Rik answered that the swap-out
situation stalls other processes, since under memory shortage the
growing process will also page out other active processes' pages.

I suggested a fix: once the system is so short of memory that it starts
paging out very recently used pages, it should page out only the
growing process's own pages.

This would have the effect that, as long as there are unused pages in
the system, a growing process can still page out other processes' "idle"
pages. But as soon as the growing process can only evict pages that
other processes are actively using (which would in turn page their own
pages back in, ...), the page-out strategy should change to evicting
only the oldest pages of the growing process. That way the swap penalty
falls entirely on the growing process.
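
Very roughly, the decision I have in mind looks like this (just a
sketch of the policy with made-up names, not a patch against the real
swap-out code):

/* Sketch of the proposed victim selection; all names are made up. */
#include <stdio.h>

enum victim { ANY_IDLE_PAGE, OWN_OLDEST_PAGE };

/* Things the VM would already know at scan time. */
struct vm_state {
    int idle_pages_left;        /* pages nobody has touched recently */
    int stealing_recent_pages;  /* set once only hot pages are left  */
};

/* While genuinely idle pages exist, the growing process may steal
 * them.  Once only recently used pages of other processes remain,
 * make the grower evict its own oldest pages, so the swap penalty
 * stays on the grower. */
static enum victim choose_victim(const struct vm_state *vm)
{
    if (vm->idle_pages_left > 0 && !vm->stealing_recent_pages)
        return ANY_IDLE_PAGE;
    return OWN_OLDEST_PAGE;
}

int main(void)
{
    struct vm_state early = { 1000, 0 }, late = { 0, 1 };

    printf("plenty of idle pages: %s\n",
           choose_victim(&early) == ANY_IDLE_PAGE
           ? "steal idle pages" : "evict own oldest pages");
    printf("only hot pages left:  %s\n",
           choose_victim(&late) == ANY_IDLE_PAGE
           ? "steal idle pages" : "evict own oldest pages");
    return 0;
}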

Greetings
Bernd