2006-12-04 17:08:39

by Horst H. von Brand

Subject: Re: la la la la ... swappiness

Aucoin <[email protected]> wrote:

[...]

> The definition of perfectly good here may be up for debate or
> someone can explain it to me. This perfectly good data was
> cached under the tar yet hours after the tar has completed the
> pages are still cached.

That means that there isn't a need for that memory at all (and so they stay
around; why actively delete data (using up resources!) needlessly when it
would be a win to have them around in the (admittedly remote) case they'll
be needed again?), or the whole memory handling in Linux is very broken.
I'd vote for the former, i.e., your problems have nothing to do with memory
pressure and swapping. That would explain why your maneuvers didn't make a
difference...

In any case, how do you know it is the tar data that stays around, and not
just that the number of pages "in use" stays roughly constant?
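
One way to answer that question by measurement rather than inference is
mincore(2), which reports which pages of a mapping are resident in memory.
The sketch below is a minimal illustration, assuming Linux and a regular file
the caller can read; the helper name resident_pages() is hypothetical and
appears nowhere in this thread.

```c
#define _DEFAULT_SOURCE      /* for mincore() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of `path` are currently
 * resident in the page cache. Stores the file's total page count in
 * *total_pages. Returns the resident count, or -1 on error.
 */
long resident_pages(const char *path, long *total_pages)
{
    assert(path != NULL && total_pages != NULL);

    long result = -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0)
        goto out;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + page - 1) / page;
    *total_pages = (long)npages;

    if (npages == 0) {           /* empty file: nothing to map */
        result = 0;
        goto out;
    }

    /* Mapping the file does not read it in; it only lets mincore() look. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    unsigned char *vec = malloc(npages);
    if (vec != NULL && mincore(map, len, vec) == 0) {
        result = 0;
        for (size_t i = 0; i < npages; i++)
            result += vec[i] & 1;    /* low bit set means resident */
    }
    free(vec);
    munmap(map, len);
out:
    close(fd);
    return result;
}
```

Running this over the files tar extracted, right after the tar and again
hours later, would show whether it is really those pages that stay cached.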

Please explain again:

- What you are doing, step by step
- What are your exact requirements
- In what exact way is it misbehaving. Please tell /in detail/ how you
  determine the real behaviour, not your deductions.

[Yes, I'm in my "dense" day today.]
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria +56 32 2654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 2797513


2006-12-04 17:49:50

by Aucoin

Subject: RE: la la la la ... swappiness



> From: Horst H. von Brand [mailto:[email protected]]
> That means that there isn't a need for that memory at all (and so they

In the current isolated non-production, not actually bearing a load test
case yes. But if I can't get it to not swap on an idle system I have no hope
of avoiding OOM on a loaded system.

> In any case, how do you know it is the tar data that stays around, and not
> just that the number of pages "in use" stays roughly constant?

I'm not dumping the contents of memory so I don't.

> - What you are doing, step by step

Trying to deliver a high availability, linearly scalable, clustered iSCSI
storage solution that can be upgraded with minimum downtime.

> - What are your exact requirements

OOM not to kill anything.

> - In what exact way is it misbehaving. Please tell /in detail/ how you

OOM kills important stuff.


2006-12-04 18:08:25

by Andrew Morton

Subject: Re: la la la la ... swappiness

On Mon, 04 Dec 2006 14:07:22 -0300
"Horst H. von Brand" <[email protected]> wrote:

> Please explain again:
>
> - What you are doing, step by step

That 2GB machine apparently has a 1.6GB shm segment which is mlocked. That will
cause the VM to do one heck of a lot of pointless scanning and could, I guess,
cause false oom decisions. It's also an ia32 highmem machine, which adds to the
fun.

We could scan more:

--- a/mm/vmscan.c~a
+++ a/mm/vmscan.c
@@ -918,6 +918,7 @@ static unsigned long shrink_zone(int pri
 	 * slowly sift through the active list.
 	 */
 	zone->nr_scan_active += (zone->nr_active >> priority) + 1;
+	zone->nr_scan_active *= 2;
 	nr_active = zone->nr_scan_active;
 	if (nr_active >= sc->swap_cluster_max)
 		zone->nr_scan_active = 0;
@@ -925,6 +926,7 @@ static unsigned long shrink_zone(int pri
 		nr_active = 0;

 	zone->nr_scan_inactive += (zone->nr_inactive >> priority) + 1;
+	zone->nr_scan_inactive *= 2;
 	nr_inactive = zone->nr_scan_inactive;
 	if (nr_inactive >= sc->swap_cluster_max)
 		zone->nr_scan_inactive = 0;
_

but that's rather dumb. Better would be to remove mlocked pages from the
LRU.

2006-12-04 18:15:57

by Christoph Lameter

Subject: Re: la la la la ... swappiness

On Mon, 4 Dec 2006, Andrew Morton wrote:

> but that's rather dumb. Better would be to remove mlocked pages from the
> LRU.

Could we generalize the removal of sections of a zone from the LRU? I
believe this would help various buffer allocation schemes. We have some
issues with heavy LRU scans if large buffers are allocated on some
nodes.

2006-12-04 18:39:47

by Jeffrey Hundstad

Subject: Re: la la la la ... swappiness

Hello,

Please forgive me if this is naive. It seems that you could recompile
your tar and patch commands to use the posix_fadvise(2) call with the
POSIX_FADV_NOREUSE flags. It seems these would cause the tar and patch
commands to not clutter the page cache at all.

It'd be nice to be able to make a wrapper out of this, kind of like the
fakeroot(1) command, such as:

nocachesuck tar xvfz kernel.tar.gz

ya know what I mean?

--
Jeffrey Hundstad
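
A minimal sketch of what the suggested change might look like inside a
program that reads files, assuming a POSIX system. The helper name
open_noreuse() is hypothetical. The hint is advisory only: as far as I know,
the Linux posix_fadvise(2) man page describes POSIX_FADV_NOREUSE as a no-op
on kernels from 2.6.18 onward, so it may have no measurable effect there.

```c
#define _POSIX_C_SOURCE 200809L   /* for posix_fadvise() */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` read-only and tell the kernel that the
 * data will be read once and not reused. Returns the file descriptor, or
 * -1 if the open itself fails.
 */
int open_noreuse(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /*
     * offset 0 with len 0 covers the whole file. The return value is
     * deliberately ignored: the call is only a hint, and a failure here
     * should not stop the caller from reading the file.
     */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
    return fd;
}
```

A wrapper in the style of fakeroot(1) would have to interpose on open() in
the target program (for example via LD_PRELOAD) and apply the same hint
there; that part is not shown.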

2006-12-04 18:44:15

by Tim Schmielau

Subject: RE: la la la la ... swappiness

On Mon, 4 Dec 2006, Aucoin wrote:

> > From: Horst H. von Brand [mailto:[email protected]]
> > That means that there isn't a need for that memory at all (and so they
>
> In the current isolated non-production, not actually bearing a load test
> case yes. But if I can't get it to not swap on an idle system I have no hope
> of avoiding OOM on a loaded system.

I don't think that assumption is correct. If you have no load on your
system and the pages in the shared application cache are not actually
touched, it is perfectly reasonable for the kernel to push out these
unused pages to swap space to have even more RAM available (e.g. for
caching the pages more recently accessed by the tar and patch commands).

I believe your OOM problem is not connected to these observations. There
might be a problem in the handling of OOM situations in Linux. But before
coming to that conclusion, I would suggest trying your simulated software
upgrade scenario with plenty of swap space available and without playing
any tricks with MM settings.

Tim

2006-12-04 21:26:29

by Aucoin

Subject: RE: la la la la ... swappiness

> From: Jeffrey Hundstad [mailto:[email protected]]
> POSIX_FADV_NOREUSE flags. It seems these would cause the tar and patch

I may be naïve as well, but that sounds interesting. Unless someone knows
of an obvious reason this won't work we can make a one-off tar command and
give it a whirl.


2006-12-04 21:30:04

by Aucoin

Subject: RE: la la la la ... swappiness

> From: Tim Schmielau [mailto:[email protected]]
> I believe your OOM problem is not connected to these observations. There

I don't know what to tell you except oom fires only when the update runs. I
know it's a pitiful datapoint so I'll work on getting more data.


2006-12-04 21:45:13

by Andrew Morton

Subject: Re: la la la la ... swappiness

On Mon, 4 Dec 2006 15:25:47 -0600
"Aucoin" <[email protected]> wrote:

> > From: Jeffrey Hundstad [mailto:[email protected]]
> > POSIX_FADV_NOREUSE flags. It seems these would cause the tar and patch
>
> I may be naïve as well, but that sounds interesting. Unless someone knows
> of an obvious reason this won't work we can make a one-off tar command and
> give it a whirl.
>

Well if altering tar is an option then sure, a
sync_file_range(SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER) followed
by fadvise(POSIX_FADV_DONTNEED) will free the memory up again.
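
A minimal sketch of that sequence in C, assuming Linux with glibc's
sync_file_range() wrapper. The helper name write_and_drop() is hypothetical
and not part of tar. It writes a buffer to a file, starts writeback and waits
for it to finish, then asks the kernel to drop the now-clean pages. The
"fadvise" above is spelled posix_fadvise() in userspace.

```c
#define _GNU_SOURCE          /* for sync_file_range() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes from `buf` to `path`, then flush
 * the file's dirty pages and drop them from the page cache.
 * Returns 0 on success, -1 on error with errno set.
 */
int write_and_drop(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int err;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Write everything, coping with short writes and EINTR. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += n;
        left -= (size_t)n;
    }

    /* offset 0, nbytes 0: start writeback for the whole file and wait. */
    if (sync_file_range(fd, 0, 0,
                        SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER) != 0)
        goto fail;

    /*
     * Only clean pages can be dropped, hence the sync first.
     * posix_fadvise() returns the error number instead of setting errno.
     */
    err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (err != 0) {
        errno = err;
        goto fail;
    }
    return close(fd);

fail:
    err = errno;             /* preserve the original error across close() */
    close(fd);
    errno = err;
    return -1;
}
```

Note that sync_file_range() does not flush file metadata, so this is a way
to release cache memory, not a substitute for fsync() where durability
matters.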