2004-03-01 08:26:07

by Mike Fedyk

Subject: MM VM patches was: 2.6.3-mm4

Andrew Morton wrote:
> shrink_slab-for-all-zones.patch
> vm: scan slab in response to highmem scanning
>
> zone-balancing-fix.patch
> vmscan: zone balancing fix

On 2.6.3 + [1] + nfsd-lofft.patch running on a 1GB RAM file server, I
have noticed two related issues.

First, under 2.6.3 it averages about 90MB[2] anon memory, and 30MB with
the -mm4 vm (the rest is in swap cache). This could balance out on the
normal non-idle week-day load though...

Second, with the -mm4 vm there is a lot more swapping[3] going on during
the daily updatedb and backup runs that are performed on this machine.
I'd have to call this second issue a regression, but I want to run it a
couple more days to see if it gets any better (unless you agree of course).

Mike

[1]
instrument-highmem-page-reclaim.patch
blk_congestion_wait-return-remaining.patch
vmscan-remove-priority.patch
kswapd-throttling-fixes.patch
vm-dont-rotate-active-list.patch
vm-lru-info.patch
vm-shrink-zone.patch
vm-tune-throttle.patch
shrink_slab-for-all-zones.patch
zone-balancing-fix.patch
zone-balancing-batching.patch

[2]
http://www.matchmail.com/stats/lrrd/matchmail.com/fileserver.matchmail.com-memory.html

[3]
http://www.matchmail.com/stats/lrrd/matchmail.com/fileserver.matchmail.com-swap.html


2004-03-01 08:49:41

by Nick Piggin

Subject: Re: MM VM patches was: 2.6.3-mm4



Mike Fedyk wrote:

> Andrew Morton wrote:
>
>> shrink_slab-for-all-zones.patch
>> vm: scan slab in response to highmem scanning
>>
>> zone-balancing-fix.patch
>> vmscan: zone balancing fix
>
>
> On 2.6.3 + [1] + nfsd-lofft.patch running on a 1GB RAM file server,
> I have noticed two related issues.
>
> First, under 2.6.3 it averages about 90MB[2] anon memory, and 30MB
> with the -mm4 vm (the rest is in swap cache). This could balance out
> on the normal non-idle week-day load though...
>
> Second, with the -mm4 vm there is a lot more swapping[3] going on during
> the daily updatedb and backup runs that are performed on this machine.
> I'd have to call this second issue a regression, but I want to run it
> a couple more days to see if it gets any better (unless you agree of
> course).
>

There are a few things backed out now in 2.6.4-rc1-mm1, and quite a
few other changes. I hope we can trouble you to test 2.6.4-rc1-mm1?

Tell me, do you have highmem enabled on this system? If so, swapping
might be explained by the batching patch. With it, a small highmem
zone could possibly place quite a lot more pressure on a large
ZONE_NORMAL.

2.6.4-rc1-mm1 should do much better here.
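
For illustration, here is a toy model (plain userspace C, not kernel code;
the zone sizes and watermarks are made-up numbers roughly matching a 1GB
i386 box) of the effect described above. It follows the old balance_pgdat
rule: zones are scanned highmem->normal->dma, and once a higher zone is
found below pages_high, the lower zones are scanned as well, regardless
of their own free pages.

#include <stdio.h>

struct zone {
	const char *name;
	long present_pages;	/* zone size, in 4K pages */
	long free_pages;
	long pages_high;	/* "enough free pages" watermark */
	long scanned;		/* pages scanned so far in this model */
};

int main(void)
{
	/* node_zones[] order: 0 = DMA, 1 = Normal, 2 = HighMem */
	struct zone zones[] = {
		{ "DMA",      4096,   512,  128, 0 },
		{ "Normal", 225280,  2048, 1024, 0 },
		{ "HighMem", 32768,   256,  512, 0 },
	};
	int nzones = 3, pass, i;

	for (pass = 0; pass < 10; pass++) {
		int pressure = 0;

		/* scan highmem -> normal -> dma, as kswapd does */
		for (i = nzones - 1; i >= 0; i--) {
			if (zones[i].free_pages <= zones[i].pages_high)
				pressure = 1;
			if (!pressure)
				continue;
			/*
			 * Once a higher zone is below pages_high, the
			 * lower zones get scanned too, whatever their own
			 * free_pages: the tiny, chronically low HighMem
			 * zone keeps pulling the big Normal zone into
			 * reclaim on every pass.
			 */
			zones[i].scanned += zones[i].present_pages >> 6;
		}
	}
	for (i = 0; i < nzones; i++)
		printf("%-8s size=%7ld pages  scanned=%6ld pages\n",
		       zones[i].name, zones[i].present_pages,
		       zones[i].scanned);
	return 0;
}

After a few passes the 128MB HighMem zone, which never reaches its
watermark, has dragged the much larger Normal zone through far more
scanning than its own state would justify; that is the kind of pressure
the batching patch could amplify.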

2004-03-01 09:08:36

by Mike Fedyk

Subject: Re: MM VM patches was: 2.6.3-mm4

Nick Piggin wrote:
>
>
> Mike Fedyk wrote:
>
>> Andrew Morton wrote:
>>
>>> shrink_slab-for-all-zones.patch
>>> vm: scan slab in response to highmem scanning
>>>
>>> zone-balancing-fix.patch
>>> vmscan: zone balancing fix
>>
>>
>>
>> On 2.6.3 + [1] + nfsd-lofft.patch running on a 1GB RAM file server,
>> I have noticed two related issues.
>>
>> First, under 2.6.3 it averages about 90MB[2] anon memory, and 30MB
>> with the -mm4 vm (the rest is in swap cache). This could balance out
>> on the normal non-idle week-day load though...
>>
>> Second, with the -mm4 vm there is a lot more swapping[3] going on during
>> the daily updatedb and backup runs that are performed on this machine.
>> I'd have to call this second issue a regression, but I want to run it
>> a couple more days to see if it gets any better (unless you agree of
>> course).
>>
>
> There are a few things backed out now in 2.6.4-rc1-mm1, and quite a
> few other changes. I hope we can trouble you to test 2.6.4-rc1-mm1?

Yes, I saw that, but since I wasn't using the new code, I chose to keep
it in the "-mm4" thread. :-D

I'll backport it to 2.6.3 if it doesn't patch with "-F3"...

> Tell me, do you have highmem enabled on this system? If so, swapping

Yes, to get that extra 128MB ram. :)

> might be explained by the batching patch. With it, a small highmem
> zone could possibly place quite a lot more pressure on a large
> ZONE_NORMAL.
>
> 2.6.4-rc1-mm1 should do much better here.

OK, I'll give that one a shot Monday or Tuesday night.

So, I'll merge up 2.6.3 + "vm of rc1-mm1" and tell you guys what I see.

Are the graphs helpful at all?

Mike

2004-03-01 09:11:06

by Nick Piggin

Subject: Re: MM VM patches was: 2.6.3-mm4



Nick Piggin wrote:

>
> There are a few things backed out now in 2.6.4-rc1-mm1, and quite a
> few other changes. I hope we can trouble you to test 2.6.4-rc1-mm1?
>
> Tell me, do you have highmem enabled on this system? If so, swapping
> might be explained by the batching patch. With it, a small highmem
> zone could possibly place quite a lot more pressure on a large
> ZONE_NORMAL.
>
> 2.6.4-rc1-mm1 should do much better here.


Gah no. It would have the same problem actually, if that is indeed
what is happening.

It will take a bit more work to solve this in rc1-mm1. You would
probably want to explicitly use incremental min limits for kswapd.

(background info in kswapd-avoid-higher-zones.patch)
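
For reference, a stripped-down sketch of what "incremental min" means on
the allocator side (the struct and the exact arithmetic below are
simplifications, not the real mm/page_alloc.c code): each zone in the
fallback list adds its own watermark to a running minimum, and once the
allocation falls through to a lower zone, that zone also has to keep
extra headroom scaled by sysctl_lower_zone_protection.

#include <stddef.h>

struct zone {
	unsigned long free_pages;
	unsigned long pages_low;
};

/*
 * Simplified model of the "incremental min" walk over a zonelist,
 * which is ordered highmem -> normal -> dma and NULL-terminated.
 * Not the real page allocator code; just the shape of the idea.
 */
struct zone *pick_zone(struct zone **zonelist, unsigned int order,
		       unsigned long lower_zone_protection)
{
	unsigned long min = 1UL << order;	/* pages needed */
	int i;

	for (i = 0; zonelist[i] != NULL; i++) {
		struct zone *z = zonelist[i];

		min += z->pages_low;
		if (z->free_pages > min)
			return z;	/* enough headroom in this zone */

		/* falling back: lower zones must keep extra free pages */
		min += z->pages_low * lower_zone_protection;
	}
	return NULL;	/* everything too low: caller would start reclaim */
}

The patch posted below teaches balance_pgdat() to accumulate min the same
way, so kswapd frees the lower zones harder when the higher zones are low
too, instead of judging each zone against its own pages_high in isolation.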

2004-03-01 09:27:43

by Nick Piggin

Subject: Re: MM VM patches was: 2.6.3-mm4



Mike Fedyk wrote:

> Nick Piggin wrote:
>
>>
>>
>> There are a few things backed out now in 2.6.4-rc1-mm1, and quite a
>> few other changes. I hope we can trouble you to test 2.6.4-rc1-mm1?
>
>
> Yes, I saw that, but since I wasn't using the new code, I chose to
> keep it in the "-mm4" thread. :-D
>
> I'll backport it to 2.6.3 if it doesn't patch with "-F3"...
>

Actually, see my other post. It is possible you'll have the same
problem.

>> Tell me, do you have highmem enabled on this system? If so, swapping
>
>
> Yes, to get that extra 128MB ram. :)
>


Yeah, that's fine. I think this would be the right thing to do,
especially for a file server. It is something that should work.


>> might be explained by the batching patch. With it, a small highmem
>> zone could possibly place quite a lot more pressure on a large
>> ZONE_NORMAL.
>>
>> 2.6.4-rc1-mm1 should do much better here.
>
>
> OK, I'll give that one a shot Monday or Tuesday night.
>
> So, I'll merge up 2.6.3 + "vm of rc1-mm1" and tell you guys what I see.
>

I'm not so hopeful for you anymore :P

> Are the graphs helpful at all?
>


My eyes! The goggles, they do nothing!

They have a lot of good info but I'm a bit hard pressed working
out what kernel is running where, and it's a bit hard working out
all the shades of blue on my crappy little monitor.

But if they were easier to read I reckon they'd be useful ;)

2004-03-01 09:48:00

by Mike Fedyk

Subject: Re: MM VM patches was: 2.6.3-mm4

Nick Piggin wrote:
>
>
> Mike Fedyk wrote:
>> So, I'll merge up 2.6.3 + "vm of rc1-mm1" and tell you guys what I see.
>>
>
> I'm not so hopeful for you anymore :P

These patches apply with only a few offsets if you apply them in the
order given in the series file, so there's not much work for either of us
(unless I need to test without a dependent patch or something obvious
like that...)

>
>> Are the graphs helpful at all?
>>
>
>
> My eyes! The goggles, they do nothing!
>

Heh.

> They have a lot of good info but I'm a bit hard pressed working
> out what kernel is running where

Suffice it to say, 2.6.3 is the beginning of week 9, and 2.6.3-lofft-mm4vm
is the end of week 9. The graphs weren't meant to keep secondary
information like kernel version...

> and it's a bit hard working out
> all the shades of blue on my crappy little monitor.

Yeah, I see what you mean. The code in the lrrd/munin project controls
what colors come in what order, but I can control what order the info is
output in...

> But if they were easier to read I reckon they'd be useful ;)

I'd like for that to be true, especially since I rewrote the memory
plugin for munin to graph as much as was exported to userspace from the
Linux kernel...

Did I miss anything? ;)

2004-03-01 09:52:55

by Nick Piggin

Subject: [PATCH] 2.6.4-rc1-mm1: vm-kswapd-incremental-min (was Re: MM VM patches was: 2.6.3-mm4)

linux-2.6-npiggin/mm/vmscan.c | 36 ++++++++++++++++++++++++------------
1 files changed, 24 insertions(+), 12 deletions(-)

diff -puN mm/vmscan.c~vm-kswapd-incremental-min mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-kswapd-incremental-min 2004-03-01 20:29:18.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c 2004-03-01 20:44:26.000000000 +1100
@@ -889,6 +889,8 @@ out:
 	return ret;
 }
 
+extern int sysctl_lower_zone_protection;
+
 /*
  * For kswapd, balance_pgdat() will work across all this node's zones until
  * they are all at pages_high.
@@ -907,12 +909,9 @@ out:
  * dead and from now on, only perform a short scan. Basically we're polling
  * the zone for when the problem goes away.
  *
- * kswapd scans the zones in the highmem->normal->dma direction. It skips
- * zones which have free_pages > pages_high, but once a zone is found to have
- * free_pages <= pages_high, we scan that zone and the lower zones regardless
- * of the number of free pages in the lower zones. This interoperates with
- * the page allocator fallback scheme to ensure that aging of pages is balanced
- * across the zones.
+ * balance_pgdat tries to coexist with the INFAMOUS "incremental min" by
+ * trying to free lower zones a bit harder if higher zones are low too.
+ * See mm/page_alloc.c
  */
 static int balance_pgdat(pg_data_t *pgdat, int nr_pages, struct page_state *ps)
 {
@@ -930,24 +929,37 @@ static int balance_pgda
 	}
 
 	for (priority = DEF_PRIORITY; priority; priority--) {
+		unsigned long min;
 		int all_zones_ok = 1;
 		int pages_scanned = 0;
+		min = 0; /* Shut up gcc */
 
-		for (i = pgdat->nr_zones - 1; i >= 0; i--) {
+		for (i = 0; i < pgdat->nr_zones; i++) {
 			struct zone *zone = pgdat->node_zones + i;
 			int total_scanned = 0;
 			int max_scan;
 			int reclaimed;
 
-			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
-				continue;
-
 			if (nr_pages == 0) { /* Not software suspend */
-				if (zone->free_pages <= zone->pages_high)
-					all_zones_ok = 0;
+				/* "incremental min" right here */
 				if (all_zones_ok)
+					min = zone->pages_high;
+				else
+					min += zone->pages_high;
+
+				if (zone->free_pages <= min)
+					all_zones_ok = 0;
+				else
 					continue;
+
+				min += zone->pages_high *
+					sysctl_lower_zone_protection;
 			}
+
+			/* Note: this is checked *after* min is incremented */
+			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
+				continue;
+
 			zone->temp_priority = priority;
 			max_scan = zone->nr_inactive >> priority;
 			reclaimed = shrink_zone(zone, max_scan, GFP_KERNEL,

_


Attachments:
vm-kswapd-incremental-min.patch (2.52 kB)

2004-03-01 10:18:10

by Andrew Morton

Subject: Re: [PATCH] 2.6.4-rc1-mm1: vm-kswapd-incremental-min (was Re: MM VM patches was: 2.6.3-mm4)

Nick Piggin <[email protected]> wrote:
>
> Andrew, I think you had kswapd scanning in the direction opposite the
> one indicated by your comments. Or maybe I've just confused myself?
>

Nope, the node_zones[] array is indexed by

#define ZONE_DMA 0
#define ZONE_NORMAL 1
#define ZONE_HIGHMEM 2

Whereas the zonelist.zones[] array goes the other way:
[[[highmem,]normal,]dma,]NULL.

It's a complete dog's breakfast that stuff. Hard to understand, not very
logical and insufficiently commented.
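
A minimal sketch of the two orderings (the structs are stripped down to
the bare minimum; the real definitions live in include/linux/mmzone.h):

#include <stdio.h>

#define ZONE_DMA	0
#define ZONE_NORMAL	1
#define ZONE_HIGHMEM	2
#define MAX_NR_ZONES	3

struct zone { const char *name; };

struct pglist_data {
	struct zone node_zones[MAX_NR_ZONES];	/* indexed low -> high */
	int nr_zones;
};

int main(void)
{
	struct pglist_data pgdat = {
		.node_zones = { { "DMA" }, { "Normal" }, { "HighMem" } },
		.nr_zones = 3,
	};
	/* zonelist for a highmem allocation: [[[highmem,]normal,]dma,]NULL */
	struct zone *zonelist[] = {
		&pgdat.node_zones[ZONE_HIGHMEM],
		&pgdat.node_zones[ZONE_NORMAL],
		&pgdat.node_zones[ZONE_DMA],
		NULL,
	};
	int i;

	printf("kswapd walks node_zones[] downward: ");
	for (i = pgdat.nr_zones - 1; i >= 0; i--)
		printf("%s ", pgdat.node_zones[i].name);

	printf("\nallocator walks the zonelist forward: ");
	for (i = 0; zonelist[i] != NULL; i++)
		printf("%s ", zonelist[i]->name);
	printf("\n");
	return 0;
}

Both loops visit HighMem first; node_zones[] just has to be walked
backwards to do it, which is what makes the two conventions easy to mix up.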

2004-03-01 10:30:12

by Nick Piggin

Subject: Re: [PATCH] 2.6.4-rc1-mm1: vm-kswapd-incremental-min (was Re: MM VM patches was: 2.6.3-mm4)

linux-2.6-npiggin/mm/vmscan.c | 34 +++++++++++++++++++++++-----------
1 files changed, 23 insertions(+), 11 deletions(-)

diff -puN mm/vmscan.c~vm-kswapd-incremental-min mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-kswapd-incremental-min 2004-03-01 20:29:18.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c 2004-03-01 21:27:24.000000000 +1100
@@ -889,6 +889,8 @@ out:
 	return ret;
 }
 
+extern int sysctl_lower_zone_protection;
+
 /*
  * For kswapd, balance_pgdat() will work across all this node's zones until
  * they are all at pages_high.
@@ -907,12 +909,9 @@ out:
  * dead and from now on, only perform a short scan. Basically we're polling
  * the zone for when the problem goes away.
  *
- * kswapd scans the zones in the highmem->normal->dma direction. It skips
- * zones which have free_pages > pages_high, but once a zone is found to have
- * free_pages <= pages_high, we scan that zone and the lower zones regardless
- * of the number of free pages in the lower zones. This interoperates with
- * the page allocator fallback scheme to ensure that aging of pages is balanced
- * across the zones.
+ * balance_pgdat tries to coexist with the INFAMOUS "incremental min" by
+ * trying to free lower zones a bit harder if higher zones are low too.
+ * See mm/page_alloc.c
  */
 static int balance_pgdat(pg_data_t *pgdat, int nr_pages, struct page_state *ps)
 {
@@ -930,8 +929,10 @@ static int balance_pgda
 	}
 
 	for (priority = DEF_PRIORITY; priority; priority--) {
+		unsigned long min;
 		int all_zones_ok = 1;
 		int pages_scanned = 0;
+		min = 0; /* Shut up gcc */
 
 		for (i = pgdat->nr_zones - 1; i >= 0; i--) {
 			struct zone *zone = pgdat->node_zones + i;
@@ -939,15 +940,26 @@ static int balance_pgda
 			int max_scan;
 			int reclaimed;
 
-			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
-				continue;
-
 			if (nr_pages == 0) { /* Not software suspend */
-				if (zone->free_pages <= zone->pages_high)
-					all_zones_ok = 0;
+				/* "incremental min" right here */
 				if (all_zones_ok)
+					min = zone->pages_high;
+				else
+					min += zone->pages_high;
+
+				if (zone->free_pages <= min)
+					all_zones_ok = 0;
+				else
 					continue;
+
+				min += zone->pages_high *
+					sysctl_lower_zone_protection;
 			}
+
+			/* Note: this is checked *after* min is incremented */
+			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
+				continue;
+
 			zone->temp_priority = priority;
 			max_scan = zone->nr_inactive >> priority;
 			reclaimed = shrink_zone(zone, max_scan, GFP_KERNEL,

_


Attachments:
vm-kswapd-incremental-min.patch (2.51 kB)