2004-10-07 21:25:51

by Chris Wright

Subject: kswapd in tight loop 2.6.9-rc3-bk-recent

Is this known? Just came back from lunch, so I've no clue what kicked it
off. Profile below. (2.6.9-rc3-bk from yesterday, pending updates don't
appear to touch vmscan or mm/ in general).

CPU: AMD64 processors, speed 1994.35 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
2410135  53.4092  balance_pgdat
1328186  29.4329  shrink_zone
555121   12.3016  shrink_slab
84942     1.8823  __read_page_state
40770     0.9035  timer_interrupt
...

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net


2004-10-07 21:41:44

by Dave Jones

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

On Thu, Oct 07, 2004 at 02:20:19PM -0700, Chris Wright wrote:
> Is this known? Just came back from lunch, so I've no clue what kicked it
> off. Profile below. (2.6.9-rc3-bk from yesterday, pending updates don't
> appear to touch vmscan or mm/ in general).
>
> CPU: AMD64 processors, speed 1994.35 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
> unit mask
> of 0x00 (No unit mask) count 100000
> samples % symbol name
> 2410135 53.4092 balance_pgdat
> 1328186 29.4329 shrink_zone
> 555121 12.3016 shrink_slab
> 84942 1.8823 __read_page_state
> 40770 0.9035 timer_interrupt

I saw the same thing yesterday, also on an amd64 box though that
could be coincidence. The kswapd1 process was pegging the cpu at 99%;
kswapd0 was idle. After a few minutes, the box became so unresponsive
I had to reboot it.

I had put this down to me fiddling with some patches, and it hasn't
reappeared today yet, but it sounds like we're seeing the same thing.

Sadly, I didn't get a profile of what was happening.
A 'make allmodconfig' triggered it for me, on a box with 2GB of ram,
and 2GB of swap. No swap was in use when things 'went weird', and
there was a bunch of RAM sitting free too (about half a gig if memory
serves correctly)

Dave

2004-10-07 23:41:42

by Andrew Morton

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

Chris Wright <[email protected]> wrote:
>
> Is this known? Just came back from lunch, so I've no clue what kicked it
> off. Profile below. (2.6.9-rc3-bk from yesterday, pending updates don't
> appear to touch vmscan or mm/ in general).
>
> CPU: AMD64 processors, speed 1994.35 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
> unit mask
> of 0x00 (No unit mask) count 100000
> samples % symbol name
> 2410135 53.4092 balance_pgdat
> 1328186 29.4329 shrink_zone
> 555121 12.3016 shrink_slab
> 84942 1.8823 __read_page_state

Oh fanfuckingtastic. Something in there is failing to reach its
termination condition. The code has become a trainwreck, so heaven knows
what it was.

For starters, let's actually use that local variable for something.





We haven't been incrementing local variable total_scanned since the
scan_control stuff went in.

Signed-off-by: Andrew Morton <[email protected]>
---

25-akpm/mm/vmscan.c | 1 +
1 files changed, 1 insertion(+)

diff -puN mm/vmscan.c~vmscan-total_scanned-fix mm/vmscan.c
--- 25/mm/vmscan.c~vmscan-total_scanned-fix Thu Oct 7 16:31:55 2004
+++ 25-akpm/mm/vmscan.c Thu Oct 7 16:31:55 2004
@@ -1054,6 +1054,7 @@ scan:
 			shrink_slab(sc.nr_scanned, GFP_KERNEL, lru_pages);
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			total_reclaimed += sc.nr_reclaimed;
+			total_scanned += sc.nr_scanned;
 			if (zone->all_unreclaimable)
 				continue;
 			if (zone->pages_scanned > zone->present_pages * 2)
_



This probably won't fix it.

It looks like the code will lock up if all zones are out of unreclaimable
memory, but you won't be hitting that.

I also wonder if it'll lock up if just the first zone has ->all_unreclaimable.

I think a good starting point here will be to revert the most recent
change.

2004-10-08 00:40:53

by Nick Piggin

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

Andrew Morton wrote:

>
>This probably won't fix it.
>
>It looks like the code will lock up if all zones are out of unreclaimable
>memory, but you won't be hitting that.
>
>

Out of _reclaimable_ memory?

It shouldn't if all_unreclaimable is being set correctly.

>I also wonder if it'll lock up if just the first zone has ->all_unreclaimable.
>
>

Well, not if the all_unreclaimable flag is set, but if it should be and
isn't, then probably it will lock up.

>I think a good starting point here will be to revert the most recent
>change.
>

That may fix it for the simple fact that kswapd will just go through its
priority loop once then stop.

I think that resetting all_unreclaimable in free_pages_bulk is the wrong
idea though, because that will keep it clear if a bit of kernel memory is
being pinned and freed in the background, won't it?

I had a look and decided that all_unreclaimable should probably be cleared
only if vmscan.c frees some memory - but I couldn't really come up with any
hard numbers to back me up :P

2004-10-08 00:49:26

by Andrew Morton

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent


OK, after backing out the `goto spaghetti;' patch and cleaning up a few
things I'll test the below. It'll make kswapd much less aggressive.



diff -puN mm/vmscan.c~no-wild-kswapd-2 mm/vmscan.c
--- 25/mm/vmscan.c~no-wild-kswapd-2 2004-10-07 17:38:20.342906376 -0700
+++ 25-akpm/mm/vmscan.c 2004-10-07 17:38:20.348905464 -0700
@@ -964,6 +964,17 @@ out:
  * of the number of free pages in the lower zones. This interoperates with
  * the page allocator fallback scheme to ensure that aging of pages is balanced
  * across the zones.
+ *
+ * kswapd can be semi-livelocked if some other process is allocating pages
+ * while kswapd is simultaneously trying to balance the same zone. That's OK,
+ * because we _want_ kswapd to work continuously in this situation. But a
+ * side-effect of kswapd's ongoing work is that the pageout priority keeps on
+ * winding up so we bogusly start doing swapout.
+ *
+ * To fix this we take a snapshot of the number of pages which need to be
+ * reclaimed from each zone in zone->pages_to_reclaim and never reclaim more
+ * pages than that. Once the required number of pages have been reclaimed from
+ * each zone, we're done. kswapd will go back to sleep until someone wakes it.
  */
 static int balance_pgdat(pg_data_t *pgdat, int nr_pages)
 {
@@ -984,6 +995,7 @@ static int balance_pgdat(pg_data_t *pgda
 		struct zone *zone = pgdat->node_zones + i;

 		zone->temp_priority = DEF_PRIORITY;
+		zone->pages_to_reclaim = zone->pages_high - zone->pages_free;
 	}

 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
@@ -1003,7 +1015,7 @@ static int balance_pgdat(pg_data_t *pgda
 						priority != DEF_PRIORITY)
 					continue;

-				if (zone->free_pages <= zone->pages_high) {
+				if (zone->pages_to_reclaim > 0) {
 					end_zone = i;
 					break;
 				}
@@ -1036,10 +1048,11 @@ static int balance_pgdat(pg_data_t *pgda
 			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 				continue;

-			if (nr_pages == 0) {	/* Not software suspend */
-				if (zone->free_pages <= zone->pages_high)
-					all_zones_ok = 0;
-			}
+			if (zone->pages_to_reclaim <= 0)
+				continue;
+
+			if (nr_pages == 0)	/* Not software suspend */
+				all_zones_ok = 0;
 			zone->temp_priority = priority;
 			if (zone->prev_priority > priority)
 				zone->prev_priority = priority;
@@ -1049,6 +1062,10 @@ static int balance_pgdat(pg_data_t *pgda
 			shrink_zone(zone, &sc);
 			reclaim_state->reclaimed_slab = 0;
 			shrink_slab(sc.nr_scanned, GFP_KERNEL, lru_pages);
+
+			/* This fails to account for slab reclaim */
+			zone->pages_to_reclaim -= sc.nr_reclaimed;
+
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			total_reclaimed += sc.nr_reclaimed;
 			total_scanned += sc.nr_scanned;
diff -puN include/linux/mmzone.h~no-wild-kswapd-2 include/linux/mmzone.h
--- 25/include/linux/mmzone.h~no-wild-kswapd-2 2004-10-07 17:38:20.343906224 -0700
+++ 25-akpm/include/linux/mmzone.h 2004-10-07 17:40:20.847586880 -0700
@@ -137,8 +137,9 @@ struct zone {
 	unsigned long		nr_scan_inactive;
 	unsigned long		nr_active;
 	unsigned long		nr_inactive;
-	int			all_unreclaimable; /* All pages pinned */
-	unsigned long		pages_scanned;	   /* since last reclaim */
+	long			pages_to_reclaim;  /* kswapd usage */
+	int			all_unreclaimable; /* All pages pinned */
+	unsigned long		pages_scanned;	   /* since last reclaim */

 	ZONE_PADDING(_pad2_)

_
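
The snapshot idea described in the patch comment can be sketched in isolation. This is a simplified userspace model under stated assumptions, not the kernel code: `struct zone_budget` and both helper functions are hypothetical names, and only the three fields the patch touches are modelled. The budget is signed so that a zone already above its high watermark ends up with a non-positive value.

```c
/* Hypothetical stand-in for the struct zone fields the patch touches. */
struct zone_budget {
	unsigned long free_pages;
	unsigned long pages_high;
	long pages_to_reclaim;	/* signed: <= 0 means nothing left to do */
};

/* Snapshot how far the zone currently sits below its high watermark. */
void snapshot_reclaim_target(struct zone_budget *zone)
{
	zone->pages_to_reclaim = (long)zone->pages_high - (long)zone->free_pages;
}

/*
 * Charge the pages reclaimed in one pass against the snapshot.
 * Returns nonzero while the zone still has budget left to reclaim.
 */
int charge_reclaim(struct zone_budget *zone, unsigned long nr_reclaimed)
{
	zone->pages_to_reclaim -= (long)nr_reclaimed;
	return zone->pages_to_reclaim > 0;
}
```

Because the target is fixed when the snapshot is taken, concurrent allocations that lower free_pages again do not extend the amount of work done in that pass.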

2004-10-08 00:58:17

by Chris Wright

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

* Andrew Morton ([email protected]) wrote:
> Nick Piggin <[email protected]> wrote:
> >
> > >I think a good starting point here will be to revert the most recent
> > >change.
> > >
> >
> > That may fix it for the simple fact that kswapd will just go through its
> > priority loop once then stop.
>
> No it won't. It'll probably make the priority windup worse.

In the interest of data collection, here's the last bit before I reboot.

SysRq : Show Memory
Mem-info:
Node 1 DMA per-cpu: empty
Node 1 Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
Node 1 HighMem per-cpu: empty
Node 0 DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
Node 0 Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
Node 0 HighMem per-cpu: empty

Free pages: 53200kB (0kB HighMem)
Active:239438 inactive:178323 dirty:8 writeback:0 unstable:0 free:13300 slab:76700 mapped:52305 pagetables:2172
Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
protections[]: 0 0 0
Node 1 Normal free:25272kB min:1020kB low:2040kB high:3060kB active:624172kB inactive:282700kB present:1047936kB
protections[]: 0 0 0
Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
protections[]: 0 0 0
Node 0 DMA free:728kB min:12kB low:24kB high:36kB active:788kB inactive:7848kB present:16384kB
protections[]: 0 0 0
Node 0 Normal free:27200kB min:1004kB low:2008kB high:3012kB active:332792kB inactive:422744kB present:1032188kB
protections[]: 0 0 0
Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
protections[]: 0 0 0
Node 1 DMA: empty
Node 1 Normal: 150*4kB 66*8kB 45*16kB 0*32kB 192*64kB 55*128kB 8*256kB 2*512kB 1*1024kB 0*2048kB 0*4096kB = 25272kB
Node 1 HighMem: empty
Node 0 DMA: 8*4kB 1*8kB 1*16kB 13*32kB 4*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 728kB
Node 0 Normal: 10*4kB 3*8kB 2*16kB 3*32kB 308*64kB 43*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 27200kB
Node 0 HighMem: empty
Swap cache: add 79, delete 1, find 0/0, race 0+0
Free swap: 2031292kB
524127 pages of RAM
10946 reserved pages
263444 pages shared
78 pages swap cached

2004-10-08 00:45:54

by Andrew Morton

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

Nick Piggin <[email protected]> wrote:
>
> >I think a good starting point here will be to revert the most recent
> >change.
> >
>
> That may fix it for the simple fact that kswapd will just go through its
> priority loop once then stop.

No it won't. It'll probably make the priority windup worse.

2004-10-08 01:49:00

by Chris Wright

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

* Andrew Morton ([email protected]) wrote:
>
> OK, after backing out the `goto spaghetti;' patch and cleaning up a few
> things I'll test the below. It'll make kswapd much less aggressive.

testing with this compile fix:

diff -u 25-akpm/mm/vmscan.c edited/mm/vmscan.c
--- 25-akpm/mm/vmscan.c 2004-10-07 17:38:20.348905464 -0700
+++ edited/mm/vmscan.c 2004-10-07 18:38:14 -07:00
@@ -999,7 +999,7 @@
 		struct zone *zone = pgdat->node_zones + i;

 		zone->temp_priority = DEF_PRIORITY;
-		zone->pages_to_reclaim = zone->pages_high - zone->pages_free;
+		zone->pages_to_reclaim = zone->pages_high - zone->free_pages;
 	}

 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {

2004-10-08 01:48:59

by Andrew Morton

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

Chris Wright <[email protected]> wrote:
>
> Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
> protections[]: 0 0 0
> Node 1 Normal free:25272kB min:1020kB low:2040kB high:3060kB active:624172kB inactive:282700kB present:1047936kB
> protections[]: 0 0 0
> Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
> protections[]: 0 0 0

hm. empty zones all over the zonelist. We may not be handling that right.

> Node 0 DMA free:728kB min:12kB low:24kB high:36kB active:788kB inactive:7848kB present:16384kB
> protections[]: 0 0 0
> Node 0 Normal free:27200kB min:1004kB low:2008kB high:3012kB active:332792kB inactive:422744kB present:1032188kB
> protections[]: 0 0 0
> Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
> protections[]: 0 0 0

I can't see anything here which would cause kswapd to go nuts.

2004-10-08 01:53:39

by Chris Wright

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

* Chris Wright ([email protected]) wrote:
> * Andrew Morton ([email protected]) wrote:
> >
> > OK, after backing out the `goto spaghetti;' patch and cleaning up a few
> > things I'll test the below. It'll make kswapd much less aggressive.
>
> testing with this compile fix:

passes initial simple testing (whereas I could get the mainline code, and the
one-liner to spin right off).

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-10-07 21:56:09

by Chris Wright

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

* Dave Jones ([email protected]) wrote:
> I saw the same thing yesterday, also on an amd64 box though that
> could be coincidence. The kswapd1 process was pegging the cpu at 99%
> kswapd0 was idle.

Same here.

> After a few minutes, the box became so unresponsive
> I had to reboot it.

Well, that CPU is useless ;-) Otherwise the thing is still responding.
One thing I noticed is the swap usage:

# grep Swap /proc/meminfo
SwapCached: 4 kB
SwapTotal: 2031608 kB
SwapFree: 2031604 kB

So, I'm guessing the first page that went out to swap started this thing
off.

> I had put this down to me fiddling with some patches, and it hasn't
> reappeared today yet, but it sounds like we're seeing the same thing.

I do have some local changes too. But nothing that should trigger this.

> Sadly, I didn't get a profile of what was happening.
> A 'make allmodconfig' triggered it for me, on a box with 2GB of ram,
> and 2GB of swap. No swap was in use when things 'went weird', and
> there was a bunch of RAM sitting free too (about half a gig if memory
> serves correctly)

Memory's a bit tighter here, but still 55M free though.

# cat /proc/meminfo
MemTotal: 2053240 kB
MemFree: 55324 kB
Buffers: 161900 kB
Cached: 1321312 kB
SwapCached: 4 kB
Active: 988476 kB
Inactive: 651820 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 2053240 kB
LowFree: 55564 kB
SwapTotal: 2031608 kB
SwapFree: 2031604 kB
Dirty: 28 kB
Writeback: 0 kB
Mapped: 205600 kB
Slab: 334968 kB
Committed_AS: 581136 kB
PageTables: 8916 kB
VmallocTotal: 536870911 kB
VmallocUsed: 3080 kB
VmallocChunk: 536867727 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-10-08 01:57:45

by Andrew Morton

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

Chris Wright <[email protected]> wrote:
>
> (whereas I could get the mainline code, and the
> one-liner to spin right off).

How? (up to and including .config please).

2004-10-08 02:52:43

by Nick Piggin

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent



Andrew Morton wrote:

>Chris Wright <[email protected]> wrote:
>
>>(whereas I could get the mainline code, and the
>> one-liner to spin right off).
>>
>
>How? (up to and including .config please).
>
>
>

Ah, free_pages <= pages_high, ie. 0 <= 0, which is true;
commence spinning.

How's this go?



Attachments:
vm-fix-empty-zones.patch (1.11 kB)
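
The 0 <= 0 observation can be checked directly against the sysrq-M numbers posted earlier. A minimal sketch, using the kB figures as printed in the dump (the kernel compares page counts, but the comparison comes out the same); `struct zone_kb` and `wants_balancing()` are illustrative names, not kernel identifiers:

```c
/* Hypothetical reduction of a zone to the two values being compared. */
struct zone_kb {
	unsigned long free_kb;
	unsigned long high_kb;
};

/* The watermark test in question: at or below the high watermark. */
int wants_balancing(const struct zone_kb *zone)
{
	return zone->free_kb <= zone->high_kb;
}

/* Values taken from the dump. */
const struct zone_kb node1_dma    = { 0, 0 };		/* empty zone */
const struct zone_kb node1_normal = { 25272, 3060 };	/* well above high */
```

The empty zone always satisfies the test, and no amount of reclaim can change that, whereas Node 1 Normal does not ask for balancing at all.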

2004-10-08 03:16:25

by Chris Wright

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

* Andrew Morton ([email protected]) wrote:
> Chris Wright <[email protected]> wrote:
> >
> > (whereas I could get the mainline code, and the
> > one-liner to spin right off).
>
> How? (up to and including .config please).

I just started up memhog (attached), and waited for swapping to start.
After a bit into swap (nothing happened) I killed it, and did swapoff
in order to start over. This promptly triggered the looping. Then I
tried your one-liner patch. And once memhog pushed pages into swap,
kswapd spun off. In both cases it was quick and easy to reproduce,
although the trigger wasn't identical.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net


Attachments:
(No filename) (705.00 B)
memhog.c (301.00 B)
kswapd.config (28.43 kB)

2004-10-08 03:16:23

by Andrew Morton

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

Nick Piggin <[email protected]> wrote:
>
> >Chris Wright <[email protected]> wrote:
> >
> >>(whereas I could get the mainline code, and the
> >> one-liner to spin right off).
> >>
> >
> >How? (up to and including .config please).
> >
> >
> >
>
> Ah, free_pages <= pages_high, ie. 0 <= 0, which is true;
> commence spinning.

Maybe. It requires that the zonelists be screwy:

Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
protections[]: 0 0 0
Node 1 Normal free:25272kB min:1020kB low:2040kB high:3060kB active:624172kB inactive:282700kB present:1047936kB
protections[]: 0 0 0
Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
protections[]: 0 0 0
Node 0 DMA free:728kB min:12kB low:24kB high:36kB active:788kB inactive:7848kB present:16384kB
protections[]: 0 0 0
Node 0 Normal free:27200kB min:1004kB low:2008kB high:3012kB active:332792kB inactive:422744kB present:1032188kB
protections[]: 0 0 0
Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
protections[]: 0 0 0

See that DMA zone on node 1? Wonder how it got like that. It
should not be inside pgdat->nr_zones anyway.

David, is your setup NUMA? Can you show us a sysrq-M dump?

2004-10-08 03:21:12

by Nick Piggin

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent



Andrew Morton wrote:

>Nick Piggin <[email protected]> wrote:
>
>>>Chris Wright <[email protected]> wrote:
>>>
>> >
>> >>(whereas I could get the mainline code, and the
>> >> one-liner to spin right off).
>> >>
>> >
>> >How? (up to and including .config please).
>> >
>> >
>> >
>>
>> Ah, free_pages <= pages_high, ie. 0 <= 0, which is true;
>> commence spinning.
>>
>
>Maybe. It requires that the zonelists be screwy:
>
>

Note that if this *was* the problem, then it would not be the fault of
my recent patch, rather *every* allocation (when memory is lowish)
would cause kswapd to wind through its priority loop then stop.
Probably not using enough CPU for anyone to really notice though.

2004-10-08 03:28:54

by Nick Piggin

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent



Andrew Morton wrote:

>Nick Piggin <[email protected]> wrote:
>
>>>Chris Wright <[email protected]> wrote:
>>>
>> >
>> >>(whereas I could get the mainline code, and the
>> >> one-liner to spin right off).
>> >>
>> >
>> >How? (up to and including .config please).
>> >
>> >
>> >
>>
>> Ah, free_pages <= pages_high, ie. 0 <= 0, which is true;
>> commence spinning.
>>
>
>Maybe. It requires that the zonelists be screwy:
>
> Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
> protections[]: 0 0 0
> Node 1 Normal free:25272kB min:1020kB low:2040kB high:3060kB active:624172kB inactive:282700kB present:1047936kB
> protections[]: 0 0 0
> Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
> protections[]: 0 0 0
> Node 0 DMA free:728kB min:12kB low:24kB high:36kB active:788kB inactive:7848kB present:16384kB
> protections[]: 0 0 0
> Node 0 Normal free:27200kB min:1004kB low:2008kB high:3012kB active:332792kB inactive:422744kB present:1032188kB
> protections[]: 0 0 0
> Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
> protections[]: 0 0 0
>
>See that DMA zone on node 1? Wonder how it got like that. It
>should not be inside pgdat->nr_zones anyway.
>
>
Oh? Why not? I didn't think empty zones were filtered out? (perhaps
they should be).

>David, is your setup NUMA? Can you show us a sysrq-M dump?
>
>

Yes. In case he's asleep, I'll quote:

"I saw the same thing yesterday, also on an amd64 box though that
could be coincidence. The kswapd1 process was pegging the cpu at 99%
kswapd0 was idle. After a few minutes, the box became so unresponsive"

2004-10-08 03:33:52

by Andrew Morton

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

Nick Piggin <[email protected]> wrote:
>
> >> Ah, free_pages <= pages_high, ie. 0 <= 0, which is true;
> >> commence spinning.
> >>
> >
> >Maybe. It requires that the zonelists be screwy:
> >
> > Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
> > protections[]: 0 0 0
> > Node 1 Normal free:25272kB min:1020kB low:2040kB high:3060kB active:624172kB inactive:282700kB present:1047936kB
> > protections[]: 0 0 0
> > Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
> > protections[]: 0 0 0
> > Node 0 DMA free:728kB min:12kB low:24kB high:36kB active:788kB inactive:7848kB present:16384kB
> > protections[]: 0 0 0
> > Node 0 Normal free:27200kB min:1004kB low:2008kB high:3012kB active:332792kB inactive:422744kB present:1032188kB
> > protections[]: 0 0 0
> > Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
> > protections[]: 0 0 0
> >
> >See that DMA zone on node 1? Wonder how it got like that. It
> >should not be inside pgdat->nr_zones anyway.
> >
> >
> Oh? Why not? I didn't think empty zones were filtered out?

That uninitialised zone should be outside pgdat->nr_zones, no?

> (perhaps they should be).

They will be.

Chris, do you have time to test this, against -linus?

diff -puN mm/vmscan.c~vmscan-handle-empty-zones mm/vmscan.c
--- 25/mm/vmscan.c~vmscan-handle-empty-zones 2004-10-07 19:10:52.844797784 -0700
+++ 25-akpm/mm/vmscan.c 2004-10-07 19:11:49.804138648 -0700
@@ -851,6 +851,9 @@ shrink_caches(struct zone **zones, struc
 	for (i = 0; zones[i] != NULL; i++) {
 		struct zone *zone = zones[i];

+		if (zone->present_pages == 0)
+			continue;
+
 		zone->temp_priority = sc->priority;
 		if (zone->prev_priority > sc->priority)
 			zone->prev_priority = sc->priority;
@@ -999,6 +1002,9 @@ static int balance_pgdat(pg_data_t *pgda
 			for (i = pgdat->nr_zones - 1; i >= 0; i--) {
 				struct zone *zone = pgdat->node_zones + i;

+				if (zone->present_pages == 0)
+					continue;
+
 				if (zone->all_unreclaimable &&
 						priority != DEF_PRIORITY)
 					continue;
@@ -1033,6 +1039,9 @@ static int balance_pgdat(pg_data_t *pgda
 		for (i = 0; i <= end_zone; i++) {
 			struct zone *zone = pgdat->node_zones + i;

+			if (zone->present_pages == 0)
+				continue;
+
 			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 				continue;

_
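
The effect of the first balance_pgdat() hunk can be sketched as a standalone function. This is a simplified model under stated assumptions: `struct zone_state` and `find_end_zone()` are hypothetical names, the all_unreclaimable/priority test is left out, and the watermark test is the `<=` comparison quoted earlier in the thread.

```c
/* Hypothetical stand-in for the struct zone fields this scan looks at. */
struct zone_state {
	unsigned long free_pages;
	unsigned long pages_high;
	unsigned long present_pages;
};

/*
 * Scan from the highest zone down and return the index of the first zone
 * that is at or below its high watermark, or -1 if none needs work.
 * Zones with no memory at all are skipped before the watermark test.
 */
int find_end_zone(const struct zone_state *zones, int nr_zones)
{
	int i;

	for (i = nr_zones - 1; i >= 0; i--) {
		const struct zone_state *zone = &zones[i];

		if (zone->present_pages == 0)
			continue;	/* empty zone: nothing to balance */

		if (zone->free_pages <= zone->pages_high)
			return i;
	}
	return -1;
}
```

With the Node 1 figures from the sysrq-M dump converted to 4kB pages (DMA: 0/0/0, Normal: 6318 free, 765 high, 261984 present), the scan finds nothing to do; without the present_pages test the empty DMA zone would be selected on every pass.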

2004-10-08 03:59:26

by Nick Piggin

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent



Andrew Morton wrote:

>Nick Piggin <[email protected]> wrote:
>
>>>>Ah, free_pages <= pages_high, ie. 0 <= 0, which is true;
>>>>
>> >> commence spinning.
>> >>
>> >
>> >Maybe. It requires that the zonelists be screwy:
>> >
>> > Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
>> > protections[]: 0 0 0
>> > Node 1 Normal free:25272kB min:1020kB low:2040kB high:3060kB active:624172kB inactive:282700kB present:1047936kB
>> > protections[]: 0 0 0
>> > Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
>> > protections[]: 0 0 0
>> > Node 0 DMA free:728kB min:12kB low:24kB high:36kB active:788kB inactive:7848kB present:16384kB
>> > protections[]: 0 0 0
>> > Node 0 Normal free:27200kB min:1004kB low:2008kB high:3012kB active:332792kB inactive:422744kB present:1032188kB
>> > protections[]: 0 0 0
>> > Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
>> > protections[]: 0 0 0
>> >
>> >See that DMA zone on node 1? Wonder how it got like that. It
>> >should not be inside pgdat->nr_zones anyway.
>> >
>> >
>> Oh? Why not? I didn't think empty zones were filtered out?
>>
>
>That uninitialised zone should be outside pgdat->nr_zones, no?
>
>
>> (perhaps they should be).
>>
>
>They will be.
>

I don't think the DMA side will be skipped though? Because you're
always counting down from nr_zones... That would be what it is
spinning on.

>
>Chris, do you have time to test this, against -linus?
>
>

I think my patch would be sufficient to handle the kswapd side
(yours would be valid too, but no need to add the extra checks).

>diff -puN mm/vmscan.c~vmscan-handle-empty-zones mm/vmscan.c
>--- 25/mm/vmscan.c~vmscan-handle-empty-zones 2004-10-07 19:10:52.844797784 -0700
>+++ 25-akpm/mm/vmscan.c 2004-10-07 19:11:49.804138648 -0700
>@@ -851,6 +851,9 @@ shrink_caches(struct zone **zones, struc
> for (i = 0; zones[i] != NULL; i++) {
> struct zone *zone = zones[i];
>
>+ if (zone->present_pages == 0)
>+ continue;
>+
> zone->temp_priority = sc->priority;
> if (zone->prev_priority > sc->priority)
> zone->prev_priority = sc->priority;
>

...but could probably use something like this as well, for the
direct reclaim side.

2004-10-08 04:48:58

by Nick Piggin

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

Nick Piggin wrote:

>
> I think my patch would be sufficient to handle the kswapd side
> (yours would be valid too, but no need to add the extra checks).

Here is the combined patch with Andrew's hunk added.

I guess it doesn't _really_ matter which gets tested, but this
one is probably the preferred way to go because it doesn't even
wake up kswapd for empty zones. Andrew, do you agree?

I guess it should get into -bk pretty soon if it passes testing.
It is fairly easy to see the failure cases that are fixed (and
hopefully it doesn't domino yet another latent bug :P).


Attachments:
vm-fix-empty-zones.patch (1.41 kB)

2004-10-08 04:58:55

by Andrew Morton

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

Nick Piggin <[email protected]> wrote:
>
> - if (zone->free_pages <= zone->pages_high) {
> + if (zone->free_pages < zone->pages_high) {

This is too subtle. An open-coded test for non-zero ->present_pages is far
more readable and maintainable.

2004-10-08 05:21:35

by Chris Wright

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

* Andrew Morton ([email protected]) wrote:
> Chris, do you have time to test this, against -linus?

Yeah. This patch held up against the simple testing, as did Nick's (not
the most recent combined one from him).

Here's some sample output from annotation I added:

balance_pgdat:1050: zone->DMA present_pages == 0
balance_pgdat:1050: zone->DMA present_pages == 0
balance_pgdat:1050: zone->DMA present_pages == 0
balance_pgdat:1050: zone->DMA present_pages == 0
balance_pgdat:1013: zone->DMA present_pages == 0
balance_pgdat:1050: zone->DMA present_pages == 0
balance_pgdat:1050: zone->DMA present_pages == 0
balance_pgdat:1050: zone->DMA present_pages == 0

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-10-08 05:27:18

by Chris Wright

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

* Chris Wright ([email protected]) wrote:
> * Andrew Morton ([email protected]) wrote:
> > Chris, do you have time to test this, against -linus?
>
> Yeah. This patch held up against the simple testing, as did Nick's (not
> the most recent combined one from him).

err, meaning: I have not tested Nick's most recent combined patch.

2004-10-08 10:12:30

by Nick Piggin

Subject: Re: kswapd in tight loop 2.6.9-rc3-bk-recent

Chris Wright wrote:
> * Andrew Morton ([email protected]) wrote:
>
>>Chris, do you have time to test this, against -linus?
>
>
> Yeah. This patch held up against the simple testing, as did Nick's (not
> the most recent combined one from him).
>

Thanks. Any/all patches should do much the same job.

I'm pretty confident this was just a minor artifact brought out
by an earlier change, and things still look good for 2.6.9.