Is this known? Just came back from lunch, so I've no clue what kicked it
off. Profile below. (2.6.9-rc3-bk from yesterday, pending updates don't
appear to touch vmscan or mm/ in general).
CPU: AMD64 processors, speed 1994.35 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask
of 0x00 (No unit mask) count 100000
samples % symbol name
2410135 53.4092 balance_pgdat
1328186 29.4329 shrink_zone
555121 12.3016 shrink_slab
84942 1.8823 __read_page_state
40770 0.9035 timer_interrupt
...
thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net
On Thu, Oct 07, 2004 at 02:20:19PM -0700, Chris Wright wrote:
> Is this known? Just came back from lunch, so I've no clue what kicked it
> off. Profile below. (2.6.9-rc3-bk from yesterday, pending updates don't
> appear to touch vmscan or mm/ in general).
>
> CPU: AMD64 processors, speed 1994.35 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
> unit mask
> of 0x00 (No unit mask) count 100000
> samples % symbol name
> 2410135 53.4092 balance_pgdat
> 1328186 29.4329 shrink_zone
> 555121 12.3016 shrink_slab
> 84942 1.8823 __read_page_state
> 40770 0.9035 timer_interrupt
I saw the same thing yesterday, also on an amd64 box, though that
could be coincidence. The kswapd1 process was pegging the cpu at 99%;
kswapd0 was idle. After a few minutes, the box became so unresponsive
I had to reboot it.
I had put this down to me fiddling with some patches, and it hasn't
reappeared today yet, but it sounds like we're seeing the same thing.
Sadly, I didn't get a profile of what was happening.
A 'make allmodconfig' triggered it for me, on a box with 2GB of ram,
and 2GB of swap. No swap was in use when things 'went weird', and
there was a bunch of RAM sitting free too (about half a gig if memory
serves correctly)
Dave
Chris Wright <[email protected]> wrote:
>
> Is this known? Just came back from lunch, so I've no clue what kicked it
> off. Profile below. (2.6.9-rc3-bk from yesterday, pending updates don't
> appear to touch vmscan or mm/ in general).
>
> CPU: AMD64 processors, speed 1994.35 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
> unit mask
> of 0x00 (No unit mask) count 100000
> samples % symbol name
> 2410135 53.4092 balance_pgdat
> 1328186 29.4329 shrink_zone
> 555121 12.3016 shrink_slab
> 84942 1.8823 __read_page_state
Oh fanfuckingtastic. Something in there is failing to reach its
termination condition. The code has become a trainwreck, so heaven knows
what it was.
For starters, let's actually use that local variable for something.
We haven't been incrementing local variable total_scanned since the
scan_control stuff went in.
Signed-off-by: Andrew Morton <[email protected]>
---
25-akpm/mm/vmscan.c | 1 +
1 files changed, 1 insertion(+)
diff -puN mm/vmscan.c~vmscan-total_scanned-fix mm/vmscan.c
--- 25/mm/vmscan.c~vmscan-total_scanned-fix Thu Oct 7 16:31:55 2004
+++ 25-akpm/mm/vmscan.c Thu Oct 7 16:31:55 2004
@@ -1054,6 +1054,7 @@ scan:
shrink_slab(sc.nr_scanned, GFP_KERNEL, lru_pages);
sc.nr_reclaimed += reclaim_state->reclaimed_slab;
total_reclaimed += sc.nr_reclaimed;
+ total_scanned += sc.nr_scanned;
if (zone->all_unreclaimable)
continue;
if (zone->pages_scanned > zone->present_pages * 2)
_
This probably won't fix it.
It looks like the code will lock up if all zones are out of unreclaimable
memory, but you won't be hitting that.
I also wonder if it'll lock up if just the first zone has ->all_unreclaimable.
I think a good starting point here will be to revert the most recent
change.
Andrew Morton wrote:
>
>This probably won't fix it.
>
>It looks like the code will lock up if all zones are out of unreclaimable
>memory, but you won't be hitting that.
>
>
Out of _reclaimable_ memory?
It shouldn't if all_unreclaimable is being set correctly.
>I also wonder if it'll lock up if just the first zone has ->all_unreclaimable.
>
>
Well, not if the all_unreclaimable flag is set, but if it should be
and isn't, then probably it will lock up.
>I think a good starting point here will be to revert the most recent
>change.
>
That may fix it for the simple fact that kswapd will just go through its
priority loop once then stop.
I think that resetting all_unreclaimable in free_pages_bulk is the wrong
idea though, because that will keep it clear if a bit of kernel memory is
being pinned and freed in the background, won't it?
I had a look and decided that all_unreclaimable should probably be cleared
only if vmscan.c frees some memory - but I couldn't really come up with any
hard numbers to back me up :P
OK, after backing out the `goto spaghetti;' patch and cleaning up a few
things I'll test the below. It'll make kswapd much less aggressive.
diff -puN mm/vmscan.c~no-wild-kswapd-2 mm/vmscan.c
--- 25/mm/vmscan.c~no-wild-kswapd-2 2004-10-07 17:38:20.342906376 -0700
+++ 25-akpm/mm/vmscan.c 2004-10-07 17:38:20.348905464 -0700
@@ -964,6 +964,17 @@ out:
* of the number of free pages in the lower zones. This interoperates with
* the page allocator fallback scheme to ensure that aging of pages is balanced
* across the zones.
+ *
+ * kswapd can be semi-livelocked if some other process is allocating pages
+ * while kswapd is simultaneously trying to balance the same zone. That's OK,
+ * because we _want_ kswapd to work continuously in this situation. But a
+ * side-effect of kswapd's ongoing work is that the pageout priority keeps on
+ * winding up so we bogusly start doing swapout.
+ *
+ * To fix this we take a snapshot of the number of pages which need to be
+ * reclaimed from each zone in zone->pages_to_reclaim and never reclaim more
+ * pages than that. Once the required number of pages have been reclaimed from
+ * each zone, we're done. kswapd will go back to sleep until someone wakes it.
*/
static int balance_pgdat(pg_data_t *pgdat, int nr_pages)
{
@@ -984,6 +995,7 @@ static int balance_pgdat(pg_data_t *pgda
struct zone *zone = pgdat->node_zones + i;
zone->temp_priority = DEF_PRIORITY;
+ zone->pages_to_reclaim = zone->pages_high - zone->pages_free;
}
for (priority = DEF_PRIORITY; priority >= 0; priority--) {
@@ -1003,7 +1015,7 @@ static int balance_pgdat(pg_data_t *pgda
priority != DEF_PRIORITY)
continue;
- if (zone->free_pages <= zone->pages_high) {
+ if (zone->pages_to_reclaim > 0) {
end_zone = i;
break;
}
@@ -1036,10 +1048,11 @@ static int balance_pgdat(pg_data_t *pgda
if (zone->all_unreclaimable && priority != DEF_PRIORITY)
continue;
- if (nr_pages == 0) { /* Not software suspend */
- if (zone->free_pages <= zone->pages_high)
- all_zones_ok = 0;
- }
+ if (zone->pages_to_reclaim <= 0)
+ continue;
+
+ if (nr_pages == 0) /* Not software suspend */
+ all_zones_ok = 0;
zone->temp_priority = priority;
if (zone->prev_priority > priority)
zone->prev_priority = priority;
@@ -1049,6 +1062,10 @@ static int balance_pgdat(pg_data_t *pgda
shrink_zone(zone, &sc);
reclaim_state->reclaimed_slab = 0;
shrink_slab(sc.nr_scanned, GFP_KERNEL, lru_pages);
+
+ /* This fails to account for slab reclaim */
+ zone->pages_to_reclaim -= sc.nr_reclaimed;
+
sc.nr_reclaimed += reclaim_state->reclaimed_slab;
total_reclaimed += sc.nr_reclaimed;
total_scanned += sc.nr_scanned;
diff -puN include/linux/mmzone.h~no-wild-kswapd-2 include/linux/mmzone.h
--- 25/include/linux/mmzone.h~no-wild-kswapd-2 2004-10-07 17:38:20.343906224 -0700
+++ 25-akpm/include/linux/mmzone.h 2004-10-07 17:40:20.847586880 -0700
@@ -137,8 +137,9 @@ struct zone {
unsigned long nr_scan_inactive;
unsigned long nr_active;
unsigned long nr_inactive;
- int all_unreclaimable; /* All pages pinned */
- unsigned long pages_scanned; /* since last reclaim */
+ long pages_to_reclaim; /* kswapd usage */
+ int all_unreclaimable; /* All pages pinned */
+ unsigned long pages_scanned; /* since last reclaim */
ZONE_PADDING(_pad2_)
_
* Andrew Morton ([email protected]) wrote:
> Nick Piggin <[email protected]> wrote:
> >
> > >I think a good starting point here will be to revert the most recent
> > >change.
> > >
> >
> > That may fix it for the simple fact that kswapd will just go through its
> > priority loop once then stop.
>
> No it won't. It'll probably make the priority windup worse.
In the interest of data collection, here's the last bit before I reboot.
SysRq : Show Memory
Mem-info:
Node 1 DMA per-cpu: empty
Node 1 Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
Node 1 HighMem per-cpu: empty
Node 0 DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
Node 0 Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
Node 0 HighMem per-cpu: empty
Free pages: 53200kB (0kB HighMem)
Active:239438 inactive:178323 dirty:8 writeback:0 unstable:0 free:13300 slab:76700 mapped:52305 pagetables:2172
Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
protections[]: 0 0 0
Node 1 Normal free:25272kB min:1020kB low:2040kB high:3060kB active:624172kB inactive:282700kB present:1047936kB
protections[]: 0 0 0
Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
protections[]: 0 0 0
Node 0 DMA free:728kB min:12kB low:24kB high:36kB active:788kB inactive:7848kB present:16384kB
protections[]: 0 0 0
Node 0 Normal free:27200kB min:1004kB low:2008kB high:3012kB active:332792kB inactive:422744kB present:1032188kB
protections[]: 0 0 0
Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
protections[]: 0 0 0
Node 1 DMA: empty
Node 1 Normal: 150*4kB 66*8kB 45*16kB 0*32kB 192*64kB 55*128kB 8*256kB 2*512kB 1*1024kB 0*2048kB 0*4096kB = 25272kB
Node 1 HighMem: empty
Node 0 DMA: 8*4kB 1*8kB 1*16kB 13*32kB 4*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 728kB
Node 0 Normal: 10*4kB 3*8kB 2*16kB 3*32kB 308*64kB 43*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 27200kB
Node 0 HighMem: empty
Swap cache: add 79, delete 1, find 0/0, race 0+0
Free swap: 2031292kB
524127 pages of RAM
10946 reserved pages
263444 pages shared
78 pages swap cached
Nick Piggin <[email protected]> wrote:
>
> >I think a good starting point here will be to revert the most recent
> >change.
> >
>
> That may fix it for the simple fact that kswapd will just go through its
> priority loop once then stop.
No it won't. It'll probably make the priority windup worse.
* Andrew Morton ([email protected]) wrote:
>
> OK, after backing out the `goto spaghetti;' patch and cleaning up a few
> things I'll test the below. It'll make kswapd much less aggressive.
testing with this compile fix:
diff -u 25-akpm/mm/vmscan.c edited/mm/vmscan.c
--- 25-akpm/mm/vmscan.c 2004-10-07 17:38:20.348905464 -0700
+++ edited/mm/vmscan.c 2004-10-07 18:38:14 -07:00
@@ -999,7 +999,7 @@
struct zone *zone = pgdat->node_zones + i;
zone->temp_priority = DEF_PRIORITY;
- zone->pages_to_reclaim = zone->pages_high - zone->pages_free;
+ zone->pages_to_reclaim = zone->pages_high - zone->free_pages;
}
for (priority = DEF_PRIORITY; priority >= 0; priority--) {
Chris Wright <[email protected]> wrote:
>
> Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
> protections[]: 0 0 0
> Node 1 Normal free:25272kB min:1020kB low:2040kB high:3060kB active:624172kB inactive:282700kB present:1047936kB
> protections[]: 0 0 0
> Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
> protections[]: 0 0 0
hm. empty zones all over the zonelist. We may not be handling that right.
> Node 0 DMA free:728kB min:12kB low:24kB high:36kB active:788kB inactive:7848kB present:16384kB
> protections[]: 0 0 0
> Node 0 Normal free:27200kB min:1004kB low:2008kB high:3012kB active:332792kB inactive:422744kB present:1032188kB
> protections[]: 0 0 0
> Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
> protections[]: 0 0 0
I can't see anything here which would cause kswapd to go nuts.
* Chris Wright ([email protected]) wrote:
> * Andrew Morton ([email protected]) wrote:
> >
> > OK, after backing out the `goto spaghetti;' patch and cleaning up a few
> > things I'll test the below. It'll make kswapd much less aggressive.
>
> testing with this compile fix:
passes initial simple testing (whereas I could get the mainline code, and the
one-liner to spin right off).
thanks,
-chris
* Dave Jones ([email protected]) wrote:
> I saw the same thing yesterday, also on an amd64 box though that
> could be coincidence. The kswapd1 process was pegging the cpu at 99%
> kswapd0 was idle.
Same here.
> After a few minutes, the box became so unresponsive
> I had to reboot it.
Well, that CPU is useless ;-) Otherwise the thing is still responding.
One thing I noticed is the swap usage:
# grep Swap proc/meminfo
SwapCached: 4 kB
SwapTotal: 2031608 kB
SwapFree: 2031604 kB
So, I'm guessing the first page that went out to swap started this thing
off.
> I had put this down to me fiddling with some patches, and it hasn't
> reappeared today yet, but it sounds like we're seeing the same thing.
I do have some local changes too. But nothing that should trigger this.
> Sadly, I didn't get a profile of what was happening.
> A 'make allmodconfig' triggered it for me, on a box with 2GB of ram,
> and 2GB of swap. No swap was in use when things 'went weird', and
> there was a bunch of RAM sitting free too (about half a gig if memory
> serves correctly)
Memory's a bit tighter here, but still 55M free though.
# cat /proc/meminfo
MemTotal: 2053240 kB
MemFree: 55324 kB
Buffers: 161900 kB
Cached: 1321312 kB
SwapCached: 4 kB
Active: 988476 kB
Inactive: 651820 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 2053240 kB
LowFree: 55564 kB
SwapTotal: 2031608 kB
SwapFree: 2031604 kB
Dirty: 28 kB
Writeback: 0 kB
Mapped: 205600 kB
Slab: 334968 kB
Committed_AS: 581136 kB
PageTables: 8916 kB
VmallocTotal: 536870911 kB
VmallocUsed: 3080 kB
VmallocChunk: 536867727 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
thanks,
-chris
Chris Wright <[email protected]> wrote:
>
> (whereas I could get the mainline code, and the
> one-liner to spin right off).
How? (up to and including .config please).
Andrew Morton wrote:
>Chris Wright <[email protected]> wrote:
>
>>(whereas I could get the mainline code, and the
>> one-liner to spin right off).
>>
>
>How? (up to and including .config please).
>
>
>
Ah, free_pages <= pages_high, ie. 0 <= 0, which is true;
commence spinning.
How's this go?
* Andrew Morton ([email protected]) wrote:
> Chris Wright <[email protected]> wrote:
> >
> > (whereas I could get the mainline code, and the
> > one-liner to spin right off).
>
> How? (up to and including .config please).
I just started up memhog (attached), and waited for swapping to start.
After it got a bit into swap (nothing happened) I killed it, and did swapoff
in order to start over. This promptly triggered the looping. Then I
tried your one-liner patch. And once memhog pushed pages into swap,
kswapd spun off. In both cases it was quick and easy to reproduce,
although the trigger wasn't identical.
thanks,
-chris
Nick Piggin <[email protected]> wrote:
>
> >Chris Wright <[email protected]> wrote:
> >
> >>(whereas I could get the mainline code, and the
> >> one-liner to spin right off).
> >>
> >
> >How? (up to and including .config please).
> >
> >
> >
>
> Ah, free_pages <= pages_high, ie. 0 <= 0, which is true;
> commence spinning.
Maybe. It requires that the zonelists be screwy:
Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
protections[]: 0 0 0
Node 1 Normal free:25272kB min:1020kB low:2040kB high:3060kB active:624172kB inactive:282700kB present:1047936kB
protections[]: 0 0 0
Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
protections[]: 0 0 0
Node 0 DMA free:728kB min:12kB low:24kB high:36kB active:788kB inactive:7848kB present:16384kB
protections[]: 0 0 0
Node 0 Normal free:27200kB min:1004kB low:2008kB high:3012kB active:332792kB inactive:422744kB present:1032188kB
protections[]: 0 0 0
Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
protections[]: 0 0 0
See that DMA zone on node 1? Wonder how it got like that. It
should not be inside pgdat->nr_zones anyway.
David, is your setup NUMA? Can you show us a sysrq-M dump?
Andrew Morton wrote:
>Nick Piggin <[email protected]> wrote:
>
>>>Chris Wright <[email protected]> wrote:
>>>
>> >
>> >>(whereas I could get the mainline code, and the
>> >> one-liner to spin right off).
>> >>
>> >
>> >How? (up to and including .config please).
>> >
>> >
>> >
>>
>> Ah, free_pages <= pages_high, ie. 0 <= 0, which is true;
>> commence spinning.
>>
>
>Maybe. It requires that the zonelists be screwy:
>
>
Note that if this *was* the problem, then it would not be the fault of
my recent patch, rather *every* allocation (when memory is lowish)
would cause kswapd to wind through its priority loop then stop.
Probably not using enough CPU for anyone to really notice though.
Andrew Morton wrote:
>Nick Piggin <[email protected]> wrote:
>
>>>Chris Wright <[email protected]> wrote:
>>>
>> >
>> >>(whereas I could get the mainline code, and the
>> >> one-liner to spin right off).
>> >>
>> >
>> >How? (up to and including .config please).
>> >
>> >
>> >
>>
>> Ah, free_pages <= pages_high, ie. 0 <= 0, which is true;
>> commence spinning.
>>
>
>Maybe. It requires that the zonelists be screwy:
>
> Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
> protections[]: 0 0 0
> Node 1 Normal free:25272kB min:1020kB low:2040kB high:3060kB active:624172kB inactive:282700kB present:1047936kB
> protections[]: 0 0 0
> Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
> protections[]: 0 0 0
> Node 0 DMA free:728kB min:12kB low:24kB high:36kB active:788kB inactive:7848kB present:16384kB
> protections[]: 0 0 0
> Node 0 Normal free:27200kB min:1004kB low:2008kB high:3012kB active:332792kB inactive:422744kB present:1032188kB
> protections[]: 0 0 0
> Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
> protections[]: 0 0 0
>
>See that DMA zone on node 1? Wonder how it got like that. It
>should not be inside pgdat->nr_zones anyway.
>
>
Oh? Why not? I didn't think empty zones were filtered out? (perhaps
they should be).
>David, is your setup NUMA? Can you show us a sysrq-M dump?
>
>
Yes. In case he's asleep, I'll quote:
"I saw the same thing yesterday, also on an amd64 box though that
could be coincidence. The kswapd1 process was pegging the cpu at 99%
kswapd0 was idle. After a few minutes, the box became so unresponsive"
Nick Piggin <[email protected]> wrote:
>
> >> Ah, free_pages <= pages_high, ie. 0 <= 0, which is true;
> >> commence spinning.
> >>
> >
> >Maybe. It requires that the zonelists be screwy:
> >
> > Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
> > protections[]: 0 0 0
> > Node 1 Normal free:25272kB min:1020kB low:2040kB high:3060kB active:624172kB inactive:282700kB present:1047936kB
> > protections[]: 0 0 0
> > Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
> > protections[]: 0 0 0
> > Node 0 DMA free:728kB min:12kB low:24kB high:36kB active:788kB inactive:7848kB present:16384kB
> > protections[]: 0 0 0
> > Node 0 Normal free:27200kB min:1004kB low:2008kB high:3012kB active:332792kB inactive:422744kB present:1032188kB
> > protections[]: 0 0 0
> > Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
> > protections[]: 0 0 0
> >
> >See that DMA zone on node 1? Wonder how it got like that. It
> >should not be inside pgdat->nr_zones anyway.
> >
> >
> Oh? Why not? I didn't think empty zones were filtered out?
That uninitialised zone should be outside pgdat->nr_zones, no?
> (perhaps they should be).
They will be.
Chris, do you have time to test this, against -linus?
diff -puN mm/vmscan.c~vmscan-handle-empty-zones mm/vmscan.c
--- 25/mm/vmscan.c~vmscan-handle-empty-zones 2004-10-07 19:10:52.844797784 -0700
+++ 25-akpm/mm/vmscan.c 2004-10-07 19:11:49.804138648 -0700
@@ -851,6 +851,9 @@ shrink_caches(struct zone **zones, struc
for (i = 0; zones[i] != NULL; i++) {
struct zone *zone = zones[i];
+ if (zone->present_pages == 0)
+ continue;
+
zone->temp_priority = sc->priority;
if (zone->prev_priority > sc->priority)
zone->prev_priority = sc->priority;
@@ -999,6 +1002,9 @@ static int balance_pgdat(pg_data_t *pgda
for (i = pgdat->nr_zones - 1; i >= 0; i--) {
struct zone *zone = pgdat->node_zones + i;
+ if (zone->present_pages == 0)
+ continue;
+
if (zone->all_unreclaimable &&
priority != DEF_PRIORITY)
continue;
@@ -1033,6 +1039,9 @@ static int balance_pgdat(pg_data_t *pgda
for (i = 0; i <= end_zone; i++) {
struct zone *zone = pgdat->node_zones + i;
+ if (zone->present_pages == 0)
+ continue;
+
if (zone->all_unreclaimable && priority != DEF_PRIORITY)
continue;
_
Andrew Morton wrote:
>Nick Piggin <[email protected]> wrote:
>
>>>>Ah, free_pages <= pages_high, ie. 0 <= 0, which is true;
>>>>
>> >> commence spinning.
>> >>
>> >
>> >Maybe. It requires that the zonelists be screwy:
>> >
>> > Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
>> > protections[]: 0 0 0
>> > Node 1 Normal free:25272kB min:1020kB low:2040kB high:3060kB active:624172kB inactive:282700kB present:1047936kB
>> > protections[]: 0 0 0
>> > Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
>> > protections[]: 0 0 0
>> > Node 0 DMA free:728kB min:12kB low:24kB high:36kB active:788kB inactive:7848kB present:16384kB
>> > protections[]: 0 0 0
>> > Node 0 Normal free:27200kB min:1004kB low:2008kB high:3012kB active:332792kB inactive:422744kB present:1032188kB
>> > protections[]: 0 0 0
>> > Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
>> > protections[]: 0 0 0
>> >
>> >See that DMA zone on node 1? Wonder how it got like that. It
>> >should not be inside pgdat->nr_zones anyway.
>> >
>> >
>> Oh? Why not? I didn't think empty zones were filtered out?
>>
>
>That uninitialised zone should be outside pgdat->nr_zones, no?
>
>
>> (perhaps they should be).
>>
>
>They will be.
>
I don't think the DMA side will be skipped though? Because you're
always counting down from nr_zones... That would be what it is
spinning on.
>
>Chris, do you have time to test this, against -linus?
>
>
I think my patch would be sufficient to handle the kswapd side
(yours would be valid too, but no need to add the extra checks).
>diff -puN mm/vmscan.c~vmscan-handle-empty-zones mm/vmscan.c
>--- 25/mm/vmscan.c~vmscan-handle-empty-zones 2004-10-07 19:10:52.844797784 -0700
>+++ 25-akpm/mm/vmscan.c 2004-10-07 19:11:49.804138648 -0700
>@@ -851,6 +851,9 @@ shrink_caches(struct zone **zones, struc
> for (i = 0; zones[i] != NULL; i++) {
> struct zone *zone = zones[i];
>
>+ if (zone->present_pages == 0)
>+ continue;
>+
> zone->temp_priority = sc->priority;
> if (zone->prev_priority > sc->priority)
> zone->prev_priority = sc->priority;
>
...but could probably use something like this as well, for the
direct reclaim side.
Nick Piggin wrote:
>
> I think my patch would be sufficient to handle the kswapd side
> (yours would be valid too, but no need to add the extra checks).
Here is the combined patch with Andrew's hunk added.
I guess it doesn't _really_ matter which gets tested, but this
one is probably the preferred way to go because it doesn't even
wake up kswapd for empty zones. Andrew, do you agree?
I guess it should get into -bk pretty soon if it passes testing.
It is fairly easy to see the failure cases that are fixed (and
hopefully it doesn't domino yet another latent bug :P).
Nick Piggin <[email protected]> wrote:
>
> - if (zone->free_pages <= zone->pages_high) {
> + if (zone->free_pages < zone->pages_high) {
This is too subtle. An open-coded test for non-zero ->present_pages is far
more readable and maintainable.
* Andrew Morton ([email protected]) wrote:
> Chris, do you have time to test this, against -linus?
Yeah. This patch held up against the simple testing, as did Nick's (not
the most recent combined one from him).
Here's some sample output from annotation I added:
balance_pgdat:1050: zone->DMA present_pages == 0
balance_pgdat:1050: zone->DMA present_pages == 0
balance_pgdat:1050: zone->DMA present_pages == 0
balance_pgdat:1050: zone->DMA present_pages == 0
balance_pgdat:1013: zone->DMA present_pages == 0
balance_pgdat:1050: zone->DMA present_pages == 0
balance_pgdat:1050: zone->DMA present_pages == 0
balance_pgdat:1050: zone->DMA present_pages == 0
thanks,
-chris
* Chris Wright ([email protected]) wrote:
> * Andrew Morton ([email protected]) wrote:
> > Chris, do you have time to test this, against -linus?
>
> Yeah. This patch held up against the simple testing, as did Nick's (not
> the most recent combined one from him).
err, meaning: I have not tested Nick's most recent combined patch.
Chris Wright wrote:
> * Andrew Morton ([email protected]) wrote:
>
>>Chris, do you have time to test this, against -linus?
>
>
> Yeah. This patch held up against the simple testing, as did Nick's (not
> the most recent combined one from him).
>
Thanks. Any/all patches should do much the same job.
I'm pretty confident this was just a minor artifact brought out
by an earlier change, and things still look good for 2.6.9.