During an AIM7 run on a 16GB system, fork started failing around
32000 threads, despite the system having plenty of free swap and
15GB of pageable memory.

If normal pageout does not result in contiguous free pages for
kernel stacks, fall back to lumpy reclaim instead of failing fork
or doing excessive pageout IO.

I do not know whether this change is needed due to the extreme
stress test or because the inactive list is a smaller fraction
of system memory on huge systems.

Signed-off-by: Rik van Riel <[email protected]>

Index: linux-2.6.24-rc6-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.24-rc6-mm1.orig/mm/vmscan.c 2008-01-08 12:08:03.000000000 -0500
+++ linux-2.6.24-rc6-mm1/mm/vmscan.c 2008-01-08 12:21:04.000000000 -0500
@@ -870,7 +870,8 @@ int isolate_lru_page(struct page *page)
  * of reclaimed pages
  */
 static unsigned long shrink_inactive_list(unsigned long max_scan,
-			struct zone *zone, struct scan_control *sc, int file)
+			struct zone *zone, struct scan_control *sc,
+			int priority, int file)
 {
 	LIST_HEAD(page_list);
 	struct pagevec pvec;
@@ -888,8 +889,19 @@ static unsigned long shrink_inactive_lis
 		unsigned long nr_freed;
 		unsigned long nr_active;
 		unsigned int count[NR_LRU_LISTS] = { 0, };
-		int mode = (sc->order > PAGE_ALLOC_COSTLY_ORDER) ?
-					ISOLATE_BOTH : ISOLATE_INACTIVE;
+		int mode = ISOLATE_INACTIVE;
+
+		/*
+		 * If we need a large contiguous chunk of memory, or have
+		 * trouble getting a small set of contiguous pages, we
+		 * will reclaim both active and inactive pages.
+		 *
+		 * We use the same threshold as pageout congestion_wait below.
+		 */
+		if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
+			mode = ISOLATE_BOTH;
+		else if (sc->order && priority < DEF_PRIORITY - 2)
+			mode = ISOLATE_BOTH;
 
 		nr_taken = sc->isolate_pages(sc->swap_cluster_max,
 			     &page_list, &nr_scan, sc->order, mode,
@@ -1166,7 +1178,7 @@ static unsigned long shrink_list(enum lr
 		shrink_active_list(nr_to_scan, zone, sc, priority, file);
 		return 0;
 	}
-	return shrink_inactive_list(nr_to_scan, zone, sc, file);
+	return shrink_inactive_list(nr_to_scan, zone, sc, priority, file);
 }
 
 /*
--
All Rights Reversed

On Tue, 8 Jan 2008, Rik van Riel wrote:

> If normal pageout does not result in contiguous free pages for
> kernel stacks, fall back to lumpy reclaim instead of failing fork
> or doing excessive pageout IO.

Good. Ccing Mel. This is going to help higher order pages which is useful
for a couple of other projects.

Reviewed-by: Christoph Lameter <[email protected]>

On (08/01/08 14:30), Christoph Lameter didst pronounce:
> On Tue, 8 Jan 2008, Rik van Riel wrote:
>
> > If normal pageout does not result in contiguous free pages for
> > kernel stacks, fall back to lumpy reclaim instead of failing fork
> > or doing excessive pageout IO.
>
> Good. Ccing Mel. This is going to help higher order pages which is useful
> for a couple of other projects.
>
Well, the patch only has any impact when the order being reclaimed is
nonzero and no greater than PAGE_ALLOC_COSTLY_ORDER, so I would not have
considered it of major impact to other projects interested in high-order
allocations. However, in isolation I have no problem with this patch, and I
can see how it makes sense for the problem scenario described. I rebased just
this patch to 2.6.24-rc7 and found no problems, but I have not had the chance
to review the whole set.

> Reviewed-by: Christoph Lameter <[email protected]>
>

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
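
For readers following the discussion of which orders the patch affects, the
isolation-mode decision can be pulled out into a small userspace sketch. This
is an illustration only, not kernel code: mode_before() and mode_after() are
hypothetical helper names, and the values PAGE_ALLOC_COSTLY_ORDER = 3,
DEF_PRIORITY = 12 and the ISOLATE_* numbering are assumed to match kernels of
that era.

```c
#include <stdio.h>

/* Assumed values, believed to match 2.6.24-era kernels; illustration only. */
#define PAGE_ALLOC_COSTLY_ORDER 3
#define DEF_PRIORITY 12

enum isolate_mode { ISOLATE_INACTIVE = 0, ISOLATE_ACTIVE = 1, ISOLATE_BOTH = 2 };

/* Hypothetical helper: the mode selection before the patch (order only). */
static enum isolate_mode mode_before(int order)
{
	return order > PAGE_ALLOC_COSTLY_ORDER ? ISOLATE_BOTH : ISOLATE_INACTIVE;
}

/*
 * Hypothetical helper: the mode selection after the patch. A lower
 * priority value means reclaim has been struggling for longer.
 */
static enum isolate_mode mode_after(int order, int priority)
{
	if (order > PAGE_ALLOC_COSTLY_ORDER)
		return ISOLATE_BOTH;
	if (order && priority < DEF_PRIORITY - 2)
		return ISOLATE_BOTH;
	return ISOLATE_INACTIVE;
}

/* Print every (order, priority) pair whose mode differs after the patch. */
static void print_changed(int max_order)
{
	int order, priority;

	for (order = 0; order <= max_order; order++)
		for (priority = DEF_PRIORITY; priority >= 0; priority--)
			if (mode_before(order) != mode_after(order, priority))
				printf("order %d, priority %d: now ISOLATE_BOTH\n",
				       order, priority);
}
```

Calling print_changed() with any max_order lists only orders 1 through
PAGE_ALLOC_COSTLY_ORDER, and only once priority has dropped below
DEF_PRIORITY - 2. Order-0 requests and orders above the costly threshold
behave as before.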