2009-10-01 14:04:08

by Suresh Jayaraman

Subject: [PATCH 04/31] mm: tag reserve pages

From: Peter Zijlstra <[email protected]>

Tag pages allocated from the reserves with a non-zero page->reserve.
This allows us to distinguish and account reserve pages.

Since low-memory situations are transient, and unrelated to the actual
page (any page can be on the freelist when we run low), don't mark the
page in any permanent way - just pass along the information to the
allocatee.

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Suresh Jayaraman <[email protected]>
---
include/linux/mm_types.h | 1 +
mm/page_alloc.c | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)

Index: mmotm/include/linux/mm_types.h
===================================================================
--- mmotm.orig/include/linux/mm_types.h
+++ mmotm/include/linux/mm_types.h
@@ -77,6 +77,7 @@ struct page {
 	union {
 		pgoff_t index;		/* Our offset within mapping. */
 		void *freelist;		/* SLUB: freelist req. slab lock */
+		int reserve;		/* page_alloc: page is a reserve page */
 	};
 	struct list_head lru;		/* Pageout list, eg. active_list
 					 * protected by zone->lru_lock !
Index: mmotm/mm/page_alloc.c
===================================================================
--- mmotm.orig/mm/page_alloc.c
+++ mmotm/mm/page_alloc.c
@@ -1501,8 +1501,10 @@ zonelist_scan:
 try_this_zone:
 		page = buffered_rmqueue(preferred_zone, zone, order,
 						gfp_mask, migratetype);
-		if (page)
+		if (page) {
+			page->reserve = !!(alloc_flags & ALLOC_NO_WATERMARKS);
 			break;
+		}
 this_zone_full:
 		if (NUMA_BUILD)
 			zlc_mark_zone_full(zonelist, z);


2009-10-01 21:09:58

by David Rientjes

Subject: Re: [PATCH 04/31] mm: tag reserve pages

On Thu, 1 Oct 2009, Suresh Jayaraman wrote:

> Index: mmotm/mm/page_alloc.c
> ===================================================================
> --- mmotm.orig/mm/page_alloc.c
> +++ mmotm/mm/page_alloc.c
> @@ -1501,8 +1501,10 @@ zonelist_scan:
>  try_this_zone:
>  		page = buffered_rmqueue(preferred_zone, zone, order,
>  						gfp_mask, migratetype);
> -		if (page)
> +		if (page) {
> +			page->reserve = !!(alloc_flags & ALLOC_NO_WATERMARKS);
>  			break;
> +		}
>  this_zone_full:
>  		if (NUMA_BUILD)
>  			zlc_mark_zone_full(zonelist, z);

page->reserve won't necessarily indicate that access to reserves was
_necessary_ for the allocation to succeed, though. This will mark any
page being allocated under PF_MEMALLOC as reserve when all zones may be
well above their min watermarks.

2009-10-02 04:42:23

by NeilBrown

Subject: Re: [PATCH 04/31] mm: tag reserve pages

On Thursday October 1, [email protected] wrote:
> On Thu, 1 Oct 2009, Suresh Jayaraman wrote:
>
> > Index: mmotm/mm/page_alloc.c
> > ===================================================================
> > --- mmotm.orig/mm/page_alloc.c
> > +++ mmotm/mm/page_alloc.c
> > @@ -1501,8 +1501,10 @@ zonelist_scan:
> >  try_this_zone:
> >  		page = buffered_rmqueue(preferred_zone, zone, order,
> >  						gfp_mask, migratetype);
> > -		if (page)
> > +		if (page) {
> > +			page->reserve = !!(alloc_flags & ALLOC_NO_WATERMARKS);
> >  			break;
> > +		}
> >  this_zone_full:
> >  		if (NUMA_BUILD)
> >  			zlc_mark_zone_full(zonelist, z);
>
> page->reserve won't necessarily indicate that access to reserves was
> _necessary_ for the allocation to succeed, though. This will mark any
> page being allocated under PF_MEMALLOC as reserve when all zones may be
> well above their min watermarks.

Normally if zones are above their watermarks, page->reserve will not
be set.
This is because __alloc_pages_nodemask (which seems to be the main
non-inline entrypoint) first calls get_page_from_freelist with
alloc_flags set to ALLOC_WMARK_LOW|ALLOC_CPUSET.
Only if this fails does __alloc_pages_nodemask call
__alloc_pages_slowpath, which potentially sets ALLOC_NO_WATERMARKS in
alloc_flags.

So page->reserve being set actually tells us:
 - PF_MEMALLOC or GFP_MEMALLOC were used, and
 - a WMARK_LOW allocation attempt failed very recently

which is close enough to "the emergency reserves were used" I think.

Thanks,
NeilBrown
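
The two-stage behaviour described in the message above can be sketched in user space. This is only a rough model under stated assumptions: a single zone, a simplified watermark test, and made-up flag values. The names model_zone, model_page, model_get_page and model_alloc_page are hypothetical and are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values only; they are NOT the kernel's definitions. */
#define ALLOC_WMARK_LOW      0x01
#define ALLOC_NO_WATERMARKS  0x02
#define ALLOC_CPUSET         0x04

/* Hypothetical, heavily simplified stand-ins for struct zone / struct page. */
struct model_zone { long free_pages; long wmark_low; };
struct model_page { int reserve; };

/*
 * Models the patched try_this_zone step: on success the page is tagged
 * with whether the watermark check was bypassed for this attempt.
 */
static bool model_get_page(struct model_zone *z, int alloc_flags,
                           struct model_page *page)
{
    bool bypass = (alloc_flags & ALLOC_NO_WATERMARKS) != 0;

    if (!bypass && z->free_pages <= z->wmark_low)
        return false;               /* watermark check failed */
    if (z->free_pages <= 0)
        return false;               /* nothing left at all */

    z->free_pages--;
    page->reserve = !!(alloc_flags & ALLOC_NO_WATERMARKS);
    return true;
}

/*
 * Models the entry point: a first attempt that respects the low
 * watermark, then a slow path that may drop watermarks entirely when
 * the caller is in a PF_MEMALLOC-like context ("memalloc" here).
 */
static bool model_alloc_page(struct model_zone *z, bool memalloc,
                             struct model_page *page)
{
    assert(z != NULL && page != NULL);

    if (model_get_page(z, ALLOC_WMARK_LOW | ALLOC_CPUSET, page))
        return true;
    if (memalloc)
        return model_get_page(z, ALLOC_NO_WATERMARKS, page);
    return false;
}
```

In this model a memalloc caller only sees reserve set to 1 when the first, watermark-respecting attempt has already failed, which is the point being made about zones that sit above their watermarks.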

2009-10-02 09:50:45

by David Rientjes

Subject: Re: [PATCH 04/31] mm: tag reserve pages

On Fri, 2 Oct 2009, Neil Brown wrote:

> Normally if zones are above their watermarks, page->reserve will not
> be set.
> This is because __alloc_pages_nodemask (which seems to be the main
> non-inline entrypoint) first calls get_page_from_freelist with
> alloc_flags set to ALLOC_WMARK_LOW|ALLOC_CPUSET.
> Only if this fails does __alloc_pages_nodemask call
> __alloc_pages_slowpath, which potentially sets ALLOC_NO_WATERMARKS in
> alloc_flags.
>
> So page->reserve being set actually tells us:
>  - PF_MEMALLOC or GFP_MEMALLOC were used, and
>  - a WMARK_LOW allocation attempt failed very recently
>
> which is close enough to "the emergency reserves were used" I think.
>

There are a couple of corner cases for GFP_ATOMIC, though:

 - it isn't restricted by cpuset, so ALLOC_CPUSET will never get set for
   the slowpath allocs and may very well allow the allocation to succeed
   in zones far above their min watermark.

 - it allows for allocating beyond the min watermark in allowed zones
   simply by setting ALLOC_HARDER; these types of "reserve" allocations
   wouldn't be marked as page->reserve with your patches if
   ALLOC_NO_WATERMARKS wasn't set because of the allocation context.

Whether the second case fits your definition of reserve is debatable, but
if it doesn't, there's an inconsistency: the allocation may succeed in
"no watermark" context (for example, in hard irq context) even though that
privilege wasn't necessary for the allocation to succeed; perhaps it only
needed ALLOC_HARDER.