2010-11-30 10:15:09

by Balbir Singh

[permalink] [raw]
Subject: [PATCH 0/3] Unmapped page cache control

The following series implements page cache control. It is a split-out
version of patch 1 of version 3 of the page cache optimization patches
posted earlier at
http://www.mail-archive.com/[email protected]/msg43654.html

Christoph Lameter recommended splitting out patch 1, which is what this
series does.

Detailed Description
====================
This patch implements unmapped page cache control via preferred
page cache reclaim. The current patch hooks into kswapd and reclaims
page cache if the user has requested unmapped page control.
This is useful in the following scenarios:

- In a virtualized environment with cache=writethrough, we see
double caching (one copy in the host and one in the guest). As
we try to scale guests, cache usage across the system grows.
The goal of this patch is to reclaim page cache when Linux is running
as a guest, and to get the host to hold the page cache and manage it.
There might be temporary duplication, but in the long run, memory
in the guests would be used for mapped pages.
- The option is controlled via a boot option, and the administrator
can selectively turn it on, on an as-needed basis.

A lot of the code is borrowed from the zone_reclaim_mode logic in
__zone_reclaim(). One might argue that with ballooning and KSM this
feature is not very useful, but even with ballooning we need extra
logic to balloon multiple VMs, and it is hard to figure out the
correct amount of memory to balloon. With these patches applied,
each guest has a sufficient amount of free memory available that can
be easily seen and reclaimed by the balloon driver.
The additional memory in the guest can be reused for additional
applications or used to start additional guests/balance memory in
the host.

KSM currently does not de-duplicate host and guest page cache. The goal
of this patch is to help automatically balance unmapped page cache when
instructed to do so.

There are some magic numbers in use in the code: UNMAPPED_PAGE_RATIO
and the number of pages to reclaim when the unmapped_page_control
argument is supplied. These numbers were chosen to avoid reaping page
cache too aggressively or too frequently, while still providing control.

The min_unmapped_ratio sysctl provides further control from within
the guest over the number of unmapped pages to reclaim.
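
Concretely, the gating works roughly like this (condensed from patch 3
below, where the real code lives; nsc is a scan_control biased towards
unmapped page cache):

	if (unmapped_page_control &&
	    zone_unmapped_file_pages(zone) > zone->min_unmapped_pages) {
		/* reclaim an eighth of the excess per invocation */
		unsigned long nr_pages =
			(zone_unmapped_file_pages(zone) -
			 zone->min_unmapped_pages) >> 3;

		zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
	}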



For a single VM - running kernbench

Enabled

Optimal load -j 8 run number 1...
Optimal load -j 8 run number 2...
Optimal load -j 8 run number 3...
Optimal load -j 8 run number 4...
Optimal load -j 8 run number 5...
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 273.726 (1.2683)
User Time 190.014 (0.589941)
System Time 298.758 (1.72574)
Percent CPU 178 (0)
Context Switches 119953 (865.74)
Sleeps 38758 (795.074)

Disabled

Optimal load -j 8 run number 1...
Optimal load -j 8 run number 2...
Optimal load -j 8 run number 3...
Optimal load -j 8 run number 4...
Optimal load -j 8 run number 5...
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 272.672 (0.453178)
User Time 189.7 (0.718157)
System Time 296.77 (0.845606)
Percent CPU 178 (0)
Context Switches 118822 (277.434)
Sleeps 37542.8 (545.922)

More data on the test results with the earlier patch is
at http://www.mail-archive.com/[email protected]/msg43655.html

---

Balbir Singh (3):
Move zone_reclaim() outside of CONFIG_NUMA
Refactor zone_reclaim, move reusable functionality outside
Provide control over unmapped pages


include/linux/mmzone.h | 4 +-
include/linux/swap.h | 5 +-
mm/page_alloc.c | 7 ++-
mm/vmscan.c | 109 +++++++++++++++++++++++++++++++++++++++++-------
4 files changed, 104 insertions(+), 21 deletions(-)

--
Balbir


2010-11-30 10:15:21

by Balbir Singh

[permalink] [raw]
Subject: [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA

This patch moves zone_reclaim and associated helpers
outside CONFIG_NUMA. This infrastructure is reused
in the patches for page cache control that follow.

Signed-off-by: Balbir Singh <[email protected]>
---
include/linux/mmzone.h | 4 ++--
mm/vmscan.c | 2 --
2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4890662..aeede91 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -302,12 +302,12 @@ struct zone {
*/
unsigned long lowmem_reserve[MAX_NR_ZONES];

-#ifdef CONFIG_NUMA
- int node;
/*
* zone reclaim becomes active if more unmapped pages exist.
*/
unsigned long min_unmapped_pages;
+#ifdef CONFIG_NUMA
+ int node;
unsigned long min_slab_pages;
#endif
struct per_cpu_pageset __percpu *pageset;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8cc90d5..325443a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2644,7 +2644,6 @@ static int __init kswapd_init(void)

module_init(kswapd_init)

-#ifdef CONFIG_NUMA
/*
* Zone reclaim mode
*
@@ -2854,7 +2853,6 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)

return ret;
}
-#endif

/*
* page_evictable - test whether a page is evictable

2010-11-30 10:16:06

by Balbir Singh

[permalink] [raw]
Subject: [PATCH 2/3] Refactor zone_reclaim

Refactor zone_reclaim, move reusable functionality outside
of zone_reclaim. Make zone_reclaim_unmapped_pages modular

Signed-off-by: Balbir Singh <[email protected]>
---
mm/vmscan.c | 35 +++++++++++++++++++++++------------
1 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 325443a..0ac444f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2719,6 +2719,27 @@ static long zone_pagecache_reclaimable(struct zone *zone)
}

/*
+ * Helper function to reclaim unmapped pages, we might add something
+ * similar to this for slab cache as well. Currently this function
+ * is shared with __zone_reclaim()
+ */
+static inline void
+zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
+ unsigned long nr_pages)
+{
+ int priority;
+ /*
+ * Free memory by calling shrink zone with increasing
+ * priorities until we have enough memory freed.
+ */
+ priority = ZONE_RECLAIM_PRIORITY;
+ do {
+ shrink_zone(priority, zone, sc);
+ priority--;
+ } while (priority >= 0 && sc->nr_reclaimed < nr_pages);
+}
+
+/*
* Try to free up some pages from this zone through reclaim.
*/
static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
@@ -2727,7 +2748,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
const unsigned long nr_pages = 1 << order;
struct task_struct *p = current;
struct reclaim_state reclaim_state;
- int priority;
struct scan_control sc = {
.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
@@ -2751,17 +2771,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
reclaim_state.reclaimed_slab = 0;
p->reclaim_state = &reclaim_state;

- if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages) {
- /*
- * Free memory by calling shrink zone with increasing
- * priorities until we have enough memory freed.
- */
- priority = ZONE_RECLAIM_PRIORITY;
- do {
- shrink_zone(priority, zone, &sc);
- priority--;
- } while (priority >= 0 && sc.nr_reclaimed < nr_pages);
- }
+ if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages)
+ zone_reclaim_unmapped_pages(zone, &sc, nr_pages);

nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
if (nr_slab_pages0 > zone->min_slab_pages) {

2010-11-30 10:16:40

by Balbir Singh

[permalink] [raw]
Subject: [PATCH 3/3] Provide control over unmapped pages

Provide control using zone_reclaim() and a boot parameter. The
code reuses functionality from zone_reclaim() to isolate unmapped
pages and reclaim them as a priority, ahead of other mapped pages.

Signed-off-by: Balbir Singh <[email protected]>
---
include/linux/swap.h | 5 ++-
mm/page_alloc.c | 7 +++--
mm/vmscan.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index eba53e7..78b0830 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -252,11 +252,12 @@ extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
extern long vm_total_pages;

-#ifdef CONFIG_NUMA
-extern int zone_reclaim_mode;
extern int sysctl_min_unmapped_ratio;
extern int sysctl_min_slab_ratio;
extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
+extern bool should_balance_unmapped_pages(struct zone *zone);
+#ifdef CONFIG_NUMA
+extern int zone_reclaim_mode;
#else
#define zone_reclaim_mode 0
static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 62b7280..4228da3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1662,6 +1662,9 @@ zonelist_scan:
unsigned long mark;
int ret;

+ if (should_balance_unmapped_pages(zone))
+ wakeup_kswapd(zone, order);
+
mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
if (zone_watermark_ok(zone, order, mark,
classzone_idx, alloc_flags))
@@ -4136,10 +4139,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,

zone->spanned_pages = size;
zone->present_pages = realsize;
-#ifdef CONFIG_NUMA
- zone->node = nid;
zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
/ 100;
+#ifdef CONFIG_NUMA
+ zone->node = nid;
zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
#endif
zone->name = zone_names[j];
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0ac444f..98950f4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -145,6 +145,21 @@ static DECLARE_RWSEM(shrinker_rwsem);
#define scanning_global_lru(sc) (1)
#endif

+static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
+ struct scan_control *sc);
+static int unmapped_page_control __read_mostly;
+
+static int __init unmapped_page_control_parm(char *str)
+{
+ unmapped_page_control = 1;
+ /*
+ * XXX: Should we tweak swappiness here?
+ */
+ return 1;
+}
+__setup("unmapped_page_control", unmapped_page_control_parm);
+
+
static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
struct scan_control *sc)
{
@@ -2223,6 +2238,12 @@ loop_again:
shrink_active_list(SWAP_CLUSTER_MAX, zone,
&sc, priority, 0);

+ /*
+ * We do unmapped page balancing once here and once
+ * below, so that we don't lose out
+ */
+ balance_unmapped_pages(priority, zone, &sc);
+
if (!zone_watermark_ok_safe(zone, order,
high_wmark_pages(zone), 0, 0)) {
end_zone = i;
@@ -2258,6 +2279,11 @@ loop_again:
continue;

sc.nr_scanned = 0;
+ /*
+ * Balance unmapped pages upfront, this should be
+ * really cheap
+ */
+ balance_unmapped_pages(priority, zone, &sc);

/*
* Call soft limit reclaim before calling shrink_zone.
@@ -2491,7 +2517,8 @@ void wakeup_kswapd(struct zone *zone, int order)
pgdat->kswapd_max_order = order;
if (!waitqueue_active(&pgdat->kswapd_wait))
return;
- if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
+ if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) &&
+ !should_balance_unmapped_pages(zone))
return;

trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
@@ -2740,6 +2767,49 @@ zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
}

/*
+ * Routine to balance unmapped pages, inspired from the code under
+ * CONFIG_NUMA that does unmapped page and slab page control by keeping
+ * min_unmapped_pages in the zone. We currently reclaim just unmapped
+ * pages, slab control will come in soon, at which point this routine
+ * should be called balance cached pages
+ */
+static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
+ struct scan_control *sc)
+{
+ if (unmapped_page_control &&
+ (zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
+ struct scan_control nsc;
+ unsigned long nr_pages;
+
+ nsc = *sc;
+
+ nsc.swappiness = 0;
+ nsc.may_writepage = 0;
+ nsc.may_unmap = 0;
+ nsc.nr_reclaimed = 0;
+
+ nr_pages = zone_unmapped_file_pages(zone) -
+ zone->min_unmapped_pages;
+ /* Magically try to reclaim eighth the unmapped cache pages */
+ nr_pages >>= 3;
+
+ zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
+ return nsc.nr_reclaimed;
+ }
+ return 0;
+}
+
+#define UNMAPPED_PAGE_RATIO 16
+bool should_balance_unmapped_pages(struct zone *zone)
+{
+ if (unmapped_page_control &&
+ (zone_unmapped_file_pages(zone) >
+ UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
+ return true;
+ return false;
+}
+
+/*
* Try to free up some pages from this zone through reclaim.
*/
static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)

Subject: Re: [PATCH 2/3] Refactor zone_reclaim


Reviewed-by: Christoph Lameter <[email protected]>

Subject: Re: [PATCH 3/3] Provide control over unmapped pages


Looks good.

Reviewed-by: Christoph Lameter <[email protected]>

2010-11-30 22:24:30

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA

On Tue, 30 Nov 2010 15:45:12 +0530
Balbir Singh <[email protected]> wrote:

> This patch moves zone_reclaim and associated helpers
> outside CONFIG_NUMA. This infrastructure is reused
> in the patches for page cache control that follow.
>

Thereby adding a nice dollop of bloat to everyone's kernel. I don't
think that is justifiable given that the audience for this feature is
about eight people :(

How's about CONFIG_UNMAPPED_PAGECACHE_CONTROL?

Also this patch instantiates sysctl_min_unmapped_ratio and
sysctl_min_slab_ratio on non-NUMA builds but fails to make those
tunables actually tunable in procfs. Changes to sysctl.c are
needed.
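
(Concretely: vm_table entries along these lines need to move out of the
CONFIG_NUMA section of kernel/sysctl.c, shown roughly as in mainline at
the time:

	{
		.procname	= "min_unmapped_ratio",
		.data		= &sysctl_min_unmapped_ratio,
		.maxlen		= sizeof(sysctl_min_unmapped_ratio),
		.mode		= 0644,
		.proc_handler	= sysctl_min_unmapped_ratio_sysctl_handler,
		.extra1		= &zero,
		.extra2		= &one_hundred,
	},

and similarly for min_slab_ratio.)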

> Reviewed-by: Christoph Lameter <[email protected]>

More careful reviewers, please.

2010-11-30 22:25:44

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

On Tue, 30 Nov 2010 15:46:31 +0530
Balbir Singh <[email protected]> wrote:

> Provide control using zone_reclaim() and a boot parameter. The
> code reuses functionality from zone_reclaim() to isolate unmapped
> pages and reclaim them as a priority, ahead of other mapped pages.
>
> Signed-off-by: Balbir Singh <[email protected]>
> ---
> include/linux/swap.h | 5 ++-
> mm/page_alloc.c | 7 +++--
> mm/vmscan.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 79 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index eba53e7..78b0830 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -252,11 +252,12 @@ extern int vm_swappiness;
> extern int remove_mapping(struct address_space *mapping, struct page *page);
> extern long vm_total_pages;
>
> -#ifdef CONFIG_NUMA
> -extern int zone_reclaim_mode;
> extern int sysctl_min_unmapped_ratio;
> extern int sysctl_min_slab_ratio;

This change will need to be moved into the first patch.

> extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> +extern bool should_balance_unmapped_pages(struct zone *zone);
> +#ifdef CONFIG_NUMA
> +extern int zone_reclaim_mode;
> #else
> #define zone_reclaim_mode 0
> static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 62b7280..4228da3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1662,6 +1662,9 @@ zonelist_scan:
> unsigned long mark;
> int ret;
>
> + if (should_balance_unmapped_pages(zone))
> + wakeup_kswapd(zone, order);

gack, this is on the page allocator fastpath, isn't it? So
99.99999999% of the world's machines end up doing a pointless call to a
pointless function which pointlessly tests a pointless global and
pointlessly returns? All because of some whacky KSM thing?

The speed and space overhead of this code should be *zero* if
!CONFIG_UNMAPPED_PAGECACHE_CONTROL and should be minimal if
CONFIG_UNMAPPED_PAGECACHE_CONTROL=y. The way to do the latter is to
inline the test of unmapped_page_control into callers and only if it is
true (and use unlikely(), please) do we call into the KSM gunk.
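
A minimal sketch of that pattern (helper naming illustrative):

	#ifdef CONFIG_UNMAPPED_PAGECACHE_CONTROL
	extern int unmapped_page_control;
	bool __should_balance_unmapped_pages(struct zone *zone);

	static inline bool should_balance_unmapped_pages(struct zone *zone)
	{
		/* common case: feature disabled, no out-of-line call made */
		if (unlikely(unmapped_page_control))
			return __should_balance_unmapped_pages(zone);
		return false;
	}
	#else
	static inline bool should_balance_unmapped_pages(struct zone *zone)
	{
		return false;	/* compiles away entirely */
	}
	#endif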

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -145,6 +145,21 @@ static DECLARE_RWSEM(shrinker_rwsem);
> #define scanning_global_lru(sc) (1)
> #endif
>
> +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> + struct scan_control *sc);
> +static int unmapped_page_control __read_mostly;
> +
> +static int __init unmapped_page_control_parm(char *str)
> +{
> + unmapped_page_control = 1;
> + /*
> + * XXX: Should we tweak swappiness here?
> + */
> + return 1;
> +}
> +__setup("unmapped_page_control", unmapped_page_control_parm);

aw c'mon guys, everybody knows that when you add a kernel parameter you
document it in Documentation/kernel-parameters.txt.
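
That is, an entry along these lines (wording illustrative):

	unmapped_page_control
			[KNL] Enable reclaim of unmapped page cache in
			excess of min_unmapped_ratio, driven by kswapd.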

> static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
> struct scan_control *sc)
> {
> @@ -2223,6 +2238,12 @@ loop_again:
> shrink_active_list(SWAP_CLUSTER_MAX, zone,
> &sc, priority, 0);
>
> + /*
> + * We do unmapped page balancing once here and once
> + * below, so that we don't lose out
> + */
> + balance_unmapped_pages(priority, zone, &sc);
> +
> if (!zone_watermark_ok_safe(zone, order,
> high_wmark_pages(zone), 0, 0)) {
> end_zone = i;
> @@ -2258,6 +2279,11 @@ loop_again:
> continue;
>
> sc.nr_scanned = 0;
> + /*
> + * Balance unmapped pages upfront, this should be
> + * really cheap
> + */
> + balance_unmapped_pages(priority, zone, &sc);

More unjustifiable overhead on a commonly-executed codepath.

> /*
> * Call soft limit reclaim before calling shrink_zone.
> @@ -2491,7 +2517,8 @@ void wakeup_kswapd(struct zone *zone, int order)
> pgdat->kswapd_max_order = order;
> if (!waitqueue_active(&pgdat->kswapd_wait))
> return;
> - if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
> + if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) &&
> + !should_balance_unmapped_pages(zone))
> return;
>
> trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
> @@ -2740,6 +2767,49 @@ zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
> }
>
> /*
> + * Routine to balance unmapped pages, inspired from the code under
> + * CONFIG_NUMA that does unmapped page and slab page control by keeping
> + * min_unmapped_pages in the zone. We currently reclaim just unmapped
> + * pages, slab control will come in soon, at which point this routine
> + * should be called balance cached pages
> + */

The problem I have with this comment is that it uses the term "balance"
without ever defining it. Plus "balance" is already a term which is used
in memory reclaim.

So if you can think up a unique noun then that's good but whether or
not that is done, please describe with great care what that term
actually means in this context.

> +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> + struct scan_control *sc)
> +{
> + if (unmapped_page_control &&
> + (zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
> + struct scan_control nsc;
> + unsigned long nr_pages;
> +
> + nsc = *sc;
> +
> + nsc.swappiness = 0;
> + nsc.may_writepage = 0;
> + nsc.may_unmap = 0;
> + nsc.nr_reclaimed = 0;

Doing a clone-and-own of a scan_control is novel. What's going on here?

> + nr_pages = zone_unmapped_file_pages(zone) -
> + zone->min_unmapped_pages;
> + /* Magically try to reclaim eighth the unmapped cache pages */
> + nr_pages >>= 3;
> +
> + zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
> + return nsc.nr_reclaimed;
> + }
> + return 0;
> +}
> +
> +#define UNMAPPED_PAGE_RATIO 16

Well. Giving 16 a name didn't really clarify anything. Attentive
readers will want to know what this does, why 16 was chosen and what
the effects of changing it will be.
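
Judging only from what the code does, the comment would need to say at
least something like the following; the rationale for 16 itself has to
come from the author:

	/*
	 * Wake kswapd to balance unmapped page cache only when unmapped
	 * file pages exceed min_unmapped_pages by a wide margin (16x),
	 * so that background reclaim is not triggered by small excesses.
	 */
	#define UNMAPPED_PAGE_RATIO	16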

> +bool should_balance_unmapped_pages(struct zone *zone)
> +{
> + if (unmapped_page_control &&
> + (zone_unmapped_file_pages(zone) >
> + UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
> + return true;
> + return false;
> +}


> Reviewed-by: Christoph Lameter <[email protected]>

So you're OK with shoving all this flotsam into 100,000,000 cellphones?
This was a pretty outrageous patchset!

2010-12-01 00:14:20

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

> Provide control using zone_reclaim() and a boot parameter. The
> code reuses functionality from zone_reclaim() to isolate unmapped
> pages and reclaim them as a priority, ahead of other mapped pages.
>
> Signed-off-by: Balbir Singh <[email protected]>
> ---
> include/linux/swap.h | 5 ++-
> mm/page_alloc.c | 7 +++--
> mm/vmscan.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 79 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index eba53e7..78b0830 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -252,11 +252,12 @@ extern int vm_swappiness;
> extern int remove_mapping(struct address_space *mapping, struct page *page);
> extern long vm_total_pages;
>
> -#ifdef CONFIG_NUMA
> -extern int zone_reclaim_mode;
> extern int sysctl_min_unmapped_ratio;
> extern int sysctl_min_slab_ratio;
> extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> +extern bool should_balance_unmapped_pages(struct zone *zone);
> +#ifdef CONFIG_NUMA
> +extern int zone_reclaim_mode;
> #else
> #define zone_reclaim_mode 0
> static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 62b7280..4228da3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1662,6 +1662,9 @@ zonelist_scan:
> unsigned long mark;
> int ret;
>
> + if (should_balance_unmapped_pages(zone))
> + wakeup_kswapd(zone, order);
> +

You don't have to add an extra branch into the fast path.


> mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
> if (zone_watermark_ok(zone, order, mark,
> classzone_idx, alloc_flags))
> @@ -4136,10 +4139,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>
> zone->spanned_pages = size;
> zone->present_pages = realsize;
> -#ifdef CONFIG_NUMA
> - zone->node = nid;
> zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
> / 100;
> +#ifdef CONFIG_NUMA
> + zone->node = nid;
> zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
> #endif
> zone->name = zone_names[j];
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 0ac444f..98950f4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -145,6 +145,21 @@ static DECLARE_RWSEM(shrinker_rwsem);
> #define scanning_global_lru(sc) (1)
> #endif
>
> +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> + struct scan_control *sc);
> +static int unmapped_page_control __read_mostly;
> +
> +static int __init unmapped_page_control_parm(char *str)
> +{
> + unmapped_page_control = 1;
> + /*
> + * XXX: Should we tweak swappiness here?
> + */
> + return 1;
> +}
> +__setup("unmapped_page_control", unmapped_page_control_parm);
> +
> +
> static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
> struct scan_control *sc)
> {
> @@ -2223,6 +2238,12 @@ loop_again:
> shrink_active_list(SWAP_CLUSTER_MAX, zone,
> &sc, priority, 0);
>
> + /*
> + * We do unmapped page balancing once here and once
> + * below, so that we don't lose out
> + */
> + balance_unmapped_pages(priority, zone, &sc);

You can't invoke any reclaim from here. This is in the zone balancing
detection phase. It means your code reclaims pages from zones which have
lots of free pages too.

> +
> if (!zone_watermark_ok_safe(zone, order,
> high_wmark_pages(zone), 0, 0)) {
> end_zone = i;
> @@ -2258,6 +2279,11 @@ loop_again:
> continue;
>
> sc.nr_scanned = 0;
> + /*
> + * Balance unmapped pages upfront, this should be
> + * really cheap
> + */
> + balance_unmapped_pages(priority, zone, &sc);


This code breaks the page-cache/slab balancing logic, and it conflicts
with Nick's per-zone slab effort.

Plus, high-order + priority=5 reclaim hits Simon's case (see the
"Free memory never fully used, swapping" thread).

>
> /*
> * Call soft limit reclaim before calling shrink_zone.
> @@ -2491,7 +2517,8 @@ void wakeup_kswapd(struct zone *zone, int order)
> pgdat->kswapd_max_order = order;
> if (!waitqueue_active(&pgdat->kswapd_wait))
> return;
> - if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
> + if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) &&
> + !should_balance_unmapped_pages(zone))
> return;
>
> trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
> @@ -2740,6 +2767,49 @@ zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
> }
>
> /*
> + * Routine to balance unmapped pages, inspired from the code under
> + * CONFIG_NUMA that does unmapped page and slab page control by keeping
> + * min_unmapped_pages in the zone. We currently reclaim just unmapped
> + * pages, slab control will come in soon, at which point this routine
> + * should be called balance cached pages
> + */
> +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> + struct scan_control *sc)
> +{
> + if (unmapped_page_control &&
> + (zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
> + struct scan_control nsc;
> + unsigned long nr_pages;
> +
> + nsc = *sc;
> +
> + nsc.swappiness = 0;
> + nsc.may_writepage = 0;
> + nsc.may_unmap = 0;
> + nsc.nr_reclaimed = 0;

Don't you need to fill nsc.nr_to_reclaim field?

> +
> + nr_pages = zone_unmapped_file_pages(zone) -
> + zone->min_unmapped_pages;
> + /* Magically try to reclaim eighth the unmapped cache pages */
> + nr_pages >>= 3;

Please don't make magic.

> +
> + zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
> + return nsc.nr_reclaimed;
> + }
> + return 0;
> +}
> +
> +#define UNMAPPED_PAGE_RATIO 16

Please don't make magic ratio.

> +bool should_balance_unmapped_pages(struct zone *zone)
> +{
> + if (unmapped_page_control &&
> + (zone_unmapped_file_pages(zone) >
> + UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
> + return true;
> + return false;
> +}
> +
> +/*
> * Try to free up some pages from this zone through reclaim.
> */
> static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)


Hmm....

As far as I have reviewed, I can't find any reason why this patch works
as expected. So I think cleancache looks more promising than this idea.
Have you seen Dan's patch? I would suggest discussing it with him.

Thanks.

2010-12-01 01:29:09

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 2/3] Refactor zone_reclaim

On Tue, 30 Nov 2010 15:45:55 +0530
Balbir Singh <[email protected]> wrote:

> Refactor zone_reclaim, move reusable functionality outside
> of zone_reclaim. Make zone_reclaim_unmapped_pages modular
>
> Signed-off-by: Balbir Singh <[email protected]>

Why is this min_unmapped_pages based on zone (IOW, per-zone)?


Thanks,
-Kame

2010-12-01 01:38:33

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

On Tue, 30 Nov 2010 15:46:31 +0530
Balbir Singh <[email protected]> wrote:

> Provide control using zone_reclaim() and a boot parameter. The
> code reuses functionality from zone_reclaim() to isolate unmapped
> pages and reclaim them as a priority, ahead of other mapped pages.
>
> Signed-off-by: Balbir Singh <[email protected]>
> ---
> include/linux/swap.h | 5 ++-
> mm/page_alloc.c | 7 +++--
> mm/vmscan.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 79 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index eba53e7..78b0830 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -252,11 +252,12 @@ extern int vm_swappiness;
> extern int remove_mapping(struct address_space *mapping, struct page *page);
> extern long vm_total_pages;
>
> -#ifdef CONFIG_NUMA
> -extern int zone_reclaim_mode;
> extern int sysctl_min_unmapped_ratio;
> extern int sysctl_min_slab_ratio;
> extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> +extern bool should_balance_unmapped_pages(struct zone *zone);
> +#ifdef CONFIG_NUMA
> +extern int zone_reclaim_mode;
> #else
> #define zone_reclaim_mode 0
> static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 62b7280..4228da3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1662,6 +1662,9 @@ zonelist_scan:
> unsigned long mark;
> int ret;
>
> + if (should_balance_unmapped_pages(zone))
> + wakeup_kswapd(zone, order);
> +

Hm, I'm not sure about the final vision of this feature. Can't this
reclaiming feature be called directly via the balloon driver just before
alloc_page()?

Do you need to keep page caches small even when there is free memory on
the host?

Thanks,
-Kame

2010-12-01 05:21:52

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA

* Balbir Singh <[email protected]> [2010-12-01 10:04:08]:

> * Andrew Morton <[email protected]> [2010-11-30 14:23:38]:
>
> > On Tue, 30 Nov 2010 15:45:12 +0530
> > Balbir Singh <[email protected]> wrote:
> >
> > > This patch moves zone_reclaim and associated helpers
> > > outside CONFIG_NUMA. This infrastructure is reused
> > > in the patches for page cache control that follow.
> > >
> >
> > Thereby adding a nice dollop of bloat to everyone's kernel. I don't
> > think that is justifiable given that the audience for this feature is
> > about eight people :(
> >
> > How's about CONFIG_UNMAPPED_PAGECACHE_CONTROL?
> >
>
> OK, I'll add the config, but this code is enabled under CONFIG_NUMA
> today, so the bloat, I agree, is more for non-NUMA users. I'll make
> CONFIG_UNMAPPED_PAGECACHE_CONTROL the default if CONFIG_NUMA is set.
>
> > Also this patch instantiates sysctl_min_unmapped_ratio and
> > sysctl_min_slab_ratio on non-NUMA builds but fails to make those
> > tunables actually tunable in procfs. Changes to sysctl.c are
> > needed.
> >
>
> Oh! yeah.. I missed it while refactoring, my fault.
>
> > > Reviewed-by: Christoph Lameter <[email protected]>
> >

My local MTA failed to deliver the message, trying again.

--
Three Cheers,
Balbir

2010-12-01 05:22:27

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH 2/3] Refactor zone_reclaim

* Balbir Singh <[email protected]> [2010-12-01 10:16:34]:

> * KAMEZAWA Hiroyuki <[email protected]> [2010-12-01 10:23:29]:
>
> > On Tue, 30 Nov 2010 15:45:55 +0530
> > Balbir Singh <[email protected]> wrote:
> >
> > > Refactor zone_reclaim, move reusable functionality outside
> > > of zone_reclaim. Make zone_reclaim_unmapped_pages modular
> > >
> > > Signed-off-by: Balbir Singh <[email protected]>
> >
> > Why is this min_unmapped_pages based on zone (IOW, per-zone)?
> >
>
> Kamezawa-San, this was the design before the refactoring (it is
> based on zone_reclaim_mode, with reclaim built on top of that). I am
> reusing bits of existing technology. The advantage of it being
> per-zone is that it integrates well with kswapd.
>

My local MTA failed to deliver the message, trying again.

--
Three Cheers,
Balbir

2010-12-01 05:22:42

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

* Balbir Singh <[email protected]> [2010-12-01 10:24:21]:

> * Andrew Morton <[email protected]> [2010-11-30 14:25:09]:
>
> > On Tue, 30 Nov 2010 15:46:31 +0530
> > Balbir Singh <[email protected]> wrote:
> >
> > > Provide control using zone_reclaim() and a boot parameter. The
> > > code reuses functionality from zone_reclaim() to isolate unmapped
> > > pages and reclaim them as a priority, ahead of other mapped pages.
> > >
> > > Signed-off-by: Balbir Singh <[email protected]>
> > > ---
> > > include/linux/swap.h | 5 ++-
> > > mm/page_alloc.c | 7 +++--
> > > mm/vmscan.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > > 3 files changed, 79 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > > index eba53e7..78b0830 100644
> > > --- a/include/linux/swap.h
> > > +++ b/include/linux/swap.h
> > > @@ -252,11 +252,12 @@ extern int vm_swappiness;
> > > extern int remove_mapping(struct address_space *mapping, struct page *page);
> > > extern long vm_total_pages;
> > >
> > > -#ifdef CONFIG_NUMA
> > > -extern int zone_reclaim_mode;
> > > extern int sysctl_min_unmapped_ratio;
> > > extern int sysctl_min_slab_ratio;
> >
> > This change will need to be moved into the first patch.
> >
>
> OK, will do, thanks for pointing it out
>
> > > extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> > > +extern bool should_balance_unmapped_pages(struct zone *zone);
> > > +#ifdef CONFIG_NUMA
> > > +extern int zone_reclaim_mode;
> > > #else
> > > #define zone_reclaim_mode 0
> > > static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 62b7280..4228da3 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1662,6 +1662,9 @@ zonelist_scan:
> > > unsigned long mark;
> > > int ret;
> > >
> > > + if (should_balance_unmapped_pages(zone))
> > > + wakeup_kswapd(zone, order);
> >
> > gack, this is on the page allocator fastpath, isn't it? So
> > 99.99999999% of the world's machines end up doing a pointless call to a
> > pointless function which pointlessly tests a pointless global and
> > pointlessly returns? All because of some whacky KSM thing?
> >
> > The speed and space overhead of this code should be *zero* if
> > !CONFIG_UNMAPPED_PAGECACHE_CONTROL and should be minimal if
> > CONFIG_UNMAPPED_PAGECACHE_CONTROL=y. The way to do the latter is to
> > inline the test of unmapped_page_control into callers and only if it is
> > true (and use unlikely(), please) do we call into the KSM gunk.
> >
>
> Will do, should_balance_unmapped_pages() will be made a no-op in the
> absence of CONFIG_UNMAPPED_PAGECACHE_CONTROL
>
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -145,6 +145,21 @@ static DECLARE_RWSEM(shrinker_rwsem);
> > > #define scanning_global_lru(sc) (1)
> > > #endif
> > >
> > > +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> > > + struct scan_control *sc);
> > > +static int unmapped_page_control __read_mostly;
> > > +
> > > +static int __init unmapped_page_control_parm(char *str)
> > > +{
> > > + unmapped_page_control = 1;
> > > + /*
> > > + * XXX: Should we tweak swappiness here?
> > > + */
> > > + return 1;
> > > +}
> > > +__setup("unmapped_page_control", unmapped_page_control_parm);
> >
> > aw c'mon guys, everybody knows that when you add a kernel parameter you
> > document it in Documentation/kernel-parameters.txt.
>
> Will do - feeling silly about missing it; that is where reviews help.
>
> >
> > > static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
> > > struct scan_control *sc)
> > > {
> > > @@ -2223,6 +2238,12 @@ loop_again:
> > > shrink_active_list(SWAP_CLUSTER_MAX, zone,
> > > &sc, priority, 0);
> > >
> > > + /*
> > > + * We do unmapped page balancing once here and once
> > > + * below, so that we don't lose out
> > > + */
> > > + balance_unmapped_pages(priority, zone, &sc);
> > > +
> > > if (!zone_watermark_ok_safe(zone, order,
> > > high_wmark_pages(zone), 0, 0)) {
> > > end_zone = i;
> > > @@ -2258,6 +2279,11 @@ loop_again:
> > > continue;
> > >
> > > sc.nr_scanned = 0;
> > > + /*
> > > + * Balance unmapped pages upfront, this should be
> > > + * really cheap
> > > + */
> > > + balance_unmapped_pages(priority, zone, &sc);
> >
> > More unjustifiable overhead on a commonly-executed codepath.
> >
>
> Will refactor with a CONFIG suggested above.
>
> > > /*
> > > * Call soft limit reclaim before calling shrink_zone.
> > > @@ -2491,7 +2517,8 @@ void wakeup_kswapd(struct zone *zone, int order)
> > > pgdat->kswapd_max_order = order;
> > > if (!waitqueue_active(&pgdat->kswapd_wait))
> > > return;
> > > - if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
> > > + if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) &&
> > > + !should_balance_unmapped_pages(zone))
> > > return;
> > >
> > > trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
> > > @@ -2740,6 +2767,49 @@ zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
> > > }
> > >
> > > /*
> > > + * Routine to balance unmapped pages, inspired from the code under
> > > + * CONFIG_NUMA that does unmapped page and slab page control by keeping
> > > + * min_unmapped_pages in the zone. We currently reclaim just unmapped
> > > + * pages, slab control will come in soon, at which point this routine
> > > + * should be called balance cached pages
> > > + */
> >
> > The problem I have with this comment is that it uses the term "balance"
> > without ever defining it. Plus "balance" is already a term which is used
> > in memory reclaim.
> >
> > So if you can think up a unique noun then that's good but whether or
> > not that is done, please describe with great care what that term
> > actually means in this context.
>
> I used balance not as a 1:1 balance, but in the sense of balancing
> the proportion of unmapped page cache based on a sysctl/tunable.
>
> >
> > > +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> > > + struct scan_control *sc)
> > > +{
> > > + if (unmapped_page_control &&
> > > + (zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
> > > + struct scan_control nsc;
> > > + unsigned long nr_pages;
> > > +
> > > + nsc = *sc;
> > > +
> > > + nsc.swappiness = 0;
> > > + nsc.may_writepage = 0;
> > > + nsc.may_unmap = 0;
> > > + nsc.nr_reclaimed = 0;
> >
> > Doing a clone-and-own of a scan_control is novel. What's going on here?
>
> This code overwrites swappiness, may_* and nr_reclaimed to get
> correct stats. The idea is to vary/bias the reclaim behaviour.
>
> >
> > > + nr_pages = zone_unmapped_file_pages(zone) -
> > > + zone->min_unmapped_pages;
> > > + /* Magically try to reclaim eighth the unmapped cache pages */
> > > + nr_pages >>= 3;
> > > +
> > > + zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
> > > + return nsc.nr_reclaimed;
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +#define UNMAPPED_PAGE_RATIO 16
> >
> > Well. Giving 16 a name didn't really clarify anything. Attentive
> > readers will want to know what this does, why 16 was chosen and what
> > the effects of changing it will be.
>
> Sorry, I documented that in the changelog of the first patchset. I'll
> document it here as well. The choice of 16 is based on heuristics and
> testing, the tradeoff being overenthusiastic reclaim versus cache
> size/performance.
>
> >
> > > +bool should_balance_unmapped_pages(struct zone *zone)
> > > +{
> > > + if (unmapped_page_control &&
> > > + (zone_unmapped_file_pages(zone) >
> > > + UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
> > > + return true;
> > > + return false;
> > > +}
> >
> >
> > > Reviewed-by: Christoph Lameter <[email protected]>
> >
> > So you're OK with shoving all this flotsam into 100,000,000 cellphones?
> > This was a pretty outrageous patchset!
>
> I'll do a better one, BTW, a lot of embedded folks are interested in
> page cache control outside of cgroup behaviour.
>
> Thanks for the detailed review!
>

My local MTA failed to deliver the message, trying again.

--
Three Cheers,
Balbir

2010-12-01 05:23:01

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

* Balbir Singh <[email protected]> [2010-12-01 10:46:32]:

> * KOSAKI Motohiro <[email protected]> [2010-12-01 09:14:13]:
>
> > > Provide control using zone_reclaim() and a boot parameter. The
> > > code reuses functionality from zone_reclaim() to isolate unmapped
> > > pages and reclaim them as a priority, ahead of other mapped pages.
> > >
> > > Signed-off-by: Balbir Singh <[email protected]>
> > > ---
> > > include/linux/swap.h | 5 ++-
> > > mm/page_alloc.c | 7 +++--
> > > mm/vmscan.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > > 3 files changed, 79 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > > index eba53e7..78b0830 100644
> > > --- a/include/linux/swap.h
> > > +++ b/include/linux/swap.h
> > > @@ -252,11 +252,12 @@ extern int vm_swappiness;
> > > extern int remove_mapping(struct address_space *mapping, struct page *page);
> > > extern long vm_total_pages;
> > >
> > > -#ifdef CONFIG_NUMA
> > > -extern int zone_reclaim_mode;
> > > extern int sysctl_min_unmapped_ratio;
> > > extern int sysctl_min_slab_ratio;
> > > extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> > > +extern bool should_balance_unmapped_pages(struct zone *zone);
> > > +#ifdef CONFIG_NUMA
> > > +extern int zone_reclaim_mode;
> > > #else
> > > #define zone_reclaim_mode 0
> > > static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 62b7280..4228da3 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1662,6 +1662,9 @@ zonelist_scan:
> > > unsigned long mark;
> > > int ret;
> > >
> > > + if (should_balance_unmapped_pages(zone))
> > > + wakeup_kswapd(zone, order);
> > > +
> >
> > You don't have to add an extra branch into the fast path.
> >
> >
> > > mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
> > > if (zone_watermark_ok(zone, order, mark,
> > > classzone_idx, alloc_flags))
> > > @@ -4136,10 +4139,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
> > >
> > > zone->spanned_pages = size;
> > > zone->present_pages = realsize;
> > > -#ifdef CONFIG_NUMA
> > > - zone->node = nid;
> > > zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
> > > / 100;
> > > +#ifdef CONFIG_NUMA
> > > + zone->node = nid;
> > > zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
> > > #endif
> > > zone->name = zone_names[j];
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 0ac444f..98950f4 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -145,6 +145,21 @@ static DECLARE_RWSEM(shrinker_rwsem);
> > > #define scanning_global_lru(sc) (1)
> > > #endif
> > >
> > > +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> > > + struct scan_control *sc);
> > > +static int unmapped_page_control __read_mostly;
> > > +
> > > +static int __init unmapped_page_control_parm(char *str)
> > > +{
> > > + unmapped_page_control = 1;
> > > + /*
> > > + * XXX: Should we tweak swappiness here?
> > > + */
> > > + return 1;
> > > +}
> > > +__setup("unmapped_page_control", unmapped_page_control_parm);
> > > +
> > > +
> > > static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
> > > struct scan_control *sc)
> > > {
> > > @@ -2223,6 +2238,12 @@ loop_again:
> > > shrink_active_list(SWAP_CLUSTER_MAX, zone,
> > > &sc, priority, 0);
> > >
> > > + /*
> > > + * We do unmapped page balancing once here and once
> > > + * below, so that we don't lose out
> > > + */
> > > + balance_unmapped_pages(priority, zone, &sc);
> >
> > You can't invoke any reclaim from here. This is in the zone balancing
> > detection phase. It means your code reclaims pages from zones which have
> > lots of free pages too.
>
> The goal is to check not only for zone_watermark_ok, but also to see
> if unmapped pages are way higher than expected values.
>
> >
> > > +
> > > if (!zone_watermark_ok_safe(zone, order,
> > > high_wmark_pages(zone), 0, 0)) {
> > > end_zone = i;
> > > @@ -2258,6 +2279,11 @@ loop_again:
> > > continue;
> > >
> > > sc.nr_scanned = 0;
> > > + /*
> > > + * Balance unmapped pages upfront, this should be
> > > + * really cheap
> > > + */
> > > + balance_unmapped_pages(priority, zone, &sc);
> >
> >
> > This code breaks the page-cache/slab balancing logic, and it conflicts
> > with Nick's per-zone slab effort.
> >
>
> OK, cc'ing Nick for comments.
>
> > Plus, high-order + priority=5 reclaim hits Simon's case (see the
> > "Free memory never fully used, swapping" thread).
> >
>
> OK, this path should not add to swapping activity, if that is your
> concern.
>
>
> > >
> > > /*
> > > * Call soft limit reclaim before calling shrink_zone.
> > > @@ -2491,7 +2517,8 @@ void wakeup_kswapd(struct zone *zone, int order)
> > > pgdat->kswapd_max_order = order;
> > > if (!waitqueue_active(&pgdat->kswapd_wait))
> > > return;
> > > - if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
> > > + if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) &&
> > > + !should_balance_unmapped_pages(zone))
> > > return;
> > >
> > > trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
> > > @@ -2740,6 +2767,49 @@ zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
> > > }
> > >
> > > /*
> > > + * Routine to balance unmapped pages, inspired from the code under
> > > + * CONFIG_NUMA that does unmapped page and slab page control by keeping
> > > + * min_unmapped_pages in the zone. We currently reclaim just unmapped
> > > + * pages, slab control will come in soon, at which point this routine
> > > + * should be called balance cached pages
> > > + */
> > > +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> > > + struct scan_control *sc)
> > > +{
> > > + if (unmapped_page_control &&
> > > + (zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
> > > + struct scan_control nsc;
> > > + unsigned long nr_pages;
> > > +
> > > + nsc = *sc;
> > > +
> > > + nsc.swappiness = 0;
> > > + nsc.may_writepage = 0;
> > > + nsc.may_unmap = 0;
> > > + nsc.nr_reclaimed = 0;
> >
> > Don't you need to fill nsc.nr_to_reclaim field?
> >
>
> Yes, since the reclaim code looks at nr_reclaimed, it needs to be 0 at
> every iteration - did I miss something?
>
> > > +
> > > + nr_pages = zone_unmapped_file_pages(zone) -
> > > + zone->min_unmapped_pages;
> > > + /* Magically try to reclaim eighth the unmapped cache pages */
> > > + nr_pages >>= 3;
> >
> > Please don't make magic.
> >
>
> OK, it is a heuristic, how do I use it?
>
> > > +
> > > + zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
> > > + return nsc.nr_reclaimed;
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +#define UNMAPPED_PAGE_RATIO 16
> >
> > Please don't make magic ratio.
>
> OK, it is a heuristic, how do I use heuristics - sysctl?
>
> >
> > > +bool should_balance_unmapped_pages(struct zone *zone)
> > > +{
> > > + if (unmapped_page_control &&
> > > + (zone_unmapped_file_pages(zone) >
> > > + UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
> > > + return true;
> > > + return false;
> > > +}
> > > +
> > > +/*
> > > * Try to free up some pages from this zone through reclaim.
> > > */
> > > static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> >
> >
> > Hmm....
> >
> > As far as I have reviewed, I can't find any reason why this patch works
> > as expected. So I think cleancache looks more promising than this idea.
> > Have you seen Dan's patch? I would suggest discussing it with him.
>
> Please try the patch, I've been using it and it works exactly as
> expected for me. kswapd does the balancing and works well. I've posted
> some data as well.
>

My local MTA failed to deliver the message, trying again.

--
Three Cheers,
Balbir

2010-12-01 05:23:12

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

* Balbir Singh <[email protected]> [2010-12-01 10:48:16]:

> * KAMEZAWA Hiroyuki <[email protected]> [2010-12-01 10:32:54]:
>
> > On Tue, 30 Nov 2010 15:46:31 +0530
> > Balbir Singh <[email protected]> wrote:
> >
> > > Provide control using zone_reclaim() and a boot parameter. The
> > > code reuses functionality from zone_reclaim() to isolate unmapped
> > > pages and reclaim them as a priority, ahead of other mapped pages.
> > >
> > > Signed-off-by: Balbir Singh <[email protected]>
> > > ---
> > > include/linux/swap.h | 5 ++-
> > > mm/page_alloc.c | 7 +++--
> > > mm/vmscan.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > > 3 files changed, 79 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > > index eba53e7..78b0830 100644
> > > --- a/include/linux/swap.h
> > > +++ b/include/linux/swap.h
> > > @@ -252,11 +252,12 @@ extern int vm_swappiness;
> > > extern int remove_mapping(struct address_space *mapping, struct page *page);
> > > extern long vm_total_pages;
> > >
> > > -#ifdef CONFIG_NUMA
> > > -extern int zone_reclaim_mode;
> > > extern int sysctl_min_unmapped_ratio;
> > > extern int sysctl_min_slab_ratio;
> > > extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> > > +extern bool should_balance_unmapped_pages(struct zone *zone);
> > > +#ifdef CONFIG_NUMA
> > > +extern int zone_reclaim_mode;
> > > #else
> > > #define zone_reclaim_mode 0
> > > static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 62b7280..4228da3 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1662,6 +1662,9 @@ zonelist_scan:
> > > unsigned long mark;
> > > int ret;
> > >
> > > + if (should_balance_unmapped_pages(zone))
> > > + wakeup_kswapd(zone, order);
> > > +
> >
> > Hm, I'm not sure about the final vision of this feature. Can't this
> > reclaiming feature be called directly via the balloon driver just before
> > alloc_page()?
> >
>
> That is a separate patch, this is a boot paramter based control
> approach.
>
> > Do you need to keep page caches small even when there is free memory on
> > the host?
> >
>
> The goal is to avoid duplication, as you know page cache fills itself
> to consume as much memory as possible. The host generally does not
> have a lot of free memory in a consolidated environment.
>
My local MTA failed to deliver the message, trying again.

--
Three Cheers,
Balbir

2010-12-01 05:41:39

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

On Wed, 1 Dec 2010 10:52:59 +0530
Balbir Singh <[email protected]> wrote:

> * Balbir Singh <[email protected]> [2010-12-01 10:48:16]:
>
> > * KAMEZAWA Hiroyuki <[email protected]> [2010-12-01 10:32:54]:
> >
> > > On Tue, 30 Nov 2010 15:46:31 +0530
> > > Balbir Singh <[email protected]> wrote:
> > >
> > > > Provide control using zone_reclaim() and a boot parameter. The
> > > > code reuses functionality from zone_reclaim() to isolate unmapped
> > > > pages and reclaim them as a priority, ahead of other mapped pages.
> > > >
> > > > Signed-off-by: Balbir Singh <[email protected]>
> > > > ---
> > > > include/linux/swap.h | 5 ++-
> > > > mm/page_alloc.c | 7 +++--
> > > > mm/vmscan.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > 3 files changed, 79 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > > > index eba53e7..78b0830 100644
> > > > --- a/include/linux/swap.h
> > > > +++ b/include/linux/swap.h
> > > > @@ -252,11 +252,12 @@ extern int vm_swappiness;
> > > > extern int remove_mapping(struct address_space *mapping, struct page *page);
> > > > extern long vm_total_pages;
> > > >
> > > > -#ifdef CONFIG_NUMA
> > > > -extern int zone_reclaim_mode;
> > > > extern int sysctl_min_unmapped_ratio;
> > > > extern int sysctl_min_slab_ratio;
> > > > extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> > > > +extern bool should_balance_unmapped_pages(struct zone *zone);
> > > > +#ifdef CONFIG_NUMA
> > > > +extern int zone_reclaim_mode;
> > > > #else
> > > > #define zone_reclaim_mode 0
> > > > static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 62b7280..4228da3 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -1662,6 +1662,9 @@ zonelist_scan:
> > > > unsigned long mark;
> > > > int ret;
> > > >
> > > > + if (should_balance_unmapped_pages(zone))
> > > > + wakeup_kswapd(zone, order);
> > > > +
> > >
> > > Hm, I'm not sure about the final vision of this feature. Can't this
> > > reclaiming feature be called directly via the balloon driver just before
> > > alloc_page()?
> > >
> >
> > That is a separate patch, this is a boot paramter based control
> > approach.
> >
> > > Do you need to keep page caches small even when there is free memory on
> > > the host?
> > >
> >
> > The goal is to avoid duplication, as you know page cache fills itself
> > to consume as much memory as possible. The host generally does not
> > have a lot of free memory in a consolidated environment.
> >

That's a point. Then why does the guest have to do _extra_ work for the
host even when the host says nothing? I think triggering this from the
guests themselves is not very good.

Thanks,
-Kame


2010-12-01 06:40:54

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

* KAMEZAWA Hiroyuki <[email protected]> [2010-12-01 14:35:50]:

> On Wed, 1 Dec 2010 10:52:59 +0530
> Balbir Singh <[email protected]> wrote:
>
> > * Balbir Singh <[email protected]> [2010-12-01 10:48:16]:
> >
> > > * KAMEZAWA Hiroyuki <[email protected]> [2010-12-01 10:32:54]:
> > >
> > > > On Tue, 30 Nov 2010 15:46:31 +0530
> > > > Balbir Singh <[email protected]> wrote:
> > > >
> > > > > Provide control using zone_reclaim() and a boot parameter. The
> > > > > code reuses functionality from zone_reclaim() to isolate unmapped
> > > > > pages and reclaim them as a priority, ahead of other mapped pages.
> > > > >
> > > > > Signed-off-by: Balbir Singh <[email protected]>
> > > > > ---
> > > > > include/linux/swap.h | 5 ++-
> > > > > mm/page_alloc.c | 7 +++--
> > > > > mm/vmscan.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > 3 files changed, 79 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > > > > index eba53e7..78b0830 100644
> > > > > --- a/include/linux/swap.h
> > > > > +++ b/include/linux/swap.h
> > > > > @@ -252,11 +252,12 @@ extern int vm_swappiness;
> > > > > extern int remove_mapping(struct address_space *mapping, struct page *page);
> > > > > extern long vm_total_pages;
> > > > >
> > > > > -#ifdef CONFIG_NUMA
> > > > > -extern int zone_reclaim_mode;
> > > > > extern int sysctl_min_unmapped_ratio;
> > > > > extern int sysctl_min_slab_ratio;
> > > > > extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> > > > > +extern bool should_balance_unmapped_pages(struct zone *zone);
> > > > > +#ifdef CONFIG_NUMA
> > > > > +extern int zone_reclaim_mode;
> > > > > #else
> > > > > #define zone_reclaim_mode 0
> > > > > static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
> > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > > index 62b7280..4228da3 100644
> > > > > --- a/mm/page_alloc.c
> > > > > +++ b/mm/page_alloc.c
> > > > > @@ -1662,6 +1662,9 @@ zonelist_scan:
> > > > > unsigned long mark;
> > > > > int ret;
> > > > >
> > > > > + if (should_balance_unmapped_pages(zone))
> > > > > + wakeup_kswapd(zone, order);
> > > > > +
> > > >
> > > > Hm, I'm not sure about the final vision of this feature. Can't this
> > > > reclaiming feature be called directly via the balloon driver just before
> > > > alloc_page()?
> > > >
> > >
> > > That is a separate patch, this is a boot paramter based control
> > > approach.
> > >
> > > > Do you need to keep page caches small even when there is free memory on
> > > > the host?
> > > >
> > >
> > > The goal is to avoid duplication, as you know page cache fills itself
> > > to consume as much memory as possible. The host generally does not
> > > have a lot of free memory in a consolidated environment.
> > >
>
> That's a point. Then why does the guest have to do _extra_ work for the
> host even when the host says nothing? I think triggering this from the
> guests themselves is not very good.

I've mentioned it before: the guest keeping free memory without a
large performance hit helps; the balloon driver is able to quickly
retrieve this memory if required, or the guest can use this memory for
some other application/task. The cached data is mostly already present
in the host page cache.

--
Three Cheers,
Balbir

2010-12-01 07:30:27

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

On Wed, 1 Dec 2010 12:10:43 +0530
Balbir Singh <[email protected]> wrote:

> > That's a point. Then why does the guest have to do _extra_ work for the
> > host even when the host says nothing? I think triggering this from the
> > guests themselves is not very good.
>
> I've mentioned it before: the guest keeping free memory without a
> large performance hit helps; the balloon driver is able to quickly
> retrieve this memory if required, or the guest can use this memory for
> some other application/task.


> The cached data is mostly already present in the host page cache.

Why? Are there parameters/stats which show this is _true_? How can we
guarantee it, or show it to users?
Please add an interface that shows the "shared rate between guest/host".
Without it, no admin will turn this on, because "file cache status on host"
is a black box for guest admins. I think this patch skips some important steps.

The second point is presumably to reduce total host memory usage and to
increase the number of guests on a host. For that, this feature is useful
only when all guests on a host are friendly and devoted to the health of
host memory management, because all the configuration must be done in the
guest. (It could even be passed via qemu's command line.) And there is
_no_ benefit for a guest that gives up its resources to help host
management, because there is no guarantee that the dropped caches are
present in host memory.


So, for both claims, I want to see an interface that shows the number of
shared pages between host and guests, rather than having to imagine it.

BTW, I don't like this kind of "please give us your victim, please please please"
logic. The host should be able to "steal" what it wants by force.
Then, there should be no visible on/off interface in the guest; the VM
firmware should turn this on if the administrator of the host wants it.

BTW2, please test with some other benchmarks (ones which read from the
file cache). I don't think a kernel make is a good test for this.

Thanks,
-Kame

2010-12-01 07:54:11

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 2/3] Refactor zone_reclaim

Hi Balbir,

On Tue, Nov 30, 2010 at 7:15 PM, Balbir Singh <[email protected]> wrote:
> Refactor zone_reclaim, move reusable functionality outside
> of zone_reclaim. Make zone_reclaim_unmapped_pages modular
>
> Signed-off-by: Balbir Singh <[email protected]>
> ---
>  mm/vmscan.c |   35 +++++++++++++++++++++++------------
>  1 files changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 325443a..0ac444f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2719,6 +2719,27 @@ static long zone_pagecache_reclaimable(struct zone *zone)
>  }
>
>  /*
> + * Helper function to reclaim unmapped pages, we might add something
> + * similar to this for slab cache as well. Currently this function
> + * is shared with __zone_reclaim()
> + */
> +static inline void
> +zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
> +				unsigned long nr_pages)
> +{
> +	int priority;
> +	/*
> +	 * Free memory by calling shrink zone with increasing
> +	 * priorities until we have enough memory freed.
> +	 */
> +	priority = ZONE_RECLAIM_PRIORITY;
> +	do {
> +		shrink_zone(priority, zone, sc);
> +		priority--;
> +	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
> +}
> +

I don't see anything specific to unmapped pages in
"zone_reclaim_unmapped_pages" itself.
The caller of this function has to handle sc->may_unmap, so the
function's name is misleading.
As far as I can see from the logic, the name should be just
"zone_reclaim_pages". If you want to keep the name
zone_reclaim_unmapped_pages, it would be better to set sc->may_unmap
inside this function.
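
Something like this is what I mean (an untested sketch, assuming this
tree's struct scan_control; just to show the shape):

	static inline void
	zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
					unsigned long nr_pages)
	{
		int priority = ZONE_RECLAIM_PRIORITY;

		/*
		 * Make the name honest: refuse to reclaim mapped pages
		 * here instead of relying on every caller to set this up.
		 */
		sc->may_unmap = 0;

		/* Raise priority until enough pages have been reclaimed. */
		do {
			shrink_zone(priority, zone, sc);
			priority--;
		} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
	}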


--
Kind regards,
Minchan Kim

2010-12-01 08:18:54

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

On Wed, Dec 1, 2010 at 2:22 PM, Balbir Singh <[email protected]> wrote:
> * Balbir Singh <[email protected]> [2010-12-01 10:24:21]:
>
>> * Andrew Morton <[email protected]> [2010-11-30 14:25:09]:
>> > So you're OK with shoving all this flotsam into 100,000,000 cellphones?
>> > This was a pretty outrageous patchset!
>>
>> I'll do a better one. BTW, a lot of embedded folks are interested in
>> page cache control outside of cgroup behaviour.

Yes. Embedded people (at least, me) want it. That's because they don't
have any swap device, so the only thing they can reclaim is page cache.
And many page cache pages are mapped into application address space
(for example, Android uses the Java model, so many pages are mapped
into applications' address spaces). That makes them hard to reclaim
without lagging.
So I have an interest in this patch.

--
Kind regards,
Minchan Kim

2010-12-01 09:04:56

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 2/3] Refactor zone_reclaim

On Wed, 1 Dec 2010 10:52:18 +0530
Balbir Singh <[email protected]> wrote:

> * Balbir Singh <[email protected]> [2010-12-01 10:16:34]:
>
> > * KAMEZAWA Hiroyuki <[email protected]> [2010-12-01 10:23:29]:
> >
> > > On Tue, 30 Nov 2010 15:45:55 +0530
> > > Balbir Singh <[email protected]> wrote:
> > >
> > > > Refactor zone_reclaim, move reusable functionality outside
> > > > of zone_reclaim. Make zone_reclaim_unmapped_pages modular
> > > >
> > > > Signed-off-by: Balbir Singh <[email protected]>
> > >
> > > Why is this min_unmapped_pages based on zone (IOW, per-zone)?
> > >
> >
> > Kamezawa-San, this was the design before the refactoring (it is
> > based on zone_reclaim_mode, with reclaim built on top of that). I am
> > reusing bits of existing technology. The advantage of it being
> > per-zone is that it integrates well with kswapd.
> >
>

Sorry, what I wanted to ask was:

Why is min_unmapped_pages per zone?
Why don't you add a "limit_for_unmapped_page_cache_size" for the whole system?

I guess what you really want is "limit_for_unmapped_page_cache_size".

Because the limit is per-zone, you have to use this kind of mysterious code:
==
(zone_unmapped_file_pages(zone) >
 UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
==
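
A whole-system limit could look roughly like this instead (only a
sketch: the "unmapped_pagecache_limit" knob and the helper below are
hypothetical, and NR_FILE_PAGES - NR_FILE_MAPPED only approximates
unmapped page cache):

	extern unsigned long unmapped_pagecache_limit;	/* in pages, 0 = off */

	static bool over_unmapped_pagecache_limit(void)
	{
		unsigned long unmapped = global_page_state(NR_FILE_PAGES) -
					 global_page_state(NR_FILE_MAPPED);

		return unmapped_pagecache_limit &&
			unmapped > unmapped_pagecache_limit;
	}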

Thanks,
-Kame

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

On Tue, 30 Nov 2010, Andrew Morton wrote:

> > +#define UNMAPPED_PAGE_RATIO 16
>
> Well. Giving 16 a name didn't really clarify anything. Attentive
> readers will want to know what this does, why 16 was chosen and what
> the effects of changing it will be.

The meaning is analogous to the other zone reclaim ratio. But yes, it
should be justified and defined.
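
To give a sense of the magnitude (illustrative numbers only, not taken
from the patch):

	/*
	 * With the default vm.min_unmapped_ratio of 1%, a zone of
	 * 1,048,576 pages (4GB at 4K pages) has min_unmapped_pages of
	 * roughly 10,485.  With UNMAPPED_PAGE_RATIO = 16, unmapped page
	 * cache reclaim only kicks in once unmapped file pages exceed
	 * 16 * 10,485 ~= 167,760 pages (~655MB), so the check stays
	 * quiet for modest amounts of page cache.
	 */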

> > Reviewed-by: Christoph Lameter <[email protected]>
>
> So you're OK with shoving all this flotsam into 100,000,000 cellphones?
> This was a pretty outrageous patchset!

This is a feature that has been requested over and over for years. Using
/proc/sys/vm/drop_caches to fix situations where one simply has too many
page cache pages is not much fun in the long run.
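
For reference, that workaround amounts to the global cache drop below
(a userspace C sketch; it throws away *all* clean page cache, not just
the excess, which is exactly why it gets old quickly):

	#include <stdio.h>

	int main(void)
	{
		/* Needs root; "1" frees page cache only (not dentries/inodes). */
		FILE *f = fopen("/proc/sys/vm/drop_caches", "w");

		if (!f)
			return 1;
		fputs("1", f);
		return fclose(f) ? 1 : 0;
	}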

2010-12-02 01:22:20

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

> On Tue, 30 Nov 2010, Andrew Morton wrote:
>
> > > +#define UNMAPPED_PAGE_RATIO 16
> >
> > Well. Giving 16 a name didn't really clarify anything. Attentive
> > readers will want to know what this does, why 16 was chosen and what
> > the effects of changing it will be.
>
> The meaning is analogous to the other zone reclaim ratio. But yes, it
> should be justified and defined.
>
> > > Reviewed-by: Christoph Lameter <[email protected]>
> >
> > So you're OK with shoving all this flotsam into 100,000,000 cellphones?
> > This was a pretty outrageous patchset!
>
> This is a feature that has been requested over and over for years. Using
> /proc/sys/vm/drop_caches to fix situations where one simply has too many
> page cache pages is not much fun in the long run.

I'm not against a page cache limitation feature at all. But this is
too ugly and too destructive to the fast path. I hope this patch can
reduce its negative impact further.

2010-12-02 02:56:26

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

On Thu, 2 Dec 2010 10:22:16 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> > On Tue, 30 Nov 2010, Andrew Morton wrote:
> >
> > > > +#define UNMAPPED_PAGE_RATIO 16
> > >
> > > Well. Giving 16 a name didn't really clarify anything. Attentive
> > > readers will want to know what this does, why 16 was chosen and what
> > > the effects of changing it will be.
> >
> > The meaning is analogous to the other zone reclaim ratio. But yes, it
> > should be justified and defined.
> >
> > > > Reviewed-by: Christoph Lameter <[email protected]>
> > >
> > > So you're OK with shoving all this flotsam into 100,000,000 cellphones?
> > > This was a pretty outrageous patchset!
> >
> > This is a feature that has been requested over and over for years. Using
> > /proc/sys/vm/drop_caches to fix situations where one simply has too many
> > page cache pages is not much fun in the long run.
>
> I'm not against a page cache limitation feature at all. But this is
> too ugly and too destructive to the fast path. I hope this patch can
> reduce its negative impact further.
>

And I think the name min_unmapped_pages is ugly. It should be
"unmapped_pagecache_limit" or similar, because it implements a limit.

Thanks,
-Kame

2010-12-05 05:14:19

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH 3/3] Provide control over unmapped pages

* KAMEZAWA Hiroyuki <[email protected]> [2010-12-02 11:50:36]:

> On Thu, 2 Dec 2010 10:22:16 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
> > > On Tue, 30 Nov 2010, Andrew Morton wrote:
> > >
> > > > > +#define UNMAPPED_PAGE_RATIO 16
> > > >
> > > > Well. Giving 16 a name didn't really clarify anything. Attentive
> > > > readers will want to know what this does, why 16 was chosen and what
> > > > the effects of changing it will be.
> > >
> > > The meaning is analogous to the other zone reclaim ratio. But yes, it
> > > should be justified and defined.
> > >
> > > > > Reviewed-by: Christoph Lameter <[email protected]>
> > > >
> > > > So you're OK with shoving all this flotsam into 100,000,000 cellphones?
> > > > This was a pretty outrageous patchset!
> > >
> > > This is a feature that has been requested over and over for years. Using
> > > /proc/sys/vm/drop_caches to fix situations where one simply has too many
> > > page cache pages is not much fun in the long run.
> >
> > I'm not against a page cache limitation feature at all. But this is
> > too ugly and too destructive to the fast path. I hope this patch can
> > reduce its negative impact further.
> >
>
> And I think the name min_unmapped_pages is ugly. It should be
> "unmapped_pagecache_limit" or similar, because it implements a limit.
>

The feature will now be enabled with a CONFIG option and a boot
parameter. I don't think changing the naming convention now is a good
idea - it is already in use and well known. The name of the boot
parameter can be changed, of course.
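
For reference, the boot-parameter side is simple; roughly (a sketch,
the actual patch may differ in detail):

	static bool unmapped_page_control __read_mostly;

	/* Parse "unmapped_page_control" on the kernel command line. */
	static int __init setup_unmapped_page_control(char *str)
	{
		unmapped_page_control = true;
		return 1;
	}
	__setup("unmapped_page_control", setup_unmapped_page_control);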

--
Three Cheers,
Balbir