The following series implements unmapped page cache control. It is a
split-out version of patch 1 of version 3 of the page cache
optimization patches posted earlier at
https://lkml.org/lkml/2010/11/30/79
The previous revision received a lot of comments; I've tried to
address as many of them as possible in this revision. The last
series was reviewed by Christoph Lameter.
There were comments about overlap with Nick's changes. I don't feel
these changes impact Nick's work, and integration can/will be
considered as the patches evolve, if need be.
Detailed Description
====================
This patch implements unmapped page cache control via preferred
page cache reclaim. The current patch hooks into kswapd and reclaims
page cache if the user has requested unmapped page cache control.
This is useful in the following scenarios:
- In a virtualized environment with cache=writethrough, we see
double caching: one copy in the host and one in the guest. As
we try to scale guests, cache usage across the system grows.
The goal of this patch is to reclaim page cache when Linux is running
as a guest, and let the host hold and manage the page cache.
There might be temporary duplication, but in the long run, memory
in the guests would be used for mapped pages.
- The feature is controlled via a boot option, so the administrator
can selectively turn it on, on an as-needed basis.
A lot of the code is borrowed from the zone_reclaim_mode logic in
__zone_reclaim(). One might argue that with ballooning and KSM this
feature is not very useful, but even with ballooning we need extra
logic to balloon multiple VMs, and it is hard to figure out the
correct amount of memory to balloon. With these patches applied,
each guest has a sufficient amount of free memory available that can
be easily seen and reclaimed by the balloon driver. The additional
memory in the guest can be reused for additional applications, used
to start additional guests, or used to balance memory in the host.
KSM currently does not de-duplicate host and guest page cache. The goal
of this patch is to help automatically balance unmapped page cache when
instructed to do so.
There are some magic numbers in use in the code: UNMAPPED_PAGE_RATIO
and the number of pages to reclaim when the unmapped_page_control
argument is supplied. These numbers were chosen to avoid reaping page
cache too aggressively or too frequently, while still providing
control. The min_unmapped_ratio sysctl provides further control from
within the guest over the amount of unmapped pages to reclaim.
Data from the previous patchsets can be found at
https://lkml.org/lkml/2010/11/30/79
Size measurement
================
CONFIG_UNMAPPED_PAGECACHE_CONTROL and CONFIG_NUMA enabled
# size mm/built-in.o
text data bss dec hex filename
419431 1883047 140888 2443366 254866 mm/built-in.o
CONFIG_UNMAPPED_PAGECACHE_CONTROL disabled, CONFIG_NUMA enabled
# size mm/built-in.o
text data bss dec hex filename
418908 1883023 140888 2442819 254643 mm/built-in.o
---
Balbir Singh (3):
Move zone_reclaim() outside of CONFIG_NUMA
Refactor zone_reclaim, move reusable functionality outside
Provide control over unmapped pages
Documentation/kernel-parameters.txt | 8 ++
include/linux/mmzone.h | 4 +
include/linux/swap.h | 21 +++++-
init/Kconfig | 12 +++
kernel/sysctl.c | 20 +++--
mm/page_alloc.c | 9 ++
mm/vmscan.c | 132 +++++++++++++++++++++++++++++++----
7 files changed, 175 insertions(+), 31 deletions(-)
--
Three Cheers,
Balbir
Changelog v2
Moved sysctl for min_unmapped_ratio as well
This patch moves zone_reclaim and associated helpers
outside CONFIG_NUMA. This infrastructure is reused
in the patches for page cache control that follow.
Signed-off-by: Balbir Singh <[email protected]>
---
include/linux/mmzone.h | 4 ++--
include/linux/swap.h | 4 ++--
kernel/sysctl.c | 18 +++++++++---------
mm/vmscan.c | 2 --
4 files changed, 13 insertions(+), 15 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4890662..aeede91 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -302,12 +302,12 @@ struct zone {
*/
unsigned long lowmem_reserve[MAX_NR_ZONES];
-#ifdef CONFIG_NUMA
- int node;
/*
* zone reclaim becomes active if more unmapped pages exist.
*/
unsigned long min_unmapped_pages;
+#ifdef CONFIG_NUMA
+ int node;
unsigned long min_slab_pages;
#endif
struct per_cpu_pageset __percpu *pageset;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 84375e4..ac5c06e 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -253,11 +253,11 @@ extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
extern long vm_total_pages;
+extern int sysctl_min_unmapped_ratio;
+extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
#ifdef CONFIG_NUMA
extern int zone_reclaim_mode;
-extern int sysctl_min_unmapped_ratio;
extern int sysctl_min_slab_ratio;
-extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
#else
#define zone_reclaim_mode 0
static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index a00fdef..e40040e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1211,15 +1211,6 @@ static struct ctl_table vm_table[] = {
.extra1 = &zero,
},
#endif
-#ifdef CONFIG_NUMA
- {
- .procname = "zone_reclaim_mode",
- .data = &zone_reclaim_mode,
- .maxlen = sizeof(zone_reclaim_mode),
- .mode = 0644,
- .proc_handler = proc_dointvec,
- .extra1 = &zero,
- },
{
.procname = "min_unmapped_ratio",
.data = &sysctl_min_unmapped_ratio,
@@ -1229,6 +1220,15 @@ static struct ctl_table vm_table[] = {
.extra1 = &zero,
.extra2 = &one_hundred,
},
+#ifdef CONFIG_NUMA
+ {
+ .procname = "zone_reclaim_mode",
+ .data = &zone_reclaim_mode,
+ .maxlen = sizeof(zone_reclaim_mode),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ .extra1 = &zero,
+ },
{
.procname = "min_slab_ratio",
.data = &sysctl_min_slab_ratio,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 42a4859..e841cae 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2740,7 +2740,6 @@ static int __init kswapd_init(void)
module_init(kswapd_init)
-#ifdef CONFIG_NUMA
/*
* Zone reclaim mode
*
@@ -2950,7 +2949,6 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
return ret;
}
-#endif
/*
* page_evictable - test whether a page is evictable
Move reusable functionality outside of zone_reclaim.
Make zone_reclaim_unmapped_pages modular
Signed-off-by: Balbir Singh <[email protected]>
---
mm/vmscan.c | 35 +++++++++++++++++++++++------------
1 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index e841cae..4e2ad05 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2815,6 +2815,27 @@ static long zone_pagecache_reclaimable(struct zone *zone)
}
/*
+ * Helper function to reclaim unmapped pages, we might add something
+ * similar to this for slab cache as well. Currently this function
+ * is shared with __zone_reclaim()
+ */
+static inline void
+zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
+ unsigned long nr_pages)
+{
+ int priority;
+ /*
+ * Free memory by calling shrink zone with increasing
+ * priorities until we have enough memory freed.
+ */
+ priority = ZONE_RECLAIM_PRIORITY;
+ do {
+ shrink_zone(priority, zone, sc);
+ priority--;
+ } while (priority >= 0 && sc->nr_reclaimed < nr_pages);
+}
+
+/*
* Try to free up some pages from this zone through reclaim.
*/
static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
@@ -2823,7 +2844,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
const unsigned long nr_pages = 1 << order;
struct task_struct *p = current;
struct reclaim_state reclaim_state;
- int priority;
struct scan_control sc = {
.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
@@ -2847,17 +2867,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
reclaim_state.reclaimed_slab = 0;
p->reclaim_state = &reclaim_state;
- if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages) {
- /*
- * Free memory by calling shrink zone with increasing
- * priorities until we have enough memory freed.
- */
- priority = ZONE_RECLAIM_PRIORITY;
- do {
- shrink_zone(priority, zone, &sc);
- priority--;
- } while (priority >= 0 && sc.nr_reclaimed < nr_pages);
- }
+ if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages)
+ zone_reclaim_unmapped_pages(zone, &sc, nr_pages);
nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
if (nr_slab_pages0 > zone->min_slab_pages) {
Changelog v2
1. Use a config option to enable the code (Andrew Morton)
2. Explain the magic tunables in the code or at least attempt
to explain them (General comment)
3. Hint uses of the boot parameter with unlikely (Andrew Morton)
4. Use better names (balanced is not a good naming convention)
5. Updated Documentation/kernel-parameters.txt (Andrew Morton)
Provide control using zone_reclaim() and a boot parameter. The
code reuses functionality from zone_reclaim() to isolate unmapped
pages and reclaim them as a priority, ahead of other mapped pages.
Signed-off-by: Balbir Singh <[email protected]>
---
Documentation/kernel-parameters.txt | 8 +++
include/linux/swap.h | 21 ++++++--
init/Kconfig | 12 ++++
kernel/sysctl.c | 2 +
mm/page_alloc.c | 9 +++
mm/vmscan.c | 97 +++++++++++++++++++++++++++++++++++
6 files changed, 142 insertions(+), 7 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index dd8fe2b..f52b0bd 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2515,6 +2515,14 @@ and is between 256 and 4096 characters. It is defined in the file
[X86]
Set unknown_nmi_panic=1 early on boot.
+ unmapped_page_control
+ [KNL] Available if CONFIG_UNMAPPED_PAGECACHE_CONTROL
+ is enabled. It controls the amount of unmapped memory
+ that is present in the system. This boot option plus
+ vm.min_unmapped_ratio (sysctl) provide granular control
+ over how much unmapped page cache can exist in the system
+ before kswapd starts reclaiming unmapped page cache pages.
+
usbcore.autosuspend=
[USB] The autosuspend time delay (in seconds) used
for newly-detected USB devices (default 2). This
diff --git a/include/linux/swap.h b/include/linux/swap.h
index ac5c06e..773d7e5 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -253,19 +253,32 @@ extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
extern long vm_total_pages;
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
extern int sysctl_min_unmapped_ratio;
extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
-#ifdef CONFIG_NUMA
-extern int zone_reclaim_mode;
-extern int sysctl_min_slab_ratio;
#else
-#define zone_reclaim_mode 0
static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
{
return 0;
}
#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+extern bool should_reclaim_unmapped_pages(struct zone *zone);
+#else
+static inline bool should_reclaim_unmapped_pages(struct zone *zone)
+{
+ return false;
+}
+#endif
+
+#ifdef CONFIG_NUMA
+extern int zone_reclaim_mode;
+extern int sysctl_min_slab_ratio;
+#else
+#define zone_reclaim_mode 0
+#endif
+
extern int page_evictable(struct page *page, struct vm_area_struct *vma);
extern void scan_mapping_unevictable_pages(struct address_space *);
diff --git a/init/Kconfig b/init/Kconfig
index 3eb22ad..78c9169 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -782,6 +782,18 @@ endif # NAMESPACES
config MM_OWNER
bool
+config UNMAPPED_PAGECACHE_CONTROL
+ bool "Provide control over unmapped page cache"
+ default n
+ help
+ This option adds support for controlling unmapped page cache
+ via a boot parameter (unmapped_page_control). The boot parameter
+ with sysctl (vm.min_unmapped_ratio) control the total number
+ of unmapped pages in the system. This feature is useful if
+ you want to limit the amount of unmapped page cache or want
+ to reduce page cache duplication in a virtualized environment.
+ If unsure say 'N'
+
config SYSFS_DEPRECATED
bool "enable deprecated sysfs features to support old userspace tools"
depends on SYSFS
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e40040e..ab2c60a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1211,6 +1211,7 @@ static struct ctl_table vm_table[] = {
.extra1 = &zero,
},
#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
{
.procname = "min_unmapped_ratio",
.data = &sysctl_min_unmapped_ratio,
@@ -1220,6 +1221,7 @@ static struct ctl_table vm_table[] = {
.extra1 = &zero,
.extra2 = &one_hundred,
},
+#endif
#ifdef CONFIG_NUMA
{
.procname = "zone_reclaim_mode",
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1845a97..1c9fbab 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1662,6 +1662,9 @@ zonelist_scan:
unsigned long mark;
int ret;
+ if (should_reclaim_unmapped_pages(zone))
+ wakeup_kswapd(zone, order);
+
mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
if (zone_watermark_ok(zone, order, mark,
classzone_idx, alloc_flags))
@@ -4154,10 +4157,12 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
zone->spanned_pages = size;
zone->present_pages = realsize;
-#ifdef CONFIG_NUMA
- zone->node = nid;
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
/ 100;
+#endif
+#ifdef CONFIG_NUMA
+ zone->node = nid;
zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
#endif
zone->name = zone_names[j];
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4e2ad05..daf2ad1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -158,6 +158,29 @@ static DECLARE_RWSEM(shrinker_rwsem);
#define scanning_global_lru(sc) (1)
#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+static unsigned long reclaim_unmapped_pages(int priority, struct zone *zone,
+ struct scan_control *sc);
+static int unmapped_page_control __read_mostly;
+
+static int __init unmapped_page_control_parm(char *str)
+{
+ unmapped_page_control = 1;
+ /*
+ * XXX: Should we tweak swappiness here?
+ */
+ return 1;
+}
+__setup("unmapped_page_control", unmapped_page_control_parm);
+
+#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */
+static inline unsigned long reclaim_unmapped_pages(int priority,
+ struct zone *zone, struct scan_control *sc)
+{
+ return 0;
+}
+#endif
+
static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
struct scan_control *sc)
{
@@ -2297,6 +2320,12 @@ loop_again:
shrink_active_list(SWAP_CLUSTER_MAX, zone,
&sc, priority, 0);
+ /*
+ * We do unmapped page reclaim once here and once
+ * below, so that we don't lose out
+ */
+ reclaim_unmapped_pages(priority, zone, &sc);
+
if (!zone_watermark_ok_safe(zone, order,
high_wmark_pages(zone), 0, 0)) {
end_zone = i;
@@ -2332,6 +2361,11 @@ loop_again:
continue;
sc.nr_scanned = 0;
+ /*
+ * Reclaim unmapped pages upfront, this should be
+ * really cheap
+ */
+ reclaim_unmapped_pages(priority, zone, &sc);
/*
* Call soft limit reclaim before calling shrink_zone.
@@ -2587,7 +2621,8 @@ void wakeup_kswapd(struct zone *zone, int order)
pgdat->kswapd_max_order = order;
if (!waitqueue_active(&pgdat->kswapd_wait))
return;
- if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
+ if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) &&
+ !should_reclaim_unmapped_pages(zone))
return;
trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
@@ -2740,6 +2775,7 @@ static int __init kswapd_init(void)
module_init(kswapd_init)
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
/*
* Zone reclaim mode
*
@@ -2960,6 +2996,65 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
return ret;
}
+#endif
+
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+/*
+ * Routine to reclaim unmapped pages, inspired from the code under
+ * CONFIG_NUMA that does unmapped page and slab page control by keeping
+ * min_unmapped_pages in the zone. We currently reclaim just unmapped
+ * pages, slab control will come in soon, at which point this routine
+ * should be called reclaim cached pages
+ */
+unsigned long reclaim_unmapped_pages(int priority, struct zone *zone,
+ struct scan_control *sc)
+{
+ if (unlikely(unmapped_page_control) &&
+ (zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
+ struct scan_control nsc;
+ unsigned long nr_pages;
+
+ nsc = *sc;
+
+ nsc.swappiness = 0;
+ nsc.may_writepage = 0;
+ nsc.may_unmap = 0;
+ nsc.nr_reclaimed = 0;
+
+ nr_pages = zone_unmapped_file_pages(zone) -
+ zone->min_unmapped_pages;
+ /*
+ * We don't want to be too aggressive with our
+ * reclaim, it is our best effort to control
+ * unmapped pages
+ */
+ nr_pages >>= 3;
+
+ zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
+ return nsc.nr_reclaimed;
+ }
+ return 0;
+}
+
+/*
+ * 16 is a magic number that was pulled out of a magician's
+ * hat. This number automatically provided the best performance
+ * to memory usage (unmapped pages). Lower than this and we spend
+ * a lot of time in frequent reclaims, higher and our control is
+ * weakened.
+ */
+#define UNMAPPED_PAGE_RATIO 16
+
+bool should_reclaim_unmapped_pages(struct zone *zone)
+{
+ if (unlikely(unmapped_page_control) &&
+ (zone_unmapped_file_pages(zone) >
+ UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
+ return true;
+ return false;
+}
+#endif
+
/*
* page_evictable - test whether a page is evictable
Hi Balbir,
On Fri, Dec 10, 2010 at 11:31 PM, Balbir Singh
<[email protected]> wrote:
> Move reusable functionality outside of zone_reclaim.
> Make zone_reclaim_unmapped_pages modular
>
> Signed-off-by: Balbir Singh <[email protected]>
> ---
>  mm/vmscan.c |   35 +++++++++++++++++++++++------------
>  1 files changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index e841cae..4e2ad05 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2815,6 +2815,27 @@ static long zone_pagecache_reclaimable(struct zone *zone)
>  }
>
>  /*
> + * Helper function to reclaim unmapped pages, we might add something
> + * similar to this for slab cache as well. Currently this function
> + * is shared with __zone_reclaim()
> + */
> +static inline void
> +zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
> +				unsigned long nr_pages)
> +{
> +	int priority;
> +	/*
> +	 * Free memory by calling shrink zone with increasing
> +	 * priorities until we have enough memory freed.
> +	 */
> +	priority = ZONE_RECLAIM_PRIORITY;
> +	do {
> +		shrink_zone(priority, zone, sc);
> +		priority--;
> +	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
> +}
As I said in the previous version, zone_reclaim_unmapped_pages doesn't
have any functionality related to reclaiming unmapped pages.
The function name is rather strange.
It would be better to set up the scan_control inside the function so
that it reclaims only unmapped pages.
--
Kind regards,
Minchan Kim
On Fri, Dec 10, 2010 at 11:32 PM, Balbir Singh
<[email protected]> wrote:
> Changelog v2
> 1. Use a config option to enable the code (Andrew Morton)
> 2. Explain the magic tunables in the code or at least attempt
>    to explain them (General comment)
> 3. Hint uses of the boot parameter with unlikely (Andrew Morton)
> 4. Use better names (balanced is not a good naming convention)
> 5. Updated Documentation/kernel-parameters.txt (Andrew Morton)
>
> Provide control using zone_reclaim() and a boot parameter. The
> code reuses functionality from zone_reclaim() to isolate unmapped
> pages and reclaim them as a priority, ahead of other mapped pages.
>
> Signed-off-by: Balbir Singh <[email protected]>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1845a97..1c9fbab 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1662,6 +1662,9 @@ zonelist_scan:
> 			unsigned long mark;
> 			int ret;
>
> +			if (should_reclaim_unmapped_pages(zone))
> +				wakeup_kswapd(zone, order);
I think we can put the logic into zone_watermark_ok().
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4e2ad05..daf2ad1 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2297,6 +2320,12 @@ loop_again:
> 				shrink_active_list(SWAP_CLUSTER_MAX, zone,
> 							&sc, priority, 0);
>
> +			/*
> +			 * We do unmapped page reclaim once here and once
> +			 * below, so that we don't lose out
> +			 */
> +			reclaim_unmapped_pages(priority, zone, &sc);
This can cause unnecessary churning of LRU pages. How about this instead?
zone_watermark_ok returns ZONE_UNMAPPED_PAGE_FULL.
wakeup_kswapd(..., please reclaim unmapped page cache).
If kswapd is woken up because unmapped pages exceed the limit, kswapd
sets up sc with may_unmap = 0.
When kswapd then tries to reclaim unmapped pages, shrink_page_list
doesn't rotate the mapped pages.
> @@ -2332,6 +2361,11 @@ loop_again:
> 				continue;
>
> 			sc.nr_scanned = 0;
> +			/*
> +			 * Reclaim unmapped pages upfront, this should be
> +			 * really cheap
> +			 */
> +			reclaim_unmapped_pages(priority, zone, &sc);
Remove the hacky code.
> +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
> +/*
> + * Routine to reclaim unmapped pages, inspired from the code under
> + * CONFIG_NUMA that does unmapped page and slab page control by keeping
> + * min_unmapped_pages in the zone. We currently reclaim just unmapped
> + * pages, slab control will come in soon, at which point this routine
> + * should be called reclaim cached pages
> + */
> +unsigned long reclaim_unmapped_pages(int priority, struct zone *zone,
> +						struct scan_control *sc)
> +{
> +	if (unlikely(unmapped_page_control) &&
> +		(zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
> +		struct scan_control nsc;
> +		unsigned long nr_pages;
> +
> +		nsc = *sc;
> +
> +		nsc.swappiness = 0;
> +		nsc.may_writepage = 0;
> +		nsc.may_unmap = 0;
> +		nsc.nr_reclaimed = 0;
This logic can be put in zone_reclaim_unmapped_pages.
At first look, the code isn't in good shape.
But the more important thing is how many users there are.
Can we pay the cost of maintaining a feature with few users?
It depends on your effort to prove the use cases and benefit.
It would be good to provide real data.
If we really want this, how about the new cache LRU idea that Kame suggests?
For example, add_to_page_cache_lru adds the page to the cache LRU.
page_add_file_rmap moves the page to the inactive file list.
page_remove_rmap moves the page back to the cache LRU.
We can count the unmapped pages, and if the count exceeds the limit,
we can wake up kswapd.
Whenever memory pressure happens, the reclaimer first tries to
reclaim the cache LRU.
This can improve reclaim latency and is good for embedded people.
--
Kind regards,
Minchan Kim
* MinChan Kim <[email protected]> [2010-12-14 19:01:26]:
> Hi Balbir,
>
> On Fri, Dec 10, 2010 at 11:31 PM, Balbir Singh
> <[email protected]> wrote:
> > Move reusable functionality outside of zone_reclaim.
> > Make zone_reclaim_unmapped_pages modular
> >
> > Signed-off-by: Balbir Singh <[email protected]>
> > ---
> >  mm/vmscan.c |   35 +++++++++++++++++++++++------------
> >  1 files changed, 23 insertions(+), 12 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index e841cae..4e2ad05 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2815,6 +2815,27 @@ static long zone_pagecache_reclaimable(struct zone *zone)
> >  }
> >
> >  /*
> > + * Helper function to reclaim unmapped pages, we might add something
> > + * similar to this for slab cache as well. Currently this function
> > + * is shared with __zone_reclaim()
> > + */
> > +static inline void
> > +zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
> > +				unsigned long nr_pages)
> > +{
> > +	int priority;
> > +	/*
> > +	 * Free memory by calling shrink zone with increasing
> > +	 * priorities until we have enough memory freed.
> > +	 */
> > +	priority = ZONE_RECLAIM_PRIORITY;
> > +	do {
> > +		shrink_zone(priority, zone, sc);
> > +		priority--;
> > +	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
> > +}
>
> As I said previous version, zone_reclaim_unmapped_pages doesn't have
> any functions related to reclaim unmapped pages.
The scan control point has the right arguments for implementing
reclaim of unmapped pages.
> The function name is rather strange.
> It would be better to add scan_control setup in function inner to
> reclaim only unmapped pages.
>
> --
> Kind regards,
> Minchan Kim
--
Three Cheers,
Balbir
On Tue, Dec 14, 2010 at 8:45 PM, Balbir Singh <[email protected]> wrote:
> * MinChan Kim <[email protected]> [2010-12-14 19:01:26]:
>
>> Hi Balbir,
>>
>> On Fri, Dec 10, 2010 at 11:31 PM, Balbir Singh
>> <[email protected]> wrote:
>> > Move reusable functionality outside of zone_reclaim.
>> > Make zone_reclaim_unmapped_pages modular
>> >
>> > Signed-off-by: Balbir Singh <[email protected]>
>> > ---
>> >  mm/vmscan.c |   35 +++++++++++++++++++++++------------
>> >  1 files changed, 23 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index e841cae..4e2ad05 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -2815,6 +2815,27 @@ static long zone_pagecache_reclaimable(struct zone *zone)
>> >  }
>> >
>> >  /*
>> > + * Helper function to reclaim unmapped pages, we might add something
>> > + * similar to this for slab cache as well. Currently this function
>> > + * is shared with __zone_reclaim()
>> > + */
>> > +static inline void
>> > +zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
>> > +				unsigned long nr_pages)
>> > +{
>> > +	int priority;
>> > +	/*
>> > +	 * Free memory by calling shrink zone with increasing
>> > +	 * priorities until we have enough memory freed.
>> > +	 */
>> > +	priority = ZONE_RECLAIM_PRIORITY;
>> > +	do {
>> > +		shrink_zone(priority, zone, sc);
>> > +		priority--;
>> > +	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
>> > +}
>>
>> As I said previous version, zone_reclaim_unmapped_pages doesn't have
>> any functions related to reclaim unmapped pages.
>
> The scan control point has the right arguments for implementing
> reclaim of unmapped pages.
I mean you should do the scan_control setup in this function.
The current zone_reclaim_unmapped_pages doesn't have any routine
specific to reclaiming unmapped pages.
Otherwise, change the function name to just "zone_reclaim_pages",
though I suspect you don't want that.
--
Kind regards,
Minchan Kim
* MinChan Kim <[email protected]> [2010-12-14 19:01:26]:
> Hi Balbir,
>
> On Fri, Dec 10, 2010 at 11:31 PM, Balbir Singh
> <[email protected]> wrote:
> > Move reusable functionality outside of zone_reclaim.
> > Make zone_reclaim_unmapped_pages modular
> >
> > Signed-off-by: Balbir Singh <[email protected]>
> > ---
> >  mm/vmscan.c |   35 +++++++++++++++++++++++------------
> >  1 files changed, 23 insertions(+), 12 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index e841cae..4e2ad05 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2815,6 +2815,27 @@ static long zone_pagecache_reclaimable(struct zone *zone)
> >  }
> >
> >  /*
> > + * Helper function to reclaim unmapped pages, we might add something
> > + * similar to this for slab cache as well. Currently this function
> > + * is shared with __zone_reclaim()
> > + */
> > +static inline void
> > +zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
> > +				unsigned long nr_pages)
> > +{
> > +	int priority;
> > +	/*
> > +	 * Free memory by calling shrink zone with increasing
> > +	 * priorities until we have enough memory freed.
> > +	 */
> > +	priority = ZONE_RECLAIM_PRIORITY;
> > +	do {
> > +		shrink_zone(priority, zone, sc);
> > +		priority--;
> > +	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
> > +}
>
> As I said previous version, zone_reclaim_unmapped_pages doesn't have
> any functions related to reclaim unmapped pages.
> The function name is rather strange.
> It would be better to add scan_control setup in function inner to
> reclaim only unmapped pages.
OK, that is an idea worth looking at; I'll revisit this function.
Thanks for the review!
--
Three Cheers,
Balbir
* MinChan Kim <[email protected]> [2010-12-14 20:02:45]:
> > +			if (should_reclaim_unmapped_pages(zone))
> > +				wakeup_kswapd(zone, order);
>
> I think we can put the logic into zone_watermark_okay.
>
I did some checks and zone_watermark_ok is used in several places for
a generic check like this -- for example, prior to zone_reclaim() in
get_page_from_freelist(), we skip zones based on its return value.
The compaction code uses it as well, so the impact would be deeper:
compaction uses it to check whether an allocation will succeed or
not, and I don't want unmapped page control to impact that.
> > +			/*
> > +			 * We do unmapped page reclaim once here and once
> > +			 * below, so that we don't lose out
> > +			 */
> > +			reclaim_unmapped_pages(priority, zone, &sc);
>
> It can make unnecessary stir of lru pages.
> How about this?
> zone_watermark_ok returns ZONE_UNMAPPED_PAGE_FULL.
> wakeup_kswapd(..., please reclaim unmapped page cache).
> If kswapd is woke up by unmapped page full, kswapd sets up sc with unmap = 0.
> If the kswapd try to reclaim unmapped page, shrink_page_list doesn't
> rotate non-unmapped pages.
With may_unmap set to 0 and may_writepage set to 0, I don't think this
should be a major problem; like I said, this code is already enabled if
zone_reclaim_mode != 0 and CONFIG_NUMA is set.
> > +unsigned long reclaim_unmapped_pages(int priority, struct zone *zone,
> > +						struct scan_control *sc)
> > +{
> > +	if (unlikely(unmapped_page_control) &&
> > +		(zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
> > +		struct scan_control nsc;
> > +		unsigned long nr_pages;
> > +
> > +		nsc = *sc;
> > +
> > +		nsc.swappiness = 0;
> > +		nsc.may_writepage = 0;
> > +		nsc.may_unmap = 0;
> > +		nsc.nr_reclaimed = 0;
>
> This logic can be put in zone_reclaim_unmapped_pages.
>
Now that I have refactored the code and called it zone_reclaim_pages, I
expect the correct sc to be passed to it. This code is reused by
zone_reclaim() and reclaim_unmapped_pages(); in the former,
zone_reclaim does the setup.
> If we want really this, how about the new cache lru idea as Kame suggests?
> For example, add_to_page_cache_lru adds the page into cache lru.
> page_add_file_rmap moves the page into inactive file.
> page_remove_rmap moves the page into lru cache, again.
> We can count the unmapped pages and if the size exceeds limit, we can
> wake up kswapd.
> whenever the memory pressure happens, first of all, reclaimer try to
> reclaim cache lru.
We already have a file LRU with active/inactive lists; I don't
think a special mapped/unmapped list makes sense at this point.
--
Three Cheers,
Balbir
* MinChan Kim <[email protected]> [2010-12-15 07:38:42]:
> On Tue, Dec 14, 2010 at 8:45 PM, Balbir Singh <[email protected]> wrote:
> > * MinChan Kim <[email protected]> [2010-12-14 19:01:26]:
> >
> >> Hi Balbir,
> >>
> >> On Fri, Dec 10, 2010 at 11:31 PM, Balbir Singh
> >> <[email protected]> wrote:
> >> > Move reusable functionality outside of zone_reclaim.
> >> > Make zone_reclaim_unmapped_pages modular
> >> >
> >> > Signed-off-by: Balbir Singh <[email protected]>
> >> > ---
> >> >  mm/vmscan.c |   35 +++++++++++++++++++++++------------
> >> >  1 files changed, 23 insertions(+), 12 deletions(-)
> >> >
> >> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> > index e841cae..4e2ad05 100644
> >> > --- a/mm/vmscan.c
> >> > +++ b/mm/vmscan.c
> >> > @@ -2815,6 +2815,27 @@ static long zone_pagecache_reclaimable(struct zone *zone)
> >> >  }
> >> >
> >> >  /*
> >> > + * Helper function to reclaim unmapped pages, we might add something
> >> > + * similar to this for slab cache as well. Currently this function
> >> > + * is shared with __zone_reclaim()
> >> > + */
> >> > +static inline void
> >> > +zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
> >> > +				unsigned long nr_pages)
> >> > +{
> >> > +	int priority;
> >> > +	/*
> >> > +	 * Free memory by calling shrink zone with increasing
> >> > +	 * priorities until we have enough memory freed.
> >> > +	 */
> >> > +	priority = ZONE_RECLAIM_PRIORITY;
> >> > +	do {
> >> > +		shrink_zone(priority, zone, sc);
> >> > +		priority--;
> >> > +	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
> >> > +}
> >>
> >> As I said previous version, zone_reclaim_unmapped_pages doesn't have
> >> any functions related to reclaim unmapped pages.
> >
> > The scan control point has the right arguments for implementing
> > reclaim of unmapped pages.
>
> I mean you should set up scan_control setup in this function.
> Current zone_reclaim_unmapped_pages doesn't have any specific routine
> related to reclaim unmapped pages.
> Otherwise, change the function name with just "zone_reclaim_pages". I
> think you don't want it.
Done, I renamed the function to zone_reclaim_pages.
Thanks!
--
Three Cheers,
Balbir
On Thu, Dec 23, 2010 at 5:33 PM, Balbir Singh <[email protected]> wrote:
> * MinChan Kim <[email protected]> [2010-12-14 20:02:45]:
>
>> > +			if (should_reclaim_unmapped_pages(zone))
>> > +				wakeup_kswapd(zone, order);
>>
>> I think we can put the logic into zone_watermark_okay.
>>
>
> I did some checks and zone_watermark_ok is used in several places for
> a generic check like this -- for example prior to zone_reclaim(), if
> in get_page_from_freelist() we skip zones based on the return value.
> The compaction code uses it as well, the impact would be deeper. The
> compaction code uses it to check whether an allocation will succeed or
> not, I don't want unmapped page control to impact that.
Agree.
>
>> > +			/*
>> > +			 * We do unmapped page reclaim once here and once
>> > +			 * below, so that we don't lose out
>> > +			 */
>> > +			reclaim_unmapped_pages(priority, zone, &sc);
>>
>> It can make unnecessary stir of lru pages.
>> How about this?
>> zone_watermark_ok returns ZONE_UNMAPPED_PAGE_FULL.
>> wakeup_kswapd(..., please reclaim unmapped page cache).
>> If kswapd is woke up by unmapped page full, kswapd sets up sc with unmap = 0.
>> If the kswapd try to reclaim unmapped page, shrink_page_list doesn't
>> rotate non-unmapped pages.
>
> With may_unmap set to 0 and may_writepage set to 0, I don't think this
> should be a major problem, like I said this code is already enabled if
> zone_reclaim_mode != 0 and CONFIG_NUMA is set.
True. It has already been in there.
But it is NUMA-only and you are going to take it out of NUMA.
That's why I have a concern.
I want to make this useful in embedded.
Recently Chrome OS tried to protect mapped pages, so I think your
series helps that situation.
But frequently shrinking unmapped pages stirs the LRU: a victim
mapped page (ie, at the tail of the inactive file list) can move to
the head of the inactive file list. The LRU ordering then becomes
confused, so a page that is not the LRU-order victim can be evicted.
>
>> > +unsigned long reclaim_unmapped_pages(int priority, struct zone *zone,
>> > +						struct scan_control *sc)
>> > +{
>> > +	if (unlikely(unmapped_page_control) &&
>> > +		(zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
>> > +		struct scan_control nsc;
>> > +		unsigned long nr_pages;
>> > +
>> > +		nsc = *sc;
>> > +
>> > +		nsc.swappiness = 0;
>> > +		nsc.may_writepage = 0;
>> > +		nsc.may_unmap = 0;
>> > +		nsc.nr_reclaimed = 0;
>>
>> This logic can be put in zone_reclaim_unmapped_pages.
>>
>
> Now that I refactored the code and called it zone_reclaim_pages, I
> expect the correct sc to be passed to it. This code is reused between
> zone_reclaim() and reclaim_unmapped_pages(). In the former,
> zone_reclaim does the setup.
Thanks.
>
>> If we want really this, how about the new cache lru idea as Kame suggests?
>> For example, add_to_page_cache_lru adds the page into cache lru.
>> page_add_file_rmap moves the page into inactive file.
>> page_remove_rmap moves the page into lru cache, again.
>> We can count the unmapped pages and if the size exceeds limit, we can
>> wake up kswapd.
>> whenever the memory pressure happens, first of all, reclaimer try to
>> reclaim cache lru.
>
> We already have a file LRU and that has active/inactive lists, I don't
> think a special mapped/unmapped list makes sense at this point.
That's about reclaim latency for the embedded use case, but I think it
would benefit desktop, too.
It can be another patch series, though, so I don't insist on it.
Thanks, Balbir.
>
>
> --
>        Three Cheers,
>        Balbir
>
--
Kind regards,
Minchan Kim