Recently, Mel Gorman added some vmscan tracepoints, but they can't
trace memcg reclaim. This patch series adds that.
The following three patches are small fixes and cleanups:
memcg: sc.nr_to_reclaim should be initialized
memcg: mem_cgroup_shrink_node_zone() doesn't need sc.nodemask
memcg: nid and zid can be calculated from zone
The following four patches convert existing tracepoints and add memcg tracepoints:
vmscan: convert direct reclaim tracepoint to DEFINE_EVENT
memcg, vmscan: add memcg reclaim tracepoint
vmscan: convert mm_vmscan_lru_isolate to DEFINE_EVENT
memcg, vmscan: add mm_vmscan_memcg_isolate tracepoint
diffstat
================
include/linux/memcontrol.h | 6 ++--
include/linux/mmzone.h | 5 +++
include/linux/swap.h | 3 +-
include/trace/events/vmscan.h | 79 +++++++++++++++++++++++++++++++++++++++--
mm/memcontrol.c | 15 +++++---
mm/vmscan.c | 35 ++++++++++++------
6 files changed, 118 insertions(+), 25 deletions(-)
Sample output is below.
=========================
dd-1851 [001] 158.837763: mm_vmscan_memcg_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_HIGHUSER_MOVABLE
dd-1851 [001] 158.837783: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
dd-1851 [001] 158.837860: mm_vmscan_memcg_reclaim_end: nr_reclaimed=32
(...)
dd-1970 [000] 266.608235: mm_vmscan_wakeup_kswapd: nid=0 zid=1 order=0
dd-1970 [000] 266.608239: mm_vmscan_wakeup_kswapd: nid=1 zid=1 order=0
dd-1970 [000] 266.608248: mm_vmscan_wakeup_kswapd: nid=2 zid=1 order=0
kswapd1-348 [001] 266.608254: mm_vmscan_kswapd_wake: nid=1 order=0
dd-1970 [000] 266.608254: mm_vmscan_wakeup_kswapd: nid=3 zid=1 order=0
kswapd3-350 [000] 266.608266: mm_vmscan_kswapd_wake: nid=3 order=0
(...)
kswapd0-347 [001] 267.328891: mm_vmscan_memcg_softlimit_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_HIGHUSER_MOVABLE
kswapd0-347 [001] 267.328897: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
kswapd0-347 [001] 267.328915: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
kswapd0-347 [001] 267.328989: mm_vmscan_memcg_softlimit_reclaim_end: nr_reclaimed=32
kswapd0-347 [001] 267.329019: mm_vmscan_lru_isolate: isolate_mode=1 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
kswapd0-347 [001] 267.330562: mm_vmscan_lru_isolate: isolate_mode=1 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
(...)
kswapd2-349 [001] 267.407081: mm_vmscan_kswapd_sleep: nid=2
kswapd3-350 [001] 267.408077: mm_vmscan_kswapd_sleep: nid=3
kswapd1-348 [000] 267.427858: mm_vmscan_kswapd_sleep: nid=1
kswapd0-347 [001] 267.430064: mm_vmscan_kswapd_sleep: nid=0
Currently, mem_cgroup_shrink_node_zone() initializes sc.nr_to_reclaim to 0.
That means shrink_zone() scans only 32 pages and immediately returns, even
if it doesn't reclaim any pages.
This patch fixes it.
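Purely as an illustration, here is a standalone sketch of the bail-out
check at the end of shrink_zone()'s scan loop (the condition is
paraphrased from mm/vmscan.c; the wrapper program is made up for the
demo). mem_cgroup_shrink_node_zone() calls shrink_zone() with priority 0,
so with nr_to_reclaim == 0 the loop stops after the very first 32-page
batch whether or not anything was reclaimed:

#include <stdio.h>

#define DEF_PRIORITY            12
#define SWAP_CLUSTER_MAX        32UL

/* paraphrase of the early-exit condition in shrink_zone()'s scan loop */
static int should_stop(unsigned long nr_reclaimed,
                       unsigned long nr_to_reclaim, int priority)
{
        return nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY;
}

int main(void)
{
        /* old code: nr_to_reclaim == 0, so we stop with nothing reclaimed */
        printf("nr_to_reclaim=0:  stop=%d\n", should_stop(0, 0, 0));

        /* patched: nr_to_reclaim == SWAP_CLUSTER_MAX, so we keep scanning */
        printf("nr_to_reclaim=32: stop=%d\n",
               should_stop(0, SWAP_CLUSTER_MAX, 0));
        return 0;
}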
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/vmscan.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1691ad0..bd1d035 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1932,6 +1932,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
struct zone *zone, int nid)
{
struct scan_control sc = {
+ .nr_to_reclaim = SWAP_CLUSTER_MAX,
.may_writepage = !laptop_mode,
.may_unmap = 1,
.may_swap = !noswap,
--
1.6.5.2
Currently mem_cgroup_shrink_node_zone() calls shrink_zone() directly,
so it doesn't need to initialize sc.nodemask; shrink_zone() doesn't
use it at all.
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/linux/swap.h | 3 +--
mm/memcontrol.c | 3 +--
mm/vmscan.c | 8 ++------
3 files changed, 4 insertions(+), 10 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index ff4acea..bf4eb62 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -244,8 +244,7 @@ extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
gfp_t gfp_mask, bool noswap,
unsigned int swappiness,
- struct zone *zone,
- int nid);
+ struct zone *zone);
extern int __isolate_lru_page(struct page *page, int mode, int file);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index aba4310..01f38ff 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1307,8 +1307,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
/* we use swappiness of local cgroup */
if (check_soft)
ret = mem_cgroup_shrink_node_zone(victim, gfp_mask,
- noswap, get_swappiness(victim), zone,
- zone->zone_pgdat->node_id);
+ noswap, get_swappiness(victim), zone);
else
ret = try_to_free_mem_cgroup_pages(victim, gfp_mask,
noswap, get_swappiness(victim));
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bd1d035..be860a0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1929,7 +1929,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
gfp_t gfp_mask, bool noswap,
unsigned int swappiness,
- struct zone *zone, int nid)
+ struct zone *zone)
{
struct scan_control sc = {
.nr_to_reclaim = SWAP_CLUSTER_MAX,
@@ -1940,13 +1940,9 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
.order = 0,
.mem_cgroup = mem,
};
- nodemask_t nm = nodemask_of_node(nid);
-
sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
- sc.nodemask = &nm;
- sc.nr_reclaimed = 0;
- sc.nr_scanned = 0;
+
/*
* NOTE: Although we can get the priority field, using it
* here is not a good idea, since it limits the pages we can scan.
--
1.6.5.2
mem_cgroup_soft_limit_reclaim() takes zone, nid and zid arguments, but nid
and zid can be calculated from zone. So remove them.
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/linux/memcontrol.h | 6 +++---
include/linux/mmzone.h | 5 +++++
mm/memcontrol.c | 5 ++---
mm/vmscan.c | 7 ++-----
4 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9411d32..9dec218 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -128,8 +128,8 @@ static inline bool mem_cgroup_disabled(void)
void mem_cgroup_update_file_mapped(struct page *page, int val);
unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
- gfp_t gfp_mask, int nid,
- int zid);
+ gfp_t gfp_mask);
+
#else /* CONFIG_CGROUP_MEM_RES_CTLR */
struct mem_cgroup;
@@ -304,7 +304,7 @@ static inline void mem_cgroup_update_file_mapped(struct page *page,
static inline
unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
- gfp_t gfp_mask, int nid, int zid)
+ gfp_t gfp_mask)
{
return 0;
}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9ed9c45..34ac27a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -684,6 +684,11 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
*/
#define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)
+static inline int zone_nid(struct zone *zone)
+{
+ return zone->zone_pgdat->node_id;
+}
+
static inline int populated_zone(struct zone *zone)
{
return (!!zone->present_pages);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 01f38ff..81bc9bf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2833,8 +2833,7 @@ static int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
}
unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
- gfp_t gfp_mask, int nid,
- int zid)
+ gfp_t gfp_mask)
{
unsigned long nr_reclaimed = 0;
struct mem_cgroup_per_zone *mz, *next_mz = NULL;
@@ -2846,7 +2845,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
if (order > 0)
return 0;
- mctz = soft_limit_tree_node_zone(nid, zid);
+ mctz = soft_limit_tree_node_zone(zone_nid(zone), zone_idx(zone));
/*
* This loop can run a while, specially if mem_cgroup's continuously
* keep exceeding their soft limit and putting the system under
diff --git a/mm/vmscan.c b/mm/vmscan.c
index be860a0..89b4287 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2121,7 +2121,6 @@ loop_again:
for (i = 0; i <= end_zone; i++) {
struct zone *zone = pgdat->node_zones + i;
int nr_slab;
- int nid, zid;
if (!populated_zone(zone))
continue;
@@ -2133,14 +2132,12 @@ loop_again:
sc.nr_scanned = 0;
note_zone_scanning_priority(zone, priority);
- nid = pgdat->node_id;
- zid = zone_idx(zone);
/*
* Call soft limit reclaim before calling shrink_zone.
* For now we ignore the return value
*/
- mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask,
- nid, zid);
+ mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask);
+
/*
* We put equal pressure on every zone, unless one
* zone has way too many pages free already.
--
1.6.5.2
TRACE_EVENT() is a bit old-fashioned; convert it.
No functional change.
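This is a pure no-op: if I remember the tracing headers right
(include/trace/ftrace.h), TRACE_EVENT() itself is defined as nothing
more than a class plus a single event of that class, roughly:

#define TRACE_EVENT(name, proto, args, tstruct, assign, print) \
        DECLARE_EVENT_CLASS(name,                               \
                            PARAMS(proto),                      \
                            PARAMS(args),                       \
                            PARAMS(tstruct),                    \
                            PARAMS(assign),                     \
                            PARAMS(print));                     \
        DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args));

So splitting a TRACE_EVENT() into an explicit DECLARE_EVENT_CLASS() plus
DEFINE_EVENT() generates the same code; the gain is that later patches
can attach more events to the same class.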
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/trace/events/vmscan.h | 19 +++++++++++++++++--
1 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index b26daa9..bd749c1 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -68,7 +68,7 @@ TRACE_EVENT(mm_vmscan_wakeup_kswapd,
__entry->order)
);
-TRACE_EVENT(mm_vmscan_direct_reclaim_begin,
+DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_begin_template,
TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
@@ -92,7 +92,15 @@ TRACE_EVENT(mm_vmscan_direct_reclaim_begin,
show_gfp_flags(__entry->gfp_flags))
);
-TRACE_EVENT(mm_vmscan_direct_reclaim_end,
+DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_direct_reclaim_begin,
+
+ TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
+
+ TP_ARGS(order, may_writepage, gfp_flags)
+);
+
+
+DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_end_template,
TP_PROTO(unsigned long nr_reclaimed),
@@ -109,6 +117,13 @@ TRACE_EVENT(mm_vmscan_direct_reclaim_end,
TP_printk("nr_reclaimed=%lu", __entry->nr_reclaimed)
);
+DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_direct_reclaim_end,
+
+ TP_PROTO(unsigned long nr_reclaimed),
+
+ TP_ARGS(nr_reclaimed)
+);
+
TRACE_EVENT(mm_vmscan_lru_isolate,
TP_PROTO(int order,
--
1.6.5.2
Memcg also needs to trace reclaim progress, just as direct reclaim does.
This patch adds it.
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/trace/events/vmscan.h | 28 ++++++++++++++++++++++++++++
mm/vmscan.c | 19 ++++++++++++++++++-
2 files changed, 46 insertions(+), 1 deletions(-)
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index bd749c1..cc19cb0 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -99,6 +99,19 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_direct_reclaim_b
TP_ARGS(order, may_writepage, gfp_flags)
);
+DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_memcg_reclaim_begin,
+
+ TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
+
+ TP_ARGS(order, may_writepage, gfp_flags)
+);
+
+DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_memcg_softlimit_reclaim_begin,
+
+ TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
+
+ TP_ARGS(order, may_writepage, gfp_flags)
+);
DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_end_template,
@@ -124,6 +137,21 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_direct_reclaim_end
TP_ARGS(nr_reclaimed)
);
+DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_reclaim_end,
+
+ TP_PROTO(unsigned long nr_reclaimed),
+
+ TP_ARGS(nr_reclaimed)
+);
+
+DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_reclaim_end,
+
+ TP_PROTO(unsigned long nr_reclaimed),
+
+ TP_ARGS(nr_reclaimed)
+);
+
+
TRACE_EVENT(mm_vmscan_lru_isolate,
TP_PROTO(int order,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 89b4287..21eb94f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1943,6 +1943,10 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
+ trace_mm_vmscan_memcg_softlimit_reclaim_begin(0,
+ sc.may_writepage,
+ sc.gfp_mask);
+
/*
* NOTE: Although we can get the priority field, using it
* here is not a good idea, since it limits the pages we can scan.
@@ -1951,6 +1955,9 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
* the priority and make it zero.
*/
shrink_zone(0, zone, &sc);
+
+ trace_mm_vmscan_memcg_softlimit_reclaim_end(sc.nr_reclaimed);
+
return sc.nr_reclaimed;
}
@@ -1960,6 +1967,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
unsigned int swappiness)
{
struct zonelist *zonelist;
+ unsigned long nr_reclaimed;
struct scan_control sc = {
.may_writepage = !laptop_mode,
.may_unmap = 1,
@@ -1974,7 +1982,16 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
zonelist = NODE_DATA(numa_node_id())->node_zonelists;
- return do_try_to_free_pages(zonelist, &sc);
+
+ trace_mm_vmscan_memcg_reclaim_begin(0,
+ sc.may_writepage,
+ sc.gfp_mask);
+
+ nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+
+ trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
+
+ return nr_reclaimed;
}
#endif
--
1.6.5.2
TRACE_EVENT() is a bit old-fashioned, and we need DECLARE_EVENT_CLASS
to introduce the memcg isolate-pages tracepoint.
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/trace/events/vmscan.h | 17 ++++++++++++++++-
1 files changed, 16 insertions(+), 1 deletions(-)
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index cc19cb0..e37fe72 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -152,7 +152,7 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_re
);
-TRACE_EVENT(mm_vmscan_lru_isolate,
+DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
TP_PROTO(int order,
unsigned long nr_requested,
@@ -198,6 +198,21 @@ TRACE_EVENT(mm_vmscan_lru_isolate,
__entry->nr_lumpy_failed)
);
+DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
+
+ TP_PROTO(int order,
+ unsigned long nr_requested,
+ unsigned long nr_scanned,
+ unsigned long nr_taken,
+ unsigned long nr_lumpy_taken,
+ unsigned long nr_lumpy_dirty,
+ unsigned long nr_lumpy_failed,
+ int isolate_mode),
+
+ TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
+
+);
+
TRACE_EVENT(mm_vmscan_writepage,
TP_PROTO(struct page *page,
--
1.6.5.2
Memcg also needs to trace page isolation information, just as global
reclaim does. This patch does that.
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/trace/events/vmscan.h | 15 +++++++++++++++
mm/memcontrol.c | 7 +++++++
2 files changed, 22 insertions(+), 0 deletions(-)
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index e37fe72..eefd399 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -213,6 +213,21 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
);
+DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
+
+ TP_PROTO(int order,
+ unsigned long nr_requested,
+ unsigned long nr_scanned,
+ unsigned long nr_taken,
+ unsigned long nr_lumpy_taken,
+ unsigned long nr_lumpy_dirty,
+ unsigned long nr_lumpy_failed,
+ int isolate_mode),
+
+ TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
+
+);
+
TRACE_EVENT(mm_vmscan_writepage,
TP_PROTO(struct page *page,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 81bc9bf..82e191f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -52,6 +52,9 @@
#include <asm/uaccess.h>
+#include <trace/events/vmscan.h>
+
+
struct cgroup_subsys mem_cgroup_subsys __read_mostly;
#define MEM_CGROUP_RECLAIM_RETRIES 5
struct mem_cgroup *root_mem_cgroup __read_mostly;
@@ -1042,6 +1045,10 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
}
*scanned = scan;
+
+ trace_mm_vmscan_memcg_isolate(0, nr_to_scan, scan, nr_taken,
+ 0, 0, 0, mode);
+
return nr_taken;
}
--
1.6.5.2
On Fri, 16 Jul 2010 19:13:31 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:
> Currently, mem_cgroup_shrink_node_zone() initialize sc.nr_to_reclaim as 0.
> It mean shrink_zone() only scan 32 pages and immediately return even if
> it doesn't reclaim any pages.
>
> This patch fixes it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
> ---
> mm/vmscan.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1691ad0..bd1d035 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1932,6 +1932,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> struct zone *zone, int nid)
> {
> struct scan_control sc = {
> + .nr_to_reclaim = SWAP_CLUSTER_MAX,
> .may_writepage = !laptop_mode,
> .may_unmap = 1,
> .may_swap = !noswap,
> --
> 1.6.5.2
>
>
>
>
On Fri, 16 Jul 2010 19:14:15 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:
> Currently mem_cgroup_shrink_node_zone() call shrink_zone() directly.
> thus it doesn't need to initialize sc.nodemask. shrink_zone() doesn't
> use it at all.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
On Fri, 16 Jul 2010 19:15:05 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:
>
> mem_cgroup_soft_limit_reclaim() has zone, nid and zid argument. but nid
> and zid can be calculated from zone. So remove it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
On Fri, 16 Jul 2010 19:16:05 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:
>
> TRACE_EVENT() is a bit old fashion. convert it.
>
> no functionally change.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
On Fri, Jul 16, 2010 at 07:13:31PM +0900, KOSAKI Motohiro wrote:
> Currently, mem_cgroup_shrink_node_zone() initialize sc.nr_to_reclaim as 0.
> It mean shrink_zone() only scan 32 pages and immediately return even if
> it doesn't reclaim any pages.
>
Do you mean it immediately returns once one page is reclaimed, i.e. this
check:
if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
break;
> This patch fixes it.
>
Otherwise it seems ok. It's unrelated to tracepoints, though, so it should
be submitted on its own.
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: Mel Gorman <[email protected]>
> ---
> mm/vmscan.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1691ad0..bd1d035 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1932,6 +1932,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> struct zone *zone, int nid)
> {
> struct scan_control sc = {
> + .nr_to_reclaim = SWAP_CLUSTER_MAX,
> .may_writepage = !laptop_mode,
> .may_unmap = 1,
> .may_swap = !noswap,
> --
> 1.6.5.2
>
>
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Fri, 16 Jul 2010 19:16:46 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:
>
> Memcg also need to trace reclaim progress as direct reclaim. This patch
> add it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Thank you!!
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
On Fri, 16 Jul 2010 19:17:36 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:
>
> TRACE_EVENT() is a bit old fashion and we need to use
> DECLARE_EVENT_CLASS for introducing memcg isolate pages
> tracepoint.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
On Fri, 16 Jul 2010 19:18:18 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:
>
> Memcg also need to trace page isolation information as global reclaim.
> This patch does it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
On Fri, Jul 16, 2010 at 07:14:15PM +0900, KOSAKI Motohiro wrote:
> Currently mem_cgroup_shrink_node_zone() call shrink_zone() directly.
> thus it doesn't need to initialize sc.nodemask. shrink_zone() doesn't
> use it at all.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> include/linux/swap.h | 3 +--
> mm/memcontrol.c | 3 +--
> mm/vmscan.c | 8 ++------
> 3 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index ff4acea..bf4eb62 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -244,8 +244,7 @@ extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
> extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> gfp_t gfp_mask, bool noswap,
> unsigned int swappiness,
> - struct zone *zone,
> - int nid);
> + struct zone *zone);
> extern int __isolate_lru_page(struct page *page, int mode, int file);
> extern unsigned long shrink_all_memory(unsigned long nr_pages);
> extern int vm_swappiness;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index aba4310..01f38ff 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1307,8 +1307,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> /* we use swappiness of local cgroup */
> if (check_soft)
> ret = mem_cgroup_shrink_node_zone(victim, gfp_mask,
> - noswap, get_swappiness(victim), zone,
> - zone->zone_pgdat->node_id);
> + noswap, get_swappiness(victim), zone);
> else
> ret = try_to_free_mem_cgroup_pages(victim, gfp_mask,
> noswap, get_swappiness(victim));
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bd1d035..be860a0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1929,7 +1929,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> gfp_t gfp_mask, bool noswap,
> unsigned int swappiness,
> - struct zone *zone, int nid)
> + struct zone *zone)
> {
> struct scan_control sc = {
> .nr_to_reclaim = SWAP_CLUSTER_MAX,
> @@ -1940,13 +1940,9 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> .order = 0,
> .mem_cgroup = mem,
> };
> - nodemask_t nm = nodemask_of_node(nid);
> -
> sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> - sc.nodemask = &nm;
> - sc.nr_reclaimed = 0;
> - sc.nr_scanned = 0;
> +
Removing the initialisation of nr_reclaimed and nr_scanned is slightly
outside the scope of the patch but harmless.
I see no problem with the patch but it's also not related to tracepoints
so should be part of a separate series. Still, I didn't see any problem
with it.
Acked-by: Mel Gorman <[email protected]>
> /*
> * NOTE: Although we can get the priority field, using it
> * here is not a good idea, since it limits the pages we can scan.
> --
> 1.6.5.2
>
>
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Fri, Jul 16, 2010 at 07:15:05PM +0900, KOSAKI Motohiro wrote:
>
> mem_cgroup_soft_limit_reclaim() has zone, nid and zid argument. but nid
> and zid can be calculated from zone. So remove it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> include/linux/memcontrol.h | 6 +++---
> include/linux/mmzone.h | 5 +++++
> mm/memcontrol.c | 5 ++---
> mm/vmscan.c | 7 ++-----
> 4 files changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 9411d32..9dec218 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -128,8 +128,8 @@ static inline bool mem_cgroup_disabled(void)
>
> void mem_cgroup_update_file_mapped(struct page *page, int val);
> unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
> - gfp_t gfp_mask, int nid,
> - int zid);
> + gfp_t gfp_mask);
> +
> #else /* CONFIG_CGROUP_MEM_RES_CTLR */
> struct mem_cgroup;
>
> @@ -304,7 +304,7 @@ static inline void mem_cgroup_update_file_mapped(struct page *page,
>
> static inline
> unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
> - gfp_t gfp_mask, int nid, int zid)
> + gfp_t gfp_mask)
> {
> return 0;
> }
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 9ed9c45..34ac27a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -684,6 +684,11 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
> */
> #define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)
>
> +static inline int zone_nid(struct zone *zone)
> +{
> + return zone->zone_pgdat->node_id;
> +}
> +
hmm, adding a helper and not converting the existing users of
zone->zone_pgdat may be a little confusing particularly as both types of
usage would exist in the same file e.g. in mem_cgroup_zone_nr_pages.
> static inline int populated_zone(struct zone *zone)
> {
> return (!!zone->present_pages);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 01f38ff..81bc9bf 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2833,8 +2833,7 @@ static int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
> }
>
> unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
> - gfp_t gfp_mask, int nid,
> - int zid)
> + gfp_t gfp_mask)
> {
> unsigned long nr_reclaimed = 0;
> struct mem_cgroup_per_zone *mz, *next_mz = NULL;
> @@ -2846,7 +2845,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
> if (order > 0)
> return 0;
>
> - mctz = soft_limit_tree_node_zone(nid, zid);
> + mctz = soft_limit_tree_node_zone(zone_nid(zone), zone_idx(zone));
> /*
> * This loop can run a while, specially if mem_cgroup's continuously
> * keep exceeding their soft limit and putting the system under
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index be860a0..89b4287 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2121,7 +2121,6 @@ loop_again:
> for (i = 0; i <= end_zone; i++) {
> struct zone *zone = pgdat->node_zones + i;
> int nr_slab;
> - int nid, zid;
>
> if (!populated_zone(zone))
> continue;
> @@ -2133,14 +2132,12 @@ loop_again:
> sc.nr_scanned = 0;
> note_zone_scanning_priority(zone, priority);
>
> - nid = pgdat->node_id;
> - zid = zone_idx(zone);
> /*
> * Call soft limit reclaim before calling shrink_zone.
> * For now we ignore the return value
> */
> - mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask,
> - nid, zid);
> + mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask);
> +
Other than the usual comment of it belonging in its own series
Acked-by: Mel Gorman <[email protected]>
> /*
> * We put equal pressure on every zone, unless one
> * zone has way too many pages free already.
> --
> 1.6.5.2
>
>
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Fri, Jul 16, 2010 at 07:16:05PM +0900, KOSAKI Motohiro wrote:
>
> TRACE_EVENT() is a bit old fashion. convert it.
>
heh, it's not a question of fashion :). It was defined this way because
there wasn't a second user that would use the template. I didn't see the
advantage of creating a template for one user. Your series adds a second
one so it makes sense to convert.
> no functionally change.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> include/trace/events/vmscan.h | 19 +++++++++++++++++--
> 1 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index b26daa9..bd749c1 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -68,7 +68,7 @@ TRACE_EVENT(mm_vmscan_wakeup_kswapd,
> __entry->order)
> );
>
> -TRACE_EVENT(mm_vmscan_direct_reclaim_begin,
> +DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_begin_template,
>
> TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
>
> @@ -92,7 +92,15 @@ TRACE_EVENT(mm_vmscan_direct_reclaim_begin,
> show_gfp_flags(__entry->gfp_flags))
> );
>
> -TRACE_EVENT(mm_vmscan_direct_reclaim_end,
> +DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_direct_reclaim_begin,
> +
> + TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
> +
> + TP_ARGS(order, may_writepage, gfp_flags)
> +);
Over 80 columns there.
> +
> +
> +DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_end_template,
>
> TP_PROTO(unsigned long nr_reclaimed),
>
> @@ -109,6 +117,13 @@ TRACE_EVENT(mm_vmscan_direct_reclaim_end,
> TP_printk("nr_reclaimed=%lu", __entry->nr_reclaimed)
> );
>
> +DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_direct_reclaim_end,
> +
> + TP_PROTO(unsigned long nr_reclaimed),
> +
> + TP_ARGS(nr_reclaimed)
> +);
> +
Over 80 columns here too.
I know I broke it multiple times in my last series because I thought it
wasn't enforced any more but I got called on it.
> TRACE_EVENT(mm_vmscan_lru_isolate,
>
> TP_PROTO(int order,
Otherwise, I can't spot a problem.
Acked-by: Mel Gorman <[email protected]>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Fri, Jul 16, 2010 at 07:16:46PM +0900, KOSAKI Motohiro wrote:
>
> Memcg also need to trace reclaim progress as direct reclaim. This patch
> add it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: Mel Gorman <[email protected]>
> ---
> include/trace/events/vmscan.h | 28 ++++++++++++++++++++++++++++
> mm/vmscan.c | 19 ++++++++++++++++++-
> 2 files changed, 46 insertions(+), 1 deletions(-)
>
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index bd749c1..cc19cb0 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -99,6 +99,19 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_direct_reclaim_b
> TP_ARGS(order, may_writepage, gfp_flags)
> );
>
> +DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_memcg_reclaim_begin,
> +
> + TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
> +
> + TP_ARGS(order, may_writepage, gfp_flags)
> +);
> +
> +DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_memcg_softlimit_reclaim_begin,
> +
> + TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
> +
> + TP_ARGS(order, may_writepage, gfp_flags)
> +);
>
> DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_end_template,
>
> @@ -124,6 +137,21 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_direct_reclaim_end
> TP_ARGS(nr_reclaimed)
> );
>
> +DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_reclaim_end,
> +
> + TP_PROTO(unsigned long nr_reclaimed),
> +
> + TP_ARGS(nr_reclaimed)
> +);
> +
> +DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_reclaim_end,
> +
> + TP_PROTO(unsigned long nr_reclaimed),
> +
> + TP_ARGS(nr_reclaimed)
> +);
> +
> +
> TRACE_EVENT(mm_vmscan_lru_isolate,
>
> TP_PROTO(int order,
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 89b4287..21eb94f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1943,6 +1943,10 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
>
> + trace_mm_vmscan_memcg_softlimit_reclaim_begin(0,
> + sc.may_writepage,
> + sc.gfp_mask);
> +
> /*
> * NOTE: Although we can get the priority field, using it
> * here is not a good idea, since it limits the pages we can scan.
> @@ -1951,6 +1955,9 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> * the priority and make it zero.
> */
> shrink_zone(0, zone, &sc);
> +
> + trace_mm_vmscan_memcg_softlimit_reclaim_end(sc.nr_reclaimed);
> +
> return sc.nr_reclaimed;
> }
>
> @@ -1960,6 +1967,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> unsigned int swappiness)
> {
> struct zonelist *zonelist;
> + unsigned long nr_reclaimed;
> struct scan_control sc = {
> .may_writepage = !laptop_mode,
> .may_unmap = 1,
> @@ -1974,7 +1982,16 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> zonelist = NODE_DATA(numa_node_id())->node_zonelists;
> - return do_try_to_free_pages(zonelist, &sc);
> +
> + trace_mm_vmscan_memcg_reclaim_begin(0,
> + sc.may_writepage,
> + sc.gfp_mask);
> +
> + nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
> +
> + trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
> +
> + return nr_reclaimed;
> }
> #endif
>
> --
> 1.6.5.2
>
>
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Fri, Jul 16, 2010 at 07:18:18PM +0900, KOSAKI Motohiro wrote:
>
> Memcg also need to trace page isolation information as global reclaim.
> This patch does it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> include/trace/events/vmscan.h | 15 +++++++++++++++
> mm/memcontrol.c | 7 +++++++
> 2 files changed, 22 insertions(+), 0 deletions(-)
>
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index e37fe72..eefd399 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -213,6 +213,21 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
>
> );
>
> +DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
> +
> + TP_PROTO(int order,
> + unsigned long nr_requested,
I just spotted that this is badly named by myself. It should have been
order.
> + unsigned long nr_scanned,
> + unsigned long nr_taken,
> + unsigned long nr_lumpy_taken,
> + unsigned long nr_lumpy_dirty,
> + unsigned long nr_lumpy_failed,
> + int isolate_mode),
> +
> + TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
> +
> +);
> +
> TRACE_EVENT(mm_vmscan_writepage,
>
> TP_PROTO(struct page *page,
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 81bc9bf..82e191f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -52,6 +52,9 @@
>
> #include <asm/uaccess.h>
>
> +#include <trace/events/vmscan.h>
> +
> +
Excessive whitespace there.
Otherwise, I didn't spot any problems.
> struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> #define MEM_CGROUP_RECLAIM_RETRIES 5
> struct mem_cgroup *root_mem_cgroup __read_mostly;
> @@ -1042,6 +1045,10 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> }
>
> *scanned = scan;
> +
> + trace_mm_vmscan_memcg_isolate(0, nr_to_scan, scan, nr_taken,
> + 0, 0, 0, mode);
> +
> return nr_taken;
> }
>
> --
> 1.6.5.2
>
>
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Fri, Jul 16, 2010 at 12:21:09PM +0100, Mel Gorman wrote:
> On Fri, Jul 16, 2010 at 07:18:18PM +0900, KOSAKI Motohiro wrote:
> >
> > Memcg also need to trace page isolation information as global reclaim.
> > This patch does it.
> >
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > ---
> > include/trace/events/vmscan.h | 15 +++++++++++++++
> > mm/memcontrol.c | 7 +++++++
> > 2 files changed, 22 insertions(+), 0 deletions(-)
> >
> > diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> > index e37fe72..eefd399 100644
> > --- a/include/trace/events/vmscan.h
> > +++ b/include/trace/events/vmscan.h
> > @@ -213,6 +213,21 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
> >
> > );
> >
> > +DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
> > +
> > + TP_PROTO(int order,
> > + unsigned long nr_requested,
>
> I just spotted that this is badly named by myself. It should have been
> order.
>
I meant it should have been nr_to_scan (not order), because that makes it
easier to map what is happening in the source to what you read in the
tracepoints.
> > + unsigned long nr_scanned,
> > + unsigned long nr_taken,
> > + unsigned long nr_lumpy_taken,
> > + unsigned long nr_lumpy_dirty,
> > + unsigned long nr_lumpy_failed,
> > + int isolate_mode),
> > +
> > + TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
> > +
> > +);
> > +
> > TRACE_EVENT(mm_vmscan_writepage,
> >
> > TP_PROTO(struct page *page,
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 81bc9bf..82e191f 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -52,6 +52,9 @@
> >
> > #include <asm/uaccess.h>
> >
> > +#include <trace/events/vmscan.h>
> > +
> > +
>
> Excessive whitespace there.
>
> Otherwise, I didn't spot any problems.
>
> > struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> > #define MEM_CGROUP_RECLAIM_RETRIES 5
> > struct mem_cgroup *root_mem_cgroup __read_mostly;
> > @@ -1042,6 +1045,10 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> > }
> >
> > *scanned = scan;
> > +
> > + trace_mm_vmscan_memcg_isolate(0, nr_to_scan, scan, nr_taken,
> > + 0, 0, 0, mode);
> > +
> > return nr_taken;
> > }
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Fri, 16 Jul 2010 19:12:46 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:
> Recently, Mel Gorman added some vmscan tracepoint. but they can't
> trace memcg. So, This patch series does.
>
>
> following three patches are nit fix and cleanups.
>
> memcg: sc.nr_to_reclaim should be initialized
> memcg: mem_cgroup_shrink_node_zone() doesn't need sc.nodemask
> memcg: nid and zid can be calculated from zone
>
I'm not so familiar with tracepoints, but I have no objection to these
3 fixes/cleanups for memcg.
Thanks,
Daisuke Nishimura
> following four patches are tracepoint conversion and adding memcg tracepoints.
>
> vmscan: convert direct reclaim tracepoint to DEFINE_EVENT
> memcg, vmscan: add memcg reclaim tracepoint
> vmscan: convert mm_vmscan_lru_isolate to DEFINE_EVENT
> memcg, vmscan: add mm_vmscan_memcg_isolate tracepoint
>
>
> diffstat
> ================
> include/linux/memcontrol.h | 6 ++--
> include/linux/mmzone.h | 5 +++
> include/linux/swap.h | 3 +-
> include/trace/events/vmscan.h | 79 +++++++++++++++++++++++++++++++++++++++--
> mm/memcontrol.c | 15 +++++---
> mm/vmscan.c | 35 ++++++++++++------
> 6 files changed, 118 insertions(+), 25 deletions(-)
>
>
> Sameple output is here.
> =========================
>
> dd-1851 [001] 158.837763: mm_vmscan_memcg_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_HIGHUSER_MOVABLE
> dd-1851 [001] 158.837783: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> dd-1851 [001] 158.837860: mm_vmscan_memcg_reclaim_end: nr_reclaimed=32
> (...)
> dd-1970 [000] 266.608235: mm_vmscan_wakeup_kswapd: nid=0 zid=1 order=0
> dd-1970 [000] 266.608239: mm_vmscan_wakeup_kswapd: nid=1 zid=1 order=0
> dd-1970 [000] 266.608248: mm_vmscan_wakeup_kswapd: nid=2 zid=1 order=0
> kswapd1-348 [001] 266.608254: mm_vmscan_kswapd_wake: nid=1 order=0
> dd-1970 [000] 266.608254: mm_vmscan_wakeup_kswapd: nid=3 zid=1 order=0
> kswapd3-350 [000] 266.608266: mm_vmscan_kswapd_wake: nid=3 order=0
> (...)
> kswapd0-347 [001] 267.328891: mm_vmscan_memcg_softlimit_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_HIGHUSER_MOVABLE
> kswapd0-347 [001] 267.328897: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> kswapd0-347 [001] 267.328915: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> kswapd0-347 [001] 267.328989: mm_vmscan_memcg_softlimit_reclaim_end: nr_reclaimed=32
> kswapd0-347 [001] 267.329019: mm_vmscan_lru_isolate: isolate_mode=1 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> kswapd0-347 [001] 267.330562: mm_vmscan_lru_isolate: isolate_mode=1 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> (...)
> kswapd2-349 [001] 267.407081: mm_vmscan_kswapd_sleep: nid=2
> kswapd3-350 [001] 267.408077: mm_vmscan_kswapd_sleep: nid=3
> kswapd1-348 [000] 267.427858: mm_vmscan_kswapd_sleep: nid=1
> kswapd0-347 [001] 267.430064: mm_vmscan_kswapd_sleep: nid=0
>
>
>
>
>
On Fri, 2010-07-16 at 12:08 +0100, Mel Gorman wrote:
>
> > +DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_direct_reclaim_end,
> > +
> > + TP_PROTO(unsigned long nr_reclaimed),
> > +
> > + TP_ARGS(nr_reclaimed)
> > +);
> > +
>
> Over 80 columns here too.
>
> I know I broke it multiple times in my last series because I thought it
> wasn't enforced any more but I got called on it.
Note, the TRACE_EVENT() macros get a bit more leniency on the 80-column
rule.
-- Steve
> > +static inline int zone_nid(struct zone *zone)
> > +{
> > + return zone->zone_pgdat->node_id;
> > +}
> > +
>
> hmm, adding a helper and not converting the existing users of
> zone->zone_pgdat may be a little confusing particularly as both types of
> usage would exist in the same file e.g. in mem_cgroup_zone_nr_pages.
I see. Here is an incremental patch.
Thanks
From 62cf765251af257c98fc92a58215d101d200e7ef Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <[email protected]>
Date: Tue, 20 Jul 2010 11:30:14 +0900
Subject: [PATCH] memcg: convert to zone_nid() from bare zone->zone_pgdat->node_id
Now that we have zone_nid(), this patch converts all existing users of
zone->zone_pgdat->node_id.
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/memcontrol.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 82e191f..3d5b645 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -951,7 +951,7 @@ unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
struct zone *zone,
enum lru_list lru)
{
- int nid = zone->zone_pgdat->node_id;
+ int nid = zone_nid(zone);
int zid = zone_idx(zone);
struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);
@@ -961,7 +961,7 @@ unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
struct zone *zone)
{
- int nid = zone->zone_pgdat->node_id;
+ int nid = zone_nid(zone);
int zid = zone_idx(zone);
struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);
@@ -1006,7 +1006,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
LIST_HEAD(pc_list);
struct list_head *src;
struct page_cgroup *pc, *tmp;
- int nid = z->zone_pgdat->node_id;
+ int nid = zone_nid(z);
int zid = zone_idx(z);
struct mem_cgroup_per_zone *mz;
int lru = LRU_FILE * file + active;
--
1.6.5.2
> On Fri, Jul 16, 2010 at 07:13:31PM +0900, KOSAKI Motohiro wrote:
> > Currently, mem_cgroup_shrink_node_zone() initialize sc.nr_to_reclaim as 0.
> > It mean shrink_zone() only scan 32 pages and immediately return even if
> > it doesn't reclaim any pages.
> >
>
> Do you mean it immediately returns once one page is reclaimed? i.e. this
> check
>
> if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
> break;
Strictly speaking, it returns once a SWAP_CLUSTER_MAX batch has been
scanned; it doesn't need to reclaim any pages at all.
* KOSAKI Motohiro <[email protected]> [2010-07-16 19:12:46]:
> Recently, Mel Gorman added some vmscan tracepoint. but they can't
> trace memcg. So, This patch series does.
>
>
> following three patches are nit fix and cleanups.
>
> memcg: sc.nr_to_reclaim should be initialized
> memcg: mem_cgroup_shrink_node_zone() doesn't need sc.nodemask
> memcg: nid and zid can be calculated from zone
>
> following four patches are tracepoint conversion and adding memcg tracepoints.
>
> vmscan: convert direct reclaim tracepoint to DEFINE_EVENT
> memcg, vmscan: add memcg reclaim tracepoint
> vmscan: convert mm_vmscan_lru_isolate to DEFINE_EVENT
> memcg, vmscan: add mm_vmscan_memcg_isolate tracepoint
>
>
> diffstat
> ================
> include/linux/memcontrol.h | 6 ++--
> include/linux/mmzone.h | 5 +++
> include/linux/swap.h | 3 +-
> include/trace/events/vmscan.h | 79 +++++++++++++++++++++++++++++++++++++++--
> mm/memcontrol.c | 15 +++++---
> mm/vmscan.c | 35 ++++++++++++------
> 6 files changed, 118 insertions(+), 25 deletions(-)
>
>
> Sameple output is here.
> =========================
>
> dd-1851 [001] 158.837763: mm_vmscan_memcg_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_HIGHUSER_MOVABLE
> dd-1851 [001] 158.837783: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> dd-1851 [001] 158.837860: mm_vmscan_memcg_reclaim_end: nr_reclaimed=32
> (...)
> dd-1970 [000] 266.608235: mm_vmscan_wakeup_kswapd: nid=0 zid=1 order=0
> dd-1970 [000] 266.608239: mm_vmscan_wakeup_kswapd: nid=1 zid=1 order=0
> dd-1970 [000] 266.608248: mm_vmscan_wakeup_kswapd: nid=2 zid=1 order=0
> kswapd1-348 [001] 266.608254: mm_vmscan_kswapd_wake: nid=1 order=0
> dd-1970 [000] 266.608254: mm_vmscan_wakeup_kswapd: nid=3 zid=1 order=0
> kswapd3-350 [000] 266.608266: mm_vmscan_kswapd_wake: nid=3 order=0
> (...)
> kswapd0-347 [001] 267.328891: mm_vmscan_memcg_softlimit_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_HIGHUSER_MOVABLE
> kswapd0-347 [001] 267.328897: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> kswapd0-347 [001] 267.328915: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> kswapd0-347 [001] 267.328989: mm_vmscan_memcg_softlimit_reclaim_end: nr_reclaimed=32
> kswapd0-347 [001] 267.329019: mm_vmscan_lru_isolate: isolate_mode=1 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> kswapd0-347 [001] 267.330562: mm_vmscan_lru_isolate: isolate_mode=1 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> (...)
> kswapd2-349 [001] 267.407081: mm_vmscan_kswapd_sleep: nid=2
> kswapd3-350 [001] 267.408077: mm_vmscan_kswapd_sleep: nid=3
> kswapd1-348 [000] 267.427858: mm_vmscan_kswapd_sleep: nid=1
> kswapd0-347 [001] 267.430064: mm_vmscan_kswapd_sleep: nid=0
>
This looks interesting, but I think I need to look deeper to see how a
name like mm_vmscan_memcg_softlimit_reclaim_begin is generated.
--
Three Cheers,
Balbir
* KOSAKI Motohiro <[email protected]> [2010-07-16 19:14:15]:
> Currently mem_cgroup_shrink_node_zone() call shrink_zone() directly.
> thus it doesn't need to initialize sc.nodemask. shrink_zone() doesn't
> use it at all.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> include/linux/swap.h | 3 +--
> mm/memcontrol.c | 3 +--
> mm/vmscan.c | 8 ++------
> 3 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index ff4acea..bf4eb62 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -244,8 +244,7 @@ extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
> extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> gfp_t gfp_mask, bool noswap,
> unsigned int swappiness,
> - struct zone *zone,
> - int nid);
> + struct zone *zone);
> extern int __isolate_lru_page(struct page *page, int mode, int file);
> extern unsigned long shrink_all_memory(unsigned long nr_pages);
> extern int vm_swappiness;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index aba4310..01f38ff 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1307,8 +1307,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> /* we use swappiness of local cgroup */
> if (check_soft)
> ret = mem_cgroup_shrink_node_zone(victim, gfp_mask,
> - noswap, get_swappiness(victim), zone,
> - zone->zone_pgdat->node_id);
> + noswap, get_swappiness(victim), zone);
> else
> ret = try_to_free_mem_cgroup_pages(victim, gfp_mask,
> noswap, get_swappiness(victim));
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bd1d035..be860a0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1929,7 +1929,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> gfp_t gfp_mask, bool noswap,
> unsigned int swappiness,
> - struct zone *zone, int nid)
> + struct zone *zone)
> {
> struct scan_control sc = {
> .nr_to_reclaim = SWAP_CLUSTER_MAX,
> @@ -1940,13 +1940,9 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> .order = 0,
> .mem_cgroup = mem,
> };
> - nodemask_t nm = nodemask_of_node(nid);
> -
> sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> - sc.nodemask = &nm;
> - sc.nr_reclaimed = 0;
> - sc.nr_scanned = 0;
We need the initialization to 0; is there a reason why it was removed?
What happens when we compare or increment sc.nr_*?
Can we keep this independent of the tracing patches?
> +
> /*
> * NOTE: Although we can get the priority field, using it
> * here is not a good idea, since it limits the pages we can scan.
> --
> 1.6.5.2
>
>
>
--
Three Cheers,
Balbir
* KOSAKI Motohiro <[email protected]> [2010-07-16 19:13:31]:
> Currently, mem_cgroup_shrink_node_zone() initialize sc.nr_to_reclaim as 0.
> It mean shrink_zone() only scan 32 pages and immediately return even if
> it doesn't reclaim any pages.
>
> This patch fixes it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> mm/vmscan.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1691ad0..bd1d035 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1932,6 +1932,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> struct zone *zone, int nid)
> {
> struct scan_control sc = {
> + .nr_to_reclaim = SWAP_CLUSTER_MAX,
> .may_writepage = !laptop_mode,
> .may_unmap = 1,
> .may_swap = !noswap,
Could you please do some additional testing on:
1. How far does this push pages (in terms of when the limit is hit)?
2. Did you hit a problem with the current setting, or is it a review
fix?
--
Three Cheers,
Balbir
* KOSAKI Motohiro <[email protected]> [2010-07-16 19:16:46]:
>
> Memcg also need to trace reclaim progress as direct reclaim. This patch
> add it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Looks good to me
Acked-by: Balbir Singh <[email protected]>
--
Three Cheers,
Balbir
* KOSAKI Motohiro <[email protected]> [2010-07-21 22:33:56]:
> > > +static inline int zone_nid(struct zone *zone)
> > > +{
> > > + return zone->zone_pgdat->node_id;
> > > +}
> > > +
> >
> > hmm, adding a helper and not converting the existing users of
> > zone->zone_pgdat may be a little confusing particularly as both types of
> > usage would exist in the same file e.g. in mem_cgroup_zone_nr_pages.
>
> I see. here is incrementa patch.
>
> Thanks
>
> From 62cf765251af257c98fc92a58215d101d200e7ef Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <[email protected]>
> Date: Tue, 20 Jul 2010 11:30:14 +0900
> Subject: [PATCH] memcg: convert to zone_nid() from bare zone->zone_pgdat->node_id
>
> Now, we have zone_nid(). this patch convert all existing users of
> zone->zone_pgdat.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> mm/memcontrol.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 82e191f..3d5b645 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -951,7 +951,7 @@ unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
> struct zone *zone,
> enum lru_list lru)
> {
> - int nid = zone->zone_pgdat->node_id;
> + int nid = zone_nid(zone);
> int zid = zone_idx(zone);
> struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);
>
> @@ -961,7 +961,7 @@ unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
> struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
> struct zone *zone)
> {
> - int nid = zone->zone_pgdat->node_id;
> + int nid = zone_nid(zone);
> int zid = zone_idx(zone);
> struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);
>
> @@ -1006,7 +1006,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> LIST_HEAD(pc_list);
> struct list_head *src;
> struct page_cgroup *pc, *tmp;
> - int nid = z->zone_pgdat->node_id;
> + int nid = zone_nid(z);
> int zid = zone_idx(z);
> struct mem_cgroup_per_zone *mz;
> int lru = LRU_FILE * file + active;
Acked-by: Balbir Singh <[email protected]>
--
Three Cheers,
Balbir
On Wed, Jul 21, 2010 at 10:33:56PM +0900, KOSAKI Motohiro wrote:
> > > +static inline int zone_nid(struct zone *zone)
> > > +{
> > > + return zone->zone_pgdat->node_id;
> > > +}
> > > +
> >
> > hmm, adding a helper and not converting the existing users of
> > zone->zone_pgdat may be a little confusing particularly as both types of
> > usage would exist in the same file e.g. in mem_cgroup_zone_nr_pages.
>
> I see. here is incrementa patch.
>
Looks grand. Thanks
> From 62cf765251af257c98fc92a58215d101d200e7ef Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <[email protected]>
> Date: Tue, 20 Jul 2010 11:30:14 +0900
> Subject: [PATCH] memcg: convert to zone_nid() from bare zone->zone_pgdat->node_id
>
> Now, we have zone_nid(). this patch convert all existing users of
> zone->zone_pgdat.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
>
> <SNIP>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
> * KOSAKI Motohiro <[email protected]> [2010-07-16 19:14:15]:
>
> > Currently mem_cgroup_shrink_node_zone() call shrink_zone() directly.
> > thus it doesn't need to initialize sc.nodemask. shrink_zone() doesn't
> > use it at all.
> >
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > ---
> > include/linux/swap.h | 3 +--
> > mm/memcontrol.c | 3 +--
> > mm/vmscan.c | 8 ++------
> > 3 files changed, 4 insertions(+), 10 deletions(-)
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index ff4acea..bf4eb62 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -244,8 +244,7 @@ extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
> > extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> > gfp_t gfp_mask, bool noswap,
> > unsigned int swappiness,
> > - struct zone *zone,
> > - int nid);
> > + struct zone *zone);
> > extern int __isolate_lru_page(struct page *page, int mode, int file);
> > extern unsigned long shrink_all_memory(unsigned long nr_pages);
> > extern int vm_swappiness;
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index aba4310..01f38ff 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1307,8 +1307,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> > /* we use swappiness of local cgroup */
> > if (check_soft)
> > ret = mem_cgroup_shrink_node_zone(victim, gfp_mask,
> > - noswap, get_swappiness(victim), zone,
> > - zone->zone_pgdat->node_id);
> > + noswap, get_swappiness(victim), zone);
> > else
> > ret = try_to_free_mem_cgroup_pages(victim, gfp_mask,
> > noswap, get_swappiness(victim));
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index bd1d035..be860a0 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1929,7 +1929,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> > unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> > gfp_t gfp_mask, bool noswap,
> > unsigned int swappiness,
> > - struct zone *zone, int nid)
> > + struct zone *zone)
> > {
> > struct scan_control sc = {
> > .nr_to_reclaim = SWAP_CLUSTER_MAX,
> > @@ -1940,13 +1940,9 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> > .order = 0,
> > .mem_cgroup = mem,
> > };
> > - nodemask_t nm = nodemask_of_node(nid);
> > -
> > sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> > (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> > - sc.nodemask = &nm;
> > - sc.nr_reclaimed = 0;
> > - sc.nr_scanned = 0;
>
> We need the initialization to 0, is there a reason why it was removed?
Please reread the C spec and the other scan_control users.
sc.nr_* are already zero-initialized by the designated initializer in the
struct scan_control sc = { ... } line.
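A minimal standalone illustration of that C rule (a cut-down, made-up
struct, not the kernel's real scan_control):

#include <stdio.h>

/* cut-down stand-in for the kernel's struct scan_control */
struct scan_control {
        unsigned long nr_scanned;
        unsigned long nr_reclaimed;
        unsigned long nr_to_reclaim;
        int may_writepage;
};

int main(void)
{
        /*
         * C99 6.7.8p21: members not named in a designated initializer
         * are initialized as if the object had static storage duration,
         * i.e. to zero.
         */
        struct scan_control sc = {
                .nr_to_reclaim = 32,
                .may_writepage = 1,
        };

        printf("nr_scanned=%lu nr_reclaimed=%lu\n",
               sc.nr_scanned, sc.nr_reclaimed);         /* prints 0 0 */
        return 0;
}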
> What happens when we compare or increment sc.nr_*?
>
> Can we keep this indepedent of the tracing patches?
>
> > +
> > /*
> > * NOTE: Although we can get the priority field, using it
> > * here is not a good idea, since it limits the pages we can scan.
> > --
> > 1.6.5.2
> >
> >
> >
>
> --
> Three Cheers,
> Balbir
> * KOSAKI Motohiro <[email protected]> [2010-07-16 19:13:31]:
>
> > Currently, mem_cgroup_shrink_node_zone() initialize sc.nr_to_reclaim as 0.
> > It mean shrink_zone() only scan 32 pages and immediately return even if
> > it doesn't reclaim any pages.
> >
> > This patch fixes it.
> >
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > ---
> > mm/vmscan.c | 1 +
> > 1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 1691ad0..bd1d035 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1932,6 +1932,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> > struct zone *zone, int nid)
> > {
> > struct scan_control sc = {
> > + .nr_to_reclaim = SWAP_CLUSTER_MAX,
> > .may_writepage = !laptop_mode,
> > .may_unmap = 1,
> > .may_swap = !noswap,
>
> Could you please do some additional testing on
>
> 1. How far does this push pages (in terms of when limit is hit)?
32 pages per mem_cgroup_shrink_node_zone().
That is, the algorithm is:
1. call mem_cgroup_largest_soft_limit_node()
   to pick the largest cgroup
2. call mem_cgroup_shrink_node_zone() and shrink 32 pages
3. go to 1 if the limit is still exceeded
If that's not your intention, can you please describe your intended algorithm?
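As a toy model only (made-up names and numbers, not the real memcg
interfaces), the loop above behaves like this:

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32UL

/* hypothetical stand-in for a cgroup exceeding its soft limit */
struct group { unsigned long usage, soft_limit; };

/* step 1: pick the group with the largest excess over its soft limit */
static struct group *largest_excess(struct group *g, int n)
{
        struct group *best = NULL;
        int i;

        for (i = 0; i < n; i++)
                if (g[i].usage > g[i].soft_limit &&
                    (!best || g[i].usage - g[i].soft_limit >
                              best->usage - best->soft_limit))
                        best = &g[i];
        return best;
}

int main(void)
{
        struct group groups[2] = { { 200, 100 }, { 150, 100 } };
        struct group *victim;

        /* steps 2 and 3: shrink 32 pages, repeat while still over limit */
        while ((victim = largest_excess(groups, 2))) {
                victim->usage -= SWAP_CLUSTER_MAX;
                printf("shrank victim, usage now %lu\n", victim->usage);
        }
        return 0;
}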
> 2. Did you hit a problem with the current setting or is it a review
> fix?
I found this by review, and my patch works fine in my test environment.
Of course, if you do _not_ run the code under heavy pressure, your original
code works too.
On Fri, Jul 23, 2010 at 1:03 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> * KOSAKI Motohiro <[email protected]> [2010-07-16 19:13:31]:
>>
>> > Currently, mem_cgroup_shrink_node_zone() initialize sc.nr_to_reclaim as 0.
>> > It mean shrink_zone() only scan 32 pages and immediately return even if
>> > it doesn't reclaim any pages.
>> >
>> > This patch fixes it.
>> >
>> > Signed-off-by: KOSAKI Motohiro <[email protected]>
>> > ---
>> >  mm/vmscan.c |    1 +
>> >  1 files changed, 1 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index 1691ad0..bd1d035 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -1932,6 +1932,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>> >                                             struct zone *zone, int nid)
>> >  {
>> >     struct scan_control sc = {
>> > +           .nr_to_reclaim = SWAP_CLUSTER_MAX,
>> >             .may_writepage = !laptop_mode,
>> >             .may_unmap = 1,
>> >             .may_swap = !noswap,
>>
>> Could you please do some additional testing on
>>
>> 1. How far does this push pages (in terms of when limit is hit)?
>
> 32 pages per mem_cgroup_shrink_node_zone().
>
> That said, the algorithm is here.
>
> 1. call mem_cgroup_largest_soft_limit_node()
>    calculate largest cgroup
> 2. call mem_cgroup_shrink_node_zone() and shrink 32 pages
> 3. goto 1 if limit is still exceed.
>
> If it's not your intention, can you please your intended algorithm?
We set it to 0, since we care only about reclaiming a single page on
hitting the limit. IIRC, in the past we saw excessive pushback when
reclaiming SWAP_CLUSTER_MAX pages; I just wanted to check whether you
are seeing the same behaviour now, after your changes.
Balbir
>>
>> We need the initialization to 0, is there a reason why it was removed?
>
> please reread C spec and other scan_control user.
> sc_nr_* were already initialized in struct scan_control sc = { } line.
>
I missed the fact that designated initializers do the right thing. Thanks!
Balbir
> >> 1. How far does this push pages (in terms of when limit is hit)?
> >
> > 32 pages per mem_cgroup_shrink_node_zone().
> >
> > That said, the algorithm is here.
> >
> > 1. call mem_cgroup_largest_soft_limit_node()
> >    calculate largest cgroup
> > 2. call mem_cgroup_shrink_node_zone() and shrink 32 pages
> > 3. goto 1 if limit is still exceed.
> >
> > If it's not your intention, can you please your intended algorithm?
>
> We set it to 0, since we care only about a single page reclaim on
> hitting the limit. IIRC, in the past we saw an excessive pushback on
> reclaiming SWAP_CLUSTER_MAX pages, just wanted to check if you are
> seeing the same behaviour even now after your changes.
Actually, we have a 32-page reclaim batch size (see nr_scan_try_batch()
and related functions), so a value below 32 doesn't work the way you
intend.
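For reference, this is roughly what that batching does (nr_scan_try_batch()
paraphrased from memory and wrapped in a standalone demo, so treat the
details as approximate):

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32UL

/*
 * Paraphrase of nr_scan_try_batch(): small scan requests accumulate in
 * *nr_saved_scan and nothing is actually scanned until a full
 * SWAP_CLUSTER_MAX batch has built up.
 */
static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
                                       unsigned long *nr_saved_scan)
{
        unsigned long nr;

        *nr_saved_scan += nr_to_scan;
        nr = *nr_saved_scan;

        if (nr >= SWAP_CLUSTER_MAX)
                *nr_saved_scan = 0;
        else
                nr = 0;

        return nr;
}

int main(void)
{
        unsigned long saved = 0;
        int i;

        /* requesting 8 pages at a time releases a scan only every 4th call */
        for (i = 0; i < 5; i++)
                printf("request 8 -> scan %lu\n",
                       nr_scan_try_batch(8, &saved));
        return 0;
}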
But if you run your test again and report any bugs you find, I'll be glad
to fix them soon.
Thanks.
On Sun, Jul 25, 2010 at 3:18 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> >> 1. How far does this push pages (in terms of when limit is hit)?
>> >
>> > 32 pages per mem_cgroup_shrink_node_zone().
>> >
>> > That said, the algorithm is here.
>> >
>> > 1. call mem_cgroup_largest_soft_limit_node()
>    calculate largest cgroup
>> > 2. call mem_cgroup_shrink_node_zone() and shrink 32 pages
>> > 3. goto 1 if limit is still exceed.
>> >
>> > If it's not your intention, can you please your intended algorithm?
>>
>> We set it to 0, since we care only about a single page reclaim on
>> hitting the limit. IIRC, in the past we saw an excessive pushback on
>> reclaiming SWAP_CLUSTER_MAX pages, just wanted to check if you are
>> seeing the same behaviour even now after your changes.
>
> Actually, we have 32 pages reclaim batch size. (see nr_scan_try_batch() and related functions)
> thus <32 value doesn't works as your intended.
>
> But, If you run your test again, and (if there is) report any bugs. I'm very glad and fix it soon.
>
I understand that; the point is when to stop the reclaim (do we really
need to reclaim 32 pages before stopping? When we hit the limit,
something as low as a single page can help). This is quite a subtle
thing, so I'd mark it as low priority. I'll definitely come back if I
see unexpected behaviour.
Balbir