2010-07-16 10:12:51

by KOSAKI Motohiro

Subject: [PATCH 0/7] memcg reclaim tracepoint

Recently, Mel Gorman added some vmscan tracepoints, but they can't
trace memcg. This patch series makes them memcg-aware.


The following three patches are nit fixes and cleanups.

memcg: sc.nr_to_reclaim should be initialized
memcg: mem_cgroup_shrink_node_zone() doesn't need sc.nodemask
memcg: nid and zid can be calculated from zone

The following four patches convert existing tracepoints and add new memcg tracepoints.

vmscan: convert direct reclaim tracepoint to DEFINE_EVENT
memcg, vmscan: add memcg reclaim tracepoint
vmscan: convert mm_vmscan_lru_isolate to DEFINE_EVENT
memcg, vmscan: add mm_vmscan_memcg_isolate tracepoint


diffstat
================
include/linux/memcontrol.h | 6 ++--
include/linux/mmzone.h | 5 +++
include/linux/swap.h | 3 +-
include/trace/events/vmscan.h | 79 +++++++++++++++++++++++++++++++++++++++--
mm/memcontrol.c | 15 +++++---
mm/vmscan.c | 35 ++++++++++++------
6 files changed, 118 insertions(+), 25 deletions(-)


Sample output is below.
=========================

dd-1851 [001] 158.837763: mm_vmscan_memcg_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_HIGHUSER_MOVABLE
dd-1851 [001] 158.837783: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
dd-1851 [001] 158.837860: mm_vmscan_memcg_reclaim_end: nr_reclaimed=32
(...)
dd-1970 [000] 266.608235: mm_vmscan_wakeup_kswapd: nid=0 zid=1 order=0
dd-1970 [000] 266.608239: mm_vmscan_wakeup_kswapd: nid=1 zid=1 order=0
dd-1970 [000] 266.608248: mm_vmscan_wakeup_kswapd: nid=2 zid=1 order=0
kswapd1-348 [001] 266.608254: mm_vmscan_kswapd_wake: nid=1 order=0
dd-1970 [000] 266.608254: mm_vmscan_wakeup_kswapd: nid=3 zid=1 order=0
kswapd3-350 [000] 266.608266: mm_vmscan_kswapd_wake: nid=3 order=0
(...)
kswapd0-347 [001] 267.328891: mm_vmscan_memcg_softlimit_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_HIGHUSER_MOVABLE
kswapd0-347 [001] 267.328897: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
kswapd0-347 [001] 267.328915: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
kswapd0-347 [001] 267.328989: mm_vmscan_memcg_softlimit_reclaim_end: nr_reclaimed=32
kswapd0-347 [001] 267.329019: mm_vmscan_lru_isolate: isolate_mode=1 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
kswapd0-347 [001] 267.330562: mm_vmscan_lru_isolate: isolate_mode=1 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
(...)
kswapd2-349 [001] 267.407081: mm_vmscan_kswapd_sleep: nid=2
kswapd3-350 [001] 267.408077: mm_vmscan_kswapd_sleep: nid=3
kswapd1-348 [000] 267.427858: mm_vmscan_kswapd_sleep: nid=1
kswapd0-347 [001] 267.430064: mm_vmscan_kswapd_sleep: nid=0





2010-07-16 10:13:38

by KOSAKI Motohiro

Subject: [PATCH 1/7] memcg: sc.nr_to_reclaim should be initialized

Currently, mem_cgroup_shrink_node_zone() initializes sc.nr_to_reclaim to 0.
That means shrink_zone() scans only 32 pages and immediately returns, even
if it doesn't reclaim any pages.

This patch fixes it.
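
As a rough illustration, here is a standalone userspace sketch of the
exit test in shrink_zone() (simplified; the batch loop and the LRU bound
are stand-ins, only the comparison is the real one):

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32UL
#define DEF_PRIORITY     12

struct scan_control {
	unsigned long nr_to_reclaim;
	unsigned long nr_reclaimed;
	unsigned long nr_scanned;
};

static void shrink_zone_sketch(int priority, struct scan_control *sc)
{
	for (;;) {
		sc->nr_scanned += SWAP_CLUSTER_MAX; /* scan one 32-page batch */
		/* assume the batch reclaimed nothing */

		/* with nr_to_reclaim == 0 this is true right after the
		 * first batch, so the scan stops without reclaiming */
		if (sc->nr_reclaimed >= sc->nr_to_reclaim &&
		    priority < DEF_PRIORITY)
			break;

		if (sc->nr_scanned >= 1024) /* stand-in for the LRU size */
			break;
	}
}

int main(void)
{
	struct scan_control broken = { .nr_to_reclaim = 0 };
	struct scan_control fixed = { .nr_to_reclaim = SWAP_CLUSTER_MAX };

	shrink_zone_sketch(DEF_PRIORITY - 1, &broken);
	shrink_zone_sketch(DEF_PRIORITY - 1, &fixed);

	/* prints: broken scanned 32, fixed scanned 1024 */
	printf("broken scanned %lu, fixed scanned %lu\n",
	       broken.nr_scanned, fixed.nr_scanned);
	return 0;
}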

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/vmscan.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1691ad0..bd1d035 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1932,6 +1932,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
struct zone *zone, int nid)
{
struct scan_control sc = {
+ .nr_to_reclaim = SWAP_CLUSTER_MAX,
.may_writepage = !laptop_mode,
.may_unmap = 1,
.may_swap = !noswap,
--
1.6.5.2


2010-07-16 10:14:24

by KOSAKI Motohiro

Subject: [PATCH 2/7] memcg: mem_cgroup_shrink_node_zone() doesn't need sc.nodemask

Currently mem_cgroup_shrink_node_zone() calls shrink_zone() directly,
so it doesn't need to initialize sc.nodemask; shrink_zone() doesn't
use it at all.
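
For contrast, a hypothetical simplification of the two call paths (the
iterator macro is the real one, the rest is pared down): the nodemask
only matters where a zonelist is walked; the memcg soft-limit path
already has its zone in hand.

	/* global/direct reclaim: sc->nodemask filters the zonelist walk */
	for_each_zone_zonelist_nodemask(zone, z, zonelist,
					gfp_zone(sc->gfp_mask), sc->nodemask) {
		shrink_zone(priority, zone, sc);
	}

	/* mem_cgroup_shrink_node_zone(): the caller already picked the
	 * zone, so there is nothing left for sc.nodemask to filter */
	shrink_zone(0, zone, &sc);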

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/linux/swap.h | 3 +--
mm/memcontrol.c | 3 +--
mm/vmscan.c | 8 ++------
3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index ff4acea..bf4eb62 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -244,8 +244,7 @@ extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
gfp_t gfp_mask, bool noswap,
unsigned int swappiness,
- struct zone *zone,
- int nid);
+ struct zone *zone);
extern int __isolate_lru_page(struct page *page, int mode, int file);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index aba4310..01f38ff 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1307,8 +1307,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
/* we use swappiness of local cgroup */
if (check_soft)
ret = mem_cgroup_shrink_node_zone(victim, gfp_mask,
- noswap, get_swappiness(victim), zone,
- zone->zone_pgdat->node_id);
+ noswap, get_swappiness(victim), zone);
else
ret = try_to_free_mem_cgroup_pages(victim, gfp_mask,
noswap, get_swappiness(victim));
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bd1d035..be860a0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1929,7 +1929,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
gfp_t gfp_mask, bool noswap,
unsigned int swappiness,
- struct zone *zone, int nid)
+ struct zone *zone)
{
struct scan_control sc = {
.nr_to_reclaim = SWAP_CLUSTER_MAX,
@@ -1940,13 +1940,9 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
.order = 0,
.mem_cgroup = mem,
};
- nodemask_t nm = nodemask_of_node(nid);
-
sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
- sc.nodemask = &nm;
- sc.nr_reclaimed = 0;
- sc.nr_scanned = 0;
+
/*
* NOTE: Although we can get the priority field, using it
* here is not a good idea, since it limits the pages we can scan.
--
1.6.5.2


2010-07-16 10:15:14

by KOSAKI Motohiro

Subject: [PATCH 3/7] memcg: nid and zid can be calculated from zone


mem_cgroup_soft_limit_reclaim() takes zone, nid and zid arguments, but nid
and zid can be calculated from zone. So remove them.
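
The relationship the new helper relies on can be checked with a small
self-contained userspace model (structs trimmed to the relevant fields;
zone_idx() is the mmzone.h definition):

#include <stdio.h>

#define MAX_NR_ZONES 4

struct pglist_data;

struct zone {
	struct pglist_data *zone_pgdat;
};

struct pglist_data {
	struct zone node_zones[MAX_NR_ZONES];
	int node_id;
};

#define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)

static inline int zone_nid(struct zone *zone)
{
	return zone->zone_pgdat->node_id;
}

int main(void)
{
	struct pglist_data node = { .node_id = 2 };
	struct zone *z = &node.node_zones[1];
	int i;

	for (i = 0; i < MAX_NR_ZONES; i++)
		node.node_zones[i].zone_pgdat = &node;

	/* prints "nid=2 zid=1"; both values come from the zone alone */
	printf("nid=%d zid=%ld\n", zone_nid(z), (long)zone_idx(z));
	return 0;
}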

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/linux/memcontrol.h | 6 +++---
include/linux/mmzone.h | 5 +++++
mm/memcontrol.c | 5 ++---
mm/vmscan.c | 7 ++-----
4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9411d32..9dec218 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -128,8 +128,8 @@ static inline bool mem_cgroup_disabled(void)

void mem_cgroup_update_file_mapped(struct page *page, int val);
unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
- gfp_t gfp_mask, int nid,
- int zid);
+ gfp_t gfp_mask);
+
#else /* CONFIG_CGROUP_MEM_RES_CTLR */
struct mem_cgroup;

@@ -304,7 +304,7 @@ static inline void mem_cgroup_update_file_mapped(struct page *page,

static inline
unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
- gfp_t gfp_mask, int nid, int zid)
+ gfp_t gfp_mask)
{
return 0;
}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9ed9c45..34ac27a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -684,6 +684,11 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
*/
#define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)

+static inline int zone_nid(struct zone *zone)
+{
+ return zone->zone_pgdat->node_id;
+}
+
static inline int populated_zone(struct zone *zone)
{
return (!!zone->present_pages);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 01f38ff..81bc9bf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2833,8 +2833,7 @@ static int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
}

unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
- gfp_t gfp_mask, int nid,
- int zid)
+ gfp_t gfp_mask)
{
unsigned long nr_reclaimed = 0;
struct mem_cgroup_per_zone *mz, *next_mz = NULL;
@@ -2846,7 +2845,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
if (order > 0)
return 0;

- mctz = soft_limit_tree_node_zone(nid, zid);
+ mctz = soft_limit_tree_node_zone(zone_nid(zone), zone_idx(zone));
/*
* This loop can run a while, specially if mem_cgroup's continuously
* keep exceeding their soft limit and putting the system under
diff --git a/mm/vmscan.c b/mm/vmscan.c
index be860a0..89b4287 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2121,7 +2121,6 @@ loop_again:
for (i = 0; i <= end_zone; i++) {
struct zone *zone = pgdat->node_zones + i;
int nr_slab;
- int nid, zid;

if (!populated_zone(zone))
continue;
@@ -2133,14 +2132,12 @@ loop_again:
sc.nr_scanned = 0;
note_zone_scanning_priority(zone, priority);

- nid = pgdat->node_id;
- zid = zone_idx(zone);
/*
* Call soft limit reclaim before calling shrink_zone.
* For now we ignore the return value
*/
- mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask,
- nid, zid);
+ mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask);
+
/*
* We put equal pressure on every zone, unless one
* zone has way too many pages free already.
--
1.6.5.2


2010-07-16 10:16:13

by KOSAKI Motohiro

Subject: [PATCH 4/7] vmscan: convert direct reclaim tracepoint to DEFINE_EVENT


TRACE_EVENT() is a bit old-fashioned; convert it.

No functional change.
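
For reference, the shape of the conversion with a single-field example
(the class and event names here are illustrative, and the TRACE_SYSTEM
header boilerplate is omitted):

DECLARE_EVENT_CLASS(reclaim_end_template,

	TP_PROTO(unsigned long nr_reclaimed),

	TP_ARGS(nr_reclaimed),

	TP_STRUCT__entry(
		__field(unsigned long, nr_reclaimed)
	),

	TP_fast_assign(
		__entry->nr_reclaimed = nr_reclaimed;
	),

	TP_printk("nr_reclaimed=%lu", __entry->nr_reclaimed)
);

DEFINE_EVENT(reclaim_end_template, sample_reclaim_end,

	TP_PROTO(unsigned long nr_reclaimed),

	TP_ARGS(nr_reclaimed)
);

The class declares the record layout and formatting once; each
DEFINE_EVENT() then stamps out a concrete event, plus its trace_<name>()
call site, from that class.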

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/trace/events/vmscan.h | 19 +++++++++++++++++--
1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index b26daa9..bd749c1 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -68,7 +68,7 @@ TRACE_EVENT(mm_vmscan_wakeup_kswapd,
__entry->order)
);

-TRACE_EVENT(mm_vmscan_direct_reclaim_begin,
+DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_begin_template,

TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),

@@ -92,7 +92,15 @@ TRACE_EVENT(mm_vmscan_direct_reclaim_begin,
show_gfp_flags(__entry->gfp_flags))
);

-TRACE_EVENT(mm_vmscan_direct_reclaim_end,
+DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_direct_reclaim_begin,
+
+ TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
+
+ TP_ARGS(order, may_writepage, gfp_flags)
+);
+
+
+DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_end_template,

TP_PROTO(unsigned long nr_reclaimed),

@@ -109,6 +117,13 @@ TRACE_EVENT(mm_vmscan_direct_reclaim_end,
TP_printk("nr_reclaimed=%lu", __entry->nr_reclaimed)
);

+DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_direct_reclaim_end,
+
+ TP_PROTO(unsigned long nr_reclaimed),
+
+ TP_ARGS(nr_reclaimed)
+);
+
TRACE_EVENT(mm_vmscan_lru_isolate,

TP_PROTO(int order,
--
1.6.5.2


2010-07-16 10:16:50

by KOSAKI Motohiro

Subject: [PATCH 5/7] memcg, vmscan: add memcg reclaim tracepoint


Memcg also needs to trace reclaim progress, just as direct reclaim does.
This patch adds that.
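
Note that each DEFINE_EVENT(class, name, ...) below expands into a
trace_<name>() inline, which is what the mm/vmscan.c hunks call. The
begin/end pairing, condensed from the diff below:

	trace_mm_vmscan_memcg_reclaim_begin(0, sc.may_writepage, sc.gfp_mask);
	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);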

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/trace/events/vmscan.h | 28 ++++++++++++++++++++++++++++
mm/vmscan.c | 19 ++++++++++++++++++-
2 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index bd749c1..cc19cb0 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -99,6 +99,19 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_direct_reclaim_b
TP_ARGS(order, may_writepage, gfp_flags)
);

+DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_memcg_reclaim_begin,
+
+ TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
+
+ TP_ARGS(order, may_writepage, gfp_flags)
+);
+
+DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_memcg_softlimit_reclaim_begin,
+
+ TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
+
+ TP_ARGS(order, may_writepage, gfp_flags)
+);

DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_end_template,

@@ -124,6 +137,21 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_direct_reclaim_end
TP_ARGS(nr_reclaimed)
);

+DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_reclaim_end,
+
+ TP_PROTO(unsigned long nr_reclaimed),
+
+ TP_ARGS(nr_reclaimed)
+);
+
+DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_reclaim_end,
+
+ TP_PROTO(unsigned long nr_reclaimed),
+
+ TP_ARGS(nr_reclaimed)
+);
+
+
TRACE_EVENT(mm_vmscan_lru_isolate,

TP_PROTO(int order,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 89b4287..21eb94f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1943,6 +1943,10 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);

+ trace_mm_vmscan_memcg_softlimit_reclaim_begin(0,
+ sc.may_writepage,
+ sc.gfp_mask);
+
/*
* NOTE: Although we can get the priority field, using it
* here is not a good idea, since it limits the pages we can scan.
@@ -1951,6 +1955,9 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
* the priority and make it zero.
*/
shrink_zone(0, zone, &sc);
+
+ trace_mm_vmscan_memcg_softlimit_reclaim_end(sc.nr_reclaimed);
+
return sc.nr_reclaimed;
}

@@ -1960,6 +1967,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
unsigned int swappiness)
{
struct zonelist *zonelist;
+ unsigned long nr_reclaimed;
struct scan_control sc = {
.may_writepage = !laptop_mode,
.may_unmap = 1,
@@ -1974,7 +1982,16 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
zonelist = NODE_DATA(numa_node_id())->node_zonelists;
- return do_try_to_free_pages(zonelist, &sc);
+
+ trace_mm_vmscan_memcg_reclaim_begin(0,
+ sc.may_writepage,
+ sc.gfp_mask);
+
+ nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+
+ trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
+
+ return nr_reclaimed;
}
#endif

--
1.6.5.2


2010-07-16 10:17:40

by KOSAKI Motohiro

Subject: [PATCH 6/7] vmscan: convert mm_vmscan_lru_isolate to DEFINE_EVENT


TRACE_EVENT() is a bit old-fashioned, and we need DECLARE_EVENT_CLASS
in order to introduce the memcg page-isolation tracepoint.

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/trace/events/vmscan.h | 17 ++++++++++++++++-
1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index cc19cb0..e37fe72 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -152,7 +152,7 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_re
);


-TRACE_EVENT(mm_vmscan_lru_isolate,
+DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,

TP_PROTO(int order,
unsigned long nr_requested,
@@ -198,6 +198,21 @@ TRACE_EVENT(mm_vmscan_lru_isolate,
__entry->nr_lumpy_failed)
);

+DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
+
+ TP_PROTO(int order,
+ unsigned long nr_requested,
+ unsigned long nr_scanned,
+ unsigned long nr_taken,
+ unsigned long nr_lumpy_taken,
+ unsigned long nr_lumpy_dirty,
+ unsigned long nr_lumpy_failed,
+ int isolate_mode),
+
+ TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
+
+);
+
TRACE_EVENT(mm_vmscan_writepage,

TP_PROTO(struct page *page,
--
1.6.5.2


2010-07-16 10:18:22

by KOSAKI Motohiro

Subject: [PATCH 7/7] memcg: add mm_vmscan_memcg_isolate tracepoint


Memcg also needs to trace page isolation information, just as global
reclaim does. This patch does that.
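
As a reading aid for the mm/memcontrol.c hunk below, the call maps onto
the shared template's parameters like this (annotated; the zeros are the
lumpy-reclaim counters, which memcg isolation does not use):

	trace_mm_vmscan_memcg_isolate(0,          /* order: always 0 here */
				      nr_to_scan, /* nr_requested */
				      scan,       /* nr_scanned */
				      nr_taken,   /* nr_taken */
				      0, 0, 0,    /* lumpy taken/dirty/failed */
				      mode);      /* isolate_mode */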

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/trace/events/vmscan.h | 15 +++++++++++++++
mm/memcontrol.c | 7 +++++++
2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index e37fe72..eefd399 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -213,6 +213,21 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,

);

+DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
+
+ TP_PROTO(int order,
+ unsigned long nr_requested,
+ unsigned long nr_scanned,
+ unsigned long nr_taken,
+ unsigned long nr_lumpy_taken,
+ unsigned long nr_lumpy_dirty,
+ unsigned long nr_lumpy_failed,
+ int isolate_mode),
+
+ TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
+
+);
+
TRACE_EVENT(mm_vmscan_writepage,

TP_PROTO(struct page *page,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 81bc9bf..82e191f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -52,6 +52,9 @@

#include <asm/uaccess.h>

+#include <trace/events/vmscan.h>
+
+
struct cgroup_subsys mem_cgroup_subsys __read_mostly;
#define MEM_CGROUP_RECLAIM_RETRIES 5
struct mem_cgroup *root_mem_cgroup __read_mostly;
@@ -1042,6 +1045,10 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
}

*scanned = scan;
+
+ trace_mm_vmscan_memcg_isolate(0, nr_to_scan, scan, nr_taken,
+ 0, 0, 0, mode);
+
return nr_taken;
}

--
1.6.5.2


2010-07-16 10:21:49

by Kamezawa Hiroyuki

Subject: Re: [PATCH 1/7] memcg: sc.nr_to_reclaim should be initialized

On Fri, 16 Jul 2010 19:13:31 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> Currently, mem_cgroup_shrink_node_zone() initializes sc.nr_to_reclaim to 0.
> That means shrink_zone() scans only 32 pages and immediately returns, even
> if it doesn't reclaim any pages.
>
> This patch fixes it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>

Acked-by: KAMEZAWA Hiroyuki <[email protected]>

> ---
> mm/vmscan.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1691ad0..bd1d035 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1932,6 +1932,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> struct zone *zone, int nid)
> {
> struct scan_control sc = {
> + .nr_to_reclaim = SWAP_CLUSTER_MAX,
> .may_writepage = !laptop_mode,
> .may_unmap = 1,
> .may_swap = !noswap,
> --
> 1.6.5.2
>
>
>
>

2010-07-16 10:23:21

by Kamezawa Hiroyuki

Subject: Re: [PATCH 2/7] memcg: mem_cgroup_shrink_node_zone() doesn't need sc.nodemask

On Fri, 16 Jul 2010 19:14:15 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> Currently mem_cgroup_shrink_node_zone() calls shrink_zone() directly,
> so it doesn't need to initialize sc.nodemask; shrink_zone() doesn't
> use it at all.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>

Acked-by: KAMEZAWA Hiroyuki <[email protected]>

2010-07-16 10:24:16

by Kamezawa Hiroyuki

Subject: Re: [PATCH 3/7] memcg: nid and zid can be calculated from zone

On Fri, 16 Jul 2010 19:15:05 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

>
> mem_cgroup_soft_limit_reclaim() takes zone, nid and zid arguments, but nid
> and zid can be calculated from zone. So remove them.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>

Acked-by: KAMEZAWA Hiroyuki <[email protected]>

2010-07-16 10:25:13

by Kamezawa Hiroyuki

Subject: Re: [PATCH 4/7] vmscan: convert direct reclaim tracepoint to DEFINE_EVENT

On Fri, 16 Jul 2010 19:16:05 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

>
> TRACE_EVENT() is a bit old-fashioned; convert it.
>
> No functional change.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>

Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>

2010-07-16 10:26:16

by Mel Gorman

Subject: Re: [PATCH 1/7] memcg: sc.nr_to_reclaim should be initialized

On Fri, Jul 16, 2010 at 07:13:31PM +0900, KOSAKI Motohiro wrote:
> Currently, mem_cgroup_shrink_node_zone() initializes sc.nr_to_reclaim to 0.
> That means shrink_zone() scans only 32 pages and immediately returns, even
> if it doesn't reclaim any pages.
>

Do you mean it immediately returns once one page is reclaimed? i.e. this
check

if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
break;


> This patch fixes it.
>

Otherwise it seems OK. It's unrelated to tracepoints though, so it should
be submitted on its own.

> Signed-off-by: KOSAKI Motohiro <[email protected]>

Acked-by: Mel Gorman <[email protected]>

> ---
> mm/vmscan.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1691ad0..bd1d035 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1932,6 +1932,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> struct zone *zone, int nid)
> {
> struct scan_control sc = {
> + .nr_to_reclaim = SWAP_CLUSTER_MAX,
> .may_writepage = !laptop_mode,
> .may_unmap = 1,
> .may_swap = !noswap,
> --
> 1.6.5.2
>
>
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2010-07-16 10:26:38

by Kamezawa Hiroyuki

Subject: Re: [PATCH 5/7] memcg, vmscan: add memcg reclaim tracepoint

On Fri, 16 Jul 2010 19:16:46 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

>
> Memcg also needs to trace reclaim progress, just as direct reclaim does.
> This patch adds that.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>

Thank you!!

Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>

2010-07-16 10:27:33

by Kamezawa Hiroyuki

Subject: Re: [PATCH 6/7] vmscan: convert mm_vmscan_lru_isolate to DEFINE_EVENT

On Fri, 16 Jul 2010 19:17:36 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

>
> TRACE_EVENT() is a bit old-fashioned, and we need DECLARE_EVENT_CLASS
> in order to introduce the memcg page-isolation tracepoint.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>

Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>

2010-07-16 10:28:20

by Kamezawa Hiroyuki

Subject: Re: [PATCH 7/7] memcg: add mm_vmscan_memcg_isolate tracepoint

On Fri, 16 Jul 2010 19:18:18 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

>
> Memcg also needs to trace page isolation information, just as global
> reclaim does. This patch does that.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>

Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>

2010-07-16 10:47:28

by Mel Gorman

Subject: Re: [PATCH 2/7] memcg: mem_cgroup_shrink_node_zone() doesn't need sc.nodemask

On Fri, Jul 16, 2010 at 07:14:15PM +0900, KOSAKI Motohiro wrote:
> Currently mem_cgroup_shrink_node_zone() calls shrink_zone() directly,
> so it doesn't need to initialize sc.nodemask; shrink_zone() doesn't
> use it at all.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> include/linux/swap.h | 3 +--
> mm/memcontrol.c | 3 +--
> mm/vmscan.c | 8 ++------
> 3 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index ff4acea..bf4eb62 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -244,8 +244,7 @@ extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
> extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> gfp_t gfp_mask, bool noswap,
> unsigned int swappiness,
> - struct zone *zone,
> - int nid);
> + struct zone *zone);
> extern int __isolate_lru_page(struct page *page, int mode, int file);
> extern unsigned long shrink_all_memory(unsigned long nr_pages);
> extern int vm_swappiness;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index aba4310..01f38ff 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1307,8 +1307,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> /* we use swappiness of local cgroup */
> if (check_soft)
> ret = mem_cgroup_shrink_node_zone(victim, gfp_mask,
> - noswap, get_swappiness(victim), zone,
> - zone->zone_pgdat->node_id);
> + noswap, get_swappiness(victim), zone);
> else
> ret = try_to_free_mem_cgroup_pages(victim, gfp_mask,
> noswap, get_swappiness(victim));
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bd1d035..be860a0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1929,7 +1929,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> gfp_t gfp_mask, bool noswap,
> unsigned int swappiness,
> - struct zone *zone, int nid)
> + struct zone *zone)
> {
> struct scan_control sc = {
> .nr_to_reclaim = SWAP_CLUSTER_MAX,
> @@ -1940,13 +1940,9 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> .order = 0,
> .mem_cgroup = mem,
> };
> - nodemask_t nm = nodemask_of_node(nid);
> -
> sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> - sc.nodemask = &nm;
> - sc.nr_reclaimed = 0;
> - sc.nr_scanned = 0;
> +

Removing the initialisation of nr_reclaimed and nr_scanned is slightly
outside the scope of the patch but harmless.

I see no problem with the patch, but it's also not related to tracepoints,
so it should be part of a separate series.

Acked-by: Mel Gorman <[email protected]>

> /*
> * NOTE: Although we can get the priority field, using it
> * here is not a good idea, since it limits the pages we can scan.
> --
> 1.6.5.2
>
>
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2010-07-16 10:57:08

by Mel Gorman

Subject: Re: [PATCH 3/7] memcg: nid and zid can be calculated from zone

On Fri, Jul 16, 2010 at 07:15:05PM +0900, KOSAKI Motohiro wrote:
>
> mem_cgroup_soft_limit_reclaim() takes zone, nid and zid arguments, but nid
> and zid can be calculated from zone. So remove them.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> include/linux/memcontrol.h | 6 +++---
> include/linux/mmzone.h | 5 +++++
> mm/memcontrol.c | 5 ++---
> mm/vmscan.c | 7 ++-----
> 4 files changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 9411d32..9dec218 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -128,8 +128,8 @@ static inline bool mem_cgroup_disabled(void)
>
> void mem_cgroup_update_file_mapped(struct page *page, int val);
> unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
> - gfp_t gfp_mask, int nid,
> - int zid);
> + gfp_t gfp_mask);
> +
> #else /* CONFIG_CGROUP_MEM_RES_CTLR */
> struct mem_cgroup;
>
> @@ -304,7 +304,7 @@ static inline void mem_cgroup_update_file_mapped(struct page *page,
>
> static inline
> unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
> - gfp_t gfp_mask, int nid, int zid)
> + gfp_t gfp_mask)
> {
> return 0;
> }
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 9ed9c45..34ac27a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -684,6 +684,11 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
> */
> #define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)
>
> +static inline int zone_nid(struct zone *zone)
> +{
> + return zone->zone_pgdat->node_id;
> +}
> +

hmm, adding a helper and not converting the existing users of
zone->zone_pgdat may be a little confusing particularly as both types of
usage would exist in the same file e.g. in mem_cgroup_zone_nr_pages.

> static inline int populated_zone(struct zone *zone)
> {
> return (!!zone->present_pages);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 01f38ff..81bc9bf 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2833,8 +2833,7 @@ static int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
> }
>
> unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
> - gfp_t gfp_mask, int nid,
> - int zid)
> + gfp_t gfp_mask)
> {
> unsigned long nr_reclaimed = 0;
> struct mem_cgroup_per_zone *mz, *next_mz = NULL;
> @@ -2846,7 +2845,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
> if (order > 0)
> return 0;
>
> - mctz = soft_limit_tree_node_zone(nid, zid);
> + mctz = soft_limit_tree_node_zone(zone_nid(zone), zone_idx(zone));
> /*
> * This loop can run a while, specially if mem_cgroup's continuously
> * keep exceeding their soft limit and putting the system under
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index be860a0..89b4287 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2121,7 +2121,6 @@ loop_again:
> for (i = 0; i <= end_zone; i++) {
> struct zone *zone = pgdat->node_zones + i;
> int nr_slab;
> - int nid, zid;
>
> if (!populated_zone(zone))
> continue;
> @@ -2133,14 +2132,12 @@ loop_again:
> sc.nr_scanned = 0;
> note_zone_scanning_priority(zone, priority);
>
> - nid = pgdat->node_id;
> - zid = zone_idx(zone);
> /*
> * Call soft limit reclaim before calling shrink_zone.
> * For now we ignore the return value
> */
> - mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask,
> - nid, zid);
> + mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask);
> +

Other than the usual comment of it belonging in its own series

Acked-by: Mel Gorman <[email protected]>

> /*
> * We put equal pressure on every zone, unless one
> * zone has way too many pages free already.
> --
> 1.6.5.2
>
>
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2010-07-16 11:09:17

by Mel Gorman

Subject: Re: [PATCH 4/7] vmscan: convert direct reclaim tracepoint to DEFINE_EVENT

On Fri, Jul 16, 2010 at 07:16:05PM +0900, KOSAKI Motohiro wrote:
>
> TRACE_EVENT() is a bit old-fashioned; convert it.
>

heh, it's not a question of fashion :). It was defined this way because
there wasn't a second user that would use the template. I didn't see the
advantage of creating a template for one user. Your series adds a second
one so it makes sense to convert.

> No functional change.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> include/trace/events/vmscan.h | 19 +++++++++++++++++--
> 1 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index b26daa9..bd749c1 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -68,7 +68,7 @@ TRACE_EVENT(mm_vmscan_wakeup_kswapd,
> __entry->order)
> );
>
> -TRACE_EVENT(mm_vmscan_direct_reclaim_begin,
> +DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_begin_template,
>
> TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
>
> @@ -92,7 +92,15 @@ TRACE_EVENT(mm_vmscan_direct_reclaim_begin,
> show_gfp_flags(__entry->gfp_flags))
> );
>
> -TRACE_EVENT(mm_vmscan_direct_reclaim_end,
> +DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_direct_reclaim_begin,
> +
> + TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
> +
> + TP_ARGS(order, may_writepage, gfp_flags)
> +);

Over 80 columns there.

> +
> +
> +DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_end_template,
>
> TP_PROTO(unsigned long nr_reclaimed),
>
> @@ -109,6 +117,13 @@ TRACE_EVENT(mm_vmscan_direct_reclaim_end,
> TP_printk("nr_reclaimed=%lu", __entry->nr_reclaimed)
> );
>
> +DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_direct_reclaim_end,
> +
> + TP_PROTO(unsigned long nr_reclaimed),
> +
> + TP_ARGS(nr_reclaimed)
> +);
> +

Over 80 columns here too.

I know I broke it multiple times in my last series because I thought it
wasn't enforced any more but I got called on it.


> TRACE_EVENT(mm_vmscan_lru_isolate,
>
> TP_PROTO(int order,

Otherwise, I can't spot a problem.

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2010-07-16 11:18:13

by Mel Gorman

Subject: Re: [PATCH 5/7] memcg, vmscan: add memcg reclaim tracepoint

On Fri, Jul 16, 2010 at 07:16:46PM +0900, KOSAKI Motohiro wrote:
>
> Memcg also needs to trace reclaim progress, just as direct reclaim does.
> This patch adds that.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>

Acked-by: Mel Gorman <[email protected]>

> ---
> include/trace/events/vmscan.h | 28 ++++++++++++++++++++++++++++
> mm/vmscan.c | 19 ++++++++++++++++++-
> 2 files changed, 46 insertions(+), 1 deletions(-)
>
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index bd749c1..cc19cb0 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -99,6 +99,19 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_direct_reclaim_b
> TP_ARGS(order, may_writepage, gfp_flags)
> );
>
> +DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_memcg_reclaim_begin,
> +
> + TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
> +
> + TP_ARGS(order, may_writepage, gfp_flags)
> +);
> +
> +DEFINE_EVENT(mm_vmscan_direct_reclaim_begin_template, mm_vmscan_memcg_softlimit_reclaim_begin,
> +
> + TP_PROTO(int order, int may_writepage, gfp_t gfp_flags),
> +
> + TP_ARGS(order, may_writepage, gfp_flags)
> +);
>
> DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_end_template,
>
> @@ -124,6 +137,21 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_direct_reclaim_end
> TP_ARGS(nr_reclaimed)
> );
>
> +DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_reclaim_end,
> +
> + TP_PROTO(unsigned long nr_reclaimed),
> +
> + TP_ARGS(nr_reclaimed)
> +);
> +
> +DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_reclaim_end,
> +
> + TP_PROTO(unsigned long nr_reclaimed),
> +
> + TP_ARGS(nr_reclaimed)
> +);
> +
> +
> TRACE_EVENT(mm_vmscan_lru_isolate,
>
> TP_PROTO(int order,
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 89b4287..21eb94f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1943,6 +1943,10 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
>
> + trace_mm_vmscan_memcg_softlimit_reclaim_begin(0,
> + sc.may_writepage,
> + sc.gfp_mask);
> +
> /*
> * NOTE: Although we can get the priority field, using it
> * here is not a good idea, since it limits the pages we can scan.
> @@ -1951,6 +1955,9 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> * the priority and make it zero.
> */
> shrink_zone(0, zone, &sc);
> +
> + trace_mm_vmscan_memcg_softlimit_reclaim_end(sc.nr_reclaimed);
> +
> return sc.nr_reclaimed;
> }
>
> @@ -1960,6 +1967,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> unsigned int swappiness)
> {
> struct zonelist *zonelist;
> + unsigned long nr_reclaimed;
> struct scan_control sc = {
> .may_writepage = !laptop_mode,
> .may_unmap = 1,
> @@ -1974,7 +1982,16 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> zonelist = NODE_DATA(numa_node_id())->node_zonelists;
> - return do_try_to_free_pages(zonelist, &sc);
> +
> + trace_mm_vmscan_memcg_reclaim_begin(0,
> + sc.may_writepage,
> + sc.gfp_mask);
> +
> + nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
> +
> + trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
> +
> + return nr_reclaimed;
> }
> #endif
>
> --
> 1.6.5.2
>
>
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2010-07-16 11:21:28

by Mel Gorman

Subject: Re: [PATCH 7/7] memcg: add mm_vmscan_memcg_isolate tracepoint

On Fri, Jul 16, 2010 at 07:18:18PM +0900, KOSAKI Motohiro wrote:
>
> Memcg also needs to trace page isolation information, just as global
> reclaim does. This patch does that.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> include/trace/events/vmscan.h | 15 +++++++++++++++
> mm/memcontrol.c | 7 +++++++
> 2 files changed, 22 insertions(+), 0 deletions(-)
>
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index e37fe72..eefd399 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -213,6 +213,21 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
>
> );
>
> +DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
> +
> + TP_PROTO(int order,
> + unsigned long nr_requested,

I just spotted that I named this badly. It should have been
order.

> + unsigned long nr_scanned,
> + unsigned long nr_taken,
> + unsigned long nr_lumpy_taken,
> + unsigned long nr_lumpy_dirty,
> + unsigned long nr_lumpy_failed,
> + int isolate_mode),
> +
> + TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
> +
> +);
> +
> TRACE_EVENT(mm_vmscan_writepage,
>
> TP_PROTO(struct page *page,
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 81bc9bf..82e191f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -52,6 +52,9 @@
>
> #include <asm/uaccess.h>
>
> +#include <trace/events/vmscan.h>
> +
> +

Excessive whitespace there.

Otherwise, I didn't spot any problems.

> struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> #define MEM_CGROUP_RECLAIM_RETRIES 5
> struct mem_cgroup *root_mem_cgroup __read_mostly;
> @@ -1042,6 +1045,10 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> }
>
> *scanned = scan;
> +
> + trace_mm_vmscan_memcg_isolate(0, nr_to_scan, scan, nr_taken,
> + 0, 0, 0, mode);
> +
> return nr_taken;
> }
>
> --
> 1.6.5.2
>
>
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2010-07-16 13:19:11

by Mel Gorman

Subject: Re: [PATCH 7/7] memcg: add mm_vmscan_memcg_isolate tracepoint

On Fri, Jul 16, 2010 at 12:21:09PM +0100, Mel Gorman wrote:
> On Fri, Jul 16, 2010 at 07:18:18PM +0900, KOSAKI Motohiro wrote:
> >
> > Memcg also needs to trace page isolation information, just as global
> > reclaim does. This patch does that.
> >
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > ---
> > include/trace/events/vmscan.h | 15 +++++++++++++++
> > mm/memcontrol.c | 7 +++++++
> > 2 files changed, 22 insertions(+), 0 deletions(-)
> >
> > diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> > index e37fe72..eefd399 100644
> > --- a/include/trace/events/vmscan.h
> > +++ b/include/trace/events/vmscan.h
> > @@ -213,6 +213,21 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
> >
> > );
> >
> > +DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
> > +
> > + TP_PROTO(int order,
> > + unsigned long nr_requested,
>
> I just spotted that I named this badly. It should have been
> order.
>

I meant it should have been nr_to_scan (not order), because that makes it
easier to relate the tracepoint output to what is happening in the source.

> > + unsigned long nr_scanned,
> > + unsigned long nr_taken,
> > + unsigned long nr_lumpy_taken,
> > + unsigned long nr_lumpy_dirty,
> > + unsigned long nr_lumpy_failed,
> > + int isolate_mode),
> > +
> > + TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
> > +
> > +);
> > +
> > TRACE_EVENT(mm_vmscan_writepage,
> >
> > TP_PROTO(struct page *page,
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 81bc9bf..82e191f 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -52,6 +52,9 @@
> >
> > #include <asm/uaccess.h>
> >
> > +#include <trace/events/vmscan.h>
> > +
> > +
>
> Excessive whitespace there.
>
> Otherwise, I didn't spot any problems.
>
> > struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> > #define MEM_CGROUP_RECLAIM_RETRIES 5
> > struct mem_cgroup *root_mem_cgroup __read_mostly;
> > @@ -1042,6 +1045,10 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> > }
> >
> > *scanned = scan;
> > +
> > + trace_mm_vmscan_memcg_isolate(0, nr_to_scan, scan, nr_taken,
> > + 0, 0, 0, mode);
> > +
> > return nr_taken;
> > }

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2010-07-20 01:59:20

by Daisuke Nishimura

Subject: Re: [PATCH 0/7] memcg reclaim tracepoint

On Fri, 16 Jul 2010 19:12:46 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> Recently, Mel Gorman added some vmscan tracepoints, but they can't
> trace memcg. This patch series makes them memcg-aware.
>
>
> The following three patches are nit fixes and cleanups.
>
> memcg: sc.nr_to_reclaim should be initialized
> memcg: mem_cgroup_shrink_node_zone() doesn't need sc.nodemask
> memcg: nid and zid can be calculated from zone
>
I'm not so familiar with tracepoints, but I don't have any objection to these
3 fixes/cleanups for memcg.

Thanks,
Daisuke Nishimura

> The following four patches convert existing tracepoints and add new memcg tracepoints.
>
> vmscan: convert direct reclaim tracepoint to DEFINE_EVENT
> memcg, vmscan: add memcg reclaim tracepoint
> vmscan: convert mm_vmscan_lru_isolate to DEFINE_EVENT
> memcg, vmscan: add mm_vmscan_memcg_isolate tracepoint
>
>
> diffstat
> ================
> include/linux/memcontrol.h | 6 ++--
> include/linux/mmzone.h | 5 +++
> include/linux/swap.h | 3 +-
> include/trace/events/vmscan.h | 79 +++++++++++++++++++++++++++++++++++++++--
> mm/memcontrol.c | 15 +++++---
> mm/vmscan.c | 35 ++++++++++++------
> 6 files changed, 118 insertions(+), 25 deletions(-)
>
>
> Sample output is below.
> =========================
>
> dd-1851 [001] 158.837763: mm_vmscan_memcg_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_HIGHUSER_MOVABLE
> dd-1851 [001] 158.837783: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> dd-1851 [001] 158.837860: mm_vmscan_memcg_reclaim_end: nr_reclaimed=32
> (...)
> dd-1970 [000] 266.608235: mm_vmscan_wakeup_kswapd: nid=0 zid=1 order=0
> dd-1970 [000] 266.608239: mm_vmscan_wakeup_kswapd: nid=1 zid=1 order=0
> dd-1970 [000] 266.608248: mm_vmscan_wakeup_kswapd: nid=2 zid=1 order=0
> kswapd1-348 [001] 266.608254: mm_vmscan_kswapd_wake: nid=1 order=0
> dd-1970 [000] 266.608254: mm_vmscan_wakeup_kswapd: nid=3 zid=1 order=0
> kswapd3-350 [000] 266.608266: mm_vmscan_kswapd_wake: nid=3 order=0
> (...)
> kswapd0-347 [001] 267.328891: mm_vmscan_memcg_softlimit_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_HIGHUSER_MOVABLE
> kswapd0-347 [001] 267.328897: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> kswapd0-347 [001] 267.328915: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> kswapd0-347 [001] 267.328989: mm_vmscan_memcg_softlimit_reclaim_end: nr_reclaimed=32
> kswapd0-347 [001] 267.329019: mm_vmscan_lru_isolate: isolate_mode=1 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> kswapd0-347 [001] 267.330562: mm_vmscan_lru_isolate: isolate_mode=1 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> (...)
> kswapd2-349 [001] 267.407081: mm_vmscan_kswapd_sleep: nid=2
> kswapd3-350 [001] 267.408077: mm_vmscan_kswapd_sleep: nid=3
> kswapd1-348 [000] 267.427858: mm_vmscan_kswapd_sleep: nid=1
> kswapd0-347 [001] 267.430064: mm_vmscan_kswapd_sleep: nid=0
>
>
>
>
>

2010-07-20 22:45:14

by Steven Rostedt

Subject: Re: [PATCH 4/7] vmscan: convert direct reclaim tracepoint to DEFINE_EVENT

On Fri, 2010-07-16 at 12:08 +0100, Mel Gorman wrote:
>
> > +DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_direct_reclaim_end,
> > +
> > + TP_PROTO(unsigned long nr_reclaimed),
> > +
> > + TP_ARGS(nr_reclaimed)
> > +);
> > +
>
> Over 80 columns here too.
>
> I know I broke it multiple times in my last series because I thought it
> wasn't enforced any more but I got called on it.

Note: the TRACE_EVENT() macros are allowed a bit more leniency with the
80-column rule.

-- Steve

2010-07-21 13:34:03

by KOSAKI Motohiro

Subject: Re: [PATCH 3/7] memcg: nid and zid can be calculated from zone

> > +static inline int zone_nid(struct zone *zone)
> > +{
> > + return zone->zone_pgdat->node_id;
> > +}
> > +
>
> hmm, adding a helper and not converting the existing users of
> zone->zone_pgdat may be a little confusing particularly as both types of
> usage would exist in the same file e.g. in mem_cgroup_zone_nr_pages.

I see. Here is an incremental patch.

Thanks

From 62cf765251af257c98fc92a58215d101d200e7ef Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <[email protected]>
Date: Tue, 20 Jul 2010 11:30:14 +0900
Subject: [PATCH] memcg: convert to zone_nid() from bare zone->zone_pgdat->node_id

Now that we have zone_nid(), this patch converts all existing users of
bare zone->zone_pgdat->node_id.

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/memcontrol.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 82e191f..3d5b645 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -951,7 +951,7 @@ unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
struct zone *zone,
enum lru_list lru)
{
- int nid = zone->zone_pgdat->node_id;
+ int nid = zone_nid(zone);
int zid = zone_idx(zone);
struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);

@@ -961,7 +961,7 @@ unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
struct zone *zone)
{
- int nid = zone->zone_pgdat->node_id;
+ int nid = zone_nid(zone);
int zid = zone_idx(zone);
struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);

@@ -1006,7 +1006,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
LIST_HEAD(pc_list);
struct list_head *src;
struct page_cgroup *pc, *tmp;
- int nid = z->zone_pgdat->node_id;
+ int nid = zone_nid(z);
int zid = zone_idx(z);
struct mem_cgroup_per_zone *mz;
int lru = LRU_FILE * file + active;
--
1.6.5.2


2010-07-21 13:34:30

by KOSAKI Motohiro

Subject: Re: [PATCH 1/7] memcg: sc.nr_to_reclaim should be initialized

> On Fri, Jul 16, 2010 at 07:13:31PM +0900, KOSAKI Motohiro wrote:
> > Currently, mem_cgroup_shrink_node_zone() initializes sc.nr_to_reclaim to 0.
> > That means shrink_zone() scans only 32 pages and immediately returns, even
> > if it doesn't reclaim any pages.
> >
>
> Do you mean it immediately returns once one page is reclaimed? i.e. this
> check
>
> if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
> break;

Strictly speaking, it returns once one SWAP_CLUSTER_MAX batch has been
scanned; there is no need to reclaim any pages at all.


2010-07-22 04:00:32

by Balbir Singh

Subject: Re: [PATCH 0/7] memcg reclaim tracepoint

* KOSAKI Motohiro <[email protected]> [2010-07-16 19:12:46]:

> Recently, Mel Gorman added some vmscan tracepoints, but they can't
> trace memcg. This patch series makes them memcg-aware.
>
>
> The following three patches are nit fixes and cleanups.
>
> memcg: sc.nr_to_reclaim should be initialized
> memcg: mem_cgroup_shrink_node_zone() doesn't need sc.nodemask
> memcg: nid and zid can be calculated from zone
>
> The following four patches convert existing tracepoints and add new memcg tracepoints.
>
> vmscan: convert direct reclaim tracepoint to DEFINE_EVENT
> memcg, vmscan: add memcg reclaim tracepoint
> vmscan: convert mm_vmscan_lru_isolate to DEFINE_EVENT
> memcg, vmscan: add mm_vmscan_memcg_isolate tracepoint
>
>
> diffstat
> ================
> include/linux/memcontrol.h | 6 ++--
> include/linux/mmzone.h | 5 +++
> include/linux/swap.h | 3 +-
> include/trace/events/vmscan.h | 79 +++++++++++++++++++++++++++++++++++++++--
> mm/memcontrol.c | 15 +++++---
> mm/vmscan.c | 35 ++++++++++++------
> 6 files changed, 118 insertions(+), 25 deletions(-)
>
>
> Sample output is below.
> =========================
>
> dd-1851 [001] 158.837763: mm_vmscan_memcg_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_HIGHUSER_MOVABLE
> dd-1851 [001] 158.837783: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> dd-1851 [001] 158.837860: mm_vmscan_memcg_reclaim_end: nr_reclaimed=32
> (...)
> dd-1970 [000] 266.608235: mm_vmscan_wakeup_kswapd: nid=0 zid=1 order=0
> dd-1970 [000] 266.608239: mm_vmscan_wakeup_kswapd: nid=1 zid=1 order=0
> dd-1970 [000] 266.608248: mm_vmscan_wakeup_kswapd: nid=2 zid=1 order=0
> kswapd1-348 [001] 266.608254: mm_vmscan_kswapd_wake: nid=1 order=0
> dd-1970 [000] 266.608254: mm_vmscan_wakeup_kswapd: nid=3 zid=1 order=0
> kswapd3-350 [000] 266.608266: mm_vmscan_kswapd_wake: nid=3 order=0
> (...)
> kswapd0-347 [001] 267.328891: mm_vmscan_memcg_softlimit_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_HIGHUSER_MOVABLE
> kswapd0-347 [001] 267.328897: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> kswapd0-347 [001] 267.328915: mm_vmscan_memcg_isolate: isolate_mode=0 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> kswapd0-347 [001] 267.328989: mm_vmscan_memcg_softlimit_reclaim_end: nr_reclaimed=32
> kswapd0-347 [001] 267.329019: mm_vmscan_lru_isolate: isolate_mode=1 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> kswapd0-347 [001] 267.330562: mm_vmscan_lru_isolate: isolate_mode=1 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
> (...)
> kswapd2-349 [001] 267.407081: mm_vmscan_kswapd_sleep: nid=2
> kswapd3-350 [001] 267.408077: mm_vmscan_kswapd_sleep: nid=3
> kswapd1-348 [000] 267.427858: mm_vmscan_kswapd_sleep: nid=1
> kswapd0-347 [001] 267.430064: mm_vmscan_kswapd_sleep: nid=0
>

This looks interesting, but I think I need to look deeper to see how a
name like mm_vmscan_memcg_softlimit_reclaim_begin is generated.

--
Three Cheers,
Balbir

2010-07-22 04:49:20

by Balbir Singh

Subject: Re: [PATCH 2/7] memcg: mem_cgroup_shrink_node_zone() doesn't need sc.nodemask

* KOSAKI Motohiro <[email protected]> [2010-07-16 19:14:15]:

> Currently mem_cgroup_shrink_node_zone() calls shrink_zone() directly,
> so it doesn't need to initialize sc.nodemask; shrink_zone() doesn't
> use it at all.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> include/linux/swap.h | 3 +--
> mm/memcontrol.c | 3 +--
> mm/vmscan.c | 8 ++------
> 3 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index ff4acea..bf4eb62 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -244,8 +244,7 @@ extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
> extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> gfp_t gfp_mask, bool noswap,
> unsigned int swappiness,
> - struct zone *zone,
> - int nid);
> + struct zone *zone);
> extern int __isolate_lru_page(struct page *page, int mode, int file);
> extern unsigned long shrink_all_memory(unsigned long nr_pages);
> extern int vm_swappiness;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index aba4310..01f38ff 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1307,8 +1307,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> /* we use swappiness of local cgroup */
> if (check_soft)
> ret = mem_cgroup_shrink_node_zone(victim, gfp_mask,
> - noswap, get_swappiness(victim), zone,
> - zone->zone_pgdat->node_id);
> + noswap, get_swappiness(victim), zone);
> else
> ret = try_to_free_mem_cgroup_pages(victim, gfp_mask,
> noswap, get_swappiness(victim));
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bd1d035..be860a0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1929,7 +1929,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> gfp_t gfp_mask, bool noswap,
> unsigned int swappiness,
> - struct zone *zone, int nid)
> + struct zone *zone)
> {
> struct scan_control sc = {
> .nr_to_reclaim = SWAP_CLUSTER_MAX,
> @@ -1940,13 +1940,9 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> .order = 0,
> .mem_cgroup = mem,
> };
> - nodemask_t nm = nodemask_of_node(nid);
> -
> sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> - sc.nodemask = &nm;
> - sc.nr_reclaimed = 0;
> - sc.nr_scanned = 0;

We need the initialization to 0; is there a reason why it was removed?
What happens when we compare or increment sc.nr_*?

Can we keep this independent of the tracing patches?

> +
> /*
> * NOTE: Although we can get the priority field, using it
> * here is not a good idea, since it limits the pages we can scan.
> --
> 1.6.5.2
>
>
>

--
Three Cheers,
Balbir

2010-07-22 05:31:23

by Balbir Singh

Subject: Re: [PATCH 1/7] memcg: sc.nr_to_reclaim should be initialized

* KOSAKI Motohiro <[email protected]> [2010-07-16 19:13:31]:

> Currently, mem_cgroup_shrink_node_zone() initializes sc.nr_to_reclaim to 0.
> That means shrink_zone() scans only 32 pages and immediately returns, even
> if it doesn't reclaim any pages.
>
> This patch fixes it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> mm/vmscan.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1691ad0..bd1d035 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1932,6 +1932,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> struct zone *zone, int nid)
> {
> struct scan_control sc = {
> + .nr_to_reclaim = SWAP_CLUSTER_MAX,
> .may_writepage = !laptop_mode,
> .may_unmap = 1,
> .may_swap = !noswap,

Could you please do some additional testing on:

1. How far does this push pages (in terms of when the limit is hit)?
2. Did you hit a problem with the current setting, or is this a review
fix?


--
Three Cheers,
Balbir

2010-07-22 05:34:39

by Balbir Singh

Subject: Re: [PATCH 5/7] memcg, vmscan: add memcg reclaim tracepoint

* KOSAKI Motohiro <[email protected]> [2010-07-16 19:16:46]:

>
> Memcg also needs to trace reclaim progress, just as direct reclaim does.
> This patch adds that.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>

Looks good to me


Acked-by: Balbir Singh <[email protected]>


--
Three Cheers,
Balbir

2010-07-22 05:36:30

by Balbir Singh

Subject: Re: [PATCH 3/7] memcg: nid and zid can be calculated from zone

* KOSAKI Motohiro <[email protected]> [2010-07-21 22:33:56]:

> > > +static inline int zone_nid(struct zone *zone)
> > > +{
> > > + return zone->zone_pgdat->node_id;
> > > +}
> > > +
> >
> > hmm, adding a helper and not converting the existing users of
> > zone->zone_pgdat may be a little confusing particularly as both types of
> > usage would exist in the same file e.g. in mem_cgroup_zone_nr_pages.
>
> I see. here is incrementa patch.
>
> Thanks
>
> From 62cf765251af257c98fc92a58215d101d200e7ef Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <[email protected]>
> Date: Tue, 20 Jul 2010 11:30:14 +0900
> Subject: [PATCH] memcg: convert to zone_nid() from bare zone->zone_pgdat->node_id
>
> Now that we have zone_nid(), this patch converts all existing users of
> bare zone->zone_pgdat->node_id.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> mm/memcontrol.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 82e191f..3d5b645 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -951,7 +951,7 @@ unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
> struct zone *zone,
> enum lru_list lru)
> {
> - int nid = zone->zone_pgdat->node_id;
> + int nid = zone_nid(zone);
> int zid = zone_idx(zone);
> struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);
>
> @@ -961,7 +961,7 @@ unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
> struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
> struct zone *zone)
> {
> - int nid = zone->zone_pgdat->node_id;
> + int nid = zone_nid(zone);
> int zid = zone_idx(zone);
> struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);
>
> @@ -1006,7 +1006,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> LIST_HEAD(pc_list);
> struct list_head *src;
> struct page_cgroup *pc, *tmp;
> - int nid = z->zone_pgdat->node_id;
> + int nid = zone_nid(z);
> int zid = zone_idx(z);
> struct mem_cgroup_per_zone *mz;
> int lru = LRU_FILE * file + active;

Acked-by: Balbir Singh <[email protected]>

--
Three Cheers,
Balbir

2010-07-22 11:02:13

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 3/7] memcg: nid and zid can be calculated from zone

On Wed, Jul 21, 2010 at 10:33:56PM +0900, KOSAKI Motohiro wrote:
> > > +static inline int zone_nid(struct zone *zone)
> > > +{
> > > + return zone->zone_pgdat->node_id;
> > > +}
> > > +
> >
> > hmm, adding a helper and not converting the existing users of
> > zone->zone_pgdat may be a little confusing particularly as both types of
> > usage would exist in the same file e.g. in mem_cgroup_zone_nr_pages.
>
> I see. Here is an incremental patch.
>

Looks grand. Thanks

> From 62cf765251af257c98fc92a58215d101d200e7ef Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <[email protected]>
> Date: Tue, 20 Jul 2010 11:30:14 +0900
> Subject: [PATCH] memcg: convert to zone_nid() from bare zone->zone_pgdat->node_id
>
> Now we have zone_nid(); this patch converts all existing users of
> bare zone->zone_pgdat->node_id.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
>
> <SNIP>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2010-07-23 05:14:09

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 2/7] memcg: mem_cgroup_shrink_node_zone() doesn't need sc.nodemask

> * KOSAKI Motohiro <[email protected]> [2010-07-16 19:14:15]:
>
> > Currently mem_cgroup_shrink_node_zone() calls shrink_zone() directly,
> > thus it doesn't need to initialize sc.nodemask; shrink_zone() doesn't
> > use it at all.
> >
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > ---
> > include/linux/swap.h | 3 +--
> > mm/memcontrol.c | 3 +--
> > mm/vmscan.c | 8 ++------
> > 3 files changed, 4 insertions(+), 10 deletions(-)
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index ff4acea..bf4eb62 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -244,8 +244,7 @@ extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
> > extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> > gfp_t gfp_mask, bool noswap,
> > unsigned int swappiness,
> > - struct zone *zone,
> > - int nid);
> > + struct zone *zone);
> > extern int __isolate_lru_page(struct page *page, int mode, int file);
> > extern unsigned long shrink_all_memory(unsigned long nr_pages);
> > extern int vm_swappiness;
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index aba4310..01f38ff 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1307,8 +1307,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> > /* we use swappiness of local cgroup */
> > if (check_soft)
> > ret = mem_cgroup_shrink_node_zone(victim, gfp_mask,
> > - noswap, get_swappiness(victim), zone,
> > - zone->zone_pgdat->node_id);
> > + noswap, get_swappiness(victim), zone);
> > else
> > ret = try_to_free_mem_cgroup_pages(victim, gfp_mask,
> > noswap, get_swappiness(victim));
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index bd1d035..be860a0 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1929,7 +1929,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> > unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> > gfp_t gfp_mask, bool noswap,
> > unsigned int swappiness,
> > - struct zone *zone, int nid)
> > + struct zone *zone)
> > {
> > struct scan_control sc = {
> > .nr_to_reclaim = SWAP_CLUSTER_MAX,
> > @@ -1940,13 +1940,9 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> > .order = 0,
> > .mem_cgroup = mem,
> > };
> > - nodemask_t nm = nodemask_of_node(nid);
> > -
> > sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> > (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> > - sc.nodemask = &nm;
> > - sc.nr_reclaimed = 0;
> > - sc.nr_scanned = 0;
>
> We need the initialization to 0; is there a reason why it was removed?

Please reread the C spec and the other scan_control users.
sc.nr_* are already zero-initialized by the designated initializer in the
"struct scan_control sc = { ... }" line.
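
To illustrate the rule, a minimal userspace sketch (the struct here is a
hypothetical stand-in, not the kernel's scan_control):

    #include <stdio.h>

    /* Hypothetical struct mirroring a few scan_control fields. */
    struct scan_control_like {
        unsigned long nr_scanned;
        unsigned long nr_reclaimed;
        int may_writepage;
        int may_unmap;
    };

    int main(void)
    {
        /* Designated initializer: members not named here are
         * zero-initialized (C99 6.7.8). */
        struct scan_control_like sc = {
            .may_writepage = 1,
            .may_unmap = 1,
        };

        /* Prints "0 0": no explicit "= 0" lines are needed. */
        printf("%lu %lu\n", sc.nr_scanned, sc.nr_reclaimed);
        return 0;
    }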



> What happens when we compare or increment sc.nr_*?
>
> Can we keep this independent of the tracing patches?
>
> > +
> > /*
> > * NOTE: Although we can get the priority field, using it
> > * here is not a good idea, since it limits the pages we can scan.
> > --
> > 1.6.5.2
> >
> >
> >
>
> --
> Three Cheers,
> Balbir


2010-07-23 07:33:19

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 1/7] memcg: sc.nr_to_reclaim should be initialized

> * KOSAKI Motohiro <[email protected]> [2010-07-16 19:13:31]:
>
> > Currently, mem_cgroup_shrink_node_zone() initializes sc.nr_to_reclaim to 0.
> > This means shrink_zone() scans only 32 pages and returns immediately even if
> > it doesn't reclaim any pages.
> >
> > This patch fixes it.
> >
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > ---
> > mm/vmscan.c | 1 +
> > 1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 1691ad0..bd1d035 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1932,6 +1932,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> > struct zone *zone, int nid)
> > {
> > struct scan_control sc = {
> > + .nr_to_reclaim = SWAP_CLUSTER_MAX,
> > .may_writepage = !laptop_mode,
> > .may_unmap = 1,
> > .may_swap = !noswap,
>
> Could you please do some additional testing on
>
> 1. How far does this push pages (in terms of when limit is hit)?

32 pages per mem_cgroup_shrink_node_zone().

That is, the algorithm is as follows (a toy C sketch follows the list):

1. call mem_cgroup_largest_soft_limit_node()
   to calculate the largest cgroup
2. call mem_cgroup_shrink_node_zone() and shrink 32 pages
3. goto 1 if the limit is still exceeded.
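
A toy userspace model of that loop; the stub function and the numbers
are purely illustrative, not the kernel code:

    #include <stdio.h>

    #define SWAP_CLUSTER_MAX 32UL

    static unsigned long over_limit = 200;  /* pages over the soft limit */

    /* Stub standing in for steps 1 and 2: pick the largest group and
     * reclaim up to nr_to_reclaim pages from it. */
    static unsigned long shrink_largest_group(unsigned long nr_to_reclaim)
    {
        unsigned long got;

        got = over_limit < nr_to_reclaim ? over_limit : nr_to_reclaim;
        over_limit -= got;
        return got;
    }

    int main(void)
    {
        /* Step 3: keep going, 32 pages at a time, while the limit
         * is still exceeded. */
        while (over_limit > 0) {
            unsigned long n = shrink_largest_group(SWAP_CLUSTER_MAX);

            printf("reclaimed %lu, still over by %lu\n", n, over_limit);
            if (!n)
                break;
        }
        return 0;
    }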

If that's not your intention, can you please describe your intended algorithm?


> 2. Did you hit a problem with the current setting or is it a review
> fix?

I found this by review, and my patch works fine in my test environment.
Of course, if you do _not_ run the code under heavy pressure, the original
code works too.

2010-07-25 08:25:35

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH 1/7] memcg: sc.nr_to_reclaim should be initialized

On Fri, Jul 23, 2010 at 1:03 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> * KOSAKI Motohiro <[email protected]> [2010-07-16 19:13:31]:
>>
>> > Currently, mem_cgroup_shrink_node_zone() initializes sc.nr_to_reclaim to 0.
>> > This means shrink_zone() scans only 32 pages and returns immediately even if
>> > it doesn't reclaim any pages.
>> >
>> > This patch fixes it.
>> >
>> > Signed-off-by: KOSAKI Motohiro <[email protected]>
>> > ---
>> >  mm/vmscan.c |    1 +
>> >  1 files changed, 1 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index 1691ad0..bd1d035 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -1932,6 +1932,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>> >                                              struct zone *zone, int nid)
>> >  {
>> >     struct scan_control sc = {
>> > +           .nr_to_reclaim = SWAP_CLUSTER_MAX,
>> >             .may_writepage = !laptop_mode,
>> >             .may_unmap = 1,
>> >             .may_swap = !noswap,
>>
>> Could you please do some additional testing on
>>
>> 1. How far does this push pages (in terms of when limit is hit)?
>
> 32 pages per mem_cgroup_shrink_node_zone().
>
> That is, the algorithm is as follows:
>
> 1. call mem_cgroup_largest_soft_limit_node()
>    to calculate the largest cgroup
> 2. call mem_cgroup_shrink_node_zone() and shrink 32 pages
> 3. goto 1 if the limit is still exceeded.
>
> If that's not your intention, can you please describe your intended algorithm?

We set it to 0, since we care only about single-page reclaim on
hitting the limit. IIRC, in the past we saw excessive pushback when
reclaiming SWAP_CLUSTER_MAX pages; I just wanted to check whether you are
seeing the same behaviour even now, after your changes.

Balbir

2010-07-25 08:28:45

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH 2/7] memcg: mem_cgroup_shrink_node_zone() doesn't need sc.nodemask

>>
>> We need the initialization to 0; is there a reason why it was removed?
>
> Please reread the C spec and the other scan_control users.
> sc.nr_* are already zero-initialized by the designated initializer in the
> "struct scan_control sc = { ... }" line.
>

I missed the fact that designated initializers do the right thing. Thanks!

Balbir

2010-07-25 09:48:14

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 1/7] memcg: sc.nr_to_reclaim should be initialized

> >> 1. How far does this push pages (in terms of when limit is hit)?
> >
> > 32 pages per mem_cgroup_shrink_node_zone().
> >
> > That is, the algorithm is as follows:
> >
> > 1. call mem_cgroup_largest_soft_limit_node()
> >    to calculate the largest cgroup
> > 2. call mem_cgroup_shrink_node_zone() and shrink 32 pages
> > 3. goto 1 if the limit is still exceeded.
> >
> > If that's not your intention, can you please describe your intended algorithm?
>
> We set it to 0, since we care only about single-page reclaim on
> hitting the limit. IIRC, in the past we saw excessive pushback when
> reclaiming SWAP_CLUSTER_MAX pages; I just wanted to check whether you are
> seeing the same behaviour even now, after your changes.

Actually, we have a 32-page reclaim batch size (see nr_scan_try_batch() and
related functions), thus a value below 32 doesn't work as you intend.
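
For reference, the batching helper of this era's mm/vmscan.c works roughly
as below (paraphrased from memory, so check the actual tree; the demo
main() is mine):

    #include <stdio.h>

    #define SWAP_CLUSTER_MAX 32UL

    /* Scan requests accumulate in *nr_saved_scan; nothing is scanned
     * until the saved batch reaches SWAP_CLUSTER_MAX. */
    static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
                                           unsigned long *nr_saved_scan)
    {
        unsigned long nr;

        *nr_saved_scan += nr_to_scan;
        nr = *nr_saved_scan;

        if (nr >= SWAP_CLUSTER_MAX)
            *nr_saved_scan = 0;    /* flush the whole batch */
        else
            nr = 0;                /* keep accumulating */

        return nr;
    }

    int main(void)
    {
        unsigned long saved = 0;
        int i;

        /* Requesting 8 pages at a time: the first three calls return 0,
         * the fourth returns the accumulated 32. */
        for (i = 0; i < 4; i++)
            printf("batch -> %lu\n", nr_scan_try_batch(8, &saved));
        return 0;
    }

So a caller asking for fewer than 32 pages still ends up scanning in
32-page units once the batch fills.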

But if you run your test again and report any bugs you find, I'll gladly fix them soon.

Thanks.


2010-07-25 16:40:17

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH 1/7] memcg: sc.nr_to_reclaim should be initialized

On Sun, Jul 25, 2010 at 3:18 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> >> 1. How far does this push pages (in terms of when limit is hit)?
>> >
>> > 32 pages per mem_cgroup_shrink_node_zone().
>> >
>> > That is, the algorithm is as follows:
>> >
>> > 1. call mem_cgroup_largest_soft_limit_node()
>> >    to calculate the largest cgroup
>> > 2. call mem_cgroup_shrink_node_zone() and shrink 32 pages
>> > 3. goto 1 if the limit is still exceeded.
>> >
>> > If that's not your intention, can you please describe your intended algorithm?
>>
>> We set it to 0, since we care only about single-page reclaim on
>> hitting the limit. IIRC, in the past we saw excessive pushback when
>> reclaiming SWAP_CLUSTER_MAX pages; I just wanted to check whether you are
>> seeing the same behaviour even now, after your changes.
>
> Actually, we have a 32-page reclaim batch size (see nr_scan_try_batch() and
> related functions), thus a value below 32 doesn't work as you intend.
>
> But if you run your test again and report any bugs you find, I'll gladly fix them soon.
>

I understand that; the point is when to stop the reclaim (do we
really need 32 pages to stop the reclaim? When we hit the limit,
something as low as a single page can help). This is quite a subtle
thing, so I'd mark it as low priority. I'll definitely come back if I see
unexpected behaviour.

Balbir