2012-05-14 18:01:03

by Johannes Weiner

[permalink] [raw]
Subject: [patch 0/6] mm: memcg: statistics implementation cleanups

Before piling more things (reclaim stats) on top of the current mess,
I thought it'd be better to clean up a bit.

The biggest change is printing statistics directly from live counters,
it has always been annoying to declare a new counter in two separate
enums and corresponding name string arrays. After this series we are
down to one of each.

mm/memcontrol.c | 223 +++++++++++++++++------------------------------
1 file changed, 82 insertions(+), 141 deletions(-)


2012-05-14 18:01:04

by Johannes Weiner

[permalink] [raw]
Subject: [patch 3/6] mm: memcg: print statistics directly to seq_file

Being able to use seq_printf() allows being smarter about statistics
name strings, which are currently listed twice, with the only
difference being a "total_" prefix on the hierarchical version.

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/memcontrol.c | 56 +++++++++++++++++++++++++++----------------------------
1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f0d248b..9e8551c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4274,24 +4274,21 @@ struct mcs_total_stat {
s64 stat[NR_MCS_STAT];
};

-static struct {
- char *local_name;
- char *total_name;
-} memcg_stat_strings[NR_MCS_STAT] = {
- {"cache", "total_cache"},
- {"rss", "total_rss"},
- {"mapped_file", "total_mapped_file"},
- {"mlock", "total_mlock"},
- {"pgpgin", "total_pgpgin"},
- {"pgpgout", "total_pgpgout"},
- {"swap", "total_swap"},
- {"pgfault", "total_pgfault"},
- {"pgmajfault", "total_pgmajfault"},
- {"inactive_anon", "total_inactive_anon"},
- {"active_anon", "total_active_anon"},
- {"inactive_file", "total_inactive_file"},
- {"active_file", "total_active_file"},
- {"unevictable", "total_unevictable"}
+static const char *memcg_stat_strings[NR_MCS_STAT] = {
+ "cache",
+ "rss",
+ "mapped_file",
+ "mlock",
+ "pgpgin",
+ "pgpgout",
+ "swap",
+ "pgfault",
+ "pgmajfault",
+ "inactive_anon",
+ "active_anon",
+ "inactive_file",
+ "active_file",
+ "unevictable",
};


@@ -4392,7 +4389,7 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
#endif /* CONFIG_NUMA */

static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
- struct cgroup_map_cb *cb)
+ struct seq_file *m)
{
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
struct mcs_total_stat mystat;
@@ -4405,16 +4402,18 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
for (i = 0; i < NR_MCS_STAT; i++) {
if (i == MCS_SWAP && !do_swap_account)
continue;
- cb->fill(cb, memcg_stat_strings[i].local_name, mystat.stat[i]);
+ seq_printf(m, "%s %llu\n", memcg_stat_strings[i],
+ (unsigned long long)mystat.stat[i]);
}

/* Hierarchical information */
{
unsigned long long limit, memsw_limit;
memcg_get_hierarchical_limit(memcg, &limit, &memsw_limit);
- cb->fill(cb, "hierarchical_memory_limit", limit);
+ seq_printf(m, "hierarchical_memory_limit %llu\n", limit);
if (do_swap_account)
- cb->fill(cb, "hierarchical_memsw_limit", memsw_limit);
+ seq_printf(m, "hierarchical_memsw_limit %llu\n",
+ memsw_limit);
}

memset(&mystat, 0, sizeof(mystat));
@@ -4422,7 +4421,8 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
for (i = 0; i < NR_MCS_STAT; i++) {
if (i == MCS_SWAP && !do_swap_account)
continue;
- cb->fill(cb, memcg_stat_strings[i].total_name, mystat.stat[i]);
+ seq_printf(m, "total_%s %llu\n", memcg_stat_strings[i],
+ (unsigned long long)mystat.stat[i]);
}

#ifdef CONFIG_DEBUG_VM
@@ -4443,10 +4443,10 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
recent_scanned[0] += rstat->recent_scanned[0];
recent_scanned[1] += rstat->recent_scanned[1];
}
- cb->fill(cb, "recent_rotated_anon", recent_rotated[0]);
- cb->fill(cb, "recent_rotated_file", recent_rotated[1]);
- cb->fill(cb, "recent_scanned_anon", recent_scanned[0]);
- cb->fill(cb, "recent_scanned_file", recent_scanned[1]);
+ seq_printf(m, "recent_rotated_anon %lu\n", recent_rotated[0]);
+ seq_printf(m, "recent_rotated_file %lu\n", recent_rotated[1]);
+ seq_printf(m, "recent_scanned_anon %lu\n", recent_scanned[0]);
+ seq_printf(m, "recent_scanned_file %lu\n", recent_scanned[1]);
}
#endif

@@ -4880,7 +4880,7 @@ static struct cftype mem_cgroup_files[] = {
},
{
.name = "stat",
- .read_map = mem_control_stat_show,
+ .read_seq_string = mem_control_stat_show,
},
{
.name = "force_empty",
--
1.7.10.1

2012-05-14 18:01:28

by Johannes Weiner

[permalink] [raw]
Subject: [patch 6/6] mm: memcg: print statistics from live counters

Directly print statistics and event counters instead of going through
an intermediate accumulation stage into a separate array, which used
to require defining statistic items in more than one place.

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/memcontrol.c | 173 +++++++++++++++++++++----------------------------------
1 file changed, 66 insertions(+), 107 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3ee63f6..b0d343a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -102,6 +102,14 @@ enum mem_cgroup_stat_index {
MEM_CGROUP_STAT_NSTATS,
};

+static const char *mem_cgroup_stat_names[] = {
+ "cache",
+ "rss",
+ "mapped_file",
+ "mlock",
+ "swap",
+};
+
enum mem_cgroup_events_index {
MEM_CGROUP_EVENTS_PGPGIN, /* # of pages paged in */
MEM_CGROUP_EVENTS_PGPGOUT, /* # of pages paged out */
@@ -109,6 +117,14 @@ enum mem_cgroup_events_index {
MEM_CGROUP_EVENTS_PGMAJFAULT, /* # of major page-faults */
MEM_CGROUP_EVENTS_NSTATS,
};
+
+static const char *mem_cgroup_events_names[] = {
+ "pgpgin",
+ "pgpgout",
+ "pgfault",
+ "pgmajfault",
+};
+
/*
* Per memcg event counter is incremented at every pagein/pageout. With THP,
* it will be incremated by the number of pages. This counter is used for
@@ -4250,97 +4266,6 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
}
#endif

-
-/* For read statistics */
-enum {
- MCS_CACHE,
- MCS_RSS,
- MCS_FILE_MAPPED,
- MCS_MLOCK,
- MCS_SWAP,
- MCS_PGPGIN,
- MCS_PGPGOUT,
- MCS_PGFAULT,
- MCS_PGMAJFAULT,
- MCS_INACTIVE_ANON,
- MCS_ACTIVE_ANON,
- MCS_INACTIVE_FILE,
- MCS_ACTIVE_FILE,
- MCS_UNEVICTABLE,
- NR_MCS_STAT,
-};
-
-struct mcs_total_stat {
- s64 stat[NR_MCS_STAT];
-};
-
-static const char *memcg_stat_strings[NR_MCS_STAT] = {
- "cache",
- "rss",
- "mapped_file",
- "mlock",
- "swap",
- "pgpgin",
- "pgpgout",
- "pgfault",
- "pgmajfault",
- "inactive_anon",
- "active_anon",
- "inactive_file",
- "active_file",
- "unevictable",
-};
-
-
-static void
-mem_cgroup_get_local_stat(struct mem_cgroup *memcg, struct mcs_total_stat *s)
-{
- s64 val;
-
- /* per cpu stat */
- val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_CACHE);
- s->stat[MCS_CACHE] += val * PAGE_SIZE;
- val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_RSS);
- s->stat[MCS_RSS] += val * PAGE_SIZE;
- val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED);
- s->stat[MCS_FILE_MAPPED] += val * PAGE_SIZE;
- val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_MLOCK);
- s->stat[MCS_MLOCK] += val * PAGE_SIZE;
- if (do_swap_account) {
- val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_SWAPOUT);
- s->stat[MCS_SWAP] += val * PAGE_SIZE;
- }
- val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGPGIN);
- s->stat[MCS_PGPGIN] += val;
- val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGPGOUT);
- s->stat[MCS_PGPGOUT] += val;
- val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGFAULT);
- s->stat[MCS_PGFAULT] += val;
- val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGMAJFAULT);
- s->stat[MCS_PGMAJFAULT] += val;
-
- /* per zone stat */
- val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_INACTIVE_ANON));
- s->stat[MCS_INACTIVE_ANON] += val * PAGE_SIZE;
- val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_ACTIVE_ANON));
- s->stat[MCS_ACTIVE_ANON] += val * PAGE_SIZE;
- val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_INACTIVE_FILE));
- s->stat[MCS_INACTIVE_FILE] += val * PAGE_SIZE;
- val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_ACTIVE_FILE));
- s->stat[MCS_ACTIVE_FILE] += val * PAGE_SIZE;
- val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_UNEVICTABLE));
- s->stat[MCS_UNEVICTABLE] += val * PAGE_SIZE;
-}
-
-static void
-mem_cgroup_get_total_stat(struct mem_cgroup *memcg, struct mcs_total_stat *s)
-{
- struct mem_cgroup *iter;
-
- for_each_mem_cgroup_tree(iter, memcg)
- mem_cgroup_get_local_stat(iter, s);
-}
-
#ifdef CONFIG_NUMA
static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
struct seq_file *m)
@@ -4388,24 +4313,40 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
}
#endif /* CONFIG_NUMA */

+static const char *mem_cgroup_lru_names[] = {
+ "inactive_anon",
+ "active_anon",
+ "inactive_file",
+ "active_file",
+ "unevictable",
+};
+static inline void mem_cgroup_lru_names_not_uptodate(void)
+{
+ BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
+}
+
static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
struct seq_file *m)
{
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
- struct mcs_total_stat mystat;
- int i;
-
- memset(&mystat, 0, sizeof(mystat));
- mem_cgroup_get_local_stat(memcg, &mystat);
+ struct mem_cgroup *mi;
+ unsigned int i;

-
- for (i = 0; i < NR_MCS_STAT; i++) {
- if (i == MCS_SWAP && !do_swap_account)
+ for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
+ if (i == MEM_CGROUP_STAT_SWAPOUT && !do_swap_account)
continue;
- seq_printf(m, "%s %llu\n", memcg_stat_strings[i],
- (unsigned long long)mystat.stat[i]);
+ seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
+ mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
}

+ for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
+ seq_printf(m, "%s %lu\n", mem_cgroup_events_names[i],
+ mem_cgroup_read_events(memcg, i));
+
+ for (i = 0; i < NR_LRU_LISTS; i++)
+ seq_printf(m, "%s %lu\n", mem_cgroup_lru_names[i],
+ mem_cgroup_nr_lru_pages(memcg, BIT(i)) * PAGE_SIZE);
+
/* Hierarchical information */
{
unsigned long long limit, memsw_limit;
@@ -4416,13 +4357,31 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
memsw_limit);
}

- memset(&mystat, 0, sizeof(mystat));
- mem_cgroup_get_total_stat(memcg, &mystat);
- for (i = 0; i < NR_MCS_STAT; i++) {
- if (i == MCS_SWAP && !do_swap_account)
+ for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
+ long long val = 0;
+
+ if (i == MEM_CGROUP_STAT_SWAPOUT && !do_swap_account)
continue;
- seq_printf(m, "total_%s %llu\n", memcg_stat_strings[i],
- (unsigned long long)mystat.stat[i]);
+ for_each_mem_cgroup_tree(mi, memcg)
+ val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
+ seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);
+ }
+
+ for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
+ unsigned long long val = 0;
+
+ for_each_mem_cgroup_tree(mi, memcg)
+ val += mem_cgroup_read_events(mi, i);
+ seq_printf(m, "total_%s %llu\n",
+ mem_cgroup_events_names[i], val);
+ }
+
+ for (i = 0; i < NR_LRU_LISTS; i++) {
+ unsigned long long val = 0;
+
+ for_each_mem_cgroup_tree(mi, memcg)
+ val += mem_cgroup_nr_lru_pages(mi, BIT(i)) * PAGE_SIZE;
+ seq_printf(m, "total_%s %llu\n", mem_cgroup_lru_names[i], val);
}

#ifdef CONFIG_DEBUG_VM
--
1.7.10.1

2012-05-14 18:02:03

by Johannes Weiner

[permalink] [raw]
Subject: [patch 2/6] mm: memcg: convert numa stat to read_seq_string interface

Instead of using the raw seq_file file interface, switch over to the
read_seq_string cftype callback and let cgroup core code set up the
seq_file.

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/memcontrol.c | 23 +++--------------------
1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index aef89c1..f0d248b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4345,12 +4345,12 @@ mem_cgroup_get_total_stat(struct mem_cgroup *memcg, struct mcs_total_stat *s)
}

#ifdef CONFIG_NUMA
-static int mem_control_numa_stat_show(struct seq_file *m, void *arg)
+static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
+ struct seq_file *m)
{
int nid;
unsigned long total_nr, file_nr, anon_nr, unevictable_nr;
unsigned long node_nr;
- struct cgroup *cont = m->private;
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);

total_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL);
@@ -4825,22 +4825,6 @@ static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
return 0;
}

-#ifdef CONFIG_NUMA
-static const struct file_operations mem_control_numa_stat_file_operations = {
- .read = seq_read,
- .llseek = seq_lseek,
- .release = single_release,
-};
-
-static int mem_control_numa_stat_open(struct inode *unused, struct file *file)
-{
- struct cgroup *cont = file->f_dentry->d_parent->d_fsdata;
-
- file->f_op = &mem_control_numa_stat_file_operations;
- return single_open(file, mem_control_numa_stat_show, cont);
-}
-#endif /* CONFIG_NUMA */
-
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
{
@@ -4928,8 +4912,7 @@ static struct cftype mem_cgroup_files[] = {
#ifdef CONFIG_NUMA
{
.name = "numa_stat",
- .open = mem_control_numa_stat_open,
- .mode = S_IRUGO,
+ .read_seq_string = mem_control_numa_stat_show,
},
#endif
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
--
1.7.10.1

2012-05-14 18:02:01

by Johannes Weiner

[permalink] [raw]
Subject: [patch 5/6] mm: memcg: group swapped-out statistics counter logically

The counter of currently swapped out pages in a memcg (hierarchy) is
sitting amidst ever-increasing event counters. Move this item to the
other counters that reflect current state rather than history.

This technically breaks the kernel ABI, but hopefully nobody relies on
the order of items in memory.stat.

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/memcontrol.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 546e7db..3ee63f6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4257,9 +4257,9 @@ enum {
MCS_RSS,
MCS_FILE_MAPPED,
MCS_MLOCK,
+ MCS_SWAP,
MCS_PGPGIN,
MCS_PGPGOUT,
- MCS_SWAP,
MCS_PGFAULT,
MCS_PGMAJFAULT,
MCS_INACTIVE_ANON,
@@ -4279,9 +4279,9 @@ static const char *memcg_stat_strings[NR_MCS_STAT] = {
"rss",
"mapped_file",
"mlock",
+ "swap",
"pgpgin",
"pgpgout",
- "swap",
"pgfault",
"pgmajfault",
"inactive_anon",
@@ -4306,14 +4306,14 @@ mem_cgroup_get_local_stat(struct mem_cgroup *memcg, struct mcs_total_stat *s)
s->stat[MCS_FILE_MAPPED] += val * PAGE_SIZE;
val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_MLOCK);
s->stat[MCS_MLOCK] += val * PAGE_SIZE;
- val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGPGIN);
- s->stat[MCS_PGPGIN] += val;
- val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGPGOUT);
- s->stat[MCS_PGPGOUT] += val;
if (do_swap_account) {
val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_SWAPOUT);
s->stat[MCS_SWAP] += val * PAGE_SIZE;
}
+ val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGPGIN);
+ s->stat[MCS_PGPGIN] += val;
+ val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGPGOUT);
+ s->stat[MCS_PGPGOUT] += val;
val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGFAULT);
s->stat[MCS_PGFAULT] += val;
val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGMAJFAULT);
--
1.7.10.1

2012-05-14 18:02:33

by Johannes Weiner

[permalink] [raw]
Subject: [patch 4/6] mm: memcg: keep ratelimit counter separate from event counters

All events except the ratelimit counter are statistics exported to
userspace. Keep this internal value out of the event count array.

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/memcontrol.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9e8551c..546e7db 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -105,7 +105,6 @@ enum mem_cgroup_stat_index {
enum mem_cgroup_events_index {
MEM_CGROUP_EVENTS_PGPGIN, /* # of pages paged in */
MEM_CGROUP_EVENTS_PGPGOUT, /* # of pages paged out */
- MEM_CGROUP_EVENTS_COUNT, /* # of pages paged in/out */
MEM_CGROUP_EVENTS_PGFAULT, /* # of page-faults */
MEM_CGROUP_EVENTS_PGMAJFAULT, /* # of major page-faults */
MEM_CGROUP_EVENTS_NSTATS,
@@ -129,6 +128,7 @@ enum mem_cgroup_events_target {
struct mem_cgroup_stat_cpu {
long count[MEM_CGROUP_STAT_NSTATS];
unsigned long events[MEM_CGROUP_EVENTS_NSTATS];
+ unsigned long nr_page_events;
unsigned long targets[MEM_CGROUP_NTARGETS];
};

@@ -736,7 +736,7 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
nr_pages = -nr_pages; /* for event */
}

- __this_cpu_add(memcg->stat->events[MEM_CGROUP_EVENTS_COUNT], nr_pages);
+ __this_cpu_add(memcg->stat->nr_page_events, nr_pages);

preempt_enable();
}
@@ -797,7 +797,7 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
{
unsigned long val, next;

- val = __this_cpu_read(memcg->stat->events[MEM_CGROUP_EVENTS_COUNT]);
+ val = __this_cpu_read(memcg->stat->nr_page_events);
next = __this_cpu_read(memcg->stat->targets[target]);
/* from time_after() in jiffies.h */
if ((long)next - (long)val < 0) {
--
1.7.10.1

2012-05-14 18:00:59

by Johannes Weiner

[permalink] [raw]
Subject: [patch 1/6] mm: memcg: remove obsolete statistics array boundary enum item

MEM_CGROUP_STAT_DATA is a leftover from when item counters were living
in the same array as ever-increasing event counters. It's no longer
needed, use MEM_CGROUP_STAT_NSTATS to iterate over the stat array.

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/memcontrol.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9520ee9..aef89c1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -99,7 +99,6 @@ enum mem_cgroup_stat_index {
MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
MEM_CGROUP_STAT_MLOCK, /* # of pages charged as mlock()ed */
MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
- MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
MEM_CGROUP_STAT_NSTATS,
};

@@ -2158,7 +2157,7 @@ static void mem_cgroup_drain_pcp_counter(struct mem_cgroup *memcg, int cpu)
int i;

spin_lock(&memcg->pcp_counter_lock);
- for (i = 0; i < MEM_CGROUP_STAT_DATA; i++) {
+ for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
long x = per_cpu(memcg->stat->count[i], cpu);

per_cpu(memcg->stat->count[i], cpu) = 0;
--
1.7.10.1

2012-05-15 00:21:26

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [patch 0/6] mm: memcg: statistics implementation cleanups

(2012/05/15 3:00), Johannes Weiner wrote:

> Before piling more things (reclaim stats) on top of the current mess,
> I thought it'd be better to clean up a bit.
>
> The biggest change is printing statistics directly from live counters,
> it has always been annoying to declare a new counter in two separate
> enums and corresponding name string arrays. After this series we are
> down to one of each.
>
> mm/memcontrol.c | 223 +++++++++++++++++------------------------------
> 1 file changed, 82 insertions(+), 141 deletions(-)
>


to all 1-6. Thank you.

Acked-by: KAMEZAWA Hiroyuki <[email protected]>

One excuse for my old implementation of mem_cgroup_get_total_stat(),
which is fixed in patch 6, is that I thought it's better to touch all counters
in a cachineline at once and avoiding long distance for-each loop.

What number of performance difference with some big hierarchy(100+children) tree ?
(But I agree your code is cleaner. I'm just curious.)

Thanks,
-Kame

2012-05-15 11:03:21

by Johannes Weiner

[permalink] [raw]
Subject: Re: [patch 0/6] mm: memcg: statistics implementation cleanups

On Tue, May 15, 2012 at 09:19:33AM +0900, KAMEZAWA Hiroyuki wrote:
> (2012/05/15 3:00), Johannes Weiner wrote:
>
> > Before piling more things (reclaim stats) on top of the current mess,
> > I thought it'd be better to clean up a bit.
> >
> > The biggest change is printing statistics directly from live counters,
> > it has always been annoying to declare a new counter in two separate
> > enums and corresponding name string arrays. After this series we are
> > down to one of each.
> >
> > mm/memcontrol.c | 223 +++++++++++++++++------------------------------
> > 1 file changed, 82 insertions(+), 141 deletions(-)
>
> to all 1-6. Thank you.
>
> Acked-by: KAMEZAWA Hiroyuki <[email protected]>

Thanks!

> One excuse for my old implementation of mem_cgroup_get_total_stat(),
> which is fixed in patch 6, is that I thought it's better to touch all counters
> in a cachineline at once and avoiding long distance for-each loop.
>
> What number of performance difference with some big hierarchy(100+children) tree ?
> (But I agree your code is cleaner. I'm just curious.)

I set up a parental group with hierarchy enabled, then created 512
children and did a 4-job kernel bench in one of them. Every 0.1
seconds, I read the stats of the parent, which requires reading each
stat/event/lru item from 512 groups before moving to the next one:

512stats-vanilla 512stats-patched
Walltime (s) 62.61 ( +0.00%) 62.88 ( +0.43%)
Walltime (stddev) 0.17 ( +0.00%) 0.14 ( -3.17%)

That should be acceptable, I think.

2012-05-15 14:14:52

by Michal Hocko

[permalink] [raw]
Subject: Re: [patch 1/6] mm: memcg: remove obsolete statistics array boundary enum item

On Mon 14-05-12 20:00:46, Johannes Weiner wrote:
> MEM_CGROUP_STAT_DATA is a leftover from when item counters were living
> in the same array as ever-increasing event counters. It's no longer
> needed, use MEM_CGROUP_STAT_NSTATS to iterate over the stat array.
>
> Signed-off-by: Johannes Weiner <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> mm/memcontrol.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9520ee9..aef89c1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -99,7 +99,6 @@ enum mem_cgroup_stat_index {
> MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
> MEM_CGROUP_STAT_MLOCK, /* # of pages charged as mlock()ed */
> MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> - MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
> MEM_CGROUP_STAT_NSTATS,
> };
>
> @@ -2158,7 +2157,7 @@ static void mem_cgroup_drain_pcp_counter(struct mem_cgroup *memcg, int cpu)
> int i;
>
> spin_lock(&memcg->pcp_counter_lock);
> - for (i = 0; i < MEM_CGROUP_STAT_DATA; i++) {
> + for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> long x = per_cpu(memcg->stat->count[i], cpu);
>
> per_cpu(memcg->stat->count[i], cpu) = 0;
> --
> 1.7.10.1
>

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2012-05-15 14:43:10

by Michal Hocko

[permalink] [raw]
Subject: Re: [patch 2/6] mm: memcg: convert numa stat to read_seq_string interface

On Mon 14-05-12 20:00:47, Johannes Weiner wrote:
> Instead of using the raw seq_file file interface, switch over to the
> read_seq_string cftype callback and let cgroup core code set up the
> seq_file.
>
> Signed-off-by: Johannes Weiner <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> mm/memcontrol.c | 23 +++--------------------
> 1 file changed, 3 insertions(+), 20 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index aef89c1..f0d248b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4345,12 +4345,12 @@ mem_cgroup_get_total_stat(struct mem_cgroup *memcg, struct mcs_total_stat *s)
> }
>
> #ifdef CONFIG_NUMA
> -static int mem_control_numa_stat_show(struct seq_file *m, void *arg)
> +static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
> + struct seq_file *m)
> {
> int nid;
> unsigned long total_nr, file_nr, anon_nr, unevictable_nr;
> unsigned long node_nr;
> - struct cgroup *cont = m->private;
> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>
> total_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL);
> @@ -4825,22 +4825,6 @@ static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
> return 0;
> }
>
> -#ifdef CONFIG_NUMA
> -static const struct file_operations mem_control_numa_stat_file_operations = {
> - .read = seq_read,
> - .llseek = seq_lseek,
> - .release = single_release,
> -};
> -
> -static int mem_control_numa_stat_open(struct inode *unused, struct file *file)
> -{
> - struct cgroup *cont = file->f_dentry->d_parent->d_fsdata;
> -
> - file->f_op = &mem_control_numa_stat_file_operations;
> - return single_open(file, mem_control_numa_stat_show, cont);
> -}
> -#endif /* CONFIG_NUMA */
> -
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
> static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
> {
> @@ -4928,8 +4912,7 @@ static struct cftype mem_cgroup_files[] = {
> #ifdef CONFIG_NUMA
> {
> .name = "numa_stat",
> - .open = mem_control_numa_stat_open,
> - .mode = S_IRUGO,
> + .read_seq_string = mem_control_numa_stat_show,
> },
> #endif
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> --
> 1.7.10.1
>

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2012-05-15 14:46:39

by Michal Hocko

[permalink] [raw]
Subject: Re: [patch 3/6] mm: memcg: print statistics directly to seq_file

On Mon 14-05-12 20:00:48, Johannes Weiner wrote:
> Being able to use seq_printf() allows being smarter about statistics
> name strings, which are currently listed twice, with the only
> difference being a "total_" prefix on the hierarchical version.
>
> Signed-off-by: Johannes Weiner <[email protected]>

Nice
Acked-by: Michal Hocko <[email protected]>

> ---
> mm/memcontrol.c | 56 +++++++++++++++++++++++++++----------------------------
> 1 file changed, 28 insertions(+), 28 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f0d248b..9e8551c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4274,24 +4274,21 @@ struct mcs_total_stat {
> s64 stat[NR_MCS_STAT];
> };
>
> -static struct {
> - char *local_name;
> - char *total_name;
> -} memcg_stat_strings[NR_MCS_STAT] = {
> - {"cache", "total_cache"},
> - {"rss", "total_rss"},
> - {"mapped_file", "total_mapped_file"},
> - {"mlock", "total_mlock"},
> - {"pgpgin", "total_pgpgin"},
> - {"pgpgout", "total_pgpgout"},
> - {"swap", "total_swap"},
> - {"pgfault", "total_pgfault"},
> - {"pgmajfault", "total_pgmajfault"},
> - {"inactive_anon", "total_inactive_anon"},
> - {"active_anon", "total_active_anon"},
> - {"inactive_file", "total_inactive_file"},
> - {"active_file", "total_active_file"},
> - {"unevictable", "total_unevictable"}
> +static const char *memcg_stat_strings[NR_MCS_STAT] = {
> + "cache",
> + "rss",
> + "mapped_file",
> + "mlock",
> + "pgpgin",
> + "pgpgout",
> + "swap",
> + "pgfault",
> + "pgmajfault",
> + "inactive_anon",
> + "active_anon",
> + "inactive_file",
> + "active_file",
> + "unevictable",
> };
>
>
> @@ -4392,7 +4389,7 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
> #endif /* CONFIG_NUMA */
>
> static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
> - struct cgroup_map_cb *cb)
> + struct seq_file *m)
> {
> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> struct mcs_total_stat mystat;
> @@ -4405,16 +4402,18 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
> for (i = 0; i < NR_MCS_STAT; i++) {
> if (i == MCS_SWAP && !do_swap_account)
> continue;
> - cb->fill(cb, memcg_stat_strings[i].local_name, mystat.stat[i]);
> + seq_printf(m, "%s %llu\n", memcg_stat_strings[i],
> + (unsigned long long)mystat.stat[i]);
> }
>
> /* Hierarchical information */
> {
> unsigned long long limit, memsw_limit;
> memcg_get_hierarchical_limit(memcg, &limit, &memsw_limit);
> - cb->fill(cb, "hierarchical_memory_limit", limit);
> + seq_printf(m, "hierarchical_memory_limit %llu\n", limit);
> if (do_swap_account)
> - cb->fill(cb, "hierarchical_memsw_limit", memsw_limit);
> + seq_printf(m, "hierarchical_memsw_limit %llu\n",
> + memsw_limit);
> }
>
> memset(&mystat, 0, sizeof(mystat));
> @@ -4422,7 +4421,8 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
> for (i = 0; i < NR_MCS_STAT; i++) {
> if (i == MCS_SWAP && !do_swap_account)
> continue;
> - cb->fill(cb, memcg_stat_strings[i].total_name, mystat.stat[i]);
> + seq_printf(m, "total_%s %llu\n", memcg_stat_strings[i],
> + (unsigned long long)mystat.stat[i]);
> }
>
> #ifdef CONFIG_DEBUG_VM
> @@ -4443,10 +4443,10 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
> recent_scanned[0] += rstat->recent_scanned[0];
> recent_scanned[1] += rstat->recent_scanned[1];
> }
> - cb->fill(cb, "recent_rotated_anon", recent_rotated[0]);
> - cb->fill(cb, "recent_rotated_file", recent_rotated[1]);
> - cb->fill(cb, "recent_scanned_anon", recent_scanned[0]);
> - cb->fill(cb, "recent_scanned_file", recent_scanned[1]);
> + seq_printf(m, "recent_rotated_anon %lu\n", recent_rotated[0]);
> + seq_printf(m, "recent_rotated_file %lu\n", recent_rotated[1]);
> + seq_printf(m, "recent_scanned_anon %lu\n", recent_scanned[0]);
> + seq_printf(m, "recent_scanned_file %lu\n", recent_scanned[1]);
> }
> #endif
>
> @@ -4880,7 +4880,7 @@ static struct cftype mem_cgroup_files[] = {
> },
> {
> .name = "stat",
> - .read_map = mem_control_stat_show,
> + .read_seq_string = mem_control_stat_show,
> },
> {
> .name = "force_empty",
> --
> 1.7.10.1
>

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2012-05-15 14:58:09

by Michal Hocko

[permalink] [raw]
Subject: Re: [patch 4/6] mm: memcg: keep ratelimit counter separate from event counters

On Mon 14-05-12 20:00:49, Johannes Weiner wrote:
> All events except the ratelimit counter are statistics exported to
> userspace. Keep this internal value out of the event count array.

OK, makes sense. I was just thinking that events_internal array (with a
single MEM_CGROUP_EVENTS_COUNT) would be more consistent. Probably too
much churn for a single event though.

>
> Signed-off-by: Johannes Weiner <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> mm/memcontrol.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9e8551c..546e7db 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -105,7 +105,6 @@ enum mem_cgroup_stat_index {
> enum mem_cgroup_events_index {
> MEM_CGROUP_EVENTS_PGPGIN, /* # of pages paged in */
> MEM_CGROUP_EVENTS_PGPGOUT, /* # of pages paged out */
> - MEM_CGROUP_EVENTS_COUNT, /* # of pages paged in/out */
> MEM_CGROUP_EVENTS_PGFAULT, /* # of page-faults */
> MEM_CGROUP_EVENTS_PGMAJFAULT, /* # of major page-faults */
> MEM_CGROUP_EVENTS_NSTATS,
> @@ -129,6 +128,7 @@ enum mem_cgroup_events_target {
> struct mem_cgroup_stat_cpu {
> long count[MEM_CGROUP_STAT_NSTATS];
> unsigned long events[MEM_CGROUP_EVENTS_NSTATS];
> + unsigned long nr_page_events;
> unsigned long targets[MEM_CGROUP_NTARGETS];
> };
>
> @@ -736,7 +736,7 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
> nr_pages = -nr_pages; /* for event */
> }
>
> - __this_cpu_add(memcg->stat->events[MEM_CGROUP_EVENTS_COUNT], nr_pages);
> + __this_cpu_add(memcg->stat->nr_page_events, nr_pages);
>
> preempt_enable();
> }
> @@ -797,7 +797,7 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
> {
> unsigned long val, next;
>
> - val = __this_cpu_read(memcg->stat->events[MEM_CGROUP_EVENTS_COUNT]);
> + val = __this_cpu_read(memcg->stat->nr_page_events);
> next = __this_cpu_read(memcg->stat->targets[target]);
> /* from time_after() in jiffies.h */
> if ((long)next - (long)val < 0) {
> --
> 1.7.10.1
>

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2012-05-15 15:04:56

by Michal Hocko

[permalink] [raw]
Subject: Re: [patch 5/6] mm: memcg: group swapped-out statistics counter logically

On Mon 14-05-12 20:00:50, Johannes Weiner wrote:
> The counter of currently swapped out pages in a memcg (hierarchy) is
> sitting amidst ever-increasing event counters. Move this item to the
> other counters that reflect current state rather than history.
>
> This technically breaks the kernel ABI, but hopefully nobody relies on
> the order of items in memory.stat.

We did that already in 456f998e. Nobody complained AFAIR.
>
> Signed-off-by: Johannes Weiner <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> mm/memcontrol.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 546e7db..3ee63f6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4257,9 +4257,9 @@ enum {
> MCS_RSS,
> MCS_FILE_MAPPED,
> MCS_MLOCK,
> + MCS_SWAP,
> MCS_PGPGIN,
> MCS_PGPGOUT,
> - MCS_SWAP,
> MCS_PGFAULT,
> MCS_PGMAJFAULT,
> MCS_INACTIVE_ANON,
> @@ -4279,9 +4279,9 @@ static const char *memcg_stat_strings[NR_MCS_STAT] = {
> "rss",
> "mapped_file",
> "mlock",
> + "swap",
> "pgpgin",
> "pgpgout",
> - "swap",
> "pgfault",
> "pgmajfault",
> "inactive_anon",
> @@ -4306,14 +4306,14 @@ mem_cgroup_get_local_stat(struct mem_cgroup *memcg, struct mcs_total_stat *s)
> s->stat[MCS_FILE_MAPPED] += val * PAGE_SIZE;
> val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_MLOCK);
> s->stat[MCS_MLOCK] += val * PAGE_SIZE;
> - val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGPGIN);
> - s->stat[MCS_PGPGIN] += val;
> - val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGPGOUT);
> - s->stat[MCS_PGPGOUT] += val;
> if (do_swap_account) {
> val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_SWAPOUT);
> s->stat[MCS_SWAP] += val * PAGE_SIZE;
> }
> + val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGPGIN);
> + s->stat[MCS_PGPGIN] += val;
> + val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGPGOUT);
> + s->stat[MCS_PGPGOUT] += val;
> val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGFAULT);
> s->stat[MCS_PGFAULT] += val;
> val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGMAJFAULT);
> --
> 1.7.10.1
>

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2012-05-15 15:27:09

by Michal Hocko

[permalink] [raw]
Subject: Re: [patch 6/6] mm: memcg: print statistics from live counters

On Mon 14-05-12 20:00:51, Johannes Weiner wrote:
> Directly print statistics and event counters instead of going through
> an intermediate accumulation stage into a separate array, which used
> to require defining statistic items in more than one place.
>
> Signed-off-by: Johannes Weiner <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> mm/memcontrol.c | 173 +++++++++++++++++++++----------------------------------
> 1 file changed, 66 insertions(+), 107 deletions(-)

Nice

>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3ee63f6..b0d343a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -102,6 +102,14 @@ enum mem_cgroup_stat_index {
> MEM_CGROUP_STAT_NSTATS,
> };
>
> +static const char *mem_cgroup_stat_names[] = {
> + "cache",
> + "rss",
> + "mapped_file",
> + "mlock",
> + "swap",
> +};
> +
> enum mem_cgroup_events_index {
> MEM_CGROUP_EVENTS_PGPGIN, /* # of pages paged in */
> MEM_CGROUP_EVENTS_PGPGOUT, /* # of pages paged out */
> @@ -109,6 +117,14 @@ enum mem_cgroup_events_index {
> MEM_CGROUP_EVENTS_PGMAJFAULT, /* # of major page-faults */
> MEM_CGROUP_EVENTS_NSTATS,
> };
> +
> +static const char *mem_cgroup_events_names[] = {
> + "pgpgin",
> + "pgpgout",
> + "pgfault",
> + "pgmajfault",
> +};
> +
> /*
> * Per memcg event counter is incremented at every pagein/pageout. With THP,
> * it will be incremated by the number of pages. This counter is used for
> @@ -4250,97 +4266,6 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
> }
> #endif
>
> -
> -/* For read statistics */
> -enum {
> - MCS_CACHE,
> - MCS_RSS,
> - MCS_FILE_MAPPED,
> - MCS_MLOCK,
> - MCS_SWAP,
> - MCS_PGPGIN,
> - MCS_PGPGOUT,
> - MCS_PGFAULT,
> - MCS_PGMAJFAULT,
> - MCS_INACTIVE_ANON,
> - MCS_ACTIVE_ANON,
> - MCS_INACTIVE_FILE,
> - MCS_ACTIVE_FILE,
> - MCS_UNEVICTABLE,
> - NR_MCS_STAT,
> -};
> -
> -struct mcs_total_stat {
> - s64 stat[NR_MCS_STAT];
> -};
> -
> -static const char *memcg_stat_strings[NR_MCS_STAT] = {
> - "cache",
> - "rss",
> - "mapped_file",
> - "mlock",
> - "swap",
> - "pgpgin",
> - "pgpgout",
> - "pgfault",
> - "pgmajfault",
> - "inactive_anon",
> - "active_anon",
> - "inactive_file",
> - "active_file",
> - "unevictable",
> -};
> -
> -
> -static void
> -mem_cgroup_get_local_stat(struct mem_cgroup *memcg, struct mcs_total_stat *s)
> -{
> - s64 val;
> -
> - /* per cpu stat */
> - val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_CACHE);
> - s->stat[MCS_CACHE] += val * PAGE_SIZE;
> - val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_RSS);
> - s->stat[MCS_RSS] += val * PAGE_SIZE;
> - val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED);
> - s->stat[MCS_FILE_MAPPED] += val * PAGE_SIZE;
> - val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_MLOCK);
> - s->stat[MCS_MLOCK] += val * PAGE_SIZE;
> - if (do_swap_account) {
> - val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_SWAPOUT);
> - s->stat[MCS_SWAP] += val * PAGE_SIZE;
> - }
> - val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGPGIN);
> - s->stat[MCS_PGPGIN] += val;
> - val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGPGOUT);
> - s->stat[MCS_PGPGOUT] += val;
> - val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGFAULT);
> - s->stat[MCS_PGFAULT] += val;
> - val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGMAJFAULT);
> - s->stat[MCS_PGMAJFAULT] += val;
> -
> - /* per zone stat */
> - val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_INACTIVE_ANON));
> - s->stat[MCS_INACTIVE_ANON] += val * PAGE_SIZE;
> - val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_ACTIVE_ANON));
> - s->stat[MCS_ACTIVE_ANON] += val * PAGE_SIZE;
> - val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_INACTIVE_FILE));
> - s->stat[MCS_INACTIVE_FILE] += val * PAGE_SIZE;
> - val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_ACTIVE_FILE));
> - s->stat[MCS_ACTIVE_FILE] += val * PAGE_SIZE;
> - val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_UNEVICTABLE));
> - s->stat[MCS_UNEVICTABLE] += val * PAGE_SIZE;
> -}
> -
> -static void
> -mem_cgroup_get_total_stat(struct mem_cgroup *memcg, struct mcs_total_stat *s)
> -{
> - struct mem_cgroup *iter;
> -
> - for_each_mem_cgroup_tree(iter, memcg)
> - mem_cgroup_get_local_stat(iter, s);
> -}
> -
> #ifdef CONFIG_NUMA
> static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
> struct seq_file *m)
> @@ -4388,24 +4313,40 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
> }
> #endif /* CONFIG_NUMA */
>
> +static const char *mem_cgroup_lru_names[] = {
> + "inactive_anon",
> + "active_anon",
> + "inactive_file",
> + "active_file",
> + "unevictable",
> +};
> +static inline void mem_cgroup_lru_names_not_uptodate(void)
> +{
> + BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
> +}
> +
> static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
> struct seq_file *m)
> {
> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> - struct mcs_total_stat mystat;
> - int i;
> -
> - memset(&mystat, 0, sizeof(mystat));
> - mem_cgroup_get_local_stat(memcg, &mystat);
> + struct mem_cgroup *mi;
> + unsigned int i;
>
> -
> - for (i = 0; i < NR_MCS_STAT; i++) {
> - if (i == MCS_SWAP && !do_swap_account)
> + for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> + if (i == MEM_CGROUP_STAT_SWAPOUT && !do_swap_account)
> continue;
> - seq_printf(m, "%s %llu\n", memcg_stat_strings[i],
> - (unsigned long long)mystat.stat[i]);
> + seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
> + mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
> }
>
> + for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
> + seq_printf(m, "%s %lu\n", mem_cgroup_events_names[i],
> + mem_cgroup_read_events(memcg, i));
> +
> + for (i = 0; i < NR_LRU_LISTS; i++)
> + seq_printf(m, "%s %lu\n", mem_cgroup_lru_names[i],
> + mem_cgroup_nr_lru_pages(memcg, BIT(i)) * PAGE_SIZE);
> +
> /* Hierarchical information */
> {
> unsigned long long limit, memsw_limit;
> @@ -4416,13 +4357,31 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
> memsw_limit);
> }
>
> - memset(&mystat, 0, sizeof(mystat));
> - mem_cgroup_get_total_stat(memcg, &mystat);
> - for (i = 0; i < NR_MCS_STAT; i++) {
> - if (i == MCS_SWAP && !do_swap_account)
> + for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> + long long val = 0;
> +
> + if (i == MEM_CGROUP_STAT_SWAPOUT && !do_swap_account)
> continue;
> - seq_printf(m, "total_%s %llu\n", memcg_stat_strings[i],
> - (unsigned long long)mystat.stat[i]);
> + for_each_mem_cgroup_tree(mi, memcg)
> + val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
> + seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);
> + }
> +
> + for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
> + unsigned long long val = 0;
> +
> + for_each_mem_cgroup_tree(mi, memcg)
> + val += mem_cgroup_read_events(mi, i);
> + seq_printf(m, "total_%s %llu\n",
> + mem_cgroup_events_names[i], val);
> + }
> +
> + for (i = 0; i < NR_LRU_LISTS; i++) {
> + unsigned long long val = 0;
> +
> + for_each_mem_cgroup_tree(mi, memcg)
> + val += mem_cgroup_nr_lru_pages(mi, BIT(i)) * PAGE_SIZE;
> + seq_printf(m, "total_%s %llu\n", mem_cgroup_lru_names[i], val);
> }
>
> #ifdef CONFIG_DEBUG_VM
> --
> 1.7.10.1
>

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2012-05-16 00:03:23

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [patch 0/6] mm: memcg: statistics implementation cleanups

(2012/05/15 20:03), Johannes Weiner wrote:

> On Tue, May 15, 2012 at 09:19:33AM +0900, KAMEZAWA Hiroyuki wrote:
>> (2012/05/15 3:00), Johannes Weiner wrote:
>>
>>> Before piling more things (reclaim stats) on top of the current mess,
>>> I thought it'd be better to clean up a bit.
>>>
>>> The biggest change is printing statistics directly from live counters,
>>> it has always been annoying to declare a new counter in two separate
>>> enums and corresponding name string arrays. After this series we are
>>> down to one of each.
>>>
>>> mm/memcontrol.c | 223 +++++++++++++++++------------------------------
>>> 1 file changed, 82 insertions(+), 141 deletions(-)
>>
>> to all 1-6. Thank you.
>>
>> Acked-by: KAMEZAWA Hiroyuki <[email protected]>
>
> Thanks!
>
>> One excuse for my old implementation of mem_cgroup_get_total_stat(),
>> which is fixed in patch 6, is that I thought it's better to touch all counters
>> in a cachineline at once and avoiding long distance for-each loop.
>>
>> What number of performance difference with some big hierarchy(100+children) tree ?
>> (But I agree your code is cleaner. I'm just curious.)
>
> I set up a parental group with hierarchy enabled, then created 512
> children and did a 4-job kernel bench in one of them. Every 0.1
> seconds, I read the stats of the parent, which requires reading each
> stat/event/lru item from 512 groups before moving to the next one:
>
> 512stats-vanilla 512stats-patched
> Walltime (s) 62.61 ( +0.00%) 62.88 ( +0.43%)
> Walltime (stddev) 0.17 ( +0.00%) 0.14 ( -3.17%)
>
> That should be acceptable, I think.
>
>


Yes, thank you.
Thanks,
-Kame

2012-05-16 23:01:34

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 6/6] mm: memcg: print statistics from live counters

On Mon, 14 May 2012 20:00:51 +0200
Johannes Weiner <[email protected]> wrote:

> Directly print statistics and event counters instead of going through
> an intermediate accumulation stage into a separate array, which used
> to require defining statistic items in more than one place.
>
> ...
>
> -static const char *memcg_stat_strings[NR_MCS_STAT] = {
> - "cache",
> - "rss",
> - "mapped_file",

Bah humbug, who went and called this mapped_file?

This stat is derived from MEM_CGROUP_STAT_FILE_MAPPED. But if we
rename MEM_CGROUP_STAT_FILE_MAPPED to MEM_CGROUP_STAT_MAPPED_FILE then
we also need to rename the non-memcg NR_FILE_MAPPED. And we can't
change the text to "file_mapped" because it's ABI.

> - "mlock",
> - "swap",

And "swap" is derived from MEM_CGROUP_STAT_SWAPOUT. We could rename
that to MEM_CGROUP_STAT_SWAP without trouble.

But both are poor names. There are two concepts here: a) swapout
events (ie: swap writeout initiation) and b) swapspace usage. Type a)
only ever counts up, whereas type b) counts up and down.

MEM_CGROUP_STAT_SWAPOUT is actually of type b), but "swapout" is a
misleading term, because it refers to type a) events.

And the human-displayed "swap" is useless because it can refer to
either type a) or type b) events. These should be called "swapped" and
MEM_CGROUP_STAT_SWAPPED. But we can't change the userspace interface.

argh, I hate you all!

2012-05-17 00:03:36

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [patch 6/6] mm: memcg: print statistics from live counters

(2012/05/17 8:01), Andrew Morton wrote:

> On Mon, 14 May 2012 20:00:51 +0200
> Johannes Weiner <[email protected]> wrote:
>
>> Directly print statistics and event counters instead of going through
>> an intermediate accumulation stage into a separate array, which used
>> to require defining statistic items in more than one place.
>>
>> ...
>>
>> -static const char *memcg_stat_strings[NR_MCS_STAT] = {
>> - "cache",
>> - "rss",
>> - "mapped_file",
>
> Bah humbug, who went and called this mapped_file?
>
> This stat is derived from MEM_CGROUP_STAT_FILE_MAPPED. But if we
> rename MEM_CGROUP_STAT_FILE_MAPPED to MEM_CGROUP_STAT_MAPPED_FILE then
> we also need to rename the non-memcg NR_FILE_MAPPED. And we can't
> change the text to "file_mapped" because it's ABI.
>


Sorry..

>> - "mlock",
>> - "swap",
>
> And "swap" is derived from MEM_CGROUP_STAT_SWAPOUT. We could rename
> that to MEM_CGROUP_STAT_SWAP without trouble.
>

Yes.

> But both are poor names. There are two concepts here: a) swapout
> events (ie: swap writeout initiation) and b) swapspace usage. Type a)
> only ever counts up, whereas type b) counts up and down.
>
> MEM_CGROUP_STAT_SWAPOUT is actually of type b), but "swapout" is a
> misleading term, because it refers to type a) events.
>

I'll prepare a patch.

> And the human-displayed "swap" is useless because it can refer to
> either type a) or type b) events. These should be called "swapped" and
> MEM_CGROUP_STAT_SWAPPED. But we can't change the userspace interface.
>
> argh, I hate you all!
>

Hm...sorry. I(fujitsu) am now considering to add meminfo for memcg...,

add an option to override /proc/meminfo if a task is in container or
meminfo file somewhere.
(Now, we cannot trust /usr/bin/free, /usr/bin/top etc...in a container.)

so...I think usual user experience will be better because of the same format
with meminfo.

Thanks,
-Kame

2012-05-17 10:58:08

by Glauber Costa

[permalink] [raw]
Subject: Re: [patch 6/6] mm: memcg: print statistics from live counters

On 05/17/2012 04:01 AM, KAMEZAWA Hiroyuki wrote:
> Hm...sorry. I(fujitsu) am now considering to add meminfo for memcg...,
>
> add an option to override /proc/meminfo if a task is in container or
> meminfo file somewhere.
> (Now, we cannot trust /usr/bin/free, /usr/bin/top etc...in a container.)

Yes, and all the previous times I tried to touch those, I think the
general agreement was to come up with some kind of fuse overlay that
would read information available from the kernel, and present it
correctly formatted through bind-mounts on the files of interest.

But that's mainly because we never reached agreement on how to make that
appear automatically from such tasks