2012-06-15 12:00:25

by Sha Zhengju

Subject: [PATCH 1/2] memcg: remove MEMCG_NR_FILE_MAPPED

While doing memcg page stat accounting, there is no need to use MEMCG_NR_FILE_MAPPED
as an intermediate; we can use MEM_CGROUP_STAT_FILE_MAPPED directly.

Signed-off-by: Sha Zhengju <[email protected]>
---
include/linux/memcontrol.h | 22 ++++++++++++++++------
mm/memcontrol.c | 25 +------------------------
mm/rmap.c | 4 ++--
3 files changed, 19 insertions(+), 32 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f94efd2..a337c2e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -27,9 +27,19 @@ struct page_cgroup;
struct page;
struct mm_struct;

-/* Stats that can be updated by kernel. */
-enum mem_cgroup_page_stat_item {
- MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
+/*
+ * Statistics for memory cgroup.
+ */
+enum mem_cgroup_stat_index {
+ /*
+ * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
+ */
+ MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */
+ MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */
+ MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
+ MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
+ MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
+ MEM_CGROUP_STAT_NSTATS,
};

struct mem_cgroup_reclaim_cookie {
@@ -170,17 +180,17 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
}

void mem_cgroup_update_page_stat(struct page *page,
- enum mem_cgroup_page_stat_item idx,
+ enum mem_cgroup_stat_index idx,
int val);

static inline void mem_cgroup_inc_page_stat(struct page *page,
- enum mem_cgroup_page_stat_item idx)
+ enum mem_cgroup_stat_index idx)
{
mem_cgroup_update_page_stat(page, idx, 1);
}

static inline void mem_cgroup_dec_page_stat(struct page *page,
- enum mem_cgroup_page_stat_item idx)
+ enum mem_cgroup_stat_index idx)
{
mem_cgroup_update_page_stat(page, idx, -1);
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7685d4a..9102b8c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -77,21 +77,6 @@ static int really_do_swap_account __initdata = 0;
#endif


-/*
- * Statistics for memory cgroup.
- */
-enum mem_cgroup_stat_index {
- /*
- * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
- */
- MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */
- MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */
- MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
- MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
- MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
- MEM_CGROUP_STAT_NSTATS,
-};
-
enum mem_cgroup_events_index {
MEM_CGROUP_EVENTS_PGPGIN, /* # of pages paged in */
MEM_CGROUP_EVENTS_PGPGOUT, /* # of pages paged out */
@@ -1958,7 +1943,7 @@ void __mem_cgroup_end_update_page_stat(struct page *page, unsigned long *flags)
}

void mem_cgroup_update_page_stat(struct page *page,
- enum mem_cgroup_page_stat_item idx, int val)
+ enum mem_cgroup_stat_index idx, int val)
{
struct mem_cgroup *memcg;
struct page_cgroup *pc = lookup_page_cgroup(page);
@@ -1971,14 +1956,6 @@ void mem_cgroup_update_page_stat(struct page *page,
if (unlikely(!memcg || !PageCgroupUsed(pc)))
return;

- switch (idx) {
- case MEMCG_NR_FILE_MAPPED:
- idx = MEM_CGROUP_STAT_FILE_MAPPED;
- break;
- default:
- BUG();
- }
-
this_cpu_add(memcg->stat->count[idx], val);
}

diff --git a/mm/rmap.c b/mm/rmap.c
index 5b5ad58..7e4e481 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1154,7 +1154,7 @@ void page_add_file_rmap(struct page *page)
mem_cgroup_begin_update_page_stat(page, &locked, &flags);
if (atomic_inc_and_test(&page->_mapcount)) {
__inc_zone_page_state(page, NR_FILE_MAPPED);
- mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
+ mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED);
}
mem_cgroup_end_update_page_stat(page, &locked, &flags);
}
@@ -1208,7 +1208,7 @@ void page_remove_rmap(struct page *page)
NR_ANON_TRANSPARENT_HUGEPAGES);
} else {
__dec_zone_page_state(page, NR_FILE_MAPPED);
- mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_MAPPED);
+ mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED);
}
/*
* It would be tidy to reset the PageAnon mapping here,
--
1.7.1
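To make the effect of this patch concrete, here is a minimal userspace model of the update path before and after the translation switch is removed. It is not kernel code: the counter struct `memcg_model` is hypothetical, per-CPU counters are collapsed into one array, and `abort()` plays the role of `BUG()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model only: enum names mirror the kernel, nothing else does. */
enum mem_cgroup_stat_index {
	MEM_CGROUP_STAT_CACHE,
	MEM_CGROUP_STAT_RSS,
	MEM_CGROUP_STAT_FILE_MAPPED,
	MEM_CGROUP_STAT_SWAPOUT,
	MEM_CGROUP_STAT_DATA,
	MEM_CGROUP_STAT_NSTATS,
};

/* The old public item enum that callers outside memcontrol.c used. */
enum mem_cgroup_page_stat_item {
	MEMCG_NR_FILE_MAPPED,
};

/* Hypothetical stand-in for one memcg's counters (no per-CPU split). */
struct memcg_model {
	long count[MEM_CGROUP_STAT_NSTATS];
};

/* Before the patch: translate the public item into an internal index. */
void update_page_stat_old(struct memcg_model *m,
			  enum mem_cgroup_page_stat_item item, int val)
{
	enum mem_cgroup_stat_index idx;

	switch (item) {
	case MEMCG_NR_FILE_MAPPED:
		idx = MEM_CGROUP_STAT_FILE_MAPPED;
		break;
	default:
		abort();	/* plays the role of BUG() */
	}
	m->count[idx] += val;
}

/* After the patch: the caller already passes the internal index. */
void update_page_stat_new(struct memcg_model *m,
			  enum mem_cgroup_stat_index idx, int val)
{
	assert(idx < MEM_CGROUP_STAT_NSTATS);	/* model-only sanity check */
	m->count[idx] += val;
}
```

Both functions end up adjusting `count[MEM_CGROUP_STAT_FILE_MAPPED]`; the new one skips the switch, which also means a caller can now pass any index in the enum.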


2012-06-15 12:02:05

by Sha Zhengju

Subject: [PATCH 2/2] memcg: add per cgroup dirty pages accounting

This patch adds memcg routines to count dirty pages. I noticed that
per-cgroup dirty page limiting has been discussed on the list before
(http://lwn.net/Articles/455341/), but it was not merged. I don't know
its current status, but perhaps we can add per-cgroup dirty page
accounting first. This allows the memory controller to maintain an
accurate view of how much of its memory is dirty, and can provide useful
information while the group's direct reclaim is running.

After commit 89c06bd5 (memcg: use new logic for page stat accounting),
we no longer need a per-page_cgroup flag and can use the struct page
flag (PageDirty) directly.


Signed-off-by: Sha Zhengju <[email protected]>
---
include/linux/memcontrol.h | 1 +
mm/filemap.c | 1 +
mm/memcontrol.c | 32 +++++++++++++++++++++++++-------
mm/page-writeback.c | 2 ++
mm/truncate.c | 1 +
5 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a337c2e..8154ade 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -39,6 +39,7 @@ enum mem_cgroup_stat_index {
MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
+ MEM_CGROUP_STAT_FILE_DIRTY, /* # of dirty pages in page cache */
MEM_CGROUP_STAT_NSTATS,
};

diff --git a/mm/filemap.c b/mm/filemap.c
index 79c4b2b..5b5c121 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -141,6 +141,7 @@ void __delete_from_page_cache(struct page *page)
* having removed the page entirely.
*/
if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
+ mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
dec_zone_page_state(page, NR_FILE_DIRTY);
dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9102b8c..d200ad1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2548,6 +2548,18 @@ void mem_cgroup_split_huge_fixup(struct page *head)
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

+static inline
+void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
+ struct mem_cgroup *to,
+ enum mem_cgroup_stat_index idx)
+{
+ /* Update stat data for mem_cgroup */
+ preempt_disable();
+ __this_cpu_dec(from->stat->count[idx]);
+ __this_cpu_inc(to->stat->count[idx]);
+ preempt_enable();
+}
+
/**
* mem_cgroup_move_account - move account of the page
* @page: the page
@@ -2597,13 +2609,14 @@ static int mem_cgroup_move_account(struct page *page,

move_lock_mem_cgroup(from, &flags);

- if (!anon && page_mapped(page)) {
- /* Update mapped_file data for mem_cgroup */
- preempt_disable();
- __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
- __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
- preempt_enable();
- }
+ if (!anon && page_mapped(page))
+ mem_cgroup_move_account_page_stat(from, to,
+ MEM_CGROUP_STAT_FILE_MAPPED);
+
+ if (PageDirty(page))
+ mem_cgroup_move_account_page_stat(from, to,
+ MEM_CGROUP_STAT_FILE_DIRTY);
+
mem_cgroup_charge_statistics(from, anon, -nr_pages);
if (uncharge)
/* This is not "cancel", but cancel_charge does all we need. */
@@ -4023,6 +4036,7 @@ enum {
MCS_SWAP,
MCS_PGFAULT,
MCS_PGMAJFAULT,
+ MCS_FILE_DIRTY,
MCS_INACTIVE_ANON,
MCS_ACTIVE_ANON,
MCS_INACTIVE_FILE,
@@ -4047,6 +4061,7 @@ struct {
{"swap", "total_swap"},
{"pgfault", "total_pgfault"},
{"pgmajfault", "total_pgmajfault"},
+ {"dirty", "total_dirty"},
{"inactive_anon", "total_inactive_anon"},
{"active_anon", "total_active_anon"},
{"inactive_file", "total_inactive_file"},
@@ -4080,6 +4095,9 @@ mem_cgroup_get_local_stat(struct mem_cgroup *memcg, struct mcs_total_stat *s)
val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGMAJFAULT);
s->stat[MCS_PGMAJFAULT] += val;

+ val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_FILE_DIRTY);
+ s->stat[MCS_FILE_DIRTY] += val * PAGE_SIZE;
+
/* per zone stat */
val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_INACTIVE_ANON));
s->stat[MCS_INACTIVE_ANON] += val * PAGE_SIZE;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 26adea8..b17c692 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1936,6 +1936,7 @@ int __set_page_dirty_no_writeback(struct page *page)
void account_page_dirtied(struct page *page, struct address_space *mapping)
{
if (mapping_cap_account_dirty(mapping)) {
+ mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
__inc_zone_page_state(page, NR_FILE_DIRTY);
__inc_zone_page_state(page, NR_DIRTIED);
__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
@@ -2155,6 +2156,7 @@ int clear_page_dirty_for_io(struct page *page)
* for more comments.
*/
if (TestClearPageDirty(page)) {
+ mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
dec_zone_page_state(page, NR_FILE_DIRTY);
dec_bdi_stat(mapping->backing_dev_info,
BDI_RECLAIMABLE);
diff --git a/mm/truncate.c b/mm/truncate.c
index 61a183b..fe8363e 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -76,6 +76,7 @@ void cancel_dirty_page(struct page *page, unsigned int account_size)
if (TestClearPageDirty(page)) {
struct address_space *mapping = page->mapping;
if (mapping && mapping_cap_account_dirty(mapping)) {
+ mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
dec_zone_page_state(page, NR_FILE_DIRTY);
dec_bdi_stat(mapping->backing_dev_info,
BDI_RECLAIMABLE);
--
1.7.1
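As a rough illustration of the invariant this patch relies on, here is a single-threaded userspace model: every clean-to-dirty transition adds exactly one count, every path that clears the bit (or drops a still-dirty page from the cache) removes exactly one, and move_account carries the count to the new owner. The `page_model`/`memcg_model` types and `model_*` functions are hypothetical; the model ignores mapping_cap_account_dirty(), per-CPU counters and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; the index name mirrors the patch. */
enum { MEM_CGROUP_STAT_FILE_DIRTY, MEM_CGROUP_STAT_NSTATS };

struct memcg_model {
	long count[MEM_CGROUP_STAT_NSTATS];
};

/* Hypothetical page: a dirty bit (PG_dirty) and the owning memcg. */
struct page_model {
	bool dirty;
	struct memcg_model *memcg;
};

/* Like account_page_dirtied(): count only the clean -> dirty transition. */
void model_set_dirty(struct page_model *p)
{
	if (!p->dirty) {
		p->dirty = true;
		p->memcg->count[MEM_CGROUP_STAT_FILE_DIRTY]++;
	}
}

/* Like the TestClearPageDirty() paths in clear_page_dirty_for_io() and
 * cancel_dirty_page(): decrement only if the bit was actually set. */
bool model_test_clear_dirty(struct page_model *p)
{
	if (!p->dirty)
		return false;
	p->dirty = false;
	p->memcg->count[MEM_CGROUP_STAT_FILE_DIRTY]--;
	return true;
}

/* Like __delete_from_page_cache(): a page leaving the cache while still
 * dirty must give back its count. */
void model_delete_from_cache(struct page_model *p)
{
	if (p->dirty)
		p->memcg->count[MEM_CGROUP_STAT_FILE_DIRTY]--;
	p->memcg = NULL;
}

/* Like the new hunk in mem_cgroup_move_account(): a dirty page carries
 * its count from the old memcg to the new one. */
void model_move_account(struct page_model *p, struct memcg_model *to)
{
	assert(p->memcg != NULL && to != NULL);
	if (p->dirty) {
		p->memcg->count[MEM_CGROUP_STAT_FILE_DIRTY]--;
		to->count[MEM_CGROUP_STAT_FILE_DIRTY]++;
	}
	p->memcg = to;
}
```

If any one of these hooks were missing, the per-memcg counter would drift away from the number of pages that actually have PG_dirty set.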

2012-06-15 15:18:21

by Greg Thelen

Subject: Re: [PATCH 1/2] memcg: remove MEMCG_NR_FILE_MAPPED

On Fri, Jun 15 2012, Sha Zhengju wrote:

> While doing memcg page stat accounting, there is no need to use MEMCG_NR_FILE_MAPPED
> as an intermediate; we can use MEM_CGROUP_STAT_FILE_MAPPED directly.
>
> Signed-off-by: Sha Zhengju <[email protected]>
> ---
> include/linux/memcontrol.h | 22 ++++++++++++++++------
> mm/memcontrol.c | 25 +------------------------
> mm/rmap.c | 4 ++--
> 3 files changed, 19 insertions(+), 32 deletions(-)

I assume this patch is relative to v3.4.

> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index f94efd2..a337c2e 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -27,9 +27,19 @@ struct page_cgroup;
> struct page;
> struct mm_struct;
>
> -/* Stats that can be updated by kernel. */
> -enum mem_cgroup_page_stat_item {
> - MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
> +/*
> + * Statistics for memory cgroup.
> + */
> +enum mem_cgroup_stat_index {
> + /*
> + * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
> + */
> + MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */
> + MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */
> + MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
> + MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> + MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
> + MEM_CGROUP_STAT_NSTATS,
> };

This has the unfortunate side effect of letting code outside of memcontrol.c
manipulate internally managed memcg statistics
(e.g. MEM_CGROUP_STAT_CACHE) with mem_cgroup_{dec,inc}_page_stat. Still, I
think your change is fine: the complexity and presumed performance
overhead of the extra layer of indirection were not worth it.

> struct mem_cgroup_reclaim_cookie {
> @@ -170,17 +180,17 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
> }
>
> void mem_cgroup_update_page_stat(struct page *page,
> - enum mem_cgroup_page_stat_item idx,
> + enum mem_cgroup_stat_index idx,
> int val);
>
> static inline void mem_cgroup_inc_page_stat(struct page *page,
> - enum mem_cgroup_page_stat_item idx)
> + enum mem_cgroup_stat_index idx)
> {
> mem_cgroup_update_page_stat(page, idx, 1);
> }
>
> static inline void mem_cgroup_dec_page_stat(struct page *page,
> - enum mem_cgroup_page_stat_item idx)
> + enum mem_cgroup_stat_index idx)
> {
> mem_cgroup_update_page_stat(page, idx, -1);
> }

You missed two more uses of enum mem_cgroup_page_stat_item in
memcontrol.h.

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a337c2e..08475b9 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -390,12 +390,12 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
}

static inline void mem_cgroup_inc_page_stat(struct page *page,
- enum mem_cgroup_page_stat_item idx)
+ enum mem_cgroup_stat_index idx)
{
}

static inline void mem_cgroup_dec_page_stat(struct page *page,
- enum mem_cgroup_page_stat_item idx)
+ enum mem_cgroup_stat_index idx)
{
}
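The reason the stubs matter is that memcontrol.h provides two sets of definitions, one for kernels built with memcg and one for kernels built without it, and both must name the same enum type. A minimal userspace model of that pattern, assuming a hypothetical MODEL_MEMCG macro in place of the real Kconfig option (build with -DMODEL_MEMCG for the "enabled" variant); `model_read_stat` is also hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the two memcontrol.h variants. MODEL_MEMCG is a
 * hypothetical stand-in for the memcg Kconfig option. */
enum mem_cgroup_stat_index {
	MEM_CGROUP_STAT_FILE_MAPPED,
	MEM_CGROUP_STAT_NSTATS,
};

struct page;	/* opaque, as in the kernel header */

#ifdef MODEL_MEMCG
static const int model_memcg_enabled = 1;
static long model_count[MEM_CGROUP_STAT_NSTATS];

static inline void mem_cgroup_inc_page_stat(struct page *page,
					    enum mem_cgroup_stat_index idx)
{
	(void)page;
	assert(idx < MEM_CGROUP_STAT_NSTATS);
	model_count[idx]++;
}

static inline long model_read_stat(enum mem_cgroup_stat_index idx)
{
	return model_count[idx];
}
#else
static const int model_memcg_enabled = 0;

/* The stub names the same enum as the real definition. A stub still
 * naming a removed enum would be expected to break (or at least warn
 * in) this configuration only, which is easy to miss. */
static inline void mem_cgroup_inc_page_stat(struct page *page,
					    enum mem_cgroup_stat_index idx)
{
	(void)page;
	(void)idx;
}

static inline long model_read_stat(enum mem_cgroup_stat_index idx)
{
	(void)idx;
	return 0;
}
#endif
```

Callers compile the same way in both configurations; only the enabled variant actually counts.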

2012-06-15 15:32:49

by Greg Thelen

Subject: Re: [PATCH 2/2] memcg: add per cgroup dirty pages accounting

On Fri, Jun 15 2012, Sha Zhengju wrote:

> This patch adds memcg routines to count dirty pages. I noticed that
> per-cgroup dirty page limiting has been discussed on the list before
> (http://lwn.net/Articles/455341/), but it was not merged.

Good timing, I was just about to make another effort to get some of
these patches upstream. Like you, I was going to start with some basic
counters.

Your approach is similar to what I have in mind. While it is good to
use the existing PageDirty flag rather than introducing a new
page_cgroup flag, there are locking complications (see below) in handling
races between pages being moved to another memcg and pages being
{un}marked dirty.

> I don't know its current status, but perhaps we can add per-cgroup dirty page
> accounting first. This allows the memory controller to maintain an
> accurate view of how much of its memory is dirty, and can provide useful
> information while the group's direct reclaim is running.
>
> After commit 89c06bd5 (memcg: use new logic for page stat accounting),
> we no longer need a per-page_cgroup flag and can use the struct page
> flag (PageDirty) directly.
>
>
> Signed-off-by: Sha Zhengju <[email protected]>
> ---
> include/linux/memcontrol.h | 1 +
> mm/filemap.c | 1 +
> mm/memcontrol.c | 32 +++++++++++++++++++++++++-------
> mm/page-writeback.c | 2 ++
> mm/truncate.c | 1 +
> 5 files changed, 30 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index a337c2e..8154ade 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -39,6 +39,7 @@ enum mem_cgroup_stat_index {
> MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
> MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
> + MEM_CGROUP_STAT_FILE_DIRTY, /* # of dirty pages in page cache */
> MEM_CGROUP_STAT_NSTATS,
> };
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 79c4b2b..5b5c121 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -141,6 +141,7 @@ void __delete_from_page_cache(struct page *page)
> * having removed the page entirely.
> */
> if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
> + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);

You need to use mem_cgroup_{begin,end}_update_page_stat around critical
sections that:
1) check PageDirty
2) update MEM_CGROUP_STAT_FILE_DIRTY counter

This prevents the page from being moved to another memcg while it is being
accounted. The same comment applies to all of your new calls to
mem_cgroup_{dec,inc}_page_stat. For the usage pattern, see
page_add_file_rmap.
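A minimal userspace model of the race this comment is about. Without a shared critical section, one bad interleaving is: the dirty bit is set, the page is moved (move_account sees PageDirty and transfers a count that was never added), and only then is the counter bumped, now on the new owner; the old group ends at -1 and the new one at 2. In the sketch below, a pthread mutex embedded in a hypothetical `page_model` stands in for the memcg move lock, and `begin_update`/`end_update` stand in for mem_cgroup_{begin,end}_update_page_stat(). The real helpers are designed to be cheaper in the common case; this model simply always locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace model only. */
enum { MEM_CGROUP_STAT_FILE_DIRTY, MEM_CGROUP_STAT_NSTATS };

struct memcg_model {
	long count[MEM_CGROUP_STAT_NSTATS];
};

struct page_model {
	bool dirty;
	struct memcg_model *memcg;
	pthread_mutex_t move_lock;	/* hypothetical stand-in for the move lock */
};

void page_model_init(struct page_model *p, struct memcg_model *m)
{
	p->dirty = false;
	p->memcg = m;
	pthread_mutex_init(&p->move_lock, NULL);
}

/* Stand-ins for mem_cgroup_{begin,end}_update_page_stat(). */
static void begin_update(struct page_model *p)
{
	pthread_mutex_lock(&p->move_lock);
}

static void end_update(struct page_model *p)
{
	pthread_mutex_unlock(&p->move_lock);
}

/* Step 1 (test the flag) and step 2 (update the counter) share one
 * critical section, so p->memcg cannot change between them. */
void set_dirty_accounted(struct page_model *p)
{
	begin_update(p);
	if (!p->dirty) {
		p->dirty = true;
		p->memcg->count[MEM_CGROUP_STAT_FILE_DIRTY]++;
	}
	end_update(p);
}

bool clear_dirty_accounted(struct page_model *p)
{
	bool was_dirty;

	begin_update(p);
	was_dirty = p->dirty;
	if (was_dirty) {
		p->dirty = false;
		p->memcg->count[MEM_CGROUP_STAT_FILE_DIRTY]--;
	}
	end_update(p);
	return was_dirty;
}

/* Stand-in for mem_cgroup_move_account(): it holds the same lock while
 * it reads the flag and transfers the count. */
void move_account(struct page_model *p, struct memcg_model *to)
{
	assert(to != NULL);
	pthread_mutex_lock(&p->move_lock);
	if (p->dirty) {
		p->memcg->count[MEM_CGROUP_STAT_FILE_DIRTY]--;
		to->count[MEM_CGROUP_STAT_FILE_DIRTY]++;
	}
	p->memcg = to;
	pthread_mutex_unlock(&p->move_lock);
}
```

Because the flag check and the counter update are never separated by a move, the counts always match the owner of the page at the moment the bit changed.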

> dec_zone_page_state(page, NR_FILE_DIRTY);
> dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
> }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9102b8c..d200ad1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2548,6 +2548,18 @@ void mem_cgroup_split_huge_fixup(struct page *head)
> }
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> +static inline
> +void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
> + struct mem_cgroup *to,
> + enum mem_cgroup_stat_index idx)
> +{
> + /* Update stat data for mem_cgroup */
> + preempt_disable();
> + __this_cpu_dec(from->stat->count[idx]);
> + __this_cpu_inc(to->stat->count[idx]);
> + preempt_enable();
> +}
> +
> /**
> * mem_cgroup_move_account - move account of the page
> * @page: the page
> @@ -2597,13 +2609,14 @@ static int mem_cgroup_move_account(struct page *page,
>
> move_lock_mem_cgroup(from, &flags);
>
> - if (!anon && page_mapped(page)) {
> - /* Update mapped_file data for mem_cgroup */
> - preempt_disable();
> - __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> - __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> - preempt_enable();
> - }
> + if (!anon && page_mapped(page))
> + mem_cgroup_move_account_page_stat(from, to,
> + MEM_CGROUP_STAT_FILE_MAPPED);
> +
> + if (PageDirty(page))
> + mem_cgroup_move_account_page_stat(from, to,
> + MEM_CGROUP_STAT_FILE_DIRTY);
> +
> mem_cgroup_charge_statistics(from, anon, -nr_pages);
> if (uncharge)
> /* This is not "cancel", but cancel_charge does all we need. */
> @@ -4023,6 +4036,7 @@ enum {
> MCS_SWAP,
> MCS_PGFAULT,
> MCS_PGMAJFAULT,
> + MCS_FILE_DIRTY,
> MCS_INACTIVE_ANON,
> MCS_ACTIVE_ANON,
> MCS_INACTIVE_FILE,
> @@ -4047,6 +4061,7 @@ struct {
> {"swap", "total_swap"},
> {"pgfault", "total_pgfault"},
> {"pgmajfault", "total_pgmajfault"},
> + {"dirty", "total_dirty"},

Please add something to Documentation/cgroups/memory.txt describing this
new user visible data. See my previous patch
http://thread.gmane.org/gmane.linux.kernel.mm/67114 for example text.
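For reference when documenting the new fields: the patch reports "dirty" in bytes, because mem_cgroup_get_local_stat() multiplies the page count by PAGE_SIZE, and adds the matching "total_dirty" entry following the existing total_* naming. A minimal sketch of how a userspace consumer might pick these values out of memory.stat text; `memstat_lookup` is a hypothetical helper, it parses a buffer rather than opening the file (the controller's mount point varies), and the sample values in the usage note are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look up `key` in memory.stat-style text, one "name value" pair per line.
 * Returns the value, or -1 if no line has exactly that name. */
long long memstat_lookup(const char *text, const char *key)
{
	size_t klen;
	const char *line = text;

	assert(text != NULL && key != NULL);
	klen = strlen(key);
	while (line != NULL && *line != '\0') {
		/* Require the name to end at a space, so "dirty" does not
		 * match a longer name that merely starts with it. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
			long long v;

			if (sscanf(line + klen + 1, "%lld", &v) == 1)
				return v;
		}
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return -1;
}
```

With input "cache 4096\ndirty 8192\ntotal_dirty 12288\n", looking up "dirty" yields 8192 bytes, i.e. two dirty pages if PAGE_SIZE is 4096.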

> {"inactive_anon", "total_inactive_anon"},
> {"active_anon", "total_active_anon"},
> {"inactive_file", "total_inactive_file"},
> @@ -4080,6 +4095,9 @@ mem_cgroup_get_local_stat(struct mem_cgroup *memcg, struct mcs_total_stat *s)
> val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGMAJFAULT);
> s->stat[MCS_PGMAJFAULT] += val;
>
> + val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_FILE_DIRTY);
> + s->stat[MCS_FILE_DIRTY] += val * PAGE_SIZE;
> +
> /* per zone stat */
> val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_INACTIVE_ANON));
> s->stat[MCS_INACTIVE_ANON] += val * PAGE_SIZE;
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 26adea8..b17c692 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1936,6 +1936,7 @@ int __set_page_dirty_no_writeback(struct page *page)
> void account_page_dirtied(struct page *page, struct address_space *mapping)
> {
> if (mapping_cap_account_dirty(mapping)) {
> + mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
> __inc_zone_page_state(page, NR_FILE_DIRTY);
> __inc_zone_page_state(page, NR_DIRTIED);
> __inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
> @@ -2155,6 +2156,7 @@ int clear_page_dirty_for_io(struct page *page)
> * for more comments.
> */
> if (TestClearPageDirty(page)) {
> + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
> dec_zone_page_state(page, NR_FILE_DIRTY);
> dec_bdi_stat(mapping->backing_dev_info,
> BDI_RECLAIMABLE);
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 61a183b..fe8363e 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -76,6 +76,7 @@ void cancel_dirty_page(struct page *page, unsigned int account_size)
> if (TestClearPageDirty(page)) {
> struct address_space *mapping = page->mapping;
> if (mapping && mapping_cap_account_dirty(mapping)) {
> + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
> dec_zone_page_state(page, NR_FILE_DIRTY);
> dec_bdi_stat(mapping->backing_dev_info,
> BDI_RECLAIMABLE);

2012-06-16 06:33:58

by Kamezawa Hiroyuki

Subject: Re: [PATCH 1/2] memcg: remove MEMCG_NR_FILE_MAPPED

(2012/06/15 21:00), Sha Zhengju wrote:
> While doing memcg page stat accounting, there is no need to use MEMCG_NR_FILE_MAPPED
> as an intermediate; we can use MEM_CGROUP_STAT_FILE_MAPPED directly.
>
> Signed-off-by: Sha Zhengju <[email protected]>

I'm sorry, but my recent patch modified mem_cgroup_stat_index, so this will
conflict with the mm tree (the change is not visible in linux-next yet).

I have no objection to the patch. I'd be glad if you could update it and repost later.

Thanks,
-Kame


> ---
> include/linux/memcontrol.h | 22 ++++++++++++++++------
> mm/memcontrol.c | 25 +------------------------
> mm/rmap.c | 4 ++--
> 3 files changed, 19 insertions(+), 32 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index f94efd2..a337c2e 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -27,9 +27,19 @@ struct page_cgroup;
> struct page;
> struct mm_struct;
>
> -/* Stats that can be updated by kernel. */
> -enum mem_cgroup_page_stat_item {
> - MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
> +/*
> + * Statistics for memory cgroup.
> + */
> +enum mem_cgroup_stat_index {
> + /*
> + * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
> + */
> + MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */
> + MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */
> + MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
> + MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> + MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
> + MEM_CGROUP_STAT_NSTATS,
> };
>
> struct mem_cgroup_reclaim_cookie {
> @@ -170,17 +180,17 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
> }
>
> void mem_cgroup_update_page_stat(struct page *page,
> - enum mem_cgroup_page_stat_item idx,
> + enum mem_cgroup_stat_index idx,
> int val);
>
> static inline void mem_cgroup_inc_page_stat(struct page *page,
> - enum mem_cgroup_page_stat_item idx)
> + enum mem_cgroup_stat_index idx)
> {
> mem_cgroup_update_page_stat(page, idx, 1);
> }
>
> static inline void mem_cgroup_dec_page_stat(struct page *page,
> - enum mem_cgroup_page_stat_item idx)
> + enum mem_cgroup_stat_index idx)
> {
> mem_cgroup_update_page_stat(page, idx, -1);
> }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7685d4a..9102b8c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -77,21 +77,6 @@ static int really_do_swap_account __initdata = 0;
> #endif
>
>
> -/*
> - * Statistics for memory cgroup.
> - */
> -enum mem_cgroup_stat_index {
> - /*
> - * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
> - */
> - MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */
> - MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */
> - MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
> - MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> - MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
> - MEM_CGROUP_STAT_NSTATS,
> -};
> -
> enum mem_cgroup_events_index {
> MEM_CGROUP_EVENTS_PGPGIN, /* # of pages paged in */
> MEM_CGROUP_EVENTS_PGPGOUT, /* # of pages paged out */
> @@ -1958,7 +1943,7 @@ void __mem_cgroup_end_update_page_stat(struct page *page, unsigned long *flags)
> }
>
> void mem_cgroup_update_page_stat(struct page *page,
> - enum mem_cgroup_page_stat_item idx, int val)
> + enum mem_cgroup_stat_index idx, int val)
> {
> struct mem_cgroup *memcg;
> struct page_cgroup *pc = lookup_page_cgroup(page);
> @@ -1971,14 +1956,6 @@ void mem_cgroup_update_page_stat(struct page *page,
> if (unlikely(!memcg || !PageCgroupUsed(pc)))
> return;
>
> - switch (idx) {
> - case MEMCG_NR_FILE_MAPPED:
> - idx = MEM_CGROUP_STAT_FILE_MAPPED;
> - break;
> - default:
> - BUG();
> - }
> -
> this_cpu_add(memcg->stat->count[idx], val);
> }
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 5b5ad58..7e4e481 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1154,7 +1154,7 @@ void page_add_file_rmap(struct page *page)
> mem_cgroup_begin_update_page_stat(page, &locked, &flags);
> if (atomic_inc_and_test(&page->_mapcount)) {
> __inc_zone_page_state(page, NR_FILE_MAPPED);
> - mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
> + mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED);
> }
> mem_cgroup_end_update_page_stat(page, &locked, &flags);
> }
> @@ -1208,7 +1208,7 @@ void page_remove_rmap(struct page *page)
> NR_ANON_TRANSPARENT_HUGEPAGES);
> } else {
> __dec_zone_page_state(page, NR_FILE_MAPPED);
> - mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_MAPPED);
> + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED);
> }
> /*
> * It would be tidy to reset the PageAnon mapping here,

2012-06-16 06:36:29

by Kamezawa Hiroyuki

Subject: Re: [PATCH 2/2] memcg: add per cgroup dirty pages accounting

(2012/06/16 0:32), Greg Thelen wrote:
> On Fri, Jun 15 2012, Sha Zhengju wrote:
>
>> This patch adds memcg routines to count dirty pages. I noticed that
>> per-cgroup dirty page limiting has been discussed on the list before
>> (http://lwn.net/Articles/455341/), but it was not merged.
>
> Good timing, I was just about to make another effort to get some of
> these patches upstream. Like you, I was going to start with some basic
> counters.
>
> Your approach is similar to what I have in mind. While it is good to
> use the existing PageDirty flag rather than introducing a new
> page_cgroup flag, there are locking complications (see below) in handling
> races between pages being moved to another memcg and pages being
> {un}marked dirty.
>
>> I don't know its current status, but perhaps we can add per-cgroup dirty page
>> accounting first. This allows the memory controller to maintain an
>> accurate view of how much of its memory is dirty, and can provide useful
>> information while the group's direct reclaim is running.
>>
>> After commit 89c06bd5 (memcg: use new logic for page stat accounting),
>> we no longer need a per-page_cgroup flag and can use the struct page
>> flag (PageDirty) directly.
>>
>>
>> Signed-off-by: Sha Zhengju <[email protected]>
>> ---
>> include/linux/memcontrol.h | 1 +
>> mm/filemap.c | 1 +
>> mm/memcontrol.c | 32 +++++++++++++++++++++++++-------
>> mm/page-writeback.c | 2 ++
>> mm/truncate.c | 1 +
>> 5 files changed, 30 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index a337c2e..8154ade 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -39,6 +39,7 @@ enum mem_cgroup_stat_index {
>> MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
>> MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
>> MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
>> + MEM_CGROUP_STAT_FILE_DIRTY, /* # of dirty pages in page cache */
>> MEM_CGROUP_STAT_NSTATS,
>> };
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 79c4b2b..5b5c121 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -141,6 +141,7 @@ void __delete_from_page_cache(struct page *page)
>> * having removed the page entirely.
>> */
>> if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
>> + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
>
> You need to use mem_cgroup_{begin,end}_update_page_stat around critical
> sections that:
> 1) check PageDirty
> 2) update MEM_CGROUP_STAT_FILE_DIRTY counter
>
> This prevents the page from being moved to another memcg while it is being
> accounted. The same comment applies to all of your new calls to
> mem_cgroup_{dec,inc}_page_stat. For the usage pattern, see
> page_add_file_rmap.
>

If you have any difficulty with mem_cgroup_{begin,end}_update_page_stat(),
please let me know. I hope they work well enough.

Thanks,
-Kame


2012-06-17 06:53:25

by Sha Zhengju

Subject: Re: [PATCH 1/2] memcg: remove MEMCG_NR_FILE_MAPPED

On Fri, Jun 15, 2012 at 11:18 PM, Greg Thelen <[email protected]> wrote:
> On Fri, Jun 15 2012, Sha Zhengju wrote:
>
>> While doing memcg page stat accounting, there is no need to use MEMCG_NR_FILE_MAPPED
>> as an intermediate; we can use MEM_CGROUP_STAT_FILE_MAPPED directly.
>>
>> Signed-off-by: Sha Zhengju <[email protected]>
>> ---
>> include/linux/memcontrol.h | 22 ++++++++++++++++------
>> mm/memcontrol.c | 25 +------------------------
>> mm/rmap.c | 4 ++--
>> 3 files changed, 19 insertions(+), 32 deletions(-)
>
> I assume this patch is relative to v3.4.
>


Yes, I based it on linux-stable v3.4.


>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index f94efd2..a337c2e 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -27,9 +27,19 @@ struct page_cgroup;
>> struct page;
>> struct mm_struct;
>>
>> -/* Stats that can be updated by kernel. */
>> -enum mem_cgroup_page_stat_item {
>> - MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
>> +/*
>> + * Statistics for memory cgroup.
>> + */
>> +enum mem_cgroup_stat_index {
>> + /*
>> + * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
>> + */
>> + MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */
>> + MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */
>> + MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
>> + MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
>> + MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
>> + MEM_CGROUP_STAT_NSTATS,
>> };
>
> This has the unfortunate side effect of letting code outside of memcontrol.c
> manipulate internally managed memcg statistics
> (e.g. MEM_CGROUP_STAT_CACHE) with mem_cgroup_{dec,inc}_page_stat. Still, I
> think your change is fine: the complexity and presumed performance
> overhead of the extra layer of indirection were not worth it.
>
>> struct mem_cgroup_reclaim_cookie {
>> @@ -170,17 +180,17 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
>> }
>>
>> void mem_cgroup_update_page_stat(struct page *page,
>> - enum mem_cgroup_page_stat_item idx,
>> + enum mem_cgroup_stat_index idx,
>> int val);
>>
>> static inline void mem_cgroup_inc_page_stat(struct page *page,
>> - enum mem_cgroup_page_stat_item idx)
>> + enum mem_cgroup_stat_index idx)
>> {
>> mem_cgroup_update_page_stat(page, idx, 1);
>> }
>>
>> static inline void mem_cgroup_dec_page_stat(struct page *page,
>> - enum mem_cgroup_page_stat_item idx)
>> + enum mem_cgroup_stat_index idx)
>> {
>> mem_cgroup_update_page_stat(page, idx, -1);
>> }
>
> You missed two more uses of enum mem_cgroup_page_stat_item in
> memcontrol.h.
>

Ah, I found them. Thanks for reviewing!


Thanks,
Sha

2012-06-17 06:56:17

by Sha Zhengju

Subject: Re: [PATCH 1/2] memcg: remove MEMCG_NR_FILE_MAPPED

On Sat, Jun 16, 2012 at 2:31 PM, Kamezawa Hiroyuki
<[email protected]> wrote:
> (2012/06/15 21:00), Sha Zhengju wrote:
>> While doing memcg page stat accounting, there is no need to use MEMCG_NR_FILE_MAPPED
>> as an intermediate; we can use MEM_CGROUP_STAT_FILE_MAPPED directly.
>>
>> Signed-off-by: Sha Zhengju <[email protected]>
>
> I'm sorry, but my recent patch modified mem_cgroup_stat_index, so this will
> conflict with the mm tree (the change is not visible in linux-next yet).
>
> I have no objection to the patch. I'd be glad if you could update it and repost later.
>


Okay, I'll repost a version based on the mm tree.

Thanks,
Sha


> Thanks,
> -Kame
>
>
>> ---
>> include/linux/memcontrol.h | 22 ++++++++++++++++------
>> mm/memcontrol.c | 25 +------------------------
>> mm/rmap.c | 4 ++--
>> 3 files changed, 19 insertions(+), 32 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index f94efd2..a337c2e 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -27,9 +27,19 @@ struct page_cgroup;
>> struct page;
>> struct mm_struct;
>>
>> -/* Stats that can be updated by kernel. */
>> -enum mem_cgroup_page_stat_item {
>> - MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
>> +/*
>> + * Statistics for memory cgroup.
>> + */
>> +enum mem_cgroup_stat_index {
>> + /*
>> + * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
>> + */
>> + MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */
>> + MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */
>> + MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
>> + MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
>> + MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
>> + MEM_CGROUP_STAT_NSTATS,
>> };
>>
>> struct mem_cgroup_reclaim_cookie {
>> @@ -170,17 +180,17 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
>> }
>>
>> void mem_cgroup_update_page_stat(struct page *page,
>> - enum mem_cgroup_page_stat_item idx,
>> + enum mem_cgroup_stat_index idx,
>> int val);
>>
>> static inline void mem_cgroup_inc_page_stat(struct page *page,
>> - enum mem_cgroup_page_stat_item idx)
>> + enum mem_cgroup_stat_index idx)
>> {
>> mem_cgroup_update_page_stat(page, idx, 1);
>> }
>>
>> static inline void mem_cgroup_dec_page_stat(struct page *page,
>> - enum mem_cgroup_page_stat_item idx)
>> + enum mem_cgroup_stat_index idx)
>> {
>> mem_cgroup_update_page_stat(page, idx, -1);
>> }
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 7685d4a..9102b8c 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -77,21 +77,6 @@ static int really_do_swap_account __initdata = 0;
>> ? #endif
>>
>>
>> -/*
>> - * Statistics for memory cgroup.
>> - */
>> -enum mem_cgroup_stat_index {
>> - ? ? /*
>> - ? ? ?* For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
>> - ? ? ?*/
>> - ? ? MEM_CGROUP_STAT_CACHE, ? ? /* # of pages charged as cache */
>> - ? ? MEM_CGROUP_STAT_RSS, ? ? ? /* # of pages charged as anon rss */
>> - ? ? MEM_CGROUP_STAT_FILE_MAPPED, ?/* # of pages charged as file rss */
>> - ? ? MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
>> - ? ? MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
>> - ? ? MEM_CGROUP_STAT_NSTATS,
>> -};
>> -
>> ? enum mem_cgroup_events_index {
>> ? ? ? MEM_CGROUP_EVENTS_PGPGIN, ? ? ? /* # of pages paged in */
>> ? ? ? MEM_CGROUP_EVENTS_PGPGOUT, ? ? ?/* # of pages paged out */
>> @@ -1958,7 +1943,7 @@ void __mem_cgroup_end_update_page_stat(struct page *page, unsigned long *flags)
>> ? }
>>
>> ? void mem_cgroup_update_page_stat(struct page *page,
>> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?enum mem_cgroup_page_stat_item idx, int val)
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?enum mem_cgroup_stat_index idx, int val)
>> ? {
>> ? ? ? struct mem_cgroup *memcg;
>> ? ? ? struct page_cgroup *pc = lookup_page_cgroup(page);
>> @@ -1971,14 +1956,6 @@ void mem_cgroup_update_page_stat(struct page *page,
>> ? ? ? if (unlikely(!memcg || !PageCgroupUsed(pc)))
>> ? ? ? ? ? ? ? return;
>>
>> - ? ? switch (idx) {
>> - ? ? case MEMCG_NR_FILE_MAPPED:
>> - ? ? ? ? ? ? idx = MEM_CGROUP_STAT_FILE_MAPPED;
>> - ? ? ? ? ? ? break;
>> - ? ? default:
>> - ? ? ? ? ? ? BUG();
>> - ? ? }
>> -
>> ? ? ? this_cpu_add(memcg->stat->count[idx], val);
>> ? }
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 5b5ad58..7e4e481 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1154,7 +1154,7 @@ void page_add_file_rmap(struct page *page)
>> ? ? ? mem_cgroup_begin_update_page_stat(page,&locked,&flags);
>> ? ? ? if (atomic_inc_and_test(&page->_mapcount)) {
>> ? ? ? ? ? ? ? __inc_zone_page_state(page, NR_FILE_MAPPED);
>> - ? ? ? ? ? ? mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
>> + ? ? ? ? ? ? mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED);
>> ? ? ? }
>> ? ? ? mem_cgroup_end_update_page_stat(page,&locked,&flags);
>> ? }
>> @@ -1208,7 +1208,7 @@ void page_remove_rmap(struct page *page)
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? NR_ANON_TRANSPARENT_HUGEPAGES);
>> ? ? ? } else {
>> ? ? ? ? ? ? ? __dec_zone_page_state(page, NR_FILE_MAPPED);
>> - ? ? ? ? ? ? mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_MAPPED);
>> + ? ? ? ? ? ? mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED);
>> ? ? ? }
>> ? ? ? /*
>> ? ? ? ?* It would be tidy to reset the PageAnon mapping here,
>
>

2012-06-17 07:44:26

by Sha Zhengju

[permalink] [raw]
Subject: Re: [PATCH 2/2] memcg: add per cgroup dirty pages accounting

On Fri, Jun 15, 2012 at 11:32 PM, Greg Thelen <[email protected]> wrote:
> On Fri, Jun 15 2012, Sha Zhengju wrote:
>
>> This patch adds memcg routines to count dirty pages. I notice that
>> the list has talked about per-cgroup dirty page limiting
>> (http://lwn.net/Articles/455341/) before, but it did not get merged.
>
> Good timing, I was just about to make another effort to get some of
> these patches upstream.  Like you, I was going to start with some basic
> counters.
>
> Your approach is similar to what I have in mind.  While it is good to
> use the existing PageDirty flag, rather than introducing a new
> page_cgroup flag, there are locking complications (see below) to handle
> races between moving pages between memcg and the pages being {un}marked
> dirty.
>
>> I don't know how that effort is going now, but maybe we can add per-cgroup
>> dirty page accounting first. This allows the memory controller to
>> maintain an accurate view of the amount of its memory that is dirty
>> and can provide some information while the group's direct reclaim is working.
>>
>> After commit 89c06bd5 (memcg: use new logic for page stat accounting),
>> we no longer need a per-page_cgroup flag and can use the struct page
>> flag directly.
>>
>>
>> Signed-off-by: Sha Zhengju <[email protected]>
>> ---
>>  include/linux/memcontrol.h |    1 +
>>  mm/filemap.c               |    1 +
>>  mm/memcontrol.c            |   32 +++++++++++++++++++++++++-------
>>  mm/page-writeback.c        |    2 ++
>>  mm/truncate.c              |    1 +
>>  5 files changed, 30 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index a337c2e..8154ade 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -39,6 +39,7 @@ enum mem_cgroup_stat_index {
>>        MEM_CGROUP_STAT_FILE_MAPPED,  /* # of pages charged as file rss */
>>        MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
>>        MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
>> +      MEM_CGROUP_STAT_FILE_DIRTY,  /* # of dirty pages in page cache */
>>        MEM_CGROUP_STAT_NSTATS,
>>  };
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 79c4b2b..5b5c121 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -141,6 +141,7 @@ void __delete_from_page_cache(struct page *page)
>>         * having removed the page entirely.
>>         */
>>        if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
>> +              mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
>
> You need to use mem_cgroup_{begin,end}_update_page_stat around critical
> sections that:
> 1) check PageDirty
> 2) update MEM_CGROUP_STAT_FILE_DIRTY counter
>
> This protects against the page being moved between memcgs while
> accounting.  The same comment applies to all of your new calls to
> mem_cgroup_{dec,inc}_page_stat.  For the usage pattern, see
> page_add_file_rmap.


It seems I should call mem_cgroup_{begin,end}_update_page_stat to prevent races
while modifying struct page info.
Thanks for patiently explaining!
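A minimal sketch of the rule Greg describes: (1) check the dirty flag and (2) update the counter inside one critical section that the move side also excludes. It assumes a single userspace mutex as a stand-in for what mem_cgroup_{begin,end}_update_page_stat() provide against mem_cgroup_move_account() (in the kernel this is, roughly, rcu_read_lock() plus the memcg move lock while a move is in progress). All struct and function names are hypothetical; compile with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct memcg_model {
	long nr_dirty;			/* models MEM_CGROUP_STAT_FILE_DIRTY */
};

struct page_model {
	bool dirty;			/* models PageDirty(page) */
	struct memcg_model *memcg;	/* owner; changed only by a move */
};

/* One lock standing in for the exclusion that begin/end_update_page_stat()
 * give against a concurrent move of the page's accounting. */
static pthread_mutex_t move_lock = PTHREAD_MUTEX_INITIALIZER;

static void model_begin_update_page_stat(void) { pthread_mutex_lock(&move_lock); }
static void model_end_update_page_stat(void) { pthread_mutex_unlock(&move_lock); }

/* 1) test the flag and 2) update the counter in the same section. */
static void model_set_page_dirty(struct page_model *page)
{
	model_begin_update_page_stat();
	if (!page->dirty) {
		page->dirty = true;
		page->memcg->nr_dirty++;
	}
	model_end_update_page_stat();
}

static void model_clear_page_dirty(struct page_model *page)
{
	model_begin_update_page_stat();
	if (page->dirty) {
		page->dirty = false;
		page->memcg->nr_dirty--;
	}
	model_end_update_page_stat();
}

/* The move side takes the same lock, so flag and counter always agree. */
static void model_move_account(struct page_model *page, struct memcg_model *to)
{
	assert(to);
	pthread_mutex_lock(&move_lock);
	if (page->dirty) {
		page->memcg->nr_dirty--;
		to->nr_dirty++;
	}
	page->memcg = to;
	pthread_mutex_unlock(&move_lock);
}
```

Dirtying a page owned by memcg A, moving it to B, and then cleaning it leaves both counters at zero, with B holding the count in between.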


>
>>                dec_zone_page_state(page, NR_FILE_DIRTY);
>>                dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
>>        }
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 9102b8c..d200ad1 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2548,6 +2548,18 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>>  }
>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>
>> +static inline
>> +void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
>> +                                     struct mem_cgroup *to,
>> +                                     enum mem_cgroup_stat_index idx)
>> +{
>> +     /* Update stat data for mem_cgroup */
>> +     preempt_disable();
>> +     __this_cpu_dec(from->stat->count[idx]);
>> +     __this_cpu_inc(to->stat->count[idx]);
>> +     preempt_enable();
>> +}
>> +
>>  /**
>>   * mem_cgroup_move_account - move account of the page
>>   * @page: the page
>> @@ -2597,13 +2609,14 @@ static int mem_cgroup_move_account(struct page *page,
>>
>>       move_lock_mem_cgroup(from, &flags);
>>
>> -     if (!anon && page_mapped(page)) {
>> -             /* Update mapped_file data for mem_cgroup */
>> -             preempt_disable();
>> -             __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
>> -             __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
>> -             preempt_enable();
>> -     }
>> +     if (!anon && page_mapped(page))
>> +             mem_cgroup_move_account_page_stat(from, to,
>> +                                     MEM_CGROUP_STAT_FILE_MAPPED);
>> +
>> +     if (PageDirty(page))
>> +             mem_cgroup_move_account_page_stat(from, to,
>> +                                     MEM_CGROUP_STAT_FILE_DIRTY);
>> +
>>       mem_cgroup_charge_statistics(from, anon, -nr_pages);
>>       if (uncharge)
>>               /* This is not "cancel", but cancel_charge does all we need. */
>> @@ -4023,6 +4036,7 @@ enum {
>>       MCS_SWAP,
>>       MCS_PGFAULT,
>>       MCS_PGMAJFAULT,
>> +     MCS_FILE_DIRTY,
>>       MCS_INACTIVE_ANON,
>>       MCS_ACTIVE_ANON,
>>       MCS_INACTIVE_FILE,
>> @@ -4047,6 +4061,7 @@ struct {
>>       {"swap", "total_swap"},
>>       {"pgfault", "total_pgfault"},
>>       {"pgmajfault", "total_pgmajfault"},
>> +     {"dirty", "total_dirty"},
>
> Please add something to Documentation/cgroups/memory.txt describing this
> new user-visible data.  See my previous patch
> http://thread.gmane.org/gmane.linux.kernel.mm/67114 for example text.
>


Got it. I'll add it in the next version.

Thanks,
Sha
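Once the counter is exported, userspace would read it from the group's memory.stat file. For cgroup v1 this is typically something like /sys/fs/cgroup/memory/<group>/memory.stat, depending on where the hierarchy is mounted. Per the mem_cgroup_get_local_stat() hunk in this patch, the page count is multiplied by PAGE_SIZE, so "dirty" and "total_dirty" would be reported in bytes. The following is a hedged sketch of a lookup over memory.stat-formatted text; memcg_stat_lookup() is a hypothetical helper, not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find "key value" in memory.stat-style text (one pair per line).
 * Returns 0 and stores the value, or -1 if the key is missing/unparsable. */
static int memcg_stat_lookup(const char *text, const char *key,
			     unsigned long long *val)
{
	size_t klen;
	const char *line = text;

	assert(text && key && val);
	klen = strlen(key);
	while (*line) {
		/* Exact key match only: "dirty" must not match "total_dirty". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
			return sscanf(line + klen + 1, "%llu", val) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;		/* skip the newline, start of next pair */
	}
	return -1;
}
```

For example, given the text `dirty 4096\ntotal_dirty 12288\n`, looking up "dirty" yields 4096, which is one page if PAGE_SIZE is 4096. In practice the text would come from reading the file, for example with fopen() and fread().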


>>       {"inactive_anon", "total_inactive_anon"},
>>       {"active_anon", "total_active_anon"},
>>       {"inactive_file", "total_inactive_file"},
>> @@ -4080,6 +4095,9 @@ mem_cgroup_get_local_stat(struct mem_cgroup *memcg, struct mcs_total_stat *s)
>>       val = mem_cgroup_read_events(memcg, MEM_CGROUP_EVENTS_PGMAJFAULT);
>>       s->stat[MCS_PGMAJFAULT] += val;
>>
>> +     val = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_FILE_DIRTY);
>> +     s->stat[MCS_FILE_DIRTY] += val * PAGE_SIZE;
>> +
>>       /* per zone stat */
>>       val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_INACTIVE_ANON));
>>       s->stat[MCS_INACTIVE_ANON] += val * PAGE_SIZE;
>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> index 26adea8..b17c692 100644
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -1936,6 +1936,7 @@ int __set_page_dirty_no_writeback(struct page *page)
>>  void account_page_dirtied(struct page *page, struct address_space *mapping)
>>  {
>>       if (mapping_cap_account_dirty(mapping)) {
>> +             mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
>>               __inc_zone_page_state(page, NR_FILE_DIRTY);
>>               __inc_zone_page_state(page, NR_DIRTIED);
>>               __inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
>> @@ -2155,6 +2156,7 @@ int clear_page_dirty_for_io(struct page *page)
>>                * for more comments.
>>                */
>>               if (TestClearPageDirty(page)) {
>> +                     mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
>>                       dec_zone_page_state(page, NR_FILE_DIRTY);
>>                       dec_bdi_stat(mapping->backing_dev_info,
>>                                       BDI_RECLAIMABLE);
>> diff --git a/mm/truncate.c b/mm/truncate.c
>> index 61a183b..fe8363e 100644
>> --- a/mm/truncate.c
>> +++ b/mm/truncate.c
>> @@ -76,6 +76,7 @@ void cancel_dirty_page(struct page *page, unsigned int account_size)
>>       if (TestClearPageDirty(page)) {
>>               struct address_space *mapping = page->mapping;
>>               if (mapping && mapping_cap_account_dirty(mapping)) {
>> +                     mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
>>                       dec_zone_page_state(page, NR_FILE_DIRTY);
>>                       dec_bdi_stat(mapping->backing_dev_info,
>>                                       BDI_RECLAIMABLE);
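mem_cgroup_move_account_page_stat() in this patch moves one page's worth of a statistic by decrementing the source and incrementing the destination on the current CPU. Readers such as mem_cgroup_read_stat() then sum over CPUs. The simplified sketch below shows why that is sufficient. It assumes a fixed CPU count and plain `long` slots in place of the kernel's per-CPU variables, and all names are hypothetical.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4	/* assumption: fixed CPU count for the sketch */

/* One slot per CPU; each CPU touches only its own slot, so no atomics are
 * needed (the kernel gets this with this_cpu ops and preemption disabled). */
struct percpu_stat_model {
	long count[MODEL_NR_CPUS];
};

/* Same shape as mem_cgroup_move_account_page_stat(): on the CPU doing the
 * move, take one unit from the source and give it to the destination. */
static void model_move_account_page_stat(struct percpu_stat_model *from,
					 struct percpu_stat_model *to, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	from->count[cpu]--;
	to->count[cpu]++;
}

/* Like mem_cgroup_read_stat(): only the sum over CPUs is meaningful;
 * an individual slot may well be negative. */
static long model_read_stat(const struct percpu_stat_model *s)
{
	long sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += s->count[cpu];
	return sum;
}
```

For example, suppose a page is dirtied while running on CPU 0 and moved while running on CPU 2. The source ends up with slots {1, 0, -1, 0} (sum 0), and the destination ends up with {0, 0, 1, 0} (sum 1).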

2012-06-19 14:31:38

by Sha Zhengju

[permalink] [raw]
Subject: Re: [PATCH 2/2] memcg: add per cgroup dirty pages accounting

On Sat, Jun 16, 2012 at 2:34 PM, Kamezawa Hiroyuki
<[email protected]> wrote:
> (2012/06/16 0:32), Greg Thelen wrote:
>> You need to use mem_cgroup_{begin,end}_update_page_stat around critical
>> sections that:
>> 1) check PageDirty
>> 2) update MEM_CGROUP_STAT_FILE_DIRTY counter
>>
>> This protects against the page being moved between memcgs while
>> accounting.  The same comment applies to all of your new calls to
>> mem_cgroup_{dec,inc}_page_stat.  For the usage pattern, see
>> page_add_file_rmap.
>>
>
> If you have any difficulty with mem_cgroup_{begin,end}_update_page_stat(),
> please let me know... I hope they work well enough....
>

Hi, Kame

While digging into the bigger lock of mem_cgroup_{begin,end}_update_page_stat(),
I found the reality is more complex than I thought. Simply stated, modifying
page info and updating the page stat may be far apart and at different levels
(e.g. mm and fs), so if we use the big lock it may lead to scalability and
maintainability issues.

For example:
    mem_cgroup_begin_update_page_stat()
    modify page information         => TestSetPageDirty in ceph_set_page_dirty() (fs/ceph/addr.c)
    XXXXXX                          => other fs operations
    mem_cgroup_update_page_stat()   => account_page_dirtied() in mm/page-writeback.c
    mem_cgroup_end_update_page_stat()

We could choose to take the lock at a higher level, i.e. the VFS
set_page_dirty(), but this may span too much and can also miss some cases.
What's your opinion of this problem?
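The split described above can be reproduced deterministically. The sketch below uses hypothetical names and no locking at all. It runs the flag flip (standing in for TestSetPageDirty() in a filesystem's set_page_dirty), performs a move, and then runs the later accounting (standing in for account_page_dirtied()). Each memcg's dirty count ends up wrong even though the total still reflects one dirty page.

```c
#include <assert.h>
#include <stdbool.h>

struct memcg_model { long nr_dirty; };
struct page_model { bool dirty; struct memcg_model *memcg; };

/* Step 1 (filesystem level): flip the flag, cf. TestSetPageDirty(). */
static bool model_test_set_dirty(struct page_model *page)
{
	bool old = page->dirty;

	page->dirty = true;
	return old;
}

/* Step 2 (later, mm level): charge the counter, cf. account_page_dirtied().
 * It charges whichever memcg owns the page at this moment. */
static void model_account_dirtied(struct page_model *page)
{
	assert(page->dirty);
	page->memcg->nr_dirty++;
}

/* Move side with no exclusion against steps 1-2: it transfers the dirty
 * count whenever it sees the flag set. */
static void model_move_account(struct page_model *page, struct memcg_model *to)
{
	if (page->dirty) {
		page->memcg->nr_dirty--;
		to->nr_dirty++;
	}
	page->memcg = to;
}
```

The sequence is model_test_set_dirty(), then model_move_account() to a second memcg, then model_account_dirtied(). It leaves the source memcg at -1 and the destination at 2.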


Thanks,
Sha

2012-06-21 07:56:30

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 2/2] memcg: add per cgroup dirty pages accounting

(2012/06/19 23:31), Sha Zhengju wrote:
> Hi, Kame
>
> While digging into the bigger lock of mem_cgroup_{begin,end}_update_page_stat(),
> I found the reality is more complex than I thought. Simply stated, modifying
> page info and updating the page stat may be far apart and at different levels
> (e.g. mm and fs), so if we use the big lock it may lead to scalability and
> maintainability issues.
>
> For example:
>     mem_cgroup_begin_update_page_stat()
>     modify page information         => TestSetPageDirty in ceph_set_page_dirty() (fs/ceph/addr.c)
>     XXXXXX                          => other fs operations
>     mem_cgroup_update_page_stat()   => account_page_dirtied() in mm/page-writeback.c
>     mem_cgroup_end_update_page_stat()
>
> We could choose to take the lock at a higher level, i.e. the VFS
> set_page_dirty(), but this may span too much and can also miss some cases.
> What's your opinion of this problem?
>

Yes, that's sad.... If set_page_dirty() were always called under lock_page(), the
story would be easier (we'd take lock_page() on the move side),
but the comment on set_page_dirty() says that's not true..... For now, I haven't found
a magical way to avoid the race.
(*) If holding lock_page() in move_account() can be a generic solution, that would be good.

A proposal from me is a small start. You can start by adding hooks to generic
functions such as set_page_dirty(), __set_page_dirty_nobuffers(), and clear_page_dirty_for_io().

And see what happens. I guess we can add a WARN_ONCE() against callers of update_page_stat()
who don't take mem_cgroup_begin/end_update_page_stat()
(via some new check, for example, checking !rcu_read_lock_held() in update_stat()).

I think we can make a TODO list and catch up on the remaining things one by one.
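The WARN_ONCE() idea can be sketched in userspace as a per-thread depth counter that the begin/end helpers maintain. The update path then warns once when it is called outside any section. This is only an illustration under stated assumptions: the depth counter stands in for a check such as rcu_read_lock_held(), and every name below is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Per-thread nesting depth of begin/end sections (stand-in for an
 * rcu_read_lock_held()-style check). */
static _Thread_local int update_section_depth;
static bool warned_once;
static int unprotected_updates;	/* total violations, handy for testing */

static void model_begin_update_page_stat(void)
{
	update_section_depth++;
}

static void model_end_update_page_stat(void)
{
	assert(update_section_depth > 0);	/* catch an unbalanced end */
	update_section_depth--;
}

/* Applies the update either way, but complains (once) about callers that
 * are outside any begin/end section, as a WARN_ONCE() would. */
static void model_update_page_stat(long *counter, int val)
{
	if (update_section_depth == 0) {
		unprotected_updates++;
		if (!warned_once) {
			warned_once = true;
			fprintf(stderr, "warning: page stat updated outside "
				"begin/end_update_page_stat()\n");
		}
	}
	*counter += val;
}
```

Only the first unprotected caller produces output. The counter makes it possible to see how many call sites still need converting, which fits the "TODO list" approach.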

Thanks,
-Kame

2012-06-21 16:02:42

by Greg Thelen

[permalink] [raw]
Subject: Re: [PATCH 2/2] memcg: add per cgroup dirty pages accounting

On Thu, Jun 21 2012, Kamezawa Hiroyuki wrote:

> Yes, that's sad.... If set_page_dirty() were always called under lock_page(), the
> story would be easier (we'd take lock_page() on the move side),
> but the comment on set_page_dirty() says that's not true..... For now, I haven't found
> a magical way to avoid the race.
> (*) If holding lock_page() in move_account() can be a generic solution, that would be good.
>
> A proposal from me is a small start. You can start by adding hooks to generic
> functions such as set_page_dirty(), __set_page_dirty_nobuffers(), and clear_page_dirty_for_io().
>
> And see what happens. I guess we can add a WARN_ONCE() against callers of update_page_stat()
> who don't take mem_cgroup_begin/end_update_page_stat()
> (via some new check, for example, checking !rcu_read_lock_held() in update_stat()).
>
> I think we can make a TODO list and catch up on the remaining things one by one.
>
> Thanks,
> -Kame

This might be a crazy idea. Synchronization of PageDirty with the
page->memcg->nr_dirty counter is a challenge because page->memcg can be
reassigned due to inter-memcg page moving. Could we avoid moving dirty
pages between memcgs? Specifically, could we make them clean before
moving? This problem feels similar to page migration. This would slow
down inter-memcg page movement, because it would require writeback. But
I suspect that this is an infrequent operation.

2012-06-21 23:12:37

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 2/2] memcg: add per cgroup dirty pages accounting

(2012/06/22 1:02), Greg Thelen wrote:
> On Thu, Jun 21 2012, Kamezawa Hiroyuki wrote:
>
>> (2012/06/19 23:31), Sha Zhengju wrote:
>>> On Sat, Jun 16, 2012 at 2:34 PM, Kamezawa Hiroyuki
>>> <[email protected]> wrote:
>>>> (2012/06/16 0:32), Greg Thelen wrote:
>>>>>
>>>>> On Fri, Jun 15 2012, Sha Zhengju wrote:
>>>>>
>>>>>> This patch adds memcg routines to count dirty pages. I notice that
>>>>>> the list has talked about per-cgroup dirty page limiting
>>>>>> (http://lwn.net/Articles/455341/) before, but it did not get merged.
>>>>>
>>>>>
>>>>> Good timing, I was just about to make another effort to get some of
>>>>> these patches upstream. Like you, I was going to start with some basic
>>>>> counters.
>>>>>
>>>>> Your approach is similar to what I have in mind. While it is good to
>>>>> use the existing PageDirty flag, rather than introducing a new
>>>>> page_cgroup flag, there are locking complications (see below) to handle
>>>>> races between moving pages between memcg and the pages being {un}marked
>>>>> dirty.
>>>>>
>>>>>> I've no idea how is this going now, but maybe we can add per cgroup
>>>>>> dirty pages accounting first. This allows the memory controller to
>>>>>> maintain an accurate view of the amount of its memory that is dirty
>>>>>> and can provide some infomation while group's direct reclaim is working.
>>>>>>
>>>>>> After commit 89c06bd5 (memcg: use new logic for page stat accounting),
>>>>>> we do not need per page_cgroup flag anymore and can directly use
>>>>>> struct page flag.
>>>>>>
>>>>>>
>>>>>> Signed-off-by: Sha Zhengju<[email protected]>
>>>>>> ---
>>>>>> include/linux/memcontrol.h | 1 +
>>>>>> mm/filemap.c | 1 +
>>>>>> mm/memcontrol.c | 32 +++++++++++++++++++++++++-------
>>>>>> mm/page-writeback.c | 2 ++
>>>>>> mm/truncate.c | 1 +
>>>>>> 5 files changed, 30 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>>>>>> index a337c2e..8154ade 100644
>>>>>> --- a/include/linux/memcontrol.h
>>>>>> +++ b/include/linux/memcontrol.h
>>>>>> @@ -39,6 +39,7 @@ enum mem_cgroup_stat_index {
>>>>>> MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
>>>>>> MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
>>>>>> MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
>>>>>> + MEM_CGROUP_STAT_FILE_DIRTY, /* # of dirty pages in page cache */
>>>>>> MEM_CGROUP_STAT_NSTATS,
>>>>>> };
>>>>>>
>>>>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>>>>> index 79c4b2b..5b5c121 100644
>>>>>> --- a/mm/filemap.c
>>>>>> +++ b/mm/filemap.c
>>>>>> @@ -141,6 +141,7 @@ void __delete_from_page_cache(struct page *page)
>>>>>> * having removed the page entirely.
>>>>>> */
>>>>>> if (PageDirty(page)&& mapping_cap_account_dirty(mapping)) {
>>>>>> + mem_cgroup_dec_page_stat(page,
>>>>>> MEM_CGROUP_STAT_FILE_DIRTY);
>>>>>
>>>>>
>>>>> You need to use mem_cgroup_{begin,end}_update_page_stat around critical
>>>>> sections that:
>>>>> 1) check PageDirty
>>>>> 2) update MEM_CGROUP_STAT_FILE_DIRTY counter
>>>>>
>>>>> This protects against the page from being moved between memcg while
>>>>> accounting. Same comment applies to all of your new calls to
>>>>> mem_cgroup_{dec,inc}_page_stat. For usage pattern, see
>>>>> page_add_file_rmap.
>>>>>
>>>>
>>>> If you feel some difficulty with mem_cgroup_{begin,end}_update_page_stat(),
>>>> please let me know...I hope they should work enough....
>>>>
>>>
>>> Hi, Kame
>>>
>>> While digging into the bigger lock of mem_cgroup_{begin,end}_update_page_stat(),
>>> I find the reality is more complex than I thought. Simply stated,
>>> modifying page info
>>> and update page stat may be wide apart and in different level (eg.
>>> mm&fs), so if we
>>> use the big lock it may lead to scalability and maintainability issues.
>>>
>>> For example:
>>> mem_cgroup_begin_update_page_stat()
>>> modify page information => TestSetPageDirty in ceph_set_page_dirty() (fs/ceph/addr.c)
>>> XXXXXX => other fs operations
>>> mem_cgroup_update_page_stat() => account_page_dirtied() in mm/page-writeback.c
>>> mem_cgroup_end_update_page_stat().
>>>
>>> We can choose to get lock in higher level meaning vfs set_page_dirty()
>>> but this may span
>>> too much and can also have some missing cases.
>>> What's your opinion of this problem?
>>>
>>
>> Yes, that's sad.... If set_page_dirty() were always called under lock_page(),
>> the story would be easier (we'd take lock_page() on the move side),
>> but the comment on set_page_dirty() says that's not true..... For now, I haven't
>> found a magical way to avoid the race.
>> (*) If holding lock_page() in move_account() can be a generic solution, it would be good.
>> A proposal from me is a small start. You can start by adding hooks to
>> generic functions such as set_page_dirty(), __set_page_dirty_nobuffers() and
>> clear_page_dirty_for_io().
>>
>> And see what happens. I guess we can add WARN_ONCE() against callers of
>> update_page_stat() who don't take mem_cgroup_begin/end_update_page_stat()
>> (by some new check, for example, checking !rcu_read_lock_held() in update_stat()).
>>
>> I think we can make a TODO list and catch up on the remaining things one by one.
>>
>> Thanks,
>> -Kame
>
> This might be a crazy idea. Synchronization of PageDirty with the
> page->memcg->nr_dirty counter is a challenge because page->memcg can be
> reassigned due to inter-memcg page moving.

Yes. That's the heart of the problem.

> Could we avoid moving dirty pages between memcg?

How to detect it is the problem here....

> Specifically, could we make them clean before moving.

I considered that but a case

CPU-A                       CPU-B
wait_for_page_cleaned
.....                       SetPageDirty()
                            account-memcg-nr_dirty

is problematic. _If_

CPU-A
lock_page()
move_page_for_accounting()
unlock_page()

can help in 99% of cases, I think this is an option. But I haven't investigated
how many callers of set_page_dirty() hold locks....
(I guess ClearPageDirty() callers are always under lock_page()... from a quick look.)
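The CPU-A/CPU-B race can be replayed deterministically in a small user-space model. This is a sketch under stated assumptions: the structs and replay_* functions are hypothetical, and the specific unlucky ordering (CPU-B looks up page->memcg before CPU-A reassigns it, then charges afterwards) is one assumed interleaving consistent with the diagram. The serialized replay shows what holding a common lock (for example lock_page()) on both sides would force.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical models; no real locking, steps are replayed in order. */
struct model_memcg {
	long nr_dirty;
};

struct model_page {
	bool dirty;
	struct model_memcg *memcg;
};

/*
 * Unserialized: CPU-A's "page is clean" check and CPU-B's
 * SetPageDirty()/charge interleave, and CPU-B looked up the owner
 * before CPU-A reassigned it.
 */
static void replay_unserialized(struct model_page *page, struct model_memcg *to)
{
	bool a_saw_dirty;
	struct model_memcg *owner_seen_by_b;

	assert(!page->dirty);            /* start: page is clean */

	a_saw_dirty = page->dirty;       /* CPU-A: wait_for_page_cleaned() passes */
	page->dirty = true;              /* CPU-B: SetPageDirty() */
	owner_seen_by_b = page->memcg;   /* CPU-B: looks up the (old) owner */
	if (a_saw_dirty) {               /* CPU-A: nothing to transfer */
		page->memcg->nr_dirty--;
		to->nr_dirty++;
	}
	page->memcg = to;                /* CPU-A: page now belongs to 'to' */
	owner_seen_by_b->nr_dirty++;     /* CPU-B: charges the old owner */
}

/*
 * Serialized (e.g. both sides under one lock): CPU-B's whole section
 * runs first, then CPU-A's check-and-move sees the dirty page.
 */
static void replay_serialized(struct model_page *page, struct model_memcg *to)
{
	assert(!page->dirty);

	page->dirty = true;              /* CPU-B: SetPageDirty() ... */
	page->memcg->nr_dirty++;         /* ... and charge, as one unit */
	if (page->dirty) {               /* CPU-A: carry the charge along */
		page->memcg->nr_dirty--;
		to->nr_dirty++;
	}
	page->memcg = to;
}

/* Writeback later cleans the page and uncharges its current owner. */
static void clean_page(struct model_page *page)
{
	if (page->dirty) {
		page->dirty = false;
		page->memcg->nr_dirty--;
	}
}
```

After replay_unserialized() followed by clean_page(), the old memcg is stuck at +1 and the new one at -1; after replay_serialized() followed by clean_page(), both return to 0.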

If most callers call lock_page() or mem_cgroup_begin/end_update.... I think
adding WARNING(!page_locked(page) || !rcu_read_locked()) to update_stat() would
be a proof of concept and would automatically show what more we need to do...
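A user-space sketch of that proof-of-concept check. Assumptions: plain booleans in a hypothetical struct model_ctx stand in for real predicates (in the kernel something like PageLocked() and rcu_read_lock_held() would presumably be used), and model_update_page_stat() is hypothetical. The check here fires only when the caller holds neither protection, reports just the first offender as WARN_ONCE() would, and still performs the update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-caller context: which protections are held right now. */
struct model_ctx {
	bool page_locked;      /* models lock_page() being held */
	bool in_stat_section;  /* models mem_cgroup_begin_update_page_stat() */
};

static int warnings_emitted;   /* how many times the one-shot warning fired */
static long nr_dirty;          /* the counter being updated */

/* Models a WARN_ONCE()-style check: report the first unprotected caller only. */
static void model_update_page_stat(const struct model_ctx *ctx, int val)
{
	static bool warned;

	/* The inc/dec helpers only ever pass +1 or -1. */
	assert(val == 1 || val == -1);

	if (!ctx->page_locked && !ctx->in_stat_section && !warned) {
		warned = true;
		warnings_emitted++;
		fprintf(stderr, "update_page_stat: caller holds neither "
				"lock_page() nor the begin/end section\n");
	}
	nr_dirty += val;       /* the update itself still happens */
}
```

Running the existing callers with such a check in place would list, one warning per boot, the paths that still need to be converted.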

> This problem feels similar to page migration. This would slow
> down inter-memcg page movement, because it would require writeback. But
> I suspect that this is an infrequent operation.

I agree. But, IIUC, the reason page migration waits for the end of I/O is that migrating
pages under I/O (while being copied by devices) seems crazy. So just lock_page()
will be enough help....

Thanks,
-Kame





2012-06-28 11:33:08

by Sha Zhengju

Subject: Re: [PATCH 2/2] memcg: add per cgroup dirty pages accounting

On 06/22/2012 07:09 AM, Kamezawa Hiroyuki wrote:
> (2012/06/22 1:02), Greg Thelen wrote:
>> On Thu, Jun 21 2012, Kamezawa Hiroyuki wrote:
>>
>>> (2012/06/19 23:31), Sha Zhengju wrote:
>>>> On Sat, Jun 16, 2012 at 2:34 PM, Kamezawa Hiroyuki
>>>> <[email protected]> wrote:
>>>>> (2012/06/16 0:32), Greg Thelen wrote:
>>>>>>
>>>>>> On Fri, Jun 15 2012, Sha Zhengju wrote:
>>>>>>
>>>>>>> This patch adds memcg routines to count dirty pages. I notice that
>>>>>>> the list has talked about per-cgroup dirty page limiting
>>>>>>> (http://lwn.net/Articles/455341/) before, but it did not get
>>>>>>> merged.
>>>>>>
>>>>>>
>>>>>> Good timing, I was just about to make another effort to get some of
>>>>>> these patches upstream. Like you, I was going to start with some
>>>>>> basic counters.
>>>>>>
>>>>>> Your approach is similar to what I have in mind. While it is good to
>>>>>> use the existing PageDirty flag, rather than introducing a new
>>>>>> page_cgroup flag, there are locking complications (see below) to
>>>>>> handle races between moving pages between memcg and the pages being
>>>>>> {un}marked dirty.
>>>>>>
>>>>>>> I've no idea how this is going now, but maybe we can add per cgroup
>>>>>>> dirty pages accounting first. This allows the memory controller to
>>>>>>> maintain an accurate view of the amount of its memory that is dirty
>>>>>>> and can provide some information while the group's direct reclaim is
>>>>>>> working.
>>>>>>>
>>>>>>> After commit 89c06bd5 (memcg: use new logic for page stat accounting),
>>>>>>> we do not need per page_cgroup flag anymore and can directly use
>>>>>>> struct page flag.
>>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Sha Zhengju <[email protected]>
>>>>>>> ---
>>>>>>> include/linux/memcontrol.h | 1 +
>>>>>>> mm/filemap.c | 1 +
>>>>>>> mm/memcontrol.c | 32 +++++++++++++++++++++++++-------
>>>>>>> mm/page-writeback.c | 2 ++
>>>>>>> mm/truncate.c | 1 +
>>>>>>> 5 files changed, 30 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>>>>>>> index a337c2e..8154ade 100644
>>>>>>> --- a/include/linux/memcontrol.h
>>>>>>> +++ b/include/linux/memcontrol.h
>>>>>>> @@ -39,6 +39,7 @@ enum mem_cgroup_stat_index {
>>>>>>> MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
>>>>>>> MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
>>>>>>> MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
>>>>>>> + MEM_CGROUP_STAT_FILE_DIRTY, /* # of dirty pages in page cache */
>>>>>>> MEM_CGROUP_STAT_NSTATS,
>>>>>>> };
>>>>>>>
>>>>>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>>>>>> index 79c4b2b..5b5c121 100644
>>>>>>> --- a/mm/filemap.c
>>>>>>> +++ b/mm/filemap.c
>>>>>>> @@ -141,6 +141,7 @@ void __delete_from_page_cache(struct page *page)
>>>>>>> * having removed the page entirely.
>>>>>>> */
>>>>>>> if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
>>>>>>> + mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
>>>>>>
>>>>>>
>>>>>> You need to use mem_cgroup_{begin,end}_update_page_stat around critical
>>>>>> sections that:
>>>>>> 1) check PageDirty
>>>>>> 2) update MEM_CGROUP_STAT_FILE_DIRTY counter
>>>>>>
>>>>>> This protects against the page being moved between memcgs while
>>>>>> accounting. The same comment applies to all of your new calls to
>>>>>> mem_cgroup_{dec,inc}_page_stat. For the usage pattern, see
>>>>>> page_add_file_rmap.
>>>>>>
>>>>>
>>>>> If you have any difficulty with mem_cgroup_{begin,end}_update_page_stat(),
>>>>> please let me know... I hope they are sufficient....
>>>>>
>>>>
>>>> Hi, Kame
>>>>
>>>> While digging into the bigger lock of mem_cgroup_{begin,end}_update_page_stat(),
>>>> I found the reality is more complex than I thought. Simply stated,
>>>> modifying page info and updating the page stat may be far apart and at
>>>> different levels (e.g. mm & fs), so using the big lock may lead to
>>>> scalability and maintainability issues.
>>>>
>>>> For example:
>>>> mem_cgroup_begin_update_page_stat()
>>>>     modify page information       => TestSetPageDirty in ceph_set_page_dirty() (fs/ceph/addr.c)
>>>>     XXXXXX                        => other fs operations
>>>>     mem_cgroup_update_page_stat() => account_page_dirtied() in mm/page-writeback.c
>>>> mem_cgroup_end_update_page_stat()
>>>>
>>>> We can choose to take the lock at a higher level, i.e. the VFS
>>>> set_page_dirty(), but this may span too much and can also miss some cases.
>>>> What's your opinion of this problem?
>>>>
>>>
>>> Yes, that's sad.... If set_page_dirty() were always called under lock_page(),
>>> the story would be easier (we'd take lock_page() on the move side),
>>> but the comment on set_page_dirty() says that's not true..... For now, I haven't
>>> found a magical way to avoid the race.
>>> (*) If holding lock_page() in move_account() can be a generic solution, it would be good.
>>> A proposal from me is a small start. You can start by adding hooks to
>>> generic functions such as set_page_dirty(), __set_page_dirty_nobuffers() and
>>> clear_page_dirty_for_io().
>>>
>>> And see what happens. I guess we can add WARN_ONCE() against callers of
>>> update_page_stat() who don't take mem_cgroup_begin/end_update_page_stat()
>>> (by some new check, for example, checking !rcu_read_lock_held() in update_stat()).
>>>
>>> I think we can make a TODO list and catch up on the remaining things one by one.
>>>
>>> Thanks,
>>> -Kame
>>
>> This might be a crazy idea. Synchronization of PageDirty with the
>> page->memcg->nr_dirty counter is a challenge because page->memcg can be
>> reassigned due to inter-memcg page moving.
>
> Yes. That's the heart of the problem.
>
>> Could we avoid moving dirty pages between memcg?
>
> How to detect it is the problem here....
>
>> Specifically, could we make them clean before moving.
>
> I considered that but a case
>
> CPU-A                       CPU-B
> wait_for_page_cleaned
> .....                       SetPageDirty()
>                             account-memcg-nr_dirty
>
> is problematic. _If_
>
> CPU-A
> lock_page()
> move_page_for_accounting()
> unlock_page()
>
> can help in 99% of cases, I think this is an option. But I haven't investigated
> how many callers of set_page_dirty() hold locks....
> (I guess ClearPageDirty() callers are always under lock_page()... from a quick look.)
>
> If most callers call lock_page() or mem_cgroup_begin/end_update.... I think
> adding WARNING(!page_locked(page) || !rcu_read_locked()) to update_stat() would
> be a proof of concept and would automatically show what more we need to do...
>
>> This problem feels similar to page migration. This would slow
>> down inter-memcg page movement, because it would require writeback. But
>> I suspect that this is an infrequent operation.
>
> I agree. But, IIUC, the reason page migration waits for the end of I/O is that migrating
> pages under I/O (while being copied by devices) seems crazy. So just lock_page()
> will be enough help....
>
Hi, Kame

I've checked some set_page_dirty callers and found that dozens of them
don't lock the page. The following is part of the comment on
__set_page_dirty_nobuffers():

 * Most callers have locked the page, which pins the address_space in memory.
 * But zap_pte_range() does not lock the page, however in that case the
 * mapping is pinned by the vma's ->vm_file reference.

So lock_page() may not be enough either.
Meanwhile, the move side has already taken the mem_cgroup_begin/end_update
lock for FILE_MAPPED page accounting, and it may be too heavy to hold another
page lock as well.

I've tried to rework the VFS set-dirty-page routines so that SetPageDirty and
dirty page accounting are done in generic interfaces, still under the
mem_cgroup_begin/end_update lock. I've also added writeback page accounting
in a similar but easier way.
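A rough user-space sketch of the shape described above: a single generic entry point that sets the dirty flag and charges the memcg inside one section, with filesystem-specific work supplied through a hook that no longer touches the flag. This is not the patch set itself; the hook type, every model_* name, the per-page mutex standing in for the begin/end lock, and the choice to run the fs hook after the section and only on a clean-to-dirty transition are all assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical models of a memcg and a page-cache page. */
struct model_memcg {
	long nr_dirty;
};

struct model_page {
	bool dirty;
	struct model_memcg *memcg;
	pthread_mutex_t stat_lock;   /* stands in for begin/end_update_page_stat */
};

/* Hypothetical hook for the filesystem-specific part of dirtying. */
typedef void (*model_fs_dirty_hook)(struct model_page *page);

/*
 * Generic entry point: the only place that sets the flag and charges the
 * memcg, so filesystem code can no longer separate the two.
 * Returns true if the page went from clean to dirty.
 */
static bool model_generic_set_page_dirty(struct model_page *page,
					 model_fs_dirty_hook fs_hook)
{
	bool newly_dirty;

	assert(page->memcg != NULL);       /* every page is charged in this model */

	pthread_mutex_lock(&page->stat_lock);
	newly_dirty = !page->dirty;        /* models !TestSetPageDirty() */
	page->dirty = true;
	if (newly_dirty)
		page->memcg->nr_dirty++;   /* models the accounting step */
	pthread_mutex_unlock(&page->stat_lock);

	/* Filesystem work runs outside the section and leaves the flag alone. */
	if (newly_dirty && fs_hook)
		fs_hook(page);
	return newly_dirty;
}

/* Example hook that only records that fs work happened. */
static int fs_hook_calls;

static void count_fs_work(struct model_page *page)
{
	(void)page;
	fs_hook_calls++;
}
```

Dirtying the same page twice charges the memcg once and runs the fs hook once, which is the property that is hard to guarantee when each filesystem flips the flag itself and accounting happens later in account_page_dirtied().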

I've sent out the patch set. Please feel free to point out any mistakes.

Thanks,
Sha