2008-12-01 10:01:24

by Kamezawa Hiroyuki

Subject: [PATCH 0/4] memcg: unified easy patch series for mmotm-Nov30

This patch series contains fixes and clean-ups against
"mm-of-the-moment snapshot 2008-11-30-22-35".

Picked up useful ones from recent linux-mm.
The patches are numbered, but I think each can be applied independently.

[1/4] clean up mem_cgroup_move_task() in memcg.
(From Nikanth Karthikesan <[email protected]>)
Obvious change.

[2/4] fixes mem+swap controller's limit check
(From Daisuke Nishimura <[email protected]>)
Maybe an Ack from Balbir is appropriate.

[3/4] clean up gfp_mask passed to memcg.
From me. I'd like a review from Hugh Dickins and Nick Piggin, please.

[4/4] fix the name of scan_global_lru().
From me. Obvious change.

I'll queue Kosaki's memcg LRU fixes and the cgroup-ID work on top of this.

Thanks,
-Kame


2008-12-01 10:02:38

by Kamezawa Hiroyuki

Subject: [PATCH 1/4] memcg: remove unused logic in mem_cgroup_move_task()

Remove unnecessary code (fragments of not-implemented functionality).

Changelog:
- removed all unused fragments.
- added comment.


Reported-by: Nikanth Karthikesan <[email protected]>
Signed-off-by: Nikanth Karthikesan <[email protected]>
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>

Index: mmotm-2.6.28-Nov30/mm/memcontrol.c
===================================================================
--- mmotm-2.6.28-Nov30.orig/mm/memcontrol.c
+++ mmotm-2.6.28-Nov30/mm/memcontrol.c
@@ -2008,25 +2008,10 @@ static void mem_cgroup_move_task(struct
struct cgroup *old_cont,
struct task_struct *p)
{
- struct mm_struct *mm;
- struct mem_cgroup *mem, *old_mem;
-
- mm = get_task_mm(p);
- if (mm == NULL)
- return;
-
- mem = mem_cgroup_from_cont(cont);
- old_mem = mem_cgroup_from_cont(old_cont);
-
/*
- * Only thread group leaders are allowed to migrate, the mm_struct is
- * in effect owned by the leader
+ * FIXME: It's better to move charges of this process from old
+ * memcg to new memcg. But it's just on TODO-List now.
*/
- if (!thread_group_leader(p))
- goto out;
-
-out:
- mmput(mm);
}

struct cgroup_subsys mem_cgroup_subsys = {

2008-12-01 10:03:37

by Kamezawa Hiroyuki

Subject: [PATCH 2/4] memcg: fix limit check for mem+swap controller

There are scattered calls of res_counter_check_under_limit(), and most
of them don't take mem+swap accounting into account.

Define mem_cgroup_check_under_limit() and avoid direct use of
res_counter_check_under_limit().
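
For illustration, here is a minimal standalone demo of the new check's
semantics - plain userspace C, not kernel code, with invented numbers:
under swap accounting, "under limit" must hold for both the memory
counter and the mem+swap counter before reclaim can be declared done.

#include <stdbool.h>
#include <stdio.h>

/* toy stand-in for struct res_counter */
struct counter { unsigned long usage, limit; };

static bool under_limit(const struct counter *c)
{
	return c->usage < c->limit;
}

/* mirrors the logic of mem_cgroup_check_under_limit() */
static bool check_under_limit(const struct counter *res,
			      const struct counter *memsw,
			      bool do_swap_account)
{
	if (do_swap_account)
		return under_limit(res) && under_limit(memsw);
	return under_limit(res);
}

int main(void)
{
	struct counter res   = {  90, 100 };	/* memory: under its limit */
	struct counter memsw = { 210, 200 };	/* mem+swap: still over */

	/* checking only &mem->res, as the old code did, would report 1 here */
	printf("swap accounting on:  %d\n", check_under_limit(&res, &memsw, true));
	printf("swap accounting off: %d\n", check_under_limit(&res, &memsw, false));
	return 0;
}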

Changelog:
- replaced all direct uses of res_counter_check_under_limit().

Reported-by: Daisuke Nishimura <[email protected]>
Signed-off-by: Daisuke Nishimura <[email protected]>
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>

mm/memcontrol.c | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)

Index: mmotm-2.6.28-Nov30/mm/memcontrol.c
===================================================================
--- mmotm-2.6.28-Nov30.orig/mm/memcontrol.c
+++ mmotm-2.6.28-Nov30/mm/memcontrol.c
@@ -571,6 +571,18 @@ done:
return ret;
}

+static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
+{
+ if (do_swap_account) {
+ if (res_counter_check_under_limit(&mem->res) &&
+ res_counter_check_under_limit(&mem->memsw))
+ return true;
+ } else
+ if (res_counter_check_under_limit(&mem->res))
+ return true;
+ return false;
+}
+
/*
* Dance down the hierarchy if needed to reclaim memory. We remember the
* last child we reclaimed from, so that we don't end up penalizing
@@ -592,7 +604,7 @@ static int mem_cgroup_hierarchical_recla
* have left.
*/
ret = try_to_free_mem_cgroup_pages(root_mem, gfp_mask, noswap);
- if (res_counter_check_under_limit(&root_mem->res))
+ if (mem_cgroup_check_under_limit(root_mem))
return 0;

next_mem = mem_cgroup_get_first_node(root_mem);
@@ -606,7 +618,7 @@ static int mem_cgroup_hierarchical_recla
continue;
}
ret = try_to_free_mem_cgroup_pages(next_mem, gfp_mask, noswap);
- if (res_counter_check_under_limit(&root_mem->res))
+ if (mem_cgroup_check_under_limit(root_mem))
return 0;
cgroup_lock();
next_mem = mem_cgroup_get_next_node(next_mem, root_mem);
@@ -709,12 +721,8 @@ static int __mem_cgroup_try_charge(struc
* current usage of the cgroup before giving up
*
*/
- if (do_swap_account) {
- if (res_counter_check_under_limit(&mem_over_limit->res) &&
- res_counter_check_under_limit(&mem_over_limit->memsw))
- continue;
- } else if (res_counter_check_under_limit(&mem_over_limit->res))
- continue;
+ if (mem_cgroup_check_under_limit(mem_over_limit))
+ continue;

if (!nr_retries--) {
if (oom) {
@@ -1334,7 +1342,7 @@ int mem_cgroup_shrink_usage(struct mm_st

do {
progress = try_to_free_mem_cgroup_pages(mem, gfp_mask, true);
- progress += res_counter_check_under_limit(&mem->res);
+ progress += mem_cgroup_check_under_limit(mem);
} while (!progress && --retry);

css_put(&mem->css);

2008-12-01 10:06:35

by Kamezawa Hiroyuki

Subject: [PATCH 3/4] memcg: explaining memcg's gfp_mask behavior in an explicit way.

The mem_cgroup_xxx_charge(...gfp_mask) functions take a gfp_mask as an
argument, but this gfp_t is only used to check against GFP_RECLAIM_MASK.
In other words, memcg has no interest in where the memory should be
reclaimed from; it only looks at the usage of pages.

Using a bare gfp_t here is misleading, so this is a patch to explain the
expected behavior in an explicit way. (A better name/code is welcome.)
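
To make the masking concrete, this is how gfp_memcg_mask(GFP_NOFS) would
expand, using the 2.6.28-era flag definitions (a worked example added for
explanation; it is not part of the patch):

/*
 * GFP_NOFS                     = __GFP_WAIT | __GFP_IO
 * GFP_NOFS & GFP_RECLAIM_MASK  = __GFP_WAIT | __GFP_IO
 * GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK
 *     = __GFP_HARDWALL | __GFP_HIGHMEM | __GFP_MOVABLE
 *
 * gfp_memcg_mask(GFP_NOFS)
 *     = __GFP_WAIT | __GFP_IO | __GFP_HARDWALL | __GFP_HIGHMEM | __GFP_MOVABLE
 *
 * i.e. the caller's reclaim constraints (may wait, may do IO, no FS
 * callbacks) are kept, while the zone/placement bits are always those
 * of GFP_HIGHUSER_MOVABLE.
 */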

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>

include/linux/gfp.h | 19 +++++++++++++++++++
mm/filemap.c | 2 +-
mm/memory.c | 12 +++++++-----
mm/shmem.c | 8 ++++----
mm/swapfile.c | 2 +-
mm/vmscan.c | 3 +--
6 files changed, 33 insertions(+), 13 deletions(-)

Index: mmotm-2.6.28-Nov30/include/linux/gfp.h
===================================================================
--- mmotm-2.6.28-Nov30.orig/include/linux/gfp.h
+++ mmotm-2.6.28-Nov30/include/linux/gfp.h
@@ -245,4 +245,23 @@ void drain_zone_pages(struct zone *zone,
void drain_all_pages(void);
void drain_local_pages(void *dummy);

+#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+static inline gfp_t gfp_memcg_mask(gfp_t gfp)
+{
+ gfp_t mask;
+ /*
+ * Memory Resource Controller memory reclaim is called to reduce usage
+ * of memory, not to get free memory from specified area.
+ * Remove zone constraints.
+ */
+ mask = gfp & GFP_RECLAIM_MASK;
+ return mask | (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
+}
+#else
+static inline gfp_t gfp_memcg_mask(gfp_t gfp)
+{
+ return gfp;
+}
+#endif
+
#endif /* __LINUX_GFP_H */
Index: mmotm-2.6.28-Nov30/mm/filemap.c
===================================================================
--- mmotm-2.6.28-Nov30.orig/mm/filemap.c
+++ mmotm-2.6.28-Nov30/mm/filemap.c
@@ -461,7 +461,7 @@ int add_to_page_cache_locked(struct page
VM_BUG_ON(!PageLocked(page));

error = mem_cgroup_cache_charge(page, current->mm,
- gfp_mask & ~__GFP_HIGHMEM);
+ gfp_memcg_mask(gfp_mask));
if (error)
goto out;

Index: mmotm-2.6.28-Nov30/mm/vmscan.c
===================================================================
--- mmotm-2.6.28-Nov30.orig/mm/vmscan.c
+++ mmotm-2.6.28-Nov30/mm/vmscan.c
@@ -1733,8 +1733,7 @@ unsigned long try_to_free_mem_cgroup_pag
if (noswap)
sc.may_swap = 0;

- sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
- (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
+ sc.gfp_mask = gfp_memcg_mask(gfp_mask);
zonelist = NODE_DATA(numa_node_id())->node_zonelists;
return do_try_to_free_pages(zonelist, &sc);
}
Index: mmotm-2.6.28-Nov30/mm/memory.c
===================================================================
--- mmotm-2.6.28-Nov30.orig/mm/memory.c
+++ mmotm-2.6.28-Nov30/mm/memory.c
@@ -1913,7 +1913,8 @@ gotten:
cow_user_page(new_page, old_page, address, vma);
__SetPageUptodate(new_page);

- if (mem_cgroup_newpage_charge(new_page, mm, GFP_HIGHUSER_MOVABLE))
+ if (mem_cgroup_newpage_charge(new_page, mm,
+ gfp_memcg_mask(GFP_HIGHUSER_MOVABLE)))
goto oom_free_new;

/*
@@ -2345,7 +2346,7 @@ static int do_swap_page(struct mm_struct
delayacct_clear_flag(DELAYACCT_PF_SWAPIN);

if (mem_cgroup_try_charge_swapin(mm, page,
- GFP_HIGHUSER_MOVABLE, &ptr) == -ENOMEM) {
+ gfp_memcg_mask(GFP_HIGHUSER_MOVABLE), &ptr) == -ENOMEM) {
ret = VM_FAULT_OOM;
unlock_page(page);
goto out;
@@ -2437,7 +2438,8 @@ static int do_anonymous_page(struct mm_s
goto oom;
__SetPageUptodate(page);

- if (mem_cgroup_newpage_charge(page, mm, GFP_HIGHUSER_MOVABLE))
+ if (mem_cgroup_newpage_charge(page, mm,
+ gfp_memcg_mask(GFP_HIGHUSER_MOVABLE)))
goto oom_free_page;

entry = mk_pte(page, vma->vm_page_prot);
@@ -2528,8 +2530,8 @@ static int __do_fault(struct mm_struct *
ret = VM_FAULT_OOM;
goto out;
}
- if (mem_cgroup_newpage_charge(page,
- mm, GFP_HIGHUSER_MOVABLE)) {
+ if (mem_cgroup_newpage_charge(page, mm,
+ gfp_memcg_mask(GFP_HIGHUSER_MOVABLE))) {
ret = VM_FAULT_OOM;
page_cache_release(page);
goto out;
Index: mmotm-2.6.28-Nov30/mm/swapfile.c
===================================================================
--- mmotm-2.6.28-Nov30.orig/mm/swapfile.c
+++ mmotm-2.6.28-Nov30/mm/swapfile.c
@@ -698,7 +698,7 @@ static int unuse_pte(struct vm_area_stru
int ret = 1;

if (mem_cgroup_try_charge_swapin(vma->vm_mm, page,
- GFP_HIGHUSER_MOVABLE, &ptr))
+ gfp_memcg_mask(GFP_HIGHUSER_MOVABLE), &ptr))
ret = -ENOMEM;

pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
Index: mmotm-2.6.28-Nov30/mm/shmem.c
===================================================================
--- mmotm-2.6.28-Nov30.orig/mm/shmem.c
+++ mmotm-2.6.28-Nov30/mm/shmem.c
@@ -924,8 +924,8 @@ found:
* Charge page using GFP_HIGHUSER_MOVABLE while we can wait.
* charged back to the user(not to caller) when swap account is used.
*/
- error = mem_cgroup_cache_charge_swapin(page,
- current->mm, GFP_HIGHUSER_MOVABLE, true);
+ error = mem_cgroup_cache_charge_swapin(page, current->mm,
+ gfp_memcg_mask(GFP_HIGHUSER_MOVABLE), true);
if (error)
goto out;
error = radix_tree_preload(GFP_KERNEL);
@@ -1267,7 +1267,7 @@ repeat:
* charge against this swap cache here.
*/
if (mem_cgroup_cache_charge_swapin(swappage,
- current->mm, gfp, false)) {
+ current->mm, gfp_memcg_mask(gfp), false)) {
page_cache_release(swappage);
error = -ENOMEM;
goto failed;
@@ -1385,7 +1385,7 @@ repeat:

/* Precharge page while we can wait, compensate after */
error = mem_cgroup_cache_charge(filepage, current->mm,
- GFP_HIGHUSER_MOVABLE);
+ gfp_memcg_mask(GFP_HIGHUSER_MOVABLE));
if (error) {
page_cache_release(filepage);
shmem_unacct_blocks(info->flags, 1);

2008-12-01 10:08:20

by Kamezawa Hiroyuki

Subject: [PATCH 4/4] memcg: rename scan_global_lru macro

Rename scan_global_lru() to scanning_global_lru().

The old name sounds as if the macro actually "scans" the global LRU,
when it only tests whether the scan is against the global LRU.

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>

mm/vmscan.c | 34 +++++++++++++++++-----------------
1 file changed, 17 insertions(+), 17 deletions(-)

Index: mmotm-2.6.28-Nov30/mm/vmscan.c
===================================================================
--- mmotm-2.6.28-Nov30.orig/mm/vmscan.c
+++ mmotm-2.6.28-Nov30/mm/vmscan.c
@@ -126,9 +126,9 @@ static LIST_HEAD(shrinker_list);
static DECLARE_RWSEM(shrinker_rwsem);

#ifdef CONFIG_CGROUP_MEM_RES_CTLR
-#define scan_global_lru(sc) (!(sc)->mem_cgroup)
+#define scanning_global_lru(sc) (!(sc)->mem_cgroup)
#else
-#define scan_global_lru(sc) (1)
+#define scanning_global_lru(sc) (1)
#endif

/*
@@ -1124,7 +1124,7 @@ static unsigned long shrink_inactive_lis
__mod_zone_page_state(zone, NR_INACTIVE_ANON,
-count[LRU_INACTIVE_ANON]);

- if (scan_global_lru(sc)) {
+ if (scanning_global_lru(sc)) {
zone->pages_scanned += nr_scan;
zone->recent_scanned[0] += count[LRU_INACTIVE_ANON];
zone->recent_scanned[0] += count[LRU_ACTIVE_ANON];
@@ -1162,7 +1162,7 @@ static unsigned long shrink_inactive_lis
if (current_is_kswapd()) {
__count_zone_vm_events(PGSCAN_KSWAPD, zone, nr_scan);
__count_vm_events(KSWAPD_STEAL, nr_freed);
- } else if (scan_global_lru(sc))
+ } else if (scanning_global_lru(sc))
__count_zone_vm_events(PGSCAN_DIRECT, zone, nr_scan);

__count_zone_vm_events(PGSTEAL, zone, nr_freed);
@@ -1188,7 +1188,7 @@ static unsigned long shrink_inactive_lis
SetPageLRU(page);
lru = page_lru(page);
add_page_to_lru_list(zone, page, lru);
- if (PageActive(page) && scan_global_lru(sc)) {
+ if (PageActive(page) && scanning_global_lru(sc)) {
int file = !!page_is_file_cache(page);
zone->recent_rotated[file]++;
}
@@ -1265,7 +1265,7 @@ static void shrink_active_list(unsigned
* zone->pages_scanned is used for detect zone's oom
* mem_cgroup remembers nr_scan by itself.
*/
- if (scan_global_lru(sc)) {
+ if (scanning_global_lru(sc)) {
zone->pages_scanned += pgscanned;
zone->recent_scanned[!!file] += pgmoved;
}
@@ -1301,7 +1301,7 @@ static void shrink_active_list(unsigned
* This helps balance scan pressure between file and anonymous
* pages in get_scan_ratio.
*/
- if (scan_global_lru(sc))
+ if (scanning_global_lru(sc))
zone->recent_rotated[!!file] += pgmoved;

/*
@@ -1361,7 +1361,7 @@ static unsigned long shrink_list(enum lr
}

if (lru == LRU_ACTIVE_ANON &&
- (!scan_global_lru(sc) || inactive_anon_is_low(zone))) {
+ (!scanning_global_lru(sc) || inactive_anon_is_low(zone))) {
shrink_active_list(nr_to_scan, zone, sc, priority, file);
return 0;
}
@@ -1467,7 +1467,7 @@ static void shrink_zone(int priority, st
get_scan_ratio(zone, sc, percent);

for_each_evictable_lru(l) {
- if (scan_global_lru(sc)) {
+ if (scanning_global_lru(sc)) {
int file = is_file_lru(l);
int scan;

@@ -1522,9 +1522,9 @@ static void shrink_zone(int priority, st
* Even if we did not try to evict anon pages at all, we want to
* rebalance the anon lru active/inactive ratio.
*/
- if (!scan_global_lru(sc) || inactive_anon_is_low(zone))
+ if (!scanning_global_lru(sc) || inactive_anon_is_low(zone))
shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
- else if (!scan_global_lru(sc))
+ else if (!scanning_global_lru(sc))
shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);

throttle_vm_writeout(sc->gfp_mask);
@@ -1559,7 +1559,7 @@ static void shrink_zones(int priority, s
* Take care memory controller reclaiming has small influence
* to global LRU.
*/
- if (scan_global_lru(sc)) {
+ if (scanning_global_lru(sc)) {
if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
continue;
note_zone_scanning_priority(zone, priority);
@@ -1612,12 +1612,12 @@ static unsigned long do_try_to_free_page

delayacct_freepages_start();

- if (scan_global_lru(sc))
+ if (scanning_global_lru(sc))
count_vm_event(ALLOCSTALL);
/*
* mem_cgroup will not do shrink_slab.
*/
- if (scan_global_lru(sc)) {
+ if (scanning_global_lru(sc)) {
for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {

if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
@@ -1636,7 +1636,7 @@ static unsigned long do_try_to_free_page
* Don't shrink slabs when reclaiming memory from
* over limit cgroups
*/
- if (scan_global_lru(sc)) {
+ if (scanning_global_lru(sc)) {
shrink_slab(sc->nr_scanned, sc->gfp_mask, lru_pages, NULL);
if (reclaim_state) {
sc->nr_reclaimed += reclaim_state->reclaimed_slab;
@@ -1667,7 +1667,7 @@ static unsigned long do_try_to_free_page
congestion_wait(WRITE, HZ/10);
}
/* top priority shrink_zones still had more to do? don't OOM, then */
- if (!sc->all_unreclaimable && scan_global_lru(sc))
+ if (!sc->all_unreclaimable && scanning_global_lru(sc))
ret = sc->nr_reclaimed;
out:
/*
@@ -1680,7 +1680,7 @@ out:
if (priority < 0)
priority = 0;

- if (scan_global_lru(sc)) {
+ if (scanning_global_lru(sc)) {
for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {

if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))

2008-12-02 00:32:23

by Hugh Dickins

Subject: Re: [PATCH 3/4] memcg: explaining memcg's gfp_mask behavior in an explicit way.

On Mon, 1 Dec 2008, KAMEZAWA Hiroyuki wrote:
> The mem_cgroup_xxx_charge(...gfp_mask) functions take a gfp_mask as an
> argument, but this gfp_t is only used to check against GFP_RECLAIM_MASK.
> In other words, memcg has no interest in where the memory should be
> reclaimed from; it only looks at the usage of pages.
>
> Using a bare gfp_t here is misleading, so this is a patch to explain the
> expected behavior in an explicit way. (A better name/code is welcome.)
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>

Sorry, but I hate it. You're spreading mem_cgroup ugliness throughout.

This is a good demonstration of why I wanted to go the opposite way to
Nick, why I wanted to push the masking as low as possible; I accept
Nick's point, but please, not at the expense of being this ugly.

It's a pity about the loop over tmpfs; IIRC, without that case you could
just remove the gfp_mask argument from every one of them - that is a
change I would appreciate! Or am I forgetting some other cases?

I think you should remove the arg from every one you can, and change
those shmem_getpage() ones to say "gfp & GFP_RECLAIM_MASK" (just as
we'll be changing the radix_tree_preload later).
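
A sketch of what one of those shmem_getpage() call sites would then look
like (function names as in mmotm-2.6.28; illustrative only, not a patch):

	/* mask down to the reclaim flags at the one caller that
	 * cannot simply drop the gfp argument */
	if (mem_cgroup_cache_charge_swapin(swappage, current->mm,
					   gfp & GFP_RECLAIM_MASK, false)) {
		page_cache_release(swappage);
		error = -ENOMEM;
		goto failed;
	}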

Hmm, shmem_getpage()'s mem_cgroup_cache_charge(,, gfp & ~__GFP_HIGHMEM)
has morphed into mem_cgroup_cache_charge(,, GFP_HIGHUSER_MOVABLE) in
mmotm (well, mmo2daysago, I've not looked today). How come?

It used to be the case that the mem_cgroup calls made their own
memory allocations, and that could become the case again in future:
the gfp mask was passed down for those allocations, and it was almost
everywhere GFP_KERNEL.

mem_cgroup charging may need to reclaim some memory from the memcg:
it happens that the category of memory it goes for is HIGHUSER_MOVABLE;
and it happens that the gfp mask for any incidental allocations it might
want to make, also provides the GFP_RECLAIM_MASK flags for that reclaim.
But please keep that within mem_cgroup_cache_charge_common or whatever.
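
A sketch of keeping the masking central, inside memcg's own charge path
(the function name follows mmotm-2.6.28's memcontrol.c; the body is
illustrative only):

static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
				gfp_t gfp_mask, enum charge_type ctype,
				struct mem_cgroup *memcg)
{
	/*
	 * memcg reclaim targets usage, not placement: keep only the
	 * caller's reclaim flags and fix the zone bits once, here,
	 * so that call sites never have to mask anything.
	 */
	gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
		   (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);

	/* ... the rest of the charge path, unchanged ... */
}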

Hugh

2008-12-02 01:22:13

by Kamezawa Hiroyuki

Subject: Re: [PATCH 3/4] memcg: explaining memcg's gfp_mask behavior in an explicit way.

On Tue, 2 Dec 2008 00:32:29 +0000 (GMT)
Hugh Dickins <[email protected]> wrote:

> On Mon, 1 Dec 2008, KAMEZAWA Hiroyuki wrote:
> > The mem_cgroup_xxx_charge(...gfp_mask) functions take a gfp_mask as an
> > argument, but this gfp_t is only used to check against GFP_RECLAIM_MASK.
> > In other words, memcg has no interest in where the memory should be
> > reclaimed from; it only looks at the usage of pages.
> >
> > Using a bare gfp_t here is misleading, so this is a patch to explain the
> > expected behavior in an explicit way. (A better name/code is welcome.)
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
>
> Sorry, but I hate it. You're spreading mem_cgroup ugliness throughout.
>
> This is a good demonstration of why I wanted to go the opposite way to
> Nick, why I wanted to push the masking as low as possible; I accept
> Nick's point, but please, not at the expense of being this ugly.
>
Okay ;)


> It's a pity about the loop over tmpfs; IIRC, without that case you could
> just remove the gfp_mask argument from every one of them - that is a
> change I would appreciate! Or am I forgetting some other cases?
>

Ah, I considered that. Putting gfp_mask there is for remembering that
"this will do page reclaim".

> I think you should remove the arg from every one you can, and change
> those shmem_getpage() ones to say "gfp & GFP_RECLAIM_MASK" (just as
> we'll be changing the radix_tree_preload later).
>
I'll look into it again.

> Hmm, shmem_getpage()'s mem_cgroup_cache_charge(,, gfp & ~__GFP_HIGHMEM)
> has morphed into mem_cgroup_cache_charge(,, GFP_HIGHUSER_MOVABLE) in
> mmotm (well, mmo2daysago, I've not looked today). How come?
>
Maybe my patches/memcg-fix-gfp_mask-of-callers-of-charge.patch.
Hmm, I'll write a replacement patch for this and remove the gfp_t.

> It used to be the case that the mem_cgroup calls made their own
> memory allocations, and that could become the case again in future:
> the gfp mask was passed down for those allocations, and it was almost
> everywhere GFP_KERNEL.
>
> mem_cgroup charging may need to reclaim some memory from the memcg:
> it happens that the category of memory it goes for is HIGHUSER_MOVABLE;
> and it happens that the gfp mask for any incidental allocations it might
> want to make, also provides the GFP_RECLAIM_MASK flags for that reclaim.
> But please keep that within mem_cgroup_cache_charge_common or whatever.
>

Okay. I'll write a replacement for memcg-fix-gfp_mask-of-callers-of-charge.patch.

-Kame