2008-12-05 12:30:38

by Daisuke Nishimura

Subject: [RFC][PATCH -mmotm 0/4] patches for memory cgroup (20081205)

Hi.

These are patches that I have now.

Patches:
[1/4] memcg: don't trigger oom at page migration
[2/4] memcg: remove mem_cgroup_try_charge
[3/4] memcg: avoid deadlock caused by race between oom and cpuset_attach
[4/4] memcg: change try_to_free_pages to hierarchical_reclaim

There is no special meaning to the patch order, except that 1 must come before 2.

Any comments would be welcome.


Thanks,
Daisuke Nishimura.


2008-12-05 12:31:24

by Daisuke Nishimura

Subject: [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration

I think triggering OOM at mem_cgroup_prepare_migration would be just a bit
overkill.
Returning -ENOMEM would be enough for mem_cgroup_prepare_migration.
The caller would handle the case anyway.

Signed-off-by: Daisuke Nishimura <[email protected]>
---
mm/memcontrol.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4dbce1d..50ee1be 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1330,7 +1330,7 @@ int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
unlock_page_cgroup(pc);

if (mem) {
- ret = mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem);
+ ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
css_put(&mem->css);
}
*ptr = mem;

2008-12-05 12:32:05

by Daisuke Nishimura

Subject: [RFC][PATCH -mmotm 2/4] memcg: remove mem_cgroup_try_charge

Now, mem_cgroup_try_charge is not used by anyone, so we can remove it.

Signed-off-by: Daisuke Nishimura <[email protected]>
---
include/linux/memcontrol.h | 8 --------
mm/memcontrol.c | 21 +--------------------
2 files changed, 1 insertions(+), 28 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index fe82b58..4b35739 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -40,8 +40,6 @@ struct mm_struct;
extern int mem_cgroup_newpage_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask);
/* for swap handling */
-extern int mem_cgroup_try_charge(struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **ptr);
extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
struct page *page, gfp_t mask, struct mem_cgroup **ptr);
extern void mem_cgroup_commit_charge_swapin(struct page *page,
@@ -135,12 +133,6 @@ static inline int mem_cgroup_cache_charge(struct page *page,
return 0;
}

-static inline int mem_cgroup_try_charge(struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **ptr)
-{
- return 0;
-}
-
static inline int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
struct page *page, gfp_t gfp_mask, struct mem_cgroup **ptr)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 50ee1be..9c5856b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -808,27 +808,8 @@ nomem:
return -ENOMEM;
}

-/**
- * mem_cgroup_try_charge - get charge of PAGE_SIZE.
- * @mm: an mm_struct which is charged against. (when *memcg is NULL)
- * @gfp_mask: gfp_mask for reclaim.
- * @memcg: a pointer to memory cgroup which is charged against.
- *
- * charge against memory cgroup pointed by *memcg. if *memcg == NULL, estimated
- * memory cgroup from @mm is got and stored in *memcg.
- *
- * Returns 0 if success. -ENOMEM at failure.
- * This call can invoke OOM-Killer.
- */
-
-int mem_cgroup_try_charge(struct mm_struct *mm,
- gfp_t mask, struct mem_cgroup **memcg)
-{
- return __mem_cgroup_try_charge(mm, mask, memcg, true);
-}
-
/*
- * commit a charge got by mem_cgroup_try_charge() and makes page_cgroup to be
+ * commit a charge got by __mem_cgroup_try_charge() and makes page_cgroup to be
* USED state. If already USED, uncharge and return.
*/

2008-12-05 12:32:47

by Daisuke Nishimura

Subject: [RFC][PATCH -mmotm 3/4] memcg: avoid deadlock caused by race between oom and cpuset_attach

mpol_rebind_mm(), which can be called from cpuset_attach(), does down_write(mm->mmap_sem).
This means down_write(mm->mmap_sem) can be called under cgroup_mutex.

OTOH, page fault path does down_read(mm->mmap_sem) and calls mem_cgroup_try_charge_xxx(),
which may eventually call mem_cgroup_out_of_memory(). And mem_cgroup_out_of_memory()
calls cgroup_lock().
This means cgroup_lock() can be called under down_read(mm->mmap_sem).

If those two paths race, a deadlock can happen.
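To make the inverted lock ordering concrete, here is a minimal user-space
sketch (an illustration only, not kernel code; the pthread locks and the
helper names below just stand in for cgroup_mutex, mm->mmap_sem and the
two code paths):

#include <pthread.h>

static pthread_mutex_t cgroup_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t mmap_sem;               /* stands in for mm->mmap_sem */

/* cpuset_attach() path: cgroup_mutex, then down_write(mmap_sem) */
static void *attach_path(void *unused)
{
        pthread_mutex_lock(&cgroup_mutex);      /* taken by the cgroup core */
        pthread_rwlock_wrlock(&mmap_sem);       /* mpol_rebind_mm() */
        pthread_rwlock_unlock(&mmap_sem);
        pthread_mutex_unlock(&cgroup_mutex);
        return NULL;
}

/* page fault path: down_read(mmap_sem), then cgroup_lock() */
static void *fault_path(void *unused)
{
        pthread_rwlock_rdlock(&mmap_sem);       /* page fault */
        pthread_mutex_lock(&cgroup_mutex);      /* mem_cgroup_out_of_memory() */
        pthread_mutex_unlock(&cgroup_mutex);
        pthread_rwlock_unlock(&mmap_sem);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_rwlock_init(&mmap_sem, NULL);
        /*
         * If each thread takes its first lock before the other takes its
         * second one, neither can make progress: a classic AB-BA deadlock.
         */
        pthread_create(&a, NULL, attach_path, NULL);
        pthread_create(&b, NULL, fault_path, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}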

This patch avoids this deadlock by:
- removing cgroup_lock() from mem_cgroup_out_of_memory().
- defining a new mutex (memcg_tasklist) and serializing mem_cgroup_move_task()
  (the ->attach handler of the memory cgroup) and mem_cgroup_out_of_memory().

Signed-off-by: Daisuke Nishimura <[email protected]>
---
mm/memcontrol.c | 5 +++++
mm/oom_kill.c | 2 --
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9c5856b..ab04725 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -51,6 +51,7 @@ static int really_do_swap_account __initdata = 1; /* for remember boot option*/
#define do_swap_account (0)
#endif

+static DEFINE_MUTEX(memcg_tasklist); /* can be held under cgroup_mutex */

/*
* Statistics for memory cgroup.
@@ -796,7 +797,9 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,

if (!nr_retries--) {
if (oom) {
+ mutex_lock(&memcg_tasklist);
mem_cgroup_out_of_memory(mem_over_limit, gfp_mask);
+ mutex_unlock(&memcg_tasklist);
mem_over_limit->last_oom_jiffies = jiffies;
}
goto nomem;
@@ -2172,10 +2175,12 @@ static void mem_cgroup_move_task(struct cgroup_subsys *ss,
struct cgroup *old_cont,
struct task_struct *p)
{
+ mutex_lock(&memcg_tasklist);
/*
* FIXME: It's better to move charges of this process from old
* memcg to new memcg. But it's just on TODO-List now.
*/
+ mutex_unlock(&memcg_tasklist);
}

struct cgroup_subsys mem_cgroup_subsys = {
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index fd150e3..40ba050 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -429,7 +429,6 @@ void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
unsigned long points = 0;
struct task_struct *p;

- cgroup_lock();
read_lock(&tasklist_lock);
retry:
p = select_bad_process(&points, mem);
@@ -444,7 +443,6 @@ retry:
goto retry;
out:
read_unlock(&tasklist_lock);
- cgroup_unlock();
}
#endif

2008-12-05 12:33:21

by Daisuke Nishimura

Subject: [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim

mem_cgroup_hierarchical_reclaim() now works properly even when !use_hierarchy,
so it should be used instead of try_to_free_mem_cgroup_pages() in many cases.

The only exception is force_empty. The group has no children in this case.


Signed-off-by: Daisuke Nishimura <[email protected]>
---
mm/memcontrol.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab04725..c0b4f37 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1399,8 +1399,7 @@ int mem_cgroup_shrink_usage(struct mm_struct *mm, gfp_t gfp_mask)
rcu_read_unlock();

do {
- progress = try_to_free_mem_cgroup_pages(mem, gfp_mask, true,
- get_swappiness(mem));
+ progress = mem_cgroup_hierarchical_reclaim(mem, gfp_mask, true);
progress += mem_cgroup_check_under_limit(mem);
} while (!progress && --retry);

@@ -1467,10 +1466,8 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
if (!ret)
break;

- progress = try_to_free_mem_cgroup_pages(memcg,
- GFP_KERNEL,
- false,
- get_swappiness(memcg));
+ progress = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL,
+ false);
if (!progress)
retry_count--;
}

@@ -1514,8 +1511,7 @@ int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
break;

oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
- try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL, true,
- get_swappiness(memcg));
+ mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true);
curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
if (curusage >= oldusage)
retry_count--;

2008-12-05 13:22:57

by Kamezawa Hiroyuki

Subject: Re: [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration

Daisuke Nishimura said:
> I think triggering OOM at mem_cgroup_prepare_migration would be just a bit
> overkill.
> Returning -ENOMEM would be enough for mem_cgroup_prepare_migration.
> The caller would handle the case anyway.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>


> ---
> mm/memcontrol.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4dbce1d..50ee1be 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1330,7 +1330,7 @@ int mem_cgroup_prepare_migration(struct page *page,
> struct mem_cgroup **ptr)
> unlock_page_cgroup(pc);
>
> if (mem) {
> - ret = mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem);
> + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> css_put(&mem->css);
> }
> *ptr = mem;

2008-12-05 13:24:33

by Kamezawa Hiroyuki

Subject: Re: [RFC][PATCH -mmotm 2/4] memcg: remove mem_cgroup_try_charge

Daisuke Nishimura said:
> Now, mem_cgroup_try_charge is not used by anyone, so we can remove it.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>
Not now, but after patch [1/4] ;)
Acked-by: KAMEZAWA Hiroyuki <[email protected]>

> ---
> include/linux/memcontrol.h | 8 --------
> mm/memcontrol.c | 21 +--------------------
> 2 files changed, 1 insertions(+), 28 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index fe82b58..4b35739 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -40,8 +40,6 @@ struct mm_struct;
> extern int mem_cgroup_newpage_charge(struct page *page, struct mm_struct
> *mm,
> gfp_t gfp_mask);
> /* for swap handling */
> -extern int mem_cgroup_try_charge(struct mm_struct *mm,
> - gfp_t gfp_mask, struct mem_cgroup **ptr);
> extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
> struct page *page, gfp_t mask, struct mem_cgroup **ptr);
> extern void mem_cgroup_commit_charge_swapin(struct page *page,
> @@ -135,12 +133,6 @@ static inline int mem_cgroup_cache_charge(struct page
> *page,
> return 0;
> }
>
> -static inline int mem_cgroup_try_charge(struct mm_struct *mm,
> - gfp_t gfp_mask, struct mem_cgroup **ptr)
> -{
> - return 0;
> -}
> -
> static inline int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
> struct page *page, gfp_t gfp_mask, struct mem_cgroup **ptr)
> {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 50ee1be..9c5856b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -808,27 +808,8 @@ nomem:
> return -ENOMEM;
> }
>
> -/**
> - * mem_cgroup_try_charge - get charge of PAGE_SIZE.
> - * @mm: an mm_struct which is charged against. (when *memcg is NULL)
> - * @gfp_mask: gfp_mask for reclaim.
> - * @memcg: a pointer to memory cgroup which is charged against.
> - *
> - * charge against memory cgroup pointed by *memcg. if *memcg == NULL,
> estimated
> - * memory cgroup from @mm is got and stored in *memcg.
> - *
> - * Returns 0 if success. -ENOMEM at failure.
> - * This call can invoke OOM-Killer.
> - */
> -
> -int mem_cgroup_try_charge(struct mm_struct *mm,
> - gfp_t mask, struct mem_cgroup **memcg)
> -{
> - return __mem_cgroup_try_charge(mm, mask, memcg, true);
> -}
> -
> /*
> - * commit a charge got by mem_cgroup_try_charge() and makes page_cgroup
> to be
> + * commit a charge got by __mem_cgroup_try_charge() and makes page_cgroup
> to be
> * USED state. If already USED, uncharge and return.
> */
>

2008-12-05 13:27:26

by Kamezawa Hiroyuki

Subject: Re: [RFC][PATCH -mmotm 3/4] memcg: avoid deadlock caused by race between oom and cpuset_attach

Daisuke Nishimura said:
> mpol_rebind_mm(), which can be called from cpuset_attach(), does
> down_write(mm->mmap_sem).
> This means down_write(mm->mmap_sem) can be called under cgroup_mutex.
>
> OTOH, page fault path does down_read(mm->mmap_sem) and calls
> mem_cgroup_try_charge_xxx(),
> which may eventually call mem_cgroup_out_of_memory(). And
> mem_cgroup_out_of_memory()
> calls cgroup_lock().
> This means cgroup_lock() can be called under down_read(mm->mmap_sem).
>
good catch.

> If those two paths race, a deadlock can happen.
>
> This patch avoids this deadlock by:
> - removing cgroup_lock() from mem_cgroup_out_of_memory().
I agree with this.

> - defining a new mutex (memcg_tasklist) and serializing mem_cgroup_move_task()
>   (the ->attach handler of the memory cgroup) and mem_cgroup_out_of_memory().
>
Hmm... this seems like a temporary fix (and it adds a new global lock...).
But OK, we need a fix; let's revisit this later.

Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>

2008-12-05 13:30:52

by Kamezawa Hiroyuki

Subject: Re: [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim

Daisuke Nishimura said:
> mem_cgroup_hierarchical_reclaim() now works properly even when !use_hierarchy,
> so it should be used instead of try_to_free_mem_cgroup_pages() in many cases.
>
> The only exception is force_empty. The group has no children in this case.
>
Hmm... I had postponed this until cgroup_lock was removed from the reclaim path.
But OK, this is a reasonable way to go.
Acked-by: KAMEZAWA Hiroyuki <[email protected]>


2008-12-05 13:52:40

by Balbir Singh

Subject: Re: [RFC][PATCH -mmotm 3/4] memcg: avoid deadlock caused by race between oom and cpuset_attach

* Daisuke Nishimura <[email protected]> [2008-12-05 21:24:50]:

> mpol_rebind_mm(), which can be called from cpuset_attach(), does down_write(mm->mmap_sem).
> This means down_write(mm->mmap_sem) can be called under cgroup_mutex.
>
> OTOH, page fault path does down_read(mm->mmap_sem) and calls mem_cgroup_try_charge_xxx(),
> which may eventually call mem_cgroup_out_of_memory(). And mem_cgroup_out_of_memory()
> calls cgroup_lock().
> This means cgroup_lock() can be called under down_read(mm->mmap_sem).
>
> If those two paths race, a deadlock can happen.
>
> This patch avoids this deadlock by:
> - removing cgroup_lock() from mem_cgroup_out_of_memory().
> - defining a new mutex (memcg_tasklist) and serializing mem_cgroup_move_task()
>   (the ->attach handler of the memory cgroup) and mem_cgroup_out_of_memory().

A similar race has been reported for cpuset_migrate_mm(), which is
called with cgroup_mutex held and in turn calls do_migrate_pages(),
which can enter reclaim and thus try to acquire cgroup_lock(). If we
avoid reclaiming pages in cpuset_migrate_mm(), as the first patch
did, that also solves the reported race.

>
> Signed-off-by: Daisuke Nishimura <[email protected]>

Looks good to me

Acked-by: Balbir Singh <[email protected]>

--
Balbir

2008-12-05 13:54:53

by Balbir Singh

Subject: Re: [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim

* Daisuke Nishimura <[email protected]> [2008-12-05 21:25:29]:

> mem_cgroup_hierarchical_reclaim() now works properly even when !use_hierarchy,
> so it should be used instead of try_to_free_mem_cgroup_pages() in many cases.
>

Yes, that was by design. use_hierarchy is set for all children when
the parent has it set, and the resource counters are linked so that a
charge propagates up to the root of the current hierarchy and no
further.
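
As a rough user-space sketch of that linking (the struct and function
below are made-up stand-ins, not the kernel's res_counter code, and the
real code also triggers reclaim and unwinds partial charges on failure),
a charge walks up the parent chain and stops at the root of the current
hierarchy:

#include <stdbool.h>
#include <stdio.h>

struct counter {
        const char *name;
        unsigned long usage;
        unsigned long limit;
        struct counter *parent;         /* NULL at the root of the hierarchy */
};

/* Charge @val against @c and every ancestor up to the hierarchy root. */
static bool charge(struct counter *c, unsigned long val)
{
        struct counter *it;

        for (it = c; it; it = it->parent)
                if (it->usage + val > it->limit)
                        return false;   /* over limit somewhere in the chain */
        for (it = c; it; it = it->parent)
                it->usage += val;
        return true;
}

int main(void)
{
        struct counter root  = { "root",  0, 400, NULL };
        struct counter child = { "child", 0, 100, &root };

        charge(&child, 64);
        printf("child=%lu root=%lu\n", child.usage, root.usage);
        return 0;
}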

> The only exception is force_empty. The group has no children in this case.
>
>
> Signed-off-by: Daisuke Nishimura <[email protected]>
> ---
> mm/memcontrol.c | 12 ++++--------
> 1 files changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ab04725..c0b4f37 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1399,8 +1399,7 @@ int mem_cgroup_shrink_usage(struct mm_struct *mm, gfp_t gfp_mask)
> rcu_read_unlock();
>
> do {
> - progress = try_to_free_mem_cgroup_pages(mem, gfp_mask, true,
> - get_swappiness(mem));
> + progress = mem_cgroup_hierarchical_reclaim(mem, gfp_mask, true);
> progress += mem_cgroup_check_under_limit(mem);
> } while (!progress && --retry);
>
> @@ -1467,10 +1466,8 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
> if (!ret)
> break;
>
> - progress = try_to_free_mem_cgroup_pages(memcg,
> - GFP_KERNEL,
> - false,
> - get_swappiness(memcg));
> + progress = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL,
> + false);
> if (!progress)
> retry_count--;
> }
>
> @@ -1514,8 +1511,7 @@ int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
> break;
>
> oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
> - try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL, true,
> - get_swappiness(memcg));
> + mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true);
> curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
> if (curusage >= oldusage)
> retry_count--;
>

Looks good to me

Acked-by: Balbir Singh <[email protected]>

--
Balbir

2008-12-05 13:56:44

by Balbir Singh

Subject: Re: [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration

* Daisuke Nishimura <[email protected]> [2008-12-05 21:23:04]:

> I think triggering OOM at mem_cgroup_prepare_migration would be just a bit
> overkill.
> Returning -ENOMEM would be enough for mem_cgroup_prepare_migration.
> The caller would handle the case anyway.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>
> ---
> mm/memcontrol.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4dbce1d..50ee1be 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1330,7 +1330,7 @@ int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
> unlock_page_cgroup(pc);
>
> if (mem) {
> - ret = mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem);
> + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> css_put(&mem->css);
> }
> *ptr = mem;
>

Seems reasonable to me. It would be nice to add a comment, or a
noreclaim wrapper around __mem_cgroup_try_charge(), indicating that no
reclaim will take place.

Acked-by: Balbir Singh <[email protected]>

--
Balbir

2008-12-06 02:48:24

by Daisuke Nishimura

Subject: Re: [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration

On Fri, 5 Dec 2008 19:09:26 +0530
Balbir Singh <[email protected]> wrote:

> * Daisuke Nishimura <[email protected]> [2008-12-05 21:23:04]:
>
> > I think triggering OOM at mem_cgroup_prepare_migration would be just a bit
> > overkill.
> > Returning -ENOMEM would be enough for mem_cgroup_prepare_migration.
> > The caller would handle the case anyway.
> >
> > Signed-off-by: Daisuke Nishimura <[email protected]>
> > ---
> > mm/memcontrol.c | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 4dbce1d..50ee1be 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1330,7 +1330,7 @@ int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
> > unlock_page_cgroup(pc);
> >
> > if (mem) {
> > - ret = mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem);
> > + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> > css_put(&mem->css);
> > }
> > *ptr = mem;
> >
>
> Seems reasonable to me. It would be nice to add a comment, or a
> noreclaim wrapper around __mem_cgroup_try_charge(), indicating that no
> reclaim will take place.
>
Ah, this flag to __mem_cgroup_try_charge() doesn't mean "don't reclaim",
but "don't trigger oom after it has tried to free memory but couldn't
free enough after all".
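
In other words, only the OOM kill at the end of the retry loop depends
on the flag; reclaim is attempted either way. A simplified user-space
sketch (the helpers below are stubs made up for illustration, not kernel
APIs):

#include <stdbool.h>

static bool charge_would_exceed_limit(void) { return true; }    /* stub */
static bool try_to_free_some_memory(void)   { return false; }   /* stub: reclaim */
static void out_of_memory(void)             { }                 /* stub: OOM kill */

static int try_charge(bool oom)
{
        int nr_retries = 3;

        while (charge_would_exceed_limit()) {
                /* Reclaim is attempted whether or not @oom is set... */
                if (try_to_free_some_memory())
                        continue;
                if (!nr_retries--) {
                        /* ...only the OOM kill is gated by it. */
                        if (oom)
                                out_of_memory();
                        return -1;      /* -ENOMEM */
                }
        }
        return 0;
}

int main(void)
{
        /* Page migration would pass oom=false: fail quietly, no OOM kill. */
        return try_charge(false) ? 1 : 0;
}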

Thanks,
Daisuke Nishimura.

> Acked-by: Balbir Singh <[email protected]>
>
> --
> Balbir
>

2008-12-06 03:35:28

by Daisuke Nishimura

Subject: Re: [RFC][PATCH -mmotm 3/4] memcg: avoid deadlock caused by race between oom and cpuset_attach

On Fri, 5 Dec 2008 22:27:10 +0900 (JST)
"KAMEZAWA Hiroyuki" <[email protected]> wrote:

> Daisuke Nishimura said:
> > mpol_rebind_mm(), which can be called from cpuset_attach(), does
> > down_write(mm->mmap_sem).
> > This means down_write(mm->mmap_sem) can be called under cgroup_mutex.
> >
> > OTOH, page fault path does down_read(mm->mmap_sem) and calls
> > mem_cgroup_try_charge_xxx(),
> > which may eventually call mem_cgroup_out_of_memory(). And
> > mem_cgroup_out_of_memory()
> > calls cgroup_lock().
> > This means cgroup_lock() can be called under down_read(mm->mmap_sem).
> >
> good catch.
>
Thanks.

> > If those two paths race, a deadlock can happen.
> >
> > This patch avoids this deadlock by:
> > - removing cgroup_lock() from mem_cgroup_out_of_memory().
> I agree with this.
>
> > - defining a new mutex (memcg_tasklist) and serializing mem_cgroup_move_task()
> >   (the ->attach handler of the memory cgroup) and mem_cgroup_out_of_memory().
> >
> Hmm... this seems like a temporary fix (and it adds a new global lock...).
> But OK, we need a fix; let's revisit this later.
>
I agree.

> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
>
Thank you.


Daisuke Nishimura.

2008-12-06 03:42:45

by Daisuke Nishimura

Subject: Re: [RFC][PATCH -mmotm 0/4] patches for memory cgroup (20081205)

On Fri, 5 Dec 2008 21:22:08 +0900
Daisuke Nishimura <[email protected]> wrote:

> Hi.
>
> These are patches that I have now.
>
> Patches:
> [1/4] memcg: don't trigger oom at page migration
> [2/4] memcg: remove mem_cgroup_try_charge
> [3/4] memcg: avoid deadlock caused by race between oom and cpuset_attach
> [4/4] memcg: change try_to_free_pages to hierarchical_reclaim
>
> There is no special meaning to the patch order, except that 1 must come before 2.
>
> Any comments would be welcome.
>
Thank you for all your reviews and acks.

I'll send them to Andrew after I get back to my office on Monday.


Thanks,
Daisuke Nishimura.

2008-12-06 08:20:39

by Balbir Singh

Subject: Re: [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration

* Daisuke Nishimura <[email protected]> [2008-12-06 11:47:57]:

> On Fri, 5 Dec 2008 19:09:26 +0530
> Balbir Singh <[email protected]> wrote:
>
> > * Daisuke Nishimura <[email protected]> [2008-12-05 21:23:04]:
> >
> > > I think triggering OOM at mem_cgroup_prepare_migration would be just a bit
> > > overkill.
> > > Returning -ENOMEM would be enough for mem_cgroup_prepare_migration.
> > > The caller would handle the case anyway.
> > >
> > > Signed-off-by: Daisuke Nishimura <[email protected]>
> > > ---
> > > mm/memcontrol.c | 2 +-
> > > 1 files changed, 1 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 4dbce1d..50ee1be 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -1330,7 +1330,7 @@ int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
> > > unlock_page_cgroup(pc);
> > >
> > > if (mem) {
> > > - ret = mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem);
> > > + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> > > css_put(&mem->css);
> > > }
> > > *ptr = mem;
> > >
> >
> > Seems reasonable to me. It would be nice to add a comment, or a
> > noreclaim wrapper around __mem_cgroup_try_charge(), indicating that no
> > reclaim will take place.
> >
> Ah, this flag to __mem_cgroup_try_charge() doesn't mean "don't reclaim",
> but "don't trigger oom after it has tried to free memory but couldn't
> free enough after all".

Ah, I misread the parameter. Thanks for clarifying!

>
> Thanks,
> Daisuke Nishimura.
>
> > Acked-by: Balbir Singh <[email protected]>
> >
> > --
> > Balbir
> >
>

--
Balbir