2009-06-04 05:12:26

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: [PATCH] remove memory.limit v.s. memsw.limit comparison.

From: KAMEZAWA Hiroyuki <[email protected]>

Removes memory.limit < memsw.limit at setting limit check completely.

The limitation "memory.limit <= memsw.limit" was added just because
it seems sane ...if memory.limit > memsw.limit, only memsw.limit works.

But To implement this limitation, we needed to use private mutex and make
the code a bit complated.
As Nishimura pointed out, in real world, there are people who only want
to use memsw.limit.

Then, this patch removes the check. user-land library or middleware can check
this in userland easily if this really concerns.

And this is a good change to charge-and-reclaim.

Now, memory.limit is always checked before memsw.limit
and it may do swap-out. But, if memory.limit == memsw.limit, swap-out is
finally no help and hits memsw.limit again. So, let's allow the condition
memory.limit > memsw.limit. Then we can skip unnecesary swap-out.

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
Documentation/cgroups/memory.txt | 15 +++++++++++----
mm/memcontrol.c | 33 +--------------------------------
2 files changed, 12 insertions(+), 36 deletions(-)

Index: mmotm-2.6.30-Jun3/mm/memcontrol.c
===================================================================
--- mmotm-2.6.30-Jun3.orig/mm/memcontrol.c
+++ mmotm-2.6.30-Jun3/mm/memcontrol.c
@@ -1713,14 +1713,11 @@ int mem_cgroup_shmem_charge_fallback(str
return ret;
}

-static DEFINE_MUTEX(set_limit_mutex);
-
static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
unsigned long long val)
{
int retry_count;
int progress;
- u64 memswlimit;
int ret = 0;
int children = mem_cgroup_count_children(memcg);
u64 curusage, oldusage;
@@ -1739,20 +1736,7 @@ static int mem_cgroup_resize_limit(struc
ret = -EINTR;
break;
}
- /*
- * Rather than hide all in some function, I do this in
- * open coded manner. You see what this really does.
- * We have to guarantee mem->res.limit < mem->memsw.limit.
- */
- mutex_lock(&set_limit_mutex);
- memswlimit = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
- if (memswlimit < val) {
- ret = -EINVAL;
- mutex_unlock(&set_limit_mutex);
- break;
- }
ret = res_counter_set_limit(&memcg->res, val);
- mutex_unlock(&set_limit_mutex);

if (!ret)
break;
@@ -1774,7 +1758,7 @@ static int mem_cgroup_resize_memsw_limit
unsigned long long val)
{
int retry_count;
- u64 memlimit, oldusage, curusage;
+ u64 oldusage, curusage;
int children = mem_cgroup_count_children(memcg);
int ret = -EBUSY;

@@ -1786,24 +1770,9 @@ static int mem_cgroup_resize_memsw_limit
ret = -EINTR;
break;
}
- /*
- * Rather than hide all in some function, I do this in
- * open coded manner. You see what this really does.
- * We have to guarantee mem->res.limit < mem->memsw.limit.
- */
- mutex_lock(&set_limit_mutex);
- memlimit = res_counter_read_u64(&memcg->res, RES_LIMIT);
- if (memlimit > val) {
- ret = -EINVAL;
- mutex_unlock(&set_limit_mutex);
- break;
- }
ret = res_counter_set_limit(&memcg->memsw, val);
- mutex_unlock(&set_limit_mutex);
-
if (!ret)
break;
-
mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true, true);
curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
/* Usage is reduced ? */
Index: mmotm-2.6.30-Jun3/Documentation/cgroups/memory.txt
===================================================================
--- mmotm-2.6.30-Jun3.orig/Documentation/cgroups/memory.txt
+++ mmotm-2.6.30-Jun3/Documentation/cgroups/memory.txt
@@ -155,11 +155,18 @@ usage of mem+swap is limited by memsw.li
Note: why 'mem+swap' rather than swap.
The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
to move account from memory to swap...there is no change in usage of
-mem+swap.
+mem+swap. In other words, when we want to limit the usage of swap
+without affecting global LRU, mem+swap limit is better than just limiting
+swap from OS point of view.
+
+
+memory.limit v.s. memsw.limit
+
+There are no guarantee that memsw.limit is bigger than memory.limit
+in the kernel. The user should notice what he really wants and use
+proper size for limitation. Of course, if memsw.limit < memory.limit,
+only memsw.limit works sane.

-In other words, when we want to limit the usage of swap without affecting
-global LRU, mem+swap limit is better than just limiting swap from OS point
-of view.

2.5 Reclaim


2009-06-04 05:38:32

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [PATCH] remove memory.limit v.s. memsw.limit comparison.

On Thu, 4 Jun 2009 14:10:43 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> From: KAMEZAWA Hiroyuki <[email protected]>
>
> Removes memory.limit < memsw.limit at setting limit check completely.
>
> The limitation "memory.limit <= memsw.limit" was added just because
> it seems sane ...if memory.limit > memsw.limit, only memsw.limit works.
>
> But To implement this limitation, we needed to use private mutex and make
> the code a bit complated.
> As Nishimura pointed out, in real world, there are people who only want
> to use memsw.limit.
>
> Then, this patch removes the check. user-land library or middleware can check
> this in userland easily if this really concerns.
>
> And this is a good change to charge-and-reclaim.
>
> Now, memory.limit is always checked before memsw.limit
> and it may do swap-out. But, if memory.limit == memsw.limit, swap-out is
> finally no help and hits memsw.limit again. So, let's allow the condition
> memory.limit > memsw.limit. Then we can skip unnecesary swap-out.
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>

I welcome this change very much :)

Reviewed-by: Daisuke Nishimura <[email protected]>

Thanks,
Daisuke Nishimura.

> ---
> Documentation/cgroups/memory.txt | 15 +++++++++++----
> mm/memcontrol.c | 33 +--------------------------------
> 2 files changed, 12 insertions(+), 36 deletions(-)
>
> Index: mmotm-2.6.30-Jun3/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.30-Jun3.orig/mm/memcontrol.c
> +++ mmotm-2.6.30-Jun3/mm/memcontrol.c
> @@ -1713,14 +1713,11 @@ int mem_cgroup_shmem_charge_fallback(str
> return ret;
> }
>
> -static DEFINE_MUTEX(set_limit_mutex);
> -
> static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
> unsigned long long val)
> {
> int retry_count;
> int progress;
> - u64 memswlimit;
> int ret = 0;
> int children = mem_cgroup_count_children(memcg);
> u64 curusage, oldusage;
> @@ -1739,20 +1736,7 @@ static int mem_cgroup_resize_limit(struc
> ret = -EINTR;
> break;
> }
> - /*
> - * Rather than hide all in some function, I do this in
> - * open coded manner. You see what this really does.
> - * We have to guarantee mem->res.limit < mem->memsw.limit.
> - */
> - mutex_lock(&set_limit_mutex);
> - memswlimit = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
> - if (memswlimit < val) {
> - ret = -EINVAL;
> - mutex_unlock(&set_limit_mutex);
> - break;
> - }
> ret = res_counter_set_limit(&memcg->res, val);
> - mutex_unlock(&set_limit_mutex);
>
> if (!ret)
> break;
> @@ -1774,7 +1758,7 @@ static int mem_cgroup_resize_memsw_limit
> unsigned long long val)
> {
> int retry_count;
> - u64 memlimit, oldusage, curusage;
> + u64 oldusage, curusage;
> int children = mem_cgroup_count_children(memcg);
> int ret = -EBUSY;
>
> @@ -1786,24 +1770,9 @@ static int mem_cgroup_resize_memsw_limit
> ret = -EINTR;
> break;
> }
> - /*
> - * Rather than hide all in some function, I do this in
> - * open coded manner. You see what this really does.
> - * We have to guarantee mem->res.limit < mem->memsw.limit.
> - */
> - mutex_lock(&set_limit_mutex);
> - memlimit = res_counter_read_u64(&memcg->res, RES_LIMIT);
> - if (memlimit > val) {
> - ret = -EINVAL;
> - mutex_unlock(&set_limit_mutex);
> - break;
> - }
> ret = res_counter_set_limit(&memcg->memsw, val);
> - mutex_unlock(&set_limit_mutex);
> -
> if (!ret)
> break;
> -
> mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true, true);
> curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
> /* Usage is reduced ? */
> Index: mmotm-2.6.30-Jun3/Documentation/cgroups/memory.txt
> ===================================================================
> --- mmotm-2.6.30-Jun3.orig/Documentation/cgroups/memory.txt
> +++ mmotm-2.6.30-Jun3/Documentation/cgroups/memory.txt
> @@ -155,11 +155,18 @@ usage of mem+swap is limited by memsw.li
> Note: why 'mem+swap' rather than swap.
> The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
> to move account from memory to swap...there is no change in usage of
> -mem+swap.
> +mem+swap. In other words, when we want to limit the usage of swap
> +without affecting global LRU, mem+swap limit is better than just limiting
> +swap from OS point of view.
> +
> +
> +memory.limit v.s. memsw.limit
> +
> +There are no guarantee that memsw.limit is bigger than memory.limit
> +in the kernel. The user should notice what he really wants and use
> +proper size for limitation. Of course, if memsw.limit < memory.limit,
> +only memsw.limit works sane.
>
> -In other words, when we want to limit the usage of swap without affecting
> -global LRU, mem+swap limit is better than just limiting swap from OS point
> -of view.
>
> 2.5 Reclaim
>
>

2009-06-04 08:31:23

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH] remove memory.limit v.s. memsw.limit comparison.

* KAMEZAWA Hiroyuki <[email protected]> [2009-06-04 14:10:43]:

> From: KAMEZAWA Hiroyuki <[email protected]>
>
> Removes memory.limit < memsw.limit at setting limit check completely.
>
> The limitation "memory.limit <= memsw.limit" was added just because
> it seems sane ...if memory.limit > memsw.limit, only memsw.limit works.
>
> But To implement this limitation, we needed to use private mutex and make
> the code a bit complated.
> As Nishimura pointed out, in real world, there are people who only want
> to use memsw.limit.
>
> Then, this patch removes the check. user-land library or middleware can check
> this in userland easily if this really concerns.
>
> And this is a good change to charge-and-reclaim.
>
> Now, memory.limit is always checked before memsw.limit
> and it may do swap-out. But, if memory.limit == memsw.limit, swap-out is
> finally no help and hits memsw.limit again. So, let's allow the condition
> memory.limit > memsw.limit. Then we can skip unnecesary swap-out.
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>

We can't change behaviour this way without breaking userspace scripts,
API and code. What does it mean for people already using these
features? Does it break their workflow?

--
Balbir

2009-06-04 08:40:57

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH] remove memory.limit v.s. memsw.limit comparison.

On Thu, 4 Jun 2009 16:31:10 +0800
Balbir Singh <[email protected]> wrote:

> * KAMEZAWA Hiroyuki <[email protected]> [2009-06-04 14:10:43]:
>
> > From: KAMEZAWA Hiroyuki <[email protected]>
> >
> > Removes memory.limit < memsw.limit at setting limit check completely.
> >
> > The limitation "memory.limit <= memsw.limit" was added just because
> > it seems sane ...if memory.limit > memsw.limit, only memsw.limit works.
> >
> > But To implement this limitation, we needed to use private mutex and make
> > the code a bit complated.
> > As Nishimura pointed out, in real world, there are people who only want
> > to use memsw.limit.
> >
> > Then, this patch removes the check. user-land library or middleware can check
> > this in userland easily if this really concerns.
> >
> > And this is a good change to charge-and-reclaim.
> >
> > Now, memory.limit is always checked before memsw.limit
> > and it may do swap-out. But, if memory.limit == memsw.limit, swap-out is
> > finally no help and hits memsw.limit again. So, let's allow the condition
> > memory.limit > memsw.limit. Then we can skip unnecesary swap-out.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
>
> We can't change behaviour this way without breaking userspace scripts,
> API and code. What does it mean for people already using these
> features? Does it break their workflow?
>

Hopefully no breaks to current users's workflow.
Because this just has influences to "error path" like below

echo 200M > memory.memsw.limit
echo 300M > memory.limit
=> ERROR

If the user program made in sane way, above case will never happens because
they set memsw.limit to be greater than memory.limit and above is treated as error.


Thanks,
-Kame

2009-06-04 12:36:36

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH] remove memory.limit v.s. memsw.limit comparison.

* KAMEZAWA Hiroyuki <[email protected]> [2009-06-04 14:10:43]:

> From: KAMEZAWA Hiroyuki <[email protected]>
>
> Removes memory.limit < memsw.limit at setting limit check completely.
>
> The limitation "memory.limit <= memsw.limit" was added just because
> it seems sane ...if memory.limit > memsw.limit, only memsw.limit works.
>
> But To implement this limitation, we needed to use private mutex and make
> the code a bit complated.
> As Nishimura pointed out, in real world, there are people who only want
> to use memsw.limit.
>
> Then, this patch removes the check. user-land library or middleware can check
> this in userland easily if this really concerns.
>
> And this is a good change to charge-and-reclaim.
>
> Now, memory.limit is always checked before memsw.limit
> and it may do swap-out. But, if memory.limit == memsw.limit, swap-out is
> finally no help and hits memsw.limit again. So, let's allow the condition
> memory.limit > memsw.limit. Then we can skip unnecesary swap-out.
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> ---

There is one other option, we could set memory.limit_in_bytes ==
memory.memsw.limit_in_bytes provided it is set to LONG_LONG_MAX. I am
not convinced that we should allow memsw.limit_in_bytes to be less
that limit_in_bytes, it will create confusion and the API is already
exposed.


> Documentation/cgroups/memory.txt | 15 +++++++++++----
> mm/memcontrol.c | 33 +--------------------------------
> 2 files changed, 12 insertions(+), 36 deletions(-)
>
> Index: mmotm-2.6.30-Jun3/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.30-Jun3.orig/mm/memcontrol.c
> +++ mmotm-2.6.30-Jun3/mm/memcontrol.c
> @@ -1713,14 +1713,11 @@ int mem_cgroup_shmem_charge_fallback(str
> return ret;
> }
>
> -static DEFINE_MUTEX(set_limit_mutex);
> -
> static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
> unsigned long long val)
> {
> int retry_count;
> int progress;
> - u64 memswlimit;
> int ret = 0;
> int children = mem_cgroup_count_children(memcg);
> u64 curusage, oldusage;
> @@ -1739,20 +1736,7 @@ static int mem_cgroup_resize_limit(struc
> ret = -EINTR;
> break;
> }
> - /*
> - * Rather than hide all in some function, I do this in
> - * open coded manner. You see what this really does.
> - * We have to guarantee mem->res.limit < mem->memsw.limit.
> - */
> - mutex_lock(&set_limit_mutex);
> - memswlimit = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
> - if (memswlimit < val) {
> - ret = -EINVAL;
> - mutex_unlock(&set_limit_mutex);
> - break;
> - }
> ret = res_counter_set_limit(&memcg->res, val);
> - mutex_unlock(&set_limit_mutex);
>
> if (!ret)
> break;
> @@ -1774,7 +1758,7 @@ static int mem_cgroup_resize_memsw_limit
> unsigned long long val)
> {
> int retry_count;
> - u64 memlimit, oldusage, curusage;
> + u64 oldusage, curusage;
> int children = mem_cgroup_count_children(memcg);
> int ret = -EBUSY;
>
> @@ -1786,24 +1770,9 @@ static int mem_cgroup_resize_memsw_limit
> ret = -EINTR;
> break;
> }
> - /*
> - * Rather than hide all in some function, I do this in
> - * open coded manner. You see what this really does.
> - * We have to guarantee mem->res.limit < mem->memsw.limit.
> - */
> - mutex_lock(&set_limit_mutex);
> - memlimit = res_counter_read_u64(&memcg->res, RES_LIMIT);
> - if (memlimit > val) {
> - ret = -EINVAL;
> - mutex_unlock(&set_limit_mutex);
> - break;
> - }
> ret = res_counter_set_limit(&memcg->memsw, val);
> - mutex_unlock(&set_limit_mutex);
> -
> if (!ret)
> break;
> -
> mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true, true);
> curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
> /* Usage is reduced ? */
> Index: mmotm-2.6.30-Jun3/Documentation/cgroups/memory.txt
> ===================================================================
> --- mmotm-2.6.30-Jun3.orig/Documentation/cgroups/memory.txt
> +++ mmotm-2.6.30-Jun3/Documentation/cgroups/memory.txt
> @@ -155,11 +155,18 @@ usage of mem+swap is limited by memsw.li
> Note: why 'mem+swap' rather than swap.
> The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
> to move account from memory to swap...there is no change in usage of
> -mem+swap.
> +mem+swap. In other words, when we want to limit the usage of swap
> +without affecting global LRU, mem+swap limit is better than just limiting
> +swap from OS point of view.
> +
> +
> +memory.limit v.s. memsw.limit
> +
> +There are no guarantee that memsw.limit is bigger than memory.limit
> +in the kernel. The user should notice what he really wants and use
> +proper size for limitation. Of course, if memsw.limit < memory.limit,
> +only memsw.limit works sane.

I think this needs rewording (if we go with this patch)

We should say that the lower of the two limits will be imposed. If
memory.memsw.limit_in_bytes < memory.limit_in_bytes then swap is not
used for the cgroup.

>
> -In other words, when we want to limit the usage of swap without affecting
> -global LRU, mem+swap limit is better than just limiting swap from OS point
> -of view.
>
> 2.5 Reclaim
>
>

--
Balbir

2009-06-04 15:45:18

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH] remove memory.limit v.s. memsw.limit comparison.

Balbir Singh wrote:
> * KAMEZAWA Hiroyuki <[email protected]> [2009-06-04
> 14:10:43]:
>
>> From: KAMEZAWA Hiroyuki <[email protected]>
>>
>> Removes memory.limit < memsw.limit at setting limit check completely.
>>
>> The limitation "memory.limit <= memsw.limit" was added just because
>> it seems sane ...if memory.limit > memsw.limit, only memsw.limit works.
>>
>> But To implement this limitation, we needed to use private mutex and
>> make
>> the code a bit complated.
>> As Nishimura pointed out, in real world, there are people who only want
>> to use memsw.limit.
>>
>> Then, this patch removes the check. user-land library or middleware can
>> check
>> this in userland easily if this really concerns.
>>
>> And this is a good change to charge-and-reclaim.
>>
>> Now, memory.limit is always checked before memsw.limit
>> and it may do swap-out. But, if memory.limit == memsw.limit, swap-out is
>> finally no help and hits memsw.limit again. So, let's allow the
>> condition
>> memory.limit > memsw.limit. Then we can skip unnecesary swap-out.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
>> ---
>
> There is one other option, we could set memory.limit_in_bytes ==
> memory.memsw.limit_in_bytes provided it is set to LONG_LONG_MAX. I am
> not convinced that we should allow memsw.limit_in_bytes to be less
> that limit_in_bytes, it will create confusion and the API is already
> exposed.
>
Ahhhh, I get your point.
if memory.memsw.limit_in_bytes < memory.limit_in_bytes, no swap will
be used bacause currnet try_to_free_pages() for memcg skips swap-out.
Then, only global-LRU will use swap.
This behavior is not easy to understand.

Sorry, I don't push this patch as this is. But adding documentation about
"What happens when you set memory.limit == memsw.limit" will be necessary.

...maybe give all jobs to user-land and keep the kernel as it is now
is a good choice.

BTW, I'd like to avoid useless swap-out in memory.limit == memsw.limit case.
If someone has good idea, please :(

Thanks,
-Kame

2009-06-05 00:40:38

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [PATCH] remove memory.limit v.s. memsw.limit comparison.

On Fri, 5 Jun 2009 00:45:03 +0900 (JST), "KAMEZAWA Hiroyuki" <[email protected]> wrote:
> Balbir Singh wrote:
> > * KAMEZAWA Hiroyuki <[email protected]> [2009-06-04
> > 14:10:43]:
> >
> >> From: KAMEZAWA Hiroyuki <[email protected]>
> >>
> >> Removes memory.limit < memsw.limit at setting limit check completely.
> >>
> >> The limitation "memory.limit <= memsw.limit" was added just because
> >> it seems sane ...if memory.limit > memsw.limit, only memsw.limit works.
> >>
> >> But To implement this limitation, we needed to use private mutex and
> >> make
> >> the code a bit complated.
> >> As Nishimura pointed out, in real world, there are people who only want
> >> to use memsw.limit.
> >>
> >> Then, this patch removes the check. user-land library or middleware can
> >> check
> >> this in userland easily if this really concerns.
> >>
> >> And this is a good change to charge-and-reclaim.
> >>
> >> Now, memory.limit is always checked before memsw.limit
> >> and it may do swap-out. But, if memory.limit == memsw.limit, swap-out is
> >> finally no help and hits memsw.limit again. So, let's allow the
> >> condition
> >> memory.limit > memsw.limit. Then we can skip unnecesary swap-out.
> >>
> >> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> >> ---
> >
> > There is one other option, we could set memory.limit_in_bytes ==
> > memory.memsw.limit_in_bytes provided it is set to LONG_LONG_MAX. I am
> > not convinced that we should allow memsw.limit_in_bytes to be less
> > that limit_in_bytes, it will create confusion and the API is already
> > exposed.
> >
> Ahhhh, I get your point.
> if memory.memsw.limit_in_bytes < memory.limit_in_bytes, no swap will
> be used bacause currnet try_to_free_pages() for memcg skips swap-out.
> Then, only global-LRU will use swap.
> This behavior is not easy to understand.
>
> Sorry, I don't push this patch as this is. But adding documentation about
> "What happens when you set memory.limit == memsw.limit" will be necessary.
>
I agree.

> ...maybe give all jobs to user-land and keep the kernel as it is now
> is a good choice.
>
> BTW, I'd like to avoid useless swap-out in memory.limit == memsw.limit case.
> If someone has good idea, please :(
>
I think so too.

>From my simple thoughts, how about changing __mem_cgroup_try_charge() like:

1. initialize "noswap" as "bool noswap = !!(mem->res.limit == mem->memsw.limit)".
2. add check "if (mem->res.limit == mem->memsw.limit)" on charge failure to mem->res
and set "noswap" to true if needed.
3. charge mem->memsw before mem->res.

There would be other ideas, but I prefer 1 among these choices.


Thanks,
Daisuke Nishimura.

2009-06-05 01:10:31

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH] remove memory.limit v.s. memsw.limit comparison.

* KAMEZAWA Hiroyuki <[email protected]> [2009-06-05 00:45:03]:

> Balbir Singh wrote:
> > * KAMEZAWA Hiroyuki <[email protected]> [2009-06-04
> > 14:10:43]:
> >
> >> From: KAMEZAWA Hiroyuki <[email protected]>
> >>
> >> Removes memory.limit < memsw.limit at setting limit check completely.
> >>
> >> The limitation "memory.limit <= memsw.limit" was added just because
> >> it seems sane ...if memory.limit > memsw.limit, only memsw.limit works.
> >>
> >> But To implement this limitation, we needed to use private mutex and
> >> make
> >> the code a bit complated.
> >> As Nishimura pointed out, in real world, there are people who only want
> >> to use memsw.limit.
> >>
> >> Then, this patch removes the check. user-land library or middleware can
> >> check
> >> this in userland easily if this really concerns.
> >>
> >> And this is a good change to charge-and-reclaim.
> >>
> >> Now, memory.limit is always checked before memsw.limit
> >> and it may do swap-out. But, if memory.limit == memsw.limit, swap-out is
> >> finally no help and hits memsw.limit again. So, let's allow the
> >> condition
> >> memory.limit > memsw.limit. Then we can skip unnecesary swap-out.
> >>
> >> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> >> ---
> >
> > There is one other option, we could set memory.limit_in_bytes ==
> > memory.memsw.limit_in_bytes provided it is set to LONG_LONG_MAX. I am
> > not convinced that we should allow memsw.limit_in_bytes to be less
> > that limit_in_bytes, it will create confusion and the API is already
> > exposed.
> >
> Ahhhh, I get your point.
> if memory.memsw.limit_in_bytes < memory.limit_in_bytes, no swap will
> be used bacause currnet try_to_free_pages() for memcg skips swap-out.
> Then, only global-LRU will use swap.
> This behavior is not easy to understand.
>
> Sorry, I don't push this patch as this is. But adding documentation about
> "What happens when you set memory.limit == memsw.limit" will be necessary.
>
> ...maybe give all jobs to user-land and keep the kernel as it is now
> is a good choice.

Yes, probably and with libcgroup and configuration, defaults should
not be hard to setup. Worst case we can use a script to setup both the
values.

>
> BTW, I'd like to avoid useless swap-out in memory.limit == memsw.limit case.
> If someone has good idea, please :(
>

Are you seeing swap even with memory.limit == memory.memsw.limit? Only
global pressure should cause swapout, no?

--
Balbir

2009-06-05 02:20:48

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH] remove memory.limit v.s. memsw.limit comparison.

On Fri, 5 Jun 2009 09:10:19 +0800
Balbir Singh <[email protected]> wrote:

> * KAMEZAWA Hiroyuki <[email protected]> [2009-06-05 00:45:03]:
>
> > Balbir Singh wrote:
> > > * KAMEZAWA Hiroyuki <[email protected]> [2009-06-04
> > > 14:10:43]:
> > >
> > >> From: KAMEZAWA Hiroyuki <[email protected]>
> > >>
> > >> Removes memory.limit < memsw.limit at setting limit check completely.
> > >>
> > >> The limitation "memory.limit <= memsw.limit" was added just because
> > >> it seems sane ...if memory.limit > memsw.limit, only memsw.limit works.
> > >>
> > >> But To implement this limitation, we needed to use private mutex and
> > >> make
> > >> the code a bit complated.
> > >> As Nishimura pointed out, in real world, there are people who only want
> > >> to use memsw.limit.
> > >>
> > >> Then, this patch removes the check. user-land library or middleware can
> > >> check
> > >> this in userland easily if this really concerns.
> > >>
> > >> And this is a good change to charge-and-reclaim.
> > >>
> > >> Now, memory.limit is always checked before memsw.limit
> > >> and it may do swap-out. But, if memory.limit == memsw.limit, swap-out is
> > >> finally no help and hits memsw.limit again. So, let's allow the
> > >> condition
> > >> memory.limit > memsw.limit. Then we can skip unnecesary swap-out.
> > >>
> > >> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> > >> ---
> > >
> > > There is one other option, we could set memory.limit_in_bytes ==
> > > memory.memsw.limit_in_bytes provided it is set to LONG_LONG_MAX. I am
> > > not convinced that we should allow memsw.limit_in_bytes to be less
> > > that limit_in_bytes, it will create confusion and the API is already
> > > exposed.
> > >
> > Ahhhh, I get your point.
> > if memory.memsw.limit_in_bytes < memory.limit_in_bytes, no swap will
> > be used bacause currnet try_to_free_pages() for memcg skips swap-out.
> > Then, only global-LRU will use swap.
> > This behavior is not easy to understand.
> >
> > Sorry, I don't push this patch as this is. But adding documentation about
> > "What happens when you set memory.limit == memsw.limit" will be necessary.
> >
> > ...maybe give all jobs to user-land and keep the kernel as it is now
> > is a good choice.
>
> Yes, probably and with libcgroup and configuration, defaults should
> not be hard to setup. Worst case we can use a script to setup both the
> values.
>
> >
> > BTW, I'd like to avoid useless swap-out in memory.limit == memsw.limit case.
> > If someone has good idea, please :(
> >
>
> Are you seeing swap even with memory.limit == memory.memsw.limit? Only
> global pressure should cause swapout, no?
>
yes, only global pressure should cause swapout.
Then, this means processes can be killed by OOM even while no swap usage.
This is a _new_ case under the condition memcg is used. Then, documentation will
be good.

Thanks,
-Kame

2009-06-05 02:22:18

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH] remove memory.limit v.s. memsw.limit comparison.

On Fri, 5 Jun 2009 09:34:20 +0900
Daisuke Nishimura <[email protected]> wrote:
> > Sorry, I don't push this patch as this is. But adding documentation about
> > "What happens when you set memory.limit == memsw.limit" will be necessary.
> >
> I agree.
>
I'd like to prepare some.

> > ...maybe give all jobs to user-land and keep the kernel as it is now
> > is a good choice.
> >
> > BTW, I'd like to avoid useless swap-out in memory.limit == memsw.limit case.
> > If someone has good idea, please :(
> >
> I think so too.
>
> From my simple thoughts, how about changing __mem_cgroup_try_charge() like:
>
> 1. initialize "noswap" as "bool noswap = !!(mem->res.limit == mem->memsw.limit)".
> 2. add check "if (mem->res.limit == mem->memsw.limit)" on charge failure to mem->res
> and set "noswap" to true if needed.
> 3. charge mem->memsw before mem->res.
>
> There would be other ideas, but I prefer 1 among these choices.
>
ok, thank you for advices.

Regards,
-Kame


>
> Thanks,
> Daisuke Nishimura.
>

2009-06-05 04:07:58

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [PATCH] remove memory.limit v.s. memsw.limit comparison.

On Fri, 5 Jun 2009 09:10:19 +0800, Balbir Singh <[email protected]> wrote:
> * KAMEZAWA Hiroyuki <[email protected]> [2009-06-05 00:45:03]:
>
> > Balbir Singh wrote:
> > > * KAMEZAWA Hiroyuki <[email protected]> [2009-06-04
> > > 14:10:43]:
> > >
> > >> From: KAMEZAWA Hiroyuki <[email protected]>
> > >>
> > >> Removes memory.limit < memsw.limit at setting limit check completely.
> > >>
> > >> The limitation "memory.limit <= memsw.limit" was added just because
> > >> it seems sane ...if memory.limit > memsw.limit, only memsw.limit works.
> > >>
> > >> But To implement this limitation, we needed to use private mutex and
> > >> make
> > >> the code a bit complated.
> > >> As Nishimura pointed out, in real world, there are people who only want
> > >> to use memsw.limit.
> > >>
> > >> Then, this patch removes the check. user-land library or middleware can
> > >> check
> > >> this in userland easily if this really concerns.
> > >>
> > >> And this is a good change to charge-and-reclaim.
> > >>
> > >> Now, memory.limit is always checked before memsw.limit
> > >> and it may do swap-out. But, if memory.limit == memsw.limit, swap-out is
> > >> finally no help and hits memsw.limit again. So, let's allow the
> > >> condition
> > >> memory.limit > memsw.limit. Then we can skip unnecesary swap-out.
> > >>
> > >> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> > >> ---
> > >
> > > There is one other option, we could set memory.limit_in_bytes ==
> > > memory.memsw.limit_in_bytes provided it is set to LONG_LONG_MAX. I am
> > > not convinced that we should allow memsw.limit_in_bytes to be less
> > > that limit_in_bytes, it will create confusion and the API is already
> > > exposed.
> > >
> > Ahhhh, I get your point.
> > if memory.memsw.limit_in_bytes < memory.limit_in_bytes, no swap will
> > be used bacause currnet try_to_free_pages() for memcg skips swap-out.
> > Then, only global-LRU will use swap.
> > This behavior is not easy to understand.
> >
> > Sorry, I don't push this patch as this is. But adding documentation about
> > "What happens when you set memory.limit == memsw.limit" will be necessary.
> >
> > ...maybe give all jobs to user-land and keep the kernel as it is now
> > is a good choice.
>
> Yes, probably and with libcgroup and configuration, defaults should
> not be hard to setup. Worst case we can use a script to setup both the
> values.
>
> >
> > BTW, I'd like to avoid useless swap-out in memory.limit == memsw.limit case.
> > If someone has good idea, please :(
> >
>
> Are you seeing swap even with memory.limit == memory.memsw.limit? Only
> global pressure should cause swapout, no?
>
I don't remember in detail, but I could see swap usage by "free" command
even when I set "memory.limit == memory.memsw.limit" long ago, the usage
was small though.
It cannot be helped in current implementation, because mem.limit is checked
before memsw.limit, and it is a waste of cpu time to scan anon-lru when
"memory.limit == memory.memsw.limit".


Thanks,
Daisuke Nishimura.

2009-06-05 07:17:17

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: [PATCH] memcg: fix behavior under memory.limit equals to memsw.limit

I believe memory.limit==memsw.limit is an important special case and should
be handled properly.

-Kame
==
From: KAMEZAWA Hiroyuki <[email protected]>

A user can set memcg.limit_in_bytes == memcg.memsw.limit_in_bytes when
the user just want to limit the total size of applications, in other words,
not very interested in memory usage itself.
In this case, swap-out will be done only by global-LRU.

But, under current implementation, memory.limit_in_bytes is checked at first
and try_to_free_page() may do swap-out. But, that swap-out is useless for
memsw.limit_in_bytes and the thread may hit limit again.

This patch tries to fix the current behavior at memory.limit == memsw.limit
case. And documentation is updated to explain the behavior of this special
case.

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>

---
Index: mmotm-2.6.30-Jun4/Documentation/cgroups/memory.txt
===================================================================
--- mmotm-2.6.30-Jun4.orig/Documentation/cgroups/memory.txt
+++ mmotm-2.6.30-Jun4/Documentation/cgroups/memory.txt
@@ -152,14 +152,19 @@ When swap is accounted, following files

usage of mem+swap is limited by memsw.limit_in_bytes.

-Note: why 'mem+swap' rather than swap.
+* why 'mem+swap' rather than swap.
The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
to move account from memory to swap...there is no change in usage of
-mem+swap.
-
-In other words, when we want to limit the usage of swap without affecting
-global LRU, mem+swap limit is better than just limiting swap from OS point
-of view.
+mem+swap. In other words, when we want to limit the usage of swap without
+affecting global LRU, mem+swap limit is better than just limiting swap from
+OS point of view.
+
+* What happens when a cgroup hits memory.memsw.limit_in_bytes
+When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out
+in this cgroup. Then, swap-out will not be done by cgroup routine and file
+caches are dropped. But as mentioned above, global LRU can do swapout memory
+from it for sanity of the system's memory management state. You can't forbid
+it by cgroup.

2.5 Reclaim

Index: mmotm-2.6.30-Jun4/mm/memcontrol.c
===================================================================
--- mmotm-2.6.30-Jun4.orig/mm/memcontrol.c
+++ mmotm-2.6.30-Jun4/mm/memcontrol.c
@@ -177,6 +177,9 @@ struct mem_cgroup {

unsigned int swappiness;

+ /* set when res.limit == memsw.limit */
+ bool memsw_is_minimum;
+
/*
* statistics. This must be placed at the end of memcg.
*/
@@ -847,6 +850,10 @@ static int mem_cgroup_hierarchical_recla
int ret, total = 0;
int loop = 0;

+ /* If memsw_is_minimum==1, swap-out is of-no-use. */
+ if (root_mem->memsw_is_minimum)
+ noswap = true;
+
while (loop < 2) {
victim = mem_cgroup_select_victim(root_mem);
if (victim == root_mem)
@@ -1752,6 +1759,12 @@ static int mem_cgroup_resize_limit(struc
break;
}
ret = res_counter_set_limit(&memcg->res, val);
+ if (!ret) {
+ if (memswlimit == val)
+ memcg->memsw_is_minimum = true;
+ else
+ memcg->memsw_is_minimum = false;
+ }
mutex_unlock(&set_limit_mutex);

if (!ret)
@@ -1799,6 +1812,12 @@ static int mem_cgroup_resize_memsw_limit
break;
}
ret = res_counter_set_limit(&memcg->memsw, val);
+ if (!ret) {
+ if (memlimit == val)
+ memcg->memsw_is_minimum = true;
+ else
+ memcg->memsw_is_minimum = false;
+ }
mutex_unlock(&set_limit_mutex);

if (!ret)