2011-03-28 18:01:26

by Ying Han

Subject: Re: [RFC 0/3] Implementation of cgroup isolation

On Mon, Mar 28, 2011 at 2:39 AM, Michal Hocko <[email protected]> wrote:
> Hi all,
>
> Memory cgroups can be currently used to throttle memory usage of a group of
> processes. It, however, cannot be used for an isolation of processes from
> the rest of the system because all the pages that belong to the group are
> also placed on the global LRU lists and so they are eligible for the global
> memory reclaim.
>
> This patchset aims at providing an opt-in memory cgroup isolation. This
> means that a cgroup can be configured to be isolated from the rest of the
> system by means of cgroup virtual filesystem (/dev/memctl/group/memory.isolated).

Thank you, Hugh, for pointing me to this thread. We are currently working
on a similar problem in memcg.

Here is the problem we see:
1. In memcg, a page is on both the per-memcg-per-zone LRU and the global LRU.
2. Global memory reclaim will evict pages regardless of their cgroup.
3. The zone->lru_lock is shared between the per-memcg-per-zone LRUs and the global LRU.

And we know:
1. We shouldn't do global reclaim, since it breaks memory isolation.
2. There is no need for a page to be on both LRU lists, especially
once we have per-memcg background reclaim.

So our approach is to take a page off the global LRU once it is
charged to a memcg. Only pages allocated in the root cgroup remain on the
global LRU, and each memcg reclaims pages from its own isolated LRU.

By doing this, we can also solve the lock contention mentioned in
point 3 by introducing a per-memcg-per-zone lock. I can post the patch
later if that helps understanding.

Thanks

--Ying

>
> Isolated mem cgroup can be particularly helpful in deployments where we have
> a primary service which needs to have a certain guarantees for memory
> resources (e.g. a database server) and we want to shield it off the
> rest of the system (e.g. a burst memory activity in another group). This is
> currently possible only with mlocking memory that is essential for the
> application(s) or a rather hacky configuration where the primary app is in
> the root mem cgroup while all the other system activity happens in other
> groups.
>
> mlocking is not an ideal solution all the time because sometimes the working
> set is very large and it depends on the workload (e.g. number of incoming
> requests) so it can end up not fitting in into memory (leading to a OOM
> killer). If we use mem. cgroup isolation instead we are keeping memory resident
> and if the working set goes wild we can still do per-cgroup reclaim so the
> service is less prone to be OOM killed.
>
> The patch series is split into 3 patches. First one adds a new flag into
> mem_cgroup structure which controls whether the group is isolated (false by
> default) and a cgroup fs interface to set it.
> The second patch implements interaction with the global LRU. The current
> semantic is that we are putting a page into a global LRU only if mem cgroup
> LRU functions say they do not want the page for themselves.
> The last patch prevents from soft reclaim if the group is isolated.
>
> I have tested the patches with the simple memory consumer (allocating
> private and shared anon memory and SYSV SHM).
>
> One instance (call it big consumer) running in the group and paging in the
> memory (>90% of cgroup limit) and sleeping for the rest of its life. Then I
> had a pool of consumers running in the same cgroup which page in smaller
> amount of memory and paging them in the loop to simulate in group memory
> pressure (call them sharks).
> The sum of consumed memory is more than memory.limit_in_bytes so some
> portion of the memory is swapped out.
> There is one consumer running in the root cgroup running in parallel which
> makes a pressure on the memory (to trigger background reclaim).
>
> Rss+cache of the group drops down significantly (~66% of the limit) if the
> group is not isolated. On the other hand if we isolate the group we are
> still saturating the group (~97% of the limit). I can show more
> comprehensive results if somebody is interested.
>
> Thanks for comments.
>
> ---
>  include/linux/memcontrol.h |   24 ++++++++------
>  include/linux/mm_inline.h  |   10 ++++-
>  mm/memcontrol.c            |   76 ++++++++++++++++++++++++++++++++++++---------
>  mm/swap.c                  |   12 ++++---
>  mm/vmscan.c                |   43 +++++++++++++++----------
>  5 files changed, 118 insertions(+), 47 deletions(-)
>
> --
> Michal Hocko
>


2011-03-29 00:19:26

by Kamezawa Hiroyuki

Subject: Re: [RFC 0/3] Implementation of cgroup isolation

On Mon, 28 Mar 2011 11:01:18 -0700
Ying Han <[email protected]> wrote:

> On Mon, Mar 28, 2011 at 2:39 AM, Michal Hocko <[email protected]> wrote:
> > Hi all,
> >
> > Memory cgroups can be currently used to throttle memory usage of a group of
> > processes. It, however, cannot be used for an isolation of processes from
> > the rest of the system because all the pages that belong to the group are
> > also placed on the global LRU lists and so they are eligible for the global
> > memory reclaim.
> >
> > This patchset aims at providing an opt-in memory cgroup isolation. This
> > means that a cgroup can be configured to be isolated from the rest of the
> > system by means of cgroup virtual filesystem (/dev/memctl/group/memory.isolated).
>
> Thank you Hugh pointing me to the thread. We are working on similar
> problem in memcg currently
>
> Here is the problem we see:
> 1. In memcg, a page is both on per-memcg-per-zone lru and global-lru.
> 2. Global memory reclaim will throw page away regardless of cgroup.
> 3. The zone->lru_lock is shared between per-memcg-per-zone lru and global-lru.
>
> And we know:
> 1. We shouldn't do global reclaim since it breaks memory isolation.
> 2. There is no need for a page to be on both LRU list, especially
> after having per-memcg background reclaim.
>
> So our approach is to take off page from global lru after it is
> charged to a memcg. Only pages allocated at root cgroup remains in
> global LRU, and each memcg reclaims pages on its isolated LRU.
>

Why don't you use cpuset and virtual nodes? That is what you want.

Thanks,
-Kame

2011-03-29 00:37:08

by Ying Han

Subject: Re: [RFC 0/3] Implementation of cgroup isolation

On Mon, Mar 28, 2011 at 5:12 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Mon, 28 Mar 2011 11:01:18 -0700
> Ying Han <[email protected]> wrote:
>
>> On Mon, Mar 28, 2011 at 2:39 AM, Michal Hocko <[email protected]> wrote:
>> > Hi all,
>> >
>> > Memory cgroups can be currently used to throttle memory usage of a group of
>> > processes. It, however, cannot be used for an isolation of processes from
>> > the rest of the system because all the pages that belong to the group are
>> > also placed on the global LRU lists and so they are eligible for the global
>> > memory reclaim.
>> >
>> > This patchset aims at providing an opt-in memory cgroup isolation. This
>> > means that a cgroup can be configured to be isolated from the rest of the
>> > system by means of cgroup virtual filesystem (/dev/memctl/group/memory.isolated).
>>
>> Thank you Hugh pointing me to the thread. We are working on similar
>> problem in memcg currently
>>
>> Here is the problem we see:
>> 1. In memcg, a page is both on per-memcg-per-zone lru and global-lru.
>> 2. Global memory reclaim will throw page away regardless of cgroup.
>> 3. The zone->lru_lock is shared between per-memcg-per-zone lru and global-lru.
>>
>> And we know:
>> 1. We shouldn't do global reclaim since it breaks memory isolation.
>> 2. There is no need for a page to be on both LRU list, especially
>> after having per-memcg background reclaim.
>>
>> So our approach is to take off page from global lru after it is
>> charged to a memcg. Only pages allocated at root cgroup remains in
>> global LRU, and each memcg reclaims pages on its isolated LRU.
>>
>
> Why you don't use cpuset and virtual nodes ? It's what you want.

We've been running a cpuset + fake-NUMA-node configuration at Google to
provide memory isolation. Configuring the virtual nodes is complex: the
user needs to know in great detail which node to assign to which cgroup.
That is one of our motivations for moving towards the memory controller,
which simply does memory accounting no matter where pages are allocated.

That said, while memcg simplified per-cgroup memory accounting, memory
isolation is broken. This is one example of where pages are shared
between the global LRU and a per-memcg LRU: it is easy to get
cgroup-A's pages evicted by applying memory pressure to cgroup-B.

The approach we are considering, making page->lru exclusive, solves
this problem, and it should also let us break the zone->lru_lock
sharing.

--Ying




>
> Thanks,
> -Kame
>
>

2011-03-29 00:54:29

by Kamezawa Hiroyuki

Subject: Re: [RFC 0/3] Implementation of cgroup isolation

On Mon, 28 Mar 2011 17:37:02 -0700
Ying Han <[email protected]> wrote:

> On Mon, Mar 28, 2011 at 5:12 PM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:
> > On Mon, 28 Mar 2011 11:01:18 -0700
> > Ying Han <[email protected]> wrote:
> >
> >> On Mon, Mar 28, 2011 at 2:39 AM, Michal Hocko <[email protected]> wrote:
> >> > Hi all,
> >> >
> >> > Memory cgroups can be currently used to throttle memory usage of a group of
> >> > processes. It, however, cannot be used for an isolation of processes from
> >> > the rest of the system because all the pages that belong to the group are
> >> > also placed on the global LRU lists and so they are eligible for the global
> >> > memory reclaim.
> >> >
> >> > This patchset aims at providing an opt-in memory cgroup isolation. This
> >> > means that a cgroup can be configured to be isolated from the rest of the
> >> > system by means of cgroup virtual filesystem (/dev/memctl/group/memory.isolated).
> >>
> >> Thank you Hugh pointing me to the thread. We are working on similar
> >> problem in memcg currently
> >>
> >> Here is the problem we see:
> >> 1. In memcg, a page is both on per-memcg-per-zone lru and global-lru.
> >> 2. Global memory reclaim will throw page away regardless of cgroup.
> >> 3. The zone->lru_lock is shared between per-memcg-per-zone lru and global-lru.
> >>
> >> And we know:
> >> 1. We shouldn't do global reclaim since it breaks memory isolation.
> >> 2. There is no need for a page to be on both LRU list, especially
> >> after having per-memcg background reclaim.
> >>
> >> So our approach is to take off page from global lru after it is
> >> charged to a memcg. Only pages allocated at root cgroup remains in
> >> global LRU, and each memcg reclaims pages on its isolated LRU.
> >>
> >
> > Why you don't use cpuset and virtual nodes ? It's what you want.
>
> We've been running cpuset + fakenuma nodes configuration in google to
> provide memory isolation. The configuration of having the virtual box
> is complex which user needs to know great details of the which node to
> assign to which cgroup. That is one of the motivations for us moving
> towards to memory controller which simply do memory accounting no
> matter where pages are allocated.
>

I think the current fake-NUMA setup is not useful because it works only at boot time.

> By saying that, memcg simplified the memory accounting per-cgroup but
> the memory isolation is broken. This is one of examples where pages
> are shared between global LRU and per-memcg LRU. It is easy to get
> cgroup-A's page evicted by adding memory pressure to cgroup-B.
>
Only if you overcommit... right?


> The approach we are thinking to make the page->lru exclusive solve the
> problem. and also we should be able to break the zone->lru_lock
> sharing.
>
Is zone->lru_lock a problem even with the help of pagevecs?

If the LRU management folks ack isolating the LRUs and making kswapd
etc. more complex, okay, we'll go that way. But this will _change_ the
whole memcg design and concepts. Maybe memcg should have some kind of
balloon driver to work happily with an isolated LRU.

But my current position is "never badly affect global reclaim",
so I'm not very happy with this solution.

If we go that way, I guess we should have pseudo nodes/zones, which
were proposed in the early days of resource controls (before cgroups).

Thanks,
-Kame







2011-03-29 02:36:11

by Kamezawa Hiroyuki

Subject: Re: [RFC 0/3] Implementation of cgroup isolation

On Tue, 29 Mar 2011 09:47:56 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> On Mon, 28 Mar 2011 17:37:02 -0700
> Ying Han <[email protected]> wrote:

> > The approach we are thinking to make the page->lru exclusive solve the
> > problem. and also we should be able to break the zone->lru_lock
> > sharing.
> >
> Is zone->lru_lock is a problem even with the help of pagevecs ?
>
> If LRU management guys acks you to isolate LRUs and to make kswapd etc..
> more complex, okay, we'll go that way. This will _change_ the whole
> memcg design and concepts Maybe memcg should have some kind of balloon driver to
> work happy with isolated lru.
>
> But my current standing position is "never bad effects global reclaim".
> So, I'm not very happy with the solution.
>
> If we go that way, I guess we'll think we should have pseudo nodes/zones, which
> was proposed in early days of resource controls.(not cgroup).
>

BTW, against isolation, I have one thought.

Right now, soft_limit_reclaim is not called in the direct-reclaim path,
just because we thought kswapd works well enough. If necessary, I think
we can put a soft-reclaim call into the generic do_try_to_free_pages(order=0).

So the isolation problem can be reduced to some extent, can't it?
The soft-limit algorithm _should_ be updated; I guess it's not a heavily
tested feature.

About the ROOT cgroup: I think some daemon should put _all_ processes
into a controlled cgroup. So I don't want to think about limits on the
ROOT cgroup without any justification.

I'd like you to divide 'the talk on performance' from 'the talk on features'.

"This makes performance better! ...and adds a feature" sounds bad to me.

Thanks,
-Kame

2011-03-29 02:46:47

by Ying Han

Subject: Re: [RFC 0/3] Implementation of cgroup isolation

On Mon, Mar 28, 2011 at 5:47 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Mon, 28 Mar 2011 17:37:02 -0700
> Ying Han <[email protected]> wrote:
>
>> On Mon, Mar 28, 2011 at 5:12 PM, KAMEZAWA Hiroyuki
>> <[email protected]> wrote:
>> > On Mon, 28 Mar 2011 11:01:18 -0700
>> > Ying Han <[email protected]> wrote:
>> >
>> >> On Mon, Mar 28, 2011 at 2:39 AM, Michal Hocko <[email protected]> wrote:
>> >> > Hi all,
>> >> >
>> >> > Memory cgroups can be currently used to throttle memory usage of a group of
>> >> > processes. It, however, cannot be used for an isolation of processes from
>> >> > the rest of the system because all the pages that belong to the group are
>> >> > also placed on the global LRU lists and so they are eligible for the global
>> >> > memory reclaim.
>> >> >
>> >> > This patchset aims at providing an opt-in memory cgroup isolation. This
>> >> > means that a cgroup can be configured to be isolated from the rest of the
>> >> > system by means of cgroup virtual filesystem (/dev/memctl/group/memory.isolated).
>> >>
>> >> Thank you Hugh pointing me to the thread. We are working on similar
>> >> problem in memcg currently
>> >>
>> >> Here is the problem we see:
>> >> 1. In memcg, a page is both on per-memcg-per-zone lru and global-lru.
>> >> 2. Global memory reclaim will throw page away regardless of cgroup.
>> >> 3. The zone->lru_lock is shared between per-memcg-per-zone lru and global-lru.
>> >>
>> >> And we know:
>> >> 1. We shouldn't do global reclaim since it breaks memory isolation.
>> >> 2. There is no need for a page to be on both LRU list, especially
>> >> after having per-memcg background reclaim.
>> >>
>> >> So our approach is to take off page from global lru after it is
>> >> charged to a memcg. Only pages allocated at root cgroup remains in
>> >> global LRU, and each memcg reclaims pages on its isolated LRU.
>> >>
>> >
>> > Why you don't use cpuset and virtual nodes ? It's what you want.
>>
>> We've been running cpuset + fakenuma nodes configuration in google to
>> provide memory isolation. The configuration of having the virtual box
>> is complex which user needs to know great details of the which node to
>> assign to which cgroup. That is one of the motivations for us moving
>> towards to memory controller which simply do memory accounting no
>> matter where pages are allocated.
>>
>
> I think current fake-numa is not useful because it works only at boot time.

Yes, and the big hassle is managing the nodes after boot-up.

>
>> By saying that, memcg simplified the memory accounting per-cgroup but
>> the memory isolation is broken. This is one of examples where pages
>> are shared between global LRU and per-memcg LRU. It is easy to get
>> cgroup-A's page evicted by adding memory pressure to cgroup-B.
>>
> If you overcommit....Right ?

Yes, we want to support configurations that over-commit the
machine with limit_in_bytes.

>
>
>> The approach we are thinking to make the page->lru exclusive solve the
>> problem. and also we should be able to break the zone->lru_lock
>> sharing.
>>
> Is zone->lru_lock is a problem even with the help of pagevecs ?

> If LRU management guys acks you to isolate LRUs and to make kswapd etc..
> more complex, okay, we'll go that way.

I would assume the change applies only to memcg users; otherwise
everything stays on the global LRU list.

> This will _change_ the whole memcg design and concepts. Maybe memcg
> should have some kind of balloon driver to
> work happy with isolated lru.

We have soft_limit hierarchical reclaim for system memory pressure,
and we will also add per-memcg background reclaim. Both of them do
targeted reclaim on the per-memcg LRUs, so where would the balloon
driver be needed?

Thanks

--Ying

> But my current standing position is "never bad effects global reclaim".
> So, I'm not very happy with the solution.
>
> If we go that way, I guess we'll think we should have pseudo nodes/zones, which
> was proposed in early days of resource controls.(not cgroup).
>
> Thanks,
> -Kame
>
>
>
>
>
>
>
>
>

2011-03-29 02:52:28

by Kamezawa Hiroyuki

Subject: Re: [RFC 0/3] Implementation of cgroup isolation

On Mon, 28 Mar 2011 19:46:41 -0700
Ying Han <[email protected]> wrote:

> On Mon, Mar 28, 2011 at 5:47 PM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:

> >
> >> By saying that, memcg simplified the memory accounting per-cgroup but
> >> the memory isolation is broken. This is one of examples where pages
> >> are shared between global LRU and per-memcg LRU. It is easy to get
> >> cgroup-A's page evicted by adding memory pressure to cgroup-B.
> >>
> > If you overcommit....Right ?
>
> yes, we want to support the configuration of over-committing the
> machine w/ limit_in_bytes.
>

Then soft_limit is the feature for fixing the problem. If you have
problems with soft_limit, let's fix them.


> >
> >
> >> The approach we are thinking to make the page->lru exclusive solve the
> >> problem. and also we should be able to break the zone->lru_lock
> >> sharing.
> >>
> > Is zone->lru_lock is a problem even with the help of pagevecs ?
>
> > If LRU management guys acks you to isolate LRUs and to make kswapd etc..
> > more complex, okay, we'll go that way.
>
> I would assume the change only apply to memcg users , otherwise
> everything is leaving in the global LRU list.
>
> > This will _change_ the whole memcg design and concepts. Maybe memcg
> > should have some kind of balloon driver to
> > work happy with isolated lru.
>
> We have soft_limit hierarchical reclaim for system memory pressure,
> and also we will add per-memcg background reclaim. Both of them do
> targeting reclaim on per-memcg LRUs, and where is the balloon driver
> needed?
>

If so, soft_limit is _not_ enough. And I think your background reclaim
should work with soft_limit and be triggered by global memory pressure.

As I wrote in the other mail, it's not called via direct reclaim.
Maybe that's the first point to fix, rather than attempting a big change.




Thanks,
-Kame

2011-03-29 03:02:48

by Ying Han

Subject: Re: [RFC 0/3] Implementation of cgroup isolation

On Mon, Mar 28, 2011 at 7:29 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Tue, 29 Mar 2011 09:47:56 +0900
> KAMEZAWA Hiroyuki <[email protected]> wrote:
>
>> On Mon, 28 Mar 2011 17:37:02 -0700
>> Ying Han <[email protected]> wrote:
>
>> > The approach we are thinking to make the page->lru exclusive solve the
>> > problem. and also we should be able to break the zone->lru_lock
>> > sharing.
>> >
>> Is zone->lru_lock is a problem even with the help of pagevecs ?
>>
>> If LRU management guys acks you to isolate LRUs and to make kswapd etc..
>> more complex, okay, we'll go that way. This will _change_ the whole
>> memcg design and concepts Maybe memcg should have some kind of balloon driver to
>> work happy with isolated lru.
>>
>> But my current standing position is "never bad effects global reclaim".
>> So, I'm not very happy with the solution.
>>
>> If we go that way, I guess we'll think we should have pseudo nodes/zones, which
>> was proposed in early days of resource controls.(not cgroup).
>>
>
> BTW, against isolation, I have one thought.
>
> Now, soft_limit_reclaim is not called in direct-reclaim path just because we thought
> kswapd works enough well. If necessary, I think we can put soft-reclaim call in
> generic do_try_to_free_pages(order=0).

We were talking about that internally, and it definitely makes sense to add.

>
> So, isolation problem can be reduced to some extent, isn't it ?
> Algorithm of softlimit _should_ be updated. I guess it's not heavily tested feature.

Agree, and that is something we might want to go and fix. soft_limit in
general provides a nice way of over-committing the machine while still
having control, via targeted reclaim, under system memory pressure.

>
> About ROOT cgroup, I think some daemon application should put _all_ process to
> some controled cgroup. So, I don't want to think about limiting on ROOT cgroup
> without any justification.
>
> I'd like you to devide 'the talk on performance' and 'the talk on feature'.
>
> "This makes makes performance better! ...and add an feature" sounds bad to me.

OK, then let's stick to the memory isolation feature for now :)

--Ying
>
> Thanks,
> -Kame
>
>

2011-03-29 04:03:53

by Ying Han

Subject: Re: [RFC 0/3] Implementation of cgroup isolation

On Mon, Mar 28, 2011 at 7:45 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Mon, 28 Mar 2011 19:46:41 -0700
> Ying Han <[email protected]> wrote:
>
>> On Mon, Mar 28, 2011 at 5:47 PM, KAMEZAWA Hiroyuki
>> <[email protected]> wrote:
>
>> >
>> >> By saying that, memcg simplified the memory accounting per-cgroup but
>> >> the memory isolation is broken. This is one of examples where pages
>> >> are shared between global LRU and per-memcg LRU. It is easy to get
>> >> cgroup-A's page evicted by adding memory pressure to cgroup-B.
>> >>
>> > If you overcommit....Right ?
>>
>> yes, we want to support the configuration of over-committing the
>> machine w/ limit_in_bytes.
>>
>
> Then, soft_limit is a feature for fixing the problem. If you have problem
> with soft_limit, let's fix it.

The current implementation of soft_limit works as best-effort, and some
improvements are needed. Without distracting much from this thread: put
simply, it is not optimal about which cgroup it picks from the per-zone
RB-tree.

>
>
>> >
>> >
>> >> The approach we are thinking to make the page->lru exclusive solve the
>> >> problem. and also we should be able to break the zone->lru_lock
>> >> sharing.
>> >>
>> > Is zone->lru_lock is a problem even with the help of pagevecs ?
>>
>> > If LRU management guys acks you to isolate LRUs and to make kswapd etc..
>> > more complex, okay, we'll go that way.
>>
>> I would assume the change only apply to memcg users , otherwise
>> everything is leaving in the global LRU list.
>>
>> > This will _change_ the whole memcg design and concepts. Maybe memcg
>> > should have some kind of balloon driver to
>> > work happy with isolated lru.
>>
>> We have soft_limit hierarchical reclaim for system memory pressure,
>> and also we will add per-memcg background reclaim. Both of them do
>> targeting reclaim on per-memcg LRUs, and where is the balloon driver
>> needed?
>>
>
> If soft_limit is _not_ enough. And I think you background reclaim should
> be work with soft_limit and be triggered by global memory pressure.

This is something I can think about. I also think we agree that we
should have efficient targeted reclaim, so that global LRU scanning
can be eliminated.

>
> As wrote in other mail, it's not called via direct reclaim.
> Maybe its the 1st point to be shooted rather than trying big change.

Agree on this.

--Ying

>
>
>
>
> Thanks,
> -Kame
>
>

2011-03-29 07:53:58

by Michal Hocko

Subject: Re: [RFC 0/3] Implementation of cgroup isolation

Hi,

On Mon 28-03-11 11:01:18, Ying Han wrote:
> On Mon, Mar 28, 2011 at 2:39 AM, Michal Hocko <[email protected]> wrote:
> > Hi all,
> >
> > Memory cgroups can be currently used to throttle memory usage of a group of
> > processes. It, however, cannot be used for an isolation of processes from
> > the rest of the system because all the pages that belong to the group are
> > also placed on the global LRU lists and so they are eligible for the global
> > memory reclaim.
> >
> > This patchset aims at providing an opt-in memory cgroup isolation. This
> > means that a cgroup can be configured to be isolated from the rest of the
> > system by means of cgroup virtual filesystem (/dev/memctl/group/memory.isolated).
>
> Thank you Hugh pointing me to the thread. We are working on similar
> problem in memcg currently
>
> Here is the problem we see:
> 1. In memcg, a page is both on per-memcg-per-zone lru and global-lru.
> 2. Global memory reclaim will throw page away regardless of cgroup.
> 3. The zone->lru_lock is shared between per-memcg-per-zone lru and global-lru.

This is the primary motivation for the patchset. Except that I do not
insist on strict isolation, because I find the opt-in approach less
invasive: you have to know what you are doing while you are setting up
a group. If the thing were enabled by default, I am afraid we would see
many side-effects during reclaim.

> And we know:
> 1. We shouldn't do global reclaim since it breaks memory isolation.
> 2. There is no need for a page to be on both LRU list, especially
> after having per-memcg background reclaim.
>
> So our approach is to take off page from global lru after it is
> charged to a memcg. Only pages allocated at root cgroup remains in
> global LRU, and each memcg reclaims pages on its isolated LRU.

This sounds like an instance where all cgroups are isolated by default
(which could be achieved by setting mem_cgroup->isolated = 1 everywhere).

Thanks
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic