2012-05-24 21:52:31

by David Rientjes

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

On Mon, 16 Apr 2012, Aneesh Kumar K.V wrote:

> This patch implements a memcg extension that allows us to control HugeTLB
> allocations via the memory controller. The extension allows us to limit
> HugeTLB usage per control group and enforces the controller limit during
> page fault. Since HugeTLB doesn't support page reclaim, enforcing the limit
> at page fault time implies that the application will get a SIGBUS signal if it
> tries to access HugeTLB pages beyond its limit. This requires the application
> to know beforehand how many HugeTLB pages it will require.
>
> The charge/uncharge calls will be added to the HugeTLB code in a later patch.
> Support for memcg removal will be added in later patches.
>
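The fault-time enforcement described above can be modelled in a few lines. This is a minimal userspace Python sketch, not code from the patch: `HugetlbCgroup`, `try_charge`, `uncharge` and `fault_in_hugepage` are hypothetical names, and a 2MB huge page size is assumed. It shows why the limit must be sized up front: with no reclaim, a charge that would exceed the limit can only fail.

```python
# Userspace model of per-cgroup hugetlb charging at fault time.
# All names here are hypothetical; this is not kernel code.

HPAGE_SIZE = 2 << 20  # assumption: 2MB huge pages


class HugetlbCgroup:
    """Tracks hugetlb usage in bytes against a fixed limit."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.usage = 0

    def try_charge(self, nbytes):
        # HugeTLB pages cannot be reclaimed, so there is nothing to free
        # to make room: a charge that would exceed the limit just fails.
        if self.usage + nbytes > self.limit:
            return False
        self.usage += nbytes
        return True

    def uncharge(self, nbytes):
        self.usage -= nbytes


def fault_in_hugepage(cg):
    """Return "OK" if the fault is served, or "SIGBUS" if the charge fails."""
    if not cg.try_charge(HPAGE_SIZE):
        return "SIGBUS"
    return "OK"


# A cgroup limited to two huge pages: the third fault is refused.
cg = HugetlbCgroup(2 * HPAGE_SIZE)
results = [fault_in_hugepage(cg) for _ in range(3)]
print(results)  # ['OK', 'OK', 'SIGBUS']
```

Once a page is uncharged, the next fault succeeds again; until then, every further fault in the model is refused.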

Again, I disagree with this approach because it's adding the functionality
to memcg when it's unnecessary; it would be a completely legitimate use case
to want to limit the number of globally available hugepages to a set of
tasks without incurring the per-page tracking from memcg.

This can be implemented as a separate cgroup, and as we move to a single
hierarchy, you lose no functionality compared with what is done here if you
mount both cgroups.

It would be much cleaner in terms of

- build: not requiring ifdefs and dependencies on CONFIG_HUGETLB_PAGE,
which is a prerequisite for this functionality but not for
CONFIG_CGROUP_MEM_RES_CTLR,

- code: separating hugetlb bits out from memcg bits to avoid growing
mm/memcontrol.c beyond its current 5650 lines, and

- performance: not incurring any overhead of enabling memcg for per-
page tracking that is unnecessary if users only want to limit hugetlb
pages.

Kmem accounting and swap accounting are really a separate topic and make
sense to incorporate directly into memcg because their usage is a single
number; the same is not true for hugetlb pages, where charging one 1GB
page is not the same as charging 512 2M pages. And we have no use cases
for tracking kmem or swap without user page tracking; what would be the
point?
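To make the one-1GB-page versus 512-2M-pages point concrete: both charge exactly 1GB, so a counter that only tracks a byte total cannot tell them apart, while keeping one counter per huge page size can. This is an illustrative Python sketch; `byte_usage` and `per_size_usage` are hypothetical helpers, not memcg or hugetlb functions.

```python
# Charged pages, modelled as {huge page size in bytes: number of pages}.
MB = 1 << 20
GB = 1 << 30


def byte_usage(pages):
    """What a single-number (bytes) counter sees."""
    return sum(size * count for size, count in pages.items())


def per_size_usage(pages):
    """What per-page-size counters see: page counts for each size in use."""
    return {size: count for size, count in pages.items() if count}


one_1g_page = {GB: 1}
many_2m_pages = {2 * MB: 512}

print(byte_usage(one_1g_page) == byte_usage(many_2m_pages))          # True
print(per_size_usage(one_1g_page) == per_size_usage(many_2m_pages))  # False
```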

There's a reason we don't enable CONFIG_CGROUP_MEM_RES_CTLR in the
defconfig: we don't want the extra 1% metadata overhead of enabling it and
the potential performance regression from doing per-page tracking if we
only want to limit a global resource (hugetlb pages) to a set of tasks.

So please consider separating this functionality out into its own cgroup;
there's no reason not to do it, and it would benefit hugetlb users who
don't want to incur the disadvantages of enabling memcg entirely.


2012-05-24 22:57:30

by Andrew Morton

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

On Thu, 24 May 2012 14:52:26 -0700 (PDT)
David Rientjes wrote:

> On Mon, 16 Apr 2012, Aneesh Kumar K.V wrote:
>
> > This patch implements a memcg extension that allows us to control HugeTLB
> > allocations via the memory controller. The extension allows us to limit
> > HugeTLB usage per control group and enforces the controller limit during
> > page fault. Since HugeTLB doesn't support page reclaim, enforcing the limit
> > at page fault time implies that the application will get a SIGBUS signal if it
> > tries to access HugeTLB pages beyond its limit. This requires the application
> > to know beforehand how many HugeTLB pages it will require.
> >
> > The charge/uncharge calls will be added to the HugeTLB code in a later patch.
> > Support for memcg removal will be added in later patches.
> >
>
> Again, I disagree with this approach because it's adding the functionality
> to memcg when it's unnecessary; it would be a completely legitimate use case
> to want to limit the number of globally available hugepages to a set of
> tasks without incurring the per-page tracking from memcg.
>
> This can be implemented as a separate cgroup, and as we move to a single
> hierarchy, you lose no functionality compared with what is done here if you
> mount both cgroups.
>
> It would be much cleaner in terms of
>
> - build: not requiring ifdefs and dependencies on CONFIG_HUGETLB_PAGE,
> which is a prerequisite for this functionality but not for
> CONFIG_CGROUP_MEM_RES_CTLR,
>
> - code: separating hugetlb bits out from memcg bits to avoid growing
> mm/memcontrol.c beyond its current 5650 lines, and
>
> - performance: not incurring any overhead of enabling memcg for per-
> page tracking that is unnecessary if users only want to limit hugetlb
> pages.
>
> Kmem accounting and swap accounting are really a separate topic and make
> sense to incorporate directly into memcg because their usage is a single
> number; the same is not true for hugetlb pages, where charging one 1GB
> page is not the same as charging 512 2M pages. And we have no use cases
> for tracking kmem or swap without user page tracking; what would be the
> point?
>
> There's a reason we don't enable CONFIG_CGROUP_MEM_RES_CTLR in the
> defconfig: we don't want the extra 1% metadata overhead of enabling it and
> the potential performance regression from doing per-page tracking if we
> only want to limit a global resource (hugetlb pages) to a set of tasks.
>
> So please consider separating this functionality out into its own cgroup;
> there's no reason not to do it, and it would benefit hugetlb users who
> don't want to incur the disadvantages of enabling memcg entirely.

These arguments look pretty strong to me. But poorly timed :(

2012-05-24 23:20:28

by David Rientjes

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

On Thu, 24 May 2012, Andrew Morton wrote:

> These arguments look pretty strong to me. But poorly timed :(
>

What I argued here is nothing new; I said the same thing back on April 27,
and I was expecting it to be reproposed as a separate controller. The
counterargument that memcg shouldn't cause a performance degradation
doesn't hold water: you can't expect every page to be tracked without
incurring some penalty somewhere. And it certainly causes ~1% of memory
to be used up at boot with all the struct page_cgroups.

The counterargument that we'd have to duplicate cgroup setup and
initialization code from memcg is also irrelevant: all generic cgroup
mounting, creation, and initialization code should be in kernel/cgroup.c.
Obviously there will be added code because we're introducing a new cgroup,
but that's not a reason to force everybody who wants to control hugetlb
pages to enable memcg.

2012-05-27 20:29:09

by Aneesh Kumar K.V

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

On Thu, May 24, 2012 at 02:52:26PM -0700, David Rientjes wrote:
> On Mon, 16 Apr 2012, Aneesh Kumar K.V wrote:
>
> > This patch implements a memcg extension that allows us to control HugeTLB
> > allocations via the memory controller. The extension allows us to limit
> > HugeTLB usage per control group and enforces the controller limit during
> > page fault. Since HugeTLB doesn't support page reclaim, enforcing the limit
> > at page fault time implies that the application will get a SIGBUS signal if it
> > tries to access HugeTLB pages beyond its limit. This requires the application
> > to know beforehand how many HugeTLB pages it will require.
> >
> > The charge/uncharge calls will be added to the HugeTLB code in a later patch.
> > Support for memcg removal will be added in later patches.
> >
>
> Again, I disagree with this approach because it's adding the functionality
> to memcg when it's unnecessary; it would be a completely legitimate use case
> to want to limit the number of globally available hugepages to a set of
> tasks without incurring the per-page tracking from memcg.
>
> This can be implemented as a separate cgroup, and as we move to a single
> hierarchy, you lose no functionality compared with what is done here if you
> mount both cgroups.
>
> It would be much cleaner in terms of
>
> - build: not requiring ifdefs and dependencies on CONFIG_HUGETLB_PAGE,
> which is a prerequisite for this functionality but not for
> CONFIG_CGROUP_MEM_RES_CTLR,

I am not sure we have as large a number of #ifdefs as you outline above.
Most of the hugetlb limit code is already well isolated. If we were to
split it out as a separate controller, we would be duplicating code related
to cgroup deletion, migration support, etc. from memcg, because both memcg
and the hugetlb limit depend on struct page. So I would expect we would
end up with #ifdefs around that code, or with duplicates of it in the new
controller, if we were to do the hugetlb limit as a separate controller.

Another reason for it to be part of memcg is that it is normal to look
at hugetlb usage as memory usage too. One piece of feedback I got on
the earlier post was to see if I can enhance the current code to
make sure memory.usage_in_bytes also accounts for hugetlb usage.
People would also like to use memory.limit_in_bytes to limit total
usage (inclusive of hugetlb).

>
> - code: separating hugetlb bits out from memcg bits to avoid growing
> mm/memcontrol.c beyond its current 5650 lines, and
>

I can definitely look at splitting mm/memcontrol.c


> - performance: not incurring any overhead of enabling memcg for per-
> page tracking that is unnecessary if users only want to limit hugetlb
> pages.
>

-aneesh

2012-05-30 14:43:52

by Aneesh Kumar K.V

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

"Aneesh Kumar K.V" writes:

> On Thu, May 24, 2012 at 02:52:26PM -0700, David Rientjes wrote:
>> On Mon, 16 Apr 2012, Aneesh Kumar K.V wrote:
>>
>> > This patch implements a memcg extension that allows us to control HugeTLB
>> > allocations via the memory controller. The extension allows us to limit
>> > HugeTLB usage per control group and enforces the controller limit during
>> > page fault. Since HugeTLB doesn't support page reclaim, enforcing the limit
>> > at page fault time implies that the application will get a SIGBUS signal if it
>> > tries to access HugeTLB pages beyond its limit. This requires the application
>> > to know beforehand how many HugeTLB pages it will require.
>> >
>> > The charge/uncharge calls will be added to the HugeTLB code in a later patch.
>> > Support for memcg removal will be added in later patches.
>> >
>>
>> Again, I disagree with this approach because it's adding the functionality
>> to memcg when it's unnecessary; it would be a completely legitimate use case
>> to want to limit the number of globally available hugepages to a set of
>> tasks without incurring the per-page tracking from memcg.
>>
>> This can be implemented as a separate cgroup, and as we move to a single
>> hierarchy, you lose no functionality compared with what is done here if you
>> mount both cgroups.
>>
>> It would be much cleaner in terms of
>>
>> - build: not requiring ifdefs and dependencies on CONFIG_HUGETLB_PAGE,
>> which is a prerequisite for this functionality but not for
>> CONFIG_CGROUP_MEM_RES_CTLR,
>
> I am not sure we have as large a number of #ifdefs as you outline above.
> Most of the hugetlb limit code is already well isolated. If we were to
> split it out as a separate controller, we would be duplicating code related
> to cgroup deletion, migration support, etc. from memcg, because both memcg
> and the hugetlb limit depend on struct page. So I would expect we would
> end up with #ifdefs around that code, or with duplicates of it in the new
> controller, if we were to do the hugetlb limit as a separate controller.
>
> Another reason for it to be part of memcg is that it is normal to look
> at hugetlb usage as memory usage too. One piece of feedback I got on
> the earlier post was to see if I can enhance the current code to
> make sure memory.usage_in_bytes also accounts for hugetlb usage.
> People would also like to use memory.limit_in_bytes to limit total
> usage (inclusive of hugetlb).
>
>>
>> - code: separating hugetlb bits out from memcg bits to avoid growing
>> mm/memcontrol.c beyond its current 5650 lines, and
>>
>
> I can definitely look at splitting mm/memcontrol.c
>
>
>> - performance: not incurring any overhead of enabling memcg for per-
>> page tracking that is unnecessary if users only want to limit hugetlb
>> pages.
>>

Since Andrew didn't send the patchset to Linus because of this
discussion, I looked at reworking the patchset as a separate
controller. The patchset I sent here

http://thread.gmane.org/gmane.linux.kernel.mm/79230

has seen minimal testing. I also folded the fixup patches
Andrew had in -mm into the original patchset.

Let me know if the changes look good.
-aneesh

2012-06-08 23:06:15

by Andrew Morton

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

On Wed, 30 May 2012 20:13:31 +0530
"Aneesh Kumar K.V" wrote:

> >>
> >> - code: separating hugetlb bits out from memcg bits to avoid growing
> >> mm/memcontrol.c beyond its current 5650 lines, and
> >>
> >
> > I can definitely look at splitting mm/memcontrol.c
> >
> >
> >> - performance: not incurring any overhead of enabling memcg for per-
> >> page tracking that is unnecessary if users only want to limit hugetlb
> >> pages.
> >>
>
> Since Andrew didn't send the patchset to Linus because of this
> discussion, I looked at reworking the patchset as a separate
> controller. The patchset I sent here
>
> http://thread.gmane.org/gmane.linux.kernel.mm/79230
>
> has seen minimal testing. I also folded the fixup patches
> Andrew had in -mm into the original patchset.
>
> Let me know if the changes look good.

This is starting to be a problem. I'm still sitting on the old version
of this patchset and it will start to get in the way of other work.

We now have this new version of the patchset which implements a
separate controller but it is unclear to me which way we want to go.

Can the memcg developers please drop everything else and make a
decision here?

2012-06-09 14:17:14

by Aneesh Kumar K.V

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

Andrew Morton writes:

> On Wed, 30 May 2012 20:13:31 +0530
> "Aneesh Kumar K.V" wrote:
>
>> >>
>> >> - code: separating hugetlb bits out from memcg bits to avoid growing
>> >> mm/memcontrol.c beyond its current 5650 lines, and
>> >>
>> >
>> > I can definitely look at splitting mm/memcontrol.c
>> >
>> >
>> >> - performance: not incurring any overhead of enabling memcg for per-
>> >> page tracking that is unnecessary if users only want to limit hugetlb
>> >> pages.
>> >>
>>
>> Since Andrew didn't send the patchset to Linus because of this
>> discussion, I looked at reworking the patchset as a separate
>> controller. The patchset I sent here
>>
>> http://thread.gmane.org/gmane.linux.kernel.mm/79230
>>
>> has seen minimal testing. I also folded the fixup patches
>> Andrew had in -mm into the original patchset.
>>
>> Let me know if the changes look good.
>
> This is starting to be a problem. I'm still sitting on the old version
> of this patchset and it will start to get in the way of other work.
>
> We now have this new version of the patchset which implements a
> separate controller but it is unclear to me which way we want to go.
>
> Can the memcg developers please drop everything else and make a
> decision here?


David Rientjes didn't like the HugeTLB limit being a memcg extension and
wanted it to be a separate controller. I posted a v7 that implemented the
HugeTLB limit as a separate controller and used page_cgroup to track the
HugeTLB cgroup. Kamezawa Hiroyuki didn't like the use of page_cgroup
in the HugeTLB controller (http://mid.gmane.org/[email protected])

I ended up doing a v8 that uses page[2].lru.next to store the hugetlb
cgroup pointer.

http://mid.gmane.org/1339232401-14392-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com

I guess that should address all the concerns.

-aneesh

2012-06-10 01:55:34

by David Rientjes

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

On Sat, 9 Jun 2012, Aneesh Kumar K.V wrote:

> David Rientjes didn't like the HugeTLB limit being a memcg extension and
> wanted it to be a separate controller. I posted a v7 that implemented the
> HugeTLB limit as a separate controller and used page_cgroup to track the
> HugeTLB cgroup. Kamezawa Hiroyuki didn't like the use of page_cgroup
> in the HugeTLB controller (http://mid.gmane.org/[email protected])
>

Yes, and thank you very much for working on v8 to remove the dependency on
page_cgroup and to separate this out. I think it will benefit users who
don't want to enable all of memcg but still want to account for and restrict
hugetlb page usage, and I think the code separation is much cleaner
internally.

I'll review that patchset and suggest that the old hugetlb extension in
-mm be dropped in the interim.

2012-06-10 15:04:59

by Aneesh Kumar K.V

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

On Sat, Jun 09, 2012 at 06:55:30PM -0700, David Rientjes wrote:
> On Sat, 9 Jun 2012, Aneesh Kumar K.V wrote:
>
> > David Rientjes didn't like the HugeTLB limit being a memcg extension and
> > wanted it to be a separate controller. I posted a v7 that implemented the
> > HugeTLB limit as a separate controller and used page_cgroup to track the
> > HugeTLB cgroup. Kamezawa Hiroyuki didn't like the use of page_cgroup
> > in the HugeTLB controller (http://mid.gmane.org/[email protected])
> >
>
> Yes, and thank you very much for working on v8 to remove the dependency on
> page_cgroup and to separate this out. I think it will benefit users who
> don't want to enable all of memcg but still want to account for and restrict
> hugetlb page usage, and I think the code separation is much cleaner
> internally.
>

I have V9 ready to post. The only changes against v8 are fixing the
compound_order comparison and folding the charge/uncharge patches into
their users. I will wait for your review feedback before posting V9 so that
I can address the review comments in V9. Once V9 is out, can we get the
series added to -mm?

> I'll review that patchset and suggest that the old hugetlb extension in
> -mm be dropped in the interim.
>

I also agree with dropping the old hugetlb extension patchset in -mm.

-aneesh

2012-06-11 03:57:37

by Kamezawa Hiroyuki

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

(2012/06/09 8:06), Andrew Morton wrote:
> On Wed, 30 May 2012 20:13:31 +0530
> "Aneesh Kumar K.V" wrote:
>
>>>>
>>>> - code: separating hugetlb bits out from memcg bits to avoid growing
>>>> mm/memcontrol.c beyond its current 5650 lines, and
>>>>
>>>
>>> I can definitely look at splitting mm/memcontrol.c
>>>
>>>
>>>> - performance: not incurring any overhead of enabling memcg for per-
>>>> page tracking that is unnecessary if users only want to limit hugetlb
>>>> pages.
>>>>
>>
>> Since Andrew didn't send the patchset to Linus because of this
>> discussion, I looked at reworking the patchset as a separate
>> controller. The patchset I sent here
>>
>> http://thread.gmane.org/gmane.linux.kernel.mm/79230
>>
>> has seen minimal testing. I also folded the fixup patches
>> Andrew had in -mm into the original patchset.
>>
>> Let me know if the changes look good.
>
> This is starting to be a problem. I'm still sitting on the old version
> of this patchset and it will start to get in the way of other work.
>
> We now have this new version of the patchset which implements a
> separate controller but it is unclear to me which way we want to go.
>
> Can the memcg developers please drop everything else and make a
> decision here?

Following is a summary from my point of view.
I think there are several topics.

- overheads.
(A) IMHO, runtime overhead will be negligible because...
- if hugetlb is used, anonymous memory accounting doesn't add much overhead
because anonymous pages aren't used.
- when it comes to file-cache accounting, I/O dominates performance rather
than memcg.
- but you may see some overhead on 100+ CPU systems... I'm not sure.

(B) memory space overhead will not be negligible.
- now, memcg uses 16 bytes per page, i.e. 4GB per 1TB of memory.
This may be an obvious overhead to the system if the working set is
quite big and the apps want to use huge amounts of memory.

(C) what hugetlbfs is.
- hugetlb pages are statically allocated, so they're not usual memory.
From that view, a hugetlb cgroup is better.

- On the other hand, IMHO, hugetlb is memory, and I thought
memory.limit_in_bytes should take it into account....

(D) code duplication
- memory cgroup and hugetlb cgroup will have similar hooks, code, and UIs.
- we need some #ifdefs if we have a consolidated memory/hugetlb cgroup.

(E) user experience
- with an independent hugetlb cgroup, users can disable the memory cgroup.
- with a consolidated memcg+hugetlb cgroup, we'll be able to limit
usual page + hugetlb usage with a single limit.
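The arithmetic behind (B), as a quick check: at 16 bytes of memcg metadata per 4KB base page, 1TB of memory is 2^28 pages and therefore 2^32 bytes (4GB) of metadata, about 0.39% of RAM. The Python snippet below only reproduces that calculation; the 4KB page size is an assumption and `memcg_metadata_bytes` is a hypothetical helper.

```python
PAGE_SIZE = 4 << 10         # assumption: 4KB base pages
MEMCG_BYTES_PER_PAGE = 16   # per-page memcg metadata, as stated in (B)
GB = 1 << 30
TB = 1 << 40


def memcg_metadata_bytes(mem_bytes):
    """Bytes of per-page memcg metadata needed to cover mem_bytes of RAM."""
    return (mem_bytes // PAGE_SIZE) * MEMCG_BYTES_PER_PAGE


overhead = memcg_metadata_bytes(TB)
print(overhead // GB)                          # 4
print(MEMCG_BYTES_PER_PAGE / PAGE_SIZE * 100)  # 0.390625
```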


Now, I think...

1. I have to agree that the overhead is _not_ negligible.

2. THP, rather than hugetlb, should be the way forward for my main target
platform. (shmem/tmpfs should support THP; this needs study.)
The user experience should be addressed by THP+tmpfs+memcg.

3. It seems Aneesh decided to have an independent hugetlb cgroup.

So, now, I accept having an independent hugetlb cgroup.
Other opinions?

Thanks,
-Kame

2012-06-11 09:23:17

by David Rientjes

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

On Mon, 11 Jun 2012, Kamezawa Hiroyuki wrote:

> Now, I think...
>
> 1. I have to agree that the overhead is _not_ negligible.
>
> 2. THP, rather than hugetlb, should be the way forward for my main target
> platform. (shmem/tmpfs should support THP; this needs study.)
> The user experience should be addressed by THP+tmpfs+memcg.
>
> 3. It seems Aneesh decided to have an independent hugetlb cgroup.
>
> So, now, I accept having an independent hugetlb cgroup.
> Other opinions?
>

I suggested the separate controller in the review of the patchset, so I
obviously agree with your conclusion. I don't think we should account for
hugetlb pages in memory.usage_in_bytes and enforce memory.limit_in_bytes,
since 512 4K pages are not the same as one 2M page, which may be a sacred
resource if fragmentation is high.

Many thanks to Aneesh for continuing to update the patchset and working
toward a resolution on this; I love the direction it's taking.

2012-06-11 09:32:29

by Michal Hocko

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

On Fri 08-06-12 16:06:12, Andrew Morton wrote:
> On Wed, 30 May 2012 20:13:31 +0530
> "Aneesh Kumar K.V" wrote:
>
> > >>
> > >> - code: separating hugetlb bits out from memcg bits to avoid growing
> > >> mm/memcontrol.c beyond its current 5650 lines, and
> > >>
> > >
> > > I can definitely look at splitting mm/memcontrol.c
> > >
> > >
> > >> - performance: not incurring any overhead of enabling memcg for per-
> > >> page tracking that is unnecessary if users only want to limit hugetlb
> > >> pages.
> > >>
> >
> > Since Andrew didn't send the patchset to Linus because of this
> > discussion, I looked at reworking the patchset as a separate
> > controller. The patchset I sent here
> >
> > http://thread.gmane.org/gmane.linux.kernel.mm/79230
> >
> > has seen minimal testing. I also folded the fixup patches
> > Andrew had in -mm into the original patchset.
> >
> > Let me know if the changes look good.
>
> This is starting to be a problem. I'm still sitting on the old version
> of this patchset and it will start to get in the way of other work.
>
> We now have this new version of the patchset which implements a
> separate controller but it is unclear to me which way we want to go.

I guess you are talking about v7, which is mem_cgroup based. That one has
some drawbacks (e.g. the most user-visible one is that anyone who wants to
avoid the memory overhead of memcg has to disable the hugetlb controller
as well).
v8 took a different approach ((ab)using lru.next on the 3rd page to store
the group pointer), which looks like a reasonable compromise.
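A rough userspace model of the v8 idea, for readers unfamiliar with the trick: a huge page is a compound page made of many `struct page`s, and the cgroup pointer is kept in the `lru.next` field of the third one instead of in a separately allocated page_cgroup. This Python sketch is not the patch's code; `Page`, `Lru`, `alloc_compound`, `set_hugetlb_cgroup`, `hugetlb_cgroup_from_page` and `MIN_ORDER` are stand-ins, and it assumes only compound pages of order 2 or more (at least four `struct page`s) are eligible, since smaller ones have no `page[2]`.

```python
class Lru:
    """Stand-in for the list_head embedded in struct page."""

    def __init__(self):
        self.next = None
        self.prev = None


class Page:
    """Stand-in for struct page; only the lru field is modelled."""

    def __init__(self):
        self.lru = Lru()


MIN_ORDER = 2  # assumption: need at least 4 pages so that page[2] exists


def alloc_compound(order):
    """Model a compound page as the list of its 2**order struct pages."""
    return [Page() for _ in range(1 << order)]


def set_hugetlb_cgroup(pages, order, cg):
    """Record the owning cgroup; refuse pages too small to hold it."""
    if order < MIN_ORDER:
        return False
    pages[2].lru.next = cg  # the "(ab)use": lru.next of the 3rd page
    return True


def hugetlb_cgroup_from_page(pages, order):
    """Look the owning cgroup back up from the compound page."""
    if order < MIN_ORDER:
        return None
    return pages[2].lru.next


# A 2MB huge page built from 4KB base pages is order 9 (512 struct pages).
huge = alloc_compound(9)
owner = "hugetlb-cgroup-A"  # placeholder for a cgroup object
set_hugetlb_cgroup(huge, 9, owner)
print(hugetlb_cgroup_from_page(huge, 9))  # hugetlb-cgroup-A
```

Because the pointer lives in memory the huge page already owns, nothing is allocated per base page, which is what separates this approach from the page_cgroup-based v7.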

> Can the memcg developers please drop everything else and make a
> decision here?

I think that v8 (+fixups) is the way to go.
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

2012-06-15 22:31:56

by Aditya Kali

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

On Mon, Jun 11, 2012 at 2:23 AM, David Rientjes wrote:
> On Mon, 11 Jun 2012, Kamezawa Hiroyuki wrote:
>
>> Now, I think...
>>
>>   1. I have to agree that the overhead is _not_ negligible.
>>
>>   2. THP, rather than hugetlb, should be the way forward for my main target
>>      platform. (shmem/tmpfs should support THP; this needs study.)
>>      The user experience should be addressed by THP+tmpfs+memcg.
>>
>>   3. It seems Aneesh decided to have an independent hugetlb cgroup.
>>
>> So, now, I accept having an independent hugetlb cgroup.
>> Other opinions?
>>
>
> I suggested the separate controller in the review of the patchset, so I
> obviously agree with your conclusion. I don't think we should account for
> hugetlb pages in memory.usage_in_bytes and enforce memory.limit_in_bytes,
> since 512 4K pages are not the same as one 2M page, which may be a sacred
> resource if fragmentation is high.
>
Based on the use case at Google, I see definite value in including
hugepage usage in memory.usage_in_bytes as well, and in having a single
memory limit for the job. Our jobs want to specify only one (total)
memory limit (including slab usage, other kernel memory usage,
hugepages, etc.).

The hugepage/smallpage requirements of a job vary during its
lifetime. Having two different limits means less flexibility for jobs,
as they now have to specify their limit as (max_hugepage,
max_smallpage) instead of max(hugepage + smallpage). Two limits
complicate the API for users and require them to over-specify
their resources.
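The over-specification argument is easiest to see with numbers. The Python sketch below uses an invented two-phase job profile (the figures are made up for illustration, not taken from any real workload) and compares what the job must request with two separate limits, (max_hugepage, max_smallpage), against one combined limit, max(hugepage + smallpage). `two_limits` and `single_limit` are hypothetical helpers.

```python
GB = 1 << 30

# Hypothetical job: (hugepage_bytes, smallpage_bytes) in each phase.
phases = [
    (6 * GB, 2 * GB),  # hugepage-heavy phase
    (2 * GB, 6 * GB),  # smallpage-heavy phase
]


def two_limits(profile):
    """Separate limits: each must cover that page type's own peak."""
    max_huge = max(h for h, _ in profile)
    max_small = max(s for _, s in profile)
    return max_huge, max_small


def single_limit(profile):
    """One combined limit: only the peak of the total matters."""
    return max(h + s for h, s in profile)


huge_lim, small_lim = two_limits(phases)
print((huge_lim + small_lim) // GB)  # 12
print(single_limit(phases) // GB)    # 8
```

The job never uses more than 8GB at once, yet separate limits commit it to 12GB; the gap comes from the two peaks occurring in different phases.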

> Many thanks to Aneesh for continuing to update the patchset and working
> toward a resolution on this; I love the direction it's taking.
>

Thanks,
--
Aditya

2012-06-16 20:26:56

by David Rientjes

Subject: Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

On Fri, 15 Jun 2012, Aditya Kali wrote:

> Based on the use case at Google, I see definite value in including
> hugepage usage in memory.usage_in_bytes as well, and in having a single
> memory limit for the job. Our jobs want to specify only one (total)
> memory limit (including slab usage, other kernel memory usage,
> hugepages, etc.).
>
> The hugepage/smallpage requirements of a job vary during its
> lifetime. Having two different limits means less flexibility for jobs,
> as they now have to specify their limit as (max_hugepage,
> max_smallpage) instead of max(hugepage + smallpage). Two limits
> complicate the API for users and require them to over-specify
> their resources.
>

If, for example, a large number of hugepages are allocated on the command
line because dynamic allocation has a lower success rate due to
fragmentation, your suggestion would no longer allow the admin to
restrict the use of those hugepages to only a particular set of tasks.
Consider especially 1GB hugepages on x86: your suggestion would treat a
single 1GB hugepage, which cannot be freed after boot, exactly the same as
using 1GB of regular memory, which is obviously not the desired behavior
of any hugetlb controller.