2012-05-09 18:54:58

by Andre Nathan

[permalink] [raw]
Subject: About cgroup memory limits

Hello

I'm doing some tests with LXC and how it interacts with the memory
cgroup limits, more specifically the memory.limit_in_bytes control file.

Am I correct in my understanding of the memory cgroup documentation[1]
that the limit set in memory.limit_in_bytes is applied to the sum of the
fields 'cache', 'rss' and 'mapped_file' in the memory.stat file?

I am also trying to understand the values reported in memory.stat when
compared to the statistics in /proc/$PID/statm.

Below is the sum of each field in /proc/$PID/statm for every process
running inside a test container, converted to bytes:

       size  resident     share     text  lib       data  dt
  897208320  28741632  20500480  1171456    0  170676224   0
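
For reference, a sum like this can be computed with something along the
following lines (just a rough sketch; the tasks path is an assumption and
depends on where the memory cgroup is mounted):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
        /* assumed path; adjust to your memory cgroup mount point/container */
        const char *tasks = "/sys/fs/cgroup/memory/lxc/test/tasks";
        long page = sysconf(_SC_PAGESIZE);   /* statm reports pages */
        unsigned long long sum[7] = {0};
        char path[64];
        FILE *tf;
        int pid, i;

        tf = fopen(tasks, "r");
        if (!tf)
                return 1;
        while (fscanf(tf, "%d", &pid) == 1) {
                unsigned long long v[7];
                FILE *sf;

                snprintf(path, sizeof(path), "/proc/%d/statm", pid);
                sf = fopen(path, "r");
                if (!sf)
                        continue;   /* the process may have exited */
                if (fscanf(sf, "%llu %llu %llu %llu %llu %llu %llu",
                           &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6]) == 7)
                        for (i = 0; i < 7; i++)
                                sum[i] += v[i] * page;   /* convert to bytes */
                fclose(sf);
        }
        fclose(tf);
        printf("size %llu resident %llu share %llu text %llu lib %llu data %llu dt %llu\n",
               sum[0], sum[1], sum[2], sum[3], sum[4], sum[5], sum[6]);
        return 0;
}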

Compare this with the usage reports from memory.stat (fields total_*,
hierarchical_* and pg* omitted):

cache 16834560
rss 8192000
mapped_file 3743744
swap 0
inactive_anon 0
active_anon 8192000
inactive_file 13996032
active_file 2838528
unevictable 0

Is there a way to reconcile these numbers somehow? I understand that the
fields from the two files represent different things. What I'm trying to
do is to combine, for example, the fields from memory.stat to
approximately reach what is displayed by statm.

Thank you in advance,
Andre

[1] http://www.kernel.org/doc/Documentation/cgroups/memory.txt


2012-05-10 09:49:24

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: About cgroup memory limits

(2012/05/10 3:37), Andre Nathan wrote:

> Hello
>
> I'm doing some tests with LXC and how it interacts with the memory
> cgroup limits, more specifically the memory.limit_in_bytes control file.
>
> Am I correct in my understanding of the memory cgroup documentation[1]
> that the limit set in memory.limit_in_bytes is applied to the sum of the
> fields 'cache', 'rss' and 'mapped_file' in the memory.stat file?
>

cache includes mapped_file. Then,

rss + cache < limit.

cache - mapped_file == unmapped file caches.
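
With the memory.stat numbers you posted below, for example:

cache - mapped_file = 16834560 - 3743744 = 13090816 (unmapped file cache)
rss + cache = 8192000 + 16834560 = 25026560, and it is this sum that must
stay under memory.limit_in_bytes.

(Note that inactive_file + active_file = 13996032 + 2838528 = 16834560,
which matches cache.)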


> I am also trying to understand the values reported in memory.stat when
> compared to the statistics in /proc/$PID/statm.
>
> Below is the sum of each field in /proc/$PID/statm for every process
> running inside a test container, converted to bytes:
>
> size resident share text lib data dt
> 897208320 28741632 20500480 1171456 0 170676224 0
>

from statm source code.

size = total virtual memory size
# total amount of mmaps().

shared = mapped_file
# resident mapped file caches

text = end_code - start_code
# end_code and start_code are determined when the program is loaded.
# this is a virtual memory size.

data = total virtual memory size - MAP_SHARED virtual memory size.

resident = anonymous pages + mapped_file.
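
This corresponds to task_statm() in fs/proc/task_mmu.c; from memory, the
relevant lines in recent kernels look roughly like this (exact code may
differ between versions):

        *shared   = get_mm_counter(mm, MM_FILEPAGES);
        *text     = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
                                                        >> PAGE_SHIFT;
        *data     = mm->total_vm - mm->shared_vm;
        *resident = *shared + get_mm_counter(mm, MM_ANONPAGES);
        return mm->total_vm;    /* size */

So statm's numbers are per-process mm counters, not memcg counters.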


> Compare this with the usage reports from memory.stat (fields total_*,
> hierarchical_* and pg* omitted):
>
> cache 16834560
> rss 8192000
> mapped_file 3743744
> swap 0
> inactive_anon 0
> active_anon 8192000
> inactive_file 13996032
> active_file 2838528
> unevictable 0
>
> Is there a way to reconcile these numbers somehow? I understand that the
> fields from the two files represent different things. What I'm trying to
> do is to combine, for example, the fields from memory.stat to
> approximately reach what is displayed by statm.
>


From above, rss + mapped_file == resident.

Thanks,
-Kame

2012-05-10 10:54:35

by Andre Nathan

[permalink] [raw]
Subject: Re: About cgroup memory limits

Thanks a lot Kame.

On Thu, 2012-05-10 at 18:47 +0900, KAMEZAWA Hiroyuki wrote:
> From above, rss + mapped_file == resident.

But if you check the numbers I get from memory.stat and the sum of the
statm fields for all container processes, this doesn't hold.

resident = 28741632
rss = 8192000
mapped_file = 3743744

Am I missing something here?

Thank you,
Andre

2012-05-11 00:32:21

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: About cgroup memory limits

(2012/05/10 19:55), Andre Nathan wrote:

> Thanks a lot Kame.
>
> On Thu, 2012-05-10 at 18:47 +0900, KAMEZAWA Hiroyuki wrote:
>> From above, rss + mapped_file == resident.
>
> But if you check the numbers I get from memory.stat and the sum of the
> statm fields for all container processes, this doesn't hold.
>
> resident = 28741632
> rss = 8192000
> mapped_file = 3743744
>
> Am I missing something here?
>

Considering again...

mapped_file is accounted to the memcg where the page was charged as 'cache'.

So you can know the mapped_file/cache ratio, but you can't know which cgroup
actually maps those pages.

>        size  resident     share     text  lib       data  dt
>   897208320  28741632  20500480  1171456    0  170676224   0
>

resident - share = anon.

28741632 - 20500480 = 8241152 (anon), which is close to rss.

About mapped_file: 3743744 (mapped_file) * 100 / 20500480 (shared) = ~18%

So only about 18% of your app's file mappings may be accounted to this memcg.

You may see different numbers if you rerun the tests after dropping the caches:
echo 3 > /proc/sys/vm/drop_caches

Thanks,
-Kame



2012-05-15 11:07:19

by Balbir Singh

[permalink] [raw]
Subject: Re: About cgroup memory limits

On Tue, May 15, 2012 at 4:36 PM, Balbir Singh <[email protected]> wrote:
>
>
> On Thu, May 10, 2012 at 12:07 AM, Andre Nathan <[email protected]>
> wrote:
>>
>> Hello
>>
>> I'm doing some tests with LXC and how it interacts with the memory
>> cgroup limits, more specifically the memory.limit_in_bytes control file.
>>
>> Am I correct in my understanding of the memory cgroup documentation[1]
>> that the limit set in memory.limit_in_bytes is applied to the sum of the
>> fields 'cache', 'rss' and 'mapped_file' in the memory.stat file?
>>
>> I am also trying to understand the values reported in memory.stat when
>> compared to the statistics in /proc/$PID/statm.
>>
>> Below is the sum of each field in /proc/$PID/statm for every process
>> running inside a test container, converted to bytes:
>>
>>        size  resident     share     text  lib       data  dt
>>   897208320  28741632  20500480  1171456    0  170676224   0
>>
>> Compare this with the usage reports from memory.stat (fields total_*,
>> hierarchical_* and pg* omitted):
>>
>> cache                     16834560
>> rss                       8192000
>> mapped_file               3743744
>> swap                      0
>> inactive_anon             0
>> active_anon               8192000
>> inactive_file             13996032
>> active_file               2838528
>> unevictable               0
>>
>> Is there a way to reconcile these numbers somehow? I understand that the
>> fields from the two files represent different things. What I'm trying to
>> do is to combine, for example, the fields from memory.stat to
>> approximately reach what is displayed by statm.
>>
>

Resending.. Plain text issues (sorry)

> cgroups accounting is different (sorry for that) from statm. From
> Documentation/filesystems/proc.txt
>
> Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
>
> ..............................................................................
>  Field    Content
>  size     total program size (pages)            (same as VmSize in status)
>  resident size of memory portions (pages)       (same as VmRSS in status)
>  shared   number of pages that are shared       (i.e. backed by a file)
>  trs      number of pages that are 'code'       (not including libs; broken,
>                                                  includes data segment)
>  lrs      number of pages of library            (always 0 on 2.6)
>  drs      number of pages of data/stack         (including libs; broken,
>                                                  includes library text)
>  dt       number of dirty pages                 (always 0 on 2.6)
>
> ..............................................................................
>
> VmRSS accounting is different from RSS accounting in cgroups. I presume
> you acquired this data from processes running in the cgroup? What does a cat
> of the tasks file within the cgroup show you? Ideally you want to make sure
> there is only one task inside the cgroup to compare against /proc/$PID/statm,
> and that the data is collected from a task outside that cgroup.
>
> Balbir

2012-05-25 04:16:55

by Zhu Yanhai

[permalink] [raw]
Subject: Re: About cgroup memory limits

2012/5/10 KAMEZAWA Hiroyuki <[email protected]>:
> (2012/05/10 3:37), Andre Nathan wrote:
>
>> Hello
>>
>> I'm doing some tests with LXC and how it interacts with the memory
>> cgroup limits, more specifically the memory.limit_in_bytes control file.
>>
>> Am I correct in my understanding of the memory cgroup documentation[1]
>> that the limit set in memory.limit_in_bytes is applied to the sum of the
>> fields 'cache', 'rss' and 'mapped_file' in the memory.stat file?
>>
>
> cache includes mapped_file. Then,

Excuse me, but the code reads:

        switch (ctype) {
        case MEM_CGROUP_CHARGE_TYPE_CACHE:
        case MEM_CGROUP_CHARGE_TYPE_SHMEM:
                SetPageCgroupCache(pc);
                SetPageCgroupUsed(pc);
                break;
        case MEM_CGROUP_CHARGE_TYPE_MAPPED:
                ClearPageCgroupCache(pc);
                SetPageCgroupUsed(pc);
                break;
        default:
                break;
        }
        mem_cgroup_charge_statistics(mem, pc, page_size);

And then, in mem_cgroup_charge_statistics() we have:

        if (PageCgroupCache(pc))
                __mem_cgroup_stat_add_safe(cpustat,
                        MEM_CGROUP_STAT_CACHE, numpages);
        else
                __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_RSS,
                        numpages);

So it seems that rss includes mapped_file, not cache?

>
> rss + cache < limit.
>
> cache - mapped_file == unmapped file caches.
>
>
>> I am also trying to understand the values reported in memory.stat when
>> compared to the statistics in /proc/$PID/statm.
>>
>> Below is the sum of each field in /proc/$PID/statm for every process
>> running inside a test container, converted to bytes:
>>
>>        size  resident     share     text  lib       data  dt
>>   897208320  28741632  20500480  1171456    0  170676224   0
>>
>
> from statm source code.
>
> size      = total virtual memory size
>            # total amount of mmaps().
>
> shared    = mapped_file
>            # resident mapped file caches
>
> text      = end_code - start_code
>            # end_code and start_code are determined when the program is loaded.
>            # this is a virtual memory size.
>
> data      = total virtual memory size  - MAP_SHARED virtual memory size.
>
> resident  = anonymous pages + mapped_file.
>
>
>> Compare this with the usage reports from memory.stat (fields total_*,
>> hierarchical_* and pg* omitted):
>>
>> cache                     16834560
>> rss                       8192000
>> mapped_file               3743744
>> swap                      0
>> inactive_anon             0
>> active_anon               8192000
>> inactive_file             13996032
>> active_file               2838528
>> unevictable               0
>>
>> Is there a way to reconcile these numbers somehow? I understand that the
>> fields from the two files represent different things. What I'm trying to
>> do is to combine, for example, the fields from memory.stat to
>> approximately reach what is displayed by statm.
>>
>
>
> From above, rss + mapped_file == resident.
>
> Thanks,
> -Kame

2012-05-25 05:00:25

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: About cgroup memory limits

(2012/05/25 13:16), Zhu Yanhai wrote:
> 2012/5/10 KAMEZAWA Hiroyuki<[email protected]>:
>> (2012/05/10 3:37), Andre Nathan wrote:
>>
>>> Hello
>>>
>>> I'm doing some tests with LXC and how it interacts with the memory
>>> cgroup limits, more specifically the memory.limit_in_bytes control file.
>>>
>>> Am I correct in my understanding of the memory cgroup documentation[1]
>>> that the limit set in memory.limit_in_bytes is applied to the sum of the
>>> fields 'cache', 'rss' and 'mapped_file' in the memory.stat file?
>>>
>>
>> cache includes mapped_file. Then,
>
> Excuse me, but it does read:
>
> switch (ctype) {
> case MEM_CGROUP_CHARGE_TYPE_CACHE:
> case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> SetPageCgroupCache(pc);
> SetPageCgroupUsed(pc);
> break;
> case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> ClearPageCgroupCache(pc);
> SetPageCgroupUsed(pc);
> break;
> default:
> break;
> }
> mem_cgroup_charge_statistics(mem, pc, page_size);
>
> And then, in mem_cgroup_charge_statistics() we have :
>
> if (PageCgroupCache(pc))
> __mem_cgroup_stat_add_safe(cpustat,
> MEM_CGROUP_STAT_CACHE, numpages);
> else
> __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_RSS,
> numpages);
>
> So it seems that rss includes mapped_file, not cache?
>
Why do you think so? mapped_file is the mapped file cache. All file caches
are accounted as STAT_CACHE.


TYPE_MAPPED doesn't mean mapped_file.
In the code above, TYPE_MAPPED is the charge type used for anonymous page faults.
It represents anonymous pages, which are counted as RSS.
I wonder whether it would be better to rename these macros.
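
For example, the anonymous page fault path charges through
mem_cgroup_newpage_charge(), which passes MEM_CGROUP_CHARGE_TYPE_MAPPED,
while page cache and shmem pages come in through mem_cgroup_cache_charge()
as TYPE_CACHE / TYPE_SHMEM (roughly; details differ between kernel versions).
That is why a 'MAPPED' charge ends up in the rss counter.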


Thanks,
-Kame

2012-05-25 06:11:58

by Zhu Yanhai

[permalink] [raw]
Subject: Re: About cgroup memory limits

2012/5/25 Kamezawa Hiroyuki <[email protected]>:
> (2012/05/25 13:16), Zhu Yanhai wrote:
>>
>> 2012/5/10 KAMEZAWA Hiroyuki<[email protected]>:
>>>
>>> (2012/05/10 3:37), Andre Nathan wrote:
>>>
>>>> Hello
>>>>
>>>> I'm doing some tests with LXC and how it interacts with the memory
>>>> cgroup limits, more specifically the memory.limit_in_bytes control file.
>>>>
>>>> Am I correct in my understanding of the memory cgroup documentation[1]
>>>> that the limit set in memory.limit_in_bytes is applied to the sum of the
>>>> fields 'cache', 'rss' and 'mapped_file' in the memory.stat file?
>>>>
>>>
>>> cache includes mapped_file. Then,
>>
>>
>> Excuse me, but the code reads:
>>
>>        switch (ctype) {
>>        case MEM_CGROUP_CHARGE_TYPE_CACHE:
>>        case MEM_CGROUP_CHARGE_TYPE_SHMEM:
>>                SetPageCgroupCache(pc);
>>                SetPageCgroupUsed(pc);
>>                break;
>>        case MEM_CGROUP_CHARGE_TYPE_MAPPED:
>>                ClearPageCgroupCache(pc);
>>                SetPageCgroupUsed(pc);
>>                break;
>>        default:
>>                break;
>>        }
>>        mem_cgroup_charge_statistics(mem, pc, page_size);
>>
>> And then, in mem_cgroup_charge_statistics() we have:
>>
>>        if (PageCgroupCache(pc))
>>                __mem_cgroup_stat_add_safe(cpustat,
>>                        MEM_CGROUP_STAT_CACHE, numpages);
>>        else
>>                __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_RSS,
>>                        numpages);
>>
>> So it seems that rss includes mapped_file, not cache?
>>
> Why do you think so? mapped_file is the mapped file cache. All file caches
> are accounted as STAT_CACHE.
>
>
> TYPE_MAPPED doesn't mean mapped_file.
> In the code above, TYPE_MAPPED is the charge type used for anonymous page faults.
> It represents anonymous pages, which are counted as RSS.
> I wonder whether it would be better to rename these macros.

Got it, I was really confused by the name! Thanks.
>
>
> Thanks,
> -Kame
>
>