2023-04-13 01:23:44

by Michael Honaker

Subject: cgroup: Clarification around usage_in_bytes and its relation to the page counter

Hello,

This is my first posting to the LKML, so please let me know if this
should be asked elsewhere or if there is anything else wrong with my
email. I'd like to confirm my understanding of an issue I've been
encountering.

I have been trying to get an accurate measurement of the memory usage of
a non-root cgroup, specifically a Kubernetes container, and noticed some
inconsistencies when comparing the value of `memory.usage_in_bytes` with
the information in `memory.stat`. After further investigation of the
cgroup docs (/admin-guide/cgroups/memory.rst#usage_in_bytes) and an old
LKML thread ("real meaning of memory.usage_in_bytes"), I came to the
understanding that `usage_in_bytes` actually shows the value of the
resource counter, which is an overestimate because the counter is split
into per-cpu chunks for caching, and that the real usage can be
calculated from RSS+Cache gathered from `memory.stat`. I've created
cadvisor issue #3286 (https://github.com/google/cadvisor/issues/3286),
which goes into greater detail on my investigation, with examples.
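
A minimal Python sketch of the RSS+Cache calculation described above,
assuming the cgroup v1 `memory.stat` format of one `key value` pair per
line. The helper names `parse_memory_stat` and `rss_plus_cache` are
hypothetical, and the sample values are made up purely to show the
arithmetic; on a real host you would pass in the contents of the
container's `memory.stat` instead. Preferring the hierarchical `total_*`
counters when present is an assumption about what "usage" should cover.

```python
def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse cgroup v1 memory.stat content: one 'key value' pair per line."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def rss_plus_cache(stats: dict[str, int]) -> int:
    """Userspace-attributable memory: RSS + page cache, in bytes.

    Uses the hierarchical total_* counters when present (they include
    descendant cgroups); otherwise falls back to the cgroup-local ones.
    """
    rss = stats.get("total_rss", stats.get("rss", 0))
    cache = stats.get("total_cache", stats.get("cache", 0))
    return rss + cache


# Made-up sample values, for illustration only.
sample = """cache 1048576
rss 2097152
total_cache 1048576
total_rss 2097152
"""
print(rss_plus_cache(parse_memory_stat(sample)))  # → 3145728
```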

Is the above understanding still correct with the new page counters?
If so, could any memory allocations be reflected in `usage_in_bytes`
but not in `stat` for child cgroups? I want to make sure I'm not
missing anything by monitoring only the `stat` file.

Thank you for any clarification or corrections.


2023-05-02 17:50:45

by Michal Koutný

Subject: Re: cgroup: Clarification around usage_in_bytes and its relation to the page counter

Hello Michael.

On Wed, Apr 12, 2023 at 09:22:07PM -0400, Michael Honaker <[email protected]> wrote:
> I have been trying to get an accurate measurement of memory usage of a
> non-root cgroup, specifically a Kubernetes container,

Beware that containers are, more or less, based on sharing resources.
Shared accounting is difficult, so an _accurate_ measurement may not be
available, or the numbers may need some amount of interpretation.

> and noticed some inconsistencies when comparing the value of
> `memory.usage_in_bytes` with the information in `memory.stat`. After
> further investigation of the cgroup docs
> (/admin-guide/cgroups/memory.rst#usage_in_bytes) and an old LKML
> thread ("real meaning of memory.usage_in_bytes"),

[OT: I suggest you move to cgroup v2; the entities are IMO better named
and it's also more future-proof ;-)]

> I came to the understanding that `usage_in_bytes` actually shows the
> value of the resource counter which is an overestimation due to the
> counter being split into per-cpu chunks for caching,

I didn't read the thread, but it's true that per-cpu batching may
introduce an error (of either sign, in theory). Since around v5.13 the
implementation has changed and the error bound should be smaller:
O(nr_cpus * nr_cgroups(subtree) * MEMCG_CHARGE_BATCH) -> O(nr_cpus * MEMCG_CHARGE_BATCH).
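
To put the two bounds above into numbers, here is a rough Python sketch.
It assumes 4 KiB pages and `MEMCG_CHARGE_BATCH` = 32 pages (check the
value in your kernel's headers), and it drops the constant factors hidden
in the O() notation, so the results are orders of magnitude, not exact
worst cases. The function names are hypothetical.

```python
PAGE_SIZE = 4096         # assumption: 4 KiB pages
MEMCG_CHARGE_BATCH = 32  # assumption: batch size in pages; verify for your kernel


def old_error_bound(nr_cpus: int, nr_cgroups: int) -> int:
    """Pre-~v5.13 scaling, O(nr_cpus * nr_cgroups(subtree) * batch), in bytes."""
    return nr_cpus * nr_cgroups * MEMCG_CHARGE_BATCH * PAGE_SIZE


def new_error_bound(nr_cpus: int) -> int:
    """Post-~v5.13 scaling, O(nr_cpus * batch), in bytes."""
    return nr_cpus * MEMCG_CHARGE_BATCH * PAGE_SIZE


# 8 CPUs, 10 cgroups in the subtree (illustrative numbers).
print(new_error_bound(8) // 2**20)      # → 1   (MiB)
print(old_error_bound(8, 10) // 2**20)  # → 10  (MiB)
```

With a single cgroup the two bounds coincide; the old one grows linearly
with the size of the subtree.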

> and that the real usage can be calculated from RSS+Cache gathered from
> `memory.stat`. I've created cadvisor issue #3286
> (https://github.com/google/cadvisor/issues/3286) which goes into
> greater detail on my investigation with examples.

The difference you spotted there is not caused (merely) by the per-cpu
optimization.
It is mainly kernel memory (e.g. dentries, inodes, task_struct, ...):
RSS+Cache only shows memory that userspace is directly responsible for,
not the kernel structures (whose size depends on the kernel
implementation, after all).
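
A trivial sketch of the gap being discussed, with a hypothetical helper
name and made-up numbers. Following the explanation above, the result is
mostly kernel memory plus a small per-cpu batching error, so it is not
strictly guaranteed to be non-negative. On v1, where kernel memory
accounting is enabled, it could be cross-checked against
`memory.kmem.usage_in_bytes`.

```python
def unexplained_bytes(usage_in_bytes: int, rss: int, cache: int) -> int:
    """Part of memory.usage_in_bytes not covered by RSS + Cache.

    Mostly kernel memory charged to the cgroup (dentries, inodes,
    task_struct, ...), plus batching error; may be slightly negative
    if the batching error dominates.
    """
    return usage_in_bytes - (rss + cache)


# Made-up numbers, illustration only.
usage = 5 * 2**20
rss, cache = 2 * 2**20, 1 * 2**20
print(unexplained_bytes(usage, rss, cache) // 2**20)  # → 2  (MiB)
```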

(On v2, memory.stat also shows a breakdown of kernel memory usage,
among other things.)
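
A rough Python sketch of reading such a breakdown from a cgroup v2
`memory.stat`. The field names (`anon`, `file`, `kernel`,
`kernel_stack`, `pagetables`, `percpu`, `vmalloc`, `slab`, `sock`)
follow the cgroup v2 admin guide, but which of them are present depends
on the kernel version. The helper names and sample values are
hypothetical, and summing the per-type fields when `kernel` is missing
is only an approximation.

```python
def parse_stat(text: str) -> dict[str, int]:
    """Parse 'key value' lines, as found in cgroup v2 memory.stat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


# Per-type kernel counters; missing ones (older kernels) count as 0.
KERNEL_PARTS = ("kernel_stack", "pagetables", "percpu", "vmalloc", "slab")


def breakdown(stats: dict[str, int]) -> dict[str, int]:
    """Split charged memory into userspace, kernel and socket buckets."""
    user = stats.get("anon", 0) + stats.get("file", 0)
    # Newer kernels report an aggregate "kernel" field; otherwise approximate it.
    kernel = stats.get("kernel", sum(stats.get(k, 0) for k in KERNEL_PARTS))
    return {"user": user, "kernel": kernel, "sock": stats.get("sock", 0)}


# Made-up sample values, illustration only.
sample = """anon 3000
file 2000
kernel_stack 100
pagetables 200
slab 700
sock 0
"""
print(breakdown(parse_stat(sample)))
# → {'user': 5000, 'kernel': 1000, 'sock': 0}
```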

> Is the above understanding still correct with the new page counters?
> If so, could any memory allocations be reflected in `usage_in_bytes`
> but not in `stat` for child cgroups? I want to ensure I'm not
> missing anything by only monitoring the `stat` file.

I hope the above sheds some light on these questions.

Michal

