Hello,
In cgroup1, memory.stat (generated by memcg1_stat_format()) already contains two lines:
hierarchical_memory_limit %llu
hierarchical_memsw_limit %llu
which are useful for userland to easily and performance-wise find out the
effective cgroup limits being applied. Otherwise userland has to
open, read and close "memory.max" and/or "memory.swap.max" in every ancestor
directory of a nested cgroup.
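For illustration, here is a minimal C sketch of that userland walk. It assumes cgroup2 is mounted at a known root (for example /sys/fs/cgroup, passed without a trailing slash) and that every ancestor directory is visible to the reader. `read_limit()` and `effective_limit()` are hypothetical helper names, not an existing library API, and a limit of "max" is treated as unlimited (ULLONG_MAX).

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CG_PATH_LEN 4096	/* hypothetical buffer size for paths */

/* Read one cgroup2 limit file ("max" or a byte count) into *out.
 * "max" (no limit) is mapped to ULLONG_MAX. Returns 0 on success. */
static int read_limit(const char *path, unsigned long long *out)
{
	char buf[64];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	if (strncmp(buf, "max", 3) == 0) {
		*out = ULLONG_MAX;
		return 0;
	}
	errno = 0;
	*out = strtoull(buf, &end, 10);
	return (errno || end == buf) ? -1 : 0;
}

/* Hypothetical helper: walk from cgroup directory `dir` up to, but not
 * including, the cgroup2 mount point `root` (the root cgroup has no
 * memory.max) and return the minimum of `file` over all those levels.
 * This costs one open+read+close per level. */
static int effective_limit(const char *root, const char *dir, const char *file,
			   unsigned long long *out)
{
	char cur[CG_PATH_LEN], path[CG_PATH_LEN];
	size_t rootlen = strlen(root);
	unsigned long long limit = ULLONG_MAX, v;

	/* `dir` must be `root` itself or lie below it. */
	if (strlen(dir) >= sizeof(cur) || strncmp(dir, root, rootlen) != 0 ||
	    (dir[rootlen] != '/' && dir[rootlen] != '\0'))
		return -1;
	strcpy(cur, dir);

	while (strlen(cur) > rootlen) {
		int n = snprintf(path, sizeof(path), "%s/%s", cur, file);
		char *slash;

		if (n < 0 || (size_t)n >= sizeof(path))
			return -1;
		if (read_limit(path, &v) != 0)
			return -1;
		if (v < limit)
			limit = v;
		/* Drop the last path component to move to the parent cgroup. */
		slash = strrchr(cur, '/');
		if (!slash)
			return -1;
		*slash = '\0';
	}
	*out = limit;
	return 0;
}
```

For example, effective_limit("/sys/fs/cgroup", "/sys/fs/cgroup/a/b", "memory.max", &v) reads two files; passing "memory.swap.max" instead gives the effective swap limit.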
For cgroup1 it was implemented by:
memcg: show real limit under hierarchy mode
https://github.com/torvalds/linux/commit/fee7b548e6f2bd4bfd03a1a45d3afd593de7d5e9
Date: Wed Jan 7 18:08:26 2009 -0800
But for cgroup2 it has been missing so far, this is just a copy-paste of the
cgroup1 code while changing s/memsw/swap/ as that is what cgroup1 vs. cgroup2
tracks. I have added it to the end of "memory.stat" to prevent possible
compatibility problems with existing code parsing that file.
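A sketch of why appending is the compatible choice: a consumer that looks keys up by name, as the hypothetical `stat_lookup()` below does, skips lines it does not recognize, so new trailing keys do not disturb it. With the patch, both effective limits then come from a single read of memory.stat. The function works on the file's contents already read into a string; the key names are the two the patch adds.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find a "key value" line in the text of a
 * memory.stat file. Lines with unknown keys are simply skipped.
 * Returns 0 and stores the value in *out if found, -1 otherwise. */
static int stat_lookup(const char *text, const char *key, unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (*line) {
		const char *nl = strchr(line, '\n');
		size_t len = nl ? (size_t)(nl - line) : strlen(line);

		/* Exact key match: the key must be followed by a space. */
		if (len > klen && strncmp(line, key, klen) == 0 &&
		    line[klen] == ' ') {
			*out = strtoull(line + klen + 1, NULL, 10);
			return 0;
		}
		if (!nl)
			break;
		line = nl + 1;
	}
	return -1;
}
```

Requiring a space right after the key keeps a key from matching a longer key that merely shares its prefix.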
Jan Kratochvil
Signed-off-by: Jan Kratochvil (Azul) <[email protected]>
mm/memcontrol.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 46d8d0211..2631dd810 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1636,6 +1636,8 @@ static inline unsigned long memcg_page_state_local_output(
 static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
 {
 	int i;
+	unsigned long memory, swap;
+	struct mem_cgroup *mi;
 
 	/*
 	 * Provide statistics on the state of the memory subsystem as
@@ -1682,6 +1684,17 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
 			       memcg_events(memcg, memcg_vm_event_stat[i]));
 	}
 
+	/* Hierarchical information */
+	memory = swap = PAGE_COUNTER_MAX;
+	for (mi = memcg; mi; mi = parent_mem_cgroup(mi)) {
+		memory = min(memory, READ_ONCE(mi->memory.max));
+		swap = min(swap, READ_ONCE(mi->swap.max));
+	}
+	seq_buf_printf(s, "hierarchical_memory_limit %llu\n",
+		       (u64)memory * PAGE_SIZE);
+	seq_buf_printf(s, "hierarchical_swap_limit %llu\n",
+		       (u64)swap * PAGE_SIZE);
+
 	/* The above should easily fit into one page */
 	WARN_ON_ONCE(seq_buf_has_overflowed(s));
 }
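To make the walk easy to check outside the kernel, here is a userspace model of the same loop. `struct sketch_memcg`, `hierarchical_limits()` and the `SKETCH_*` constants are hypothetical stand-ins for `struct mem_cgroup`, `parent_mem_cgroup()`, `PAGE_SIZE` and `PAGE_COUNTER_MAX`; the constants assume 4 KiB pages and the 64-bit definition of PAGE_COUNTER_MAX as LONG_MAX / PAGE_SIZE.

```c
#include <limits.h>
#include <stdint.h>

/* Userspace stand-ins (assumptions, not the kernel's definitions):
 * 4 KiB pages and the 64-bit PAGE_COUNTER_MAX of LONG_MAX / PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_COUNTER_MAX ((unsigned long)(LONG_MAX / SKETCH_PAGE_SIZE))

/* Hypothetical model of a memcg: limits in pages plus a parent pointer
 * (NULL for the root), standing in for mem_cgroup / parent_mem_cgroup(). */
struct sketch_memcg {
	unsigned long memory_max;	/* pages */
	unsigned long swap_max;		/* pages */
	const struct sketch_memcg *parent;
};

/* Same walk as the patch: start from "unlimited", take the minimum limit
 * of the cgroup and every ancestor, then convert pages to bytes. */
static void hierarchical_limits(const struct sketch_memcg *memcg,
				uint64_t *memory_bytes, uint64_t *swap_bytes)
{
	unsigned long memory = SKETCH_PAGE_COUNTER_MAX;
	unsigned long swap = SKETCH_PAGE_COUNTER_MAX;

	for (const struct sketch_memcg *mi = memcg; mi; mi = mi->parent) {
		if (mi->memory_max < memory)
			memory = mi->memory_max;
		if (mi->swap_max < swap)
			swap = mi->swap_max;
	}
	*memory_bytes = (uint64_t)memory * SKETCH_PAGE_SIZE;
	*swap_bytes = (uint64_t)swap * SKETCH_PAGE_SIZE;
}
```

As in the patch, a level without a limit contributes PAGE_COUNTER_MAX, so a chain with no limit anywhere reports PAGE_COUNTER_MAX * PAGE_SIZE bytes rather than the string "max".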
Hello.
Something like this would come in quite handy.
On Mon, Feb 12, 2024 at 12:10:38PM +0800, "Jan Kratochvil (Azul)" <[email protected]> wrote:
> which are useful for userland to easily and performance-wise find out the
> effective cgroup limits being applied.
And it is the only way to find them out from inside a cgroup namespace (cgroupns), where the ancestor cgroups are not visible.
> But for cgroup2 it has been missing so far, this is just a copy-paste of the
> cgroup1 code while changing s/memsw/swap/ as that is what cgroup1 vs. cgroup2
> tracks. I have added it to the end of "memory.stat" to prevent possible
> compatibility problems with existing code parsing that file.
I was thinking of a separate memory.max.effective file (and others) instead:
- no need to (possibly) flush stats, as reading memory.stat may do
- can be generalized to the pids controller (and other "limiting" controllers)
- follows the precedent of cpuset.cpus.effective
Whereas using the v1 approach in v2:
- memory.stat would mix true stats and limits,
- memory.stat is hierarchical by default, so there is no need for the prefix.
What do you think of the separate .effective file(s)?
Thanks
Michal
On 2/12/24 10:00, Michal Koutný wrote:
> Hello.
>
> Something like this would come in quite handy.
>
> On Mon, Feb 12, 2024 at 12:10:38PM +0800, "Jan Kratochvil (Azul)" <[email protected]> wrote:
>> which are useful for userland to easily and performance-wise find out the
>> effective cgroup limits being applied.
> And it is the only way to find them out from inside a cgroup namespace (cgroupns), where the ancestor cgroups are not visible.
>
>> But for cgroup2 it has been missing so far, this is just a copy-paste of the
>> cgroup1 code while changing s/memsw/swap/ as that is what cgroup1 vs. cgroup2
>> tracks. I have added it to the end of "memory.stat" to prevent possible
>> compatibility problems with existing code parsing that file.
> I was thinking of a separate memory.max.effective file (and others) instead:
>
> - no need to (possibly) flush stats, as reading memory.stat may do
> - can be generalized to the pids controller (and other "limiting" controllers)
> - follows the precedent of cpuset.cpus.effective
>
> Whereas using the v1 approach in v2:
> - memory.stat would mix true stats and limits,
> - memory.stat is hierarchical by default, so there is no need for the prefix.
>
> What do you think of the separate .effective file(s)?
This is certainly a good alternative.
Cheers,
Longman