2024-02-12 04:11:02

by Jan Kratochvil (Azul)

Subject: [PATCH v2] Port hierarchical_{memory,swap}_limit cgroup1->cgroup2

Hello,

cgroup1's memory.stat (generated by memcg1_stat_format()) already contains
these two lines:
  hierarchical_memory_limit %llu
  hierarchical_memsw_limit %llu

They let userland find the effective cgroup limits easily and cheaply.
Without them, userland has to open+read+close "memory.max" and/or
"memory.swap.max" in every parent directory of a nested cgroup.
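
For illustration only, the per-directory walk that userland needs today
might look roughly like the sketch below. It is not part of the patch: the
function names, the fixed buffer size and the explicit "root" argument are
assumptions made for the example, and a missing file is simply treated as
"no limit".

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PATH_BUF 4096	/* assumed to be large enough for a cgroup path */

/* Parse the contents of a cgroup2 limit file: either "max" or a byte count. */
static uint64_t parse_limit(const char *text)
{
	if (strncmp(text, "max", 3) == 0)
		return UINT64_MAX;
	return strtoull(text, NULL, 10);
}

/* Read one limit file; a file that cannot be opened counts as "no limit". */
static uint64_t read_limit(const char *dir, const char *file)
{
	char path[PATH_BUF], buf[64];
	uint64_t limit = UINT64_MAX;
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s", dir, file);
	f = fopen(path, "r");
	if (!f)
		return UINT64_MAX;
	if (fgets(buf, sizeof(buf), f))
		limit = parse_limit(buf);
	fclose(f);
	return limit;
}

/*
 * Walk from cgroup_dir up to and including root, keeping the minimum of
 * `file` seen on the way. cgroup_dir must be root or a descendant of it.
 */
static uint64_t effective_limit(const char *root, const char *cgroup_dir,
				const char *file)
{
	char dir[PATH_BUF];
	uint64_t limit = UINT64_MAX;
	size_t root_len = strlen(root);

	snprintf(dir, sizeof(dir), "%s", cgroup_dir);
	for (;;) {
		uint64_t cur = read_limit(dir, file);
		char *slash;

		if (cur < limit)
			limit = cur;
		if (strlen(dir) <= root_len)
			break;		/* reached the hierarchy root */
		slash = strrchr(dir, '/');
		if (!slash || slash == dir)
			break;
		*slash = '\0';		/* step up to the parent directory */
	}
	return limit;
}
```

A caller would use something like effective_limit("/sys/fs/cgroup",
"/sys/fs/cgroup/a/b", "memory.max"), which costs one open+read+close per
nesting level.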

For cgroup1 it was implemented by:
memcg: show real limit under hierarchy mode
https://github.com/torvalds/linux/commit/fee7b548e6f2bd4bfd03a1a45d3afd593de7d5e9
Date: Wed Jan 7 18:08:26 2009 -0800

For cgroup2 this has been missing so far. This patch is just a copy-paste of
the cgroup1 code with s/memsw/swap/, since cgroup1 tracks memory+swap while
cgroup2 tracks swap separately. I have added the lines to the end of
"memory.stat" to avoid possible compatibility problems with existing code
parsing that file.
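
As a rough sketch of how userland could consume the new lines, a parser that
looks keys up by name (rather than by position) is unaffected by where in
"memory.stat" they appear. The helper name stat_lookup() is an assumption
for this example; it works on text already read from the file.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up "key value" in memory.stat-style text (one pair per line).
 * Returns true and stores the value if the key is present.
 */
static bool stat_lookup(const char *text, const char *key, uint64_t *value)
{
	size_t key_len = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Match the whole key, followed by the separating space. */
		if (strncmp(line, key, key_len) == 0 && line[key_len] == ' ') {
			*value = strtoull(line + key_len + 1, NULL, 10);
			return true;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return false;
}
```

Note that with no limit set anywhere in the hierarchy, the patch as posted
prints PAGE_COUNTER_MAX * PAGE_SIZE rather than the string "max", so a
caller would have to treat that large value as "unlimited".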


Jan Kratochvil


Signed-off-by: Jan Kratochvil (Azul) <[email protected]>

mm/memcontrol.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 46d8d0211..2631dd810 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1636,6 +1636,8 @@ static inline unsigned long memcg_page_state_local_output(
 static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
 {
 	int i;
+	unsigned long memory, swap;
+	struct mem_cgroup *mi;

 	/*
 	 * Provide statistics on the state of the memory subsystem as
@@ -1682,6 +1684,17 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
 			       memcg_events(memcg, memcg_vm_event_stat[i]));
 	}

+	/* Hierarchical information */
+	memory = swap = PAGE_COUNTER_MAX;
+	for (mi = memcg; mi; mi = parent_mem_cgroup(mi)) {
+		memory = min(memory, READ_ONCE(mi->memory.max));
+		swap = min(swap, READ_ONCE(mi->swap.max));
+	}
+	seq_buf_printf(s, "hierarchical_memory_limit %llu\n",
+		       (u64)memory * PAGE_SIZE);
+	seq_buf_printf(s, "hierarchical_swap_limit %llu\n",
+		       (u64)swap * PAGE_SIZE);
+
 	/* The above should easily fit into one page */
 	WARN_ON_ONCE(seq_buf_has_overflowed(s));
 }


2024-02-12 15:00:54

by Michal Koutný

Subject: Re: [PATCH v2] Port hierarchical_{memory,swap}_limit cgroup1->cgroup2

Hello.

Something like this would come in quite handy.

On Mon, Feb 12, 2024 at 12:10:38PM +0800, "Jan Kratochvil (Azul)" <[email protected]> wrote:
> which are useful for userland to easily and performance-wise find out the
> effective cgroup limits being applied.

And it would be the only way to figure them out from inside a cgroupns.

> But for cgroup2 it has been missing so far, this is just a copy-paste of the
> cgroup1 code while changing s/memsw/swap/ as that is what cgroup1 vs. cgroup2
> tracks. I have added it to the end of "memory.stat" to prevent possible
> compatibility problems with existing code parsing that file.

I was thinking of memory.max.effective (and others).

- no need to (possibly) flush stats, as happens when reading memory.stat
- can be generalized also for pids controller (and other "limiting" controllers)
- analogous to precedent of cpuset.cpus.effective

Whereas using the v1 approach in v2:
- memory.stat mixes true stats and limits,
- memory.stat is hierarchical by default, so there is no need for the prefix.

What do you think of the separate .effective file(s)?
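
To make the generalization argument concrete, here is a small user-space
model of an "effective" limit that is not tied to the memory controller. It
is only a sketch: struct cg_model, enum limit_kind and effective_max() are
hypothetical names, not kernel structures or an existing interface.

```c
#include <stddef.h>
#include <stdint.h>

#define LIMIT_NONE UINT64_MAX	/* stands in for "max" */

/* Hypothetical set of controllers that have a per-cgroup maximum. */
enum limit_kind { LIMIT_MEMORY, LIMIT_SWAP, LIMIT_PIDS, LIMIT_KINDS };

/* Hypothetical user-space stand-in for a cgroup; not a kernel structure. */
struct cg_model {
	const struct cg_model *parent;	/* NULL at the root */
	uint64_t max[LIMIT_KINDS];	/* configured limit per controller */
};

/* Effective limit: minimum of the configured limit over cg and its ancestors. */
static uint64_t effective_max(const struct cg_model *cg, enum limit_kind kind)
{
	uint64_t limit = LIMIT_NONE;

	for (; cg; cg = cg->parent)
		if (cg->max[kind] < limit)
			limit = cg->max[kind];
	return limit;
}
```

The same loop serves every controller in the model, which is the property a
per-controller .effective file would rely on.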

Thanks
Michal



2024-02-12 15:26:57

by Waiman Long

Subject: Re: [PATCH v2] Port hierarchical_{memory,swap}_limit cgroup1->cgroup2


On 2/12/24 10:00, Michal Koutný wrote:
> Hello.
>
> Something like this would come quite handy.
>
> On Mon, Feb 12, 2024 at 12:10:38PM +0800, "Jan Kratochvil (Azul)" <[email protected]> wrote:
>> which are useful for userland to easily and performance-wise find out the
>> effective cgroup limits being applied.
> And the only way to figure out inside cgroupns.
>
>> But for cgroup2 it has been missing so far, this is just a copy-paste of the
>> cgroup1 code while changing s/memsw/swap/ as that is what cgroup1 vs. cgroup2
>> tracks. I have added it to the end of "memory.stat" to prevent possible
>> compatibility problems with existing code parsing that file.
> I was thinking of memory.max.effective (and others).
>
> - no need to (possibly flush) stats when reading memory.stat
> - can be generalized also for pids controller (and other "limiting" controllers)
> - analogous to precedent of cpuset.cpus.effective
>
> Whereas, using v1 approach in v2:
> - memory.stat mixes true stats and limits,
> - memory.stat is hierarchical by default, no need for the prefix.
>
> What do you think of the separate .effective file(s)?

This is certainly a good alternative.

Cheers,
Longman