2024-01-26 21:20:00

by T.J. Mercier

[permalink] [raw]
Subject: [PATCH v2] mm: memcg: Don't periodically flush stats when memcg is disabled

The root memcg is onlined even when memcg is disabled. When it's onlined
a 2 second periodic stat flush is started, but no stat flushing is
required when memcg is disabled because there can be no child memcgs.
Most calls to flush memcg stats are avoided when memcg is disabled as a
result of the mem_cgroup_disabled check added in 7d7ef0a4686a
("mm: memcg: restore subtree stats flushing"), but the periodic flushing
started in mem_cgroup_css_online is not. Skip it.

Fixes: aa48e47e3906 ("memcg: infrastructure to flush memcg stats")
Reported-by: Minchan Kim <[email protected]>
Signed-off-by: T.J. Mercier <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e4c8735e7c85..bad8f9dfc9ab 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5586,7 +5586,7 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
if (alloc_shrinker_info(memcg))
goto offline_kmem;

- if (unlikely(mem_cgroup_is_root(memcg)))
+ if (unlikely(mem_cgroup_is_root(memcg)) && !mem_cgroup_disabled())
queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
FLUSH_TIME);
lru_gen_online_memcg(memcg);
--
2.43.0.429.g432eaa2c6b-goog



2024-01-26 21:28:44

by Chris Li

[permalink] [raw]
Subject: Re: [PATCH v2] mm: memcg: Don't periodically flush stats when memcg is disabled

Hi T.J.

Acked-by: Chris Li <[email protected]>

Chris


On Fri, Jan 26, 2024 at 1:19 PM T.J. Mercier <[email protected]> wrote:
>
> The root memcg is onlined even when memcg is disabled. When it's onlined
> a 2 second periodic stat flush is started, but no stat flushing is
> required when memcg is disabled because there can be no child memcgs.
> Most calls to flush memcg stats are avoided when memcg is disabled as a
> result of the mem_cgroup_disabled check added in 7d7ef0a4686a
> ("mm: memcg: restore subtree stats flushing"), but the periodic flushing
> started in mem_cgroup_css_online is not. Skip it.
>
> Fixes: aa48e47e3906 ("memcg: infrastructure to flush memcg stats")
> Reported-by: Minchan Kim <[email protected]>
> Signed-off-by: T.J. Mercier <[email protected]>
> Acked-by: Shakeel Butt <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> ---
> mm/memcontrol.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e4c8735e7c85..bad8f9dfc9ab 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5586,7 +5586,7 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
> if (alloc_shrinker_info(memcg))
> goto offline_kmem;
>
> - if (unlikely(mem_cgroup_is_root(memcg)))
> + if (unlikely(mem_cgroup_is_root(memcg)) && !mem_cgroup_disabled())
> queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
> FLUSH_TIME);
> lru_gen_online_memcg(memcg);
> --
> 2.43.0.429.g432eaa2c6b-goog
>
>

2024-01-29 20:36:21

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v2] mm: memcg: Don't periodically flush stats when memcg is disabled

On Fri, Jan 26, 2024 at 1:19 PM T.J. Mercier <[email protected]> wrote:
>
> The root memcg is onlined even when memcg is disabled. When it's onlined
> a 2 second periodic stat flush is started, but no stat flushing is
> required when memcg is disabled because there can be no child memcgs.
> Most calls to flush memcg stats are avoided when memcg is disabled as a
> result of the mem_cgroup_disabled check added in 7d7ef0a4686a
> ("mm: memcg: restore subtree stats flushing"), but the periodic flushing
> started in mem_cgroup_css_online is not. Skip it.
>
> Fixes: aa48e47e3906 ("memcg: infrastructure to flush memcg stats")
> Reported-by: Minchan Kim <[email protected]>
> Signed-off-by: T.J. Mercier <[email protected]>
> Acked-by: Shakeel Butt <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>

Reviewed-by: Yosry Ahmed <[email protected]>

Thanks!

2024-02-01 14:53:18

by Michal Koutný

[permalink] [raw]
Subject: Re: [PATCH v2] mm: memcg: Don't periodically flush stats when memcg is disabled

On Fri, Jan 26, 2024 at 09:19:25PM +0000, "T.J. Mercier" <[email protected]> wrote:
> The root memcg is onlined even when memcg is disabled. When it's onlined
> a 2 second periodic stat flush is started, but no stat flushing is
> required when memcg is disabled because there can be no child memcgs.
> Most calls to flush memcg stats are avoided when memcg is disabled as a
> result of the mem_cgroup_disabled check added in 7d7ef0a4686a
> ("mm: memcg: restore subtree stats flushing"), but the periodic flushing
> started in mem_cgroup_css_online is not. Skip it.

Have you tried
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6099,6 +6099,9 @@ int __init cgroup_init(void)
cgroup_unlock();

for_each_subsys(ss, ssid) {
+ if (!cgroup_ssid_enabled(ssid))
+ continue;
+
if (ss->early_init) {
struct cgroup_subsys_state *css =
init_css_set.subsys[ss->id];
@@ -6118,9 +6121,6 @@ int __init cgroup_init(void)
* disabled flag and cftype registration needs kmalloc,
* both of which aren't available during early_init.
*/
- if (!cgroup_ssid_enabled(ssid))
- continue;
-
if (cgroup1_ssid_disabled(ssid))
pr_info("Disabling %s control group subsystem in v1 mounts\n",
ss->legacy_name);
?
I'm asking about a try because I'm not sure whether this does not blow
up due to something missing. But I think disabled controllers would not
need to be (root-)onlined at all.

Thanks,
Michal


Attachments:
(No filename) (1.67 kB)
signature.asc (235.00 B)
Download all attachments

2024-02-01 21:02:31

by T.J. Mercier

[permalink] [raw]
Subject: Re: [PATCH v2] mm: memcg: Don't periodically flush stats when memcg is disabled

On Thu, Feb 1, 2024 at 6:26 AM Michal Koutný <[email protected]> wrote:
>
> On Fri, Jan 26, 2024 at 09:19:25PM +0000, "T.J. Mercier" <[email protected]> wrote:
> > The root memcg is onlined even when memcg is disabled. When it's onlined
> > a 2 second periodic stat flush is started, but no stat flushing is
> > required when memcg is disabled because there can be no child memcgs.
> > Most calls to flush memcg stats are avoided when memcg is disabled as a
> > result of the mem_cgroup_disabled check added in 7d7ef0a4686a
> > ("mm: memcg: restore subtree stats flushing"), but the periodic flushing
> > started in mem_cgroup_css_online is not. Skip it.
>
> Have you tried
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -6099,6 +6099,9 @@ int __init cgroup_init(void)
> cgroup_unlock();
>
> for_each_subsys(ss, ssid) {
> + if (!cgroup_ssid_enabled(ssid))
> + continue;
> +
> if (ss->early_init) {
> struct cgroup_subsys_state *css =
> init_css_set.subsys[ss->id];
> @@ -6118,9 +6121,6 @@ int __init cgroup_init(void)
> * disabled flag and cftype registration needs kmalloc,
> * both of which aren't available during early_init.
> */
> - if (!cgroup_ssid_enabled(ssid))
> - continue;
> -
> if (cgroup1_ssid_disabled(ssid))
> pr_info("Disabling %s control group subsystem in v1 mounts\n",
> ss->legacy_name);
> ?
> I'm asking about a try because I'm not sure whether this does not blow
> up due to something missing. But I think disabled controllers would not
> need to be (root-)onlined at all.
>
> Thanks,
> Michal

Hi Michal,

It does blow up, but not how I was expecting. There's a null pointer
dereference inside find_css_set when trying to get a css pointer for
the memory controller, I think because the allocation in
cgroup_init_subsys is skipped:

[ 9.591766] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 9.591909] #PF: supervisor read access in kernel mode
[ 9.592023] #PF: error_code(0x0000) - not-present page
[ 9.592138] PGD 0 P4D 0
[ 9.592199] Oops: 0000 [#1] PREEMPT SMP PTI
[ 9.592289] CPU: 1 PID: 1 Comm: init Tainted: G E
6.7.0-mainline-maybe-dirty #1
[ 9.592490] Hardware name: emulation qemu-x86/qemu-x86, BIOS
2023.07-rc6-gb8315a3184b2-ab11347395 07/01/2023
[ 9.592731] RIP: 0010:find_css_set+0x5c3/0x7a0
[ 9.592850] Code: 20 69 cf b2 4e 8b 3c 33 48 c7 c7 1d 3b 41 b2 4c
89 fe e8 10 b0 f7 00 49 8b b4 24 a0 00 00 00 4e 8d 24 2b 49 81 c4 b0
fc ff ff <49> 8b 0f 4c 01 e9 48 c7 c7 c4 c0 46 b2 4c 89 e2 e8 e8 af f7
00 49
[ 9.593251] RSP: 0018:ffffb6218000bb90 EFLAGS: 00010087
[ 9.593370] RAX: 0000000000000021 RBX: ffff99181044a200 RCX: 4f5e789f089a0c00
[ 9.593554] RDX: ffffb6218000ba50 RSI: ffffffffb2448284 RDI: ffffffffb2c91950
[ 9.593735] RBP: ffffb6218000bc28 R08: 0000000000000fff R09: ffffffffb2c79950
[ 9.593909] R10: 0000000000002ffd R11: 0000000000000004 R12: ffff99181044a2d8
[ 9.594102] R13: 0000000000000428 R14: 0000000000000020 R15: 0000000000000000
[ 9.594291] FS: 00007f3d2f986fc8(0000) GS:ffff99182bd00000(0000)
knlGS:0000000000000000
[ 9.594467] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.594610] CR2: 0000000000000000 CR3: 00000001001d8001 CR4: 0000000000370eb0
[ 9.594818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9.595007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9.595178] Call Trace:
[ 9.595237] <TASK>
[ 9.595297] ? __die_body+0x5e/0xb0
[ 9.595392] ? __die+0x9e/0xb0
[ 9.595481] ? page_fault_oops+0x35f/0x3d0
[ 9.595574] ? do_user_addr_fault+0x6ab/0x780
[ 9.595690] ? prb_read_valid+0x28/0x60
[ 9.595782] ? exc_page_fault+0x83/0x220
[ 9.595874] ? asm_exc_page_fault+0x2b/0x30
[ 9.595966] ? find_css_set+0x5c3/0x7a0
[ 9.596059] cgroup_migrate_prepare_dst+0x75/0x2b0
[ 9.596194] cgroup_attach_task+0x293/0x450
[ 9.596305] ? cgroup_attach_task+0xb6/0x450
[ 9.596449] __cgroup_procs_write+0xef/0x1a0
[ 9.596589] cgroup_procs_write+0x16/0x30
[ 9.596733] cgroup_file_write+0x9d/0x260
[ 9.596840] kernfs_fop_write_iter+0x145/0x1e0
[ 9.596981] vfs_write+0x276/0x2e0
[ 9.597092] ksys_write+0x73/0xe0
[ 9.597198] __x64_sys_write+0x1a/0x30
[ 9.597303] do_syscall_64+0x5a/0x100
[ 9.597430] entry_SYSCALL_64_after_hwframe+0x6e/0x76


But there are also calls to mem_cgroup_css_from_folio that I think
would cause a different null pointer deref even if we had the css but
no root_mem_cgroup. There seems to be an assumption that we'll have a
memcg to charge against even when the controller is disabled. To me it
looks like that's to simplify the possible combinations of
CONFIG_MEMCG and memcg being boot-time disabled or not.

Best,
T.J.

2024-02-02 22:47:42

by Michal Koutný

[permalink] [raw]
Subject: Re: Re: [PATCH v2] mm: memcg: Don't periodically flush stats when memcg is disabled

On Thu, Feb 01, 2024 at 01:02:04PM -0800, "T.J. Mercier" <[email protected]> wrote:
> It does blow up, but not how I was expecting. There's a null pointer
> dereference inside find_css_set when trying to get a css pointer for
> the memory controller, I think because the allocation in
> cgroup_init_subsys is skipped:

Thanks for trying! I suspected it won't be easy. At the same time I
suspected there must be a hook for your purpose -- after looking at
cpuset, I was reminded of cgroup_subsys.bind callback.
What about triggering periodic flush in that callback? (memcg doesn't
implement it yet but cgroup_init() takes it into account.)
It would take any dwork activation out of mem_cgroup_css_online() and it
seems cleaner. (Ideally, the flush could be disabled again when memcg
root is unmounted again. (That's impossible and practically unused but
that's why consider callback approach cleaner. Of course, your original
guard serves the purpose too.))

Regards,
Michal