From: Yosry Ahmed
Date: Mon, 11 Sep 2023 15:16:28 -0700
Subject: Re: [RFC PATCH 1/3] cgroup: list all subsystem states in debugfs files
To: "Yakunin, Dmitry (Nebius)"
Cc: "cgroups@vger.kernel.org", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", NB-Core Team, "tj@kernel.org", "hannes@cmpxchg.org", "mhocko@kernel.org", Konstantin Khlebnikov, Andrey Ryabinin
References: <20230911075437.74027-1-zeil@nebius.com> <20230911075437.74027-2-zeil@nebius.com>
In-Reply-To: <20230911075437.74027-2-zeil@nebius.com>

On Mon, Sep 11, 2023 at 12:55 AM Yakunin, Dmitry (Nebius) wrote:
>
> After removing cgroup subsystem state could leak or live in background
> forever because it is pinned by some reference. For example memory cgroup
> could be pinned by pages in cache or tmpfs.
>
> This patch adds common debugfs interface for listing basic state for each
> controller. Controller could define callback for dumping own attributes.
>
> In file /sys/kernel/debug/cgroup/ each line shows state in
> format: =... [-- =... ]
>
> Common attributes:
>
> css - css pointer
> cgroup - cgroup pointer
> id - css id
> ino - cgroup inode
> flags - css flags
> refcnt - css atomic refcount, for online shows huge bias
> path - cgroup path
>
> This patch adds memcg attributes:
>
> mem_id - 16-bit memory cgroup id
> memory - charged pages
> memsw - charged memory+swap for v1 and swap for v2
> kmem - charged kernel pages
> tcpmem - charged tcp pages
> shmem - shmem/tmpfs pages
>
> Link: https://lore.kernel.org/lkml/153414348591.737150.14229960913953276515.stgit@buzz
> Suggested-by: Konstantin Khlebnikov
> Reviewed-by: Andrey Ryabinin
> Signed-off-by: Dmitry Yakunin

FWIW, I was just recently working on a debugfs directory that exposes a
list of all zombie memcgs as well as the "memory.stat" output for all
of them.

This entails a file at /sys/kernel/debug/zombie_memcgs/all that
contains a list of zombie memcgs (with indentation to reflect the
hierarchy) and an id for each of them.

This id can be used to index per-memcg directories at
/sys/kernel/debug/zombie_memcgs//, which include debug files. The only
one we have so far is /sys/kernel/debug/zombie_memcgs//memory.stat.

If there is interest in this, I can share more information.

> ---
>  include/linux/cgroup-defs.h |   1 +
>  kernel/cgroup/cgroup.c      | 101 ++++++++++++++++++++++++++++++++++++
>  mm/memcontrol.c             |  14 +++++
>  3 files changed, 116 insertions(+)
>
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index 8a0d5466c7be..810bd300cbee 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -673,6 +673,7 @@ struct cgroup_subsys {
>  	void (*exit)(struct task_struct *task);
>  	void (*release)(struct task_struct *task);
>  	void (*bind)(struct cgroup_subsys_state *root_css);
> +	void (*css_dump)(struct cgroup_subsys_state *css, struct seq_file *m);
>
>  	bool early_init:1;
>
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 625d7483951c..fb9931ff7570 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -40,6 +40,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -7068,3 +7069,103 @@ static int __init cgroup_sysfs_init(void)
>  subsys_initcall(cgroup_sysfs_init);
>
>  #endif /* CONFIG_SYSFS */
> +
> +#ifdef CONFIG_DEBUG_FS
> +void *css_debugfs_seqfile_start(struct seq_file *m, loff_t *pos)
> +{
> +	struct cgroup_subsys *ss = m->private;
> +	struct cgroup_subsys_state *css;
> +	int id = *pos;
> +
> +	rcu_read_lock();
> +	css = idr_get_next(&ss->css_idr, &id);
> +	*pos = id;
> +	return css;
> +}
> +
> +void *css_debugfs_seqfile_next(struct seq_file *m, void *v, loff_t *pos)
> +{
> +	struct cgroup_subsys *ss = m->private;
> +	struct cgroup_subsys_state *css;
> +	int id = *pos + 1;
> +
> +	css = idr_get_next(&ss->css_idr, &id);
> +	*pos = id;
> +	return css;
> +}
> +
> +void css_debugfs_seqfile_stop(struct seq_file *m, void *v)
> +{
> +	rcu_read_unlock();
> +}
> +
> +int css_debugfs_seqfile_show(struct seq_file *m, void *v)
> +{
> +	struct cgroup_subsys *ss = m->private;
> +	struct cgroup_subsys_state *css = v;
> +	/* data is NULL for root cgroup_subsys_state */
> +	struct percpu_ref_data *data = css->refcnt.data;
> +	size_t buflen;
> +	char *buf;
> +	int len;
> +
> +	seq_printf(m, "css=%pK cgroup=%pK id=%d ino=%lu flags=%#x refcnt=%lu path=",
> +		   css, css->cgroup, css->id, cgroup_ino(css->cgroup),
> +		   css->flags, data ? atomic_long_read(&data->count) : 0);
> +
> +	buflen = seq_get_buf(m, &buf);
> +	if (buf) {
> +		len = cgroup_path(css->cgroup, buf, buflen);
> +		seq_commit(m, len < buflen ? len : -1);
> +	}
> +
> +	if (ss->css_dump) {
> +		seq_puts(m, " -- ");
> +		ss->css_dump(css, m);
> +	}
> +
> +	seq_putc(m, '\n');
> +	return 0;
> +}
> +
> +static const struct seq_operations css_debug_seq_ops = {
> +	.start = css_debugfs_seqfile_start,
> +	.next = css_debugfs_seqfile_next,
> +	.stop = css_debugfs_seqfile_stop,
> +	.show = css_debugfs_seqfile_show,
> +};
> +
> +static int css_debugfs_open(struct inode *inode, struct file *file)
> +{
> +	int ret = seq_open(file, &css_debug_seq_ops);
> +	struct seq_file *m = file->private_data;
> +
> +	if (!ret)
> +		m->private = inode->i_private;
> +	return ret;
> +}
> +
> +static const struct file_operations css_debugfs_fops = {
> +	.open = css_debugfs_open,
> +	.read = seq_read,
> +	.llseek = seq_lseek,
> +	.release = seq_release,
> +};
> +
> +static int __init css_debugfs_init(void)
> +{
> +	struct cgroup_subsys *ss;
> +	struct dentry *dir;
> +	int ssid;
> +
> +	dir = debugfs_create_dir("cgroup", NULL);
> +	if (dir) {
> +		for_each_subsys(ss, ssid)
> +			debugfs_create_file(ss->name, 0644, dir, ss,
> +					    &css_debugfs_fops);
> +	}
> +
> +	return 0;
> +}
> +late_initcall(css_debugfs_init);
> +#endif /* CONFIG_DEBUG_FS */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4b27e245a055..7b3d4a10ac63 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5654,6 +5654,20 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
>  	}
>  }
>
> +static void mem_cgroup_css_dump(struct cgroup_subsys_state *css,
> +				struct seq_file *m)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> +
> +	seq_printf(m, "mem_id=%u memory=%lu memsw=%lu kmem=%lu tcpmem=%lu shmem=%lu",
> +		   mem_cgroup_id(memcg),
> +		   page_counter_read(&memcg->memory),
> +		   page_counter_read(&memcg->memsw),
> +		   page_counter_read(&memcg->kmem),
> +		   page_counter_read(&memcg->tcpmem),
> +		   memcg_page_state(memcg, NR_SHMEM));
> +}
> +
>  #ifdef CONFIG_MMU
>  /* Handlers for move charge at task migration. */
>  static int mem_cgroup_do_precharge(unsigned long count)
> --
> 2.25.1
>
>
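
For illustration only, here is a rough userspace sketch of how a tool might pull one attribute out of a line in the format the quoted patch emits ("key=value ... [-- key=value ...]"). The helper name css_line_get_attr is hypothetical and not part of the patch, and the sketch assumes values contain no spaces. That holds for the numeric and pointer attributes printed by the seq_printf() calls above, but not necessarily for "path".

```c
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): copy the value of
 * attribute `key` from one line of the proposed debugfs format into
 * `out`. Returns 0 on success, -1 if the key is absent or the value
 * does not fit in `out`.
 *
 * Assumption: values contain no spaces. A cgroup path may contain
 * spaces, so "path" should not be read with this helper.
 */
static int css_line_get_attr(const char *line, const char *key,
			     char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *p = line;

	while (*p) {
		size_t toklen;

		/* Skip separators, then measure the next token. */
		p += strspn(p, " \n");
		toklen = strcspn(p, " \n");

		/* A token matches only when it starts with "key=". */
		if (toklen > klen && p[klen] == '=' &&
		    !strncmp(p, key, klen)) {
			size_t vlen = toklen - klen - 1;

			if (vlen >= outlen)
				return -1;
			memcpy(out, p + klen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		p += toklen;
	}
	return -1;
}
```

A caller would read /sys/kernel/debug/cgroup/ files line by line and, for example, ask for "refcnt" or "shmem" on each line. The " -- " separator needs no special handling here because it never matches a "key=" prefix.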