From: Muchun Song
Date: Tue, 2 Mar 2021 10:50:04 +0800
Subject: Re: [External] Re: [PATCH 0/5] Use obj_cgroup APIs to change kmem pages
To: Roman Gushchin
Cc: viro@zeniv.linux.org.uk, Jan Kara, amir73il@gmail.com,
    Alexei Starovoitov, Daniel Borkmann, andrii@kernel.org,
    Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
    kpsingh@kernel.org, mingo@redhat.com, Peter Zijlstra,
    juri.lelli@redhat.com, Vincent Guittot, dietmar.eggemann@arm.com,
    Steven Rostedt, Benjamin Segall, mgorman@suse.de, bristot@redhat.com,
    Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
    Shakeel Butt, Alex Shi, Chris Down, richard.weiyang@gmail.com,
    Vlastimil Babka, mathieu.desnoyers@efficios.com, posk@google.com,
    Jann Horn, Joonsoo Kim, Daniel Vetter, longman@redhat.com,
    Michel Lespinasse, Christian Brauner, "Eric W. Biederman", Kees Cook,
    krisman@collabora.com, esyr@redhat.com, Suren Baghdasaryan,
    Marco Elver, linux-fsdevel, LKML, Networking, bpf, Cgroups,
    Linux Memory Management List, Xiongchun duan
References: <20210301062227.59292-1-songmuchun@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 2, 2021 at 9:12 AM Roman Gushchin wrote:
>
> Hi Muchun!
>
> On Mon, Mar 01, 2021 at 02:22:22PM +0800, Muchun Song wrote:
> > Since Roman's series "The new cgroup slab memory controller" was
> > applied, all slab objects are charged via the new obj_cgroup APIs.
> > These new APIs introduce a struct obj_cgroup instead of using struct
> > mem_cgroup directly to charge slab objects. This prevents long-living
> > objects from pinning the original memory cgroup in memory. But there
> > are still some corner-case objects (e.g. allocations larger than an
> > order-1 page on SLUB) which are not charged via the obj_cgroup APIs.
> > Those objects (including the pages which are allocated from the buddy
> > allocator directly) are charged as kmem pages, which still hold a
> > reference to the memory cgroup.
>
> Yes, this is a good idea, large kmallocs should be treated the same
> way as small ones.
>
> >
> > For example, the kernel stack is charged as kmem pages because its
> > size can be greater than 2 pages (e.g. 16KB on x86_64 or arm64).
> > Suppose we create a thread whose stack is charged to memory cgroup A
> > and then move it from memory cgroup A to memory cgroup B. Because the
> > kernel stack of the thread holds a reference to memory cgroup A, the
> > thread can pin memory cgroup A in memory even after cgroup A is
> > removed. This scenario can be reproduced with the following script,
> > after which the system has 500 more dying cgroups.
> >
> > #!/bin/bash
> >
> > cat /proc/cgroups | grep memory
> >
> > cd /sys/fs/cgroup/memory
> > echo 1 > memory.move_charge_at_immigrate
> >
> > for i in {1..500}
> > do
> >     mkdir kmem_test
> >     echo $$ > kmem_test/cgroup.procs
> >     sleep 3600 &
> >     echo $$ > cgroup.procs
> >     echo `cat kmem_test/cgroup.procs` > cgroup.procs
> >     rmdir kmem_test
> > done
> >
> > cat /proc/cgroups | grep memory
>
> Well, moving processes between cgroups always created a lot of issues
> and corner cases and this one is definitely not the worst. So this problem
> looks a bit artificial, unless I'm missing something. But if it doesn't
> introduce any new performance costs and doesn't make the code more complex,
> I have nothing against it.

OK. I just wanted to show that large kmallocs are charged as kmem pages,
so I constructed this test case.

>
> Btw, can you, please, run the spell-checker on commit logs? There are many
> typos (starting from the title of the series, I guess), which make the patchset
> look less appealing.

Sorry for my poor English. I will do that. Thanks for your suggestions.

>
> Thank you!
>
> >
> > This patchset aims to make those kmem pages drop their reference to
> > the memory cgroup by using the obj_cgroup APIs. With it applied, the
> > number of dying cgroups no longer increases when the above test
> > script is run.
> >
> > Patches 1-3 use the obj_cgroup APIs to charge kmem pages. The remote
> > memory cgroup charging APIs are a mechanism for charging kernel memory
> > to a given memory cgroup, so I also make them use the obj_cgroup APIs.
> > Patches 4-5 do this.
> >
> > Muchun Song (5):
> >   mm: memcontrol: introduce obj_cgroup_{un}charge_page
> >   mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem
> >     page
> >   mm: memcontrol: reparent the kmem pages on cgroup removal
> >   mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM
> >   mm: memcontrol: use object cgroup for remote memory cgroup charging
> >
> >  fs/buffer.c                          |  10 +-
> >  fs/notify/fanotify/fanotify.c        |   6 +-
> >  fs/notify/fanotify/fanotify_user.c   |   2 +-
> >  fs/notify/group.c                    |   3 +-
> >  fs/notify/inotify/inotify_fsnotify.c |   8 +-
> >  fs/notify/inotify/inotify_user.c     |   2 +-
> >  include/linux/bpf.h                  |   2 +-
> >  include/linux/fsnotify_backend.h     |   2 +-
> >  include/linux/memcontrol.h           | 109 +++++++++++---
> >  include/linux/sched.h                |   6 +-
> >  include/linux/sched/mm.h             |  30 ++--
> >  kernel/bpf/syscall.c                 |  35 ++---
> >  kernel/fork.c                        |   4 +-
> >  mm/memcontrol.c                      | 276 ++++++++++++++++++++++-------------
> >  mm/page_alloc.c                      |   4 +-
> >  15 files changed, 324 insertions(+), 175 deletions(-)
> >
> > --
> > 2.11.0
> >
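
A note on reading the output of the quoted test script: it prints the memory line of /proc/cgroups before and after the loop and leaves the comparison to the reader. The following is a minimal sketch of doing that comparison numerically. The helper names memcg_count and memcg_delta are hypothetical and not part of the patchset. The sketch assumes the usual /proc/cgroups column layout (subsys_name, hierarchy, num_cgroups, enabled) and that, as the script implies, dying cgroups are still included in num_cgroups.

```shell
#!/bin/bash

# Hypothetical helper (not from the patchset): print the num_cgroups
# column (third field) of the "memory" line in a /proc/cgroups-style
# file. Defaults to /proc/cgroups when no file is given.
memcg_count() {
    awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

# Hypothetical helper: print how many memory cgroups were added
# between two samples taken with memcg_count.
memcg_delta() {
    echo $(( $2 - $1 ))
}

# Intended use around the quoted test loop (assumption: run as root on
# a system with the cgroup v1 memory controller mounted):
#   before=$(memcg_count)
#   ... run the mkdir/move/rmdir loop ...
#   after=$(memcg_count)
#   echo "memory cgroups added: $(memcg_delta "$before" "$after")"
```

If the behaviour described in the cover letter holds, the delta should be close to the iteration count on an unpatched kernel and close to zero with the series applied, although unrelated cgroup activity on the system can shift the number.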