From: Muchun Song
To: akpm@linux-foundation.org, hannes@cmpxchg.org, longman@redhat.com,
    mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com
Cc: cgroups@vger.kernel.org, duanxiongchun@bytedance.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, Muchun Song,
    Michal Koutný
Subject: [PATCH v6 03/11] mm: memcontrol: prepare objcg API for non-kmem usage
Date: Tue, 21 Jun 2022 20:56:50 +0800
Message-Id: <20220621125658.64935-4-songmuchun@bytedance.com>
In-Reply-To: <20220621125658.64935-1-songmuchun@bytedance.com>
References: <20220621125658.64935-1-songmuchun@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Pagecache pages are charged at allocation time and hold a reference to
the original memory cgroup until they are reclaimed. Depending on the
memory pressure, the specific patterns of page sharing between different
cgroups, and the cgroup creation and destruction rates, a large number of
dying memory cgroups can be pinned by pagecache pages. This makes page
reclaim less efficient and wastes memory.

We can fix this problem by converting LRU pages and most other raw memcg
pins to objcg, after which page->memcg will always point to an object
cgroup. The objcg infrastructure therefore no longer serves only
CONFIG_MEMCG_KMEM. In this patch, we move the objcg infrastructure out of
the scope of CONFIG_MEMCG_KMEM so that the LRU pages can reuse it to
charge pages.

LRU pages are not accounted at the root level, but their
page->memcg_data points to root_mem_cgroup, so the page->memcg_data of an
LRU page is always a valid pointer. However, root_mem_cgroup does not
have an object cgroup. If we use the obj_cgroup APIs to charge LRU pages,
page->memcg_data must be set to a root object cgroup, so we also allocate
an object cgroup for root_mem_cgroup.
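The indirection described above can be pictured with a toy userspace
model. This is only a sketch, not kernel code: every toy_* name below is
hypothetical, and locking, RCU, reference counting and the objcg_list
handling are all left out. It is meant to show just one thing: a page
that points at an objcg rather than at a memcg stops pinning a dying
cgroup, because redirecting the objcg to the parent is enough to make
the page resolve to the parent.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; none of these toy_* names exist in the kernel. */
struct toy_memcg;

struct toy_objcg {
	struct toy_memcg *memcg;	/* current owner, rewritten on reparent */
};

struct toy_memcg {
	const char *name;
	struct toy_memcg *parent;	/* NULL for the root */
	struct toy_objcg *objcg;	/* NULL once the memcg went offline */
};

struct toy_page {
	struct toy_objcg *objcg;	/* the page pins the objcg, not the memcg */
};

/* Resolve the memcg a page is currently charged to, via its objcg. */
static inline struct toy_memcg *toy_page_memcg(const struct toy_page *page)
{
	return page->objcg->memcg;
}

/*
 * Model of taking a non-root memcg offline: detach its objcg and point
 * that objcg at the parent. Pages keep their objcg pointer unchanged.
 */
static inline void toy_reparent_objcg(struct toy_memcg *memcg)
{
	struct toy_objcg *objcg = memcg->objcg;

	assert(memcg->parent != NULL);	/* the root is never reparented */
	memcg->objcg = NULL;
	if (objcg)
		objcg->memcg = memcg->parent;
}
```

With a page charged through a child's objcg, toy_page_memcg() returns
the child before toy_reparent_objcg(&child) and the parent afterwards,
without the page itself being touched.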

Signed-off-by: Muchun Song
Acked-by: Johannes Weiner
Reviewed-by: Michal Koutný
Acked-by: Roman Gushchin
---
 include/linux/memcontrol.h |  2 +-
 mm/memcontrol.c            | 56 +++++++++++++++++++++++++++-------------------
 2 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d0c0da7cafb7..111eda6ff1ce 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -321,10 +321,10 @@ struct mem_cgroup {
 
 #ifdef CONFIG_MEMCG_KMEM
 	int kmemcg_id;
+#endif
 	struct obj_cgroup __rcu *objcg;
 	/* list of inherited objcgs, protected by objcg_lock */
 	struct list_head objcg_list;
-#endif
 
 	MEMCG_PADDING(_pad2_);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fc706d6fc265..3c489651d312 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -252,9 +252,9 @@ struct mem_cgroup *vmpressure_to_memcg(struct vmpressure *vmpr)
 	return container_of(vmpr, struct mem_cgroup, vmpressure);
 }
 
-#ifdef CONFIG_MEMCG_KMEM
 static DEFINE_SPINLOCK(objcg_lock);
 
+#ifdef CONFIG_MEMCG_KMEM
 bool mem_cgroup_kmem_disabled(void)
 {
 	return cgroup_memory_nokmem;
@@ -263,12 +263,10 @@ bool mem_cgroup_kmem_disabled(void)
 static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
 				      unsigned int nr_pages);
 
-static void obj_cgroup_release(struct percpu_ref *ref)
+static void obj_cgroup_release_bytes(struct obj_cgroup *objcg)
 {
-	struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt);
 	unsigned int nr_bytes;
 	unsigned int nr_pages;
-	unsigned long flags;
 
 	/*
 	 * At this point all allocated objects are freed, and
@@ -282,9 +280,9 @@ static void obj_cgroup_release(struct percpu_ref *ref)
 	 * 3) CPU1: a process from another memcg is allocating something,
 	 *          the stock if flushed,
 	 *          objcg->nr_charged_bytes = PAGE_SIZE - 92
-	 * 5) CPU0: we do release this object,
+	 * 4) CPU0: we do release this object,
 	 *          92 bytes are added to stock->nr_bytes
-	 * 6) CPU0: stock is flushed,
+	 * 5) CPU0: stock is flushed,
 	 *          92 bytes are added to objcg->nr_charged_bytes
 	 *
 	 * In the result, nr_charged_bytes == PAGE_SIZE.
@@ -296,6 +294,19 @@ static void obj_cgroup_release(struct percpu_ref *ref)
 
 	if (nr_pages)
 		obj_cgroup_uncharge_pages(objcg, nr_pages);
+}
+#else
+static inline void obj_cgroup_release_bytes(struct obj_cgroup *objcg)
+{
+}
+#endif
+
+static void obj_cgroup_release(struct percpu_ref *ref)
+{
+	struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt);
+	unsigned long flags;
+
+	obj_cgroup_release_bytes(objcg);
 
 	spin_lock_irqsave(&objcg_lock, flags);
 	list_del(&objcg->list);
@@ -324,10 +335,10 @@ static struct obj_cgroup *obj_cgroup_alloc(void)
 	return objcg;
 }
 
-static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
-				  struct mem_cgroup *parent)
+static void memcg_reparent_objcgs(struct mem_cgroup *memcg)
 {
 	struct obj_cgroup *objcg, *iter;
+	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
 
 	objcg = rcu_replace_pointer(memcg->objcg, NULL, true);
 
@@ -346,6 +357,7 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
 	percpu_ref_kill(&objcg->refcnt);
 }
 
+#ifdef CONFIG_MEMCG_KMEM
 /*
  * A lot of the calls to the cache allocation functions are expected to be
  * inlined by the compiler. Since the calls to memcg_slab_pre_alloc_hook() are
@@ -3651,21 +3663,12 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
 #ifdef CONFIG_MEMCG_KMEM
 static int memcg_online_kmem(struct mem_cgroup *memcg)
 {
-	struct obj_cgroup *objcg;
-
 	if (cgroup_memory_nokmem)
 		return 0;
 
 	if (unlikely(mem_cgroup_is_root(memcg)))
 		return 0;
 
-	objcg = obj_cgroup_alloc();
-	if (!objcg)
-		return -ENOMEM;
-
-	objcg->memcg = memcg;
-	rcu_assign_pointer(memcg->objcg, objcg);
-
 	static_branch_enable(&memcg_kmem_enabled_key);
 
 	memcg->kmemcg_id = memcg->id.id;
@@ -3675,17 +3678,13 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
 
 static void memcg_offline_kmem(struct mem_cgroup *memcg)
 {
-	struct mem_cgroup *parent;
-
 	if (cgroup_memory_nokmem)
 		return;
 
 	if (unlikely(mem_cgroup_is_root(memcg)))
 		return;
 
-	parent = parent_mem_cgroup(memcg);
-	memcg_reparent_objcgs(memcg, parent);
-	memcg_reparent_list_lrus(memcg, parent);
+	memcg_reparent_list_lrus(memcg, parent_mem_cgroup(memcg));
 }
 #else
 static int memcg_online_kmem(struct mem_cgroup *memcg)
@@ -5190,8 +5189,8 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	memcg->socket_pressure = jiffies;
 #ifdef CONFIG_MEMCG_KMEM
 	memcg->kmemcg_id = -1;
-	INIT_LIST_HEAD(&memcg->objcg_list);
 #endif
+	INIT_LIST_HEAD(&memcg->objcg_list);
 #ifdef CONFIG_CGROUP_WRITEBACK
 	INIT_LIST_HEAD(&memcg->cgwb_list);
 	for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++)
@@ -5256,6 +5255,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+	struct obj_cgroup *objcg;
 
 	if (memcg_online_kmem(memcg))
 		goto remove_id;
@@ -5268,6 +5268,13 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	if (alloc_shrinker_info(memcg))
 		goto offline_kmem;
 
+	objcg = obj_cgroup_alloc();
+	if (!objcg)
+		goto free_shrinker;
+
+	objcg->memcg = memcg;
+	rcu_assign_pointer(memcg->objcg, objcg);
+
 	/* Online state pins memcg ID, memcg ID pins CSS */
 	refcount_set(&memcg->id.ref, 1);
 	css_get(css);
@@ -5276,6 +5283,8 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 		queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
 				   2UL*HZ);
 	return 0;
+free_shrinker:
+	free_shrinker_info(memcg);
 offline_kmem:
 	memcg_offline_kmem(memcg);
 remove_id:
@@ -5303,6 +5312,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	page_counter_set_min(&memcg->memory, 0);
 	page_counter_set_low(&memcg->memory, 0);
 
+	memcg_reparent_objcgs(memcg);
 	memcg_offline_kmem(memcg);
 	reparent_shrinker_deferred(memcg);
 	wb_memcg_offline(memcg);
-- 
2.11.0