From: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko, Johannes Weiner, Shakeel Butt, Vladimir Davydov,
    Waiman Long, Roman Gushchin
Subject: [PATCH RFC 01/14] mm: memcg: subpage charging API
Date: Thu, 5 Sep 2019 14:45:45 -0700
Message-ID: <20190905214553.1643060-2-guro@fb.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190905214553.1643060-1-guro@fb.com>
References: <20190905214553.1643060-1-guro@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Introduce an API to charge subpage objects to the memory cgroup.
The API will be used by the new slab memory controller.
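For illustration only, here is a minimal sketch of how a caller might
pair the two entry points added by this patch; the helper below and its
use of kmalloc() are hypothetical and not part of the patch:

	/* Hypothetical helper: charge 'size' bytes to 'memcg' before
	 * allocating an object, and return the charge if the
	 * allocation fails. __memcg_kmem_charge_subpage() returns 0 on
	 * success.
	 */
	static void *obj_alloc_charged(struct mem_cgroup *memcg,
				       size_t size, gfp_t gfp)
	{
		void *obj;

		if (__memcg_kmem_charge_subpage(memcg, size, gfp))
			return NULL;

		obj = kmalloc(size, gfp);
		if (!obj)
			__memcg_kmem_uncharge_subpage(memcg, size);

		return obj;
	}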
Later it can also be used to implement percpu memory accounting.
In both cases, a single page can be shared between multiple cgroups
(and in the percpu case a single allocation is split over multiple
pages), so it's not possible to use page-based accounting.

The implementation is based on percpu stocks. Memory cgroups are still
charged in pages, and the residue is stored in the percpu stock, or on
the memcg itself, when it's necessary to flush the stock.

Please note that, unlike the generic page charging API, a subpage
object does not hold a reference to the memory cgroup. This is because
a more complicated indirect scheme is required in order to implement
cheap reparenting. The percpu stock does hold a single reference to the
cached cgroup, though.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/memcontrol.h |   4 ++
 mm/memcontrol.c            | 129 +++++++++++++++++++++++++++++++++----
 2 files changed, 119 insertions(+), 14 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0c762e8ca6a6..120d39066148 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -214,6 +214,7 @@ struct mem_cgroup {
 	/* Accounted resources */
 	struct page_counter memory;
 	struct page_counter swap;
+	atomic_t nr_stocked_bytes;
 
 	/* Legacy consumer-oriented counters */
 	struct page_counter memsw;
@@ -1370,6 +1371,9 @@ int __memcg_kmem_charge_memcg(struct page *page, gfp_t gfp, int order,
 			      struct mem_cgroup *memcg);
 void __memcg_kmem_uncharge_memcg(struct mem_cgroup *memcg,
 				 unsigned int nr_pages);
+int __memcg_kmem_charge_subpage(struct mem_cgroup *memcg, size_t size,
+				gfp_t gfp);
+void __memcg_kmem_uncharge_subpage(struct mem_cgroup *memcg, size_t size);
 
 extern struct static_key_false memcg_kmem_enabled_key;
 extern struct workqueue_struct *memcg_kmem_cache_wq;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1c4c08b45e44..effefcec47b3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2149,6 +2149,10 @@ EXPORT_SYMBOL(unlock_page_memcg);
 struct memcg_stock_pcp {
 	struct mem_cgroup *cached; /* this never be root cgroup */
 	unsigned int nr_pages;
+
+	struct mem_cgroup *subpage_cached;
+	unsigned int nr_bytes;
+
 	struct work_struct work;
 	unsigned long flags;
 #define FLUSHING_CACHED_CHARGE	0
@@ -2189,6 +2193,29 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	return ret;
 }
 
+static bool consume_subpage_stock(struct mem_cgroup *memcg,
+				  unsigned int nr_bytes)
+{
+	struct memcg_stock_pcp *stock;
+	unsigned long flags;
+	bool ret = false;
+
+	if (nr_bytes > (MEMCG_CHARGE_BATCH << PAGE_SHIFT))
+		return ret;
+
+	local_irq_save(flags);
+
+	stock = this_cpu_ptr(&memcg_stock);
+	if (memcg == stock->subpage_cached && stock->nr_bytes >= nr_bytes) {
+		stock->nr_bytes -= nr_bytes;
+		ret = true;
+	}
+
+	local_irq_restore(flags);
+
+	return ret;
+}
+
 /*
  * Returns stocks cached in percpu and reset cached information.
 */
@@ -2206,6 +2233,27 @@ static void drain_stock(struct memcg_stock_pcp *stock)
 	stock->cached = NULL;
 }
 
+static void drain_subpage_stock(struct memcg_stock_pcp *stock)
+{
+	struct mem_cgroup *old = stock->subpage_cached;
+
+	if (stock->nr_bytes) {
+		unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
+		unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
+
+		page_counter_uncharge(&old->memory, nr_pages);
+		if (do_memsw_account())
+			page_counter_uncharge(&old->memsw, nr_pages);
+
+		atomic_add(nr_bytes, &old->nr_stocked_bytes);
+		stock->nr_bytes = 0;
+	}
+	if (stock->subpage_cached) {
+		css_put(&old->css);
+		stock->subpage_cached = NULL;
+	}
+}
+
 static void drain_local_stock(struct work_struct *dummy)
 {
 	struct memcg_stock_pcp *stock;
@@ -2218,8 +2266,11 @@ static void drain_local_stock(struct work_struct *dummy)
 	local_irq_save(flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
-	drain_stock(stock);
-	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
+	if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
+		drain_stock(stock);
+		drain_subpage_stock(stock);
+		clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
+	}
 
 	local_irq_restore(flags);
 }
@@ -2248,6 +2299,29 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	local_irq_restore(flags);
 }
 
+static void refill_subpage_stock(struct mem_cgroup *memcg,
+				 unsigned int nr_bytes)
+{
+	struct memcg_stock_pcp *stock;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	stock = this_cpu_ptr(&memcg_stock);
+	if (stock->subpage_cached != memcg) { /* reset if necessary */
+		drain_subpage_stock(stock);
+		css_get(&memcg->css);
+		stock->subpage_cached = memcg;
+		stock->nr_bytes = atomic_xchg(&memcg->nr_stocked_bytes, 0);
+	}
+	stock->nr_bytes += nr_bytes;
+
+	if (stock->nr_bytes > (MEMCG_CHARGE_BATCH << PAGE_SHIFT))
+		drain_subpage_stock(stock);
+
+	local_irq_restore(flags);
+}
+
 /*
  * Drains all per-CPU charge caches for given root_memcg resp. subtree
  * of the hierarchy under it.
@@ -2276,6 +2350,9 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 		if (memcg && stock->nr_pages &&
 		    mem_cgroup_is_descendant(memcg, root_memcg))
 			flush = true;
+		memcg = stock->subpage_cached;
+		if (memcg && mem_cgroup_is_descendant(memcg, root_memcg))
+			flush = true;
 		rcu_read_unlock();
 
 		if (flush &&
@@ -2500,8 +2577,9 @@ void mem_cgroup_handle_over_high(void)
 }
 
 static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
-		      unsigned int nr_pages)
+		      unsigned int amount, bool subpage)
 {
+	unsigned int nr_pages = subpage ? ((amount >> PAGE_SHIFT) + 1) : amount;
 	unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);
 	int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
 	struct mem_cgroup *mem_over_limit;
@@ -2514,7 +2592,9 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (mem_cgroup_is_root(memcg))
 		return 0;
 retry:
-	if (consume_stock(memcg, nr_pages))
+	if (subpage && consume_subpage_stock(memcg, amount))
+		return 0;
+	else if (!subpage && consume_stock(memcg, nr_pages))
 		return 0;
 
 	if (!do_memsw_account() ||
@@ -2632,14 +2712,22 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	page_counter_charge(&memcg->memory, nr_pages);
 	if (do_memsw_account())
 		page_counter_charge(&memcg->memsw, nr_pages);
-	css_get_many(&memcg->css, nr_pages);
+
+	if (subpage)
+		refill_subpage_stock(memcg, (nr_pages << PAGE_SHIFT) - amount);
+	else
+		css_get_many(&memcg->css, nr_pages);
 
 	return 0;
 
 done_restock:
-	css_get_many(&memcg->css, batch);
-	if (batch > nr_pages)
-		refill_stock(memcg, batch - nr_pages);
+	if (subpage && (batch << PAGE_SHIFT) > amount) {
+		refill_subpage_stock(memcg, (batch << PAGE_SHIFT) - amount);
+	} else if (!subpage) {
+		css_get_many(&memcg->css, batch);
+		if (batch > nr_pages)
+			refill_stock(memcg, batch - nr_pages);
+	}
 
 	/*
 	 * If the hierarchy is above the normal consumption range, schedule
@@ -2942,7 +3030,7 @@ int __memcg_kmem_charge_memcg(struct page *page, gfp_t gfp, int order,
 	struct page_counter *counter;
 	int ret;
 
-	ret = try_charge(memcg, gfp, nr_pages);
+	ret = try_charge(memcg, gfp, nr_pages, false);
 	if (ret)
 		return ret;
 
@@ -3020,6 +3108,18 @@ void __memcg_kmem_uncharge(struct page *page, int order)
 
 	css_put_many(&memcg->css, nr_pages);
 }
+
+int __memcg_kmem_charge_subpage(struct mem_cgroup *memcg, size_t size,
+				gfp_t gfp)
+{
+	return try_charge(memcg, gfp, size, true);
+}
+
+void __memcg_kmem_uncharge_subpage(struct mem_cgroup *memcg, size_t size)
+{
+	refill_subpage_stock(memcg, size);
+}
+
 #endif /* CONFIG_MEMCG_KMEM */
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -5267,7 +5367,8 @@ static int mem_cgroup_do_precharge(unsigned long count)
 	int ret;
 
 	/* Try a single bulk charge without reclaim first, kswapd may wake */
-	ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count);
+	ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count,
+			 false);
 	if (!ret) {
 		mc.precharge += count;
 		return ret;
@@ -5275,7 +5376,7 @@ static int mem_cgroup_do_precharge(unsigned long count)
 
 	/* Try charges one by one with reclaim, but do not retry */
 	while (count--) {
-		ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
+		ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1, false);
 		if (ret)
 			return ret;
 		mc.precharge++;
@@ -6487,7 +6588,7 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 		if (!memcg)
 			memcg = get_mem_cgroup_from_mm(mm);
 
-	ret = try_charge(memcg, gfp_mask, nr_pages);
+	ret = try_charge(memcg, gfp_mask, nr_pages, false);
 
 	css_put(&memcg->css);
 out:
@@ -6866,10 +6967,10 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 
 	mod_memcg_state(memcg, MEMCG_SOCK, nr_pages);
 
-	if (try_charge(memcg, gfp_mask, nr_pages) == 0)
+	if (try_charge(memcg, gfp_mask, nr_pages, false) == 0)
 		return true;
 
-	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
+	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages, false);
 	return false;
 }
-- 
2.21.0
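To make the stock arithmetic concrete, a worked example (assuming
PAGE_SIZE == 4096 and MEMCG_CHARGE_BATCH == 32; the numbers are
illustrative only): the first subpage charge of a 700-byte object
misses the empty percpu stock, so try_charge() runs with
nr_pages = (700 >> PAGE_SHIFT) + 1 = 1 and batch = 32. In the common
fast path it charges 32 pages to the page counters and refills the
stock with (32 << PAGE_SHIFT) - 700 = 130372 bytes. A subsequent
2000-byte charge on the same CPU is then served entirely from the stock
by consume_subpage_stock(), leaving 128372 bytes without touching the
page counters. When the stock is eventually drained, 31 full pages are
uncharged and the 1396-byte remainder is parked in
memcg->nr_stocked_bytes.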