Received: by 2002:ab2:788f:0:b0:1ee:8f2e:70ae with SMTP id b15csp77210lqi; Wed, 6 Mar 2024 10:27:19 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCWvQi3tx1FSK2xeZPCqTWzJZ0FFK1xgxkj0iK6bfxK7ak2V7Ed+PypoYGTYwJeBeCxkhpVhwhNHmCdn8XCsQ284+HERFqqSfQHRbnUXbQ== X-Google-Smtp-Source: AGHT+IGF69aRjy/0gVmILmwIB7sJfLfXasFJsju9Gekkccu/I7QsxzTXFryO1CTUKH4eO7nZasYl X-Received: by 2002:a17:90a:c28d:b0:29b:260f:676 with SMTP id f13-20020a17090ac28d00b0029b260f0676mr11328927pjt.46.1709749639344; Wed, 06 Mar 2024 10:27:19 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709749639; cv=pass; d=google.com; s=arc-20160816; b=ZCxxyFehZve2jSMXfXZIzcxxV+mUCGmNgagYpdNODWAx+sSeVoL/dZIifEvm3crnlx +tciNRCXGFvRAErs7qbSadzW1jBYW8elpFUEfQj7kr7adjA2Wanq3rZjVU2j9J3NBB7P UZ+fqsyqi9p5csTrfcdQOserZc60W9LY/vMiC1t87WzdrkZ90SwR2QYPiEZMeM3UqNkM W6hGTO1HzK0IDwwtN11ifo0fLqpxcvv2kbIsliMB142aN/8HXEDiYarhwn3Ik+iXxqfX hpRhxOWa1Txq6e2vhoUYjpWmqYu1u8MN1E5Hr5jJbSrYqnkxaH4+RQ2QJucnANP8mjAe B1FQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:from:subject:message-id:references:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:in-reply-to:date :dkim-signature; bh=6JMcGt6EYvTroht8h7kHXMFTQxZQYrHbNcyZWU/g6ow=; fh=SJYRhB2cYXFI3LcWhEriDs1IEOlTkFHKvUfnhSGA4AA=; b=BeETAhK7XUz8ujDDx0XlJPG8QJQCPbLjwrAnNZ6iT6Or2psoTlC2XIe21O2d+gW6iA q6BGGEJU4dR3I+nH2NdIzIxuJlwgKmoyZ9auY+6uNYM+yW8y3O9AlgKPVutdPL3deUUL tKLXVA3uB2ObpLDpHn/W4R0KG891Ko4fVRxlkOJu9V8bjOxd3sgdAU3+BaKtlcUt54B6 /zYlZOtR+FDj3X+Le8DoqdMYo5/+czI0JV/tanI8TCW+/r/OnXqRubK/c2tf/75lz9nj 704EHDV/4liU+kTBsWFysZkxi2oDX4t8H59FlC7NKvYDiCcA0ozn0hcyvk1lqZ+XbDIB cfcg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=HFlwXGHq; arc=pass (i=1 spf=pass spfdomain=flex--surenb.bounces.google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-94397-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-94397-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id q3-20020a17090a750300b0029af167a61csi41330pjk.187.2024.03.06.10.27.19 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 06 Mar 2024 10:27:19 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-94397-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=HFlwXGHq; arc=pass (i=1 spf=pass spfdomain=flex--surenb.bounces.google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-94397-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-94397-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id B30A8280294 for ; Wed, 6 Mar 2024 18:27:15 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 6C62C143C71; Wed, 6 Mar 2024 18:25:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="HFlwXGHq" Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A8E3C13F447 for ; Wed, 6 Mar 2024 18:24:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709749501; cv=none; b=nDh8PcY1VXXMKjOc4xZxMLpcySBWNx5JUXCOjbWjzIq9UBii7UfEdw0j0+QLWhMT/kWsJ8VsEktuF/CKhwvFxUXYN0o6FRWDZ88l4CvEeftR+wlw7A99BsnjNI08S4pquruqwDY5pK1M5pm/vkXcpQnmmaw2C9zK+WTWdZChTqU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709749501; c=relaxed/simple; bh=6Qs4+AjxStvPZ59uQrS75z0ksUS31776UxioDnITUPY=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=UBUz2iVp5ORmsCqFKJh+VWeBO7jGG9Naq/h1+Rl3LmcC4fDfWc/ulHNGVUJCp01rrSmTfQzpU34/UQxi1cyUXGEN/oJC9ONoFTH5lEScmu7EbTYvoKU/DprcxhRELMFAk5jrbXssZbU6oB0P9Twt/phA5LRfpoCO8SG1UyJqIpU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=HFlwXGHq; arc=none smtp.client-ip=209.85.128.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-609a1063919so683987b3.0 for ; Wed, 06 Mar 2024 10:24:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1709749498; x=1710354298; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=6JMcGt6EYvTroht8h7kHXMFTQxZQYrHbNcyZWU/g6ow=; b=HFlwXGHq0SBN0tOU1obL0TnQglIg7wanebg13fKt5WaV8txQo1s7wzl8Vv0zzasKJP QvI7Awq4WJnhU+hMhirZF3yPG+VNQg8hgiuaI3gAyaBMzLCVqCmGHMlx8XUz40eg7BfQ eJRAeaP/rM5/I/bV+FfnwIWrgWYcmD/w6c0pzFfjdVy9LcwiUA3X1LSi8Uq8A8c/o6sj UScS/QTqyhORxrjlYpC+SS+tr+K0/yIC+dmzh7ocJjwiI+zIBwGK05N9zEHM9QO6wchO Rf0N7sw1+QTpQDAR+Uf8/tfVfPeVav6H/PbWGiSgWXPWlhgVmU9qROtuU7OVPVlTPSzO Wkug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709749498; x=1710354298; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=6JMcGt6EYvTroht8h7kHXMFTQxZQYrHbNcyZWU/g6ow=; b=la4YNfReKOQGsgzUSVU/wMyYWhBYWwBmZFk/bYfXp1jGq1pBcapjd2UfcKkyQKUnoC NOu6RRqKM/3LNKWDr03WG6kFpNrkn4RqkOlFEdw9CEVmbTU05j8Na0FURarYoiBQ8uI8 lYSDeaJlTNh9PeOaZ3QLEJY4PPaM0UUZP1VFIhgRSvfCAowl2cuTofjMaUpTfuIDk2H2 PFgjG2q1elGKcI7GF49cx27M8IDybKI9FjTN4caKQb5wFXJ4wu/HSnxSwlY8KxCNbT4e zFK7lAeQowzqLnUQB4QMK2UURPI5d5y/vstlWPcFeE2ewq23bXpPEG4wKwd1a7KvYq4y 5kNg== X-Forwarded-Encrypted: i=1; AJvYcCW0Rj6rlMPYhHY7/cUikSr9Tf6h7rQxE2rsFOGFKfqQo90THdyxngVVG00+YHfyIgYuYcjIaj2IxrvR4WGbojupWMaGcWq3SWed9gVU X-Gm-Message-State: AOJu0Yxrs9zMzeVOwTAgoUcX3s+fh+hj49LgUkri08IN5RHhVLXWJXsg hGpLU9sGYcL5YDdImfb+LDKbcDukiWvzPdhy83dU6o2IYqH3TMbSCjGaQGflLOE4IArNKB0Nri0 VEA== X-Received: from surenb-desktop.mtv.corp.google.com ([2620:15c:211:201:85f0:e3db:db05:85e2]) (user=surenb job=sendgmr) by 2002:a05:6902:722:b0:dcd:875:4c40 with SMTP id l2-20020a056902072200b00dcd08754c40mr4233991ybt.10.1709749497766; Wed, 06 Mar 2024 10:24:57 -0800 (PST) Date: Wed, 6 Mar 2024 10:24:04 -0800 In-Reply-To: <20240306182440.2003814-1-surenb@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240306182440.2003814-1-surenb@google.com> X-Mailer: git-send-email 2.44.0.278.ge034bb2e1d-goog Message-ID: <20240306182440.2003814-7-surenb@google.com> Subject: [PATCH v5 06/37] mm: introduce slabobj_ext to support slab object extensions From: Suren Baghdasaryan To: akpm@linux-foundation.org Cc: kent.overstreet@linux.dev, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, roman.gushchin@linux.dev, mgorman@suse.de, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, penguin-kernel@i-love.sakura.ne.jp, corbet@lwn.net, void@manifault.com, peterz@infradead.org, juri.lelli@redhat.com, catalin.marinas@arm.com, will@kernel.org, arnd@arndb.de, tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com, x86@kernel.org, peterx@redhat.com, david@redhat.com, axboe@kernel.dk, mcgrof@kernel.org, masahiroy@kernel.org, nathan@kernel.org, dennis@kernel.org, jhubbard@nvidia.com, tj@kernel.org, muchun.song@linux.dev, rppt@kernel.org, paulmck@kernel.org, pasha.tatashin@soleen.com, yosryahmed@google.com, yuzhao@google.com, dhowells@redhat.com, hughd@google.com, andreyknvl@gmail.com, keescook@chromium.org, ndesaulniers@google.com, vvvvvv@google.com, gregkh@linuxfoundation.org, ebiggers@google.com, ytcoode@gmail.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, bristot@redhat.com, vschneid@redhat.com, cl@linux.com, penberg@kernel.org, iamjoonsoo.kim@lge.com, 42.hyeyoo@gmail.com, glider@google.com, elver@google.com, dvyukov@google.com, shakeelb@google.com, songmuchun@bytedance.com, jbaron@akamai.com, aliceryhl@google.com, rientjes@google.com, minchan@google.com, kaleshsingh@google.com, surenb@google.com, kernel-team@android.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, linux-arch@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-modules@vger.kernel.org, kasan-dev@googlegroups.com, cgroups@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Currently slab pages can store only vectors of obj_cgroup pointers in page->memcg_data. Introduce slabobj_ext structure to allow more data to be stored for each slab object. Wrap obj_cgroup into slabobj_ext to support current functionality while allowing to extend slabobj_ext in the future. Signed-off-by: Suren Baghdasaryan Reviewed-by: Pasha Tatashin --- include/linux/memcontrol.h | 20 ++++--- include/linux/mm_types.h | 4 +- init/Kconfig | 4 ++ mm/kfence/core.c | 14 ++--- mm/kfence/kfence.h | 4 +- mm/memcontrol.c | 56 +++----------------- mm/page_owner.c | 2 +- mm/slab.h | 52 +++++++++--------- mm/slub.c | 106 ++++++++++++++++++++++++++++--------- 9 files changed, 145 insertions(+), 117 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 394fd0a887ae..9a731523000d 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -349,8 +349,8 @@ struct mem_cgroup { extern struct mem_cgroup *root_mem_cgroup; enum page_memcg_data_flags { - /* page->memcg_data is a pointer to an objcgs vector */ - MEMCG_DATA_OBJCGS = (1UL << 0), + /* page->memcg_data is a pointer to an slabobj_ext vector */ + MEMCG_DATA_OBJEXTS = (1UL << 0), /* page has been accounted as a non-slab kernel page */ MEMCG_DATA_KMEM = (1UL << 1), /* the next bit after the last actual flag */ @@ -388,7 +388,7 @@ static inline struct mem_cgroup *__folio_memcg(struct folio *folio) unsigned long memcg_data = folio->memcg_data; VM_BUG_ON_FOLIO(folio_test_slab(folio), folio); - VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJCGS, folio); + VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio); VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio); return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); @@ -409,7 +409,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio) unsigned long memcg_data = folio->memcg_data; VM_BUG_ON_FOLIO(folio_test_slab(folio), folio); - VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJCGS, folio); + VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio); VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio); return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); @@ -506,7 +506,7 @@ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio) */ unsigned long memcg_data = READ_ONCE(folio->memcg_data); - if (memcg_data & MEMCG_DATA_OBJCGS) + if (memcg_data & MEMCG_DATA_OBJEXTS) return NULL; if (memcg_data & MEMCG_DATA_KMEM) { @@ -552,7 +552,7 @@ static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *ob static inline bool folio_memcg_kmem(struct folio *folio) { VM_BUG_ON_PGFLAGS(PageTail(&folio->page), &folio->page); - VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJCGS, folio); + VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio); return folio->memcg_data & MEMCG_DATA_KMEM; } @@ -1632,6 +1632,14 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, } #endif /* CONFIG_MEMCG */ +/* + * Extended information for slab objects stored as an array in page->memcg_data + * if MEMCG_DATA_OBJEXTS is set. + */ +struct slabobj_ext { + struct obj_cgroup *objcg; +} __aligned(8); + static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx) { __mod_lruvec_kmem_state(p, idx, 1); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5240bd7bca33..4ae4684d1add 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -169,7 +169,7 @@ struct page { /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */ atomic_t _refcount; -#ifdef CONFIG_MEMCG +#ifdef CONFIG_SLAB_OBJ_EXT unsigned long memcg_data; #endif @@ -331,7 +331,7 @@ struct folio { }; atomic_t _mapcount; atomic_t _refcount; -#ifdef CONFIG_MEMCG +#ifdef CONFIG_SLAB_OBJ_EXT unsigned long memcg_data; #endif #if defined(WANT_PAGE_VIRTUAL) diff --git a/init/Kconfig b/init/Kconfig index bee58f7468c3..d160ab591308 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -929,6 +929,9 @@ config NUMA_BALANCING_DEFAULT_ENABLED If set, automatic NUMA balancing will be enabled if running on a NUMA machine. +config SLAB_OBJ_EXT + bool + menuconfig CGROUPS bool "Control Group support" select KERNFS @@ -962,6 +965,7 @@ config MEMCG bool "Memory controller" select PAGE_COUNTER select EVENTFD + select SLAB_OBJ_EXT help Provides control over the memory footprint of tasks in a cgroup. diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 8350f5c06f2e..964b8482275b 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -595,9 +595,9 @@ static unsigned long kfence_init_pool(void) continue; __folio_set_slab(slab_folio(slab)); -#ifdef CONFIG_MEMCG - slab->memcg_data = (unsigned long)&kfence_metadata_init[i / 2 - 1].objcg | - MEMCG_DATA_OBJCGS; +#ifdef CONFIG_MEMCG_KMEM + slab->obj_exts = (unsigned long)&kfence_metadata_init[i / 2 - 1].obj_exts | + MEMCG_DATA_OBJEXTS; #endif } @@ -645,8 +645,8 @@ static unsigned long kfence_init_pool(void) if (!i || (i % 2)) continue; -#ifdef CONFIG_MEMCG - slab->memcg_data = 0; +#ifdef CONFIG_MEMCG_KMEM + slab->obj_exts = 0; #endif __folio_clear_slab(slab_folio(slab)); } @@ -1139,8 +1139,8 @@ void __kfence_free(void *addr) { struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr); -#ifdef CONFIG_MEMCG - KFENCE_WARN_ON(meta->objcg); +#ifdef CONFIG_MEMCG_KMEM + KFENCE_WARN_ON(meta->obj_exts.objcg); #endif /* * If the objects of the cache are SLAB_TYPESAFE_BY_RCU, defer freeing diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h index f46fbb03062b..084f5f36e8e7 100644 --- a/mm/kfence/kfence.h +++ b/mm/kfence/kfence.h @@ -97,8 +97,8 @@ struct kfence_metadata { struct kfence_track free_track; /* For updating alloc_covered on frees. */ u32 alloc_stack_hash; -#ifdef CONFIG_MEMCG - struct obj_cgroup *objcg; +#ifdef CONFIG_MEMCG_KMEM + struct slabobj_ext obj_exts; #endif }; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 138bcfa18234..f99c63d5e9f2 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2980,13 +2980,6 @@ void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg) } #ifdef CONFIG_MEMCG_KMEM -/* - * The allocated objcg pointers array is not accounted directly. - * Moreover, it should not come from DMA buffer and is not readily - * reclaimable. So those GFP bits should be masked off. - */ -#define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | \ - __GFP_ACCOUNT | __GFP_NOFAIL) /* * mod_objcg_mlstate() may be called with irq enabled, so @@ -3006,62 +2999,27 @@ static inline void mod_objcg_mlstate(struct obj_cgroup *objcg, rcu_read_unlock(); } -int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s, - gfp_t gfp, bool new_slab) -{ - unsigned int objects = objs_per_slab(s, slab); - unsigned long memcg_data; - void *vec; - - gfp &= ~OBJCGS_CLEAR_MASK; - vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp, - slab_nid(slab)); - if (!vec) - return -ENOMEM; - - memcg_data = (unsigned long) vec | MEMCG_DATA_OBJCGS; - if (new_slab) { - /* - * If the slab is brand new and nobody can yet access its - * memcg_data, no synchronization is required and memcg_data can - * be simply assigned. - */ - slab->memcg_data = memcg_data; - } else if (cmpxchg(&slab->memcg_data, 0, memcg_data)) { - /* - * If the slab is already in use, somebody can allocate and - * assign obj_cgroups in parallel. In this case the existing - * objcg vector should be reused. - */ - kfree(vec); - return 0; - } - - kmemleak_not_leak(vec); - return 0; -} - static __always_inline struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p) { /* * Slab objects are accounted individually, not per-page. * Memcg membership data for each individual object is saved in - * slab->memcg_data. + * slab->obj_exts. */ if (folio_test_slab(folio)) { - struct obj_cgroup **objcgs; + struct slabobj_ext *obj_exts; struct slab *slab; unsigned int off; slab = folio_slab(folio); - objcgs = slab_objcgs(slab); - if (!objcgs) + obj_exts = slab_obj_exts(slab); + if (!obj_exts) return NULL; off = obj_to_index(slab->slab_cache, slab, p); - if (objcgs[off]) - return obj_cgroup_memcg(objcgs[off]); + if (obj_exts[off].objcg) + return obj_cgroup_memcg(obj_exts[off].objcg); return NULL; } @@ -3069,7 +3027,7 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p) /* * folio_memcg_check() is used here, because in theory we can encounter * a folio where the slab flag has been cleared already, but - * slab->memcg_data has not been freed yet + * slab->obj_exts has not been freed yet * folio_memcg_check() will guarantee that a proper memory * cgroup pointer or NULL will be returned. */ diff --git a/mm/page_owner.c b/mm/page_owner.c index 033e349f6479..d21d00a04e98 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -469,7 +469,7 @@ static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret, if (!memcg_data) goto out_unlock; - if (memcg_data & MEMCG_DATA_OBJCGS) + if (memcg_data & MEMCG_DATA_OBJEXTS) ret += scnprintf(kbuf + ret, count - ret, "Slab cache page\n"); diff --git a/mm/slab.h b/mm/slab.h index 54deeb0428c6..0e61a5834c5f 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -87,8 +87,8 @@ struct slab { unsigned int __unused; atomic_t __page_refcount; -#ifdef CONFIG_MEMCG - unsigned long memcg_data; +#ifdef CONFIG_SLAB_OBJ_EXT + unsigned long obj_exts; #endif }; @@ -97,8 +97,8 @@ struct slab { SLAB_MATCH(flags, __page_flags); SLAB_MATCH(compound_head, slab_cache); /* Ensure bit 0 is clear */ SLAB_MATCH(_refcount, __page_refcount); -#ifdef CONFIG_MEMCG -SLAB_MATCH(memcg_data, memcg_data); +#ifdef CONFIG_SLAB_OBJ_EXT +SLAB_MATCH(memcg_data, obj_exts); #endif #undef SLAB_MATCH static_assert(sizeof(struct slab) <= sizeof(struct page)); @@ -541,42 +541,44 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla return false; } -#ifdef CONFIG_MEMCG_KMEM +#ifdef CONFIG_SLAB_OBJ_EXT + /* - * slab_objcgs - get the object cgroups vector associated with a slab + * slab_obj_exts - get the pointer to the slab object extension vector + * associated with a slab. * @slab: a pointer to the slab struct * - * Returns a pointer to the object cgroups vector associated with the slab, + * Returns a pointer to the object extension vector associated with the slab, * or NULL if no such vector has been associated yet. */ -static inline struct obj_cgroup **slab_objcgs(struct slab *slab) +static inline struct slabobj_ext *slab_obj_exts(struct slab *slab) { - unsigned long memcg_data = READ_ONCE(slab->memcg_data); + unsigned long obj_exts = READ_ONCE(slab->obj_exts); - VM_BUG_ON_PAGE(memcg_data && !(memcg_data & MEMCG_DATA_OBJCGS), +#ifdef CONFIG_MEMCG + VM_BUG_ON_PAGE(obj_exts && !(obj_exts & MEMCG_DATA_OBJEXTS), slab_page(slab)); - VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, slab_page(slab)); + VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab)); - return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); + return (struct slabobj_ext *)(obj_exts & ~MEMCG_DATA_FLAGS_MASK); +#else + return (struct slabobj_ext *)obj_exts; +#endif } -int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s, - gfp_t gfp, bool new_slab); -void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat, - enum node_stat_item idx, int nr); -#else /* CONFIG_MEMCG_KMEM */ -static inline struct obj_cgroup **slab_objcgs(struct slab *slab) +#else /* CONFIG_SLAB_OBJ_EXT */ + +static inline struct slabobj_ext *slab_obj_exts(struct slab *slab) { return NULL; } -static inline int memcg_alloc_slab_cgroups(struct slab *slab, - struct kmem_cache *s, gfp_t gfp, - bool new_slab) -{ - return 0; -} -#endif /* CONFIG_MEMCG_KMEM */ +#endif /* CONFIG_SLAB_OBJ_EXT */ + +#ifdef CONFIG_MEMCG_KMEM +void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat, + enum node_stat_item idx, int nr); +#endif size_t __ksize(const void *objp); diff --git a/mm/slub.c b/mm/slub.c index 0f3369f6188b..6ab9f8f38ac5 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1881,13 +1881,78 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s) NR_SLAB_RECLAIMABLE_B : NR_SLAB_UNRECLAIMABLE_B; } -#ifdef CONFIG_MEMCG_KMEM -static inline void memcg_free_slab_cgroups(struct slab *slab) +#ifdef CONFIG_SLAB_OBJ_EXT + +/* + * The allocated objcg pointers array is not accounted directly. + * Moreover, it should not come from DMA buffer and is not readily + * reclaimable. So those GFP bits should be masked off. + */ +#define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | \ + __GFP_ACCOUNT | __GFP_NOFAIL) + +static int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s, + gfp_t gfp, bool new_slab) { - kfree(slab_objcgs(slab)); - slab->memcg_data = 0; + unsigned int objects = objs_per_slab(s, slab); + unsigned long obj_exts; + void *vec; + + gfp &= ~OBJCGS_CLEAR_MASK; + vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp, + slab_nid(slab)); + if (!vec) + return -ENOMEM; + + obj_exts = (unsigned long)vec; +#ifdef CONFIG_MEMCG + obj_exts |= MEMCG_DATA_OBJEXTS; +#endif + if (new_slab) { + /* + * If the slab is brand new and nobody can yet access its + * obj_exts, no synchronization is required and obj_exts can + * be simply assigned. + */ + slab->obj_exts = obj_exts; + } else if (cmpxchg(&slab->obj_exts, 0, obj_exts)) { + /* + * If the slab is already in use, somebody can allocate and + * assign slabobj_exts in parallel. In this case the existing + * objcg vector should be reused. + */ + kfree(vec); + return 0; + } + + kmemleak_not_leak(vec); + return 0; } +static inline void free_slab_obj_exts(struct slab *slab) +{ + struct slabobj_ext *obj_exts; + + obj_exts = slab_obj_exts(slab); + if (!obj_exts) + return; + + kfree(obj_exts); + slab->obj_exts = 0; +} +#else /* CONFIG_SLAB_OBJ_EXT */ +static int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s, + gfp_t gfp, bool new_slab) +{ + return 0; +} + +static inline void free_slab_obj_exts(struct slab *slab) +{ +} +#endif /* CONFIG_SLAB_OBJ_EXT */ + +#ifdef CONFIG_MEMCG_KMEM static inline size_t obj_full_size(struct kmem_cache *s) { /* @@ -1966,15 +2031,15 @@ static void __memcg_slab_post_alloc_hook(struct kmem_cache *s, if (likely(p[i])) { slab = virt_to_slab(p[i]); - if (!slab_objcgs(slab) && - memcg_alloc_slab_cgroups(slab, s, flags, false)) { + if (!slab_obj_exts(slab) && + alloc_slab_obj_exts(slab, s, flags, false)) { obj_cgroup_uncharge(objcg, obj_full_size(s)); continue; } off = obj_to_index(s, slab, p[i]); obj_cgroup_get(objcg); - slab_objcgs(slab)[off] = objcg; + slab_obj_exts(slab)[off].objcg = objcg; mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s), obj_full_size(s)); } else { @@ -1995,18 +2060,18 @@ void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg, static void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p, int objects, - struct obj_cgroup **objcgs) + struct slabobj_ext *obj_exts) { for (int i = 0; i < objects; i++) { struct obj_cgroup *objcg; unsigned int off; off = obj_to_index(s, slab, p[i]); - objcg = objcgs[off]; + objcg = obj_exts[off].objcg; if (!objcg) continue; - objcgs[off] = NULL; + obj_exts[off].objcg = NULL; obj_cgroup_uncharge(objcg, obj_full_size(s)); mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s), -obj_full_size(s)); @@ -2018,16 +2083,16 @@ static __fastpath_inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p, int objects) { - struct obj_cgroup **objcgs; + struct slabobj_ext *obj_exts; if (!memcg_kmem_online()) return; - objcgs = slab_objcgs(slab); - if (likely(!objcgs)) + obj_exts = slab_obj_exts(slab); + if (likely(!obj_exts)) return; - __memcg_slab_free_hook(s, slab, p, objects, objcgs); + __memcg_slab_free_hook(s, slab, p, objects, obj_exts); } static inline @@ -2038,15 +2103,6 @@ void memcg_slab_alloc_error_hook(struct kmem_cache *s, int objects, obj_cgroup_uncharge(objcg, objects * obj_full_size(s)); } #else /* CONFIG_MEMCG_KMEM */ -static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr) -{ - return NULL; -} - -static inline void memcg_free_slab_cgroups(struct slab *slab) -{ -} - static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s, struct list_lru *lru, struct obj_cgroup **objcgp, @@ -2314,7 +2370,7 @@ static __always_inline void account_slab(struct slab *slab, int order, struct kmem_cache *s, gfp_t gfp) { if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT)) - memcg_alloc_slab_cgroups(slab, s, gfp, true); + alloc_slab_obj_exts(slab, s, gfp, true); mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s), PAGE_SIZE << order); @@ -2324,7 +2380,7 @@ static __always_inline void unaccount_slab(struct slab *slab, int order, struct kmem_cache *s) { if (memcg_kmem_online()) - memcg_free_slab_cgroups(slab); + free_slab_obj_exts(slab); mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s), -(PAGE_SIZE << order)); -- 2.44.0.278.ge034bb2e1d-goog