Received: by 2002:a6b:fb09:0:0:0:0:0 with SMTP id h9csp3780366iog; Tue, 21 Jun 2022 06:00:26 -0700 (PDT) X-Google-Smtp-Source: AGRyM1snWrj5GyKJpR0q/SZ9T3RURvNfqVB8mEyHKwaf4GZ9ogEF9rTg8fMXM4B00bHgb8xBGTp0 X-Received: by 2002:a17:902:6a87:b0:16a:13d6:e7f1 with SMTP id n7-20020a1709026a8700b0016a13d6e7f1mr16204486plk.75.1655816426542; Tue, 21 Jun 2022 06:00:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1655816426; cv=none; d=google.com; s=arc-20160816; b=T3rP33Zxi4fDPDkED6mmHiNiF0PNsb2eNtxc8B7595EshPNdt3lKLMu3I+7z6GvdXk Ayu5WYEfNs7TeQr4itfbpIER0bPXIZckFBVbO0ni75DjcRcHxPsMkVJx+BqpuQmbBvR+ i2rrGQdAr6f1tcZJgBu0SIL3SiQ5rYhfZvhuiN2rUe/o7EhgmGNfo/pCgunEplFwKXWJ GJrcJoRwURsc8N5B1Z2K7lMpdWXkv0xEcWJr1qzYFs3gTy//tI7zmcaCSR2VSTsakGpF L0djjvALwrwH17GCWCZtq+LfX7LmvVlwn3w1t6kVuLSGPbmbLQJ9WkU3NuoczAK7xJeA 9C+Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=FY/yyaqIA+0UtXl+c+2/CMp1DjrnJ12MKuKhyE1HfHU=; b=Ko9PJMwJlo1Qcrr8ttsqyXhmQ+wXdwN1lz4mftPue9Mh6M7imORAH499PC/hrGtN62 Mku4CyX1uTX2ILP41IrUm2+EMSWSN1EdMOskn7o9QKPmV7nV6LJkMJNmn8aDTUFj3s18 pwOA0OwSsv2A7Mlp9HHdKW3uyZHd5tfVBxvc/naSkvph/I5JkQ2FY6VzSouXdOXrXcnH w+GrdLLCERecDTGI6LR4cuuWBBqFjpRGQMb0Vj8W34Uxk0SYWCv61Mm5XhL/B5BnrWGb NffGIqRz6wSxVCiSARhrWvRwiMls6XaxIXg/8PUdkfXUvVO03SFNnxlwPvfxLG6ePZqN RXrg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b="Zea++/WG"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s11-20020a17090302cb00b001663dfef4aasi18978785plk.141.2022.06.21.05.59.47; Tue, 21 Jun 2022 06:00:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b="Zea++/WG"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351066AbiFUM7C (ORCPT + 99 others); Tue, 21 Jun 2022 08:59:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56880 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351100AbiFUM62 (ORCPT ); Tue, 21 Jun 2022 08:58:28 -0400 Received: from mail-pj1-x1034.google.com (mail-pj1-x1034.google.com [IPv6:2607:f8b0:4864:20::1034]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DD42ADEC3 for ; Tue, 21 Jun 2022 05:58:17 -0700 (PDT) Received: by mail-pj1-x1034.google.com with SMTP id y13-20020a17090a154d00b001eaaa3b9b8dso13297773pja.2 for ; Tue, 21 Jun 2022 05:58:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=FY/yyaqIA+0UtXl+c+2/CMp1DjrnJ12MKuKhyE1HfHU=; b=Zea++/WGVZTO0KrQJEfldWpAcOWAaI48Jtbz0zDDjspXNq0Je6rF/v2EPq9xOK1S1J 0oPhK7zmf1lsxmyg+s4mVdjYZTk9eTqeXka+6G5SUo0W5JEV9GtmUfgAtsLcNOu9EFKG ddgnYJxu4rhdMX9NoV28KjnkyEgOZblD6QlVqounYQqDKyWB2zPn2wDoGDpD/YDPyLm8 DfPqoOodi0yktIftubkOblm5fzrj9TgFJLmg7fCEOYGluDlrclpzFOhBPnam04gtAZk9 Nd3Hi7J1YIALTLQhbEKF3eF8bwhRi96aOqxpeD/Vshcr+0kJv7OARDDPY8i3fcxkudVs GPzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=FY/yyaqIA+0UtXl+c+2/CMp1DjrnJ12MKuKhyE1HfHU=; b=fQ6W5a5Q7ClPSBD8mYrfTaZ3eNf8HK6mApM3/RTFSKlWweog8VLRfs1snaWTvRQ4yJ 4sQYIcyKmipQ/vJDlOfvM7A2fSvtUSocZ1sR/9yGg1gDsPzW8Jlo8JDNCXwul9QIP8UP +EEYEoQ50675Gr94gjjsdRa1Tbp3XyaySO0bjKgYv8O4C5bY8dL5Fty89QqbAmwgWrAX J97x2G0Wnzkq2rZ1Xk6ufq1+0nhuKnF07GkKcwkbnKLG3S69Ev0Xfd0YSyptOVNSUFos 4QiCUPQzEcFSt4H79keOjQy3hE1vv3FqSk1E5ueCg8o0Z3BTbv+RbZ6wtI4CZq9cR7Uw Po/g== X-Gm-Message-State: AJIora834b6IKdgQ8XuK+kTp/zL2diWqCxHwjCXUTuNre9UUDcF/vWtx BSHuZvq7VXIs7PYX9kA7r/9vgA== X-Received: by 2002:a17:90a:8b05:b0:1ea:f5e7:daf2 with SMTP id y5-20020a17090a8b0500b001eaf5e7daf2mr29921210pjn.160.1655816296535; Tue, 21 Jun 2022 05:58:16 -0700 (PDT) Received: from FVFYT0MHHV2J.bytedance.net ([139.177.225.255]) by smtp.gmail.com with ESMTPSA id e3-20020a170903240300b0015ea3a491a1sm10643134plo.191.2022.06.21.05.58.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 21 Jun 2022 05:58:16 -0700 (PDT) From: Muchun Song To: akpm@linux-foundation.org, hannes@cmpxchg.org, longman@redhat.com, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com Cc: cgroups@vger.kernel.org, duanxiongchun@bytedance.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Muchun Song Subject: [PATCH v6 07/11] mm: memcontrol: make all the callers of {folio,page}_memcg() safe Date: Tue, 21 Jun 2022 20:56:54 +0800 Message-Id: <20220621125658.64935-8-songmuchun@bytedance.com> X-Mailer: git-send-email 2.32.1 (Apple Git-133) In-Reply-To: <20220621125658.64935-1-songmuchun@bytedance.com> References: <20220621125658.64935-1-songmuchun@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When we use objcg APIs to charge the LRU pages, the page will not hold a reference to the memcg associated with the page. So the caller of the {folio,page}_memcg() should hold an rcu read lock or obtain a reference to the memcg associated with the page to protect memcg from being released. So introduce get_mem_cgroup_from_{page,folio}() to obtain a reference to the memory cgroup associated with the page. In this patch, make all the callers hold an rcu read lock or obtain a reference to the memcg to protect memcg from being released when the LRU pages reparented. We do not need to adjust the callers of {folio,page}_memcg() during the whole process of mem_cgroup_move_task(). Because the cgroup migration and memory cgroup offlining are serialized by @cgroup_mutex. In this routine, the LRU pages cannot be reparented to its parent memory cgroup. So {folio,page}_memcg() is stable and cannot be released. This is a preparation for reparenting the LRU pages. Signed-off-by: Muchun Song Acked-by: Roman Gushchin --- fs/buffer.c | 4 +-- fs/fs-writeback.c | 23 +++++++------- include/linux/memcontrol.h | 66 +++++++++++++++++++++++++++++++++----- include/trace/events/writeback.h | 5 +++ mm/memcontrol.c | 68 +++++++++++++++++++++++++++++----------- mm/migrate.c | 4 +++ mm/page_io.c | 5 +-- 7 files changed, 135 insertions(+), 40 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 898c7f301b1b..04ec53f327e4 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -819,8 +819,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, if (retry) gfp |= __GFP_NOFAIL; - /* The page lock pins the memcg */ - memcg = page_memcg(page); + memcg = get_mem_cgroup_from_page(page); old_memcg = set_active_memcg(memcg); head = NULL; @@ -840,6 +839,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, set_bh_page(bh, page, offset); } out: + mem_cgroup_put(memcg); set_active_memcg(old_memcg); return head; /* diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 05221366a16d..1cbac56c810b 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -244,15 +244,13 @@ void __inode_attach_wb(struct inode *inode, struct page *page) if (inode_cgwb_enabled(inode)) { struct cgroup_subsys_state *memcg_css; - if (page) { - memcg_css = mem_cgroup_css_from_page(page); - wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC); - } else { - /* must pin memcg_css, see wb_get_create() */ + /* must pin memcg_css, see wb_get_create() */ + if (page) + memcg_css = get_mem_cgroup_css_from_page(page); + else memcg_css = task_get_css(current, memory_cgrp_id); - wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC); - css_put(memcg_css); - } + wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC); + css_put(memcg_css); } if (!wb) @@ -869,16 +867,16 @@ void wbc_account_cgroup_owner(struct writeback_control *wbc, struct page *page, if (!wbc->wb || wbc->no_cgroup_owner) return; - css = mem_cgroup_css_from_page(page); + css = get_mem_cgroup_css_from_page(page); /* dead cgroups shouldn't contribute to inode ownership arbitration */ if (!(css->flags & CSS_ONLINE)) - return; + goto out; id = css->id; if (id == wbc->wb_id) { wbc->wb_bytes += bytes; - return; + goto out; } if (id == wbc->wb_lcand_id) @@ -891,6 +889,9 @@ void wbc_account_cgroup_owner(struct writeback_control *wbc, struct page *page, wbc->wb_tcand_bytes += bytes; else wbc->wb_tcand_bytes -= min(bytes, wbc->wb_tcand_bytes); + +out: + css_put(css); } EXPORT_SYMBOL_GPL(wbc_account_cgroup_owner); diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 026b62b206b1..a8bd4bb39502 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -379,7 +379,7 @@ static inline bool folio_memcg_kmem(struct folio *folio); * a valid memcg, but can be atomically swapped to the parent memcg. * * The caller must ensure that the returned memcg won't be released: - * e.g. acquire the rcu_read_lock or css_set_lock. + * e.g. acquire the rcu_read_lock or objcg_lock or cgroup_mutex. */ static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg) { @@ -445,8 +445,8 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio) * - lock_page_memcg() * - exclusive reference * - * For a kmem folio a caller should hold an rcu read lock to protect memcg - * associated with a kmem folio from being released. + * Note: The caller should hold an rcu read lock to protect memcg associated + * with a folio from being released. */ static inline struct mem_cgroup *folio_memcg(struct folio *folio) { @@ -455,12 +455,48 @@ static inline struct mem_cgroup *folio_memcg(struct folio *folio) return __folio_memcg(folio); } +/* + * page_memcg - Get the memory cgroup associated with a page. + * @page: Pointer to the page. + * + * See the cooments in folio_memcg(). + */ static inline struct mem_cgroup *page_memcg(struct page *page) { return folio_memcg(page_folio(page)); } -/** +/* + * get_mem_cgroup_from_folio - Obtain a reference on the memory cgroup + * associated with a folio. + * @folio: Pointer to the folio. + * + * Returns a pointer to the memory cgroup (and obtain a reference on it) + * associated with the folio, or NULL. This function assumes that the + * folio is known to have a proper memory cgroup pointer. It's not safe + * to call this function against some type of pages, e.g. slab pages or + * ex-slab pages. + */ +static inline struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio) +{ + struct mem_cgroup *memcg; + + rcu_read_lock(); +retry: + memcg = folio_memcg(folio); + if (unlikely(memcg && !css_tryget(&memcg->css))) + goto retry; + rcu_read_unlock(); + + return memcg; +} + +static inline struct mem_cgroup *get_mem_cgroup_from_page(struct page *page) +{ + return get_mem_cgroup_from_folio(page_folio(page)); +} + +/* * folio_memcg_rcu - Locklessly get the memory cgroup associated with a folio. * @folio: Pointer to the folio. * @@ -888,7 +924,7 @@ static inline bool mm_match_cgroup(struct mm_struct *mm, return match; } -struct cgroup_subsys_state *mem_cgroup_css_from_page(struct page *page); +struct cgroup_subsys_state *get_mem_cgroup_css_from_page(struct page *page); ino_t page_cgroup_ino(struct page *page); static inline bool mem_cgroup_online(struct mem_cgroup *memcg) @@ -1058,19 +1094,25 @@ static inline void count_memcg_events(struct mem_cgroup *memcg, static inline void count_memcg_page_event(struct page *page, enum vm_event_item idx) { - struct mem_cgroup *memcg = page_memcg(page); + struct mem_cgroup *memcg; + rcu_read_lock(); + memcg = page_memcg(page); if (memcg) count_memcg_events(memcg, idx, 1); + rcu_read_unlock(); } static inline void count_memcg_folio_events(struct folio *folio, enum vm_event_item idx, unsigned long nr) { - struct mem_cgroup *memcg = folio_memcg(folio); + struct mem_cgroup *memcg; + rcu_read_lock(); + memcg = folio_memcg(folio); if (memcg) count_memcg_events(memcg, idx, nr); + rcu_read_unlock(); } static inline void count_memcg_event_mm(struct mm_struct *mm, @@ -1149,6 +1191,16 @@ static inline struct mem_cgroup *page_memcg(struct page *page) return NULL; } +static inline struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio) +{ + return NULL; +} + +static inline struct mem_cgroup *get_mem_cgroup_from_page(struct page *page) +{ + return NULL; +} + static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio) { WARN_ON_ONCE(!rcu_read_lock_held()); diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h index 86b2a82da546..cdb822339f13 100644 --- a/include/trace/events/writeback.h +++ b/include/trace/events/writeback.h @@ -258,6 +258,11 @@ TRACE_EVENT(track_foreign_dirty, __entry->ino = inode ? inode->i_ino : 0; __entry->memcg_id = wb->memcg_css->id; __entry->cgroup_ino = __trace_wb_assign_cgroup(wb); + /* + * TP_fast_assign() is under preemption disabled which can + * serve as an RCU read-side critical section so that the + * memcg returned by folio_memcg() cannot be freed. + */ __entry->page_cgroup_ino = cgroup_ino(folio_memcg(folio)->css.cgroup); ), diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 6f171480b2f2..346a954e190e 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -369,7 +369,7 @@ EXPORT_SYMBOL(memcg_kmem_enabled_key); #endif /** - * mem_cgroup_css_from_page - css of the memcg associated with a page + * get_mem_cgroup_css_from_page - get css of the memcg associated with a page * @page: page of interest * * If memcg is bound to the default hierarchy, css of the memcg associated @@ -379,13 +379,15 @@ EXPORT_SYMBOL(memcg_kmem_enabled_key); * If memcg is bound to a traditional hierarchy, the css of root_mem_cgroup * is returned. */ -struct cgroup_subsys_state *mem_cgroup_css_from_page(struct page *page) +struct cgroup_subsys_state *get_mem_cgroup_css_from_page(struct page *page) { struct mem_cgroup *memcg; - memcg = page_memcg(page); + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) + return &root_mem_cgroup->css; - if (!memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) + memcg = get_mem_cgroup_from_page(page); + if (!memcg) memcg = root_mem_cgroup; return &memcg->css; @@ -768,13 +770,13 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx, int val) { - struct page *head = compound_head(page); /* rmap on tail pages */ + struct folio *folio = page_folio(page); /* rmap on tail pages */ struct mem_cgroup *memcg; pg_data_t *pgdat = page_pgdat(page); struct lruvec *lruvec; rcu_read_lock(); - memcg = page_memcg(head); + memcg = folio_memcg(folio); /* Untracked pages have no memcg, no lruvec. Update only the node */ if (!memcg) { rcu_read_unlock(); @@ -2056,7 +2058,9 @@ void folio_memcg_lock(struct folio *folio) * The RCU lock is held throughout the transaction. The fast * path can get away without acquiring the memcg->move_lock * because page moving starts with an RCU grace period. - */ + * + * The RCU lock also protects the memcg from being freed. + */ rcu_read_lock(); if (mem_cgroup_disabled()) @@ -3353,7 +3357,7 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size) void split_page_memcg(struct page *head, unsigned int nr) { struct folio *folio = page_folio(head); - struct mem_cgroup *memcg = folio_memcg(folio); + struct mem_cgroup *memcg = get_mem_cgroup_from_folio(folio); int i; if (mem_cgroup_disabled() || !memcg) @@ -3366,6 +3370,8 @@ void split_page_memcg(struct page *head, unsigned int nr) obj_cgroup_get_many(__folio_objcg(folio), nr - 1); else css_get_many(&memcg->css, nr - 1); + + css_put(&memcg->css); } #ifdef CONFIG_MEMCG_SWAP @@ -4558,7 +4564,7 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages, void mem_cgroup_track_foreign_dirty_slowpath(struct folio *folio, struct bdi_writeback *wb) { - struct mem_cgroup *memcg = folio_memcg(folio); + struct mem_cgroup *memcg = get_mem_cgroup_from_folio(folio); struct memcg_cgwb_frn *frn; u64 now = get_jiffies_64(); u64 oldest_at = now; @@ -4605,6 +4611,7 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct folio *folio, frn->memcg_id = wb->memcg_css->id; frn->at = now; } + css_put(&memcg->css); } /* issue foreign writeback flushes for recorded foreign dirtying events */ @@ -6167,6 +6174,14 @@ static void mem_cgroup_move_charge(void) atomic_dec(&mc.from->moving_account); } +/* + * The cgroup migration and memory cgroup offlining are serialized by + * @cgroup_mutex. If we reach here, it means that the LRU pages cannot + * be reparented to its parent memory cgroup. So during the whole process + * of mem_cgroup_move_task(), page_memcg(page) is stable. So we do not + * need to worry about the memcg (returned from page_memcg()) being + * released even if we do not hold an rcu read lock. + */ static void mem_cgroup_move_task(void) { if (mc.to) { @@ -7025,7 +7040,7 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new) if (folio_memcg(new)) return; - memcg = folio_memcg(old); + memcg = get_mem_cgroup_from_folio(old); VM_WARN_ON_ONCE_FOLIO(!memcg, old); if (!memcg) return; @@ -7044,6 +7059,8 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new) mem_cgroup_charge_statistics(memcg, nr_pages); memcg_check_events(memcg, folio_nid(new)); local_irq_restore(flags); + + css_put(&memcg->css); } DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key); @@ -7228,6 +7245,10 @@ void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry) if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) return; + /* + * Interrupts should be disabled by the caller (see the comments below), + * which can serve as RCU read-side critical sections. + */ memcg = folio_memcg(folio); VM_WARN_ON_ONCE_FOLIO(!memcg, folio); @@ -7289,19 +7310,21 @@ int __mem_cgroup_try_charge_swap(struct folio *folio, swp_entry_t entry) struct page_counter *counter; struct mem_cgroup *memcg; unsigned short oldid; + int ret = 0; if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) return 0; + rcu_read_lock(); memcg = folio_memcg(folio); VM_WARN_ON_ONCE_FOLIO(!memcg, folio); if (!memcg) - return 0; + goto out; if (!entry.val) { memcg_memory_event(memcg, MEMCG_SWAP_FAIL); - return 0; + goto out; } memcg = mem_cgroup_id_get_online(memcg); @@ -7311,7 +7334,8 @@ int __mem_cgroup_try_charge_swap(struct folio *folio, swp_entry_t entry) memcg_memory_event(memcg, MEMCG_SWAP_MAX); memcg_memory_event(memcg, MEMCG_SWAP_FAIL); mem_cgroup_id_put(memcg); - return -ENOMEM; + ret = -ENOMEM; + goto out; } /* Get references for the tail pages, too */ @@ -7320,8 +7344,10 @@ int __mem_cgroup_try_charge_swap(struct folio *folio, swp_entry_t entry) oldid = swap_cgroup_record(entry, mem_cgroup_id(memcg), nr_pages); VM_BUG_ON_FOLIO(oldid, folio); mod_memcg_state(memcg, MEMCG_SWAP, nr_pages); +out: + rcu_read_unlock(); - return 0; + return ret; } /** @@ -7366,6 +7392,7 @@ long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg) bool mem_cgroup_swap_full(struct page *page) { struct mem_cgroup *memcg; + bool ret = false; VM_BUG_ON_PAGE(!PageLocked(page), page); @@ -7374,19 +7401,24 @@ bool mem_cgroup_swap_full(struct page *page) if (cgroup_memory_noswap || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) return false; + rcu_read_lock(); memcg = page_memcg(page); if (!memcg) - return false; + goto out; for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) { unsigned long usage = page_counter_read(&memcg->swap); if (usage * 2 >= READ_ONCE(memcg->swap.high) || - usage * 2 >= READ_ONCE(memcg->swap.max)) - return true; + usage * 2 >= READ_ONCE(memcg->swap.max)) { + ret = true; + goto out; + } } +out: + rcu_read_unlock(); - return false; + return ret; } static int __init setup_swap_account(char *s) diff --git a/mm/migrate.c b/mm/migrate.c index 1ece23d80bc4..2e49b96fa339 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -451,6 +451,10 @@ int folio_migrate_mapping(struct address_space *mapping, struct lruvec *old_lruvec, *new_lruvec; struct mem_cgroup *memcg; + /* + * Irq is disabled, which can serve as RCU read-side critical + * sections. + */ memcg = folio_memcg(folio); old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat); new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat); diff --git a/mm/page_io.c b/mm/page_io.c index 68318134dc92..f75ebbc95ee6 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -222,13 +222,14 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page) struct cgroup_subsys_state *css; struct mem_cgroup *memcg; + rcu_read_lock(); memcg = page_memcg(page); if (!memcg) - return; + goto out; - rcu_read_lock(); css = cgroup_e_css(memcg->css.cgroup, &io_cgrp_subsys); bio_associate_blkg_from_css(bio, css); +out: rcu_read_unlock(); } #else -- 2.11.0