From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, duanxiongchun@bytedance.com, longman@redhat.com, Muchun Song
Subject: [PATCH v4 11/11] mm: lru: use lruvec lock to serialize memcg changes
Date: Tue, 24 May 2022 14:05:51 +0800
Message-Id: <20220524060551.80037-12-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.32.1 (Apple Git-133)
In-Reply-To: <20220524060551.80037-1-songmuchun@bytedance.com>
References: <20220524060551.80037-1-songmuchun@bytedance.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

As described by commit fc574c23558c ("mm/swap.c: serialize memcg changes
in pagevec_lru_move_fn"), TestClearPageLRU() serializes
mem_cgroup_move_account() against pagevec_lru_move_fn(). folio_lruvec_lock*()
can now detect whether the page's memcg has changed, so the lruvec lock can
serialize mem_cgroup_move_account() against pagevec_lru_move_fn() instead.
This change is a partial revert of commit fc574c23558c ("mm/swap.c:
serialize memcg changes in pagevec_lru_move_fn").

pagevec_lru_move_fn() is a hotter path than mem_cgroup_move_account(), so
removing an atomic operation from it is an optimization. This change also
avoids dirtying the cacheline of a page that is not on the LRU.

Signed-off-by: Muchun Song
---
 mm/memcontrol.c | 31 +++++++++++++++++++++++++++++++
 mm/swap.c       | 45 ++++++++++++++-------------------------------
 mm/vmscan.c     |  9 ++++-----
 3 files changed, 49 insertions(+), 36 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1a35f7fde3ed..7b6d9c308d91 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1331,12 +1331,38 @@ struct lruvec *folio_lruvec_lock(struct folio *folio)
 	lruvec = folio_lruvec(folio);
 	spin_lock(&lruvec->lru_lock);
 
+	/*
+	 * The memcg of the page can be changed by either of these routines:
+	 *
+	 * 1) mem_cgroup_move_account() or
+	 * 2) memcg_reparent_objcgs()
+	 *
+	 * A possible bad scenario looks like:
+	 *
+	 * CPU0:                CPU1:                CPU2:
+	 * lruvec = folio_lruvec()
+	 *
+	 *                      if (!isolate_lru_page())
+	 *                              mem_cgroup_move_account()
+	 *
+	 *                                           memcg_reparent_objcgs()
+	 *
+	 * spin_lock(&lruvec->lru_lock)
+	 *                ^^^^^^
+	 *              wrong lock
+	 *
+	 * Either CPU1 or CPU2 can change the page memcg, so we need to check
+	 * whether it has changed and, if so, reacquire the new lruvec
+	 * lock.
+	 */
 	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
 		spin_unlock(&lruvec->lru_lock);
 		goto retry;
 	}
 
 	/*
+	 * Once we get here, folio_memcg(folio) is stable.
+	 *
 	 * Preemption is disabled in the internal of spin_lock, which can serve
 	 * as RCU read-side critical sections.
 	 */
@@ -1367,6 +1393,7 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
 	lruvec = folio_lruvec(folio);
 	spin_lock_irq(&lruvec->lru_lock);
 
+	/* See the comments in folio_lruvec_lock(). */
 	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
 		spin_unlock_irq(&lruvec->lru_lock);
 		goto retry;
@@ -1402,6 +1429,7 @@ struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio,
 	lruvec = folio_lruvec(folio);
 	spin_lock_irqsave(&lruvec->lru_lock, *flags);
 
+	/* See the comments in folio_lruvec_lock(). */
 	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
 		spin_unlock_irqrestore(&lruvec->lru_lock, *flags);
 		goto retry;
@@ -5751,7 +5779,10 @@ static int mem_cgroup_move_account(struct page *page,
 	obj_cgroup_put(rcu_dereference(from->objcg));
 	rcu_read_unlock();
 
+	/* See the comments in folio_lruvec_lock(). */
+	spin_lock(&from_vec->lru_lock);
 	folio->memcg_data = (unsigned long)rcu_access_pointer(to->objcg);
+	spin_unlock(&from_vec->lru_lock);
 
 	__folio_memcg_unlock(from);
 
diff --git a/mm/swap.c b/mm/swap.c
index 9680f2fc48b1..984b100e84e4 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -199,14 +199,8 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 		struct page *page = pvec->pages[i];
 		struct folio *folio = page_folio(page);
 
-		/* block memcg migration during page moving between lru */
-		if (!TestClearPageLRU(page))
-			continue;
-
 		lruvec = folio_lruvec_relock_irqsave(folio, lruvec, &flags);
 		(*move_fn)(page, lruvec);
-
-		SetPageLRU(page);
 	}
 	if (lruvec)
 		unlock_page_lruvec_irqrestore(lruvec, flags);
@@ -218,7 +212,7 @@ static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec)
 {
 	struct folio *folio = page_folio(page);
 
-	if (!folio_test_unevictable(folio)) {
+	if (folio_test_lru(folio) && !folio_test_unevictable(folio)) {
 		lruvec_del_folio(lruvec, folio);
 		folio_clear_active(folio);
 		lruvec_add_folio_tail(lruvec, folio);
@@ -313,7 +307,8 @@ void lru_note_cost_folio(struct folio *folio)
 
 static void __folio_activate(struct folio *folio, struct lruvec *lruvec)
 {
-	if (!folio_test_active(folio) && !folio_test_unevictable(folio)) {
+	if (folio_test_lru(folio) && !folio_test_active(folio) &&
+	    !folio_test_unevictable(folio)) {
 		long nr_pages = folio_nr_pages(folio);
 
 		lruvec_del_folio(lruvec, folio);
@@ -370,12 +365,9 @@ static void folio_activate(struct folio *folio)
 {
 	struct lruvec *lruvec;
 
-	if (folio_test_clear_lru(folio)) {
-		lruvec = folio_lruvec_lock_irq(folio);
-		__folio_activate(folio, lruvec);
-		unlock_page_lruvec_irq(lruvec);
-		folio_set_lru(folio);
-	}
+	lruvec = folio_lruvec_lock_irq(folio);
+	__folio_activate(folio, lruvec);
+	unlock_page_lruvec_irq(lruvec);
 }
 #endif
 
@@ -518,6 +510,9 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec)
 	bool active = PageActive(page);
 	int nr_pages = thp_nr_pages(page);
 
+	if (!PageLRU(page))
+		return;
+
 	if (PageUnevictable(page))
 		return;
 
@@ -555,7 +550,7 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec)
 
 static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec)
 {
-	if (PageActive(page) && !PageUnevictable(page)) {
+	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
 		int nr_pages = thp_nr_pages(page);
 
 		del_page_from_lru_list(page, lruvec);
@@ -571,7 +566,7 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec)
 
 static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec)
 {
-	if (PageAnon(page) && PageSwapBacked(page) &&
+	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
 	    !PageSwapCache(page) && !PageUnevictable(page)) {
 		int nr_pages = thp_nr_pages(page);
 
@@ -1006,8 +1001,9 @@ void __pagevec_release(struct pagevec *pvec)
 }
 EXPORT_SYMBOL(__pagevec_release);
 
-static void __pagevec_lru_add_fn(struct folio *folio, struct lruvec *lruvec)
+static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec)
 {
+	struct folio *folio = page_folio(page);
 	int was_unevictable = folio_test_clear_unevictable(folio);
 	long nr_pages = folio_nr_pages(folio);
 
@@ -1053,20 +1049,7 @@ static void __pagevec_lru_add_fn(struct folio *folio, struct lruvec *lruvec)
  */
 void __pagevec_lru_add(struct pagevec *pvec)
 {
-	int i;
-	struct lruvec *lruvec = NULL;
-	unsigned long flags = 0;
-
-	for (i = 0; i < pagevec_count(pvec); i++) {
-		struct folio *folio = page_folio(pvec->pages[i]);
-
-		lruvec = folio_lruvec_relock_irqsave(folio, lruvec, &flags);
-		__pagevec_lru_add_fn(folio, lruvec);
-	}
-	if (lruvec)
-		unlock_page_lruvec_irqrestore(lruvec, flags);
-	release_pages(pvec->pages, pvec->nr);
-	pagevec_reinit(pvec);
+	pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn);
 }
 
 /**
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6c9e2eafc8f9..ec1272ca5ead 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4793,18 +4793,17 @@ void check_move_unevictable_pages(struct pagevec *pvec)
 		nr_pages = thp_nr_pages(page);
 		pgscanned += nr_pages;
 
-		/* block memcg migration during page moving between lru */
-		if (!TestClearPageLRU(page))
+		lruvec = folio_lruvec_relock_irq(folio, lruvec);
+
+		if (!PageLRU(page) || !PageUnevictable(page))
 			continue;
 
-		lruvec = folio_lruvec_relock_irq(folio, lruvec);
-		if (page_evictable(page) && PageUnevictable(page)) {
+		if (page_evictable(page)) {
 			del_page_from_lru_list(page, lruvec);
 			ClearPageUnevictable(page);
 			add_page_to_lru_list(page, lruvec);
 			pgrescued += nr_pages;
 		}
-		SetPageLRU(page);
 	}
 
 	if (lruvec) {
-- 
2.11.0
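
For readers who want to experiment with the lock-then-recheck idea outside
the kernel, here is a minimal userspace sketch. It is not kernel code and
not part of the patch. struct group, struct item, item_group_lock() and
item_move() are hypothetical stand-ins for lruvec, folio,
folio_lruvec_lock() and mem_cgroup_move_account(), and a pthread mutex
stands in for lru_lock. It also assumes that groups are never freed while
in use, which the kernel guarantees by other means (RCU).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lruvec. */
struct group {
	pthread_mutex_t lock;	/* plays the role of lruvec->lru_lock */
	int nr_items;
};

/* Hypothetical stand-in for struct folio. */
struct item {
	_Atomic(struct group *) owner;	/* plays the role of folio->memcg_data */
};

/*
 * Lock the group that currently owns @it. The owner is sampled without
 * any lock, so it may change before the lock is taken. Recheck under the
 * lock and retry on a mismatch.
 */
static struct group *item_group_lock(struct item *it)
{
	struct group *g;

	assert(it != NULL);
	for (;;) {
		g = atomic_load(&it->owner);
		pthread_mutex_lock(&g->lock);
		/* The owner is only written under the old owner's lock. */
		if (g == atomic_load(&it->owner))
			return g;
		pthread_mutex_unlock(&g->lock);
	}
}

/*
 * Move @it to @to. The owner field is written while holding the old
 * owner's lock, which is what makes the recheck above sufficient.
 */
static void item_move(struct item *it, struct group *to)
{
	struct group *from = item_group_lock(it);

	from->nr_items--;
	atomic_store(&it->owner, to);
	pthread_mutex_unlock(&from->lock);

	pthread_mutex_lock(&to->lock);
	to->nr_items++;
	pthread_mutex_unlock(&to->lock);
}
```

A caller initializes two groups with pthread_mutex_init(), points an item at
the first with atomic_init(), and can then call item_group_lock() from any
thread while another thread runs item_move(). The lock that is returned
always belongs to the item's current owner.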