Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp5848723pxb; Tue, 16 Feb 2021 09:01:44 -0800 (PST) X-Google-Smtp-Source: ABdhPJz3GIOGikrPVtLVq4t8benVfseWaiybd2rRmaBrN9qnxt6f1ZIfWAUr6QFYm9iLeQWzQV+h X-Received: by 2002:a17:906:bcd7:: with SMTP id lw23mr19371562ejb.30.1613494903827; Tue, 16 Feb 2021 09:01:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1613494903; cv=none; d=google.com; s=arc-20160816; b=00oxSM/2Ty59J6r8gnptqp22+w2SbOPWmqNcTqNO7P9IX9ho6R0ehsx5/nEGSglrRx fJBcIaLAmCPqyYY0zAj7cUqy7Mx1IJ4QwzMNgdgaMiGVX3Q1KHlJTrOuaOE1RbHVVzm4 jrA88kfcwRivUMKOlhRUmHwj0CZs0PU6Wer3Ku2vJJgw+rfi03HaUXCFYZZNK+ak/LNk W2TA9K5eViLlwhFdmVaVaQaeC4389nagGJ18K8jT8uy0i9Fghd1TGOZXYdGp2Jmp89cS kAmn7Auo/LOC3JMzTvAM5W9A+YgtBEfXIpMay9cn22fw66kv3aI6j1tlFVU7KTVNUpet kHkA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=7Sbc+/7yHkx+7SnjvZzsa9ETiyUzFfwyyH3lcCN73/I=; b=hGBNx6Z60Y8Lhg0DH9nUmyV57Dkpah6ymTjjNLIcIs7nYDbq27QKLaAb58REjqnjw/ ep4B0f3R4lmOz8pV/1oeC4rI2PB8RoSQkfDeVpPX5M7eay59gODc0TESD8CWqlnaoesm eJ3CgfSgOKGAfSQsjT8n6630GJm35jcDp2i8fLZCx1hptjaEqBpr9kjUm+WYIJJmwz8C l1O85N+m19wo5ZUmZdtoX8tF7GDirrG0i67HH1H1yiRDTbYPHX3jm9F284EyzvajHFWM 92KAj1l4EH2fnmfwOsWckCZNV0AcgZay1wGQl/tKhE1tBZ2R7wPjgG/pE7JmVD5RW7Js Abtw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=QcYThMME; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id eb13si18395061edb.8.2021.02.16.09.01.06; Tue, 16 Feb 2021 09:01:43 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=QcYThMME; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230006AbhBPQ7p (ORCPT + 99 others); Tue, 16 Feb 2021 11:59:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55598 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229764AbhBPQ7m (ORCPT ); Tue, 16 Feb 2021 11:59:42 -0500 Received: from mail-qk1-x72b.google.com (mail-qk1-x72b.google.com [IPv6:2607:f8b0:4864:20::72b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 83AC5C06174A for ; Tue, 16 Feb 2021 08:59:02 -0800 (PST) Received: by mail-qk1-x72b.google.com with SMTP id w19so9986455qki.13 for ; Tue, 16 Feb 2021 08:59:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=7Sbc+/7yHkx+7SnjvZzsa9ETiyUzFfwyyH3lcCN73/I=; b=QcYThMMEVLZEm7VRmsM12lMG8ZDFYS88jVBFd9Z67UWfpdmyePvN+QkKDAKq3kRSIH dT4jdVZPlMZPWZg0Llj4TOx6NJWY0qbNUcZsx5ciWu7MNSKKlK4y4wXs0KmwHZZGjHFt DLwXH+6VOSHgIPGNOVUJPLSqiiXkqe62CQYLohD4C8Ojp89gxsduXZpeeyr404CO/cwm /btydGUcbGStYIzkl/TPAGiZTGJbhC5KYmqo3pe/CabYp0YnhLT8nLeX4MR7qc5/yscQ YFbDP8zkkOL/UrZSrJpS5A/Jlc+/mgT+srW/uf4YPa9qIbeiSpupCbcbdQQoD8ITU4sW 4jQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=7Sbc+/7yHkx+7SnjvZzsa9ETiyUzFfwyyH3lcCN73/I=; b=GbIzzT7TRU/hfDi5LLKoJfFyITjYjOQ4yIewDso23MC92JsdcCnESzfAHsDGuTbpCK rqTyC4QEK0BcnSVN3e5h3y+Jxj3l9m4VHwgqM+WIzNAweIDqCg7HSRZwcB+d4CG13SZN un5NRAD+Kg9CnPITp0oKxwgmuwCVEZAG/ZQxPLZ7kp6MWZ4DdXI8KJr7FmgBaCQ6nzse UKDuachHBYT2x4lZq8BO7d5T4O9k+UTfBEmZqdwjsCV3HB8pKc93sa8W5pYEElg1Hwy0 71406dxnAezXy8l/C8kJYFSCe6z0xa8F6+LmWPTXJRhTAIvSbl2wZXOnskM9GbCCNlAT Xlqw== X-Gm-Message-State: AOAM530NZJEufc7bj9EB4kn59ehE2OE4+k926XETKUyEh+YrSDcWIHIQ BlF+pfjjWziteolI8+lRaVR27tnuLvsYKg== X-Received: by 2002:a05:620a:148c:: with SMTP id w12mr18486430qkj.186.1613494741630; Tue, 16 Feb 2021 08:59:01 -0800 (PST) Received: from localhost (70.44.39.90.res-cmts.bus.ptd.net. [70.44.39.90]) by smtp.gmail.com with ESMTPSA id q12sm7798492qki.91.2021.02.16.08.59.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 16 Feb 2021 08:59:00 -0800 (PST) Date: Tue, 16 Feb 2021 11:59:00 -0500 From: Johannes Weiner To: Muchun Song Cc: mhocko@kernel.org, vdavydov.dev@gmail.com, akpm@linux-foundation.org, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 Message-ID: References: <20210212170159.32153-1-songmuchun@bytedance.com> <20210212170159.32153-4-songmuchun@bytedance.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210212170159.32153-4-songmuchun@bytedance.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Muchun, On Sat, Feb 13, 2021 at 01:01:59AM +0800, Muchun Song wrote: > The swap charges the actual number of swap entries on cgroup v2. > If a swap cache page is charged successful, and then we uncharge > the swap counter. It is wrong on cgroup v2. Because the swap > entry is not freed. The patch makes sense to me. But this code is a bit tricky, we should add more documentation to how it works and what the problem is. How about this for the changelog? --- mm: memcontrol: fix swap undercounting for shared pages in cgroup2 When shared pages are swapped in partially, we can have some page tables referencing the in-memory page and some referencing the swap slot. Cgroup1 and cgroup2 handle these overlapping lifetimes slightly differently due to the nature of how they account memory and swap: Cgroup1 has a unified memory+swap counter that tracks a data page regardless whether it's in-core or swapped out. On swapin, we transfer the charge from the swap entry to the newly allocated swapcache page, even though the swap entry might stick around for a while. That's why we have a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge(). Cgroup2 tracks memory and swap as separate, independent resources and thus has split memory and swap counters. On swapin, we charge the newly allocated swapcache page as memory, while the swap slot in turn must remain charged to the swap counter as long as its allocated too. The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control"), because it accidentally removed the do_memsw_account() check in the branch inside mem_cgroup_uncharge() that was supposed to tell the difference between the charge transfer in cgroup1 and the separate counters in cgroup2. As a result, cgroup2 currently undercounts consumed swap when shared pages are partially swapped back in. This in turn allows a cgroup to consume more swap than its configured limit intends. Add the do_memsw_account() check back to fix this problem. --- > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > Signed-off-by: Muchun Song Acked-by: Johannes Weiner > --- > mm/memcontrol.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index c737c8f05992..be6bc5044150 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > memcg_check_events(memcg, page); > local_irq_enable(); > > - if (PageSwapCache(page)) { > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { It's more descriptive to use do_memsw_account() here, IMO. We should also add a comment. How about this above the branch? /* * Cgroup1's unified memory+swap counter has been charged with the * new swapcache page, finish the transfer by uncharging the swap * slot. The swap slot would also get uncharged when it dies, but * for shared pages it can stick around indefinitely and we'd count * the page twice the entire time. * * Cgroup2 has separate resource counters for memory and swap, * so this is a non-issue here. Memory and swap charge lifetimes * correspond 1:1 to page and swap slot lifetimes: we charge the * page to memory here, and uncharge swap when the slot is freed. */