From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Johannes Weiner, Andrew Morton, Joonsoo Kim, Shakeel Butt,
	Alex Shi, Hugh Dickins, "Kirill A. Shutemov", Michal Hocko,
	Roman Gushchin, Balbir Singh, Linus Torvalds, Sasha Levin,
	cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH AUTOSEL 5.4 307/330] mm: memcontrol: fix stat-corrupting race in charge moving
Date: Thu, 17 Sep 2020 22:00:47 -0400
Message-Id: <20200918020110.2063155-307-sashal@kernel.org>
In-Reply-To: <20200918020110.2063155-1-sashal@kernel.org>
References: <20200918020110.2063155-1-sashal@kernel.org>

From: Johannes Weiner

[ Upstream commit abb242f57196dbaa108271575353a0453f6834ef ]

The move_lock is a per-memcg lock, but the VM accounting code that
needs to acquire it comes from the page and follows page->mem_cgroup
under RCU protection. That means that the page becomes unlocked not
when we drop the move_lock, but when we update page->mem_cgroup. And
that assignment doesn't imply any memory ordering. If that pointer
write gets reordered against the reads of the page state -
page_mapped(), PageDirty() etc. - the state may change while we rely
on it being stable, and we can end up corrupting the counters.

Place an SMP memory barrier to make sure we're done with all page
state by the time the new page->mem_cgroup becomes visible.

Also replace the open-coded move_lock with a lock_page_memcg() to
make it more obvious what we're serializing against.
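As an illustrative aside (not part of the patch; the struct group /
page_group / mapped names below are made up, only smp_mb(), WRITE_ONCE()
and the atomic_long_* helpers are real kernel API), the ordering bug is a
load-then-store reordering: the mover reads page state and then stores the
new group pointer, and since smp_wmb() only orders stores against stores,
a full smp_mb() is needed between the last state read and the publishing
store.

/*
 * Minimal sketch of the race being fixed; kernel context assumed.
 * "page_group" stands in for page->mem_cgroup, "mapped" for the page
 * state (page_mapped()/PageDirty()) the stats are derived from.
 */
struct group { atomic_long_t nr_mapped; };

static struct group from_grp, to_grp;
static struct group *page_group = &from_grp;
static bool mapped = true;

static void move_account(void)
{
	/* read page state and transfer the derived counters */
	if (mapped) {
		atomic_long_dec(&from_grp.nr_mapped);
		atomic_long_inc(&to_grp.nr_mapped);
	}

	/*
	 * The "mapped" read above must complete before the store below
	 * becomes visible. A load->store pair can be reordered on weakly
	 * ordered CPUs, which is why the patch needs a full smp_mb()
	 * rather than smp_wmb().
	 */
	smp_mb();

	/* publish: concurrent accounting now serializes on to_grp */
	WRITE_ONCE(page_group, &to_grp);
}

Without the barrier, the CPU may make the new page_group visible while
the counter transfer is still consuming stale page state, which is
exactly how the counters get corrupted.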
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Reviewed-by: Joonsoo Kim
Reviewed-by: Shakeel Butt
Cc: Alex Shi
Cc: Hugh Dickins
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Roman Gushchin
Cc: Balbir Singh
Link: http://lkml.kernel.org/r/20200508183105.225460-3-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 mm/memcontrol.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 402c8bc65e08d..ca1632850fb76 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5489,7 +5489,6 @@ static int mem_cgroup_move_account(struct page *page,
 {
 	struct lruvec *from_vec, *to_vec;
 	struct pglist_data *pgdat;
-	unsigned long flags;
 	unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
 	int ret;
 	bool anon;
@@ -5516,18 +5515,13 @@ static int mem_cgroup_move_account(struct page *page,
 	from_vec = mem_cgroup_lruvec(pgdat, from);
 	to_vec = mem_cgroup_lruvec(pgdat, to);
 
-	spin_lock_irqsave(&from->move_lock, flags);
+	lock_page_memcg(page);
 
 	if (!anon && page_mapped(page)) {
 		__mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
 		__mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
 	}
 
-	/*
-	 * move_lock grabbed above and caller set from->moving_account, so
-	 * mod_memcg_page_state will serialize updates to PageDirty.
-	 * So mapping should be stable for dirty pages.
-	 */
 	if (!anon && PageDirty(page)) {
 		struct address_space *mapping = page_mapping(page);
 
@@ -5543,15 +5537,23 @@ static int mem_cgroup_move_account(struct page *page,
 	}
 
 	/*
+	 * All state has been migrated, let's switch to the new memcg.
+	 *
 	 * It is safe to change page->mem_cgroup here because the page
-	 * is referenced, charged, and isolated - we can't race with
-	 * uncharging, charging, migration, or LRU putback.
+	 * is referenced, charged, isolated, and locked: we can't race
+	 * with (un)charging, migration, LRU putback, or anything else
+	 * that would rely on a stable page->mem_cgroup.
+	 *
+	 * Note that lock_page_memcg is a memcg lock, not a page lock,
+	 * to save space. As soon as we switch page->mem_cgroup to a
+	 * new memcg that isn't locked, the above state can change
+	 * concurrently again. Make sure we're truly done with it.
 	 */
+	smp_mb();
 
-	/* caller should have done css_get */
-	page->mem_cgroup = to;
+	page->mem_cgroup = to;	/* caller should have done css_get */
 
-	spin_unlock_irqrestore(&from->move_lock, flags);
+	__unlock_page_memcg(from);
 
 	ret = 0;
 
-- 
2.25.1