From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Muchun Song,
 Shakeel Butt, Roman Gushchin, Johannes Weiner, Michal Hocko,
 Vladimir Davydov, Xiongchun Duan, Andrew Morton, Linus Torvalds,
 Sasha Levin
Subject: [PATCH 5.10 519/530] mm: memcontrol: slab: fix obtain a reference to a freeing memcg
Date: Wed, 12 May 2021 16:50:29 +0200
Message-Id: <20210512144836.811965140@linuxfoundation.org>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210512144819.664462530@linuxfoundation.org>
References:
 <20210512144819.664462530@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Muchun Song

[ Upstream commit 9f38f03ae8d5f57371b71aa6b4275765b65454fd ]

Patch series "Use obj_cgroup APIs to charge kmem pages", v5.

Since Roman's series "The new cgroup slab memory controller" was
applied, all slab objects are charged with the new obj_cgroup APIs.
These APIs introduce a struct obj_cgroup to charge slab objects, which
prevents long-living objects from pinning the original memory cgroup in
memory. But there are still some corner-case objects (e.g. allocations
larger than an order-1 page on SLUB) which are not charged with the new
APIs. Those objects (including pages allocated directly from the buddy
allocator) are charged as kmem pages, which still hold a reference to
the memory cgroup.

For example, the kernel stack is charged as kmem pages because its size
can be greater than 2 pages (e.g. 16KB on x86_64 or arm64). If we
create a thread (suppose the thread stack is charged to memory cgroup
A) and then move it from memory cgroup A to memory cgroup B, the kernel
stack of the thread still holds a reference to memory cgroup A, so the
thread can pin memory cgroup A in memory even if we remove cgroup A.
The following script demonstrates this scenario: after running it, the
system has 500 more dying cgroups. (This is not a real-world issue,
just a script to show that large kmallocs are charged as kmem pages,
which can pin the memory cgroup in memory.)

	#!/bin/bash

	# Show the memory controller's cgroup count before the test.
	cat /proc/cgroups | grep memory

	cd /sys/fs/cgroup/memory
	echo 1 > memory.move_charge_at_immigrate

	for i in {1..500}
	do
		mkdir kmem_test
		echo $$ > kmem_test/cgroup.procs
		# The child's kernel stack is charged to kmem_test.
		sleep 3600 &
		# Move the shell and the child back to the root cgroup.
		echo $$ > cgroup.procs
		echo `cat kmem_test/cgroup.procs` > cgroup.procs
		rmdir kmem_test
	done

	# Show the count again; the removed cgroups are still pinned.
	cat /proc/cgroups | grep memory

This patchset aims to make those kmem pages drop the reference to the
memory cgroup by using the obj_cgroup APIs. With it applied, the number
of dying cgroups no longer increases when we run the above test script.

This patch (of 7):

rcu_read_lock()/rcu_read_unlock() can only guarantee that the memcg
will not be freed; they cannot guarantee that css_get() on the memcg
succeeds (it is called in refill_stock() when the cached memcg
changes).

  rcu_read_lock()
  memcg = obj_cgroup_memcg(old)
  __memcg_kmem_uncharge(memcg)
      refill_stock(memcg)
          if (stock->cached != memcg)
              // css_get can change the ref counter from 0 back to 1.
              css_get(&memcg->css)
  rcu_read_unlock()

This fix is very similar to commit:

  eefbfa7fd678 ("mm: memcg/slab: fix use after free in obj_cgroup_charge")

Fix this by holding a reference to the memcg which is passed to
__memcg_kmem_uncharge() before calling __memcg_kmem_uncharge().
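
The difference between css_get() and css_tryget() that the fix relies
on can be modelled in user space. The following is only a sketch under
stated assumptions, not kernel code: struct ref, ref_get(), ref_tryget()
and ref_put() are hypothetical names, and a plain C11 atomic counter
stands in for the css reference counter. It shows that an unconditional
get can move a counter from 0 back to 1, while a tryget refuses to do
so.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a css reference counter (assumption:
 * a single atomic long, unlike the kernel's real implementation). */
struct ref {
	atomic_long count;
};

/* Models css_get(): unconditional increment. Only safe when the
 * caller already holds a reference, since it can revive a counter
 * that has already dropped to 0. */
static void ref_get(struct ref *r)
{
	atomic_fetch_add(&r->count, 1);
}

/* Models css_tryget(): take a reference only if the counter is
 * still non-zero, i.e. the object is not already being released. */
static bool ref_tryget(struct ref *r)
{
	long old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, 'old' is updated to the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Models css_put(): drop one reference. */
static void ref_put(struct ref *r)
{
	long old = atomic_fetch_sub(&r->count, 1);

	assert(old > 0);
}
```

In this model, once the last reference has been dropped, ref_tryget()
returns false and leaves the counter at 0, whereas ref_get() would set
it to 1 again, which corresponds to the "from 0 back to 1" case in the
trace above.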

Link: https://lkml.kernel.org/r/20210319163821.20704-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20210319163821.20704-2-songmuchun@bytedance.com
Fixes: 3de7d4f25a74 ("mm: memcg/slab: optimize objcg stock draining")
Signed-off-by: Muchun Song
Reviewed-by: Shakeel Butt
Acked-by: Roman Gushchin
Acked-by: Johannes Weiner
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Xiongchun Duan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 mm/memcontrol.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d72d2b90474a..8d9f5fa4c6d3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3162,9 +3162,17 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 		unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
 
 		if (nr_pages) {
+			struct mem_cgroup *memcg;
+
 			rcu_read_lock();
-			__memcg_kmem_uncharge(obj_cgroup_memcg(old), nr_pages);
+retry:
+			memcg = obj_cgroup_memcg(old);
+			if (unlikely(!css_tryget(&memcg->css)))
+				goto retry;
 			rcu_read_unlock();
+
+			__memcg_kmem_uncharge(memcg, nr_pages);
+			css_put(&memcg->css);
 		}
 
 		/*
-- 
2.30.2