Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E10C1C6379F for ; Mon, 20 Feb 2023 09:17:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231458AbjBTJRQ (ORCPT ); Mon, 20 Feb 2023 04:17:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57602 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230459AbjBTJRM (ORCPT ); Mon, 20 Feb 2023 04:17:12 -0500 Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com [IPv6:2607:f8b0:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 49BB9FF0C for ; Mon, 20 Feb 2023 01:17:11 -0800 (PST) Received: by mail-pl1-x62f.google.com with SMTP id e5so1658125plg.8 for ; Mon, 20 Feb 2023 01:17:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; t=1676884631; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Hay0adYlVwY1IjdRfk5CKFeKCnG2WTWsVt3L/E4Vszc=; b=dzq60rThnG1eHPZmbaqIAdKnvx11jcx+9c7+Xz+Fvb8jzUNuPvxrvvoH9rxiY7bseY MwBIqKmGnnRLy9oD7KNU0IP05L8avR9WGkGCuO0KBPHzABJt+tBiqxGIQYJRYUEQ7wdc +CCtYi1ncziYhQ42cmwAoqIuU1fou5tShKZ0YV8xXDToTIOwyrRgdIk32smyOhodVlyW 8GS5Jc0N9MoRt/38dCyYzVz+zoLJ6UwCIwnPMbV9kDCk/1kPdWTnY7K7PljNojAlUkBH R//DTbH1un7fzgI87WSVUXn8bEViUtn4XHEVe68NYl5dLaXaPf0wAFPKmBBOQiZAvtXk j0mw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1676884631; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Hay0adYlVwY1IjdRfk5CKFeKCnG2WTWsVt3L/E4Vszc=; b=etWvqHmjeT2bd6P/vEh+qK/e1MzUclxmMW/LTH84pnOFoupjV8YofWd9TvvZkwZn/E VF0GHAWtYJ34Je+MwE2nY8ucZLb5GkdVIBCD5ngIMu5QRGy3AOoxq9UXiCFVqxPNIKyl tungdXBGWrIrodbV+AQz76MM67Y3sOcuSErnb20b+5qD7N/FhI5DpQjA0vSayza5rqJU Z3ib0eNeb/9Q3xcJvfaPyUWQd0qWCGQ2W7ApmjuAebddzkXfL2aRr/zrfyB2nLbdkPEK UjuLUQi1Ta0wQ47fBbl92dlWIxHiIiNcbne9eKxvhoUCZB6uxJi8tVRnLxPeo0W98tMR REsQ== X-Gm-Message-State: AO0yUKWLmsDtCitDsBaX5m6qGVXbj4zY6v25dKbnlJccdj0ys3FL1Ozc 0PIKxkzvDoCR9SBoPU45abQ1RQ== X-Google-Smtp-Source: AK7set+RFNh5qhT4uR8QO+C6MsDJpFzqsI6Ran0kA2qMXuWKiMKjLinzDtbzLeMxRFj33Xvv1RqpTg== X-Received: by 2002:a17:90a:ac0a:b0:230:9ae4:b5e2 with SMTP id o10-20020a17090aac0a00b002309ae4b5e2mr566508pjq.0.1676884630706; Mon, 20 Feb 2023 01:17:10 -0800 (PST) Received: from C02DW0BEMD6R.bytedance.net ([139.177.225.229]) by smtp.gmail.com with ESMTPSA id fs13-20020a17090af28d00b002339195a47bsm432382pjb.53.2023.02.20.01.17.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Feb 2023 01:17:10 -0800 (PST) From: Qi Zheng To: akpm@linux-foundation.org, hannes@cmpxchg.org, shakeelb@google.com, mhocko@kernel.org, roman.gushchin@linux.dev, muchun.song@linux.dev, david@redhat.com, shy828301@gmail.com Cc: tkhai@ya.ru, sultan@kerneltoast.com, dave@stgolabs.net, penguin-kernel@I-love.SAKURA.ne.jp, paulmck@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng Subject: [PATCH 2/5] mm: vmscan: make memcg slab shrink lockless Date: Mon, 20 Feb 2023 17:16:34 +0800 Message-Id: <20230220091637.64865-3-zhengqi.arch@bytedance.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: <20230220091637.64865-1-zhengqi.arch@bytedance.com> References: <20230220091637.64865-1-zhengqi.arch@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Like global slab shrink, since commit 1cd0bd06093c ("rcu: Remove CONFIG_SRCU"), it's time to use SRCU to protect readers who previously held shrinker_rwsem. We can test with the following script: ``` DIR="/root/shrinker/memcg/mnt" do_create() { mkdir /sys/fs/cgroup/memory/test echo 200M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes for i in `seq 0 $1`; do mkdir /sys/fs/cgroup/memory/test/$i; echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs; mkdir -p $DIR/$i; done } do_mount() { for i in `seq $1 $2`; do mount -t tmpfs $i $DIR/$i; done } do_touch() { for i in `seq $1 $2`; do echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs; dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 & done } do_create 2000 do_mount 0 2000 do_touch 0 1000 ``` Before applying: 46.60% [kernel] [k] down_read_trylock 18.70% [kernel] [k] up_read 15.44% [kernel] [k] shrink_slab 4.37% [kernel] [k] _find_next_bit 2.75% [kernel] [k] xa_load 2.07% [kernel] [k] idr_find 1.73% [kernel] [k] do_shrink_slab 1.42% [kernel] [k] shrink_lruvec 0.74% [kernel] [k] shrink_node 0.60% [kernel] [k] list_lru_count_one After applying: 19.53% [kernel] [k] _find_next_bit 14.63% [kernel] [k] do_shrink_slab 14.58% [kernel] [k] shrink_slab 11.83% [kernel] [k] shrink_lruvec 9.33% [kernel] [k] __blk_flush_plug 6.67% [kernel] [k] mem_cgroup_iter 3.73% [kernel] [k] list_lru_count_one 2.43% [kernel] [k] shrink_node 1.96% [kernel] [k] super_cache_count 1.78% [kernel] [k] __rcu_read_unlock 1.38% [kernel] [k] __srcu_read_lock 1.30% [kernel] [k] xas_descend We can see that the readers is no longer blocked. Signed-off-by: Qi Zheng --- mm/vmscan.c | 56 ++++++++++++++++++++++++++++++----------------------- 1 file changed, 32 insertions(+), 24 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 95a3d6ddc6c1..dc47396ecd0e 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -57,6 +57,7 @@ #include #include #include +#include #include #include @@ -221,8 +222,21 @@ static inline int shrinker_defer_size(int nr_items) static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg, int nid) { - return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info, - lockdep_is_held(&shrinker_rwsem)); + return srcu_dereference_check(memcg->nodeinfo[nid]->shrinker_info, + &shrinker_srcu, + lockdep_is_held(&shrinker_rwsem)); +} + +static struct shrinker_info *shrinker_info_srcu(struct mem_cgroup *memcg, + int nid) +{ + return srcu_dereference(memcg->nodeinfo[nid]->shrinker_info, + &shrinker_srcu); +} + +static void free_shrinker_info_rcu(struct rcu_head *head) +{ + kvfree(container_of(head, struct shrinker_info, rcu)); } static int expand_one_shrinker_info(struct mem_cgroup *memcg, @@ -257,7 +271,7 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg, defer_size - old_defer_size); rcu_assign_pointer(pn->shrinker_info, new); - kvfree_rcu(old, rcu); + call_srcu(&shrinker_srcu, &old->rcu, free_shrinker_info_rcu); } return 0; @@ -350,13 +364,14 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id) { if (shrinker_id >= 0 && memcg && !mem_cgroup_is_root(memcg)) { struct shrinker_info *info; + int srcu_idx; - rcu_read_lock(); - info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info); + srcu_idx = srcu_read_lock(&shrinker_srcu); + info = shrinker_info_srcu(memcg, nid); /* Pairs with smp mb in shrink_slab() */ smp_mb__before_atomic(); set_bit(shrinker_id, info->map); - rcu_read_unlock(); + srcu_read_unlock(&shrinker_srcu, srcu_idx); } } @@ -370,7 +385,6 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker) return -ENOSYS; down_write(&shrinker_rwsem); - /* This may call shrinker, so it must use down_read_trylock() */ id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL); if (id < 0) goto unlock; @@ -404,7 +418,7 @@ static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker, { struct shrinker_info *info; - info = shrinker_info_protected(memcg, nid); + info = shrinker_info_srcu(memcg, nid); return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0); } @@ -413,13 +427,13 @@ static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker, { struct shrinker_info *info; - info = shrinker_info_protected(memcg, nid); + info = shrinker_info_srcu(memcg, nid); return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]); } void reparent_shrinker_deferred(struct mem_cgroup *memcg) { - int i, nid; + int i, nid, srcu_idx; long nr; struct mem_cgroup *parent; struct shrinker_info *child_info, *parent_info; @@ -429,16 +443,16 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg) parent = root_mem_cgroup; /* Prevent from concurrent shrinker_info expand */ - down_read(&shrinker_rwsem); + srcu_idx = srcu_read_lock(&shrinker_srcu); for_each_node(nid) { - child_info = shrinker_info_protected(memcg, nid); - parent_info = shrinker_info_protected(parent, nid); + child_info = shrinker_info_srcu(memcg, nid); + parent_info = shrinker_info_srcu(parent, nid); for (i = 0; i < shrinker_nr_max; i++) { nr = atomic_long_read(&child_info->nr_deferred[i]); atomic_long_add(nr, &parent_info->nr_deferred[i]); } } - up_read(&shrinker_rwsem); + srcu_read_unlock(&shrinker_srcu, srcu_idx); } static bool cgroup_reclaim(struct scan_control *sc) @@ -891,15 +905,14 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, { struct shrinker_info *info; unsigned long ret, freed = 0; + int srcu_idx; int i; if (!mem_cgroup_online(memcg)) return 0; - if (!down_read_trylock(&shrinker_rwsem)) - return 0; - - info = shrinker_info_protected(memcg, nid); + srcu_idx = srcu_read_lock(&shrinker_srcu); + info = shrinker_info_srcu(memcg, nid); if (unlikely(!info)) goto unlock; @@ -949,14 +962,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, set_shrinker_bit(memcg, nid, i); } freed += ret; - - if (rwsem_is_contended(&shrinker_rwsem)) { - freed = freed ? : 1; - break; - } } unlock: - up_read(&shrinker_rwsem); + srcu_read_unlock(&shrinker_srcu, srcu_idx); return freed; } #else /* CONFIG_MEMCG */ -- 2.20.1