Message-ID: <8b2b273a-1df1-5834-9d3a-397ca31f7e87@linux.dev>
Date: Thu, 1 Jun 2023 18:44:37 +0800
Subject: Re: [linus:master] [mm] f95bdb700b: stress-ng.ramfs.ops_per_sec -88.8% regression
To: Kirill Tkhai
Cc: RCU, "paulmck@kernel.org", Yujie Liu, "oe-lkp@lists.linux.dev", "lkp@intel.com",
 "linux-kernel@vger.kernel.org", Andrew Morton, Vlastimil Babka, Roman Gushchin,
 Christian König, David Hildenbrand, Davidlohr Bueso, Johannes Weiner,
 Michal Hocko, Muchun Song, Shakeel Butt, Yang Shi, "linux-mm@kvack.org",
 "ying.huang@intel.com", "feng.tang@intel.com", "fengwei.yin@intel.com",
 "david@fromorbit.com", Christian Brauner
References: <202305230837.db2c233f-yujie.liu@intel.com>
 <896bbb09-d400-ec73-ba3a-b64c6e9bbe46@linux.dev>
 <44407892-b7bc-4d6c-8e4a-6452f0ee88b9@paulmck-laptop>
 <095806f1-f7f0-4914-b04b-c874fb25bb83@paulmck-laptop>
 <26da75a6-115f-a17a-73bb-381a6b4a3a75@linux.dev>
 <67cdf0ed-19fa-ac28-681f-e07691b7587a@linux.dev>
 <932751685611907@mail.yandex.ru>
From: Qi Zheng
In-Reply-To: <932751685611907@mail.yandex.ru>

Hi Kirill,

On 2023/6/1 18:06, Kirill Tkhai wrote:
> 01.06.2023, 11:34, "Qi Zheng":
>>
>> On 2023/6/1 08:57, Kirill Tkhai wrote:
>>> Hi,
>>>
>>> On 30.05.2023 06:07, Qi Zheng wrote:
>>>> Hi,
>>>>
>>>> On 2023/5/29 20:51, Paul E. McKenney wrote:
>>>>> On Mon, May 29, 2023 at 10:39:21AM +0800, Qi Zheng wrote:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> Thanks for such a detailed explanation.
>>>>>>
>>>>>> Now I think we can continue to try to complete the idea[1] from
>>>>>> Kirill Tkhai. The patch moves the heavy synchronize_srcu() to delayed
>>>>>> work, so it doesn't affect the user-visible unregistration speed.
>>>>>>
>>>>>> [1]. https://lore.kernel.org/lkml/153365636747.19074.12610817307548583381.stgit@localhost.localdomain/
>>>>>
>>>>> A blast from the past!  ;-)
>>>>>
>>>>> But yes, moving the long-latency synchronize_srcu() off the user-visible
>>>>> critical code path can be even better.
>>>>
>>>> Yeah, I applied these patches ([PATCH RFC 04/10]~[PATCH RFC 10/10],
>>>> with a few conflicts), and the ops/s does get back to the previous levels.
>>>>
>>>> I'll continue updating this patchset after doing more testing.
>>>
>>> You may also fix the issue using the generic solution below.
>>>
>>> In addition to this we need a patch which calls
>>> unregister_shrinker_delayed_initiate() instead of unregister_shrinker()
>>> in deactivate_locked_super(), and calls
>>> unregister_shrinker_delayed_finalize() from destroy_super_work().
>>> Compilation tested only.
>>>
>>> ---
>>>  include/linux/shrinker.h |  2 ++
>>>  mm/vmscan.c              | 38 ++++++++++++++++++++++++++++++--------
>>>  2 files changed, 32 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
>>> index 224293b2dd06..4ba2986716d3 100644
>>> --- a/include/linux/shrinker.h
>>> +++ b/include/linux/shrinker.h
>>> @@ -4,6 +4,7 @@
>>>
>>>  #include
>>>  #include
>>> +#include <linux/rwsem.h>
>>>
>>>  /*
>>>   * This struct is used to pass information from page reclaim to the shrinkers.
>>> @@ -83,6 +84,7 @@ struct shrinker {
>>>  #endif
>>>  	/* objs pending delete, per node */
>>>  	atomic_long_t *nr_deferred;
>>> +	struct rw_semaphore rwsem;
>>>  };
>>>  #define DEFAULT_SEEKS 2 /* A good number if you don't know better. */
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index eeca83e28c9b..19fc129771ce 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -706,6 +706,7 @@ static int __prealloc_shrinker(struct shrinker *shrinker)
>>>  	if (!shrinker->nr_deferred)
>>>  		return -ENOMEM;
>>>
>>> +	init_rwsem(&shrinker->rwsem);
>>>  	return 0;
>>>  }
>>>
>>> @@ -757,7 +758,9 @@ void register_shrinker_prepared(struct shrinker *shrinker)
>>>  {
>>>  	mutex_lock(&shrinker_mutex);
>>>  	list_add_tail_rcu(&shrinker->list, &shrinker_list);
>>> +	down_write(&shrinker->rwsem);
>>>  	shrinker->flags |= SHRINKER_REGISTERED;
>>> +	up_write(&shrinker->rwsem);
>>>  	shrinker_debugfs_add(shrinker);
>>>  	mutex_unlock(&shrinker_mutex);
>>>  }
>>>
>>> @@ -802,7 +805,7 @@ EXPORT_SYMBOL(register_shrinker);
>>>  /*
>>>   * Remove one
>>>   */
>>> -void unregister_shrinker(struct shrinker *shrinker)
>>> +void unregister_shrinker_delayed_initiate(struct shrinker *shrinker)
>>>  {
>>>  	struct dentry *debugfs_entry;
>>>  	int debugfs_id;
>>>
>>> @@ -812,20 +815,33 @@ void unregister_shrinker(struct shrinker *shrinker)
>>>
>>>  	mutex_lock(&shrinker_mutex);
>>>  	list_del_rcu(&shrinker->list);
>>> +	down_write(&shrinker->rwsem);
>>>  	shrinker->flags &= ~SHRINKER_REGISTERED;
>>> +	up_write(&shrinker->rwsem);
>>>  	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
>>>  		unregister_memcg_shrinker(shrinker);
>>>  	debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
>>>  	mutex_unlock(&shrinker_mutex);
>>>
>>> +	shrinker_debugfs_remove(debugfs_entry, debugfs_id); // This is moved in your patch
>>> +}
>>> +EXPORT_SYMBOL(unregister_shrinker_delayed_initiate);
>>> +
>>> +void unregister_shrinker_delayed_finalize(struct shrinker *shrinker)
>>> +{
>>>  	atomic_inc(&shrinker_srcu_generation);
>>>  	synchronize_srcu(&shrinker_srcu);
>>>
>>> -	shrinker_debugfs_remove(debugfs_entry, debugfs_id);
>>> -
>>>  	kfree(shrinker->nr_deferred);
>>>  	shrinker->nr_deferred = NULL;
>>>  }
>>> +EXPORT_SYMBOL(unregister_shrinker_delayed_finalize);
>>> +
>>> +void unregister_shrinker(struct shrinker *shrinker)
>>> +{
>>> +	unregister_shrinker_delayed_initiate(shrinker);
>>> +	unregister_shrinker_delayed_finalize(shrinker);
>>> +}
>>>  EXPORT_SYMBOL(unregister_shrinker);
>>>
>>>  /**
>>> @@ -856,9 +872,14 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>>>  					  : SHRINK_BATCH;
>>>  	long scanned = 0, next_deferred;
>>>
>>> +	down_read(&shrinker->rwsem);
>>> +	if (!(shrinker->flags & SHRINKER_REGISTERED))
>>> +		goto unlock;
>>>  	freeable = shrinker->count_objects(shrinker, shrinkctl);
>>> -	if (freeable == 0 || freeable == SHRINK_EMPTY)
>>> -		return freeable;
>>> +	if (freeable == 0 || freeable == SHRINK_EMPTY) {
>>> +		freed = freeable;
>>> +		goto unlock;
>>> +	}
>>>
>>>  	/*
>>>  	 * copy the current shrinker scan count into a local variable
>>> @@ -935,6 +956,8 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>>>  	 * manner that handles concurrent updates.
>>>  	 */
>>>  	new_nr = add_nr_deferred(next_deferred, shrinker, shrinkctl);
>>> +unlock:
>>> +	up_read(&shrinker->rwsem);
>>
>> It should be moved after trace_mm_shrink_slab_end().
>
> Could you explain the reason? I don't see the variable it will protect.
We jump to the unlock label before trace_mm_shrink_slab_start() is called,
so I think we should not then call trace_mm_shrink_slab_end().

>>>
>>>  	trace_mm_shrink_slab_end(shrinker, shrinkctl->nid, freed, nr, new_nr, total_scan);
>>>  	return freed;
>>>
>>> @@ -968,9 +991,8 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>>  		struct shrinker *shrinker;
>>>
>>>  		shrinker = idr_find(&shrinker_idr, i);
>>> -		if (unlikely(!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))) {
>>> -			if (!shrinker)
>>> -				clear_bit(i, info->map);
>>> +		if (unlikely(!shrinker)) {
>>> +			clear_bit(i, info->map);
>>>  			continue;
>>>  		}
>>
>> Keep this as a fast path?
>
> Probably, yes. And also we need 1) down_trylock() instead of down_read()
> and 2) rwsem_is_contended() in do_shrink_slab().

Agree. Although the critical section of the writer of shrinker->rwsem is
very short, this prevents unnecessary sleeps.

>> After applying the above patch, the performance regression problem of
>> ops/s can be solved. And it can be guaranteed that the shrinker is not
>> running after unregister_shrinker_delayed_initiate(), so the previous
>> semantics are not broken.
>
> Keeping the old semantics or not is quite subjective, I think. It's possible
> to provide strong arguments for both cases. The first is faster, the second
> is easier for users to adopt. For me personally the faster approach looks
> better.

Agree. I also like the completely lock-less slab shrink.

>> Since the lock granularity of down_read() has changed to per-shrinker
>> granularity, it seems that the down_read() perf hotspot will not be
>> very high. I'm not quite sure why.
>>
>> (The test script used is the script in
>> https://lore.kernel.org/all/20230313112819.38938-4-zhengqi.arch@bytedance.com/ )
>
> Hmm, possibly the original shrinker_rwsem had a lot of atomic contention
> between cpus on down_read(), while with small locks it does not.

I guess so.

> Are the CONFIG_ options for lock debugging the same in the original case
> and in this one?

Basically yes, I will do some more testing.

Thanks,
Qi

>>   25.28% [kernel] [k] do_shrink_slab
>>   21.91% [kernel] [k] pv_native_safe_halt
>>   10.81% [kernel] [k] _find_next_bit
>>   10.47% [kernel] [k] down_read
>>    8.75% [kernel] [k] shrink_slab
>>    4.03% [kernel] [k] up_read
>>    3.29% [kernel] [k] shrink_lruvec
>>    2.75% [kernel] [k] xa_load
>>    2.73% [kernel] [k] mem_cgroup_iter
>>    2.67% [kernel] [k] shrink_node
>>    1.30% [kernel] [k] list_lru_count_one
>
> Thanks,
> Qi

--
Thanks,
Qi
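
To make the superblock wiring Kirill describes above concrete, here is a
minimal, compile-untested sketch of what the caller side could look like.
It is only an assumption of how the split API would be used: the bodies of
deactivate_locked_super() and destroy_super_work() are abbreviated to the
lines relevant to the shrinker, with the rest of the existing fs/super.c
teardown left as comments.

	/*
	 * Sketch only, not a posted patch. Only the two
	 * unregister_shrinker_delayed_*() calls are new; everything else
	 * follows the existing fs/super.c structure.
	 */
	void deactivate_locked_super(struct super_block *s)
	{
		struct file_system_type *fs = s->s_type;

		if (atomic_dec_and_test(&s->s_active)) {
			/*
			 * Fast, user-visible part: unlink the shrinker and
			 * clear SHRINKER_REGISTERED, without waiting for
			 * SRCU readers.
			 */
			unregister_shrinker_delayed_initiate(&s->s_shrink);

			fs->kill_sb(s);
			/* ... existing teardown: put_filesystem(), put_super() ... */
		} else {
			up_write(&s->s_umount);
		}
	}

	static void destroy_super_work(struct work_struct *work)
	{
		struct super_block *s = container_of(work, struct super_block,
						     destroy_work);

		/*
		 * Slow part, off the user-visible path: synchronize_srcu()
		 * and freeing of ->nr_deferred happen here, in a workqueue.
		 */
		unregister_shrinker_delayed_finalize(&s->s_shrink);

		/* ... existing freeing of s_writers and the super_block ... */
		kfree(s);
	}

The point of the split is that only the cheap flag-clearing part stays on
the umount path, while the synchronize_srcu() wait is absorbed by the
already-asynchronous superblock destruction work.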
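
Similarly, a rough sketch (again an assumption, not a tested patch) of how
do_shrink_slab() could use down_read_trylock() plus rwsem_is_contended() on
the per-shrinker rwsem, mirroring how the global shrinker_rwsem was used
before the SRCU conversion; the counting and batching logic is elided:

	static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
					    struct shrinker *shrinker, int priority)
	{
		unsigned long freed = 0;
		unsigned long freeable;

		/* Never sleep on a shrinker that is concurrently being unregistered. */
		if (!down_read_trylock(&shrinker->rwsem))
			return 0;

		if (!(shrinker->flags & SHRINKER_REGISTERED))
			goto unlock;

		freeable = shrinker->count_objects(shrinker, shrinkctl);
		if (freeable == 0 || freeable == SHRINK_EMPTY) {
			freed = freeable;
			goto unlock;
		}

		/*
		 * ... the existing batching loop runs here; inside it, bail
		 * out early when a writer (a delayed unregistration) is
		 * waiting:
		 *
		 *	if (rwsem_is_contended(&shrinker->rwsem))
		 *		break;
		 */

	unlock:
		up_read(&shrinker->rwsem);
		return freed;
	}

With the trylock, a racing reclaimer simply skips the shrinker instead of
sleeping, and the rwsem_is_contended() check keeps the down_write() in the
register/unregister paths from being blocked behind a long scan.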