Date: Sat, 27 May 2023 04:14:05 -0700
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Qi Zheng
Cc: RCU, Yujie Liu, oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org, Andrew Morton, Vlastimil Babka,
	Kirill Tkhai, Roman Gushchin, Christian König, David Hildenbrand,
	Davidlohr Bueso, Johannes Weiner, Michal Hocko, Muchun Song,
	Shakeel Butt, Yang Shi, linux-mm@kvack.org, ying.huang@intel.com,
	feng.tang@intel.com, fengwei.yin@intel.com
Subject: Re: [linus:master] [mm] f95bdb700b: stress-ng.ramfs.ops_per_sec -88.8% regression
Message-ID: <44407892-b7bc-4d6c-8e4a-6452f0ee88b9@paulmck-laptop>
Reply-To: paulmck@kernel.org
References: <202305230837.db2c233f-yujie.liu@intel.com>
 <896bbb09-d400-ec73-ba3a-b64c6e9bbe46@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 25, 2023 at 12:03:16PM +0800, Qi Zheng wrote:
> On 2023/5/24 19:56, Qi Zheng wrote:
> > On 2023/5/24 19:08, Qi Zheng wrote:
> > 
> > [...]
> > 
> > > Well, I just ran the following command and reproduced the result:
> > > 
> > > stress-ng --timeout 60 --times --verify --metrics-brief --ramfs 9 &
> > > 
> > > 1) with commit 42c9db3970483:
> > > 
> > > stress-ng: info:  [11023] setting to a 60 second run per stressor
> > > stress-ng: info:  [11023] dispatching hogs: 9 ramfs
> > > stress-ng: info:  [11023] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
> > > stress-ng: info:  [11023]                          (secs)     (secs)    (secs)   (real time) (usr+sys time)
> > > stress-ng: info:  [11023] ramfs            774966     60.00     10.18    169.45     12915.89        4314.26
> > > stress-ng: info:  [11023] for a 60.00s run time:
> > > stress-ng: info:  [11023]    1920.11s available CPU time
> > > stress-ng: info:  [11023]      10.18s user time   (  0.53%)
> > > stress-ng: info:  [11023]     169.44s system time (  8.82%)
> > > stress-ng: info:  [11023]     179.62s total time  (  9.35%)
> > > stress-ng: info:  [11023] load average: 8.99 2.69 0.93
> > > stress-ng: info:  [11023] successful run completed in 60.00s (1 min, 0.00 secs)
> > > 
> > > 2) with commit f95bdb700bc6b:
> > > 
> > > stress-ng: info:  [37676] dispatching hogs: 9 ramfs
> > > stress-ng: info:  [37676] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
> > > stress-ng: info:  [37676]                          (secs)     (secs)    (secs)   (real time) (usr+sys time)
> > > stress-ng: info:  [37676] ramfs            168673     60.00      1.61     39.66      2811.08        4087.47
> > > stress-ng: info:  [37676] for a 60.10s run time:
> > > stress-ng: info:  [37676]    1923.36s available CPU time
> > > stress-ng: info:  [37676]       1.60s user time   (  0.08%)
> > > stress-ng: info:  [37676]      39.66s system time (  2.06%)
> > > stress-ng: info:  [37676]      41.26s total time  (  2.15%)
> > > stress-ng: info:  [37676] load average: 7.69 3.63 2.36
> > > stress-ng: info:  [37676] successful run completed in 60.10s (1 min, 0.10 secs)
> > > 
> > > The bogo ops/s (real time) did drop significantly.
> > > 
> > > And the memory reclamation was not triggered in the whole process,
> > > so theoretically no one is in the read critical section of
> > > shrinker_srcu.
> > > 
> > > Then I found that some stress-ng-ramfs processes were in
> > > TASK_UNINTERRUPTIBLE state for a long time:
> > > 
> > > root       42313  0.0  0.0  69592  2068 pts/0    S    19:00   0:00 stress-ng-ramfs [run]
> > > root       42314  0.0  0.0  69592  2068 pts/0    S    19:00   0:00 stress-ng-ramfs [run]
> > > root       42315  0.0  0.0  69592  2068 pts/0    S    19:00   0:00 stress-ng-ramfs [run]
> > > root       42316  0.0  0.0  69592  2068 pts/0    S    19:00   0:00 stress-ng-ramfs [run]
> > > root       42317  7.8  0.0  69592  1812 pts/0    D    19:00   0:02 stress-ng-ramfs [run]
> > > root       42318  0.0  0.0  69592  2068 pts/0    S    19:00   0:00 stress-ng-ramfs [run]
> > > root       42319  7.8  0.0  69592  1812 pts/0    D    19:00   0:02 stress-ng-ramfs [run]
> > > root       42320  0.0  0.0  69592  2068 pts/0    S    19:00   0:00 stress-ng-ramfs [run]
> > > root       42321  7.8  0.0  69592  1812 pts/0    D    19:00   0:02 stress-ng-ramfs [run]
> > > root       42322  0.0  0.0  69592  2068 pts/0    S    19:00   0:00 stress-ng-ramfs [run]
> > > root       42323  7.8  0.0  69592  1812 pts/0    D    19:00   0:02 stress-ng-ramfs [run]
> > > root       42324  0.0  0.0  69592  2068 pts/0    S    19:00   0:00 stress-ng-ramfs [run]
> > > root       42325  7.8  0.0  69592  1812 pts/0    D    19:00   0:02 stress-ng-ramfs [run]
> > > root       42326  0.0  0.0  69592  2068 pts/0    S    19:00   0:00 stress-ng-ramfs [run]
> > > root       42327  7.9  0.0  69592  1812 pts/0    D    19:00   0:02 stress-ng-ramfs [run]
> > > root       42328  7.9  0.0  69592  1812 pts/0    D    19:00   0:02 stress-ng-ramfs [run]
> > > root       42329  7.9  0.0  69592  1812 pts/0    D    19:00   0:02 stress-ng-ramfs [run]
> > > root       42330  7.9  0.0  69592  1556 pts/0    D    19:00
0:02 stress-ng-ramfs [run]
> > > 
> > > Their call stack is as follows:
> > > 
> > > cat /proc/42330/stack
> > > 
> > > [<0>] __synchronize_srcu.part.21+0x83/0xb0
> > > [<0>] unregister_shrinker+0x85/0xb0
> > > [<0>] deactivate_locked_super+0x27/0x70
> > > [<0>] cleanup_mnt+0xb8/0x140
> > > [<0>] task_work_run+0x65/0x90
> > > [<0>] exit_to_user_mode_prepare+0x1ba/0x1c0
> > > [<0>] syscall_exit_to_user_mode+0x1b/0x40
> > > [<0>] do_syscall_64+0x44/0x80
> > > [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > 
> > > + RCU folks, Is this result as expected? I would have thought that
> > > synchronize_srcu() should return quickly if no one is in the read
> > > critical section. :(

In theory, it would indeed be nice if synchronize_srcu() would do that.
In practice, the act of checking to see if there is anyone in an SRCU
read-side critical section is a heavy-weight operation, involving at
least one cache miss per CPU along with a number of full memory
barriers.  So SRCU has to be careful not to check too frequently.

However, if SRCU has been idle for some time, normal synchronize_srcu()
will do an immediate check.  And this will of course mark SRCU as
non-idle.

> > With the following changes, ops/s can return to previous levels:
> 
> Or just set rcu_expedited to 1:
> 
> echo 1 > /sys/kernel/rcu_expedited

This does cause SRCU to be much more aggressive.  This can be a good
choice for small systems, but please keep in mind that this affects
normal RCU as well as SRCU.  It will cause RCU to also be much more
aggressive, sending IPIs to CPUs that are (or might be) in RCU
read-side critical sections.  Depending on your workload, this might
or might not be what you want RCU to be doing.  For example, if you
are running aggressive real-time workloads, it most definitely is not
what you want.
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index db2ed6e08f67..90f541b07cd1 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -763,7 +763,7 @@ void unregister_shrinker(struct shrinker *shrinker)
> >  	debugfs_entry = shrinker_debugfs_remove(shrinker);
> >  	up_write(&shrinker_rwsem);
> >  
> > -	synchronize_srcu(&shrinker_srcu);
> > +	synchronize_srcu_expedited(&shrinker_srcu);

If shrinkers are unregistered only occasionally, this is an entirely
reasonable change.

> >  	debugfs_remove_recursive(debugfs_entry);
> > 
> > stress-ng: info:  [13159] dispatching hogs: 9 ramfs
> > stress-ng: info:  [13159] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
> > stress-ng: info:  [13159]                          (secs)     (secs)    (secs)   (real time) (usr+sys time)
> > stress-ng: info:  [13159] ramfs            710062     60.00      9.63    157.26     11834.18        4254.75
> > stress-ng: info:  [13159] for a 60.00s run time:
> > stress-ng: info:  [13159]    1920.14s available CPU time
> > stress-ng: info:  [13159]       9.62s user time   (  0.50%)
> > stress-ng: info:  [13159]     157.26s system time (  8.19%)
> > stress-ng: info:  [13159]     166.88s total time  (  8.69%)
> > stress-ng: info:  [13159] load average: 9.49 4.02 1.65
> > stress-ng: info:  [13159] successful run completed in 60.00s (1 min, 0.00 secs)
> 
> > Can we make synchronize_srcu() call synchronize_srcu_expedited() when
> > no one is in the read critical section?

Yes, in theory we could, but this would be a bad thing in practice.
After all, the point of having synchronize_srcu() be separate from
synchronize_srcu_expedited() is to allow uses that are OK with longer
latency to avoid consuming too much CPU.

In addition, that longer SRCU grace-period latency allows the next
grace period to handle more synchronize_srcu() and call_srcu()
requests.
This amortizes the overhead of that next grace period over a larger
number of updates.

However, your use of synchronize_srcu_expedited() does have that
effect, but only for this call point, which has the advantage of
avoiding burning excessive quantities of CPU for the other 50+ call
points.

							Thanx, Paul