Date: Mon, 17 Oct 2022 16:54:11 +0200
From: Greg Kroah-Hartman
To: Peter Zijlstra
Cc: Vishal Chourasia, linux-kernel@vger.kernel.org, mingo@redhat.com,
    vincent.guittot@linaro.org, vschneid@redhat.com,
    srikar@linux.vnet.ibm.com, sshegde@linux.ibm.com
Subject: Re: sched/debug: CPU hotplug operation suffers in a large cpu systems

On Mon, Oct 17, 2022 at 04:19:31PM +0200, Peter Zijlstra wrote:
>
> +GregKH who actually knows about debugfs.
>
> On Mon, Oct 17, 2022 at 06:40:49PM +0530, Vishal Chourasia wrote:
> > The smt=off operation on a system with 1920 CPUs takes approx. 58 mins on
> > v5.14 versus 29 mins on v5.11, measured using:
> > # time ppc64_cpu --smt=off
> >
> > A git bisect between kernel v5.11 and v5.14 pointed to commit
> > 3b87f136f8fc ("sched,debug: Convert sysctl sched_domains to debugfs"). This
> > commit moves sched_domain information that was originally exported using
> > sysctl to debugfs.
> >
> > Reverting that commit gives us the expected good result.
> >
> > Previously, sched domain information was exported via procfs (sysctl) at
> > /proc/sys/kernel/sched_domain/, but now it is exported via debugfs at
> > /sys/kernel/debug/sched/domains/.
> >
> > We also observe the regression in kernel v6.0-rc4; it vanishes after
> > reverting commit 3b87f136f8fc.
> >
> > # Output of `time ppc64_cpu --smt=off` on different kernel versions
> > |-------------------------------------+------------+----------+----------|
> > | kernel version                      | real       | user     | sys      |
> > |-------------------------------------+------------+----------+----------|
> > | v5.11                               | 29m22.007s | 0m0.001s | 0m6.444s |
> > | v5.14                               | 58m15.719s | 0m0.037s | 0m7.482s |
> > | v6.0-rc4                            | 59m30.318s | 0m0.055s | 0m7.681s |
> > | v6.0-rc4 with 3b87f136f8fc reverted | 32m20.486s | 0m0.029s | 0m7.361s |
> > |-------------------------------------+------------+----------+----------|
> >
> > A machine with 1920 CPUs was used for the above experiments. The output of
> > lscpu is below.
> >
> > # lscpu
> > Architecture:          ppc64le
> > Byte Order:            Little Endian
> > CPU(s):                1920
> > On-line CPU(s) list:   0-1919
> > Model name:            POWER10 (architected), altivec supported
> > Model:                 2.0 (pvr 0080 0200)
> > Thread(s) per core:    8
> > Core(s) per socket:    14
> > Socket(s):             17
> > Physical sockets:      15
> > Physical chips:        1
> > Physical cores/chip:   16
> >
> > Our experiments show that even when offlining a single CPU, the functions
> > responsible for exporting sched_domain information take more time with
> > debugfs than with sysctl.
> >
> > Experiments using the trace-cmd function_graph plugin show that the
> > execution times of certain functions common to both scenarios (procfs and
> > debugfs) differ drastically.
> >
> > The table below lists the execution time of some of these functions for the
> > sysctl (procfs) and debugfs cases.
> >
> > |--------------------------------+----------------+--------------|
> > | method                         | sysctl         | debugfs      |
> > |--------------------------------+----------------+--------------|
> > | unregister_sysctl_table        | 0.020050 s     | NA           |
> > | build_sched_domains            | 3.090563 s     | 3.119130 s   |
> > | register_sched_domain_sysctl   | 0.065487 s     | NA           |
> > | update_sched_domain_debugfs    | NA             | 2.791232 s   |
> > | partition_sched_domains_locked | 3.195958 s     | 5.933254 s   |
> > |--------------------------------+----------------+--------------|
> >
> > Note: partition_sched_domains_locked internally calls build_sched_domains
> > and then calls the functions specific to whichever mechanism is currently
> > used to export the information, i.e. sysctl or debugfs.
> >
> > The above numbers are from offlining 1 CPU on a system with 1920 online
> > CPUs.
> >
> > From the above table, register_sched_domain_sysctl and
> > unregister_sysctl_table collectively took ~0.086 s, whereas
> > update_sched_domain_debugfs took ~2.79 s.
> >
> > Root cause:
> >
> > The observed regression stems from the way these two pseudo-filesystems
> > handle the creation and deletion of files and directories internally.

Yes, debugfs is not optimized for speed or memory usage at all. This
happens to be the first code path I have seen that cares about this for
debugfs files.

You can either work on not creating so many debugfs files (do you really
really need all of them all the time?) Or you can work on moving debugfs to
use kernfs as the backend logic, which will save you both speed and memory
usage overall as kernfs is used to being used on semi-fast paths.

Maybe do both?

hope this helps,

greg k-h