Received: by 2002:a05:6358:45e:b0:b5:b6eb:e1f9 with SMTP id 30csp826970rwe; Thu, 25 Aug 2022 09:45:57 -0700 (PDT) X-Google-Smtp-Source: AA6agR5so2BlGlbxLXAomtfMLVTB2V7fac1KoZiAWqXrnwBTwys1T18I8Zipm6xemyUnI324lFnL X-Received: by 2002:a05:6402:40c9:b0:446:e1dc:5c3d with SMTP id z9-20020a05640240c900b00446e1dc5c3dmr3749297edb.270.1661445957764; Thu, 25 Aug 2022 09:45:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1661445957; cv=none; d=google.com; s=arc-20160816; b=I70gBHvaoxWbIbqZ74QD3RnfaNc0qAhLxp/iPlba7IzmibjlwAPJNKJcaTVL668STi AWY7ftfGUu9U5rQn1kVEBkv2TsvHDR50cUp0E2Mb9glOv08qmcVOJjA64DZ441TDSWjU iRBBNlJCkUwVsbxUtAAh2HsxJd5Oc6bEbANaVC3jN13cD3Pl3K57PbDcj9H7fOqqgAST 4eJM+1aIkNfmta5ND9PLLH4dGC5fzuo828Kyh6sdnxL5KHmOU8/cAPjSx6mnjbfTMdpl HfEO4xBPLxDXgwhPCiRsSGge9Iy7GUJQUdqNCFyc45RkK2Pd/tjZ3+H8WyfGbvLOCuxE E8lw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=OtUK9OLMWdL6AUc3ZfpyXK1kgh1agdbXcj5hc+979G8=; b=zo0OdVwf7nAqK1YDYpwkLbrYA6TcgMOEL3fcWwmn2F+mAksT0B9XIXuRFGXcffefkv crspwKwMceSF+XQhbeeRgkw12dvhhOL+3MPgIzHadnJoPBxYaKUHgNozr0fbUk82bAVe dWxvdxTXUz0bAKHYdrpHzj33RlYEpcvWNQBynrGnHslzvuL8tx5bAXmfb5OotuXUKUhc 3UP77RrBqEbtpvxgm1ugjaUha1bJA1q8wFQldS6D5NodGvOyuQY6+tjRxQbaA3alSXFv gdG6hG2oUzglPTd/x+j1oaJVMmdA6tbvX54m6U51qwfIQTQ6+0CHabpkcSWqKeDAWCbn C+Ow== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=xHJiXgBT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id f18-20020a0564021e9200b0043adeeba898si7745388edf.283.2022.08.25.09.45.30; Thu, 25 Aug 2022 09:45:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=xHJiXgBT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243276AbiHYQnd (ORCPT + 99 others); Thu, 25 Aug 2022 12:43:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43542 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243117AbiHYQnE (ORCPT ); Thu, 25 Aug 2022 12:43:04 -0400 Received: from mail-pj1-x1035.google.com (mail-pj1-x1035.google.com [IPv6:2607:f8b0:4864:20::1035]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6440EBB928 for ; Thu, 25 Aug 2022 09:42:35 -0700 (PDT) Received: by mail-pj1-x1035.google.com with SMTP id r15-20020a17090a1bcf00b001fabf42a11cso5663629pjr.3 for ; Thu, 25 Aug 2022 09:42:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc; bh=OtUK9OLMWdL6AUc3ZfpyXK1kgh1agdbXcj5hc+979G8=; b=xHJiXgBT7PEkZdMevyBPSnrJlZZCDwl6ozciskiNf9lpSMP/BhlEA/uOVlntwEOMoP 8AsYSYGjz+lts4hRawuyJ6l6j1hjlPAVOBqnpGUKXbtEw9ZdqEkFmWYzCoRhk5hEVMqd LookuDXfD+8p7qtuCBQ2Us57jbxcmrBlqm9sBNsMnW+PY5fnbQg7FmU8Pv54Ao1pZ+jO Qlv7oV81DA9EYTvqlGNwLjuuhXOAJPyHxTpLFABwb7wAi1ZGz+PtCwoQyKk7qJ6TBCwf YrTIDzukpS+oc/W+qzGWsltS5E3XkCXyR6vIOWTqg9xpCYf+3ygIVs+5KfRLmrQe7Qwx altQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc; bh=OtUK9OLMWdL6AUc3ZfpyXK1kgh1agdbXcj5hc+979G8=; b=EF5g+YnWAMzWHn2rcDXyDe7+SDI/NCvAMLitPzsWRsC3nL71vgorFmZ1gotOrD7Q5D kzIC6qzN9mjO/1WDdsVZpHrjNsWIllBCeZc8Ewo79eh0IjQSs/JpSTVvPox47EWmgDch y6xCGH46j/t6D+8nmEv2dLLkWR7Ppm0MGqImQQQtH/fj+EBKw5XaGBk8nhVQbmVCrYEk 8cvnOGXjs9HLkOyKR1d6hOVz3pvihxExCDY8HLA2XFUBMoAZn+FNKpd9qpxEhY8J16Tf cNMXiXC0uKfFkLXtBSw/uZePIdLuKsQCJohToJQY/6yBCgo3tonRaHJePH7NeBDdnmQ/ YHXA== X-Gm-Message-State: ACgBeo1Xtd7CGwld6RKMckvjIkwuzJ/hmEVvUVicHcUvwxZDodnA1Xor 7HYqLQd6UfrgGbcAoNGsEBL1Ig== X-Received: by 2002:a17:902:f70f:b0:171:2572:4f5e with SMTP id h15-20020a170902f70f00b0017125724f5emr4422269plo.40.1661445754247; Thu, 25 Aug 2022 09:42:34 -0700 (PDT) Received: from localhost.localdomain ([139.177.225.236]) by smtp.gmail.com with ESMTPSA id b18-20020a62a112000000b005362314bf80sm12779408pff.67.2022.08.25.09.42.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 25 Aug 2022 09:42:33 -0700 (PDT) From: Chengming Zhou To: hannes@cmpxchg.org, tj@kernel.org, mkoutny@suse.com, surenb@google.com Cc: mingo@redhat.com, peterz@infradead.org, gregkh@linuxfoundation.org, corbet@lwn.net, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, songmuchun@bytedance.com, Chengming Zhou Subject: [PATCH v4 00/10] sched/psi: some optimizations and extensions Date: Fri, 26 Aug 2022 00:41:01 +0800 Message-Id: <20220825164111.29534-1-zhouchengming@bytedance.com> X-Mailer: git-send-email 2.35.1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi all, This patch series are some optimizations and extensions for PSI. patch 1/10 fix periodic aggregation shut off problem introduced by earlier commit 4117cebf1a9f ("psi: Optimize task switch inside shared cgroups"). patch 2-4 are some misc optimizations, so put them in front of this series. patch 5/10 optimize task switch inside shared cgroups when in_memstall status of prev task and next task are different. patch 6/10 remove NR_ONCPU task accounting to save 4 bytes in the first cacheline to be used by the following patch 7/10, which introduce new PSI resource PSI_IRQ to track IRQ/SOFTIRQ pressure stall information. patch 8-9 cache parent psi_group in struct psi_group to speed up the hot iteration path. patch 10/10 introduce a per-cgroup interface "cgroup.pressure" to disable or re-enable PSI in the cgroup level, and we implement hiding and unhiding the pressure files per Tejun's suggestion[1], which depends on his work[2]. [1] https://lore.kernel.org/all/YvqjhqJQi2J8RG3X@slm.duckdns.org/ [2] https://lore.kernel.org/all/20220820000550.367085-1-tj@kernel.org/ Performance test using mmtests/config-scheduler-perfpipe in /user.slice/user-0.slice/session-4.scope: next patched patched/only-leaf Min Time 8.82 ( 0.00%) 8.49 ( 3.74%) 8.00 ( 9.32%) 1st-qrtle Time 8.90 ( 0.00%) 8.58 ( 3.63%) 8.05 ( 9.58%) 2nd-qrtle Time 8.94 ( 0.00%) 8.61 ( 3.65%) 8.09 ( 9.50%) 3rd-qrtle Time 8.99 ( 0.00%) 8.65 ( 3.75%) 8.15 ( 9.35%) Max-1 Time 8.82 ( 0.00%) 8.49 ( 3.74%) 8.00 ( 9.32%) Max-5 Time 8.82 ( 0.00%) 8.49 ( 3.74%) 8.00 ( 9.32%) Max-10 Time 8.84 ( 0.00%) 8.55 ( 3.20%) 8.04 ( 9.05%) Max-90 Time 9.04 ( 0.00%) 8.67 ( 4.10%) 8.18 ( 9.51%) Max-95 Time 9.04 ( 0.00%) 8.68 ( 4.03%) 8.20 ( 9.26%) Max-99 Time 9.07 ( 0.00%) 8.73 ( 3.82%) 8.25 ( 9.11%) Max Time 9.12 ( 0.00%) 8.89 ( 2.54%) 8.27 ( 9.29%) Amean Time 8.95 ( 0.00%) 8.62 * 3.67%* 8.11 * 9.43%* Big thanks to Johannes Weiner, Tejun Heo and Michal Koutný for your suggestions and review! Changes in v4: - Collect Acked-by tags from Johannes Weiner. - Add many clear comments and changelogs per Johannes Weiner. - Replace for_each_psi_group() with better open-code. - Change to use better names cgroup_pressure_show() and cgroup_pressure_write(). - Change to use better name psi_cgroup_restart() and only call it on enabling. Changes in v3: - Rebase on linux-next and reorder patches to put misc optimizations patches in the front of this series. - Drop patch "sched/psi: don't change task psi_flags when migrate CPU/group" since it caused a little performance regression and it's just code refactoring, so drop it. - Don't define PSI_IRQ and PSI_IRQ_FULL when !CONFIG_IRQ_TIME_ACCOUNTING, in which case they are not used. - Add patch 8/10 "sched/psi: consolidate cgroup_psi()" make cgroup_psi() can handle all cgroups including root cgroup, make patch 9/10 simpler. - Rename interface to "cgroup.pressure" and add some explanation per Michal's suggestion. - Hide and unhide pressure files when disable/re-enable cgroup PSI, depends on Tejun's work. Changes in v2: - Add Acked-by tags from Johannes Weiner. Thanks for review! - Fix periodic aggregation wakeup for common ancestors in psi_task_switch(). - Add patch 7/10 from Johannes Weiner, which remove NR_ONCPU task accounting to save 4 bytes in the first cacheline. - Remove "psi_irq=" kernel cmdline parameter in last version. - Add per-cgroup interface "cgroup.psi" to disable/re-enable PSI stats accounting in the cgroup level. Chengming Zhou (9): sched/psi: fix periodic aggregation shut off sched/psi: don't create cgroup PSI files when psi_disabled sched/psi: save percpu memory when !psi_cgroups_enabled sched/psi: move private helpers to sched/stats.h sched/psi: optimize task switch inside shared cgroups again sched/psi: add PSI_IRQ to track IRQ/SOFTIRQ pressure sched/psi: consolidate cgroup_psi() sched/psi: cache parent psi_group to speed up groups iterate sched/psi: per-cgroup PSI accounting disable/re-enable interface Johannes Weiner (1): sched/psi: remove NR_ONCPU task accounting Documentation/admin-guide/cgroup-v2.rst | 23 ++ include/linux/cgroup-defs.h | 3 + include/linux/cgroup.h | 5 - include/linux/psi.h | 12 +- include/linux/psi_types.h | 29 ++- kernel/cgroup/cgroup.c | 106 ++++++++- kernel/sched/core.c | 1 + kernel/sched/psi.c | 280 +++++++++++++++++------- kernel/sched/stats.h | 6 + 9 files changed, 362 insertions(+), 103 deletions(-) -- 2.37.2