Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp906750ybl; Tue, 28 Jan 2020 14:37:15 -0800 (PST) X-Google-Smtp-Source: APXvYqykIeTUapwJcl2bDHt+KiW32/iflnv43j3mFg38IrfFgqPoY2ZA+w6ImoC4MWMQARm1BJpT X-Received: by 2002:a54:4182:: with SMTP id 2mr4387855oiy.14.1580251035776; Tue, 28 Jan 2020 14:37:15 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1580251035; cv=none; d=google.com; s=arc-20160816; b=K7BrbNVt0cGJ+6wlA3BYmo7DDLL0vSJ3DpzowD4zueWFrQzHGiKPw+CI5qryvxcqQ1 91acjD19drM3rs1hUbV0FbLkbh5w6FEmAknFJTti64b+Mk8QoBqLxe8n/usPo+N1vTUv MqX1uynMfuVBXcQYszIBU951DPnV1fy09sOeMJMHDmT8cEuqLJWoVqMPkEnPwU0iqxeE INEH8jSUCbtWw5vzUgXXfirZA8RTAn5T2uqWZ9c/hFlulkb+BH35aJXtjRlsN7EUADik x1F2HLINun7QraoVZMymV09OM7+hAjmDdlYwu4e7TMDavVYrtGNSNSE7DH+9W65EePTF rPvg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:date:subject:cc:to:from :dkim-signature; bh=r2imr4Qyz3++THwAeaYqJluRAXMOC9erDmyCBduis5c=; b=dTvHflTA5qiSl1486jctP2A8j3Cv7f1KIhhOW74XmSydpy748FKhGMGnMfCj+LfkzV uZU8KEaIPRJFsJl7/LGXHA3l7Ubtf3b8P48I0NAwxj5MKAkDO8qhc2KYcm1gNc5Bh7Ji 8jpV3S8JslreOp+J0WZT6g0w/Vrpk4a8H0/GzyYptt2JRuoRDC6/d5MoNqHZkCzWuKD9 IVGvRVoiRZsVh7m0R+gUr8cfnNp8APgbrnKJrEbqi9iW2MBVb/tDIoFhEdMfZ9kaSP/Z CYDTyc4JpChvVGeyDtxmjdP53+eagw6QJdTG525Qz7S9QG4jT7BB7ZmcLre/r4QQNfXt JqRA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=wGUZlBr7; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v25si49035ote.90.2020.01.28.14.37.03; Tue, 28 Jan 2020 14:37:15 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=wGUZlBr7; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726340AbgA1WgL (ORCPT + 99 others); Tue, 28 Jan 2020 17:36:11 -0500 Received: from mail-qv1-f52.google.com ([209.85.219.52]:39087 "EHLO mail-qv1-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726257AbgA1WgL (ORCPT ); Tue, 28 Jan 2020 17:36:11 -0500 Received: by mail-qv1-f52.google.com with SMTP id y8so7083943qvk.6 for ; Tue, 28 Jan 2020 14:36:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id; bh=r2imr4Qyz3++THwAeaYqJluRAXMOC9erDmyCBduis5c=; b=wGUZlBr7/yFnTgYgm9XDri4uB23r/tplpEJDnq6xOZS3dnYD3bPB5/xFnWTlBQqf9S UYHQ2UGpLCKg/A7V4iqt6DfaMAO6xuKGxmqe6bHpqnIzSEd7pgUOdHpKJpteaGnofR0j PcnShkecUCtyFxApNIw51m93ULH+WVrHN3mY0MkA2WK/gnNPmKlg5MCjMf7oTQJ22cp2 AFEmzlEWos9DbuzvDTDDY59G3SXk3wuIrK+ssX/3ZJyEUYOClGiGoOM7ESZGkXXESGWk OgfgbQu8K/3xv2hsYygs73BZ4xlTCSznskDcAIGIE72p8yxBTTvHqd4uhxRhMNd7MWlH VWlw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=r2imr4Qyz3++THwAeaYqJluRAXMOC9erDmyCBduis5c=; b=DWYHx/gC9FLSuLG5nlrtAU/LoMIYGcUg6UBGt0ztFsikc6G05coDR0B6AAKWFhRPJG UCoZjen2Br7HDugNLZqQ+aUlqLgkCccJFZZiZuxiOhdD6UlRUD8DAXbtlXk2Yrwlco6O AC3Lodq4V6DtivVXLOpM082NqJ1Q4NwswrlOSm28ayv8G9lnw4pdQ4IW/WgtoGvu/m2y UlRVedGu/kqbmTdMaI7y7Tw8fWrKxYFimSrYXwKbk/AGJ6EsbFyCsuQ/lsteA3FugY+2 9k74POxliz/ShOx0If+7Co5SPqarDNBhU6EtEbU9Sdv9+J4j8EUzxo+jZGuY0oRRPKz+ Ol9A== X-Gm-Message-State: APjAAAWnE96RAqWecAXcMCC1pdeSbvVTdOeX1YKAaijeK1LI7y864S6n 7vlDiP0TzOX79wV6hDlzcDkbrw== X-Received: by 2002:a0c:e58a:: with SMTP id t10mr25030566qvm.161.1580250969731; Tue, 28 Jan 2020 14:36:09 -0800 (PST) Received: from Thara-Work-Ubuntu.fios-router.home (pool-71-255-246-27.washdc.fios.verizon.net. [71.255.246.27]) by smtp.googlemail.com with ESMTPSA id 124sm13014259qko.11.2020.01.28.14.36.08 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Tue, 28 Jan 2020 14:36:09 -0800 (PST) From: Thara Gopinath To: mingo@redhat.com, peterz@infradead.org, ionela.voinescu@arm.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rui.zhang@intel.com, qperret@google.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, rostedt@goodmis.org, will@kernel.org, catalin.marinas@arm.com, sudeep.holla@arm.com, juri.lelli@redhat.com, corbet@lwn.net Cc: linux-kernel@vger.kernel.org, amit.kachhap@gmail.com, javi.merino@kernel.org, amit.kucheria@verdurent.com Subject: [Patch v9 0/8] Introduce Thermal Pressure Date: Tue, 28 Jan 2020 17:35:59 -0500 Message-Id: <1580250967-4386-1-git-send-email-thara.gopinath@linaro.org> X-Mailer: git-send-email 2.1.4 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Thermal governors can respond to an overheat event of a cpu by capping the cpu's maximum possible frequency. This in turn means that the maximum available compute capacity of the cpu is restricted. But today in the kernel, task scheduler is not notified of capping of maximum frequency of a cpu. In other words, scheduler is unaware of maximum capacity restrictions placed on a cpu due to thermal activity. This patch series attempts to address this issue. The benefits identified are better task placement among available cpus in event of overheating which in turn leads to better performance numbers. The reduction in the maximum possible capacity of a cpu due to a thermal event can be considered as thermal pressure. Instantaneous thermal pressure is hard to record and can sometime be erroneous as there can be mismatch between the actual capping of capacity and scheduler recording it. Thus solution is to have a weighted average per cpu value for thermal pressure over time. The weight reflects the amount of time the cpu has spent at a capped maximum frequency. Since thermal pressure is recorded as an average, it must be decayed periodically. Exisiting algorithm in the kernel scheduler pelt framework is re-used to calculate the weighted average. This patch series also defines a sysctl inerface to allow for a configurable decay period. Regarding testing, basic build, boot and sanity testing have been performed on db845c platform with debian file system. Further, dhrystone and hackbench tests have been run with the thermal pressure algorithm. During testing, due to constraints of step wise governor in dealing with big little systems, trip point 0 temperature was made assymetric between cpus in little cluster and big cluster; the idea being that big core will heat up and cpu cooling device will throttle the frequency of the big cores faster, there by limiting the maximum available capacity and the scheduler will spread out tasks to little cores as well. Test Results Hackbench: 1 group , 30000 loops, 10 runs Result SD (Secs) (% of mean) No Thermal Pressure 14.03 2.69% Thermal Pressure PELT Algo. Decay : 32 ms 13.29 0.56% Thermal Pressure PELT Algo. Decay : 64 ms 12.57 1.56% Thermal Pressure PELT Algo. Decay : 128 ms 12.71 1.04% Thermal Pressure PELT Algo. Decay : 256 ms 12.29 1.42% Thermal Pressure PELT Algo. Decay : 512 ms 12.42 1.15% Dhrystone Run Time : 20 threads, 3000 MLOOPS Result SD (Secs) (% of mean) No Thermal Pressure 9.452 4.49% Thermal Pressure PELT Algo. Decay : 32 ms 8.793 5.30% Thermal Pressure PELT Algo. Decay : 64 ms 8.981 5.29% Thermal Pressure PELT Algo. Decay : 128 ms 8.647 6.62% Thermal Pressure PELT Algo. Decay : 256 ms 8.774 6.45% Thermal Pressure PELT Algo. Decay : 512 ms 8.603 5.41% A Brief History The first version of this patch-series was posted with resuing PELT algorithm to decay thermal pressure signal. The discussions that followed were around whether intanteneous thermal pressure solution is better and whether a stand-alone algortihm to accumulate and decay thermal pressure is more appropriate than re-using the PELT framework. Tests on Hikey960 showed the stand-alone algorithm performing slightly better than resuing PELT algorithm and V2 was posted with the stand alone algorithm. Test results were shared as part of this series. Discussions were around re-using PELT algorithm and running further tests with more granular decay period. For some time after this development was impeded due to hardware unavailability, some other unforseen and possibly unfortunate events. For this version, h/w was switched from hikey960 to db845c. Also Instantaneous thermal pressure was never tested as part of this cycle as it is clear that weighted average is a better implementation. The non-PELT algorithm never gave any conclusive results to prove that it is better than reusing PELT algorithm, in this round of testing. Also reusing PELT algorithm means thermal pressure tracks the other utilization signals in the scheduler. v3->v4: - "Patch 3/7:sched: Initialize per cpu thermal pressure structure" is dropped as it is no longer needed following changes in other other patches. - rest of the change log mentioned in specific patches. v5->v6: - "Added arch_ interface APIs to access and update thermal pressure. Moved declaration of per cpu thermal_pressure valriable and infrastructure to update the variable to topology files. v6->v7: - Added CONFIG_HAVE_SCHED_THERMAL_PRESSURE to stub out update_thermal_load_avg in unsupported architectures as per review comments from Peter, Dietmar and Quentin. - Renamed arch_scale_thermal_capacity to arch_cpu_thermal_pressure as per review comments from Peter, Dietmar and Ionela. - Changed the input argument in arch_set_thermal_pressure from capped capacity to delta capacity(thermal pressure) as per Ionela's review comments. Hence the calculation for delta capacity(thermal pressure) is moved to cpufreq_cooling.c. - Fixed a bunch of spelling typos. v7->v8: - Fixed typo in defining update_thermal_load_avg which was causing build errors (reported by kbuild test report) v8->v9: - Defined thermal_load_avg to read rq->avg_thermal.load_avg and avoid cacheline miss in unsupported cases as per Peter's suggestion. - Moved periodic triggering of thermal pressure averaging from CFS tick function to generic scheduler core tick function. - Moved rq_clock_thermal from fair.c to sched.h to enable using the function from multiple files. - Initialized the __shift to 0 in setup_sched_thermal_decay_shift as per Quentin's suggestion - Added an extra patch enabling CONFIG_HAVE_SCHED_THERMAL_PRESSURE as per Dietmar's request. Thara Gopinath (8): sched/pelt: Add support to track thermal pressure sched/topology: Add hook to read per cpu thermal pressure. arm,arm64,drivers:Add infrastructure to store and update instantaneous thermal pressure sched/fair: Enable periodic update of average thermal pressure sched/fair: update cpu_capacity to reflect thermal pressure thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping sched/fair: Enable tuning of decay period arm64: Enable averaging of thermal pressure for arm64 based SoCs Documentation/admin-guide/kernel-parameters.txt | 5 ++++ arch/arm/include/asm/topology.h | 3 +++ arch/arm64/configs/defconfig | 1 + arch/arm64/include/asm/topology.h | 3 +++ drivers/base/arch_topology.c | 11 +++++++++ drivers/thermal/cpufreq_cooling.c | 19 +++++++++++++-- include/linux/arch_topology.h | 10 ++++++++ include/linux/sched/topology.h | 8 +++++++ include/trace/events/sched.h | 4 ++++ init/Kconfig | 4 ++++ kernel/sched/core.c | 3 +++ kernel/sched/fair.c | 25 ++++++++++++++++++++ kernel/sched/pelt.c | 31 +++++++++++++++++++++++++ kernel/sched/pelt.h | 31 +++++++++++++++++++++++++ kernel/sched/sched.h | 21 +++++++++++++++++ 15 files changed, 177 insertions(+), 2 deletions(-) -- 2.1.4