From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    linux-api@vger.kernel.org, cgroups@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, "Rafael J. Wysocki",
    Vincent Guittot, Viresh Kumar, Paul Turner, Michal Koutny,
    Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
    Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan,
    Alessio Balsini
Subject: [PATCH v12 0/6] Add utilization clamping support (CGroups API)
Date: Thu, 18 Jul 2019 19:17:42 +0100
Message-Id: <20190718181748.28446-1-patrick.bellasi@arm.com>

Hi all, this is a respin of:

  https://lore.kernel.org/lkml/20190708084357.12944-1-patrick.bellasi@arm.com/

which addresses all the comments collected so far:

 - track requested cgroup's percentage to mask conversion rounding to
   userspace
 - use a dedicated variable for parent restrictions
 - make more explicit in the documentation that the requested
   "protection" is always capped by the requested "limit"
 - use the newly added uclamp_mutex to serialize the sysfs write callback
 - add missing RCU read locks across cpu_util_update_eff() call from
   uclamp_update_root_tg()
 - remove not required and confusing sentence from the above changelog
 - add a new patch to always use enum uclamp_id for clamp_id values
 - fix percentage's decimals format string

It also adds some small modifications:

 - introduce UCLAMP_PERCENT_{SHIFT,SCALE} to avoid hardcoded constants
 - s/uclamp_scale_from_percent()/capacity_from_percent()/
 - move range check from cpu_uclamp_{min,max}_write() to
   capacity_from_percent()

The series is based on top of today's Linus master branch (wip for
5.3-rc1):

  commit 22051d9c4a57 ("Merge tag 'platform-drivers-x86-v5.3-2' of
                        git://git.infradead.org/linux-platform-drivers-x86")

Thanks Quentin, Michal and Tejun for your review comments!

This has been the first code review specifically targeting the cgroups
bits, and the series is now hopefully in better shape.
Looking forward to any additional comments! ;)

Cheers,
Patrick


Series Organization
===================

The full tree is available here:

   git://linux-arm.org/linux-pb.git lkml/utilclamp_v12
   http://www.linux-arm.org/git?p=linux-pb.git;a=shortlog;h=refs/heads/lkml/utilclamp_v12


Newcomer's Short Abstract
=========================

The Linux scheduler tracks a "utilization" signal for each scheduling
entity (SE), e.g. tasks, to know how much CPU time they use. This signal
allows the scheduler to know how "big" a task is and, in principle, it
can support advanced task placement strategies by selecting the best CPU
to run a task. Some of these strategies are represented by the Energy
Aware Scheduler [1].

When the schedutil cpufreq governor is in use, the utilization signal
also allows the Linux scheduler to drive frequency selection. The CPU
utilization signal, which represents the aggregated utilization of tasks
scheduled on that CPU, is used to select the frequency which best fits
the workload generated by the tasks.

The current translation of utilization values into a frequency selection
is simple: we go to max for RT tasks or to the minimum frequency which
can accommodate the utilization of DL+FAIR tasks.
However, utilization values by themselves cannot convey the desired
power/performance behaviors of each task as intended by user-space.
As such they are not ideally suited for task placement decisions.

Task placement and frequency selection policies in the kernel can be
improved by taking into consideration hints coming from authorized
user-space elements, for example the Android middleware or, more
generally, any "System Management Software" (SMS) framework.

Utilization clamping is a mechanism which allows the utilization
generated by RT and FAIR tasks to be "clamped" (i.e. filtered) within a
range defined by user-space. The clamped utilization value can then be
used, for example, to enforce a minimum and/or maximum frequency
depending on which tasks are active on a CPU.

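To make the clamping idea concrete, here is a minimal user-space sketch,
not code from this series. clamp_util() and freq_for_util() are
hypothetical helper names; the 1024 capacity scale matches the kernel's
SCHED_CAPACITY_SCALE, while the 25% headroom and the linear
utilization-to-frequency mapping are simplifying assumptions made for
illustration only.

```c
#include <assert.h>

/* Utilization is expressed on a 0..1024 scale (SCHED_CAPACITY_SCALE). */
#define CAPACITY_SCALE 1024u

/* Hypothetical helper: restrict util to the [util_min, util_max] range. */
static unsigned int clamp_util(unsigned int util,
                               unsigned int util_min,
                               unsigned int util_max)
{
	assert(util_min <= util_max);

	if (util < util_min)
		return util_min;	/* boosting: small task looks bigger */
	if (util > util_max)
		return util_max;	/* capping: big task looks smaller */
	return util;
}

/*
 * Hypothetical helper: map a (clamped) utilization to a frequency in kHz.
 * Assumption: frequency scales linearly with utilization, plus 25%
 * headroom, and never exceeds max_freq_khz.
 */
static unsigned int freq_for_util(unsigned int util,
                                  unsigned int max_freq_khz)
{
	unsigned long long freq = max_freq_khz + (max_freq_khz >> 2);

	freq = freq * util / CAPACITY_SCALE;
	if (freq > max_freq_khz)
		freq = max_freq_khz;
	return (unsigned int)freq;
}
```

With these assumptions, a task with utilization 100 and a minimum clamp
of 512 is treated as if its utilization were 512, so the requested
frequency rises accordingly; a task with utilization 900 and a maximum
clamp of 256 is treated as 256 and requests a lower frequency.
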
The main use-cases for utilization clamping are:

 - boosting: better interactive response for small tasks which
   are affecting the user experience.

   Consider for example the case of a small control thread for an
   external accelerator (e.g. GPU, DSP, other devices). Here, the task
   utilization alone does not give the scheduler a complete view of the
   task's requirements and, if it's a small utilization task, the
   scheduler keeps selecting a more energy efficient CPU, with smaller
   capacity and lower frequency, thus negatively impacting the overall
   time required to complete task activations.

 - capping: increase energy efficiency for background tasks not
   affecting the user experience.

   Since running on a lower capacity CPU at a lower frequency is more
   energy efficient, when the completion time is not a main goal,
   capping the utilization considered for certain (maybe big) tasks can
   have positive effects, both on energy consumption and thermal
   headroom.
   This feature can also make RT tasks more energy friendly on mobile
   systems where running them on high capacity CPUs and at the maximum
   frequency is not required.

From these two use-cases, it's worth noticing that frequency selection
biasing, introduced by patches 9 and 10 of this series, is just one
possible usage of utilization clamping.

Another compelling extension of utilization clamping is in helping the
scheduler make task placement decisions.

Utilization is (also) a task specific property the scheduler uses to
know how much CPU bandwidth a task requires, at least as long as there
is idle time. Thus, the utilization clamp values, defined either
per-task or per-task_group, can represent tasks to the scheduler as
being bigger (or smaller) than what they actually are.

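The interaction between per-task and per-task_group clamps can be
sketched as follows. This is an illustration only and not the code in
the patches: struct clamp_pair, effective_clamps() and apparent_util()
are hypothetical names, and the sketch assumes that each group value
acts as an upper bound on the corresponding task request and that the
minimum ("protection") is always capped by the maximum ("limit").

```c
#include <assert.h>

/* Hypothetical pair of clamp values on the 0..1024 capacity scale. */
struct clamp_pair {
	unsigned int min;
	unsigned int max;
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: combine a task's requested clamps with those of
 * its task group. Assumption: a task cannot ask for more than its
 * group allows, for either the min or the max clamp.
 */
static struct clamp_pair effective_clamps(struct clamp_pair task,
                                          struct clamp_pair group)
{
	struct clamp_pair eff;

	eff.max = min_u(task.max, group.max);
	eff.min = min_u(task.min, group.min);
	/* The "protection" is always capped by the "limit". */
	eff.min = min_u(eff.min, eff.max);
	return eff;
}

/* How "big" the task appears once its effective clamps are applied. */
static unsigned int apparent_util(unsigned int util, struct clamp_pair eff)
{
	assert(eff.min <= eff.max);

	if (util < eff.min)
		return eff.min;
	if (util > eff.max)
		return eff.max;
	return util;
}
```

Under these assumptions, a task requesting {min 512, max 1024} inside a
group configured with {min 256, max 768} ends up with effective clamps
{256, 768}: a utilization of 100 is presented as 256, and a utilization
of 900 is presented as 768.
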
Utilization clamping thus enables interesting additional optimizations,
for example on asymmetric capacity systems like Arm big.LITTLE and
DynamIQ CPUs, where:

 - boosting: try to run small/foreground tasks on higher-capacity CPUs
   to complete them faster despite being less energy efficient.

 - capping: try to run big/background tasks on low-capacity CPUs to save
   power and thermal headroom for more important tasks.

This series does not present this additional usage of utilization
clamping but it's an integral part of the EAS feature set, where [2] is
one of its main components.

Android kernels use SchedTune, a solution similar to utilization
clamping, to bias both 'frequency selection' and 'task placement'. This
series provides the foundation to add similar features to mainline while
focusing, for the time being, just on schedutil integration.


References
==========

[1] Energy Aware Scheduling
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/scheduler/sched-energy.txt?h=v5.1

[2] Expressing per-task/per-cgroup performance hints
    Linux Plumbers Conference 2018
    https://linuxplumbersconf.org/event/2/contributions/128/


Patrick Bellasi (6):
  sched/core: uclamp: Extend CPU's cgroup controller
  sched/core: uclamp: Propagate parent clamps
  sched/core: uclamp: Propagate system defaults to root group
  sched/core: uclamp: Use TG's clamps to restrict TASK's clamps
  sched/core: uclamp: Update CPU's refcount on TG's clamp changes
  sched/core: uclamp: always use enum uclamp_id for clamp_id values

 Documentation/admin-guide/cgroup-v2.rst |  34 +++
 init/Kconfig                            |  22 ++
 kernel/sched/core.c                     | 382 ++++++++++++++++++++++--
 kernel/sched/sched.h                    |  12 +-
 4 files changed, 430 insertions(+), 20 deletions(-)

-- 
2.22.0