Subject: Re: [RFC PATCH 0/5] CFS load tracking trace events
From: Vincent Guittot
Date: Tue, 28 Mar 2017 12:05:12 +0200
To: Dietmar Eggemann
Cc: Peter Zijlstra, Ingo Molnar, LKML, Matt Fleming, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi
In-Reply-To: <20170328063541.12912-1-dietmar.eggemann@arm.com>

On 28 March 2017 at 08:35, Dietmar Eggemann wrote:
> This patch-set introduces trace events for load (and utilization)
> tracking for the following three cfs scheduler bricks: cfs_rq,
> sched_entity and task_group.
>
> I've decided to send it out because people are discussing problems with
> PELT-related functionality, and as a base for the related discussion
> next week at the OSPM summit in Pisa.
>
> The requirements for the design are:
>
> (1) Support for all combinations of the following kernel configuration
>     options: CONFIG_SMP, CONFIG_FAIR_GROUP_SCHED,
>     CONFIG_SCHED_AUTOGROUP, CONFIG_DEBUG_KERNEL.
>
> (2) All key=value pairs have to appear in all combinations of the
>     kernel configuration options mentioned under (1).
>
> (3) Stable interface, i.e. only use keys for the key=value pairs which
>     are independent of the underlying (load tracking) implementation.
>
> (4) Minimally invasive to the actual cfs scheduler code, i.e. the
>     trace event should not have to be guarded by an if condition (e.g.
> entity_is_task(se) to distinguish between task and task_group) or a
> kernel configuration switch.
>
> The trace events only expose one load (and one utilization) key=value
> pair besides those used to identify the cfs scheduler brick. They do
> not provide any internal implementation details of the actual (PELT)
> load-tracking mechanism.
> This limitation might be too strict, since for a cfs_rq we can only
> trace runnable load (cfs_rq->runnable_load_avg) or runnable/blocked
> load (cfs_rq->avg.load_avg).
> Other load metrics like instantaneous load ({cfs_rq|se}->load.weight)
> are not traced but could easily be added.
>
> The following keys are used to identify the cfs scheduler brick:
>
> (1) CPU number the cfs scheduler brick is attached to.
>
> (2) Task_group path and (css) id.
>
> (3) Task name and pid.

Do you really need both path/name and id/pid? The path/name looks quite
intrusive, so can't we just use id/pid?

> In case a key does not apply, due to an unset kernel configuration
> option or the fact that a sched_entity can represent either a task or
> a task_group, its value is set to an invalid default:
>
> (1) Task_group path: "(null)"
>
> (2) Task group id: -1
>
> (3) Task name: "(null)"
>
> (4) Task pid: -1
>
> Load tracking trace events are placed into kernel/sched/fair.c:
>
> (1) for cfs_rq:
>
>     - In the PELT core function __update_load_avg().
>
>     - During sched_entity attach/detach.
>
>     - During sched_entity load/util propagation down the task_group
>       hierarchy.
>
> (2) for sched_entity:
>
>     - In the PELT core function __update_load_avg().
>
>     - During sched_entity load/util propagation down the task_group
>       hierarchy.
>
> (3) for task_group:
>
>     - In its PELT update function.
>
> An alternative to using __update_load_avg() would be to put trace
> events into update_cfs_rq_load_avg() for cfs_rqs and into
> set_task_rq_fair(), update_load_avg() and sync_entity_load_avg() for
> sched_entities.
> This would also get rid of the if (cfs_rq)/else condition in
> __update_load_avg().
>
> These trace events still look a bit fragile.
> First of all, this patch-set has to use cfs-specific data structures
> in the global task scheduler trace file include/trace/events/sched.h.
> And second, the accessor functions (rq_of(), task_of(), etc.) are
> private to the cfs scheduler class. If they were public, these trace
> events would be easier to code. That said, group_cfs_rq() is already
> made public by this patch-set.
>
> This version is based on tip/sched/core as of yesterday (bc4278987e38).
> It has been compile-tested on ~160 configurations via 0day's kbuild
> test robot.
>
> Dietmar Eggemann (5):
>   sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG
>   sched/events: Introduce cfs_rq load tracking trace event
>   sched/fair: Export group_cfs_rq()
>   sched/events: Introduce sched_entity load tracking trace event
>   sched/events: Introduce task_group load tracking trace event
>
>  include/linux/sched.h        |  10 +++
>  include/trace/events/sched.h | 143 +++++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/autogroup.c     |   2 -
>  kernel/sched/autogroup.h     |   2 -
>  kernel/sched/fair.c          |  26 ++++----
>  5 files changed, 167 insertions(+), 16 deletions(-)
>
> --
> 2.11.0