From: Patrick Bellasi
To: subhra mazumdar
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com,
    tglx@linutronix.de, steven.sistare@oracle.com, dhaval.giani@oracle.com,
    daniel.lezcano@linaro.org, vincent.guittot@linaro.org,
    viresh.kumar@linaro.org, tim.c.chen@linux.intel.com,
    mgorman@techsingularity.net, parth@linux.ibm.com
Subject: Re: [RFC PATCH 1/9] sched,cgroup: Add interface for latency-nice
Date: Thu, 05 Sep 2019 11:05:18 +0100
Message-ID: <87pnkf2h41.fsf@arm.com>
In-reply-to: <20190830174944.21741-2-subhra.mazumdar@oracle.com>
References: <20190830174944.21741-1-subhra.mazumdar@oracle.com>
 <20190830174944.21741-2-subhra.mazumdar@oracle.com>
User-agent: mu4e 1.3.3; emacs 26.2

We already commented on adding the cgroup API after the per-task API.

However, for the cgroup bits it will be super important to have [ +tejun ]
in CC, since here we are discussing the idea of adding a new cpu
controller attribute. There are opinions about which kinds of attributes
can be added to cgroups, and I'm sure a "latency-nice" attribute will
generate an interesting discussion. :)

LPC is coming up, perhaps we can get the chance to have a chat with Tejun
about the manoeuvring space in this area.

On Fri, Aug 30, 2019 at 18:49:36 +0100, subhra mazumdar wrote...

> Add Cgroup interface for latency-nice. Each CPU Cgroup adds a new file
> "latency-nice" which is shared by all the threads in that Cgroup.
>
> Signed-off-by: subhra mazumdar
> ---
>  include/linux/sched.h |  1 +
>  kernel/sched/core.c   | 40 ++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/fair.c   |  1 +
>  kernel/sched/sched.h  |  8 ++++++++
>  4 files changed, 50 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 1183741..b4a79c3 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -631,6 +631,7 @@ struct task_struct {
>  	int				static_prio;
>  	int				normal_prio;
>  	unsigned int			rt_priority;
> +	u64				latency_nice;

I guess we can save some bits here... or, if we are very brave, maybe we
can explore the possibility to pack all prios into a single u64?

( ( (tomatoes target here) ) )
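
Just to make that (only half serious) idea concrete, something along these
lines is what I mean -- purely illustrative, with completely made up field
widths:

   /*
    * Illustrative only: all the prio values packed into a single u64.
    * Widths are arbitrary here, just to show they can fit.
    */
   struct task_prios {
           u64 static_prio         : 14;
           u64 normal_prio         : 14;
           u64 rt_priority         : 14;
           u64 latency_nice        : 14;
           u64 __unused            : 8;
   };
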
>  	const struct sched_class	*sched_class;
>  	struct sched_entity		se;
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 874c427..47969bc 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5976,6 +5976,7 @@ void __init sched_init(void)
>  		init_dl_rq(&rq->dl);
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  		root_task_group.shares = ROOT_TASK_GROUP_LOAD;
> +		root_task_group.latency_nice = LATENCY_NICE_DEFAULT;
>  		INIT_LIST_HEAD(&rq->leaf_cfs_rq_list);
>  		rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
>  		/*
> @@ -6345,6 +6346,7 @@ static void sched_change_group(struct task_struct *tsk, int type)
>  	 */
>  	tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
>  			  struct task_group, css);
> +	tsk->latency_nice = tg->latency_nice;
>  	tg = autogroup_task_group(tsk, tg);
>  	tsk->sched_task_group = tg;
>
> @@ -6812,6 +6814,34 @@ static u64 cpu_rt_period_read_uint(struct cgroup_subsys_state *css,
>  }
>  #endif /* CONFIG_RT_GROUP_SCHED */
>
> +static u64 cpu_latency_nice_read_u64(struct cgroup_subsys_state *css,
> +				     struct cftype *cft)
> +{
> +	struct task_group *tg = css_tg(css);
> +
> +	return tg->latency_nice;
> +}
> +
> +static int cpu_latency_nice_write_u64(struct cgroup_subsys_state *css,
> +				      struct cftype *cft, u64 latency_nice)
> +{
> +	struct task_group *tg = css_tg(css);
> +	struct css_task_iter it;
> +	struct task_struct *p;
> +
> +	if (latency_nice < LATENCY_NICE_MIN || latency_nice > LATENCY_NICE_MAX)
> +		return -ERANGE;
> +
> +	tg->latency_nice = latency_nice;
> +
> +	css_task_iter_start(css, 0, &it);
> +	while ((p = css_task_iter_next(&it)))
> +		p->latency_nice = latency_nice;

Once (and if) the cgroup API is added, we can avoid this (potentially
massive) "update on write" in favour of an "on demand composition at
wakeup-time".

We don't care about updating the latency-nice of NON RUNNABLE tasks, do
we? AFAIK, we need that value only (or mostly) at wakeup time.

Thus, when a task wakes up we can easily compose (and eventually cache)
its current latency-nice value by considering, in priority order:

 - the system wide upper-bound
 - the task group restriction
 - the task specific relaxation

Something similar to what we already do for uclamp composition with this
patch currently in tip/sched/core:

   commit 3eac870a3247 ("sched/uclamp: Use TG's clamps to restrict TASK's clamps")
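
Something along these lines, where sysctl_sched_latency_nice and the
helper itself are of course just invented for the sake of the example
(and the min/max directions depend on the semantics we settle on):

   /*
    * Illustrative sketch: compose the effective latency-nice on demand
    * at wakeup time, instead of pushing the value into every task on
    * each cgroup write.
    */
   static u64 task_latency_nice(struct task_struct *p)
   {
           /* Task specific relaxation... */
           u64 lnice = p->latency_nice;

           /* ...restricted by the task group's value... */
           lnice = min(lnice, task_group(p)->latency_nice);

           /* ...and capped by the system wide upper-bound. */
           return min(lnice, (u64)sysctl_sched_latency_nice);
   }

That would turn the cgroup file write into a simple single-value update,
with no css_task_iter walk at all.
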
> +	css_task_iter_end(&it);
> +
> +	return 0;
> +}
> +
>  static struct cftype cpu_legacy_files[] = {
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  	{
> @@ -6848,6 +6878,11 @@ static struct cftype cpu_legacy_files[] = {
>  		.write_u64 = cpu_rt_period_write_uint,
>  	},
>  #endif
> +	{
> +		.name = "latency-nice",
> +		.read_u64 = cpu_latency_nice_read_u64,
> +		.write_u64 = cpu_latency_nice_write_u64,
> +	},
>  	{ }	/* Terminate */
>  };
>
> @@ -7015,6 +7050,11 @@ static struct cftype cpu_files[] = {
>  		.write = cpu_max_write,
>  	},
>  #endif
> +	{
> +		.name = "latency-nice",
> +		.read_u64 = cpu_latency_nice_read_u64,
> +		.write_u64 = cpu_latency_nice_write_u64,
> +	},
>  	{ }	/* terminate */
>  };
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f35930f..b08d00c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10479,6 +10479,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
>  		goto err;
>
>  	tg->shares = NICE_0_LOAD;
> +	tg->latency_nice = LATENCY_NICE_DEFAULT;
                           ^^^^^^^^^^^^^^^^^^^^

Maybe NICE_0_LATENCY would be better, to be more consistent?

>
>  	init_cfs_bandwidth(tg_cfs_bandwidth(tg));
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index b52ed1a..365c928 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -143,6 +143,13 @@ static inline void cpu_load_update_active(struct rq *this_rq) { }
>  #define NICE_0_LOAD		(1L << NICE_0_LOAD_SHIFT)
>
>  /*
> + * Latency-nice default value
> + */
> +#define LATENCY_NICE_DEFAULT	5
> +#define LATENCY_NICE_MIN	1
> +#define LATENCY_NICE_MAX	100

The values 1 and 5 look kind of arbitrary. For the range specifically, I
already commented in this other message:

   Message-ID: <87r24v2i14.fsf@arm.com>
   https://lore.kernel.org/lkml/87r24v2i14.fsf@arm.com/

> +
> +/*
>   * Single value that decides SCHED_DEADLINE internal math precision.
>   * 10 -> just above 1us
>   * 9  -> just above 0.5us
> @@ -362,6 +369,7 @@ struct cfs_bandwidth {
>  /* Task group related information */
>  struct task_group {
>  	struct cgroup_subsys_state	css;
> +	u64				latency_nice;
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  	/* schedulable entities of this group on each CPU */

Best,
Patrick

--
#include <best/regards.h>

Patrick Bellasi