From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Rafael J. Wysocki,
Wysocki" , Viresh Kumar , Vincent Guittot , Paul Turner , Dietmar Eggemann , Morten Rasmussen , Juri Lelli , Todd Kjos , Joel Fernandes , Steve Muckle Subject: [PATCH v6 4/4] sched/fair: update util_est only on util_avg updates Date: Fri, 9 Mar 2018 09:52:45 +0000 Message-Id: <20180309095245.11071-5-patrick.bellasi@arm.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20180309095245.11071-1-patrick.bellasi@arm.com> References: <20180309095245.11071-1-patrick.bellasi@arm.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The estimated utilization of a task is currently updated every time the task is dequeued. However, to keep overheads under control, PELT signals are effectively updated at maximum once every 1ms. Thus, for really short running tasks, it can happen that their util_avg value has not been updates since their last enqueue. If such tasks are also frequently running tasks (e.g. the kind of workload generated by hackbench) it can also happen that their util_avg is updated only every few activations. This means that updating util_est at every dequeue potentially introduces not necessary overheads and it's also conceptually wrong if the util_avg signal has never been updated during a task activation. Let's introduce a throttling mechanism on task's util_est updates to sync them with util_avg updates. To make the solution memory efficient, both in terms of space and load/store operations, we encode a synchronization flag into the LSB of util_est.enqueued. This makes util_est an even values only metric, which is still considered good enough for its purpose. The synchronization bit is (re)set by __update_load_avg_se() once the PELT signal of a task has been updated during its last activation. Such a throttling mechanism allows to keep under control util_est overheads in the wakeup hot path, thus making it a suitable mechanism which can be enabled also on high-intensity workload systems. Thus, this now switches on by default the estimation utilization scheduler feature. Suggested-by: Chris Redpath Signed-off-by: Patrick Bellasi Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Paul Turner Cc: Vincent Guittot Cc: Morten Rasmussen Cc: Dietmar Eggemann Cc: linux-kernel@vger.kernel.org --- Changes in v6: - remove READ_ONCE from rq-lock protected code paths - change flag name to better match its meaning - fix compilation for !CONFIG_SMP systems - add missing CCs in the changelog Changes in v5: - set SCHED_FEAT(UTIL_EST, true) as default (Peter) --- kernel/sched/fair.c | 42 ++++++++++++++++++++++++++++++++++++++---- kernel/sched/features.h | 2 +- 2 files changed, 39 insertions(+), 5 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 5cf4aa39a6ca..a52868b07850 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3257,6 +3257,32 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load, unsigned long runna sa->util_avg = sa->util_sum / divider; } +/* + * When a task is dequeued, its estimated utilization should not be update if + * its util_avg has not been updated at least once. + * This flag is used to synchronize util_avg updates with util_est updates. + * We map this information into the LSB bit of the utilization saved at + * dequeue time (i.e. util_est.dequeued). 
+ */
+#define UTIL_AVG_UNCHANGED 0x1
+
+static inline void cfs_se_util_change(struct sched_avg *avg)
+{
+	unsigned int enqueued;
+
+	if (!sched_feat(UTIL_EST))
+		return;
+
+	/* Avoid store if the flag has been already set */
+	enqueued = avg->util_est.enqueued;
+	if (!(enqueued & UTIL_AVG_UNCHANGED))
+		return;
+
+	/* Reset flag to report util_avg has been updated */
+	enqueued &= ~UTIL_AVG_UNCHANGED;
+	WRITE_ONCE(avg->util_est.enqueued, enqueued);
+}
+
 /*
  * sched_entity:
  *
@@ -3308,6 +3334,7 @@ __update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_entit
 			cfs_rq->curr == se)) {
 
 		___update_load_avg(&se->avg, se_weight(se), se_runnable(se));
+		cfs_se_util_change(&se->avg);
 		return 1;
 	}
 
@@ -3908,7 +3935,7 @@ static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
 
 	/* Update root cfs_rq's estimated utilization */
 	enqueued  = cfs_rq->avg.util_est.enqueued;
-	enqueued += _task_util_est(p);
+	enqueued += (_task_util_est(p) | UTIL_AVG_UNCHANGED);
 	WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued);
 }
 
@@ -3943,7 +3970,7 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 	if (cfs_rq->nr_running) {
 		ue.enqueued  = cfs_rq->avg.util_est.enqueued;
 		ue.enqueued -= min_t(unsigned int, ue.enqueued,
-				     _task_util_est(p));
+				     (_task_util_est(p) | UTIL_AVG_UNCHANGED));
 	}
 	WRITE_ONCE(cfs_rq->avg.util_est.enqueued, ue.enqueued);
 
@@ -3954,12 +3981,19 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 	if (!task_sleep)
 		return;
 
+	/*
+	 * If the PELT values haven't changed since enqueue time,
+	 * skip the util_est update.
+	 */
+	ue = p->se.avg.util_est;
+	if (ue.enqueued & UTIL_AVG_UNCHANGED)
+		return;
+
 	/*
 	 * Skip update of task's estimated utilization when its EWMA is
 	 * already ~1% close to its last activation value.
 	 */
-	ue = p->se.avg.util_est;
-	ue.enqueued = task_util(p);
+	ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED);
 	last_ewma_diff = ue.enqueued - ue.ewma;
 	if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
 		return;
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index c459a4b61544..85ae8488039c 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -89,4 +89,4 @@ SCHED_FEAT(WA_BIAS, true)
 /*
  * UtilEstimation. Use estimated CPU utilization.
  */
-SCHED_FEAT(UTIL_EST, false)
+SCHED_FEAT(UTIL_EST, true)
--
2.15.1
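
[Editor's note] For readers unfamiliar with the LSB-flag trick described in the
changelog, the following is a minimal, standalone C sketch of the idea, not part
of the patch itself: the flag is ORed into the stored utilization at enqueue
time and cleared once util_avg has been refreshed, so a still-set flag at
dequeue time means the util_est update can be skipped. The struct and helper
names (util_est_demo, demo_enqueue, demo_util_change, demo_should_update_util_est)
are illustrative stand-ins, not kernel APIs.

/*
 * Standalone sketch of the LSB synchronization flag: set at enqueue,
 * cleared when util_avg is updated, checked at dequeue.
 */
#include <stdio.h>

#define UTIL_AVG_UNCHANGED 0x1

struct util_est_demo {
	unsigned int enqueued;	/* utilization saved at enqueue time, LSB = sync flag */
};

/* Enqueue: save the utilization with the sync flag set. */
static void demo_enqueue(struct util_est_demo *ue, unsigned int util)
{
	ue->enqueued = util | UTIL_AVG_UNCHANGED;
}

/* PELT refresh of util_avg: clear the flag to record that util_avg changed. */
static void demo_util_change(struct util_est_demo *ue)
{
	ue->enqueued &= ~UTIL_AVG_UNCHANGED;
}

/* Dequeue: update util_est only if the flag has been cleared. */
static int demo_should_update_util_est(const struct util_est_demo *ue)
{
	return !(ue->enqueued & UTIL_AVG_UNCHANGED);
}

int main(void)
{
	struct util_est_demo ue = { 0 };

	demo_enqueue(&ue, 122);		/* stores 123 (122 | 0x1), flag set */
	printf("update? %d\n", demo_should_update_util_est(&ue));	/* 0: skip */

	demo_util_change(&ue);		/* util_avg updated during this activation */
	printf("update? %d\n", demo_should_update_util_est(&ue));	/* 1: update */

	return 0;
}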