From: Yuyang Du
To: mingo@kernel.org, peterz@infradead.org, linux-kernel@vger.kernel.org
Cc: pjt@google.com, bsegall@google.com, morten.rasmussen@arm.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, arjan.van.de.ven@intel.com, len.brown@intel.com, rafael.j.wysocki@intel.com, fengguang.wu@intel.com, boqun.feng@gmail.com, srikar@linux.vnet.ibm.com, Yuyang Du
Subject: [Resend PATCH v8 0/4] sched: Rewrite runnable load and utilization average tracking
Date: Tue, 16 Jun 2015 03:26:03 +0800
Message-Id: <1434396367-27979-1-git-send-email-yuyang.du@intel.com>

Hi Peter and Ingo,

Changes in the 8th version:

1) Rebase to the latest tip tree
2) scale_load_down() the weight when computing the averages
3) Change util_sum to u32

Thanks a lot to Ben for the comments that led to this version, and to Vincent for the review.

Regards,
Yuyang

v7 changes:

The 7th version mostly accommodates the utilization load average recently merged into the kernel. The general idea is still to update the cfs_rq as a whole, as opposed to updating one entity at a time and then updating the cfs_rq with only that entity.

1) Rename utilization_load_avg to util_avg to be concise and meaningful
2) To track the cfs_rq util_avg, simply use "cfs_rq->curr != NULL" as the predicate. This should be equivalent to, but simpler than, aggregating each individual child sched_entity's util_avg when "cfs_rq->curr == se".
   Because if cfs_rq->curr != NULL, then cfs_rq->curr must be some se.
3) Remove the se's util_avg from its cfs_rq when migrating it; this was already proposed by Morten, with patches sent
4) Initialize the group entity's load average when the entity is created
5) Small nits: the entity's util_avg is removed in switched_from_fair() and task_move_group_fair()

Thanks a lot to Vincent and Morten for their help with the 7th version.

Thanks,
Yuyang

v6 changes:

Many thanks to PeterZ for his review, to Dietmar, and to Fengguang for 0Day and LKP. Rebased on v3.18-rc2.

- Unify the 32-bit and 64-bit decay_load() by using mul_u64_u32_shr()
- Add a force option to update_tg_load_avg()
- Read the real-time cfs_rq load_avg for calc_tg_weight()
- Guard tg_load_avg_contrib with #ifdef CONFIG_FAIR_GROUP_SCHED
- Bug fixes

v5 changes:

Many thanks to Peter for reviewing this patchset in detail and for all his comments, to Mike for the general and cgroup pipe-tests, and to Morten, Ben, and Vincent for the discussion.

- Remove dead task and task group load_avg
- Do not propagate trivial deltas to the task_group load_avg (threshold: 1/64 of the old contribution)
- mul_u64_u32_shr() is used in decay_load(), so on 64-bit, load_sum can accommodate about 4353082796 (= 2^64/47742/88761) always-runnable entities at the highest weight (88761), far more than the previous theoretical maximum of 132845
- Various code efficiency and style changes

We carried out some performance tests (thanks to Fengguang and his LKP); the results are shown as follows. The patchset (comprising three patches) is on top of mainline v3.16-rc5. We may report more perf numbers later. Overall, this rewrite performs better: it reduces the net overhead of load average tracking and shows flat efficiency in the multi-layer cgroup pipe-test.

v4 changes:

Thanks to Morten, Ben, and Fengguang for the v4 revision.

- Insert a memory barrier before writing cfs_rq->load_last_update_copy
- Fix typos

v3 changes:

Many thanks to Ben for the v3 revision.

Regarding the overflow issue, we now have, for both entity and cfs_rq:

struct sched_avg {
    .....
    u64           load_sum;
    unsigned long load_avg;
    .....
};

Given that the weight for both entity and cfs_rq is:

struct load_weight {
    unsigned long weight;
    .....
};

load_sum's maximum is 47742 * load.weight (an unsigned long), so on 32-bit it is absolutely safe. On 64-bit, with unsigned long being 64-bit, we can accommodate about 4353082796 (= 2^64/47742/88761) always-runnable entities at the highest weight (88761). Even allowing for a further multiplication by 1<<15 in decay_load64(), we can still support 132845 (= 4353082796/2^15) always-runnable entities, which should be acceptable.

load_avg = load_sum / 47742 = load.weight (an unsigned long), so it is perfectly safe for both entity (even with arbitrary user group shares) and cfs_rq, on both 32-bit and 64-bit. Originally we saved this division, but we have to bring it back because of the overflow issue on 32-bit (the load average itself is actually safe from overflow, but the rest of the code referencing it consistently uses long, e.g. cpu_load, which prevents saving the division).

- Fix the overflow issue for both entity and cfs_rq, on both 32-bit and 64-bit
- Track all entities (both task and group entities) because of the group entity's clock issue; this actually improves code simplicity
- Make a copy of the cfs_rq sched_avg's last_update_time, so that a 32-bit machine can read an intact 64-bit value in the presence of a data race (hope I did it right)
- Minor fixes and code improvements

v2 changes:

Thanks to PeterZ and Ben for their help in fixing the issues and improving the quality, and to Fengguang and his 0Day for finding compile errors in different configurations for version 2.
- Batch-update tg->load_avg, making sure it is up to date before update_cfs_shares()
- Remove a migrating task's load from its old CPU/cfs_rq, and do so with atomic operations

Yuyang Du (4):
  sched: Remove rq's runnable avg
  sched: Rewrite runnable load and utilization average tracking
  sched: Init cfs_rq's sched_entity load average
  sched: Remove task and group entity load when they are dead

 include/linux/sched.h |   40 ++-
 kernel/sched/core.c   |    5 +-
 kernel/sched/debug.c  |   42 +---
 kernel/sched/fair.c   |  668 ++++++++++++++++---------------------------
 kernel/sched/sched.h  |   32 +--
 5 files changed, 261 insertions(+), 526 deletions(-)

--
1.7.9.5