From: byungchul.park@lge.com
To: mingo@kernel.org, peterz@infradead.org
Cc: linux-kernel@vger.kernel.org, Byungchul Park <byungchul.park@lge.com>
Subject: [PATCH v3] sched: modify how to compute a slice and check preemptibility
Date: Sun, 19 Jul 2015 18:11:00 +0900
Message-Id: <1437297060-25378-1-git-send-email-byungchul.park@lge.com>

From: Byungchul Park <byungchul.park@lge.com>

Hello all,

I asked the question below in the last version (v2) of this patch.

***

sysctl_sched_min_granularity must be defined clearly first; only after
defining it clearly can we decide how the code should behave. The
definition can be either case 1 or case 2 below.

Case 1: every task must get a slice of at least
sysctl_sched_min_granularity, which is currently 0.75ms. In this case,
increasing the number of tasks in a rq stretches the whole latency,
which most of you don't like because it can stretch the whole latency
too much. But that looks normal to me, since it already happens in the
!CONFIG_FAIR_GROUP_SCHED world with a large number of tasks. I wonder
why the CONFIG_FAIR_GROUP_SCHED world must behave differently from the
!CONFIG_FAIR_GROUP_SCHED world. Anyway...

Case 2: a task can get a slice much smaller than
sysctl_sched_min_granularity, depending on its position in the
hierarchy. If a rq has 8 equally weighted sched entities, each of those
has 8 equally weighted sched entities, and we nest one level more, then
a task can get a very small slice, e.g. 0.75ms / 64 ~= 0.01ms (see the
sketch below). Adding more cgroup levels makes it worse, and the
context switching overhead becomes very large. What does
sysctl_sched_min_granularity mean in that situation? Anyway...

I am not sure which of case 1 and case 2 is the right definition of
sysctl_sched_min_granularity. What do you think?

***

I wrote this v3 patch based on case 1, assuming case 1 is right. If
case 2 is right, then the modifications in check_preempt_tick() should
be ignored. Doesn't that make sense?
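To make the case 2 numbers concrete, here is a minimal user-space
sketch of that arithmetic. It is an illustration only, not kernel code;
it assumes equally weighted nice-0 entities, the default 6ms
sysctl_sched_latency, and the macro names are mine, not the kernel's.

#include <stdio.h>

#define LATENCY_NS	6000000ULL	/* sysctl_sched_latency, default */
#define NR_PER_LEVEL	8		/* equally weighted entities per level */

int main(void)
{
	/* __sched_period(8) is sysctl_sched_latency with default knobs */
	unsigned long long slice = LATENCY_NS;
	int depth;

	/* each level divides the slice by the number of siblings there */
	for (depth = 1; depth <= 3; depth++) {
		slice /= NR_PER_LEVEL;
		printf("depth %d: slice = %llu ns\n", depth, slice);
	}
	/* depth 3 prints 11718 ns, i.e. ~0.01ms (= 0.75ms / 64) */
	return 0;
}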
Thank you,
Byungchul

---------------->8----------------
From 7ebce566af9b952d24494cd1258b481ec6639cc1 Mon Sep 17 00:00:00 2001
From: Byungchul Park <byungchul.park@lge.com>
Date: Sun, 19 Jul 2015 17:11:37 +0900
Subject: [PATCH v3] sched: modify how to compute a slice and check preemptibility

Make the cfs scheduler use the rq-level nr_running to compute the
period in the CONFIG_FAIR_GROUP_SCHED case. Using the local cfs_rq's
nr_running to get the period is inconsistent. For example, imagine the
cgroup structure below.

root(=rq.cfs)--group1----a
                     |---b
                     |---c
                     |---d
                     |---e
                     |---f
                     |---g
                     |---h
                     |---i
                     |---j
                     |---k
                     |---l
                     |---m

In this case, group1's slice is not comparable to (a's slice + ... +
m's slice) with the current code, and it makes the code using
sum_exec_runtime behave oddly, too (a worked example follows the
patch). This happens because the current code does not use a
consistent rq-wide quantity to derive an rq-wide period.

In addition, modify the preemption check to ensure that a sched entity
runs for at least sysctl_sched_min_granularity before it can be
preempted.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/sched/fair.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 09456fc..41c619f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -635,7 +635,7 @@ static u64 __sched_period(unsigned long nr_running)
  */
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
+	u64 slice = __sched_period(rq_of(cfs_rq)->cfs.nr_running + !se->on_rq);
 
 	for_each_sched_entity(se) {
 		struct load_weight *load;
@@ -3226,6 +3226,13 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 	struct sched_entity *se;
 	s64 delta;
 
+	delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
+
+	/*
+	 * Ensure that a task executes at least for sysctl_sched_min_granularity
+	 */
+	if (delta_exec < sysctl_sched_min_granularity)
+		return;
+
 	ideal_runtime = sched_slice(cfs_rq, curr);
-	delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
 	if (delta_exec > ideal_runtime) {
@@ -3243,9 +3250,6 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 	 * narrow margin doesn't have to wait for a full slice.
 	 * This also mitigates buddy induced latencies under load.
 	 */
-	if (delta_exec < sysctl_sched_min_granularity)
-		return;
-
 	se = __pick_first_entity(cfs_rq);
 	delta = curr->vruntime - se->vruntime;
-- 
1.7.9.5
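For reference, the inconsistency described in the changelog can be
checked with the small user-space sketch below. It is an illustration,
not kernel code: sched_period() approximates __sched_period() with the
default knobs, and all entities are assumed to have equal nice-0
weight.

#include <stdio.h>

#define LATENCY_NS	6000000ULL	/* sysctl_sched_latency, default */
#define MIN_GRAN_NS	750000ULL	/* sysctl_sched_min_granularity */
#define NR_LATENCY	8		/* sched_nr_latency */

/* approximates __sched_period(): latency stretches once nr > 8 */
static unsigned long long sched_period(unsigned long nr_running)
{
	if (nr_running > NR_LATENCY)
		return nr_running * MIN_GRAN_NS;
	return LATENCY_NS;
}

int main(void)
{
	/* root cfs_rq runs 1 entity (group1); group1 runs 13 tasks a..m */
	unsigned long long group1   = sched_period(1);	     /* 6ms */
	unsigned long long task_old = sched_period(13) / 13; /* local nr_running */
	unsigned long long task_new = sched_period(1) / 13;  /* rq-level nr_running */

	printf("group1's slice:       %llu ns\n", group1);
	printf("a..m summed, old:     %llu ns\n", 13 * task_old);
	printf("a..m summed, patched: %llu ns\n", 13 * task_new);
	return 0;
}

With the old code the children's slices sum to 9.75ms while group1's
own slice is 6ms; with the patch both come out at ~6ms, which is the
consistency the changelog asks for.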