From: Paul Turner
Date: Tue, 1 Mar 2011 21:43:48 -0800
Subject: Re: [PATCH] sched: next buddy hint on sleep and preempt path
To: Venkatesh Pallipadi
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel@vger.kernel.org, Mike Galbraith, Rik van Riel
In-Reply-To: <1299022433-17233-1-git-send-email-venki@google.com>

On Tue, Mar 1, 2011 at 3:33 PM, Venkatesh Pallipadi wrote:
> When a task in a taskgroup sleeps, pick_next_task starts all the way back at
> the root and picks the task/taskgroup with the min vruntime across all
> runnable tasks. But, when there are many frequently sleeping tasks
> across different taskgroups, it makes better sense to stay with the same
> taskgroup for its slice period (or until all tasks in the taskgroup sleep)
> instead of switching cross taskgroup on each sleep after a short runtime.
> This helps specifically where taskgroups correspond to a process with
> multiple threads. The change reduces the number of CR3 switches in this case.
>
> Example:
> Two taskgroups with 2 threads each which are running for 2ms and
> sleeping for 1ms. Looking at sched:sched_switch shows -
>
> BEFORE: taskgroup_1 threads [5004, 5005], taskgroup_2 threads [5016, 5017]
>      cpu-soaker-5004  [003]  3683.391089
>      cpu-soaker-5016  [003]  3683.393106
>      cpu-soaker-5005  [003]  3683.395119
>      cpu-soaker-5017  [003]  3683.397130
>      cpu-soaker-5004  [003]  3683.399143
>      cpu-soaker-5016  [003]  3683.401155
>      cpu-soaker-5005  [003]  3683.403168
>      cpu-soaker-5017  [003]  3683.405170
>
> AFTER: taskgroup_1 threads [21890, 21891], taskgroup_2 threads [21934, 21935]
>      cpu-soaker-21890 [003]   865.895494
>      cpu-soaker-21935 [003]   865.897506
>      cpu-soaker-21934 [003]   865.899520
>      cpu-soaker-21935 [003]   865.901532
>      cpu-soaker-21934 [003]   865.903543
>      cpu-soaker-21935 [003]   865.905546
>      cpu-soaker-21891 [003]   865.907548
>      cpu-soaker-21890 [003]   865.909560
>      cpu-soaker-21891 [003]   865.911571
>      cpu-soaker-21890 [003]   865.913582
>      cpu-soaker-21891 [003]   865.915594
>      cpu-soaker-21934 [003]   865.917606
>
> A similar problem exists when there are multiple taskgroups and, say, a task A
> preempts currently running task B of taskgroup_1. On schedule, pick_next_task
> can pick an unrelated task on taskgroup_2. Here it would be better to give some
> preference to task B on pick_next_task.
>
> A simple (maybe extreme case) benchmark I tried was tbench with 2 tbench
> client processes with 2 threads each running on a single CPU. Avg throughput
> across five 50 sec runs was -
> BEFORE: 105.84 MB/sec
> AFTER: 112.42 MB/sec
>
> Signed-off-by: Venkatesh Pallipadi
> ---
>  kernel/sched_fair.c |   20 ++++++++++++++++++--
>  1 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 3a88dee..36e8f02 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -1339,6 +1339,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>         hrtick_update(rq);
>  }
>
> +static void set_next_buddy(struct sched_entity *se);
> +
>  /*
>   * The dequeue_task method is called before nr_running is
>   * decreased. We remove the task from the rbtree and
> @@ -1348,14 +1350,22 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  {
>         struct cfs_rq *cfs_rq;
>         struct sched_entity *se = &p->se;
> +       int task_flags = flags;

simpler: int voluntary = flags & DEQUEUE_SLEEP;

>
>         for_each_sched_entity(se) {
>                 cfs_rq = cfs_rq_of(se);
>                 dequeue_entity(cfs_rq, se, flags);
>
>                 /* Don't dequeue parent if it has other entities besides us */
> -               if (cfs_rq->load.weight)
> +               if (cfs_rq->load.weight) {
> +                       /*
> +                        * Bias pick_next to pick a task from this cfs_rq, as
> +                        * p is sleeping when it is within its sched_slice.
> +                        */
> +                       if (task_flags & DEQUEUE_SLEEP && se->parent)
> +                               set_next_buddy(se->parent);

re-using the last_buddy would seem like a more natural fit here; it also
doesn't have a clobber race with a wakeup

>                         break;
> +               }
>                 flags |= DEQUEUE_SLEEP;
>         }
>
> @@ -1887,8 +1897,14 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
>         update_curr(cfs_rq);
>         find_matching_se(&se, &pse);
>         BUG_ON(!pse);
> -       if (wakeup_preempt_entity(se, pse) == 1)
> +       if (wakeup_preempt_entity(se, pse) == 1) {
> +               /*
> +                * Bias pick_next to pick the sched entity that is
> +                * triggering this preemption.
> +                */
> +               set_next_buddy(pse);

this probably wants some sort of unification with the scale-based next
buddy above

>                 goto preempt;
> +       }
>
>         return;
>
> --
> 1.7.3.1
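
For anyone who wants to play with the idea outside the kernel, here is a rough userspace sketch of a buddy hint on the sleep path. It is a toy model, not the patch and not kernel code: toy_group, toy_rq, toy_dequeue_task, toy_pick_next, TOY_DEQUEUE_SLEEP and the gran slack are all made-up names, and the pick rule (take the hinted group unless its vruntime is more than gran ahead of the leftmost runnable group) is a simplification assumed purely for illustration. It does use the "voluntary" form of the flag test suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for DEQUEUE_SLEEP; the value is an assumption. */
#define TOY_DEQUEUE_SLEEP 0x1

/* Toy group entity on a single root runqueue (not struct sched_entity). */
struct toy_group {
	uint64_t vruntime;	/* virtual runtime of the group */
	unsigned int nr_running;	/* runnable tasks left in the group */
};

/* Toy runqueue: a flat array instead of an rbtree, plus one buddy hint. */
struct toy_rq {
	struct toy_group *groups;
	size_t nr_groups;
	struct toy_group *next;	/* buddy hint, NULL when unset */
	uint64_t gran;		/* how far ahead of leftmost a buddy may be */
};

/* A task of @grp leaves the runqueue; hint the group on a voluntary sleep. */
static void toy_dequeue_task(struct toy_rq *rq, struct toy_group *grp, int flags)
{
	int voluntary = flags & TOY_DEQUEUE_SLEEP;

	assert(grp->nr_running > 0);
	grp->nr_running--;

	/* Only bias the next pick if the group still has runnable tasks. */
	if (grp->nr_running && voluntary)
		rq->next = grp;
}

/* Runnable group with the smallest vruntime, or NULL if none is runnable. */
static struct toy_group *toy_leftmost(const struct toy_rq *rq)
{
	struct toy_group *left = NULL;
	size_t i;

	for (i = 0; i < rq->nr_groups; i++) {
		struct toy_group *g = &rq->groups[i];

		if (!g->nr_running)
			continue;
		if (!left || g->vruntime < left->vruntime)
			left = g;
	}
	return left;
}

/* Pick the hinted group if it is not too far ahead; the hint is one-shot. */
static struct toy_group *toy_pick_next(struct toy_rq *rq)
{
	struct toy_group *left = toy_leftmost(rq);
	struct toy_group *hint = rq->next;

	rq->next = NULL;
	if (!left)
		return NULL;
	if (hint && hint->nr_running &&
	    hint->vruntime <= left->vruntime + rq->gran)
		return hint;
	return left;
}
```

With two groups at vruntime 100 and 90 and a gran of 20, a voluntary sleep in the first group makes the next toy_pick_next() stay with that group even though the second one is leftmost. The hint is consumed by that pick, so the following pick goes back to plain min-vruntime order.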