Message-Id: <20190218173514.667598558@infradead.org>
User-Agent: quilt/0.65
Date: Mon, 18 Feb 2019 17:56:33 +0100
From: Peter Zijlstra
To: mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
    tim.c.chen@linux.intel.com, torvalds@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, subhra.mazumdar@oracle.com,
    fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com,
    "Peter Zijlstra (Intel)"
Subject: [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling.
References: <20190218165620.383905466@infradead.org>

Instead of only selecting a local task, select a task for all SMT
siblings for every reschedule on the core (irrespective of which
logical CPU does the reschedule).

NOTE: there is still potential for sibling rivalry.
NOTE: this is far too complicated; but thus far I've failed to simplify
      it further.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/core.c  | 222 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h |   5 -
 2 files changed, 224 insertions(+), 3 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3552,7 +3552,7 @@ static inline void schedule_debug(struct
  * Pick up the highest-prio task:
  */
 static inline struct task_struct *
-pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+__pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
        const struct sched_class *class;
        struct task_struct *p;
@@ -3597,6 +3597,220 @@ pick_next_task(struct rq *rq, struct tas
        BUG();
 }
 
+#ifdef CONFIG_SCHED_CORE
+
+static inline bool cookie_match(struct task_struct *a, struct task_struct *b)
+{
+       if (is_idle_task(a) || is_idle_task(b))
+               return true;
+
+       return a->core_cookie == b->core_cookie;
+}
+
+// XXX fairness/fwd progress conditions
+static struct task_struct *
+pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *max)
+{
+       struct task_struct *class_pick, *cookie_pick;
+       unsigned long cookie = 0UL;
+
+       /*
+        * We must not rely on rq->core->core_cookie here, because we fail to reset
+        * rq->core->core_cookie on new picks, such that we can detect if we need
+        * to do single vs multi rq task selection.
+        */
+
+       if (max && max->core_cookie) {
+               WARN_ON_ONCE(rq->core->core_cookie != max->core_cookie);
+               cookie = max->core_cookie;
+       }
+
+       class_pick = class->pick_task(rq);
+       if (!cookie)
+               return class_pick;
+
+       cookie_pick = sched_core_find(rq, cookie);
+       if (!class_pick)
+               return cookie_pick;
+
+       /*
+        * If class > max && class > cookie, it is the highest priority task on
+        * the core (so far) and it must be selected, otherwise we must go with
+        * the cookie pick in order to satisfy the constraint.
+        */
+       if (cpu_prio_less(cookie_pick, class_pick) && cpu_prio_less(max, class_pick))
+               return class_pick;
+
+       return cookie_pick;
+}
+
+static struct task_struct *
+pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+{
+       struct task_struct *next, *max = NULL;
+       const struct sched_class *class;
+       const struct cpumask *smt_mask;
+       int i, j, cpu;
+
+       if (!sched_core_enabled(rq))
+               return __pick_next_task(rq, prev, rf);
+
+       /*
+        * If there were no {en,de}queues since we picked (IOW, the task
+        * pointers are all still valid), and we haven't scheduled the last
+        * pick yet, do so now.
+        */
+       if (rq->core->core_pick_seq == rq->core->core_task_seq &&
+           rq->core->core_pick_seq != rq->core_sched_seq) {
+               WRITE_ONCE(rq->core_sched_seq, rq->core->core_pick_seq);
+
+               next = rq->core_pick;
+               if (next != prev) {
+                       put_prev_task(rq, prev);
+                       set_next_task(rq, next);
+               }
+               return next;
+       }
+
+       prev->sched_class->put_prev_task(rq, prev, rf);
+       if (!rq->nr_running)
+               newidle_balance(rq, rf);
+
+       cpu = cpu_of(rq);
+       smt_mask = cpu_smt_mask(cpu);
+
+       /*
+        * core->core_task_seq, core->core_pick_seq, rq->core_sched_seq
+        *
+        * @task_seq guards the task state ({en,de}queues)
+        * @pick_seq is the @task_seq we did a selection on
+        * @sched_seq is the @pick_seq we scheduled
+        *
+        * However, preemptions can cause multiple picks on the same task set.
+        * 'Fix' this by also increasing @task_seq for every pick.
+        */
+       rq->core->core_task_seq++;
+
+       /* reset state */
+       for_each_cpu(i, smt_mask) {
+               struct rq *rq_i = cpu_rq(i);
+
+               rq_i->core_pick = NULL;
+
+               if (i != cpu)
+                       update_rq_clock(rq_i);
+       }
+
+       /*
+        * Try and select tasks for each sibling in descending sched_class
+        * order.
+        */
+       for_each_class(class) {
+again:
+               for_each_cpu_wrap(i, smt_mask, cpu) {
+                       struct rq *rq_i = cpu_rq(i);
+                       struct task_struct *p;
+
+                       if (rq_i->core_pick)
+                               continue;
+
+                       /*
+                        * If this sibling doesn't yet have a suitable task to
+                        * run, ask for the most eligible task, given the
+                        * highest priority task already selected for this
+                        * core.
+                        */
+                       p = pick_task(rq_i, class, max);
+                       if (!p) {
+                               /*
+                                * If there weren't any cookies, we don't need
+                                * to bother with the other siblings.
+                                */
+                               if (i == cpu && !rq->core->core_cookie)
+                                       goto next_class;
+
+                               continue;
+                       }
+
+                       /*
+                        * Optimize the 'normal' case where there aren't any
+                        * cookies and we don't need to sync up.
+                        */
+                       if (i == cpu && !rq->core->core_cookie && !p->core_cookie) {
+                               next = p;
+                               goto done;
+                       }
+
+                       rq_i->core_pick = p;
+
+                       /*
+                        * If this new candidate is of higher priority than the
+                        * previous, and they're incompatible, we need to wipe
+                        * the slate and start over.
+                        *
+                        * NOTE: this is a linear max-filter and is thus bounded
+                        * in execution time.
+                        */
+                       if (!max || core_prio_less(max, p)) {
+                               struct task_struct *old_max = max;
+
+                               rq->core->core_cookie = p->core_cookie;
+                               max = p;
+
+                               if (old_max && !cookie_match(old_max, p)) {
+                                       for_each_cpu(j, smt_mask) {
+                                               if (j == i)
+                                                       continue;
+
+                                               cpu_rq(j)->core_pick = NULL;
+                                       }
+                                       goto again;
+                               }
+                       }
+               }
+next_class:;
+       }
+
+       rq->core->core_pick_seq = rq->core->core_task_seq;
+
+       /*
+        * Reschedule siblings
+        *
+        * NOTE: L1TF -- at this point we're no longer running the old task and
+        * sending an IPI (below) ensures the sibling will no longer be running
+        * their task. This ensures there is no inter-sibling overlap between
+        * non-matching user state.
+        */
+       for_each_cpu(i, smt_mask) {
+               struct rq *rq_i = cpu_rq(i);
+
+               WARN_ON_ONCE(!rq_i->core_pick);
+
+               if (i == cpu)
+                       continue;
+
+               if (rq_i->curr != rq_i->core_pick)
+                       resched_curr(rq_i);
+       }
+
+       rq->core_sched_seq = rq->core->core_pick_seq;
+       next = rq->core_pick;
+
+done:
+       set_next_task(rq, next);
+       return next;
+}
+
+#else /* !CONFIG_SCHED_CORE */
+
+static struct task_struct *
+pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+{
+       return __pick_next_task(rq, prev, rf);
+}
+
+#endif /* CONFIG_SCHED_CORE */
+
 /*
  * __schedule() is the main scheduler function.
  *
@@ -5866,7 +6080,7 @@ static void migrate_tasks(struct rq *dea
 
        /*
         * pick_next_task() assumes pinned rq->lock:
         */
-       next = pick_next_task(rq, &fake_task, rf);
+       next = __pick_next_task(rq, &fake_task, rf);
        BUG_ON(!next);
        put_prev_task(rq, next);
 
@@ -6322,7 +6536,11 @@ void __init sched_init(void)
 
 #ifdef CONFIG_SCHED_CORE
                rq->core = NULL;
+               rq->core_pick = NULL;
                rq->core_enabled = 0;
+               rq->core_tree = RB_ROOT;
+
+               rq->core_cookie = 0UL;
 #endif
        }
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -960,11 +960,15 @@ struct rq {
 #ifdef CONFIG_SCHED_CORE
        /* per rq */
        struct rq               *core;
+       struct task_struct      *core_pick;
        unsigned int            core_enabled;
+       unsigned int            core_sched_seq;
        struct rb_root          core_tree;
 
        /* shared state */
        unsigned int            core_task_seq;
+       unsigned int            core_pick_seq;
+       unsigned long           core_cookie;
 #endif
 };
 
@@ -1770,7 +1774,6 @@ static inline void put_prev_task(struct
 
 static inline void set_next_task(struct rq *rq, struct task_struct *next)
 {
-       WARN_ON_ONCE(rq->curr != next);
        next->sched_class->set_next_task(rq, next);
 }
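
For readers following along: the core-wide selection loop above is easier
to see in isolation. Below is a minimal, standalone userspace model of it.
This is a sketch, not kernel code: all "toy_" names and the runqueue
contents are invented, and plain "higher wins" integer priorities stand in
for the sched_class/priority ordering that cpu_prio_less() and
core_prio_less() implement in the patch.

/*
 * toy_coresched.c - NOT kernel code; a userspace model of the core-wide
 * pick loop in the patch above, under the simplifications noted in the
 * accompanying text.
 *
 * Build: cc -o toy_coresched toy_coresched.c
 */
#include <stdio.h>
#include <stdbool.h>

#define NR_SIBLINGS 2
#define NR_TASKS    2

struct toy_task {
        const char *name;
        int prio;             /* higher wins in this model */
        unsigned long cookie; /* 0 == no cookie; matches anything */
};

/* Per-sibling runqueues, sorted by descending prio; idle comes last. */
static struct toy_task rqs[NR_SIBLINGS][NR_TASKS] = {
        { { "A0", 2, 1 }, { "idle", 0, 0 } },
        { { "B0", 3, 2 }, { "idle", 0, 0 } },
};

static bool cookie_match(const struct toy_task *a, const struct toy_task *b)
{
        return !a->cookie || !b->cookie || a->cookie == b->cookie;
}

/* Condensed pick_task(): class pick vs cookie pick, given current @max. */
static struct toy_task *pick_task(int cpu, const struct toy_task *max)
{
        struct toy_task *class_pick = &rqs[cpu][0]; /* highest prio overall */
        struct toy_task *cookie_pick = NULL;

        if (!max || !max->cookie)
                return class_pick;

        for (int t = 0; t < NR_TASKS; t++) {
                if (cookie_match(&rqs[cpu][t], max)) {
                        cookie_pick = &rqs[cpu][t];
                        break;
                }
        }

        /*
         * A class pick that beats both the cookie pick and the current
         * core-wide max is allowed through; the caller then restarts
         * with it as the new max.
         */
        if (class_pick->prio > cookie_pick->prio && class_pick->prio > max->prio)
                return class_pick;

        return cookie_pick;
}

int main(void)
{
        struct toy_task *pick[NR_SIBLINGS] = { NULL };
        struct toy_task *max = NULL;

again:
        for (int i = 0; i < NR_SIBLINGS; i++) {
                struct toy_task *p;

                if (pick[i])
                        continue;

                p = pick_task(i, max);
                pick[i] = p;

                /*
                 * Linear max-filter: a new higher-prio pick with an
                 * incompatible cookie wipes earlier siblings' picks.
                 */
                if (!max || p->prio > max->prio) {
                        const struct toy_task *old_max = max;

                        max = p;
                        if (old_max && !cookie_match(old_max, p)) {
                                for (int j = 0; j < NR_SIBLINGS; j++)
                                        if (j != i)
                                                pick[j] = NULL;
                                goto again;
                        }
                }
        }

        for (int i = 0; i < NR_SIBLINGS; i++)
                printf("sibling %d runs %-4s (cookie %lu)\n",
                       i, pick[i]->name, pick[i]->cookie);
        return 0;
}

Running it prints:

  sibling 0 runs idle (cookie 0)
  sibling 1 runs B0   (cookie 2)

i.e. once the higher-priority, incompatible B0 becomes the core-wide max,
sibling 0 may only run cookie-compatible tasks and is forced idle -- the
same wipe-and-restart path the 'goto again' in pick_next_task() takes.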
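The three sequence counters also take some staring at. Here is a similarly
hypothetical model of just the fast-path handshake that the
task_seq/pick_seq/sched_seq comment block describes (again, all names are
invented; in the kernel, task_seq and pick_seq live in the shared rq->core
state while sched_seq is per runqueue):

/*
 * toy_seq.c - NOT kernel code; a sketch of the core_task_seq /
 * core_pick_seq / core_sched_seq handshake under invented names.
 *
 * Build: cc -o toy_seq toy_seq.c
 */
#include <stdio.h>
#include <stdbool.h>

struct toy_core {
        unsigned int task_seq; /* bumped on every {en,de}queue (and pick) */
        unsigned int pick_seq; /* task_seq at the last core-wide selection */
};

struct toy_rq {
        struct toy_core *core;
        unsigned int sched_seq; /* the pick_seq this sibling last acted on */
};

/*
 * The fast-path test from pick_next_task(): reuse the existing core-wide
 * pick iff nothing was (de)queued since it was made and this sibling has
 * not yet scheduled it.
 */
static bool can_use_cached_pick(const struct toy_rq *rq)
{
        return rq->core->pick_seq == rq->core->task_seq &&
               rq->core->pick_seq != rq->sched_seq;
}

int main(void)
{
        struct toy_core core = { .task_seq = 7, .pick_seq = 0 };
        struct toy_rq rq0 = { &core, 0 }, rq1 = { &core, 0 };

        /* Sibling 0 performs the core-wide selection... */
        core.task_seq++;               /* every pick also bumps task_seq */
        core.pick_seq = core.task_seq;
        rq0.sched_seq = core.pick_seq; /* ...and schedules its own pick */

        /* The IPI'd sibling 1 reschedules and can use the cached pick: */
        printf("rq1 fast path: %d\n", can_use_cached_pick(&rq1)); /* 1 */

        rq1.sched_seq = core.pick_seq;
        printf("rq1 again:     %d\n", can_use_cached_pick(&rq1)); /* 0 */

        /* An enqueue elsewhere invalidates the cached pick for everyone: */
        core.task_seq++;
        printf("after enqueue: %d\n", can_use_cached_pick(&rq0)); /* 0 */

        return 0;
}

The first test succeeds (the sibling consumes the cached core_pick), the
second fails (it already scheduled that pick), and the third fails for
everyone because an enqueue advanced task_seq past pick_seq, forcing a
fresh core-wide selection.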