From: vpillai
To: Nishanth Aravamudan, Julien Desfossez, Peter Zijlstra, Tim Chen,
	mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
	torvalds@linux-foundation.org
Cc: Aaron Lu, linux-kernel@vger.kernel.org, fweisbec@gmail.com,
	keescook@chromium.org, kerrnel@google.com, Phil Auld, Aaron Lu,
	Aubrey Li, aubrey.li@linux.intel.com, Valentin Schneider, Mel Gorman,
	Pawan Gupta, Paolo Bonzini, Joel Fernandes, joel@joelfernandes.org,
	Aaron Lu
Subject: [RFC PATCH 09/13] sched/fair: core wide vruntime comparison
Date: Wed, 4 Mar 2020 16:59:59 +0000
Message-Id: <2f83d888890cec14be3a7aead0859dceebb4012f.1583332765.git.vpillai@digitalocean.com>
X-Mailer: git-send-email 2.17.1
X-Mailing-List: linux-kernel@vger.kernel.org

From: Aaron Lu

This patch provides a vruntime based way to compare the priority of two
cfs tasks, whether they are on the same cpu or on different threads of
the same core.

When the two tasks are on the same CPU, we just need to find a common
cfs_rq both sched_entities are on and then do the comparison.

When the two tasks are on different threads of the same core, the root
level sched_entities to which the two tasks belong will be used to do
the comparison.
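A minimal userspace sketch of the two cases described above may help when
reading the patch. It is not the kernel code: struct toy_se, its rq_id
field and toy_prio_less() are hypothetical stand-ins for struct
sched_entity, the entity's cfs_rq pointer and cfs_prio_less(), and it
assumes every entity on a given CPU eventually reaches a root-level
entity sharing one queue id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct sched_entity; all names here are hypothetical. */
struct toy_se {
	uint64_t vruntime;
	int depth;		/* 0 for a root-level entity */
	int rq_id;		/* which (toy) cfs_rq the entity is queued on */
	struct toy_se *parent;	/* NULL for a root-level entity */
};

/* Returns true when a has lower priority than b (a's vruntime is larger). */
static bool toy_prio_less(struct toy_se *a, struct toy_se *b, bool samecpu)
{
	if (samecpu) {
		/* Climb until both entities sit on the same queue. */
		while (a->rq_id != b->rq_id) {
			int a_depth = a->depth;
			int b_depth = b->depth;

			if (a_depth >= b_depth)
				a = a->parent;
			if (a_depth <= b_depth)
				b = b->parent;
			assert(a && b);	/* guards inconsistent toy data */
		}
	} else {
		/* Cross CPU: only the root-level entities are compared. */
		while (a->parent)
			a = a->parent;
		while (b->parent)
			b = b->parent;
	}

	/* Signed difference, so the test keeps working across wraparound. */
	return (int64_t)(a->vruntime - b->vruntime) > 0;
}

/* Builds a small two-CPU hierarchy and checks a few comparisons. */
static bool toy_selfcheck(void)
{
	/* cpu0: se1 and se2 at root level, se21 and se22 queued under se2 */
	struct toy_se se1  = { .vruntime = 150, .depth = 0, .rq_id = 0 };
	struct toy_se se2  = { .vruntime = 100, .depth = 0, .rq_id = 0 };
	struct toy_se se21 = { .vruntime = 50, .depth = 1, .rq_id = 2, .parent = &se2 };
	struct toy_se se22 = { .vruntime = 30, .depth = 1, .rq_id = 2, .parent = &se2 };
	/* cpu1: se6 at root level, se61 queued under se6 */
	struct toy_se se6  = { .vruntime = 200, .depth = 0, .rq_id = 10 };
	struct toy_se se61 = { .vruntime = 10, .depth = 1, .rq_id = 16, .parent = &se6 };

	/* Same CPU, same queue: se21 (50) against se22 (30). */
	bool same_queue = toy_prio_less(&se21, &se22, true);
	/* Same CPU, different depth: se21 climbs to se2 (100), against se1 (150). */
	bool diff_depth = toy_prio_less(&se21, &se1, true);
	/* Cross CPU: se2 (100) against se6 (200); leaf vruntimes are ignored. */
	bool cross_ab = toy_prio_less(&se21, &se61, false);
	bool cross_ba = toy_prio_less(&se61, &se21, false);

	return same_queue && !diff_depth && !cross_ab && cross_ba;
}
```

Calling toy_selfcheck() from a small main() exercises all three paths. The
point of the sketch is only that the cross-CPU branch never looks at the
leaf vruntimes, which is why the root-level entities need a shared
baseline.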

An ugly illustration for the cross CPU case:

   cpu0         cpu1
 /   |  \     /   |  \
se1 se2 se3  se4 se5 se6
    /  \            /   \
  se21 se22       se61  se62

Assume CPU0 and CPU1 are smt siblings and task A's se is se21 while
task B's se is se61. To compare priority of task A and B, we compare
priority of se2 and se6. The one with the smaller vruntime wins.

To make this work, the root level se should have a common cfs_rq min
vruntime, which I call the core cfs_rq min vruntime.

When we adjust the min_vruntime of rq->core, we need to propagate
that down the tree so as to not cause starvation of existing tasks
based on previous vruntime.

Signed-off-by: Aaron Lu
---
 kernel/sched/core.c  | 15 +------
 kernel/sched/fair.c  | 99 +++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h |  2 +
 3 files changed, 102 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9a1bd236044e..556bf054b896 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -119,19 +119,8 @@ static inline bool prio_less(struct task_struct *a, struct task_struct *b)
 	if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
 		return !dl_time_before(a->dl.deadline, b->dl.deadline);
 
-	if (pa == MAX_RT_PRIO + MAX_NICE) { /* fair */
-		u64 vruntime = b->se.vruntime;
-
-		/*
-		 * Normalize the vruntime if tasks are in different cpus.
-		 */
-		if (task_cpu(a) != task_cpu(b)) {
-			vruntime -= task_cfs_rq(b)->min_vruntime;
-			vruntime += task_cfs_rq(a)->min_vruntime;
-		}
-
-		return !((s64)(a->se.vruntime - vruntime) <= 0);
-	}
+	if (pa == MAX_RT_PRIO + MAX_NICE) /* fair */
+		return cfs_prio_less(a, b);
 
 	return false;
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d99ea6ee7af2..1c9a80d8dbb8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -449,9 +449,105 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
 
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
+static inline struct cfs_rq *root_cfs_rq(struct cfs_rq *cfs_rq)
+{
+	return &rq_of(cfs_rq)->cfs;
+}
+
+static inline bool is_root_cfs_rq(struct cfs_rq *cfs_rq)
+{
+	return cfs_rq == root_cfs_rq(cfs_rq);
+}
+
+static inline struct cfs_rq *core_cfs_rq(struct cfs_rq *cfs_rq)
+{
+	return &rq_of(cfs_rq)->core->cfs;
+}
+
 static inline u64 cfs_rq_min_vruntime(struct cfs_rq *cfs_rq)
 {
-	return cfs_rq->min_vruntime;
+	if (!sched_core_enabled(rq_of(cfs_rq)))
+		return cfs_rq->min_vruntime;
+
+	if (is_root_cfs_rq(cfs_rq))
+		return core_cfs_rq(cfs_rq)->min_vruntime;
+	else
+		return cfs_rq->min_vruntime;
+}
+
+static void coresched_adjust_vruntime(struct cfs_rq *cfs_rq, u64 delta)
+{
+	struct sched_entity *se, *next;
+
+	if (!cfs_rq)
+		return;
+
+	cfs_rq->min_vruntime -= delta;
+	rbtree_postorder_for_each_entry_safe(se, next,
+			&cfs_rq->tasks_timeline.rb_root, run_node) {
+		if (se->vruntime > delta)
+			se->vruntime -= delta;
+		if (se->my_q)
+			coresched_adjust_vruntime(se->my_q, delta);
+	}
+}
+
+static void update_core_cfs_rq_min_vruntime(struct cfs_rq *cfs_rq)
+{
+	struct cfs_rq *cfs_rq_core;
+
+	if (!sched_core_enabled(rq_of(cfs_rq)))
+		return;
+
+	if (!is_root_cfs_rq(cfs_rq))
+		return;
+
+	cfs_rq_core = core_cfs_rq(cfs_rq);
+	if (cfs_rq_core != cfs_rq &&
+	    cfs_rq->min_vruntime < cfs_rq_core->min_vruntime) {
+		u64 delta = cfs_rq_core->min_vruntime - cfs_rq->min_vruntime;
+		coresched_adjust_vruntime(cfs_rq_core, delta);
+	}
+}
+
+bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
+{
+	struct sched_entity *sea = &a->se;
+	struct sched_entity *seb = &b->se;
+	bool samecpu = task_cpu(a) == task_cpu(b);
+	struct task_struct *p;
+	s64 delta;
+
+	if (samecpu) {
+		/* vruntime is per cfs_rq */
+		while (!is_same_group(sea, seb)) {
+			int sea_depth = sea->depth;
+			int seb_depth = seb->depth;
+
+			if (sea_depth >= seb_depth)
+				sea = parent_entity(sea);
+			if (sea_depth <= seb_depth)
+				seb = parent_entity(seb);
+		}
+
+		delta = (s64)(sea->vruntime - seb->vruntime);
+		goto out;
+	}
+
+	/* crosscpu: compare root level se's vruntime to decide priority */
+	while (sea->parent)
+		sea = sea->parent;
+	while (seb->parent)
+		seb = seb->parent;
+	delta = (s64)(sea->vruntime - seb->vruntime);
+
+out:
+	p = delta > 0 ? b : a;
+	trace_printk("picked %s/%d %s: %Ld %Ld %Ld\n", p->comm, p->pid,
+			samecpu ? "samecpu" : "crosscpu",
+			sea->vruntime, seb->vruntime, delta);
+
+	return delta > 0;
 }
 
 static __always_inline
@@ -511,6 +607,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 
 	/* ensure we never gain time by being placed backwards. */
 	cfs_rq->min_vruntime = max_vruntime(cfs_rq_min_vruntime(cfs_rq), vruntime);
+	update_core_cfs_rq_min_vruntime(cfs_rq);
 #ifndef CONFIG_64BIT
 	smp_wmb();
 	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a829e26fa43a..ef9e08e5da6a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2561,6 +2561,8 @@ static inline bool sched_energy_enabled(void) { return false; }
 
 #endif /* CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
 
+bool cfs_prio_less(struct task_struct *a, struct task_struct *b);
+
 #ifdef CONFIG_MEMBARRIER
 /*
  * The scheduler provides memory barriers required by membarrier between:
-- 
2.17.1
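
For readers following the min_vruntime adjustment in the patch above, here
is a rough userspace sketch of the same idea: subtracting one delta from a
queue and everything below it keeps the relative order of the entities
intact. It is only a model under stated assumptions. struct toy_rq,
struct toy_se, toy_adjust_vruntime() and toy_vruntime_delta() are
hypothetical names, a fixed array stands in for the rbtree walk, and the
cast to int64_t is assumed to wrap the way it does on the usual
two's-complement compilers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SE 4

struct toy_rq;

/* Toy entity: a task, or a group that owns a child queue (my_q). */
struct toy_se {
	uint64_t vruntime;
	struct toy_rq *my_q;	/* NULL for a plain task */
};

/* Toy queue: a flat array stands in for the kernel's rbtree. */
struct toy_rq {
	uint64_t min_vruntime;
	size_t nr;
	struct toy_se *se[TOY_MAX_SE];
};

/* Shift a queue and everything queued below it back by delta. */
static void toy_adjust_vruntime(struct toy_rq *rq, uint64_t delta)
{
	if (!rq)
		return;

	assert(rq->nr <= TOY_MAX_SE);
	rq->min_vruntime -= delta;
	for (size_t i = 0; i < rq->nr; i++) {
		struct toy_se *se = rq->se[i];

		/* Mirrors the patch: entities at or below delta are left alone. */
		if (se->vruntime > delta)
			se->vruntime -= delta;
		if (se->my_q)
			toy_adjust_vruntime(se->my_q, delta);
	}
}

/* Signed distance between two vruntimes; tolerant of u64 wraparound. */
static int64_t toy_vruntime_delta(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b);
}

static bool toy_adjust_selfcheck(void)
{
	struct toy_rq child = { .min_vruntime = 500 };
	struct toy_se c1 = { .vruntime = 520 };
	struct toy_se c2 = { .vruntime = 560 };
	struct toy_se task = { .vruntime = 1040 };
	struct toy_se group = { .vruntime = 1100, .my_q = &child };
	struct toy_rq root = { .min_vruntime = 1000 };
	int64_t before, after;

	child.se[child.nr++] = &c1;
	child.se[child.nr++] = &c2;
	root.se[root.nr++] = &task;
	root.se[root.nr++] = &group;

	before = toy_vruntime_delta(task.vruntime, group.vruntime);
	toy_adjust_vruntime(&root, 300);
	after = toy_vruntime_delta(task.vruntime, group.vruntime);

	return root.min_vruntime == 700 &&
	       task.vruntime == 740 && group.vruntime == 800 &&
	       child.min_vruntime == 200 &&
	       c1.vruntime == 220 && c2.vruntime == 260 &&
	       before == after &&
	       /* 5 is "after" a value that sits just below the wrap point */
	       toy_vruntime_delta(5, UINT64_MAX - 4) == 10;
}
```

toy_adjust_selfcheck() returns true when every queue and entity moved by
the same amount and the signed distance between the task and the group is
unchanged, which is the property the cross-CPU comparison relies on.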