Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp3225870yba; Sun, 28 Apr 2019 20:54:31 -0700 (PDT) X-Google-Smtp-Source: APXvYqwkM+fetULnQX8NliwWBPEAIR98xqdD3t7EHHX1OAHiS0/TAt5jTcXejTz9AM2agh6m05Kn X-Received: by 2002:a63:d343:: with SMTP id u3mr7020315pgi.285.1556510071882; Sun, 28 Apr 2019 20:54:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1556510071; cv=none; d=google.com; s=arc-20160816; b=OtTwEvI5QYsOSJY8Vf0gYI/fjuVk0kB6duKNJdkd+ESMurfkIv3I7cbOjoJBToSUnj PNNn/HvTWyhptXhZCHM5PUtmD1eSUdmdo/J6/u+uBa1zWiCMloIoxNRcNWKc+IZCBVdN ORg7GsiPBiJUK7lyoAdQvSGRIN/qa0g4AgwaCK2A68yrACcjRrfx1FBUmYcsR3wbb9lq 4W8IaQoyO6j17jUvbW0J8b91hhV8QFW7Aw6T+acvomTGiKkIfIEOTWTLD4pOpN3N0ypx Ghs/3ODq4LU8WHc8KrX0NH7VJuIJOaMCPzAqnUeZ5EQRRO3Hv3okHmwBToFhc525POvv ngFw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=zIivFdbfDULkdcpEQBfUI53MpMBf5riJCXN7kDek9f8=; b=tcqBEr6zXpNl7s7q4nP9cYNeJCfwAB/aOBjSasffeGYlR8lKaI9P73tvI5jZeEVisd AdPlKMRTht6Pzwskd9j//aJ7qkf5qOAyyscVFOhsxlpsFoRjeBMliIyjr5l/geEgU/JA Jiir0aDSUUEw24CvmLnDT+xS8rQbm4XaJ9N//oDOaBxP0r0TZm1SwkC2q7yGJnbfU/g2 zABF/jFnJ+ZOnefWtDj5Pcq5UPo98IQn5NV/meNqxleKV47wjkbHcYg8oGYbukD6q4c6 wzuiBk8Vbl/+4XSTqGpOuEtZsgNUqhRvtfwAzxxwseOO68LA0cYjJ35MaPgcjprneALm D+mQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id co15si2151517plb.330.2019.04.28.20.54.16; Sun, 28 Apr 2019 20:54:31 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726991AbfD2Dx2 (ORCPT + 99 others); Sun, 28 Apr 2019 23:53:28 -0400 Received: from out30-56.freemail.mail.aliyun.com ([115.124.30.56]:42448 "EHLO out30-56.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726819AbfD2Dx1 (ORCPT ); Sun, 28 Apr 2019 23:53:27 -0400 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R861e4;CH=green;DM=||false|;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01f04391;MF=aaron.lu@linux.alibaba.com;NM=1;PH=DS;RN=21;SR=0;TI=SMTPD_---0TQTE8IF_1556510001; Received: from aaronlu(mailfrom:aaron.lu@linux.alibaba.com fp:SMTPD_---0TQTE8IF_1556510001) by smtp.aliyun-inc.com(127.0.0.1); Mon, 29 Apr 2019 11:53:22 +0800 Date: Mon, 29 Apr 2019 11:53:21 +0800 From: Aaron Lu To: Vineeth Remanan Pillai Cc: Phil Auld , Nishanth Aravamudan , Julien Desfossez , Peter Zijlstra , Tim Chen , mingo@kernel.org, tglx@linutronix.de, pjt@google.com, torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, subhra.mazumdar@oracle.com, fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com, Aaron Lu , Aubrey Li , Valentin Schneider , Mel Gorman , Pawan Gupta , Paolo Bonzini Subject: Re: [RFC PATCH v2 00/17] Core scheduling v2 Message-ID: <20190429035320.GB128241@aaronlu> References: <20190423180238.GG22260@pauld.bos.csb> <20190423184527.6230-1-vpillai@digitalocean.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190423184527.6230-1-vpillai@digitalocean.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 23, 2019 at 06:45:27PM +0000, Vineeth Remanan Pillai wrote: > >> - Processes with different tags can still share the core > > > I may have missed something... Could you explain this statement? > > > This, to me, is the whole point of the patch series. If it's not > > doing this then ... what? > > What I meant was, the patch needs some more work to be accurate. > There are some race conditions where the core violation can still > happen. In our testing, we saw around 1 to 5% of the time being > shared with incompatible processes. One example of this happening > is as follows(let cpu 0 and 1 be siblings): > - cpu 0 selects a process with a cookie > - cpu 1 selects a higher priority process without cookie > - Selection process restarts for cpu 0 and it might select a > process with cookie but with lesser priority. > - Since it is lesser priority, the logic in pick_next_task > doesn't compare again for the cookie(trusts pick_task) and > proceeds. > > This is one of the scenarios that we saw from traces, but there > might be other race conditions as well. Fix seems a little > involved and We are working on that. This is what I have used to make sure no two unmatched tasks being scheduled on the same core: (on top of v1, I thinks it's easier to just show the diff instead of commenting on various places of the patches :-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index cb24a0141e57..0cdb1c6a00a4 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -186,6 +186,10 @@ struct task_struct *sched_core_find(struct rq *rq, unsigned long cookie) */ match = idle_sched_class.pick_task(rq); + /* TODO: untagged tasks are not in the core tree */ + if (!cookie) + goto out; + while (node) { node_task = container_of(node, struct task_struct, core_node); @@ -199,6 +203,7 @@ struct task_struct *sched_core_find(struct rq *rq, unsigned long cookie) } } +out: return match; } @@ -3634,6 +3639,8 @@ static inline bool cookie_match(struct task_struct *a, struct task_struct *b) } // XXX fairness/fwd progress conditions +// when max is unset, return class_pick; +// when max is set, return cookie_pick unless class_pick has higher priority. static struct task_struct * pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *max) { @@ -3652,7 +3659,19 @@ pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *ma } class_pick = class->pick_task(rq); - if (!cookie) + /* + * we can only return class_pick here when max is not set. + * + * when max is set and cookie is 0, we still have to check if + * class_pick's cookie matches with max, or we can end up picking + * an unmacthed task. e.g. max is untagged and class_pick here + * is tagged. + */ + if (!cookie && !max) + return class_pick; + + /* in case class_pick matches with max, no need to check priority */ + if (class_pick && cookie_match(class_pick, max)) return class_pick; cookie_pick = sched_core_find(rq, cookie); @@ -3663,8 +3682,11 @@ pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *ma * If class > max && class > cookie, it is the highest priority task on * the core (so far) and it must be selected, otherwise we must go with * the cookie pick in order to satisfy the constraint. + * + * class_pick and cookie_pick are on the same cpu so use cpu_prio_less() + * max and class_pick are on different cpus so use core_prio_less() */ - if (cpu_prio_less(cookie_pick, class_pick) && cpu_prio_less(max, class_pick)) + if (cpu_prio_less(cookie_pick, class_pick) && core_prio_less(max, class_pick)) return class_pick; return cookie_pick; @@ -3731,8 +3753,17 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) rq_i->core_pick = NULL; - if (i != cpu) + if (i != cpu) { update_rq_clock(rq_i); + /* + * we are going to pick tasks for both cpus, if our + * sibling is idle and we have core_cookie set, now + * is the time to clear/reset it so that we can do + * an unconstained pick. + */ + if (is_idle_task(rq_i->curr) && rq_i->core->core_cookie) + rq_i->core->core_cookie = 0; + } } /* @@ -3794,20 +3825,42 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) * * NOTE: this is a linear max-filter and is thus bounded * in execution time. + * + * The fact that pick_task() returns p with a different + * cookie means p has higher priority and we need to + * replace max with p. */ - if (!max || core_prio_less(max, p)) { + if (!max || !cookie_match(max, p)) { struct task_struct *old_max = max; rq->core->core_cookie = p->core_cookie; max = p; trace_printk("max: %s/%d %lx\n", max->comm, max->pid, max->core_cookie); - if (old_max && !cookie_match(old_max, p)) { + if (old_max) { for_each_cpu(j, smt_mask) { if (j == i) continue; cpu_rq(j)->core_pick = NULL; + + /* + * if max is untagged, then core_cookie + * is zero and siblig can do a wrongly + * unconstained pick. avoid that by doing + * pick directly here. since there is no + * untagged tasks in core tree, just + * use idle for our sibling. + * TODO: sibling may pick an untagged task. + */ + if (max->core_cookie) + cpu_rq(j)->core_pick = NULL; + else { + cpu_rq(j)->core_pick = idle_sched_class.pick_task(cpu_rq(j)); + occ = 1; + goto out; + } + } occ = 1; goto again; @@ -3817,6 +3870,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) next_class:; } +out: rq->core->core_pick_seq = rq->core->core_task_seq; /* @@ -3834,6 +3888,17 @@ next_class:; rq_i->core_pick->core_occupation = occ; + /* make sure we didn't break L1TF */ + if (!is_idle_task(rq_i->core_pick) && + rq_i->core_pick->core_cookie != rq_i->core->core_cookie) { + trace_printk("cpu%d: cookie mismatch. %s/%d/0x%lx/0x%lx\n", + rq_i->cpu, rq_i->core_pick->comm, + rq_i->core_pick->pid, + rq_i->core_pick->core_cookie, + rq_i->core->core_cookie); + WARN_ON_ONCE(1); + } + if (i == cpu) continue;