Date: Fri, 5 Apr 2019 22:55:32 +0800
From: Aaron Lu <aaron.lu@linux.alibaba.com>
To: Peter Zijlstra
Cc: mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
    tim.c.chen@linux.intel.com, torvalds@linux-foundation.org,
    linux-kernel@vger.kernel.org, subhra.mazumdar@oracle.com,
    fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com,
    Aubrey Li, Julien Desfossez
Subject: Re: [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling.
Message-ID: <20190405145530.GA453@aaronlu>
References: <20190218165620.383905466@infradead.org>
 <20190218173514.667598558@infradead.org>
 <20190402064612.GA46500@aaronlu>
 <20190402082812.GJ12232@hirez.programming.kicks-ass.net>
In-Reply-To: <20190402082812.GJ12232@hirez.programming.kicks-ass.net>

On Tue, Apr 02, 2019 at 10:28:12AM +0200, Peter Zijlstra wrote:
> Another approach would be something like the below:
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -87,7 +87,7 @@ static inline int __task_prio(struct tas
>   */
>  
>  /* real prio, less is less */
> -static inline bool __prio_less(struct task_struct *a, struct task_struct *b, bool runtime)
> +static inline bool __prio_less(struct task_struct *a, struct task_struct *b, u64 vruntime)
>  {
>  	int pa = __task_prio(a), pb = __task_prio(b);
>  
> @@ -104,21 +104,25 @@ static inline bool __prio_less(struct ta
>  	if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
>  		return !dl_time_before(a->dl.deadline, b->dl.deadline);
>  
> -	if (pa == MAX_RT_PRIO + MAX_NICE && runtime) /* fair */
> -		return !((s64)(a->se.vruntime - b->se.vruntime) < 0);
> +	if (pa == MAX_RT_PRIO + MAX_NICE) /* fair */
> +		return !((s64)(a->se.vruntime - vruntime) < 0);
                                                          ~~~
I think <= should be used here, so that two tasks with the same vruntime
will return false. Otherwise, with two tasks having different tags, we
could bounce between them: one gets picked as max in the first round and
the other as max in the next round, and the CPU would get stuck in
__schedule() with irqs disabled.
>  
>  	return false;
>  }
>  
>  static inline bool cpu_prio_less(struct task_struct *a, struct task_struct *b)
>  {
> -	return __prio_less(a, b, true);
> +	return __prio_less(a, b, b->se.vruntime);
>  }
>  
>  static inline bool core_prio_less(struct task_struct *a, struct task_struct *b)
>  {
> -	/* cannot compare vruntime across CPUs */
> -	return __prio_less(a, b, false);
> +	u64 vruntime = b->se.vruntime;
> +
> +	vruntime -= task_rq(b)->cfs.min_vruntime;
> +	vruntime += task_rq(a)->cfs.min_vruntime;

After some testing, I figured task_cfs_rq() should be used instead of
task_rq() :-)

With these two changes (and some other minor ones that still need more
time to sort out), I'm now able to run two full-CPU kbuilds in two
tagged cgroups. Previously, the system would hang pretty soon after I
started a kbuild in any tagged cgroup (presumably, CPUs stuck in
__schedule() with irqs disabled). And no warning appeared about two
tasks with different tags getting scheduled on the same CPU.

Thanks,
Aaron

> +
> +	return __prio_less(a, b, vruntime);
>  }
>  
>  static inline bool __sched_core_less(struct task_struct *a, struct task_struct *b)