Date: Fri, 15 May 2020 12:38:44 +0200
From: Peter Zijlstra
To: Vineeth Remanan Pillai
Cc: Aaron Lu, Nishanth Aravamudan, Julien Desfossez, Tim Chen,
    Ingo Molnar, Thomas Gleixner, Paul Turner, Linus Torvalds,
    Aaron Lu, Linux List Kernel Mailing, Frédéric Weisbecker,
    Kees Cook, Greg Kerr, Phil Auld, Aubrey Li, "Li, Aubrey",
    Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
    Joel Fernandes, Joel Fernandes
Subject: Re: [PATCH updated v2] sched/fair: core wide cfs task priority comparison
Message-ID: <20200515103844.GG2978@hirez.programming.kicks-ass.net>
References: <20200420080759.GA224731@ziqianlu-desktop.localdomain>
 <20200421025131.GA227300@aaronlu-desktop>
 <20200424142443.GA263207@aaronlu-desktop>
 <20200506143506.GH5298@hirez.programming.kicks-ass.net>
 <20200508084419.GA120223@aaronlu-desktop>
 <20200508090925.GV5298@hirez.programming.kicks-ass.net>
 <20200508123457.GA122180@aaronlu-desktop>
 <20200514130248.GD2940@hirez.programming.kicks-ass.net>

On Thu, May 14, 2020 at 06:51:27PM -0400, Vineeth Remanan Pillai wrote:
> On Thu, May 14, 2020 at 9:02 AM Peter Zijlstra wrote:
> >
> > A little something like so; this syncs min_vruntime when we switch to
> > single-queue mode. This is very much SMT2 only; I got my head in a
> > twist when thinking about more siblings, I'll have to try again later.
> >
> Thanks for the quick patch! :-)
>
> For SMT-n, would it work to sync the vruntime if at least one sibling
> is forced idle? Since force_idle is for all the rqs, I think it would
> be correct to sync the vruntime if at least one cpu is forced idle.

It's complicated ;-)

So this sync is basically a relative reset of S to 0.

With 2 queues, when one goes idle, we drop them both to 0; one then
increases due to not being idle, and the idle one builds up lag to get
re-elected. So far so simple, right?

When there are 3, we can have the situation where 2 run and one is
idle; we sync to 0 and let the idle one build up lag to get re-elected.
Now suppose another one also goes idle. At this point dropping all to 0
again would destroy the built-up lag from the queue that was already
idle; not good.

So instead of syncing everything, we can note that, with lag s = v - S:

  less := !((s64)(s_a - s_b) <= 0)

  (v_a - S_a) - (v_b - S_b) == v_a - v_b - S_a + S_b
                            == v_a - (v_b - S_b + S_a)

IOW, we can recast the (lag) comparison to a one-sided difference. So
then, instead of syncing the whole queue, we sync the idle queue
against the active queue with S_a and S_b taken at the point where we
sync.

(XXX consider the implication of living in a cyclic group: N / 2^n N)

This gives us a means of syncing single queues against the active
queue, and lets already-idle queues preserve their built-up lag.
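To make the recast concrete, here's a throwaway userspace check
(scratch code, nothing kernel about it; the names just mirror the math
above) that the two-sided lag comparison and the one-sided form always
agree, including when the values straddle the u64 wrap:

#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;
typedef int64_t s64;

/* two-sided: compare the per-queue lags s = v - S directly */
static int less_two_sided(u64 v_a, u64 S_a, u64 v_b, u64 S_b)
{
	return !((s64)((v_a - S_a) - (v_b - S_b)) <= 0);
}

/* one-sided: fold both baselines into b's side only */
static int less_one_sided(u64 v_a, u64 S_a, u64 v_b, u64 S_b)
{
	return !((s64)(v_a - (v_b - S_b + S_a)) <= 0);
}

int main(void)
{
	/* arbitrary samples; the second pair sits just below the u64 wrap */
	u64 v[] = { 100, 0xfffffffffffffff0ULL, 5000, 42 };
	u64 S[] = {  90, 0xffffffffffffff00ULL, 4000, 10 };
	int a, b;

	for (a = 0; a < 4; a++)
		for (b = 0; b < 4; b++)
			assert(less_two_sided(v[a], S[a], v[b], S[b]) ==
			       less_one_sided(v[a], S[a], v[b], S[b]));
	return 0;
}

Both forms of course only stay meaningful while the two lags are within
2^63 of one another; that's the cyclic group caveat above.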
Of course, then we get the situation where there are 2 active and one
going idle: which do we pick to sync against? Theory would have us sync
against the combined S, but as we've already demonstrated, there is no
such thing in infeasible-weight scenarios.

One thing I've considered, and this is where that core_active rudiment
came from, is having the active queues sync up between themselves after
every tick. This limits the observed divergence due to work
conservation.

On top of that, we can improve things by moving away from our horrible
(10) hack, moving to (9), and employing (13) here.

Anyway, I got partway through that in the past days, but then my head
hurt. I'll consider it some more :-)

> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c

> > -	if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
> > -		rq_i->core_forceidle = true;
> > +	if (is_idle_task(rq_i->core_pick)) {
> > +		if (rq_i->nr_running)
> > +			rq_i->core_forceidle = true;
> > +	} else {
> > +		new_active++;

> I think we need to reset new_active on restarting the selection.

But this loop runs after selection has been done; we don't modify
new_active during selection.

> > +	/*
> > +	 * We just dropped into single-rq mode, increment the sequence
> > +	 * count to trigger the vruntime sync.
> > +	 */
> > +	rq->core->core_sync_seq++;
> > +	}
> > +	rq->core->core_active = new_active;

> core_active seems to be unused.

Correct; those are rudiments from an SMT-n attempt.

> > +bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
> > +{
> > +	struct sched_entity *se_a = &a->se, *se_b = &b->se;
> > +	struct cfs_rq *cfs_rq_a, *cfs_rq_b;
> > +	u64 vruntime_a, vruntime_b;
> > +
> > +	while (!is_same_tg(se_a, se_b)) {
> > +		int se_a_depth = se_a->depth;
> > +		int se_b_depth = se_b->depth;
> > +
> > +		if (se_a_depth <= se_b_depth)
> > +			se_b = parent_entity(se_b);
> > +		if (se_a_depth >= se_b_depth)
> > +			se_a = parent_entity(se_a);
> > +	}
> > +
> > +	cfs_rq_a = cfs_rq_of(se_a);
> > +	cfs_rq_b = cfs_rq_of(se_b);
> > +
> > +	vruntime_a = se_a->vruntime - cfs_rq_a->core_vruntime;
> > +	vruntime_b = se_b->vruntime - cfs_rq_b->core_vruntime;

> Should we be using core_vruntime conditionally? Should it be
> min_vruntime for default comparisons and core_vruntime during
> force_idle?

At the very least it should be min_vruntime when cfs_rq_a == cfs_rq_b,
i.e. when we're on the same CPU.

For the other case I was considering that tick-based active sync, but
never got that finished and admittedly it all looks a bit weird. But I
figured I'd send it out so we can at least advance the discussion.

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -469,7 +469,7 @@ bool cfs_prio_less(struct task_struct *a
 {
 	struct sched_entity *se_a = &a->se, *se_b = &b->se;
 	struct cfs_rq *cfs_rq_a, *cfs_rq_b;
-	u64 vruntime_a, vruntime_b;
+	u64 s_a, s_b, S_a, S_b;
 
 	while (!is_same_tg(se_a, se_b)) {
 		int se_a_depth = se_a->depth;
@@ -484,10 +484,16 @@ bool cfs_prio_less(struct task_struct *a
 	cfs_rq_a = cfs_rq_of(se_a);
 	cfs_rq_b = cfs_rq_of(se_b);
 
-	vruntime_a = se_a->vruntime - cfs_rq_a->core_vruntime;
-	vruntime_b = se_b->vruntime - cfs_rq_b->core_vruntime;
+	S_a = cfs_rq_a->core_vruntime;
+	S_b = cfs_rq_b->core_vruntime;
 
-	return !((s64)(vruntime_a - vruntime_b) <= 0);
+	if (cfs_rq_a == cfs_rq_b)
+		S_a = S_b = cfs_rq_a->min_vruntime;
+
+	s_a = se_a->vruntime - S_a;
+	s_b = se_b->vruntime - S_b;
+
+	return !((s64)(s_a - s_b) <= 0);
 }
 
 static __always_inline
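For completeness, the depth-matching walk in cfs_prio_less() is easy to
model in isolation (the kernel already has a helper that does this same
walk, find_matching_se() in fair.c). A toy version, userspace scratch
code where the struct and its fields are made up and is_same_tg() is
approximated by a group id, showing how the two entities get walked up
to a common task group before their vruntimes are compared:

#include <assert.h>
#include <stddef.h>

struct se {
	struct se *parent;
	int depth;	/* root entities have depth 0 */
	int tg;		/* task group id; equal iff same group */
};

/* step the deeper entity up first; both when depths are equal */
static void find_matching(struct se **a, struct se **b)
{
	while ((*a)->tg != (*b)->tg) {
		int da = (*a)->depth, db = (*b)->depth;

		if (da <= db)
			*b = (*b)->parent;
		if (da >= db)
			*a = (*a)->parent;
	}
}

int main(void)
{
	/* two-level hierarchy: shared root group 0, child groups 1 and 2 */
	struct se root_a = { NULL, 0, 0 }, root_b = { NULL, 0, 0 };
	struct se leaf_a = { &root_a, 1, 1 }, leaf_b = { &root_b, 1, 2 };
	struct se *a = &leaf_a, *b = &leaf_b;

	find_matching(&a, &b);
	assert(a == &root_a && b == &root_b);
	return 0;
}

Stepping the deeper entity first is what guarantees the loop terminates
on entities of the same group, so the s_a/s_b comparison at the end of
cfs_prio_less() is between peers.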