Date: Fri, 2 Nov 2018 09:53:14 +0000
From: Patrick Bellasi
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli, Todd Kjos, Suren Baghdasaryan, Aaron Lu, Ye Xiaolong, Ingo Molnar
Subject: Re: [PATCH] sched/fair: util_est: fix cpu_util_wake for execl
Message-ID: <20181102095314.GB31275@e110439-lin>
References: <20181030160947.19581-1-patrick.bellasi@arm.com> <20181031184527.GA3178@hirez.programming.kicks-ass.net>
In-Reply-To: <20181031184527.GA3178@hirez.programming.kicks-ass.net>
X-Mailing-List:
linux-kernel@vger.kernel.org

On 31-Oct 19:45, Peter Zijlstra wrote:
> On Tue, Oct 30, 2018 at 04:09:47PM +0000, Patrick Bellasi wrote:
>
> > Let's fix this by ensuring to always discount the task estimated
> > utilization from the CPU's estimated utilization when the task is also
> > the current one. The same benchmark of the bug report, executed on a
> > dual socket 40 CPUs Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz machine,
> > reports these "Execl Throughput" figures (higher the better):
>
> Before this we have:
>
> 	/* Discount task's blocked util from CPU's util */
> 	util -= min_t(unsigned int, util, task_util(p));
>
> at the very least that comment is now inaccurate, since @p might not be
> blocked.

Right... will fix this too.

> > @@ -6258,8 +6267,17 @@ static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
> >  	 * covered by the following code when estimated utilization is
> >  	 * enabled.
> >  	 */
> > -	if (sched_feat(UTIL_EST))
> > -		util = max(util, READ_ONCE(cfs_rq->avg.util_est.enqueued));
> > +	if (sched_feat(UTIL_EST)) {
> > +		unsigned int estimated =
> > +			READ_ONCE(cfs_rq->avg.util_est.enqueued);
> > +
> > +		if (unlikely(current == p || task_on_rq_queued(p))) {
>
> I'm confused by the need for 'current == p', afaict task_on_rq_queued(p)
> is sufficient -- we've already established task_cpu(p) == cpu earlier.

Mmm... you're right, I got confused by the fact that current is
removed from the RBTree, but we keep tracking it as:

   on_rq = TASK_ON_RQ_QUEUED

... unless select_task_rq_fair() races with LB's:

   detach_task()
      p->on_rq = TASK_ON_RQ_MIGRATING;
      -----------------------------------A
      deactivate_task()                   \
         dequeue_task()                    +- RaceTime
            util_est_dequeue()            /
      -----------------------------------B
      set_task_cpu()
         migrate_task_rq{_fair}()
            detach_entity_cfs_rq()

where, in [A..B], we will still fail to discount *p's estimated
utilization. :/

Do you think we can live with that for the time being, maybe by just
adding a comment, or should we try to close that too?
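
To make the [A..B] window concrete, here is a minimal userspace sketch of
the discount logic from the hunk above. It is only a model under stated
assumptions, not the kernel code: struct model_task, its fields, the
helper names and MODEL_UTIL_AVG_UNCHANGED (assumed to be an LSB flag) are
all hypothetical stand-ins for the kernel state named in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state discussed in the thread. */
struct model_task {
	unsigned int util_est;	/* models _task_util_est(p) */
	bool queued;		/* models task_on_rq_queued(p) */
	bool is_current;	/* models (current == p) */
};

/* Assumption: the "unchanged" flag lives in the least significant bit. */
#define MODEL_UTIL_AVG_UNCHANGED 0x1U

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static unsigned int max_uint(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Models the patched hunk: when p is current or queued, remove p's
 * estimated utilization from the CPU's enqueued estimate (clamped at
 * zero), then return the larger of util and the remaining estimate.
 */
static unsigned int model_cpu_util_wake(unsigned int util,
					unsigned int enqueued,
					const struct model_task *p)
{
	unsigned int estimated = enqueued;

	if (p->is_current || p->queued)
		estimated -= min_uint(estimated,
				      p->util_est | MODEL_UTIL_AVG_UNCHANGED);

	return max_uint(util, estimated);
}
```

In this model the race window corresponds to queued == false while
enqueued still contains p's contribution: with is_current == false
nothing is discounted, whereas is_current == true restores the discount.
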

In any case, the (current == p) check, maybe moved to the right of the
OR condition above, should certainly close the race window for the
specific UnixBench execl case.

Assuming, for example, that the execl is executed by a misfit task which
is the target of an active load balance...

> > +			estimated -= min_t(unsigned int, estimated,
> > +					   (_task_util_est(p) | UTIL_AVG_UNCHANGED));
> > +		}
> > +
> > +		util = max(util, estimated);
> > +	}
>
> Also, I think it is about time we find a suitable name for:
>
> 	#define xxx(_var, _val) do {		\

remove_contrib(_var, _val) ?

> 		typeof(_var) var = (_var);	\
> 		typeof(_var) val = (_val);	\
> 		typeof(_var) res = var - val;	\
> 		if (res > var)			\
> 			res = 0;		\
> 		(_var) = res;			\
> 	} while (0)
>
> Which is basically sub_positive() but without the READ_ONCE/WRITE_ONCE
> stuff.

Perhaps there are still some paths where sub_positive() can be
reused... I'll look further into that and see what we can do on the
polishing side.

However, I'll keep all of that in a separate patch.

> We do that:
>
> 	var -= min_t(typeof(var), var, val);
>
> pattern _all_ over.

Cheers,
Patrick

-- 
Patrick Bellasi
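
For reference, the macro quoted above can be exercised in userspace as
follows. This is just a sketch: remove_contrib() is only the name
proposed in this thread, not an existing kernel helper, and discount()
is a hypothetical wrapper added so the behaviour can be checked. It
relies on the GNU __typeof__ extension and is only meaningful for
unsigned types, where an underflow wraps around and shows up as
res > var.

```c
/*
 * Saturating subtraction for unsigned lvalues: _var -= _val, clamped at
 * zero. Name taken from the proposal in the thread; not a kernel API.
 * Note: the macro's locals are called var/val/res, so callers must not
 * pass expressions that use those names.
 */
#define remove_contrib(_var, _val) do {		\
	__typeof__(_var) var = (_var);		\
	__typeof__(_var) val = (_val);		\
	__typeof__(_var) res = var - val;	\
	if (res > var)	/* wrapped around */	\
		res = 0;			\
	(_var) = res;				\
} while (0)

/* Hypothetical wrapper: returns util with contrib removed, floor 0. */
static unsigned int discount(unsigned int util, unsigned int contrib)
{
	remove_contrib(util, contrib);
	return util;
}
```

For unsigned operands this gives the same result as the open-coded
"var -= min_t(typeof(var), var, val)" pattern, without the
READ_ONCE/WRITE_ONCE that sub_positive() adds.
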