Date: Fri, 26 Apr 2019 11:43:28 +0100
From: Mel Gorman
To: Ingo Molnar
Cc: Aubrey Li, Julien Desfossez, Vineeth Remanan Pillai, Nishanth Aravamudan,
    Peter Zijlstra, Tim Chen, Thomas Gleixner, Paul Turner, Linus Torvalds,
    Linux List Kernel Mailing, Subhra Mazumdar, Frédéric Weisbecker,
    Kees Cook, Greg Kerr, Phil Auld, Aaron Lu, Valentin Schneider,
    Pawan Gupta, Paolo Bonzini, Jiri Kosina
Subject: Re: [RFC PATCH v2 00/17] Core scheduling v2
Message-ID: <20190426104328.GA18914@techsingularity.net>
References:
 <20190424140013.GA14594@sinkpad>
 <20190425095508.GA8387@gmail.com>
 <20190425144619.GX18914@techsingularity.net>
 <20190425185343.GA122353@gmail.com>
 <20190425213145.GY18914@techsingularity.net>
 <20190426084222.GC126896@gmail.com>
In-Reply-To: <20190426084222.GC126896@gmail.com>

On Fri, Apr 26, 2019 at 10:42:22AM +0200, Ingo Molnar wrote:
> > It should, but it's not perfect. For example, wake_affine_idle does not
> > take sibling activity into account even though select_idle_sibling *may*
> > take it into account. Even select_idle_sibling in its fast path may use
> > an SMT sibling instead of searching.
> >
> > There are also potential side-effects with cpuidle. Some workloads
> > migrate around the socket as they communicate because of how the
> > search for an idle CPU works. With SMT on, there is potentially a longer
> > opportunity for a core to reach a deep c-state and incur a bigger wakeup
> > latency. This is a very weak theory, but I've seen cases where
> > latency-sensitive workloads with only two communicating tasks are
> > affected by CPUs reaching low c-states due to migrations.
> >
> > > Clearly it doesn't.
> > >
> >
> > It's more that it's a best effort to wake up quickly instead of being
> > perfect by using an expensive search every time.
>
> Yeah, but your numbers suggest that for *most* not heavily interacting
> under-utilized CPU-bound workloads we hurt in the 5-10% range compared to
> no-SMT - more in some cases.
>

Indeed, it was higher than expected, and we can't even use the excuse that
more resources are available to a single logical CPU, as the scheduler is
meant to keep them apart.

> So we avoid a maybe 0.1% scheduler placement overhead but inflict 5-10%
> harm on the workload, and also blow up stddev by randomly co-scheduling
> two tasks on the same physical core? Not a good trade-off.
>
> I really think we should implement a relatively strict physical core
> placement policy in the under-utilized case, and resist any attempts to
> weaken this for special workloads that ping-pong quickly and benefit from
> sharing the same physical core.
>

It's worth a shot at least. Changes should mostly be in the wake_affine
path for most loads of interest.

> I.e. as long as load is kept below ~50% the SMT and !SMT benchmark
> results and stddev numbers should match up. (With a bit of leeway if the
> workload gets near to 50% or occasionally goes above it.)
>
> There's absolutely no excuse for these numbers at 30-40% load levels, I
> think.
>

Agreed. I'll put it on the todo list, but there is no way I'll get to it
in the short term due to LSF/MM. At a minimum I'll put some thought into
tooling to track how often siblings are used, with some reporting on when
a sibling was used while an idle core was available. That'll at least
quantify the problem and verify the hypothesis.

-- 
Mel Gorman
SUSE Labs
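
As a rough userspace illustration of the sibling-usage tooling discussed
above -- purely a sketch, where the one-second sampling interval, the 50%
busy threshold and the /proc/stat heuristic are assumptions rather than
anything proposed in the thread -- something along these lines could
estimate how often an SMT pair is fully busy while another whole core sits
idle:

#!/usr/bin/env python3
# Illustrative sketch only: sample /proc/stat and the sysfs topology and
# count intervals where some core has every SMT sibling busy while another
# core is completely idle.  Interval and busy threshold are assumptions.
import glob
import time

INTERVAL = 1.0       # seconds between samples (assumption)
BUSY_FRACTION = 0.5  # a CPU counts as busy if >50% non-idle over the interval

def parse_cpulist(text):
    """Parse a sysfs CPU list such as '0,4' or '0-1' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def cores():
    """Return physical cores, each as a frozenset of logical CPU numbers."""
    found = set()
    for path in glob.glob(
            "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"):
        with open(path) as f:
            found.add(frozenset(parse_cpulist(f.read())))
    return found

def cpu_times():
    """Return {cpu: (idle_ticks, total_ticks)} parsed from /proc/stat."""
    times = {}
    with open("/proc/stat") as f:
        for line in f:
            fields = line.split()
            if fields[0].startswith("cpu") and fields[0] != "cpu":
                vals = [int(v) for v in fields[1:]]
                idle = vals[3] + vals[4]          # idle + iowait
                times[int(fields[0][3:])] = (idle, sum(vals))
    return times

def main():
    topo = cores()
    prev = cpu_times()
    samples = stacked = 0
    while True:
        time.sleep(INTERVAL)
        cur = cpu_times()
        busy = set()
        for cpu, (idle, total) in cur.items():
            if cpu not in prev:
                continue                          # CPU came online mid-run
            didle = idle - prev[cpu][0]
            dtotal = total - prev[cpu][1]
            if dtotal > 0 and 1.0 - didle / dtotal > BUSY_FRACTION:
                busy.add(cpu)
        prev = cur
        samples += 1
        # Both siblings of some core busy, while another core is fully idle.
        sibling_stacked = any(len(c) > 1 and c <= busy for c in topo)
        core_idle = any(not (c & busy) for c in topo)
        if sibling_stacked and core_idle:
            stacked += 1
        print("samples=%d sibling-stacked-with-idle-core=%d (%.1f%%)"
              % (samples, stacked, 100.0 * stacked / samples))

if __name__ == "__main__":
    main()

A kernel-side or tracing-based approach would obviously be more accurate;
this only samples steady-state utilisation and misses short-lived wakeup
placements.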