Subject: Re: [PATCH 00/10] steal tasks to improve CPU utilization
To: Steven Sistare, Peter Zijlstra
Cc: mingo@redhat.com, subhra.mazumdar@oracle.com, dhaval.giani@oracle.com,
 daniel.m.jordan@oracle.com, pavel.tatashin@microsoft.com,
 matt@codeblueprint.co.uk, umgwanakikbuti@gmail.com, riel@redhat.com,
 jbacik@fb.com, juri.lelli@redhat.com, linux-kernel@vger.kernel.org
References: <1540220381-424433-1-git-send-email-steven.sistare@oracle.com>
 <20181022170421.GF3117@worktop.programming.kicks-ass.net>
 <8e38ce84-ec1a-aef7-4784-462ef754f62a@oracle.com>
From: Valentin Schneider
Date: Wed, 24 Oct 2018 16:34:34 +0100
In-Reply-To: <8e38ce84-ec1a-aef7-4784-462ef754f62a@oracle.com>

Hi,

On 22/10/2018 20:07, Steven Sistare wrote:
> On 10/22/2018 1:04 PM, Peter Zijlstra wrote:
[...]
>
> We could delete idle_balance() and use stealing exclusively for handling
> new idle. For each sd level, stealing would look for an overloaded CPU
> in the overloaded bitmap(s) that overlap that level. I played with that
> a little but it is not ready for prime time, and I did not want to hold
> the patch series for it. Also, I would like folks to get some production
> experience with stealing on a variety of architectures before considering
> a radical step like replacing idle_balance().
>

I think this could work fine for standard symmetrical systems, but I have
some concerns for asymmetric systems (Arm big.LITTLE & co).

One thing that should show up in 4.20-rc1 is the misfit logic, which caters
to those asymmetric systems. If you look at 757ffdd705ee ("sched/fair: Set
rq->rd->overload when misfit") on Linus' tree, we can set rq->rd->overload
even if (rq->nr_running == 1). This is because we do want to do an
idle_balance() when we have misfit tasks, which should lead to active
balancing one of those CPU-hungry tasks to move it to a more powerful CPU.

With a pure try_steal() approach, we won't do any active balancing - we
could steal some task from a cfs_overload_cpu but that's not what the load
balancer would have done. The load balancer would only do such a thing if
the imbalance type is group_overloaded, which means:

    sum_nr_running > group_weight &&
    group_util * sd->imbalance_pct > group_capacity * 100

(IOW the number of tasks running on the CPU is not the sole deciding factor)

Otherwise, misfit tasks (group_misfit_task imbalance type) would have
priority.
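For reference, that overload check is group_is_overloaded() in
kernel/sched/fair.c; from memory it looks more or less like this
(paraphrased, the exact form may differ between trees):

        static inline bool
        group_is_overloaded(struct lb_env *env, struct sg_lb_stats *sgs)
        {
                /* No more runnable tasks than CPUs in the group: not overloaded */
                if (sgs->sum_nr_running <= sgs->group_weight)
                        return false;

                /* Utilization above the group capacity, with the imbalance_pct margin */
                if ((sgs->group_capacity * 100) <
                                (sgs->group_util * env->sd->imbalance_pct))
                        return true;

                return false;
        }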
Perhaps we could decorate the cfs_overload_cpus with some more information
(e.g. misfit task presence), but then we'd have to add some logic to decide
when to steal what - see the rough sketch at the bottom of this mail for the
kind of thing I have in mind.

We'd also lose the NOHZ update done in idle_balance(), though I think it's
not such a big deal - we were piggy-backing this on idle_balance() just
because it happened to be convenient, and we still have NOHZ_STATS_KICK
anyway.

Another thing - in your test cases, what is the most prevalent cause of
failure to pull a task in idle_balance()? Is it load_balance() itself that
fails to find a task (e.g. because the imbalance is not deemed big enough),
or is it the idle migration cost logic that prevents load_balance() from
running to completion? (I've pasted the gates I mean at the bottom of this
mail.)

In the first case, try_steal() makes perfect sense to me. In the second
case, I'm not sure if we really want to pull something if we know (well,
we *think*) we're about to resume the execution of some other task.

> We could merge the stealing code into the idle_balance() code to get a
> union of the two, but IMO that would be less readable.
>
> We could remove the core and socket levels from idle_balance()

I understand that as only doing load_balance() at DIE level in
idle_balance(), as that is what makes most sense to me (with big.LITTLE
those misfit migrations are done at DIE level) - is that correct?

Also, with DynamIQ (next gen big.LITTLE) we could have asymmetry at MC
level, which could cause issues there.

> and let
> stealing handle those levels. I think that makes sense after stealing
> performance is validated on more architectures, but we would still have
> two different mechanisms.
>
> - Steve

I'll try out those patches on top of the misfit series to see how the
whole thing behaves.
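For the record, by "idle migration cost logic" I mean these two gates in
idle_balance() (trimmed excerpt, quoted from memory, so it may not match
your tree exactly):

        /* Expected idle time shorter than a migration, or nothing to pull */
        if (this_rq->avg_idle < sysctl_sched_migration_cost ||
            !this_rq->rd->overload) {
                ...
                goto out;
        }

        for_each_domain(this_cpu, sd) {
                ...
                /* Remaining idle time wouldn't cover this domain's balance cost */
                if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
                        update_next_balance(sd, &next_balance);
                        break;
                }

                if (sd->flags & SD_BALANCE_NEWIDLE) {
                        pulled_task = load_balance(this_cpu, this_rq, sd,
                                                   CPU_NEWLY_IDLE,
                                                   &continue_balancing);
                        ...
                }
                ...
        }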
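And the kind of cfs_overload_cpus decoration I was hinting at above -
completely untested sketch, using plain cpumasks rather than the series'
bitmap, and a made-up steal_from() helper standing in for the actual
stealing machinery, just to illustrate the "when to steal what" decision:

        struct steal_masks {
                struct cpumask *overload_cpus;  /* > 1 runnable CFS task */
                struct cpumask *misfit_cpus;    /* rq->misfit_task_load != 0 */
        };

        static int try_steal_sketch(struct rq *dst_rq, struct steal_masks *sm)
        {
                int this_cpu = cpu_of(dst_rq);
                int cpu;

                /* Genuinely overloaded CPUs: stealing a spare task is always a win */
                for_each_cpu(cpu, sm->overload_cpus) {
                        if (cpu != this_cpu)
                                return steal_from(dst_rq, cpu_rq(cpu));
                }

                /*
                 * Misfit CPUs: only steal if we have more capacity, otherwise
                 * leave the task alone and let the load balancer active-balance
                 * it to a bigger CPU.
                 */
                for_each_cpu(cpu, sm->misfit_cpus) {
                        if (capacity_of(this_cpu) > capacity_of(cpu))
                                return steal_from(dst_rq, cpu_rq(cpu));
                }

                return 0;
        }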