Date: Tue, 7 Jan 2020 10:16:46 +0000
From: Mel Gorman <mgorman@techsingularity.net>
To: Vincent Guittot
Cc: Hillf Danton, Rik van Riel, Ingo Molnar, Peter Zijlstra, Phil Auld,
	Valentin Schneider, Srikar Dronamraju, Quentin Perret,
	Dietmar Eggemann, Morten Rasmussen, Parth Shah, LKML
Subject: Re: [PATCH] sched, fair: Allow a small load imbalance between low utilisation SD_NUMA domains v3
Message-ID: <20200107101646.GG3466@techsingularity.net>
References: <20200106144250.GA3466@techsingularity.net>
	<04033a63f11a9c59ebd2b099355915e4e889b772.camel@surriel.com>
	<20200106163303.GC3466@techsingularity.net>
	<20200107015111.4836-1-hdanton@sina.com>
	<20200107091256.GE3466@techsingularity.net>

On Tue, Jan 07, 2020 at 10:43:08AM +0100, Vincent Guittot wrote:
> > > > > It's not directly related to the number of CPUs in the node. Are you
> > > > > thinking of busiest->group_weight?
> > > >
> > > > I am, because as it is right now that if condition
> > > > looks like it might never be true for imbalance_pct 115.
> > > >
> > > > Presumably you put that check there for a reason, and
> > > > would like it to trigger when the amount by which a node
> > > > is busy is less than 2 * (imbalance_pct - 100).
> > > >
> > >
> > > If three per cent can make any sense in helping determine whether
> > > utilisation is low, then the busy load has to meet
> > >
> > > busiest->sum_nr_running < max(3, cpus in the node / 32);
> > >
> >
> > Why 3% and why would the low utilisation cut-off depend on the number of
>
> But in the same way, why only 6 tasks, which is the value with the
> default imbalance_pct?

I laid this out in another mail that crossed with this one, so I will
not repeat myself other than to say the cut-off is predictable across
machines.
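For illustration only -- this is a stand-alone user-space sketch, not
the patch, and scaled_cutoff is a made-up helper name -- here is how a
fixed six-task cut-off compares with the max(3, cpus in the node / 32)
suggestion above as nodes grow:

/*
 * Illustrative only, not kernel code: compare the fixed six-task
 * cut-off with the suggested node-size-scaled cut-off of
 * max(3, cpus_in_node / 32).
 */
#include <stdio.h>

static unsigned int scaled_cutoff(unsigned int cpus_in_node)
{
	unsigned int cutoff = cpus_in_node / 32;

	return cutoff > 3 ? cutoff : 3;
}

int main(void)
{
	unsigned int cpus[] = { 32, 64, 128, 256 };
	unsigned int i;

	for (i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++)
		printf("%3u CPUs per node: fixed cut-off 6, scaled cut-off %u\n",
		       cpus[i], scaled_cutoff(cpus[i]));

	return 0;
}

The scaled cut-off only exceeds the fixed value once a node has more
than 192 CPUs, so on most current machines the two behave much the
same.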
> I expect a machine with 128 CPUs to have more bandwidth than a machine
> with only 32 CPUs and as a result to allow more imbalance
>

I would expect so too, with the caveat that there can be more memory
channels within a node, so positioning does matter, but we cannot take
everything into account without creating a convoluted mess. Worse, we
have no decent method for estimating bandwidth as it depends on the
reference pattern, and scheduler domains do not currently take memory
channels into account. Maybe they should, but that is a whole different
discussion that we do not want to get into right now.

> Maybe the number of running tasks (or idle cpus) is not the right
> metric to choose if we can allow a small degree of imbalance, because
> it does not take into account whether the tasks are long running or
> short running ones
>

I think running tasks is the least bad metric. Idle CPUs get caught up
in corner cases with CPU bindings, and util_avg can be skewed by
outliers. Running tasks is a sensible starting point until there is a
concrete use case that shows it is unworkable. Let's see what you think
of the other untested patch I posted that takes the group weight and
child domain weight into account.

-- 
Mel Gorman
SUSE Labs