From: Vincent Guittot
Date: Tue, 24 Mar 2020 11:35:08 +0100
Subject: Re: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary
To: Mel Gorman
Cc: Ingo Molnar, Peter Zijlstra, Valentin Schneider, Phil Auld, LKML
In-Reply-To: <20200320174304.GF3818@techsingularity.net>
References: <20200320151245.21152-1-mgorman@techsingularity.net> <20200320151245.21152-5-mgorman@techsingularity.net> <20200320164432.GE3818@techsingularity.net> <20200320174304.GF3818@techsingularity.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 20 Mar 2020 at 18:43, Mel Gorman wrote:
>
> On Fri, Mar 20, 2020 at 05:54:57PM +0100, Vincent Guittot wrote:
> > On Fri, 20 Mar 2020 at 17:44, Mel Gorman wrote:
> > >
> > > On Fri, Mar 20, 2020 at 04:48:39PM +0100, Vincent Guittot wrote:
> > > > > ---
> > > > >  include/linux/sched/topology.h |  1 +
> > > > >  kernel/sched/fair.c            | 65 +++++++++++++++++++++++++++++++++++++++---
> > > > >  kernel/sched/features.h        |  3 ++
> > > > >  3 files changed, 65 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > > > > index af9319e4cfb9..76ec7a54f57b 100644
> > > > > --- a/include/linux/sched/topology.h
> > > > > +++ b/include/linux/sched/topology.h
> > > > > @@ -66,6 +66,7 @@ struct sched_domain_shared {
> > > > >       atomic_t        ref;
> > > > >       atomic_t        nr_busy_cpus;
> > > > >       int             has_idle_cores;
> > > > > +     int             is_overloaded;
> > > >
> > > > Can't nr_busy_cpus compared to sd->span_weight give you a similar status?
> > > >
> > >
> > > It's connected to nohz balancing and I didn't see how I could use that
> > > for detecting overload. Also, I don't think it ever can be larger than
> > > the sd weight and overload is based on the number of running tasks being
> > > greater than the number of available CPUs. Did I miss something obvious?
> >
> > IIUC you try to estimate if there is a chance to find an idle cpu
> > before starting the loop and scanning the domain, and abort early if
> > the possibility is low.
> >
> > If nr_busy_cpus equals sd->span_weight, it means that there is no
> > free cpu, so there is no need to scan.
> >
>
> Ok, I see what you are getting at but I worry there are multiple
> problems there. First, the nr_busy_cpus is decremented only when a CPU
> is entering idle with the tick stopped. If nohz is disabled then this
> breaks, no? Secondly, a CPU can be idle but the tick not stopped if

But this can be changed if that makes the statistic useful.

> __tick_nohz_idle_stop_tick knows there is an event in the near future
> so using busy_cpus, we potentially miss a sibling that was adequate
> for running a task. Finally, the threshold for cutting off the search
> entirely seems low. The patch marks a domain as overloaded if there are
> twice as many running tasks as runqueues scanned. In that scenario, even
> if tasks are rapidly switching between busy/idle, it's still unlikely
> the task will go idle. When cutting off at just the fully-busy mark, we
> could miss a CPU that is going idle, almost idle or is running SCHED_IDLE
> tasks which are acceptable target candidates for select_idle_sibling. I
> think there are too many cases where nr_busy_cpus is problematic to
> make it a good alternative.
I don't really like this patch because it adds yet another metric and yet
another feature which is set to true by default.

Also, the current proposal seems a bit fragile because it uses an arbitrary
ratio of 2 on an arbitrary number of CPUs. This threshold probably works in
your case and on your system, but probably not for others; it really looks
like a heuristic that works for you but without any real meaning.

Then, the update is done at each and every task wakeup and by all CPUs in
the LLC. It means that the same variable is updated simultaneously by all
CPUs: one CPU can set it and the next one might clear it immediately because
they haven't scanned the same CPUs. In the end, 2 threads waking up
simultaneously on different CPUs might end up using 2 different policies for
no reason other than a random ordering.

I agree that the concept of detecting that an LLC domain is overloaded can be
useful to decide to skip searching for an idle cpu, but this proposal does
not seem really generic.

Vincent

>
> --
> Mel Gorman
> SUSE Labs