Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp204561pxb; Fri, 29 Oct 2021 08:21:38 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzfvoZq5KUaDOADZjD7y4aj5RqLvbaaWuejQj3PZ4KYGHjLs3DrxUMAFPmNZzw4flTCrXKd X-Received: by 2002:a05:6e02:15c4:: with SMTP id q4mr8069610ilu.18.1635520898382; Fri, 29 Oct 2021 08:21:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1635520898; cv=none; d=google.com; s=arc-20160816; b=lB0auMAdjdiKJoXbj5xf8WEfuAnh/bG2BzvLhMRrhEiXpDi2Phe8hnS3XMIfTIx53N 9B6Hqoq0j4AaFjY0xGJOCXguqa7l811TBYhXsbLIW6mUnDpcD9dO+CgVK/KQnFD7aLAF PWLJQwbgSs3FvYG76dvRkQKP1w6lJ7c/Ev5hMeEl8CexQx7eE/ZdEhVI3/HVlt/vcOHp jX4OZ7lmEOmyTyXWVWkwIB+HZaVbZUnYokng94qUMoTszqiVUbAFrod3gPTJ5W3KqQV9 DTSOzC0LWOE2PTrR/t6zUAAXyDBGpWWnzqpAgOASeEf18yDv4XRjKlu6euEda2WM8H3/ cjBA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=bm3czZvxnVujCnIZ5GB44Fo107YwPmJEdeZR9VzJYFc=; b=uh1av5t3MNzjklFJ905RU6l8NFYXkLcgD1PMY3C1z3h3OnVG9D9pxOQYIvvyTCY8tF voPVReeAJ7FgkW0iw+nmykrx9tihPJBrgizpSwXMH2hrBDb3BGEX6BIGSkuLi+Py51xH ZuFtd7k2KBVC27EiC+qQaxBC+rJWgbTgiv9yHre731F/rNjrUdYdGqb9/xv19Xl/B4QX 2CVmOIQ3h8YZhED/vQBmKAyy2K2gLO+iJgdAkMDydMPsqLABjAITMhEtMAXx1Rqi2tyg 3uL9zOivj5m/EM6D2wuTglnxZU3XXTeHQc2Cl0a/QcoExqzoykiUL4YTQ31q9LuwquPI 5wcw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=iIpq7FZQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id u18si10473459jat.31.2021.10.29.08.21.25; Fri, 29 Oct 2021 08:21:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=iIpq7FZQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229971AbhJ2PUV (ORCPT + 99 others); Fri, 29 Oct 2021 11:20:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56836 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229957AbhJ2PUU (ORCPT ); Fri, 29 Oct 2021 11:20:20 -0400 Received: from mail-yb1-xb2d.google.com (mail-yb1-xb2d.google.com [IPv6:2607:f8b0:4864:20::b2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B61F1C061570 for ; Fri, 29 Oct 2021 08:17:51 -0700 (PDT) Received: by mail-yb1-xb2d.google.com with SMTP id v138so19542650ybb.8 for ; Fri, 29 Oct 2021 08:17:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=bm3czZvxnVujCnIZ5GB44Fo107YwPmJEdeZR9VzJYFc=; b=iIpq7FZQ64iNhndh8cqbcroi7LVpRlVTJB6RFvHQ/YnM5RT2Dl+ufYqLzeKP5x2cek Rk3lD4lGDLqekVH0+Ti/Je5/9GooVOGBAGQLU5J0p+zsVZqLc3v8YYCUmn3KGJY0+95M aOnlws1Ny9XGpwp98UVGPyFwkQkYq5BqhUEITM8bvydm45Jtfxet2UKtf8KQednzX+aT 52XehACy69jQPDFm7NrqA8Dh7uYCGwMLtztmEK1OQrZ95P1v+RmMbk8AW9BHXHAyTJsY Y4XKERSwRD9zH+9geCrLNBaW63XsyrZZYCgqp2desQqSNEkoUBY+SDyhaAhNDto1hbWK nn9A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=bm3czZvxnVujCnIZ5GB44Fo107YwPmJEdeZR9VzJYFc=; b=a9vrpTzAwwClBJVtHyz1tQVRHU5eUKiRXYmE0u28wOUoMmit1MPzhjYzLPJINV5Sog R6cKVRufrlUOjpqAXteRsjZ8AIpYDyrP7ibPrZjqB3FZ4UhEJrDM8fllVtwLiHrXt30L D1FbTULZiwciW8u38Fdt+k2rGAHF9lTiW4RgD9ABDsK/7uFq/CLvrF93+Bq+lehDoai/ lKXy5NAyMUDStTkwA4+ts1Plifq9xIvc8QJid7lDCUgYXfiPXFJMGovU4B9moxPO0WZg SjqenvmzisIOZQbPMFjQar+xRwHnZ8Co3jmopduWelSp6TylwWvldIbs9zC7uvg+WUZC N7qg== X-Gm-Message-State: AOAM530ZWVFxyN9nCUTv8UTPtXaELfNkmWhb8ZhI4OWDwvkDHjmjDEnP xJki8SHD5qsH2jqmvzYxn+CEUe+uGPpIilttkr6rsA== X-Received: by 2002:a05:6902:1148:: with SMTP id p8mr14084454ybu.402.1635520670889; Fri, 29 Oct 2021 08:17:50 -0700 (PDT) MIME-Version: 1.0 References: <20211028094834.1312-1-mgorman@techsingularity.net> <20211028094834.1312-2-mgorman@techsingularity.net> In-Reply-To: <20211028094834.1312-2-mgorman@techsingularity.net> From: Vincent Guittot Date: Fri, 29 Oct 2021 17:17:38 +0200 Message-ID: Subject: Re: [PATCH 1/2] sched/fair: Couple wakee flips with heavy wakers To: Mel Gorman Cc: Peter Zijlstra , Ingo Molnar , Valentin Schneider , Aubrey Li , Barry Song , Mike Galbraith , Srikar Dronamraju , LKML Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 28 Oct 2021 at 11:48, Mel Gorman wrote: > > This patch mitigates a problem where wake_wide() allows a heavy waker > (e.g. X) to stack an excessive number of wakees on the same CPU. This > is due to the cpu_load check in wake_affine_weight. As noted by the > original patch author (Mike Galbraith)[1]; > > Between load updates, X, or any other waker of many, can stack > wakees to a ludicrous depth. Tracing kbuild vs firefox playing a > youtube clip, I watched X stack 20 of the zillion firefox minions > while their previous CPUs all had 1 lousy task running but a > cpu_load() higher than the cpu_load() of X's CPU. Most of those > prev_cpus were where X had left them when it migrated. Each and > every crazy depth migration was wake_affine_weight() deciding we > should pull. > > Parahrasing Mike's test results from the patch. > > With make -j8 running along with firefox with two tabs, one > containing youtube's suggestions of stuff, the other a running > clip, if the idle tab in focus, and don't drive mouse around, > flips decay enough for wake_wide() to lose interest, but just > wiggle the mouse, and it starts waking wide. Focus on the running > clip, and it continuously wakes wide. > > The end result is that heavy wakers are less likely to stack tasks and, > depending on the workload, reduce migrations. > > From additional tests on various servers, the impact is machine dependant > but generally this patch improves the situation. > > hackbench-process-pipes > 5.15.0-rc3 5.15.0-rc3 > vanilla sched-wakeeflips-v1r1 > Amean 1 0.3667 ( 0.00%) 0.3890 ( -6.09%) > Amean 4 0.5343 ( 0.00%) 0.5217 ( 2.37%) > Amean 7 0.5300 ( 0.00%) 0.5387 ( -1.64%) > Amean 12 0.5737 ( 0.00%) 0.5443 ( 5.11%) > Amean 21 0.6727 ( 0.00%) 0.6487 ( 3.57%) > Amean 30 0.8583 ( 0.00%) 0.8033 ( 6.41%) > Amean 48 1.3977 ( 0.00%) 1.2400 * 11.28%* > Amean 79 1.9790 ( 0.00%) 1.8200 * 8.03%* > Amean 110 2.8020 ( 0.00%) 2.5820 * 7.85%* > Amean 141 3.6683 ( 0.00%) 3.2203 * 12.21%* > Amean 172 4.6687 ( 0.00%) 3.8200 * 18.18%* > Amean 203 5.2183 ( 0.00%) 4.3357 * 16.91%* > Amean 234 6.1077 ( 0.00%) 4.8047 * 21.33%* > Amean 265 7.1313 ( 0.00%) 5.1243 * 28.14%* > Amean 296 7.7557 ( 0.00%) 5.5940 * 27.87%* > > While different machines showed different results, in general > there were much less CPU migrations of tasks > > tbench4 > 5.15.0-rc3 5.15.0-rc3 > vanilla sched-wakeeflips-v1r1 > Hmean 1 824.05 ( 0.00%) 802.56 * -2.61%* > Hmean 2 1578.49 ( 0.00%) 1645.11 * 4.22%* > Hmean 4 2959.08 ( 0.00%) 2984.75 * 0.87%* > Hmean 8 5080.09 ( 0.00%) 5173.35 * 1.84%* > Hmean 16 8276.02 ( 0.00%) 9327.17 * 12.70%* > Hmean 32 15501.61 ( 0.00%) 15925.55 * 2.73%* > Hmean 64 27313.67 ( 0.00%) 24107.81 * -11.74%* > Hmean 128 32928.19 ( 0.00%) 36261.75 * 10.12%* > Hmean 256 35434.73 ( 0.00%) 38670.61 * 9.13%* > Hmean 512 50098.34 ( 0.00%) 53243.75 * 6.28%* > Hmean 1024 69503.69 ( 0.00%) 67425.26 * -2.99%* > > Bit of a mixed bag but wins more than it loses. > > A new workload was added that runs a kernel build in the background > -jNR_CPUS while NR_CPUS pairs of tasks run Netperf TCP_RR. The > intent is to see if heavy background tasks disrupt ligher tasks > > multi subtest kernbench > 5.15.0-rc3 5.15.0-rc3 > vanilla sched-wakeeflips-v1r1 > Min elsp-256 20.80 ( 0.00%) 14.89 ( 28.41%) > Amean elsp-256 24.08 ( 0.00%) 20.94 ( 13.05%) > Stddev elsp-256 3.32 ( 0.00%) 4.68 ( -41.16%) > CoeffVar elsp-256 13.78 ( 0.00%) 22.36 ( -62.33%) > Max elsp-256 29.11 ( 0.00%) 26.49 ( 9.00%) > > multi subtest netperf-tcp-rr > 5.15.0-rc3 5.15.0-rc3 > vanilla sched-wakeeflips-v1r1 > Min 1 48286.26 ( 0.00%) 49101.48 ( 1.69%) > Hmean 1 62894.82 ( 0.00%) 68963.51 * 9.65%* > Stddev 1 7600.56 ( 0.00%) 8804.82 ( -15.84%) > Max 1 78975.16 ( 0.00%) 87124.67 ( 10.32%) > > The variability is higher as a result of the patch but both workloads > experienced improved performance. > > SpecJBB 2005 is a slightly more realistic workload with multiple > communicating Java threads > > specjbb > 5.15.0-rc3 5.15.0-rc3 > vanilla sched-wakeeflips-v1r1 > Hmean tput-1 50044.48 ( 0.00%) 53969.00 * 7.84%* > Hmean tput-2 106050.31 ( 0.00%) 113580.78 * 7.10%* > Hmean tput-3 156701.44 ( 0.00%) 164857.00 * 5.20%* > Hmean tput-4 196538.75 ( 0.00%) 218373.42 * 11.11%* > Hmean tput-5 247566.16 ( 0.00%) 267173.09 * 7.92%* > Hmean tput-6 284981.46 ( 0.00%) 311007.14 * 9.13%* > Hmean tput-7 328882.48 ( 0.00%) 359373.89 * 9.27%* > Hmean tput-8 366941.24 ( 0.00%) 393244.37 * 7.17%* > Hmean tput-9 402386.74 ( 0.00%) 433010.43 * 7.61%* > Hmean tput-10 437551.05 ( 0.00%) 475756.08 * 8.73%* > Hmean tput-11 481349.41 ( 0.00%) 519824.54 * 7.99%* > Hmean tput-12 533148.45 ( 0.00%) 565070.21 * 5.99%* > Hmean tput-13 570563.97 ( 0.00%) 609499.06 * 6.82%* > Hmean tput-14 601117.97 ( 0.00%) 647876.05 * 7.78%* > Hmean tput-15 639096.38 ( 0.00%) 690854.46 * 8.10%* > Hmean tput-16 682644.91 ( 0.00%) 722826.06 * 5.89%* > Hmean tput-17 732248.96 ( 0.00%) 758805.17 * 3.63%* > Hmean tput-18 762771.33 ( 0.00%) 791211.66 * 3.73%* > Hmean tput-19 780582.92 ( 0.00%) 819064.19 * 4.93%* > Hmean tput-20 812183.95 ( 0.00%) 836664.87 * 3.01%* > Hmean tput-21 821415.48 ( 0.00%) 833734.23 ( 1.50%) > Hmean tput-22 815457.65 ( 0.00%) 844393.98 * 3.55%* > Hmean tput-23 819263.63 ( 0.00%) 846109.07 * 3.28%* > Hmean tput-24 817962.95 ( 0.00%) 839682.92 * 2.66%* > Hmean tput-25 807814.64 ( 0.00%) 841826.52 * 4.21%* > Hmean tput-26 811755.89 ( 0.00%) 838543.08 * 3.30%* > Hmean tput-27 799341.75 ( 0.00%) 833487.26 * 4.27%* > Hmean tput-28 803434.89 ( 0.00%) 829022.50 * 3.18%* > Hmean tput-29 803233.25 ( 0.00%) 826622.37 * 2.91%* > Hmean tput-30 800465.12 ( 0.00%) 824347.42 * 2.98%* > Hmean tput-31 791284.39 ( 0.00%) 791575.67 ( 0.04%) > Hmean tput-32 781930.07 ( 0.00%) 805725.80 ( 3.04%) > Hmean tput-33 785194.31 ( 0.00%) 804795.44 ( 2.50%) > Hmean tput-34 781325.67 ( 0.00%) 800067.53 ( 2.40%) > Hmean tput-35 777715.92 ( 0.00%) 753926.32 ( -3.06%) > Hmean tput-36 770516.85 ( 0.00%) 783328.32 ( 1.66%) > Hmean tput-37 758067.26 ( 0.00%) 772243.18 * 1.87%* > Hmean tput-38 764815.45 ( 0.00%) 769156.32 ( 0.57%) > Hmean tput-39 757885.41 ( 0.00%) 757670.59 ( -0.03%) > Hmean tput-40 750140.15 ( 0.00%) 760739.13 ( 1.41%) > > [1] https://lore.kernel.org/r/02c977d239c312de5e15c77803118dcf1e11f216.camel@gmx.de > > Signed-off-by: Mike Galbraith > Signed-off-by: Mel Gorman > --- > kernel/sched/fair.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index ff69f245b939..d00af3b97d8f 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -5865,6 +5865,14 @@ static void record_wakee(struct task_struct *p) > } > > if (current->last_wakee != p) { > + int min = __this_cpu_read(sd_llc_size) << 1; > + /* > + * Couple the wakee flips to the waker for the case where it > + * doesn't accrue flips, taking care to not push the wakee > + * high enough that the wake_wide() heuristic fails. > + */ > + if (current->wakee_flips > p->wakee_flips * min) > + p->wakee_flips++; I have a hard time understanding the rationale behind these changes and the one below. Could you provide more details about why to increase p->wakee_flips here ? Also would be good to add such explanation in the commit message > current->last_wakee = p; > current->wakee_flips++; > } > @@ -5895,7 +5903,7 @@ static int wake_wide(struct task_struct *p) > > if (master < slave) > swap(master, slave); > - if (slave < factor || master < slave * factor) > + if ((slave < factor && master < (factor>>1)*factor) || master < slave * factor) > return 0; > return 1; > } > -- > 2.31.1 >