From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra
Cc: Ingo Molnar, Vincent Guittot, Valentin Schneider, Aubrey Li,
    Barry Song, Mike Galbraith, Srikar Dronamraju, LKML, Mel Gorman
Subject: [PATCH 1/2] sched/fair: Couple wakee flips with heavy wakers
Date: Thu, 21 Oct 2021 15:56:02 +0100
Message-Id: <20211021145603.5313-2-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20211021145603.5313-1-mgorman@techsingularity.net>
References: <20211021145603.5313-1-mgorman@techsingularity.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

This patch mitigates a problem where wake_wide() allows a heavy waker
(e.g. X) to stack an excessive number of wakees on the same CPU. This is
due to the cpu_load() check in wake_affine_weight(). As noted by the
original patch author, Mike Galbraith [1]:

	Between load updates, X, or any other waker of many, can stack
	wakees to a ludicrous depth. Tracing kbuild vs firefox playing a
	youtube clip, I watched X stack 20 of the zillion firefox minions
	while their previous CPUs all had 1 lousy task running but a
	cpu_load() higher than the cpu_load() of X's CPU. Most of those
	prev_cpus were where X had left them when it migrated. Each and
	every crazy depth migration was wake_affine_weight() deciding we
	should pull.

Paraphrasing Mike's test results from the patch: with make -j8 running
alongside firefox with two tabs, one containing youtube's suggestions and
the other a running clip, if the idle tab is in focus and the mouse is not
moved, the flips decay enough for wake_wide() to lose interest. Just
wiggling the mouse makes it start waking wide, and with the running clip
in focus it continuously wakes wide.

With this patch, record_wakee() also bumps the wakee's flip count when the
waker's flips greatly exceed the wakee's, and wake_wide() no longer rules
out waking wide purely because the wakee has few flips of its own when the
waker is flip-heavy. The end result is that heavy wakers are less likely
to stack tasks and, depending on the workload, migrations are reduced.

From additional tests on various servers, the impact is machine dependent
but generally this patch improves the situation.

hackbench-process-pipes
                          5.15.0-rc3             5.15.0-rc3
                             vanilla  sched-wakeeflips-v1r1
Amean     1        0.3667 (   0.00%)      0.3890 (  -6.09%)
Amean     4        0.5343 (   0.00%)      0.5217 (   2.37%)
Amean     7        0.5300 (   0.00%)      0.5387 (  -1.64%)
Amean     12       0.5737 (   0.00%)      0.5443 (   5.11%)
Amean     21       0.6727 (   0.00%)      0.6487 (   3.57%)
Amean     30       0.8583 (   0.00%)      0.8033 (   6.41%)
Amean     48       1.3977 (   0.00%)      1.2400 *  11.28%*
Amean     79       1.9790 (   0.00%)      1.8200 *   8.03%*
Amean     110      2.8020 (   0.00%)      2.5820 *   7.85%*
Amean     141      3.6683 (   0.00%)      3.2203 *  12.21%*
Amean     172      4.6687 (   0.00%)      3.8200 *  18.18%*
Amean     203      5.2183 (   0.00%)      4.3357 *  16.91%*
Amean     234      6.1077 (   0.00%)      4.8047 *  21.33%*
Amean     265      7.1313 (   0.00%)      5.1243 *  28.14%*
Amean     296      7.7557 (   0.00%)      5.5940 *  27.87%*

While different machines showed different results, in general there were
far fewer CPU migrations of tasks.

tbench4
                          5.15.0-rc3             5.15.0-rc3
                             vanilla  sched-wakeeflips-v1r1
Hmean     1         824.05 (   0.00%)      802.56 *  -2.61%*
Hmean     2        1578.49 (   0.00%)     1645.11 *   4.22%*
Hmean     4        2959.08 (   0.00%)     2984.75 *   0.87%*
Hmean     8        5080.09 (   0.00%)     5173.35 *   1.84%*
Hmean     16       8276.02 (   0.00%)     9327.17 *  12.70%*
Hmean     32      15501.61 (   0.00%)    15925.55 *   2.73%*
Hmean     64      27313.67 (   0.00%)    24107.81 * -11.74%*
Hmean     128     32928.19 (   0.00%)    36261.75 *  10.12%*
Hmean     256     35434.73 (   0.00%)    38670.61 *   9.13%*
Hmean     512     50098.34 (   0.00%)    53243.75 *   6.28%*
Hmean     1024    69503.69 (   0.00%)    67425.26 *  -2.99%*

Bit of a mixed bag, but it wins more than it loses.
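As an aside, and not as part of the patch itself, the behavioural
difference of the wake_wide() condition and the record_wakee() coupling
can be sketched in plain user-space C. The llc_size value, the flip
counts and the wake_wide_old()/wake_wide_new() helpers below are made up
purely for illustration; in the kernel the factor comes from the per-CPU
sd_llc_size and the flips live in task_struct:

/*
 * Illustrative user-space model of the wake_wide() change, NOT kernel code.
 * llc_size and the flip counts are assumed example values.
 */
#include <stdio.h>

static const unsigned int llc_size = 8;	/* assumed LLC width for the example */

/* Old condition: 1 means wake wide, 0 means consider an affine wakeup. */
static int wake_wide_old(unsigned int master, unsigned int slave)
{
	unsigned int factor = llc_size;

	if (master < slave) {
		unsigned int tmp = master; master = slave; slave = tmp;
	}
	if (slave < factor || master < slave * factor)
		return 0;
	return 1;
}

/* Patched condition: a sufficiently flip-heavy waker can wake wide even
 * when the wakee has accrued few flips of its own. */
static int wake_wide_new(unsigned int master, unsigned int slave)
{
	unsigned int factor = llc_size;

	if (master < slave) {
		unsigned int tmp = master; master = slave; slave = tmp;
	}
	if ((slave < factor && master < (factor >> 1) * factor) ||
	    master < slave * factor)
		return 0;
	return 1;
}

int main(void)
{
	/* An X-like waker with many flips waking a low-flip wakee. */
	unsigned int waker_flips = 200, wakee_flips = 2;

	printf("old condition: wake_wide=%d\n",
	       wake_wide_old(waker_flips, wakee_flips));
	printf("new condition: wake_wide=%d\n",
	       wake_wide_new(waker_flips, wakee_flips));

	/*
	 * The record_wakee() side of the patch: when the waker's flips exceed
	 * the wakee's by more than a factor of 2*llc_size, the wakee's own
	 * flip count is nudged upwards too, so repeated wakeups from a heavy
	 * waker push the wakee towards the wake-wide path over time.
	 */
	if (waker_flips > wakee_flips * (llc_size << 1))
		wakee_flips++;
	printf("wakee_flips after one coupled wakeup: %u\n", wakee_flips);

	return 0;
}

For these example values the old condition returns 0 (an affine wakeup is
still considered) because slave < factor, while the new condition returns
1 (wake wide) because the waker's flips are at least (factor >> 1) * factor,
and the record_wakee() coupling bumps the wakee's flips from 2 to 3 on
that wakeup.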
A new workload was added that runs a kernel build in the background with
-jNR_CPUS while NR_CPUS pairs of tasks run Netperf TCP_RR. The intent is
to see whether heavy background tasks disrupt lighter tasks.

multi subtest kernbench
                            5.15.0-rc3             5.15.0-rc3
                               vanilla  sched-wakeeflips-v1r1
Min       elsp-256     20.80 (   0.00%)      14.89 (  28.41%)
Amean     elsp-256     24.08 (   0.00%)      20.94 (  13.05%)
Stddev    elsp-256      3.32 (   0.00%)       4.68 ( -41.16%)
CoeffVar  elsp-256     13.78 (   0.00%)      22.36 ( -62.33%)
Max       elsp-256     29.11 (   0.00%)      26.49 (   9.00%)

multi subtest netperf-tcp-rr
                            5.15.0-rc3             5.15.0-rc3
                               vanilla  sched-wakeeflips-v1r1
Min       1       48286.26 (   0.00%)   49101.48 (   1.69%)
Hmean     1       62894.82 (   0.00%)   68963.51 *   9.65%*
Stddev    1        7600.56 (   0.00%)    8804.82 ( -15.84%)
Max       1       78975.16 (   0.00%)   87124.67 (  10.32%)

The variability is higher as a result of the patch, but both workloads
experienced improved performance.

[1] https://lore.kernel.org/r/02c977d239c312de5e15c77803118dcf1e11f216.camel@gmx.de

Signed-off-by: Mike Galbraith
Signed-off-by: Mel Gorman
---
 kernel/sched/fair.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ff69f245b939..d00af3b97d8f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5865,6 +5865,14 @@ static void record_wakee(struct task_struct *p)
 	}
 
 	if (current->last_wakee != p) {
+		int min = __this_cpu_read(sd_llc_size) << 1;
+		/*
+		 * Couple the wakee flips to the waker for the case where it
+		 * doesn't accrue flips, taking care to not push the wakee
+		 * high enough that the wake_wide() heuristic fails.
+		 */
+		if (current->wakee_flips > p->wakee_flips * min)
+			p->wakee_flips++;
 		current->last_wakee = p;
 		current->wakee_flips++;
 	}
@@ -5895,7 +5903,7 @@ static int wake_wide(struct task_struct *p)
 
 	if (master < slave)
 		swap(master, slave);
-	if (slave < factor || master < slave * factor)
+	if ((slave < factor && master < (factor>>1)*factor) || master < slave * factor)
 		return 0;
 	return 1;
 }
-- 
2.31.1