From: "Song Bao Hua (Barry Song)"
To: Peter Zijlstra
CC: "vincent.guittot@linaro.org", "mingo@redhat.com", "dietmar.eggemann@arm.com", "rostedt@goodmis.org", "bsegall@google.com", "mgorman@suse.de", "valentin.schneider@arm.com", "juri.lelli@redhat.com", "bristot@redhat.com", "linux-kernel@vger.kernel.org", "guodong.xu@linaro.org", yangyicong, tangchengchang, Linuxarm
Subject: RE: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee are already in same LLC
Date: Wed, 26 May 2021 21:38:19 +0000
Message-ID: <7dd00a98d6454d5e92a7d9b936d1aa1c@hisilicon.com>
References: <20210526091057.1800-1-song.bao.hua@hisilicon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Thursday, May 27, 2021 12:16 AM
> To: Song Bao Hua (Barry Song)
> Cc: vincent.guittot@linaro.org; mingo@redhat.com; dietmar.eggemann@arm.com;
> rostedt@goodmis.org; bsegall@google.com; mgorman@suse.de;
> valentin.schneider@arm.com; juri.lelli@redhat.com; bristot@redhat.com;
> linux-kernel@vger.kernel.org; guodong.xu@linaro.org; yangyicong;
> tangchengchang; Linuxarm
> Subject: Re: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee
> are already in same LLC
>
>
> $subject is weird; sched/fair: is the right tag, and then start with a
> capital letter.
>
> On Wed, May 26, 2021 at 09:10:57PM +1200, Barry Song wrote:
> > when waker and wakee are already in the same LLC, it is pointless to worry
> > about the competition caused by pulling wakee to waker's LLC domain.
>
> But there's more than LLC.

I suppose the other concerns might be about the "idle" state and the "load"
of the waker's cpu and the wakee's prev_cpu.
Here, even though we disable wake_wide(), wake_affine() still has a chance to
select the wakee's prev_cpu rather than pulling the wakee to the waker. So
disabling wake_wide() doesn't mean we will pull 100% of the time.

static int wake_affine(struct sched_domain *sd, struct task_struct *p,
		       int this_cpu, int prev_cpu, int sync)
{
	int target = nr_cpumask_bits;

	if (sched_feat(WA_IDLE))
		target = wake_affine_idle(this_cpu, prev_cpu, sync);

	if (sched_feat(WA_WEIGHT) && target == nr_cpumask_bits)
		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);

	if (target == nr_cpumask_bits)
		return prev_cpu;

	..
	return target;
}

Furthermore, select_idle_sibling() can also pick the wakee's prev_cpu if it is
idle:

static int select_idle_sibling(struct task_struct *p, int prev, int target)
{
	...

	/*
	 * If the previous CPU is cache affine and idle, don't be stupid:
	 */
	if (prev != target && cpus_share_cache(prev, target) &&
	    (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
	    asym_fits_capacity(task_util, prev))
		return prev;

	...
}

Apart from those, could you please give me some clue about what other
concerns you have?

> > Signed-off-by: Barry Song
> > ---
> >  kernel/sched/fair.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 3248e24a90b0..cfb1bd47acc3 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6795,7 +6795,15 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu,
> int wake_flags)
> >  			new_cpu = prev_cpu;
> >  		}
> >
> > -		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, p->cpus_ptr);
> > +		/*
> > +		 * we use wake_wide to make smarter pull and avoid cruel
> > +		 * competition because of jam-packed tasks in waker's LLC
> > +		 * domain. But if waker and wakee have been already in
> > +		 * same LLC domain, it seems it is pointless to depend
> > +		 * on wake_wide
> > +		 */
> > +		want_affine = (cpus_share_cache(cpu, prev_cpu) || !wake_wide(p)) &&
> > +				cpumask_test_cpu(cpu, p->cpus_ptr);
> >  	}
>
> And no supportive numbers...

Sorry for the confusion. I actually put some supporting numbers in the thread
below, from which this patch was derived:
https://lore.kernel.org/lkml/bbc339cef87e4009b6d56ee37e202daf@hisilicon.com/

When I tried to give Dietmar some pgbench data in that thread, I found that on
kunpeng920, when the software ran in one die/NUMA node with 24 cores sharing
the LLC, disabling wake_wide() brought the best pgbench result.

                llc_as_factor           don't_use_wake_wide
Hmean     1     10869.27 (   0.00%)     10723.08 *  -1.34%*
Hmean     8     19580.59 (   0.00%)     19469.34 *  -0.57%*
Hmean     12    29643.56 (   0.00%)     29520.16 *  -0.42%*
Hmean     24    43194.47 (   0.00%)     43774.78 *   1.34%*
Hmean     32    40163.23 (   0.00%)     40742.93 *   1.44%*
Hmean     48    42249.29 (   0.00%)     48329.00 *  14.39%*

The test was done with https://github.com/gormanm/mmtests and
./run-mmtests.sh --config ./configs/config-db-pgbench-timed-ro-medium test_tag

Commit "sched: Implement smarter wake-affine logic"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62470419
says pgbench can be improved by wake_wide(), but I've actually seen the
opposite result when waker and wakee are already in one LLC. I'm not quite
sure if this is specific to kunpeng920; perhaps I need to run the same test on
some x86 machines.

Thanks
Barry
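
For illustration, the want_affine change in the quoted diff can be reduced to a
small user-space sketch. This is not kernel code: the boolean parameters are
hypothetical stand-ins for the results of cpus_share_cache(cpu, prev_cpu),
wake_wide(p) and cpumask_test_cpu(cpu, p->cpus_ptr), and the names
want_affine_old() and want_affine_new() are made up for this example.

```c
#include <stdbool.h>

/*
 * User-space sketch only, not kernel code. The inputs are hypothetical
 * stand-ins for kernel state:
 *   share_cache: result of cpus_share_cache(cpu, prev_cpu)
 *   wide:        result of wake_wide(p)
 *   allowed:     result of cpumask_test_cpu(cpu, p->cpus_ptr)
 */

/* Condition before the patch: wake_wide() always has a veto. */
static bool want_affine_old(bool wide, bool allowed)
{
	return !wide && allowed;
}

/* Condition after the patch: a shared LLC bypasses the wake_wide() veto. */
static bool want_affine_new(bool share_cache, bool wide, bool allowed)
{
	return (share_cache || !wide) && allowed;
}
```

Under these assumptions the two conditions differ in exactly one case: the
waker's cpu and prev_cpu share a cache, wake_wide() returns true, and the cpu
is in the task's allowed mask. There the old condition is false and the new
one is true. In every other case they agree, and the affinity mask check still
applies in both.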