Date: Mon, 29 Jul 2019 13:16:20 +0100
From: Matt Fleming
To: "Suthikulpanit, Suravee"
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, Mel Gorman,
    "Lendacky, Thomas", Borislav Petkov
Subject: Re: [PATCH v3] sched/topology: Improve load balancing on AMD EPYC
Message-ID: <20190729121620.GD6909@codeblueprint.co.uk>
References: <20190723104830.26623-1-matt@codeblueprint.co.uk>

On Thu, 25 Jul, at 04:37:06PM, Suthikulpanit, Suravee wrote:
> 
> I am testing this patch on Linux 5.2, and I actually do not notice a
> difference pre vs. post patch.
> 
> Besides the case above, I have also run an experiment with different
> numbers of threads across two sockets:
> 
> (Note: I only focus on thread0 of each core.)
> 
> sXnY = Socket X, Node Y
> 
> * s0n0 + s0n1 + s1n0 + s1n1
>   numactl -C 0-15,32-47 ./spinner 32
> 
> * s0n2 + s0n3 + s1n2 + s1n3
>   numactl -C 16-31,48-63 ./spinner 32
> 
> * s0 + s1
>   numactl -C 0-63 ./spinner 64
> 
> My observations are:
> 
> * I still notice improper load balancing on one of the tasks for the
>   first few seconds, before they are load-balanced correctly.
> 
> * It takes longer to load balance with a larger number of tasks.
> 
> I wonder if you have tried with a different kernel base?

It was tested with one of the 5.2 -rc kernels. I'll take another look
at this behaviour, but for the benefit of LKML readers, here's the
summary I gave before. It's specific to using cgroups to partition
tasks:

It turns out there's a secondary issue to do with how run queue load
averages are compared between sched groups. Load averages for a
sched_group (a group within a domain) are effectively "scaled" by the
number of CPUs in that group. This has a direct influence on how
quickly load ramps up in a group.

What's happening on my system when running with $(numactl -C 0-7,32-39)
is that the load for the top NUMA sched_domain (domain4) is scaled by
64 CPUs' worth of capacity -- even though the workload can't use all 64
due to scheduler affinity. Because the load balancer thinks there's
plenty of room left to run tasks, it doesn't balance very well across
sockets, even with the SD_BALANCE_FORK flag.
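To make that concrete, here's some back-of-the-envelope arithmetic
(illustrative figures only, assuming the defaults of
SCHED_CAPACITY_SCALE = 1024, roughly 1024 capacity per CPU, and one
fully-busy nice-0 task per allowed CPU; real PELT averages will
differ):

  group capacity (64 CPUs)    ~= 64 * 1024 = 65536
  load of 16 fully-busy CPUs  ~= 16 * 1024 = 16384

  avg_load = (16384 * SCHED_CAPACITY_SCALE) / 65536 = 256

That's a quarter of one busy CPU's load (1024), so the group looks
nearly idle to find_idlest_group() even though every CPU the tasks are
allowed to run on is saturated. Capping the capacity at 8 CPUs' worth
gives

  avg_load = (16384 * SCHED_CAPACITY_SCALE) / (8 * 1024) = 2048

which makes the group look as busy as it really is for this workload.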
This super quick and ugly patch, which caps the number of CPUs at 8,
gets both sockets used by fork() on my system.

---->8----
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..9444c34d038c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5791,6 +5791,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 	int imbalance_scale = 100 + (sd->imbalance_pct-100)/2;
 	unsigned long imbalance = scale_load_down(NICE_0_LOAD) *
 				(sd->imbalance_pct-100) / 100;
+	unsigned long capacity;
 
 	if (sd_flag & SD_BALANCE_WAKE)
 		load_idx = sd->wake_idx;
@@ -5835,10 +5836,15 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 	}
 
 	/* Adjust by relative CPU capacity of the group */
+	capacity = group->sgc->capacity;
+
+	if (capacity > (SCHED_CAPACITY_SCALE * 8))
+		capacity = SCHED_CAPACITY_SCALE * 8;
+
 	avg_load = (avg_load * SCHED_CAPACITY_SCALE) /
-			group->sgc->capacity;
+			capacity;
 	runnable_load = (runnable_load * SCHED_CAPACITY_SCALE) /
-			group->sgc->capacity;
+			capacity;
 
 	if (local_group) {
 		this_runnable_load = runnable_load;
----8<----

There's still an issue with the active load balancer kicking in after a
few seconds, but I suspect that is related to the use of group capacity
elsewhere in the load balancer code (like update_sg_lb_stats()).

-- 
Matt Fleming
SUSE Performance Team