Date: Tue, 23 Jul 2019 12:42:48 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org,
    "Suthikulpanit, Suravee", "Lendacky, Thomas", Borislav Petkov
Subject: Re: [PATCH v3] sched/topology: Improve load balancing on AMD EPYC
Message-ID: <20190723114248.GJ24383@techsingularity.net>
References: <20190723104830.26623-1-matt@codeblueprint.co.uk>
In-Reply-To: <20190723104830.26623-1-matt@codeblueprint.co.uk>
User-Agent: Mutt/1.10.1 (2018-07-13)
On Tue, Jul 23, 2019 at 11:48:30AM +0100, Matt Fleming wrote:
> SD_BALANCE_{FORK,EXEC} and SD_WAKE_AFFINE are stripped in sd_init()
> for any sched domains with a NUMA distance greater than 2 hops
> (RECLAIM_DISTANCE). The idea being that it's expensive to balance
> across domains that far apart.
>
> However, as is rather unfortunately explained in
>
>   commit 32e45ff43eaf ("mm: increase RECLAIM_DISTANCE to 30")
>
> the value for RECLAIM_DISTANCE is based on node distance tables from
> 2011-era hardware.
>
> Current AMD EPYC machines have the following NUMA node distances:
>
> node distances:
> node   0   1   2   3   4   5   6   7
>   0:  10  16  16  16  32  32  32  32
>   1:  16  10  16  16  32  32  32  32
>   2:  16  16  10  16  32  32  32  32
>   3:  16  16  16  10  32  32  32  32
>   4:  32  32  32  32  10  16  16  16
>   5:  32  32  32  32  16  10  16  16
>   6:  32  32  32  32  16  16  10  16
>   7:  32  32  32  32  16  16  16  10
>
> where 2 hops is 32.
>
> The result is that the scheduler fails to load balance properly across
> NUMA nodes on different sockets -- 2 hops apart.
>
> For example, pinning 16 busy threads to NUMA nodes 0 (CPUs 0-7) and 4
> (CPUs 32-39) like so,
>
>   $ numactl -C 0-7,32-39 ./spinner 16
>
> causes all threads to fork and remain on node 0 until the active
> balancer kicks in after a few seconds and forcibly moves some threads
> to node 4.
>
> Override node_reclaim_distance for AMD Zen.
>
> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
> Cc: "Suthikulpanit, Suravee"
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: "Lendacky, Thomas"
> Cc: Borislav Petkov

Acked-by: Mel Gorman <mgorman@techsingularity.net>

The only caveat I can think of is that a future generation of Zen might
use a magic number other than 32 as its remote distance. If or when
that happens, this will need additional smarts, but lacking a crystal
ball, we can cross that bridge when we come to it.

--
Mel Gorman
SUSE Labs
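For anyone tracing the mechanics: the flag stripping the changelog
refers to happens in sd_init() in kernel/sched/topology.c. A
paraphrased sketch of the pre-patch check, condensed rather than a
verbatim excerpt of any particular release:

	/* sd_init(), paraphrased: NUMA domains whose distance exceeds
	 * RECLAIM_DISTANCE lose the fork/exec/wake-affine flags, so the
	 * fork, exec and wakeup balancing paths never place tasks
	 * across such domains.
	 */
	if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
		sd->flags &= ~(SD_BALANCE_EXEC |
			       SD_BALANCE_FORK |
			       SD_WAKE_AFFINE);
	}

On the EPYC table above, the cross-socket distance of 32 exceeds
RECLAIM_DISTANCE (30), so exactly the inter-socket domains lose these
flags, which is why the spinner threads pile up on node 0.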
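The ./spinner reproducer itself is not included in the mail. A minimal,
purely hypothetical stand-in that spawns N busy threads could look like
the following (build with: gcc -pthread -o spinner spinner.c):

	/* spinner.c: hypothetical stand-in for the reproducer above.
	 * Spawns argv[1] threads that burn CPU forever so their NUMA
	 * placement can be watched with e.g. htop or numastat. */
	#include <pthread.h>
	#include <stdlib.h>

	static void *spin(void *unused)
	{
		volatile unsigned long n = 0;

		for (;;)
			n++;	/* volatile keeps the loop from being optimised out */
		return NULL;
	}

	int main(int argc, char **argv)
	{
		int i, nthreads = argc > 1 ? atoi(argv[1]) : 1;
		pthread_t tid;

		if (nthreads < 1)
			nthreads = 1;
		for (i = 0; i < nthreads; i++)
			pthread_create(&tid, NULL, spin, NULL);
		pthread_join(tid, NULL);	/* spinners never exit; block forever */
		return 0;
	}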
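As for the fix, the shape suggested by the changelog is roughly the
sketch below (not the actual diff, which may differ in naming and
placement): replace the compile-time cutoff with a runtime variable and
widen it during AMD Zen CPU setup:

	/* kernel/sched/topology.c: a runtime cutoff, compared by
	 * sd_init() in place of the RECLAIM_DISTANCE constant in the
	 * check shown earlier. */
	int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE;

	/* arch/x86/kernel/cpu/amd.c, somewhere in the Zen init path:
	 * set the cutoff to 32 so the cross-socket domains (distance 32)
	 * no longer exceed it and keep their balancing flags. */
	#ifdef CONFIG_NUMA
		node_reclaim_distance = 32;
	#endif

A runtime variable also speaks to the caveat above: a later Zen
generation with a different remote distance would only need a different
assignment in its own init path rather than another constant.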