Date: Fri, 20 Mar 2020 16:38:43 +0000
From: Mel Gorman <mgorman@techsingularity.net>
To: Jirka Hladky
Cc: Phil Auld, Peter Zijlstra, Ingo Molnar, Vincent Guittot, Juri Lelli,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Valentin Schneider,
 Hillf Danton, LKML
Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6
Message-ID: <20200320163843.GD3818@techsingularity.net>
References: <20200309203625.GU3818@techsingularity.net>
 <20200312095432.GW3818@techsingularity.net>
 <20200312155640.GX3818@techsingularity.net>
 <20200312214736.GA3818@techsingularity.net>
 <20200320152251.GC3818@techsingularity.net>

On Fri, Mar 20, 2020 at 04:30:08PM +0100, Jirka Hladky wrote:
> > MPI or OMP and what is a low thread count? For MPI at least, I saw a 0.4%
> > gain on a 4-node machine for bt_C and a 3.88% regression on 8 nodes. I
> > think it must be OMP you are using because I found I had to disable UA
> > for MPI at some point in the past for reasons I no longer remember.
>
> Yes, it's indeed OMP. By low thread count, I mean up to 2x the number of
> NUMA nodes (8 threads on 4-NUMA-node servers, 16 threads on 8-NUMA-node
> servers).
>

Ok, so we know it's within the imbalance threshold where a NUMA node can
be left idle.
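The allowance in question is conceptually along these lines (an
illustrative sketch of the idea only, not the exact code in the series):

static long adjust_numa_imbalance(long imbalance, int src_nr_running)
{
	/*
	 * Permit a small imbalance so that a pair of communicating
	 * tasks (e.g. both ends of a pipe) stays on one node instead
	 * of being spread across nodes purely for utilisation.
	 */
	if (src_nr_running <= 2)
		return 0;

	return imbalance;
}

With up to 2x the number of NUMA nodes in threads, a node often carries
no more than a couple of runnable tasks, so a check like the above
treats the domain as balanced and tolerates an idle node rather than
pulling a task pair apart.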
> > One possibility would be to spread wide always at clone time and assume
> > wake_affine will pull related tasks but it's fragile because it breaks
> > if the cloned task execs and then allocates memory from a remote node
> > only to migrate to a local node immediately.
>
> I think the only way to find out how it performs is to test it. If you
> could prepare a patch like that, I'm more than happy to give it a try!
>

When the initial spreading was prevented, it was for pipelines mainly --
even basic shell scripts. In that case it was observed that a shell
would fork/exec two tasks connected via pipe that started on separate
nodes and had allocated remote data before being pulled close. The
processes were typically too short-lived for NUMA balancing to fix it
up, and by exec time the information on where the fork happened was
lost. See 2c83362734da ("sched/fair: Consider SD_NUMA when selecting the
most idle group to schedule on"). The logic has probably been partially
broken since then because of how SD_NUMA is now treated, but the concern
about spreading wide prematurely remains.

-- 
Mel Gorman
SUSE Labs