Date: Fri, 22 May 2020 12:05:00 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: Hillf Danton
Cc: Jirka Hladky, Phil Auld, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Valentin Schneider, LKML, Douglas Shakshober, Waiman Long,
	Joe Mario, Bill Gray, aokuliar@redhat.com, kkolakow@redhat.com
Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6
Message-ID: <20200522110500.GD7167@techsingularity.net>
In-Reply-To: <20200522010950.3336-1-hdanton@sina.com>
References: <20200507155422.GD3758@techsingularity.net>
 <20200507174934.GD19331@lorien.usersys.redhat.com>
 <20200508034741.13036-1-hdanton@sina.com>
 <20200519043154.10876-1-hdanton@sina.com>
 <20200521140931.15232-1-hdanton@sina.com>
 <20200522010950.3336-1-hdanton@sina.com>

On Fri, May 22, 2020 at 09:09:50AM +0800, Hillf Danton wrote:
> 
> On Thu, 21 May 2020 17:04:08 +0100 Mel Gorman wrote:
> > On Thu, May 21, 2020 at 10:09:31PM +0800, Hillf Danton wrote:
> > > > I'm ignoring the coding style of C++ comments but minimally that
> > > > should be fixed. More importantly, node_type can be one of
> > > > node_overloaded, node_has_spare or node_fully_busy and this is
> > > > checking if there is a mismatch. However, it's not taking into
> > > > account whether the dst_node is more or less loaded than the src
> > > > and does not appear to be actually checking spare capacity like
> > > > the comment suggests.
> > > > 
> > > 
> > > Type other than node_has_spare is not considered because node is not
> > > possible to be so idle that two communicating tasks would better
> > > "stay local."
> > > 
> > 
> > You hardcode an imbalance of 2 at the start without computing any
> > imbalance.
> 
> Same result comes up if it's a bool.
> 

Then the magic number is simply confusing. The patch needs to be a lot
clearer about what the intent is if my patch that adds an
"if (env->src_stats.node_type == node_has_spare)" check is not what you
were aiming for.
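To make that concrete, the ordering I am describing is roughly the
following. This is an untested, simplified sketch rather than the code
from either patch; the src_running/dst_running locals and the maymove
flag are only there for illustration:

	bool maymove = false;

	if (env->src_stats.node_type == node_has_spare) {
		long src_running = env->src_stats.nr_running - 1;
		long dst_running = env->dst_stats.nr_running + 1;
		long imbalance;

		/* Would the move leave the destination busier than the source? */
		imbalance = max(0L, dst_running - src_running);

		/*
		 * adjust_numa_imbalance() does not compute anything, it only
		 * decides whether a small imbalance between mostly idle nodes
		 * can be ignored (returning 0 in that case).
		 */
		imbalance = adjust_numa_imbalance(imbalance,
						  env->src_stats.nr_running);

		/* Only use an idle CPU if the load balancer would not pull
		 * the task straight back. */
		maymove = !imbalance;
	}

i.e. an imbalance is actually computed before anything decides whether
it is small enough to tolerate, instead of starting from a hard-coded 2.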
> > Then if the source is fully_busy or overloaded while the
> > dst is idle, a task can move but that is based on an imaginary hard-coded
> > imbalance of 2.
> 
> This is the potpit I walked around by checking spare capacity first. As
> for overloaded, I see it as a signal that a glitch is not idle somewhere
> else, and I prefer to push it in to ICU before it's too late.
> 

The domain could be overloaded simply due to CPU bindings. I cannot
parse the ICU comment (Intensive Care Unit?!?) :(

> > Finally, it appears that the load balancer and NUMA balancer are again
> > using separate logic when one part of the objective of the series is
> > that the load balancer and NUMA balancer would not override each other.
> 
> This explains 80% of why it is a choppy road ahead.
> 

And I don't think we should go back to the load balancer and NUMA
balancer taking different actions. It ends up doing useless CPU
migrations and can lead to higher NUMA scanning activity. It's partially
why I changed task_numa_compare to prefer swapping with tasks that move
to their preferred node when using an idle CPU would cause an imbalance.

> > As the imbalance is never computed, the patch
> > can create one which the load balancer then overrides. What am I missing?
> 
> LB would give a green light if things move in the direction in favor of
> cutting imb.
> 

Load balancing primarily cares about balancing the number of idle CPUs
between domains when there is spare capacity. While it tries to avoid
balancing by moving tasks away from their preferred node, it will do so
if there are no other candidates.
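For reference, the spare-capacity path of calculate_imbalance() boils
down to something like this. This is a paraphrased, simplified sketch
from memory, not a verbatim quote of kernel/sched/fair.c:

	if (busiest->group_type == group_has_spare) {
		/* No overload: just even out the number of idle CPUs */
		env->migration_type = migrate_task;
		env->imbalance = max_t(long, 0,
				(local->idle_cpus - busiest->idle_cpus) >> 1);

		/* On NUMA domains, tolerate the same small imbalance */
		if (env->sd->flags & SD_NUMA)
			env->imbalance = adjust_numa_imbalance(env->imbalance,
						busiest->sum_nr_running);
		return;
	}

The point is that whatever tolerance for an imbalance the NUMA balancer
applies, the load balancer applies the same adjust_numa_imbalance()
tolerance when deciding whether to pull a task back.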
> > > > 
> > > > 
> > > > Then there is this part
> > > > 
> > > > +	imbalance = adjust_numa_imbalance(imbalance,
> > > > +				env->src_stats.nr_running);
> > > > +
> > > > +	//Do nothing without imbalance
> > > > +	if (!imbalance) {
> > > > +		imbalance = 2;
> > > > +		goto check_imb;
> > > > +	}
> > > > 
> > > > So... if there is no imbalance, assume there is an imbalance of 2,
> > > > jump to a branch that will always be false and fall through to code
> > > > that ignores the value of imbalance ... it's hard to see exactly why
> > > > that code flow is ideal.
> > > > 
> > > With spare capacity being certain, then let's see how idle the node is.
> > 
> > But you no longer check how idle it is or how the relative idleness of
> > the destination compares to the source. adjust_numa_imbalance does
> > not calculate an imbalance, it only decides whether it should be
> > ignored.
> 
> Then the idle CPU is no longer so tempting.
> 

True. While it's perfectly possible to ignore the imbalance and use an
idle CPU if it exists, the load balancer will simply override it later
and we go back to the NUMA balancer and load balancer fighting each
other, with the NUMA balancer retrying migrations based on
p->numa_migrate_retry.

Whatever logic is used to allow imbalances (be they based on communicating
tasks or preferred locality), it needs to be the same in both balancers.

-- 
Mel Gorman
SUSE Labs