Date: Mon, 2 Nov 2020 11:17:17 +0000
From: Mel Gorman
To: Huang Ying
Cc: Peter Zijlstra, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton, Ingo Molnar, Rik van Riel, Johannes Weiner,
	"Matthew Wilcox (Oracle)", Dave Hansen, Andi Kleen, Michal Hocko,
	David Rientjes
Subject: Re: [PATCH -V2 2/2] autonuma: Migrate on fault among multiple bound nodes
Message-ID: <20201102111717.GB3306@suse.de>
References: <20201028023411.15045-1-ying.huang@intel.com>
	<20201028023411.15045-3-ying.huang@intel.com>
In-Reply-To: <20201028023411.15045-3-ying.huang@intel.com>

On Wed, Oct 28, 2020 at 10:34:11AM +0800, Huang Ying wrote:
> Now, AutoNUMA can only optimize the page placement among the NUMA
> nodes if the default memory policy is used, because an explicitly
> specified memory policy should take precedence. But this seems too
> strict in some situations. For example, on a system with 4 NUMA
> nodes, if the memory of an application is bound to nodes 0 and 1,
> AutoNUMA could potentially migrate pages between nodes 0 and 1 to
> reduce cross-node accesses without breaking the explicit memory
> binding policy.
>
> So in this patch, if mbind(.mode=MPOL_BIND, .flags=MPOL_MF_LAZY) is
> used to bind the memory of the application to multiple nodes, and in
> the hint page fault handler both the faulting page node and the
> accessing node are in the policy nodemask, we try to migrate the
> page to the accessing node to reduce cross-node accesses.
>
> [Peter Zijlstra: provided the simplified implementation method.]
>
> Questions:
>
> The sysctl knob kernel.numa_balancing can enable/disable AutoNUMA
> optimization globally. But for memory areas that are bound to
> multiple NUMA nodes, even if AutoNUMA is enabled globally via the
> sysctl knob, we still need to enable AutoNUMA again with a special
> flag. Why not just optimize the page placement whenever possible as
> long as AutoNUMA is enabled globally? The interface would look
> simpler that way.
>
> Signed-off-by: "Huang, Ying"
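For reference, the mbind() usage described above would look roughly like
the following. This is a minimal, untested sketch based only on the patch
description, not code from the series: the mapping, region size and node
mask are made up, it assumes kernel.numa_balancing is enabled and a kernel
with this patch applied (mainline at the time rejects MPOL_MF_LAZY from
userspace), and it needs -lnuma for the mbind() wrapper declared in
numaif.h.

#define _GNU_SOURCE
#include <numaif.h>     /* mbind(), MPOL_BIND; link with -lnuma */
#include <sys/mman.h>
#include <stdio.h>

#ifndef MPOL_MF_LAZY
#define MPOL_MF_LAZY (1 << 3)   /* uapi value; may be absent from numaif.h */
#endif

int main(void)
{
        size_t len = 64UL << 20;                /* hypothetical 64MB region */
        unsigned long nodemask = 0x3;           /* nodes 0 and 1 */
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED)
                return 1;

        /* Bind the range to nodes {0,1}; MPOL_MF_LAZY opts in to the
         * migrate-on-fault behaviour proposed by this patch. */
        if (mbind(buf, len, MPOL_BIND, &nodemask,
                  8 * sizeof(nodemask), MPOL_MF_LAZY))
                perror("mbind");

        /* NUMA hint faults on this range may now migrate pages between
         * node 0 and node 1 towards the accessing CPU's node instead of
         * being skipped because an explicit policy is set. */
        *(char *)buf = 1;
        return 0;
}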
I've no specific objection to the patch or the name change. I can't
remember exactly why I picked the name (it was 8 years ago), but I think
it was because the policy represented the most basic possible approach
that could be done without any attempt at being intelligent, and it
established a baseline. The intent was that anything built on top had to
be better than the most basic policy imaginable. The name reflected the
dictionary definition at the time and happened to match the acronym
closely enough. I wanted to make it absolutely clear to reviewers that
the policy was not good enough (ruling out MPOL_BASIC or variants
thereof) even if it happened to work for some workload, and there was no
intent to report it to the userspace API.

The only hazard with the patch is that applications that use MPOL_BIND
on multiple nodes may now incur some NUMA balancing overhead due to
trapping faults and migrations. It might still end up being better, but
I was not aware of a *realistic* workload that binds to multiple nodes
deliberately. Generally I expect that if an application is binding, it
is binding to one local node. If this shows up in regressions, it will
be interesting to get a detailed description of the workload. Pay
particular attention to whether THP is disabled, as I learned relatively
recently that NUMA balancing with THP disabled has higher overhead
(which is hardly surprising).

Lacking data or a specific objection,

Acked-by: Mel Gorman

--
Mel Gorman
SUSE Labs