Date: Fri, 1 Nov 2019 13:35:28 +0000
From: Mel Gorman
To: ??????
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched/numa: advanced per-cgroup numa statistic
Message-ID: <20191101133528.GP28938@suse.de>

On Fri, Nov 01, 2019 at 07:52:15PM +0800, ?????? wrote:
> > a much higher degree of flexibility on what information is tracked and
> > allow flexibility on
> >
> > So, overall I think this can be done outside the kernel but recognise
> > that it may not be suitable in all cases. If you feel it must be done
> > inside the kernel, split out the patch that adds information on failed
> > page migrations as it stands apart. Put it behind its own kconfig entry
> > that is disabled by default -- do not tie it directly to NUMA balancing
> > because of the data structure changes. When enabled, it should still be
> > disabled by default at runtime and only activated via kernel command line
> > parameter so that the only people who pay the cost are those that take
> > deliberate action to enable it.
>
> Agree, we could have these per-task faults info there, give the possibility
> to implement maybe a practical userland tool,

I'd prefer not because that would still require the space in the locality
array to store the data.
I'd also prefer that numa_faults_locality[] information is not exposed
unless this feature is enabled. That information is subject to change and
interpreting it requires knowledge of the internals of automatic NUMA
balancing. There are just too many corner cases where the information is
garbage. Tasks with a memory policy would never update the counters,
short-lived tasks may not update them, interleaving will give confused
information about locality, the timing of the reads matters because the
counters might be cleared, the frequency at which they are cleared is
unknown as the frequency is adaptive -- the list goes on. I find it very
difficult to believe that a tool based on faults_locality will be able to
give anything but the most superficial help, and any sensible decision will
require ftrace or numa_maps to get real information.

> meanwhile have these kernel
> numa data disabled by default, folks who got no tool but want to do easy
> monitoring can just turn on the switch :-)
>
> Will have these in next version:
>
> * separate patch for showing per-task faults info

Please only expose the failed= (or migfailed=) in that patch. Do not
expose numa_faults_locality unless it is explicitly enabled on behalf of
a tool that claims it can sensibly interpret it.

> * new CONFIG for numa stat (disabled by default)
> * dynamical runtime switch for numa stat (disabled by default)

Dynamic runtime enabling will mean that if it's turned on, the information
will be temporarily useless until stats are accumulated. Make sure to note
that in any associated documentation, and state a preference for enabling
it with a kernel parameter.

-- 
Mel Gorman
SUSE Labs