Subject: Re: [PATCH RESEND v8 1/2] sched/numa: introduce per-cgroup NUMA locality info
To: Mel Gorman, Peter Zijlstra
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Ben Segall, Luis Chamberlain, Kees Cook, Iurii Zaikin,
 Michal Koutný, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, "Paul E.
 McKenney", Randy Dunlap, Jonathan Corbet
References: <20200214151048.GL14914@hirez.programming.kicks-ass.net> <20200217115810.GA3420@suse.de>
In-Reply-To: <20200217115810.GA3420@suse.de>
From: 王贇
Message-ID: <881deb50-163e-442a-41ec-b375cc445e4d@linux.alibaba.com>
Date: Mon, 17 Feb 2020 21:23:52 +0800

On 2020/2/17 7:58 PM, Mel Gorman wrote:
[snip]
>> Mel, I suspect you still feel that way, right?
>>
>
> Yes, I still think it would be a struggle to interpret the data
> meaningfully without very specific knowledge of the implementation. If
> the scan rate was constant, it would be easier but that would make NUMA
> balancing worse overall. Similarly, the stat might get very difficult to
> interpret when NUMA balancing is failing because of a load imbalance,
> pages are shared and being interleaved or NUMA groups span multiple
> active nodes.

Hi Mel, good to have you back at the table :-)

IMHO the changing scan period should not be a problem now. The maximum
period is defined by the user, so monitoring the accumulated page access
counters at intervals of the maximum period is always meaningful, correct?

FYI, by monitoring locality we found that the kvm vcpu thread is not
covered by NUMA Balancing: no matter how many maximum periods pass, the
counters do not increase, or increase only very slowly, even though we are
copying memory inside the guest.
Later we found that such a task rarely exits to user space to trigger the
task work callbacks, and the NUMA Balancing scan depends on that. This
helped us realize how important it is to enable NUMA Balancing inside the
guest, with the correct NUMA topology; a big performance risk, I'd say :-P

Maybe not a good example, but we are just trying to highlight that NUMA
Balancing can have issues in some cases, and we want them to be exposed
somehow, maybe through the locality.

Regards,
Michael Wang

>
> For example, the series that reconciles NUMA and CPU balancers may look
> worse in these stats even though the overall performance may be better.
>
>> In the document (patch 2/2) you write:
>>
>>> +However, there are no hardware counters for per-task local/remote accessing
>>> +info, we don't know how many remote page accesses have occurred for a
>>> +particular task.
>>
>> We can of course 'fix' that by adding a tracepoint.
>>
>> Mel, would you feel better by having a tracepoint in task_numa_fault() ?
>>
>
> A bit, although interpreting the data would still be difficult and the
> tracepoint would have to include information about the cgroup. While
> I've never tried, this seems like the type of thing that would be suited
> to a BPF script that probes task_numa_fault and extract the information
> it needs.
>
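
To make the BPF suggestion above a bit more concrete, here is a rough
user-space sketch of the aggregation such a probe would have to feed. It is
not a BPF program and not what the patch does. The event layout (cgroup
path, node of the faulting CPU, node holding the page, page count) and the
names NumaFault and locality_by_cgroup are hypothetical, and treating
"CPU node equals memory node" as local is an assumption made only for
illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class NumaFault(NamedTuple):
    """One NUMA hinting fault (hypothetical event layout, not a real tracepoint)."""
    cgroup: str    # cgroup path of the faulting task
    cpu_node: int  # node of the CPU that took the fault
    mem_node: int  # node holding the faulted page
    pages: int     # number of pages covered by the fault


def locality_by_cgroup(faults: Iterable[NumaFault]) -> Dict[str, float]:
    """Return, per cgroup, the fraction of faulted pages that were node-local."""
    local = defaultdict(int)
    total = defaultdict(int)
    for f in faults:
        total[f.cgroup] += f.pages
        # Assumption: a fault is "local" when the CPU and the page share a node.
        if f.cpu_node == f.mem_node:
            local[f.cgroup] += f.pages
    # Skip cgroups whose events carried no pages to avoid dividing by zero.
    return {cg: local[cg] / total[cg] for cg in total if total[cg] > 0}
```

A cgroup whose tasks never generate hinting faults, like the vcpu threads
mentioned earlier, would simply be missing from the result, which is
arguably the signal one would want to see.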