Subject: Re: [PATCH RESEND v8 1/2] sched/numa: introduce per-cgroup NUMA locality info
From: 王贇 <yun.wang@linux.alibaba.com>
To: Mel Gorman
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Luis Chamberlain, Kees Cook, Iurii Zaikin,
    Michal Koutný, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, "Paul E. McKenney", Randy Dunlap, Jonathan Corbet
Date: Tue, 18 Feb 2020 09:39:35 +0800
Message-ID: <114519ab-4e9e-996a-67b8-4f5fcecba72a@linux.alibaba.com>
In-Reply-To: <20200217141616.GB3420@suse.de>

On 2020/2/17 10:16 PM, Mel Gorman wrote:
> On Mon, Feb 17, 2020 at 09:23:52PM +0800, 王贇 wrote:
[snip]
>>
>> IMHO the scan period changing should not be a problem now, since the
>> maximum period is defined by the user, so monitoring the accumulated
>> page-access counters at the maximum period is always meaningful, correct?
>>
>
> It has meaning, but the scan rate drives the fault rate, which is the
> basis for the stats you accumulate. If the scan rate is high while
> accesses are local, the stats can be skewed, making the task appear much
> more local than it really is at a later point in time. The scan rate
> affects the accuracy of the information. The counters have meaning, but
> they need careful interpretation.

Yeah, condensing so much information from NUMA Balancing into a few
statistics is a challenge in itself; even the locality metric is not so
easy for a NUMA newbie to understand :-P

>
>> FYI, by monitoring locality, we found that the kvm vcpu thread is not
>> covered by NUMA Balancing: no matter how many maximum periods passed,
>> the counters did not increase, or increased only very slowly, although
>> inside the guest we were copying memory.
>>
>> Later we found that such a task rarely exits to user space to trigger
>> the task work callbacks which the NUMA Balancing scan depends on. That
>> helped us realize how important it is to enable NUMA Balancing inside
>> the guest, with the correct NUMA topology -- a big performance risk,
>> I'd say :-P
>>
>
> Which is a very interesting corner case in itself, but also one that
> could potentially have been inferred from monitoring /proc/vmstat
> numa_pte_updates, or on a per-task basis by monitoring /proc/PID/sched
> and watching numa_scan_seq and total_numa_faults. Accumulating the
> information on a per-cgroup basis would require a bit more legwork.

That's not workable for daily monitoring... Besides, compared with
locality, this requires a much deeper understanding of the implementation,
and it could be tough even for NUMA developers to assemble all these
statistics into a meaningful picture.

>
>> Maybe not a good example, but we just want to highlight that NUMA
>> Balancing can have issues in some cases, and we want them to be
>> exposed somehow, perhaps through the locality metric.
>>
>
> Again, I'm somewhat neutral on the patch, simply because I would not use
> this information for debugging problems with NUMA balancing. I would try
> using tracepoints, and if the tracepoints were not good enough, I'd add
> or fix them -- similar to what I had to do with sched_stick_numa
> recently. The caveat is that I mostly look at this sort of problem as a
> developer. Sysadmins have very different requirements, especially
> simplicity, even if the simplicity in this case is an illusion.
Fair enough, but I guess PeterZ still wants your Ack, so neutral means
refusal in this case :-(

BTW, what do you think about the documentation in the second patch? Do
you think it's necessary to have a doc explaining the NUMA-related
statistics?

Regards,
Michael Wang
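
P.S. For reference, a rough sketch of the kind of system-wide polling Mel
describes above, assuming a kernel with CONFIG_NUMA_BALANCING so that
/proc/vmstat exposes the numa_* counters (the 10-second interval and the
derived "locality" ratio are only illustrative choices here, not anything
the patch itself defines):

    #!/usr/bin/env python3
    # Poll the NUMA balancing counters in /proc/vmstat and print the
    # per-interval deltas, plus the share of hinting faults that were
    # local -- a crude system-wide analogue of per-cgroup locality.
    import time

    FIELDS = ("numa_pte_updates", "numa_hint_faults",
              "numa_hint_faults_local", "numa_pages_migrated")

    def read_vmstat():
        stats = {}
        with open("/proc/vmstat") as f:
            for line in f:
                key, _, value = line.partition(" ")
                if key in FIELDS:
                    stats[key] = int(value)
        return stats

    prev = read_vmstat()
    while True:
        time.sleep(10)
        cur = read_vmstat()
        delta = {k: cur[k] - prev[k] for k in FIELDS}
        faults = delta["numa_hint_faults"]
        local = delta["numa_hint_faults_local"]
        locality = 100.0 * local / faults if faults else 0.0
        print(delta, "locality=%.1f%%" % locality)
        prev = cur

The per-task side can be sampled the same way by reading numa_scan_seq
and total_numa_faults out of /proc/<pid>/sched, as Mel mentions above.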