From: 王贇 <yun.wang@linux.alibaba.com>
Subject: Re: [PATCH v6 0/2] sched/numa: introduce numa locality
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Luis Chamberlain, Kees Cook, Iurii Zaikin, Michal Koutný,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, "Paul E. McKenney", Randy Dunlap,
 Jonathan Corbet
McKenney" , Randy Dunlap , Jonathan Corbet References: <743eecad-9556-a241-546b-c8a66339840e@linux.alibaba.com> <207ef46c-672c-27c8-2012-735bd692a6de@linux.alibaba.com> <040def80-9c38-4bcc-e4a8-8a0d10f131ed@linux.alibaba.com> <25cf7ef5-e37e-7578-eea7-29ad0b76c4ea@linux.alibaba.com> <443641e7-f968-0954-5ff6-3b7e7fed0e83@linux.alibaba.com> Message-ID: <8edb83a2-9943-2954-0da6-f4d29e3df109@linux.alibaba.com> Date: Fri, 17 Jan 2020 10:19:17 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:60.0) Gecko/20100101 Thunderbird/60.9.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Dear folks, During our testing, we found in some cases the NUMA Balancing is not helping improving locality, that is the memory writing inside a virtual machine. The VM is created by docker kata-runtime, inside guest the container executed several tasks to malloc memory and keep writing in page size, then report the time cost after finished 1G writing. The result is not as good as runc, and we found the locality is not growing in kata cases, with some debugging we located the reason. Those vcpu threads created by VM is rarely exit into userspace in this case, they just stay in kernel after calling ioctl(KVM_RUN), while NUMA Balancing work is done with task_work_run(), which is handled together with signal handling before exit to usermode. So the situation is, for these vcpu threads, NUMA Balancing work was queued with task_work_add(), but never got chance to finish. Now the question is, is this by designed or not? BTW, we also passed the NUMA topology into VM, but still the result is not as good as runc, seems like the effect of NUMA Balancing on host is far more better than inside guest. Regards, Michael Wang On 2019/12/13 上午9:43, 王贇 wrote: > Since v5: > * fix compile failure when NUMA disabled > Since v4: > * improved documentation > Since v3: > * fix comments and improved documentation > Since v2: > * simplified the locality concept & implementation > Since v1: > * improved documentation > > Modern production environment could use hundreds of cgroup to control > the resources for different workloads, along with the complicated > resource binding. > > On NUMA platforms where we have multiple nodes, things become even more > complicated, we hope there are more local memory access to improve the > performance, and NUMA Balancing keep working hard to achieve that, > however, wrong memory policy or node binding could easily waste the > effort, result a lot of remote page accessing. > > We need to notice such problems, then we got chance to fix it before > there are too much damages, however, there are no good monitoring > approach yet to help catch the mouse who introduced the remote access. > > This patch set is trying to fill in the missing pieces, by introduce > the per-cgroup NUMA locality info, with this new statistics, we could > achieve the daily monitoring on NUMA efficiency, to give warning when > things going too wrong. > > Please check the second patch for more details. 
On 2019/12/13 9:43 AM, 王贇 wrote:
> Since v5:
>   * fixed a compile failure when NUMA is disabled
> Since v4:
>   * improved documentation
> Since v3:
>   * fixed comments and improved documentation
> Since v2:
>   * simplified the locality concept & implementation
> Since v1:
>   * improved documentation
>
> A modern production environment can use hundreds of cgroups to control
> the resources of different workloads, along with complicated resource
> binding.
>
> On NUMA platforms with multiple nodes things become even more
> complicated: we want more local memory accesses to improve performance,
> and NUMA Balancing keeps working hard to achieve that, but a wrong
> memory policy or node binding can easily waste that effort and result
> in a lot of remote page accesses.
>
> We need to notice such problems so we get a chance to fix them before
> too much damage is done, but there is no good monitoring approach yet
> to help catch the mouse that introduced the remote accesses.
>
> This patch set tries to fill in the missing piece by introducing
> per-cgroup NUMA locality info. With this new statistic we can do daily
> monitoring of NUMA efficiency and give a warning when things go wrong.
>
> Please check the second patch for more details.
>
> Michael Wang (2):
>   sched/numa: introduce per-cgroup NUMA locality info
>   sched/numa: documentation for per-cgroup numa statistics
>
>  Documentation/admin-guide/cg-numa-stat.rst      | 178 ++++++++++++++++++++++
>  Documentation/admin-guide/index.rst             |   1 +
>  Documentation/admin-guide/kernel-parameters.txt |   4 +
>  Documentation/admin-guide/sysctl/kernel.rst     |   9 ++
>  include/linux/sched.h                           |  15 ++
>  include/linux/sched/sysctl.h                    |   6 +
>  init/Kconfig                                    |  11 ++
>  kernel/sched/core.c                             |  75 ++++++++++
>  kernel/sched/fair.c                             |  62 +++++++++
>  kernel/sched/sched.h                            |  12 ++
>  kernel/sysctl.c                                 |  11 ++
>  11 files changed, 384 insertions(+)
>  create mode 100644 Documentation/admin-guide/cg-numa-stat.rst
>
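
To restate the idea from the cover letter in code form: the locality of
a cgroup is essentially the share of its page accesses that hit the
local node. The following is only a toy illustration; the counter names
are hypothetical, and the real interface and file layout are described
in the second patch.

/* toy illustration: locality as the percentage of local page accesses;
 * 'local' and 'remote' stand in for hypothetical per-cgroup counters */
#include <stdio.h>

static double locality_pct(unsigned long local, unsigned long remote)
{
	unsigned long total = local + remote;

	return total ? 100.0 * (double)local / total : 0.0;
}

int main(void)
{
	/* example numbers only */
	printf("locality: %.1f%%\n", locality_pct(800000, 200000));
	return 0;
}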