Date: Mon, 11 Mar 2019 14:16:33 -0600
From: Keith Busch
To: Jonathan Cameron
Cc: "Busch, Keith", "linux-kernel@vger.kernel.org", "linux-acpi@vger.kernel.org",
    "linux-mm@kvack.org", "linux-api@vger.kernel.org", Greg Kroah-Hartman,
    Rafael Wysocki, "Hansen, Dave", "Williams, Dan J"
Subject: Re: [PATCHv7 10/10] doc/mm: New documentation for memory performance
Message-ID: <20190311201632.GG10411@localhost.localdomain>
References: <20190227225038.20438-1-keith.busch@intel.com>
 <20190227225038.20438-11-keith.busch@intel.com>
 <20190311113843.00006b47@huawei.com>
In-Reply-To: <20190311113843.00006b47@huawei.com>
On Mon, Mar 11, 2019 at 04:38:43AM -0700, Jonathan Cameron wrote:
> On Wed, 27 Feb 2019 15:50:38 -0700
> Keith Busch wrote:
>
> > Platforms may provide system memory where some physical address ranges
> > perform differently than others, or is side cached by the system.
>
> The magic 'side cached' term is still here in the patch description;
> ideally it wants cleaning up.
>
> > Add documentation describing a high level overview of such systems and the
> > perforamnce and caching attributes the kernel provides for applications
>
> performance
>
> > wishing to query this information.
> >
> > Reviewed-by: Mike Rapoport
> > Signed-off-by: Keith Busch
>
> A few comments inline. Mostly the weird corner cases that I misunderstood
> in one of the earlier versions of the code.
>
> Whilst I think perhaps that one section could be tweaked a tiny bit, I'm
> basically happy with this as-is if you don't want to change it.
>
> Reviewed-by: Jonathan Cameron
>
> > ---
> >  Documentation/admin-guide/mm/numaperf.rst | 164 ++++++++++++++++++++++++++++++
> >  1 file changed, 164 insertions(+)
> >  create mode 100644 Documentation/admin-guide/mm/numaperf.rst
> >
> > diff --git a/Documentation/admin-guide/mm/numaperf.rst b/Documentation/admin-guide/mm/numaperf.rst
> > new file mode 100644
> > index 000000000000..d32756b9be48
> > --- /dev/null
> > +++ b/Documentation/admin-guide/mm/numaperf.rst
> > @@ -0,0 +1,164 @@
> > +.. _numaperf:
> > +
> > +=============
> > +NUMA Locality
> > +=============
> > +
> > +Some platforms may have multiple types of memory attached to a compute
> > +node. These disparate memory ranges may share some characteristics, such
> > +as CPU cache coherence, but may have different performance. For example,
> > +different media types and buses affect bandwidth and latency.
> > +
> > +A system supports such heterogeneous memory by grouping each memory type
> > +under different domains, or "nodes", based on locality and performance
> > +characteristics. Some memory may share the same node as a CPU, and others
> > +are provided as memory only nodes. While memory only nodes do not provide
> > +CPUs, they may still be local to one or more compute nodes relative to
> > +other nodes. The following diagram shows one such example of two compute
> > +nodes with local memory and a memory only node for each compute node:
> > +
> > +  +------------------+     +------------------+
> > +  | Compute Node 0   +-----+ Compute Node 1   |
> > +  | Local Node0 Mem  |     | Local Node1 Mem  |
> > +  +--------+---------+     +--------+---------+
> > +           |                        |
> > +  +--------+---------+     +--------+---------+
> > +  | Slower Node2 Mem |     | Slower Node3 Mem |
> > +  +------------------+     +--------+---------+
> > +
> > +A "memory initiator" is a node containing one or more devices such as
> > +CPUs or separate memory I/O devices that can initiate memory requests.
> > +A "memory target" is a node containing one or more physical address
> > +ranges accessible from one or more memory initiators.
> > +
> > +When multiple memory initiators exist, they may not all have the same
> > +performance when accessing a given memory target. Each initiator-target
> > +pair may be organized into different ranked access classes to represent
> > +this relationship.
>
> This concept is a bit vague at the moment, largely because only access0
> is actually defined. We should definitely keep a close eye on any others
> that are defined in future to make sure this text is still valid.
>
> I can certainly see it being used for different ideas of 'best' rather
> than simply best and second best etc.

I tried to make the interface flexible for future extension, but I'm still
not sure how potential users would want to see something like all pair-wise
attributes, so I had some trouble trying to capture that in words.
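For what it's worth, if more classes do get defined, I'd expect dumping
everything a platform exports to stay about this simple. A rough sketch
(the only things assumed here are the accessN directories and the
initiators/read_*,write_* attributes from this series):

    #!/bin/sh
    # Sketch: walk every node's access classes and print the initiator-side
    # performance attributes (read/write bandwidth and latency), assuming
    # the sysfs layout this series introduces.
    for node in /sys/devices/system/node/node[0-9]*; do
            for class in "$node"/access*; do
                    [ -d "$class" ] || continue  # platform may export no classes
                    echo "${node##*/} ${class##*/}:"
                    for attr in "$class"/initiators/read_latency \
                                "$class"/initiators/read_bandwidth \
                                "$class"/initiators/write_latency \
                                "$class"/initiators/write_bandwidth; do
                            [ -f "$attr" ] || continue
                            echo "  ${attr##*/}: $(cat "$attr")"
                    done
            done
    done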
> > The highest performing initiator to a given target
> > +is considered to be one of that target's local initiators, and given
> > +the highest access class, 0. Any given target may have one or more
> > +local initiators, and any given initiator may have multiple local
> > +memory targets.
> > +
> > +To aid applications matching memory targets with their initiators, the
> > +kernel provides symlinks to each other. The following example lists the
> > +relationship for the access class "0" memory initiators and targets, which is
> > +the set of nodes with the highest performing access relationship::
> > +
> > +	# symlinks -v /sys/devices/system/node/nodeX/access0/targets/
> > +	relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY
>
> So this one perhaps needs a bit more description - I would put it after
> initiators, which precisely fits the description you have here now.
>
> "targets contains those nodes for which this initiator is the best
> possible initiator."
>
> which is subtly different from
>
> "targets contains those nodes to which this node has the highest
> performing access characteristics."
>
> For example, in my test case:
> * 4 nodes with local memory and cpu, 1 node remote and equidistant from
>   all of the initiators,
>
> targets for the compute nodes contains both themselves and the remote node,
> to which the characteristics are of course worse. As you pointed out
> before, we need to look in
>
>   node0/access0/targets/node0/access0/initiators
>   node0/access0/targets/node4/access0/initiators
>
> to get the relevant characteristics and work out that node0 is 'nearer' to
> itself (obviously this is a bit of a silly case, but we could have no
> memory in node0 and be talking about node4 and node5).
>
> I am happy with the actual interface; this is just a question of whether
> we can tweak this text to be slightly clearer.

Sure, I mention this in patch 4's commit message. Probably worth repeating
here:

  A memory initiator may have multiple memory targets in the same access
  class. The target memory's initiators in a given class indicate the
  nodes' access characteristics share the same performance relative to
  other linked initiator nodes. Each target within an initiator's access
  class, though, does not necessarily perform the same as the others.
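To put your node0/node4 example in concrete terms: an application can't
treat the targets links alone as a ranking; it has to follow each link and
compare the target-side attributes. A sketch of that lookup (assuming only
the access0 links and the initiators/read_latency attribute from this
series; the "node0" default argument is just an example):

    #!/bin/sh
    # Sketch: rank every access0 target of the given initiator node by the
    # read latency reported in each target's access0/initiators directory,
    # lowest first.
    init=${1:-node0}  # example argument: the initiator node to rank for
    for target in /sys/devices/system/node/"$init"/access0/targets/node*; do
            lat=$(cat "$target"/access0/initiators/read_latency 2>/dev/null) || continue
            echo "$lat ${target##*/}"
    done | sort -n

In your layout that prints node0 ahead of node4, since node0's latency to
itself is lower than anything node4's initiators report.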