Date: Tue, 12 Mar 2019 13:37:56 +0000
From: Jonathan Cameron
To: Keith Busch
CC: "Busch, Keith", "linux-kernel@vger.kernel.org", "linux-acpi@vger.kernel.org",
    "linux-mm@kvack.org", "linux-api@vger.kernel.org", Greg Kroah-Hartman,
    Rafael Wysocki, "Hansen, Dave", "Williams, Dan J"
Subject: Re: [PATCHv7 10/10] doc/mm: New documentation for memory performance
Message-ID: <20190312133756.000066c7@huawei.com>
In-Reply-To: <20190311201632.GG10411@localhost.localdomain>
References: <20190227225038.20438-1-keith.busch@intel.com>
 <20190227225038.20438-11-keith.busch@intel.com>
 <20190311113843.00006b47@huawei.com>
 <20190311201632.GG10411@localhost.localdomain>
Organization: Huawei

On Mon, 11 Mar 2019 14:16:33 -0600
Keith Busch wrote:

> On Mon, Mar 11, 2019 at 04:38:43AM -0700, Jonathan Cameron wrote:
> > On Wed, 27 Feb 2019 15:50:38 -0700
> > Keith Busch wrote:
> >
> > > Platforms may provide system memory where some physical address ranges
> > > perform differently than others, or is side cached by the system.
> > The magic 'side cached' term still here in the patch description, ideally
> > wants cleaning up.
> >
> > >
> > > Add documentation describing a high level overview of such systems and the
> > > perforamnce and caching attributes the kernel provides for applications
> > performance
> >
> > > wishing to query this information.
> > >
> > > Reviewed-by: Mike Rapoport
> > > Signed-off-by: Keith Busch
> >
> > A few comments inline. Mostly the weird corner cases that I miss understood
> > in one of the earlier versions of the code.
> >
> > Whilst I think perhaps that one section could be tweaked a tiny bit I'm basically
> > happy with this if you don't want to.
> >
> > Reviewed-by: Jonathan Cameron
> >
> > > ---
> > >  Documentation/admin-guide/mm/numaperf.rst | 164 ++++++++++++++++++++++++++++++
> > >  1 file changed, 164 insertions(+)
> > >  create mode 100644 Documentation/admin-guide/mm/numaperf.rst
> > >
> > > diff --git a/Documentation/admin-guide/mm/numaperf.rst b/Documentation/admin-guide/mm/numaperf.rst
> > > new file mode 100644
> > > index 000000000000..d32756b9be48
> > > --- /dev/null
> > > +++ b/Documentation/admin-guide/mm/numaperf.rst
> > > @@ -0,0 +1,164 @@
> > > +.. _numaperf:
> > > +
> > > +=============
> > > +NUMA Locality
> > > +=============
> > > +
> > > +Some platforms may have multiple types of memory attached to a compute
> > > +node. These disparate memory ranges may share some characteristics, such
> > > +as CPU cache coherence, but may have different performance. For example,
> > > +different media types and buses affect bandwidth and latency.
> > > +
> > > +A system supports such heterogeneous memory by grouping each memory type
> > > +under different domains, or "nodes", based on locality and performance
> > > +characteristics. Some memory may share the same node as a CPU, and others
> > > +are provided as memory only nodes. While memory only nodes do not provide
> > > +CPUs, they may still be local to one or more compute nodes relative to
> > > +other nodes. The following diagram shows one such example of two compute
> > > +nodes with local memory and a memory only node for each of compute node:
> > > +
> > > + +------------------+     +------------------+
> > > + | Compute Node 0   +-----+ Compute Node 1   |
> > > + | Local Node0 Mem  |     | Local Node1 Mem  |
> > > + +--------+---------+     +--------+---------+
> > > +          |                        |
> > > + +--------+---------+     +--------+---------+
> > > + | Slower Node2 Mem |     | Slower Node3 Mem |
> > > + +------------------+     +--------+---------+
> > > +
> > > +A "memory initiator" is a node containing one or more devices such as
> > > +CPUs or separate memory I/O devices that can initiate memory requests.
> > > +A "memory target" is a node containing one or more physical address
> > > +ranges accessible from one or more memory initiators.
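
As an aside (not a comment on the text): the compute vs memory-only split
described here is already visible from userspace through the existing node
state masks, independently of the new access0 directories. A rough, untested
sketch along these lines should do it - it assumes the usual
/sys/devices/system/node/has_cpu and has_memory files (older kernels expose
has_normal_memory instead):

#!/usr/bin/env python3
# Untested sketch: classify NUMA nodes as compute or memory-only using the
# kernel's existing node state masks. Assumes /sys/devices/system/node/has_cpu
# and has_memory are present (older kernels call the latter has_normal_memory).

def read_nodelist(path):
    # Parse a kernel nodelist such as "0-1,4" into a set of node ids.
    with open(path) as f:
        text = f.read().strip()
    nodes = set()
    for part in filter(None, text.split(',')):
        if '-' in part:
            lo, hi = part.split('-')
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return nodes

base = '/sys/devices/system/node'
with_cpu = read_nodelist(base + '/has_cpu')
with_mem = read_nodelist(base + '/has_memory')

for node in sorted(with_cpu | with_mem):
    if node in with_cpu and node in with_mem:
        kind = 'compute node with local memory'
    elif node in with_cpu:
        kind = 'CPU-only node'
    else:
        kind = 'memory-only node'
    print('node%d: %s' % (node, kind))
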
> > > +
> > > +When multiple memory initiators exist, they may not all have the same
> > > +performance when accessing a given memory target. Each initiator-target
> > > +pair may be organized into different ranked access classes to represent
> > > +this relationship.
> > This concept is a bit vague at the moment. Largely because only access0
> > is actually defined. We should definitely keep a close eye on any others
> > that are defined in future to make sure this text is still valid.
> >
> > I can certainly see it being used for different ideas of 'best' rather
> > than simply best and second best etc.
> I tried to make the interface flexible to future extension, but I'm
> still not sure how potential users would want to see something like
> all pair-wise attributes, so I had some trouble trying to capture that
> in words.

Agreed, it is definitely non obvious. We might end up with something
totally different like Jerome is proposing anyway. Let's address this
when it happens!

>
> > > The highest performing initiator to a given target
> > > +is considered to be one of that target's local initiators, and given
> > > +the highest access class, 0. Any given target may have one or more
> > > +local initiators, and any given initiator may have multiple local
> > > +memory targets.
> > > +
> > > +To aid applications matching memory targets with their initiators, the
> > > +kernel provides symlinks to each other. The following example lists the
> > > +relationship for the access class "0" memory initiators and targets, which is
> > > +the of nodes with the highest performing access relationship::
> > > +
> > > +	# symlinks -v /sys/devices/system/node/nodeX/access0/targets/
> > > +	relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY
> > So this one perhaps needs a bit more description - I would put it after initiators
> > which precisely fits the description you have here now.
> >
> > "targets contains those nodes for which this initiator is the best possible initiator."
> >
> > which is subtly different form
> >
> > "targets contains those nodes to which this node has the highest
> > performing access characteristics."
> >
> > For example in my test case:
> > * 4 nodes with local memory and cpu, 1 node remote and equal distant from all of the
> > initiators,
> >
> > targets for the compute nodes contains both themselves and the remote node, to which
> > the characteristics are of course worse. As you point out before, we need to look
> > in
> > node0/access0/targets/node0/access0/initiators
> > node0/access0/targets/node4/access0/initiators
> > to get the relevant characteristics and work out that node0 is 'nearer' itself
> > (obviously this is a bit of a silly case, but we could have no memory node0 and
> > be talking about node4 and node5.
> >
> > I am happy with the actual interface, this is just a question about whether we can tweak
> > this text to be slightly clearer.
> Sure, I mention this in patch 4's commit message. Probably worth
> repeating here:
>
> A memory initiator may have multiple memory targets in the same access
> class. The target memory's initiators in a given class indicate the
> nodes access characteristics share the same performance relative to other
> linked initiator nodes. Each target within an initiator's access class,
> though, do not necessarily perform the same as each other.

That sounds good to me.
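
For what it's worth, this is roughly the (untested) userspace walk I had in
mind when thinking about the semantics above: for each node that has an
access0/targets directory, list the targets it is a class 0 initiator for,
then read each of those targets' own access0/initiators directory back to
cross-check the relationship - that is also where the bandwidth/latency
attributes from earlier in the series are expected to sit. Paths follow the
layout this series proposes; node numbers are whatever the platform provides:

#!/usr/bin/env python3
# Untested sketch against the access0 layout proposed in this series:
#   nodeX/access0/targets/nodeY    - X is one of Y's best (class 0) initiators
#   nodeY/access0/initiators/nodeX - Y's best initiators, alongside the
#                                    performance attribute files
import os
import re

base = '/sys/devices/system/node'

for entry in sorted(os.listdir(base)):
    if not re.fullmatch(r'node\d+', entry):
        continue
    targets_dir = os.path.join(base, entry, 'access0', 'targets')
    if not os.path.isdir(targets_dir):
        # No access class info exposed for this node on this kernel.
        continue
    targets = sorted(t for t in os.listdir(targets_dir) if t.startswith('node'))
    print('%s is a class 0 initiator for: %s' %
          (entry, ', '.join(targets) if targets else '(none)'))

    # Cross-check: each target should list this node among its own class 0
    # initiators, next to the performance attribute files.
    for t in targets:
        initiators_dir = os.path.join(base, t, 'access0', 'initiators')
        if os.path.isdir(initiators_dir):
            initiators = sorted(i for i in os.listdir(initiators_dir)
                                if i.startswith('node'))
            print('  %s access0 initiators: %s' % (t, ', '.join(initiators)))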