References: <20181114224921.12123-2-keith.busch@intel.com> <20181115135710.GD19286@bombadil.infradead.org> <20181115145920.GG11416@localhost.localdomain> <20181115203654.GA28246@bombadil.infradead.org>
In-Reply-To: <20181115203654.GA28246@bombadil.infradead.org>
From: Dan Williams
Date: Fri, 16 Nov 2018 14:55:27 -0800
Subject: Re: [PATCH 1/7] node: Link memory nodes to their compute nodes
To: Matthew Wilcox
Cc: Keith Busch, Linux Kernel Mailing List, Linux ACPI, Linux MM, Greg KH, "Rafael J. Wysocki", Dave Hansen
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 15, 2018 at 12:37 PM Matthew Wilcox wrote:
>
> On Thu, Nov 15, 2018 at 07:59:20AM -0700, Keith Busch wrote:
> > On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
> > > On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
> > > > Memory-only nodes will often have affinity to a compute node, and
> > > > platforms have ways to express that locality relationship.
> > > >
> > > > Nodes containing CPUs or other DMA devices that can initiate memory
> > > > access are referred to as "memory initiators". A "memory target" is a
> > > > node that provides at least one physical address range accessible to a
> > > > memory initiator.
> > >
> > > I think I may be confused here. If there is _no_ link from node X to
> > > node Y, does that mean that node X's CPUs cannot access the memory on
> > > node Y? In my mind, all nodes can access all memory in the system,
> > > just not with uniform bandwidth/latency.
> >
> > The link is just about which nodes are "local". It's like how nodes have
> > a cpulist. Other CPUs not in the node's list can access that node's memory,
> > but the ones in the mask are local, and provide useful optimization hints.
>
> So ... let's imagine a hypothetical system (I've never seen one built like
> this, but it doesn't seem too implausible). Connect four CPU sockets in
> a square, each of which has some regular DIMMs attached to it. CPU A is
> 0 hops to Memory A, one hop to Memory B and Memory C, and two hops to
> Memory D (each CPU only has two "QPI" links). Then maybe there's some
> special memory extender device attached on the PCIe bus. Now there are
> Memory B1 and B2, which are attached to CPU B and local to CPU B, but
> not as local as Memory B is ... and we'd probably _prefer_ to allocate
> memory for CPU A from Memory B1 than from Memory D. But ... *mumble*,
> this seems hard.
>
> I understand you're trying to reflect what the HMAT table is telling you,
> but I'm just really fuzzy on who's ultimately consuming this information
> and what decisions they're trying to drive from it.

The singular "local" is a limitation of the HMAT, but I would expect the
Linux translation of "local" to allow for multiple initiators that can
achieve some semblance of the "best" performance. Anything less than
best is going to have a wide range of variance and will likely devolve
to looking at the platform firmware data table directly. The expected
80% case is that software wants to be able to ask "which CPUs should I
run on to get the best access to this memory?"
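A minimal sketch of that "80% case" query against Matthew's hypothetical square topology. Everything here is an illustrative assumption rather than the kernel's node/HMAT interface: the socket and memory names, the made-up HOP_COST and EXTENDER_PENALTY units, and the helper names best_initiators() and preferred_memory(). The `slack` parameter is one possible way to let several initiators count as "local" when they land near the best cost, instead of HMAT's single local initiator.

```python
from collections import deque

# Hypothetical topology from the example above: four sockets wired in a
# square (A-B, B-D, D-C, C-A), so A is one hop from B and C, two from D.
SOCKET_LINKS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}

# Illustrative cost units only; not measured or HMAT-reported values.
HOP_COST = 10
EXTENDER_PENALTY = 5

# Memory target -> (socket it hangs off, extra cost of its attach point).
# MemB1/MemB2 model the hypothetical PCIe memory extender behind CPU B.
MEMORY_TARGETS = {
    "MemA": ("A", 0),
    "MemB": ("B", 0),
    "MemC": ("C", 0),
    "MemD": ("D", 0),
    "MemB1": ("B", EXTENDER_PENALTY),
    "MemB2": ("B", EXTENDER_PENALTY),
}


def hops_from(src):
    """Breadth-first search: socket -> interconnect hops from `src`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in SOCKET_LINKS[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def access_cost(cpu, target):
    """Hypothetical cost for `cpu` to reach `target`: hops plus attach penalty."""
    socket, penalty = MEMORY_TARGETS[target]
    return hops_from(cpu)[socket] * HOP_COST + penalty


def best_initiators(target, slack=0):
    """CPUs whose cost to `target` is within `slack` of the best (ties allowed)."""
    costs = {cpu: access_cost(cpu, target) for cpu in SOCKET_LINKS}
    best = min(costs.values())
    return sorted(cpu for cpu, cost in costs.items() if cost <= best + slack)


def preferred_memory(cpu):
    """Memory targets ordered from cheapest to most expensive for `cpu`."""
    return sorted(MEMORY_TARGETS, key=lambda t: (access_cost(cpu, t), t))


if __name__ == "__main__":
    print(best_initiators("MemB1"))            # ['B']
    print(best_initiators("MemB1", slack=10))  # ['A', 'B', 'D']
    print(preferred_memory("A"))
```

With these made-up costs, CPU A ranks Memory B1 (one hop plus the extender penalty) ahead of Memory D (two hops), which matches the preference Matthew describes. Any real ranking would have to come from the platform's reported latency and bandwidth rather than from hop counts.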