Subject: Re: [PATCH 1/7] node: Link memory nodes to their compute nodes
To: Keith Busch, Matthew Wilcox
Cc: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org, Greg Kroah-Hartman, Rafael Wysocki, Dave Hansen, Dan Williams
References: <20181114224921.12123-2-keith.busch@intel.com> <20181115135710.GD19286@bombadil.infradead.org> <20181115145920.GG11416@localhost.localdomain> <20181115203654.GA28246@bombadil.infradead.org> <20181116183254.GD14630@localhost.localdomain>
From: Anshuman Khandual
Date: Mon, 19 Nov 2018 08:45:25 +0530
In-Reply-To: <20181116183254.GD14630@localhost.localdomain>

On 11/17/2018 12:02 AM, Keith Busch wrote:
> On Thu, Nov 15, 2018 at 12:36:54PM -0800, Matthew Wilcox wrote:
>> On Thu, Nov 15, 2018 at 07:59:20AM -0700, Keith Busch wrote:
>>> On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
>>>> On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
>>>>> Memory-only nodes will often have affinity to a compute node, and
>>>>> platforms have ways to express that locality relationship.
>>>>>
>>>>> A node containing CPUs or other DMA devices that can initiate memory
>>>>> access are referred to as "memory initiators". A "memory target" is a
>>>>> node that provides at least one physical address range accessible to a
>>>>> memory initiator.
>>>>
>>>> I think I may be confused here. If there is _no_ link from node X to
>>>> node Y, does that mean that node X's CPUs cannot access the memory on
>>>> node Y? In my mind, all nodes can access all memory in the system,
>>>> just not with uniform bandwidth/latency.
>>>
>>> The link is just about which nodes are "local". It's like how nodes have
>>> a cpulist. Other CPUs not in the node's list can access that node's memory,
>>> but the ones in the mask are local, and provide useful optimization hints.
>>
>> So ... let's imagine a hypothetical system (I've never seen one built like
>> this, but it doesn't seem too implausible). Connect four CPU sockets in
>> a square, each of which has some regular DIMMs attached to it. CPU A is
>> 0 hops to Memory A, one hop to Memory B and Memory C, and two hops from
>> Memory D (each CPU only has two "QPI" links). Then maybe there's some
>> special memory extender device attached on the PCIe bus. Now there's
>> Memory B1 and B2 that's attached to CPU B and it's local to CPU B, but
>> not as local as Memory B is ... and we'd probably _prefer_ to allocate
>> memory for CPU A from Memory B1 than from Memory D. But ... *mumble*,
>> this seems hard.
>
> Indeed, that particular example is out of scope for this series. The
> first objective is to aid a process running in node B's CPUs to allocate
> memory in B1. Anything that crosses QPI are their own.

This is problematic. Any new kernel interface should also accommodate
B2-type memory from the above example, which sits on a PCIe bus.
Eventually such memory will be represented as some sort of NUMA node, and
applications will then depend on this sysfs interface for their memory
placement requirements. Unless the interface is thought through for
B2-type memory now, it might not be extensible in the future.
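
To make the hypothetical four-socket example quoted above concrete, here is a
minimal sketch that models it as a weighted graph and ranks memory nodes for a
given socket. Everything in it is an assumption made for illustration: the
node names (MemA, MemB1, ...), the link costs QPI_HOP and PCIE_ATTACH, and the
helper functions are hypothetical and are not part of the patch series or of
any kernel interface. It only shows why a "local" link alone cannot express
that CPU A would prefer Memory B1 over Memory D, whereas a cost ranking can.

```python
import heapq

# Hypothetical link costs, chosen only for illustration (not from the thread).
QPI_HOP = 10      # one socket-to-socket hop
PCIE_ATTACH = 5   # extra cost of reaching extender memory behind a socket

# Four sockets in a square; each socket has exactly two links.
SOCKET_LINKS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}

# Memory node -> (socket it is attached to, extra attachment cost).
MEMORY_NODES = {
    "MemA": ("A", 0),
    "MemB": ("B", 0),
    "MemC": ("C", 0),
    "MemD": ("D", 0),
    "MemB1": ("B", PCIE_ATTACH),
    "MemB2": ("B", PCIE_ATTACH),
}


def socket_costs(src):
    """Dijkstra over the socket graph: cost from src to every socket."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in SOCKET_LINKS[node]:
            nd = d + QPI_HOP
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def local_targets(src):
    """Memory nodes attached directly to src (the 'local' relationship)."""
    return sorted(name for name, (sock, _) in MEMORY_NODES.items() if sock == src)


def memory_preference(src):
    """All memory nodes, cheapest first, as seen from CPUs on socket src."""
    costs = socket_costs(src)
    ranked = [(costs[sock] + extra, name)
              for name, (sock, extra) in MEMORY_NODES.items()]
    return [name for _, name in sorted(ranked)]


if __name__ == "__main__":
    print(local_targets("B"))       # what a local-only link can express
    print(memory_preference("A"))   # ranking that also covers remote nodes
```

With these made-up costs, local_targets("A") contains only MemA, so a
local-only link says nothing about B1 versus D, while memory_preference("A")
places MemB1 and MemB2 ahead of MemD. Different cost assumptions would give a
different order; the point is only that the ranking needs more information
than a single local link carries.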