From: Dave Hansen
To: Jerome Glisse, Ross Zwisler
Cc: linux-kernel@vger.kernel.org, "Anaczkowski, Lukasz", "Box, David E",
 "Kogut, Jaroslaw", "Lahtinen, Joonas", "Moore, Robert",
 "Nachimuthu, Murugasamy", "Odzioba, Lukasz", "Rafael J. Wysocki",
 "Schmauss, Erik", "Verma, Vishal L", "Zheng, Lv", Andrew Morton,
 Dan Williams, Greg Kroah-Hartman, Len Brown, Tim Chen,
 devel@acpica.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org,
 linux-nvdimm@lists.01.org
Subject: Re: [RFC v2 0/5] surface heterogeneous memory performance information
Date: Thu, 6 Jul 2017 16:30:08 -0700
In-Reply-To: <20170706230803.GE2919@redhat.com>
References: <20170706215233.11329-1-ross.zwisler@linux.intel.com>
 <20170706230803.GE2919@redhat.com>

On 07/06/2017 04:08 PM, Jerome Glisse wrote:
>> So, for applications that need to differentiate between memory ranges
>> based on their performance, what option would work best for you?  Is
>> the local (initiator,target) performance provided by patch 5 enough,
>> or do you require performance information for all possible
>> (initiator,target) pairings?
>
> Am I right in assuming that HBM or any faster memory will be
> relatively small (1GB-8GB, maybe 16GB?) and of fixed amount (i.e. the
> size will depend on the exact CPU model you have)?

For HBM, that's certainly consistent with the Xeon Phi MCDRAM.  But,
please remember that this patch set is for fast memory *and* slow
memory (vs. plain DRAM).

> If so, I am wondering if we should not restrict NUMA placement policy
> for such nodes to VMAs only: forbid any policy that would prefer those
> nodes globally at the thread/process level.  This would keep a
> process-wide thread policy from exhausting this smaller pool of
> memory.

You would like to take the NUMA APIs and bifurcate them?  Make some of
them able to work on this memory, and others not?  So, set_mempolicy()
would work if you passed it one of these "special" nodes with
MPOL_F_ADDR, but would fail otherwise?  (See the sketch at the end of
this mail for the two call shapes in question.)

> The drawback of doing so would be that existing applications would not
> benefit from it, so workloads where it is acceptable to exhaust such
> memory wouldn't benefit until their applications are updated.

I think the guys running 40-year-old Fortran binaries might not be so
keen on this restriction.  I bet there are a pretty substantial number
of folks out there that would love to get new hardware and just do:

	numactl --membind=fast-node ./old-binary

If I were working for a hardware company, I'd sure like to just be able
to sell somebody some fancy new hardware and have their existing
software "just work" with a minimal wrapper.
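
For concreteness, here is a minimal sketch of the two call shapes being
discussed, using the existing mbind()/set_mempolicy() syscall wrappers
from <numaif.h> (link with -lnuma).  FAST_NODE is a made-up node id
standing in for an HBM node; none of this is from the patch set itself.

/* Build: gcc numa-sketch.c -lnuma */
#include <stdio.h>
#include <sys/mman.h>
#include <numaif.h>

#define FAST_NODE 1	/* hypothetical HBM node id */

int main(void)
{
	unsigned long mask = 1UL << FAST_NODE;
	unsigned long maxnode = sizeof(mask) * 8;
	size_t len = 1UL << 21;	/* 2 MiB */
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Per-VMA policy: bind only this one mapping to the fast
	 * node.  Under the proposal above, this would remain legal. */
	if (mbind(buf, len, MPOL_BIND, &mask, maxnode, 0))
		perror("mbind");

	/* Process-wide policy: all future allocations come from the
	 * fast node.  This is the kind of call the proposal would
	 * forbid for small, exhaustible pools -- and also what
	 * "numactl --membind" issues before exec'ing an old,
	 * unmodified binary, which is why the wrapper trick above
	 * stops working if this form is disallowed. */
	if (set_mempolicy(MPOL_BIND, &mask, maxnode))
		perror("set_mempolicy");

	return 0;
}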