From: "Williams, Dan J"
To: "toshi.kani@hp.com"
CC: "linux-kernel@vger.kernel.org", "mingo@kernel.org", "hch@lst.de", "axboe@kernel.dk", "linux-nvdimm@lists.01.org", "linux-fsdevel@vger.kernel.org", "linux-acpi@vger.kernel.org", "boaz@plexistor.com"
Subject: Re: [PATCH v2 15/17] libnvdimm: Set numa_node to NVDIMM devices
Date: Thu, 25 Jun 2015 18:34:47 +0000
Message-ID: <1435257283.13411.4.camel@intel.com>
References: <20150625090554.40066.69562.stgit@dwillia2-desk3.jf.intel.com> <20150625093738.40066.88750.stgit@dwillia2-desk3.jf.intel.com> <1435254317.11808.327.camel@misato.fc.hp.com>
In-Reply-To: <1435254317.11808.327.camel@misato.fc.hp.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2015-06-25 at 11:45 -0600, Toshi Kani wrote:
> On Thu, 2015-06-25 at 05:37 -0400, Dan Williams wrote:
> > From: Toshi Kani
> >
> > ACPI NFIT table has System Physical Address Range Structure entries that
> > describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is set
> > in the flags.
> >
> > Change acpi_nfit_register_region() to map a proximity ID to its node ID,
> > and set it to a new numa_node field of nd_region_desc, which is then
> > conveyed to the nd_region device.
> >
> > The device core arranges for btt and namespace devices to inherit their
> > node from their parent region.
> >
> > Signed-off-by: Toshi Kani
> > [djbw: move set_dev_node() from region 'probe' to 'create']
>
> Sorry, I failed to mention another issue, which led me to call
> set_dev_node() in probe.  nd_async_device_register() calls device_add(),
> which does:
>
>	/* use parent numa_node */
>	if (parent)
>		set_dev_node(dev, dev_to_node(parent));
>
> and overwrites numa_node to -1.  Since region's parent is ndbusN, we
> cannot set numa_node to the parent.  So, I had to set it in probe.

In general, I still don't like leaving it up to ->probe(), which is
within its rights to fail and not set the node.  How about the
following, which moves it to the bus uevent code?  It should get
triggered before probe, so the numa_node is valid before userspace is
ever notified about the device.  device_add() does:

	kobject_uevent(&dev->kobj, KOBJ_ADD);
	bus_probe_device(dev);

...so I think we're good, agree?

I also added a missing init of ndr_desc.numa_node in
arch/x86/kernel/pmem.c, see below.

8<-----
Subject: libnvdimm: Set numa_node to NVDIMM devices

From: Toshi Kani

ACPI NFIT table has System Physical Address Range Structure entries that
describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
set in the flags.

Change acpi_nfit_register_region() to map a proximity ID to its node ID,
and set it to a new numa_node field of nd_region_desc, which is then
conveyed to the nd_region device.

The device core arranges for btt and namespace devices to inherit their
node from their parent region.

Signed-off-by: Toshi Kani
[djbw: move set_dev_node() from region.c to bus.c]
Signed-off-by: Dan Williams
---
 arch/x86/kernel/pmem.c       | 1 +
 drivers/acpi/nfit.c          | 6 ++++++
 drivers/nvdimm/bus.c         | 6 ++++++
 drivers/nvdimm/nd.h          | 2 +-
 drivers/nvdimm/region_devs.c | 1 +
 include/linux/libnvdimm.h    | 1 +
 6 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
index 0f4ef472ab9e..64f90f53bb85 100644
--- a/arch/x86/kernel/pmem.c
+++ b/arch/x86/kernel/pmem.c
@@ -67,6 +67,7 @@ static __init int register_e820_pmem(void)
 		memset(&ndr_desc, 0, sizeof(ndr_desc));
 		ndr_desc.res = &res;
 		ndr_desc.attr_groups = e820_pmem_region_attribute_groups;
+		ndr_desc.numa_node = NUMA_NO_NODE;
 		if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
 			goto err;
 	}
diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 1f6f1b1a54f4..d96c8fe974dd 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1392,6 +1392,12 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
 	ndr_desc->res = &res;
 	ndr_desc->provider_data = nfit_spa;
 	ndr_desc->attr_groups = acpi_nfit_region_attribute_groups;
+	if (spa->flags & ACPI_NFIT_PROXIMITY_VALID)
+		ndr_desc->numa_node = acpi_map_pxm_to_online_node(
+						spa->proximity_domain);
+	else
+		ndr_desc->numa_node = NUMA_NO_NODE;
+
 	list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
 		struct acpi_nfit_memory_map *memdev = nfit_memdev->memdev;
 		struct nd_mapping *nd_mapping;
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index ec59f1f26d95..205344643852 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -48,6 +48,12 @@ static int to_nd_device_type(struct device *dev)

 static int nvdimm_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
+	/*
+	 * Ensure that region devices always have their numa node set as
+	 * early as possible.
+	 */
+	if (is_nd_pmem(dev) || is_nd_blk(dev))
+		set_dev_node(dev, to_nd_region(dev)->numa_node);
 	return add_uevent_var(env, "MODALIAS=" ND_DEVICE_MODALIAS_FMT,
 			to_nd_device_type(dev));
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index b870de9add79..72c26461835d 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -96,7 +96,7 @@ struct nd_region {
 	u16 ndr_mappings;
 	u64 ndr_size;
 	u64 ndr_start;
-	int id, num_lanes, ro;
+	int id, num_lanes, ro, numa_node;
 	void *provider_data;
 	struct nd_interleave_set *nd_set;
 	struct nd_percpu_lane __percpu *lane;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 8f8c7ea485f1..55b424f6ba0d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -736,6 +736,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	nd_region->nd_set = ndr_desc->nd_set;
 	nd_region->num_lanes = ndr_desc->num_lanes;
 	nd_region->ro = ro;
+	nd_region->numa_node = ndr_desc->numa_node;
 	ida_init(&nd_region->ns_ida);
 	dev = &nd_region->dev;
 	dev_set_name(dev, "region%d", nd_region->id);
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index dc799a29ed1a..30b3deaafd51 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -89,6 +89,7 @@ struct nd_region_desc {
 	struct nd_interleave_set *nd_set;
 	void *provider_data;
 	int num_lanes;
+	int numa_node;
 };

 struct nvdimm_bus;
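
For anyone following the ordering argument in this thread, here is a rough userspace model of it. It is only a sketch, not kernel code: struct model_dev and the model_* helpers are hypothetical stand-ins for struct device, device_add(), the bus uevent callback and a driver probe. It assumes the sequence described in the thread: device_add() first copies the parent's node (clobbering anything set at create time), then the uevent fires, then probe runs and may fail.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NO_NODE (-1)	/* plays the role of NUMA_NO_NODE */

/* Hypothetical stand-in for struct device plus the region's saved node. */
struct model_dev {
	struct model_dev *parent;
	int numa_node;		/* what dev_to_node() would report */
	int region_node;	/* node recorded when the region was created */
	int node_after_inherit;	/* node right after the parent copy */
	int node_at_uevent;	/* node visible when userspace is notified */
	int node_at_probe;	/* node visible when probe runs */
};

/* Models the bus uevent hook: copy the region's saved node to the device. */
void model_bus_uevent(struct model_dev *dev)
{
	dev->numa_node = dev->region_node;
	dev->node_at_uevent = dev->numa_node;
}

/* Models a driver probe, which is within its rights to fail. */
int model_probe(struct model_dev *dev, int fail)
{
	dev->node_at_probe = dev->numa_node;
	return fail ? -1 : 0;
}

/* Models device_add(): inherit from the parent, send uevent, then probe. */
int model_device_add(struct model_dev *dev, int probe_fails)
{
	assert(dev != NULL);

	/* "use parent numa_node": overwrites any value set at create time */
	if (dev->parent)
		dev->numa_node = dev->parent->numa_node;
	dev->node_after_inherit = dev->numa_node;

	model_bus_uevent(dev);	/* runs before probe */
	return model_probe(dev, probe_fails);
}
```

With a parent whose node is MODEL_NO_NODE and a region created with node 1, model_device_add(&region, 1) leaves the region on node 1 at uevent time and at probe time, even though the modelled probe fails. Setting the node only inside model_probe() would not give that guarantee.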