2015-06-17 23:16:09

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 00/16] libnvdimm: non-volatile memory devices

A new sub-system in support of non-volatile memory storage devices.

Stephen, please add libnvdimm-for-next to -next:

git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm libnvdimm-for-next

Changes since v6 [1]:

1/ Deferred the patches dependent on ->rw_bytes() (BTT - stacked block
driver, BLK - mmio aperture windows driver, NFIT_TEST - unit test
infrastructure for all libnvdimm + nfit components) to their own
patchset. Make the ->rw_bytes() implementation the first patch in
that series (Christoph)

2/ Collected acks from Christoph and Rafael!

3/ Add a HAS_IOMEM dependency to CONFIG_BLK_DEV_PMEM following commit
b6f2098fb708 "block: pmem: Add dependency on HAS_IOMEM" in 4.1-rc8.

4/ Move libnvdimm to subsys_initcall() and move arch/x86/kernel/pmem.c
back to device_initcall(). This allows ACPI_NFIT to be built-in.
(Linda)

5/ Drop the ACPI_DRIVER_ALL_NOTIFY_EVENTS flag in the nfit driver.
(Rafael)

6/ Reference count the nvdimm_drvdata object. This fixes a bug that was
found when the unit tests were extended to test disabling an nvdimm
while a region device still had references to label data.

Diffstat since v6:

arch/x86/kernel/pmem.c | 2 +-
drivers/acpi/nfit.c | 1 -
drivers/nvdimm/Kconfig | 1 +
drivers/nvdimm/bus.c | 2 +-
drivers/nvdimm/core.c | 2 +-
drivers/nvdimm/dimm.c | 21 ++++-----------------
drivers/nvdimm/dimm_devs.c | 34 ++++++++++++++++++++++++++++++++++
drivers/nvdimm/namespace_devs.c | 43 +++++++++++++++++++++++++------------------
drivers/nvdimm/nd-core.h | 2 +-
drivers/nvdimm/nd.h | 3 +++
drivers/nvdimm/region_devs.c | 21 ++++++++-------------
include/linux/libnvdimm.h | 8 ++++++++
12 files changed, 87 insertions(+), 53 deletions(-)

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-June/001166.html


---

Dan Williams (16):
e820, efi: add ACPI 6.0 persistent memory types
libnvdimm, nfit: initial libnvdimm infrastructure and NFIT support
libnvdimm: control character device and nvdimm_bus sysfs attributes
libnvdimm, nfit: dimm/memory-devices
libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices
libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure
libnvdimm, nfit: regions (block-data-window, persistent memory, volatile memory)
libnvdimm: support for legacy (non-aliasing) nvdimms
libnvdimm, pmem: move pmem to drivers/nvdimm/
libnvdimm, pmem: add libnvdimm support to the pmem driver
libnvdimm, nfit: add interleave-set state-tracking infrastructure
libnvdimm: namespace indices: read and validate
libnvdimm: pmem label sets and namespace instantiation.
libnvdimm: blk labels and namespace instantiation
libnvdimm: write pmem label set
libnvdimm: write blk label set


arch/arm64/kernel/efi.c | 1
arch/ia64/kernel/efi.c | 4
arch/x86/Kconfig | 3
arch/x86/boot/compressed/eboot.c | 4
arch/x86/include/uapi/asm/e820.h | 1
arch/x86/kernel/e820.c | 28 +
arch/x86/kernel/pmem.c | 92 +-
arch/x86/platform/efi/efi.c | 3
drivers/Kconfig | 2
drivers/Makefile | 1
drivers/acpi/Kconfig | 26 +
drivers/acpi/Makefile | 1
drivers/acpi/nfit.c | 1123 ++++++++++++++++++++++++++
drivers/acpi/nfit.h | 100 ++
drivers/block/Kconfig | 11
drivers/block/Makefile | 1
drivers/block/pmem.c | 262 ------
drivers/nvdimm/Kconfig | 36 +
drivers/nvdimm/Makefile | 13
drivers/nvdimm/bus.c | 668 +++++++++++++++
drivers/nvdimm/core.c | 396 +++++++++
drivers/nvdimm/dimm.c | 102 ++
drivers/nvdimm/dimm_devs.c | 542 +++++++++++++
drivers/nvdimm/label.c | 926 +++++++++++++++++++++
drivers/nvdimm/label.h | 141 +++
drivers/nvdimm/namespace_devs.c | 1645 ++++++++++++++++++++++++++++++++++++++
drivers/nvdimm/nd-core.h | 80 ++
drivers/nvdimm/nd.h | 135 +++
drivers/nvdimm/pmem.c | 278 ++++++
drivers/nvdimm/region.c | 96 ++
drivers/nvdimm/region_devs.c | 585 ++++++++++++++
include/linux/efi.h | 3
include/linux/libnvdimm.h | 123 +++
include/linux/nd.h | 98 ++
include/uapi/linux/Kbuild | 1
include/uapi/linux/ndctl.h | 197 +++++
36 files changed, 7417 insertions(+), 311 deletions(-)
create mode 100644 drivers/acpi/nfit.c
create mode 100644 drivers/acpi/nfit.h
delete mode 100644 drivers/block/pmem.c
create mode 100644 drivers/nvdimm/Kconfig
create mode 100644 drivers/nvdimm/Makefile
create mode 100644 drivers/nvdimm/bus.c
create mode 100644 drivers/nvdimm/core.c
create mode 100644 drivers/nvdimm/dimm.c
create mode 100644 drivers/nvdimm/dimm_devs.c
create mode 100644 drivers/nvdimm/label.c
create mode 100644 drivers/nvdimm/label.h
create mode 100644 drivers/nvdimm/namespace_devs.c
create mode 100644 drivers/nvdimm/nd-core.h
create mode 100644 drivers/nvdimm/nd.h
create mode 100644 drivers/nvdimm/pmem.c
create mode 100644 drivers/nvdimm/region.c
create mode 100644 drivers/nvdimm/region_devs.c
create mode 100644 include/linux/libnvdimm.h
create mode 100644 include/linux/nd.h
create mode 100644 include/uapi/linux/ndctl.h


2015-06-17 23:20:42

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 01/16] e820, efi: add ACPI 6.0 persistent memory types

ACPI 6.0 formalizes e820-type-7 and efi-type-14 as persistent memory.
Mark it "reserved" and allow it to be claimed by a persistent memory
device driver.

This definition is in addition to the Linux kernel's existing type-12
definition that was recently added in support of shipping platforms with
NVDIMM support that predate ACPI 6.0 (which now classifies type-12 as
OEM reserved).

Note, /proc/iomem can be consulted for differentiating legacy
"Persistent Memory (legacy)" E820_PRAM vs standard "Persistent Memory"
E820_PMEM.

Cc: Boaz Harrosh <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Acked-by: Jeff Moyer <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Reviewed-by: Ross Zwisler <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
arch/arm64/kernel/efi.c | 1 +
arch/ia64/kernel/efi.c | 4 ++++
arch/x86/boot/compressed/eboot.c | 4 ++++
arch/x86/include/uapi/asm/e820.h | 1 +
arch/x86/kernel/e820.c | 28 ++++++++++++++++++++++++----
arch/x86/platform/efi/efi.c | 3 +++
include/linux/efi.h | 3 ++-
7 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index ab21e0d58278..9d4aa18f2a82 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -158,6 +158,7 @@ static __init int is_reserve_region(efi_memory_desc_t *md)
case EFI_BOOT_SERVICES_CODE:
case EFI_BOOT_SERVICES_DATA:
case EFI_CONVENTIONAL_MEMORY:
+ case EFI_PERSISTENT_MEMORY:
return 0;
default:
break;
diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index c52d7540dc05..5f6be9dd6968 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -1223,6 +1223,10 @@ efi_initialize_iomem_resources(struct resource *code_resource,
flags |= IORESOURCE_DISABLED;
break;

+ case EFI_PERSISTENT_MEMORY:
+ name = "Persistent Memory";
+ break;
+
case EFI_RESERVED_TYPE:
case EFI_RUNTIME_SERVICES_CODE:
case EFI_RUNTIME_SERVICES_DATA:
diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index 48304b89b601..2c82bd150d43 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -1224,6 +1224,10 @@ static efi_status_t setup_e820(struct boot_params *params,
e820_type = E820_NVS;
break;

+ case EFI_PERSISTENT_MEMORY:
+ e820_type = E820_PMEM;
+ break;
+
default:
continue;
}
diff --git a/arch/x86/include/uapi/asm/e820.h b/arch/x86/include/uapi/asm/e820.h
index 960a8a9dc4ab..0f457e6eab18 100644
--- a/arch/x86/include/uapi/asm/e820.h
+++ b/arch/x86/include/uapi/asm/e820.h
@@ -32,6 +32,7 @@
#define E820_ACPI 3
#define E820_NVS 4
#define E820_UNUSABLE 5
+#define E820_PMEM 7

/*
* This is a non-standardized way to represent ADR or NVDIMM regions that
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index e2ce85db2283..c857d53269dd 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -149,6 +149,7 @@ static void __init e820_print_type(u32 type)
case E820_UNUSABLE:
printk(KERN_CONT "unusable");
break;
+ case E820_PMEM:
case E820_PRAM:
printk(KERN_CONT "persistent (type %u)", type);
break;
@@ -918,11 +919,32 @@ static inline const char *e820_type_to_string(int e820_type)
case E820_ACPI: return "ACPI Tables";
case E820_NVS: return "ACPI Non-volatile Storage";
case E820_UNUSABLE: return "Unusable memory";
- case E820_PRAM: return "Persistent RAM";
+ case E820_PRAM: return "Persistent Memory (legacy)";
+ case E820_PMEM: return "Persistent Memory";
default: return "reserved";
}
}

+static bool do_mark_busy(u32 type, struct resource *res)
+{
+ /* this is the legacy bios/dos rom-shadow + mmio region */
+ if (res->start < (1ULL<<20))
+ return true;
+
+ /*
+ * Treat persistent memory like device memory, i.e. reserve it
+ * for exclusive use of a driver
+ */
+ switch (type) {
+ case E820_RESERVED:
+ case E820_PRAM:
+ case E820_PMEM:
+ return false;
+ default:
+ return true;
+ }
+}
+
/*
* Mark e820 reserved areas as busy for the resource manager.
*/
@@ -952,9 +974,7 @@ void __init e820_reserve_resources(void)
* pci device BAR resource and insert them later in
* pcibios_resource_survey()
*/
- if (((e820.map[i].type != E820_RESERVED) &&
- (e820.map[i].type != E820_PRAM)) ||
- res->start < (1ULL<<20)) {
+ if (do_mark_busy(e820.map[i].type, res)) {
res->flags |= IORESOURCE_BUSY;
insert_resource(&iomem_resource, res);
}
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 02744df576d5..fe01ae37a2a4 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -153,6 +153,9 @@ static void __init do_add_efi_memmap(void)
case EFI_UNUSABLE_MEMORY:
e820_type = E820_UNUSABLE;
break;
+ case EFI_PERSISTENT_MEMORY:
+ e820_type = E820_PMEM;
+ break;
default:
/*
* EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
diff --git a/include/linux/efi.h b/include/linux/efi.h
index af5be0368dec..825b6e3d69cb 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -85,7 +85,8 @@ typedef struct {
#define EFI_MEMORY_MAPPED_IO 11
#define EFI_MEMORY_MAPPED_IO_PORT_SPACE 12
#define EFI_PAL_CODE 13
-#define EFI_MAX_MEMORY_TYPE 14
+#define EFI_PERSISTENT_MEMORY 14
+#define EFI_MAX_MEMORY_TYPE 15

/* Attribute values: */
#define EFI_MEMORY_UC ((u64)0x0000000000000001ULL) /* uncached */

2015-06-17 23:16:19

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 02/16] libnvdimm, nfit: initial libnvdimm infrastructure and NFIT support

A struct nvdimm_bus is the anchor device for registering nvdimm
resources and interfaces, for example, a character control device,
nvdimm devices, and I/O region devices. The ACPI NFIT (NVDIMM Firmware
Interface Table) is one possible platform description for such
non-volatile memory resources in a system. The nfit.ko driver attaches
to the "ACPI0012" device that indicates the presence of the NFIT and
parses the table to register a struct nvdimm_bus instance.

Cc: <[email protected]>
Cc: Lv Zheng <[email protected]>
Cc: Robert Moore <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Acked-by: Jeff Moyer <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/Kconfig | 2
drivers/Makefile | 1
drivers/acpi/Kconfig | 14 +
drivers/acpi/Makefile | 1
drivers/acpi/nfit.c | 481 +++++++++++++++++++++++++++++++++++++++++++++
drivers/acpi/nfit.h | 90 ++++++++
drivers/nvdimm/Kconfig | 15 +
drivers/nvdimm/Makefile | 3
drivers/nvdimm/core.c | 69 ++++++
drivers/nvdimm/nd-core.h | 23 ++
include/linux/libnvdimm.h | 34 +++
11 files changed, 733 insertions(+)
create mode 100644 drivers/acpi/nfit.c
create mode 100644 drivers/acpi/nfit.h
create mode 100644 drivers/nvdimm/Kconfig
create mode 100644 drivers/nvdimm/Makefile
create mode 100644 drivers/nvdimm/core.c
create mode 100644 drivers/nvdimm/nd-core.h
create mode 100644 include/linux/libnvdimm.h

diff --git a/drivers/Kconfig b/drivers/Kconfig
index c0cc96bab9e7..6e973b8e3a3b 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -182,4 +182,6 @@ source "drivers/thunderbolt/Kconfig"

source "drivers/android/Kconfig"

+source "drivers/nvdimm/Kconfig"
+
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 46d2554be404..692adf659028 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -64,6 +64,7 @@ obj-$(CONFIG_FB_INTEL) += video/fbdev/intelfb/

obj-$(CONFIG_PARPORT) += parport/
obj-y += base/ block/ misc/ mfd/ nfc/
+obj-$(CONFIG_LIBNVDIMM) += nvdimm/
obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/
obj-$(CONFIG_NUBUS) += nubus/
obj-y += macintosh/
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index ab2cbb51c6aa..300b4ef3712b 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -383,6 +383,20 @@ config ACPI_REDUCED_HARDWARE_ONLY

If you are unsure what to do, do not enable this option.

+config ACPI_NFIT
+ tristate "ACPI NVDIMM Firmware Interface Table (NFIT)"
+ depends on PHYS_ADDR_T_64BIT
+ depends on BLK_DEV
+ select LIBNVDIMM
+ help
+ Infrastructure to probe ACPI 6 compliant platforms for
+ NVDIMMs (NFIT) and register a libnvdimm device tree. In
+ addition to storage devices this also enables libnvdimm to pass
+ ACPI._DSM messages for platform/dimm configuration.
+
+ To compile this driver as a module, choose M here:
+ the module will be called nfit.
+
source "drivers/acpi/apei/Kconfig"

config ACPI_EXTLOG
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 8a063e276530..f7e9c92ccdcb 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -71,6 +71,7 @@ obj-$(CONFIG_ACPI_PCI_SLOT) += pci_slot.o
obj-$(CONFIG_ACPI_PROCESSOR) += processor.o
obj-y += container.o
obj-$(CONFIG_ACPI_THERMAL) += thermal.o
+obj-$(CONFIG_ACPI_NFIT) += nfit.o
obj-y += acpi_memhotplug.o
obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
obj-$(CONFIG_ACPI_BATTERY) += battery.o
diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
new file mode 100644
index 000000000000..9f9a20bae0ac
--- /dev/null
+++ b/drivers/acpi/nfit.c
@@ -0,0 +1,481 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/list_sort.h>
+#include <linux/libnvdimm.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/acpi.h>
+#include "nfit.h"
+
+static u8 nfit_uuid[NFIT_UUID_MAX][16];
+
+static const u8 *to_nfit_uuid(enum nfit_uuids id)
+{
+ return nfit_uuid[id];
+}
+
+static int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc,
+ struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+ unsigned int buf_len)
+{
+ return -ENOTTY;
+}
+
+static const char *spa_type_name(u16 type)
+{
+ static const char *to_name[] = {
+ [NFIT_SPA_VOLATILE] = "volatile",
+ [NFIT_SPA_PM] = "pmem",
+ [NFIT_SPA_DCR] = "dimm-control-region",
+ [NFIT_SPA_BDW] = "block-data-window",
+ [NFIT_SPA_VDISK] = "volatile-disk",
+ [NFIT_SPA_VCD] = "volatile-cd",
+ [NFIT_SPA_PDISK] = "persistent-disk",
+ [NFIT_SPA_PCD] = "persistent-cd",
+
+ };
+
+ if (type > NFIT_SPA_PCD)
+ return "unknown";
+
+ return to_name[type];
+}
+
+static int nfit_spa_type(struct acpi_nfit_system_address *spa)
+{
+ int i;
+
+ for (i = 0; i < NFIT_UUID_MAX; i++)
+ if (memcmp(to_nfit_uuid(i), spa->range_guid, 16) == 0)
+ return i;
+ return -1;
+}
+
+static bool add_spa(struct acpi_nfit_desc *acpi_desc,
+ struct acpi_nfit_system_address *spa)
+{
+ struct device *dev = acpi_desc->dev;
+ struct nfit_spa *nfit_spa = devm_kzalloc(dev, sizeof(*nfit_spa),
+ GFP_KERNEL);
+
+ if (!nfit_spa)
+ return false;
+ INIT_LIST_HEAD(&nfit_spa->list);
+ nfit_spa->spa = spa;
+ list_add_tail(&nfit_spa->list, &acpi_desc->spas);
+ dev_dbg(dev, "%s: spa index: %d type: %s\n", __func__,
+ spa->range_index,
+ spa_type_name(nfit_spa_type(spa)));
+ return true;
+}
+
+static bool add_memdev(struct acpi_nfit_desc *acpi_desc,
+ struct acpi_nfit_memory_map *memdev)
+{
+ struct device *dev = acpi_desc->dev;
+ struct nfit_memdev *nfit_memdev = devm_kzalloc(dev,
+ sizeof(*nfit_memdev), GFP_KERNEL);
+
+ if (!nfit_memdev)
+ return false;
+ INIT_LIST_HEAD(&nfit_memdev->list);
+ nfit_memdev->memdev = memdev;
+ list_add_tail(&nfit_memdev->list, &acpi_desc->memdevs);
+ dev_dbg(dev, "%s: memdev handle: %#x spa: %d dcr: %d\n",
+ __func__, memdev->device_handle, memdev->range_index,
+ memdev->region_index);
+ return true;
+}
+
+static bool add_dcr(struct acpi_nfit_desc *acpi_desc,
+ struct acpi_nfit_control_region *dcr)
+{
+ struct device *dev = acpi_desc->dev;
+ struct nfit_dcr *nfit_dcr = devm_kzalloc(dev, sizeof(*nfit_dcr),
+ GFP_KERNEL);
+
+ if (!nfit_dcr)
+ return false;
+ INIT_LIST_HEAD(&nfit_dcr->list);
+ nfit_dcr->dcr = dcr;
+ list_add_tail(&nfit_dcr->list, &acpi_desc->dcrs);
+ dev_dbg(dev, "%s: dcr index: %d windows: %d\n", __func__,
+ dcr->region_index, dcr->windows);
+ return true;
+}
+
+static bool add_bdw(struct acpi_nfit_desc *acpi_desc,
+ struct acpi_nfit_data_region *bdw)
+{
+ struct device *dev = acpi_desc->dev;
+ struct nfit_bdw *nfit_bdw = devm_kzalloc(dev, sizeof(*nfit_bdw),
+ GFP_KERNEL);
+
+ if (!nfit_bdw)
+ return false;
+ INIT_LIST_HEAD(&nfit_bdw->list);
+ nfit_bdw->bdw = bdw;
+ list_add_tail(&nfit_bdw->list, &acpi_desc->bdws);
+ dev_dbg(dev, "%s: bdw dcr: %d windows: %d\n", __func__,
+ bdw->region_index, bdw->windows);
+ return true;
+}
+
+static void *add_table(struct acpi_nfit_desc *acpi_desc, void *table,
+ const void *end)
+{
+ struct device *dev = acpi_desc->dev;
+ struct acpi_nfit_header *hdr;
+ void *err = ERR_PTR(-ENOMEM);
+
+ if (table >= end)
+ return NULL;
+
+ hdr = table;
+ switch (hdr->type) {
+ case ACPI_NFIT_TYPE_SYSTEM_ADDRESS:
+ if (!add_spa(acpi_desc, table))
+ return err;
+ break;
+ case ACPI_NFIT_TYPE_MEMORY_MAP:
+ if (!add_memdev(acpi_desc, table))
+ return err;
+ break;
+ case ACPI_NFIT_TYPE_CONTROL_REGION:
+ if (!add_dcr(acpi_desc, table))
+ return err;
+ break;
+ case ACPI_NFIT_TYPE_DATA_REGION:
+ if (!add_bdw(acpi_desc, table))
+ return err;
+ break;
+ /* TODO */
+ case ACPI_NFIT_TYPE_INTERLEAVE:
+ dev_dbg(dev, "%s: idt\n", __func__);
+ break;
+ case ACPI_NFIT_TYPE_FLUSH_ADDRESS:
+ dev_dbg(dev, "%s: flush\n", __func__);
+ break;
+ case ACPI_NFIT_TYPE_SMBIOS:
+ dev_dbg(dev, "%s: smbios\n", __func__);
+ break;
+ default:
+ dev_err(dev, "unknown table '%d' parsing nfit\n", hdr->type);
+ break;
+ }
+
+ return table + hdr->length;
+}
+
+static void nfit_mem_find_spa_bdw(struct acpi_nfit_desc *acpi_desc,
+ struct nfit_mem *nfit_mem)
+{
+ u32 device_handle = __to_nfit_memdev(nfit_mem)->device_handle;
+ u16 dcr = nfit_mem->dcr->region_index;
+ struct nfit_spa *nfit_spa;
+
+ list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
+ u16 range_index = nfit_spa->spa->range_index;
+ int type = nfit_spa_type(nfit_spa->spa);
+ struct nfit_memdev *nfit_memdev;
+
+ if (type != NFIT_SPA_BDW)
+ continue;
+
+ list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
+ if (nfit_memdev->memdev->range_index != range_index)
+ continue;
+ if (nfit_memdev->memdev->device_handle != device_handle)
+ continue;
+ if (nfit_memdev->memdev->region_index != dcr)
+ continue;
+
+ nfit_mem->spa_bdw = nfit_spa->spa;
+ return;
+ }
+ }
+
+ dev_dbg(acpi_desc->dev, "SPA-BDW not found for SPA-DCR %d\n",
+ nfit_mem->spa_dcr->range_index);
+ nfit_mem->bdw = NULL;
+}
+
+static int nfit_mem_add(struct acpi_nfit_desc *acpi_desc,
+ struct nfit_mem *nfit_mem, struct acpi_nfit_system_address *spa)
+{
+ u16 dcr = __to_nfit_memdev(nfit_mem)->region_index;
+ struct nfit_dcr *nfit_dcr;
+ struct nfit_bdw *nfit_bdw;
+
+ list_for_each_entry(nfit_dcr, &acpi_desc->dcrs, list) {
+ if (nfit_dcr->dcr->region_index != dcr)
+ continue;
+ nfit_mem->dcr = nfit_dcr->dcr;
+ break;
+ }
+
+ if (!nfit_mem->dcr) {
+ dev_dbg(acpi_desc->dev, "SPA %d missing:%s%s\n",
+ spa->range_index, __to_nfit_memdev(nfit_mem)
+ ? "" : " MEMDEV", nfit_mem->dcr ? "" : " DCR");
+ return -ENODEV;
+ }
+
+ /*
+ * We've found enough to create an nvdimm, optionally
+ * find an associated BDW
+ */
+ list_add(&nfit_mem->list, &acpi_desc->dimms);
+
+ list_for_each_entry(nfit_bdw, &acpi_desc->bdws, list) {
+ if (nfit_bdw->bdw->region_index != dcr)
+ continue;
+ nfit_mem->bdw = nfit_bdw->bdw;
+ break;
+ }
+
+ if (!nfit_mem->bdw)
+ return 0;
+
+ nfit_mem_find_spa_bdw(acpi_desc, nfit_mem);
+ return 0;
+}
+
+static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
+ struct acpi_nfit_system_address *spa)
+{
+ struct nfit_mem *nfit_mem, *found;
+ struct nfit_memdev *nfit_memdev;
+ int type = nfit_spa_type(spa);
+ u16 dcr;
+
+ switch (type) {
+ case NFIT_SPA_DCR:
+ case NFIT_SPA_PM:
+ break;
+ default:
+ return 0;
+ }
+
+ list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
+ int rc;
+
+ if (nfit_memdev->memdev->range_index != spa->range_index)
+ continue;
+ found = NULL;
+ dcr = nfit_memdev->memdev->region_index;
+ list_for_each_entry(nfit_mem, &acpi_desc->dimms, list)
+ if (__to_nfit_memdev(nfit_mem)->region_index == dcr) {
+ found = nfit_mem;
+ break;
+ }
+
+ if (found)
+ nfit_mem = found;
+ else {
+ nfit_mem = devm_kzalloc(acpi_desc->dev,
+ sizeof(*nfit_mem), GFP_KERNEL);
+ if (!nfit_mem)
+ return -ENOMEM;
+ INIT_LIST_HEAD(&nfit_mem->list);
+ }
+
+ if (type == NFIT_SPA_DCR) {
+ /* multiple dimms may share a SPA when interleaved */
+ nfit_mem->spa_dcr = spa;
+ nfit_mem->memdev_dcr = nfit_memdev->memdev;
+ } else {
+ /*
+ * A single dimm may belong to multiple SPA-PM
+ * ranges, record at least one in addition to
+ * any SPA-DCR range.
+ */
+ nfit_mem->memdev_pmem = nfit_memdev->memdev;
+ }
+
+ if (found)
+ continue;
+
+ rc = nfit_mem_add(acpi_desc, nfit_mem, spa);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
+static int nfit_mem_cmp(void *priv, struct list_head *_a, struct list_head *_b)
+{
+ struct nfit_mem *a = container_of(_a, typeof(*a), list);
+ struct nfit_mem *b = container_of(_b, typeof(*b), list);
+ u32 handleA, handleB;
+
+ handleA = __to_nfit_memdev(a)->device_handle;
+ handleB = __to_nfit_memdev(b)->device_handle;
+ if (handleA < handleB)
+ return -1;
+ else if (handleA > handleB)
+ return 1;
+ return 0;
+}
+
+static int nfit_mem_init(struct acpi_nfit_desc *acpi_desc)
+{
+ struct nfit_spa *nfit_spa;
+
+ /*
+ * For each SPA-DCR or SPA-PMEM address range find its
+ * corresponding MEMDEV(s). From each MEMDEV find the
+ * corresponding DCR. Then, if we're operating on a SPA-DCR,
+ * try to find a SPA-BDW and a corresponding BDW that references
+ * the DCR. Throw it all into an nfit_mem object. Note, that
+ * BDWs are optional.
+ */
+ list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
+ int rc;
+
+ rc = nfit_mem_dcr_init(acpi_desc, nfit_spa->spa);
+ if (rc)
+ return rc;
+ }
+
+ list_sort(NULL, &acpi_desc->dimms, nfit_mem_cmp);
+
+ return 0;
+}
+
+static int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
+{
+ struct device *dev = acpi_desc->dev;
+ const void *end;
+ u8 *data;
+
+ INIT_LIST_HEAD(&acpi_desc->spas);
+ INIT_LIST_HEAD(&acpi_desc->dcrs);
+ INIT_LIST_HEAD(&acpi_desc->bdws);
+ INIT_LIST_HEAD(&acpi_desc->memdevs);
+ INIT_LIST_HEAD(&acpi_desc->dimms);
+
+ data = (u8 *) acpi_desc->nfit;
+ end = data + sz;
+ data += sizeof(struct acpi_table_nfit);
+ while (!IS_ERR_OR_NULL(data))
+ data = add_table(acpi_desc, data, end);
+
+ if (IS_ERR(data)) {
+ dev_dbg(dev, "%s: nfit table parsing error: %ld\n", __func__,
+ PTR_ERR(data));
+ return PTR_ERR(data);
+ }
+
+ if (nfit_mem_init(acpi_desc) != 0)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static int acpi_nfit_add(struct acpi_device *adev)
+{
+ struct nvdimm_bus_descriptor *nd_desc;
+ struct acpi_nfit_desc *acpi_desc;
+ struct device *dev = &adev->dev;
+ struct acpi_table_header *tbl;
+ acpi_status status = AE_OK;
+ acpi_size sz;
+ int rc;
+
+ status = acpi_get_table_with_size("NFIT", 0, &tbl, &sz);
+ if (ACPI_FAILURE(status)) {
+ dev_err(dev, "failed to find NFIT\n");
+ return -ENXIO;
+ }
+
+ acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
+ if (!acpi_desc)
+ return -ENOMEM;
+
+ dev_set_drvdata(dev, acpi_desc);
+ acpi_desc->dev = dev;
+ acpi_desc->nfit = (struct acpi_table_nfit *) tbl;
+ nd_desc = &acpi_desc->nd_desc;
+ nd_desc->provider_name = "ACPI.NFIT";
+ nd_desc->ndctl = acpi_nfit_ctl;
+
+ acpi_desc->nvdimm_bus = nvdimm_bus_register(dev, nd_desc);
+ if (!acpi_desc->nvdimm_bus)
+ return -ENXIO;
+
+ rc = acpi_nfit_init(acpi_desc, sz);
+ if (rc) {
+ nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
+ return rc;
+ }
+ return 0;
+}
+
+static int acpi_nfit_remove(struct acpi_device *adev)
+{
+ struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(&adev->dev);
+
+ nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
+ return 0;
+}
+
+static const struct acpi_device_id acpi_nfit_ids[] = {
+ { "ACPI0012", 0 },
+ { "", 0 },
+};
+MODULE_DEVICE_TABLE(acpi, acpi_nfit_ids);
+
+static struct acpi_driver acpi_nfit_driver = {
+ .name = KBUILD_MODNAME,
+ .ids = acpi_nfit_ids,
+ .ops = {
+ .add = acpi_nfit_add,
+ .remove = acpi_nfit_remove,
+ },
+};
+
+static __init int nfit_init(void)
+{
+ BUILD_BUG_ON(sizeof(struct acpi_table_nfit) != 40);
+ BUILD_BUG_ON(sizeof(struct acpi_nfit_system_address) != 56);
+ BUILD_BUG_ON(sizeof(struct acpi_nfit_memory_map) != 48);
+ BUILD_BUG_ON(sizeof(struct acpi_nfit_interleave) != 20);
+ BUILD_BUG_ON(sizeof(struct acpi_nfit_smbios) != 9);
+ BUILD_BUG_ON(sizeof(struct acpi_nfit_control_region) != 80);
+ BUILD_BUG_ON(sizeof(struct acpi_nfit_data_region) != 40);
+
+ acpi_str_to_uuid(UUID_VOLATILE_MEMORY, nfit_uuid[NFIT_SPA_VOLATILE]);
+ acpi_str_to_uuid(UUID_PERSISTENT_MEMORY, nfit_uuid[NFIT_SPA_PM]);
+ acpi_str_to_uuid(UUID_CONTROL_REGION, nfit_uuid[NFIT_SPA_DCR]);
+ acpi_str_to_uuid(UUID_DATA_REGION, nfit_uuid[NFIT_SPA_BDW]);
+ acpi_str_to_uuid(UUID_VOLATILE_VIRTUAL_DISK, nfit_uuid[NFIT_SPA_VDISK]);
+ acpi_str_to_uuid(UUID_VOLATILE_VIRTUAL_CD, nfit_uuid[NFIT_SPA_VCD]);
+ acpi_str_to_uuid(UUID_PERSISTENT_VIRTUAL_DISK, nfit_uuid[NFIT_SPA_PDISK]);
+ acpi_str_to_uuid(UUID_PERSISTENT_VIRTUAL_CD, nfit_uuid[NFIT_SPA_PCD]);
+ acpi_str_to_uuid(UUID_NFIT_BUS, nfit_uuid[NFIT_DEV_BUS]);
+ acpi_str_to_uuid(UUID_NFIT_DIMM, nfit_uuid[NFIT_DEV_DIMM]);
+
+ return acpi_bus_register_driver(&acpi_nfit_driver);
+}
+
+static __exit void nfit_exit(void)
+{
+ acpi_bus_unregister_driver(&acpi_nfit_driver);
+}
+
+module_init(nfit_init);
+module_exit(nfit_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Intel Corporation");
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
new file mode 100644
index 000000000000..c6baba64703f
--- /dev/null
+++ b/drivers/acpi/nfit.h
@@ -0,0 +1,90 @@
+/*
+ * NVDIMM Firmware Interface Table - NFIT
+ *
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#ifndef __NFIT_H__
+#define __NFIT_H__
+#include <linux/libnvdimm.h>
+#include <linux/types.h>
+#include <linux/uuid.h>
+#include <linux/acpi.h>
+#include <acpi/acuuid.h>
+
+#define UUID_NFIT_BUS "2f10e7a4-9e91-11e4-89d3-123b93f75cba"
+#define UUID_NFIT_DIMM "4309ac30-0d11-11e4-9191-0800200c9a66"
+
+enum nfit_uuids {
+ NFIT_SPA_VOLATILE,
+ NFIT_SPA_PM,
+ NFIT_SPA_DCR,
+ NFIT_SPA_BDW,
+ NFIT_SPA_VDISK,
+ NFIT_SPA_VCD,
+ NFIT_SPA_PDISK,
+ NFIT_SPA_PCD,
+ NFIT_DEV_BUS,
+ NFIT_DEV_DIMM,
+ NFIT_UUID_MAX,
+};
+
+struct nfit_spa {
+ struct acpi_nfit_system_address *spa;
+ struct list_head list;
+};
+
+struct nfit_dcr {
+ struct acpi_nfit_control_region *dcr;
+ struct list_head list;
+};
+
+struct nfit_bdw {
+ struct acpi_nfit_data_region *bdw;
+ struct list_head list;
+};
+
+struct nfit_memdev {
+ struct acpi_nfit_memory_map *memdev;
+ struct list_head list;
+};
+
+/* assembled tables for a given dimm/memory-device */
+struct nfit_mem {
+ struct acpi_nfit_memory_map *memdev_dcr;
+ struct acpi_nfit_memory_map *memdev_pmem;
+ struct acpi_nfit_control_region *dcr;
+ struct acpi_nfit_data_region *bdw;
+ struct acpi_nfit_system_address *spa_dcr;
+ struct acpi_nfit_system_address *spa_bdw;
+ struct list_head list;
+};
+
+struct acpi_nfit_desc {
+ struct nvdimm_bus_descriptor nd_desc;
+ struct acpi_table_nfit *nfit;
+ struct list_head memdevs;
+ struct list_head dimms;
+ struct list_head spas;
+ struct list_head dcrs;
+ struct list_head bdws;
+ struct nvdimm_bus *nvdimm_bus;
+ struct device *dev;
+};
+
+static inline struct acpi_nfit_memory_map *__to_nfit_memdev(
+ struct nfit_mem *nfit_mem)
+{
+ if (nfit_mem->memdev_dcr)
+ return nfit_mem->memdev_dcr;
+ return nfit_mem->memdev_pmem;
+}
+#endif /* __NFIT_H__ */
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
new file mode 100644
index 000000000000..92933551f846
--- /dev/null
+++ b/drivers/nvdimm/Kconfig
@@ -0,0 +1,15 @@
+config LIBNVDIMM
+ tristate "NVDIMM (Non-Volatile Memory Device) Support"
+ depends on PHYS_ADDR_T_64BIT
+ depends on BLK_DEV
+ help
+ Generic support for non-volatile memory devices including
+ ACPI-6-NFIT defined resources. On platforms that define an
+ NFIT, or otherwise can discover NVDIMM resources, a libnvdimm
+ bus is registered to advertise PMEM (persistent memory)
+ namespaces (/dev/pmemX) and BLK (sliding mmio window(s))
+ namespaces (/dev/ndX). A PMEM namespace refers to a memory
+ resource that may span multiple DIMMs and support DAX (see
+ CONFIG_DAX). A BLK namespace refers to an NVDIMM control
+ region which exposes an mmio register set for windowed
+ access mode to non-volatile memory.
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
new file mode 100644
index 000000000000..10bc7af47992
--- /dev/null
+++ b/drivers/nvdimm/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
+
+libnvdimm-y := core.o
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
new file mode 100644
index 000000000000..c578a49867ac
--- /dev/null
+++ b/drivers/nvdimm/core.c
@@ -0,0 +1,69 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/libnvdimm.h>
+#include <linux/export.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include "nd-core.h"
+
+static DEFINE_IDA(nd_ida);
+
+static void nvdimm_bus_release(struct device *dev)
+{
+ struct nvdimm_bus *nvdimm_bus;
+
+ nvdimm_bus = container_of(dev, struct nvdimm_bus, dev);
+ ida_simple_remove(&nd_ida, nvdimm_bus->id);
+ kfree(nvdimm_bus);
+}
+
+struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
+ struct nvdimm_bus_descriptor *nd_desc)
+{
+ struct nvdimm_bus *nvdimm_bus;
+ int rc;
+
+ nvdimm_bus = kzalloc(sizeof(*nvdimm_bus), GFP_KERNEL);
+ if (!nvdimm_bus)
+ return NULL;
+ nvdimm_bus->id = ida_simple_get(&nd_ida, 0, 0, GFP_KERNEL);
+ if (nvdimm_bus->id < 0) {
+ kfree(nvdimm_bus);
+ return NULL;
+ }
+ nvdimm_bus->nd_desc = nd_desc;
+ nvdimm_bus->dev.parent = parent;
+ nvdimm_bus->dev.release = nvdimm_bus_release;
+ dev_set_name(&nvdimm_bus->dev, "ndbus%d", nvdimm_bus->id);
+ rc = device_register(&nvdimm_bus->dev);
+ if (rc) {
+ dev_dbg(&nvdimm_bus->dev, "registration failed: %d\n", rc);
+ put_device(&nvdimm_bus->dev);
+ return NULL;
+ }
+
+ return nvdimm_bus;
+}
+EXPORT_SYMBOL_GPL(nvdimm_bus_register);
+
+void nvdimm_bus_unregister(struct nvdimm_bus *nvdimm_bus)
+{
+ if (!nvdimm_bus)
+ return;
+ device_unregister(&nvdimm_bus->dev);
+}
+EXPORT_SYMBOL_GPL(nvdimm_bus_unregister);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Intel Corporation");
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
new file mode 100644
index 000000000000..291b2fdcd96b
--- /dev/null
+++ b/drivers/nvdimm/nd-core.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#ifndef __ND_CORE_H__
+#define __ND_CORE_H__
+#include <linux/libnvdimm.h>
+#include <linux/device.h>
+
+struct nvdimm_bus {
+ struct nvdimm_bus_descriptor *nd_desc;
+ struct device dev;
+ int id;
+};
+#endif /* __ND_CORE_H__ */
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
new file mode 100644
index 000000000000..2b3c63950c91
--- /dev/null
+++ b/include/linux/libnvdimm.h
@@ -0,0 +1,34 @@
+/*
+ * libnvdimm - Non-volatile-memory Devices Subsystem
+ *
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#ifndef __LIBNVDIMM_H__
+#define __LIBNVDIMM_H__
+struct nvdimm;
+struct nvdimm_bus_descriptor;
+typedef int (*ndctl_fn)(struct nvdimm_bus_descriptor *nd_desc,
+ struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+ unsigned int buf_len);
+
+struct nvdimm_bus_descriptor {
+ unsigned long dsm_mask;
+ char *provider_name;
+ ndctl_fn ndctl;
+};
+
+struct device;
+struct nvdimm_bus;
+struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
+ struct nvdimm_bus_descriptor *nfit_desc);
+void nvdimm_bus_unregister(struct nvdimm_bus *nvdimm_bus);
+#endif /* __LIBNVDIMM_H__ */

2015-06-17 23:16:27

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 03/16] libnvdimm: control character device and nvdimm_bus sysfs attributes

The control device for a nvdimm_bus is registered as an "nd" class
device. The expectation is that there will usually only be one "nd" bus
registered under /sys/class/nd. However, we allow for the possibility
of multiple buses and they will listed in discovery order as
ndctl0...ndctlN. This character device hosts the ioctl for passing
control messages. The initial command set has a 1:1 correlation with
the commands listed in the by the "NFIT DSM Example" document [1], but
this scheme is extensible to future command sets.

Note, nd_ioctl() and the backing ->ndctl() implementation are defined in
a subsequent patch. This is simply the initial registrations and sysfs
attributes.

[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

Cc: Neil Brown <[email protected]>
Cc: Greg KH <[email protected]>
Cc: <[email protected]>
Cc: Robert Moore <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit.c | 28 ++++++++++++++
drivers/acpi/nfit.h | 6 +++
drivers/nvdimm/Makefile | 1 +
drivers/nvdimm/bus.c | 83 ++++++++++++++++++++++++++++++++++++++++++
drivers/nvdimm/core.c | 88 ++++++++++++++++++++++++++++++++++++++++++++-
drivers/nvdimm/nd-core.h | 6 +++
include/linux/libnvdimm.h | 6 +++
7 files changed, 215 insertions(+), 3 deletions(-)
create mode 100644 drivers/nvdimm/bus.c

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 9f9a20bae0ac..162391b32022 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -354,6 +354,33 @@ static int nfit_mem_init(struct acpi_nfit_desc *acpi_desc)
return 0;
}

+static ssize_t revision_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+ struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
+ struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+
+ return sprintf(buf, "%d\n", acpi_desc->nfit->header.revision);
+}
+static DEVICE_ATTR_RO(revision);
+
+static struct attribute *acpi_nfit_attributes[] = {
+ &dev_attr_revision.attr,
+ NULL,
+};
+
+static struct attribute_group acpi_nfit_attribute_group = {
+ .name = "nfit",
+ .attrs = acpi_nfit_attributes,
+};
+
+static const struct attribute_group *acpi_nfit_attribute_groups[] = {
+ &nvdimm_bus_attribute_group,
+ &acpi_nfit_attribute_group,
+ NULL,
+};
+
static int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
{
struct device *dev = acpi_desc->dev;
@@ -410,6 +437,7 @@ static int acpi_nfit_add(struct acpi_device *adev)
nd_desc = &acpi_desc->nd_desc;
nd_desc->provider_name = "ACPI.NFIT";
nd_desc->ndctl = acpi_nfit_ctl;
+ nd_desc->attr_groups = acpi_nfit_attribute_groups;

acpi_desc->nvdimm_bus = nvdimm_bus_register(dev, nd_desc);
if (!acpi_desc->nvdimm_bus)
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index c6baba64703f..eda565fa81ef 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -87,4 +87,10 @@ static inline struct acpi_nfit_memory_map *__to_nfit_memdev(
return nfit_mem->memdev_dcr;
return nfit_mem->memdev_pmem;
}
+
+static inline struct acpi_nfit_desc *to_acpi_desc(
+ struct nvdimm_bus_descriptor *nd_desc)
+{
+ return container_of(nd_desc, struct acpi_nfit_desc, nd_desc);
+}
#endif /* __NFIT_H__ */
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 10bc7af47992..8a8293c72c7c 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -1,3 +1,4 @@
obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o

libnvdimm-y := core.o
+libnvdimm-y += bus.o
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
new file mode 100644
index 000000000000..3f7c690a5d0c
--- /dev/null
+++ b/drivers/nvdimm/bus.c
@@ -0,0 +1,83 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/uaccess.h>
+#include <linux/fcntl.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/io.h>
+#include "nd-core.h"
+
+static int nvdimm_bus_major;
+static struct class *nd_class;
+
+int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus)
+{
+ dev_t devt = MKDEV(nvdimm_bus_major, nvdimm_bus->id);
+ struct device *dev;
+
+ dev = device_create(nd_class, &nvdimm_bus->dev, devt, nvdimm_bus,
+ "ndctl%d", nvdimm_bus->id);
+
+ if (IS_ERR(dev)) {
+ dev_dbg(&nvdimm_bus->dev, "failed to register ndctl%d: %ld\n",
+ nvdimm_bus->id, PTR_ERR(dev));
+ return PTR_ERR(dev);
+ }
+ return 0;
+}
+
+void nvdimm_bus_destroy_ndctl(struct nvdimm_bus *nvdimm_bus)
+{
+ device_destroy(nd_class, MKDEV(nvdimm_bus_major, nvdimm_bus->id));
+}
+
+static long nd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ return -ENXIO;
+}
+
+static const struct file_operations nvdimm_bus_fops = {
+ .owner = THIS_MODULE,
+ .open = nonseekable_open,
+ .unlocked_ioctl = nd_ioctl,
+ .compat_ioctl = nd_ioctl,
+ .llseek = noop_llseek,
+};
+
+int __init nvdimm_bus_init(void)
+{
+ int rc;
+
+ rc = register_chrdev(0, "ndctl", &nvdimm_bus_fops);
+ if (rc < 0)
+ return rc;
+ nvdimm_bus_major = rc;
+
+ nd_class = class_create(THIS_MODULE, "nd");
+ if (IS_ERR(nd_class))
+ goto err_class;
+
+ return 0;
+
+ err_class:
+ unregister_chrdev(nvdimm_bus_major, "ndctl");
+
+ return rc;
+}
+
+void __exit nvdimm_bus_exit(void)
+{
+ class_destroy(nd_class);
+ unregister_chrdev(nvdimm_bus_major, "ndctl");
+}
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index c578a49867ac..fa7ab5ad0318 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -14,9 +14,12 @@
#include <linux/export.h>
#include <linux/module.h>
#include <linux/device.h>
+#include <linux/mutex.h>
#include <linux/slab.h>
#include "nd-core.h"

+static LIST_HEAD(nvdimm_bus_list);
+static DEFINE_MUTEX(nvdimm_bus_list_mutex);
static DEFINE_IDA(nd_ida);

static void nvdimm_bus_release(struct device *dev)
@@ -28,6 +31,55 @@ static void nvdimm_bus_release(struct device *dev)
kfree(nvdimm_bus);
}

+struct nvdimm_bus *to_nvdimm_bus(struct device *dev)
+{
+ struct nvdimm_bus *nvdimm_bus;
+
+ nvdimm_bus = container_of(dev, struct nvdimm_bus, dev);
+ WARN_ON(nvdimm_bus->dev.release != nvdimm_bus_release);
+ return nvdimm_bus;
+}
+EXPORT_SYMBOL_GPL(to_nvdimm_bus);
+
+struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus)
+{
+ /* struct nvdimm_bus definition is private to libnvdimm */
+ return nvdimm_bus->nd_desc;
+}
+EXPORT_SYMBOL_GPL(to_nd_desc);
+
+static const char *nvdimm_bus_provider(struct nvdimm_bus *nvdimm_bus)
+{
+ struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+ struct device *parent = nvdimm_bus->dev.parent;
+
+ if (nd_desc->provider_name)
+ return nd_desc->provider_name;
+ else if (parent)
+ return dev_name(parent);
+ else
+ return "unknown";
+}
+
+static ssize_t provider_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+
+ return sprintf(buf, "%s\n", nvdimm_bus_provider(nvdimm_bus));
+}
+static DEVICE_ATTR_RO(provider);
+
+static struct attribute *nvdimm_bus_attributes[] = {
+ &dev_attr_provider.attr,
+ NULL,
+};
+
+struct attribute_group nvdimm_bus_attribute_group = {
+ .attrs = nvdimm_bus_attributes,
+};
+EXPORT_SYMBOL_GPL(nvdimm_bus_attribute_group);
+
struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
struct nvdimm_bus_descriptor *nd_desc)
{
@@ -37,6 +89,7 @@ struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
nvdimm_bus = kzalloc(sizeof(*nvdimm_bus), GFP_KERNEL);
if (!nvdimm_bus)
return NULL;
+ INIT_LIST_HEAD(&nvdimm_bus->list);
nvdimm_bus->id = ida_simple_get(&nd_ida, 0, 0, GFP_KERNEL);
if (nvdimm_bus->id < 0) {
kfree(nvdimm_bus);
@@ -45,15 +98,26 @@ struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
nvdimm_bus->nd_desc = nd_desc;
nvdimm_bus->dev.parent = parent;
nvdimm_bus->dev.release = nvdimm_bus_release;
+ nvdimm_bus->dev.groups = nd_desc->attr_groups;
dev_set_name(&nvdimm_bus->dev, "ndbus%d", nvdimm_bus->id);
rc = device_register(&nvdimm_bus->dev);
if (rc) {
dev_dbg(&nvdimm_bus->dev, "registration failed: %d\n", rc);
- put_device(&nvdimm_bus->dev);
- return NULL;
+ goto err;
}

+ rc = nvdimm_bus_create_ndctl(nvdimm_bus);
+ if (rc)
+ goto err;
+
+ mutex_lock(&nvdimm_bus_list_mutex);
+ list_add_tail(&nvdimm_bus->list, &nvdimm_bus_list);
+ mutex_unlock(&nvdimm_bus_list_mutex);
+
return nvdimm_bus;
+ err:
+ put_device(&nvdimm_bus->dev);
+ return NULL;
}
EXPORT_SYMBOL_GPL(nvdimm_bus_register);

@@ -61,9 +125,29 @@ void nvdimm_bus_unregister(struct nvdimm_bus *nvdimm_bus)
{
if (!nvdimm_bus)
return;
+
+ mutex_lock(&nvdimm_bus_list_mutex);
+ list_del_init(&nvdimm_bus->list);
+ mutex_unlock(&nvdimm_bus_list_mutex);
+
+ nvdimm_bus_destroy_ndctl(nvdimm_bus);
+
device_unregister(&nvdimm_bus->dev);
}
EXPORT_SYMBOL_GPL(nvdimm_bus_unregister);

+static __init int libnvdimm_init(void)
+{
+ return nvdimm_bus_init();
+}
+
+static __exit void libnvdimm_exit(void)
+{
+ WARN_ON(!list_empty(&nvdimm_bus_list));
+ nvdimm_bus_exit();
+}
+
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Intel Corporation");
+subsys_initcall(libnvdimm_init);
+module_exit(libnvdimm_exit);
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 291b2fdcd96b..0b0ff2423161 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -17,7 +17,13 @@

struct nvdimm_bus {
struct nvdimm_bus_descriptor *nd_desc;
+ struct list_head list;
struct device dev;
int id;
};
+
+int __init nvdimm_bus_init(void);
+void __exit nvdimm_bus_exit(void);
+int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus);
+void nvdimm_bus_destroy_ndctl(struct nvdimm_bus *nvdimm_bus);
#endif /* __ND_CORE_H__ */
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 2b3c63950c91..d375cdc4abd5 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -14,6 +14,8 @@
*/
#ifndef __LIBNVDIMM_H__
#define __LIBNVDIMM_H__
+extern struct attribute_group nvdimm_bus_attribute_group;
+
struct nvdimm;
struct nvdimm_bus_descriptor;
typedef int (*ndctl_fn)(struct nvdimm_bus_descriptor *nd_desc,
@@ -21,14 +23,16 @@ typedef int (*ndctl_fn)(struct nvdimm_bus_descriptor *nd_desc,
unsigned int buf_len);

struct nvdimm_bus_descriptor {
+ const struct attribute_group **attr_groups;
unsigned long dsm_mask;
char *provider_name;
ndctl_fn ndctl;
};

struct device;
-struct nvdimm_bus;
struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
struct nvdimm_bus_descriptor *nfit_desc);
void nvdimm_bus_unregister(struct nvdimm_bus *nvdimm_bus);
+struct nvdimm_bus *to_nvdimm_bus(struct device *dev);
+struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus);
#endif /* __LIBNVDIMM_H__ */

2015-06-17 23:20:10

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 04/16] libnvdimm, nfit: dimm/memory-devices

Enable nvdimm devices to be registered on a nvdimm_bus. The kernel
assigned device id for nvdimm devicesis dynamic. If userspace needs a
more static identifier it should consult a provider-specific attribute.
In the case where NFIT is the provider, the 'nmemX/nfit/handle' or
'nmemX/nfit/serial' attributes may be used for this purpose.

Cc: Neil Brown <[email protected]>
Cc: <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Robert Moore <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit.c | 161 ++++++++++++++++++++++++++++++++++++++++++++
drivers/acpi/nfit.h | 1
drivers/nvdimm/Makefile | 1
drivers/nvdimm/bus.c | 14 ++++
drivers/nvdimm/core.c | 33 ++++++++-
drivers/nvdimm/dimm_devs.c | 92 +++++++++++++++++++++++++
drivers/nvdimm/nd-core.h | 12 +++
include/linux/libnvdimm.h | 11 +++
8 files changed, 321 insertions(+), 4 deletions(-)
create mode 100644 drivers/nvdimm/dimm_devs.c

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 162391b32022..9fd7781f966e 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -381,6 +381,165 @@ static const struct attribute_group *acpi_nfit_attribute_groups[] = {
NULL,
};

+static struct acpi_nfit_memory_map *to_nfit_memdev(struct device *dev)
+{
+ struct nvdimm *nvdimm = to_nvdimm(dev);
+ struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm);
+
+ return __to_nfit_memdev(nfit_mem);
+}
+
+static struct acpi_nfit_control_region *to_nfit_dcr(struct device *dev)
+{
+ struct nvdimm *nvdimm = to_nvdimm(dev);
+ struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm);
+
+ return nfit_mem->dcr;
+}
+
+static ssize_t handle_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_memory_map *memdev = to_nfit_memdev(dev);
+
+ return sprintf(buf, "%#x\n", memdev->device_handle);
+}
+static DEVICE_ATTR_RO(handle);
+
+static ssize_t phys_id_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_memory_map *memdev = to_nfit_memdev(dev);
+
+ return sprintf(buf, "%#x\n", memdev->physical_id);
+}
+static DEVICE_ATTR_RO(phys_id);
+
+static ssize_t vendor_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_control_region *dcr = to_nfit_dcr(dev);
+
+ return sprintf(buf, "%#x\n", dcr->vendor_id);
+}
+static DEVICE_ATTR_RO(vendor);
+
+static ssize_t rev_id_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_control_region *dcr = to_nfit_dcr(dev);
+
+ return sprintf(buf, "%#x\n", dcr->revision_id);
+}
+static DEVICE_ATTR_RO(rev_id);
+
+static ssize_t device_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_control_region *dcr = to_nfit_dcr(dev);
+
+ return sprintf(buf, "%#x\n", dcr->device_id);
+}
+static DEVICE_ATTR_RO(device);
+
+static ssize_t format_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_control_region *dcr = to_nfit_dcr(dev);
+
+ return sprintf(buf, "%#x\n", dcr->code);
+}
+static DEVICE_ATTR_RO(format);
+
+static ssize_t serial_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_control_region *dcr = to_nfit_dcr(dev);
+
+ return sprintf(buf, "%#x\n", dcr->serial_number);
+}
+static DEVICE_ATTR_RO(serial);
+
+static struct attribute *acpi_nfit_dimm_attributes[] = {
+ &dev_attr_handle.attr,
+ &dev_attr_phys_id.attr,
+ &dev_attr_vendor.attr,
+ &dev_attr_device.attr,
+ &dev_attr_format.attr,
+ &dev_attr_serial.attr,
+ &dev_attr_rev_id.attr,
+ NULL,
+};
+
+static umode_t acpi_nfit_dimm_attr_visible(struct kobject *kobj,
+ struct attribute *a, int n)
+{
+ struct device *dev = container_of(kobj, struct device, kobj);
+
+ if (to_nfit_dcr(dev))
+ return a->mode;
+ else
+ return 0;
+}
+
+static struct attribute_group acpi_nfit_dimm_attribute_group = {
+ .name = "nfit",
+ .attrs = acpi_nfit_dimm_attributes,
+ .is_visible = acpi_nfit_dimm_attr_visible,
+};
+
+static const struct attribute_group *acpi_nfit_dimm_attribute_groups[] = {
+ &acpi_nfit_dimm_attribute_group,
+ NULL,
+};
+
+static struct nvdimm *acpi_nfit_dimm_by_handle(struct acpi_nfit_desc *acpi_desc,
+ u32 device_handle)
+{
+ struct nfit_mem *nfit_mem;
+
+ list_for_each_entry(nfit_mem, &acpi_desc->dimms, list)
+ if (__to_nfit_memdev(nfit_mem)->device_handle == device_handle)
+ return nfit_mem->nvdimm;
+
+ return NULL;
+}
+
+static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
+{
+ struct nfit_mem *nfit_mem;
+
+ list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
+ struct nvdimm *nvdimm;
+ unsigned long flags = 0;
+ u32 device_handle;
+
+ device_handle = __to_nfit_memdev(nfit_mem)->device_handle;
+ nvdimm = acpi_nfit_dimm_by_handle(acpi_desc, device_handle);
+ if (nvdimm) {
+ /*
+ * If for some reason we find multiple DCRs the
+ * first one wins
+ */
+ dev_err(acpi_desc->dev, "duplicate DCR detected: %s\n",
+ nvdimm_name(nvdimm));
+ continue;
+ }
+
+ if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+ flags |= NDD_ALIASING;
+
+ nvdimm = nvdimm_create(acpi_desc->nvdimm_bus, nfit_mem,
+ acpi_nfit_dimm_attribute_groups, flags);
+ if (!nvdimm)
+ return -ENOMEM;
+
+ nfit_mem->nvdimm = nvdimm;
+ }
+
+ return 0;
+}
+
static int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
{
struct device *dev = acpi_desc->dev;
@@ -408,7 +567,7 @@ static int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
if (nfit_mem_init(acpi_desc) != 0)
return -ENOMEM;

- return 0;
+ return acpi_nfit_register_dimms(acpi_desc);
}

static int acpi_nfit_add(struct acpi_device *adev)
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index eda565fa81ef..9dd437fe5563 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -59,6 +59,7 @@ struct nfit_memdev {

/* assembled tables for a given dimm/memory-device */
struct nfit_mem {
+ struct nvdimm *nvdimm;
struct acpi_nfit_memory_map *memdev_dcr;
struct acpi_nfit_memory_map *memdev_pmem;
struct acpi_nfit_control_region *dcr;
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 8a8293c72c7c..5b68738ba406 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o

libnvdimm-y := core.o
libnvdimm-y += bus.o
+libnvdimm-y += dimm_devs.o
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 3f7c690a5d0c..a8802577fb55 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -13,6 +13,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/uaccess.h>
#include <linux/fcntl.h>
+#include <linux/async.h>
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/io.h>
@@ -21,6 +22,10 @@
static int nvdimm_bus_major;
static struct class *nd_class;

+struct bus_type nvdimm_bus_type = {
+ .name = "nd",
+};
+
int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus)
{
dev_t devt = MKDEV(nvdimm_bus_major, nvdimm_bus->id);
@@ -59,9 +64,13 @@ int __init nvdimm_bus_init(void)
{
int rc;

+ rc = bus_register(&nvdimm_bus_type);
+ if (rc)
+ return rc;
+
rc = register_chrdev(0, "ndctl", &nvdimm_bus_fops);
if (rc < 0)
- return rc;
+ goto err_chrdev;
nvdimm_bus_major = rc;

nd_class = class_create(THIS_MODULE, "nd");
@@ -72,6 +81,8 @@ int __init nvdimm_bus_init(void)

err_class:
unregister_chrdev(nvdimm_bus_major, "ndctl");
+ err_chrdev:
+ bus_unregister(&nvdimm_bus_type);

return rc;
}
@@ -80,4 +91,5 @@ void __exit nvdimm_bus_exit(void)
{
class_destroy(nd_class);
unregister_chrdev(nvdimm_bus_major, "ndctl");
+ bus_unregister(&nvdimm_bus_type);
}
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index fa7ab5ad0318..ef957eb37c90 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -18,8 +18,8 @@
#include <linux/slab.h>
#include "nd-core.h"

-static LIST_HEAD(nvdimm_bus_list);
-static DEFINE_MUTEX(nvdimm_bus_list_mutex);
+LIST_HEAD(nvdimm_bus_list);
+DEFINE_MUTEX(nvdimm_bus_list_mutex);
static DEFINE_IDA(nd_ida);

static void nvdimm_bus_release(struct device *dev)
@@ -48,6 +48,19 @@ struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus)
}
EXPORT_SYMBOL_GPL(to_nd_desc);

+struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev)
+{
+ struct device *dev;
+
+ for (dev = nd_dev; dev; dev = dev->parent)
+ if (dev->release == nvdimm_bus_release)
+ break;
+ dev_WARN_ONCE(nd_dev, !dev, "invalid dev, not on nd bus\n");
+ if (dev)
+ return to_nvdimm_bus(dev);
+ return NULL;
+}
+
static const char *nvdimm_bus_provider(struct nvdimm_bus *nvdimm_bus)
{
struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
@@ -121,6 +134,21 @@ struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
}
EXPORT_SYMBOL_GPL(nvdimm_bus_register);

+static int child_unregister(struct device *dev, void *data)
+{
+ /*
+ * the singular ndctl class device per bus needs to be
+ * "device_destroy"ed, so skip it here
+ *
+ * i.e. remove classless children
+ */
+ if (dev->class)
+ /* pass */;
+ else
+ device_unregister(dev);
+ return 0;
+}
+
void nvdimm_bus_unregister(struct nvdimm_bus *nvdimm_bus)
{
if (!nvdimm_bus)
@@ -130,6 +158,7 @@ void nvdimm_bus_unregister(struct nvdimm_bus *nvdimm_bus)
list_del_init(&nvdimm_bus->list);
mutex_unlock(&nvdimm_bus_list_mutex);

+ device_for_each_child(&nvdimm_bus->dev, NULL, child_unregister);
nvdimm_bus_destroy_ndctl(nvdimm_bus);

device_unregister(&nvdimm_bus->dev);
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
new file mode 100644
index 000000000000..51ea52cc2079
--- /dev/null
+++ b/drivers/nvdimm/dimm_devs.c
@@ -0,0 +1,92 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/io.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include "nd-core.h"
+
+static DEFINE_IDA(dimm_ida);
+
+static void nvdimm_release(struct device *dev)
+{
+ struct nvdimm *nvdimm = to_nvdimm(dev);
+
+ ida_simple_remove(&dimm_ida, nvdimm->id);
+ kfree(nvdimm);
+}
+
+static struct device_type nvdimm_device_type = {
+ .name = "nvdimm",
+ .release = nvdimm_release,
+};
+
+static bool is_nvdimm(struct device *dev)
+{
+ return dev->type == &nvdimm_device_type;
+}
+
+struct nvdimm *to_nvdimm(struct device *dev)
+{
+ struct nvdimm *nvdimm = container_of(dev, struct nvdimm, dev);
+
+ WARN_ON(!is_nvdimm(dev));
+ return nvdimm;
+}
+EXPORT_SYMBOL_GPL(to_nvdimm);
+
+const char *nvdimm_name(struct nvdimm *nvdimm)
+{
+ return dev_name(&nvdimm->dev);
+}
+EXPORT_SYMBOL_GPL(nvdimm_name);
+
+void *nvdimm_provider_data(struct nvdimm *nvdimm)
+{
+ return nvdimm->provider_data;
+}
+EXPORT_SYMBOL_GPL(nvdimm_provider_data);
+
+struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
+ const struct attribute_group **groups, unsigned long flags)
+{
+ struct nvdimm *nvdimm = kzalloc(sizeof(*nvdimm), GFP_KERNEL);
+ struct device *dev;
+
+ if (!nvdimm)
+ return NULL;
+
+ nvdimm->id = ida_simple_get(&dimm_ida, 0, 0, GFP_KERNEL);
+ if (nvdimm->id < 0) {
+ kfree(nvdimm);
+ return NULL;
+ }
+ nvdimm->provider_data = provider_data;
+ nvdimm->flags = flags;
+
+ dev = &nvdimm->dev;
+ dev_set_name(dev, "nmem%d", nvdimm->id);
+ dev->parent = &nvdimm_bus->dev;
+ dev->type = &nvdimm_device_type;
+ dev->bus = &nvdimm_bus_type;
+ dev->groups = groups;
+ if (device_register(dev) != 0) {
+ put_device(dev);
+ return NULL;
+ }
+
+ return nvdimm;
+}
+EXPORT_SYMBOL_GPL(nvdimm_create);
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 0b0ff2423161..9b8303413b60 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -15,6 +15,10 @@
#include <linux/libnvdimm.h>
#include <linux/device.h>

+extern struct list_head nvdimm_bus_list;
+extern struct mutex nvdimm_bus_list_mutex;
+extern struct bus_type nvdimm_bus_type;
+
struct nvdimm_bus {
struct nvdimm_bus_descriptor *nd_desc;
struct list_head list;
@@ -22,6 +26,14 @@ struct nvdimm_bus {
int id;
};

+struct nvdimm {
+ unsigned long flags;
+ void *provider_data;
+ struct device dev;
+ int id;
+};
+
+struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev);
int __init nvdimm_bus_init(void);
void __exit nvdimm_bus_exit(void);
int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus);
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index d375cdc4abd5..07787f0dd7de 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -14,6 +14,12 @@
*/
#ifndef __LIBNVDIMM_H__
#define __LIBNVDIMM_H__
+
+enum {
+ /* when a dimm supports both PMEM and BLK access a label is required */
+ NDD_ALIASING = 1 << 0,
+};
+
extern struct attribute_group nvdimm_bus_attribute_group;

struct nvdimm;
@@ -34,5 +40,10 @@ struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
struct nvdimm_bus_descriptor *nfit_desc);
void nvdimm_bus_unregister(struct nvdimm_bus *nvdimm_bus);
struct nvdimm_bus *to_nvdimm_bus(struct device *dev);
+struct nvdimm *to_nvdimm(struct device *dev);
struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus);
+const char *nvdimm_name(struct nvdimm *nvdimm);
+void *nvdimm_provider_data(struct nvdimm *nvdimm);
+struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
+ const struct attribute_group **groups, unsigned long flags);
#endif /* __LIBNVDIMM_H__ */

2015-06-17 23:16:36

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 05/16] libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices

Most discovery/configuration of the nvdimm-subsystem is done via sysfs
attributes. However, some nvdimm_bus instances, particularly the
ACPI.NFIT bus, define a small set of messages that can be passed to the
platform. For convenience we derive the initial libnvdimm-ioctl command
formats directly from the NFIT DSM Interface Example formats.

ND_CMD_SMART: media health and diagnostics
ND_CMD_GET_CONFIG_SIZE: size of the label space
ND_CMD_GET_CONFIG_DATA: read label space
ND_CMD_SET_CONFIG_DATA: write label space
ND_CMD_VENDOR: vendor-specific command passthrough
ND_CMD_ARS_CAP: report address-range-scrubbing capabilities
ND_CMD_ARS_START: initiate scrubbing
ND_CMD_ARS_STATUS: report on scrubbing state
ND_CMD_SMART_THRESHOLD: configure alarm thresholds for smart events

If a platform later defines different commands than this set it is
straightforward to extend support to those formats.

Most of the commands target a specific dimm. However, the
address-range-scrubbing commands target the bus. The 'commands'
attribute in sysfs of an nvdimm_bus, or nvdimm, enumerate the supported
commands for that object.

Cc: <[email protected]>
Cc: Robert Moore <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Reported-by: Nicholas Moulin <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/Kconfig | 12 ++
drivers/acpi/nfit.c | 216 +++++++++++++++++++++++++++++
drivers/acpi/nfit.h | 3
drivers/nvdimm/bus.c | 326 +++++++++++++++++++++++++++++++++++++++++++-
drivers/nvdimm/core.c | 16 ++
drivers/nvdimm/dimm_devs.c | 38 +++++
drivers/nvdimm/nd-core.h | 3
include/linux/libnvdimm.h | 27 ++++
include/uapi/linux/Kbuild | 1
include/uapi/linux/ndctl.h | 178 ++++++++++++++++++++++++
10 files changed, 810 insertions(+), 10 deletions(-)
create mode 100644 include/uapi/linux/ndctl.h

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 300b4ef3712b..9c43ae301300 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -397,6 +397,18 @@ config ACPI_NFIT
To compile this driver as a module, choose M here:
the module will be called nfit.

+config ACPI_NFIT_DEBUG
+ bool "NFIT DSM debug"
+ depends on ACPI_NFIT
+ depends on DYNAMIC_DEBUG
+ default n
+ help
+ Enabling this option causes the nfit driver to dump the
+ input and output buffers of _DSM operations on the ACPI0012
+ device and its children. This can be very verbose, so leave
+ it disabled unless you are debugging a hardware / firmware
+ issue.
+
source "drivers/acpi/apei/Kconfig"

config ACPI_EXTLOG
diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 9fd7781f966e..9112a6210a4b 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -13,6 +13,7 @@
#include <linux/list_sort.h>
#include <linux/libnvdimm.h>
#include <linux/module.h>
+#include <linux/ndctl.h>
#include <linux/list.h>
#include <linux/acpi.h>
#include "nfit.h"
@@ -24,11 +25,153 @@ static const u8 *to_nfit_uuid(enum nfit_uuids id)
return nfit_uuid[id];
}

+static struct acpi_nfit_desc *to_acpi_nfit_desc(
+ struct nvdimm_bus_descriptor *nd_desc)
+{
+ return container_of(nd_desc, struct acpi_nfit_desc, nd_desc);
+}
+
+static struct acpi_device *to_acpi_dev(struct acpi_nfit_desc *acpi_desc)
+{
+ struct nvdimm_bus_descriptor *nd_desc = &acpi_desc->nd_desc;
+
+ /*
+ * If provider == 'ACPI.NFIT' we can assume 'dev' is a struct
+ * acpi_device.
+ */
+ if (!nd_desc->provider_name
+ || strcmp(nd_desc->provider_name, "ACPI.NFIT") != 0)
+ return NULL;
+
+ return to_acpi_device(acpi_desc->dev);
+}
+
static int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *buf,
unsigned int buf_len)
{
- return -ENOTTY;
+ struct acpi_nfit_desc *acpi_desc = to_acpi_nfit_desc(nd_desc);
+ const struct nd_cmd_desc *desc = NULL;
+ union acpi_object in_obj, in_buf, *out_obj;
+ struct device *dev = acpi_desc->dev;
+ const char *cmd_name, *dimm_name;
+ unsigned long dsm_mask;
+ acpi_handle handle;
+ const u8 *uuid;
+ u32 offset;
+ int rc, i;
+
+ if (nvdimm) {
+ struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm);
+ struct acpi_device *adev = nfit_mem->adev;
+
+ if (!adev)
+ return -ENOTTY;
+ dimm_name = dev_name(&adev->dev);
+ cmd_name = nvdimm_cmd_name(cmd);
+ dsm_mask = nfit_mem->dsm_mask;
+ desc = nd_cmd_dimm_desc(cmd);
+ uuid = to_nfit_uuid(NFIT_DEV_DIMM);
+ handle = adev->handle;
+ } else {
+ struct acpi_device *adev = to_acpi_dev(acpi_desc);
+
+ cmd_name = nvdimm_bus_cmd_name(cmd);
+ dsm_mask = nd_desc->dsm_mask;
+ desc = nd_cmd_bus_desc(cmd);
+ uuid = to_nfit_uuid(NFIT_DEV_BUS);
+ handle = adev->handle;
+ dimm_name = "bus";
+ }
+
+ if (!desc || (cmd && (desc->out_num + desc->in_num == 0)))
+ return -ENOTTY;
+
+ if (!test_bit(cmd, &dsm_mask))
+ return -ENOTTY;
+
+ in_obj.type = ACPI_TYPE_PACKAGE;
+ in_obj.package.count = 1;
+ in_obj.package.elements = &in_buf;
+ in_buf.type = ACPI_TYPE_BUFFER;
+ in_buf.buffer.pointer = buf;
+ in_buf.buffer.length = 0;
+
+ /* libnvdimm has already validated the input envelope */
+ for (i = 0; i < desc->in_num; i++)
+ in_buf.buffer.length += nd_cmd_in_size(nvdimm, cmd, desc,
+ i, buf);
+
+ if (IS_ENABLED(CONFIG_ACPI_NFIT_DEBUG)) {
+ dev_dbg(dev, "%s:%s cmd: %s input length: %d\n", __func__,
+ dimm_name, cmd_name, in_buf.buffer.length);
+ print_hex_dump_debug(cmd_name, DUMP_PREFIX_OFFSET, 4,
+ 4, in_buf.buffer.pointer, min_t(u32, 128,
+ in_buf.buffer.length), true);
+ }
+
+ out_obj = acpi_evaluate_dsm(handle, uuid, 1, cmd, &in_obj);
+ if (!out_obj) {
+ dev_dbg(dev, "%s:%s _DSM failed cmd: %s\n", __func__, dimm_name,
+ cmd_name);
+ return -EINVAL;
+ }
+
+ if (out_obj->package.type != ACPI_TYPE_BUFFER) {
+ dev_dbg(dev, "%s:%s unexpected output object type cmd: %s type: %d\n",
+ __func__, dimm_name, cmd_name, out_obj->type);
+ rc = -EINVAL;
+ goto out;
+ }
+
+ if (IS_ENABLED(CONFIG_ACPI_NFIT_DEBUG)) {
+ dev_dbg(dev, "%s:%s cmd: %s output length: %d\n", __func__,
+ dimm_name, cmd_name, out_obj->buffer.length);
+ print_hex_dump_debug(cmd_name, DUMP_PREFIX_OFFSET, 4,
+ 4, out_obj->buffer.pointer, min_t(u32, 128,
+ out_obj->buffer.length), true);
+ }
+
+ for (i = 0, offset = 0; i < desc->out_num; i++) {
+ u32 out_size = nd_cmd_out_size(nvdimm, cmd, desc, i, buf,
+ (u32 *) out_obj->buffer.pointer);
+
+ if (offset + out_size > out_obj->buffer.length) {
+ dev_dbg(dev, "%s:%s output object underflow cmd: %s field: %d\n",
+ __func__, dimm_name, cmd_name, i);
+ break;
+ }
+
+ if (in_buf.buffer.length + offset + out_size > buf_len) {
+ dev_dbg(dev, "%s:%s output overrun cmd: %s field: %d\n",
+ __func__, dimm_name, cmd_name, i);
+ rc = -ENXIO;
+ goto out;
+ }
+ memcpy(buf + in_buf.buffer.length + offset,
+ out_obj->buffer.pointer + offset, out_size);
+ offset += out_size;
+ }
+ if (offset + in_buf.buffer.length < buf_len) {
+ if (i >= 1) {
+ /*
+ * status valid, return the number of bytes left
+ * unfilled in the output buffer
+ */
+ rc = buf_len - offset - in_buf.buffer.length;
+ } else {
+ dev_err(dev, "%s:%s underrun cmd: %s buf_len: %d out_len: %d\n",
+ __func__, dimm_name, cmd_name, buf_len,
+ offset);
+ rc = -ENXIO;
+ }
+ } else
+ rc = 0;
+
+ out:
+ ACPI_FREE(out_obj);
+
+ return rc;
}

static const char *spa_type_name(u16 type)
@@ -489,6 +632,7 @@ static struct attribute_group acpi_nfit_dimm_attribute_group = {
};

static const struct attribute_group *acpi_nfit_dimm_attribute_groups[] = {
+ &nvdimm_attribute_group,
&acpi_nfit_dimm_attribute_group,
NULL,
};
@@ -505,6 +649,50 @@ static struct nvdimm *acpi_nfit_dimm_by_handle(struct acpi_nfit_desc *acpi_desc,
return NULL;
}

+static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
+ struct nfit_mem *nfit_mem, u32 device_handle)
+{
+ struct acpi_device *adev, *adev_dimm;
+ struct device *dev = acpi_desc->dev;
+ const u8 *uuid = to_nfit_uuid(NFIT_DEV_DIMM);
+ unsigned long long sta;
+ int i, rc = -ENODEV;
+ acpi_status status;
+
+ nfit_mem->dsm_mask = acpi_desc->dimm_dsm_force_en;
+ adev = to_acpi_dev(acpi_desc);
+ if (!adev)
+ return 0;
+
+ adev_dimm = acpi_find_child_device(adev, device_handle, false);
+ nfit_mem->adev = adev_dimm;
+ if (!adev_dimm) {
+ dev_err(dev, "no ACPI.NFIT device with _ADR %#x, disabling...\n",
+ device_handle);
+ return -ENODEV;
+ }
+
+ status = acpi_evaluate_integer(adev_dimm->handle, "_STA", NULL, &sta);
+ if (status == AE_NOT_FOUND) {
+ dev_dbg(dev, "%s missing _STA, assuming enabled...\n",
+ dev_name(&adev_dimm->dev));
+ rc = 0;
+ } else if (ACPI_FAILURE(status))
+ dev_err(dev, "%s failed to retrieve_STA, disabling...\n",
+ dev_name(&adev_dimm->dev));
+ else if ((sta & ACPI_STA_DEVICE_ENABLED) == 0)
+ dev_info(dev, "%s disabled by firmware\n",
+ dev_name(&adev_dimm->dev));
+ else
+ rc = 0;
+
+ for (i = ND_CMD_SMART; i <= ND_CMD_VENDOR; i++)
+ if (acpi_check_dsm(adev_dimm->handle, uuid, 1, 1ULL << i))
+ set_bit(i, &nfit_mem->dsm_mask);
+
+ return rc;
+}
+
static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
{
struct nfit_mem *nfit_mem;
@@ -513,6 +701,7 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
struct nvdimm *nvdimm;
unsigned long flags = 0;
u32 device_handle;
+ int rc;

device_handle = __to_nfit_memdev(nfit_mem)->device_handle;
nvdimm = acpi_nfit_dimm_by_handle(acpi_desc, device_handle);
@@ -529,8 +718,13 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
if (nfit_mem->bdw && nfit_mem->memdev_pmem)
flags |= NDD_ALIASING;

+ rc = acpi_nfit_add_dimm(acpi_desc, nfit_mem, device_handle);
+ if (rc)
+ continue;
+
nvdimm = nvdimm_create(acpi_desc->nvdimm_bus, nfit_mem,
- acpi_nfit_dimm_attribute_groups, flags);
+ acpi_nfit_dimm_attribute_groups,
+ flags, &nfit_mem->dsm_mask);
if (!nvdimm)
return -ENOMEM;

@@ -540,6 +734,22 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
return 0;
}

+static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc)
+{
+ struct nvdimm_bus_descriptor *nd_desc = &acpi_desc->nd_desc;
+ const u8 *uuid = to_nfit_uuid(NFIT_DEV_BUS);
+ struct acpi_device *adev;
+ int i;
+
+ adev = to_acpi_dev(acpi_desc);
+ if (!adev)
+ return;
+
+ for (i = ND_CMD_ARS_CAP; i <= ND_CMD_ARS_STATUS; i++)
+ if (acpi_check_dsm(adev->handle, uuid, 1, 1ULL << i))
+ set_bit(i, &nd_desc->dsm_mask);
+}
+
static int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
{
struct device *dev = acpi_desc->dev;
@@ -567,6 +777,8 @@ static int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
if (nfit_mem_init(acpi_desc) != 0)
return -ENOMEM;

+ acpi_nfit_init_dsms(acpi_desc);
+
return acpi_nfit_register_dimms(acpi_desc);
}

diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 9dd437fe5563..b76e33629098 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -67,6 +67,8 @@ struct nfit_mem {
struct acpi_nfit_system_address *spa_dcr;
struct acpi_nfit_system_address *spa_bdw;
struct list_head list;
+ struct acpi_device *adev;
+ unsigned long dsm_mask;
};

struct acpi_nfit_desc {
@@ -79,6 +81,7 @@ struct acpi_nfit_desc {
struct list_head bdws;
struct nvdimm_bus *nvdimm_bus;
struct device *dev;
+ unsigned long dimm_dsm_force_en;
};

static inline struct acpi_nfit_memory_map *__to_nfit_memdev(
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index a8802577fb55..15f3a3ddc225 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -11,14 +11,18 @@
* General Public License for more details.
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/vmalloc.h>
#include <linux/uaccess.h>
#include <linux/fcntl.h>
#include <linux/async.h>
+#include <linux/ndctl.h>
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/io.h>
+#include <linux/mm.h>
#include "nd-core.h"

+int nvdimm_major;
static int nvdimm_bus_major;
static struct class *nd_class;

@@ -47,19 +51,325 @@ void nvdimm_bus_destroy_ndctl(struct nvdimm_bus *nvdimm_bus)
device_destroy(nd_class, MKDEV(nvdimm_bus_major, nvdimm_bus->id));
}

+static const struct nd_cmd_desc __nd_cmd_dimm_descs[] = {
+ [ND_CMD_IMPLEMENTED] = { },
+ [ND_CMD_SMART] = {
+ .out_num = 2,
+ .out_sizes = { 4, 8, },
+ },
+ [ND_CMD_SMART_THRESHOLD] = {
+ .out_num = 2,
+ .out_sizes = { 4, 8, },
+ },
+ [ND_CMD_DIMM_FLAGS] = {
+ .out_num = 2,
+ .out_sizes = { 4, 4 },
+ },
+ [ND_CMD_GET_CONFIG_SIZE] = {
+ .out_num = 3,
+ .out_sizes = { 4, 4, 4, },
+ },
+ [ND_CMD_GET_CONFIG_DATA] = {
+ .in_num = 2,
+ .in_sizes = { 4, 4, },
+ .out_num = 2,
+ .out_sizes = { 4, UINT_MAX, },
+ },
+ [ND_CMD_SET_CONFIG_DATA] = {
+ .in_num = 3,
+ .in_sizes = { 4, 4, UINT_MAX, },
+ .out_num = 1,
+ .out_sizes = { 4, },
+ },
+ [ND_CMD_VENDOR] = {
+ .in_num = 3,
+ .in_sizes = { 4, 4, UINT_MAX, },
+ .out_num = 3,
+ .out_sizes = { 4, 4, UINT_MAX, },
+ },
+};
+
+const struct nd_cmd_desc *nd_cmd_dimm_desc(int cmd)
+{
+ if (cmd < ARRAY_SIZE(__nd_cmd_dimm_descs))
+ return &__nd_cmd_dimm_descs[cmd];
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(nd_cmd_dimm_desc);
+
+static const struct nd_cmd_desc __nd_cmd_bus_descs[] = {
+ [ND_CMD_IMPLEMENTED] = { },
+ [ND_CMD_ARS_CAP] = {
+ .in_num = 2,
+ .in_sizes = { 8, 8, },
+ .out_num = 2,
+ .out_sizes = { 4, 4, },
+ },
+ [ND_CMD_ARS_START] = {
+ .in_num = 4,
+ .in_sizes = { 8, 8, 2, 6, },
+ .out_num = 1,
+ .out_sizes = { 4, },
+ },
+ [ND_CMD_ARS_STATUS] = {
+ .out_num = 2,
+ .out_sizes = { 4, UINT_MAX, },
+ },
+};
+
+const struct nd_cmd_desc *nd_cmd_bus_desc(int cmd)
+{
+ if (cmd < ARRAY_SIZE(__nd_cmd_bus_descs))
+ return &__nd_cmd_bus_descs[cmd];
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(nd_cmd_bus_desc);
+
+u32 nd_cmd_in_size(struct nvdimm *nvdimm, int cmd,
+ const struct nd_cmd_desc *desc, int idx, void *buf)
+{
+ if (idx >= desc->in_num)
+ return UINT_MAX;
+
+ if (desc->in_sizes[idx] < UINT_MAX)
+ return desc->in_sizes[idx];
+
+ if (nvdimm && cmd == ND_CMD_SET_CONFIG_DATA && idx == 2) {
+ struct nd_cmd_set_config_hdr *hdr = buf;
+
+ return hdr->in_length;
+ } else if (nvdimm && cmd == ND_CMD_VENDOR && idx == 2) {
+ struct nd_cmd_vendor_hdr *hdr = buf;
+
+ return hdr->in_length;
+ }
+
+ return UINT_MAX;
+}
+EXPORT_SYMBOL_GPL(nd_cmd_in_size);
+
+u32 nd_cmd_out_size(struct nvdimm *nvdimm, int cmd,
+ const struct nd_cmd_desc *desc, int idx, const u32 *in_field,
+ const u32 *out_field)
+{
+ if (idx >= desc->out_num)
+ return UINT_MAX;
+
+ if (desc->out_sizes[idx] < UINT_MAX)
+ return desc->out_sizes[idx];
+
+ if (nvdimm && cmd == ND_CMD_GET_CONFIG_DATA && idx == 1)
+ return in_field[1];
+ else if (nvdimm && cmd == ND_CMD_VENDOR && idx == 2)
+ return out_field[1];
+ else if (!nvdimm && cmd == ND_CMD_ARS_STATUS && idx == 1)
+ return ND_CMD_ARS_STATUS_MAX;
+
+ return UINT_MAX;
+}
+EXPORT_SYMBOL_GPL(nd_cmd_out_size);
+
+static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
+ int read_only, unsigned int ioctl_cmd, unsigned long arg)
+{
+ struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+ size_t buf_len = 0, in_len = 0, out_len = 0;
+ static char out_env[ND_CMD_MAX_ENVELOPE];
+ static char in_env[ND_CMD_MAX_ENVELOPE];
+ const struct nd_cmd_desc *desc = NULL;
+ unsigned int cmd = _IOC_NR(ioctl_cmd);
+ void __user *p = (void __user *) arg;
+ struct device *dev = &nvdimm_bus->dev;
+ const char *cmd_name, *dimm_name;
+ unsigned long dsm_mask;
+ void *buf;
+ int rc, i;
+
+ if (nvdimm) {
+ desc = nd_cmd_dimm_desc(cmd);
+ cmd_name = nvdimm_cmd_name(cmd);
+ dsm_mask = nvdimm->dsm_mask ? *(nvdimm->dsm_mask) : 0;
+ dimm_name = dev_name(&nvdimm->dev);
+ } else {
+ desc = nd_cmd_bus_desc(cmd);
+ cmd_name = nvdimm_bus_cmd_name(cmd);
+ dsm_mask = nd_desc->dsm_mask;
+ dimm_name = "bus";
+ }
+
+ if (!desc || (desc->out_num + desc->in_num == 0) ||
+ !test_bit(cmd, &dsm_mask))
+ return -ENOTTY;
+
+ /* fail write commands (when read-only) */
+ if (read_only)
+ switch (ioctl_cmd) {
+ case ND_IOCTL_VENDOR:
+ case ND_IOCTL_SET_CONFIG_DATA:
+ case ND_IOCTL_ARS_START:
+ dev_dbg(&nvdimm_bus->dev, "'%s' command while read-only.\n",
+ nvdimm ? nvdimm_cmd_name(cmd)
+ : nvdimm_bus_cmd_name(cmd));
+ return -EPERM;
+ default:
+ break;
+ }
+
+ /* process an input envelope */
+ for (i = 0; i < desc->in_num; i++) {
+ u32 in_size, copy;
+
+ in_size = nd_cmd_in_size(nvdimm, cmd, desc, i, in_env);
+ if (in_size == UINT_MAX) {
+ dev_err(dev, "%s:%s unknown input size cmd: %s field: %d\n",
+ __func__, dimm_name, cmd_name, i);
+ return -ENXIO;
+ }
+ if (!access_ok(VERIFY_READ, p + in_len, in_size))
+ return -EFAULT;
+ if (in_len < sizeof(in_env))
+ copy = min_t(u32, sizeof(in_env) - in_len, in_size);
+ else
+ copy = 0;
+ if (copy && copy_from_user(&in_env[in_len], p + in_len, copy))
+ return -EFAULT;
+ in_len += in_size;
+ }
+
+ /* process an output envelope */
+ for (i = 0; i < desc->out_num; i++) {
+ u32 out_size = nd_cmd_out_size(nvdimm, cmd, desc, i,
+ (u32 *) in_env, (u32 *) out_env);
+ u32 copy;
+
+ if (out_size == UINT_MAX) {
+ dev_dbg(dev, "%s:%s unknown output size cmd: %s field: %d\n",
+ __func__, dimm_name, cmd_name, i);
+ return -EFAULT;
+ }
+ if (!access_ok(VERIFY_WRITE, p + in_len + out_len, out_size))
+ return -EFAULT;
+ if (out_len < sizeof(out_env))
+ copy = min_t(u32, sizeof(out_env) - out_len, out_size);
+ else
+ copy = 0;
+ if (copy && copy_from_user(&out_env[out_len],
+ p + in_len + out_len, copy))
+ return -EFAULT;
+ out_len += out_size;
+ }
+
+ buf_len = out_len + in_len;
+ if (!access_ok(VERIFY_WRITE, p, sizeof(buf_len)))
+ return -EFAULT;
+
+ if (buf_len > ND_IOCTL_MAX_BUFLEN) {
+ dev_dbg(dev, "%s:%s cmd: %s buf_len: %zu > %d\n", __func__,
+ dimm_name, cmd_name, buf_len,
+ ND_IOCTL_MAX_BUFLEN);
+ return -EINVAL;
+ }
+
+ buf = vmalloc(buf_len);
+ if (!buf)
+ return -ENOMEM;
+
+ if (copy_from_user(buf, p, buf_len)) {
+ rc = -EFAULT;
+ goto out;
+ }
+
+ rc = nd_desc->ndctl(nd_desc, nvdimm, cmd, buf, buf_len);
+ if (rc < 0)
+ goto out;
+ if (copy_to_user(p, buf, buf_len))
+ rc = -EFAULT;
+ out:
+ vfree(buf);
+ return rc;
+}
+
static long nd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
- return -ENXIO;
+ long id = (long) file->private_data;
+ int rc = -ENXIO, read_only;
+ struct nvdimm_bus *nvdimm_bus;
+
+ read_only = (O_RDWR != (file->f_flags & O_ACCMODE));
+ mutex_lock(&nvdimm_bus_list_mutex);
+ list_for_each_entry(nvdimm_bus, &nvdimm_bus_list, list) {
+ if (nvdimm_bus->id == id) {
+ rc = __nd_ioctl(nvdimm_bus, NULL, read_only, cmd, arg);
+ break;
+ }
+ }
+ mutex_unlock(&nvdimm_bus_list_mutex);
+
+ return rc;
+}
+
+static int match_dimm(struct device *dev, void *data)
+{
+ long id = (long) data;
+
+ if (is_nvdimm(dev)) {
+ struct nvdimm *nvdimm = to_nvdimm(dev);
+
+ return nvdimm->id == id;
+ }
+
+ return 0;
+}
+
+static long nvdimm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ int rc = -ENXIO, read_only;
+ struct nvdimm_bus *nvdimm_bus;
+
+ read_only = (O_RDWR != (file->f_flags & O_ACCMODE));
+ mutex_lock(&nvdimm_bus_list_mutex);
+ list_for_each_entry(nvdimm_bus, &nvdimm_bus_list, list) {
+ struct device *dev = device_find_child(&nvdimm_bus->dev,
+ file->private_data, match_dimm);
+ struct nvdimm *nvdimm;
+
+ if (!dev)
+ continue;
+
+ nvdimm = to_nvdimm(dev);
+ rc = __nd_ioctl(nvdimm_bus, nvdimm, read_only, cmd, arg);
+ put_device(dev);
+ break;
+ }
+ mutex_unlock(&nvdimm_bus_list_mutex);
+
+ return rc;
+}
+
+static int nd_open(struct inode *inode, struct file *file)
+{
+ long minor = iminor(inode);
+
+ file->private_data = (void *) minor;
+ return 0;
}

static const struct file_operations nvdimm_bus_fops = {
.owner = THIS_MODULE,
- .open = nonseekable_open,
+ .open = nd_open,
.unlocked_ioctl = nd_ioctl,
.compat_ioctl = nd_ioctl,
.llseek = noop_llseek,
};

+static const struct file_operations nvdimm_fops = {
+ .owner = THIS_MODULE,
+ .open = nd_open,
+ .unlocked_ioctl = nvdimm_ioctl,
+ .compat_ioctl = nvdimm_ioctl,
+ .llseek = noop_llseek,
+};
+
int __init nvdimm_bus_init(void)
{
int rc;
@@ -70,9 +380,14 @@ int __init nvdimm_bus_init(void)

rc = register_chrdev(0, "ndctl", &nvdimm_bus_fops);
if (rc < 0)
- goto err_chrdev;
+ goto err_bus_chrdev;
nvdimm_bus_major = rc;

+ rc = register_chrdev(0, "dimmctl", &nvdimm_fops);
+ if (rc < 0)
+ goto err_dimm_chrdev;
+ nvdimm_major = rc;
+
nd_class = class_create(THIS_MODULE, "nd");
if (IS_ERR(nd_class))
goto err_class;
@@ -80,8 +395,10 @@ int __init nvdimm_bus_init(void)
return 0;

err_class:
+ unregister_chrdev(nvdimm_major, "dimmctl");
+ err_dimm_chrdev:
unregister_chrdev(nvdimm_bus_major, "ndctl");
- err_chrdev:
+ err_bus_chrdev:
bus_unregister(&nvdimm_bus_type);

return rc;
@@ -91,5 +408,6 @@ void __exit nvdimm_bus_exit(void)
{
class_destroy(nd_class);
unregister_chrdev(nvdimm_bus_major, "ndctl");
+ unregister_chrdev(nvdimm_major, "dimmctl");
bus_unregister(&nvdimm_bus_type);
}
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index ef957eb37c90..1ce159095c52 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -14,6 +14,7 @@
#include <linux/export.h>
#include <linux/module.h>
#include <linux/device.h>
+#include <linux/ndctl.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include "nd-core.h"
@@ -61,6 +62,20 @@ struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev)
return NULL;
}

+static ssize_t commands_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ int cmd, len = 0;
+ struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+ struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+
+ for_each_set_bit(cmd, &nd_desc->dsm_mask, BITS_PER_LONG)
+ len += sprintf(buf + len, "%s ", nvdimm_bus_cmd_name(cmd));
+ len += sprintf(buf + len, "\n");
+ return len;
+}
+static DEVICE_ATTR_RO(commands);
+
static const char *nvdimm_bus_provider(struct nvdimm_bus *nvdimm_bus)
{
struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
@@ -84,6 +99,7 @@ static ssize_t provider_show(struct device *dev,
static DEVICE_ATTR_RO(provider);

static struct attribute *nvdimm_bus_attributes[] = {
+ &dev_attr_commands.attr,
&dev_attr_provider.attr,
NULL,
};
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index 51ea52cc2079..c3dd7227d1bb 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -12,6 +12,7 @@
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/device.h>
+#include <linux/ndctl.h>
#include <linux/slab.h>
#include <linux/io.h>
#include <linux/fs.h>
@@ -33,7 +34,7 @@ static struct device_type nvdimm_device_type = {
.release = nvdimm_release,
};

-static bool is_nvdimm(struct device *dev)
+bool is_nvdimm(struct device *dev)
{
return dev->type == &nvdimm_device_type;
}
@@ -55,12 +56,41 @@ EXPORT_SYMBOL_GPL(nvdimm_name);

void *nvdimm_provider_data(struct nvdimm *nvdimm)
{
- return nvdimm->provider_data;
+ if (nvdimm)
+ return nvdimm->provider_data;
+ return NULL;
}
EXPORT_SYMBOL_GPL(nvdimm_provider_data);

+static ssize_t commands_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm *nvdimm = to_nvdimm(dev);
+ int cmd, len = 0;
+
+ if (!nvdimm->dsm_mask)
+ return sprintf(buf, "\n");
+
+ for_each_set_bit(cmd, nvdimm->dsm_mask, BITS_PER_LONG)
+ len += sprintf(buf + len, "%s ", nvdimm_cmd_name(cmd));
+ len += sprintf(buf + len, "\n");
+ return len;
+}
+static DEVICE_ATTR_RO(commands);
+
+static struct attribute *nvdimm_attributes[] = {
+ &dev_attr_commands.attr,
+ NULL,
+};
+
+struct attribute_group nvdimm_attribute_group = {
+ .attrs = nvdimm_attributes,
+};
+EXPORT_SYMBOL_GPL(nvdimm_attribute_group);
+
struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
- const struct attribute_group **groups, unsigned long flags)
+ const struct attribute_group **groups, unsigned long flags,
+ unsigned long *dsm_mask)
{
struct nvdimm *nvdimm = kzalloc(sizeof(*nvdimm), GFP_KERNEL);
struct device *dev;
@@ -75,12 +105,14 @@ struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
}
nvdimm->provider_data = provider_data;
nvdimm->flags = flags;
+ nvdimm->dsm_mask = dsm_mask;

dev = &nvdimm->dev;
dev_set_name(dev, "nmem%d", nvdimm->id);
dev->parent = &nvdimm_bus->dev;
dev->type = &nvdimm_device_type;
dev->bus = &nvdimm_bus_type;
+ dev->devt = MKDEV(nvdimm_major, nvdimm->id);
dev->groups = groups;
if (device_register(dev) != 0) {
put_device(dev);
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 9b8303413b60..59528b3c9de8 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -18,6 +18,7 @@
extern struct list_head nvdimm_bus_list;
extern struct mutex nvdimm_bus_list_mutex;
extern struct bus_type nvdimm_bus_type;
+extern int nvdimm_major;

struct nvdimm_bus {
struct nvdimm_bus_descriptor *nd_desc;
@@ -29,10 +30,12 @@ struct nvdimm_bus {
struct nvdimm {
unsigned long flags;
void *provider_data;
+ unsigned long *dsm_mask;
struct device dev;
int id;
};

+bool is_nvdimm(struct device *dev);
struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev);
int __init nvdimm_bus_init(void);
void __exit nvdimm_bus_exit(void);
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 07787f0dd7de..a39235819af3 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -14,13 +14,22 @@
*/
#ifndef __LIBNVDIMM_H__
#define __LIBNVDIMM_H__
+#include <linux/sizes.h>
+#include <linux/types.h>

enum {
/* when a dimm supports both PMEM and BLK access a label is required */
NDD_ALIASING = 1 << 0,
+
+ /* need to set a limit somewhere, but yes, this is likely overkill */
+ ND_IOCTL_MAX_BUFLEN = SZ_4M,
+ ND_CMD_MAX_ELEM = 4,
+ ND_CMD_MAX_ENVELOPE = 16,
+ ND_CMD_ARS_STATUS_MAX = SZ_4K,
};

extern struct attribute_group nvdimm_bus_attribute_group;
+extern struct attribute_group nvdimm_attribute_group;

struct nvdimm;
struct nvdimm_bus_descriptor;
@@ -35,6 +44,14 @@ struct nvdimm_bus_descriptor {
ndctl_fn ndctl;
};

+struct nd_cmd_desc {
+ int in_num;
+ int out_num;
+ u32 in_sizes[ND_CMD_MAX_ELEM];
+ int out_sizes[ND_CMD_MAX_ELEM];
+};
+
+struct nvdimm_bus;
struct device;
struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
struct nvdimm_bus_descriptor *nfit_desc);
@@ -45,5 +62,13 @@ struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus);
const char *nvdimm_name(struct nvdimm *nvdimm);
void *nvdimm_provider_data(struct nvdimm *nvdimm);
struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
- const struct attribute_group **groups, unsigned long flags);
+ const struct attribute_group **groups, unsigned long flags,
+ unsigned long *dsm_mask);
+const struct nd_cmd_desc *nd_cmd_dimm_desc(int cmd);
+const struct nd_cmd_desc *nd_cmd_bus_desc(int cmd);
+u32 nd_cmd_in_size(struct nvdimm *nvdimm, int cmd,
+ const struct nd_cmd_desc *desc, int idx, void *buf);
+u32 nd_cmd_out_size(struct nvdimm *nvdimm, int cmd,
+ const struct nd_cmd_desc *desc, int idx, const u32 *in_field,
+ const u32 *out_field);
#endif /* __LIBNVDIMM_H__ */
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 1a0006a76b00..200cc5ea2998 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -271,6 +271,7 @@ header-y += ncp_fs.h
header-y += ncp.h
header-y += ncp_mount.h
header-y += ncp_no.h
+header-y += ndctl.h
header-y += neighbour.h
header-y += netconf.h
header-y += netdevice.h
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
new file mode 100644
index 000000000000..ff13c23b26df
--- /dev/null
+++ b/include/uapi/linux/ndctl.h
@@ -0,0 +1,178 @@
+/*
+ * Copyright (c) 2014-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU Lesser General Public License,
+ * version 2.1, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT ANY
+ * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ * FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for
+ * more details.
+ */
+#ifndef __NDCTL_H__
+#define __NDCTL_H__
+
+#include <linux/types.h>
+
+struct nd_cmd_smart {
+ __u32 status;
+ __u8 data[128];
+} __packed;
+
+struct nd_cmd_smart_threshold {
+ __u32 status;
+ __u8 data[8];
+} __packed;
+
+struct nd_cmd_dimm_flags {
+ __u32 status;
+ __u32 flags;
+} __packed;
+
+struct nd_cmd_get_config_size {
+ __u32 status;
+ __u32 config_size;
+ __u32 max_xfer;
+} __packed;
+
+struct nd_cmd_get_config_data_hdr {
+ __u32 in_offset;
+ __u32 in_length;
+ __u32 status;
+ __u8 out_buf[0];
+} __packed;
+
+struct nd_cmd_set_config_hdr {
+ __u32 in_offset;
+ __u32 in_length;
+ __u8 in_buf[0];
+} __packed;
+
+struct nd_cmd_vendor_hdr {
+ __u32 opcode;
+ __u32 in_length;
+ __u8 in_buf[0];
+} __packed;
+
+struct nd_cmd_vendor_tail {
+ __u32 status;
+ __u32 out_length;
+ __u8 out_buf[0];
+} __packed;
+
+struct nd_cmd_ars_cap {
+ __u64 address;
+ __u64 length;
+ __u32 status;
+ __u32 max_ars_out;
+} __packed;
+
+struct nd_cmd_ars_start {
+ __u64 address;
+ __u64 length;
+ __u16 type;
+ __u8 reserved[6];
+ __u32 status;
+} __packed;
+
+struct nd_cmd_ars_status {
+ __u32 status;
+ __u32 out_length;
+ __u64 address;
+ __u64 length;
+ __u16 type;
+ __u32 num_records;
+ struct nd_ars_record {
+ __u32 handle;
+ __u32 flags;
+ __u64 err_address;
+ __u64 mask;
+ } __packed records[0];
+} __packed;
+
+enum {
+ ND_CMD_IMPLEMENTED = 0,
+
+ /* bus commands */
+ ND_CMD_ARS_CAP = 1,
+ ND_CMD_ARS_START = 2,
+ ND_CMD_ARS_STATUS = 3,
+
+ /* per-dimm commands */
+ ND_CMD_SMART = 1,
+ ND_CMD_SMART_THRESHOLD = 2,
+ ND_CMD_DIMM_FLAGS = 3,
+ ND_CMD_GET_CONFIG_SIZE = 4,
+ ND_CMD_GET_CONFIG_DATA = 5,
+ ND_CMD_SET_CONFIG_DATA = 6,
+ ND_CMD_VENDOR_EFFECT_LOG_SIZE = 7,
+ ND_CMD_VENDOR_EFFECT_LOG = 8,
+ ND_CMD_VENDOR = 9,
+};
+
+static inline const char *nvdimm_bus_cmd_name(unsigned cmd)
+{
+ static const char * const names[] = {
+ [ND_CMD_ARS_CAP] = "ars_cap",
+ [ND_CMD_ARS_START] = "ars_start",
+ [ND_CMD_ARS_STATUS] = "ars_status",
+ };
+
+ if (cmd < ARRAY_SIZE(names) && names[cmd])
+ return names[cmd];
+ return "unknown";
+}
+
+static inline const char *nvdimm_cmd_name(unsigned cmd)
+{
+ static const char * const names[] = {
+ [ND_CMD_SMART] = "smart",
+ [ND_CMD_SMART_THRESHOLD] = "smart_thresh",
+ [ND_CMD_DIMM_FLAGS] = "flags",
+ [ND_CMD_GET_CONFIG_SIZE] = "get_size",
+ [ND_CMD_GET_CONFIG_DATA] = "get_data",
+ [ND_CMD_SET_CONFIG_DATA] = "set_data",
+ [ND_CMD_VENDOR_EFFECT_LOG_SIZE] = "effect_size",
+ [ND_CMD_VENDOR_EFFECT_LOG] = "effect_log",
+ [ND_CMD_VENDOR] = "vendor",
+ };
+
+ if (cmd < ARRAY_SIZE(names) && names[cmd])
+ return names[cmd];
+ return "unknown";
+}
+
+#define ND_IOCTL 'N'
+
+#define ND_IOCTL_SMART _IOWR(ND_IOCTL, ND_CMD_SMART,\
+ struct nd_cmd_smart)
+
+#define ND_IOCTL_SMART_THRESHOLD _IOWR(ND_IOCTL, ND_CMD_SMART_THRESHOLD,\
+ struct nd_cmd_smart_threshold)
+
+#define ND_IOCTL_DIMM_FLAGS _IOWR(ND_IOCTL, ND_CMD_DIMM_FLAGS,\
+ struct nd_cmd_dimm_flags)
+
+#define ND_IOCTL_GET_CONFIG_SIZE _IOWR(ND_IOCTL, ND_CMD_GET_CONFIG_SIZE,\
+ struct nd_cmd_get_config_size)
+
+#define ND_IOCTL_GET_CONFIG_DATA _IOWR(ND_IOCTL, ND_CMD_GET_CONFIG_DATA,\
+ struct nd_cmd_get_config_data_hdr)
+
+#define ND_IOCTL_SET_CONFIG_DATA _IOWR(ND_IOCTL, ND_CMD_SET_CONFIG_DATA,\
+ struct nd_cmd_set_config_hdr)
+
+#define ND_IOCTL_VENDOR _IOWR(ND_IOCTL, ND_CMD_VENDOR,\
+ struct nd_cmd_vendor_hdr)
+
+#define ND_IOCTL_ARS_CAP _IOWR(ND_IOCTL, ND_CMD_ARS_CAP,\
+ struct nd_cmd_ars_cap)
+
+#define ND_IOCTL_ARS_START _IOWR(ND_IOCTL, ND_CMD_ARS_START,\
+ struct nd_cmd_ars_start)
+
+#define ND_IOCTL_ARS_STATUS _IOWR(ND_IOCTL, ND_CMD_ARS_STATUS,\
+ struct nd_cmd_ars_status)
+
+#endif /* __NDCTL_H__ */

2015-06-17 23:19:44

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 06/16] libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure

* Implement the device-model infrastructure for loading modules and
attaching drivers to nvdimm devices. This is a simple association of a
nd-device-type number with a driver that has a bitmask of supported
device types. To facilitate userspace bind/unbind operations 'modalias'
and 'devtype', that also appear in the uevent, are added as generic
sysfs attributes for all nvdimm devices. The reason for the device-type
number is to support sub-types within a given parent devtype, be it a
vendor-specific sub-type or otherwise.

* The first consumer of this infrastructure is the driver
for dimm devices. It simply uses control messages to retrieve and
store the configuration-data image (label set) from each dimm.

Note: nd_device_register() arranges for asynchronous registration of
nvdimm bus devices by default.

Cc: Greg KH <[email protected]>
Cc: Neil Brown <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit.c | 13 +++
drivers/nvdimm/Makefile | 1
drivers/nvdimm/bus.c | 168 +++++++++++++++++++++++++++++++++++++++++++-
drivers/nvdimm/core.c | 43 +++++++++++
drivers/nvdimm/dimm.c | 92 ++++++++++++++++++++++++
drivers/nvdimm/dimm_devs.c | 136 ++++++++++++++++++++++++++++++++++--
drivers/nvdimm/nd-core.h | 6 +-
drivers/nvdimm/nd.h | 36 +++++++++
include/linux/libnvdimm.h | 2 +
include/linux/nd.h | 39 ++++++++++
include/uapi/linux/ndctl.h | 6 ++
11 files changed, 527 insertions(+), 15 deletions(-)
create mode 100644 drivers/nvdimm/dimm.c
create mode 100644 drivers/nvdimm/nd.h
create mode 100644 include/linux/nd.h

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 9112a6210a4b..c4ccec1bc60b 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -18,6 +18,10 @@
#include <linux/acpi.h>
#include "nfit.h"

+static bool force_enable_dimms;
+module_param(force_enable_dimms, bool, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(force_enable_dimms, "Ignore _STA (ACPI DIMM device) status");
+
static u8 nfit_uuid[NFIT_UUID_MAX][16];

static const u8 *to_nfit_uuid(enum nfit_uuids id)
@@ -633,6 +637,7 @@ static struct attribute_group acpi_nfit_dimm_attribute_group = {

static const struct attribute_group *acpi_nfit_dimm_attribute_groups[] = {
&nvdimm_attribute_group,
+ &nd_device_attribute_group,
&acpi_nfit_dimm_attribute_group,
NULL,
};
@@ -669,7 +674,7 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
if (!adev_dimm) {
dev_err(dev, "no ACPI.NFIT device with _ADR %#x, disabling...\n",
device_handle);
- return -ENODEV;
+ return force_enable_dimms ? 0 : -ENODEV;
}

status = acpi_evaluate_integer(adev_dimm->handle, "_STA", NULL, &sta);
@@ -690,12 +695,13 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
if (acpi_check_dsm(adev_dimm->handle, uuid, 1, 1ULL << i))
set_bit(i, &nfit_mem->dsm_mask);

- return rc;
+ return force_enable_dimms ? 0 : rc;
}

static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
{
struct nfit_mem *nfit_mem;
+ int dimm_count = 0;

list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
struct nvdimm *nvdimm;
@@ -729,9 +735,10 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
return -ENOMEM;

nfit_mem->nvdimm = nvdimm;
+ dimm_count++;
}

- return 0;
+ return nvdimm_bus_check_dimm_count(acpi_desc->nvdimm_bus, dimm_count);
}

static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc)
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 5b68738ba406..d44b5c1fcd3b 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
libnvdimm-y := core.o
libnvdimm-y += bus.o
libnvdimm-y += dimm_devs.o
+libnvdimm-y += dimm.o
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 15f3a3ddc225..a0308f1872bf 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -16,19 +16,183 @@
#include <linux/fcntl.h>
#include <linux/async.h>
#include <linux/ndctl.h>
+#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/io.h>
#include <linux/mm.h>
+#include <linux/nd.h>
#include "nd-core.h"
+#include "nd.h"

int nvdimm_major;
static int nvdimm_bus_major;
static struct class *nd_class;

-struct bus_type nvdimm_bus_type = {
+static int to_nd_device_type(struct device *dev)
+{
+ if (is_nvdimm(dev))
+ return ND_DEVICE_DIMM;
+
+ return 0;
+}
+
+static int nvdimm_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
+{
+ return add_uevent_var(env, "MODALIAS=" ND_DEVICE_MODALIAS_FMT,
+ to_nd_device_type(dev));
+}
+
+static int nvdimm_bus_match(struct device *dev, struct device_driver *drv)
+{
+ struct nd_device_driver *nd_drv = to_nd_device_driver(drv);
+
+ return test_bit(to_nd_device_type(dev), &nd_drv->type);
+}
+
+static int nvdimm_bus_probe(struct device *dev)
+{
+ struct nd_device_driver *nd_drv = to_nd_device_driver(dev->driver);
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+ int rc;
+
+ rc = nd_drv->probe(dev);
+ dev_dbg(&nvdimm_bus->dev, "%s.probe(%s) = %d\n", dev->driver->name,
+ dev_name(dev), rc);
+ return rc;
+}
+
+static int nvdimm_bus_remove(struct device *dev)
+{
+ struct nd_device_driver *nd_drv = to_nd_device_driver(dev->driver);
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+ int rc;
+
+ rc = nd_drv->remove(dev);
+ dev_dbg(&nvdimm_bus->dev, "%s.remove(%s) = %d\n", dev->driver->name,
+ dev_name(dev), rc);
+ return rc;
+}
+
+static struct bus_type nvdimm_bus_type = {
.name = "nd",
+ .uevent = nvdimm_bus_uevent,
+ .match = nvdimm_bus_match,
+ .probe = nvdimm_bus_probe,
+ .remove = nvdimm_bus_remove,
+};
+
+static ASYNC_DOMAIN_EXCLUSIVE(nd_async_domain);
+
+void nd_synchronize(void)
+{
+ async_synchronize_full_domain(&nd_async_domain);
+}
+EXPORT_SYMBOL_GPL(nd_synchronize);
+
+static void nd_async_device_register(void *d, async_cookie_t cookie)
+{
+ struct device *dev = d;
+
+ if (device_add(dev) != 0) {
+ dev_err(dev, "%s: failed\n", __func__);
+ put_device(dev);
+ }
+ put_device(dev);
+}
+
+static void nd_async_device_unregister(void *d, async_cookie_t cookie)
+{
+ struct device *dev = d;
+
+ device_unregister(dev);
+ put_device(dev);
+}
+
+void nd_device_register(struct device *dev)
+{
+ dev->bus = &nvdimm_bus_type;
+ device_initialize(dev);
+ get_device(dev);
+ async_schedule_domain(nd_async_device_register, dev,
+ &nd_async_domain);
+}
+EXPORT_SYMBOL(nd_device_register);
+
+void nd_device_unregister(struct device *dev, enum nd_async_mode mode)
+{
+ switch (mode) {
+ case ND_ASYNC:
+ get_device(dev);
+ async_schedule_domain(nd_async_device_unregister, dev,
+ &nd_async_domain);
+ break;
+ case ND_SYNC:
+ nd_synchronize();
+ device_unregister(dev);
+ break;
+ }
+}
+EXPORT_SYMBOL(nd_device_unregister);
+
+/**
+ * __nd_driver_register() - register a region or a namespace driver
+ * @nd_drv: driver to register
+ * @owner: automatically set by nd_driver_register() macro
+ * @mod_name: automatically set by nd_driver_register() macro
+ */
+int __nd_driver_register(struct nd_device_driver *nd_drv, struct module *owner,
+ const char *mod_name)
+{
+ struct device_driver *drv = &nd_drv->drv;
+
+ if (!nd_drv->type) {
+ pr_debug("driver type bitmask not set (%pf)\n",
+ __builtin_return_address(0));
+ return -EINVAL;
+ }
+
+ if (!nd_drv->probe || !nd_drv->remove) {
+ pr_debug("->probe() and ->remove() must be specified\n");
+ return -EINVAL;
+ }
+
+ drv->bus = &nvdimm_bus_type;
+ drv->owner = owner;
+ drv->mod_name = mod_name;
+
+ return driver_register(drv);
+}
+EXPORT_SYMBOL(__nd_driver_register);
+
+static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, ND_DEVICE_MODALIAS_FMT "\n",
+ to_nd_device_type(dev));
+}
+static DEVICE_ATTR_RO(modalias);
+
+static ssize_t devtype_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, "%s\n", dev->type->name);
+}
+static DEVICE_ATTR_RO(devtype);
+
+static struct attribute *nd_device_attributes[] = {
+ &dev_attr_modalias.attr,
+ &dev_attr_devtype.attr,
+ NULL,
+};
+
+/**
+ * nd_device_attribute_group - generic attributes for all devices on an nd bus
+ */
+struct attribute_group nd_device_attribute_group = {
+ .attrs = nd_device_attributes,
};
+EXPORT_SYMBOL_GPL(nd_device_attribute_group);

int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus)
{
@@ -404,7 +568,7 @@ int __init nvdimm_bus_init(void)
return rc;
}

-void __exit nvdimm_bus_exit(void)
+void nvdimm_bus_exit(void)
{
class_destroy(nd_class);
unregister_chrdev(nvdimm_bus_major, "ndctl");
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index 1ce159095c52..50ab880f0dc0 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -18,6 +18,7 @@
#include <linux/mutex.h>
#include <linux/slab.h>
#include "nd-core.h"
+#include "nd.h"

LIST_HEAD(nvdimm_bus_list);
DEFINE_MUTEX(nvdimm_bus_list_mutex);
@@ -98,8 +99,33 @@ static ssize_t provider_show(struct device *dev,
}
static DEVICE_ATTR_RO(provider);

+static int flush_namespaces(struct device *dev, void *data)
+{
+ device_lock(dev);
+ device_unlock(dev);
+ return 0;
+}
+
+static int flush_regions_dimms(struct device *dev, void *data)
+{
+ device_lock(dev);
+ device_unlock(dev);
+ device_for_each_child(dev, NULL, flush_namespaces);
+ return 0;
+}
+
+static ssize_t wait_probe_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ nd_synchronize();
+ device_for_each_child(dev, NULL, flush_regions_dimms);
+ return sprintf(buf, "1\n");
+}
+static DEVICE_ATTR_RO(wait_probe);
+
static struct attribute *nvdimm_bus_attributes[] = {
&dev_attr_commands.attr,
+ &dev_attr_wait_probe.attr,
&dev_attr_provider.attr,
NULL,
};
@@ -161,7 +187,7 @@ static int child_unregister(struct device *dev, void *data)
if (dev->class)
/* pass */;
else
- device_unregister(dev);
+ nd_device_unregister(dev, ND_SYNC);
return 0;
}

@@ -174,6 +200,7 @@ void nvdimm_bus_unregister(struct nvdimm_bus *nvdimm_bus)
list_del_init(&nvdimm_bus->list);
mutex_unlock(&nvdimm_bus_list_mutex);

+ nd_synchronize();
device_for_each_child(&nvdimm_bus->dev, NULL, child_unregister);
nvdimm_bus_destroy_ndctl(nvdimm_bus);

@@ -183,12 +210,24 @@ EXPORT_SYMBOL_GPL(nvdimm_bus_unregister);

static __init int libnvdimm_init(void)
{
- return nvdimm_bus_init();
+ int rc;
+
+ rc = nvdimm_bus_init();
+ if (rc)
+ return rc;
+ rc = nvdimm_init();
+ if (rc)
+ goto err_dimm;
+ return 0;
+ err_dimm:
+ nvdimm_bus_exit();
+ return rc;
}

static __exit void libnvdimm_exit(void)
{
WARN_ON(!list_empty(&nvdimm_bus_list));
+ nvdimm_exit();
nvdimm_bus_exit();
}

diff --git a/drivers/nvdimm/dimm.c b/drivers/nvdimm/dimm.c
new file mode 100644
index 000000000000..28001a6ccd4e
--- /dev/null
+++ b/drivers/nvdimm/dimm.c
@@ -0,0 +1,92 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/vmalloc.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/sizes.h>
+#include <linux/ndctl.h>
+#include <linux/slab.h>
+#include <linux/mm.h>
+#include <linux/nd.h>
+#include "nd.h"
+
+static void free_data(struct nvdimm_drvdata *ndd)
+{
+ if (!ndd)
+ return;
+
+ if (ndd->data && is_vmalloc_addr(ndd->data))
+ vfree(ndd->data);
+ else
+ kfree(ndd->data);
+ kfree(ndd);
+}
+
+static int nvdimm_probe(struct device *dev)
+{
+ struct nvdimm_drvdata *ndd;
+ int rc;
+
+ ndd = kzalloc(sizeof(*ndd), GFP_KERNEL);
+ if (!ndd)
+ return -ENOMEM;
+
+ dev_set_drvdata(dev, ndd);
+ ndd->dev = dev;
+
+ rc = nvdimm_init_nsarea(ndd);
+ if (rc)
+ goto err;
+
+ rc = nvdimm_init_config_data(ndd);
+ if (rc)
+ goto err;
+
+ dev_dbg(dev, "config data size: %d\n", ndd->nsarea.config_size);
+
+ return 0;
+
+ err:
+ free_data(ndd);
+ return rc;
+}
+
+static int nvdimm_remove(struct device *dev)
+{
+ struct nvdimm_drvdata *ndd = dev_get_drvdata(dev);
+
+ free_data(ndd);
+
+ return 0;
+}
+
+static struct nd_device_driver nvdimm_driver = {
+ .probe = nvdimm_probe,
+ .remove = nvdimm_remove,
+ .drv = {
+ .name = "nvdimm",
+ },
+ .type = ND_DRIVER_DIMM,
+};
+
+int __init nvdimm_init(void)
+{
+ return nd_driver_register(&nvdimm_driver);
+}
+
+void __exit nvdimm_exit(void)
+{
+ driver_unregister(&nvdimm_driver.drv);
+}
+
+MODULE_ALIAS_ND_DEVICE(ND_DEVICE_DIMM);
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index c3dd7227d1bb..b3ae86f2e1da 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -11,6 +11,7 @@
* General Public License for more details.
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/vmalloc.h>
#include <linux/device.h>
#include <linux/ndctl.h>
#include <linux/slab.h>
@@ -18,9 +19,115 @@
#include <linux/fs.h>
#include <linux/mm.h>
#include "nd-core.h"
+#include "nd.h"

static DEFINE_IDA(dimm_ida);

+/*
+ * Retrieve bus and dimm handle and return if this bus supports
+ * get_config_data commands
+ */
+static int __validate_dimm(struct nvdimm_drvdata *ndd)
+{
+ struct nvdimm *nvdimm;
+
+ if (!ndd)
+ return -EINVAL;
+
+ nvdimm = to_nvdimm(ndd->dev);
+
+ if (!nvdimm->dsm_mask)
+ return -ENXIO;
+ if (!test_bit(ND_CMD_GET_CONFIG_DATA, nvdimm->dsm_mask))
+ return -ENXIO;
+
+ return 0;
+}
+
+static int validate_dimm(struct nvdimm_drvdata *ndd)
+{
+ int rc = __validate_dimm(ndd);
+
+ if (rc && ndd)
+ dev_dbg(ndd->dev, "%pf: %s error: %d\n",
+ __builtin_return_address(0), __func__, rc);
+ return rc;
+}
+
+/**
+ * nvdimm_init_nsarea - determine the geometry of a dimm's namespace area
+ * @nvdimm: dimm to initialize
+ */
+int nvdimm_init_nsarea(struct nvdimm_drvdata *ndd)
+{
+ struct nd_cmd_get_config_size *cmd = &ndd->nsarea;
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(ndd->dev);
+ struct nvdimm_bus_descriptor *nd_desc;
+ int rc = validate_dimm(ndd);
+
+ if (rc)
+ return rc;
+
+ if (cmd->config_size)
+ return 0; /* already valid */
+
+ memset(cmd, 0, sizeof(*cmd));
+ nd_desc = nvdimm_bus->nd_desc;
+ return nd_desc->ndctl(nd_desc, to_nvdimm(ndd->dev),
+ ND_CMD_GET_CONFIG_SIZE, cmd, sizeof(*cmd));
+}
+
+int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
+{
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(ndd->dev);
+ struct nd_cmd_get_config_data_hdr *cmd;
+ struct nvdimm_bus_descriptor *nd_desc;
+ int rc = validate_dimm(ndd);
+ u32 max_cmd_size, config_size;
+ size_t offset;
+
+ if (rc)
+ return rc;
+
+ if (ndd->data)
+ return 0;
+
+ if (ndd->nsarea.status || ndd->nsarea.max_xfer == 0)
+ return -ENXIO;
+
+ ndd->data = kmalloc(ndd->nsarea.config_size, GFP_KERNEL);
+ if (!ndd->data)
+ ndd->data = vmalloc(ndd->nsarea.config_size);
+
+ if (!ndd->data)
+ return -ENOMEM;
+
+ max_cmd_size = min_t(u32, PAGE_SIZE, ndd->nsarea.max_xfer);
+ cmd = kzalloc(max_cmd_size + sizeof(*cmd), GFP_KERNEL);
+ if (!cmd)
+ return -ENOMEM;
+
+ nd_desc = nvdimm_bus->nd_desc;
+ for (config_size = ndd->nsarea.config_size, offset = 0;
+ config_size; config_size -= cmd->in_length,
+ offset += cmd->in_length) {
+ cmd->in_length = min(config_size, max_cmd_size);
+ cmd->in_offset = offset;
+ rc = nd_desc->ndctl(nd_desc, to_nvdimm(ndd->dev),
+ ND_CMD_GET_CONFIG_DATA, cmd,
+ cmd->in_length + sizeof(*cmd));
+ if (rc || cmd->status) {
+ rc = -ENXIO;
+ break;
+ }
+ memcpy(ndd->data + offset, cmd->out_buf, cmd->in_length);
+ }
+ dev_dbg(ndd->dev, "%s: len: %zu rc: %d\n", __func__, offset, rc);
+ kfree(cmd);
+
+ return rc;
+}
+
static void nvdimm_release(struct device *dev)
{
struct nvdimm *nvdimm = to_nvdimm(dev);
@@ -111,14 +218,33 @@ struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
dev_set_name(dev, "nmem%d", nvdimm->id);
dev->parent = &nvdimm_bus->dev;
dev->type = &nvdimm_device_type;
- dev->bus = &nvdimm_bus_type;
dev->devt = MKDEV(nvdimm_major, nvdimm->id);
dev->groups = groups;
- if (device_register(dev) != 0) {
- put_device(dev);
- return NULL;
- }
+ nd_device_register(dev);

return nvdimm;
}
EXPORT_SYMBOL_GPL(nvdimm_create);
+
+static int count_dimms(struct device *dev, void *c)
+{
+ int *count = c;
+
+ if (is_nvdimm(dev))
+ (*count)++;
+ return 0;
+}
+
+int nvdimm_bus_check_dimm_count(struct nvdimm_bus *nvdimm_bus, int dimm_count)
+{
+ int count = 0;
+ /* Flush any possible dimm registration failures */
+ nd_synchronize();
+
+ device_for_each_child(&nvdimm_bus->dev, &count, count_dimms);
+ dev_dbg(&nvdimm_bus->dev, "%s: count: %d\n", __func__, count);
+ if (count != dimm_count)
+ return -ENXIO;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(nvdimm_bus_check_dimm_count);
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 59528b3c9de8..f2004b790874 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -17,7 +17,6 @@

extern struct list_head nvdimm_bus_list;
extern struct mutex nvdimm_bus_list_mutex;
-extern struct bus_type nvdimm_bus_type;
extern int nvdimm_major;

struct nvdimm_bus {
@@ -35,10 +34,11 @@ struct nvdimm {
int id;
};

-bool is_nvdimm(struct device *dev);
struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev);
int __init nvdimm_bus_init(void);
-void __exit nvdimm_bus_exit(void);
+void nvdimm_bus_exit(void);
int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus);
void nvdimm_bus_destroy_ndctl(struct nvdimm_bus *nvdimm_bus);
+void nd_synchronize(void);
+bool is_nvdimm(struct device *dev);
#endif /* __ND_CORE_H__ */
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
new file mode 100644
index 000000000000..1f7f6ecab0fc
--- /dev/null
+++ b/drivers/nvdimm/nd.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#ifndef __ND_H__
+#define __ND_H__
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/ndctl.h>
+
+struct nvdimm_drvdata {
+ struct device *dev;
+ struct nd_cmd_get_config_size nsarea;
+ void *data;
+};
+
+enum nd_async_mode {
+ ND_SYNC,
+ ND_ASYNC,
+};
+
+void nd_device_register(struct device *dev);
+void nd_device_unregister(struct device *dev, enum nd_async_mode mode);
+int __init nvdimm_init(void);
+void nvdimm_exit(void);
+int nvdimm_init_nsarea(struct nvdimm_drvdata *ndd);
+int nvdimm_init_config_data(struct nvdimm_drvdata *ndd);
+#endif /* __ND_H__ */
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index a39235819af3..d3ebccf4ea8b 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -30,6 +30,7 @@ enum {

extern struct attribute_group nvdimm_bus_attribute_group;
extern struct attribute_group nvdimm_attribute_group;
+extern struct attribute_group nd_device_attribute_group;

struct nvdimm;
struct nvdimm_bus_descriptor;
@@ -71,4 +72,5 @@ u32 nd_cmd_in_size(struct nvdimm *nvdimm, int cmd,
u32 nd_cmd_out_size(struct nvdimm *nvdimm, int cmd,
const struct nd_cmd_desc *desc, int idx, const u32 *in_field,
const u32 *out_field);
+int nvdimm_bus_check_dimm_count(struct nvdimm_bus *nvdimm_bus, int dimm_count);
#endif /* __LIBNVDIMM_H__ */
diff --git a/include/linux/nd.h b/include/linux/nd.h
new file mode 100644
index 000000000000..e074f67e53a3
--- /dev/null
+++ b/include/linux/nd.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#ifndef __LINUX_ND_H__
+#define __LINUX_ND_H__
+#include <linux/ndctl.h>
+#include <linux/device.h>
+
+struct nd_device_driver {
+ struct device_driver drv;
+ unsigned long type;
+ int (*probe)(struct device *dev);
+ int (*remove)(struct device *dev);
+};
+
+static inline struct nd_device_driver *to_nd_device_driver(
+ struct device_driver *drv)
+{
+ return container_of(drv, struct nd_device_driver, drv);
+}
+
+#define MODULE_ALIAS_ND_DEVICE(type) \
+ MODULE_ALIAS("nd:t" __stringify(type) "*")
+#define ND_DEVICE_MODALIAS_FMT "nd:t%d"
+
+int __must_check __nd_driver_register(struct nd_device_driver *nd_drv,
+ struct module *module, const char *mod_name);
+#define nd_driver_register(driver) \
+ __nd_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
+#endif /* __LINUX_ND_H__ */
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index ff13c23b26df..37640916d146 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -175,4 +175,10 @@ static inline const char *nvdimm_cmd_name(unsigned cmd)
#define ND_IOCTL_ARS_STATUS _IOWR(ND_IOCTL, ND_CMD_ARS_STATUS,\
struct nd_cmd_ars_status)

+
+#define ND_DEVICE_DIMM 1 /* nd_dimm: container for "config data" */
+
+enum nd_driver_flags {
+ ND_DRIVER_DIMM = 1 << ND_DEVICE_DIMM,
+};
#endif /* __NDCTL_H__ */

2015-06-17 23:16:47

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 07/16] libnvdimm, nfit: regions (block-data-window, persistent memory, volatile memory)

A "region" device represents the maximum capacity of a BLK range (mmio
block-data-window(s)), or a PMEM range (DAX-capable persistent memory or
volatile memory), without regard for aliasing. Aliasing, in the
dimm-local address space (DPA), is resolved by metadata on a dimm to
designate which exclusive interface will access the aliased DPA ranges.
Support for the per-dimm metadata/label arrvies is in a subsequent
patch.

The name format of "region" devices is "regionN" where, like dimms, N is
a global ida index assigned at discovery time. This id is not reliable
across reboots nor in the presence of hotplug. Look to attributes of
the region or static id-data of the sub-namespace to generate a
persistent name. However, if the platform configuration does not change
it is reasonable to expect the same region id to be assigned at the next
boot.

"region"s have 2 generic attributes "size", and "mapping"s where:
- size: the BLK accessible capacity or the span of the
system physical address range in the case of PMEM.

- mappingN: a tuple describing a dimm's contribution to the region's
capacity in the format (<nmemX>,<dpa>,<size>). For a PMEM-region
there will be at least one mapping per dimm in the interleave set. For
a BLK-region there is only "mapping0" listing the starting DPA of the
BLK-region and the available DPA capacity of that space (matches "size"
above).

The max number of mappings per "region" is hard coded per the
constraints of sysfs attribute groups. That said the number of mappings
per region should never exceed the maximum number of possible dimms in
the system. If the current number turns out to not be enough then the
"mappings" attribute clarifies how many there are supposed to be. "32
should be enough for anybody...".

Cc: Neil Brown <[email protected]>
Cc: <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Robert Moore <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit.c | 148 +++++++++++++++++++++
drivers/nvdimm/Makefile | 1
drivers/nvdimm/nd-core.h | 3
drivers/nvdimm/nd.h | 11 ++
drivers/nvdimm/region_devs.c | 297 ++++++++++++++++++++++++++++++++++++++++++
include/linux/libnvdimm.h | 25 ++++
6 files changed, 484 insertions(+), 1 deletion(-)
create mode 100644 drivers/nvdimm/region_devs.c

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index c4ccec1bc60b..068f69d70c9e 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -757,11 +757,153 @@ static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc)
set_bit(i, &nd_desc->dsm_mask);
}

+static ssize_t range_index_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+ struct nfit_spa *nfit_spa = nd_region_provider_data(nd_region);
+
+ return sprintf(buf, "%d\n", nfit_spa->spa->range_index);
+}
+static DEVICE_ATTR_RO(range_index);
+
+static struct attribute *acpi_nfit_region_attributes[] = {
+ &dev_attr_range_index.attr,
+ NULL,
+};
+
+static struct attribute_group acpi_nfit_region_attribute_group = {
+ .name = "nfit",
+ .attrs = acpi_nfit_region_attributes,
+};
+
+static const struct attribute_group *acpi_nfit_region_attribute_groups[] = {
+ &nd_region_attribute_group,
+ &nd_mapping_attribute_group,
+ &acpi_nfit_region_attribute_group,
+ NULL,
+};
+
+static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
+ struct nd_mapping *nd_mapping, struct nd_region_desc *ndr_desc,
+ struct acpi_nfit_memory_map *memdev,
+ struct acpi_nfit_system_address *spa)
+{
+ struct nvdimm *nvdimm = acpi_nfit_dimm_by_handle(acpi_desc,
+ memdev->device_handle);
+ struct nfit_mem *nfit_mem;
+ int blk_valid = 0;
+
+ if (!nvdimm) {
+ dev_err(acpi_desc->dev, "spa%d dimm: %#x not found\n",
+ spa->range_index, memdev->device_handle);
+ return -ENODEV;
+ }
+
+ nd_mapping->nvdimm = nvdimm;
+ switch (nfit_spa_type(spa)) {
+ case NFIT_SPA_PM:
+ case NFIT_SPA_VOLATILE:
+ nd_mapping->start = memdev->address;
+ nd_mapping->size = memdev->region_size;
+ break;
+ case NFIT_SPA_DCR:
+ nfit_mem = nvdimm_provider_data(nvdimm);
+ if (!nfit_mem || !nfit_mem->bdw) {
+ dev_dbg(acpi_desc->dev, "spa%d %s missing bdw\n",
+ spa->range_index, nvdimm_name(nvdimm));
+ } else {
+ nd_mapping->size = nfit_mem->bdw->capacity;
+ nd_mapping->start = nfit_mem->bdw->start_address;
+ blk_valid = 1;
+ }
+
+ ndr_desc->nd_mapping = nd_mapping;
+ ndr_desc->num_mappings = blk_valid;
+ if (!nvdimm_blk_region_create(acpi_desc->nvdimm_bus, ndr_desc))
+ return -ENOMEM;
+ break;
+ }
+
+ return 0;
+}
+
+static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
+ struct nfit_spa *nfit_spa)
+{
+ static struct nd_mapping nd_mappings[ND_MAX_MAPPINGS];
+ struct acpi_nfit_system_address *spa = nfit_spa->spa;
+ struct nfit_memdev *nfit_memdev;
+ struct nd_region_desc ndr_desc;
+ struct nvdimm_bus *nvdimm_bus;
+ struct resource res;
+ int count = 0;
+
+ if (spa->range_index == 0) {
+ dev_dbg(acpi_desc->dev, "%s: detected invalid spa index\n",
+ __func__);
+ return 0;
+ }
+
+ memset(&res, 0, sizeof(res));
+ memset(&nd_mappings, 0, sizeof(nd_mappings));
+ memset(&ndr_desc, 0, sizeof(ndr_desc));
+ res.start = spa->address;
+ res.end = res.start + spa->length - 1;
+ ndr_desc.res = &res;
+ ndr_desc.provider_data = nfit_spa;
+ ndr_desc.attr_groups = acpi_nfit_region_attribute_groups;
+ list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
+ struct acpi_nfit_memory_map *memdev = nfit_memdev->memdev;
+ struct nd_mapping *nd_mapping;
+ int rc;
+
+ if (memdev->range_index != spa->range_index)
+ continue;
+ if (count >= ND_MAX_MAPPINGS) {
+ dev_err(acpi_desc->dev, "spa%d exceeds max mappings %d\n",
+ spa->range_index, ND_MAX_MAPPINGS);
+ return -ENXIO;
+ }
+ nd_mapping = &nd_mappings[count++];
+ rc = acpi_nfit_init_mapping(acpi_desc, nd_mapping, &ndr_desc,
+ memdev, spa);
+ if (rc)
+ return rc;
+ }
+
+ ndr_desc.nd_mapping = nd_mappings;
+ ndr_desc.num_mappings = count;
+ nvdimm_bus = acpi_desc->nvdimm_bus;
+ if (nfit_spa_type(spa) == NFIT_SPA_PM) {
+ if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
+ return -ENOMEM;
+ } else if (nfit_spa_type(spa) == NFIT_SPA_VOLATILE) {
+ if (!nvdimm_volatile_region_create(nvdimm_bus, &ndr_desc))
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+static int acpi_nfit_register_regions(struct acpi_nfit_desc *acpi_desc)
+{
+ struct nfit_spa *nfit_spa;
+
+ list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
+ int rc = acpi_nfit_register_region(acpi_desc, nfit_spa);
+
+ if (rc)
+ return rc;
+ }
+ return 0;
+}
+
static int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
{
struct device *dev = acpi_desc->dev;
const void *end;
u8 *data;
+ int rc;

INIT_LIST_HEAD(&acpi_desc->spas);
INIT_LIST_HEAD(&acpi_desc->dcrs);
@@ -786,7 +928,11 @@ static int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)

acpi_nfit_init_dsms(acpi_desc);

- return acpi_nfit_register_dimms(acpi_desc);
+ rc = acpi_nfit_register_dimms(acpi_desc);
+ if (rc)
+ return rc;
+
+ return acpi_nfit_register_regions(acpi_desc);
}

static int acpi_nfit_add(struct acpi_device *adev)
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index d44b5c1fcd3b..88afd0d849c3 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -4,3 +4,4 @@ libnvdimm-y := core.o
libnvdimm-y += bus.o
libnvdimm-y += dimm_devs.o
libnvdimm-y += dimm.o
+libnvdimm-y += region_devs.o
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index f2004b790874..1d760bf24857 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -40,5 +40,8 @@ void nvdimm_bus_exit(void);
int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus);
void nvdimm_bus_destroy_ndctl(struct nvdimm_bus *nvdimm_bus);
void nd_synchronize(void);
+int nvdimm_bus_register_dimms(struct nvdimm_bus *nvdimm_bus);
+int nvdimm_bus_register_regions(struct nvdimm_bus *nvdimm_bus);
+int nd_match_dimm(struct device *dev, void *data);
bool is_nvdimm(struct device *dev);
#endif /* __ND_CORE_H__ */
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 1f7f6ecab0fc..ea0cca337aa6 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -12,6 +12,7 @@
*/
#ifndef __ND_H__
#define __ND_H__
+#include <linux/libnvdimm.h>
#include <linux/device.h>
#include <linux/mutex.h>
#include <linux/ndctl.h>
@@ -22,6 +23,16 @@ struct nvdimm_drvdata {
void *data;
};

+struct nd_region {
+ struct device dev;
+ u16 ndr_mappings;
+ u64 ndr_size;
+ u64 ndr_start;
+ int id;
+ void *provider_data;
+ struct nd_mapping mapping[0];
+};
+
enum nd_async_mode {
ND_SYNC,
ND_ASYNC,
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
new file mode 100644
index 000000000000..4bda2e0df8f7
--- /dev/null
+++ b/drivers/nvdimm/region_devs.c
@@ -0,0 +1,297 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/slab.h>
+#include <linux/io.h>
+#include "nd-core.h"
+#include "nd.h"
+
+static DEFINE_IDA(region_ida);
+
+static void nd_region_release(struct device *dev)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+ u16 i;
+
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+ put_device(&nvdimm->dev);
+ }
+ ida_simple_remove(&region_ida, nd_region->id);
+ kfree(nd_region);
+}
+
+static struct device_type nd_blk_device_type = {
+ .name = "nd_blk",
+ .release = nd_region_release,
+};
+
+static struct device_type nd_pmem_device_type = {
+ .name = "nd_pmem",
+ .release = nd_region_release,
+};
+
+static struct device_type nd_volatile_device_type = {
+ .name = "nd_volatile",
+ .release = nd_region_release,
+};
+
+static bool is_nd_pmem(struct device *dev)
+{
+ return dev ? dev->type == &nd_pmem_device_type : false;
+}
+
+struct nd_region *to_nd_region(struct device *dev)
+{
+ struct nd_region *nd_region = container_of(dev, struct nd_region, dev);
+
+ WARN_ON(dev->type->release != nd_region_release);
+ return nd_region;
+}
+EXPORT_SYMBOL_GPL(to_nd_region);
+
+static ssize_t size_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+ unsigned long long size = 0;
+
+ if (is_nd_pmem(dev)) {
+ size = nd_region->ndr_size;
+ } else if (nd_region->ndr_mappings == 1) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[0];
+
+ size = nd_mapping->size;
+ }
+
+ return sprintf(buf, "%llu\n", size);
+}
+static DEVICE_ATTR_RO(size);
+
+static ssize_t mappings_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+
+ return sprintf(buf, "%d\n", nd_region->ndr_mappings);
+}
+static DEVICE_ATTR_RO(mappings);
+
+static struct attribute *nd_region_attributes[] = {
+ &dev_attr_size.attr,
+ &dev_attr_mappings.attr,
+ NULL,
+};
+
+struct attribute_group nd_region_attribute_group = {
+ .attrs = nd_region_attributes,
+};
+EXPORT_SYMBOL_GPL(nd_region_attribute_group);
+
+static ssize_t mappingN(struct device *dev, char *buf, int n)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+ struct nd_mapping *nd_mapping;
+ struct nvdimm *nvdimm;
+
+ if (n >= nd_region->ndr_mappings)
+ return -ENXIO;
+ nd_mapping = &nd_region->mapping[n];
+ nvdimm = nd_mapping->nvdimm;
+
+ return sprintf(buf, "%s,%llu,%llu\n", dev_name(&nvdimm->dev),
+ nd_mapping->start, nd_mapping->size);
+}
+
+#define REGION_MAPPING(idx) \
+static ssize_t mapping##idx##_show(struct device *dev, \
+ struct device_attribute *attr, char *buf) \
+{ \
+ return mappingN(dev, buf, idx); \
+} \
+static DEVICE_ATTR_RO(mapping##idx)
+
+/*
+ * 32 should be enough for a while, even in the presence of socket
+ * interleave a 32-way interleave set is a degenerate case.
+ */
+REGION_MAPPING(0);
+REGION_MAPPING(1);
+REGION_MAPPING(2);
+REGION_MAPPING(3);
+REGION_MAPPING(4);
+REGION_MAPPING(5);
+REGION_MAPPING(6);
+REGION_MAPPING(7);
+REGION_MAPPING(8);
+REGION_MAPPING(9);
+REGION_MAPPING(10);
+REGION_MAPPING(11);
+REGION_MAPPING(12);
+REGION_MAPPING(13);
+REGION_MAPPING(14);
+REGION_MAPPING(15);
+REGION_MAPPING(16);
+REGION_MAPPING(17);
+REGION_MAPPING(18);
+REGION_MAPPING(19);
+REGION_MAPPING(20);
+REGION_MAPPING(21);
+REGION_MAPPING(22);
+REGION_MAPPING(23);
+REGION_MAPPING(24);
+REGION_MAPPING(25);
+REGION_MAPPING(26);
+REGION_MAPPING(27);
+REGION_MAPPING(28);
+REGION_MAPPING(29);
+REGION_MAPPING(30);
+REGION_MAPPING(31);
+
+static umode_t mapping_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+ struct device *dev = container_of(kobj, struct device, kobj);
+ struct nd_region *nd_region = to_nd_region(dev);
+
+ if (n < nd_region->ndr_mappings)
+ return a->mode;
+ return 0;
+}
+
+static struct attribute *mapping_attributes[] = {
+ &dev_attr_mapping0.attr,
+ &dev_attr_mapping1.attr,
+ &dev_attr_mapping2.attr,
+ &dev_attr_mapping3.attr,
+ &dev_attr_mapping4.attr,
+ &dev_attr_mapping5.attr,
+ &dev_attr_mapping6.attr,
+ &dev_attr_mapping7.attr,
+ &dev_attr_mapping8.attr,
+ &dev_attr_mapping9.attr,
+ &dev_attr_mapping10.attr,
+ &dev_attr_mapping11.attr,
+ &dev_attr_mapping12.attr,
+ &dev_attr_mapping13.attr,
+ &dev_attr_mapping14.attr,
+ &dev_attr_mapping15.attr,
+ &dev_attr_mapping16.attr,
+ &dev_attr_mapping17.attr,
+ &dev_attr_mapping18.attr,
+ &dev_attr_mapping19.attr,
+ &dev_attr_mapping20.attr,
+ &dev_attr_mapping21.attr,
+ &dev_attr_mapping22.attr,
+ &dev_attr_mapping23.attr,
+ &dev_attr_mapping24.attr,
+ &dev_attr_mapping25.attr,
+ &dev_attr_mapping26.attr,
+ &dev_attr_mapping27.attr,
+ &dev_attr_mapping28.attr,
+ &dev_attr_mapping29.attr,
+ &dev_attr_mapping30.attr,
+ &dev_attr_mapping31.attr,
+ NULL,
+};
+
+struct attribute_group nd_mapping_attribute_group = {
+ .is_visible = mapping_visible,
+ .attrs = mapping_attributes,
+};
+EXPORT_SYMBOL_GPL(nd_mapping_attribute_group);
+
+void *nd_region_provider_data(struct nd_region *nd_region)
+{
+ return nd_region->provider_data;
+}
+EXPORT_SYMBOL_GPL(nd_region_provider_data);
+
+static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
+ struct nd_region_desc *ndr_desc, struct device_type *dev_type,
+ const char *caller)
+{
+ struct nd_region *nd_region;
+ struct device *dev;
+ u16 i;
+
+ for (i = 0; i < ndr_desc->num_mappings; i++) {
+ struct nd_mapping *nd_mapping = &ndr_desc->nd_mapping[i];
+ struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+ if ((nd_mapping->start | nd_mapping->size) % SZ_4K) {
+ dev_err(&nvdimm_bus->dev, "%s: %s mapping%d is not 4K aligned\n",
+ caller, dev_name(&nvdimm->dev), i);
+
+ return NULL;
+ }
+ }
+
+ nd_region = kzalloc(sizeof(struct nd_region)
+ + sizeof(struct nd_mapping) * ndr_desc->num_mappings,
+ GFP_KERNEL);
+ if (!nd_region)
+ return NULL;
+ nd_region->id = ida_simple_get(&region_ida, 0, 0, GFP_KERNEL);
+ if (nd_region->id < 0) {
+ kfree(nd_region);
+ return NULL;
+ }
+
+ memcpy(nd_region->mapping, ndr_desc->nd_mapping,
+ sizeof(struct nd_mapping) * ndr_desc->num_mappings);
+ for (i = 0; i < ndr_desc->num_mappings; i++) {
+ struct nd_mapping *nd_mapping = &ndr_desc->nd_mapping[i];
+ struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+ get_device(&nvdimm->dev);
+ }
+ nd_region->ndr_mappings = ndr_desc->num_mappings;
+ nd_region->provider_data = ndr_desc->provider_data;
+ dev = &nd_region->dev;
+ dev_set_name(dev, "region%d", nd_region->id);
+ dev->parent = &nvdimm_bus->dev;
+ dev->type = dev_type;
+ dev->groups = ndr_desc->attr_groups;
+ nd_region->ndr_size = resource_size(ndr_desc->res);
+ nd_region->ndr_start = ndr_desc->res->start;
+ nd_device_register(dev);
+
+ return nd_region;
+}
+
+struct nd_region *nvdimm_pmem_region_create(struct nvdimm_bus *nvdimm_bus,
+ struct nd_region_desc *ndr_desc)
+{
+ return nd_region_create(nvdimm_bus, ndr_desc, &nd_pmem_device_type,
+ __func__);
+}
+EXPORT_SYMBOL_GPL(nvdimm_pmem_region_create);
+
+struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
+ struct nd_region_desc *ndr_desc)
+{
+ if (ndr_desc->num_mappings > 1)
+ return NULL;
+ return nd_region_create(nvdimm_bus, ndr_desc, &nd_blk_device_type,
+ __func__);
+}
+EXPORT_SYMBOL_GPL(nvdimm_blk_region_create);
+
+struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
+ struct nd_region_desc *ndr_desc)
+{
+ return nd_region_create(nvdimm_bus, ndr_desc, &nd_volatile_device_type,
+ __func__);
+}
+EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index d3ebccf4ea8b..39e7e606092a 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -26,11 +26,14 @@ enum {
ND_CMD_MAX_ELEM = 4,
ND_CMD_MAX_ENVELOPE = 16,
ND_CMD_ARS_STATUS_MAX = SZ_4K,
+ ND_MAX_MAPPINGS = 32,
};

extern struct attribute_group nvdimm_bus_attribute_group;
extern struct attribute_group nvdimm_attribute_group;
extern struct attribute_group nd_device_attribute_group;
+extern struct attribute_group nd_region_attribute_group;
+extern struct attribute_group nd_mapping_attribute_group;

struct nvdimm;
struct nvdimm_bus_descriptor;
@@ -38,6 +41,12 @@ typedef int (*ndctl_fn)(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *buf,
unsigned int buf_len);

+struct nd_mapping {
+ struct nvdimm *nvdimm;
+ u64 start;
+ u64 size;
+};
+
struct nvdimm_bus_descriptor {
const struct attribute_group **attr_groups;
unsigned long dsm_mask;
@@ -52,6 +61,14 @@ struct nd_cmd_desc {
int out_sizes[ND_CMD_MAX_ELEM];
};

+struct nd_region_desc {
+ struct resource *res;
+ struct nd_mapping *nd_mapping;
+ u16 num_mappings;
+ const struct attribute_group **attr_groups;
+ void *provider_data;
+};
+
struct nvdimm_bus;
struct device;
struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
@@ -59,9 +76,11 @@ struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
void nvdimm_bus_unregister(struct nvdimm_bus *nvdimm_bus);
struct nvdimm_bus *to_nvdimm_bus(struct device *dev);
struct nvdimm *to_nvdimm(struct device *dev);
+struct nd_region *to_nd_region(struct device *dev);
struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus);
const char *nvdimm_name(struct nvdimm *nvdimm);
void *nvdimm_provider_data(struct nvdimm *nvdimm);
+void *nd_region_provider_data(struct nd_region *nd_region);
struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
const struct attribute_group **groups, unsigned long flags,
unsigned long *dsm_mask);
@@ -73,4 +92,10 @@ u32 nd_cmd_out_size(struct nvdimm *nvdimm, int cmd,
const struct nd_cmd_desc *desc, int idx, const u32 *in_field,
const u32 *out_field);
int nvdimm_bus_check_dimm_count(struct nvdimm_bus *nvdimm_bus, int dimm_count);
+struct nd_region *nvdimm_pmem_region_create(struct nvdimm_bus *nvdimm_bus,
+ struct nd_region_desc *ndr_desc);
+struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
+ struct nd_region_desc *ndr_desc);
+struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
+ struct nd_region_desc *ndr_desc);
#endif /* __LIBNVDIMM_H__ */

2015-06-17 23:19:18

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 08/16] libnvdimm: support for legacy (non-aliasing) nvdimms

The libnvdimm region driver is an intermediary driver that translates
non-volatile "region"s into "namespace" sub-devices that are surfaced by
persistent memory block-device drivers (PMEM and BLK).

ACPI 6 introduces the concept that a given nvdimm may simultaneously
offer multiple access modes to its media through direct PMEM load/store
access, or windowed BLK mode. Existing nvdimms mostly implement a PMEM
interface, some offer a BLK-like mode, but never both as ACPI 6 defines.
If an nvdimm is single interfaced, then there is no need for dimm
metadata labels. For these devices we can take the region boundaries
directly to create a child namespace device (nd_namespace_io).

Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit.c | 1
drivers/nvdimm/Makefile | 2 +
drivers/nvdimm/bus.c | 26 +++++++++
drivers/nvdimm/core.c | 44 ++++++++++++++-
drivers/nvdimm/dimm.c | 2 -
drivers/nvdimm/namespace_devs.c | 111 +++++++++++++++++++++++++++++++++++++++
drivers/nvdimm/nd-core.h | 6 ++
drivers/nvdimm/nd.h | 13 +++++
drivers/nvdimm/region.c | 93 +++++++++++++++++++++++++++++++++
drivers/nvdimm/region_devs.c | 66 +++++++++++++++++++++++
include/linux/libnvdimm.h | 7 ++
include/linux/nd.h | 10 ++++
include/uapi/linux/ndctl.h | 10 ++++
13 files changed, 383 insertions(+), 8 deletions(-)
create mode 100644 drivers/nvdimm/namespace_devs.c
create mode 100644 drivers/nvdimm/region.c

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 068f69d70c9e..ce290748fe36 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -780,6 +780,7 @@ static struct attribute_group acpi_nfit_region_attribute_group = {
static const struct attribute_group *acpi_nfit_region_attribute_groups[] = {
&nd_region_attribute_group,
&nd_mapping_attribute_group,
+ &nd_device_attribute_group,
&acpi_nfit_region_attribute_group,
NULL,
};
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 88afd0d849c3..af5e2760ddbd 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -5,3 +5,5 @@ libnvdimm-y += bus.o
libnvdimm-y += dimm_devs.o
libnvdimm-y += dimm.o
libnvdimm-y += region_devs.o
+libnvdimm-y += region.o
+libnvdimm-y += namespace_devs.o
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index a0308f1872bf..4b77665a6cc8 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -13,6 +13,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/vmalloc.h>
#include <linux/uaccess.h>
+#include <linux/module.h>
#include <linux/fcntl.h>
#include <linux/async.h>
#include <linux/ndctl.h>
@@ -33,6 +34,12 @@ static int to_nd_device_type(struct device *dev)
{
if (is_nvdimm(dev))
return ND_DEVICE_DIMM;
+ else if (is_nd_pmem(dev))
+ return ND_DEVICE_REGION_PMEM;
+ else if (is_nd_blk(dev))
+ return ND_DEVICE_REGION_BLK;
+ else if (is_nd_pmem(dev->parent) || is_nd_blk(dev->parent))
+ return nd_region_to_nstype(to_nd_region(dev->parent));

return 0;
}
@@ -50,27 +57,46 @@ static int nvdimm_bus_match(struct device *dev, struct device_driver *drv)
return test_bit(to_nd_device_type(dev), &nd_drv->type);
}

+static struct module *to_bus_provider(struct device *dev)
+{
+ /* pin bus providers while regions are enabled */
+ if (is_nd_pmem(dev) || is_nd_blk(dev)) {
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+
+ return nvdimm_bus->module;
+ }
+ return NULL;
+}
+
static int nvdimm_bus_probe(struct device *dev)
{
struct nd_device_driver *nd_drv = to_nd_device_driver(dev->driver);
+ struct module *provider = to_bus_provider(dev);
struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
int rc;

+ if (!try_module_get(provider))
+ return -ENXIO;
+
rc = nd_drv->probe(dev);
dev_dbg(&nvdimm_bus->dev, "%s.probe(%s) = %d\n", dev->driver->name,
dev_name(dev), rc);
+ if (rc != 0)
+ module_put(provider);
return rc;
}

static int nvdimm_bus_remove(struct device *dev)
{
struct nd_device_driver *nd_drv = to_nd_device_driver(dev->driver);
+ struct module *provider = to_bus_provider(dev);
struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
int rc;

rc = nd_drv->remove(dev);
dev_dbg(&nvdimm_bus->dev, "%s.remove(%s) = %d\n", dev->driver->name,
dev_name(dev), rc);
+ module_put(provider);
return rc;
}

diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index 50ab880f0dc0..1b6b15d11f54 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -24,6 +24,36 @@ LIST_HEAD(nvdimm_bus_list);
DEFINE_MUTEX(nvdimm_bus_list_mutex);
static DEFINE_IDA(nd_ida);

+void nvdimm_bus_lock(struct device *dev)
+{
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+
+ if (!nvdimm_bus)
+ return;
+ mutex_lock(&nvdimm_bus->reconfig_mutex);
+}
+EXPORT_SYMBOL(nvdimm_bus_lock);
+
+void nvdimm_bus_unlock(struct device *dev)
+{
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+
+ if (!nvdimm_bus)
+ return;
+ mutex_unlock(&nvdimm_bus->reconfig_mutex);
+}
+EXPORT_SYMBOL(nvdimm_bus_unlock);
+
+bool is_nvdimm_bus_locked(struct device *dev)
+{
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+
+ if (!nvdimm_bus)
+ return false;
+ return mutex_is_locked(&nvdimm_bus->reconfig_mutex);
+}
+EXPORT_SYMBOL(is_nvdimm_bus_locked);
+
static void nvdimm_bus_release(struct device *dev)
{
struct nvdimm_bus *nvdimm_bus;
@@ -135,8 +165,8 @@ struct attribute_group nvdimm_bus_attribute_group = {
};
EXPORT_SYMBOL_GPL(nvdimm_bus_attribute_group);

-struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
- struct nvdimm_bus_descriptor *nd_desc)
+struct nvdimm_bus *__nvdimm_bus_register(struct device *parent,
+ struct nvdimm_bus_descriptor *nd_desc, struct module *module)
{
struct nvdimm_bus *nvdimm_bus;
int rc;
@@ -146,11 +176,13 @@ struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
return NULL;
INIT_LIST_HEAD(&nvdimm_bus->list);
nvdimm_bus->id = ida_simple_get(&nd_ida, 0, 0, GFP_KERNEL);
+ mutex_init(&nvdimm_bus->reconfig_mutex);
if (nvdimm_bus->id < 0) {
kfree(nvdimm_bus);
return NULL;
}
nvdimm_bus->nd_desc = nd_desc;
+ nvdimm_bus->module = module;
nvdimm_bus->dev.parent = parent;
nvdimm_bus->dev.release = nvdimm_bus_release;
nvdimm_bus->dev.groups = nd_desc->attr_groups;
@@ -174,7 +206,7 @@ struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
put_device(&nvdimm_bus->dev);
return NULL;
}
-EXPORT_SYMBOL_GPL(nvdimm_bus_register);
+EXPORT_SYMBOL_GPL(__nvdimm_bus_register);

static int child_unregister(struct device *dev, void *data)
{
@@ -218,7 +250,12 @@ static __init int libnvdimm_init(void)
rc = nvdimm_init();
if (rc)
goto err_dimm;
+ rc = nd_region_init();
+ if (rc)
+ goto err_region;
return 0;
+ err_region:
+ nvdimm_exit();
err_dimm:
nvdimm_bus_exit();
return rc;
@@ -227,6 +264,7 @@ static __init int libnvdimm_init(void)
static __exit void libnvdimm_exit(void)
{
WARN_ON(!list_empty(&nvdimm_bus_list));
+ nd_region_exit();
nvdimm_exit();
nvdimm_bus_exit();
}
diff --git a/drivers/nvdimm/dimm.c b/drivers/nvdimm/dimm.c
index 28001a6ccd4e..eb20fc2df32b 100644
--- a/drivers/nvdimm/dimm.c
+++ b/drivers/nvdimm/dimm.c
@@ -84,7 +84,7 @@ int __init nvdimm_init(void)
return nd_driver_register(&nvdimm_driver);
}

-void __exit nvdimm_exit(void)
+void nvdimm_exit(void)
{
driver_unregister(&nvdimm_driver.drv);
}
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
new file mode 100644
index 000000000000..4f653d1e61ad
--- /dev/null
+++ b/drivers/nvdimm/namespace_devs.c
@@ -0,0 +1,111 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/nd.h>
+#include "nd.h"
+
+static void namespace_io_release(struct device *dev)
+{
+ struct nd_namespace_io *nsio = to_nd_namespace_io(dev);
+
+ kfree(nsio);
+}
+
+static struct device_type namespace_io_device_type = {
+ .name = "nd_namespace_io",
+ .release = namespace_io_release,
+};
+
+static ssize_t nstype_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region *nd_region = to_nd_region(dev->parent);
+
+ return sprintf(buf, "%d\n", nd_region_to_nstype(nd_region));
+}
+static DEVICE_ATTR_RO(nstype);
+
+static struct attribute *nd_namespace_attributes[] = {
+ &dev_attr_nstype.attr,
+ NULL,
+};
+
+static struct attribute_group nd_namespace_attribute_group = {
+ .attrs = nd_namespace_attributes,
+};
+
+static const struct attribute_group *nd_namespace_attribute_groups[] = {
+ &nd_device_attribute_group,
+ &nd_namespace_attribute_group,
+ NULL,
+};
+
+static struct device **create_namespace_io(struct nd_region *nd_region)
+{
+ struct nd_namespace_io *nsio;
+ struct device *dev, **devs;
+ struct resource *res;
+
+ nsio = kzalloc(sizeof(*nsio), GFP_KERNEL);
+ if (!nsio)
+ return NULL;
+
+ devs = kcalloc(2, sizeof(struct device *), GFP_KERNEL);
+ if (!devs) {
+ kfree(nsio);
+ return NULL;
+ }
+
+ dev = &nsio->dev;
+ dev->type = &namespace_io_device_type;
+ dev->parent = &nd_region->dev;
+ res = &nsio->res;
+ res->name = dev_name(&nd_region->dev);
+ res->flags = IORESOURCE_MEM;
+ res->start = nd_region->ndr_start;
+ res->end = res->start + nd_region->ndr_size - 1;
+
+ devs[0] = dev;
+ return devs;
+}
+
+int nd_region_register_namespaces(struct nd_region *nd_region, int *err)
+{
+ struct device **devs = NULL;
+ int i;
+
+ *err = 0;
+ switch (nd_region_to_nstype(nd_region)) {
+ case ND_DEVICE_NAMESPACE_IO:
+ devs = create_namespace_io(nd_region);
+ break;
+ default:
+ break;
+ }
+
+ if (!devs)
+ return -ENODEV;
+
+ for (i = 0; devs[i]; i++) {
+ struct device *dev = devs[i];
+
+ dev_set_name(dev, "namespace%d.%d", nd_region->id, i);
+ dev->groups = nd_namespace_attribute_groups;
+ nd_device_register(dev);
+ }
+ kfree(devs);
+
+ return i;
+}
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 1d760bf24857..0e9b41fd2546 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -21,9 +21,11 @@ extern int nvdimm_major;

struct nvdimm_bus {
struct nvdimm_bus_descriptor *nd_desc;
+ struct module *module;
struct list_head list;
struct device dev;
int id;
+ struct mutex reconfig_mutex;
};

struct nvdimm {
@@ -34,6 +36,9 @@ struct nvdimm {
int id;
};

+bool is_nvdimm(struct device *dev);
+bool is_nd_blk(struct device *dev);
+bool is_nd_pmem(struct device *dev);
struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev);
int __init nvdimm_bus_init(void);
void nvdimm_bus_exit(void);
@@ -43,5 +48,4 @@ void nd_synchronize(void);
int nvdimm_bus_register_dimms(struct nvdimm_bus *nvdimm_bus);
int nvdimm_bus_register_regions(struct nvdimm_bus *nvdimm_bus);
int nd_match_dimm(struct device *dev, void *data);
-bool is_nvdimm(struct device *dev);
#endif /* __ND_CORE_H__ */
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index ea0cca337aa6..bc5a08e36a25 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -23,6 +23,11 @@ struct nvdimm_drvdata {
void *data;
};

+struct nd_region_namespaces {
+ int count;
+ int active;
+};
+
struct nd_region {
struct device dev;
u16 ndr_mappings;
@@ -41,7 +46,15 @@ enum nd_async_mode {
void nd_device_register(struct device *dev);
void nd_device_unregister(struct device *dev, enum nd_async_mode mode);
int __init nvdimm_init(void);
+int __init nd_region_init(void);
void nvdimm_exit(void);
+void nd_region_exit(void);
int nvdimm_init_nsarea(struct nvdimm_drvdata *ndd);
int nvdimm_init_config_data(struct nvdimm_drvdata *ndd);
+struct nd_region *to_nd_region(struct device *dev);
+int nd_region_to_nstype(struct nd_region *nd_region);
+int nd_region_register_namespaces(struct nd_region *nd_region, int *err);
+void nvdimm_bus_lock(struct device *dev);
+void nvdimm_bus_unlock(struct device *dev);
+bool is_nvdimm_bus_locked(struct device *dev);
#endif /* __ND_H__ */
diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
new file mode 100644
index 000000000000..ade3dba81afd
--- /dev/null
+++ b/drivers/nvdimm/region.c
@@ -0,0 +1,93 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/nd.h>
+#include "nd.h"
+
+static int nd_region_probe(struct device *dev)
+{
+ int err;
+ struct nd_region_namespaces *num_ns;
+ struct nd_region *nd_region = to_nd_region(dev);
+ int rc = nd_region_register_namespaces(nd_region, &err);
+
+ num_ns = devm_kzalloc(dev, sizeof(*num_ns), GFP_KERNEL);
+ if (!num_ns)
+ return -ENOMEM;
+
+ if (rc < 0)
+ return rc;
+
+ num_ns->active = rc;
+ num_ns->count = rc + err;
+ dev_set_drvdata(dev, num_ns);
+
+ if (err == 0)
+ return 0;
+
+ if (rc == err)
+ return -ENODEV;
+
+ /*
+ * Given multiple namespaces per region, we do not want to
+ * disable all the successfully registered peer namespaces upon
+ * a single registration failure. If userspace is missing a
+ * namespace that it expects it can disable/re-enable the region
+ * to retry discovery after correcting the failure.
+ * <regionX>/namespaces returns the current
+ * "<async-registered>/<total>" namespace count.
+ */
+ dev_err(dev, "failed to register %d namespace%s, continuing...\n",
+ err, err == 1 ? "" : "s");
+ return 0;
+}
+
+static int child_unregister(struct device *dev, void *data)
+{
+ nd_device_unregister(dev, ND_SYNC);
+ return 0;
+}
+
+static int nd_region_remove(struct device *dev)
+{
+ /* flush attribute readers and disable */
+ nvdimm_bus_lock(dev);
+ dev_set_drvdata(dev, NULL);
+ nvdimm_bus_unlock(dev);
+
+ device_for_each_child(dev, NULL, child_unregister);
+ return 0;
+}
+
+static struct nd_device_driver nd_region_driver = {
+ .probe = nd_region_probe,
+ .remove = nd_region_remove,
+ .drv = {
+ .name = "nd_region",
+ },
+ .type = ND_DRIVER_REGION_BLK | ND_DRIVER_REGION_PMEM,
+};
+
+int __init nd_region_init(void)
+{
+ return nd_driver_register(&nd_region_driver);
+}
+
+void nd_region_exit(void)
+{
+ driver_unregister(&nd_region_driver.drv);
+}
+
+MODULE_ALIAS_ND_DEVICE(ND_DEVICE_REGION_PMEM);
+MODULE_ALIAS_ND_DEVICE(ND_DEVICE_REGION_BLK);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4bda2e0df8f7..b5c5b9095b28 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -47,11 +47,16 @@ static struct device_type nd_volatile_device_type = {
.release = nd_region_release,
};

-static bool is_nd_pmem(struct device *dev)
+bool is_nd_pmem(struct device *dev)
{
return dev ? dev->type == &nd_pmem_device_type : false;
}

+bool is_nd_blk(struct device *dev)
+{
+ return dev ? dev->type == &nd_blk_device_type : false;
+}
+
struct nd_region *to_nd_region(struct device *dev)
{
struct nd_region *nd_region = container_of(dev, struct nd_region, dev);
@@ -61,6 +66,37 @@ struct nd_region *to_nd_region(struct device *dev)
}
EXPORT_SYMBOL_GPL(to_nd_region);

+/**
+ * nd_region_to_nstype() - region to an integer namespace type
+ * @nd_region: region-device to interrogate
+ *
+ * This is the 'nstype' attribute of a region as well, an input to the
+ * MODALIAS for namespace devices, and bit number for a nvdimm_bus to match
+ * namespace devices with namespace drivers.
+ */
+int nd_region_to_nstype(struct nd_region *nd_region)
+{
+ if (is_nd_pmem(&nd_region->dev)) {
+ u16 i, alias;
+
+ for (i = 0, alias = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+ if (nvdimm->flags & NDD_ALIASING)
+ alias++;
+ }
+ if (alias)
+ return ND_DEVICE_NAMESPACE_PMEM;
+ else
+ return ND_DEVICE_NAMESPACE_IO;
+ } else if (is_nd_blk(&nd_region->dev)) {
+ return ND_DEVICE_NAMESPACE_BLK;
+ }
+
+ return 0;
+}
+
static ssize_t size_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -88,9 +124,37 @@ static ssize_t mappings_show(struct device *dev,
}
static DEVICE_ATTR_RO(mappings);

+static ssize_t nstype_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+
+ return sprintf(buf, "%d\n", nd_region_to_nstype(nd_region));
+}
+static DEVICE_ATTR_RO(nstype);
+
+static ssize_t init_namespaces_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region_namespaces *num_ns = dev_get_drvdata(dev);
+ ssize_t rc;
+
+ nvdimm_bus_lock(dev);
+ if (num_ns)
+ rc = sprintf(buf, "%d/%d\n", num_ns->active, num_ns->count);
+ else
+ rc = -ENXIO;
+ nvdimm_bus_unlock(dev);
+
+ return rc;
+}
+static DEVICE_ATTR_RO(init_namespaces);
+
static struct attribute *nd_region_attributes[] = {
&dev_attr_size.attr,
+ &dev_attr_nstype.attr,
&dev_attr_mappings.attr,
+ &dev_attr_init_namespaces.attr,
NULL,
};

diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 39e7e606092a..37f966aff386 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -71,8 +71,11 @@ struct nd_region_desc {

struct nvdimm_bus;
struct device;
-struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
- struct nvdimm_bus_descriptor *nfit_desc);
+struct module;
+struct nvdimm_bus *__nvdimm_bus_register(struct device *parent,
+ struct nvdimm_bus_descriptor *nfit_desc, struct module *module);
+#define nvdimm_bus_register(parent, desc) \
+ __nvdimm_bus_register(parent, desc, THIS_MODULE)
void nvdimm_bus_unregister(struct nvdimm_bus *nvdimm_bus);
struct nvdimm_bus *to_nvdimm_bus(struct device *dev);
struct nvdimm *to_nvdimm(struct device *dev);
diff --git a/include/linux/nd.h b/include/linux/nd.h
index e074f67e53a3..da70e9962197 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -26,6 +26,16 @@ static inline struct nd_device_driver *to_nd_device_driver(
struct device_driver *drv)
{
return container_of(drv, struct nd_device_driver, drv);
+};
+
+struct nd_namespace_io {
+ struct device dev;
+ struct resource res;
+};
+
+static inline struct nd_namespace_io *to_nd_namespace_io(struct device *dev)
+{
+ return container_of(dev, struct nd_namespace_io, dev);
}

#define MODULE_ALIAS_ND_DEVICE(type) \
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index 37640916d146..174b6371dcc1 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -177,8 +177,18 @@ static inline const char *nvdimm_cmd_name(unsigned cmd)


#define ND_DEVICE_DIMM 1 /* nd_dimm: container for "config data" */
+#define ND_DEVICE_REGION_PMEM 2 /* nd_region: (parent of PMEM namespaces) */
+#define ND_DEVICE_REGION_BLK 3 /* nd_region: (parent of BLK namespaces) */
+#define ND_DEVICE_NAMESPACE_IO 4 /* legacy persistent memory */
+#define ND_DEVICE_NAMESPACE_PMEM 5 /* PMEM namespace (may alias with BLK) */
+#define ND_DEVICE_NAMESPACE_BLK 6 /* BLK namespace (may alias with PMEM) */

enum nd_driver_flags {
ND_DRIVER_DIMM = 1 << ND_DEVICE_DIMM,
+ ND_DRIVER_REGION_PMEM = 1 << ND_DEVICE_REGION_PMEM,
+ ND_DRIVER_REGION_BLK = 1 << ND_DEVICE_REGION_BLK,
+ ND_DRIVER_NAMESPACE_IO = 1 << ND_DEVICE_NAMESPACE_IO,
+ ND_DRIVER_NAMESPACE_PMEM = 1 << ND_DEVICE_NAMESPACE_PMEM,
+ ND_DRIVER_NAMESPACE_BLK = 1 << ND_DEVICE_NAMESPACE_BLK,
};
#endif /* __NDCTL_H__ */

2015-06-17 23:16:55

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 09/16] libnvdimm, pmem: move pmem to drivers/nvdimm/

Prepare the pmem driver to consume PMEM namespaces emitted by regions of
an nvdimm_bus instance. No functional change.

Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/block/Kconfig | 11 --
drivers/block/Makefile | 1
drivers/block/pmem.c | 262 -----------------------------------------------
drivers/nvdimm/Kconfig | 23 ++++
drivers/nvdimm/Makefile | 3 +
drivers/nvdimm/pmem.c | 262 +++++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 287 insertions(+), 275 deletions(-)
delete mode 100644 drivers/block/pmem.c
create mode 100644 drivers/nvdimm/pmem.c

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index eb1fed5bd516..1b8094d4d7af 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -404,17 +404,6 @@ config BLK_DEV_RAM_DAX
and will prevent RAM block device backing store memory from being
allocated from highmem (only a problem for highmem systems).

-config BLK_DEV_PMEM
- tristate "Persistent memory block device support"
- help
- Saying Y here will allow you to use a contiguous range of reserved
- memory as one or more persistent block devices.
-
- To compile this driver as a module, choose M here: the module will be
- called 'pmem'.
-
- If unsure, say N.
-
config CDROM_PKTCDVD
tristate "Packet writing on CD/DVD media"
depends on !UML
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 9cc6c18a1c7e..02b688d1438d 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -14,7 +14,6 @@ obj-$(CONFIG_PS3_VRAM) += ps3vram.o
obj-$(CONFIG_ATARI_FLOPPY) += ataflop.o
obj-$(CONFIG_AMIGA_Z2RAM) += z2ram.o
obj-$(CONFIG_BLK_DEV_RAM) += brd.o
-obj-$(CONFIG_BLK_DEV_PMEM) += pmem.o
obj-$(CONFIG_BLK_DEV_LOOP) += loop.o
obj-$(CONFIG_BLK_CPQ_DA) += cpqarray.o
obj-$(CONFIG_BLK_CPQ_CISS_DA) += cciss.o
diff --git a/drivers/block/pmem.c b/drivers/block/pmem.c
deleted file mode 100644
index eabf4a8d0085..000000000000
--- a/drivers/block/pmem.c
+++ /dev/null
@@ -1,262 +0,0 @@
-/*
- * Persistent Memory Driver
- *
- * Copyright (c) 2014, Intel Corporation.
- * Copyright (c) 2015, Christoph Hellwig <[email protected]>.
- * Copyright (c) 2015, Boaz Harrosh <[email protected]>.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- */
-
-#include <asm/cacheflush.h>
-#include <linux/blkdev.h>
-#include <linux/hdreg.h>
-#include <linux/init.h>
-#include <linux/platform_device.h>
-#include <linux/module.h>
-#include <linux/moduleparam.h>
-#include <linux/slab.h>
-
-#define PMEM_MINORS 16
-
-struct pmem_device {
- struct request_queue *pmem_queue;
- struct gendisk *pmem_disk;
-
- /* One contiguous memory region per device */
- phys_addr_t phys_addr;
- void *virt_addr;
- size_t size;
-};
-
-static int pmem_major;
-static atomic_t pmem_index;
-
-static void pmem_do_bvec(struct pmem_device *pmem, struct page *page,
- unsigned int len, unsigned int off, int rw,
- sector_t sector)
-{
- void *mem = kmap_atomic(page);
- size_t pmem_off = sector << 9;
-
- if (rw == READ) {
- memcpy(mem + off, pmem->virt_addr + pmem_off, len);
- flush_dcache_page(page);
- } else {
- flush_dcache_page(page);
- memcpy(pmem->virt_addr + pmem_off, mem + off, len);
- }
-
- kunmap_atomic(mem);
-}
-
-static void pmem_make_request(struct request_queue *q, struct bio *bio)
-{
- struct block_device *bdev = bio->bi_bdev;
- struct pmem_device *pmem = bdev->bd_disk->private_data;
- int rw;
- struct bio_vec bvec;
- sector_t sector;
- struct bvec_iter iter;
- int err = 0;
-
- if (bio_end_sector(bio) > get_capacity(bdev->bd_disk)) {
- err = -EIO;
- goto out;
- }
-
- BUG_ON(bio->bi_rw & REQ_DISCARD);
-
- rw = bio_data_dir(bio);
- sector = bio->bi_iter.bi_sector;
- bio_for_each_segment(bvec, bio, iter) {
- pmem_do_bvec(pmem, bvec.bv_page, bvec.bv_len, bvec.bv_offset,
- rw, sector);
- sector += bvec.bv_len >> 9;
- }
-
-out:
- bio_endio(bio, err);
-}
-
-static int pmem_rw_page(struct block_device *bdev, sector_t sector,
- struct page *page, int rw)
-{
- struct pmem_device *pmem = bdev->bd_disk->private_data;
-
- pmem_do_bvec(pmem, page, PAGE_CACHE_SIZE, 0, rw, sector);
- page_endio(page, rw & WRITE, 0);
-
- return 0;
-}
-
-static long pmem_direct_access(struct block_device *bdev, sector_t sector,
- void **kaddr, unsigned long *pfn, long size)
-{
- struct pmem_device *pmem = bdev->bd_disk->private_data;
- size_t offset = sector << 9;
-
- if (!pmem)
- return -ENODEV;
-
- *kaddr = pmem->virt_addr + offset;
- *pfn = (pmem->phys_addr + offset) >> PAGE_SHIFT;
-
- return pmem->size - offset;
-}
-
-static const struct block_device_operations pmem_fops = {
- .owner = THIS_MODULE,
- .rw_page = pmem_rw_page,
- .direct_access = pmem_direct_access,
-};
-
-static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
-{
- struct pmem_device *pmem;
- struct gendisk *disk;
- int idx, err;
-
- err = -ENOMEM;
- pmem = kzalloc(sizeof(*pmem), GFP_KERNEL);
- if (!pmem)
- goto out;
-
- pmem->phys_addr = res->start;
- pmem->size = resource_size(res);
-
- err = -EINVAL;
- if (!request_mem_region(pmem->phys_addr, pmem->size, "pmem")) {
- dev_warn(dev, "could not reserve region [0x%pa:0x%zx]\n", &pmem->phys_addr, pmem->size);
- goto out_free_dev;
- }
-
- /*
- * Map the memory as non-cachable, as we can't write back the contents
- * of the CPU caches in case of a crash.
- */
- err = -ENOMEM;
- pmem->virt_addr = ioremap_nocache(pmem->phys_addr, pmem->size);
- if (!pmem->virt_addr)
- goto out_release_region;
-
- pmem->pmem_queue = blk_alloc_queue(GFP_KERNEL);
- if (!pmem->pmem_queue)
- goto out_unmap;
-
- blk_queue_make_request(pmem->pmem_queue, pmem_make_request);
- blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);
- blk_queue_bounce_limit(pmem->pmem_queue, BLK_BOUNCE_ANY);
-
- disk = alloc_disk(PMEM_MINORS);
- if (!disk)
- goto out_free_queue;
-
- idx = atomic_inc_return(&pmem_index) - 1;
-
- disk->major = pmem_major;
- disk->first_minor = PMEM_MINORS * idx;
- disk->fops = &pmem_fops;
- disk->private_data = pmem;
- disk->queue = pmem->pmem_queue;
- disk->flags = GENHD_FL_EXT_DEVT;
- sprintf(disk->disk_name, "pmem%d", idx);
- disk->driverfs_dev = dev;
- set_capacity(disk, pmem->size >> 9);
- pmem->pmem_disk = disk;
-
- add_disk(disk);
-
- return pmem;
-
-out_free_queue:
- blk_cleanup_queue(pmem->pmem_queue);
-out_unmap:
- iounmap(pmem->virt_addr);
-out_release_region:
- release_mem_region(pmem->phys_addr, pmem->size);
-out_free_dev:
- kfree(pmem);
-out:
- return ERR_PTR(err);
-}
-
-static void pmem_free(struct pmem_device *pmem)
-{
- del_gendisk(pmem->pmem_disk);
- put_disk(pmem->pmem_disk);
- blk_cleanup_queue(pmem->pmem_queue);
- iounmap(pmem->virt_addr);
- release_mem_region(pmem->phys_addr, pmem->size);
- kfree(pmem);
-}
-
-static int pmem_probe(struct platform_device *pdev)
-{
- struct pmem_device *pmem;
- struct resource *res;
-
- if (WARN_ON(pdev->num_resources > 1))
- return -ENXIO;
-
- res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
- if (!res)
- return -ENXIO;
-
- pmem = pmem_alloc(&pdev->dev, res);
- if (IS_ERR(pmem))
- return PTR_ERR(pmem);
-
- platform_set_drvdata(pdev, pmem);
-
- return 0;
-}
-
-static int pmem_remove(struct platform_device *pdev)
-{
- struct pmem_device *pmem = platform_get_drvdata(pdev);
-
- pmem_free(pmem);
- return 0;
-}
-
-static struct platform_driver pmem_driver = {
- .probe = pmem_probe,
- .remove = pmem_remove,
- .driver = {
- .owner = THIS_MODULE,
- .name = "pmem",
- },
-};
-
-static int __init pmem_init(void)
-{
- int error;
-
- pmem_major = register_blkdev(0, "pmem");
- if (pmem_major < 0)
- return pmem_major;
-
- error = platform_driver_register(&pmem_driver);
- if (error)
- unregister_blkdev(pmem_major, "pmem");
- return error;
-}
-module_init(pmem_init);
-
-static void pmem_exit(void)
-{
- platform_driver_unregister(&pmem_driver);
- unregister_blkdev(pmem_major, "pmem");
-}
-module_exit(pmem_exit);
-
-MODULE_AUTHOR("Ross Zwisler <[email protected]>");
-MODULE_LICENSE("GPL v2");
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 92933551f846..07a29113b870 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -1,4 +1,4 @@
-config LIBNVDIMM
+menuconfig LIBNVDIMM
tristate "NVDIMM (Non-Volatile Memory Device) Support"
depends on PHYS_ADDR_T_64BIT
depends on BLK_DEV
@@ -13,3 +13,24 @@ config LIBNVDIMM
CONFIG_DAX). A BLK namespace refers to an NVDIMM control
region which exposes an mmio register set for windowed
access mode to non-volatile memory.
+
+if LIBNVDIMM
+
+config BLK_DEV_PMEM
+ tristate "PMEM: Persistent memory block device support"
+ default LIBNVDIMM
+ depends on HAS_IOMEM
+ help
+ Memory ranges for PMEM are described by either an NFIT
+ (NVDIMM Firmware Interface Table, see CONFIG_NFIT_ACPI), a
+ non-standard OEM-specific E820 memory type (type-12, see
+ CONFIG_X86_PMEM_LEGACY), or it is manually specified by the
+ 'memmap=nn[KMG]!ss[KMG]' kernel command line (see
+ Documentation/kernel-parameters.txt). This driver converts
+ these persistent memory ranges into block devices that are
+ capable of DAX (direct-access) file system mappings. See
+ Documentation/nvdimm/nvdimm.txt for more details.
+
+ Say Y if you want to use an NVDIMM
+
+endif
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index af5e2760ddbd..4d2a27f52faa 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -1,4 +1,7 @@
obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
+obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
+
+nd_pmem-y := pmem.o

libnvdimm-y := core.o
libnvdimm-y += bus.o
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
new file mode 100644
index 000000000000..eabf4a8d0085
--- /dev/null
+++ b/drivers/nvdimm/pmem.c
@@ -0,0 +1,262 @@
+/*
+ * Persistent Memory Driver
+ *
+ * Copyright (c) 2014, Intel Corporation.
+ * Copyright (c) 2015, Christoph Hellwig <[email protected]>.
+ * Copyright (c) 2015, Boaz Harrosh <[email protected]>.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#include <asm/cacheflush.h>
+#include <linux/blkdev.h>
+#include <linux/hdreg.h>
+#include <linux/init.h>
+#include <linux/platform_device.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/slab.h>
+
+#define PMEM_MINORS 16
+
+struct pmem_device {
+ struct request_queue *pmem_queue;
+ struct gendisk *pmem_disk;
+
+ /* One contiguous memory region per device */
+ phys_addr_t phys_addr;
+ void *virt_addr;
+ size_t size;
+};
+
+static int pmem_major;
+static atomic_t pmem_index;
+
+static void pmem_do_bvec(struct pmem_device *pmem, struct page *page,
+ unsigned int len, unsigned int off, int rw,
+ sector_t sector)
+{
+ void *mem = kmap_atomic(page);
+ size_t pmem_off = sector << 9;
+
+ if (rw == READ) {
+ memcpy(mem + off, pmem->virt_addr + pmem_off, len);
+ flush_dcache_page(page);
+ } else {
+ flush_dcache_page(page);
+ memcpy(pmem->virt_addr + pmem_off, mem + off, len);
+ }
+
+ kunmap_atomic(mem);
+}
+
+static void pmem_make_request(struct request_queue *q, struct bio *bio)
+{
+ struct block_device *bdev = bio->bi_bdev;
+ struct pmem_device *pmem = bdev->bd_disk->private_data;
+ int rw;
+ struct bio_vec bvec;
+ sector_t sector;
+ struct bvec_iter iter;
+ int err = 0;
+
+ if (bio_end_sector(bio) > get_capacity(bdev->bd_disk)) {
+ err = -EIO;
+ goto out;
+ }
+
+ BUG_ON(bio->bi_rw & REQ_DISCARD);
+
+ rw = bio_data_dir(bio);
+ sector = bio->bi_iter.bi_sector;
+ bio_for_each_segment(bvec, bio, iter) {
+ pmem_do_bvec(pmem, bvec.bv_page, bvec.bv_len, bvec.bv_offset,
+ rw, sector);
+ sector += bvec.bv_len >> 9;
+ }
+
+out:
+ bio_endio(bio, err);
+}
+
+static int pmem_rw_page(struct block_device *bdev, sector_t sector,
+ struct page *page, int rw)
+{
+ struct pmem_device *pmem = bdev->bd_disk->private_data;
+
+ pmem_do_bvec(pmem, page, PAGE_CACHE_SIZE, 0, rw, sector);
+ page_endio(page, rw & WRITE, 0);
+
+ return 0;
+}
+
+static long pmem_direct_access(struct block_device *bdev, sector_t sector,
+ void **kaddr, unsigned long *pfn, long size)
+{
+ struct pmem_device *pmem = bdev->bd_disk->private_data;
+ size_t offset = sector << 9;
+
+ if (!pmem)
+ return -ENODEV;
+
+ *kaddr = pmem->virt_addr + offset;
+ *pfn = (pmem->phys_addr + offset) >> PAGE_SHIFT;
+
+ return pmem->size - offset;
+}
+
+static const struct block_device_operations pmem_fops = {
+ .owner = THIS_MODULE,
+ .rw_page = pmem_rw_page,
+ .direct_access = pmem_direct_access,
+};
+
+static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
+{
+ struct pmem_device *pmem;
+ struct gendisk *disk;
+ int idx, err;
+
+ err = -ENOMEM;
+ pmem = kzalloc(sizeof(*pmem), GFP_KERNEL);
+ if (!pmem)
+ goto out;
+
+ pmem->phys_addr = res->start;
+ pmem->size = resource_size(res);
+
+ err = -EINVAL;
+ if (!request_mem_region(pmem->phys_addr, pmem->size, "pmem")) {
+ dev_warn(dev, "could not reserve region [0x%pa:0x%zx]\n", &pmem->phys_addr, pmem->size);
+ goto out_free_dev;
+ }
+
+ /*
+ * Map the memory as non-cachable, as we can't write back the contents
+ * of the CPU caches in case of a crash.
+ */
+ err = -ENOMEM;
+ pmem->virt_addr = ioremap_nocache(pmem->phys_addr, pmem->size);
+ if (!pmem->virt_addr)
+ goto out_release_region;
+
+ pmem->pmem_queue = blk_alloc_queue(GFP_KERNEL);
+ if (!pmem->pmem_queue)
+ goto out_unmap;
+
+ blk_queue_make_request(pmem->pmem_queue, pmem_make_request);
+ blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);
+ blk_queue_bounce_limit(pmem->pmem_queue, BLK_BOUNCE_ANY);
+
+ disk = alloc_disk(PMEM_MINORS);
+ if (!disk)
+ goto out_free_queue;
+
+ idx = atomic_inc_return(&pmem_index) - 1;
+
+ disk->major = pmem_major;
+ disk->first_minor = PMEM_MINORS * idx;
+ disk->fops = &pmem_fops;
+ disk->private_data = pmem;
+ disk->queue = pmem->pmem_queue;
+ disk->flags = GENHD_FL_EXT_DEVT;
+ sprintf(disk->disk_name, "pmem%d", idx);
+ disk->driverfs_dev = dev;
+ set_capacity(disk, pmem->size >> 9);
+ pmem->pmem_disk = disk;
+
+ add_disk(disk);
+
+ return pmem;
+
+out_free_queue:
+ blk_cleanup_queue(pmem->pmem_queue);
+out_unmap:
+ iounmap(pmem->virt_addr);
+out_release_region:
+ release_mem_region(pmem->phys_addr, pmem->size);
+out_free_dev:
+ kfree(pmem);
+out:
+ return ERR_PTR(err);
+}
+
+static void pmem_free(struct pmem_device *pmem)
+{
+ del_gendisk(pmem->pmem_disk);
+ put_disk(pmem->pmem_disk);
+ blk_cleanup_queue(pmem->pmem_queue);
+ iounmap(pmem->virt_addr);
+ release_mem_region(pmem->phys_addr, pmem->size);
+ kfree(pmem);
+}
+
+static int pmem_probe(struct platform_device *pdev)
+{
+ struct pmem_device *pmem;
+ struct resource *res;
+
+ if (WARN_ON(pdev->num_resources > 1))
+ return -ENXIO;
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res)
+ return -ENXIO;
+
+ pmem = pmem_alloc(&pdev->dev, res);
+ if (IS_ERR(pmem))
+ return PTR_ERR(pmem);
+
+ platform_set_drvdata(pdev, pmem);
+
+ return 0;
+}
+
+static int pmem_remove(struct platform_device *pdev)
+{
+ struct pmem_device *pmem = platform_get_drvdata(pdev);
+
+ pmem_free(pmem);
+ return 0;
+}
+
+static struct platform_driver pmem_driver = {
+ .probe = pmem_probe,
+ .remove = pmem_remove,
+ .driver = {
+ .owner = THIS_MODULE,
+ .name = "pmem",
+ },
+};
+
+static int __init pmem_init(void)
+{
+ int error;
+
+ pmem_major = register_blkdev(0, "pmem");
+ if (pmem_major < 0)
+ return pmem_major;
+
+ error = platform_driver_register(&pmem_driver);
+ if (error)
+ unregister_blkdev(pmem_major, "pmem");
+ return error;
+}
+module_init(pmem_init);
+
+static void pmem_exit(void)
+{
+ platform_driver_unregister(&pmem_driver);
+ unregister_blkdev(pmem_major, "pmem");
+}
+module_exit(pmem_exit);
+
+MODULE_AUTHOR("Ross Zwisler <[email protected]>");
+MODULE_LICENSE("GPL v2");

2015-06-17 23:17:01

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 10/16] libnvdimm, pmem: add libnvdimm support to the pmem driver

nd_pmem attaches to persistent memory regions and namespaces emitted by
the libnvdimm subsystem, and, same as the original pmem driver, presents
the system-physical-address range as a block device.

The existing e820-type-12 to pmem setup is converted to an nvdimm_bus
that emits an nd_namespace_io device.

Note that the X in 'pmemX' is now derived from the parent region. This
provides some stability to the pmem devices names from boot-to-boot.
The minor numbers are also more predictable by passing 0 to
alloc_disk().

Cc: Andy Lutomirski <[email protected]>
Cc: Boaz Harrosh <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Ross Zwisler <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
arch/x86/Kconfig | 3 ++
arch/x86/kernel/pmem.c | 92 +++++++++++++++++++++++++++++++-----------------
drivers/nvdimm/pmem.c | 68 +++++++++++++++++------------------
3 files changed, 96 insertions(+), 67 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 226d5696e1d1..1a2cbf641667 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1424,6 +1424,9 @@ source "mm/Kconfig"

config X86_PMEM_LEGACY
bool "Support non-standard NVDIMMs and ADR protected memory"
+ depends on PHYS_ADDR_T_64BIT
+ depends on BLK_DEV
+ select LIBNVDIMM
help
Treat memory marked using the non-standard e820 type of 12 as used
by the Intel Sandy Bridge-EP reference BIOS as protected memory.
diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
index 3420c874ddc5..0f4ef472ab9e 100644
--- a/arch/x86/kernel/pmem.c
+++ b/arch/x86/kernel/pmem.c
@@ -1,53 +1,81 @@
/*
* Copyright (c) 2015, Christoph Hellwig.
+ * Copyright (c) 2015, Intel Corporation.
*/
-#include <linux/memblock.h>
#include <linux/platform_device.h>
-#include <linux/slab.h>
+#include <linux/libnvdimm.h>
+#include <linux/module.h>
#include <asm/e820.h>
-#include <asm/page_types.h>
-#include <asm/setup.h>

-static __init void register_pmem_device(struct resource *res)
+static void e820_pmem_release(struct device *dev)
{
- struct platform_device *pdev;
- int error;
+ struct nvdimm_bus *nvdimm_bus = dev->platform_data;

- pdev = platform_device_alloc("pmem", PLATFORM_DEVID_AUTO);
- if (!pdev)
- return;
+ if (nvdimm_bus)
+ nvdimm_bus_unregister(nvdimm_bus);
+}

- error = platform_device_add_resources(pdev, res, 1);
- if (error)
- goto out_put_pdev;
+static struct platform_device e820_pmem = {
+ .name = "e820_pmem",
+ .id = -1,
+ .dev = {
+ .release = e820_pmem_release,
+ },
+};

- error = platform_device_add(pdev);
- if (error)
- goto out_put_pdev;
- return;
+static const struct attribute_group *e820_pmem_attribute_groups[] = {
+ &nvdimm_bus_attribute_group,
+ NULL,
+};

-out_put_pdev:
- dev_warn(&pdev->dev, "failed to add 'pmem' (persistent memory) device!\n");
- platform_device_put(pdev);
-}
+static const struct attribute_group *e820_pmem_region_attribute_groups[] = {
+ &nd_region_attribute_group,
+ &nd_device_attribute_group,
+ NULL,
+};

-static __init int register_pmem_devices(void)
+static __init int register_e820_pmem(void)
{
- int i;
+ static struct nvdimm_bus_descriptor nd_desc;
+ struct device *dev = &e820_pmem.dev;
+ struct nvdimm_bus *nvdimm_bus;
+ int rc, i;
+
+ rc = platform_device_register(&e820_pmem);
+ if (rc)
+ return rc;
+
+ nd_desc.attr_groups = e820_pmem_attribute_groups;
+ nd_desc.provider_name = "e820";
+ nvdimm_bus = nvdimm_bus_register(dev, &nd_desc);
+ if (!nvdimm_bus)
+ goto err;
+ dev->platform_data = nvdimm_bus;

for (i = 0; i < e820.nr_map; i++) {
struct e820entry *ei = &e820.map[i];
+ struct resource res = {
+ .flags = IORESOURCE_MEM,
+ .start = ei->addr,
+ .end = ei->addr + ei->size - 1,
+ };
+ struct nd_region_desc ndr_desc;
+
+ if (ei->type != E820_PRAM)
+ continue;

- if (ei->type == E820_PRAM) {
- struct resource res = {
- .flags = IORESOURCE_MEM,
- .start = ei->addr,
- .end = ei->addr + ei->size - 1,
- };
- register_pmem_device(&res);
- }
+ memset(&ndr_desc, 0, sizeof(ndr_desc));
+ ndr_desc.res = &res;
+ ndr_desc.attr_groups = e820_pmem_region_attribute_groups;
+ if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
+ goto err;
}

return 0;
+
+ err:
+ dev_err(dev, "failed to register legacy persistent memory ranges\n");
+ platform_device_unregister(&e820_pmem);
+ return -ENXIO;
}
-device_initcall(register_pmem_devices);
+device_initcall(register_e820_pmem);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index eabf4a8d0085..d46975ed9e40 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -1,7 +1,7 @@
/*
* Persistent Memory Driver
*
- * Copyright (c) 2014, Intel Corporation.
+ * Copyright (c) 2014-2015, Intel Corporation.
* Copyright (c) 2015, Christoph Hellwig <[email protected]>.
* Copyright (c) 2015, Boaz Harrosh <[email protected]>.
*
@@ -23,8 +23,8 @@
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/slab.h>
-
-#define PMEM_MINORS 16
+#include <linux/nd.h>
+#include "nd.h"

struct pmem_device {
struct request_queue *pmem_queue;
@@ -37,7 +37,6 @@ struct pmem_device {
};

static int pmem_major;
-static atomic_t pmem_index;

static void pmem_do_bvec(struct pmem_device *pmem, struct page *page,
unsigned int len, unsigned int off, int rw,
@@ -118,11 +117,12 @@ static const struct block_device_operations pmem_fops = {
.direct_access = pmem_direct_access,
};

-static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
+static struct pmem_device *pmem_alloc(struct device *dev,
+ struct resource *res, int id)
{
struct pmem_device *pmem;
struct gendisk *disk;
- int idx, err;
+ int err;

err = -ENOMEM;
pmem = kzalloc(sizeof(*pmem), GFP_KERNEL);
@@ -134,7 +134,8 @@ static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)

err = -EINVAL;
if (!request_mem_region(pmem->phys_addr, pmem->size, "pmem")) {
- dev_warn(dev, "could not reserve region [0x%pa:0x%zx]\n", &pmem->phys_addr, pmem->size);
+ dev_warn(dev, "could not reserve region [0x%pa:0x%zx]\n",
+ &pmem->phys_addr, pmem->size);
goto out_free_dev;
}

@@ -155,19 +156,17 @@ static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);
blk_queue_bounce_limit(pmem->pmem_queue, BLK_BOUNCE_ANY);

- disk = alloc_disk(PMEM_MINORS);
+ disk = alloc_disk(0);
if (!disk)
goto out_free_queue;

- idx = atomic_inc_return(&pmem_index) - 1;
-
disk->major = pmem_major;
- disk->first_minor = PMEM_MINORS * idx;
+ disk->first_minor = 0;
disk->fops = &pmem_fops;
disk->private_data = pmem;
disk->queue = pmem->pmem_queue;
disk->flags = GENHD_FL_EXT_DEVT;
- sprintf(disk->disk_name, "pmem%d", idx);
+ sprintf(disk->disk_name, "pmem%d", id);
disk->driverfs_dev = dev;
set_capacity(disk, pmem->size >> 9);
pmem->pmem_disk = disk;
@@ -198,42 +197,38 @@ static void pmem_free(struct pmem_device *pmem)
kfree(pmem);
}

-static int pmem_probe(struct platform_device *pdev)
+static int nd_pmem_probe(struct device *dev)
{
+ struct nd_region *nd_region = to_nd_region(dev->parent);
+ struct nd_namespace_io *nsio = to_nd_namespace_io(dev);
struct pmem_device *pmem;
- struct resource *res;
-
- if (WARN_ON(pdev->num_resources > 1))
- return -ENXIO;
-
- res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
- if (!res)
- return -ENXIO;

- pmem = pmem_alloc(&pdev->dev, res);
+ pmem = pmem_alloc(dev, &nsio->res, nd_region->id);
if (IS_ERR(pmem))
return PTR_ERR(pmem);

- platform_set_drvdata(pdev, pmem);
+ dev_set_drvdata(dev, pmem);

return 0;
}

-static int pmem_remove(struct platform_device *pdev)
+static int nd_pmem_remove(struct device *dev)
{
- struct pmem_device *pmem = platform_get_drvdata(pdev);
+ struct pmem_device *pmem = dev_get_drvdata(dev);

pmem_free(pmem);
return 0;
}

-static struct platform_driver pmem_driver = {
- .probe = pmem_probe,
- .remove = pmem_remove,
- .driver = {
- .owner = THIS_MODULE,
- .name = "pmem",
+MODULE_ALIAS("pmem");
+MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_IO);
+static struct nd_device_driver nd_pmem_driver = {
+ .probe = nd_pmem_probe,
+ .remove = nd_pmem_remove,
+ .drv = {
+ .name = "nd_pmem",
},
+ .type = ND_DRIVER_NAMESPACE_IO,
};

static int __init pmem_init(void)
@@ -244,16 +239,19 @@ static int __init pmem_init(void)
if (pmem_major < 0)
return pmem_major;

- error = platform_driver_register(&pmem_driver);
- if (error)
+ error = nd_driver_register(&nd_pmem_driver);
+ if (error) {
unregister_blkdev(pmem_major, "pmem");
- return error;
+ return error;
+ }
+
+ return 0;
}
module_init(pmem_init);

static void pmem_exit(void)
{
- platform_driver_unregister(&pmem_driver);
+ driver_unregister(&nd_pmem_driver.drv);
unregister_blkdev(pmem_major, "pmem");
}
module_exit(pmem_exit);

2015-06-17 23:17:10

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 11/16] libnvdimm, nfit: add interleave-set state-tracking infrastructure

On platforms that have firmware support for reading/writing per-dimm
label space, a portion of the dimm may be accessible via an interleave
set PMEM mapping in addition to the dimm's BLK (block-data-window
aperture(s)) interface. A label, stored in a "configuration data
region" on the dimm, disambiguates which dimm addresses are accessed
through which exclusive interface.

Add infrastructure that allows the kernel to block modifications to a
label in the set while any member dimm is active. Note that this is
meant only for enforcing "no modifications of active labels" via the
coarse ioctl command. Adding/deleting namespaces from an active
interleave set is always possible via sysfs.

Another aspect of tracking interleave sets is tracking their integrity
when DIMMs in a set are physically re-ordered. For this purpose we
generate an "interleave-set cookie" that can be recorded in a label and
validated against the current configuration. It is the bus provider
implementation's responsibility to calculate the interleave set cookie
and attach it to a given region.

Cc: Neil Brown <[email protected]>
Cc: <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Robert Moore <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit.c | 93 +++++++++++++++++++++++++++++++++++++++++-
drivers/nvdimm/bus.c | 59 ++++++++++++++++++++++++++-
drivers/nvdimm/core.c | 17 ++++++++
drivers/nvdimm/dimm_devs.c | 19 ++++++++-
drivers/nvdimm/nd-core.h | 10 ++++-
drivers/nvdimm/nd.h | 1
drivers/nvdimm/region_devs.c | 69 +++++++++++++++++++++++++++++++
include/linux/libnvdimm.h | 6 +++
8 files changed, 269 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index ce290748fe36..35af6f7f0abd 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -16,6 +16,7 @@
#include <linux/ndctl.h>
#include <linux/list.h>
#include <linux/acpi.h>
+#include <linux/sort.h>
#include "nfit.h"

static bool force_enable_dimms;
@@ -785,6 +786,91 @@ static const struct attribute_group *acpi_nfit_region_attribute_groups[] = {
NULL,
};

+/* enough info to uniquely specify an interleave set */
+struct nfit_set_info {
+ struct nfit_set_info_map {
+ u64 region_offset;
+ u32 serial_number;
+ u32 pad;
+ } mapping[0];
+};
+
+static size_t sizeof_nfit_set_info(int num_mappings)
+{
+ return sizeof(struct nfit_set_info)
+ + num_mappings * sizeof(struct nfit_set_info_map);
+}
+
+static int cmp_map(const void *m0, const void *m1)
+{
+ const struct nfit_set_info_map *map0 = m0;
+ const struct nfit_set_info_map *map1 = m1;
+
+ return memcmp(&map0->region_offset, &map1->region_offset,
+ sizeof(u64));
+}
+
+/* Retrieve the nth entry referencing this spa */
+static struct acpi_nfit_memory_map *memdev_from_spa(
+ struct acpi_nfit_desc *acpi_desc, u16 range_index, int n)
+{
+ struct nfit_memdev *nfit_memdev;
+
+ list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list)
+ if (nfit_memdev->memdev->range_index == range_index)
+ if (n-- == 0)
+ return nfit_memdev->memdev;
+ return NULL;
+}
+
+static int acpi_nfit_init_interleave_set(struct acpi_nfit_desc *acpi_desc,
+ struct nd_region_desc *ndr_desc,
+ struct acpi_nfit_system_address *spa)
+{
+ int i, spa_type = nfit_spa_type(spa);
+ struct device *dev = acpi_desc->dev;
+ struct nd_interleave_set *nd_set;
+ u16 nr = ndr_desc->num_mappings;
+ struct nfit_set_info *info;
+
+ if (spa_type == NFIT_SPA_PM || spa_type == NFIT_SPA_VOLATILE)
+ /* pass */;
+ else
+ return 0;
+
+ nd_set = devm_kzalloc(dev, sizeof(*nd_set), GFP_KERNEL);
+ if (!nd_set)
+ return -ENOMEM;
+
+ info = devm_kzalloc(dev, sizeof_nfit_set_info(nr), GFP_KERNEL);
+ if (!info)
+ return -ENOMEM;
+ for (i = 0; i < nr; i++) {
+ struct nd_mapping *nd_mapping = &ndr_desc->nd_mapping[i];
+ struct nfit_set_info_map *map = &info->mapping[i];
+ struct nvdimm *nvdimm = nd_mapping->nvdimm;
+ struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm);
+ struct acpi_nfit_memory_map *memdev = memdev_from_spa(acpi_desc,
+ spa->range_index, i);
+
+ if (!memdev || !nfit_mem->dcr) {
+ dev_err(dev, "%s: failed to find DCR\n", __func__);
+ return -ENODEV;
+ }
+
+ map->region_offset = memdev->region_offset;
+ map->serial_number = nfit_mem->dcr->serial_number;
+ }
+
+ sort(&info->mapping[0], nr, sizeof(struct nfit_set_info_map),
+ cmp_map, NULL);
+ nd_set->cookie = nd_fletcher64(info, sizeof_nfit_set_info(nr), 0);
+ ndr_desc->nd_set = nd_set;
+ devm_kfree(dev, info);
+
+ return 0;
+}
+
static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
struct nd_mapping *nd_mapping, struct nd_region_desc *ndr_desc,
struct acpi_nfit_memory_map *memdev,
@@ -838,7 +924,7 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
struct nd_region_desc ndr_desc;
struct nvdimm_bus *nvdimm_bus;
struct resource res;
- int count = 0;
+ int count = 0, rc;

if (spa->range_index == 0) {
dev_dbg(acpi_desc->dev, "%s: detected invalid spa index\n",
@@ -857,7 +943,6 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
struct acpi_nfit_memory_map *memdev = nfit_memdev->memdev;
struct nd_mapping *nd_mapping;
- int rc;

if (memdev->range_index != spa->range_index)
continue;
@@ -875,6 +960,10 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,

ndr_desc.nd_mapping = nd_mappings;
ndr_desc.num_mappings = count;
+ rc = acpi_nfit_init_interleave_set(acpi_desc, &ndr_desc, spa);
+ if (rc)
+ return rc;
+
nvdimm_bus = acpi_desc->nvdimm_bus;
if (nfit_spa_type(spa) == NFIT_SPA_PM) {
if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 4b77665a6cc8..ffb43cada625 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -68,6 +68,21 @@ static struct module *to_bus_provider(struct device *dev)
return NULL;
}

+static void nvdimm_bus_probe_start(struct nvdimm_bus *nvdimm_bus)
+{
+ nvdimm_bus_lock(&nvdimm_bus->dev);
+ nvdimm_bus->probe_active++;
+ nvdimm_bus_unlock(&nvdimm_bus->dev);
+}
+
+static void nvdimm_bus_probe_end(struct nvdimm_bus *nvdimm_bus)
+{
+ nvdimm_bus_lock(&nvdimm_bus->dev);
+ if (--nvdimm_bus->probe_active == 0)
+ wake_up(&nvdimm_bus->probe_wait);
+ nvdimm_bus_unlock(&nvdimm_bus->dev);
+}
+
static int nvdimm_bus_probe(struct device *dev)
{
struct nd_device_driver *nd_drv = to_nd_device_driver(dev->driver);
@@ -78,7 +93,12 @@ static int nvdimm_bus_probe(struct device *dev)
if (!try_module_get(provider))
return -ENXIO;

+ nvdimm_bus_probe_start(nvdimm_bus);
rc = nd_drv->probe(dev);
+ if (rc == 0)
+ nd_region_probe_success(nvdimm_bus, dev);
+ nvdimm_bus_probe_end(nvdimm_bus);
+
dev_dbg(&nvdimm_bus->dev, "%s.probe(%s) = %d\n", dev->driver->name,
dev_name(dev), rc);
if (rc != 0)
@@ -94,6 +114,8 @@ static int nvdimm_bus_remove(struct device *dev)
int rc;

rc = nd_drv->remove(dev);
+ nd_region_disable(nvdimm_bus, dev);
+
dev_dbg(&nvdimm_bus->dev, "%s.remove(%s) = %d\n", dev->driver->name,
dev_name(dev), rc);
module_put(provider);
@@ -359,6 +381,34 @@ u32 nd_cmd_out_size(struct nvdimm *nvdimm, int cmd,
}
EXPORT_SYMBOL_GPL(nd_cmd_out_size);

+static void wait_nvdimm_bus_probe_idle(struct nvdimm_bus *nvdimm_bus)
+{
+ do {
+ if (nvdimm_bus->probe_active == 0)
+ break;
+ nvdimm_bus_unlock(&nvdimm_bus->dev);
+ wait_event(nvdimm_bus->probe_wait,
+ nvdimm_bus->probe_active == 0);
+ nvdimm_bus_lock(&nvdimm_bus->dev);
+ } while (true);
+}
+
+/* set_config requires an idle interleave set */
+static int nd_cmd_clear_to_send(struct nvdimm *nvdimm, unsigned int cmd)
+{
+ struct nvdimm_bus *nvdimm_bus;
+
+ if (!nvdimm || cmd != ND_CMD_SET_CONFIG_DATA)
+ return 0;
+
+ nvdimm_bus = walk_to_nvdimm_bus(&nvdimm->dev);
+ wait_nvdimm_bus_probe_idle(nvdimm_bus);
+
+ if (atomic_read(&nvdimm->busy))
+ return -EBUSY;
+ return 0;
+}
+
static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
int read_only, unsigned int ioctl_cmd, unsigned long arg)
{
@@ -469,11 +519,18 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
goto out;
}

+ nvdimm_bus_lock(&nvdimm_bus->dev);
+ rc = nd_cmd_clear_to_send(nvdimm, cmd);
+ if (rc)
+ goto out_unlock;
+
rc = nd_desc->ndctl(nd_desc, nvdimm, cmd, buf, buf_len);
if (rc < 0)
- goto out;
+ goto out_unlock;
if (copy_to_user(p, buf, buf_len))
rc = -EFAULT;
+ out_unlock:
+ nvdimm_bus_unlock(&nvdimm_bus->dev);
out:
vfree(buf);
return rc;
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index 1b6b15d11f54..7806eaaf4707 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -54,6 +54,22 @@ bool is_nvdimm_bus_locked(struct device *dev)
}
EXPORT_SYMBOL(is_nvdimm_bus_locked);

+u64 nd_fletcher64(void *addr, size_t len, bool le)
+{
+ u32 *buf = addr;
+ u32 lo32 = 0;
+ u64 hi32 = 0;
+ int i;
+
+ for (i = 0; i < len / sizeof(u32); i++) {
+ lo32 += le ? le32_to_cpu((__le32) buf[i]) : buf[i];
+ hi32 += lo32;
+ }
+
+ return hi32 << 32 | lo32;
+}
+EXPORT_SYMBOL_GPL(nd_fletcher64);
+
static void nvdimm_bus_release(struct device *dev)
{
struct nvdimm_bus *nvdimm_bus;
@@ -175,6 +191,7 @@ struct nvdimm_bus *__nvdimm_bus_register(struct device *parent,
if (!nvdimm_bus)
return NULL;
INIT_LIST_HEAD(&nvdimm_bus->list);
+ init_waitqueue_head(&nvdimm_bus->probe_wait);
nvdimm_bus->id = ida_simple_get(&nd_ida, 0, 0, GFP_KERNEL);
mutex_init(&nvdimm_bus->reconfig_mutex);
if (nvdimm_bus->id < 0) {
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index b3ae86f2e1da..bdf8241b6525 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -185,7 +185,24 @@ static ssize_t commands_show(struct device *dev,
}
static DEVICE_ATTR_RO(commands);

+static ssize_t state_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct nvdimm *nvdimm = to_nvdimm(dev);
+
+ /*
+ * The state may be in the process of changing, userspace should
+ * quiesce probing if it wants a static answer
+ */
+ nvdimm_bus_lock(dev);
+ nvdimm_bus_unlock(dev);
+ return sprintf(buf, "%s\n", atomic_read(&nvdimm->busy)
+ ? "active" : "idle");
+}
+static DEVICE_ATTR_RO(state);
+
static struct attribute *nvdimm_attributes[] = {
+ &dev_attr_state.attr,
&dev_attr_commands.attr,
NULL,
};
@@ -213,7 +230,7 @@ struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
nvdimm->provider_data = provider_data;
nvdimm->flags = flags;
nvdimm->dsm_mask = dsm_mask;
-
+ atomic_set(&nvdimm->busy, 0);
dev = &nvdimm->dev;
dev_set_name(dev, "nmem%d", nvdimm->id);
dev->parent = &nvdimm_bus->dev;
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 0e9b41fd2546..6a4b2c066ee7 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -14,6 +14,9 @@
#define __ND_CORE_H__
#include <linux/libnvdimm.h>
#include <linux/device.h>
+#include <linux/libnvdimm.h>
+#include <linux/sizes.h>
+#include <linux/mutex.h>

extern struct list_head nvdimm_bus_list;
extern struct mutex nvdimm_bus_list_mutex;
@@ -21,10 +24,11 @@ extern int nvdimm_major;

struct nvdimm_bus {
struct nvdimm_bus_descriptor *nd_desc;
+ wait_queue_head_t probe_wait;
struct module *module;
struct list_head list;
struct device dev;
- int id;
+ int id, probe_active;
struct mutex reconfig_mutex;
};

@@ -33,6 +37,7 @@ struct nvdimm {
void *provider_data;
unsigned long *dsm_mask;
struct device dev;
+ atomic_t busy;
int id;
};

@@ -42,10 +47,13 @@ bool is_nd_pmem(struct device *dev);
struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev);
int __init nvdimm_bus_init(void);
void nvdimm_bus_exit(void);
+void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device *dev);
+void nd_region_disable(struct nvdimm_bus *nvdimm_bus, struct device *dev);
int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus);
void nvdimm_bus_destroy_ndctl(struct nvdimm_bus *nvdimm_bus);
void nd_synchronize(void);
int nvdimm_bus_register_dimms(struct nvdimm_bus *nvdimm_bus);
int nvdimm_bus_register_regions(struct nvdimm_bus *nvdimm_bus);
+int nvdimm_bus_init_interleave_sets(struct nvdimm_bus *nvdimm_bus);
int nd_match_dimm(struct device *dev, void *data);
#endif /* __ND_CORE_H__ */
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index bc5a08e36a25..0285e4588b03 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -35,6 +35,7 @@ struct nd_region {
u64 ndr_start;
int id;
void *provider_data;
+ struct nd_interleave_set *nd_set;
struct nd_mapping mapping[0];
};

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b5c5b9095b28..1571424578f0 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -10,7 +10,10 @@
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*/
+#include <linux/scatterlist.h>
+#include <linux/sched.h>
#include <linux/slab.h>
+#include <linux/sort.h>
#include <linux/io.h>
#include "nd-core.h"
#include "nd.h"
@@ -133,6 +136,21 @@ static ssize_t nstype_show(struct device *dev,
}
static DEVICE_ATTR_RO(nstype);

+static ssize_t set_cookie_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+ struct nd_interleave_set *nd_set = nd_region->nd_set;
+
+ if (is_nd_pmem(dev) && nd_set)
+ /* pass, should be precluded by region_visible */;
+ else
+ return -ENXIO;
+
+ return sprintf(buf, "%#llx\n", nd_set->cookie);
+}
+static DEVICE_ATTR_RO(set_cookie);
+
static ssize_t init_namespaces_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -154,15 +172,65 @@ static struct attribute *nd_region_attributes[] = {
&dev_attr_size.attr,
&dev_attr_nstype.attr,
&dev_attr_mappings.attr,
+ &dev_attr_set_cookie.attr,
&dev_attr_init_namespaces.attr,
NULL,
};

+static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+ struct device *dev = container_of(kobj, typeof(*dev), kobj);
+ struct nd_region *nd_region = to_nd_region(dev);
+ struct nd_interleave_set *nd_set = nd_region->nd_set;
+
+ if (a != &dev_attr_set_cookie.attr)
+ return a->mode;
+
+ if (is_nd_pmem(dev) && nd_set)
+ return a->mode;
+
+ return 0;
+}
+
struct attribute_group nd_region_attribute_group = {
.attrs = nd_region_attributes,
+ .is_visible = region_visible,
};
EXPORT_SYMBOL_GPL(nd_region_attribute_group);

+/*
+ * Upon successful probe/remove, take/release a reference on the
+ * associated interleave set (if present)
+ */
+static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
+ struct device *dev, bool probe)
+{
+ if (is_nd_pmem(dev) || is_nd_blk(dev)) {
+ struct nd_region *nd_region = to_nd_region(dev);
+ int i;
+
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+ if (probe)
+ atomic_inc(&nvdimm->busy);
+ else
+ atomic_dec(&nvdimm->busy);
+ }
+ }
+}
+
+void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device *dev)
+{
+ nd_region_notify_driver_action(nvdimm_bus, dev, true);
+}
+
+void nd_region_disable(struct nvdimm_bus *nvdimm_bus, struct device *dev)
+{
+ nd_region_notify_driver_action(nvdimm_bus, dev, false);
+}
+
static ssize_t mappingN(struct device *dev, char *buf, int n)
{
struct nd_region *nd_region = to_nd_region(dev);
@@ -322,6 +390,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
}
nd_region->ndr_mappings = ndr_desc->num_mappings;
nd_region->provider_data = ndr_desc->provider_data;
+ nd_region->nd_set = ndr_desc->nd_set;
dev = &nd_region->dev;
dev_set_name(dev, "region%d", nd_region->id);
dev->parent = &nvdimm_bus->dev;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 37f966aff386..1b627b109360 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -61,11 +61,16 @@ struct nd_cmd_desc {
int out_sizes[ND_CMD_MAX_ELEM];
};

+struct nd_interleave_set {
+ u64 cookie;
+};
+
struct nd_region_desc {
struct resource *res;
struct nd_mapping *nd_mapping;
u16 num_mappings;
const struct attribute_group **attr_groups;
+ struct nd_interleave_set *nd_set;
void *provider_data;
};

@@ -101,4 +106,5 @@ struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
struct nd_region_desc *ndr_desc);
struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
struct nd_region_desc *ndr_desc);
+u64 nd_fletcher64(void *addr, size_t len, bool le);
#endif /* __LIBNVDIMM_H__ */

2015-06-17 23:17:14

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 12/16] libnvdimm: namespace indices: read and validate

This on media label format [1] consists of two index blocks followed by
an array of labels. None of these structures are ever updated in place.
A sequence number tracks the current active index and the next one to
write, while labels are written to free slots.

+------------+
| |
| nsindex0 |
| |
+------------+
| |
| nsindex1 |
| |
+------------+
| label0 |
+------------+
| label1 |
+------------+
| |
....nslot...
| |
+------------+
| labelN |
+------------+

After reading valid labels, store the dpa ranges they claim into
per-dimm resource trees.

[1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf

Cc: Neil Brown <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/Makefile | 1
drivers/nvdimm/dimm.c | 23 +++
drivers/nvdimm/dimm_devs.c | 30 ++++-
drivers/nvdimm/label.c | 290 ++++++++++++++++++++++++++++++++++++++++++++
drivers/nvdimm/label.h | 128 +++++++++++++++++++
drivers/nvdimm/nd.h | 49 +++++++
include/uapi/linux/ndctl.h | 1
7 files changed, 520 insertions(+), 2 deletions(-)
create mode 100644 drivers/nvdimm/label.c
create mode 100644 drivers/nvdimm/label.h

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 4d2a27f52faa..abce98f87f16 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -10,3 +10,4 @@ libnvdimm-y += dimm.o
libnvdimm-y += region_devs.o
libnvdimm-y += region.o
libnvdimm-y += namespace_devs.o
+libnvdimm-y += label.o
diff --git a/drivers/nvdimm/dimm.c b/drivers/nvdimm/dimm.c
index eb20fc2df32b..2df97c3c3b34 100644
--- a/drivers/nvdimm/dimm.c
+++ b/drivers/nvdimm/dimm.c
@@ -18,6 +18,7 @@
#include <linux/slab.h>
#include <linux/mm.h>
#include <linux/nd.h>
+#include "label.h"
#include "nd.h"

static void free_data(struct nvdimm_drvdata *ndd)
@@ -42,6 +43,11 @@ static int nvdimm_probe(struct device *dev)
return -ENOMEM;

dev_set_drvdata(dev, ndd);
+ ndd->dpa.name = dev_name(dev);
+ ndd->ns_current = -1;
+ ndd->ns_next = -1;
+ ndd->dpa.start = 0;
+ ndd->dpa.end = -1;
ndd->dev = dev;

rc = nvdimm_init_nsarea(ndd);
@@ -54,6 +60,17 @@ static int nvdimm_probe(struct device *dev)

dev_dbg(dev, "config data size: %d\n", ndd->nsarea.config_size);

+ nvdimm_bus_lock(dev);
+ ndd->ns_current = nd_label_validate(ndd);
+ ndd->ns_next = nd_label_next_nsindex(ndd->ns_current);
+ nd_label_copy(ndd, to_next_namespace_index(ndd),
+ to_current_namespace_index(ndd));
+ rc = nd_label_reserve_dpa(ndd);
+ nvdimm_bus_unlock(dev);
+
+ if (rc)
+ goto err;
+
return 0;

err:
@@ -64,7 +81,13 @@ static int nvdimm_probe(struct device *dev)
static int nvdimm_remove(struct device *dev)
{
struct nvdimm_drvdata *ndd = dev_get_drvdata(dev);
+ struct resource *res, *_r;

+ nvdimm_bus_lock(dev);
+ dev_set_drvdata(dev, NULL);
+ for_each_dpa_resource_safe(ndd, res, _r)
+ nvdimm_free_dpa(ndd, res);
+ nvdimm_bus_unlock(dev);
free_data(ndd);

return 0;
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index bdf8241b6525..d2ef02e4be6c 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -92,8 +92,12 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
if (ndd->data)
return 0;

- if (ndd->nsarea.status || ndd->nsarea.max_xfer == 0)
+ if (ndd->nsarea.status || ndd->nsarea.max_xfer == 0
+ || ndd->nsarea.config_size < ND_LABEL_MIN_SIZE) {
+ dev_dbg(ndd->dev, "failed to init config data area: (%d:%d)\n",
+ ndd->nsarea.max_xfer, ndd->nsarea.config_size);
return -ENXIO;
+ }

ndd->data = kmalloc(ndd->nsarea.config_size, GFP_KERNEL);
if (!ndd->data)
@@ -243,6 +247,30 @@ struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
}
EXPORT_SYMBOL_GPL(nvdimm_create);

+void nvdimm_free_dpa(struct nvdimm_drvdata *ndd, struct resource *res)
+{
+ WARN_ON_ONCE(!is_nvdimm_bus_locked(ndd->dev));
+ kfree(res->name);
+ __release_region(&ndd->dpa, res->start, resource_size(res));
+}
+
+struct resource *nvdimm_allocate_dpa(struct nvdimm_drvdata *ndd,
+ struct nd_label_id *label_id, resource_size_t start,
+ resource_size_t n)
+{
+ char *name = kmemdup(label_id, sizeof(*label_id), GFP_KERNEL);
+ struct resource *res;
+
+ if (!name)
+ return NULL;
+
+ WARN_ON_ONCE(!is_nvdimm_bus_locked(ndd->dev));
+ res = __request_region(&ndd->dpa, start, n, name, 0);
+ if (!res)
+ kfree(name);
+ return res;
+}
+
static int count_dimms(struct device *dev, void *c)
{
int *count = c;
diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
new file mode 100644
index 000000000000..db5d7492dc8d
--- /dev/null
+++ b/drivers/nvdimm/label.c
@@ -0,0 +1,290 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/device.h>
+#include <linux/ndctl.h>
+#include <linux/io.h>
+#include <linux/nd.h>
+#include "nd-core.h"
+#include "label.h"
+#include "nd.h"
+
+static u32 best_seq(u32 a, u32 b)
+{
+ a &= NSINDEX_SEQ_MASK;
+ b &= NSINDEX_SEQ_MASK;
+
+ if (a == 0 || a == b)
+ return b;
+ else if (b == 0)
+ return a;
+ else if (nd_inc_seq(a) == b)
+ return b;
+ else
+ return a;
+}
+
+size_t sizeof_namespace_index(struct nvdimm_drvdata *ndd)
+{
+ u32 index_span;
+
+ if (ndd->nsindex_size)
+ return ndd->nsindex_size;
+
+ /*
+ * The minimum index space is 512 bytes, with that amount of
+ * index we can describe ~1400 labels which is less than a byte
+ * of overhead per label. Round up to a byte of overhead per
+ * label and determine the size of the index region. Yes, this
+ * starts to waste space at larger config_sizes, but it's
+ * unlikely we'll ever see anything but 128K.
+ */
+ index_span = ndd->nsarea.config_size / 129;
+ index_span /= NSINDEX_ALIGN * 2;
+ ndd->nsindex_size = index_span * NSINDEX_ALIGN;
+
+ return ndd->nsindex_size;
+}
+
+int nd_label_validate(struct nvdimm_drvdata *ndd)
+{
+ /*
+ * On media label format consists of two index blocks followed
+ * by an array of labels. None of these structures are ever
+ * updated in place. A sequence number tracks the current
+ * active index and the next one to write, while labels are
+ * written to free slots.
+ *
+ * +------------+
+ * | |
+ * | nsindex0 |
+ * | |
+ * +------------+
+ * | |
+ * | nsindex1 |
+ * | |
+ * +------------+
+ * | label0 |
+ * +------------+
+ * | label1 |
+ * +------------+
+ * | |
+ * ....nslot...
+ * | |
+ * +------------+
+ * | labelN |
+ * +------------+
+ */
+ struct nd_namespace_index *nsindex[] = {
+ to_namespace_index(ndd, 0),
+ to_namespace_index(ndd, 1),
+ };
+ const int num_index = ARRAY_SIZE(nsindex);
+ struct device *dev = ndd->dev;
+ bool valid[2] = { 0 };
+ int i, num_valid = 0;
+ u32 seq;
+
+ for (i = 0; i < num_index; i++) {
+ u32 nslot;
+ u8 sig[NSINDEX_SIG_LEN];
+ u64 sum_save, sum, size;
+
+ memcpy(sig, nsindex[i]->sig, NSINDEX_SIG_LEN);
+ if (memcmp(sig, NSINDEX_SIGNATURE, NSINDEX_SIG_LEN) != 0) {
+ dev_dbg(dev, "%s: nsindex%d signature invalid\n",
+ __func__, i);
+ continue;
+ }
+ sum_save = __le64_to_cpu(nsindex[i]->checksum);
+ nsindex[i]->checksum = __cpu_to_le64(0);
+ sum = nd_fletcher64(nsindex[i], sizeof_namespace_index(ndd), 1);
+ nsindex[i]->checksum = __cpu_to_le64(sum_save);
+ if (sum != sum_save) {
+ dev_dbg(dev, "%s: nsindex%d checksum invalid\n",
+ __func__, i);
+ continue;
+ }
+
+ seq = __le32_to_cpu(nsindex[i]->seq);
+ if ((seq & NSINDEX_SEQ_MASK) == 0) {
+ dev_dbg(dev, "%s: nsindex%d sequence: %#x invalid\n",
+ __func__, i, seq);
+ continue;
+ }
+
+ /* sanity check the index against expected values */
+ if (__le64_to_cpu(nsindex[i]->myoff)
+ != i * sizeof_namespace_index(ndd)) {
+ dev_dbg(dev, "%s: nsindex%d myoff: %#llx invalid\n",
+ __func__, i, (unsigned long long)
+ __le64_to_cpu(nsindex[i]->myoff));
+ continue;
+ }
+ if (__le64_to_cpu(nsindex[i]->otheroff)
+ != (!i) * sizeof_namespace_index(ndd)) {
+ dev_dbg(dev, "%s: nsindex%d otheroff: %#llx invalid\n",
+ __func__, i, (unsigned long long)
+ __le64_to_cpu(nsindex[i]->otheroff));
+ continue;
+ }
+
+ size = __le64_to_cpu(nsindex[i]->mysize);
+ if (size > sizeof_namespace_index(ndd)
+ || size < sizeof(struct nd_namespace_index)) {
+ dev_dbg(dev, "%s: nsindex%d mysize: %#llx invalid\n",
+ __func__, i, size);
+ continue;
+ }
+
+ nslot = __le32_to_cpu(nsindex[i]->nslot);
+ if (nslot * sizeof(struct nd_namespace_label)
+ + 2 * sizeof_namespace_index(ndd)
+ > ndd->nsarea.config_size) {
+ dev_dbg(dev, "%s: nsindex%d nslot: %u invalid, config_size: %#x\n",
+ __func__, i, nslot,
+ ndd->nsarea.config_size);
+ continue;
+ }
+ valid[i] = true;
+ num_valid++;
+ }
+
+ switch (num_valid) {
+ case 0:
+ break;
+ case 1:
+ for (i = 0; i < num_index; i++)
+ if (valid[i])
+ return i;
+ /* can't have num_valid > 0 but valid[] = { false, false } */
+ WARN_ON(1);
+ break;
+ default:
+ /* pick the best index... */
+ seq = best_seq(__le32_to_cpu(nsindex[0]->seq),
+ __le32_to_cpu(nsindex[1]->seq));
+ if (seq == (__le32_to_cpu(nsindex[1]->seq) & NSINDEX_SEQ_MASK))
+ return 1;
+ else
+ return 0;
+ break;
+ }
+
+ return -1;
+}
+
+void nd_label_copy(struct nvdimm_drvdata *ndd, struct nd_namespace_index *dst,
+ struct nd_namespace_index *src)
+{
+ if (dst && src)
+ /* pass */;
+ else
+ return;
+
+ memcpy(dst, src, sizeof_namespace_index(ndd));
+}
+
+static struct nd_namespace_label *nd_label_base(struct nvdimm_drvdata *ndd)
+{
+ void *base = to_namespace_index(ndd, 0);
+
+ return base + 2 * sizeof_namespace_index(ndd);
+}
+
+#define for_each_clear_bit_le(bit, addr, size) \
+ for ((bit) = find_next_zero_bit_le((addr), (size), 0); \
+ (bit) < (size); \
+ (bit) = find_next_zero_bit_le((addr), (size), (bit) + 1))
+
+/**
+ * preamble_current - common variable initialization for nd_label_* routines
+ * @ndd: dimm container for the relevant label set
+ * @nsindex_out: on return set to the currently active namespace index
+ * @free: on return set to the free label bitmap in the index
+ * @nslot: on return set to the number of slots in the label space
+ */
+static bool preamble_current(struct nvdimm_drvdata *ndd,
+ struct nd_namespace_index **nsindex_out,
+ unsigned long **free, u32 *nslot)
+{
+ struct nd_namespace_index *nsindex;
+
+ nsindex = to_current_namespace_index(ndd);
+ if (nsindex == NULL)
+ return false;
+
+ *free = (unsigned long *) nsindex->free;
+ *nslot = __le32_to_cpu(nsindex->nslot);
+ *nsindex_out = nsindex;
+
+ return true;
+}
+
+static char *nd_label_gen_id(struct nd_label_id *label_id, u8 *uuid, u32 flags)
+{
+ if (!label_id || !uuid)
+ return NULL;
+ snprintf(label_id->id, ND_LABEL_ID_SIZE, "%s-%pUb",
+ flags & NSLABEL_FLAG_LOCAL ? "blk" : "pmem", uuid);
+ return label_id->id;
+}
+
+static bool slot_valid(struct nd_namespace_label *nd_label, u32 slot)
+{
+ /* check that we are written where we expect to be written */
+ if (slot != __le32_to_cpu(nd_label->slot))
+ return false;
+
+ /* check that DPA allocations are page aligned */
+ if ((__le64_to_cpu(nd_label->dpa)
+ | __le64_to_cpu(nd_label->rawsize)) % SZ_4K)
+ return false;
+
+ return true;
+}
+
+int nd_label_reserve_dpa(struct nvdimm_drvdata *ndd)
+{
+ struct nd_namespace_index *nsindex;
+ unsigned long *free;
+ u32 nslot, slot;
+
+ if (!preamble_current(ndd, &nsindex, &free, &nslot))
+ return 0; /* no label, nothing to reserve */
+
+ for_each_clear_bit_le(slot, free, nslot) {
+ struct nd_namespace_label *nd_label;
+ struct nd_region *nd_region = NULL;
+ u8 label_uuid[NSLABEL_UUID_LEN];
+ struct nd_label_id label_id;
+ struct resource *res;
+ u32 flags;
+
+ nd_label = nd_label_base(ndd) + slot;
+
+ if (!slot_valid(nd_label, slot))
+ continue;
+
+ memcpy(label_uuid, nd_label->uuid, NSLABEL_UUID_LEN);
+ flags = __le32_to_cpu(nd_label->flags);
+ nd_label_gen_id(&label_id, label_uuid, flags);
+ res = nvdimm_allocate_dpa(ndd, &label_id,
+ __le64_to_cpu(nd_label->dpa),
+ __le64_to_cpu(nd_label->rawsize));
+ nd_dbg_dpa(nd_region, ndd, res, "reserve\n");
+ if (!res)
+ return -EBUSY;
+ }
+
+ return 0;
+}
diff --git a/drivers/nvdimm/label.h b/drivers/nvdimm/label.h
new file mode 100644
index 000000000000..d6aa0d5c6b4e
--- /dev/null
+++ b/drivers/nvdimm/label.h
@@ -0,0 +1,128 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#ifndef __LABEL_H__
+#define __LABEL_H__
+
+#include <linux/ndctl.h>
+#include <linux/sizes.h>
+#include <linux/io.h>
+
+enum {
+ NSINDEX_SIG_LEN = 16,
+ NSINDEX_ALIGN = 256,
+ NSINDEX_SEQ_MASK = 0x3,
+ NSLABEL_UUID_LEN = 16,
+ NSLABEL_NAME_LEN = 64,
+ NSLABEL_FLAG_ROLABEL = 0x1, /* read-only label */
+ NSLABEL_FLAG_LOCAL = 0x2, /* DIMM-local namespace */
+ NSLABEL_FLAG_BTT = 0x4, /* namespace contains a BTT */
+ NSLABEL_FLAG_UPDATING = 0x8, /* label being updated */
+ BTT_ALIGN = 4096, /* all btt structures */
+ BTTINFO_SIG_LEN = 16,
+ BTTINFO_UUID_LEN = 16,
+ BTTINFO_FLAG_ERROR = 0x1, /* error state (read-only) */
+ BTTINFO_MAJOR_VERSION = 1,
+ ND_LABEL_MIN_SIZE = 512 * 129, /* see sizeof_namespace_index() */
+ ND_LABEL_ID_SIZE = 50,
+};
+
+static const char NSINDEX_SIGNATURE[] = "NAMESPACE_INDEX\0";
+
+/**
+ * struct nd_namespace_index - label set superblock
+ * @sig: NAMESPACE_INDEX\0
+ * @flags: placeholder
+ * @seq: sequence number for this index
+ * @myoff: offset of this index in label area
+ * @mysize: size of this index struct
+ * @otheroff: offset of other index
+ * @labeloff: offset of first label slot
+ * @nslot: total number of label slots
+ * @major: label area major version
+ * @minor: label area minor version
+ * @checksum: fletcher64 of all fields
+ * @free[0]: bitmap, nlabel bits
+ *
+ * The size of free[] is rounded up so the total struct size is a
+ * multiple of NSINDEX_ALIGN bytes. Any bits this allocates beyond
+ * nlabel bits must be zero.
+ */
+struct nd_namespace_index {
+ u8 sig[NSINDEX_SIG_LEN];
+ __le32 flags;
+ __le32 seq;
+ __le64 myoff;
+ __le64 mysize;
+ __le64 otheroff;
+ __le64 labeloff;
+ __le32 nslot;
+ __le16 major;
+ __le16 minor;
+ __le64 checksum;
+ u8 free[0];
+};
+
+/**
+ * struct nd_namespace_label - namespace superblock
+ * @uuid: UUID per RFC 4122
+ * @name: optional name (NULL-terminated)
+ * @flags: see NSLABEL_FLAG_*
+ * @nlabel: num labels to describe this ns
+ * @position: labels position in set
+ * @isetcookie: interleave set cookie
+ * @lbasize: LBA size in bytes or 0 for pmem
+ * @dpa: DPA of NVM range on this DIMM
+ * @rawsize: size of namespace
+ * @slot: slot of this label in label area
+ * @unused: must be zero
+ */
+struct nd_namespace_label {
+ u8 uuid[NSLABEL_UUID_LEN];
+ u8 name[NSLABEL_NAME_LEN];
+ __le32 flags;
+ __le16 nlabel;
+ __le16 position;
+ __le64 isetcookie;
+ __le64 lbasize;
+ __le64 dpa;
+ __le64 rawsize;
+ __le32 slot;
+ __le32 unused;
+};
+
+/**
+ * struct nd_label_id - identifier string for dpa allocation
+ * @id: "{blk|pmem}-<namespace uuid>"
+ */
+struct nd_label_id {
+ char id[ND_LABEL_ID_SIZE];
+};
+
+/*
+ * If the 'best' index is invalid, so is the 'next' index. Otherwise,
+ * the next index is MOD(index+1, 2)
+ */
+static inline int nd_label_next_nsindex(int index)
+{
+ if (index < 0)
+ return -1;
+
+ return (index + 1) % 2;
+}
+
+struct nvdimm_drvdata;
+int nd_label_validate(struct nvdimm_drvdata *ndd);
+void nd_label_copy(struct nvdimm_drvdata *ndd, struct nd_namespace_index *dst,
+ struct nd_namespace_index *src);
+size_t sizeof_namespace_index(struct nvdimm_drvdata *ndd);
+#endif /* __LABEL_H__ */
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 0285e4588b03..401fa0d5b6ea 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -16,11 +16,15 @@
#include <linux/device.h>
#include <linux/mutex.h>
#include <linux/ndctl.h>
+#include "label.h"

struct nvdimm_drvdata {
struct device *dev;
+ int nsindex_size;
struct nd_cmd_get_config_size nsarea;
void *data;
+ int ns_current, ns_next;
+ struct resource dpa;
};

struct nd_region_namespaces {
@@ -28,6 +32,37 @@ struct nd_region_namespaces {
int active;
};

+static inline struct nd_namespace_index *to_namespace_index(
+ struct nvdimm_drvdata *ndd, int i)
+{
+ if (i < 0)
+ return NULL;
+
+ return ndd->data + sizeof_namespace_index(ndd) * i;
+}
+
+static inline struct nd_namespace_index *to_current_namespace_index(
+ struct nvdimm_drvdata *ndd)
+{
+ return to_namespace_index(ndd, ndd->ns_current);
+}
+
+static inline struct nd_namespace_index *to_next_namespace_index(
+ struct nvdimm_drvdata *ndd)
+{
+ return to_namespace_index(ndd, ndd->ns_next);
+}
+
+#define nd_dbg_dpa(r, d, res, fmt, arg...) \
+ dev_dbg((r) ? &(r)->dev : (d)->dev, "%s: %.13s: %#llx @ %#llx " fmt, \
+ (r) ? dev_name((d)->dev) : "", res ? res->name : "null", \
+ (unsigned long long) (res ? resource_size(res) : 0), \
+ (unsigned long long) (res ? res->start : 0), ##arg)
+
+#define for_each_dpa_resource_safe(ndd, res, next) \
+ for (res = (ndd)->dpa.child, next = res ? res->sibling : NULL; \
+ res; res = next, next = next ? next->sibling : NULL)
+
struct nd_region {
struct device dev;
u16 ndr_mappings;
@@ -39,6 +74,15 @@ struct nd_region {
struct nd_mapping mapping[0];
};

+/*
+ * Lookup next in the repeating sequence of 01, 10, and 11.
+ */
+static inline unsigned nd_inc_seq(unsigned seq)
+{
+ static const unsigned next[] = { 0, 2, 3, 1 };
+
+ return next[seq & 3];
+}
enum nd_async_mode {
ND_SYNC,
ND_ASYNC,
@@ -58,4 +102,9 @@ int nd_region_register_namespaces(struct nd_region *nd_region, int *err);
void nvdimm_bus_lock(struct device *dev);
void nvdimm_bus_unlock(struct device *dev);
bool is_nvdimm_bus_locked(struct device *dev);
+int nd_label_reserve_dpa(struct nvdimm_drvdata *ndd);
+void nvdimm_free_dpa(struct nvdimm_drvdata *ndd, struct resource *res);
+struct resource *nvdimm_allocate_dpa(struct nvdimm_drvdata *ndd,
+ struct nd_label_id *label_id, resource_size_t start,
+ resource_size_t n);
#endif /* __ND_H__ */
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index 174b6371dcc1..1357a87b8714 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -175,7 +175,6 @@ static inline const char *nvdimm_cmd_name(unsigned cmd)
#define ND_IOCTL_ARS_STATUS _IOWR(ND_IOCTL, ND_CMD_ARS_STATUS,\
struct nd_cmd_ars_status)

-
#define ND_DEVICE_DIMM 1 /* nd_dimm: container for "config data" */
#define ND_DEVICE_REGION_PMEM 2 /* nd_region: (parent of PMEM namespaces) */
#define ND_DEVICE_REGION_BLK 3 /* nd_region: (parent of BLK namespaces) */

2015-06-17 23:17:26

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 13/16] libnvdimm: pmem label sets and namespace instantiation.

A complete label set is a PMEM-label per-dimm per-interleave-set where
all the UUIDs match and the interleave set cookie matches the hosting
interleave set.

Present sysfs attributes for manipulation of a PMEM-namespace's
'alt_name', 'uuid', and 'size' attributes. A later patch will make
these settings persistent by writing back the label.

Note that PMEM allocations grow forwards from the start of an interleave
set (lowest dimm-physical-address (DPA)). BLK-namespaces that alias
with a PMEM interleave set will grow allocations backward from the
highest DPA.

Cc: Greg KH <[email protected]>
Cc: Neil Brown <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/bus.c | 8
drivers/nvdimm/core.c | 64 ++
drivers/nvdimm/dimm.c | 21 -
drivers/nvdimm/dimm_devs.c | 137 +++++
drivers/nvdimm/label.c | 55 ++
drivers/nvdimm/label.h | 2
drivers/nvdimm/namespace_devs.c | 1002 +++++++++++++++++++++++++++++++++++++++
drivers/nvdimm/nd-core.h | 12
drivers/nvdimm/nd.h | 17 +
drivers/nvdimm/pmem.c | 20 +
drivers/nvdimm/region.c | 3
drivers/nvdimm/region_devs.c | 158 ++++++
include/linux/libnvdimm.h | 10
include/linux/nd.h | 24 +
include/uapi/linux/ndctl.h | 4
15 files changed, 1506 insertions(+), 31 deletions(-)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index ffb43cada625..fddc3f2a8f80 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -97,6 +97,8 @@ static int nvdimm_bus_probe(struct device *dev)
rc = nd_drv->probe(dev);
if (rc == 0)
nd_region_probe_success(nvdimm_bus, dev);
+ else
+ nd_region_disable(nvdimm_bus, dev);
nvdimm_bus_probe_end(nvdimm_bus);

dev_dbg(&nvdimm_bus->dev, "%s.probe(%s) = %d\n", dev->driver->name,
@@ -381,8 +383,10 @@ u32 nd_cmd_out_size(struct nvdimm *nvdimm, int cmd,
}
EXPORT_SYMBOL_GPL(nd_cmd_out_size);

-static void wait_nvdimm_bus_probe_idle(struct nvdimm_bus *nvdimm_bus)
+void wait_nvdimm_bus_probe_idle(struct device *dev)
{
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+
do {
if (nvdimm_bus->probe_active == 0)
break;
@@ -402,7 +406,7 @@ static int nd_cmd_clear_to_send(struct nvdimm *nvdimm, unsigned int cmd)
return 0;

nvdimm_bus = walk_to_nvdimm_bus(&nvdimm->dev);
- wait_nvdimm_bus_probe_idle(nvdimm_bus);
+ wait_nvdimm_bus_probe_idle(&nvdimm_bus->dev);

if (atomic_read(&nvdimm->busy))
return -EBUSY;
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index 7806eaaf4707..cf99cce8ef33 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -14,6 +14,7 @@
#include <linux/export.h>
#include <linux/module.h>
#include <linux/device.h>
+#include <linux/ctype.h>
#include <linux/ndctl.h>
#include <linux/mutex.h>
#include <linux/slab.h>
@@ -109,6 +110,69 @@ struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev)
return NULL;
}

+static bool is_uuid_sep(char sep)
+{
+ if (sep == '\n' || sep == '-' || sep == ':' || sep == '\0')
+ return true;
+ return false;
+}
+
+static int nd_uuid_parse(struct device *dev, u8 *uuid_out, const char *buf,
+ size_t len)
+{
+ const char *str = buf;
+ u8 uuid[16];
+ int i;
+
+ for (i = 0; i < 16; i++) {
+ if (!isxdigit(str[0]) || !isxdigit(str[1])) {
+ dev_dbg(dev, "%s: pos: %d buf[%zd]: %c buf[%zd]: %c\n",
+ __func__, i, str - buf, str[0],
+ str + 1 - buf, str[1]);
+ return -EINVAL;
+ }
+
+ uuid[i] = (hex_to_bin(str[0]) << 4) | hex_to_bin(str[1]);
+ str += 2;
+ if (is_uuid_sep(*str))
+ str++;
+ }
+
+ memcpy(uuid_out, uuid, sizeof(uuid));
+ return 0;
+}
+
+/**
+ * nd_uuid_store: common implementation for writing 'uuid' sysfs attributes
+ * @dev: container device for the uuid property
+ * @uuid_out: uuid buffer to replace
+ * @buf: raw sysfs buffer to parse
+ *
+ * Enforce that uuids can only be changed while the device is disabled
+ * (driver detached)
+ * LOCKING: expects device_lock() is held on entry
+ */
+int nd_uuid_store(struct device *dev, u8 **uuid_out, const char *buf,
+ size_t len)
+{
+ u8 uuid[16];
+ int rc;
+
+ if (dev->driver)
+ return -EBUSY;
+
+ rc = nd_uuid_parse(dev, uuid, buf, len);
+ if (rc)
+ return rc;
+
+ kfree(*uuid_out);
+ *uuid_out = kmemdup(uuid, sizeof(uuid), GFP_KERNEL);
+ if (!(*uuid_out))
+ return -ENOMEM;
+
+ return 0;
+}
+
static ssize_t commands_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
diff --git a/drivers/nvdimm/dimm.c b/drivers/nvdimm/dimm.c
index 2df97c3c3b34..71d12bb67339 100644
--- a/drivers/nvdimm/dimm.c
+++ b/drivers/nvdimm/dimm.c
@@ -21,18 +21,6 @@
#include "label.h"
#include "nd.h"

-static void free_data(struct nvdimm_drvdata *ndd)
-{
- if (!ndd)
- return;
-
- if (ndd->data && is_vmalloc_addr(ndd->data))
- vfree(ndd->data);
- else
- kfree(ndd->data);
- kfree(ndd);
-}
-
static int nvdimm_probe(struct device *dev)
{
struct nvdimm_drvdata *ndd;
@@ -49,6 +37,8 @@ static int nvdimm_probe(struct device *dev)
ndd->dpa.start = 0;
ndd->dpa.end = -1;
ndd->dev = dev;
+ get_device(dev);
+ kref_init(&ndd->kref);

rc = nvdimm_init_nsarea(ndd);
if (rc)
@@ -74,21 +64,18 @@ static int nvdimm_probe(struct device *dev)
return 0;

err:
- free_data(ndd);
+ put_ndd(ndd);
return rc;
}

static int nvdimm_remove(struct device *dev)
{
struct nvdimm_drvdata *ndd = dev_get_drvdata(dev);
- struct resource *res, *_r;

nvdimm_bus_lock(dev);
dev_set_drvdata(dev, NULL);
- for_each_dpa_resource_safe(ndd, res, _r)
- nvdimm_free_dpa(ndd, res);
nvdimm_bus_unlock(dev);
- free_data(ndd);
+ put_ndd(ndd);

return 0;
}
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index d2ef02e4be6c..b55acef179ba 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -159,6 +159,48 @@ struct nvdimm *to_nvdimm(struct device *dev)
}
EXPORT_SYMBOL_GPL(to_nvdimm);

+struct nvdimm_drvdata *to_ndd(struct nd_mapping *nd_mapping)
+{
+ struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+ WARN_ON_ONCE(!is_nvdimm_bus_locked(&nvdimm->dev));
+
+ return dev_get_drvdata(&nvdimm->dev);
+}
+EXPORT_SYMBOL(to_ndd);
+
+void nvdimm_drvdata_release(struct kref *kref)
+{
+ struct nvdimm_drvdata *ndd = container_of(kref, typeof(*ndd), kref);
+ struct device *dev = ndd->dev;
+ struct resource *res, *_r;
+
+ dev_dbg(dev, "%s\n", __func__);
+
+ nvdimm_bus_lock(dev);
+ for_each_dpa_resource_safe(ndd, res, _r)
+ nvdimm_free_dpa(ndd, res);
+ nvdimm_bus_unlock(dev);
+
+ if (ndd->data && is_vmalloc_addr(ndd->data))
+ vfree(ndd->data);
+ else
+ kfree(ndd->data);
+ kfree(ndd);
+ put_device(dev);
+}
+
+void get_ndd(struct nvdimm_drvdata *ndd)
+{
+ kref_get(&ndd->kref);
+}
+
+void put_ndd(struct nvdimm_drvdata *ndd)
+{
+ if (ndd)
+ kref_put(&ndd->kref, nvdimm_drvdata_release);
+}
+
const char *nvdimm_name(struct nvdimm *nvdimm)
{
return dev_name(&nvdimm->dev);
@@ -247,6 +289,83 @@ struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
}
EXPORT_SYMBOL_GPL(nvdimm_create);

+/**
+ * nd_pmem_available_dpa - for the given dimm+region account unallocated dpa
+ * @nd_mapping: container of dpa-resource-root + labels
+ * @nd_region: constrain available space check to this reference region
+ * @overlap: calculate available space assuming this level of overlap
+ *
+ * Validate that a PMEM label, if present, aligns with the start of an
+ * interleave set and truncate the available size at the lowest BLK
+ * overlap point.
+ *
+ * The expectation is that this routine is called multiple times as it
+ * probes for the largest BLK encroachment for any single member DIMM of
+ * the interleave set. Once that value is determined the PMEM-limit for
+ * the set can be established.
+ */
+resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
+ struct nd_mapping *nd_mapping, resource_size_t *overlap)
+{
+ resource_size_t map_start, map_end, busy = 0, available, blk_start;
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ struct resource *res;
+ const char *reason;
+
+ if (!ndd)
+ return 0;
+
+ map_start = nd_mapping->start;
+ map_end = map_start + nd_mapping->size - 1;
+ blk_start = max(map_start, map_end + 1 - *overlap);
+ for_each_dpa_resource(ndd, res)
+ if (res->start >= map_start && res->start < map_end) {
+ if (strncmp(res->name, "blk", 3) == 0)
+ blk_start = min(blk_start, res->start);
+ else if (res->start != map_start) {
+ reason = "misaligned to iset";
+ goto err;
+ } else {
+ if (busy) {
+ reason = "duplicate overlapping PMEM reservations?";
+ goto err;
+ }
+ busy += resource_size(res);
+ continue;
+ }
+ } else if (res->end >= map_start && res->end <= map_end) {
+ if (strncmp(res->name, "blk", 3) == 0) {
+ /*
+ * If a BLK allocation overlaps the start of
+ * PMEM the entire interleave set may now only
+ * be used for BLK.
+ */
+ blk_start = map_start;
+ } else {
+ reason = "misaligned to iset";
+ goto err;
+ }
+ } else if (map_start > res->start && map_start < res->end) {
+ /* total eclipse of the mapping */
+ busy += nd_mapping->size;
+ blk_start = map_start;
+ }
+
+ *overlap = map_end + 1 - blk_start;
+ available = blk_start - map_start;
+ if (busy < available)
+ return available - busy;
+ return 0;
+
+ err:
+ /*
+ * Something is wrong, PMEM must align with the start of the
+ * interleave set, and there can only be one allocation per set.
+ */
+ nd_dbg_dpa(nd_region, ndd, res, "%s\n", reason);
+ return 0;
+}
+
void nvdimm_free_dpa(struct nvdimm_drvdata *ndd, struct resource *res)
{
WARN_ON_ONCE(!is_nvdimm_bus_locked(ndd->dev));
@@ -271,6 +390,24 @@ struct resource *nvdimm_allocate_dpa(struct nvdimm_drvdata *ndd,
return res;
}

+/**
+ * nvdimm_allocated_dpa - sum up the dpa currently allocated to this label_id
+ * @nvdimm: container of dpa-resource-root + labels
+ * @label_id: dpa resource name of the form {pmem|blk}-<human readable uuid>
+ */
+resource_size_t nvdimm_allocated_dpa(struct nvdimm_drvdata *ndd,
+ struct nd_label_id *label_id)
+{
+ resource_size_t allocated = 0;
+ struct resource *res;
+
+ for_each_dpa_resource(ndd, res)
+ if (strcmp(res->name, label_id->id) == 0)
+ allocated += resource_size(res);
+
+ return allocated;
+}
+
static int count_dimms(struct device *dev, void *c)
{
int *count = c;
diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
index db5d7492dc8d..1a3bcd27a57a 100644
--- a/drivers/nvdimm/label.c
+++ b/drivers/nvdimm/label.c
@@ -230,7 +230,7 @@ static bool preamble_current(struct nvdimm_drvdata *ndd,
return true;
}

-static char *nd_label_gen_id(struct nd_label_id *label_id, u8 *uuid, u32 flags)
+char *nd_label_gen_id(struct nd_label_id *label_id, u8 *uuid, u32 flags)
{
if (!label_id || !uuid)
return NULL;
@@ -288,3 +288,56 @@ int nd_label_reserve_dpa(struct nvdimm_drvdata *ndd)

return 0;
}
+
+int nd_label_active_count(struct nvdimm_drvdata *ndd)
+{
+ struct nd_namespace_index *nsindex;
+ unsigned long *free;
+ u32 nslot, slot;
+ int count = 0;
+
+ if (!preamble_current(ndd, &nsindex, &free, &nslot))
+ return 0;
+
+ for_each_clear_bit_le(slot, free, nslot) {
+ struct nd_namespace_label *nd_label;
+
+ nd_label = nd_label_base(ndd) + slot;
+
+ if (!slot_valid(nd_label, slot)) {
+ u32 label_slot = __le32_to_cpu(nd_label->slot);
+ u64 size = __le64_to_cpu(nd_label->rawsize);
+ u64 dpa = __le64_to_cpu(nd_label->dpa);
+
+ dev_dbg(ndd->dev,
+ "%s: slot%d invalid slot: %d dpa: %llx size: %llx\n",
+ __func__, slot, label_slot, dpa, size);
+ continue;
+ }
+ count++;
+ }
+ return count;
+}
+
+struct nd_namespace_label *nd_label_active(struct nvdimm_drvdata *ndd, int n)
+{
+ struct nd_namespace_index *nsindex;
+ unsigned long *free;
+ u32 nslot, slot;
+
+ if (!preamble_current(ndd, &nsindex, &free, &nslot))
+ return NULL;
+
+ for_each_clear_bit_le(slot, free, nslot) {
+ struct nd_namespace_label *nd_label;
+
+ nd_label = nd_label_base(ndd) + slot;
+ if (!slot_valid(nd_label, slot))
+ continue;
+
+ if (n-- == 0)
+ return nd_label_base(ndd) + slot;
+ }
+
+ return NULL;
+}
diff --git a/drivers/nvdimm/label.h b/drivers/nvdimm/label.h
index d6aa0d5c6b4e..8ee1376526c7 100644
--- a/drivers/nvdimm/label.h
+++ b/drivers/nvdimm/label.h
@@ -125,4 +125,6 @@ int nd_label_validate(struct nvdimm_drvdata *ndd);
void nd_label_copy(struct nvdimm_drvdata *ndd, struct nd_namespace_index *dst,
struct nd_namespace_index *src);
size_t sizeof_namespace_index(struct nvdimm_drvdata *ndd);
+int nd_label_active_count(struct nvdimm_drvdata *ndd);
+struct nd_namespace_label *nd_label_active(struct nvdimm_drvdata *ndd, int n);
#endif /* __LABEL_H__ */
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 4f653d1e61ad..5d81032fcfc5 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -14,6 +14,7 @@
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/nd.h>
+#include "nd-core.h"
#include "nd.h"

static void namespace_io_release(struct device *dev)
@@ -23,11 +24,50 @@ static void namespace_io_release(struct device *dev)
kfree(nsio);
}

+static void namespace_pmem_release(struct device *dev)
+{
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+
+ kfree(nspm->alt_name);
+ kfree(nspm->uuid);
+ kfree(nspm);
+}
+
+static void namespace_blk_release(struct device *dev)
+{
+ /* TODO: blk namespace support */
+}
+
static struct device_type namespace_io_device_type = {
.name = "nd_namespace_io",
.release = namespace_io_release,
};

+static struct device_type namespace_pmem_device_type = {
+ .name = "nd_namespace_pmem",
+ .release = namespace_pmem_release,
+};
+
+static struct device_type namespace_blk_device_type = {
+ .name = "nd_namespace_blk",
+ .release = namespace_blk_release,
+};
+
+static bool is_namespace_pmem(struct device *dev)
+{
+ return dev ? dev->type == &namespace_pmem_device_type : false;
+}
+
+static bool is_namespace_blk(struct device *dev)
+{
+ return dev ? dev->type == &namespace_blk_device_type : false;
+}
+
+static bool is_namespace_io(struct device *dev)
+{
+ return dev ? dev->type == &namespace_io_device_type : false;
+}
+
static ssize_t nstype_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -37,13 +77,676 @@ static ssize_t nstype_show(struct device *dev,
}
static DEVICE_ATTR_RO(nstype);

+static ssize_t __alt_name_store(struct device *dev, const char *buf,
+ const size_t len)
+{
+ char *input, *pos, *alt_name, **ns_altname;
+ ssize_t rc;
+
+ if (is_namespace_pmem(dev)) {
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+
+ ns_altname = &nspm->alt_name;
+ } else if (is_namespace_blk(dev)) {
+ /* TODO: blk namespace support */
+ return -ENXIO;
+ } else
+ return -ENXIO;
+
+ if (dev->driver)
+ return -EBUSY;
+
+ input = kmemdup(buf, len + 1, GFP_KERNEL);
+ if (!input)
+ return -ENOMEM;
+
+ input[len] = '\0';
+ pos = strim(input);
+ if (strlen(pos) + 1 > NSLABEL_NAME_LEN) {
+ rc = -EINVAL;
+ goto out;
+ }
+
+ alt_name = kzalloc(NSLABEL_NAME_LEN, GFP_KERNEL);
+ if (!alt_name) {
+ rc = -ENOMEM;
+ goto out;
+ }
+ kfree(*ns_altname);
+ *ns_altname = alt_name;
+ sprintf(*ns_altname, "%s", pos);
+ rc = len;
+
+out:
+ kfree(input);
+ return rc;
+}
+
+static ssize_t alt_name_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t len)
+{
+ ssize_t rc;
+
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
+ wait_nvdimm_bus_probe_idle(dev);
+ rc = __alt_name_store(dev, buf, len);
+ dev_dbg(dev, "%s: %s(%zd)\n", __func__, rc < 0 ? "fail " : "", rc);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
+
+ return rc;
+}
+
+static ssize_t alt_name_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ char *ns_altname;
+
+ if (is_namespace_pmem(dev)) {
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+
+ ns_altname = nspm->alt_name;
+ } else if (is_namespace_blk(dev)) {
+ /* TODO: blk namespace support */
+ return -ENXIO;
+ } else
+ return -ENXIO;
+
+ return sprintf(buf, "%s\n", ns_altname ? ns_altname : "");
+}
+static DEVICE_ATTR_RW(alt_name);
+
+static int scan_free(struct nd_region *nd_region,
+ struct nd_mapping *nd_mapping, struct nd_label_id *label_id,
+ resource_size_t n)
+{
+ bool is_blk = strncmp(label_id->id, "blk", 3) == 0;
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ int rc = 0;
+
+ while (n) {
+ struct resource *res, *last;
+ resource_size_t new_start;
+
+ last = NULL;
+ for_each_dpa_resource(ndd, res)
+ if (strcmp(res->name, label_id->id) == 0)
+ last = res;
+ res = last;
+ if (!res)
+ return 0;
+
+ if (n >= resource_size(res)) {
+ n -= resource_size(res);
+ nd_dbg_dpa(nd_region, ndd, res, "delete %d\n", rc);
+ nvdimm_free_dpa(ndd, res);
+ /* retry with last resource deleted */
+ continue;
+ }
+
+ /*
+ * Keep BLK allocations relegated to high DPA as much as
+ * possible
+ */
+ if (is_blk)
+ new_start = res->start + n;
+ else
+ new_start = res->start;
+
+ rc = adjust_resource(res, new_start, resource_size(res) - n);
+ nd_dbg_dpa(nd_region, ndd, res, "shrink %d\n", rc);
+ break;
+ }
+
+ return rc;
+}
+
+/**
+ * shrink_dpa_allocation - for each dimm in region free n bytes for label_id
+ * @nd_region: the set of dimms to reclaim @n bytes from
+ * @label_id: unique identifier for the namespace consuming this dpa range
+ * @n: number of bytes per-dimm to release
+ *
+ * Assumes resources are ordered. Starting from the end try to
+ * adjust_resource() the allocation to @n, but if @n is larger than the
+ * allocation delete it and find the 'new' last allocation in the label
+ * set.
+ */
+static int shrink_dpa_allocation(struct nd_region *nd_region,
+ struct nd_label_id *label_id, resource_size_t n)
+{
+ int i;
+
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ int rc;
+
+ rc = scan_free(nd_region, nd_mapping, label_id, n);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
+static resource_size_t init_dpa_allocation(struct nd_label_id *label_id,
+ struct nd_region *nd_region, struct nd_mapping *nd_mapping,
+ resource_size_t n)
+{
+ bool is_blk = strncmp(label_id->id, "blk", 3) == 0;
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ resource_size_t first_dpa;
+ struct resource *res;
+ int rc = 0;
+
+ /* allocate blk from highest dpa first */
+ if (is_blk)
+ first_dpa = nd_mapping->start + nd_mapping->size - n;
+ else
+ first_dpa = nd_mapping->start;
+
+ /* first resource allocation for this label-id or dimm */
+ res = nvdimm_allocate_dpa(ndd, label_id, first_dpa, n);
+ if (!res)
+ rc = -EBUSY;
+
+ nd_dbg_dpa(nd_region, ndd, res, "init %d\n", rc);
+ return rc ? n : 0;
+}
+
+static bool space_valid(bool is_pmem, struct nd_label_id *label_id,
+ struct resource *res)
+{
+ /*
+ * For BLK-space any space is valid, for PMEM-space, it must be
+ * contiguous with an existing allocation.
+ */
+ if (!is_pmem)
+ return true;
+ if (!res || strcmp(res->name, label_id->id) == 0)
+ return true;
+ return false;
+}
+
+enum alloc_loc {
+ ALLOC_ERR = 0, ALLOC_BEFORE, ALLOC_MID, ALLOC_AFTER,
+};
+
+static resource_size_t scan_allocate(struct nd_region *nd_region,
+ struct nd_mapping *nd_mapping, struct nd_label_id *label_id,
+ resource_size_t n)
+{
+ resource_size_t mapping_end = nd_mapping->start + nd_mapping->size - 1;
+ bool is_pmem = strncmp(label_id->id, "pmem", 4) == 0;
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ const resource_size_t to_allocate = n;
+ struct resource *res;
+ int first;
+
+ retry:
+ first = 0;
+ for_each_dpa_resource(ndd, res) {
+ resource_size_t allocate, available = 0, free_start, free_end;
+ struct resource *next = res->sibling, *new_res = NULL;
+ enum alloc_loc loc = ALLOC_ERR;
+ const char *action;
+ int rc = 0;
+
+ /* ignore resources outside this nd_mapping */
+ if (res->start > mapping_end)
+ continue;
+ if (res->end < nd_mapping->start)
+ continue;
+
+ /* space at the beginning of the mapping */
+ if (!first++ && res->start > nd_mapping->start) {
+ free_start = nd_mapping->start;
+ available = res->start - free_start;
+ if (space_valid(is_pmem, label_id, NULL))
+ loc = ALLOC_BEFORE;
+ }
+
+ /* space between allocations */
+ if (!loc && next) {
+ free_start = res->start + resource_size(res);
+ free_end = min(mapping_end, next->start - 1);
+ if (space_valid(is_pmem, label_id, res)
+ && free_start < free_end) {
+ available = free_end + 1 - free_start;
+ loc = ALLOC_MID;
+ }
+ }
+
+ /* space at the end of the mapping */
+ if (!loc && !next) {
+ free_start = res->start + resource_size(res);
+ free_end = mapping_end;
+ if (space_valid(is_pmem, label_id, res)
+ && free_start < free_end) {
+ available = free_end + 1 - free_start;
+ loc = ALLOC_AFTER;
+ }
+ }
+
+ if (!loc || !available)
+ continue;
+ allocate = min(available, n);
+ switch (loc) {
+ case ALLOC_BEFORE:
+ if (strcmp(res->name, label_id->id) == 0) {
+ /* adjust current resource up */
+ if (is_pmem)
+ return n;
+ rc = adjust_resource(res, res->start - allocate,
+ resource_size(res) + allocate);
+ action = "cur grow up";
+ } else
+ action = "allocate";
+ break;
+ case ALLOC_MID:
+ if (strcmp(next->name, label_id->id) == 0) {
+ /* adjust next resource up */
+ if (is_pmem)
+ return n;
+ rc = adjust_resource(next, next->start
+ - allocate, resource_size(next)
+ + allocate);
+ new_res = next;
+ action = "next grow up";
+ } else if (strcmp(res->name, label_id->id) == 0) {
+ action = "grow down";
+ } else
+ action = "allocate";
+ break;
+ case ALLOC_AFTER:
+ if (strcmp(res->name, label_id->id) == 0)
+ action = "grow down";
+ else
+ action = "allocate";
+ break;
+ default:
+ return n;
+ }
+
+ if (strcmp(action, "allocate") == 0) {
+ /* BLK allocate bottom up */
+ if (!is_pmem)
+ free_start += available - allocate;
+ else if (free_start != nd_mapping->start)
+ return n;
+
+ new_res = nvdimm_allocate_dpa(ndd, label_id,
+ free_start, allocate);
+ if (!new_res)
+ rc = -EBUSY;
+ } else if (strcmp(action, "grow down") == 0) {
+ /* adjust current resource down */
+ rc = adjust_resource(res, res->start, resource_size(res)
+ + allocate);
+ }
+
+ if (!new_res)
+ new_res = res;
+
+ nd_dbg_dpa(nd_region, ndd, new_res, "%s(%d) %d\n",
+ action, loc, rc);
+
+ if (rc)
+ return n;
+
+ n -= allocate;
+ if (n) {
+ /*
+ * Retry scan with newly inserted resources.
+ * For example, if we did an ALLOC_BEFORE
+ * insertion there may also have been space
+ * available for an ALLOC_AFTER insertion, so we
+ * need to check this same resource again
+ */
+ goto retry;
+ } else
+ return 0;
+ }
+
+ if (is_pmem && n == to_allocate)
+ return init_dpa_allocation(label_id, nd_region, nd_mapping, n);
+ return n;
+}
+
+/**
+ * grow_dpa_allocation - for each dimm allocate n bytes for @label_id
+ * @nd_region: the set of dimms to allocate @n more bytes from
+ * @label_id: unique identifier for the namespace consuming this dpa range
+ * @n: number of bytes per-dimm to add to the existing allocation
+ *
+ * Assumes resources are ordered. For BLK regions, first consume
+ * BLK-only available DPA free space, then consume PMEM-aliased DPA
+ * space starting at the highest DPA. For PMEM regions start
+ * allocations from the start of an interleave set and end at the first
+ * BLK allocation or the end of the interleave set, whichever comes
+ * first.
+ */
+static int grow_dpa_allocation(struct nd_region *nd_region,
+ struct nd_label_id *label_id, resource_size_t n)
+{
+ int i;
+
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ int rc;
+
+ rc = scan_allocate(nd_region, nd_mapping, label_id, n);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
+static void nd_namespace_pmem_set_size(struct nd_region *nd_region,
+ struct nd_namespace_pmem *nspm, resource_size_t size)
+{
+ struct resource *res = &nspm->nsio.res;
+
+ res->start = nd_region->ndr_start;
+ res->end = nd_region->ndr_start + size - 1;
+}
+
+static ssize_t __size_store(struct device *dev, unsigned long long val)
+{
+ resource_size_t allocated = 0, available = 0;
+ struct nd_region *nd_region = to_nd_region(dev->parent);
+ struct nd_mapping *nd_mapping;
+ struct nvdimm_drvdata *ndd;
+ struct nd_label_id label_id;
+ u32 flags = 0, remainder;
+ u8 *uuid = NULL;
+ int rc, i;
+
+ if (dev->driver)
+ return -EBUSY;
+
+ if (is_namespace_pmem(dev)) {
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+
+ uuid = nspm->uuid;
+ } else if (is_namespace_blk(dev)) {
+ /* TODO: blk namespace support */
+ return -ENXIO;
+ }
+
+ /*
+ * We need a uuid for the allocation-label and dimm(s) on which
+ * to store the label.
+ */
+ if (!uuid || nd_region->ndr_mappings == 0)
+ return -ENXIO;
+
+ div_u64_rem(val, SZ_4K * nd_region->ndr_mappings, &remainder);
+ if (remainder) {
+ dev_dbg(dev, "%llu is not %dK aligned\n", val,
+ (SZ_4K * nd_region->ndr_mappings) / SZ_1K);
+ return -EINVAL;
+ }
+
+ nd_label_gen_id(&label_id, uuid, flags);
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ nd_mapping = &nd_region->mapping[i];
+ ndd = to_ndd(nd_mapping);
+
+ /*
+ * All dimms in an interleave set, or the base dimm for a blk
+ * region, need to be enabled for the size to be changed.
+ */
+ if (!ndd)
+ return -ENXIO;
+
+ allocated += nvdimm_allocated_dpa(ndd, &label_id);
+ }
+ available = nd_region_available_dpa(nd_region);
+
+ if (val > available + allocated)
+ return -ENOSPC;
+
+ if (val == allocated)
+ return 0;
+
+ val = div_u64(val, nd_region->ndr_mappings);
+ allocated = div_u64(allocated, nd_region->ndr_mappings);
+ if (val < allocated)
+ rc = shrink_dpa_allocation(nd_region, &label_id,
+ allocated - val);
+ else
+ rc = grow_dpa_allocation(nd_region, &label_id, val - allocated);
+
+ if (rc)
+ return rc;
+
+ if (is_namespace_pmem(dev)) {
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+
+ nd_namespace_pmem_set_size(nd_region, nspm,
+ val * nd_region->ndr_mappings);
+ }
+
+ return rc;
+}
+
+static ssize_t size_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t len)
+{
+ unsigned long long val;
+ u8 **uuid = NULL;
+ int rc;
+
+ rc = kstrtoull(buf, 0, &val);
+ if (rc)
+ return rc;
+
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
+ wait_nvdimm_bus_probe_idle(dev);
+ rc = __size_store(dev, val);
+
+ if (is_namespace_pmem(dev)) {
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+
+ uuid = &nspm->uuid;
+ } else if (is_namespace_blk(dev)) {
+ /* TODO: blk namespace support */
+ rc = -ENXIO;
+ }
+
+ if (rc == 0 && val == 0 && uuid) {
+ /* setting size zero == 'delete namespace' */
+ kfree(*uuid);
+ *uuid = NULL;
+ }
+
+ dev_dbg(dev, "%s: %llx %s (%d)\n", __func__, val, rc < 0
+ ? "fail" : "success", rc);
+
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
+
+ return rc ? rc : len;
+}
+
+static ssize_t size_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ if (is_namespace_pmem(dev)) {
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+
+ return sprintf(buf, "%llu\n", (unsigned long long)
+ resource_size(&nspm->nsio.res));
+ } else if (is_namespace_blk(dev)) {
+ /* TODO: blk namespace support */
+ return -ENXIO;
+ } else if (is_namespace_io(dev)) {
+ struct nd_namespace_io *nsio = to_nd_namespace_io(dev);
+
+ return sprintf(buf, "%llu\n", (unsigned long long)
+ resource_size(&nsio->res));
+ } else
+ return -ENXIO;
+}
+static DEVICE_ATTR(size, S_IRUGO, size_show, size_store);
+
+static ssize_t uuid_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ u8 *uuid;
+
+ if (is_namespace_pmem(dev)) {
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+
+ uuid = nspm->uuid;
+ } else if (is_namespace_blk(dev)) {
+ /* TODO: blk namespace support */
+ return -ENXIO;
+ } else
+ return -ENXIO;
+
+ if (uuid)
+ return sprintf(buf, "%pUb\n", uuid);
+ return sprintf(buf, "\n");
+}
+
+/**
+ * namespace_update_uuid - check for a unique uuid and whether we're "renaming"
+ * @nd_region: parent region so we can updates all dimms in the set
+ * @dev: namespace type for generating label_id
+ * @new_uuid: incoming uuid
+ * @old_uuid: reference to the uuid storage location in the namespace object
+ */
+static int namespace_update_uuid(struct nd_region *nd_region,
+ struct device *dev, u8 *new_uuid, u8 **old_uuid)
+{
+ u32 flags = is_namespace_blk(dev) ? NSLABEL_FLAG_LOCAL : 0;
+ struct nd_label_id old_label_id;
+ struct nd_label_id new_label_id;
+ int i, rc;
+
+ rc = nd_is_uuid_unique(dev, new_uuid) ? 0 : -EINVAL;
+ if (rc) {
+ kfree(new_uuid);
+ return rc;
+ }
+
+ if (*old_uuid == NULL)
+ goto out;
+
+ nd_label_gen_id(&old_label_id, *old_uuid, flags);
+ nd_label_gen_id(&new_label_id, new_uuid, flags);
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ struct resource *res;
+
+ for_each_dpa_resource(ndd, res)
+ if (strcmp(res->name, old_label_id.id) == 0)
+ sprintf((void *) res->name, "%s",
+ new_label_id.id);
+ }
+ kfree(*old_uuid);
+ out:
+ *old_uuid = new_uuid;
+ return 0;
+}
+
+static ssize_t uuid_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t len)
+{
+ struct nd_region *nd_region = to_nd_region(dev->parent);
+ u8 *uuid = NULL;
+ u8 **ns_uuid;
+ ssize_t rc;
+
+ if (is_namespace_pmem(dev)) {
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+
+ ns_uuid = &nspm->uuid;
+ } else if (is_namespace_blk(dev)) {
+ /* TODO: blk namespace support */
+ return -ENXIO;
+ } else
+ return -ENXIO;
+
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
+ wait_nvdimm_bus_probe_idle(dev);
+ rc = nd_uuid_store(dev, &uuid, buf, len);
+ if (rc >= 0)
+ rc = namespace_update_uuid(nd_region, dev, uuid, ns_uuid);
+ dev_dbg(dev, "%s: result: %zd wrote: %s%s", __func__,
+ rc, buf, buf[len - 1] == '\n' ? "" : "\n");
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
+
+ return rc ? rc : len;
+}
+static DEVICE_ATTR_RW(uuid);
+
+static ssize_t resource_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct resource *res;
+
+ if (is_namespace_pmem(dev)) {
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+
+ res = &nspm->nsio.res;
+ } else if (is_namespace_io(dev)) {
+ struct nd_namespace_io *nsio = to_nd_namespace_io(dev);
+
+ res = &nsio->res;
+ } else
+ return -ENXIO;
+
+ /* no address to convey if the namespace has no allocation */
+ if (resource_size(res) == 0)
+ return -ENXIO;
+ return sprintf(buf, "%#llx\n", (unsigned long long) res->start);
+}
+static DEVICE_ATTR_RO(resource);
+
static struct attribute *nd_namespace_attributes[] = {
&dev_attr_nstype.attr,
+ &dev_attr_size.attr,
+ &dev_attr_uuid.attr,
+ &dev_attr_resource.attr,
+ &dev_attr_alt_name.attr,
NULL,
};

+static umode_t namespace_visible(struct kobject *kobj,
+ struct attribute *a, int n)
+{
+ struct device *dev = container_of(kobj, struct device, kobj);
+
+ if (a == &dev_attr_resource.attr) {
+ if (is_namespace_blk(dev))
+ return 0;
+ return a->mode;
+ }
+
+ if (is_namespace_pmem(dev) || is_namespace_blk(dev)) {
+ if (a == &dev_attr_size.attr)
+ return S_IWUSR | S_IRUGO;
+ return a->mode;
+ }
+
+ if (a == &dev_attr_nstype.attr || a == &dev_attr_size.attr)
+ return a->mode;
+
+ return 0;
+}
+
static struct attribute_group nd_namespace_attribute_group = {
.attrs = nd_namespace_attributes,
+ .is_visible = namespace_visible,
};

static const struct attribute_group *nd_namespace_attribute_groups[] = {
@@ -81,23 +784,318 @@ static struct device **create_namespace_io(struct nd_region *nd_region)
return devs;
}

+static bool has_uuid_at_pos(struct nd_region *nd_region, u8 *uuid,
+ u64 cookie, u16 pos)
+{
+ struct nd_namespace_label *found = NULL;
+ int i;
+
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nd_namespace_label *nd_label;
+ bool found_uuid = false;
+ int l;
+
+ for_each_label(l, nd_label, nd_mapping->labels) {
+ u64 isetcookie = __le64_to_cpu(nd_label->isetcookie);
+ u16 position = __le16_to_cpu(nd_label->position);
+ u16 nlabel = __le16_to_cpu(nd_label->nlabel);
+
+ if (isetcookie != cookie)
+ continue;
+
+ if (memcmp(nd_label->uuid, uuid, NSLABEL_UUID_LEN) != 0)
+ continue;
+
+ if (found_uuid) {
+ dev_dbg(to_ndd(nd_mapping)->dev,
+ "%s duplicate entry for uuid\n",
+ __func__);
+ return false;
+ }
+ found_uuid = true;
+ if (nlabel != nd_region->ndr_mappings)
+ continue;
+ if (position != pos)
+ continue;
+ found = nd_label;
+ break;
+ }
+ if (found)
+ break;
+ }
+ return found != NULL;
+}
+
+static int select_pmem_id(struct nd_region *nd_region, u8 *pmem_id)
+{
+ struct nd_namespace_label *select = NULL;
+ int i;
+
+ if (!pmem_id)
+ return -ENODEV;
+
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nd_namespace_label *nd_label;
+ u64 hw_start, hw_end, pmem_start, pmem_end;
+ int l;
+
+ for_each_label(l, nd_label, nd_mapping->labels)
+ if (memcmp(nd_label->uuid, pmem_id, NSLABEL_UUID_LEN) == 0)
+ break;
+
+ if (!nd_label) {
+ WARN_ON(1);
+ return -EINVAL;
+ }
+
+ select = nd_label;
+ /*
+ * Check that this label is compliant with the dpa
+ * range published in NFIT
+ */
+ hw_start = nd_mapping->start;
+ hw_end = hw_start + nd_mapping->size;
+ pmem_start = __le64_to_cpu(select->dpa);
+ pmem_end = pmem_start + __le64_to_cpu(select->rawsize);
+ if (pmem_start == hw_start && pmem_end <= hw_end)
+ /* pass */;
+ else
+ return -EINVAL;
+
+ nd_mapping->labels[0] = select;
+ nd_mapping->labels[1] = NULL;
+ }
+ return 0;
+}
+
+/**
+ * find_pmem_label_set - validate interleave set labelling, retrieve label0
+ * @nd_region: region with mappings to validate
+ */
+static int find_pmem_label_set(struct nd_region *nd_region,
+ struct nd_namespace_pmem *nspm)
+{
+ u64 cookie = nd_region_interleave_set_cookie(nd_region);
+ struct nd_namespace_label *nd_label;
+ u8 select_id[NSLABEL_UUID_LEN];
+ resource_size_t size = 0;
+ u8 *pmem_id = NULL;
+ int rc = -ENODEV, l;
+ u16 i;
+
+ if (cookie == 0)
+ return -ENXIO;
+
+ /*
+ * Find a complete set of labels by uuid. By definition we can start
+ * with any mapping as the reference label
+ */
+ for_each_label(l, nd_label, nd_region->mapping[0].labels) {
+ u64 isetcookie = __le64_to_cpu(nd_label->isetcookie);
+
+ if (isetcookie != cookie)
+ continue;
+
+ for (i = 0; nd_region->ndr_mappings; i++)
+ if (!has_uuid_at_pos(nd_region, nd_label->uuid,
+ cookie, i))
+ break;
+ if (i < nd_region->ndr_mappings) {
+ /*
+ * Give up if we don't find an instance of a
+ * uuid at each position (from 0 to
+ * nd_region->ndr_mappings - 1), or if we find a
+ * dimm with two instances of the same uuid.
+ */
+ rc = -EINVAL;
+ goto err;
+ } else if (pmem_id) {
+ /*
+ * If there is more than one valid uuid set, we
+ * need userspace to clean this up.
+ */
+ rc = -EBUSY;
+ goto err;
+ }
+ memcpy(select_id, nd_label->uuid, NSLABEL_UUID_LEN);
+ pmem_id = select_id;
+ }
+
+ /*
+ * Fix up each mapping's 'labels' to have the validated pmem label for
+ * that position at labels[0], and NULL at labels[1]. In the process,
+ * check that the namespace aligns with interleave-set. We know
+ * that it does not overlap with any blk namespaces by virtue of
+ * the dimm being enabled (i.e. nd_label_reserve_dpa()
+ * succeeded).
+ */
+ rc = select_pmem_id(nd_region, pmem_id);
+ if (rc)
+ goto err;
+
+ /* Calculate total size and populate namespace properties from label0 */
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nd_namespace_label *label0 = nd_mapping->labels[0];
+
+ size += __le64_to_cpu(label0->rawsize);
+ if (__le16_to_cpu(label0->position) != 0)
+ continue;
+ WARN_ON(nspm->alt_name || nspm->uuid);
+ nspm->alt_name = kmemdup((void __force *) label0->name,
+ NSLABEL_NAME_LEN, GFP_KERNEL);
+ nspm->uuid = kmemdup((void __force *) label0->uuid,
+ NSLABEL_UUID_LEN, GFP_KERNEL);
+ }
+
+ if (!nspm->alt_name || !nspm->uuid) {
+ rc = -ENOMEM;
+ goto err;
+ }
+
+ nd_namespace_pmem_set_size(nd_region, nspm, size);
+
+ return 0;
+ err:
+ switch (rc) {
+ case -EINVAL:
+ dev_dbg(&nd_region->dev, "%s: invalid label(s)\n", __func__);
+ break;
+ case -ENODEV:
+ dev_dbg(&nd_region->dev, "%s: label not found\n", __func__);
+ break;
+ default:
+ dev_dbg(&nd_region->dev, "%s: unexpected err: %d\n",
+ __func__, rc);
+ break;
+ }
+ return rc;
+}
+
+static struct device **create_namespace_pmem(struct nd_region *nd_region)
+{
+ struct nd_namespace_pmem *nspm;
+ struct device *dev, **devs;
+ struct resource *res;
+ int rc;
+
+ nspm = kzalloc(sizeof(*nspm), GFP_KERNEL);
+ if (!nspm)
+ return NULL;
+
+ dev = &nspm->nsio.dev;
+ dev->type = &namespace_pmem_device_type;
+ dev->parent = &nd_region->dev;
+ res = &nspm->nsio.res;
+ res->name = dev_name(&nd_region->dev);
+ res->flags = IORESOURCE_MEM;
+ rc = find_pmem_label_set(nd_region, nspm);
+ if (rc == -ENODEV) {
+ int i;
+
+ /* Pass, try to permit namespace creation... */
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+
+ kfree(nd_mapping->labels);
+ nd_mapping->labels = NULL;
+ }
+
+ /* Publish a zero-sized namespace for userspace to configure. */
+ nd_namespace_pmem_set_size(nd_region, nspm, 0);
+
+ rc = 0;
+ } else if (rc)
+ goto err;
+
+ devs = kcalloc(2, sizeof(struct device *), GFP_KERNEL);
+ if (!devs)
+ goto err;
+
+ devs[0] = dev;
+ return devs;
+
+ err:
+ namespace_pmem_release(&nspm->nsio.dev);
+ return NULL;
+}
+
+static int init_active_labels(struct nd_region *nd_region)
+{
+ int i;
+
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ struct nvdimm *nvdimm = nd_mapping->nvdimm;
+ int count, j;
+
+ /*
+ * If the dimm is disabled then prevent the region from
+ * being activated if it aliases DPA.
+ */
+ if (!ndd) {
+ if ((nvdimm->flags & NDD_ALIASING) == 0)
+ return 0;
+ dev_dbg(&nd_region->dev, "%s: is disabled, failing probe\n",
+ dev_name(&nd_mapping->nvdimm->dev));
+ return -ENXIO;
+ }
+ nd_mapping->ndd = ndd;
+ atomic_inc(&nvdimm->busy);
+ get_ndd(ndd);
+
+ count = nd_label_active_count(ndd);
+ dev_dbg(ndd->dev, "%s: %d\n", __func__, count);
+ if (!count)
+ continue;
+ nd_mapping->labels = kcalloc(count + 1, sizeof(void *),
+ GFP_KERNEL);
+ if (!nd_mapping->labels)
+ return -ENOMEM;
+ for (j = 0; j < count; j++) {
+ struct nd_namespace_label *label;
+
+ label = nd_label_active(ndd, j);
+ nd_mapping->labels[j] = label;
+ }
+ }
+
+ return 0;
+}
+
int nd_region_register_namespaces(struct nd_region *nd_region, int *err)
{
struct device **devs = NULL;
- int i;
+ int i, rc = 0, type;

*err = 0;
- switch (nd_region_to_nstype(nd_region)) {
+ nvdimm_bus_lock(&nd_region->dev);
+ rc = init_active_labels(nd_region);
+ if (rc) {
+ nvdimm_bus_unlock(&nd_region->dev);
+ return rc;
+ }
+
+ type = nd_region_to_nstype(nd_region);
+ switch (type) {
case ND_DEVICE_NAMESPACE_IO:
devs = create_namespace_io(nd_region);
break;
+ case ND_DEVICE_NAMESPACE_PMEM:
+ devs = create_namespace_pmem(nd_region);
+ break;
default:
break;
}
+ nvdimm_bus_unlock(&nd_region->dev);

if (!devs)
return -ENODEV;

+ nd_region->ns_seed = devs[0];
for (i = 0; devs[i]; i++) {
struct device *dev = devs[i];

diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 6a4b2c066ee7..c6c889292bab 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -56,4 +56,16 @@ int nvdimm_bus_register_dimms(struct nvdimm_bus *nvdimm_bus);
int nvdimm_bus_register_regions(struct nvdimm_bus *nvdimm_bus);
int nvdimm_bus_init_interleave_sets(struct nvdimm_bus *nvdimm_bus);
int nd_match_dimm(struct device *dev, void *data);
+struct nd_label_id;
+char *nd_label_gen_id(struct nd_label_id *label_id, u8 *uuid, u32 flags);
+bool nd_is_uuid_unique(struct device *dev, u8 *uuid);
+struct nd_region;
+struct nvdimm_drvdata;
+struct nd_mapping;
+resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
+ struct nd_mapping *nd_mapping, resource_size_t *overlap);
+resource_size_t nd_region_available_dpa(struct nd_region *nd_region);
+resource_size_t nvdimm_allocated_dpa(struct nvdimm_drvdata *ndd,
+ struct nd_label_id *label_id);
+void get_ndd(struct nvdimm_drvdata *ndd);
#endif /* __ND_CORE_H__ */
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 401fa0d5b6ea..03e610cd9f43 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -16,6 +16,7 @@
#include <linux/device.h>
#include <linux/mutex.h>
#include <linux/ndctl.h>
+#include <linux/types.h>
#include "label.h"

struct nvdimm_drvdata {
@@ -25,6 +26,7 @@ struct nvdimm_drvdata {
void *data;
int ns_current, ns_next;
struct resource dpa;
+ struct kref kref;
};

struct nd_region_namespaces {
@@ -59,12 +61,19 @@ static inline struct nd_namespace_index *to_next_namespace_index(
(unsigned long long) (res ? resource_size(res) : 0), \
(unsigned long long) (res ? res->start : 0), ##arg)

+#define for_each_label(l, label, labels) \
+ for (l = 0; (label = labels ? labels[l] : NULL); l++)
+
+#define for_each_dpa_resource(ndd, res) \
+ for (res = (ndd)->dpa.child; res; res = res->sibling)
+
#define for_each_dpa_resource_safe(ndd, res, next) \
for (res = (ndd)->dpa.child, next = res ? res->sibling : NULL; \
res; res = next, next = next ? next->sibling : NULL)

struct nd_region {
struct device dev;
+ struct device *ns_seed;
u16 ndr_mappings;
u64 ndr_size;
u64 ndr_start;
@@ -88,20 +97,28 @@ enum nd_async_mode {
ND_ASYNC,
};

+void wait_nvdimm_bus_probe_idle(struct device *dev);
void nd_device_register(struct device *dev);
void nd_device_unregister(struct device *dev, enum nd_async_mode mode);
+int nd_uuid_store(struct device *dev, u8 **uuid_out, const char *buf,
+ size_t len);
int __init nvdimm_init(void);
int __init nd_region_init(void);
void nvdimm_exit(void);
void nd_region_exit(void);
+struct nvdimm;
+struct nvdimm_drvdata *to_ndd(struct nd_mapping *nd_mapping);
int nvdimm_init_nsarea(struct nvdimm_drvdata *ndd);
int nvdimm_init_config_data(struct nvdimm_drvdata *ndd);
struct nd_region *to_nd_region(struct device *dev);
int nd_region_to_nstype(struct nd_region *nd_region);
int nd_region_register_namespaces(struct nd_region *nd_region, int *err);
+u64 nd_region_interleave_set_cookie(struct nd_region *nd_region);
void nvdimm_bus_lock(struct device *dev);
void nvdimm_bus_unlock(struct device *dev);
bool is_nvdimm_bus_locked(struct device *dev);
+void nvdimm_drvdata_release(struct kref *kref);
+void put_ndd(struct nvdimm_drvdata *ndd);
int nd_label_reserve_dpa(struct nvdimm_drvdata *ndd);
void nvdimm_free_dpa(struct nvdimm_drvdata *ndd, struct resource *res);
struct resource *nvdimm_allocate_dpa(struct nvdimm_drvdata *ndd,
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index d46975ed9e40..90902a142e35 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -203,6 +203,23 @@ static int nd_pmem_probe(struct device *dev)
struct nd_namespace_io *nsio = to_nd_namespace_io(dev);
struct pmem_device *pmem;

+ if (resource_size(&nsio->res) < ND_MIN_NAMESPACE_SIZE) {
+ resource_size_t size = resource_size(&nsio->res);
+
+ dev_dbg(dev, "%s: size: %pa, too small must be at least %#x\n",
+ __func__, &size, ND_MIN_NAMESPACE_SIZE);
+ return -ENODEV;
+ }
+
+ if (nd_region_to_nstype(nd_region) == ND_DEVICE_NAMESPACE_PMEM) {
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+
+ if (!nspm->uuid) {
+ dev_dbg(dev, "%s: uuid not set\n", __func__);
+ return -ENODEV;
+ }
+ }
+
pmem = pmem_alloc(dev, &nsio->res, nd_region->id);
if (IS_ERR(pmem))
return PTR_ERR(pmem);
@@ -222,13 +239,14 @@ static int nd_pmem_remove(struct device *dev)

MODULE_ALIAS("pmem");
MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_IO);
+MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_PMEM);
static struct nd_device_driver nd_pmem_driver = {
.probe = nd_pmem_probe,
.remove = nd_pmem_remove,
.drv = {
.name = "nd_pmem",
},
- .type = ND_DRIVER_NAMESPACE_IO,
+ .type = ND_DRIVER_NAMESPACE_IO | ND_DRIVER_NAMESPACE_PMEM,
};

static int __init pmem_init(void)
diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
index ade3dba81afd..9aba44e483e0 100644
--- a/drivers/nvdimm/region.c
+++ b/drivers/nvdimm/region.c
@@ -61,8 +61,11 @@ static int child_unregister(struct device *dev, void *data)

static int nd_region_remove(struct device *dev)
{
+ struct nd_region *nd_region = to_nd_region(dev);
+
/* flush attribute readers and disable */
nvdimm_bus_lock(dev);
+ nd_region->ns_seed = NULL;
dev_set_drvdata(dev, NULL);
nvdimm_bus_unlock(dev);

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 1571424578f0..b45806f7176d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -15,6 +15,7 @@
#include <linux/slab.h>
#include <linux/sort.h>
#include <linux/io.h>
+#include <linux/nd.h>
#include "nd-core.h"
#include "nd.h"

@@ -99,6 +100,58 @@ int nd_region_to_nstype(struct nd_region *nd_region)

return 0;
}
+EXPORT_SYMBOL(nd_region_to_nstype);
+
+static int is_uuid_busy(struct device *dev, void *data)
+{
+ struct nd_region *nd_region = to_nd_region(dev->parent);
+ u8 *uuid = data;
+
+ switch (nd_region_to_nstype(nd_region)) {
+ case ND_DEVICE_NAMESPACE_PMEM: {
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+
+ if (!nspm->uuid)
+ break;
+ if (memcmp(uuid, nspm->uuid, NSLABEL_UUID_LEN) == 0)
+ return -EBUSY;
+ break;
+ }
+ case ND_DEVICE_NAMESPACE_BLK: {
+ /* TODO: blk namespace support */
+ break;
+ }
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+static int is_namespace_uuid_busy(struct device *dev, void *data)
+{
+ if (is_nd_pmem(dev) || is_nd_blk(dev))
+ return device_for_each_child(dev, data, is_uuid_busy);
+ return 0;
+}
+
+/**
+ * nd_is_uuid_unique - verify that no other namespace has @uuid
+ * @dev: any device on a nvdimm_bus
+ * @uuid: uuid to check
+ */
+bool nd_is_uuid_unique(struct device *dev, u8 *uuid)
+{
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+
+ if (!nvdimm_bus)
+ return false;
+ WARN_ON_ONCE(!is_nvdimm_bus_locked(&nvdimm_bus->dev));
+ if (device_for_each_child(&nvdimm_bus->dev, uuid,
+ is_namespace_uuid_busy) != 0)
+ return false;
+ return true;
+}

static ssize_t size_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -151,6 +204,60 @@ static ssize_t set_cookie_show(struct device *dev,
}
static DEVICE_ATTR_RO(set_cookie);

+resource_size_t nd_region_available_dpa(struct nd_region *nd_region)
+{
+ resource_size_t blk_max_overlap = 0, available, overlap;
+ int i;
+
+ WARN_ON(!is_nvdimm_bus_locked(&nd_region->dev));
+
+ retry:
+ available = 0;
+ overlap = blk_max_overlap;
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+
+ /* if a dimm is disabled the available capacity is zero */
+ if (!ndd)
+ return 0;
+
+ if (is_nd_pmem(&nd_region->dev)) {
+ available += nd_pmem_available_dpa(nd_region,
+ nd_mapping, &overlap);
+ if (overlap > blk_max_overlap) {
+ blk_max_overlap = overlap;
+ goto retry;
+ }
+ } else if (is_nd_blk(&nd_region->dev)) {
+ /* TODO: BLK Namespace support */
+ }
+ }
+
+ return available;
+}
+
+static ssize_t available_size_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+ unsigned long long available = 0;
+
+ /*
+ * Flush in-flight updates and grab a snapshot of the available
+ * size. Of course, this value is potentially invalidated the
+ * memory nvdimm_bus_lock() is dropped, but that's userspace's
+ * problem to not race itself.
+ */
+ nvdimm_bus_lock(dev);
+ wait_nvdimm_bus_probe_idle(dev);
+ available = nd_region_available_dpa(nd_region);
+ nvdimm_bus_unlock(dev);
+
+ return sprintf(buf, "%llu\n", available);
+}
+static DEVICE_ATTR_RO(available_size);
+
static ssize_t init_namespaces_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -168,11 +275,29 @@ static ssize_t init_namespaces_show(struct device *dev,
}
static DEVICE_ATTR_RO(init_namespaces);

+static ssize_t namespace_seed_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+ ssize_t rc;
+
+ nvdimm_bus_lock(dev);
+ if (nd_region->ns_seed)
+ rc = sprintf(buf, "%s\n", dev_name(nd_region->ns_seed));
+ else
+ rc = sprintf(buf, "\n");
+ nvdimm_bus_unlock(dev);
+ return rc;
+}
+static DEVICE_ATTR_RO(namespace_seed);
+
static struct attribute *nd_region_attributes[] = {
&dev_attr_size.attr,
&dev_attr_nstype.attr,
&dev_attr_mappings.attr,
&dev_attr_set_cookie.attr,
+ &dev_attr_available_size.attr,
+ &dev_attr_namespace_seed.attr,
&dev_attr_init_namespaces.attr,
NULL,
};
@@ -182,12 +307,18 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n)
struct device *dev = container_of(kobj, typeof(*dev), kobj);
struct nd_region *nd_region = to_nd_region(dev);
struct nd_interleave_set *nd_set = nd_region->nd_set;
+ int type = nd_region_to_nstype(nd_region);

- if (a != &dev_attr_set_cookie.attr)
+ if (a != &dev_attr_set_cookie.attr
+ && a != &dev_attr_available_size.attr)
return a->mode;

- if (is_nd_pmem(dev) && nd_set)
- return a->mode;
+ if ((type == ND_DEVICE_NAMESPACE_PMEM
+ || type == ND_DEVICE_NAMESPACE_BLK)
+ && a == &dev_attr_available_size.attr)
+ return a->mode;
+ else if (is_nd_pmem(dev) && nd_set)
+ return a->mode;

return 0;
}
@@ -198,6 +329,15 @@ struct attribute_group nd_region_attribute_group = {
};
EXPORT_SYMBOL_GPL(nd_region_attribute_group);

+u64 nd_region_interleave_set_cookie(struct nd_region *nd_region)
+{
+ struct nd_interleave_set *nd_set = nd_region->nd_set;
+
+ if (nd_set)
+ return nd_set->cookie;
+ return 0;
+}
+
/*
* Upon successful probe/remove, take/release a reference on the
* associated interleave set (if present)
@@ -205,18 +345,20 @@ EXPORT_SYMBOL_GPL(nd_region_attribute_group);
static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
struct device *dev, bool probe)
{
- if (is_nd_pmem(dev) || is_nd_blk(dev)) {
+ if (!probe && (is_nd_pmem(dev) || is_nd_blk(dev))) {
struct nd_region *nd_region = to_nd_region(dev);
int i;

for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nvdimm_drvdata *ndd = nd_mapping->ndd;
struct nvdimm *nvdimm = nd_mapping->nvdimm;

- if (probe)
- atomic_inc(&nvdimm->busy);
- else
- atomic_dec(&nvdimm->busy);
+ kfree(nd_mapping->labels);
+ nd_mapping->labels = NULL;
+ put_ndd(ndd);
+ nd_mapping->ndd = NULL;
+ atomic_dec(&nvdimm->busy);
}
}
}
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 1b627b109360..c130972e08c4 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -41,10 +41,20 @@ typedef int (*ndctl_fn)(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *buf,
unsigned int buf_len);

+struct nd_namespace_label;
+struct nvdimm_drvdata;
struct nd_mapping {
struct nvdimm *nvdimm;
+ struct nd_namespace_label **labels;
u64 start;
u64 size;
+ /*
+ * @ndd is for private use at region enable / disable time for
+ * get_ndd() + put_ndd(), all other nd_mapping to ndd
+ * conversions use to_ndd() which respects enabled state of the
+ * nvdimm.
+ */
+ struct nvdimm_drvdata *ndd;
};

struct nvdimm_bus_descriptor {
diff --git a/include/linux/nd.h b/include/linux/nd.h
index da70e9962197..255c38a83083 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -28,16 +28,40 @@ static inline struct nd_device_driver *to_nd_device_driver(
return container_of(drv, struct nd_device_driver, drv);
};

+/**
+ * struct nd_namespace_io - infrastructure for loading an nd_pmem instance
+ * @dev: namespace device created by the nd region driver
+ * @res: struct resource conversion of a NFIT SPA table
+ */
struct nd_namespace_io {
struct device dev;
struct resource res;
};

+/**
+ * struct nd_namespace_pmem - namespace device for dimm-backed interleaved memory
+ * @nsio: device and system physical address range to drive
+ * @alt_name: namespace name supplied in the dimm label
+ * @uuid: namespace name supplied in the dimm label
+ */
+struct nd_namespace_pmem {
+ struct nd_namespace_io nsio;
+ char *alt_name;
+ u8 *uuid;
+};
+
static inline struct nd_namespace_io *to_nd_namespace_io(struct device *dev)
{
return container_of(dev, struct nd_namespace_io, dev);
}

+static inline struct nd_namespace_pmem *to_nd_namespace_pmem(struct device *dev)
+{
+ struct nd_namespace_io *nsio = to_nd_namespace_io(dev);
+
+ return container_of(nsio, struct nd_namespace_pmem, nsio);
+}
+
#define MODULE_ALIAS_ND_DEVICE(type) \
MODULE_ALIAS("nd:t" __stringify(type) "*")
#define ND_DEVICE_MODALIAS_FMT "nd:t%d"
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index 1357a87b8714..2b94ea2287bb 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -190,4 +190,8 @@ enum nd_driver_flags {
ND_DRIVER_NAMESPACE_PMEM = 1 << ND_DEVICE_NAMESPACE_PMEM,
ND_DRIVER_NAMESPACE_BLK = 1 << ND_DEVICE_NAMESPACE_BLK,
};
+
+enum {
+ ND_MIN_NAMESPACE_SIZE = 0x00400000,
+};
#endif /* __NDCTL_H__ */

2015-06-17 23:17:29

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 14/16] libnvdimm: blk labels and namespace instantiation

A blk label set describes a namespace comprised of one or more
discontiguous dpa ranges on a single dimm. They may alias with one or
more pmem interleave sets that include the given dimm.

This is the runtime/volatile configuration infrastructure for sysfs
manipulation of 'alt_name', 'uuid', 'size', and 'sector_size'. A later
patch will make these settings persistent by writing back the label(s).

Unlike pmem namespaces, multiple blk namespaces can be created per
region. Once a blk namespace has been created a new seed device
(unconfigured child of a parent blk region) is instantiated. As long as
a region has 'available_size' != 0 new child namespaces may be created.

Cc: Greg KH <[email protected]>
Cc: Neil Brown <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/core.c | 40 +++
drivers/nvdimm/dimm_devs.c | 36 +++
drivers/nvdimm/namespace_devs.c | 498 ++++++++++++++++++++++++++++++++++++---
drivers/nvdimm/nd-core.h | 8 +
drivers/nvdimm/nd.h | 5
drivers/nvdimm/region_devs.c | 17 +
include/linux/libnvdimm.h | 3
include/linux/nd.h | 25 ++
8 files changed, 594 insertions(+), 38 deletions(-)

diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index cf99cce8ef33..dd824d7c2669 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -173,6 +173,46 @@ int nd_uuid_store(struct device *dev, u8 **uuid_out, const char *buf,
return 0;
}

+ssize_t nd_sector_size_show(unsigned long current_lbasize,
+ const unsigned long *supported, char *buf)
+{
+ ssize_t len = 0;
+ int i;
+
+ for (i = 0; supported[i]; i++)
+ if (current_lbasize == supported[i])
+ len += sprintf(buf + len, "[%ld] ", supported[i]);
+ else
+ len += sprintf(buf + len, "%ld ", supported[i]);
+ len += sprintf(buf + len, "\n");
+ return len;
+}
+
+ssize_t nd_sector_size_store(struct device *dev, const char *buf,
+ unsigned long *current_lbasize, const unsigned long *supported)
+{
+ unsigned long lbasize;
+ int rc, i;
+
+ if (dev->driver)
+ return -EBUSY;
+
+ rc = kstrtoul(buf, 0, &lbasize);
+ if (rc)
+ return rc;
+
+ for (i = 0; supported[i]; i++)
+ if (lbasize == supported[i])
+ break;
+
+ if (supported[i]) {
+ *current_lbasize = lbasize;
+ return 0;
+ } else {
+ return -EINVAL;
+ }
+}
+
static ssize_t commands_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index b55acef179ba..101d3b76e405 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -290,6 +290,42 @@ struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
EXPORT_SYMBOL_GPL(nvdimm_create);

/**
+ * nd_blk_available_dpa - account the unused dpa of BLK region
+ * @nd_mapping: container of dpa-resource-root + labels
+ *
+ * Unlike PMEM, BLK namespaces can occupy discontiguous DPA ranges.
+ */
+resource_size_t nd_blk_available_dpa(struct nd_mapping *nd_mapping)
+{
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ resource_size_t map_end, busy = 0, available;
+ struct resource *res;
+
+ if (!ndd)
+ return 0;
+
+ map_end = nd_mapping->start + nd_mapping->size - 1;
+ for_each_dpa_resource(ndd, res)
+ if (res->start >= nd_mapping->start && res->start < map_end) {
+ resource_size_t end = min(map_end, res->end);
+
+ busy += end - res->start + 1;
+ } else if (res->end >= nd_mapping->start
+ && res->end <= map_end) {
+ busy += res->end - nd_mapping->start;
+ } else if (nd_mapping->start > res->start
+ && nd_mapping->start < res->end) {
+ /* total eclipse of the BLK region mapping */
+ busy += nd_mapping->size;
+ }
+
+ available = map_end - nd_mapping->start + 1;
+ if (busy < available)
+ return available - busy;
+ return 0;
+}
+
+/**
* nd_pmem_available_dpa - for the given dimm+region account unallocated dpa
* @nd_mapping: container of dpa-resource-root + labels
* @nd_region: constrain available space check to this reference region
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 5d81032fcfc5..ad0ec09ca40f 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -35,7 +35,15 @@ static void namespace_pmem_release(struct device *dev)

static void namespace_blk_release(struct device *dev)
{
- /* TODO: blk namespace support */
+ struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
+ struct nd_region *nd_region = to_nd_region(dev->parent);
+
+ if (nsblk->id >= 0)
+ ida_simple_remove(&nd_region->ns_ida, nsblk->id);
+ kfree(nsblk->alt_name);
+ kfree(nsblk->uuid);
+ kfree(nsblk->res);
+ kfree(nsblk);
}

static struct device_type namespace_io_device_type = {
@@ -88,8 +96,9 @@ static ssize_t __alt_name_store(struct device *dev, const char *buf,

ns_altname = &nspm->alt_name;
} else if (is_namespace_blk(dev)) {
- /* TODO: blk namespace support */
- return -ENXIO;
+ struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
+
+ ns_altname = &nsblk->alt_name;
} else
return -ENXIO;

@@ -122,6 +131,24 @@ out:
return rc;
}

+static resource_size_t nd_namespace_blk_size(struct nd_namespace_blk *nsblk)
+{
+ struct nd_region *nd_region = to_nd_region(nsblk->dev.parent);
+ struct nd_mapping *nd_mapping = &nd_region->mapping[0];
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ struct nd_label_id label_id;
+ resource_size_t size = 0;
+ struct resource *res;
+
+ if (!nsblk->uuid)
+ return 0;
+ nd_label_gen_id(&label_id, nsblk->uuid, NSLABEL_FLAG_LOCAL);
+ for_each_dpa_resource(ndd, res)
+ if (strcmp(res->name, label_id.id) == 0)
+ size += resource_size(res);
+ return size;
+}
+
static ssize_t alt_name_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t len)
{
@@ -148,8 +175,9 @@ static ssize_t alt_name_show(struct device *dev,

ns_altname = nspm->alt_name;
} else if (is_namespace_blk(dev)) {
- /* TODO: blk namespace support */
- return -ENXIO;
+ struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
+
+ ns_altname = nsblk->alt_name;
} else
return -ENXIO;

@@ -195,6 +223,8 @@ static int scan_free(struct nd_region *nd_region,
new_start = res->start;

rc = adjust_resource(res, new_start, resource_size(res) - n);
+ if (rc == 0)
+ res->flags |= DPA_RESOURCE_ADJUSTED;
nd_dbg_dpa(nd_region, ndd, res, "shrink %d\n", rc);
break;
}
@@ -255,14 +285,15 @@ static resource_size_t init_dpa_allocation(struct nd_label_id *label_id,
return rc ? n : 0;
}

-static bool space_valid(bool is_pmem, struct nd_label_id *label_id,
- struct resource *res)
+static bool space_valid(bool is_pmem, bool is_reserve,
+ struct nd_label_id *label_id, struct resource *res)
{
/*
* For BLK-space any space is valid, for PMEM-space, it must be
- * contiguous with an existing allocation.
+ * contiguous with an existing allocation unless we are
+ * reserving pmem.
*/
- if (!is_pmem)
+ if (is_reserve || !is_pmem)
return true;
if (!res || strcmp(res->name, label_id->id) == 0)
return true;
@@ -278,6 +309,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
resource_size_t n)
{
resource_size_t mapping_end = nd_mapping->start + nd_mapping->size - 1;
+ bool is_reserve = strcmp(label_id->id, "pmem-reserve") == 0;
bool is_pmem = strncmp(label_id->id, "pmem", 4) == 0;
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
const resource_size_t to_allocate = n;
@@ -303,7 +335,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
if (!first++ && res->start > nd_mapping->start) {
free_start = nd_mapping->start;
available = res->start - free_start;
- if (space_valid(is_pmem, label_id, NULL))
+ if (space_valid(is_pmem, is_reserve, label_id, NULL))
loc = ALLOC_BEFORE;
}

@@ -311,7 +343,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
if (!loc && next) {
free_start = res->start + resource_size(res);
free_end = min(mapping_end, next->start - 1);
- if (space_valid(is_pmem, label_id, res)
+ if (space_valid(is_pmem, is_reserve, label_id, res)
&& free_start < free_end) {
available = free_end + 1 - free_start;
loc = ALLOC_MID;
@@ -322,7 +354,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
if (!loc && !next) {
free_start = res->start + resource_size(res);
free_end = mapping_end;
- if (space_valid(is_pmem, label_id, res)
+ if (space_valid(is_pmem, is_reserve, label_id, res)
&& free_start < free_end) {
available = free_end + 1 - free_start;
loc = ALLOC_AFTER;
@@ -336,7 +368,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
case ALLOC_BEFORE:
if (strcmp(res->name, label_id->id) == 0) {
/* adjust current resource up */
- if (is_pmem)
+ if (is_pmem && !is_reserve)
return n;
rc = adjust_resource(res, res->start - allocate,
resource_size(res) + allocate);
@@ -347,7 +379,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
case ALLOC_MID:
if (strcmp(next->name, label_id->id) == 0) {
/* adjust next resource up */
- if (is_pmem)
+ if (is_pmem && !is_reserve)
return n;
rc = adjust_resource(next, next->start
- allocate, resource_size(next)
@@ -373,7 +405,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
/* BLK allocate bottom up */
if (!is_pmem)
free_start += available - allocate;
- else if (free_start != nd_mapping->start)
+ else if (!is_reserve && free_start != nd_mapping->start)
return n;

new_res = nvdimm_allocate_dpa(ndd, label_id,
@@ -384,6 +416,8 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
/* adjust current resource down */
rc = adjust_resource(res, res->start, resource_size(res)
+ allocate);
+ if (rc == 0)
+ res->flags |= DPA_RESOURCE_ADJUSTED;
}

if (!new_res)
@@ -409,11 +443,108 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
return 0;
}

- if (is_pmem && n == to_allocate)
+ /*
+ * If we allocated nothing in the BLK case it may be because we are in
+ * an initial "pmem-reserve pass". Only do an initial BLK allocation
+ * when none of the DPA space is reserved.
+ */
+ if ((is_pmem || !ndd->dpa.child) && n == to_allocate)
return init_dpa_allocation(label_id, nd_region, nd_mapping, n);
return n;
}

+static int merge_dpa(struct nd_region *nd_region,
+ struct nd_mapping *nd_mapping, struct nd_label_id *label_id)
+{
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ struct resource *res;
+
+ if (strncmp("pmem", label_id->id, 4) == 0)
+ return 0;
+ retry:
+ for_each_dpa_resource(ndd, res) {
+ int rc;
+ struct resource *next = res->sibling;
+ resource_size_t end = res->start + resource_size(res);
+
+ if (!next || strcmp(res->name, label_id->id) != 0
+ || strcmp(next->name, label_id->id) != 0
+ || end != next->start)
+ continue;
+ end += resource_size(next);
+ nvdimm_free_dpa(ndd, next);
+ rc = adjust_resource(res, res->start, end - res->start);
+ nd_dbg_dpa(nd_region, ndd, res, "merge %d\n", rc);
+ if (rc)
+ return rc;
+ res->flags |= DPA_RESOURCE_ADJUSTED;
+ goto retry;
+ }
+
+ return 0;
+}
+
+static int __reserve_free_pmem(struct device *dev, void *data)
+{
+ struct nvdimm *nvdimm = data;
+ struct nd_region *nd_region;
+ struct nd_label_id label_id;
+ int i;
+
+ if (!is_nd_pmem(dev))
+ return 0;
+
+ nd_region = to_nd_region(dev);
+ if (nd_region->ndr_mappings == 0)
+ return 0;
+
+ memset(&label_id, 0, sizeof(label_id));
+ strcat(label_id.id, "pmem-reserve");
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ resource_size_t n, rem = 0;
+
+ if (nd_mapping->nvdimm != nvdimm)
+ continue;
+
+ n = nd_pmem_available_dpa(nd_region, nd_mapping, &rem);
+ if (n == 0)
+ return 0;
+ rem = scan_allocate(nd_region, nd_mapping, &label_id, n);
+ dev_WARN_ONCE(&nd_region->dev, rem,
+ "pmem reserve underrun: %#llx of %#llx bytes\n",
+ (unsigned long long) n - rem,
+ (unsigned long long) n);
+ return rem ? -ENXIO : 0;
+ }
+
+ return 0;
+}
+
+static void release_free_pmem(struct nvdimm_bus *nvdimm_bus,
+ struct nd_mapping *nd_mapping)
+{
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ struct resource *res, *_res;
+
+ for_each_dpa_resource_safe(ndd, res, _res)
+ if (strcmp(res->name, "pmem-reserve") == 0)
+ nvdimm_free_dpa(ndd, res);
+}
+
+static int reserve_free_pmem(struct nvdimm_bus *nvdimm_bus,
+ struct nd_mapping *nd_mapping)
+{
+ struct nvdimm *nvdimm = nd_mapping->nvdimm;
+ int rc;
+
+ rc = device_for_each_child(&nvdimm_bus->dev, nvdimm,
+ __reserve_free_pmem);
+ if (rc)
+ release_free_pmem(nvdimm_bus, nd_mapping);
+ return rc;
+}
+
/**
* grow_dpa_allocation - for each dimm allocate n bytes for @label_id
* @nd_region: the set of dimms to allocate @n more bytes from
@@ -430,13 +561,45 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
static int grow_dpa_allocation(struct nd_region *nd_region,
struct nd_label_id *label_id, resource_size_t n)
{
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(&nd_region->dev);
+ bool is_pmem = strncmp(label_id->id, "pmem", 4) == 0;
int i;

for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
- int rc;
+ resource_size_t rem = n;
+ int rc, j;
+
+ /*
+ * In the BLK case try once with all unallocated PMEM
+ * reserved, and once without
+ */
+ for (j = is_pmem; j < 2; j++) {
+ bool blk_only = j == 0;
+
+ if (blk_only) {
+ rc = reserve_free_pmem(nvdimm_bus, nd_mapping);
+ if (rc)
+ return rc;
+ }
+ rem = scan_allocate(nd_region, nd_mapping,
+ label_id, rem);
+ if (blk_only)
+ release_free_pmem(nvdimm_bus, nd_mapping);

- rc = scan_allocate(nd_region, nd_mapping, label_id, n);
+ /* try again and allow encroachments into PMEM */
+ if (rem == 0)
+ break;
+ }
+
+ dev_WARN_ONCE(&nd_region->dev, rem,
+ "allocation underrun: %#llx of %#llx bytes\n",
+ (unsigned long long) n - rem,
+ (unsigned long long) n);
+ if (rem)
+ return -ENXIO;
+
+ rc = merge_dpa(nd_region, nd_mapping, label_id);
if (rc)
return rc;
}
@@ -472,8 +635,10 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)

uuid = nspm->uuid;
} else if (is_namespace_blk(dev)) {
- /* TODO: blk namespace support */
- return -ENXIO;
+ struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
+
+ uuid = nsblk->uuid;
+ flags = NSLABEL_FLAG_LOCAL;
}

/*
@@ -528,6 +693,14 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)

nd_namespace_pmem_set_size(nd_region, nspm,
val * nd_region->ndr_mappings);
+ } else if (is_namespace_blk(dev)) {
+ /*
+ * Try to delete the namespace if we deleted all of its
+ * allocation and this is not the seed device for the
+ * region.
+ */
+ if (val == 0 && nd_region->ns_seed != dev)
+ nd_device_unregister(dev, ND_ASYNC);
}

return rc;
@@ -554,8 +727,9 @@ static ssize_t size_store(struct device *dev,

uuid = &nspm->uuid;
} else if (is_namespace_blk(dev)) {
- /* TODO: blk namespace support */
- rc = -ENXIO;
+ struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
+
+ uuid = &nsblk->uuid;
}

if (rc == 0 && val == 0 && uuid) {
@@ -576,21 +750,23 @@ static ssize_t size_store(struct device *dev,
static ssize_t size_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
+ unsigned long long size = 0;
+
+ nvdimm_bus_lock(dev);
if (is_namespace_pmem(dev)) {
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);

- return sprintf(buf, "%llu\n", (unsigned long long)
- resource_size(&nspm->nsio.res));
+ size = resource_size(&nspm->nsio.res);
} else if (is_namespace_blk(dev)) {
- /* TODO: blk namespace support */
- return -ENXIO;
+ size = nd_namespace_blk_size(to_nd_namespace_blk(dev));
} else if (is_namespace_io(dev)) {
struct nd_namespace_io *nsio = to_nd_namespace_io(dev);

- return sprintf(buf, "%llu\n", (unsigned long long)
- resource_size(&nsio->res));
- } else
- return -ENXIO;
+ size = resource_size(&nsio->res);
+ }
+ nvdimm_bus_unlock(dev);
+
+ return sprintf(buf, "%llu\n", size);
}
static DEVICE_ATTR(size, S_IRUGO, size_show, size_store);

@@ -604,8 +780,9 @@ static ssize_t uuid_show(struct device *dev,

uuid = nspm->uuid;
} else if (is_namespace_blk(dev)) {
- /* TODO: blk namespace support */
- return -ENXIO;
+ struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
+
+ uuid = nsblk->uuid;
} else
return -ENXIO;

@@ -669,8 +846,9 @@ static ssize_t uuid_store(struct device *dev,

ns_uuid = &nspm->uuid;
} else if (is_namespace_blk(dev)) {
- /* TODO: blk namespace support */
- return -ENXIO;
+ struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
+
+ ns_uuid = &nsblk->uuid;
} else
return -ENXIO;

@@ -712,12 +890,48 @@ static ssize_t resource_show(struct device *dev,
}
static DEVICE_ATTR_RO(resource);

+static const unsigned long ns_lbasize_supported[] = { 512, 0 };
+
+static ssize_t sector_size_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
+
+ if (!is_namespace_blk(dev))
+ return -ENXIO;
+
+ return nd_sector_size_show(nsblk->lbasize, ns_lbasize_supported, buf);
+}
+
+static ssize_t sector_size_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t len)
+{
+ struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
+ ssize_t rc;
+
+ if (!is_namespace_blk(dev))
+ return -ENXIO;
+
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
+ rc = nd_sector_size_store(dev, buf, &nsblk->lbasize,
+ ns_lbasize_supported);
+ dev_dbg(dev, "%s: result: %zd wrote: %s%s", __func__,
+ rc, buf, buf[len - 1] == '\n' ? "" : "\n");
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
+
+ return rc ? rc : len;
+}
+static DEVICE_ATTR_RW(sector_size);
+
static struct attribute *nd_namespace_attributes[] = {
&dev_attr_nstype.attr,
&dev_attr_size.attr,
&dev_attr_uuid.attr,
&dev_attr_resource.attr,
&dev_attr_alt_name.attr,
+ &dev_attr_sector_size.attr,
NULL,
};

@@ -735,6 +949,10 @@ static umode_t namespace_visible(struct kobject *kobj,
if (is_namespace_pmem(dev) || is_namespace_blk(dev)) {
if (a == &dev_attr_size.attr)
return S_IWUSR | S_IRUGO;
+
+ if (is_namespace_pmem(dev) && a == &dev_attr_sector_size.attr)
+ return 0;
+
return a->mode;
}

@@ -1022,6 +1240,176 @@ static struct device **create_namespace_pmem(struct nd_region *nd_region)
return NULL;
}

+struct resource *nsblk_add_resource(struct nd_region *nd_region,
+ struct nvdimm_drvdata *ndd, struct nd_namespace_blk *nsblk,
+ resource_size_t start)
+{
+ struct nd_label_id label_id;
+ struct resource *res;
+
+ nd_label_gen_id(&label_id, nsblk->uuid, NSLABEL_FLAG_LOCAL);
+ res = krealloc(nsblk->res,
+ sizeof(void *) * (nsblk->num_resources + 1),
+ GFP_KERNEL);
+ if (!res)
+ return NULL;
+ nsblk->res = (struct resource **) res;
+ for_each_dpa_resource(ndd, res)
+ if (strcmp(res->name, label_id.id) == 0
+ && res->start == start) {
+ nsblk->res[nsblk->num_resources++] = res;
+ return res;
+ }
+ return NULL;
+}
+
+static struct device *nd_namespace_blk_create(struct nd_region *nd_region)
+{
+ struct nd_namespace_blk *nsblk;
+ struct device *dev;
+
+ if (!is_nd_blk(&nd_region->dev))
+ return NULL;
+
+ nsblk = kzalloc(sizeof(*nsblk), GFP_KERNEL);
+ if (!nsblk)
+ return NULL;
+
+ dev = &nsblk->dev;
+ dev->type = &namespace_blk_device_type;
+ nsblk->id = ida_simple_get(&nd_region->ns_ida, 0, 0, GFP_KERNEL);
+ if (nsblk->id < 0) {
+ kfree(nsblk);
+ return NULL;
+ }
+ dev_set_name(dev, "namespace%d.%d", nd_region->id, nsblk->id);
+ dev->parent = &nd_region->dev;
+ dev->groups = nd_namespace_attribute_groups;
+
+ return &nsblk->dev;
+}
+
+void nd_region_create_blk_seed(struct nd_region *nd_region)
+{
+ WARN_ON(!is_nvdimm_bus_locked(&nd_region->dev));
+ nd_region->ns_seed = nd_namespace_blk_create(nd_region);
+ /*
+ * Seed creation failures are not fatal, provisioning is simply
+ * disabled until memory becomes available
+ */
+ if (!nd_region->ns_seed)
+ dev_err(&nd_region->dev, "failed to create blk namespace\n");
+ else
+ nd_device_register(nd_region->ns_seed);
+}
+
+static struct device **create_namespace_blk(struct nd_region *nd_region)
+{
+ struct nd_mapping *nd_mapping = &nd_region->mapping[0];
+ struct nd_namespace_label *nd_label;
+ struct device *dev, **devs = NULL;
+ struct nd_namespace_blk *nsblk;
+ struct nvdimm_drvdata *ndd;
+ int i, l, count = 0;
+ struct resource *res;
+
+ if (nd_region->ndr_mappings == 0)
+ return NULL;
+
+ ndd = to_ndd(nd_mapping);
+ for_each_label(l, nd_label, nd_mapping->labels) {
+ u32 flags = __le32_to_cpu(nd_label->flags);
+ char *name[NSLABEL_NAME_LEN];
+ struct device **__devs;
+
+ if (flags & NSLABEL_FLAG_LOCAL)
+ /* pass */;
+ else
+ continue;
+
+ for (i = 0; i < count; i++) {
+ nsblk = to_nd_namespace_blk(devs[i]);
+ if (memcmp(nsblk->uuid, nd_label->uuid,
+ NSLABEL_UUID_LEN) == 0) {
+ res = nsblk_add_resource(nd_region, ndd, nsblk,
+ __le64_to_cpu(nd_label->dpa));
+ if (!res)
+ goto err;
+ nd_dbg_dpa(nd_region, ndd, res, "%s assign\n",
+ dev_name(&nsblk->dev));
+ break;
+ }
+ }
+ if (i < count)
+ continue;
+ __devs = kcalloc(count + 2, sizeof(dev), GFP_KERNEL);
+ if (!__devs)
+ goto err;
+ memcpy(__devs, devs, sizeof(dev) * count);
+ kfree(devs);
+ devs = __devs;
+
+ nsblk = kzalloc(sizeof(*nsblk), GFP_KERNEL);
+ if (!nsblk)
+ goto err;
+ dev = &nsblk->dev;
+ dev->type = &namespace_blk_device_type;
+ dev->parent = &nd_region->dev;
+ dev_set_name(dev, "namespace%d.%d", nd_region->id, count);
+ devs[count++] = dev;
+ nsblk->id = -1;
+ nsblk->lbasize = __le64_to_cpu(nd_label->lbasize);
+ nsblk->uuid = kmemdup(nd_label->uuid, NSLABEL_UUID_LEN,
+ GFP_KERNEL);
+ if (!nsblk->uuid)
+ goto err;
+ memcpy(name, nd_label->name, NSLABEL_NAME_LEN);
+ if (name[0])
+ nsblk->alt_name = kmemdup(name, NSLABEL_NAME_LEN,
+ GFP_KERNEL);
+ res = nsblk_add_resource(nd_region, ndd, nsblk,
+ __le64_to_cpu(nd_label->dpa));
+ if (!res)
+ goto err;
+ nd_dbg_dpa(nd_region, ndd, res, "%s assign\n",
+ dev_name(&nsblk->dev));
+ }
+
+ dev_dbg(&nd_region->dev, "%s: discovered %d blk namespace%s\n",
+ __func__, count, count == 1 ? "" : "s");
+
+ if (count == 0) {
+ /* Publish a zero-sized namespace for userspace to configure. */
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+
+ kfree(nd_mapping->labels);
+ nd_mapping->labels = NULL;
+ }
+
+ devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
+ if (!devs)
+ goto err;
+ nsblk = kzalloc(sizeof(*nsblk), GFP_KERNEL);
+ if (!nsblk)
+ goto err;
+ dev = &nsblk->dev;
+ dev->type = &namespace_blk_device_type;
+ dev->parent = &nd_region->dev;
+ devs[count++] = dev;
+ }
+
+ return devs;
+
+err:
+ for (i = 0; i < count; i++) {
+ nsblk = to_nd_namespace_blk(devs[i]);
+ namespace_blk_release(&nsblk->dev);
+ }
+ kfree(devs);
+ return NULL;
+}
+
static int init_active_labels(struct nd_region *nd_region)
{
int i;
@@ -1087,6 +1475,9 @@ int nd_region_register_namespaces(struct nd_region *nd_region, int *err)
case ND_DEVICE_NAMESPACE_PMEM:
devs = create_namespace_pmem(nd_region);
break;
+ case ND_DEVICE_NAMESPACE_BLK:
+ devs = create_namespace_blk(nd_region);
+ break;
default:
break;
}
@@ -1095,15 +1486,50 @@ int nd_region_register_namespaces(struct nd_region *nd_region, int *err)
if (!devs)
return -ENODEV;

- nd_region->ns_seed = devs[0];
for (i = 0; devs[i]; i++) {
struct device *dev = devs[i];
+ int id;

- dev_set_name(dev, "namespace%d.%d", nd_region->id, i);
+ if (type == ND_DEVICE_NAMESPACE_BLK) {
+ struct nd_namespace_blk *nsblk;
+
+ nsblk = to_nd_namespace_blk(dev);
+ id = ida_simple_get(&nd_region->ns_ida, 0, 0,
+ GFP_KERNEL);
+ nsblk->id = id;
+ } else
+ id = i;
+
+ if (id < 0)
+ break;
+ dev_set_name(dev, "namespace%d.%d", nd_region->id, id);
dev->groups = nd_namespace_attribute_groups;
nd_device_register(dev);
}
+ if (i)
+ nd_region->ns_seed = devs[0];
+
+ if (devs[i]) {
+ int j;
+
+ for (j = i; devs[j]; j++) {
+ struct device *dev = devs[j];
+
+ device_initialize(dev);
+ put_device(dev);
+ }
+ *err = j - i;
+ /*
+ * All of the namespaces we tried to register failed, so
+ * fail region activation.
+ */
+ if (*err == 0)
+ rc = -ENODEV;
+ }
kfree(devs);

+ if (rc == -ENODEV)
+ return rc;
+
return i;
}
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index c6c889292bab..22489555a6f1 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -17,6 +17,7 @@
#include <linux/libnvdimm.h>
#include <linux/sizes.h>
#include <linux/mutex.h>
+#include <linux/nd.h>

extern struct list_head nvdimm_bus_list;
extern struct mutex nvdimm_bus_list_mutex;
@@ -48,6 +49,8 @@ struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev);
int __init nvdimm_bus_init(void);
void nvdimm_bus_exit(void);
void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device *dev);
+struct nd_region;
+void nd_region_create_blk_seed(struct nd_region *nd_region);
void nd_region_disable(struct nvdimm_bus *nvdimm_bus, struct device *dev);
int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus);
void nvdimm_bus_destroy_ndctl(struct nvdimm_bus *nvdimm_bus);
@@ -64,8 +67,13 @@ struct nvdimm_drvdata;
struct nd_mapping;
resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
struct nd_mapping *nd_mapping, resource_size_t *overlap);
+resource_size_t nd_blk_available_dpa(struct nd_mapping *nd_mapping);
resource_size_t nd_region_available_dpa(struct nd_region *nd_region);
resource_size_t nvdimm_allocated_dpa(struct nvdimm_drvdata *ndd,
struct nd_label_id *label_id);
+struct nd_mapping;
+struct resource *nsblk_add_resource(struct nd_region *nd_region,
+ struct nvdimm_drvdata *ndd, struct nd_namespace_blk *nsblk,
+ resource_size_t start);
void get_ndd(struct nvdimm_drvdata *ndd);
#endif /* __ND_CORE_H__ */
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 03e610cd9f43..9b021b626202 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -73,6 +73,7 @@ static inline struct nd_namespace_index *to_next_namespace_index(

struct nd_region {
struct device dev;
+ struct ida ns_ida;
struct device *ns_seed;
u16 ndr_mappings;
u64 ndr_size;
@@ -102,6 +103,10 @@ void nd_device_register(struct device *dev);
void nd_device_unregister(struct device *dev, enum nd_async_mode mode);
int nd_uuid_store(struct device *dev, u8 **uuid_out, const char *buf,
size_t len);
+ssize_t nd_sector_size_show(unsigned long current_lbasize,
+ const unsigned long *supported, char *buf);
+ssize_t nd_sector_size_store(struct device *dev, const char *buf,
+ unsigned long *current_lbasize, const unsigned long *supported);
int __init nvdimm_init(void);
int __init nd_region_init(void);
void nvdimm_exit(void);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b45806f7176d..ac21ce419beb 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -118,7 +118,12 @@ static int is_uuid_busy(struct device *dev, void *data)
break;
}
case ND_DEVICE_NAMESPACE_BLK: {
- /* TODO: blk namespace support */
+ struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
+
+ if (!nsblk->uuid)
+ break;
+ if (memcmp(uuid, nsblk->uuid, NSLABEL_UUID_LEN) == 0)
+ return -EBUSY;
break;
}
default:
@@ -230,7 +235,7 @@ resource_size_t nd_region_available_dpa(struct nd_region *nd_region)
goto retry;
}
} else if (is_nd_blk(&nd_region->dev)) {
- /* TODO: BLK Namespace support */
+ available += nd_blk_available_dpa(nd_mapping);
}
}

@@ -360,6 +365,13 @@ static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
nd_mapping->ndd = NULL;
atomic_dec(&nvdimm->busy);
}
+ } else if (dev->parent && is_nd_blk(dev->parent) && probe) {
+ struct nd_region *nd_region = to_nd_region(dev->parent);
+
+ nvdimm_bus_lock(dev);
+ if (nd_region->ns_seed == dev)
+ nd_region_create_blk_seed(nd_region);
+ nvdimm_bus_unlock(dev);
}
}

@@ -533,6 +545,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
nd_region->ndr_mappings = ndr_desc->num_mappings;
nd_region->provider_data = ndr_desc->provider_data;
nd_region->nd_set = ndr_desc->nd_set;
+ ida_init(&nd_region->ns_ida);
dev = &nd_region->dev;
dev_set_name(dev, "region%d", nd_region->id);
dev->parent = &nvdimm_bus->dev;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index c130972e08c4..a59dca17b3aa 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -27,6 +27,9 @@ enum {
ND_CMD_MAX_ENVELOPE = 16,
ND_CMD_ARS_STATUS_MAX = SZ_4K,
ND_MAX_MAPPINGS = 32,
+
+ /* mark newly adjusted resources as requiring a label update */
+ DPA_RESOURCE_ADJUSTED = 1 << 0,
};

extern struct attribute_group nvdimm_bus_attribute_group;
diff --git a/include/linux/nd.h b/include/linux/nd.h
index 255c38a83083..23276ea91690 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -50,6 +50,26 @@ struct nd_namespace_pmem {
u8 *uuid;
};

+/**
+ * struct nd_namespace_blk - namespace for dimm-bounded persistent memory
+ * @dev: namespace device creation by the nd region driver
+ * @alt_name: namespace name supplied in the dimm label
+ * @uuid: namespace name supplied in the dimm label
+ * @id: ida allocated id
+ * @lbasize: blk namespaces have a native sector size when btt not present
+ * @num_resources: number of dpa extents to claim
+ * @res: discontiguous dpa extents for given dimm
+ */
+struct nd_namespace_blk {
+ struct device dev;
+ char *alt_name;
+ u8 *uuid;
+ int id;
+ unsigned long lbasize;
+ int num_resources;
+ struct resource **res;
+};
+
static inline struct nd_namespace_io *to_nd_namespace_io(struct device *dev)
{
return container_of(dev, struct nd_namespace_io, dev);
@@ -62,6 +82,11 @@ static inline struct nd_namespace_pmem *to_nd_namespace_pmem(struct device *dev)
return container_of(nsio, struct nd_namespace_pmem, nsio);
}

+static inline struct nd_namespace_blk *to_nd_namespace_blk(struct device *dev)
+{
+ return container_of(dev, struct nd_namespace_blk, dev);
+}
+
#define MODULE_ALIAS_ND_DEVICE(type) \
MODULE_ALIAS("nd:t" __stringify(type) "*")
#define ND_DEVICE_MODALIAS_FMT "nd:t%d"

2015-06-17 23:18:18

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 15/16] libnvdimm: write pmem label set

After 'uuid', 'size', and optionally 'alt_name' have been set to valid
values the labels on the dimms can be updated.

Write procedure is:
1/ Allocate and write new labels in the "next" index
2/ Free the old labels in the working copy
3/ Write the bitmap and the label space on the dimm
4/ Write the index to make the update valid

Label ranges directly mirror the dpa resource values for the given
label_id of the namespace.

Cc: Greg KH <[email protected]>
Cc: Neil Brown <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/dimm_devs.c | 49 ++++++
drivers/nvdimm/label.c | 328 +++++++++++++++++++++++++++++++++++++++
drivers/nvdimm/label.h | 6 +
drivers/nvdimm/namespace_devs.c | 83 +++++++++-
drivers/nvdimm/nd.h | 3
5 files changed, 455 insertions(+), 14 deletions(-)

diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index 101d3b76e405..156d518a089c 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -132,6 +132,55 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
return rc;
}

+int nvdimm_set_config_data(struct nvdimm_drvdata *ndd, size_t offset,
+ void *buf, size_t len)
+{
+ int rc = validate_dimm(ndd);
+ size_t max_cmd_size, buf_offset;
+ struct nd_cmd_set_config_hdr *cmd;
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(ndd->dev);
+ struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+
+ if (rc)
+ return rc;
+
+ if (!ndd->data)
+ return -ENXIO;
+
+ if (offset + len > ndd->nsarea.config_size)
+ return -ENXIO;
+
+ max_cmd_size = min_t(u32, PAGE_SIZE, len);
+ max_cmd_size = min_t(u32, max_cmd_size, ndd->nsarea.max_xfer);
+ cmd = kzalloc(max_cmd_size + sizeof(*cmd) + sizeof(u32), GFP_KERNEL);
+ if (!cmd)
+ return -ENOMEM;
+
+ for (buf_offset = 0; len; len -= cmd->in_length,
+ buf_offset += cmd->in_length) {
+ size_t cmd_size;
+ u32 *status;
+
+ cmd->in_offset = offset + buf_offset;
+ cmd->in_length = min(max_cmd_size, len);
+ memcpy(cmd->in_buf, buf + buf_offset, cmd->in_length);
+
+ /* status is output in the last 4-bytes of the command buffer */
+ cmd_size = sizeof(*cmd) + cmd->in_length + sizeof(u32);
+ status = ((void *) cmd) + cmd_size - sizeof(u32);
+
+ rc = nd_desc->ndctl(nd_desc, to_nvdimm(ndd->dev),
+ ND_CMD_SET_CONFIG_DATA, cmd, cmd_size);
+ if (rc || *status) {
+ rc = rc ? rc : -ENXIO;
+ break;
+ }
+ }
+ kfree(cmd);
+
+ return rc;
+}
+
static void nvdimm_release(struct device *dev)
{
struct nvdimm *nvdimm = to_nvdimm(dev);
diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
index 1a3bcd27a57a..ffa85d700459 100644
--- a/drivers/nvdimm/label.c
+++ b/drivers/nvdimm/label.c
@@ -12,6 +12,7 @@
*/
#include <linux/device.h>
#include <linux/ndctl.h>
+#include <linux/slab.h>
#include <linux/io.h>
#include <linux/nd.h>
#include "nd-core.h"
@@ -55,6 +56,11 @@ size_t sizeof_namespace_index(struct nvdimm_drvdata *ndd)
return ndd->nsindex_size;
}

+static int nvdimm_num_label_slots(struct nvdimm_drvdata *ndd)
+{
+ return ndd->nsarea.config_size / 129;
+}
+
int nd_label_validate(struct nvdimm_drvdata *ndd)
{
/*
@@ -201,25 +207,32 @@ static struct nd_namespace_label *nd_label_base(struct nvdimm_drvdata *ndd)
return base + 2 * sizeof_namespace_index(ndd);
}

+static int to_slot(struct nvdimm_drvdata *ndd,
+ struct nd_namespace_label *nd_label)
+{
+ return nd_label - nd_label_base(ndd);
+}
+
#define for_each_clear_bit_le(bit, addr, size) \
for ((bit) = find_next_zero_bit_le((addr), (size), 0); \
(bit) < (size); \
(bit) = find_next_zero_bit_le((addr), (size), (bit) + 1))

/**
- * preamble_current - common variable initialization for nd_label_* routines
+ * preamble_index - common variable initialization for nd_label_* routines
* @ndd: dimm container for the relevant label set
+ * @idx: namespace_index index
* @nsindex_out: on return set to the currently active namespace index
* @free: on return set to the free label bitmap in the index
* @nslot: on return set to the number of slots in the label space
*/
-static bool preamble_current(struct nvdimm_drvdata *ndd,
+static bool preamble_index(struct nvdimm_drvdata *ndd, int idx,
struct nd_namespace_index **nsindex_out,
unsigned long **free, u32 *nslot)
{
struct nd_namespace_index *nsindex;

- nsindex = to_current_namespace_index(ndd);
+ nsindex = to_namespace_index(ndd, idx);
if (nsindex == NULL)
return false;

@@ -239,6 +252,22 @@ char *nd_label_gen_id(struct nd_label_id *label_id, u8 *uuid, u32 flags)
return label_id->id;
}

+static bool preamble_current(struct nvdimm_drvdata *ndd,
+ struct nd_namespace_index **nsindex,
+ unsigned long **free, u32 *nslot)
+{
+ return preamble_index(ndd, ndd->ns_current, nsindex,
+ free, nslot);
+}
+
+static bool preamble_next(struct nvdimm_drvdata *ndd,
+ struct nd_namespace_index **nsindex,
+ unsigned long **free, u32 *nslot)
+{
+ return preamble_index(ndd, ndd->ns_next, nsindex,
+ free, nslot);
+}
+
static bool slot_valid(struct nd_namespace_label *nd_label, u32 slot)
{
/* check that we are written where we expect to be written */
@@ -341,3 +370,296 @@ struct nd_namespace_label *nd_label_active(struct nvdimm_drvdata *ndd, int n)

return NULL;
}
+
+static u32 nd_label_alloc_slot(struct nvdimm_drvdata *ndd)
+{
+ struct nd_namespace_index *nsindex;
+ unsigned long *free;
+ u32 nslot, slot;
+
+ if (!preamble_next(ndd, &nsindex, &free, &nslot))
+ return UINT_MAX;
+
+ WARN_ON(!is_nvdimm_bus_locked(ndd->dev));
+
+ slot = find_next_bit_le(free, nslot, 0);
+ if (slot == nslot)
+ return UINT_MAX;
+
+ clear_bit_le(slot, free);
+
+ return slot;
+}
+
+static bool nd_label_free_slot(struct nvdimm_drvdata *ndd, u32 slot)
+{
+ struct nd_namespace_index *nsindex;
+ unsigned long *free;
+ u32 nslot;
+
+ if (!preamble_next(ndd, &nsindex, &free, &nslot))
+ return false;
+
+ WARN_ON(!is_nvdimm_bus_locked(ndd->dev));
+
+ if (slot < nslot)
+ return !test_and_set_bit_le(slot, free);
+ return false;
+}
+
+u32 nd_label_nfree(struct nvdimm_drvdata *ndd)
+{
+ struct nd_namespace_index *nsindex;
+ unsigned long *free;
+ u32 nslot;
+
+ WARN_ON(!is_nvdimm_bus_locked(ndd->dev));
+
+ if (!preamble_next(ndd, &nsindex, &free, &nslot))
+ return 0;
+
+ return bitmap_weight(free, nslot);
+}
+
+static int nd_label_write_index(struct nvdimm_drvdata *ndd, int index, u32 seq,
+ unsigned long flags)
+{
+ struct nd_namespace_index *nsindex;
+ unsigned long offset;
+ u64 checksum;
+ u32 nslot;
+ int rc;
+
+ nsindex = to_namespace_index(ndd, index);
+ if (flags & ND_NSINDEX_INIT)
+ nslot = nvdimm_num_label_slots(ndd);
+ else
+ nslot = __le32_to_cpu(nsindex->nslot);
+
+ memcpy(nsindex->sig, NSINDEX_SIGNATURE, NSINDEX_SIG_LEN);
+ nsindex->flags = __cpu_to_le32(0);
+ nsindex->seq = __cpu_to_le32(seq);
+ offset = (unsigned long) nsindex
+ - (unsigned long) to_namespace_index(ndd, 0);
+ nsindex->myoff = __cpu_to_le64(offset);
+ nsindex->mysize = __cpu_to_le64(sizeof_namespace_index(ndd));
+ offset = (unsigned long) to_namespace_index(ndd,
+ nd_label_next_nsindex(index))
+ - (unsigned long) to_namespace_index(ndd, 0);
+ nsindex->otheroff = __cpu_to_le64(offset);
+ offset = (unsigned long) nd_label_base(ndd)
+ - (unsigned long) to_namespace_index(ndd, 0);
+ nsindex->labeloff = __cpu_to_le64(offset);
+ nsindex->nslot = __cpu_to_le32(nslot);
+ nsindex->major = __cpu_to_le16(1);
+ nsindex->minor = __cpu_to_le16(1);
+ nsindex->checksum = __cpu_to_le64(0);
+ if (flags & ND_NSINDEX_INIT) {
+ unsigned long *free = (unsigned long *) nsindex->free;
+ u32 nfree = ALIGN(nslot, BITS_PER_LONG);
+ int last_bits, i;
+
+ memset(nsindex->free, 0xff, nfree / 8);
+ for (i = 0, last_bits = nfree - nslot; i < last_bits; i++)
+ clear_bit_le(nslot + i, free);
+ }
+ checksum = nd_fletcher64(nsindex, sizeof_namespace_index(ndd), 1);
+ nsindex->checksum = __cpu_to_le64(checksum);
+ rc = nvdimm_set_config_data(ndd, __le64_to_cpu(nsindex->myoff),
+ nsindex, sizeof_namespace_index(ndd));
+ if (rc < 0)
+ return rc;
+
+ if (flags & ND_NSINDEX_INIT)
+ return 0;
+
+ /* copy the index we just wrote to the new 'next' */
+ WARN_ON(index != ndd->ns_next);
+ nd_label_copy(ndd, to_current_namespace_index(ndd), nsindex);
+ ndd->ns_current = nd_label_next_nsindex(ndd->ns_current);
+ ndd->ns_next = nd_label_next_nsindex(ndd->ns_next);
+ WARN_ON(ndd->ns_current == ndd->ns_next);
+
+ return 0;
+}
+
+static unsigned long nd_label_offset(struct nvdimm_drvdata *ndd,
+ struct nd_namespace_label *nd_label)
+{
+ return (unsigned long) nd_label
+ - (unsigned long) to_namespace_index(ndd, 0);
+}
+
+static int __pmem_label_update(struct nd_region *nd_region,
+ struct nd_mapping *nd_mapping, struct nd_namespace_pmem *nspm,
+ int pos)
+{
+ u64 cookie = nd_region_interleave_set_cookie(nd_region), rawsize;
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ struct nd_namespace_label *victim_label;
+ struct nd_namespace_label *nd_label;
+ struct nd_namespace_index *nsindex;
+ unsigned long *free;
+ u32 nslot, slot;
+ size_t offset;
+ int rc;
+
+ if (!preamble_next(ndd, &nsindex, &free, &nslot))
+ return -ENXIO;
+
+ /* allocate and write the label to the staging (next) index */
+ slot = nd_label_alloc_slot(ndd);
+ if (slot == UINT_MAX)
+ return -ENXIO;
+ dev_dbg(ndd->dev, "%s: allocated: %d\n", __func__, slot);
+
+ nd_label = nd_label_base(ndd) + slot;
+ memset(nd_label, 0, sizeof(struct nd_namespace_label));
+ memcpy(nd_label->uuid, nspm->uuid, NSLABEL_UUID_LEN);
+ if (nspm->alt_name)
+ memcpy(nd_label->name, nspm->alt_name, NSLABEL_NAME_LEN);
+ nd_label->flags = __cpu_to_le32(NSLABEL_FLAG_UPDATING);
+ nd_label->nlabel = __cpu_to_le16(nd_region->ndr_mappings);
+ nd_label->position = __cpu_to_le16(pos);
+ nd_label->isetcookie = __cpu_to_le64(cookie);
+ rawsize = div_u64(resource_size(&nspm->nsio.res),
+ nd_region->ndr_mappings);
+ nd_label->rawsize = __cpu_to_le64(rawsize);
+ nd_label->dpa = __cpu_to_le64(nd_mapping->start);
+ nd_label->slot = __cpu_to_le32(slot);
+
+ /* update label */
+ offset = nd_label_offset(ndd, nd_label);
+ rc = nvdimm_set_config_data(ndd, offset, nd_label,
+ sizeof(struct nd_namespace_label));
+ if (rc < 0)
+ return rc;
+
+ /* Garbage collect the previous label */
+ victim_label = nd_mapping->labels[0];
+ if (victim_label) {
+ slot = to_slot(ndd, victim_label);
+ nd_label_free_slot(ndd, slot);
+ dev_dbg(ndd->dev, "%s: free: %d\n", __func__, slot);
+ }
+
+ /* update index */
+ rc = nd_label_write_index(ndd, ndd->ns_next,
+ nd_inc_seq(__le32_to_cpu(nsindex->seq)), 0);
+ if (rc < 0)
+ return rc;
+
+ nd_mapping->labels[0] = nd_label;
+
+ return 0;
+}
+
+static int init_labels(struct nd_mapping *nd_mapping)
+{
+ int i;
+ struct nd_namespace_index *nsindex;
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+
+ if (!nd_mapping->labels)
+ nd_mapping->labels = kcalloc(2, sizeof(void *), GFP_KERNEL);
+
+ if (!nd_mapping->labels)
+ return -ENOMEM;
+
+ if (ndd->ns_current == -1 || ndd->ns_next == -1)
+ /* pass */;
+ else
+ return 0;
+
+ nsindex = to_namespace_index(ndd, 0);
+ memset(nsindex, 0, ndd->nsarea.config_size);
+ for (i = 0; i < 2; i++) {
+ int rc = nd_label_write_index(ndd, i, i*2, ND_NSINDEX_INIT);
+
+ if (rc)
+ return rc;
+ }
+ ndd->ns_next = 1;
+ ndd->ns_current = 0;
+
+ return 0;
+}
+
+static int del_labels(struct nd_mapping *nd_mapping, u8 *uuid)
+{
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ struct nd_namespace_label *nd_label;
+ struct nd_namespace_index *nsindex;
+ u8 label_uuid[NSLABEL_UUID_LEN];
+ int l, num_freed = 0;
+ unsigned long *free;
+ u32 nslot, slot;
+
+ if (!uuid)
+ return 0;
+
+ /* no index || no labels == nothing to delete */
+ if (!preamble_next(ndd, &nsindex, &free, &nslot)
+ || !nd_mapping->labels)
+ return 0;
+
+ for_each_label(l, nd_label, nd_mapping->labels) {
+ int j;
+
+ memcpy(label_uuid, nd_label->uuid, NSLABEL_UUID_LEN);
+ if (memcmp(label_uuid, uuid, NSLABEL_UUID_LEN) != 0)
+ continue;
+ slot = to_slot(ndd, nd_label);
+ nd_label_free_slot(ndd, slot);
+ dev_dbg(ndd->dev, "%s: free: %d\n", __func__, slot);
+ for (j = l; nd_mapping->labels[j + 1]; j++) {
+ struct nd_namespace_label *next_label;
+
+ next_label = nd_mapping->labels[j + 1];
+ nd_mapping->labels[j] = next_label;
+ }
+ nd_mapping->labels[j] = NULL;
+ num_freed++;
+ }
+
+ if (num_freed > l) {
+ /*
+ * num_freed will only ever be > l when we delete the last
+ * label
+ */
+ kfree(nd_mapping->labels);
+ nd_mapping->labels = NULL;
+ dev_dbg(ndd->dev, "%s: no more labels\n", __func__);
+ }
+
+ return nd_label_write_index(ndd, ndd->ns_next,
+ nd_inc_seq(__le32_to_cpu(nsindex->seq)), 0);
+}
+
+int nd_pmem_namespace_label_update(struct nd_region *nd_region,
+ struct nd_namespace_pmem *nspm, resource_size_t size)
+{
+ int i;
+
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ int rc;
+
+ if (size == 0) {
+ rc = del_labels(nd_mapping, nspm->uuid);
+ if (rc)
+ return rc;
+ continue;
+ }
+
+ rc = init_labels(nd_mapping);
+ if (rc)
+ return rc;
+
+ rc = __pmem_label_update(nd_region, nd_mapping, nspm, i);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
diff --git a/drivers/nvdimm/label.h b/drivers/nvdimm/label.h
index 8ee1376526c7..6d376be31937 100644
--- a/drivers/nvdimm/label.h
+++ b/drivers/nvdimm/label.h
@@ -34,6 +34,7 @@ enum {
BTTINFO_MAJOR_VERSION = 1,
ND_LABEL_MIN_SIZE = 512 * 129, /* see sizeof_namespace_index() */
ND_LABEL_ID_SIZE = 50,
+ ND_NSINDEX_INIT = 0x1,
};

static const char NSINDEX_SIGNATURE[] = "NAMESPACE_INDEX\0";
@@ -127,4 +128,9 @@ void nd_label_copy(struct nvdimm_drvdata *ndd, struct nd_namespace_index *dst,
size_t sizeof_namespace_index(struct nvdimm_drvdata *ndd);
int nd_label_active_count(struct nvdimm_drvdata *ndd);
struct nd_namespace_label *nd_label_active(struct nvdimm_drvdata *ndd, int n);
+u32 nd_label_nfree(struct nvdimm_drvdata *ndd);
+struct nd_region;
+struct nd_namespace_pmem;
+int nd_pmem_namespace_label_update(struct nd_region *nd_region,
+ struct nd_namespace_pmem *nspm, resource_size_t size);
#endif /* __LABEL_H__ */
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index ad0ec09ca40f..546e77e122fd 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -149,20 +149,53 @@ static resource_size_t nd_namespace_blk_size(struct nd_namespace_blk *nsblk)
return size;
}

+static int nd_namespace_label_update(struct nd_region *nd_region,
+ struct device *dev)
+{
+ dev_WARN_ONCE(dev, dev->driver,
+ "namespace must be idle during label update\n");
+ if (dev->driver)
+ return 0;
+
+ /*
+ * Only allow label writes that will result in a valid namespace
+ * or deletion of an existing namespace.
+ */
+ if (is_namespace_pmem(dev)) {
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+ struct resource *res = &nspm->nsio.res;
+ resource_size_t size = resource_size(res);
+
+ if (size == 0 && nspm->uuid)
+ /* delete allocation */;
+ else if (!nspm->uuid)
+ return 0;
+
+ return nd_pmem_namespace_label_update(nd_region, nspm, size);
+ } else if (is_namespace_blk(dev)) {
+ /* TODO: implement blk labels */
+ return 0;
+ } else
+ return -ENXIO;
+}
+
static ssize_t alt_name_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t len)
{
+ struct nd_region *nd_region = to_nd_region(dev->parent);
ssize_t rc;

device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
rc = __alt_name_store(dev, buf, len);
+ if (rc >= 0)
+ rc = nd_namespace_label_update(nd_region, dev);
dev_dbg(dev, "%s: %s(%zd)\n", __func__, rc < 0 ? "fail " : "", rc);
nvdimm_bus_unlock(dev);
device_unlock(dev);

- return rc;
+ return rc < 0 ? rc : len;
}

static ssize_t alt_name_show(struct device *dev,
@@ -709,6 +742,7 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)
static ssize_t size_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t len)
{
+ struct nd_region *nd_region = to_nd_region(dev->parent);
unsigned long long val;
u8 **uuid = NULL;
int rc;
@@ -721,6 +755,8 @@ static ssize_t size_store(struct device *dev,
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
rc = __size_store(dev, val);
+ if (rc >= 0)
+ rc = nd_namespace_label_update(nd_region, dev);

if (is_namespace_pmem(dev)) {
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
@@ -744,7 +780,7 @@ static ssize_t size_store(struct device *dev,
nvdimm_bus_unlock(dev);
device_unlock(dev);

- return rc ? rc : len;
+ return rc < 0 ? rc : len;
}

static ssize_t size_show(struct device *dev,
@@ -804,17 +840,34 @@ static int namespace_update_uuid(struct nd_region *nd_region,
u32 flags = is_namespace_blk(dev) ? NSLABEL_FLAG_LOCAL : 0;
struct nd_label_id old_label_id;
struct nd_label_id new_label_id;
- int i, rc;
+ int i;

- rc = nd_is_uuid_unique(dev, new_uuid) ? 0 : -EINVAL;
- if (rc) {
- kfree(new_uuid);
- return rc;
- }
+ if (!nd_is_uuid_unique(dev, new_uuid))
+ return -EINVAL;

if (*old_uuid == NULL)
goto out;

+ /*
+ * If we've already written a label with this uuid, then it's
+ * too late to rename because we can't reliably update the uuid
+ * without losing the old namespace. Userspace must delete this
+ * namespace to abandon the old uuid.
+ */
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+
+ /*
+ * This check by itself is sufficient because old_uuid
+ * would be NULL above if this uuid did not exist in the
+ * currently written set.
+ *
+ * FIXME: can we delete uuid with zero dpa allocated?
+ */
+ if (nd_mapping->labels)
+ return -EBUSY;
+ }
+
nd_label_gen_id(&old_label_id, *old_uuid, flags);
nd_label_gen_id(&new_label_id, new_uuid, flags);
for (i = 0; i < nd_region->ndr_mappings; i++) {
@@ -858,12 +911,16 @@ static ssize_t uuid_store(struct device *dev,
rc = nd_uuid_store(dev, &uuid, buf, len);
if (rc >= 0)
rc = namespace_update_uuid(nd_region, dev, uuid, ns_uuid);
+ if (rc >= 0)
+ rc = nd_namespace_label_update(nd_region, dev);
+ else
+ kfree(uuid);
dev_dbg(dev, "%s: result: %zd wrote: %s%s", __func__,
rc, buf, buf[len - 1] == '\n' ? "" : "\n");
nvdimm_bus_unlock(dev);
device_unlock(dev);

- return rc ? rc : len;
+ return rc < 0 ? rc : len;
}
static DEVICE_ATTR_RW(uuid);

@@ -907,6 +964,7 @@ static ssize_t sector_size_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t len)
{
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
+ struct nd_region *nd_region = to_nd_region(dev->parent);
ssize_t rc;

if (!is_namespace_blk(dev))
@@ -916,8 +974,11 @@ static ssize_t sector_size_store(struct device *dev,
nvdimm_bus_lock(dev);
rc = nd_sector_size_store(dev, buf, &nsblk->lbasize,
ns_lbasize_supported);
- dev_dbg(dev, "%s: result: %zd wrote: %s%s", __func__,
- rc, buf, buf[len - 1] == '\n' ? "" : "\n");
+ if (rc >= 0)
+ rc = nd_namespace_label_update(nd_region, dev);
+ dev_dbg(dev, "%s: result: %zd %s: %s%s", __func__,
+ rc, rc < 0 ? "tried" : "wrote", buf,
+ buf[len - 1] == '\n' ? "" : "\n");
nvdimm_bus_unlock(dev);
device_unlock(dev);

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 9b021b626202..bfa849617358 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -93,6 +93,7 @@ static inline unsigned nd_inc_seq(unsigned seq)

return next[seq & 3];
}
+
enum nd_async_mode {
ND_SYNC,
ND_ASYNC,
@@ -115,6 +116,8 @@ struct nvdimm;
struct nvdimm_drvdata *to_ndd(struct nd_mapping *nd_mapping);
int nvdimm_init_nsarea(struct nvdimm_drvdata *ndd);
int nvdimm_init_config_data(struct nvdimm_drvdata *ndd);
+int nvdimm_set_config_data(struct nvdimm_drvdata *ndd, size_t offset,
+ void *buf, size_t len);
struct nd_region *to_nd_region(struct device *dev);
int nd_region_to_nstype(struct nd_region *nd_region);
int nd_region_register_namespaces(struct nd_region *nd_region, int *err);

2015-06-17 23:17:37

by Dan Williams

[permalink] [raw]
Subject: [PATCH v7 16/16] libnvdimm: write blk label set

After 'uuid', 'size', 'sector_size', and optionally 'alt_name' have been
set to valid values the labels on the dimm can be updated. The
difference with the pmem case is that blk namespaces are limited to one
dimm and can cover discontiguous ranges in dpa space.

Also, after allocating label slots, it is useful for userspace to know
how many slots are left. Export this information in sysfs.

Cc: Greg KH <[email protected]>
Cc: Neil Brown <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/bus.c | 4 +
drivers/nvdimm/dimm_devs.c | 25 +++
drivers/nvdimm/label.c | 301 ++++++++++++++++++++++++++++++++++++---
drivers/nvdimm/label.h | 5 +
drivers/nvdimm/namespace_devs.c | 57 +++++++
drivers/nvdimm/nd-core.h | 1
6 files changed, 369 insertions(+), 24 deletions(-)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index fddc3f2a8f80..ca802702440e 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -155,6 +155,10 @@ static void nd_async_device_unregister(void *d, async_cookie_t cookie)
{
struct device *dev = d;

+ /* flush bus operations before delete */
+ nvdimm_bus_lock(dev);
+ nvdimm_bus_unlock(dev);
+
device_unregister(dev);
put_device(dev);
}
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index 156d518a089c..83b179ed6d61 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -19,6 +19,7 @@
#include <linux/fs.h>
#include <linux/mm.h>
#include "nd-core.h"
+#include "label.h"
#include "nd.h"

static DEFINE_IDA(dimm_ida);
@@ -296,9 +297,33 @@ static ssize_t state_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(state);

+static ssize_t available_slots_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm_drvdata *ndd = dev_get_drvdata(dev);
+ ssize_t rc;
+ u32 nfree;
+
+ if (!ndd)
+ return -ENXIO;
+
+ nvdimm_bus_lock(dev);
+ nfree = nd_label_nfree(ndd);
+ if (nfree - 1 > nfree) {
+ dev_WARN_ONCE(dev, 1, "we ate our last label?\n");
+ nfree = 0;
+ } else
+ nfree--;
+ rc = sprintf(buf, "%d\n", nfree);
+ nvdimm_bus_unlock(dev);
+ return rc;
+}
+static DEVICE_ATTR_RO(available_slots);
+
static struct attribute *nvdimm_attributes[] = {
&dev_attr_state.attr,
&dev_attr_commands.attr,
+ &dev_attr_available_slots.attr,
NULL,
};

diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
index ffa85d700459..34148003fc73 100644
--- a/drivers/nvdimm/label.c
+++ b/drivers/nvdimm/label.c
@@ -56,7 +56,7 @@ size_t sizeof_namespace_index(struct nvdimm_drvdata *ndd)
return ndd->nsindex_size;
}

-static int nvdimm_num_label_slots(struct nvdimm_drvdata *ndd)
+int nvdimm_num_label_slots(struct nvdimm_drvdata *ndd)
{
return ndd->nsarea.config_size / 129;
}
@@ -371,7 +371,7 @@ struct nd_namespace_label *nd_label_active(struct nvdimm_drvdata *ndd, int n)
return NULL;
}

-static u32 nd_label_alloc_slot(struct nvdimm_drvdata *ndd)
+u32 nd_label_alloc_slot(struct nvdimm_drvdata *ndd)
{
struct nd_namespace_index *nsindex;
unsigned long *free;
@@ -391,7 +391,7 @@ static u32 nd_label_alloc_slot(struct nvdimm_drvdata *ndd)
return slot;
}

-static bool nd_label_free_slot(struct nvdimm_drvdata *ndd, u32 slot)
+bool nd_label_free_slot(struct nvdimm_drvdata *ndd, u32 slot)
{
struct nd_namespace_index *nsindex;
unsigned long *free;
@@ -416,7 +416,7 @@ u32 nd_label_nfree(struct nvdimm_drvdata *ndd)
WARN_ON(!is_nvdimm_bus_locked(ndd->dev));

if (!preamble_next(ndd, &nsindex, &free, &nslot))
- return 0;
+ return nvdimm_num_label_slots(ndd);

return bitmap_weight(free, nslot);
}
@@ -554,22 +554,270 @@ static int __pmem_label_update(struct nd_region *nd_region,
return 0;
}

-static int init_labels(struct nd_mapping *nd_mapping)
+static void del_label(struct nd_mapping *nd_mapping, int l)
+{
+ struct nd_namespace_label *next_label, *nd_label;
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ unsigned int slot;
+ int j;
+
+ nd_label = nd_mapping->labels[l];
+ slot = to_slot(ndd, nd_label);
+ dev_vdbg(ndd->dev, "%s: clear: %d\n", __func__, slot);
+
+ for (j = l; (next_label = nd_mapping->labels[j + 1]); j++)
+ nd_mapping->labels[j] = next_label;
+ nd_mapping->labels[j] = NULL;
+}
+
+static bool is_old_resource(struct resource *res, struct resource **list, int n)
{
int i;
+
+ if (res->flags & DPA_RESOURCE_ADJUSTED)
+ return false;
+ for (i = 0; i < n; i++)
+ if (res == list[i])
+ return true;
+ return false;
+}
+
+static struct resource *to_resource(struct nvdimm_drvdata *ndd,
+ struct nd_namespace_label *nd_label)
+{
+ struct resource *res;
+
+ for_each_dpa_resource(ndd, res) {
+ if (res->start != __le64_to_cpu(nd_label->dpa))
+ continue;
+ if (resource_size(res) != __le64_to_cpu(nd_label->rawsize))
+ continue;
+ return res;
+ }
+
+ return NULL;
+}
+
+/*
+ * 1/ Account all the labels that can be freed after this update
+ * 2/ Allocate and write the label to the staging (next) index
+ * 3/ Record the resources in the namespace device
+ */
+static int __blk_label_update(struct nd_region *nd_region,
+ struct nd_mapping *nd_mapping, struct nd_namespace_blk *nsblk,
+ int num_labels)
+{
+ int i, l, alloc, victims, nfree, old_num_resources, nlabel, rc = -ENXIO;
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ struct nd_namespace_label *nd_label;
+ struct nd_namespace_index *nsindex;
+ unsigned long *free, *victim_map = NULL;
+ struct resource *res, **old_res_list;
+ struct nd_label_id label_id;
+ u8 uuid[NSLABEL_UUID_LEN];
+ u32 nslot, slot;
+
+ if (!preamble_next(ndd, &nsindex, &free, &nslot))
+ return -ENXIO;
+
+ old_res_list = nsblk->res;
+ nfree = nd_label_nfree(ndd);
+ old_num_resources = nsblk->num_resources;
+ nd_label_gen_id(&label_id, nsblk->uuid, NSLABEL_FLAG_LOCAL);
+
+ /*
+ * We need to loop over the old resources a few times, which seems a
+ * bit inefficient, but we need to know that we have the label
+ * space before we start mutating the tracking structures.
+ * Otherwise the recovery method of last resort for userspace is
+ * disable and re-enable the parent region.
+ */
+ alloc = 0;
+ for_each_dpa_resource(ndd, res) {
+ if (strcmp(res->name, label_id.id) != 0)
+ continue;
+ if (!is_old_resource(res, old_res_list, old_num_resources))
+ alloc++;
+ }
+
+ victims = 0;
+ if (old_num_resources) {
+ /* convert old local-label-map to dimm-slot victim-map */
+ victim_map = kcalloc(BITS_TO_LONGS(nslot), sizeof(long),
+ GFP_KERNEL);
+ if (!victim_map)
+ return -ENOMEM;
+
+ /* mark unused labels for garbage collection */
+ for_each_clear_bit_le(slot, free, nslot) {
+ nd_label = nd_label_base(ndd) + slot;
+ memcpy(uuid, nd_label->uuid, NSLABEL_UUID_LEN);
+ if (memcmp(uuid, nsblk->uuid, NSLABEL_UUID_LEN) != 0)
+ continue;
+ res = to_resource(ndd, nd_label);
+ if (res && is_old_resource(res, old_res_list,
+ old_num_resources))
+ continue;
+ slot = to_slot(ndd, nd_label);
+ set_bit(slot, victim_map);
+ victims++;
+ }
+ }
+
+ /* don't allow updates that consume the last label */
+ if (nfree - alloc < 0 || nfree - alloc + victims < 1) {
+ dev_info(&nsblk->dev, "insufficient label space\n");
+ kfree(victim_map);
+ return -ENOSPC;
+ }
+ /* from here on we need to abort on error */
+
+
+ /* assign all resources to the namespace before writing the labels */
+ nsblk->res = NULL;
+ nsblk->num_resources = 0;
+ for_each_dpa_resource(ndd, res) {
+ if (strcmp(res->name, label_id.id) != 0)
+ continue;
+ if (!nsblk_add_resource(nd_region, ndd, nsblk, res->start)) {
+ rc = -ENOMEM;
+ goto abort;
+ }
+ }
+
+ for (i = 0; i < nsblk->num_resources; i++) {
+ size_t offset;
+
+ res = nsblk->res[i];
+ if (is_old_resource(res, old_res_list, old_num_resources))
+ continue; /* carry-over */
+ slot = nd_label_alloc_slot(ndd);
+ if (slot == UINT_MAX)
+ goto abort;
+ dev_dbg(ndd->dev, "%s: allocated: %d\n", __func__, slot);
+
+ nd_label = nd_label_base(ndd) + slot;
+ memset(nd_label, 0, sizeof(struct nd_namespace_label));
+ memcpy(nd_label->uuid, nsblk->uuid, NSLABEL_UUID_LEN);
+ if (nsblk->alt_name)
+ memcpy(nd_label->name, nsblk->alt_name,
+ NSLABEL_NAME_LEN);
+ nd_label->flags = __cpu_to_le32(NSLABEL_FLAG_LOCAL);
+ nd_label->nlabel = __cpu_to_le16(0); /* N/A */
+ nd_label->position = __cpu_to_le16(0); /* N/A */
+ nd_label->isetcookie = __cpu_to_le64(0); /* N/A */
+ nd_label->dpa = __cpu_to_le64(res->start);
+ nd_label->rawsize = __cpu_to_le64(resource_size(res));
+ nd_label->lbasize = __cpu_to_le64(nsblk->lbasize);
+ nd_label->slot = __cpu_to_le32(slot);
+
+ /* update label */
+ offset = nd_label_offset(ndd, nd_label);
+ rc = nvdimm_set_config_data(ndd, offset, nd_label,
+ sizeof(struct nd_namespace_label));
+ if (rc < 0)
+ goto abort;
+ }
+
+ /* free up now unused slots in the new index */
+ for_each_set_bit(slot, victim_map, victim_map ? nslot : 0) {
+ dev_dbg(ndd->dev, "%s: free: %d\n", __func__, slot);
+ nd_label_free_slot(ndd, slot);
+ }
+
+ /* update index */
+ rc = nd_label_write_index(ndd, ndd->ns_next,
+ nd_inc_seq(__le32_to_cpu(nsindex->seq)), 0);
+ if (rc)
+ goto abort;
+
+ /*
+ * Now that the on-dimm labels are up to date, fix up the tracking
+ * entries in nd_mapping->labels
+ */
+ nlabel = 0;
+ for_each_label(l, nd_label, nd_mapping->labels) {
+ nlabel++;
+ memcpy(uuid, nd_label->uuid, NSLABEL_UUID_LEN);
+ if (memcmp(uuid, nsblk->uuid, NSLABEL_UUID_LEN) != 0)
+ continue;
+ nlabel--;
+ del_label(nd_mapping, l);
+ l--; /* retry with the new label at this index */
+ }
+ if (nlabel + nsblk->num_resources > num_labels) {
+ /*
+ * Bug, we can't end up with more resources than
+ * available labels
+ */
+ WARN_ON_ONCE(1);
+ rc = -ENXIO;
+ goto out;
+ }
+
+ for_each_clear_bit_le(slot, free, nslot) {
+ nd_label = nd_label_base(ndd) + slot;
+ memcpy(uuid, nd_label->uuid, NSLABEL_UUID_LEN);
+ if (memcmp(uuid, nsblk->uuid, NSLABEL_UUID_LEN) != 0)
+ continue;
+ res = to_resource(ndd, nd_label);
+ res->flags &= ~DPA_RESOURCE_ADJUSTED;
+ dev_vdbg(&nsblk->dev, "assign label[%d] slot: %d\n", l, slot);
+ nd_mapping->labels[l++] = nd_label;
+ }
+ nd_mapping->labels[l] = NULL;
+
+ out:
+ kfree(old_res_list);
+ kfree(victim_map);
+ return rc;
+
+ abort:
+ /*
+ * 1/ repair the allocated label bitmap in the index
+ * 2/ restore the resource list
+ */
+ nd_label_copy(ndd, nsindex, to_current_namespace_index(ndd));
+ kfree(nsblk->res);
+ nsblk->res = old_res_list;
+ nsblk->num_resources = old_num_resources;
+ old_res_list = NULL;
+ goto out;
+}
+
+static int init_labels(struct nd_mapping *nd_mapping, int num_labels)
+{
+ int i, l, old_num_labels = 0;
struct nd_namespace_index *nsindex;
+ struct nd_namespace_label *nd_label;
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ size_t size = (num_labels + 1) * sizeof(struct nd_namespace_label *);

- if (!nd_mapping->labels)
- nd_mapping->labels = kcalloc(2, sizeof(void *), GFP_KERNEL);
+ for_each_label(l, nd_label, nd_mapping->labels)
+ old_num_labels++;

+ /*
+ * We need to preserve all the old labels for the mapping so
+ * they can be garbage collected after writing the new labels.
+ */
+ if (num_labels > old_num_labels) {
+ struct nd_namespace_label **labels;
+
+ labels = krealloc(nd_mapping->labels, size, GFP_KERNEL);
+ if (!labels)
+ return -ENOMEM;
+ nd_mapping->labels = labels;
+ }
if (!nd_mapping->labels)
return -ENOMEM;

+ for (i = old_num_labels; i <= num_labels; i++)
+ nd_mapping->labels[i] = NULL;
+
if (ndd->ns_current == -1 || ndd->ns_next == -1)
/* pass */;
else
- return 0;
+ return max(num_labels, old_num_labels);

nsindex = to_namespace_index(ndd, 0);
memset(nsindex, 0, ndd->nsarea.config_size);
@@ -582,7 +830,7 @@ static int init_labels(struct nd_mapping *nd_mapping)
ndd->ns_next = 1;
ndd->ns_current = 0;

- return 0;
+ return max(num_labels, old_num_labels);
}

static int del_labels(struct nd_mapping *nd_mapping, u8 *uuid)
@@ -604,22 +852,15 @@ static int del_labels(struct nd_mapping *nd_mapping, u8 *uuid)
return 0;

for_each_label(l, nd_label, nd_mapping->labels) {
- int j;
-
memcpy(label_uuid, nd_label->uuid, NSLABEL_UUID_LEN);
if (memcmp(label_uuid, uuid, NSLABEL_UUID_LEN) != 0)
continue;
slot = to_slot(ndd, nd_label);
nd_label_free_slot(ndd, slot);
dev_dbg(ndd->dev, "%s: free: %d\n", __func__, slot);
- for (j = l; nd_mapping->labels[j + 1]; j++) {
- struct nd_namespace_label *next_label;
-
- next_label = nd_mapping->labels[j + 1];
- nd_mapping->labels[j] = next_label;
- }
- nd_mapping->labels[j] = NULL;
+ del_label(nd_mapping, l);
num_freed++;
+ l--; /* retry with new label at this index */
}

if (num_freed > l) {
@@ -652,8 +893,8 @@ int nd_pmem_namespace_label_update(struct nd_region *nd_region,
continue;
}

- rc = init_labels(nd_mapping);
- if (rc)
+ rc = init_labels(nd_mapping, 1);
+ if (rc < 0)
return rc;

rc = __pmem_label_update(nd_region, nd_mapping, nspm, i);
@@ -663,3 +904,23 @@ int nd_pmem_namespace_label_update(struct nd_region *nd_region,

return 0;
}
+
+int nd_blk_namespace_label_update(struct nd_region *nd_region,
+ struct nd_namespace_blk *nsblk, resource_size_t size)
+{
+ struct nd_mapping *nd_mapping = &nd_region->mapping[0];
+ struct resource *res;
+ int count = 0;
+
+ if (size == 0)
+ return del_labels(nd_mapping, nsblk->uuid);
+
+ for_each_dpa_resource(to_ndd(nd_mapping), res)
+ count++;
+
+ count = init_labels(nd_mapping, count);
+ if (count < 0)
+ return count;
+
+ return __blk_label_update(nd_region, nd_mapping, nsblk, count);
+}
diff --git a/drivers/nvdimm/label.h b/drivers/nvdimm/label.h
index 6d376be31937..a59ef6eef2a3 100644
--- a/drivers/nvdimm/label.h
+++ b/drivers/nvdimm/label.h
@@ -128,9 +128,14 @@ void nd_label_copy(struct nvdimm_drvdata *ndd, struct nd_namespace_index *dst,
size_t sizeof_namespace_index(struct nvdimm_drvdata *ndd);
int nd_label_active_count(struct nvdimm_drvdata *ndd);
struct nd_namespace_label *nd_label_active(struct nvdimm_drvdata *ndd, int n);
+u32 nd_label_alloc_slot(struct nvdimm_drvdata *ndd);
+bool nd_label_free_slot(struct nvdimm_drvdata *ndd, u32 slot);
u32 nd_label_nfree(struct nvdimm_drvdata *ndd);
struct nd_region;
struct nd_namespace_pmem;
+struct nd_namespace_blk;
int nd_pmem_namespace_label_update(struct nd_region *nd_region,
struct nd_namespace_pmem *nspm, resource_size_t size);
+int nd_blk_namespace_label_update(struct nd_region *nd_region,
+ struct nd_namespace_blk *nsblk, resource_size_t size);
#endif /* __LABEL_H__ */
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 546e77e122fd..50b502b1908e 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -163,8 +163,7 @@ static int nd_namespace_label_update(struct nd_region *nd_region,
*/
if (is_namespace_pmem(dev)) {
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
- struct resource *res = &nspm->nsio.res;
- resource_size_t size = resource_size(res);
+ resource_size_t size = resource_size(&nspm->nsio.res);

if (size == 0 && nspm->uuid)
/* delete allocation */;
@@ -173,8 +172,15 @@ static int nd_namespace_label_update(struct nd_region *nd_region,

return nd_pmem_namespace_label_update(nd_region, nspm, size);
} else if (is_namespace_blk(dev)) {
- /* TODO: implement blk labels */
- return 0;
+ struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
+ resource_size_t size = nd_namespace_blk_size(nsblk);
+
+ if (size == 0 && nsblk->uuid)
+ /* delete allocation */;
+ else if (!nsblk->uuid || !nsblk->lbasize)
+ return 0;
+
+ return nd_blk_namespace_label_update(nd_region, nsblk, size);
} else
return -ENXIO;
}
@@ -986,6 +992,48 @@ static ssize_t sector_size_store(struct device *dev,
}
static DEVICE_ATTR_RW(sector_size);

+static ssize_t dpa_extents_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region *nd_region = to_nd_region(dev->parent);
+ struct nd_label_id label_id;
+ int count = 0, i;
+ u8 *uuid = NULL;
+ u32 flags = 0;
+
+ nvdimm_bus_lock(dev);
+ if (is_namespace_pmem(dev)) {
+ struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+
+ uuid = nspm->uuid;
+ flags = 0;
+ } else if (is_namespace_blk(dev)) {
+ struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
+
+ uuid = nsblk->uuid;
+ flags = NSLABEL_FLAG_LOCAL;
+ }
+
+ if (!uuid)
+ goto out;
+
+ nd_label_gen_id(&label_id, uuid, flags);
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ struct resource *res;
+
+ for_each_dpa_resource(ndd, res)
+ if (strcmp(res->name, label_id.id) == 0)
+ count++;
+ }
+ out:
+ nvdimm_bus_unlock(dev);
+
+ return sprintf(buf, "%d\n", count);
+}
+static DEVICE_ATTR_RO(dpa_extents);
+
static struct attribute *nd_namespace_attributes[] = {
&dev_attr_nstype.attr,
&dev_attr_size.attr,
@@ -993,6 +1041,7 @@ static struct attribute *nd_namespace_attributes[] = {
&dev_attr_resource.attr,
&dev_attr_alt_name.attr,
&dev_attr_sector_size.attr,
+ &dev_attr_dpa_extents.attr,
NULL,
};

diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 22489555a6f1..78d6c51f4bac 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -75,5 +75,6 @@ struct nd_mapping;
struct resource *nsblk_add_resource(struct nd_region *nd_region,
struct nvdimm_drvdata *ndd, struct nd_namespace_blk *nsblk,
resource_size_t start);
+int nvdimm_num_label_slots(struct nvdimm_drvdata *ndd);
void get_ndd(struct nvdimm_drvdata *ndd);
#endif /* __ND_CORE_H__ */

2015-06-20 12:40:38

by Stephen Rothwell

[permalink] [raw]
Subject: Re: [PATCH v7 00/16] libnvdimm: non-volatile memory devices

Hi Dan,

On Wed, 17 Jun 2015 19:13:19 -0400 Dan Williams <[email protected]> wrote:
>
> A new sub-system in support of non-volatile memory storage devices.
>
> Stephen, please add libnvdimm-for-next to -next:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm libnvdimm-for-next

Thanks for adding your subsystem tree as a participant of linux-next. As
you may know, this is not a judgment of your code. The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window.

You will need to ensure that the patches/commits in your tree/series have
been:
* submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
* posted to the relevant mailing list,
* reviewed by you (or another maintainer of your subsystem tree),
* successfully unit tested, and
* destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch). It is allowed to be rebased if you deem it necessary.

--
Cheers,
Stephen Rothwell
[email protected]


Attachments:
(No filename) (819.00 B)
OpenPGP digital signature

2015-06-25 00:59:46

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v7 00/16] libnvdimm: non-volatile memory devices

On Wed, 2015-06-17 at 19:13 -0400, Dan Williams wrote:
> A new sub-system in support of non-volatile memory storage devices.
>
> Stephen, please add libnvdimm-for-next to -next:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm libnvdimm-for-next
>
> Changes since v6 [1]:
>
> 1/ Deferred the patches dependent on ->rw_bytes() (BTT - stacked block
> driver, BLK - mmio aperture windows driver, NFIT_TEST - unit test
> infrastructure for all libnvdimm + nfit components) to their own
> patchset. Make the ->rw_bytes() implementation the first patch in
> that series (Christoph)
>
> 2/ Collected acks from Christoph and Rafael!
>
> 3/ Add a HAS_IOMEM dependency to CONFIG_BLK_DEV_PMEM following commit
> b6f2098fb708 "block: pmem: Add dependency on HAS_IOMEM" in 4.1-rc8.
>
> 4/ Move libnvdimm to subsys_initcall() and move arch/x86/kernel/pmem.c
> back to device_initcall(). This allows ACPI_NFIT to be built-in.
> (Linda)
>
> 5/ Drop the ACPI_DRIVER_ALL_NOTIFY_EVENTS flag in the nfit driver.
> (Rafael)
>
> 6/ Reference count the nvdimm_drvdata object. This fixes a bug that was
> found when the unit tests were extended to test disabling an nvdimm
> while a region device still had references to label data.
>
:
> Dan Williams (16):
> e820, efi: add ACPI 6.0 persistent memory types
> libnvdimm, nfit: initial libnvdimm infrastructure and NFIT support
> libnvdimm: control character device and nvdimm_bus sysfs attributes
> libnvdimm, nfit: dimm/memory-devices
> libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices
> libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure
> libnvdimm, nfit: regions (block-data-window, persistent memory, volatile memory)
> libnvdimm: support for legacy (non-aliasing) nvdimms
> libnvdimm, pmem: move pmem to drivers/nvdimm/
> libnvdimm, pmem: add libnvdimm support to the pmem driver
> libnvdimm, nfit: add interleave-set state-tracking infrastructure
> libnvdimm: namespace indices: read and validate
> libnvdimm: pmem label sets and namespace instantiation.
> libnvdimm: blk labels and namespace instantiation
> libnvdimm: write pmem label set
> libnvdimm: write blk label set

We have been successfully running this patchset on our NFIT-enabled
prototype systems with pmem. (Intel example _DSM, label, blk are not
available for testing.)

So for patch 1/16 to 4/15, and 6/16 to 10/16.

Tested-by: Toshi Kani <[email protected]>

Thanks,
-Toshi

2015-07-01 13:13:10

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v7 09/16] libnvdimm, pmem: move pmem to drivers/nvdimm/

Hi Dan,

On Thu, Jun 18, 2015 at 1:14 AM, Dan Williams <[email protected]> wrote:
> Prepare the pmem driver to consume PMEM namespaces emitted by regions of
> an nvdimm_bus instance. No functional change.

As LIBNVDIMM depends on PHYS_ADDR_T_64BIT, the driver can no
longer be enabled on pure 32-bit platforms. Is that intended?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2015-07-01 16:41:45

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v7 09/16] libnvdimm, pmem: move pmem to drivers/nvdimm/

On Wed, Jul 1, 2015 at 6:13 AM, Geert Uytterhoeven <[email protected]> wrote:
> Hi Dan,
>
> On Thu, Jun 18, 2015 at 1:14 AM, Dan Williams <[email protected]> wrote:
>> Prepare the pmem driver to consume PMEM namespaces emitted by regions of
>> an nvdimm_bus instance. No functional change.
>
> As LIBNVDIMM depends on PHYS_ADDR_T_64BIT, the driver can no
> longer be enabled on pure 32-bit platforms. Is that intended?

Yes it was intentional. It still allows 32-bit platforms with 64-bit
resource_size_t to compile. Do you otherwise have a 32-bit use case
for pmem? I'm of course open to working through the changes to add
wider architecture support.

2015-07-01 16:49:43

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v7 09/16] libnvdimm, pmem: move pmem to drivers/nvdimm/

On Wed, Jul 1, 2015 at 6:41 PM, Dan Williams <[email protected]> wrote:
> On Wed, Jul 1, 2015 at 6:13 AM, Geert Uytterhoeven <[email protected]> wrote:
>> On Thu, Jun 18, 2015 at 1:14 AM, Dan Williams <[email protected]> wrote:
>>> Prepare the pmem driver to consume PMEM namespaces emitted by regions of
>>> an nvdimm_bus instance. No functional change.
>>
>> As LIBNVDIMM depends on PHYS_ADDR_T_64BIT, the driver can no
>> longer be enabled on pure 32-bit platforms. Is that intended?
>
> Yes it was intentional. It still allows 32-bit platforms with 64-bit
> resource_size_t to compile. Do you otherwise have a 32-bit use case
> for pmem? I'm of course open to working through the changes to add
> wider architecture support.

Nope, just wondering, as the original didn't depend on PHYS_ADDR_T_64BIT.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2015-07-01 17:25:11

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v7 09/16] libnvdimm, pmem: move pmem to drivers/nvdimm/

On Wed, Jul 1, 2015 at 9:49 AM, Geert Uytterhoeven <[email protected]> wrote:
> On Wed, Jul 1, 2015 at 6:41 PM, Dan Williams <[email protected]> wrote:
>> On Wed, Jul 1, 2015 at 6:13 AM, Geert Uytterhoeven <[email protected]> wrote:
>>> On Thu, Jun 18, 2015 at 1:14 AM, Dan Williams <[email protected]> wrote:
>>>> Prepare the pmem driver to consume PMEM namespaces emitted by regions of
>>>> an nvdimm_bus instance. No functional change.
>>>
>>> As LIBNVDIMM depends on PHYS_ADDR_T_64BIT, the driver can no
>>> longer be enabled on pure 32-bit platforms. Is that intended?
>>
>> Yes it was intentional. It still allows 32-bit platforms with 64-bit
>> resource_size_t to compile. Do you otherwise have a 32-bit use case
>> for pmem? I'm of course open to working through the changes to add
>> wider architecture support.
>
> Nope, just wondering, as the original didn't depend on PHYS_ADDR_T_64BIT.

Not explicitly, but it did rely on X86_PMEM_LEGACY.