With the arrival of the device-dax facility in 4.7, a pmem namespace
can now be configured into a total of four distinct modes: 'raw',
'sector', 'memory', and 'dax'. The raw, sector, and memory modes are
block device modes, while dax mode fronts the namespace with a
device-dax character device. With that degree of freedom in the use
cases, it is overly restrictive to continue the current limit of only
one pmem namespace per region, or "interleave-set" in ACPI 6+
terminology.
This series adds support for reading and writing configurations that
describe multiple pmem allocations within a region. The new rules for
allocating and validating the available capacity when blk and pmem regions
alias are (quoting space_valid()):
BLK-space is valid as long as it does not precede a PMEM
allocation in a given region. PMEM-space must be contiguous
and adjacent to an existing allocation (if one exists).
An "adjacent" allocation grows an existing namespace. Note that
growing a namespace is potentially destructive if free space is consumed
from a location preceding the current allocation. There is no support
for discontinuity within a given namespace allocation.
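To make the PMEM half of that rule concrete, here is a minimal sketch
of the adjacency check a free range must pass (hypothetical helper and
names, not the space_valid() implementation in this series):

	/*
	 * Hypothetical sketch of the PMEM rule quoted above; struct
	 * resource comes from <linux/ioport.h>.
	 */
	static bool pmem_free_range_valid(struct resource *avail,
			struct resource *exist)
	{
		/* no allocation yet: any free range can start the namespace */
		if (!exist)
			return true;
		/* otherwise the free range must directly abut the allocation */
		return avail->end + 1 == exist->start
			|| avail->start == exist->end + 1;
	}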
Previously, since there was only one namespace per region, the resulting
pmem device would be named after the region. Now, namespaces after the
first are named with the region index and a ".<namespace-index>"
suffix. For example:
/dev/pmem0.1
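A minimal sketch of that naming rule (hypothetical helper; the actual
change lands in the "expand pmem device naming scheme" patch in this
series):

	/* Hypothetical sketch of the pmem device naming described above. */
	static char *pmem_dev_name(char *buf, size_t len, int region_id,
			int ns_id)
	{
		if (ns_id == 0)
			snprintf(buf, len, "pmem%d", region_id); /* /dev/pmem0 */
		else
			snprintf(buf, len, "pmem%d.%d", region_id,
					ns_id); /* /dev/pmem0.1 */
		return buf;
	}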
---
Dan Williams (14):
libnvdimm, region: move region-mapping input-parameters to nd_mapping_desc
libnvdimm, label: convert label tracking to a linked list
libnvdimm, namespace: refactor uuid_show() into a namespace_to_uuid() helper
libnvdimm, namespace: unify blk and pmem label scanning
tools/testing/nvdimm: support for sub-dividing a pmem region
libnvdimm, namespace: allow multiple pmem-namespaces per region at scan time
libnvdimm, namespace: sort namespaces by dpa at init
libnvdimm, region: update nd_region_available_dpa() for multi-pmem support
libnvdimm, namespace: expand pmem device naming scheme for multi-pmem
libnvdimm, namespace: update label implementation for multi-pmem
libnvdimm, namespace: enable allocation of multiple pmem namespaces
libnvdimm, namespace: filter out of range labels in scan_labels()
libnvdimm, namespace: lift single pmem limit in scan_labels()
libnvdimm, namespace: allow creation of multiple pmem-namespaces per region
drivers/acpi/nfit/core.c | 30 +
drivers/nvdimm/dimm_devs.c | 192 ++++++--
drivers/nvdimm/label.c | 192 +++++---
drivers/nvdimm/namespace_devs.c | 786 +++++++++++++++++++++++----------
drivers/nvdimm/nd-core.h | 23 +
drivers/nvdimm/nd.h | 28 +
drivers/nvdimm/region_devs.c | 58 ++
include/linux/libnvdimm.h | 25 -
include/linux/nd.h | 8
tools/testing/nvdimm/test/iomap.c | 134 ++++--
tools/testing/nvdimm/test/nfit.c | 21 -
tools/testing/nvdimm/test/nfit_test.h | 12 -
12 files changed, 1055 insertions(+), 454 deletions(-)
Before we add more libnvdimm-private fields to nd_mapping, make it clear
which parameters are input versus libnvdimm internals. Use struct
nd_mapping_desc instead of struct nd_mapping in nd_region_desc and make
struct nd_mapping private to libnvdimm.
Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit/core.c | 30 +++++++++++++++---------------
drivers/nvdimm/nd.h | 14 ++++++++++++++
drivers/nvdimm/region_devs.c | 16 +++++++++-------
include/linux/libnvdimm.h | 25 +++++++------------------
4 files changed, 45 insertions(+), 40 deletions(-)
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 02838f928d7e..6490a15abdd3 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1627,9 +1627,9 @@ static int acpi_nfit_init_interleave_set(struct acpi_nfit_desc *acpi_desc,
if (!info)
return -ENOMEM;
for (i = 0; i < nr; i++) {
- struct nd_mapping *nd_mapping = &ndr_desc->nd_mapping[i];
+ struct nd_mapping_desc *mapping = &ndr_desc->mapping[i];
struct nfit_set_info_map *map = &info->mapping[i];
- struct nvdimm *nvdimm = nd_mapping->nvdimm;
+ struct nvdimm *nvdimm = mapping->nvdimm;
struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm);
struct acpi_nfit_memory_map *memdev = memdev_from_spa(acpi_desc,
spa->range_index, i);
@@ -2053,7 +2053,7 @@ static int acpi_nfit_insert_resource(struct acpi_nfit_desc *acpi_desc,
}
static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
- struct nd_mapping *nd_mapping, struct nd_region_desc *ndr_desc,
+ struct nd_mapping_desc *mapping, struct nd_region_desc *ndr_desc,
struct acpi_nfit_memory_map *memdev,
struct nfit_spa *nfit_spa)
{
@@ -2070,12 +2070,12 @@ static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
return -ENODEV;
}
- nd_mapping->nvdimm = nvdimm;
+ mapping->nvdimm = nvdimm;
switch (nfit_spa_type(spa)) {
case NFIT_SPA_PM:
case NFIT_SPA_VOLATILE:
- nd_mapping->start = memdev->address;
- nd_mapping->size = memdev->region_size;
+ mapping->start = memdev->address;
+ mapping->size = memdev->region_size;
break;
case NFIT_SPA_DCR:
nfit_mem = nvdimm_provider_data(nvdimm);
@@ -2083,13 +2083,13 @@ static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
dev_dbg(acpi_desc->dev, "spa%d %s missing bdw\n",
spa->range_index, nvdimm_name(nvdimm));
} else {
- nd_mapping->size = nfit_mem->bdw->capacity;
- nd_mapping->start = nfit_mem->bdw->start_address;
+ mapping->size = nfit_mem->bdw->capacity;
+ mapping->start = nfit_mem->bdw->start_address;
ndr_desc->num_lanes = nfit_mem->bdw->windows;
blk_valid = 1;
}
- ndr_desc->nd_mapping = nd_mapping;
+ ndr_desc->mapping = mapping;
ndr_desc->num_mappings = blk_valid;
ndbr_desc = to_blk_region_desc(ndr_desc);
ndbr_desc->enable = acpi_nfit_blk_region_enable;
@@ -2115,7 +2115,7 @@ static bool nfit_spa_is_virtual(struct acpi_nfit_system_address *spa)
static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
struct nfit_spa *nfit_spa)
{
- static struct nd_mapping nd_mappings[ND_MAX_MAPPINGS];
+ static struct nd_mapping_desc mappings[ND_MAX_MAPPINGS];
struct acpi_nfit_system_address *spa = nfit_spa->spa;
struct nd_blk_region_desc ndbr_desc;
struct nd_region_desc *ndr_desc;
@@ -2134,7 +2134,7 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
}
memset(&res, 0, sizeof(res));
- memset(&nd_mappings, 0, sizeof(nd_mappings));
+ memset(&mappings, 0, sizeof(mappings));
memset(&ndbr_desc, 0, sizeof(ndbr_desc));
res.start = spa->address;
res.end = res.start + spa->length - 1;
@@ -2150,7 +2150,7 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
struct acpi_nfit_memory_map *memdev = nfit_memdev->memdev;
- struct nd_mapping *nd_mapping;
+ struct nd_mapping_desc *mapping;
if (memdev->range_index != spa->range_index)
continue;
@@ -2159,14 +2159,14 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
spa->range_index, ND_MAX_MAPPINGS);
return -ENXIO;
}
- nd_mapping = &nd_mappings[count++];
- rc = acpi_nfit_init_mapping(acpi_desc, nd_mapping, ndr_desc,
+ mapping = &mappings[count++];
+ rc = acpi_nfit_init_mapping(acpi_desc, mapping, ndr_desc,
memdev, nfit_spa);
if (rc)
goto out;
}
- ndr_desc->nd_mapping = nd_mappings;
+ ndr_desc->mapping = mappings;
ndr_desc->num_mappings = count;
rc = acpi_nfit_init_interleave_set(acpi_desc, ndr_desc, spa);
if (rc)
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 38d6f039234e..e58c40824e1f 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -98,6 +98,20 @@ struct nd_percpu_lane {
spinlock_t lock;
};
+struct nd_mapping {
+ struct nvdimm *nvdimm;
+ struct nd_namespace_label **labels;
+ u64 start;
+ u64 size;
+ /*
+ * @ndd is for private use at region enable / disable time for
+ * get_ndd() + put_ndd(), all other nd_mapping to ndd
+ * conversions use to_ndd() which respects enabled state of the
+ * nvdimm.
+ */
+ struct nvdimm_drvdata *ndd;
+};
+
struct nd_region {
struct device dev;
struct ida ns_ida;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index e8d5ba7b29af..0ff43cbb15e3 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -755,10 +755,10 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
int ro = 0;
for (i = 0; i < ndr_desc->num_mappings; i++) {
- struct nd_mapping *nd_mapping = &ndr_desc->nd_mapping[i];
- struct nvdimm *nvdimm = nd_mapping->nvdimm;
+ struct nd_mapping_desc *mapping = &ndr_desc->mapping[i];
+ struct nvdimm *nvdimm = mapping->nvdimm;
- if ((nd_mapping->start | nd_mapping->size) % SZ_4K) {
+ if ((mapping->start | mapping->size) % SZ_4K) {
dev_err(&nvdimm_bus->dev, "%s: %s mapping%d is not 4K aligned\n",
caller, dev_name(&nvdimm->dev), i);
@@ -809,11 +809,13 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
ndl->count = 0;
}
- memcpy(nd_region->mapping, ndr_desc->nd_mapping,
- sizeof(struct nd_mapping) * ndr_desc->num_mappings);
for (i = 0; i < ndr_desc->num_mappings; i++) {
- struct nd_mapping *nd_mapping = &ndr_desc->nd_mapping[i];
- struct nvdimm *nvdimm = nd_mapping->nvdimm;
+ struct nd_mapping_desc *mapping = &ndr_desc->mapping[i];
+ struct nvdimm *nvdimm = mapping->nvdimm;
+
+ nd_region->mapping[i].nvdimm = nvdimm;
+ nd_region->mapping[i].start = mapping->start;
+ nd_region->mapping[i].size = mapping->size;
get_device(&nvdimm->dev);
}
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 4a5f8c51f2a5..f4947fda11e7 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -50,23 +50,6 @@ typedef int (*ndctl_fn)(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *buf,
unsigned int buf_len, int *cmd_rc);
-struct nd_namespace_label;
-struct nvdimm_drvdata;
-
-struct nd_mapping {
- struct nvdimm *nvdimm;
- struct nd_namespace_label **labels;
- u64 start;
- u64 size;
- /*
- * @ndd is for private use at region enable / disable time for
- * get_ndd() + put_ndd(), all other nd_mapping to ndd
- * conversions use to_ndd() which respects enabled state of the
- * nvdimm.
- */
- struct nvdimm_drvdata *ndd;
-};
-
struct nvdimm_bus_descriptor {
const struct attribute_group **attr_groups;
unsigned long cmd_mask;
@@ -89,9 +72,15 @@ struct nd_interleave_set {
u64 cookie;
};
+struct nd_mapping_desc {
+ struct nvdimm *nvdimm;
+ u64 start;
+ u64 size;
+};
+
struct nd_region_desc {
struct resource *res;
- struct nd_mapping *nd_mapping;
+ struct nd_mapping_desc *mapping;
u16 num_mappings;
const struct attribute_group **attr_groups;
struct nd_interleave_set *nd_set;
The ability to translate a generic struct device pointer into a
namespace uuid is a useful utility as we go to unify the blk and pmem
label scanning paths.
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/namespace_devs.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 9f4188c78120..0e62f46755e7 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1032,22 +1032,27 @@ static ssize_t size_show(struct device *dev,
}
static DEVICE_ATTR(size, S_IRUGO, size_show, size_store);
-static ssize_t uuid_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static u8 *namespace_to_uuid(struct device *dev)
{
- u8 *uuid;
-
if (is_namespace_pmem(dev)) {
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
- uuid = nspm->uuid;
+ return nspm->uuid;
} else if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
- uuid = nsblk->uuid;
+ return nsblk->uuid;
} else
- return -ENXIO;
+ return ERR_PTR(-ENXIO);
+}
+
+static ssize_t uuid_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ u8 *uuid = namespace_to_uuid(dev);
+ if (IS_ERR(uuid))
+ return PTR_ERR(uuid);
if (uuid)
return sprintf(buf, "%pUb\n", uuid);
return sprintf(buf, "\n");
In preparation for enabling multiple namespaces per pmem region, convert
the label tracking to use a linked list. In particular, this will allow
select_pmem_id() to move labels from the unvalidated state to the
validated state. Currently we only track one validated set per region.
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/label.c | 136 +++++++++++++++++--------------
drivers/nvdimm/namespace_devs.c | 173 +++++++++++++++++++++++++++------------
drivers/nvdimm/nd-core.h | 1
drivers/nvdimm/nd.h | 16 +++-
drivers/nvdimm/region_devs.c | 19 ++++
5 files changed, 225 insertions(+), 120 deletions(-)
diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
index 96526dcfdd37..c37357210428 100644
--- a/drivers/nvdimm/label.c
+++ b/drivers/nvdimm/label.c
@@ -499,6 +499,7 @@ static int __pmem_label_update(struct nd_region *nd_region,
struct nd_namespace_label *victim_label;
struct nd_namespace_label *nd_label;
struct nd_namespace_index *nsindex;
+ struct nd_label_ent *label_ent;
unsigned long *free;
u32 nslot, slot;
size_t offset;
@@ -536,8 +537,13 @@ static int __pmem_label_update(struct nd_region *nd_region,
return rc;
/* Garbage collect the previous label */
- victim_label = nd_mapping->labels[0];
+ mutex_lock(&nd_mapping->lock);
+ label_ent = list_first_entry_or_null(&nd_mapping->labels,
+ typeof(*label_ent), list);
+ WARN_ON(!label_ent);
+ victim_label = label_ent ? label_ent->label : NULL;
if (victim_label) {
+ label_ent->label = NULL;
slot = to_slot(ndd, victim_label);
nd_label_free_slot(ndd, slot);
dev_dbg(ndd->dev, "%s: free: %d\n", __func__, slot);
@@ -546,28 +552,11 @@ static int __pmem_label_update(struct nd_region *nd_region,
/* update index */
rc = nd_label_write_index(ndd, ndd->ns_next,
nd_inc_seq(__le32_to_cpu(nsindex->seq)), 0);
- if (rc < 0)
- return rc;
-
- nd_mapping->labels[0] = nd_label;
-
- return 0;
-}
-
-static void del_label(struct nd_mapping *nd_mapping, int l)
-{
- struct nd_namespace_label *next_label, *nd_label;
- struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
- unsigned int slot;
- int j;
+ if (rc == 0 && label_ent)
+ label_ent->label = nd_label;
+ mutex_unlock(&nd_mapping->lock);
- nd_label = nd_mapping->labels[l];
- slot = to_slot(ndd, nd_label);
- dev_vdbg(ndd->dev, "%s: clear: %d\n", __func__, slot);
-
- for (j = l; (next_label = nd_mapping->labels[j + 1]); j++)
- nd_mapping->labels[j] = next_label;
- nd_mapping->labels[j] = NULL;
+ return rc;
}
static bool is_old_resource(struct resource *res, struct resource **list, int n)
@@ -607,14 +596,16 @@ static int __blk_label_update(struct nd_region *nd_region,
struct nd_mapping *nd_mapping, struct nd_namespace_blk *nsblk,
int num_labels)
{
- int i, l, alloc, victims, nfree, old_num_resources, nlabel, rc = -ENXIO;
+ int i, alloc, victims, nfree, old_num_resources, nlabel, rc = -ENXIO;
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
struct nd_namespace_label *nd_label;
+ struct nd_label_ent *label_ent, *e;
struct nd_namespace_index *nsindex;
unsigned long *free, *victim_map = NULL;
struct resource *res, **old_res_list;
struct nd_label_id label_id;
u8 uuid[NSLABEL_UUID_LEN];
+ LIST_HEAD(list);
u32 nslot, slot;
if (!preamble_next(ndd, &nsindex, &free, &nslot))
@@ -736,15 +727,22 @@ static int __blk_label_update(struct nd_region *nd_region,
* entries in nd_mapping->labels
*/
nlabel = 0;
- for_each_label(l, nd_label, nd_mapping->labels) {
+ mutex_lock(&nd_mapping->lock);
+ list_for_each_entry_safe(label_ent, e, &nd_mapping->labels, list) {
+ nd_label = label_ent->label;
+ if (!nd_label)
+ continue;
nlabel++;
memcpy(uuid, nd_label->uuid, NSLABEL_UUID_LEN);
if (memcmp(uuid, nsblk->uuid, NSLABEL_UUID_LEN) != 0)
continue;
nlabel--;
- del_label(nd_mapping, l);
- l--; /* retry with the new label at this index */
+ list_move(&label_ent->list, &list);
+ label_ent->label = NULL;
}
+ list_splice_tail_init(&list, &nd_mapping->labels);
+ mutex_unlock(&nd_mapping->lock);
+
if (nlabel + nsblk->num_resources > num_labels) {
/*
* Bug, we can't end up with more resources than
@@ -755,6 +753,15 @@ static int __blk_label_update(struct nd_region *nd_region,
goto out;
}
+ mutex_lock(&nd_mapping->lock);
+ label_ent = list_first_entry_or_null(&nd_mapping->labels,
+ typeof(*label_ent), list);
+ if (!label_ent) {
+ WARN_ON(1);
+ mutex_unlock(&nd_mapping->lock);
+ rc = -ENXIO;
+ goto out;
+ }
for_each_clear_bit_le(slot, free, nslot) {
nd_label = nd_label_base(ndd) + slot;
memcpy(uuid, nd_label->uuid, NSLABEL_UUID_LEN);
@@ -762,11 +769,19 @@ static int __blk_label_update(struct nd_region *nd_region,
continue;
res = to_resource(ndd, nd_label);
res->flags &= ~DPA_RESOURCE_ADJUSTED;
- dev_vdbg(&nsblk->common.dev, "assign label[%d] slot: %d\n",
- l, slot);
- nd_mapping->labels[l++] = nd_label;
+ dev_vdbg(&nsblk->common.dev, "assign label slot: %d\n", slot);
+ list_for_each_entry_from(label_ent, &nd_mapping->labels, list) {
+ if (label_ent->label)
+ continue;
+ label_ent->label = nd_label;
+ nd_label = NULL;
+ break;
+ }
+ if (nd_label)
+ dev_WARN(&nsblk->common.dev,
+ "failed to track label slot%d\n", slot);
}
- nd_mapping->labels[l] = NULL;
+ mutex_unlock(&nd_mapping->lock);
out:
kfree(old_res_list);
@@ -788,32 +803,28 @@ static int __blk_label_update(struct nd_region *nd_region,
static int init_labels(struct nd_mapping *nd_mapping, int num_labels)
{
- int i, l, old_num_labels = 0;
+ int i, old_num_labels = 0;
+ struct nd_label_ent *label_ent;
struct nd_namespace_index *nsindex;
- struct nd_namespace_label *nd_label;
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
- size_t size = (num_labels + 1) * sizeof(struct nd_namespace_label *);
- for_each_label(l, nd_label, nd_mapping->labels)
+ mutex_lock(&nd_mapping->lock);
+ list_for_each_entry(label_ent, &nd_mapping->labels, list)
old_num_labels++;
+ mutex_unlock(&nd_mapping->lock);
/*
* We need to preserve all the old labels for the mapping so
* they can be garbage collected after writing the new labels.
*/
- if (num_labels > old_num_labels) {
- struct nd_namespace_label **labels;
-
- labels = krealloc(nd_mapping->labels, size, GFP_KERNEL);
- if (!labels)
+ for (i = old_num_labels; i < num_labels; i++) {
+ label_ent = kzalloc(sizeof(*label_ent), GFP_KERNEL);
+ if (!label_ent)
return -ENOMEM;
- nd_mapping->labels = labels;
+ mutex_lock(&nd_mapping->lock);
+ list_add_tail(&label_ent->list, &nd_mapping->labels);
+ mutex_unlock(&nd_mapping->lock);
}
- if (!nd_mapping->labels)
- return -ENOMEM;
-
- for (i = old_num_labels; i <= num_labels; i++)
- nd_mapping->labels[i] = NULL;
if (ndd->ns_current == -1 || ndd->ns_next == -1)
/* pass */;
@@ -837,42 +848,45 @@ static int init_labels(struct nd_mapping *nd_mapping, int num_labels)
static int del_labels(struct nd_mapping *nd_mapping, u8 *uuid)
{
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
- struct nd_namespace_label *nd_label;
+ struct nd_label_ent *label_ent, *e;
struct nd_namespace_index *nsindex;
u8 label_uuid[NSLABEL_UUID_LEN];
- int l, num_freed = 0;
unsigned long *free;
+ LIST_HEAD(list);
u32 nslot, slot;
+ int active = 0;
if (!uuid)
return 0;
/* no index || no labels == nothing to delete */
- if (!preamble_next(ndd, &nsindex, &free, &nslot)
- || !nd_mapping->labels)
+ if (!preamble_next(ndd, &nsindex, &free, &nslot))
return 0;
- for_each_label(l, nd_label, nd_mapping->labels) {
+ mutex_lock(&nd_mapping->lock);
+ list_for_each_entry_safe(label_ent, e, &nd_mapping->labels, list) {
+ struct nd_namespace_label *nd_label = label_ent->label;
+
+ if (!nd_label)
+ continue;
+ active++;
memcpy(label_uuid, nd_label->uuid, NSLABEL_UUID_LEN);
if (memcmp(label_uuid, uuid, NSLABEL_UUID_LEN) != 0)
continue;
+ active--;
slot = to_slot(ndd, nd_label);
nd_label_free_slot(ndd, slot);
dev_dbg(ndd->dev, "%s: free: %d\n", __func__, slot);
- del_label(nd_mapping, l);
- num_freed++;
- l--; /* retry with new label at this index */
+ list_move_tail(&label_ent->list, &list);
+ label_ent->label = NULL;
}
+ list_splice_tail_init(&list, &nd_mapping->labels);
- if (num_freed > l) {
- /*
- * num_freed will only ever be > l when we delete the last
- * label
- */
- kfree(nd_mapping->labels);
- nd_mapping->labels = NULL;
- dev_dbg(ndd->dev, "%s: no more labels\n", __func__);
+ if (active == 0) {
+ nd_mapping_free_labels(nd_mapping);
+ dev_dbg(ndd->dev, "%s: no more active labels\n", __func__);
}
+ mutex_unlock(&nd_mapping->lock);
return nd_label_write_index(ndd, ndd->ns_next,
nd_inc_seq(__le32_to_cpu(nsindex->seq)), 0);
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 4f0a21308417..9f4188c78120 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -14,6 +14,7 @@
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/pmem.h>
+#include <linux/list.h>
#include <linux/nd.h>
#include "nd-core.h"
#include "nd.h"
@@ -1089,7 +1090,7 @@ static int namespace_update_uuid(struct nd_region *nd_region,
*
* FIXME: can we delete uuid with zero dpa allocated?
*/
- if (nd_mapping->labels)
+ if (list_empty(&nd_mapping->labels))
return -EBUSY;
}
@@ -1491,14 +1492,19 @@ static bool has_uuid_at_pos(struct nd_region *nd_region, u8 *uuid,
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
- struct nd_namespace_label *nd_label;
+ struct nd_label_ent *label_ent;
bool found_uuid = false;
- int l;
- for_each_label(l, nd_label, nd_mapping->labels) {
- u64 isetcookie = __le64_to_cpu(nd_label->isetcookie);
- u16 position = __le16_to_cpu(nd_label->position);
- u16 nlabel = __le16_to_cpu(nd_label->nlabel);
+ list_for_each_entry(label_ent, &nd_mapping->labels, list) {
+ struct nd_namespace_label *nd_label = label_ent->label;
+ u16 position, nlabel;
+ u64 isetcookie;
+
+ if (!nd_label)
+ continue;
+ isetcookie = __le64_to_cpu(nd_label->isetcookie);
+ position = __le16_to_cpu(nd_label->position);
+ nlabel = __le16_to_cpu(nd_label->nlabel);
if (isetcookie != cookie)
continue;
@@ -1528,7 +1534,6 @@ static bool has_uuid_at_pos(struct nd_region *nd_region, u8 *uuid,
static int select_pmem_id(struct nd_region *nd_region, u8 *pmem_id)
{
- struct nd_namespace_label *select = NULL;
int i;
if (!pmem_id)
@@ -1536,35 +1541,47 @@ static int select_pmem_id(struct nd_region *nd_region, u8 *pmem_id)
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
- struct nd_namespace_label *nd_label;
+ struct nd_namespace_label *nd_label = NULL;
u64 hw_start, hw_end, pmem_start, pmem_end;
- int l;
+ struct nd_label_ent *label_ent;
- for_each_label(l, nd_label, nd_mapping->labels)
+ mutex_lock(&nd_mapping->lock);
+ list_for_each_entry(label_ent, &nd_mapping->labels, list) {
+ nd_label = label_ent->label;
+ if (!nd_label)
+ continue;
if (memcmp(nd_label->uuid, pmem_id, NSLABEL_UUID_LEN) == 0)
break;
+ nd_label = NULL;
+ }
+ mutex_unlock(&nd_mapping->lock);
if (!nd_label) {
WARN_ON(1);
return -EINVAL;
}
- select = nd_label;
/*
* Check that this label is compliant with the dpa
* range published in NFIT
*/
hw_start = nd_mapping->start;
hw_end = hw_start + nd_mapping->size;
- pmem_start = __le64_to_cpu(select->dpa);
- pmem_end = pmem_start + __le64_to_cpu(select->rawsize);
+ pmem_start = __le64_to_cpu(nd_label->dpa);
+ pmem_end = pmem_start + __le64_to_cpu(nd_label->rawsize);
if (pmem_start == hw_start && pmem_end <= hw_end)
/* pass */;
else
return -EINVAL;
- nd_mapping->labels[0] = select;
- nd_mapping->labels[1] = NULL;
+ mutex_lock(&nd_mapping->lock);
+ label_ent = list_first_entry(&nd_mapping->labels,
+ typeof(*label_ent), list);
+ label_ent->label = nd_label;
+ list_del(&label_ent->list);
+ nd_mapping_free_labels(nd_mapping);
+ list_add(&label_ent->list, &nd_mapping->labels);
+ mutex_unlock(&nd_mapping->lock);
}
return 0;
}
@@ -1577,11 +1594,12 @@ static int find_pmem_label_set(struct nd_region *nd_region,
struct nd_namespace_pmem *nspm)
{
u64 cookie = nd_region_interleave_set_cookie(nd_region);
- struct nd_namespace_label *nd_label;
u8 select_id[NSLABEL_UUID_LEN];
+ struct nd_label_ent *label_ent;
+ struct nd_mapping *nd_mapping;
resource_size_t size = 0;
u8 *pmem_id = NULL;
- int rc = -ENODEV, l;
+ int rc = 0;
u16 i;
if (cookie == 0) {
@@ -1593,13 +1611,19 @@ static int find_pmem_label_set(struct nd_region *nd_region,
* Find a complete set of labels by uuid. By definition we can start
* with any mapping as the reference label
*/
- for_each_label(l, nd_label, nd_region->mapping[0].labels) {
- u64 isetcookie = __le64_to_cpu(nd_label->isetcookie);
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ nd_mapping = &nd_region->mapping[i];
+ mutex_lock_nested(&nd_mapping->lock, i);
+ }
+ list_for_each_entry(label_ent, &nd_region->mapping[0].labels, list) {
+ struct nd_namespace_label *nd_label = label_ent->label;
- if (isetcookie != cookie)
+ if (!nd_label)
+ continue;
+ if (__le64_to_cpu(nd_label->isetcookie) != cookie)
continue;
- for (i = 0; nd_region->ndr_mappings; i++)
+ for (i = 0; i < nd_region->ndr_mappings; i++)
if (!has_uuid_at_pos(nd_region, nd_label->uuid,
cookie, i))
break;
@@ -1611,18 +1635,27 @@ static int find_pmem_label_set(struct nd_region *nd_region,
* dimm with two instances of the same uuid.
*/
rc = -EINVAL;
- goto err;
+ break;
} else if (pmem_id) {
/*
* If there is more than one valid uuid set, we
* need userspace to clean this up.
*/
rc = -EBUSY;
- goto err;
+ break;
}
memcpy(select_id, nd_label->uuid, NSLABEL_UUID_LEN);
pmem_id = select_id;
}
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ int reverse = nd_region->ndr_mappings - 1 - i;
+
+ nd_mapping = &nd_region->mapping[reverse];
+ mutex_unlock(&nd_mapping->lock);
+ }
+
+ if (rc)
+ goto err;
/*
* Fix up each mapping's 'labels' to have the validated pmem label for
@@ -1638,8 +1671,19 @@ static int find_pmem_label_set(struct nd_region *nd_region,
/* Calculate total size and populate namespace properties from label0 */
for (i = 0; i < nd_region->ndr_mappings; i++) {
- struct nd_mapping *nd_mapping = &nd_region->mapping[i];
- struct nd_namespace_label *label0 = nd_mapping->labels[0];
+ struct nd_namespace_label *label0;
+
+ nd_mapping = &nd_region->mapping[i];
+ mutex_lock(&nd_mapping->lock);
+ label_ent = list_first_entry_or_null(&nd_mapping->labels,
+ typeof(*label_ent), list);
+ label0 = label_ent ? label_ent->label : 0;
+ mutex_unlock(&nd_mapping->lock);
+
+ if (!label0) {
+ WARN_ON(1);
+ continue;
+ }
size += __le64_to_cpu(label0->rawsize);
if (__le16_to_cpu(label0->position) != 0)
@@ -1700,8 +1744,9 @@ static struct device **create_namespace_pmem(struct nd_region *nd_region)
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
- kfree(nd_mapping->labels);
- nd_mapping->labels = NULL;
+ mutex_lock(&nd_mapping->lock);
+ nd_mapping_free_labels(nd_mapping);
+ mutex_unlock(&nd_mapping->lock);
}
/* Publish a zero-sized namespace for userspace to configure. */
@@ -1822,25 +1867,25 @@ void nd_region_create_btt_seed(struct nd_region *nd_region)
dev_err(&nd_region->dev, "failed to create btt namespace\n");
}
-static struct device **create_namespace_blk(struct nd_region *nd_region)
+static struct device **scan_labels(struct nd_region *nd_region,
+ struct nd_mapping *nd_mapping)
{
- struct nd_mapping *nd_mapping = &nd_region->mapping[0];
- struct nd_namespace_label *nd_label;
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
struct device *dev, **devs = NULL;
struct nd_namespace_blk *nsblk;
- struct nvdimm_drvdata *ndd;
- int i, l, count = 0;
- struct resource *res;
-
- if (nd_region->ndr_mappings == 0)
- return NULL;
+ struct nd_label_ent *label_ent;
+ int i, count = 0;
- ndd = to_ndd(nd_mapping);
- for_each_label(l, nd_label, nd_mapping->labels) {
- u32 flags = __le32_to_cpu(nd_label->flags);
+ list_for_each_entry(label_ent, &nd_mapping->labels, list) {
+ struct nd_namespace_label *nd_label = label_ent->label;
char *name[NSLABEL_NAME_LEN];
struct device **__devs;
+ struct resource *res;
+ u32 flags;
+ if (!nd_label)
+ continue;
+ flags = __le32_to_cpu(nd_label->flags);
if (flags & NSLABEL_FLAG_LOCAL)
/* pass */;
else
@@ -1899,12 +1944,7 @@ static struct device **create_namespace_blk(struct nd_region *nd_region)
if (count == 0) {
/* Publish a zero-sized namespace for userspace to configure. */
- for (i = 0; i < nd_region->ndr_mappings; i++) {
- struct nd_mapping *nd_mapping = &nd_region->mapping[i];
-
- kfree(nd_mapping->labels);
- nd_mapping->labels = NULL;
- }
+ nd_mapping_free_labels(nd_mapping);
devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
if (!devs)
@@ -1920,8 +1960,8 @@ static struct device **create_namespace_blk(struct nd_region *nd_region)
return devs;
-err:
- for (i = 0; i < count; i++) {
+ err:
+ for (i = 0; devs[i]; i++) {
nsblk = to_nd_namespace_blk(devs[i]);
namespace_blk_release(&nsblk->common.dev);
}
@@ -1929,6 +1969,21 @@ err:
return NULL;
}
+static struct device **create_namespace_blk(struct nd_region *nd_region)
+{
+ struct nd_mapping *nd_mapping = &nd_region->mapping[0];
+ struct device **devs;
+
+ if (nd_region->ndr_mappings == 0)
+ return NULL;
+
+ mutex_lock(&nd_mapping->lock);
+ devs = scan_labels(nd_region, nd_mapping);
+ mutex_unlock(&nd_mapping->lock);
+
+ return devs;
+}
+
static int init_active_labels(struct nd_region *nd_region)
{
int i;
@@ -1937,6 +1992,7 @@ static int init_active_labels(struct nd_region *nd_region)
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
struct nvdimm *nvdimm = nd_mapping->nvdimm;
+ struct nd_label_ent *label_ent;
int count, j;
/*
@@ -1958,16 +2014,27 @@ static int init_active_labels(struct nd_region *nd_region)
dev_dbg(ndd->dev, "%s: %d\n", __func__, count);
if (!count)
continue;
- nd_mapping->labels = kcalloc(count + 1, sizeof(void *),
- GFP_KERNEL);
- if (!nd_mapping->labels)
- return -ENOMEM;
for (j = 0; j < count; j++) {
struct nd_namespace_label *label;
+ label_ent = kzalloc(sizeof(*label_ent), GFP_KERNEL);
+ if (!label_ent)
+ break;
label = nd_label_active(ndd, j);
- nd_mapping->labels[j] = label;
+ label_ent->label = label;
+
+ mutex_lock(&nd_mapping->lock);
+ list_add_tail(&label_ent->list, &nd_mapping->labels);
+ mutex_unlock(&nd_mapping->lock);
}
+
+ if (j >= count)
+ continue;
+
+ mutex_lock(&nd_mapping->lock);
+ nd_mapping_free_labels(nd_mapping);
+ mutex_unlock(&nd_mapping->lock);
+ return -ENOMEM;
}
return 0;
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 1414784c6c2b..fb3ade0d4a83 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -73,6 +73,7 @@ bool nd_is_uuid_unique(struct device *dev, u8 *uuid);
struct nd_region;
struct nvdimm_drvdata;
struct nd_mapping;
+void nd_mapping_free_labels(struct nd_mapping *nd_mapping);
resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
struct nd_mapping *nd_mapping, resource_size_t *overlap);
resource_size_t nd_blk_available_dpa(struct nd_mapping *nd_mapping);
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index e58c40824e1f..f67c61f1a8a4 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -83,9 +83,6 @@ static inline struct nd_namespace_index *to_next_namespace_index(
(unsigned long long) (res ? resource_size(res) : 0), \
(unsigned long long) (res ? res->start : 0), ##arg)
-#define for_each_label(l, label, labels) \
- for (l = 0; (label = labels ? labels[l] : NULL); l++)
-
#define for_each_dpa_resource(ndd, res) \
for (res = (ndd)->dpa.child; res; res = res->sibling)
@@ -98,11 +95,22 @@ struct nd_percpu_lane {
spinlock_t lock;
};
+struct nd_label_ent {
+ struct list_head list;
+ struct nd_namespace_label *label;
+};
+
+enum nd_mapping_lock_class {
+ ND_MAPPING_CLASS0,
+ ND_MAPPING_UUID_SCAN,
+};
+
struct nd_mapping {
struct nvdimm *nvdimm;
- struct nd_namespace_label **labels;
u64 start;
u64 size;
+ struct list_head labels;
+ struct mutex lock;
/*
* @ndd is for private use at region enable / disable time for
* get_ndd() + put_ndd(), all other nd_mapping to ndd
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 0ff43cbb15e3..19bcd68c4141 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -487,6 +487,17 @@ u64 nd_region_interleave_set_cookie(struct nd_region *nd_region)
return 0;
}
+void nd_mapping_free_labels(struct nd_mapping *nd_mapping)
+{
+ struct nd_label_ent *label_ent, *e;
+
+ WARN_ON(!mutex_is_locked(&nd_mapping->lock));
+ list_for_each_entry_safe(label_ent, e, &nd_mapping->labels, list) {
+ list_del(&label_ent->list);
+ kfree(label_ent);
+ }
+}
+
/*
* Upon successful probe/remove, take/release a reference on the
* associated interleave set (if present), and plant new btt + namespace
@@ -507,8 +518,10 @@ static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
struct nvdimm_drvdata *ndd = nd_mapping->ndd;
struct nvdimm *nvdimm = nd_mapping->nvdimm;
- kfree(nd_mapping->labels);
- nd_mapping->labels = NULL;
+ mutex_lock(&nd_mapping->lock);
+ nd_mapping_free_labels(nd_mapping);
+ mutex_unlock(&nd_mapping->lock);
+
put_ndd(ndd);
nd_mapping->ndd = NULL;
if (ndd)
@@ -816,6 +829,8 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
nd_region->mapping[i].nvdimm = nvdimm;
nd_region->mapping[i].start = mapping->start;
nd_region->mapping[i].size = mapping->size;
+ INIT_LIST_HEAD(&nd_region->mapping[i].labels);
+ mutex_init(&nd_region->mapping[i].lock);
get_device(&nvdimm->dev);
}
Add more determinism to initial namespace device-name assignments by
sorting the namespaces by starting dpa.
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/namespace_devs.c | 35 ++++++++++++++++++++++++++++++++---
include/linux/nd.h | 6 +++---
2 files changed, 35 insertions(+), 6 deletions(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 47d29632b937..f0536c2789e9 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -12,6 +12,7 @@
*/
#include <linux/module.h>
#include <linux/device.h>
+#include <linux/sort.h>
#include <linux/slab.h>
#include <linux/pmem.h>
#include <linux/list.h>
@@ -66,17 +67,17 @@ static struct device_type namespace_blk_device_type = {
.release = namespace_blk_release,
};
-static bool is_namespace_pmem(struct device *dev)
+static bool is_namespace_pmem(const struct device *dev)
{
return dev ? dev->type == &namespace_pmem_device_type : false;
}
-static bool is_namespace_blk(struct device *dev)
+static bool is_namespace_blk(const struct device *dev)
{
return dev ? dev->type == &namespace_blk_device_type : false;
}
-static bool is_namespace_io(struct device *dev)
+static bool is_namespace_io(const struct device *dev)
{
return dev ? dev->type == &namespace_io_device_type : false;
}
@@ -1919,6 +1920,31 @@ struct device *create_namespace_blk(struct nd_region *nd_region,
return ERR_PTR(-ENXIO);
}
+static int cmp_dpa(const void *a, const void *b)
+{
+ const struct device *dev_a = *(const struct device **) a;
+ const struct device *dev_b = *(const struct device **) b;
+ struct nd_namespace_blk *nsblk_a, *nsblk_b;
+ struct nd_namespace_pmem *nspm_a, *nspm_b;
+
+ if (is_namespace_io(dev_a))
+ return 0;
+
+ if (is_namespace_blk(dev_a)) {
+ nsblk_a = to_nd_namespace_blk(dev_a);
+ nsblk_b = to_nd_namespace_blk(dev_b);
+
+ return memcmp(&nsblk_a->res[0]->start, &nsblk_b->res[0]->start,
+ sizeof(resource_size_t));
+ }
+
+ nspm_a = to_nd_namespace_pmem(dev_a);
+ nspm_b = to_nd_namespace_pmem(dev_b);
+
+ return memcmp(&nspm_a->nsio.res.start, &nspm_b->nsio.res.start,
+ sizeof(resource_size_t));
+}
+
static struct device **scan_labels(struct nd_region *nd_region)
{
struct nd_mapping *nd_mapping = &nd_region->mapping[0];
@@ -2034,6 +2060,9 @@ static struct device **scan_labels(struct nd_region *nd_region)
}
}
+ if (count > 1)
+ sort(devs, count, sizeof(struct device *), cmp_dpa, NULL);
+
return devs;
err:
diff --git a/include/linux/nd.h b/include/linux/nd.h
index ddcc7788305c..fa66aeed441a 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -107,19 +107,19 @@ struct nd_namespace_blk {
struct resource **res;
};
-static inline struct nd_namespace_io *to_nd_namespace_io(struct device *dev)
+static inline struct nd_namespace_io *to_nd_namespace_io(const struct device *dev)
{
return container_of(dev, struct nd_namespace_io, common.dev);
}
-static inline struct nd_namespace_pmem *to_nd_namespace_pmem(struct device *dev)
+static inline struct nd_namespace_pmem *to_nd_namespace_pmem(const struct device *dev)
{
struct nd_namespace_io *nsio = to_nd_namespace_io(dev);
return container_of(nsio, struct nd_namespace_pmem, nsio);
}
-static inline struct nd_namespace_blk *to_nd_namespace_blk(struct device *dev)
+static inline struct nd_namespace_blk *to_nd_namespace_blk(const struct device *dev)
{
return container_of(dev, struct nd_namespace_blk, common.dev);
}
If label scanning finds multiple valid pmem namespaces, allow them to be
surfaced rather than failing namespace scanning. Support for creating
multiple namespaces per region is saved for a later patch.
Note that this adds some new error messages to clarify which of the pmem
namespaces in the set are potentially impacted by invalid labels.
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/namespace_devs.c | 84 +++++++++++++++++++++++++++++++++------
include/linux/nd.h | 2 +
2 files changed, 74 insertions(+), 12 deletions(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index fbcadc7cb8fd..47d29632b937 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -29,7 +29,10 @@ static void namespace_io_release(struct device *dev)
static void namespace_pmem_release(struct device *dev)
{
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
+ struct nd_region *nd_region = to_nd_region(dev->parent);
+ if (nspm->id >= 0)
+ ida_simple_remove(&nd_region->ns_ida, nspm->id);
kfree(nspm->alt_name);
kfree(nspm->uuid);
kfree(nspm);
@@ -833,13 +836,45 @@ static int grow_dpa_allocation(struct nd_region *nd_region,
return 0;
}
-static void nd_namespace_pmem_set_size(struct nd_region *nd_region,
+static void nd_namespace_pmem_set_resource(struct nd_region *nd_region,
struct nd_namespace_pmem *nspm, resource_size_t size)
{
struct resource *res = &nspm->nsio.res;
+ resource_size_t offset = 0;
- res->start = nd_region->ndr_start;
- res->end = nd_region->ndr_start + size - 1;
+ if (size && !nspm->uuid) {
+ WARN_ON_ONCE(1);
+ size = 0;
+ }
+
+ if (size && nspm->uuid) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[0];
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ struct nd_label_id label_id;
+ struct resource *res;
+
+ if (!ndd) {
+ size = 0;
+ goto out;
+ }
+
+ nd_label_gen_id(&label_id, nspm->uuid, 0);
+
+ /* calculate a spa offset from the dpa allocation offset */
+ for_each_dpa_resource(ndd, res)
+ if (strcmp(res->name, label_id.id) == 0) {
+ offset = (res->start - nd_mapping->start)
+ * nd_region->ndr_mappings;
+ goto out;
+ }
+
+ WARN_ON_ONCE(1);
+ size = 0;
+ }
+
+ out:
+ res->start = nd_region->ndr_start + offset;
+ res->end = res->start + size - 1;
}
static bool uuid_not_set(const u8 *uuid, struct device *dev, const char *where)
@@ -930,7 +965,7 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)
if (is_namespace_pmem(dev)) {
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
- nd_namespace_pmem_set_size(nd_region, nspm,
+ nd_namespace_pmem_set_resource(nd_region, nspm,
val * nd_region->ndr_mappings);
} else if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
@@ -1546,6 +1581,7 @@ static int select_pmem_id(struct nd_region *nd_region, u8 *pmem_id)
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
struct nd_namespace_label *nd_label = NULL;
u64 hw_start, hw_end, pmem_start, pmem_end;
struct nd_label_ent *label_ent;
@@ -1573,10 +1609,14 @@ static int select_pmem_id(struct nd_region *nd_region, u8 *pmem_id)
hw_end = hw_start + nd_mapping->size;
pmem_start = __le64_to_cpu(nd_label->dpa);
pmem_end = pmem_start + __le64_to_cpu(nd_label->rawsize);
- if (pmem_start == hw_start && pmem_end <= hw_end)
+ if (pmem_start >= hw_start && pmem_start < hw_end
+ && pmem_end <= hw_end && pmem_end > hw_start)
/* pass */;
- else
+ else {
+ dev_dbg(&nd_region->dev, "%s invalid label for %pUb\n",
+ dev_name(ndd->dev), nd_label->uuid);
return -EINVAL;
+ }
/* move recently validated label to the front of the list */
list_move(&label_ent->list, &nd_mapping->labels);
@@ -1618,6 +1658,7 @@ struct device *create_namespace_pmem(struct nd_region *nd_region,
if (!nspm)
return ERR_PTR(-ENOMEM);
+ nspm->id = -1;
dev = &nspm->nsio.common.dev;
dev->type = &namespace_pmem_device_type;
dev->parent = &nd_region->dev;
@@ -1629,11 +1670,15 @@ struct device *create_namespace_pmem(struct nd_region *nd_region,
if (!has_uuid_at_pos(nd_region, nd_label->uuid, cookie, i))
break;
if (i < nd_region->ndr_mappings) {
+ struct nvdimm_drvdata *ndd = to_ndd(&nd_region->mapping[i]);
+
/*
* Give up if we don't find an instance of a uuid at each
* position (from 0 to nd_region->ndr_mappings - 1), or if we
* find a dimm with two instances of the same uuid.
*/
+ dev_err(&nd_region->dev, "%s missing label for %pUb\n",
+ dev_name(ndd->dev), nd_label->uuid);
rc = -EINVAL;
goto err;
}
@@ -1679,7 +1724,7 @@ struct device *create_namespace_pmem(struct nd_region *nd_region,
goto err;
}
- nd_namespace_pmem_set_size(nd_region, nspm, size);
+ nd_namespace_pmem_set_resource(nd_region, nspm, size);
return dev;
err:
@@ -1961,23 +2006,31 @@ static struct device **scan_labels(struct nd_region *nd_region)
goto err;
dev = &nspm->nsio.common.dev;
dev->type = &namespace_pmem_device_type;
- nd_namespace_pmem_set_size(nd_region, nspm, 0);
+ nd_namespace_pmem_set_resource(nd_region, nspm, 0);
}
dev->parent = &nd_region->dev;
devs[count++] = dev;
} else if (is_nd_pmem(&nd_region->dev)) {
/* clean unselected labels */
for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct list_head *l, *e;
+ LIST_HEAD(list);
+ int j;
+
nd_mapping = &nd_region->mapping[i];
if (list_empty(&nd_mapping->labels)) {
WARN_ON(1);
continue;
}
- label_ent = list_first_entry(&nd_mapping->labels,
- typeof(*label_ent), list);
- list_del(&label_ent->list);
+
+ j = count;
+ list_for_each_safe(l, e, &nd_mapping->labels) {
+ if (!j--)
+ break;
+ list_move_tail(l, &list);
+ }
nd_mapping_free_labels(nd_mapping);
- list_add(&label_ent->list, &nd_mapping->labels);
+ list_splice_init(&list, &nd_mapping->labels);
}
}
@@ -2117,6 +2170,13 @@ int nd_region_register_namespaces(struct nd_region *nd_region, int *err)
id = ida_simple_get(&nd_region->ns_ida, 0, 0,
GFP_KERNEL);
nsblk->id = id;
+ } else if (type == ND_DEVICE_NAMESPACE_PMEM) {
+ struct nd_namespace_pmem *nspm;
+
+ nspm = to_nd_namespace_pmem(dev);
+ id = ida_simple_get(&nd_region->ns_ida, 0, 0,
+ GFP_KERNEL);
+ nspm->id = id;
} else
id = i;
diff --git a/include/linux/nd.h b/include/linux/nd.h
index f1ea426d6a5e..ddcc7788305c 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -77,11 +77,13 @@ struct nd_namespace_io {
* @nsio: device and system physical address range to drive
* @alt_name: namespace name supplied in the dimm label
* @uuid: namespace name supplied in the dimm label
+ * @id: ida allocated id
*/
struct nd_namespace_pmem {
struct nd_namespace_io nsio;
char *alt_name;
u8 *uuid;
+ int id;
};
/**
Now that we have nd_region_available_dpa() able to handle the presence
of multiple PMEM allocations in aliased PMEM regions, reuse that same
infrastructure to track allocations from free space. In particular,
handle allocating from an aliased PMEM region in the case where there
are discontiguous holes. The allocation rules for BLK and PMEM are
documented in the space_valid() helper:
BLK-space is valid as long as it does not precede a PMEM
allocation in a given region. PMEM-space must be contiguous
and adjacent to an existing allocation (if one exists).
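For illustration, the BLK half of that rule reduces to a start-DPA
comparison once the aliased PMEM boundary is known (hypothetical
helper; the series implements this by probing with alias_dpa_busy()):

	/*
	 * Hypothetical sketch: @blk_start is the first DPA safe for BLK,
	 * i.e. the end of the highest aliased PMEM allocation.
	 */
	static bool blk_free_range_valid(struct resource *avail,
			resource_size_t blk_start)
	{
		return avail->start >= blk_start;
	}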
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/dimm_devs.c | 32 ++++++++--
drivers/nvdimm/namespace_devs.c | 128 +++++++++++++++++++++++++++------------
drivers/nvdimm/nd-core.h | 18 +++++
3 files changed, 133 insertions(+), 45 deletions(-)
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index 4b0296ccb375..d614493ad5ac 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -386,13 +386,7 @@ struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
}
EXPORT_SYMBOL_GPL(nvdimm_create);
-struct blk_alloc_info {
- struct nd_mapping *nd_mapping;
- resource_size_t available, busy;
- struct resource *res;
-};
-
-static int alias_dpa_busy(struct device *dev, void *data)
+int alias_dpa_busy(struct device *dev, void *data)
{
resource_size_t map_end, blk_start, new, busy;
struct blk_alloc_info *info = data;
@@ -418,6 +412,20 @@ static int alias_dpa_busy(struct device *dev, void *data)
ndd = to_ndd(nd_mapping);
map_end = nd_mapping->start + nd_mapping->size - 1;
blk_start = nd_mapping->start;
+
+ /*
+ * In the allocation case ->res is set to free space that we are
+ * looking to validate against PMEM aliasing collision rules
+ * (i.e. BLK is allocated after all aliased PMEM).
+ */
+ if (info->res) {
+ if (info->res->start >= nd_mapping->start
+ && info->res->start < map_end)
+ /* pass */;
+ else
+ return 0;
+ }
+
retry:
/*
* Find the free dpa from the end of the last pmem allocation to
@@ -447,7 +455,16 @@ static int alias_dpa_busy(struct device *dev, void *data)
}
}
+ /* update the free space range with the probed blk_start */
+ if (info->res && blk_start > info->res->start) {
+ info->res->start = max(info->res->start, blk_start);
+ if (info->res->start > info->res->end)
+ info->res->end = info->res->start - 1;
+ return 1;
+ }
+
info->available -= blk_start - nd_mapping->start + busy;
+
return 0;
}
@@ -508,6 +525,7 @@ resource_size_t nd_blk_available_dpa(struct nd_region *nd_region)
struct blk_alloc_info info = {
.nd_mapping = nd_mapping,
.available = nd_mapping->size,
+ .res = NULL,
};
struct resource *res;
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 132c5b8b5366..81451c74b01c 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -529,19 +529,68 @@ static resource_size_t init_dpa_allocation(struct nd_label_id *label_id,
return rc ? n : 0;
}
-static bool space_valid(bool is_pmem, bool is_reserve,
- struct nd_label_id *label_id, struct resource *res)
+
+/**
+ * space_valid() - validate free dpa space against constraints
+ * @nd_region: hosting region of the free space
+ * @ndd: dimm device data for debug
+ * @label_id: namespace id to allocate space
+ * @prev: potential allocation that precedes free space
+ * @next: allocation that follows the given free space range
+ * @exist: first allocation with same id in the mapping
+ * @n: range that must be satisfied for pmem allocations
+ * @valid: free space range to validate
+ *
+ * BLK-space is valid as long as it does not precede a PMEM
+ * allocation in a given region. PMEM-space must be contiguous
+ * and adjacent to an existing allocation (if one
+ * exists). If reserving PMEM, any space is valid.
+ */
+static void space_valid(struct nd_region *nd_region, struct nvdimm_drvdata *ndd,
+ struct nd_label_id *label_id, struct resource *prev,
+ struct resource *next, struct resource *exist,
+ resource_size_t n, struct resource *valid)
{
- /*
- * For BLK-space any space is valid, for PMEM-space, it must be
- * contiguous with an existing allocation unless we are
- * reserving pmem.
- */
- if (is_reserve || !is_pmem)
- return true;
- if (!res || strcmp(res->name, label_id->id) == 0)
- return true;
- return false;
+ bool is_reserve = strcmp(label_id->id, "pmem-reserve") == 0;
+ bool is_pmem = strncmp(label_id->id, "pmem", 4) == 0;
+
+ if (valid->start >= valid->end)
+ goto invalid;
+
+ if (is_reserve)
+ return;
+
+ if (!is_pmem) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[0];
+ struct nvdimm_bus *nvdimm_bus;
+ struct blk_alloc_info info = {
+ .nd_mapping = nd_mapping,
+ .available = nd_mapping->size,
+ .res = valid,
+ };
+
+ WARN_ON(!is_nd_blk(&nd_region->dev));
+ nvdimm_bus = walk_to_nvdimm_bus(&nd_region->dev);
+ device_for_each_child(&nvdimm_bus->dev, &info, alias_dpa_busy);
+ return;
+ }
+
+ /* allocation needs to be contiguous, so this is all or nothing */
+ if (resource_size(valid) < n)
+ goto invalid;
+
+ /* we've got all the space we need and no existing allocation */
+ if (!exist)
+ return;
+
+ /* allocation needs to be contiguous with the existing namespace */
+ if (valid->start == exist->end + 1
+ || valid->end == exist->start - 1)
+ return;
+
+ invalid:
+ /* truncate @valid size to 0 */
+ valid->end = valid->start - 1;
}
enum alloc_loc {
@@ -553,18 +602,24 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
resource_size_t n)
{
resource_size_t mapping_end = nd_mapping->start + nd_mapping->size - 1;
- bool is_reserve = strcmp(label_id->id, "pmem-reserve") == 0;
bool is_pmem = strncmp(label_id->id, "pmem", 4) == 0;
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ struct resource *res, *exist = NULL, valid;
const resource_size_t to_allocate = n;
- struct resource *res;
int first;
+ for_each_dpa_resource(ndd, res)
+ if (strcmp(label_id->id, res->name) == 0)
+ exist = res;
+
+ valid.start = nd_mapping->start;
+ valid.end = mapping_end;
+ valid.name = "free space";
retry:
first = 0;
for_each_dpa_resource(ndd, res) {
- resource_size_t allocate, available = 0, free_start, free_end;
struct resource *next = res->sibling, *new_res = NULL;
+ resource_size_t allocate, available = 0;
enum alloc_loc loc = ALLOC_ERR;
const char *action;
int rc = 0;
@@ -577,32 +632,35 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
/* space at the beginning of the mapping */
if (!first++ && res->start > nd_mapping->start) {
- free_start = nd_mapping->start;
- available = res->start - free_start;
- if (space_valid(is_pmem, is_reserve, label_id, NULL))
+ valid.start = nd_mapping->start;
+ valid.end = res->start - 1;
+ space_valid(nd_region, ndd, label_id, NULL, next, exist,
+ to_allocate, &valid);
+ available = resource_size(&valid);
+ if (available)
loc = ALLOC_BEFORE;
}
/* space between allocations */
if (!loc && next) {
- free_start = res->start + resource_size(res);
- free_end = min(mapping_end, next->start - 1);
- if (space_valid(is_pmem, is_reserve, label_id, res)
- && free_start < free_end) {
- available = free_end + 1 - free_start;
+ valid.start = res->start + resource_size(res);
+ valid.end = min(mapping_end, next->start - 1);
+ space_valid(nd_region, ndd, label_id, res, next, exist,
+ to_allocate, &valid);
+ available = resource_size(&valid);
+ if (available)
loc = ALLOC_MID;
- }
}
/* space at the end of the mapping */
if (!loc && !next) {
- free_start = res->start + resource_size(res);
- free_end = mapping_end;
- if (space_valid(is_pmem, is_reserve, label_id, res)
- && free_start < free_end) {
- available = free_end + 1 - free_start;
+ valid.start = res->start + resource_size(res);
+ valid.end = mapping_end;
+ space_valid(nd_region, ndd, label_id, res, next, exist,
+ to_allocate, &valid);
+ available = resource_size(&valid);
+ if (available)
loc = ALLOC_AFTER;
- }
}
if (!loc || !available)
@@ -612,8 +670,6 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
case ALLOC_BEFORE:
if (strcmp(res->name, label_id->id) == 0) {
/* adjust current resource up */
- if (is_pmem && !is_reserve)
- return n;
rc = adjust_resource(res, res->start - allocate,
resource_size(res) + allocate);
action = "cur grow up";
@@ -623,8 +679,6 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
case ALLOC_MID:
if (strcmp(next->name, label_id->id) == 0) {
/* adjust next resource up */
- if (is_pmem && !is_reserve)
- return n;
rc = adjust_resource(next, next->start
- allocate, resource_size(next)
+ allocate);
@@ -648,12 +702,10 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
if (strcmp(action, "allocate") == 0) {
/* BLK allocate bottom up */
if (!is_pmem)
- free_start += available - allocate;
- else if (!is_reserve && free_start != nd_mapping->start)
- return n;
+ valid.start += available - allocate;
new_res = nvdimm_allocate_dpa(ndd, label_id,
- free_start, allocate);
+ valid.start, allocate);
if (!new_res)
rc = -EBUSY;
} else if (strcmp(action, "grow down") == 0) {
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 7c2196a1d56f..3ba0b96ce7de 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -44,6 +44,23 @@ struct nvdimm {
struct resource *flush_wpq;
};
+/**
+ * struct blk_alloc_info - tracking info for BLK dpa scanning
+ * @nd_mapping: blk region mapping boundaries
+ * @available: decremented in alias_dpa_busy as aliased PMEM is scanned
+ * @busy: decremented in blk_dpa_busy to account for ranges already
+ * handled by alias_dpa_busy
+ * @res: alias_dpa_busy interprets this as a free space range that needs to
+ * be truncated to the valid BLK allocation starting DPA; blk_dpa_busy
+ * treats it as a busy range that needs the aliased PMEM ranges
+ * truncated.
+ */
+struct blk_alloc_info {
+ struct nd_mapping *nd_mapping;
+ resource_size_t available, busy;
+ struct resource *res;
+};
+
bool is_nvdimm(struct device *dev);
bool is_nd_pmem(struct device *dev);
bool is_nd_blk(struct device *dev);
@@ -80,6 +97,7 @@ resource_size_t nd_blk_available_dpa(struct nd_region *nd_region);
resource_size_t nd_region_available_dpa(struct nd_region *nd_region);
resource_size_t nvdimm_allocated_dpa(struct nvdimm_drvdata *ndd,
struct nd_label_id *label_id);
+int alias_dpa_busy(struct device *dev, void *data);
struct resource *nsblk_add_resource(struct nd_region *nd_region,
struct nvdimm_drvdata *ndd, struct nd_namespace_blk *nsblk,
resource_size_t start);
Instead of assuming that there will only ever be one allocated range at
the start of the region, account for additional namespaces that might
start at an offset from the region base.
After this change, pmem namespaces have a reason to carry an array of
resources similar to blk. Unifying the resource tracking infrastructure
in nd_namespace_common is a future cleanup candidate.
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/label.c | 72 +++++++++++++++++++++++++++++++++++-------------
1 file changed, 53 insertions(+), 19 deletions(-)
diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
index c37357210428..fac7cabe8f56 100644
--- a/drivers/nvdimm/label.c
+++ b/drivers/nvdimm/label.c
@@ -494,12 +494,13 @@ static int __pmem_label_update(struct nd_region *nd_region,
struct nd_mapping *nd_mapping, struct nd_namespace_pmem *nspm,
int pos)
{
- u64 cookie = nd_region_interleave_set_cookie(nd_region), rawsize;
+ u64 cookie = nd_region_interleave_set_cookie(nd_region);
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
- struct nd_namespace_label *victim_label;
+ struct nd_label_ent *label_ent, *victim = NULL;
struct nd_namespace_label *nd_label;
struct nd_namespace_index *nsindex;
- struct nd_label_ent *label_ent;
+ struct nd_label_id label_id;
+ struct resource *res;
unsigned long *free;
u32 nslot, slot;
size_t offset;
@@ -508,6 +509,16 @@ static int __pmem_label_update(struct nd_region *nd_region,
if (!preamble_next(ndd, &nsindex, &free, &nslot))
return -ENXIO;
+ nd_label_gen_id(&label_id, nspm->uuid, 0);
+ for_each_dpa_resource(ndd, res)
+ if (strcmp(res->name, label_id.id) == 0)
+ break;
+
+ if (!res) {
+ WARN_ON_ONCE(1);
+ return -ENXIO;
+ }
+
/* allocate and write the label to the staging (next) index */
slot = nd_label_alloc_slot(ndd);
if (slot == UINT_MAX)
@@ -523,11 +534,10 @@ static int __pmem_label_update(struct nd_region *nd_region,
nd_label->nlabel = __cpu_to_le16(nd_region->ndr_mappings);
nd_label->position = __cpu_to_le16(pos);
nd_label->isetcookie = __cpu_to_le64(cookie);
- rawsize = div_u64(resource_size(&nspm->nsio.res),
- nd_region->ndr_mappings);
- nd_label->rawsize = __cpu_to_le64(rawsize);
- nd_label->dpa = __cpu_to_le64(nd_mapping->start);
+ nd_label->rawsize = __cpu_to_le64(resource_size(res));
+ nd_label->dpa = __cpu_to_le64(res->start);
nd_label->slot = __cpu_to_le32(slot);
+ nd_dbg_dpa(nd_region, ndd, res, "%s\n", __func__);
/* update label */
offset = nd_label_offset(ndd, nd_label);
@@ -538,22 +548,39 @@ static int __pmem_label_update(struct nd_region *nd_region,
/* Garbage collect the previous label */
mutex_lock(&nd_mapping->lock);
- label_ent = list_first_entry_or_null(&nd_mapping->labels,
- typeof(*label_ent), list);
- WARN_ON(!label_ent);
- victim_label = label_ent ? label_ent->label : NULL;
- if (victim_label) {
- label_ent->label = NULL;
- slot = to_slot(ndd, victim_label);
- nd_label_free_slot(ndd, slot);
+ list_for_each_entry(label_ent, &nd_mapping->labels, list) {
+ if (!label_ent->label)
+ continue;
+ if (memcmp(nspm->uuid, label_ent->label->uuid,
+ NSLABEL_UUID_LEN) != 0)
+ continue;
+ victim = label_ent;
+ list_move_tail(&victim->list, &nd_mapping->labels);
+ break;
+ }
+ if (victim) {
dev_dbg(ndd->dev, "%s: free: %d\n", __func__, slot);
+ slot = to_slot(ndd, victim->label);
+ nd_label_free_slot(ndd, slot);
+ victim->label = NULL;
}
/* update index */
rc = nd_label_write_index(ndd, ndd->ns_next,
nd_inc_seq(__le32_to_cpu(nsindex->seq)), 0);
- if (rc == 0 && label_ent)
- label_ent->label = nd_label;
+ if (rc == 0) {
+ list_for_each_entry(label_ent, &nd_mapping->labels, list)
+ if (!label_ent->label) {
+ label_ent->label = nd_label;
+ nd_label = NULL;
+ break;
+ }
+ dev_WARN_ONCE(&nspm->nsio.common.dev, nd_label,
+ "failed to track label: %d\n",
+ to_slot(ndd, nd_label));
+ if (nd_label)
+ rc = -ENXIO;
+ }
mutex_unlock(&nd_mapping->lock);
return rc;
@@ -899,7 +926,9 @@ int nd_pmem_namespace_label_update(struct nd_region *nd_region,
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
- int rc;
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ struct resource *res;
+ int rc, count = 0;
if (size == 0) {
rc = del_labels(nd_mapping, nspm->uuid);
@@ -908,7 +937,12 @@ int nd_pmem_namespace_label_update(struct nd_region *nd_region,
continue;
}
- rc = init_labels(nd_mapping, 1);
+ for_each_dpa_resource(ndd, res)
+ if (strncmp(res->name, "pmem", 4) == 0)
+ count++;
+ WARN_ON_ONCE(!count);
+
+ rc = init_labels(nd_mapping, count);
if (rc < 0)
return rc;
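For reference, the strcmp() / strncmp() lookups above work because every
dpa resource is named after its label id: "pmem-<uuid>" for interleaved
capacity and "blk-<uuid>" for local capacity (see nd_label_gen_id()). A
standalone userspace sketch of that convention, with a made-up uuid and
plain string formatting standing in for the kernel's %pUb specifier:

#include <stdio.h>
#include <string.h>

int main(void)
{
	/* illustrative uuid only; the kernel formats the raw 16 bytes */
	const char *uuid = "12345678-1234-1234-1234-123456789abc";
	char id[64];

	snprintf(id, sizeof(id), "pmem-%s", uuid);
	printf("%s\n", id);				/* pmem-12345678-... */
	printf("match: %d\n", strncmp(id, "pmem", 4));	/* 0 == type match */
	return 0;
}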
Short-circuit doomed-to-fail label validation attempts by skipping
labels that are outside the given region. For example, a DIMM that has
multiple PMEM regions will waste time attempting to create namespaces
only to find that the interleave-set-cookie does not validate, e.g.:
nd_region region6: invalid cookie in label: 73e608dc-47b9-4b2a-b5c7-2d55a32e0c2
Similar to how we skip BLK labels when performing PMEM validation, we can
skip out-of-range labels early.
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/namespace_devs.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 81451c74b01c..54babc3a80ca 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -2013,10 +2013,11 @@ static int cmp_dpa(const void *a, const void *b)
static struct device **scan_labels(struct nd_region *nd_region)
{
- struct nd_mapping *nd_mapping = &nd_region->mapping[0];
+ int i, count = 0;
struct device *dev, **devs = NULL;
struct nd_label_ent *label_ent, *e;
- int i, count = 0;
+ struct nd_mapping *nd_mapping = &nd_region->mapping[0];
+ resource_size_t map_end = nd_mapping->start + nd_mapping->size - 1;
/* "safe" because create_namespace_pmem() might list_move() label_ent */
list_for_each_entry_safe(label_ent, e, &nd_mapping->labels, list) {
@@ -2033,6 +2034,10 @@ static struct device **scan_labels(struct nd_region *nd_region)
else
continue;
+ /* skip labels that describe extents outside of the region */
+ if (__le64_to_cpu(nd_label->dpa) < nd_mapping->start
+ || __le64_to_cpu(nd_label->dpa) > map_end)
+ continue;
+
i = add_namespace_resource(nd_region, nd_label, devs, count);
if (i < 0)
goto err;
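The new check is a simple containment test on the label's starting dpa.
A standalone sketch with illustrative numbers:

#include <assert.h>
#include <stdbool.h>

/* sketch: does a label's dpa fall inside this dimm's region mapping? */
static bool label_in_mapping(unsigned long long dpa,
		unsigned long long map_start, unsigned long long map_size)
{
	unsigned long long map_end = map_start + map_size - 1;

	return dpa >= map_start && dpa <= map_end;
}

int main(void)
{
	/* mapping covers dpa [0x1000, 0x1fff] */
	assert(label_in_mapping(0x1000, 0x1000, 0x1000));
	assert(!label_in_mapping(0x2000, 0x1000, 0x1000));
	return 0;
}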
Update nfit_test to handle multiple sub-allocations within a given pmem
region. The mock resource now tracks and un-tracks sub-ranges as they
are requested and released (either explicitly or via devm callback).
Signed-off-by: Dan Williams <[email protected]>
---
tools/testing/nvdimm/test/iomap.c | 134 ++++++++++++++++++++++++++-------
tools/testing/nvdimm/test/nfit.c | 21 ++---
tools/testing/nvdimm/test/nfit_test.h | 12 +++
3 files changed, 124 insertions(+), 43 deletions(-)
diff --git a/tools/testing/nvdimm/test/iomap.c b/tools/testing/nvdimm/test/iomap.c
index dae5b9b6d186..3ccef732fce9 100644
--- a/tools/testing/nvdimm/test/iomap.c
+++ b/tools/testing/nvdimm/test/iomap.c
@@ -74,7 +74,7 @@ void __iomem *__nfit_test_ioremap(resource_size_t offset, unsigned long size,
if (nfit_res)
return (void __iomem *) nfit_res->buf + offset
- - nfit_res->res->start;
+ - nfit_res->res.start;
return fallback_fn(offset, size);
}
@@ -85,7 +85,7 @@ void __iomem *__wrap_devm_ioremap_nocache(struct device *dev,
if (nfit_res)
return (void __iomem *) nfit_res->buf + offset
- - nfit_res->res->start;
+ - nfit_res->res.start;
return devm_ioremap_nocache(dev, offset, size);
}
EXPORT_SYMBOL(__wrap_devm_ioremap_nocache);
@@ -96,7 +96,7 @@ void *__wrap_devm_memremap(struct device *dev, resource_size_t offset,
struct nfit_test_resource *nfit_res = get_nfit_res(offset);
if (nfit_res)
- return nfit_res->buf + offset - nfit_res->res->start;
+ return nfit_res->buf + offset - nfit_res->res.start;
return devm_memremap(dev, offset, size, flags);
}
EXPORT_SYMBOL(__wrap_devm_memremap);
@@ -108,7 +108,7 @@ void *__wrap_devm_memremap_pages(struct device *dev, struct resource *res,
struct nfit_test_resource *nfit_res = get_nfit_res(offset);
if (nfit_res)
- return nfit_res->buf + offset - nfit_res->res->start;
+ return nfit_res->buf + offset - nfit_res->res.start;
return devm_memremap_pages(dev, res, ref, altmap);
}
EXPORT_SYMBOL(__wrap_devm_memremap_pages);
@@ -129,7 +129,7 @@ void *__wrap_memremap(resource_size_t offset, size_t size,
struct nfit_test_resource *nfit_res = get_nfit_res(offset);
if (nfit_res)
- return nfit_res->buf + offset - nfit_res->res->start;
+ return nfit_res->buf + offset - nfit_res->res.start;
return memremap(offset, size, flags);
}
EXPORT_SYMBOL(__wrap_memremap);
@@ -175,6 +175,63 @@ void __wrap_memunmap(void *addr)
}
EXPORT_SYMBOL(__wrap_memunmap);
+static bool nfit_test_release_region(struct device *dev,
+ struct resource *parent, resource_size_t start,
+ resource_size_t n);
+
+static void nfit_devres_release(struct device *dev, void *data)
+{
+ struct resource *res = *((struct resource **) data);
+
+ WARN_ON(!nfit_test_release_region(NULL, &iomem_resource, res->start,
+ resource_size(res)));
+}
+
+static int match(struct device *dev, void *__res, void *match_data)
+{
+ struct resource *res = *((struct resource **) __res);
+ resource_size_t start = *((resource_size_t *) match_data);
+
+ return res->start == start;
+}
+
+static bool nfit_test_release_region(struct device *dev,
+ struct resource *parent, resource_size_t start,
+ resource_size_t n)
+{
+ if (parent == &iomem_resource) {
+ struct nfit_test_resource *nfit_res = get_nfit_res(start);
+
+ if (nfit_res) {
+ struct nfit_test_request *req;
+ struct resource *res = NULL;
+
+ if (dev) {
+ devres_release(dev, nfit_devres_release, match,
+ &start);
+ return true;
+ }
+
+ spin_lock(&nfit_res->lock);
+ list_for_each_entry(req, &nfit_res->requests, list)
+ if (req->res.start == start) {
+ res = &req->res;
+ list_del(&req->list);
+ break;
+ }
+ spin_unlock(&nfit_res->lock);
+
+ WARN(!res || resource_size(res) != n,
+ "%s: start: %llx n: %llx mismatch: %pr\n",
+ __func__, start, n, res);
+ if (res)
+ kfree(req);
+ return true;
+ }
+ }
+ return false;
+}
+
static struct resource *nfit_test_request_region(struct device *dev,
struct resource *parent, resource_size_t start,
resource_size_t n, const char *name, int flags)
@@ -184,21 +241,57 @@ static struct resource *nfit_test_request_region(struct device *dev,
if (parent == &iomem_resource) {
nfit_res = get_nfit_res(start);
if (nfit_res) {
- struct resource *res = nfit_res->res + 1;
+ struct nfit_test_request *req;
+ struct resource *res = NULL;
- if (start + n > nfit_res->res->start
- + resource_size(nfit_res->res)) {
+ if (start + n > nfit_res->res.start
+ + resource_size(&nfit_res->res)) {
pr_debug("%s: start: %llx n: %llx overflow: %pr\n",
__func__, start, n,
- nfit_res->res);
+ &nfit_res->res);
+ return NULL;
+ }
+
+ spin_lock(&nfit_res->lock);
+ list_for_each_entry(req, &nfit_res->requests, list)
+ if (start == req->res.start) {
+ res = &req->res;
+ break;
+ }
+ spin_unlock(&nfit_res->lock);
+
+ if (res) {
+ WARN(1, "%pr already busy\n", res);
return NULL;
}
+ req = kzalloc(sizeof(*req), GFP_KERNEL);
+ if (!req)
+ return NULL;
+ INIT_LIST_HEAD(&req->list);
+ res = &req->res;
+
res->start = start;
res->end = start + n - 1;
res->name = name;
res->flags = resource_type(parent);
res->flags |= IORESOURCE_BUSY | flags;
+ spin_lock(&nfit_res->lock);
+ list_add(&req->list, &nfit_res->requests);
+ spin_unlock(&nfit_res->lock);
+
+ if (dev) {
+ struct resource **d;
+
+ d = devres_alloc(nfit_devres_release,
+ sizeof(struct resource *),
+ GFP_KERNEL);
+ if (!d)
+ return NULL;
+ *d = res;
+ devres_add(dev, d);
+ }
+
pr_debug("%s: %pr\n", __func__, res);
return res;
}
@@ -242,29 +335,10 @@ struct resource *__wrap___devm_request_region(struct device *dev,
}
EXPORT_SYMBOL(__wrap___devm_request_region);
-static bool nfit_test_release_region(struct resource *parent,
- resource_size_t start, resource_size_t n)
-{
- if (parent == &iomem_resource) {
- struct nfit_test_resource *nfit_res = get_nfit_res(start);
- if (nfit_res) {
- struct resource *res = nfit_res->res + 1;
-
- if (start != res->start || resource_size(res) != n)
- pr_info("%s: start: %llx n: %llx mismatch: %pr\n",
- __func__, start, n, res);
- else
- memset(res, 0, sizeof(*res));
- return true;
- }
- }
- return false;
-}
-
void __wrap___release_region(struct resource *parent, resource_size_t start,
resource_size_t n)
{
- if (!nfit_test_release_region(parent, start, n))
+ if (!nfit_test_release_region(NULL, parent, start, n))
__release_region(parent, start, n);
}
EXPORT_SYMBOL(__wrap___release_region);
@@ -272,7 +346,7 @@ EXPORT_SYMBOL(__wrap___release_region);
void __wrap___devm_release_region(struct device *dev, struct resource *parent,
resource_size_t start, resource_size_t n)
{
- if (!nfit_test_release_region(parent, start, n))
+ if (!nfit_test_release_region(dev, parent, start, n))
__devm_release_region(dev, parent, start, n);
}
EXPORT_SYMBOL(__wrap___devm_release_region);
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index 175fc24f8f3a..0e721c6fb1cf 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -478,14 +478,12 @@ static struct nfit_test *instances[NUM_NFITS];
static void release_nfit_res(void *data)
{
struct nfit_test_resource *nfit_res = data;
- struct resource *res = nfit_res->res;
spin_lock(&nfit_test_lock);
list_del(&nfit_res->list);
spin_unlock(&nfit_test_lock);
vfree(nfit_res->buf);
- kfree(res);
kfree(nfit_res);
}
@@ -493,12 +491,11 @@ static void *__test_alloc(struct nfit_test *t, size_t size, dma_addr_t *dma,
void *buf)
{
struct device *dev = &t->pdev.dev;
- struct resource *res = kzalloc(sizeof(*res) * 2, GFP_KERNEL);
struct nfit_test_resource *nfit_res = kzalloc(sizeof(*nfit_res),
GFP_KERNEL);
int rc;
- if (!res || !buf || !nfit_res)
+ if (!buf || !nfit_res)
goto err;
rc = devm_add_action(dev, release_nfit_res, nfit_res);
if (rc)
@@ -507,10 +504,11 @@ static void *__test_alloc(struct nfit_test *t, size_t size, dma_addr_t *dma,
memset(buf, 0, size);
nfit_res->dev = dev;
nfit_res->buf = buf;
- nfit_res->res = res;
- res->start = *dma;
- res->end = *dma + size - 1;
- res->name = "NFIT";
+ nfit_res->res.start = *dma;
+ nfit_res->res.end = *dma + size - 1;
+ nfit_res->res.name = "NFIT";
+ spin_lock_init(&nfit_res->lock);
+ INIT_LIST_HEAD(&nfit_res->requests);
spin_lock(&nfit_test_lock);
list_add(&nfit_res->list, &t->resources);
spin_unlock(&nfit_test_lock);
@@ -519,7 +517,6 @@ static void *__test_alloc(struct nfit_test *t, size_t size, dma_addr_t *dma,
err:
if (buf)
vfree(buf);
- kfree(res);
kfree(nfit_res);
return NULL;
}
@@ -544,13 +541,13 @@ static struct nfit_test_resource *nfit_test_lookup(resource_size_t addr)
continue;
spin_lock(&nfit_test_lock);
list_for_each_entry(n, &t->resources, list) {
- if (addr >= n->res->start && (addr < n->res->start
- + resource_size(n->res))) {
+ if (addr >= n->res.start && (addr < n->res.start
+ + resource_size(&n->res))) {
nfit_res = n;
break;
} else if (addr >= (unsigned long) n->buf
&& (addr < (unsigned long) n->buf
- + resource_size(n->res))) {
+ + resource_size(&n->res))) {
nfit_res = n;
break;
}
diff --git a/tools/testing/nvdimm/test/nfit_test.h b/tools/testing/nvdimm/test/nfit_test.h
index 9f18e2a4a862..c281dd2e5e2d 100644
--- a/tools/testing/nvdimm/test/nfit_test.h
+++ b/tools/testing/nvdimm/test/nfit_test.h
@@ -13,11 +13,21 @@
#ifndef __NFIT_TEST_H__
#define __NFIT_TEST_H__
#include <linux/list.h>
+#include <linux/ioport.h>
+#include <linux/spinlock_types.h>
+
+struct nfit_test_request {
+ struct list_head list;
+ struct resource res;
+};
struct nfit_test_resource {
+ struct list_head requests;
struct list_head list;
- struct resource *res;
+ struct resource res;
struct device *dev;
+ spinlock_t lock;
+ int req_count;
void *buf;
};
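The automatic un-tracking above leans on the kernel's devres API. A
minimal kernel-style sketch of the pattern, mirroring the
nfit_devres_release() / match() pair in the iomap.c hunk (the
example_* function names here are illustrative, not from the patch):

#include <linux/device.h>
#include <linux/ioport.h>
#include <linux/slab.h>

/* called automatically for each tracked entry at device teardown */
static void example_release(struct device *dev, void *data)
{
	struct resource *res = *(struct resource **) data;

	pr_debug("releasing %pr\n", res);
}

static int example_match(struct device *dev, void *data, void *match_data)
{
	struct resource *res = *(struct resource **) data;

	return res->start == *(resource_size_t *) match_data;
}

static int example_track(struct device *dev, struct resource *res)
{
	struct resource **d;

	d = devres_alloc(example_release, sizeof(*d), GFP_KERNEL);
	if (!d)
		return -ENOMEM;
	*d = res;
	devres_add(dev, d);
	return 0;
}

static void example_untrack(struct device *dev, resource_size_t start)
{
	/* runs example_release() on the first example_match() hit */
	devres_release(dev, example_release, example_match, &start);
}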
Similar to BLK regions, publish new seed namespace devices to allow
unused PMEM region capacity to be consumed by additional namespaces.
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/namespace_devs.c | 48 +++++++++++++++++++++++++++++++++++++--
drivers/nvdimm/nd-core.h | 2 +-
drivers/nvdimm/region_devs.c | 18 +++++++++++----
3 files changed, 59 insertions(+), 9 deletions(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index fa51d751ccf7..3509cff68ef9 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1860,16 +1860,58 @@ static struct device *nd_namespace_blk_create(struct nd_region *nd_region)
return &nsblk->common.dev;
}
-void nd_region_create_blk_seed(struct nd_region *nd_region)
+static struct device *nd_namespace_pmem_create(struct nd_region *nd_region)
+{
+ struct nd_namespace_pmem *nspm;
+ struct resource *res;
+ struct device *dev;
+
+ if (!is_nd_pmem(&nd_region->dev))
+ return NULL;
+
+ nspm = kzalloc(sizeof(*nspm), GFP_KERNEL);
+ if (!nspm)
+ return NULL;
+
+ dev = &nspm->nsio.common.dev;
+ dev->type = &namespace_pmem_device_type;
+ dev->parent = &nd_region->dev;
+ res = &nspm->nsio.res;
+ res->name = dev_name(&nd_region->dev);
+ res->flags = IORESOURCE_MEM;
+
+ nspm->id = ida_simple_get(&nd_region->ns_ida, 0, 0, GFP_KERNEL);
+ if (nspm->id < 0) {
+ kfree(nspm);
+ return NULL;
+ }
+ dev_set_name(dev, "namespace%d.%d", nd_region->id, nspm->id);
+ dev->parent = &nd_region->dev;
+ dev->groups = nd_namespace_attribute_groups;
+ nd_namespace_pmem_set_resource(nd_region, nspm, 0);
+
+ return dev;
+}
+
+void nd_region_create_ns_seed(struct nd_region *nd_region)
{
WARN_ON(!is_nvdimm_bus_locked(&nd_region->dev));
- nd_region->ns_seed = nd_namespace_blk_create(nd_region);
+
+ if (nd_region_to_nstype(nd_region) == ND_DEVICE_NAMESPACE_IO)
+ return;
+
+ if (is_nd_blk(&nd_region->dev))
+ nd_region->ns_seed = nd_namespace_blk_create(nd_region);
+ else
+ nd_region->ns_seed = nd_namespace_pmem_create(nd_region);
+
/*
* Seed creation failures are not fatal, provisioning is simply
* disabled until memory becomes available
*/
if (!nd_region->ns_seed)
- dev_err(&nd_region->dev, "failed to create blk namespace\n");
+ dev_err(&nd_region->dev, "failed to create %s namespace\n",
+ is_nd_blk(&nd_region->dev) ? "blk" : "pmem");
else
nd_device_register(nd_region->ns_seed);
}
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 3ba0b96ce7de..8623e57c2ce3 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -71,7 +71,7 @@ void nvdimm_devs_exit(void);
void nd_region_devs_exit(void);
void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device *dev);
struct nd_region;
-void nd_region_create_blk_seed(struct nd_region *nd_region);
+void nd_region_create_ns_seed(struct nd_region *nd_region);
void nd_region_create_btt_seed(struct nd_region *nd_region);
void nd_region_create_pfn_seed(struct nd_region *nd_region);
void nd_region_create_dax_seed(struct nd_region *nd_region);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 3ac534aec60c..4f74e009b135 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -530,11 +530,12 @@ static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
if (is_nd_pmem(dev))
return;
}
- if (dev->parent && is_nd_blk(dev->parent) && probe) {
+ if (dev->parent && (is_nd_blk(dev->parent) || is_nd_pmem(dev->parent))
+ && probe) {
nd_region = to_nd_region(dev->parent);
nvdimm_bus_lock(dev);
if (nd_region->ns_seed == dev)
- nd_region_create_blk_seed(nd_region);
+ nd_region_create_ns_seed(nd_region);
nvdimm_bus_unlock(dev);
}
if (is_nd_btt(dev) && probe) {
@@ -544,23 +545,30 @@ static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
nvdimm_bus_lock(dev);
if (nd_region->btt_seed == dev)
nd_region_create_btt_seed(nd_region);
- if (nd_region->ns_seed == &nd_btt->ndns->dev &&
- is_nd_blk(dev->parent))
- nd_region_create_blk_seed(nd_region);
+ if (nd_region->ns_seed == &nd_btt->ndns->dev)
+ nd_region_create_ns_seed(nd_region);
nvdimm_bus_unlock(dev);
}
if (is_nd_pfn(dev) && probe) {
+ struct nd_pfn *nd_pfn = to_nd_pfn(dev);
+
nd_region = to_nd_region(dev->parent);
nvdimm_bus_lock(dev);
if (nd_region->pfn_seed == dev)
nd_region_create_pfn_seed(nd_region);
+ if (nd_region->ns_seed == &nd_pfn->ndns->dev)
+ nd_region_create_ns_seed(nd_region);
nvdimm_bus_unlock(dev);
}
if (is_nd_dax(dev) && probe) {
+ struct nd_dax *nd_dax = to_nd_dax(dev);
+
nd_region = to_nd_region(dev->parent);
nvdimm_bus_lock(dev);
if (nd_region->dax_seed == dev)
nd_region_create_dax_seed(nd_region);
+ if (nd_region->ns_seed == &nd_dax->nd_pfn.ndns->dev)
+ nd_region_create_ns_seed(nd_region);
nvdimm_bus_unlock(dev);
}
}
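Reduced to its essence, the seed idiom in the hunks above is: whenever
the device currently published as the region's empty seed gets claimed
by a probe, immediately publish a fresh one so the region always
advertises room for the next namespace. A sketch, using the helpers
introduced by this patch:

/* sketch only: the real logic lives in nd_region_notify_driver_action() */
static void example_on_probe(struct nd_region *nd_region, struct device *dev)
{
	nvdimm_bus_lock(dev);
	if (nd_region->ns_seed == dev)
		nd_region_create_ns_seed(nd_region);	/* replace the seed */
	nvdimm_bus_unlock(dev);
}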
Now that the rest of the infrastructure has been converted to handle
multi-pmem configurations, lift the artificial barrier at scan time.
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/namespace_devs.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 54babc3a80ca..fa51d751ccf7 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -2070,9 +2070,6 @@ static struct device **scan_labels(struct nd_region *nd_region)
}
} else
devs[count++] = dev;
-
- /* we only expect one valid pmem label set per region */
- break;
}
}
The free dpa (dimm-physical-address) space calculation reports how much
free space is available with consideration for aliased BLK + PMEM
regions. Recall that BLK capacity is allocated from high addresses and
PMEM is allocated from low addresses in their respective regions.
nd_region_available_dpa() accounts for the fact that the largest
encroachment (lowest starting address) into PMEM capacity by a BLK
allocation limits the available capacity to that point, regardless of
whether there is a BLK allocation hole at a higher address. Similarly, for the
multi-pmem case we need to track the largest encroachment (highest
ending address) of a PMEM allocation in BLK capacity regardless of
whether there is an allocation hole that a BLK allocation could fill at
a lower address.
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/dimm_devs.c | 174 +++++++++++++++++++++++++++++++++---------
drivers/nvdimm/nd-core.h | 2
drivers/nvdimm/region_devs.c | 5 -
3 files changed, 139 insertions(+), 42 deletions(-)
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index cf36470e94c0..4b0296ccb375 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -386,40 +386,148 @@ struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
}
EXPORT_SYMBOL_GPL(nvdimm_create);
+struct blk_alloc_info {
+ struct nd_mapping *nd_mapping;
+ resource_size_t available, busy;
+ struct resource *res;
+};
+
+static int alias_dpa_busy(struct device *dev, void *data)
+{
+ resource_size_t map_end, blk_start, new, busy;
+ struct blk_alloc_info *info = data;
+ struct nd_mapping *nd_mapping;
+ struct nd_region *nd_region;
+ struct nvdimm_drvdata *ndd;
+ struct resource *res;
+ int i;
+
+ if (!is_nd_pmem(dev))
+ return 0;
+
+ nd_region = to_nd_region(dev);
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ nd_mapping = &nd_region->mapping[i];
+ if (nd_mapping->nvdimm == info->nd_mapping->nvdimm)
+ break;
+ }
+
+ if (i >= nd_region->ndr_mappings)
+ return 0;
+
+ ndd = to_ndd(nd_mapping);
+ map_end = nd_mapping->start + nd_mapping->size - 1;
+ blk_start = nd_mapping->start;
+ retry:
+ /*
+ * Find the free dpa from the end of the last pmem allocation to
+ * the end of the interleave-set mapping that is not already
+ * covered by a blk allocation.
+ */
+ busy = 0;
+ for_each_dpa_resource(ndd, res) {
+ if ((res->start >= blk_start && res->start < map_end)
+ || (res->end >= blk_start
+ && res->end <= map_end)) {
+ if (strncmp(res->name, "pmem", 4) == 0) {
+ new = max(blk_start, min(map_end + 1,
+ res->end + 1));
+ if (new != blk_start) {
+ blk_start = new;
+ goto retry;
+ }
+ } else
+ busy += min(map_end, res->end)
+ - max(nd_mapping->start, res->start) + 1;
+ } else if (nd_mapping->start > res->start
+ && map_end < res->end) {
+ /* total eclipse of the PMEM region mapping */
+ busy += nd_mapping->size;
+ break;
+ }
+ }
+
+ info->available -= blk_start - nd_mapping->start + busy;
+ return 0;
+}
+
+static int blk_dpa_busy(struct device *dev, void *data)
+{
+ struct blk_alloc_info *info = data;
+ struct nd_mapping *nd_mapping;
+ struct nd_region *nd_region;
+ resource_size_t map_end;
+ int i;
+
+ if (!is_nd_pmem(dev))
+ return 0;
+
+ nd_region = to_nd_region(dev);
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ nd_mapping = &nd_region->mapping[i];
+ if (nd_mapping->nvdimm == info->nd_mapping->nvdimm)
+ break;
+ }
+
+ if (i >= nd_region->ndr_mappings)
+ return 0;
+
+ map_end = nd_mapping->start + nd_mapping->size - 1;
+ if (info->res->start >= nd_mapping->start
+ && info->res->start < map_end) {
+ if (info->res->end <= map_end) {
+ info->busy = 0;
+ return 1;
+ } else {
+ info->busy -= info->res->end - map_end;
+ return 0;
+ }
+ } else if (info->res->end >= nd_mapping->start
+ && info->res->end <= map_end) {
+ info->busy -= nd_mapping->start - info->res->start;
+ return 0;
+ } else {
+ info->busy -= nd_mapping->size;
+ return 0;
+ }
+}
+
/**
* nd_blk_available_dpa - account the unused dpa of BLK region
* @nd_mapping: container of dpa-resource-root + labels
*
- * Unlike PMEM, BLK namespaces can occupy discontiguous DPA ranges.
+ * Unlike PMEM, BLK namespaces can occupy discontiguous DPA ranges, but
+ * we arrange for them to never start at a lower dpa than the last
+ * PMEM allocation in an aliased region.
*/
-resource_size_t nd_blk_available_dpa(struct nd_mapping *nd_mapping)
+resource_size_t nd_blk_available_dpa(struct nd_region *nd_region)
{
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(&nd_region->dev);
+ struct nd_mapping *nd_mapping = &nd_region->mapping[0];
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
- resource_size_t map_end, busy = 0, available;
+ struct blk_alloc_info info = {
+ .nd_mapping = nd_mapping,
+ .available = nd_mapping->size,
+ };
struct resource *res;
if (!ndd)
return 0;
- map_end = nd_mapping->start + nd_mapping->size - 1;
- for_each_dpa_resource(ndd, res)
- if (res->start >= nd_mapping->start && res->start < map_end) {
- resource_size_t end = min(map_end, res->end);
+ device_for_each_child(&nvdimm_bus->dev, &info, alias_dpa_busy);
- busy += end - res->start + 1;
- } else if (res->end >= nd_mapping->start
- && res->end <= map_end) {
- busy += res->end - nd_mapping->start;
- } else if (nd_mapping->start > res->start
- && nd_mapping->start < res->end) {
- /* total eclipse of the BLK region mapping */
- busy += nd_mapping->size;
- }
+ /* now account for busy blk allocations in unaliased dpa */
+ for_each_dpa_resource(ndd, res) {
+ if (strncmp(res->name, "blk", 3) != 0)
+ continue;
- available = map_end - nd_mapping->start + 1;
- if (busy < available)
- return available - busy;
- return 0;
+ info.res = res;
+ info.busy = resource_size(res);
+ device_for_each_child(&nvdimm_bus->dev, &info, blk_dpa_busy);
+ info.available -= info.busy;
+ }
+
+ return info.available;
}
/**
@@ -451,21 +559,16 @@ resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
map_start = nd_mapping->start;
map_end = map_start + nd_mapping->size - 1;
blk_start = max(map_start, map_end + 1 - *overlap);
- for_each_dpa_resource(ndd, res)
+ for_each_dpa_resource(ndd, res) {
if (res->start >= map_start && res->start < map_end) {
if (strncmp(res->name, "blk", 3) == 0)
- blk_start = min(blk_start, res->start);
- else if (res->start != map_start) {
+ blk_start = min(blk_start,
+ max(map_start, res->start));
+ else if (res->end > map_end) {
reason = "misaligned to iset";
goto err;
- } else {
- if (busy) {
- reason = "duplicate overlapping PMEM reservations?";
- goto err;
- }
+ } else
busy += resource_size(res);
- continue;
- }
} else if (res->end >= map_start && res->end <= map_end) {
if (strncmp(res->name, "blk", 3) == 0) {
/*
@@ -474,15 +577,14 @@ resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
* be used for BLK.
*/
blk_start = map_start;
- } else {
- reason = "misaligned to iset";
- goto err;
- }
+ } else
+ busy += resource_size(res);
} else if (map_start > res->start && map_start < res->end) {
/* total eclipse of the mapping */
busy += nd_mapping->size;
blk_start = map_start;
}
+ }
*overlap = map_end + 1 - blk_start;
available = blk_start - map_start;
@@ -491,10 +593,6 @@ resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
return 0;
err:
- /*
- * Something is wrong, PMEM must align with the start of the
- * interleave set, and there can only be one allocation per set.
- */
nd_dbg_dpa(nd_region, ndd, res, "%s\n", reason);
return 0;
}
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index fb3ade0d4a83..7c2196a1d56f 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -76,7 +76,7 @@ struct nd_mapping;
void nd_mapping_free_labels(struct nd_mapping *nd_mapping);
resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
struct nd_mapping *nd_mapping, resource_size_t *overlap);
-resource_size_t nd_blk_available_dpa(struct nd_mapping *nd_mapping);
+resource_size_t nd_blk_available_dpa(struct nd_region *nd_region);
resource_size_t nd_region_available_dpa(struct nd_region *nd_region);
resource_size_t nvdimm_allocated_dpa(struct nvdimm_drvdata *ndd,
struct nd_label_id *label_id);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 19bcd68c4141..3ac534aec60c 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -294,9 +294,8 @@ resource_size_t nd_region_available_dpa(struct nd_region *nd_region)
blk_max_overlap = overlap;
goto retry;
}
- } else if (is_nd_blk(&nd_region->dev)) {
- available += nd_blk_available_dpa(nd_mapping);
- }
+ } else if (is_nd_blk(&nd_region->dev))
+ available += nd_blk_available_dpa(nd_region);
}
return available;
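To make the accounting rule concrete, a standalone sketch with
illustrative numbers (not taken from the patch): the highest-ending PMEM
allocation caps BLK capacity from below, and busy BLK allocations above
that point are subtracted from what remains.

#include <assert.h>

/*
 * sketch: @pmem_end is the end (exclusive) of the highest PMEM
 * allocation, @blk_busy the BLK capacity already allocated above it
 */
static unsigned long long example_blk_available(unsigned long long start,
		unsigned long long size, unsigned long long pmem_end,
		unsigned long long blk_busy)
{
	return start + size - pmem_end - blk_busy;
}

int main(void)
{
	/*
	 * mapping [0, 100): PMEM owns [0, 30), BLK owns [80, 100);
	 * a hole below 30 does not help BLK, so [30, 80) remains
	 */
	assert(example_blk_available(0, 100, 30, 20) == 50);
	return 0;
}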
pmem devices are currently named /dev/pmem<region-index>. Preserve the
naming of the 0th device, but add a ".<namespace-index>" suffix for
subsequent devices.
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/namespace_devs.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index f0536c2789e9..132c5b8b5366 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -173,7 +173,21 @@ const char *nvdimm_namespace_disk_name(struct nd_namespace_common *ndns,
suffix = "s";
if (is_namespace_pmem(&ndns->dev) || is_namespace_io(&ndns->dev)) {
- sprintf(name, "pmem%d%s", nd_region->id, suffix ? suffix : "");
+ int nsidx = 0;
+
+ if (is_namespace_pmem(&ndns->dev)) {
+ struct nd_namespace_pmem *nspm;
+
+ nspm = to_nd_namespace_pmem(&ndns->dev);
+ nsidx = nspm->id;
+ }
+
+ if (nsidx)
+ sprintf(name, "pmem%d.%d%s", nd_region->id, nsidx,
+ suffix ? suffix : "");
+ else
+ sprintf(name, "pmem%d%s", nd_region->id,
+ suffix ? suffix : "");
} else if (is_namespace_blk(&ndns->dev)) {
struct nd_namespace_blk *nsblk;
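A standalone sketch of the names this produces for a region with three
pmem namespaces (the optional "s" suffix handling shown above is
unchanged):

#include <stdio.h>

int main(void)
{
	int region = 0, nsidx;

	for (nsidx = 0; nsidx < 3; nsidx++) {
		char name[16];

		if (nsidx)
			snprintf(name, sizeof(name), "pmem%d.%d", region, nsidx);
		else
			snprintf(name, sizeof(name), "pmem%d", region);
		printf("/dev/%s\n", name);
	}
	return 0;	/* prints /dev/pmem0, /dev/pmem0.1, /dev/pmem0.2 */
}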
In preparation for allowing multiple namespaces per pmem region, unify
blk and pmem label scanning. Given that blk regions already support
multiple namespaces, teaching that path how to do pmem namespace
scanning is an incremental step towards multiple pmem namespace support.
This should be functionally equivalent to the previous state in that it
stops after finding the first valid pmem label set.
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/namespace_devs.c | 385 +++++++++++++++++++++------------------
1 file changed, 207 insertions(+), 178 deletions(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 0e62f46755e7..fbcadc7cb8fd 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1550,7 +1550,7 @@ static int select_pmem_id(struct nd_region *nd_region, u8 *pmem_id)
u64 hw_start, hw_end, pmem_start, pmem_end;
struct nd_label_ent *label_ent;
- mutex_lock(&nd_mapping->lock);
+ WARN_ON(!mutex_is_locked(&nd_mapping->lock));
list_for_each_entry(label_ent, &nd_mapping->labels, list) {
nd_label = label_ent->label;
if (!nd_label)
@@ -1559,7 +1559,6 @@ static int select_pmem_id(struct nd_region *nd_region, u8 *pmem_id)
break;
nd_label = NULL;
}
- mutex_unlock(&nd_mapping->lock);
if (!nd_label) {
WARN_ON(1);
@@ -1579,88 +1578,65 @@ static int select_pmem_id(struct nd_region *nd_region, u8 *pmem_id)
else
return -EINVAL;
- mutex_lock(&nd_mapping->lock);
- label_ent = list_first_entry(&nd_mapping->labels,
- typeof(*label_ent), list);
- label_ent->label = nd_label;
- list_del(&label_ent->list);
- nd_mapping_free_labels(nd_mapping);
- list_add(&label_ent->list, &nd_mapping->labels);
- mutex_unlock(&nd_mapping->lock);
+ /* move recently validated label to the front of the list */
+ list_move(&label_ent->list, &nd_mapping->labels);
}
return 0;
}
/**
- * find_pmem_label_set - validate interleave set labelling, retrieve label0
+ * create_namespace_pmem - validate interleave set labelling, retrieve label0
* @nd_region: region with mappings to validate
+ * @nspm: target namespace to create
+ * @nd_label: target pmem namespace label to evaluate
*/
-static int find_pmem_label_set(struct nd_region *nd_region,
- struct nd_namespace_pmem *nspm)
+struct device *create_namespace_pmem(struct nd_region *nd_region,
+ struct nd_namespace_label *nd_label)
{
u64 cookie = nd_region_interleave_set_cookie(nd_region);
- u8 select_id[NSLABEL_UUID_LEN];
struct nd_label_ent *label_ent;
+ struct nd_namespace_pmem *nspm;
struct nd_mapping *nd_mapping;
resource_size_t size = 0;
- u8 *pmem_id = NULL;
+ struct resource *res;
+ struct device *dev;
int rc = 0;
u16 i;
if (cookie == 0) {
dev_dbg(&nd_region->dev, "invalid interleave-set-cookie\n");
- return -ENXIO;
+ return ERR_PTR(-ENXIO);
}
- /*
- * Find a complete set of labels by uuid. By definition we can start
- * with any mapping as the reference label
- */
- for (i = 0; i < nd_region->ndr_mappings; i++) {
- nd_mapping = &nd_region->mapping[i];
- mutex_lock_nested(&nd_mapping->lock, i);
+ if (__le64_to_cpu(nd_label->isetcookie) != cookie) {
+ dev_dbg(&nd_region->dev, "invalid cookie in label: %pUb\n",
+ nd_label->uuid);
+ return ERR_PTR(-EAGAIN);
}
- list_for_each_entry(label_ent, &nd_region->mapping[0].labels, list) {
- struct nd_namespace_label *nd_label = label_ent->label;
- if (!nd_label)
- continue;
- if (__le64_to_cpu(nd_label->isetcookie) != cookie)
- continue;
-
- for (i = 0; i < nd_region->ndr_mappings; i++)
- if (!has_uuid_at_pos(nd_region, nd_label->uuid,
- cookie, i))
- break;
- if (i < nd_region->ndr_mappings) {
- /*
- * Give up if we don't find an instance of a
- * uuid at each position (from 0 to
- * nd_region->ndr_mappings - 1), or if we find a
- * dimm with two instances of the same uuid.
- */
- rc = -EINVAL;
- break;
- } else if (pmem_id) {
- /*
- * If there is more than one valid uuid set, we
- * need userspace to clean this up.
- */
- rc = -EBUSY;
- break;
- }
- memcpy(select_id, nd_label->uuid, NSLABEL_UUID_LEN);
- pmem_id = select_id;
- }
- for (i = 0; i < nd_region->ndr_mappings; i++) {
- int reverse = nd_region->ndr_mappings - 1 - i;
+ nspm = kzalloc(sizeof(*nspm), GFP_KERNEL);
+ if (!nspm)
+ return ERR_PTR(-ENOMEM);
- nd_mapping = &nd_region->mapping[reverse];
- mutex_unlock(&nd_mapping->lock);
- }
+ dev = &nspm->nsio.common.dev;
+ dev->type = &namespace_pmem_device_type;
+ dev->parent = &nd_region->dev;
+ res = &nspm->nsio.res;
+ res->name = dev_name(&nd_region->dev);
+ res->flags = IORESOURCE_MEM;
- if (rc)
+ for (i = 0; i < nd_region->ndr_mappings; i++)
+ if (!has_uuid_at_pos(nd_region, nd_label->uuid, cookie, i))
+ break;
+ if (i < nd_region->ndr_mappings) {
+ /*
+ * Give up if we don't find an instance of a uuid at each
+ * position (from 0 to nd_region->ndr_mappings - 1), or if we
+ * find a dimm with two instances of the same uuid.
+ */
+ rc = -EINVAL;
goto err;
+ }
/*
* Fix up each mapping's 'labels' to have the validated pmem label for
@@ -1670,7 +1646,7 @@ static int find_pmem_label_set(struct nd_region *nd_region,
* the dimm being enabled (i.e. nd_label_reserve_dpa()
* succeeded).
*/
- rc = select_pmem_id(nd_region, pmem_id);
+ rc = select_pmem_id(nd_region, nd_label->uuid);
if (rc)
goto err;
@@ -1679,11 +1655,9 @@ static int find_pmem_label_set(struct nd_region *nd_region,
struct nd_namespace_label *label0;
nd_mapping = &nd_region->mapping[i];
- mutex_lock(&nd_mapping->lock);
label_ent = list_first_entry_or_null(&nd_mapping->labels,
typeof(*label_ent), list);
label0 = label_ent ? label_ent->label : 0;
- mutex_unlock(&nd_mapping->lock);
if (!label0) {
WARN_ON(1);
@@ -1707,8 +1681,9 @@ static int find_pmem_label_set(struct nd_region *nd_region,
nd_namespace_pmem_set_size(nd_region, nspm, size);
- return 0;
+ return dev;
err:
+ namespace_pmem_release(dev);
switch (rc) {
case -EINVAL:
dev_dbg(&nd_region->dev, "%s: invalid label(s)\n", __func__);
@@ -1721,56 +1696,7 @@ static int find_pmem_label_set(struct nd_region *nd_region,
__func__, rc);
break;
}
- return rc;
-}
-
-static struct device **create_namespace_pmem(struct nd_region *nd_region)
-{
- struct nd_namespace_pmem *nspm;
- struct device *dev, **devs;
- struct resource *res;
- int rc;
-
- nspm = kzalloc(sizeof(*nspm), GFP_KERNEL);
- if (!nspm)
- return NULL;
-
- dev = &nspm->nsio.common.dev;
- dev->type = &namespace_pmem_device_type;
- dev->parent = &nd_region->dev;
- res = &nspm->nsio.res;
- res->name = dev_name(&nd_region->dev);
- res->flags = IORESOURCE_MEM;
- rc = find_pmem_label_set(nd_region, nspm);
- if (rc == -ENODEV) {
- int i;
-
- /* Pass, try to permit namespace creation... */
- for (i = 0; i < nd_region->ndr_mappings; i++) {
- struct nd_mapping *nd_mapping = &nd_region->mapping[i];
-
- mutex_lock(&nd_mapping->lock);
- nd_mapping_free_labels(nd_mapping);
- mutex_unlock(&nd_mapping->lock);
- }
-
- /* Publish a zero-sized namespace for userspace to configure. */
- nd_namespace_pmem_set_size(nd_region, nspm, 0);
-
- rc = 0;
- } else if (rc)
- goto err;
-
- devs = kcalloc(2, sizeof(struct device *), GFP_KERNEL);
- if (!devs)
- goto err;
-
- devs[0] = dev;
- return devs;
-
- err:
- namespace_pmem_release(&nspm->nsio.common.dev);
- return NULL;
+ return ERR_PTR(rc);
}
struct resource *nsblk_add_resource(struct nd_region *nd_region,
@@ -1872,43 +1798,107 @@ void nd_region_create_btt_seed(struct nd_region *nd_region)
dev_err(&nd_region->dev, "failed to create btt namespace\n");
}
-static struct device **scan_labels(struct nd_region *nd_region,
- struct nd_mapping *nd_mapping)
+static int add_namespace_resource(struct nd_region *nd_region,
+ struct nd_namespace_label *nd_label, struct device **devs,
+ int count)
{
+ struct nd_mapping *nd_mapping = &nd_region->mapping[0];
+ struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
+ int i;
+
+ for (i = 0; i < count; i++) {
+ u8 *uuid = namespace_to_uuid(devs[i]);
+ struct resource *res;
+
+ if (IS_ERR_OR_NULL(uuid)) {
+ WARN_ON(1);
+ continue;
+ }
+
+ if (memcmp(uuid, nd_label->uuid, NSLABEL_UUID_LEN) != 0)
+ continue;
+ if (is_namespace_blk(devs[i])) {
+ res = nsblk_add_resource(nd_region, ndd,
+ to_nd_namespace_blk(devs[i]),
+ __le64_to_cpu(nd_label->dpa));
+ if (!res)
+ return -ENXIO;
+ nd_dbg_dpa(nd_region, ndd, res, "%d assign\n", count);
+ } else {
+ dev_err(&nd_region->dev,
+ "error: conflicting extents for uuid: %pUb\n",
+ nd_label->uuid);
+ return -ENXIO;
+ }
+ break;
+ }
+
+ return i;
+}
+
+struct device *create_namespace_blk(struct nd_region *nd_region,
+ struct nd_namespace_label *nd_label, int count)
+{
+
+ struct nd_mapping *nd_mapping = &nd_region->mapping[0];
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
- struct device *dev, **devs = NULL;
struct nd_namespace_blk *nsblk;
- struct nd_label_ent *label_ent;
+ char *name[NSLABEL_NAME_LEN];
+ struct device *dev = NULL;
+ struct resource *res;
+
+ nsblk = kzalloc(sizeof(*nsblk), GFP_KERNEL);
+ if (!nsblk)
+ return ERR_PTR(-ENOMEM);
+ dev = &nsblk->common.dev;
+ dev->type = &namespace_blk_device_type;
+ dev->parent = &nd_region->dev;
+ nsblk->id = -1;
+ nsblk->lbasize = __le64_to_cpu(nd_label->lbasize);
+ nsblk->uuid = kmemdup(nd_label->uuid, NSLABEL_UUID_LEN,
+ GFP_KERNEL);
+ if (!nsblk->uuid)
+ goto blk_err;
+ memcpy(name, nd_label->name, NSLABEL_NAME_LEN);
+ if (name[0])
+ nsblk->alt_name = kmemdup(name, NSLABEL_NAME_LEN,
+ GFP_KERNEL);
+ res = nsblk_add_resource(nd_region, ndd, nsblk,
+ __le64_to_cpu(nd_label->dpa));
+ if (!res)
+ goto blk_err;
+ nd_dbg_dpa(nd_region, ndd, res, "%d: assign\n", count);
+ return dev;
+ blk_err:
+ namespace_blk_release(dev);
+ return ERR_PTR(-ENXIO);
+}
+
+static struct device **scan_labels(struct nd_region *nd_region)
+{
+ struct nd_mapping *nd_mapping = &nd_region->mapping[0];
+ struct device *dev, **devs = NULL;
+ struct nd_label_ent *label_ent, *e;
int i, count = 0;
- list_for_each_entry(label_ent, &nd_mapping->labels, list) {
+ /* "safe" because create_namespace_pmem() might list_move() label_ent */
+ list_for_each_entry_safe(label_ent, e, &nd_mapping->labels, list) {
struct nd_namespace_label *nd_label = label_ent->label;
- char *name[NSLABEL_NAME_LEN];
struct device **__devs;
- struct resource *res;
u32 flags;
if (!nd_label)
continue;
flags = __le32_to_cpu(nd_label->flags);
- if (flags & NSLABEL_FLAG_LOCAL)
- /* pass */;
+ if (is_nd_blk(&nd_region->dev)
+ == !!(flags & NSLABEL_FLAG_LOCAL))
+ /* pass, region matches label type */;
else
continue;
- for (i = 0; i < count; i++) {
- nsblk = to_nd_namespace_blk(devs[i]);
- if (memcmp(nsblk->uuid, nd_label->uuid,
- NSLABEL_UUID_LEN) == 0) {
- res = nsblk_add_resource(nd_region, ndd, nsblk,
- __le64_to_cpu(nd_label->dpa));
- if (!res)
- goto err;
- nd_dbg_dpa(nd_region, ndd, res, "%s assign\n",
- dev_name(&nsblk->common.dev));
- break;
- }
- }
+ i = add_namespace_resource(nd_region, nd_label, devs, count);
+ if (i < 0)
+ goto err;
if (i < count)
continue;
__devs = kcalloc(count + 2, sizeof(dev), GFP_KERNEL);
@@ -1918,34 +1908,35 @@ static struct device **scan_labels(struct nd_region *nd_region,
kfree(devs);
devs = __devs;
- nsblk = kzalloc(sizeof(*nsblk), GFP_KERNEL);
- if (!nsblk)
- goto err;
- dev = &nsblk->common.dev;
- dev->type = &namespace_blk_device_type;
- dev->parent = &nd_region->dev;
- dev_set_name(dev, "namespace%d.%d", nd_region->id, count);
- devs[count++] = dev;
- nsblk->id = -1;
- nsblk->lbasize = __le64_to_cpu(nd_label->lbasize);
- nsblk->uuid = kmemdup(nd_label->uuid, NSLABEL_UUID_LEN,
- GFP_KERNEL);
- if (!nsblk->uuid)
- goto err;
- memcpy(name, nd_label->name, NSLABEL_NAME_LEN);
- if (name[0])
- nsblk->alt_name = kmemdup(name, NSLABEL_NAME_LEN,
- GFP_KERNEL);
- res = nsblk_add_resource(nd_region, ndd, nsblk,
- __le64_to_cpu(nd_label->dpa));
- if (!res)
- goto err;
- nd_dbg_dpa(nd_region, ndd, res, "%s assign\n",
- dev_name(&nsblk->common.dev));
+ if (is_nd_blk(&nd_region->dev)) {
+ dev = create_namespace_blk(nd_region, nd_label, count);
+ if (IS_ERR(dev))
+ goto err;
+ devs[count++] = dev;
+ } else {
+ dev = create_namespace_pmem(nd_region, nd_label);
+ if (IS_ERR(dev)) {
+ switch (PTR_ERR(dev)) {
+ case -EAGAIN:
+ /* skip invalid labels */
+ continue;
+ case -ENODEV:
+ /* fallthrough to seed creation */
+ break;
+ default:
+ goto err;
+ }
+ } else
+ devs[count++] = dev;
+
+ /* we only expect one valid pmem label set per region */
+ break;
+ }
}
- dev_dbg(&nd_region->dev, "%s: discovered %d blk namespace%s\n",
- __func__, count, count == 1 ? "" : "s");
+ dev_dbg(&nd_region->dev, "%s: discovered %d %s namespace%s\n",
+ __func__, count, is_nd_blk(&nd_region->dev)
+ ? "blk" : "pmem", count == 1 ? "" : "s");
if (count == 0) {
/* Publish a zero-sized namespace for userspace to configure. */
@@ -1954,37 +1945,77 @@ static struct device **scan_labels(struct nd_region *nd_region,
devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
if (!devs)
goto err;
- nsblk = kzalloc(sizeof(*nsblk), GFP_KERNEL);
- if (!nsblk)
- goto err;
- dev = &nsblk->common.dev;
- dev->type = &namespace_blk_device_type;
+ if (is_nd_blk(&nd_region->dev)) {
+ struct nd_namespace_blk *nsblk;
+
+ nsblk = kzalloc(sizeof(*nsblk), GFP_KERNEL);
+ if (!nsblk)
+ goto err;
+ dev = &nsblk->common.dev;
+ dev->type = &namespace_blk_device_type;
+ } else {
+ struct nd_namespace_pmem *nspm;
+
+ nspm = kzalloc(sizeof(*nspm), GFP_KERNEL);
+ if (!nspm)
+ goto err;
+ dev = &nspm->nsio.common.dev;
+ dev->type = &namespace_pmem_device_type;
+ nd_namespace_pmem_set_size(nd_region, nspm, 0);
+ }
dev->parent = &nd_region->dev;
devs[count++] = dev;
+ } else if (is_nd_pmem(&nd_region->dev)) {
+ /* clean unselected labels */
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ nd_mapping = &nd_region->mapping[i];
+ if (list_empty(&nd_mapping->labels)) {
+ WARN_ON(1);
+ continue;
+ }
+ label_ent = list_first_entry(&nd_mapping->labels,
+ typeof(*label_ent), list);
+ list_del(&label_ent->list);
+ nd_mapping_free_labels(nd_mapping);
+ list_add(&label_ent->list, &nd_mapping->labels);
+ }
}
return devs;
err:
- for (i = 0; devs[i]; i++) {
- nsblk = to_nd_namespace_blk(devs[i]);
- namespace_blk_release(&nsblk->common.dev);
- }
+ for (i = 0; devs[i]; i++)
+ if (is_nd_blk(&nd_region->dev))
+ namespace_blk_release(devs[i]);
+ else
+ namespace_pmem_release(devs[i]);
kfree(devs);
return NULL;
}
-static struct device **create_namespace_blk(struct nd_region *nd_region)
+static struct device **create_namespaces(struct nd_region *nd_region)
{
struct nd_mapping *nd_mapping = &nd_region->mapping[0];
struct device **devs;
+ int i;
if (nd_region->ndr_mappings == 0)
return NULL;
- mutex_lock(&nd_mapping->lock);
- devs = scan_labels(nd_region, nd_mapping);
- mutex_unlock(&nd_mapping->lock);
+ /* lock down all mappings while we scan labels */
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ nd_mapping = &nd_region->mapping[i];
+ mutex_lock_nested(&nd_mapping->lock, i);
+ }
+
+ devs = scan_labels(nd_region);
+
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ int reverse = nd_region->ndr_mappings - 1 - i;
+
+ nd_mapping = &nd_region->mapping[reverse];
+ mutex_unlock(&nd_mapping->lock);
+ }
return devs;
}
@@ -2064,10 +2095,8 @@ int nd_region_register_namespaces(struct nd_region *nd_region, int *err)
devs = create_namespace_io(nd_region);
break;
case ND_DEVICE_NAMESPACE_PMEM:
- devs = create_namespace_pmem(nd_region);
- break;
case ND_DEVICE_NAMESPACE_BLK:
- devs = create_namespace_blk(nd_region);
+ devs = create_namespaces(nd_region);
break;
default:
break;
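Note that create_namespace_pmem() now reports failures through the
kernel's error-pointer convention, which is what lets scan_labels()
treat -EAGAIN as "skip this label" while other errors abort the scan. A
minimal sketch of the convention (the example_* names are illustrative):

#include <linux/err.h>

static struct device *example_create(int broken)
{
	if (broken)
		return ERR_PTR(-EAGAIN);	/* errno encoded in the pointer */
	return NULL;				/* or a real device pointer */
}

static int example_caller(void)
{
	struct device *dev = example_create(1);

	if (IS_ERR(dev)) {
		switch (PTR_ERR(dev)) {
		case -EAGAIN:
			return 0;		/* skip and keep scanning */
		default:
			return PTR_ERR(dev);	/* fatal */
		}
	}
	return 0;
}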
Hi Dan,
A couple of general questions...
On 10/7/2016 12:38 PM, Dan Williams wrote:
> With the arrival of the device-dax facility in 4.7 a pmem namespace can
> now be configured into a total of four distinct modes: 'raw', 'sector',
> 'memory', and 'dax'. Where raw, sector, and memory are block device
> modes and dax supports the device-dax character device. With that degree
> of freedom in the use cases it is overly restrictive to continue the
> current limit of only one pmem namespace per-region, or "interleave-set"
> in ACPI 6+ terminology.
If I understand correctly, at least some of the restrictions were
part of the Intel NVDIMM Namespace spec rather than ACPI/NFIT restrictions.
The most recent namespace spec on pmem.io hasn't been updated to remove
those restrictions. Is there a different public spec?
> This series adds support for reading and writing configurations that
> describe multiple pmem allocations within a region. The new rules for
> allocating / validating the available capacity when blk and pmem regions
> alias are (quoting space_valid()):
>
> BLK-space is valid as long as it does not precede a PMEM
> allocation in a given region. PMEM-space must be contiguous
> and adjacent to an existing existing allocation (if one
> exists).
Why is this new rule necessary? Is this a HW-specific rule or something
related to how Linux could possibly support something? Why do we care
whether blk-space is before or after pmem-space? If it's a HW-specific
rule, then shouldn't the enforcement be in the management tool that
configures the namespaces?
> Where "adjacent" allocations grow an existing namespace. Note that
> growing a namespace is potentially destructive if free space is consumed
> from a location preceding the current allocation. There is no support
> for dis-continuity within a given namespace allocation.
Are you talking about DPAs here?
> Previously, since there was only one namespace per-region, the resulting
> pmem device would be named after the region. Now, subsequent namespaces
> after the first are named with the region index and a
> ".<namespace-index>" suffix. For example:
>
> /dev/pmem0.1
According to the existing namespace spec, you can already have multiple
block namespaces on a device. I've not seen a system with block namespaces
so what do those /dev entries look like? (The dots are somewhat unattractive.)
-- ljk
>
> ---
>
> Dan Williams (14):
> libnvdimm, region: move region-mapping input-paramters to nd_mapping_desc
> libnvdimm, label: convert label tracking to a linked list
> libnvdimm, namespace: refactor uuid_show() into a namespace_to_uuid() helper
> libnvdimm, namespace: unify blk and pmem label scanning
> tools/testing/nvdimm: support for sub-dividing a pmem region
> libnvdimm, namespace: allow multiple pmem-namespaces per region at scan time
> libnvdimm, namespace: sort namespaces by dpa at init
> libnvdimm, region: update nd_region_available_dpa() for multi-pmem support
> libnvdimm, namespace: expand pmem device naming scheme for multi-pmem
> libnvdimm, namespace: update label implementation for multi-pmem
> libnvdimm, namespace: enable allocation of multiple pmem namespaces
> libnvdimm, namespace: filter out of range labels in scan_labels()
> libnvdimm, namespace: lift single pmem limit in scan_labels()
> libnvdimm, namespace: allow creation of multiple pmem-namespaces per region
>
>
> drivers/acpi/nfit/core.c | 30 +
> drivers/nvdimm/dimm_devs.c | 192 ++++++--
> drivers/nvdimm/label.c | 192 +++++---
> drivers/nvdimm/namespace_devs.c | 786 +++++++++++++++++++++++----------
> drivers/nvdimm/nd-core.h | 23 +
> drivers/nvdimm/nd.h | 28 +
> drivers/nvdimm/region_devs.c | 58 ++
> include/linux/libnvdimm.h | 25 -
> include/linux/nd.h | 8
> tools/testing/nvdimm/test/iomap.c | 134 ++++--
> tools/testing/nvdimm/test/nfit.c | 21 -
> tools/testing/nvdimm/test/nfit_test.h | 12 -
> 12 files changed, 1055 insertions(+), 454 deletions(-)
> _______________________________________________
> Linux-nvdimm mailing list
> [email protected]
> https://lists.01.org/mailman/listinfo/linux-nvdimm
>
On Fri, Oct 7, 2016 at 11:19 AM, Linda Knippers <[email protected]> wrote:
> Hi Dan,
>
> A couple of general questions...
>
> On 10/7/2016 12:38 PM, Dan Williams wrote:
>> With the arrival of the device-dax facility in 4.7 a pmem namespace can
>> now be configured into a total of four distinct modes: 'raw', 'sector',
>> 'memory', and 'dax'. Where raw, sector, and memory are block device
>> modes and dax supports the device-dax character device. With that degree
>> of freedom in the use cases it is overly restrictive to continue the
>> current limit of only one pmem namespace per-region, or "interleave-set"
>> in ACPI 6+ terminology.
>
> If I understand correctly, at least some of the restrictions were
> part of the Intel NVDIMM Namespace spec rather than ACPI/NFIT restrictions.
> The most recent namespace spec on pmem.io hasn't been updated to remove
> those restrictions. Is there a different public spec?
Yes, this is Linux specific, and users of this capability need to be
cognizant that it could create a configuration that is not understood
by EFI, or other OSes (including older Linux implementations). I plan
to add documentation to ndctl along these lines. This is similar to
the current situation with 'pfn' and 'dax' info blocks that are also
Linux specific. However, I should note that this implementation
changes none of the interpretation of the fields nor layout of the
existing label specification. It simply allows two pmem labels that
happen to appear in the same region to result in two namespaces rather
than 0.
>> This series adds support for reading and writing configurations that
>> describe multiple pmem allocations within a region. The new rules for
>> allocating / validating the available capacity when blk and pmem regions
>> alias are (quoting space_valid()):
>>
>> BLK-space is valid as long as it does not precede a PMEM
>> allocation in a given region. PMEM-space must be contiguous
>> and adjacent to an existing existing allocation (if one
>> exists).
>
> Why is this new rule necessary? Is this a HW-specific rule or something
> related to how Linux could possibly support something? Why do we care
> whether blk-space is before or after pmem-space? If it's a HW-specific
> rule, then shouldn't the enforcement be in the management tool that
> configures the namespaces?
It is not HW specific, and it's not new in the sense that we already
arrange for pmem to be allocated from low addresses and blk to be
allocated from high addresses. If another implementation violated
this constraint Linux would parse it just fine. The constraint is a
Linux decision to maximize available pmem capacity when blk and pmem
alias. So this is a situation where Linux is liberal in what it will
accept when reading labels, but conservative on the configurations it
will create when writing labels.
>> Where "adjacent" allocations grow an existing namespace. Note that
>> growing a namespace is potentially destructive if free space is consumed
>> from a location preceding the current allocation. There is no support
>> for dis-continuity within a given namespace allocation.
>
> Are you talking about DPAs here?
No, this is referring to system-physical-address partitioning.
>> Previously, since there was only one namespace per-region, the resulting
>> pmem device would be named after the region. Now, subsequent namespaces
>> after the first are named with the region index and a
>> ".<namespace-index>" suffix. For example:
>>
>> /dev/pmem0.1
>
> According to the existing namespace spec, you can already have multiple
> block namespaces on a device. I've not seen a system with block namespaces
> so what do those /dev entries look like? (The dots are somewhat unattractive.)
Block namespaces result in devices with names like "/dev/ndblk0.0"
where the X.Y numbers are <region-index>.<namespace-index>. This new
naming for pmem devices is following that precedent. The "dot" was
originally adopted from Linux USB device naming.
On 10/7/2016 3:52 PM, Dan Williams wrote:
> On Fri, Oct 7, 2016 at 11:19 AM, Linda Knippers <[email protected]> wrote:
>> Hi Dan,
>>
>> A couple of general questions...
>>
>> On 10/7/2016 12:38 PM, Dan Williams wrote:
>>> With the arrival of the device-dax facility in 4.7 a pmem namespace can
>>> now be configured into a total of four distinct modes: 'raw', 'sector',
>>> 'memory', and 'dax'. Where raw, sector, and memory are block device
>>> modes and dax supports the device-dax character device. With that degree
>>> of freedom in the use cases it is overly restrictive to continue the
>>> current limit of only one pmem namespace per-region, or "interleave-set"
>>> in ACPI 6+ terminology.
>>
>> If I understand correctly, at least some of the restrictions were
>> part of the Intel NVDIMM Namespace spec rather than ACPI/NFIT restrictions.
>> The most recent namespace spec on pmem.io hasn't been updated to remove
>> those restrictions. Is there a different public spec?
>
> Yes, this is Linux specific, and users of this capability need to be
> cognizant that it could create a configuration that is not understood
> by EFI, or other OSes (including older Linux implementations). I plan
> to add documentation to ndctl along these lines. This is similar to
> the current situation with 'pfn' and 'dax' info blocks that are also
> Linux specific. However, I should note that this implementation
> changes none of the interpretation of the fields nor layout of the
> existing label specification. It simply allows two pmem labels that
> happen to appear in the same region to result in two namespaces rather
> than 0.
Ok, but the namespace spec says that's not allowed. It seemed like an odd
restriction to be in the label spec but it is there.
>
>>> This series adds support for reading and writing configurations that
>>> describe multiple pmem allocations within a region. The new rules for
>>> allocating / validating the available capacity when blk and pmem regions
>>> alias are (quoting space_valid()):
>>>
>>> BLK-space is valid as long as it does not precede a PMEM
>>> allocation in a given region. PMEM-space must be contiguous
>>> and adjacent to an existing existing allocation (if one
>>> exists).
>>
>> Why is this new rule necessary? Is this a HW-specific rule or something
>> related to how Linux could possibly support something? Why do we care
>> whether blk-space is before or after pmem-space? If it's a HW-specific
>> rule, then shouldn't the enforcement be in the management tool that
>> configures the namespaces?
>
> It is not HW specific, and it's not new in the sense that we already
> arrange for pmem to be allocated from low addresses and blk to be
> allocated from high addresses.
Who's the "we"? Does the location within the region come from the OS
or from the tool that created the namespace? (I should probably know
this but not having labels, I've never looked at this.)
If we're relaxing some of the rules, it seems like one could have
pmem, then block, then free space, and later want to use free space
for another pmem range. If hardware supported it and the management
tool created it, would the kernel allow it?
> If another implementation violated
> this constraint Linux would parse it just fine. The constraint is a
> Linux decision to maximize available pmem capacity when blk and pmem
> alias. So this is a situation where Linux is liberal in what it will
> accept when reading labels, but conservative on the configurations it
> will create when writing labels.
Is it ndctl that's being conservative? It seems like the kernel shouldn't care.
>
>>> Where "adjacent" allocations grow an existing namespace. Note that
>>> growing a namespace is potentially destructive if free space is consumed
>>> from a location preceding the current allocation. There is no support
>>> for dis-continuity within a given namespace allocation.
>>
>> Are you talking about DPAs here?
>
> No, this is referring to system-physical-address partitioning.
>
>>> Previously, since there was only one namespace per-region, the resulting
>>> pmem device would be named after the region. Now, subsequent namespaces
>>> after the first are named with the region index and a
>>> ".<namespace-index>" suffix. For example:
>>>
>>> /dev/pmem0.1
>>
>> According to the existing namespace spec, you can already have multiple
>> block namespaces on a device. I've not seen a system with block namespaces
>> so what do those /dev entries look like? (The dots are somewhat unattractive.)
>
> Block namespaces result in devices with names like "/dev/ndblk0.0"
> where the X.Y numbers are <region-index>.<namespace-index>. This new
> naming for pmem devices is following that precedent. The "dot" was
> originally adopted from Linux USB device naming.
Does this mean that if someone updates their kernel then their /dev/pmem0
becomes /dev/pmem0.0? Or do you only get the dot if there is more
than one namespace per region?
-- ljk
>
On Fri, Oct 7, 2016 at 2:42 PM, Linda Knippers <[email protected]> wrote:
>
>
> On 10/7/2016 3:52 PM, Dan Williams wrote:
>> On Fri, Oct 7, 2016 at 11:19 AM, Linda Knippers <[email protected]> wrote:
>>> Hi Dan,
>>>
>>> A couple of general questions...
>>>
>>> On 10/7/2016 12:38 PM, Dan Williams wrote:
>>>> With the arrival of the device-dax facility in 4.7 a pmem namespace can
>>>> now be configured into a total of four distinct modes: 'raw', 'sector',
>>>> 'memory', and 'dax'. Where raw, sector, and memory are block device
>>>> modes and dax supports the device-dax character device. With that degree
>>>> of freedom in the use cases it is overly restrictive to continue the
>>>> current limit of only one pmem namespace per-region, or "interleave-set"
>>>> in ACPI 6+ terminology.
>>>
>>> If I understand correctly, at least some of the restrictions were
>>> part of the Intel NVDIMM Namespace spec rather than ACPI/NFIT restrictions.
>>> The most recent namespace spec on pmem.io hasn't been updated to remove
>>> those restrictions. Is there a different public spec?
>>
>> Yes, this is Linux specific, and users of this capability need to be
>> cognizant that it could create a configuration that is not understood
>> by EFI, or other OSes (including older Linux implementations). I plan
>> to add documentation to ndctl along these lines. This is similar to
>> the current situation with 'pfn' and 'dax' info blocks that are also
>> Linux specific. However, I should note that this implementation
>> changes neither the interpretation of the fields nor the layout of the
>> existing label specification. It simply allows two pmem labels that
>> happen to appear in the same region to result in two namespaces rather
>> than 0.
>
> Ok, but the namespace spec says that's not allowed. It seemed like an odd
> restriction to be in the label spec, but it is there.
The restriction greatly simplified the implementation a couple of years
ago, when we assumed that partitioning of block devices could handle any
case that needed distinct operation modes for different sub-divisions of
pmem. If you look at the original implementation of the btt, before it
went upstream, it was designed as a stacked block device driver. In that
arrangement you could theoretically have /dev/pmem0 as the whole-disk
device and then create a btt configuration on top of /dev/pmem0p1 while
leaving /dev/pmem0p2 as a plain, raw pmem device.
We killed that design during the review process and moved btt to be an
intrinsic property of the whole device.
Another development since the one-namespace-per-region restriction was
thought to be tenable is that we (the Linux community) decided not to
pursue raw-device-dax support for block devices. Linux block devices are
tied into the page cache, and filesystems sometimes use the
block-device-inode page cache to submit metadata updates (particularly
ext4). This collided with the msync/fsync dirty-cacheline tracking
implementation. We started to fix a few of those collisions, but then
decided it would be better to leave block devices alone and move
raw-device-dax support to its own, new device node type. That's the
genesis of device-dax.
So the rationale of "don't allow sub-division because an implementation
can just use block device partitions for different use cases" no longer
holds.
>>>> This series adds support for reading and writing configurations that
>>>> describe multiple pmem allocations within a region. The new rules for
>>>> allocating / validating the available capacity when blk and pmem regions
>>>> alias are (quoting space_valid()):
>>>>
>>>> BLK-space is valid as long as it does not precede a PMEM
>>>> allocation in a given region. PMEM-space must be contiguous
>>>> and adjacent to an existing allocation (if one
>>>> exists).
>>>
>>> Why is this new rule necessary? Is this a HW-specific rule or something
>>> related to how Linux could possibly support something? Why do we care
>>> whether blk-space is before or after pmem-space? If it's a HW-specific
>>> rule, then shouldn't the enforcement be in the management tool that
>>> configures the namespaces?
>>
>> It is not HW specific, and it's not new in the sense that we already
>> arrange for pmem to be allocated from low addresses and blk to be
>> allocated from high addresses.
>
> Who's the "we"?
"We" == the current Linux kernel implementation, i.e. we the Linux community.
> Does the location within the region come from the OS
> or from the tool that created the namespace? (I should probably know
> this but not having labels, I've never looked at this.)
The location is chosen by the kernel. Userspace only selects the size.
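For example, a minimal userspace sketch (the sysfs path below is
illustrative; region and namespace numbering varies per system). The
only input the kernel takes from userspace is the byte count written to
the namespace's 'size' attribute:

#include <stdio.h>

int main(void)
{
	/* illustrative path; the kernel, not userspace, picks the DPA range */
	const char *attr = "/sys/bus/nd/devices/region0/namespace0.0/size";
	FILE *f = fopen(attr, "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* request 2 GiB; placement within the region is up to the kernel */
	fprintf(f, "%llu\n", 2ULL << 30);
	return fclose(f) ? 1 : 0;
}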
> If we're relaxing some of the rules, it seems like one could have
> pmem, then block, then free space, and later want to use free space
> for another pmem range. If hardware supported it and the management
> tool created it, would the kernel allow it?
As long as external tooling lays down those labels in that manner the
kernel will honor it, but as far as the kernel is concerned that free
space is reserved for further block capacity. This is why the kernel
allocates pmem bottom-up and blk top-down, so that free space can be
allocated to either interface until it's all consumed. This lets the
kernel's aliasing code remain relatively simple.
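To illustrate, a simplified model of that placement policy (hypothetical
types and names, not the kernel's actual space_valid()):

#include <stdbool.h>

struct extent { unsigned long long start, end; }; /* [start, end) */

/*
 * pmem grows up from the low end of a region and blk grows down from
 * the high end, so free space between them can still be handed to
 * either interface.
 */
static bool blk_space_ok(struct extent avail, struct extent pmem)
{
	/* BLK-space must not precede a PMEM allocation */
	return avail.start >= pmem.end;
}

static bool pmem_space_ok(struct extent avail, struct extent pmem)
{
	/*
	 * PMEM-space must be contiguous with the existing allocation;
	 * growing from the preceding side (avail.end == pmem.start) is
	 * the potentially destructive case noted in the cover letter.
	 */
	return avail.end == pmem.start || avail.start == pmem.end;
}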
>> If another implementation violated
>> this constraint Linux would parse it just fine. The constraint is a
>> Linux decision to maximize available pmem capacity when blk and pmem
>> alias. So this is a situation where Linux is liberal in what it will
>> accept when reading labels, but conservative on the configurations it
>> will create when writing labels.
>
> Is it ndctl that's being conservative? It seems like the kernel shouldn't care.
ndctl only sees 'available_size'. The kernel makes the placement
decision, and makes it as conservatively as possible.
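I.e., the only capacity figure tooling can read back is something like
the following (path illustrative):

#include <stdio.h>

int main(void)
{
	unsigned long long avail;
	FILE *f = fopen("/sys/bus/nd/devices/region0/available_size", "r");

	if (!f || fscanf(f, "%llu", &avail) != 1) {
		perror("available_size");
		return 1;
	}
	fclose(f);
	printf("region0: %llu bytes available\n", avail);
	return 0;
}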
>>
>>>> Where "adjacent" allocations grow an existing namespace. Note that
>>>> growing a namespace is potentially destructive if free space is consumed
>>>> from a location preceding the current allocation. There is no support
>>>> for discontinuity within a given namespace allocation.
>>>
>>> Are you talking about DPAs here?
>>
>> No, this is referring to system-physical-address partitioning.
>>
>>>> Previously, since there was only one namespace per-region, the resulting
>>>> pmem device would be named after the region. Now, subsequent namespaces
>>>> after the first are named with the region index and a
>>>> ".<namespace-index>" suffix. For example:
>>>>
>>>> /dev/pmem0.1
>>>
>>> According to the existing namespace spec, you can already have multiple
>>> block namespaces on a device. I've not seen a system with block namespaces
>>> so what do those /dev entries look like? (The dots are somewhat unattractive.)
>>
>> Block namespaces result in devices with names like "/dev/ndblk0.0"
>> where the X.Y numbers are <region-index>.<namespace-index>. This new
>> naming for pmem devices is following that precedent. The "dot" was
>> originally adopted from Linux USB device naming.
>
> Does this mean that if someone updates their kernel then their /dev/pmem0
> becomes /dev/pmem0.0? Or do you only get the dot if there is more
> than one namespace per region?
Correct, /dev/pmem0 always remains /dev/pmem0; only follow-on
sub-divisions get the new suffix.
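A sketch of that naming rule (hypothetical helper, not the actual
kernel code):

#include <stdio.h>

/* the first namespace keeps the legacy name, later ones gain a suffix */
static void pmem_name(char *buf, size_t len, int region, int nsidx)
{
	if (nsidx == 0)
		snprintf(buf, len, "pmem%d", region);           /* /dev/pmem0   */
	else
		snprintf(buf, len, "pmem%d.%d", region, nsidx); /* /dev/pmem0.1 */
}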