2017-04-14 17:04:01

by Dan Williams

[permalink] [raw]
Subject: [PATCH 0/5] libnvdimm: acpi updates and a revert

With Dave's recent fix [1], we can restore error clearing for btt i/o in
4.12.

ACPI 6.1 introduced new health state flags. Beyond reflecting them in
the dimmX/flags sysfs attribute we also need to handle the deeper
implications of the ACPI_NFIT_MEM_MAP_FAILED flag which changes
assumptions on how the driver discovers dimms. In the "map failed" case
there may missing or no SPA entries associated with a dimm. Those dimms
should still be registered with libnvdimm so that the error state can be
communicated and recovery attempted.

[1]: https://patchwork.kernel.org/patch/9680035/

---

Dan Williams (5):
Revert "libnvdimm: band aid btt vs clear poison locking"
acpi, nfit: add support for acpi 6.1 dimm state flags
tools/testing/nvdimm: test acpi 6.1 health state flags
acpi, nfit: support "map failed" dimms
acpi, nfit: limit ->flush_probe() to initialization work


drivers/acpi/nfit/core.c | 61 +++++++++++++++++++++++++++++++-------
drivers/acpi/nfit/nfit.h | 1 +
drivers/nvdimm/claim.c | 10 +-----
tools/testing/nvdimm/test/nfit.c | 40 +++++++++++++++++++++++--
4 files changed, 88 insertions(+), 24 deletions(-)


2017-04-14 17:04:10

by Dan Williams

[permalink] [raw]
Subject: [PATCH 2/5] acpi, nfit: add support for acpi 6.1 dimm state flags

Add support for the ACPI_NFIT_MEM_MAP_FAILED ("map_fail") and
ACPI_NFIT_MEM_HEALTH_ENABLED ("smart_notify") health state flags. The
"map_fail" flag identifies DIMMs that were not mapped into one or more
physical address ranges. The "health_notify" flag indicates whether
platform firmware will send notifications when there is new SMART health
data to consume.

Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 53943d6f4214..05829de43b1d 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1246,12 +1246,14 @@ static ssize_t flags_show(struct device *dev,
{
u16 flags = to_nfit_memdev(dev)->flags;

- return sprintf(buf, "%s%s%s%s%s\n",
+ return sprintf(buf, "%s%s%s%s%s%s%s\n",
flags & ACPI_NFIT_MEM_SAVE_FAILED ? "save_fail " : "",
flags & ACPI_NFIT_MEM_RESTORE_FAILED ? "restore_fail " : "",
flags & ACPI_NFIT_MEM_FLUSH_FAILED ? "flush_fail " : "",
flags & ACPI_NFIT_MEM_NOT_ARMED ? "not_armed " : "",
- flags & ACPI_NFIT_MEM_HEALTH_OBSERVED ? "smart_event " : "");
+ flags & ACPI_NFIT_MEM_HEALTH_OBSERVED ? "smart_event " : "",
+ flags & ACPI_NFIT_MEM_MAP_FAILED ? "map_fail " : "",
+ flags & ACPI_NFIT_MEM_HEALTH_ENABLED ? "smart_notify " : "");
}
static DEVICE_ATTR_RO(flags);


2017-04-14 17:04:21

by Dan Williams

[permalink] [raw]
Subject: [PATCH 4/5] acpi, nfit: support "map failed" dimms

Stop requiring dimms be successfully mapped into a
system-physical-address range. For provisioning and hardware remediation
purposes the kernel should account for failed devices in sysfs. If
possible it should still allow management commands to be sent to the
device.

Reported-by: Toshi Kani <[email protected]>
Reported-by: Linda Knippers <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit/core.c | 44 ++++++++++++++++++++++++++++++++++++--------
1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 05829de43b1d..06738df477db 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -743,23 +743,33 @@ static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
{
struct nfit_mem *nfit_mem, *found;
struct nfit_memdev *nfit_memdev;
- int type = nfit_spa_type(spa);
+ int type = spa ? nfit_spa_type(spa) : 0;

switch (type) {
case NFIT_SPA_DCR:
case NFIT_SPA_PM:
break;
default:
- return 0;
+ if (spa)
+ return 0;
}

+ /*
+ * This loop runs in two modes, when a dimm is mapped the loop
+ * adds memdev associations to an existing dimm, or creates a
+ * dimm. In the unmapped dimm case this loop sweeps for memdev
+ * instances with an invalid / zero range_index and adds those
+ * dimms without spa associations.
+ */
list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
struct nfit_flush *nfit_flush;
struct nfit_dcr *nfit_dcr;
u32 device_handle;
u16 dcr;

- if (nfit_memdev->memdev->range_index != spa->range_index)
+ if (spa && nfit_memdev->memdev->range_index != spa->range_index)
+ continue;
+ if (!spa && nfit_memdev->memdev->range_index)
continue;
found = NULL;
dcr = nfit_memdev->memdev->region_index;
@@ -844,14 +854,15 @@ static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
break;
}
nfit_mem_init_bdw(acpi_desc, nfit_mem, spa);
- } else {
+ } else if (type == NFIT_SPA_PM) {
/*
* A single dimm may belong to multiple SPA-PM
* ranges, record at least one in addition to
* any SPA-DCR range.
*/
nfit_mem->memdev_pmem = nfit_memdev->memdev;
- }
+ } else
+ nfit_mem->memdev_dcr = nfit_memdev->memdev;
}

return 0;
@@ -875,6 +886,8 @@ static int nfit_mem_cmp(void *priv, struct list_head *_a, struct list_head *_b)
static int nfit_mem_init(struct acpi_nfit_desc *acpi_desc)
{
struct nfit_spa *nfit_spa;
+ int rc;
+

/*
* For each SPA-DCR or SPA-PMEM address range find its
@@ -885,13 +898,20 @@ static int nfit_mem_init(struct acpi_nfit_desc *acpi_desc)
* BDWs are optional.
*/
list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
- int rc;
-
rc = nfit_mem_dcr_init(acpi_desc, nfit_spa->spa);
if (rc)
return rc;
}

+ /*
+ * If a DIMM has failed to be mapped into SPA there will be no
+ * SPA entries above. Find and register all the unmapped DIMMs
+ * for reporting and recovery purposes.
+ */
+ rc = nfit_mem_dcr_init(acpi_desc, NULL);
+ if (rc)
+ return rc;
+
list_sort(NULL, &acpi_desc->dimms, nfit_mem_cmp);

return 0;
@@ -1301,8 +1321,16 @@ static umode_t acpi_nfit_dimm_attr_visible(struct kobject *kobj,
struct device *dev = container_of(kobj, struct device, kobj);
struct nvdimm *nvdimm = to_nvdimm(dev);

- if (!to_nfit_dcr(dev))
+ if (!to_nfit_dcr(dev)) {
+ /* Without a dcr only the memdev attributes can be surfaced */
+ if (a == &dev_attr_handle.attr || a == &dev_attr_phys_id.attr
+ || a == &dev_attr_flags.attr
+ || a == &dev_attr_family.attr
+ || a == &dev_attr_dsm_mask.attr)
+ return a->mode;
return 0;
+ }
+
if (a == &dev_attr_format1.attr && num_nvdimm_formats(nvdimm) <= 1)
return 0;
return a->mode;

2017-04-14 17:04:29

by Dan Williams

[permalink] [raw]
Subject: [PATCH 1/5] Revert "libnvdimm: band aid btt vs clear poison locking"

This reverts commit 4aa5615e080a "libnvdimm: band aid btt vs clear
poison locking".

Now that poison list locking has been converted to a spinlock and poison
list entry allocation during i/o has been converted to GFP_NOWAIT,
revert the band-aid that disabled error clearing from btt i/o.

Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/claim.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index ca6d572c48fc..b3323c0697f6 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -243,15 +243,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}

if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align))) {
- /*
- * FIXME: nsio_rw_bytes() may be called from atomic
- * context in the btt case and nvdimm_clear_poison()
- * takes a sleeping lock. Until the locking can be
- * reworked this capability requires that the namespace
- * is not claimed by btt.
- */
- if (IS_ALIGNED(offset, 512) && IS_ALIGNED(size, 512)
- && (!ndns->claim || !is_nd_btt(ndns->claim))) {
+ if (IS_ALIGNED(offset, 512) && IS_ALIGNED(size, 512)) {
long cleared;

cleared = nvdimm_clear_poison(&ndns->dev, offset, size);

2017-04-14 17:04:32

by Dan Williams

[permalink] [raw]
Subject: [PATCH 5/5] acpi, nfit: limit ->flush_probe() to initialization work

The nvdimm probe flushing mechanism gives userspace a sync point where
it knows all asynchronous driver probe sequences have completed.
However, it need not wait for other asynchronous actions, like
on-demand address-range-scrub. Track the init work separately from other
work in the workqueue, and only flush the former.

Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit/core.c | 13 ++++++++++---
drivers/acpi/nfit/nfit.h | 1 +
2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 06738df477db..17cac9d369e0 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -738,7 +738,7 @@ static void nfit_mem_init_bdw(struct acpi_nfit_desc *acpi_desc,
}
}

-static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
+static int __nfit_mem_init(struct acpi_nfit_desc *acpi_desc,
struct acpi_nfit_system_address *spa)
{
struct nfit_mem *nfit_mem, *found;
@@ -898,7 +898,7 @@ static int nfit_mem_init(struct acpi_nfit_desc *acpi_desc)
* BDWs are optional.
*/
list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
- rc = nfit_mem_dcr_init(acpi_desc, nfit_spa->spa);
+ rc = __nfit_mem_init(acpi_desc, nfit_spa->spa);
if (rc)
return rc;
}
@@ -908,7 +908,7 @@ static int nfit_mem_init(struct acpi_nfit_desc *acpi_desc)
* SPA entries above. Find and register all the unmapped DIMMs
* for reporting and recovery purposes.
*/
- rc = nfit_mem_dcr_init(acpi_desc, NULL);
+ rc = __nfit_mem_init(acpi_desc, NULL);
if (rc)
return rc;

@@ -2568,6 +2568,7 @@ static void acpi_nfit_scrub(struct work_struct *work)
acpi_nfit_register_region(acpi_desc, nfit_spa);
}
}
+ acpi_desc->init_complete = 1;

list_for_each_entry(nfit_spa, &acpi_desc->spas, list)
acpi_nfit_async_scrub(acpi_desc, nfit_spa);
@@ -2771,6 +2772,12 @@ static int acpi_nfit_flush_probe(struct nvdimm_bus_descriptor *nd_desc)
device_lock(dev);
device_unlock(dev);

+ /* bounce the init_mutex to make init_complete valid */
+ mutex_lock(&acpi_desc->init_mutex);
+ mutex_unlock(&acpi_desc->init_mutex);
+ if (acpi_desc->init_complete)
+ return 0;
+
/*
* Scrub work could take 10s of seconds, userspace may give up so we
* need to be interruptible while waiting.
diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
index fc29c2e9832e..256829597585 100644
--- a/drivers/acpi/nfit/nfit.h
+++ b/drivers/acpi/nfit/nfit.h
@@ -163,6 +163,7 @@ struct acpi_nfit_desc {
unsigned int scrub_count;
unsigned int scrub_mode;
unsigned int cancel:1;
+ unsigned int init_complete:1;
unsigned long dimm_cmd_force_en;
unsigned long bus_cmd_force_en;
int (*blk_do_io)(struct nd_blk_region *ndbr, resource_size_t dpa,

2017-04-14 17:05:02

by Dan Williams

[permalink] [raw]
Subject: [PATCH 3/5] tools/testing/nvdimm: test acpi 6.1 health state flags

Add a simulated dimm with an ACPI_NFIT_MEM_MAP_FAILED indication, and
set the ACPI_NFIT_MEM_HEALTH_ENABLED flag on all the dimms where
nfit_test simulates health events, but spread it out over several
redundant memdev entries to test that the nfit driver coalesces all the
flags.

Signed-off-by: Dan Williams <[email protected]>
---
tools/testing/nvdimm/test/nfit.c | 40 +++++++++++++++++++++++++++++++++++---
1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index 798f17655433..bc02f28ed8b8 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -132,6 +132,7 @@ static u32 handle[] = {
[3] = NFIT_DIMM_HANDLE(0, 0, 1, 0, 1),
[4] = NFIT_DIMM_HANDLE(0, 1, 0, 0, 0),
[5] = NFIT_DIMM_HANDLE(1, 0, 0, 0, 0),
+ [6] = NFIT_DIMM_HANDLE(1, 0, 0, 0, 1),
};

static unsigned long dimm_fail_cmd_flags[NUM_DCR];
@@ -728,8 +729,8 @@ static int nfit_test0_alloc(struct nfit_test *t)
static int nfit_test1_alloc(struct nfit_test *t)
{
size_t nfit_size = sizeof(struct acpi_nfit_system_address) * 2
- + sizeof(struct acpi_nfit_memory_map)
- + offsetof(struct acpi_nfit_control_region, window_size);
+ + sizeof(struct acpi_nfit_memory_map) * 2
+ + offsetof(struct acpi_nfit_control_region, window_size) * 2;
int i;

t->nfit_buf = test_alloc(t, nfit_size, &t->nfit_dma);
@@ -906,6 +907,7 @@ static void nfit_test0_setup(struct nfit_test *t)
memdev->address = 0;
memdev->interleave_index = 0;
memdev->interleave_ways = 2;
+ memdev->flags = ACPI_NFIT_MEM_HEALTH_ENABLED;

/* mem-region2 (spa1, dimm0) */
memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memory_map) * 2;
@@ -921,6 +923,7 @@ static void nfit_test0_setup(struct nfit_test *t)
memdev->address = SPA0_SIZE/2;
memdev->interleave_index = 0;
memdev->interleave_ways = 4;
+ memdev->flags = ACPI_NFIT_MEM_HEALTH_ENABLED;

/* mem-region3 (spa1, dimm1) */
memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memory_map) * 3;
@@ -951,6 +954,7 @@ static void nfit_test0_setup(struct nfit_test *t)
memdev->address = SPA0_SIZE/2;
memdev->interleave_index = 0;
memdev->interleave_ways = 4;
+ memdev->flags = ACPI_NFIT_MEM_HEALTH_ENABLED;

/* mem-region5 (spa1, dimm3) */
memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memory_map) * 5;
@@ -1086,6 +1090,7 @@ static void nfit_test0_setup(struct nfit_test *t)
memdev->address = 0;
memdev->interleave_index = 0;
memdev->interleave_ways = 1;
+ memdev->flags = ACPI_NFIT_MEM_HEALTH_ENABLED;

offset = offset + sizeof(struct acpi_nfit_memory_map) * 14;
/* dcr-descriptor0: blk */
@@ -1384,6 +1389,7 @@ static void nfit_test0_setup(struct nfit_test *t)
memdev->address = 0;
memdev->interleave_index = 0;
memdev->interleave_ways = 1;
+ memdev->flags = ACPI_NFIT_MEM_HEALTH_ENABLED;

/* mem-region16 (spa/bdw4, dimm4) */
memdev = nfit_buf + offset +
@@ -1486,6 +1492,34 @@ static void nfit_test1_setup(struct nfit_test *t)
dcr->code = NFIT_FIC_BYTE;
dcr->windows = 0;

+ offset += dcr->header.length;
+ memdev = nfit_buf + offset;
+ memdev->header.type = ACPI_NFIT_TYPE_MEMORY_MAP;
+ memdev->header.length = sizeof(*memdev);
+ memdev->device_handle = handle[6];
+ memdev->physical_id = 0;
+ memdev->region_id = 0;
+ memdev->range_index = 0;
+ memdev->region_index = 0+2;
+ memdev->region_size = SPA2_SIZE;
+ memdev->region_offset = 0;
+ memdev->address = 0;
+ memdev->interleave_index = 0;
+ memdev->interleave_ways = 1;
+ memdev->flags = ACPI_NFIT_MEM_MAP_FAILED;
+
+ /* dcr-descriptor1 */
+ offset += sizeof(*memdev);
+ dcr = nfit_buf + offset;
+ dcr->header.type = ACPI_NFIT_TYPE_CONTROL_REGION;
+ dcr->header.length = offsetof(struct acpi_nfit_control_region,
+ window_size);
+ dcr->region_index = 0+2;
+ dcr_common_init(dcr);
+ dcr->serial_number = ~handle[6];
+ dcr->code = NFIT_FIC_BYTE;
+ dcr->windows = 0;
+
post_ars_status(&t->ars_state, t->spa_set_dma[0], SPA2_SIZE);

acpi_desc = &t->acpi_desc;
@@ -1907,7 +1941,7 @@ static __init int nfit_test_init(void)
case 1:
nfit_test->num_pm = 1;
nfit_test->dcr_idx = NUM_DCR;
- nfit_test->num_dcr = 1;
+ nfit_test->num_dcr = 2;
nfit_test->alloc = nfit_test1_alloc;
nfit_test->setup = nfit_test1_setup;
break;

2017-04-14 17:30:03

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 5/5] acpi, nfit: limit ->flush_probe() to initialization work

On Fri, Apr 14, 2017 at 9:58 AM, Dan Williams <[email protected]> wrote:
> The nvdimm probe flushing mechanism gives userspace a sync point where
> it knows all asynchronous driver probe sequences have completed.
> However, it need not wait for other asynchronous actions, like
> on-demand address-range-scrub. Track the init work separately from other
> work in the workqueue, and only flush the former.
>
> Signed-off-by: Dan Williams <[email protected]>
> ---
> drivers/acpi/nfit/core.c | 13 ++++++++++---
> drivers/acpi/nfit/nfit.h | 1 +
> 2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 06738df477db..17cac9d369e0 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -738,7 +738,7 @@ static void nfit_mem_init_bdw(struct acpi_nfit_desc *acpi_desc,
> }
> }
>
> -static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
> +static int __nfit_mem_init(struct acpi_nfit_desc *acpi_desc,
> struct acpi_nfit_system_address *spa)
> {
> struct nfit_mem *nfit_mem, *found;
> @@ -898,7 +898,7 @@ static int nfit_mem_init(struct acpi_nfit_desc *acpi_desc)
> * BDWs are optional.
> */
> list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
> - rc = nfit_mem_dcr_init(acpi_desc, nfit_spa->spa);
> + rc = __nfit_mem_init(acpi_desc, nfit_spa->spa);

Whoops, this function rename should have been folded into the previous patch.

2017-04-17 17:29:14

by Kani, Toshimitsu

[permalink] [raw]
Subject: Re: [PATCH 2/5] acpi, nfit: add support for acpi 6.1 dimm state flags

On Fri, 2017-04-14 at 09:58 -0700, Dan Williams wrote:
> Add support for the ACPI_NFIT_MEM_MAP_FAILED ("map_fail") and
> ACPI_NFIT_MEM_HEALTH_ENABLED ("smart_notify") health state flags. The
> "map_fail" flag identifies DIMMs that were not mapped into one or
> more physical address ranges. The "health_notify" flag indicates
> whether platform firmware will send notifications when there is new
> SMART health data to consume.
>
> Signed-off-by: Dan Williams <[email protected]>
> ---
>  drivers/acpi/nfit/core.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 53943d6f4214..05829de43b1d 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -1246,12 +1246,14 @@ static ssize_t flags_show(struct device *dev,
>  {
>   u16 flags = to_nfit_memdev(dev)->flags;
>  
> - return sprintf(buf, "%s%s%s%s%s\n",
> + return sprintf(buf, "%s%s%s%s%s%s%s\n",
>   flags & ACPI_NFIT_MEM_SAVE_FAILED ? "save_fail " :
> "",
>   flags & ACPI_NFIT_MEM_RESTORE_FAILED ? "restore_fail
> " : "",
>   flags & ACPI_NFIT_MEM_FLUSH_FAILED ? "flush_fail " :
> "",
>   flags & ACPI_NFIT_MEM_NOT_ARMED ? "not_armed " : "",
> - flags & ACPI_NFIT_MEM_HEALTH_OBSERVED ? "smart_event
> " : "");
> + flags & ACPI_NFIT_MEM_HEALTH_OBSERVED ? "smart_event
> " : "",
> + flags & ACPI_NFIT_MEM_MAP_FAILED ? "map_fail " : "",
> + flags & ACPI_NFIT_MEM_HEALTH_ENABLED ? "smart_notify
> " : "");

Thanks for the update! The above change looks good, but we will also
need to make the same changes to the dev_info() in
acpi_nfit_register_dimms().

-Toshi

2017-04-17 18:16:39

by Kani, Toshimitsu

[permalink] [raw]
Subject: Re: [PATCH 4/5] acpi, nfit: support "map failed" dimms

On Fri, 2017-04-14 at 09:58 -0700, Dan Williams wrote:
> Stop requiring dimms be successfully mapped into a
> system-physical-address range. For provisioning and hardware
> remediation purposes the kernel should account for failed devices in
> sysfs. If possible it should still allow management commands to be
> sent to the device.
>
> Reported-by: Toshi Kani <[email protected]>
> Reported-by: Linda Knippers <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>

Tested the "map failed" condition with HPE NVDIMMs.

Tested-by: Toshi Kani <[email protected]>

Thanks Dan!
-Toshi