2017-06-08 18:37:26

by Kani, Toshimitsu

Subject: [PATCH v2 0/2] NVDIMM memory error notification support

ACPI 6.2 defines a new ACPI notification value for the NVDIMM Root Device.
It allows the BIOS to inform the OS that a new uncorrectable memory error
has been detected at run-time.

This patch set adds support for this notification. On receipt, the driver
starts an ARS and updates the badblocks information to prevent the OS from
consuming the new error.

v2:
- Set flags Bit[1] for Start ARS. (Dan Williams)

Link: http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
---
Toshi Kani (2):
1/2 acpi/nfit: Add support of NVDIMM memory error notification in ACPI 6.2
2/2 acpi/nfit: Issue Start ARS to retrieve existing records

---
drivers/acpi/nfit/core.c | 40 ++++++++++++++++++++++++++++++++--------
drivers/acpi/nfit/mce.c | 2 +-
drivers/acpi/nfit/nfit.h | 4 +++-
include/uapi/linux/ndctl.h | 1 +
4 files changed, 37 insertions(+), 10 deletions(-)


2017-06-08 18:37:29

by Kani, Toshimitsu

Subject: [PATCH v2 1/2] acpi/nfit: Add support of NVDIMM memory error notification in ACPI 6.2

ACPI 6.2 defines a new ACPI notification value for the NVDIMM Root Device
in Table 5-169.

0x81 Unconsumed Uncorrectable Memory Error Detected
Used to pro-actively notify OSPM of uncorrectable memory errors
detected (for example a memory scrubbing engine that continuously
scans the NVDIMMs memory). This is an optional notification. Only
locations that were mapped in to SPA by the platform will generate
a notification.

Add support for this notification value by initiating an ARS scan. The
scan finds the new error locations and adds them to the badblocks list.

Link: http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
Signed-off-by: Toshi Kani <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Linda Knippers <[email protected]>
---
drivers/acpi/nfit/core.c | 28 ++++++++++++++++++++++------
drivers/acpi/nfit/nfit.h | 1 +
2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 656acb5..cc22778 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2967,7 +2967,7 @@ static int acpi_nfit_remove(struct acpi_device *adev)
return 0;
}

-void __acpi_nfit_notify(struct device *dev, acpi_handle handle, u32 event)
+static void acpi_nfit_update_notify(struct device *dev, acpi_handle handle)
{
struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(dev);
struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL };
@@ -2975,11 +2975,6 @@ void __acpi_nfit_notify(struct device *dev, acpi_handle handle, u32 event)
acpi_status status;
int ret;

- dev_dbg(dev, "%s: event: %d\n", __func__, event);
-
- if (event != NFIT_NOTIFY_UPDATE)
- return;
-
if (!dev->driver) {
/* dev->driver may be null if we're being removed */
dev_dbg(dev, "%s: no driver found for dev\n", __func__);
@@ -3016,6 +3011,27 @@ void __acpi_nfit_notify(struct device *dev, acpi_handle handle, u32 event)
dev_err(dev, "Invalid _FIT\n");
kfree(buf.pointer);
}
+
+static void acpi_nfit_uc_error_notify(struct device *dev, acpi_handle handle)
+{
+ struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(dev);
+
+ acpi_nfit_ars_rescan(acpi_desc);
+}
+
+void __acpi_nfit_notify(struct device *dev, acpi_handle handle, u32 event)
+{
+ dev_dbg(dev, "%s: event: 0x%x\n", __func__, event);
+
+ switch (event) {
+ case NFIT_NOTIFY_UPDATE:
+ return acpi_nfit_update_notify(dev, handle);
+ case NFIT_NOTIFY_UC_MEMORY_ERROR:
+ return acpi_nfit_uc_error_notify(dev, handle);
+ default:
+ return;
+ }
+}
EXPORT_SYMBOL_GPL(__acpi_nfit_notify);

static void acpi_nfit_notify(struct acpi_device *adev, u32 event)
diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
index 58fb7d6..6cf9d21 100644
--- a/drivers/acpi/nfit/nfit.h
+++ b/drivers/acpi/nfit/nfit.h
@@ -80,6 +80,7 @@ enum {

enum nfit_root_notifiers {
NFIT_NOTIFY_UPDATE = 0x80,
+ NFIT_NOTIFY_UC_MEMORY_ERROR = 0x81,
};

enum nfit_dimm_notifiers {

2017-06-08 18:37:33

by Kani, Toshimitsu

Subject: [PATCH v2 2/2] acpi/nfit: Issue Start ARS to retrieve existing records

ACPI 6.2 defines in section 9.20.7.2 that the OSPM may call a Start
ARS with Flags Bit [1] set upon receiving the 0x81 notification.

Upon receiving the notification, the OSPM may decide to issue
a Start ARS with Flags Bit [1] set to prepare for the retrieval
of existing records and issue the Query ARS Status function to
retrieve the records.

Add support to call a Start ARS from acpi_nfit_uc_error_notify()
with ND_ARS_RETURN_PREV_DATA set when HW_ERROR_SCRUB_ON is not set.

Link: http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
Signed-off-by: Toshi Kani <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Linda Knippers <[email protected]>
---
drivers/acpi/nfit/core.c | 14 +++++++++++---
drivers/acpi/nfit/mce.c | 2 +-
drivers/acpi/nfit/nfit.h | 3 ++-
include/uapi/linux/ndctl.h | 1 +
4 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index cc22778..d820d72 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1031,7 +1031,7 @@ static ssize_t scrub_store(struct device *dev,
if (nd_desc) {
struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);

- rc = acpi_nfit_ars_rescan(acpi_desc);
+ rc = acpi_nfit_ars_rescan(acpi_desc, 0);
}
device_unlock(dev);
if (rc)
@@ -2057,6 +2057,10 @@ static int ars_start(struct acpi_nfit_desc *acpi_desc, struct nfit_spa *nfit_spa
ars_start.type = ND_ARS_VOLATILE;
else
return -ENOTTY;
+ if (nfit_spa->ars_prev_data) {
+ ars_start.flags |= ND_ARS_RETURN_PREV_DATA;
+ nfit_spa->ars_prev_data = 0;
+ }

rc = nd_desc->ndctl(nd_desc, NULL, ND_CMD_ARS_START, &ars_start,
sizeof(ars_start), &cmd_rc);
@@ -2817,7 +2821,7 @@ static int acpi_nfit_clear_to_send(struct nvdimm_bus_descriptor *nd_desc,
return 0;
}

-int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
+int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc, int flags)
{
struct device *dev = acpi_desc->dev;
struct nfit_spa *nfit_spa;
@@ -2838,6 +2842,8 @@ int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
continue;

nfit_spa->ars_required = 1;
+ if (flags & ND_ARS_RETURN_PREV_DATA)
+ nfit_spa->ars_prev_data = 1;
}
queue_work(nfit_wq, &acpi_desc->work);
dev_dbg(dev, "%s: ars_scan triggered\n", __func__);
@@ -3015,8 +3021,10 @@ static void acpi_nfit_update_notify(struct device *dev, acpi_handle handle)
static void acpi_nfit_uc_error_notify(struct device *dev, acpi_handle handle)
{
struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(dev);
+ int flags = (acpi_desc->scrub_mode == HW_ERROR_SCRUB_ON) ?
+ 0 : ND_ARS_RETURN_PREV_DATA;

- acpi_nfit_ars_rescan(acpi_desc);
+ acpi_nfit_ars_rescan(acpi_desc, flags);
}

void __acpi_nfit_notify(struct device *dev, acpi_handle handle, u32 event)
diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index fd86bec..feeb95d 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -79,7 +79,7 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
* already in progress, just let that be the last
* authoritative one
*/
- acpi_nfit_ars_rescan(acpi_desc);
+ acpi_nfit_ars_rescan(acpi_desc, 0);
}
break;
}
diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
index 6cf9d21..a6f8833 100644
--- a/drivers/acpi/nfit/nfit.h
+++ b/drivers/acpi/nfit/nfit.h
@@ -91,6 +91,7 @@ struct nfit_spa {
struct list_head list;
struct nd_region *nd_region;
unsigned int ars_required:1;
+ unsigned int ars_prev_data:1;
u32 clear_err_unit;
u32 max_ars;
struct acpi_nfit_system_address spa[0];
@@ -208,7 +209,7 @@ struct nfit_blk {

extern struct list_head acpi_descs;
extern struct mutex acpi_desc_lock;
-int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc);
+int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc, int flags);

#ifdef CONFIG_X86_MCE
void nfit_mce_register(void);
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index 7ad3863..70a89f7 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -169,6 +169,7 @@ enum {
enum {
ND_ARS_VOLATILE = 1,
ND_ARS_PERSISTENT = 2,
+ ND_ARS_RETURN_PREV_DATA = 1 << 1,
ND_CONFIG_LOCKED = 1,
};
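
The flag handling in the patch above can be modeled outside the kernel. The following is a minimal userspace sketch, not driver code: ND_ARS_RETURN_PREV_DATA and HW_ERROR_SCRUB_ON take their meaning from the patch, while HW_ERROR_SCRUB_OFF, struct model_ars_start, uc_error_rescan_flags() and model_fill_ars_start() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Start ARS Flags Bit[1], with the same value the patch adds to ndctl.h. */
enum { ND_ARS_RETURN_PREV_DATA = 1 << 1 };

/* Assumed two-state scrub mode; only HW_ERROR_SCRUB_ON appears in the patch. */
enum model_scrub_mode { HW_ERROR_SCRUB_OFF, HW_ERROR_SCRUB_ON };

/* Hypothetical stand-in for the Start ARS input payload. */
struct model_ars_start {
	uint8_t flags;
};

/*
 * Choose the rescan flags for a 0x81 notification, mirroring the ternary
 * in acpi_nfit_uc_error_notify(): request previous data only when the
 * scrub mode is not HW_ERROR_SCRUB_ON.
 */
static int uc_error_rescan_flags(enum model_scrub_mode mode)
{
	return mode == HW_ERROR_SCRUB_ON ? 0 : ND_ARS_RETURN_PREV_DATA;
}

/* Translate the rescan flags into the Flags field of the modeled command. */
static void model_fill_ars_start(struct model_ars_start *cmd, int flags)
{
	cmd->flags = 0;
	if (flags & ND_ARS_RETURN_PREV_DATA)
		cmd->flags |= ND_ARS_RETURN_PREV_DATA;
}
```

With these definitions, a notification received while the mode is HW_ERROR_SCRUB_OFF yields a command with Flags Bit[1] set, and one received with HW_ERROR_SCRUB_ON yields a command with no flags set.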


2017-06-15 21:44:46

by Dan Williams

Subject: Re: [PATCH v2 1/2] acpi/nfit: Add support of NVDIMM memory error notification in ACPI 6.2

On Thu, Jun 8, 2017 at 11:36 AM, Toshi Kani <[email protected]> wrote:
> ACPI 6.2 defines a new ACPI notification value for the NVDIMM Root Device
> in Table 5-169.
>
> 0x81 Unconsumed Uncorrectable Memory Error Detected
> Used to pro-actively notify OSPM of uncorrectable memory errors
> detected (for example a memory scrubbing engine that continuously
> scans the NVDIMMs memory). This is an optional notification. Only
> locations that were mapped in to SPA by the platform will generate
> a notification.
>
> Add support for this notification value by initiating an ARS scan. The
> scan finds the new error locations and adds them to the badblocks list.
>

Looks good to me, applied.

2017-06-15 22:05:31

by Dan Williams

Subject: Re: [PATCH v2 2/2] acpi/nfit: Issue Start ARS to retrieve existing records

Thanks for this. I think it's going in the right direction, but one
comment below.

On Thu, Jun 8, 2017 at 11:36 AM, Toshi Kani <[email protected]> wrote:
> ACPI 6.2 defines in section 9.20.7.2 that the OSPM may call a Start
> ARS with Flags Bit [1] set upon receiving the 0x81 notification.
>
> Upon receiving the notification, the OSPM may decide to issue
> a Start ARS with Flags Bit [1] set to prepare for the retrieval
> of existing records and issue the Query ARS Status function to
> retrieve the records.
>
> Add support to call a Start ARS from acpi_nfit_uc_error_notify()
> with ND_ARS_RETURN_PREV_DATA set when HW_ERROR_SCRUB_ON is not set.
>
> Link: http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> Signed-off-by: Toshi Kani <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Vishal Verma <[email protected]>
> Cc: Linda Knippers <[email protected]>
> ---
> drivers/acpi/nfit/core.c | 14 +++++++++++---
> drivers/acpi/nfit/mce.c | 2 +-
> drivers/acpi/nfit/nfit.h | 3 ++-
> include/uapi/linux/ndctl.h | 1 +
> 4 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index cc22778..d820d72 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -1031,7 +1031,7 @@ static ssize_t scrub_store(struct device *dev,
> if (nd_desc) {
> struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
>
> - rc = acpi_nfit_ars_rescan(acpi_desc);
> + rc = acpi_nfit_ars_rescan(acpi_desc, 0);
> }
> device_unlock(dev);
> if (rc)
> @@ -2057,6 +2057,10 @@ static int ars_start(struct acpi_nfit_desc *acpi_desc, struct nfit_spa *nfit_spa
> ars_start.type = ND_ARS_VOLATILE;
> else
> return -ENOTTY;
> + if (nfit_spa->ars_prev_data) {
> + ars_start.flags |= ND_ARS_RETURN_PREV_DATA;
> + nfit_spa->ars_prev_data = 0;
> + }

I'd rather you plumb a new 'flags' parameter all the way through from
acpi_nfit_ars_rescan() to ars_start() rather than carrying this as a
property of nfit_spa.

2017-06-17 00:03:36

by Kani, Toshimitsu

Subject: RE: [PATCH v2 2/2] acpi/nfit: Issue Start ARS to retrieve existing records

> > --- a/drivers/acpi/nfit/core.c
> > +++ b/drivers/acpi/nfit/core.c
> > @@ -1031,7 +1031,7 @@ static ssize_t scrub_store(struct device *dev,
> > if (nd_desc) {
> > struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
> >
> > - rc = acpi_nfit_ars_rescan(acpi_desc);
> > + rc = acpi_nfit_ars_rescan(acpi_desc, 0);
> > }
> > device_unlock(dev);
> > if (rc)
> > @@ -2057,6 +2057,10 @@ static int ars_start(struct acpi_nfit_desc
> *acpi_desc, struct nfit_spa *nfit_spa
> > ars_start.type = ND_ARS_VOLATILE;
> > else
> > return -ENOTTY;
> > + if (nfit_spa->ars_prev_data) {
> > + ars_start.flags |= ND_ARS_RETURN_PREV_DATA;
> > + nfit_spa->ars_prev_data = 0;
> > + }
>
> I'd rather you plumb a new 'flags' parameter all the way through from
> acpi_nfit_ars_rescan() to ars_start() rather than carrying this as a
> property of nfit_spa.

Yes, I wanted to carry 'flags' all the way, but since acpi_nfit_ars_rescan()
calls acpi_nfit_scrub() via acpi_desc->work, all info needs to be marshalled
into struct acpi_nfit_desc. Using nfit_spa allows a request to be carried on a
per-spa basis...

Thanks,
-Toshi
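
The constraint described above is that a deferred work function receives only a pointer to its work item, so any request state has to live in the structure that embeds that item. A minimal userspace sketch of that pattern follows. It is an illustration under stated assumptions, not the driver's code: model_container_of() stands in for the kernel's container_of(), and struct model_work, struct model_desc, model_scrub() and model_run_work() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical work item: the callback gets nothing but the item itself. */
struct model_work {
	void (*func)(struct model_work *work);
};

/* Hypothetical descriptor that embeds the work item next to its state. */
struct model_desc {
	int spa_count;           /* state the worker needs */
	struct model_work work;  /* embedded work item */
	int scrubbed;            /* result written by the worker */
};

/* Worker: recover the enclosing descriptor from the work pointer. */
static void model_scrub(struct model_work *work)
{
	struct model_desc *desc =
		model_container_of(work, struct model_desc, work);

	desc->scrubbed = desc->spa_count;
}

/* Stand-in for the work queue invoking the item at some later time. */
static void model_run_work(struct model_work *work)
{
	work->func(work);
}
```

Because model_scrub() can reach everything in struct model_desc but nothing else, a caller-supplied argument such as 'flags' has to be stored in the descriptor (or in something reachable from it) before the work is queued.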



2017-06-17 00:10:06

by Dan Williams

Subject: Re: [PATCH v2 2/2] acpi/nfit: Issue Start ARS to retrieve existing records

On Fri, Jun 16, 2017 at 5:01 PM, Kani, Toshimitsu <[email protected]> wrote:
>> > --- a/drivers/acpi/nfit/core.c
>> > +++ b/drivers/acpi/nfit/core.c
>> > @@ -1031,7 +1031,7 @@ static ssize_t scrub_store(struct device *dev,
>> > if (nd_desc) {
>> > struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
>> >
>> > - rc = acpi_nfit_ars_rescan(acpi_desc);
>> > + rc = acpi_nfit_ars_rescan(acpi_desc, 0);
>> > }
>> > device_unlock(dev);
>> > if (rc)
>> > @@ -2057,6 +2057,10 @@ static int ars_start(struct acpi_nfit_desc
>> *acpi_desc, struct nfit_spa *nfit_spa
>> > ars_start.type = ND_ARS_VOLATILE;
>> > else
>> > return -ENOTTY;
>> > + if (nfit_spa->ars_prev_data) {
>> > + ars_start.flags |= ND_ARS_RETURN_PREV_DATA;
>> > + nfit_spa->ars_prev_data = 0;
>> > + }
>>
>> I'd rather you plumb a new 'flags' parameter all the way through from
>> acpi_nfit_ars_rescan() to ars_start() rather than carrying this as a
>> property of nfit_spa.
>
> Yes, I wanted to carry 'flags' all the way, but since acpi_nfit_ars_rescan()
> calls acpi_nfit_scrub() via acpi_desc->work, all info needs to be marshalled
> into struct acpi_nfit_desc. Using nfit_spa allows a request to be carried on a
> per-spa basis...

Ah ok, but I still think it does not belong to a spa. This is control
/ context information for the workqueue and that belongs with
acpi_nfit_desc. I think it's fine if all spas get re-scrubbed with
the "prev_data" flag in the notification case.
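
One way to read this suggestion is sketched below as a small userspace model. It is only an assumption about the shape of the change, not the eventual patch: the field name ars_start_flags and the struct model_spa, struct model_desc, model_ars_rescan() and model_scrub_worker() names are hypothetical, and the locking a real driver would need is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Start ARS Flags Bit[1], with the same value as in the patch. */
enum { ND_ARS_RETURN_PREV_DATA = 1 << 1 };

#define MODEL_MAX_SPA 4

/* Hypothetical per-range state: no prev_data bit is kept here. */
struct model_spa {
	unsigned int ars_required:1;
	int last_start_flags;  /* flags of the last modeled Start ARS */
};

/* Hypothetical descriptor: the request flags live with the work context. */
struct model_desc {
	int ars_start_flags;   /* assumed field name, consumed by the worker */
	size_t nr_spa;
	struct model_spa spa[MODEL_MAX_SPA];
};

/* Request a rescan of every range and record the flags on the descriptor. */
static void model_ars_rescan(struct model_desc *desc, int flags)
{
	for (size_t i = 0; i < desc->nr_spa; i++)
		desc->spa[i].ars_required = 1;
	desc->ars_start_flags = flags;
}

/* Worker: take the flags once, then apply them to all pending ranges. */
static void model_scrub_worker(struct model_desc *desc)
{
	int flags = desc->ars_start_flags;

	desc->ars_start_flags = 0;  /* one-shot: do not reuse on later scans */
	for (size_t i = 0; i < desc->nr_spa; i++) {
		struct model_spa *spa = &desc->spa[i];

		if (!spa->ars_required)
			continue;
		spa->last_start_flags = flags;  /* modeled Start ARS */
		spa->ars_required = 0;
	}
}
```

In this model every range is re-scrubbed with the same flags in the notification case, which matches the behavior described as acceptable above, and a later rescan requested with flags 0 does not inherit the earlier request.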

2017-06-17 00:47:40

by Kani, Toshimitsu

Subject: RE: [PATCH v2 2/2] acpi/nfit: Issue Start ARS to retrieve existing records

> On Fri, Jun 16, 2017 at 5:01 PM, Kani, Toshimitsu <[email protected]> wrote:
> >> > --- a/drivers/acpi/nfit/core.c
> >> > +++ b/drivers/acpi/nfit/core.c
> >> > @@ -1031,7 +1031,7 @@ static ssize_t scrub_store(struct device *dev,
> >> > if (nd_desc) {
> >> > struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
> >> >
> >> > - rc = acpi_nfit_ars_rescan(acpi_desc);
> >> > + rc = acpi_nfit_ars_rescan(acpi_desc, 0);
> >> > }
> >> > device_unlock(dev);
> >> > if (rc)
> >> > @@ -2057,6 +2057,10 @@ static int ars_start(struct acpi_nfit_desc
> >> *acpi_desc, struct nfit_spa *nfit_spa
> >> > ars_start.type = ND_ARS_VOLATILE;
> >> > else
> >> > return -ENOTTY;
> >> > + if (nfit_spa->ars_prev_data) {
> >> > + ars_start.flags |= ND_ARS_RETURN_PREV_DATA;
> >> > + nfit_spa->ars_prev_data = 0;
> >> > + }
> >>
> >> I'd rather you plumb a new 'flags' parameter all the way through from
> >> acpi_nfit_ars_rescan() to ars_start() rather than carrying this as a
> >> property of nfit_spa.
> >
> > Yes, I wanted to carry 'flags' all the way, but since acpi_nfit_ars_rescan()
> > calls acpi_nfit_scrub() via acpi_desc->work, all info needs to be marshalled
> > into struct acpi_nfit_desc. Using nfit_spa allows a request to be carried on a
> > per-spa basis...
>
> Ah ok, but I still think it does not belong to a spa. This is control
> / context information for the workqueue and that belongs with
> acpi_nfit_desc. I think it's fine if all spas get re-scrubbed with
> the "prev_data" flag in the notification case.

Sounds good. I will add the flags info to acpi_nfit_desc and update
this patch.

Thanks!
-Toshi