Changes since v1 [1]:
- Move the syscore callback from 'suspend' path to the 'hibernate' path
(Rafael)
- Add a new PM debug test mode, 'mem-quiet' to disable some unnecessary
hibernation steps (memory image preparation) and debug sleeps when the
hibernation code is just being used to quiet the system for firmware
activation. (Rafael)
- Greg already applied "driver-core: Introduce
DEVICE_ATTR_ADMIN_{RO,RW}" to driver-core-next, so I'll need to
duplicate that commit in nvdimm.git, or work out a common branch
baseline with Greg for this topic and driver-core-next to share.
[1]: http://lore.kernel.org/r/159312902033.1850128.1712559453279208264.stgit@dwillia2-desk3.amr.corp.intel.com
---
Quoting the documentation:
Some persistent memory devices run a firmware locally on the device /
"DIMM" to perform tasks like media management, capacity provisioning,
and health monitoring. The process of updating that firmware typically
involves a reboot because it has implications for in-flight memory
transactions. However, reboots are disruptive and at least the Intel
persistent memory platform implementation, described by the Intel ACPI
DSM specification [1], has added support for activating firmware at
runtime.
[1]: https://docs.pmem.io/persistent-memory/
The approach taken is to abstract the Intel platform specific mechanism
behind a libnvdimm-generic sysfs interface. The interface could support
runtime-firmware-activation on another architecture without need to
change userspace tooling.
The ACPI NFIT implementation involves a set of device-specific-methods
(DSMs) to 'arm' individual devices for activation and bus-level
'trigger' method to execute the activation. Informational / enumeration
methods are also provided at the bus and device level.
One complicating aspect of the memory device firmware activation is that
the memory controller may need to be quiesced, no memory cycles, during
the activation. While the platform has mechanisms to support holding off
in-flight DMA during the activation, the device response to that delay
is potentially undefined. The platform may reject a runtime firmware
update if, for example a PCI-E device does not support its completion
timeout value being increased to meet the activation time. Outside of
device timeouts the quiesce period may also violate application
timeouts.
Given the above device and application timeout considerations the
implementation defaults to hooking into the hibernation path to trigger
the activation, i.e. that a hibernate-resume cycle (at least up to the
syscore mem-quiet point) is required. That default policy ensures that
the system is in a quiescent state before ceasing memory controller
responses for the activate. However, if desired, runtime activation
without the hibernate freeze can be forced as an override.
The ndctl utility grows the following extensions / commands to drive
this mechanism:
1/ The existing update-firmware command will 'arm' devices where the
firmware image is staged by default.
ndctl update-firmware all -f firmware_image.bin
2/ The existing ability to enumerate firmware-update capabilities now
includes firmware activate capabilities at the 'bus' and 'dimm/device'
level:
ndctl list -BDF -b nfit_test.0
[
{
"provider":"nfit_test.0",
"dev":"ndbus2",
"scrub_state":"idle",
"firmware":{
"activate_method":"suspend",
"activate_state":"idle"
},
"dimms":[
{
"dev":"nmem1",
"id":"cdab-0a-07e0-ffffffff",
"handle":0,
"phys_id":0,
"security":"disabled",
"firmware":{
"current_version":0,
"can_update":true
}
},
...
3/ When the system can support activation without quiesce, or when the
hibernate-resume requirement is going to be suppressed, the new
activate-firmware command wraps that functionality:
ndctl activate-firmware nfit_test.0 --force
Otherwise, if activate_method is "suspend" then the activation can be
triggered by the mem-quiet hibernate debug state, or a full hibernate
resume:
echo mem-quiet > /sys/power/pm_debug
echo disk > /sys/power/state
---
Dan Williams (12):
libnvdimm: Validate command family indices
ACPI: NFIT: Move bus_dsm_mask out of generic nvdimm_bus_descriptor
ACPI: NFIT: Define runtime firmware activation commands
tools/testing/nvdimm: Cleanup dimm index passing
tools/testing/nvdimm: Add command debug messages
tools/testing/nvdimm: Prepare nfit_ctl_test() for ND_CMD_CALL emulation
tools/testing/nvdimm: Emulate firmware activation commands
driver-core: Introduce DEVICE_ATTR_ADMIN_{RO,RW}
libnvdimm: Convert to DEVICE_ATTR_ADMIN_RO()
libnvdimm: Add runtime firmware activation sysfs interface
PM, libnvdimm: Add 'mem-quiet' state and callback for firmware activation
ACPI: NFIT: Add runtime firmware activate support
Documentation/ABI/testing/sysfs-bus-nfit | 35 ++
Documentation/ABI/testing/sysfs-bus-nvdimm | 2
.../driver-api/nvdimm/firmware-activate.rst | 74 +++
drivers/acpi/nfit/core.c | 146 +++++--
drivers/acpi/nfit/intel.c | 426 ++++++++++++++++++++
drivers/acpi/nfit/intel.h | 61 +++
drivers/acpi/nfit/nfit.h | 39 ++
drivers/base/syscore.c | 21 +
drivers/nvdimm/bus.c | 46 ++
drivers/nvdimm/core.c | 103 +++++
drivers/nvdimm/dimm_devs.c | 99 +++++
drivers/nvdimm/namespace_devs.c | 2
drivers/nvdimm/nd-core.h | 1
drivers/nvdimm/pfn_devs.c | 2
drivers/nvdimm/region_devs.c | 2
include/linux/device.h | 4
include/linux/libnvdimm.h | 53 ++
include/linux/syscore_ops.h | 2
include/linux/sysfs.h | 7
include/uapi/linux/ndctl.h | 5
kernel/power/hibernate.c | 17 +
kernel/power/main.c | 1
kernel/power/power.h | 7
kernel/power/snapshot.c | 13 +
kernel/power/suspend.c | 12 +
tools/testing/nvdimm/test/nfit.c | 367 ++++++++++++++---
26 files changed, 1427 insertions(+), 120 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-nvdimm
create mode 100644 Documentation/driver-api/nvdimm/firmware-activate.rst
The ND_CMD_CALL format allows for a general passthrough of passlisted
commands targeting a given command set. However there is no validation
of the family index relative to what the bus supports.
- Update the NFIT bus implementation (the only one that supports
ND_CMD_CALL passthrough) to also passlist the valid set of command
family indices.
- Update the generic __nd_ioctl() path to validate that field on behalf
of all implementations.
Cc: Vishal Verma <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Fixes: 31eca76ba2fc ("nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism")
Cc: <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit/core.c | 11 +++++++++--
drivers/acpi/nfit/nfit.h | 1 -
drivers/nvdimm/bus.c | 16 ++++++++++++++++
include/linux/libnvdimm.h | 2 ++
include/uapi/linux/ndctl.h | 4 ++++
5 files changed, 31 insertions(+), 3 deletions(-)
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 7c138a4edc03..1f72ce1a782b 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1823,6 +1823,7 @@ static void populate_shutdown_status(struct nfit_mem *nfit_mem)
static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
struct nfit_mem *nfit_mem, u32 device_handle)
{
+ struct nvdimm_bus_descriptor *nd_desc = &acpi_desc->nd_desc;
struct acpi_device *adev, *adev_dimm;
struct device *dev = acpi_desc->dev;
unsigned long dsm_mask, label_mask;
@@ -1834,6 +1835,7 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
/* nfit test assumes 1:1 relationship between commands and dsms */
nfit_mem->dsm_mask = acpi_desc->dimm_cmd_force_en;
nfit_mem->family = NVDIMM_FAMILY_INTEL;
+ set_bit(NVDIMM_FAMILY_INTEL, &nd_desc->dimm_family_mask);
if (dcr->valid_fields & ACPI_NFIT_CONTROL_MFG_INFO_VALID)
sprintf(nfit_mem->id, "%04x-%02x-%04x-%08x",
@@ -1886,10 +1888,13 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
* Note, that checking for function0 (bit0) tells us if any commands
* are reachable through this GUID.
*/
+ clear_bit(NVDIMM_FAMILY_INTEL, &nd_desc->dimm_family_mask);
for (i = 0; i <= NVDIMM_FAMILY_MAX; i++)
- if (acpi_check_dsm(adev_dimm->handle, to_nfit_uuid(i), 1, 1))
+ if (acpi_check_dsm(adev_dimm->handle, to_nfit_uuid(i), 1, 1)) {
+ set_bit(i, &nd_desc->dimm_family_mask);
if (family < 0 || i == default_dsm_family)
family = i;
+ }
/* limit the supported commands to those that are publicly documented */
nfit_mem->family = family;
@@ -2153,6 +2158,9 @@ static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc)
nd_desc->cmd_mask = acpi_desc->bus_cmd_force_en;
nd_desc->bus_dsm_mask = acpi_desc->bus_nfit_cmd_force_en;
+ set_bit(ND_CMD_CALL, &nd_desc->cmd_mask);
+ set_bit(NVDIMM_BUS_FAMILY_NFIT, &nd_desc->bus_family_mask);
+
adev = to_acpi_dev(acpi_desc);
if (!adev)
return;
@@ -2160,7 +2168,6 @@ static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc)
for (i = ND_CMD_ARS_CAP; i <= ND_CMD_CLEAR_ERROR; i++)
if (acpi_check_dsm(adev->handle, guid, 1, 1ULL << i))
set_bit(i, &nd_desc->cmd_mask);
- set_bit(ND_CMD_CALL, &nd_desc->cmd_mask);
dsm_mask =
(1 << ND_CMD_ARS_CAP) |
diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
index f5525f8bb770..5c5e7ebba8dc 100644
--- a/drivers/acpi/nfit/nfit.h
+++ b/drivers/acpi/nfit/nfit.h
@@ -33,7 +33,6 @@
| ACPI_NFIT_MEM_RESTORE_FAILED | ACPI_NFIT_MEM_FLUSH_FAILED \
| ACPI_NFIT_MEM_NOT_ARMED | ACPI_NFIT_MEM_MAP_FAILED)
-#define NVDIMM_FAMILY_MAX NVDIMM_FAMILY_HYPERV
#define NVDIMM_CMD_MAX 31
#define NVDIMM_STANDARD_CMDMASK \
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 09087c38fabd..955265656b96 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -1037,9 +1037,25 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
dimm_name = "bus";
}
+ /* Validate command family support against bus declared support */
if (cmd == ND_CMD_CALL) {
+ unsigned long *mask;
+
if (copy_from_user(&pkg, p, sizeof(pkg)))
return -EFAULT;
+
+ if (nvdimm) {
+ if (pkg.nd_family > NVDIMM_FAMILY_MAX)
+ return -EINVAL;
+ mask = &nd_desc->dimm_family_mask;
+ } else {
+ if (pkg.nd_family > NVDIMM_BUS_FAMILY_MAX)
+ return -EINVAL;
+ mask = &nd_desc->bus_family_mask;
+ }
+
+ if (!test_bit(pkg.nd_family, mask))
+ return -EINVAL;
}
if (!desc ||
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..bd39a2cf7972 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -78,6 +78,8 @@ struct nvdimm_bus_descriptor {
const struct attribute_group **attr_groups;
unsigned long bus_dsm_mask;
unsigned long cmd_mask;
+ unsigned long dimm_family_mask;
+ unsigned long bus_family_mask;
struct module *module;
char *provider_name;
struct device_node *of_node;
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index 0e09dc5cec19..e9468b9332bd 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -245,6 +245,10 @@ struct nd_cmd_pkg {
#define NVDIMM_FAMILY_MSFT 3
#define NVDIMM_FAMILY_HYPERV 4
#define NVDIMM_FAMILY_PAPR 5
+#define NVDIMM_FAMILY_MAX NVDIMM_FAMILY_PAPR
+
+#define NVDIMM_BUS_FAMILY_NFIT 0
+#define NVDIMM_BUS_FAMILY_MAX NVDIMM_BUS_FAMILY_NFIT
#define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
Platform reboots are expensive. Towards reducing downtime to apply
firmware updates the Intel NVDIMM command definition is growing support
for applying live firmware updates that only require temporarily
suspending memory traffic instead of a full reboot.
Follow-on commits add support for triggering firmware activation, this
patch only defines the commands, adds probe support, and validates that
they are blocked via the ioctl path. The ioctl-path block ensures that
the OS is in charge since these commands have side effects only the OS
can handle. Specifically firmware activation may cause the memory
controller to be quiesced on the order of 100s of milliseconds. In that
case Linux ensure the activation only takes place while the OS is in a
suspend state.
Cc: Dan Williams <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://pmem.io/documents/IntelOptanePMem_DSM_Interface-V2.0.pdf
Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit/core.c | 86 ++++++++++++++++++++++++++++++--------------
drivers/acpi/nfit/intel.h | 53 +++++++++++++++++++++++++++
drivers/acpi/nfit/nfit.h | 25 ++++++++++++-
include/uapi/linux/ndctl.h | 3 +-
4 files changed, 137 insertions(+), 30 deletions(-)
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 9fdd655bdf0e..78cc9e2d2aa3 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -73,6 +73,18 @@ const guid_t *to_nfit_uuid(enum nfit_uuids id)
}
EXPORT_SYMBOL(to_nfit_uuid);
+static const guid_t *to_nfit_bus_uuid(int family)
+{
+ if (WARN_ONCE(family == NVDIMM_BUS_FAMILY_NFIT,
+ "only secondary bus families can be translated\n"))
+ return NULL;
+ /*
+ * The index of bus UUIDs starts immediately following the last
+ * NVDIMM/leaf family.
+ */
+ return to_nfit_uuid(family + NVDIMM_FAMILY_MAX);
+}
+
static struct acpi_device *to_acpi_dev(struct acpi_nfit_desc *acpi_desc)
{
struct nvdimm_bus_descriptor *nd_desc = &acpi_desc->nd_desc;
@@ -362,24 +374,8 @@ static u8 nfit_dsm_revid(unsigned family, unsigned func)
{
static const u8 revid_table[NVDIMM_FAMILY_MAX+1][NVDIMM_CMD_MAX+1] = {
[NVDIMM_FAMILY_INTEL] = {
- [NVDIMM_INTEL_GET_MODES] = 2,
- [NVDIMM_INTEL_GET_FWINFO] = 2,
- [NVDIMM_INTEL_START_FWUPDATE] = 2,
- [NVDIMM_INTEL_SEND_FWUPDATE] = 2,
- [NVDIMM_INTEL_FINISH_FWUPDATE] = 2,
- [NVDIMM_INTEL_QUERY_FWUPDATE] = 2,
- [NVDIMM_INTEL_SET_THRESHOLD] = 2,
- [NVDIMM_INTEL_INJECT_ERROR] = 2,
- [NVDIMM_INTEL_GET_SECURITY_STATE] = 2,
- [NVDIMM_INTEL_SET_PASSPHRASE] = 2,
- [NVDIMM_INTEL_DISABLE_PASSPHRASE] = 2,
- [NVDIMM_INTEL_UNLOCK_UNIT] = 2,
- [NVDIMM_INTEL_FREEZE_LOCK] = 2,
- [NVDIMM_INTEL_SECURE_ERASE] = 2,
- [NVDIMM_INTEL_OVERWRITE] = 2,
- [NVDIMM_INTEL_QUERY_OVERWRITE] = 2,
- [NVDIMM_INTEL_SET_MASTER_PASSPHRASE] = 2,
- [NVDIMM_INTEL_MASTER_SECURE_ERASE] = 2,
+ [NVDIMM_INTEL_GET_MODES ...
+ NVDIMM_INTEL_FW_ACTIVATE_ARM] = 2,
},
};
u8 id;
@@ -406,7 +402,7 @@ static bool payload_dumpable(struct nvdimm *nvdimm, unsigned int func)
}
static int cmd_to_func(struct nfit_mem *nfit_mem, unsigned int cmd,
- struct nd_cmd_pkg *call_pkg)
+ struct nd_cmd_pkg *call_pkg, int *family)
{
if (call_pkg) {
int i;
@@ -417,6 +413,7 @@ static int cmd_to_func(struct nfit_mem *nfit_mem, unsigned int cmd,
for (i = 0; i < ARRAY_SIZE(call_pkg->nd_reserved2); i++)
if (call_pkg->nd_reserved2[i])
return -EINVAL;
+ *family = call_pkg->nd_family;
return call_pkg->nd_command;
}
@@ -450,13 +447,14 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm,
acpi_handle handle;
const guid_t *guid;
int func, rc, i;
+ int family = 0;
if (cmd_rc)
*cmd_rc = -EINVAL;
if (cmd == ND_CMD_CALL)
call_pkg = buf;
- func = cmd_to_func(nfit_mem, cmd, call_pkg);
+ func = cmd_to_func(nfit_mem, cmd, call_pkg, &family);
if (func < 0)
return func;
@@ -478,9 +476,17 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm,
cmd_name = nvdimm_bus_cmd_name(cmd);
cmd_mask = nd_desc->cmd_mask;
- dsm_mask = acpi_desc->bus_dsm_mask;
+ if (cmd == ND_CMD_CALL && call_pkg->nd_family) {
+ family = call_pkg->nd_family;
+ if (!test_bit(family, &nd_desc->bus_family_mask))
+ return -EINVAL;
+ dsm_mask = acpi_desc->family_dsm_mask[family];
+ guid = to_nfit_bus_uuid(family);
+ } else {
+ dsm_mask = acpi_desc->bus_dsm_mask;
+ guid = to_nfit_uuid(NFIT_DEV_BUS);
+ }
desc = nd_cmd_bus_desc(cmd);
- guid = to_nfit_uuid(NFIT_DEV_BUS);
handle = adev->handle;
dimm_name = "bus";
}
@@ -516,8 +522,8 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm,
in_buf.buffer.length = call_pkg->nd_size_in;
}
- dev_dbg(dev, "%s cmd: %d: func: %d input length: %d\n",
- dimm_name, cmd, func, in_buf.buffer.length);
+ dev_dbg(dev, "%s cmd: %d: family: %d func: %d input length: %d\n",
+ dimm_name, cmd, family, func, in_buf.buffer.length);
if (payload_dumpable(nvdimm, func))
print_hex_dump_debug("nvdimm in ", DUMP_PREFIX_OFFSET, 4, 4,
in_buf.buffer.pointer,
@@ -2153,14 +2159,21 @@ static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc)
{
struct nvdimm_bus_descriptor *nd_desc = &acpi_desc->nd_desc;
const guid_t *guid = to_nfit_uuid(NFIT_DEV_BUS);
+ unsigned long dsm_mask, *mask;
struct acpi_device *adev;
- unsigned long dsm_mask;
int i;
- nd_desc->cmd_mask = acpi_desc->bus_cmd_force_en;
set_bit(ND_CMD_CALL, &nd_desc->cmd_mask);
set_bit(NVDIMM_BUS_FAMILY_NFIT, &nd_desc->bus_family_mask);
+ /* enable nfit_test to inject bus command emulation */
+ if (acpi_desc->bus_cmd_force_en) {
+ nd_desc->cmd_mask = acpi_desc->bus_cmd_force_en;
+ mask = &nd_desc->bus_family_mask;
+ if (acpi_desc->family_dsm_mask[NVDIMM_BUS_FAMILY_INTEL])
+ set_bit(NVDIMM_BUS_FAMILY_INTEL, mask);
+ }
+
adev = to_acpi_dev(acpi_desc);
if (!adev)
return;
@@ -2181,6 +2194,14 @@ static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc)
for_each_set_bit(i, &dsm_mask, BITS_PER_LONG)
if (acpi_check_dsm(adev->handle, guid, 1, 1ULL << i))
set_bit(i, &acpi_desc->bus_dsm_mask);
+
+ /* Enumerate allowed NVDIMM_BUS_FAMILY_INTEL commands */
+ dsm_mask = NVDIMM_BUS_INTEL_FW_ACTIVATE_CMDMASK;
+ guid = to_nfit_bus_uuid(NVDIMM_BUS_FAMILY_INTEL);
+ mask = &acpi_desc->family_dsm_mask[NVDIMM_BUS_FAMILY_INTEL];
+ for_each_set_bit(i, &dsm_mask, BITS_PER_LONG)
+ if (acpi_check_dsm(adev->handle, guid, 1, 1ULL << i))
+ set_bit(i, mask);
}
static ssize_t range_index_show(struct device *dev,
@@ -3492,7 +3513,10 @@ static int __acpi_nfit_clear_to_send(struct nvdimm_bus_descriptor *nd_desc,
return 0;
}
-/* prevent security commands from being issued via ioctl */
+/*
+ * Prevent security and firmware activate commands from being issued via
+ * ioctl.
+ */
static int acpi_nfit_clear_to_send(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *buf)
{
@@ -3503,10 +3527,15 @@ static int acpi_nfit_clear_to_send(struct nvdimm_bus_descriptor *nd_desc,
call_pkg->nd_family == NVDIMM_FAMILY_INTEL) {
func = call_pkg->nd_command;
if (func > NVDIMM_CMD_MAX ||
- (1 << func) & NVDIMM_INTEL_SECURITY_CMDMASK)
+ (1 << func) & NVDIMM_INTEL_DENY_CMDMASK)
return -EOPNOTSUPP;
}
+ /* block all non-nfit bus commands */
+ if (!nvdimm && cmd == ND_CMD_CALL &&
+ call_pkg->nd_family != NVDIMM_BUS_FAMILY_NFIT)
+ return -EOPNOTSUPP;
+
return __acpi_nfit_clear_to_send(nd_desc, nvdimm, cmd);
}
@@ -3798,6 +3827,7 @@ static __init int nfit_init(void)
guid_parse(UUID_NFIT_DIMM_N_HPE2, &nfit_uuid[NFIT_DEV_DIMM_N_HPE2]);
guid_parse(UUID_NFIT_DIMM_N_MSFT, &nfit_uuid[NFIT_DEV_DIMM_N_MSFT]);
guid_parse(UUID_NFIT_DIMM_N_HYPERV, &nfit_uuid[NFIT_DEV_DIMM_N_HYPERV]);
+ guid_parse(UUID_INTEL_BUS, &nfit_uuid[NFIT_BUS_INTEL]);
nfit_wq = create_singlethread_workqueue("nfit");
if (!nfit_wq)
diff --git a/drivers/acpi/nfit/intel.h b/drivers/acpi/nfit/intel.h
index 0aca682ab9d7..868d073731cc 100644
--- a/drivers/acpi/nfit/intel.h
+++ b/drivers/acpi/nfit/intel.h
@@ -111,4 +111,57 @@ struct nd_intel_master_secure_erase {
u8 passphrase[ND_INTEL_PASSPHRASE_SIZE];
u32 status;
} __packed;
+
+#define ND_INTEL_FWA_IDLE 0
+#define ND_INTEL_FWA_ARMED 1
+#define ND_INTEL_FWA_BUSY 2
+
+#define ND_INTEL_DIMM_FWA_NONE 0
+#define ND_INTEL_DIMM_FWA_NOTSTAGED 1
+#define ND_INTEL_DIMM_FWA_SUCCESS 2
+#define ND_INTEL_DIMM_FWA_NEEDRESET 3
+#define ND_INTEL_DIMM_FWA_MEDIAFAILED 4
+#define ND_INTEL_DIMM_FWA_ABORT 5
+#define ND_INTEL_DIMM_FWA_NOTSUPP 6
+#define ND_INTEL_DIMM_FWA_ERROR 7
+
+struct nd_intel_fw_activate_dimminfo {
+ u32 status;
+ u16 result;
+ u8 state;
+ u8 reserved[7];
+} __packed;
+
+struct nd_intel_fw_activate_arm {
+ u8 activate_arm;
+ u32 status;
+} __packed;
+
+/* Root device command payloads */
+#define ND_INTEL_BUS_FWA_CAP_FWQUIESCE (1 << 0)
+#define ND_INTEL_BUS_FWA_CAP_OSQUIESCE (1 << 1)
+#define ND_INTEL_BUS_FWA_CAP_RESET (1 << 2)
+
+struct nd_intel_bus_fw_activate_businfo {
+ u32 status;
+ u16 reserved;
+ u8 state;
+ u8 capability;
+ u64 activate_tmo;
+ u64 cpu_quiesce_tmo;
+ u64 io_quiesce_tmo;
+ u64 max_quiesce_tmo;
+} __packed;
+
+#define ND_INTEL_BUS_FWA_STATUS_NOARM (6 | 1 << 16)
+#define ND_INTEL_BUS_FWA_STATUS_BUSY (6 | 2 << 16)
+#define ND_INTEL_BUS_FWA_STATUS_NOFW (6 | 3 << 16)
+#define ND_INTEL_BUS_FWA_STATUS_TMO (6 | 4 << 16)
+#define ND_INTEL_BUS_FWA_STATUS_NOIDLE (6 | 5 << 16)
+#define ND_INTEL_BUS_FWA_STATUS_ABORT (6 | 6 << 16)
+
+struct nd_intel_bus_fw_activate {
+ u8 iodev_state;
+ u32 status;
+} __packed;
#endif
diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
index da097149d94d..97c122628975 100644
--- a/drivers/acpi/nfit/nfit.h
+++ b/drivers/acpi/nfit/nfit.h
@@ -18,6 +18,7 @@
/* http://pmem.io/documents/NVDIMM_DSM_Interface-V1.6.pdf */
#define UUID_NFIT_DIMM "4309ac30-0d11-11e4-9191-0800200c9a66"
+#define UUID_INTEL_BUS "c7d8acd4-2df8-4b82-9f65-a325335af149"
/* https://github.com/HewlettPackard/hpe-nvm/blob/master/Documentation/ */
#define UUID_NFIT_DIMM_N_HPE1 "9002c334-acf3-4c0e-9642-a235f0d53bc6"
@@ -65,6 +66,13 @@ enum nvdimm_family_cmds {
NVDIMM_INTEL_QUERY_OVERWRITE = 26,
NVDIMM_INTEL_SET_MASTER_PASSPHRASE = 27,
NVDIMM_INTEL_MASTER_SECURE_ERASE = 28,
+ NVDIMM_INTEL_FW_ACTIVATE_DIMMINFO = 29,
+ NVDIMM_INTEL_FW_ACTIVATE_ARM = 30,
+};
+
+enum nvdimm_bus_family_cmds {
+ NVDIMM_BUS_INTEL_FW_ACTIVATE_BUSINFO = 1,
+ NVDIMM_BUS_INTEL_FW_ACTIVATE = 2,
};
#define NVDIMM_INTEL_SECURITY_CMDMASK \
@@ -75,13 +83,22 @@ enum nvdimm_family_cmds {
| 1 << NVDIMM_INTEL_SET_MASTER_PASSPHRASE \
| 1 << NVDIMM_INTEL_MASTER_SECURE_ERASE)
+#define NVDIMM_INTEL_FW_ACTIVATE_CMDMASK \
+(1 << NVDIMM_INTEL_FW_ACTIVATE_DIMMINFO | 1 << NVDIMM_INTEL_FW_ACTIVATE_ARM)
+
+#define NVDIMM_BUS_INTEL_FW_ACTIVATE_CMDMASK \
+(1 << NVDIMM_BUS_INTEL_FW_ACTIVATE_BUSINFO | 1 << NVDIMM_BUS_INTEL_FW_ACTIVATE)
+
#define NVDIMM_INTEL_CMDMASK \
(NVDIMM_STANDARD_CMDMASK | 1 << NVDIMM_INTEL_GET_MODES \
| 1 << NVDIMM_INTEL_GET_FWINFO | 1 << NVDIMM_INTEL_START_FWUPDATE \
| 1 << NVDIMM_INTEL_SEND_FWUPDATE | 1 << NVDIMM_INTEL_FINISH_FWUPDATE \
| 1 << NVDIMM_INTEL_QUERY_FWUPDATE | 1 << NVDIMM_INTEL_SET_THRESHOLD \
| 1 << NVDIMM_INTEL_INJECT_ERROR | 1 << NVDIMM_INTEL_LATCH_SHUTDOWN \
- | NVDIMM_INTEL_SECURITY_CMDMASK)
+ | NVDIMM_INTEL_SECURITY_CMDMASK | NVDIMM_INTEL_FW_ACTIVATE_CMDMASK)
+
+#define NVDIMM_INTEL_DENY_CMDMASK \
+(NVDIMM_INTEL_SECURITY_CMDMASK | NVDIMM_INTEL_FW_ACTIVATE_CMDMASK)
enum nfit_uuids {
/* for simplicity alias the uuid index with the family id */
@@ -90,6 +107,11 @@ enum nfit_uuids {
NFIT_DEV_DIMM_N_HPE2 = NVDIMM_FAMILY_HPE2,
NFIT_DEV_DIMM_N_MSFT = NVDIMM_FAMILY_MSFT,
NFIT_DEV_DIMM_N_HYPERV = NVDIMM_FAMILY_HYPERV,
+ /*
+ * to_nfit_bus_uuid() expects to translate bus uuid family ids
+ * to a UUID index using NVDIMM_FAMILY_MAX as an offset
+ */
+ NFIT_BUS_INTEL = NVDIMM_FAMILY_MAX + NVDIMM_BUS_FAMILY_INTEL,
NFIT_SPA_VOLATILE,
NFIT_SPA_PM,
NFIT_SPA_DCR,
@@ -238,6 +260,7 @@ struct acpi_nfit_desc {
unsigned long dimm_cmd_force_en;
unsigned long bus_cmd_force_en;
unsigned long bus_dsm_mask;
+ unsigned long family_dsm_mask[NVDIMM_BUS_FAMILY_MAX + 1];
unsigned int platform_cap;
unsigned int scrub_tmo;
int (*blk_do_io)(struct nd_blk_region *ndbr, resource_size_t dpa,
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index e9468b9332bd..8cf1e4884fd5 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -248,7 +248,8 @@ struct nd_cmd_pkg {
#define NVDIMM_FAMILY_MAX NVDIMM_FAMILY_PAPR
#define NVDIMM_BUS_FAMILY_NFIT 0
-#define NVDIMM_BUS_FAMILY_MAX NVDIMM_BUS_FAMILY_NFIT
+#define NVDIMM_BUS_FAMILY_INTEL 1
+#define NVDIMM_BUS_FAMILY_MAX NVDIMM_BUS_FAMILY_INTEL
#define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
The ND_CMD_CALL path only applies to the nfit_test0 emulated DIMMs.
Cleanup occurrences of (i - t->dcr_idx) since that offset fixup only
applies to cases where nfit_test1 needs a bus-local index.
Signed-off-by: Dan Williams <[email protected]>
---
tools/testing/nvdimm/test/nfit.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index a59174ba1d2a..ddf9b3095bfa 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -1224,6 +1224,11 @@ static int nfit_test_ctl(struct nvdimm_bus_descriptor *nd_desc,
i = get_dimm(nfit_mem, func);
if (i < 0)
return i;
+ if (i >= NUM_DCR) {
+ dev_WARN_ONCE(&t->pdev.dev, 1,
+ "ND_CMD_CALL only valid for nfit_test0\n");
+ return -EINVAL;
+ }
switch (func) {
case NVDIMM_INTEL_GET_SECURITY_STATE:
@@ -1252,11 +1257,11 @@ static int nfit_test_ctl(struct nvdimm_bus_descriptor *nd_desc,
break;
case NVDIMM_INTEL_OVERWRITE:
rc = nd_intel_test_cmd_overwrite(t,
- buf, buf_len, i - t->dcr_idx);
+ buf, buf_len, i);
break;
case NVDIMM_INTEL_QUERY_OVERWRITE:
rc = nd_intel_test_cmd_query_overwrite(t,
- buf, buf_len, i - t->dcr_idx);
+ buf, buf_len, i);
break;
case NVDIMM_INTEL_SET_MASTER_PASSPHRASE:
rc = nd_intel_test_cmd_master_set_pass(t,
@@ -1272,48 +1277,45 @@ static int nfit_test_ctl(struct nvdimm_bus_descriptor *nd_desc,
break;
case ND_INTEL_FW_GET_INFO:
rc = nd_intel_test_get_fw_info(t, buf,
- buf_len, i - t->dcr_idx);
+ buf_len, i);
break;
case ND_INTEL_FW_START_UPDATE:
rc = nd_intel_test_start_update(t, buf,
- buf_len, i - t->dcr_idx);
+ buf_len, i);
break;
case ND_INTEL_FW_SEND_DATA:
rc = nd_intel_test_send_data(t, buf,
- buf_len, i - t->dcr_idx);
+ buf_len, i);
break;
case ND_INTEL_FW_FINISH_UPDATE:
rc = nd_intel_test_finish_fw(t, buf,
- buf_len, i - t->dcr_idx);
+ buf_len, i);
break;
case ND_INTEL_FW_FINISH_QUERY:
rc = nd_intel_test_finish_query(t, buf,
- buf_len, i - t->dcr_idx);
+ buf_len, i);
break;
case ND_INTEL_SMART:
rc = nfit_test_cmd_smart(buf, buf_len,
- &t->smart[i - t->dcr_idx]);
+ &t->smart[i]);
break;
case ND_INTEL_SMART_THRESHOLD:
rc = nfit_test_cmd_smart_threshold(buf,
buf_len,
- &t->smart_threshold[i -
- t->dcr_idx]);
+ &t->smart_threshold[i]);
break;
case ND_INTEL_SMART_SET_THRESHOLD:
rc = nfit_test_cmd_smart_set_threshold(buf,
buf_len,
- &t->smart_threshold[i -
- t->dcr_idx],
- &t->smart[i - t->dcr_idx],
+ &t->smart_threshold[i],
+ &t->smart[i],
&t->pdev.dev, t->dimm_dev[i]);
break;
case ND_INTEL_SMART_INJECT:
rc = nfit_test_cmd_smart_inject(buf,
buf_len,
- &t->smart_threshold[i -
- t->dcr_idx],
- &t->smart[i - t->dcr_idx],
+ &t->smart_threshold[i],
+ &t->smart[i],
&t->pdev.dev, t->dimm_dev[i]);
break;
default:
Arrange the for nfit_test_ctl() path to dump command payloads similarly
to the acpi_nfit_ctl() path. This is useful for comparing the
sequence of command events between an emulated ACPI-NFIT platform and a
real one.
Signed-off-by: Dan Williams <[email protected]>
---
tools/testing/nvdimm/test/nfit.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index ddf9b3095bfa..9c6f475befe4 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -1192,6 +1192,29 @@ static int get_dimm(struct nfit_mem *nfit_mem, unsigned int func)
return i;
}
+static void nfit_ctl_dbg(struct acpi_nfit_desc *acpi_desc,
+ struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+ unsigned int len)
+{
+ struct nfit_test *t = container_of(acpi_desc, typeof(*t), acpi_desc);
+ unsigned int func = cmd;
+ unsigned int family = 0;
+
+ if (cmd == ND_CMD_CALL) {
+ struct nd_cmd_pkg *pkg = buf;
+
+ len = pkg->nd_size_in;
+ family = pkg->nd_family;
+ buf = pkg->nd_payload;
+ func = pkg->nd_command;
+ }
+ dev_dbg(&t->pdev.dev, "%s family: %d cmd: %d: func: %d input length: %d\n",
+ nvdimm ? nvdimm_name(nvdimm) : "bus", family, cmd, func,
+ len);
+ print_hex_dump_debug("nvdimm in ", DUMP_PREFIX_OFFSET, 16, 4,
+ buf, min(len, 256u), true);
+}
+
static int nfit_test_ctl(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *buf,
unsigned int buf_len, int *cmd_rc)
@@ -1205,6 +1228,8 @@ static int nfit_test_ctl(struct nvdimm_bus_descriptor *nd_desc,
cmd_rc = &__cmd_rc;
*cmd_rc = 0;
+ nfit_ctl_dbg(acpi_desc, nvdimm, cmd, buf, buf_len);
+
if (nvdimm) {
struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm);
unsigned long cmd_mask = nvdimm_cmd_mask(nvdimm);
In preparation for adding a mocked implementation of the
firmware-activate bus-info command, rework nfit_ctl_test() to operate on
a local command payload wrapped in a 'struct nd_cmd_pkg'.
Signed-off-by: Dan Williams <[email protected]>
---
tools/testing/nvdimm/test/nfit.c | 83 ++++++++++++++++++++------------------
1 file changed, 43 insertions(+), 40 deletions(-)
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index 9c6f475befe4..2b0bfbfc0abb 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -2726,14 +2726,17 @@ static int nfit_ctl_test(struct device *dev)
struct acpi_nfit_desc *acpi_desc;
const u64 test_val = 0x0123456789abcdefULL;
unsigned long mask, cmd_size, offset;
- union {
- struct nd_cmd_get_config_size cfg_size;
- struct nd_cmd_clear_error clear_err;
- struct nd_cmd_ars_status ars_stat;
- struct nd_cmd_ars_cap ars_cap;
- char buf[sizeof(struct nd_cmd_ars_status)
- + sizeof(struct nd_ars_record)];
- } cmds;
+ struct nfit_ctl_test_cmd {
+ struct nd_cmd_pkg pkg;
+ union {
+ struct nd_cmd_get_config_size cfg_size;
+ struct nd_cmd_clear_error clear_err;
+ struct nd_cmd_ars_status ars_stat;
+ struct nd_cmd_ars_cap ars_cap;
+ char buf[sizeof(struct nd_cmd_ars_status)
+ + sizeof(struct nd_ars_record)];
+ };
+ } cmd;
adev = devm_kzalloc(dev, sizeof(*adev), GFP_KERNEL);
if (!adev)
@@ -2793,21 +2796,21 @@ static int nfit_ctl_test(struct device *dev)
/* basic checkout of a typical 'get config size' command */
- cmd_size = sizeof(cmds.cfg_size);
- cmds.cfg_size = (struct nd_cmd_get_config_size) {
+ cmd_size = sizeof(cmd.cfg_size);
+ cmd.cfg_size = (struct nd_cmd_get_config_size) {
.status = 0,
.config_size = SZ_128K,
.max_xfer = SZ_4K,
};
- rc = setup_result(cmds.buf, cmd_size);
+ rc = setup_result(cmd.buf, cmd_size);
if (rc)
return rc;
rc = acpi_nfit_ctl(&acpi_desc->nd_desc, nvdimm, ND_CMD_GET_CONFIG_SIZE,
- cmds.buf, cmd_size, &cmd_rc);
+ cmd.buf, cmd_size, &cmd_rc);
- if (rc < 0 || cmd_rc || cmds.cfg_size.status != 0
- || cmds.cfg_size.config_size != SZ_128K
- || cmds.cfg_size.max_xfer != SZ_4K) {
+ if (rc < 0 || cmd_rc || cmd.cfg_size.status != 0
+ || cmd.cfg_size.config_size != SZ_128K
+ || cmd.cfg_size.max_xfer != SZ_4K) {
dev_dbg(dev, "%s: failed at: %d rc: %d cmd_rc: %d\n",
__func__, __LINE__, rc, cmd_rc);
return -EIO;
@@ -2816,14 +2819,14 @@ static int nfit_ctl_test(struct device *dev)
/* test ars_status with zero output */
cmd_size = offsetof(struct nd_cmd_ars_status, address);
- cmds.ars_stat = (struct nd_cmd_ars_status) {
+ cmd.ars_stat = (struct nd_cmd_ars_status) {
.out_length = 0,
};
- rc = setup_result(cmds.buf, cmd_size);
+ rc = setup_result(cmd.buf, cmd_size);
if (rc)
return rc;
rc = acpi_nfit_ctl(&acpi_desc->nd_desc, NULL, ND_CMD_ARS_STATUS,
- cmds.buf, cmd_size, &cmd_rc);
+ cmd.buf, cmd_size, &cmd_rc);
if (rc < 0 || cmd_rc) {
dev_dbg(dev, "%s: failed at: %d rc: %d cmd_rc: %d\n",
@@ -2833,16 +2836,16 @@ static int nfit_ctl_test(struct device *dev)
/* test ars_cap with benign extended status */
- cmd_size = sizeof(cmds.ars_cap);
- cmds.ars_cap = (struct nd_cmd_ars_cap) {
+ cmd_size = sizeof(cmd.ars_cap);
+ cmd.ars_cap = (struct nd_cmd_ars_cap) {
.status = ND_ARS_PERSISTENT << 16,
};
offset = offsetof(struct nd_cmd_ars_cap, status);
- rc = setup_result(cmds.buf + offset, cmd_size - offset);
+ rc = setup_result(cmd.buf + offset, cmd_size - offset);
if (rc)
return rc;
rc = acpi_nfit_ctl(&acpi_desc->nd_desc, NULL, ND_CMD_ARS_CAP,
- cmds.buf, cmd_size, &cmd_rc);
+ cmd.buf, cmd_size, &cmd_rc);
if (rc < 0 || cmd_rc) {
dev_dbg(dev, "%s: failed at: %d rc: %d cmd_rc: %d\n",
@@ -2852,19 +2855,19 @@ static int nfit_ctl_test(struct device *dev)
/* test ars_status with 'status' trimmed from 'out_length' */
- cmd_size = sizeof(cmds.ars_stat) + sizeof(struct nd_ars_record);
- cmds.ars_stat = (struct nd_cmd_ars_status) {
+ cmd_size = sizeof(cmd.ars_stat) + sizeof(struct nd_ars_record);
+ cmd.ars_stat = (struct nd_cmd_ars_status) {
.out_length = cmd_size - 4,
};
- record = &cmds.ars_stat.records[0];
+ record = &cmd.ars_stat.records[0];
*record = (struct nd_ars_record) {
.length = test_val,
};
- rc = setup_result(cmds.buf, cmd_size);
+ rc = setup_result(cmd.buf, cmd_size);
if (rc)
return rc;
rc = acpi_nfit_ctl(&acpi_desc->nd_desc, NULL, ND_CMD_ARS_STATUS,
- cmds.buf, cmd_size, &cmd_rc);
+ cmd.buf, cmd_size, &cmd_rc);
if (rc < 0 || cmd_rc || record->length != test_val) {
dev_dbg(dev, "%s: failed at: %d rc: %d cmd_rc: %d\n",
@@ -2874,19 +2877,19 @@ static int nfit_ctl_test(struct device *dev)
/* test ars_status with 'Output (Size)' including 'status' */
- cmd_size = sizeof(cmds.ars_stat) + sizeof(struct nd_ars_record);
- cmds.ars_stat = (struct nd_cmd_ars_status) {
+ cmd_size = sizeof(cmd.ars_stat) + sizeof(struct nd_ars_record);
+ cmd.ars_stat = (struct nd_cmd_ars_status) {
.out_length = cmd_size,
};
- record = &cmds.ars_stat.records[0];
+ record = &cmd.ars_stat.records[0];
*record = (struct nd_ars_record) {
.length = test_val,
};
- rc = setup_result(cmds.buf, cmd_size);
+ rc = setup_result(cmd.buf, cmd_size);
if (rc)
return rc;
rc = acpi_nfit_ctl(&acpi_desc->nd_desc, NULL, ND_CMD_ARS_STATUS,
- cmds.buf, cmd_size, &cmd_rc);
+ cmd.buf, cmd_size, &cmd_rc);
if (rc < 0 || cmd_rc || record->length != test_val) {
dev_dbg(dev, "%s: failed at: %d rc: %d cmd_rc: %d\n",
@@ -2896,15 +2899,15 @@ static int nfit_ctl_test(struct device *dev)
/* test extended status for get_config_size results in failure */
- cmd_size = sizeof(cmds.cfg_size);
- cmds.cfg_size = (struct nd_cmd_get_config_size) {
+ cmd_size = sizeof(cmd.cfg_size);
+ cmd.cfg_size = (struct nd_cmd_get_config_size) {
.status = 1 << 16,
};
- rc = setup_result(cmds.buf, cmd_size);
+ rc = setup_result(cmd.buf, cmd_size);
if (rc)
return rc;
rc = acpi_nfit_ctl(&acpi_desc->nd_desc, nvdimm, ND_CMD_GET_CONFIG_SIZE,
- cmds.buf, cmd_size, &cmd_rc);
+ cmd.buf, cmd_size, &cmd_rc);
if (rc < 0 || cmd_rc >= 0) {
dev_dbg(dev, "%s: failed at: %d rc: %d cmd_rc: %d\n",
@@ -2913,16 +2916,16 @@ static int nfit_ctl_test(struct device *dev)
}
/* test clear error */
- cmd_size = sizeof(cmds.clear_err);
- cmds.clear_err = (struct nd_cmd_clear_error) {
+ cmd_size = sizeof(cmd.clear_err);
+ cmd.clear_err = (struct nd_cmd_clear_error) {
.length = 512,
.cleared = 512,
};
- rc = setup_result(cmds.buf, cmd_size);
+ rc = setup_result(cmd.buf, cmd_size);
if (rc)
return rc;
rc = acpi_nfit_ctl(&acpi_desc->nd_desc, NULL, ND_CMD_CLEAR_ERROR,
- cmds.buf, cmd_size, &cmd_rc);
+ cmd.buf, cmd_size, &cmd_rc);
if (rc < 0 || cmd_rc) {
dev_dbg(dev, "%s: failed at: %d rc: %d cmd_rc: %d\n",
__func__, __LINE__, rc, cmd_rc);
A common pattern for using plain DEVICE_ATTR() instead of
DEVICE_ATTR_RO() and DEVICE_ATTR_RW() is for attributes that want to
limit read to only root. I.e. many users of DEVICE_ATTR() are
specifying 0400 or 0600 for permissions.
Given the expectation that CAP_SYS_ADMIN is needed to access these
sensitive attributes add an explicit helper with the _ADMIN_ identifier
for DEVICE_ATTR_ADMIN_{RO,RW}.
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
include/linux/device.h | 4 ++++
include/linux/sysfs.h | 7 +++++++
2 files changed, 11 insertions(+)
diff --git a/include/linux/device.h b/include/linux/device.h
index 15460a5ac024..d7c2570368fa 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -128,8 +128,12 @@ ssize_t device_store_bool(struct device *dev, struct device_attribute *attr,
__ATTR_PREALLOC(_name, _mode, _show, _store)
#define DEVICE_ATTR_RW(_name) \
struct device_attribute dev_attr_##_name = __ATTR_RW(_name)
+#define DEVICE_ATTR_ADMIN_RW(_name) \
+ struct device_attribute dev_attr_##_name = __ATTR_RW_MODE(_name, 0600)
#define DEVICE_ATTR_RO(_name) \
struct device_attribute dev_attr_##_name = __ATTR_RO(_name)
+#define DEVICE_ATTR_ADMIN_RO(_name) \
+ struct device_attribute dev_attr_##_name = __ATTR_RO_MODE(_name, 0400)
#define DEVICE_ATTR_WO(_name) \
struct device_attribute dev_attr_##_name = __ATTR_WO(_name)
#define DEVICE_ULONG_ATTR(_name, _mode, _var) \
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 86067dbe7745..34e84122f635 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -123,6 +123,13 @@ struct attribute_group {
.show = _name##_show, \
}
+#define __ATTR_RW_MODE(_name, _mode) { \
+ .attr = { .name = __stringify(_name), \
+ .mode = VERIFY_OCTAL_PERMISSIONS(_mode) }, \
+ .show = _name##_show, \
+ .store = _name##_store, \
+}
+
#define __ATTR_WO(_name) { \
.attr = { .name = __stringify(_name), .mode = 0200 }, \
.store = _name##_store, \
Augment the existing firmware update emulation to track activations and
validate proper update vs activate sequencing.
The DIMM firmware activate capability has a concept of a maximum amount
of time platform firmware will quiesce the system relative to how many
DIMMs are being activated in parallel. Simulate that DIMM activation
happens serially, 1 second per-DIMM, and limit the max at 3 seconds. The
nfit_test0 bus emulates 5 DIMMs so it will take 2 activations to update
all DIMMs.
Reported-by: Andy Shevchenko <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit/intel.h | 5 +
tools/testing/nvdimm/test/nfit.c | 209 +++++++++++++++++++++++++++++++++++++-
2 files changed, 210 insertions(+), 4 deletions(-)
diff --git a/drivers/acpi/nfit/intel.h b/drivers/acpi/nfit/intel.h
index 868d073731cc..49a598623024 100644
--- a/drivers/acpi/nfit/intel.h
+++ b/drivers/acpi/nfit/intel.h
@@ -132,6 +132,9 @@ struct nd_intel_fw_activate_dimminfo {
u8 reserved[7];
} __packed;
+#define ND_INTEL_DIMM_FWA_ARM 1
+#define ND_INTEL_DIMM_FWA_DISARM 0
+
struct nd_intel_fw_activate_arm {
u8 activate_arm;
u32 status;
@@ -160,6 +163,8 @@ struct nd_intel_bus_fw_activate_businfo {
#define ND_INTEL_BUS_FWA_STATUS_NOIDLE (6 | 5 << 16)
#define ND_INTEL_BUS_FWA_STATUS_ABORT (6 | 6 << 16)
+#define ND_INTEL_BUS_FWA_IODEV_FORCE_IDLE (0)
+#define ND_INTEL_BUS_FWA_IODEV_OS_IDLE (1)
struct nd_intel_bus_fw_activate {
u8 iodev_state;
u32 status;
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index 2b0bfbfc0abb..a1a5dc645b40 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -173,6 +173,9 @@ struct nfit_test_fw {
u64 version;
u32 size_received;
u64 end_time;
+ bool armed;
+ bool missed_activate;
+ unsigned long last_activate;
};
struct nfit_test {
@@ -345,7 +348,7 @@ static int nd_intel_test_finish_fw(struct nfit_test *t,
__func__, t, nd_cmd, buf_len, idx);
if (fw->state == FW_STATE_UPDATED) {
- /* update already done, need cold boot */
+ /* update already done, need activation */
nd_cmd->status = 0x20007;
return 0;
}
@@ -430,6 +433,7 @@ static int nd_intel_test_finish_query(struct nfit_test *t,
}
dev_dbg(dev, "%s: transition out verify\n", __func__);
fw->state = FW_STATE_UPDATED;
+ fw->missed_activate = false;
/* fall through */
case FW_STATE_UPDATED:
nd_cmd->status = 0;
@@ -1178,6 +1182,134 @@ static int nd_intel_test_cmd_master_secure_erase(struct nfit_test *t,
return 0;
}
+static unsigned long last_activate;
+
+static int nvdimm_bus_intel_fw_activate_businfo(struct nfit_test *t,
+ struct nd_intel_bus_fw_activate_businfo *nd_cmd,
+ unsigned int buf_len)
+{
+ int i, armed = 0;
+ int state;
+ u64 tmo;
+
+ for (i = 0; i < NUM_DCR; i++) {
+ struct nfit_test_fw *fw = &t->fw[i];
+
+ if (fw->armed)
+ armed++;
+ }
+
+ /*
+ * Emulate 3 second activation max, and 1 second incremental
+ * quiesce time per dimm requiring multiple activates to get all
+ * DIMMs updated.
+ */
+ if (armed)
+ state = ND_INTEL_FWA_ARMED;
+ else if (!last_activate || time_after(jiffies, last_activate + 3 * HZ))
+ state = ND_INTEL_FWA_IDLE;
+ else
+ state = ND_INTEL_FWA_BUSY;
+
+ tmo = armed * USEC_PER_SEC;
+ *nd_cmd = (struct nd_intel_bus_fw_activate_businfo) {
+ .capability = ND_INTEL_BUS_FWA_CAP_FWQUIESCE
+ | ND_INTEL_BUS_FWA_CAP_OSQUIESCE
+ | ND_INTEL_BUS_FWA_CAP_RESET,
+ .state = state,
+ .activate_tmo = tmo,
+ .cpu_quiesce_tmo = tmo,
+ .io_quiesce_tmo = tmo,
+ .max_quiesce_tmo = 3 * USEC_PER_SEC,
+ };
+
+ return 0;
+}
+
+static int nvdimm_bus_intel_fw_activate(struct nfit_test *t,
+ struct nd_intel_bus_fw_activate *nd_cmd,
+ unsigned int buf_len)
+{
+ struct nd_intel_bus_fw_activate_businfo info;
+ u32 status = 0;
+ int i;
+
+ nvdimm_bus_intel_fw_activate_businfo(t, &info, sizeof(info));
+ if (info.state == ND_INTEL_FWA_BUSY)
+ status = ND_INTEL_BUS_FWA_STATUS_BUSY;
+ else if (info.activate_tmo > info.max_quiesce_tmo)
+ status = ND_INTEL_BUS_FWA_STATUS_TMO;
+ else if (info.state == ND_INTEL_FWA_IDLE)
+ status = ND_INTEL_BUS_FWA_STATUS_NOARM;
+
+ dev_dbg(&t->pdev.dev, "status: %d\n", status);
+ nd_cmd->status = status;
+ if (status && status != ND_INTEL_BUS_FWA_STATUS_TMO)
+ return 0;
+
+ last_activate = jiffies;
+ for (i = 0; i < NUM_DCR; i++) {
+ struct nfit_test_fw *fw = &t->fw[i];
+
+ if (!fw->armed)
+ continue;
+ if (fw->state != FW_STATE_UPDATED)
+ fw->missed_activate = true;
+ else
+ fw->state = FW_STATE_NEW;
+ fw->armed = false;
+ fw->last_activate = last_activate;
+ }
+
+ return 0;
+}
+
+static int nd_intel_test_cmd_fw_activate_dimminfo(struct nfit_test *t,
+ struct nd_intel_fw_activate_dimminfo *nd_cmd,
+ unsigned int buf_len, int dimm)
+{
+ struct nd_intel_bus_fw_activate_businfo info;
+ struct nfit_test_fw *fw = &t->fw[dimm];
+ u32 result, state;
+
+ nvdimm_bus_intel_fw_activate_businfo(t, &info, sizeof(info));
+
+ if (info.state == ND_INTEL_FWA_BUSY)
+ state = ND_INTEL_FWA_BUSY;
+ else if (info.state == ND_INTEL_FWA_IDLE)
+ state = ND_INTEL_FWA_IDLE;
+ else if (fw->armed)
+ state = ND_INTEL_FWA_ARMED;
+ else
+ state = ND_INTEL_FWA_IDLE;
+
+ result = ND_INTEL_DIMM_FWA_NONE;
+ if (last_activate && fw->last_activate == last_activate &&
+ state == ND_INTEL_FWA_IDLE) {
+ if (fw->missed_activate)
+ result = ND_INTEL_DIMM_FWA_NOTSTAGED;
+ else
+ result = ND_INTEL_DIMM_FWA_SUCCESS;
+ }
+
+ *nd_cmd = (struct nd_intel_fw_activate_dimminfo) {
+ .result = result,
+ .state = state,
+ };
+
+ return 0;
+}
+
+static int nd_intel_test_cmd_fw_activate_arm(struct nfit_test *t,
+ struct nd_intel_fw_activate_arm *nd_cmd,
+ unsigned int buf_len, int dimm)
+{
+ struct nfit_test_fw *fw = &t->fw[dimm];
+
+ fw->armed = nd_cmd->activate_arm == ND_INTEL_DIMM_FWA_ARM;
+ nd_cmd->status = 0;
+ return 0;
+}
static int get_dimm(struct nfit_mem *nfit_mem, unsigned int func)
{
@@ -1296,6 +1428,14 @@ static int nfit_test_ctl(struct nvdimm_bus_descriptor *nd_desc,
rc = nd_intel_test_cmd_master_secure_erase(t,
buf, buf_len, i);
break;
+ case NVDIMM_INTEL_FW_ACTIVATE_DIMMINFO:
+ rc = nd_intel_test_cmd_fw_activate_dimminfo(
+ t, buf, buf_len, i);
+ break;
+ case NVDIMM_INTEL_FW_ACTIVATE_ARM:
+ rc = nd_intel_test_cmd_fw_activate_arm(
+ t, buf, buf_len, i);
+ break;
case ND_INTEL_ENABLE_LSS_STATUS:
rc = nd_intel_test_cmd_set_lss_status(t,
buf, buf_len);
@@ -1380,9 +1520,9 @@ static int nfit_test_ctl(struct nvdimm_bus_descriptor *nd_desc,
if (!nd_desc)
return -ENOTTY;
- if (cmd == ND_CMD_CALL) {
+ if (cmd == ND_CMD_CALL && call_pkg->nd_family
+ == NVDIMM_BUS_FAMILY_NFIT) {
func = call_pkg->nd_command;
-
buf_len = call_pkg->nd_size_in + call_pkg->nd_size_out;
buf = (void *) call_pkg->nd_payload;
@@ -1406,7 +1546,26 @@ static int nfit_test_ctl(struct nvdimm_bus_descriptor *nd_desc,
default:
return -ENOTTY;
}
- }
+ } else if (cmd == ND_CMD_CALL && call_pkg->nd_family
+ == NVDIMM_BUS_FAMILY_INTEL) {
+ func = call_pkg->nd_command;
+ buf_len = call_pkg->nd_size_in + call_pkg->nd_size_out;
+ buf = (void *) call_pkg->nd_payload;
+
+ switch (func) {
+ case NVDIMM_BUS_INTEL_FW_ACTIVATE_BUSINFO:
+ rc = nvdimm_bus_intel_fw_activate_businfo(t,
+ buf, buf_len);
+ return rc;
+ case NVDIMM_BUS_INTEL_FW_ACTIVATE:
+ rc = nvdimm_bus_intel_fw_activate(t, buf,
+ buf_len);
+ return rc;
+ default:
+ return -ENOTTY;
+ }
+ } else if (cmd == ND_CMD_CALL)
+ return -ENOTTY;
if (!nd_desc || !test_bit(cmd, &nd_desc->cmd_mask))
return -ENOTTY;
@@ -1832,6 +1991,7 @@ static void nfit_test0_setup(struct nfit_test *t)
struct acpi_nfit_flush_address *flush;
struct acpi_nfit_capabilities *pcap;
unsigned int offset = 0, i;
+ unsigned long *acpi_mask;
/*
* spa0 (interleave first half of dimm0 and dimm1, note storage
@@ -2558,6 +2718,12 @@ static void nfit_test0_setup(struct nfit_test *t)
&acpi_desc->dimm_cmd_force_en);
set_bit(NVDIMM_INTEL_MASTER_SECURE_ERASE,
&acpi_desc->dimm_cmd_force_en);
+ set_bit(NVDIMM_INTEL_FW_ACTIVATE_DIMMINFO, &acpi_desc->dimm_cmd_force_en);
+ set_bit(NVDIMM_INTEL_FW_ACTIVATE_ARM, &acpi_desc->dimm_cmd_force_en);
+
+ acpi_mask = &acpi_desc->family_dsm_mask[NVDIMM_BUS_FAMILY_INTEL];
+ set_bit(NVDIMM_BUS_INTEL_FW_ACTIVATE_BUSINFO, acpi_mask);
+ set_bit(NVDIMM_BUS_INTEL_FW_ACTIVATE, acpi_mask);
}
static void nfit_test1_setup(struct nfit_test *t)
@@ -2733,6 +2899,7 @@ static int nfit_ctl_test(struct device *dev)
struct nd_cmd_clear_error clear_err;
struct nd_cmd_ars_status ars_stat;
struct nd_cmd_ars_cap ars_cap;
+ struct nd_intel_bus_fw_activate_businfo fwa_info;
char buf[sizeof(struct nd_cmd_ars_status)
+ sizeof(struct nd_ars_record)];
};
@@ -2761,11 +2928,15 @@ static int nfit_ctl_test(struct device *dev)
.module = THIS_MODULE,
.provider_name = "ACPI.NFIT",
.ndctl = acpi_nfit_ctl,
+ .bus_family_mask = 1UL << NVDIMM_BUS_FAMILY_NFIT
+ | 1UL << NVDIMM_BUS_FAMILY_INTEL,
},
.bus_dsm_mask = 1UL << NFIT_CMD_TRANSLATE_SPA
| 1UL << NFIT_CMD_ARS_INJECT_SET
| 1UL << NFIT_CMD_ARS_INJECT_CLEAR
| 1UL << NFIT_CMD_ARS_INJECT_GET,
+ .family_dsm_mask[NVDIMM_BUS_FAMILY_INTEL] =
+ NVDIMM_BUS_INTEL_FW_ACTIVATE_CMDMASK,
.dev = &adev->dev,
};
@@ -2932,6 +3103,36 @@ static int nfit_ctl_test(struct device *dev)
return -EIO;
}
+ /* test firmware activate bus info */
+ cmd_size = sizeof(cmd.fwa_info);
+ cmd = (struct nfit_ctl_test_cmd) {
+ .pkg = {
+ .nd_command = NVDIMM_BUS_INTEL_FW_ACTIVATE_BUSINFO,
+ .nd_family = NVDIMM_BUS_FAMILY_INTEL,
+ .nd_size_out = cmd_size,
+ .nd_fw_size = cmd_size,
+ },
+ .fwa_info = {
+ .state = ND_INTEL_FWA_IDLE,
+ .capability = ND_INTEL_BUS_FWA_CAP_FWQUIESCE
+ | ND_INTEL_BUS_FWA_CAP_OSQUIESCE,
+ .activate_tmo = 1,
+ .cpu_quiesce_tmo = 1,
+ .io_quiesce_tmo = 1,
+ .max_quiesce_tmo = 1,
+ },
+ };
+ rc = setup_result(cmd.buf, cmd_size);
+ if (rc)
+ return rc;
+ rc = acpi_nfit_ctl(&acpi_desc->nd_desc, NULL, ND_CMD_CALL,
+ &cmd, sizeof(cmd.pkg) + cmd_size, &cmd_rc);
+ if (rc < 0 || cmd_rc) {
+ dev_dbg(dev, "%s: failed at: %d rc: %d cmd_rc: %d\n",
+ __func__, __LINE__, rc, cmd_rc);
+ return -EIO;
+ }
+
return 0;
}
Move libnvdimm sysfs attributes that currently use an open coded
DEVICE_ATTR() to hide sensitive root-only information (physical memory
layout) to the new DEVICE_ATTR_ADMIN_RO() helper.
Cc: Vishal Verma <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/nvdimm/namespace_devs.c | 2 +-
drivers/nvdimm/pfn_devs.c | 2 +-
drivers/nvdimm/region_devs.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index ae155e860fdc..6da67f4d641a 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1309,7 +1309,7 @@ static ssize_t resource_show(struct device *dev,
return -ENXIO;
return sprintf(buf, "%#llx\n", (unsigned long long) res->start);
}
-static DEVICE_ATTR(resource, 0400, resource_show, NULL);
+static DEVICE_ATTR_ADMIN_RO(resource);
static const unsigned long blk_lbasize_supported[] = { 512, 520, 528,
4096, 4104, 4160, 4224, 0 };
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 34db557dbad1..3e11ef8d3f5b 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -218,7 +218,7 @@ static ssize_t resource_show(struct device *dev,
return rc;
}
-static DEVICE_ATTR(resource, 0400, resource_show, NULL);
+static DEVICE_ATTR_ADMIN_RO(resource);
static ssize_t size_show(struct device *dev,
struct device_attribute *attr, char *buf)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4502f9c4708d..20ff30c2ab93 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -605,7 +605,7 @@ static ssize_t resource_show(struct device *dev,
return sprintf(buf, "%#llx\n", nd_region->ndr_start);
}
-static DEVICE_ATTR(resource, 0400, resource_show, NULL);
+static DEVICE_ATTR_ADMIN_RO(resource);
static ssize_t persistence_domain_show(struct device *dev,
struct device_attribute *attr, char *buf)
DSMs are strictly an ACPI mechanism, evict the bus_dsm_mask concept from
the generic 'struct nvdimm_bus_descriptor' object.
As a side effect the test facility ->bus_nfit_cmd_force_en is no longer
necessary. The test infrastructure can communicate that information
directly in ->bus_dsm_mask.
Signed-off-by: Dan Williams <[email protected]>
---
drivers/acpi/nfit/core.c | 8 ++++----
drivers/acpi/nfit/nfit.h | 2 +-
include/linux/libnvdimm.h | 1 -
tools/testing/nvdimm/test/nfit.c | 16 ++++++++--------
4 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 1f72ce1a782b..9fdd655bdf0e 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -478,7 +478,7 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm,
cmd_name = nvdimm_bus_cmd_name(cmd);
cmd_mask = nd_desc->cmd_mask;
- dsm_mask = nd_desc->bus_dsm_mask;
+ dsm_mask = acpi_desc->bus_dsm_mask;
desc = nd_cmd_bus_desc(cmd);
guid = to_nfit_uuid(NFIT_DEV_BUS);
handle = adev->handle;
@@ -1238,8 +1238,9 @@ static ssize_t bus_dsm_mask_show(struct device *dev,
{
struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
+ struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
- return sprintf(buf, "%#lx\n", nd_desc->bus_dsm_mask);
+ return sprintf(buf, "%#lx\n", acpi_desc->bus_dsm_mask);
}
static struct device_attribute dev_attr_bus_dsm_mask =
__ATTR(dsm_mask, 0444, bus_dsm_mask_show, NULL);
@@ -2157,7 +2158,6 @@ static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc)
int i;
nd_desc->cmd_mask = acpi_desc->bus_cmd_force_en;
- nd_desc->bus_dsm_mask = acpi_desc->bus_nfit_cmd_force_en;
set_bit(ND_CMD_CALL, &nd_desc->cmd_mask);
set_bit(NVDIMM_BUS_FAMILY_NFIT, &nd_desc->bus_family_mask);
@@ -2180,7 +2180,7 @@ static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc)
(1 << NFIT_CMD_ARS_INJECT_GET);
for_each_set_bit(i, &dsm_mask, BITS_PER_LONG)
if (acpi_check_dsm(adev->handle, guid, 1, 1ULL << i))
- set_bit(i, &nd_desc->bus_dsm_mask);
+ set_bit(i, &acpi_desc->bus_dsm_mask);
}
static ssize_t range_index_show(struct device *dev,
diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
index 5c5e7ebba8dc..da097149d94d 100644
--- a/drivers/acpi/nfit/nfit.h
+++ b/drivers/acpi/nfit/nfit.h
@@ -237,7 +237,7 @@ struct acpi_nfit_desc {
unsigned long scrub_flags;
unsigned long dimm_cmd_force_en;
unsigned long bus_cmd_force_en;
- unsigned long bus_nfit_cmd_force_en;
+ unsigned long bus_dsm_mask;
unsigned int platform_cap;
unsigned int scrub_tmo;
int (*blk_do_io)(struct nd_blk_region *ndbr, resource_size_t dpa,
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index bd39a2cf7972..ad9898ece7d3 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -76,7 +76,6 @@ typedef int (*ndctl_fn)(struct nvdimm_bus_descriptor *nd_desc,
struct device_node;
struct nvdimm_bus_descriptor {
const struct attribute_group **attr_groups;
- unsigned long bus_dsm_mask;
unsigned long cmd_mask;
unsigned long dimm_family_mask;
unsigned long bus_family_mask;
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index a8ee5c4d41eb..a59174ba1d2a 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -2507,10 +2507,10 @@ static void nfit_test0_setup(struct nfit_test *t)
set_bit(ND_CMD_ARS_STATUS, &acpi_desc->bus_cmd_force_en);
set_bit(ND_CMD_CLEAR_ERROR, &acpi_desc->bus_cmd_force_en);
set_bit(ND_CMD_CALL, &acpi_desc->bus_cmd_force_en);
- set_bit(NFIT_CMD_TRANSLATE_SPA, &acpi_desc->bus_nfit_cmd_force_en);
- set_bit(NFIT_CMD_ARS_INJECT_SET, &acpi_desc->bus_nfit_cmd_force_en);
- set_bit(NFIT_CMD_ARS_INJECT_CLEAR, &acpi_desc->bus_nfit_cmd_force_en);
- set_bit(NFIT_CMD_ARS_INJECT_GET, &acpi_desc->bus_nfit_cmd_force_en);
+ set_bit(NFIT_CMD_TRANSLATE_SPA, &acpi_desc->bus_dsm_mask);
+ set_bit(NFIT_CMD_ARS_INJECT_SET, &acpi_desc->bus_dsm_mask);
+ set_bit(NFIT_CMD_ARS_INJECT_CLEAR, &acpi_desc->bus_dsm_mask);
+ set_bit(NFIT_CMD_ARS_INJECT_GET, &acpi_desc->bus_dsm_mask);
set_bit(ND_INTEL_FW_GET_INFO, &acpi_desc->dimm_cmd_force_en);
set_bit(ND_INTEL_FW_START_UPDATE, &acpi_desc->dimm_cmd_force_en);
set_bit(ND_INTEL_FW_SEND_DATA, &acpi_desc->dimm_cmd_force_en);
@@ -2731,11 +2731,11 @@ static int nfit_ctl_test(struct device *dev)
.module = THIS_MODULE,
.provider_name = "ACPI.NFIT",
.ndctl = acpi_nfit_ctl,
- .bus_dsm_mask = 1UL << NFIT_CMD_TRANSLATE_SPA
- | 1UL << NFIT_CMD_ARS_INJECT_SET
- | 1UL << NFIT_CMD_ARS_INJECT_CLEAR
- | 1UL << NFIT_CMD_ARS_INJECT_GET,
},
+ .bus_dsm_mask = 1UL << NFIT_CMD_TRANSLATE_SPA
+ | 1UL << NFIT_CMD_ARS_INJECT_SET
+ | 1UL << NFIT_CMD_ARS_INJECT_CLEAR
+ | 1UL << NFIT_CMD_ARS_INJECT_GET,
.dev = &adev->dev,
};
The runtime firmware activation capability of Intel NVDIMM devices
requires memory transactions to be disabled for 100s of microseconds.
This timeout is large enough to cause in-flight DMA to fail and other
application detectable timeouts. Arrange for firmware activation to be
executed while the system is "quiesced", all processes and device-DMA
frozen.
It is already required that invoking device ->freeze() callbacks is
sufficient to cease DMA. A device that continues memory writes outside
of user-direction violates expectations of the PM core to be to
establish a coherent hibernation image.
That said, RDMA devices are an example of a device that access memory
outside of user process direction. RDMA drivers also typically assume
the system they are operating in will never be hibernated. A solution
for RDMA collisions with firmware activation is outside the scope of
this change and may need to rely on being able to survive the platform
imposed memory controller quiesce period.
Cc: Greg Kroah-Hartman <[email protected]>
[rafael: move from suspend to hibernate post-freeze callback]
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Len Brown <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/base/syscore.c | 21 +++++++++++++++++++++
drivers/nvdimm/bus.c | 30 ++++++++++++++++++++++++++++++
drivers/nvdimm/core.c | 3 +++
include/linux/syscore_ops.h | 2 ++
kernel/power/hibernate.c | 17 ++++++++++++-----
kernel/power/main.c | 1 +
kernel/power/power.h | 7 +++++++
kernel/power/snapshot.c | 13 +++++++++++++
kernel/power/suspend.c | 12 +++++++++++-
9 files changed, 100 insertions(+), 6 deletions(-)
diff --git a/drivers/base/syscore.c b/drivers/base/syscore.c
index 0d346a307140..58c26281667c 100644
--- a/drivers/base/syscore.c
+++ b/drivers/base/syscore.c
@@ -108,6 +108,27 @@ void syscore_resume(void)
trace_suspend_resume(TPS("syscore_resume"), 0, false);
}
EXPORT_SYMBOL_GPL(syscore_resume);
+
+/**
+ * syscore_mem_quiet - Execute callbacks that need memory to be quiet (as
+ * if prepared to be snapshotted for hibernate)
+ *
+ * This function is executed in the hibernate path after device freeze
+ * callbacks.
+ */
+void syscore_mem_quiet(void)
+{
+ struct syscore_ops *ops;
+
+ list_for_each_entry(ops, &syscore_ops_list, node) {
+ if (!ops->mem_quiet)
+ continue;
+ if (initcall_debug)
+ pr_info("PM: Calling %pS\n", ops->mem_quiet);
+ ops->mem_quiet();
+ }
+}
+
#endif /* CONFIG_PM_SLEEP */
/**
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 955265656b96..228b31f85c89 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -3,6 +3,7 @@
* Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/syscore_ops.h>
#include <linux/libnvdimm.h>
#include <linux/sched/mm.h>
#include <linux/vmalloc.h>
@@ -1289,6 +1290,33 @@ static const struct file_operations nvdimm_fops = {
.llseek = noop_llseek,
};
+static void trigger_fw_activate(struct nvdimm_bus_descriptor *nd_desc)
+{
+ if (!nd_desc->fw_ops)
+ return;
+ nd_desc->fw_ops->activate(nd_desc);
+}
+
+static void nvdimm_syscore_mem_quiet(void)
+{
+ struct nvdimm_bus *nvdimm_bus;
+
+ mutex_lock(&nvdimm_bus_list_mutex);
+ list_for_each_entry(nvdimm_bus, &nvdimm_bus_list, list) {
+ struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+ struct device *dev = &nvdimm_bus->dev;
+
+ nvdimm_bus_lock(dev);
+ trigger_fw_activate(nd_desc);
+ nvdimm_bus_unlock(dev);
+ }
+ mutex_unlock(&nvdimm_bus_list_mutex);
+}
+
+static struct syscore_ops nvdimm_syscore_ops = {
+ .mem_quiet = nvdimm_syscore_mem_quiet,
+};
+
int __init nvdimm_bus_init(void)
{
int rc;
@@ -1317,6 +1345,8 @@ int __init nvdimm_bus_init(void)
if (rc)
goto err_nd_bus;
+ register_syscore_ops(&nvdimm_syscore_ops);
+
return 0;
err_nd_bus:
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index b1cc7b35bd51..0cbb5620cd45 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -417,6 +417,9 @@ static ssize_t firmware_activate_store(struct device *dev,
if (!nd_desc->fw_ops)
return -EOPNOTSUPP;
+ if (!sysfs_streq(buf, "activate"))
+ return -EINVAL;
+
nvdimm_bus_lock(dev);
state = nd_desc->fw_ops->activate_state(nd_desc);
diff --git a/include/linux/syscore_ops.h b/include/linux/syscore_ops.h
index ae4d48e4c970..b57f7a93de20 100644
--- a/include/linux/syscore_ops.h
+++ b/include/linux/syscore_ops.h
@@ -15,6 +15,7 @@ struct syscore_ops {
int (*suspend)(void);
void (*resume)(void);
void (*shutdown)(void);
+ void (*mem_quiet)(void);
};
extern void register_syscore_ops(struct syscore_ops *ops);
@@ -22,6 +23,7 @@ extern void unregister_syscore_ops(struct syscore_ops *ops);
#ifdef CONFIG_PM_SLEEP
extern int syscore_suspend(void);
extern void syscore_resume(void);
+extern void syscore_mem_quiet(void);
#endif
extern void syscore_shutdown(void);
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 02ec716a4927..ccf2268b9f27 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -118,11 +118,18 @@ EXPORT_SYMBOL(system_entering_hibernation);
#ifdef CONFIG_PM_DEBUG
static void hibernation_debug_sleep(void)
{
+ /*
+ * The mem-quiet test state wants to get to syscore_mem_quiet()
+ * as quickly as possible, skip all debug timeouts by default.
+ */
+ if (pm_test_level == TEST_MEM_QUIET)
+ return;
+
pr_info("debug: Waiting for 5 seconds.\n");
mdelay(5000);
}
-static int hibernation_test(int level)
+int hibernation_test(int level)
{
if (pm_test_level == level) {
hibernation_debug_sleep();
@@ -130,9 +137,7 @@ static int hibernation_test(int level)
}
return 0;
}
-#else /* !CONFIG_PM_DEBUG */
-static int hibernation_test(int level) { return 0; }
-#endif /* !CONFIG_PM_DEBUG */
+#endif /* CONFIG_PM_DEBUG */
/**
* platform_begin - Call platform to start hibernation.
@@ -400,11 +405,13 @@ int hibernation_snapshot(int platform_mode)
error = dpm_suspend(PMSG_FREEZE);
- if (error || hibernation_test(TEST_DEVICES))
+ if (error || hibernation_test(TEST_DEVICES) || hibernation_test(TEST_MEM_QUIET))
platform_recover(platform_mode);
else
error = create_image(platform_mode);
+ syscore_mem_quiet();
+
/*
* In the case that we call create_image() above, the control
* returns here (1) after the image has been created or the
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 40f86ec4ab30..e1cebd0694b3 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -234,6 +234,7 @@ static const char * const pm_tests[__TEST_AFTER_LAST] = {
[TEST_PLATFORM] = "platform",
[TEST_DEVICES] = "devices",
[TEST_FREEZER] = "freezer",
+ [TEST_MEM_QUIET] = "mem-quiet",
};
static ssize_t pm_test_show(struct kobject *kobj, struct kobj_attribute *attr,
diff --git a/kernel/power/power.h b/kernel/power/power.h
index ba2094db6294..de7dd36865a5 100644
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -233,6 +233,7 @@ enum {
TEST_PLATFORM,
TEST_DEVICES,
TEST_FREEZER,
+ TEST_MEM_QUIET,
/* keep last */
__TEST_AFTER_LAST
};
@@ -246,6 +247,12 @@ extern int pm_test_level;
#define pm_test_level (TEST_NONE)
#endif
+#ifdef CONFIG_PM_DEBUG
+int hibernation_test(int level);
+#else /* !CONFIG_PM_DEBUG */
+static inline int hibernation_test(int level) { return 0; }
+#endif /* !CONFIG_PM_DEBUG */
+
#ifdef CONFIG_SUSPEND_FREEZER
static inline int suspend_freeze_processes(void)
{
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 881128b9351e..82fbc8150340 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1080,6 +1080,13 @@ int create_basic_memory_bitmaps(void)
struct memory_bitmap *bm1, *bm2;
int error = 0;
+ /*
+ * No need to prep bitmaps in the case when the hibernate is
+ * just being used trigger a memory quiesce point.
+ */
+ if (hibernation_test(TEST_MEM_QUIET))
+ return 0;
+
if (forbidden_pages_map && free_pages_map)
return 0;
else
@@ -1129,6 +1136,9 @@ void free_basic_memory_bitmaps(void)
{
struct memory_bitmap *bm1, *bm2;
+ if (hibernation_test(TEST_MEM_QUIET))
+ return;
+
if (WARN_ON(!(forbidden_pages_map && free_pages_map)))
return;
@@ -1702,6 +1712,9 @@ int hibernate_preallocate_memory(void)
ktime_t start, stop;
int error;
+ if (hibernation_test(TEST_MEM_QUIET))
+ return 0;
+
pr_info("Preallocating image memory\n");
start = ktime_get();
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 8b1bb5ee7e5d..ffaef2938ece 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -323,7 +323,17 @@ MODULE_PARM_DESC(pm_test_delay,
static int suspend_test(int level)
{
#ifdef CONFIG_PM_DEBUG
- if (pm_test_level == level) {
+ /*
+ * The mem-quiet state has special meaning for the hibernate
+ * path for the suspend path just treat it the same as
+ * TEST_DEVICES
+ */
+ int test_level = pm_test_level;
+
+ if (test_level == TEST_MEM_QUIET)
+ test_level = TEST_DEVICES;
+
+ if (test_level == level) {
pr_info("suspend debug: Waiting for %d second(s).\n",
pm_test_delay);
mdelay(pm_test_delay * 1000);
Abstract the platform specific mechanics for firmware activation behind
a handful of generic ops. At the bus level ->activate_state() indicates
the unified state (idle, busy, armed) of all DIMMs on the bus, and
->capability() indicates the system state expectations for activate. At
the DIMM level ->activate_state() indicates the per-DIMM state,
->activate_result() indicates the outcome of the last activation
attempt, and ->arm() attempts to transition the DIMM from 'idle' to
'armed'.
Cc: Jonathan Corbet <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
Documentation/ABI/testing/sysfs-bus-nvdimm | 2
.../driver-api/nvdimm/firmware-activate.rst | 74 ++++++++++++++++
drivers/nvdimm/core.c | 92 +++++++++++++++++++
drivers/nvdimm/dimm_devs.c | 95 ++++++++++++++++++++
drivers/nvdimm/nd-core.h | 1
include/linux/libnvdimm.h | 44 +++++++++
6 files changed, 308 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-bus-nvdimm
create mode 100644 Documentation/driver-api/nvdimm/firmware-activate.rst
diff --git a/Documentation/ABI/testing/sysfs-bus-nvdimm b/Documentation/ABI/testing/sysfs-bus-nvdimm
new file mode 100644
index 000000000000..d64380262be8
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-nvdimm
@@ -0,0 +1,2 @@
+The libnvdimm sub-system implements a common sysfs interface for
+platform nvdimm resources. See Documentation/driver-api/nvdimm/.
diff --git a/Documentation/driver-api/nvdimm/firmware-activate.rst b/Documentation/driver-api/nvdimm/firmware-activate.rst
new file mode 100644
index 000000000000..8b42c96e1f6f
--- /dev/null
+++ b/Documentation/driver-api/nvdimm/firmware-activate.rst
@@ -0,0 +1,74 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+NVDIMM Runtime Firmware Activation
+==================================
+
+Some persistent memory devices run a firmware locally on the device /
+"DIMM" to perform tasks like media management, capacity provisioning,
+and health monitoring. The process of updating that firmware typically
+involves a reboot because it has implications for in-flight memory
+transactions. However, reboots are disruptive and at least the Intel
+persistent memory platform implementation, described by the Intel ACPI
+DSM specification [1], has added support for activating firmware at
+runtime.
+
+A native sysfs interface is implemented in libnvdimm to allow platform
+to advertise and control their local runtime firmware activation
+capability.
+
+The libnvdimm bus object, ndbusX, implements an ndbusX/firmware_activate
+attribute that shows the state of the firmware activation as one of 'idle',
+'armed', 'overflow', and 'busy'.
+
+- idle:
+ No devices are set / armed to activate firmware
+
+- armed:
+ At least one device is armed
+
+- busy:
+ In the busy state armed devices are in the process of transitioning
+ back to idle and completing an activation cycle.
+
+- overflow:
+ If the platform has a concept of incremental work needed to perform
+ the activation it could be the case that too many DIMMs are armed for
+ activation. In that scenario the potential for firmware activation to
+ timeout is indicated by the 'overflow' state.
+
+This property is optionally read-write if the platform implementation
+allows firmware-activation at run-time. Some activations may involve a
+memory controller quiesce that may trigger undefined behavior in devices
+actively performing DMA. On these platform firmware_activate is not
+writable and instead activation occurs over a suspend event.
+
+The libnvdimm memory-device / DIMM object, nmemX, implements
+nmemX/firmware_activate and nmemX/firmware_activate_result attributes to
+communicate the per-device firmware activation state. Similar to the
+ndbusX/firmware_activate attribute, the nmemX/firmware_activate
+attribute indicates 'idle', 'armed', or 'busy'. The state transitions
+from 'armed' to 'idle' when the system is prepared to activate firmware,
+firmware staged + state set to armed, and a system sleep state
+transition is triggered. After that activation event the
+nmemX/firmware_activate_result attribute reflects the state of the last
+activation as one of:
+
+- none:
+ No runtime activation triggered since the last time the device was reset
+
+- success:
+ The last runtime activation completed successfully.
+
+- fail:
+ The last runtime activation failed for device-specific reasons.
+
+- not_staged:
+ The last runtime activation failed due to a sequencing error of the
+ firmware image not being staged.
+
+- need_reset:
+ Runtime firmware activation failed, but the firmware can still be
+ activated via the legacy method of power-cycling the system.
+
+[1]: https://docs.pmem.io/persistent-memory/
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index fe9bd6febdd2..b1cc7b35bd51 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -378,15 +378,107 @@ static ssize_t wait_probe_show(struct device *dev,
}
static DEVICE_ATTR_RO(wait_probe);
+static ssize_t firmware_activate_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+ struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+ enum nvdimm_fwa_state state;
+
+ if (!nd_desc->fw_ops)
+ return -EOPNOTSUPP;
+
+ nvdimm_bus_lock(dev);
+ state = nd_desc->fw_ops->activate_state(nd_desc);
+ nvdimm_bus_unlock(dev);
+
+ switch (state) {
+ case NVDIMM_FWA_IDLE:
+ return sprintf(buf, "idle\n");
+ case NVDIMM_FWA_BUSY:
+ return sprintf(buf, "busy\n");
+ case NVDIMM_FWA_ARMED:
+ return sprintf(buf, "armed\n");
+ case NVDIMM_FWA_ARM_OVERFLOW:
+ return sprintf(buf, "overflow\n");
+ default:
+ return -ENXIO;
+ }
+}
+
+static ssize_t firmware_activate_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t len)
+{
+ struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+ struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+ enum nvdimm_fwa_state state;
+ ssize_t rc;
+
+ if (!nd_desc->fw_ops)
+ return -EOPNOTSUPP;
+
+ nvdimm_bus_lock(dev);
+ state = nd_desc->fw_ops->activate_state(nd_desc);
+
+ switch (state) {
+ case NVDIMM_FWA_BUSY:
+ rc = -EBUSY;
+ break;
+ case NVDIMM_FWA_ARMED:
+ case NVDIMM_FWA_ARM_OVERFLOW:
+ rc = nd_desc->fw_ops->activate(nd_desc);
+ break;
+ case NVDIMM_FWA_IDLE:
+ default:
+ rc = -ENXIO;
+ }
+ nvdimm_bus_unlock(dev);
+
+ if (rc == 0)
+ rc = len;
+ return rc;
+}
+
+static DEVICE_ATTR_ADMIN_RW(firmware_activate);
+
+static umode_t nvdimm_bus_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+ struct device *dev = container_of(kobj, typeof(*dev), kobj);
+ struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+ struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+
+ if (a == &dev_attr_firmware_activate.attr) {
+ enum nvdimm_fwa_capability cap;
+
+ if (!nd_desc->fw_ops)
+ return 0;
+
+ nvdimm_bus_lock(dev);
+ cap = nd_desc->fw_ops->capability(nd_desc);
+ nvdimm_bus_unlock(dev);
+
+ if (cap < NVDIMM_FWA_CAP_QUIESCE)
+ return 0;
+ /* Force suspend flow for activate */
+ if (cap == NVDIMM_FWA_CAP_QUIESCE)
+ return 0400;
+ /* Live activation permitted */
+ return 0600;
+ }
+ return a->mode;
+}
+
static struct attribute *nvdimm_bus_attributes[] = {
&dev_attr_commands.attr,
&dev_attr_wait_probe.attr,
&dev_attr_provider.attr,
+ &dev_attr_firmware_activate.attr,
NULL,
};
static const struct attribute_group nvdimm_bus_attribute_group = {
.attrs = nvdimm_bus_attributes,
+ .is_visible = nvdimm_bus_visible,
};
const struct attribute_group *nvdimm_bus_attribute_groups[] = {
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index b7b77e8d9027..85835f9add7a 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -406,6 +406,88 @@ static ssize_t security_store(struct device *dev,
}
static DEVICE_ATTR_RW(security);
+static ssize_t firmware_activate_result_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm *nvdimm = to_nvdimm(dev);
+ enum nvdimm_fwa_result result;
+
+ if (!nvdimm->fw_ops)
+ return -EOPNOTSUPP;
+
+ nvdimm_bus_lock(dev);
+ result = nvdimm->fw_ops->activate_result(nvdimm);
+ nvdimm_bus_unlock(dev);
+
+ switch (result) {
+ case NVDIMM_FWA_RESULT_NONE:
+ return sprintf(buf, "none\n");
+ case NVDIMM_FWA_RESULT_SUCCESS:
+ return sprintf(buf, "success\n");
+ case NVDIMM_FWA_RESULT_FAIL:
+ return sprintf(buf, "fail\n");
+ case NVDIMM_FWA_RESULT_NOTSTAGED:
+ return sprintf(buf, "not_staged\n");
+ case NVDIMM_FWA_RESULT_NEEDRESET:
+ return sprintf(buf, "need_reset\n");
+ default:
+ return -ENXIO;
+ }
+}
+static DEVICE_ATTR_ADMIN_RO(firmware_activate_result);
+
+static ssize_t firmware_activate_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm *nvdimm = to_nvdimm(dev);
+ enum nvdimm_fwa_state state;
+
+ if (!nvdimm->fw_ops)
+ return -EOPNOTSUPP;
+
+ nvdimm_bus_lock(dev);
+ state = nvdimm->fw_ops->activate_state(nvdimm);
+ nvdimm_bus_unlock(dev);
+
+ switch (state) {
+ case NVDIMM_FWA_IDLE:
+ return sprintf(buf, "idle\n");
+ case NVDIMM_FWA_BUSY:
+ return sprintf(buf, "busy\n");
+ case NVDIMM_FWA_ARMED:
+ return sprintf(buf, "armed\n");
+ default:
+ return -ENXIO;
+ }
+}
+
+static ssize_t firmware_activate_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t len)
+{
+ struct nvdimm *nvdimm = to_nvdimm(dev);
+ enum nvdimm_fwa_trigger arg;
+ int rc;
+
+ if (!nvdimm->fw_ops)
+ return -EOPNOTSUPP;
+
+ if (sysfs_streq(buf, "arm"))
+ arg = NVDIMM_FWA_ARM;
+ else if (sysfs_streq(buf, "disarm"))
+ arg = NVDIMM_FWA_DISARM;
+ else
+ return -EINVAL;
+
+ nvdimm_bus_lock(dev);
+ rc = nvdimm->fw_ops->arm(nvdimm, arg);
+ nvdimm_bus_unlock(dev);
+
+ if (rc < 0)
+ return rc;
+ return len;
+}
+static DEVICE_ATTR_ADMIN_RW(firmware_activate);
+
static struct attribute *nvdimm_attributes[] = {
&dev_attr_state.attr,
&dev_attr_flags.attr,
@@ -413,6 +495,8 @@ static struct attribute *nvdimm_attributes[] = {
&dev_attr_available_slots.attr,
&dev_attr_security.attr,
&dev_attr_frozen.attr,
+ &dev_attr_firmware_activate.attr,
+ &dev_attr_firmware_activate_result.attr,
NULL,
};
@@ -421,6 +505,17 @@ static umode_t nvdimm_visible(struct kobject *kobj, struct attribute *a, int n)
struct device *dev = container_of(kobj, typeof(*dev), kobj);
struct nvdimm *nvdimm = to_nvdimm(dev);
+ if (a == &dev_attr_firmware_activate.attr) {
+ struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+ struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+
+ if (!nd_desc->fw_ops)
+ return 0;
+ if (!nvdimm->fw_ops)
+ return 0;
+ return a->mode;
+ }
+
if (a != &dev_attr_security.attr && a != &dev_attr_frozen.attr)
return a->mode;
if (!nvdimm->sec.flags)
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index ddb9d97d9129..564faa36a3ca 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -45,6 +45,7 @@ struct nvdimm {
struct kernfs_node *overwrite_state;
} sec;
struct delayed_work dwork;
+ const struct nvdimm_fw_ops *fw_ops;
};
static inline unsigned long nvdimm_security_flags(
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index ad9898ece7d3..15dbcb718316 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -86,6 +86,7 @@ struct nvdimm_bus_descriptor {
int (*flush_probe)(struct nvdimm_bus_descriptor *nd_desc);
int (*clear_to_send)(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *data);
+ const struct nvdimm_bus_fw_ops *fw_ops;
};
struct nd_cmd_desc {
@@ -200,6 +201,49 @@ struct nvdimm_security_ops {
int (*query_overwrite)(struct nvdimm *nvdimm);
};
+enum nvdimm_fwa_state {
+ NVDIMM_FWA_INVALID,
+ NVDIMM_FWA_IDLE,
+ NVDIMM_FWA_ARMED,
+ NVDIMM_FWA_BUSY,
+ NVDIMM_FWA_ARM_OVERFLOW,
+};
+
+enum nvdimm_fwa_trigger {
+ NVDIMM_FWA_ARM,
+ NVDIMM_FWA_DISARM,
+};
+
+enum nvdimm_fwa_capability {
+ NVDIMM_FWA_CAP_INVALID,
+ NVDIMM_FWA_CAP_NONE,
+ NVDIMM_FWA_CAP_QUIESCE,
+ NVDIMM_FWA_CAP_LIVE,
+};
+
+enum nvdimm_fwa_result {
+ NVDIMM_FWA_RESULT_INVALID,
+ NVDIMM_FWA_RESULT_NONE,
+ NVDIMM_FWA_RESULT_SUCCESS,
+ NVDIMM_FWA_RESULT_NOTSTAGED,
+ NVDIMM_FWA_RESULT_NEEDRESET,
+ NVDIMM_FWA_RESULT_FAIL,
+};
+
+struct nvdimm_bus_fw_ops {
+ enum nvdimm_fwa_state (*activate_state)
+ (struct nvdimm_bus_descriptor *nd_desc);
+ enum nvdimm_fwa_capability (*capability)
+ (struct nvdimm_bus_descriptor *nd_desc);
+ int (*activate)(struct nvdimm_bus_descriptor *nd_desc);
+};
+
+struct nvdimm_fw_ops {
+ enum nvdimm_fwa_state (*activate_state)(struct nvdimm *nvdimm);
+ enum nvdimm_fwa_result (*activate_result)(struct nvdimm *nvdimm);
+ int (*arm)(struct nvdimm *nvdimm, enum nvdimm_fwa_trigger arg);
+};
+
void badrange_init(struct badrange *badrange);
int badrange_add(struct badrange *badrange, u64 addr, u64 length);
void badrange_forget(struct badrange *badrange, phys_addr_t start,
Plumb the platform specific backend for the generic libnvdimm firmware
activate interface. Register dimm level operations to arm/disarm
activation, and register bus level operations to report the dynamic
platform-quiesce time relative to the number of dimms armed for firmware
activation.
A new nfit-specific bus attribute "firmware_activate_noidle" is added to
allow the activation to switch between platform enforced, and OS
opportunistic device quiesce. Another nfit-specific bus attribute
"firmware_activate_nosuspend" is added to allow activation to be
initiated at runtime rather than a suspend-callback.
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
Documentation/ABI/testing/sysfs-bus-nfit | 35 ++
drivers/acpi/nfit/core.c | 45 +++
drivers/acpi/nfit/intel.c | 426 ++++++++++++++++++++++++++++++
drivers/acpi/nfit/intel.h | 3
drivers/acpi/nfit/nfit.h | 11 +
drivers/nvdimm/core.c | 8 +
drivers/nvdimm/dimm_devs.c | 4
include/linux/libnvdimm.h | 6
8 files changed, 531 insertions(+), 7 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-nfit b/Documentation/ABI/testing/sysfs-bus-nfit
index a1cb44dcb908..c25d6d19f6e0 100644
--- a/Documentation/ABI/testing/sysfs-bus-nfit
+++ b/Documentation/ABI/testing/sysfs-bus-nfit
@@ -202,6 +202,41 @@ Description:
functions. See the section named 'NVDIMM Root Device _DSMs' in
the ACPI specification.
+What: /sys/bus/nd/devices/ndbusX/nfit/firmware_activate_noidle
+Date: Apr, 2020
+KernelVersion: v5.8
+Contact: [email protected]
+Description:
+ (RW) The Intel platform implementation of firmware activate
+ support exposes an option let the platform force idle devices in
+ the system over the activation event, or trust that the OS will
+ do it. The safe default is to let the platform force idle
+ devices since the kernel is already in a suspend state, and on
+ the chance that a driver does not properly quiesce bus-mastering
+ after a suspend callback the platform will handle it. However,
+ the activation might abort if, for example, platform firmware
+ determines that the activation time exceeds the max PCI-E
+ completion timeout. Since the platform does not know whether the
+ OS is running the activation from a suspend context it aborts,
+ but if the system owner trusts driver suspend callback to be
+ sufficient then 'firmware_activation_noidle' can be
+ enabled to bypass the activation abort.
+
+What: /sys/bus/nd/devices/ndbusX/nfit/firmware_activate_nosuspend
+Date: Apr, 2020
+KernelVersion: v5.8
+Contact: [email protected]
+Description:
+ (RW) The Intel platform implementation of firmware activate
+ enforces a memory-controller quiesce event while DMA might be
+ in-flight. That in-flight DMA may timeout awaiting the
+ activation to complete and cause undefined system behavior.
+ For this reason firmware activate is, by default, limited to
+ be carried out after all system device drivers have executed
+ their power-management-suspend callbacks. Otherwise, setting
+ this override allows firmware activate to be triggered outside
+ of the the suspend context by writing "activate" to
+ /sys/bus/nd/devices/ndbusX/firmware_activate.
What: /sys/bus/nd/devices/regionX/nfit/range_index
Date: Jun, 2015
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 78cc9e2d2aa3..e3e716f46e96 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1392,8 +1392,15 @@ static umode_t nfit_visible(struct kobject *kobj, struct attribute *a, int n)
struct device *dev = container_of(kobj, struct device, kobj);
struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
- if (a == &dev_attr_scrub.attr && !ars_supported(nvdimm_bus))
- return 0;
+ if (a == &dev_attr_scrub.attr)
+ return ars_supported(nvdimm_bus) ? a->mode : 0;
+
+ if (a == &dev_attr_firmware_activate_noidle.attr)
+ return intel_fwa_supported(nvdimm_bus) ? a->mode : 0;
+
+ if (a == &dev_attr_firmware_activate_nosuspend.attr)
+ return intel_fwa_supported(nvdimm_bus) ? a->mode : 0;
+
return a->mode;
}
@@ -1402,6 +1409,8 @@ static struct attribute *acpi_nfit_attributes[] = {
&dev_attr_scrub.attr,
&dev_attr_hw_error_scrub.attr,
&dev_attr_bus_dsm_mask.attr,
+ &dev_attr_firmware_activate_noidle.attr,
+ &dev_attr_firmware_activate_nosuspend.attr,
NULL,
};
@@ -2019,6 +2028,26 @@ static const struct nvdimm_security_ops *acpi_nfit_get_security_ops(int family)
}
}
+static const struct nvdimm_fw_ops *acpi_nfit_get_fw_ops(
+ struct nfit_mem *nfit_mem)
+{
+ unsigned long mask;
+ struct acpi_nfit_desc *acpi_desc = nfit_mem->acpi_desc;
+ struct nvdimm_bus_descriptor *nd_desc = &acpi_desc->nd_desc;
+
+ if (!nd_desc->fw_ops)
+ return NULL;
+
+ if (nfit_mem->family != NVDIMM_FAMILY_INTEL)
+ return NULL;
+
+ mask = nfit_mem->dsm_mask & NVDIMM_INTEL_FW_ACTIVATE_CMDMASK;
+ if (mask != NVDIMM_INTEL_FW_ACTIVATE_CMDMASK)
+ return NULL;
+
+ return intel_fw_ops;
+}
+
static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
{
struct nfit_mem *nfit_mem;
@@ -2095,7 +2124,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
acpi_nfit_dimm_attribute_groups,
flags, cmd_mask, flush ? flush->hint_count : 0,
nfit_mem->flush_wpq, &nfit_mem->id[0],
- acpi_nfit_get_security_ops(nfit_mem->family));
+ acpi_nfit_get_security_ops(nfit_mem->family),
+ acpi_nfit_get_fw_ops(nfit_mem));
if (!nvdimm)
return -ENOMEM;
@@ -2170,8 +2200,10 @@ static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc)
if (acpi_desc->bus_cmd_force_en) {
nd_desc->cmd_mask = acpi_desc->bus_cmd_force_en;
mask = &nd_desc->bus_family_mask;
- if (acpi_desc->family_dsm_mask[NVDIMM_BUS_FAMILY_INTEL])
+ if (acpi_desc->family_dsm_mask[NVDIMM_BUS_FAMILY_INTEL]) {
set_bit(NVDIMM_BUS_FAMILY_INTEL, mask);
+ nd_desc->fw_ops = intel_bus_fw_ops;
+ }
}
adev = to_acpi_dev(acpi_desc);
@@ -2202,6 +2234,11 @@ static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc)
for_each_set_bit(i, &dsm_mask, BITS_PER_LONG)
if (acpi_check_dsm(adev->handle, guid, 1, 1ULL << i))
set_bit(i, mask);
+
+ if (*mask == dsm_mask) {
+ set_bit(NVDIMM_BUS_FAMILY_INTEL, &nd_desc->bus_family_mask);
+ nd_desc->fw_ops = intel_bus_fw_ops;
+ }
}
static ssize_t range_index_show(struct device *dev,
diff --git a/drivers/acpi/nfit/intel.c b/drivers/acpi/nfit/intel.c
index 1113b679cd7b..437ebab08d9f 100644
--- a/drivers/acpi/nfit/intel.c
+++ b/drivers/acpi/nfit/intel.c
@@ -7,6 +7,86 @@
#include "intel.h"
#include "nfit.h"
+static ssize_t firmware_activate_noidle_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+ struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
+ struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+
+ return sprintf(buf, "%s\n", acpi_desc->fwa_noidle ? "Y" : "N");
+}
+
+static ssize_t firmware_activate_noidle_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t size)
+{
+ struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+ struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
+ struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+ ssize_t rc;
+ bool val;
+
+ rc = kstrtobool(buf, &val);
+ if (rc)
+ return rc;
+ if (val != acpi_desc->fwa_noidle)
+ acpi_desc->fwa_cap = NVDIMM_FWA_CAP_INVALID;
+ acpi_desc->fwa_noidle = val;
+ return size;
+}
+DEVICE_ATTR_RW(firmware_activate_noidle);
+
+static ssize_t firmware_activate_nosuspend_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+ struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
+ struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+
+ return sprintf(buf, "%s\n", acpi_desc->fwa_nosuspend ? "Y" : "N");
+}
+
+static ssize_t firmware_activate_nosuspend_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t size)
+{
+ struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+ struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
+ struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+ ssize_t rc;
+ bool val;
+
+ rc = kstrtobool(buf, &val);
+ if (rc)
+ return rc;
+ if (val != acpi_desc->fwa_nosuspend) {
+ /*
+ * Invalidate cached capability state and update sysfs
+ * permissions.
+ */
+ acpi_desc->fwa_cap = NVDIMM_FWA_CAP_INVALID;
+ acpi_desc->fwa_nosuspend = val;
+ rc = nvdimm_bus_update_sysfs(nvdimm_bus);
+ }
+
+ if (rc == 0)
+ rc = size;
+ return rc;
+}
+DEVICE_ATTR_RW(firmware_activate_nosuspend);
+
+bool intel_fwa_supported(struct nvdimm_bus *nvdimm_bus)
+{
+ struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
+ struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+ unsigned long *mask;
+
+ if (!test_bit(NVDIMM_BUS_FAMILY_INTEL, &nd_desc->bus_family_mask))
+ return false;
+
+ mask = &acpi_desc->family_dsm_mask[NVDIMM_BUS_FAMILY_INTEL];
+ return *mask == NVDIMM_BUS_INTEL_FW_ACTIVATE_CMDMASK;
+}
+
static unsigned long intel_security_flags(struct nvdimm *nvdimm,
enum nvdimm_passphrase_type ptype)
{
@@ -389,3 +469,349 @@ static const struct nvdimm_security_ops __intel_security_ops = {
};
const struct nvdimm_security_ops *intel_security_ops = &__intel_security_ops;
+
+static int intel_bus_fwa_businfo(struct nvdimm_bus_descriptor *nd_desc,
+ struct nd_intel_bus_fw_activate_businfo *info)
+{
+ struct {
+ struct nd_cmd_pkg pkg;
+ struct nd_intel_bus_fw_activate_businfo cmd;
+ } nd_cmd = {
+ .pkg = {
+ .nd_command = NVDIMM_BUS_INTEL_FW_ACTIVATE_BUSINFO,
+ .nd_family = NVDIMM_BUS_FAMILY_INTEL,
+ .nd_size_out =
+ sizeof(struct nd_intel_bus_fw_activate_businfo),
+ .nd_fw_size =
+ sizeof(struct nd_intel_bus_fw_activate_businfo),
+ },
+ };
+ int rc;
+
+ rc = nd_desc->ndctl(nd_desc, NULL, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd),
+ NULL);
+ *info = nd_cmd.cmd;
+ return rc;
+}
+
+/* The fw_ops expect to be called with the nvdimm_bus_lock() held */
+static enum nvdimm_fwa_state intel_bus_fwa_state(
+ struct nvdimm_bus_descriptor *nd_desc)
+{
+ struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+ struct nd_intel_bus_fw_activate_businfo info;
+ struct device *dev = acpi_desc->dev;
+ enum nvdimm_fwa_state state;
+ int rc;
+
+ /*
+ * It should not be possible for platform firmware to return
+ * busy because activate is a synchronous operation. Treat it
+ * similar to invalid, i.e. always refresh / poll the status.
+ */
+ switch (acpi_desc->fwa_state) {
+ case NVDIMM_FWA_INVALID:
+ case NVDIMM_FWA_BUSY:
+ break;
+ default:
+ /* check if capability needs to be refreshed */
+ if (acpi_desc->fwa_cap == NVDIMM_FWA_CAP_INVALID)
+ break;
+ return acpi_desc->fwa_state;
+ }
+
+ /* Refresh with platform firmware */
+ rc = intel_bus_fwa_businfo(nd_desc, &info);
+ if (rc)
+ return NVDIMM_FWA_INVALID;
+
+ switch (info.state) {
+ case ND_INTEL_FWA_IDLE:
+ state = NVDIMM_FWA_IDLE;
+ break;
+ case ND_INTEL_FWA_BUSY:
+ state = NVDIMM_FWA_BUSY;
+ break;
+ case ND_INTEL_FWA_ARMED:
+ if (info.activate_tmo > info.max_quiesce_tmo)
+ state = NVDIMM_FWA_ARM_OVERFLOW;
+ else
+ state = NVDIMM_FWA_ARMED;
+ break;
+ default:
+ dev_err_once(dev, "invalid firmware activate state %d\n",
+ info.state);
+ return NVDIMM_FWA_INVALID;
+ }
+
+ /*
+ * Capability data is available in the same payload as state.
+ * While it can be assumed that it does not change it may need
+ * to be re-evaluated relative to the ->fwa_noidle and
+ * ->fwa_nosuspend settings.
+ */
+ if (acpi_desc->fwa_cap == NVDIMM_FWA_CAP_INVALID) {
+ if (info.capability & (ND_INTEL_BUS_FWA_CAP_FWQUIESCE))
+ acpi_desc->fwa_cap = NVDIMM_FWA_CAP_QUIESCE;
+ else if (acpi_desc->fwa_noidle && (info.capability
+ & ND_INTEL_BUS_FWA_CAP_OSQUIESCE))
+ acpi_desc->fwa_cap = NVDIMM_FWA_CAP_QUIESCE;
+ else
+ acpi_desc->fwa_cap = NVDIMM_FWA_CAP_NONE;
+ }
+
+ if (acpi_desc->fwa_cap == NVDIMM_FWA_CAP_QUIESCE
+ && acpi_desc->fwa_nosuspend)
+ acpi_desc->fwa_cap = NVDIMM_FWA_CAP_LIVE;
+
+ acpi_desc->fwa_state = state;
+
+ return state;
+}
+
+static enum nvdimm_fwa_capability intel_bus_fwa_capability(
+ struct nvdimm_bus_descriptor *nd_desc)
+{
+ struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+
+ if (acpi_desc->fwa_cap > NVDIMM_FWA_CAP_INVALID)
+ return acpi_desc->fwa_cap;
+
+ if (intel_bus_fwa_state(nd_desc) > NVDIMM_FWA_INVALID)
+ return acpi_desc->fwa_cap;
+
+ return NVDIMM_FWA_CAP_INVALID;
+}
+
+static int intel_bus_fwa_activate(struct nvdimm_bus_descriptor *nd_desc)
+{
+ struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+ struct {
+ struct nd_cmd_pkg pkg;
+ struct nd_intel_bus_fw_activate cmd;
+ } nd_cmd = {
+ .pkg = {
+ .nd_command = NVDIMM_BUS_INTEL_FW_ACTIVATE,
+ .nd_family = NVDIMM_BUS_FAMILY_INTEL,
+ .nd_size_in = sizeof(nd_cmd.cmd.iodev_state),
+ .nd_size_out =
+ sizeof(struct nd_intel_bus_fw_activate),
+ .nd_fw_size =
+ sizeof(struct nd_intel_bus_fw_activate),
+ },
+ /*
+ * Even though activate is run from a suspended context,
+ * for safety, still ask platform firmware to force
+ * quiesce devices by default. Let a module
+ * parameter override that policy.
+ */
+ .cmd = {
+ .iodev_state = acpi_desc->fwa_noidle
+ ? ND_INTEL_BUS_FWA_IODEV_OS_IDLE
+ : ND_INTEL_BUS_FWA_IODEV_FORCE_IDLE,
+ },
+ };
+ int rc;
+
+ switch (intel_bus_fwa_state(nd_desc)) {
+ case NVDIMM_FWA_ARMED:
+ case NVDIMM_FWA_ARM_OVERFLOW:
+ break;
+ default:
+ return -ENXIO;
+ }
+
+ rc = nd_desc->ndctl(nd_desc, NULL, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd),
+ NULL);
+
+ /*
+ * Whether the command succeeded, or failed, the agent checking
+ * for the result needs to query the DIMMs individually.
+ * Increment the activation count to invalidate all the DIMM
+ * states at once (it's otherwise not possible to take
+ * acpi_desc->init_mutex in this context)
+ */
+ acpi_desc->fwa_state = NVDIMM_FWA_INVALID;
+ acpi_desc->fwa_count++;
+
+ dev_dbg(acpi_desc->dev, "result: %d\n", rc);
+
+ return rc;
+}
+
+static const struct nvdimm_bus_fw_ops __intel_bus_fw_ops = {
+ .activate_state = intel_bus_fwa_state,
+ .capability = intel_bus_fwa_capability,
+ .activate = intel_bus_fwa_activate,
+};
+
+const struct nvdimm_bus_fw_ops *intel_bus_fw_ops = &__intel_bus_fw_ops;
+
+static int intel_fwa_dimminfo(struct nvdimm *nvdimm,
+ struct nd_intel_fw_activate_dimminfo *info)
+{
+ struct {
+ struct nd_cmd_pkg pkg;
+ struct nd_intel_fw_activate_dimminfo cmd;
+ } nd_cmd = {
+ .pkg = {
+ .nd_command = NVDIMM_INTEL_FW_ACTIVATE_DIMMINFO,
+ .nd_family = NVDIMM_FAMILY_INTEL,
+ .nd_size_out =
+ sizeof(struct nd_intel_fw_activate_dimminfo),
+ .nd_fw_size =
+ sizeof(struct nd_intel_fw_activate_dimminfo),
+ },
+ };
+ int rc;
+
+ rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd), NULL);
+ *info = nd_cmd.cmd;
+ return rc;
+}
+
+static enum nvdimm_fwa_state intel_fwa_state(struct nvdimm *nvdimm)
+{
+ struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm);
+ struct acpi_nfit_desc *acpi_desc = nfit_mem->acpi_desc;
+ struct nd_intel_fw_activate_dimminfo info;
+ int rc;
+
+ /*
+ * Similar to the bus state, since activate is synchronous the
+ * busy state should resolve within the context of 'activate'.
+ */
+ switch (nfit_mem->fwa_state) {
+ case NVDIMM_FWA_INVALID:
+ case NVDIMM_FWA_BUSY:
+ break;
+ default:
+ /* If no activations occurred the old state is still valid */
+ if (nfit_mem->fwa_count == acpi_desc->fwa_count)
+ return nfit_mem->fwa_state;
+ }
+
+ rc = intel_fwa_dimminfo(nvdimm, &info);
+ if (rc)
+ return NVDIMM_FWA_INVALID;
+
+ switch (info.state) {
+ case ND_INTEL_FWA_IDLE:
+ nfit_mem->fwa_state = NVDIMM_FWA_IDLE;
+ break;
+ case ND_INTEL_FWA_BUSY:
+ nfit_mem->fwa_state = NVDIMM_FWA_BUSY;
+ break;
+ case ND_INTEL_FWA_ARMED:
+ nfit_mem->fwa_state = NVDIMM_FWA_ARMED;
+ break;
+ default:
+ nfit_mem->fwa_state = NVDIMM_FWA_INVALID;
+ break;
+ }
+
+ switch (info.result) {
+ case ND_INTEL_DIMM_FWA_NONE:
+ nfit_mem->fwa_result = NVDIMM_FWA_RESULT_NONE;
+ break;
+ case ND_INTEL_DIMM_FWA_SUCCESS:
+ nfit_mem->fwa_result = NVDIMM_FWA_RESULT_SUCCESS;
+ break;
+ case ND_INTEL_DIMM_FWA_NOTSTAGED:
+ nfit_mem->fwa_result = NVDIMM_FWA_RESULT_NOTSTAGED;
+ break;
+ case ND_INTEL_DIMM_FWA_NEEDRESET:
+ nfit_mem->fwa_result = NVDIMM_FWA_RESULT_NEEDRESET;
+ break;
+ case ND_INTEL_DIMM_FWA_MEDIAFAILED:
+ case ND_INTEL_DIMM_FWA_ABORT:
+ case ND_INTEL_DIMM_FWA_NOTSUPP:
+ case ND_INTEL_DIMM_FWA_ERROR:
+ default:
+ nfit_mem->fwa_result = NVDIMM_FWA_RESULT_FAIL;
+ break;
+ }
+
+ nfit_mem->fwa_count = acpi_desc->fwa_count;
+
+ return nfit_mem->fwa_state;
+}
+
+static enum nvdimm_fwa_result intel_fwa_result(struct nvdimm *nvdimm)
+{
+ struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm);
+ struct acpi_nfit_desc *acpi_desc = nfit_mem->acpi_desc;
+
+ if (nfit_mem->fwa_count == acpi_desc->fwa_count
+ && nfit_mem->fwa_result > NVDIMM_FWA_RESULT_INVALID)
+ return nfit_mem->fwa_result;
+
+ if (intel_fwa_state(nvdimm) > NVDIMM_FWA_INVALID)
+ return nfit_mem->fwa_result;
+
+ return NVDIMM_FWA_RESULT_INVALID;
+}
+
+static int intel_fwa_arm(struct nvdimm *nvdimm, enum nvdimm_fwa_trigger arm)
+{
+ struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm);
+ struct acpi_nfit_desc *acpi_desc = nfit_mem->acpi_desc;
+ struct {
+ struct nd_cmd_pkg pkg;
+ struct nd_intel_fw_activate_arm cmd;
+ } nd_cmd = {
+ .pkg = {
+ .nd_command = NVDIMM_INTEL_FW_ACTIVATE_ARM,
+ .nd_family = NVDIMM_FAMILY_INTEL,
+ .nd_size_in = sizeof(nd_cmd.cmd.activate_arm),
+ .nd_size_out =
+ sizeof(struct nd_intel_fw_activate_arm),
+ .nd_fw_size =
+ sizeof(struct nd_intel_fw_activate_arm),
+ },
+ .cmd = {
+ .activate_arm = arm == NVDIMM_FWA_ARM
+ ? ND_INTEL_DIMM_FWA_ARM
+ : ND_INTEL_DIMM_FWA_DISARM,
+ },
+ };
+ int rc;
+
+ switch (intel_fwa_state(nvdimm)) {
+ case NVDIMM_FWA_INVALID:
+ return -ENXIO;
+ case NVDIMM_FWA_BUSY:
+ return -EBUSY;
+ case NVDIMM_FWA_IDLE:
+ if (arm == NVDIMM_FWA_DISARM)
+ return 0;
+ break;
+ case NVDIMM_FWA_ARMED:
+ if (arm == NVDIMM_FWA_ARM)
+ return 0;
+ break;
+ default:
+ return -ENXIO;
+ }
+
+ /*
+ * Invalidate the bus-level state, now that we're committed to
+ * changing the 'arm' state.
+ */
+ acpi_desc->fwa_state = NVDIMM_FWA_INVALID;
+ nfit_mem->fwa_state = NVDIMM_FWA_INVALID;
+
+ rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd), NULL);
+
+ dev_dbg(acpi_desc->dev, "%s result: %d\n", arm == NVDIMM_FWA_ARM
+ ? "arm" : "disarm", rc);
+ return rc;
+}
+
+static const struct nvdimm_fw_ops __intel_fw_ops = {
+ .activate_state = intel_fwa_state,
+ .activate_result = intel_fwa_result,
+ .arm = intel_fwa_arm,
+};
+
+const struct nvdimm_fw_ops *intel_fw_ops = &__intel_fw_ops;
diff --git a/drivers/acpi/nfit/intel.h b/drivers/acpi/nfit/intel.h
index 49a598623024..b768234ccebc 100644
--- a/drivers/acpi/nfit/intel.h
+++ b/drivers/acpi/nfit/intel.h
@@ -169,4 +169,7 @@ struct nd_intel_bus_fw_activate {
u8 iodev_state;
u32 status;
} __packed;
+
+extern const struct nvdimm_fw_ops *intel_fw_ops;
+extern const struct nvdimm_bus_fw_ops *intel_bus_fw_ops;
#endif
diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
index 97c122628975..b0c2e83d0fea 100644
--- a/drivers/acpi/nfit/nfit.h
+++ b/drivers/acpi/nfit/nfit.h
@@ -220,6 +220,9 @@ struct nfit_mem {
struct list_head list;
struct acpi_device *adev;
struct acpi_nfit_desc *acpi_desc;
+ enum nvdimm_fwa_state fwa_state;
+ enum nvdimm_fwa_result fwa_result;
+ int fwa_count;
char id[NFIT_DIMM_ID_LEN+1];
struct resource *flush_wpq;
unsigned long dsm_mask;
@@ -265,6 +268,11 @@ struct acpi_nfit_desc {
unsigned int scrub_tmo;
int (*blk_do_io)(struct nd_blk_region *ndbr, resource_size_t dpa,
void *iobuf, u64 len, int rw);
+ enum nvdimm_fwa_state fwa_state;
+ enum nvdimm_fwa_capability fwa_cap;
+ int fwa_count;
+ bool fwa_noidle;
+ bool fwa_nosuspend;
};
enum scrub_mode {
@@ -367,4 +375,7 @@ void __acpi_nvdimm_notify(struct device *dev, u32 event);
int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm,
unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc);
void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev);
+bool intel_fwa_supported(struct nvdimm_bus *nvdimm_bus);
+extern struct device_attribute dev_attr_firmware_activate_noidle;
+extern struct device_attribute dev_attr_firmware_activate_nosuspend;
#endif /* __NFIT_H__ */
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index 0cbb5620cd45..56fca9f059fd 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -484,6 +484,14 @@ static const struct attribute_group nvdimm_bus_attribute_group = {
.is_visible = nvdimm_bus_visible,
};
+int nvdimm_bus_update_sysfs(struct nvdimm_bus *nvdimm_bus)
+{
+ struct device *dev = &nvdimm_bus->dev;
+
+ return sysfs_update_group(&dev->kobj, &nvdimm_bus_attribute_group);
+}
+EXPORT_SYMBOL_GPL(nvdimm_bus_update_sysfs);
+
const struct attribute_group *nvdimm_bus_attribute_groups[] = {
&nvdimm_bus_attribute_group,
NULL,
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index 85835f9add7a..b6d9fa8f72df 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -562,7 +562,8 @@ struct nvdimm *__nvdimm_create(struct nvdimm_bus *nvdimm_bus,
void *provider_data, const struct attribute_group **groups,
unsigned long flags, unsigned long cmd_mask, int num_flush,
struct resource *flush_wpq, const char *dimm_id,
- const struct nvdimm_security_ops *sec_ops)
+ const struct nvdimm_security_ops *sec_ops,
+ const struct nvdimm_fw_ops *fw_ops)
{
struct nvdimm *nvdimm = kzalloc(sizeof(*nvdimm), GFP_KERNEL);
struct device *dev;
@@ -592,6 +593,7 @@ struct nvdimm *__nvdimm_create(struct nvdimm_bus *nvdimm_bus,
dev->devt = MKDEV(nvdimm_major, nvdimm->id);
dev->groups = groups;
nvdimm->sec.ops = sec_ops;
+ nvdimm->fw_ops = fw_ops;
nvdimm->sec.overwrite_tmo = 0;
INIT_DELAYED_WORK(&nvdimm->dwork, nvdimm_security_overwrite_query);
/*
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 15dbcb718316..815f57f23b7a 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -244,6 +244,7 @@ struct nvdimm_fw_ops {
int (*arm)(struct nvdimm *nvdimm, enum nvdimm_fwa_trigger arg);
};
+int nvdimm_bus_update_sysfs(struct nvdimm_bus *bus);
void badrange_init(struct badrange *badrange);
int badrange_add(struct badrange *badrange, u64 addr, u64 length);
void badrange_forget(struct badrange *badrange, phys_addr_t start,
@@ -269,14 +270,15 @@ struct nvdimm *__nvdimm_create(struct nvdimm_bus *nvdimm_bus,
void *provider_data, const struct attribute_group **groups,
unsigned long flags, unsigned long cmd_mask, int num_flush,
struct resource *flush_wpq, const char *dimm_id,
- const struct nvdimm_security_ops *sec_ops);
+ const struct nvdimm_security_ops *sec_ops,
+ const struct nvdimm_fw_ops *fw_ops);
static inline struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus,
void *provider_data, const struct attribute_group **groups,
unsigned long flags, unsigned long cmd_mask, int num_flush,
struct resource *flush_wpq)
{
return __nvdimm_create(nvdimm_bus, provider_data, groups, flags,
- cmd_mask, num_flush, flush_wpq, NULL, NULL);
+ cmd_mask, num_flush, flush_wpq, NULL, NULL, NULL);
}
const struct nd_cmd_desc *nd_cmd_dimm_desc(int cmd);
Hi!
> @@ -234,6 +234,7 @@ static const char * const pm_tests[__TEST_AFTER_LAST] = {
> [TEST_PLATFORM] = "platform",
> [TEST_DEVICES] = "devices",
> [TEST_FREEZER] = "freezer",
> + [TEST_MEM_QUIET] = "mem-quiet",
> };
>
This adds field to sysfs interface, right? I guess we need
documentation for that....
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Tuesday, July 7, 2020 3:59:32 AM CEST Dan Williams wrote:
> The runtime firmware activation capability of Intel NVDIMM devices
> requires memory transactions to be disabled for 100s of microseconds.
> This timeout is large enough to cause in-flight DMA to fail and other
> application detectable timeouts. Arrange for firmware activation to be
> executed while the system is "quiesced", all processes and device-DMA
> frozen.
>
> It is already required that invoking device ->freeze() callbacks is
> sufficient to cease DMA. A device that continues memory writes outside
> of user-direction violates expectations of the PM core to be to
> establish a coherent hibernation image.
>
> That said, RDMA devices are an example of a device that access memory
> outside of user process direction. RDMA drivers also typically assume
> the system they are operating in will never be hibernated. A solution
> for RDMA collisions with firmware activation is outside the scope of
> this change and may need to rely on being able to survive the platform
> imposed memory controller quiesce period.
Thanks for following my suggestion to use the hibernation infrastructure
rather than the suspend one, but I think it would be better to go a bit
further with that.
Namely, after thinking about this a bit more I have come to the conclusion
that what is needed is an ability to execute a function, inside of the
kernel, in a "quiet" environment in which memory updates are unlikely.
While the hibernation infrastructure as is can be used for that, kind of, IMO
it would be cleaner to introduce a helper for that, like in the (untested)
patch below, so if the "quiet execution environment" is needed, whoever
needs it may simply pass a function to hibernate_quiet_exec() and provide
whatever user-space I/F is suitable on top of that.
Please let me know what you think.
Cheers!
---
include/linux/suspend.h | 6 ++
kernel/power/hibernate.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 103 insertions(+)
Index: linux-pm/kernel/power/hibernate.c
===================================================================
--- linux-pm.orig/kernel/power/hibernate.c
+++ linux-pm/kernel/power/hibernate.c
@@ -795,6 +795,103 @@ int hibernate(void)
return error;
}
+/**
+ * hibernate_quiet_exec - Execute a function with all devices frozen.
+ * @func: Function to execute.
+ * @data: Data pointer to pass to @func.
+ *
+ * Return the @func return value or an error code if it cannot be executed.
+ */
+int hibernate_quiet_exec(int (*func)(void *data), void *data)
+{
+ int error, nr_calls = 0;
+
+ lock_system_sleep();
+
+ if (!hibernate_acquire()) {
+ error = -EBUSY;
+ goto unlock;
+ }
+
+ pm_prepare_console();
+
+ error = __pm_notifier_call_chain(PM_HIBERNATION_PREPARE, -1, &nr_calls);
+ if (error) {
+ nr_calls--;
+ goto exit;
+ }
+
+ error = freeze_processes();
+ if (error)
+ goto exit;
+
+ lock_device_hotplug();
+
+ pm_suspend_clear_flags();
+
+ error = platform_begin(true);
+ if (error)
+ goto thaw;
+
+ error = freeze_kernel_threads();
+ if (error)
+ goto thaw;
+
+ error = dpm_prepare(PMSG_FREEZE);
+ if (error)
+ goto dpm_complete;
+
+ suspend_console();
+
+ error = dpm_suspend(PMSG_FREEZE);
+ if (error)
+ goto dpm_resume;
+
+ error = dpm_suspend_end(PMSG_FREEZE);
+ if (error)
+ goto dpm_resume;
+
+ error = platform_pre_snapshot(true);
+ if (error)
+ goto skip;
+
+ error = func(data);
+
+skip:
+ platform_finish(true);
+
+ dpm_resume_start(PMSG_THAW);
+
+dpm_resume:
+ dpm_resume(PMSG_THAW);
+
+ resume_console();
+
+dpm_complete:
+ dpm_complete(PMSG_THAW);
+
+ thaw_kernel_threads();
+
+thaw:
+ platform_end(true);
+
+ unlock_device_hotplug();
+
+ thaw_processes();
+
+exit:
+ __pm_notifier_call_chain(PM_POST_HIBERNATION, nr_calls, NULL);
+
+ pm_restore_console();
+
+ hibernate_release();
+
+unlock:
+ unlock_system_sleep();
+
+ return error;
+}
+EXPORT_SYMBOL_GPL(hibernate_quiet_exec);
/**
* software_resume - Resume from a saved hibernation image.
Index: linux-pm/include/linux/suspend.h
===================================================================
--- linux-pm.orig/include/linux/suspend.h
+++ linux-pm/include/linux/suspend.h
@@ -453,6 +453,8 @@ extern bool hibernation_available(void);
asmlinkage int swsusp_save(void);
extern struct pbe *restore_pblist;
int pfn_is_nosave(unsigned long pfn);
+
+int hibernate_quiet_exec(int (*func)(void *data), void *data);
#else /* CONFIG_HIBERNATION */
static inline void register_nosave_region(unsigned long b, unsigned long e) {}
static inline void register_nosave_region_late(unsigned long b, unsigned long e) {}
@@ -464,6 +466,10 @@ static inline void hibernation_set_ops(c
static inline int hibernate(void) { return -ENOSYS; }
static inline bool system_entering_hibernation(void) { return false; }
static inline bool hibernation_available(void) { return false; }
+
+static inline hibernate_quiet_exec(int (*func)(void *data), void *data) {
+ return -ENOTSUPP;
+}
#endif /* CONFIG_HIBERNATION */
#ifdef CONFIG_HIBERNATION_SNAPSHOT_DEV
On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote:
> The runtime firmware activation capability of Intel NVDIMM devices
> requires memory transactions to be disabled for 100s of microseconds.
> This timeout is large enough to cause in-flight DMA to fail and other
> application detectable timeouts. Arrange for firmware activation to be
> executed while the system is "quiesced", all processes and device-DMA
> frozen.
>
> It is already required that invoking device ->freeze() callbacks is
> sufficient to cease DMA. A device that continues memory writes outside
> of user-direction violates expectations of the PM core to be to
> establish a coherent hibernation image.
>
> That said, RDMA devices are an example of a device that access memory
> outside of user process direction. RDMA drivers also typically assume
> the system they are operating in will never be hibernated. A solution
> for RDMA collisions with firmware activation is outside the scope of
> this change and may need to rely on being able to survive the platform
> imposed memory controller quiesce period.
Yikes. I don't think we should support such a broken runtime firmware
activation.
On Thu, Jul 09, 2020 at 04:00:51PM +0100, Christoph Hellwig wrote:
> On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote:
> > The runtime firmware activation capability of Intel NVDIMM devices
> > requires memory transactions to be disabled for 100s of microseconds.
> > This timeout is large enough to cause in-flight DMA to fail and other
> > application detectable timeouts. Arrange for firmware activation to be
> > executed while the system is "quiesced", all processes and device-DMA
> > frozen.
> >
> > It is already required that invoking device ->freeze() callbacks is
> > sufficient to cease DMA. A device that continues memory writes outside
> > of user-direction violates expectations of the PM core to be to
> > establish a coherent hibernation image.
> >
> > That said, RDMA devices are an example of a device that access memory
> > outside of user process direction.
Are you saying freeze doesn't work for some RDMA drivers? That would
be a driver bug, I think.
The consequences of doing freeze are pretty serious, but it should
still stop DMA.
Jason
On Thu, Jul 9, 2020 at 5:39 PM Jason Gunthorpe <[email protected]> wrote:
>
> On Thu, Jul 09, 2020 at 04:00:51PM +0100, Christoph Hellwig wrote:
> > On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote:
> > > The runtime firmware activation capability of Intel NVDIMM devices
> > > requires memory transactions to be disabled for 100s of microseconds.
> > > This timeout is large enough to cause in-flight DMA to fail and other
> > > application detectable timeouts. Arrange for firmware activation to be
> > > executed while the system is "quiesced", all processes and device-DMA
> > > frozen.
> > >
> > > It is already required that invoking device ->freeze() callbacks is
> > > sufficient to cease DMA. A device that continues memory writes outside
> > > of user-direction violates expectations of the PM core to be to
> > > establish a coherent hibernation image.
> > >
> > > That said, RDMA devices are an example of a device that access memory
> > > outside of user process direction.
>
> Are you saying freeze doesn't work for some RDMA drivers? That would
> be a driver bug, I think.
>
> The consequences of doing freeze are pretty serious, but it should
> still stop DMA.
Yes, it should. The "freeze" callbacks are expected to prevent any
DMA transfers from being carried out after they have been executed.
Thanks!
On Thu, Jul 9, 2020 at 8:01 AM Christoph Hellwig <[email protected]> wrote:
>
> On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote:
> > The runtime firmware activation capability of Intel NVDIMM devices
> > requires memory transactions to be disabled for 100s of microseconds.
> > This timeout is large enough to cause in-flight DMA to fail and other
> > application detectable timeouts. Arrange for firmware activation to be
> > executed while the system is "quiesced", all processes and device-DMA
> > frozen.
> >
> > It is already required that invoking device ->freeze() callbacks is
> > sufficient to cease DMA. A device that continues memory writes outside
> > of user-direction violates expectations of the PM core to be to
> > establish a coherent hibernation image.
> >
> > That said, RDMA devices are an example of a device that access memory
> > outside of user process direction. RDMA drivers also typically assume
> > the system they are operating in will never be hibernated. A solution
> > for RDMA collisions with firmware activation is outside the scope of
> > this change and may need to rely on being able to survive the platform
> > imposed memory controller quiesce period.
>
> Yikes. I don't think we should support such a broken runtime firmware
> activation.
Yikes indeed, that matches my initial reaction.
So I can say that the platform folks recognize that this situation is
untenable and you can see in the interface that it is built to support
future platforms that can activate without a memory quiesce period.
The question is what to do in the meantime. It turns out that despite
my initial skeptical reaction a significant number of users are
willing to manage this quiesce (or even race this quiesce!) to avoid a
server reboot which is basically guaranteed to knock a "9" off of a "5
nines" uptime system. My minimum requirement for supporting this in
Linux was to at least have a safe way to mitigate the risks of a race
and that's what led me to the hibernate path.
On Thu, Jul 9, 2020 at 8:39 AM Jason Gunthorpe <[email protected]> wrote:
>
> On Thu, Jul 09, 2020 at 04:00:51PM +0100, Christoph Hellwig wrote:
> > On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote:
> > > The runtime firmware activation capability of Intel NVDIMM devices
> > > requires memory transactions to be disabled for 100s of microseconds.
> > > This timeout is large enough to cause in-flight DMA to fail and other
> > > application detectable timeouts. Arrange for firmware activation to be
> > > executed while the system is "quiesced", all processes and device-DMA
> > > frozen.
> > >
> > > It is already required that invoking device ->freeze() callbacks is
> > > sufficient to cease DMA. A device that continues memory writes outside
> > > of user-direction violates expectations of the PM core to be to
> > > establish a coherent hibernation image.
> > >
> > > That said, RDMA devices are an example of a device that access memory
> > > outside of user process direction.
>
> Are you saying freeze doesn't work for some RDMA drivers? That would
> be a driver bug, I think.
Right, it's more my hunch than a known bug at this point, but in my
experience with testing server class hardware when I've reported a
power management bugs I've sometimes got the incredulous response "who
suspends / hibernates servers!?". I can drop that comment.
Are there protocol timeouts that might need to be adjusted for a 100s
of microseconds blip in memory controller response?
> The consequences of doing freeze are pretty serious, but it should
> still stop DMA.
Ok, and there is still the option to race the quiesce if the effects
of the freeze are worse than a potential timeout from the quiesce.
On Thu, Jul 09, 2020 at 09:10:06AM -0700, Dan Williams wrote:
> On Thu, Jul 9, 2020 at 8:39 AM Jason Gunthorpe <[email protected]> wrote:
> >
> > On Thu, Jul 09, 2020 at 04:00:51PM +0100, Christoph Hellwig wrote:
> > > On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote:
> > > > The runtime firmware activation capability of Intel NVDIMM devices
> > > > requires memory transactions to be disabled for 100s of microseconds.
> > > > This timeout is large enough to cause in-flight DMA to fail and other
> > > > application detectable timeouts. Arrange for firmware activation to be
> > > > executed while the system is "quiesced", all processes and device-DMA
> > > > frozen.
> > > >
> > > > It is already required that invoking device ->freeze() callbacks is
> > > > sufficient to cease DMA. A device that continues memory writes outside
> > > > of user-direction violates expectations of the PM core to be to
> > > > establish a coherent hibernation image.
> > > >
> > > > That said, RDMA devices are an example of a device that access memory
> > > > outside of user process direction.
> >
> > Are you saying freeze doesn't work for some RDMA drivers? That would
> > be a driver bug, I think.
>
> Right, it's more my hunch than a known bug at this point, but in my
> experience with testing server class hardware when I've reported a
> power management bugs I've sometimes got the incredulous response "who
> suspends / hibernates servers!?". I can drop that comment.
>
> Are there protocol timeouts that might need to be adjusted for a 100s
> of microseconds blip in memory controller response?
Survivability depends alot on HW support, it has to suspend, not
discard DMAs that it needs to issue. Most likely things are as you
say, and HW doesn't support safe short time suspend. The usual use of
PM stuff here is to make the machine ready for kexec
Jason
On Thu, Jul 9, 2020 at 7:57 AM Rafael J. Wysocki <[email protected]> wrote:
>
> On Tuesday, July 7, 2020 3:59:32 AM CEST Dan Williams wrote:
> > The runtime firmware activation capability of Intel NVDIMM devices
> > requires memory transactions to be disabled for 100s of microseconds.
> > This timeout is large enough to cause in-flight DMA to fail and other
> > application detectable timeouts. Arrange for firmware activation to be
> > executed while the system is "quiesced", all processes and device-DMA
> > frozen.
> >
> > It is already required that invoking device ->freeze() callbacks is
> > sufficient to cease DMA. A device that continues memory writes outside
> > of user-direction violates expectations of the PM core to be to
> > establish a coherent hibernation image.
> >
> > That said, RDMA devices are an example of a device that access memory
> > outside of user process direction. RDMA drivers also typically assume
> > the system they are operating in will never be hibernated. A solution
> > for RDMA collisions with firmware activation is outside the scope of
> > this change and may need to rely on being able to survive the platform
> > imposed memory controller quiesce period.
>
> Thanks for following my suggestion to use the hibernation infrastructure
> rather than the suspend one, but I think it would be better to go a bit
> further with that.
>
> Namely, after thinking about this a bit more I have come to the conclusion
> that what is needed is an ability to execute a function, inside of the
> kernel, in a "quiet" environment in which memory updates are unlikely.
>
> While the hibernation infrastructure as is can be used for that, kind of, IMO
> it would be cleaner to introduce a helper for that, like in the (untested)
> patch below, so if the "quiet execution environment" is needed, whoever
> needs it may simply pass a function to hibernate_quiet_exec() and provide
> whatever user-space I/F is suitable on top of that.
>
> Please let me know what you think.
This looks good to me in concept.
Would you expect that I trigger this from libnvdimm sysfs, or any
future users of this functionality to trigger it through their own
subsystem specific mechanisms?
I have a place for it in libvdimm and could specify the activation
method directly as "suspend" vs "live" activation.
On Thursday, July 9, 2020 9:04:30 PM CEST Dan Williams wrote:
> On Thu, Jul 9, 2020 at 7:57 AM Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Tuesday, July 7, 2020 3:59:32 AM CEST Dan Williams wrote:
> > > The runtime firmware activation capability of Intel NVDIMM devices
> > > requires memory transactions to be disabled for 100s of microseconds.
> > > This timeout is large enough to cause in-flight DMA to fail and other
> > > application detectable timeouts. Arrange for firmware activation to be
> > > executed while the system is "quiesced", all processes and device-DMA
> > > frozen.
> > >
> > > It is already required that invoking device ->freeze() callbacks is
> > > sufficient to cease DMA. A device that continues memory writes outside
> > > of user-direction violates expectations of the PM core to be to
> > > establish a coherent hibernation image.
> > >
> > > That said, RDMA devices are an example of a device that access memory
> > > outside of user process direction. RDMA drivers also typically assume
> > > the system they are operating in will never be hibernated. A solution
> > > for RDMA collisions with firmware activation is outside the scope of
> > > this change and may need to rely on being able to survive the platform
> > > imposed memory controller quiesce period.
> >
> > Thanks for following my suggestion to use the hibernation infrastructure
> > rather than the suspend one, but I think it would be better to go a bit
> > further with that.
> >
> > Namely, after thinking about this a bit more I have come to the conclusion
> > that what is needed is an ability to execute a function, inside of the
> > kernel, in a "quiet" environment in which memory updates are unlikely.
> >
> > While the hibernation infrastructure as is can be used for that, kind of, IMO
> > it would be cleaner to introduce a helper for that, like in the (untested)
> > patch below, so if the "quiet execution environment" is needed, whoever
> > needs it may simply pass a function to hibernate_quiet_exec() and provide
> > whatever user-space I/F is suitable on top of that.
> >
> > Please let me know what you think.
>
> This looks good to me in concept.
>
> Would you expect that I trigger this from libnvdimm sysfs, or any
> future users of this functionality to trigger it through their own
> subsystem specific mechanisms?
Yes, I would.
> I have a place for it in libvdimm and could specify the activation
> method directly as "suspend" vs "live" activation.
Sounds good to me.
Cheers!