2013-07-23 08:10:15

by Zheng, Lv

Subject: [PATCH 00/13] ACPI/IPMI: Fix several issues in the current codes

This patchset tries to fix the following kernel bug:
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=46741
This is fixed by [PATCH 05].

The bug shows that an IPMI operation region may appear in a device that is
not under the IPMI system interface device's scope, so the ACPI IPMI
operation region handler must be installed from the root of the ACPI
namespace.

The original acpi_ipmi implementation includes several issues that break
the test process. This patchset therefore also re-designs the acpi_ipmi
module to make the test possible.

[PATCH 01-05] are bug-fix patches that can be applied to kernels newer
than 2.6.38. This can be confirmed with:
# git tag --contains e92b297c
[PATCH 06] is also a bug-fix patch.
The drivers/acpi/osl.c part can be back-ported to kernels
newer than 2.6.14. This can be confirmed with:
# git tag --contains 4be44fcd
The drivers/acpi/acpi_ipmi.c part can be applied on top of
[PATCH 01-05].
[PATCH 07] is a tuning patch for acpi_ipmi.c.
[PATCH 08-10] are cleanup patches for acpi_ipmi.c.
[PATCH 11] is a cleanup patch not for acpi_ipmi.c.
[PATCH 12-13] are test patches.
[PATCH 12] may be accepted by the upstream kernel as a useful
facility for testing module loading/unloading.
[PATCH 13] should not be merged into any published kernel, as it
is a driver for a pseudo device with a PnP ID that
does not exist on real machines.

This patchset has passed testing with a fake device accessing IPMI
operation region fields on an IPMI-capable platform. A stress test of
acpi_ipmi module load/unload has been performed on that platform. No
races were found, and the IPMI operation region handler is functioning
correctly. It is not possible to test ipmi_si module load/unload, as that
module cannot be unloaded due to its transfer flushing implementation.

Lv Zheng (13):
ACPI/IPMI: Fix potential response buffer overflow
ACPI/IPMI: Fix atomic context requirement of ipmi_msg_handler()
ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user
ACPI/IPMI: Fix issue caused by the per-device registration of the
IPMI operation region handler
ACPI/IPMI: Add reference counting for ACPI operation region handlers
ACPI/IPMI: Add reference counting for ACPI IPMI transfers
ACPI/IPMI: Cleanup several acpi_ipmi_device members
ACPI/IPMI: Cleanup some initialization codes
ACPI/IPMI: Cleanup some inclusion codes
ACPI/IPMI: Cleanup some Kconfig codes
Testing: Add module load/unload test suite
ACPI/IPMI: Add IPMI operation region test device driver

drivers/acpi/Kconfig | 71 +++-
drivers/acpi/Makefile | 1 +
drivers/acpi/acpi_ipmi.c | 513 +++++++++++++++----------
drivers/acpi/ipmi_test.c | 254 ++++++++++++
drivers/acpi/osl.c | 224 +++++++++++
include/acpi/acpi_bus.h | 5 +
tools/testing/module-unloading/endless_cat.sh | 32 ++
tools/testing/module-unloading/endless_mod.sh | 81 ++++
8 files changed, 977 insertions(+), 204 deletions(-)
create mode 100644 drivers/acpi/ipmi_test.c
create mode 100755 tools/testing/module-unloading/endless_cat.sh
create mode 100755 tools/testing/module-unloading/endless_mod.sh

--
1.7.10


2013-07-23 08:09:15

by Zheng, Lv

Subject: [PATCH 01/13] ACPI/IPMI: Fix potential response buffer overflow

This patch enhances the sanity checks on message size to avoid a potential
buffer overflow.

The kernel IPMI message size is IPMI_MAX_MSG_LENGTH (272 bytes), while the
ACPI specification defines the IPMI message size as 64 bytes. The original
code does not handle the difference, which may cause a crash in the
response handling code.
This patch closes that gap and also combines rx_data/tx_data into a single
data/len pair, since they need not be separated.

Signed-off-by: Lv Zheng <[email protected]>
Reviewed-by: Huang Ying <[email protected]>
---
drivers/acpi/acpi_ipmi.c | 100 ++++++++++++++++++++++++++++------------------
1 file changed, 61 insertions(+), 39 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index f40acef..28e2b4c 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -51,6 +51,7 @@ MODULE_LICENSE("GPL");
#define ACPI_IPMI_UNKNOWN 0x07
/* the IPMI timeout is 5s */
#define IPMI_TIMEOUT (5 * HZ)
+#define ACPI_IPMI_MAX_MSG_LENGTH 64

struct acpi_ipmi_device {
/* the device list attached to driver_data.ipmi_devices */
@@ -89,11 +90,9 @@ struct acpi_ipmi_msg {
struct completion tx_complete;
struct kernel_ipmi_msg tx_message;
int msg_done;
- /* tx data . And copy it from ACPI object buffer */
- u8 tx_data[64];
- int tx_len;
- u8 rx_data[64];
- int rx_len;
+ /* tx/rx data, copied from/to the ACPI object buffer */
+ u8 data[ACPI_IPMI_MAX_MSG_LENGTH];
+ u8 rx_len;
struct acpi_ipmi_device *device;
};

@@ -101,7 +100,7 @@ struct acpi_ipmi_msg {
struct acpi_ipmi_buffer {
u8 status;
u8 length;
- u8 data[64];
+ u8 data[ACPI_IPMI_MAX_MSG_LENGTH];
};

static void ipmi_register_bmc(int iface, struct device *dev);
@@ -140,9 +139,9 @@ static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)

#define IPMI_OP_RGN_NETFN(offset) ((offset >> 8) & 0xff)
#define IPMI_OP_RGN_CMD(offset) (offset & 0xff)
-static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg,
- acpi_physical_address address,
- acpi_integer *value)
+static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
+ acpi_physical_address address,
+ acpi_integer *value)
{
struct kernel_ipmi_msg *msg;
struct acpi_ipmi_buffer *buffer;
@@ -155,15 +154,21 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg,
*/
msg->netfn = IPMI_OP_RGN_NETFN(address);
msg->cmd = IPMI_OP_RGN_CMD(address);
- msg->data = tx_msg->tx_data;
+ msg->data = tx_msg->data;
/*
* value is the parameter passed by the IPMI opregion space handler.
* It points to the IPMI request message buffer
*/
buffer = (struct acpi_ipmi_buffer *)value;
/* copy the tx message data */
+ if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
+ dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
+ "Unexpected request (msg len %d).\n",
+ buffer->length);
+ return -EINVAL;
+ }
msg->data_len = buffer->length;
- memcpy(tx_msg->tx_data, buffer->data, msg->data_len);
+ memcpy(tx_msg->data, buffer->data, msg->data_len);
/*
* now the default type is SYSTEM_INTERFACE and channel type is BMC.
* If the netfn is APP_REQUEST and the cmd is SEND_MESSAGE,
@@ -181,10 +186,12 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg,
device->curr_msgid++;
tx_msg->tx_msgid = device->curr_msgid;
mutex_unlock(&device->tx_msg_lock);
+
+ return 0;
}

static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
- acpi_integer *value, int rem_time)
+ acpi_integer *value, int rem_time)
{
struct acpi_ipmi_buffer *buffer;

@@ -206,13 +213,14 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
buffer->status = ACPI_IPMI_UNKNOWN;
return;
}
+
/*
* If the IPMI response message is obtained correctly, the status code
* will be ACPI_IPMI_OK
*/
buffer->status = ACPI_IPMI_OK;
buffer->length = msg->rx_len;
- memcpy(buffer->data, msg->rx_data, msg->rx_len);
+ memcpy(buffer->data, msg->data, msg->rx_len);
}

static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
@@ -244,12 +252,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;

if (msg->user != ipmi_device->user_interface) {
- dev_warn(&pnp_dev->dev, "Unexpected response is returned. "
- "returned user %p, expected user %p\n",
- msg->user, ipmi_device->user_interface);
- ipmi_free_recv_msg(msg);
- return;
+ dev_warn(&pnp_dev->dev,
+ "Unexpected response is returned. returned user %p, expected user %p\n",
+ msg->user, ipmi_device->user_interface);
+ goto out_msg;
}
+
mutex_lock(&ipmi_device->tx_msg_lock);
list_for_each_entry(tx_msg, &ipmi_device->tx_msg_list, head) {
if (msg->msgid == tx_msg->tx_msgid) {
@@ -257,24 +265,31 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
break;
}
}
-
mutex_unlock(&ipmi_device->tx_msg_lock);
+
if (!msg_found) {
- dev_warn(&pnp_dev->dev, "Unexpected response (msg id %ld) is "
- "returned.\n", msg->msgid);
- ipmi_free_recv_msg(msg);
- return;
+ dev_warn(&pnp_dev->dev,
+ "Unexpected response (msg id %ld) is returned.\n",
+ msg->msgid);
+ goto out_msg;
}

- if (msg->msg.data_len) {
- /* copy the response data to Rx_data buffer */
- memcpy(tx_msg->rx_data, msg->msg_data, msg->msg.data_len);
- tx_msg->rx_len = msg->msg.data_len;
- tx_msg->msg_done = 1;
+ /* copy the response data to Rx_data buffer */
+ if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
+ dev_WARN_ONCE(&pnp_dev->dev, true,
+ "Unexpected response (msg len %d).\n",
+ msg->msg.data_len);
+ goto out_comp;
}
+ tx_msg->rx_len = msg->msg.data_len;
+ memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
+ tx_msg->msg_done = 1;
+
+out_comp:
complete(&tx_msg->tx_complete);
+out_msg:
ipmi_free_recv_msg(msg);
-};
+}

static void ipmi_register_bmc(int iface, struct device *dev)
{
@@ -353,6 +368,7 @@ static void ipmi_bmc_gone(int iface)
}
mutex_unlock(&driver_data.ipmi_lock);
}
+
/* --------------------------------------------------------------------------
* Address Space Management
* -------------------------------------------------------------------------- */
@@ -371,13 +387,14 @@ static void ipmi_bmc_gone(int iface)

static acpi_status
acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
- u32 bits, acpi_integer *value,
- void *handler_context, void *region_context)
+ u32 bits, acpi_integer *value,
+ void *handler_context, void *region_context)
{
struct acpi_ipmi_msg *tx_msg;
struct acpi_ipmi_device *ipmi_device = handler_context;
int err, rem_time;
acpi_status status;
+
/*
* IPMI opregion message.
* IPMI message is firstly written to the BMC and system software
@@ -394,28 +411,33 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
if (!tx_msg)
return AE_NO_MEMORY;

- acpi_format_ipmi_msg(tx_msg, address, value);
+ if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
+ status = AE_TYPE;
+ goto out_msg;
+ }
+
mutex_lock(&ipmi_device->tx_msg_lock);
list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
mutex_unlock(&ipmi_device->tx_msg_lock);
err = ipmi_request_settime(ipmi_device->user_interface,
- &tx_msg->addr,
- tx_msg->tx_msgid,
- &tx_msg->tx_message,
- NULL, 0, 0, 0);
+ &tx_msg->addr,
+ tx_msg->tx_msgid,
+ &tx_msg->tx_message,
+ NULL, 0, 0, 0);
if (err) {
status = AE_ERROR;
- goto end_label;
+ goto out_list;
}
rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
- IPMI_TIMEOUT);
+ IPMI_TIMEOUT);
acpi_format_ipmi_response(tx_msg, value, rem_time);
status = AE_OK;

-end_label:
+out_list:
mutex_lock(&ipmi_device->tx_msg_lock);
list_del(&tx_msg->head);
mutex_unlock(&ipmi_device->tx_msg_lock);
+out_msg:
kfree(tx_msg);
return status;
}
--
1.7.10

2013-07-23 08:10:29

by Zheng, Lv

Subject: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers

This patch adds reference counting for ACPI operation region handlers to
fix races caused by ACPICA address space callback invocations.

ACPICA address space callback invocation is not suitable for the Linux
CONFIG_MODULE=y execution environment. This patch protects the address
space callbacks by invoking them in a module-safe environment.
The IPMI address space handler is also upgraded in this patch.
acpi_unregister_region() is designed to meet the following requirements:
1. It acts as a barrier for operation region callbacks - no callback will
happen after acpi_unregister_region() returns.
2. acpi_unregister_region() is safe to call from a module's exit()
function.
Using reference counting rather than module references allows these
guarantees to hold even when acpi_unregister_region() is called outside
of module->exit().
The new declarations reference ACPICA-defined types, so they are placed
in include/acpi/acpi_bus.h.

Signed-off-by: Lv Zheng <[email protected]>
Reviewed-by: Huang Ying <[email protected]>
---
drivers/acpi/acpi_ipmi.c | 16 ++--
drivers/acpi/osl.c | 224 ++++++++++++++++++++++++++++++++++++++++++++++
include/acpi/acpi_bus.h | 5 ++
3 files changed, 235 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 5f8f495..2a09156 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -539,20 +539,18 @@ out_ref:
static int __init acpi_ipmi_init(void)
{
int result = 0;
- acpi_status status;

if (acpi_disabled)
return result;

mutex_init(&driver_data.ipmi_lock);

- status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
- ACPI_ADR_SPACE_IPMI,
- &acpi_ipmi_space_handler,
- NULL, NULL);
- if (ACPI_FAILURE(status)) {
+ result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
+ &acpi_ipmi_space_handler,
+ NULL, NULL);
+ if (result) {
pr_warn("Can't register IPMI opregion space handle\n");
- return -EINVAL;
+ return result;
}

result = ipmi_smi_watcher_register(&driver_data.bmc_events);
@@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
}
mutex_unlock(&driver_data.ipmi_lock);

- acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
- ACPI_ADR_SPACE_IPMI,
- &acpi_ipmi_space_handler);
+ acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
}

module_init(acpi_ipmi_init);
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 6ab2c35..8398e51 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;
static struct workqueue_struct *kacpi_notify_wq;
static struct workqueue_struct *kacpi_hotplug_wq;

+struct acpi_region {
+ unsigned long flags;
+#define ACPI_REGION_DEFAULT 0x01
+#define ACPI_REGION_INSTALLED 0x02
+#define ACPI_REGION_REGISTERED 0x04
+#define ACPI_REGION_UNREGISTERING 0x08
+#define ACPI_REGION_INSTALLING 0x10
+ /*
+ * NOTE: Upgrading All Region Handlers
+ * This flag is only used during the period where not all of the
+ * region handlers are upgraded to the new interfaces.
+ */
+#define ACPI_REGION_MANAGED 0x80
+ acpi_adr_space_handler handler;
+ acpi_adr_space_setup setup;
+ void *context;
+ /* Invoking references */
+ atomic_t refcnt;
+};
+
+static struct acpi_region acpi_regions[ACPI_NUM_PREDEFINED_REGIONS] = {
+ [ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
+ .flags = ACPI_REGION_DEFAULT,
+ },
+ [ACPI_ADR_SPACE_SYSTEM_IO] = {
+ .flags = ACPI_REGION_DEFAULT,
+ },
+ [ACPI_ADR_SPACE_PCI_CONFIG] = {
+ .flags = ACPI_REGION_DEFAULT,
+ },
+ [ACPI_ADR_SPACE_IPMI] = {
+ .flags = ACPI_REGION_MANAGED,
+ },
+};
+static DEFINE_MUTEX(acpi_mutex_region);
+
/*
* This list of permanent mappings is for memory that may be accessed from
* interrupt context, where we can't do the ioremap().
@@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle, u32 type, void *context,
kfree(hp_work);
}
EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
+
+static bool acpi_region_managed(struct acpi_region *rgn)
+{
+ /*
+ * NOTE: Default and Managed
+ * We only need to avoid region management on the regions managed
+ * by ACPICA (ACPI_REGION_DEFAULT). Currently, we need additional
+ * check as many operation region handlers are not upgraded, so
+ * only those known to be safe are managed (ACPI_REGION_MANAGED).
+ */
+ return !(rgn->flags & ACPI_REGION_DEFAULT) &&
+ (rgn->flags & ACPI_REGION_MANAGED);
+}
+
+static bool acpi_region_callable(struct acpi_region *rgn)
+{
+ return (rgn->flags & ACPI_REGION_REGISTERED) &&
+ !(rgn->flags & ACPI_REGION_UNREGISTERING);
+}
+
+static acpi_status
+acpi_region_default_handler(u32 function,
+ acpi_physical_address address,
+ u32 bit_width, u64 *value,
+ void *handler_context, void *region_context)
+{
+ acpi_adr_space_handler handler;
+ struct acpi_region *rgn = (struct acpi_region *)handler_context;
+ void *context;
+ acpi_status status = AE_NOT_EXIST;
+
+ mutex_lock(&acpi_mutex_region);
+ if (!acpi_region_callable(rgn) || !rgn->handler) {
+ mutex_unlock(&acpi_mutex_region);
+ return status;
+ }
+
+ atomic_inc(&rgn->refcnt);
+ handler = rgn->handler;
+ context = rgn->context;
+ mutex_unlock(&acpi_mutex_region);
+
+ status = handler(function, address, bit_width, value, context,
+ region_context);
+ atomic_dec(&rgn->refcnt);
+
+ return status;
+}
+
+static acpi_status
+acpi_region_default_setup(acpi_handle handle, u32 function,
+ void *handler_context, void **region_context)
+{
+ acpi_adr_space_setup setup;
+ struct acpi_region *rgn = (struct acpi_region *)handler_context;
+ void *context;
+ acpi_status status = AE_OK;
+
+ mutex_lock(&acpi_mutex_region);
+ if (!acpi_region_callable(rgn) || !rgn->setup) {
+ mutex_unlock(&acpi_mutex_region);
+ return status;
+ }
+
+ atomic_inc(&rgn->refcnt);
+ setup = rgn->setup;
+ context = rgn->context;
+ mutex_unlock(&acpi_mutex_region);
+
+ status = setup(handle, function, context, region_context);
+ atomic_dec(&rgn->refcnt);
+
+ return status;
+}
+
+static int __acpi_install_region(struct acpi_region *rgn,
+ acpi_adr_space_type space_id)
+{
+ int res = 0;
+ acpi_status status;
+ int installing = 0;
+
+ mutex_lock(&acpi_mutex_region);
+ if (rgn->flags & ACPI_REGION_INSTALLED)
+ goto out_lock;
+ if (rgn->flags & ACPI_REGION_INSTALLING) {
+ res = -EBUSY;
+ goto out_lock;
+ }
+
+ installing = 1;
+ rgn->flags |= ACPI_REGION_INSTALLING;
+ status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT, space_id,
+ acpi_region_default_handler,
+ acpi_region_default_setup,
+ rgn);
+ rgn->flags &= ~ACPI_REGION_INSTALLING;
+ if (ACPI_FAILURE(status))
+ res = -EINVAL;
+ else
+ rgn->flags |= ACPI_REGION_INSTALLED;
+
+out_lock:
+ mutex_unlock(&acpi_mutex_region);
+ if (installing) {
+ if (res)
+ pr_err("Failed to install region %d\n", space_id);
+ else
+ pr_info("Region %d installed\n", space_id);
+ }
+ return res;
+}
+
+int acpi_register_region(acpi_adr_space_type space_id,
+ acpi_adr_space_handler handler,
+ acpi_adr_space_setup setup, void *context)
+{
+ int res;
+ struct acpi_region *rgn;
+
+ if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
+ return -EINVAL;
+
+ rgn = &acpi_regions[space_id];
+ if (!acpi_region_managed(rgn))
+ return -EINVAL;
+
+ res = __acpi_install_region(rgn, space_id);
+ if (res)
+ return res;
+
+ mutex_lock(&acpi_mutex_region);
+ if (rgn->flags & ACPI_REGION_REGISTERED) {
+ mutex_unlock(&acpi_mutex_region);
+ return -EBUSY;
+ }
+
+ rgn->handler = handler;
+ rgn->setup = setup;
+ rgn->context = context;
+ rgn->flags |= ACPI_REGION_REGISTERED;
+ atomic_set(&rgn->refcnt, 1);
+ mutex_unlock(&acpi_mutex_region);
+
+ pr_info("Region %d registered\n", space_id);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(acpi_register_region);
+
+void acpi_unregister_region(acpi_adr_space_type space_id)
+{
+ struct acpi_region *rgn;
+
+ if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
+ return;
+
+ rgn = &acpi_regions[space_id];
+ if (!acpi_region_managed(rgn))
+ return;
+
+ mutex_lock(&acpi_mutex_region);
+ if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
+ mutex_unlock(&acpi_mutex_region);
+ return;
+ }
+ if (rgn->flags & ACPI_REGION_UNREGISTERING) {
+ mutex_unlock(&acpi_mutex_region);
+ return;
+ }
+
+ rgn->flags |= ACPI_REGION_UNREGISTERING;
+ rgn->handler = NULL;
+ rgn->setup = NULL;
+ rgn->context = NULL;
+ mutex_unlock(&acpi_mutex_region);
+
+ while (atomic_read(&rgn->refcnt) > 1)
+ schedule_timeout_uninterruptible(usecs_to_jiffies(5));
+ atomic_dec(&rgn->refcnt);
+
+ mutex_lock(&acpi_mutex_region);
+ rgn->flags &= ~(ACPI_REGION_REGISTERED | ACPI_REGION_UNREGISTERING);
+ mutex_unlock(&acpi_mutex_region);
+
+ pr_info("Region %d unregistered\n", space_id);
+}
+EXPORT_SYMBOL_GPL(acpi_unregister_region);
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index a2c2fbb..15fad0d 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void *bus) { return 0; }

#endif /* CONFIG_ACPI */

+int acpi_register_region(acpi_adr_space_type space_id,
+ acpi_adr_space_handler handler,
+ acpi_adr_space_setup setup, void *context);
+void acpi_unregister_region(acpi_adr_space_type space_id);
+
#endif /*__ACPI_BUS_H__*/
--
1.7.10

2013-07-23 08:10:48

by Zheng, Lv

Subject: [PATCH 10/13] ACPI/IPMI: Cleanup some inclusion codes

This is a trivial patch:
1. Delete several useless header inclusions.
2. Kernel code should always include <linux/acpi.h>, which handles many
conditional declarations, instead of <acpi/acpi_bus.h> or
<acpi/acpi_drivers.h>.

Signed-off-by: Lv Zheng <[email protected]>
Reviewed-by: Huang Ying <[email protected]>
---
drivers/acpi/acpi_ipmi.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 2d31003..e56b1f8 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -24,22 +24,9 @@
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

-#include <linux/kernel.h>
#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/types.h>
-#include <linux/delay.h>
-#include <linux/proc_fs.h>
-#include <linux/seq_file.h>
-#include <linux/interrupt.h>
-#include <linux/list.h>
-#include <linux/spinlock.h>
-#include <linux/io.h>
-#include <acpi/acpi_bus.h>
-#include <acpi/acpi_drivers.h>
+#include <linux/acpi.h>
#include <linux/ipmi.h>
-#include <linux/device.h>
-#include <linux/pnp.h>
#include <linux/spinlock.h>

MODULE_AUTHOR("Zhao Yakui");
--
1.7.10

2013-07-23 08:10:52

by Zheng, Lv

Subject: [PATCH 12/13] Testing: Add module load/unload test suite

This patch contains two test scripts for module load/unload race testing.

To use this test suite:
1. Run several instances of endless_cat.sh to access the sysfs files
exported by the module.
2. Run endless_mod.sh to load/unload the module frequently and see whether
races occur.
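The retry-until-the-state-changes structure of endless_mod.sh can be
exercised without root privileges by substituting a plain file for the
module. This is a sketch with invented names, not part of the patch:

```shell
#!/bin/sh
# Stand-in for modprobe/rmmod: create and delete a file, keeping the
# same "check state, act, re-check" loop shape as endless_mod.sh.
MOD=/tmp/fake_mod.$$

find_module() {
	[ -e "$MOD" ]
}

insert_module() {
	until find_module; do
		: > "$MOD"
		echo "Inserting $MOD ..."
	done
}

remove_module() {
	while find_module; do
		rm -f "$MOD"
		echo "Removing $MOD ..."
	done
}

i=0
while [ "$i" -lt 3 ]; do
	echo "-----------------------------------"
	insert_module
	remove_module
	i=$((i + 1))
done
```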

Signed-off-by: Lv Zheng <[email protected]>
---
tools/testing/module-unloading/endless_cat.sh | 32 ++++++++++
tools/testing/module-unloading/endless_mod.sh | 81 +++++++++++++++++++++++++
2 files changed, 113 insertions(+)
create mode 100755 tools/testing/module-unloading/endless_cat.sh
create mode 100755 tools/testing/module-unloading/endless_mod.sh

diff --git a/tools/testing/module-unloading/endless_cat.sh b/tools/testing/module-unloading/endless_cat.sh
new file mode 100755
index 0000000..72e035c
--- /dev/null
+++ b/tools/testing/module-unloading/endless_cat.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+fatal() {
+ echo $1
+ exit 1
+}
+
+usage() {
+ echo "Usage: `basename $0` <file>"
+ echo "Where:"
+ echo " file: file to cat"
+}
+
+fatal_usage() {
+ usage
+ exit 1
+}
+
+if [ "x$1" = "x" ]; then
+ echo "Missing <file> parameter."
+ fatal_usage
+fi
+if [ ! -f $1 ]; then
+ echo "$1 is not an accessible file."
+ fatal_usage
+fi
+
+while :
+do
+ cat $1
+ echo "-----------------------------------"
+done
diff --git a/tools/testing/module-unloading/endless_mod.sh b/tools/testing/module-unloading/endless_mod.sh
new file mode 100755
index 0000000..359b0c0
--- /dev/null
+++ b/tools/testing/module-unloading/endless_mod.sh
@@ -0,0 +1,81 @@
+#!/bin/sh
+
+fatal() {
+ echo $1
+ exit 1
+}
+
+usage() {
+ echo "Usage: `basename $0` [-t second] <module>"
+ echo "Where:"
+ echo " second: seconds to sleep between module actions"
+ echo " module: name of module to test"
+}
+
+fatal_usage() {
+ usage
+ exit 1
+}
+
+SLEEPSEC=10
+
+while getopts "t:" opt
+do
+ case $opt in
+ t) SLEEPSEC=$OPTARG;;
+ ?) echo "Invalid argument $opt"
+ fatal_usage;;
+ esac
+done
+shift $(($OPTIND - 1))
+
+if [ "x$1" = "x" ]; then
+ echo "Missing <module> parameter."
+ fatal_usage
+fi
+
+find_module() {
+ curr_modules=`lsmod | cut -d " " -f1`
+
+ for m in $curr_modules; do
+ if [ "x$m" = "x$1" ]; then
+ return 0
+ fi
+ done
+ return 1
+}
+
+remove_module() {
+ while :
+ do
+ find_module $1
+ if [ $? -eq 0 ]; then
+ rmmod $1
+ echo "Removing $1 ..."
+ else
+ break
+ fi
+ done
+}
+
+insert_module() {
+ while :
+ do
+ find_module $1
+ if [ ! $? -eq 0 ]; then
+ modprobe $1
+ echo "Inserting $1 ..."
+ else
+ break
+ fi
+ done
+}
+
+while :
+do
+ echo "-----------------------------------"
+ insert_module $1
+ sleep $SLEEPSEC
+ remove_module $1
+ sleep $SLEEPSEC
+done
--
1.7.10

2013-07-23 08:11:06

by Zheng, Lv

Subject: [PATCH 13/13] ACPI/IPMI: Add IPMI operation region test device driver

This patch is for testing purposes only and should not be merged into any
public Linux kernel repository.

This patch contains a driver for a fake test device that accesses IPMI
operation region fields.

Signed-off-by: Lv Zheng <[email protected]>
---
drivers/acpi/Kconfig | 68 +++++++++++++
drivers/acpi/Makefile | 1 +
drivers/acpi/ipmi_test.c | 254 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 323 insertions(+)
create mode 100644 drivers/acpi/ipmi_test.c

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index d129869..e3dd3fd 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -377,6 +377,74 @@ config ACPI_BGRT
data from the firmware boot splash. It will appear under
/sys/firmware/acpi/bgrt/ .

+config ACPI_IPMI_TEST
+ tristate "IPMI operation region tester"
+ help
+ This is a test driver for a fake ACPI namespace device such as:
+ Device (PMIT)
+ {
+ Name (_HID, "ZETA0000") // _HID: Hardware ID
+ Name (_STA, 0x0F) // _STA: Status
+ OperationRegion (SYSI, IPMI, 0x0600, 0x0100)
+ Field (SYSI, BufferAcc, Lock, Preserve)
+ {
+ AccessAs (BufferAcc, 0x01),
+ Offset (0x01),
+ GDIC, 8, // Get Device ID Command
+ }
+ Method (GDIM, 0, NotSerialized) // GDIM: Get Device ID Method
+ {
+ Name (GDIR, Package (0x08)
+ {
+ 0x00,
+ 0x00,
+ 0x0000,
+ 0x00,
+ 0x00,
+ Buffer (0x03) {0x00, 0x00, 0x00},
+ Buffer (0x02) {0x00, 0x00},
+ 0x00000000
+ })
+ Name (BUFF, Buffer (0x42) {})
+ CreateByteField (BUFF, 0x00, STAT)
+ CreateByteField (BUFF, 0x01, LENG)
+ CreateByteField (BUFF, 0x02, CMPC)
+ CreateByteField (BUFF, 0x03, DID)
+ CreateByteField (BUFF, 0x04, DREV)
+ CreateWordField (BUFF, 0x05, FREV)
+ CreateByteField (BUFF, 0x07, SREV)
+ CreateByteField (BUFF, 0x08, ADS)
+ CreateByteField (BUFF, 0x09, VID0)
+ CreateByteField (BUFF, 0x0A, VID1)
+ CreateByteField (BUFF, 0x0B, VID2)
+ CreateByteField (BUFF, 0x0C, PID0)
+ CreateByteField (BUFF, 0x0D, PID1)
+ CreateDWordField (BUFF, 0x0E, AFRI)
+ Store (0x00, LENG)
+ Store (Store (BUFF, GDIC), BUFF)
+ If (LAnd (LEqual (STAT, 0x00), LEqual (CMPC, 0x00)))
+ {
+ Name (VBUF, Buffer (0x03) { 0x00, 0x00, 0x00 })
+ Name (PBUF, Buffer (0x02) { 0x00, 0x00 })
+ Store (DID, Index (GDIR, 0x00))
+ Store (DREV, Index (GDIR, 0x01))
+ Store (FREV, Index (GDIR, 0x02))
+ Store (SREV, Index (GDIR, 0x03))
+ Store (ADS, Index (GDIR, 0x04))
+ Store (VID0, Index (VBUF, 0x00))
+ Store (VID1, Index (VBUF, 0x01))
+ Store (VID2, Index (VBUF, 0x02))
+ Store (VBUF, Index (GDIR, 0x05))
+ Store (PID0, Index (PBUF, 0x00))
+ Store (PID1, Index (PBUF, 0x01))
+ Store (PBUF, Index (GDIR, 0x06))
+ Store (AFRI, Index (GDIR, 0x07))
+ }
+ Return (GDIR)
+ }
+ }
+ It is for validation purposes only and only issues the "Get Device ID" command.
+
source "drivers/acpi/apei/Kconfig"

endif # ACPI
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 81dbeb8..1476623 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -74,6 +74,7 @@ obj-$(CONFIG_ACPI_EC_DEBUGFS) += ec_sys.o
obj-$(CONFIG_ACPI_CUSTOM_METHOD)+= custom_method.o
obj-$(CONFIG_ACPI_BGRT) += bgrt.o
obj-$(CONFIG_ACPI_I2C) += acpi_i2c.o
+obj-$(CONFIG_ACPI_IPMI_TEST) += ipmi_test.o

# processor has its own "processor." module_param namespace
processor-y := processor_driver.o processor_throttling.o
diff --git a/drivers/acpi/ipmi_test.c b/drivers/acpi/ipmi_test.c
new file mode 100644
index 0000000..5d144e4
--- /dev/null
+++ b/drivers/acpi/ipmi_test.c
@@ -0,0 +1,254 @@
+/*
+ * An IPMI operation region tester driver
+ *
+ * Copyright (C) 2013 Intel Corporation
+ * Author: Lv Zheng <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/module.h>
+#include <linux/acpi.h>
+
+#define ACPI_IPMI_TEST_NAME "ipmi_test"
+ACPI_MODULE_NAME(ACPI_IPMI_TEST_NAME);
+
+#define ACPI_IPMI_TEST_DEVICE "IPMI Test"
+#define ACPI_IPMI_TEST_CLASS "ipmi_tester"
+
+static const struct acpi_device_id acpi_ipmi_test_ids[] = {
+ {"ZETA0000", 0},
+ {"", 0},
+};
+MODULE_DEVICE_TABLE(acpi, acpi_ipmi_test_ids);
+
+struct acpi_ipmi_device_id {
+ u64 device_id;
+ u64 device_rev;
+ u64 firmware_rev;
+ u64 ipmi_version;
+ u64 additional_dev_support;
+ u8 *vendor_id;
+ u8 *product_id;
+ u64 aux_firm_rev_info;
+ u8 extra_buf[5];
+} __packed;
+
+struct acpi_ipmi_tester {
+ struct acpi_device *adev;
+ acpi_bus_id name;
+ struct acpi_ipmi_device_id device_id;
+ int registered_group;
+};
+
+#define ipmi_err(tester, fmt, ...) \
+ dev_err(&(tester)->adev->dev, fmt, ##__VA_ARGS__)
+#define ipmi_info(tester, fmt, ...) \
+ dev_info(&(tester)->adev->dev, fmt, ##__VA_ARGS__)
+#define IPMI_ACPI_HANDLE(tester) ((tester)->adev->handle)
+
+static int acpi_ipmi_update_device_id(struct acpi_ipmi_tester *tester)
+{
+ int res = 0;
+ acpi_status status;
+ struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+ struct acpi_buffer format = { sizeof("NNNNNBBN"), "NNNNNBBN" };
+ struct acpi_buffer device_id = { 0, NULL };
+ union acpi_object *did;
+
+ status = acpi_evaluate_object(IPMI_ACPI_HANDLE(tester), "GDIM", NULL,
+ &buffer);
+ if (ACPI_FAILURE(status) || !buffer.pointer) {
+ ipmi_err(tester, "Evaluating GDIM, status - %d\n", status);
+ return -ENODEV;
+ }
+
+ did = buffer.pointer;
+ if (did->type != ACPI_TYPE_PACKAGE || did->package.count != 8) {
+ ipmi_err(tester, "Invalid GDIM data, type - %d, count - %d\n",
+ did->type, did->package.count);
+ res = -EFAULT;
+ goto err_buf;
+ }
+
+ device_id.length = sizeof(struct acpi_ipmi_device_id);
+ device_id.pointer = &tester->device_id;
+
+ status = acpi_extract_package(did, &format, &device_id);
+ if (ACPI_FAILURE(status)) {
+ ipmi_err(tester, "Invalid GDIM data\n");
+ res = -EFAULT;
+ goto err_buf;
+ }
+
+err_buf:
+ kfree(buffer.pointer);
+ return res;
+}
+
+static ssize_t show_device_id(struct device *device,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct acpi_device *adev = to_acpi_device(device);
+ struct acpi_ipmi_tester *tester = adev->driver_data;
+
+ acpi_ipmi_update_device_id(tester);
+ return sprintf(buf, "%llu\n", tester->device_id.device_id);
+}
+
+static ssize_t show_device_rev(struct device *device,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct acpi_device *adev = to_acpi_device(device);
+ struct acpi_ipmi_tester *tester = adev->driver_data;
+
+ acpi_ipmi_update_device_id(tester);
+ return sprintf(buf, "%llu\n", tester->device_id.device_rev);
+}
+
+static ssize_t show_firmware_rev(struct device *device,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct acpi_device *adev = to_acpi_device(device);
+ struct acpi_ipmi_tester *tester = adev->driver_data;
+
+ acpi_ipmi_update_device_id(tester);
+ return sprintf(buf, "%llu\n", tester->device_id.firmware_rev);
+}
+
+static ssize_t show_ipmi_version(struct device *device,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct acpi_device *adev = to_acpi_device(device);
+ struct acpi_ipmi_tester *tester = adev->driver_data;
+
+ acpi_ipmi_update_device_id(tester);
+ return sprintf(buf, "%llu\n", tester->device_id.ipmi_version);
+}
+
+static ssize_t show_vendor_id(struct device *device,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct acpi_device *adev = to_acpi_device(device);
+ struct acpi_ipmi_tester *tester = adev->driver_data;
+
+ acpi_ipmi_update_device_id(tester);
+ return sprintf(buf, "%02x %02x %02x\n",
+ tester->device_id.vendor_id[0],
+ tester->device_id.vendor_id[1],
+ tester->device_id.vendor_id[2]);
+}
+
+static ssize_t show_product_id(struct device *device,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct acpi_device *adev = to_acpi_device(device);
+ struct acpi_ipmi_tester *tester = adev->driver_data;
+
+ acpi_ipmi_update_device_id(tester);
+ return sprintf(buf, "%02x %02x\n",
+ tester->device_id.product_id[0],
+ tester->device_id.product_id[1]);
+}
+
+static DEVICE_ATTR(device_id, S_IRUGO, show_device_id, NULL);
+static DEVICE_ATTR(device_rev, S_IRUGO, show_device_rev, NULL);
+static DEVICE_ATTR(firmware_rev, S_IRUGO, show_firmware_rev, NULL);
+static DEVICE_ATTR(ipmi_version, S_IRUGO, show_ipmi_version, NULL);
+static DEVICE_ATTR(vendor_id, S_IRUGO, show_vendor_id, NULL);
+static DEVICE_ATTR(product_id, S_IRUGO, show_product_id, NULL);
+
+static struct attribute *acpi_ipmi_test_attrs[] = {
+ &dev_attr_device_id.attr,
+ &dev_attr_device_rev.attr,
+ &dev_attr_firmware_rev.attr,
+ &dev_attr_ipmi_version.attr,
+ &dev_attr_vendor_id.attr,
+ &dev_attr_product_id.attr,
+ NULL,
+};
+
+static struct attribute_group acpi_ipmi_test_group = {
+ .attrs = acpi_ipmi_test_attrs,
+};
+
+static int acpi_ipmi_test_add(struct acpi_device *device)
+{
+ struct acpi_ipmi_tester *tester;
+
+ if (!device)
+ return -EINVAL;
+
+ tester = kzalloc(sizeof(struct acpi_ipmi_tester), GFP_KERNEL);
+ if (!tester)
+ return -ENOMEM;
+
+ tester->adev = device;
+ strcpy(acpi_device_name(device), ACPI_IPMI_TEST_DEVICE);
+ strcpy(acpi_device_class(device), ACPI_IPMI_TEST_CLASS);
+ device->driver_data = tester;
+ if (sysfs_create_group(&device->dev.kobj, &acpi_ipmi_test_group) == 0)
+ tester->registered_group = 1;
+
+ return 0;
+}
+
+static int acpi_ipmi_test_remove(struct acpi_device *device)
+{
+ struct acpi_ipmi_tester *tester;
+
+ if (!device || !acpi_driver_data(device))
+ return -EINVAL;
+
+ tester = acpi_driver_data(device);
+
+ if (tester->registered_group)
+ sysfs_remove_group(&device->dev.kobj, &acpi_ipmi_test_group);
+
+ kfree(tester);
+ return 0;
+}
+
+static struct acpi_driver acpi_ipmi_test_driver = {
+ .name = "ipmi_test",
+ .class = ACPI_IPMI_TEST_CLASS,
+ .ids = acpi_ipmi_test_ids,
+ .ops = {
+ .add = acpi_ipmi_test_add,
+ .remove = acpi_ipmi_test_remove,
+ },
+};
+
+static int __init acpi_ipmi_test_init(void)
+{
+ return acpi_bus_register_driver(&acpi_ipmi_test_driver);
+}
+module_init(acpi_ipmi_test_init);
+
+static void __exit acpi_ipmi_test_exit(void)
+{
+ acpi_bus_unregister_driver(&acpi_ipmi_test_driver);
+}
+module_exit(acpi_ipmi_test_exit);
+
+MODULE_AUTHOR("Lv Zheng <[email protected]>");
+MODULE_DESCRIPTION("ACPI IPMI operation region tester driver");
+MODULE_LICENSE("GPL");
--
1.7.10

2013-07-23 08:12:00

by Zheng, Lv

Subject: [PATCH 11/13] ACPI/IPMI: Cleanup some Kconfig codes

This is a trivial patch:
1. Deletes a duplicate Kconfig dependency, as "IPMI_SI" already sits
inside an "if IPMI_HANDLER" block.

Signed-off-by: Lv Zheng <[email protected]>
Reviewed-by: Huang Ying <[email protected]>
---
drivers/acpi/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 3278a21..d129869 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -181,9 +181,10 @@ config ACPI_PROCESSOR

To compile this driver as a module, choose M here:
the module will be called processor.
+
config ACPI_IPMI
tristate "IPMI"
- depends on IPMI_SI && IPMI_HANDLER
+ depends on IPMI_SI
default n
help
This driver enables the ACPI to access the BMC controller. And it
--
1.7.10

2013-07-23 08:10:26

by Zheng, Lv

Subject: [PATCH 05/13] ACPI/IPMI: Fix issue caused by the per-device registration of the IPMI operation region handler

On a real machine it was found that, in its ACPI namespace, the IPMI
OperationRegions (in the ACPI000D ACPI power meter) are not defined under
the IPMI system interface device (the IPI0001 device whose _IFT control
method returns the KCS type):
Device (PMI0)
{
Name (_HID, "ACPI000D") // _HID: Hardware ID
OperationRegion (SYSI, IPMI, 0x0600, 0x0100)
Field (SYSI, BufferAcc, Lock, Preserve)
{
AccessAs (BufferAcc, 0x01),
Offset (0x58),
SCMD, 8,
GCMD, 8
}

OperationRegion (POWR, IPMI, 0x3000, 0x0100)
Field (POWR, BufferAcc, Lock, Preserve)
{
AccessAs (BufferAcc, 0x01),
Offset (0xB3),
GPMM, 8
}
}

Device (PCI0)
{
Device (ISA)
{
Device (NIPM)
{
Name (_HID, EisaId ("IPI0001")) // _HID: Hardware ID
Method (_IFT, 0, NotSerialized) // _IFT: IPMI Interface Type
{
Return (0x01)
}
}
}
}
The current ACPI_IPMI code registers the IPMI operation region handler on
a per-device basis, so for the namespace above the handler is registered
only under the scope of \_SB.PCI0.ISA.NIPM. Thus, when an IPMI operation
region field of \PMI0 is accessed, errors are reported on such a platform:
ACPI Error: No handlers for Region [IPMI]
ACPI Error: Region IPMI(7) has no handler
The solution is to install the IPMI operation region handler at the root
node, so that every object that defines an IPMI OperationRegion gets an
address space handler registered.

When an IPMI operation region field is accessed, the Network Function
(0x06 for SYSI and 0x30 for POWR) and the Command (SCMD, GCMD, GPMM) are
passed to the operation region handler; no system interface is specified
by the BIOS. This patch selects a system interface by monitoring the
system interface notifications. IPMI messages passed from the ACPI code
are sent to this selected global IPMI system interface.

Known issues:
- How to select the IPMI system interface:
Currently, the ACPI_IPMI always selects the first registered one with the
ACPI handle set (i.e., defined in the ACPI namespace). It's hard to
determine the selection when there are multiple IPMI system interfaces
defined in the ACPI namespace.
According to the IPMI specification:
A BMC device may make available multiple system interfaces, but only one
management controller is allowed to be the 'active' BMC that provides BMC
functionality for the system (in case of a 'partitioned' system, there
can be only one active BMC per partition). Only the system interface(s)
for the active BMC are allowed to respond to the 'Get Device Id' command.
According to the ipmi_si design:
The ipmi_si registration notifications can only happen after a
successful "Get Device ID" command.
Thus it should be OK for non-partitioned systems to do such a selection,
but we do not have much knowledge of 'partitioned' systems.
- Lack of smi_gone()/new_smi() testability:
It is not possible to do a module(ipmi_si) load/unload test, and I can't
find any platforms with multiple IPMI system interfaces available for
testing. There might be issues in the untested code paths.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=46741
Signed-off-by: Lv Zheng <[email protected]>
Cc: Zhao Yakui <[email protected]>
Reviewed-by: Huang Ying <[email protected]>
---
drivers/acpi/acpi_ipmi.c | 111 +++++++++++++++++++++++-----------------------
1 file changed, 55 insertions(+), 56 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index cbf25e0..5f8f495 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -46,7 +46,8 @@ MODULE_AUTHOR("Zhao Yakui");
MODULE_DESCRIPTION("ACPI IPMI Opregion driver");
MODULE_LICENSE("GPL");

-#define IPMI_FLAGS_HANDLER_INSTALL 0
+#undef PREFIX
+#define PREFIX "ACPI: IPMI: "

#define ACPI_IPMI_OK 0
#define ACPI_IPMI_TIMEOUT 0x10
@@ -66,7 +67,6 @@ struct acpi_ipmi_device {
ipmi_user_t user_interface;
int ipmi_ifnum; /* IPMI interface number */
long curr_msgid;
- unsigned long flags;
struct ipmi_smi_info smi_data;
atomic_t refcnt;
};
@@ -76,6 +76,14 @@ struct ipmi_driver_data {
struct ipmi_smi_watcher bmc_events;
struct ipmi_user_hndl ipmi_hndlrs;
struct mutex ipmi_lock;
+ /*
+ * NOTE: IPMI System Interface Selection
+ * There is no system interface specified by the IPMI operation
+ * region access. We try to select one system interface with ACPI
+ * handle set. IPMI messages passed from the ACPI codes are sent
+ * to this selected global IPMI system interface.
+ */
+ struct acpi_ipmi_device *selected_smi;
};

struct acpi_ipmi_msg {
@@ -109,8 +117,6 @@ struct acpi_ipmi_buffer {
static void ipmi_register_bmc(int iface, struct device *dev);
static void ipmi_bmc_gone(int iface);
static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data);
-static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi);
-static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi);

static struct ipmi_driver_data driver_data = {
.ipmi_devices = LIST_HEAD_INIT(driver_data.ipmi_devices),
@@ -153,7 +159,6 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
return NULL;
}
ipmi_device->user_interface = user;
- ipmi_install_space_handler(ipmi_device);

return ipmi_device;
}
@@ -168,7 +173,6 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)

static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
{
- ipmi_remove_space_handler(ipmi_device);
ipmi_destroy_user(ipmi_device->user_interface);
put_device(ipmi_device->smi_data.dev);
kfree(ipmi_device);
@@ -180,22 +184,15 @@ static void acpi_ipmi_dev_put(struct acpi_ipmi_device *ipmi_device)
ipmi_dev_release(ipmi_device);
}

-static struct acpi_ipmi_device *acpi_ipmi_get_targeted_smi(int iface)
+static struct acpi_ipmi_device *acpi_ipmi_get_selected_smi(void)
{
- int dev_found = 0;
struct acpi_ipmi_device *ipmi_device;

mutex_lock(&driver_data.ipmi_lock);
- list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
- if (ipmi_device->ipmi_ifnum == iface) {
- dev_found = 1;
- acpi_ipmi_dev_get(ipmi_device);
- break;
- }
- }
+ ipmi_device = acpi_ipmi_dev_get(driver_data.selected_smi);
mutex_unlock(&driver_data.ipmi_lock);

- return dev_found ? ipmi_device : NULL;
+ return ipmi_device;
}

static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
@@ -410,6 +407,9 @@ static void ipmi_register_bmc(int iface, struct device *dev)
goto err_lock;
}

+ if (!driver_data.selected_smi)
+ driver_data.selected_smi = acpi_ipmi_dev_get(ipmi_device);
+
list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
mutex_unlock(&driver_data.ipmi_lock);
put_device(smi_data.dev);
@@ -426,22 +426,34 @@ err_ref:
static void ipmi_bmc_gone(int iface)
{
int dev_found = 0;
- struct acpi_ipmi_device *ipmi_device;
+ struct acpi_ipmi_device *ipmi_gone, *ipmi_new;

mutex_lock(&driver_data.ipmi_lock);
- list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
- if (ipmi_device->ipmi_ifnum == iface) {
+ list_for_each_entry(ipmi_gone, &driver_data.ipmi_devices, head) {
+ if (ipmi_gone->ipmi_ifnum == iface) {
dev_found = 1;
break;
}
}
- if (dev_found)
- list_del(&ipmi_device->head);
+ if (dev_found) {
+ list_del(&ipmi_gone->head);
+ if (driver_data.selected_smi == ipmi_gone) {
+ acpi_ipmi_dev_put(ipmi_gone);
+ driver_data.selected_smi = NULL;
+ }
+ }
+ if (!driver_data.selected_smi &&
+ !list_empty(&driver_data.ipmi_devices)) {
+ ipmi_new = list_first_entry(&driver_data.ipmi_devices,
+ struct acpi_ipmi_device,
+ head);
+ driver_data.selected_smi = acpi_ipmi_dev_get(ipmi_new);
+ }
mutex_unlock(&driver_data.ipmi_lock);

if (dev_found) {
- ipmi_flush_tx_msg(ipmi_device);
- acpi_ipmi_dev_put(ipmi_device);
+ ipmi_flush_tx_msg(ipmi_gone);
+ acpi_ipmi_dev_put(ipmi_gone);
}
}

@@ -467,7 +479,6 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
void *handler_context, void *region_context)
{
struct acpi_ipmi_msg *tx_msg;
- int iface = (long)handler_context;
struct acpi_ipmi_device *ipmi_device;
int err, rem_time;
acpi_status status;
@@ -482,7 +493,7 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
if ((function & ACPI_IO_MASK) == ACPI_READ)
return AE_TYPE;

- ipmi_device = acpi_ipmi_get_targeted_smi(iface);
+ ipmi_device = acpi_ipmi_get_selected_smi();
if (!ipmi_device)
return AE_NOT_EXIST;

@@ -525,48 +536,28 @@ out_ref:
return status;
}

-static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi)
-{
- if (!test_bit(IPMI_FLAGS_HANDLER_INSTALL, &ipmi->flags))
- return;
-
- acpi_remove_address_space_handler(ipmi->handle,
- ACPI_ADR_SPACE_IPMI, &acpi_ipmi_space_handler);
-
- clear_bit(IPMI_FLAGS_HANDLER_INSTALL, &ipmi->flags);
-}
-
-static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
+static int __init acpi_ipmi_init(void)
{
+ int result = 0;
acpi_status status;

- if (test_bit(IPMI_FLAGS_HANDLER_INSTALL, &ipmi->flags))
- return 0;
+ if (acpi_disabled)
+ return result;

- status = acpi_install_address_space_handler(ipmi->handle,
+ mutex_init(&driver_data.ipmi_lock);
+
+ status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
ACPI_ADR_SPACE_IPMI,
&acpi_ipmi_space_handler,
- NULL, (void *)((long)ipmi->ipmi_ifnum));
+ NULL, NULL);
if (ACPI_FAILURE(status)) {
- struct pnp_dev *pnp_dev = ipmi->pnp_dev;
- dev_warn(&pnp_dev->dev, "Can't register IPMI opregion space "
- "handle\n");
+ pr_warn("Can't register IPMI opregion space handle\n");
return -EINVAL;
}
- set_bit(IPMI_FLAGS_HANDLER_INSTALL, &ipmi->flags);
- return 0;
-}
-
-static int __init acpi_ipmi_init(void)
-{
- int result = 0;
-
- if (acpi_disabled)
- return result;
-
- mutex_init(&driver_data.ipmi_lock);

result = ipmi_smi_watcher_register(&driver_data.bmc_events);
+ if (result)
+ pr_err("Can't register IPMI system interface watcher\n");

return result;
}
@@ -592,6 +583,10 @@ static void __exit acpi_ipmi_exit(void)
struct acpi_ipmi_device,
head);
list_del(&ipmi_device->head);
+ if (ipmi_device == driver_data.selected_smi) {
+ acpi_ipmi_dev_put(driver_data.selected_smi);
+ driver_data.selected_smi = NULL;
+ }
mutex_unlock(&driver_data.ipmi_lock);

ipmi_flush_tx_msg(ipmi_device);
@@ -600,6 +595,10 @@ static void __exit acpi_ipmi_exit(void)
mutex_lock(&driver_data.ipmi_lock);
}
mutex_unlock(&driver_data.ipmi_lock);
+
+ acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
+ ACPI_ADR_SPACE_IPMI,
+ &acpi_ipmi_space_handler);
}

module_init(acpi_ipmi_init);
--
1.7.10

2013-07-23 08:10:24

by Zheng, Lv

Subject: [PATCH 04/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user

This patch uses reference counting to fix the race caused by the
unprotected ACPI IPMI user.

The acpi_ipmi_device->user_interface check in acpi_ipmi_space_handler()
can happen before user_interface is set to NULL, and the code after the
check can run after user_interface has become NULL, so an on-going
acpi_ipmi_space_handler() can still pass an invalid
acpi_ipmi_device->user_interface to ipmi_request_settime(). Such a race
condition is not allowed by the IPMI layer's API design, as a crash will
happen in ipmi_request_settime().
In the IPMI layer, the smi_gone()/new_smi() callbacks are protected by
smi_watchers_mutex, so their invocations are serialized. But as a new
smi can reuse a freed intf_num, a callback implementation must not use
intf_num as an identification means, or it must ensure that all
references to the previous smi are dropped before the smi_gone()
callback exits. For the acpi_ipmi module, this means ipmi_flush_tx_msg()
must ensure all on-going IPMI transfers are completed before it exits.

This patch follows the ipmi_devintf.c design:
1. ipmi_destroy_user() is invoked after the reference count of
acpi_ipmi_device drops to 0; this matches the IPMI layer's calling
rules for ipmi_destroy_user() and ipmi_request_settime().
2. The reference count of acpi_ipmi_device dropping to 1 means all tx_msg
related to this acpi_ipmi_device are freed; this is used to implement
the new flushing mechanism. Note that complete() must be retried so
that an on-going tx_msg won't block flushing at the point where the
tx_msg is added to tx_msg_list while a reference to acpi_ipmi_device
is held. This matches the IPMI layer's callback rule on
smi_gone()/new_smi() serialization.
3. ipmi_flush_tx_msg() is performed after deleting acpi_ipmi_device from
the list, so that no new tx_msg can be created once the flushing
process starts.
4. The flushing of tx_msg is also moved out of ipmi_lock in this patch.

The forthcoming IPMI operation region handler installation changes also
require acpi_ipmi_device to be handled in the reference-counting style.

Authorship is also updated due to this design change.

Signed-off-by: Lv Zheng <[email protected]>
Cc: Zhao Yakui <[email protected]>
Reviewed-by: Huang Ying <[email protected]>
---
drivers/acpi/acpi_ipmi.c | 249 +++++++++++++++++++++++++++-------------------
1 file changed, 149 insertions(+), 100 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 527ee43..cbf25e0 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -1,8 +1,9 @@
/*
* acpi_ipmi.c - ACPI IPMI opregion
*
- * Copyright (C) 2010 Intel Corporation
- * Copyright (C) 2010 Zhao Yakui <[email protected]>
+ * Copyright (C) 2010, 2013 Intel Corporation
+ * Author: Zhao Yakui <[email protected]>
+ * Lv Zheng <[email protected]>
*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*
@@ -67,6 +68,7 @@ struct acpi_ipmi_device {
long curr_msgid;
unsigned long flags;
struct ipmi_smi_info smi_data;
+ atomic_t refcnt;
};

struct ipmi_driver_data {
@@ -107,8 +109,8 @@ struct acpi_ipmi_buffer {
static void ipmi_register_bmc(int iface, struct device *dev);
static void ipmi_bmc_gone(int iface);
static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data);
-static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device);
-static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device);
+static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi);
+static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi);

static struct ipmi_driver_data driver_data = {
.ipmi_devices = LIST_HEAD_INIT(driver_data.ipmi_devices),
@@ -122,6 +124,80 @@ static struct ipmi_driver_data driver_data = {
},
};

+static struct acpi_ipmi_device *
+ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
+{
+ struct acpi_ipmi_device *ipmi_device;
+ int err;
+ ipmi_user_t user;
+
+ ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
+ if (!ipmi_device)
+ return NULL;
+
+ atomic_set(&ipmi_device->refcnt, 1);
+ INIT_LIST_HEAD(&ipmi_device->head);
+ INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
+ spin_lock_init(&ipmi_device->tx_msg_lock);
+
+ ipmi_device->handle = handle;
+ ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
+ memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
+ ipmi_device->ipmi_ifnum = iface;
+
+ err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
+ ipmi_device, &user);
+ if (err) {
+ put_device(smi_data->dev);
+ kfree(ipmi_device);
+ return NULL;
+ }
+ ipmi_device->user_interface = user;
+ ipmi_install_space_handler(ipmi_device);
+
+ return ipmi_device;
+}
+
+static struct acpi_ipmi_device *
+acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)
+{
+ if (ipmi_device)
+ atomic_inc(&ipmi_device->refcnt);
+ return ipmi_device;
+}
+
+static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
+{
+ ipmi_remove_space_handler(ipmi_device);
+ ipmi_destroy_user(ipmi_device->user_interface);
+ put_device(ipmi_device->smi_data.dev);
+ kfree(ipmi_device);
+}
+
+static void acpi_ipmi_dev_put(struct acpi_ipmi_device *ipmi_device)
+{
+ if (ipmi_device && atomic_dec_and_test(&ipmi_device->refcnt))
+ ipmi_dev_release(ipmi_device);
+}
+
+static struct acpi_ipmi_device *acpi_ipmi_get_targeted_smi(int iface)
+{
+ int dev_found = 0;
+ struct acpi_ipmi_device *ipmi_device;
+
+ mutex_lock(&driver_data.ipmi_lock);
+ list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
+ if (ipmi_device->ipmi_ifnum == iface) {
+ dev_found = 1;
+ acpi_ipmi_dev_get(ipmi_device);
+ break;
+ }
+ }
+ mutex_unlock(&driver_data.ipmi_lock);
+
+ return dev_found ? ipmi_device : NULL;
+}
+
static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
{
struct acpi_ipmi_msg *ipmi_msg;
@@ -228,25 +304,24 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
{
struct acpi_ipmi_msg *tx_msg, *temp;
- int count = HZ / 10;
- struct pnp_dev *pnp_dev = ipmi->pnp_dev;
unsigned long flags;

- spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
- list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
- /* wake up the sleep thread on the Tx msg */
- complete(&tx_msg->tx_complete);
- }
- spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
-
- /* wait for about 100ms to flush the tx message list */
- while (count--) {
- if (list_empty(&ipmi->tx_msg_list))
- break;
- schedule_timeout(1);
+ /*
+ * NOTE: Synchronous Flushing
+ * Wait until refcnt drops to 1 - no users other than this
+ * context. This function should always be called before
+ * acpi_ipmi_device destruction.
+ */
+ while (atomic_read(&ipmi->refcnt) > 1) {
+ spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
+ list_for_each_entry_safe(tx_msg, temp,
+ &ipmi->tx_msg_list, head) {
+ /* wake up the sleep thread on the Tx msg */
+ complete(&tx_msg->tx_complete);
+ }
+ spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+ schedule_timeout_uninterruptible(msecs_to_jiffies(1));
}
- if (!list_empty(&ipmi->tx_msg_list))
- dev_warn(&pnp_dev->dev, "tx msg list is not NULL\n");
}

static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
@@ -304,22 +379,26 @@ static void ipmi_register_bmc(int iface, struct device *dev)
{
struct acpi_ipmi_device *ipmi_device, *temp;
struct pnp_dev *pnp_dev;
- ipmi_user_t user;
int err;
struct ipmi_smi_info smi_data;
acpi_handle handle;

err = ipmi_get_smi_info(iface, &smi_data);
-
if (err)
return;

- if (smi_data.addr_src != SI_ACPI) {
- put_device(smi_data.dev);
- return;
- }
-
+ if (smi_data.addr_src != SI_ACPI)
+ goto err_ref;
handle = smi_data.addr_info.acpi_info.acpi_handle;
+ if (!handle)
+ goto err_ref;
+ pnp_dev = to_pnp_dev(smi_data.dev);
+
+ ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
+ if (!ipmi_device) {
+ dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
+ goto err_ref;
+ }

mutex_lock(&driver_data.ipmi_lock);
list_for_each_entry(temp, &driver_data.ipmi_devices, head) {
@@ -328,54 +407,42 @@ static void ipmi_register_bmc(int iface, struct device *dev)
* to the device list, don't add it again.
*/
if (temp->handle == handle)
- goto out;
+ goto err_lock;
}

- ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
-
- if (!ipmi_device)
- goto out;
-
- pnp_dev = to_pnp_dev(smi_data.dev);
- ipmi_device->handle = handle;
- ipmi_device->pnp_dev = pnp_dev;
-
- err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
- ipmi_device, &user);
- if (err) {
- dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
- kfree(ipmi_device);
- goto out;
- }
- acpi_add_ipmi_device(ipmi_device);
- ipmi_device->user_interface = user;
- ipmi_device->ipmi_ifnum = iface;
+ list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
mutex_unlock(&driver_data.ipmi_lock);
- memcpy(&ipmi_device->smi_data, &smi_data, sizeof(struct ipmi_smi_info));
+ put_device(smi_data.dev);
return;

-out:
+err_lock:
mutex_unlock(&driver_data.ipmi_lock);
+ ipmi_dev_release(ipmi_device);
+err_ref:
put_device(smi_data.dev);
return;
}

static void ipmi_bmc_gone(int iface)
{
- struct acpi_ipmi_device *ipmi_device, *temp;
+ int dev_found = 0;
+ struct acpi_ipmi_device *ipmi_device;

mutex_lock(&driver_data.ipmi_lock);
- list_for_each_entry_safe(ipmi_device, temp,
- &driver_data.ipmi_devices, head) {
- if (ipmi_device->ipmi_ifnum != iface)
- continue;
-
- acpi_remove_ipmi_device(ipmi_device);
- put_device(ipmi_device->smi_data.dev);
- kfree(ipmi_device);
- break;
+ list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
+ if (ipmi_device->ipmi_ifnum == iface) {
+ dev_found = 1;
+ break;
+ }
}
+ if (dev_found)
+ list_del(&ipmi_device->head);
mutex_unlock(&driver_data.ipmi_lock);
+
+ if (dev_found) {
+ ipmi_flush_tx_msg(ipmi_device);
+ acpi_ipmi_dev_put(ipmi_device);
+ }
}

/* --------------------------------------------------------------------------
@@ -400,7 +467,8 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
void *handler_context, void *region_context)
{
struct acpi_ipmi_msg *tx_msg;
- struct acpi_ipmi_device *ipmi_device = handler_context;
+ int iface = (long)handler_context;
+ struct acpi_ipmi_device *ipmi_device;
int err, rem_time;
acpi_status status;
unsigned long flags;
@@ -414,12 +482,15 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
if ((function & ACPI_IO_MASK) == ACPI_READ)
return AE_TYPE;

- if (!ipmi_device->user_interface)
+ ipmi_device = acpi_ipmi_get_targeted_smi(iface);
+ if (!ipmi_device)
return AE_NOT_EXIST;

tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
- if (!tx_msg)
- return AE_NO_MEMORY;
+ if (!tx_msg) {
+ status = AE_NO_MEMORY;
+ goto out_ref;
+ }

if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
status = AE_TYPE;
@@ -449,6 +520,8 @@ out_list:
spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
out_msg:
kfree(tx_msg);
+out_ref:
+ acpi_ipmi_dev_put(ipmi_device);
return status;
}

@@ -473,7 +546,7 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
status = acpi_install_address_space_handler(ipmi->handle,
ACPI_ADR_SPACE_IPMI,
&acpi_ipmi_space_handler,
- NULL, ipmi);
+ NULL, (void *)((long)ipmi->ipmi_ifnum));
if (ACPI_FAILURE(status)) {
struct pnp_dev *pnp_dev = ipmi->pnp_dev;
dev_warn(&pnp_dev->dev, "Can't register IPMI opregion space "
@@ -484,36 +557,6 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
return 0;
}

-static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device)
-{
-
- INIT_LIST_HEAD(&ipmi_device->head);
-
- spin_lock_init(&ipmi_device->tx_msg_lock);
- INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
- ipmi_install_space_handler(ipmi_device);
-
- list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
-}
-
-static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device)
-{
- /*
- * If the IPMI user interface is created, it should be
- * destroyed.
- */
- if (ipmi_device->user_interface) {
- ipmi_destroy_user(ipmi_device->user_interface);
- ipmi_device->user_interface = NULL;
- }
- /* flush the Tx_msg list */
- if (!list_empty(&ipmi_device->tx_msg_list))
- ipmi_flush_tx_msg(ipmi_device);
-
- list_del(&ipmi_device->head);
- ipmi_remove_space_handler(ipmi_device);
-}
-
static int __init acpi_ipmi_init(void)
{
int result = 0;
@@ -530,7 +573,7 @@ static int __init acpi_ipmi_init(void)

static void __exit acpi_ipmi_exit(void)
{
- struct acpi_ipmi_device *ipmi_device, *temp;
+ struct acpi_ipmi_device *ipmi_device;

if (acpi_disabled)
return;
@@ -544,11 +587,17 @@ static void __exit acpi_ipmi_exit(void)
* handler and free it.
*/
mutex_lock(&driver_data.ipmi_lock);
- list_for_each_entry_safe(ipmi_device, temp,
- &driver_data.ipmi_devices, head) {
- acpi_remove_ipmi_device(ipmi_device);
- put_device(ipmi_device->smi_data.dev);
- kfree(ipmi_device);
+ while (!list_empty(&driver_data.ipmi_devices)) {
+ ipmi_device = list_first_entry(&driver_data.ipmi_devices,
+ struct acpi_ipmi_device,
+ head);
+ list_del(&ipmi_device->head);
+ mutex_unlock(&driver_data.ipmi_lock);
+
+ ipmi_flush_tx_msg(ipmi_device);
+ acpi_ipmi_dev_put(ipmi_device);
+
+ mutex_lock(&driver_data.ipmi_lock);
}
mutex_unlock(&driver_data.ipmi_lock);
}
--
1.7.10

2013-07-23 08:13:08

by Zheng, Lv

Subject: [PATCH 09/13] ACPI/IPMI: Cleanup some initialization codes

This is a trivial patch.
1. Changes dynamic mutex initialization to static initialization.
2. Removes an unneeded variable initialization in acpi_ipmi_init().

Signed-off-by: Lv Zheng <[email protected]>
Reviewed-by: Huang Ying <[email protected]>
---
drivers/acpi/acpi_ipmi.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 7f93ffd..2d31003 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -128,6 +128,7 @@ static struct ipmi_driver_data driver_data = {
.ipmi_hndlrs = {
.ipmi_recv_hndl = ipmi_msg_handler,
},
+ .ipmi_lock = __MUTEX_INITIALIZER(driver_data.ipmi_lock)
};

static struct acpi_ipmi_device *
@@ -583,12 +584,10 @@ out_msg:

static int __init acpi_ipmi_init(void)
{
- int result = 0;
+ int result;

if (acpi_disabled)
- return result;
-
- mutex_init(&driver_data.ipmi_lock);
+ return 0;

result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
&acpi_ipmi_space_handler,
--
1.7.10

2013-07-23 08:14:31

by Zheng, Lv

Subject: [PATCH 08/13] ACPI/IPMI: Cleanup several acpi_ipmi_device members

This is a trivial patch:
1. Deletes the smi_data member of acpi_ipmi_device, which is not
actually used.
2. Changes the pnp_dev member of acpi_ipmi_device to a struct device
pointer, as it is only used by dev_warn() invocations.

Signed-off-by: Lv Zheng <[email protected]>
Reviewed-by: Huang Ying <[email protected]>
---
drivers/acpi/acpi_ipmi.c | 30 ++++++++++++++----------------
1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 0ee1ea6..7f93ffd 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -63,11 +63,10 @@ struct acpi_ipmi_device {
struct list_head tx_msg_list;
spinlock_t tx_msg_lock;
acpi_handle handle;
- struct pnp_dev *pnp_dev;
+ struct device *dev;
ipmi_user_t user_interface;
int ipmi_ifnum; /* IPMI interface number */
long curr_msgid;
- struct ipmi_smi_info smi_data;
atomic_t refcnt;
};

@@ -132,7 +131,7 @@ static struct ipmi_driver_data driver_data = {
};

static struct acpi_ipmi_device *
-ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
+ipmi_dev_alloc(int iface, struct device *pdev, acpi_handle handle)
{
struct acpi_ipmi_device *ipmi_device;
int err;
@@ -148,14 +147,13 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
spin_lock_init(&ipmi_device->tx_msg_lock);

ipmi_device->handle = handle;
- ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
- memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
+ ipmi_device->dev = get_device(pdev);
ipmi_device->ipmi_ifnum = iface;

err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
ipmi_device, &user);
if (err) {
- put_device(smi_data->dev);
+ put_device(pdev);
kfree(ipmi_device);
return NULL;
}
@@ -175,7 +173,7 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)
static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
{
ipmi_destroy_user(ipmi_device->user_interface);
- put_device(ipmi_device->smi_data.dev);
+ put_device(ipmi_device->dev);
kfree(ipmi_device);
}

@@ -263,7 +261,7 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
buffer = (struct acpi_ipmi_buffer *)value;
/* copy the tx message data */
if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
- dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
+ dev_WARN_ONCE(tx_msg->device->dev, true,
"Unexpected request (msg len %d).\n",
buffer->length);
return -EINVAL;
@@ -382,11 +380,11 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
struct acpi_ipmi_device *ipmi_device = user_msg_data;
int msg_found = 0;
struct acpi_ipmi_msg *tx_msg;
- struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
+ struct device *dev = ipmi_device->dev;
unsigned long flags;

if (msg->user != ipmi_device->user_interface) {
- dev_warn(&pnp_dev->dev,
+ dev_warn(dev,
"Unexpected response is returned. returned user %p, expected user %p\n",
msg->user, ipmi_device->user_interface);
goto out_msg;
@@ -404,7 +402,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);

if (!msg_found) {
- dev_warn(&pnp_dev->dev,
+ dev_warn(dev,
"Unexpected response (msg id %ld) is returned.\n",
msg->msgid);
goto out_msg;
@@ -412,7 +410,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)

/* copy the response data to Rx_data buffer */
if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
- dev_WARN_ONCE(&pnp_dev->dev, true,
+ dev_WARN_ONCE(dev, true,
"Unexpected response (msg len %d).\n",
msg->msg.data_len);
goto out_comp;
@@ -431,7 +429,7 @@ out_msg:
static void ipmi_register_bmc(int iface, struct device *dev)
{
struct acpi_ipmi_device *ipmi_device, *temp;
- struct pnp_dev *pnp_dev;
+ struct device *pdev;
int err;
struct ipmi_smi_info smi_data;
acpi_handle handle;
@@ -445,11 +443,11 @@ static void ipmi_register_bmc(int iface, struct device *dev)
handle = smi_data.addr_info.acpi_info.acpi_handle;
if (!handle)
goto err_ref;
- pnp_dev = to_pnp_dev(smi_data.dev);
+ pdev = smi_data.dev;

- ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
+ ipmi_device = ipmi_dev_alloc(iface, pdev, handle);
if (!ipmi_device) {
- dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
+ dev_warn(pdev, "Can't create IPMI user interface\n");
goto err_ref;
}

--
1.7.10

2013-07-23 08:10:11

by Zheng, Lv

[permalink] [raw]
Subject: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers

This patch fixes races caused by unprotected ACPI IPMI transfers.

The following crashes may occur:
1. No tx_msg_lock is held while iterating tx_msg_list in
ipmi_flush_tx_msg(), while entries are unlinked in parallel on failure in
acpi_ipmi_space_handler() under the protection of tx_msg_lock.
2. No lock is held while freeing tx_msg in acpi_ipmi_space_handler(),
while it is accessed in parallel in ipmi_flush_tx_msg() and
ipmi_msg_handler().

This patch extends tx_msg_lock to protect all tx_msg accesses to solve
this issue. tx_msg_lock is then always held around complete() and tx_msg
accesses.
smp_wmb() is called before setting the msg_done flag so that messages
completed due to flushing will not be handled as 'done' messages while
their contents are not valid.

Signed-off-by: Lv Zheng <[email protected]>
Cc: Zhao Yakui <[email protected]>
Reviewed-by: Huang Ying <[email protected]>
---
drivers/acpi/acpi_ipmi.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index b37c189..527ee43 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
struct acpi_ipmi_msg *tx_msg, *temp;
int count = HZ / 10;
struct pnp_dev *pnp_dev = ipmi->pnp_dev;
+ unsigned long flags;

+ spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
/* wake up the sleep thread on the Tx msg */
complete(&tx_msg->tx_complete);
}
+ spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);

/* wait for about 100ms to flush the tx message list */
while (count--) {
@@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
break;
}
}
- spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);

if (!msg_found) {
dev_warn(&pnp_dev->dev,
"Unexpected response (msg id %ld) is returned.\n",
msg->msgid);
- goto out_msg;
+ goto out_lock;
}

/* copy the response data to Rx_data buffer */
@@ -286,10 +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
}
tx_msg->rx_len = msg->msg.data_len;
memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
+ /* tx_msg content must be valid before setting msg_done flag */
+ smp_wmb();
tx_msg->msg_done = 1;

out_comp:
complete(&tx_msg->tx_complete);
+out_lock:
+ spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
out_msg:
ipmi_free_recv_msg(msg);
}
--
1.7.10

2013-07-23 08:15:16

by Zheng, Lv

[permalink] [raw]
Subject: [PATCH 07/13] ACPI/IPMI: Add reference counting for ACPI IPMI transfers

This patch adds reference counting for ACPI IPMI transfers to tune the
locking granularity of tx_msg_lock.

The acpi_ipmi_msg handling is re-designed using reference counting.
1. tx_msg is always unlinked before complete(), so that:
1.1. it is safe to put complete() outside of tx_msg_lock;
1.2. complete() can only happen once, thus smp_wmb() is not required.
2. The reference count of tx_msg is increased before calling
ipmi_request_settime(), and a tx_msg_lock protected
ipmi_cancel_tx_msg() is introduced, so that a complete() can happen in
parallel with tx_msg unlinking in the failure cases.
3. tx_msg holds a reference to acpi_ipmi_device so that it can be flushed
and freed in contexts other than acpi_ipmi_space_handler().

The lockdep_chains shows all acpi_ipmi locks are leaf locks after the
tuning:
1. ipmi_lock is always leaf:
irq_context: 0
[ffffffff81a943f8] smi_watchers_mutex
[ffffffffa06eca60] driver_data.ipmi_lock
irq_context: 0
[ffffffff82767b40] &buffer->mutex
[ffffffffa00a6678] s_active#103
[ffffffffa06eca60] driver_data.ipmi_lock
2. without this patch applied, lock used by complete() is held after
holding tx_msg_lock:
irq_context: 0
[ffffffff82767b40] &buffer->mutex
[ffffffffa00a6678] s_active#103
[ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
irq_context: 1
[ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
irq_context: 1
[ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
[ffffffffa06eccf0] &x->wait#25
irq_context: 1
[ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
[ffffffffa06eccf0] &x->wait#25
[ffffffff81e36620] &p->pi_lock
irq_context: 1
[ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
[ffffffffa06eccf0] &x->wait#25
[ffffffff81e36620] &p->pi_lock
[ffffffff81e5d0a8] &rq->lock
3. with this patch applied, tx_msg_lock is always leaf:
irq_context: 0
[ffffffff82767b40] &buffer->mutex
[ffffffffa00a66d8] s_active#107
[ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
irq_context: 1
[ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock

Signed-off-by: Lv Zheng <[email protected]>
Cc: Zhao Yakui <[email protected]>
Reviewed-by: Huang Ying <[email protected]>
---
drivers/acpi/acpi_ipmi.c | 107 +++++++++++++++++++++++++++++++++-------------
1 file changed, 77 insertions(+), 30 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 2a09156..0ee1ea6 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -105,6 +105,7 @@ struct acpi_ipmi_msg {
u8 data[ACPI_IPMI_MAX_MSG_LENGTH];
u8 rx_len;
struct acpi_ipmi_device *device;
+ atomic_t refcnt;
};

/* IPMI request/response buffer per ACPI 4.0, sec 5.5.2.4.3.2 */
@@ -195,22 +196,47 @@ static struct acpi_ipmi_device *acpi_ipmi_get_selected_smi(void)
return ipmi_device;
}

-static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
+static struct acpi_ipmi_msg *ipmi_msg_alloc(void)
{
+ struct acpi_ipmi_device *ipmi;
struct acpi_ipmi_msg *ipmi_msg;
- struct pnp_dev *pnp_dev = ipmi->pnp_dev;

+ ipmi = acpi_ipmi_get_selected_smi();
+ if (!ipmi)
+ return NULL;
ipmi_msg = kzalloc(sizeof(struct acpi_ipmi_msg), GFP_KERNEL);
- if (!ipmi_msg) {
- dev_warn(&pnp_dev->dev, "Can't allocate memory for ipmi_msg\n");
+ if (!ipmi_msg) {
+ acpi_ipmi_dev_put(ipmi);
return NULL;
}
+ atomic_set(&ipmi_msg->refcnt, 1);
init_completion(&ipmi_msg->tx_complete);
INIT_LIST_HEAD(&ipmi_msg->head);
ipmi_msg->device = ipmi;
+
return ipmi_msg;
}

+static struct acpi_ipmi_msg *
+acpi_ipmi_msg_get(struct acpi_ipmi_msg *tx_msg)
+{
+ if (tx_msg)
+ atomic_inc(&tx_msg->refcnt);
+ return tx_msg;
+}
+
+static void ipmi_msg_release(struct acpi_ipmi_msg *tx_msg)
+{
+ acpi_ipmi_dev_put(tx_msg->device);
+ kfree(tx_msg);
+}
+
+static void acpi_ipmi_msg_put(struct acpi_ipmi_msg *tx_msg)
+{
+ if (tx_msg && atomic_dec_and_test(&tx_msg->refcnt))
+ ipmi_msg_release(tx_msg);
+}
+
#define IPMI_OP_RGN_NETFN(offset) ((offset >> 8) & 0xff)
#define IPMI_OP_RGN_CMD(offset) (offset & 0xff)
static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
@@ -300,7 +326,7 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,

static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
{
- struct acpi_ipmi_msg *tx_msg, *temp;
+ struct acpi_ipmi_msg *tx_msg;
unsigned long flags;

/*
@@ -311,16 +337,46 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
*/
while (atomic_read(&ipmi->refcnt) > 1) {
spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
- list_for_each_entry_safe(tx_msg, temp,
- &ipmi->tx_msg_list, head) {
+ while (!list_empty(&ipmi->tx_msg_list)) {
+ tx_msg = list_first_entry(&ipmi->tx_msg_list,
+ struct acpi_ipmi_msg,
+ head);
+ list_del(&tx_msg->head);
+ spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+
/* wake up the sleep thread on the Tx msg */
complete(&tx_msg->tx_complete);
+ acpi_ipmi_msg_put(tx_msg);
+ spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
}
spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+
schedule_timeout_uninterruptible(msecs_to_jiffies(1));
}
}

+static void ipmi_cancel_tx_msg(struct acpi_ipmi_device *ipmi,
+ struct acpi_ipmi_msg *msg)
+{
+ struct acpi_ipmi_msg *tx_msg;
+ int msg_found = 0;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
+ list_for_each_entry(tx_msg, &ipmi->tx_msg_list, head) {
+ if (msg == tx_msg) {
+ msg_found = 1;
+ break;
+ }
+ }
+ if (msg_found)
+ list_del(&tx_msg->head);
+ spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+
+ if (msg_found)
+ acpi_ipmi_msg_put(tx_msg);
+}
+
static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
{
struct acpi_ipmi_device *ipmi_device = user_msg_data;
@@ -343,12 +399,15 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
break;
}
}
+ if (msg_found)
+ list_del(&tx_msg->head);
+ spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);

if (!msg_found) {
dev_warn(&pnp_dev->dev,
"Unexpected response (msg id %ld) is returned.\n",
msg->msgid);
- goto out_lock;
+ goto out_msg;
}

/* copy the response data to Rx_data buffer */
@@ -360,14 +419,11 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
}
tx_msg->rx_len = msg->msg.data_len;
memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
- /* tx_msg content must be valid before setting msg_done flag */
- smp_wmb();
tx_msg->msg_done = 1;

out_comp:
complete(&tx_msg->tx_complete);
-out_lock:
- spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
+ acpi_ipmi_msg_put(tx_msg);
out_msg:
ipmi_free_recv_msg(msg);
}
@@ -493,21 +549,17 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
if ((function & ACPI_IO_MASK) == ACPI_READ)
return AE_TYPE;

- ipmi_device = acpi_ipmi_get_selected_smi();
- if (!ipmi_device)
+ tx_msg = ipmi_msg_alloc();
+ if (!tx_msg)
return AE_NOT_EXIST;
-
- tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
- if (!tx_msg) {
- status = AE_NO_MEMORY;
- goto out_ref;
- }
+ ipmi_device = tx_msg->device;

if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
- status = AE_TYPE;
- goto out_msg;
+ ipmi_msg_release(tx_msg);
+ return AE_TYPE;
}

+ acpi_ipmi_msg_get(tx_msg);
spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
@@ -518,21 +570,16 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
NULL, 0, 0, 0);
if (err) {
status = AE_ERROR;
- goto out_list;
+ goto out_msg;
}
rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
IPMI_TIMEOUT);
acpi_format_ipmi_response(tx_msg, value, rem_time);
status = AE_OK;

-out_list:
- spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
- list_del(&tx_msg->head);
- spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
out_msg:
- kfree(tx_msg);
-out_ref:
- acpi_ipmi_dev_put(ipmi_device);
+ ipmi_cancel_tx_msg(ipmi_device, tx_msg);
+ acpi_ipmi_msg_put(tx_msg);
return status;
}

--
1.7.10

2013-07-23 08:10:00

by Zheng, Lv

[permalink] [raw]
Subject: [PATCH 02/13] ACPI/IPMI: Fix atomic context requirement of ipmi_msg_handler()

This patch is a quick fix for the issue, indicated by the test results,
that ipmi_msg_handler() is invoked in atomic context.

BUG: scheduling while atomic: kipmi0/18933/0x10000100
Modules linked in: ipmi_si acpi_ipmi ...
CPU: 3 PID: 18933 Comm: kipmi0 Tainted: G AW 3.10.0-rc7+ #2
Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.0027.070120100606 07/01/2010
ffff8838245eea00 ffff88103fc63c98 ffffffff814c4a1e ffff88103fc63ca8
ffffffff814bfbab ffff88103fc63d28 ffffffff814c73e0 ffff88103933cbd4
0000000000000096 ffff88103fc63ce8 ffff88102f618000 ffff881035c01fd8
Call Trace:
<IRQ> [<ffffffff814c4a1e>] dump_stack+0x19/0x1b
[<ffffffff814bfbab>] __schedule_bug+0x46/0x54
[<ffffffff814c73e0>] __schedule+0x83/0x59c
[<ffffffff81058853>] __cond_resched+0x22/0x2d
[<ffffffff814c794b>] _cond_resched+0x14/0x1d
[<ffffffff814c6d82>] mutex_lock+0x11/0x32
[<ffffffff8101e1e9>] ? __default_send_IPI_dest_field.constprop.0+0x53/0x58
[<ffffffffa09e3f9c>] ipmi_msg_handler+0x23/0x166 [ipmi_si]
[<ffffffff812bf6e4>] deliver_response+0x55/0x5a
[<ffffffff812c0fd4>] handle_new_recv_msgs+0xb67/0xc65
[<ffffffff81007ad1>] ? read_tsc+0x9/0x19
[<ffffffff814c8620>] ? _raw_spin_lock_irq+0xa/0xc
[<ffffffffa09e1128>] ipmi_thread+0x5c/0x146 [ipmi_si]
...

Known issues:
- Replacing the tx_msg_lock mutex with a spinlock is not performance friendly.
The current solution works but does not have the best performance, because
atomic context should run as fast as possible. Given that ACPI does not
create many IPMI messages, the performance of the current solution may be
acceptable. It could be improved by linking ipmi_recv_msg into an RX
message queue and processing it in another context.

Signed-off-by: Lv Zheng <[email protected]>
Reviewed-by: Huang Ying <[email protected]>
---
drivers/acpi/acpi_ipmi.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 28e2b4c..b37c189 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -39,6 +39,7 @@
#include <linux/ipmi.h>
#include <linux/device.h>
#include <linux/pnp.h>
+#include <linux/spinlock.h>

MODULE_AUTHOR("Zhao Yakui");
MODULE_DESCRIPTION("ACPI IPMI Opregion driver");
@@ -58,7 +59,7 @@ struct acpi_ipmi_device {
struct list_head head;
/* the IPMI request message list */
struct list_head tx_msg_list;
- struct mutex tx_msg_lock;
+ spinlock_t tx_msg_lock;
acpi_handle handle;
struct pnp_dev *pnp_dev;
ipmi_user_t user_interface;
@@ -146,6 +147,7 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
struct kernel_ipmi_msg *msg;
struct acpi_ipmi_buffer *buffer;
struct acpi_ipmi_device *device;
+ unsigned long flags;

msg = &tx_msg->tx_message;
/*
@@ -182,10 +184,10 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,

/* Get the msgid */
device = tx_msg->device;
- mutex_lock(&device->tx_msg_lock);
+ spin_lock_irqsave(&device->tx_msg_lock, flags);
device->curr_msgid++;
tx_msg->tx_msgid = device->curr_msgid;
- mutex_unlock(&device->tx_msg_lock);
+ spin_unlock_irqrestore(&device->tx_msg_lock, flags);

return 0;
}
@@ -250,6 +252,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
int msg_found = 0;
struct acpi_ipmi_msg *tx_msg;
struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
+ unsigned long flags;

if (msg->user != ipmi_device->user_interface) {
dev_warn(&pnp_dev->dev,
@@ -258,14 +261,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
goto out_msg;
}

- mutex_lock(&ipmi_device->tx_msg_lock);
+ spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
list_for_each_entry(tx_msg, &ipmi_device->tx_msg_list, head) {
if (msg->msgid == tx_msg->tx_msgid) {
msg_found = 1;
break;
}
}
- mutex_unlock(&ipmi_device->tx_msg_lock);
+ spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);

if (!msg_found) {
dev_warn(&pnp_dev->dev,
@@ -394,6 +397,7 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
struct acpi_ipmi_device *ipmi_device = handler_context;
int err, rem_time;
acpi_status status;
+ unsigned long flags;

/*
* IPMI opregion message.
@@ -416,9 +420,9 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
goto out_msg;
}

- mutex_lock(&ipmi_device->tx_msg_lock);
+ spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
- mutex_unlock(&ipmi_device->tx_msg_lock);
+ spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
err = ipmi_request_settime(ipmi_device->user_interface,
&tx_msg->addr,
tx_msg->tx_msgid,
@@ -434,9 +438,9 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
status = AE_OK;

out_list:
- mutex_lock(&ipmi_device->tx_msg_lock);
+ spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
list_del(&tx_msg->head);
- mutex_unlock(&ipmi_device->tx_msg_lock);
+ spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
out_msg:
kfree(tx_msg);
return status;
@@ -479,7 +483,7 @@ static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device)

INIT_LIST_HEAD(&ipmi_device->head);

- mutex_init(&ipmi_device->tx_msg_lock);
+ spin_lock_init(&ipmi_device->tx_msg_lock);
INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
ipmi_install_space_handler(ipmi_device);

--
1.7.10

2013-07-23 14:53:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 01/13] ACPI/IPMI: Fix potential response buffer overflow

On Tue, Jul 23, 2013 at 04:08:59PM +0800, Lv Zheng wrote:
> This patch enhances sanity checks on message size to avoid potential buffer
> overflow.
>
> The kernel IPMI message size is IPMI_MAX_MSG_LENGTH (272 bytes), while the
> ACPI specification defines the IPMI message size as 64 bytes. The difference
> is not handled by the original code, which may cause a crash in the response
> handling code.
> This patch fixes this gap and also combines rx_data/tx_data into a single
> data/len pair since they need not be separated.
>
> Signed-off-by: Lv Zheng <[email protected]>
> Reviewed-by: Huang Ying <[email protected]>
> ---
> drivers/acpi/acpi_ipmi.c | 100 ++++++++++++++++++++++++++++------------------
> 1 file changed, 61 insertions(+), 39 deletions(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>

Same goes for the other patches you sent in this thread...

2013-07-24 00:22:02

by Zheng, Lv

[permalink] [raw]
Subject: RE: [PATCH 01/13] ACPI/IPMI: Fix potential response buffer overflow

> From: [email protected]
> [mailto:[email protected]] On Behalf Of Greg KH
> Sent: Tuesday, July 23, 2013 10:54 PM
>
> On Tue, Jul 23, 2013 at 04:08:59PM +0800, Lv Zheng wrote:
> > This patch enhances sanity checks on message size to avoid potential
> > buffer overflow.
> >
> > The kernel IPMI message size is IPMI_MAX_MSG_LENGTH (272 bytes), while
> > the ACPI specification defines the IPMI message size as 64 bytes. The
> > difference is not handled by the original code, which may cause a crash
> > in the response handling code.
> > This patch fixes this gap and also combines rx_data/tx_data into a
> > single data/len pair since they need not be separated.
> >
> > Signed-off-by: Lv Zheng <[email protected]>
> > Reviewed-by: Huang Ying <[email protected]>
> > ---
> > drivers/acpi/acpi_ipmi.c | 100
> > ++++++++++++++++++++++++++++------------------
> > 1 file changed, 61 insertions(+), 39 deletions(-)
>
> <formletter>
>
> This is not the correct way to submit patches for inclusion in the stable kernel
> tree. Please read Documentation/stable_kernel_rules.txt
> for how to do this properly.
>
> </formletter>
>
> Same goes for the other patches you sent in this thread...

OK, I'll add prerequisites for each patch that should go into the stable queue and re-send them (PATCH 01-06).

Thanks and best regards
-Lv

> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body
> of a message to [email protected] More majordomo info at
> http://vger.kernel.org/majordomo-info.html

2013-07-24 00:44:19

by Zheng, Lv

[permalink] [raw]
Subject: RE: [PATCH 01/13] ACPI/IPMI: Fix potential response buffer overflow

> From: Zheng, Lv
> Sent: Wednesday, July 24, 2013 8:22 AM
>
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Greg KH
> > Sent: Tuesday, July 23, 2013 10:54 PM
> >
> > On Tue, Jul 23, 2013 at 04:08:59PM +0800, Lv Zheng wrote:
> > > This patch enhances sanity checks on message size to avoid potential
> > > buffer overflow.
> > >
> > > The kernel IPMI message size is IPMI_MAX_MSG_LENGTH (272 bytes), while
> > > the ACPI specification defines the IPMI message size as 64 bytes. The
> > > difference is not handled by the original code, which may cause a
> > > crash in the response handling code.
> > > This patch fixes this gap and also combines rx_data/tx_data into a
> > > single data/len pair since they need not be separated.
> > >
> > > Signed-off-by: Lv Zheng <[email protected]>
> > > Reviewed-by: Huang Ying <[email protected]>
> > > ---
> > > drivers/acpi/acpi_ipmi.c | 100
> > > ++++++++++++++++++++++++++++------------------
> > > 1 file changed, 61 insertions(+), 39 deletions(-)
> >
> > <formletter>
> >
> > This is not the correct way to submit patches for inclusion in the
> > stable kernel tree. Please read Documentation/stable_kernel_rules.txt
> > for how to do this properly.
> >
> > </formletter>
> >
> > Same goes for the other patches you sent in this thread...
>
> OK, I'll add prerequisites for each patch that should go into the stable queue
> and re-send them (PATCH 01-06).

Maybe I shouldn't.
It looks like it is not possible to add commit ID prerequisites for a patch series that has not yet been accepted into the mainline.
As the patches haven't been merged into the mainline, the commit IDs in this series are likely to change.
Please ignore [PATCH 01-06] that have been sent to the stable mailing list.
I'll just let the ACPI maintainers know which patches I think can go to the stable tree and let them make the decision after mainline acceptance.

Thanks and best regards
-Lv

>
> Thanks and best regards
> -Lv
>

2013-07-24 23:28:13

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers

On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> This patch fixes races caused by unprotected ACPI IPMI transfers.
>
> The following crashes may occur:
> 1. No tx_msg_lock is held while iterating tx_msg_list in
> ipmi_flush_tx_msg(), while entries are unlinked in parallel on failure in
> acpi_ipmi_space_handler() under the protection of tx_msg_lock.
> 2. No lock is held while freeing tx_msg in acpi_ipmi_space_handler(),
> while it is accessed in parallel in ipmi_flush_tx_msg() and
> ipmi_msg_handler().
>
> This patch extends tx_msg_lock to protect all tx_msg accesses to solve
> this issue. tx_msg_lock is then always held around complete() and tx_msg
> accesses.
> smp_wmb() is called before setting the msg_done flag so that messages
> completed due to flushing will not be handled as 'done' messages while
> their contents are not valid.
>
> Signed-off-by: Lv Zheng <[email protected]>
> Cc: Zhao Yakui <[email protected]>
> Reviewed-by: Huang Ying <[email protected]>
> ---
> drivers/acpi/acpi_ipmi.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> index b37c189..527ee43 100644
> --- a/drivers/acpi/acpi_ipmi.c
> +++ b/drivers/acpi/acpi_ipmi.c
> @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> struct acpi_ipmi_msg *tx_msg, *temp;
> int count = HZ / 10;
> struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> + unsigned long flags;
>
> + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> /* wake up the sleep thread on the Tx msg */
> complete(&tx_msg->tx_complete);
> }
> + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
>
> /* wait for about 100ms to flush the tx message list */
> while (count--) {
> @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> break;
> }
> }
> - spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
>
> if (!msg_found) {
> dev_warn(&pnp_dev->dev,
> "Unexpected response (msg id %ld) is returned.\n",
> msg->msgid);
> - goto out_msg;
> + goto out_lock;
> }
>
> /* copy the response data to Rx_data buffer */
> @@ -286,10 +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> }
> tx_msg->rx_len = msg->msg.data_len;
> memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> + /* tx_msg content must be valid before setting msg_done flag */
> + smp_wmb();

That's suspicious.

If you need the write barrier here, you'll most likely need a read barrier
somewhere else. Where's that?

> tx_msg->msg_done = 1;
>
> out_comp:
> complete(&tx_msg->tx_complete);
> +out_lock:
> + spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> out_msg:
> ipmi_free_recv_msg(msg);
> }

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-25 03:09:42

by Zheng, Lv

[permalink] [raw]
Subject: RE: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers

-stable according to the previous conversation.

> From: Rafael J. Wysocki [mailto:[email protected]]
> Sent: Thursday, July 25, 2013 7:38 AM
>
> On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> > This patch fixes races caused by unprotected ACPI IPMI transfers.
> >
> > The following crashes may occur:
> > 1. No tx_msg_lock is held while iterating tx_msg_list in
> > ipmi_flush_tx_msg(), while entries are unlinked in parallel on failure
> > in acpi_ipmi_space_handler() under the protection of tx_msg_lock.
> > 2. No lock is held while freeing tx_msg in acpi_ipmi_space_handler(),
> > while it is accessed in parallel in ipmi_flush_tx_msg() and
> > ipmi_msg_handler().
> >
> > This patch extends tx_msg_lock to protect all tx_msg accesses to
> > solve this issue. tx_msg_lock is then always held around complete()
> > and tx_msg accesses.
> > smp_wmb() is called before setting the msg_done flag so that messages
> > completed due to flushing will not be handled as 'done' messages while
> > their contents are not valid.
> >
> > Signed-off-by: Lv Zheng <[email protected]>
> > Cc: Zhao Yakui <[email protected]>
> > Reviewed-by: Huang Ying <[email protected]>
> > ---
> > drivers/acpi/acpi_ipmi.c | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > b37c189..527ee43 100644
> > --- a/drivers/acpi/acpi_ipmi.c
> > +++ b/drivers/acpi/acpi_ipmi.c
> > @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct
> acpi_ipmi_device *ipmi)
> > struct acpi_ipmi_msg *tx_msg, *temp;
> > int count = HZ / 10;
> > struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> > + unsigned long flags;
> >
> > + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> > /* wake up the sleep thread on the Tx msg */
> > complete(&tx_msg->tx_complete);
> > }
> > + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> >
> > /* wait for about 100ms to flush the tx message list */
> > while (count--) {
> > @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct
> ipmi_recv_msg *msg, void *user_msg_data)
> > break;
> > }
> > }
> > - spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >
> > if (!msg_found) {
> > dev_warn(&pnp_dev->dev,
> > "Unexpected response (msg id %ld) is returned.\n",
> > msg->msgid);
> > - goto out_msg;
> > + goto out_lock;
> > }
> >
> > /* copy the response data to Rx_data buffer */ @@ -286,10 +288,14 @@
> > static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void
> *user_msg_data)
> > }
> > tx_msg->rx_len = msg->msg.data_len;
> > memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> > + /* tx_msg content must be valid before setting msg_done flag */
> > + smp_wmb();
>
> That's suspicious.
>
> If you need the write barrier here, you'll most likely need a read barrier
> somewhere else. Where's that?

It depends on whether the content written before the smp_wmb() is used by the code on the other side under the condition set after the smp_wmb().

So the comment can be treated as two parts:
1. Do we need a paired smp_rmb()?
2. Do we need the smp_wmb()?

For 1:
If we want a paired smp_rmb(), it would appear in this function:

186 static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
187 acpi_integer *value, int rem_time)
188 {
189 struct acpi_ipmi_buffer *buffer;
190
191 /*
192 * value is also used as output parameter. It represents the response
193 * IPMI message returned by IPMI command.
194 */
195 buffer = (struct acpi_ipmi_buffer *)value;
196 if (!rem_time && !msg->msg_done) {
197 buffer->status = ACPI_IPMI_TIMEOUT;
198 return;
199 }
200 /*
201 * If the flag of msg_done is not set or the recv length is zero, it
202 * means that the IPMI command is not executed correctly.
203 * The status code will be ACPI_IPMI_UNKNOWN.
204 */
205 if (!msg->msg_done || !msg->rx_len) {
206 buffer->status = ACPI_IPMI_UNKNOWN;
207 return;
208 }
+ smp_rmb();
209 /*
210 * If the IPMI response message is obtained correctly, the status code
211 * will be ACPI_IPMI_OK
212 */
213 buffer->status = ACPI_IPMI_OK;
214 buffer->length = msg->rx_len;
215 memcpy(buffer->data, msg->rx_data, msg->rx_len);
216 }

If we don't, the only consequence is that the message content may not be read correctly from msg->rx_data.
Note that rx_len is 0 during initialization and will never exceed sizeof(buffer->data), so the read is safe.

Omitting the smp_rmb() is also OK in this case, since:
1. buffer->data will never be used when buffer->status is not ACPI_IPMI_OK, and
2. the smp_rmb()/smp_wmb() added in this patch will be deleted in [PATCH 07].

So IMO we needn't add the smp_rmb(); what do you think?

For 2:
If we don't add the smp_wmb() in ipmi_msg_handler(), then code running on another thread in acpi_format_ipmi_response() may read wrong msg->rx_data: a timeout triggers that function, but when acpi_format_ipmi_response() is entered, the msg->msg_done flag could be seen as 1 while msg->rx_data is not yet ready. This is what we want to avoid with this quick fix.

Thanks and best regards
-Lv

>
> > tx_msg->msg_done = 1;
> >
> > out_comp:
> > complete(&tx_msg->tx_complete);
> > +out_lock:
> > + spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > out_msg:
> > ipmi_free_recv_msg(msg);
> > }
>
> Rafael
>
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-25 11:56:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers

On Thursday, July 25, 2013 03:09:35 AM Zheng, Lv wrote:
> -stable according to the previous conversation.
>
> > From: Rafael J. Wysocki [mailto:[email protected]]
> > Sent: Thursday, July 25, 2013 7:38 AM
> >
> > On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> > > This patch fixes races caused by unprotected ACPI IPMI transfers.
> > >
> > > We can see the following crashes may occur:
> > > 1. There is no tx_msg_lock held for iterating tx_msg_list in
> > > ipmi_flush_tx_msg() while it is parellel unlinked on failure in
> > > acpi_ipmi_space_handler() under protection of tx_msg_lock.
> > > 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
> > > while it is parellel accessed in ipmi_flush_tx_msg() and
> > > ipmi_msg_handler().
> > >
> > > This patch enhances tx_msg_lock to protect all tx_msg accesses to
> > > solve this issue. Then tx_msg_lock is always held around complete()
> > > and tx_msg accesses.
> > > Calling smp_wmb() before setting msg_done flag so that messages
> > > completed due to flushing will not be handled as 'done' messages while
> > > their contents are not valid.
> > >
> > > Signed-off-by: Lv Zheng <[email protected]>
> > > Cc: Zhao Yakui <[email protected]>
> > > Reviewed-by: Huang Ying <[email protected]>
> > > ---
> > > drivers/acpi/acpi_ipmi.c | 10 ++++++++--
> > > 1 file changed, 8 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > > b37c189..527ee43 100644
> > > --- a/drivers/acpi/acpi_ipmi.c
> > > +++ b/drivers/acpi/acpi_ipmi.c
> > > @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct
> > acpi_ipmi_device *ipmi)
> > > struct acpi_ipmi_msg *tx_msg, *temp;
> > > int count = HZ / 10;
> > > struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> > > + unsigned long flags;
> > >
> > > + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > > list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> > > /* wake up the sleep thread on the Tx msg */
> > > complete(&tx_msg->tx_complete);
> > > }
> > > + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > >
> > > /* wait for about 100ms to flush the tx message list */
> > > while (count--) {
> > > @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct
> > ipmi_recv_msg *msg, void *user_msg_data)
> > > break;
> > > }
> > > }
> > > - spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > >
> > > if (!msg_found) {
> > > dev_warn(&pnp_dev->dev,
> > > "Unexpected response (msg id %ld) is returned.\n",
> > > msg->msgid);
> > > - goto out_msg;
> > > + goto out_lock;
> > > }
> > >
> > > /* copy the response data to Rx_data buffer */ @@ -286,10 +288,14 @@
> > > static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void
> > *user_msg_data)
> > > }
> > > tx_msg->rx_len = msg->msg.data_len;
> > > memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> > > + /* tx_msg content must be valid before setting msg_done flag */
> > > + smp_wmb();
> >
> > That's suspicious.
> >
> > If you need the write barrier here, you'll most likely need a read barrier
> > somewhere else. Where's that?
>
> It may depend on whether the content written before the smp_wmb() is used by the code on the other side under the condition set after the smp_wmb().
>
> So the comment can be treated as two parts:
> 1. do we need a paired smp_rmb().
> 2. do we need a smp_wmb().
>
> For 1.
> If we want a paired smp_rmb(), then it will appear in this function:
>
> 186 static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
> 187 acpi_integer *value, int rem_time)
> 188 {
> 189 struct acpi_ipmi_buffer *buffer;
> 190
> 191 /*
> 192 * value is also used as output parameter. It represents the response
> 193 * IPMI message returned by IPMI command.
> 194 */
> 195 buffer = (struct acpi_ipmi_buffer *)value;
> 196 if (!rem_time && !msg->msg_done) {
> 197 buffer->status = ACPI_IPMI_TIMEOUT;
> 198 return;
> 199 }
> 200 /*
> 201 * If the flag of msg_done is not set or the recv length is zero, it
> 202 * means that the IPMI command is not executed correctly.
> 203 * The status code will be ACPI_IPMI_UNKNOWN.
> 204 */
> 205 if (!msg->msg_done || !msg->rx_len) {
> 206 buffer->status = ACPI_IPMI_UNKNOWN;
> 207 return;
> 208 }
> + smp_rmb();
> 209 /*
> 210 * If the IPMI response message is obtained correctly, the status code
> 211 * will be ACPI_IPMI_OK
> 212 */
> 213 buffer->status = ACPI_IPMI_OK;
> 214 buffer->length = msg->rx_len;
> 215 memcpy(buffer->data, msg->rx_data, msg->rx_len);
> 216 }
>
> If we don't, the only consequence is that the msg content may not be correctly read from msg->rx_data.
> Note that the rx_len is 0 during initialization and will never exceed the sizeof(buffer->data), so the read is safe.
>
> Going without smp_rmb() is also OK in this case, since:
> 1. buffer->data will never be used when buffer->status is not ACPI_IPMI_OK and
> 2. the smp_rmb()/smp_wmb() added in this patch will be deleted in [PATCH 07].
>
> So IMO we needn't add the smp_rmb(); what do you think?
>
> For 2.
> If we don't add smp_wmb() in ipmi_msg_handler(), then code running in another thread in acpi_format_ipmi_response() may read wrong msg->rx_data (a timeout triggers that function, but when acpi_format_ipmi_response() is entered, the msg->msg_done flag can be seen as 1 while msg->rx_data is not yet ready); this is what we want to avoid in this quick fix.

Using smp_wmb() without the complementary smp_rmb() doesn't make sense,
because each of them prevents only one flow of control from being
speculatively reordered, either by the CPU or by the compiler. If only one
of them is used without the other, then the flow of control without the
barrier may be reordered in a way that will effectively cancel the effect of
the barrier in the second flow of control.

So, either we need *both* smp_wmb() and smp_rmb(), or we don't need them at all.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-25 18:12:45

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers

On 07/25/2013 07:06 AM, Rafael J. Wysocki wrote:
> On Thursday, July 25, 2013 03:09:35 AM Zheng, Lv wrote:
>> -stable according to the previous conversation.
>>
>>> From: Rafael J. Wysocki [mailto:[email protected]]
>>> Sent: Thursday, July 25, 2013 7:38 AM
>>>
>>> On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
>>>> This patch fixes races caused by unprotected ACPI IPMI transfers.
>>>>
>>>> We can see the following crashes may occur:
>>>> 1. There is no tx_msg_lock held for iterating tx_msg_list in
>>>> ipmi_flush_tx_msg() while it is unlinked in parallel on failure in
>>>> acpi_ipmi_space_handler() under protection of tx_msg_lock.
>>>> 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
>>>> while it is accessed in parallel in ipmi_flush_tx_msg() and
>>>> ipmi_msg_handler().
>>>>
>>>> This patch enhances tx_msg_lock to protect all tx_msg accesses to
>>>> solve this issue. Then tx_msg_lock is always held around complete()
>>>> and tx_msg accesses.
>>>> Calling smp_wmb() before setting msg_done flag so that messages
>>>> completed due to flushing will not be handled as 'done' messages while
>>>> their contents are not valid.
>>>>
>>>> Signed-off-by: Lv Zheng <[email protected]>
>>>> Cc: Zhao Yakui <[email protected]>
>>>> Reviewed-by: Huang Ying <[email protected]>
>>>> ---
>>>> drivers/acpi/acpi_ipmi.c | 10 ++++++++--
>>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
>>>> b37c189..527ee43 100644
>>>> --- a/drivers/acpi/acpi_ipmi.c
>>>> +++ b/drivers/acpi/acpi_ipmi.c
>>>> @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct
>>> acpi_ipmi_device *ipmi)
>>>> struct acpi_ipmi_msg *tx_msg, *temp;
>>>> int count = HZ / 10;
>>>> struct pnp_dev *pnp_dev = ipmi->pnp_dev;
>>>> + unsigned long flags;
>>>>
>>>> + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
>>>> list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
>>>> /* wake up the sleep thread on the Tx msg */
>>>> complete(&tx_msg->tx_complete);
>>>> }
>>>> + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
>>>>
>>>> /* wait for about 100ms to flush the tx message list */
>>>> while (count--) {
>>>> @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct
>>> ipmi_recv_msg *msg, void *user_msg_data)
>>>> break;
>>>> }
>>>> }
>>>> - spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
>>>>
>>>> if (!msg_found) {
>>>> dev_warn(&pnp_dev->dev,
>>>> "Unexpected response (msg id %ld) is returned.\n",
>>>> msg->msgid);
>>>> - goto out_msg;
>>>> + goto out_lock;
>>>> }
>>>>
>>>> /* copy the response data to Rx_data buffer */ @@ -286,10 +288,14 @@
>>>> static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void
>>> *user_msg_data)
>>>> }
>>>> tx_msg->rx_len = msg->msg.data_len;
>>>> memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
>>>> + /* tx_msg content must be valid before setting msg_done flag */
>>>> + smp_wmb();
>>> That's suspicious.
>>>
>>> If you need the write barrier here, you'll most likely need a read barrier
>>> somewhere else. Where's that?
>> It may depend on whether the content written before the smp_wmb() is used by the code on the other side under the condition set after the smp_wmb().
>>
>> So the comment can be treated as two parts:
>> 1. do we need a paired smp_rmb().
>> 2. do we need a smp_wmb().
>>
>> For 1.
>> If we want a paired smp_rmb(), then it will appear in this function:
>>
>> 186 static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
>> 187 acpi_integer *value, int rem_time)
>> 188 {
>> 189 struct acpi_ipmi_buffer *buffer;
>> 190
>> 191 /*
>> 192 * value is also used as output parameter. It represents the response
>> 193 * IPMI message returned by IPMI command.
>> 194 */
>> 195 buffer = (struct acpi_ipmi_buffer *)value;
>> 196 if (!rem_time && !msg->msg_done) {
>> 197 buffer->status = ACPI_IPMI_TIMEOUT;
>> 198 return;
>> 199 }
>> 200 /*
>> 201 * If the flag of msg_done is not set or the recv length is zero, it
>> 202 * means that the IPMI command is not executed correctly.
>> 203 * The status code will be ACPI_IPMI_UNKNOWN.
>> 204 */
>> 205 if (!msg->msg_done || !msg->rx_len) {
>> 206 buffer->status = ACPI_IPMI_UNKNOWN;
>> 207 return;
>> 208 }
>> + smp_rmb();
>> 209 /*
>> 210 * If the IPMI response message is obtained correctly, the status code
>> 211 * will be ACPI_IPMI_OK
>> 212 */
>> 213 buffer->status = ACPI_IPMI_OK;
>> 214 buffer->length = msg->rx_len;
>> 215 memcpy(buffer->data, msg->rx_data, msg->rx_len);
>> 216 }
>>
>> If we don't, the only consequence is that the msg content may not be correctly read from msg->rx_data.
>> Note that the rx_len is 0 during initialization and will never exceed the sizeof(buffer->data), so the read is safe.
>>
>> Going without smp_rmb() is also OK in this case, since:
>> 1. buffer->data will never be used when buffer->status is not ACPI_IPMI_OK and
>> 2. the smp_rmb()/smp_wmb() added in this patch will be deleted in [PATCH 07].
>>
>> So IMO we needn't add the smp_rmb(); what do you think?
>>
>> For 2.
>> If we don't add smp_wmb() in ipmi_msg_handler(), then code running in another thread in acpi_format_ipmi_response() may read wrong msg->rx_data (a timeout triggers that function, but when acpi_format_ipmi_response() is entered, the msg->msg_done flag can be seen as 1 while msg->rx_data is not yet ready); this is what we want to avoid in this quick fix.
> Using smp_wmb() without the complementary smp_rmb() doesn't make sense,
> because each of them prevents only one flow of control from being
> speculatively reordered, either by the CPU or by the compiler. If only one
> of them is used without the other, then the flow of control without the
> barrier may be reordered in a way that will effectively cancel the effect of
> the barrier in the second flow of control.
>
> So, either we need *both* smp_wmb() and smp_rmb(), or we don't need them at all.

If I understand this correctly, the problem would be if:

rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
IPMI_TIMEOUT);

returns on a timeout, then checks msg_done and races with something
setting msg_done. If that is the case, you would need the smp_rmb()
before checking msg_done.

However, the timeout above is unnecessary. You are using
ipmi_request_settime(), so you can set the timeout when the IPMI command
fails and returns a failure message. The driver guarantees a return
message for each request. Just remove the timeout from the completion,
set the timeout and retries in the ipmi request, and the completion
should handle the barrier issues.

Plus, from a quick glance at the code, it doesn't look like it will
properly handle the situation where the timeout occurs and is handled,
and then the response comes in later.

-corey

>
> Thanks,
> Rafael
>
>

2013-07-25 19:22:34

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers

On Thursday, July 25, 2013 01:12:38 PM Corey Minyard wrote:
> On 07/25/2013 07:06 AM, Rafael J. Wysocki wrote:
> > On Thursday, July 25, 2013 03:09:35 AM Zheng, Lv wrote:
> >> -stable according to the previous conversation.
> >>
> >>> From: Rafael J. Wysocki [mailto:[email protected]]
> >>> Sent: Thursday, July 25, 2013 7:38 AM
> >>>
> >>> On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> >>>> This patch fixes races caused by unprotected ACPI IPMI transfers.
> >>>>
> >>>> We can see the following crashes may occur:
> >>>> 1. There is no tx_msg_lock held for iterating tx_msg_list in
> >>>> ipmi_flush_tx_msg() while it is unlinked in parallel on failure in
> >>>> acpi_ipmi_space_handler() under protection of tx_msg_lock.
> >>>> 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
> >>>> while it is accessed in parallel in ipmi_flush_tx_msg() and
> >>>> ipmi_msg_handler().
> >>>>
> >>>> This patch enhances tx_msg_lock to protect all tx_msg accesses to
> >>>> solve this issue. Then tx_msg_lock is always held around complete()
> >>>> and tx_msg accesses.
> >>>> Calling smp_wmb() before setting msg_done flag so that messages
> >>>> completed due to flushing will not be handled as 'done' messages while
> >>>> their contents are not valid.
> >>>>
> >>>> Signed-off-by: Lv Zheng <[email protected]>
> >>>> Cc: Zhao Yakui <[email protected]>
> >>>> Reviewed-by: Huang Ying <[email protected]>
> >>>> ---
> >>>> drivers/acpi/acpi_ipmi.c | 10 ++++++++--
> >>>> 1 file changed, 8 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> >>>> b37c189..527ee43 100644
> >>>> --- a/drivers/acpi/acpi_ipmi.c
> >>>> +++ b/drivers/acpi/acpi_ipmi.c
> >>>> @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct
> >>> acpi_ipmi_device *ipmi)
> >>>> struct acpi_ipmi_msg *tx_msg, *temp;
> >>>> int count = HZ / 10;
> >>>> struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> >>>> + unsigned long flags;
> >>>>
> >>>> + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> >>>> list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> >>>> /* wake up the sleep thread on the Tx msg */
> >>>> complete(&tx_msg->tx_complete);
> >>>> }
> >>>> + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> >>>>
> >>>> /* wait for about 100ms to flush the tx message list */
> >>>> while (count--) {
> >>>> @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct
> >>> ipmi_recv_msg *msg, void *user_msg_data)
> >>>> break;
> >>>> }
> >>>> }
> >>>> - spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >>>>
> >>>> if (!msg_found) {
> >>>> dev_warn(&pnp_dev->dev,
> >>>> "Unexpected response (msg id %ld) is returned.\n",
> >>>> msg->msgid);
> >>>> - goto out_msg;
> >>>> + goto out_lock;
> >>>> }
> >>>>
> >>>> /* copy the response data to Rx_data buffer */ @@ -286,10 +288,14 @@
> >>>> static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void
> >>> *user_msg_data)
> >>>> }
> >>>> tx_msg->rx_len = msg->msg.data_len;
> >>>> memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> >>>> + /* tx_msg content must be valid before setting msg_done flag */
> >>>> + smp_wmb();
> >>> That's suspicious.
> >>>
> >>> If you need the write barrier here, you'll most likely need a read barrier
> >>> somewhere else. Where's that?
> >> It may depend on whether the content written before the smp_wmb() is used by the code on the other side under the condition set after the smp_wmb().
> >>
> >> So the comment can be treated as two parts:
> >> 1. do we need a paired smp_rmb().
> >> 2. do we need a smp_wmb().
> >>
> >> For 1.
> >> If we want a paired smp_rmb(), then it will appear in this function:
> >>
> >> 186 static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
> >> 187 acpi_integer *value, int rem_time)
> >> 188 {
> >> 189 struct acpi_ipmi_buffer *buffer;
> >> 190
> >> 191 /*
> >> 192 * value is also used as output parameter. It represents the response
> >> 193 * IPMI message returned by IPMI command.
> >> 194 */
> >> 195 buffer = (struct acpi_ipmi_buffer *)value;
> >> 196 if (!rem_time && !msg->msg_done) {
> >> 197 buffer->status = ACPI_IPMI_TIMEOUT;
> >> 198 return;
> >> 199 }
> >> 200 /*
> >> 201 * If the flag of msg_done is not set or the recv length is zero, it
> >> 202 * means that the IPMI command is not executed correctly.
> >> 203 * The status code will be ACPI_IPMI_UNKNOWN.
> >> 204 */
> >> 205 if (!msg->msg_done || !msg->rx_len) {
> >> 206 buffer->status = ACPI_IPMI_UNKNOWN;
> >> 207 return;
> >> 208 }
> >> + smp_rmb();
> >> 209 /*
> >> 210 * If the IPMI response message is obtained correctly, the status code
> >> 211 * will be ACPI_IPMI_OK
> >> 212 */
> >> 213 buffer->status = ACPI_IPMI_OK;
> >> 214 buffer->length = msg->rx_len;
> >> 215 memcpy(buffer->data, msg->rx_data, msg->rx_len);
> >> 216 }
> >>
> >> If we don't, the only consequence is that the msg content may not be correctly read from msg->rx_data.
> >> Note that the rx_len is 0 during initialization and will never exceed the sizeof(buffer->data), so the read is safe.
> >>
> >> Going without smp_rmb() is also OK in this case, since:
> >> 1. buffer->data will never be used when buffer->status is not ACPI_IPMI_OK and
> >> 2. the smp_rmb()/smp_wmb() added in this patch will be deleted in [PATCH 07].
> >>
> >> So IMO we needn't add the smp_rmb(); what do you think?
> >>
> >> For 2.
> >> If we don't add smp_wmb() in ipmi_msg_handler(), then code running in another thread in acpi_format_ipmi_response() may read wrong msg->rx_data (a timeout triggers that function, but when acpi_format_ipmi_response() is entered, the msg->msg_done flag can be seen as 1 while msg->rx_data is not yet ready); this is what we want to avoid in this quick fix.
> > Using smp_wmb() without the complementary smp_rmb() doesn't make sense,
> > because each of them prevents only one flow of control from being
> > speculatively reordered, either by the CPU or by the compiler. If only one
> > of them is used without the other, then the flow of control without the
> > barrier may be reordered in a way that will effectively cancel the effect of
> > the barrier in the second flow of control.
> >
> > So, either we need *both* smp_wmb() and smp_rmb(), or we don't need them at all.
>
> If I understand this correctly, the problem would be if:
>
> rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
> IPMI_TIMEOUT);
>
> returns on a timeout, then checks msg_done and races with something
> setting msg_done. If that is the case, you would need the smp_rmb()
> before checking msg_done.

I believe so.

> However, the timeout above is unnecessary. You are using
> ipmi_request_settime(), so you can set the timeout when the IPMI command
> fails and returns a failure message. The driver guarantees a return
> message for each request. Just remove the timeout from the completion,
> set the timeout and retries in the ipmi request, and the completion
> should handle the barrier issues.

Good point.

> Plus, from a quick glance at the code, it doesn't look like it will
> properly handle a situation where the timeout occurs and is handled then
> the response comes in later.

Lv, what about this?

Rafael

2013-07-25 20:17:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers

On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> This patch adds reference counting for ACPI operation region handlers to fix
> races caused by the ACPICA address space callback invocations.
>
> ACPICA address space callback invocation is not suitable for Linux
> CONFIG_MODULE=y execution environment. This patch tries to protect the
> address space callbacks by invoking them under a module safe environment.
> The IPMI address space handler is also upgraded in this patch.
> The acpi_unregister_region() is designed to meet the following
> requirements:
> 1. It acts as a barrier for operation region callbacks - no callback will
> happen after acpi_unregister_region().
> 2. acpi_unregister_region() is safe to be called in module->exit()
> functions.
> Using reference counting rather than module referencing allows
> such benefits to be achieved even when acpi_unregister_region() is called
> in environments other than module->exit().
> The header file of include/acpi/acpi_bus.h should contain the declarations
> that have references to some ACPICA defined types.
>
> Signed-off-by: Lv Zheng <[email protected]>
> Reviewed-by: Huang Ying <[email protected]>
> ---
> drivers/acpi/acpi_ipmi.c | 16 ++--
> drivers/acpi/osl.c | 224 ++++++++++++++++++++++++++++++++++++++++++++++
> include/acpi/acpi_bus.h | 5 ++
> 3 files changed, 235 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> index 5f8f495..2a09156 100644
> --- a/drivers/acpi/acpi_ipmi.c
> +++ b/drivers/acpi/acpi_ipmi.c
> @@ -539,20 +539,18 @@ out_ref:
> static int __init acpi_ipmi_init(void)
> {
> int result = 0;
> - acpi_status status;
>
> if (acpi_disabled)
> return result;
>
> mutex_init(&driver_data.ipmi_lock);
>
> - status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> - ACPI_ADR_SPACE_IPMI,
> - &acpi_ipmi_space_handler,
> - NULL, NULL);
> - if (ACPI_FAILURE(status)) {
> + result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> + &acpi_ipmi_space_handler,
> + NULL, NULL);
> + if (result) {
> pr_warn("Can't register IPMI opregion space handle\n");
> - return -EINVAL;
> + return result;
> }
>
> result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
> }
> mutex_unlock(&driver_data.ipmi_lock);
>
> - acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> - ACPI_ADR_SPACE_IPMI,
> - &acpi_ipmi_space_handler);
> + acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
> }
>
> module_init(acpi_ipmi_init);
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 6ab2c35..8398e51 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;
> static struct workqueue_struct *kacpi_notify_wq;
> static struct workqueue_struct *kacpi_hotplug_wq;
>
> +struct acpi_region {
> + unsigned long flags;
> +#define ACPI_REGION_DEFAULT 0x01
> +#define ACPI_REGION_INSTALLED 0x02
> +#define ACPI_REGION_REGISTERED 0x04
> +#define ACPI_REGION_UNREGISTERING 0x08
> +#define ACPI_REGION_INSTALLING 0x10

What about (1UL << 1), (1UL << 2) etc.?

Also please remove the #defines out of the struct definition.

> + /*
> + * NOTE: Upgrading All Region Handlers
> + * This flag is only used during the period where not all of the
> + * region handlers are upgraded to the new interfaces.
> + */
> +#define ACPI_REGION_MANAGED 0x80
> + acpi_adr_space_handler handler;
> + acpi_adr_space_setup setup;
> + void *context;
> + /* Invoking references */
> + atomic_t refcnt;

Actually, why don't you use krefs?

> +};
> +
> +static struct acpi_region acpi_regions[ACPI_NUM_PREDEFINED_REGIONS] = {
> + [ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> + .flags = ACPI_REGION_DEFAULT,
> + },
> + [ACPI_ADR_SPACE_SYSTEM_IO] = {
> + .flags = ACPI_REGION_DEFAULT,
> + },
> + [ACPI_ADR_SPACE_PCI_CONFIG] = {
> + .flags = ACPI_REGION_DEFAULT,
> + },
> + [ACPI_ADR_SPACE_IPMI] = {
> + .flags = ACPI_REGION_MANAGED,
> + },
> +};
> +static DEFINE_MUTEX(acpi_mutex_region);
> +
> /*
> * This list of permanent mappings is for memory that may be accessed from
> * interrupt context, where we can't do the ioremap().
> @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle, u32 type, void *context,
> kfree(hp_work);
> }
> EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> +
> +static bool acpi_region_managed(struct acpi_region *rgn)
> +{
> + /*
> + * NOTE: Default and Managed
> + * We only need to avoid region management on the regions managed
> + * by ACPICA (ACPI_REGION_DEFAULT). Currently, we need additional
> + * check as many operation region handlers are not upgraded, so
> + * only those known to be safe are managed (ACPI_REGION_MANAGED).
> + */
> + return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> + (rgn->flags & ACPI_REGION_MANAGED);
> +}
> +
> +static bool acpi_region_callable(struct acpi_region *rgn)
> +{
> + return (rgn->flags & ACPI_REGION_REGISTERED) &&
> + !(rgn->flags & ACPI_REGION_UNREGISTERING);
> +}
> +
> +static acpi_status
> +acpi_region_default_handler(u32 function,
> + acpi_physical_address address,
> + u32 bit_width, u64 *value,
> + void *handler_context, void *region_context)
> +{
> + acpi_adr_space_handler handler;
> + struct acpi_region *rgn = (struct acpi_region *)handler_context;
> + void *context;
> + acpi_status status = AE_NOT_EXIST;
> +
> + mutex_lock(&acpi_mutex_region);
> + if (!acpi_region_callable(rgn) || !rgn->handler) {
> + mutex_unlock(&acpi_mutex_region);
> + return status;
> + }
> +
> + atomic_inc(&rgn->refcnt);
> + handler = rgn->handler;
> + context = rgn->context;
> + mutex_unlock(&acpi_mutex_region);
> +
> + status = handler(function, address, bit_width, value, context,
> + region_context);

Why don't we call the handler under the mutex?

What exactly prevents context from becoming NULL before the call above?

> + atomic_dec(&rgn->refcnt);
> +
> + return status;
> +}
> +
> +static acpi_status
> +acpi_region_default_setup(acpi_handle handle, u32 function,
> + void *handler_context, void **region_context)
> +{
> + acpi_adr_space_setup setup;
> + struct acpi_region *rgn = (struct acpi_region *)handler_context;
> + void *context;
> + acpi_status status = AE_OK;
> +
> + mutex_lock(&acpi_mutex_region);
> + if (!acpi_region_callable(rgn) || !rgn->setup) {
> + mutex_unlock(&acpi_mutex_region);
> + return status;
> + }
> +
> + atomic_inc(&rgn->refcnt);
> + setup = rgn->setup;
> + context = rgn->context;
> + mutex_unlock(&acpi_mutex_region);
> +
> + status = setup(handle, function, context, region_context);

Can setup drop rgn->refcnt ?

> + atomic_dec(&rgn->refcnt);
> +
> + return status;
> +}
> +
> +static int __acpi_install_region(struct acpi_region *rgn,
> + acpi_adr_space_type space_id)
> +{
> + int res = 0;
> + acpi_status status;
> + int installing = 0;
> +
> + mutex_lock(&acpi_mutex_region);
> + if (rgn->flags & ACPI_REGION_INSTALLED)
> + goto out_lock;
> + if (rgn->flags & ACPI_REGION_INSTALLING) {
> + res = -EBUSY;
> + goto out_lock;
> + }
> +
> + installing = 1;
> + rgn->flags |= ACPI_REGION_INSTALLING;
> + status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT, space_id,
> + acpi_region_default_handler,
> + acpi_region_default_setup,
> + rgn);
> + rgn->flags &= ~ACPI_REGION_INSTALLING;
> + if (ACPI_FAILURE(status))
> + res = -EINVAL;
> + else
> + rgn->flags |= ACPI_REGION_INSTALLED;
> +
> +out_lock:
> + mutex_unlock(&acpi_mutex_region);
> + if (installing) {
> + if (res)
> + pr_err("Failed to install region %d\n", space_id);
> + else
> + pr_info("Region %d installed\n", space_id);
> + }
> + return res;
> +}
> +
> +int acpi_register_region(acpi_adr_space_type space_id,
> + acpi_adr_space_handler handler,
> + acpi_adr_space_setup setup, void *context)
> +{
> + int res;
> + struct acpi_region *rgn;
> +
> + if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> + return -EINVAL;
> +
> + rgn = &acpi_regions[space_id];
> + if (!acpi_region_managed(rgn))
> + return -EINVAL;
> +
> + res = __acpi_install_region(rgn, space_id);
> + if (res)
> + return res;
> +
> + mutex_lock(&acpi_mutex_region);
> + if (rgn->flags & ACPI_REGION_REGISTERED) {
> + mutex_unlock(&acpi_mutex_region);
> + return -EBUSY;
> + }
> +
> + rgn->handler = handler;
> + rgn->setup = setup;
> + rgn->context = context;
> + rgn->flags |= ACPI_REGION_REGISTERED;
> + atomic_set(&rgn->refcnt, 1);
> + mutex_unlock(&acpi_mutex_region);
> +
> + pr_info("Region %d registered\n", space_id);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(acpi_register_region);
> +
> +void acpi_unregister_region(acpi_adr_space_type space_id)
> +{
> + struct acpi_region *rgn;
> +
> + if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> + return;
> +
> + rgn = &acpi_regions[space_id];
> + if (!acpi_region_managed(rgn))
> + return;
> +
> + mutex_lock(&acpi_mutex_region);
> + if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> + mutex_unlock(&acpi_mutex_region);
> + return;
> + }
> + if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> + mutex_unlock(&acpi_mutex_region);
> + return;

What about

if ((rgn->flags & ACPI_REGION_UNREGISTERING)
|| !(rgn->flags & ACPI_REGION_REGISTERED)) {
mutex_unlock(&acpi_mutex_region);
return;
}

> + }
> +
> + rgn->flags |= ACPI_REGION_UNREGISTERING;
> + rgn->handler = NULL;
> + rgn->setup = NULL;
> + rgn->context = NULL;
> + mutex_unlock(&acpi_mutex_region);
> +
> + while (atomic_read(&rgn->refcnt) > 1)
> + schedule_timeout_uninterruptible(usecs_to_jiffies(5));

Wouldn't it be better to use a wait queue here?

> + atomic_dec(&rgn->refcnt);
> +
> + mutex_lock(&acpi_mutex_region);
> + rgn->flags &= ~(ACPI_REGION_REGISTERED | ACPI_REGION_UNREGISTERING);
> + mutex_unlock(&acpi_mutex_region);
> +
> + pr_info("Region %d unregistered\n", space_id);
> +}
> +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index a2c2fbb..15fad0d 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void *bus) { return 0; }
>
> #endif /* CONFIG_ACPI */
>
> +int acpi_register_region(acpi_adr_space_type space_id,
> + acpi_adr_space_handler handler,
> + acpi_adr_space_setup setup, void *context);
> +void acpi_unregister_region(acpi_adr_space_type space_id);
> +
> #endif /*__ACPI_BUS_H__*/

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-25 21:19:12

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers

On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> This patch adds reference counting for ACPI operation region handlers to fix
> races caused by the ACPICA address space callback invocations.
>
> ACPICA address space callback invocation is not suitable for Linux
> CONFIG_MODULE=y execution environment.

Actually, can you please explain to me what *exactly* the problem is?

Rafael

2013-07-25 21:49:05

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 04/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user

On Tuesday, July 23, 2013 04:09:26 PM Lv Zheng wrote:
> This patch uses reference counting to fix the race caused by the
> unprotected ACPI IPMI user.
>
> As the acpi_ipmi_device->user_interface check in acpi_ipmi_space_handler()
> can happen before setting user_interface to NULL and codes after the check
> in acpi_ipmi_space_handler() can happen after user_interface becoming NULL,
> then the on-going acpi_ipmi_space_handler() still can pass an invalid
> acpi_ipmi_device->user_interface to ipmi_request_settime(). Such race
> condition is not allowed by the IPMI layer's API design as crash will
> happen in ipmi_request_settime().
> In IPMI layer, smi_gone()/new_smi() callbacks are protected by
> smi_watchers_mutex, thus their invocations are serialized. But as a new
> smi can re-use the freed intf_num, it requires that the callback
> implementation must not use intf_num as an identification mean or it must
> ensure all references to the previous smi are all dropped before exiting
> smi_gone() callback. In case of acpi_ipmi module, this means
> ipmi_flush_tx_msg() must ensure all on-going IPMI transfers are completed
> before exiting ipmi_flush_tx_msg().
>
> This patch follows ipmi_devintf.c design:
> 1. ipmi_destroy_user() is invoked after the reference count of
> acpi_ipmi_device drops to 0; this matches the IPMI layer's API calling
> rules for ipmi_destroy_user() and ipmi_request_settime().
> 2. The reference count of acpi_ipmi_device dropping to 1 means all tx_msg
> instances related to this acpi_ipmi_device have been freed; this can be
> used to implement the new flushing mechanism. Note that complete() must
> be retried so that an ongoing tx_msg won't block flushing at the point
> where tx_msg is added to tx_msg_list while a reference to
> acpi_ipmi_device is held. This matches the IPMI layer's callback rule
> on smi_gone()/new_smi() serialization.
> 3. ipmi_flush_tx_msg() is performed after deleting acpi_ipmi_device from
> the list so that no new tx_msg can be created after entering the
> flushing process.
> 4. The flushing of tx_msg is also moved out of ipmi_lock in this patch.
>
> The forthcoming IPMI operation region handler installation changes also
> require that acpi_ipmi_device be handled in a reference-counted style.
>
> Authorship is also updated due to this design change.
>
> Signed-off-by: Lv Zheng <[email protected]>
> Cc: Zhao Yakui <[email protected]>
> Reviewed-by: Huang Ying <[email protected]>
> ---
> drivers/acpi/acpi_ipmi.c | 249 +++++++++++++++++++++++++++-------------------
> 1 file changed, 149 insertions(+), 100 deletions(-)
>
> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> index 527ee43..cbf25e0 100644
> --- a/drivers/acpi/acpi_ipmi.c
> +++ b/drivers/acpi/acpi_ipmi.c
> @@ -1,8 +1,9 @@
> /*
> * acpi_ipmi.c - ACPI IPMI opregion
> *
> - * Copyright (C) 2010 Intel Corporation
> - * Copyright (C) 2010 Zhao Yakui <[email protected]>
> + * Copyright (C) 2010, 2013 Intel Corporation
> + * Author: Zhao Yakui <[email protected]>
> + * Lv Zheng <[email protected]>
> *
> * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> *
> @@ -67,6 +68,7 @@ struct acpi_ipmi_device {
> long curr_msgid;
> unsigned long flags;
> struct ipmi_smi_info smi_data;
> + atomic_t refcnt;

Can you use a kref instead?

> };
>
> struct ipmi_driver_data {
> @@ -107,8 +109,8 @@ struct acpi_ipmi_buffer {
> static void ipmi_register_bmc(int iface, struct device *dev);
> static void ipmi_bmc_gone(int iface);
> static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data);
> -static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device);
> -static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device);
> +static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi);
> +static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi);
>
> static struct ipmi_driver_data driver_data = {
> .ipmi_devices = LIST_HEAD_INIT(driver_data.ipmi_devices),
> @@ -122,6 +124,80 @@ static struct ipmi_driver_data driver_data = {
> },
> };
>
> +static struct acpi_ipmi_device *
> +ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
> +{
> + struct acpi_ipmi_device *ipmi_device;
> + int err;
> + ipmi_user_t user;
> +
> + ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
> + if (!ipmi_device)
> + return NULL;
> +
> + atomic_set(&ipmi_device->refcnt, 1);
> + INIT_LIST_HEAD(&ipmi_device->head);
> + INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
> + spin_lock_init(&ipmi_device->tx_msg_lock);
> +
> + ipmi_device->handle = handle;
> + ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> + memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
> + ipmi_device->ipmi_ifnum = iface;
> +
> + err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> + ipmi_device, &user);
> + if (err) {
> + put_device(smi_data->dev);
> + kfree(ipmi_device);
> + return NULL;
> + }
> + ipmi_device->user_interface = user;
> + ipmi_install_space_handler(ipmi_device);
> +
> + return ipmi_device;
> +}
> +
> +static struct acpi_ipmi_device *
> +acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)
> +{
> + if (ipmi_device)
> + atomic_inc(&ipmi_device->refcnt);
> + return ipmi_device;
> +}
> +
> +static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
> +{
> + ipmi_remove_space_handler(ipmi_device);
> + ipmi_destroy_user(ipmi_device->user_interface);
> + put_device(ipmi_device->smi_data.dev);
> + kfree(ipmi_device);
> +}
> +
> +static void acpi_ipmi_dev_put(struct acpi_ipmi_device *ipmi_device)
> +{
> + if (ipmi_device && atomic_dec_and_test(&ipmi_device->refcnt))
> + ipmi_dev_release(ipmi_device);
> +}
> +
> +static struct acpi_ipmi_device *acpi_ipmi_get_targeted_smi(int iface)
> +{
> + int dev_found = 0;
> + struct acpi_ipmi_device *ipmi_device;
> +

Why don't you do

struct acpi_ipmi_device *ipmi_device, *ret = NULL;

and then ->

> + mutex_lock(&driver_data.ipmi_lock);
> + list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
> + if (ipmi_device->ipmi_ifnum == iface) {

-> ret = ipmi_device; ->

> + dev_found = 1;
> + acpi_ipmi_dev_get(ipmi_device);
> + break;
> + }
> + }
> + mutex_unlock(&driver_data.ipmi_lock);
> +
> + return dev_found ? ipmi_device : NULL;

-> return ret;

> +}
> +
> static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
> {
> struct acpi_ipmi_msg *ipmi_msg;
> @@ -228,25 +304,24 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
> static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> {
> struct acpi_ipmi_msg *tx_msg, *temp;
> - int count = HZ / 10;
> - struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> unsigned long flags;
>
> - spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> - list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> - /* wake up the sleep thread on the Tx msg */
> - complete(&tx_msg->tx_complete);
> - }
> - spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> -
> - /* wait for about 100ms to flush the tx message list */
> - while (count--) {
> - if (list_empty(&ipmi->tx_msg_list))
> - break;
> - schedule_timeout(1);
> + /*
> + * NOTE: Synchronous Flushing
> + * Wait until refcnt drops to 1 - no other users except this
> + * context. This function should always be called before
> + * acpi_ipmi_device destruction.
> + */
> + while (atomic_read(&ipmi->refcnt) > 1) {

Isn't this racy? What if we see that the refcount is 1 and break the loop,
but someone else bumps up the refcount at the same time?

> + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> + list_for_each_entry_safe(tx_msg, temp,
> + &ipmi->tx_msg_list, head) {
> + /* wake up the sleep thread on the Tx msg */
> + complete(&tx_msg->tx_complete);
> + }
> + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> + schedule_timeout_uninterruptible(msecs_to_jiffies(1));
> }
> - if (!list_empty(&ipmi->tx_msg_list))
> - dev_warn(&pnp_dev->dev, "tx msg list is not NULL\n");
> }
>
> static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> @@ -304,22 +379,26 @@ static void ipmi_register_bmc(int iface, struct device *dev)
> {
> struct acpi_ipmi_device *ipmi_device, *temp;
> struct pnp_dev *pnp_dev;
> - ipmi_user_t user;
> int err;
> struct ipmi_smi_info smi_data;
> acpi_handle handle;
>
> err = ipmi_get_smi_info(iface, &smi_data);
> -
> if (err)
> return;
>
> - if (smi_data.addr_src != SI_ACPI) {
> - put_device(smi_data.dev);
> - return;
> - }
> -
> + if (smi_data.addr_src != SI_ACPI)
> + goto err_ref;
> handle = smi_data.addr_info.acpi_info.acpi_handle;
> + if (!handle)
> + goto err_ref;
> + pnp_dev = to_pnp_dev(smi_data.dev);
> +
> + ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> + if (!ipmi_device) {
> + dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> + goto err_ref;
> + }
>
> mutex_lock(&driver_data.ipmi_lock);
> list_for_each_entry(temp, &driver_data.ipmi_devices, head) {
> @@ -328,54 +407,42 @@ static void ipmi_register_bmc(int iface, struct device *dev)
> * to the device list, don't add it again.
> */
> if (temp->handle == handle)
> - goto out;
> + goto err_lock;
> }
>
> - ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
> -
> - if (!ipmi_device)
> - goto out;
> -
> - pnp_dev = to_pnp_dev(smi_data.dev);
> - ipmi_device->handle = handle;
> - ipmi_device->pnp_dev = pnp_dev;
> -
> - err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> - ipmi_device, &user);
> - if (err) {
> - dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> - kfree(ipmi_device);
> - goto out;
> - }
> - acpi_add_ipmi_device(ipmi_device);
> - ipmi_device->user_interface = user;
> - ipmi_device->ipmi_ifnum = iface;
> + list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
> mutex_unlock(&driver_data.ipmi_lock);
> - memcpy(&ipmi_device->smi_data, &smi_data, sizeof(struct ipmi_smi_info));
> + put_device(smi_data.dev);
> return;
>
> -out:
> +err_lock:
> mutex_unlock(&driver_data.ipmi_lock);
> + ipmi_dev_release(ipmi_device);
> +err_ref:
> put_device(smi_data.dev);
> return;
> }
>
> static void ipmi_bmc_gone(int iface)
> {
> - struct acpi_ipmi_device *ipmi_device, *temp;
> + int dev_found = 0;
> + struct acpi_ipmi_device *ipmi_device;
>
> mutex_lock(&driver_data.ipmi_lock);
> - list_for_each_entry_safe(ipmi_device, temp,
> - &driver_data.ipmi_devices, head) {
> - if (ipmi_device->ipmi_ifnum != iface)
> - continue;
> -
> - acpi_remove_ipmi_device(ipmi_device);
> - put_device(ipmi_device->smi_data.dev);
> - kfree(ipmi_device);
> - break;
> + list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
> + if (ipmi_device->ipmi_ifnum == iface) {
> + dev_found = 1;

You can do the list_del() here, because you're under the mutex, so others
won't see the list in an inconsistent state and you're about to break anyway.

> + break;
> + }
> }
> + if (dev_found)
> + list_del(&ipmi_device->head);
> mutex_unlock(&driver_data.ipmi_lock);
> +
> + if (dev_found) {
> + ipmi_flush_tx_msg(ipmi_device);
> + acpi_ipmi_dev_put(ipmi_device);
> + }
> }
>
> /* --------------------------------------------------------------------------
> @@ -400,7 +467,8 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
> void *handler_context, void *region_context)
> {
> struct acpi_ipmi_msg *tx_msg;
> - struct acpi_ipmi_device *ipmi_device = handler_context;
> + int iface = (long)handler_context;
> + struct acpi_ipmi_device *ipmi_device;
> int err, rem_time;
> acpi_status status;
> unsigned long flags;
> @@ -414,12 +482,15 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
> if ((function & ACPI_IO_MASK) == ACPI_READ)
> return AE_TYPE;
>
> - if (!ipmi_device->user_interface)
> + ipmi_device = acpi_ipmi_get_targeted_smi(iface);
> + if (!ipmi_device)
> return AE_NOT_EXIST;
>
> tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
> - if (!tx_msg)
> - return AE_NO_MEMORY;
> + if (!tx_msg) {
> + status = AE_NO_MEMORY;
> + goto out_ref;
> + }
>
> if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
> status = AE_TYPE;
> @@ -449,6 +520,8 @@ out_list:
> spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> out_msg:
> kfree(tx_msg);
> +out_ref:
> + acpi_ipmi_dev_put(ipmi_device);
> return status;
> }
>
> @@ -473,7 +546,7 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
> status = acpi_install_address_space_handler(ipmi->handle,
> ACPI_ADR_SPACE_IPMI,
> &acpi_ipmi_space_handler,
> - NULL, ipmi);
> + NULL, (void *)((long)ipmi->ipmi_ifnum));
> if (ACPI_FAILURE(status)) {
> struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> dev_warn(&pnp_dev->dev, "Can't register IPMI opregion space "
> @@ -484,36 +557,6 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
> return 0;
> }
>
> -static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device)
> -{
> -
> - INIT_LIST_HEAD(&ipmi_device->head);
> -
> - spin_lock_init(&ipmi_device->tx_msg_lock);
> - INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
> - ipmi_install_space_handler(ipmi_device);
> -
> - list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
> -}
> -
> -static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device)
> -{
> - /*
> - * If the IPMI user interface is created, it should be
> - * destroyed.
> - */
> - if (ipmi_device->user_interface) {
> - ipmi_destroy_user(ipmi_device->user_interface);
> - ipmi_device->user_interface = NULL;
> - }
> - /* flush the Tx_msg list */
> - if (!list_empty(&ipmi_device->tx_msg_list))
> - ipmi_flush_tx_msg(ipmi_device);
> -
> - list_del(&ipmi_device->head);
> - ipmi_remove_space_handler(ipmi_device);
> -}
> -
> static int __init acpi_ipmi_init(void)
> {
> int result = 0;
> @@ -530,7 +573,7 @@ static int __init acpi_ipmi_init(void)
>
> static void __exit acpi_ipmi_exit(void)
> {
> - struct acpi_ipmi_device *ipmi_device, *temp;
> + struct acpi_ipmi_device *ipmi_device;
>
> if (acpi_disabled)
> return;
> @@ -544,11 +587,17 @@ static void __exit acpi_ipmi_exit(void)
> * handler and free it.
> */
> mutex_lock(&driver_data.ipmi_lock);
> - list_for_each_entry_safe(ipmi_device, temp,
> - &driver_data.ipmi_devices, head) {
> - acpi_remove_ipmi_device(ipmi_device);
> - put_device(ipmi_device->smi_data.dev);
> - kfree(ipmi_device);
> + while (!list_empty(&driver_data.ipmi_devices)) {
> + ipmi_device = list_first_entry(&driver_data.ipmi_devices,
> + struct acpi_ipmi_device,
> + head);
> + list_del(&ipmi_device->head);
> + mutex_unlock(&driver_data.ipmi_lock);
> +
> + ipmi_flush_tx_msg(ipmi_device);
> + acpi_ipmi_dev_put(ipmi_device);
> +
> + mutex_lock(&driver_data.ipmi_lock);
> }
> mutex_unlock(&driver_data.ipmi_lock);
> }
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-25 22:13:09

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 07/13] ACPI/IPMI: Add reference counting for ACPI IPMI transfers

On Tuesday, July 23, 2013 04:09:54 PM Lv Zheng wrote:
> This patch adds reference counting for ACPI IPMI transfers to tune the
> locking granularity of tx_msg_lock.
>
> The acpi_ipmi_msg handling is re-designed using reference counting.
> 1. tx_msg is always unlinked before complete(), so that:
> 1.1. it is safe to put complete() outside of tx_msg_lock;
> 1.2. complete() can only happen once, thus smp_wmb() is not required.
> 2. The reference count of tx_msg is increased before calling
> ipmi_request_settime(), and a tx_msg_lock protected
> ipmi_cancel_tx_msg() is introduced so that a complete() can happen in
> parallel with tx_msg unlinking in the failure cases.
> 3. tx_msg holds the reference of acpi_ipmi_device so that it can be flushed
> and freed in the contexts other than acpi_ipmi_space_handler().
>
> The lockdep_chains shows all acpi_ipmi locks are leaf locks after the
> tuning:
> 1. ipmi_lock is always leaf:
> irq_context: 0
> [ffffffff81a943f8] smi_watchers_mutex
> [ffffffffa06eca60] driver_data.ipmi_lock
> irq_context: 0
> [ffffffff82767b40] &buffer->mutex
> [ffffffffa00a6678] s_active#103
> [ffffffffa06eca60] driver_data.ipmi_lock
> 2. without this patch applied, lock used by complete() is held after
> holding tx_msg_lock:
> irq_context: 0
> [ffffffff82767b40] &buffer->mutex
> [ffffffffa00a6678] s_active#103
> [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> irq_context: 1
> [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> irq_context: 1
> [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> [ffffffffa06eccf0] &x->wait#25
> irq_context: 1
> [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> [ffffffffa06eccf0] &x->wait#25
> [ffffffff81e36620] &p->pi_lock
> irq_context: 1
> [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> [ffffffffa06eccf0] &x->wait#25
> [ffffffff81e36620] &p->pi_lock
> [ffffffff81e5d0a8] &rq->lock
> 3. with this patch applied, tx_msg_lock is always leaf:
> irq_context: 0
> [ffffffff82767b40] &buffer->mutex
> [ffffffffa00a66d8] s_active#107
> [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
> irq_context: 1
> [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
>
> Signed-off-by: Lv Zheng <[email protected]>
> Cc: Zhao Yakui <[email protected]>
> Reviewed-by: Huang Ying <[email protected]>
> ---
> drivers/acpi/acpi_ipmi.c | 107 +++++++++++++++++++++++++++++++++-------------
> 1 file changed, 77 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> index 2a09156..0ee1ea6 100644
> --- a/drivers/acpi/acpi_ipmi.c
> +++ b/drivers/acpi/acpi_ipmi.c
> @@ -105,6 +105,7 @@ struct acpi_ipmi_msg {
> u8 data[ACPI_IPMI_MAX_MSG_LENGTH];
> u8 rx_len;
> struct acpi_ipmi_device *device;
> + atomic_t refcnt;

Again: kref, please?

> };
>
> /* IPMI request/response buffer per ACPI 4.0, sec 5.5.2.4.3.2 */
> @@ -195,22 +196,47 @@ static struct acpi_ipmi_device *acpi_ipmi_get_selected_smi(void)
> return ipmi_device;
> }
>
> -static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
> +static struct acpi_ipmi_msg *ipmi_msg_alloc(void)
> {
> + struct acpi_ipmi_device *ipmi;
> struct acpi_ipmi_msg *ipmi_msg;
> - struct pnp_dev *pnp_dev = ipmi->pnp_dev;
>
> + ipmi = acpi_ipmi_get_selected_smi();
> + if (!ipmi)
> + return NULL;
> ipmi_msg = kzalloc(sizeof(struct acpi_ipmi_msg), GFP_KERNEL);
> - if (!ipmi_msg) {
> - dev_warn(&pnp_dev->dev, "Can't allocate memory for ipmi_msg\n");
> + if (!ipmi_msg) {
> + acpi_ipmi_dev_put(ipmi);
> return NULL;
> }
> + atomic_set(&ipmi_msg->refcnt, 1);
> init_completion(&ipmi_msg->tx_complete);
> INIT_LIST_HEAD(&ipmi_msg->head);
> ipmi_msg->device = ipmi;
> +
> return ipmi_msg;
> }
>
> +static struct acpi_ipmi_msg *
> +acpi_ipmi_msg_get(struct acpi_ipmi_msg *tx_msg)
> +{
> + if (tx_msg)
> + atomic_inc(&tx_msg->refcnt);
> + return tx_msg;
> +}
> +
> +static void ipmi_msg_release(struct acpi_ipmi_msg *tx_msg)
> +{
> + acpi_ipmi_dev_put(tx_msg->device);
> + kfree(tx_msg);
> +}
> +
> +static void acpi_ipmi_msg_put(struct acpi_ipmi_msg *tx_msg)
> +{
> + if (tx_msg && atomic_dec_and_test(&tx_msg->refcnt))
> + ipmi_msg_release(tx_msg);
> +}
> +
> #define IPMI_OP_RGN_NETFN(offset) ((offset >> 8) & 0xff)
> #define IPMI_OP_RGN_CMD(offset) (offset & 0xff)
> static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
> @@ -300,7 +326,7 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
>
> static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> {
> - struct acpi_ipmi_msg *tx_msg, *temp;
> + struct acpi_ipmi_msg *tx_msg;
> unsigned long flags;
>
> /*
> @@ -311,16 +337,46 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> */
> while (atomic_read(&ipmi->refcnt) > 1) {
> spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> - list_for_each_entry_safe(tx_msg, temp,
> - &ipmi->tx_msg_list, head) {
> + while (!list_empty(&ipmi->tx_msg_list)) {
> + tx_msg = list_first_entry(&ipmi->tx_msg_list,
> + struct acpi_ipmi_msg,
> + head);
> + list_del(&tx_msg->head);
> + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> +
> /* wake up the sleep thread on the Tx msg */
> complete(&tx_msg->tx_complete);
> + acpi_ipmi_msg_put(tx_msg);
> + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> }
> spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> +
> schedule_timeout_uninterruptible(msecs_to_jiffies(1));
> }
> }
>
> +static void ipmi_cancel_tx_msg(struct acpi_ipmi_device *ipmi,
> + struct acpi_ipmi_msg *msg)
> +{
> + struct acpi_ipmi_msg *tx_msg;
> + int msg_found = 0;

Use bool?

> + unsigned long flags;
> +
> + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> + list_for_each_entry(tx_msg, &ipmi->tx_msg_list, head) {
> + if (msg == tx_msg) {
> + msg_found = 1;
> + break;
> + }
> + }
> + if (msg_found)
> + list_del(&tx_msg->head);

The list_del() can be done when you set msg_found.

> + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> +
> + if (msg_found)
> + acpi_ipmi_msg_put(tx_msg);
> +}
> +
> static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> {
> struct acpi_ipmi_device *ipmi_device = user_msg_data;
> @@ -343,12 +399,15 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> break;
> }
> }
> + if (msg_found)
> + list_del(&tx_msg->head);
> + spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
>
> if (!msg_found) {
> dev_warn(&pnp_dev->dev,
> "Unexpected response (msg id %ld) is returned.\n",
> msg->msgid);
> - goto out_lock;
> + goto out_msg;
> }
>
> /* copy the response data to Rx_data buffer */
> @@ -360,14 +419,11 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> }
> tx_msg->rx_len = msg->msg.data_len;
> memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> - /* tx_msg content must be valid before setting msg_done flag */
> - smp_wmb();
> tx_msg->msg_done = 1;
>
> out_comp:
> complete(&tx_msg->tx_complete);
> -out_lock:
> - spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> + acpi_ipmi_msg_put(tx_msg);
> out_msg:
> ipmi_free_recv_msg(msg);
> }
> @@ -493,21 +549,17 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
> if ((function & ACPI_IO_MASK) == ACPI_READ)
> return AE_TYPE;
>
> - ipmi_device = acpi_ipmi_get_selected_smi();
> - if (!ipmi_device)
> + tx_msg = ipmi_msg_alloc();
> + if (!tx_msg)
> return AE_NOT_EXIST;
> -
> - tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
> - if (!tx_msg) {
> - status = AE_NO_MEMORY;
> - goto out_ref;
> - }
> + ipmi_device = tx_msg->device;
>
> if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
> - status = AE_TYPE;
> - goto out_msg;
> + ipmi_msg_release(tx_msg);
> + return AE_TYPE;
> }
>
> + acpi_ipmi_msg_get(tx_msg);
> spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
> list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
> spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> @@ -518,21 +570,16 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
> NULL, 0, 0, 0);
> if (err) {
> status = AE_ERROR;
> - goto out_list;
> + goto out_msg;
> }
> rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
> IPMI_TIMEOUT);
> acpi_format_ipmi_response(tx_msg, value, rem_time);
> status = AE_OK;
>
> -out_list:
> - spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
> - list_del(&tx_msg->head);
> - spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> out_msg:
> - kfree(tx_msg);
> -out_ref:
> - acpi_ipmi_dev_put(ipmi_device);
> + ipmi_cancel_tx_msg(ipmi_device, tx_msg);
> + acpi_ipmi_msg_put(tx_msg);
> return status;
> }
>
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-25 22:15:49

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 08/13] ACPI/IPMI: Cleanup several acpi_ipmi_device members

On Tuesday, July 23, 2013 04:10:06 PM Lv Zheng wrote:
> This is a trivial patch:
> 1. It deletes a member of acpi_ipmi_device - smi_data - which is not
> actually used.
> 2. It updates a member of acpi_ipmi_device - pnp_dev - which is only
> used by dev_warn() invocations, changing it to a struct device.
>
> Signed-off-by: Lv Zheng <[email protected]>
> Reviewed-by: Huang Ying <[email protected]>
> ---
> drivers/acpi/acpi_ipmi.c | 30 ++++++++++++++----------------
> 1 file changed, 14 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> index 0ee1ea6..7f93ffd 100644
> --- a/drivers/acpi/acpi_ipmi.c
> +++ b/drivers/acpi/acpi_ipmi.c
> @@ -63,11 +63,10 @@ struct acpi_ipmi_device {
> struct list_head tx_msg_list;
> spinlock_t tx_msg_lock;
> acpi_handle handle;
> - struct pnp_dev *pnp_dev;
> + struct device *dev;
> ipmi_user_t user_interface;
> int ipmi_ifnum; /* IPMI interface number */
> long curr_msgid;
> - struct ipmi_smi_info smi_data;
> atomic_t refcnt;
> };
>
> @@ -132,7 +131,7 @@ static struct ipmi_driver_data driver_data = {
> };
>
> static struct acpi_ipmi_device *
> -ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
> +ipmi_dev_alloc(int iface, struct device *pdev, acpi_handle handle)

Why is the second arg called pdev?

> {
> struct acpi_ipmi_device *ipmi_device;
> int err;
> @@ -148,14 +147,13 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
> spin_lock_init(&ipmi_device->tx_msg_lock);
>
> ipmi_device->handle = handle;
> - ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> - memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
> + ipmi_device->dev = get_device(pdev);
> ipmi_device->ipmi_ifnum = iface;
>
> err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> ipmi_device, &user);
> if (err) {
> - put_device(smi_data->dev);
> + put_device(pdev);
> kfree(ipmi_device);
> return NULL;
> }
> @@ -175,7 +173,7 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)
> static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
> {
> ipmi_destroy_user(ipmi_device->user_interface);
> - put_device(ipmi_device->smi_data.dev);
> + put_device(ipmi_device->dev);
> kfree(ipmi_device);
> }
>
> @@ -263,7 +261,7 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
> buffer = (struct acpi_ipmi_buffer *)value;
> /* copy the tx message data */
> if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
> - dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
> + dev_WARN_ONCE(tx_msg->device->dev, true,
> "Unexpected request (msg len %d).\n",
> buffer->length);
> return -EINVAL;
> @@ -382,11 +380,11 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> struct acpi_ipmi_device *ipmi_device = user_msg_data;
> int msg_found = 0;
> struct acpi_ipmi_msg *tx_msg;
> - struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
> + struct device *dev = ipmi_device->dev;
> unsigned long flags;
>
> if (msg->user != ipmi_device->user_interface) {
> - dev_warn(&pnp_dev->dev,
> + dev_warn(dev,
> "Unexpected response is returned. returned user %p, expected user %p\n",
> msg->user, ipmi_device->user_interface);
> goto out_msg;
> @@ -404,7 +402,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
>
> if (!msg_found) {
> - dev_warn(&pnp_dev->dev,
> + dev_warn(dev,
> "Unexpected response (msg id %ld) is returned.\n",
> msg->msgid);
> goto out_msg;
> @@ -412,7 +410,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
>
> /* copy the response data to Rx_data buffer */
> if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
> - dev_WARN_ONCE(&pnp_dev->dev, true,
> + dev_WARN_ONCE(dev, true,
> "Unexpected response (msg len %d).\n",
> msg->msg.data_len);
> goto out_comp;
> @@ -431,7 +429,7 @@ out_msg:
> static void ipmi_register_bmc(int iface, struct device *dev)
> {
> struct acpi_ipmi_device *ipmi_device, *temp;
> - struct pnp_dev *pnp_dev;
> + struct device *pdev;

And here?

> int err;
> struct ipmi_smi_info smi_data;
> acpi_handle handle;
> @@ -445,11 +443,11 @@ static void ipmi_register_bmc(int iface, struct device *dev)
> handle = smi_data.addr_info.acpi_info.acpi_handle;
> if (!handle)
> goto err_ref;
> - pnp_dev = to_pnp_dev(smi_data.dev);
> + pdev = smi_data.dev;
>
> - ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> + ipmi_device = ipmi_dev_alloc(iface, pdev, handle);
> if (!ipmi_device) {
> - dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> + dev_warn(pdev, "Can't create IPMI user interface\n");
> goto err_ref;
> }
>
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-26 00:10:17

by Zheng, Lv

[permalink] [raw]
Subject: RE: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers



> From: Rafael J. Wysocki [mailto:[email protected]]
> Sent: Thursday, July 25, 2013 8:07 PM
>
> On Thursday, July 25, 2013 03:09:35 AM Zheng, Lv wrote:
> > -stable according to the previous conversation.
> >
> > > From: Rafael J. Wysocki [mailto:[email protected]]
> > > Sent: Thursday, July 25, 2013 7:38 AM
> > >
> > > On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> > > > This patch fixes races caused by unprotected ACPI IPMI transfers.
> > > >
> > > > We can see the following crashes may occur:
> > > > 1. There is no tx_msg_lock held for iterating tx_msg_list in
> > > > ipmi_flush_tx_msg() while it is unlinked in parallel on failure in
> > > > acpi_ipmi_space_handler() under protection of tx_msg_lock.
> > > > 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
> > > > while it is accessed in parallel in ipmi_flush_tx_msg() and
> > > > ipmi_msg_handler().
> > > >
> > > > This patch enhances tx_msg_lock to protect all tx_msg accesses to
> > > > solve this issue. Then tx_msg_lock is always held around
> > > > complete() and tx_msg accesses.
> > > > smp_wmb() is called before setting the msg_done flag so that
> > > > messages completed due to flushing will not be handled as 'done'
> > > > messages while their contents are not valid.
> > > >
> > > > Signed-off-by: Lv Zheng <[email protected]>
> > > > Cc: Zhao Yakui <[email protected]>
> > > > Reviewed-by: Huang Ying <[email protected]>
> > > > ---
> > > > drivers/acpi/acpi_ipmi.c | 10 ++++++++--
> > > > 1 file changed, 8 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > > > index
> > > > b37c189..527ee43 100644
> > > > --- a/drivers/acpi/acpi_ipmi.c
> > > > +++ b/drivers/acpi/acpi_ipmi.c
> > > > @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct
> > > acpi_ipmi_device *ipmi)
> > > > struct acpi_ipmi_msg *tx_msg, *temp;
> > > > int count = HZ / 10;
> > > > struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> > > > + unsigned long flags;
> > > >
> > > > + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > > > list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> > > > /* wake up the sleep thread on the Tx msg */
> > > > complete(&tx_msg->tx_complete);
> > > > }
> > > > + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > > >
> > > > /* wait for about 100ms to flush the tx message list */
> > > > while (count--) {
> > > > @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct
> > > ipmi_recv_msg *msg, void *user_msg_data)
> > > > break;
> > > > }
> > > > }
> > > > - spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > > >
> > > > if (!msg_found) {
> > > > dev_warn(&pnp_dev->dev,
> > > > "Unexpected response (msg id %ld) is returned.\n",
> > > > msg->msgid);
> > > > - goto out_msg;
> > > > + goto out_lock;
> > > > }
> > > >
> > > > /* copy the response data to Rx_data buffer */ @@ -286,10
> > > > +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg,
> > > > void
> > > *user_msg_data)
> > > > }
> > > > tx_msg->rx_len = msg->msg.data_len;
> > > > memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> > > > + /* tx_msg content must be valid before setting msg_done flag */
> > > > + smp_wmb();
> > >
> > > That's suspicious.
> > >
> > > If you need the write barrier here, you'll most likely need a read
> > > barrier somewhere else. Where's that?
> >
> > It might depend on whether the content written before the smp_wmb() is
> > used or not by the code on the other side, under the condition set
> > after the smp_wmb().
> >
> > So the comment could be treated as 2 parts:
> > 1. do we need a paired smp_rmb().
> > 2. do we need a smp_wmb().
> >
> > For 1.
> > If we want a paired smp_rmb(), then it will appear in this function:
> >
> > 186 static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
> > 187 acpi_integer *value, int rem_time)
> > 188 {
> > 189 struct acpi_ipmi_buffer *buffer;
> > 190
> > 191 /*
> > 192 * value is also used as output parameter. It represents the
> response
> > 193 * IPMI message returned by IPMI command.
> > 194 */
> > 195 buffer = (struct acpi_ipmi_buffer *)value;
> > 196 if (!rem_time && !msg->msg_done) {
> > 197 buffer->status = ACPI_IPMI_TIMEOUT;
> > 198 return;
> > 199 }
> > 200 /*
> > 201 * If the flag of msg_done is not set or the recv length is zero,
> it
> > 202 * means that the IPMI command is not executed correctly.
> > 203 * The status code will be ACPI_IPMI_UNKNOWN.
> > 204 */
> > 205 if (!msg->msg_done || !msg->rx_len) {
> > 206 buffer->status = ACPI_IPMI_UNKNOWN;
> > 207 return;
> > 208 }
> > + smp_rmb();
> > 209 /*
> > 210 * If the IPMI response message is obtained correctly, the
> status code
> > 211 * will be ACPI_IPMI_OK
> > 212 */
> > 213 buffer->status = ACPI_IPMI_OK;
> > 214 buffer->length = msg->rx_len;
> > 215 memcpy(buffer->data, msg->rx_data, msg->rx_len);
> > 216 }
> >
> > If we don't, the only consequence is that the msg content may not be
> > correctly read from msg->rx_data.
> > Note that rx_len is 0 during initialization and will never exceed
> > sizeof(buffer->data), so the read is safe.
> >
> > Being without smp_rmb() is also OK in this case, since:
> > 1. buffer->data will never be used when buffer->status is not
> > ACPI_IPMI_OK and 2. the smp_rmb()/smp_wmb() added in this patch will be
> deleted in [PATCH 07].
> >
> > So IMO, we needn't add the smp_rmb(), what do you think of this?
> >
> > For 2.
> > If we don't add smp_wmb() in the ipmi_msg_handler(), then the codes
> running on other thread in the acpi_format_ipmi_response() may read wrong
> msg->rx_data (a timeout triggers this function, but when
> acpi_format_ipmi_response() is entered, the msg->msg_done flag could be
> seen as 1 but the msg->rx_data is not ready), this is what we want to avoid in
> this quick fix.
>
> Using smp_wmb() without the complementary smp_rmb() doesn't make sense,
> because each of them prevents only one flow of control from being
> speculatively reordered, either by the CPU or by the compiler. If only one of
> them is used without the other, then the flow of control without the barrier
> may be reordered in a way that will effectively cancel the effect of the barrier in
> the second flow of control.
>
> So, either we need *both* smp_wmb() and smp_rmb(), or we don't need them
> at all.

I think you are right; it is about the order in which the writes become visible to the other CPU.

The smp_wmb()/smp_rmb() pair is not a useful approach outside of performance-tuning implementations.

It's here because the code used a mixed programming model, which is itself a bug.
Such bugs can be avoided by:
1. either using coarser-grained locks - in this case, tx_msg_lock should be held around acpi_format_ipmi_response();
2. or using finer-grained locks, where races are avoided automatically because the execution flows exclude each other (as PATCH 07 shows).

I'll update this patch or even drop it.
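[Editorial note: the pairing rule Rafael describes can be illustrated with a small user-space analogue. C11 release/acquire atomics stand in for the kernel's smp_wmb()/smp_rmb(); the names (rx_data, msg_done, run_handoff) are illustrative only and do not appear in the driver.]

```c
#include <pthread.h>
#include <stdatomic.h>

/* User-space sketch of the tx_msg handoff: the writer fills rx_data
 * before publishing msg_done with release semantics (the smp_wmb()
 * side), and the reader must observe msg_done with acquire semantics
 * (the smp_rmb() side) before reading rx_data.  Dropping either half
 * of the pairing re-opens the race. */
static int rx_data;
static atomic_int msg_done;

static void *producer(void *arg)
{
	(void)arg;
	rx_data = 42;	/* like the memcpy() into tx_msg before msg_done is set */
	atomic_store_explicit(&msg_done, 1, memory_order_release);
	return NULL;
}

static int run_handoff(void)
{
	pthread_t t;
	int v;

	rx_data = 0;
	atomic_store(&msg_done, 0);
	pthread_create(&t, NULL, producer, NULL);
	/* reader side: the acquire load pairs with the writer's release */
	while (!atomic_load_explicit(&msg_done, memory_order_acquire))
		;
	v = rx_data;	/* guaranteed to see the value written before release */
	pthread_join(t, NULL);
	return v;
}
```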

Thanks and best regards
-Lv

>
> Thanks,
> Rafael
>
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-26 00:17:15

by Zheng, Lv

[permalink] [raw]
Subject: RE: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers

> From: Corey Minyard [mailto:[email protected]]
> Sent: Friday, July 26, 2013 2:13 AM
>
> On 07/25/2013 07:06 AM, Rafael J. Wysocki wrote:
> > On Thursday, July 25, 2013 03:09:35 AM Zheng, Lv wrote:
> >> -stable according to the previous conversation.
> >>
> >>> From: Rafael J. Wysocki [mailto:[email protected]]
> >>> Sent: Thursday, July 25, 2013 7:38 AM
> >>>
> >>> On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> >>>> This patch fixes races caused by unprotected ACPI IPMI transfers.
> >>>>
> >>>> We can see the following crashes may occur:
> >>>> 1. There is no tx_msg_lock held for iterating tx_msg_list in
> >>>> ipmi_flush_tx_msg() while it is parallel unlinked on failure in
> >>>> acpi_ipmi_space_handler() under protection of tx_msg_lock.
> >>>> 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
> >>>> while it is parallel accessed in ipmi_flush_tx_msg() and
> >>>> ipmi_msg_handler().
> >>>>
> >>>> This patch enhances tx_msg_lock to protect all tx_msg accesses to
> >>>> solve this issue. Then tx_msg_lock is always held around
> >>>> complete() and tx_msg accesses.
> >>>> Calling smp_wmb() before setting msg_done flag so that messages
> >>>> completed due to flushing will not be handled as 'done' messages
> >>>> while their contents are not valid.
> >>>>
> >>>> Signed-off-by: Lv Zheng <[email protected]>
> >>>> Cc: Zhao Yakui <[email protected]>
> >>>> Reviewed-by: Huang Ying <[email protected]>
> >>>> ---
> >>>> drivers/acpi/acpi_ipmi.c | 10 ++++++++--
> >>>> 1 file changed, 8 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> >>>> index
> >>>> b37c189..527ee43 100644
> >>>> --- a/drivers/acpi/acpi_ipmi.c
> >>>> +++ b/drivers/acpi/acpi_ipmi.c
> >>>> @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct
> >>> acpi_ipmi_device *ipmi)
> >>>> struct acpi_ipmi_msg *tx_msg, *temp;
> >>>> int count = HZ / 10;
> >>>> struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> >>>> + unsigned long flags;
> >>>>
> >>>> + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> >>>> list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> >>>> /* wake up the sleep thread on the Tx msg */
> >>>> complete(&tx_msg->tx_complete);
> >>>> }
> >>>> + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> >>>>
> >>>> /* wait for about 100ms to flush the tx message list */
> >>>> while (count--) {
> >>>> @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct
> >>> ipmi_recv_msg *msg, void *user_msg_data)
> >>>> break;
> >>>> }
> >>>> }
> >>>> - spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >>>>
> >>>> if (!msg_found) {
> >>>> dev_warn(&pnp_dev->dev,
> >>>> "Unexpected response (msg id %ld) is returned.\n",
> >>>> msg->msgid);
> >>>> - goto out_msg;
> >>>> + goto out_lock;
> >>>> }
> >>>>
> >>>> /* copy the response data to Rx_data buffer */ @@ -286,10
> >>>> +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg,
> >>>> void
> >>> *user_msg_data)
> >>>> }
> >>>> tx_msg->rx_len = msg->msg.data_len;
> >>>> memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> >>>> + /* tx_msg content must be valid before setting msg_done flag */
> >>>> + smp_wmb();
> >>> That's suspicious.
> >>>
> >>> If you need the write barrier here, you'll most likely need a read
> >>> barrier somewhere else. Where's that?
> >> It might depend on whether the content written before the smp_wmb() is
> used or not by the other side codes under the condition set after the
> smp_wmb().
> >>
> >> So comment could be treated as 2 parts:
> >> 1. do we need a paired smp_rmb().
> >> 2. do we need a smp_wmb().
> >>
> >> For 1.
> >> If we want a paired smp_rmb(), then it will appear in this function:
> >>
> >> 186 static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
> >> 187 acpi_integer *value, int rem_time)
> >> 188 {
> >> 189 struct acpi_ipmi_buffer *buffer;
> >> 190
> >> 191 /*
> >> 192 * value is also used as output parameter. It represents the
> response
> >> 193 * IPMI message returned by IPMI command.
> >> 194 */
> >> 195 buffer = (struct acpi_ipmi_buffer *)value;
> >> 196 if (!rem_time && !msg->msg_done) {
> >> 197 buffer->status = ACPI_IPMI_TIMEOUT;
> >> 198 return;
> >> 199 }
> >> 200 /*
> >> 201 * If the flag of msg_done is not set or the recv length is zero,
> it
> >> 202 * means that the IPMI command is not executed correctly.
> >> 203 * The status code will be ACPI_IPMI_UNKNOWN.
> >> 204 */
> >> 205 if (!msg->msg_done || !msg->rx_len) {
> >> 206 buffer->status = ACPI_IPMI_UNKNOWN;
> >> 207 return;
> >> 208 }
> >> + smp_rmb();
> >> 209 /*
> >> 210 * If the IPMI response message is obtained correctly, the
> status code
> >> 211 * will be ACPI_IPMI_OK
> >> 212 */
> >> 213 buffer->status = ACPI_IPMI_OK;
> >> 214 buffer->length = msg->rx_len;
> >> 215 memcpy(buffer->data, msg->rx_data, msg->rx_len);
> >> 216 }
> >>
> >> If we don't then there will only be msg content not correctly read from
> msg->rx_data.
> >> Note that the rx_len is 0 during initialization and will never exceed the
> sizeof(buffer->data), so the read is safe.
> >>
> >> Being without smp_rmb() is also OK in this case, since:
> >> 1. buffer->data will never be used when buffer->status is not
> >> ACPI_IPMI_OK and 2. the smp_rmb()/smp_wmb() added in this patch will be
> deleted in [PATCH 07].
> >>
> >> So IMO, we needn't add the smp_rmb(), what do you think of this?
> >>
> >> For 2.
> >> If we don't add smp_wmb() in the ipmi_msg_handler(), then the codes
> running on other thread in the acpi_format_ipmi_response() may read wrong
> msg->rx_data (a timeout triggers this function, but when
> acpi_format_ipmi_response() is entered, the msg->msg_done flag could be
> seen as 1 but the msg->rx_data is not ready), this is what we want to avoid in
> this quick fix.
> > Using smp_wmb() without the complementary smp_rmb() doesn't make
> > sense, because each of them prevents only one flow of control from
> > being speculatively reordered, either by the CPU or by the compiler.
> > If only one of them is used without the other, then the flow of
> > control without the barrier may be reordered in a way that will
> > effectively cancel the effect of the barrier in the second flow of control.
> >
> > So, either we need *both* smp_wmb() and smp_rmb(), or we don't need
> them at all.
>
> If I understand this correctly, the problem would be if:
>
> rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
> IPMI_TIMEOUT);
>
> returns on a timeout, then checks msg_done and races with something setting
> msg_done. If that is the case, you would need the smp_rmb() before checking
> msg_done.
>
> However, the timeout above is unnecessary. You are using
> ipmi_request_settime(), so you can set the timeout when the IPMI command
> fails and returns a failure message. The driver guarantees a return message
> for each request. Just remove the timeout from the completion, set the
> timeout and retries in the ipmi request, and the completion should handle the
> barrier issues.

It's just difficult for me to determine a retry count and timeout value; maybe retry=0, timeout=IPMI_TIMEOUT is OK.
The timeout-on-completion code is already there, and I think a quick fix should not introduce new logic.
I'll add a new patch to apply your comment.
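[Editorial note: the pattern Corey suggests - wait on a plain completion and let the IPMI driver's own retries/timeout guarantee that a response (or failure) message always arrives - can be sketched in user space as follows. This is a pthread analogue of the kernel's completion API; all names are illustrative.]

```c
#include <pthread.h>
#include <stdbool.h>

/* Minimal completion analogue: instead of wait_for_completion_timeout()
 * plus a msg_done flag, the requester blocks on a plain completion and
 * the response path (which the driver guarantees will run) completes
 * it.  The done flag is published under the lock, so no explicit
 * barriers are needed. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

static void *responder(void *arg)
{
	complete((struct completion *)arg);	/* the guaranteed response */
	return NULL;
}

static int run_demo(void)
{
	struct completion c;
	pthread_t t;

	init_completion(&c);
	pthread_create(&t, NULL, responder, &c);
	wait_for_completion(&c);	/* returns only after the response arrives */
	pthread_join(t, NULL);
	return c.done;
}
```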

>
> Plus, from a quick glance at the code, it doesn't look like it will properly handle a
> situation where the timeout occurs and is handled then the response comes in
> later.

PATCH 07 fixes this issue.
Here we just need the smp_rmb(), or to hold tx_msg_lock around acpi_format_ipmi_response().

Thanks for commenting.

Best regards
-Lv

>
> -corey
>
> >
> > Thanks,
> > Rafael
> >
> >


2013-07-26 00:18:20

by Zheng, Lv

[permalink] [raw]
Subject: RE: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers

> From: [email protected]
> [mailto:[email protected]] On Behalf Of Rafael J. Wysocki
> Sent: Friday, July 26, 2013 3:33 AM
>
> On Thursday, July 25, 2013 01:12:38 PM Corey Minyard wrote:
> > On 07/25/2013 07:06 AM, Rafael J. Wysocki wrote:
> > > On Thursday, July 25, 2013 03:09:35 AM Zheng, Lv wrote:
> > >> -stable according to the previous conversation.
> > >>
> > >>> From: Rafael J. Wysocki [mailto:[email protected]]
> > >>> Sent: Thursday, July 25, 2013 7:38 AM
> > >>>
> > >>> On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> > >>>> This patch fixes races caused by unprotected ACPI IPMI transfers.
> > >>>>
> > >>>> We can see the following crashes may occur:
> > >>>> 1. There is no tx_msg_lock held for iterating tx_msg_list in
> > >>>> ipmi_flush_tx_msg() while it is parallel unlinked on failure in
> > >>>> acpi_ipmi_space_handler() under protection of tx_msg_lock.
> > >>>> 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
> > >>>> while it is parallel accessed in ipmi_flush_tx_msg() and
> > >>>> ipmi_msg_handler().
> > >>>>
> > >>>> This patch enhances tx_msg_lock to protect all tx_msg accesses to
> > >>>> solve this issue. Then tx_msg_lock is always held around
> > >>>> complete() and tx_msg accesses.
> > >>>> Calling smp_wmb() before setting msg_done flag so that messages
> > >>>> completed due to flushing will not be handled as 'done' messages
> > >>>> while their contents are not valid.
> > >>>>
> > >>>> Signed-off-by: Lv Zheng <[email protected]>
> > >>>> Cc: Zhao Yakui <[email protected]>
> > >>>> Reviewed-by: Huang Ying <[email protected]>
> > >>>> ---
> > >>>> drivers/acpi/acpi_ipmi.c | 10 ++++++++--
> > >>>> 1 file changed, 8 insertions(+), 2 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > >>>> index
> > >>>> b37c189..527ee43 100644
> > >>>> --- a/drivers/acpi/acpi_ipmi.c
> > >>>> +++ b/drivers/acpi/acpi_ipmi.c
> > >>>> @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct
> > >>> acpi_ipmi_device *ipmi)
> > >>>> struct acpi_ipmi_msg *tx_msg, *temp;
> > >>>> int count = HZ / 10;
> > >>>> struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> > >>>> + unsigned long flags;
> > >>>>
> > >>>> + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > >>>> list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list,
> head) {
> > >>>> /* wake up the sleep thread on the Tx msg */
> > >>>> complete(&tx_msg->tx_complete);
> > >>>> }
> > >>>> + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > >>>>
> > >>>> /* wait for about 100ms to flush the tx message list */
> > >>>> while (count--) {
> > >>>> @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct
> > >>> ipmi_recv_msg *msg, void *user_msg_data)
> > >>>> break;
> > >>>> }
> > >>>> }
> > >>>> - spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > >>>>
> > >>>> if (!msg_found) {
> > >>>> dev_warn(&pnp_dev->dev,
> > >>>> "Unexpected response (msg id %ld) is returned.\n",
> > >>>> msg->msgid);
> > >>>> - goto out_msg;
> > >>>> + goto out_lock;
> > >>>> }
> > >>>>
> > >>>> /* copy the response data to Rx_data buffer */ @@ -286,10
> > >>>> +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg
> > >>>> *msg, void
> > >>> *user_msg_data)
> > >>>> }
> > >>>> tx_msg->rx_len = msg->msg.data_len;
> > >>>> memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> > >>>> + /* tx_msg content must be valid before setting msg_done flag */
> > >>>> + smp_wmb();
> > >>> That's suspicious.
> > >>>
> > >>> If you need the write barrier here, you'll most likely need a read
> > >>> barrier somewhere else. Where's that?
> > >> It might depend on whether the content written before the smp_wmb() is
> used or not by the other side codes under the condition set after the
> smp_wmb().
> > >>
> > >> So comment could be treated as 2 parts:
> > >> 1. do we need a paired smp_rmb().
> > >> 2. do we need a smp_wmb().
> > >>
> > >> For 1.
> > >> If we want a paired smp_rmb(), then it will appear in this function:
> > >>
> > >> 186 static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
> > >> 187 acpi_integer *value, int rem_time)
> > >> 188 {
> > >> 189 struct acpi_ipmi_buffer *buffer;
> > >> 190
> > >> 191 /*
> > >> 192 * value is also used as output parameter. It represents the
> response
> > >> 193 * IPMI message returned by IPMI command.
> > >> 194 */
> > >> 195 buffer = (struct acpi_ipmi_buffer *)value;
> > >> 196 if (!rem_time && !msg->msg_done) {
> > >> 197 buffer->status = ACPI_IPMI_TIMEOUT;
> > >> 198 return;
> > >> 199 }
> > >> 200 /*
> > >> 201 * If the flag of msg_done is not set or the recv length is
> zero, it
> > >> 202 * means that the IPMI command is not executed correctly.
> > >> 203 * The status code will be ACPI_IPMI_UNKNOWN.
> > >> 204 */
> > >> 205 if (!msg->msg_done || !msg->rx_len) {
> > >> 206 buffer->status = ACPI_IPMI_UNKNOWN;
> > >> 207 return;
> > >> 208 }
> > >> + smp_rmb();
> > >> 209 /*
> > >> 210 * If the IPMI response message is obtained correctly, the
> status code
> > >> 211 * will be ACPI_IPMI_OK
> > >> 212 */
> > >> 213 buffer->status = ACPI_IPMI_OK;
> > >> 214 buffer->length = msg->rx_len;
> > >> 215 memcpy(buffer->data, msg->rx_data, msg->rx_len);
> > >> 216 }
> > >>
> > >> If we don't then there will only be msg content not correctly read from
> msg->rx_data.
> > >> Note that the rx_len is 0 during initialization and will never exceed the
> sizeof(buffer->data), so the read is safe.
> > >>
> > >> Being without smp_rmb() is also OK in this case, since:
> > >> 1. buffer->data will never be used when buffer->status is not
> > >> ACPI_IPMI_OK and 2. the smp_rmb()/smp_wmb() added in this patch will
> be deleted in [PATCH 07].
> > >>
> > >> So IMO, we needn't add the smp_rmb(), what do you think of this?
> > >>
> > >> For 2.
> > >> If we don't add smp_wmb() in the ipmi_msg_handler(), then the codes
> running on other thread in the acpi_format_ipmi_response() may read wrong
> msg->rx_data (a timeout triggers this function, but when
> acpi_format_ipmi_response() is entered, the msg->msg_done flag could be
> seen as 1 but the msg->rx_data is not ready), this is what we want to avoid in
> this quick fix.
> > > Using smp_wmb() without the complementary smp_rmb() doesn't make
> > > sense, because each of them prevents only one flow of control from
> > > being speculatively reordered, either by the CPU or by the compiler.
> > > If only one of them is used without the other, then the flow of
> > > control without the barrier may be reordered in a way that will
> > > effectively cancel the effect of the barrier in the second flow of control.
> > >
> > > So, either we need *both* smp_wmb() and smp_rmb(), or we don't need
> them at all.
> >
> > If I understand this correctly, the problem would be if:
> >
> > rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
> > IPMI_TIMEOUT);
> >
> > returns on a timeout, then checks msg_done and races with something
> > setting msg_done. If that is the case, you would need the smp_rmb()
> > before checking msg_done.
>
> I believe so.
>
> > However, the timeout above is unnecessary. You are using
> > ipmi_request_settime(), so you can set the timeout when the IPMI
> > command fails and returns a failure message. The driver guarantees a
> > return message for each request. Just remove the timeout from the
> > completion, set the timeout and retries in the ipmi request, and the
> > completion should handle the barrier issues.
>
> Good point.
>
> > Plus, from a quick glance at the code, it doesn't look like it will
> > properly handle a situation where the timeout occurs and is handled
> > then the response comes in later.
>
> Lv, what about this?

Please refer to my reply to Corey's comment. :-)

Thanks and best regards
-Lv

>
> Rafael
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body
> of a message to [email protected] More majordomo info at
> http://vger.kernel.org/majordomo-info.html

2013-07-26 00:47:54

by Zheng, Lv

[permalink] [raw]
Subject: RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers



> From: Rafael J. Wysocki [mailto:[email protected]]
> Sent: Friday, July 26, 2013 4:27 AM
>
> On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > This patch adds reference counting for ACPI operation region handlers
> > to fix races caused by the ACPICA address space callback invocations.
> >
> > ACPICA address space callback invocation is not suitable for Linux
> > CONFIG_MODULE=y execution environment. This patch tries to protect
> > the address space callbacks by invoking them under a module safe
> environment.
> > The IPMI address space handler is also upgraded in this patch.
> > The acpi_unregister_region() is designed to meet the following
> > requirements:
> > 1. It acts as a barrier for operation region callbacks - no callback will
> > happen after acpi_unregister_region().
> > 2. acpi_unregister_region() is safe to be called in module->exit()
> > functions.
> > Using reference counting rather than module referencing allows such
> > benefits to be achieved even when acpi_unregister_region() is called
> > in the environments other than module->exit().
> > The header file of include/acpi/acpi_bus.h should contain the
> > declarations that have references to some ACPICA defined types.
> >
> > Signed-off-by: Lv Zheng <[email protected]>
> > Reviewed-by: Huang Ying <[email protected]>
> > ---
> > drivers/acpi/acpi_ipmi.c | 16 ++--
> > drivers/acpi/osl.c | 224
> ++++++++++++++++++++++++++++++++++++++++++++++
> > include/acpi/acpi_bus.h | 5 ++
> > 3 files changed, 235 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > 5f8f495..2a09156 100644
> > --- a/drivers/acpi/acpi_ipmi.c
> > +++ b/drivers/acpi/acpi_ipmi.c
> > @@ -539,20 +539,18 @@ out_ref:
> > static int __init acpi_ipmi_init(void) {
> > int result = 0;
> > - acpi_status status;
> >
> > if (acpi_disabled)
> > return result;
> >
> > mutex_init(&driver_data.ipmi_lock);
> >
> > - status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > - ACPI_ADR_SPACE_IPMI,
> > - &acpi_ipmi_space_handler,
> > - NULL, NULL);
> > - if (ACPI_FAILURE(status)) {
> > + result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> > + &acpi_ipmi_space_handler,
> > + NULL, NULL);
> > + if (result) {
> > pr_warn("Can't register IPMI opregion space handle\n");
> > - return -EINVAL;
> > + return result;
> > }
> >
> > result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> > @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
> > }
> > mutex_unlock(&driver_data.ipmi_lock);
> >
> > - acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> > - ACPI_ADR_SPACE_IPMI,
> > - &acpi_ipmi_space_handler);
> > + acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
> > }
> >
> > module_init(acpi_ipmi_init);
> > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index
> > 6ab2c35..8398e51 100644
> > --- a/drivers/acpi/osl.c
> > +++ b/drivers/acpi/osl.c
> > @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq; static
> > struct workqueue_struct *kacpi_notify_wq; static struct
> > workqueue_struct *kacpi_hotplug_wq;
> >
> > +struct acpi_region {
> > + unsigned long flags;
> > +#define ACPI_REGION_DEFAULT 0x01
> > +#define ACPI_REGION_INSTALLED 0x02
> > +#define ACPI_REGION_REGISTERED 0x04
> > +#define ACPI_REGION_UNREGISTERING 0x08
> > +#define ACPI_REGION_INSTALLING 0x10
>
> What about (1UL << 1), (1UL << 2) etc.?
>
> Also please remove the #defines out of the struct definition.

OK.

>
> > + /*
> > + * NOTE: Upgrading All Region Handlers
> > + * This flag is only used during the period where not all of the
> > + * region handlers are upgraded to the new interfaces.
> > + */
> > +#define ACPI_REGION_MANAGED 0x80
> > + acpi_adr_space_handler handler;
> > + acpi_adr_space_setup setup;
> > + void *context;
> > + /* Invoking references */
> > + atomic_t refcnt;
>
> Actually, why don't you use krefs?

If you take a look at the other pieces of my code, you'll find two reasons:

1. I'm using while (atomic_read() > 1) to flush the in-flight references, and there is no kref API to do that.
I don't think it is appropriate to introduce such an API into kref.h and start another argument about kref design in this bug-fix patch. :-)
I'll start a discussion about the kref design in another thread.
2. I'm pairing ipmi_dev|msg_release() with ipmi_dev|msg_alloc(), which follows the atomic_t coding style.
If atomic_t were changed to struct kref, I would need an extra wrapper, __ipmi_dev_release(), that takes a struct kref parameter and calls ipmi_dev_release() inside it.
By not using kref, I don't need to write such wrappers.
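[Editorial note: the atomic_t-style alloc/release pairing described above can be sketched in user space as follows. C11 atomics stand in for the kernel's atomic_t; the names (ipmi_dev_alloc etc.) mirror the driver's style but the bodies are illustrative, not the driver code.]

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Sketch of the raw-refcount style: alloc returns the object with one
 * reference, get/put adjust the count, and the final put frees it.
 * With struct kref, put would instead need a release wrapper taking
 * the embedded kref and recovering the object via container_of(). */
struct ipmi_dev {
	atomic_int refcnt;
	int id;
};

static int released;	/* lets the demo observe the final put */

static struct ipmi_dev *ipmi_dev_alloc(int id)
{
	struct ipmi_dev *d = malloc(sizeof(*d));

	atomic_init(&d->refcnt, 1);	/* caller holds the initial reference */
	d->id = id;
	return d;
}

static void ipmi_dev_get(struct ipmi_dev *d)
{
	atomic_fetch_add(&d->refcnt, 1);
}

static void ipmi_dev_put(struct ipmi_dev *d)
{
	/* fetch_sub returns the old value: 1 means this was the last ref */
	if (atomic_fetch_sub(&d->refcnt, 1) == 1) {
		released = 1;
		free(d);
	}
}
```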

>
> > +};
> > +
> > +static struct acpi_region acpi_regions[ACPI_NUM_PREDEFINED_REGIONS]
> = {
> > + [ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> > + .flags = ACPI_REGION_DEFAULT,
> > + },
> > + [ACPI_ADR_SPACE_SYSTEM_IO] = {
> > + .flags = ACPI_REGION_DEFAULT,
> > + },
> > + [ACPI_ADR_SPACE_PCI_CONFIG] = {
> > + .flags = ACPI_REGION_DEFAULT,
> > + },
> > + [ACPI_ADR_SPACE_IPMI] = {
> > + .flags = ACPI_REGION_MANAGED,
> > + },
> > +};
> > +static DEFINE_MUTEX(acpi_mutex_region);
> > +
> > /*
> > * This list of permanent mappings is for memory that may be accessed
> from
> > * interrupt context, where we can't do the ioremap().
> > @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle,
> u32 type, void *context,
> > kfree(hp_work);
> > }
> > EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> > +
> > +static bool acpi_region_managed(struct acpi_region *rgn) {
> > + /*
> > + * NOTE: Default and Managed
> > + * We only need to avoid region management on the regions managed
> > + * by ACPICA (ACPI_REGION_DEFAULT). Currently, we need additional
> > + * check as many operation region handlers are not upgraded, so
> > + * only those known to be safe are managed (ACPI_REGION_MANAGED).
> > + */
> > + return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> > + (rgn->flags & ACPI_REGION_MANAGED); }
> > +
> > +static bool acpi_region_callable(struct acpi_region *rgn) {
> > + return (rgn->flags & ACPI_REGION_REGISTERED) &&
> > + !(rgn->flags & ACPI_REGION_UNREGISTERING); }
> > +
> > +static acpi_status
> > +acpi_region_default_handler(u32 function,
> > + acpi_physical_address address,
> > + u32 bit_width, u64 *value,
> > + void *handler_context, void *region_context) {
> > + acpi_adr_space_handler handler;
> > + struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > + void *context;
> > + acpi_status status = AE_NOT_EXIST;
> > +
> > + mutex_lock(&acpi_mutex_region);
> > + if (!acpi_region_callable(rgn) || !rgn->handler) {
> > + mutex_unlock(&acpi_mutex_region);
> > + return status;
> > + }
> > +
> > + atomic_inc(&rgn->refcnt);
> > + handler = rgn->handler;
> > + context = rgn->context;
> > + mutex_unlock(&acpi_mutex_region);
> > +
> > + status = handler(function, address, bit_width, value, context,
> > + region_context);
>
> Why don't we call the handler under the mutex?
>
> What exactly prevents context from becoming NULL before the call above?

It's a programming-style concern.
IMO, holding locks around callback invocations is a bug-prone style that can lead to deadlocks.
Let me explain this using an example.

Object A exports a register/unregister API for other objects.
Object B calls A's register/unregister API to register/unregister B's callback.
It's likely that object B will hold lock_of_B around unregister/register when object B is destroyed/created, and that lock_of_B is also taken inside the callback.
So if object A holds lock_of_A around the callback invocation, it leads to a deadlock, since:
1. the locking order on the register/unregister side is: lock(lock_of_B), lock(lock_of_A);
2. the locking order on the callback side is: lock(lock_of_A), lock(lock_of_B).
The orders are reversed!

IMO, Linux may need to introduce __callback and __api annotations for functions, and use sparse to enforce this rule, since sparse can tell whether a callback is invoked under locks.

In the case of the ACPICA space handlers, as you may know, no lock is held inside ACPICA when an ACPI operation region handler is invoked (the interpreter lock must be released before executing operation region handlers).
So the likelihood of this deadlock is pretty high here!
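[Editorial note: the lock-avoiding invocation pattern used by acpi_region_default_handler() above - snapshot the callback under the mutex, pin the registration with a reference, drop the mutex, then invoke - can be sketched in user space as follows. All names are illustrative.]

```c
#include <pthread.h>
#include <stdatomic.h>

/* The callback is invoked with no lock held, so it may freely take its
 * own locks without creating the reversed A/B lock order described
 * above; the reference count keeps the registration alive meanwhile. */
typedef int (*region_handler)(void *context);

static pthread_mutex_t region_lock = PTHREAD_MUTEX_INITIALIZER;
static region_handler cur_handler;
static void *cur_context;
static atomic_int refcnt = 1;

static void register_region_handler(region_handler h, void *ctx)
{
	pthread_mutex_lock(&region_lock);
	cur_handler = h;
	cur_context = ctx;
	pthread_mutex_unlock(&region_lock);
}

static int invoke_region_handler(void)
{
	region_handler h;
	void *ctx;
	int ret = -1;

	pthread_mutex_lock(&region_lock);
	h = cur_handler;
	ctx = cur_context;
	if (h)
		atomic_fetch_add(&refcnt, 1);	/* pin the registration */
	pthread_mutex_unlock(&region_lock);

	if (h) {
		ret = h(ctx);	/* no lock held: callback may lock freely */
		atomic_fetch_sub(&refcnt, 1);
	}
	return ret;
}

static int demo_handler(void *context)
{
	return *(int *)context;
}
```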

>
> > + atomic_dec(&rgn->refcnt);
> > +
> > + return status;
> > +}
> > +
> > +static acpi_status
> > +acpi_region_default_setup(acpi_handle handle, u32 function,
> > + void *handler_context, void **region_context) {
> > + acpi_adr_space_setup setup;
> > + struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > + void *context;
> > + acpi_status status = AE_OK;
> > +
> > + mutex_lock(&acpi_mutex_region);
> > + if (!acpi_region_callable(rgn) || !rgn->setup) {
> > + mutex_unlock(&acpi_mutex_region);
> > + return status;
> > + }
> > +
> > + atomic_inc(&rgn->refcnt);
> > + setup = rgn->setup;
> > + context = rgn->context;
> > + mutex_unlock(&acpi_mutex_region);
> > +
> > + status = setup(handle, function, context, region_context);
>
> Can setup drop rgn->refcnt ?

The reason is the same as for the handler: a setup routine is also a callback.

>
> > + atomic_dec(&rgn->refcnt);
> > +
> > + return status;
> > +}
> > +
> > +static int __acpi_install_region(struct acpi_region *rgn,
> > + acpi_adr_space_type space_id)
> > +{
> > + int res = 0;
> > + acpi_status status;
> > + int installing = 0;
> > +
> > + mutex_lock(&acpi_mutex_region);
> > + if (rgn->flags & ACPI_REGION_INSTALLED)
> > + goto out_lock;
> > + if (rgn->flags & ACPI_REGION_INSTALLING) {
> > + res = -EBUSY;
> > + goto out_lock;
> > + }
> > +
> > + installing = 1;
> > + rgn->flags |= ACPI_REGION_INSTALLING;
> > + status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> space_id,
> > + acpi_region_default_handler,
> > + acpi_region_default_setup,
> > + rgn);
> > + rgn->flags &= ~ACPI_REGION_INSTALLING;
> > + if (ACPI_FAILURE(status))
> > + res = -EINVAL;
> > + else
> > + rgn->flags |= ACPI_REGION_INSTALLED;
> > +
> > +out_lock:
> > + mutex_unlock(&acpi_mutex_region);
> > + if (installing) {
> > + if (res)
> > + pr_err("Failed to install region %d\n", space_id);
> > + else
> > + pr_info("Region %d installed\n", space_id);
> > + }
> > + return res;
> > +}
> > +
> > +int acpi_register_region(acpi_adr_space_type space_id,
> > + acpi_adr_space_handler handler,
> > + acpi_adr_space_setup setup, void *context) {
> > + int res;
> > + struct acpi_region *rgn;
> > +
> > + if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > + return -EINVAL;
> > +
> > + rgn = &acpi_regions[space_id];
> > + if (!acpi_region_managed(rgn))
> > + return -EINVAL;
> > +
> > + res = __acpi_install_region(rgn, space_id);
> > + if (res)
> > + return res;
> > +
> > + mutex_lock(&acpi_mutex_region);
> > + if (rgn->flags & ACPI_REGION_REGISTERED) {
> > + mutex_unlock(&acpi_mutex_region);
> > + return -EBUSY;
> > + }
> > +
> > + rgn->handler = handler;
> > + rgn->setup = setup;
> > + rgn->context = context;
> > + rgn->flags |= ACPI_REGION_REGISTERED;
> > + atomic_set(&rgn->refcnt, 1);
> > + mutex_unlock(&acpi_mutex_region);
> > +
> > + pr_info("Region %d registered\n", space_id);
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(acpi_register_region);
> > +
> > +void acpi_unregister_region(acpi_adr_space_type space_id) {
> > + struct acpi_region *rgn;
> > +
> > + if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > + return;
> > +
> > + rgn = &acpi_regions[space_id];
> > + if (!acpi_region_managed(rgn))
> > + return;
> > +
> > + mutex_lock(&acpi_mutex_region);
> > + if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> > + mutex_unlock(&acpi_mutex_region);
> > + return;
> > + }
> > + if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> > + mutex_unlock(&acpi_mutex_region);
> > + return;
>
> What about
>
> if ((rgn->flags & ACPI_REGION_UNREGISTERING)
> || !(rgn->flags & ACPI_REGION_REGISTERED)) {
> mutex_unlock(&acpi_mutex_region);
> return;
> }
>

OK.

> > + }
> > +
> > + rgn->flags |= ACPI_REGION_UNREGISTERING;
> > + rgn->handler = NULL;
> > + rgn->setup = NULL;
> > + rgn->context = NULL;
> > + mutex_unlock(&acpi_mutex_region);
> > +
> > + while (atomic_read(&rgn->refcnt) > 1)
> > + schedule_timeout_uninterruptible(usecs_to_jiffies(5));
>
> Wouldn't it be better to use a wait queue here?

Yes, I'll try.
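[Editorial note: Rafael's suggestion - replacing the while (atomic_read(&rgn->refcnt) > 1) schedule_timeout_uninterruptible() busy loop with a sleeping wait, i.e. wait_event() in the unregister path paired with a wake_up() in the put path - can be sketched in user space with a condition variable. All names are illustrative.]

```c
#include <pthread.h>

/* The unregister side sleeps until all invoking references (count > 1)
 * have been dropped; the put side wakes it.  In the kernel this would
 * be wait_event(rgn->wq, ...) and wake_up(&rgn->wq). */
struct region_refs {
	pthread_mutex_t lock;
	pthread_cond_t drained;
	int refcnt;
};

static void region_refs_init(struct region_refs *r)
{
	pthread_mutex_init(&r->lock, NULL);
	pthread_cond_init(&r->drained, NULL);
	r->refcnt = 1;	/* the registration itself holds one reference */
}

static void region_get(struct region_refs *r)
{
	pthread_mutex_lock(&r->lock);
	r->refcnt++;
	pthread_mutex_unlock(&r->lock);
}

static void region_put(struct region_refs *r)
{
	pthread_mutex_lock(&r->lock);
	if (--r->refcnt <= 1)
		pthread_cond_signal(&r->drained);	/* kernel: wake_up() */
	pthread_mutex_unlock(&r->lock);
}

static void region_wait_drained(struct region_refs *r)
{
	pthread_mutex_lock(&r->lock);
	while (r->refcnt > 1)	/* kernel: wait_event() condition */
		pthread_cond_wait(&r->drained, &r->lock);
	pthread_mutex_unlock(&r->lock);
}
```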

>
> > + atomic_dec(&rgn->refcnt);
> > +
> > + mutex_lock(&acpi_mutex_region);
> > + rgn->flags &= ~(ACPI_REGION_REGISTERED |
> ACPI_REGION_UNREGISTERING);
> > + mutex_unlock(&acpi_mutex_region);
> > +
> > + pr_info("Region %d unregistered\n", space_id); }
> > +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index
> > a2c2fbb..15fad0d 100644
> > --- a/include/acpi/acpi_bus.h
> > +++ b/include/acpi/acpi_bus.h
> > @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void
> > *bus) { return 0; }
> >
> > #endif /* CONFIG_ACPI */
> >
> > +int acpi_register_region(acpi_adr_space_type space_id,
> > + acpi_adr_space_handler handler,
> > + acpi_adr_space_setup setup, void *context); void
> > +acpi_unregister_region(acpi_adr_space_type space_id);
> > +
> > #endif /*__ACPI_BUS_H__*/
>
> Thanks,
> Rafael

Thanks
-Lv

>
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-26 00:48:17

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers

On 07/25/2013 07:16 PM, Zheng, Lv wrote:
>>
>> If I understand this correctly, the problem would be if:
>>
>> rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
>> IPMI_TIMEOUT);
>>
>> returns on a timeout, then checks msg_done and races with something setting
>> msg_done. If that is the case, you would need the smp_rmb() before checking
>> msg_done.
>>
>> However, the timeout above is unnecessary. You are using
>> ipmi_request_settime(), so you can set the timeout when the IPMI command
>> fails and returns a failure message. The driver guarantees a return message
>> for each request. Just remove the timeout from the completion, set the
>> timeout and retries in the ipmi request, and the completion should handle the
>> barrier issues.
It's just difficult for me to determine the retry count and timeout value; maybe retry=0 and timeout=IPMI_TIMEOUT are OK.
The timeout-on-completion code is already there, so I think the quick fix should not introduce this new logic.
I'll add a new patch to apply your comment.

Since it is a local BMC, I doubt a retry is required. That is probably
fine. Or you could set retry=1 and timeout=IPMI_TIMEOUT/2 if you wanted
to be more sure, but I doubt it would make a difference. The only time
you really need to worry about retries is if you are resetting the BMC
or it is being overloaded.

>
>> Plus, from a quick glance at the code, it doesn't look like it will properly handle a
>> situation where the timeout occurs and is handled then the response comes in
>> later.
PATCH 07 fixed this issue.
Here we just need the smp_rmb(), or to hold tx_msg_lock() around acpi_format_ipmi_response().

If you apply the fix like I suggest, then the race goes away. If
there's no timeout and it just waits for the completion, things get a
lot simpler.

>
> Thanks for commenting.

No problem, thanks for working on this.

-corey

2013-07-26 01:18:40

by Zheng, Lv

[permalink] [raw]
Subject: RE: [PATCH 04/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user

> From: Rafael J. Wysocki [mailto:[email protected]]
> Sent: Friday, July 26, 2013 5:59 AM
>
> On Tuesday, July 23, 2013 04:09:26 PM Lv Zheng wrote:
> > This patch uses reference counting to fix the race caused by the
> > unprotected ACPI IPMI user.
> >
> > As the acpi_ipmi_device->user_interface check in
> > acpi_ipmi_space_handler() can happen before setting user_interface to
> > NULL and codes after the check in acpi_ipmi_space_handler() can happen
> > after user_interface becoming NULL, then the on-going
> > acpi_ipmi_space_handler() still can pass an invalid
> > acpi_ipmi_device->user_interface to ipmi_request_settime(). Such race
> > condition is not allowed by the IPMI layer's API design as crash will happen in
> ipmi_request_settime().
> > In IPMI layer, smi_gone()/new_smi() callbacks are protected by
> > smi_watchers_mutex, thus their invocations are serialized. But as a
> > new smi can re-use the freed intf_num, it requires that the callback
> > implementation must not use intf_num as an identification mean or it
> > must ensure all references to the previous smi are all dropped before
> > exiting
> > smi_gone() callback. In case of acpi_ipmi module, this means
> > ipmi_flush_tx_msg() must ensure all on-going IPMI transfers are
> > completed before exiting ipmi_flush_tx_msg().
> >
> > This patch follows ipmi_devintf.c design:
> > 1. Invoking ipmi_destroy_user() after the reference count of
> > acpi_ipmi_device dropping to 0, this matches IPMI layer's API calling
> > rule on ipmi_destroy_user() and ipmi_request_settime().
> > 2. References of acpi_ipmi_device dropping to 1 means tx_msg related to
> > this acpi_ipmi_device are all freed, this can be used to implement the
> > new flushing mechanism. Note complete() must be retried so that the
> > on-going tx_msg won't block flushing at the point to add tx_msg into
> > tx_msg_list where reference of acpi_ipmi_device is held. This matches
> > the IPMI layer's callback rule on smi_gone()/new_smi() serialization.
> > 3. ipmi_flush_tx_msg() is performed after deleting acpi_ipmi_device from
> > the list so that no new tx_msg can be created after entering flushing
> > process.
> > 4. The flushing of tx_msg is also moved out of ipmi_lock in this patch.
> >
> > The forthcoming IPMI operation region handler installation changes
> > also requires acpi_ipmi_device be handled in the reference counting style.
> >
> > Authorship is also updated due to this design change.
> >
> > Signed-off-by: Lv Zheng <[email protected]>
> > Cc: Zhao Yakui <[email protected]>
> > Reviewed-by: Huang Ying <[email protected]>
> > ---
> > drivers/acpi/acpi_ipmi.c | 249
> > +++++++++++++++++++++++++++-------------------
> > 1 file changed, 149 insertions(+), 100 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > 527ee43..cbf25e0 100644
> > --- a/drivers/acpi/acpi_ipmi.c
> > +++ b/drivers/acpi/acpi_ipmi.c
> > @@ -1,8 +1,9 @@
> > /*
> > * acpi_ipmi.c - ACPI IPMI opregion
> > *
> > - * Copyright (C) 2010 Intel Corporation
> > - * Copyright (C) 2010 Zhao Yakui <[email protected]>
> > + * Copyright (C) 2010, 2013 Intel Corporation
> > + * Author: Zhao Yakui <[email protected]>
> > + * Lv Zheng <[email protected]>
> > *
> > *
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~
> > *
> > @@ -67,6 +68,7 @@ struct acpi_ipmi_device {
> > long curr_msgid;
> > unsigned long flags;
> > struct ipmi_smi_info smi_data;
> > + atomic_t refcnt;
>
> Can you use a kref instead?

Please see my concerns in another email.

>
> > };
> >
> > struct ipmi_driver_data {
> > @@ -107,8 +109,8 @@ struct acpi_ipmi_buffer { static void
> > ipmi_register_bmc(int iface, struct device *dev); static void
> > ipmi_bmc_gone(int iface); static void ipmi_msg_handler(struct
> > ipmi_recv_msg *msg, void *user_msg_data); -static void
> > acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device); -static
> > void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device);
> > +static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi);
> > +static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi);
> >
> > static struct ipmi_driver_data driver_data = {
> > .ipmi_devices = LIST_HEAD_INIT(driver_data.ipmi_devices),
> > @@ -122,6 +124,80 @@ static struct ipmi_driver_data driver_data = {
> > },
> > };
> >
> > +static struct acpi_ipmi_device *
> > +ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle
> > +handle) {
> > + struct acpi_ipmi_device *ipmi_device;
> > + int err;
> > + ipmi_user_t user;
> > +
> > + ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
> > + if (!ipmi_device)
> > + return NULL;
> > +
> > + atomic_set(&ipmi_device->refcnt, 1);
> > + INIT_LIST_HEAD(&ipmi_device->head);
> > + INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
> > + spin_lock_init(&ipmi_device->tx_msg_lock);
> > +
> > + ipmi_device->handle = handle;
> > + ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> > + memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct
> ipmi_smi_info));
> > + ipmi_device->ipmi_ifnum = iface;
> > +
> > + err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> > + ipmi_device, &user);
> > + if (err) {
> > + put_device(smi_data->dev);
> > + kfree(ipmi_device);
> > + return NULL;
> > + }
> > + ipmi_device->user_interface = user;
> > + ipmi_install_space_handler(ipmi_device);
> > +
> > + return ipmi_device;
> > +}
> > +
> > +static struct acpi_ipmi_device *
> > +acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device) {
> > + if (ipmi_device)
> > + atomic_inc(&ipmi_device->refcnt);
> > + return ipmi_device;
> > +}
> > +
> > +static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device) {
> > + ipmi_remove_space_handler(ipmi_device);
> > + ipmi_destroy_user(ipmi_device->user_interface);
> > + put_device(ipmi_device->smi_data.dev);
> > + kfree(ipmi_device);
> > +}
> > +
> > +static void acpi_ipmi_dev_put(struct acpi_ipmi_device *ipmi_device) {
> > + if (ipmi_device && atomic_dec_and_test(&ipmi_device->refcnt))
> > + ipmi_dev_release(ipmi_device);
> > +}
> > +
> > +static struct acpi_ipmi_device *acpi_ipmi_get_targeted_smi(int iface)
> > +{
> > + int dev_found = 0;
> > + struct acpi_ipmi_device *ipmi_device;
> > +
>
> Why don't you do
>
> struct acpi_ipmi_device *ipmi_device, *ret = NULL;
>
> and then ->
>
> > + mutex_lock(&driver_data.ipmi_lock);
> > + list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
> > + if (ipmi_device->ipmi_ifnum == iface) {
>
> -> ret = ipmi_device; ->
>
> > + dev_found = 1;
> > + acpi_ipmi_dev_get(ipmi_device);
> > + break;
> > + }
> > + }
> > + mutex_unlock(&driver_data.ipmi_lock);
> > +
> > + return dev_found ? ipmi_device : NULL;
>
> -> return ret;

OK.

>
> > +}
> > +
> > static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct
> > acpi_ipmi_device *ipmi) {
> > struct acpi_ipmi_msg *ipmi_msg;
> > @@ -228,25 +304,24 @@ static void acpi_format_ipmi_response(struct
> > acpi_ipmi_msg *msg, static void ipmi_flush_tx_msg(struct
> > acpi_ipmi_device *ipmi) {
> > struct acpi_ipmi_msg *tx_msg, *temp;
> > - int count = HZ / 10;
> > - struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> > unsigned long flags;
> >
> > - spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > - list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> > - /* wake up the sleep thread on the Tx msg */
> > - complete(&tx_msg->tx_complete);
> > - }
> > - spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > -
> > - /* wait for about 100ms to flush the tx message list */
> > - while (count--) {
> > - if (list_empty(&ipmi->tx_msg_list))
> > - break;
> > - schedule_timeout(1);
> > + /*
> > + * NOTE: Synchronous Flushing
> > + * Wait until refnct dropping to 1 - no other users unless this
> > + * context. This function should always be called before
> > + * acpi_ipmi_device destruction.
> > + */
> > + while (atomic_read(&ipmi->refcnt) > 1) {
>
> Isn't this racy? What if we see that the refcount is 1 and break the loop, but
> someone else bumps up the refcount at the same time?

No, it's not racy.
The flushing code here is invoked after the acpi_ipmi_device has disappeared from the object managers.
Please look at ipmi_bmc_gone() and acpi_ipmi_exit(): ipmi_flush_tx_msg() is only called after a "list_del()".
No new transfers can be created in acpi_ipmi_space_handler(), as acpi_ipmi_get_targeted_smi() will return NULL after the "list_del()".

So there is no chance for the refcount to reach 1 and then climb back up, because it can only increase from 1 to >1 while the device is still registered in an object manager.
The trick is that ipmi_bmc_gone() and acpi_ipmi_exit() drop all of the object managers' references and keep only the "call chain" reference (hence the count of 1).
In other words, this patch converts the object reference count into a "call chain" reference count in ipmi_bmc_gone() and acpi_ipmi_exit().
The waiting code can then wait for the reference count to drop to 1, which indicates that all on-going transfer references have also been dropped.

>
> > + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > + list_for_each_entry_safe(tx_msg, temp,
> > + &ipmi->tx_msg_list, head) {
> > + /* wake up the sleep thread on the Tx msg */
> > + complete(&tx_msg->tx_complete);
> > + }
> > + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > + schedule_timeout_uninterruptible(msecs_to_jiffies(1));
> > }
> > - if (!list_empty(&ipmi->tx_msg_list))
> > - dev_warn(&pnp_dev->dev, "tx msg list is not NULL\n");
> > }
> >
> > static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void
> > *user_msg_data) @@ -304,22 +379,26 @@ static void
> > ipmi_register_bmc(int iface, struct device *dev) {
> > struct acpi_ipmi_device *ipmi_device, *temp;
> > struct pnp_dev *pnp_dev;
> > - ipmi_user_t user;
> > int err;
> > struct ipmi_smi_info smi_data;
> > acpi_handle handle;
> >
> > err = ipmi_get_smi_info(iface, &smi_data);
> > -
> > if (err)
> > return;
> >
> > - if (smi_data.addr_src != SI_ACPI) {
> > - put_device(smi_data.dev);
> > - return;
> > - }
> > -
> > + if (smi_data.addr_src != SI_ACPI)
> > + goto err_ref;
> > handle = smi_data.addr_info.acpi_info.acpi_handle;
> > + if (!handle)
> > + goto err_ref;
> > + pnp_dev = to_pnp_dev(smi_data.dev);
> > +
> > + ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> > + if (!ipmi_device) {
> > + dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> > + goto err_ref;
> > + }
> >
> > mutex_lock(&driver_data.ipmi_lock);
> > list_for_each_entry(temp, &driver_data.ipmi_devices, head) { @@
> > -328,54 +407,42 @@ static void ipmi_register_bmc(int iface, struct device
> *dev)
> > * to the device list, don't add it again.
> > */
> > if (temp->handle == handle)
> > - goto out;
> > + goto err_lock;
> > }
> >
> > - ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
> > -
> > - if (!ipmi_device)
> > - goto out;
> > -
> > - pnp_dev = to_pnp_dev(smi_data.dev);
> > - ipmi_device->handle = handle;
> > - ipmi_device->pnp_dev = pnp_dev;
> > -
> > - err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> > - ipmi_device, &user);
> > - if (err) {
> > - dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> > - kfree(ipmi_device);
> > - goto out;
> > - }
> > - acpi_add_ipmi_device(ipmi_device);
> > - ipmi_device->user_interface = user;
> > - ipmi_device->ipmi_ifnum = iface;
> > + list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
> > mutex_unlock(&driver_data.ipmi_lock);
> > - memcpy(&ipmi_device->smi_data, &smi_data, sizeof(struct
> ipmi_smi_info));
> > + put_device(smi_data.dev);
> > return;
> >
> > -out:
> > +err_lock:
> > mutex_unlock(&driver_data.ipmi_lock);
> > + ipmi_dev_release(ipmi_device);
> > +err_ref:
> > put_device(smi_data.dev);
> > return;
> > }
> >
> > static void ipmi_bmc_gone(int iface)
> > {
> > - struct acpi_ipmi_device *ipmi_device, *temp;
> > + int dev_found = 0;
> > + struct acpi_ipmi_device *ipmi_device;
> >
> > mutex_lock(&driver_data.ipmi_lock);
> > - list_for_each_entry_safe(ipmi_device, temp,
> > - &driver_data.ipmi_devices, head) {
> > - if (ipmi_device->ipmi_ifnum != iface)
> > - continue;
> > -
> > - acpi_remove_ipmi_device(ipmi_device);
> > - put_device(ipmi_device->smi_data.dev);
> > - kfree(ipmi_device);
> > - break;
> > + list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
> > + if (ipmi_device->ipmi_ifnum == iface) {
> > + dev_found = 1;
>
> You can do the list_del() here, because you're under the mutex, so others won't
> see the list in an inconsistens state and you're about to break anyway.

I'm trying to improve the code's maintainability (and hence its internal quality) here for the reviewers.
If we put a list_del() and a break inside the list_for_each_entry(), it is quite likely that a future patch deleting the "break" would not show the enclosing list_for_each_entry() in its diff context, and reviewers could not detect such a bug.
The coding style I'm using here avoids that issue.
I was thinking maintainers would be happy with such code - it can prevent many small mistakes from happening.

Thanks for commenting.

Best regards
-Lv

>
> > + break;
> > + }
> > }
> > + if (dev_found)
> > + list_del(&ipmi_device->head);
> > mutex_unlock(&driver_data.ipmi_lock);
> > +
> > + if (dev_found) {
> > + ipmi_flush_tx_msg(ipmi_device);
> > + acpi_ipmi_dev_put(ipmi_device);
> > + }
> > }
> >
> > /*
> > ----------------------------------------------------------------------
> > ---- @@ -400,7 +467,8 @@ acpi_ipmi_space_handler(u32 function,
> > acpi_physical_address address,
> > void *handler_context, void *region_context) {
> > struct acpi_ipmi_msg *tx_msg;
> > - struct acpi_ipmi_device *ipmi_device = handler_context;
> > + int iface = (long)handler_context;
> > + struct acpi_ipmi_device *ipmi_device;
> > int err, rem_time;
> > acpi_status status;
> > unsigned long flags;
> > @@ -414,12 +482,15 @@ acpi_ipmi_space_handler(u32 function,
> acpi_physical_address address,
> > if ((function & ACPI_IO_MASK) == ACPI_READ)
> > return AE_TYPE;
> >
> > - if (!ipmi_device->user_interface)
> > + ipmi_device = acpi_ipmi_get_targeted_smi(iface);
> > + if (!ipmi_device)
> > return AE_NOT_EXIST;
> >
> > tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
> > - if (!tx_msg)
> > - return AE_NO_MEMORY;
> > + if (!tx_msg) {
> > + status = AE_NO_MEMORY;
> > + goto out_ref;
> > + }
> >
> > if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
> > status = AE_TYPE;
> > @@ -449,6 +520,8 @@ out_list:
> > spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > out_msg:
> > kfree(tx_msg);
> > +out_ref:
> > + acpi_ipmi_dev_put(ipmi_device);
> > return status;
> > }
> >
> > @@ -473,7 +546,7 @@ static int ipmi_install_space_handler(struct
> acpi_ipmi_device *ipmi)
> > status = acpi_install_address_space_handler(ipmi->handle,
> > ACPI_ADR_SPACE_IPMI,
> > &acpi_ipmi_space_handler,
> > - NULL, ipmi);
> > + NULL, (void *)((long)ipmi->ipmi_ifnum));
> > if (ACPI_FAILURE(status)) {
> > struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> > dev_warn(&pnp_dev->dev, "Can't register IPMI opregion space "
> > @@ -484,36 +557,6 @@ static int ipmi_install_space_handler(struct
> acpi_ipmi_device *ipmi)
> > return 0;
> > }
> >
> > -static void acpi_add_ipmi_device(struct acpi_ipmi_device
> > *ipmi_device) -{
> > -
> > - INIT_LIST_HEAD(&ipmi_device->head);
> > -
> > - spin_lock_init(&ipmi_device->tx_msg_lock);
> > - INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
> > - ipmi_install_space_handler(ipmi_device);
> > -
> > - list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
> > -}
> > -
> > -static void acpi_remove_ipmi_device(struct acpi_ipmi_device
> > *ipmi_device) -{
> > - /*
> > - * If the IPMI user interface is created, it should be
> > - * destroyed.
> > - */
> > - if (ipmi_device->user_interface) {
> > - ipmi_destroy_user(ipmi_device->user_interface);
> > - ipmi_device->user_interface = NULL;
> > - }
> > - /* flush the Tx_msg list */
> > - if (!list_empty(&ipmi_device->tx_msg_list))
> > - ipmi_flush_tx_msg(ipmi_device);
> > -
> > - list_del(&ipmi_device->head);
> > - ipmi_remove_space_handler(ipmi_device);
> > -}
> > -
> > static int __init acpi_ipmi_init(void) {
> > int result = 0;
> > @@ -530,7 +573,7 @@ static int __init acpi_ipmi_init(void)
> >
> > static void __exit acpi_ipmi_exit(void) {
> > - struct acpi_ipmi_device *ipmi_device, *temp;
> > + struct acpi_ipmi_device *ipmi_device;
> >
> > if (acpi_disabled)
> > return;
> > @@ -544,11 +587,17 @@ static void __exit acpi_ipmi_exit(void)
> > * handler and free it.
> > */
> > mutex_lock(&driver_data.ipmi_lock);
> > - list_for_each_entry_safe(ipmi_device, temp,
> > - &driver_data.ipmi_devices, head) {
> > - acpi_remove_ipmi_device(ipmi_device);
> > - put_device(ipmi_device->smi_data.dev);
> > - kfree(ipmi_device);
> > + while (!list_empty(&driver_data.ipmi_devices)) {
> > + ipmi_device = list_first_entry(&driver_data.ipmi_devices,
> > + struct acpi_ipmi_device,
> > + head);
> > + list_del(&ipmi_device->head);
> > + mutex_unlock(&driver_data.ipmi_lock);
> > +
> > + ipmi_flush_tx_msg(ipmi_device);
> > + acpi_ipmi_dev_put(ipmi_device);
> > +
> > + mutex_lock(&driver_data.ipmi_lock);
> > }
> > mutex_unlock(&driver_data.ipmi_lock);
> > }
> >
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-26 01:21:25

by Zheng, Lv

[permalink] [raw]
Subject: RE: [PATCH 07/13] ACPI/IPMI: Add reference counting for ACPI IPMI transfers

> From: [email protected]
> [mailto:[email protected]] On Behalf Of Rafael J. Wysocki
> Sent: Friday, July 26, 2013 6:23 AM
>
> On Tuesday, July 23, 2013 04:09:54 PM Lv Zheng wrote:
> > This patch adds reference counting for ACPI IPMI transfers to tune the
> > locking granularity of tx_msg_lock.
> >
> > The acpi_ipmi_msg handling is re-designed using referece counting.
> > 1. tx_msg is always unlinked before complete(), so that:
> > 1.1. it is safe to put complete() out side of tx_msg_lock;
> > 1.2. complete() can only happen once, thus smp_wmb() is not required.
> > 2. Increasing the reference of tx_msg before calling
> > ipmi_request_settime() and introducing tx_msg_lock protected
> > ipmi_cancel_tx_msg() so that a complete() can happen in parellel with
> > tx_msg unlinking in the failure cases.
> > 3. tx_msg holds the reference of acpi_ipmi_device so that it can be flushed
> > and freed in the contexts other than acpi_ipmi_space_handler().
> >
> > The lockdep_chains shows all acpi_ipmi locks are leaf locks after the
> > tuning:
> > 1. ipmi_lock is always leaf:
> > irq_context: 0
> > [ffffffff81a943f8] smi_watchers_mutex
> > [ffffffffa06eca60] driver_data.ipmi_lock
> > irq_context: 0
> > [ffffffff82767b40] &buffer->mutex
> > [ffffffffa00a6678] s_active#103
> > [ffffffffa06eca60] driver_data.ipmi_lock
> > 2. without this patch applied, lock used by complete() is held after
> > holding tx_msg_lock:
> > irq_context: 0
> > [ffffffff82767b40] &buffer->mutex
> > [ffffffffa00a6678] s_active#103
> > [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > irq_context: 1
> > [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > irq_context: 1
> > [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > [ffffffffa06eccf0] &x->wait#25
> > irq_context: 1
> > [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > [ffffffffa06eccf0] &x->wait#25
> > [ffffffff81e36620] &p->pi_lock
> > irq_context: 1
> > [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > [ffffffffa06eccf0] &x->wait#25
> > [ffffffff81e36620] &p->pi_lock
> > [ffffffff81e5d0a8] &rq->lock
> > 3. with this patch applied, tx_msg_lock is always leaf:
> > irq_context: 0
> > [ffffffff82767b40] &buffer->mutex
> > [ffffffffa00a66d8] s_active#107
> > [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
> > irq_context: 1
> > [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
> >
> > Signed-off-by: Lv Zheng <[email protected]>
> > Cc: Zhao Yakui <[email protected]>
> > Reviewed-by: Huang Ying <[email protected]>
> > ---
> > drivers/acpi/acpi_ipmi.c | 107
> +++++++++++++++++++++++++++++++++-------------
> > 1 file changed, 77 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > index 2a09156..0ee1ea6 100644
> > --- a/drivers/acpi/acpi_ipmi.c
> > +++ b/drivers/acpi/acpi_ipmi.c
> > @@ -105,6 +105,7 @@ struct acpi_ipmi_msg {
> > u8 data[ACPI_IPMI_MAX_MSG_LENGTH];
> > u8 rx_len;
> > struct acpi_ipmi_device *device;
> > + atomic_t refcnt;
>
> Again: kref, please?

Please see the concerns in another email.

>
> > };
> >
> > /* IPMI request/response buffer per ACPI 4.0, sec 5.5.2.4.3.2 */
> > @@ -195,22 +196,47 @@ static struct acpi_ipmi_device
> *acpi_ipmi_get_selected_smi(void)
> > return ipmi_device;
> > }
> >
> > -static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device
> *ipmi)
> > +static struct acpi_ipmi_msg *ipmi_msg_alloc(void)
> > {
> > + struct acpi_ipmi_device *ipmi;
> > struct acpi_ipmi_msg *ipmi_msg;
> > - struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> >
> > + ipmi = acpi_ipmi_get_selected_smi();
> > + if (!ipmi)
> > + return NULL;
> > ipmi_msg = kzalloc(sizeof(struct acpi_ipmi_msg), GFP_KERNEL);
> > - if (!ipmi_msg) {
> > - dev_warn(&pnp_dev->dev, "Can't allocate memory for ipmi_msg\n");
> > + if (!ipmi_msg) {
> > + acpi_ipmi_dev_put(ipmi);
> > return NULL;
> > }
> > + atomic_set(&ipmi_msg->refcnt, 1);
> > init_completion(&ipmi_msg->tx_complete);
> > INIT_LIST_HEAD(&ipmi_msg->head);
> > ipmi_msg->device = ipmi;
> > +
> > return ipmi_msg;
> > }
> >
> > +static struct acpi_ipmi_msg *
> > +acpi_ipmi_msg_get(struct acpi_ipmi_msg *tx_msg)
> > +{
> > + if (tx_msg)
> > + atomic_inc(&tx_msg->refcnt);
> > + return tx_msg;
> > +}
> > +
> > +static void ipmi_msg_release(struct acpi_ipmi_msg *tx_msg)
> > +{
> > + acpi_ipmi_dev_put(tx_msg->device);
> > + kfree(tx_msg);
> > +}
> > +
> > +static void acpi_ipmi_msg_put(struct acpi_ipmi_msg *tx_msg)
> > +{
> > + if (tx_msg && atomic_dec_and_test(&tx_msg->refcnt))
> > + ipmi_msg_release(tx_msg);
> > +}
> > +
> > #define IPMI_OP_RGN_NETFN(offset) ((offset >> 8) & 0xff)
> > #define IPMI_OP_RGN_CMD(offset) (offset & 0xff)
> > static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
> > @@ -300,7 +326,7 @@ static void acpi_format_ipmi_response(struct
> acpi_ipmi_msg *msg,
> >
> > static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> > {
> > - struct acpi_ipmi_msg *tx_msg, *temp;
> > + struct acpi_ipmi_msg *tx_msg;
> > unsigned long flags;
> >
> > /*
> > @@ -311,16 +337,46 @@ static void ipmi_flush_tx_msg(struct
> acpi_ipmi_device *ipmi)
> > */
> > while (atomic_read(&ipmi->refcnt) > 1) {
> > spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > - list_for_each_entry_safe(tx_msg, temp,
> > - &ipmi->tx_msg_list, head) {
> > + while (!list_empty(&ipmi->tx_msg_list)) {
> > + tx_msg = list_first_entry(&ipmi->tx_msg_list,
> > + struct acpi_ipmi_msg,
> > + head);
> > + list_del(&tx_msg->head);
> > + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > +
> > /* wake up the sleep thread on the Tx msg */
> > complete(&tx_msg->tx_complete);
> > + acpi_ipmi_msg_put(tx_msg);
> > + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > }
> > spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > +
> > schedule_timeout_uninterruptible(msecs_to_jiffies(1));
> > }
> > }
> >
> > +static void ipmi_cancel_tx_msg(struct acpi_ipmi_device *ipmi,
> > + struct acpi_ipmi_msg *msg)
> > +{
> > + struct acpi_ipmi_msg *tx_msg;
> > + int msg_found = 0;
>
> Use bool?

OK.
There are other int flags in the original code - do I need to clean them all up (dev_found included)?

>
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > + list_for_each_entry(tx_msg, &ipmi->tx_msg_list, head) {
> > + if (msg == tx_msg) {
> > + msg_found = 1;
> > + break;
> > + }
> > + }
> > + if (msg_found)
> > + list_del(&tx_msg->head);
>
> The list_del() can be done when you set msg_found.

Please see my concerns in another email.

Thanks and best regards
-Lv

>
> > + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > +
> > + if (msg_found)
> > + acpi_ipmi_msg_put(tx_msg);
> > +}
> > +
> > static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void
> *user_msg_data)
> > {
> > struct acpi_ipmi_device *ipmi_device = user_msg_data;
> > @@ -343,12 +399,15 @@ static void ipmi_msg_handler(struct
> ipmi_recv_msg *msg, void *user_msg_data)
> > break;
> > }
> > }
> > + if (msg_found)
> > + list_del(&tx_msg->head);
> > + spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >
> > if (!msg_found) {
> > dev_warn(&pnp_dev->dev,
> > "Unexpected response (msg id %ld) is returned.\n",
> > msg->msgid);
> > - goto out_lock;
> > + goto out_msg;
> > }
> >
> > /* copy the response data to Rx_data buffer */
> > @@ -360,14 +419,11 @@ static void ipmi_msg_handler(struct
> ipmi_recv_msg *msg, void *user_msg_data)
> > }
> > tx_msg->rx_len = msg->msg.data_len;
> > memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> > - /* tx_msg content must be valid before setting msg_done flag */
> > - smp_wmb();
> > tx_msg->msg_done = 1;
> >
> > out_comp:
> > complete(&tx_msg->tx_complete);
> > -out_lock:
> > - spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > + acpi_ipmi_msg_put(tx_msg);
> > out_msg:
> > ipmi_free_recv_msg(msg);
> > }
> > @@ -493,21 +549,17 @@ acpi_ipmi_space_handler(u32 function,
> acpi_physical_address address,
> > if ((function & ACPI_IO_MASK) == ACPI_READ)
> > return AE_TYPE;
> >
> > - ipmi_device = acpi_ipmi_get_selected_smi();
> > - if (!ipmi_device)
> > + tx_msg = ipmi_msg_alloc();
> > + if (!tx_msg)
> > return AE_NOT_EXIST;
> > -
> > - tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
> > - if (!tx_msg) {
> > - status = AE_NO_MEMORY;
> > - goto out_ref;
> > - }
> > + ipmi_device = tx_msg->device;
> >
> > if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
> > - status = AE_TYPE;
> > - goto out_msg;
> > + ipmi_msg_release(tx_msg);
> > + return AE_TYPE;
> > }
> >
> > + acpi_ipmi_msg_get(tx_msg);
> > spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
> > list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
> > spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > @@ -518,21 +570,16 @@ acpi_ipmi_space_handler(u32 function,
> acpi_physical_address address,
> > NULL, 0, 0, 0);
> > if (err) {
> > status = AE_ERROR;
> > - goto out_list;
> > + goto out_msg;
> > }
> > rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
> > IPMI_TIMEOUT);
> > acpi_format_ipmi_response(tx_msg, value, rem_time);
> > status = AE_OK;
> >
> > -out_list:
> > - spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
> > - list_del(&tx_msg->head);
> > - spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > out_msg:
> > - kfree(tx_msg);
> > -out_ref:
> > - acpi_ipmi_dev_put(ipmi_device);
> > + ipmi_cancel_tx_msg(ipmi_device, tx_msg);
> > + acpi_ipmi_msg_put(tx_msg);
> > return status;
> > }
> >
> >
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-07-26 01:25:21

by Zheng, Lv

[permalink] [raw]
Subject: RE: [PATCH 08/13] ACPI/IPMI: Cleanup several acpi_ipmi_device members

> From: Rafael J. Wysocki [mailto:[email protected]]
> Sent: Friday, July 26, 2013 6:26 AM
>
> On Tuesday, July 23, 2013 04:10:06 PM Lv Zheng wrote:
> > This is a trivial patch:
> > 1. Deletes a member of the acpi_ipmi_device - smi_data which is not
> > actually used.
> > 2. Updates a member of the acpi_ipmi_device - pnp_dev which is only used
> > by dev_warn() invocations, so changes it to struct device.
> >
> > Signed-off-by: Lv Zheng <[email protected]>
> > Reviewed-by: Huang Ying <[email protected]>
> > ---
> > drivers/acpi/acpi_ipmi.c | 30 ++++++++++++++----------------
> > 1 file changed, 14 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > 0ee1ea6..7f93ffd 100644
> > --- a/drivers/acpi/acpi_ipmi.c
> > +++ b/drivers/acpi/acpi_ipmi.c
> > @@ -63,11 +63,10 @@ struct acpi_ipmi_device {
> > struct list_head tx_msg_list;
> > spinlock_t tx_msg_lock;
> > acpi_handle handle;
> > - struct pnp_dev *pnp_dev;
> > + struct device *dev;
> > ipmi_user_t user_interface;
> > int ipmi_ifnum; /* IPMI interface number */
> > long curr_msgid;
> > - struct ipmi_smi_info smi_data;
> > atomic_t refcnt;
> > };
> >
> > @@ -132,7 +131,7 @@ static struct ipmi_driver_data driver_data = { };
> >
> > static struct acpi_ipmi_device *
> > -ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle
> > handle)
> > +ipmi_dev_alloc(int iface, struct device *pdev, acpi_handle handle)
>
> Why is the second arg called pdev?

OK, I will change it to dev.

>
> > {
> > struct acpi_ipmi_device *ipmi_device;
> > int err;
> > @@ -148,14 +147,13 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info
> *smi_data, acpi_handle handle)
> > spin_lock_init(&ipmi_device->tx_msg_lock);
> >
> > ipmi_device->handle = handle;
> > - ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> > - memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct
> ipmi_smi_info));
> > + ipmi_device->dev = get_device(pdev);
> > ipmi_device->ipmi_ifnum = iface;
> >
> > err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> > ipmi_device, &user);
> > if (err) {
> > - put_device(smi_data->dev);
> > + put_device(pdev);
> > kfree(ipmi_device);
> > return NULL;
> > }
> > @@ -175,7 +173,7 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device
> > *ipmi_device) static void ipmi_dev_release(struct acpi_ipmi_device
> > *ipmi_device) {
> > ipmi_destroy_user(ipmi_device->user_interface);
> > - put_device(ipmi_device->smi_data.dev);
> > + put_device(ipmi_device->dev);
> > kfree(ipmi_device);
> > }
> >
> > @@ -263,7 +261,7 @@ static int acpi_format_ipmi_request(struct
> acpi_ipmi_msg *tx_msg,
> > buffer = (struct acpi_ipmi_buffer *)value;
> > /* copy the tx message data */
> > if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
> > - dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
> > + dev_WARN_ONCE(tx_msg->device->dev, true,
> > "Unexpected request (msg len %d).\n",
> > buffer->length);
> > return -EINVAL;
> > @@ -382,11 +380,11 @@ static void ipmi_msg_handler(struct
> ipmi_recv_msg *msg, void *user_msg_data)
> > struct acpi_ipmi_device *ipmi_device = user_msg_data;
> > int msg_found = 0;
> > struct acpi_ipmi_msg *tx_msg;
> > - struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
> > + struct device *dev = ipmi_device->dev;
> > unsigned long flags;
> >
> > if (msg->user != ipmi_device->user_interface) {
> > - dev_warn(&pnp_dev->dev,
> > + dev_warn(dev,
> > "Unexpected response is returned. returned user %p, expected
> user %p\n",
> > msg->user, ipmi_device->user_interface);
> > goto out_msg;
> > @@ -404,7 +402,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg
> *msg, void *user_msg_data)
> > spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >
> > if (!msg_found) {
> > - dev_warn(&pnp_dev->dev,
> > + dev_warn(dev,
> > "Unexpected response (msg id %ld) is returned.\n",
> > msg->msgid);
> > goto out_msg;
> > @@ -412,7 +410,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg
> > *msg, void *user_msg_data)
> >
> > /* copy the response data to Rx_data buffer */
> > if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
> > - dev_WARN_ONCE(&pnp_dev->dev, true,
> > + dev_WARN_ONCE(dev, true,
> > "Unexpected response (msg len %d).\n",
> > msg->msg.data_len);
> > goto out_comp;
> > @@ -431,7 +429,7 @@ out_msg:
> > static void ipmi_register_bmc(int iface, struct device *dev) {
> > struct acpi_ipmi_device *ipmi_device, *temp;
> > - struct pnp_dev *pnp_dev;
> > + struct device *pdev;
>
> And here?

The dev here is the parameter of ipmi_register_bmc(), so it is not possible to also name the device taken from the struct ipmi_smi_info "dev" in this quick fix.

Thanks
-Lv

>
> > int err;
> > struct ipmi_smi_info smi_data;
> > acpi_handle handle;
> > @@ -445,11 +443,11 @@ static void ipmi_register_bmc(int iface, struct
> device *dev)
> > handle = smi_data.addr_info.acpi_info.acpi_handle;
> > if (!handle)
> > goto err_ref;
> > - pnp_dev = to_pnp_dev(smi_data.dev);
> > + pdev = smi_data.dev;
> >
> > - ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> > + ipmi_device = ipmi_dev_alloc(iface, pdev, handle);
> > if (!ipmi_device) {
> > - dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> > + dev_warn(pdev, "Can't create IPMI user interface\n");
> > goto err_ref;
> > }
> >
> >
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-26 01:30:22

by Zheng, Lv

Subject: RE: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers

> From: Corey Minyard [mailto:[email protected]]
> Sent: Friday, July 26, 2013 8:48 AM
>
> On 07/25/2013 07:16 PM, Zheng, Lv wrote:
> >>
> >> If I understand this correctly, the problem would be if:
> >>
> >> rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
> >> IPMI_TIMEOUT);
> >>
> >> returns on a timeout, then checks msg_done and races with something
> >> setting msg_done. If that is the case, you would need the smp_rmb()
> >> before checking msg_done.
> >>
> >> However, the timeout above is unnecessary. You are using
> >> ipmi_request_settime(), so you can set the timeout when the IPMI
> >> command fails and returns a failure message. The driver guarantees a
> >> return message for each request. Just remove the timeout from the
> >> completion, set the timeout and retries in the ipmi request, and the
> >> completion should handle the barrier issues.
> > It's just difficult for me to determine retry count and timeout value, maybe
> retry=0, timeout=IPMI_TIMEOUT is OK.
> > The code of the timeout completion is already there, I think the quick fix code
> should not introduce this logic.
> > I'll add a new patch to apply your comment.
>
> Since it is a local BMC, I doubt a retry is required. That is probably fine. Or
> you could set retry=1 and timeout=IPMI_TIMEOUT/2 if you wanted to be more
> sure, but I doubt it would make a difference. The only time you really need to
> worry about retries is if you are resetting the BMC or it is being overloaded.

I think that for the ACPI IPMI operation region, retries can be implemented in the ASL code by the BIOS.
I'll check if retry=0 is correct.
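To make the suggested scheme concrete, here is a minimal userspace model (a pthread condition variable standing in for the kernel completion; all names are hypothetical, this is not the kernel API). With the timeout pushed into the request itself and no timeout on the wait, the requester can never observe a "timed out but the response raced in anyway" state, so no extra smp_rmb() is needed:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel tx_msg + completion pair. */
struct fake_tx_msg {
	pthread_mutex_t lock;
	pthread_cond_t  done_cv;
	bool            done;       /* set exactly once, by the responder */
	int             rsp_code;   /* response or failure code */
};

static void fake_tx_msg_init(struct fake_tx_msg *m)
{
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->done_cv, NULL);
	m->done = false;
	m->rsp_code = -1;
}

/* Responder side: the IPMI core guarantees exactly one response per
 * request, even on failure, so this is always called eventually. */
static void fake_complete(struct fake_tx_msg *m, int code)
{
	pthread_mutex_lock(&m->lock);
	m->rsp_code = code;
	m->done = true;
	pthread_cond_signal(&m->done_cv);
	pthread_mutex_unlock(&m->lock);
}

/* Requester side: an unbounded wait, relying on the guaranteed
 * response; the condvar/completion provides all the ordering. */
static int fake_wait_response(struct fake_tx_msg *m)
{
	int code;

	pthread_mutex_lock(&m->lock);
	while (!m->done)
		pthread_cond_wait(&m->done_cv, &m->lock);
	code = m->rsp_code;
	pthread_mutex_unlock(&m->lock);
	return code;
}
```

In the kernel the same shape would be wait_for_completion() with the retry count and timeout supplied to ipmi_request_settime(), as Corey describes.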

>
> >
> >> Plus, from a quick glance at the code, it doesn't look like it will
> >> properly handle a situation where the timeout occurs and is handled
> >> then the response comes in later.
> > PATCH 07 fixed this issue.
> > Here we just need the smp_rmb() or holding tx_msg_lock() around the
> acpi_format_ipmi_response().
>
> If you apply the fix like I suggest, then the race goes away. If there's no
> timeout and it just waits for the completion, things get a lot simpler.

Exactly. I'll try to apply this in this patch; PATCH 07 will then also need to be re-worked.

Thanks and best regards
-Lv


> >
> > Thanks for commenting.
>
> No problem, thanks for working on this.
>
> -corey

2013-07-26 01:54:07

by Zheng, Lv

Subject: RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers

> From: Rafael J. Wysocki [mailto:[email protected]]
> Sent: Friday, July 26, 2013 5:29 AM
>
> On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > This patch adds reference counting for ACPI operation region handlers to fix
> > > races caused by the ACPICA address space callback invocations.
> >
> > ACPICA address space callback invocation is not suitable for Linux
> > CONFIG_MODULE=y execution environment.
>
> Actually, can you please explain to me what *exactly* the problem is?

OK. I'll add race explanations in the next revision.

The problem is that there is no "lock" held inside ACPICA while invoking operation region handlers.
Thus races happen between acpi_remove/install_address_space_handler() and the handler/setup callbacks.

This is correct per the ACPI specification.
If interpreter locks were held while invoking operation region handlers, the timeouts implemented inside the operation region handlers would cause all locking facilities (Acquire, Sleep, ...) to time out.
Please refer to ACPI specification "5.5.2 Control Method Execution":
Interpretation of a Control Method is not preemptive, but it can block. When a control method does block, OSPM can initiate or continue the execution of a different control method. A control method can only assume that access to global objects is exclusive for any period the control method does not block.

So it is quite likely that ACPI IO transfers block on locks inside the operation region callback implementations.
Using a locking facility to protect the callback invocations would risk deadlocks.

Thanks
-Lv

> Rafael


2013-07-26 08:09:39

by Zheng, Lv

Subject: RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers

> From: [email protected]
> [mailto:[email protected]] On Behalf Of Zheng, Lv
> Sent: Friday, July 26, 2013 8:48 AM
>
>
>
> > From: Rafael J. Wysocki [mailto:[email protected]]
> > Sent: Friday, July 26, 2013 4:27 AM
> >
> > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > This patch adds reference counting for ACPI operation region handlers
> > > to fix races caused by the ACPICA address space callback invocations.
> > >
> > > ACPICA address space callback invocation is not suitable for Linux
> > > CONFIG_MODULE=y execution environment. This patch tries to protect
> > > the address space callbacks by invoking them under a module safe
> > environment.
> > > The IPMI address space handler is also upgraded in this patch.
> > > The acpi_unregister_region() is designed to meet the following
> > > requirements:
> > > 1. It acts as a barrier for operation region callbacks - no callback will
> > > happen after acpi_unregister_region().
> > > 2. acpi_unregister_region() is safe to be called in module->exit()
> > > functions.
> > > Using reference counting rather than module referencing allows such
> > > benefits to be achieved even when acpi_unregister_region() is called
> > > in the environments other than module->exit().
> > > The header file of include/acpi/acpi_bus.h should contain the
> > > declarations that have references to some ACPICA defined types.
> > >
> > > Signed-off-by: Lv Zheng <[email protected]>
> > > Reviewed-by: Huang Ying <[email protected]>
> > > ---
> > > drivers/acpi/acpi_ipmi.c | 16 ++--
> > > drivers/acpi/osl.c | 224
> > ++++++++++++++++++++++++++++++++++++++++++++++
> > > include/acpi/acpi_bus.h | 5 ++
> > > 3 files changed, 235 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > > index
> > > 5f8f495..2a09156 100644
> > > --- a/drivers/acpi/acpi_ipmi.c
> > > +++ b/drivers/acpi/acpi_ipmi.c
> > > @@ -539,20 +539,18 @@ out_ref:
> > > static int __init acpi_ipmi_init(void) {
> > > int result = 0;
> > > - acpi_status status;
> > >
> > > if (acpi_disabled)
> > > return result;
> > >
> > > mutex_init(&driver_data.ipmi_lock);
> > >
> > > - status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > > - ACPI_ADR_SPACE_IPMI,
> > > - &acpi_ipmi_space_handler,
> > > - NULL, NULL);
> > > - if (ACPI_FAILURE(status)) {
> > > + result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> > > + &acpi_ipmi_space_handler,
> > > + NULL, NULL);
> > > + if (result) {
> > > pr_warn("Can't register IPMI opregion space handle\n");
> > > - return -EINVAL;
> > > + return result;
> > > }
> > >
> > > result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> > > @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
> > > }
> > > mutex_unlock(&driver_data.ipmi_lock);
> > >
> > > - acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> > > - ACPI_ADR_SPACE_IPMI,
> > > - &acpi_ipmi_space_handler);
> > > + acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
> > > }
> > >
> > > module_init(acpi_ipmi_init);
> > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index
> > > 6ab2c35..8398e51 100644
> > > --- a/drivers/acpi/osl.c
> > > +++ b/drivers/acpi/osl.c
> > > @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;
> > > static struct workqueue_struct *kacpi_notify_wq; static struct
> > > workqueue_struct *kacpi_hotplug_wq;
> > >
> > > +struct acpi_region {
> > > + unsigned long flags;
> > > +#define ACPI_REGION_DEFAULT 0x01
> > > +#define ACPI_REGION_INSTALLED 0x02
> > > +#define ACPI_REGION_REGISTERED 0x04
> > > +#define ACPI_REGION_UNREGISTERING 0x08
> > > +#define ACPI_REGION_INSTALLING 0x10
> >
> > What about (1UL << 1), (1UL << 2) etc.?
> >
> > Also please remove the #defines out of the struct definition.
>
> OK.
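For reference, the review comment above would turn the patch's hex flags into shift expressions defined outside the struct; a possible sketch, with the bit values unchanged from the patch:

```c
/* Same bit values as the patch (0x01 ... 0x80), expressed as shifts
 * and moved out of the struct definition, as requested. */
#define ACPI_REGION_DEFAULT        (1UL << 0)
#define ACPI_REGION_INSTALLED      (1UL << 1)
#define ACPI_REGION_REGISTERED     (1UL << 2)
#define ACPI_REGION_UNREGISTERING  (1UL << 3)
#define ACPI_REGION_INSTALLING     (1UL << 4)
/* NOTE: only used while not all region handlers are upgraded. */
#define ACPI_REGION_MANAGED        (1UL << 7)
```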
>
> >
> > > + /*
> > > + * NOTE: Upgrading All Region Handlers
> > > + * This flag is only used during the period where not all of the
> > > + * region handlers are upgraded to the new interfaces.
> > > + */
> > > +#define ACPI_REGION_MANAGED 0x80
> > > + acpi_adr_space_handler handler;
> > > + acpi_adr_space_setup setup;
> > > + void *context;
> > > + /* Invoking references */
> > > + atomic_t refcnt;
> >
> > Actually, why don't you use krefs?
>
> If you take a look at other pieces of my code, you'll find there are two reasons:
>
> 1. I'm using while (atomic_read() > 1) to implement the objects' flushing and
> there is no kref API to do so.
> I just think it is not suitable for me to introduce such an API into kref.h and
> start another argument around kref designs in this bug fix patch. :-)
> I'll start a discussion about kref design using another thread.
> 2. I'm using ipmi_dev|msg_release() as a pair of ipmi_dev|msg_alloc(), it's kind
> of atomic_t coding style.
> If atomic_t is changed to struct kref, I will need to implement two API,
> __ipmi_dev_release() to take a struct kref as parameter and call
> ipmi_dev_release inside it.
> By not using kref, I needn't write codes to implement such API.
>
> >
> > > +};
> > > +
> > > +static struct acpi_region
> acpi_regions[ACPI_NUM_PREDEFINED_REGIONS]
> > = {
> > > + [ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> > > + .flags = ACPI_REGION_DEFAULT,
> > > + },
> > > + [ACPI_ADR_SPACE_SYSTEM_IO] = {
> > > + .flags = ACPI_REGION_DEFAULT,
> > > + },
> > > + [ACPI_ADR_SPACE_PCI_CONFIG] = {
> > > + .flags = ACPI_REGION_DEFAULT,
> > > + },
> > > + [ACPI_ADR_SPACE_IPMI] = {
> > > + .flags = ACPI_REGION_MANAGED,
> > > + },
> > > +};
> > > +static DEFINE_MUTEX(acpi_mutex_region);
> > > +
> > > /*
> > > * This list of permanent mappings is for memory that may be
> > > accessed
> > from
> > > * interrupt context, where we can't do the ioremap().
> > > @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle,
> > u32 type, void *context,
> > > kfree(hp_work);
> > > }
> > > EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> > > +
> > > +static bool acpi_region_managed(struct acpi_region *rgn) {
> > > + /*
> > > + * NOTE: Default and Managed
> > > + * We only need to avoid region management on the regions
> managed
> > > + * by ACPICA (ACPI_REGION_DEFAULT). Currently, we need
> additional
> > > + * check as many operation region handlers are not upgraded, so
> > > + * only those known to be safe are managed
> (ACPI_REGION_MANAGED).
> > > + */
> > > + return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> > > + (rgn->flags & ACPI_REGION_MANAGED); }
> > > +
> > > +static bool acpi_region_callable(struct acpi_region *rgn) {
> > > + return (rgn->flags & ACPI_REGION_REGISTERED) &&
> > > + !(rgn->flags & ACPI_REGION_UNREGISTERING); }
> > > +
> > > +static acpi_status
> > > +acpi_region_default_handler(u32 function,
> > > + acpi_physical_address address,
> > > + u32 bit_width, u64 *value,
> > > + void *handler_context, void *region_context) {
> > > + acpi_adr_space_handler handler;
> > > + struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > + void *context;
> > > + acpi_status status = AE_NOT_EXIST;
> > > +
> > > + mutex_lock(&acpi_mutex_region);
> > > + if (!acpi_region_callable(rgn) || !rgn->handler) {
> > > + mutex_unlock(&acpi_mutex_region);
> > > + return status;
> > > + }
> > > +
> > > + atomic_inc(&rgn->refcnt);
> > > + handler = rgn->handler;
> > > + context = rgn->context;
> > > + mutex_unlock(&acpi_mutex_region);
> > > +
> > > + status = handler(function, address, bit_width, value, context,
> > > + region_context);
> >
> > Why don't we call the handler under the mutex?

I think my earlier reply was addressing this question, so let me move it up here.

> It's a kind of programming style related concern.
> IMO, using locks around callback function is a buggy programming style that
> could lead to dead locks.
> Let me explain this using an example.
>
> Object A exports a register/unregister API for other objects.
> Object B calls A's register/unregister API to register/unregister B's callback.

Sorry, I have to say "object" rather than "module" here, as there might be several objects inside a module that are in the same situation and need to be handled.

> It's likely that object B will hold lock_of_B around unregister/register when
> object B is destroyed/created, the lock_of_B is likely also used inside the
> callback.
> So when object A holds the lock_of_A around the callback invocation, it leads to
> dead lock since:
> 1. the locking order for the register/unregister side will be: lock(lock_of_B),
> lock(lock_of_A) 2. the locking order for the callback side will be: lock(lock_of_A),
> lock(lock_of_B) They are in the reversed order!

I think this example is not quite correct.
There is another aspect of the unregister implementation, which is the intent of this patch:
no callback may be running or be initiated after "unregister" returns; we can call this a "flush" requirement.
Inside the callback, lock_of_B must not be held if "flush" is required.
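As an illustration of this "flush" semantics, here is a minimal userspace sketch (hypothetical names; pthreads standing in for kernel primitives, with a condition variable in place of a polling loop): unregister() refuses new invocations under the lock and then waits for the in-flight ones to drain, while the handler itself runs with no lock held.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a region whose unregister() is a barrier:
 * after it returns, no callback is running and none can start. */
struct flushable_region {
	pthread_mutex_t lock;
	pthread_cond_t  idle_cv;
	int             in_flight;    /* callback invocations in progress */
	bool            registered;
	int           (*handler)(void);
};

static void region_init(struct flushable_region *r, int (*handler)(void))
{
	pthread_mutex_init(&r->lock, NULL);
	pthread_cond_init(&r->idle_cv, NULL);
	r->in_flight = 0;
	r->registered = true;
	r->handler = handler;
}

static int region_invoke(struct flushable_region *r)
{
	int (*h)(void);
	int ret;

	pthread_mutex_lock(&r->lock);
	if (!r->registered) {             /* barrier: rejected after unregister */
		pthread_mutex_unlock(&r->lock);
		return -1;
	}
	r->in_flight++;
	h = r->handler;                   /* snapshot under the lock ... */
	pthread_mutex_unlock(&r->lock);

	ret = h();                        /* ... call with no lock held */

	pthread_mutex_lock(&r->lock);
	if (--r->in_flight == 0)
		pthread_cond_broadcast(&r->idle_cv);
	pthread_mutex_unlock(&r->lock);
	return ret;
}

static void region_unregister(struct flushable_region *r)
{
	pthread_mutex_lock(&r->lock);
	r->registered = false;            /* no new invocations may start */
	while (r->in_flight > 0)          /* drain those still in flight */
		pthread_cond_wait(&r->idle_cv, &r->lock);
	r->handler = NULL;
	pthread_mutex_unlock(&r->lock);
}

static int demo_handler(void) { return 7; }

/* Single-threaded demo: one invocation succeeds, then unregister acts
 * as a barrier and the next invocation is rejected. */
int demo_flush(void)
{
	struct flushable_region r;
	int first, second;

	region_init(&r, demo_handler);
	first = region_invoke(&r);
	region_unregister(&r);
	second = region_invoke(&r);
	return first == 7 && second == -1;
}
```

Note that if the handler needed lock_of_B, a caller holding lock_of_B around region_unregister() would deadlock in the drain wait, which is exactly why the callback must not depend on it.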

>
> IMO, Linux may need to introduce __callback, __api as decorators for the
> functions, and use sparse to enforce this rule, since sparse knows whether a callback is
> invoked under some locks.
>
> In the case of ACPICA space_handlers, as you may know, when an ACPI
> operation region handler is invoked, there will be no lock held inside ACPICA
> (interpreter lock must be freed before executing operation region handlers).
> So the likelihood of the dead lock is pretty much high here!

I need to mention another requirement on the operation region handlers.
It must be possible for multiple operation region handlers to execute at the same time; otherwise the IO operations invoked by the BIOS ASL code would be serialized.
IMO, IO operations invoked by the BIOS ASL need to be parallelized.
Thus a mutex is not suitable to implement the protection here.

So the mutex is unlocked before executing the handler; IMO, reference counting is useful here to meet this requirement.
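A userspace model of that scheme might look as follows (hypothetical names; C11 atomics in place of the kernel's atomic_t, and sched_yield() standing in for schedule_timeout_uninterruptible()): each access only bumps a counter around the handler call, so any number of handlers can run in parallel, while the unregister path can still wait for the count to drain — the patch's while (atomic_read() > 1) loop in spirit.

```c
#include <stdatomic.h>
#include <sched.h>

static atomic_int rgn_refcnt;         /* 1 == registered and idle */

static void rgn_register(void)
{
	atomic_store(&rgn_refcnt, 1); /* the registration reference */
}

/* One region access: take a reference, run the handler with no lock
 * held (other accesses may overlap here), drop the reference. */
static int rgn_access(int (*handler)(void))
{
	int ret;

	atomic_fetch_add(&rgn_refcnt, 1);
	ret = handler();
	atomic_fetch_sub(&rgn_refcnt, 1);
	return ret;
}

/* Unregister side: wait until every in-flight access has dropped its
 * reference, then drop the registration reference itself. */
static void rgn_flush(void)
{
	while (atomic_load(&rgn_refcnt) > 1)
		sched_yield();
	atomic_fetch_sub(&rgn_refcnt, 1);
}

static int rgn_demo_handler(void) { return 5; }

/* Single-threaded demo of the register/access/flush lifecycle. */
int demo_parallel_refcnt(void)
{
	int ret;

	rgn_register();
	ret = rgn_access(rgn_demo_handler);
	rgn_flush();
	return ret == 5 && atomic_load(&rgn_refcnt) == 0;
}
```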

> >
> > What exactly prevents context from becoming NULL before the call above?

I think my answers did not address this question directly.

Sorry, I'm not sure what you are asking here. Let me just try to be practical.

The code is here:
>
> > > + mutex_lock(&acpi_mutex_region);
> > > + if (!acpi_region_callable(rgn) || !rgn->handler) {

> > > + handler = rgn->handler;
> > > + context = rgn->context;
> > > + mutex_unlock(&acpi_mutex_region);

The handler is guaranteed not to be NULL while the mutex is held.

Thanks for commenting.

Best regards
-Lv

> >
> > > + atomic_dec(&rgn->refcnt);
> > > +
> > > + return status;
> > > +}
> > > +
> > > +static acpi_status
> > > +acpi_region_default_setup(acpi_handle handle, u32 function,
> > > + void *handler_context, void **region_context) {
> > > + acpi_adr_space_setup setup;
> > > + struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > + void *context;
> > > + acpi_status status = AE_OK;
> > > +
> > > + mutex_lock(&acpi_mutex_region);
> > > + if (!acpi_region_callable(rgn) || !rgn->setup) {
> > > + mutex_unlock(&acpi_mutex_region);
> > > + return status;
> > > + }
> > > +
> > > + atomic_inc(&rgn->refcnt);
> > > + setup = rgn->setup;
> > > + context = rgn->context;
> > > + mutex_unlock(&acpi_mutex_region);
> > > +
> > > + status = setup(handle, function, context, region_context);
> >
> > Can setup drop rgn->refcnt ?
>
> The reason is same as the handler, as a setup is also a callback.
>
> >
> > > + atomic_dec(&rgn->refcnt);
> > > +
> > > + return status;
> > > +}
> > > +
> > > +static int __acpi_install_region(struct acpi_region *rgn,
> > > + acpi_adr_space_type space_id)
> > > +{
> > > + int res = 0;
> > > + acpi_status status;
> > > + int installing = 0;
> > > +
> > > + mutex_lock(&acpi_mutex_region);
> > > + if (rgn->flags & ACPI_REGION_INSTALLED)
> > > + goto out_lock;
> > > + if (rgn->flags & ACPI_REGION_INSTALLING) {
> > > + res = -EBUSY;
> > > + goto out_lock;
> > > + }
> > > +
> > > + installing = 1;
> > > + rgn->flags |= ACPI_REGION_INSTALLING;
> > > + status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > space_id,
> > > + acpi_region_default_handler,
> > > + acpi_region_default_setup,
> > > + rgn);
> > > + rgn->flags &= ~ACPI_REGION_INSTALLING;
> > > + if (ACPI_FAILURE(status))
> > > + res = -EINVAL;
> > > + else
> > > + rgn->flags |= ACPI_REGION_INSTALLED;
> > > +
> > > +out_lock:
> > > + mutex_unlock(&acpi_mutex_region);
> > > + if (installing) {
> > > + if (res)
> > > + pr_err("Failed to install region %d\n", space_id);
> > > + else
> > > + pr_info("Region %d installed\n", space_id);
> > > + }
> > > + return res;
> > > +}
> > > +
> > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > + acpi_adr_space_handler handler,
> > > + acpi_adr_space_setup setup, void *context) {
> > > + int res;
> > > + struct acpi_region *rgn;
> > > +
> > > + if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > + return -EINVAL;
> > > +
> > > + rgn = &acpi_regions[space_id];
> > > + if (!acpi_region_managed(rgn))
> > > + return -EINVAL;
> > > +
> > > + res = __acpi_install_region(rgn, space_id);
> > > + if (res)
> > > + return res;
> > > +
> > > + mutex_lock(&acpi_mutex_region);
> > > + if (rgn->flags & ACPI_REGION_REGISTERED) {
> > > + mutex_unlock(&acpi_mutex_region);
> > > + return -EBUSY;
> > > + }
> > > +
> > > + rgn->handler = handler;
> > > + rgn->setup = setup;
> > > + rgn->context = context;
> > > + rgn->flags |= ACPI_REGION_REGISTERED;
> > > + atomic_set(&rgn->refcnt, 1);
> > > + mutex_unlock(&acpi_mutex_region);
> > > +
> > > + pr_info("Region %d registered\n", space_id);
> > > +
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(acpi_register_region);
> > > +
> > > +void acpi_unregister_region(acpi_adr_space_type space_id) {
> > > + struct acpi_region *rgn;
> > > +
> > > + if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > + return;
> > > +
> > > + rgn = &acpi_regions[space_id];
> > > + if (!acpi_region_managed(rgn))
> > > + return;
> > > +
> > > + mutex_lock(&acpi_mutex_region);
> > > + if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > + mutex_unlock(&acpi_mutex_region);
> > > + return;
> > > + }
> > > + if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> > > + mutex_unlock(&acpi_mutex_region);
> > > + return;
> >
> > What about
> >
> > if ((rgn->flags & ACPI_REGION_UNREGISTERING)
> > || !(rgn->flags & ACPI_REGION_REGISTERED)) {
> > mutex_unlock(&acpi_mutex_region);
> > return;
> > }
> >
>
> OK.
>
> > > + }
> > > +
> > > + rgn->flags |= ACPI_REGION_UNREGISTERING;
> > > + rgn->handler = NULL;
> > > + rgn->setup = NULL;
> > > + rgn->context = NULL;
> > > + mutex_unlock(&acpi_mutex_region);
> > > +
> > > + while (atomic_read(&rgn->refcnt) > 1)
> > > + schedule_timeout_uninterruptible(usecs_to_jiffies(5));
> >
> > Wouldn't it be better to use a wait queue here?
>
> Yes, I'll try.
>
> >
> > > + atomic_dec(&rgn->refcnt);
> > > +
> > > + mutex_lock(&acpi_mutex_region);
> > > + rgn->flags &= ~(ACPI_REGION_REGISTERED |
> > ACPI_REGION_UNREGISTERING);
> > > + mutex_unlock(&acpi_mutex_region);
> > > +
> > > + pr_info("Region %d unregistered\n", space_id); }
> > > +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> > > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index
> > > a2c2fbb..15fad0d 100644
> > > --- a/include/acpi/acpi_bus.h
> > > +++ b/include/acpi/acpi_bus.h
> > > @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void
> > > *bus) { return 0; }
> > >
> > > #endif /* CONFIG_ACPI */
> > >
> > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > + acpi_adr_space_handler handler,
> > > + acpi_adr_space_setup setup, void *context); void
> > > +acpi_unregister_region(acpi_adr_space_type space_id);
> > > +
> > > #endif /*__ACPI_BUS_H__*/
> >
> > Thanks,
> > Rafael
>
> Thanks
> -Lv
>
> >
> >
> > --
> > I speak only for myself.
> > Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-26 08:16:22

by Zheng, Lv

Subject: RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers

> From: [email protected]
> [mailto:[email protected]] On Behalf Of Zheng, Lv
> Sent: Friday, July 26, 2013 9:54 AM
> To: Rafael J. Wysocki
> Cc: Wysocki, Rafael J; Brown, Len; [email protected];
> [email protected]
> Subject: RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI
> operation region handlers
>
> > From: Rafael J. Wysocki [mailto:[email protected]]
> > Sent: Friday, July 26, 2013 5:29 AM
> >
> > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > This patch adds reference counting for ACPI operation region handlers
> > > to fix races caused by the ACPICA address space callback invocations.
> > >
> > > ACPICA address space callback invocation is not suitable for Linux
> > > CONFIG_MODULE=y execution environment.
> >
> > Actually, can you please explain to me what *exactly* the problem is?
>
> OK. I'll add race explanations in the next revision.
>
> The problem is that there is no "lock" held inside ACPICA while invoking operation
> region handlers.
> Thus races happen between acpi_remove/install_address_space_handler()
> and the handler/setup callbacks.

This doesn't seem to be a good explanation of the intent of this patch.
I think the intent is better captured in the patch description:

1. It acts as a barrier for operation region callbacks - no callback will
happen after acpi_unregister_region().
2. acpi_unregister_region() is safe to be called in module->exit()
functions.

Hmm, maybe I need to re-order the patch description for this patch.

Thanks for commenting.

Best regards
-Lv

>
> This is correct per ACPI specification.
> As if there is interpreter locks held for invoking operation region handlers, the
> timeout implemented inside the operation region handlers will make all locking
> facilities (Acquire or Sleep,...) timed out.
> Please refer to ACPI specification "5.5.2 Control Method Execution":
> Interpretation of a Control Method is not preemptive, but it can block. When a
> control method does block, OSPM can initiate or continue the execution of a
> different control method. A control method can only assume that access to
> global objects is exclusive for any period the control method does not block.
>
> So it is pretty much likely that ACPI IO transfers are locked inside the operation
> region callback implementations.
> Using locking facility to protect the callback invocation will risk dead locks.
>
> Thanks
> -Lv
>
> > Rafael

2013-07-26 13:28:25

by Rafael J. Wysocki

Subject: Re: [PATCH 08/13] ACPI/IPMI: Cleanup several acpi_ipmi_device members

On Friday, July 26, 2013 01:25:12 AM Zheng, Lv wrote:
> > From: Rafael J. Wysocki [mailto:[email protected]]
> > Sent: Friday, July 26, 2013 6:26 AM
> >
> > On Tuesday, July 23, 2013 04:10:06 PM Lv Zheng wrote:
> > > This is a trivial patch:
> > > 1. Deletes a member of the acpi_ipmi_device - smi_data which is not
> > > actually used.
> > > 2. Updates a member of the acpi_ipmi_device - pnp_dev which is only used
> > > by dev_warn() invocations, so changes it to struct device.
> > >
> > > Signed-off-by: Lv Zheng <[email protected]>
> > > Reviewed-by: Huang Ying <[email protected]>
> > > ---
> > > drivers/acpi/acpi_ipmi.c | 30 ++++++++++++++----------------
> > > 1 file changed, 14 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > > 0ee1ea6..7f93ffd 100644
> > > --- a/drivers/acpi/acpi_ipmi.c
> > > +++ b/drivers/acpi/acpi_ipmi.c
> > > @@ -63,11 +63,10 @@ struct acpi_ipmi_device {
> > > struct list_head tx_msg_list;
> > > spinlock_t tx_msg_lock;
> > > acpi_handle handle;
> > > - struct pnp_dev *pnp_dev;
> > > + struct device *dev;
> > > ipmi_user_t user_interface;
> > > int ipmi_ifnum; /* IPMI interface number */
> > > long curr_msgid;
> > > - struct ipmi_smi_info smi_data;
> > > atomic_t refcnt;
> > > };
> > >
> > > @@ -132,7 +131,7 @@ static struct ipmi_driver_data driver_data = { };
> > >
> > > static struct acpi_ipmi_device *
> > > -ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle
> > > handle)
> > > +ipmi_dev_alloc(int iface, struct device *pdev, acpi_handle handle)
> >
> > Why is the second arg called pdev?
>
> OK, I will change it to dev.

OK, thanks.

> >
> > > {
> > > struct acpi_ipmi_device *ipmi_device;
> > > int err;
> > > @@ -148,14 +147,13 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info
> > *smi_data, acpi_handle handle)
> > > spin_lock_init(&ipmi_device->tx_msg_lock);
> > >
> > > ipmi_device->handle = handle;
> > > - ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> > > - memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct
> > ipmi_smi_info));
> > > + ipmi_device->dev = get_device(pdev);
> > > ipmi_device->ipmi_ifnum = iface;
> > >
> > > err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> > > ipmi_device, &user);
> > > if (err) {
> > > - put_device(smi_data->dev);
> > > + put_device(pdev);
> > > kfree(ipmi_device);
> > > return NULL;
> > > }
> > > @@ -175,7 +173,7 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device
> > > *ipmi_device) static void ipmi_dev_release(struct acpi_ipmi_device
> > > *ipmi_device) {
> > > ipmi_destroy_user(ipmi_device->user_interface);
> > > - put_device(ipmi_device->smi_data.dev);
> > > + put_device(ipmi_device->dev);
> > > kfree(ipmi_device);
> > > }
> > >
> > > @@ -263,7 +261,7 @@ static int acpi_format_ipmi_request(struct
> > acpi_ipmi_msg *tx_msg,
> > > buffer = (struct acpi_ipmi_buffer *)value;
> > > /* copy the tx message data */
> > > if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
> > > - dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
> > > + dev_WARN_ONCE(tx_msg->device->dev, true,
> > > "Unexpected request (msg len %d).\n",
> > > buffer->length);
> > > return -EINVAL;
> > > @@ -382,11 +380,11 @@ static void ipmi_msg_handler(struct
> > ipmi_recv_msg *msg, void *user_msg_data)
> > > struct acpi_ipmi_device *ipmi_device = user_msg_data;
> > > int msg_found = 0;
> > > struct acpi_ipmi_msg *tx_msg;
> > > - struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
> > > + struct device *dev = ipmi_device->dev;
> > > unsigned long flags;
> > >
> > > if (msg->user != ipmi_device->user_interface) {
> > > - dev_warn(&pnp_dev->dev,
> > > + dev_warn(dev,
> > > "Unexpected response is returned. returned user %p, expected
> > user %p\n",
> > > msg->user, ipmi_device->user_interface);
> > > goto out_msg;
> > > @@ -404,7 +402,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg
> > *msg, void *user_msg_data)
> > > spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > >
> > > if (!msg_found) {
> > > - dev_warn(&pnp_dev->dev,
> > > + dev_warn(dev,
> > > "Unexpected response (msg id %ld) is returned.\n",
> > > msg->msgid);
> > > goto out_msg;
> > > @@ -412,7 +410,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg
> > > *msg, void *user_msg_data)
> > >
> > > /* copy the response data to Rx_data buffer */
> > > if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
> > > - dev_WARN_ONCE(&pnp_dev->dev, true,
> > > + dev_WARN_ONCE(dev, true,
> > > "Unexpected response (msg len %d).\n",
> > > msg->msg.data_len);
> > > goto out_comp;
> > > @@ -431,7 +429,7 @@ out_msg:
> > > static void ipmi_register_bmc(int iface, struct device *dev) {
> > > struct acpi_ipmi_device *ipmi_device, *temp;
> > > - struct pnp_dev *pnp_dev;
> > > + struct device *pdev;
> >
> > And here?
>
> The dev here is the parameter of ipmi_register_bmc(), so it is not possible to also name the device taken from the struct ipmi_smi_info "dev" in this quick fix.

Right. What about smi_dev? Or just use smi_data.dev directly? It's just two
places and shouldn't cause any line wraps to happen.

Rafael


> > > int err;
> > > struct ipmi_smi_info smi_data;
> > > acpi_handle handle;
> > > @@ -445,11 +443,11 @@ static void ipmi_register_bmc(int iface, struct
> > device *dev)
> > > handle = smi_data.addr_info.acpi_info.acpi_handle;
> > > if (!handle)
> > > goto err_ref;
> > > - pnp_dev = to_pnp_dev(smi_data.dev);
> > > + pdev = smi_data.dev;
> > >
> > > - ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> > > + ipmi_device = ipmi_dev_alloc(iface, pdev, handle);
> > > if (!ipmi_device) {
> > > - dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> > > + dev_warn(pdev, "Can't create IPMI user interface\n");
> > > goto err_ref;
> > > }
> > >
> > >
> > --
> > I speak only for myself.
> > Rafael J. Wysocki, Intel Open Source Technology Center.
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-26 13:31:04

by Rafael J. Wysocki

Subject: Re: [PATCH 07/13] ACPI/IPMI: Add reference counting for ACPI IPMI transfers

On Friday, July 26, 2013 01:21:18 AM Zheng, Lv wrote:
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Rafael J. Wysocki
> > Sent: Friday, July 26, 2013 6:23 AM
> >
> > On Tuesday, July 23, 2013 04:09:54 PM Lv Zheng wrote:
> > > This patch adds reference counting for ACPI IPMI transfers to tune the
> > > locking granularity of tx_msg_lock.
> > >
> > > The acpi_ipmi_msg handling is re-designed using reference counting.
> > > 1. tx_msg is always unlinked before complete(), so that:
> > > 1.1. it is safe to put complete() outside of tx_msg_lock;
> > > 1.2. complete() can only happen once, thus smp_wmb() is not required.
> > > 2. Increasing the reference of tx_msg before calling
> > > ipmi_request_settime() and introducing tx_msg_lock protected
> > > ipmi_cancel_tx_msg() so that a complete() can happen in parallel with
> > > tx_msg unlinking in the failure cases.
> > > 3. tx_msg holds the reference of acpi_ipmi_device so that it can be flushed
> > > and freed in the contexts other than acpi_ipmi_space_handler().
> > >
> > > The lockdep_chains shows all acpi_ipmi locks are leaf locks after the
> > > tuning:
> > > 1. ipmi_lock is always leaf:
> > > irq_context: 0
> > > [ffffffff81a943f8] smi_watchers_mutex
> > > [ffffffffa06eca60] driver_data.ipmi_lock
> > > irq_context: 0
> > > [ffffffff82767b40] &buffer->mutex
> > > [ffffffffa00a6678] s_active#103
> > > [ffffffffa06eca60] driver_data.ipmi_lock
> > > 2. without this patch applied, lock used by complete() is held after
> > > holding tx_msg_lock:
> > > irq_context: 0
> > > [ffffffff82767b40] &buffer->mutex
> > > [ffffffffa00a6678] s_active#103
> > > [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > > irq_context: 1
> > > [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > > irq_context: 1
> > > [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > > [ffffffffa06eccf0] &x->wait#25
> > > irq_context: 1
> > > [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > > [ffffffffa06eccf0] &x->wait#25
> > > [ffffffff81e36620] &p->pi_lock
> > > irq_context: 1
> > > [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > > [ffffffffa06eccf0] &x->wait#25
> > > [ffffffff81e36620] &p->pi_lock
> > > [ffffffff81e5d0a8] &rq->lock
> > > 3. with this patch applied, tx_msg_lock is always leaf:
> > > irq_context: 0
> > > [ffffffff82767b40] &buffer->mutex
> > > [ffffffffa00a66d8] s_active#107
> > > [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
> > > irq_context: 1
> > > [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
> > >
> > > Signed-off-by: Lv Zheng <[email protected]>
> > > Cc: Zhao Yakui <[email protected]>
> > > Reviewed-by: Huang Ying <[email protected]>
> > > ---
> > > drivers/acpi/acpi_ipmi.c | 107
> > +++++++++++++++++++++++++++++++++-------------
> > > 1 file changed, 77 insertions(+), 30 deletions(-)
> > >
> > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > > index 2a09156..0ee1ea6 100644
> > > --- a/drivers/acpi/acpi_ipmi.c
> > > +++ b/drivers/acpi/acpi_ipmi.c
> > > @@ -105,6 +105,7 @@ struct acpi_ipmi_msg {
> > > u8 data[ACPI_IPMI_MAX_MSG_LENGTH];
> > > u8 rx_len;
> > > struct acpi_ipmi_device *device;
> > > + atomic_t refcnt;
> >
> > Again: kref, please?
>
> Please see the concerns in another email.
>
> >
> > > };
> > >
> > > /* IPMI request/response buffer per ACPI 4.0, sec 5.5.2.4.3.2 */
> > > @@ -195,22 +196,47 @@ static struct acpi_ipmi_device
> > *acpi_ipmi_get_selected_smi(void)
> > > return ipmi_device;
> > > }
> > >
> > > -static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device
> > *ipmi)
> > > +static struct acpi_ipmi_msg *ipmi_msg_alloc(void)
> > > {
> > > + struct acpi_ipmi_device *ipmi;
> > > struct acpi_ipmi_msg *ipmi_msg;
> > > - struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> > >
> > > + ipmi = acpi_ipmi_get_selected_smi();
> > > + if (!ipmi)
> > > + return NULL;
> > > ipmi_msg = kzalloc(sizeof(struct acpi_ipmi_msg), GFP_KERNEL);
> > > - if (!ipmi_msg) {
> > > - dev_warn(&pnp_dev->dev, "Can't allocate memory for ipmi_msg\n");
> > > + if (!ipmi_msg) {
> > > + acpi_ipmi_dev_put(ipmi);
> > > return NULL;
> > > }
> > > + atomic_set(&ipmi_msg->refcnt, 1);
> > > init_completion(&ipmi_msg->tx_complete);
> > > INIT_LIST_HEAD(&ipmi_msg->head);
> > > ipmi_msg->device = ipmi;
> > > +
> > > return ipmi_msg;
> > > }
> > >
> > > +static struct acpi_ipmi_msg *
> > > +acpi_ipmi_msg_get(struct acpi_ipmi_msg *tx_msg)
> > > +{
> > > + if (tx_msg)
> > > + atomic_inc(&tx_msg->refcnt);
> > > + return tx_msg;
> > > +}
> > > +
> > > +static void ipmi_msg_release(struct acpi_ipmi_msg *tx_msg)
> > > +{
> > > + acpi_ipmi_dev_put(tx_msg->device);
> > > + kfree(tx_msg);
> > > +}
> > > +
> > > +static void acpi_ipmi_msg_put(struct acpi_ipmi_msg *tx_msg)
> > > +{
> > > + if (tx_msg && atomic_dec_and_test(&tx_msg->refcnt))
> > > + ipmi_msg_release(tx_msg);
> > > +}
> > > +
> > > #define IPMI_OP_RGN_NETFN(offset) ((offset >> 8) & 0xff)
> > > #define IPMI_OP_RGN_CMD(offset) (offset & 0xff)
> > > static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
> > > @@ -300,7 +326,7 @@ static void acpi_format_ipmi_response(struct
> > acpi_ipmi_msg *msg,
> > >
> > > static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> > > {
> > > - struct acpi_ipmi_msg *tx_msg, *temp;
> > > + struct acpi_ipmi_msg *tx_msg;
> > > unsigned long flags;
> > >
> > > /*
> > > @@ -311,16 +337,46 @@ static void ipmi_flush_tx_msg(struct
> > acpi_ipmi_device *ipmi)
> > > */
> > > while (atomic_read(&ipmi->refcnt) > 1) {
> > > spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > > - list_for_each_entry_safe(tx_msg, temp,
> > > - &ipmi->tx_msg_list, head) {
> > > + while (!list_empty(&ipmi->tx_msg_list)) {
> > > + tx_msg = list_first_entry(&ipmi->tx_msg_list,
> > > + struct acpi_ipmi_msg,
> > > + head);
> > > + list_del(&tx_msg->head);
> > > + spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > > +
> > > /* wake up the sleep thread on the Tx msg */
> > > complete(&tx_msg->tx_complete);
> > > + acpi_ipmi_msg_put(tx_msg);
> > > + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > > }
> > > spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > > +
> > > schedule_timeout_uninterruptible(msecs_to_jiffies(1));
> > > }
> > > }
> > >
> > > +static void ipmi_cancel_tx_msg(struct acpi_ipmi_device *ipmi,
> > > + struct acpi_ipmi_msg *msg)
> > > +{
> > > + struct acpi_ipmi_msg *tx_msg;
> > > + int msg_found = 0;
> >
> > Use bool?
>
> OK.
> There are other int flags in the original codes, do I need to do a cleanup for all of them (dev_found)?

Not in this patch, but in general it wouldn't hurt.

> > > + unsigned long flags;
> > > +
> > > + spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > > + list_for_each_entry(tx_msg, &ipmi->tx_msg_list, head) {
> > > + if (msg == tx_msg) {
> > > + msg_found = 1;
> > > + break;
> > > + }
> > > + }
> > > + if (msg_found)
> > > + list_del(&tx_msg->head);
> >
> > The list_del() can be done when you set msg_found.
>
> Please see my concerns in another email.

OK, I'll reply there.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-26 13:50:33

by Rafael J. Wysocki

Subject: Re: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers

On Friday, July 26, 2013 12:47:44 AM Zheng, Lv wrote:
>
> > From: Rafael J. Wysocki [mailto:[email protected]]
> > Sent: Friday, July 26, 2013 4:27 AM
> >
> > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > This patch adds reference counting for ACPI operation region handlers
> > > to fix races caused by the ACPICA address space callback invocations.
> > >
> > > ACPICA address space callback invocation is not suitable for Linux
> > > CONFIG_MODULE=y execution environment. This patch tries to protect
> > > the address space callbacks by invoking them under a module safe
> > environment.
> > > The IPMI address space handler is also upgraded in this patch.
> > > The acpi_unregister_region() is designed to meet the following
> > > requirements:
> > > 1. It acts as a barrier for operation region callbacks - no callback will
> > > happen after acpi_unregister_region().
> > > 2. acpi_unregister_region() is safe to be called in module->exit()
> > > functions.
> > > Using reference counting rather than module referencing allows such
> > > benefits to be achieved even when acpi_unregister_region() is called
> > > in the environments other than module->exit().
> > > The header file of include/acpi/acpi_bus.h should contain the
> > > declarations that have references to some ACPICA defined types.
> > >
> > > Signed-off-by: Lv Zheng <[email protected]>
> > > Reviewed-by: Huang Ying <[email protected]>
> > > ---
> > > drivers/acpi/acpi_ipmi.c | 16 ++--
> > > drivers/acpi/osl.c | 224
> > ++++++++++++++++++++++++++++++++++++++++++++++
> > > include/acpi/acpi_bus.h | 5 ++
> > > 3 files changed, 235 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > > 5f8f495..2a09156 100644
> > > --- a/drivers/acpi/acpi_ipmi.c
> > > +++ b/drivers/acpi/acpi_ipmi.c
> > > @@ -539,20 +539,18 @@ out_ref:
> > > static int __init acpi_ipmi_init(void) {
> > > int result = 0;
> > > - acpi_status status;
> > >
> > > if (acpi_disabled)
> > > return result;
> > >
> > > mutex_init(&driver_data.ipmi_lock);
> > >
> > > - status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > > - ACPI_ADR_SPACE_IPMI,
> > > - &acpi_ipmi_space_handler,
> > > - NULL, NULL);
> > > - if (ACPI_FAILURE(status)) {
> > > + result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> > > + &acpi_ipmi_space_handler,
> > > + NULL, NULL);
> > > + if (result) {
> > > pr_warn("Can't register IPMI opregion space handle\n");
> > > - return -EINVAL;
> > > + return result;
> > > }
> > >
> > > result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> > > @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
> > > }
> > > mutex_unlock(&driver_data.ipmi_lock);
> > >
> > > - acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> > > - ACPI_ADR_SPACE_IPMI,
> > > - &acpi_ipmi_space_handler);
> > > + acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
> > > }
> > >
> > > module_init(acpi_ipmi_init);
> > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index
> > > 6ab2c35..8398e51 100644
> > > --- a/drivers/acpi/osl.c
> > > +++ b/drivers/acpi/osl.c
> > > @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq; static
> > > struct workqueue_struct *kacpi_notify_wq; static struct
> > > workqueue_struct *kacpi_hotplug_wq;
> > >
> > > +struct acpi_region {
> > > + unsigned long flags;
> > > +#define ACPI_REGION_DEFAULT 0x01
> > > +#define ACPI_REGION_INSTALLED 0x02
> > > +#define ACPI_REGION_REGISTERED 0x04
> > > +#define ACPI_REGION_UNREGISTERING 0x08
> > > +#define ACPI_REGION_INSTALLING 0x10
> >
> > What about (1UL << 1), (1UL << 2) etc.?
> >
> > Also please remove the #defines out of the struct definition.
>
> OK.
>
> >
> > > + /*
> > > + * NOTE: Upgrading All Region Handlers
> > > + * This flag is only used during the period where not all of the
> > > + * region handlers are upgraded to the new interfaces.
> > > + */
> > > +#define ACPI_REGION_MANAGED 0x80
> > > + acpi_adr_space_handler handler;
> > > + acpi_adr_space_setup setup;
> > > + void *context;
> > > + /* Invoking references */
> > > + atomic_t refcnt;
> >
> > Actually, why don't you use krefs?
>
> If you take a look at other piece of my codes, you'll find there are two reasons:
>
> 1. I'm using while (atomic_read() > 1) to implement the objects' flushing and there is no kref API to do so.

No, there's not any, but you can read kref.refcount directly, can't you?

Moreover, it is not entirely clear to me that doing the while (atomic_read() > 1)
is actually correct.

> I just think it is not suitable for me to introduce such an API into kref.h and start another argument around kref designs in this bug fix patch. :-)
> I'll start a discussion about kref design using another thread.

You don't need to do that at all.

> 2. I'm using ipmi_dev|msg_release() as the pair of ipmi_dev|msg_alloc(); it's the typical atomic_t coding style.
> If atomic_t is changed to struct kref, I will need to implement an extra API, __ipmi_dev_release(), which takes a struct kref as its parameter and calls ipmi_dev_release() inside it.
> By not using kref, I needn't write code to implement such an API.

I'm not following you, sorry.

Please just use krefs for reference counting, the same way as you use
struct list_head for implementing lists. This is the way everyone does
that in the kernel and that's for a reason.

Unless you do your reference counting under a lock, in which case using
atomic_t isn't necessary at all and you can use a non-atomic counter.

> > > +};
> > > +
> > > +static struct acpi_region acpi_regions[ACPI_NUM_PREDEFINED_REGIONS]
> > = {
> > > + [ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> > > + .flags = ACPI_REGION_DEFAULT,
> > > + },
> > > + [ACPI_ADR_SPACE_SYSTEM_IO] = {
> > > + .flags = ACPI_REGION_DEFAULT,
> > > + },
> > > + [ACPI_ADR_SPACE_PCI_CONFIG] = {
> > > + .flags = ACPI_REGION_DEFAULT,
> > > + },
> > > + [ACPI_ADR_SPACE_IPMI] = {
> > > + .flags = ACPI_REGION_MANAGED,
> > > + },
> > > +};
> > > +static DEFINE_MUTEX(acpi_mutex_region);
> > > +
> > > /*
> > > * This list of permanent mappings is for memory that may be accessed
> > from
> > > * interrupt context, where we can't do the ioremap().
> > > @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle,
> > u32 type, void *context,
> > > kfree(hp_work);
> > > }
> > > EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> > > +
> > > +static bool acpi_region_managed(struct acpi_region *rgn) {
> > > + /*
> > > + * NOTE: Default and Managed
> > > + * We only need to avoid region management on the regions managed
> > > + * by ACPICA (ACPI_REGION_DEFAULT). Currently, we need additional
> > > + * check as many operation region handlers are not upgraded, so
> > > + * only those known to be safe are managed (ACPI_REGION_MANAGED).
> > > + */
> > > + return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> > > + (rgn->flags & ACPI_REGION_MANAGED); }
> > > +
> > > +static bool acpi_region_callable(struct acpi_region *rgn) {
> > > + return (rgn->flags & ACPI_REGION_REGISTERED) &&
> > > + !(rgn->flags & ACPI_REGION_UNREGISTERING); }
> > > +
> > > +static acpi_status
> > > +acpi_region_default_handler(u32 function,
> > > + acpi_physical_address address,
> > > + u32 bit_width, u64 *value,
> > > + void *handler_context, void *region_context) {
> > > + acpi_adr_space_handler handler;
> > > + struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > + void *context;
> > > + acpi_status status = AE_NOT_EXIST;
> > > +
> > > + mutex_lock(&acpi_mutex_region);
> > > + if (!acpi_region_callable(rgn) || !rgn->handler) {
> > > + mutex_unlock(&acpi_mutex_region);
> > > + return status;
> > > + }
> > > +
> > > + atomic_inc(&rgn->refcnt);
> > > + handler = rgn->handler;
> > > + context = rgn->context;
> > > + mutex_unlock(&acpi_mutex_region);
> > > +
> > > + status = handler(function, address, bit_width, value, context,
> > > + region_context);
> >
> > Why don't we call the handler under the mutex?
> >
> > What exactly prevents context from becoming NULL before the call above?
>
> It's a programming-style concern.
> IMO, using locks around callback functions is a buggy programming style that could lead to deadlocks.
> Let me explain this using an example.
>
> Object A exports a register/unregister API for other objects.
> Object B calls A's register/unregister API to register/unregister B's callback.
> It's likely that object B will hold lock_of_B around unregister/register when object B is destroyed/created; lock_of_B is likely also used inside the callback.

Why is it likely to be used inside the callback? Clearly, if a callback is
executed under a lock, that lock can't be acquired by that callback.

> So when object A holds the lock_of_A around the callback invocation, it leads to a deadlock since:
> 1. the locking order for the register/unregister side will be: lock(lock_of_B), lock(lock_of_A)
> 2. the locking order for the callback side will be: lock(lock_of_A), lock(lock_of_B)
> They are in the reversed order!
>
> IMO, Linux may need to introduce __callback, __api as decorators for the functions, and use sparse to enforce this rule; sparse knows if a callback is invoked under some locks.

Oh, dear. Yes, sparse knows such things, and so what?

> In the case of ACPICA space_handlers, as you may know, when an ACPI operation region handler is invoked, there will be no lock held inside ACPICA (the interpreter lock must be released before executing operation region handlers).
> So the likelihood of deadlock is quite high here!

Sorry, what are you talking about?

Please let me rephrase my question: What *practical* problems would it lead to
if we executed this particular callback under this particular mutex?

Not *theoretical* in the general theory of everything, *practical* in this
particular piece of code.

And we are talking about a *global* mutex here, not something object-specific.

> > > + atomic_dec(&rgn->refcnt);
> > > +
> > > + return status;
> > > +}
> > > +
> > > +static acpi_status
> > > +acpi_region_default_setup(acpi_handle handle, u32 function,
> > > + void *handler_context, void **region_context) {
> > > + acpi_adr_space_setup setup;
> > > + struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > + void *context;
> > > + acpi_status status = AE_OK;
> > > +
> > > + mutex_lock(&acpi_mutex_region);
> > > + if (!acpi_region_callable(rgn) || !rgn->setup) {
> > > + mutex_unlock(&acpi_mutex_region);
> > > + return status;
> > > + }
> > > +
> > > + atomic_inc(&rgn->refcnt);
> > > + setup = rgn->setup;
> > > + context = rgn->context;
> > > + mutex_unlock(&acpi_mutex_region);
> > > +
> > > + status = setup(handle, function, context, region_context);
> >
> > Can setup drop rgn->refcnt ?
>
> The reason is same as the handler, as a setup is also a callback.

Let me rephrase: Is it legitimate for setup to modify rgn->refcnt?
If so, then why?

> >
> > > + atomic_dec(&rgn->refcnt);
> > > +
> > > + return status;
> > > +}
> > > +
> > > +static int __acpi_install_region(struct acpi_region *rgn,
> > > + acpi_adr_space_type space_id)
> > > +{
> > > + int res = 0;
> > > + acpi_status status;
> > > + int installing = 0;
> > > +
> > > + mutex_lock(&acpi_mutex_region);
> > > + if (rgn->flags & ACPI_REGION_INSTALLED)
> > > + goto out_lock;
> > > + if (rgn->flags & ACPI_REGION_INSTALLING) {
> > > + res = -EBUSY;
> > > + goto out_lock;
> > > + }
> > > +
> > > + installing = 1;
> > > + rgn->flags |= ACPI_REGION_INSTALLING;
> > > + status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > space_id,
> > > + acpi_region_default_handler,
> > > + acpi_region_default_setup,
> > > + rgn);
> > > + rgn->flags &= ~ACPI_REGION_INSTALLING;
> > > + if (ACPI_FAILURE(status))
> > > + res = -EINVAL;
> > > + else
> > > + rgn->flags |= ACPI_REGION_INSTALLED;
> > > +
> > > +out_lock:
> > > + mutex_unlock(&acpi_mutex_region);
> > > + if (installing) {
> > > + if (res)
> > > + pr_err("Failed to install region %d\n", space_id);
> > > + else
> > > + pr_info("Region %d installed\n", space_id);
> > > + }
> > > + return res;
> > > +}
> > > +
> > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > + acpi_adr_space_handler handler,
> > > + acpi_adr_space_setup setup, void *context) {
> > > + int res;
> > > + struct acpi_region *rgn;
> > > +
> > > + if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > + return -EINVAL;
> > > +
> > > + rgn = &acpi_regions[space_id];
> > > + if (!acpi_region_managed(rgn))
> > > + return -EINVAL;
> > > +
> > > + res = __acpi_install_region(rgn, space_id);
> > > + if (res)
> > > + return res;
> > > +
> > > + mutex_lock(&acpi_mutex_region);
> > > + if (rgn->flags & ACPI_REGION_REGISTERED) {
> > > + mutex_unlock(&acpi_mutex_region);
> > > + return -EBUSY;
> > > + }
> > > +
> > > + rgn->handler = handler;
> > > + rgn->setup = setup;
> > > + rgn->context = context;
> > > + rgn->flags |= ACPI_REGION_REGISTERED;
> > > + atomic_set(&rgn->refcnt, 1);
> > > + mutex_unlock(&acpi_mutex_region);
> > > +
> > > + pr_info("Region %d registered\n", space_id);
> > > +
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(acpi_register_region);
> > > +
> > > +void acpi_unregister_region(acpi_adr_space_type space_id) {
> > > + struct acpi_region *rgn;
> > > +
> > > + if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > + return;
> > > +
> > > + rgn = &acpi_regions[space_id];
> > > + if (!acpi_region_managed(rgn))
> > > + return;
> > > +
> > > + mutex_lock(&acpi_mutex_region);
> > > + if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > + mutex_unlock(&acpi_mutex_region);
> > > + return;
> > > + }
> > > + if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> > > + mutex_unlock(&acpi_mutex_region);
> > > + return;
> >
> > What about
> >
> > if ((rgn->flags & ACPI_REGION_UNREGISTERING)
> > || !(rgn->flags & ACPI_REGION_REGISTERED)) {
> > mutex_unlock(&acpi_mutex_region);
> > return;
> > }
> >
>
> OK.
>
> > > + }
> > > +
> > > + rgn->flags |= ACPI_REGION_UNREGISTERING;
> > > + rgn->handler = NULL;
> > > + rgn->setup = NULL;
> > > + rgn->context = NULL;
> > > + mutex_unlock(&acpi_mutex_region);
> > > +
> > > + while (atomic_read(&rgn->refcnt) > 1)
> > > + schedule_timeout_uninterruptible(usecs_to_jiffies(5));
> >
> > Wouldn't it be better to use a wait queue here?
>
> Yes, I'll try.

By the way, why do we need to do that?

> > > + atomic_dec(&rgn->refcnt);
> > > +
> > > + mutex_lock(&acpi_mutex_region);
> > > + rgn->flags &= ~(ACPI_REGION_REGISTERED |
> > ACPI_REGION_UNREGISTERING);
> > > + mutex_unlock(&acpi_mutex_region);
> > > +
> > > + pr_info("Region %d unregistered\n", space_id); }
> > > +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> > > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index
> > > a2c2fbb..15fad0d 100644
> > > --- a/include/acpi/acpi_bus.h
> > > +++ b/include/acpi/acpi_bus.h
> > > @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void
> > > *bus) { return 0; }
> > >
> > > #endif /* CONFIG_ACPI */
> > >
> > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > + acpi_adr_space_handler handler,
> > > + acpi_adr_space_setup setup, void *context); void
> > > +acpi_unregister_region(acpi_adr_space_type space_id);
> > > +
> > > #endif /*__ACPI_BUS_H__*/

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-26 14:39:20

by Rafael J. Wysocki

Subject: Re: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers

On Friday, July 26, 2013 01:54:00 AM Zheng, Lv wrote:
> > From: Rafael J. Wysocki [mailto:[email protected]]
> > Sent: Friday, July 26, 2013 5:29 AM
> >
> > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > This patch adds reference counting for ACPI operation region handlers to fix
> > > races caused by the ACPICA address space callback invocations.
> > >
> > > ACPICA address space callback invocation is not suitable for Linux
> > > CONFIG_MODULE=y execution environment.
> >
> > Actually, can you please explain to me what *exactly* the problem is?
>
> OK. I'll add race explanations in the next revision.
>
> The problem is there is no "lock" held inside ACPICA for invoking operation
> region handlers.
> Thus races happen between the acpi_remove/install_address_space_handler and
> the handler/setup callbacks.

I see. Now you're trying to introduce something that would prevent those
races from happening, right?

> This is correct per ACPI specification.
> If interpreter locks were held for invoking operation region handlers,
> the timeouts implemented inside the operation region handlers would make all
> locking facilities (Acquire or Sleep, ...) time out.
> Please refer to ACPI specification "5.5.2 Control Method Execution":
> Interpretation of a Control Method is not preemptive, but it can block. When
> a control method does block, OSPM can initiate or continue the execution of
> a different control method. A control method can only assume that access to
> global objects is exclusive for any period the control method does not block.
>
> So it is quite likely that locks are held around ACPI IO transfers inside the
> operation region callback implementations.
> Using a locking facility to protect the callback invocation would risk deadlocks.

No. If you use a single global lock around all invocations of operation region
handlers, it won't deadlock, but it will *serialize* things. This means that
there won't be two handlers executing in parallel. That may or may not be bad
depending on what those handlers actually do.

Your concern seems to be that if one address space handler is buggy and it
blocks indefinitely, executing it under such a lock would affect the other
address space handlers, and in my opinion this is a valid concern.

So the idea seems to be to add wrappers around acpi_install_address_space_handler()
and acpi_remove_address_space_handler (but I don't see where the latter is called
after the change?), such that they will know when it is safe to unregister the
handler. That is simple enough.

However, I'm not sure it is needed in the context of IPMI. Your address space
handler's context is NULL, so even it if is executed after
acpi_remove_address_space_handler() has been called for it (or in parallel),
it doesn't depend on anything passed by the caller, so I don't see why the
issue can't be addressed by a proper synchronization between
acpi_ipmi_exit() and acpi_ipmi_space_handler().

Clearly, acpi_ipmi_exit() should wait for all already running instances of
acpi_ipmi_space_handler() to complete and all acpi_ipmi_space_handler()
instances started after acpi_ipmi_exit() has been called must return
immediately.

I would imagine an algorithm like this:

acpi_ipmi_exit()
1. Take "address space handler lock".
2. Set "unregistering address space handler" flag.
3. Check if "count of currently running handlers" is 0. If so,
call acpi_remove_address_space_handler(), drop the lock (possibly clear the
flag) and return.
4. Otherwise drop the lock and go to sleep in "address space handler wait queue".
5. When woken up, take "address space handler lock" and go to 3.

acpi_ipmi_space_handler()
1. Take "address space handler lock".
2. Check "unregistering address space handler" flag. If set, drop the lock
and return.
3. Increment "count of currently running handlers".
4. Drop the lock.
5. Do your work.
6. Take "address space handler lock".
7. Decrement "count of currently running handlers" and if 0, signal the
tasks waiting on it to wake up.
8. Drop the lock.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-29 01:13:01

by Zheng, Lv

Subject: RE: [PATCH 08/13] ACPI/IPMI: Cleanup several acpi_ipmi_device members

> From: [email protected]
> [mailto:[email protected]] On Behalf Of Rafael J. Wysocki
>
> On Friday, July 26, 2013 01:25:12 AM Zheng, Lv wrote:
> > > From: Rafael J. Wysocki [mailto:[email protected]]
> > > Sent: Friday, July 26, 2013 6:26 AM
> > >
> > > On Tuesday, July 23, 2013 04:10:06 PM Lv Zheng wrote:
> > > > This is a trivial patch:
> > > > 1. Deletes a member of the acpi_ipmi_device - smi_data which is not
> > > > actually used.
> > > > 2. Updates a member of the acpi_ipmi_device - pnp_dev which is only
> used
> > > > by dev_warn() invocations, so changes it to struct device.
> > > >
> > > > Signed-off-by: Lv Zheng <[email protected]>
> > > > Reviewed-by: Huang Ying <[email protected]>
> > > > ---
> > > > drivers/acpi/acpi_ipmi.c | 30 ++++++++++++++----------------
> > > > 1 file changed, 14 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > > > 0ee1ea6..7f93ffd 100644
> > > > --- a/drivers/acpi/acpi_ipmi.c
> > > > +++ b/drivers/acpi/acpi_ipmi.c
> > > > @@ -63,11 +63,10 @@ struct acpi_ipmi_device {
> > > > struct list_head tx_msg_list;
> > > > spinlock_t tx_msg_lock;
> > > > acpi_handle handle;
> > > > - struct pnp_dev *pnp_dev;
> > > > + struct device *dev;
> > > > ipmi_user_t user_interface;
> > > > int ipmi_ifnum; /* IPMI interface number */
> > > > long curr_msgid;
> > > > - struct ipmi_smi_info smi_data;
> > > > atomic_t refcnt;
> > > > };
> > > >
> > > > @@ -132,7 +131,7 @@ static struct ipmi_driver_data driver_data = { };
> > > >
> > > > static struct acpi_ipmi_device *
> > > > -ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle
> > > > handle)
> > > > +ipmi_dev_alloc(int iface, struct device *pdev, acpi_handle handle)
> > >
> > > Why is the second arg called pdev?
> >
> > OK, I will change it to dev.
>
> OK, thanks.
>
> > >
> > > > {
> > > > struct acpi_ipmi_device *ipmi_device;
> > > > int err;
> > > > @@ -148,14 +147,13 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info
> > > *smi_data, acpi_handle handle)
> > > > spin_lock_init(&ipmi_device->tx_msg_lock);
> > > >
> > > > ipmi_device->handle = handle;
> > > > - ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> > > > - memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct
> > > ipmi_smi_info));
> > > > + ipmi_device->dev = get_device(pdev);
> > > > ipmi_device->ipmi_ifnum = iface;
> > > >
> > > > err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> > > > ipmi_device, &user);
> > > > if (err) {
> > > > - put_device(smi_data->dev);
> > > > + put_device(pdev);
> > > > kfree(ipmi_device);
> > > > return NULL;
> > > > }
> > > > @@ -175,7 +173,7 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device
> > > > *ipmi_device) static void ipmi_dev_release(struct acpi_ipmi_device
> > > > *ipmi_device) {
> > > > ipmi_destroy_user(ipmi_device->user_interface);
> > > > - put_device(ipmi_device->smi_data.dev);
> > > > + put_device(ipmi_device->dev);
> > > > kfree(ipmi_device);
> > > > }
> > > >
> > > > @@ -263,7 +261,7 @@ static int acpi_format_ipmi_request(struct
> > > acpi_ipmi_msg *tx_msg,
> > > > buffer = (struct acpi_ipmi_buffer *)value;
> > > > /* copy the tx message data */
> > > > if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
> > > > - dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
> > > > + dev_WARN_ONCE(tx_msg->device->dev, true,
> > > > "Unexpected request (msg len %d).\n",
> > > > buffer->length);
> > > > return -EINVAL;
> > > > @@ -382,11 +380,11 @@ static void ipmi_msg_handler(struct
> > > ipmi_recv_msg *msg, void *user_msg_data)
> > > > struct acpi_ipmi_device *ipmi_device = user_msg_data;
> > > > int msg_found = 0;
> > > > struct acpi_ipmi_msg *tx_msg;
> > > > - struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
> > > > + struct device *dev = ipmi_device->dev;
> > > > unsigned long flags;
> > > >
> > > > if (msg->user != ipmi_device->user_interface) {
> > > > - dev_warn(&pnp_dev->dev,
> > > > + dev_warn(dev,
> > > > "Unexpected response is returned. returned user %p,
> expected
> > > user %p\n",
> > > > msg->user, ipmi_device->user_interface);
> > > > goto out_msg;
> > > > @@ -404,7 +402,7 @@ static void ipmi_msg_handler(struct
> ipmi_recv_msg
> > > *msg, void *user_msg_data)
> > > > spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > > >
> > > > if (!msg_found) {
> > > > - dev_warn(&pnp_dev->dev,
> > > > + dev_warn(dev,
> > > > "Unexpected response (msg id %ld) is returned.\n",
> > > > msg->msgid);
> > > > goto out_msg;
> > > > @@ -412,7 +410,7 @@ static void ipmi_msg_handler(struct
> ipmi_recv_msg
> > > > *msg, void *user_msg_data)
> > > >
> > > > /* copy the response data to Rx_data buffer */
> > > > if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
> > > > - dev_WARN_ONCE(&pnp_dev->dev, true,
> > > > + dev_WARN_ONCE(dev, true,
> > > > "Unexpected response (msg len %d).\n",
> > > > msg->msg.data_len);
> > > > goto out_comp;
> > > > @@ -431,7 +429,7 @@ out_msg:
> > > > static void ipmi_register_bmc(int iface, struct device *dev) {
> > > > struct acpi_ipmi_device *ipmi_device, *temp;
> > > > - struct pnp_dev *pnp_dev;
> > > > + struct device *pdev;
> > >
> > > And here?
> >
> > The dev is the parameter of the ipmi_register_bmc(), it is not possible to
> name the "struct ipmi_smi_info " as dev here for this quick fix.
>
> Right. What about smi_dev? Or just use smi_data.dev directly? It's just
> two
> places and shouldn't cause any line wraps to happen.

Sounds good, I'll take your advice. :-)

Thanks and best regards
-Lv

>
> Rafael
>
>
> > > > int err;
> > > > struct ipmi_smi_info smi_data;
> > > > acpi_handle handle;
> > > > @@ -445,11 +443,11 @@ static void ipmi_register_bmc(int iface, struct
> > > device *dev)
> > > > handle = smi_data.addr_info.acpi_info.acpi_handle;
> > > > if (!handle)
> > > > goto err_ref;
> > > > - pnp_dev = to_pnp_dev(smi_data.dev);
> > > > + pdev = smi_data.dev;
> > > >
> > > > - ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> > > > + ipmi_device = ipmi_dev_alloc(iface, pdev, handle);
> > > > if (!ipmi_device) {
> > > > - dev_warn(&pnp_dev->dev, "Can't create IPMI user
> interface\n");
> > > > + dev_warn(pdev, "Can't create IPMI user interface\n");
> > > > goto err_ref;
> > > > }
> > > >
> > > >
> > > --
> > > I speak only for myself.
> > > Rafael J. Wysocki, Intel Open Source Technology Center.
> >
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-07-29 01:43:30

by Zheng, Lv

[permalink] [raw]
Subject: RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers

> On Friday, July 26, 2013 10:01 PM Rafael J. Wysocki wrote:
> > On Friday, July 26, 2013 12:47:44 AM Zheng, Lv wrote:
> >
> > > On Friday, July 26, 2013 4:27 AM Rafael J. Wysocki wrote:
> > >
> > > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > > This patch adds reference counting for ACPI operation region handlers
> > > > to fix races caused by the ACPICA address space callback invocations.
> > > >
> > > > ACPICA address space callback invocation is not suitable for Linux
> > > > CONFIG_MODULE=y execution environment. This patch tries to protect
> > > > the address space callbacks by invoking them under a module safe
> > > environment.
> > > > The IPMI address space handler is also upgraded in this patch.
> > > > The acpi_unregister_region() is designed to meet the following
> > > > requirements:
> > > > 1. It acts as a barrier for operation region callbacks - no callback will
> > > > happen after acpi_unregister_region().
> > > > 2. acpi_unregister_region() is safe to be called in module->exit()
> > > > functions.
> > > > Using reference counting rather than module referencing allows such
> > > > benefits to be achieved even when acpi_unregister_region() is called
> > > > in the environments other than module->exit().
> > > > The header file of include/acpi/acpi_bus.h should contain the
> > > > declarations that have references to some ACPICA defined types.
> > > >
> > > > Signed-off-by: Lv Zheng <[email protected]>
> > > > Reviewed-by: Huang Ying <[email protected]>
> > > > ---
> > > > drivers/acpi/acpi_ipmi.c | 16 ++--
> > > > drivers/acpi/osl.c | 224
> > > ++++++++++++++++++++++++++++++++++++++++++++++
> > > > include/acpi/acpi_bus.h | 5 ++
> > > > 3 files changed, 235 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > > > 5f8f495..2a09156 100644
> > > > --- a/drivers/acpi/acpi_ipmi.c
> > > > +++ b/drivers/acpi/acpi_ipmi.c
> > > > @@ -539,20 +539,18 @@ out_ref:
> > > > static int __init acpi_ipmi_init(void) {
> > > > int result = 0;
> > > > - acpi_status status;
> > > >
> > > > if (acpi_disabled)
> > > > return result;
> > > >
> > > > mutex_init(&driver_data.ipmi_lock);
> > > >
> > > > - status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > > > - ACPI_ADR_SPACE_IPMI,
> > > > - &acpi_ipmi_space_handler,
> > > > - NULL, NULL);
> > > > - if (ACPI_FAILURE(status)) {
> > > > + result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> > > > + &acpi_ipmi_space_handler,
> > > > + NULL, NULL);
> > > > + if (result) {
> > > > pr_warn("Can't register IPMI opregion space handle\n");
> > > > - return -EINVAL;
> > > > + return result;
> > > > }
> > > >
> > > > result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> > > > @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
> > > > }
> > > > mutex_unlock(&driver_data.ipmi_lock);
> > > >
> > > > - acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> > > > - ACPI_ADR_SPACE_IPMI,
> > > > - &acpi_ipmi_space_handler);
> > > > + acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
> > > > }
> > > >
> > > > module_init(acpi_ipmi_init);
> > > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index
> > > > 6ab2c35..8398e51 100644
> > > > --- a/drivers/acpi/osl.c
> > > > +++ b/drivers/acpi/osl.c
> > > > @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;
> static
> > > > struct workqueue_struct *kacpi_notify_wq; static struct
> > > > workqueue_struct *kacpi_hotplug_wq;
> > > >
> > > > +struct acpi_region {
> > > > + unsigned long flags;
> > > > +#define ACPI_REGION_DEFAULT 0x01
> > > > +#define ACPI_REGION_INSTALLED 0x02
> > > > +#define ACPI_REGION_REGISTERED 0x04
> > > > +#define ACPI_REGION_UNREGISTERING 0x08
> > > > +#define ACPI_REGION_INSTALLING 0x10
> > >
> > > What about (1UL << 1), (1UL << 2) etc.?
> > >
> > > Also please remove the #defines out of the struct definition.
> >
> > OK.
> >
> > >
> > > > + /*
> > > > + * NOTE: Upgrading All Region Handlers
> > > > + * This flag is only used during the period where not all of the
> > > > + * region handlers are upgraded to the new interfaces.
> > > > + */
> > > > +#define ACPI_REGION_MANAGED 0x80
> > > > + acpi_adr_space_handler handler;
> > > > + acpi_adr_space_setup setup;
> > > > + void *context;
> > > > + /* Invoking references */
> > > > + atomic_t refcnt;
> > >
> > > Actually, why don't you use krefs?
> >
> > If you take a look at other piece of my codes, you'll find there are two
> reasons:
> >
> > 1. I'm using while (atomic_read() > 1) to implement the objects' flushing and
> there is no kref API to do so.
>
> No, there's not any, but you can read kref.refcount directly, can't you?
>
> Moreover, it is not entirely clear to me that doing the while (atomic_read() > 1)
> is actually correct.
>
> > I just think it is not suitable for me to introduce such an API into kref.h and
> start another argument around kref designs in this bug fix patch. :-)
> > I'll start a discussion about kref design using another thread.
>
> You don't need to do that at all.
>
> > 2. I'm using ipmi_dev|msg_release() as a pair of ipmi_dev|msg_alloc(), it's
> kind of atomic_t coding style.
> > If atomic_t is changed to struct kref, I will need to implement two API,
> __ipmi_dev_release() to take a struct kref as parameter and call
> ipmi_dev_release inside it.
> > By not using kref, I needn't write codes to implement such API.
>
> I'm not following you, sorry.
>
> Please just use krefs for reference counting, the same way as you use
> struct list_head for implementing lists. This is the way everyone does
> that in the kernel and that's for a reason.
>
> Unless you do your reference counting under a lock, in which case using
> atomic_t isn't necessary at all and you can use a non-atomic counter.

I'll follow your suggestion of kref.
You can find my concern 2 related stuff in the next revision.
It's trivial.

>
> > > > +};
> > > > +
> > > > +static struct acpi_region
> acpi_regions[ACPI_NUM_PREDEFINED_REGIONS]
> > > = {
> > > > + [ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> > > > + .flags = ACPI_REGION_DEFAULT,
> > > > + },
> > > > + [ACPI_ADR_SPACE_SYSTEM_IO] = {
> > > > + .flags = ACPI_REGION_DEFAULT,
> > > > + },
> > > > + [ACPI_ADR_SPACE_PCI_CONFIG] = {
> > > > + .flags = ACPI_REGION_DEFAULT,
> > > > + },
> > > > + [ACPI_ADR_SPACE_IPMI] = {
> > > > + .flags = ACPI_REGION_MANAGED,
> > > > + },
> > > > +};
> > > > +static DEFINE_MUTEX(acpi_mutex_region);
> > > > +
> > > > /*
> > > > * This list of permanent mappings is for memory that may be accessed
> > > from
> > > > * interrupt context, where we can't do the ioremap().
> > > > @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle
> handle,
> > > u32 type, void *context,
> > > > kfree(hp_work);
> > > > }
> > > > EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> > > > +
> > > > +static bool acpi_region_managed(struct acpi_region *rgn) {
> > > > + /*
> > > > + * NOTE: Default and Managed
> > > > + * We only need to avoid region management on the regions
> managed
> > > > + * by ACPICA (ACPI_REGION_DEFAULT). Currently, we need
> additional
> > > > + * check as many operation region handlers are not upgraded, so
> > > > + * only those known to be safe are managed
> (ACPI_REGION_MANAGED).
> > > > + */
> > > > + return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> > > > + (rgn->flags & ACPI_REGION_MANAGED); }
> > > > +
> > > > +static bool acpi_region_callable(struct acpi_region *rgn) {
> > > > + return (rgn->flags & ACPI_REGION_REGISTERED) &&
> > > > + !(rgn->flags & ACPI_REGION_UNREGISTERING); }
> > > > +
> > > > +static acpi_status
> > > > +acpi_region_default_handler(u32 function,
> > > > + acpi_physical_address address,
> > > > + u32 bit_width, u64 *value,
> > > > + void *handler_context, void *region_context) {
> > > > + acpi_adr_space_handler handler;
> > > > + struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > > + void *context;
> > > > + acpi_status status = AE_NOT_EXIST;
> > > > +
> > > > + mutex_lock(&acpi_mutex_region);
> > > > + if (!acpi_region_callable(rgn) || !rgn->handler) {
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > + return status;
> > > > + }
> > > > +
> > > > + atomic_inc(&rgn->refcnt);
> > > > + handler = rgn->handler;
> > > > + context = rgn->context;
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > + status = handler(function, address, bit_width, value, context,
> > > > + region_context);
> > >
> > > Why don't we call the handler under the mutex?
> > >
> > > What exactly prevents context from becoming NULL before the call above?
> >
> > It's a kind of programming style related concern.
> > IMO, using locks around callback function is a buggy programming style that
> could lead to dead locks.
> > Let me explain this using an example.
> >
> > Object A exports a register/unregister API for other objects.
> > Object B calls A's register/unregister API to register/unregister B's callback.
> > It's likely that object B will hold lock_of_B around unregister/register when
> object B is destroyed/created, the lock_of_B is likely also used inside the
> callback.
>
> Why is it likely to be used inside the callback? Clearly, if a callback is
> executed under a lock, that lock can't be acquired by that callback.

I don't think this touches the real reason why we must not hold a lock in this situation.
So let's ignore this paragraph.

>
> > So when object A holds the lock_of_A around the callback invocation, it leads
> to dead lock since:
> > 1. the locking order for the register/unregister side will be: lock(lock_of_B),
> lock(lock_of_A)
> > 2. the locking order for the callback side will be: lock(lock_of_A),
> lock(lock_of_B)
> > They are in the reversed order!
> >
> IMO, Linux may need to introduce __callback, __api as declarators for the
> functions, and use sparse to enforce this rule, sparse knows if a callback is
> invoked under some locks.
>
> Oh, dear. Yes, sparse knows such things, and so what?

I was thinking sparse could warn on __api-marked function invocations where the __acquire count is not 0; this might be mandatory for high-quality code.
And sparse could also warn on __callback-marked function invocations where the __acquire count is not 0; this should be optional.
But since it is not related to our topic, let's ignore this paragraph.

>
> > In the case of ACPICA space_handlers, as you may know, when an ACPI
> operation region handler is invoked, there will be no lock held inside ACPICA
> (interpreter lock must be freed before executing operation region handlers).
> > So the likelihood of the dead lock is pretty much high here!
>
> Sorry, what are you talking about?
>
> Please let me rephrase my question: What *practical* problems would it lead
> to
> if we executed this particular callback under this particular mutex?
>
> Not *theoretical* in the general theory of everything, *practical* in this
> particular piece of code.
>
> And we are talking about a *global* mutex here, not something object-specific.

I think you have additional replies on this in another email.
Let me reply you there.

>
> > > > + atomic_dec(&rgn->refcnt);
> > > > +
> > > > + return status;
> > > > +}
> > > > +
> > > > +static acpi_status
> > > > +acpi_region_default_setup(acpi_handle handle, u32 function,
> > > > + void *handler_context, void **region_context) {
> > > > + acpi_adr_space_setup setup;
> > > > + struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > > + void *context;
> > > > + acpi_status status = AE_OK;
> > > > +
> > > > + mutex_lock(&acpi_mutex_region);
> > > > + if (!acpi_region_callable(rgn) || !rgn->setup) {
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > + return status;
> > > > + }
> > > > +
> > > > + atomic_inc(&rgn->refcnt);
> > > > + setup = rgn->setup;
> > > > + context = rgn->context;
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > + status = setup(handle, function, context, region_context);
> > >
> > > Can setup drop rgn->refcnt ?
> >
> > The reason is same as the handler, as a setup is also a callback.
>
> Let me rephrase: Is it legitimate for setup to modify rgn->refcnt?
> If so, then why?

Yes, the race is the same as for the handler.
While ACPICA is executing the text segment of the setup function, the module that owns that setup function can still be unloaded, because no lock is held before invoking setup - note that ExitInter also happens around setup invocations.

>
> > >
> > > > + atomic_dec(&rgn->refcnt);
> > > > +
> > > > + return status;
> > > > +}
> > > > +
> > > > +static int __acpi_install_region(struct acpi_region *rgn,
> > > > + acpi_adr_space_type space_id)
> > > > +{
> > > > + int res = 0;
> > > > + acpi_status status;
> > > > + int installing = 0;
> > > > +
> > > > + mutex_lock(&acpi_mutex_region);
> > > > + if (rgn->flags & ACPI_REGION_INSTALLED)
> > > > + goto out_lock;
> > > > + if (rgn->flags & ACPI_REGION_INSTALLING) {
> > > > + res = -EBUSY;
> > > > + goto out_lock;
> > > > + }
> > > > +
> > > > + installing = 1;
> > > > + rgn->flags |= ACPI_REGION_INSTALLING;
> > > > + status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > > space_id,
> > > > + acpi_region_default_handler,
> > > > + acpi_region_default_setup,
> > > > + rgn);
> > > > + rgn->flags &= ~ACPI_REGION_INSTALLING;
> > > > + if (ACPI_FAILURE(status))
> > > > + res = -EINVAL;
> > > > + else
> > > > + rgn->flags |= ACPI_REGION_INSTALLED;
> > > > +
> > > > +out_lock:
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > + if (installing) {
> > > > + if (res)
> > > > + pr_err("Failed to install region %d\n", space_id);
> > > > + else
> > > > + pr_info("Region %d installed\n", space_id);
> > > > + }
> > > > + return res;
> > > > +}
> > > > +
> > > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > > + acpi_adr_space_handler handler,
> > > > + acpi_adr_space_setup setup, void *context) {
> > > > + int res;
> > > > + struct acpi_region *rgn;
> > > > +
> > > > + if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > > + return -EINVAL;
> > > > +
> > > > + rgn = &acpi_regions[space_id];
> > > > + if (!acpi_region_managed(rgn))
> > > > + return -EINVAL;
> > > > +
> > > > + res = __acpi_install_region(rgn, space_id);
> > > > + if (res)
> > > > + return res;
> > > > +
> > > > + mutex_lock(&acpi_mutex_region);
> > > > + if (rgn->flags & ACPI_REGION_REGISTERED) {
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > + return -EBUSY;
> > > > + }
> > > > +
> > > > + rgn->handler = handler;
> > > > + rgn->setup = setup;
> > > > + rgn->context = context;
> > > > + rgn->flags |= ACPI_REGION_REGISTERED;
> > > > + atomic_set(&rgn->refcnt, 1);
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > + pr_info("Region %d registered\n", space_id);
> > > > +
> > > > + return 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(acpi_register_region);
> > > > +
> > > > +void acpi_unregister_region(acpi_adr_space_type space_id) {
> > > > + struct acpi_region *rgn;
> > > > +
> > > > + if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > > + return;
> > > > +
> > > > + rgn = &acpi_regions[space_id];
> > > > + if (!acpi_region_managed(rgn))
> > > > + return;
> > > > +
> > > > + mutex_lock(&acpi_mutex_region);
> > > > + if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > + return;
> > > > + }
> > > > + if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > + return;
> > >
> > > What about
> > >
> > > if ((rgn->flags & ACPI_REGION_UNREGISTERING)
> > > || !(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > mutex_unlock(&acpi_mutex_region);
> > > return;
> > > }
> > >
> >
> > OK.
> >
> > > > + }
> > > > +
> > > > + rgn->flags |= ACPI_REGION_UNREGISTERING;
> > > > + rgn->handler = NULL;
> > > > + rgn->setup = NULL;
> > > > + rgn->context = NULL;
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > + while (atomic_read(&rgn->refcnt) > 1)
> > > > + schedule_timeout_uninterruptible(usecs_to_jiffies(5));
> > >
> > > Wouldn't it be better to use a wait queue here?
> >
> > Yes, I'll try.
>
> By the way, we do we need to do that?

I think you have additional replies on this in another email.
Let me reply you there.

Thanks for commenting.

Best regards
-Lv

>
> > > > + atomic_dec(&rgn->refcnt);
> > > > +
> > > > + mutex_lock(&acpi_mutex_region);
> > > > + rgn->flags &= ~(ACPI_REGION_REGISTERED |
> > > ACPI_REGION_UNREGISTERING);
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > + pr_info("Region %d unregistered\n", space_id); }
> > > > +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> > > > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index
> > > > a2c2fbb..15fad0d 100644
> > > > --- a/include/acpi/acpi_bus.h
> > > > +++ b/include/acpi/acpi_bus.h
> > > > @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void
> > > > *bus) { return 0; }
> > > >
> > > > #endif /* CONFIG_ACPI */
> > > >
> > > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > > + acpi_adr_space_handler handler,
> > > > + acpi_adr_space_setup setup, void *context); void
> > > > +acpi_unregister_region(acpi_adr_space_type space_id);
> > > > +
> > > > #endif /*__ACPI_BUS_H__*/
>
> Thanks,
> Rafael
>
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-29 01:56:47

by Zheng, Lv

[permalink] [raw]
Subject: RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers

> On Friday, July 26, 2013 10:49 PM Rafael J. Wysocki wrote:
> > On Friday, July 26, 2013 01:54:00 AM Zheng, Lv wrote:
> > > On Friday, July 26, 2013 5:29 AM Rafael J. Wysocki wrote:
> > > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > > This patch adds reference counting for ACPI operation region
> > > > handlers to fix races caused by the ACPICA address space callback
> invocations.
> > > >
> > > > ACPICA address space callback invocation is not suitable for Linux
> > > > CONFIG_MODULE=y execution environment.
> > >
> > > Actually, can you please explain to me what *exactly* the problem is?
> >
> > OK. I'll add race explanations in the next revision.
> >
> > The problem is there is no "lock" held inside ACPICA for invoking
> > operation region handlers.
> > Thus races happen between the
> > acpi_remove/install_address_space_handler and the handler/setup
> > callbacks.
>
> I see. Now you're trying to introduce something that would prevent those
> races from happening, right?

Yes. Let me explain this later in this email.

>
> > This is correct per ACPI specification.
> > As if there is interpreter locks held for invoking operation region
> > handlers, the timeout implemented inside the operation region handlers
> > will make all locking facilities (Acquire or Sleep,...) timed out.
> > Please refer to ACPI specification "5.5.2 Control Method Execution":
> > Interpretation of a Control Method is not preemptive, but it can
> > block. When a control method does block, OSPM can initiate or continue
> > the execution of a different control method. A control method can only
> > assume that access to global objects is exclusive for any period the control
> method does not block.
> >
> > So it is pretty much likely that ACPI IO transfers are locked inside
> > the operation region callback implementations.
> > Using locking facility to protect the callback invocation will risk dead locks.
>
> No. If you use a single global lock around all invocations of operation region
> handlers, it won't deadlock, but it will *serialize* things. This means that
> there won't be two handlers executing in parallel. That may or may not be
> bad depending on what those handlers actually do.
>
> Your concern seems to be that if one address space handler is buggy and it
> blocks indefinitely, executing it under such a lock would affect the other address
> space handlers and in my opinion this is a valid concern.

Let me describe it in more detail:

The interpreter runs control methods in the following style according to the ACPI spec.
CM1_Enter -> EnterInter -> CM1_Running -> OpRegion1 -> ExitInter -> EnterInter -> CM1_Running -> ExitInter -> CM1_Exit
CM2_Enter -> EnterInter -> CM2_Running -> OpRegion1 -> ExitInter -> EnterInter -> CM2_Running -> ExitInter -> CM2_Exit

EnterInter: Enter interpreter lock
ExitInter: Leave interpreter lock

Let me introduce two situations:

1. If we hold the global "mutex" before "EnterInter", then no second control method can run, not even those marked "NotSerialized".
If CM1 just polls a hardware flag and CM2 needs to access other hardware IOs to trigger that flag, then nothing can make progress any longer.
This is a practical bug: we have already seen "NotSerialized" ACPI control methods misbehave when the interpreter executes them in a serialized way (kernel parameter "acpi_serialize").

2. If we hold the global "mutex" after "EnterInter" and before OpRegion1, then all IO accesses are serialized.
If something in an IPMI operation region fails due to a timeout, any other system IOs that should happen in parallel will only happen after 5 seconds. This is not an acceptable experience.

>
> So the idea seems to be to add wrappers around
> acpi_install_address_space_handler()
> and acpi_remove_address_space_handler (but I don't see where the latter is
> called after the change?), such that they will know when it is safe to unregister
> the handler. That is simple enough.

That was an obvious bug; the acpi_remove_address_space_handler() call should go between the while (atomic_read() > 1) block and the final atomic_dec().

> However, I'm not sure it is needed in the context of IPMI.

I did it this way just because I needed a quick fix to test the IPMI bug-fix series.
The issue is closely tied to the ACPI interpreter design, and the code should really live inside ACPICA.
Also, there are not only ACPI_ROOT_OBJECT based address space handlers but also non-ACPI_ROOT_OBJECT based ones, and this patch can't protect the latter.

> Your address space
> handler's context is NULL, so even it if is executed after
> acpi_remove_address_space_handler() has been called for it (or in parallel), it
> doesn't depend on anything passed by the caller, so I don't see why the issue
> can't be addressed by a proper synchronization between
> acpi_ipmi_exit() and acpi_ipmi_space_handler().
>
> Clearly, acpi_ipmi_exit() should wait for all already running instances of
> acpi_ipmi_space_handler() to complete and all acpi_ipmi_space_handler()
> instances started after acpi_ipmi_exit() has been called must return
> immediately.
>
> I would imagine an algorithm like this:
>
> acpi_ipmi_exit()
> 1. Take "address space handler lock".
> 2. Set "unregistering address space handler" flag.
> 3. Check if "count of currently running handlers" is 0. If so,
> call acpi_remove_address_space_handler(), drop the lock (possibly clear
> the
> flag) and return.
> 4. Otherwise drop the lock and go to sleep in "address space handler wait
> queue".
> 5. When woken up, take "address space handler lock" and go to 3.
>
> acpi_ipmi_space_handler()
> 1. Take "address space handler lock".
> 2. Check "unregistering address space handler" flag. If set, drop the lock
> and return.
> 3. Increment "count of currently running handlers".
> 4. Drop the lock.
> 5. Do your work.
> 6. Take "address space handler lock".
> 7. Decrement "count of currently running handlers" and if 0, signal the
> tasks waiting on it to wake up.
> 8. Drop the lock.

Yes, that can also work, but then the fix goes inside IPMI.
And I agree that the code should not appear in the IPMI context, since the issue is closely tied to the ACPI interpreter.
What if we stop doing further work on this patch and just mark it as an RFC or a test patch, for information purposes?
It is only useful for testers.

Thanks and best regards
-Lv

>
> Thanks,
> Rafael
>
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.