2012-11-23 17:50:50

by Vasilis Liaskovitis

[permalink] [raw]
Subject: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

As discussed in https://patchwork.kernel.org/patch/1581581/
the driver core remove function needs to always succeed. This means we need
to know that the device can be successfully removed before acpi_bus_trim /
acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
or SCI-initiated eject of memory devices fail e.g with:
echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

since the ACPI core goes ahead and ejects the device regardless of whether the
the memory is still in use or not.

For this reason a new acpi_device operation called prepare_remove is introduced.
This operation should be registered for acpi devices whose removal (from kernel
perspective) can fail. Memory devices fall in this category.

acpi_bus_remove() is changed to handle removal in 2 steps:
- preparation for removal i.e. perform part of removal that can fail. Should
succeed for device and all its children.
- if above step was successfull, proceed to actual device removal

With this patchset, only acpi memory devices use the new prepare_remove
device operation. The actual memory removal (VM-related offline and other memory
cleanups) is moved to prepare_remove. The old remove operation just cleans up
the acpi structures. Directly ejecting PNP0C80 memory devices works safely. I
haven't tested yet with an ACPI container which contains memory devices.

Note that unbinding the acpi driver from a memory device with:
echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind

will no longer try to remove the memory. This is in compliance with normal
unbind driver core semantics, see the discussion in v2 of this patchset:
https://lkml.org/lkml/2012/11/16/649

After a successful unbind of the driver:
- OSPM ejects of the memory device cannot proceed, as acpi_eject_store will
return -ENODEV on missing driver.
- SCI ejects of the memory device also cannot proceed, as they will also get
a "driver data is NULL" error.
So the memory can continue to be used safely after unbind.

Patchset based on Rafael's linux-pm/linux-next (commit 78c38651).
Comments welcome.

v2->v3:
- remove driver core changes. Only acpi core changes needed. Unbind semantics
follow driver core rules. Unbind does not remove memory.
- new patch to set enable bit in order to proceed with ejects on driver
re-binding scenario.

v1->v2:
- new patch to introduce bus_type prepare_remove callback. Needed to prepare
removal on driver unbinding from device-driver core.
- v1 patches 1 and 2 simplified and merged in one. acpi_bus_trim does not require
argument changes.

Vasilis Liaskovitis (3):
acpi: Introduce prepare_remove operation in acpi_device_ops
acpi_memhotplug: Add prepare_remove operation
acpi_memhotplug: Allow eject to proceed on rebind scenario

drivers/acpi/acpi_memhotplug.c | 21 +++++++++++++++++----
drivers/acpi/scan.c | 9 ++++++++-
include/acpi/acpi_bus.h | 2 ++
3 files changed, 27 insertions(+), 5 deletions(-)

--
1.7.9


2012-11-23 17:50:51

by Vasilis Liaskovitis

[permalink] [raw]
Subject: [RFC PATCH v3 1/3] acpi: Introduce prepare_remove operation in acpi_device_ops

This function should be registered for devices that need to execute some
non-acpi related action in order to be safely removed. If this function
returns zero, the acpi core can continue with removing the device.

Make acpi_bus_remove call the device-specific prepare_remove callback before
removing the device. If prepare_remove fails, the removal is aborted.

Signed-off-by: Vasilis Liaskovitis <[email protected]>
---
drivers/acpi/scan.c | 9 ++++++++-
include/acpi/acpi_bus.h | 2 ++
2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 8c4ac6d..e1c1d5d 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1380,10 +1380,16 @@ static int acpi_device_set_context(struct acpi_device *device)

static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
{
+ int ret = 0;
if (!dev)
return -EINVAL;

dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
+
+ if (dev->driver && dev->driver->ops.prepare_remove)
+ ret = dev->driver->ops.prepare_remove(dev);
+ if (ret)
+ return ret;
device_release_driver(&dev->dev);

if (!rmdevice)
@@ -1702,7 +1708,8 @@ int acpi_bus_trim(struct acpi_device *start, int rmdevice)
err = acpi_bus_remove(child, rmdevice);
else
err = acpi_bus_remove(child, 1);
-
+ if (err)
+ return err;
continue;
}

diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 7ced5dc..9d94a55 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -94,6 +94,7 @@ typedef int (*acpi_op_start) (struct acpi_device * device);
typedef int (*acpi_op_bind) (struct acpi_device * device);
typedef int (*acpi_op_unbind) (struct acpi_device * device);
typedef void (*acpi_op_notify) (struct acpi_device * device, u32 event);
+typedef int (*acpi_op_prepare_remove) (struct acpi_device *device);

struct acpi_bus_ops {
u32 acpi_op_add:1;
@@ -107,6 +108,7 @@ struct acpi_device_ops {
acpi_op_bind bind;
acpi_op_unbind unbind;
acpi_op_notify notify;
+ acpi_op_prepare_remove prepare_remove;
};

#define ACPI_DRIVER_ALL_NOTIFY_EVENTS 0x1 /* system AND device events */
--
1.7.9

2012-11-23 17:50:53

by Vasilis Liaskovitis

[permalink] [raw]
Subject: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

Consider the following sequence of operations for a hotplugged memory device:

1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
2. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/bind
3. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

The driver is successfully re-bound to the device in step 2. However step 3 will
not attempt to remove the memory. This is because the acpi_memory_info enabled
bit for the newly bound driver has not been set to 1. This bit needs to be set
in the case where the memory is already used by the kernel (add_memory returns
-EEXIST)

Setting the enabled bit in this case (in acpi_memory_enable_device) makes the
driver function properly after a rebind of the driver i.e. eject operation
attempts to remove memory after a successful rebind.

I am not sure if this breaks some other usage of the enabled bit (see commit
65479472). When is it possible for the memory to be in use by the kernel but
not managed by the acpi driver, apart from a driver unbind scenario?

Perhaps the patch is not needed, depending on expected semantics of re-binding.
Is the newly bound driver supposed to manage the device, if it was earlier
managed by the same driver?

This patch is only specific to this scenario, and can be dropped from the patch
series if needed.

Signed-off-by: Vasilis Liaskovitis <[email protected]>
---
drivers/acpi/acpi_memhotplug.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index d0cfbd9..0562cb4 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -271,12 +271,11 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
continue;
}

- if (!result)
- info->enabled = 1;
/*
* Add num_enable even if add_memory() returns -EEXIST, so the
* device is bound to this driver.
*/
+ info->enabled = 1;
num_enabled++;
}
if (!num_enabled) {
--
1.7.9

2012-11-23 17:51:38

by Vasilis Liaskovitis

[permalink] [raw]
Subject: [RFC PATCH v3 2/3] acpi_memhotplug: Add prepare_remove operation

Offlining and removal of memory is now done in the prepare_remove callback,
not in the remove callback.

The prepare_remove callback will be called when trying to remove a memory device
with the following ways:

1. send eject request by SCI
2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

Note that unbinding the acpi driver from a memory device with:
echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind

will no longer try to remove the memory. This is in compliance with normal
unbind driver core semantics, see the discussion in v2 of this patchset:
https://lkml.org/lkml/2012/11/16/649

After a successful unbind of the driver:
- OSPM ejects of the memory device cannot proceed, as acpi_eject_store will
return -ENODEV on missing driver.
- SCI ejects of the memory device also cannot proceed, as they will also get
a "driver data is NULL" error.
So the memory can continue to be used safely after unbind.

Signed-off-by: Vasilis Liaskovitis <[email protected]>
---
drivers/acpi/acpi_memhotplug.c | 18 ++++++++++++++++--
1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index eb30e5a..d0cfbd9 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -55,6 +55,7 @@ MODULE_LICENSE("GPL");

static int acpi_memory_device_add(struct acpi_device *device);
static int acpi_memory_device_remove(struct acpi_device *device, int type);
+static int acpi_memory_device_prepare_remove(struct acpi_device *device);

static const struct acpi_device_id memory_device_ids[] = {
{ACPI_MEMORY_DEVICE_HID, 0},
@@ -69,6 +70,7 @@ static struct acpi_driver acpi_memory_device_driver = {
.ops = {
.add = acpi_memory_device_add,
.remove = acpi_memory_device_remove,
+ .prepare_remove = acpi_memory_device_prepare_remove,
},
};

@@ -448,6 +450,20 @@ static int acpi_memory_device_add(struct acpi_device *device)
static int acpi_memory_device_remove(struct acpi_device *device, int type)
{
struct acpi_memory_device *mem_device = NULL;
+
+ if (!device || !acpi_driver_data(device))
+ return -EINVAL;
+
+ mem_device = acpi_driver_data(device);
+
+ acpi_memory_device_free(mem_device);
+
+ return 0;
+}
+
+static int acpi_memory_device_prepare_remove(struct acpi_device *device)
+{
+ struct acpi_memory_device *mem_device = NULL;
int result;

if (!device || !acpi_driver_data(device))
@@ -459,8 +475,6 @@ static int acpi_memory_device_remove(struct acpi_device *device, int type)
if (result)
return result;

- acpi_memory_device_free(mem_device);
-
return 0;
}

--
1.7.9

2012-11-24 16:20:54

by Wen Congyang

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

At 2012/11/24 1:50, Vasilis Liaskovitis Wrote:
> Consider the following sequence of operations for a hotplugged memory device:
>
> 1. echo "PNP0C80:XX"> /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> 2. echo "PNP0C80:XX"> /sys/bus/acpi/drivers/acpi_memhotplug/bind
> 3. echo 1>/sys/bus/pci/devices/PNP0C80:XX/eject
>
> The driver is successfully re-bound to the device in step 2. However step 3 will
> not attempt to remove the memory. This is because the acpi_memory_info enabled
> bit for the newly bound driver has not been set to 1. This bit needs to be set
> in the case where the memory is already used by the kernel (add_memory returns
> -EEXIST)

Hmm, I think the reason is that we don't offline/remove memory when
unbinding it
from the driver. I have sent a patch to fix this problem, and this patch
is in
pm tree now. With this patch, we will offline/remove memory when
unbinding it from
the drriver.

Consider the following sequence of operations for a hotplugged memory
device:

1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

If we don't offline/remove the memory, we have no chance to do it in
step 2. After
step2, the memory is used by the kernel, but we have powered off it. It
is very
dangerous.

So this patch is unnecessary now.

Thanks
Wen Congyang

>
> Setting the enabled bit in this case (in acpi_memory_enable_device) makes the
> driver function properly after a rebind of the driver i.e. eject operation
> attempts to remove memory after a successful rebind.
>
> I am not sure if this breaks some other usage of the enabled bit (see commit
> 65479472). When is it possible for the memory to be in use by the kernel but
> not managed by the acpi driver, apart from a driver unbind scenario?
>
> Perhaps the patch is not needed, depending on expected semantics of re-binding.
> Is the newly bound driver supposed to manage the device, if it was earlier
> managed by the same driver?
>
> This patch is only specific to this scenario, and can be dropped from the patch
> series if needed.
>
> Signed-off-by: Vasilis Liaskovitis<[email protected]>
> ---
> drivers/acpi/acpi_memhotplug.c | 3 +--
> 1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index d0cfbd9..0562cb4 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -271,12 +271,11 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
> continue;
> }
>
> - if (!result)
> - info->enabled = 1;
> /*
> * Add num_enable even if add_memory() returns -EEXIST, so the
> * device is bound to this driver.
> */
> + info->enabled = 1;
> num_enabled++;
> }
> if (!num_enabled) {

2012-11-24 16:23:52

by Wen Congyang

[permalink] [raw]
Subject: Re: [RFC PATCH v3 2/3] acpi_memhotplug: Add prepare_remove operation

At 2012/11/24 1:50, Vasilis Liaskovitis Wrote:
> Offlining and removal of memory is now done in the prepare_remove callback,
> not in the remove callback.
>
> The prepare_remove callback will be called when trying to remove a memory device
> with the following ways:
>
> 1. send eject request by SCI
> 2. echo 1>/sys/bus/pci/devices/PNP0C80:XX/eject
>
> Note that unbinding the acpi driver from a memory device with:
> echo "PNP0C80:XX"> /sys/bus/acpi/drivers/acpi_memhotplug/unbind
>
> will no longer try to remove the memory. This is in compliance with normal
> unbind driver core semantics, see the discussion in v2 of this patchset:
> https://lkml.org/lkml/2012/11/16/649

If we don't remove it when unbinding it, it may cause kernel panicked.

I have explained in another mail.

Thanks
Wen Congyang

>
> After a successful unbind of the driver:
> - OSPM ejects of the memory device cannot proceed, as acpi_eject_store will
> return -ENODEV on missing driver.
> - SCI ejects of the memory device also cannot proceed, as they will also get
> a "driver data is NULL" error.
> So the memory can continue to be used safely after unbind.
>
> Signed-off-by: Vasilis Liaskovitis<[email protected]>
> ---
> drivers/acpi/acpi_memhotplug.c | 18 ++++++++++++++++--
> 1 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index eb30e5a..d0cfbd9 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -55,6 +55,7 @@ MODULE_LICENSE("GPL");
>
> static int acpi_memory_device_add(struct acpi_device *device);
> static int acpi_memory_device_remove(struct acpi_device *device, int type);
> +static int acpi_memory_device_prepare_remove(struct acpi_device *device);
>
> static const struct acpi_device_id memory_device_ids[] = {
> {ACPI_MEMORY_DEVICE_HID, 0},
> @@ -69,6 +70,7 @@ static struct acpi_driver acpi_memory_device_driver = {
> .ops = {
> .add = acpi_memory_device_add,
> .remove = acpi_memory_device_remove,
> + .prepare_remove = acpi_memory_device_prepare_remove,
> },
> };
>
> @@ -448,6 +450,20 @@ static int acpi_memory_device_add(struct acpi_device *device)
> static int acpi_memory_device_remove(struct acpi_device *device, int type)
> {
> struct acpi_memory_device *mem_device = NULL;
> +
> + if (!device || !acpi_driver_data(device))
> + return -EINVAL;
> +
> + mem_device = acpi_driver_data(device);
> +
> + acpi_memory_device_free(mem_device);
> +
> + return 0;
> +}
> +
> +static int acpi_memory_device_prepare_remove(struct acpi_device *device)
> +{
> + struct acpi_memory_device *mem_device = NULL;
> int result;
>
> if (!device || !acpi_driver_data(device))
> @@ -459,8 +475,6 @@ static int acpi_memory_device_remove(struct acpi_device *device, int type)
> if (result)
> return result;
>
> - acpi_memory_device_free(mem_device);
> -
> return 0;
> }
>

2012-11-26 08:36:42

by Vasilis Liaskovitis

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Sun, Nov 25, 2012 at 12:20:47AM +0800, Wen Congyang wrote:
> At 2012/11/24 1:50, Vasilis Liaskovitis Wrote:
> > Consider the following sequence of operations for a hotplugged memory device:
> >
> > 1. echo "PNP0C80:XX"> /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > 2. echo "PNP0C80:XX"> /sys/bus/acpi/drivers/acpi_memhotplug/bind
> > 3. echo 1>/sys/bus/pci/devices/PNP0C80:XX/eject
> >
> > The driver is successfully re-bound to the device in step 2. However step 3 will
> > not attempt to remove the memory. This is because the acpi_memory_info enabled
> > bit for the newly bound driver has not been set to 1. This bit needs to be set
> > in the case where the memory is already used by the kernel (add_memory returns
> > -EEXIST)
>
> Hmm, I think the reason is that we don't offline/remove memory when
> unbinding it
> from the driver. I have sent a patch to fix this problem, and this patch
> is in
> pm tree now. With this patch, we will offline/remove memory when
> unbinding it from
> the drriver.

ok. Which patch is this? Does it require driver-core changes?

>
> Consider the following sequence of operations for a hotplugged memory
> device:
>
> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>
> If we don't offline/remove the memory, we have no chance to do it in
> step 2. After
> step2, the memory is used by the kernel, but we have powered off it. It
> is very
> dangerous.

How does power-off happen after unbind? acpi_eject_store checks for existing
driver before taking any action:

#ifndef FORCE_EJECT
if (acpi_device->driver == NULL) {
ret = -ENODEV;
goto err;
}
#endif

FORCE_EJECT is not defined afaict, so the function returns without scheduling
acpi_bus_hot_remove_device. Is there another code path that calls power-off?

thanks,

- Vasilis

2012-11-26 09:52:18

by Wen Congyang

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

At 11/26/2012 04:36 PM, Vasilis Liaskovitis Wrote:
> On Sun, Nov 25, 2012 at 12:20:47AM +0800, Wen Congyang wrote:
>> At 2012/11/24 1:50, Vasilis Liaskovitis Wrote:
>>> Consider the following sequence of operations for a hotplugged memory device:
>>>
>>> 1. echo "PNP0C80:XX"> /sys/bus/acpi/drivers/acpi_memhotplug/unbind
>>> 2. echo "PNP0C80:XX"> /sys/bus/acpi/drivers/acpi_memhotplug/bind
>>> 3. echo 1>/sys/bus/pci/devices/PNP0C80:XX/eject
>>>
>>> The driver is successfully re-bound to the device in step 2. However step 3 will
>>> not attempt to remove the memory. This is because the acpi_memory_info enabled
>>> bit for the newly bound driver has not been set to 1. This bit needs to be set
>>> in the case where the memory is already used by the kernel (add_memory returns
>>> -EEXIST)
>>
>> Hmm, I think the reason is that we don't offline/remove memory when
>> unbinding it
>> from the driver. I have sent a patch to fix this problem, and this patch
>> is in
>> pm tree now. With this patch, we will offline/remove memory when
>> unbinding it from
>> the drriver.
>
> ok. Which patch is this? Does it require driver-core changes?

https://lkml.org/lkml/2012/11/15/21

Patch 1-6 is in pm tree now.

>
>>
>> Consider the following sequence of operations for a hotplugged memory
>> device:
>>
>> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
>> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>
>> If we don't offline/remove the memory, we have no chance to do it in
>> step 2. After
>> step2, the memory is used by the kernel, but we have powered off it. It
>> is very
>> dangerous.
>
> How does power-off happen after unbind? acpi_eject_store checks for existing
> driver before taking any action:
>
> #ifndef FORCE_EJECT
> if (acpi_device->driver == NULL) {
> ret = -ENODEV;
> goto err;
> }
> #endif
>
> FORCE_EJECT is not defined afaict, so the function returns without scheduling
> acpi_bus_hot_remove_device. Is there another code path that calls power-off?

Consider the following case:

We hotremove the memory device by SCI and unbind it from the driver at the same time:

CPUa CPUb
acpi_memory_device_notify()
unbind it from the driver
acpi_bus_hot_remove_device()

Thanks
Wen Congyang

>
> thanks,
>
> - Vasilis
>

2012-11-27 00:18:46

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 1/3] acpi: Introduce prepare_remove operation in acpi_device_ops

On Fri, 2012-11-23 at 18:50 +0100, Vasilis Liaskovitis wrote:
> This function should be registered for devices that need to execute some
> non-acpi related action in order to be safely removed. If this function
> returns zero, the acpi core can continue with removing the device.
>
> Make acpi_bus_remove call the device-specific prepare_remove callback before
> removing the device. If prepare_remove fails, the removal is aborted.
>
> Signed-off-by: Vasilis Liaskovitis <[email protected]>
> ---
> drivers/acpi/scan.c | 9 ++++++++-
> include/acpi/acpi_bus.h | 2 ++
> 2 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 8c4ac6d..e1c1d5d 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1380,10 +1380,16 @@ static int acpi_device_set_context(struct acpi_device *device)
>
> static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
> {
> + int ret = 0;
> if (!dev)
> return -EINVAL;
>
> dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
> +
> + if (dev->driver && dev->driver->ops.prepare_remove)
> + ret = dev->driver->ops.prepare_remove(dev);
> + if (ret)
> + return ret;

Hi Vasilis,

The above code should be like below. Then you do not need to initialize
ret, either. Please also add some comments explaining about
prepare_remove can fail, but remove cannot.

if (dev->driver && dev->driver->ops.prepare_remove) {
ret = dev->driver->ops.prepare_remove(dev);
if (ret)
return ret;
}

> device_release_driver(&dev->dev);
>
> if (!rmdevice)
> @@ -1702,7 +1708,8 @@ int acpi_bus_trim(struct acpi_device *start, int rmdevice)
> err = acpi_bus_remove(child, rmdevice);
> else
> err = acpi_bus_remove(child, 1);
> -
> + if (err)
> + return err;
> continue;
> }
>
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index 7ced5dc..9d94a55 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -94,6 +94,7 @@ typedef int (*acpi_op_start) (struct acpi_device * device);
> typedef int (*acpi_op_bind) (struct acpi_device * device);
> typedef int (*acpi_op_unbind) (struct acpi_device * device);
> typedef void (*acpi_op_notify) (struct acpi_device * device, u32 event);
> +typedef int (*acpi_op_prepare_remove) (struct acpi_device *device);
>
> struct acpi_bus_ops {
> u32 acpi_op_add:1;
> @@ -107,6 +108,7 @@ struct acpi_device_ops {
> acpi_op_bind bind;
> acpi_op_unbind unbind;
> acpi_op_notify notify;
> + acpi_op_prepare_remove prepare_remove;

I'd prefer pre_remove, which indicates this interface is called before
remove. prepare_remove sounds as if it only performs preparation, which
may be misleading.

BTW, Rafael mentioned we should avoid extending ACPI driver's
interface... But I do not have other idea, either.


Thanks,
-Toshi



> };
>
> #define ACPI_DRIVER_ALL_NOTIFY_EVENTS 0x1 /* system AND device events */

2012-11-27 00:27:29

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

> >> Consider the following sequence of operations for a hotplugged memory
> >> device:
> >>
> >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> >>
> >> If we don't offline/remove the memory, we have no chance to do it in
> >> step 2. After
> >> step2, the memory is used by the kernel, but we have powered off it. It
> >> is very
> >> dangerous.
> >
> > How does power-off happen after unbind? acpi_eject_store checks for existing
> > driver before taking any action:
> >
> > #ifndef FORCE_EJECT
> > if (acpi_device->driver == NULL) {
> > ret = -ENODEV;
> > goto err;
> > }
> > #endif
> >
> > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
>
> Consider the following case:
>
> We hotremove the memory device by SCI and unbind it from the driver at the same time:
>
> CPUa CPUb
> acpi_memory_device_notify()
> unbind it from the driver
> acpi_bus_hot_remove_device()

Can we make acpi_bus_remove() to fail if a given acpi_device is not
bound with a driver? If so, can we make the unbind operation to perform
unbind only?

Thanks,
-Toshi

2012-11-27 18:32:52

by Vasilis Liaskovitis

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > >> Consider the following sequence of operations for a hotplugged memory
> > >> device:
> > >>
> > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > >>
> > >> If we don't offline/remove the memory, we have no chance to do it in
> > >> step 2. After
> > >> step2, the memory is used by the kernel, but we have powered off it. It
> > >> is very
> > >> dangerous.
> > >
> > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > driver before taking any action:
> > >
> > > #ifndef FORCE_EJECT
> > > if (acpi_device->driver == NULL) {
> > > ret = -ENODEV;
> > > goto err;
> > > }
> > > #endif
> > >
> > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> >
> > Consider the following case:
> >
> > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> >
> > CPUa CPUb
> > acpi_memory_device_notify()
> > unbind it from the driver
> > acpi_bus_hot_remove_device()
>
> Can we make acpi_bus_remove() to fail if a given acpi_device is not
> bound with a driver? If so, can we make the unbind operation to perform
> unbind only?

acpi_bus_remove_device could check if the driver is present, and return -ENODEV
if it's not present (dev->driver == NULL).

But there can still be a race between an eject and an unbind operation happening
simultaneously. This seems like a general problem to me i.e. not specific to an
acpi memory device. How do we ensure an eject does not race with a driver unbind
for other acpi devices?

Is there a per-device lock in acpi-core or device-core that can prevent this from
happening? Driver core does a device_lock(dev) on all operations, but this is
probably not grabbed on SCI-initiated acpi ejects.

thanks,

- Vasilis

2012-11-27 18:36:36

by Vasilis Liaskovitis

[permalink] [raw]
Subject: Re: [RFC PATCH v3 1/3] acpi: Introduce prepare_remove operation in acpi_device_ops

Hi Toshi,

On Mon, Nov 26, 2012 at 05:10:21PM -0700, Toshi Kani wrote:
> On Fri, 2012-11-23 at 18:50 +0100, Vasilis Liaskovitis wrote:
> > This function should be registered for devices that need to execute some
> > non-acpi related action in order to be safely removed. If this function
> > returns zero, the acpi core can continue with removing the device.
> >
> > Make acpi_bus_remove call the device-specific prepare_remove callback before
> > removing the device. If prepare_remove fails, the removal is aborted.
> >
> > Signed-off-by: Vasilis Liaskovitis <[email protected]>
> > ---
> > drivers/acpi/scan.c | 9 ++++++++-
> > include/acpi/acpi_bus.h | 2 ++
> > 2 files changed, 10 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > index 8c4ac6d..e1c1d5d 100644
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -1380,10 +1380,16 @@ static int acpi_device_set_context(struct acpi_device *device)
> >
> > static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
> > {
> > + int ret = 0;
> > if (!dev)
> > return -EINVAL;
> >
> > dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
> > +
> > + if (dev->driver && dev->driver->ops.prepare_remove)
> > + ret = dev->driver->ops.prepare_remove(dev);
> > + if (ret)
> > + return ret;
>
> Hi Vasilis,
>
> The above code should be like below. Then you do not need to initialize
> ret, either. Please also add some comments explaining about
> prepare_remove can fail, but remove cannot.
>
> if (dev->driver && dev->driver->ops.prepare_remove) {
> ret = dev->driver->ops.prepare_remove(dev);
> if (ret)
> return ret;
> }

right.

>
> > device_release_driver(&dev->dev);
> >
> > if (!rmdevice)
> > @@ -1702,7 +1708,8 @@ int acpi_bus_trim(struct acpi_device *start, int rmdevice)
> > err = acpi_bus_remove(child, rmdevice);
> > else
> > err = acpi_bus_remove(child, 1);
> > -
> > + if (err)
> > + return err;
> > continue;
> > }
> >
> > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> > index 7ced5dc..9d94a55 100644
> > --- a/include/acpi/acpi_bus.h
> > +++ b/include/acpi/acpi_bus.h
> > @@ -94,6 +94,7 @@ typedef int (*acpi_op_start) (struct acpi_device * device);
> > typedef int (*acpi_op_bind) (struct acpi_device * device);
> > typedef int (*acpi_op_unbind) (struct acpi_device * device);
> > typedef void (*acpi_op_notify) (struct acpi_device * device, u32 event);
> > +typedef int (*acpi_op_prepare_remove) (struct acpi_device *device);
> >
> > struct acpi_bus_ops {
> > u32 acpi_op_add:1;
> > @@ -107,6 +108,7 @@ struct acpi_device_ops {
> > acpi_op_bind bind;
> > acpi_op_unbind unbind;
> > acpi_op_notify notify;
> > + acpi_op_prepare_remove prepare_remove;
>
> I'd prefer pre_remove, which indicates this interface is called before
> remove. prepare_remove sounds as if it only performs preparation, which
> may be misleading.

ok, I 'll use pre_remove from now on.

>
> BTW, Rafael mentioned we should avoid extending ACPI driver's
> interface... But I do not have other idea, either.

If we reach agreement that this is the approach we want, I 'll resend the series.

thanks,

- Vasilis

2012-11-27 22:12:17

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Tue, 2012-11-27 at 19:32 +0100, Vasilis Liaskovitis wrote:
> On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > > >> Consider the following sequence of operations for a hotplugged memory
> > > >> device:
> > > >>
> > > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > >>
> > > >> If we don't offline/remove the memory, we have no chance to do it in
> > > >> step 2. After
> > > >> step2, the memory is used by the kernel, but we have powered off it. It
> > > >> is very
> > > >> dangerous.
> > > >
> > > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > > driver before taking any action:
> > > >
> > > > #ifndef FORCE_EJECT
> > > > if (acpi_device->driver == NULL) {
> > > > ret = -ENODEV;
> > > > goto err;
> > > > }
> > > > #endif
> > > >
> > > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > >
> > > Consider the following case:
> > >
> > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > >
> > > CPUa CPUb
> > > acpi_memory_device_notify()
> > > unbind it from the driver
> > > acpi_bus_hot_remove_device()
> >
> > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > bound with a driver? If so, can we make the unbind operation to perform
> > unbind only?
>
> acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> if it's not present (dev->driver == NULL).
>
> But there can still be a race between an eject and an unbind operation happening
> simultaneously. This seems like a general problem to me i.e. not specific to an
> acpi memory device. How do we ensure an eject does not race with a driver unbind
> for other acpi devices?
>
> Is there a per-device lock in acpi-core or device-core that can prevent this from
> happening? Driver core does a device_lock(dev) on all operations, but this is
> probably not grabbed on SCI-initiated acpi ejects.

Since driver_unbind() calls device_lock(dev->parent) before calling
device_release_driver(), I am wondering if we can call
device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
(i.e. before calling pre_remove) and fails if dev->driver is NULL. The
parent lock is otherwise released after device_release_driver() is done.

Thanks,
-Toshi

2012-11-27 23:13:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 1/3] acpi: Introduce prepare_remove operation in acpi_device_ops

On Monday, November 26, 2012 05:10:21 PM Toshi Kani wrote:
> On Fri, 2012-11-23 at 18:50 +0100, Vasilis Liaskovitis wrote:
> > This function should be registered for devices that need to execute some
> > non-acpi related action in order to be safely removed. If this function
> > returns zero, the acpi core can continue with removing the device.
> >
> > Make acpi_bus_remove call the device-specific prepare_remove callback before
> > removing the device. If prepare_remove fails, the removal is aborted.
> >
> > Signed-off-by: Vasilis Liaskovitis <[email protected]>
> > ---
> > drivers/acpi/scan.c | 9 ++++++++-
> > include/acpi/acpi_bus.h | 2 ++
> > 2 files changed, 10 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > index 8c4ac6d..e1c1d5d 100644
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -1380,10 +1380,16 @@ static int acpi_device_set_context(struct acpi_device *device)
> >
> > static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
> > {
> > + int ret = 0;
> > if (!dev)
> > return -EINVAL;
> >
> > dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
> > +
> > + if (dev->driver && dev->driver->ops.prepare_remove)
> > + ret = dev->driver->ops.prepare_remove(dev);
> > + if (ret)
> > + return ret;
>
> Hi Vasilis,
>
> The above code should be like below. Then you do not need to initialize
> ret, either. Please also add some comments explaining about
> prepare_remove can fail, but remove cannot.
>
> if (dev->driver && dev->driver->ops.prepare_remove) {
> ret = dev->driver->ops.prepare_remove(dev);
> if (ret)
> return ret;
> }
>
> > device_release_driver(&dev->dev);
> >
> > if (!rmdevice)
> > @@ -1702,7 +1708,8 @@ int acpi_bus_trim(struct acpi_device *start, int rmdevice)
> > err = acpi_bus_remove(child, rmdevice);
> > else
> > err = acpi_bus_remove(child, 1);
> > -
> > + if (err)
> > + return err;
> > continue;
> > }
> >
> > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> > index 7ced5dc..9d94a55 100644
> > --- a/include/acpi/acpi_bus.h
> > +++ b/include/acpi/acpi_bus.h
> > @@ -94,6 +94,7 @@ typedef int (*acpi_op_start) (struct acpi_device * device);
> > typedef int (*acpi_op_bind) (struct acpi_device * device);
> > typedef int (*acpi_op_unbind) (struct acpi_device * device);
> > typedef void (*acpi_op_notify) (struct acpi_device * device, u32 event);
> > +typedef int (*acpi_op_prepare_remove) (struct acpi_device *device);
> >
> > struct acpi_bus_ops {
> > u32 acpi_op_add:1;
> > @@ -107,6 +108,7 @@ struct acpi_device_ops {
> > acpi_op_bind bind;
> > acpi_op_unbind unbind;
> > acpi_op_notify notify;
> > + acpi_op_prepare_remove prepare_remove;
>
> I'd prefer pre_remove, which indicates this interface is called before
> remove. prepare_remove sounds as if it only performs preparation, which
> may be misleading.
>
> BTW, Rafael mentioned we should avoid extending ACPI driver's
> interface... But I do not have other idea, either.

It's fine in this particular case, since it looks like it would be difficult
to do that differently with what we have at the moment.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-27 23:36:40

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Tuesday, November 27, 2012 03:03:47 PM Toshi Kani wrote:
> On Tue, 2012-11-27 at 19:32 +0100, Vasilis Liaskovitis wrote:
> > On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > > > >> Consider the following sequence of operations for a hotplugged memory
> > > > >> device:
> > > > >>
> > > > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > >>
> > > > >> If we don't offline/remove the memory, we have no chance to do it in
> > > > >> step 2. After
> > > > >> step2, the memory is used by the kernel, but we have powered off it. It
> > > > >> is very
> > > > >> dangerous.
> > > > >
> > > > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > > > driver before taking any action:
> > > > >
> > > > > #ifndef FORCE_EJECT
> > > > > if (acpi_device->driver == NULL) {
> > > > > ret = -ENODEV;
> > > > > goto err;
> > > > > }
> > > > > #endif
> > > > >
> > > > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > > >
> > > > Consider the following case:
> > > >
> > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > >
> > > > CPUa CPUb
> > > > acpi_memory_device_notify()
> > > > unbind it from the driver
> > > > acpi_bus_hot_remove_device()
> > >
> > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > bound with a driver? If so, can we make the unbind operation to perform
> > > unbind only?
> >
> > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > if it's not present (dev->driver == NULL).
> >
> > But there can still be a race between an eject and an unbind operation happening
> > simultaneously. This seems like a general problem to me i.e. not specific to an
> > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > for other acpi devices?
> >
> > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > happening? Driver core does a device_lock(dev) on all operations, but this is
> > probably not grabbed on SCI-initiated acpi ejects.
>
> Since driver_unbind() calls device_lock(dev->parent) before calling
> device_release_driver(), I am wondering if we can call
> device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> (i.e. before calling pre_remove) and fails if dev->driver is NULL. The
> parent lock is otherwise released after device_release_driver() is done.

I would be careful. You may introduce some subtle locking-related issues
this way.

Besides, there may be an alternative approach to all this. For example,
what if we don't remove struct device objects on eject? The ACPI handles
associated with them don't go away in that case after all, do they?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-28 11:05:32

by Hanjun Guo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> As discussed in https://patchwork.kernel.org/patch/1581581/
> the driver core remove function needs to always succeed. This means we need
> to know that the device can be successfully removed before acpi_bus_trim /
> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> or SCI-initiated eject of memory devices fail e.g with:
> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>
> since the ACPI core goes ahead and ejects the device regardless of whether the
> the memory is still in use or not.
>
> For this reason a new acpi_device operation called prepare_remove is introduced.
> This operation should be registered for acpi devices whose removal (from kernel
> perspective) can fail. Memory devices fall in this category.
>
> acpi_bus_remove() is changed to handle removal in 2 steps:
> - preparation for removal i.e. perform part of removal that can fail. Should
> succeed for device and all its children.
> - if above step was successfull, proceed to actual device removal

Hi Vasilis,
We met the same problem when we doing computer node hotplug, It is a good idea
to introduce prepare_remove before actual device removal.

I think we could do more in prepare_remove, such as rollback. In most cases, we can
offline most of memory sections except kernel used pages now, should we rollback
and online the memory sections when prepare_remove failed ?

As you may know, the ACPI based hotplug framework we are working on already addressed
this problem, and the way we slove this problem is a bit like yours.

We introduce hp_ops in struct acpi_device_ops:
struct acpi_device_ops {
acpi_op_add add;
acpi_op_remove remove;
acpi_op_start start;
acpi_op_bind bind;
acpi_op_unbind unbind;
acpi_op_notify notify;
#ifdef CONFIG_ACPI_HOTPLUG
struct acpihp_dev_ops *hp_ops;
#endif /* CONFIG_ACPI_HOTPLUG */
};

in hp_ops, we divide the prepare_remove into six small steps, that is:
1) pre_release(): optional step to mark device going to be removed/busy
2) release(): reclaim device from running system
3) post_release(): rollback if cancelled by user or error happened
4) pre_unconfigure(): optional step to solve possible dependency issue
5) unconfigure(): remove devices from running system
6) post_unconfigure(): free resources used by devices

In this way, we can easily rollback if error happens.
How do you think of this solution, any suggestion ? I think we can achieve
a better way for sharing ideas. :)

Thanks
Hanjun Guo

>
> With this patchset, only acpi memory devices use the new prepare_remove
> device operation. The actual memory removal (VM-related offline and other memory
> cleanups) is moved to prepare_remove. The old remove operation just cleans up
> the acpi structures. Directly ejecting PNP0C80 memory devices works safely. I
> haven't tested yet with an ACPI container which contains memory devices.
>
> Note that unbinding the acpi driver from a memory device with:
> echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
>
> will no longer try to remove the memory. This is in compliance with normal
> unbind driver core semantics, see the discussion in v2 of this patchset:
> https://lkml.org/lkml/2012/11/16/649
>
> After a successful unbind of the driver:
> - OSPM ejects of the memory device cannot proceed, as acpi_eject_store will
> return -ENODEV on missing driver.
> - SCI ejects of the memory device also cannot proceed, as they will also get
> a "driver data is NULL" error.
> So the memory can continue to be used safely after unbind.
>
> Patchset based on Rafael's linux-pm/linux-next (commit 78c38651).
> Comments welcome.
>
> v2->v3:
> - remove driver core changes. Only acpi core changes needed. Unbind semantics
> follow driver core rules. Unbind does not remove memory.
> - new patch to set enable bit in order to proceed with ejects on driver
> re-binding scenario.
>
> v1->v2:
> - new patch to introduce bus_type prepare_remove callback. Needed to prepare
> removal on driver unbinding from device-driver core.
> - v1 patches 1 and 2 simplified and merged in one. acpi_bus_trim does not require
> argument changes.
>
> Vasilis Liaskovitis (3):
> acpi: Introduce prepare_remove operation in acpi_device_ops
> acpi_memhotplug: Add prepare_remove operation
> acpi_memhotplug: Allow eject to proceed on rebind scenario
>
> drivers/acpi/acpi_memhotplug.c | 21 +++++++++++++++++----
> drivers/acpi/scan.c | 9 ++++++++-
> include/acpi/acpi_bus.h | 2 ++
> 3 files changed, 27 insertions(+), 5 deletions(-)
>

2012-11-28 16:09:42

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wed, 2012-11-28 at 00:41 +0100, Rafael J. Wysocki wrote:
> On Tuesday, November 27, 2012 03:03:47 PM Toshi Kani wrote:
> > On Tue, 2012-11-27 at 19:32 +0100, Vasilis Liaskovitis wrote:
> > > On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > > > > >> Consider the following sequence of operations for a hotplugged memory
> > > > > >> device:
> > > > > >>
> > > > > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > > >>
> > > > > >> If we don't offline/remove the memory, we have no chance to do it in
> > > > > >> step 2. After
> > > > > >> step2, the memory is used by the kernel, but we have powered off it. It
> > > > > >> is very
> > > > > >> dangerous.
> > > > > >
> > > > > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > > > > driver before taking any action:
> > > > > >
> > > > > > #ifndef FORCE_EJECT
> > > > > > if (acpi_device->driver == NULL) {
> > > > > > ret = -ENODEV;
> > > > > > goto err;
> > > > > > }
> > > > > > #endif
> > > > > >
> > > > > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > > > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > > > >
> > > > > Consider the following case:
> > > > >
> > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > >
> > > > > CPUa CPUb
> > > > > acpi_memory_device_notify()
> > > > > unbind it from the driver
> > > > > acpi_bus_hot_remove_device()
> > > >
> > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > bound with a driver? If so, can we make the unbind operation to perform
> > > > unbind only?
> > >
> > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > if it's not present (dev->driver == NULL).
> > >
> > > But there can still be a race between an eject and an unbind operation happening
> > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > for other acpi devices?
> > >
> > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > probably not grabbed on SCI-initiated acpi ejects.
> >
> > Since driver_unbind() calls device_lock(dev->parent) before calling
> > device_release_driver(), I am wondering if we can call
> > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > (i.e. before calling pre_remove) and fails if dev->driver is NULL. The
> > parent lock is otherwise released after device_release_driver() is done.
>
> I would be careful. You may introduce some subtle locking-related issues
> this way.

Right. This requires careful inspection and testing. As far as the
locking is concerned, I am not keen on using fine grained locking for
hot-plug. It is much simpler and solid if we serialize such operations.

> Besides, there may be an alternative approach to all this. For example,
> what if we don't remove struct device objects on eject? The ACPI handles
> associated with them don't go away in that case after all, do they?

Umm... Sorry, I am not getting your point. The issue is that we need
to be able to fail a request when memory range cannot be off-lined.
Otherwise, we end up ejecting online memory range.

Thanks,
-Toshi

2012-11-28 18:36:12

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wednesday, November 28, 2012 09:01:13 AM Toshi Kani wrote:
> On Wed, 2012-11-28 at 00:41 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, November 27, 2012 03:03:47 PM Toshi Kani wrote:
> > > On Tue, 2012-11-27 at 19:32 +0100, Vasilis Liaskovitis wrote:
> > > > On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > > > > > >> Consider the following sequence of operations for a hotplugged memory
> > > > > > >> device:
> > > > > > >>
> > > > > > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > > > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > > > >>
> > > > > > >> If we don't offline/remove the memory, we have no chance to do it in
> > > > > > >> step 2. After
> > > > > > >> step2, the memory is used by the kernel, but we have powered off it. It
> > > > > > >> is very
> > > > > > >> dangerous.
> > > > > > >
> > > > > > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > > > > > driver before taking any action:
> > > > > > >
> > > > > > > #ifndef FORCE_EJECT
> > > > > > > if (acpi_device->driver == NULL) {
> > > > > > > ret = -ENODEV;
> > > > > > > goto err;
> > > > > > > }
> > > > > > > #endif
> > > > > > >
> > > > > > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > > > > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > > > > >
> > > > > > Consider the following case:
> > > > > >
> > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > >
> > > > > > CPUa CPUb
> > > > > > acpi_memory_device_notify()
> > > > > > unbind it from the driver
> > > > > > acpi_bus_hot_remove_device()
> > > > >
> > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > bound with a driver? If so, can we make the unbind operation to perform
> > > > > unbind only?
> > > >
> > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > if it's not present (dev->driver == NULL).
> > > >
> > > > But there can still be a race between an eject and an unbind operation happening
> > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > for other acpi devices?
> > > >
> > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > probably not grabbed on SCI-initiated acpi ejects.
> > >
> > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > device_release_driver(), I am wondering if we can call
> > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > (i.e. before calling pre_remove) and fails if dev->driver is NULL. The
> > > parent lock is otherwise released after device_release_driver() is done.
> >
> > I would be careful. You may introduce some subtle locking-related issues
> > this way.
>
> Right. This requires careful inspection and testing. As far as the
> locking is concerned, I am not keen on using fine grained locking for
> hot-plug. It is much simpler and solid if we serialize such operations.
>
> > Besides, there may be an alternative approach to all this. For example,
> > what if we don't remove struct device objects on eject? The ACPI handles
> > associated with them don't go away in that case after all, do they?
>
> Umm... Sorry, I am not getting your point. The issue is that we need
> to be able to fail a request when memory range cannot be off-lined.
> Otherwise, we end up ejecting online memory range.

Yes, this is the major one. The minor issue, however, is a race condition
between unbinding a driver from a device and removing the device if I
understand it correctly. Which will go away automatically if the device is
not removed in the first place. Or so I would think. :-)

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-28 18:50:06

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > As discussed in https://patchwork.kernel.org/patch/1581581/
> > the driver core remove function needs to always succeed. This means we need
> > to know that the device can be successfully removed before acpi_bus_trim /
> > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > or SCI-initiated eject of memory devices fail e.g with:
> > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> >
> > since the ACPI core goes ahead and ejects the device regardless of whether the
> > the memory is still in use or not.
> >
> > For this reason a new acpi_device operation called prepare_remove is introduced.
> > This operation should be registered for acpi devices whose removal (from kernel
> > perspective) can fail. Memory devices fall in this category.
> >
> > acpi_bus_remove() is changed to handle removal in 2 steps:
> > - preparation for removal i.e. perform part of removal that can fail. Should
> > succeed for device and all its children.
> > - if above step was successfull, proceed to actual device removal
>
> Hi Vasilis,
> We met the same problem when we doing computer node hotplug, It is a good idea
> to introduce prepare_remove before actual device removal.
>
> I think we could do more in prepare_remove, such as rollback. In most cases, we can
> offline most of memory sections except kernel used pages now, should we rollback
> and online the memory sections when prepare_remove failed ?

I think hot-plug operation should have all-or-nothing semantics. That
is, an operation should either complete successfully, or rollback to the
original state.

> As you may know, the ACPI based hotplug framework we are working on already addressed
> this problem, and the way we slove this problem is a bit like yours.
>
> We introduce hp_ops in struct acpi_device_ops:
> struct acpi_device_ops {
> acpi_op_add add;
> acpi_op_remove remove;
> acpi_op_start start;
> acpi_op_bind bind;
> acpi_op_unbind unbind;
> acpi_op_notify notify;
> #ifdef CONFIG_ACPI_HOTPLUG
> struct acpihp_dev_ops *hp_ops;
> #endif /* CONFIG_ACPI_HOTPLUG */
> };
>
> in hp_ops, we divide the prepare_remove into six small steps, that is:
> 1) pre_release(): optional step to mark device going to be removed/busy
> 2) release(): reclaim device from running system
> 3) post_release(): rollback if cancelled by user or error happened
> 4) pre_unconfigure(): optional step to solve possible dependency issue
> 5) unconfigure(): remove devices from running system
> 6) post_unconfigure(): free resources used by devices
>
> In this way, we can easily rollback if error happens.
> How do you think of this solution, any suggestion ? I think we can achieve
> a better way for sharing ideas. :)

Yes, sharing idea is good. :) I do not know if we need all 6 steps (I
have not looked at all your changes yet..), but in my mind, a hot-plug
operation should be composed with the following 3 phases.

1. Validate phase - Verify if the request is a supported operation. All
known restrictions are verified at this phase. For instance, if a
hot-remove request involves kernel memory, it is failed in this phase.
Since this phase makes no change, no rollback is necessary to fail.

2. Execute phase - Perform hot-add / hot-remove operation that can be
rolled-back in case of error or cancel.

3. Commit phase - Perform the final hot-add / hot-remove operation that
cannot be rolled-back. No error / cancel is allowed in this phase. For
instance, eject operation is performed at this phase.


Thanks,
-Toshi



2012-11-28 21:11:17

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

> > > > > > > Consider the following case:
> > > > > > >
> > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > >
> > > > > > > CPUa CPUb
> > > > > > > acpi_memory_device_notify()
> > > > > > > unbind it from the driver
> > > > > > > acpi_bus_hot_remove_device()
> > > > > >
> > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > bound with a driver? If so, can we make the unbind operation to perform
> > > > > > unbind only?
> > > > >
> > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > if it's not present (dev->driver == NULL).
> > > > >
> > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > for other acpi devices?
> > > > >
> > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > >
> > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > device_release_driver(), I am wondering if we can call
> > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL. The
> > > > parent lock is otherwise released after device_release_driver() is done.
> > >
> > > I would be careful. You may introduce some subtle locking-related issues
> > > this way.
> >
> > Right. This requires careful inspection and testing. As far as the
> > locking is concerned, I am not keen on using fine grained locking for
> > hot-plug. It is much simpler and solid if we serialize such operations.
> >
> > > Besides, there may be an alternative approach to all this. For example,
> > > what if we don't remove struct device objects on eject? The ACPI handles
> > > associated with them don't go away in that case after all, do they?
> >
> > Umm... Sorry, I am not getting your point. The issue is that we need
> > to be able to fail a request when memory range cannot be off-lined.
> > Otherwise, we end up ejecting online memory range.
>
> Yes, this is the major one. The minor issue, however, is a race condition
> between unbinding a driver from a device and removing the device if I
> understand it correctly. Which will go away automatically if the device is
> not removed in the first place. Or so I would think. :-)

I see. I do not think whether or not the device is removed on eject
makes any difference here. The issue is that after driver_unbind() is
done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
driver (hence, it cannot fail in prepare_remove), and goes ahead to call
_EJ0. If driver_unbind() did off-line the memory, this is OK. However,
it cannot off-line kernel memory ranges. So, we basically need to
either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
during the operation.

Thanks,
-Toshi

2012-11-28 21:36:07

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > Consider the following case:
> > > > > > > >
> > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > >
> > > > > > > > CPUa CPUb
> > > > > > > > acpi_memory_device_notify()
> > > > > > > > unbind it from the driver
> > > > > > > > acpi_bus_hot_remove_device()
> > > > > > >
> > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > bound with a driver? If so, can we make the unbind operation to perform
> > > > > > > unbind only?
> > > > > >
> > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > if it's not present (dev->driver == NULL).
> > > > > >
> > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > for other acpi devices?
> > > > > >
> > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > >
> > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > device_release_driver(), I am wondering if we can call
> > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL. The
> > > > > parent lock is otherwise released after device_release_driver() is done.
> > > >
> > > > I would be careful. You may introduce some subtle locking-related issues
> > > > this way.
> > >
> > > Right. This requires careful inspection and testing. As far as the
> > > locking is concerned, I am not keen on using fine grained locking for
> > > hot-plug. It is much simpler and solid if we serialize such operations.
> > >
> > > > Besides, there may be an alternative approach to all this. For example,
> > > > what if we don't remove struct device objects on eject? The ACPI handles
> > > > associated with them don't go away in that case after all, do they?
> > >
> > > Umm... Sorry, I am not getting your point. The issue is that we need
> > > to be able to fail a request when memory range cannot be off-lined.
> > > Otherwise, we end up ejecting online memory range.
> >
> > Yes, this is the major one. The minor issue, however, is a race condition
> > between unbinding a driver from a device and removing the device if I
> > understand it correctly. Which will go away automatically if the device is
> > not removed in the first place. Or so I would think. :-)
>
> I see. I do not think whether or not the device is removed on eject
> makes any difference here. The issue is that after driver_unbind() is
> done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> _EJ0. If driver_unbind() did off-line the memory, this is OK. However,
> it cannot off-line kernel memory ranges. So, we basically need to
> either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> during the operation.

OK, I see the problem now.

What exactly is triggering the driver_unbind() in this scenario?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-28 21:48:35

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wed, 2012-11-28 at 22:40 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > Consider the following case:
> > > > > > > > >
> > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > >
> > > > > > > > > CPUa CPUb
> > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > unbind it from the driver
> > > > > > > > > acpi_bus_hot_remove_device()
> > > > > > > >
> > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > bound with a driver? If so, can we make the unbind operation to perform
> > > > > > > > unbind only?
> > > > > > >
> > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > if it's not present (dev->driver == NULL).
> > > > > > >
> > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > for other acpi devices?
> > > > > > >
> > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > >
> > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > device_release_driver(), I am wondering if we can call
> > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL. The
> > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > >
> > > > > I would be careful. You may introduce some subtle locking-related issues
> > > > > this way.
> > > >
> > > > Right. This requires careful inspection and testing. As far as the
> > > > locking is concerned, I am not keen on using fine grained locking for
> > > > hot-plug. It is much simpler and solid if we serialize such operations.
> > > >
> > > > > Besides, there may be an alternative approach to all this. For example,
> > > > > what if we don't remove struct device objects on eject? The ACPI handles
> > > > > associated with them don't go away in that case after all, do they?
> > > >
> > > > Umm... Sorry, I am not getting your point. The issue is that we need
> > > > to be able to fail a request when memory range cannot be off-lined.
> > > > Otherwise, we end up ejecting online memory range.
> > >
> > > Yes, this is the major one. The minor issue, however, is a race condition
> > > between unbinding a driver from a device and removing the device if I
> > > understand it correctly. Which will go away automatically if the device is
> > > not removed in the first place. Or so I would think. :-)
> >
> > I see. I do not think whether or not the device is removed on eject
> > makes any difference here. The issue is that after driver_unbind() is
> > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > _EJ0. If driver_unbind() did off-line the memory, this is OK. However,
> > it cannot off-line kernel memory ranges. So, we basically need to
> > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > during the operation.
>
> OK, I see the problem now.
>
> What exactly is triggering the driver_unbind() in this scenario?

User can request driver_unbind() from sysfs as follows. I do not see
much reason why user has to do for memory, though.

echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind


Thanks,
-Toshi


2012-11-28 21:56:30

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wednesday, November 28, 2012 02:40:09 PM Toshi Kani wrote:
> On Wed, 2012-11-28 at 22:40 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > Consider the following case:
> > > > > > > > > >
> > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > >
> > > > > > > > > > CPUa CPUb
> > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > unbind it from the driver
> > > > > > > > > > acpi_bus_hot_remove_device()
> > > > > > > > >
> > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > bound with a driver? If so, can we make the unbind operation to perform
> > > > > > > > > unbind only?
> > > > > > > >
> > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > >
> > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > for other acpi devices?
> > > > > > > >
> > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > >
> > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL. The
> > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > >
> > > > > > I would be careful. You may introduce some subtle locking-related issues
> > > > > > this way.
> > > > >
> > > > > Right. This requires careful inspection and testing. As far as the
> > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > hot-plug. It is much simpler and solid if we serialize such operations.
> > > > >
> > > > > > Besides, there may be an alternative approach to all this. For example,
> > > > > > what if we don't remove struct device objects on eject? The ACPI handles
> > > > > > associated with them don't go away in that case after all, do they?
> > > > >
> > > > > Umm... Sorry, I am not getting your point. The issue is that we need
> > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > Otherwise, we end up ejecting online memory range.
> > > >
> > > > Yes, this is the major one. The minor issue, however, is a race condition
> > > > between unbinding a driver from a device and removing the device if I
> > > > understand it correctly. Which will go away automatically if the device is
> > > > not removed in the first place. Or so I would think. :-)
> > >
> > > I see. I do not think whether or not the device is removed on eject
> > > makes any difference here. The issue is that after driver_unbind() is
> > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > _EJ0. If driver_unbind() did off-line the memory, this is OK. However,
> > > it cannot off-line kernel memory ranges. So, we basically need to
> > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > during the operation.
> >
> > OK, I see the problem now.
> >
> > What exactly is triggering the driver_unbind() in this scenario?
>
> User can request driver_unbind() from sysfs as follows. I do not see
> much reason why user has to do for memory, though.
>
> echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind

This is wrong. Even if we want to permit user space to forcibly unbind
drivers from anything like this, we should at least check for some
situations in which it is plain dangerous. Like in this case. So I think
the above should fail unless we know that the driver won't be necessary
to handle hot-removal of memory.

Alternatively, this may actually try to carry out the hot-removal and only
call driver_unbind() if that succeeds. Whichever is preferable, I'd say.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-28 22:13:19

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wed, 2012-11-28 at 23:01 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 02:40:09 PM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 22:40 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > Consider the following case:
> > > > > > > > > > >
> > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > >
> > > > > > > > > > > CPUa CPUb
> > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > unbind it from the driver
> > > > > > > > > > > acpi_bus_hot_remove_device()
> > > > > > > > > >
> > > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > > bound with a driver? If so, can we make the unbind operation to perform
> > > > > > > > > > unbind only?
> > > > > > > > >
> > > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > > >
> > > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > > for other acpi devices?
> > > > > > > > >
> > > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > > >
> > > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL. The
> > > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > > >
> > > > > > > I would be careful. You may introduce some subtle locking-related issues
> > > > > > > this way.
> > > > > >
> > > > > > Right. This requires careful inspection and testing. As far as the
> > > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > > hot-plug. It is much simpler and solid if we serialize such operations.
> > > > > >
> > > > > > > Besides, there may be an alternative approach to all this. For example,
> > > > > > > what if we don't remove struct device objects on eject? The ACPI handles
> > > > > > > associated with them don't go away in that case after all, do they?
> > > > > >
> > > > > > Umm... Sorry, I am not getting your point. The issue is that we need
> > > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > > Otherwise, we end up ejecting online memory range.
> > > > >
> > > > > Yes, this is the major one. The minor issue, however, is a race condition
> > > > > between unbinding a driver from a device and removing the device if I
> > > > > understand it correctly. Which will go away automatically if the device is
> > > > > not removed in the first place. Or so I would think. :-)
> > > >
> > > > I see. I do not think whether or not the device is removed on eject
> > > > makes any difference here. The issue is that after driver_unbind() is
> > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > _EJ0. If driver_unbind() did off-line the memory, this is OK. However,
> > > > it cannot off-line kernel memory ranges. So, we basically need to
> > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > during the operation.
> > >
> > > OK, I see the problem now.
> > >
> > > What exactly is triggering the driver_unbind() in this scenario?
> >
> > User can request driver_unbind() from sysfs as follows. I do not see
> > much reason why user has to do for memory, though.
> >
> > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
>
> This is wrong. Even if we want to permit user space to forcibly unbind
> drivers from anything like this, we should at least check for some
> situations in which it is plain dangerous. Like in this case. So I think
> the above should fail unless we know that the driver won't be necessary
> to handle hot-removal of memory.

Well, we tried twice already... :)
https://lkml.org/lkml/2012/11/16/649

> Alternatively, this may actually try to carry out the hot-removal and only
> call driver_unbind() if that succeeds. Whichever is preferable, I'd say.

Greg clarified in the above link that this interface is "unbind", not
remove.


Thanks,
-Toshi

2012-11-28 22:17:17

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wednesday, November 28, 2012 03:04:52 PM Toshi Kani wrote:
> On Wed, 2012-11-28 at 23:01 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 02:40:09 PM Toshi Kani wrote:
> > > On Wed, 2012-11-28 at 22:40 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > >
> > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > >
> > > > > > > > > > > > CPUa CPUb
> > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > unbind it from the driver
> > > > > > > > > > > > acpi_bus_hot_remove_device()
> > > > > > > > > > >
> > > > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > > > bound with a driver? If so, can we make the unbind operation to perform
> > > > > > > > > > > unbind only?
> > > > > > > > > >
> > > > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > > > >
> > > > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > > > for other acpi devices?
> > > > > > > > > >
> > > > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > > > >
> > > > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL. The
> > > > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > > > >
> > > > > > > > I would be careful. You may introduce some subtle locking-related issues
> > > > > > > > this way.
> > > > > > >
> > > > > > > Right. This requires careful inspection and testing. As far as the
> > > > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > > > hot-plug. It is much simpler and solid if we serialize such operations.
> > > > > > >
> > > > > > > > Besides, there may be an alternative approach to all this. For example,
> > > > > > > > what if we don't remove struct device objects on eject? The ACPI handles
> > > > > > > > associated with them don't go away in that case after all, do they?
> > > > > > >
> > > > > > > Umm... Sorry, I am not getting your point. The issue is that we need
> > > > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > > > Otherwise, we end up ejecting online memory range.
> > > > > >
> > > > > > Yes, this is the major one. The minor issue, however, is a race condition
> > > > > > between unbinding a driver from a device and removing the device if I
> > > > > > understand it correctly. Which will go away automatically if the device is
> > > > > > not removed in the first place. Or so I would think. :-)
> > > > >
> > > > > I see. I do not think whether or not the device is removed on eject
> > > > > makes any difference here. The issue is that after driver_unbind() is
> > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > _EJ0. If driver_unbind() did off-line the memory, this is OK. However,
> > > > > it cannot off-line kernel memory ranges. So, we basically need to
> > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > during the operation.
> > > >
> > > > OK, I see the problem now.
> > > >
> > > > What exactly is triggering the driver_unbind() in this scenario?
> > >
> > > User can request driver_unbind() from sysfs as follows. I do not see
> > > much reason why user has to do for memory, though.
> > >
> > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> >
> > This is wrong. Even if we want to permit user space to forcibly unbind
> > drivers from anything like this, we should at least check for some
> > situations in which it is plain dangerous. Like in this case. So I think
> > the above should fail unless we know that the driver won't be necessary
> > to handle hot-removal of memory.
>
> Well, we tried twice already... :)
> https://lkml.org/lkml/2012/11/16/649

I didn't mean driver_unbind() should fail. The code path that executes
driver_unbind() eventually should fail _before_ executing it.

> > Alternatively, this may actually try to carry out the hot-removal and only
> > call driver_unbind() if that succeeds. Whichever is preferable, I'd say.
>
> Greg clarified in the above link that this interface is "unbind", not
> remove.

OK, so that's clear.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-28 22:24:54

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

> > > > > > I see. I do not think whether or not the device is removed on eject
> > > > > > makes any difference here. The issue is that after driver_unbind() is
> > > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > > _EJ0. If driver_unbind() did off-line the memory, this is OK. However,
> > > > > > it cannot off-line kernel memory ranges. So, we basically need to
> > > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > > during the operation.
> > > > >
> > > > > OK, I see the problem now.
> > > > >
> > > > > What exactly is triggering the driver_unbind() in this scenario?
> > > >
> > > > User can request driver_unbind() from sysfs as follows. I do not see
> > > > much reason why user has to do for memory, though.
> > > >
> > > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > >
> > > This is wrong. Even if we want to permit user space to forcibly unbind
> > > drivers from anything like this, we should at least check for some
> > > situations in which it is plain dangerous. Like in this case. So I think
> > > the above should fail unless we know that the driver won't be necessary
> > > to handle hot-removal of memory.
> >
> > Well, we tried twice already... :)
> > https://lkml.org/lkml/2012/11/16/649
>
> I didn't mean driver_unbind() should fail. The code path that executes
> driver_unbind() eventually should fail _before_ executing it.

driver_unbind() is the handler, so it is called directly from this
unbind interface.

Thanks,
-Toshi

2012-11-28 22:34:39

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wednesday, November 28, 2012 03:16:22 PM Toshi Kani wrote:
> > > > > > > I see. I do not think whether or not the device is removed on eject
> > > > > > > makes any difference here. The issue is that after driver_unbind() is
> > > > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > > > _EJ0. If driver_unbind() did off-line the memory, this is OK. However,
> > > > > > > it cannot off-line kernel memory ranges. So, we basically need to
> > > > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > > > during the operation.
> > > > > >
> > > > > > OK, I see the problem now.
> > > > > >
> > > > > > What exactly is triggering the driver_unbind() in this scenario?
> > > > >
> > > > > User can request driver_unbind() from sysfs as follows. I do not see
> > > > > much reason why user has to do for memory, though.
> > > > >
> > > > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > >
> > > > This is wrong. Even if we want to permit user space to forcibly unbind
> > > > drivers from anything like this, we should at least check for some
> > > > situations in which it is plain dangerous. Like in this case. So I think
> > > > the above should fail unless we know that the driver won't be necessary
> > > > to handle hot-removal of memory.
> > >
> > > Well, we tried twice already... :)
> > > https://lkml.org/lkml/2012/11/16/649
> >
> > I didn't mean driver_unbind() should fail. The code path that executes
> > driver_unbind() eventually should fail _before_ executing it.
>
> driver_unbind() is the handler, so it is called directly from this
> unbind interface.

Yes, sorry for the confusion.

So, it looks like the driver core wants us to handle driver unbinding no
matter what.

This pretty much means that it is a bad idea to have a driver that is
exposed as a "device driver" in sysfs for memory hotplugging.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-28 22:46:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wed, Nov 28, 2012 at 11:39:22PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 03:16:22 PM Toshi Kani wrote:
> > > > > > > > I see. I do not think whether or not the device is removed on eject
> > > > > > > > makes any difference here. The issue is that after driver_unbind() is
> > > > > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > > > > _EJ0. If driver_unbind() did off-line the memory, this is OK. However,
> > > > > > > > it cannot off-line kernel memory ranges. So, we basically need to
> > > > > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > > > > during the operation.
> > > > > > >
> > > > > > > OK, I see the problem now.
> > > > > > >
> > > > > > > What exactly is triggering the driver_unbind() in this scenario?
> > > > > >
> > > > > > User can request driver_unbind() from sysfs as follows. I do not see
> > > > > > much reason why user has to do for memory, though.
> > > > > >
> > > > > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > >
> > > > > This is wrong. Even if we want to permit user space to forcibly unbind
> > > > > drivers from anything like this, we should at least check for some
> > > > > situations in which it is plain dangerous. Like in this case. So I think
> > > > > the above should fail unless we know that the driver won't be necessary
> > > > > to handle hot-removal of memory.
> > > >
> > > > Well, we tried twice already... :)
> > > > https://lkml.org/lkml/2012/11/16/649
> > >
> > > I didn't mean driver_unbind() should fail. The code path that executes
> > > driver_unbind() eventually should fail _before_ executing it.
> >
> > driver_unbind() is the handler, so it is called directly from this
> > unbind interface.
>
> Yes, sorry for the confusion.
>
> So, it looks like the driver core wants us to handle driver unbinding no
> matter what.

Yes. Well, the driver core does the unbinding no matter what, if it was
told, by a user, to do so. Why is that a problem? The user then is
responsible for any bad things (i.e. not able to control the device any
more), if they do so.

> This pretty much means that it is a bad idea to have a driver that is
> exposed as a "device driver" in sysfs for memory hotplugging.

Again, why? All this means is that the driver is now not connected to
the device (memory in this case.) The memory is still there, still
operates as before, only difference is, the driver can't touch it
anymore.

This is the same for any ACPI driver, and has been for years.

Please don't confuse unbind with any "normal" system operation, it is
not to be used for memory hotplug, or anything else like this.

Also, if you really do not want to do this, turn off the ability to
unbind/bind for these devices, that is under your control in your bus
logic.

thanks,

greg k-h

2012-11-28 23:00:39

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wednesday, November 28, 2012 02:46:33 PM Greg KH wrote:
> On Wed, Nov 28, 2012 at 11:39:22PM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 03:16:22 PM Toshi Kani wrote:
> > > > > > > > > I see. I do not think whether or not the device is removed on eject
> > > > > > > > > makes any difference here. The issue is that after driver_unbind() is
> > > > > > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > > > > > _EJ0. If driver_unbind() did off-line the memory, this is OK. However,
> > > > > > > > > it cannot off-line kernel memory ranges. So, we basically need to
> > > > > > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > > > > > during the operation.
> > > > > > > >
> > > > > > > > OK, I see the problem now.
> > > > > > > >
> > > > > > > > What exactly is triggering the driver_unbind() in this scenario?
> > > > > > >
> > > > > > > User can request driver_unbind() from sysfs as follows. I do not see
> > > > > > > much reason why user has to do for memory, though.
> > > > > > >
> > > > > > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > > >
> > > > > > This is wrong. Even if we want to permit user space to forcibly unbind
> > > > > > drivers from anything like this, we should at least check for some
> > > > > > situations in which it is plain dangerous. Like in this case. So I think
> > > > > > the above should fail unless we know that the driver won't be necessary
> > > > > > to handle hot-removal of memory.
> > > > >
> > > > > Well, we tried twice already... :)
> > > > > https://lkml.org/lkml/2012/11/16/649
> > > >
> > > > I didn't mean driver_unbind() should fail. The code path that executes
> > > > driver_unbind() eventually should fail _before_ executing it.
> > >
> > > driver_unbind() is the handler, so it is called directly from this
> > > unbind interface.
> >
> > Yes, sorry for the confusion.
> >
> > So, it looks like the driver core wants us to handle driver unbinding no
> > matter what.
>
> Yes. Well, the driver core does the unbinding no matter what, if it was
> told, by a user, to do so. Why is that a problem? The user then is
> responsible for any bad things (i.e. not able to control the device any
> more), if they do so.

I don't really agree with that, because the user may simply not know what
the consequences of that will be. In my not so humble opinion any interface
allowing user space to crash the kernel is a bad one. And this is an example
of that.

> > This pretty much means that it is a bad idea to have a driver that is
> > exposed as a "device driver" in sysfs for memory hotplugging.
>
> Again, why? All this means is that the driver is now not connected to
> the device (memory in this case.) The memory is still there, still
> operates as before, only difference is, the driver can't touch it
> anymore.
>
> This is the same for any ACPI driver, and has been for years.

Except that if this driver has been unbound and the removal is triggered by
an SCI, the core will just go on and remove the memory, although it may
be killing the kernel this way.

Arguably, this may be considered as the core's fault, but the only way to
fix that would be to move the code from that driver into the core and not to
register it as a "driver" any more. Which was my point. :-)

> Please don't confuse unbind with any "normal" system operation, it is
> not to be used for memory hotplug, or anything else like this.
>
> Also, if you really do not want to do this, turn off the ability to
> unbind/bind for these devices, that is under your control in your bus
> logic.

OK, but how? I'm looking at driver_unbind() and not seeing any way to do
that actually.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-28 23:10:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thu, Nov 29, 2012 at 12:05:20AM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 02:46:33 PM Greg KH wrote:
> > > So, it looks like the driver core wants us to handle driver unbinding no
> > > matter what.
> >
> > Yes. Well, the driver core does the unbinding no matter what, if it was
> > told, by a user, to do so. Why is that a problem? The user then is
> > responsible for any bad things (i.e. not able to control the device any
> > more), if they do so.
>
> I don't really agree with that, because the user may simply not know what
> the consequences of that will be. In my not so humble opinion any interface
> allowing user space to crash the kernel is a bad one. And this is an example
> of that.

This has been in place since 2005, over 7 years now, and I have never
heard any problems with it being used to crash the kernel, despite the
easy ability for people to unbind all of their devices from drivers and
instantly cause a system hang. So really doubt this is a problem in
real life :)

> > > This pretty much means that it is a bad idea to have a driver that is
> > > exposed as a "device driver" in sysfs for memory hotplugging.
> >
> > Again, why? All this means is that the driver is now not connected to
> > the device (memory in this case.) The memory is still there, still
> > operates as before, only difference is, the driver can't touch it
> > anymore.
> >
> > This is the same for any ACPI driver, and has been for years.
>
> Except that if this driver has been unbound and the removal is triggered by
> an SCI, the core will just go on and remove the memory, although it may
> be killing the kernel this way.

Why would memory go away if a driver is unbound from a device? The
device didn't go away. It's the same if the driver was a module and it
was unloaded, you should not turn memory off in that situation, right?
Are you also going to prevent module unloading of this driver?

> Arguably, this may be considered as the core's fault, but the only way to
> fix that would be to move the code from that driver into the core and not to
> register it as a "driver" any more. Which was my point. :-)

No, I think people are totally overreacting to the unbind/bind files,
which are there to aid in development, and in adding new device ids to
drivers, as well as sometimes doing a hacky revoke() call.

> > Please don't confuse unbind with any "normal" system operation, it is
> > not to be used for memory hotplug, or anything else like this.
> >
> > Also, if you really do not want to do this, turn off the ability to
> > unbind/bind for these devices, that is under your control in your bus
> > logic.
>
> OK, but how? I'm looking at driver_unbind() and not seeing any way to do
> that actually.

See the suppress_bind_attrs field in struct device_driver. It's even
documented in device.h, but sadly, no one reads documentation :)

I recommend you set this field if you don't want the bind/unbind files
to show up for your memory driver, although I would argue that the
driver needs to be fixed up to not do foolish things like removing
memory from a system unless it really does go away...

hope this helps,

greg k-h

2012-11-28 23:26:20

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wednesday, November 28, 2012 03:10:46 PM Greg KH wrote:
> On Thu, Nov 29, 2012 at 12:05:20AM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 02:46:33 PM Greg KH wrote:
> > > > So, it looks like the driver core wants us to handle driver unbinding no
> > > > matter what.
> > >
> > > Yes. Well, the driver core does the unbinding no matter what, if it was
> > > told, by a user, to do so. Why is that a problem? The user then is
> > > responsible for any bad things (i.e. not able to control the device any
> > > more), if they do so.
> >
> > I don't really agree with that, because the user may simply not know what
> > the consequences of that will be. In my not so humble opinion any interface
> > allowing user space to crash the kernel is a bad one. And this is an example
> > of that.
>
> This has been in place since 2005, over 7 years now, and I have never
> heard any problems with it being used to crash the kernel, despite the
> easy ability for people to unbind all of their devices from drivers and
> instantly cause a system hang. So really doubt this is a problem in
> real life :)
>
> > > > This pretty much means that it is a bad idea to have a driver that is
> > > > exposed as a "device driver" in sysfs for memory hotplugging.
> > >
> > > Again, why? All this means is that the driver is now not connected to
> > > the device (memory in this case.) The memory is still there, still
> > > operates as before, only difference is, the driver can't touch it
> > > anymore.
> > >
> > > This is the same for any ACPI driver, and has been for years.
> >
> > Except that if this driver has been unbound and the removal is triggered by
> > an SCI, the core will just go on and remove the memory, although it may
> > be killing the kernel this way.
>
> Why would memory go away if a driver is unbound from a device? The
> device didn't go away. It's the same if the driver was a module and it
> was unloaded, you should not turn memory off in that situation, right?

Right. It looks like there's some confusion about the role of .remove()
in the ACPI subsystem, but I need to investigate it a bit more.

> Are you also going to prevent module unloading of this driver?

I'm not sure what I'm going to do with that driver at the moment to be honest. :-)

> > Arguably, this may be considered as the core's fault, but the only way to
> > fix that would be to move the code from that driver into the core and not to
> > register it as a "driver" any more. Which was my point. :-)
>
> No, I think people are totally overreacting to the unbind/bind files,
> which are there to aid in development, and in adding new device ids to
> drivers, as well as sometimes doing a hacky revoke() call.
>
> > > Please don't confuse unbind with any "normal" system operation, it is
> > > not to be used for memory hotplug, or anything else like this.
> > >
> > > Also, if you really do not want to do this, turn off the ability to
> > > unbind/bind for these devices, that is under your control in your bus
> > > logic.
> >
> > OK, but how? I'm looking at driver_unbind() and not seeing any way to do
> > that actually.
>
> See the suppress_bind_attrs field in struct device_driver. It's even
> documented in device.h, but sadly, no one reads documentation :)

That's good to know, thanks. :-)

And if I knew I could find that information in device.h, I'd look in there.

> I recommend you set this field if you don't want the bind/unbind files
> to show up for your memory driver, although I would argue that the
> driver needs to be fixed up to not do foolish things like removing
> memory from a system unless it really does go away...

Quite frankly, I need to look at that driver and how things are supposed to
work more thoroughly, because I don't seem to see a reason to do various
things the way they are done. Well, maybe it's just me.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-28 23:45:06

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > Consider the following case:
> > > > > > > >
> > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > >
> > > > > > > > CPUa CPUb
> > > > > > > > acpi_memory_device_notify()
> > > > > > > > unbind it from the driver
> > > > > > > > acpi_bus_hot_remove_device()
> > > > > > >
> > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > bound with a driver? If so, can we make the unbind operation to perform
> > > > > > > unbind only?
> > > > > >
> > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > if it's not present (dev->driver == NULL).
> > > > > >
> > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > for other acpi devices?
> > > > > >
> > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > >
> > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > device_release_driver(), I am wondering if we can call
> > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL. The
> > > > > parent lock is otherwise released after device_release_driver() is done.
> > > >
> > > > I would be careful. You may introduce some subtle locking-related issues
> > > > this way.
> > >
> > > Right. This requires careful inspection and testing. As far as the
> > > locking is concerned, I am not keen on using fine grained locking for
> > > hot-plug. It is much simpler and solid if we serialize such operations.
> > >
> > > > Besides, there may be an alternative approach to all this. For example,
> > > > what if we don't remove struct device objects on eject? The ACPI handles
> > > > associated with them don't go away in that case after all, do they?
> > >
> > > Umm... Sorry, I am not getting your point. The issue is that we need
> > > to be able to fail a request when memory range cannot be off-lined.
> > > Otherwise, we end up ejecting online memory range.
> >
> > Yes, this is the major one. The minor issue, however, is a race condition
> > between unbinding a driver from a device and removing the device if I
> > understand it correctly. Which will go away automatically if the device is
> > not removed in the first place. Or so I would think. :-)
>
> I see. I do not think whether or not the device is removed on eject
> makes any difference here. The issue is that after driver_unbind() is
> done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> _EJ0.

I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
acpi_eject_store() which is exposed through sysfs. If we disabled exposing
acpi_eject_store() for memory devices, then the only way would be from the
notify handler. So I wonder if driver_unbind() shouldn't just uninstall the
notify handler for memory (so that memory eject events are simply dropped on
the floor after unbinding the driver)?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-29 01:11:00

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > Consider the following case:
> > > > > > > > >
> > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > >
> > > > > > > > > CPUa CPUb
> > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > unbind it from the driver
> > > > > > > > > acpi_bus_hot_remove_device()
> > > > > > > >
> > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > bound with a driver? If so, can we make the unbind operation to perform
> > > > > > > > unbind only?
> > > > > > >
> > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > if it's not present (dev->driver == NULL).
> > > > > > >
> > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > for other acpi devices?
> > > > > > >
> > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > >
> > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > device_release_driver(), I am wondering if we can call
> > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL. The
> > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > >
> > > > > I would be careful. You may introduce some subtle locking-related issues
> > > > > this way.
> > > >
> > > > Right. This requires careful inspection and testing. As far as the
> > > > locking is concerned, I am not keen on using fine grained locking for
> > > > hot-plug. It is much simpler and solid if we serialize such operations.
> > > >
> > > > > Besides, there may be an alternative approach to all this. For example,
> > > > > what if we don't remove struct device objects on eject? The ACPI handles
> > > > > associated with them don't go away in that case after all, do they?
> > > >
> > > > Umm... Sorry, I am not getting your point. The issue is that we need
> > > > to be able to fail a request when memory range cannot be off-lined.
> > > > Otherwise, we end up ejecting online memory range.
> > >
> > > Yes, this is the major one. The minor issue, however, is a race condition
> > > between unbinding a driver from a device and removing the device if I
> > > understand it correctly. Which will go away automatically if the device is
> > > not removed in the first place. Or so I would think. :-)
> >
> > I see. I do not think whether or not the device is removed on eject
> > makes any difference here. The issue is that after driver_unbind() is
> > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > _EJ0.
>
> I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> acpi_eject_store() which is exposed through sysfs.

Yes, that is correct.

> If we disabled exposing
> acpi_eject_store() for memory devices, then the only way would be from the
> notify handler. So I wonder if driver_unbind() shouldn't just uninstall the
> notify handler for memory (so that memory eject events are simply dropped on
> the floor after unbinding the driver)?

If driver_unbind() happens before an eject request, we do not have a
problem. acpi_eject_store() fails if a driver is not bound to the
device. acpi_memory_device_notify() fails as well.

The race condition Wen pointed out (see the top of this email) is that
driver_unbind() may come in while eject operation is in-progress. This
is why I mentioned the following in previous email.

> So, we basically need to either 1) serialize
> acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> during the operation.


Thanks,
-Toshi

2012-11-29 01:24:10

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > Consider the following case:
> > > > > > > > > >
> > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > >
> > > > > > > > > > CPUa CPUb
> > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > unbind it from the driver
> > > > > > > > > > acpi_bus_hot_remove_device()
> > > > > > > > >
> > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > bound with a driver? If so, can we make the unbind operation to perform
> > > > > > > > > unbind only?
> > > > > > > >
> > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > >
> > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > for other acpi devices?
> > > > > > > >
> > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > >
> > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL. The
> > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > >
> > > > > > I would be careful. You may introduce some subtle locking-related issues
> > > > > > this way.
> > > > >
> > > > > Right. This requires careful inspection and testing. As far as the
> > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > hot-plug. It is much simpler and solid if we serialize such operations.
> > > > >
> > > > > > Besides, there may be an alternative approach to all this. For example,
> > > > > > what if we don't remove struct device objects on eject? The ACPI handles
> > > > > > associated with them don't go away in that case after all, do they?
> > > > >
> > > > > Umm... Sorry, I am not getting your point. The issue is that we need
> > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > Otherwise, we end up ejecting online memory range.
> > > >
> > > > Yes, this is the major one. The minor issue, however, is a race condition
> > > > between unbinding a driver from a device and removing the device if I
> > > > understand it correctly. Which will go away automatically if the device is
> > > > not removed in the first place. Or so I would think. :-)
> > >
> > > I see. I do not think whether or not the device is removed on eject
> > > makes any difference here. The issue is that after driver_unbind() is
> > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > _EJ0.
> >
> > I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> > me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> > acpi_eject_store() which is exposed through sysfs.
>
> Yes, that is correct.
>
> > If we disabled exposing
> > acpi_eject_store() for memory devices, then the only way would be from the
> > notify handler. So I wonder if driver_unbind() shouldn't just uninstall the
> > notify handler for memory (so that memory eject events are simply dropped on
> > the floor after unbinding the driver)?
>
> If driver_unbind() happens before an eject request, we do not have a
> problem. acpi_eject_store() fails if a driver is not bound to the
> device. acpi_memory_device_notify() fails as well.
>
> The race condition Wen pointed out (see the top of this email) is that
> driver_unbind() may come in while eject operation is in-progress. This
> is why I mentioned the following in previous email.
>
> > So, we basically need to either 1) serialize
> > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > during the operation.

Forgot to mention. The 3rd option is what Greg said -- use the
suppress_bind_attrs field. I think this is a good option to address
this race condition for now. For a long term solution, we should have a
better infrastructure in place to address such issue in general.

Thanks,
-Toshi

2012-11-29 04:49:41

by Hanjun Guo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

On 2012/11/29 2:41, Toshi Kani wrote:
> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>> the driver core remove function needs to always succeed. This means we need
>>> to know that the device can be successfully removed before acpi_bus_trim /
>>> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
>>> or SCI-initiated eject of memory devices fail e.g with:
>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>
>>> since the ACPI core goes ahead and ejects the device regardless of whether the
>>> the memory is still in use or not.
>>>
>>> For this reason a new acpi_device operation called prepare_remove is introduced.
>>> This operation should be registered for acpi devices whose removal (from kernel
>>> perspective) can fail. Memory devices fall in this category.
>>>
>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>> - preparation for removal i.e. perform part of removal that can fail. Should
>>> succeed for device and all its children.
>>> - if above step was successfull, proceed to actual device removal
>>
>> Hi Vasilis,
>> We met the same problem when we doing computer node hotplug, It is a good idea
>> to introduce prepare_remove before actual device removal.
>>
>> I think we could do more in prepare_remove, such as rollback. In most cases, we can
>> offline most of memory sections except kernel used pages now, should we rollback
>> and online the memory sections when prepare_remove failed ?
>
> I think hot-plug operation should have all-or-nothing semantics. That
> is, an operation should either complete successfully, or rollback to the
> original state.

Yes, we have the same point of view with you. We handle this problem in the ACPI
based hot-plug framework as following:
1) hot add / hot remove complete successfully if no error happens;
2) automatic rollback to the original state if meets some error ;
3) rollback to the original if hot-plug operation cancelled by user ;

>
>> As you may know, the ACPI based hotplug framework we are working on already addressed
>> this problem, and the way we slove this problem is a bit like yours.
>>
>> We introduce hp_ops in struct acpi_device_ops:
>> struct acpi_device_ops {
>> acpi_op_add add;
>> acpi_op_remove remove;
>> acpi_op_start start;
>> acpi_op_bind bind;
>> acpi_op_unbind unbind;
>> acpi_op_notify notify;
>> #ifdef CONFIG_ACPI_HOTPLUG
>> struct acpihp_dev_ops *hp_ops;
>> #endif /* CONFIG_ACPI_HOTPLUG */
>> };
>>
>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>> 1) pre_release(): optional step to mark device going to be removed/busy
>> 2) release(): reclaim device from running system
>> 3) post_release(): rollback if cancelled by user or error happened
>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>> 5) unconfigure(): remove devices from running system
>> 6) post_unconfigure(): free resources used by devices
>>
>> In this way, we can easily rollback if error happens.
>> How do you think of this solution, any suggestion ? I think we can achieve
>> a better way for sharing ideas. :)
>
> Yes, sharing idea is good. :) I do not know if we need all 6 steps (I
> have not looked at all your changes yet..), but in my mind, a hot-plug
> operation should be composed with the following 3 phases.

Good idea ! we also implement a hot-plug operation in 3 phases:
1) acpihp_drv_pre_execute
2) acpihp_drv_execute
3) acpihp_drv_post_execute
you may refer to :
https://lkml.org/lkml/2012/11/4/79

>
> 1. Validate phase - Verify if the request is a supported operation. All
> known restrictions are verified at this phase. For instance, if a
> hot-remove request involves kernel memory, it is failed in this phase.
> Since this phase makes no change, no rollback is necessary to fail.

Yes, we have done this in acpihp_drv_pre_execute, and check following things:

1) Hot-plugble or not. the instance kernel memory you mentioned is also checked
when memory device remove;

2) Dependency check involved. For instance, if hot-add a memory device,
processor should be added first, otherwise it's not valid to this operation.

3) Race condition check. if the device and its dependent device is in hot-plug
process, another request will be denied.

No rollback is needed for the above checks.

>
> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> rolled-back in case of error or cancel.

In this phase, we introduce a state machine for the hot-plugble device,
please refer to:
https://lkml.org/lkml/2012/11/4/79

I think we have the same idea for the major framework, but the ACPI based
hot-plug framework implement it differently in detail, right ?

Thanks
Hanjun

>
> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> cannot be rolled-back. No error / cancel is allowed in this phase. For
> instance, eject operation is performed at this phase.
>
>
> Thanks,
> -Toshi
>


2012-11-29 09:58:25

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > Consider the following case:
> > > > > > > > > > >
> > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > >
> > > > > > > > > > > CPUa CPUb
> > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > unbind it from the driver
> > > > > > > > > > > acpi_bus_hot_remove_device()
> > > > > > > > > >
> > > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > > bound with a driver? If so, can we make the unbind operation to perform
> > > > > > > > > > unbind only?
> > > > > > > > >
> > > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > > >
> > > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > > for other acpi devices?
> > > > > > > > >
> > > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > > >
> > > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL. The
> > > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > > >
> > > > > > > I would be careful. You may introduce some subtle locking-related issues
> > > > > > > this way.
> > > > > >
> > > > > > Right. This requires careful inspection and testing. As far as the
> > > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > > hot-plug. It is much simpler and solid if we serialize such operations.
> > > > > >
> > > > > > > Besides, there may be an alternative approach to all this. For example,
> > > > > > > what if we don't remove struct device objects on eject? The ACPI handles
> > > > > > > associated with them don't go away in that case after all, do they?
> > > > > >
> > > > > > Umm... Sorry, I am not getting your point. The issue is that we need
> > > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > > Otherwise, we end up ejecting online memory range.
> > > > >
> > > > > Yes, this is the major one. The minor issue, however, is a race condition
> > > > > between unbinding a driver from a device and removing the device if I
> > > > > understand it correctly. Which will go away automatically if the device is
> > > > > not removed in the first place. Or so I would think. :-)
> > > >
> > > > I see. I do not think whether or not the device is removed on eject
> > > > makes any difference here. The issue is that after driver_unbind() is
> > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > _EJ0.
> > >
> > > I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> > > me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> > > acpi_eject_store() which is exposed through sysfs.
> >
> > Yes, that is correct.
> >
> > > If we disabled exposing
> > > acpi_eject_store() for memory devices, then the only way would be from the
> > > notify handler. So I wonder if driver_unbind() shouldn't just uninstall the
> > > notify handler for memory (so that memory eject events are simply dropped on
> > > the floor after unbinding the driver)?
> >
> > If driver_unbind() happens before an eject request, we do not have a
> > problem. acpi_eject_store() fails if a driver is not bound to the
> > device. acpi_memory_device_notify() fails as well.
> >
> > The race condition Wen pointed out (see the top of this email) is that
> > driver_unbind() may come in while eject operation is in-progress. This
> > is why I mentioned the following in previous email.
> >
> > > So, we basically need to either 1) serialize
> > > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > during the operation.
>
> Forgot to mention. The 3rd option is what Greg said -- use the
> suppress_bind_attrs field. I think this is a good option to address
> this race condition for now. For a long term solution, we should have a
> better infrastructure in place to address such issue in general.

Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
friends and I think there's a way to address all of these problems
without big redesign (for now).

First, why don't we introduce an ACPI device flag (in the flags field of
struct acpi_device) called eject_forbidden or something like this such that:

(1) It will be clear by default.
(2) It may only be set by a driver's .add() routine if necessary.
(3) Once set, it may only be cleared by the driver's .remove() routine if
it's safe to physically remove the device after the .remove().

Then, after the .remove() (which must be successful) has returned, and the
flag is set, it will tell acpi_bus_remove() to return a specific error code
(such as -EBUSY or -EAGAIN). It doesn't matter if .remove() was called
earlier, because if it left the flag set, there's no way to clear it afterward
and acpi_bus_remove() will see it set anyway. I think the struct acpi_device
should be unregistered anyway if that error code is to be returned.

[By the way, do you know where we free the memory allocated for struct
acpi_device objects?]

Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
store it, but continue the trimming normally and finally it should return that
error code to acpi_bus_hot_remove_device().

Now, if acpi_bus_hot_remove_device() gets that error code, it should just
reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
we attempted to eject) and notify the firmware about the failure.

If we have that, then the memory hotplug driver would only need to set
flags.eject_forbidden in its .add() routine and make its .remove() routine
only clear that flag if it is safe to actually remove the memory.

Does this make sense to you?

[BTW, using _PS3 in acpi_bus_hot_remove_device() directly to power off the
device is a nonsense, because this method is not guaranteed to turn the power
off in the first place (it may just put the device into D3hot). If anything,
acpi_device_set_power() should be used for that, but even that is not
guaranteed to actually remove the power (power resources may be shared with
other devices, so in fact that operation should be done by acpi_bus_trim()
for each of the trimmed devices.]

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-29 10:10:52

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > > As discussed in https://patchwork.kernel.org/patch/1581581/
> > > the driver core remove function needs to always succeed. This means we need
> > > to know that the device can be successfully removed before acpi_bus_trim /
> > > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > > or SCI-initiated eject of memory devices fail e.g with:
> > > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > >
> > > since the ACPI core goes ahead and ejects the device regardless of whether the
> > > the memory is still in use or not.
> > >
> > > For this reason a new acpi_device operation called prepare_remove is introduced.
> > > This operation should be registered for acpi devices whose removal (from kernel
> > > perspective) can fail. Memory devices fall in this category.
> > >
> > > acpi_bus_remove() is changed to handle removal in 2 steps:
> > > - preparation for removal i.e. perform part of removal that can fail. Should
> > > succeed for device and all its children.
> > > - if above step was successfull, proceed to actual device removal
> >
> > Hi Vasilis,
> > We met the same problem when we doing computer node hotplug, It is a good idea
> > to introduce prepare_remove before actual device removal.
> >
> > I think we could do more in prepare_remove, such as rollback. In most cases, we can
> > offline most of memory sections except kernel used pages now, should we rollback
> > and online the memory sections when prepare_remove failed ?
>
> I think hot-plug operation should have all-or-nothing semantics. That
> is, an operation should either complete successfully, or rollback to the
> original state.

That's correct.

> > As you may know, the ACPI based hotplug framework we are working on already addressed
> > this problem, and the way we slove this problem is a bit like yours.
> >
> > We introduce hp_ops in struct acpi_device_ops:
> > struct acpi_device_ops {
> > acpi_op_add add;
> > acpi_op_remove remove;
> > acpi_op_start start;
> > acpi_op_bind bind;
> > acpi_op_unbind unbind;
> > acpi_op_notify notify;
> > #ifdef CONFIG_ACPI_HOTPLUG
> > struct acpihp_dev_ops *hp_ops;
> > #endif /* CONFIG_ACPI_HOTPLUG */
> > };
> >
> > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > 1) pre_release(): optional step to mark device going to be removed/busy
> > 2) release(): reclaim device from running system
> > 3) post_release(): rollback if cancelled by user or error happened
> > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > 5) unconfigure(): remove devices from running system
> > 6) post_unconfigure(): free resources used by devices
> >
> > In this way, we can easily rollback if error happens.
> > How do you think of this solution, any suggestion ? I think we can achieve
> > a better way for sharing ideas. :)
>
> Yes, sharing idea is good. :) I do not know if we need all 6 steps (I
> have not looked at all your changes yet..), but in my mind, a hot-plug
> operation should be composed with the following 3 phases.
>
> 1. Validate phase - Verify if the request is a supported operation. All
> known restrictions are verified at this phase. For instance, if a
> hot-remove request involves kernel memory, it is failed in this phase.
> Since this phase makes no change, no rollback is necessary to fail.

Actually, we can't do it this way, because the conditions may change between
the check and the execution. So the first phase needs to involve execution
to some extent, although only as far as it remains reversible.

> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> rolled-back in case of error or cancel.

I would just merge 1 and 2.

> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> cannot be rolled-back. No error / cancel is allowed in this phase. For
> instance, eject operation is performed at this phase.

Yup.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-29 11:05:08

by Vasilis Liaskovitis

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

Hi,

On Wed, Nov 28, 2012 at 06:15:42PM -0700, Toshi Kani wrote:
> On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > Consider the following case:
> > > > > > > > > > >
> > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > >
> > > > > > > > > > > CPUa CPUb
> > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > unbind it from the driver
> > > > > > > > > > > acpi_bus_hot_remove_device()
> > > I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> > > me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> > > acpi_eject_store() which is exposed through sysfs.
> >
> > Yes, that is correct.
> >
> > > If we disabled exposing
> > > acpi_eject_store() for memory devices, then the only way would be from the
> > > notify handler. So I wonder if driver_unbind() shouldn't just uninstall the
> > > notify handler for memory (so that memory eject events are simply dropped on
> > > the floor after unbinding the driver)?
> >
> > If driver_unbind() happens before an eject request, we do not have a
> > problem. acpi_eject_store() fails if a driver is not bound to the
> > device. acpi_memory_device_notify() fails as well.
> >
> > The race condition Wen pointed out (see the top of this email) is that
> > driver_unbind() may come in while eject operation is in-progress. This
> > is why I mentioned the following in previous email.
> >
> > > So, we basically need to either 1) serialize
> > > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > during the operation.
>
> Forgot to mention. The 3rd option is what Greg said -- use the
> suppress_bind_attrs field. I think this is a good option to address
> this race condition for now. For a long term solution, we should have a
> better infrastructure in place to address such issue in general.

I like the suppress_bind_attrs idea, I 'll take a look.

As I said for option 2), acpi_bus_remove could check for driver presence.
But It's more a quick hack to abort the eject (the race with unbind can still
happen, but acpi_bus_remove can now detect it later in the eject path).
Something like:

static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
{
+ int ret;
if (!dev)
return -EINVAL;

dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
+
+ if (dev->driver && dev->driver->ops.prepare_remove) {
+ ret = dev->driver->ops.prepare_remove(dev);
+ if (ret)
+ return ret;
+ }
+ else if (!dev->driver)
+ return -ENODEV;
device_release_driver(&dev->dev);

thanks,

- Vasilis

2012-11-29 11:30:39

by Vasilis Liaskovitis

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > >
> > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > >
> > > > > > > > > > > > CPUa CPUb
> > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > unbind it from the driver
> > > > > > > > > > > > acpi_bus_hot_remove_device()
> > > > > > > > > > >
[...]
> Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> friends and I think there's a way to address all of these problems
> without big redesign (for now).
>
> First, why don't we introduce an ACPI device flag (in the flags field of
> struct acpi_device) called eject_forbidden or something like this such that:
>
> (1) It will be clear by default.
> (2) It may only be set by a driver's .add() routine if necessary.
> (3) Once set, it may only be cleared by the driver's .remove() routine if
> it's safe to physically remove the device after the .remove().
>
> Then, after the .remove() (which must be successful) has returned, and the
> flag is set, it will tell acpi_bus_remove() to return a specific error code
> (such as -EBUSY or -EAGAIN). It doesn't matter if .remove() was called
> earlier, because if it left the flag set, there's no way to clear it afterward
> and acpi_bus_remove() will see it set anyway. I think the struct acpi_device
> should be unregistered anyway if that error code is to be returned.
>
> [By the way, do you know where we free the memory allocated for struct
> acpi_device objects?]
>
> Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> store it, but continue the trimming normally and finally it should return that
> error code to acpi_bus_hot_remove_device().

Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
fails). Trimming is not continued.

Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
error. Is this the desired behaviour that we want to keep for bus_trim? (This is
more a general question, not specific to the eject_forbidden suggestion)

>
> Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> we attempted to eject) and notify the firmware about the failure.

sounds like this rollback needs to be implemented in any solution we choose
to implement, correct?

>
> If we have that, then the memory hotplug driver would only need to set
> flags.eject_forbidden in its .add() routine and make its .remove() routine
> only clear that flag if it is safe to actually remove the memory.
>

But when .remove op is called, we are already in the irreversible/error-free
removal (final removal step).
Maybe we need to reset eject_forbidden in a prepare_remove operation which
handles the removal part that can fail ?

thanks,

- Vasilis

2012-11-29 11:36:43

by Vasilis Liaskovitis

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

On Thu, Nov 29, 2012 at 11:15:31AM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > > We met the same problem when we doing computer node hotplug, It is a good idea
> > > to introduce prepare_remove before actual device removal.
> > >
> > > I think we could do more in prepare_remove, such as rollback. In most cases, we can
> > > offline most of memory sections except kernel used pages now, should we rollback
> > > and online the memory sections when prepare_remove failed ?
> >
> > I think hot-plug operation should have all-or-nothing semantics. That
> > is, an operation should either complete successfully, or rollback to the
> > original state.
>
> That's correct.
>
> > > As you may know, the ACPI based hotplug framework we are working on already addressed
> > > this problem, and the way we slove this problem is a bit like yours.
> > >
> > > We introduce hp_ops in struct acpi_device_ops:
> > > struct acpi_device_ops {
> > > acpi_op_add add;
> > > acpi_op_remove remove;
> > > acpi_op_start start;
> > > acpi_op_bind bind;
> > > acpi_op_unbind unbind;
> > > acpi_op_notify notify;
> > > #ifdef CONFIG_ACPI_HOTPLUG
> > > struct acpihp_dev_ops *hp_ops;
> > > #endif /* CONFIG_ACPI_HOTPLUG */
> > > };
> > >
> > > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > > 1) pre_release(): optional step to mark device going to be removed/busy
> > > 2) release(): reclaim device from running system
> > > 3) post_release(): rollback if cancelled by user or error happened
> > > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > > 5) unconfigure(): remove devices from running system
> > > 6) post_unconfigure(): free resources used by devices
> > >
> > > In this way, we can easily rollback if error happens.
> > > How do you think of this solution, any suggestion ? I think we can achieve
> > > a better way for sharing ideas. :)
> >
> > Yes, sharing idea is good. :) I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
> >
> > 1. Validate phase - Verify if the request is a supported operation. All
> > known restrictions are verified at this phase. For instance, if a
> > hot-remove request involves kernel memory, it is failed in this phase.
> > Since this phase makes no change, no rollback is necessary to fail.
>
> Actually, we can't do it this way, because the conditions may change between
> the check and the execution. So the first phase needs to involve execution
> to some extent, although only as far as it remains reversible.
>
> > 2. Execute phase - Perform hot-add / hot-remove operation that can be
> > rolled-back in case of error or cancel.
>
> I would just merge 1 and 2.

I agree steps 1 and 2 can be merged, at least for the current ACPI framework.
E.g. for memory hotplug, the mm function we call for memory removal
(remove_memory) handles both these steps.

The new ACPI framework could perhaps expand the operations as Hanjun described,
if it makes sense.

thanks,

- Vasilis

2012-11-29 16:51:47

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thu, 2012-11-29 at 11:03 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > If we disabled exposing
> > > > acpi_eject_store() for memory devices, then the only way would be from the
> > > > notify handler. So I wonder if driver_unbind() shouldn't just uninstall the
> > > > notify handler for memory (so that memory eject events are simply dropped on
> > > > the floor after unbinding the driver)?
> > >
> > > If driver_unbind() happens before an eject request, we do not have a
> > > problem. acpi_eject_store() fails if a driver is not bound to the
> > > device. acpi_memory_device_notify() fails as well.
> > >
> > > The race condition Wen pointed out (see the top of this email) is that
> > > driver_unbind() may come in while eject operation is in-progress. This
> > > is why I mentioned the following in previous email.
> > >
> > > > So, we basically need to either 1) serialize
> > > > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > > > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > during the operation.
> >
> > Forgot to mention. The 3rd option is what Greg said -- use the
> > suppress_bind_attrs field. I think this is a good option to address
> > this race condition for now. For a long term solution, we should have a
> > better infrastructure in place to address such issue in general.
>
> Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> friends and I think there's a way to address all of these problems
> without big redesign (for now).
>
> First, why don't we introduce an ACPI device flag (in the flags field of
> struct acpi_device) called eject_forbidden or something like this such that:
>
> (1) It will be clear by default.
> (2) It may only be set by a driver's .add() routine if necessary.
> (3) Once set, it may only be cleared by the driver's .remove() routine if
> it's safe to physically remove the device after the .remove().
>
> Then, after the .remove() (which must be successful) has returned, and the
> flag is set, it will tell acpi_bus_remove() to return a specific error code
> (such as -EBUSY or -EAGAIN). It doesn't matter if .remove() was called
> earlier, because if it left the flag set, there's no way to clear it afterward
> and acpi_bus_remove() will see it set anyway. I think the struct acpi_device
> should be unregistered anyway if that error code is to be returned.

I like the idea! It's a good intermediate solution if we need to keep
the bind/unbind interface. That said, I still prefer to go with option
3) for now. I do not see much reason to keep the bind/unbind interface
for ACPI hotplug drivers, and it seems that the semantics of .remove()
is .remove_driver(), not .remove_device() for driver_unbind(). So, I
think we should disable the bind/unbind interface until we settle this
issue.

> [By the way, do you know where we free the memory allocated for struct
> acpi_device objects?]

device_release() -> acpi_device_release().

> Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> store it, but continue the trimming normally and finally it should return that
> error code to acpi_bus_hot_remove_device().
>
> Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> we attempted to eject) and notify the firmware about the failure.
>
> If we have that, then the memory hotplug driver would only need to set
> flags.eject_forbidden in its .add() routine and make its .remove() routine
> only clear that flag if it is safe to actually remove the memory.
>
> Does this make sense to you?

In high-level, yes. Rollback strategy, such as we should continue the
trimming after an error, is something we need to think about along with
the framework design. I think we need a good framework before
implementing rollback.

> [BTW, using _PS3 in acpi_bus_hot_remove_device() directly to power off the
> device is a nonsense, because this method is not guaranteed to turn the power
> off in the first place (it may just put the device into D3hot). If anything,
> acpi_device_set_power() should be used for that, but even that is not
> guaranteed to actually remove the power (power resources may be shared with
> other devices, so in fact that operation should be done by acpi_bus_trim()
> for each of the trimmed devices.]

I agree. I cannot tell for other vendor's implementation, but I expect
that _EJ0 takes care of the power state after it is ejected.

Thanks,
-Toshi

2012-11-29 16:53:03

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thursday, November 29, 2012 12:30:30 PM Vasilis Liaskovitis wrote:
> On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > >
> > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > >
> > > > > > > > > > > > > CPUa CPUb
> > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > > unbind it from the driver
> > > > > > > > > > > > > acpi_bus_hot_remove_device()
> > > > > > > > > > > >
> [...]
> > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > friends and I think there's a way to address all of these problems
> > without big redesign (for now).
> >
> > First, why don't we introduce an ACPI device flag (in the flags field of
> > struct acpi_device) called eject_forbidden or something like this such that:
> >
> > (1) It will be clear by default.
> > (2) It may only be set by a driver's .add() routine if necessary.
> > (3) Once set, it may only be cleared by the driver's .remove() routine if
> > it's safe to physically remove the device after the .remove().
> >
> > Then, after the .remove() (which must be successful) has returned, and the
> > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > (such as -EBUSY or -EAGAIN). It doesn't matter if .remove() was called
> > earlier, because if it left the flag set, there's no way to clear it afterward
> > and acpi_bus_remove() will see it set anyway. I think the struct acpi_device
> > should be unregistered anyway if that error code is to be returned.
> >
> > [By the way, do you know where we free the memory allocated for struct
> > acpi_device objects?]
> >
> > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > store it, but continue the trimming normally and finally it should return that
> > error code to acpi_bus_hot_remove_device().
>
> Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> fails). Trimming is not continued.
>
> Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> more a general question, not specific to the eject_forbidden suggestion)
>
> >
> > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > we attempted to eject) and notify the firmware about the failure.
>
> sounds like this rollback needs to be implemented in any solution we choose
> to implement, correct?
>
> >
> > If we have that, then the memory hotplug driver would only need to set
> > flags.eject_forbidden in its .add() routine and make its .remove() routine
> > only clear that flag if it is safe to actually remove the memory.
> >
>
> But when .remove op is called, we are already in the irreversible/error-free
> removal (final removal step).

Why so? What prevents us from doing a bus scan again and binding the driver
again to the device? Is .remove() doing something to the firmware?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-29 17:11:41

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > > On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > > > As discussed in https://patchwork.kernel.org/patch/1581581/
> > > > the driver core remove function needs to always succeed. This means we need
> > > > to know that the device can be successfully removed before acpi_bus_trim /
> > > > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > > > or SCI-initiated eject of memory devices fail e.g with:
> > > > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > >
> > > > since the ACPI core goes ahead and ejects the device regardless of whether the
> > > > the memory is still in use or not.
> > > >
> > > > For this reason a new acpi_device operation called prepare_remove is introduced.
> > > > This operation should be registered for acpi devices whose removal (from kernel
> > > > perspective) can fail. Memory devices fall in this category.
> > > >
> > > > acpi_bus_remove() is changed to handle removal in 2 steps:
> > > > - preparation for removal i.e. perform part of removal that can fail. Should
> > > > succeed for device and all its children.
> > > > - if above step was successfull, proceed to actual device removal
> > >
> > > Hi Vasilis,
> > > We met the same problem when we doing computer node hotplug, It is a good idea
> > > to introduce prepare_remove before actual device removal.
> > >
> > > I think we could do more in prepare_remove, such as rollback. In most cases, we can
> > > offline most of memory sections except kernel used pages now, should we rollback
> > > and online the memory sections when prepare_remove failed ?
> >
> > I think hot-plug operation should have all-or-nothing semantics. That
> > is, an operation should either complete successfully, or rollback to the
> > original state.
>
> That's correct.
>
> > > As you may know, the ACPI based hotplug framework we are working on already addressed
> > > this problem, and the way we slove this problem is a bit like yours.
> > >
> > > We introduce hp_ops in struct acpi_device_ops:
> > > struct acpi_device_ops {
> > > acpi_op_add add;
> > > acpi_op_remove remove;
> > > acpi_op_start start;
> > > acpi_op_bind bind;
> > > acpi_op_unbind unbind;
> > > acpi_op_notify notify;
> > > #ifdef CONFIG_ACPI_HOTPLUG
> > > struct acpihp_dev_ops *hp_ops;
> > > #endif /* CONFIG_ACPI_HOTPLUG */
> > > };
> > >
> > > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > > 1) pre_release(): optional step to mark device going to be removed/busy
> > > 2) release(): reclaim device from running system
> > > 3) post_release(): rollback if cancelled by user or error happened
> > > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > > 5) unconfigure(): remove devices from running system
> > > 6) post_unconfigure(): free resources used by devices
> > >
> > > In this way, we can easily rollback if error happens.
> > > How do you think of this solution, any suggestion ? I think we can achieve
> > > a better way for sharing ideas. :)
> >
> > Yes, sharing idea is good. :) I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
> >
> > 1. Validate phase - Verify if the request is a supported operation. All
> > known restrictions are verified at this phase. For instance, if a
> > hot-remove request involves kernel memory, it is failed in this phase.
> > Since this phase makes no change, no rollback is necessary to fail.
>
> Actually, we can't do it this way, because the conditions may change between
> the check and the execution. So the first phase needs to involve execution
> to some extent, although only as far as it remains reversible.

For memory hot-remove, we can check if the target memory ranges are
within ZONE_MOVABLE. We should not allow user to change this setup
during hot-remove operation. Other things may be to check if a target
node contains cpu0 (until it is supported), the console UART (assuming
we cannot delete it), etc. We should avoid doing rollback as much as we
can.

Thanks,
-Toshi


> > 2. Execute phase - Perform hot-add / hot-remove operation that can be
> > rolled-back in case of error or cancel.
>
> I would just merge 1 and 2.
>
> > 3. Commit phase - Perform the final hot-add / hot-remove operation that
> > cannot be rolled-back. No error / cancel is allowed in this phase. For
> > instance, eject operation is performed at this phase.
>
> Yup.
>
> Thanks,
> Rafael
>
>

2012-11-29 17:52:40

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thu, 2012-11-29 at 12:04 +0100, Vasilis Liaskovitis wrote:
> Hi,
>
> On Wed, Nov 28, 2012 at 06:15:42PM -0700, Toshi Kani wrote:
> > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > >
> > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > >
> > > > > > > > > > > > CPUa CPUb
> > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > unbind it from the driver
> > > > > > > > > > > > acpi_bus_hot_remove_device()
> > > > I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> > > > me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> > > > acpi_eject_store() which is exposed through sysfs.
> > >
> > > Yes, that is correct.
> > >
> > > > If we disabled exposing
> > > > acpi_eject_store() for memory devices, then the only way would be from the
> > > > notify handler. So I wonder if driver_unbind() shouldn't just uninstall the
> > > > notify handler for memory (so that memory eject events are simply dropped on
> > > > the floor after unbinding the driver)?
> > >
> > > If driver_unbind() happens before an eject request, we do not have a
> > > problem. acpi_eject_store() fails if a driver is not bound to the
> > > device. acpi_memory_device_notify() fails as well.
> > >
> > > The race condition Wen pointed out (see the top of this email) is that
> > > driver_unbind() may come in while eject operation is in-progress. This
> > > is why I mentioned the following in previous email.
> > >
> > > > So, we basically need to either 1) serialize
> > > > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > > > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > during the operation.
> >
> > Forgot to mention. The 3rd option is what Greg said -- use the
> > suppress_bind_attrs field. I think this is a good option to address
> > this race condition for now. For a long term solution, we should have a
> > better infrastructure in place to address such issue in general.
>
> I like the suppress_bind_attrs idea, I 'll take a look.

Great!

> As I said for option 2), acpi_bus_remove could check for driver presence.
> But It's more a quick hack to abort the eject (the race with unbind can still
> happen, but acpi_bus_remove can now detect it later in the eject path).
> Something like:
>
> static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
> {
> + int ret;
> if (!dev)
> return -EINVAL;
>
> dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
> +
> + if (dev->driver && dev->driver->ops.prepare_remove) {
> + ret = dev->driver->ops.prepare_remove(dev);
> + if (ret)
> + return ret;
> + }
> + else if (!dev->driver)
> + return -ENODEV;
> device_release_driver(&dev->dev);

Yes, that's what I had in mind along with device_lock(). I think the
lock is necessary to close the window.
http://www.spinics.net/lists/linux-mm/msg46973.html

But as I mentioned in other email, I prefer option 3 with
suppress_bind_attrs. So, yes, please take a look to see how it works
out.

Thanks,
-Toshi

2012-11-29 18:04:58

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > >
> > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > >
> > > > > > > > > > > > > CPUa CPUb
> > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > > unbind it from the driver
> > > > > > > > > > > > > acpi_bus_hot_remove_device()
> > > > > > > > > > > >
> [...]
> > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > friends and I think there's a way to address all of these problems
> > without big redesign (for now).
> >
> > First, why don't we introduce an ACPI device flag (in the flags field of
> > struct acpi_device) called eject_forbidden or something like this such that:
> >
> > (1) It will be clear by default.
> > (2) It may only be set by a driver's .add() routine if necessary.
> > (3) Once set, it may only be cleared by the driver's .remove() routine if
> > it's safe to physically remove the device after the .remove().
> >
> > Then, after the .remove() (which must be successful) has returned, and the
> > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > (such as -EBUSY or -EAGAIN). It doesn't matter if .remove() was called
> > earlier, because if it left the flag set, there's no way to clear it afterward
> > and acpi_bus_remove() will see it set anyway. I think the struct acpi_device
> > should be unregistered anyway if that error code is to be returned.
> >
> > [By the way, do you know where we free the memory allocated for struct
> > acpi_device objects?]
> >
> > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > store it, but continue the trimming normally and finally it should return that
> > error code to acpi_bus_hot_remove_device().
>
> Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> fails). Trimming is not continued.
>
> Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> more a general question, not specific to the eject_forbidden suggestion)

Your change makes sense to me. At least until we have rollback code in
place, we need to fail as soon as we hit an error.

> > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > we attempted to eject) and notify the firmware about the failure.
>
> sounds like this rollback needs to be implemented in any solution we choose
> to implement, correct?

Yes, rollback is necessary. But I do not think we need to include it
into your patch, though.

Thanks,
-Toshi

> > If we have that, then the memory hotplug driver would only need to set
> > flags.eject_forbidden in its .add() routine and make its .remove() routine
> > only clear that flag if it is safe to actually remove the memory.
> >
>
> But when .remove op is called, we are already in the irreversible/error-free
> removal (final removal step).
> Maybe we need to reset eject_forbidden in a prepare_remove operation which
> handles the removal part that can fail ?
>
> thanks,
>
> - Vasilis

2012-11-29 20:21:08

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > CPUa CPUb
> > > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > > > unbind it from the driver
> > > > > > > > > > > > > > acpi_bus_hot_remove_device()
> > > > > > > > > > > > >
> > [...]
> > > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > > friends and I think there's a way to address all of these problems
> > > without big redesign (for now).
> > >
> > > First, why don't we introduce an ACPI device flag (in the flags field of
> > > struct acpi_device) called eject_forbidden or something like this such that:
> > >
> > > (1) It will be clear by default.
> > > (2) It may only be set by a driver's .add() routine if necessary.
> > > (3) Once set, it may only be cleared by the driver's .remove() routine if
> > > it's safe to physically remove the device after the .remove().
> > >
> > > Then, after the .remove() (which must be successful) has returned, and the
> > > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > > (such as -EBUSY or -EAGAIN). It doesn't matter if .remove() was called
> > > earlier, because if it left the flag set, there's no way to clear it afterward
> > > and acpi_bus_remove() will see it set anyway. I think the struct acpi_device
> > > should be unregistered anyway if that error code is to be returned.
> > >
> > > [By the way, do you know where we free the memory allocated for struct
> > > acpi_device objects?]
> > >
> > > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > > store it, but continue the trimming normally and finally it should return that
> > > error code to acpi_bus_hot_remove_device().
> >
> > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > fails). Trimming is not continued.
> >
> > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > more a general question, not specific to the eject_forbidden suggestion)
>
> Your change makes sense to me. At least until we have rollback code in
> place, we need to fail as soon as we hit an error.

Are you sure this makes sense? What happens to the devices that we have
trimmed already and then there's an error? Looks like they are just unusable
going forward, aren't they?

> > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > we attempted to eject) and notify the firmware about the failure.
> >
> > sounds like this rollback needs to be implemented in any solution we choose
> > to implement, correct?
>
> Yes, rollback is necessary. But I do not think we need to include it
> into your patch, though.

As the first step, we should just trim everything and then return an error
code in my opinion.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-29 20:25:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > > > On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > > > > As discussed in https://patchwork.kernel.org/patch/1581581/
> > > > > the driver core remove function needs to always succeed. This means we need
> > > > > to know that the device can be successfully removed before acpi_bus_trim /
> > > > > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > > > > or SCI-initiated eject of memory devices fail e.g with:
> > > > > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > >
> > > > > since the ACPI core goes ahead and ejects the device regardless of whether the
> > > > > the memory is still in use or not.
> > > > >
> > > > > For this reason a new acpi_device operation called prepare_remove is introduced.
> > > > > This operation should be registered for acpi devices whose removal (from kernel
> > > > > perspective) can fail. Memory devices fall in this category.
> > > > >
> > > > > acpi_bus_remove() is changed to handle removal in 2 steps:
> > > > > - preparation for removal i.e. perform part of removal that can fail. Should
> > > > > succeed for device and all its children.
> > > > > - if above step was successfull, proceed to actual device removal
> > > >
> > > > Hi Vasilis,
> > > > We met the same problem when we doing computer node hotplug, It is a good idea
> > > > to introduce prepare_remove before actual device removal.
> > > >
> > > > I think we could do more in prepare_remove, such as rollback. In most cases, we can
> > > > offline most of memory sections except kernel used pages now, should we rollback
> > > > and online the memory sections when prepare_remove failed ?
> > >
> > > I think hot-plug operation should have all-or-nothing semantics. That
> > > is, an operation should either complete successfully, or rollback to the
> > > original state.
> >
> > That's correct.
> >
> > > > As you may know, the ACPI based hotplug framework we are working on already addressed
> > > > this problem, and the way we slove this problem is a bit like yours.
> > > >
> > > > We introduce hp_ops in struct acpi_device_ops:
> > > > struct acpi_device_ops {
> > > > acpi_op_add add;
> > > > acpi_op_remove remove;
> > > > acpi_op_start start;
> > > > acpi_op_bind bind;
> > > > acpi_op_unbind unbind;
> > > > acpi_op_notify notify;
> > > > #ifdef CONFIG_ACPI_HOTPLUG
> > > > struct acpihp_dev_ops *hp_ops;
> > > > #endif /* CONFIG_ACPI_HOTPLUG */
> > > > };
> > > >
> > > > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > > > 1) pre_release(): optional step to mark device going to be removed/busy
> > > > 2) release(): reclaim device from running system
> > > > 3) post_release(): rollback if cancelled by user or error happened
> > > > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > > > 5) unconfigure(): remove devices from running system
> > > > 6) post_unconfigure(): free resources used by devices
> > > >
> > > > In this way, we can easily rollback if error happens.
> > > > How do you think of this solution, any suggestion ? I think we can achieve
> > > > a better way for sharing ideas. :)
> > >
> > > Yes, sharing idea is good. :) I do not know if we need all 6 steps (I
> > > have not looked at all your changes yet..), but in my mind, a hot-plug
> > > operation should be composed with the following 3 phases.
> > >
> > > 1. Validate phase - Verify if the request is a supported operation. All
> > > known restrictions are verified at this phase. For instance, if a
> > > hot-remove request involves kernel memory, it is failed in this phase.
> > > Since this phase makes no change, no rollback is necessary to fail.
> >
> > Actually, we can't do it this way, because the conditions may change between
> > the check and the execution. So the first phase needs to involve execution
> > to some extent, although only as far as it remains reversible.
>
> For memory hot-remove, we can check if the target memory ranges are
> within ZONE_MOVABLE. We should not allow user to change this setup
> during hot-remove operation. Other things may be to check if a target
> node contains cpu0 (until it is supported), the console UART (assuming
> we cannot delete it), etc. We should avoid doing rollback as much as we
> can.

Yes, we can make some checks upfront as an optimization and fail early if
the conditions are not met, but for correctness we need to repeat those
checks later anyway. Once we've decided to go for the eject, the conditions
must hold whatever happens.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-29 20:47:11

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thu, 2012-11-29 at 21:25 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > > On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > CPUa CPUb
> > > > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > > > > unbind it from the driver
> > > > > > > > > > > > > > > acpi_bus_hot_remove_device()
> > > > > > > > > > > > > >
> > > [...]
> > > > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > > > friends and I think there's a way to address all of these problems
> > > > without big redesign (for now).
> > > >
> > > > First, why don't we introduce an ACPI device flag (in the flags field of
> > > > struct acpi_device) called eject_forbidden or something like this such that:
> > > >
> > > > (1) It will be clear by default.
> > > > (2) It may only be set by a driver's .add() routine if necessary.
> > > > (3) Once set, it may only be cleared by the driver's .remove() routine if
> > > > it's safe to physically remove the device after the .remove().
> > > >
> > > > Then, after the .remove() (which must be successful) has returned, and the
> > > > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > > > (such as -EBUSY or -EAGAIN). It doesn't matter if .remove() was called
> > > > earlier, because if it left the flag set, there's no way to clear it afterward
> > > > and acpi_bus_remove() will see it set anyway. I think the struct acpi_device
> > > > should be unregistered anyway if that error code is to be returned.
> > > >
> > > > [By the way, do you know where we free the memory allocated for struct
> > > > acpi_device objects?]
> > > >
> > > > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > > > store it, but continue the trimming normally and finally it should return that
> > > > error code to acpi_bus_hot_remove_device().
> > >
> > > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > > fails). Trimming is not continued.
> > >
> > > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > > more a general question, not specific to the eject_forbidden suggestion)
> >
> > Your change makes sense to me. At least until we have rollback code in
> > place, we need to fail as soon as we hit an error.
>
> Are you sure this makes sense? What happens to the devices that we have
> trimmed already and then there's an error? Looks like they are just unusable
> going forward, aren't they?

Yes, the devices trimmed already are released from the kernel, and their
memory ranges become unusable. This is bad. But I do not think we
should trim further to make more devices unusable after an error.


> > > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > > we attempted to eject) and notify the firmware about the failure.
> > >
> > > sounds like this rollback needs to be implemented in any solution we choose
> > > to implement, correct?
> >
> > Yes, rollback is necessary. But I do not think we need to include it
> > into your patch, though.
>
> As the first step, we should just trim everything and then return an error
> code in my opinion.

But we cannot trim devices with kernel memory.

Thanks,
-Toshi

2012-11-29 20:47:58

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > > > > On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > > > > > As discussed in https://patchwork.kernel.org/patch/1581581/
> > > > > > the driver core remove function needs to always succeed. This means we need
> > > > > > to know that the device can be successfully removed before acpi_bus_trim /
> > > > > > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > > > > > or SCI-initiated eject of memory devices fail e.g with:
> > > > > > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > > >
> > > > > > since the ACPI core goes ahead and ejects the device regardless of whether the
> > > > > > the memory is still in use or not.
> > > > > >
> > > > > > For this reason a new acpi_device operation called prepare_remove is introduced.
> > > > > > This operation should be registered for acpi devices whose removal (from kernel
> > > > > > perspective) can fail. Memory devices fall in this category.
> > > > > >
> > > > > > acpi_bus_remove() is changed to handle removal in 2 steps:
> > > > > > - preparation for removal i.e. perform part of removal that can fail. Should
> > > > > > succeed for device and all its children.
> > > > > > - if above step was successfull, proceed to actual device removal
> > > > >
> > > > > Hi Vasilis,
> > > > > We met the same problem when we doing computer node hotplug, It is a good idea
> > > > > to introduce prepare_remove before actual device removal.
> > > > >
> > > > > I think we could do more in prepare_remove, such as rollback. In most cases, we can
> > > > > offline most of memory sections except kernel used pages now, should we rollback
> > > > > and online the memory sections when prepare_remove failed ?
> > > >
> > > > I think hot-plug operation should have all-or-nothing semantics. That
> > > > is, an operation should either complete successfully, or rollback to the
> > > > original state.
> > >
> > > That's correct.
> > >
> > > > > As you may know, the ACPI based hotplug framework we are working on already addressed
> > > > > this problem, and the way we slove this problem is a bit like yours.
> > > > >
> > > > > We introduce hp_ops in struct acpi_device_ops:
> > > > > struct acpi_device_ops {
> > > > > acpi_op_add add;
> > > > > acpi_op_remove remove;
> > > > > acpi_op_start start;
> > > > > acpi_op_bind bind;
> > > > > acpi_op_unbind unbind;
> > > > > acpi_op_notify notify;
> > > > > #ifdef CONFIG_ACPI_HOTPLUG
> > > > > struct acpihp_dev_ops *hp_ops;
> > > > > #endif /* CONFIG_ACPI_HOTPLUG */
> > > > > };
> > > > >
> > > > > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > > > > 1) pre_release(): optional step to mark device going to be removed/busy
> > > > > 2) release(): reclaim device from running system
> > > > > 3) post_release(): rollback if cancelled by user or error happened
> > > > > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > > > > 5) unconfigure(): remove devices from running system
> > > > > 6) post_unconfigure(): free resources used by devices
> > > > >
> > > > > In this way, we can easily rollback if error happens.
> > > > > How do you think of this solution, any suggestion ? I think we can achieve
> > > > > a better way for sharing ideas. :)
> > > >
> > > > Yes, sharing idea is good. :) I do not know if we need all 6 steps (I
> > > > have not looked at all your changes yet..), but in my mind, a hot-plug
> > > > operation should be composed with the following 3 phases.
> > > >
> > > > 1. Validate phase - Verify if the request is a supported operation. All
> > > > known restrictions are verified at this phase. For instance, if a
> > > > hot-remove request involves kernel memory, it is failed in this phase.
> > > > Since this phase makes no change, no rollback is necessary to fail.
> > >
> > > Actually, we can't do it this way, because the conditions may change between
> > > the check and the execution. So the first phase needs to involve execution
> > > to some extent, although only as far as it remains reversible.
> >
> > For memory hot-remove, we can check if the target memory ranges are
> > within ZONE_MOVABLE. We should not allow user to change this setup
> > during hot-remove operation. Other things may be to check if a target
> > node contains cpu0 (until it is supported), the console UART (assuming
> > we cannot delete it), etc. We should avoid doing rollback as much as we
> > can.
>
> Yes, we can make some checks upfront as an optimization and fail early if
> the conditions are not met, but for correctness we need to repeat those
> checks later anyway. Once we've decided to go for the eject, the conditions
> must hold whatever happens.

Agreed.

Thanks,
-Toshi

2012-11-29 21:04:45

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

On Thu, 2012-11-29 at 13:39 -0700, Toshi Kani wrote:
> On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > > > 1. Validate phase - Verify if the request is a supported operation. All
> > > > > known restrictions are verified at this phase. For instance, if a
> > > > > hot-remove request involves kernel memory, it is failed in this phase.
> > > > > Since this phase makes no change, no rollback is necessary to fail.
> > > >
> > > > Actually, we can't do it this way, because the conditions may change between
> > > > the check and the execution. So the first phase needs to involve execution
> > > > to some extent, although only as far as it remains reversible.
> > >
> > > For memory hot-remove, we can check if the target memory ranges are
> > > within ZONE_MOVABLE. We should not allow user to change this setup
> > > during hot-remove operation. Other things may be to check if a target
> > > node contains cpu0 (until it is supported), the console UART (assuming
> > > we cannot delete it), etc. We should avoid doing rollback as much as we
> > > can.
> >
> > Yes, we can make some checks upfront as an optimization and fail early if
> > the conditions are not met, but for correctness we need to repeat those
> > checks later anyway. Once we've decided to go for the eject, the conditions
> > must hold whatever happens.
>
> Agreed.

BTW, it is not an optimization I am after for this phase. There are
many error cases during hot-plug operations. It is difficult to assure
that rollback is successful for every error condition in terms of
testing and maintaining the code. So, it is easier to fail beforehand
when possible.

Thanks,
-Toshi

2012-11-29 21:18:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> On Thu, 2012-11-29 at 21:25 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > > > On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > > > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > CPUa CPUb
> > > > > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > > > > > unbind it from the driver
> > > > > > > > > > > > > > > > acpi_bus_hot_remove_device()
> > > > > > > > > > > > > > >
> > > > [...]
> > > > > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > > > > friends and I think there's a way to address all of these problems
> > > > > without big redesign (for now).
> > > > >
> > > > > First, why don't we introduce an ACPI device flag (in the flags field of
> > > > > struct acpi_device) called eject_forbidden or something like this such that:
> > > > >
> > > > > (1) It will be clear by default.
> > > > > (2) It may only be set by a driver's .add() routine if necessary.
> > > > > (3) Once set, it may only be cleared by the driver's .remove() routine if
> > > > > it's safe to physically remove the device after the .remove().
> > > > >
> > > > > Then, after the .remove() (which must be successful) has returned, and the
> > > > > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > > > > (such as -EBUSY or -EAGAIN). It doesn't matter if .remove() was called
> > > > > earlier, because if it left the flag set, there's no way to clear it afterward
> > > > > and acpi_bus_remove() will see it set anyway. I think the struct acpi_device
> > > > > should be unregistered anyway if that error code is to be returned.
> > > > >
> > > > > [By the way, do you know where we free the memory allocated for struct
> > > > > acpi_device objects?]
> > > > >
> > > > > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > > > > store it, but continue the trimming normally and finally it should return that
> > > > > error code to acpi_bus_hot_remove_device().
> > > >
> > > > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > > > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > > > fails). Trimming is not continued.
> > > >
> > > > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > > > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > > > more a general question, not specific to the eject_forbidden suggestion)
> > >
> > > Your change makes sense to me. At least until we have rollback code in
> > > place, we need to fail as soon as we hit an error.
> >
> > Are you sure this makes sense? What happens to the devices that we have
> > trimmed already and then there's an error? Looks like they are just unusable
> > going forward, aren't they?
>
> Yes, the devices trimmed already are released from the kernel, and their
> memory ranges become unusable. This is bad. But I do not think we
> should trim further to make more devices unusable after an error.
>
>
> > > > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > > > we attempted to eject) and notify the firmware about the failure.
> > > >
> > > > sounds like this rollback needs to be implemented in any solution we choose
> > > > to implement, correct?
> > >
> > > Yes, rollback is necessary. But I do not think we need to include it
> > > into your patch, though.
> >
> > As the first step, we should just trim everything and then return an error
> > code in my opinion.
>
> But we cannot trim devices with kernel memory.

Well, let's put it this way: If we started a trim, we should just do it
completely, in which case we know we can go for the eject, or we should
roll it back completely. Now, if you just break the trim on first error,
the complete rollback is kind of problematic. It should be doable, but
it won't be easy. On the other hand, if you go for the full trim,
doing a rollback is trivial, it's as though you have reinserted the whole
stuff.

Now, that need not harm functionality, and that's why I proposed the
eject_forbidden flag, so that .remove() can say "I'm not done, please
rollback", in which case the device can happily function going forward,
even if we don't rebind the driver to it.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-29 21:20:46

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

On Thursday, November 29, 2012 01:56:17 PM Toshi Kani wrote:
> On Thu, 2012-11-29 at 13:39 -0700, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > > > > 1. Validate phase - Verify if the request is a supported operation. All
> > > > > > known restrictions are verified at this phase. For instance, if a
> > > > > > hot-remove request involves kernel memory, it is failed in this phase.
> > > > > > Since this phase makes no change, no rollback is necessary to fail.
> > > > >
> > > > > Actually, we can't do it this way, because the conditions may change between
> > > > > the check and the execution. So the first phase needs to involve execution
> > > > > to some extent, although only as far as it remains reversible.
> > > >
> > > > For memory hot-remove, we can check if the target memory ranges are
> > > > within ZONE_MOVABLE. We should not allow user to change this setup
> > > > during hot-remove operation. Other things may be to check if a target
> > > > node contains cpu0 (until it is supported), the console UART (assuming
> > > > we cannot delete it), etc. We should avoid doing rollback as much as we
> > > > can.
> > >
> > > Yes, we can make some checks upfront as an optimization and fail early if
> > > the conditions are not met, but for correctness we need to repeat those
> > > checks later anyway. Once we've decided to go for the eject, the conditions
> > > must hold whatever happens.
> >
> > Agreed.
>
> BTW, it is not an optimization I am after for this phase. There are
> many error cases during hot-plug operations. It is difficult to assure
> that rollback is successful for every error condition in terms of
> testing and maintaining the code. So, it is easier to fail beforehand
> when possible.

OK, but as I said it is necessary to ensure that the conditions will be met
in the next phases as well if we don't fail.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-29 21:55:13

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 21:25 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > > > > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > > > > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > > > > fails). Trimming is not continued.
> > > > >
> > > > > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > > > > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > > > > more a general question, not specific to the eject_forbidden suggestion)
> > > >
> > > > Your change makes sense to me. At least until we have rollback code in
> > > > place, we need to fail as soon as we hit an error.
> > >
> > > Are you sure this makes sense? What happens to the devices that we have
> > > trimmed already and then there's an error? Looks like they are just unusable
> > > going forward, aren't they?
> >
> > Yes, the devices trimmed already are released from the kernel, and their
> > memory ranges become unusable. This is bad. But I do not think we
> > should trim further to make more devices unusable after an error.
> >
> >
> > > > > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > > > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > > > > we attempted to eject) and notify the firmware about the failure.
> > > > >
> > > > > sounds like this rollback needs to be implemented in any solution we choose
> > > > > to implement, correct?
> > > >
> > > > Yes, rollback is necessary. But I do not think we need to include it
> > > > into your patch, though.
> > >
> > > As the first step, we should just trim everything and then return an error
> > > code in my opinion.
> >
> > But we cannot trim devices with kernel memory.
>
> Well, let's put it this way: If we started a trim, we should just do it
> completely, in which case we know we can go for the eject, or we should
> roll it back completely. Now, if you just break the trim on first error,
> the complete rollback is kind of problematic. It should be doable, but
> it won't be easy. On the other hand, if you go for the full trim,
> doing a rollback is trivial, it's as though you have reinserted the whole
> stuff.

acpi_bus_check_add() skips initialization when an ACPI device already
has its associated acpi_device. So, I think it works either way.


> Now, that need not harm functionality, and that's why I proposed the
> eject_forbidden flag, so that .remove() can say "I'm not done, please
> rollback", in which case the device can happily function going forward,
> even if we don't rebind the driver to it.

A partially trimmed acpi_device is hard to rollback. acpi_device should
be either trimmed completely or intact. When a function failed to trim
an acpi_device, it needs to rollback its operation for the device before
returning an error. This is because only the failed function has enough
context to rollback when an error occurred in the middle of its
procedure.

Thanks,
-Toshi



2012-11-29 22:07:13

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thursday, November 29, 2012 02:46:44 PM Toshi Kani wrote:
> On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 21:25 +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> > > > > On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > > > > > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > > > > > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > > > > > fails). Trimming is not continued.
> > > > > >
> > > > > > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > > > > > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > > > > > more a general question, not specific to the eject_forbidden suggestion)
> > > > >
> > > > > Your change makes sense to me. At least until we have rollback code in
> > > > > place, we need to fail as soon as we hit an error.
> > > >
> > > > Are you sure this makes sense? What happens to the devices that we have
> > > > trimmed already and then there's an error? Looks like they are just unusable
> > > > going forward, aren't they?
> > >
> > > Yes, the devices trimmed already are released from the kernel, and their
> > > memory ranges become unusable. This is bad. But I do not think we
> > > should trim further to make more devices unusable after an error.
> > >
> > >
> > > > > > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > > > > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > > > > > we attempted to eject) and notify the firmware about the failure.
> > > > > >
> > > > > > sounds like this rollback needs to be implemented in any solution we choose
> > > > > > to implement, correct?
> > > > >
> > > > > Yes, rollback is necessary. But I do not think we need to include it
> > > > > into your patch, though.
> > > >
> > > > As the first step, we should just trim everything and then return an error
> > > > code in my opinion.
> > >
> > > But we cannot trim devices with kernel memory.
> >
> > Well, let's put it this way: If we started a trim, we should just do it
> > completely, in which case we know we can go for the eject, or we should
> > roll it back completely. Now, if you just break the trim on first error,
> > the complete rollback is kind of problematic. It should be doable, but
> > it won't be easy. On the other hand, if you go for the full trim,
> > doing a rollback is trivial, it's as though you have reinserted the whole
> > stuff.
>
> acpi_bus_check_add() skips initialization when an ACPI device already
> has its associated acpi_device. So, I think it works either way.

OK

> > Now, that need not harm functionality, and that's why I proposed the
> > eject_forbidden flag, so that .remove() can say "I'm not done, please
> > rollback", in which case the device can happily function going forward,
> > even if we don't rebind the driver to it.
>
> A partially trimmed acpi_device is hard to rollback. acpi_device should
> be either trimmed completely or intact.

I may or may not agree, depending on what you mean by "trimmed". :-)

> When a function failed to trim
> an acpi_device, it needs to rollback its operation for the device before
> returning an error.

Unless it is .remove(), because .remove() is supposed to always succeed
(ie. unbind the driver from the device). However, it may signal the caller
that something's fishy, by setting a flag in the device object, for example.

> This is because only the failed function has enough
> context to rollback when an error occurred in the middle of its
> procedure.

Not really. If it actually removes the struct acpi_device then the caller
may run acpi_bus_scan() on that device if necessary. There may be a problem
if the device has an associated physical node (or more of them), but that
requires special care anyway.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-29 22:35:38

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
> On 2012/11/29 2:41, Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> >>> As discussed in https://patchwork.kernel.org/patch/1581581/
> >>> the driver core remove function needs to always succeed. This means we need
> >>> to know that the device can be successfully removed before acpi_bus_trim /
> >>> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> >>> or SCI-initiated eject of memory devices fail e.g with:
> >>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> >>>
> >>> since the ACPI core goes ahead and ejects the device regardless of whether the
> >>> the memory is still in use or not.
> >>>
> >>> For this reason a new acpi_device operation called prepare_remove is introduced.
> >>> This operation should be registered for acpi devices whose removal (from kernel
> >>> perspective) can fail. Memory devices fall in this category.
> >>>
> >>> acpi_bus_remove() is changed to handle removal in 2 steps:
> >>> - preparation for removal i.e. perform part of removal that can fail. Should
> >>> succeed for device and all its children.
> >>> - if above step was successfull, proceed to actual device removal
> >>
> >> Hi Vasilis,
> >> We met the same problem when we doing computer node hotplug, It is a good idea
> >> to introduce prepare_remove before actual device removal.
> >>
> >> I think we could do more in prepare_remove, such as rollback. In most cases, we can
> >> offline most of memory sections except kernel used pages now, should we rollback
> >> and online the memory sections when prepare_remove failed ?
> >
> > I think hot-plug operation should have all-or-nothing semantics. That
> > is, an operation should either complete successfully, or rollback to the
> > original state.
>
> Yes, we have the same point of view with you. We handle this problem in the ACPI
> based hot-plug framework as following:
> 1) hot add / hot remove complete successfully if no error happens;
> 2) automatic rollback to the original state if meets some error ;
> 3) rollback to the original if hot-plug operation cancelled by user ;

Cool!

> >> As you may know, the ACPI based hotplug framework we are working on already addressed
> >> this problem, and the way we slove this problem is a bit like yours.
> >>
> >> We introduce hp_ops in struct acpi_device_ops:
> >> struct acpi_device_ops {
> >> acpi_op_add add;
> >> acpi_op_remove remove;
> >> acpi_op_start start;
> >> acpi_op_bind bind;
> >> acpi_op_unbind unbind;
> >> acpi_op_notify notify;
> >> #ifdef CONFIG_ACPI_HOTPLUG
> >> struct acpihp_dev_ops *hp_ops;
> >> #endif /* CONFIG_ACPI_HOTPLUG */
> >> };
> >>
> >> in hp_ops, we divide the prepare_remove into six small steps, that is:
> >> 1) pre_release(): optional step to mark device going to be removed/busy
> >> 2) release(): reclaim device from running system
> >> 3) post_release(): rollback if cancelled by user or error happened
> >> 4) pre_unconfigure(): optional step to solve possible dependency issue
> >> 5) unconfigure(): remove devices from running system
> >> 6) post_unconfigure(): free resources used by devices
> >>
> >> In this way, we can easily rollback if error happens.
> >> How do you think of this solution, any suggestion ? I think we can achieve
> >> a better way for sharing ideas. :)
> >
> > Yes, sharing idea is good. :) I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
>
> Good idea ! we also implement a hot-plug operation in 3 phases:
> 1) acpihp_drv_pre_execute
> 2) acpihp_drv_execute
> 3) acpihp_drv_post_execute
> you may refer to :
> https://lkml.org/lkml/2012/11/4/79

Great. Yes, I will take a look.

> > 1. Validate phase - Verify if the request is a supported operation. All
> > known restrictions are verified at this phase. For instance, if a
> > hot-remove request involves kernel memory, it is failed in this phase.
> > Since this phase makes no change, no rollback is necessary to fail.
>
> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
>
> 1) Hot-plugble or not. the instance kernel memory you mentioned is also checked
> when memory device remove;

Agreed.

> 2) Dependency check involved. For instance, if hot-add a memory device,
> processor should be added first, otherwise it's not valid to this operation.

I think FW should be the one that assures such dependency. That is,
when a memory device object is marked as present/enabled/functioning, it
should be ready for the OS to use.

> 3) Race condition check. if the device and its dependent device is in hot-plug
> process, another request will be denied.

I agree that hot-plug operation should be serialized. I think another
request should be either queued or denied based on the caller's intent
(i.e. wait-ok or no-wait).

> No rollback is needed for the above checks.

Great.

> > 2. Execute phase - Perform hot-add / hot-remove operation that can be
> > rolled-back in case of error or cancel.
>
> In this phase, we introduce a state machine for the hot-plugble device,
> please refer to:
> https://lkml.org/lkml/2012/11/4/79
>
> I think we have the same idea for the major framework, but the ACPI based
> hot-plug framework implement it differently in detail, right ?

Yes, I am surprised with the similarity. What I described is something
we had implemented for other OS. I am still studying how best we can
improve the Linux hotplug code. :)

Thanks,
-Toshi

2012-11-29 23:25:47

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thu, 2012-11-29 at 23:11 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 02:46:44 PM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > >
> > > Well, let's put it this way: If we started a trim, we should just do it
> > > completely, in which case we know we can go for the eject, or we should
> > > roll it back completely. Now, if you just break the trim on first error,
> > > the complete rollback is kind of problematic. It should be doable, but
> > > it won't be easy. On the other hand, if you go for the full trim,
> > > doing a rollback is trivial, it's as though you have reinserted the whole
> > > stuff.
> >
> > acpi_bus_check_add() skips initialization when an ACPI device already
> > has its associated acpi_device. So, I think it works either way.
>
> OK
>
> > > Now, that need not harm functionality, and that's why I proposed the
> > > eject_forbidden flag, so that .remove() can say "I'm not done, please
> > > rollback", in which case the device can happily function going forward,
> > > even if we don't rebind the driver to it.
> >
> > A partially trimmed acpi_device is hard to rollback. acpi_device should
> > be either trimmed completely or intact.
>
> I may or may not agree, depending on what you mean by "trimmed". :-)
>
> > When a function failed to trim
> > an acpi_device, it needs to rollback its operation for the device before
> > returning an error.
>
> Unless it is .remove(), because .remove() is supposed to always succeed
> (ie. unbind the driver from the device). However, it may signal the caller
> that something's fishy, by setting a flag in the device object, for example.

Right, .remove() cannot fail. We still need to check if we should
continue to use .remove(), though.

As for the flag, are you thinking that we call acpi_bus_trim() with
rmdevice false first, so that it won't remove acpi_device?

> > This is because only the failed function has enough
> > context to rollback when an error occurred in the middle of its
> > procedure.
>
> Not really. If it actually removes the struct acpi_device then the caller
> may run acpi_bus_scan() on that device if necessary. There may be a problem
> if the device has an associated physical node (or more of them), but that
> requires special care anyway.

Well, hot-remove to a device fails when there is a reason to fail. IOW,
such reason prevented the device to be removed safely. So, I think we
need to put it back to the original state in this case. Removing it by
ignoring the cause of failure sounds unsafe to me. Some status/data may
be left un-deleted as a result.

Thanks,
-Toshi

2012-11-30 00:08:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Thursday, November 29, 2012 04:17:19 PM Toshi Kani wrote:
> On Thu, 2012-11-29 at 23:11 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 29, 2012 02:46:44 PM Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > > >
> > > > Well, let's put it this way: If we started a trim, we should just do it
> > > > completely, in which case we know we can go for the eject, or we should
> > > > roll it back completely. Now, if you just break the trim on first error,
> > > > the complete rollback is kind of problematic. It should be doable, but
> > > > it won't be easy. On the other hand, if you go for the full trim,
> > > > doing a rollback is trivial, it's as though you have reinserted the whole
> > > > stuff.
> > >
> > > acpi_bus_check_add() skips initialization when an ACPI device already
> > > has its associated acpi_device. So, I think it works either way.
> >
> > OK
> >
> > > > Now, that need not harm functionality, and that's why I proposed the
> > > > eject_forbidden flag, so that .remove() can say "I'm not done, please
> > > > rollback", in which case the device can happily function going forward,
> > > > even if we don't rebind the driver to it.
> > >
> > > A partially trimmed acpi_device is hard to rollback. acpi_device should
> > > be either trimmed completely or intact.
> >
> > I may or may not agree, depending on what you mean by "trimmed". :-)
> >
> > > When a function failed to trim
> > > an acpi_device, it needs to rollback its operation for the device before
> > > returning an error.
> >
> > Unless it is .remove(), because .remove() is supposed to always succeed
> > (ie. unbind the driver from the device). However, it may signal the caller
> > that something's fishy, by setting a flag in the device object, for example.
>
> Right, .remove() cannot fail. We still need to check if we should
> continue to use .remove(), though.
>
> As for the flag, are you thinking that we call acpi_bus_trim() with
> rmdevice false first, so that it won't remove acpi_device?

I'm not sure if that's going to help.

Definitely, .remove() should just unbind the driver from the device.
That's what it's supposed to do. Still, it may leave some information for
the caller in the device structure itself. For example, "I have unbound
from the device, but it is not safe to remove it physically".

I'm now thinking that we may need to rework the trimming so that
.remove() is called for all drivers first and the struct acpi_device
objects are not removed at this stage. Then, if .remove() from one
driver signals the situation like above, the routine will have to
rebind the drivers that have been unbound and we're done.

After that stage, when all drivers have been unbound, we should be
able to go for full eject. First, we can drop all struct acpi_device
objects in the relevant subtree and then we can run _EJ0.

> > > This is because only the failed function has enough
> > > context to rollback when an error occurred in the middle of its
> > > procedure.
> >
> > Not really. If it actually removes the struct acpi_device then the caller
> > may run acpi_bus_scan() on that device if necessary. There may be a problem
> > if the device has an associated physical node (or more of them), but that
> > requires special care anyway.
>
> Well, hot-remove to a device fails when there is a reason to fail. IOW,
> such reason prevented the device to be removed safely. So, I think we
> need to put it back to the original state in this case. Removing it by
> ignoring the cause of failure sounds unsafe to me. Some status/data may
> be left un-deleted as a result.

Again, I may or may not agree with that, depending on whether you're talking
about physical devices or about struct acpi_device objects.

Anyway, I agree that removing struct acpi_device objects may not be worth the
effort if we're going to re-create them in a while, because that may be costly.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-11-30 01:18:07

by Toshi Kani

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

On Fri, 2012-11-30 at 01:13 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 04:17:19 PM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 23:11 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 29, 2012 02:46:44 PM Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> > > > > On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > > > > Now, that need not harm functionality, and that's why I proposed the
> > > > > eject_forbidden flag, so that .remove() can say "I'm not done, please
> > > > > rollback", in which case the device can happily function going forward,
> > > > > even if we don't rebind the driver to it.
> > > >
> > > > A partially trimmed acpi_device is hard to rollback. acpi_device should
> > > > be either trimmed completely or intact.
> > >
> > > I may or may not agree, depending on what you mean by "trimmed". :-)
> > >
> > > > When a function failed to trim
> > > > an acpi_device, it needs to rollback its operation for the device before
> > > > returning an error.
> > >
> > > Unless it is .remove(), because .remove() is supposed to always succeed
> > > (ie. unbind the driver from the device). However, it may signal the caller
> > > that something's fishy, by setting a flag in the device object, for example.
> >
> > Right, .remove() cannot fail. We still need to check if we should
> > continue to use .remove(), though.
> >
> > As for the flag, are you thinking that we call acpi_bus_trim() with
> > rmdevice false first, so that it won't remove acpi_device?
>
> I'm not sure if that's going to help.
>
> Definitely, .remove() should just unbind the driver from the device.
> That's what it's supposed to do. Still, it may leave some information for
> the caller in the device structure itself. For example, "I have unbound
> from the device, but it is not safe to remove it physically".

Right.

> I'm now thinking that we may need to rework the trimming so that
> .remove() is called for all drivers first and the struct acpi_device
> objects are not removed at this stage. Then, if .remove() from one
> driver signals the situation like above, the routine will have to
> rebind the drivers that have been unbound and we're done.
>
> After that stage, when all drivers have been unbound, we should be
> able to go for full eject. First, we can drop all struct acpi_device
> objects in the relevant subtree and then we can run _EJ0.

I agree that such approach is worth pursuing.

> > > > This is because only the failed function has enough
> > > > context to rollback when an error occurred in the middle of its
> > > > procedure.
> > >
> > > Not really. If it actually removes the struct acpi_device then the caller
> > > may run acpi_bus_scan() on that device if necessary. There may be a problem
> > > if the device has an associated physical node (or more of them), but that
> > > requires special care anyway.
> >
> > Well, hot-remove to a device fails when there is a reason to fail. IOW,
> > such reason prevented the device to be removed safely. So, I think we
> > need to put it back to the original state in this case. Removing it by
> > ignoring the cause of failure sounds unsafe to me. Some status/data may
> > be left un-deleted as a result.
>
> Again, I may or may not agree with that, depending on whether you're talking
> about physical devices or about struct acpi_device objects.

Sorry, by "hot-remove a device", I was referring removing struct
acpi_device and off-lining its resource. By "left un-delted", I was
referring its resource left un-deleted, such as memory ranges.

> Anyway, I agree that removing struct acpi_device objects may not be worth the
> effort if we're going to re-create them in a while, because that may be costly.

Thanks,
-Toshi