2020-01-22 23:12:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH] PM: core: Fix handling of devices deleted during system-wide resume

From: Rafael J. Wysocki <[email protected]>

If a device is deleted by one of its system-wide resume callbacks
(for example, because it does not appear to be present or accessible
any more) along with its children, the resume of the children may
continue leading to use-after-free errors and other issues
(potentially).

Namely, if the device's children are resumed asynchronously, their
resume may have been scheduled already before the device's callback
runs and so the device may be deleted while dpm_wait_for_superior()
is being executed for them. The memory taken up by the parent device
object may be freed then while dpm_wait() is waiting for the parent's
resume callback to complete, which leads to a use-after-free.
Moreover, the resume of the children is really not expected to
continue after they have been unregistered, so it must be terminated
right away in that case.

To address this problem, modify dpm_wait_for_superior() to check
if the target device is still there in the system-wide PM list of
devices and if so, to increment its parent's reference counter, both
under dpm_list_mtx which prevents device_del() running for the child
from dropping the parent's reference counter prematurely.

If the device is not present in the system-wide PM list of devices
any more, the resume of it cannot continue, so check that again after
dpm_wait() returns, which means that the parent's callback has been
completed, and pass the result of that check to the caller of
dpm_wait_for_superior() to allow it to abort the device's resume
if it is not there any more.

Link: https://lore.kernel.org/linux-pm/[email protected]
Reported-by: Chanho Min <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/main.c | 42 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 37 insertions(+), 5 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -273,10 +273,38 @@ static void dpm_wait_for_suppliers(struc
device_links_read_unlock(idx);
}

-static void dpm_wait_for_superior(struct device *dev, bool async)
+static bool dpm_wait_for_superior(struct device *dev, bool async)
{
- dpm_wait(dev->parent, async);
+ struct device *parent;
+
+ /*
+ * If the device is resumed asynchronously and the parent's callback
+ * deletes both the device and the parent itself, the parent object may
+ * be freed while this function is running, so avoid that by reference
+ * counting the parent once more unless the device has been deleted
+ * already (in which case return right away).
+ */
+ mutex_lock(&dpm_list_mtx);
+
+ if (!device_pm_initialized(dev)) {
+ mutex_unlock(&dpm_list_mtx);
+ return false;
+ }
+
+ parent = get_device(dev->parent);
+
+ mutex_unlock(&dpm_list_mtx);
+
+ dpm_wait(parent, async);
+ put_device(parent);
+
dpm_wait_for_suppliers(dev, async);
+
+ /*
+ * If the parent's callback has deleted the device, attempting to resume
+ * it would be invalid, so avoid doing that then.
+ */
+ return device_pm_initialized(dev);
}

static void dpm_wait_for_consumers(struct device *dev, bool async)
@@ -621,7 +649,8 @@ static int device_resume_noirq(struct de
if (!dev->power.is_noirq_suspended)
goto Out;

- dpm_wait_for_superior(dev, async);
+ if (!dpm_wait_for_superior(dev, async))
+ goto Out;

skip_resume = dev_pm_may_skip_resume(dev);

@@ -829,7 +858,8 @@ static int device_resume_early(struct de
if (!dev->power.is_late_suspended)
goto Out;

- dpm_wait_for_superior(dev, async);
+ if (!dpm_wait_for_superior(dev, async))
+ goto Out;

callback = dpm_subsys_resume_early_cb(dev, state, &info);

@@ -944,7 +974,9 @@ static int device_resume(struct device *
goto Complete;
}

- dpm_wait_for_superior(dev, async);
+ if (!dpm_wait_for_superior(dev, async))
+ goto Complete;
+
dpm_watchdog_set(&wd, dev);
device_lock(dev);





2020-01-23 02:16:26

by Chanho Min

[permalink] [raw]
Subject: Re: [PATCH] PM: core: Fix handling of devices deleted during system-wide resume

> From: Rafael J. Wysocki <[email protected]>
>
> If a device is deleted by one of its system-wide resume callbacks
> (for example, because it does not appear to be present or accessible
> any more) along with its children, the resume of the children may
> continue leading to use-after-free errors and other issues
> (potentially).
>
> Namely, if the device's children are resumed asynchronously, their
> resume may have been scheduled already before the device's callback
> runs and so the device may be deleted while dpm_wait_for_superior()
> is being executed for them. The memory taken up by the parent device
> object may be freed then while dpm_wait() is waiting for the parent's
> resume callback to complete, which leads to a use-after-free.
> Moreover, the resume of the children is really not expected to
> continue after they have been unregistered, so it must be terminated
> right away in that case.Seokjoo Lee <[email protected]>
>
> To address this problem, modify dpm_wait_for_superior() to check
> if the target device is still there in the system-wide PM list of
> devices and if so, to increment its parent's reference counter, both
> under dpm_list_mtx which prevents device_del() running for the child
> from dropping the parent's reference counter prematurely.
>
> If the device is not present in the system-wide PM list of devices
> any more, the resume of it cannot continue, so check that again after
> dpm_wait() returns, which means that the parent's callback has been
> completed, and pass the result of that check to the caller of
> dpm_wait_for_superior() to allow it to abort the device's resume
> if it is not there any more.
>
> Link: https://lore.kernel.org/linux-pm/[email protected]
> Reported-by: Chanho Min <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/base/power/main.c | 42 +++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 37 insertions(+), 5 deletions(-)
>
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.cSeokjoo Lee <[email protected]>
> +++ linux-pm/drivers/base/power/main.c20. 1. 23. ???? 8:11?? Rafael J. Wysocki ??(??) ?? ??:
> @@ -273,10 +273,38 @@ static void dpm_wait_for_suppliers(struc
> device_links_read_unlock(idx);
> }
>
> -static void dpm_wait_for_superior(struct device *dev, bool async)
> +static bool dpm_wait_for_superior(struct device *dev, bool async)
> {
> - dpm_wait(dev->parent, async);
> + struct device *parent;board
> +
> + /*
> + * If the device is resumed asynchronously and the parent's callback
> + * deletes both the device and the parent itself, the parent object may
> + * be freed while this function is running, so avoid that by reference
> + * counting the parent once more unless the device has been deleted
> + * already (in which case return right away).
> + */
> + mutex_lock(&dpm_list_mtx);
> +
> + if (!device_pm_initialized(dev)) {20. 1. 23. ???? 8:11?? Rafael J. Wysocki ??(??) ?? ??:
> + mutex_unlock(&dpm_list_mtx);
> + return false;
> + }
> +
> + parent = get_device(dev->parent);
> +
> + mutex_unlock(&dpm_list_mtx);
> +
> + dpm_wait(parent, async);
> + put_device(parent);
> +
> dpm_wait_for_suppliers(dev, async);
> +
> + /*
> + * If the parent's callback has deleted the device, attempting to resume
> + * it would be invalid, so avoid doing that then.
> + */
> + return device_pm_initialized(dev);20. 1. 23. ???? 8:11?? Rafael J. Wysocki ??(??) ?? ??:
> }
>
> static void dpm_wait_for_consumers(struct device *dev, bool async)
> @@ -621,7 +649,8 @@ static int device_resume_noirq(struct de
> if (!dev->power.is_noirq_suspended)
> goto Out;
>
> - dpm_wait_for_superior(dev, async);
> + if (!dpm_wait_for_superior(dev, async))
> + goto Out;
>
> skip_resume = dev_pm_may_skip_resume(dev);
>
> @@ -829,7 +858,8 @@ static int device_resume_early(struct de
> if (!dev->power.is_late_suspended)
> goto Out;
>
> - dpm_wait_for_superior(dev, async);Seokjoo Lee <[email protected]>
> + if (!dpm_wait_for_superior(dev, async))
> + goto Out;
>
> callback = dpm_subsys_resume_early_cb(dev, state, &info);
>
> @@ -944,7 +974,9 @@ static int device_resume(struct device *
> goto Complete;
> }
>
> - dpm_wait_for_superior(dev, async);
> + if (!dpm_wait_for_superior(dev, async))
> + goto Complete;
> +
> dpm_watchdog_set(&wd, dev);
> device_lock(dev);Thanks, This seems to solve the rare hang on our target.
Actually, the problem is occurred in v4.4.
Shouldn't it apply to -stable?

Chanho

2020-01-23 09:25:08

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] PM: core: Fix handling of devices deleted during system-wide resume

On Thu, Jan 23, 2020 at 3:14 AM Chanho Min <[email protected]> wrote:
>
> > From: Rafael J. Wysocki <[email protected]>
> >
> > If a device is deleted by one of its system-wide resume callbacks
> > (for example, because it does not appear to be present or accessible
> > any more) along with its children, the resume of the children may
> > continue leading to use-after-free errors and other issues
> > (potentially).
> >
> > Namely, if the device's children are resumed asynchronously, their
> > resume may have been scheduled already before the device's callback
> > runs and so the device may be deleted while dpm_wait_for_superior()
> > is being executed for them. The memory taken up by the parent device
> > object may be freed then while dpm_wait() is waiting for the parent's
> > resume callback to complete, which leads to a use-after-free.
> > Moreover, the resume of the children is really not expected to
> > continue after they have been unregistered, so it must be terminated
> > right away in that case.Seokjoo Lee <[email protected]>
> >
> > To address this problem, modify dpm_wait_for_superior() to check
> > if the target device is still there in the system-wide PM list of
> > devices and if so, to increment its parent's reference counter, both
> > under dpm_list_mtx which prevents device_del() running for the child
> > from dropping the parent's reference counter prematurely.
> >
> > If the device is not present in the system-wide PM list of devices
> > any more, the resume of it cannot continue, so check that again after
> > dpm_wait() returns, which means that the parent's callback has been
> > completed, and pass the result of that check to the caller of
> > dpm_wait_for_superior() to allow it to abort the device's resume
> > if it is not there any more.
> >
> > Link: https://lore.kernel.org/linux-pm/[email protected]
> > Reported-by: Chanho Min <[email protected]>
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> > drivers/base/power/main.c | 42 +++++++++++++++++++++++++++++++++++++-----
> > 1 file changed, 37 insertions(+), 5 deletions(-)
> >
> > Index: linux-pm/drivers/base/power/main.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/main.cSeokjoo Lee <[email protected]>
> > +++ linux-pm/drivers/base/power/main.c20. 1. 23. 오전 8:11에 Rafael J. Wysocki 이(가) 쓴 글:
> > @@ -273,10 +273,38 @@ static void dpm_wait_for_suppliers(struc
> > device_links_read_unlock(idx);
> > }
> >
> > -static void dpm_wait_for_superior(struct device *dev, bool async)
> > +static bool dpm_wait_for_superior(struct device *dev, bool async)
> > {
> > - dpm_wait(dev->parent, async);
> > + struct device *parent;board
> > +
> > + /*
> > + * If the device is resumed asynchronously and the parent's callback
> > + * deletes both the device and the parent itself, the parent object may
> > + * be freed while this function is running, so avoid that by reference
> > + * counting the parent once more unless the device has been deleted
> > + * already (in which case return right away).
> > + */
> > + mutex_lock(&dpm_list_mtx);
> > +
> > + if (!device_pm_initialized(dev)) {20. 1. 23. 오전 8:11에 Rafael J. Wysocki 이(가) 쓴 글:
> > + mutex_unlock(&dpm_list_mtx);
> > + return false;
> > + }
> > +
> > + parent = get_device(dev->parent);
> > +
> > + mutex_unlock(&dpm_list_mtx);
> > +
> > + dpm_wait(parent, async);
> > + put_device(parent);
> > +
> > dpm_wait_for_suppliers(dev, async);
> > +
> > + /*
> > + * If the parent's callback has deleted the device, attempting to resume
> > + * it would be invalid, so avoid doing that then.
> > + */
> > + return device_pm_initialized(dev);20. 1. 23. 오전 8:11에 Rafael J. Wysocki 이(가) 쓴 글:
> > }
> >
> > static void dpm_wait_for_consumers(struct device *dev, bool async)
> > @@ -621,7 +649,8 @@ static int device_resume_noirq(struct de
> > if (!dev->power.is_noirq_suspended)
> > goto Out;
> >
> > - dpm_wait_for_superior(dev, async);
> > + if (!dpm_wait_for_superior(dev, async))
> > + goto Out;
> >
> > skip_resume = dev_pm_may_skip_resume(dev);
> >
> > @@ -829,7 +858,8 @@ static int device_resume_early(struct de
> > if (!dev->power.is_late_suspended)
> > goto Out;
> >
> > - dpm_wait_for_superior(dev, async);Seokjoo Lee <[email protected]>
> > + if (!dpm_wait_for_superior(dev, async))
> > + goto Out;
> >
> > callback = dpm_subsys_resume_early_cb(dev, state, &info);
> >
> > @@ -944,7 +974,9 @@ static int device_resume(struct device *
> > goto Complete;
> > }
> >
> > - dpm_wait_for_superior(dev, async);
> > + if (!dpm_wait_for_superior(dev, async))
> > + goto Complete;
> > +
> > dpm_watchdog_set(&wd, dev);
> > device_lock(dev);Thanks, This seems to solve the rare hang on our target.
> Actually, the problem is occurred in v4.4.
> Shouldn't it apply to -stable?

Yes, it should, but I'll add a "stable" tag later.

2020-01-24 08:59:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] PM: core: Fix handling of devices deleted during system-wide resume

On Thu, Jan 23, 2020 at 12:11:24AM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> If a device is deleted by one of its system-wide resume callbacks
> (for example, because it does not appear to be present or accessible
> any more) along with its children, the resume of the children may
> continue leading to use-after-free errors and other issues
> (potentially).
>
> Namely, if the device's children are resumed asynchronously, their
> resume may have been scheduled already before the device's callback
> runs and so the device may be deleted while dpm_wait_for_superior()
> is being executed for them. The memory taken up by the parent device
> object may be freed then while dpm_wait() is waiting for the parent's
> resume callback to complete, which leads to a use-after-free.
> Moreover, the resume of the children is really not expected to
> continue after they have been unregistered, so it must be terminated
> right away in that case.
>
> To address this problem, modify dpm_wait_for_superior() to check
> if the target device is still there in the system-wide PM list of
> devices and if so, to increment its parent's reference counter, both
> under dpm_list_mtx which prevents device_del() running for the child
> from dropping the parent's reference counter prematurely.
>
> If the device is not present in the system-wide PM list of devices
> any more, the resume of it cannot continue, so check that again after
> dpm_wait() returns, which means that the parent's callback has been
> completed, and pass the result of that check to the caller of
> dpm_wait_for_superior() to allow it to abort the device's resume
> if it is not there any more.
>
> Link: https://lore.kernel.org/linux-pm/[email protected]
> Reported-by: Chanho Min <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/base/power/main.c | 42 +++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 37 insertions(+), 5 deletions(-)

Acked-by: Greg Kroah-Hartman <[email protected]>