From: Rafael J. Wysocki <[email protected]>

In preparation for subsequent changes, introduce a specialized variant
of async_schedule_dev() that will not invoke the argument function
synchronously when it cannot be scheduled for asynchronous execution.

The new function, async_schedule_dev_nocall(), will be used for fixing
possible deadlocks in the system-wide power management core code.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
 drivers/base/power/main.c |   12 ++++++++----
 include/linux/async.h     |    2 ++
 kernel/async.c            |   29 +++++++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 4 deletions(-)
Index: linux-pm/kernel/async.c
===================================================================
--- linux-pm.orig/kernel/async.c
+++ linux-pm/kernel/async.c
@@ -244,6 +244,35 @@ async_cookie_t async_schedule_node(async
 EXPORT_SYMBOL_GPL(async_schedule_node);
 
 /**
+ * async_schedule_dev_nocall - A simplified variant of async_schedule_dev()
+ * @func: function to execute asynchronously
+ * @dev: device argument to be passed to function
+ *
+ * @dev is used as both the argument for the function and to provide NUMA
+ * context for where to run the function.
+ *
+ * If the asynchronous execution of @func is scheduled successfully, return
+ * true. Otherwise, do nothing and return false, unlike async_schedule_dev()
+ * that will run the function synchronously then.
+ */
+bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
+{
+	struct async_entry *entry;
+
+	entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
+
+	/* Give up if there is no memory or too much work. */
+	if (!entry || atomic_read(&entry_count) > MAX_WORK) {
+		kfree(entry);
+		return false;
+	}
+
+	__async_schedule_node_domain(func, dev, dev_to_node(dev),
+				     &async_dfl_domain, entry);
+	return true;
+}
+
+/**
  * async_synchronize_full - synchronize all asynchronous function calls
  *
  * This function waits until all asynchronous function calls have been done.
Index: linux-pm/include/linux/async.h
===================================================================
--- linux-pm.orig/include/linux/async.h
+++ linux-pm/include/linux/async.h
@@ -90,6 +90,8 @@ async_schedule_dev(async_func_t func, st
 	return async_schedule_node(func, dev, dev_to_node(dev));
 }
 
+bool async_schedule_dev_nocall(async_func_t func, struct device *dev);
+
 /**
  * async_schedule_dev_domain - A device specific version of async_schedule_domain
  * @func: function to execute asynchronously
On Wed, Dec 27, 2023 at 09:38:23PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> In preparation for subsequent changes, introduce a specialized variant
> of async_schedule_dev() that will not invoke the argument function
> synchronously when it cannot be scheduled for asynchronous execution.
>
> The new function, async_schedule_dev_nocall(), will be used for fixing
> possible deadlocks in the system-wide power management core code.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/base/power/main.c | 12 ++++++++----
> include/linux/async.h | 2 ++
> kernel/async.c | 29 +++++++++++++++++++++++++++++
> 3 files changed, 39 insertions(+), 4 deletions(-)
>
> Index: linux-pm/kernel/async.c
> ===================================================================
> --- linux-pm.orig/kernel/async.c
> +++ linux-pm/kernel/async.c
> @@ -244,6 +244,35 @@ async_cookie_t async_schedule_node(async
> EXPORT_SYMBOL_GPL(async_schedule_node);
>
> /**
> + * async_schedule_dev_nocall - A simplified variant of async_schedule_dev()
> + * @func: function to execute asynchronously
> + * @dev: device argument to be passed to function
> + *
> + * @dev is used as both the argument for the function and to provide NUMA
> + * context for where to run the function.
> + *
> + * If the asynchronous execution of @func is scheduled successfully, return
> + * true. Otherwise, do nothing and return false, unlike async_schedule_dev()
> + * that will run the function synchronously then.
> + */
> +bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
> +{
> +	struct async_entry *entry;
> +
> +	entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
Is GFP_KERNEL intended here? I think it's not safe, since this will
be called from device_resume_noirq().
Regards
Stanislaw
On Fri, Dec 29, 2023 at 8:02 AM Stanislaw Gruszka
<[email protected]> wrote:
>
> On Wed, Dec 27, 2023 at 09:38:23PM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > In preparation for subsequent changes, introduce a specialized variant
> > of async_schedule_dev() that will not invoke the argument function
> > synchronously when it cannot be scheduled for asynchronous execution.
> >
> > The new function, async_schedule_dev_nocall(), will be used for fixing
> > possible deadlocks in the system-wide power management core code.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> > drivers/base/power/main.c | 12 ++++++++----
> > include/linux/async.h | 2 ++
> > kernel/async.c | 29 +++++++++++++++++++++++++++++
> > 3 files changed, 39 insertions(+), 4 deletions(-)
> >
> > Index: linux-pm/kernel/async.c
> > ===================================================================
> > --- linux-pm.orig/kernel/async.c
> > +++ linux-pm/kernel/async.c
> > @@ -244,6 +244,35 @@ async_cookie_t async_schedule_node(async
> > EXPORT_SYMBOL_GPL(async_schedule_node);
> >
> > /**
> > + * async_schedule_dev_nocall - A simplified variant of async_schedule_dev()
> > + * @func: function to execute asynchronously
> > + * @dev: device argument to be passed to function
> > + *
> > + * @dev is used as both the argument for the function and to provide NUMA
> > + * context for where to run the function.
> > + *
> > + * If the asynchronous execution of @func is scheduled successfully, return
> > + * true. Otherwise, do nothing and return false, unlike async_schedule_dev()
> > + * that will run the function synchronously then.
> > + */
> > +bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
> > +{
> > +	struct async_entry *entry;
> > +
> > +	entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
>
> Is GFP_KERNEL intended here?
Yes, it is.
PM will be the only user of this, at least for now, and it all runs in
process context.
> I think it's not safe, since this will
> be called from device_resume_noirq().
device_resume_noirq() runs in process context too.
The name is somewhat confusing (sorry about that) and it means that
hardirq handlers (for the majority of IRQs) don't run in that resume
phase, but interrupts are enabled locally on all CPUs (this is
required for wakeup handling, among other things).
On Fri, Dec 29, 2023 at 02:37:36PM +0100, Rafael J. Wysocki wrote:
> > > +bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
> > > +{
> > > +	struct async_entry *entry;
> > > +
> > > +	entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
> >
> > Is GFP_KERNEL intended here?
>
> Yes, it is.
>
> PM will be the only user of this, at least for now, and it all runs in
> process context.
>
> > I think it's not safe, since this will
> > be called from device_resume_noirq().
>
> device_resume_noirq() runs in process context too.
>
> The name is somewhat confusing (sorry about that) and it means that
> hardirq handlers (for the majority of IRQs) don't run in that resume
> phase, but interrupts are enabled locally on all CPUs (this is
> required for wakeup handling, among other things).
Then my concern would be: are there disk devices among the devices
whose IRQs are disabled? It seems there are, and because a GFP_KERNEL
allocation can start reclaiming memory by doing disk I/O (writing out
dirty pages, for example), with the disk driver's interrupts disabled
the reclaim can never finish.
I do not see how such a potentially infinite wait for disk I/O is
prevented here; did I miss something?
Regards
Stanislaw
On Fri, Dec 29, 2023 at 3:54 PM Stanislaw Gruszka
<[email protected]> wrote:
>
> On Fri, Dec 29, 2023 at 02:37:36PM +0100, Rafael J. Wysocki wrote:
> > > > +bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
> > > > +{
> > > > +	struct async_entry *entry;
> > > > +
> > > > +	entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
> > >
> > > Is GFP_KERNEL intended here?
> >
> > Yes, it is.
> >
> > PM will be the only user of this, at least for now, and it all runs in
> > process context.
> >
> > > I think it's not safe, since this will
> > > be called from device_resume_noirq().
> >
> > device_resume_noirq() runs in process context too.
> >
> > The name is somewhat confusing (sorry about that) and it means that
> > hardirq handlers (for the majority of IRQs) don't run in that resume
> > phase, but interrupts are enabled locally on all CPUs (this is
> > required for wakeup handling, among other things).
>
> Then my concern would be: are there disk devices among the devices
> whose IRQs are disabled? It seems there are, and because a GFP_KERNEL
> allocation can start reclaiming memory by doing disk I/O (writing out
> dirty pages, for example), with the disk driver's interrupts disabled
> the reclaim can never finish.
>
> I do not see how such a potentially infinite wait for disk I/O is
> prevented here; did I miss something?
Well, it is not a concern, because the suspend code already prevents
the mm subsystem from trying too hard to find free memory. See the
pm_restrict_gfp_mask() call in enter_state().
Otherwise, it would have been a problem for any GFP_KERNEL allocations
made during system-wide suspend-resume, not just in the _noirq phases.
On Fri, Dec 29, 2023 at 05:36:01PM +0100, Rafael J. Wysocki wrote:
> On Fri, Dec 29, 2023 at 3:54 PM Stanislaw Gruszka
> <[email protected]> wrote:
> >
> > On Fri, Dec 29, 2023 at 02:37:36PM +0100, Rafael J. Wysocki wrote:
> > > > > +bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
> > > > > +{
> > > > > +	struct async_entry *entry;
> > > > > +
> > > > > +	entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
> > > >
> > > > Is GFP_KERNEL intended here?
> > >
> > > Yes, it is.
> > >
> > > PM will be the only user of this, at least for now, and it all runs in
> > > process context.
> > >
> > > > I think it's not safe, since this will
> > > > be called from device_resume_noirq().
> > >
> > > device_resume_noirq() runs in process context too.
> > >
> > > The name is somewhat confusing (sorry about that) and it means that
> > > hardirq handlers (for the majority of IRQs) don't run in that resume
> > > phase, but interrupts are enabled locally on all CPUs (this is
> > > required for wakeup handling, among other things).
> >
> > Then my concern would be: are there disk devices among the devices
> > whose IRQs are disabled? It seems there are, and because a GFP_KERNEL
> > allocation can start reclaiming memory by doing disk I/O (writing out
> > dirty pages, for example), with the disk driver's interrupts disabled
> > the reclaim can never finish.
> >
> > I do not see how such a potentially infinite wait for disk I/O is
> > prevented here; did I miss something?
>
> Well, it is not a concern, because the suspend code already prevents
> the mm subsystem from trying too hard to find free memory. See the
> pm_restrict_gfp_mask() call in enter_state().
That's what I missed :-) Thanks for the explanations.
Reviewed-by: Stanislaw Gruszka <[email protected]> for the series.
On Tue, Jan 2, 2024 at 8:10 AM Stanislaw Gruszka
<[email protected]> wrote:
>
> On Fri, Dec 29, 2023 at 05:36:01PM +0100, Rafael J. Wysocki wrote:
> > On Fri, Dec 29, 2023 at 3:54 PM Stanislaw Gruszka
> > <[email protected]> wrote:
> > >
> > > On Fri, Dec 29, 2023 at 02:37:36PM +0100, Rafael J. Wysocki wrote:
> > > > > > +bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
> > > > > > +{
> > > > > > +	struct async_entry *entry;
> > > > > > +
> > > > > > +	entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
> > > > >
> > > > > Is GFP_KERNEL intended here?
> > > >
> > > > Yes, it is.
> > > >
> > > > PM will be the only user of this, at least for now, and it all runs in
> > > > process context.
> > > >
> > > > > I think it's not safe, since this will
> > > > > be called from device_resume_noirq().
> > > >
> > > > device_resume_noirq() runs in process context too.
> > > >
> > > > The name is somewhat confusing (sorry about that) and it means that
> > > > hardirq handlers (for the majority of IRQs) don't run in that resume
> > > > phase, but interrupts are enabled locally on all CPUs (this is
> > > > required for wakeup handling, among other things).
> > >
> > > Then my concern would be: are there disk devices among the devices
> > > whose IRQs are disabled? It seems there are, and because a GFP_KERNEL
> > > allocation can start reclaiming memory by doing disk I/O (writing out
> > > dirty pages, for example), with the disk driver's interrupts disabled
> > > the reclaim can never finish.
> > >
> > > I do not see how such a potentially infinite wait for disk I/O is
> > > prevented here; did I miss something?
> >
> > Well, it is not a concern, because the suspend code already prevents
> > the mm subsystem from trying too hard to find free memory. See the
> > pm_restrict_gfp_mask() call in enter_state().
>
> That's what I missed :-) Thanks for the explanations.
>
> Reviewed-by: Stanislaw Gruszka <[email protected]> for the series.
Thank you!