2022-04-21 10:57:58

by Pingfan Liu

Subject: Re: [PATCH 6/9] pm/irq: make for_each_irq_desc() safe of irq_desc release

On Wed, Apr 20, 2022 at 06:23:48PM +0200, Rafael J. Wysocki wrote:
> On Wed, Apr 20, 2022 at 4:06 PM Pingfan Liu <[email protected]> wrote:
> >
> > The involved context is not an RCU read-side critical section, and more
> > than one task may be running at this point. Hence a measure is needed to
> > prevent irq_desc from being freed. Use irq_lock_sparse() for that
> > purpose.
>
> Can you please describe an example scenario in which the added locking
> will prevent a failure from occurring?
>

Sorry, I forgot to mention that this is based on code analysis.

Suppose the following scenario, which involves two threads:
threadA "hibernate" runs suspend_device_irqs()
threadB "rcu_cpu_kthread" runs rcu_core()->rcu_do_batch(), which releases
an object, let's say an irq_desc

Zoom in:
threadA                                       threadB
for_each_irq_desc(irq, desc) {
    get irq_descA, which is being freed
                                              --->preempted by rcu_core()->rcu_do_batch(),
                                                  which releases irq_descA
    raw_spin_lock_irqsave(&desc->lock, flags);
    //Oops: irq_descA has already been freed

In the code in question, threadA runs in a preemptible context, and more
than one thread may be running at this stage, so the preemption can
happen.
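
The idea of holding a lock across the whole walk can be sketched in
userspace C. This is only an illustrative analog, not kernel code: the
names struct desc, table, sparse_lock, alloc_desc(), free_desc() and
suspend_all() are hypothetical, pthread mutexes stand in for the kernel
locks, and it assumes that the free path takes the same lock that the
walker takes (the role played by irq_lock_sparse() in the patch).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_DESCS 4

/* Hypothetical stand-in for struct irq_desc. */
struct desc {
	pthread_mutex_t lock;	/* stands in for desc->lock */
	int suspended;
};

/* Stands in for the lock taken by irq_lock_sparse() (assumption). */
static pthread_mutex_t sparse_lock = PTHREAD_MUTEX_INITIALIZER;
static struct desc *table[NR_DESCS];

/* Publish a new descriptor in an empty slot; returns 0 on success. */
static int alloc_desc(int idx)
{
	struct desc *d;
	int ret = -1;

	assert(idx >= 0 && idx < NR_DESCS);
	d = calloc(1, sizeof(*d));
	if (!d)
		return -1;
	pthread_mutex_init(&d->lock, NULL);

	pthread_mutex_lock(&sparse_lock);
	if (!table[idx]) {
		table[idx] = d;
		ret = 0;
	}
	pthread_mutex_unlock(&sparse_lock);

	if (ret) {		/* slot was occupied, drop the new object */
		pthread_mutex_destroy(&d->lock);
		free(d);
	}
	return ret;
}

/* Unpublish and free a descriptor while holding sparse_lock. */
static void free_desc(int idx)
{
	struct desc *d;

	assert(idx >= 0 && idx < NR_DESCS);
	pthread_mutex_lock(&sparse_lock);
	d = table[idx];
	table[idx] = NULL;
	if (d) {
		pthread_mutex_destroy(&d->lock);
		free(d);
	}
	pthread_mutex_unlock(&sparse_lock);
}

/*
 * Walk all descriptors with sparse_lock held, so free_desc() cannot
 * release a descriptor between the lookup and taking d->lock.
 * Returns the number of descriptors visited.
 */
static int suspend_all(void)
{
	int i, n = 0;

	pthread_mutex_lock(&sparse_lock);
	for (i = 0; i < NR_DESCS; i++) {
		struct desc *d = table[i];

		if (!d)
			continue;
		pthread_mutex_lock(&d->lock);
		d->suspended = 1;
		pthread_mutex_unlock(&d->lock);
		n++;
	}
	pthread_mutex_unlock(&sparse_lock);
	return n;
}
```

Without sparse_lock in suspend_all(), a concurrent free_desc() could
free d after it was read from table[] but before d->lock is taken, which
is the window described in the scenario above.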


Thanks,

Pingfan


> > Signed-off-by: Pingfan Liu <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: "Rafael J. Wysocki" <[email protected]>
> > To: [email protected]
> > ---
> > kernel/irq/pm.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/irq/pm.c b/kernel/irq/pm.c
> > index ca71123a6130..4b67a4c7de3c 100644
> > --- a/kernel/irq/pm.c
> > +++ b/kernel/irq/pm.c
> > @@ -133,6 +133,7 @@ void suspend_device_irqs(void)
> >  	struct irq_desc *desc;
> >  	int irq;
> >
> > +	irq_lock_sparse();
> >  	for_each_irq_desc(irq, desc) {
> >  		unsigned long flags;
> >  		bool sync;
> > @@ -146,6 +147,7 @@ void suspend_device_irqs(void)
> >  		if (sync)
> >  			synchronize_irq(irq);
> >  	}
> > +	irq_unlock_sparse();
> >  }
> >  EXPORT_SYMBOL_GPL(suspend_device_irqs);
> >
> > @@ -186,6 +188,7 @@ static void resume_irqs(bool want_early)
> >  	struct irq_desc *desc;
> >  	int irq;
> >
> > +	/* The early resume stage is free of irq_desc release */
> >  	for_each_irq_desc(irq, desc) {
> >  		unsigned long flags;
> >  		bool is_early = desc->action &&
> > --
> > 2.31.1
> >


2022-04-22 11:50:00

by Rafael J. Wysocki

Subject: Re: [PATCH 6/9] pm/irq: make for_each_irq_desc() safe of irq_desc release

On Thu, Apr 21, 2022 at 5:31 AM Pingfan Liu <[email protected]> wrote:
>
> On Wed, Apr 20, 2022 at 06:23:48PM +0200, Rafael J. Wysocki wrote:
> > On Wed, Apr 20, 2022 at 4:06 PM Pingfan Liu <[email protected]> wrote:
> > >
> > > > The involved context is not an RCU read-side critical section, and more
> > > > than one task may be running at this point. Hence a measure is needed to
> > > > prevent irq_desc from being freed. Use irq_lock_sparse() for that
> > > > purpose.
> >
> > Can you please describe an example scenario in which the added locking
> > will prevent a failure from occurring?
> >
>
> Sorry, I forgot to mention that this is based on code analysis.
>
> Suppose the following scenario, which involves two threads:
> threadA "hibernate" runs suspend_device_irqs()
> threadB "rcu_cpu_kthread" runs rcu_core()->rcu_do_batch(), which releases
> an object, let's say an irq_desc
>
> Zoom in:
> threadA                                       threadB
> for_each_irq_desc(irq, desc) {
>     get irq_descA, which is being freed
>                                               --->preempted by rcu_core()->rcu_do_batch(),
>                                                   which releases irq_descA
>     raw_spin_lock_irqsave(&desc->lock, flags);
>     //Oops: irq_descA has already been freed
>
> In the code in question, threadA runs in a preemptible context, and more
> than one thread may be running at this stage, so the preemption can
> happen.

Well, I'm still not sure that this can ever trigger in practice, but I
guess the locking can be added for extra safety.

Anyway, the above information should go into the changelog IMO.

That said ->

> > > Signed-off-by: Pingfan Liu <[email protected]>
> > > Cc: Thomas Gleixner <[email protected]>
> > > Cc: "Rafael J. Wysocki" <[email protected]>
> > > To: [email protected]
> > > ---
> > > kernel/irq/pm.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/kernel/irq/pm.c b/kernel/irq/pm.c
> > > index ca71123a6130..4b67a4c7de3c 100644
> > > --- a/kernel/irq/pm.c
> > > +++ b/kernel/irq/pm.c
> > > @@ -133,6 +133,7 @@ void suspend_device_irqs(void)
> > >  	struct irq_desc *desc;
> > >  	int irq;
> > >
> > > +	irq_lock_sparse();
> > >  	for_each_irq_desc(irq, desc) {
> > >  		unsigned long flags;
> > >  		bool sync;
> > > @@ -146,6 +147,7 @@ void suspend_device_irqs(void)
> > >  		if (sync)
> > >  			synchronize_irq(irq);

-> is it entirely safe to call synchronize_irq() under irq_lock_sparse?

> > >  	}
> > > +	irq_unlock_sparse();
> > >  }
> > >  EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > >
> > > @@ -186,6 +188,7 @@ static void resume_irqs(bool want_early)
> > >  	struct irq_desc *desc;
> > >  	int irq;
> > >
> > > +	/* The early resume stage is free of irq_desc release */
> > >  	for_each_irq_desc(irq, desc) {
> > >  		unsigned long flags;
> > >  		bool is_early = desc->action &&
> > > --
> > > 2.31.1
> > >

2022-04-22 20:58:12

by Pingfan Liu

Subject: Re: [PATCH 6/9] pm/irq: make for_each_irq_desc() safe of irq_desc release

On Thu, Apr 21, 2022 at 12:57:28PM +0200, Rafael J. Wysocki wrote:
> On Thu, Apr 21, 2022 at 5:31 AM Pingfan Liu <[email protected]> wrote:
> >
> > On Wed, Apr 20, 2022 at 06:23:48PM +0200, Rafael J. Wysocki wrote:
> > > On Wed, Apr 20, 2022 at 4:06 PM Pingfan Liu <[email protected]> wrote:
> > > >
> > > > > The involved context is not an RCU read-side critical section, and more
> > > > > than one task may be running at this point. Hence a measure is needed to
> > > > > prevent irq_desc from being freed. Use irq_lock_sparse() for that
> > > > > purpose.
> > >
> > > Can you please describe an example scenario in which the added locking
> > > will prevent a failure from occurring?
> > >
> >
> > Sorry, I forgot to mention that this is based on code analysis.
> >
> > Suppose the following scenario, which involves two threads:
> > threadA "hibernate" runs suspend_device_irqs()
> > threadB "rcu_cpu_kthread" runs rcu_core()->rcu_do_batch(), which releases
> > an object, let's say an irq_desc
> >
> > Zoom in:
> > threadA                                       threadB
> > for_each_irq_desc(irq, desc) {
> >     get irq_descA, which is being freed
> >                                               --->preempted by rcu_core()->rcu_do_batch(),
> >                                                   which releases irq_descA
> >     raw_spin_lock_irqsave(&desc->lock, flags);
> >     //Oops: irq_descA has already been freed
> >
> > In the code in question, threadA runs in a preemptible context, and more
> > than one thread may be running at this stage, so the preemption can
> > happen.
>
> Well, I'm still not sure that this can ever trigger in practice, but I

Yes, I also think it can hardly happen. I went through all accesses to
irq_desc in the kernel and just want everything to follow the rule
consistently.
> guess the locking can be added for extra safety.
>
> Anyway, the above information should go into the changelog IMO.
>

OK, I will update it in V2.
> That said ->
>
> > > > Signed-off-by: Pingfan Liu <[email protected]>
> > > > Cc: Thomas Gleixner <[email protected]>
> > > > Cc: "Rafael J. Wysocki" <[email protected]>
> > > > To: [email protected]
> > > > ---
> > > > kernel/irq/pm.c | 3 +++
> > > > 1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/kernel/irq/pm.c b/kernel/irq/pm.c
> > > > index ca71123a6130..4b67a4c7de3c 100644
> > > > --- a/kernel/irq/pm.c
> > > > +++ b/kernel/irq/pm.c
> > > > @@ -133,6 +133,7 @@ void suspend_device_irqs(void)
> > > >  	struct irq_desc *desc;
> > > >  	int irq;
> > > >
> > > > +	irq_lock_sparse();
> > > >  	for_each_irq_desc(irq, desc) {
> > > >  		unsigned long flags;
> > > >  		bool sync;
> > > > @@ -146,6 +147,7 @@ void suspend_device_irqs(void)
> > > >  		if (sync)
> > > >  			synchronize_irq(irq);
>
> -> is it entirely safe to call synchronize_irq() under irq_lock_sparse?

synchronize_irq() waits for pending IRQ handlers (on other CPUs). It
only takes irq_desc->lock and has no connection with the irq sparse tree
or bitmap. I cannot see any deadlock issue. Or am I missing something?

Thanks for your time.

Regards,

Pingfan
>
> > > >  	}
> > > > +	irq_unlock_sparse();
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > > >
> > > > @@ -186,6 +188,7 @@ static void resume_irqs(bool want_early)
> > > >  	struct irq_desc *desc;
> > > >  	int irq;
> > > >
> > > > +	/* The early resume stage is free of irq_desc release */
> > > >  	for_each_irq_desc(irq, desc) {
> > > >  		unsigned long flags;
> > > >  		bool is_early = desc->action &&
> > > > --
> > > > 2.31.1
> > > >