2012-02-06 08:14:51

by Lothar Waßmann

Subject: [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever

Hi,

I already sent this to <[email protected]> on Feb. 1, 2012
but did not get any response there. So resending to a wider audience
with improved subject line:

there is a race condition in the threaded IRQ handler code for oneshot
interrupts that may lead to disabling an IRQ indefinitely. IRQs are
masked before calling the hard-irq handler and are unmasked only after
the soft-irq handler has been run. Thus if the hard-irq handler
returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
will not be called, the interrupt will remain masked forever.
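
To make the sequence concrete, here is a minimal user-space sketch in C.
It is not kernel code: the names irq_line, flow_level_oneshot and
irq_thread_done are hypothetical, and only the irqreturn bit values
mirror the kernel's. It models a ONESHOT line that is masked before the
primary handler runs and unmasked only when a woken thread finishes.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit values as the kernel's irqreturn_t. */
enum irqreturn {
	IRQ_NONE        = 0,
	IRQ_HANDLED     = 1 << 0,
	IRQ_WAKE_THREAD = 1 << 1,
};

/* Hypothetical model of one interrupt line; not a kernel structure. */
struct irq_line {
	bool masked;
	bool thread_woken;
};

/* Flow as described above: mask, note the primary handler's result, never unmask. */
void flow_level_oneshot(struct irq_line *line, unsigned int primary_ret)
{
	assert(line);
	line->masked = true;
	if (primary_ret & IRQ_WAKE_THREAD)
		line->thread_woken = true;
	/* ONESHOT: unmasking is left to the thread. */
}

/* Called when a woken thread finishes; does nothing if no thread was woken. */
void irq_thread_done(struct irq_line *line)
{
	assert(line);
	if (!line->thread_woken)
		return;
	line->thread_woken = false;
	line->masked = false;
}
```

In this model, passing IRQ_HANDLED to flow_level_oneshot() leaves
line->masked set with nothing left to clear it, which is the stuck
state described above.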

This can happen due to a short pulse on the interrupt line, that
triggers the interrupt logic, but goes undetected by the hard-irq
handler. The problem can be reproduced with the TSC2007 touch
controller driver that uses ONESHOT interrupts.

The problem arises also with interrupt controllers that latch a level
triggered IRQ until it is acknowledged (like the i.MX28 does).
In this case the IRQ status bit will remain asserted after the
soft-irq finishes and retrigger the interrupt while the interrupt line
is already deasserted.

The following patch would solve the problem, but I'm not sure whether
it's the Right Thing(TM) to do. Especially wrt. shared interrupts.

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 470d08c..93beadb 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -146,6 +146,11 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
 			/* Fall through to add to randomness */
 		case IRQ_HANDLED:
 			random |= action->flags;
+			/* unmask the IRQ that has been left masked
+			 * due to race condition
+			 */
+			if (res == IRQ_HANDLED && (action->flags & IRQF_ONESHOT))
+				unmask_irq(desc);
 			break;
 
 		default:

Best regards,

Lothar Wassmann
--
___________________________________________________________

Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen
Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10
Geschäftsführer: Matthias Kaussen
Handelsregistereintrag: Amtsgericht Aachen, HRB 4996

http://www.karo-electronics.de | [email protected]
___________________________________________________________


2012-02-06 10:41:26

by Lars-Peter Clausen

Subject: Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever

On 02/06/2012 09:14 AM, Lothar Waßmann wrote:
> Hi,
>
> I already sent this to <[email protected]> on Feb. 1, 2012
> but did not get any response there. So resending to a wider audience
> with improved subject line:
>
> there is a race condition in the threaded IRQ handler code for oneshot
> interrupts that may lead to disabling an IRQ indefinitely. IRQs are
> masked before calling the hard-irq handler and are unmasked only after
> the soft-irq handler has been run. Thus if the hard-irq handler
> returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
> will not be called, the interrupt will remain masked forever.
>
> This can happen due to a short pulse on the interrupt line, that
> triggers the interrupt logic, but goes undetected by the hard-irq
> handler. The problem can be reproduced with the TSC2007 touch
> controller driver that uses ONESHOT interrupts.
>
> The problem arises also with interrupt controllers that latch a level
> triggered IRQ until it is acknowledged (like the i.MX28 does).
> In this case the IRQ status bit will remain asserted after the
> soft-irq finishes and retrigger the interrupt while the interrupt line
> is already deasserted.
>
> The following patch would solve the problem, but I'm not sure whether
> it's the Right Thing(TM) to do. Especially wrt. shared interrupts.
>
> diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
> index 470d08c..93beadb 100644
> --- a/kernel/irq/handle.c
> +++ b/kernel/irq/handle.c
> @@ -146,6 +146,11 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
> /* Fall through to add to randomness */
> case IRQ_HANDLED:
> random |= action->flags;
> + /* unmask the IRQ that has been left masked
> + * due to race condition
> + */
> + if (res == IRQ_HANDLED && (action->flags & IRQF_ONESHOT))
> + unmask_irq(desc);
> break;
>
> default:

I think a better fix is to check the return value of handle_irq_event() in
handle_level_irq() and, if the IRQ_WAKE_THREAD bit is not set, unmask the irq.

The same should probably also be done for handle_fasteoi_irq.
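
A minimal user-space sketch of that idea, assuming a simplified model
of the flow handler. The names irq_line and flow_level_fixed are
hypothetical and this is not the kernel's handle_level_irq(); only the
irqreturn bit values match the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit values as the kernel's irqreturn_t. */
enum irqreturn {
	IRQ_NONE        = 0,
	IRQ_HANDLED     = 1 << 0,
	IRQ_WAKE_THREAD = 1 << 1,
};

/* Hypothetical model of one interrupt line; not a kernel structure. */
struct irq_line {
	bool masked;
	bool oneshot;
};

/*
 * event_ret stands in for the value returned by handle_irq_event().
 * The line stays masked only when it is ONESHOT and a thread was
 * actually woken; in every other case it is unmasked right away.
 */
void flow_level_fixed(struct irq_line *line, unsigned int event_ret)
{
	assert(line);
	line->masked = true;
	if (!(line->oneshot && (event_ret & IRQ_WAKE_THREAD)))
		line->masked = false;
}
```

With this check, a ONESHOT line whose handler returns only IRQ_HANDLED
is unmasked by the flow handler itself, since no thread will do it.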

2012-02-07 09:03:28

by Yong Zhang

Subject: Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever

On Mon, Feb 06, 2012 at 09:14:47AM +0100, Lothar Waßmann wrote:
> Hi,
>
> I already sent this to <[email protected]> on Feb. 1, 2012
> but did not get any response there. So resending to a wider audience
> with improved subject line:
>
> there is a race condition in the threaded IRQ handler code for oneshot
> interrupts that may lead to disabling an IRQ indefinitely. IRQs are
> masked before calling the hard-irq handler and are unmasked only after
> the soft-irq handler has been run. Thus if the hard-irq handler
> returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
> will not be called, the interrupt will remain masked forever.
>
> This can happen due to a short pulse on the interrupt line, that
> triggers the interrupt logic, but goes undetected by the hard-irq
> handler. The problem can be reproduced with the TSC2007 touch
> controller driver that uses ONESHOT interrupts.

Isn't it the responsibility of the driver (say TSC2007)?

In this case, TSC2007 should return IRQ_WAKE_THREAD IMHO.

Thanks,
Yong


>
> The problem arises also with interrupt controllers that latch a level
> triggered IRQ until it is acknowledged (like the i.MX28 does).
> In this case the IRQ status bit will remain asserted after the
> soft-irq finishes and retrigger the interrupt while the interrupt line
> is already deasserted.
>
> The following patch would solve the problem, but I'm not sure whether
> it's the Right Thing(TM) to do. Especially wrt. shared interrupts.
>
> diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
> index 470d08c..93beadb 100644
> --- a/kernel/irq/handle.c
> +++ b/kernel/irq/handle.c
> @@ -146,6 +146,11 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
> /* Fall through to add to randomness */
> case IRQ_HANDLED:
> random |= action->flags;
> + /* unmask the IRQ that has been left masked
> + * due to race condition
> + */
> + if (res == IRQ_HANDLED && (action->flags & IRQF_ONESHOT))
> + unmask_irq(desc);
> break;
>
> default:

--
Only stand for myself

2012-02-07 10:01:11

by Lothar Waßmann

Subject: Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever

Hi,

> On Mon, Feb 06, 2012 at 09:14:47AM +0100, Lothar Waßmann wrote:
> > Hi,
> >
> > I already sent this to <[email protected]> on Feb. 1, 2012
> > but did not get any response there. So resending to a wider audience
> > with improved subject line:
> >
> > there is a race condition in the threaded IRQ handler code for oneshot
> > interrupts that may lead to disabling an IRQ indefinitely. IRQs are
> > masked before calling the hard-irq handler and are unmasked only after
> > the soft-irq handler has been run. Thus if the hard-irq handler
> > returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
> > will not be called, the interrupt will remain masked forever.
> >
> > This can happen due to a short pulse on the interrupt line, that
> > triggers the interrupt logic, but goes undetected by the hard-irq
> > handler. The problem can be reproduced with the TSC2007 touch
> > controller driver that uses ONESHOT interrupts.
>
> Isn't it the responsibility of the driver (say TSC2007)?
>
> In this case, TSC2007 should return IRQ_WAKE_THREAD IMHO.
>
That would mean it had to return IRQ_WAKE_THREAD unconditionally
making the return code useless.
And it would cause an extra useless loop through the softirq
handler.


Lothar Waßmann

2012-02-07 12:35:07

by Yong Zhang

Subject: Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever

On Tue, Feb 07, 2012 at 11:01:06AM +0100, Lothar Waßmann wrote:
> Hi,
>
> > On Mon, Feb 06, 2012 at 09:14:47AM +0100, Lothar Waßmann wrote:
> > > Hi,
> > >
> > > I already sent this to <[email protected]> on Feb. 1, 2012
> > > but did not get any response there. So resending to a wider audience
> > > with improved subject line:
> > >
> > > there is a race condition in the threaded IRQ handler code for oneshot
> > > interrupts that may lead to disabling an IRQ indefinitely. IRQs are
> > > masked before calling the hard-irq handler and are unmasked only after
> > > the soft-irq handler has been run. Thus if the hard-irq handler
> > > returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
> > > will not be called, the interrupt will remain masked forever.
> > >
> > > This can happen due to a short pulse on the interrupt line, that
> > > triggers the interrupt logic, but goes undetected by the hard-irq
> > > handler. The problem can be reproduced with the TSC2007 touch
> > > controller driver that uses ONESHOT interrupts.
> >
> > Isn't it the responsibility of the driver (say TSC2007)?
> >
> > In this case, TSC2007 should return IRQ_WAKE_THREAD IMHO.
> >
> That would mean it had to return IRQ_WAKE_THREAD unconditionally
> making the return code useless.
> And it would cause an extra useless loop through the softirq
> handler.

Yeah, it's the default behavior when we introduce 'threadirqs',
and it's safe.

You know, in your patch unmask_irq() is called locklessly, and
that will introduce another race.

Thanks,
Yong

2012-02-07 12:52:22

by Lothar Waßmann

Subject: Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever

Hi,

Yong Zhang writes:
> On Tue, Feb 07, 2012 at 11:01:06AM +0100, Lothar Waßmann wrote:
> > Hi,
> >
> > > On Mon, Feb 06, 2012 at 09:14:47AM +0100, Lothar Waßmann wrote:
> > > > Hi,
> > > >
> > > > I already sent this to <[email protected]> on Feb. 1, 2012
> > > > but did not get any response there. So resending to a wider audience
> > > > with improved subject line:
> > > >
> > > > there is a race condition in the threaded IRQ handler code for oneshot
> > > > interrupts that may lead to disabling an IRQ indefinitely. IRQs are
> > > > masked before calling the hard-irq handler and are unmasked only after
> > > > the soft-irq handler has been run. Thus if the hard-irq handler
> > > > returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
> > > > will not be called, the interrupt will remain masked forever.
> > > >
> > > > This can happen due to a short pulse on the interrupt line, that
> > > > triggers the interrupt logic, but goes undetected by the hard-irq
> > > > handler. The problem can be reproduced with the TSC2007 touch
> > > > controller driver that uses ONESHOT interrupts.
> > >
> > > Isn't it the responsibility of the driver (say TSC2007)?
> > >
> > > In this case, TSC2007 should return IRQ_WAKE_THREAD IMHO.
> > >
> > That would mean it had to return IRQ_WAKE_THREAD unconditionally
> > making the return code useless.
> > And it would cause an extra useless loop through the softirq
> > handler.
>
> Yeah, it's the default behavior when we introduce 'threadirqs',
> and it's safe.
>
So, the correct solution would be to remove the check for
IRQ_WAKE_THREAD in handle_irq_event_percpu() and always invoke the
softirq handler?
Note that this problem is not specific to the TSC2007 driver, but may
occur with any hardware.

Or maybe do the unmasking in handle_level_irq(), based on the return
value of handle_irq_event(), as proposed by Lars-Peter Clausen in
<[email protected]>?
Like this:
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index f7c543a..fbf68c7 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -343,6 +343,8 @@ EXPORT_SYMBOL_GPL(handle_simple_irq);
 void
 handle_level_irq(unsigned int irq, struct irq_desc *desc)
 {
+	int ret;
+
 	raw_spin_lock(&desc->lock);
 	mask_ack_irq(desc);
 
@@ -360,10 +362,12 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc)
 	if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data)))
 		goto out_unlock;
 
-	handle_irq_event(desc);
+	ret = handle_irq_event(desc);
 
-	if (!irqd_irq_disabled(&desc->irq_data) && !(desc->istate & IRQS_ONESHOT))
+	if (!irqd_irq_disabled(&desc->irq_data) &&
+	    (!(desc->istate & IRQS_ONESHOT) || ret != IRQ_WAKE_THREAD))
 		unmask_irq(desc);
+
 out_unlock:
 	raw_spin_unlock(&desc->lock);
 }


Lothar Waßmann

2012-02-07 13:06:37

by Lars-Peter Clausen

Subject: Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever

On 02/07/2012 01:52 PM, Lothar Waßmann wrote:
> Hi,
>
> Yong Zhang writes:
>> On Tue, Feb 07, 2012 at 11:01:06AM +0100, Lothar Waßmann wrote:
>>> Hi,
>>>
>>>> On Mon, Feb 06, 2012 at 09:14:47AM +0100, Lothar Waßmann wrote:
>>>>> Hi,
>>>>>
>>>>> I already sent this to <[email protected]> on Feb. 1, 2012
>>>>> but did not get any response there. So resending to a wider audience
>>>>> with improved subject line:
>>>>>
>>>>> there is a race condition in the threaded IRQ handler code for oneshot
>>>>> interrupts that may lead to disabling an IRQ indefinitely. IRQs are
>>>>> masked before calling the hard-irq handler and are unmasked only after
>>>>> the soft-irq handler has been run. Thus if the hard-irq handler
>>>>> returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
>>>>> will not be called, the interrupt will remain masked forever.
>>>>>
>>>>> This can happen due to a short pulse on the interrupt line, that
>>>>> triggers the interrupt logic, but goes undetected by the hard-irq
>>>>> handler. The problem can be reproduced with the TSC2007 touch
>>>>> controller driver that uses ONESHOT interrupts.
>>>>
>>>> Isn't it the responsibility of the driver (say TSC2007)?
>>>>
>>>> In this case, TSC2007 should return IRQ_WAKE_THREAD IMHO.
>>>>
>>> That would mean it had to return IRQ_WAKE_THREAD unconditionally
>>> making the return code useless.
>>> And it would cause an extra useless loop through the softirq
>>> handler.
>>
>> Yeah, it's the default behavior when we introduce 'threadirqs',
>> and it's safe.
>>
> So, the correct solution would be to remove the check for
> IRQ_WAKE_THREAD in handle_irq_event_percpu() and always invoke the
> softirq handler?
> Note that this problem is not specific to the TSC2007 driver, but may
> occur with any hardware.
>
> Or maybe do the unmasking in handle_level_irq(), based on the return
> value of handle_irq_event(), as proposed by Lars-Peter Clausen in
> <[email protected]>?
> Like this:
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index f7c543a..fbf68c7 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -343,6 +343,8 @@ EXPORT_SYMBOL_GPL(handle_simple_irq);
> void
> handle_level_irq(unsigned int irq, struct irq_desc *desc)
> {
> + int ret;

This should be irqreturn_t

> +
> raw_spin_lock(&desc->lock);
> mask_ack_irq(desc);
>
> @@ -360,10 +362,12 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc)
> if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data)))
> goto out_unlock;
>
> - handle_irq_event(desc);
> + ret = handle_irq_event(desc);
>
> - if (!irqd_irq_disabled(&desc->irq_data) && !(desc->istate & IRQS_ONESHOT))
> + if (!irqd_irq_disabled(&desc->irq_data) &&
> + (!(desc->istate & IRQS_ONESHOT) || ret != IRQ_WAKE_THREAD))

As I said, check for the bit, not for the value. This will ensure that it
also works with shared interrupts. So something like this:

!((desc->istate & IRQS_ONESHOT) && (ret & IRQ_WAKE_THREAD)))

> unmask_irq(desc);
> +
> out_unlock:
> raw_spin_unlock(&desc->lock);
> }
>
>
> Lothar Waßmann
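
The difference between the two checks shows up on a shared line, where
the results of all handlers are ORed together, so the combined value can
be IRQ_HANDLED | IRQ_WAKE_THREAD. The following is a minimal user-space
sketch in C, not kernel code: unmask_by_value, unmask_by_bit and
check_shared_case are hypothetical names, and only the irqreturn bit
values mirror the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit values as the kernel's irqreturn_t. */
enum irqreturn {
	IRQ_NONE        = 0,
	IRQ_HANDLED     = 1 << 0,
	IRQ_WAKE_THREAD = 1 << 1,
};

/* Comparison by value, as in the draft patch. */
bool unmask_by_value(bool oneshot, unsigned int ret)
{
	return !oneshot || ret != IRQ_WAKE_THREAD;
}

/* Bit test, as suggested in the review comment. */
bool unmask_by_bit(bool oneshot, unsigned int ret)
{
	return !(oneshot && (ret & IRQ_WAKE_THREAD));
}

/* Shared line: one handler returned IRQ_HANDLED, another woke its thread. */
void check_shared_case(void)
{
	unsigned int ret = IRQ_HANDLED | IRQ_WAKE_THREAD;

	/* The value comparison unmasks although a thread is still pending. */
	assert(unmask_by_value(true, ret));
	/* The bit test keeps the line masked until the thread has run. */
	assert(!unmask_by_bit(true, ret));
}
```

Both checks agree when only one handler is installed; they differ only
for the combined return value.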