2015-08-25 19:59:04

by Felipe Balbi

[permalink] [raw]
Subject: CONFIG_DEBUG_SHIRQ and PM

Hi Ingo,

I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
devm_request_*irq().

If we using devm_request_*irq(), that irq will be freed after device
drivers' ->remove() gets called. If on ->remove(), we're calling
pm_runtime_put_sync(); pm_runtime_disable(), device's clocks might get
gated and, because we do an extra call to the device's IRQ handler when
CONFIG_DEBUG_SHIRQ=y, we might trigger an abort exception if, inside the
IRQ handler, we try to read a register which is clocked by the device's
clock.

This is, of course, really old code which has been in tree for many,
many years. I guess nobody has been running their tests in the setup
mentioned above (CONFIG_DEBUG_SHIRQ=y, pm_runtime_put_sync() on
->remove(), a register read on IRQ handler, and a shared IRQ handler),
so that's why we never caught this before.

Disabling CONFIG_DEBUG_SHIRQ, of course, makes the problem go away, but
if driver *must* be ready to receive, and handle, an IRQ even during
module removal, I wonder what the IRQ handler should do. We can't, in
most cases, call pm_runtime_put_sync() from IRQ handler.

I'm guessing the only way would be to move pm_runtime calls into the bus
driver (in this case, the platform_bus) and make sure it only gets
called after device managed resources are freed ?

Any hints would be greatly appreciated.

cheers

--
balbi


Attachments:
(No filename) (1.34 kB)
signature.asc (819.00 B)
Digital signature
Download all attachments

2015-08-26 19:29:56

by Ezequiel Garcia

[permalink] [raw]
Subject: Re: CONFIG_DEBUG_SHIRQ and PM

Felipe,

On 25 August 2015 at 16:58, Felipe Balbi <[email protected]> wrote:
> Hi Ingo,
>
> I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
> devm_request_*irq().
>

I may be jumping on the gun here, but I believe here's your problem.
Using devm_request_irq with shared IRQs is not a good idea.

Or at least, it forces you to handle interrupts after your device
is _completely_ removed (e.g. your IRQ cookie could be bogus).

AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple
spurious interrupts, that are expected to happen anyway.

Your IRQ is shared, and so you may get any IRQ at any time,
coming from another device (not yours).

So, if I'm right, my suggestion is simple: use request_irq/free_irq
and free your IRQ before you disable your clocks, remove your device, etc.

> If we using devm_request_*irq(), that irq will be freed after device
> drivers' ->remove() gets called. If on ->remove(), we're calling
> pm_runtime_put_sync(); pm_runtime_disable(), device's clocks might get
> gated and, because we do an extra call to the device's IRQ handler when
> CONFIG_DEBUG_SHIRQ=y, we might trigger an abort exception if, inside the
> IRQ handler, we try to read a register which is clocked by the device's
> clock.
>
> This is, of course, really old code which has been in tree for many,
> many years. I guess nobody has been running their tests in the setup
> mentioned above (CONFIG_DEBUG_SHIRQ=y, pm_runtime_put_sync() on
> ->remove(), a register read on IRQ handler, and a shared IRQ handler),
> so that's why we never caught this before.
>
> Disabling CONFIG_DEBUG_SHIRQ, of course, makes the problem go away, but
> if driver *must* be ready to receive, and handle, an IRQ even during
> module removal, I wonder what the IRQ handler should do. We can't, in
> most cases, call pm_runtime_put_sync() from IRQ handler.
>
> I'm guessing the only way would be to move pm_runtime calls into the bus
> driver (in this case, the platform_bus) and make sure it only gets
> called after device managed resources are freed ?
>
> Any hints would be greatly appreciated.
>

Hope it helps!
--
Ezequiel García, VanguardiaSur
http://www.vanguardiasur.com.ar

2015-08-26 19:38:53

by Felipe Balbi

[permalink] [raw]
Subject: Re: CONFIG_DEBUG_SHIRQ and PM

Hi,

On Wed, Aug 26, 2015 at 04:29:52PM -0300, Ezequiel Garcia wrote:
> Felipe,
>
> On 25 August 2015 at 16:58, Felipe Balbi <[email protected]> wrote:
> > Hi Ingo,
> >
> > I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
> > devm_request_*irq().
> >
>
> I may be jumping on the gun here, but I believe here's your problem.
> Using devm_request_irq with shared IRQs is not a good idea.
>
> Or at least, it forces you to handle interrupts after your device
> is _completely_ removed (e.g. your IRQ cookie could be bogus).
>
> AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple
> spurious interrupts, that are expected to happen anyway.
>
> Your IRQ is shared, and so you may get any IRQ at any time,
> coming from another device (not yours).
>
> So, if I'm right, my suggestion is simple: use request_irq/free_irq
> and free your IRQ before you disable your clocks, remove your device,
> etc.

yeah, that's just a workaround though. Specially with IRQ flags coming
from DT, driver might have no knowledge that its IRQ is shared to start
with.

Besides, removing devm_* is just a workaround to the real problem. It
seems, to me at least, that drivers shouldn't be calling
pm_runtime_put_sync(); pm_runtime_disable() from their ->remove(),
rather the bus driver should be responsible for doing so; much
usb_bus_type and pci_bus_type do. Of course, this has the side effect of
requiring buses to make sure that by the time ->probe() is called, that
device's clocks are stable and running and the device is active.

However, that's not done today for the platform_bus_type and, frankly,
that would be somewhat of a complex problem to solve, considering that
different SoCs integrate IPs the way they want.

Personally, I think removing devm_* is but a workaround to the real
thing we're dealing with.

--
balbi


Attachments:
(No filename) (1.80 kB)
signature.asc (819.00 B)
Digital signature
Download all attachments

2015-08-26 19:53:31

by Ezequiel Garcia

[permalink] [raw]
Subject: Re: CONFIG_DEBUG_SHIRQ and PM

On 26 August 2015 at 16:38, Felipe Balbi <[email protected]> wrote:
> Hi,
>
> On Wed, Aug 26, 2015 at 04:29:52PM -0300, Ezequiel Garcia wrote:
>> Felipe,
>>
>> On 25 August 2015 at 16:58, Felipe Balbi <[email protected]> wrote:
>> > Hi Ingo,
>> >
>> > I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
>> > devm_request_*irq().
>> >
>>
>> I may be jumping on the gun here, but I believe here's your problem.
>> Using devm_request_irq with shared IRQs is not a good idea.
>>
>> Or at least, it forces you to handle interrupts after your device
>> is _completely_ removed (e.g. your IRQ cookie could be bogus).
>>
>> AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple
>> spurious interrupts, that are expected to happen anyway.
>>
>> Your IRQ is shared, and so you may get any IRQ at any time,
>> coming from another device (not yours).
>>
>> So, if I'm right, my suggestion is simple: use request_irq/free_irq
>> and free your IRQ before you disable your clocks, remove your device,
>> etc.
>
> yeah, that's just a workaround though. Specially with IRQ flags coming
> from DT, driver might have no knowledge that its IRQ is shared to start
> with.
>

Really? Is there any way the DT can set the IRQF_SHARED flag?
The caller of devm_request_irq / request_irq needs to pass the irqflags,
so I'd say the driver is perfectly aware of the IRQ being shared or not.

Maybe you can clarify what I'm missing here.

> Besides, removing devm_* is just a workaround to the real problem. It
> seems, to me at least, that drivers shouldn't be calling
> pm_runtime_put_sync(); pm_runtime_disable() from their ->remove(),
> rather the bus driver should be responsible for doing so; much
> usb_bus_type and pci_bus_type do. Of course, this has the side effect of
> requiring buses to make sure that by the time ->probe() is called, that
> device's clocks are stable and running and the device is active.
>
> However, that's not done today for the platform_bus_type and, frankly,
> that would be somewhat of a complex problem to solve, considering that
> different SoCs integrate IPs the way they want.
>
> Personally, I think removing devm_* is but a workaround to the real
> thing we're dealing with.
>

I don't see any problems here: if your interrupt is shared, then you must
be prepared to handle it any time, coming from any sources (not only
your device). And CONFIG_DEBUG_SHIRQ does exactly that, in order to
make sure all the drivers passing IRQF_SHARED comply with that rule.

So you either avoid using devm_request_irq, or you prepare your handler
accordingly to be ready to handle an interrupt _any time_.

--
Ezequiel García, VanguardiaSur
http://www.vanguardiasur.com.ar

2015-08-26 20:03:34

by Felipe Balbi

[permalink] [raw]
Subject: Re: CONFIG_DEBUG_SHIRQ and PM

On Wed, Aug 26, 2015 at 04:53:27PM -0300, Ezequiel Garcia wrote:
> On 26 August 2015 at 16:38, Felipe Balbi <[email protected]> wrote:
> > Hi,
> >
> > On Wed, Aug 26, 2015 at 04:29:52PM -0300, Ezequiel Garcia wrote:
> >> Felipe,
> >>
> >> On 25 August 2015 at 16:58, Felipe Balbi <[email protected]> wrote:
> >> > Hi Ingo,
> >> >
> >> > I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
> >> > devm_request_*irq().
> >> >
> >>
> >> I may be jumping on the gun here, but I believe here's your problem.
> >> Using devm_request_irq with shared IRQs is not a good idea.
> >>
> >> Or at least, it forces you to handle interrupts after your device
> >> is _completely_ removed (e.g. your IRQ cookie could be bogus).
> >>
> >> AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple
> >> spurious interrupts, that are expected to happen anyway.
> >>
> >> Your IRQ is shared, and so you may get any IRQ at any time,
> >> coming from another device (not yours).
> >>
> >> So, if I'm right, my suggestion is simple: use request_irq/free_irq
> >> and free your IRQ before you disable your clocks, remove your device,
> >> etc.
> >
> > yeah, that's just a workaround though. Specially with IRQ flags coming
> > from DT, driver might have no knowledge that its IRQ is shared to start
> > with.
> >
>
> Really? Is there any way the DT can set the IRQF_SHARED flag?
> The caller of devm_request_irq / request_irq needs to pass the irqflags,
> so I'd say the driver is perfectly aware of the IRQ being shared or not.
>
> Maybe you can clarify what I'm missing here.

hmm, that's true. Now that I look at it, DT can pass triggering flags.

> > Besides, removing devm_* is just a workaround to the real problem. It
> > seems, to me at least, that drivers shouldn't be calling
> > pm_runtime_put_sync(); pm_runtime_disable() from their ->remove(),
> > rather the bus driver should be responsible for doing so; much
> > usb_bus_type and pci_bus_type do. Of course, this has the side effect of
> > requiring buses to make sure that by the time ->probe() is called, that
> > device's clocks are stable and running and the device is active.
> >
> > However, that's not done today for the platform_bus_type and, frankly,
> > that would be somewhat of a complex problem to solve, considering that
> > different SoCs integrate IPs the way they want.
> >
> > Personally, I think removing devm_* is but a workaround to the real
> > thing we're dealing with.
> >
>
> I don't see any problems here: if your interrupt is shared, then you must
> be prepared to handle it any time, coming from any sources (not only
> your device). And CONFIG_DEBUG_SHIRQ does exactly that, in order to
> make sure all the drivers passing IRQF_SHARED comply with that rule.

you need to be sure of that with non-shared IRQs anyway. Also, an IRQ
which isn't shared in SoC A, might become shared in SoC B which uses the
same IP.

> So you either avoid using devm_request_irq, or you prepare your handler
> accordingly to be ready to handle an interrupt _any time_.

the handler is ready to handle at any time, what isn't correct is the
fact that clocks get gated before IRQ is freed.

There should be no such special case as "if your handler is shared,
don't use devm_request_*irq()" because if we just disable PM_RUNTIME, it
works as expected anyway.

--
balbi


Attachments:
(No filename) (3.26 kB)
signature.asc (819.00 B)
Digital signature
Download all attachments

2015-08-26 20:15:56

by Ezequiel Garcia

[permalink] [raw]
Subject: Re: CONFIG_DEBUG_SHIRQ and PM

On 26 August 2015 at 17:03, Felipe Balbi <[email protected]> wrote:
> On Wed, Aug 26, 2015 at 04:53:27PM -0300, Ezequiel Garcia wrote:
>> On 26 August 2015 at 16:38, Felipe Balbi <[email protected]> wrote:
>> > Hi,
>> >
>> > On Wed, Aug 26, 2015 at 04:29:52PM -0300, Ezequiel Garcia wrote:
>> >> Felipe,
>> >>
>> >> On 25 August 2015 at 16:58, Felipe Balbi <[email protected]> wrote:
>> >> > Hi Ingo,
>> >> >
>> >> > I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
>> >> > devm_request_*irq().
>> >> >
>> >>
>> >> I may be jumping on the gun here, but I believe here's your problem.
>> >> Using devm_request_irq with shared IRQs is not a good idea.
>> >>
>> >> Or at least, it forces you to handle interrupts after your device
>> >> is _completely_ removed (e.g. your IRQ cookie could be bogus).
>> >>
>> >> AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple
>> >> spurious interrupts, that are expected to happen anyway.
>> >>
>> >> Your IRQ is shared, and so you may get any IRQ at any time,
>> >> coming from another device (not yours).
>> >>
>> >> So, if I'm right, my suggestion is simple: use request_irq/free_irq
>> >> and free your IRQ before you disable your clocks, remove your device,
>> >> etc.
>> >
>> > yeah, that's just a workaround though. Specially with IRQ flags coming
>> > from DT, driver might have no knowledge that its IRQ is shared to start
>> > with.
>> >
>>
>> Really? Is there any way the DT can set the IRQF_SHARED flag?
>> The caller of devm_request_irq / request_irq needs to pass the irqflags,
>> so I'd say the driver is perfectly aware of the IRQ being shared or not.
>>
>> Maybe you can clarify what I'm missing here.
>
> hmm, that's true. Now that I look at it, DT can pass triggering flags.
>
>> > Besides, removing devm_* is just a workaround to the real problem. It
>> > seems, to me at least, that drivers shouldn't be calling
>> > pm_runtime_put_sync(); pm_runtime_disable() from their ->remove(),
>> > rather the bus driver should be responsible for doing so; much
>> > usb_bus_type and pci_bus_type do. Of course, this has the side effect of
>> > requiring buses to make sure that by the time ->probe() is called, that
>> > device's clocks are stable and running and the device is active.
>> >
>> > However, that's not done today for the platform_bus_type and, frankly,
>> > that would be somewhat of a complex problem to solve, considering that
>> > different SoCs integrate IPs the way they want.
>> >
>> > Personally, I think removing devm_* is but a workaround to the real
>> > thing we're dealing with.
>> >
>>
>> I don't see any problems here: if your interrupt is shared, then you must
>> be prepared to handle it any time, coming from any sources (not only
>> your device). And CONFIG_DEBUG_SHIRQ does exactly that, in order to
>> make sure all the drivers passing IRQF_SHARED comply with that rule.
>
> you need to be sure of that with non-shared IRQs anyway.

Not entirely. If your IRQ is not shared, then you usually have a register
to enable or unmask your peripheral interrupts. So the driver is in control
of when it will get interrupts.

If the IRQ is shared, this won't do. This is what I mean by "shared IRQs
must be prepared to receive an interrupt any time", in the sense that
the driver has no way of preventing IRQs (because they may be
coming from any source).

In the same sense, shared IRQs handlers need to double-check
the IRQ is coming to the current device by checking some IRQ
status register to see if there's pending work.

> Also, an IRQ
> which isn't shared in SoC A, might become shared in SoC B which uses the
> same IP.
>
>> So you either avoid using devm_request_irq, or you prepare your handler
>> accordingly to be ready to handle an interrupt _any time_.
>
> the handler is ready to handle at any time, what isn't correct is the
> fact that clocks get gated before IRQ is freed.
>
> There should be no such special case as "if your handler is shared,
> don't use devm_request_*irq()" because if we just disable PM_RUNTIME, it
> works as expected anyway.
>

Yeah, I meant to say: if you use devm_request_irq with IRQF_SHARED
then you _must_ be prepared to get an IRQ *after* your remove() has
been called.

Let's consider this snippet from tw68:

static irqreturn_t tw68_irq(int irq, void *dev_id)
{
struct tw68_dev *dev = dev_id;
u32 status, orig;
int loop;

status = orig = tw_readl(TW68_INTSTAT) & dev->pci_irqmask;
[etc]
}

The IRQ handler accesses the device struct and then
reads through PCI. So if you use devm_request_irq
you need to make sure the device struct is still allocated
after remove(), and the PCI read won't stall or crash.

Interestingly, tw68 uses devm_request_irq with IRQF_SHARED :-)

Still, I don't think that's a good idea, since it relies on
the IRQ being freed *before* the device struct.
--
Ezequiel García, VanguardiaSur
http://www.vanguardiasur.com.ar

2015-08-26 20:25:04

by Felipe Balbi

[permalink] [raw]
Subject: Re: CONFIG_DEBUG_SHIRQ and PM

Hi,

On Wed, Aug 26, 2015 at 05:15:51PM -0300, Ezequiel Garcia wrote:

<snip>

> >> be prepared to handle it any time, coming from any sources (not only
> >> your device). And CONFIG_DEBUG_SHIRQ does exactly that, in order to
> >> make sure all the drivers passing IRQF_SHARED comply with that rule.
> >
> > you need to be sure of that with non-shared IRQs anyway.
>
> Not entirely. If your IRQ is not shared, then you usually have a register
> to enable or unmask your peripheral interrupts. So the driver is in control
> of when it will get interrupts.
>
> If the IRQ is shared, this won't do. This is what I mean by "shared IRQs
> must be prepared to receive an interrupt any time", in the sense that
> the driver has no way of preventing IRQs (because they may be
> coming from any source).

right, the problem is much less likely on non-shared lines but the fine
that a line is shared or not is a function of HW integration, not the
e.g. USB Controller, so that knowledge really doesn't fit the driver in
a sense.

We might as well get rid of IRQF_SHARED and assume all lines are
shareable.

> In the same sense, shared IRQs handlers need to double-check
> the IRQ is coming to the current device by checking some IRQ
> status register to see if there's pending work.

you should the status register even on non-shared IRQs to catch spurious
right ?

> > Also, an IRQ
> > which isn't shared in SoC A, might become shared in SoC B which uses the
> > same IP.
> >
> >> So you either avoid using devm_request_irq, or you prepare your handler
> >> accordingly to be ready to handle an interrupt _any time_.
> >
> > the handler is ready to handle at any time, what isn't correct is the
> > fact that clocks get gated before IRQ is freed.
> >
> > There should be no such special case as "if your handler is shared,
> > don't use devm_request_*irq()" because if we just disable PM_RUNTIME, it
> > works as expected anyway.
> >
>
> Yeah, I meant to say: if you use devm_request_irq with IRQF_SHARED
> then you _must_ be prepared to get an IRQ *after* your remove() has
> been called.
>
> Let's consider this snippet from tw68:
>
> static irqreturn_t tw68_irq(int irq, void *dev_id)
> {
> struct tw68_dev *dev = dev_id;
> u32 status, orig;
> int loop;
>
> status = orig = tw_readl(TW68_INTSTAT) & dev->pci_irqmask;

Now try to read that register when your clock is gated. That's the
problem I'm talking about. Everything about the handler is functioning
correctly; however clocks are gated in ->remove() and free_irq() is
only called *AFTER* ->remove() has returned.

> [etc]
> }
>
> The IRQ handler accesses the device struct and then
> reads through PCI. So if you use devm_request_irq
> you need to make sure the device struct is still allocated
> after remove(), and the PCI read won't stall or crash.

dude, that's not the problem I'm talking about. I still have my
private_data around, what I don't have is:

_ _
__ _ ___| | ___ ___| | __
/ _` | / __| |/ _ \ / __| |/ /
| (_| | | (__| | (_) | (__| <
\__,_| \___|_|\___/ \___|_|\_\


> Interestingly, tw68 uses devm_request_irq with IRQF_SHARED :-)
>
> Still, I don't think that's a good idea, since it relies on
> the IRQ being freed *before* the device struct.

that's not an issue at all. If you're using devm_request_irq() you're
likely using devm_kzalloc() for the device struct anyway. Also, you
called devm_kzalloc() before devm_request_irq() so IRQ *will* be freed
before your private data; there's nothing wrong there.

--
balbi


Attachments:
(No filename) (3.53 kB)
signature.asc (819.00 B)
Digital signature
Download all attachments

2015-08-26 20:36:28

by Ezequiel Garcia

[permalink] [raw]
Subject: Re: CONFIG_DEBUG_SHIRQ and PM

On 26 August 2015 at 17:24, Felipe Balbi <[email protected]> wrote:
[..]
>>
>> static irqreturn_t tw68_irq(int irq, void *dev_id)
>> {
>> struct tw68_dev *dev = dev_id;
>> u32 status, orig;
>> int loop;
>>
>> status = orig = tw_readl(TW68_INTSTAT) & dev->pci_irqmask;
>
> Now try to read that register when your clock is gated. That's the
> problem I'm talking about. Everything about the handler is functioning
> correctly; however clocks are gated in ->remove() and free_irq() is
> only called *AFTER* ->remove() has returned.
>

Yeah, it's pretty clear you are talking about clocks here. That's
why I said "read won't stall" in the next paragraph.

>> [etc]
>> }
>>
>> The IRQ handler accesses the device struct and then
>> reads through PCI. So if you use devm_request_irq
>> you need to make sure the device struct is still allocated
>> after remove(), and the PCI read won't stall or crash.
>
> dude, that's not the problem I'm talking about. I still have my
> private_data around, what I don't have is:
>
> _ _
> __ _ ___| | ___ ___| | __
> / _` | / __| |/ _ \ / __| |/ /
> | (_| | | (__| | (_) | (__| <
> \__,_| \___|_|\___/ \___|_|\_\
>
>

Yes, *you* may have your private data around and have a clock gated,
others (the tw68 for instance) may have its region released and unmapped.

And yet others may have $whatever resource released in the
remove() and assume it's available in the IRQ handler.

I honestly can't think why using request_irq / free_irq to solve this
is a workaround.

>> Interestingly, tw68 uses devm_request_irq with IRQF_SHARED :-)
>>
>> Still, I don't think that's a good idea, since it relies on
>> the IRQ being freed *before* the device struct.
>
> that's not an issue at all. If you're using devm_request_irq() you're
> likely using devm_kzalloc() for the device struct anyway. Also, you
> called devm_kzalloc() before devm_request_irq() so IRQ *will* be freed
> before your private data; there's nothing wrong there.
>

--
Ezequiel García, VanguardiaSur
http://www.vanguardiasur.com.ar

2015-08-27 13:02:57

by Felipe Balbi

[permalink] [raw]
Subject: Re: CONFIG_DEBUG_SHIRQ and PM

On Wed, Aug 26, 2015 at 05:36:24PM -0300, Ezequiel Garcia wrote:
> On 26 August 2015 at 17:24, Felipe Balbi <[email protected]> wrote:
> [..]
> >>
> >> static irqreturn_t tw68_irq(int irq, void *dev_id)
> >> {
> >> struct tw68_dev *dev = dev_id;
> >> u32 status, orig;
> >> int loop;
> >>
> >> status = orig = tw_readl(TW68_INTSTAT) & dev->pci_irqmask;
> >
> > Now try to read that register when your clock is gated. That's the
> > problem I'm talking about. Everything about the handler is functioning
> > correctly; however clocks are gated in ->remove() and free_irq() is
> > only called *AFTER* ->remove() has returned.
> >
>
> Yeah, it's pretty clear you are talking about clocks here. That's
> why I said "read won't stall" in the next paragraph.
>
> >> [etc]
> >> }
> >>
> >> The IRQ handler accesses the device struct and then
> >> reads through PCI. So if you use devm_request_irq
> >> you need to make sure the device struct is still allocated
> >> after remove(), and the PCI read won't stall or crash.
> >
> > dude, that's not the problem I'm talking about. I still have my
> > private_data around, what I don't have is:
> >
> > _ _
> > __ _ ___| | ___ ___| | __
> > / _` | / __| |/ _ \ / __| |/ /
> > | (_| | | (__| | (_) | (__| <
> > \__,_| \___|_|\___/ \___|_|\_\
> >
> >
>
> Yes, *you* may have your private data around and have a clock gated,
> others (the tw68 for instance) may have its region released and unmapped.
>
> And yet others may have $whatever resource released in the
> remove() and assume it's available in the IRQ handler.
>
> I honestly can't think why using request_irq / free_irq to solve this
> is a workaround.

because it'll, eventually, boil down to not using devm_* at all and
that's pretty stupid.

--
balbi


Attachments:
(No filename) (1.79 kB)
signature.asc (819.00 B)
Digital signature
Download all attachments

2015-08-28 19:43:03

by Thomas Gleixner

[permalink] [raw]
Subject: Re: CONFIG_DEBUG_SHIRQ and PM

On Tue, 25 Aug 2015, Felipe Balbi wrote:
> Hi Ingo,

Thanks for not cc'ing the irq maintainer ....

> I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
> devm_request_*irq().
>
> If we using devm_request_*irq(), that irq will be freed after device
> drivers' ->remove() gets called. If on ->remove(), we're calling
> pm_runtime_put_sync(); pm_runtime_disable(), device's clocks might get
> gated and, because we do an extra call to the device's IRQ handler when
> CONFIG_DEBUG_SHIRQ=y, we might trigger an abort exception if, inside the
> IRQ handler, we try to read a register which is clocked by the device's
> clock.
>
> This is, of course, really old code which has been in tree for many,
> many years. I guess nobody has been running their tests in the setup
> mentioned above (CONFIG_DEBUG_SHIRQ=y, pm_runtime_put_sync() on
> ->remove(), a register read on IRQ handler, and a shared IRQ handler),
> so that's why we never caught this before.
>
> Disabling CONFIG_DEBUG_SHIRQ, of course, makes the problem go away, but
> if driver *must* be ready to receive, and handle, an IRQ even during
> module removal, I wonder what the IRQ handler should do. We can't, in
> most cases, call pm_runtime_put_sync() from IRQ handler.

Well, a shared interrupt handler must handle this situation, no matter
what. Assume the following:

irqreturn_t dev_irq(int irq, void *data)
{
struct devdata *dd = data;
u32 state;

state = readl(dd->base);
...
}

void module_exit(void)
{
/* Write to the device interrupt register */
disable_device_irq(dd->base);
/*
* After this point the device does not longer
* raise an interrupt
*/
iounmap(dd->base);
free_irq();

If the other device which shares the interrupt line raises an
interrupt after the unmap and before free_irq() removed the device
handler from the irq, the machine is toast, because the dev_irq
handler is still called.

If the handler is shut down after critical parts of the driver/device
are shut down, then you can

- either can change the setup/teardown ordering

disable_device_irq(dd->base);
free_irq();
iounmap(dd->base);

- or have a proper flag in the private data which tells the interrupt
handler to sod off.

irqreturn_t dev_irq(int irq, void *data)
{
struct devdata *dd = data;

if (dd->shutdown)
return IRQ_NONE;
...

void module_exit(void)
{
disable_device_irq(dd->base);
dd->shutdown = 1;

/* On an SMP machine you also need: */
synchronize_irq(dd->irq);

So for the problem at hand, the devm magic needs to make sure that the
crucial parts are still alive when the devm allocated irq is released.

I have no idea how that runtime PM stuff is integrated into devm (I
fear not at all), so it's hard to give you a proper advise on that.

Thanks,

tglx