2023-04-05 09:58:20

by Fabrizio Lamarque

[permalink] [raw]
Subject: iio: ad7192: Pending IRQ on SDO/RDY causing retrigger of interrupt handler and missing every sample

Link: https://lore.kernel.org/all/[email protected]/

This commit broke the driver functionality, i.e. cat in_voltage1_raw
triggers a correct sampling sequence only the first time, then it
always returns the first sampled value.

Following the sequence of ad_sigma_delta_single_conversion() within
ad_sigma_delta.c, this is due to:

- IRQ resend mechanism is always enabled for ARM cores
(CONFIG_HARDIRQS_SW_RESEND)
- Edge IRQs are always made pending when a corresponding event
happens, even after disable_irq(). This is intentional and designed to
prevent missing signal edges.
- Level IRQs are not impacted by IRQ resend (i.e. IRQ_PENDING is
always cleared).
- SPI communication causes the IRQ to be set pending (even if
corresponding interrupt is disabled)
- The second time ad_sigma_delta_single_conversion() is called,
enable_irq() will trigger the interrupt immediately, even if RDY line
is high.
- In turn, this causes the call
wait_for_completion_interruptible_timeout() to exit immediately, with
RDY line still high.
- Right after the SPI register read, the MODE register is written with
AD_SD_MODE_IDLE, and pending conversion is stopped. Hence DATA
register is never updated.

I would suggest to revert this commit or set the flag with
IRQF_TRIGGER_LOW instead of IRQF_TRIGGER_FALLING, but I am not sure
about the problem solved by this commit and how to replicate it.
Another option would be to keep IRQ flags within bindings only.

As a side note, AD7192 datasheet says that the falling edge on SDO
line _can_ be used as an interrupt to processor, but it also states
that the _level_ on SDO/RDY line indicates the sample is ready. What
happens on SDO/RDY interrupt line before the ADC conversion is
triggered should be ignored.

This bug should be easy to reproduce on ADI demo boards with impacted
kernel versions, just by manually reading any input channel from
sysfs.

Fabrizio Lamarque


2023-04-05 10:37:02

by Nuno Sá

[permalink] [raw]
Subject: Re: iio: ad7192: Pending IRQ on SDO/RDY causing retrigger of interrupt handler and missing every sample

Hi Fabrizio,

Thanks for reporting this...

On Wed, 2023-04-05 at 11:53 +0200, Fabrizio Lamarque wrote:
> Link:
> https://lore.kernel.org/all/[email protected]/
>
> This commit broke the driver functionality, i.e. cat in_voltage1_raw
> triggers a correct sampling sequence only the first time, then it
> always returns the first sampled value.
>
> Following the sequence of ad_sigma_delta_single_conversion() within
> ad_sigma_delta.c, this is due to:
>
> - IRQ resend mechanism is always enabled for ARM cores
> (CONFIG_HARDIRQS_SW_RESEND)
> - Edge IRQs are always made pending when a corresponding event
> happens, even after disable_irq(). This is intentional and designed to
> prevent missing signal edges.
> - Level IRQs are not impacted by IRQ resend (i.e. IRQ_PENDING is
> always cleared).
> - SPI communication causes the IRQ to be set pending (even if
> corresponding interrupt is disabled)
> - The second time ad_sigma_delta_single_conversion() is called,
> enable_irq() will trigger the interrupt immediately, even if RDY line
> is high.
> - In turn, this causes the call
> wait_for_completion_interruptible_timeout() to exit immediately, with
> RDY line still high.
> - Right after the SPI register read, the MODE register is written with
> AD_SD_MODE_IDLE, and pending conversion is stopped. Hence DATA
> register is never updated.
>
> I would suggest to revert this commit or set the flag with
> IRQF_TRIGGER_LOW instead of IRQF_TRIGGER_FALLING, but I am not sure
> about the problem solved by this commit and how to replicate it.
> Another option would be to keep IRQ flags within bindings only.
>

I'm starting to think that what's stated in that commit 

"Leaving IRQ on level instead of falling might trigger a sample read
when the IRQ is enabled, as the SDO line is already low. "

might actually be related with something else...

> As a side note, AD7192 datasheet says that the falling edge on SDO
> line _can_ be used as an interrupt to processor, but it also states
> that the _level_ on SDO/RDY line indicates the sample is ready. What
> happens on SDO/RDY interrupt line before the ADC conversion is
> triggered should be ignored.
>

Well, from the datasheet (as you already know):

"
In addition, DOUT/RDY operates as a data ready pin, going low to indicate
the completion of a conversion. If the data is not read after the conversion,
the pin goes high before the next
update occurs. The DOUT/RDY falling edge can be used as an interrupt to a
processor, indicating that valid
data is available.
"

So I really read IRQF_TRIGGER_FALLING in the above and all the other sigma
delta devices have the same setting (I think). So the fact that it works with
level IRQs might just be lucky instead of fixing the real problem. Can you try
the below patch (I'm starting to think it might be related):


https://lore.kernel.org/linux-iio/[email protected]/


- Nuno Sá

2023-04-05 13:29:15

by Fabrizio Lamarque

[permalink] [raw]
Subject: Re: iio: ad7192: Pending IRQ on SDO/RDY causing retrigger of interrupt handler and missing every sample

On Wed, Apr 5, 2023 at 12:30 PM Nuno Sá <[email protected]> wrote:
> On Wed, 2023-04-05 at 11:53 +0200, Fabrizio Lamarque wrote:
> > Link:
> > https://lore.kernel.org/all/[email protected]/
> >
> > This commit broke the driver functionality, i.e. cat in_voltage1_raw
> > triggers a correct sampling sequence only the first time, then it
> > always returns the first sampled value.
> >
> > Following the sequence of ad_sigma_delta_single_conversion() within
> > ad_sigma_delta.c, this is due to:
> >
> > - IRQ resend mechanism is always enabled for ARM cores
> > (CONFIG_HARDIRQS_SW_RESEND)
> > - Edge IRQs are always made pending when a corresponding event
> > happens, even after disable_irq(). This is intentional and designed to
> > prevent missing signal edges.
> > - Level IRQs are not impacted by IRQ resend (i.e. IRQ_PENDING is
> > always cleared).
> > - SPI communication causes the IRQ to be set pending (even if
> > corresponding interrupt is disabled)
> > - The second time ad_sigma_delta_single_conversion() is called,
> > enable_irq() will trigger the interrupt immediately, even if RDY line
> > is high.
> > - In turn, this causes the call
> > wait_for_completion_interruptible_timeout() to exit immediately, with
> > RDY line still high.
> > - Right after the SPI register read, the MODE register is written with
> > AD_SD_MODE_IDLE, and pending conversion is stopped. Hence DATA
> > register is never updated.
> >
> > I would suggest to revert this commit or set the flag with
> > IRQF_TRIGGER_LOW instead of IRQF_TRIGGER_FALLING, but I am not sure
> > about the problem solved by this commit and how to replicate it.
> > Another option would be to keep IRQ flags within bindings only.
> >
>
> I'm starting to think that what's stated in that commit
>
> "Leaving IRQ on level instead of falling might trigger a sample read
> when the IRQ is enabled, as the SDO line is already low. "
>
> might actually be related with something else...

Hi Nuno Sá,
thank you again for your replies and the time you are spending in checking
the reported issues.
I see what you mean...
I was asking for details on the actual issue that was solved to have a better
understanding of the change.

>
> > As a side note, AD7192 datasheet says that the falling edge on SDO
> > line _can_ be used as an interrupt to processor, but it also states
> > that the _level_ on SDO/RDY line indicates the sample is ready. What
> > happens on SDO/RDY interrupt line before the ADC conversion is
> > triggered should be ignored.
> >
>
> Well, from the datasheet (as you already know):
>
> "
> In addition, DOUT/RDY operates as a data ready pin, going low to indicate
> the completion of a conversion. If the data is not read after the conversion,
> the pin goes high before the next
> update occurs. The DOUT/RDY falling edge can be used as an interrupt to a
> processor, indicating that valid
> data is available.
> "
>
> So I really read IRQF_TRIGGER_FALLING in the above and all the other sigma
> delta devices have the same setting (I think). So the fact that it works with
> level IRQs might just be lucky instead of fixing the real problem. Can you try
> the below patch (I'm starting to think it might be related):
>

We have been using those ADC series for about a decade, the shared SDO/RDY
signal has its own quirks.
We also discussed several years ago in a support ticket with ADI, both
level and edge
sensing should be acceptable, provided that limitations are understood.

>
> https://lore.kernel.org/linux-iio/[email protected]/

I just tested the patch, but, at least on the platform I'm working on
(I.MX6), it does not
solve the issue.
Whereas the thread describes the very same issue I am experiencing, I
am not sure
IRQ_DISABLE_UNLAZY would have any impact. By reading CPU registers I see
the IRQ line disabled at the hardware level, but when the IRQ flag of
the processor
is set (and this happens even if the interrupt is masked in HW), the
interrupt is immediately
triggered anyway.
(I see GPIOx_IMR cleared, so interrupt is masked, but GPIOx_ISR set. As soon as
enable_irq() is called the interrupt is triggered. Just by clearing
GPIOx_ISR before
enable_irq() solves the issue. I might share a few debug printk).

The "real issue" here is that we have a pending IRQ on the RDY line set
even if IRQ is disabled (masked) at the hardware level.
It does not happen for level sensing interrupts.

This link might further clarify why this should be intentional:
https://www.kernel.org/doc/html/latest/core-api/genericirq.html#delayed-interrupt-disable
(note that I see the IRQ masked at the hardware level anyway, but it
does not prevent
the described behavior to happen)

In case the interrupt flag has to be IRQF_TRIGGER_FALLING, it might be viable to
enable the IRQ before the measurement is requested and eventually
ignore any spurious
interrupt until the measurement is started (SPI write completed).

Fabrizio Lamarque

2023-04-05 13:54:43

by Nuno Sá

[permalink] [raw]
Subject: Re: iio: ad7192: Pending IRQ on SDO/RDY causing retrigger of interrupt handler and missing every sample

On Wed, 2023-04-05 at 15:20 +0200, Fabrizio Lamarque wrote:
> On Wed, Apr 5, 2023 at 12:30 PM Nuno Sá <[email protected]> wrote:
> > On Wed, 2023-04-05 at 11:53 +0200, Fabrizio Lamarque wrote:
> > > Link:
> > > https://lore.kernel.org/all/[email protected]/
> > >
> > > This commit broke the driver functionality, i.e. cat in_voltage1_raw
> > > triggers a correct sampling sequence only the first time, then it
> > > always returns the first sampled value.
> > >
> > > Following the sequence of ad_sigma_delta_single_conversion() within
> > > ad_sigma_delta.c, this is due to:
> > >
> > > - IRQ resend mechanism is always enabled for ARM cores
> > > (CONFIG_HARDIRQS_SW_RESEND)
> > > - Edge IRQs are always made pending when a corresponding event
> > > happens, even after disable_irq(). This is intentional and designed to
> > > prevent missing signal edges.
> > > - Level IRQs are not impacted by IRQ resend (i.e. IRQ_PENDING is
> > > always cleared).
> > > - SPI communication causes the IRQ to be set pending (even if
> > > corresponding interrupt is disabled)
> > > - The second time ad_sigma_delta_single_conversion() is called,
> > > enable_irq() will trigger the interrupt immediately, even if RDY line
> > > is high.
> > > - In turn, this causes the call
> > > wait_for_completion_interruptible_timeout() to exit immediately, with
> > > RDY line still high.
> > > - Right after the SPI register read, the MODE register is written with
> > > AD_SD_MODE_IDLE, and pending conversion is stopped. Hence DATA
> > > register is never updated.
> > >
> > > I would suggest to revert this commit or set the flag with
> > > IRQF_TRIGGER_LOW instead of IRQF_TRIGGER_FALLING, but I am not sure
> > > about the problem solved by this commit and how to replicate it.
> > > Another option would be to keep IRQ flags within bindings only.
> > >
> >
> > I'm starting to think that what's stated in that commit
> >
> > "Leaving IRQ on level instead of falling might trigger a sample read
> > when the IRQ is enabled, as the SDO line is already low. "
> >
> > might actually be related with something else...
>
> Hi Nuno Sá,
>  thank you again for your replies and the time you are spending in checking
> the reported issues.
> I see what you mean...
> I was asking for details on the actual issue that was solved to have a better
> understanding of the change.
>

Yeah, Alex is no longer in ADI so I cannot really say.

> >
> > > As a side note, AD7192 datasheet says that the falling edge on SDO
> > > line _can_ be used as an interrupt to processor, but it also states
> > > that the _level_ on SDO/RDY line indicates the sample is ready. What
> > > happens on SDO/RDY interrupt line before the ADC conversion is
> > > triggered should be ignored.
> > >
> >
> > Well, from the datasheet (as you already know):
> >
> > "
> > In addition, DOUT/RDY operates as a data ready pin, going low to indicate
> > the completion of a conversion. If the data is not read after the
> > conversion,
> > the pin goes high before the next
> > update occurs. The DOUT/RDY falling edge can be used as an interrupt to a
> > processor, indicating that valid
> > data is available.
> > "
> >
> > So I really read IRQF_TRIGGER_FALLING in the above and all the other sigma
> > delta devices have the same setting (I think). So the fact that it works
> > with
> > level IRQs might just be lucky instead of fixing the real problem. Can you
> > try
> > the below patch (I'm starting to think it might be related):
> >
>
> We have been using those ADC series for about a decade, the shared SDO/RDY
> signal has its own quirks.
> We also discussed several years ago in a support ticket with ADI, both
> level and edge
> sensing should be acceptable, provided that limitations are understood.

> >
> > https://lore.kernel.org/linux-iio/[email protected]/
>
> I just tested the patch, but, at least on the platform I'm working on
> (I.MX6), it does not
> solve the issue.
> Whereas the thread describes the very same issue I am experiencing, I
> am not sure
> IRQ_DISABLE_UNLAZY would have any impact. By reading CPU registers I see

I was expecting to have. AFAIU, the lazy approach would be the responsible for
this behavior because when you call disable_irq() it does not really mask the
IRQ in HW. Just marks it as disabled and if another IRQ event comes set's it to
PENDING and only then masks at HW level... If we "unlazy" the IRQ, we should
mask the IRQ right away so I would expect not to have any pending IRQ...

> the IRQ line disabled at the hardware level, but when the IRQ flag of
> the processor
> is set (and this happens even if the interrupt is masked in HW), the
> interrupt is immediately
> triggered anyway.
> (I see GPIOx_IMR cleared, so interrupt is masked, but GPIOx_ISR set. As soon
> as
> enable_irq() is called the interrupt is triggered. Just by clearing
> GPIOx_ISR before
> enable_irq() solves the issue. I might share a few debug printk).
>

Might be that the problem is actually in the behavior of the above controller?
As said, I would expect a masked IRQ not to be handled and set to PENDING.

> The "real issue" here is that we have a pending IRQ on the RDY line set
> even if IRQ is disabled (masked) at the hardware level.
> It does not happen for level sensing interrupts.
>
> This link might further clarify why this should be intentional:
> https://www.kernel.org/doc/html/latest/core-api/genericirq.html#delayed-interrupt-disable
> (note that I see the IRQ masked at the hardware level anyway, but it
> does not prevent
> the described behavior to happen)
>
> In case the interrupt flag has to be IRQF_TRIGGER_FALLING, it might be viable
> to
> enable the IRQ before the measurement is requested and eventually
> ignore any spurious
> interrupt until the measurement is started (SPI write completed).
>

Oh no, that looks like just going around the real problem. It would be
interesting to test this in some other platforms to see if the behavior is the
same... I guess now is the time to fully understand this. IIRC, originally
everything was set as LEVEL but since the datasheet states EDGE should be used,
Alex moved the flags to EDGE.

Maybe you're also right and we should just remove the flags and let users decide
in the bindings whatever works best for their platforms given the fact that it
appears that both LEVEL vs EDGE can be used.

Anyways,

+cc Lars since he was the original developer on the sigma delta stuff and maybe
he remembers something about this level vs edge mess.

Lars, can you shed some light :)?

- Nuno Sá

2023-04-05 14:33:49

by Lars-Peter Clausen

[permalink] [raw]
Subject: Re: iio: ad7192: Pending IRQ on SDO/RDY causing retrigger of interrupt handler and missing every sample

On 4/5/23 06:49, Nuno Sá wrote:
> On Wed, 2023-04-05 at 15:20 +0200, Fabrizio Lamarque wrote:
>> On Wed, Apr 5, 2023 at 12:30 PM Nuno Sá <[email protected]> wrote:
>>> On Wed, 2023-04-05 at 11:53 +0200, Fabrizio Lamarque wrote:
>>>> Link:
>>>> https://lore.kernel.org/all/[email protected]/
>>>>
>>>> This commit broke the driver functionality, i.e. cat in_voltage1_raw
>>>> triggers a correct sampling sequence only the first time, then it
>>>> always returns the first sampled value.
>>>>
>>>> Following the sequence of ad_sigma_delta_single_conversion() within
>>>> ad_sigma_delta.c, this is due to:
>>>>
>>>> - IRQ resend mechanism is always enabled for ARM cores
>>>> (CONFIG_HARDIRQS_SW_RESEND)
>>>> - Edge IRQs are always made pending when a corresponding event
>>>> happens, even after disable_irq(). This is intentional and designed to
>>>> prevent missing signal edges.
>>>> - Level IRQs are not impacted by IRQ resend (i.e. IRQ_PENDING is
>>>> always cleared).
>>>> - SPI communication causes the IRQ to be set pending (even if
>>>> corresponding interrupt is disabled)
>>>> - The second time ad_sigma_delta_single_conversion() is called,
>>>> enable_irq() will trigger the interrupt immediately, even if RDY line
>>>> is high.
>>>> - In turn, this causes the call
>>>> wait_for_completion_interruptible_timeout() to exit immediately, with
>>>> RDY line still high.
>>>> - Right after the SPI register read, the MODE register is written with
>>>> AD_SD_MODE_IDLE, and pending conversion is stopped. Hence DATA
>>>> register is never updated.
>>>>
>>>> I would suggest to revert this commit or set the flag with
>>>> IRQF_TRIGGER_LOW instead of IRQF_TRIGGER_FALLING, but I am not sure
>>>> about the problem solved by this commit and how to replicate it.
>>>> Another option would be to keep IRQ flags within bindings only.
>>>>
>>> I'm starting to think that what's stated in that commit
>>>
>>> "Leaving IRQ on level instead of falling might trigger a sample read
>>> when the IRQ is enabled, as the SDO line is already low. "
>>>
>>> might actually be related with something else...
>> Hi Nuno Sá,
>>  thank you again for your replies and the time you are spending in checking
>> the reported issues.
>> I see what you mean...
>> I was asking for details on the actual issue that was solved to have a better
>> understanding of the change.
>>
> Yeah, Alex is no longer in ADI so I cannot really say.
>
>>>> As a side note, AD7192 datasheet says that the falling edge on SDO
>>>> line _can_ be used as an interrupt to processor, but it also states
>>>> that the _level_ on SDO/RDY line indicates the sample is ready. What
>>>> happens on SDO/RDY interrupt line before the ADC conversion is
>>>> triggered should be ignored.
>>>>
>>> Well, from the datasheet (as you already know):
>>>
>>> "
>>> In addition, DOUT/RDY operates as a data ready pin, going low to indicate
>>> the completion of a conversion. If the data is not read after the
>>> conversion,
>>> the pin goes high before the next
>>> update occurs. The DOUT/RDY falling edge can be used as an interrupt to a
>>> processor, indicating that valid
>>> data is available.
>>> "
>>>
>>> So I really read IRQF_TRIGGER_FALLING in the above and all the other sigma
>>> delta devices have the same setting (I think). So the fact that it works
>>> with
>>> level IRQs might just be lucky instead of fixing the real problem. Can you
>>> try
>>> the below patch (I'm starting to think it might be related):
>>>
>> We have been using those ADC series for about a decade, the shared SDO/RDY
>> signal has its own quirks.
>> We also discussed several years ago in a support ticket with ADI, both
>> level and edge
>> sensing should be acceptable, provided that limitations are understood.
>>> https://lore.kernel.org/linux-iio/[email protected]/
>> I just tested the patch, but, at least on the platform I'm working on
>> (I.MX6), it does not
>> solve the issue.
>> Whereas the thread describes the very same issue I am experiencing, I
>> am not sure
>> IRQ_DISABLE_UNLAZY would have any impact. By reading CPU registers I see
> I was expecting to have. AFAIU, the lazy approach would be the responsible for
> this behavior because when you call disable_irq() it does not really mask the
> IRQ in HW. Just marks it as disabled and if another IRQ event comes set's it to
> PENDING and only then masks at HW level... If we "unlazy" the IRQ, we should
> mask the IRQ right away so I would expect not to have any pending IRQ...
>
>> the IRQ line disabled at the hardware level, but when the IRQ flag of
>> the processor
>> is set (and this happens even if the interrupt is masked in HW), the
>> interrupt is immediately
>> triggered anyway.
>> (I see GPIOx_IMR cleared, so interrupt is masked, but GPIOx_ISR set. As soon
>> as
>> enable_irq() is called the interrupt is triggered. Just by clearing
>> GPIOx_ISR before
>> enable_irq() solves the issue. I might share a few debug printk).
>>
> Might be that the problem is actually in the behavior of the above controller?
> As said, I would expect a masked IRQ not to be handled and set to PENDING.
>
>> The "real issue" here is that we have a pending IRQ on the RDY line set
>> even if IRQ is disabled (masked) at the hardware level.
>> It does not happen for level sensing interrupts.
>>
>> This link might further clarify why this should be intentional:
>> https://www.kernel.org/doc/html/latest/core-api/genericirq.html#delayed-interrupt-disable
>> (note that I see the IRQ masked at the hardware level anyway, but it
>> does not prevent
>> the described behavior to happen)
>>
>> In case the interrupt flag has to be IRQF_TRIGGER_FALLING, it might be viable
>> to
>> enable the IRQ before the measurement is requested and eventually
>> ignore any spurious
>> interrupt until the measurement is started (SPI write completed).
>>
> Oh no, that looks like just going around the real problem. It would be
> interesting to test this in some other platforms to see if the behavior is the
> same... I guess now is the time to fully understand this. IIRC, originally
> everything was set as LEVEL but since the datasheet states EDGE should be used,
> Alex moved the flags to EDGE.
>
> Maybe you're also right and we should just remove the flags and let users decide
> in the bindings whatever works best for their platforms given the fact that it
> appears that both LEVEL vs EDGE can be used.
>
> Anyways,
>
> +cc Lars since he was the original developer on the sigma delta stuff and maybe
> he remembers something about this level vs edge mess.
>
> Lars, can you shed some light :)?
I don't remember. But the driver was written with the assumption that
irq_disable() will stop recording of IRQ events. If that does not hold
we might have to switch from irq_enable/irq_disable to
request_irq/free_irq, which will come with a slight overhead.


2023-04-05 14:42:59

by Lars-Peter Clausen

[permalink] [raw]
Subject: Re: iio: ad7192: Pending IRQ on SDO/RDY causing retrigger of interrupt handler and missing every sample

On 4/5/23 06:49, Nuno Sá wrote:
> [...]
> I just tested the patch, but, at least on the platform I'm working on
> (I.MX6), it does not
> solve the issue.
> Whereas the thread describes the very same issue I am experiencing, I
> am not sure
> IRQ_DISABLE_UNLAZY would have any impact. By reading CPU registers I see
> I was expecting to have. AFAIU, the lazy approach would be the responsible for
> this behavior because when you call disable_irq() it does not really mask the
> IRQ in HW. Just marks it as disabled and if another IRQ event comes set's it to
> PENDING and only then masks at HW level... If we "unlazy" the IRQ, we should
> mask the IRQ right away so I would expect not to have any pending IRQ...
>
>> the IRQ line disabled at the hardware level, but when the IRQ flag of
>> the processor
>> is set (and this happens even if the interrupt is masked in HW), the
>> interrupt is immediately
>> triggered anyway.
>> (I see GPIOx_IMR cleared, so interrupt is masked, but GPIOx_ISR set. As soon
>> as
>> enable_irq() is called the interrupt is triggered. Just by clearing
>> GPIOx_ISR before
>> enable_irq() solves the issue. I might share a few debug printk).

This sounds to me that the GPIO driver should implement an
`irq_enable()` callback where it clears the pending bits before
unmasking. This is consistent with what other GPIO IRQ drivers do.

2023-04-07 08:57:25

by Fabrizio Lamarque

[permalink] [raw]
Subject: Re: iio: ad7192: Pending IRQ on SDO/RDY causing retrigger of interrupt handler and missing every sample

On Wed, Apr 5, 2023 at 4:35 PM Lars-Peter Clausen <[email protected]> wrote:
>
> On 4/5/23 06:49, Nuno Sá wrote:
> > [...]
> > I just tested the patch, but, at least on the platform I'm working on
> > (I.MX6), it does not
> > solve the issue.
> > Whereas the thread describes the very same issue I am experiencing, I
> > am not sure
> > IRQ_DISABLE_UNLAZY would have any impact. By reading CPU registers I see
> > I was expecting to have. AFAIU, the lazy approach would be the responsible for
> > this behavior because when you call disable_irq() it does not really mask the
> > IRQ in HW. Just marks it as disabled and if another IRQ event comes set's it to
> > PENDING and only then masks at HW level... If we "unlazy" the IRQ, we should
> > mask the IRQ right away so I would expect not to have any pending IRQ...
> >
> >> the IRQ line disabled at the hardware level, but when the IRQ flag of
> >> the processor
> >> is set (and this happens even if the interrupt is masked in HW), the
> >> interrupt is immediately
> >> triggered anyway.
> >> (I see GPIOx_IMR cleared, so interrupt is masked, but GPIOx_ISR set. As soon
> >> as
> >> enable_irq() is called the interrupt is triggered. Just by clearing
> >> GPIOx_ISR before
> >> enable_irq() solves the issue. I might share a few debug printk).
>
> This sounds to me that the GPIO driver should implement an
> `irq_enable()` callback where it clears the pending bits before
> unmasking. This is consistent with what other GPIO IRQ drivers do.
>

Hello Lars,
thank you for your comments and for having a look at this.

I am not really confident in dissecting how ARM interrupt management
and NXP specific features. However, he "Delayed Interrupt" feature in
kernel documentation (specific to ARM interrupt implementation),
"prevents losing edge interrupts on hardware which does not store an
edge interrupt event while the interrupt is disabled at the hardware
level". This feature is always enabled for any ARM core.
Indeed, NXP hardware _stores_ the GPIO edge interrupt when the IRQ is
disabled. While not explicitly written in kernel docs, to my limited
understanding enable_irq() should not clear the pending edge IRQ
(eventually occurred when interrupt was disabled).
I do not see anything that clearly does not conform to documentation
in how IRQs are being managed.

> the driver was written with the assumption that
> irq_disable() will stop recording of IRQ events. If that does not hold
> we might have to switch from irq_enable/irq_disable to
> request_irq/free_irq, which will come with a slight overhead.

request_irq/free_irq should clear the interrupt flags.
However, since this change might have side effects on other ADI ADC
drivers and trigger functionality, I'd be unable to provide a valid
and verified patch (but I could test a proposed patch on my platform).

In order to give you a better understanding of the issue, I am
providing a simple log here.

Here are the GPIO registers involved (only relevant bits are masked):

ICR = Edge sensitivity (3 = FALLING)
ISR = Interrupt status register (1 = DETECTED) independent of IMR
IMR = Interrupt mask register, 1 = ENABLED, 0 = MASKED
STATUS = IRQ status/config register, IRQ_DISABLE_UNLAZY (20th bit) is set
STATE = internal interrupt state, IRQS_PENDING (10th bit) is never set
COMPLETION = done flag of completion

Here is what happens within ad_sigma_delta_single_conversion() function:

# cat in_voltage1_raw <-- First conversion
[ 1001.540803] ad_sigma_delta_single_conversion() entry
[ 1001.540837] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1001.545849] ad_sigma_delta_set_channel() before ad_sigma_delta_set_channel()
[ 1001.569456] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1001.569700] ad_sigma_delta_set_channel() before ad_sigma_delta_set_mode()
[ 1001.586455] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1001.595279] ad_sigma_delta_set_channel() before enable_irq()
[ 1001.604869] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1001.612409] ad_sigma_delta_set_channel() before
wait_for_completion_interruptible_timeout()
[ 1001.621859] ICR = 3, IMR = 1, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1001.997637] ad_sd_data_rdy_trig_poll() entry
[ 1002.005523] ad_sd_data_rdy_trig_poll() before disable_irq_nosync()
[ 1002.009805] ad_sd_data_rdy_trig_poll() before iio_trigger_poll()
[ 1002.015994] ad_sd_data_rdy_trig_poll() exit
[ 1002.026725] ad_sigma_delta_set_channel() before ad_sd_read_reg()
[ 1002.031136] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1002.037382] ad_sigma_delta_set_channel() before ad_sigma_delta_set_mode()
[ 1002.048902] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1002.057685] ad_sigma_delta_set_channel() exit
[ 1002.067249] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
8304626

# cat in_voltage1_raw <-- Any following conversion
[ 1003.968784] ad_sigma_delta_single_conversion() entry
[ 1003.971193] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1003.976224] ad_sigma_delta_set_channel() before ad_sigma_delta_set_channel()
[ 1003.989727] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1003.998723] ad_sigma_delta_set_channel() before ad_sigma_delta_set_mode()
[ 1004.008524] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1004.017512] ad_sigma_delta_set_channel() before enable_irq()
[ 1004.027031] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1004.034409] ad_sd_data_rdy_trig_poll() entry
[ 1004.042280] ad_sd_data_rdy_trig_poll() before disable_irq_nosync()
[ 1004.046560] ad_sd_data_rdy_trig_poll() before iio_trigger_poll()
[ 1004.052747] ad_sd_data_rdy_trig_poll() exit
[ 1004.063603] ad_sigma_delta_set_channel() before
wait_for_completion_interruptible_timeout()
[ 1004.067981] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 1
[ 1004.079159] ad_sigma_delta_set_channel() before ad_sd_read_reg()
[ 1004.088709] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1004.096793] ad_sigma_delta_set_channel() before ad_sigma_delta_set_mode()
[ 1004.106339] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1004.115121] ad_sigma_delta_set_channel() exit
[ 1004.124737] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
8304626

The first time there is a delay on wait_for_completion_interruptible_timeout().
In the second throw the interrupt function is called as soon as
enable_irq() is executed, completion is already set before entering
the wait state.
You may see the ISR register gets set again after ad_sd_read_reg(),
and this is the only relevant difference between the two sequences.

Unless you have any other suggestions, I will send three simpler
patches relevant to this driver, and leave to you the changes required
to overcome the issue described here.
In any case, I am available to test it on my side.


Fabrizio

2023-04-18 11:36:38

by Nuno Sá

[permalink] [raw]
Subject: Re: iio: ad7192: Pending IRQ on SDO/RDY causing retrigger of interrupt handler and missing every sample

On Fri, 2023-04-07 at 10:51 +0200, Fabrizio Lamarque wrote:
> On Wed, Apr 5, 2023 at 4:35 PM Lars-Peter Clausen <[email protected]> wrote:
> >
> > On 4/5/23 06:49, Nuno Sá wrote:
> > > [...]
> > > I just tested the patch, but, at least on the platform I'm working on
> > > (I.MX6), it does not
> > > solve the issue.
> > > Whereas the thread describes the very same issue I am experiencing, I
> > > am not sure
> > > IRQ_DISABLE_UNLAZY would have any impact. By reading CPU registers I see
> > > I was expecting to have. AFAIU, the lazy approach would be the responsible
> > > for
> > > this behavior because when you call disable_irq() it does not really mask
> > > the
> > > IRQ in HW. Just marks it as disabled and if another IRQ event comes set's
> > > it to
> > > PENDING and only then masks at HW level... If we "unlazy" the IRQ, we
> > > should
> > > mask the IRQ right away so I would expect not to have any pending IRQ...
> > >
> > > > the IRQ line disabled at the hardware level, but when the IRQ flag of
> > > > the processor
> > > > is set (and this happens even if the interrupt is masked in HW), the
> > > > interrupt is immediately
> > > > triggered anyway.
> > > > (I see GPIOx_IMR cleared, so interrupt is masked, but GPIOx_ISR set. As
> > > > soon
> > > > as
> > > > enable_irq() is called the interrupt is triggered. Just by clearing
> > > > GPIOx_ISR before
> > > > enable_irq() solves the issue. I might share a few debug printk).
> >
> > This sounds to me that the GPIO driver should implement an
> > `irq_enable()` callback where it clears the pending bits before
> > unmasking. This is consistent with what other GPIO IRQ drivers do.
> >
>
> Hello Lars,
>  thank you for your comments and for having a look at this.
>
> I am not really confident in dissecting how ARM interrupt management
> and NXP specific features. However, he "Delayed Interrupt" feature in
> kernel documentation (specific to ARM interrupt implementation),
> "prevents losing edge interrupts on hardware which does not store an
> edge interrupt event while the interrupt is disabled at the hardware
> level". This feature is always enabled for any ARM core.
> Indeed, NXP hardware _stores_ the GPIO edge interrupt when the IRQ is
> disabled. While not explicitly written in kernel docs, to my limited
> understanding enable_irq() should not clear the pending edge IRQ
> (eventually occurred when interrupt was disabled).
> I do not see anything that clearly does not conform to documentation
> in how IRQs are being managed.
>

So I'm still not convinced that's the right behavior. If we set
IRQ_DISABLE_UNLAZY, then I would expect that line to be masked/disabled as
soon as disable_irq() is called. Then, I would expect that no IRQ is saved
for a disabled/masked line or we should (I guess) at least be capable of
distinguish between "we want to get interrupts while processing one" or
"we really want to disable the line".

Anyways, I'm not really sure about this and it would be interesting to give
this a try in another platform.

> > the driver was written with the assumption that
> > irq_disable() will stop recording of IRQ events. If that does not hold
> > we might have to switch from irq_enable/irq_disable to
> > request_irq/free_irq, which will come with a slight overhead.
>
> request_irq/free_irq should clear the interrupt flags.
> However, since this change might have side effects on other ADI ADC
> drivers and trigger functionality, I'd be unable to provide a valid
> and verified patch (but I could test a proposed patch on my platform).
>
>

I would prefer the bindings approach. AFAIR, we should not really impose any
flags (possibly overwriting what was described in dts) when requesting the
IRQ. Hence, dropping 'irq_flags' would actually make sense. 

But this brings the question if anyone was relying on it and not properly
setting the flags in devicetree.


- Nuno Sá