2020-05-27 15:37:17

by Frieder Schrempf

Subject: High interrupt latency with low power idle mode on i.MX6

Hi,

On our i.MX6UL/ULL boards running mainline kernels, we see RS485
collisions on the bus. They are caused by the RTS signal being reset too
late after each transmission: the TXDC interrupt takes several
milliseconds to trigger, and by then the slave on the bus has already
started to send its reply.

We found out that these delays only happen when the CPU is in "low power
idle" mode (ARM power off). When we disable cpuidle state 2 or put some
background load on the CPU, everything works fine and the delays are gone.

echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state2/disable
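
Before disabling a state, it can help to check which idle states a board
exposes and what exit latency each one advertises. A minimal sketch,
assuming the standard cpuidle sysfs layout (a name, latency and disable
file per state); the function name list_idle_states and its optional
directory argument are made up for illustration:

```shell
#!/bin/sh
# Print every cpuidle state of one CPU with its advertised exit latency
# (microseconds) and its current disable flag.
# $1 (optional, hypothetical): cpuidle directory to inspect; defaults to cpu0.
list_idle_states() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for state in "$dir"/state*; do
        # Skip the unexpanded glob when no states are present.
        [ -d "$state" ] || continue
        printf '%s name=%s exit_latency_us=%s disabled=%s\n' \
            "$(basename "$state")" \
            "$(cat "$state/name")" \
            "$(cat "$state/latency")" \
            "$(cat "$state/disable")"
    done
}

list_idle_states
```

Reading these files needs no special privileges; only writing to disable
does.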

Other interfaces (I2C, etc.) might also be affected by these increased
latencies, but we haven't investigated this more closely.

We currently carry a kernel patch that disables low power idle mode by
default, but I'm wondering whether there's a way to fix this properly.
Any ideas?

Thanks,
Frieder


2020-05-27 15:51:57

by Frieder Schrempf

Subject: Re: High interrupt latency with low power idle mode on i.MX6

On 27.05.20 13:53, Russell King - ARM Linux admin wrote:
> On Wed, May 27, 2020 at 10:39:12AM +0000, Schrempf Frieder wrote:
>> Hi,
>>
>> on our i.MX6UL/ULL boards running mainline kernels, we see an issue with
>> RS485 collisions on the bus. These are caused by the resetting of the
>> RTS signal being delayed after each transmission. The TXDC interrupt
>> takes several milliseconds to trigger and the slave on the bus already
>> starts to send a reply in the meantime.
>>
>> We found out that these delays only happen when the CPU is in "low power
>> idle" mode (ARM power off). When we disable cpuidle state 2 or put some
>> background load on the CPU everything works fine and the delays are gone.
>>
>> echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state2/disable
>>
>> It seems like also other interfaces (I2C, etc.) might be affected by
>> these increased latencies, we haven't investigated this more closely,
>> though.
>>
>> We currently apply a patch to our kernel, that disables low power idle
>> mode by default, but I'm wondering if there's a way to fix this
>> properly? Any ideas?
>
> Let's examine a basic fact about power management:
>
> The deeper PM modes that the system enters, the higher the latency to
> resume operation.
>
> So, I'm not surprised that you have higher latency when you allow the
> system to enter lower power modes. Does that mean that the kernel
> should not permit entering lower power modes - no, it's policy and
> application dependent.
>
> If the hardware is designed to use software to manage the RTS signal
> to control the RS485 receiver, then I'm afraid that your report really
> does not surprise me - throwing that at software to manage is a really
> stupid idea, but it seems lots of people do this. I've held this view
> since I worked on a safety critical system that used RS485 back in the
> 1990s (London Underground Jubilee Line Extension public address system.)
>
> So, what we have here is several things that come together to create a
> problem:
>
> 1) higher power savings produce higher latency to resume from
> 2) lack of hardware support for RS485 half duplex communication needing
> software support
> 3) an application that makes use of RS485 half duplex communication
> without disabling the higher latency power saving modes
>
> The question is, who should disable those higher latency power saving
> modes - the kernel, or userspace?
>
> The kernel knows whether it needs to provide software control of the
> RTS signal or not, but the kernel does not know the maximum permissible
> latency (which is application specific.) So, the kernel doesn't have
> all the information it needs. However, there is a QoS subsystem which
> may help you.
>
> There's also tweaks available via
> /sys/devices/system/cpu/cpu*/power/pm_qos_resume_latency_us
>
> which can be poked to configure the latency that is required, and will
> prevent the deeper PM states being entered.

Thanks for the detailed explanation. This all makes perfect sense to me.
I will keep in mind that we need to consider this aspect of power saving
vs. latency when designing systems and also that we need to provide the
information for the kernel to decide which of the two is more important.

Also thanks for pointing out the QoS subsystem. I'm not quite sure if it
would work for us to use pm_qos_resume_latency_us in our specific case.
The actual latency we observe is something like 2 to 3 milliseconds
longer with low power idle than without, but the exit_latency for low
power idle specified in the cpuidle driver is only 300 us.

So as far as I can see, even if we set pm_qos_resume_latency_us to
1000 us (which should be fast enough for RS485 to work properly), low
power idle wouldn't be disabled, because its advertised exit latency of
300 us is still below that limit.

It's rather this discrepancy between the latency set in the driver and
what we see in reality which makes me wonder if there's something I'm
missing.
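
The comparison above can be sketched as a small script. It assumes that
a state stays eligible whenever its advertised exit latency does not
exceed the QoS limit, which is how I understand the cpuidle governors to
apply the constraint; the function name states_allowed_under and its
optional directory argument are hypothetical:

```shell
#!/bin/sh
# For a given resume-latency limit (microseconds), report which cpuidle
# states would still be eligible based on their *advertised* exit latency.
# Assumption: a state is eligible if exit_latency <= limit.
# $1: limit in microseconds
# $2 (optional, hypothetical): cpuidle directory; defaults to cpu0.
states_allowed_under() {
    limit_us="$1"
    dir="${2:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for state in "$dir"/state*; do
        [ -r "$state/latency" ] || continue
        lat=$(cat "$state/latency")
        if [ "$lat" -le "$limit_us" ]; then
            verdict=allowed
        else
            verdict=excluded
        fi
        printf '%s exit_latency_us=%s %s\n' \
            "$(basename "$state")" "$lat" "$verdict"
    done
}

states_allowed_under 1000
```

With an advertised exit latency of 300 us, a limit of 1000 us leaves the
state allowed; under this assumption only a limit below 300 us would
exclude it, regardless of the latency actually observed.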

2020-05-27 15:52:53

by Russell King (Oracle)

Subject: Re: High interrupt latency with low power idle mode on i.MX6

On Wed, May 27, 2020 at 12:50:01PM +0000, Schrempf Frieder wrote:
> On 27.05.20 13:53, Russell King - ARM Linux admin wrote:
> > On Wed, May 27, 2020 at 10:39:12AM +0000, Schrempf Frieder wrote:
> >> Hi,
> >>
> >> on our i.MX6UL/ULL boards running mainline kernels, we see an issue with
> >> RS485 collisions on the bus. These are caused by the resetting of the
> >> RTS signal being delayed after each transmission. The TXDC interrupt
> >> takes several milliseconds to trigger and the slave on the bus already
> >> starts to send a reply in the meantime.
> >>
> >> We found out that these delays only happen when the CPU is in "low power
> >> idle" mode (ARM power off). When we disable cpuidle state 2 or put some
> >> background load on the CPU everything works fine and the delays are gone.
> >>
> >> echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state2/disable
> >>
> >> It seems like also other interfaces (I2C, etc.) might be affected by
> >> these increased latencies, we haven't investigated this more closely,
> >> though.
> >>
> >> We currently apply a patch to our kernel, that disables low power idle
> >> mode by default, but I'm wondering if there's a way to fix this
> >> properly? Any ideas?
> >
> > Let's examine a basic fact about power management:
> >
> > The deeper PM modes that the system enters, the higher the latency to
> > resume operation.
> >
> > So, I'm not surprised that you have higher latency when you allow the
> > system to enter lower power modes. Does that mean that the kernel
> > should not permit entering lower power modes - no, it's policy and
> > application dependent.
> >
> > If the hardware is designed to use software to manage the RTS signal
> > to control the RS485 receiver, then I'm afraid that your report really
> > does not surprise me - throwing that at software to manage is a really
> > stupid idea, but it seems lots of people do this. I've held this view
> > since I worked on a safety critical system that used RS485 back in the
> > 1990s (London Underground Jubilee Line Extension public address system.)
> >
> > So, what we have here is several things that come together to create a
> > problem:
> >
> > 1) higher power savings produce higher latency to resume from
> > 2) lack of hardware support for RS485 half duplex communication needing
> > software support
> > 3) an application that makes use of RS485 half duplex communication
> > without disabling the higher latency power saving modes
> >
> > The question is, who should disable those higher latency power saving
> > modes - the kernel, or userspace?
> >
> > The kernel knows whether it needs to provide software control of the
> > RTS signal or not, but the kernel does not know the maximum permissible
> > latency (which is application specific.) So, the kernel doesn't have
> > all the information it needs. However, there is a QoS subsystem which
> > may help you.
> >
> > There's also tweaks available via
> > /sys/devices/system/cpu/cpu*/power/pm_qos_resume_latency_us
> >
> > which can be poked to configure the latency that is required, and will
> > prevent the deeper PM states being entered.
>
> Thanks for the detailed explanation. This all makes perfect sense to me.
> I will keep in mind that we need to consider this aspect of power saving
> vs. latency when designing systems and also that we need to provide the
> information for the kernel to decide which of the two is more important.
>
> Also thanks for pointing out the QoS subsystem. I'm not quite sure if it
> would work for us to use pm_qos_resume_latency_us in our specific case.
> The actual latency we observe is something like 2 to 3 milliseconds
> longer with low power idle than without, but the exit_latency for low
> power idle specified in the cpuidle driver is only 300 us.

I wonder whether the exit latencies are correct in that case.
From the comments, it seems 80us is allowed for the software overhead
of entering/leaving the idle state vs 220us for the hardware.
It may be a good idea for someone to add some tracing points in there
to try and measure the minimum software latencies.
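
Short of adding new trace points to the idle code, the existing
power:cpu_idle and irq:irq_handler_entry trace events may already give a
rough picture of how long after an idle exit the interrupt handler runs.
This is not the instrumentation suggested above, only a cheaper first
look. A sketch, assuming tracefs is mounted at /sys/kernel/tracing
(older setups use /sys/kernel/debug/tracing); the function name and its
optional root argument are hypothetical:

```shell
#!/bin/sh
# Enable two existing trace events so that idle-state transitions and
# interrupt handler entries show up in the same trace buffer.
# $1 (optional, hypothetical): tracefs root; defaults to /sys/kernel/tracing.
enable_idle_irq_events() {
    root="${1:-/sys/kernel/tracing}"
    for ev in power/cpu_idle irq/irq_handler_entry; do
        f="$root/events/$ev/enable"
        # Skip events that are missing or not writable (e.g. not root).
        [ -w "$f" ] || { echo "skipping $ev (not available)" >&2; continue; }
        echo 1 > "$f"
    done
}

# Needs root. Uncomment to use, reproduce the problem, then read
# "$root/trace" and compare the timestamps of the two event types.
# enable_idle_irq_events
```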

> So as far as I can see with this difference even if we would set
> pm_qos_resume_latency_us to 1000 us (which should be fast enough for the
> RS485 to work properly), the low power idle wouldn't be disabled.
>
> It's rather this discrepancy between the latency set in the driver and
> what we see in reality which makes me wonder if there's something I'm
> missing.

It's possible that there's something missing from the kernel's
estimation of the latency required for entering / exiting those
states.

There is an amount of cache flushing that is required when entering
those lower states, and I wonder if that has been accounted for.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-27 17:25:29

by Russell King (Oracle)

Subject: Re: High interrupt latency with low power idle mode on i.MX6

On Wed, May 27, 2020 at 10:39:12AM +0000, Schrempf Frieder wrote:
> Hi,
>
> on our i.MX6UL/ULL boards running mainline kernels, we see an issue with
> RS485 collisions on the bus. These are caused by the resetting of the
> RTS signal being delayed after each transmission. The TXDC interrupt
> takes several milliseconds to trigger and the slave on the bus already
> starts to send a reply in the meantime.
>
> We found out that these delays only happen when the CPU is in "low power
> idle" mode (ARM power off). When we disable cpuidle state 2 or put some
> background load on the CPU everything works fine and the delays are gone.
>
> echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state2/disable
>
> It seems like also other interfaces (I2C, etc.) might be affected by
> these increased latencies, we haven't investigated this more closely,
> though.
>
> We currently apply a patch to our kernel, that disables low power idle
> mode by default, but I'm wondering if there's a way to fix this
> properly? Any ideas?

Let's examine a basic fact about power management:

The deeper the PM modes that the system enters, the higher the latency
to resume operation.

So, I'm not surprised that you have higher latency when you allow the
system to enter lower power modes. Does that mean that the kernel
should not permit entering lower power modes? No, it's policy and
application dependent.

If the hardware is designed to use software to manage the RTS signal
to control the RS485 receiver, then I'm afraid that your report really
does not surprise me - throwing that at software to manage is a really
stupid idea, but it seems lots of people do this. I've held this view
since I worked on a safety critical system that used RS485 back in the
1990s (London Underground Jubilee Line Extension public address system.)

So, what we have here is several things that come together to create a
problem:

1) higher power savings produce higher latency to resume from
2) lack of hardware support for RS485 half duplex communication needing
software support
3) an application that makes use of RS485 half duplex communication
without disabling the higher latency power saving modes

The question is, who should disable those higher latency power saving
modes - the kernel, or userspace?

The kernel knows whether it needs to provide software control of the
RTS signal or not, but the kernel does not know the maximum permissible
latency (which is application specific.) So, the kernel doesn't have
all the information it needs. However, there is a QoS subsystem which
may help you.

There are also tweaks available via
/sys/devices/system/cpu/cpu*/power/pm_qos_resume_latency_us

which can be poked to configure the latency that is required, and will
prevent the deeper PM states being entered.
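
As a rough illustration of poking that attribute, the sketch below
writes one limit to every CPU. The function name set_resume_latency and
its optional root argument are hypothetical, and the value to write is
application specific. As I read the sysfs ABI documentation, 0 means
"no constraint" and "n/a" means no resume latency is acceptable at all.

```shell
#!/bin/sh
# Write the same resume-latency limit (microseconds) to every CPU's
# pm_qos_resume_latency_us attribute.
# $1: limit in microseconds
# $2 (optional, hypothetical): CPU sysfs root; defaults to the real one.
set_resume_latency() {
    limit_us="$1"
    root="${2:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/power/pm_qos_resume_latency_us; do
        # Skip CPUs without the attribute or without write permission.
        [ -w "$f" ] || continue
        echo "$limit_us" > "$f"
    done
}

# Needs root. The number below is only a placeholder:
# set_resume_latency 200
```

The setting does not persist across reboots, so it would have to be
applied from an init script or by the application itself.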

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up