2018-04-19 01:33:41

by Imre Deak

[permalink] [raw]
Subject: Early timeouts due to inaccurate jiffies during system suspend/resume

Hi,

while checking bug [1], I noticed that jiffies based timing loops like

expire = jiffies + timeout + 1;
while (!time_after(jiffies, expire))
do_something;

can last shorter than expected (that is less than timeout). This happens
at least on an Intel Geminilake system when running the timing loop from
a driver's system suspend and resume hooks. To me it looks like expire
above is calculated with a stale jiffies value at the beginning and then
jiffies is updated - corresponding to the actual current time - with a >1
increment in the middle of the loop causing an early expiry.

With the following simplified testcase ran from a driver's suspend/resume
hooks

cpu = raw_smp_processor_id();
cpu_clock_start = cpu_clock(cpu); // 1.
jiffies_start = jiffies; // 2.

usleep_range(200, 200);

jiffies_end = jiffies; // 3.
cpu_clock_end = cpu_clock(cpu); // 4.

jiffies_delta = jiffies_end - jiffies_start;
cpu_clock_delta = cpu_clock_end - cpu_clock_start;

if (jiffies_to_nsecs(jiffies_delta) >
cpu_clock_delta + jiffies_to_nsecs(1))
pr_info("cpu %d jiffies-delta %llu ns (%llu->%llu) cpu_clock-delta %llu ns (%llu -> %llu)\n",
cpu,
jiffies_to_nsecs(jiffies_delta), jiffies_start, jiffies_end,
cpu_clock_delta, cpu_clock_start, cpu_clock_end);



and doing suspend/resume to mem cycles in a loop, I can trigger the
following:

[ 42.415713] cpu 1 jiffies-delta 11000000 ns (4294709700->4294709711) cpu_clock-delta 215738 ns (42415489466 -> 42415705204)

So according to jiffies the delay was 11ms while according to cpu_clock()
it was ~216usec. I have CONFIG_HZ=1000, so AFAIU - due to the ordering of
1.,2.,3.,4. - cpu_clock-delta should be bigger than jiffies-delta
minus 1ms.

Are the above timing loops/assumptions incorrect?

After some ftracing it seems like jiffies gets stale due to a missed
LAPIC timer interrupt after the interrupt is armed in
lapic_next_deadline() and before jiffies is sampled at 2. above.
Eventually the interrupt does get delivered, at which point jiffies gets
updated via tick_do_update_jiffies64() with a >1 ticks increment.
Between lapic_next_deadline() and the - late - delivery of the interrupt
the CPU on which the interrupt is armed doesn't go idle.

Booting with nolapic_timer I couldn't yet trigger the problem.

I'm still trying to do a bisect, without high hopes, since triggering
the problem can take rather long and I suspect this could also be some
HW issue.

Any idea what could go wrong or how to debug this further?

I attached my dmesg.

Thanks,
Imre

[1] https://bugs.freedesktop.org/show_bug.cgi?id=105771


Attachments:
(No filename) (2.77 kB)
dmesg.txt (75.57 kB)
Download all attachments

2018-04-19 11:07:25

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Early timeouts due to inaccurate jiffies during system suspend/resume

On Thu, 19 Apr 2018, Imre Deak wrote:
> Hi,
>
> while checking bug [1], I noticed that jiffies based timing loops like
>
> expire = jiffies + timeout + 1;
> while (!time_after(jiffies, expire))
> do_something;
>
> can last shorter than expected (that is less than timeout).

Yes, that can happen when the timer interrupt is delayed long enough for
whatever reason. If you need accurate timing then you need to use
ktime_get().

> After some ftracing it seems like jiffies gets stale due to a missed
> LAPIC timer interrupt after the interrupt is armed in
> lapic_next_deadline() and before jiffies is sampled at 2. above.
> Eventually the interrupt does get delivered, at which point jiffies gets
> updated via tick_do_update_jiffies64() with a >1 ticks increment.
> Between lapic_next_deadline() and the - late - delivery of the interrupt
> the CPU on which the interrupt is armed doesn't go idle.

That's odd. I have no real explanation for that.

Thanks,

tglx

2018-04-23 17:02:58

by Imre Deak

[permalink] [raw]
Subject: Re: Early timeouts due to inaccurate jiffies during system suspend/resume

On Thu, Apr 19, 2018 at 01:05:39PM +0200, Thomas Gleixner wrote:
> On Thu, 19 Apr 2018, Imre Deak wrote:
> > Hi,
> >
> > while checking bug [1], I noticed that jiffies based timing loops like
> >
> > expire = jiffies + timeout + 1;
> > while (!time_after(jiffies, expire))
> > do_something;
> >
> > can last shorter than expected (that is less than timeout).
>
> Yes, that can happen when the timer interrupt is delayed long enough for
> whatever reason. If you need accurate timing then you need to use
> ktime_get().

Thanks. I always regarded jiffies as non-accurate, but something that
gives a minimum time delay guarantee (when adjusted by +1 as above). I
wonder if there are other callers in kernel that don't expect an early
timeout.

We switched now to using ktime_get_raw() in the i915 driver.

>
> > After some ftracing it seems like jiffies gets stale due to a missed
> > LAPIC timer interrupt after the interrupt is armed in
> > lapic_next_deadline() and before jiffies is sampled at 2. above.
> > Eventually the interrupt does get delivered, at which point jiffies gets
> > updated via tick_do_update_jiffies64() with a >1 ticks increment.
> > Between lapic_next_deadline() and the - late - delivery of the interrupt
> > the CPU on which the interrupt is armed doesn't go idle.
>
> That's odd. I have no real explanation for that.

Looks like the reason is IRQ latency. For reference here are the
longest ones I found with irqsoff ftracing, all running with IRQs disabled
during system resume:

hpet_rtc_interrupt()->hpet_rtc_timer_reinit():
do { ... } while(!hpet_cnt_ahead(...));
takes sometimes up to ~40msec for me.

hpet_rtc_interrupt()->mc146818_get_time():
if (mc146818_is_updating()) mdelay(20);

driver_probe_device->atkbd_connect()->i8042_port_close()->__i8042_command()->i8042_wait_write():
takes sometimes up to ~10msec for me.

All the above paired with asynchronous calling of the drivers' resume
hooks may result in the jumps in jiffies I saw.

--Imre

2018-04-24 14:09:08

by Imre Deak

[permalink] [raw]
Subject: Re: Early timeouts due to inaccurate jiffies during system suspend/resume

On Mon, Apr 23, 2018 at 08:01:28PM +0300, Imre Deak wrote:
> On Thu, Apr 19, 2018 at 01:05:39PM +0200, Thomas Gleixner wrote:
> > On Thu, 19 Apr 2018, Imre Deak wrote:
> > > Hi,
> > >
> > > while checking bug [1], I noticed that jiffies based timing loops like
> > >
> > > expire = jiffies + timeout + 1;
> > > while (!time_after(jiffies, expire))
> > > do_something;
> > >
> > > can last shorter than expected (that is less than timeout).
> >
> > Yes, that can happen when the timer interrupt is delayed long enough for
> > whatever reason. If you need accurate timing then you need to use
> > ktime_get().
>
> Thanks. I always regarded jiffies as non-accurate, but something that
> gives a minimum time delay guarantee (when adjusted by +1 as above). I
> wonder if there are other callers in kernel that don't expect an early
> timeout.

msleep and any other schedule_timeout based waits are also affected. At the
same time for example msleep's documentation says:
"msleep - sleep safely even with waitqueue interruptions".

To me that suggests a wait with a minimum guaranteed delay.

Ville had an idea to make the behavior more deterministic by clamping
the jiffies increment to 1 for each timer interrupt. Would that work?

>
> We switched now to using ktime_get_raw() in the i915 driver.
>
> >
> > > After some ftracing it seems like jiffies gets stale due to a missed
> > > LAPIC timer interrupt after the interrupt is armed in
> > > lapic_next_deadline() and before jiffies is sampled at 2. above.
> > > Eventually the interrupt does get delivered, at which point jiffies gets
> > > updated via tick_do_update_jiffies64() with a >1 ticks increment.
> > > Between lapic_next_deadline() and the - late - delivery of the interrupt
> > > the CPU on which the interrupt is armed doesn't go idle.
> >
> > That's odd. I have no real explanation for that.
>
> Looks like the reason is IRQ latency. For reference here are the
> longest ones I found with irqsoff ftracing, all running with IRQs disabled
> during system resume:
>
> hpet_rtc_interrupt()->hpet_rtc_timer_reinit():
> do { ... } while(!hpet_cnt_ahead(...));
> takes sometimes up to ~40msec for me.
>
> hpet_rtc_interrupt()->mc146818_get_time():
> if (mc146818_is_updating()) mdelay(20);
>
> driver_probe_device->atkbd_connect()->i8042_port_close()->__i8042_command()->i8042_wait_write():
> takes sometimes up to ~10msec for me.
>
> All the above paired with asynchronous calling of the drivers' resume
> hooks may result in the jumps in jiffies I saw.
>
> --Imre

2018-04-24 14:23:58

by Ville Syrjälä

[permalink] [raw]
Subject: Re: Early timeouts due to inaccurate jiffies during system suspend/resume

On Tue, Apr 24, 2018 at 05:07:41PM +0300, Imre Deak wrote:
> On Mon, Apr 23, 2018 at 08:01:28PM +0300, Imre Deak wrote:
> > On Thu, Apr 19, 2018 at 01:05:39PM +0200, Thomas Gleixner wrote:
> > > On Thu, 19 Apr 2018, Imre Deak wrote:
> > > > Hi,
> > > >
> > > > while checking bug [1], I noticed that jiffies based timing loops like
> > > >
> > > > expire = jiffies + timeout + 1;
> > > > while (!time_after(jiffies, expire))
> > > > do_something;
> > > >
> > > > can last shorter than expected (that is less than timeout).
> > >
> > > Yes, that can happen when the timer interrupt is delayed long enough for
> > > whatever reason. If you need accurate timing then you need to use
> > > ktime_get().
> >
> > Thanks. I always regarded jiffies as non-accurate, but something that
> > gives a minimum time delay guarantee (when adjusted by +1 as above). I
> > wonder if there are other callers in kernel that don't expect an early
> > timeout.
>
> msleep and any other schedule_timeout based waits are also affected. At the
> same time for example msleep's documentation says:
> "msleep - sleep safely even with waitqueue interruptions".
>
> To me that suggests a wait with a minimum guaranteed delay.
>
> Ville had an idea to make the behavior more deterministic by clamping
> the jiffies increment to 1 for each timer interrupt. Would that work?

Another observation is that this is basically the same problem
that we had with the drm vblank counter. I solved that by introducing
drm_accurate_vblank_count() which makes sure the counter is up to
date before sampling it. Then we can safely do stuff like:

count = drm_accurate_vblank_count();
while (drm_vblank_count() == count)
...;

As long as we don't lose all vblank interrupts that will work and never
complete prematurely. And we still allow the vblank counter to increment
by >1.

I suppose doing something similar for jiffies would be nice as well,
but I'm not sure how feasible that would be. At the very least it
would involve patching a lot of code.

>
> >
> > We switched now to using ktime_get_raw() in the i915 driver.
> >
> > >
> > > > After some ftracing it seems like jiffies gets stale due to a missed
> > > > LAPIC timer interrupt after the interrupt is armed in
> > > > lapic_next_deadline() and before jiffies is sampled at 2. above.
> > > > Eventually the interrupt does get delivered, at which point jiffies gets
> > > > updated via tick_do_update_jiffies64() with a >1 ticks increment.
> > > > Between lapic_next_deadline() and the - late - delivery of the interrupt
> > > > the CPU on which the interrupt is armed doesn't go idle.
> > >
> > > That's odd. I have no real explanation for that.
> >
> > Looks like the reason is IRQ latency. For reference here are the
> > longest ones I found with irqsoff ftracing, all running with IRQs disabled
> > during system resume:
> >
> > hpet_rtc_interrupt()->hpet_rtc_timer_reinit():
> > do { ... } while(!hpet_cnt_ahead(...));
> > takes sometimes up to ~40msec for me.
> >
> > hpet_rtc_interrupt()->mc146818_get_time():
> > if (mc146818_is_updating()) mdelay(20);
> >
> > driver_probe_device->atkbd_connect()->i8042_port_close()->__i8042_command()->i8042_wait_write():
> > takes sometimes up to ~10msec for me.
> >
> > All the above paired with asynchronous calling of the drivers' resume
> > hooks may result in the jumps in jiffies I saw.
> >
> > --Imre

--
Ville Syrj?l?
Intel

2018-04-26 21:42:46

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Early timeouts due to inaccurate jiffies during system suspend/resume

On Tue, 24 Apr 2018, Imre Deak wrote:
> On Mon, Apr 23, 2018 at 08:01:28PM +0300, Imre Deak wrote:
> > On Thu, Apr 19, 2018 at 01:05:39PM +0200, Thomas Gleixner wrote:
> > > On Thu, 19 Apr 2018, Imre Deak wrote:
> > > > Hi,
> > > >
> > > > while checking bug [1], I noticed that jiffies based timing loops like
> > > >
> > > > expire = jiffies + timeout + 1;
> > > > while (!time_after(jiffies, expire))
> > > > do_something;
> > > >
> > > > can last shorter than expected (that is less than timeout).
> > >
> > > Yes, that can happen when the timer interrupt is delayed long enough for
> > > whatever reason. If you need accurate timing then you need to use
> > > ktime_get().
> >
> > Thanks. I always regarded jiffies as non-accurate, but something that
> > gives a minimum time delay guarantee (when adjusted by +1 as above). I
> > wonder if there are other callers in kernel that don't expect an early
> > timeout.
>
> msleep and any other schedule_timeout based waits are also affected. At the
> same time for example msleep's documentation says:
> "msleep - sleep safely even with waitqueue interruptions".
>
> To me that suggests a wait with a minimum guaranteed delay.

Kinda :) The problem with jiffies is that it's a software maintained
counter which depends on interrupt delivery. Contrary to hardware based
counters which just work (most of the time at least).

> Ville had an idea to make the behavior more deterministic by clamping
> the jiffies increment to 1 for each timer interrupt. Would that work?

In theory, but there is the problem with NOHZ. NOHZ idle allows the CPU to
sleep for more than 1 jiffie in order to safe power by not waking up just
to increment jiffies and go back to sleep. So we need to push jiffies
forward when the system was completely idle for some time. We already make
sure that jiffies are updated on interrupt entry from idle before any code
relying on them is run.

Now for the weird case where interrupts get delayed awfully long, the right
answer is to break these long interrupt disabled sections. Anything which
holds interrupts disabled longer than a couple of microseconds is broken.

Thanks,

tglx