2018-07-02 09:47:48

by Dongjiu Geng

Subject: hrtimer become inaccurate with RT patch

Hi Thomas/Anna/John,

Recently I found that hrtimers become inaccurate when an RT process
runs on the same CPU core and the kernel has the preempt_rt patch
applied.
The Linux kernel version is v4.1.46, and the preempt_rt patch is
patch-4.1.46-rt52.patch.
I know that in the preempt_rt environment the interrupt handlers no
longer run in interrupt context but in process context, so that RT
processes will not be interrupted. But if the hrtimer also runs in
process context, the timer is useless when it is inaccurate. So I want
to ask whether this is expected behavior, and whether it is reasonable
to move the timer IRQ handling to a thread.

Checking patch-4.1.46-rt52.patch, I found the following modification in
the function 'hrtimer_interrupt':

@@ -1296,7 +1539,10 @@ retry:
 			if (basenow.tv64 < hrtimer_get_softexpires_tv64(timer))
 				break;

-			__run_hrtimer(timer, &basenow);
+			if (!hrtimer_rt_defer(timer))
+				__run_hrtimer(timer, &basenow);
+			else
+				raise = 1;
 		}
 	}
 	/* Reevaluate the clock bases for the next expiry */

@@ -1357,6 +1603,9 @@ retry:
 	tick_program_event(expires_next, 1);
 	printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
 		    ktime_to_ns(delta));
+out:
+	if (raise)
+		raise_softirq_irqoff(HRTIMER_SOFTIRQ);
 }

I think this is why the hrtimer runs in a thread. I tried setting
hrtimer.irqsafe to 1, but the timer still did not seem right. Could
anyone give some advice? Thanks.



Subject: Re: hrtimer become inaccurate with RT patch

On 2018-07-02 17:34:34 [+0800], gengdongjiu wrote:
> The Linux kernel version is v4.1.46, and the preempt_rt patch is
> patch-4.1.46-rt52.patch.

The 4.1 series is no longer supported (neither RT-wise nor non-RT,
https://www.kernel.org/category/releases.html). I suggest moving away
from it. If you are only noticing this problem now, it is hardly a
long-running project.

> process will not be interrupt. But if the hrtimer is also runs in
> process context the timer is useless when it's inaccurate. so I want to
> consult you whether this is expected behavior? whether is reasonable to move the timer IRQ
> handling to a thread?

This depends on your expectations. The timer is defined not to fire
before the programmed time. So it fires as soon as possible _after_ the
programmed time.

> I think this is why hrtimer is run as a thread, I tried to set
> hrtimer.irqsafe to 1, but the timer still seemed not right. Could anyone
> give some advise? Thanks.

By setting irqsafe to 1 you ensure that the timer will fire from the
timer interrupt, and before doing so you should ensure that the timer is
indeed IRQ safe.
Depending on what you do it is possible that the timer fires early but
the application notices it later (the scheduler will first handle RT
tasks according to their priorities followed by non RT tasks).

Sebastian
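
Sebastian's two points above (a timer never fires before its programmed
time, and the application may notice the expiry later than it happened)
can be checked from user space. The following is a minimal sketch, not
code from this thread: it sleeps until an absolute CLOCK_MONOTONIC
deadline with clock_nanosleep() and reports how late the task woke up.
The helper names timespec_diff_ns and measure_lateness_ns are
hypothetical. The measured value includes scheduler wakeup delay, so it
is an upper bound on the timer's own lateness rather than a direct
measurement of it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Nanoseconds elapsed from *a to *b (negative if b is before a). */
static int64_t timespec_diff_ns(const struct timespec *a,
				const struct timespec *b)
{
	return (int64_t)(b->tv_sec - a->tv_sec) * NSEC_PER_SEC +
	       (int64_t)(b->tv_nsec - a->tv_nsec);
}

/*
 * Sleep until an absolute CLOCK_MONOTONIC deadline delay_ns from now and
 * return how many nanoseconds after the deadline the caller was running
 * again. Returns -1 if one of the clock calls fails.
 */
static int64_t measure_lateness_ns(int64_t delay_ns)
{
	struct timespec deadline, now;
	int rc;

	if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
		return -1;

	deadline.tv_sec += delay_ns / NSEC_PER_SEC;
	deadline.tv_nsec += delay_ns % NSEC_PER_SEC;
	if (deadline.tv_nsec >= NSEC_PER_SEC) {
		deadline.tv_sec++;
		deadline.tv_nsec -= NSEC_PER_SEC;
	}

	/* clock_nanosleep() returns the error number directly. */
	do {
		rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
				     &deadline, NULL);
	} while (rc == EINTR);
	if (rc != 0)
		return -1;

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;

	return timespec_diff_ns(&deadline, &now);
}
```

Calling measure_lateness_ns(2000000000LL) while the RT load is running
on the same CPU would show whether a 2 s sleep really returns seconds
late. Pinning the measuring task to that CPU is left out of the sketch.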

2018-07-02 11:49:55

by Dongjiu Geng

Subject: Re: hrtimer become inaccurate with RT patch

Hi Sebastian,
Thanks for the answer.

On 2018/7/2 18:14, Sebastian Andrzej Siewior wrote:
> On 2018-07-02 17:34:34 [+0800], gengdongjiu wrote:
>> The Linux kernel version is v4.1.46, and the preempt_rt patch is
>> patch-4.1.46-rt52.patch.
>
> the 4.1 series is no longer supported (neither RT wise nor non-RT,
> https://www.kernel.org/category/releases.html). I suggest to move away.
> If you notice this problem now it is hardly a long running project.
Yes, I know, but we found that the latest RT 4.14 series has the same
problem, so this is a common issue.

>
>> process will not be interrupt. But if the hrtimer is also runs in
>> process context the timer is useless when it's inaccurate. so I want to
>> consult you whether this is expected behavior? whether is reasonable to move the timer IRQ
>> handling to a thread?
>
> This depends on your expectations. The timer is defined not to fire
> before the programmed time. So it fires as soon as possible _after_ the
> programmed time.
It is reasonable that the timer is defined not to fire before the
programmed time, but we found that it fires long _after_ the programmed
time. For example, we program it to fire after 2s, but it fires after
5s, which is much later than expected. I think the reason may be that
the timer handler thread is preempted by another higher-priority thread.
So for this issue, the timer handler should run in IRQ context instead
of process context, or the timer handler thread priority should be
increased, right?

>
>> I think this is why hrtimer is run as a thread, I tried to set
>> hrtimer.irqsafe to 1, but the timer still seemed not right. Could anyone
>> give some advise? Thanks.
>
> By setting irqsafe to 1 you ensure that the timer will fire from the
> timer interrupt and before doing so you should ensure that the timer is
> indeed IRQ safe.
> Depending on what you do it is possible that the timer fires early but
> the application notices it later (the scheduler will first handle RT
> tasks according to their priorities followed by non RT tasks).

(I will continue to check these comments, thanks)

>
> Sebastian
>
> .
>


2018-07-02 19:26:58

by John Stultz

Subject: Re: hrtimer become inaccurate with RT patch

On Mon, Jul 2, 2018 at 2:34 AM, gengdongjiu wrote:
> Hi Thomas/Anna/John,
>
> Recently I found that the hrtimer become inaccurate when there is a RT
> process runs on the same cpu core, and the kernel has applied preempt_rt
> patch.
> The Linux kernel version is v4.1.46, and the preempt_rt patch is
> patch-4.1.46-rt52.patch.
> I know that in the preempt_rt environment the interrupt handlers no
> longer run in interrupt context but in process context, so that RT
> process will not be interrupt. But if the hrtimer is also runs in
> process context the timer is useless when it's inaccurate. so I want to
> consult you whether this is expected behavior? whether is reasonable to move the timer IRQ
> handling to a thread?

I've not looked at the PREEMPT_RT code in a long time, but years ago
there was a tension in that there is not an easy way to track ownership
of timers. Thus timers all fired at the same priority, that of the
hrtimer irq thread. This thread could be moved up or down in priority,
but the problem was all timers would fire with the same priority. So
either the thread priority was so high that a low-priority process
could generate a bunch of timers which would interrupt higher-priority
tasks, or the thread priority was lower, so a high-priority task could
block all timers.

There was some handwavy talk of trying to keep per-process timer
lists, so the hrtimer irq could still be in irq context but its firing
logic wouldn't do anything but mark the task as runnable, and the
actual timer firing logic would be done before we eventually run the
task (in proper rt priority order), in a fashion similar to signals.
But I'm not sure if any attempts were made in that direction. I also
think it was an open question if there's any logic in the kernel that
depends on strict in-order kernel timer processing, so it's possible
there could be odd inversion issues where high-priority timer logic is
waiting on/expecting lower-priority timers to fire, etc, so it's
probably an area of research.

thanks
-john
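
The trade-off John describes comes down to comparing the scheduling
class and priority of the thread that runs the deferred timer callbacks
with those of the RT task. A minimal sketch of that comparison, using
the POSIX sched_getscheduler() and sched_getparam() calls. struct
task_sched and both helper functions are hypothetical names, and
finding the PID of the timer softirq thread on a given kernel is left
to the reader:

```c
#include <sched.h>
#include <sys/types.h>

/* Scheduling policy and static RT priority of one task. */
struct task_sched {
	int policy;	/* SCHED_OTHER, SCHED_FIFO, SCHED_RR, ... */
	int priority;	/* 0 for non-RT policies */
};

/* Fill *out for the given pid (0 means the calling task). 0 on success. */
static int get_task_sched(pid_t pid, struct task_sched *out)
{
	struct sched_param sp;
	int policy = sched_getscheduler(pid);

	if (policy < 0)
		return -1;
	if (sched_getparam(pid, &sp) != 0)
		return -1;

	out->policy = policy;
	out->priority = sp.sched_priority;
	return 0;
}

static int is_rt_policy(int policy)
{
	return policy == SCHED_FIFO || policy == SCHED_RR;
}

/*
 * Return 1 if 'task' always wins the CPU over 'other': it is an RT task
 * and 'other' is either non-RT or has a strictly lower RT priority.
 * An equal-priority SCHED_FIFO task that is already running can also
 * delay 'other'; that case is not covered by this check.
 */
static int rt_task_outranks(const struct task_sched *task,
			    const struct task_sched *other)
{
	if (!is_rt_policy(task->policy))
		return 0;
	if (!is_rt_policy(other->policy))
		return 1;
	return task->priority > other->priority;
}
```

If rt_task_outranks() returns 1 for the RT application against the
timer thread, a busy RT application on the same CPU can hold off every
deferred timer for as long as it keeps running.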

Subject: Re: hrtimer become inaccurate with RT patch

On 2018-07-02 19:19:07 [+0800], gengdongjiu wrote:
> Hi Sebastian ,
Hi gengdongjiu,

> > the 4.1 series is no longer supported (neither RT wise nor non-RT,
> > https://www.kernel.org/category/releases.html). I suggest to move away.
> > If you notice this problem now it is hardly a long running project.
> yes, I Know, but we found the latest RT 4.14 series also has the same problem,
> so this is common issue.
This does not change what I wrote regarding the v4.1 series. Also you
could have mentioned v4.14 instead of v4.1 if you really tested on v4.14.

> >> process will not be interrupt. But if the hrtimer is also runs in
> >> process context the timer is useless when it's inaccurate. so I want to
> >> consult you whether this is expected behavior? whether is reasonable to move the timer IRQ
> >> handling to a thread?
> >
> > This depends on your expectations. The timer is defined not to fire
> > before the programmed time. So it fires as soon as possible _after_ the
> > programmed time.
> It is reasonable that the timer is defined not to fire before the programmed time.
> but we found it fires long _after_ the programmed time. For example, we define it to
> fire after 2s, but it will fire after 5s, so it is very later than the expectations.

Under normal circumstances I would expect a delay of a few µs due to
the wakeup of the softirq thread, not seconds. This is either broken HW
or a long-running RT thread which blocks the expected execution.

> I think the reason may be
> that the timer handler thread is preempted by another higher priority thread. so from for this issue,
> the timer handler should be in IRQ context instead of the process context or increase the timer handler thread priority, right?

Speculating on what is going on and acting based on speculation is one
way to handle the situation. You could also enable tracing to see
- when the timer fires
- when the thread wakes up
- when the timer's function starts / completes

and then you know what *really* causes the delay. The hrtimer and sched
tracepoints should provide enough information. Based on that you can
figure out if it is wise to toggle the irqsafe flag or change something
else so that the system does not run RT tasks for ~3 seconds without a
break.

Sebastian
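
The tracing suggested above could be switched on with a small helper
like the one below. This is a sketch under assumptions, not something
posted in the thread: the function names are hypothetical, the tracefs
mount point is passed in because it differs between systems (commonly
/sys/kernel/tracing or /sys/kernel/debug/tracing), and the event list is
one plausible selection of hrtimer and sched tracepoints. Writing to
these files normally requires root.

```c
#include <stdio.h>

/* Tracepoints covering timer start/expiry and thread wakeup/switch. */
static const char *const latency_events[] = {
	"timer/hrtimer_start",
	"timer/hrtimer_expire_entry",
	"timer/hrtimer_expire_exit",
	"sched/sched_wakeup",
	"sched/sched_switch",
};

#define NUM_LATENCY_EVENTS \
	(sizeof(latency_events) / sizeof(latency_events[0]))

/*
 * Write "1" or "0" to <tracefs_root>/events/<event>/enable.
 * Returns 0 on success, -1 on any failure.
 */
static int set_trace_event(const char *tracefs_root, const char *event,
			   int on)
{
	char path[512];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/events/%s/enable",
		     tracefs_root, event);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(on ? "1" : "0", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/* Enable every event in latency_events; returns how many failed. */
static int enable_latency_events(const char *tracefs_root)
{
	int failed = 0;
	size_t i;

	for (i = 0; i < NUM_LATENCY_EVENTS; i++)
		if (set_trace_event(tracefs_root, latency_events[i], 1) != 0)
			failed++;

	return failed;
}
```

After enable_latency_events() succeeds, the "trace" file in the same
tracefs directory holds the recorded events, and the gap between
hrtimer_expire_entry and the following sched_wakeup or sched_switch
shows where the delay accumulates.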

2018-07-02 20:15:06

by Thomas Gleixner

Subject: Re: hrtimer become inaccurate with RT patch

On Mon, 2 Jul 2018, John Stultz wrote:
> On Mon, Jul 2, 2018 at 2:34 AM, gengdongjiu wrote:
> > Hi Thomas/Anna/John,
> >
> > Recently I found that the hrtimer become inaccurate when there is a RT
> > process runs on the same cpu core, and the kernel has applied preempt_rt
> > patch.
> > The Linux kernel version is v4.1.46, and the preempt_rt patch is
> > patch-4.1.46-rt52.patch.
> > I know that in the preempt_rt environment the interrupt handlers no
> > longer run in interrupt context but in process context, so that RT
> > process will not be interrupt. But if the hrtimer is also runs in
> > process context the timer is useless when it's inaccurate. so I want to
> > consult you whether this is expected behavior? whether is reasonable to
> > move the timer IRQ handling to a thread?
>
> I've not looked at the PREEMPT_RT code in a long time, but years ago
> there was a tension in that there is not an easy way track ownership
> of timers. Thus timers all fired at the same priority of the hrtimer
> irq thread. This thread could be moved up or down in priority, but the
> problem was all timers would fire with the same priority. So either
> the thread priority was so high that low-priority process could
> generate a bunch of timers which would interrupt higher priority
> tasks, or the thread priority was lower, so a high priority task could
> block all timers.

Yeah, we had that long ago. It was complex and nasty.

> There was some handwavy talk of trying to keep per-process timer
> lists, so the hrtimer irq could still be in irq context but the firing
> logic it didn't do anything but mark its task as runnable and do the
> actual timer firing logic before we eventually run the task (in
> proper rt priority order), in a fashion similar to signals. But I'm
> not sure if any attempts were made in that direction. I also think it
> was an open question if there's any logic in kernel that depend on
> strict in-order kernel timer processing, so its possible there could
> be odd inversion issues where high priority timer logic is waiting on
> /expecting lower priority timers to fire, etc, so its probably an area
> of research.

Well, the main issues are actually the signal based posix-timers. The
problem is not to keep track of them, the problem is which of the threads
to wake up for which timer. I've had experimental code which kinda worked,
but ran into issues with the signal masking and the horrors of sighand
lock. Definitely more research required for that. It might have become
simpler, but still sighand lock cannot be taken from hard interrupt context
on RT.

Thanks,

tglx

2018-07-06 14:23:27

by Daniel Wagner

Subject: Re: hrtimer become inaccurate with RT patch

On 07/02/2018 12:14 PM, Sebastian Andrzej Siewior wrote:
> On 2018-07-02 17:34:34 [+0800], gengdongjiu wrote:
>> The Linux kernel version is v4.1.46, and the preempt_rt patch is
>> patch-4.1.46-rt52.patch.
>
> the 4.1 series is no longer supported (neither RT wise nor non-RT,
> https://www.kernel.org/category/releases.html). I suggest to move away.
> If you notice this problem now it is hardly a long running project.

Who can edit this page:

https://wiki.linuxfoundation.org/realtime/preempt_rt_versions

? It still lists 4.1-rt as supported.

Subject: Re: hrtimer become inaccurate with RT patch

On 2018-07-06 16:22:16 [+0200], Daniel Wagner wrote:
> Who can edit this page:
>
> https://wiki.linuxfoundation.org/realtime/preempt_rt_versions
>
> ? It still lists 4.1-rt as supported.

updated.

Sebastian