2021-09-24 06:55:08

by Maninder Singh

[permalink] [raw]
Subject: [Issue] timer callback registered with mod_timer is getting called beforetime

Hi,


we are facing issue of early invocation of timer callback(kernel versio 5.4), Is it known issue for timer callback?

As we checekd code if CPU responsible for updating jiffies value(lets say CPU0) has interrupts disabled for 60 ms
and in mean time other CPU(CPU1) registers new timer for (jiffies + x) value. its callback will be called before time.

As it reads old jiffies value and register callback for x jiffies afterwards, but instead it is called
at x - y jiffies. (y is lapsed period which CPU 1 does not know from jiffies)


CPU 0 CPU 1

// last jiffies updated
local_irq_disable(); do_some_work()
do_work() ....
... ....
... ....
... // 60 ms mod_timer(&timer, jiffies + x); // read old jiffies value
local_irq_enable() ...
// new jiffies updated with // timer interrupt reads updated jiffies and calls callback function where x was not actually passed on HW clock.
// jiffies + 16


we tried with TC (with 500 ms irq disable just to check behaviour, in actual issue it was 60 ms IRQ disable)

// IRQs disabled on same CPU0 responsible for jiffies updation for 500 mili seconds

[23.5598] ##### disable IRQs
[24.0600] ##### registered timer 25025334960 5887 //registered timer at HW clock 25.025 seconds for 600 ms at 5887 jiffies on CPU1

[24.0604] ##### ticks change abruptly 5887 124 //IQRS enabled, CPU0 increments 124 jiffies colllectively because of irq disable long time

[24.1595] ##### timer called 25124783406 6037 // as per jiffies callback trigger at 6037 jiffies i.e. after 150 jiffies(600ms)
// but HW clock is passed only 100 milliseconds.



Is it known behaviour for timers?
because only 1 CPU is assigned to update jiffies work to call do_timer utill unless it goes to idle state and pass ownership to other CPU.

we tried by making all CPU to handle code for jiffies updation (it will add performance hit)
but then no issue of abrupt jiffies change occured on system.


Thanks and Regards
----------------------
Maninder Singh


2021-09-24 16:22:39

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [Issue] timer callback registered with mod_timer is getting called beforetime

Hi Maninder,

On Fri, Sep 24, 2021 at 12:23:10PM +0530, Maninder Singh wrote:
> Hi,
>
>
> we are facing issue of early invocation of timer callback(kernel versio 5.4), Is it known issue for timer callback?
>
> As we checekd code if CPU responsible for updating jiffies value(lets say CPU0) has interrupts disabled for 60 ms
> and in mean time other CPU(CPU1) registers new timer for (jiffies + x) value. its callback will be called before time.
>
> As it reads old jiffies value and register callback for x jiffies afterwards, but instead it is called
> at x - y jiffies. (y is lapsed period which CPU 1 does not know from jiffies)
>
>
> CPU 0 CPU 1
>
> // last jiffies updated
> local_irq_disable(); do_some_work()
> do_work() ....
> ... ....
> ... ....
> ... // 60 ms mod_timer(&timer, jiffies + x); // read old jiffies value
> local_irq_enable() ...
> // new jiffies updated with // timer interrupt reads updated jiffies and calls callback function where x was not actually passed on HW clock.
> // jiffies + 16
>
>
> we tried with TC (with 500 ms irq disable just to check behaviour, in actual issue it was 60 ms IRQ disable)
>
> // IRQs disabled on same CPU0 responsible for jiffies updation for 500 mili seconds
>
> [23.5598] ##### disable IRQs
> [24.0600] ##### registered timer 25025334960 5887 //registered timer at HW clock 25.025 seconds for 600 ms at 5887 jiffies on CPU1
>
> [24.0604] ##### ticks change abruptly 5887 124 //IQRS enabled, CPU0 increments 124 jiffies colllectively because of irq disable long time
>
> [24.1595] ##### timer called 25124783406 6037 // as per jiffies callback trigger at 6037 jiffies i.e. after 150 jiffies(600ms)
> // but HW clock is passed only 100 milliseconds.
>
>
>
> Is it known behaviour for timers?
> because only 1 CPU is assigned to update jiffies work to call do_timer utill unless it goes to idle state and pass ownership to other CPU.
>
> we tried by making all CPU to handle code for jiffies updation (it will add performance hit)
> but then no issue of abrupt jiffies change occured on system.

First of all, are you meeting this issue specifically on NOHZ_FULL? Because
there is a pending fix for a related matter there:

https://lore.kernel.org/lkml/[email protected]/

As for what you're reporting here, I think the core problem is the fact that the
timekeeper (jiffies updater) is stuck with IRQs disabled for way too long. Even
one millisecond is too much to tolerate. Do you have any idea about the source of
that situation?

Thanks.

2021-09-24 16:53:11

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Issue] timer callback registered with mod_timer is getting called beforetime

On Fri, Sep 24 2021 at 12:23, Maninder Singh wrote:
?
> As we checekd code if CPU responsible for updating jiffies value(lets
> say CPU0) has interrupts disabled for 60 ms and in mean time other
> CPU(CPU1) registers new timer for (jiffies + x) value. its callback
> will be called before time.

So you run some stuff for 60ms with interrupts disabled and now you are
asking what's wrong?

The answer is simply: Keeping interrupts disabled for 60ms is wrong.

Anything else is just a symptom.

Thanks,

tglx