Hi Frederic,
> > Is it known behaviour for timers?
> > Because only one CPU is assigned the jiffies update work (calling do_timer) until it goes idle and passes ownership to another CPU.
> >
> > We tried making every CPU handle the jiffies update (which adds a performance hit),
> > and then no abrupt jiffies change occurred on the system.
>
> First of all, are you meeting this issue specifically on NOHZ_FULL? Because
> there is a pending fix for a related matter there:
No, this is not our case.
>
> https://lore.kernel.org/lkml/[email protected]/
>
> As for what you're reporting here, I think the core problem is the fact that the
> timekeeper (jiffies updater) is stuck with IRQs disabled for way too long. Even
> one millisecond is too much to tolerate. Do you have any idea about the source of
> that situation?
>
Yes, interrupts definitely should not be disabled for that long,
but third-party driver or vendor module code can sometimes cause this.
That may be the case here; we are trying to reproduce the issue and check the code path.
So we have two questions:
(1) In the case described, the timer callback will be called early, right?
(2) What if jiffies could be updated by any CPU rather than making one CPU the owner?
Could that cause any side effects? The one we know of is performance: there will be
redundant calls from the other CPUs.
	/* Check, if the jiffies need an update */
	if (tick_do_timer_cpu == cpu)
		tick_do_update_jiffies64(now);
On our target, the race occurs when a code path that runs with IRQs disabled is scheduled on the
CPU responsible for updating jiffies while, in parallel, CPU1 registers an event callback with a
20 or 30 ms timeout. Because of the abrupt jiffies change, the callback fires within 1 ms of real
time instead of after the requested timeout, and that causes the actual issue.
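
To make the sequence concrete, here is a minimal user-space sketch of what we think is happening. It is not kernel code. It assumes HZ=1000 (one jiffy per millisecond), and update_jiffies() is only a simplified stand-in modelled on the catch-up behaviour of tick_do_update_jiffies64(). simulate() and its parameters are hypothetical names made up for this illustration.

```c
#include <stdint.h>

#define TICK_NSEC 1000000ULL	/* assumption: HZ=1000, one jiffy per ms */
#define MAX_SIM_MS 100000ULL	/* safety bound for the simulation loop */

static uint64_t jiffies;
static uint64_t last_jiffies_update;	/* time of last update, in ns */

/* Simplified stand-in: account for every tick period elapsed since the last update. */
static void update_jiffies(uint64_t now_ns)
{
	uint64_t delta = now_ns - last_jiffies_update;
	uint64_t ticks;

	if (delta < TICK_NSEC)
		return;

	ticks = delta / TICK_NSEC;
	last_jiffies_update += ticks * TICK_NSEC;
	jiffies += ticks;	/* jumps by many ticks after a long stall */
}

/*
 * The only jiffies updater is stalled (IRQs off) for stall_start_ms <= t < stall_end_ms.
 * Another CPU arms a timer at arm_ms with expires = jiffies + timeout_jiffies.
 * Returns the real time in ms between arming and expiry (0 if it never expired).
 */
uint64_t simulate(uint64_t stall_start_ms, uint64_t stall_end_ms,
		  uint64_t arm_ms, uint64_t timeout_jiffies)
{
	uint64_t expires = 0;
	uint64_t ms;
	int armed = 0;

	jiffies = 0;
	last_jiffies_update = 0;

	for (ms = 1; ms <= MAX_SIM_MS; ms++) {
		int stalled = ms >= stall_start_ms && ms < stall_end_ms;

		if (!stalled)
			update_jiffies(ms * TICK_NSEC);

		if (ms == arm_ms) {
			/* expiry is computed from a possibly stale jiffies value */
			expires = jiffies + timeout_jiffies;
			armed = 1;
		} else if (armed && jiffies >= expires) {
			return ms - arm_ms;
		}
	}
	return 0;
}
```

In this model, simulate(0, 0, 124, 20) returns 20 (no stall, the timer expires after the full 20 ms), while simulate(100, 125, 124, 20) returns 1: the timer is armed against a stale jiffies value just before the updater catches up, so it expires 1 ms later in real time.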
Thanks
Maninder Singh