After tasks enter or leave a runqueue (wakeup/block) SCHED full_nohz
dependency is checked (via sched_update_tick_dependency()). In case tick
can be stopped on a CPU (see sched_can_stop_tick() for details), SCHED
dependency for such CPU is cleared. However, this new information is not
used right away to actually stop the tick.
In CONFIG_PREEMPT systems booted with threadirqs option, sched clock
tick is serviced by an actual task (ksoftirqd corresponding to the CPU
where tick timer fired). So, in case a CPU was running a single task,
servicing the timer involves exiting full nozh mode. Problem at this
point is that we might lose chances to enter back into full nozh mode,
since info about ksoftirqd thread going back to sleep is not used (as
mentioned above).
Fix it by calling tick_nozh_full_update_tick(_cpu)() right after SCHED
dependency is cleared, so that tick can be promptly stopped.
Signed-off-by: Juri Lelli <[email protected]>
---
Hi,
I noticed what seems to be the problem described in the changelog while
running sysjitter [1] on a PREEMPT system setup for isolation and full
nozh, i.e. ... skew_tick=1 nohz=on nohz_full=4-35 rcu_nocbs=4-35
threadirqs (36 CPUs box).
Starting sysjitter with something like the following
perf stat -C 4-35 -e irq_vectors:local_timer_entry taskset \
--cpu-list 4-35 ./sysjitter/sysjitter --runtime 30 200
with vanilla kernel was returning ~5k local_timer_entry events, while
that number goes down to ~100 or so with the proposed patch applied.
The following trace snippet also highlight the problematic situation:
...
ksoftirqd/19-125 [019] 170.700754: softirq_entry: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.700755: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.700756: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.700757: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.700759: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
sysjitter-2459 [019] 170.701740: local_timer_entry: vector=236
sysjitter-2459 [019] 170.701742: softirq_raise: vec=1 [action=TIMER]
sysjitter-2459 [019] 170.701743: softirq_raise: vec=7 [action=SCHED]
sysjitter-2459 [019] 170.701744: local_timer_exit: vector=236
sysjitter-2459 [019] 170.701747: sched_wakeup: ksoftirqd/19:125 [120] success=1 CPU:019
sysjitter-2459 [019] 170.701748: tick_stop: success=0 dependency=SCHED
sysjitter-2459 [019] 170.701749: irq_work_entry: vector=246
sysjitter-2459 [019] 170.701750: irq_work_exit: vector=246
sysjitter-2459 [019] 170.701751: tick_stop: success=0 dependency=SCHED
sysjitter-2459 [019] 170.701753: sched_switch: sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
ksoftirqd/19-125 [019] 170.701754: softirq_entry: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.701756: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.701756: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.701758: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.701759: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
sysjitter-2459 [019] 170.702740: local_timer_entry: vector=236
sysjitter-2459 [019] 170.702742: softirq_raise: vec=1 [action=TIMER]
sysjitter-2459 [019] 170.702743: softirq_raise: vec=7 [action=SCHED]
sysjitter-2459 [019] 170.702744: local_timer_exit: vector=236
sysjitter-2459 [019] 170.702747: sched_wakeup: ksoftirqd/19:125 [120] success=1 CPU:019
sysjitter-2459 [019] 170.702748: tick_stop: success=0 dependency=SCHED
sysjitter-2459 [019] 170.702749: irq_work_entry: vector=246
sysjitter-2459 [019] 170.702750: irq_work_exit: vector=246
sysjitter-2459 [019] 170.702751: tick_stop: success=0 dependency=SCHED
sysjitter-2459 [019] 170.702753: sched_switch: sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
ksoftirqd/19-125 [019] 170.702755: softirq_entry: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.702756: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.702757: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.702758: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.702760: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
sysjitter-2459 [019] 170.703740: local_timer_entry: vector=236
sysjitter-2459 [019] 170.703742: softirq_raise: vec=1 [action=TIMER]
sysjitter-2459 [019] 170.703743: softirq_raise: vec=7 [action=SCHED]
sysjitter-2459 [019] 170.703745: local_timer_exit: vector=236
sysjitter-2459 [019] 170.703747: sched_wakeup: ksoftirqd/19:125 [120] success=1 CPU:019
sysjitter-2459 [019] 170.703748: tick_stop: success=0 dependency=SCHED
sysjitter-2459 [019] 170.703749: irq_work_entry: vector=246
sysjitter-2459 [019] 170.703750: irq_work_exit: vector=246
sysjitter-2459 [019] 170.703751: tick_stop: success=0 dependency=SCHED
sysjitter-2459 [019] 170.703753: sched_switch: sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
ksoftirqd/19-125 [019] 170.703755: softirq_entry: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.703756: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.703757: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.703758: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.703759: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
sysjitter-2459 [019] 170.704500: call_function_single_entry: vector=251
sysjitter-2459 [019] 170.704502: call_function_single_exit: vector=251
sysjitter-2459 [019] 170.704504: tick_stop: success=1 dependency=NONE
...
Luckily sometimes it happens (like in the above) that an interrupt fires
on a CPU while ksoftirqd is not executing, SCHED dep is seen as cleared
and the CPU enters full nohz. I've however observed pathological cases
where CPUs aren't able to enter back full nohz for a long time.
Hope my understanding is correct. Please find below a naive attempt to
fix the problem. I'm pretty sure I'm missing details, e.g. I'm not sure
it is safe to possibly stop the tick remotely. Consider the following as
not more than another way of pointing the problem out. :-)
Best,
Juri
1 - https://github.com/alexeiz/sysjitter
---
kernel/time/tick-sched.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3e2dc9b8858c..0d696e55d774 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -342,11 +342,16 @@ void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit)
}
EXPORT_SYMBOL_GPL(tick_nohz_dep_set_cpu);
+static void tick_nohz_full_update_tick_cpu(struct tick_sched *ts, int cpu);
+
void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
{
struct tick_sched *ts = per_cpu_ptr(&tick_cpu_sched, cpu);
atomic_andnot(BIT(bit), &ts->tick_dep_mask);
+
+ if (!in_irq())
+ tick_nohz_full_update_tick_cpu(ts, cpu);
}
EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu);
@@ -870,11 +875,9 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
tick_nohz_restart(ts, now);
}
-static void tick_nohz_full_update_tick(struct tick_sched *ts)
+static void tick_nohz_full_update_tick_cpu(struct tick_sched *ts, int cpu)
{
#ifdef CONFIG_NO_HZ_FULL
- int cpu = smp_processor_id();
-
if (!tick_nohz_full_cpu(cpu))
return;
@@ -888,6 +891,15 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
#endif
}
+static void tick_nohz_full_update_tick(struct tick_sched *ts)
+{
+#ifdef CONFIG_NO_HZ_FULL
+ int cpu = smp_processor_id();
+
+ tick_nohz_full_update_tick_cpu(ts, cpu);
+#endif
+}
+
static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
{
/*
--
2.25.3
Hi Juri,
On Wed, May 20, 2020 at 04:04:02PM +0200, Juri Lelli wrote:
> After tasks enter or leave a runqueue (wakeup/block) SCHED full_nohz
> dependency is checked (via sched_update_tick_dependency()). In case tick
> can be stopped on a CPU (see sched_can_stop_tick() for details), SCHED
> dependency for such CPU is cleared. However, this new information is not
> used right away to actually stop the tick.
>
> In CONFIG_PREEMPT systems booted with threadirqs option, sched clock
> tick is serviced by an actual task (ksoftirqd corresponding to the CPU
> where tick timer fired).
I must confess I haven't tested threaded IRQs but I was
pretty sure that the timer tick is always serviced on hardirq.
Now the timer list callbacks are executed on softirqs. So if the
tick itself, executed on hardirq, sees pending timers, it raise
the softirq which wakes up ksoftirqd on forced irq thread mode
while calling irq_exit(). Then tick_nohz_irq_exit() sees ksoftirqd
and the current task on the runqueue, which together can indeed prevent
from turning off the tick. But then the root cause is pending timer
list callbacks.
> So, in case a CPU was running a single task,
> servicing the timer involves exiting full nozh mode. Problem at this
> point is that we might lose chances to enter back into full nozh mode,
> since info about ksoftirqd thread going back to sleep is not used (as
> mentioned above).
It should enter into nohz_full mode in the next tick, which is usually
a reasonable delay. If you need guarantee that the tick is stopped before
resuming userspace, you need a stronger machinery such as the task isolation
patchset.
Let's have a look at the trace below:
> ksoftirqd/19-125 [019] 170.700754: softirq_entry: vec=1 [action=TIMER]
> ksoftirqd/19-125 [019] 170.700755: softirq_exit: vec=1 [action=TIMER]
> ksoftirqd/19-125 [019] 170.700756: softirq_entry: vec=7 [action=SCHED]
> ksoftirqd/19-125 [019] 170.700757: softirq_exit: vec=7 [action=SCHED]
> ksoftirqd/19-125 [019] 170.700759: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
> sysjitter-2459 [019] 170.701740: local_timer_entry: vector=236
> sysjitter-2459 [019] 170.701742: softirq_raise: vec=1 [action=TIMER]
See here the tick sees pending timer callbacks so it raises the timer softirq.
> sysjitter-2459 [019] 170.701743: softirq_raise: vec=7 [action=SCHED]
Oh and the scheduler tick activates the scheduler softirq as well.
> sysjitter-2459 [019] 170.701744: local_timer_exit: vector=236
> sysjitter-2459 [019] 170.701747: sched_wakeup: ksoftirqd/19:125 [120] success=1 CPU:019
> sysjitter-2459 [019] 170.701748: tick_stop: success=0 dependency=SCHED
> sysjitter-2459 [019] 170.701749: irq_work_entry: vector=246
> sysjitter-2459 [019] 170.701750: irq_work_exit: vector=246
> sysjitter-2459 [019] 170.701751: tick_stop: success=0 dependency=SCHED
> sysjitter-2459 [019] 170.701753: sched_switch: sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
> ksoftirqd/19-125 [019] 170.701754: softirq_entry: vec=1 [action=TIMER]
> ksoftirqd/19-125 [019] 170.701756: softirq_exit: vec=1 [action=TIMER]
> ksoftirqd/19-125 [019] 170.701756: softirq_entry: vec=7 [action=SCHED]
> ksoftirqd/19-125 [019] 170.701758: softirq_exit: vec=7 [action=SCHED]
> ksoftirqd/19-125 [019] 170.701759: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
> sysjitter-2459 [019] 170.702740: local_timer_entry: vector=236
> sysjitter-2459 [019] 170.702742: softirq_raise: vec=1 [action=TIMER]
A new tick but we still have pending timer callback so we'll need to wakeup ksoftirqd
again.
I think you should trace timers and check which one is concerned here.
Thanks.
On 20/05/20 18:24, Frederic Weisbecker wrote:
> Hi Juri,
>
> On Wed, May 20, 2020 at 04:04:02PM +0200, Juri Lelli wrote:
> > After tasks enter or leave a runqueue (wakeup/block) SCHED full_nohz
> > dependency is checked (via sched_update_tick_dependency()). In case tick
> > can be stopped on a CPU (see sched_can_stop_tick() for details), SCHED
> > dependency for such CPU is cleared. However, this new information is not
> > used right away to actually stop the tick.
> >
> > In CONFIG_PREEMPT systems booted with threadirqs option, sched clock
> > tick is serviced by an actual task (ksoftirqd corresponding to the CPU
> > where tick timer fired).
>
> I must confess I haven't tested threaded IRQs but I was
> pretty sure that the timer tick is always serviced on hardirq.
>
> Now the timer list callbacks are executed on softirqs. So if the
> tick itself, executed on hardirq, sees pending timers, it raise
> the softirq which wakes up ksoftirqd on forced irq thread mode
> while calling irq_exit(). Then tick_nohz_irq_exit() sees ksoftirqd
> and the current task on the runqueue, which together can indeed prevent
> from turning off the tick. But then the root cause is pending timer
> list callbacks.
>
> > So, in case a CPU was running a single task,
> > servicing the timer involves exiting full nozh mode. Problem at this
> > point is that we might lose chances to enter back into full nozh mode,
> > since info about ksoftirqd thread going back to sleep is not used (as
> > mentioned above).
>
> It should enter into nohz_full mode in the next tick, which is usually
> a reasonable delay. If you need guarantee that the tick is stopped before
> resuming userspace, you need a stronger machinery such as the task isolation
> patchset.
>
> Let's have a look at the trace below:
>
> > ksoftirqd/19-125 [019] 170.700754: softirq_entry: vec=1 [action=TIMER]
> > ksoftirqd/19-125 [019] 170.700755: softirq_exit: vec=1 [action=TIMER]
> > ksoftirqd/19-125 [019] 170.700756: softirq_entry: vec=7 [action=SCHED]
> > ksoftirqd/19-125 [019] 170.700757: softirq_exit: vec=7 [action=SCHED]
> > ksoftirqd/19-125 [019] 170.700759: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
> > sysjitter-2459 [019] 170.701740: local_timer_entry: vector=236
> > sysjitter-2459 [019] 170.701742: softirq_raise: vec=1 [action=TIMER]
>
> See here the tick sees pending timer callbacks so it raises the timer softirq.
>
> > sysjitter-2459 [019] 170.701743: softirq_raise: vec=7 [action=SCHED]
>
> Oh and the scheduler tick activates the scheduler softirq as well.
>
> > sysjitter-2459 [019] 170.701744: local_timer_exit: vector=236
> > sysjitter-2459 [019] 170.701747: sched_wakeup: ksoftirqd/19:125 [120] success=1 CPU:019
> > sysjitter-2459 [019] 170.701748: tick_stop: success=0 dependency=SCHED
> > sysjitter-2459 [019] 170.701749: irq_work_entry: vector=246
> > sysjitter-2459 [019] 170.701750: irq_work_exit: vector=246
> > sysjitter-2459 [019] 170.701751: tick_stop: success=0 dependency=SCHED
> > sysjitter-2459 [019] 170.701753: sched_switch: sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
> > ksoftirqd/19-125 [019] 170.701754: softirq_entry: vec=1 [action=TIMER]
> > ksoftirqd/19-125 [019] 170.701756: softirq_exit: vec=1 [action=TIMER]
> > ksoftirqd/19-125 [019] 170.701756: softirq_entry: vec=7 [action=SCHED]
> > ksoftirqd/19-125 [019] 170.701758: softirq_exit: vec=7 [action=SCHED]
> > ksoftirqd/19-125 [019] 170.701759: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
> > sysjitter-2459 [019] 170.702740: local_timer_entry: vector=236
> > sysjitter-2459 [019] 170.702742: softirq_raise: vec=1 [action=TIMER]
>
> A new tick but we still have pending timer callback so we'll need to wakeup ksoftirqd
> again.
>
> I think you should trace timers and check which one is concerned here.
Hummm, so I enabled 'timer:*', anything else you think I should be
looking at?
...
ksoftirqd/13-117 [013] 148.265945: softirq_entry: vec=1 [action=TIMER]
ksoftirqd/13-117 [013] 148.265947: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/13-117 [013] 148.265948: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/13-117 [013] 148.265950: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/13-117 [013] 148.265952: sched_switch: ksoftirqd/13:117 [120] S ==> sysjitter:2536 [120]
sysjitter-2536 [013] 148.266912: local_timer_entry: vector=236
sysjitter-2536 [013] 148.266914: hrtimer_cancel: hrtimer=0xffff9807df91bee0
sysjitter-2536 [013] 148.266916: hrtimer_expire_entry: hrtimer=0xffff9807df91bee0 now=148249107837 function=tick_sched_timer/0x0
sysjitter-2536 [013] 148.266918: softirq_raise: vec=1 [action=TIMER]
sysjitter-2536 [013] 148.266920: softirq_raise: vec=7 [action=SCHED]
sysjitter-2536 [013] 148.266922: hrtimer_expire_exit: hrtimer=0xffff9807df91bee0
sysjitter-2536 [013] 148.266924: hrtimer_start: hrtimer=0xffff9807df91bee0 function=tick_sched_timer/0x0 expires=148250105422 softexpires=148250105422
sysjitter-2536 [013] 148.266926: local_timer_exit: vector=236
sysjitter-2536 [013] 148.266930: sched_wakeup: ksoftirqd/13:117 [120] success=1 CPU:013
sysjitter-2536 [013] 148.266932: tick_stop: success=0 dependency=SCHED
sysjitter-2536 [013] 148.266934: irq_work_entry: vector=246
sysjitter-2536 [013] 148.266936: irq_work_exit: vector=246
sysjitter-2536 [013] 148.266937: tick_stop: success=0 dependency=SCHED
sysjitter-2536 [013] 148.266940: sched_switch: sysjitter:2536 [120] R ==> ksoftirqd/13:117 [120]
ksoftirqd/13-117 [013] 148.266943: softirq_entry: vec=1 [action=TIMER]
ksoftirqd/13-117 [013] 148.266946: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/13-117 [013] 148.266946: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/13-117 [013] 148.266948: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/13-117 [013] 148.266950: sched_switch: ksoftirqd/13:117 [120] S ==> sysjitter:2536 [120]
sysjitter-2536 [013] 148.267912: local_timer_entry: vector=236
sysjitter-2536 [013] 148.267914: hrtimer_cancel: hrtimer=0xffff9807df91bee0
sysjitter-2536 [013] 148.267916: hrtimer_expire_entry: hrtimer=0xffff9807df91bee0 now=148250107750 function=tick_sched_timer/0x0
sysjitter-2536 [013] 148.267918: softirq_raise: vec=1 [action=TIMER]
sysjitter-2536 [013] 148.267920: softirq_raise: vec=7 [action=SCHED]
sysjitter-2536 [013] 148.267922: hrtimer_expire_exit: hrtimer=0xffff9807df91bee0
sysjitter-2536 [013] 148.267924: hrtimer_start: hrtimer=0xffff9807df91bee0 function=tick_sched_timer/0x0 expires=148251105422 softexpires=148251105422
sysjitter-2536 [013] 148.267926: local_timer_exit: vector=236
sysjitter-2536 [013] 148.267929: sched_wakeup: ksoftirqd/13:117 [120] success=1 CPU:013
sysjitter-2536 [013] 148.267931: tick_stop: success=0 dependency=SCHED
sysjitter-2536 [013] 148.267934: irq_work_entry: vector=246
sysjitter-2536 [013] 148.267936: irq_work_exit: vector=246
sysjitter-2536 [013] 148.267937: tick_stop: success=0 dependency=SCHED
sysjitter-2536 [013] 148.267940: sched_switch: sysjitter:2536 [120] R ==> ksoftirqd/13:117 [120]
ksoftirqd/13-117 [013] 148.267943: softirq_entry: vec=1 [action=TIMER]
ksoftirqd/13-117 [013] 148.267945: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/13-117 [013] 148.267946: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/13-117 [013] 148.267947: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/13-117 [013] 148.267949: sched_switch: ksoftirqd/13:117 [120] S ==> sysjitter:2536 [120]
sysjitter-2536 [013] 148.268865: call_function_single_entry: vector=251
sysjitter-2536 [013] 148.268867: call_function_single_exit: vector=251
sysjitter-2536 [013] 148.268870: tick_stop: success=1 dependency=NONE
sysjitter-2536 [013] 148.268872: hrtimer_cancel: hrtimer=0xffff9807df91bee0
sysjitter-2536 [013] 148.268876: call_function_single_entry: vector=251
sysjitter-2536 [013] 148.268878: call_function_single_exit: vector=251
...
it looks like tick sched timer is kept armed until tick_stop
succedes (because of the check performed after _single_exit)?
Thanks for your quick reply!
On Wed, May 20, 2020 at 06:49:25PM +0200, Juri Lelli wrote:
> On 20/05/20 18:24, Frederic Weisbecker wrote:
>
> Hummm, so I enabled 'timer:*', anything else you think I should be
> looking at?
Are you sure you also enabled timer_expire_entry?
Because:
>
> ...
> ksoftirqd/13-117 [013] 148.265945: softirq_entry: vec=1 [action=TIMER]
...the timer softirq keep triggering without any timer callback executed. If that's the
case, we may be dealing with an interesting issue.
Thanks.
> ksoftirqd/13-117 [013] 148.265947: softirq_exit: vec=1 [action=TIMER]
> ksoftirqd/13-117 [013] 148.265948: softirq_entry: vec=7 [action=SCHED]
> ksoftirqd/13-117 [013] 148.265950: softirq_exit: vec=7 [action=SCHED]
> ksoftirqd/13-117 [013] 148.265952: sched_switch: ksoftirqd/13:117 [120] S ==> sysjitter:2536 [120]
> sysjitter-2536 [013] 148.266912: local_timer_entry: vector=236
> sysjitter-2536 [013] 148.266914: hrtimer_cancel: hrtimer=0xffff9807df91bee0
> sysjitter-2536 [013] 148.266916: hrtimer_expire_entry: hrtimer=0xffff9807df91bee0 now=148249107837 function=tick_sched_timer/0x0
> sysjitter-2536 [013] 148.266918: softirq_raise: vec=1 [action=TIMER]
> sysjitter-2536 [013] 148.266920: softirq_raise: vec=7 [action=SCHED]
> sysjitter-2536 [013] 148.266922: hrtimer_expire_exit: hrtimer=0xffff9807df91bee0
> sysjitter-2536 [013] 148.266924: hrtimer_start: hrtimer=0xffff9807df91bee0 function=tick_sched_timer/0x0 expires=148250105422 softexpires=148250105422
> sysjitter-2536 [013] 148.266926: local_timer_exit: vector=236
> sysjitter-2536 [013] 148.266930: sched_wakeup: ksoftirqd/13:117 [120] success=1 CPU:013
> sysjitter-2536 [013] 148.266932: tick_stop: success=0 dependency=SCHED
> sysjitter-2536 [013] 148.266934: irq_work_entry: vector=246
> sysjitter-2536 [013] 148.266936: irq_work_exit: vector=246
> sysjitter-2536 [013] 148.266937: tick_stop: success=0 dependency=SCHED
> sysjitter-2536 [013] 148.266940: sched_switch: sysjitter:2536 [120] R ==> ksoftirqd/13:117 [120]
> ksoftirqd/13-117 [013] 148.266943: softirq_entry: vec=1 [action=TIMER]
> ksoftirqd/13-117 [013] 148.266946: softirq_exit: vec=1 [action=TIMER]
> ksoftirqd/13-117 [013] 148.266946: softirq_entry: vec=7 [action=SCHED]
> ksoftirqd/13-117 [013] 148.266948: softirq_exit: vec=7 [action=SCHED]
> ksoftirqd/13-117 [013] 148.266950: sched_switch: ksoftirqd/13:117 [120] S ==> sysjitter:2536 [120]
> sysjitter-2536 [013] 148.267912: local_timer_entry: vector=236
> sysjitter-2536 [013] 148.267914: hrtimer_cancel: hrtimer=0xffff9807df91bee0
> sysjitter-2536 [013] 148.267916: hrtimer_expire_entry: hrtimer=0xffff9807df91bee0 now=148250107750 function=tick_sched_timer/0x0
> sysjitter-2536 [013] 148.267918: softirq_raise: vec=1 [action=TIMER]
> sysjitter-2536 [013] 148.267920: softirq_raise: vec=7 [action=SCHED]
> sysjitter-2536 [013] 148.267922: hrtimer_expire_exit: hrtimer=0xffff9807df91bee0
> sysjitter-2536 [013] 148.267924: hrtimer_start: hrtimer=0xffff9807df91bee0 function=tick_sched_timer/0x0 expires=148251105422 softexpires=148251105422
> sysjitter-2536 [013] 148.267926: local_timer_exit: vector=236
> sysjitter-2536 [013] 148.267929: sched_wakeup: ksoftirqd/13:117 [120] success=1 CPU:013
> sysjitter-2536 [013] 148.267931: tick_stop: success=0 dependency=SCHED
> sysjitter-2536 [013] 148.267934: irq_work_entry: vector=246
> sysjitter-2536 [013] 148.267936: irq_work_exit: vector=246
> sysjitter-2536 [013] 148.267937: tick_stop: success=0 dependency=SCHED
> sysjitter-2536 [013] 148.267940: sched_switch: sysjitter:2536 [120] R ==> ksoftirqd/13:117 [120]
> ksoftirqd/13-117 [013] 148.267943: softirq_entry: vec=1 [action=TIMER]
> ksoftirqd/13-117 [013] 148.267945: softirq_exit: vec=1 [action=TIMER]
> ksoftirqd/13-117 [013] 148.267946: softirq_entry: vec=7 [action=SCHED]
> ksoftirqd/13-117 [013] 148.267947: softirq_exit: vec=7 [action=SCHED]
> ksoftirqd/13-117 [013] 148.267949: sched_switch: ksoftirqd/13:117 [120] S ==> sysjitter:2536 [120]
> sysjitter-2536 [013] 148.268865: call_function_single_entry: vector=251
> sysjitter-2536 [013] 148.268867: call_function_single_exit: vector=251
> sysjitter-2536 [013] 148.268870: tick_stop: success=1 dependency=NONE
> sysjitter-2536 [013] 148.268872: hrtimer_cancel: hrtimer=0xffff9807df91bee0
> sysjitter-2536 [013] 148.268876: call_function_single_entry: vector=251
> sysjitter-2536 [013] 148.268878: call_function_single_exit: vector=251
> ...
>
> it looks like tick sched timer is kept armed until tick_stop
> succedes (because of the check performed after _single_exit)?
>
> Thanks for your quick reply!
>
On 20/05/20 19:02, Frederic Weisbecker wrote:
> On Wed, May 20, 2020 at 06:49:25PM +0200, Juri Lelli wrote:
> > On 20/05/20 18:24, Frederic Weisbecker wrote:
> >
> > Hummm, so I enabled 'timer:*', anything else you think I should be
> > looking at?
>
> Are you sure you also enabled timer_expire_entry?
> Because:
I run with
trace-cmd record -e sched_switch -e sched_wakeup -e sched_migrate_task \
-e 'timer:*' -e 'irq_vectors:*' -e 'irq:softirq*' \
-e 'irq:irq_handler*' taskset --cpu-list 4-35 ./sysjitter/sysjitter \
--runtime 10 200
And
$ grep ' timer_expire_entry:' trace.txt | wc -l
1173
seems to confirm that the event was indeed enabled, e.g.
ksoftirqd/11-100 [011] 159.270023: timer_expire_entry: timer=0xffff9807df8a5a60 function=delayed_work_timer_fn now=4294826418 baseclk=4294825984
>
> >
> > ...
> > ksoftirqd/13-117 [013] 148.265945: softirq_entry: vec=1 [action=TIMER]
>
> ...the timer softirq keep triggering without any timer callback executed. If that's the
> case, we may be dealing with an interesting issue.
>
> Thanks.
>
> > ksoftirqd/13-117 [013] 148.265947: softirq_exit: vec=1 [action=TIMER]
> > ksoftirqd/13-117 [013] 148.265948: softirq_entry: vec=7 [action=SCHED]
> > ksoftirqd/13-117 [013] 148.265950: softirq_exit: vec=7 [action=SCHED]
> > ksoftirqd/13-117 [013] 148.265952: sched_switch: ksoftirqd/13:117 [120] S ==> sysjitter:2536 [120]
> > sysjitter-2536 [013] 148.266912: local_timer_entry: vector=236
> > sysjitter-2536 [013] 148.266914: hrtimer_cancel: hrtimer=0xffff9807df91bee0
> > sysjitter-2536 [013] 148.266916: hrtimer_expire_entry: hrtimer=0xffff9807df91bee0 now=148249107837 function=tick_sched_timer/0x0
> > sysjitter-2536 [013] 148.266918: softirq_raise: vec=1 [action=TIMER]
> > sysjitter-2536 [013] 148.266920: softirq_raise: vec=7 [action=SCHED]
> > sysjitter-2536 [013] 148.266922: hrtimer_expire_exit: hrtimer=0xffff9807df91bee0
> > sysjitter-2536 [013] 148.266924: hrtimer_start: hrtimer=0xffff9807df91bee0 function=tick_sched_timer/0x0 expires=148250105422 softexpires=148250105422
> > sysjitter-2536 [013] 148.266926: local_timer_exit: vector=236
> > sysjitter-2536 [013] 148.266930: sched_wakeup: ksoftirqd/13:117 [120] success=1 CPU:013
> > sysjitter-2536 [013] 148.266932: tick_stop: success=0 dependency=SCHED
> > sysjitter-2536 [013] 148.266934: irq_work_entry: vector=246
> > sysjitter-2536 [013] 148.266936: irq_work_exit: vector=246
> > sysjitter-2536 [013] 148.266937: tick_stop: success=0 dependency=SCHED
> > sysjitter-2536 [013] 148.266940: sched_switch: sysjitter:2536 [120] R ==> ksoftirqd/13:117 [120]
> > ksoftirqd/13-117 [013] 148.266943: softirq_entry: vec=1 [action=TIMER]
> > ksoftirqd/13-117 [013] 148.266946: softirq_exit: vec=1 [action=TIMER]
> > ksoftirqd/13-117 [013] 148.266946: softirq_entry: vec=7 [action=SCHED]
> > ksoftirqd/13-117 [013] 148.266948: softirq_exit: vec=7 [action=SCHED]
> > ksoftirqd/13-117 [013] 148.266950: sched_switch: ksoftirqd/13:117 [120] S ==> sysjitter:2536 [120]
> > sysjitter-2536 [013] 148.267912: local_timer_entry: vector=236
> > sysjitter-2536 [013] 148.267914: hrtimer_cancel: hrtimer=0xffff9807df91bee0
> > sysjitter-2536 [013] 148.267916: hrtimer_expire_entry: hrtimer=0xffff9807df91bee0 now=148250107750 function=tick_sched_timer/0x0
> > sysjitter-2536 [013] 148.267918: softirq_raise: vec=1 [action=TIMER]
> > sysjitter-2536 [013] 148.267920: softirq_raise: vec=7 [action=SCHED]
> > sysjitter-2536 [013] 148.267922: hrtimer_expire_exit: hrtimer=0xffff9807df91bee0
> > sysjitter-2536 [013] 148.267924: hrtimer_start: hrtimer=0xffff9807df91bee0 function=tick_sched_timer/0x0 expires=148251105422 softexpires=148251105422
> > sysjitter-2536 [013] 148.267926: local_timer_exit: vector=236
> > sysjitter-2536 [013] 148.267929: sched_wakeup: ksoftirqd/13:117 [120] success=1 CPU:013
> > sysjitter-2536 [013] 148.267931: tick_stop: success=0 dependency=SCHED
> > sysjitter-2536 [013] 148.267934: irq_work_entry: vector=246
> > sysjitter-2536 [013] 148.267936: irq_work_exit: vector=246
> > sysjitter-2536 [013] 148.267937: tick_stop: success=0 dependency=SCHED
> > sysjitter-2536 [013] 148.267940: sched_switch: sysjitter:2536 [120] R ==> ksoftirqd/13:117 [120]
> > ksoftirqd/13-117 [013] 148.267943: softirq_entry: vec=1 [action=TIMER]
> > ksoftirqd/13-117 [013] 148.267945: softirq_exit: vec=1 [action=TIMER]
> > ksoftirqd/13-117 [013] 148.267946: softirq_entry: vec=7 [action=SCHED]
> > ksoftirqd/13-117 [013] 148.267947: softirq_exit: vec=7 [action=SCHED]
> > ksoftirqd/13-117 [013] 148.267949: sched_switch: ksoftirqd/13:117 [120] S ==> sysjitter:2536 [120]
> > sysjitter-2536 [013] 148.268865: call_function_single_entry: vector=251
> > sysjitter-2536 [013] 148.268867: call_function_single_exit: vector=251
> > sysjitter-2536 [013] 148.268870: tick_stop: success=1 dependency=NONE
> > sysjitter-2536 [013] 148.268872: hrtimer_cancel: hrtimer=0xffff9807df91bee0
> > sysjitter-2536 [013] 148.268876: call_function_single_entry: vector=251
> > sysjitter-2536 [013] 148.268878: call_function_single_exit: vector=251
> > ...
> >
> > it looks like tick sched timer is kept armed until tick_stop
> > succedes (because of the check performed after _single_exit)?
> >
> > Thanks for your quick reply!
> >
>
On Wed, May 20, 2020 at 08:47:10PM +0200, Juri Lelli wrote:
> On 20/05/20 19:02, Frederic Weisbecker wrote:
> > On Wed, May 20, 2020 at 06:49:25PM +0200, Juri Lelli wrote:
> > > On 20/05/20 18:24, Frederic Weisbecker wrote:
> > >
> > > Hummm, so I enabled 'timer:*', anything else you think I should be
> > > looking at?
> >
> > Are you sure you also enabled timer_expire_entry?
> > Because:
>
> I run with
>
> trace-cmd record -e sched_switch -e sched_wakeup -e sched_migrate_task \
> -e 'timer:*' -e 'irq_vectors:*' -e 'irq:softirq*' \
> -e 'irq:irq_handler*' taskset --cpu-list 4-35 ./sysjitter/sysjitter \
> --runtime 10 200
>
> And
>
> $ grep ' timer_expire_entry:' trace.txt | wc -l
> 1173
>
> seems to confirm that the event was indeed enabled, e.g.
>
> ksoftirqd/11-100 [011] 159.270023: timer_expire_entry: timer=0xffff9807df8a5a60 function=delayed_work_timer_fn now=4294826418 baseclk=4294825984
That's interesting! I'll see if I can reproduce that with threaded irqs. If not we'll
have to investigate on your machine. We really don't want to trigger timer softirqs
when it's not necessary.
On 21/05/20 02:44, Frederic Weisbecker wrote:
> On Wed, May 20, 2020 at 08:47:10PM +0200, Juri Lelli wrote:
> > On 20/05/20 19:02, Frederic Weisbecker wrote:
> > > On Wed, May 20, 2020 at 06:49:25PM +0200, Juri Lelli wrote:
> > > > On 20/05/20 18:24, Frederic Weisbecker wrote:
> > > >
> > > > Hummm, so I enabled 'timer:*', anything else you think I should be
> > > > looking at?
> > >
> > > Are you sure you also enabled timer_expire_entry?
> > > Because:
> >
> > I run with
> >
> > trace-cmd record -e sched_switch -e sched_wakeup -e sched_migrate_task \
> > -e 'timer:*' -e 'irq_vectors:*' -e 'irq:softirq*' \
> > -e 'irq:irq_handler*' taskset --cpu-list 4-35 ./sysjitter/sysjitter \
> > --runtime 10 200
> >
> > And
> >
> > $ grep ' timer_expire_entry:' trace.txt | wc -l
> > 1173
> >
> > seems to confirm that the event was indeed enabled, e.g.
> >
> > ksoftirqd/11-100 [011] 159.270023: timer_expire_entry: timer=0xffff9807df8a5a60 function=delayed_work_timer_fn now=4294826418 baseclk=4294825984
>
> That's interesting! I'll see if I can reproduce that with threaded irqs. If not we'll
> have to investigate on your machine. We really don't want to trigger timer softirqs
> when it's not necessary.
>
So, I managed to get a bit more info on this, please check the following
(which I edited a bit to hopefully make it clearer and apologies for
long lines, but I believe trimming them would mess things up)
---
ksoftirqd/3-26 [003] 99.942485: timer_expire_entry: timer=0xffffffffa55a9d20 function=clocksource_watchdog now=4294759328 baseclk=4294759328
ksoftirqd/3-26 [003] 99.942489: timer_start: timer=0xffffffffa55a9d20 function=clocksource_watchdog expires=4294759822 [timeout=494] cpu=4 idx=114 flags=D|P|I
ksoftirqd/3-26 [003] 99.942491: timer_expire_exit: timer=0xffffffffa55a9d20
sysjitter-2377 [004] 99.942491: call_function_single_entry: vector=251
sysjitter-2377 [004] 99.942492: call_function_single_exit: vector=251
ksoftirqd/3-26 [003] 99.942493: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/3-26 [003] 99.942494: softirq_entry: vec=7 [action=SCHED]
sysjitter-2377 [004] 99.942494: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92630457385 softexpires=92630457385
---
So, my understanding is that clock gets reactivated on CPU 4 to run
clocksource watchdog (indeed if I use tsc=nowatchdog this above doesn't
happen, but the potential issue is still there I guess).
---
sysjitter-2377 [004] 100.438455: local_timer_entry: vector=236
sysjitter-2377 [004] 100.438456: hrtimer_cancel: hrtimer=0xffff92e9df91fd20
sysjitter-2377 [004] 100.438458: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92630461337 function=tick_sched_timer/0x0
sysjitter-2377 [004] 100.438460: kernel_stack: <stack trace>
=> __ftrace_trace_stack (ffffffffa37da921)
=> __raise_softirq_irqoff (ffffffffa36daf50)
=> raise_softirq (ffffffffa36db0fe)
=> update_process_times (ffffffffa3768f9d)
=> tick_sched_handle (ffffffffa377aaa2)
=> tick_sched_timer (ffffffffa377ad53)
=> __hrtimer_run_queues (ffffffffa3769cf0)
=> hrtimer_interrupt (ffffffffa376a58a)
=> smp_apic_timer_interrupt (ffffffffa40028f8)
=> apic_timer_interrupt (ffffffffa4001b7f)
sysjitter-2377 [004] 100.438461: softirq_raise: vec=1 [action=TIMER]
sysjitter-2377 [004] 100.438464: kernel_stack: <stack trace>
=> __ftrace_trace_stack (ffffffffa37da921)
=> __raise_softirq_irqoff (ffffffffa36daf50)
=> raise_softirq (ffffffffa36db0fe)
=> trigger_load_balance (ffffffffa371cb9c)
=> update_process_times (ffffffffa3768fc7)
=> tick_sched_handle (ffffffffa377aaa2)
=> tick_sched_timer (ffffffffa377ad53)
=> __hrtimer_run_queues (ffffffffa3769cf0)
=> hrtimer_interrupt (ffffffffa376a58a)
=> smp_apic_timer_interrupt (ffffffffa40028f8)
=> apic_timer_interrupt (ffffffffa4001b7f)
sysjitter-2377 [004] 100.438465: softirq_raise: vec=7 [action=SCHED]
---
Tick fires, the two softirq_raise (which you wondered about) seems to
come from
1. update_process_times::run_local_timers (TIMER_SOFTIRQ)
2. update_process_times::scheduler_tick::trigger_load_balance (SCHED_SOFTIRQ)
---
sysjitter-2377 [004] 100.438468: hrtimer_expire_exit: hrtimer=0xffff92e9df91fd20
sysjitter-2377 [004] 100.438470: local_timer_exit: vector=236
sysjitter-2377 [004] 100.438473: sched_wakeup: ksoftirqd/4:31 [120] success=1 CPU:004
sysjitter-2377 [004] 100.438474: tick_stop: success=0 dependency=SCHED
sysjitter-2377 [004] 100.438476: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92630512937 softexpires=92630512937
sysjitter-2377 [004] 100.438484: irq_work_entry: vector=246
sysjitter-2377 [004] 100.438491: irq_work_exit: vector=246
sysjitter-2377 [004] 100.438492: tick_stop: success=0 dependency=SCHED
sysjitter-2377 [004] 100.438495: sched_switch: sysjitter:2377 [120] R ==> ksoftirqd/4:31 [120]
ksoftirqd/4-31 [004] 100.438497: softirq_entry: vec=1 [action=TIMER]
ksoftirqd/4-31 [004] 100.438499: timer_cancel: timer=0xffffffffa55a9d20
ksoftirqd/4-31 [004] 100.438501: timer_expire_entry: timer=0xffffffffa55a9d20 function=clocksource_watchdog now=4294759824 baseclk=4294759824
ksoftirqd/4-31 [004] 100.438504: timer_start: timer=0xffffffffa55a9d20 function=clocksource_watchdog expires=4294760322 [timeout=498] cpu=5 idx=113 flags=D|P|I
---
Queueing clocksource watchdog on CPU5.
---
ksoftirqd/4-31 [004] 100.438506: timer_expire_exit: timer=0xffffffffa55a9d20
ksoftirqd/4-31 [004] 100.438507: local_timer_entry: vector=236
---
Back to back timer?
---
ksoftirqd/4-31 [004] 100.438509: hrtimer_cancel: hrtimer=0xffff92e9df91fd20
ksoftirqd/4-31 [004] 100.438511: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92630515022 function=tick_sched_timer/0x0
ksoftirqd/4-31 [004] 100.438515: kernel_stack: <stack trace>
=> __ftrace_trace_stack (ffffffffa37da921)
=> __raise_softirq_irqoff (ffffffffa36daf50)
=> raise_softirq (ffffffffa36db0fe)
=> rcu_sched_clock_irq (ffffffffa375af4a)
=> update_process_times (ffffffffa3768fa4)
=> tick_sched_handle (ffffffffa377aaa2)
=> tick_sched_timer (ffffffffa377ad53)
=> __hrtimer_run_queues (ffffffffa3769cf0)
=> hrtimer_interrupt (ffffffffa376a58a)
=> smp_apic_timer_interrupt (ffffffffa40028f8)
=> apic_timer_interrupt (ffffffffa4001b7f)
=> filter_pred_32 (ffffffffa37f3357)
=> filter_match_preds (ffffffffa37f3510)
=> trace_event_buffer_commit (ffffffffa37dc7eb)
=> trace_event_raw_event_softirq (ffffffffa36dab77)
=> __do_softirq (ffffffffa420025a)
=> run_ksoftirqd (ffffffffa36dadc6)
=> smpboot_thread_fn (ffffffffa36ffdb8)
=> kthread (ffffffffa36f9fb7)
=> ret_from_fork (ffffffffa4000215)
ksoftirqd/4-31 [004] 100.438516: softirq_raise: vec=9 [action=RCU]
ksoftirqd/4-31 [004] 100.438520: kernel_stack: <stack trace>
=> __ftrace_trace_stack (ffffffffa37da921)
=> __raise_softirq_irqoff (ffffffffa36daf50)
=> raise_softirq (ffffffffa36db0fe)
=> trigger_load_balance (ffffffffa371cb9c)
=> update_process_times (ffffffffa3768fc7)
=> tick_sched_handle (ffffffffa377aaa2)
=> tick_sched_timer (ffffffffa377ad53)
=> __hrtimer_run_queues (ffffffffa3769cf0)
=> hrtimer_interrupt (ffffffffa376a58a)
=> smp_apic_timer_interrupt (ffffffffa40028f8)
=> apic_timer_interrupt (ffffffffa4001b7f)
=> filter_pred_32 (ffffffffa37f3357)
=> filter_match_preds (ffffffffa37f3510)
=> trace_event_buffer_commit (ffffffffa37dc7eb)
=> trace_event_raw_event_softirq (ffffffffa36dab77)
=> __do_softirq (ffffffffa420025a)
=> run_ksoftirqd (ffffffffa36dadc6)
=> smpboot_thread_fn (ffffffffa36ffdb8)
=> kthread (ffffffffa36f9fb7)
=> ret_from_fork (ffffffffa4000215)
ksoftirqd/4-31 [004] 100.438520: softirq_raise: vec=7 [action=SCHED]
ksoftirqd/4-31 [004] 100.438521: hrtimer_expire_exit: hrtimer=0xffff92e9df91fd20
ksoftirqd/4-31 [004] 100.438523: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92631512937 softexpires=92631512937
ksoftirqd/4-31 [004] 100.438525: local_timer_exit: vector=236
ksoftirqd/4-31 [004] 100.438527: tick_stop: success=0 dependency=SCHED
---
At this point (I think) ksoftirqd is already going to sleep, so only
sysjitter will be eventually executing on CPU4, but tick_stop check
still sees nr_running == 2.
---
ksoftirqd/4-31 [004] 100.438530: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/4-31 [004] 100.438531: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/4-31 [004] 100.438533: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/4-31 [004] 100.438534: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/4-31 [004] 100.438535: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/4-31 [004] 100.438536: softirq_entry: vec=9 [action=RCU]
ksoftirqd/4-31 [004] 100.438537: softirq_exit: vec=9 [action=RCU]
ksoftirqd/4-31 [004] 100.438540: sched_switch: ksoftirqd/4:31 [120] S ==> sysjitter:2377 [120]
---
And tick again fires below.
---
sysjitter-2377 [004] 100.439509: local_timer_entry: vector=236
sysjitter-2377 [004] 100.439511: hrtimer_cancel: hrtimer=0xffff92e9df91fd20
sysjitter-2377 [004] 100.439512: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92631515408 function=tick_sched_timer/0x0
sysjitter-2377 [004] 100.439514: kernel_stack: <stack trace>
=> __ftrace_trace_stack (ffffffffa37da921)
=> __raise_softirq_irqoff (ffffffffa36daf50)
=> raise_softirq (ffffffffa36db0fe)
=> update_process_times (ffffffffa3768f9d)
=> tick_sched_handle (ffffffffa377aaa2)
=> tick_sched_timer (ffffffffa377ad53)
=> __hrtimer_run_queues (ffffffffa3769cf0)
=> hrtimer_interrupt (ffffffffa376a58a)
=> smp_apic_timer_interrupt (ffffffffa40028f8)
=> apic_timer_interrupt (ffffffffa4001b7f)
sysjitter-2377 [004] 100.439515: softirq_raise: vec=1 [action=TIMER]
sysjitter-2377 [004] 100.439517: kernel_stack: <stack trace>
=> __ftrace_trace_stack (ffffffffa37da921)
=> __raise_softirq_irqoff (ffffffffa36daf50)
=> raise_softirq (ffffffffa36db0fe)
=> trigger_load_balance (ffffffffa371cb9c)
=> update_process_times (ffffffffa3768fc7)
=> tick_sched_handle (ffffffffa377aaa2)
=> tick_sched_timer (ffffffffa377ad53)
=> __hrtimer_run_queues (ffffffffa3769cf0)
=> hrtimer_interrupt (ffffffffa376a58a)
=> smp_apic_timer_interrupt (ffffffffa40028f8)
=> apic_timer_interrupt (ffffffffa4001b7f)
sysjitter-2377 [004] 100.439518: softirq_raise: vec=7 [action=SCHED]
sysjitter-2377 [004] 100.439520: hrtimer_expire_exit: hrtimer=0xffff92e9df91fd20
sysjitter-2377 [004] 100.439521: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92632512937 softexpires=92632512937
sysjitter-2377 [004] 100.439522: local_timer_exit: vector=236
sysjitter-2377 [004] 100.439525: sched_wakeup: ksoftirqd/4:31 [120] success=1 CPU:004
sysjitter-2377 [004] 100.439527: tick_stop: success=0 dependency=SCHED
sysjitter-2377 [004] 100.439529: irq_work_entry: vector=246
sysjitter-2377 [004] 100.439531: irq_work_exit: vector=246
sysjitter-2377 [004] 100.439532: tick_stop: success=0 dependency=SCHED
sysjitter-2377 [004] 100.439534: sched_switch: sysjitter:2377 [120] R ==> ksoftirqd/4:31 [120]
ksoftirqd/4-31 [004] 100.439536: softirq_entry: vec=1 [action=TIMER]
ksoftirqd/4-31 [004] 100.439537: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/4-31 [004] 100.439538: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/4-31 [004] 100.439540: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/4-31 [004] 100.439541: sched_switch: ksoftirqd/4:31 [120] S ==> sysjitter:2377 [120]
---
History seems to repeat itself until ~2s later (this is one of the
pathological runs) when an ipi fires on CPU4 and SCHED dependecy is seen
as cleared (as only sysjitter thread was is running at that point).
Does this make sense and help in any way? Suggestions for debugging this
further? :-)
Thanks!
Juri
Hi,
On 21/05/20 19:00, Juri Lelli wrote:
> On 21/05/20 02:44, Frederic Weisbecker wrote:
> > On Wed, May 20, 2020 at 08:47:10PM +0200, Juri Lelli wrote:
> > > On 20/05/20 19:02, Frederic Weisbecker wrote:
> > > > On Wed, May 20, 2020 at 06:49:25PM +0200, Juri Lelli wrote:
> > > > > On 20/05/20 18:24, Frederic Weisbecker wrote:
> > > > >
> > > > > Hummm, so I enabled 'timer:*', anything else you think I should be
> > > > > looking at?
> > > >
> > > > Are you sure you also enabled timer_expire_entry?
> > > > Because:
> > >
> > > I run with
> > >
> > > trace-cmd record -e sched_switch -e sched_wakeup -e sched_migrate_task \
> > > -e 'timer:*' -e 'irq_vectors:*' -e 'irq:softirq*' \
> > > -e 'irq:irq_handler*' taskset --cpu-list 4-35 ./sysjitter/sysjitter \
> > > --runtime 10 200
> > >
> > > And
> > >
> > > $ grep ' timer_expire_entry:' trace.txt | wc -l
> > > 1173
> > >
> > > seems to confirm that the event was indeed enabled, e.g.
> > >
> > > ksoftirqd/11-100 [011] 159.270023: timer_expire_entry: timer=0xffff9807df8a5a60 function=delayed_work_timer_fn now=4294826418 baseclk=4294825984
> >
> > That's interesting! I'll see if I can reproduce that with threaded irqs. If not we'll
> > have to investigate on your machine. We really don't want to trigger timer softirqs
> > when it's not necessary.
> >
>
> So, I managed to get a bit more info on this, please check the following
> (which I edited a bit to hopefully make it clearer and apologies for
> long lines, but I believe trimming them would mess things up)
>
> ---
> ksoftirqd/3-26 [003] 99.942485: timer_expire_entry: timer=0xffffffffa55a9d20 function=clocksource_watchdog now=4294759328 baseclk=4294759328
> ksoftirqd/3-26 [003] 99.942489: timer_start: timer=0xffffffffa55a9d20 function=clocksource_watchdog expires=4294759822 [timeout=494] cpu=4 idx=114 flags=D|P|I
> ksoftirqd/3-26 [003] 99.942491: timer_expire_exit: timer=0xffffffffa55a9d20
> sysjitter-2377 [004] 99.942491: call_function_single_entry: vector=251
> sysjitter-2377 [004] 99.942492: call_function_single_exit: vector=251
> ksoftirqd/3-26 [003] 99.942493: softirq_exit: vec=1 [action=TIMER]
> ksoftirqd/3-26 [003] 99.942494: softirq_entry: vec=7 [action=SCHED]
> sysjitter-2377 [004] 99.942494: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92630457385 softexpires=92630457385
> ---
>
> So, my understanding is that clock gets reactivated on CPU 4 to run
> clocksource watchdog (indeed if I use tsc=nowatchdog this above doesn't
> happen, but the potential issue is still there I guess).
>
> ---
> sysjitter-2377 [004] 100.438455: local_timer_entry: vector=236
> sysjitter-2377 [004] 100.438456: hrtimer_cancel: hrtimer=0xffff92e9df91fd20
> sysjitter-2377 [004] 100.438458: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92630461337 function=tick_sched_timer/0x0
> sysjitter-2377 [004] 100.438460: kernel_stack: <stack trace>
> => __ftrace_trace_stack (ffffffffa37da921)
> => __raise_softirq_irqoff (ffffffffa36daf50)
> => raise_softirq (ffffffffa36db0fe)
> => update_process_times (ffffffffa3768f9d)
> => tick_sched_handle (ffffffffa377aaa2)
> => tick_sched_timer (ffffffffa377ad53)
> => __hrtimer_run_queues (ffffffffa3769cf0)
> => hrtimer_interrupt (ffffffffa376a58a)
> => smp_apic_timer_interrupt (ffffffffa40028f8)
> => apic_timer_interrupt (ffffffffa4001b7f)
> sysjitter-2377 [004] 100.438461: softirq_raise: vec=1 [action=TIMER]
> sysjitter-2377 [004] 100.438464: kernel_stack: <stack trace>
> => __ftrace_trace_stack (ffffffffa37da921)
> => __raise_softirq_irqoff (ffffffffa36daf50)
> => raise_softirq (ffffffffa36db0fe)
> => trigger_load_balance (ffffffffa371cb9c)
> => update_process_times (ffffffffa3768fc7)
> => tick_sched_handle (ffffffffa377aaa2)
> => tick_sched_timer (ffffffffa377ad53)
> => __hrtimer_run_queues (ffffffffa3769cf0)
> => hrtimer_interrupt (ffffffffa376a58a)
> => smp_apic_timer_interrupt (ffffffffa40028f8)
> => apic_timer_interrupt (ffffffffa4001b7f)
> sysjitter-2377 [004] 100.438465: softirq_raise: vec=7 [action=SCHED]
> ---
>
> Tick fires, the two softirq_raise (which you wondered about) seems to
> come from
>
> 1. update_process_times::run_local_timers (TIMER_SOFTIRQ)
> 2. update_process_times::scheduler_tick::trigger_load_balance (SCHED_SOFTIRQ)
>
> ---
> sysjitter-2377 [004] 100.438468: hrtimer_expire_exit: hrtimer=0xffff92e9df91fd20
> sysjitter-2377 [004] 100.438470: local_timer_exit: vector=236
> sysjitter-2377 [004] 100.438473: sched_wakeup: ksoftirqd/4:31 [120] success=1 CPU:004
> sysjitter-2377 [004] 100.438474: tick_stop: success=0 dependency=SCHED
> sysjitter-2377 [004] 100.438476: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92630512937 softexpires=92630512937
> sysjitter-2377 [004] 100.438484: irq_work_entry: vector=246
> sysjitter-2377 [004] 100.438491: irq_work_exit: vector=246
> sysjitter-2377 [004] 100.438492: tick_stop: success=0 dependency=SCHED
> sysjitter-2377 [004] 100.438495: sched_switch: sysjitter:2377 [120] R ==> ksoftirqd/4:31 [120]
> ksoftirqd/4-31 [004] 100.438497: softirq_entry: vec=1 [action=TIMER]
> ksoftirqd/4-31 [004] 100.438499: timer_cancel: timer=0xffffffffa55a9d20
> ksoftirqd/4-31 [004] 100.438501: timer_expire_entry: timer=0xffffffffa55a9d20 function=clocksource_watchdog now=4294759824 baseclk=4294759824
> ksoftirqd/4-31 [004] 100.438504: timer_start: timer=0xffffffffa55a9d20 function=clocksource_watchdog expires=4294760322 [timeout=498] cpu=5 idx=113 flags=D|P|I
> ---
>
> Queueing clocksource watchdog on CPU5.
>
> ---
> ksoftirqd/4-31 [004] 100.438506: timer_expire_exit: timer=0xffffffffa55a9d20
> ksoftirqd/4-31 [004] 100.438507: local_timer_entry: vector=236
> ---
>
> Back to back timer?
>
> ---
> ksoftirqd/4-31 [004] 100.438509: hrtimer_cancel: hrtimer=0xffff92e9df91fd20
> ksoftirqd/4-31 [004] 100.438511: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92630515022 function=tick_sched_timer/0x0
> ksoftirqd/4-31 [004] 100.438515: kernel_stack: <stack trace>
> => __ftrace_trace_stack (ffffffffa37da921)
> => __raise_softirq_irqoff (ffffffffa36daf50)
> => raise_softirq (ffffffffa36db0fe)
> => rcu_sched_clock_irq (ffffffffa375af4a)
> => update_process_times (ffffffffa3768fa4)
> => tick_sched_handle (ffffffffa377aaa2)
> => tick_sched_timer (ffffffffa377ad53)
> => __hrtimer_run_queues (ffffffffa3769cf0)
> => hrtimer_interrupt (ffffffffa376a58a)
> => smp_apic_timer_interrupt (ffffffffa40028f8)
> => apic_timer_interrupt (ffffffffa4001b7f)
> => filter_pred_32 (ffffffffa37f3357)
> => filter_match_preds (ffffffffa37f3510)
> => trace_event_buffer_commit (ffffffffa37dc7eb)
> => trace_event_raw_event_softirq (ffffffffa36dab77)
> => __do_softirq (ffffffffa420025a)
> => run_ksoftirqd (ffffffffa36dadc6)
> => smpboot_thread_fn (ffffffffa36ffdb8)
> => kthread (ffffffffa36f9fb7)
> => ret_from_fork (ffffffffa4000215)
> ksoftirqd/4-31 [004] 100.438516: softirq_raise: vec=9 [action=RCU]
> ksoftirqd/4-31 [004] 100.438520: kernel_stack: <stack trace>
> => __ftrace_trace_stack (ffffffffa37da921)
> => __raise_softirq_irqoff (ffffffffa36daf50)
> => raise_softirq (ffffffffa36db0fe)
> => trigger_load_balance (ffffffffa371cb9c)
> => update_process_times (ffffffffa3768fc7)
> => tick_sched_handle (ffffffffa377aaa2)
> => tick_sched_timer (ffffffffa377ad53)
> => __hrtimer_run_queues (ffffffffa3769cf0)
> => hrtimer_interrupt (ffffffffa376a58a)
> => smp_apic_timer_interrupt (ffffffffa40028f8)
> => apic_timer_interrupt (ffffffffa4001b7f)
> => filter_pred_32 (ffffffffa37f3357)
> => filter_match_preds (ffffffffa37f3510)
> => trace_event_buffer_commit (ffffffffa37dc7eb)
> => trace_event_raw_event_softirq (ffffffffa36dab77)
> => __do_softirq (ffffffffa420025a)
> => run_ksoftirqd (ffffffffa36dadc6)
> => smpboot_thread_fn (ffffffffa36ffdb8)
> => kthread (ffffffffa36f9fb7)
> => ret_from_fork (ffffffffa4000215)
> ksoftirqd/4-31 [004] 100.438520: softirq_raise: vec=7 [action=SCHED]
> ksoftirqd/4-31 [004] 100.438521: hrtimer_expire_exit: hrtimer=0xffff92e9df91fd20
> ksoftirqd/4-31 [004] 100.438523: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92631512937 softexpires=92631512937
> ksoftirqd/4-31 [004] 100.438525: local_timer_exit: vector=236
> ksoftirqd/4-31 [004] 100.438527: tick_stop: success=0 dependency=SCHED
> ---
>
> At this point (I think) ksoftirqd is already going to sleep, so only
> sysjitter will be eventually executing on CPU4, but tick_stop check
> still sees nr_running == 2.
>
> ---
> ksoftirqd/4-31 [004] 100.438530: softirq_exit: vec=1 [action=TIMER]
> ksoftirqd/4-31 [004] 100.438531: softirq_entry: vec=7 [action=SCHED]
> ksoftirqd/4-31 [004] 100.438533: softirq_exit: vec=7 [action=SCHED]
> ksoftirqd/4-31 [004] 100.438534: softirq_entry: vec=7 [action=SCHED]
> ksoftirqd/4-31 [004] 100.438535: softirq_exit: vec=7 [action=SCHED]
> ksoftirqd/4-31 [004] 100.438536: softirq_entry: vec=9 [action=RCU]
> ksoftirqd/4-31 [004] 100.438537: softirq_exit: vec=9 [action=RCU]
> ksoftirqd/4-31 [004] 100.438540: sched_switch: ksoftirqd/4:31 [120] S ==> sysjitter:2377 [120]
> ---
>
> And tick again fires below.
>
> ---
> sysjitter-2377 [004] 100.439509: local_timer_entry: vector=236
> sysjitter-2377 [004] 100.439511: hrtimer_cancel: hrtimer=0xffff92e9df91fd20
> sysjitter-2377 [004] 100.439512: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92631515408 function=tick_sched_timer/0x0
> sysjitter-2377 [004] 100.439514: kernel_stack: <stack trace>
> => __ftrace_trace_stack (ffffffffa37da921)
> => __raise_softirq_irqoff (ffffffffa36daf50)
> => raise_softirq (ffffffffa36db0fe)
> => update_process_times (ffffffffa3768f9d)
> => tick_sched_handle (ffffffffa377aaa2)
> => tick_sched_timer (ffffffffa377ad53)
> => __hrtimer_run_queues (ffffffffa3769cf0)
> => hrtimer_interrupt (ffffffffa376a58a)
> => smp_apic_timer_interrupt (ffffffffa40028f8)
> => apic_timer_interrupt (ffffffffa4001b7f)
> sysjitter-2377 [004] 100.439515: softirq_raise: vec=1 [action=TIMER]
> sysjitter-2377 [004] 100.439517: kernel_stack: <stack trace>
> => __ftrace_trace_stack (ffffffffa37da921)
> => __raise_softirq_irqoff (ffffffffa36daf50)
> => raise_softirq (ffffffffa36db0fe)
> => trigger_load_balance (ffffffffa371cb9c)
> => update_process_times (ffffffffa3768fc7)
> => tick_sched_handle (ffffffffa377aaa2)
> => tick_sched_timer (ffffffffa377ad53)
> => __hrtimer_run_queues (ffffffffa3769cf0)
> => hrtimer_interrupt (ffffffffa376a58a)
> => smp_apic_timer_interrupt (ffffffffa40028f8)
> => apic_timer_interrupt (ffffffffa4001b7f)
> sysjitter-2377 [004] 100.439518: softirq_raise: vec=7 [action=SCHED]
> sysjitter-2377 [004] 100.439520: hrtimer_expire_exit: hrtimer=0xffff92e9df91fd20
> sysjitter-2377 [004] 100.439521: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92632512937 softexpires=92632512937
> sysjitter-2377 [004] 100.439522: local_timer_exit: vector=236
> sysjitter-2377 [004] 100.439525: sched_wakeup: ksoftirqd/4:31 [120] success=1 CPU:004
> sysjitter-2377 [004] 100.439527: tick_stop: success=0 dependency=SCHED
> sysjitter-2377 [004] 100.439529: irq_work_entry: vector=246
> sysjitter-2377 [004] 100.439531: irq_work_exit: vector=246
> sysjitter-2377 [004] 100.439532: tick_stop: success=0 dependency=SCHED
> sysjitter-2377 [004] 100.439534: sched_switch: sysjitter:2377 [120] R ==> ksoftirqd/4:31 [120]
> ksoftirqd/4-31 [004] 100.439536: softirq_entry: vec=1 [action=TIMER]
> ksoftirqd/4-31 [004] 100.439537: softirq_exit: vec=1 [action=TIMER]
> ksoftirqd/4-31 [004] 100.439538: softirq_entry: vec=7 [action=SCHED]
> ksoftirqd/4-31 [004] 100.439540: softirq_exit: vec=7 [action=SCHED]
> ksoftirqd/4-31 [004] 100.439541: sched_switch: ksoftirqd/4:31 [120] S ==> sysjitter:2377 [120]
> ---
>
> History seems to repeat itself until ~2s later (this is one of the
> pathological runs) when an ipi fires on CPU4 and SCHED dependecy is seen
> as cleared (as only sysjitter thread was is running at that point).
>
> Does this make sense and help in any way? Suggestions for debugging this
> further? :-)
Gentle ping about this issue.
Thanks,
Juri
On Thu, May 21, 2020 at 07:00:20PM +0200, Juri Lelli wrote:
> ksoftirqd/3-26 [003] 99.942485: timer_expire_entry: timer=0xffffffffa55a9d20 function=clocksource_watchdog now=4294759328 baseclk=4294759328
> ksoftirqd/3-26 [003] 99.942489: timer_start: timer=0xffffffffa55a9d20 function=clocksource_watchdog expires=4294759822 [timeout=494] cpu=4 idx=114 flags=D|P|I
> ksoftirqd/3-26 [003] 99.942491: timer_expire_exit: timer=0xffffffffa55a9d20
> sysjitter-2377 [004] 99.942491: call_function_single_entry: vector=251
> sysjitter-2377 [004] 99.942492: call_function_single_exit: vector=251
> ksoftirqd/3-26 [003] 99.942493: softirq_exit: vec=1 [action=TIMER]
> ksoftirqd/3-26 [003] 99.942494: softirq_entry: vec=7 [action=SCHED]
> sysjitter-2377 [004] 99.942494: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92630457385 softexpires=92630457385
> ---
>
> So, my understanding is that clock gets reactivated on CPU 4 to run
> clocksource watchdog (indeed if I use tsc=nowatchdog this above doesn't
> happen, but the potential issue is still there I guess).
CPU 4 is being disturbed by an IPI. It's possibly caused by RCU, there
is a pending fix that I need to refresh and repost but you may
want to give it a try: https://lore.kernel.org/lkml/[email protected]/
>
> ---
> sysjitter-2377 [004] 100.438455: local_timer_entry: vector=236
> sysjitter-2377 [004] 100.438456: hrtimer_cancel: hrtimer=0xffff92e9df91fd20
> sysjitter-2377 [004] 100.438458: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92630461337 function=tick_sched_timer/0x0
> sysjitter-2377 [004] 100.438460: kernel_stack: <stack trace>
> => __ftrace_trace_stack (ffffffffa37da921)
> => __raise_softirq_irqoff (ffffffffa36daf50)
> => raise_softirq (ffffffffa36db0fe)
> => update_process_times (ffffffffa3768f9d)
> => tick_sched_handle (ffffffffa377aaa2)
> => tick_sched_timer (ffffffffa377ad53)
> => __hrtimer_run_queues (ffffffffa3769cf0)
> => hrtimer_interrupt (ffffffffa376a58a)
> => smp_apic_timer_interrupt (ffffffffa40028f8)
> => apic_timer_interrupt (ffffffffa4001b7f)
> sysjitter-2377 [004] 100.438461: softirq_raise: vec=1 [action=TIMER]
> sysjitter-2377 [004] 100.438464: kernel_stack: <stack trace>
> => __ftrace_trace_stack (ffffffffa37da921)
> => __raise_softirq_irqoff (ffffffffa36daf50)
> => raise_softirq (ffffffffa36db0fe)
> => trigger_load_balance (ffffffffa371cb9c)
> => update_process_times (ffffffffa3768fc7)
> => tick_sched_handle (ffffffffa377aaa2)
> => tick_sched_timer (ffffffffa377ad53)
> => __hrtimer_run_queues (ffffffffa3769cf0)
> => hrtimer_interrupt (ffffffffa376a58a)
> => smp_apic_timer_interrupt (ffffffffa40028f8)
> => apic_timer_interrupt (ffffffffa4001b7f)
> sysjitter-2377 [004] 100.438465: softirq_raise: vec=7 [action=SCHED]
> ---
>
> Tick fires, the two softirq_raise (which you wondered about) seems to
> come from
>
> 1. update_process_times::run_local_timers (TIMER_SOFTIRQ)
> 2. update_process_times::scheduler_tick::trigger_load_balance (SCHED_SOFTIRQ)
Well, the places raising the softirqs are always the same. Now how the timer
internals can decide to raise the TIMER_SOFTIRQ when there is no callback
expired is more mysterious.
>
> ---
> sysjitter-2377 [004] 100.438468: hrtimer_expire_exit: hrtimer=0xffff92e9df91fd20
> sysjitter-2377 [004] 100.438470: local_timer_exit: vector=236
> sysjitter-2377 [004] 100.438473: sched_wakeup: ksoftirqd/4:31 [120] success=1 CPU:004
> sysjitter-2377 [004] 100.438474: tick_stop: success=0 dependency=SCHED
> sysjitter-2377 [004] 100.438476: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92630512937 softexpires=92630512937
> sysjitter-2377 [004] 100.438484: irq_work_entry: vector=246
> sysjitter-2377 [004] 100.438491: irq_work_exit: vector=246
> sysjitter-2377 [004] 100.438492: tick_stop: success=0 dependency=SCHED
> sysjitter-2377 [004] 100.438495: sched_switch: sysjitter:2377 [120] R ==> ksoftirqd/4:31 [120]
> ksoftirqd/4-31 [004] 100.438497: softirq_entry: vec=1 [action=TIMER]
> ksoftirqd/4-31 [004] 100.438499: timer_cancel: timer=0xffffffffa55a9d20
> ksoftirqd/4-31 [004] 100.438501: timer_expire_entry: timer=0xffffffffa55a9d20 function=clocksource_watchdog now=4294759824 baseclk=4294759824
> ksoftirqd/4-31 [004] 100.438504: timer_start: timer=0xffffffffa55a9d20 function=clocksource_watchdog expires=4294760322 [timeout=498] cpu=5 idx=113 flags=D|P|I
> ---
>
> Queueing clocksource watchdog on CPU5.
>
> ---
> ksoftirqd/4-31 [004] 100.438506: timer_expire_exit: timer=0xffffffffa55a9d20
> ksoftirqd/4-31 [004] 100.438507: local_timer_entry: vector=236
> ---
>
> Back to back timer?
It looks like the tick has been reprogrammed very early indeed.
>
> ---
> ksoftirqd/4-31 [004] 100.438509: hrtimer_cancel: hrtimer=0xffff92e9df91fd20
> ksoftirqd/4-31 [004] 100.438511: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92630515022 function=tick_sched_timer/0x0
> ksoftirqd/4-31 [004] 100.438515: kernel_stack: <stack trace>
> => __ftrace_trace_stack (ffffffffa37da921)
> => __raise_softirq_irqoff (ffffffffa36daf50)
> => raise_softirq (ffffffffa36db0fe)
> => rcu_sched_clock_irq (ffffffffa375af4a)
> => update_process_times (ffffffffa3768fa4)
> => tick_sched_handle (ffffffffa377aaa2)
> => tick_sched_timer (ffffffffa377ad53)
> => __hrtimer_run_queues (ffffffffa3769cf0)
> => hrtimer_interrupt (ffffffffa376a58a)
> => smp_apic_timer_interrupt (ffffffffa40028f8)
> => apic_timer_interrupt (ffffffffa4001b7f)
> => filter_pred_32 (ffffffffa37f3357)
> => filter_match_preds (ffffffffa37f3510)
> => trace_event_buffer_commit (ffffffffa37dc7eb)
> => trace_event_raw_event_softirq (ffffffffa36dab77)
> => __do_softirq (ffffffffa420025a)
> => run_ksoftirqd (ffffffffa36dadc6)
> => smpboot_thread_fn (ffffffffa36ffdb8)
> => kthread (ffffffffa36f9fb7)
> => ret_from_fork (ffffffffa4000215)
> ksoftirqd/4-31 [004] 100.438516: softirq_raise: vec=9 [action=RCU]
> ksoftirqd/4-31 [004] 100.438520: kernel_stack: <stack trace>
> => __ftrace_trace_stack (ffffffffa37da921)
> => __raise_softirq_irqoff (ffffffffa36daf50)
> => raise_softirq (ffffffffa36db0fe)
> => trigger_load_balance (ffffffffa371cb9c)
> => update_process_times (ffffffffa3768fc7)
> => tick_sched_handle (ffffffffa377aaa2)
> => tick_sched_timer (ffffffffa377ad53)
> => __hrtimer_run_queues (ffffffffa3769cf0)
> => hrtimer_interrupt (ffffffffa376a58a)
> => smp_apic_timer_interrupt (ffffffffa40028f8)
> => apic_timer_interrupt (ffffffffa4001b7f)
> => filter_pred_32 (ffffffffa37f3357)
> => filter_match_preds (ffffffffa37f3510)
> => trace_event_buffer_commit (ffffffffa37dc7eb)
> => trace_event_raw_event_softirq (ffffffffa36dab77)
> => __do_softirq (ffffffffa420025a)
> => run_ksoftirqd (ffffffffa36dadc6)
> => smpboot_thread_fn (ffffffffa36ffdb8)
> => kthread (ffffffffa36f9fb7)
> => ret_from_fork (ffffffffa4000215)
> ksoftirqd/4-31 [004] 100.438520: softirq_raise: vec=7 [action=SCHED]
> ksoftirqd/4-31 [004] 100.438521: hrtimer_expire_exit: hrtimer=0xffff92e9df91fd20
> ksoftirqd/4-31 [004] 100.438523: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92631512937 softexpires=92631512937
> ksoftirqd/4-31 [004] 100.438525: local_timer_exit: vector=236
> ksoftirqd/4-31 [004] 100.438527: tick_stop: success=0 dependency=SCHED
> ---
>
> At this point (I think) ksoftirqd is already going to sleep, so only
> sysjitter will be eventually executing on CPU4, but tick_stop check
> still sees nr_running == 2.
As long at it hasn't been dequeued from the runqueue, tick_stop sees nr_running == 2.
>
> ---
> ksoftirqd/4-31 [004] 100.438530: softirq_exit: vec=1 [action=TIMER]
Was there a corresponding softirq_enter? With timers expiring inside?
> ksoftirqd/4-31 [004] 100.438531: softirq_entry: vec=7 [action=SCHED]
> ksoftirqd/4-31 [004] 100.438533: softirq_exit: vec=7 [action=SCHED]
> ksoftirqd/4-31 [004] 100.438534: softirq_entry: vec=7 [action=SCHED]
> ksoftirqd/4-31 [004] 100.438535: softirq_exit: vec=7 [action=SCHED]
> ksoftirqd/4-31 [004] 100.438536: softirq_entry: vec=9 [action=RCU]
> ksoftirqd/4-31 [004] 100.438537: softirq_exit: vec=9 [action=RCU]
> ksoftirqd/4-31 [004] 100.438540: sched_switch: ksoftirqd/4:31 [120] S ==> sysjitter:2377 [120]
> ---
>
> And tick again fires below.
>
> ---
> sysjitter-2377 [004] 100.439509: local_timer_entry: vector=236
> sysjitter-2377 [004] 100.439511: hrtimer_cancel: hrtimer=0xffff92e9df91fd20
> sysjitter-2377 [004] 100.439512: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92631515408 function=tick_sched_timer/0x0
Right that's expected behaviour.
> sysjitter-2377 [004] 100.439518: softirq_raise: vec=7 [action=SCHED]
> sysjitter-2377 [004] 100.439520: hrtimer_expire_exit: hrtimer=0xffff92e9df91fd20
> sysjitter-2377 [004] 100.439521: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92632512937 softexpires=92632512937
> sysjitter-2377 [004] 100.439522: local_timer_exit: vector=236
> sysjitter-2377 [004] 100.439525: sched_wakeup: ksoftirqd/4:31 [120] success=1 CPU:004
> sysjitter-2377 [004] 100.439527: tick_stop: success=0 dependency=SCHED
> sysjitter-2377 [004] 100.439529: irq_work_entry: vector=246
> sysjitter-2377 [004] 100.439531: irq_work_exit: vector=246
> sysjitter-2377 [004] 100.439532: tick_stop: success=0 dependency=SCHED
> sysjitter-2377 [004] 100.439534: sched_switch: sysjitter:2377 [120] R ==> ksoftirqd/4:31 [120]
> ksoftirqd/4-31 [004] 100.439536: softirq_entry: vec=1 [action=TIMER]
> ksoftirqd/4-31 [004] 100.439537: softirq_exit: vec=1 [action=TIMER]
The empty timer softirq again.
> ksoftirqd/4-31 [004] 100.439538: softirq_entry: vec=7 [action=SCHED]
> ksoftirqd/4-31 [004] 100.439540: softirq_exit: vec=7 [action=SCHED]
I'm also wondering if it's expected to see the sched softirq repeating so often.
> ksoftirqd/4-31 [004] 100.439541: sched_switch: ksoftirqd/4:31 [120] S ==> sysjitter:2377 [120]
> ---
>
> History seems to repeat itself until ~2s later (this is one of the
> pathological runs) when an ipi fires on CPU4 and SCHED dependecy is seen
> as cleared (as only sysjitter thread was is running at that point).
>
> Does this make sense and help in any way? Suggestions for debugging this
> further? :-)
Let's see if I can reproduce it first. If not I'll need to bother you further :)
Thanks!
On 15/06/20 23:07, Frederic Weisbecker wrote:
> On Thu, May 21, 2020 at 07:00:20PM +0200, Juri Lelli wrote:
[...]
> > sysjitter-2377 [004] 100.438495: sched_switch: sysjitter:2377 [120] R ==> ksoftirqd/4:31 [120]
> > ksoftirqd/4-31 [004] 100.438497: softirq_entry: vec=1 [action=TIMER]
> > ksoftirqd/4-31 [004] 100.438499: timer_cancel: timer=0xffffffffa55a9d20
> > ksoftirqd/4-31 [004] 100.438501: timer_expire_entry: timer=0xffffffffa55a9d20 function=clocksource_watchdog now=4294759824 baseclk=4294759824
> > ksoftirqd/4-31 [004] 100.438504: timer_start: timer=0xffffffffa55a9d20 function=clocksource_watchdog expires=4294760322 [timeout=498] cpu=5 idx=113 flags=D|P|I
[...]
> > ksoftirqd/4-31 [004] 100.438509: hrtimer_cancel: hrtimer=0xffff92e9df91fd20
> > ksoftirqd/4-31 [004] 100.438511: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92630515022 function=tick_sched_timer/0x0
> > ksoftirqd/4-31 [004] 100.438515: kernel_stack: <stack trace>
> > => __ftrace_trace_stack (ffffffffa37da921)
> > => __raise_softirq_irqoff (ffffffffa36daf50)
> > => raise_softirq (ffffffffa36db0fe)
> > => rcu_sched_clock_irq (ffffffffa375af4a)
> > => update_process_times (ffffffffa3768fa4)
> > => tick_sched_handle (ffffffffa377aaa2)
> > => tick_sched_timer (ffffffffa377ad53)
> > => __hrtimer_run_queues (ffffffffa3769cf0)
> > => hrtimer_interrupt (ffffffffa376a58a)
> > => smp_apic_timer_interrupt (ffffffffa40028f8)
> > => apic_timer_interrupt (ffffffffa4001b7f)
> > => filter_pred_32 (ffffffffa37f3357)
> > => filter_match_preds (ffffffffa37f3510)
> > => trace_event_buffer_commit (ffffffffa37dc7eb)
> > => trace_event_raw_event_softirq (ffffffffa36dab77)
> > => __do_softirq (ffffffffa420025a)
> > => run_ksoftirqd (ffffffffa36dadc6)
> > => smpboot_thread_fn (ffffffffa36ffdb8)
> > => kthread (ffffffffa36f9fb7)
> > => ret_from_fork (ffffffffa4000215)
> > ksoftirqd/4-31 [004] 100.438516: softirq_raise: vec=9 [action=RCU]
> > ksoftirqd/4-31 [004] 100.438520: kernel_stack: <stack trace>
> > => __ftrace_trace_stack (ffffffffa37da921)
> > => __raise_softirq_irqoff (ffffffffa36daf50)
> > => raise_softirq (ffffffffa36db0fe)
> > => trigger_load_balance (ffffffffa371cb9c)
> > => update_process_times (ffffffffa3768fc7)
> > => tick_sched_handle (ffffffffa377aaa2)
> > => tick_sched_timer (ffffffffa377ad53)
> > => __hrtimer_run_queues (ffffffffa3769cf0)
> > => hrtimer_interrupt (ffffffffa376a58a)
> > => smp_apic_timer_interrupt (ffffffffa40028f8)
> > => apic_timer_interrupt (ffffffffa4001b7f)
> > => filter_pred_32 (ffffffffa37f3357)
> > => filter_match_preds (ffffffffa37f3510)
> > => trace_event_buffer_commit (ffffffffa37dc7eb)
> > => trace_event_raw_event_softirq (ffffffffa36dab77)
> > => __do_softirq (ffffffffa420025a)
> > => run_ksoftirqd (ffffffffa36dadc6)
> > => smpboot_thread_fn (ffffffffa36ffdb8)
> > => kthread (ffffffffa36f9fb7)
> > => ret_from_fork (ffffffffa4000215)
> > ksoftirqd/4-31 [004] 100.438520: softirq_raise: vec=7 [action=SCHED]
> > ksoftirqd/4-31 [004] 100.438521: hrtimer_expire_exit: hrtimer=0xffff92e9df91fd20
> > ksoftirqd/4-31 [004] 100.438523: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92631512937 softexpires=92631512937
> > ksoftirqd/4-31 [004] 100.438525: local_timer_exit: vector=236
> > ksoftirqd/4-31 [004] 100.438527: tick_stop: success=0 dependency=SCHED
[...]
> > ksoftirqd/4-31 [004] 100.438530: softirq_exit: vec=1 [action=TIMER]
>
> Was there a corresponding softirq_enter? With timers expiring inside?
The one up above, at 100.438497.
[...]
> >
> > Does this make sense and help in any way? Suggestions for debugging this
> > further? :-)
>
> Let's see if I can reproduce it first. If not I'll need to bother you further :)
Sure. Let me know if you find anything.
Best,
Juri
On Tue, Jun 16, 2020 at 08:57:57AM +0200, Juri Lelli wrote:
> Sure. Let me know if you find anything.
I managed to reproduce. With "threadirqs" and without
"tsc=reliable". I see tons of spurious TIMER softirqs.
Investigation begins! I'll let you know.
Thanks!
Hi,
On 16/06/20 22:46, Frederic Weisbecker wrote:
> On Tue, Jun 16, 2020 at 08:57:57AM +0200, Juri Lelli wrote:
> > Sure. Let me know if you find anything.
>
> I managed to reproduce. With "threadirqs" and without
> "tsc=reliable". I see tons of spurious TIMER softirqs.
> Investigation begins! I'll let you know.
Gentle ping. Any new findings about this issue?
Thanks,
Juri
On 29/06/20 14:42, Frederic Weisbecker wrote:
> On Mon, Jun 29, 2020 at 02:36:51PM +0200, Juri Lelli wrote:
> > Hi,
> >
> > On 16/06/20 22:46, Frederic Weisbecker wrote:
> > > On Tue, Jun 16, 2020 at 08:57:57AM +0200, Juri Lelli wrote:
> > > > Sure. Let me know if you find anything.
> > >
> > > I managed to reproduce. With "threadirqs" and without
> > > "tsc=reliable". I see tons of spurious TIMER softirqs.
> > > Investigation begins! I'll let you know.
> >
> > Gentle ping. Any new findings about this issue?
>
> Yep, I'm working on a series to reduce timer softirqs. I'm almost
> done, should post it soon.
Great!
Looking forward to it.
Best,
Juri
On Mon, Jun 29, 2020 at 02:36:51PM +0200, Juri Lelli wrote:
> Hi,
>
> On 16/06/20 22:46, Frederic Weisbecker wrote:
> > On Tue, Jun 16, 2020 at 08:57:57AM +0200, Juri Lelli wrote:
> > > Sure. Let me know if you find anything.
> >
> > I managed to reproduce. With "threadirqs" and without
> > "tsc=reliable". I see tons of spurious TIMER softirqs.
> > Investigation begins! I'll let you know.
>
> Gentle ping. Any new findings about this issue?
Yep, I'm working on a series to reduce timer softirqs. I'm almost
done, should post it soon.
I believe you guessed right:
* Timer tick raises timer softirq
* Ksoftirqd wakes up, restarts the tick
* Timer tick raises timer softirq
* ...
Thanks.