Greetings,
Like so...
#include <unistd.h>
#include <time.h>
#include <sys/time.h>
#include <linux/futex.h>
#include <sys/syscall.h>

int sys_futex(void *addr1, int op, int val1, struct timespec *timeout, void *addr2, int val3)
{
	return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3);
}

int main(void)
{
	struct timespec t;
	int f = 1;

	clock_gettime(CLOCK_REALTIME, &t);
	t.tv_sec -= 10;
	while (1) {
		sys_futex(&f, FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME, 1, &t, NULL, FUTEX_BITSET_MATCH_ANY);
	}
}
The above was handed to me by a colleague working on a Xen guest that
livelocked. I at first thought Xen arch must have a weird problem, but
when I tried proggy on my desktop box, while it didn't stop the tick
completely as it did the Xen box, it slowed it to a crawl. I noticed
that this did not happen with newer kernels, so a bisecting I did go,
and found that...
279f14614 x86: apic: Use tsc deadline for oneshot when available
..is what fixed it up. Trouble is, while it fixes up my Haswell box, a
Xen dom0 remains busted by that testcase whether that patch is applied
to the host or not, even though the hypervisor supports deadline timer,
and seemingly regardless of CPU type altogether.
Of all the x86_64 bare metal boxen I've tested, only those with the TSC
deadline timer have shown the issue, and there it goes away as of v3.8
unless you boot lapic=notscdeadline.
However, given any x86_64 Intel box with TSC deadline timer (ivy, sandy,
hasbeen) can be made to exhibit the symptom, there may be other arches
that get seriously dinged up or maybe even as thoroughly b0rked as Xen
does when hrtimer_interrupt() is pounded into the ground by userspace.
Alternatively, should someone out there know that all bare metal is in
fact fine post 279f14614, that person will likely also know what the Xen
folks need to do to fix up their busted arch.
The below targets the symptom; consider it hrtimer cluebat attractant.
---
kernel/time/hrtimer.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -933,6 +933,8 @@ remove_hrtimer(struct hrtimer *timer, st
 	return 0;
 }
 
+static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
+
 int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 		unsigned long delta_ns, const enum hrtimer_mode mode,
 		int wakeup)
@@ -980,8 +982,27 @@ int __hrtimer_start_range_ns(struct hrti
 		 * on dynticks target.
 		 */
 		wake_up_nohz_cpu(new_base->cpu_base->cpu);
-	} else if (new_base->cpu_base == this_cpu_ptr(&hrtimer_bases) &&
-			hrtimer_reprogram(timer, new_base)) {
+	} else if (new_base->cpu_base == this_cpu_ptr(&hrtimer_bases)) {
+		int res = hrtimer_reprogram(timer, new_base);
+
+		if (!res)
+			goto out;
+
+		/*
+		 * If a buggy app tries forever to be awakened in the past,
+		 * banging on hrtimer_interrupt() at high speed can stall
+		 * the tick, and on a Xen box, forever. On haswell with
+		 * tsc_deadline_timer disabled you can see it, though it
+		 * only slows the tick way down. Other bare metal boxes
+		 * may also be terminally affected.
+		 */
+		if (unlikely(wakeup && !ret && IS_ERR_VALUE(res) &&
+			     timer->function == hrtimer_wakeup)) {
+			debug_deactivate(timer);
+			__remove_hrtimer(timer, new_base, 0, 0);
+			ret = -ETIMEDOUT;
+		}
+
 		/*
 		 * Only allow reprogramming if the new base is on this CPU.
 		 * (it might still be on another CPU if the timer was pending)
@@ -994,7 +1015,10 @@ int __hrtimer_start_range_ns(struct hrti
 			 * lock ordering issue vs. rq->lock.
 			 */
 			raw_spin_unlock(&new_base->cpu_base->lock);
-			raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+			if (!IS_ERR_VALUE(ret))
+				raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+			else
+				hrtimer_wakeup(timer);
 			local_irq_restore(flags);
 			return ret;
 		} else {
@@ -1002,6 +1026,7 @@ int __hrtimer_start_range_ns(struct hrti
 		}
 	}
 
+out:
 	unlock_hrtimer_base(timer, &flags);
 
 	return ret;
On Wed, 10 Jun 2015, Mike Galbraith wrote:
> The above was handed to me by a colleague working on a Xen guest that
> livelocked. I at first thought Xen arch must have a weird problem, but
> when I tried proggy on my desktop box, while it didn't stop the tick
> completely as it did the Xen box, it slowed it to a crawl. I noticed
> that this did not happen with newer kernels, so a bisecting I did go,
> and found that...
>
> 279f14614 x86: apic: Use tsc deadline for oneshot when available
>
> ..is what fixed it up. Trouble is, while it fixes up my Haswell box, a
This does not make any sense at all. It does not matter whether the
box uses tscdeadline or local apic timer. We do not even program the
hardware because we see that the event is in the past already.
So we raise the hrtimer softirq, which then expires the timer. So all
that happens is that ksoftirqd accumulates runtime, but I cannot at
all see how that amounts to a DoS and brings the machine to a grinding
halt.
> Xen dom0 remains busted by that testcase whether that patch is applied
> to the host or not, even though the hypervisor supports deadline timer,
> and seemingly regardless of CPU type all together.
>
> Of all the x86_64 bare metal boxen I've tested, only those with the TSC
> deadline timer have shown the issue, and there it goes away as of v3.8
> unless you boot lapic=notscdeadline.
I just booted a SNB with lapic=notscdeadline and ran that test
program. All that happens is - as expected - that ksoftirqd runs more
than we would like it to. I cannot observe any anomaly vs. local
timer interrupts at all. If I run this pinned on an otherwise idle
core, then I get ~ CONFIG_HZ interrupts per second, which is what you
expect when the cpu never reaches idle.
With the changes pending in tip/timers/core we get more timer
interrupts instead of offloading crap to ksoftirqd, but they cannot
lead to a DoS either and we do not care whether the user spends its
cycles looping in user space or firing timer interrupts. It can only
do so as long as it is on the cpu.
These timers (futex, nanosleep, poll, ...) are oneshot and all timers
which are self rearming are rate limited by the fact that we only
rearm when the previous event has been consumed by the task which
scheduled it. So the scheduler controls how many of these events can
be created from user space.
> However, given any x86_64 Intel box with TSC deadline timer (ivy, sandy,
> hasbeen) can be made to exhibit the symptom, there may be other arches
> that get seriously dinged up or maybe even as thoroughly b0rked as Xen
> does when hrtimer_interrupt() is pounded into the ground by userspace.
>
> Alternatively, should someone out there know that all bare metal is in
> fact fine post 279f14614, that person will likely also know what the Xen
> folks need to do to fix up their busted arch.
>
> The below targets the symptom, consider it hrtimer cluebat attractant.
By now I know to take your patches with a grain of salt :)
Some more information about your symptoms in form of configuration,
extra patches, kernel traces etc. would be appreciated.
Thanks,
tglx
On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> On Wed, 10 Jun 2015, Mike Galbraith wrote:
> > The above was handed to me by a colleague working on a Xen guest that
> > livelocked. I at first thought Xen arch must have a weird problem, but
> > when I tried proggy on my desktop box, while it didn't stop the tick
> > completely as it did the Xen box, it slowed it to a crawl. I noticed
> > that this did not happen with newer kernels, so a bisecting I did go,
> > and found that...
> >
> > 279f14614 x86: apic: Use tsc deadline for oneshot when available
> >
> > ..is what fixed it up. Trouble is, while it fixes up my Haswell box, a
>
> This does not make any sense at all. It does not matter whether the
> box uses tscdeadline or local apic timer. We do not even program the
> hardware because we see that the event is in the past already.
Yup.
> So we raise the hrtimer softirq, which then expires the timer. So all
> that happens is that ksoftirqd accumulates runtime, but I cannot at
> all see how that amounts to a DoS and brings the machine to a grinding
> halt.
The tick certainly appears to crawl here, and Dom0 boxen gripe if you
let them not tick at all for a while.
> I just booted a SNB with lapic=notscdeadline and ran that test
> program. All that happens is - as expected - that ksoftirqd runs more
> than we would like it to. I cannot observe any anomaly vs. local
> timer interrupts at all. If I run this pinned on an otherwise idle
> core, then I get ~ CONFIG_HZ interrupts per second, which is what you
> expect when the cpu never reaches idle.
Hm. In order to successfully bisect the thing 3.7->3.8 I ran 2xCPUS
copies because the first bisect went gaga. I'm not having any trouble
reproducing on master with a single pinned copy though, nor did I have
any trouble on any of the kernels, either stable or enterprise, that I
tested, and that's quite a few. Whatever, that first bisect did go bad.
> > The below targets the symptom, consider it hrtimer cluebat attractant.
>
> By now I know to take your patches with a grain of salt :)
Sodium being bad for blood pressure is a medical myth.
> Some more information about your symptoms in form of configuration,
> extra patches, kernel traces etc. would be appreciated.
Virgin source or kernels with zillion+ patches, doesn't matter. To test
virgin source earlier than EFI_STUB I had to pollute the source with
EFI backports, but nothing else.
Just a sec while I check yet again that absolutely virgin master really
really does stall.... Yup. I pinned the testcase to CPU3..
while sleep 1; do grep LOC /proc/interrupts; done
LOC: 6706 5367 5053 6217 3031 2866 5477 3022 Local timer interrupts
LOC: 6753 5391 5074 6238 3058 2894 5576 3034 Local timer interrupts
LOC: 6791 5422 5104 6265 3066 2903 5582 3039 Local timer interrupts
LOC: 6846 5472 5154 6293 3096 2909 5595 3042 Local timer interrupts
LOC: 6855 5518 5177 6325 3199 2920 5613 3046 Local timer interrupts
LOC: 6892 5552 5217 6338 3234 2935 5637 3053 Local timer interrupts
LOC: 6983 5568 5236 6347 3244 2944 5660 3065 Local timer interrupts
LOC: 7028 5583 5251 6363 3262 2963 5673 3071 Local timer interrupts
LOC: 7217 5676 5343 6383 3305 2976 5682 3078 Local timer interrupts
LOC: 7432 5803 5418 6387 3371 3039 5757 3080 Local timer interrupts <== here
LOC: 7560 6028 5632 6394 3538 3195 5937 3084 Local timer interrupts
LOC: 7747 6135 5720 6394 3543 3262 6087 3086 Local timer interrupts
LOC: 7930 6206 5785 6394 3571 3288 6303 3087 Local timer interrupts
LOC: 8057 6299 5842 6394 3606 3346 6415 3088 Local timer interrupts
LOC: 8236 6361 5921 6394 3632 3409 6630 3090 Local timer interrupts
LOC: 8382 6448 6004 6394 3664 3478 6754 3090 Local timer interrupts
LOC: 8460 6571 6124 6394 3690 3542 6951 3092 Local timer interrupts
LOC: 8605 6670 6224 6394 3723 3614 7078 3093 Local timer interrupts
LOC: 8710 6842 6323 6394 3776 3702 7295 3123 Local timer interrupts
LOC: 8868 6947 6402 6394 3828 3784 7422 3149 Local timer interrupts
LOC: 9077 7124 6523 6394 3901 3848 7637 3172 Local timer interrupts
LOC: 9222 7189 6596 6394 3971 3928 7763 3174 Local timer interrupts
LOC: 9336 7325 6699 6394 4020 3948 7912 3176 Local timer interrupts
LOC: 9423 7414 6849 6395 4089 3979 7940 3177 Local timer interrupts
LOC: 9637 7595 6923 6395 4111 4039 7942 3179 Local timer interrupts
LOC: 9807 7734 7095 6395 4232 4108 8069 3180 Local timer interrupts
^C
Config attached.
-Mike
On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> Some more information about your symptoms in form of configuration,
> extra patches, kernel traces etc. would be appreciated.
homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
homer:~/tmp # perf script | grep irq_vectors:local_timer
ksoftirqd/3 19 [003] 5586.564238: irq_vectors:local_timer_entry: vector=239
ksoftirqd/3 19 [003] 5586.564239: irq_vectors:local_timer_exit: vector=239
ksoftirqd/3 19 [003] 5586.568238: irq_vectors:local_timer_entry: vector=239
ksoftirqd/3 19 [003] 5586.568239: irq_vectors:local_timer_exit: vector=239
ksoftirqd/3 19 [003] 5586.632218: irq_vectors:local_timer_entry: vector=239
ksoftirqd/3 19 [003] 5586.632219: irq_vectors:local_timer_exit: vector=239
bnc886785_livel 9817 [003] 5587.056069: irq_vectors:local_timer_entry: vector=239
bnc886785_livel 9817 [003] 5587.056071: irq_vectors:local_timer_exit: vector=239
ksoftirqd/3 19 [003] 5587.307978: irq_vectors:local_timer_entry: vector=239
ksoftirqd/3 19 [003] 5587.307978: irq_vectors:local_timer_exit: vector=239
ksoftirqd/3 19 [003] 5588.255633: irq_vectors:local_timer_entry: vector=239
ksoftirqd/3 19 [003] 5588.255635: irq_vectors:local_timer_exit: vector=239
ksoftirqd/3 19 [003] 5589.627132: irq_vectors:local_timer_entry: vector=239
ksoftirqd/3 19 [003] 5589.627133: irq_vectors:local_timer_exit: vector=239
bnc886785_livel 9817 [003] 5590.334877: irq_vectors:local_timer_entry: vector=239
bnc886785_livel 9817 [003] 5590.334883: irq_vectors:local_timer_exit: vector=239
bnc886785_livel 9817 [003] 5590.510815: irq_vectors:local_timer_entry: vector=239
bnc886785_livel 9817 [003] 5590.510816: irq_vectors:local_timer_exit: vector=239
ksoftirqd/3 19 [003] 5590.558794: irq_vectors:local_timer_entry: vector=239
ksoftirqd/3 19 [003] 5590.558796: irq_vectors:local_timer_exit: vector=239
bnc886785_livel 9817 [003] 5590.822707: irq_vectors:local_timer_entry: vector=239
bnc886785_livel 9817 [003] 5590.822709: irq_vectors:local_timer_exit: vector=239
ksoftirqd/3 19 [003] 5592.346152: irq_vectors:local_timer_entry: vector=239
ksoftirqd/3 19 [003] 5592.346153: irq_vectors:local_timer_exit: vector=239
bnc886785_livel 9817 [003] 5592.558074: irq_vectors:local_timer_entry: vector=239
bnc886785_livel 9817 [003] 5592.558076: irq_vectors:local_timer_exit: vector=239
bnc886785_livel 9817 [003] 5593.741647: irq_vectors:local_timer_entry: vector=239
bnc886785_livel 9817 [003] 5593.741649: irq_vectors:local_timer_exit: vector=239
ksoftirqd/3 19 [003] 5594.953208: irq_vectors:local_timer_entry: vector=239
ksoftirqd/3 19 [003] 5594.953209: irq_vectors:local_timer_exit: vector=239
bnc886785_livel 9817 [003] 5595.009197: irq_vectors:local_timer_entry: vector=239
bnc886785_livel 9817 [003] 5595.009198: irq_vectors:local_timer_exit: vector=239
ksoftirqd/3 19 [003] 5595.397049: irq_vectors:local_timer_entry: vector=239
ksoftirqd/3 19 [003] 5595.397050: irq_vectors:local_timer_exit: vector=239
bnc886785_livel 9817 [003] 5595.473025: irq_vectors:local_timer_entry: vector=239
bnc886785_livel 9817 [003] 5595.473027: irq_vectors:local_timer_exit: vector=239
homer:~/tmp #
On Wed, 10 Jun 2015, Mike Galbraith wrote:
> Virgin source or kernels with zillion+ patches, doesn't matter. To test
> virgin source earlier than EFI_STUB I had to pollute the source with
> EFI backports, but nothing else.
>
> Just a sec while I check yet again that absolutely virgin master really
> really does stall.... Yup. I pinned the testcase to CPU3..
>
> LOC: 7432 5803 5418 6387 3371 3039 5757 3080 Local timer interrupts <== here
> LOC: 7560 6028 5632 6394 3538 3195 5937 3084 Local timer interrupts
> LOC: 7747 6135 5720 6394 3543 3262 6087 3086 Local timer interrupts
> LOC: 7930 6206 5785 6394 3571 3288 6303 3087 Local timer interrupts
So now it would be helpful to have a trace to see what CPU3 is
actually doing.
Thanks,
tglx
On Wed, 10 Jun 2015, Mike Galbraith wrote:
> On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
>
> > Some more information about your symptoms in form of configuration,
> > extra patches, kernel traces etc. would be appreciated.
>
> homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
> [ perf record: Woken up 2 times to write data ]
> [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
> homer:~/tmp # perf script | grep irq_vectors:local_timer
> ksoftirqd/3 19 [003] 5586.564238: irq_vectors:local_timer_entry: vector=239
> ksoftirqd/3 19 [003] 5586.564239: irq_vectors:local_timer_exit: vector=239
> ksoftirqd/3 19 [003] 5586.568238: irq_vectors:local_timer_entry: vector=239
> ksoftirqd/3 19 [003] 5586.568239: irq_vectors:local_timer_exit: vector=239
> ksoftirqd/3 19 [003] 5586.632218: irq_vectors:local_timer_entry: vector=239
That gap is odd. Can you fire up the function tracer please?
Thanks,
tglx
On Wed, 10 Jun 2015 20:59:49 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:
> On Wed, 10 Jun 2015, Mike Galbraith wrote:
>
> > On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> >
> > > Some more information about your symptoms in form of configuration,
> > > extra patches, kernel traces etc. would be appreciated.
> >
> > homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
I would recommend using trace-cmd especially if you add function
tracing, as it's made more for fast tracing.
-- Steve
> > [ perf record: Woken up 2 times to write data ]
> > [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
> > homer:~/tmp # perf script | grep irq_vectors:local_timer
> > ksoftirqd/3 19 [003] 5586.564238: irq_vectors:local_timer_entry: vector=239
> > ksoftirqd/3 19 [003] 5586.564239: irq_vectors:local_timer_exit: vector=239
> > ksoftirqd/3 19 [003] 5586.568238: irq_vectors:local_timer_entry: vector=239
> > ksoftirqd/3 19 [003] 5586.568239: irq_vectors:local_timer_exit: vector=239
> > ksoftirqd/3 19 [003] 5586.632218: irq_vectors:local_timer_entry: vector=239
>
> That gap is odd. Can you fire up the function tracer please?
>
> Thanks,
>
> tglx
On Wed, 10 Jun 2015, Steven Rostedt wrote:
> On Wed, 10 Jun 2015 20:59:49 +0200 (CEST)
> Thomas Gleixner <[email protected]> wrote:
>
> > On Wed, 10 Jun 2015, Mike Galbraith wrote:
> >
> > > On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> > >
> > > > Some more information about your symptoms in form of configuration,
> > > > extra patches, kernel traces etc. would be appreciated.
> > >
> > > homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
>
> I would recommend using trace-cmd especially if you add function
> tracing, as it's made more for fast tracing.
>
> -- Steve
>
> > > [ perf record: Woken up 2 times to write data ]
> > > [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
> > > homer:~/tmp # perf script | grep irq_vectors:local_timer
> > > ksoftirqd/3 19 [003] 5586.564238: irq_vectors:local_timer_entry: vector=239
> > > ksoftirqd/3 19 [003] 5586.564239: irq_vectors:local_timer_exit: vector=239
> > > ksoftirqd/3 19 [003] 5586.568238: irq_vectors:local_timer_entry: vector=239
> > > ksoftirqd/3 19 [003] 5586.568239: irq_vectors:local_timer_exit: vector=239
> > > ksoftirqd/3 19 [003] 5586.632218: irq_vectors:local_timer_entry: vector=239
> >
> > That gap is odd. Can you fire up the function tracer please?
And please enable hrtimer events as well. Output of that cpu is good enough.
Thanks,
tglx
On Wed, 2015-06-10 at 20:59 +0200, Thomas Gleixner wrote:
> On Wed, 10 Jun 2015, Mike Galbraith wrote:
>
> > On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> >
> > > Some more information about your symptoms in form of configuration,
> > > extra patches, kernel traces etc. would be appreciated.
> >
> > homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
> > [ perf record: Woken up 2 times to write data ]
> > [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
> > homer:~/tmp # perf script | grep irq_vectors:local_timer
> > ksoftirqd/3 19 [003] 5586.564238: irq_vectors:local_timer_entry: vector=239
> > ksoftirqd/3 19 [003] 5586.564239: irq_vectors:local_timer_exit: vector=239
> > ksoftirqd/3 19 [003] 5586.568238: irq_vectors:local_timer_entry: vector=239
> > ksoftirqd/3 19 [003] 5586.568239: irq_vectors:local_timer_exit: vector=239
> > ksoftirqd/3 19 [003] 5586.632218: irq_vectors:local_timer_entry: vector=239
>
> That gap is odd. Can you fire up the function tracer please?
Well, trying to do that like so...
trace-cmd record -M 8 -p function -e irq:* -e irq_vectors:local* -e timer:hrtimer* -- sleep 1
..capture takes much more than a second...
LOC: 248161 226536 42091 38892 Local timer interrupts
LOC: 248381 226783 42092 38901 Local timer interrupts
LOC: 248668 227038 42092 38903 Local timer interrupts
LOC: 248963 227277 42092 38904 Local timer interrupts
LOC: 249214 227515 42092 38905 Local timer interrupts
LOC: 249486 227732 42092 38905 Local timer interrupts
LOC: 249720 227961 42092 38905 Local timer interrupts
LOC: 249996 228200 42092 38906 Local timer interrupts
LOC: 250215 228450 42092 38906 Local timer interrupts
LOC: 250489 228704 42092 38906 Local timer interrupts
LOC: 250729 228952 42099 38964 Local timer interrupts
LOC: 250951 229141 42100 39215 Local timer interrupts
LOC: 251168 229374 42119 39468 Local timer interrupts
LOC: 251426 229641 42131 39719 Local timer interrupts
LOC: 251652 229868 42131 39970 Local timer interrupts
LOC: 251881 230141 42132 40220 Local timer interrupts
LOC: 252131 230371 42132 40471 Local timer interrupts
LOC: 252361 230646 42133 40721 Local timer interrupts
LOC: 252611 230875 42133 40972 Local timer interrupts
LOC: 252866 231149 42137 41222 Local timer interrupts
LOC: 253103 231386 42153 41450 Local timer interrupts
...but more importantly, when it gets cranked up..
homer:~/tmp # trace-cmd report > report
homer:~/tmp # grep local_timer_entry report|wc -l
252
...it scares the problem away.
(4 cpus because I fired up cpuset shield, isolates 2-3, takes 4-7 down)
-Mike
On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> I just booted a SNB with lapic=notscdeadline and ran that test
> program. All that happens is - as expected - that ksoftirqd runs more
> than we would like it to. I cannot observe any anomaly vs. local
> timer interrupts at all.
Ditto on an E5-4620 box this morning, no sign of it. The E5-2670 box I
had verified "went away" when I put master on it this morning, but that
doesn't matter much; having the deadline timer feature alone definitely
doesn't mean a box can be convinced to reproduce the stall. Hohum.
-Mike
On Thu, 2015-06-11 at 09:35 +0200, Mike Galbraith wrote:
> On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
>
> > I just booted a SNB with lapic=notscdeadline and ran that test
> > program. All that happens is - as expected - that ksoftirqd runs more
> > than we would like it to. I cannot observe any anomaly vs. local
> > timer interrupts at all.
>
> Ditto on an E5-4620 box this morning, no sign of it. The E5-2670 box I had
> verified "went away"
(dropped off the net on reboot, doesn't respond to remote button poking)
On Thu, 11 Jun 2015, Mike Galbraith wrote:
> On Wed, 2015-06-10 at 20:59 +0200, Thomas Gleixner wrote:
> Well, trying to do that like so...
>
> trace-cmd record -M 8 -p function -e irq:* -e irq_vectors:local* -e timer:hrtimer* -- sleep 1
> ..capture takes much more than a second...
>
> LOC: 248161 226536 42091 38892 Local timer interrupts
> LOC: 248381 226783 42092 38901 Local timer interrupts
> LOC: 248668 227038 42092 38903 Local timer interrupts
> LOC: 248963 227277 42092 38904 Local timer interrupts
> LOC: 249214 227515 42092 38905 Local timer interrupts
> LOC: 249486 227732 42092 38905 Local timer interrupts
> LOC: 249720 227961 42092 38905 Local timer interrupts
>
> ...but more importantly, when it gets cranked up..
>
> homer:~/tmp # trace-cmd report > report
> homer:~/tmp # grep local_timer_entry report|wc -l
> 252
>
> ...it scares the problem away.
Can you try the following, please?
Enable function tracer and hrtimer events manually. Then watch the irq
count on cpu3. If it stalls or becomes slow, then stop the trace with
echo 0 >/sys/kernel/debug/tracing/tracing_on
If the overhead of the function tracer hides the problem, then try just
with hrtimer, sched_switch and irq events.
Thanks,
tglx
On Thu, 2015-06-11 at 10:34 +0200, Thomas Gleixner wrote:
> Can you try the following, please?
>
> Enable function tracer and hrtimer events manually. Then watch the irq
> count on cpu3. If it stalls or becomes slow, then stop the trace with
>
> echo 0 >/sys/kernel/debug/tracing/tracing_on
>
> If the overhead of the function tracer hides the problem, then try just
> with hrtimer, sched_switch and irq events.
Yeah, function tracer hides it. After some fiddling with buffer size to
be able to ever see a tick, I finally have a trace trimmed with a tick
on each end, 175ms gap between them. It's still 1.5 after xz, sending
off list.
-Mike
On Thu, 11 Jun 2015, Mike Galbraith wrote:
> On Thu, 2015-06-11 at 10:34 +0200, Thomas Gleixner wrote:
>
> > Can you try the following, please?
> >
> > Enable function tracer and hrtimer events manually. Then watch the irq
> > count on cpu3. If it stalls or becomes slow, then stop the trace with
> >
> > echo 0 >/sys/kernel/debug/tracing/tracing_on
> >
> > If the overhead of the function tracer hides the problem, then try just
> > with hrtimer, sched_switch and irq events.
>
> Yeah, function tracer hides it. After some fiddling with buffer size to
> be able to ever see a tick, I finally have a trace trimmed with a tick
> on each end, 175ms gap between them. It's still 1.5 after xz, sending
> off list.
What you are looking at is the actual timer interrupt vector, but
that's not the tick.
# grep 'expire_entry.*tick_sched_timer' trace2 | wc -l
45
175 / 45 =~ 4, so I guess you have CONFIG_HZ=250.
The kernel does not care whether the actual interrupt happens or the
timer is expired by other means.
If Xen relies on the vector, it's none of my problems ....
Thanks,
tglx
On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:
> What you are looking at is the actual timer interrupt vector, but
> that's not the tick.
>
> # grep 'expire_entry.*tick_sched_timer' trace2 | wc -l
> 45
>
> 175 / 45 =~ 4, so I guess you have CONFIG_HZ=250.
>
> The kernel does not care whether the actual interrupt happens or the
> timer is expired by other means.
Learn something new every day. I (obviously) had no idea that it's
perfectly fine for interrupts to go away when the CPU is busy.
Oh well. Thanks for the new wrinkle in the mush between my ears.
-Mike
On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:
> The kernel does not care whether the actual interrupt happens or the
> timer is expired by other means.
Final question: Notabug explanation fits.. but why would the behavior
delta be so radical and repeatable w/wo deadline timer on some boxen?
-Mike
On Thu, 11 Jun 2015, Mike Galbraith wrote:
> On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:
>
> > The kernel does not care whether the actual interrupt happens or the
> > timer is expired by other means.
>
> Final question: Notabug explanation fits.. but why would the behavior
> delta be so radical and repeatable w/wo deadline timer on some boxen?
Good question. Cannot answer that w/o looking at traces.
Thanks,
tglx
On Thu, 2015-06-11 at 20:52 +0200, Thomas Gleixner wrote:
> On Thu, 11 Jun 2015, Mike Galbraith wrote:
> > On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:
> >
> > > The kernel does not care whether the actual interrupt happens or the
> > > timer is expired by other means.
> >
> > Final question: Notabug explanation fits.. but why would the behavior
> > delta be so radical and repeatable w/wo deadline timer on some boxen?
>
> Good question. Cannot answer that w/o looking at traces.
I may come back with some, I spent enough time doing circular notabug
chasing all over hell and creation to be a tad annoyed ;-)
-Mike