Hi LKML,
I'm trying to make audio more useful in everyday low-latency scenarios
such as gaming or VOIP.
While doing so, I ran the wakeup_rt tracer, to track the time from
PulseAudio requesting wakeup (through hrtimers), to the thread actually
running.
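For context, the wakeup in question is timer-driven: PulseAudio's
timer-based scheduling ends up arming an hrtimer in the kernel. The
snippet below is not PulseAudio's actual code, just a minimal sketch of
a userspace thread asking for periodic hrtimer-backed wakeups via
timerfd, to illustrate the kind of wakeup whose latency I'm measuring:

#include <sys/timerfd.h>
#include <poll.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	/* periodic 10 ms timer; timerfd timers are backed by hrtimers */
	struct itimerspec its = {
		.it_interval = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 },
		.it_value    = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 },
	};
	int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
	struct pollfd pfd = { .fd = tfd, .events = POLLIN };
	uint64_t expirations;
	int i;

	timerfd_settime(tfd, 0, &its, NULL);

	for (i = 0; i < 1000; i++) {
		/* the gap between the timer firing and this thread actually
		 * running again is the wakeup latency in question */
		poll(&pfd, 1, -1);
		read(tfd, &expirations, sizeof(expirations));
		/* ... refill the audio buffer here ... */
	}
	close(tfd);
	return 0;
}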
I'm not sure how much overhead is added by the wakeup_rt tracer itself,
but I got 9 ms on one machine and 20 ms on another, which I consider
quite a lot even for a standard kernel (i.e. without RT or other special
configuration).
The 9 ms example is pastebinned at [1], and here's where we get stuck
for most of the time:
<idle>-0 3d... 1105us : ktime_get_real <-intel_idle
<idle>-0 3d... 1106us!: getnstimeofday <-ktime_get_real
<idle>-0 3d... 7823us : ktime_get_real <-intel_idle
<idle>-0 3d... 7890us : ktime_get_real <-intel_idle
<idle>-0 3d... 7891us!: getnstimeofday <-ktime_get_real
<idle>-0 3d... 9023us : ktime_get_real <-intel_idle
It seems to me that sometimes we get stuck for several milliseconds
inside the getnstimeofday function - this was seen in both the 9 ms and
the 20 ms trace. This looks like a bug to me, and as I'm not sure how
best to debug it further, I'm asking for help (or a bug fix!) here.
For reference, the 9 ms trace was from a ~2 year old laptop (Core i3
CPU) running a 3.7-rc2 vanilla/mainline kernel, and the 20 ms trace was
from a ~1 year old Atom-based machine running the 3.2 Ubuntu kernel.
While tracing was enabled, I was running a libSDL game for a minute or two.
Thanks in advance for looking into this, and let me know if you need
further information, or anything else I can do to help sort this one out.
--
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic
[1] http://pastebin.se/6iMRdDfR
On 11/05/2012 12:51 AM, David Henningsson wrote:
> Hi LKML,
>
> I'm trying to make audio more useful in everyday low-latency scenarios
> such as gaming or VOIP.
>
> While doing so, I ran the wakeup_rt tracer, to track the time from
> PulseAudio requesting wakeup (through hrtimers), to the thread
> actually running.
>
> I'm not sure how much overhead is added by the wakeup_rt tracer
> itself, but I got 9 ms on one machine and 20 ms on another, which I
> consider quite a lot even for a standard kernel (i.e. without RT or
> other special configuration).
>
> The 9 ms example is pastebinned at [1], and here's where we get stuck
> for most of the time:
>
> <idle>-0 3d... 1105us : ktime_get_real <-intel_idle
> <idle>-0 3d... 1106us!: getnstimeofday <-ktime_get_real
> <idle>-0 3d... 7823us : ktime_get_real <-intel_idle
>
> <idle>-0 3d... 7890us : ktime_get_real <-intel_idle
> <idle>-0 3d... 7891us!: getnstimeofday <-ktime_get_real
> <idle>-0 3d... 9023us : ktime_get_real <-intel_idle
>
It's been a while since I looked at wakeup_rt trace output, but that
looks more like ~6.7 ms and ~1.2 ms latencies, not 9 ms (are you adding
these together?).
> It seems to me that sometimes we get stuck for several milliseconds
> inside the getnstimeofday function - this was seen in both the 9 ms
> and the 20 ms trace. This looks like a bug to me, and as I'm not sure
> how best to debug it further, I'm asking for help (or a bug fix!)
> here.
>
> For reference, the 9 ms trace was from a ~2 year old laptop (Core i3
> CPU) running a 3.7-rc2 vanilla/mainline kernel, and the 20 ms trace
> was from a ~1 year old Atom-based machine running the 3.2 Ubuntu
> kernel. While tracing was enabled, I was running a libSDL game for a
> minute or two.
>
> Thanks in advance for looking into this, and let me know if you need
> further information, or anything else I can do to help sort this one
> out.
Hrmm.. So 6.7ms is still a long time.
Looking at the trace you posted here: http://pastebin.se/6iMRdDfR
The trace also looks like it's the cpuidle-to-interrupt transition where
you're seeing this. I sort of wonder if it's mis-attributing the idle
time to getnstimeofday()? Mainly because you don't seem to spend much
time in intel_idle() otherwise.
Or maybe we're both misreading it, and it's saying there's a delay
between the first ktime_get_real() from intel_idle() and the second call
of ktime_get_real(), between which we're in deep idle (which would make
sense)?
Because unless the timekeeping lock is getting held for a long time, I
don't know why else you'd see such long delays at getnstimeofday().
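For reference, the read side of getnstimeofday() is basically just a
seqlock retry loop around the timekeeper (simplified sketch below; the
exact struct and field names vary between kernel versions), so the only
place I can see a caller stalling is spinning here while the write side
holds the lock:

void getnstimeofday(struct timespec *ts)
{
	unsigned long seq;
	s64 nsecs;

	do {
		/* retry while a timekeeping update holds the seqlock */
		seq = read_seqbegin(&timekeeper.lock);
		ts->tv_sec = timekeeper.xtime_sec;
		nsecs = timekeeping_get_ns(&timekeeper);
	} while (read_seqretry(&timekeeper.lock, seq));

	ts->tv_nsec = 0;
	timespec_add_ns(ts, nsecs);
}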
Cc'ing Steven to see if he can't help understand what's going on here.
thanks
-john
On 11/12/2012 03:53 PM, John Stultz wrote:
> On 11/05/2012 12:51 AM, David Henningsson wrote:
>> Hi LKML,
>>
>> I'm trying to make audio more useful in everyday low-latency
>> scenarios such as gaming or VOIP.
>>
>> While doing so, I ran the wakeup_rt tracer, to track the time from
>> PulseAudio requesting wakeup (through hrtimers), to the thread
>> actually running.
>>
>> I'm not sure how much overhead is added by the wakeup_rt tracer
>> itself, but I got 9 ms on one machine and 20 ms on another, which I
>> consider quite a lot even for a standard kernel (i.e. without RT or
>> other special configuration).
>>
>> The 9 ms example is pastebinned at [1], and here's where we get stuck
>> for most of the time:
>>
>> <idle>-0 3d... 1105us : ktime_get_real <-intel_idle
>> <idle>-0 3d... 1106us!: getnstimeofday <-ktime_get_real
>> <idle>-0 3d... 7823us : ktime_get_real <-intel_idle
>>
>> <idle>-0 3d... 7890us : ktime_get_real <-intel_idle
>> <idle>-0 3d... 7891us!: getnstimeofday <-ktime_get_real
>> <idle>-0 3d... 9023us : ktime_get_real <-intel_idle
>>
>
> Looking at the trace you posted here: http://pastebin.se/6iMRdDfR
>
> The trace also looks like it's the cpuidle-to-interrupt transition
> where you're seeing this. I sort of wonder if it's mis-attributing the
> idle time to getnstimeofday()? Mainly because you don't seem to spend
> much time in intel_idle() otherwise.
>
> Or maybe we're both misreading it, and it's saying there's a delay
> between the first ktime_get_real() from intel_idle() and the second
> call of ktime_get_real(), between which we're in deep idle (which
> would make sense)?
>
The more I think about it, I'm pretty sure this is the case:
The full context you need is:
<idle>-0 3d... 7890us : ktime_get_real <-intel_idle
<idle>-0 3d... 7891us!: getnstimeofday <-ktime_get_real
<idle>-0 3d... 9023us : ktime_get_real <-intel_idle
<idle>-0 3d... 9024us : getnstimeofday <-ktime_get_real
Where intel_idle() is calling ktime_get_real() twice in a row, and in
between we see a large latency. Looking at intel_idle(), the code in
question is:
	kt_before = ktime_get_real();
	stop_critical_timings();
	if (!need_resched()) {
		__monitor((void *)&current_thread_info()->flags, 0, 0);
		smp_mb();
		if (!need_resched())
			__mwait(eax, ecx);
	}
	start_critical_timings();
	kt_after = ktime_get_real();
Where we're basically timing how long we were in idle for.
So I think the problem is just misreading the trace output.
thanks
-john
On Mon, 2012-11-12 at 15:53 -0800, John Stultz wrote:
> Cc'ing Steven to see if he can't help understand what's going on here.
I don't trust the trace...
The wakeup happened on CPU 2:
pulseaud-2106 2d... 1us : 2106:109:R + [003] 2115: 94:S <...>
pulseaud-2106 2d... 2us+: try_to_wake_up <-default_wake_function
Xorg-1278 3d... 1038us : __switch_to_xtra <-__switch_to
Xorg-1278 3d... 1039us : memset <-__switch_to_xtra
kworker/-37 3d... 1040us : finish_task_switch <-__schedule
kworker/-37 3.... 1040us : _raw_spin_lock_irq <-worker_thread
kworker/-37 3d... 1041us : need_more_worker <-worker_thread
kworker/-37 3d... 1041us : process_one_work <-worker_thread
kworker/-37 3.... 1042us : __wake_up <-wakeup_work_handler
A millisecond goes by, and Xorg is scheduled out for a kworker (which
I'm sure is also lower in priority). And we don't see any sign of
need_resched being set (N).
kworker/-37 3d... 1082us : pick_next_task_idle <-__schedule
<idle>-0 3d... 1083us : finish_task_switch <-__schedule
<idle>-0 3.... 1083us : tick_nohz_idle_enter <-cpu_idle
<idle>-0 3.... 1084us : set_cpu_sd_state_idle <-tick_nohz_idle_enter
<idle>-0 3d... 1084us : __tick_nohz_idle_enter <-tick_nohz_idle_enter
Idle is even scheduled in here!
<idle>-0 3d.h. 9047us : ttwu_do_wakeup <-ttwu_do_activate.constprop.86
<idle>-0 3d.h. 9048us : check_preempt_curr <-ttwu_do_wakeup
<idle>-0 3d.h. 9049us : resched_task <-check_preempt_curr
<idle>-0 3dNh. 9049us : task_woken_rt <-ttwu_do_wakeup
<idle>-0 3dNh. 9050us : ttwu_stat <-try_to_wake_up
<idle>-0 3dNh. 9050us : _raw_spin_unlock_irqrestore <-try_to_wake_up
<idle>-0 3dNh. 9051us : _raw_spin_lock <-__run_hrtimer
During an interrupt (h), we see another wakeup (not sure what this was
for), and need_resched is set (N).
<idle>-0 3dNh. 9053us : lapic_next_event <-clockevents_program_event
<idle>-0 3dNh. 9054us : irq_exit <-smp_apic_timer_interrupt
<idle>-0 3dN.. 9055us : idle_cpu <-irq_exit
<idle>-0 3dN.. 9055us : rcu_irq_exit <-irq_exit
<idle>-0 3dN.. 9056us : rcu_eqs_enter_common.isra.40 <-rcu_irq_exit
<idle>-0 3dN.. 9056us : rcu_prepare_for_idle <-rcu_eqs_enter_common.isra.40
<idle>-0 3.N.. 9057us : menu_reflect <-cpuidle_idle_call
<idle>-0 3.N.. 9058us : rcu_idle_exit <-cpu_idle
<idle>-0 3dN.. 9058us : rcu_eqs_exit_common.isra.38 <-rcu_idle_exit
<idle>-0 3dN.. 9059us : rcu_cleanup_after_idle <-rcu_eqs_exit_common.isra.38
<idle>-0 3dN.. 9059us : del_timer <-rcu_cleanup_after_idle
<idle>-0 3.N.. 9060us : tick_nohz_idle_exit <-cpu_idle
<idle>-0 3dN.. 9060us : ktime_get <-tick_nohz_idle_exit
<idle>-0 3dN.. 9061us : tick_do_update_jiffies64 <-tick_nohz_idle_exit
<idle>-0 3dN.. 9061us : update_cpu_load_nohz <-tick_nohz_idle_exit
<idle>-0 3dN.. 9061us : calc_load_exit_idle <-tick_nohz_idle_exit
<idle>-0 3dN.. 9062us : touch_softlockup_watchdog <-tick_nohz_idle_exit
<idle>-0 3dN.. 9062us : tick_nohz_restart <-tick_nohz_idle_exit
<idle>-0 3dN.. 9063us : hrtimer_cancel <-tick_nohz_restart
<idle>-0 3dN.. 9063us : hrtimer_try_to_cancel <-hrtimer_cancel
We exit the interrupt and then do a lot of stuff with interrupts
disabled? Oh, this looks like the exit from idle (rcu_idle_exit is
there). It's doing cleanup stuff here.
<idle>-0 3dN.. 9077us : _pick_next_task_rt <-pick_next_task_rt
<idle>-0 3dN.. 9078us : dequeue_pushable_task <-pick_next_task_rt
<idle>-0 3d... 9079us : probe_wakeup_sched_switch <-__schedule
<idle>-0 3d... 9079us : 0:120:R ==> [003] 2115: 94:R <...>
Finally our task is scheduled. But this looks all screwed up to me. I
bet the clocks are not in sync. Do a:
echo global > /sys/kernel/debug/tracing/trace_clock
and this will use a slower clock, but one that is in sync between CPUs.
Also, enable the sched_switch and sched_wakeup trace points too.
cd /sys/kernel/debug/tracing/events/sched
echo 1 > sched_wakeup/enable
echo 1 > sched_switch/enable
-- Steve
On 11/13/2012 01:15 AM, Steven Rostedt wrote:
> On Mon, 2012-11-12 at 15:53 -0800, John Stultz wrote:
>
>> Cc'ing Steven to see if he can't help understand what's going on here.
>
> I don't trust the trace...
Thanks John and Steven!
I've redone the trace with the global clock and got a different trace,
this time at 11 ms in total. (I don't know how much of these 11 ms is
caused by the tracing overhead?)
The result is here:
http://pastebin.se/jxxqf8pt
and most of the time seems to be spent in these lines, repeating:
compiz-1975 0.N.1 11us+: arch_flush_lazy_mmu_mode <-kmap_atomic_prot
compiz-1975 0.N.1 16us : __kunmap_atomic <-drm_clflush_page
compiz-1975 0.N.1 16us : native_flush_tlb_single <-__kunmap_atomic
compiz-1975 0.N.1 17us : arch_flush_lazy_mmu_mode <-__kunmap_atomic
compiz-1975 0.N.. 18us : drm_clflush_page <-drm_clflush_sg
compiz-1975 0.N.. 18us : kmap_atomic <-drm_clflush_page
compiz-1975 0.N.. 19us : kmap_atomic_prot <-kmap_atomic
There are also occasionally sched* tasks going on on other CPUs. If
you'll excuse a layman's question: why can't we just schedule alsa-sink
on another CPU, if this one is busy doing graphics stuff?
For reference, the test kernel and test case were the same this time
around (3.7-rc2, then playing a game for a few minutes).
--
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic