Hi LKML,
I'm trying to make audio more useful in everyday low-latency scenarios
such as gaming or VOIP.
While doing so, I ran the wakeup_rt tracer, to track the time from
PulseAudio requesting wakeup (through hrtimers), to the thread actually
running.
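For context, the wakeup in question is timer-driven: PulseAudio's
timer-based scheduling ends up arming an hrtimer in the kernel. The
snippet below is not PulseAudio's actual code, just a minimal sketch of
a userspace thread asking for periodic hrtimer-backed wakeups via
timerfd, to illustrate the kind of wakeup whose latency I'm measuring:

#include <sys/timerfd.h>
#include <poll.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	/* periodic 10 ms timer; timerfd timers are backed by hrtimers */
	struct itimerspec its = {
		.it_interval = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 },
		.it_value    = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 },
	};
	int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
	struct pollfd pfd = { .fd = tfd, .events = POLLIN };
	uint64_t expirations;
	int i;

	timerfd_settime(tfd, 0, &its, NULL);

	for (i = 0; i < 1000; i++) {
		/* the gap between the timer firing and this thread actually
		 * running again is the wakeup latency in question */
		poll(&pfd, 1, -1);
		read(tfd, &expirations, sizeof(expirations));
		/* ... refill the audio buffer here ... */
	}
	close(tfd);
	return 0;
}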
I'm not sure how much overhead is added by the wakeup_rt tracer itself,
but I got 9 ms on one machine and 20 ms on another, which I consider
quite a lot even for a standard kernel (i.e. without RT or other special
configuration).
The 9 ms example is pastebinned at [1], and here's where we get stuck
for most of the time:
<idle>-0 3d... 1105us : ktime_get_real <-intel_idle
<idle>-0 3d... 1106us!: getnstimeofday <-ktime_get_real
<idle>-0 3d... 7823us : ktime_get_real <-intel_idle
<idle>-0 3d... 7890us : ktime_get_real <-intel_idle
<idle>-0 3d... 7891us!: getnstimeofday <-ktime_get_real
<idle>-0 3d... 9023us : ktime_get_real <-intel_idle
It seems to me that sometimes we get stuck for several milliseconds
inside the getnstimeofday function - this was seen in both the 9 ms and
the 20 ms trace. This looks like a bug to me, and as I'm not sure how
best to debug it further, I'm asking for help (or a bug fix!) here.
For reference, the 9 ms trace was from a ~2 year old laptop (Core i3
CPU) running a 3.7-rc2 vanilla/mainline kernel, and the 20 ms trace was
from a ~1 year old Atom-based machine running the 3.2 Ubuntu kernel.
While tracing was enabled, I was running a libSDL game for a minute or two.
Thanks in advance for looking into this, and let me know if you need
further information, or anything else I can do to help sort this one out.
--
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic
[1] http://pastebin.se/6iMRdDfR
On 11/05/2012 12:51 AM, David Henningsson wrote:
> Hi LKML,
>
> I'm trying to make audio more useful in everyday low-latency scenarios
> such as gaming or VOIP.
>
> While doing so, I ran the wakeup_rt tracer, to track the time from
> PulseAudio requesting wakeup (through hrtimers), to the thread
> actually running.
>
> I'm not sure how much overhead is added by the wakeup_rt tracer
> itself, but I got 9 ms on one machine and 20 ms on another, which I
> consider quite a lot even for a standard kernel (i.e. without RT or
> other special configuration).
>
> The 9 ms example is pastebinned at [1], and here's where we get stuck
> for most of the time:
>
> <idle>-0 3d... 1105us : ktime_get_real <-intel_idle
> <idle>-0 3d... 1106us!: getnstimeofday <-ktime_get_real
> <idle>-0 3d... 7823us : ktime_get_real <-intel_idle
>
> <idle>-0 3d... 7890us : ktime_get_real <-intel_idle
> <idle>-0 3d... 7891us!: getnstimeofday <-ktime_get_real
> <idle>-0 3d... 9023us : ktime_get_real <-intel_idle
>
It's been a while since I looked at wakeup_rt trace output, but that
looks more like ~6.7 ms and ~1.2 ms latencies, not 9 ms (are you adding
these together?).
> It seems to me that sometimes we get stuck for several milliseconds
> inside the getnstimeofday function - this was seen in both the 9 ms
> and the 20 ms trace. This looks like a bug to me, and as I'm not sure
> how best to debug it further, I'm asking for help (or a bug fix!)
> here.
>
> For reference, the 9 ms trace was from a ~2 year old laptop (Core i3
> CPU) running a 3.7-rc2 vanilla/mainline kernel, and the 20 ms trace
> was from a ~1 year old Atom-based machine running the 3.2 Ubuntu
> kernel. While tracing was enabled, I was running a libSDL game for a
> minute or two.
>
> Thanks in advance for looking into this, and let me know if you need
> further information, or anything else I can do to help sort this one
> out.
Hrmm.. So 6.7ms is still a long time.
Looking at the trace you posted here: http://pastebin.se/6iMRdDfR
The trace also looks like it's the cpuidle-to-interrupt transition where
you're seeing this. I sort of wonder if it's mis-attributing the idle
time to getnstimeofday()? Mainly because you don't seem to spend much
time in intel_idle() otherwise.
Or maybe we're both misreading it, and it's saying there's a delay
between the first ktime_get_real() from intel_idle() and the second call
of ktime_get_real(), between which we're in deep idle (which would make
sense)?
Because unless the timekeeping lock is getting held for a long time, I
don't know why else you'd see such long delays at getnstimeofday().
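For reference, the read side of getnstimeofday() is basically just a
seqlock retry loop around the timekeeper (simplified sketch below; the
exact struct and field names vary between kernel versions), so the only
place I can see a caller stalling is spinning here while the write side
holds the lock:

void getnstimeofday(struct timespec *ts)
{
	unsigned long seq;
	s64 nsecs;

	do {
		/* retry while a timekeeping update holds the seqlock */
		seq = read_seqbegin(&timekeeper.lock);
		ts->tv_sec = timekeeper.xtime_sec;
		nsecs = timekeeping_get_ns(&timekeeper);
	} while (read_seqretry(&timekeeper.lock, seq));

	ts->tv_nsec = 0;
	timespec_add_ns(ts, nsecs);
}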
Cc'ing Steven to see if he can't help understand what's going on here.
thanks
-john
On 11/12/2012 03:53 PM, John Stultz wrote:
> On 11/05/2012 12:51 AM, David Henningsson wrote:
>> Hi LKML,
>>
>> I'm trying to make audio more useful in everyday low-latency
>> scenarios such as gaming or VOIP.
>>
>> While doing so, I ran the wakeup_rt tracer, to track the time from
>> PulseAudio requesting wakeup (through hrtimers), to the thread
>> actually running.
>>
>> I'm not sure how much overhead is added by the wakeup_rt tracer
>> itself, but I got 9 ms on one machine and 20 ms on another, which I
>> consider quite a lot even for a standard kernel (i.e. without RT or
>> other special configuration).
>>
>> The 9 ms example is pastebinned at [1], and here's where we get stuck
>> for most of the time:
>>
>> <idle>-0 3d... 1105us : ktime_get_real <-intel_idle
>> <idle>-0 3d... 1106us!: getnstimeofday <-ktime_get_real
>> <idle>-0 3d... 7823us : ktime_get_real <-intel_idle
>>
>> <idle>-0 3d... 7890us : ktime_get_real <-intel_idle
>> <idle>-0 3d... 7891us!: getnstimeofday <-ktime_get_real
>> <idle>-0 3d... 9023us : ktime_get_real <-intel_idle
>>
>
> Looking at the trace you posted here: http://pastebin.se/6iMRdDfR
>
> The trace also looks like it's the cpuidle-to-interrupt transition
> where you're seeing this. I sort of wonder if it's mis-attributing the
> idle time to getnstimeofday()? Mainly because you don't seem to spend
> much time in intel_idle() otherwise.
>
> Or maybe we're both misreading it, and it's saying there's a delay
> between the first ktime_get_real() from intel_idle() and the second
> call of ktime_get_real(), between which we're in deep idle (which
> would make sense)?
>
The more I think about it, I'm pretty sure this is the case:
The full context you need is:
<idle>-0 3d... 7890us : ktime_get_real <-intel_idle
<idle>-0 3d... 7891us!: getnstimeofday <-ktime_get_real
<idle>-0 3d... 9023us : ktime_get_real <-intel_idle
<idle>-0 3d... 9024us : getnstimeofday <-ktime_get_real
Where intel_idle() is calling ktime_get_real() twice in a row, and in
between we see a large latency. Looking at intel_idle(), the code in
question is:
	kt_before = ktime_get_real();
	stop_critical_timings();
	if (!need_resched()) {
		__monitor((void *)&current_thread_info()->flags, 0, 0);
		smp_mb();
		if (!need_resched())
			__mwait(eax, ecx);
	}
	start_critical_timings();
	kt_after = ktime_get_real();
Where we're basically timing how long we were in idle for.
So I think the problem is just misreading the trace output.
thanks
-john
On Mon, 2012-11-12 at 15:53 -0800, John Stultz wrote:
> Cc'ing Steven to see if he can't help understand what's going on here.
I don't trust the trace...
The wakeup happened on CPU 2:
pulseaud-2106 2d... 1us : 2106:109:R + [003] 2115: 94:S <...>
pulseaud-2106 2d... 2us+: try_to_wake_up <-default_wake_function
Xorg-1278 3d... 1038us : __switch_to_xtra <-__switch_to
Xorg-1278 3d... 1039us : memset <-__switch_to_xtra
kworker/-37 3d... 1040us : finish_task_switch <-__schedule
kworker/-37 3.... 1040us : _raw_spin_lock_irq <-worker_thread
kworker/-37 3d... 1041us : need_more_worker <-worker_thread
kworker/-37 3d... 1041us : process_one_work <-worker_thread
kworker/-37 3.... 1042us : __wake_up <-wakeup_work_handler
A millisecond goes by, and Xorg is scheduled out for a kworker (which
I'm sure is also lower in priority). And we don't see any sign of
need_resched being set (N).
kworker/-37 3d... 1082us : pick_next_task_idle <-__schedule
<idle>-0 3d... 1083us : finish_task_switch <-__schedule
<idle>-0 3.... 1083us : tick_nohz_idle_enter <-cpu_idle
<idle>-0 3.... 1084us : set_cpu_sd_state_idle <-tick_nohz_idle_enter
<idle>-0 3d... 1084us : __tick_nohz_idle_enter <-tick_nohz_idle_enter
Idle is even scheduled in here!
<idle>-0 3d.h. 9047us : ttwu_do_wakeup <-ttwu_do_activate.constprop.86
<idle>-0 3d.h. 9048us : check_preempt_curr <-ttwu_do_wakeup
<idle>-0 3d.h. 9049us : resched_task <-check_preempt_curr
<idle>-0 3dNh. 9049us : task_woken_rt <-ttwu_do_wakeup
<idle>-0 3dNh. 9050us : ttwu_stat <-try_to_wake_up
<idle>-0 3dNh. 9050us : _raw_spin_unlock_irqrestore <-try_to_wake_up
<idle>-0 3dNh. 9051us : _raw_spin_lock <-__run_hrtimer
During an interrupt (h), we see another wakeup (not sure what this was
for), and need_resched is set (N).
<idle>-0 3dNh. 9053us : lapic_next_event <-clockevents_program_event
<idle>-0 3dNh. 9054us : irq_exit <-smp_apic_timer_interrupt
<idle>-0 3dN.. 9055us : idle_cpu <-irq_exit
<idle>-0 3dN.. 9055us : rcu_irq_exit <-irq_exit
<idle>-0 3dN.. 9056us : rcu_eqs_enter_common.isra.40 <-rcu_irq_exit
<idle>-0 3dN.. 9056us : rcu_prepare_for_idle <-rcu_eqs_enter_common.isra.40
<idle>-0 3.N.. 9057us : menu_reflect <-cpuidle_idle_call
<idle>-0 3.N.. 9058us : rcu_idle_exit <-cpu_idle
<idle>-0 3dN.. 9058us : rcu_eqs_exit_common.isra.38 <-rcu_idle_exit
<idle>-0 3dN.. 9059us : rcu_cleanup_after_idle <-rcu_eqs_exit_common.isra.38
<idle>-0 3dN.. 9059us : del_timer <-rcu_cleanup_after_idle
<idle>-0 3.N.. 9060us : tick_nohz_idle_exit <-cpu_idle
<idle>-0 3dN.. 9060us : ktime_get <-tick_nohz_idle_exit
<idle>-0 3dN.. 9061us : tick_do_update_jiffies64 <-tick_nohz_idle_exit
<idle>-0 3dN.. 9061us : update_cpu_load_nohz <-tick_nohz_idle_exit
<idle>-0 3dN.. 9061us : calc_load_exit_idle <-tick_nohz_idle_exit
<idle>-0 3dN.. 9062us : touch_softlockup_watchdog <-tick_nohz_idle_exit
<idle>-0 3dN.. 9062us : tick_nohz_restart <-tick_nohz_idle_exit
<idle>-0 3dN.. 9063us : hrtimer_cancel <-tick_nohz_restart
<idle>-0 3dN.. 9063us : hrtimer_try_to_cancel <-hrtimer_cancel
We exit the interrupt and then do a lot of stuff with interrupts
disabled? Oh, this looks like the exit from idle (rcu_idle_exit is
there). It's doing cleanup stuff here.
<idle>-0 3dN.. 9077us : _pick_next_task_rt <-pick_next_task_rt
<idle>-0 3dN.. 9078us : dequeue_pushable_task <-pick_next_task_rt
<idle>-0 3d... 9079us : probe_wakeup_sched_switch <-__schedule
<idle>-0 3d... 9079us : 0:120:R ==> [003] 2115: 94:R <...>
Finally our task is scheduled. But this looks all screwed up to me. I
bet the clocks are not in sync. Do a:
echo global > /sys/kernel/debug/tracing/trace_clock
and this will use a slower clock, but one that is in sync between CPUs.
Also, enable the sched_switch and sched_wakeup trace points too.
cd /sys/kernel/debug/tracing/events/sched
echo 1 > sched_wakeup/enable
echo 1 > sched_switch/enable
-- Steve
On 11/13/2012 01:15 AM, Steven Rostedt wrote:
> On Mon, 2012-11-12 at 15:53 -0800, John Stultz wrote:
>
>> Cc'ing Steven to see if he can't help understand what's going on here.
>
> I don't trust the trace...
Thanks John and Steven!
I've redone the trace with the global clock and got a different trace,
this time at 11 ms in total. (I don't know how much of these 11 ms is
caused by the tracing overhead?)
The result is here:
http://pastebin.se/jxxqf8pt
and most of the time seems to be spent in these lines, repeating:
compiz-1975 0.N.1 11us+: arch_flush_lazy_mmu_mode <-kmap_atomic_prot
compiz-1975 0.N.1 16us : __kunmap_atomic <-drm_clflush_page
compiz-1975 0.N.1 16us : native_flush_tlb_single <-__kunmap_atomic
compiz-1975 0.N.1 17us : arch_flush_lazy_mmu_mode <-__kunmap_atomic
compiz-1975 0.N.. 18us : drm_clflush_page <-drm_clflush_sg
compiz-1975 0.N.. 18us : kmap_atomic <-drm_clflush_page
compiz-1975 0.N.. 19us : kmap_atomic_prot <-kmap_atomic
There are also occasionally sched* tasks going on on other CPUs. If
you'll excuse a layman's question: why can't we just schedule alsa-sink
on another CPU, if this one is busy doing graphics stuff?
For reference, the test kernel and test case were the same this time
around (3.7-rc2, then playing a game for a few minutes).
--
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic