2006-01-08 20:52:58

by Mark Knecht

[permalink] [raw]
Subject: 2.6.15-rt2 - repeatable xrun - no good data in trace

Hi,
I did run across a way that I can create a repeatable xrun on my
AMD64 machine by burning a CD in k3b while Jack is running.
Unfortunately I do not see any good trace data in dmesg when I do it.
Here's what I see in Jack. I burning the same iso file 3 times to make
sure it was repeatable:

09:35:33.113 Server configuration saved to "/home/mark/.jackdrc".
09:35:33.113 Statistics reset.
09:35:33.557 Client activated.
09:35:33.559 Audio connection change.
09:35:33.563 Audio connection graph change.
09:35:33.762 Audio connection change.
09:35:33.762 Audio connection graph change.
09:35:33.964 Audio connection change.
09:35:36.037 Audio connection graph change.
11:50:22.792 XRUN callback (1).
delay of 14728.000 usecs exceeds estimated spare time of 1360.000; restart ...
delay of 3184.000 usecs exceeds estimated spare time of 1360.000; restart ...
11:50:22.983 XRUN callback (2).
**** alsa_pcm: xrun of at least 0.039 msecs
delay of 2908.000 usecs exceeds estimated spare time of 1360.000; restart ...
11:50:23.407 XRUN callback (2 skipped).
11:51:13.405 XRUN callback (5).
**** alsa_pcm: xrun of at least 0.027 msecs
12:28:58.864 XRUN callback (6).
delay of 2893.000 usecs exceeds estimated spare time of 1363.000; restart ...
**** alsa_pcm: xrun of at least 0.044 msecs
delay of 9293.000 usecs exceeds estimated spare time of 1363.000; restart ...
12:28:59.265 XRUN callback (2 skipped).
12:39:26.418 XRUN callback (9).
delay of 11516.000 usecs exceeds estimated spare time of 1388.000; restart ...
delay of 8877.000 usecs exceeds estimated spare time of 1388.000; restart ...
12:39:28.373 XRUN callback (1 skipped).
12:40:18.427 XRUN callback (11).
**** alsa_pcm: xrun of at least 0.023 msecs


In dmesg I'm getting only this:

( dcopserver-8733 |#0): new 10 us maximum-latency critical section.
=> started at timestamp 12171220117: <preempt_schedule+0x53/0xb0>
=> ended at timestamp 12171220128: <schedule_tail+0xaa/0x110>

Call Trace:<ffffffff8014d79f>{check_critical_timing+479}
<ffffffff8014db78>{sub_ preempt_count_ti+88}
<ffffffff8012c3a3>{schedule_tail+67}
<ffffffff8012c40a>{schedule_tail+170 }
<ffffffff8010db79>{ret_from_fork+5}
=> dump-end timestamp 12171220195

( X-7777 |#0): new 13 us maximum-latency critical section.
=> started at timestamp 12183810145: <__schedule+0xb8/0x596>
=> ended at timestamp 12183810158: <thread_return+0xb5/0x11a>

Call Trace:<ffffffff8014d79f>{check_critical_timing+479}
<ffffffff803c79e0>{unix _poll+0}
<ffffffff8014db78>{sub_preempt_count_ti+88}
<ffffffff80403c0c>{thread_ret urn+70}
<ffffffff80403c7b>{thread_return+181} <ffffffff80403de5>{schedule+261}
<ffffffff804048ed>{schedule_timeout+141}
<ffffffff8013b2e0>{process_timeo ut+0}
<ffffffff8018d237>{do_select+967} <ffffffff8018cd80>{__pollwait+0}
<ffffffff8018d57d>{sys_select+749} <ffffffff8010dc86>{system_call+126}

=> dump-end timestamp 12183810266

Jack is 100.7 from portage. k3b is 0.12.10 from portage. The DVD/CDRW
drive is EIDE based.

Ideas?

- Mark


2006-01-08 21:04:48

by Michal Schmidt

[permalink] [raw]
Subject: Re: 2.6.15-rt2 - repeatable xrun - no good data in trace

Mark Knecht wrote:
> Hi,
> I did run across a way that I can create a repeatable xrun on my
> AMD64 machine by burning a CD in k3b while Jack is running.
> Unfortunately I do not see any good trace data in dmesg when I do it.

Maybe your cdrecord is running with realtime priority higher than Jack?
Michal

2006-01-08 21:49:29

by Jan Engelhardt

[permalink] [raw]
Subject: Re: 2.6.15-rt2 - repeatable xrun - no good data in trace

>> Hi,
>> I did run across a way that I can create a repeatable xrun on my
>> AMD64 machine by burning a CD in k3b while Jack is running.
>> Unfortunately I do not see any good trace data in dmesg when I do it.
>
> Maybe your cdrecord is running with realtime priority higher than Jack?
> Michal

cdrecord does run with SCHED_RR/99 when started with proper privileges.


Jan Engelhardt
--

2006-01-08 22:04:12

by Mark Knecht

[permalink] [raw]
Subject: Re: 2.6.15-rt2 - repeatable xrun - no good data in trace

On 1/8/06, Jan Engelhardt <[email protected]> wrote:
> >> Hi,
> >> I did run across a way that I can create a repeatable xrun on my
> >> AMD64 machine by burning a CD in k3b while Jack is running.
> >> Unfortunately I do not see any good trace data in dmesg when I do it.
> >
> > Maybe your cdrecord is running with realtime priority higher than Jack?
> > Michal
>
> cdrecord does run with SCHED_RR/99 when started with proper privileges.
>
Ah, then it's likely that this isn't a real problem and it would be
expected to cause an xrun?

Anyway, it seems strange that the trace doesn't show anything. I
suppose that's because cdrecord just grabs a lot of time at a higher
priority than Jack and Jack ends up not getting serivces at all for
5-10mS?

OK, back to the drawing board about debugging my problems!

thanks!

- Mark

2006-01-08 22:11:48

by Michal Schmidt

[permalink] [raw]
Subject: Re: 2.6.15-rt2 - repeatable xrun - no good data in trace

Mark Knecht wrote:
> On 1/8/06, Jan Engelhardt <[email protected]> wrote:
>>>> I did run across a way that I can create a repeatable xrun on my
>>>>AMD64 machine by burning a CD in k3b while Jack is running.
>>>>Unfortunately I do not see any good trace data in dmesg when I do it.
>>>Maybe your cdrecord is running with realtime priority higher than Jack?
>>>Michal
>>cdrecord does run with SCHED_RR/99 when started with proper privileges.
>>
> Ah, then it's likely that this isn't a real problem and it would be
> expected to cause an xrun?

By running cdrecord with a higher priority than Jack, you're telling the
system that burning the CD is more important than not getting xruns in Jack.

> Anyway, it seems strange that the trace doesn't show anything. I
> suppose that's because cdrecord just grabs a lot of time at a higher
> priority than Jack and Jack ends up not getting serivces at all for
> 5-10mS?

I guess that's exactly what's happening.

> OK, back to the drawing board about debugging my problems!

Try running cdrecord as a normal user and don't give it SUID root.

Michal

2006-01-08 22:21:16

by Mark Knecht

[permalink] [raw]
Subject: Re: 2.6.15-rt2 - repeatable xrun - no good data in trace

On 1/8/06, Michal Schmidt <[email protected]> wrote:
> Mark Knecht wrote:
> > On 1/8/06, Jan Engelhardt <[email protected]> wrote:
> >>>> I did run across a way that I can create a repeatable xrun on my
> >>>>AMD64 machine by burning a CD in k3b while Jack is running.
> >>>>Unfortunately I do not see any good trace data in dmesg when I do it.
> >>>Maybe your cdrecord is running with realtime priority higher than Jack?
> >>>Michal
> >>cdrecord does run with SCHED_RR/99 when started with proper privileges.
> >>
> > Ah, then it's likely that this isn't a real problem and it would be
> > expected to cause an xrun?
>
> By running cdrecord with a higher priority than Jack, you're telling the
> system that burning the CD is more important than not getting xruns in Jack.

Logically what you say makes sense but cdrecord, if it's running at
all, is running inside of an app called k3b. k3b doesn't seem to have
a config option for priorities, etc., so maybe it's hardwired. I'll
ask on the k3b list.

Sorry for the noise.

Cheers,
Mark

2006-01-10 10:05:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.15-rt2 - repeatable xrun - no good data in trace


* Mark Knecht <[email protected]> wrote:

> > cdrecord does run with SCHED_RR/99 when started with proper privileges.
>
> Ah, then it's likely that this isn't a real problem and it would be
> expected to cause an xrun?
>
> Anyway, it seems strange that the trace doesn't show anything. I
> suppose that's because cdrecord just grabs a lot of time at a higher
> priority than Jack and Jack ends up not getting serivces at all for
> 5-10mS?

the tracer will only detect undue delays in the 'highest prio' task in
the system - but it cannot detect whether all priorities in the system
are given out properly. In this particular case it was cdrecord that had
the highest priorities, and the delays you saw in Jackd were due to
cdrecord trumping Jackd's priority.

One way to make such scenarios easier to notice & debug would be for
jackd to warn if there are tasks in the system that have higher
priorities. (maybe it could even do it at xrun time, from a lower-prio
thread.)

Ingo

2006-01-10 13:49:30

by Steven Rostedt

[permalink] [raw]
Subject: Re: 2.6.15-rt2 - repeatable xrun - no good data in trace

On Tue, 2006-01-10 at 11:05 +0100, Ingo Molnar wrote:
> * Mark Knecht <[email protected]> wrote:
>
> > > cdrecord does run with SCHED_RR/99 when started with proper privileges.
> >
> > Ah, then it's likely that this isn't a real problem and it would be
> > expected to cause an xrun?
> >
> > Anyway, it seems strange that the trace doesn't show anything. I
> > suppose that's because cdrecord just grabs a lot of time at a higher
> > priority than Jack and Jack ends up not getting serivces at all for
> > 5-10mS?
>
> the tracer will only detect undue delays in the 'highest prio' task in
> the system - but it cannot detect whether all priorities in the system
> are given out properly. In this particular case it was cdrecord that had
> the highest priorities, and the delays you saw in Jackd were due to
> cdrecord trumping Jackd's priority.
>
> One way to make such scenarios easier to notice & debug would be for
> jackd to warn if there are tasks in the system that have higher
> priorities. (maybe it could even do it at xrun time, from a lower-prio
> thread.)

Hmm, this reminds me. This system isn't a SMP machine is it? SMP has
threads that run at priority 99 to handle things like swapping tasks
from one CPU to another. These will never show up in the tracer since
they are the highest priority. But I have seen these threads cause
latencies in some of my own code.

-- Steve


2006-01-10 13:55:22

by Mark Knecht

[permalink] [raw]
Subject: Re: 2.6.15-rt2 - repeatable xrun - no good data in trace

On 1/10/06, Steven Rostedt <[email protected]> wrote:
> On Tue, 2006-01-10 at 11:05 +0100, Ingo Molnar wrote:
> > * Mark Knecht <[email protected]> wrote:
> >
> > > > cdrecord does run with SCHED_RR/99 when started with proper privileges.
> > >
> > > Ah, then it's likely that this isn't a real problem and it would be
> > > expected to cause an xrun?
> > >
> > > Anyway, it seems strange that the trace doesn't show anything. I
> > > suppose that's because cdrecord just grabs a lot of time at a higher
> > > priority than Jack and Jack ends up not getting serivces at all for
> > > 5-10mS?
> >
> > the tracer will only detect undue delays in the 'highest prio' task in
> > the system - but it cannot detect whether all priorities in the system
> > are given out properly. In this particular case it was cdrecord that had
> > the highest priorities, and the delays you saw in Jackd were due to
> > cdrecord trumping Jackd's priority.
> >
> > One way to make such scenarios easier to notice & debug would be for
> > jackd to warn if there are tasks in the system that have higher
> > priorities. (maybe it could even do it at xrun time, from a lower-prio
> > thread.)
>
> Hmm, this reminds me. This system isn't a SMP machine is it? SMP has
> threads that run at priority 99 to handle things like swapping tasks
> from one CPU to another. These will never show up in the tracer since
> they are the highest priority. But I have seen these threads cause
> latencies in some of my own code.
>
> -- Steve

Steven,
No, this specific machine is UMP although I do have an HT machine I
use once in awhile that might fall prey to what you are mentioning.

While I have to agree that *if* cdrecord is running at a higher
priority then Jack would get trumped, I'm not positive yet that we
know that's true in this specific case. I have not yet received any
response from the developer of k3b, and while cdrecord is listed in
the setup of k3b I'm not sure how to test that it is really causing
the specific failure I saw.

NOTE: I'm sure cdrecord probably is causing this. I just don't want to
believe it is without some technical confirmation and assume there's
nothing that could be improved in the kernel.

NOTE 2: This case is pathological as I would never start writing a CD
when doing important audio work that requires zero xruns. I reported
it only to learn more myself.

Thanks,
Mark

2006-01-10 14:04:47

by Michal Schmidt

[permalink] [raw]
Subject: Re: 2.6.15-rt2 - repeatable xrun - no good data in trace

Mark Knecht wrote:
> While I have to agree that *if* cdrecord is running at a higher
> priority then Jack would get trumped, I'm not positive yet that we
> know that's true in this specific case. I have not yet received any
> response from the developer of k3b, and while cdrecord is listed in
> the setup of k3b I'm not sure how to test that it is really causing
> the specific failure I saw.

That shouldn't be too hard. While burning a CD, see the priority
cdrecord is running at. You can use:
ps -eo pid,user,args,pri | grep cdrecord

Also see if you have cdrecord installed suid root:
ls -l `which cdrecord`
If you do, you can be pretty sure that it runs at the highest possible
realtime priority. This is hardcoded in cdrecord. It does it if it runs
priviledged.

Michal