2005-12-29 15:08:34

by kus Kusche Klaus

[permalink] [raw]
Subject: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

I took some latency traces on our sa1100 (see attachment) running
2.6.15-rc7-rt1 with full preemption.
They look very bad at the first glance, but I cannot interpret them in
detail.

Trace 3, 4 and 5 seem to have obvious reasons: FPU emulation,
Framebuffer console updates, and compression/decompression of flash data
for jffs2.

Moreover, if I read these traces correctly, they just disable
preemption, but still allow interrupts. Is that correct? Can anything be
done against these latencies, i.e. do they really need to disable
preemption for such a long time?

However, traces 1, 2, 6 and 7 are completely mysterious to me.
Interrupts seem to be blocked for milliseconds, while nothing is going
on on the system? Moreover, there are console-related function names in
traces 6 and 7, although I've unconfigured the framebuffer console for
these runs!

Many thanks in advance for any help!

Klaus Kusche
Entwicklung Software - Steuerung
Software Development - Control

KEBA AG
A-4041 Linz
Gewerbepark Urfahr
Tel +43 / 732 / 7090-3120
Fax +43 / 732 / 7090-6301
E-Mail: [email protected]
http://www.keba.com



Attachments:
traces.tar.bz2 (24.70 kB)
traces.tar.bz2

2005-12-29 18:25:17

by Daniel Walker

[permalink] [raw]
Subject: Re: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)


Do you have any power management features turned on?

I've seen some traces that look like this on buggy intel x86 hardware.
When a small two line function with no loops lasts for 10ms . One of
your traces showed irq_exit() lasting 9ms . It's like the process just
stops.

Daniel

On Thu, 2005-12-29 at 16:08 +0100, kus Kusche Klaus wrote:
> I took some latency traces on our sa1100 (see attachment) running
> 2.6.15-rc7-rt1 with full preemption.
> They look very bad at the first glance, but I cannot interpret them in
> detail.
>
> Trace 3, 4 and 5 seem to have obvious reasons: FPU emulation,
> Framebuffer console updates, and compression/decompression of flash data
> for jffs2.
>
> Moreover, if I read these traces correctly, they just disable
> preemption, but still allow interrupts. Is that correct? Can anything be
> done against these latencies, i.e. do they really need to disable
> preemption for such a long time?
>
> However, traces 1, 2, 6 and 7 are completely mysterious to me.
> Interrupts seem to be blocked for milliseconds, while nothing is going
> on on the system? Moreover, there are console-related function names in
> traces 6 and 7, although I've unconfigured the framebuffer console for
> these runs!
>
> Many thanks in advance for any help!
>
> Klaus Kusche
> Entwicklung Software - Steuerung
> Software Development - Control
>
> KEBA AG
> A-4041 Linz
> Gewerbepark Urfahr
> Tel +43 / 732 / 7090-3120
> Fax +43 / 732 / 7090-6301
> E-Mail: [email protected]
> http://www.keba.com
>
>

2005-12-30 06:35:33

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Daniel Walker
> On Thu, 2005-12-29 at 16:08 +0100, kus Kusche Klaus wrote:
> > I took some latency traces on our sa1100 (see attachment) running
> > 2.6.15-rc7-rt1 with full preemption.
> > They look very bad at the first glance, but I cannot
> interpret them in
> > detail.
> >
> > Trace 3, 4 and 5 seem to have obvious reasons: FPU emulation,
> > Framebuffer console updates, and compression/decompression
> of flash data
> > for jffs2.
> >
> > Moreover, if I read these traces correctly, they just disable
> > preemption, but still allow interrupts. Is that correct?
> Can anything be
> > done against these latencies, i.e. do they really need to disable
> > preemption for such a long time?
> >
> > However, traces 1, 2, 6 and 7 are completely mysterious to me.
> > Interrupts seem to be blocked for milliseconds, while
> nothing is going
> > on on the system? Moreover, there are console-related
> function names in
> > traces 6 and 7, although I've unconfigured the framebuffer
> console for
> > these runs!
>
> Do you have any power management features turned on?
>
> I've seen some traces that look like this on buggy intel x86 hardware.
> When a small two line function with no loops lasts for 10ms . One of
> your traces showed irq_exit() lasting 9ms . It's like the process just
> stops.

No, no power management at all.

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2005-12-30 07:29:54

by Lee Revell

[permalink] [raw]
Subject: Re: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

On Thu, 2005-12-29 at 16:08 +0100, kus Kusche Klaus wrote:
> However, traces 1, 2, 6 and 7 are completely mysterious to me.
> Interrupts seem to be blocked for milliseconds, while nothing is going
> on on the system? Moreover, there are console-related function names
> in
> traces 6 and 7, although I've unconfigured the framebuffer console for
> these runs!
>

It seems that either some code path really is forgetting to re-enable
interrupts, or there's a bug in the latency tracer.

Do these correspond to observed latencies?

Lee

2005-12-30 07:36:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)


* Lee Revell <[email protected]> wrote:

> > However, traces 1, 2, 6 and 7 are completely mysterious to me.
> > Interrupts seem to be blocked for milliseconds, while nothing is going
> > on on the system? Moreover, there are console-related function names
> > in
> > traces 6 and 7, although I've unconfigured the framebuffer console for
> > these runs!
>
> It seems that either some code path really is forgetting to re-enable
> interrupts, or there's a bug in the latency tracer.

one question is, what do the kernel addresses visible in the first
argument of asm_do_IRQ() correspond to:

trace1:MyThread-153 0D..1 5977us+: asm_do_IRQ (c030c170 1a 0)
trace1:MyThread-153 0D..1 15191us+: asm_do_IRQ (c030c1bc 1a 0)
trace2: <idle>-0 0D..2 8822us+: asm_do_IRQ (c021da24 1a 0)
trace2: <idle>-0 0Dn.2 8920us+: asm_do_IRQ (c021da24 b 0)
trace3: top-169 0D..1 8802us+: asm_do_IRQ (c024e5fc 1a 0)
trace4: insmod-185 0D..1 8794us+: asm_do_IRQ (c030c174 1a 0)
trace5: dd-197 0D..1 8812us+: asm_do_IRQ (c02e4938 1a 0)
trace6: kthread-11 0d..3 2670us+: asm_do_IRQ (c02fe2d0 1a 0)
trace7:MyThread-95 0D..1 542us+: asm_do_IRQ (c02fe2d0 1a 0)
trace7:MyThread-95 0D..1 9755us+: asm_do_IRQ (c02fe2d0 1a 0)

i.e. what is c02fe2d0, c021da24, c02e4938, etc.?

but it seems most of the latencies are printk related: one possibility
is that something is doing a costly printk with preemption disabled.

Ingo

2005-12-30 07:42:42

by Kai Geek

[permalink] [raw]
Subject: Re: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

Hello,
i am inframed trace for irq14 bugging points.

> * Lee Revell <[email protected]> wrote:

> > It seems that either some code path really is forgetting to
> > re-enable interrupts, or there's a bug in the latency tracer.
> one question is, what do the kernel addresses visible in the first
> argument of asm_do_IRQ() correspond to:
> trace1:MyThread-153 0D..1 5977us+: asm_do_IRQ (c030c170 1a 0)
> trace1:MyThread-153 0D..1 15191us+: asm_do_IRQ (c030c1bc 1a 0)
> trace2: <idle>-0 0D..2 8822us+: asm_do_IRQ (c021da24 1a 0)
> trace2: <idle>-0 0Dn.2 8920us+: asm_do_IRQ (c021da24 b 0)
> trace3: top-169 0D..1 8802us+: asm_do_IRQ (c024e5fc 1a 0)
> trace4: insmod-185 0D..1 8794us+: asm_do_IRQ (c030c174 1a 0)
> trace5: dd-197 0D..1 8812us+: asm_do_IRQ (c02e4938 1a 0)
> trace6: kthread-11 0d..3 2670us+: asm_do_IRQ (c02fe2d0 1a 0)
> trace7:MyThread-95 0D..1 542us+: asm_do_IRQ (c02fe2d0 1a 0)
> trace7:MyThread-95 0D..1 9755us+: asm_do_IRQ (c02fe2d0 1a 0)
> i.e. what is c02fe2d0, c021da24, c02e4938, etc.?
> but it seems most of the latencies are printk related: one possibility
> is that something is doing a costly printk with preemption disabled.
> Ingo
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

+-+-+-+ BEGIN PGP SIGNATURE +-+-+-+
Version: GnuPG v1.4.2 (GNU/Linux)
.-. .-. _
: : : : :_;
.-' : .--. : `-. .-. .--. ,-.,-.
' .; :' '_.'' .; :: :' .; ; : ,. :
`.__.'`.__.'`.__.':_;`.__,_;:_;:_;

Kai "Ozgur" Geek
Network Engineer
PGP ID: B1B63B6E
+-+-+-+ END PGP SIGNATURE +-+-+-+


--
_______________________________________________
Check out the latest SMS services @ http://www.linuxmail.org
This allows you to send and receive SMS through your mailbox.

Powered by Outblaze

2005-12-30 07:44:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)


there seem to be leaked preempt counts:

<idle>-0 0.n.1 8974us : touch_critical_timing (cpu_idle)

we should never have preemption disabled in cpu_idle(). To debug leaked
preemption counts, enable CONFIG_DEBUG_PREEMPT.

Ingo

2005-12-30 08:02:44

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Lee Revell
> On Thu, 2005-12-29 at 16:08 +0100, kus Kusche Klaus wrote:
> > However, traces 1, 2, 6 and 7 are completely mysterious to me.
> > Interrupts seem to be blocked for milliseconds, while
> nothing is going
> > on on the system? Moreover, there are console-related function names
> > in
> > traces 6 and 7, although I've unconfigured the framebuffer
> console for
> > these runs!
>
> It seems that either some code path really is forgetting to re-enable
> interrupts, or there's a bug in the latency tracer.
>
> Do these correspond to observed latencies?

As far as I can tell, probably not, but I'm not yet sure,
because I don't really know how to measure latencies *continuously*.

Currently, I'm measuring clock interrupt jitter, i.e. how late
(compared to the theoretical clock tick interval) do events
triggered by the clock interrupt arrive at an usermode process.

This jitter does not exceed 120 microseconds for a pri 99 RT task
or 400-500 microseconds for a pri 1 RT task, so it is far from the
5000-10000 microseconds observed by the latency tracer.

So there are two possibilities:
* Either the latency tracer is wrong.
* Or the latencies are perfectly synchronized with the system clock
(i.e. always fit nicely in between two clock interrupts).
This could well be the case, because all my kernel modules doing
printk's are also clock-triggered.

There seem to be very rare jitters up to 1 second (!),
I'm still hunting those (they also seem to be related to massive
printk's), but they also seem to be unrelated to those observed
by the latency tracer.

By the way, has nobody ever used the latency tracer for ARM yet?
Initially, it gave only 0 timings.
I hacked the following in timex.h to make it work:
#define mach_read_cycles() (OSCR)
#define mach_cycles_to_usecs(d) ((d)>>2)
#define mach_usecs_to_cycles(d) ((d)<<2)

OSCR should be a free-running counter register with 3.6864 MHz.


--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2005-12-30 10:35:25

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Ingo Molnar
> there seem to be leaked preempt counts:
>
> <idle>-0 0.n.1 8974us : touch_critical_timing (cpu_idle)
>
> we should never have preemption disabled in cpu_idle(). To
> debug leaked
> preemption counts, enable CONFIG_DEBUG_PREEMPT.

I enabled CONFIG_DEBUG_PREEMPT.

I still have that latency trace in <idle>-0 with that line at the end.
In fact, it is the most common trace on this system:
I always get that trace immediately after resetting the latency
tracer, always close to 9 ms (until an even longer latency happens).

Moreover, I looked up the address shown in asm_do_IRQ in the second
line of all those traces.
<idle>-0 0D..1 1us!: touch_critical_timing (default_idle)
<idle>-0 0D..2 8841us+: asm_do_IRQ (c021da44 1a 0)
It is always the same address, and it is in "cpu_idle".

However, CONFIG_DEBUG_PREEMPT remains silent: No bugs, no warns,
just plain nothing in the syslog.

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2005-12-30 10:59:12

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Ingo Molnar
> * Lee Revell <[email protected]> wrote:
> > > However, traces 1, 2, 6 and 7 are completely mysterious to me.
> > > Interrupts seem to be blocked for milliseconds, while
> nothing is going
> > > on on the system? Moreover, there are console-related
> function names
> > > in
> > > traces 6 and 7, although I've unconfigured the
> framebuffer console for
> > > these runs!
> >
> > It seems that either some code path really is forgetting to
> re-enable
> > interrupts, or there's a bug in the latency tracer.
>
> one question is, what do the kernel addresses visible in the first
> argument of asm_do_IRQ() correspond to:
>
> trace1:MyThread-153 0D..1 5977us+: asm_do_IRQ (c030c170 1a 0)
> trace1:MyThread-153 0D..1 15191us+: asm_do_IRQ (c030c1bc 1a 0)
> trace2: <idle>-0 0D..2 8822us+: asm_do_IRQ (c021da24 1a 0)
> trace2: <idle>-0 0Dn.2 8920us+: asm_do_IRQ (c021da24 b 0)
> trace3: top-169 0D..1 8802us+: asm_do_IRQ (c024e5fc 1a 0)
> trace4: insmod-185 0D..1 8794us+: asm_do_IRQ (c030c174 1a 0)
> trace5: dd-197 0D..1 8812us+: asm_do_IRQ (c02e4938 1a 0)
> trace6: kthread-11 0d..3 2670us+: asm_do_IRQ (c02fe2d0 1a 0)
> trace7:MyThread-95 0D..1 542us+: asm_do_IRQ (c02fe2d0 1a 0)
> trace7:MyThread-95 0D..1 9755us+: asm_do_IRQ (c02fe2d0 1a 0)
>
> i.e. what is c02fe2d0, c021da24, c02e4938, etc.?

I do not have that kernel or configuration any longer,
so I cannot look those addresses up.

I recreated a similar kernel, did similar experiments, got
similar traces, looked up the new addresses.

The one in the "<idle>-0" trace is "cpu_idle".

The others I got were mostly related to the load on the system,
all over the kernel:
Floating point emulator (call_fpe, float64_to_float32, ...),
jffs2 (deflate_fast, ...), console write (several places within
sa1100_console_write).

The strange ones in my new experiments were "mcount" and
"notify_die".

Do you need more of them?

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2005-12-30 11:18:38

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Ingo Molnar
> * Lee Revell <[email protected]> wrote:
> > It seems that either some code path really is forgetting to
> re-enable
> > interrupts, or there's a bug in the latency tracer.
>
> one question is, what do the kernel addresses visible in the first
> argument of asm_do_IRQ() correspond to:
>
> trace1:MyThread-153 0D..1 5977us+: asm_do_IRQ (c030c170 1a 0)
> trace1:MyThread-153 0D..1 15191us+: asm_do_IRQ (c030c1bc 1a 0)
> trace2: <idle>-0 0D..2 8822us+: asm_do_IRQ (c021da24 1a 0)
> trace2: <idle>-0 0Dn.2 8920us+: asm_do_IRQ (c021da24 b 0)
> trace3: top-169 0D..1 8802us+: asm_do_IRQ (c024e5fc 1a 0)
> trace4: insmod-185 0D..1 8794us+: asm_do_IRQ (c030c174 1a 0)
> trace5: dd-197 0D..1 8812us+: asm_do_IRQ (c02e4938 1a 0)
> trace6: kthread-11 0d..3 2670us+: asm_do_IRQ (c02fe2d0 1a 0)
> trace7:MyThread-95 0D..1 542us+: asm_do_IRQ (c02fe2d0 1a 0)
> trace7:MyThread-95 0D..1 9755us+: asm_do_IRQ (c02fe2d0 1a 0)
>
> i.e. what is c02fe2d0, c021da24, c02e4938, etc.?
>
> but it seems most of the latencies are printk related: one
> possibility
> is that something is doing a costly printk with preemption disabled.

The long latency in idle (trace2) is definitely *not* printk related:
It appears without any printk going on.

If I remember correctly, trace3 and trace5 were also taken
without any printk's happening, I'm not sure about trace4
(my module init function writes a message).

MyThread is a kernel thread of mine which definitely calls printk,
so trace1 and trace7 (and perhaps trace6) could be printk-related.

However, it never ever explicitely locks preemption, interrupts or
anything similar, especially not around the printk's, and there are
no other printk's happening during the tests.

Hence: printk's yes, printk's with preemption disabled not that I
am aware of.

However, in the meantime I found out that I can generate almost any
latency (exceeding 1 sec!) with massive printk's, no matter if they
go to the fb console or the sio or both (only if both are disabled
printk doesn't seem to cause long latencies).
That's even the case when the interrupt jitter test program runs
at a higher rt priority than the printk kernel thread.
Why that and what to do against it?

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2005-12-30 12:40:28

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Ingo Molnar
> there seem to be leaked preempt counts:
>
> <idle>-0 0.n.1 8974us : touch_critical_timing (cpu_idle)
>
> we should never have preemption disabled in cpu_idle(). To
> debug leaked
> preemption counts, enable CONFIG_DEBUG_PREEMPT.

Something really fishy is going on here:
That 9 ms latency seems to be really *idle* time!

* If the box is idle, I get that trace almost immediately, and
almost always with close to 9 ms (system clock is 100 Hz,
i.e. 10 ms tick period).

* If the box is 100 % loaded, I don't get that trace. I get
different traces from different processes, mostly shorter than
9 ms.

* If I load the box with work at regular intervals
and idle time in between, I get traces
identical to the 9 ms idle trace, but consistently shorter.
If I throw a flood ping with 1000 pkt/s against my box, the idle
trace shows up with 800 or 900 microseconds, i.e. the idle time
between packets.

Now, is the tracer wrong, or has the idle time a wrong status?

By the way, I had one trace today where the cat /proc/latency_trace
itself showed up:

\ / ||||| \ | /
cat-3129 0D... 1us!: preempt_schedule_irq (svc_preempt)
cat-3129 0.... 5502us+: rt_up (l_start)
cat-3129 0D..1 5511us+: check_raw_flags (rt_up)
cat-3129 0...1 5514us+: rt_up (l_start)
cat-3129 0...1 5518us : sub_preempt_count_ti (rt_up)

What's happening in those 5 ms?

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2005-12-30 13:07:28

by Daniel Walker

[permalink] [raw]
Subject: Re: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

On Fri, 2005-12-30 at 08:44 +0100, Ingo Molnar wrote:
> there seem to be leaked preempt counts:
>
> <idle>-0 0.n.1 8974us : touch_critical_timing (cpu_idle)
>
> we should never have preemption disabled in cpu_idle(). To debug leaked
> preemption counts, enable CONFIG_DEBUG_PREEMPT.

ARM does disable preemption in cpu_idle() .

Daniel

2005-12-30 13:35:41

by Daniel Walker

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)


It looks like in ARM the cpu_idle() (and default_idle) get traced . So
for instance you'll get a situation when preemption is off, interrupts
are off, and then the cpu runs something like halt on x86. Then an
interrupt wakes up the cpu.

I attached a patch that might fix the tracing. I tried to prevent the
tracing around when the cpu halts.

Daniel


On Fri, 2005-12-30 at 13:40 +0100, kus Kusche Klaus wrote:
> > From: Ingo Molnar
> > there seem to be leaked preempt counts:
> >
> > <idle>-0 0.n.1 8974us : touch_critical_timing (cpu_idle)
> >
> > we should never have preemption disabled in cpu_idle(). To
> > debug leaked
> > preemption counts, enable CONFIG_DEBUG_PREEMPT.
>
> Something really fishy is going on here:
> That 9 ms latency seems to be really *idle* time!
>
> * If the box is idle, I get that trace almost immediately, and
> almost always with close to 9 ms (system clock is 100 Hz,
> i.e. 10 ms tick period).
>
> * If the box is 100 % loaded, I don't get that trace. I get
> different traces from different processes, mostly shorter than
> 9 ms.
>
> * If I load the box with work at regular intervals
> and idle time in between, I get traces
> identical to the 9 ms idle trace, but consistently shorter.
> If I throw a flood ping with 1000 pkt/s against my box, the idle
> trace shows up with 800 or 900 microseconds, i.e. the idle time
> between packets.
>
> Now, is the tracer wrong, or has the idle time a wrong status?
>
> By the way, I had one trace today where the cat /proc/latency_trace
> itself showed up:
>
> \ / ||||| \ | /
> cat-3129 0D... 1us!: preempt_schedule_irq (svc_preempt)
> cat-3129 0.... 5502us+: rt_up (l_start)
> cat-3129 0D..1 5511us+: check_raw_flags (rt_up)
> cat-3129 0...1 5514us+: rt_up (l_start)
> cat-3129 0...1 5518us : sub_preempt_count_ti (rt_up)
>
> What's happening in those 5 ms?
>


Attachments:
fix_tracing_arm_idle_loop.patch (777.00 B)

2005-12-30 18:46:27

by Lee Revell

[permalink] [raw]
Subject: Re: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

On Fri, 2005-12-30 at 15:42 +0800, Kai Geek wrote:
> Hello,
> i am inframed trace for irq14 bugging points.

What?

2006-01-02 07:57:39

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Daniel Walker
>
> It looks like in ARM the cpu_idle() (and default_idle) get traced . So
> for instance you'll get a situation when preemption is off, interrupts
> are off, and then the cpu runs something like halt on x86. Then an
> interrupt wakes up the cpu.
>
> I attached a patch that might fix the tracing. I tried to prevent the
> tracing around when the cpu halts.

I tried your patch. I makes the idle traces look quite different,
but each cpu_idle still causes a latency trace.

preemption latency trace v1.1.5 on 2.6.15-rc7-rt1
--------------------------------------------------------------------
latency: 8964 us, #67/67, CPU#0 | (M:rt VP:0, KP:0, SP:1 HP:1)
-----------------
| task: IRQ 45-18 (uid:0 nice:-5 policy:1 rt_prio:48)
-----------------

_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
<idle>-0 0D... 1us+: preempt_schedule_irq (svc_preempt)
<idle>-0 0.... 5us!: default_idle (cpu_idle)
<idle>-0 0D..1 8700us+: asm_do_IRQ (c021da48 1a 0)
<idle>-0 0D.h2 8708us+: default_mask_ack (handle_level_irq)
<idle>-0 0D.h2 8712us+: sa1100_mask_irq (default_mask_ack)
<idle>-0 0D.h2 8715us+: sa1100_mask_irq (default_mask_ack)
<idle>-0 0D.h2 8719us+: redirect_hardirq (handle_level_irq)
<idle>-0 0D.h1 8724us+: handle_IRQ_event (handle_level_irq)
<idle>-0 0D.h1 8727us+: sa1100_timer_interrupt (handle_IRQ_event)
<idle>-0 0D.h2 8730us+: timer_tick (sa1100_timer_interrupt)
<idle>-0 0D.h2 8733us+: profile_tick (timer_tick)
<idle>-0 0D.h2 8737us+: profile_hit (profile_tick)
<idle>-0 0D.h2 8740us+: do_timer (timer_tick)
<idle>-0 0D.h2 8743us+: update_process_times (timer_tick)
<idle>-0 0D.h2 8746us+: account_system_time (update_process_times)
<idle>-0 0D.h2 8750us+: run_local_timers (update_process_times)
<idle>-0 0D.h2 8752us+: raise_softirq (run_local_timers)
<idle>-0 0D.h2 8757us+: wakeup_softirqd (raise_softirq)
<idle>-0 0D.h2 8760us+: wake_up_process (wakeup_softirqd)
<idle>-0 0D.h2 8763us+: check_preempt_wakeup (wake_up_process)
<idle>-0 0D.h2 8767us+: try_to_wake_up (wake_up_process)
<idle>-0 0D.h3 8772us+: activate_task (try_to_wake_up)
<idle>-0 0D.h3 8775us+: sched_clock (activate_task)
<idle>-0 0D.h3 8780us+: activate_task <<...>-3> (62 0)
<idle>-0 0D.h3 8784us+: enqueue_task (activate_task)
<idle>-0 0Dnh3 8788us+: try_to_wake_up <<...>-3> (62 8c)
<idle>-0 0Dnh3 8792us+: check_raw_flags (try_to_wake_up)
<idle>-0 0Dnh2 8795us+: preempt_schedule (try_to_wake_up)
<idle>-0 0Dnh2 8799us+: wake_up_process (wakeup_softirqd)
<idle>-0 0Dnh2 8801us+: check_raw_flags (raise_softirq)
<idle>-0 0Dnh2 8804us+: rcu_pending (update_process_times)
<idle>-0 0Dnh2 8808us+: scheduler_tick (update_process_times)
<idle>-0 0Dnh2 8810us+: sched_clock (scheduler_tick)
<idle>-0 0Dnh1 8817us+: preempt_schedule (sa1100_timer_interrupt)
<idle>-0 0Dnh1 8822us+: note_interrupt (handle_level_irq)
<idle>-0 0Dnh2 8827us+: default_end (handle_level_irq)
<idle>-0 0Dnh2 8830us+: sa1100_unmask_irq (default_end)
<idle>-0 0Dnh1 8835us+: preempt_schedule (asm_do_IRQ)
<idle>-0 0Dnh1 8837us+: irq_exit (asm_do_IRQ)
<idle>-0 0Dn.1 8841us+: asm_do_IRQ (c021da48 b 0)
<idle>-0 0Dnh2 8847us+: sa1100_high_gpio_handler (asm_do_IRQ)
<idle>-0 0Dnh3 8856us+: default_ack (handle_edge_irq)
<idle>-0 0Dnh3 8861us+: sa1100_high_gpio_ack (default_ack)
<idle>-0 0Dnh3 8865us+: redirect_hardirq (handle_edge_irq)
<idle>-0 0Dnh3 8868us+: wake_up_process (redirect_hardirq)
<idle>-0 0Dnh3 8870us+: check_preempt_wakeup (wake_up_process)
<idle>-0 0Dnh3 8873us+: try_to_wake_up (wake_up_process)
<idle>-0 0Dnh4 8877us+: activate_task (try_to_wake_up)
<idle>-0 0Dnh4 8880us+: sched_clock (activate_task)
<idle>-0 0Dnh4 8885us+: activate_task <<...>-18> (33 1)
<idle>-0 0Dnh4 8888us+: enqueue_task (activate_task)
<idle>-0 0Dnh4 8892us+: try_to_wake_up <<...>-18> (33 8c)
<idle>-0 0Dnh4 8895us+: check_raw_flags (try_to_wake_up)
<idle>-0 0Dnh3 8899us+: preempt_schedule (try_to_wake_up)
<idle>-0 0Dnh3 8901us+: wake_up_process (redirect_hardirq)
<idle>-0 0Dnh2 8905us+: preempt_schedule
(sa1100_high_gpio_handler)
<idle>-0 0Dnh1 8907us+: preempt_schedule (asm_do_IRQ)
<idle>-0 0Dnh1 8910us+: irq_exit (asm_do_IRQ)
<idle>-0 0Dn.. 8914us+: preempt_schedule_irq (svc_preempt)
<idle>-0 0Dn.. 8918us+: __schedule (preempt_schedule_irq)
<idle>-0 0Dn.. 8921us+: profile_hit (__schedule)
<idle>-0 0Dn.1 8927us+: sched_clock (__schedule)
<...>-18 0...1 8947us+: __schedule <<idle>-0> (8c 33)
<...>-18 0...1 8955us+: sub_preempt_count (__schedule)
<...>-18 0...1 8962us : sub_preempt_count_ti (sub_preempt_count)


In general, I still have the problem that the latencies observed
by my test program (jitter of my sysclock-triggered usermode code
running with high rt priority) does not agree *at all* with
the output of the latency tracer (i.e. the latency tracer provides
no help for finding the real reasons for latencies).

1.) Many of the latencies shown by the latency tracer do not cause
any noticeable latency in my test program. This category includes
all these idle latencies, the rt_up latency in the latency tracer
itself (see previous mail), and the fpu emulation related latency
(trace 3 in my original mail): No matter how much fpu load I
generate, latency traces go up, but my test programm still doesn't
suffer from any latency or jitter.

2.) Flash writing activity causes latency traces around 10 ms (see
trace 5 in my original mail). It also causes frequent latency in my
test program, but only around 0.1 ms. I can't correlate the trace
output with the observed latencies, and a can't find the reason for
the 0.1 ms observed latency in the trace output.

3.) Even more strange is the situation w.r.t. console output:
There are "active" latency traces around 10 ms
(where console-related code is executed all the time), e.g. trace 4
in my original mail, and there are "idle" latency traces (with
multi-millisecond holes without any traced activity), also around
10 ms and most likely also console-related (traces 1, 6, 7 in my
original mail).

Console output also causes massive latency in my test program, but
it is not related to the around-10-ms latencies in the trace: The
observed latencies caused by console output range from
sub-millisecond to multi-second (!).


So I still believe something is confusing the latency tracer, not
only for the idle case, but in general.

It is also strange that most of the latencies reported by the tracer
(the idle latency, the fpu latency, the flash write latency, and
some of the console latencies) are mostly around 10 ms, i.e. the
system clock tick period. The tracer seems to mis-consider
everything from a certain point in time to the next system clock
interrupt (but no further) as a latency, even if it is not.


Next thing to try?

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2006-01-02 14:14:59

by Daniel Walker

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

On Mon, 2006-01-02 at 08:57 +0100, kus Kusche Klaus wrote:
>
> <idle>-0 0D... 1us+: preempt_schedule_irq (svc_preempt)
> <idle>-0 0.... 5us!: default_idle (cpu_idle)
> <idle>-0 0D..1 8700us+: asm_do_IRQ (c021da48 1a 0)

Your trace appears to be showing an actual latency of 300us . The trace
starts at 8700us . The default_idle line above is showing interrupts are
enable, and preemption is enabled . So the tracing code really should be
ignoring the default_idle line since there is no reason to be tracing.

Daniel

2006-01-02 14:39:37

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Daniel Walker
> On Mon, 2006-01-02 at 08:57 +0100, kus Kusche Klaus wrote:
> > <idle>-0 0D... 1us+: preempt_schedule_irq (svc_preempt)
> > <idle>-0 0.... 5us!: default_idle (cpu_idle)
> > <idle>-0 0D..1 8700us+: asm_do_IRQ (c021da48 1a 0)
>
> Your trace appears to be showing an actual latency of 300us .
> The trace
> starts at 8700us . The default_idle line above is showing
> interrupts are
> enable, and preemption is enabled . So the tracing code
> really should be
> ignoring the default_idle line since there is no reason to be
> tracing.

Ok, thanks, fine.

I always thought that the output of the tracer always represents a
single block of latency (either interrupts or preemption disabled),
from its beginning to its end.

Does that mean that any line with status "0...." is not dangerous
at all w.r.t. latency?

If this is the case, then it should not only be excluded from the
trace listing, but also from the total timings! The trace header says
that this is a latency of 8964 us, and this also means that any
latency shorter than that is not recorded by the tracer.

However, if the "real" latency of this trace is only 300 us, there
are quite likely longer "real" latencies (at least, my own test
programs strongly indicate that), and I'd like to see their traces
and get the maximum "real" latency duration in the statistics (the
interrupt latency histogram is also based on the values in the trace
and not on the "real" latencies: It has a significant peak around
8800 us, and I certainly don't have that many int-off periods of
8800 us on my system!).

So, what should I change in order to have only "real" latency timings
and to get the trace with the maximum "real" latency recorded?

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2006-01-02 14:44:10

by Daniel Walker

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

On Mon, 2006-01-02 at 15:39 +0100, kus Kusche Klaus wrote:
> > From: Daniel Walker
> > On Mon, 2006-01-02 at 08:57 +0100, kus Kusche Klaus wrote:
> > > <idle>-0 0D... 1us+: preempt_schedule_irq (svc_preempt)
> > > <idle>-0 0.... 5us!: default_idle (cpu_idle)
> > > <idle>-0 0D..1 8700us+: asm_do_IRQ (c021da48 1a 0)
> >
> > Your trace appears to be showing an actual latency of 300us .
> > The trace
> > starts at 8700us . The default_idle line above is showing
> > interrupts are
> > enable, and preemption is enabled . So the tracing code
> > really should be
> > ignoring the default_idle line since there is no reason to be
> > tracing.
>
> Ok, thanks, fine.
>
> I always thought that the output of the tracer always represents a
> single block of latency (either interrupts or preemption disabled),
> from its beginning to its end.
>
> Does that mean that any line with status "0...." is not dangerous
> at all w.r.t. latency?

The tracing isn't correct on ARM . It shouldn't show a max latency of
8700us when it's only 300us . I'm not saying there isn't a problem.

> If this is the case, then it should not only be excluded from the
> trace listing, but also from the total timings! The trace header says
> that this is a latency of 8964 us, and this also means that any
> latency shorter than that is not recorded by the tracer.
>
> However, if the "real" latency of this trace is only 300 us, there
> are quite likely longer "real" latencies (at least, my own test
> programs strongly indicate that), and I'd like to see their traces
> and get the maximum "real" latency duration in the statistics (the
> interrupt latency histogram is also based on the values in the trace
> and not on the "real" latencies: It has a significant peak around
> 8800 us, and I certainly don't have that many int-off periods of
> 8800 us on my system!).

Right .. I'm still looking into it. ARM is just missing some vital
tracing bits I think .

Daniel


2006-01-02 14:55:57

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Daniel Walker
> Right .. I'm still looking into it. ARM is just missing some vital
> tracing bits I think .

As I wrote in some earlier mail, I'm probably the first one ever
who tried it on ARM: When I tried first, tracing didn't work at all,
because the trace timing macro's were not defined (at least for
sa1100). I quick-hacked the three missing macros (this caused the
tracer to produce at least some output) without checking if
anything else is missing.


--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2006-01-02 15:36:53

by Daniel Walker

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

On Mon, 2006-01-02 at 15:55 +0100, kus Kusche Klaus wrote:
> > From: Daniel Walker
> > Right .. I'm still looking into it. ARM is just missing some vital
> > tracing bits I think .
>
> As I wrote in some earlier mail, I'm probably the first one ever
> who tried it on ARM: When I tried first, tracing didn't work at all,
> because the trace timing macro's were not defined (at least for
> sa1100). I quick-hacked the three missing macros (this caused the
> tracer to produce at least some output) without checking if
> anything else is missing.

I'm sure the code has been tested , but I doubt it's ever really been
used .

What macro's are you talking about? Did you submit a patch already to
fix that? If not, please do.

Daniel

2006-01-02 16:07:21

by Daniel Walker

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)


Here's a more updated patch, the should replace the other patch I sent.
I think the tracing error is the result of a missed interrupt enable in
the ARM assembly code. I've only compile tested this.

Daniel

On Mon, 2006-01-02 at 15:55 +0100, kus Kusche Klaus wrote:
> > From: Daniel Walker
> > Right .. I'm still looking into it. ARM is just missing some vital
> > tracing bits I think .
>
> As I wrote in some earlier mail, I'm probably the first one ever
> who tried it on ARM: When I tried first, tracing didn't work at all,
> because the trace timing macro's were not defined (at least for
> sa1100). I quick-hacked the three missing macros (this caused the
> tracer to produce at least some output) without checking if
> anything else is missing.
>
>


Attachments:
fix_tracing_arm_idle_loop.patch (1.49 kB)

2006-01-03 07:27:27

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Daniel Walker
> On Mon, 2006-01-02 at 15:55 +0100, kus Kusche Klaus wrote:
> > As I wrote in some earlier mail, I'm probably the first one ever
> > who tried it on ARM: When I tried first, tracing didn't work at all,
> > because the trace timing macro's were not defined (at least for
> > sa1100). I quick-hacked the three missing macros (this caused the
> > tracer to produce at least some output) without checking if
> > anything else is missing.
> What macro's are you talking about? Did you submit a patch already to
> fix that? If not, please do.

Too dirty to be a patch.

The missing pieces are:
#include <asm/arch/hardware.h>
#define mach_read_cycles() (OSCR)
#define mach_cycles_to_usecs(d) ((d)>>2)
#define mach_usecs_to_cycles(d) ((d)<<2)

First of all, they are plain wrong, because they use a factor of 4,
whereas the correct factor is 3.6864
(so the timings in my traces are not really microseconds).

However, I do not know how to do that quickly and correctly...
(64 bit arithmetic?)

Moreover, they should go to include/asm-arm/arch-sa1100/timex.h,
because only SA and PXA have an OSCR timer, and because the factor
differs across the chips, but I did them in include/asm-arm/timex.h.

(however, the macros are not defined for *any* ARM chip in the
current tree)

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2006-01-03 08:00:55

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Daniel Walker
> Here's a more updated patch, the should replace the other
> patch I sent.
> I think the tracing error is the result of a missed interrupt
> enable in
> the ARM assembly code. I've only compile tested this.

Compiles, but BUGs immediately after uncompressing (second line of
console output) and then runs into an infinite Oops loop...
Reproducible.

Uncompressing
Linux.................................................................
done, booting the kernel.
BUG: bad raw irq-flag value 600000d3, swapper/0!
Linux version 2.6.15-rc7-rt1 (kk@silver) (gcc version 3.4.4) #22 PREEMPT
Tue Jan 3 08:03:43 CET 2006
CPU: StrongARM-1110 [6901b118] revision 8 (ARMv4)
Machine: Keba KETOP
Ignoring unrecognised tag 0x00000000
Memory policy: ECC disabled, Data cache writeback
MM: invalid domain in supersection mapping for 0x8000000000 at
0xea000000
MM: invalid domain in supersection mapping for 0x18000000000 at
0xf0000000
ketop map io done
Real-Time Preemption Support (C) 2004-2005 Ingo Molnar
Built 1 zonelists
Kernel command line: root=31:02 rootfstype=ext2 console=ttySA0,38400n8
console=tty0
WARNING: experimental RCU implementation.
PID hash table entries: 512 (order: 9, 8192 bytes)
Warning: uninitialized Real Time Clock
Console: colour dummy device 80x30
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 64MB = 64MB total
Memory: 61148KB available (1606K code, 2041K data, 80K init)
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
Unable to handle kernel NULL pointer dereference at virtual address
000000c0
pgd = c0204000
[000000c0] *pgd=00000000
Internal error: Oops: 0 [#1]
Modules linked in:
CPU: 0
PC is at 0xc0
LR is at 0xc0
pc : [<000000c0>] lr : [<000000c0>] Not tainted
sp : c00a1fb0 ip : c00a1fb0 fp : 00000000
r10: 00000000 r9 : c00a0000 r8 : 00000001
r7 : 00000000 r6 : 00000000 r5 : 00000000 r4 : 00000000
r3 : c00a0000 r2 : 40000053 r1 : 00000000 r0 : 00000000
Flags: nZCv IRQs on FIQs off Mode SVC_32 Segment kernel
Control: C020717F Table: C020717F DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc00a0194)
Stack: (0xc00a1fb0 to 0xc00a2000)
1fa0: 00000000 00000000 c021c084
c02345f8
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
1fe0: 00000000 00000000 00000000 c021d94c 00000013 00000000 ffffffff
ffffffff
Backtrace: no frame pointer
Code: <1>Unable to handle kernel NULL pointer dereference at virtual
address 000000c0
pgd = c0204000
[000000c0] *pgd=00000000
Internal error: Oops: 0 [#2]
Modules linked in:
CPU: 0
PC is at 0xc0
LR is at 0xc0
pc : [<000000c0>] lr : [<000000c0>] Not tainted
sp : c00a1e74 ip : c00a1e74 fp : c00a1ecc
r10: c05bf9c0 r9 : 000000d3 r8 : c00a1fb0
r7 : 000000c0 r6 : 00000000 r5 : c00a1ea8 r4 : ffffffff
r3 : c00a0000 r2 : 600000d3 r1 : c03b1548 r0 : c038c508
Flags: nzCv IRQs off FIQs off Mode SVC_32 Segment kernel
Control: C020717F Table: C020717F DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc00a0194)
Stack: (0xc00a1e74 to 0xc00a2000)
1e60: 00000009 00000001
000000b0
1e80: 00000000 fffffffc c00a0000 00000000 000000c0 c00a1fb0 c00a1f68
c05bf9c0
1ea0: c00a1ecc c00a1de0 c00a1ebc c024cda4 c0385770 000000d3 ffffffff
00000000
1ec0: c00a1eec c00a1ed0 c0222b60 c0220f60 000000c0 c05bf9c0 c00a0000
00000000
1ee0: c00a1f28 c00a1ef0 c0222e3c c0222afc 00000000 c025034c 001da3bf
00000000
1f00: 000000c0 c00a1f9c c00a0000 00000000 c00a1f68 60000053 00000000
c00a1f54
1f20: c00a1f2c c0222e8c c0222c4c c00a0000 ffffffff c00a1f9c 00000000
00000000
1f40: 00000001 00000000 c00a1f64 c00a1f58 c0223028 c0222e60 00000000
c00a1f68
1f60: c021ca60 c022301c 00000000 00000000 40000053 c00a0000 00000000
00000000
1f80: 00000000 00000000 00000001 c00a0000 00000000 00000000 c00a1fb0
c00a1fb0
1fa0: 000000c0 000000c0 60000053 ffffffff 00000000 00000000 c021c084
c02345f8
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
1fe0: 00000000 00000000 00000000 c021d94c 00000013 00000000 ffffffff
ffffffff
Backtrace:
[<c0220f54>] (die+0x0/0x344) from [<c0222b60>]
(__do_kernel_fault+0x70/0x84)
[<c0222af0>] (__do_kernel_fault+0x0/0x84) from [<c0222e3c>]
(do_page_fault+0x1fc/0x214)
r7 = 00000000 r6 = C00A0000 r5 = C05BF9C0 r4 = 000000C0
[<c0222c40>] (do_page_fault+0x0/0x214) from [<c0222e8c>]
(do_translation_fault+0x38/0xc4)
[<c0222e54>] (do_translation_fault+0x0/0xc4) from [<c0223028>]
(do_PrefetchAbort+0x18/0x1c)
[<c0223010>] (do_PrefetchAbort+0x0/0x1c) from [<c021ca60>]
(__pabt_svc+0x40/0x80)
Code: <1>Unable to handle kernel NULL pointer dereference at virtual
address 000000c0
pgd = c0204000
[000000c0] *pgd=00000000
Internal error: Oops: 0 [#3]
Modules linked in:
CPU: 0
PC is at 0xc0
LR is at 0xc0
pc : [<000000c0>] lr : [<000000c0>] Not tainted
sp : c00a1d38 ip : c00a1d38 fp : c00a1d90
r10: c05bf9c0 r9 : 000000d3 r8 : c00a1e74
r7 : 000000c0 r6 : 00000000 r5 : c00a1d6c r4 : ffffffff
r3 : c00a0000 r2 : 600000d3 r1 : c03b1548 r0 : c038c508
Flags: nzCv IRQs off FIQs off Mode SVC_32 Segment kernel
Control: C020717F Table: C020717F DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc00a0194)
Stack: (0xc00a1d38 to 0xc00a2000)
1d20: 00000009
00000001
1d40: 000000b0 00000000 fffffffc c00a0000 00000000 000000c0 c00a1e74
c00a1e2c
1d60: c05bf9c0 c00a1d90 0000c024 c00a1d80 000000c0 c0385770 000000d3
ffffffff
1d80: 00000000 c00a1db0 c00a1d94 c0222b60 c0220f60 000000c0 c05bf9c0
c00a0000
1da0: 00000000 c00a1dec c00a1db4 c0222e3c c0222afc 000000b0 c05bf9c0
c00a1e70
1dc0: 00000000 000000c0 c00a1e60 c00a0000 00000000 c00a1e2c 200000d3
c05bf9c0
1de0: c00a1e18 c00a1df0 c0222e8c c0222c4c 00000000 ffffffff c00a1e60
00000000
1e00: 000000c0 c00a1fb0 c05bf9c0 c00a1e28 c00a1e1c c0223028 c0222e60
c00a1ecc
1e20: c00a1e2c c021ca60 c022301c c038c508 c03b1548 600000d3 c00a0000
ffffffff
1e40: c00a1ea8 00000000 000000c0 c00a1fb0 000000d3 c05bf9c0 c00a1ecc
c00a1e74
1e60: c00a1e74 000000c0 000000c0 200000d3 ffffffff 00000009 00000001
000000b0
1e80: 00000000 fffffffc c00a0000 00000000 000000c0 c00a1fb0 c00a1f68
c05bf9c0
1ea0: c00a1ecc c00a1de0 c00a1ebc c024cda4 c0385770 000000d3 ffffffff
00000000
1ec0: c00a1eec c00a1ed0 c0222b60 c0220f60 000000c0 c05bf9c0 c00a0000
00000000
1ee0: c00a1f28 c00a1ef0 c0222e3c c0222afc 00000000 c025034c 001da3bf
00000000
1f00: 000000c0 c00a1f9c c00a0000 00000000 c00a1f68 60000053 00000000
c00a1f54
1f20: c00a1f2c c0222e8c c0222c4c c00a0000 ffffffff c00a1f9c 00000000
00000000
1f40: 00000001 00000000 c00a1f64 c00a1f58 c0223028 c0222e60 00000000
c00a1f68
1f60: c021ca60 c022301c 00000000 00000000 40000053 c00a0000 00000000
00000000
1f80: 00000000 00000000 00000001 c00a0000 00000000 00000000 c00a1fb0
c00a1fb0
1fa0: 000000c0 000000c0 60000053 ffffffff 00000000 00000000 c021c084
c02345f8
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
1fe0: 00000000 00000000 00000000 c021d94c 00000013 00000000 ffffffff
ffffffff
Backtrace:
[<c0220f54>] (die+0x0/0x344) from [<c0222b60>]
(__do_kernel_fault+0x70/0x84)
[<c0222af0>] (__do_kernel_fault+0x0/0x84) from [<c0222e3c>]
(do_page_fault+0x1fc/0x214)
r7 = 00000000 r6 = C00A0000 r5 = C05BF9C0 r4 = 000000C0
[<c0222c40>] (do_page_fault+0x0/0x214) from [<c0222e8c>]
(do_translation_fault+0x38/0xc4)
[<c0222e54>] (do_translation_fault+0x0/0xc4) from [<c0223028>]
(do_PrefetchAbort+0x18/0x1c)
[<c0223010>] (do_PrefetchAbort+0x0/0x1c) from [<c021ca60>]
(__pabt_svc+0x40/0x80)
[<c0220f54>] (die+0x0/0x344) from [<c0222b60>]
(__do_kernel_fault+0x70/0x84)
[<c0222af0>] (__do_kernel_fault+0x0/0x84) from [<c0222e3c>]
(do_page_fault+0x1fc/0x214)
r7 = 00000000 r6 = C00A0000 r5 = C05BF9C0 r4 = 000000C0
[<c0222c40>] (do_page_fault+0x0/0x214) from [<c0222e8c>]
(do_translation_fault+0x38/0xc4)
[<c0222e54>] (do_translation_fault+0x0/0xc4) from [<c0223028>]
(do_PrefetchAbort+0x18/0x1c)
[<c0223010>] (do_PrefetchAbort+0x0/0x1c) from [<c021ca60>]
(__pabt_svc+0x40/0x80)


--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2006-01-03 14:16:58

by Daniel Walker

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

On Tue, 2006-01-03 at 09:00 +0100, kus Kusche Klaus wrote:
> > From: Daniel Walker
> > Here's a more updated patch, the should replace the other
> > patch I sent.
> > I think the tracing error is the result of a missed interrupt
> > enable in
> > the ARM assembly code. I've only compile tested this.
>
> Compiles, but BUGs immediately after uncompressing (second line of
> console output) and then runs into an infinite Oops loop...
> Reproducible.

Ok, yet another patch. This one uses the correct lowlevel calls, and I
fixed the call ordering.

Daniel


Attachments:
fix_tracing_arm_idle_loop.patch (1.50 kB)

2006-01-03 14:57:33

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Daniel Walker
> On Tue, 2006-01-03 at 09:00 +0100, kus Kusche Klaus wrote:
> > > From: Daniel Walker
> > > Here's a more updated patch, the should replace the other
> > > patch I sent.
> > > I think the tracing error is the result of a missed interrupt
> > > enable in
> > > the ARM assembly code. I've only compile tested this.
> >
> > Compiles, but BUGs immediately after uncompressing (second line of
> > console output) and then runs into an infinite Oops loop...
> > Reproducible.
>
> Ok, yet another patch. This one uses the correct lowlevel calls, and I
> fixed the call ordering.

Hmmm, it changes a few flag and register values (e.g. lr),
but basically it gives the same BUG and Oops.

As the first BUG is very early:
Is it possible that tracing gets called before it is initialized?


Uncompressing
Linux.................................................................
done, booting the kernel.
BUG: bad raw irq-flag value 600000d3, swapper/0!
Linux version 2.6.15-rc7-rt1 (kk@silver) (gcc version 3.4.4) #23 PREEMPT
Tue Jan 3 14:56:50 CET 2006
CPU: StrongARM-1110 [6901b118] revision 8 (ARMv4)
Machine: Keba KETOP
Ignoring unrecognised tag 0x00000000
Memory policy: ECC disabled, Data cache writeback
MM: invalid domain in supersection mapping for 0x8000000000 at
0xea000000
MM: invalid domain in supersection mapping for 0x18000000000 at
0xf0000000
ketop map io done
Real-Time Preemption Support (C) 2004-2005 Ingo Molnar
Built 1 zonelists
Kernel command line: root=31:02 rootfstype=ext2 console=ttySA0,38400n8
console=tty0
WARNING: experimental RCU implementation.
PID hash table entries: 512 (order: 9, 8192 bytes)
Warning: uninitialized Real Time Clock
Console: colour dummy device 80x30
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 64MB = 64MB total
Memory: 61148KB available (1606K code, 2041K data, 80K init)
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
Unable to handle kernel NULL pointer dereference at virtual address
000000c0
pgd = c0204000
[000000c0] *pgd=00000000
Internal error: Oops: 0 [#1]
Modules linked in:
CPU: 0
PC is at 0xc0
LR is at __init_begin+0x3fdf8000/0x2c
pc : [<000000c0>] lr : [<00000000>] Not tainted
sp : c00a1fb0 ip : 00000000 fp : 00000000
r10: 00000000 r9 : c00a0000 r8 : 00000001
r7 : 00000000 r6 : 00000000 r5 : 00000000 r4 : 00000000
r3 : 00000000 r2 : 80000093 r1 : 00000000 r0 : 00000000
Flags: Nzcv IRQs off FIQs on Mode SVC_32 Segment kernel
Control: C020717F Table: C020717F DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc00a0194)
Stack: (0xc00a1fb0 to 0xc00a2000)
1fa0: 00000000 00000000 c021c084
c02345f8
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
1fe0: 00000000 00000000 00000000 c021d94c 00000013 00000000 ffffffff
ffffffff
Backtrace: no frame pointer
Code: <1>Unable to handle kernel NULL pointer dereference at virtual
address 000000c0
pgd = c0204000
[000000c0] *pgd=00000000
Internal error: Oops: 0 [#2]
Modules linked in:
CPU: 0
PC is at 0xc0
LR is at 0xc0
pc : [<000000c0>] lr : [<000000c0>] Not tainted
sp : c00a1e74 ip : c00a1e74 fp : c00a1ecc
r10: c05bf9c0 r9 : 00000093 r8 : c00a1fb0
r7 : 000000c0 r6 : 00000000 r5 : c00a1ea8 r4 : ffffffff
r3 : 00000001 r2 : 60000093 r1 : c03b1548 r0 : c038c508
Flags: nzCv IRQs off FIQs on Mode SVC_32 Segment kernel
Control: C020717F Table: C020717F DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc00a0194)
Stack: (0xc00a1e74 to 0xc00a2000)
1e60: 00000009 00000001
000000b0
1e80: 00000000 fffffffc c00a0000 00000000 000000c0 c00a1fb0 c00a1f68
c05bf9c0
1ea0: c00a1ecc c00a1de0 c00a1ebc c024cda4 c0385770 00000093 ffffffff
00000000
1ec0: c00a1eec c00a1ed0 c0222b60 c0220f60 000000c0 c05bf9c0 c00a0000
00000000
1ee0: c00a1f28 c00a1ef0 c0222e3c c0222afc 00000000 c025034c 001da3de
00000000
1f00: 000000c0 c00a1f9c c00a0000 00000000 c00a1f68 80000093 00000000
c00a1f54
1f20: c00a1f2c c0222e8c c0222c4c 00000010 ffffffff c00a1f9c 00000000
00000000
1f40: 00000001 00000000 c00a1f64 c00a1f58 c0223028 c0222e60 00000000
c00a1f68
1f60: c021ca60 c022301c 00000000 00000000 80000093 00000000 00000000
00000000
1f80: 00000000 00000000 00000001 c00a0000 00000000 00000000 00000000
c00a1fb0
1fa0: 00000000 000000c0 80000093 ffffffff 00000000 00000000 c021c084
c02345f8
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
1fe0: 00000000 00000000 00000000 c021d94c 00000013 00000000 ffffffff
ffffffff
Backtrace:
[<c0220f54>] (die+0x0/0x344) from [<c0222b60>]
(__do_kernel_fault+0x70/0x84)
[<c0222af0>] (__do_kernel_fault+0x0/0x84) from [<c0222e3c>]
(do_page_fault+0x1fc/0x214)
r7 = 00000000 r6 = C00A0000 r5 = C05BF9C0 r4 = 000000C0
[<c0222c40>] (do_page_fault+0x0/0x214) from [<c0222e8c>]
(do_translation_fault+0x38/0xc4)
[<c0222e54>] (do_translation_fault+0x0/0xc4) from [<c0223028>]
(do_PrefetchAbort+0x18/0x1c)
[<c0223010>] (do_PrefetchAbort+0x0/0x1c) from [<c021ca60>]
(__pabt_svc+0x40/0x80)
Code: <1>Unable to handle kernel NULL pointer dereference at virtual
address 000000c0
pgd = c0204000
[000000c0] *pgd=00000000
Internal error: Oops: 0 [#3]
Modules linked in:
CPU: 0
PC is at 0xc0
LR is at 0xc0
pc : [<000000c0>] lr : [<000000c0>] Not tainted
sp : c00a1d38 ip : c00a1d38 fp : c00a1d90
r10: c05bf9c0 r9 : 00000093 r8 : c00a1e74
r7 : 000000c0 r6 : 00000000 r5 : c00a1d6c r4 : ffffffff
r3 : 00000002 r2 : 60000093 r1 : c03b1548 r0 : c038c508
Flags: nzCv IRQs off FIQs on Mode SVC_32 Segment kernel
Control: C020717F Table: C020717F DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc00a0194)
Stack: (0xc00a1d38 to 0xc00a2000)
1d20: 00000009
00000001
1d40: 000000b0 00000000 fffffffc c00a0000 00000000 000000c0 c00a1e74
c00a1e2c
1d60: c05bf9c0 c00a1d90 0000c024 c00a1d80 000000c0 c0385770 00000093
ffffffff
1d80: 00000000 c00a1db0 c00a1d94 c0222b60 c0220f60 000000c0 c05bf9c0
c00a0000
1da0: 00000000 c00a1dec c00a1db4 c0222e3c c0222afc 000000b0 c05bf9c0
c00a1e70
1dc0: 00000000 000000c0 c00a1e60 c00a0000 00000000 c00a1e2c 20000093
c05bf9c0
1de0: c00a1e18 c00a1df0 c0222e8c c0222c4c 00000000 ffffffff c00a1e60
00000000
1e00: 000000c0 c00a1fb0 c05bf9c0 c00a1e28 c00a1e1c c0223028 c0222e60
c00a1ecc
1e20: c00a1e2c c021ca60 c022301c c038c508 c03b1548 60000093 00000001
ffffffff
1e40: c00a1ea8 00000000 000000c0 c00a1fb0 00000093 c05bf9c0 c00a1ecc
c00a1e74
1e60: c00a1e74 000000c0 000000c0 20000093 ffffffff 00000009 00000001
000000b0
1e80: 00000000 fffffffc c00a0000 00000000 000000c0 c00a1fb0 c00a1f68
c05bf9c0
1ea0: c00a1ecc c00a1de0 c00a1ebc c024cda4 c0385770 00000093 ffffffff
00000000
1ec0: c00a1eec c00a1ed0 c0222b60 c0220f60 000000c0 c05bf9c0 c00a0000
00000000
1ee0: c00a1f28 c00a1ef0 c0222e3c c0222afc 00000000 c025034c 001da3de
00000000
1f00: 000000c0 c00a1f9c c00a0000 00000000 c00a1f68 80000093 00000000
c00a1f54
1f20: c00a1f2c c0222e8c c0222c4c 00000010 ffffffff c00a1f9c 00000000
00000000
1f40: 00000001 00000000 c00a1f64 c00a1f58 c0223028 c0222e60 00000000
c00a1f68
1f60: c021ca60 c022301c 00000000 00000000 80000093 00000000 00000000
00000000
1f80: 00000000 00000000 00000001 c00a0000 00000000 00000000 00000000
c00a1fb0
1fa0: 00000000 000000c0 80000093 ffffffff 00000000 00000000 c021c084
c02345f8
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
1fe0: 00000000 00000000 00000000 c021d94c 00000013 00000000 ffffffff
ffffffff
Backtrace:
[<c0220f54>] (die+0x0/0x344) from [<c0222b60>]
(__do_kernel_fault+0x70/0x84)
[<c0222af0>] (__do_kernel_fault+0x0/0x84) from [<c0222e3c>]
(do_page_fault+0x1fc/0x214)
r7 = 00000000 r6 = C00A0000 r5 = C05BF9C0 r4 = 000000C0
[<c0222c40>] (do_page_fault+0x0/0x214) from [<c0222e8c>]
(do_translation_fault+0x38/0xc4)
[<c0222e54>] (do_translation_fault+0x0/0xc4) from [<c0223028>]
(do_PrefetchAbort+0x18/0x1c)
[<c0223010>] (do_PrefetchAbort+0x0/0x1c) from [<c021ca60>]
(__pabt_svc+0x40/0x80)
[<c0220f54>] (die+0x0/0x344) from [<c0222b60>]
(__do_kernel_fault+0x70/0x84)
[<c0222af0>] (__do_kernel_fault+0x0/0x84) from [<c0222e3c>]
(do_page_fault+0x1fc/0x214)
r7 = 00000000 r6 = C00A0000 r5 = C05BF9C0 r4 = 000000C0
[<c0222c40>] (do_page_fault+0x0/0x214) from [<c0222e8c>]
(do_translation_fault+0x38/0xc4)
[<c0222e54>] (do_translation_fault+0x0/0xc4) from [<c0223028>]
(do_PrefetchAbort+0x18/0x1c)
[<c0223010>] (do_PrefetchAbort+0x0/0x1c) from [<c021ca60>]
(__pabt_svc+0x40/0x80)

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2006-01-03 15:01:52

by Daniel Walker

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

On Tue, 2006-01-03 at 15:57 +0100, kus Kusche Klaus wrote:
> Ok, yet another patch. This one uses the correct lowlevel calls, and I
> > fixed the call ordering.
>
> Hmmm, it changes a few flag and register values (e.g. lr),
> but basically it gives the same BUG and Oops.
>
> As the first BUG is very early:
> Is it possible that tracing gets called before it is initialized?

Most likely . It's hard to create a global solution in the entry-*.S
files cause the code in there is called so early.

Daniel

2006-01-03 15:40:32

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Daniel Walker
> Most likely . It's hard to create a global solution in the entry-*.S
> files cause the code in there is called so early.

One more problem:
There seem to be more irq enable/disable than in entry-header.S:
entry-armv.S contains an open-coded enable,
asm-arm/assembler.h defines irq save/restore macros, ...

And they all do not trace.

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2006-01-04 09:28:52

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Daniel Walker
> Ok, yet another patch. This one uses the correct lowlevel calls, and I
> fixed the call ordering.

Hmm, I have no deep knowledge of ARM assembler programming,
but your patch branches to a C subroutine without setting up a
return address in the lr register.

Hence, the return in trace_irqs_... jumps to god-knows-where,
but not back to the next instruction.

As a wild guess, I replaced the "b trace_irqs_..." with
"bl trace_irqs_...".

With this change, the kernel boots fine, but the system seems to go
into an infinite loop as soon as the first usermode processes start.
Most likely, my change messes up the lr register of some surrounding
context.

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2006-01-05 14:30:46

by Daniel Walker

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

On Wed, 2006-01-04 at 10:28 +0100, kus Kusche Klaus wrote:
> > From: Daniel Walker
> > Ok, yet another patch. This one uses the correct lowlevel calls, and I
> > fixed the call ordering.
>
> Hmm, I have no deep knowledge of ARM assembler programming,
> but your patch branches to a C subroutine without setting up a
> return address in the lr register.
>
> Hence, the return in trace_irqs_... jumps to god-knows-where,
> but not back to the next instruction.
>
> As a wild guess, I replaced the "b trace_irqs_..." with
> "bl trace_irqs_...".

Ok .

> With this change, the kernel boots fine, but the system seems to go
> into an infinite loop as soon as the first usermode processes start.
> Most likely, my change messes up the lr register of some surrounding
> context.

That's odd. Do you have an OOPS, or is it just a silent hang?


Daniel

2006-01-05 15:32:23

by kus Kusche Klaus

[permalink] [raw]
Subject: RE: Latency traces I cannot interpret (sa1100, 2.6.15-rc7-rt1)

> From: Daniel Walker [mailto:[email protected]]
> On Wed, 2006-01-04 at 10:28 +0100, kus Kusche Klaus wrote:
> > > From: Daniel Walker
> > > Ok, yet another patch. This one uses the correct lowlevel
> calls, and I
> > > fixed the call ordering.
> >
> > Hmm, I have no deep knowledge of ARM assembler programming,
> > but your patch branches to a C subroutine without setting up a
> > return address in the lr register.
> >
> > Hence, the return in trace_irqs_... jumps to god-knows-where,
> > but not back to the next instruction.
> >
> > As a wild guess, I replaced the "b trace_irqs_..." with
> > "bl trace_irqs_...".
>
> Ok .
>
> > With this change, the kernel boots fine, but the system seems to go
> > into an infinite loop as soon as the first usermode processes start.
> > Most likely, my change messes up the lr register of some surrounding
> > context.
>
> That's odd. Do you have an OOPS, or is it just a silent hang?

No Oops, just a silent hang.

But as far as I can tell, the macros are used in code parts where just
jumps, but no calls are made (e.g. syscall entry / exit sequence)
and the lr register contains data which should not be messed up by
subroutine calls.


--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com