LinuxLists.cc - Mmiotrace regression in 2.6.28-rc6

2008-11-23 13:42:07

Subject: Mmiotrace regression in 2.6.28-rc6

Hi Steve,

now that the NULL deref bug in ring buffer resize is gone, I found
another regression.

It seems I am not getting any "buffer overflow" messages anymore.

Oh bummer, looks like I completely missed this one:
(hope it's not in any .27 kernel)

commit 3928a8a2d98081d1bc3c0a84a2d70e29b90ecf1c
Author: Steven Rostedt <[email protected]>
Date: Mon Sep 29 23:02:41 2008 -0400

ftrace: make work with new ring buffer

This patch ports ftrace over to the new ring buffer.

Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

<...>
diff --git a/kernel/trace/trace_mmiotrace.c b/kernel/trace/trace_mmiotrace.c
index a108c32..bdbf09d 100644
--- a/kernel/trace/trace_mmiotrace.c
+++ b/kernel/trace/trace_mmiotrace.c
<...>
@@ -130,10 +130,14 @@ static unsigned long count_overruns(struct trace_iterator *iter)
{
int cpu;
unsigned long cnt = 0;
+/* FIXME: */
+#if 0
for_each_online_cpu(cpu) {
cnt += iter->overrun[cpu];
iter->overrun[cpu] = 0;
}
+#endif
+ (void)cpu;
return cnt;
}

Steve, how am I going to fix this?
I think it's quite crucial to know when the dump is incomplete.

There's also another issue, but I am not sure how to reproduce it.
The symptom is, that
echo whatever > /debug/tracing/trace_marker
gets stuck in a loop, bash eats 100% CPU: 10% user, 90% system.
I got that twice today, like this: after a fresh boot echoing 64, 200,
70 to trace_entries, switching between nop and mmiotrace tracers,
echoing 0 or 1 to tracing_enabled, and running the testmmiotrace
module. At some point the user marker got stuck.

But then I tried to do that a third time by repeating some usage
patterns I thought might trigger it, but it didn't happen.

On the positive side, the basic test using testmmiotrace.ko
with the basic setup produces the correct output.

Thanks.

--
Pekka Paalanen
http://www.iki.fi/pq/

2008-11-23 14:04:18

by Ingo Molnar

[permalink] [raw]

Subject: Re: Mmiotrace regression in 2.6.28-rc6

* Pekka Paalanen <[email protected]> wrote:

> Hi Steve,
>
> now that the NULL deref bug in ring buffer resize is gone, I found
> another regression.
>
> It seems I am not getting any "buffer overflow" messages anymore.
>
> Oh bummer, looks like I completely missed this one:
> (hope it's not in any .27 kernel)

yeah, the ring-buffer is new in .28.

Ingo

2008-11-23 16:35:23

by Steven Rostedt

[permalink] [raw]

Subject: Re: Mmiotrace regression in 2.6.28-rc6

On Sun, 23 Nov 2008, Pekka Paalanen wrote:

> Hi Steve,
>
> now that the NULL deref bug in ring buffer resize is gone, I found
> another regression.
>
> It seems I am not getting any "buffer overflow" messages anymore.
>
> Oh bummer, looks like I completely missed this one:
> (hope it's not in any .27 kernel)
>
>
> commit 3928a8a2d98081d1bc3c0a84a2d70e29b90ecf1c
> Author: Steven Rostedt <[email protected]>
> Date: Mon Sep 29 23:02:41 2008 -0400
>
> ftrace: make work with new ring buffer
>
> This patch ports ftrace over to the new ring buffer.
>
> Signed-off-by: Steven Rostedt <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> <...>
> diff --git a/kernel/trace/trace_mmiotrace.c b/kernel/trace/trace_mmiotrace.c
> index a108c32..bdbf09d 100644
> --- a/kernel/trace/trace_mmiotrace.c
> +++ b/kernel/trace/trace_mmiotrace.c
> <...>
> @@ -130,10 +130,14 @@ static unsigned long count_overruns(struct trace_iterator *iter)
> {
> int cpu;
> unsigned long cnt = 0;
> +/* FIXME: */
> +#if 0
> for_each_online_cpu(cpu) {
> cnt += iter->overrun[cpu];
> iter->overrun[cpu] = 0;
> }
> +#endif

Hi Pekka,

Use ring_buffer_overruns(iter->tr->buffer). See print_trace_hader in
kernel/trace/trace.c for more info.

-- Steve

> + (void)cpu;
> return cnt;
> }
>
> Steve, how am I going to fix this?
> I think it's quite crucial to know when the dump is incomplete.
>
>
> There's also another issue, but I am not sure how to reproduce it.
> The symptom is, that
> echo whatever > /debug/tracing/trace_marker
> gets stuck in a loop, bash eats 100% CPU: 10% user, 90% system.
> I got that twice today, like this: after a fresh boot echoing 64, 200,
> 70 to trace_entries, switching between nop and mmiotrace tracers,
> echoing 0 or 1 to tracing_enabled, and running the testmmiotrace
> module. At some point the user marker got stuck.
>
> But then I tried to do that a third time by repeating some usage
> patterns I thought might trigger it, but it didn't happen.
>
> On the positive side, the basic test using testmmiotrace.ko
> with the basic setup produces the correct output.
>
>
> Thanks.
>
> --
> Pekka Paalanen
> http://www.iki.fi/pq/
>
>

2008-11-23 19:24:47

by Pekka Paalanen

[permalink] [raw]

Subject: [PATCH 1/2] mmiotrace: fix buffer overrun detection

>From d5d72f3515ba3abffa317b985d7597718a024841 Mon Sep 17 00:00:00 2001
From: Pekka Paalanen <[email protected]>
Date: Sun, 23 Nov 2008 19:18:18 +0200
Subject: [PATCH] mmiotrace: fix buffer overrun detection

When ftrace framework moved to use the ring buffer facility, the buffer
overrun detection was broken after 2.6.27 by commit

commit 3928a8a2d98081d1bc3c0a84a2d70e29b90ecf1c
Author: Steven Rostedt <[email protected]>
Date: Mon Sep 29 23:02:41 2008 -0400

ftrace: make work with new ring buffer

This patch ports ftrace over to the new ring buffer.

The detection is now fixed by using the ring buffer API.

When mmiotrace detects a buffer overrun, it will report the number of
lost events. People reading an mmiotrace log must know if something was
missed, otherwise the data may not make sense.

Signed-off-by: Pekka Paalanen <[email protected]>
---

Ingo, could these two patches still make it into .28?

kernel/trace/trace_mmiotrace.c | 16 +++++++---------
1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace_mmiotrace.c b/kernel/trace/trace_mmiotrace.c
index f284846..e62cbf7 100644
--- a/kernel/trace/trace_mmiotrace.c
+++ b/kernel/trace/trace_mmiotrace.c
@@ -18,12 +18,14 @@ struct header_iter {

static struct trace_array *mmio_trace_array;
static bool overrun_detected;
+static unsigned long prev_overruns;

static void mmio_reset_data(struct trace_array *tr)
{
int cpu;

overrun_detected = false;
+ prev_overruns = 0;
tr->time_start = ftrace_now(tr->cpu);

for_each_online_cpu(cpu)
@@ -128,16 +130,12 @@ static void mmio_close(struct trace_iterator *iter)

static unsigned long count_overruns(struct trace_iterator *iter)
{
- int cpu;
unsigned long cnt = 0;
-/* FIXME: */
-#if 0
- for_each_online_cpu(cpu) {
- cnt += iter->overrun[cpu];
- iter->overrun[cpu] = 0;
- }
-#endif
- (void)cpu;
+ unsigned long over = ring_buffer_overruns(iter->tr->buffer);
+
+ if (over > prev_overruns)
+ cnt = over - prev_overruns;
+ prev_overruns = over;
return cnt;
}

--
1.5.6.4

2008-11-23 19:25:31

by Pekka Paalanen

[permalink] [raw]

Subject: [PATCH 2/2] doc: update mmiotrace documentation

>From 9fac48964c6d109589c9d610a727f72d2a1835dc Mon Sep 17 00:00:00 2001
From: Pekka Paalanen <[email protected]>
Date: Sun, 23 Nov 2008 20:59:56 +0200
Subject: [PATCH] doc: update mmiotrace documentation

Update to reflect the current state of the tracing framework:
- "none" tracer has been replaced by "nop" tracer
- tracing_enabled must be toggled when changing buffer size

Signed-off-by: Pekka Paalanen <[email protected]>
---
Documentation/tracers/mmiotrace.txt | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/tracers/mmiotrace.txt b/Documentation/tracers/mmiotrace.txt
index 5bbbe20..cde23b4 100644
--- a/Documentation/tracers/mmiotrace.txt
+++ b/Documentation/tracers/mmiotrace.txt
@@ -37,7 +37,7 @@ $ echo mmiotrace > /debug/tracing/current_tracer
$ cat /debug/tracing/trace_pipe > mydump.txt &
Start X or whatever.
$ echo "X is up" > /debug/tracing/trace_marker
-$ echo none > /debug/tracing/current_tracer
+$ echo nop > /debug/tracing/current_tracer
Check for lost events.

@@ -66,7 +66,7 @@ which action. It is recommended to place descriptive markers about what you
do.

Shut down mmiotrace (requires root privileges):
-$ echo none > /debug/tracing/current_tracer
+$ echo nop > /debug/tracing/current_tracer
The 'cat' process exits. If it does not, kill it by issuing 'fg' command and
pressing ctrl+c.

@@ -81,7 +81,9 @@ are:
$ cat /debug/tracing/trace_entries
gives you a number. Approximately double this number and write it back, for
instance:
+$ echo 0 > /debug/tracing/tracing_enabled
$ echo 128000 > /debug/tracing/trace_entries
+$ echo 1 > /debug/tracing/tracing_enabled
Then start again from the top.

If you are doing a trace for a driver project, e.g. Nouveau, you should also
--
1.5.6.4

2008-11-23 19:34:36

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH 1/2] mmiotrace: fix buffer overrun detection

* Pekka Paalanen <[email protected]> wrote:

> From d5d72f3515ba3abffa317b985d7597718a024841 Mon Sep 17 00:00:00 2001
> From: Pekka Paalanen <[email protected]>
> Date: Sun, 23 Nov 2008 19:18:18 +0200
> Subject: [PATCH] mmiotrace: fix buffer overrun detection
>
> When ftrace framework moved to use the ring buffer facility, the buffer
> overrun detection was broken after 2.6.27 by commit
>
> commit 3928a8a2d98081d1bc3c0a84a2d70e29b90ecf1c
> Author: Steven Rostedt <[email protected]>
> Date: Mon Sep 29 23:02:41 2008 -0400
>
> ftrace: make work with new ring buffer
>
> This patch ports ftrace over to the new ring buffer.
>
> The detection is now fixed by using the ring buffer API.
>
> When mmiotrace detects a buffer overrun, it will report the number of
> lost events. People reading an mmiotrace log must know if something was
> missed, otherwise the data may not make sense.
>
> Signed-off-by: Pekka Paalanen <[email protected]>
> ---
>
> Ingo, could these two patches still make it into .28?

Sure - applied them to tip/tracing/urgent:

fb91ee6: tracing, doc: update mmiotrace documentation
7ee1768: x86, mmiotrace: fix buffer overrun detection

Does this make mmiotrace fully functional in .28 ?

Ingo

2008-11-23 20:00:14

by Pekka Paalanen

[permalink] [raw]

Subject: Re: [PATCH 1/2] mmiotrace: fix buffer overrun detection

On Sun, 23 Nov 2008 20:34:13 +0100
Ingo Molnar <[email protected]> wrote:

>
> * Pekka Paalanen <[email protected]> wrote:
>
> > Ingo, could these two patches still make it into .28?
>
> Sure - applied them to tip/tracing/urgent:
>
> fb91ee6: tracing, doc: update mmiotrace documentation
> 7ee1768: x86, mmiotrace: fix buffer overrun detection
>
> Does this make mmiotrace fully functional in .28 ?

Yes, as fully as it was originally meant to be. Again.
I don't know of any more urgent bugs.

It passes the testmmiotrace.ko test, and also notices overruns if they
happen. Still untested with the nvidia proprietary driver, though, or
with SMP, but it should work.

btw. the testmmiotrace.ko testing cannot be automated, since it
involves writing to "somewhere" in the PCI address space and is
therefore a bit dangerous.

Thanks.

--
Pekka Paalanen
http://www.iki.fi/pq/

2008-11-24 20:19:14

by Pekka Paalanen

[permalink] [raw]

Subject: Re: [PATCH 1/2] mmiotrace: fix buffer overrun detection

On Sun, 23 Nov 2008 20:34:13 +0100
Ingo Molnar <[email protected]> wrote:

> Sure - applied them to tip/tracing/urgent:
>
> fb91ee6: tracing, doc: update mmiotrace documentation
> 7ee1768: x86, mmiotrace: fix buffer overrun detection
>
> Does this make mmiotrace fully functional in .28 ?

Oh right, the busylooping bug when echoing into trace_marker,
I already forgot about that, since I couldn't reproduce it.
I guess there's nothing to do, until someone finds a way to
reproduce it.

I could speculate about buffer wrap-around issues or such, but
that requires some serious code reviewing. Btw. C-c'ing didn't
interrupt the shell while spinning. The spinning ended when I
switched the tracer or disabled it, IIRC.

(This is related to 2.6.28-rc6.)

On Sun, 23 Nov 2008, Pekka Paalanen wrote:

> There's also another issue, but I am not sure how to reproduce it.
> The symptom is, that
> echo whatever > /debug/tracing/trace_marker
> gets stuck in a loop, bash eats 100% CPU: 10% user, 90% system.
> I got that twice today, like this: after a fresh boot echoing 64, 200,
> 70 to trace_entries, switching between nop and mmiotrace tracers,
> echoing 0 or 1 to tracing_enabled, and running the testmmiotrace
> module. At some point the user marker got stuck.
>
> But then I tried to do that a third time by repeating some usage
> patterns I thought might trigger it, but it didn't happen.

--
Pekka Paalanen
http://www.iki.fi/pq/

2008-11-24 20:30:35

by Frederic Weisbecker

[permalink] [raw]

Subject: Re: [PATCH 1/2] mmiotrace: fix buffer overrun detection

2008/11/24 Pekka Paalanen <[email protected]>:
> On Sun, 23 Nov 2008 20:34:13 +0100
> Ingo Molnar <[email protected]> wrote:
>
>> Sure - applied them to tip/tracing/urgent:
>>
>> fb91ee6: tracing, doc: update mmiotrace documentation
>> 7ee1768: x86, mmiotrace: fix buffer overrun detection
>>
>> Does this make mmiotrace fully functional in .28 ?
>
> Oh right, the busylooping bug when echoing into trace_marker,
> I already forgot about that, since I couldn't reproduce it.
> I guess there's nothing to do, until someone finds a way to
> reproduce it.

Hi Pekka.

Are you talking about this old bug that broke the pipe?
If so, you fixed it yourself partially with a patch that corrected
tracing_read_pipe.
Another origin was the none tracer that set tracer_enabled to 0.
But i has been fixed with the nop tracer.

Perhaps you are talking of something else?

2008-11-24 21:27:35

by Pekka Paalanen

[permalink] [raw]

Subject: Re: [PATCH 1/2] mmiotrace: fix buffer overrun detection

On Mon, 24 Nov 2008 21:30:24 +0100
"Fr?d?ric Weisbecker" <[email protected]> wrote:

> 2008/11/24 Pekka Paalanen <[email protected]>:
> >
> > Oh right, the busylooping bug when echoing into trace_marker,
> > I already forgot about that, since I couldn't reproduce it.
> > I guess there's nothing to do, until someone finds a way to
> > reproduce it.
>
>
> Hi Pekka.
>
> Are you talking about this old bug that broke the pipe?
> If so, you fixed it yourself partially with a patch that corrected
> tracing_read_pipe.
> Another origin was the none tracer that set tracer_enabled to 0.
> But i has been fixed with the nop tracer.
>
> Perhaps you are talking of something else?

Yeah, I'm talking about the shell spinning 90% in kernel and 10% in
user, I tried to explain it in the part you left out from the quote.
That is, doing 'echo foo > trace_marker' makes the shell/echo spin.
And this doesn't seem to be deterministic, or the exact triggering
sequence is too complex. I didn't notice any other adverse effects
than the shell getting stuck.

Maybe a corner case related to having only a single page in the
ring buffer... nah, more likely something unexpected under
trace_marker callbacks. Returning zero instead of error on
write(2) maybe... something that makes 'echo' retry again and
again, never succeeding or really failing.

--
Pekka Paalanen
http://www.iki.fi/pq/

2008-11-25 11:56:01

by Frederic Weisbecker

[permalink] [raw]

Subject: Re: [PATCH 1/2] mmiotrace: fix buffer overrun detection

2008/11/24 Pekka Paalanen <[email protected]>:
> On Mon, 24 Nov 2008 21:30:24 +0100
> "Fr?d?ric Weisbecker" <[email protected]> wrote:
>
>> 2008/11/24 Pekka Paalanen <[email protected]>:
>> >
>> > Oh right, the busylooping bug when echoing into trace_marker,
>> > I already forgot about that, since I couldn't reproduce it.
>> > I guess there's nothing to do, until someone finds a way to
>> > reproduce it.
>>
>>
>> Hi Pekka.
>>
>> Are you talking about this old bug that broke the pipe?
>> If so, you fixed it yourself partially with a patch that corrected
>> tracing_read_pipe.
>> Another origin was the none tracer that set tracer_enabled to 0.
>> But i has been fixed with the nop tracer.
>>
>> Perhaps you are talking of something else?
>
> Yeah, I'm talking about the shell spinning 90% in kernel and 10% in
> user, I tried to explain it in the part you left out from the quote.
> That is, doing 'echo foo > trace_marker' makes the shell/echo spin.
> And this doesn't seem to be deterministic, or the exact triggering
> sequence is too complex. I didn't notice any other adverse effects
> than the shell getting stuck.
>
> Maybe a corner case related to having only a single page in the
> ring buffer... nah, more likely something unexpected under
> trace_marker callbacks. Returning zero instead of error on
> write(2) maybe... something that makes 'echo' retry again and
> again, never succeeding or really failing.
>
> --
> Pekka Paalanen
> http://www.iki.fi/pq/
>

Perhaps that:

int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
{
......
if (tracing_disabled)
return 0;

2008-11-25 11:59:33

by Frederic Weisbecker

[permalink] [raw]

Subject: Re: [PATCH 1/2] mmiotrace: fix buffer overrun detection

2008/11/25 Fr?d?ric Weisbecker <[email protected]>:
> Perhaps that:
>
> int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
> {
> ......
> if (tracing_disabled)
> return 0;

Since this is the return value for tracing_mark_write(), it may better
return an error to avoid a spin when tracing
is disabled. -ENODEV I guess, or -EBUSY

The strange thing is that tracing_open_generic() shoud have return
-ENODEV in this case during the opening of trace_mark.

2008-11-25 17:13:48

by Pekka Paalanen

[permalink] [raw]

Subject: Re: [PATCH 1/2] mmiotrace: fix buffer overrun detection

On Tue, 25 Nov 2008 12:59:22 +0100
"Fr?d?ric Weisbecker" <[email protected]> wrote:

> 2008/11/25 Fr?d?ric Weisbecker <[email protected]>:
> > Perhaps that:
> >
> > int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
> > {
> > ......
> > if (tracing_disabled)
> > return 0;
>
> Since this is the return value for tracing_mark_write(), it may better
> return an error to avoid a spin when tracing
> is disabled. -ENODEV I guess, or -EBUSY
>
> The strange thing is that tracing_open_generic() shoud have return
> -ENODEV in this case during the opening of trace_mark.
>

This sounds worth investigating, thanks for the tip.
Hmm, maybe if I changed the current tracer while trace_enabled was
zero...

Anyway, this isn't too high on my priority list.

--
Pekka Paalanen
http://www.iki.fi/pq/