2016-10-09 01:13:04

by Anton Blanchard

[permalink] [raw]
Subject: perf TUI fails with "failed to process type: 64"

Hi,

Updating to mainline as of last night, I started seeing the following
error when running the perf report TUI:

0x46068 [0x8]: failed to process type: 68

This event is just PERF_RECORD_FINISHED_ROUND:

0x46068 [0x8]: event: 68
.
. ... raw event: size 8 bytes
. 0000: 44 00 00 00 00 00 08 00 D.......

0x46068 [0x8]: PERF_RECORD_FINISHED_ROUND

Which of course is not our error. It took me a while to find the real
culprit:

14c00-14c00 g exc_virt_0x4c00_system_call

A zero length symbol, which __symbol__inc_addr_samples() barfs on:

if (addr < sym->start || addr >= sym->end) {
...
return -ERANGE;

Seems like we have 3 bugs here:

1. Output the real source of the error instead of PERF_RECORD_FINISHED_ROUND

2. Don't exit the TUI if we find a sample on a zero length symbol

3. Why do we have zero length symbols in the first place? Does the recent
ppc64 exception clean up have something to do with it?

Anton


2016-10-10 05:02:11

by Michael Ellerman

[permalink] [raw]
Subject: Re: perf TUI fails with "failed to process type: 64"

Anton Blanchard <[email protected]> writes:

> Hi,
>
> Updating to mainline as of last night, I started seeing the following
> error when running the perf report TUI:
>
> 0x46068 [0x8]: failed to process type: 68
>
> This event is just PERF_RECORD_FINISHED_ROUND:
>
> 0x46068 [0x8]: event: 68
> .
> . ... raw event: size 8 bytes
> . 0000: 44 00 00 00 00 00 08 00 D.......
>
> 0x46068 [0x8]: PERF_RECORD_FINISHED_ROUND
>
> Which of course is not our error. It took me a while to find the real
> culprit:
>
> 14c00-14c00 g exc_virt_0x4c00_system_call
^
What's this? The address? If so it's wrong?


> A zero length symbol, which __symbol__inc_addr_samples() barfs on:
>
> if (addr < sym->start || addr >= sym->end) {
> ...
> return -ERANGE;
>
> Seems like we have 3 bugs here:
...
>
> 3. Why do we have zero length symbols in the first place? Does the recent
> ppc64 exception clean up have something to do with it?

Seems likely. But I can't see why.

AFAICS we have never emitted a size for those symbols:

Old:
$ nm --print-size build/vmlinux | grep -w system_call_relon_pSeries
c000000000004c00 T system_call_relon_pSeries

New:
$ nm --print-size build/vmlinux | grep -w exc_virt_0x4c00_system_call
c000000000004c00 T exc_virt_0x4c00_system_call


It also doesn't look like we're emitting another symbol with the same
address, which has caused confusion in the past:

Old:
c000000000004c00 T exc_virt_0x4c00_system_call
c000000000004d00 T exc_virt_0x4d00_single_step

New:
c000000000004c00 T system_call_relon_pSeries
c000000000004d00 T single_step_relon_pSeries


So more digging required.

cheers

2016-10-10 05:59:33

by Nicholas Piggin

[permalink] [raw]
Subject: Re: perf TUI fails with "failed to process type: 64"

On Mon, 10 Oct 2016 16:02:07 +1100
Michael Ellerman <[email protected]> wrote:

> Anton Blanchard <[email protected]> writes:
>
> > Hi,
> >
> > Updating to mainline as of last night, I started seeing the following
> > error when running the perf report TUI:
> >
> > 0x46068 [0x8]: failed to process type: 68
> >
> > This event is just PERF_RECORD_FINISHED_ROUND:
> >
> > 0x46068 [0x8]: event: 68
> > .
> > . ... raw event: size 8 bytes
> > . 0000: 44 00 00 00 00 00 08 00 D.......
> >
> > 0x46068 [0x8]: PERF_RECORD_FINISHED_ROUND
> >
> > Which of course is not our error. It took me a while to find the real
> > culprit:
> >
> > 14c00-14c00 g exc_virt_0x4c00_system_call
> ^
> What's this? The address? If so it's wrong?
>
>
> > A zero length symbol, which __symbol__inc_addr_samples() barfs on:
> >
> > if (addr < sym->start || addr >= sym->end) {
> > ...
> > return -ERANGE;
> >
> > Seems like we have 3 bugs here:
> ...
> >
> > 3. Why do we have zero length symbols in the first place? Does the recent
> > ppc64 exception clean up have something to do with it?
>
> Seems likely. But I can't see why.
>
> AFAICS we have never emitted a size for those symbols:
>
> Old:
> $ nm --print-size build/vmlinux | grep -w system_call_relon_pSeries
> c000000000004c00 T system_call_relon_pSeries
>
> New:
> $ nm --print-size build/vmlinux | grep -w exc_virt_0x4c00_system_call
> c000000000004c00 T exc_virt_0x4c00_system_call
>
>
> It also doesn't look like we're emitting another symbol with the same
> address, which has caused confusion in the past:
>
> Old:
> c000000000004c00 T exc_virt_0x4c00_system_call
> c000000000004d00 T exc_virt_0x4d00_single_step
>
> New:
> c000000000004c00 T system_call_relon_pSeries
> c000000000004d00 T single_step_relon_pSeries
>
>
> So more digging required.

Yeah, strange. Maybe perf changes?

We were talking about adding size and type to the exception symbols
though.

2016-10-10 10:18:59

by Anton Blanchard

[permalink] [raw]
Subject: Re: perf TUI fails with "failed to process type: 64"

Hi Michael,

> > 14c00-14c00 g exc_virt_0x4c00_system_call
> ^
> What's this? The address? If so it's wrong?

Offset into the binary I think, there's one 64kB page of ELF gunk at
the start.

> Seems likely. But I can't see why.
>
> AFAICS we have never emitted a size for those symbols:
>
> Old:
> $ nm --print-size build/vmlinux | grep -w system_call_relon_pSeries
> c000000000004c00 T system_call_relon_pSeries
>
> New:
> $ nm --print-size build/vmlinux | grep -w exc_virt_0x4c00_system_call
> c000000000004c00 T exc_virt_0x4c00_system_call
>
>
> It also doesn't look like we're emitting another symbol with the same
> address, which has caused confusion in the past:
>
> Old:
> c000000000004c00 T exc_virt_0x4c00_system_call
> c000000000004d00 T exc_virt_0x4d00_single_step
>
> New:
> c000000000004c00 T system_call_relon_pSeries
> c000000000004d00 T single_step_relon_pSeries
>
> So more digging required.

Thanks for checking, it's starting to sound like a perf bug.

Anton

2016-10-10 10:39:29

by Michael Ellerman

[permalink] [raw]
Subject: Re: perf TUI fails with "failed to process type: 64"

Anton Blanchard <[email protected]> writes:
>> > 14c00-14c00 g exc_virt_0x4c00_system_call
>> ^
>> What's this? The address? If so it's wrong?
>
> Offset into the binary I think, there's one 64kB page of ELF gunk at
> the start.

Ah OK makes sense.

>> So more digging required.
>
> Thanks for checking, it's starting to sound like a perf bug.

Yeah possibly. I'll build an updated perf tomorrow and see if I can
reproduce.

As an aside, it looks like it would be easy to set the size with the new
macro. Does the patch below help at all?

The other thing Nick mentioned is we could set the type of those symbols
to function, though that's not entirely correct. But it might help perf
and probably wouldn't break anything else (that I can think of right now
anyway).

cheers


diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h
index ab90c2fa1ea6..0e250d4eff44 100644
--- a/arch/powerpc/include/asm/head-64.h
+++ b/arch/powerpc/include/asm/head-64.h
@@ -122,6 +122,7 @@ name:
.error "Fixed entry overflow"; \
.abort; \
.endif; \
+ .size name, (. - name); \
. = ((end) - sname##_start); \



2016-11-21 06:28:57

by Anton Blanchard

[permalink] [raw]
Subject: Re: perf TUI fails with "failed to process type: 64"

Hi,

I forgot about the set of issues below. Michael had a suggested powerpc
fix for 3, but it it would be nice to fix the perf bugs in 1 and 2.

Anton
--

> Updating to mainline as of last night, I started seeing the following
> error when running the perf report TUI:
>
> 0x46068 [0x8]: failed to process type: 68
>
> This event is just PERF_RECORD_FINISHED_ROUND:
>
> 0x46068 [0x8]: event: 68
> .
> . ... raw event: size 8 bytes
> . 0000: 44 00 00 00 00 00 08 00
> D.......
>
> 0x46068 [0x8]: PERF_RECORD_FINISHED_ROUND
>
> Which of course is not our error. It took me a while to find the real
> culprit:
>
> 14c00-14c00 g exc_virt_0x4c00_system_call
>
> A zero length symbol, which __symbol__inc_addr_samples() barfs on:
>
> if (addr < sym->start || addr >= sym->end) {
> ...
> return -ERANGE;
>
> Seems like we have 3 bugs here:
>
> 1. Output the real source of the error instead of
> PERF_RECORD_FINISHED_ROUND
>
> 2. Don't exit the TUI if we find a sample on a zero length symbol
>
> 3. Why do we have zero length symbols in the first place? Does the
> recent ppc64 exception clean up have something to do with it?
>
> Anton

2016-11-22 08:41:40

by Ravi Bangoria

[permalink] [raw]
Subject: [PATCH] perf TUI: Don't throw error for zero length symbols

perf report (with TUI) exits with error when it finds a sample of zero
length symbol(i.e. addr == sym->start == sym->end). Actually these are
valid samples. Don't exit TUI and show report with such symbols.

Link: https://lkml.org/lkml/2016/10/8/189

Reported-by: Anton Blanchard <[email protected]>
Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/perf/util/annotate.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index aeb5a44..430d039 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -593,7 +593,8 @@ static int __symbol__inc_addr_samples(struct symbol *sym, struct map *map,

pr_debug3("%s: addr=%#" PRIx64 "\n", __func__, map->unmap_ip(map, addr));

- if (addr < sym->start || addr >= sym->end) {
+ if ((addr < sym->start || addr >= sym->end) &&
+ (addr != sym->end || sym->start != sym->end)) {
pr_debug("%s(%d): ERANGE! sym->name=%s, start=%#" PRIx64 ", addr=%#" PRIx64 ", end=%#" PRIx64 "\n",
__func__, __LINE__, sym->name, sym->start, addr, sym->end);
return -ERANGE;
--
1.8.3.1

2016-11-22 08:49:49

by Ravi Bangoria

[permalink] [raw]
Subject: Re: [PATCH] perf TUI: Don't throw error for zero length symbols

Hi Anton,

On Tuesday 22 November 2016 02:10 PM, Ravi Bangoria wrote:
> perf report (with TUI) exits with error when it finds a sample of zero
> length symbol(i.e. addr == sym->start == sym->end). Actually these are
> valid samples. Don't exit TUI and show report with such symbols.
>
> Link: https://lkml.org/lkml/2016/10/8/189

This will solve 2nd issue.

3rd doesn't sound simple to fix. I tried to fix it by replacing pr_debug to
pr_err when addr is out of symbol address range. But error message will
get overwritten every time when subsequent pr_err gets executed.

Arnaldo, any suggestions?

-Ravi

> Reported-by: Anton Blanchard <[email protected]>
> Signed-off-by: Ravi Bangoria <[email protected]>
> ---
> tools/perf/util/annotate.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index aeb5a44..430d039 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -593,7 +593,8 @@ static int __symbol__inc_addr_samples(struct symbol *sym, struct map *map,
>
> pr_debug3("%s: addr=%#" PRIx64 "\n", __func__, map->unmap_ip(map, addr));
>
> - if (addr < sym->start || addr >= sym->end) {
> + if ((addr < sym->start || addr >= sym->end) &&
> + (addr != sym->end || sym->start != sym->end)) {
> pr_debug("%s(%d): ERANGE! sym->name=%s, start=%#" PRIx64 ", addr=%#" PRIx64 ", end=%#" PRIx64 "\n",
> __func__, __LINE__, sym->name, sym->start, addr, sym->end);
> return -ERANGE;

2016-11-22 08:56:30

by Ravi Bangoria

[permalink] [raw]
Subject: Re: [PATCH] perf TUI: Don't throw error for zero length symbols



On Tuesday 22 November 2016 02:19 PM, Ravi Bangoria wrote:
> Hi Anton,
>
> On Tuesday 22 November 2016 02:10 PM, Ravi Bangoria wrote:
>> perf report (with TUI) exits with error when it finds a sample of zero
>> length symbol(i.e. addr == sym->start == sym->end). Actually these are
>> valid samples. Don't exit TUI and show report with such symbols.
>>
>> Link: https://lkml.org/lkml/2016/10/8/189
> This will solve 2nd issue.
>
> 3rd doesn't sound simple to fix.

Sorry, I mean 1st doesn't sound ...

> I tried to fix it by replacing pr_debug to
> pr_err when addr is out of symbol address range. But error message will
> get overwritten every time when subsequent pr_err gets executed.
>
> Arnaldo, any suggestions?
>
> -Ravi
>
>> Reported-by: Anton Blanchard <[email protected]>
>> Signed-off-by: Ravi Bangoria <[email protected]>
>> ---
>> tools/perf/util/annotate.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
>> index aeb5a44..430d039 100644
>> --- a/tools/perf/util/annotate.c
>> +++ b/tools/perf/util/annotate.c
>> @@ -593,7 +593,8 @@ static int __symbol__inc_addr_samples(struct symbol *sym, struct map *map,
>>
>> pr_debug3("%s: addr=%#" PRIx64 "\n", __func__, map->unmap_ip(map, addr));
>>
>> - if (addr < sym->start || addr >= sym->end) {
>> + if ((addr < sym->start || addr >= sym->end) &&
>> + (addr != sym->end || sym->start != sym->end)) {
>> pr_debug("%s(%d): ERANGE! sym->name=%s, start=%#" PRIx64 ", addr=%#" PRIx64 ", end=%#" PRIx64 "\n",
>> __func__, __LINE__, sym->name, sym->start, addr, sym->end);
>> return -ERANGE;

2016-12-16 10:40:00

by Ravi Bangoria

[permalink] [raw]
Subject: Re: [PATCH] perf TUI: Don't throw error for zero length symbols

Hi Arnaldo,

Can you please pick this up if it looks good?

-Ravi

On Tuesday 22 November 2016 02:10 PM, Ravi Bangoria wrote:
> perf report (with TUI) exits with error when it finds a sample of zero
> length symbol(i.e. addr == sym->start == sym->end). Actually these are
> valid samples. Don't exit TUI and show report with such symbols.
>
> Link: https://lkml.org/lkml/2016/10/8/189
>
> Reported-by: Anton Blanchard <[email protected]>
> Signed-off-by: Ravi Bangoria <[email protected]>
> ---
> tools/perf/util/annotate.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index aeb5a44..430d039 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -593,7 +593,8 @@ static int __symbol__inc_addr_samples(struct symbol *sym, struct map *map,
>
> pr_debug3("%s: addr=%#" PRIx64 "\n", __func__, map->unmap_ip(map, addr));
>
> - if (addr < sym->start || addr >= sym->end) {
> + if ((addr < sym->start || addr >= sym->end) &&
> + (addr != sym->end || sym->start != sym->end)) {
> pr_debug("%s(%d): ERANGE! sym->name=%s, start=%#" PRIx64 ", addr=%#" PRIx64 ", end=%#" PRIx64 "\n",
> __func__, __LINE__, sym->name, sym->start, addr, sym->end);
> return -ERANGE;

2016-12-16 20:28:17

by Anton Blanchard

[permalink] [raw]
Subject: Re: [PATCH] perf TUI: Don't throw error for zero length symbols

Hi Ravi,

> > perf report (with TUI) exits with error when it finds a sample of
> > zero length symbol(i.e. addr == sym->start == sym->end). Actually
> > these are valid samples. Don't exit TUI and show report with such
> > symbols.
> >
> > Link: https://lkml.org/lkml/2016/10/8/189

You can add:

Tested-by: Anton Blanchard <[email protected]>

Also, since this issue makes perf report pretty much useless on
ppc64, can we mark it for stable@, at least to get it into 4.9 where
the ppc64 kernel changes that triggered this appeared?

Anton

> > Reported-by: Anton Blanchard <[email protected]>
> > Signed-off-by: Ravi Bangoria <[email protected]>
> > ---
> > tools/perf/util/annotate.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> > index aeb5a44..430d039 100644
> > --- a/tools/perf/util/annotate.c
> > +++ b/tools/perf/util/annotate.c
> > @@ -593,7 +593,8 @@ static int __symbol__inc_addr_samples(struct
> > symbol *sym, struct map *map,
> >
> > pr_debug3("%s: addr=%#" PRIx64 "\n", __func__,
> > map->unmap_ip(map, addr));
> >
> > - if (addr < sym->start || addr >= sym->end) {
> > + if ((addr < sym->start || addr >= sym->end) &&
> > + (addr != sym->end || sym->start != sym->end)) {
> > pr_debug("%s(%d): ERANGE! sym->name=%s, start=%#"
> > PRIx64 ", addr=%#" PRIx64 ", end=%#" PRIx64 "\n", __func__,
> > __LINE__, sym->name, sym->start, addr, sym->end); return -ERANGE;
>

2016-12-19 18:15:31

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH] perf TUI: Don't throw error for zero length symbols

Em Sat, Dec 17, 2016 at 07:27:54AM +1100, Anton Blanchard escreveu:
> Hi Ravi,
>
> > > perf report (with TUI) exits with error when it finds a sample of
> > > zero length symbol(i.e. addr == sym->start == sym->end). Actually
> > > these are valid samples. Don't exit TUI and show report with such
> > > symbols.
> > >
> > > Link: https://lkml.org/lkml/2016/10/8/189
>
> You can add:
>
> Tested-by: Anton Blanchard <[email protected]>
>
> Also, since this issue makes perf report pretty much useless on
> ppc64, can we mark it for stable@, at least to get it into 4.9 where
> the ppc64 kernel changes that triggered this appeared?

Ok, added it, will be in the next pull req to Ingo.

- Arnaldo

> Anton
>
> > > Reported-by: Anton Blanchard <[email protected]>
> > > Signed-off-by: Ravi Bangoria <[email protected]>
> > > ---
> > > tools/perf/util/annotate.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> > > index aeb5a44..430d039 100644
> > > --- a/tools/perf/util/annotate.c
> > > +++ b/tools/perf/util/annotate.c
> > > @@ -593,7 +593,8 @@ static int __symbol__inc_addr_samples(struct
> > > symbol *sym, struct map *map,
> > >
> > > pr_debug3("%s: addr=%#" PRIx64 "\n", __func__,
> > > map->unmap_ip(map, addr));
> > >
> > > - if (addr < sym->start || addr >= sym->end) {
> > > + if ((addr < sym->start || addr >= sym->end) &&
> > > + (addr != sym->end || sym->start != sym->end)) {
> > > pr_debug("%s(%d): ERANGE! sym->name=%s, start=%#"
> > > PRIx64 ", addr=%#" PRIx64 ", end=%#" PRIx64 "\n", __func__,
> > > __LINE__, sym->name, sym->start, addr, sym->end); return -ERANGE;
> >

Subject: [tip:perf/urgent] perf annotate: Don't throw error for zero length symbols

Commit-ID: edee44be59190bf22d5c6e521f3852b7ff16862f
Gitweb: http://git.kernel.org/tip/edee44be59190bf22d5c6e521f3852b7ff16862f
Author: Ravi Bangoria <[email protected]>
AuthorDate: Tue, 22 Nov 2016 14:10:50 +0530
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Tue, 20 Dec 2016 12:00:32 -0300

perf annotate: Don't throw error for zero length symbols

'perf report --tui' exits with error when it finds a sample of zero
length symbol (i.e. addr == sym->start == sym->end). Actually these are
valid samples. Don't exit TUI and show report with such symbols.

Reported-and-Tested-by: Anton Blanchard <[email protected]>
Link: https://lkml.org/lkml/2016/10/8/189
Signed-off-by: Ravi Bangoria <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Chris Riyder <[email protected]>
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected] # v4.9+
Link: http://lkml.kernel.org/r/1479804050-5028-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/annotate.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index c81a395..06cc04e 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -647,7 +647,8 @@ static int __symbol__inc_addr_samples(struct symbol *sym, struct map *map,

pr_debug3("%s: addr=%#" PRIx64 "\n", __func__, map->unmap_ip(map, addr));

- if (addr < sym->start || addr >= sym->end) {
+ if ((addr < sym->start || addr >= sym->end) &&
+ (addr != sym->end || sym->start != sym->end)) {
pr_debug("%s(%d): ERANGE! sym->name=%s, start=%#" PRIx64 ", addr=%#" PRIx64 ", end=%#" PRIx64 "\n",
__func__, __LINE__, sym->name, sym->start, addr, sym->end);
return -ERANGE;