2010-07-22 07:21:01

by Gleb Natapov

[permalink] [raw]
Subject: perf annotate segfaults when source code has goto label that looks like hex number

The script below demonstrate this. The problem is in
hist_entry__parse_objdump_line():

if (*tmp) {
/*
* Parse hexa addresses followed by ':'
*/
line_ip = strtoull(tmp, &tmp2, 16);
if (*tmp2 != ':' || tmp == tmp2)
line_ip = -1;
}

strtoull() returns valid number when it gets line with label and following
test passes too. I can't think of a way to unambiguously distinguish between
label and valid rip. May be running objdump with --prefix-addresses will
help, but it may make other thing unambiguous.

=== script ===
cat > test.c << EOF
int main(int argc, char **argv)
{
int i;

while(1) {
i++;
if (i == 10000000)
goto add;
}
add:
return 0;
}
EOF

gcc -g test.c
perf record ./a.out
perf annotate
--
Gleb.


2010-07-22 14:34:02

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH] Re: perf annotate segfaults when source code has goto label that looks like hex number

Em Thu, Jul 22, 2010 at 10:20:44AM +0300, Gleb Natapov escreveu:
> strtoull() returns valid number when it gets line with label and following
> test passes too. I can't think of a way to unambiguously distinguish between
> label and valid rip. May be running objdump with --prefix-addresses will
> help, but it may make other thing unambiguous.

[root@emilia ~]# objdump --start-address=0x0000000000400474 --stop-address=0x0000000000400496 -dS ./a.out | grep -P ':\t'
400474: 55 push %rbp
400475: 48 89 e5 mov %rsp,%rbp
400478: 89 7d ec mov %edi,-0x14(%rbp)
40047b: 48 89 75 e0 mov %rsi,-0x20(%rbp)
40047f: eb 01 jmp 400482 <main+0xe>
400481: 90 nop
400482: 83 45 fc 01 addl $0x1,-0x4(%rbp)
400486: 81 7d fc 80 96 98 00 cmpl $0x989680,-0x4(%rbp)
40048d: 75 f2 jne 400481 <main+0xd>
40048f: 90 nop
400490: b8 00 00 00 00 mov $0x0,%eax
400495: c9 leaveq
[root@emilia ~]# objdump --start-address=0x0000000000400474
--stop-address=0x0000000000400496 -dS ./a.out | grep ':$'
Disassembly of section .text:
0000000000400474 <main>:
add:
[root@emilia ~]#

Can you try the attached patch?

With it we get:


[root@emilia ~]# perf annotate

------------------------------------------------
Percent | Source code & Disassembly of a.out
------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 0000000000400474 <main>:
: int main(int argc, char **argv)
: {
0.00 : 400474: 55 push %rbp
0.00 : 400475: 48 89 e5 mov %rsp,%rbp
0.00 : 400478: 89 7d ec mov %edi,-0x14(%rbp)
0.00 : 40047b: 48 89 75 e0 mov %rsi,-0x20(%rbp)
0.00 : 40047f: eb 01 jmp 400482 <main+0xe>
:
: while(1) {
: i++;
: if (i == 10000000)
: goto add;
: }
21.05 : 400481: 90 nop
: int main(int argc, char **argv)
: {
: int i;
:
: while(1) {
: i++;
0.00 : 400482: 83 45 fc 01 addl $0x1,-0x4(%rbp)
: if (i == 10000000)
15.79 : 400486: 81 7d fc 80 96 98 00 cmpl $0x989680,-0x4(%rbp)
63.16 : 40048d: 75 f2 jne 400481 <main+0xd>
: goto add;
0.00 : 40048f: 90 nop
: }
: add:
: return 0;
0.00 : 400490: b8 00 00 00 00 mov $0x0,%eax
: }
0.00 : 400495: c9 leaveq


Attachments:
(No filename) (2.90 kB)
annotate_fix.patch (452.00 B)
Download all attachments

2010-07-22 16:39:12

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH] Re: perf annotate segfaults when source code has goto label that looks like hex number

On Thu, Jul 22, 2010 at 11:33:45AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Jul 22, 2010 at 10:20:44AM +0300, Gleb Natapov escreveu:
> > strtoull() returns valid number when it gets line with label and following
> > test passes too. I can't think of a way to unambiguously distinguish between
> > label and valid rip. May be running objdump with --prefix-addresses will
> > help, but it may make other thing unambiguous.
>
> [root@emilia ~]# objdump --start-address=0x0000000000400474 --stop-address=0x0000000000400496 -dS ./a.out | grep -P ':\t'
> 400474: 55 push %rbp
> 400475: 48 89 e5 mov %rsp,%rbp
> 400478: 89 7d ec mov %edi,-0x14(%rbp)
> 40047b: 48 89 75 e0 mov %rsi,-0x20(%rbp)
> 40047f: eb 01 jmp 400482 <main+0xe>
> 400481: 90 nop
> 400482: 83 45 fc 01 addl $0x1,-0x4(%rbp)
> 400486: 81 7d fc 80 96 98 00 cmpl $0x989680,-0x4(%rbp)
> 40048d: 75 f2 jne 400481 <main+0xd>
> 40048f: 90 nop
> 400490: b8 00 00 00 00 mov $0x0,%eax
> 400495: c9 leaveq
> [root@emilia ~]# objdump --start-address=0x0000000000400474
> --stop-address=0x0000000000400496 -dS ./a.out | grep ':$'
> Disassembly of section .text:
> 0000000000400474 <main>:
> add:
> [root@emilia ~]#
>
> Can you try the attached patch?
>
I can, only later. But if I underhand your patch correctly perf will
still crash if there is a comment after the label on the same line.

> With it we get:
>
>
> [root@emilia ~]# perf annotate
>
> ------------------------------------------------
> Percent | Source code & Disassembly of a.out
> ------------------------------------------------
> :
> :
> :
> : Disassembly of section .text:
> :
> : 0000000000400474 <main>:
> : int main(int argc, char **argv)
> : {
> 0.00 : 400474: 55 push %rbp
> 0.00 : 400475: 48 89 e5 mov %rsp,%rbp
> 0.00 : 400478: 89 7d ec mov %edi,-0x14(%rbp)
> 0.00 : 40047b: 48 89 75 e0 mov %rsi,-0x20(%rbp)
> 0.00 : 40047f: eb 01 jmp 400482 <main+0xe>
> :
> : while(1) {
> : i++;
> : if (i == 10000000)
> : goto add;
> : }
> 21.05 : 400481: 90 nop
> : int main(int argc, char **argv)
> : {
> : int i;
> :
> : while(1) {
> : i++;
> 0.00 : 400482: 83 45 fc 01 addl $0x1,-0x4(%rbp)
> : if (i == 10000000)
> 15.79 : 400486: 81 7d fc 80 96 98 00 cmpl $0x989680,-0x4(%rbp)
> 63.16 : 40048d: 75 f2 jne 400481 <main+0xd>
> : goto add;
> 0.00 : 40048f: 90 nop
> : }
> : add:
> : return 0;
> 0.00 : 400490: b8 00 00 00 00 mov $0x0,%eax
> : }
> 0.00 : 400495: c9 leaveq

> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index 699cf81..e3486d5 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -976,7 +976,7 @@ static int hist_entry__parse_objdump_line(struct hist_entry *self, FILE *file,
> * Parse hexa addresses followed by ':'
> */
> line_ip = strtoull(tmp, &tmp2, 16);
> - if (*tmp2 != ':' || tmp == tmp2)
> + if (*tmp2 != ':' || tmp == tmp2 || tmp2[1] == '\0')
> line_ip = -1;
> }
>


--
Gleb.

2010-07-22 16:47:22

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH] Re: perf annotate segfaults when source code has goto label that looks like hex number

Em Thu, Jul 22, 2010 at 07:38:51PM +0300, Gleb Natapov escreveu:
> On Thu, Jul 22, 2010 at 11:33:45AM -0300, Arnaldo Carvalho de Melo wrote:
> > Can you try the attached patch?
> >
> I can, only later. But if I underhand your patch correctly perf will
> still crash if there is a comment after the label on the same line.

Possibly, will try that now.

- Arnaldo

2010-07-22 16:52:32

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH] Re: perf annotate segfaults when source code has goto label that looks like hex number

Em Thu, Jul 22, 2010 at 01:47:11PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, Jul 22, 2010 at 07:38:51PM +0300, Gleb Natapov escreveu:
> > On Thu, Jul 22, 2010 at 11:33:45AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Can you try the attached patch?
> > >
> > I can, only later. But if I underhand your patch correctly perf will
> > still crash if there is a comment after the label on the same line.
>
> Possibly, will try that now.

It could be a comment of play code, like:

[root@emilia ~]# cat test.c
int main(int argc, char **argv)
{
int i;

while(1) {
i++;
if (i == 10000000)
goto add;
}
add: return 0;
}
[root@emilia ~]#

back to the drawing board :-\

- Arnaldo

2010-07-22 17:05:55

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH v2] Re: perf annotate segfaults when source code has goto label that looks like hex number

Em Thu, Jul 22, 2010 at 01:52:22PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, Jul 22, 2010 at 01:47:11PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Thu, Jul 22, 2010 at 07:38:51PM +0300, Gleb Natapov escreveu:
> > > On Thu, Jul 22, 2010 at 11:33:45AM -0300, Arnaldo Carvalho de Melo wrote:
> > > > Can you try the attached patch?
> > > >
> > > I can, only later. But if I underhand your patch correctly perf will
> > > still crash if there is a comment after the label on the same line.
> >
> > Possibly, will try that now.
>
> It could be a comment of play code, like:
>
> [root@emilia ~]# cat test.c
> int main(int argc, char **argv)
> {
> int i;
>
> while(1) {
> i++;
> if (i == 10000000)
> goto add;
> }
> add: return 0;
> }
> [root@emilia ~]#
>
> back to the drawing board :-\

What about this one?

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 699cf81..96e9044 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -976,13 +976,15 @@ static int hist_entry__parse_objdump_line(struct hist_entry *self, FILE *file,
* Parse hexa addresses followed by ':'
*/
line_ip = strtoull(tmp, &tmp2, 16);
- if (*tmp2 != ':' || tmp == tmp2)
+ if (*tmp2 != ':' || tmp == tmp2 || tmp2[1] == '\0')
line_ip = -1;
}

if (line_ip != -1) {
u64 start = map__rip_2objdump(self->ms.map, sym->start);
offset = line_ip - start;
+ if (offset < 0 || (u64)line_ip > sym->end)
+ offset = -1;
}

objdump_line = objdump_line__new(offset, line);

2010-07-22 18:06:00

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2] Re: perf annotate segfaults when source code has goto label that looks like hex number

On Thu, Jul 22, 2010 at 02:05:42PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Jul 22, 2010 at 01:52:22PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Thu, Jul 22, 2010 at 01:47:11PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Thu, Jul 22, 2010 at 07:38:51PM +0300, Gleb Natapov escreveu:
> > > > On Thu, Jul 22, 2010 at 11:33:45AM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > Can you try the attached patch?
> > > > >
> > > > I can, only later. But if I underhand your patch correctly perf will
> > > > still crash if there is a comment after the label on the same line.
> > >
> > > Possibly, will try that now.
> >
> > It could be a comment of play code, like:
> >
> > [root@emilia ~]# cat test.c
> > int main(int argc, char **argv)
> > {
> > int i;
> >
> > while(1) {
> > i++;
> > if (i == 10000000)
> > goto add;
> > }
> > add: return 0;
> > }
> > [root@emilia ~]#
> >
> > back to the drawing board :-\
>
> What about this one?
>
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index 699cf81..96e9044 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -976,13 +976,15 @@ static int hist_entry__parse_objdump_line(struct hist_entry *self, FILE *file,
> * Parse hexa addresses followed by ':'
> */
> line_ip = strtoull(tmp, &tmp2, 16);
> - if (*tmp2 != ':' || tmp == tmp2)
> + if (*tmp2 != ':' || tmp == tmp2 || tmp2[1] == '\0')
> line_ip = -1;
> }
>
> if (line_ip != -1) {
> u64 start = map__rip_2objdump(self->ms.map, sym->start);
> offset = line_ip - start;
> + if (offset < 0 || (u64)line_ip > sym->end)
> + offset = -1;
> }
>
This part is good idea anyway. Even if label will be interpreted as ip
perf at least will not crash. It may miss-report something if check will
accidentally succeed though.

> objdump_line = objdump_line__new(offset, line);

--
Gleb.

2010-07-22 19:11:41

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2] Re: perf annotate segfaults when source code has goto label that looks like hex number

Em Thu, Jul 22, 2010 at 09:05:38PM +0300, Gleb Natapov escreveu:
> On Thu, Jul 22, 2010 at 02:05:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Jul 22, 2010 at 01:52:22PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Thu, Jul 22, 2010 at 01:47:11PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > It could be a comment of play code, like:

> > > while(1) {
> > > if (++i == 10000000)
> > > goto add;
> > > }
> > > add: return 0;

> > What about this one?

> > diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> > @@ -976,13 +976,15 @@ static int hist_entry__parse_objdump_line(struct hist_entry *self, FILE *file,
> > if (line_ip != -1) {
> > u64 start = map__rip_2objdump(self->ms.map, sym->start);
> > offset = line_ip - start;
> > + if (offset < 0 || (u64)line_ip > sym->end)
> > + offset = -1;

> This part is good idea anyway. Even if label will be interpreted as ip
> perf at least will not crash. It may miss-report something if check will
> accidentally succeed though.

Yeah, we can possibly find a label which is a valid hex number and that
falls inside the address range, but with what we have in objdump this
seems to be the best we can have, I'll commit this.

- Arnaldo

2010-07-22 19:16:53

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2] Re: perf annotate segfaults when source code has goto label that looks like hex number

On Thu, Jul 22, 2010 at 04:11:32PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Jul 22, 2010 at 09:05:38PM +0300, Gleb Natapov escreveu:
> > On Thu, Jul 22, 2010 at 02:05:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Thu, Jul 22, 2010 at 01:52:22PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > Em Thu, Jul 22, 2010 at 01:47:11PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > It could be a comment of play code, like:
>
> > > > while(1) {
> > > > if (++i == 10000000)
> > > > goto add;
> > > > }
> > > > add: return 0;
>
> > > What about this one?
>
> > > diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> > > @@ -976,13 +976,15 @@ static int hist_entry__parse_objdump_line(struct hist_entry *self, FILE *file,
> > > if (line_ip != -1) {
> > > u64 start = map__rip_2objdump(self->ms.map, sym->start);
> > > offset = line_ip - start;
> > > + if (offset < 0 || (u64)line_ip > sym->end)
> > > + offset = -1;
>
> > This part is good idea anyway. Even if label will be interpreted as ip
> > perf at least will not crash. It may miss-report something if check will
> > accidentally succeed though.
>
> Yeah, we can possibly find a label which is a valid hex number and that
> falls inside the address range, but with what we have in objdump this
> seems to be the best we can have, I'll commit this.
>
Agree. Definitely better that what we have now. Thanks.

--
Gleb.

2010-07-23 12:11:45

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [tip:perf/urgent] perf annotate: Fix handling of goto labels that are valid hex numbers

Commit-ID: 70a7cb3b39994ff366ff100b46f9dc97b1510c0f
Gitweb: http://git.kernel.org/tip/70a7cb3b39994ff366ff100b46f9dc97b1510c0f
Author: Arnaldo Carvalho de Melo <[email protected]>
AuthorDate: Thu, 22 Jul 2010 14:04:13 -0300
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Thu, 22 Jul 2010 14:04:13 -0300

perf annotate: Fix handling of goto labels that are valid hex numbers

When parsing the objdump disassembly output we can have goto labels that
are valid hex numbers and thus get confused with lines with machine
code.

Handle the common case of a label that has nothing after it and other
cases where there is just source code by validating the resulting "ip".

It is still possible that we find goto labels that are in the function
address range, but only if they are located before the real address we
should be OK.

A change in the objdump output to have a clear marker separating
addresses from the disassembly would come handy, but we would still have
to deal with older versions.

Reported-by: Gleb Natapov <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Gleb Natapov <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/hist.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 699cf81..784ee0b 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -976,13 +976,17 @@ static int hist_entry__parse_objdump_line(struct hist_entry *self, FILE *file,
* Parse hexa addresses followed by ':'
*/
line_ip = strtoull(tmp, &tmp2, 16);
- if (*tmp2 != ':' || tmp == tmp2)
+ if (*tmp2 != ':' || tmp == tmp2 || tmp2[1] == '\0')
line_ip = -1;
}

if (line_ip != -1) {
- u64 start = map__rip_2objdump(self->ms.map, sym->start);
+ u64 start = map__rip_2objdump(self->ms.map, sym->start),
+ end = map__rip_2objdump(self->ms.map, sym->end);
+
offset = line_ip - start;
+ if (offset < 0 || (u64)line_ip > end)
+ offset = -1;
}

objdump_line = objdump_line__new(offset, line);

2010-08-02 09:02:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2] Re: perf annotate segfaults when source code has goto label that looks like hex number

On Thu, 2010-07-22 at 16:11 -0300, Arnaldo Carvalho de Melo wrote:
> > > diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> > > @@ -976,13 +976,15 @@ static int hist_entry__parse_objdump_line(struct hist_entry *self, FILE *file,
> > > if (line_ip != -1) {
> > > u64 start = map__rip_2objdump(self->ms.map, sym->start);
> > > offset = line_ip - start;
> > > + if (offset < 0 || (u64)line_ip > sym->end)
> > > + offset = -1;
>
> > This part is good idea anyway. Even if label will be interpreted as ip
> > perf at least will not crash. It may miss-report something if check will
> > accidentally succeed though.
>
> Yeah, we can possibly find a label which is a valid hex number and that
> falls inside the address range, but with what we have in objdump this
> seems to be the best we can have, I'll commit this.

Wouldn't it be better to re-write perf-annotate to not have wild monkey
sex with objdump and instead 'borrow' some of the objdump code to
generate the output ourselves? That way we don't rely on the output
syntax at all..

2010-08-02 14:52:32

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2] Re: perf annotate segfaults when source code has goto label that looks like hex number

Em Mon, Aug 02, 2010 at 11:02:14AM +0200, Peter Zijlstra escreveu:
> On Thu, 2010-07-22 at 16:11 -0300, Arnaldo Carvalho de Melo wrote:
> > > > diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> > > > @@ -976,13 +976,15 @@ static int hist_entry__parse_objdump_line(struct hist_entry *self, FILE *file,
> > > > if (line_ip != -1) {
> > > > u64 start = map__rip_2objdump(self->ms.map, sym->start);
> > > > offset = line_ip - start;
> > > > + if (offset < 0 || (u64)line_ip > sym->end)
> > > > + offset = -1;
> >
> > > This part is good idea anyway. Even if label will be interpreted as ip
> > > perf at least will not crash. It may miss-report something if check will
> > > accidentally succeed though.
> >
> > Yeah, we can possibly find a label which is a valid hex number and that
> > falls inside the address range, but with what we have in objdump this
> > seems to be the best we can have, I'll commit this.
>
> Wouldn't it be better to re-write perf-annotate to not have wild monkey
> sex with objdump and instead 'borrow' some of the objdump code to
> generate the output ourselves? That way we don't rely on the output
> syntax at all..

Right, the problem is that doing that seems like avoiding regular,
mostly satisfying sex with a known partner to having a month long orgy
blindfold. You can try, I'll pass for now.

- Arnaldo