2023-08-01 22:10:05

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 1/1] Revert "perf report: Append inlines to non-DWARF callchains"

Hi Artem,

Can you please double check this? I reproduced with:

git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
build it
perf record -a -g sleep 5s
perf report

Do you get the same slowness and then reverting it, i.e. just
going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
without the inlines in the callchains?

- Arnaldo

----

This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.

The tests were made with a specific workload, further tests on a
recently updated fedora 38 system with a system wide perf.data file
shows 'perf report' taking excessive time, so lets revert this until a
full investigation and improvement on the addr2line support code is
made.

Cc: Andrii Nakryiko <[email protected]>
Cc: Artem Savkov <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/machine.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -45,7 +45,6 @@

static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
struct thread *th, bool lock);
-static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);

static struct dso *machine__kernel_dso(struct machine *machine)
{
@@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
ms.maps = maps__get(al.maps);
ms.map = map__get(al.map);
ms.sym = al.sym;
-
- if (!branch && append_inlines(cursor, &ms, ip) == 0)
- goto out;
-
srcline = callchain_srcline(&ms, al.addr);
err = callchain_cursor_append(cursor, ip, &ms,
branch, flags, nr_loop_iter,
--
2.41.0



2023-08-02 08:39:35

by Artem Savkov

[permalink] [raw]
Subject: Re: [PATCH 1/1] Revert "perf report: Append inlines to non-DWARF callchains"

Hi Arnaldo,

On Tue, Aug 01, 2023 at 06:42:47PM -0300, Arnaldo Carvalho de Melo wrote:
> Hi Artem,
>
> Can you please double check this? I reproduced with:
>
> git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
> build it
> perf record -a -g sleep 5s
> perf report
>
> Do you get the same slowness and then reverting it, i.e. just
> going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
> without the inlines in the callchains?

With a simple test like this I definitely get a slowdown, but not sure
if it can be called excessive.

Below are the times I got by running 'time perf report' and hitting 'q'
during load so that it quits as soon as it is loads up. Tested on a
freshly updated fedora 38.

For 'perf record -a -g sleep 60' (Event count (approx.): 774055090):

with inlines:
$ time ./perf report

real 0m1.477s
user 0m1.324s
sys 0m0.147s

without inlines:
$ time ./perf report

real 0m1.349s
user 0m1.232s
sys 0m0.111s

For 'perf record -a -g sleep 5' (Event count (approx.): 90179399):

with inlines:
$ time ./perf report

real 0m0.657s
user 0m0.555s
sys 0m0.099s

without inlines:
$ time ./perf report

real 0m0.559s
user 0m0.498s
sys 0m0.060s


> - Arnaldo
>
> ----
>
> This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
>
> The tests were made with a specific workload, further tests on a
> recently updated fedora 38 system with a system wide perf.data file
> shows 'perf report' taking excessive time, so lets revert this until a
> full investigation and improvement on the addr2line support code is
> made.
>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Artem Savkov <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Adrian Hunter <[email protected]>
> Cc: Alexander Shishkin <[email protected]>
> Cc: Ian Rogers <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Cc: Milian Wolff <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> ---
> tools/perf/util/machine.c | 5 -----
> 1 file changed, 5 deletions(-)
>
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -45,7 +45,6 @@
>
> static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
> struct thread *th, bool lock);
> -static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
>
> static struct dso *machine__kernel_dso(struct machine *machine)
> {
> @@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
> ms.maps = maps__get(al.maps);
> ms.map = map__get(al.map);
> ms.sym = al.sym;
> -
> - if (!branch && append_inlines(cursor, &ms, ip) == 0)
> - goto out;
> -
> srcline = callchain_srcline(&ms, al.addr);
> err = callchain_cursor_append(cursor, ip, &ms,
> branch, flags, nr_loop_iter,
> --
> 2.41.0
>

--
Artem


2023-08-07 13:08:56

by Artem Savkov

[permalink] [raw]
Subject: Re: [PATCH 1/1] Revert "perf report: Append inlines to non-DWARF callchains"

On Wed, Aug 02, 2023 at 09:43:40AM +0200, Artem Savkov wrote:
> Hi Arnaldo,
>
> On Tue, Aug 01, 2023 at 06:42:47PM -0300, Arnaldo Carvalho de Melo wrote:
> > Hi Artem,
> >
> > Can you please double check this? I reproduced with:
> >
> > git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
> > build it
> > perf record -a -g sleep 5s
> > perf report
> >
> > Do you get the same slowness and then reverting it, i.e. just
> > going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
> > without the inlines in the callchains?
>
> With a simple test like this I definitely get a slowdown, but not sure
> if it can be called excessive.
>
> Below are the times I got by running 'time perf report' and hitting 'q'
> during load so that it quits as soon as it is loads up. Tested on a
> freshly updated fedora 38.

My bad, I had wrong debuginfo installed for the kernel I tested. I can
reproduce it with the correct one. Looks like vmlinux is just too much
for addr2line. Maybe we can skip it but leave other inlines in, like so:

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 11de3ca8d4fa7..fef309cd401f7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2388,7 +2388,9 @@ static int add_callchain_ip(struct thread *thread,
ms.map = map__get(al.map);
ms.sym = al.sym;

- if (!branch && append_inlines(cursor, &ms, ip) == 0)
+ if (!branch && ms.map && ms.map->dso &&
+ strcmp(ms.map->dso->short_name, "[kernel.vmlinux]") &&
+ append_inlines(cursor, &ms, ip) == 0)
goto out;

srcline = callchain_srcline(&ms, al.addr);

> > - Arnaldo
> >
> > ----
> >
> > This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
> >
> > The tests were made with a specific workload, further tests on a
> > recently updated fedora 38 system with a system wide perf.data file
> > shows 'perf report' taking excessive time, so lets revert this until a
> > full investigation and improvement on the addr2line support code is
> > made.
> >
> > Cc: Andrii Nakryiko <[email protected]>
> > Cc: Artem Savkov <[email protected]>
> > Cc: Namhyung Kim <[email protected]>
> > Cc: Adrian Hunter <[email protected]>
> > Cc: Alexander Shishkin <[email protected]>
> > Cc: Ian Rogers <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Jiri Olsa <[email protected]>
> > Cc: Mark Rutland <[email protected]>
> > Cc: Masami Hiramatsu <[email protected]>
> > Cc: Milian Wolff <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> > ---
> > tools/perf/util/machine.c | 5 -----
> > 1 file changed, 5 deletions(-)
> >
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -45,7 +45,6 @@
> >
> > static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
> > struct thread *th, bool lock);
> > -static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
> >
> > static struct dso *machine__kernel_dso(struct machine *machine)
> > {
> > @@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
> > ms.maps = maps__get(al.maps);
> > ms.map = map__get(al.map);
> > ms.sym = al.sym;
> > -
> > - if (!branch && append_inlines(cursor, &ms, ip) == 0)
> > - goto out;
> > -
> > srcline = callchain_srcline(&ms, al.addr);
> > err = callchain_cursor_append(cursor, ip, &ms,
> > branch, flags, nr_loop_iter,
> > --
> > 2.41.0
> >
>
> --
> Artem

--
Artem


2023-08-07 14:20:52

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 1/1] Revert "perf report: Append inlines to non-DWARF callchains"

Em Mon, Aug 07, 2023 at 01:00:08PM +0200, Artem Savkov escreveu:
> On Wed, Aug 02, 2023 at 09:43:40AM +0200, Artem Savkov wrote:
> > Hi Arnaldo,
> >
> > On Tue, Aug 01, 2023 at 06:42:47PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Hi Artem,
> > >
> > > Can you please double check this? I reproduced with:
> > >
> > > git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
> > > build it
> > > perf record -a -g sleep 5s
> > > perf report
> > >
> > > Do you get the same slowness and then reverting it, i.e. just
> > > going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
> > > without the inlines in the callchains?
> >
> > With a simple test like this I definitely get a slowdown, but not sure
> > if it can be called excessive.
> >
> > Below are the times I got by running 'time perf report' and hitting 'q'
> > during load so that it quits as soon as it is loads up. Tested on a
> > freshly updated fedora 38.
>
> My bad, I had wrong debuginfo installed for the kernel I tested. I can
> reproduce it with the correct one. Looks like vmlinux is just too much
> for addr2line. Maybe we can skip it but leave other inlines in, like so:

That is a possibilit, and probably we could make it cheaper by looking
at the cpumode, avoiding calling addr2line when we didn't makage to
resolve the symbol, etc.

We also may want to have this as an option that has to be explicitely
enabled, like --resolve-inlines, as this will add overhead no matter if
we stop calling addr2line and do it more efficiently, etc.

Fact is, we're late in the 6.5 schedule, so the best thing now is to
just revert the patch and then try again later, ok?

- Arnaldo

> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 11de3ca8d4fa7..fef309cd401f7 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2388,7 +2388,9 @@ static int add_callchain_ip(struct thread *thread,
> ms.map = map__get(al.map);
> ms.sym = al.sym;
>
> - if (!branch && append_inlines(cursor, &ms, ip) == 0)
> + if (!branch && ms.map && ms.map->dso &&
> + strcmp(ms.map->dso->short_name, "[kernel.vmlinux]") &&
> + append_inlines(cursor, &ms, ip) == 0)
> goto out;
>
> srcline = callchain_srcline(&ms, al.addr);
>
> > > - Arnaldo
> > >
> > > ----
> > >
> > > This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
> > >
> > > The tests were made with a specific workload, further tests on a
> > > recently updated fedora 38 system with a system wide perf.data file
> > > shows 'perf report' taking excessive time, so lets revert this until a
> > > full investigation and improvement on the addr2line support code is
> > > made.
> > >
> > > Cc: Andrii Nakryiko <[email protected]>
> > > Cc: Artem Savkov <[email protected]>
> > > Cc: Namhyung Kim <[email protected]>
> > > Cc: Adrian Hunter <[email protected]>
> > > Cc: Alexander Shishkin <[email protected]>
> > > Cc: Ian Rogers <[email protected]>
> > > Cc: Ingo Molnar <[email protected]>
> > > Cc: Jiri Olsa <[email protected]>
> > > Cc: Mark Rutland <[email protected]>
> > > Cc: Masami Hiramatsu <[email protected]>
> > > Cc: Milian Wolff <[email protected]>
> > > Cc: Peter Zijlstra <[email protected]>
> > > Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> > > ---
> > > tools/perf/util/machine.c | 5 -----
> > > 1 file changed, 5 deletions(-)
> > >
> > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
> > > --- a/tools/perf/util/machine.c
> > > +++ b/tools/perf/util/machine.c
> > > @@ -45,7 +45,6 @@
> > >
> > > static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
> > > struct thread *th, bool lock);
> > > -static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
> > >
> > > static struct dso *machine__kernel_dso(struct machine *machine)
> > > {
> > > @@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
> > > ms.maps = maps__get(al.maps);
> > > ms.map = map__get(al.map);
> > > ms.sym = al.sym;
> > > -
> > > - if (!branch && append_inlines(cursor, &ms, ip) == 0)
> > > - goto out;
> > > -
> > > srcline = callchain_srcline(&ms, al.addr);
> > > err = callchain_cursor_append(cursor, ip, &ms,
> > > branch, flags, nr_loop_iter,
> > > --
> > > 2.41.0
> > >
> >
> > --
> > Artem
>
> --
> Artem
>

--

- Arnaldo

2023-08-07 16:15:04

by Artem Savkov

[permalink] [raw]
Subject: Re: [PATCH 1/1] Revert "perf report: Append inlines to non-DWARF callchains"

On Mon, Aug 07, 2023 at 10:34:44AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Aug 07, 2023 at 01:00:08PM +0200, Artem Savkov escreveu:
> > On Wed, Aug 02, 2023 at 09:43:40AM +0200, Artem Savkov wrote:
> > > Hi Arnaldo,
> > >
> > > On Tue, Aug 01, 2023 at 06:42:47PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > Hi Artem,
> > > >
> > > > Can you please double check this? I reproduced with:
> > > >
> > > > git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
> > > > build it
> > > > perf record -a -g sleep 5s
> > > > perf report
> > > >
> > > > Do you get the same slowness and then reverting it, i.e. just
> > > > going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
> > > > without the inlines in the callchains?
> > >
> > > With a simple test like this I definitely get a slowdown, but not sure
> > > if it can be called excessive.
> > >
> > > Below are the times I got by running 'time perf report' and hitting 'q'
> > > during load so that it quits as soon as it is loads up. Tested on a
> > > freshly updated fedora 38.
> >
> > My bad, I had wrong debuginfo installed for the kernel I tested. I can
> > reproduce it with the correct one. Looks like vmlinux is just too much
> > for addr2line. Maybe we can skip it but leave other inlines in, like so:
>
> That is a possibilit, and probably we could make it cheaper by looking
> at the cpumode, avoiding calling addr2line when we didn't makage to
> resolve the symbol, etc.
>
> We also may want to have this as an option that has to be explicitely
> enabled, like --resolve-inlines, as this will add overhead no matter if
> we stop calling addr2line and do it more efficiently, etc.

Sounds good, I'll look into it.

> Fact is, we're late in the 6.5 schedule, so the best thing now is to
> just revert the patch and then try again later, ok?

Yes, sure.

> - Arnaldo
>
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 11de3ca8d4fa7..fef309cd401f7 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -2388,7 +2388,9 @@ static int add_callchain_ip(struct thread *thread,
> > ms.map = map__get(al.map);
> > ms.sym = al.sym;
> >
> > - if (!branch && append_inlines(cursor, &ms, ip) == 0)
> > + if (!branch && ms.map && ms.map->dso &&
> > + strcmp(ms.map->dso->short_name, "[kernel.vmlinux]") &&
> > + append_inlines(cursor, &ms, ip) == 0)
> > goto out;
> >
> > srcline = callchain_srcline(&ms, al.addr);
> >
> > > > - Arnaldo
> > > >
> > > > ----
> > > >
> > > > This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
> > > >
> > > > The tests were made with a specific workload, further tests on a
> > > > recently updated fedora 38 system with a system wide perf.data file
> > > > shows 'perf report' taking excessive time, so lets revert this until a
> > > > full investigation and improvement on the addr2line support code is
> > > > made.
> > > >
> > > > Cc: Andrii Nakryiko <[email protected]>
> > > > Cc: Artem Savkov <[email protected]>
> > > > Cc: Namhyung Kim <[email protected]>
> > > > Cc: Adrian Hunter <[email protected]>
> > > > Cc: Alexander Shishkin <[email protected]>
> > > > Cc: Ian Rogers <[email protected]>
> > > > Cc: Ingo Molnar <[email protected]>
> > > > Cc: Jiri Olsa <[email protected]>
> > > > Cc: Mark Rutland <[email protected]>
> > > > Cc: Masami Hiramatsu <[email protected]>
> > > > Cc: Milian Wolff <[email protected]>
> > > > Cc: Peter Zijlstra <[email protected]>
> > > > Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> > > > ---
> > > > tools/perf/util/machine.c | 5 -----
> > > > 1 file changed, 5 deletions(-)
> > > >
> > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > > index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
> > > > --- a/tools/perf/util/machine.c
> > > > +++ b/tools/perf/util/machine.c
> > > > @@ -45,7 +45,6 @@
> > > >
> > > > static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
> > > > struct thread *th, bool lock);
> > > > -static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
> > > >
> > > > static struct dso *machine__kernel_dso(struct machine *machine)
> > > > {
> > > > @@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
> > > > ms.maps = maps__get(al.maps);
> > > > ms.map = map__get(al.map);
> > > > ms.sym = al.sym;
> > > > -
> > > > - if (!branch && append_inlines(cursor, &ms, ip) == 0)
> > > > - goto out;
> > > > -
> > > > srcline = callchain_srcline(&ms, al.addr);
> > > > err = callchain_cursor_append(cursor, ip, &ms,
> > > > branch, flags, nr_loop_iter,
> > > > --
> > > > 2.41.0
> > > >
> > >
> > > --
> > > Artem
> >
> > --
> > Artem
> >
>
> --
>
> - Arnaldo
>

--
Artem


2023-08-08 17:50:25

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: [PATCH 1/1] Revert "perf report: Append inlines to non-DWARF callchains"



On 07/08/2023 16.03, Artem Savkov wrote:
> On Mon, Aug 07, 2023 at 10:34:44AM -0300, Arnaldo Carvalho de Melo wrote:
>> Em Mon, Aug 07, 2023 at 01:00:08PM +0200, Artem Savkov escreveu:
>>> On Wed, Aug 02, 2023 at 09:43:40AM +0200, Artem Savkov wrote:
>>>> Hi Arnaldo,
>>>>
>>>> On Tue, Aug 01, 2023 at 06:42:47PM -0300, Arnaldo Carvalho de Melo wrote:
>>>>> Hi Artem,
>>>>>
>>>>> Can you please double check this? I reproduced with:
>>>>>
>>>>> git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
>>>>> build it
>>>>> perf record -a -g sleep 5s
>>>>> perf report
>>>>>
>>>>> Do you get the same slowness and then reverting it, i.e. just
>>>>> going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
>>>>> without the inlines in the callchains?
>>>>
>>>> With a simple test like this I definitely get a slowdown, but not sure
>>>> if it can be called excessive.
>>>>
>>>> Below are the times I got by running 'time perf report' and hitting 'q'
>>>> during load so that it quits as soon as it is loads up. Tested on a
>>>> freshly updated fedora 38.
>>>

I reported this problem to ACME. It is also possible to reproduce
without hitting 'q' via using this cmdline with --stdio like this:

time perf report -v --stdio > /dev/null 2> debug01.stderr

The file 'debug01.stderr' contained a lot of addr2line output, that
might help debug the issue further.



>>> My bad, I had wrong debuginfo installed for the kernel I tested. I can
>>> reproduce it with the correct one. Looks like vmlinux is just too much
>>> for addr2line. Maybe we can skip it but leave other inlines in, like so:
>>
>> That is a possibilit, and probably we could make it cheaper by looking
>> at the cpumode, avoiding calling addr2line when we didn't makage to
>> resolve the symbol, etc.
>>
>> We also may want to have this as an option that has to be explicitely
>> enabled, like --resolve-inlines, as this will add overhead no matter if
>> we stop calling addr2line and do it more efficiently, etc.
>
> Sounds good, I'll look into it.
>
>> Fact is, we're late in the 6.5 schedule, so the best thing now is to
>> just revert the patch and then try again later, ok?
>
> Yes, sure.
>
>> - Arnaldo
>>
>>> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>>> index 11de3ca8d4fa7..fef309cd401f7 100644
>>> --- a/tools/perf/util/machine.c
>>> +++ b/tools/perf/util/machine.c
>>> @@ -2388,7 +2388,9 @@ static int add_callchain_ip(struct thread *thread,
>>> ms.map = map__get(al.map);
>>> ms.sym = al.sym;
>>>
>>> - if (!branch && append_inlines(cursor, &ms, ip) == 0)
>>> + if (!branch && ms.map && ms.map->dso &&
>>> + strcmp(ms.map->dso->short_name, "[kernel.vmlinux]") &&
>>> + append_inlines(cursor, &ms, ip) == 0)
>>> goto out;
>>>
>>> srcline = callchain_srcline(&ms, al.addr);
>>>
>>>>> - Arnaldo
>>>>>
>>>>> ----
>>>>>
>>>>> This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
>>>>>
>>>>> The tests were made with a specific workload, further tests on a
>>>>> recently updated fedora 38 system with a system wide perf.data file
>>>>> shows 'perf report' taking excessive time, so lets revert this until a
>>>>> full investigation and improvement on the addr2line support code is
>>>>> made.
>>>>>

Reported-by: Jesper Dangaard Brouer <[email protected]>

>>>>> Cc: Andrii Nakryiko <[email protected]>
>>>>> Cc: Artem Savkov <[email protected]>
>>>>> Cc: Namhyung Kim <[email protected]>
>>>>> Cc: Adrian Hunter <[email protected]>
>>>>> Cc: Alexander Shishkin <[email protected]>
>>>>> Cc: Ian Rogers <[email protected]>
>>>>> Cc: Ingo Molnar <[email protected]>
>>>>> Cc: Jiri Olsa <[email protected]>
>>>>> Cc: Mark Rutland <[email protected]>
>>>>> Cc: Masami Hiramatsu <[email protected]>
>>>>> Cc: Milian Wolff <[email protected]>
>>>>> Cc: Peter Zijlstra <[email protected]>
>>>>> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
>>>>> ---
>>>>> tools/perf/util/machine.c | 5 -----
>>>>> 1 file changed, 5 deletions(-)
>>>>>
>>>>> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>>>>> index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
>>>>> --- a/tools/perf/util/machine.c
>>>>> +++ b/tools/perf/util/machine.c
>>>>> @@ -45,7 +45,6 @@
>>>>>
>>>>> static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
>>>>> struct thread *th, bool lock);
>>>>> -static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
>>>>>
>>>>> static struct dso *machine__kernel_dso(struct machine *machine)
>>>>> {
>>>>> @@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
>>>>> ms.maps = maps__get(al.maps);
>>>>> ms.map = map__get(al.map);
>>>>> ms.sym = al.sym;
>>>>> -
>>>>> - if (!branch && append_inlines(cursor, &ms, ip) == 0)
>>>>> - goto out;
>>>>> -
>>>>> srcline = callchain_srcline(&ms, al.addr);
>>>>> err = callchain_cursor_append(cursor, ip, &ms,
>>>>> branch, flags, nr_loop_iter,

Tested-by: Jesper Dangaard Brouer <[email protected]>

--Jesper

2023-08-08 18:55:07

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 1/1] Revert "perf report: Append inlines to non-DWARF callchains"

Em Tue, Aug 08, 2023 at 03:38:37PM +0200, Jesper Dangaard Brouer escreveu:
>
>
> On 07/08/2023 16.03, Artem Savkov wrote:
> > On Mon, Aug 07, 2023 at 10:34:44AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Aug 07, 2023 at 01:00:08PM +0200, Artem Savkov escreveu:
> > > > On Wed, Aug 02, 2023 at 09:43:40AM +0200, Artem Savkov wrote:
> > > > > Hi Arnaldo,
> > > > >
> > > > > On Tue, Aug 01, 2023 at 06:42:47PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > > Hi Artem,
> > > > > >
> > > > > > Can you please double check this? I reproduced with:
> > > > > >
> > > > > > git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
> > > > > > build it
> > > > > > perf record -a -g sleep 5s
> > > > > > perf report
> > > > > >
> > > > > > Do you get the same slowness and then reverting it, i.e. just
> > > > > > going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
> > > > > > without the inlines in the callchains?
> > > > >
> > > > > With a simple test like this I definitely get a slowdown, but not sure
> > > > > if it can be called excessive.
> > > > >
> > > > > Below are the times I got by running 'time perf report' and hitting 'q'
> > > > > during load so that it quits as soon as it is loads up. Tested on a
> > > > > freshly updated fedora 38.
> > > >
>
> I reported this problem to ACME. It is also possible to reproduce
> without hitting 'q' via using this cmdline with --stdio like this:
>
> time perf report -v --stdio > /dev/null 2> debug01.stderr
>
> The file 'debug01.stderr' contained a lot of addr2line output, that
> might help debug the issue further.
>
>
>
> > > > My bad, I had wrong debuginfo installed for the kernel I tested. I can
> > > > reproduce it with the correct one. Looks like vmlinux is just too much
> > > > for addr2line. Maybe we can skip it but leave other inlines in, like so:
> > >
> > > That is a possibilit, and probably we could make it cheaper by looking
> > > at the cpumode, avoiding calling addr2line when we didn't makage to
> > > resolve the symbol, etc.
> > >
> > > We also may want to have this as an option that has to be explicitely
> > > enabled, like --resolve-inlines, as this will add overhead no matter if
> > > we stop calling addr2line and do it more efficiently, etc.
> >
> > Sounds good, I'll look into it.
> >
> > > Fact is, we're late in the 6.5 schedule, so the best thing now is to
> > > just revert the patch and then try again later, ok?
> >
> > Yes, sure.
> >
> > > - Arnaldo
> > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > > index 11de3ca8d4fa7..fef309cd401f7 100644
> > > > --- a/tools/perf/util/machine.c
> > > > +++ b/tools/perf/util/machine.c
> > > > @@ -2388,7 +2388,9 @@ static int add_callchain_ip(struct thread *thread,
> > > > ms.map = map__get(al.map);
> > > > ms.sym = al.sym;
> > > > - if (!branch && append_inlines(cursor, &ms, ip) == 0)
> > > > + if (!branch && ms.map && ms.map->dso &&
> > > > + strcmp(ms.map->dso->short_name, "[kernel.vmlinux]") &&
> > > > + append_inlines(cursor, &ms, ip) == 0)
> > > > goto out;
> > > > srcline = callchain_srcline(&ms, al.addr);
> > > >
> > > > > > - Arnaldo
> > > > > >
> > > > > > ----
> > > > > >
> > > > > > This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
> > > > > >
> > > > > > The tests were made with a specific workload, further tests on a
> > > > > > recently updated fedora 38 system with a system wide perf.data file
> > > > > > shows 'perf report' taking excessive time, so lets revert this until a
> > > > > > full investigation and improvement on the addr2line support code is
> > > > > > made.
> > > > > >
>
> Reported-by: Jesper Dangaard Brouer <[email protected]>

Thanks, I did add that locally, will keep it and add your Tested-by,
thanks a lot!

- Arnaldo

> > > > > > Cc: Andrii Nakryiko <[email protected]>
> > > > > > Cc: Artem Savkov <[email protected]>
> > > > > > Cc: Namhyung Kim <[email protected]>
> > > > > > Cc: Adrian Hunter <[email protected]>
> > > > > > Cc: Alexander Shishkin <[email protected]>
> > > > > > Cc: Ian Rogers <[email protected]>
> > > > > > Cc: Ingo Molnar <[email protected]>
> > > > > > Cc: Jiri Olsa <[email protected]>
> > > > > > Cc: Mark Rutland <[email protected]>
> > > > > > Cc: Masami Hiramatsu <[email protected]>
> > > > > > Cc: Milian Wolff <[email protected]>
> > > > > > Cc: Peter Zijlstra <[email protected]>
> > > > > > Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> > > > > > ---
> > > > > > tools/perf/util/machine.c | 5 -----
> > > > > > 1 file changed, 5 deletions(-)
> > > > > >
> > > > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > > > > index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
> > > > > > --- a/tools/perf/util/machine.c
> > > > > > +++ b/tools/perf/util/machine.c
> > > > > > @@ -45,7 +45,6 @@
> > > > > > static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
> > > > > > struct thread *th, bool lock);
> > > > > > -static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
> > > > > > static struct dso *machine__kernel_dso(struct machine *machine)
> > > > > > {
> > > > > > @@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
> > > > > > ms.maps = maps__get(al.maps);
> > > > > > ms.map = map__get(al.map);
> > > > > > ms.sym = al.sym;
> > > > > > -
> > > > > > - if (!branch && append_inlines(cursor, &ms, ip) == 0)
> > > > > > - goto out;
> > > > > > -
> > > > > > srcline = callchain_srcline(&ms, al.addr);
> > > > > > err = callchain_cursor_append(cursor, ip, &ms,
> > > > > > branch, flags, nr_loop_iter,
>
> Tested-by: Jesper Dangaard Brouer <[email protected]>
>
> --Jesper

--

- Arnaldo