2015-02-09 17:27:21

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V7 1/1] perf tool:perf diff support for different binaries

From: Kan Liang <[email protected]>

Currently, the perf diff only works with same binaries. That's because
it compares the symbol start address. It doesn't work if the perf.data
comes from different binaries. This patch matches the symbol names.

Actually, perf diff once intended to compare the symbol names.
The commit as below can look for a pair by name.
604c5c92972d (perf diff: Change the default sort order to "dso,symbol")
However, at that time, perf diff used a global list of dsos. That means
the binaries which has same name can only be loaded once. That's a
problem for different binaries compare. For example, we have an old
binary and an updated binary. They very likely have same name and most
of the function, so only dsos from old binary will be loaded. When
processing the data from updated binary, perf still use the symbol
information from old binary. That's wrong.

Then the commit as below used IP to replace symbol name.
9c443dfdd31e ("perf diff: Fix support for all --sort combinations")
>From that time, perf diff starts to compare the symbol address.

The global dsos is discarded from a patch in 2010.
a1645ce12adb ("perf: 'perf kvm' tool for monitoring guest performance
from host")
However, at that time, perf diff already compared by address. So perf
diff cannot work for different binaries as well.

This patch actually rolls back the perf diff to original design. The
document is also changed, so everybody knows the original design is to
compare the symbol names.

Here is an examples.
The only difference between example_v1.c and example_v2.c is the
location of f2 and f3. There is no change in behavior, but the previous
perf diff display the wrong differential profile.

example_v1.c
noinline void f3(void)
{
volatile int i;
for (i = 0; i < 10000;) {

if(i%2)
i++;
else
i++;
}
}

noinline void f2(void)
{
volatile int a = 100, b, c;
for (b = 0; b < 10000; b++)
c = a * b;

}

noinline void f1(void)
{
f2();
f3();
}

int main()
{
int i;
for (i = 0; i < 100000; i++)
f1();
}

example_v2.c
noinline void f2(void)
{
volatile int a = 100, b, c;
for (b = 0; b < 10000; b++)
c = a * b;
}

noinline void f3(void)
{
volatile int i;
for (i = 0; i < 10000;) {
if(i%2)
i++;
else
i++;
}
}

noinline void f1(void)
{
f2();
f3();
}

int main()
{
int i;
for (i = 0; i < 100000; i++)
f1();
}

[lk@localhost perf_diff]$ gcc example_v1.c -o example
[lk@localhost perf_diff]$ perf record -o example_v1.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.813 MB example_v1.data (~35522
samples) ]

[lk@localhost perf_diff]$ gcc example_v2.c -o example
[lk@localhost perf_diff]$ perf record -o example_v2.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.824 MB example_v2.data (~36015
samples) ]

Old perf diff result.
[lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
Event 'cycles'
Baseline Delta Shared Object Symbol
........ ....... ................ ...............................

[kernel.vmlinux] [k] __perf_event_task_sched_out
0.00% [kernel.vmlinux] [k] apic_timer_interrupt
[kernel.vmlinux] [k] idle_cpu
[kernel.vmlinux] [k] intel_pstate_timer_func
[kernel.vmlinux] [k] native_read_msr_safe
0.00% [kernel.vmlinux] [k] native_read_tsc
0.00% [kernel.vmlinux] [k] native_write_msr_safe
[kernel.vmlinux] [k] ntp_tick_length
0.00% [kernel.vmlinux] [k] rb_erase
0.00% [kernel.vmlinux] [k] tick_sched_timer
0.00% [kernel.vmlinux] [k] unmap_single_vma
0.00% [kernel.vmlinux] [k] update_wall_time
0.00% example [.] f1
46.24% example [.] f2
53.71% -7.55% example [.] f3
+53.81% example [.] f3
0.02% example [.] main

New perf diff result.
[lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
[kernel.vmlinux] [k] __perf_event_task_sched_out
0.00% [kernel.vmlinux] [k] apic_timer_interrupt
[kernel.vmlinux] [k] idle_cpu
[kernel.vmlinux] [k] intel_pstate_timer_func
[kernel.vmlinux] [k] native_read_msr_safe
0.00% [kernel.vmlinux] [k] native_read_tsc
0.00% [kernel.vmlinux] [k] native_write_msr_safe
[kernel.vmlinux] [k] ntp_tick_length
0.00% [kernel.vmlinux] [k] rb_erase
0.00% [kernel.vmlinux] [k] tick_sched_timer
0.00% [kernel.vmlinux] [k] unmap_single_vma
0.00% [kernel.vmlinux] [k] update_wall_time
0.00% example [.] f1
46.24% -0.08% example [.] f2
53.71% +0.11% example [.] f3
0.02% example [.] main

Signed-off-by: Kan Liang <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>

---

Changes since V4:
- Seperate from old patch set
- Add an example in changelog
- Update man page

Changes since V5:
- Correct patch style

Changes since V6:
- Update description

tools/perf/Documentation/perf-diff.txt | 5 +++++
tools/perf/util/sort.c | 9 +++++++++
2 files changed, 14 insertions(+)

diff --git a/tools/perf/Documentation/perf-diff.txt b/tools/perf/Documentation/perf-diff.txt
index e463caa..5182661 100644
--- a/tools/perf/Documentation/perf-diff.txt
+++ b/tools/perf/Documentation/perf-diff.txt
@@ -20,6 +20,11 @@ If no parameters are passed it will assume perf.data.old and perf.data.
The differential profile is displayed only for events matching both
specified perf.data files.

+If no parameters are passed the samples will be sorted by dso and symbol.
+As the perf.data files could come from different binaries, the symbols addresses
+could vary. So perf diff is based on the comparison of the files and
+symbols name.
+
OPTIONS
-------
-D::
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 7a39c1e..4593f36 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1463,6 +1463,15 @@ int sort_dimension__add(const char *tok)
sort__has_parent = 1;
} else if (sd->entry == &sort_sym) {
sort__has_sym = 1;
+ /*
+ * perf diff displays the performance difference amongst
+ * two or more perf.data files. Those files could come
+ * from different binaries. So we should not compare
+ * their ips, but the name of symbol.
+ */
+ if (sort__mode == SORT_MODE__DIFF)
+ sd->entry->se_collapse = sort__sym_sort;
+
} else if (sd->entry == &sort_dso) {
sort__has_dso = 1;
}
--
1.7.11.7


2015-02-25 15:14:16

by Liang, Kan

[permalink] [raw]
Subject: RE: [PATCH V7 1/1] perf tool:perf diff support for different binaries

Hi Arnaldo,

Could you please review the patch?
I've already updated the patch description to try to address your concern.
Please let me know if you have any questions.

Thanks,
Kan

>
> From: Kan Liang <[email protected]>
>
> Currently, the perf diff only works with same binaries. That's because it
> compares the symbol start address. It doesn't work if the perf.data comes
> from different binaries. This patch matches the symbol names.
>
> Actually, perf diff once intended to compare the symbol names.
> The commit as below can look for a pair by name.
> 604c5c92972d (perf diff: Change the default sort order to "dso,symbol")
> However, at that time, perf diff used a global list of dsos. That means the
> binaries which has same name can only be loaded once. That's a problem
> for different binaries compare. For example, we have an old binary and an
> updated binary. They very likely have same name and most of the function,
> so only dsos from old binary will be loaded. When processing the data from
> updated binary, perf still use the symbol information from old binary.
> That's wrong.
>
> Then the commit as below used IP to replace symbol name.
> 9c443dfdd31e ("perf diff: Fix support for all --sort combinations") From that
> time, perf diff starts to compare the symbol address.
>
> The global dsos is discarded from a patch in 2010.
> a1645ce12adb ("perf: 'perf kvm' tool for monitoring guest performance
> from host") However, at that time, perf diff already compared by address.
> So perf diff cannot work for different binaries as well.
>
> This patch actually rolls back the perf diff to original design. The document
> is also changed, so everybody knows the original design is to compare the
> symbol names.
>
> Here is an examples.
> The only difference between example_v1.c and example_v2.c is the
> location of f2 and f3. There is no change in behavior, but the previous perf
> diff display the wrong differential profile.
>
> example_v1.c
> noinline void f3(void)
> {
> volatile int i;
> for (i = 0; i < 10000;) {
>
> if(i%2)
> i++;
> else
> i++;
> }
> }
>
> noinline void f2(void)
> {
> volatile int a = 100, b, c;
> for (b = 0; b < 10000; b++)
> c = a * b;
>
> }
>
> noinline void f1(void)
> {
> f2();
> f3();
> }
>
> int main()
> {
> int i;
> for (i = 0; i < 100000; i++)
> f1();
> }
>
> example_v2.c
> noinline void f2(void)
> {
> volatile int a = 100, b, c;
> for (b = 0; b < 10000; b++)
> c = a * b;
> }
>
> noinline void f3(void)
> {
> volatile int i;
> for (i = 0; i < 10000;) {
> if(i%2)
> i++;
> else
> i++;
> }
> }
>
> noinline void f1(void)
> {
> f2();
> f3();
> }
>
> int main()
> {
> int i;
> for (i = 0; i < 100000; i++)
> f1();
> }
>
> [lk@localhost perf_diff]$ gcc example_v1.c -o example [lk@localhost
> perf_diff]$ perf record -o example_v1.data ./example [ perf record:
> Woken up 4 times to write data ] [ perf record: Captured and wrote 0.813
> MB example_v1.data (~35522
> samples) ]
>
> [lk@localhost perf_diff]$ gcc example_v2.c -o example [lk@localhost
> perf_diff]$ perf record -o example_v2.data ./example [ perf record:
> Woken up 4 times to write data ] [ perf record: Captured and wrote 0.824
> MB example_v2.data (~36015
> samples) ]
>
> Old perf diff result.
> [lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
> Event 'cycles'
> Baseline Delta Shared Object Symbol
> ........ ....... ................ ...............................
>
> [kernel.vmlinux] [k] __perf_event_task_sched_out
> 0.00% [kernel.vmlinux] [k] apic_timer_interrupt
> [kernel.vmlinux] [k] idle_cpu
> [kernel.vmlinux] [k] intel_pstate_timer_func
> [kernel.vmlinux] [k] native_read_msr_safe
> 0.00% [kernel.vmlinux] [k] native_read_tsc
> 0.00% [kernel.vmlinux] [k] native_write_msr_safe
> [kernel.vmlinux] [k] ntp_tick_length
> 0.00% [kernel.vmlinux] [k] rb_erase
> 0.00% [kernel.vmlinux] [k] tick_sched_timer
> 0.00% [kernel.vmlinux] [k] unmap_single_vma
> 0.00% [kernel.vmlinux] [k] update_wall_time
> 0.00% example [.] f1
> 46.24% example [.] f2
> 53.71% -7.55% example [.] f3
> +53.81% example [.] f3
> 0.02% example [.] main
>
> New perf diff result.
> [lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
> [kernel.vmlinux] [k] __perf_event_task_sched_out
> 0.00% [kernel.vmlinux] [k] apic_timer_interrupt
> [kernel.vmlinux] [k] idle_cpu
> [kernel.vmlinux] [k] intel_pstate_timer_func
> [kernel.vmlinux] [k] native_read_msr_safe
> 0.00% [kernel.vmlinux] [k] native_read_tsc
> 0.00% [kernel.vmlinux] [k] native_write_msr_safe
> [kernel.vmlinux] [k] ntp_tick_length
> 0.00% [kernel.vmlinux] [k] rb_erase
> 0.00% [kernel.vmlinux] [k] tick_sched_timer
> 0.00% [kernel.vmlinux] [k] unmap_single_vma
> 0.00% [kernel.vmlinux] [k] update_wall_time
> 0.00% example [.] f1
> 46.24% -0.08% example [.] f2
> 53.71% +0.11% example [.] f3
> 0.02% example [.] main
>
> Signed-off-by: Kan Liang <[email protected]>
> Acked-by: Namhyung Kim <[email protected]>
> Acked-by: Jiri Olsa <[email protected]>
>
> ---
>
> Changes since V4:
> - Seperate from old patch set
> - Add an example in changelog
> - Update man page
>
> Changes since V5:
> - Correct patch style
>
> Changes since V6:
> - Update description
>
> tools/perf/Documentation/perf-diff.txt | 5 +++++
> tools/perf/util/sort.c | 9 +++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/tools/perf/Documentation/perf-diff.txt
> b/tools/perf/Documentation/perf-diff.txt
> index e463caa..5182661 100644
> --- a/tools/perf/Documentation/perf-diff.txt
> +++ b/tools/perf/Documentation/perf-diff.txt
> @@ -20,6 +20,11 @@ If no parameters are passed it will assume
> perf.data.old and perf.data.
> The differential profile is displayed only for events matching both
> specified perf.data files.
>
> +If no parameters are passed the samples will be sorted by dso and symbol.
> +As the perf.data files could come from different binaries, the symbols
> +addresses could vary. So perf diff is based on the comparison of the
> +files and symbols name.
> +
> OPTIONS
> -------
> -D::
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index
> 7a39c1e..4593f36 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -1463,6 +1463,15 @@ int sort_dimension__add(const char *tok)
> sort__has_parent = 1;
> } else if (sd->entry == &sort_sym) {
> sort__has_sym = 1;
> + /*
> + * perf diff displays the performance difference
> amongst
> + * two or more perf.data files. Those files could
> come
> + * from different binaries. So we should not
> compare
> + * their ips, but the name of symbol.
> + */
> + if (sort__mode == SORT_MODE__DIFF)
> + sd->entry->se_collapse = sort__sym_sort;
> +
> } else if (sd->entry == &sort_dso) {
> sort__has_dso = 1;
> }
> --
> 1.7.11.7

2015-02-25 15:30:19

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH V7 1/1] perf tool:perf diff support for different binaries

Em Wed, Feb 25, 2015 at 03:14:06PM +0000, Liang, Kan escreveu:
> Hi Arnaldo,
>
> Could you please review the patch?
> I've already updated the patch description to try to address your concern.
> Please let me know if you have any questions.

Just out of time, sorry, will get to it eventually.

- Arnaldo

> Thanks,
> Kan
>
> >
> > From: Kan Liang <[email protected]>
> >
> > Currently, the perf diff only works with same binaries. That's because it
> > compares the symbol start address. It doesn't work if the perf.data comes
> > from different binaries. This patch matches the symbol names.
> >
> > Actually, perf diff once intended to compare the symbol names.
> > The commit as below can look for a pair by name.
> > 604c5c92972d (perf diff: Change the default sort order to "dso,symbol")
> > However, at that time, perf diff used a global list of dsos. That means the
> > binaries which has same name can only be loaded once. That's a problem
> > for different binaries compare. For example, we have an old binary and an
> > updated binary. They very likely have same name and most of the function,
> > so only dsos from old binary will be loaded. When processing the data from
> > updated binary, perf still use the symbol information from old binary.
> > That's wrong.
> >
> > Then the commit as below used IP to replace symbol name.
> > 9c443dfdd31e ("perf diff: Fix support for all --sort combinations") From that
> > time, perf diff starts to compare the symbol address.
> >
> > The global dsos is discarded from a patch in 2010.
> > a1645ce12adb ("perf: 'perf kvm' tool for monitoring guest performance
> > from host") However, at that time, perf diff already compared by address.
> > So perf diff cannot work for different binaries as well.
> >
> > This patch actually rolls back the perf diff to original design. The document
> > is also changed, so everybody knows the original design is to compare the
> > symbol names.
> >
> > Here is an examples.
> > The only difference between example_v1.c and example_v2.c is the
> > location of f2 and f3. There is no change in behavior, but the previous perf
> > diff display the wrong differential profile.
> >
> > example_v1.c
> > noinline void f3(void)
> > {
> > volatile int i;
> > for (i = 0; i < 10000;) {
> >
> > if(i%2)
> > i++;
> > else
> > i++;
> > }
> > }
> >
> > noinline void f2(void)
> > {
> > volatile int a = 100, b, c;
> > for (b = 0; b < 10000; b++)
> > c = a * b;
> >
> > }
> >
> > noinline void f1(void)
> > {
> > f2();
> > f3();
> > }
> >
> > int main()
> > {
> > int i;
> > for (i = 0; i < 100000; i++)
> > f1();
> > }
> >
> > example_v2.c
> > noinline void f2(void)
> > {
> > volatile int a = 100, b, c;
> > for (b = 0; b < 10000; b++)
> > c = a * b;
> > }
> >
> > noinline void f3(void)
> > {
> > volatile int i;
> > for (i = 0; i < 10000;) {
> > if(i%2)
> > i++;
> > else
> > i++;
> > }
> > }
> >
> > noinline void f1(void)
> > {
> > f2();
> > f3();
> > }
> >
> > int main()
> > {
> > int i;
> > for (i = 0; i < 100000; i++)
> > f1();
> > }
> >
> > [lk@localhost perf_diff]$ gcc example_v1.c -o example [lk@localhost
> > perf_diff]$ perf record -o example_v1.data ./example [ perf record:
> > Woken up 4 times to write data ] [ perf record: Captured and wrote 0.813
> > MB example_v1.data (~35522
> > samples) ]
> >
> > [lk@localhost perf_diff]$ gcc example_v2.c -o example [lk@localhost
> > perf_diff]$ perf record -o example_v2.data ./example [ perf record:
> > Woken up 4 times to write data ] [ perf record: Captured and wrote 0.824
> > MB example_v2.data (~36015
> > samples) ]
> >
> > Old perf diff result.
> > [lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
> > Event 'cycles'
> > Baseline Delta Shared Object Symbol
> > ........ ....... ................ ...............................
> >
> > [kernel.vmlinux] [k] __perf_event_task_sched_out
> > 0.00% [kernel.vmlinux] [k] apic_timer_interrupt
> > [kernel.vmlinux] [k] idle_cpu
> > [kernel.vmlinux] [k] intel_pstate_timer_func
> > [kernel.vmlinux] [k] native_read_msr_safe
> > 0.00% [kernel.vmlinux] [k] native_read_tsc
> > 0.00% [kernel.vmlinux] [k] native_write_msr_safe
> > [kernel.vmlinux] [k] ntp_tick_length
> > 0.00% [kernel.vmlinux] [k] rb_erase
> > 0.00% [kernel.vmlinux] [k] tick_sched_timer
> > 0.00% [kernel.vmlinux] [k] unmap_single_vma
> > 0.00% [kernel.vmlinux] [k] update_wall_time
> > 0.00% example [.] f1
> > 46.24% example [.] f2
> > 53.71% -7.55% example [.] f3
> > +53.81% example [.] f3
> > 0.02% example [.] main
> >
> > New perf diff result.
> > [lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
> > [kernel.vmlinux] [k] __perf_event_task_sched_out
> > 0.00% [kernel.vmlinux] [k] apic_timer_interrupt
> > [kernel.vmlinux] [k] idle_cpu
> > [kernel.vmlinux] [k] intel_pstate_timer_func
> > [kernel.vmlinux] [k] native_read_msr_safe
> > 0.00% [kernel.vmlinux] [k] native_read_tsc
> > 0.00% [kernel.vmlinux] [k] native_write_msr_safe
> > [kernel.vmlinux] [k] ntp_tick_length
> > 0.00% [kernel.vmlinux] [k] rb_erase
> > 0.00% [kernel.vmlinux] [k] tick_sched_timer
> > 0.00% [kernel.vmlinux] [k] unmap_single_vma
> > 0.00% [kernel.vmlinux] [k] update_wall_time
> > 0.00% example [.] f1
> > 46.24% -0.08% example [.] f2
> > 53.71% +0.11% example [.] f3
> > 0.02% example [.] main
> >
> > Signed-off-by: Kan Liang <[email protected]>
> > Acked-by: Namhyung Kim <[email protected]>
> > Acked-by: Jiri Olsa <[email protected]>
> >
> > ---
> >
> > Changes since V4:
> > - Seperate from old patch set
> > - Add an example in changelog
> > - Update man page
> >
> > Changes since V5:
> > - Correct patch style
> >
> > Changes since V6:
> > - Update description
> >
> > tools/perf/Documentation/perf-diff.txt | 5 +++++
> > tools/perf/util/sort.c | 9 +++++++++
> > 2 files changed, 14 insertions(+)
> >
> > diff --git a/tools/perf/Documentation/perf-diff.txt
> > b/tools/perf/Documentation/perf-diff.txt
> > index e463caa..5182661 100644
> > --- a/tools/perf/Documentation/perf-diff.txt
> > +++ b/tools/perf/Documentation/perf-diff.txt
> > @@ -20,6 +20,11 @@ If no parameters are passed it will assume
> > perf.data.old and perf.data.
> > The differential profile is displayed only for events matching both
> > specified perf.data files.
> >
> > +If no parameters are passed the samples will be sorted by dso and symbol.
> > +As the perf.data files could come from different binaries, the symbols
> > +addresses could vary. So perf diff is based on the comparison of the
> > +files and symbols name.
> > +
> > OPTIONS
> > -------
> > -D::
> > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index
> > 7a39c1e..4593f36 100644
> > --- a/tools/perf/util/sort.c
> > +++ b/tools/perf/util/sort.c
> > @@ -1463,6 +1463,15 @@ int sort_dimension__add(const char *tok)
> > sort__has_parent = 1;
> > } else if (sd->entry == &sort_sym) {
> > sort__has_sym = 1;
> > + /*
> > + * perf diff displays the performance difference
> > amongst
> > + * two or more perf.data files. Those files could
> > come
> > + * from different binaries. So we should not
> > compare
> > + * their ips, but the name of symbol.
> > + */
> > + if (sort__mode == SORT_MODE__DIFF)
> > + sd->entry->se_collapse = sort__sym_sort;
> > +
> > } else if (sd->entry == &sort_dso) {
> > sort__has_dso = 1;
> > }
> > --
> > 1.7.11.7

2015-02-26 15:17:57

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH V7 1/1] perf tool:perf diff support for different binaries

Em Mon, Feb 09, 2015 at 05:39:44AM +0000, [email protected] escreveu:
> From: Kan Liang <[email protected]>
> Currently, the perf diff only works with same binaries. That's because
> it compares the symbol start address. It doesn't work if the perf.data
> comes from different binaries. This patch matches the symbol names.

> Actually, perf diff once intended to compare the symbol names.
> The commit as below can look for a pair by name.
> 604c5c92972d (perf diff: Change the default sort order to "dso,symbol")
> However, at that time, perf diff used a global list of dsos. That means
> the binaries which has same name can only be loaded once. That's a
> problem for different binaries compare. For example, we have an old
> binary and an updated binary. They very likely have same name and most
> of the function, so only dsos from old binary will be loaded. When
> processing the data from updated binary, perf still use the symbol
> information from old binary. That's wrong.

> Then the commit as below used IP to replace symbol name.
> 9c443dfdd31e ("perf diff: Fix support for all --sort combinations")
> >From that time, perf diff starts to compare the symbol address.

> The global dsos is discarded from a patch in 2010.
> a1645ce12adb ("perf: 'perf kvm' tool for monitoring guest performance
> from host")
> However, at that time, perf diff already compared by address. So perf
> diff cannot work for different binaries as well.

> This patch actually rolls back the perf diff to original design. The
> document is also changed, so everybody knows the original design is to
> compare the symbol names.

> Here is an examples.
> The only difference between example_v1.c and example_v2.c is the
> location of f2 and f3. There is no change in behavior, but the previous
> perf diff display the wrong differential profile.

Thanks for being persistent and addressing the comments.

Having a nice explanation of the problem helps, as my first reaction to
this patch was: "What? This is what this tool is supposed to do, to
compare two versions of a binary, one that is being developed from the
same source, the other with slight modifications, etc", while the
description of the patch made it look as this was a feature that was now
being introduced.

The problem was that over time new features made it stop working for the
initial purpose of the tool, gack!

It would be _really_ nice if you (or someone else :-) ) could get this
patch description and made it a script that would get the two versions
of the source code, build it, then called perf diff and checked that the
output was the expected one, then added it as an entry to 'perf test',
so that we would catch a problem like this quickly if we ever
reintroduce this problem or something else breaks 'perf diff' :-)

Applying the patch, thanks.

- Arnaldo

2015-02-26 20:34:06

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH V7 1/1] perf tool:perf diff support for different binaries

> Having a nice explanation of the problem helps, as my first reaction to
> this patch was: "What? This is what this tool is supposed to do, to
> compare two versions of a binary, one that is being developed from the
> same source, the other with slight modifications, etc", while the
> description of the patch made it look as this was a feature that was now
> being introduced.

AFAIK it was actually to compare multiple runs of the same binary
with different setups: e.g. varying number of threads.

However in the Linux kernel development world it turns out comparing
multiple similar binaries is much more useful.

-Andi
--
[email protected] -- Speaking for myself only

2015-02-26 20:44:33

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH V7 1/1] perf tool:perf diff support for different binaries

Em Thu, Feb 26, 2015 at 12:34:04PM -0800, Andi Kleen escreveu:
> > Having a nice explanation of the problem helps, as my first reaction to
> > this patch was: "What? This is what this tool is supposed to do, to
> > compare two versions of a binary, one that is being developed from the
> > same source, the other with slight modifications, etc", while the
> > description of the patch made it look as this was a feature that was now
> > being introduced.
>
> AFAIK it was actually to compare multiple runs of the same binary
> with different setups: e.g. varying number of threads.
>
> However in the Linux kernel development world it turns out comparing
> multiple similar binaries is much more useful.

Surely I am getting old and forgetting things, but when I wrote it, my
intent was to do:

vi a.c
build it
perf record a
vi a.c # change it
build it
perf record a
perf diff

And see if what I did while vi'ing it matched what I thought it would.

- Arnaldo

2015-02-26 22:49:57

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH V7 1/1] perf tool:perf diff support for different binaries

> Surely I am getting old and forgetting things, but when I wrote it, my
> intent was to do:
>
> vi a.c
> build it
> perf record a
> vi a.c # change it
> build it
> perf record a
> perf diff
>
> And see if what I did while vi'ing it matched what I thought it would.

You're right of course. As you wrote it you know.

-Andi

--
[email protected] -- Speaking for myself only

Subject: [tip:perf/core] perf diff: Support for different binaries

Commit-ID: 94ba462d69efeba2f97111321a9ba1aa8141da57
Gitweb: http://git.kernel.org/tip/94ba462d69efeba2f97111321a9ba1aa8141da57
Author: Kan Liang <[email protected]>
AuthorDate: Mon, 9 Feb 2015 05:39:44 +0000
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Fri, 27 Feb 2015 10:08:38 -0300

perf diff: Support for different binaries

Currently, the perf diff only works with same binaries. That's because
it compares the symbol start address. It doesn't work if the perf.data
comes from different binaries. This patch matches the symbol names.

Actually, perf diff once intended to compare the symbol names. The
commit as below can look for a pair by name.

604c5c92972d (perf diff: Change the default sort order to "dso,symbol")
However, at that time, perf diff used a global list of dsos. That means
the binaries which has same name can only be loaded once. That's a
problem for comparing different binaries.

For example, we have an old binary and an updated binary. They very
likely have same name and most of the functions, so only dsos from old
binary will be loaded. When processing the data from updated binary,
perf still use the symbol information from old binary. That's wrong.

Then the commit as below used IP to replace symbol name.
9c443dfdd31e ("perf diff: Fix support for all --sort combinations")
>From that time, perf diff starts to compare the symbol address.

The global dsos is discarded from a patch in 2010.
a1645ce12adb ("perf: 'perf kvm' tool for monitoring guest performance
from host")
However, at that time, perf diff already compared by address. So perf
diff cannot work for different binaries as well.

This patch actually rolls back the perf diff to original design. The
document is also changed, so everybody knows the original design is to
compare the symbol names.

Here are some examples:

The only difference between example_v1.c and example_v2.c is the
location of f2 and f3. There is no change in behavior, but the previous
perf diff display the wrong differential profile.

example_v1.c
noinline void f3(void)
{
volatile int i;
for (i = 0; i < 10000;) {

if(i%2)
i++;
else
i++;
}
}

noinline void f2(void)
{
volatile int a = 100, b, c;
for (b = 0; b < 10000; b++)
c = a * b;

}

noinline void f1(void)
{
f2();
f3();
}

int main()
{
int i;
for (i = 0; i < 100000; i++)
f1();
}

example_v2.c
noinline void f2(void)
{
volatile int a = 100, b, c;
for (b = 0; b < 10000; b++)
c = a * b;
}

noinline void f3(void)
{
volatile int i;
for (i = 0; i < 10000;) {
if(i%2)
i++;
else
i++;
}
}

noinline void f1(void)
{
f2();
f3();
}

int main()
{
int i;
for (i = 0; i < 100000; i++)
f1();
}

[lk@localhost perf_diff]$ gcc example_v1.c -o example
[lk@localhost perf_diff]$ perf record -o example_v1.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.813 MB example_v1.data (~35522 samples) ]

[lk@localhost perf_diff]$ gcc example_v2.c -o example
[lk@localhost perf_diff]$ perf record -o example_v2.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.824 MB example_v2.data (~36015 samples) ]

Old perf diff result:

[lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
Event 'cycles'
Baseline Delta Shared Object Symbol
........ ....... ................ ...............................

[kernel.vmlinux] [k] __perf_event_task_sched_out
0.00% [kernel.vmlinux] [k] apic_timer_interrupt
[kernel.vmlinux] [k] idle_cpu
[kernel.vmlinux] [k] intel_pstate_timer_func
[kernel.vmlinux] [k] native_read_msr_safe
0.00% [kernel.vmlinux] [k] native_read_tsc
0.00% [kernel.vmlinux] [k] native_write_msr_safe
[kernel.vmlinux] [k] ntp_tick_length
0.00% [kernel.vmlinux] [k] rb_erase
0.00% [kernel.vmlinux] [k] tick_sched_timer
0.00% [kernel.vmlinux] [k] unmap_single_vma
0.00% [kernel.vmlinux] [k] update_wall_time
0.00% example [.] f1
46.24% example [.] f2
53.71% -7.55% example [.] f3
+53.81% example [.] f3
0.02% example [.] main

New perf diff result:

[lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
[kernel.vmlinux] [k] __perf_event_task_sched_out
0.00% [kernel.vmlinux] [k] apic_timer_interrupt
[kernel.vmlinux] [k] idle_cpu
[kernel.vmlinux] [k] intel_pstate_timer_func
[kernel.vmlinux] [k] native_read_msr_safe
0.00% [kernel.vmlinux] [k] native_read_tsc
0.00% [kernel.vmlinux] [k] native_write_msr_safe
[kernel.vmlinux] [k] ntp_tick_length
0.00% [kernel.vmlinux] [k] rb_erase
0.00% [kernel.vmlinux] [k] tick_sched_timer
0.00% [kernel.vmlinux] [k] unmap_single_vma
0.00% [kernel.vmlinux] [k] update_wall_time
0.00% example [.] f1
46.24% -0.08% example [.] f2
53.71% +0.11% example [.] f3
0.02% example [.] main

Signed-off-by: Kan Liang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/Documentation/perf-diff.txt | 5 +++++
tools/perf/util/sort.c | 9 +++++++++
2 files changed, 14 insertions(+)

diff --git a/tools/perf/Documentation/perf-diff.txt b/tools/perf/Documentation/perf-diff.txt
index e463caa..5182661 100644
--- a/tools/perf/Documentation/perf-diff.txt
+++ b/tools/perf/Documentation/perf-diff.txt
@@ -20,6 +20,11 @@ If no parameters are passed it will assume perf.data.old and perf.data.
The differential profile is displayed only for events matching both
specified perf.data files.

+If no parameters are passed the samples will be sorted by dso and symbol.
+As the perf.data files could come from different binaries, the symbols addresses
+could vary. So perf diff is based on the comparison of the files and
+symbols name.
+
OPTIONS
-------
-D::
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 7a39c1e..4593f36 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1463,6 +1463,15 @@ int sort_dimension__add(const char *tok)
sort__has_parent = 1;
} else if (sd->entry == &sort_sym) {
sort__has_sym = 1;
+ /*
+ * perf diff displays the performance difference amongst
+ * two or more perf.data files. Those files could come
+ * from different binaries. So we should not compare
+ * their ips, but the name of symbol.
+ */
+ if (sort__mode == SORT_MODE__DIFF)
+ sd->entry->se_collapse = sort__sym_sort;
+
} else if (sd->entry == &sort_dso) {
sort__has_dso = 1;
}