2021-03-19 12:37:37

by Yang Jihong

[permalink] [raw]
Subject: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

In hist__find_annotations function, since different hist_entry may point to same
symbol, we free notes->src to signal already processed this symbol in stdio mode;
when annotate, entry will skipped if notes->src is NULL to avoid repeated output.

However, there is a problem, for example, run the following command:

# perf record -e branch-misses -e branch-instructions -a sleep 1

perf.data file contains different types of sample event.

If the same IP sample event exists in branch-misses and branch-instructions,
this event uses the same symbol. When annotate branch-misses events, notes->src
corresponding to this event is set to null, as a result, when annotate
branch-instructions events, this event is skipped and no annotate is output.

Solution of this patch is to remove zfree in hists__find_annotations and
change sort order to "dso,symbol" to avoid duplicate output when different
processes correspond to the same symbol.

Signed-off-by: Yang Jihong <[email protected]>
---

Changes since v6:
- Remove separate setup_sorting() for branch mode.

Changes since v5:
- Add Signed-off-by tag.

Changes since v4:
- Use the same sort key "dso,symbol" in branch stack mode.

Changes since v3:
- Modify the first line of comments.

Changes since v2:
- Remove zfree in hists__find_annotations.
- Change sort order to avoid duplicate output.

Changes since v1:
- Change processed flag variable from u8 to bool.

tools/perf/builtin-annotate.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index a23ba6bb99b6..795c2ac7fcd1 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -374,13 +374,6 @@ static void hists__find_annotations(struct hists *hists,
} else {
hist_entry__tty_annotate(he, evsel, ann);
nd = rb_next(nd);
- /*
- * Since we have a hist_entry per IP for the same
- * symbol, free he->ms.sym->src to signal we already
- * processed this symbol.
- */
- zfree(&notes->src->cycles_hist);
- zfree(&notes->src);
}
}
}
@@ -619,14 +612,22 @@ int cmd_annotate(int argc, const char **argv)

setup_browser(true);

- if ((use_browser == 1 || annotate.use_stdio2) && annotate.has_br_stack) {
+ /*
+ * Events of different processes may correspond to the same
+ * symbol, we do not care about the processes in annotate,
+ * set sort order to avoid repeated output.
+ */
+ sort_order = "dso,symbol";
+
+ /*
+ * Set SORT_MODE__BRANCH so that annotate display IPC/Cycle
+ * if branch info is in perf data in TUI mode.
+ */
+ if ((use_browser == 1 || annotate.use_stdio2) && annotate.has_br_stack)
sort__mode = SORT_MODE__BRANCH;
- if (setup_sorting(annotate.session->evlist) < 0)
- usage_with_options(annotate_usage, options);
- } else {
- if (setup_sorting(NULL) < 0)
- usage_with_options(annotate_usage, options);
- }
+
+ if (setup_sorting(NULL) < 0)
+ usage_with_options(annotate_usage, options);

ret = __cmd_annotate(&annotate);

--
2.30.GIT


2021-03-26 02:26:53

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

Hello,
ping :)

On 2021/3/19 20:35, Yang Jihong wrote:
> In hist__find_annotations function, since different hist_entry may point to same
> symbol, we free notes->src to signal already processed this symbol in stdio mode;
> when annotate, entry will skipped if notes->src is NULL to avoid repeated output.
>
> However, there is a problem, for example, run the following command:
>
> # perf record -e branch-misses -e branch-instructions -a sleep 1
>
> perf.data file contains different types of sample event.
>
> If the same IP sample event exists in branch-misses and branch-instructions,
> this event uses the same symbol. When annotate branch-misses events, notes->src
> corresponding to this event is set to null, as a result, when annotate
> branch-instructions events, this event is skipped and no annotate is output.
>
> Solution of this patch is to remove zfree in hists__find_annotations and
> change sort order to "dso,symbol" to avoid duplicate output when different
> processes correspond to the same symbol.
>
> Signed-off-by: Yang Jihong <[email protected]>
> ---
>
> Changes since v6:
> - Remove separate setup_sorting() for branch mode.
>
> Changes since v5:
> - Add Signed-off-by tag.
>
> Changes since v4:
> - Use the same sort key "dso,symbol" in branch stack mode.
>
> Changes since v3:
> - Modify the first line of comments.
>
> Changes since v2:
> - Remove zfree in hists__find_annotations.
> - Change sort order to avoid duplicate output.
>
> Changes since v1:
> - Change processed flag variable from u8 to bool.
>
> tools/perf/builtin-annotate.c | 29 +++++++++++++++--------------
> 1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> index a23ba6bb99b6..795c2ac7fcd1 100644
> --- a/tools/perf/builtin-annotate.c
> +++ b/tools/perf/builtin-annotate.c
> @@ -374,13 +374,6 @@ static void hists__find_annotations(struct hists *hists,
> } else {
> hist_entry__tty_annotate(he, evsel, ann);
> nd = rb_next(nd);
> - /*
> - * Since we have a hist_entry per IP for the same
> - * symbol, free he->ms.sym->src to signal we already
> - * processed this symbol.
> - */
> - zfree(&notes->src->cycles_hist);
> - zfree(&notes->src);
> }
> }
> }
> @@ -619,14 +612,22 @@ int cmd_annotate(int argc, const char **argv)
>
> setup_browser(true);
>
> - if ((use_browser == 1 || annotate.use_stdio2) && annotate.has_br_stack) {
> + /*
> + * Events of different processes may correspond to the same
> + * symbol, we do not care about the processes in annotate,
> + * set sort order to avoid repeated output.
> + */
> + sort_order = "dso,symbol";
> +
> + /*
> + * Set SORT_MODE__BRANCH so that annotate display IPC/Cycle
> + * if branch info is in perf data in TUI mode.
> + */
> + if ((use_browser == 1 || annotate.use_stdio2) && annotate.has_br_stack)
> sort__mode = SORT_MODE__BRANCH;
> - if (setup_sorting(annotate.session->evlist) < 0)
> - usage_with_options(annotate_usage, options);
> - } else {
> - if (setup_sorting(NULL) < 0)
> - usage_with_options(annotate_usage, options);
> - }
> +
> + if (setup_sorting(NULL) < 0)
> + usage_with_options(annotate_usage, options);
>
> ret = __cmd_annotate(&annotate);
>
>

2021-03-26 03:27:29

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

Hello,

On Fri, Mar 26, 2021 at 11:24 AM Yang Jihong <[email protected]> wrote:
>
> Hello,
> ping :)
>
> On 2021/3/19 20:35, Yang Jihong wrote:
> > In hist__find_annotations function, since different hist_entry may point to same
> > symbol, we free notes->src to signal already processed this symbol in stdio mode;
> > when annotate, entry will skipped if notes->src is NULL to avoid repeated output.
> >
> > However, there is a problem, for example, run the following command:
> >
> > # perf record -e branch-misses -e branch-instructions -a sleep 1
> >
> > perf.data file contains different types of sample event.
> >
> > If the same IP sample event exists in branch-misses and branch-instructions,
> > this event uses the same symbol. When annotate branch-misses events, notes->src
> > corresponding to this event is set to null, as a result, when annotate
> > branch-instructions events, this event is skipped and no annotate is output.
> >
> > Solution of this patch is to remove zfree in hists__find_annotations and
> > change sort order to "dso,symbol" to avoid duplicate output when different
> > processes correspond to the same symbol.
> >
> > Signed-off-by: Yang Jihong <[email protected]>

Acked-by: Namhyung Kim <[email protected]>

Thanks,
Namhyung


> > ---
> >
> > Changes since v6:
> > - Remove separate setup_sorting() for branch mode.
> >
> > Changes since v5:
> > - Add Signed-off-by tag.
> >
> > Changes since v4:
> > - Use the same sort key "dso,symbol" in branch stack mode.
> >
> > Changes since v3:
> > - Modify the first line of comments.
> >
> > Changes since v2:
> > - Remove zfree in hists__find_annotations.
> > - Change sort order to avoid duplicate output.
> >
> > Changes since v1:
> > - Change processed flag variable from u8 to bool.
> >
> > tools/perf/builtin-annotate.c | 29 +++++++++++++++--------------
> > 1 file changed, 15 insertions(+), 14 deletions(-)
> >
> > diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> > index a23ba6bb99b6..795c2ac7fcd1 100644
> > --- a/tools/perf/builtin-annotate.c
> > +++ b/tools/perf/builtin-annotate.c
> > @@ -374,13 +374,6 @@ static void hists__find_annotations(struct hists *hists,
> > } else {
> > hist_entry__tty_annotate(he, evsel, ann);
> > nd = rb_next(nd);
> > - /*
> > - * Since we have a hist_entry per IP for the same
> > - * symbol, free he->ms.sym->src to signal we already
> > - * processed this symbol.
> > - */
> > - zfree(&notes->src->cycles_hist);
> > - zfree(&notes->src);
> > }
> > }
> > }
> > @@ -619,14 +612,22 @@ int cmd_annotate(int argc, const char **argv)
> >
> > setup_browser(true);
> >
> > - if ((use_browser == 1 || annotate.use_stdio2) && annotate.has_br_stack) {
> > + /*
> > + * Events of different processes may correspond to the same
> > + * symbol, we do not care about the processes in annotate,
> > + * set sort order to avoid repeated output.
> > + */
> > + sort_order = "dso,symbol";
> > +
> > + /*
> > + * Set SORT_MODE__BRANCH so that annotate display IPC/Cycle
> > + * if branch info is in perf data in TUI mode.
> > + */
> > + if ((use_browser == 1 || annotate.use_stdio2) && annotate.has_br_stack)
> > sort__mode = SORT_MODE__BRANCH;
> > - if (setup_sorting(annotate.session->evlist) < 0)
> > - usage_with_options(annotate_usage, options);
> > - } else {
> > - if (setup_sorting(NULL) < 0)
> > - usage_with_options(annotate_usage, options);
> > - }
> > +
> > + if (setup_sorting(NULL) < 0)
> > + usage_with_options(annotate_usage, options);
> >
> > ret = __cmd_annotate(&annotate);
> >
> >

2021-03-26 12:09:49

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

Em Fri, Mar 26, 2021 at 12:25:37PM +0900, Namhyung Kim escreveu:
> On Fri, Mar 26, 2021 at 11:24 AM Yang Jihong <[email protected]> wrote:
> > On 2021/3/19 20:35, Yang Jihong wrote:
> > > In hist__find_annotations function, since different hist_entry may point to same
> > > symbol, we free notes->src to signal already processed this symbol in stdio mode;
> > > when annotate, entry will skipped if notes->src is NULL to avoid repeated output.
> > >
> > > However, there is a problem, for example, run the following command:
> > >
> > > # perf record -e branch-misses -e branch-instructions -a sleep 1
> > >
> > > perf.data file contains different types of sample event.
> > >
> > > If the same IP sample event exists in branch-misses and branch-instructions,
> > > this event uses the same symbol. When annotate branch-misses events, notes->src
> > > corresponding to this event is set to null, as a result, when annotate
> > > branch-instructions events, this event is skipped and no annotate is output.
> > >
> > > Solution of this patch is to remove zfree in hists__find_annotations and
> > > change sort order to "dso,symbol" to avoid duplicate output when different
> > > processes correspond to the same symbol.

> > > Signed-off-by: Yang Jihong <[email protected]>

> Acked-by: Namhyung Kim <[email protected]>

Without looking at the patch, just at its description of the problem, I
tried to annotate two events in a group, to get the annotate group view
output with both events, and it seems I'm getting samples accounted for
both events:

[root@five ~]# perf record -e '{branch-misses,branch-instructions}' -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.296 MB perf.data (2507 samples) ]
[root@five ~]#
[root@five ~]# perf report | grep -v '^#' | head -5
17.49% 19.19% ThreadPoolForeg chromium-browser [.] v8::internal::ConcurrentMarking::Run
12.17% 17.04% ThreadPoolForeg chromium-browser [.] v8::internal::Sweeper::RawSweep
11.14% 11.63% ThreadPoolForeg chromium-browser [.] v8::internal::MarkingVisitorBase<v8::internal::ConcurrentMarkingVisitor, v8::internal::ConcurrentMarkingState>::ProcessStrongHeapObject<v8::internal::CompressedHeapObjectSlot>
7.65% 7.84% ThreadPoolForeg chromium-browser [.] v8::internal::ConcurrentMarkingVisitor::ShouldVisit
5.66% 6.72% ThreadPoolForeg chromium-browser [.] v8::internal::ConcurrentMarkingVisitor::VisitPointersInSnapshot

[root@five ~]# perf annotate --stdio2 v8::internal::ConcurrentMarking::Run
Samples: 444 of events 'anon group { branch-misses, branch-instructions }', 4000 Hz, Event count (approx.): 596221, [percent: local period]
v8::internal::ConcurrentMarking::Run() /usr/lib64/chromium-browser/chromium-browser
Percent


Disassembly of section .text:

0000000003290b30 <v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool)>:
v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool):
push %rbp
mov %rsp,%rbp
push %r15
push %r14
mov %rdi,%r14
push %r13
mov %edx,%r13d
push %r12
mov %ecx,%r12d
push %rbx
sub $0x1298,%rsp
mov %rsi,-0x1228(%rbp)
mov %fs:0x28,%rax
mov %rax,-0x38(%rbp)
<SNIP>
movzwl 0x2(%rbx),%eax
test %ax,%ax
↓ jne 4a9
mov -0x10e8(%rbp),%rdx
cmpw $0x0,0x2(%rdx)
0.41 0.39 ↓ je 4b90
movq %rbx,%xmm0
movq %rdx,%xmm2
mov %rdx,%rbx
punpcklqdq %xmm2,%xmm0
movups %xmm0,-0x10e8(%rbp)
movzwl 0x2(%rdx),%eax
4a9: sub $0x1,%eax
mov %ax,0x2(%rbx)
0.36 0.91 movzwl %ax,%eax
0.60 0.00 mov 0x10(%rbx,%rax,8),%rax
3.44 2.46 mov %rax,-0x11e0(%rbp)
0.00 0.34 4bf: mov 0x8(%r13),%rax
0.00 0.36 add $0x1,%r15d
0.00 0.34 mov 0x110(%rax),%rax
mov 0x128(%rax),%rcx
0.88 0.36 mov 0x8(%r13),%rax
mov 0x110(%rax),%rdx
mov 0x130(%rdx),%rdx
0.00 0.48 mov 0x140(%rax),%rax
mov 0x110(%rax),%rsi
0.61 0.47 mov -0x11e0(%rbp),%rax
2.01 2.32 sub $0x1,%rax
cmp %rcx,%rax
0.00 0.35 setae %cl
1.31 0.33 cmp %rdx,%rax
setb %dl
0.00 0.24 test %dl,%cl
0.00 0.12 ↓ jne 4b70
cmp %rsi,%rax
↓ je 4b70
mov (%rax),%eax
29.10 29.90 add -0x1220(%rbp),%rax
cmpb $0x0,-0x1218(%rbp)
mov %rax,-0x1210(%rbp)
0.00 0.65 ↓ jne 4fa0
mov -0x11e0(%rbp),%r9
lea 0x6(%rax),%rbx
0.38 0.00 545: movzbl (%rbx),%eax
4.90 5.34 cmp $0x4c,%al
↓ ja 5026
0.58 0.00 550: lea v8::internal::FLAGDEFAULT_abort_on_contradictory_flags+0x457,%rdi
movslq (%rdi,%rax,4),%rax
4.97 3.19 add %rdi,%rax
<SNIP>

If I ask for number of samples:

[root@five ~]# perf config annotate.show_nr_samples=true
[root@five ~]# perf annotate --stdio2 v8::internal::ConcurrentMarking::Run

Samples: 444 of events 'anon group { branch-misses, branch-instructions }', 4000 Hz, Event count (approx.): 596221, [percent: local period]
v8::internal::ConcurrentMarking::Run() /usr/lib64/chromium-browser/chromium-browser
Samples


Disassembly of section .text:

0000000003290b30 <v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool)>:
v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool):
push %rbp
mov %rsp,%rbp
push %r15
push %r14
mov %rdi,%r14
push %r13
mov %edx,%r13d
push %r12
mov %ecx,%r12d
push %rbx
sub $0x1298,%rsp
mov %rsi,-0x1228(%rbp)
mov %fs:0x28,%rax
mov %rax,-0x38(%rbp)
<SNIP>
movzwl 0x2(%rbx),%eax
test %ax,%ax
↓ jne 4a9
mov -0x10e8(%rbp),%rdx
cmpw $0x0,0x2(%rdx)
1 1 ↓ je 4b90
movq %rbx,%xmm0
movq %rdx,%xmm2
mov %rdx,%rbx
punpcklqdq %xmm2,%xmm0
movups %xmm0,-0x10e8(%rbp)
movzwl 0x2(%rdx),%eax
4a9: sub $0x1,%eax
mov %ax,0x2(%rbx)
1 2 movzwl %ax,%eax
1 0 mov 0x10(%rbx,%rax,8),%rax
8 5 mov %rax,-0x11e0(%rbp)
0 1 4bf: mov 0x8(%r13),%rax
0 1 add $0x1,%r15d
0 1 mov 0x110(%rax),%rax
mov 0x128(%rax),%rcx
3 1 mov 0x8(%r13),%rax
mov 0x110(%rax),%rdx
mov 0x130(%rdx),%rdx
0 1 mov 0x140(%rax),%rax
mov 0x110(%rax),%rsi
2 1 mov -0x11e0(%rbp),%rax
6 6 sub $0x1,%rax
cmp %rcx,%rax
0 1 setae %cl
2 1 cmp %rdx,%rax
setb %dl
0 1 test %dl,%cl
0 1 ↓ jne 4b70
cmp %rsi,%rax
↓ je 4b70
mov (%rax),%eax
58 73 add -0x1220(%rbp),%rax
cmpb $0x0,-0x1218(%rbp)
mov %rax,-0x1210(%rbp)
0 1 ↓ jne 4fa0
mov -0x11e0(%rbp),%r9
lea 0x6(%rax),%rbx
1 0 545: movzbl (%rbx),%eax
10 13 cmp $0x4c,%al
↓ ja 5026
1 0 550: lea v8::internal::FLAGDEFAULT_abort_on_contradictory_flags+0x457,%rdi
movslq (%rdi,%rax,4),%rax
9 8 add %rdi,%rax

<SNIP>

So it seems to be working, what am I missing? Is this strictly non
group related?

- Arnaldo

2021-03-27 02:27:01

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

Hello,

On 2021/3/26 20:06, Arnaldo Carvalho de Melo wrote:
> Em Fri, Mar 26, 2021 at 12:25:37PM +0900, Namhyung Kim escreveu:
>> On Fri, Mar 26, 2021 at 11:24 AM Yang Jihong <[email protected]> wrote:
>>> On 2021/3/19 20:35, Yang Jihong wrote:
>>>> In hist__find_annotations function, since different hist_entry may point to same
>>>> symbol, we free notes->src to signal already processed this symbol in stdio mode;
>>>> when annotate, entry will skipped if notes->src is NULL to avoid repeated output.
>>>>
>>>> However, there is a problem, for example, run the following command:
>>>>
>>>> # perf record -e branch-misses -e branch-instructions -a sleep 1
>>>>
>>>> perf.data file contains different types of sample event.
>>>>
>>>> If the same IP sample event exists in branch-misses and branch-instructions,
>>>> this event uses the same symbol. When annotate branch-misses events, notes->src
>>>> corresponding to this event is set to null, as a result, when annotate
>>>> branch-instructions events, this event is skipped and no annotate is output.
>>>>
>>>> Solution of this patch is to remove zfree in hists__find_annotations and
>>>> change sort order to "dso,symbol" to avoid duplicate output when different
>>>> processes correspond to the same symbol.
>
>>>> Signed-off-by: Yang Jihong <[email protected]>
>
>> Acked-by: Namhyung Kim <[email protected]>
>
> Without looking at the patch, just at its description of the problem, I
> tried to annotate two events in a group, to get the annotate group view
> output with both events, and it seems I'm getting samples accounted for
> both events:
>
> [root@five ~]# perf record -e '{branch-misses,branch-instructions}' -a sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 2.296 MB perf.data (2507 samples) ]
> [root@five ~]#
> [root@five ~]# perf report | grep -v '^#' | head -5
> 17.49% 19.19% ThreadPoolForeg chromium-browser [.] v8::internal::ConcurrentMarking::Run
> 12.17% 17.04% ThreadPoolForeg chromium-browser [.] v8::internal::Sweeper::RawSweep
> 11.14% 11.63% ThreadPoolForeg chromium-browser [.] v8::internal::MarkingVisitorBase<v8::internal::ConcurrentMarkingVisitor, v8::internal::ConcurrentMarkingState>::ProcessStrongHeapObject<v8::internal::CompressedHeapObjectSlot>
> 7.65% 7.84% ThreadPoolForeg chromium-browser [.] v8::internal::ConcurrentMarkingVisitor::ShouldVisit
> 5.66% 6.72% ThreadPoolForeg chromium-browser [.] v8::internal::ConcurrentMarkingVisitor::VisitPointersInSnapshot
>
> [root@five ~]# perf annotate --stdio2 v8::internal::ConcurrentMarking::Run
> Samples: 444 of events 'anon group { branch-misses, branch-instructions }', 4000 Hz, Event count (approx.): 596221, [percent: local period]
> v8::internal::ConcurrentMarking::Run() /usr/lib64/chromium-browser/chromium-browser
> Percent
>
>
> Disassembly of section .text:
>
> 0000000003290b30 <v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool)>:
> v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool):
> push %rbp
> mov %rsp,%rbp
> push %r15
> push %r14
> mov %rdi,%r14
> push %r13
> mov %edx,%r13d
> push %r12
> mov %ecx,%r12d
> push %rbx
> sub $0x1298,%rsp
> mov %rsi,-0x1228(%rbp)
> mov %fs:0x28,%rax
> mov %rax,-0x38(%rbp)
> <SNIP>
> movzwl 0x2(%rbx),%eax
> test %ax,%ax
> ↓ jne 4a9
> mov -0x10e8(%rbp),%rdx
> cmpw $0x0,0x2(%rdx)
> 0.41 0.39 ↓ je 4b90
> movq %rbx,%xmm0
> movq %rdx,%xmm2
> mov %rdx,%rbx
> punpcklqdq %xmm2,%xmm0
> movups %xmm0,-0x10e8(%rbp)
> movzwl 0x2(%rdx),%eax
> 4a9: sub $0x1,%eax
> mov %ax,0x2(%rbx)
> 0.36 0.91 movzwl %ax,%eax
> 0.60 0.00 mov 0x10(%rbx,%rax,8),%rax
> 3.44 2.46 mov %rax,-0x11e0(%rbp)
> 0.00 0.34 4bf: mov 0x8(%r13),%rax
> 0.00 0.36 add $0x1,%r15d
> 0.00 0.34 mov 0x110(%rax),%rax
> mov 0x128(%rax),%rcx
> 0.88 0.36 mov 0x8(%r13),%rax
> mov 0x110(%rax),%rdx
> mov 0x130(%rdx),%rdx
> 0.00 0.48 mov 0x140(%rax),%rax
> mov 0x110(%rax),%rsi
> 0.61 0.47 mov -0x11e0(%rbp),%rax
> 2.01 2.32 sub $0x1,%rax
> cmp %rcx,%rax
> 0.00 0.35 setae %cl
> 1.31 0.33 cmp %rdx,%rax
> setb %dl
> 0.00 0.24 test %dl,%cl
> 0.00 0.12 ↓ jne 4b70
> cmp %rsi,%rax
> ↓ je 4b70
> mov (%rax),%eax
> 29.10 29.90 add -0x1220(%rbp),%rax
> cmpb $0x0,-0x1218(%rbp)
> mov %rax,-0x1210(%rbp)
> 0.00 0.65 ↓ jne 4fa0
> mov -0x11e0(%rbp),%r9
> lea 0x6(%rax),%rbx
> 0.38 0.00 545: movzbl (%rbx),%eax
> 4.90 5.34 cmp $0x4c,%al
> ↓ ja 5026
> 0.58 0.00 550: lea v8::internal::FLAGDEFAULT_abort_on_contradictory_flags+0x457,%rdi
> movslq (%rdi,%rax,4),%rax
> 4.97 3.19 add %rdi,%rax
> <SNIP>
>
> If I ask for number of samples:
>
> [root@five ~]# perf config annotate.show_nr_samples=true
> [root@five ~]# perf annotate --stdio2 v8::internal::ConcurrentMarking::Run
>
> Samples: 444 of events 'anon group { branch-misses, branch-instructions }', 4000 Hz, Event count (approx.): 596221, [percent: local period]
> v8::internal::ConcurrentMarking::Run() /usr/lib64/chromium-browser/chromium-browser
> Samples
>
>
> Disassembly of section .text:
>
> 0000000003290b30 <v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool)>:
> v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool):
> push %rbp
> mov %rsp,%rbp
> push %r15
> push %r14
> mov %rdi,%r14
> push %r13
> mov %edx,%r13d
> push %r12
> mov %ecx,%r12d
> push %rbx
> sub $0x1298,%rsp
> mov %rsi,-0x1228(%rbp)
> mov %fs:0x28,%rax
> mov %rax,-0x38(%rbp)
> <SNIP>
> movzwl 0x2(%rbx),%eax
> test %ax,%ax
> ↓ jne 4a9
> mov -0x10e8(%rbp),%rdx
> cmpw $0x0,0x2(%rdx)
> 1 1 ↓ je 4b90
> movq %rbx,%xmm0
> movq %rdx,%xmm2
> mov %rdx,%rbx
> punpcklqdq %xmm2,%xmm0
> movups %xmm0,-0x10e8(%rbp)
> movzwl 0x2(%rdx),%eax
> 4a9: sub $0x1,%eax
> mov %ax,0x2(%rbx)
> 1 2 movzwl %ax,%eax
> 1 0 mov 0x10(%rbx,%rax,8),%rax
> 8 5 mov %rax,-0x11e0(%rbp)
> 0 1 4bf: mov 0x8(%r13),%rax
> 0 1 add $0x1,%r15d
> 0 1 mov 0x110(%rax),%rax
> mov 0x128(%rax),%rcx
> 3 1 mov 0x8(%r13),%rax
> mov 0x110(%rax),%rdx
> mov 0x130(%rdx),%rdx
> 0 1 mov 0x140(%rax),%rax
> mov 0x110(%rax),%rsi
> 2 1 mov -0x11e0(%rbp),%rax
> 6 6 sub $0x1,%rax
> cmp %rcx,%rax
> 0 1 setae %cl
> 2 1 cmp %rdx,%rax
> setb %dl
> 0 1 test %dl,%cl
> 0 1 ↓ jne 4b70
> cmp %rsi,%rax
> ↓ je 4b70
> mov (%rax),%eax
> 58 73 add -0x1220(%rbp),%rax
> cmpb $0x0,-0x1218(%rbp)
> mov %rax,-0x1210(%rbp)
> 0 1 ↓ jne 4fa0
> mov -0x11e0(%rbp),%r9
> lea 0x6(%rax),%rbx
> 1 0 545: movzbl (%rbx),%eax
> 10 13 cmp $0x4c,%al
> ↓ ja 5026
> 1 0 550: lea v8::internal::FLAGDEFAULT_abort_on_contradictory_flags+0x457,%rdi
> movslq (%rdi,%rax,4),%rax
> 9 8 add %rdi,%rax
>
> <SNIP>
>
> So it seems to be working, what am I missing? Is this strictly non
> group related?
>
Yes, it is non group related.
This problem occurs only when different events need to be recorded at
the same time, i.e.:
perf record -e branch-misses -e branch-instructions -a sleep 1

The output results of perf script and perf annotate do not match.
Some events are not output in perf annotate.

> - Arnaldo
> .
>

2021-03-30 07:30:38

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

Hi Yang and Arnaldo,

On Sat, Mar 27, 2021 at 11:16 AM Yang Jihong <[email protected]> wrote:
> On 2021/3/26 20:06, Arnaldo Carvalho de Melo wrote:
> > So it seems to be working, what am I missing? Is this strictly non
> > group related?
> >
> Yes, it is non group related.
> This problem occurs only when different events need to be recorded at
> the same time, i.e.:
> perf record -e branch-misses -e branch-instructions -a sleep 1
>
> The output results of perf script and perf annotate do not match.
> Some events are not output in perf annotate.

Yeah I think it's related to sort keys. The code works with a single
hist_entry for each event and symbol. But the default sort key
creates multiple entries for different threads and it causes the
confusion.

Thanks,
Namhyung

2021-03-31 02:20:32

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

Hello, Namhyung and Arnaldo,

On 2021/3/30 15:26, Namhyung Kim wrote:
> Hi Yang and Arnaldo,
>
> On Sat, Mar 27, 2021 at 11:16 AM Yang Jihong <[email protected]> wrote:
>> On 2021/3/26 20:06, Arnaldo Carvalho de Melo wrote:
>>> So it seems to be working, what am I missing? Is this strictly non
>>> group related?
>>>
>> Yes, it is non group related.
>> This problem occurs only when different events need to be recorded at
>> the same time, i.e.:
>> perf record -e branch-misses -e branch-instructions -a sleep 1
>>
>> The output results of perf script and perf annotate do not match.
>> Some events are not output in perf annotate.
>
> Yeah I think it's related to sort keys. The code works with a single
> hist_entry for each event and symbol. But the default sort key
> creates multiple entries for different threads and it causes the
> confusion.
>
Yes, After revome zfree from hists__find_annotations, the output of perf
annotate is repeated, which is related to sort keys.

The original problem is that notes->src may correspond to multiple
sample events. Therefore, we cannot simply zfree notes->src to avoid
repeated output.

Arnaldo, is there any problem with this patch? :)

Thanks,
Yang
> Thanks,
> Namhyung
> .
>

2021-04-12 18:48:38

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

Hello, Arnaldo,

On 2021/3/31 10:18, Yang Jihong wrote:
> Hello, Namhyung  and Arnaldo,
>
> On 2021/3/30 15:26, Namhyung Kim wrote:
>> Hi Yang and Arnaldo,
>>
>> On Sat, Mar 27, 2021 at 11:16 AM Yang Jihong <[email protected]>
>> wrote:
>>> On 2021/3/26 20:06, Arnaldo Carvalho de Melo wrote:
>>>> So it seems to be working, what am I missing? Is this strictly non
>>>> group related?
>>>>
>>> Yes, it is non group related.
>>> This problem occurs only when different events need to be recorded at
>>> the same time, i.e.:
>>> perf record -e branch-misses -e branch-instructions -a sleep 1
>>>
>>> The output results of perf script and perf annotate do not match.
>>> Some events are not output in perf annotate.
>>
>> Yeah I think it's related to sort keys.  The code works with a single
>> hist_entry for each event and symbol.  But the default sort key
>> creates multiple entries for different threads and it causes the
>> confusion.
>>
> Yes, After revome zfree from hists__find_annotations, the output of perf
> annotate is repeated, which is related to sort keys.
>
> The original problem is that notes->src may correspond to multiple
> sample events. Therefore, we cannot simply zfree notes->src to avoid
> repeated output.
>
> Arnaldo, is there any problem with this patch? :)
>
> Thanks,
> Yang
>> Thanks,
>> Namhyung
>> .
>>
PING :)
Is there any problem with this patch that needs to be modified?

Thanks,
Yang

2021-04-15 00:20:55

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

Em Mon, Apr 12, 2021 at 03:22:29PM +0800, Yang Jihong escreveu:
> On 2021/3/31 10:18, Yang Jihong wrote:
> > On 2021/3/30 15:26, Namhyung Kim wrote:
> > > On Sat, Mar 27, 2021 at 11:16 AM Yang Jihong <[email protected]> wrote:
> > > > On 2021/3/26 20:06, Arnaldo Carvalho de Melo wrote:
> > > > > So it seems to be working, what am I missing? Is this strictly non
> > > > > group related?

> > > > Yes, it is non group related.
> > > > This problem occurs only when different events need to be recorded at
> > > > the same time, i.e.:
> > > > perf record -e branch-misses -e branch-instructions -a sleep 1

> > > > The output results of perf script and perf annotate do not match.
> > > > Some events are not output in perf annotate.

> > > Yeah I think it's related to sort keys.? The code works with a single
> > > hist_entry for each event and symbol.? But the default sort key
> > > creates multiple entries for different threads and it causes the
> > > confusion.

> > Yes, After revome zfree from hists__find_annotations, the output of perf
> > annotate is repeated, which is related to sort keys.

> > The original problem is that notes->src may correspond to multiple
> > sample events. Therefore, we cannot simply zfree notes->src to avoid
> > repeated output.

> > Arnaldo, is there any problem with this patch? :)

> PING :)
> Is there any problem with this patch that needs to be modified?

I continue having a feeling this is kinda a bandaid, i.e. avoid the
problem, and since we have a way to work this when using a group, I fail
to see why it couldn't work when not grouping events.

But since I have no time to dive into this and Namhyung is ok with it,
I'll merge it now.

- Arnaldo

2021-04-15 00:25:49

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

Hi Arnaldo,

On Wed, Apr 14, 2021 at 9:23 PM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> Em Mon, Apr 12, 2021 at 03:22:29PM +0800, Yang Jihong escreveu:
> > On 2021/3/31 10:18, Yang Jihong wrote:
> > > On 2021/3/30 15:26, Namhyung Kim wrote:
> > > > On Sat, Mar 27, 2021 at 11:16 AM Yang Jihong <[email protected]> wrote:
> > > > > On 2021/3/26 20:06, Arnaldo Carvalho de Melo wrote:
> > > > > > So it seems to be working, what am I missing? Is this strictly non
> > > > > > group related?
>
> > > > > Yes, it is non group related.
> > > > > This problem occurs only when different events need to be recorded at
> > > > > the same time, i.e.:
> > > > > perf record -e branch-misses -e branch-instructions -a sleep 1
>
> > > > > The output results of perf script and perf annotate do not match.
> > > > > Some events are not output in perf annotate.
>
> > > > Yeah I think it's related to sort keys. The code works with a single
> > > > hist_entry for each event and symbol. But the default sort key
> > > > creates multiple entries for different threads and it causes the
> > > > confusion.
>
> > > Yes, After revome zfree from hists__find_annotations, the output of perf
> > > annotate is repeated, which is related to sort keys.
>
> > > The original problem is that notes->src may correspond to multiple
> > > sample events. Therefore, we cannot simply zfree notes->src to avoid
> > > repeated output.
>
> > > Arnaldo, is there any problem with this patch? :)
>
> > PING :)
> > Is there any problem with this patch that needs to be modified?
>
> I continue having a feeling this is kinda a bandaid, i.e. avoid the
> problem, and since we have a way to work this when using a group, I fail
> to see why it couldn't work when not grouping events.

When we use a group, there's a single iteration only for the leader event.
But if not, it'll iterate the hist entries twice (for two events).
Each iteration
used to have multiple entries for the same symbol (due to the sort key),
so it marked the symbol (by freeing notes->src) to skip the same symbol
during the iteration.

However as the first iteration freed sym->notes->src, then the second
(or later) event cannot see the annotation for the deleted symbols
for that event even if it has some samples.

>
> But since I have no time to dive into this and Namhyung is ok with it,
> I'll merge it now.

Thanks,
Namhyung