Hi,
4b3a3212233a - "perf hists browser: Support flat callchains" seems to
have broken callchain display in tui mode when using !flat mode, or at
least changed it in an unintended manner.
Llooking at the same perf.data file:
perf report --tui --no-children -g
before:
- 16.69% swapper [kernel.kallsyms] [k] poll_idle
- poll_idle
+ 16.65% cpuidle_enter_state
+ 0.04% cpuidle_enter
- 3.51% swapper [kernel.kallsyms] [k] intel_idle
- intel_idle
+ 3.49% cpuidle_enter_state
+ 0.03% cpuidle_enter
- 1.35% postgres postgres [.] hash_search_with_hash_value
- hash_search_with_hash_value
- 0.32% BufTableLookup
+ ReadBuffer_common
- 0.18% LockAcquireExtended
+ 0.16% LockRelationOid
+ 0.02% XactLockTableInsert
+ 0.14% FetchPreparedStatement
+ 0.13% CreatePortal
+ 0.11% RelationIdGetRelation
+ 0.10% GetPortalByName
after:
- 16.69% swapper [kernel.kallsyms] [k] poll_idle
poll_idle
- 3.51% swapper [kernel.kallsyms] [k] intel_idle
intel_idle
- 1.35% postgres postgres [.] hash_search_with_hash_value
hash_search_with_hash_value
- 1.18% postgres postgres [.] AllocSetAlloc
AllocSetAlloc
- 0.85% postgres libc-2.22.so [.] __memcpy_sse2_unaligned
__memcpy_sse2_unaligned
as you can see after the aforementioned commit there's only one level of
callers reported.
--stdio output isn't affected.
I can provide an example perf.data file, but it apears to reproduce with
just about any profile here.
Greetings,
Andres Freund
Em Wed, Mar 30, 2016 at 02:34:18PM +0200, Andres Freund escreveu:
> Hi,
>
> 4b3a3212233a - "perf hists browser: Support flat callchains" seems to
> have broken callchain display in tui mode when using !flat mode, or at
> least changed it in an unintended manner.
humm, at first I thought this would be related to --percent-limit...
What tree/branch are you using? Can you try pressing 'L' to play with
the percent limit?
- Arnaldo
> Llooking at the same perf.data file:
> perf report --tui --no-children -g
> before:
> - 16.69% swapper [kernel.kallsyms] [k] poll_idle
> - poll_idle
> + 16.65% cpuidle_enter_state
> + 0.04% cpuidle_enter
> - 3.51% swapper [kernel.kallsyms] [k] intel_idle
> - intel_idle
> + 3.49% cpuidle_enter_state
> + 0.03% cpuidle_enter
> - 1.35% postgres postgres [.] hash_search_with_hash_value
> - hash_search_with_hash_value
> - 0.32% BufTableLookup
> + ReadBuffer_common
> - 0.18% LockAcquireExtended
> + 0.16% LockRelationOid
> + 0.02% XactLockTableInsert
> + 0.14% FetchPreparedStatement
> + 0.13% CreatePortal
> + 0.11% RelationIdGetRelation
> + 0.10% GetPortalByName
> after:
> - 16.69% swapper [kernel.kallsyms] [k] poll_idle
> poll_idle
> - 3.51% swapper [kernel.kallsyms] [k] intel_idle
> intel_idle
> - 1.35% postgres postgres [.] hash_search_with_hash_value
> hash_search_with_hash_value
> - 1.18% postgres postgres [.] AllocSetAlloc
> AllocSetAlloc
> - 0.85% postgres libc-2.22.so [.] __memcpy_sse2_unaligned
> __memcpy_sse2_unaligned
>
> as you can see after the aforementioned commit there's only one level of
> callers reported.
>
> --stdio output isn't affected.
>
> I can provide an example perf.data file, but it apears to reproduce with
> just about any profile here.
>
> Greetings,
>
> Andres Freund
On 2016-03-30 10:46:34 -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Mar 30, 2016 at 02:34:18PM +0200, Andres Freund escreveu:
> > Hi,
> >
> > 4b3a3212233a - "perf hists browser: Support flat callchains" seems to
> > have broken callchain display in tui mode when using !flat mode, or at
> > least changed it in an unintended manner.
>
> humm, at first I thought this would be related to --percent-limit...
I'm not using --percent-limit. Just to be sure, I did explicitly set it
to various values, and it looks unrelated.
> What tree/branch are you using? Can you try pressing 'L' to play with
> the percent limit?
I'm primarily using linus' tree, and bisected the behavioural down to
that individual commit.
It's somewhat weird that --stdio doesn't show the problem, but --tui
does. Hm.
I don't know the perf code at all, but skimming through the commit, the
following hunk looks suspicious:
@@ -263,7 +295,7 @@ static void callchain_node__init_have_children(struct callchain_node *node,
chain = list_entry(node->val.next, struct callchain_list, list);
chain->has_children = has_sibling;
- if (!list_empty(&node->val)) {
+ if (node->val.next != node->val.prev) {
chain = list_entry(node->val.prev, struct callchain_list, list);
chain->has_children = !RB_EMPTY_ROOT(&node->rb_root);
}
Reverting that individual change fixes things. I'm not actually sure
what the post 4b3a3212233a version actually tests for?
I think that actually explains why stdio works - nodes are always
unfolded in it, thus ->has_children isn't looked at.
Andres
Hi Andres,
On Wed, Mar 30, 2016 at 04:19:26PM +0200, Andres Freund wrote:
> On 2016-03-30 10:46:34 -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Mar 30, 2016 at 02:34:18PM +0200, Andres Freund escreveu:
> > > Hi,
> > >
> > > 4b3a3212233a - "perf hists browser: Support flat callchains" seems to
> > > have broken callchain display in tui mode when using !flat mode, or at
> > > least changed it in an unintended manner.
> >
> > humm, at first I thought this would be related to --percent-limit...
>
> I'm not using --percent-limit. Just to be sure, I did explicitly set it
> to various values, and it looks unrelated.
>
> > What tree/branch are you using? Can you try pressing 'L' to play with
> > the percent limit?
>
> I'm primarily using linus' tree, and bisected the behavioural down to
> that individual commit.
Thanks for reporting and finding this!
>
> It's somewhat weird that --stdio doesn't show the problem, but --tui
> does. Hm.
>
>
> I don't know the perf code at all, but skimming through the commit, the
> following hunk looks suspicious:
>
> @@ -263,7 +295,7 @@ static void callchain_node__init_have_children(struct callchain_node *node,
> chain = list_entry(node->val.next, struct callchain_list, list);
> chain->has_children = has_sibling;
>
> - if (!list_empty(&node->val)) {
> + if (node->val.next != node->val.prev) {
> chain = list_entry(node->val.prev, struct callchain_list, list);
> chain->has_children = !RB_EMPTY_ROOT(&node->rb_root);
> }
>
> Reverting that individual change fixes things. I'm not actually sure
> what the post 4b3a3212233a version actually tests for?
Yeah, this is it. It's my fault that I thought if the first chain
(node->val.next) was set by has_sibling, no need to go to the body
of the "if" statement when next == prev case. But it's not...
>
>
> I think that actually explains why stdio works - nodes are always
> unfolded in it, thus ->has_children isn't looked at.
Right, the ->has_children thing is only for TUI code which
folds/collapses the entries dynamically.
Do you mind resending the fix as a formal patch with my ack ?
Thanks,
Namhyung
Hi!
On 2016-03-31 01:00:10 +0900, Namhyung Kim wrote:
> On Wed, Mar 30, 2016 at 04:19:26PM +0200, Andres Freund wrote:
> > On 2016-03-30 10:46:34 -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Wed, Mar 30, 2016 at 02:34:18PM +0200, Andres Freund escreveu:
> > > > Hi,
> > > >
> > > > 4b3a3212233a - "perf hists browser: Support flat callchains" seems to
> > > > have broken callchain display in tui mode when using !flat mode, or at
> > > > least changed it in an unintended manner.
> > >
> > > humm, at first I thought this would be related to --percent-limit...
> >
> > I'm not using --percent-limit. Just to be sure, I did explicitly set it
> > to various values, and it looks unrelated.
> >
> > > What tree/branch are you using? Can you try pressing 'L' to play with
> > > the percent limit?
> >
> > I'm primarily using linus' tree, and bisected the behavioural down to
> > that individual commit.
>
> Thanks for reporting and finding this!
No problem. I'm somewhat surprised to be the first to report this, the
behavioural change confused me quite a bit. Maybe it took others about
as long as me to figure out it's actually a perf report problem ;)
> > I don't know the perf code at all, but skimming through the commit, the
> > following hunk looks suspicious:
> >
> > @@ -263,7 +295,7 @@ static void callchain_node__init_have_children(struct callchain_node *node,
> > chain = list_entry(node->val.next, struct callchain_list, list);
> > chain->has_children = has_sibling;
> >
> > - if (!list_empty(&node->val)) {
> > + if (node->val.next != node->val.prev) {
> > chain = list_entry(node->val.prev, struct callchain_list, list);
> > chain->has_children = !RB_EMPTY_ROOT(&node->rb_root);
> > }
> >
> > Reverting that individual change fixes things. I'm not actually sure
> > what the post 4b3a3212233a version actually tests for?
>
> Yeah, this is it. It's my fault that I thought if the first chain
> (node->val.next) was set by has_sibling, no need to go to the body
> of the "if" statement when next == prev case. But it's not...
> Do you mind resending the fix as a formal patch with my ack ?
Done.
I couldn't really explain the problem, so I handwaved the actual problem
away: I've not really grasped how the rbtree, callchain_lists and
callchain_node are intertwined.
My symptom description probably is inaccurate as well.
>From 017d80e3c98c7237183ac20bb1cfc81bd5e9a474 Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Wed, 30 Mar 2016 19:39:28 +0200
Subject: [PATCH] perf hists: Fix determination of a callchain node's
childlessness.
4b3a3212233a ("perf hists browser: Support flat callchains")
over-aggressively tried to optimize
callchain_node__init_have_children().
That lead to --tui mode not allowing to expand call chain elements if a
call chain element had only one parent. That's why --inverted callgraphs
looked halfway sane, but plain ones didn't.
Revert that individual optimization, it wasn't really related to the
rest of the commit.
Fixes: 4b3a3212233a042f48b7b8fedc64933e1ccd8643
Acked-by: Namhyung Kim <[email protected]>
Signed-off-by: Andres Freund <[email protected]>
---
tools/perf/ui/browsers/hists.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 4b98165..2a83414 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -337,7 +337,7 @@ static void callchain_node__init_have_children(struct callchain_node *node,
chain = list_entry(node->val.next, struct callchain_list, list);
chain->has_children = has_sibling;
- if (node->val.next != node->val.prev) {
+ if (!list_empty(&node->val)) {
chain = list_entry(node->val.prev, struct callchain_list, list);
chain->has_children = !RB_EMPTY_ROOT(&node->rb_root);
}
--
2.7.0.229.g701fa7f.dirty
Em Thu, Mar 31, 2016 at 01:00:10AM +0900, Namhyung Kim escreveu:
> Hi Andres,
>
> On Wed, Mar 30, 2016 at 04:19:26PM +0200, Andres Freund wrote:
> > On 2016-03-30 10:46:34 -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Wed, Mar 30, 2016 at 02:34:18PM +0200, Andres Freund escreveu:
> > > > Hi,
> > > >
> > > > 4b3a3212233a - "perf hists browser: Support flat callchains" seems to
> > > > have broken callchain display in tui mode when using !flat mode, or at
> > > > least changed it in an unintended manner.
> > >
> > > humm, at first I thought this would be related to --percent-limit...
> >
> > I'm not using --percent-limit. Just to be sure, I did explicitly set it
> > to various values, and it looks unrelated.
> >
> > > What tree/branch are you using? Can you try pressing 'L' to play with
> > > the percent limit?
> >
> > I'm primarily using linus' tree, and bisected the behavioural down to
> > that individual commit.
>
> Thanks for reporting and finding this!
Yeah, I noticed it now, we really need to do the equivalent of:
perf report --tui
E
P
Then get the perf.hist.N file before a patch and after it, to see if the
output changed :-\
Ditto for 'perf report --stdio' > before, after, diff.
- Arnaldo
> > It's somewhat weird that --stdio doesn't show the problem, but --tui
> > does. Hm.
> >
> >
> > I don't know the perf code at all, but skimming through the commit, the
> > following hunk looks suspicious:
> >
> > @@ -263,7 +295,7 @@ static void callchain_node__init_have_children(struct callchain_node *node,
> > chain = list_entry(node->val.next, struct callchain_list, list);
> > chain->has_children = has_sibling;
> >
> > - if (!list_empty(&node->val)) {
> > + if (node->val.next != node->val.prev) {
> > chain = list_entry(node->val.prev, struct callchain_list, list);
> > chain->has_children = !RB_EMPTY_ROOT(&node->rb_root);
> > }
> >
> > Reverting that individual change fixes things. I'm not actually sure
> > what the post 4b3a3212233a version actually tests for?
>
> Yeah, this is it. It's my fault that I thought if the first chain
> (node->val.next) was set by has_sibling, no need to go to the body
> of the "if" statement when next == prev case. But it's not...
>
> >
> >
> > I think that actually explains why stdio works - nodes are always
> > unfolded in it, thus ->has_children isn't looked at.
>
> Right, the ->has_children thing is only for TUI code which
> folds/collapses the entries dynamically.
>
> Do you mind resending the fix as a formal patch with my ack ?
>
> Thanks,
> Namhyung
Commit-ID: 909890355507e92bdaf648e73870f6b5df606da8
Gitweb: http://git.kernel.org/tip/909890355507e92bdaf648e73870f6b5df606da8
Author: Andres Freund <[email protected]>
AuthorDate: Wed, 30 Mar 2016 21:02:45 +0200
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Wed, 30 Mar 2016 18:08:39 -0300
perf hists: Fix determination of a callchain node's childlessness
The 4b3a3212233a ("perf hists browser: Support flat callchains") commit
over-aggressively tried to optimize callchain_node__init_have_children().
That lead to --tui mode not allowing to expand call chain elements if a
call chain element had only one parent. That's why --inverted callgraphs
looked halfway sane, but plain ones didn't.
Revert that individual optimization, it wasn't really related to the
rest of the commit.
Signed-off-by: Andres Freund <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: 4b3a3212233a ("perf hists browser: Support flat callchains")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/ui/browsers/hists.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 4b98165..2a83414 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -337,7 +337,7 @@ static void callchain_node__init_have_children(struct callchain_node *node,
chain = list_entry(node->val.next, struct callchain_list, list);
chain->has_children = has_sibling;
- if (node->val.next != node->val.prev) {
+ if (!list_empty(&node->val)) {
chain = list_entry(node->val.prev, struct callchain_list, list);
chain->has_children = !RB_EMPTY_ROOT(&node->rb_root);
}
Hi,
Perhaps this should also go into stable? It'd be nice to fix that
regression for 4.4 and 4.5.
Regards,
Andres
On 2016-03-30 23:33:37 -0700, tip-bot for Andres Freund wrote:
> Commit-ID: 909890355507e92bdaf648e73870f6b5df606da8
> Gitweb: http://git.kernel.org/tip/909890355507e92bdaf648e73870f6b5df606da8
> Author: Andres Freund <[email protected]>
> AuthorDate: Wed, 30 Mar 2016 21:02:45 +0200
> Committer: Arnaldo Carvalho de Melo <[email protected]>
> CommitDate: Wed, 30 Mar 2016 18:08:39 -0300
>
> perf hists: Fix determination of a callchain node's childlessness
>
> The 4b3a3212233a ("perf hists browser: Support flat callchains") commit
> over-aggressively tried to optimize callchain_node__init_have_children().
>
> That lead to --tui mode not allowing to expand call chain elements if a
> call chain element had only one parent. That's why --inverted callgraphs
> looked halfway sane, but plain ones didn't.
>
> Revert that individual optimization, it wasn't really related to the
> rest of the commit.
>
> Signed-off-by: Andres Freund <[email protected]>
> Acked-by: Namhyung Kim <[email protected]>
> Tested-by: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Fixes: 4b3a3212233a ("perf hists browser: Support flat callchains")
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> ---
> tools/perf/ui/browsers/hists.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
> index 4b98165..2a83414 100644
> --- a/tools/perf/ui/browsers/hists.c
> +++ b/tools/perf/ui/browsers/hists.c
> @@ -337,7 +337,7 @@ static void callchain_node__init_have_children(struct callchain_node *node,
> chain = list_entry(node->val.next, struct callchain_list, list);
> chain->has_children = has_sibling;
>
> - if (node->val.next != node->val.prev) {
> + if (!list_empty(&node->val)) {
> chain = list_entry(node->val.prev, struct callchain_list, list);
> chain->has_children = !RB_EMPTY_ROOT(&node->rb_root);
> }
Em Sat, Apr 16, 2016 at 12:46:14PM -0700, Andres Freund escreveu:
> Hi,
>
> Perhaps this should also go into stable? It'd be nice to fix that
> regression for 4.4 and 4.5.
Yeah, that would be good, just send that request to [email protected].
- Arnaldo
> Regards,
>
> Andres
>
> On 2016-03-30 23:33:37 -0700, tip-bot for Andres Freund wrote:
> > Commit-ID: 909890355507e92bdaf648e73870f6b5df606da8
> > Gitweb: http://git.kernel.org/tip/909890355507e92bdaf648e73870f6b5df606da8
> > Author: Andres Freund <[email protected]>
> > AuthorDate: Wed, 30 Mar 2016 21:02:45 +0200
> > Committer: Arnaldo Carvalho de Melo <[email protected]>
> > CommitDate: Wed, 30 Mar 2016 18:08:39 -0300
> >
> > perf hists: Fix determination of a callchain node's childlessness
> >
> > The 4b3a3212233a ("perf hists browser: Support flat callchains") commit
> > over-aggressively tried to optimize callchain_node__init_have_children().
> >
> > That lead to --tui mode not allowing to expand call chain elements if a
> > call chain element had only one parent. That's why --inverted callgraphs
> > looked halfway sane, but plain ones didn't.
> >
> > Revert that individual optimization, it wasn't really related to the
> > rest of the commit.
> >
> > Signed-off-by: Andres Freund <[email protected]>
> > Acked-by: Namhyung Kim <[email protected]>
> > Tested-by: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Fixes: 4b3a3212233a ("perf hists browser: Support flat callchains")
> > Link: http://lkml.kernel.org/r/[email protected]
> > Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> > ---
> > tools/perf/ui/browsers/hists.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
> > index 4b98165..2a83414 100644
> > --- a/tools/perf/ui/browsers/hists.c
> > +++ b/tools/perf/ui/browsers/hists.c
> > @@ -337,7 +337,7 @@ static void callchain_node__init_have_children(struct callchain_node *node,
> > chain = list_entry(node->val.next, struct callchain_list, list);
> > chain->has_children = has_sibling;
> >
> > - if (node->val.next != node->val.prev) {
> > + if (!list_empty(&node->val)) {
> > chain = list_entry(node->val.prev, struct callchain_list, list);
> > chain->has_children = !RB_EMPTY_ROOT(&node->rb_root);
> > }
Hi,
Perhaps the patch quoted below should also go into stable? It'd be nice
to fix that regression for 4.4 and 4.5. It's been integrated into 4.6
(909890355).
Regards,
Andres
On 2016-03-30 23:33:37 -0700, tip-bot for Andres Freund wrote:
> Commit-ID: 909890355507e92bdaf648e73870f6b5df606da8
> Gitweb: http://git.kernel.org/tip/909890355507e92bdaf648e73870f6b5df606da8
> Author: Andres Freund <[email protected]>
> AuthorDate: Wed, 30 Mar 2016 21:02:45 +0200
> Committer: Arnaldo Carvalho de Melo <[email protected]>
> CommitDate: Wed, 30 Mar 2016 18:08:39 -0300
>
> perf hists: Fix determination of a callchain node's childlessness
>
> The 4b3a3212233a ("perf hists browser: Support flat callchains") commit
> over-aggressively tried to optimize callchain_node__init_have_children().
>
> That lead to --tui mode not allowing to expand call chain elements if a
> call chain element had only one parent. That's why --inverted callgraphs
> looked halfway sane, but plain ones didn't.
>
> Revert that individual optimization, it wasn't really related to the
> rest of the commit.
>
> Signed-off-by: Andres Freund <[email protected]>
> Acked-by: Namhyung Kim <[email protected]>
> Tested-by: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Fixes: 4b3a3212233a ("perf hists browser: Support flat callchains")
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> ---
> tools/perf/ui/browsers/hists.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
> index 4b98165..2a83414 100644
> --- a/tools/perf/ui/browsers/hists.c
> +++ b/tools/perf/ui/browsers/hists.c
> @@ -337,7 +337,7 @@ static void callchain_node__init_have_children(struct callchain_node *node,
> chain = list_entry(node->val.next, struct callchain_list, list);
> chain->has_children = has_sibling;
>
> - if (node->val.next != node->val.prev) {
> + if (!list_empty(&node->val)) {
> chain = list_entry(node->val.prev, struct callchain_list, list);
> chain->has_children = !RB_EMPTY_ROOT(&node->rb_root);
> }