Date: Tue, 24 Nov 2015 23:10:25 -0300
From: Arnaldo Carvalho de Melo
To: Namhyung Kim
Cc: Frederic Weisbecker, Ingo Molnar, linux-kernel@vger.kernel.org,
	Andi Kleen, David Ahern, Jiri Olsa, Kan Liang, Peter Zijlstra
Subject: Re: [PATCH 34/37] perf hists browser: Support flat callchains
Message-ID: <20151125021025.GM18140@kernel.org>
References: <1447955603-24895-1-git-send-email-acme@kernel.org>
	<1447955603-24895-35-git-send-email-acme@kernel.org>
	<20151123151647.GD3587@lerouge>
	<20151124052708.GB2636@sejong>
	<20151124144551.GB30441@redhat.com>
	<20151125012608.GA6171@sejong>
	<20151125013418.GL18140@kernel.org>
In-Reply-To: <20151125013418.GL18140@kernel.org>

Em Tue, Nov 24, 2015 at 10:34:18PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Nov 25, 2015 at 10:26:08AM +0900, Namhyung Kim escreveu:
> > On Tue, Nov 24, 2015 at 12:45:51PM -0200, Arnaldo Carvalho de Melo wrote:
> > > Em Tue, Nov 24, 2015 at 02:27:08PM +0900, Namhyung Kim escreveu:
> > > > On Mon, Nov 23, 2015 at 04:16:48PM +0100, Frederic Weisbecker wrote:
> > > > > On Thu, Nov 19, 2015 at 02:53:20PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > > From: Namhyung Kim
> > > > > [...]
> > > > Thus I simply copied callchain lists in parents to leaf nodes.  Yes,
> > > > it will consume some memory but can simplify the code.
> > > I haven't done any measuring, but I'm noticing that 'perf top -g' is
> > > showing more warnings about not being able to process events fast enough
> > > and so ends up losing events. I tried with --max-stack 16 and it helped,
> > > this is just a heads up.
> >
> > OK, but it seems that it's not related to this patch since this patch
> > only affects flat or folded callchain mode.
>
> Well, doesn't this patch make some of the involved data structures
> larger, thus putting more pressure on the L1 cache, etc? It may well be
> related, but we need to measure.
>
> > > Perhaps my workstation workloads are getting deeper callchains over
> > > time, but perhaps this is the cost of processing callchains that is
> > > increasing, I need to stop and try to quantify this.
> > >
> > > We really need to look at reducing the overhead of processing
> > > callchains.
> >
> > Right, but with my multi-thread work, I realized that perf is getting
> > heavier recently. I guess it's mostly due to the atomic refcount
> > work. I need to get back to the multi-thread work..
>
> We really need to measure this ;-)

So, something strange: if I use

  [acme@zoo linux]$ cat ~/bin/allmod
  rm -rf ../build/allmodconfig/ ; mkdir ../build/allmodconfig/ ; make O=../build/allmodconfig/ allmodconfig ; make -j32 O=../build/allmodconfig
  [acme@zoo linux]$

to generate background load, I don't see this warning that much:

  +  8.55%  8.49% libc-2┌─Warning!──────────────────────────────────────────────┐▒
  +  7.08%  6.98% perf  │Events are being lost, check IO/CPU overload!          │▒
  +  6.84%  0.04% perf  │                                                       │▒
  +  6.01%  0.09% perf  │You may want to run 'perf' using a RT scheduler policy:│▒
  +  5.26%  0.13% [kerne│                                                       │t▒
  +  4.96%  1.50% perf  │       perf top -r 80                                  │▒
  +  4.76%  3.58% perf  │                                                       │▒
  +  4.69%  0.05% perf  │Or reduce the sampling frequency.
                        └───────────────────────────────────────────────────────┘

It's with a low loadavg:

  [acme@zoo linux]$ cat /proc/loadavg
  0.75 0.79 0.64 3/549 21259

that it pops up :-\

If I take a snapshot with 'P':

  # head -40 perf.hist.0
  + 21.43% 21.09% libc-2.20.so       [.] _int_malloc
  + 19.49%  0.00% perf               [.] cmd_top
  + 19.46%  0.02% perf               [.] perf_top__mmap_read_idx
  + 19.03%  0.06% perf               [.] hist_entry_iter__add
  + 16.46%  1.85% perf               [.] iter_add_next_cumulative_entry
  + 12.09% 11.98% libc-2.20.so       [.] free
  + 10.68% 10.61% libc-2.20.so       [.] __libc_calloc
  +  9.61%  0.09% perf               [.] hists__decay_entries
  +  8.92%  8.85% libc-2.20.so       [.] malloc_consolidate
  +  8.17%  6.33% perf               [.] free_callchain_node
  +  7.94%  0.09% perf               [.] hist_entry__delete
  +  6.22%  0.03% perf               [.] callchain_append
  +  6.20%  6.11% perf               [.] append_chain_children
  +  5.44%  1.50% perf               [.] __hists__add_entry
  +  4.34%  0.14% [kernel]           [k] entry_SYSCALL_64_fastpath
  +  3.69%  3.67% perf               [.] sort__dso_cmp
  +  3.12%  0.20% perf               [.] hists__output_resort
  +  2.88%  0.00% [unknown]          [.] 0x6d86258d4c544155
  +  2.88%  0.00% libc-2.20.so       [.] __libc_start_main
  +  2.88%  0.00% perf               [.] main
  +  2.88%  0.00% perf               [.] run_builtin
  +  2.66%  0.00% libpthread-2.20.so [.] start_thread
  +  2.66%  0.00% perf               [.] display_thread_tui
  +  2.66%  0.00% perf               [.] perf_evlist__tui_browse_hists
  +  2.66%  0.00% perf               [.] perf_evsel__hists_browse
  +  2.49%  0.07% [kernel]           [k] sys_futex
  +  2.42%  0.06% [kernel]           [k] do_futex
     2.24%  0.00% perf               [.] perf_top__sort_new_samples
  +  1.92%  0.51% perf               [.] hists__collapse_resort
  +  1.87%  1.86% perf               [.] hpp__sort_overhead_acc
  +  1.69%  0.09% libc-2.20.so       [.] __lll_unlock_wake_private
  +  1.45%  1.44% perf               [.] hpp__nop_cmp
     1.45%  1.43% perf               [.] rb_erase
  +  1.44%  1.42% perf               [.] __sort__hpp_cmp
     1.31%  0.16% libc-2.20.so       [.] __lll_lock_wait_private
     1.18%  0.19% [kernel]           [k] futex_wake
  +  1.13%  1.13% perf               [.] sort__sym_cmp
     1.11%  0.02% [kernel]           [k] schedule
     1.09%  0.22% [kernel]           [k] __schedule
     0.99%  0.08% [kernel]           [k] futex_wait

So it's quite a lot of mallocs, i.e.
just do a 'perf top -g' and wait a bit: malloc goes on bubbling up to the
top, at about the same time it starts showing that popup screen telling
that we're losing events.

If I use --no-children, to see if there is a difference, using either
--call-graph caller or callee, it doesn't get more than about 1%.

Ok, now I tried with "perf top --call-graph caller", i.e. with --children,
and looked at the _int_malloc callchains: I get really long, bogus
callchains, see below. That explains why I don't lose events when I use
--max-stack.

I'll have to stop now; I put the full perf.hist.1 at
http://vger.kernel.org/~acme/perf/perf.hist.1.xz

- Arnaldo

[root@zoo ~]# head -60 perf.hist.1
- 17.92% 17.10% libc-2.20.so [.] _int_malloc
   + 112.80% _int_malloc
     11.14% 0x41bf5118
            0x41bf5068
            0x41bf5118
            0x41bf5068
            0x41bf5118
            0x41bf5068
            [the same 0x41bf5118/0x41bf5068 pair repeats for the
             remainder of the 60 lines]