Date: Thu, 26 Nov 2015 06:03:57 +0900
From: Namhyung Kim
To: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker, Ingo Molnar, linux-kernel@vger.kernel.org,
    Andi Kleen, David Ahern, Jiri Olsa, Kan Liang, Peter Zijlstra
Subject: Re: [PATCH 34/37] perf hists browser: Support flat callchains
Message-ID: <20151125210357.GA13940@danjae.kornet>
In-Reply-To: <20151125021025.GM18140@kernel.org>

Hi Arnaldo,

On Tue, Nov 24, 2015 at 11:10:25PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Nov 24, 2015 at 10:34:18PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Wed, Nov 25, 2015 at 10:26:08AM +0900, Namhyung Kim escreveu:
> > > On Tue, Nov 24, 2015 at 12:45:51PM -0200, Arnaldo Carvalho de Melo wrote:
> > > > Em Tue, Nov 24, 2015 at 02:27:08PM +0900, Namhyung Kim escreveu:
> > > > > On Mon, Nov 23, 2015 at 04:16:48PM +0100, Frederic Weisbecker wrote:
> > > > > > On Thu, Nov 19, 2015 at 02:53:20PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > > > From: Namhyung Kim
> > > > > > [...]
> > > > > Thus I simply copied the callchain lists in the parents to the leaf
> > > > > nodes.  Yes, it will consume some memory, but it simplifies the code.
> > > >
> > > > I haven't done any measuring, but I'm noticing that 'perf top -g' is
> > > > showing more warnings about not being able to process events fast
> > > > enough and so ends up losing events.  I tried with --max-stack 16 and
> > > > it helped; this is just a heads up.
> > >
> > > OK, but it seems that it's not related to this patch, since this patch
> > > only affects the flat and folded callchain modes.
> >
> > Well, doesn't this patch make some of the involved data structures
> > larger, thus putting more pressure on the L1 cache, etc.?  It may well
> > be related, but we need to measure.
> >
> > > > Perhaps my workstation workloads are getting deeper callchains over
> > > > time, or perhaps it is the cost of processing callchains that is
> > > > increasing; I need to stop and try to quantify this.
> > > >
> > > > We really need to look at reducing the overhead of processing
> > > > callchains.
> > >
> > > Right, but with my multi-thread work I realized that perf has been
> > > getting heavier recently.  I guess it's mostly due to the atomic
> > > refcount work.  I need to get back to the multi-thread work..
> >
> > We really need to measure this ;-)
>
> So, something strange, if I use:
>
> [acme@zoo linux]$ cat ~/bin/allmod
> rm -rf ../build/allmodconfig/ ; mkdir ../build/allmodconfig/ ; make O=../build/allmodconfig/ allmodconfig ; make -j32 O=../build/allmodconfig
> [acme@zoo linux]$
>
> To generate background load, I don't see this that much:
>
> +   8.55%  8.49%  libc-2┌─Warning!──────────────────────────────────────────────┐ ▒
> +   7.08%  6.98%  perf  │Events are being lost, check IO/CPU overload!           │ ▒
> +   6.84%  0.04%  perf  │                                                        │ ▒
> +   6.01%  0.09%  perf  │You may want to run 'perf' using a RT scheduler policy: │ ▒
> +   5.26%  0.13%  [kerne│                                                        │t▒
> +   4.96%  1.50%  perf  │    perf top -r 80                                      │ ▒
> +   4.76%  3.58%  perf  │                                                        │ ▒
> +   4.69%  0.05%  perf  │Or reduce the sampling frequency.                       │ ▒
>
> It's with a low loadavg:
>
> [acme@zoo linux]$ cat /proc/loadavg
> 0.75 0.79 0.64 3/549 21259
>
> that it pops up :-\
>
> If I take a snapshot with 'P':
>
> # head -40 perf.hist.0
> +   21.43%  21.09%  libc-2.20.so        [.] _int_malloc
> +   19.49%   0.00%  perf                [.] cmd_top
> +   19.46%   0.02%  perf                [.] perf_top__mmap_read_idx
> +   19.03%   0.06%  perf                [.] hist_entry_iter__add
> +   16.46%   1.85%  perf                [.] iter_add_next_cumulative_entry
> +   12.09%  11.98%  libc-2.20.so        [.] free
> +   10.68%  10.61%  libc-2.20.so        [.] __libc_calloc
> +    9.61%   0.09%  perf                [.] hists__decay_entries
> +    8.92%   8.85%  libc-2.20.so        [.] malloc_consolidate
> +    8.17%   6.33%  perf                [.] free_callchain_node
> +    7.94%   0.09%  perf                [.] hist_entry__delete
> +    6.22%   0.03%  perf                [.] callchain_append
> +    6.20%   6.11%  perf                [.] append_chain_children
> +    5.44%   1.50%  perf                [.] __hists__add_entry
> +    4.34%   0.14%  [kernel]            [k] entry_SYSCALL_64_fastpath
> +    3.69%   3.67%  perf                [.] sort__dso_cmp
> +    3.12%   0.20%  perf                [.] hists__output_resort
> +    2.88%   0.00%  [unknown]           [.] 0x6d86258d4c544155
> +    2.88%   0.00%  libc-2.20.so        [.] __libc_start_main
> +    2.88%   0.00%  perf                [.] main
> +    2.88%   0.00%  perf                [.] run_builtin
> +    2.66%   0.00%  libpthread-2.20.so  [.] start_thread
> +    2.66%   0.00%  perf                [.] display_thread_tui
> +    2.66%   0.00%  perf                [.] perf_evlist__tui_browse_hists
> +    2.66%   0.00%  perf                [.] perf_evsel__hists_browse
> +    2.49%   0.07%  [kernel]            [k] sys_futex
> +    2.42%   0.06%  [kernel]            [k] do_futex
>      2.24%   0.00%  perf                [.] perf_top__sort_new_samples
> +    1.92%   0.51%  perf                [.] hists__collapse_resort
> +    1.87%   1.86%  perf                [.] hpp__sort_overhead_acc
> +    1.69%   0.09%  libc-2.20.so        [.] __lll_unlock_wake_private
> +    1.45%   1.44%  perf                [.] hpp__nop_cmp
>      1.45%   1.43%  perf                [.] rb_erase
> +    1.44%   1.42%  perf                [.] __sort__hpp_cmp
>      1.31%   0.16%  libc-2.20.so        [.] __lll_lock_wait_private
>      1.18%   0.19%  [kernel]            [k] futex_wake
> +    1.13%   1.13%  perf                [.] sort__sym_cmp
>      1.11%   0.02%  [kernel]            [k] schedule
>      1.09%   0.22%  [kernel]            [k] __schedule
>      0.99%   0.08%  [kernel]            [k] futex_wait
>
> So it's quite a lot of mallocs, i.e. just do a 'perf top -g' and wait a
> bit: malloc goes on bubbling up to the top, and about the same time it
> starts showing that popup screen telling that we're losing events.
>
> If I use --no-children, to see if there is a difference, using either
> --call-graph caller or callee, it doesn't get more than about 1%.
>
> Ok, now I tried with "perf top --call-graph caller", i.e. with
> --children, and looked at the _int_malloc callchains: I get really long,
> bogus callchains, see below.  That explains why I don't lose events when
> I use --max-stack.
>
> I'll have to stop now, and I put the full perf.hist.1 at
> http://vger.kernel.org/~acme/perf/perf.hist.1.xz
>
> - Arnaldo
>
> [root@zoo ~]# head -60 perf.hist.1
> -   17.92%  17.10%  libc-2.20.so        [.] _int_malloc
> +      112.80% _int_malloc
>           11.14% 0x41bf5118
>                  0x41bf5068
>                  0x41bf5118
>                  0x41bf5068
>                  0x41bf5118
>                  0x41bf5068
>                  0x41bf5118
>                  0x41bf5068
>                  [... 0x41bf5118 and 0x41bf5068 keep alternating for the
>                   rest of the head -60 output ...]

Hmm.. we should cut off the loop in the broken callchains.  And it'd be
good to apply --hide-unresolved to callchains as well.  I'll take a look
at it.

Thanks,
Namhyung
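
A minimal, self-contained sketch of those two ideas, assuming invented
names (resolve_sym(), sanitize_callchain()) and an arbitrary address
range; this is not the actual perf code, just an illustration of cutting
the chain at the first repeated address and of dropping frames that never
resolved to a symbol, in the spirit of --hide-unresolved:

/* Illustrative only -- not taken from tools/perf. */
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_STACK 127

/* Stand-in for a DSO/symbol lookup; the range is arbitrary for the demo. */
static bool resolve_sym(uint64_t ip)
{
	return ip >= 0x400000 && ip < 0x500000;
}

/*
 * Copy at most max_stack frames from ips[] into out[], stopping as soon
 * as an address repeats and optionally skipping unresolved frames.  Note
 * that a plain "seen before" check also truncates legitimate recursion;
 * a real cutoff would need to be smarter (e.g. only break on a repeating
 * pair/period, or combine the check with --max-stack).
 */
static int sanitize_callchain(const uint64_t *ips, int nr, uint64_t *out,
			      int max_stack, bool hide_unresolved)
{
	int n = 0;

	for (int i = 0; i < nr && n < max_stack; i++) {
		if (hide_unresolved && !resolve_sym(ips[i]))
			continue;

		/* O(n^2) scan is fine for stacks of ~127 entries. */
		for (int j = 0; j < n; j++) {
			if (out[j] == ips[i])
				return n;	/* loop detected: cut the chain here */
		}
		out[n++] = ips[i];
	}
	return n;
}

int main(void)
{
	/* A broken chain like the one above: the same two frames repeating. */
	const uint64_t ips[] = {
		0x401234, 0x41bf5118, 0x41bf5068, 0x41bf5118, 0x41bf5068,
		0x41bf5118, 0x41bf5068,
	};
	uint64_t out[MAX_STACK];
	int n = sanitize_callchain(ips, (int)(sizeof(ips) / sizeof(ips[0])),
				   out, MAX_STACK, false);

	for (int i = 0; i < n; i++)
		printf("%#" PRIx64 "\n", out[i]);
	return 0;
}

With the loop check, only 0x401234, 0x41bf5118 and 0x41bf5068 survive from
the broken chain; with hide_unresolved set, the two unresolved addresses
would be dropped as well.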
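
And for the flat callchain support itself, a rough sketch of the approach
mentioned at the top of the thread (copying the parents' callchain lists
to the leaf nodes), with made-up structures and names rather than perf's
real callchain_node/callchain_list types:

/* Illustrative only -- not the perf implementation. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct node {
	struct node *parent;
	const char **frames;	/* frames covered by this node only */
	int nr_frames;
	const char **flat;	/* frames copied from all ancestors + self */
	int nr_flat;
};

/* Build the leaf's flat list: the ancestors' frames (root first), then ours. */
static int make_flat_list(struct node *leaf)
{
	int total = 0;

	for (struct node *n = leaf; n; n = n->parent)
		total += n->nr_frames;

	leaf->flat = calloc(total, sizeof(*leaf->flat));
	if (!leaf->flat)
		return -1;

	/* Fill from the end so that the root's frames come first. */
	int pos = total;
	for (struct node *n = leaf; n; n = n->parent) {
		pos -= n->nr_frames;
		memcpy(leaf->flat + pos, n->frames,
		       n->nr_frames * sizeof(*n->frames));
	}
	leaf->nr_flat = total;
	return 0;
}

int main(void)
{
	const char *root_frames[] = { "main", "cmd_top" };
	const char *leaf_frames[] = { "hist_entry_iter__add", "callchain_append" };

	struct node root = { .frames = root_frames, .nr_frames = 2 };
	struct node leaf = { .parent = &root, .frames = leaf_frames, .nr_frames = 2 };

	if (make_flat_list(&leaf))
		return 1;

	/* Flat mode: one line per frame, the full chain shown at the leaf. */
	for (int i = 0; i < leaf.nr_flat; i++)
		printf("%s\n", leaf.flat[i]);

	free(leaf.flat);
	return 0;
}

Whether the copied ancestor frames end up before or after the leaf's own
entries depends on the callchain order (caller vs. callee); this sketch
uses caller order, root first.  The memory cost is exactly the duplication
mentioned above: each leaf carries a copy of its whole path.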