Date: Tue, 24 Nov 2015 23:10:25 -0300
From: Arnaldo Carvalho de Melo
To: Namhyung Kim
Cc: Frederic Weisbecker, Ingo Molnar, linux-kernel@vger.kernel.org,
	Andi Kleen, David Ahern, Jiri Olsa, Kan Liang, Peter Zijlstra
Subject: Re: [PATCH 34/37] perf hists browser: Support flat callchains
Message-ID: <20151125021025.GM18140@kernel.org>
References: <1447955603-24895-1-git-send-email-acme@kernel.org>
	<1447955603-24895-35-git-send-email-acme@kernel.org>
	<20151123151647.GD3587@lerouge>
	<20151124052708.GB2636@sejong>
	<20151124144551.GB30441@redhat.com>
	<20151125012608.GA6171@sejong>
	<20151125013418.GL18140@kernel.org>
In-Reply-To: <20151125013418.GL18140@kernel.org>

Em Tue, Nov 24, 2015 at 10:34:18PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Nov 25, 2015 at 10:26:08AM +0900, Namhyung Kim escreveu:
> > On Tue, Nov 24, 2015 at 12:45:51PM -0200, Arnaldo Carvalho de Melo wrote:
> > > Em Tue, Nov 24, 2015 at 02:27:08PM +0900, Namhyung Kim escreveu:
> > > > On Mon, Nov 23, 2015 at 04:16:48PM +0100, Frederic Weisbecker wrote:
> > > > > On Thu, Nov 19, 2015 at 02:53:20PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > > From: Namhyung Kim
> > > > > [...]
> > > > Thus I simply copied callchain lists in parents to leaf nodes.  Yes,
> > > > it will consume some memory but can simplify the code.
> > > I haven't done any measuring, but I'm noticing that 'perf top -g' is
> > > showing more warnings about not being able to process events fast enough
> > > and so ends up losing events. I tried with --max-stack 16 and it helped,
> > > this is just a heads up.
> >
> > OK, but it seems that it's not related to this patch since this patch
> > only affects flat or folded callchain mode.
>
> Well, doesn't this patch make some of the involved data structures
> larger, thus putting more pressure on the L1 cache, etc? It may well be
> related, but we need to measure.
>
> > > Perhaps my workstation workloads are getting deeper callchains over
> > > time, but perhaps this is the cost of processing callchains that is
> > > increasing, I need to stop and try to quantify this.
> > >
> > > We really need to look at reducing the overhead of processing
> > > callchains.
> >
> > Right, but with my multi-thread work, I realized that perf is getting
> > heavier recently. I guess it's mostly due to the atomic refcount
> > work. I need to get back to the multi-thread work..
>
> We really need to measure this ;-)

So, something strange: if I use

  [acme@zoo linux]$ cat ~/bin/allmod
  rm -rf ../build/allmodconfig/ ; mkdir ../build/allmodconfig/ ; make O=../build/allmodconfig/ allmodconfig ; make -j32 O=../build/allmodconfig
  [acme@zoo linux]$

to generate background load, I don't see this warning that much:

  +  8.55%  8.49% libc-2┌─Warning!──────────────────────────────────────────────┐▒
  +  7.08%  6.98% perf  │Events are being lost, check IO/CPU overload!          │▒
  +  6.84%  0.04% perf  │                                                       │▒
  +  6.01%  0.09% perf  │You may want to run 'perf' using a RT scheduler policy:│▒
  +  5.26%  0.13% [kerne│                                                       │t▒
  +  4.96%  1.50% perf  │       perf top -r 80                                  │▒
  +  4.76%  3.58% perf  │                                                       │▒
  +  4.69%  0.05% perf  │Or reduce the sampling frequency.
                        └───────────────────────────────────────────────────────┘

It's with a low loadavg:

  [acme@zoo linux]$ cat /proc/loadavg
  0.75 0.79 0.64 3/549 21259

that it pops up :-\

If I take a snapshot with 'P':

  # head -40 perf.hist.0
  + 21.43% 21.09% libc-2.20.so       [.] _int_malloc
  + 19.49%  0.00% perf               [.] cmd_top
  + 19.46%  0.02% perf               [.] perf_top__mmap_read_idx
  + 19.03%  0.06% perf               [.] hist_entry_iter__add
  + 16.46%  1.85% perf               [.] iter_add_next_cumulative_entry
  + 12.09% 11.98% libc-2.20.so       [.] free
  + 10.68% 10.61% libc-2.20.so       [.] __libc_calloc
  +  9.61%  0.09% perf               [.] hists__decay_entries
  +  8.92%  8.85% libc-2.20.so       [.] malloc_consolidate
  +  8.17%  6.33% perf               [.] free_callchain_node
  +  7.94%  0.09% perf               [.] hist_entry__delete
  +  6.22%  0.03% perf               [.] callchain_append
  +  6.20%  6.11% perf               [.] append_chain_children
  +  5.44%  1.50% perf               [.] __hists__add_entry
  +  4.34%  0.14% [kernel]           [k] entry_SYSCALL_64_fastpath
  +  3.69%  3.67% perf               [.] sort__dso_cmp
  +  3.12%  0.20% perf               [.] hists__output_resort
  +  2.88%  0.00% [unknown]          [.] 0x6d86258d4c544155
  +  2.88%  0.00% libc-2.20.so       [.] __libc_start_main
  +  2.88%  0.00% perf               [.] main
  +  2.88%  0.00% perf               [.] run_builtin
  +  2.66%  0.00% libpthread-2.20.so [.] start_thread
  +  2.66%  0.00% perf               [.] display_thread_tui
  +  2.66%  0.00% perf               [.] perf_evlist__tui_browse_hists
  +  2.66%  0.00% perf               [.] perf_evsel__hists_browse
  +  2.49%  0.07% [kernel]           [k] sys_futex
  +  2.42%  0.06% [kernel]           [k] do_futex
     2.24%  0.00% perf               [.] perf_top__sort_new_samples
  +  1.92%  0.51% perf               [.] hists__collapse_resort
  +  1.87%  1.86% perf               [.] hpp__sort_overhead_acc
  +  1.69%  0.09% libc-2.20.so       [.] __lll_unlock_wake_private
  +  1.45%  1.44% perf               [.] hpp__nop_cmp
     1.45%  1.43% perf               [.] rb_erase
  +  1.44%  1.42% perf               [.] __sort__hpp_cmp
     1.31%  0.16% libc-2.20.so       [.] __lll_lock_wait_private
     1.18%  0.19% [kernel]           [k] futex_wake
  +  1.13%  1.13% perf               [.] sort__sym_cmp
     1.11%  0.02% [kernel]           [k] schedule
     1.09%  0.22% [kernel]           [k] __schedule
     0.99%  0.08% [kernel]           [k] futex_wait

So it's quite a lot of mallocs, i.e.
just do a 'perf top -g' and wait a bit: malloc goes on bubbling up to the
top, at about the same time it starts showing that popup screen telling
that we're losing events.

If I use --no-children, to see if there is a difference, using either
--call-graph caller or callee, it doesn't get more than about 1%.

Ok, now I tried with "perf top --call-graph caller", i.e. with --children,
and looked at the _int_malloc callchains: I get really long, bogus
callchains, see below. That explains why I don't lose events when I use
--max-stack.

I'll have to stop now; I put the full perf.hist.1 at
http://vger.kernel.org/~acme/perf/perf.hist.1.xz

- Arnaldo

[root@zoo ~]# head -60 perf.hist.1
- 17.92% 17.10% libc-2.20.so [.] _int_malloc
   + 112.80% _int_malloc
     11.14% 0x41bf5118
            0x41bf5068
            0x41bf5118
            0x41bf5068
            0x41bf5118
            0x41bf5068
            [the same 0x41bf5118/0x41bf5068 pair repeats for the
             remainder of the 60 lines]